DeerFlow 2.0: The Framework-Shaped Agent
DeerFlow (Deep Exploration and Efficient Research Flow) is the most composable agent in this set — not a single CLI persona, but a LangGraph-based runtime that orchestrates sub-agents, memory, sandboxes, and extensible skills into general-purpose agent workflows.
What makes DeerFlow different
While Claude Code reads like a bespoke product and Crush reads like a polished Go terminal app, DeerFlow reads like a framework for building agent systems. It began as a deep research tool at ByteDance and was rewritten from scratch for v2.0 — a general-purpose orchestration runtime with a FastAPI gateway, a LangGraph server, and a Next.js frontend.
Two startup modes
- Standard mode: separate FastAPI gateway + LangGraph server (4 containers).
- Gateway mode (experimental): embeds the agent runtime directly inside the gateway process, eliminating the LangGraph server and reducing the deployment to 3 containers. This also eliminates the need for a LangGraph Platform license.
The 14-layer middleware stack
Every agent turn in DeerFlow passes through a fixed-order middleware chain. No other repo in this set has a composable middleware architecture this deep; most handle these concerns inline or not at all.
| Order | Middleware | Purpose |
|---|---|---|
| 0 | ThreadDataMiddleware | Attaches thread-scoped data to each run |
| 1 | UploadsMiddleware | Processes user-uploaded files into the run context |
| 2 | SandboxMiddleware | Acquires/releases sandbox environment for the turn |
| 3 | DanglingToolCallMiddleware | Patches missing ToolMessages before the model sees history |
| 4 | GuardrailMiddleware | Pre-execution tool call validation (fail-closed by default) |
| 5 | ToolErrorHandlingMiddleware | Converts tool exceptions into ToolMessage error responses |
| 6 | SummarizationMiddleware | Context summarization when token/message thresholds fire |
| 7 | TodoMiddleware | Plan mode todo list management |
| 8 | TitleMiddleware | Auto-generates thread titles from conversation content |
| 9 | MemoryMiddleware | LLM-driven long-term memory extraction and injection |
| 10 | ViewImageMiddleware | Vision model image handling |
| 11 | SubagentLimitMiddleware | Enforces concurrency limits, timeouts, and max turns for sub-agents |
| 12 | LoopDetectionMiddleware | Detects tool call repetition with semantic normalization |
| 13 | ClarificationMiddleware | Always last — asks clarifying questions before final output |
Custom middleware insertion with @Next/@Prev anchors
Custom middlewares can declare @Next(OtherMiddleware) or @Prev(OtherMiddleware) class decorators for precise positioning in the chain. The insertion algorithm checks for circular dependencies, handles cross-anchoring between extras, and guarantees that ClarificationMiddleware stays last. This is a sophisticated plugin pattern that is rare in agent codebases.
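A minimal sketch of how such anchors could work, assuming @Next(X) means "insert immediately after X" (the decorator bodies and the insert_extras helper are illustrative, not DeerFlow's actual code):

```python
# Sketch: anchor decorators for middleware placement. Assumes
# @Next(X) = "insert immediately after X", @Prev(X) = "insert
# immediately before X"; not DeerFlow's actual implementation.

def Next(anchor: type):
    def wrap(cls: type) -> type:
        cls.__anchor__ = ("next", anchor)
        return cls
    return wrap

def Prev(anchor: type):
    def wrap(cls: type) -> type:
        cls.__anchor__ = ("prev", anchor)
        return cls
    return wrap

def insert_extras(chain: list[type], extras: list[type]) -> list[type]:
    # The real algorithm also rejects circular anchors and resolves
    # extras that anchor to each other; this sketch assumes every
    # anchor is already present in the chain.
    for extra in extras:
        kind, anchor = getattr(extra, "__anchor__", ("prev", chain[-1]))
        idx = chain.index(anchor)
        chain.insert(idx + 1 if kind == "next" else idx, extra)
    return chain
```

An unanchored extra here defaults to slotting in just before the final middleware, which is one way to guarantee ClarificationMiddleware stays last.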
Loop detection with 200-line bucketing
DeerFlow's LoopDetectionMiddleware is the most sophisticated
loop detector in this set after Crush's SHA-256 approach. It hashes tool name
+ input + output, but has a critical special case:
The read_file false positive problem
When an agent reads a file with pagination (lines 0-200, then 200-400, etc.), naive hashing sees the same tool name and thinks it's a loop. DeerFlow solves this by bucketing line numbers into 200-line groups before hashing: reading lines 0-200 and then 200-400 produces different hashes because the two reads hit different buckets.
Two-stage response
- At 3 repeats: injects a HumanMessage warning: "you are repeating yourself — wrap up."
- At 5 repeats: strips tool_calls entirely from the response, forcing a plain-text answer and definitively ending the loop.

For write_file and str_replace, the full arguments are hashed to avoid false positives from legitimate repeated edits. This is far more nuanced than most agents' "stop after N identical calls" approach.
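A sketch of the bucketed signature (the start_line/end_line argument names are assumptions about read_file's schema, not DeerFlow's actual field names):

```python
import hashlib
import json

BUCKET = 200  # pagination bucket size for read_file line ranges

def loop_signature(tool: str, args: dict, output: str) -> str:
    """Hash one tool call for loop detection."""
    if tool == "read_file":
        # Collapse line numbers into 200-line buckets: reads of lines
        # 0-200 and 200-400 land in different buckets, so paginated
        # reading never hashes to the same signature.
        args = {**args,
                "start_line": args.get("start_line", 0) // BUCKET,
                "end_line": args.get("end_line", 0) // BUCKET}
    # write_file / str_replace keep their full arguments, so repeated
    # but distinct edits also produce distinct signatures.
    payload = json.dumps([tool, args, output], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Counting repeats of this signature is what drives the two-stage response: the HumanMessage warning at three, stripped tool_calls at five.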
Sub-agent orchestration with parallel execution
The lead agent can spawn sub-agents via the task_tool. The
SubagentLimitMiddleware enforces hard limits:
| Parameter | Default | Notes |
|---|---|---|
| max_concurrent | 3 | Parallel sub-agent cap |
| timeout_seconds | 900 | 15-minute timeout per sub-agent |
| max_turns | configurable | Turn limit per sub-agent run |
Sub-agents run in the background with cooperative cancellation via a threading.Event checked at astream() iteration boundaries. Deferred cleanup uses asyncio.create_task() to avoid race conditions. The parent sees only the delegation call and the child's summary result — never the intermediate tool calls.
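A rough sketch of that cancellation pattern (run_subagent and release_resources are hypothetical names; the real middleware also enforces max_turns on top of this):

```python
import asyncio
import threading

async def release_resources(agent) -> None:
    """Hypothetical cleanup hook (sandbox release, bridge teardown, ...)."""

async def run_subagent(agent, messages, cancel: threading.Event,
                       timeout: float = 900.0):
    """Drive a sub-agent's stream, honoring cancellation between chunks."""
    async def consume():
        final = None
        async for chunk in agent.astream(messages):
            if cancel.is_set():   # cooperative check at iteration boundary
                break
            final = chunk
        return final

    try:
        return await asyncio.wait_for(consume(), timeout=timeout)
    finally:
        # Cleanup runs in its own task so it cannot race the
        # still-unwinding stream iterator.
        asyncio.create_task(release_resources(agent))
```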
Real-time streaming of sub-agent messages is supported via the
StreamBridge abstraction, which decouples agent workers
(producers) from SSE endpoints (consumers). Currently uses
MemoryStreamBridge (in-memory queue); Redis is planned
for Phase 2 for horizontal scaling.
Model support: 6 custom providers + LangChain compatibility
DeerFlow uses a config-driven model factory (models/factory.py) with a use field like langchain_openai:ChatOpenAI or one of six custom provider classes, described below.
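The use field implies a small dynamic-import step along these lines (a sketch; the real models/factory.py presumably adds validation and the provider-specific patching described below):

```python
import importlib

def build_model(spec: dict):
    """Instantiate a chat model from a config entry, e.g.
    {"use": "langchain_openai:ChatOpenAI", "model": "gpt-4o"}."""
    module_name, class_name = spec["use"].split(":", 1)
    model_cls = getattr(importlib.import_module(module_name), class_name)
    # Everything except the `use` key becomes constructor kwargs.
    return model_cls(**{k: v for k, v in spec.items() if k != "use"})
```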
Claude provider
ClaudeChatModel loads OAuth tokens from
~/.claude/.credentials.json or env vars. Supports prompt
caching, auto thinking budget, and retry logic. Uses the same billing
as Claude Code CLI.
Codex provider
CodexChatModel calls the ChatGPT Codex Responses API
(chatgpt.com/backend-api/codex/responses) with SSE streaming.
Auto-loads ~/.codex/auth.json. Same endpoint as Codex CLI.
vLLM provider
VllmChatModel supports vLLM 0.19.0 with Qwen-style reasoning
toggle via extra_body.chat_template_kwargs.enable_thinking.
For self-hosted open-source models.
OpenAI-compatible
PatchedChatOpenAI handles OpenAI-compatible gateways
(OpenRouter, Novita AI, etc.) with tool-call thought_signature
preservation for Gemini compatibility.
DeepSeek provider
PatchedChatDeepSeek adds thinking mode support for
DeepSeek V3/V3.2/Reasoner models.
MiniMax provider
PatchedMiniMax for MiniMax M2.5/M2.7 models — a
Chinese model provider not commonly seen in Western agent stacks.
Recommended models from the README: Doubao-Seed-2.0-Code, DeepSeek V3.2, and Kimi 2.5.
SSE streaming and stream bridge
DeerFlow's SSE streaming is decoupled from the agent runtime via an abstract
StreamBridge protocol:
- Tool calls, thoughts, text chunks → StreamBridge.enqueue()
- In-memory queue (currently) with HEARTBEAT_SENTINEL every 15s
- FastAPI SSE route reads from the bridge, formats as Server-Sent Events
- Clean stream termination with proper event signaling
This decoupling is architecturally significant: the agent runtime doesn't know about HTTP. It can run embedded, in a CLI, or on a separate server. The stream bridge is the only coupling, and it's pluggable — Redis support is planned for Phase 2 horizontal scaling.
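A minimal sketch of the bridge (only enqueue appears in the description above; the events method name and the sentinel shape are assumptions):

```python
import asyncio
from typing import Any, AsyncIterator, Protocol

HEARTBEAT_SENTINEL: Any = {"event": "heartbeat"}

class StreamBridge(Protocol):
    async def enqueue(self, event: dict | None) -> None: ...
    def events(self) -> AsyncIterator[dict]: ...

class MemoryStreamBridge:
    """In-memory queue: agent workers produce, the SSE route consumes."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[dict | None] = asyncio.Queue()

    async def enqueue(self, event: dict | None) -> None:
        await self._queue.put(event)       # None signals end-of-stream

    async def events(self) -> AsyncIterator[dict]:
        while True:
            try:
                event = await asyncio.wait_for(self._queue.get(), timeout=15.0)
            except asyncio.TimeoutError:
                yield HEARTBEAT_SENTINEL   # keep the SSE connection alive
                continue
            if event is None:
                return                     # clean termination
            yield event
```

A Redis-backed implementation would swap the queue for a pub/sub channel without touching the agent runtime or the SSE route.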
Smoke-test skill
New in this update: a comprehensive smoke-test skill for end-to-end testing
in .agent/skills/smoke-test/:
- Local and Docker deployment modes
- Automatic mode switching based on network conditions
- Phase-based execution: Code Update → Environment Check → Configuration → Deployment → Health Check → Report
- Checks Node.js 22+, pnpm, uv, nginx, and required ports (2026, 3000, 8001, 2024)
- Comprehensive SOP documentation and troubleshooting guide (613 lines)
Skill system and self-evolution
DeerFlow has a structured skills system with SKILL.md files
in skills/public/ (built-in) and skills/custom/
(user-created). Skills support progressive loading, validation, and atomic
writes with JSONL history tracking.
The most unusual feature: skill self-evolution. When
skill_evolution.enabled: true, the agent can create or improve
skills during a session. Triggers defined in the system prompt include:
- "5+ tool calls used" in a pattern worth codifying
- "User corrected approach" — the user overrode the agent's method
- "Non-obvious errors encountered" — the agent discovered a gotcha worth documenting
Skills are cached at startup: warm_enabled_skills_cache(timeout=5.0) pre-loads the enabled skills in a daemon background thread so first access is fast.
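In sketch form (only the function name and timeout come from the text above; the body and cache are assumptions):

```python
import threading

_skills_cache: dict[str, dict] = {}

def warm_enabled_skills_cache(timeout: float = 5.0) -> None:
    """Hypothetical warm-up: parse SKILL.md metadata for enabled skills
    into the in-process cache before the first request needs it."""
    ...

# Daemon thread: warms the cache in the background without ever
# blocking startup or process shutdown.
threading.Thread(target=warm_enabled_skills_cache,
                 kwargs={"timeout": 5.0}, daemon=True).start()
```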
LLM-driven memory system
DeerFlow's memory is not a vector database — it's an LLM extraction pipeline.
The MemoryMiddleware uses an LLM to extract facts, preferences,
corrections, and reinforcement signals from conversations:
- Per-agent namespaces — memory is scoped to specific agent types
- Correction detection — when the user corrects the agent, that signal is extracted as a high-confidence fact
- Reinforcement detection — when the user praises an approach, that's stored as positive reinforcement
- Upload-event scrubbing — sensitive uploaded data is automatically scrubbed from memory
- Confidence thresholds — facts below a configurable confidence are not stored
- Debouncing — memory updates are debounced (configurable debounce_seconds) to avoid excessive LLM calls, as sketched below
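The debouncing might look roughly like this (DebouncedMemoryWriter and its methods are hypothetical; only the debounce_seconds knob comes from the config):

```python
import asyncio

class DebouncedMemoryWriter:
    """Coalesce bursts of memory-extraction triggers into one LLM call."""

    def __init__(self, extract, debounce_seconds: float = 30.0):
        self._extract = extract            # async fn: runs the LLM extraction
        self._debounce = debounce_seconds
        self._pending: asyncio.Task | None = None

    def schedule(self, thread_id: str) -> None:
        # Must be called from a running event loop. Restart the timer on
        # every new trigger; only the last one actually fires.
        if self._pending and not self._pending.done():
            self._pending.cancel()
        self._pending = asyncio.create_task(self._run(thread_id))

    async def _run(self, thread_id: str) -> None:
        try:
            await asyncio.sleep(self._debounce)
            await self._extract(thread_id)
        except asyncio.CancelledError:
            pass  # superseded by a newer trigger
```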
Guardrail system
DeerFlow has a pluggable guardrail system for pre-execution tool call
validation. The GuardrailMiddleware uses a
GuardrailProvider interface with evaluate() and
aevaluate() methods:
Fail-closed by default
If the guardrail provider raises an exception, the middleware blocks the tool call by default (fail_closed: true). This can be configured to allow the call through with a warning instead. This is a security decision: when in doubt, deny.
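In outline, with the caveat that the boolean return type is an assumption (the real evaluate() may return a richer verdict object):

```python
from typing import Protocol

class GuardrailProvider(Protocol):
    def evaluate(self, tool_name: str, tool_args: dict) -> bool: ...
    async def aevaluate(self, tool_name: str, tool_args: dict) -> bool: ...

async def gate_tool_call(provider: GuardrailProvider, tool_name: str,
                         tool_args: dict, fail_closed: bool = True) -> bool:
    """Return True if the tool call may proceed."""
    try:
        return await provider.aevaluate(tool_name, tool_args)
    except Exception:
        # Provider failure: fail-closed denies the call ("when in
        # doubt, deny"); fail-open would allow it with a warning.
        return not fail_closed
```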
Sandbox architecture
DeerFlow supports two sandbox providers:
LocalSandboxProvider
Direct host execution. Not a secure isolation boundary — suitable for trusted environments. Host bash is disabled by default when using LocalSandboxProvider. Uses a singleton pattern.
AioSandboxProvider
Container-based sandbox supporting Docker, Apple Container, and
Kubernetes backends. Uses deterministic sandbox IDs (SHA-256 of
thread_id) and file locking (fcntl on Unix,
msvcrt.locking on Windows) for cross-process
coordination.
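A sketch of the deterministic-ID-plus-lock idea on Unix (the lock file path and the 16-character truncation are illustrative choices, not DeerFlow's):

```python
import fcntl      # Unix advisory locks; Windows would use msvcrt.locking
import hashlib

def sandbox_id(thread_id: str) -> str:
    """Deterministic ID: the same thread always maps to the same sandbox."""
    return hashlib.sha256(thread_id.encode()).hexdigest()

def acquire_sandbox_lock(thread_id: str):
    """Cross-process coordination: block until no other process holds
    the lock file for this sandbox; closing the file releases it."""
    lock = open(f"/tmp/sandbox-{sandbox_id(thread_id)[:16]}.lock", "w")
    fcntl.flock(lock, fcntl.LOCK_EX)
    return lock
```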
Messaging channel integration
DeerFlow includes built-in support for messaging platforms in
backend/app/channels/:
| Platform | Features |
|---|---|
| WeChat | 52KB integration with iLink long-polling, AES-128-ECB encryption, QR code bootstrap, media uploads |
| Discord | Full Discord bot integration |
| Slack | Per-user session settings, slash command dispatch |
| Telegram | Per-user session settings, custom agent routing |
| Feishu/Lark | WebSocket + Webhook, interactive card events |
| WeCom | Enterprise WeChat integration |
New in this update: WeChat integration (52KB, 1371 lines) with support for TEXT, IMAGE, VOICE, FILE, and VIDEO message types, AES encryption, and QR code login. A Discord channel was also added.
Deferred tool loading (tool_search)
When tool_search.enabled: true, MCP tools are not bound
directly to the agent. Instead, they are registered in a
DeferredToolRegistry and exposed via a tool_search
tool that the agent can discover at runtime.
This is a smart design for environments with many MCP servers: rather than polluting the agent's context with hundreds of tool descriptions, the agent can search for tools on demand. Only tools it actually needs get loaded.
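A sketch of such a registry (class and method names are assumptions; the real implementation likely ranks matches rather than substring-matching):

```python
class DeferredToolRegistry:
    """Hold MCP tool metadata without binding the tools to the agent."""

    def __init__(self) -> None:
        self._tools: dict[str, dict] = {}   # name -> {description, loader}

    def register(self, name: str, description: str, loader) -> None:
        self._tools[name] = {"description": description, "loader": loader}

    def search(self, query: str, limit: int = 5) -> list[str]:
        """Naive substring match over names and descriptions."""
        q = query.lower()
        hits = [n for n, t in self._tools.items()
                if q in n.lower() or q in t["description"].lower()]
        return hits[:limit]

    def load(self, name: str):
        """Materialize a tool only when the agent actually asks for it."""
        return self._tools[name]["loader"]()
```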
ACP integration
DeerFlow has an invoke_acp_agent tool that calls external
ACP-compatible agents. It expects ACP adapters (e.g.,
@zed-industries/claude-agent-acp,
@zed-industries/codex-acp), not raw CLI binaries. This means
DeerFlow can delegate work to Claude Code or Codex through the ACP protocol
as a first-class tool call.
Configuration system
DeerFlow uses a versioned YAML config (config_version: 5) with:
- Environment variable interpolation — $OPENAI_API_KEY in config values (see the sketch after this list)
- Dynamic class resolution — use: package.module:ClassName loads any LangChain-compatible class
- Config upgrade script — make config-upgrade migrates older configs
- Pydantic validation — all config is validated at load time
- Per-agent configs — custom config.yaml files under agents/ with custom model, soul (prompt), skills, and tool groups
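The $VAR interpolation can be a small recursive pass over the loaded YAML, roughly like this (a sketch, not DeerFlow's loader):

```python
import os
import re

_ENV_VAR = re.compile(r"\$([A-Z_][A-Z0-9_]*)")

def interpolate(value):
    """Expand $VARS in string values, recursing into dicts and lists;
    unknown variables are left untouched."""
    if isinstance(value, str):
        return _ENV_VAR.sub(
            lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: interpolate(v) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v) for v in value]
    return value

# interpolate({"api_key": "$OPENAI_API_KEY"}) -> {"api_key": "<env value>"}
```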
LLM error handling middleware
The LLMErrorHandlingMiddleware classifies errors into:
- Retriable: busy/transient errors (408, 429, 500) — retried up to 3 times with exponential backoff and Retry-After header respect
- Non-retriable: quota exceeded, auth errors — returns a user-friendly AIMessage instead of crashing
It emits llm_retry stream events so the frontend can show
retry progress to the user.
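A sketch of that classify-and-retry loop (LLMError is a hypothetical stand-in for the provider exceptions the middleware actually classifies):

```python
import asyncio
import random

RETRIABLE_STATUSES = {408, 429, 500}

class LLMError(Exception):
    """Hypothetical provider error carrying an HTTP status and an
    optional Retry-After hint."""
    def __init__(self, status: int, retry_after: float | None = None):
        super().__init__(f"LLM call failed with status {status}")
        self.status = status
        self.retry_after = retry_after

async def call_with_retries(call, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except LLMError as err:
            if err.status not in RETRIABLE_STATUSES or attempt == max_retries:
                raise  # non-retriable (quota, auth) or budget exhausted
            # Honor Retry-After if present, else exponential backoff
            # with jitter.
            await asyncio.sleep(err.retry_after
                                or 2 ** attempt + random.random())
```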
Citation system removed
In a significant simplification, DeerFlow removed its citation system entirely in this update:
- Deleted the SafeCitationContent component
- Deleted inline-citation.tsx (289 lines)
- Removed citation core utilities
- Replaced with a simple MarkdownContent renderer
This suggests the complexity of citation handling wasn't worth it for their use case — a notable example of an agent choosing simplicity over feature completeness.
Where DeerFlow is weaker
Less of a single polished CLI identity
Unlike Claude Code or Crush, DeerFlow is not designed to be a standalone terminal experience. It's a harness — powerful but less opinionated. You need to configure it to get value.
Local sandbox is not secure isolation
The default LocalSandboxProvider runs commands directly on the host. It's convenient but not a security boundary. The AIO sandbox requires Docker or container infrastructure.
Bottom line
DeerFlow is the most framework-shaped agent in this set. If Claude Code is a product, DeerFlow is a platform. Its 14-layer middleware stack, sub-agent orchestration with parallel execution, LLM-driven memory, skill self-evolution, and SSE stream bridge make it the most extensible runtime here.
The tradeoff is that it's less immediately usable as a CLI tool — you need to configure models, skills, and sandboxes to get it working. But if you want to build an agent system rather than use one, DeerFlow is the most interesting starting point.