DeerFlow 2.0: The Framework-Shaped Agent
DeerFlow (Deep Exploration and Efficient Research Flow) is the most composable agent in this set — not a single CLI persona, but a LangGraph-based runtime that orchestrates sub-agents, memory, sandboxes, and extensible skills into general-purpose agent workflows.
What makes DeerFlow different
While Claude Code reads like a bespoke product and Crush reads like a polished Go terminal app, DeerFlow reads like a framework for building agent systems. It began as a deep research tool at ByteDance and was rewritten from scratch for v2.0 — a general-purpose orchestration runtime with a FastAPI gateway, a LangGraph server, and a Next.js frontend.
Two startup modes
- Standard mode: separate FastAPI gateway + LangGraph server (4 containers).
- Gateway mode (experimental): embeds the agent runtime directly inside the gateway process, eliminating the LangGraph server and reducing the deployment to 3 containers. This also eliminates the need for a LangGraph Platform license.
The 14-layer middleware stack
Every agent turn in DeerFlow passes through a fixed-order middleware chain. No other repo in this set has a composable middleware architecture this deep; most handle these concerns inline or not at all.
| Order | Middleware | Purpose |
|---|---|---|
| 0 | ThreadDataMiddleware | Attaches thread-scoped data to each run |
| 1 | UploadsMiddleware | Processes user-uploaded files into the run context |
| 2 | SandboxMiddleware | Acquires/releases sandbox environment for the turn |
| 3 | DanglingToolCallMiddleware | Patches missing ToolMessages before the model sees history |
| 4 | GuardrailMiddleware | Pre-execution tool call validation (fail-closed by default) |
| 5 | ToolErrorHandlingMiddleware | Converts tool exceptions into ToolMessage error responses |
| 6 | SummarizationMiddleware | Context summarization when token/message thresholds fire |
| 7 | TodoMiddleware | Plan mode todo list management |
| 8 | TitleMiddleware | Auto-generates thread titles from conversation content |
| 9 | MemoryMiddleware | LLM-driven long-term memory extraction and injection |
| 10 | ViewImageMiddleware | Vision model image handling |
| 11 | SubagentLimitMiddleware | Enforces concurrency limits, timeouts, and max turns for sub-agents |
| 12 | LoopDetectionMiddleware | Detects tool call repetition with semantic normalization |
| 13 | ClarificationMiddleware | Always last — asks clarifying questions before final output |
Custom middleware insertion with @Next/@Prev anchors
Custom middlewares can declare @Next(OtherMiddleware) or @Prev(OtherMiddleware) class decorators for precise positioning in the chain. The insertion algorithm checks for circular dependencies, handles cross-anchoring between extras, and guarantees that ClarificationMiddleware stays last. This is a sophisticated plugin pattern that is rare in agent codebases.
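A minimal sketch of how such anchors could work, assuming @Next(X) means "insert immediately after X" (the decorator bodies and the insert_extras helper are illustrative, not DeerFlow's actual code):

```python
# Sketch: anchor decorators for middleware placement. Assumes
# @Next(X) = "insert immediately after X", @Prev(X) = "insert
# immediately before X"; not DeerFlow's actual implementation.

def Next(anchor: type):
    def wrap(cls: type) -> type:
        cls.__anchor__ = ("next", anchor)
        return cls
    return wrap

def Prev(anchor: type):
    def wrap(cls: type) -> type:
        cls.__anchor__ = ("prev", anchor)
        return cls
    return wrap

def insert_extras(chain: list[type], extras: list[type]) -> list[type]:
    # The real algorithm also rejects circular anchors and resolves
    # extras that anchor to each other; this sketch assumes every
    # anchor is already present in the chain.
    for extra in extras:
        kind, anchor = getattr(extra, "__anchor__", ("prev", chain[-1]))
        idx = chain.index(anchor)
        chain.insert(idx + 1 if kind == "next" else idx, extra)
    return chain
```

An unanchored extra here defaults to slotting in just before the final middleware, which is one way to guarantee ClarificationMiddleware stays last.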
Loop detection with 200-line bucketing
DeerFlow's LoopDetectionMiddleware is the most sophisticated
loop detector in this set after Crush's SHA-256 approach. It hashes tool name
+ input + output, but has a critical special case:
The read_file false positive problem
When an agent reads a file with pagination (lines 0-200, then 200-400, etc.), naive hashing sees the same tool name and thinks it's a loop. DeerFlow solves this by bucketing line numbers into 200-line groups before hashing: reading lines 0-200 and then 200-400 produces different hashes because the two reads hit different buckets.
Two-stage response
- At 3 repeats: injects a HumanMessage warning: "you are repeating yourself — wrap up."
- At 5 repeats: strips tool_calls entirely from the response, forcing a plain-text answer and definitively ending the loop.

For write_file and str_replace, the full arguments are hashed to avoid false positives from legitimate repeated edits. This is far more nuanced than most agents' "stop after N identical calls" approach.
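A sketch of the bucketed signature (the start_line/end_line argument names are assumptions about read_file's schema, not DeerFlow's actual field names):

```python
import hashlib
import json

BUCKET = 200  # pagination bucket size for read_file line ranges

def loop_signature(tool: str, args: dict, output: str) -> str:
    """Hash one tool call for loop detection."""
    if tool == "read_file":
        # Collapse line numbers into 200-line buckets: reads of lines
        # 0-200 and 200-400 land in different buckets, so paginated
        # reading never hashes to the same signature.
        args = {**args,
                "start_line": args.get("start_line", 0) // BUCKET,
                "end_line": args.get("end_line", 0) // BUCKET}
    # write_file / str_replace keep their full arguments, so repeated
    # but distinct edits also produce distinct signatures.
    payload = json.dumps([tool, args, output], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Counting repeats of this signature is what drives the two-stage response: the HumanMessage warning at three, stripped tool_calls at five.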
Sub-agent orchestration with parallel execution
The lead agent can spawn sub-agents via the task_tool. The
SubagentLimitMiddleware enforces hard limits:
| Parameter | Default | Notes |
|---|---|---|
| max_concurrent | 3 | Parallel sub-agent cap |
| timeout_seconds | 900 | 15-minute timeout per sub-agent |
| max_turns | configurable | Turn limit per sub-agent run |
Sub-agents run in the background with cooperative cancellation via a threading.Event checked at astream() iteration boundaries. Deferred cleanup uses asyncio.create_task() to avoid race conditions. The parent sees only the delegation call and the child's summary result — never the intermediate tool calls.
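A rough sketch of that cancellation pattern (run_subagent and release_resources are hypothetical names; the real middleware also enforces max_turns on top of this):

```python
import asyncio
import threading

async def release_resources(agent) -> None:
    """Hypothetical cleanup hook (sandbox release, bridge teardown, ...)."""

async def run_subagent(agent, messages, cancel: threading.Event,
                       timeout: float = 900.0):
    """Drive a sub-agent's stream, honoring cancellation between chunks."""
    async def consume():
        final = None
        async for chunk in agent.astream(messages):
            if cancel.is_set():   # cooperative check at iteration boundary
                break
            final = chunk
        return final

    try:
        return await asyncio.wait_for(consume(), timeout=timeout)
    finally:
        # Cleanup runs in its own task so it cannot race the
        # still-unwinding stream iterator.
        asyncio.create_task(release_resources(agent))
```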
Real-time streaming of sub-agent messages is supported via the
StreamBridge abstraction, which decouples agent workers
(producers) from SSE endpoints (consumers). Currently uses
MemoryStreamBridge (in-memory queue); Redis is planned
for Phase 2 for horizontal scaling.
Model support: 6 custom providers + LangChain compatibility
DeerFlow uses a config-driven model factory (models/factory.py) with a use field like langchain_openai:ChatOpenAI or one of six custom provider classes, described below.
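The use field implies a small dynamic-import step along these lines (a sketch; the real models/factory.py presumably adds validation and the provider-specific patching described below):

```python
import importlib

def build_model(spec: dict):
    """Instantiate a chat model from a config entry, e.g.
    {"use": "langchain_openai:ChatOpenAI", "model": "gpt-4o"}."""
    module_name, class_name = spec["use"].split(":", 1)
    model_cls = getattr(importlib.import_module(module_name), class_name)
    # Everything except the `use` key becomes constructor kwargs.
    return model_cls(**{k: v for k, v in spec.items() if k != "use"})
```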
Claude provider
ClaudeChatModel loads OAuth tokens from
~/.claude/.credentials.json or env vars. Supports prompt
caching, auto thinking budget, and retry logic. Uses the same billing
as Claude Code CLI.
Codex provider
CodexChatModel calls the ChatGPT Codex Responses API
(chatgpt.com/backend-api/codex/responses) with SSE streaming.
Auto-loads ~/.codex/auth.json. Same endpoint as Codex CLI.
vLLM provider
VllmChatModel supports vLLM 0.19.0 with Qwen-style reasoning
toggle via extra_body.chat_template_kwargs.enable_thinking.
For self-hosted open-source models.
OpenAI-compatible
PatchedChatOpenAI handles OpenAI-compatible gateways
(OpenRouter, Novita AI, etc.) with tool-call thought_signature
preservation for Gemini compatibility.
DeepSeek provider
PatchedChatDeepSeek adds thinking mode support for
DeepSeek V3/V3.2/Reasoner models.
MiniMax provider
PatchedMiniMax for MiniMax M2.5/M2.7 models — a
Chinese model provider not commonly seen in Western agent stacks.
Recommended models from the README: Doubao-Seed-2.0-Code, DeepSeek V3.2, and Kimi 2.5.
SSE streaming and stream bridge
DeerFlow's SSE streaming is decoupled from the agent runtime via an abstract
StreamBridge protocol:
- Tool calls, thoughts, text chunks → StreamBridge.enqueue()
- In-memory queue (currently) with HEARTBEAT_SENTINEL every 15s
- FastAPI SSE route reads from the bridge, formats as Server-Sent Events
- Clean stream termination with proper event signaling
This decoupling is architecturally significant: the agent runtime doesn't know about HTTP. It can run embedded, in a CLI, or on a separate server. The stream bridge is the only coupling, and it's pluggable — Redis support is planned for Phase 2 horizontal scaling.
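A minimal sketch of the bridge (only enqueue appears in the description above; the events method name and the sentinel shape are assumptions):

```python
import asyncio
from typing import Any, AsyncIterator, Protocol

HEARTBEAT_SENTINEL: Any = {"event": "heartbeat"}

class StreamBridge(Protocol):
    async def enqueue(self, event: dict | None) -> None: ...
    def events(self) -> AsyncIterator[dict]: ...

class MemoryStreamBridge:
    """In-memory queue: agent workers produce, the SSE route consumes."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[dict | None] = asyncio.Queue()

    async def enqueue(self, event: dict | None) -> None:
        await self._queue.put(event)       # None signals end-of-stream

    async def events(self) -> AsyncIterator[dict]:
        while True:
            try:
                event = await asyncio.wait_for(self._queue.get(), timeout=15.0)
            except asyncio.TimeoutError:
                yield HEARTBEAT_SENTINEL   # keep the SSE connection alive
                continue
            if event is None:
                return                     # clean termination
            yield event
```

A Redis-backed implementation would swap the queue for a pub/sub channel without touching the agent runtime or the SSE route.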
Smoke-test skill
New in this update: a comprehensive smoke-test skill for end-to-end testing
in .agent/skills/smoke-test/:
- Local and Docker deployment modes
- Automatic mode switching based on network conditions
- Phase-based execution: Code Update → Environment Check → Configuration → Deployment → Health Check → Report
- Checks Node.js 22+, pnpm, uv, nginx, and required ports (2026, 3000, 8001, 2024)
- Comprehensive SOP documentation and troubleshooting guide (613 lines)
Skill system and self-evolution
DeerFlow has a structured skills system with SKILL.md files
in skills/public/ (built-in) and skills/custom/
(user-created). Skills support progressive loading, validation, and atomic
writes with JSONL history tracking.
The most unusual feature: skill self-evolution. When
skill_evolution.enabled: true, the agent can create or improve
skills during a session. Triggers defined in the system prompt include:
- "5+ tool calls used" in a pattern worth codifying
- "User corrected approach" — the user overrode the agent's method
- "Non-obvious errors encountered" — the agent discovered a gotcha worth documenting
Skills are cached at startup: warm_enabled_skills_cache(timeout=5.0) pre-loads the enabled skills in a daemon background thread so first access is fast.
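In sketch form (only the function name and timeout come from the text above; the body and cache are assumptions):

```python
import threading

_skills_cache: dict[str, dict] = {}

def warm_enabled_skills_cache(timeout: float = 5.0) -> None:
    """Hypothetical warm-up: parse SKILL.md metadata for enabled skills
    into the in-process cache before the first request needs it."""
    ...

# Daemon thread: warms the cache in the background without ever
# blocking startup or process shutdown.
threading.Thread(target=warm_enabled_skills_cache,
                 kwargs={"timeout": 5.0}, daemon=True).start()
```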
LLM-driven memory system
DeerFlow's memory is not a vector database — it's an LLM extraction pipeline.
The MemoryMiddleware uses an LLM to extract facts, preferences,
corrections, and reinforcement signals from conversations:
- Per-agent namespaces — memory is scoped to specific agent types
- Correction detection — when the user corrects the agent, that signal is extracted as a high-confidence fact
- Reinforcement detection — when the user praises an approach, that's stored as positive reinforcement
- Upload-event scrubbing — sensitive uploaded data is automatically scrubbed from memory
- Confidence thresholds — facts below a configurable confidence are not stored
- Debouncing — memory updates are debounced (configurable debounce_seconds) to avoid excessive LLM calls, as sketched below
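The debouncing might look roughly like this (DebouncedMemoryWriter and its methods are hypothetical; only the debounce_seconds knob comes from the config):

```python
import asyncio

class DebouncedMemoryWriter:
    """Coalesce bursts of memory-extraction triggers into one LLM call."""

    def __init__(self, extract, debounce_seconds: float = 30.0):
        self._extract = extract            # async fn: runs the LLM extraction
        self._debounce = debounce_seconds
        self._pending: asyncio.Task | None = None

    def schedule(self, thread_id: str) -> None:
        # Must be called from a running event loop. Restart the timer on
        # every new trigger; only the last one actually fires.
        if self._pending and not self._pending.done():
            self._pending.cancel()
        self._pending = asyncio.create_task(self._run(thread_id))

    async def _run(self, thread_id: str) -> None:
        try:
            await asyncio.sleep(self._debounce)
            await self._extract(thread_id)
        except asyncio.CancelledError:
            pass  # superseded by a newer trigger
```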
Guardrail system
DeerFlow has a pluggable guardrail system for pre-execution tool call
validation. The GuardrailMiddleware uses a
GuardrailProvider interface with evaluate() and
aevaluate() methods:
Fail-closed by default
If the guardrail provider raises an exception, the middleware blocks the tool call by default (fail_closed: true). This can be configured to allow the call through with a warning instead. This is a security decision: when in doubt, deny.
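In outline, with the caveat that the boolean return type is an assumption (the real evaluate() may return a richer verdict object):

```python
from typing import Protocol

class GuardrailProvider(Protocol):
    def evaluate(self, tool_name: str, tool_args: dict) -> bool: ...
    async def aevaluate(self, tool_name: str, tool_args: dict) -> bool: ...

async def gate_tool_call(provider: GuardrailProvider, tool_name: str,
                         tool_args: dict, fail_closed: bool = True) -> bool:
    """Return True if the tool call may proceed."""
    try:
        return await provider.aevaluate(tool_name, tool_args)
    except Exception:
        # Provider failure: fail-closed denies the call ("when in
        # doubt, deny"); fail-open would allow it with a warning.
        return not fail_closed
```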
Sandbox architecture
DeerFlow supports two sandbox providers:
LocalSandboxProvider
Direct host execution. Not a secure isolation boundary — suitable for trusted environments. Host bash is disabled by default when using LocalSandboxProvider. Uses a singleton pattern.
AioSandboxProvider
Container-based sandbox supporting Docker, Apple Container, and
Kubernetes backends. Uses deterministic sandbox IDs (SHA-256 of
thread_id) and file locking (fcntl on Unix,
msvcrt.locking on Windows) for cross-process
coordination.
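A sketch of the deterministic-ID-plus-lock idea on Unix (the lock file path and the 16-character truncation are illustrative choices, not DeerFlow's):

```python
import fcntl      # Unix advisory locks; Windows would use msvcrt.locking
import hashlib

def sandbox_id(thread_id: str) -> str:
    """Deterministic ID: the same thread always maps to the same sandbox."""
    return hashlib.sha256(thread_id.encode()).hexdigest()

def acquire_sandbox_lock(thread_id: str):
    """Cross-process coordination: block until no other process holds
    the lock file for this sandbox; closing the file releases it."""
    lock = open(f"/tmp/sandbox-{sandbox_id(thread_id)[:16]}.lock", "w")
    fcntl.flock(lock, fcntl.LOCK_EX)
    return lock
```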
Messaging channel integration
DeerFlow includes built-in support for messaging platforms in
backend/app/channels/:
| Platform | Features |
|---|---|
| WeChat | 52KB integration with iLink long-polling, AES-128-ECB encryption, QR code bootstrap, media uploads |
| Discord | Full Discord bot integration |
| Slack | Per-user session settings, slash command dispatch |
| Telegram | Per-user session settings, custom agent routing |
| Feishu/Lark | WebSocket + Webhook, interactive card events |
| WeCom | Enterprise WeChat integration |
New in this update: WeChat integration (52KB, 1371 lines) with support for TEXT, IMAGE, VOICE, FILE, and VIDEO message types, AES encryption, and QR code login. A Discord channel was also added.
Deferred tool loading (tool_search)
When tool_search.enabled: true, MCP tools are not bound
directly to the agent. Instead, they are registered in a
DeferredToolRegistry and exposed via a tool_search
tool that the agent can discover at runtime.
This is a smart design for environments with many MCP servers: rather than polluting the agent's context with hundreds of tool descriptions, the agent can search for tools on demand. Only tools it actually needs get loaded.
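A sketch of such a registry (class and method names are assumptions; the real implementation likely ranks matches rather than substring-matching):

```python
class DeferredToolRegistry:
    """Hold MCP tool metadata without binding the tools to the agent."""

    def __init__(self) -> None:
        self._tools: dict[str, dict] = {}   # name -> {description, loader}

    def register(self, name: str, description: str, loader) -> None:
        self._tools[name] = {"description": description, "loader": loader}

    def search(self, query: str, limit: int = 5) -> list[str]:
        """Naive substring match over names and descriptions."""
        q = query.lower()
        hits = [n for n, t in self._tools.items()
                if q in n.lower() or q in t["description"].lower()]
        return hits[:limit]

    def load(self, name: str):
        """Materialize a tool only when the agent actually asks for it."""
        return self._tools[name]["loader"]()
```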
ACP integration
DeerFlow has an invoke_acp_agent tool that calls external
ACP-compatible agents. It expects ACP adapters (e.g.,
@zed-industries/claude-agent-acp,
@zed-industries/codex-acp), not raw CLI binaries. This means
DeerFlow can delegate work to Claude Code or Codex through the ACP protocol
as a first-class tool call.
Configuration system
DeerFlow uses a versioned YAML config (config_version: 5) with:
- Environment variable interpolation — $OPENAI_API_KEY in config values (see the sketch after this list)
- Dynamic class resolution — use: package.module:ClassName loads any LangChain-compatible class
- Config upgrade script — make config-upgrade migrates older configs
- Pydantic validation — all config is validated at load time
- Per-agent configs — custom config.yaml files under agents/ with custom model, soul (prompt), skills, and tool groups
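The $VAR interpolation can be a small recursive pass over the loaded YAML, roughly like this (a sketch, not DeerFlow's loader):

```python
import os
import re

_ENV_VAR = re.compile(r"\$([A-Z_][A-Z0-9_]*)")

def interpolate(value):
    """Expand $VARS in string values, recursing into dicts and lists;
    unknown variables are left untouched."""
    if isinstance(value, str):
        return _ENV_VAR.sub(
            lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: interpolate(v) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v) for v in value]
    return value

# interpolate({"api_key": "$OPENAI_API_KEY"}) -> {"api_key": "<env value>"}
```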
LLM error handling middleware
The LLMErrorHandlingMiddleware classifies errors into:
- Retriable: busy/transient errors (408, 429, 500) — retried up to 3 times with exponential backoff and Retry-After header respect
- Non-retriable: quota exceeded, auth errors — returns a user-friendly AIMessage instead of crashing
It emits llm_retry stream events so the frontend can show
retry progress to the user.
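A sketch of that classify-and-retry loop (LLMError is a hypothetical stand-in for the provider exceptions the middleware actually classifies):

```python
import asyncio
import random

RETRIABLE_STATUSES = {408, 429, 500}

class LLMError(Exception):
    """Hypothetical provider error carrying an HTTP status and an
    optional Retry-After hint."""
    def __init__(self, status: int, retry_after: float | None = None):
        super().__init__(f"LLM call failed with status {status}")
        self.status = status
        self.retry_after = retry_after

async def call_with_retries(call, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except LLMError as err:
            if err.status not in RETRIABLE_STATUSES or attempt == max_retries:
                raise  # non-retriable (quota, auth) or budget exhausted
            # Honor Retry-After if present, else exponential backoff
            # with jitter.
            await asyncio.sleep(err.retry_after
                                or 2 ** attempt + random.random())
```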
Citation system removed
In a significant simplification, DeerFlow removed its citation system entirely in this update:
- Deleted the SafeCitationContent component
- Deleted inline-citation.tsx (289 lines)
- Removed citation core utilities
- Replaced with a simple MarkdownContent renderer
This suggests the complexity of citation handling wasn't worth it for their use case — a notable example of an agent choosing simplicity over feature completeness.
Where DeerFlow is weaker
Less of a single polished CLI identity
Unlike Claude Code or Crush, DeerFlow is not designed to be a standalone terminal experience. It's a harness — powerful but less opinionated. You need to configure it to get value.
Local sandbox is not secure isolation
The default LocalSandboxProvider runs commands directly on the host. It's convenient but not a security boundary. The AIO sandbox requires Docker or container infrastructure.
Bottom line
DeerFlow is the most framework-shaped agent in this set. If Claude Code is a product, DeerFlow is a platform. Its 14-layer middleware stack, sub-agent orchestration with parallel execution, LLM-driven memory, skill self-evolution, and SSE stream bridge make it the most extensible runtime here.
The tradeoff is that it's less immediately usable as a CLI tool — you need to configure models, skills, and sandboxes to get it working. But if you want to build an agent system rather than use one, DeerFlow is the most interesting starting point.