AI Coding Guides Deep Dives
ByteDance • LangGraph • Super Agent Harness • Updated April 2026

DeerFlow 2.0: The Framework-Shaped Agent

DeerFlow (Deep Exploration and Efficient Research Flow) is the most composable agent in this set — not a single CLI persona, but a LangGraph-based runtime that orchestrates sub-agents, memory, sandboxes, and extensible skills to do almost anything.

(Alright, ad over. Back to the serious technical analysis.)

What makes DeerFlow different

While Claude Code reads like a bespoke product and Crush reads like a polished Go terminal app, DeerFlow reads like a framework for building agent systems. Originally a deep-research tool from ByteDance, it was rewritten from scratch for v2.0 as a general-purpose orchestration runtime with a FastAPI gateway, a LangGraph server, and a Next.js frontend.

🏗️ Two startup modes

Standard mode: a separate FastAPI gateway plus LangGraph server (4 containers).
Gateway mode (experimental): embeds the agent runtime directly inside the gateway process, eliminating the LangGraph server, reducing the deployment to 3 containers, and removing the need for a LangGraph Platform license.

The 14-layer middleware stack

Every agent turn in DeerFlow passes through a fixed-order middleware chain. No other repo in this set has a composable middleware architecture this deep; most handle these concerns inline or not at all.

Order | Middleware | Purpose
0 | ThreadDataMiddleware | Attaches thread-scoped data to each run
1 | UploadsMiddleware | Processes user-uploaded files into the run context
2 | SandboxMiddleware | Acquires/releases the sandbox environment for the turn
3 | DanglingToolCallMiddleware | Patches missing ToolMessages before the model sees history
4 | GuardrailMiddleware | Pre-execution tool call validation (fail-closed by default)
5 | ToolErrorHandlingMiddleware | Converts tool exceptions into ToolMessage error responses
6 | SummarizationMiddleware | Context summarization when token/message thresholds fire
7 | TodoMiddleware | Plan mode todo list management
8 | TitleMiddleware | Auto-generates thread titles from conversation content
9 | MemoryMiddleware | LLM-driven long-term memory extraction and injection
10 | ViewImageMiddleware | Vision model image handling
11 | SubagentLimitMiddleware | Enforces concurrency limits, timeouts, and max turns for sub-agents
12 | LoopDetectionMiddleware | Detects tool call repetition with semantic normalization
13 | ClarificationMiddleware | Always last: asks clarifying questions before final output
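Conceptually, a fixed-order chain like this folds into a single handler, with each middleware deciding what happens before and after the rest of the chain runs. The sketch below is illustrative Python, not DeerFlow's actual middleware interface; the TraceMiddleware name and the state-dict shape are assumptions:

```python
from typing import Callable

Handler = Callable[[dict], dict]

class TraceMiddleware:
    """Toy middleware that records its position in the run order."""
    def __init__(self, name: str):
        self.name = name
    def __call__(self, state: dict, nxt: Handler) -> dict:
        state.setdefault("trace", []).append(self.name)
        return nxt(state)

def build_chain(middlewares: list, terminal: Handler) -> Handler:
    # Fold right-to-left so the middleware at index 0 runs first,
    # matching the order column in the table above.
    handler = terminal
    for mw in reversed(middlewares):
        handler = (lambda m, h: lambda s: m(s, h))(mw, handler)
    return handler

chain = build_chain(
    [TraceMiddleware("ThreadData"),
     TraceMiddleware("Guardrail"),
     TraceMiddleware("Clarification")],
    terminal=lambda s: {**s, "done": True},
)
result = chain({})
# result["trace"] == ["ThreadData", "Guardrail", "Clarification"]
```

The fold makes ordering explicit: whatever sits at index 0 sees the turn first and last, which is why ClarificationMiddleware must be pinned to the end of the list.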
💡 Custom middleware insertion with @Next/@Prev anchors

Custom middlewares can declare @Next(OtherMiddleware) or @Prev(OtherMiddleware) class decorators for precise positioning in the chain. The resolution algorithm validates against circular dependencies, handles cross-anchoring between extras, and guarantees that ClarificationMiddleware stays last. This is a sophisticated plugin pattern, rare in agent codebases.
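A minimal sketch of the anchor mechanism follows. The decorators and resolver are hypothetical re-implementations for illustration, and the real algorithm also rejects circular dependencies, which this sketch omits:

```python
def Next(anchor):
    """Position the decorated middleware immediately after `anchor`."""
    def deco(cls):
        cls._anchor = ("next", anchor)
        return cls
    return deco

def Prev(anchor):
    """Position the decorated middleware immediately before `anchor`."""
    def deco(cls):
        cls._anchor = ("prev", anchor)
        return cls
    return deco

class GuardrailMiddleware: ...
class ClarificationMiddleware: ...

@Next(GuardrailMiddleware)
class MetricsMiddleware: ...      # hypothetical custom extra

@Prev(ClarificationMiddleware)
class AuditMiddleware: ...        # hypothetical custom extra

def resolve(base: list, extras: list) -> list:
    chain = list(base)
    for extra in extras:
        mode, anchor = extra._anchor
        i = chain.index(anchor)
        chain.insert(i + 1 if mode == "next" else i, extra)
    # Invariant from the text: ClarificationMiddleware is always last.
    chain.remove(ClarificationMiddleware)
    chain.append(ClarificationMiddleware)
    return chain

order = resolve([GuardrailMiddleware, ClarificationMiddleware],
                [AuditMiddleware, MetricsMiddleware])
# order: Guardrail, Metrics, Audit, Clarification
```

Forcing the clarification step back to the tail after every insertion is what makes the "always last" guarantee hold no matter what anchors the extras declare.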

Loop detection with 200-line bucketing

After Crush's SHA-256 approach, DeerFlow's LoopDetectionMiddleware is the most sophisticated loop detector in this set. It hashes tool name + input + output, but with one critical special case:

The read_file false positive problem

When an agent reads a file with pagination (lines 0-200, then 200-400, and so on), naive hashing sees the same tool name and flags a loop. DeerFlow solves this by bucketing line numbers into 200-line groups before hashing: reading lines 0-200 and then 200-400 produces different hashes because the two reads land in different buckets.

Two-stage response

At 3 repeats: injects a HumanMessage warning: "you are repeating yourself — wrap up." At 5 repeats: strips tool_calls entirely from the response, forcing a plain-text answer and definitively ending the loop.

For write_file and str_replace, the full arguments are hashed to avoid false positives from legitimate repeated edits. This is far more nuanced than most agents' "stop after N identical calls" approach.
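The bucketing scheme can be sketched as follows. The function name and the argument keys (path, start, end) are assumptions for illustration, not DeerFlow's actual API:

```python
import hashlib

BUCKET = 200  # line-range bucket size described above

def loop_signature(tool: str, args: dict) -> str:
    """Hash a tool call for loop detection (illustrative sketch).

    For read_file, start/end line numbers are bucketed into 200-line
    groups so paginated reads of different ranges don't collide."""
    if tool == "read_file":
        key = (tool, args.get("path"),
               args.get("start", 0) // BUCKET,
               args.get("end", 0) // BUCKET)
    else:
        # write_file / str_replace: hash the full arguments instead,
        # so legitimate repeated edits aren't flagged.
        key = (tool, tuple(sorted(args.items())))
    return hashlib.sha256(repr(key).encode()).hexdigest()

a = loop_signature("read_file", {"path": "x.py", "start": 0, "end": 200})
b = loop_signature("read_file", {"path": "x.py", "start": 200, "end": 400})
assert a != b   # different buckets, so pagination is not a loop
```

Re-reading the exact same range still produces the same signature, so genuine repetition is still counted toward the 3-repeat and 5-repeat thresholds.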

Sub-agent orchestration with parallel execution

The lead agent can spawn sub-agents via the task_tool. The SubagentLimitMiddleware enforces hard limits:

Parameter | Default | Notes
max_concurrent | 3 | Parallel sub-agent cap
timeout_seconds | 900 | 15-minute timeout per sub-agent
max_turns | configurable | Turn limit per sub-agent run

Sub-agents run in background with cooperative cancellation via threading.Event checked at astream() iteration boundaries. Deferred cleanup uses asyncio.create_task() to avoid race conditions. The parent sees only the delegation call and the child's summary result — never the intermediate tool calls.
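Cooperative cancellation at iteration boundaries can be sketched as below; fake_astream stands in for a sub-agent's astream() event stream, and the overall shape (checking a threading.Event between chunks, never mid-chunk) follows the text:

```python
import asyncio
import threading

async def run_subagent(stream, cancel: threading.Event) -> list:
    """Consume a sub-agent stream, honoring a cancel flag cooperatively."""
    chunks = []
    async for chunk in stream:
        if cancel.is_set():   # checked only between chunks,
            break             # so no event is half-processed
        chunks.append(chunk)
    return chunks

async def fake_astream(cancel: threading.Event):
    """Stand-in for agent.astream(); sets cancel mid-stream."""
    for i in range(10):
        yield i
        if i == 2:
            cancel.set()      # simulate an external cancel request

cancel = threading.Event()
out = asyncio.run(run_subagent(fake_astream(cancel), cancel))
# out == [0, 1, 2]; items 3..9 are never consumed
```

Using a threading.Event (rather than task cancellation) means the flag can be set from any thread, while the async loop decides the safe point to stop.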

Real-time streaming of sub-agent messages is supported via the StreamBridge abstraction, which decouples agent workers (producers) from SSE endpoints (consumers). Currently uses MemoryStreamBridge (in-memory queue); Redis is planned for Phase 2 for horizontal scaling.

Model support: 6 custom providers + LangChain compatibility

DeerFlow uses a config-driven model factory (models/factory.py) with a use field like langchain_openai:ChatOpenAI or custom provider classes:

Claude provider

ClaudeChatModel loads OAuth tokens from ~/.claude/.credentials.json or env vars. Supports prompt caching, auto thinking budget, and retry logic. Uses the same billing as Claude Code CLI.

Codex provider

CodexChatModel calls the ChatGPT Codex Responses API (chatgpt.com/backend-api/codex/responses) with SSE streaming. Auto-loads ~/.codex/auth.json. Same endpoint as Codex CLI.

vLLM provider

VllmChatModel supports vLLM 0.19.0 with Qwen-style reasoning toggle via extra_body.chat_template_kwargs.enable_thinking. For self-hosted open-source models.

OpenAI-compatible

PatchedChatOpenAI handles OpenAI-compatible gateways (OpenRouter, Novita AI, etc.) with tool-call thought_signature preservation for Gemini compatibility.

DeepSeek provider

PatchedChatDeepSeek adds thinking mode support for DeepSeek V3/V3.2/Reasoner models.

MiniMax provider

PatchedMiniMax for MiniMax M2.5/M2.7 models — a Chinese model provider not commonly seen in Western agent stacks.
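A config-driven `use` field like langchain_openai:ChatOpenAI can be resolved with a small import-and-instantiate factory. This is a sketch of the idea, not models/factory.py itself (the real factory adds provider-specific patching); a stdlib class keeps the example self-contained:

```python
import importlib

def build_model(use: str, **kwargs):
    """Resolve a "module:Class" spec and instantiate it (sketch)."""
    module_path, _, class_name = use.partition(":")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**kwargs)

# In DeerFlow this would be e.g. build_model("langchain_openai:ChatOpenAI", ...);
# here we resolve a standard-library class instead:
model = build_model("collections:OrderedDict")
# type(model).__name__ == "OrderedDict"
```

The payoff of this pattern is that adding a provider is a config change plus a class, with no registry edits in the factory itself.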

Recommended models from the README: Doubao-Seed-2.0-Code, DeepSeek V3.2, and Kimi 2.5.

SSE streaming and stream bridge

DeerFlow's SSE streaming is decoupled from the agent runtime via an abstract StreamBridge protocol:

1. Agent worker produces events: tool calls, thoughts, and text chunks go to StreamBridge.enqueue().
2. StreamBridge buffers: currently an in-memory queue, with a HEARTBEAT_SENTINEL emitted every 15s.
3. SSE endpoint consumes: a FastAPI SSE route reads from the bridge and formats events as Server-Sent Events.
4. END_SENTINEL terminates: clean stream termination with proper event signaling.

This decoupling is architecturally significant: the agent runtime doesn't know about HTTP. It can run embedded, in a CLI, or on a separate server. The stream bridge is the only coupling, and it's pluggable — Redis support is planned for Phase 2 horizontal scaling.
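The producer/consumer split can be sketched with a Protocol and a queue-backed implementation. The sentinel and class names follow the text; the method signatures are assumptions:

```python
import queue
from typing import Iterator, Protocol

HEARTBEAT_SENTINEL = {"event": "heartbeat"}
END_SENTINEL = {"event": "end"}

class StreamBridge(Protocol):
    def enqueue(self, event: dict) -> None: ...
    def consume(self) -> Iterator[dict]: ...

class MemoryStreamBridge:
    """In-memory bridge (sketch): the agent worker produces, the SSE
    route consumes. A Redis-backed class could implement the same
    protocol for horizontal scaling."""
    def __init__(self):
        self._q: queue.Queue = queue.Queue()

    def enqueue(self, event: dict) -> None:
        self._q.put(event)

    def consume(self) -> Iterator[dict]:
        while True:
            event = self._q.get()
            yield event
            if event is END_SENTINEL:
                return   # clean termination on the end sentinel

bridge = MemoryStreamBridge()
bridge.enqueue({"event": "text", "data": "hello"})
bridge.enqueue(END_SENTINEL)
events = list(bridge.consume())
# events: the text chunk followed by END_SENTINEL
```

Because the worker only sees enqueue() and the HTTP layer only sees consume(), neither side needs to know whether the other is in-process, in a CLI, or behind Redis.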

Smoke-Test Skill

New in this update: a comprehensive smoke-test skill for end-to-end testing, located in .agent/skills/smoke-test/.

Skill system and self-evolution

DeerFlow has a structured skills system with SKILL.md files in skills/public/ (built-in) and skills/custom/ (user-created). Skills support progressive loading, validation, and atomic writes with JSONL history tracking.

The most unusual feature is skill self-evolution: when skill_evolution.enabled: true, the agent can create or improve skills during a session, with the trigger conditions defined in the system prompt.

Skills are cached at startup: a warm-up daemon thread (warm_enabled_skills_cache(timeout=5.0)) pre-loads the enabled-skills cache in the background for fast access.

LLM-driven memory system

DeerFlow's memory is not a vector database; it's an LLM extraction pipeline. The MemoryMiddleware uses an LLM to extract facts, preferences, corrections, and reinforcement signals from conversations.

Guardrail system

DeerFlow has a pluggable guardrail system for pre-execution tool call validation. The GuardrailMiddleware uses a GuardrailProvider interface with evaluate() and aevaluate() methods.

🔒 Fail-closed by default

If the guardrail provider raises an exception, the middleware blocks the tool call by default (fail_closed: true). This can be configured to allow through with a warning instead. This is a security decision: when in doubt, deny.
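The fail-closed logic reduces to a small wrapper around the provider call. This is a sketch under assumed signatures (evaluate() returning a boolean), not DeerFlow's actual interface:

```python
def check_tool_call(provider, tool_call: dict, fail_closed: bool = True) -> bool:
    """Return True if the tool call may execute (sketch)."""
    try:
        return provider.evaluate(tool_call)
    except Exception:
        if fail_closed:
            return False   # provider failure: when in doubt, deny
        return True        # configured to warn-and-allow instead

class BrokenProvider:
    """Simulates a guardrail backend that is down."""
    def evaluate(self, tool_call):
        raise RuntimeError("guardrail backend unavailable")

blocked = check_tool_call(BrokenProvider(), {"tool": "bash", "args": "rm -rf /"})
# blocked is False: the call is denied by default
```

The security-relevant detail is that the default branch treats provider failure exactly like an explicit denial, so an outage in the guardrail service cannot silently open the gate.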

Sandbox architecture

DeerFlow supports two sandbox providers:

LocalSandboxProvider

Direct host execution. Not a secure isolation boundary — suitable for trusted environments. Host bash is disabled by default when using LocalSandboxProvider. Uses a singleton pattern.

AioSandboxProvider

Container-based sandbox supporting Docker, Apple Container, and Kubernetes backends. Uses deterministic sandbox IDs (SHA-256 of thread_id) and file locking (fcntl on Unix, msvcrt.locking on Windows) for cross-process coordination.
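Deterministic sandbox IDs mean any process can recompute which container belongs to a thread without coordination; only the advisory file lock is needed when two processes race to create it. A sketch of the ID scheme (the prefix and truncation length here are assumptions; only "SHA-256 of thread_id" comes from the text):

```python
import hashlib

def sandbox_id(thread_id: str) -> str:
    """Deterministic sandbox ID: the same thread always maps to the
    same container, with no shared registry required (sketch)."""
    return "sbx-" + hashlib.sha256(thread_id.encode()).hexdigest()[:16]

assert sandbox_id("thread-42") == sandbox_id("thread-42")  # stable
assert sandbox_id("thread-42") != sandbox_id("thread-43")  # distinct
```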

Messaging channel integration

DeerFlow includes built-in support for messaging platforms in backend/app/channels/:

Platform | Features
WeChat | 52KB integration with iLink long-polling, AES-128-ECB encryption, QR code bootstrap, media uploads
Discord | Full Discord bot integration
Slack | Per-user session settings, slash command dispatch
Telegram | Per-user session settings, custom agent routing
Feishu/Lark | WebSocket + Webhook, interactive card events
WeCom | Enterprise WeChat integration

New in this update: the WeChat integration (52KB, 1371 lines) supports TEXT, IMAGE, VOICE, FILE, and VIDEO message types, AES encryption, and QR code login. A Discord channel was also added.

Deferred tool loading (tool_search)

When tool_search.enabled: true, MCP tools are not bound directly to the agent. Instead, they are registered in a DeferredToolRegistry and exposed via a tool_search tool that the agent can discover at runtime.

This is a smart design for environments with many MCP servers: rather than polluting the agent's context with hundreds of tool descriptions, the agent can search for tools on demand. Only tools it actually needs get loaded.
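The register-then-search pattern can be sketched as below. The class and tool names follow the text (DeferredToolRegistry, tool_search); the method signatures and the substring-matching search are assumptions:

```python
class DeferredToolRegistry:
    """Tools register metadata only; definitions load on demand (sketch)."""
    def __init__(self):
        self._tools: dict[str, dict] = {}

    def register(self, name: str, description: str, loader) -> None:
        # Only name + description enter the agent's context via search.
        self._tools[name] = {"description": description, "loader": loader}

    def search(self, query: str) -> list:
        """What the agent's tool_search call might do: match metadata."""
        q = query.lower()
        return [n for n, t in self._tools.items()
                if q in n.lower() or q in t["description"].lower()]

    def load(self, name: str):
        return self._tools[name]["loader"]()   # bind only when needed

registry = DeferredToolRegistry()
registry.register("pg_query", "Run SQL against Postgres", lambda: "pg tool")
registry.register("web_fetch", "Fetch a URL over HTTP", lambda: "fetch tool")

hits = registry.search("sql")
# hits == ["pg_query"]; only that tool's definition is then loaded
```

With hundreds of MCP tools, the agent's context holds one tool_search description instead of hundreds of schemas, and each loader runs only if its tool is actually selected.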

ACP integration

DeerFlow has an invoke_acp_agent tool that calls external ACP-compatible agents. It expects ACP adapters (e.g., @zed-industries/claude-agent-acp, @zed-industries/codex-acp), not raw CLI binaries. This means DeerFlow can delegate work to Claude Code or Codex through the ACP protocol as a first-class tool call.

Configuration system

DeerFlow uses a versioned YAML config (config_version: 5).

LLM error handling middleware

The LLMErrorHandlingMiddleware classifies LLM errors into distinct categories and drives retry behavior accordingly.

It emits llm_retry stream events so the frontend can show retry progress to the user.

Citation System Removed

In a significant simplification, DeerFlow removed its citation system entirely in this update.

This suggests the complexity of citation handling wasn't worth it for their use case — a notable example of an agent choosing simplicity over feature completeness.

Where DeerFlow is weaker

Less of a single polished CLI identity

Unlike Claude Code or Crush, DeerFlow is not designed to be a standalone terminal experience. It's a harness — powerful but less opinionated. You need to configure it to get value.

Local sandbox is not secure isolation

The default LocalSandboxProvider runs commands directly on the host. It's convenient but not a security boundary. The AIO sandbox requires Docker or container infrastructure.

Bottom line

DeerFlow is the most framework-shaped agent in this set. If Claude Code is a product, DeerFlow is a platform. Its 14-layer middleware stack, sub-agent orchestration with parallel execution, LLM-driven memory, skill self-evolution, and SSE stream bridge make it the most extensible runtime here.

The tradeoff is that it's less immediately usable as a CLI tool — you need to configure models, skills, and sandboxes to get it working. But if you want to build an agent system rather than use one, DeerFlow is the most interesting starting point.