AI Coding Guides Deep Dives
Profiles • Deep Dives • Specific Code • Honest Tradeoffs

Seventeen repos, seventeen distinct engineering choices

This is the specific read: what each repo is actually built from, what files do the most interesting work, and where the implementation choices reveal the product's real priorities.


Quick profile matrix

Repo | Primary stack | Feels like | Best at | Main caveat
--- | --- | --- | --- | ---
Pochi | TypeScript (Bun monorepo — Bun required for wa-sqlite WASM) | Six-vendor aggregation platform | Wraps 6 backends (Tabby, Pochi, Gemini CLI, Codex, GitHub Copilot, Qwen Code), apply-diff safety, CustomAgent schema, exponential-backoff retry, Hono+Zod API | Complexity grows because it bridges many ecosystems at once; not a standalone coding agent
Neovate Code | TypeScript (pnpm workspace) | Provider-rich CLI with strong security opinions | Broad provider support, multi-stage context compression, and the most opinionated bash safety in the set | More generic than distinct; identity comes from integrations, not one native runtime
Mux | TypeScript / Electron / web | Workspace-centric multi-runtime agent product | 7 runtime backends, 1Password secret refs, regex tool allow/deny, enforced agent_report protocol, SSH credential forwarding | Large surface area makes it harder to read quickly than tighter CLIs
Crush | Go | Productized terminal app with loop detection | Coherence, readability, SHA-256 loop detection, and binary-embedded prompt templates | Less ecosystem breadth than the most adapter-heavy TypeScript projects
Kimi CLI | Python + own kosong package | Protocol-aware terminal assistant with a multi-provider abstraction layer | ACP bridging, hooks system (pre/post tool use), and provider-aware message conversion | More platform-centered than provider-neutral
Qwen Code | TypeScript | Config-heavy multi-model CLI | Five-layer model config resolution, declarative tools, and MCP lifecycle management | Still inherits some shape from Gemini CLI ancestry
OpenHands | Python + TypeScript | Agent platform runtime | Sandbox and app/server architecture | The most important modern agent core is not fully in this repo snapshot
Claude Code | TypeScript / Bun / React Ink | Bespoke coding-agent runtime with deep security engineering | 10+8 state machine, token budget constants, undercover mode, Zsh attack mitigation, tree-sitter shell parsing, 40+ tools, teams/tasks subsystem | Much less provider-neutral than peers like Mux or Qwen
Open Claude Code | TypeScript / Node.js (ESM) | Clean-room implementation targeting Claude Code v2.1.91 | Async generator loop (13 events), 25 tools, 5 providers (Anthropic/OpenAI/Google/Bedrock/Vertex), 7 hook events, git worktree isolation, file checkpointing, 40 slash commands, automated nightly releases | Independent rebuild via ruDevolution decompilation — functional but adds unique features (multi-agent teams, session teleport)
DeerFlow | Python + LangGraph + FastAPI | Super-agent harness with 13-layer middleware and SSE streaming | 13-middleware stack, composable loop detection, skill self-evolution, IM channel integration, LangGraph thread management | Less of a single polished CLI identity than Claude Code or Crush
Hermes | Python (Nous Research) | Self-improving multi-platform agent | Learning loop (skills), multi-platform (Telegram/Discord/Slack), MoA synthesis, RL training, remote execution | Breadth over focus — not as sharp as a dedicated coding agent
Pi Mono | TypeScript (Node.js 20+, v0.66.1, MIT) | Minimalist extension-first kernel | 23 providers across 10 APIs, tree-structured JSONL v3 sessions, differential TUI, Pi Packages (npm/git bundles), parallel tool execution, precise multi-edit with file mutation queue, 4 run modes | No built-in permissions, no MCP, no sub-agents — all must be added via extensions or packages
Codex | Rust (Cargo workspace, 70+ crates, 3,805 files, Apache-2.0) | OpenAI's production coding agent — deeply integrated product runtime | Platform-specific sandboxing (Seatbelt/bubblewrap/Windows tokens), bidirectional MCP, multi-agent job execution, IDE extensions | OpenAI-centric (Responses API only), Rust-only contribution barrier, no built-in learning loop
Wintermolt | Zig 0.15 (3 MB native binary) | Everything-agent platform with 7 modes and dozens of features | 6 backends, 16 tools, cron scheduling, Tailscale, camera vision, browser automation, MCP bidirectional, 4 chat bridges, macOS menu bar | Enormous feature surface; bash safety is pattern-based (not AST); TypeScript sidecars for non-CLI modes
Zaica | Zig 0.15 (~9,100 lines) | Focused coding specialist with structured workflows | Chain mode (.chain.md), Wyhash loop detection (3-tier), reactive state (zefx), parallel sub-agent dispatch, Cyrillic REPL | Only 5 core tools; no MCP, no RAG, no web UI; terminal-only experience
Goose | Rust (Cargo workspace, Apache-2.0, v1.32.0) | Extension-first agent with LLM-based security | 15+ providers, 5-layer security inspector stack, AdversaryInspector (LLM-based review), GooseMode (Auto/Approve/SmartApprove/Chat), recipe framework, MOIM injection | Extension-first model means core needs extensions for full functionality
Dirac | TypeScript (fork of Cline) | Hash-anchored coding agent with AST precision | Hash-anchored parallel edits, AST-native precision, multi-file batching, 64.8% cost reduction, no MCP, 8-type hook system, git checkpoints, state mutex, 40+ providers | No MCP support; inherits Cline architecture

Deep per-agent profiles

Pochi — A six-vendor aggregator, not a single agent

The most important thing to understand about Pochi: it is not a single coding agent. It is a multi-vendor aggregation platform that wraps six distinct backends: vendor-tabby, vendor-pochi, vendor-gemini-cli, vendor-codex, vendor-github-copilot, and vendor-qwen-code. The runtime requires Bun (not Node.js) specifically because it needs wa-sqlite WASM support for its local database layer.

Pochi's internal API layer uses Hono + Zod validation and exposes POST /api/chat/stream, POST /api/chat, and GET /api/models. Models carry a costType: "basic" | "premium" that maps to user-facing labels "swift" and "super". Despite aggregating six backends, the vendor-pochi package exposes only 2 tools: webFetch and webSearch — everything else comes from the wrapped backends.

The retry strategy is explicit: withRetry() caps at 3 attempts with 1000ms initial delay, 10000ms maximum, and a multiplier of 2 (exponential backoff). Authentication uses better-auth with JWT + device-link plugins; a set-auth-token response header silently renews credentials before they expire.
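
Those parameters imply a delay schedule of 1s, then 2s, between the three attempts. A minimal Python sketch of the idea (the function shape and the injectable sleep are assumptions for illustration, not Pochi's actual TypeScript):

```python
import time

def with_retry(fn, attempts=3, initial_ms=1000, max_ms=10000, multiplier=2,
               sleep=time.sleep):
    """Exponential backoff with the parameters Pochi documents for withRetry():
    3 attempts, 1000 ms initial delay, 10000 ms cap, multiplier 2."""
    delay = initial_ms
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise            # out of attempts: surface the error
            sleep(delay / 1000)  # back off before the next attempt
            delay = min(delay * multiplier, max_ms)
```

Injecting `sleep` makes the schedule itself testable without waiting on the wall clock.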

The apply-diff tool includes an expectedReplacements safety parameter — if you declare two replacements but the pattern matches three times, the edit fails. The CustomAgent shape in new-task.ts lets you define inline sub-agents; the planner agent uniquely retains askFollowupQuestion while others lose it.
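
The expectedReplacements guard is a one-line invariant. A hedged Python sketch of the behavior (Pochi's real tool is TypeScript and uses a richer diff format; this only shows the counting check):

```python
def apply_diff(text, search, replace, expected_replacements=1):
    """Refuse the edit if the pattern matches a different number of times
    than the caller declared, as described for Pochi's apply-diff tool."""
    actual = text.count(search)
    if actual != expected_replacements:
        raise ValueError(
            f"expected {expected_replacements} replacement(s), found {actual}")
    return text.replace(search, replace)
```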

📌

Key files

packages/vendor-pochi/src/vendor.ts (withRetry, better-auth), packages/vendor-pochi/src/pochi-api.ts (Hono + Zod routes), packages/tools/src/apply-diff.ts (expectedReplacements safety), packages/tools/src/new-task.ts (CustomAgent schema), packages/vendor-codex/, packages/vendor-gemini-cli/

Neovate Code — The most opinionated bash tool in the set

Neovate's bash tool (src/tools/bash.ts) contains the most elaborate pre-execution security logic in this repository set. The banned command list is concrete and non-trivial:

alias, aria2c, axel, bash, chrome, curl, curlie, eval,
firefox, fish, http-prompt, httpie, links, lynx, nc,
rm, safari, sh, source, telnet, w3m, wget, xh, zsh

Beyond static bans, it detects command substitution by actually parsing the shell syntax: it tracks single-quote, double-quote, and backslash states to correctly identify $() and backtick substitutions that would survive a naive regex check. The same character state machine is used to split pipeline segments correctly before checking each segment independently.

For high-risk detection it checks patterns like rm -rf, sudo, dd if=, mkfs, and curl | sh, as well as every segment in a pipeline individually — if any segment is high-risk, the whole command is considered high-risk.
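
Both behaviors fall out of one quote-tracking scan. A Python sketch of the idea (Neovate's real implementation is TypeScript and certainly differs in detail):

```python
def scan_command(cmd):
    """Walk the string once, tracking single-quote, double-quote and
    backslash state, to (a) flag $() / backtick command substitution that a
    naive regex would miss, and (b) split pipeline segments on unquoted '|'."""
    in_single = in_double = escaped = False
    has_substitution = False
    segments, current = [], []
    i = 0
    while i < len(cmd):
        ch = cmd[i]
        if escaped:
            escaped = False
        elif ch == "\\" and not in_single:
            escaped = True
        elif ch == "'" and not in_double:
            in_single = not in_single
        elif ch == '"' and not in_single:
            in_double = not in_double
        elif not in_single:
            # $() and backticks still substitute inside double quotes
            if ch == "`" or (ch == "$" and cmd[i:i + 2] == "$("):
                has_substitution = True
            elif ch == "|" and not in_double:
                segments.append("".join(current).strip())
                current = []
                i += 1
                continue
        current.append(ch)
        i += 1
    segments.append("".join(current).strip())
    return has_substitution, segments
```

Each returned segment can then be run through the high-risk pattern checks independently.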

The context compression system (src/compression.ts) is also more configurable than most: it has separate pruning and compaction phases, a triggerRatio that triggers compaction when context usage exceeds a percentage of the model limit, protectedTools that are never pruned, a protectTurns count, and an autoContinue mode that automatically resumes after compaction.
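
A sketch of how such a trigger and pruning pass might look (all names and the 0.8 default are illustrative assumptions; only the triggerRatio, protectedTools, and protectTurns concepts come from the source):

```python
def should_compact(used_tokens, model_limit, trigger_ratio=0.8):
    """Compaction fires once context usage exceeds a configurable
    fraction of the model's context limit."""
    return used_tokens > model_limit * trigger_ratio

def prune_messages(messages, protected_tools=("task",), protect_turns=2):
    """Drop old tool outputs, except protected tools and the most recent
    protect_turns entries. Defaults here are illustrative only."""
    cutoff = len(messages) - protect_turns
    return [
        m for i, m in enumerate(messages)
        if i >= cutoff or m.get("tool") in protected_tools or m.get("role") != "tool"
    ]
```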

📌

Key files

src/tools/bash.ts (banned commands, command substitution detection, pipeline segment analysis), src/compression.ts (compaction + pruning config), src/tools/task.ts (AI sub-task invocation)

Mux — Seven runtimes, 1Password secrets, and regex-gated tool lists

Mux ships seven distinct runtime backends in src/node/runtime/: CoderSSHRuntime, DevcontainerRuntime, DockerRuntime, LocalRuntime, SSHRuntime, WorktreeRuntime, and RemoteRuntime — each with its own .test.ts. That list alone explains why the repo is large: it supports everywhere code can run, not just local.

The infrastructure around those runtimes is equally serious: gitBundleSync.ts syncs git bundles to remote environments, credentialForwarding.ts tunnels SSH credentials into containers, openSshPromptMediation.ts mediates SSH_ASKPASS prompts, and SSH2ConnectionPool.ts manages a connection pool. Config is JSONC and uses secret://op/... references for 1Password integration — secrets never hard-coded.

The 9 built-in agents are defined as Markdown files with YAML frontmatter in src/node/builtinAgents/. Tool lists use regex allow/deny patterns: - .* adds every available tool; - file_edit_.* removes all file-edit tools; - mux_agents_.* blocks config tools from sub-agents. The exec.md agent has a hard protocol rule: "Before your stream ends, you MUST call agent_report exactly once." The orchestrator.md agent has an explicit prohibition: "Do NOT create pull requests, push to remote branches, or run any gh pr / git push commands."
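
Ordered regex allow/deny lists can be sketched as follows (Python for illustration; the '!'-prefix deny syntax is an assumption of this sketch, since the source only says some patterns add tools and others remove them):

```python
import re

def resolve_tools(available, patterns):
    """Apply an ordered list of regex patterns: plain patterns add matching
    tools, '!'-prefixed patterns remove matching tools already selected."""
    selected = []
    for pat in patterns:
        deny = pat.startswith("!")
        rx = re.compile(f"^{pat[1:] if deny else pat}$")
        if deny:
            selected = [t for t in selected if not rx.match(t)]
        else:
            selected += [t for t in available if rx.match(t) and t not in selected]
    return selected
```

Order matters: a broad add like `.*` followed by targeted removals like `file_edit_.*` reproduces the behavior the source describes.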

📌

Key files

src/node/runtime/ (7 runtime backends), src/node/builtinAgents/exec.md (enforced agent_report protocol), src/node/builtinAgents/orchestrator.md (no-push rule), src/node/gitBundleSync.ts, src/node/SSH2ConnectionPool.ts, src/node/config.ts (1Password secret references)

Crush — Go discipline, binary-embedded prompts, SHA-256 loop detection, and LSP integration

Crush uses Go's //go:embed directive to embed its system prompt templates directly into the binary at compile time. The three templates — coder.md.tpl, task.md.tpl, initialize.md.tpl — are Go text/template files that render with runtime data: working directory, git repo status, date, platform, context files, and available skill XML.

The coder prompt template runs to hundreds of lines and is remarkably specific. It contains sections for <critical_rules>, <communication_style>, <workflow>, <decision_making>, <editing_files>, <whitespace_and_exact_matching>, <task_completion>, and <error_handling>. MCP server instructions are injected as a separate <mcp-instructions> block appended to the system prompt at runtime.

The most distinctive engineering decision in this repo is the SHA-256 loop detection in internal/agent/loop_detection.go. For each agent step, a signature is computed by hashing the concatenated tool_name + "\x00" + input + "\x00" + output for every tool call in that step. If any signature appears more than 5 times in the last 10 steps, the agent is considered stuck. This is far more robust than checking tool names alone — the same tool called with different arguments or producing different output gets a different hash.
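
The scheme is easy to reproduce. A Python sketch (Crush's real code is Go; the more-than-5-in-the-last-10 thresholds come from the source, the rest is assumed):

```python
import hashlib
from collections import Counter

def step_signature(tool_calls):
    """Hash one agent step: tool_name, input and output, NUL-separated,
    for every tool call in the step."""
    h = hashlib.sha256()
    for name, tool_input, tool_output in tool_calls:
        h.update(f"{name}\x00{tool_input}\x00{tool_output}".encode())
    return h.hexdigest()

def is_stuck(step_signatures, window=10, threshold=5):
    """Stuck if any signature appears more than `threshold` times in the
    last `window` steps."""
    counts = Counter(step_signatures[-window:])
    return any(c > threshold for c in counts.values())
```

Because the output is part of the hash, a retry that produces new information gets a new signature and does not count toward the limit.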

Crush also includes a dedicated internal/agent/agentic_fetch_tool.go that uses a smaller model to browse the web on behalf of the main agent — a mini-agent within the agent for cost-effective information retrieval.

Two more tools stand out as highly unusual in a coding CLI:

  • Sourcegraph search (sourcegraph.go) — a native tool that queries Sourcegraph's code search API. Parameters include query, count, context_window, and timeout. This is the only agent in the set with first-class Sourcegraph integration.
  • Full LSP integration — diagnostics.go exposes an lsp_diagnostics tool that returns project-wide or file-level diagnostics from a live Language Server. references.go exposes a symbol references tool: given a symbol name, it queries the LSP to find all references across the project. These tools make Crush the only agent in this set that can call into an actual Language Server during a session.
📌

Key files

internal/agent/loop_detection.go (SHA-256 tool signature hashing), internal/agent/templates/coder.md.tpl (full system prompt), internal/agent/prompts.go (go:embed usage), internal/agent/tools/sourcegraph.go (Sourcegraph search), internal/agent/tools/diagnostics.go (LSP diagnostics), internal/agent/tools/references.go (LSP symbol references), internal/agent/agentic_fetch_tool.go (mini-agent for web browsing)

Crush — Recent: PreToolUse hooks, Hyper provider, Azure support

Major recent additions to Crush:

PreToolUse hook system (dc003bf7, ~1,196 lines across 4 new files): user-configured shell commands fire on tool events. Hooks receive structured JSON on stdin plus environment variables (CRUSH_EVENT, CRUSH_TOOL_NAME, CRUSH_CWD, etc.) and return decisions: allow, deny, or halt (stops the whole turn), plus optional context or updated_input JSON patches. Hooks run in parallel via goroutines with configurable timeouts. The output format supports both Crush's own format and Claude Code's hook format for compatibility. Exit code 2 blocks; exit code 49 halts the turn.

Hyper provider: Crush ships with a new hyper provider from charm.land (internal/agent/hyper/provider.go, provider.json). Enable via HYPER_API_KEY or HYPER_URL env vars. Provider.json is embedded at compile time via //go:embed. Models include GLM-5, GLM-5.1, GPT-OSS, Kimi K2.5, Kimi K2.6 with per-model pricing, context windows, and reasoning levels. Separately, a new 980-line quickstyle.go provides theme support alongside the themes.go system.

Azure provider support added. DeepSeek V4 reasoning content support fixed and maintained. Bedrock adaptive thinking improvements. The UI received significant style refactoring with semantic color names and improved theme switching.

📌

New key files

internal/hooks/hooks.go (Runner, hook execution), internal/hooks/input.go (payload/env builder), internal/hooks/runner.go (parallel execution, exit code semantics), internal/agent/hyper/ (Hyper provider + provider.json), internal/ui/styles/quickstyle.go (new 980-line style system)

Kimi CLI — Provider-native message conversion and a hooks architecture

Kimi CLI is built around its own kosong abstraction package which handles provider-specific message conversion at the low level. The package contains dedicated converters for Anthropic (with tool_use/tool_use_id), Google GenAI (with function_call parts and thought_signature for thinking tokens), and OpenAI Responses API (with function_call/function_call_output items).

One concrete example of the care here: the Google GenAI converter handles the fact that Gemini rejects an id field in function_call or function_response parts — there are API snapshot tests specifically for this case (test_google_genai_no_id_in_function_call_or_response).

The hooks system (src/kimi_cli/hooks/events.py) is another distinctive feature: three events are defined for every tool call — pre_tool_use, post_tool_use, and post_tool_use_failure. This lets external code intercept tool calls before they run, observe results after they run, and handle failures separately. The integration and E2E test suites confirm this system is well-tested.

The ACP bridge layer converts internal tool results into protocol-transportable content. The docs honestly list current gaps — missing session/set_mode and session/set_model — rather than implying perfect coverage.

📌

Key files

packages/kosong/src/kosong/contrib/chat_provider/anthropic.py, packages/kosong/src/kosong/contrib/chat_provider/google_genai.py, src/kimi_cli/hooks/events.py, packages/kosong/tests/api_snapshot_tests/

Qwen Code — Five-layer model config resolution

Qwen Code's model configuration system is the most rigorous in this set. The ModelConfigResolver in packages/core/src/models/modelConfigResolver.ts defines five named source layers with explicit precedence:

  1. modelProvider — explicit selection from ModelProviders config (highest priority)
  2. CLI arguments — --model, --openaiApiKey, etc.
  3. Environment variables — OPENAI_API_KEY, OPENAI_MODEL
  4. Settings — user/workspace settings file
  5. Defaults — built-in default values (lowest priority)

Each layer is a typed ConfigLayer object. The resolver uses named source types (cliSource, settingsSource, modelProvidersSource, envLayer, defaultSource, computedSource) so you can always trace which layer a resolved value came from. Qwen Code also has a special QWEN_OAUTH_ALLOWED_MODELS list that gates the OAuth auth path to specific models. The system prompt can be overridden by setting QWEN_SYSTEM_MD to a file path (e.g. .qwen/system.md), set to 0/false to disable, or to 1/true for the default path.
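
The layered lookup itself is simple; the value is in always knowing which source won. A Python sketch of the idea (layer names from the source, everything else assumed):

```python
def resolve(key, layers):
    """Walk layers from highest priority to lowest and return the first
    value found, together with the source layer that supplied it."""
    for source, values in layers:
        if key in values:
            return values[key], source
    return None, None
```

For example, a model set on the CLI wins over one set in the environment, and the resolver can report "cli" as the source.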

The turn.ts event model defines 16 named event types in the GeminiEventType enum, including Content, ToolCallRequest, ToolCallResponse, ToolCallConfirmation, UserCancelled, Error, ChatCompressed, Thought, MaxSessionTurns, SessionTokenLimitExceeded, Finished, LoopDetected, Citation, Retry, and HookSystemMessage. The LoopDetected and ChatCompressed events are first-class system conditions, not error states.

The truncation recovery is equally detailed: two constants — TRUNCATION_PARAM_GUIDANCE and TRUNCATION_EDIT_REJECTION — handle the case where the model's output is cut off mid-tool-call. The scheduler imports diff and fast-levenshtein to verify proposed file edits aren't corrupted by truncation.

📌

Key files

packages/core/src/models/modelConfigResolver.ts (5-layer resolution), packages/core/src/core/turn.ts (16 GeminiEventType values), packages/core/src/core/coreToolScheduler.ts (truncation recovery), packages/core/src/models/modelRegistry.ts, packages/core/src/mcp/

OpenHands — Platform architecture with an ingenious retry strategy

OpenHands is the hardest repo to score fairly because the local snapshot is explicitly described as incomplete. The modern V1 agent core moved to a separate Software Agent SDK repository. But what remains is still architecturally interesting.

The retry logic (openhands/llm/retry_mixin.py) uses the tenacity library with a documented, intentional quirk: on LLMNoResponseError when temperature is 0, it automatically bumps temperature to 1.0 on the next attempt. The rationale: a deterministic model that returns nothing is stuck in a degenerate fixed point. Adding randomness breaks the loop. This is one of the more thoughtful retry strategies in the set.
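
The quirk is easy to express without tenacity. A hedged Python sketch (OpenHands' actual code wires this through tenacity callbacks; this shows only the idea):

```python
class LLMNoResponseError(Exception):
    """Raised when the model returns an empty response."""

def completion_with_retry(call, params, attempts=3):
    """Retry on empty responses; if the failed request used temperature 0,
    bump temperature to 1.0 before retrying to knock a deterministic model
    out of a degenerate fixed point."""
    for attempt in range(attempts):
        try:
            return call(**params)
        except LLMNoResponseError:
            if attempt == attempts - 1:
                raise
            if params.get("temperature", 0) == 0:
                params = {**params, "temperature": 1.0}
```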

fn_call_converter.py — marked LEGACY V0, removal April 1, 2026 — converts between JSON function-calling and XML for models that don't support native tool calls. The XML format uses <function=name><parameter=key>value</parameter></function> and uses </function as a stream-stop word. The refine_prompt() function replaces 'bash' with 'powershell' automatically on Windows.

Most striking: OpenHands defines a CondensationRequestTool — the agent itself can request history condensation, not just the runtime. All tool calls carry a security_risk attribute validated against a RISK_LEVELS dict.

The system prompt uses 9 Jinja2 .j2 templates with named XML sections: <ROLE>, <EFFICIENCY>, <SECURITY>, <SECURITY_RISK_ASSESSMENT> (a separate included template), <PULL_REQUESTS>, <PROBLEM_SOLVING_WORKFLOW>, and more. The long-horizon variant adds <TASK_MANAGEMENT> and <TASK_TRACKING_PERSISTENCE> sections.

Claude Code — Undercover mode, a 10-state machine, and 15-file shell engineering

Claude Code's BashTool is not one file — it is a directory with 15 specialized modules. This is the most specific signal of how Claude Code treats coding-agent behavior as its own software domain:

  • BashTool.tsx — main tool definition
  • bashSecurity.ts — Zsh-specific attack detection
  • bashPermissions.ts — permission gate logic
  • commandSemantics.ts — semantic classification of commands
  • destructiveCommandWarning.ts — explicit user warnings
  • sedEditParser.ts + sedValidation.ts — sed-style inline edits
  • modeValidation.ts — mode checks per command
  • pathValidation.ts — path safety checks
  • readOnlyValidation.ts — read-only mode enforcement
  • shouldUseSandbox.ts — sandbox routing decision
  • bashCommandHelpers.ts, commentLabel.ts, utils.ts

The bashSecurity.ts file alone covers Zsh-specific attack vectors that no other agent in this set defends against explicitly. ZSH_DANGEROUS_COMMANDS is a Set for O(1) lookup. Blocks include: zmodload, emulate, sysopen/sysread/syswrite, zpty, zsocket, all zf_* filesystem primitives, process substitution <(/>(/=(, heredoc-in-substitution $\(.*<<, and even PowerShell comment syntax <# as "defense in depth against future changes." Tree-sitter parses the shell AST to detect these reliably.

The query engine runs as a named state machine (src/query/transitions.ts) with 10 terminal exit reasons and 8 continue reasons. Terminal: 'completed', 'blocking_limit', 'image_error', 'model_error', 'aborted_streaming', 'aborted_tools', 'prompt_too_long', 'stop_hook_prevented', 'hook_stopped', 'max_turns'. Continue: 'tool_use', 'reactive_compact_retry', 'max_output_tokens_recovery', 'max_output_tokens_escalate', 'collapse_drain_retry', 'stop_hook_blocking', 'token_budget_continuation', 'queued_command'.

Claude Code contains an "undercover mode" (isUndercover() in src/tools/BashTool/prompt.ts) that activates when process.env.USER_TYPE === 'ant'. Purpose: prevent the model from volunteering internal Anthropic codenames in commit messages. The code comments note: "Defense-in-depth: undercover instructions must survive even if the user has disabled git instructions entirely." The feature is compiled in via Bun's import { feature } from 'bun:bundle'.

Token budget tracking (src/query/tokenBudget.ts) uses COMPLETION_THRESHOLD = 0.9 to nudge at 90% budget and DIMINISHING_THRESHOLD = 500 — if the per-check token delta drops below 500 for three consecutive checks, the agent is considered done. Sub-agents skip budget tracking entirely when agentId is present.
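
A sketch of how those two constants could interact (Python; only COMPLETION_THRESHOLD = 0.9, DIMINISHING_THRESHOLD = 500, and the three-consecutive-checks rule come from the source, the class shape is assumed):

```python
class TokenBudget:
    """Nudge the agent at 90% of budget; treat it as done once the
    per-check token delta stays below 500 for three consecutive checks."""
    COMPLETION_THRESHOLD = 0.9
    DIMINISHING_THRESHOLD = 500

    def __init__(self, budget):
        self.budget = budget
        self.last_total = 0
        self.small_deltas = 0

    def check(self, total_tokens):
        delta = total_tokens - self.last_total
        self.last_total = total_tokens
        self.small_deltas = self.small_deltas + 1 if delta < self.DIMINISHING_THRESHOLD else 0
        if self.small_deltas >= 3:
            return "done"
        if total_tokens >= self.budget * self.COMPLETION_THRESHOLD:
            return "nudge"
        return "continue"
```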

📌

Key files

src/tools/BashTool/bashSecurity.ts (Zsh attack catalog), src/tools/BashTool/prompt.ts (undercover mode), src/query/transitions.ts (10+8 state machine), src/query/tokenBudget.ts (COMPLETION_THRESHOLD=0.9), src/tools/TeamCreateTool/, src/QueryEngine.ts

Open Claude Code — Async generator clone with nightly verification

Open Claude Code 2.0 is a clean-room rebuild of Claude Code v2.1.91 via "ruDevolution" — AI-powered decompilation of the published npm package. The archive/ contains the decompiled 7.3MB CLI; v2/ is the clean-room reimplementation: 61 files, 8,314 lines, 1,581 tests.

Its async generator agent loop (v2/src/core/agent-loop.mjs) yields 13 event types and recursively calls itself after tool execution (yield* run(null, { continuation: true })). The loop handles streaming, token tracking, auto-compaction at 80% threshold, and 7 hook events: PreToolUse, PostToolUse, PreToolUseFailure, PostToolUseFailure, Notification, Stop, SessionStart. Exit code 2 blocks; exit code 49 halts the turn.
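
The recursive async-generator shape is worth seeing concretely. A Python analogue (Open Claude Code's real loop is JavaScript with 13 event types; the event names below are illustrative):

```python
import asyncio  # used by callers to drive the generator

async def agent_loop(messages, model, tools, continuation=False):
    """Yield typed events as they happen; after executing tool calls,
    recurse into the same generator to continue the turn."""
    if not continuation:
        yield {"type": "turn_start"}
    reply = await model(messages)           # one model round-trip
    yield {"type": "assistant", "content": reply["content"]}
    calls = reply.get("tool_calls", [])
    if not calls:
        yield {"type": "turn_end"}
        return
    for call in calls:
        yield {"type": "tool_use", "name": call["name"]}
        result = await tools[call["name"]](call["args"])
        yield {"type": "tool_result", "content": result}
        messages = messages + [{"role": "tool", "content": result}]
    # recurse, flagged as a continuation of the same turn
    async for event in agent_loop(messages, model, tools, continuation=True):
        yield event
```

The caller consumes events with `async for`, so streaming UI, token tracking, and hooks can all hang off a single event stream.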

5 providers: Anthropic (primary), OpenAI, Google, AWS Bedrock, Google Vertex. Request/response transforms normalize across API shapes. 25 tools with validateInput/call interface. File checkpointing via checkpoints.mjs before dangerous ops. Git worktree isolation via EnterWorktree/ExitWorktree for parallel agent tasks — unique among these agents. Session export/import for "teleport" between machines.

Nightly release pipeline: Automated CI/CD detects new Claude Code npm releases (03:00 UTC), runs 903+ tests, npm audit, and Claude Sonnet 4.6 AI-powered change analysis. Only publishes if ALL gates pass. The rudevolution submodule tracks 34,759+ function declarations with 95.7% naming accuracy.

📌

Key files

v2/src/core/agent-loop.mjs (462 lines, async generator), v2/src/tools/bash.mjs (148 lines, sandboxed shell), v2/src/tools/agent.mjs (127 lines, worktree support), v2/src/core/providers.mjs (5-provider multi-client), v2/src/hooks/engine.mjs (7 hook types), v2/src/ui/commands.mjs (40 slash commands), archive/open_claude_code/cli.mjs (7.3MB decompiled)

DeerFlow — LangGraph harness with 13-layer middleware and SSE streaming

DeerFlow is the only agent in this set built on LangGraph, and it shows. The backend is a FastAPI application (backend/app/gateway/app.py) that initializes a LangGraph runtime on startup: checkpointer, store, StreamBridge, and RunManager all come up as async components in the application lifespan handler.

Every agent turn passes through a 13-layer middleware stack in agents/middlewares/: LoopDetectionMiddleware, TokenUsageMiddleware, MemoryMiddleware, TodoMiddleware, TitleMiddleware, ClarificationMiddleware, SubagentLimitMiddleware, ViewImageMiddleware, SandboxAuditMiddleware, DeferredToolFilterMiddleware, DanglingToolCallMiddleware, ToolErrorHandlingMiddleware, UploadsMiddleware. No other repo in this set has a composable middleware architecture; most handle these concerns inline or not at all.

DeerFlow's loop detection (agents/middlewares/loop_detection_middleware.py) has a noteworthy special case: calls to read_file have their line numbers bucketed into 200-line groups before hashing, to avoid false positives from paginated reads. On warn (3 repeats), it injects a HumanMessage("you are repeating yourself — wrap up"). On hard limit (5 repeats), it strips tool_calls entirely from the response, forcing a plain-text answer.
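
The read_file bucketing can be sketched in a few lines (argument names like start_line are assumptions; only the 200-line bucket idea comes from the source):

```python
import hashlib

def call_signature(tool_name, args, bucket_size=200):
    """Hash a tool call for loop detection, but bucket read_file line
    numbers into 200-line groups first, so paginated reads of the same
    region don't look like distinct calls or trip false positives."""
    args = dict(args)
    if tool_name == "read_file":
        for key in ("start_line", "end_line"):
            if key in args:
                args[key] = args[key] // bucket_size
    payload = tool_name + repr(sorted(args.items()))
    return hashlib.sha256(payload.encode()).hexdigest()
```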

Skill self-evolution is explicit in the system prompt: triggers include "5+ tool calls used," "user corrected approach," and "non-obvious errors encountered." The prompt warns hard: "HARD ERROR. The system WILL discard excess [sub-agent] calls and you WILL lose work." Skills cache pre-loads in a daemon background thread (warm_enabled_skills_cache(timeout=5.0)) at startup.

📌

Key files

backend/app/gateway/app.py (FastAPI + LangGraph init), agents/middlewares/loop_detection_middleware.py (200-line bucket, tool_calls strip), agents/middlewares/ (13-middleware stack), backend/app/gateway/routers/runs.py (SSE streaming), backend/langgraph.json

Hermes — Self-improving agent with multi-platform reach and RL infrastructure

Hermes by Nous Research is unique in this study: it is the only agent with a closed learning loop, multi-platform messaging support, and RL training infrastructure in the same codebase.

The skill system stores reusable procedures as SKILL.md files in ~/.hermes/skills/. Skills are injected as user messages (not system prompt) to preserve the prompt cache. The agent can create, edit, and delete skills via skill_manager_tool.py, and every agent-created skill is security-scanned before saving.

Two memory files persist knowledge across sessions: MEMORY.md (agent notes) and USER.md (user model). Both are scanned for prompt injection before loading. The context compressor (agent/context_compressor.py) uses a five-step algorithm with structured summaries (Goal, Progress, Decisions, Files, Next Steps) and iterative update on repeated compression.

The tools/mixture_of_agents_tool.py implements MoA: four reference models (Claude Opus, Gemini Pro, GPT-5, DeepSeek) run in parallel, and an aggregator model synthesizes the results. This is an optional tool the agent can invoke for hard problems — unique in this set.

The gateway covers 14+ messaging platforms: Telegram, Discord, Slack, WhatsApp, Signal, WeChat/WeCom, Matrix, Mattermost, Feishu/Lark, DingTalk, Email, SMS, HomeAssistant, and a generic Webhook adapter. Each platform module is a full adapter with auth, inbound message handling, allowlists, dedup, and ACK logic.

See the dedicated Hermes page for full coverage.

Which code looks best designed?

1. Claude Code

Best end-to-end product coherence. Every tool is a directory, not a file. Permissions, UX, security, and task management are all native product concerns.

2. Crush

Best structural cleanliness. Binary-embedded templates, SHA-256 loop detection, and the Go type system keep the design honest and maintainable.

3. Qwen Code

A very solid multi-provider CLI core. The five-layer config resolver and clean tool/MCP separation are the strongest generic engineering here.

4. Mux

Strongest large-surface product architecture among the provider-rich repos. Ambitious, broad, and still impressively organized.

5. Hermes

Most functionally unique. The learning loop, multi-platform, and RL infrastructure are unlike anything else here. The tradeoff is focus.

Pi Mono — The 438-file extension-first kernel

Pi Mono (@mariozechner/pi-coding-agent, v0.66.1, MIT, by Mario Zechner) is a TypeScript coding agent that deliberately omits features other agents bake in: no MCP, no sub-agents, no permissions, no plan mode, no todos. Instead it ships a 438-file kernel with 7 core tools, support for 23 providers, tree-structured JSONL v3 sessions, and a differential-rendering TUI.

The 7 tools (read, bash, edit, write, grep, find, ls) use TypeBox JSON schemas with AJV validation. The edit tool supports multiple disjoint edits per call with fuzzy matching (normalizes Unicode quotes, dashes, spaces), uniqueness validation, reverse-order application, and line ending preservation. A file mutation queue serializes concurrent writes to the same file — a subtle race condition that plagues many other agents.
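
The mutation queue amounts to one lock per path. A Python sketch of the pattern (Pi's implementation is TypeScript; this only illustrates the serialization idea):

```python
import asyncio

class FileMutationQueue:
    """Serialize concurrent mutations to the same path behind a per-file
    asyncio.Lock, so read-modify-write edits never interleave."""
    def __init__(self):
        self._locks = {}

    def _lock(self, path):
        if path not in self._locks:
            self._locks[path] = asyncio.Lock()
        return self._locks[path]

    async def mutate(self, path, fn):
        async with self._lock(path):
            return await fn(path)
```

Mutations to different paths still run concurrently; only same-path writers queue up.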

The bash tool uses a pluggable BashOperations interface with streaming output, process tree killing, 10MB stdout/stderr caps, and temp file fallback for overflow. Extensions can intercept via BashSpawnHook.

Sessions are tree-structured JSONL (v3) with id/parentId fields enabling in-place branching. Entry types include message, thinking_level_change, model_change, compaction (with firstKeptEntryId), branch_summary, and label (bookmarks). The /tree command navigates the tree; /fork creates a new session file from a branch point.

The agent runtime supports parallel tool execution (default, configurable to sequential), steering messages (delivered after current turn — real-time interruption), followup messages (delivered after agent stops), and a PendingMessageQueue with two delivery modes. The event system exposes 20+ lifecycle events including beforeToolCall/afterToolCall hooks.

23 providers across 10 API implementations: Anthropic, OpenAI, Google (3 variants), Azure OpenAI, OpenAI Codex, GitHub Copilot, xAI, Groq, Cerebras, OpenRouter, Vercel AI Gateway, ZAI, Mistral, MiniMax (2 variants), HuggingFace, OpenCode (2 variants), Kimi Coding, and Amazon Bedrock. The ModelRegistry (with 14,278 lines of auto-generated model definitions) supports glob patterns, thinking level suffixes (model:high), alias preference over dated versions, and ambiguity rejection.

The extension system uses jiti (TypeScript runtime executor) with virtual module support for compiled Bun binaries. Extensions can register tools, commands, shortcuts, flags, message renderers, custom providers, and subscribe to 20+ events. Pi Packages (installable via npm or git) auto-discover extensions, skills, prompts, and themes from their directory structure.

The TUI uses differential rendering — only changed terminal cells are redrawn at 60fps throttle (16ms min). It supports Kitty graphics protocol, hardware cursor via APC escape sequences, and overlay stack with focus management.

The web UI (Lit components) includes a full artifact system (HtmlArtifact, ImageArtifact, MarkdownArtifact, etc.), sandboxed iframes with runtime providers, a JavaScript REPL, and document extraction (PDF, DOCX, XLSX).

📌

Key files

packages/coding-agent/src/core/tools/*.ts (TypeBox tool definitions), packages/coding-agent/src/core/tools/edit-diff.ts (multi-edit, fuzzy matching), packages/coding-agent/src/core/tools/file-mutation-queue.ts (per-file locking), packages/agent/src/agent-loop.ts (parallel execution, steering/followup queues), packages/ai/src/providers/models.generated.ts (14,278 lines, 23 providers), packages/tui/src/tui.ts (differential rendering), packages/coding-agent/src/core/extensions/ (jiti-based extension system), packages/web-ui/src/tools/artifacts/ (artifact system, JS REPL)

Codex — OpenAI's production coding agent in Rust

Codex is OpenAI's open-source coding agent, implemented entirely in Rust as a Cargo workspace with 70+ crates across 3,805 files (Apache-2.0 license). It represents the most production-grade agent codebase OpenAI has released — the same runtime that powers their internal product, now available for inspection and contribution.

The sandboxing system is the most platform-specific in this set: macOS uses Seatbelt (sandbox-exec), Linux uses bubblewrap (bwrap), and Windows uses process tokens and job objects. Each platform gets a native sandbox implementation rather than a cross-platform abstraction. The sandbox profiles define fine-grained filesystem, network, and process isolation policies.
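The dispatch itself is simple even though each backend is deep. A sketch in TypeScript for illustration (Codex is Rust): the `sandbox-exec` and `bwrap` command names and flags are real, but the wrapper shapes and the trivial Seatbelt profile are illustrative, not Codex's actual policies.

```typescript
// Hedged sketch of per-platform sandbox command construction.
function sandboxCommand(platform: string, argv: string[]): string[] {
  switch (platform) {
    case "darwin": // Seatbelt: wrap the command in sandbox-exec with a profile
      return ["sandbox-exec", "-p", "(version 1) (deny default)", ...argv];
    case "linux": // bubblewrap: bind a read-only root, unshare the network
      return ["bwrap", "--ro-bind", "/", "/", "--unshare-net", ...argv];
    case "win32": // Windows: restrictions come from process tokens / job objects,
      return argv; // applied at CreateProcess time, not via a wrapper binary
    default:
      throw new Error(`no sandbox backend for ${platform}`);
  }
}
```

The Windows branch is why "native sandbox per platform" matters: there is no wrapper command to reuse, so the abstraction has to live above the process-spawn layer.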

MCP is bidirectional — Codex can act as an MCP server (exposing its tools to external clients) and as an MCP client (loading external MCP servers). Multi-agent job execution is built-in: the agent can spawn sub-jobs with their own tool scopes and collect results asynchronously. IDE extensions are first-class: VS Code and JetBrains integrations ship alongside the CLI.

The core business logic lives in codex-rs/core, the TUI uses Ratatui in codex-rs/tui, tool schemas are defined in codex-rs/tools, platform sandboxes are in codex-rs/sandboxing, and the execution policy rule engine is in codex-rs/execpolicy.

📌

Key files

codex-rs/core/ (business logic), codex-rs/tui/ (Ratatui TUI), codex-rs/tools/ (tool schemas), codex-rs/sandboxing/ (Seatbelt/bubblewrap/Windows tokens), codex-rs/execpolicy/ (rule engine)

Wintermolt — The 3 MB everything-agent in Zig

Wintermolt compiles to a single 3 MB native binary with zero runtime (no Node.js, no Python, no Electron). It links only libcurl and sqlite3, both pre-installed on macOS and most Linux distributions. It can be cross-compiled to any Zig target — including ARM boards like Jetson and Raspberry Pi — with one command: zig build -Dtarget=aarch64-linux-gnu.

The codebase is ~18,400 lines across 51 Zig source files, plus two prebuilt static libraries from sibling projects. The agentic loop (src/agent/loop.zig, 646 lines) runs up to 25 tool iterations per turn with automatic fallback (primary → Ollama → OpenAI) if the backend fails.
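The fallback chain (primary → Ollama → OpenAI) reduces to "try each backend in order, remember the last error." A sketch in TypeScript for illustration (Wintermolt is Zig), with an invented synchronous Backend shape:

```typescript
// Illustrative fallback chain; the Backend type and sync signature are
// simplifications, not Wintermolt's actual API.
type Backend = { name: string; complete: (prompt: string) => string };

function completeWithFallback(backends: Backend[], prompt: string): string {
  let lastErr = "";
  for (const b of backends) {
    try {
      return b.complete(prompt); // first backend that succeeds wins
    } catch (err) {
      lastErr = `${b.name}: ${String(err)}`; // remember and fall through
    }
  }
  throw new Error(`all backends failed; last: ${lastErr}`);
}
```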

6 AI backends share a unified wire protocol (src/api/protocol.zig, 281 lines). Three hand-written streaming parsers handle the different formats: SSE for Anthropic (461 lines), SSE for OpenAI (287 lines), and NDJSON for Ollama (250 lines). The DeepSeekClient is effectively a universal OpenAI-compatible client reused for OpenAI, DeepSeek, Qwen, Gemini, and custom endpoints.
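The essential difficulty in all three parsers is the same: network chunks do not align with message boundaries. A toy NDJSON parser in TypeScript shows the buffering pattern (the real parsers are hand-written Zig and handle far more):

```typescript
// Toy NDJSON stream parser: feed arbitrary chunks, emit complete JSON lines.
class NdjsonParser {
  private buf = "";

  feed(chunk: string): unknown[] {
    this.buf += chunk;
    const out: unknown[] = [];
    let nl: number;
    // Emit every complete line; keep any trailing partial line buffered.
    while ((nl = this.buf.indexOf("\n")) !== -1) {
      const line = this.buf.slice(0, nl).trim();
      this.buf = this.buf.slice(nl + 1);
      if (line.length > 0) out.push(JSON.parse(line));
    }
    return out;
  }
}
```

SSE parsing adds `event:`/`data:` field handling and blank-line message delimiters on top of the same chunk-buffering core.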

16+ built-in tools span bash, file I/O, grep, glob, HTTP, web search, camera capture, image processing, Chrome DevTools Protocol browser automation (~895 lines), Pinecone RAG memory search, cron scheduling, Tailscale mesh queries, A2UI canvas rendering, and CLI-Anything harness generation. Tool dispatch uses a 3-layer system: built-in → runtime skills → MCP remote tools.

MCP is bidirectional — server (~214 lines) exposes 20+ tools via JSON-RPC 2.0 over stdio; client (~421 lines) loads external MCP servers from config, spawns them as child processes, runs the 3-step handshake, and prefixes discovered tools with "servername__".

The cron scheduler (src/agent/scheduler.zig, 756 lines) is SQLite-persisted with three schedule types (every/at/cron) and auto-disables jobs after max_retries failures. No other agent in this set has a built-in cron scheduler.
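The auto-disable behavior is a small state transition per run. A sketch (field names are guesses for illustration, not the SQLite schema):

```typescript
// Sketch of auto-disable: a job is disabled after max_retries consecutive
// failures; any success resets the counter.
interface Job {
  id: string;
  failures: number;
  maxRetries: number;
  enabled: boolean;
}

function recordRun(job: Job, ok: boolean): Job {
  if (ok) return { ...job, failures: 0 }; // success resets the counter
  const failures = job.failures + 1;
  return { ...job, failures, enabled: failures < job.maxRetries };
}
```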

All sidecars (chat, web, menubar, gateway) follow the same JSON-lines-over-stdio IPC pattern — Zig spawns a child process (TypeScript or Swift) and communicates over clean stdin/stdout pipes. The chat bridge uses a 7-tier priority binding system (src/agent/router.zig, 585 lines).

📌

Key files

src/agent/loop.zig (agentic loop, 646 lines), src/agent/tools.zig (tool dispatch, 717 lines), src/agent/scheduler.zig (cron, 756 lines), src/tools/browser.zig (CDP, 895 lines), src/web/bridge.zig (WebSocket, 1,062 lines), src/mcp/client.zig (MCP client, 421 lines), src/agent/rag.zig (Pinecone RAG, 577 lines)

Zaica — The focused Zig coding specialist

Zaica is a ~9,100-line Zig 0.15 coding assistant with zero runtime dependencies beyond the standard library. It is distributed via Homebrew for macOS (aarch64 + x86_64) and Linux (aarch64 + x86_64). It uses std.http.Client directly — no HTTP library needed.

The central abstraction is the Node in src/node.zig (731 lines) — the generic agentic loop used by both the REPL and sub-agents. In terminal mode, tools run in parallel using std.Thread.spawn; in silent mode (sub-agents, chain mode), they run sequentially to avoid threads-inside-threads.

Zaica's most distinctive feature is its Wyhash-based loop detection: a ring buffer of tool call signatures detects repeating patterns of length 1, 2, or 3 within a 10-call window, with 3-tier escalation (warning → stronger warning → force break). The same tool with different arguments gets a different hash, so legitimate iteration isn't flagged. This is the most sophisticated loop detection in the set.
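The mechanics can be sketched in TypeScript for illustration (zaica is Zig): hash each (tool, args) pair, keep the last 10 signatures, and flag when the tail repeats with period 1, 2, or 3. FNV-1a stands in for Wyhash here, and the tier names are invented.

```typescript
// Hedged sketch of signature-based loop detection with 3-tier escalation.
function sig(tool: string, args: string): number {
  let h = 2166136261; // FNV-1a, standing in for Wyhash
  for (const ch of tool + "\u0000" + args) {
    h = ((h ^ ch.charCodeAt(0)) * 16777619) >>> 0;
  }
  return h;
}

class LoopDetector {
  private window: number[] = [];
  private strikes = 0;

  // Returns "ok", "warn", "warn-strong", or "break".
  record(tool: string, args: string): string {
    this.window.push(sig(tool, args));
    if (this.window.length > 10) this.window.shift(); // 10-call ring buffer
    const looping = [1, 2, 3].some((p) => this.repeats(p));
    if (!looping) return "ok";
    this.strikes++;
    return this.strikes === 1 ? "warn" : this.strikes === 2 ? "warn-strong" : "break";
  }

  // True if the last 2*p signatures are one pattern of period p, repeated.
  private repeats(p: number): boolean {
    const n = this.window.length;
    if (n < 2 * p) return false;
    for (let i = 0; i < p; i++) {
      if (this.window[n - 1 - i] !== this.window[n - 1 - i - p]) return false;
    }
    return true;
  }
}
```

Because arguments feed the hash, `read(a)` followed by `read(b)` is not a repeat; only genuinely identical call sequences escalate.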

Chain mode (src/chain.zig, 528 lines) implements structured workflows via .chain.md files with per-step tool filtering, variable substitution ({task}, {previous}), and max iterations. No other agent in this set offers chain mode.
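The variable substitution step is the glue between chain steps. A minimal sketch, where the `{task}`/`{previous}` syntax matches the description but the function itself is invented:

```typescript
// Minimal sketch of chain-step variable substitution.
function substitute(template: string, vars: Record<string, string>): string {
  // Replace every {name} placeholder; leave unknown placeholders intact.
  return template.replace(/\{(\w+)\}/g, (whole, name: string) => vars[name] ?? whole);
}
```

Each step's output would be bound to `previous` before the next step's template is expanded, which is what makes the workflow composable.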

Instead of imperative state management, zaica uses a custom reactive state graph via zefx (Effector-inspired). Events trigger Store reducers, which trigger watchers in a two-phase flush. The status bar is a watcher on derived stores, a unique architecture for a coding agent.

The 3-tier permission model is elegantly simple: [y]es all / [s]afe only / [n]o, asked once per session. The bash wrapper redirects stdin from /dev/null, kills process trees on timeout, and caps output at 1 MB.

The config system uses a 6-layer JSON priority chain (comptime defaults → provider presets → user config → project config → env vars → CLI flags) with deep object merging. The REPL (src/repl.zig, 2,153 lines) implements manual line editing with full UTF-8 and Cyrillic support.
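The layered resolution reduces to a deep merge folded over the layers in priority order. A sketch in TypeScript for illustration (zaica is Zig); the merge rules shown (objects merge recursively, everything else is replaced) are assumptions consistent with the description:

```typescript
// Sketch of a 6-layer priority merge: later layers win, objects merge deeply.
type Config = { [k: string]: unknown };

function deepMerge(base: Config, over: Config): Config {
  const out: Config = { ...base };
  for (const [k, v] of Object.entries(over)) {
    const prev = out[k];
    const bothObjects =
      prev !== null && typeof prev === "object" && !Array.isArray(prev) &&
      v !== null && typeof v === "object" && !Array.isArray(v);
    out[k] = bothObjects ? deepMerge(prev as Config, v as Config) : v;
  }
  return out;
}

function resolveConfig(layers: Config[]): Config {
  // defaults → provider presets → user config → project config → env → CLI flags
  return layers.reduce((acc, layer) => deepMerge(acc, layer), {});
}
```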

📌

Key files

src/node.zig (agentic loop, 731 lines), src/repl.zig (full REPL, 2,153 lines), src/chain.zig (structured workflows, 528 lines), src/state.zig (reactive state, 600 lines), src/tools.zig (7 tools + permissions, 631 lines), src/client/sse.zig (SSE parser, 408 lines), lib/zefx/ (reactive engine)

Dirac — Hash-anchored edits, AST precision, and 64.8% cost reduction

Dirac is a TypeScript coding agent — a fork of Cline — that takes a fundamentally different approach to file editing. Where most agents use line numbers (which drift when the file changes), Dirac uses stable line hashes to anchor edits. When the model reads a file, each line gets a deterministic hash. Edits are then specified as anchor + end_anchor + replacement text rather than start_line + end_line + text. This means edits survive file shifts and multiple disjoint edits can be applied in a single batch with no coordinate conflicts.

The EditExecutor (src/core/task/tools/handlers/edit-file/EditExecutor.ts) resolves anchors by checking that the anchor name starts with a capital letter, exists in the file's line hash list, and that the provided content matches the actual file content at that hash. If any check fails, the edit fails with a diagnostic. The BatchProcessor applies multiple edits in reverse line order — highest line index first — so that earlier edits don't shift the coordinates of later ones.
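The reverse-order trick is worth seeing concretely: sorting edits by descending start line means applying one edit never shifts the coordinates of the edits still to come. A sketch with invented types, not Dirac's BatchProcessor itself:

```typescript
// Sketch of reverse-order batch application over disjoint line-range edits.
interface Edit {
  start: number;        // 0-based first line to replace
  end: number;          // exclusive end line
  replacement: string[];
}

function applyBatch(lines: string[], edits: Edit[]): string[] {
  // Highest start line first, so earlier (lower) edits keep valid coordinates.
  const sorted = [...edits].sort((a, b) => b.start - a.start);
  let out = [...lines];
  for (const e of sorted) {
    out = [...out.slice(0, e.start), ...e.replacement, ...out.slice(e.end)];
  }
  return out;
}
```

With hash anchors resolved to line ranges first, this is what lets multiple disjoint edits land in one batch with no coordinate conflicts.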

Beyond hash-anchored text edits, Dirac has AST-aware tools that target specific symbols — classes, functions, interfaces — directly rather than by text position. This means edits are always syntactically valid; JSDoc comments, decorators, and type annotations are preserved automatically. This is the structural equivalent of the hash-anchored system: both prevent the "friction" of coordinate-based editing.

Dirac's token efficiency is a deliberate engineering target backed by multiple mechanisms: hash-anchored multi-file batching (multiple files edited in a single LLM roundtrip), get_file_skeleton for project structure mapping without reading every line, ContextManager truncation with half/quarter strategies, concurrent tool calling, and a minimal PRIME DIRECTIVES system prompt. On 8 real-world refactoring tasks, Dirac achieved 8/8 correct at an average cost of $0.18 — versus $0.38–$0.73 for competitors. That works out to roughly 2.8x cheaper, the 64.8% cost reduction in the heading.

The state mutex pattern (via p-mutex) is used to serialize all state modifications in the main Task loop. This prevents race conditions between the concurrent tool executor and the main task loop without sacrificing the performance benefit of parallel tool calls. Every state modification — tool results, message updates, task history writes — goes through withStateLock.
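The pattern is an async mutex: every mutation is appended to a promise chain, so writes run one at a time even while tool calls execute concurrently. A sketch with a similar surface to p-mutex's, though the class below is invented, not Dirac's withStateLock:

```typescript
// Sketch of an async mutex that serializes state mutations.
class StateMutex {
  private tail: Promise<void> = Promise.resolve();

  withLock<T>(fn: () => Promise<T> | T): Promise<T> {
    const run = this.tail.then(fn);
    // Keep the chain alive even if fn rejects, so later callers still run.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}
```

Callers never see the lock directly; they just wrap each mutation, e.g. `await mutex.withLock(() => { state.history.push(msg); })`, and ordering falls out of the chain.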

Dirac has an 8-type hook system: onTaskStart, onTaskComplete, onTaskCancel, onTaskResume, preToolUse, postToolUse, preCompact, and preRequest. Hooks are auto-discovered from AGENTS.md, .claude/, or .agents/ directories. The hook executor runs scripts with structured JSON via stdin and supports cancellation via AbortController. Hooks can return a cancel decision, a contextModification to alter behavior, or an errorMessage.

The git checkpoint system creates commits before risky operations, enabling revert. Plan/Act mode separation is first-class — Plan mode gathers information and presents a strategy before asking for user approval to switch to Act mode. YOLO mode (dirac -y) runs fully autonomously with auto-approval.

Shell command validation uses DIRAC_COMMAND_PERMISSIONS — a JSON object with allow/deny glob patterns — more flexible than a ban list. The subagent system (SubagentToolHandler) spawns isolated children with their own configuration. Skills are auto-discovered from AGENTS.md, .claude/, and .agents/. Provider support covers 40+ APIs including Anthropic, OpenAI, Google, AWS Bedrock, Azure, and many OpenAI-compatible gateways.

The most notable gap: Dirac does not implement MCP support — an explicit design decision, not an oversight. For users who need MCP integration, that is disqualifying; for users who want a tightly integrated, self-contained agent with the best structural edit accuracy in the TypeScript agent space, it is a feature.

📌

Key files

src/core/task/index.ts (1,868 lines — agent loop, state mutex, tool orchestration), src/core/task/tools/handlers/edit-file/EditExecutor.ts (hash-anchor resolution), src/core/task/tools/handlers/edit-file/BatchProcessor.ts (reverse-order batch processor), src/core/context/context-management/ContextManager.ts (half/quarter truncation), src/core/hooks/hook-executor.ts (8-type hook system with streaming and cancellation), src/core/prompts/system-prompt/template.ts (PRIME DIRECTIVES), src/core/api/retry.ts (exponential backoff 2s, 4s, 8s), cli/man/dirac.1.md (full CLI reference)

📝

Important caveat

"Best designed" here means best aligned between code and product intent, not "best for every user." Kimi is more protocol-focused, OpenHands is more platform-shaped, DeerFlow is more compositional, and Neovate is more security-conscious about shell execution. Different goals produce different tradeoffs.