Seventeen repos, seventeen distinct engineering choices
This is the specific read: what each repo is actually built from, which files do the most interesting work, and where the implementation choices reveal each product's real priorities.
Quick profile matrix
| Repo | Primary stack | Feels like | Best at | Main caveat |
|---|---|---|---|---|
| Pochi | TypeScript (Bun monorepo — Bun required for wa-sqlite WASM) | Six-vendor aggregation platform | Wraps 6 backends (Tabby, Pochi, Gemini CLI, Codex, GitHub Copilot, Qwen Code), apply-diff safety, CustomAgent schema, exponential-backoff retry, Hono+Zod API | Complexity grows because it bridges many ecosystems at once; not a standalone coding agent |
| Neovate Code | TypeScript (pnpm workspace) | Provider-rich CLI with strong security opinions | Broad provider support, multi-stage context compression, and the most opinionated bash safety in the set | More generic than distinct; identity comes from integrations, not one native runtime |
| Mux | TypeScript / Electron / web | Workspace-centric multi-runtime agent product | 7 runtime backends, 1Password secret refs, regex tool allow/deny, enforced agent_report protocol, SSH credential forwarding | Large surface area makes it harder to read quickly than tighter CLIs |
| Crush | Go | Productized terminal app with loop detection | Coherence, readability, SHA-256 loop detection, and binary-embedded prompt templates | Less ecosystem breadth than the most adapter-heavy TypeScript projects |
| Kimi CLI | Python + own kosong package | Protocol-aware terminal assistant with a multi-provider abstraction layer | ACP bridging, hooks system (pre/post tool use), and provider-aware message conversion | More platform-centered than provider-neutral |
| Qwen Code | TypeScript | Config-heavy multi-model CLI | Five-layer model config resolution, declarative tools, and MCP lifecycle management | Still inherits some shape from Gemini CLI ancestry |
| OpenHands | Python + TypeScript | Agent platform runtime | Sandbox and app/server architecture | The most important modern agent core is not fully in this repo snapshot |
| Claude Code | TypeScript / Bun / React Ink | Bespoke coding-agent runtime with deep security engineering | 10+8 state machine, token budget constants, undercover mode, Zsh attack mitigation, tree-sitter shell parsing, 40+ tools, teams/tasks subsystem | Much less provider-neutral than peers like Mux or Qwen |
| Open Claude Code | TypeScript / Node.js (ESM) | Clean-room implementation targeting Claude Code v2.1.91 | Async generator loop (13 events), 25 tools, 5 providers (Anthropic/OpenAI/Google/Bedrock/Vertex), 7 hook events, git worktree isolation, file checkpointing, 40 slash commands, automated nightly releases | Independent rebuild via ruDevolution decompilation — functional but adds unique features (multi-agent teams, session teleport) |
| DeerFlow | Python + LangGraph + FastAPI | Super-agent harness with a 13-layer middleware stack and SSE streaming | 13-middleware stack, composable loop detection, skill self-evolution, IM channel integration, LangGraph thread management | Less of a single polished CLI identity than Claude Code or Crush |
| Hermes | Python (Nous Research) | Self-improving multi-platform agent | Learning loop (skills), multi-platform (Telegram/Discord/Slack), MoA synthesis, RL training, remote execution | Breadth over focus — not as sharp as a dedicated coding agent |
| Pi Mono | TypeScript (Node.js 20+, v0.66.1, MIT) | Minimalist extension-first kernel | 23 providers across 10 APIs, tree-structured JSONL v3 sessions, differential TUI, Pi Packages (npm/git bundles), parallel tool execution, precise multi-edit with file mutation queue, 4 run modes | No built-in permissions, no MCP, no sub-agents — all must be added via extensions or packages |
| Codex | Rust (Cargo workspace, 70+ crates, 3,805 files, Apache-2.0) | OpenAI's production coding agent — deeply integrated product runtime | Platform-specific sandboxing (Seatbelt/bubblewrap/Windows tokens), bidirectional MCP, multi-agent job execution, IDE extensions | OpenAI-centric (Responses API only), Rust-only contribution barrier, no built-in learning loop |
| Wintermolt | Zig 0.15 (3 MB native binary) | Everything-agent platform with 7 modes and dozens of features | 6 backends, 16 tools, cron scheduling, Tailscale, camera vision, browser automation, MCP bidirectional, 4 chat bridges, macOS menu bar | Enormous feature surface; bash safety is pattern-based (not AST); TypeScript sidecars for non-CLI modes |
| Zaica | Zig 0.15 (~9,100 lines) | Focused coding specialist with structured workflows | Chain mode (.chain.md), Wyhash loop detection (3-tier), reactive state (zefx), parallel sub-agent dispatch, Cyrillic REPL | Only 5 core tools; no MCP, no RAG, no web UI; terminal-only experience |
| Goose | Rust (Cargo workspace, Apache-2.0, v1.32.0) | Extension-first agent with LLM-based security | 15+ providers, 5-layer security inspector stack, AdversaryInspector (LLM-based review), GooseMode (Auto/Approve/SmartApprove/Chat), recipe framework, MOIM injection | Extension-first model means core needs extensions for full functionality |
| Dirac | TypeScript (fork of Cline) | Hash-anchored coding agent with AST precision | Hash-anchored parallel edits, AST-native precision, multi-file batching, 64.8% cost reduction, no MCP, 8-type hook system, git checkpoints, state mutex, 40+ providers | No MCP support; inherits Cline architecture |
Deep per-agent profiles
Pochi — A six-vendor aggregator, not a single agent
The most important thing to understand about Pochi: it is not a single coding
agent. It is a multi-vendor aggregation platform that wraps
six distinct backends: vendor-tabby, vendor-pochi,
vendor-gemini-cli, vendor-codex,
vendor-github-copilot, and vendor-qwen-code.
The runtime requires Bun (not Node.js) specifically because
it needs wa-sqlite WASM support for its local database layer.
Pochi's internal API layer uses Hono + Zod
validation and exposes POST /api/chat/stream,
POST /api/chat, and GET /api/models. Models carry
a costType: "basic" | "premium" that maps to user-facing labels
"swift" and "super". Despite aggregating six backends,
the vendor-pochi package exposes only 2 tools:
webFetch and webSearch — everything else comes
from the wrapped backends.
The retry strategy is explicit: withRetry() caps at
3 attempts with 1000ms initial delay, 10000ms maximum,
and a multiplier of 2 (exponential backoff). Authentication uses
better-auth with JWT + device-link plugins; a
set-auth-token response header silently renews credentials
before they expire.
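The backoff schedule above is simple to state precisely. A minimal sketch (illustrative Python, not Pochi's actual TypeScript `withRetry()`), assuming the documented constants — 3 attempts, 1000 ms initial delay, 10000 ms cap, multiplier 2:

```python
def backoff_delays(attempts: int = 3, initial_ms: int = 1000,
                   max_ms: int = 10000, multiplier: int = 2) -> list[int]:
    """Delay (in ms) before each retry: capped exponential growth."""
    return [min(initial_ms * multiplier ** i, max_ms)
            for i in range(attempts - 1)]

# With 3 attempts there are 2 retries, so the cap never engages: 1000, 2000.
print(backoff_delays())  # [1000, 2000]
```

With the default constants the 10000 ms ceiling only matters if the attempt count is raised; at 6 attempts the fifth retry would be capped at 10000 ms rather than 16000 ms.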
The apply-diff tool includes an expectedReplacements
safety parameter — if you declare two replacements but the pattern matches
three times, the edit fails. The CustomAgent shape in
new-task.ts lets you define inline sub-agents; the planner
agent uniquely retains askFollowupQuestion while others lose it.
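The `expectedReplacements` guard reduces to a count check before the edit is applied. A sketch of the idea (hypothetical names, not Pochi's actual `apply-diff.ts`):

```python
def apply_diff(content: str, search: str, replace: str,
               expected_replacements: int = 1) -> str:
    """Reject the edit when the pattern matches a different number of times
    than the caller declared — the safety behavior described above."""
    actual = content.count(search)
    if actual != expected_replacements:
        raise ValueError(
            f"expected {expected_replacements} match(es), found {actual}")
    return content.replace(search, replace)
```

Declaring two replacements against a pattern that matches three times fails loudly instead of silently editing an unintended site.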
Key files
packages/vendor-pochi/src/vendor.ts (withRetry, better-auth),
packages/vendor-pochi/src/pochi-api.ts (Hono + Zod routes),
packages/tools/src/apply-diff.ts (expectedReplacements safety),
packages/tools/src/new-task.ts (CustomAgent schema),
packages/vendor-codex/, packages/vendor-gemini-cli/
Neovate Code — The most opinionated bash tool in the set
Neovate's bash tool (src/tools/bash.ts) contains the most
elaborate pre-execution security logic in this repository set. The banned
command list is concrete and non-trivial:
alias, aria2c, axel, bash, chrome, curl, curlie, eval,
firefox, fish, http-prompt, httpie, links, lynx, nc,
rm, safari, sh, source, telnet, w3m, wget, xh, zsh
Beyond static bans, it detects command substitution by actually
parsing the shell syntax: it tracks single-quote, double-quote,
and backslash states to correctly identify $() and backtick
substitutions that would survive a naive regex check. The same character
state machine is used to split pipeline segments correctly before checking
each segment independently.
For high-risk detection it checks patterns like rm -rf,
sudo, dd if=, mkfs, and
curl | sh, as well as every segment in a pipeline individually
— if any segment is high-risk, the whole command is considered high-risk.
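The quote-state scan described above can be sketched in a few lines (assumed logic, not Neovate's actual `bash.ts`): track single-quote, double-quote, and backslash state so that `$()` and backticks inside single quotes or behind an escape do not trigger, while substitutions inside double quotes — where the shell really does expand them — do:

```python
def has_command_substitution(cmd: str) -> bool:
    in_single = in_double = escaped = False
    for i, ch in enumerate(cmd):
        if escaped:               # previous char was a backslash
            escaped = False
            continue
        if ch == "\\" and not in_single:
            escaped = True        # backslash is literal inside single quotes
        elif ch == "'" and not in_double:
            in_single = not in_single
        elif ch == '"' and not in_single:
            in_double = not in_single and not in_double or not in_double
            in_double = not in_double  # toggle (kept explicit for clarity)
        elif not in_single:       # substitution is live outside single quotes
            if ch == "`" or (ch == "$" and cmd[i + 1:i + 2] == "("):
                return True
    return False
```

A naive regex for `$(` would flag `echo '$(ls)'` as dangerous; this scan correctly ignores it while still catching `echo "$(ls)"`.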
The context compression system (src/compression.ts) is also
more configurable than most: it has separate pruning and compaction phases,
a triggerRatio that triggers compaction when context usage
exceeds a percentage of the model limit, protectedTools that
are never pruned, a protectTurns count, and an
autoContinue mode that automatically resumes after compaction.
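The trigger and protection rules reduce to two small predicates. A sketch under the prose's option names (`triggerRatio`, `protectTurns`, `protectedTools`); the 0.8 default and the exact pruning rule are assumptions, not Neovate's actual values:

```python
def should_compact(used_tokens: int, model_limit: int,
                   trigger_ratio: float = 0.8) -> bool:
    """Compact once context usage exceeds the configured share of the limit."""
    return used_tokens / model_limit > trigger_ratio

def prunable(turn_index: int, total_turns: int, tool: str,
             protected_tools: set[str], protect_turns: int = 2) -> bool:
    """A turn may be pruned only if it is older than the protected window
    and did not use a protected tool."""
    return (turn_index < total_turns - protect_turns
            and tool not in protected_tools)
```

With `autoContinue` on, a compaction triggered by `should_compact` would be followed by an automatic resume rather than a user prompt.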
Key files
src/tools/bash.ts (banned commands, command substitution detection, pipeline segment analysis),
src/compression.ts (compaction + pruning config),
src/tools/task.ts (AI sub-task invocation)
Mux — Seven runtimes, 1Password secrets, and regex-gated tool lists
Mux ships seven distinct runtime backends in
src/node/runtime/: CoderSSHRuntime,
DevcontainerRuntime, DockerRuntime,
LocalRuntime, SSHRuntime,
WorktreeRuntime, and RemoteRuntime — each with
its own .test.ts. That list alone explains why the repo is
large: it supports everywhere code can run, not just local.
The infrastructure around those runtimes is equally serious:
gitBundleSync.ts syncs git bundles to remote environments,
credentialForwarding.ts tunnels SSH credentials into containers,
openSshPromptMediation.ts mediates SSH_ASKPASS
prompts, and SSH2ConnectionPool.ts manages a connection pool.
Config is JSONC and uses secret://op/... references for
1Password integration — secrets never hard-coded.
The 9 built-in agents are defined as Markdown files with YAML frontmatter
in src/node/builtinAgents/. Tool lists use
regex allow/deny patterns: - .* adds every
available tool; - file_edit_.* removes all file-edit tools;
- mux_agents_.* blocks config tools from sub-agents. The
exec.md agent has a hard protocol rule: "Before your
stream ends, you MUST call agent_report exactly once."
The orchestrator.md agent has an explicit prohibition:
"Do NOT create pull requests, push to remote branches, or run any
gh pr / git push commands."
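The ordered allow/deny semantics described above can be sketched directly (assumed semantics — patterns apply in order, later rules overriding earlier ones — not Mux's actual implementation):

```python
import re

def resolve_tools(available: list[str],
                  rules: list[tuple[str, str]]) -> list[str]:
    """rules: ('allow' | 'deny', regex) applied in order to each tool name."""
    enabled: set[str] = set()
    for action, pattern in rules:
        rx = re.compile(f"^(?:{pattern})$")  # anchor so '.*' means 'everything'
        for tool in available:
            if rx.match(tool):
                (enabled.add if action == "allow" else enabled.discard)(tool)
    return [t for t in available if t in enabled]

tools = ["bash", "file_edit_patch", "file_edit_write", "mux_agents_config"]
rules = [("allow", ".*"), ("deny", "file_edit_.*"), ("deny", "mux_agents_.*")]
print(resolve_tools(tools, rules))  # ['bash']
```

This mirrors the frontmatter shown above: `.*` grants everything, then `file_edit_.*` and `mux_agents_.*` carve out the file-edit and config tools.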
Key files
src/node/runtime/ (7 runtime backends),
src/node/builtinAgents/exec.md (enforced agent_report protocol),
src/node/builtinAgents/orchestrator.md (no-push rule),
src/node/gitBundleSync.ts,
src/node/SSH2ConnectionPool.ts,
src/node/config.ts (1Password secret references)
Crush — Go discipline, binary-embedded prompts, SHA-256 loop detection, and LSP integration
Crush uses Go's //go:embed directive to embed its system prompt
templates directly into the binary at compile time. The three templates —
coder.md.tpl, task.md.tpl,
initialize.md.tpl — are Go text/template files
that render with runtime data: working directory, git repo status, date,
platform, context files, and available skill XML.
The coder prompt template runs to hundreds of lines and is remarkably specific.
It contains sections for <critical_rules>,
<communication_style>, <workflow>,
<decision_making>, <editing_files>,
<whitespace_and_exact_matching>,
<task_completion>, and <error_handling>.
MCP server instructions are injected as a separate
<mcp-instructions> block appended to the system prompt at
runtime.
The most distinctive engineering decision in this repo is the
SHA-256 loop detection in
internal/agent/loop_detection.go. For each agent step, a
signature is computed by hashing the concatenated
tool_name + "\x00" + input + "\x00" + output for every tool call
in that step. If any signature appears more than 5 times in the last 10 steps,
the agent is considered stuck. This is far more robust than checking tool names
alone — the same tool called with different arguments or producing different
output gets a different hash.
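The scheme is compact enough to sketch. The `"\x00"` separator and the 5-in-10 window come from the description above; everything else (Python instead of Go, data shapes) is illustrative:

```python
import hashlib
from collections import Counter

def step_signatures(tool_calls: list[tuple[str, str, str]]) -> list[str]:
    """One SHA-256 signature per tool call: name, input, and output
    concatenated with NUL separators, as described above."""
    return [hashlib.sha256(f"{name}\x00{inp}\x00{out}".encode()).hexdigest()
            for name, inp, out in tool_calls]

def is_stuck(recent_steps: list[list[str]],
             window: int = 10, limit: int = 5) -> bool:
    """Stuck if any signature appears more than `limit` times in the
    last `window` steps."""
    counts = Counter(sig for step in recent_steps[-window:] for sig in step)
    return any(n > limit for n in counts.values())
```

Because the output is part of the hash, `ls` returning a changing directory listing never trips the detector, while an identical call producing identical output six times does.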
Crush also includes a dedicated internal/agent/agentic_fetch_tool.go
that uses a smaller model to browse the web on behalf of the main agent — a
mini-agent within the agent for cost-effective information retrieval.
Two more tools stand out as highly unusual in a coding CLI:
- Sourcegraph search (sourcegraph.go) — a native tool that queries Sourcegraph's code search API. Parameters include query, count, context_window, and timeout. This is the only agent in the set with first-class Sourcegraph integration.
- Full LSP integration — diagnostics.go exposes an lsp_diagnostics tool that returns project-wide or file-level diagnostics from a live Language Server. references.go exposes a symbol references tool: given a symbol name, it queries the LSP to find all references across the project. These tools make Crush the only agent in this set that can call into an actual Language Server during a session.
Key files
internal/agent/loop_detection.go (SHA-256 tool signature hashing),
internal/agent/templates/coder.md.tpl (full system prompt),
internal/agent/prompts.go (go:embed usage),
internal/agent/tools/sourcegraph.go (Sourcegraph search),
internal/agent/tools/diagnostics.go (LSP diagnostics),
internal/agent/tools/references.go (LSP symbol references),
internal/agent/agentic_fetch_tool.go (mini-agent for web browsing)
Crush — Recent: PreToolUse hooks, Hyper provider, Azure support
Major recent additions to Crush:
PreToolUse hook system (dc003bf7, ~1,196 lines across 4 new files):
Shell commands fire on tool events. Hooks receive structured JSON via stdin
(CRUSH_EVENT, CRUSH_TOOL_NAME, CRUSH_CWD, etc.) and return decisions: allow, deny, halt (stops the whole turn), and optional context or updated_input JSON patches. Hooks run in parallel via goroutines with configurable timeouts. Output format supports both Crush format and Claude Code hook format for compatibility. Exit code 2 blocks, exit code 49 halts the turn.
Hyper provider: Crush ships with a new hyper provider from charm.land
(internal/agent/hyper/provider.go, provider.json). Enable via HYPER_API_KEY or HYPER_URL env vars. Provider.json is embedded at compile time via //go:embed. Models include GLM-5, GLM-5.1, GPT-OSS, Kimi K2.5, Kimi K2.6 with per-model pricing, context windows, and reasoning levels. A new quickstyle.go (980 lines) provides theme support with a themes.go system.
Azure provider support added. DeepSeek V4 reasoning content support fixed and maintained. Bedrock adaptive thinking improvements. The UI received significant style refactoring with semantic color names and improved theme switching.
New key files
internal/hooks/hooks.go (Runner, hook execution),
internal/hooks/input.go (payload/env builder),
internal/hooks/runner.go (parallel execution, exit code semantics),
internal/agent/hyper/ (Hyper provider + provider.json),
internal/ui/styles/quickstyle.go (new 980-line style system)
Kimi CLI — Provider-native message conversion and a hooks architecture
Kimi CLI is built around its own kosong abstraction package
which handles provider-specific message conversion at the low level.
The package contains dedicated converters for Anthropic (with
tool_use/tool_use_id), Google GenAI (with
function_call parts and thought_signature for
thinking tokens), and OpenAI Responses API (with
function_call/function_call_output items).
One concrete example of the care here: the Google GenAI converter handles
the fact that Gemini rejects an id field in
function_call or function_response parts —
there are API snapshot tests specifically for this case
(test_google_genai_no_id_in_function_call_or_response).
The hooks system (src/kimi_cli/hooks/events.py) is another
distinctive feature: three events are defined for every tool call —
pre_tool_use, post_tool_use, and
post_tool_use_failure. This lets external code intercept
tool calls before they run, observe results after they run, and handle
failures separately. The integration and E2E test suites confirm this
system is well-tested.
The ACP bridge layer converts internal tool results into
protocol-transportable content. The docs honestly list current gaps —
missing session/set_mode and session/set_model
— rather than implying perfect coverage.
Key files
packages/kosong/src/kosong/contrib/chat_provider/anthropic.py,
packages/kosong/src/kosong/contrib/chat_provider/google_genai.py,
src/kimi_cli/hooks/events.py,
packages/kosong/tests/api_snapshot_tests/
Qwen Code — Five-layer model config resolution
Qwen Code's model configuration system is the most rigorous in this set.
The ModelConfigResolver in
packages/core/src/models/modelConfigResolver.ts defines five
named source layers with explicit precedence:
- modelProvider — explicit selection from ModelProviders config (highest priority)
- CLI arguments — --model, --openaiApiKey, etc.
- Environment variables — OPENAI_API_KEY, OPENAI_MODEL
- Settings — user/workspace settings file
- Defaults — built-in default values (lowest priority)
Each layer is a typed ConfigLayer object. The resolver uses
named source types (cliSource, settingsSource,
modelProvidersSource, envLayer,
defaultSource, computedSource) so you can always
trace which layer a resolved value came from. Qwen Code also has a special
QWEN_OAUTH_ALLOWED_MODELS list that gates the OAuth auth path
to specific models. The system prompt can be overridden by setting
QWEN_SYSTEM_MD to a file path (e.g. .qwen/system.md),
set to 0/false to disable, or to 1/true
for the default path.
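The first-match-wins precedence with source tracking can be sketched as follows (layer names follow the prose; the `ConfigLayer` shape here is an assumption, not Qwen Code's actual type):

```python
from typing import Any, Optional

LAYERS = ["modelProvider", "cli", "env", "settings", "defaults"]  # high → low

def resolve(key: str,
            layers: dict[str, dict[str, Any]]) -> tuple[Any, Optional[str]]:
    """Return (value, source-layer name) from the first layer defining key,
    so a resolved value is always traceable to its origin."""
    for name in LAYERS:
        if key in layers.get(name, {}):
            return layers[name][key], name
    return None, None

cfg = {"defaults": {"model": "qwen3-coder"}, "env": {"model": "qwen-max"}}
print(resolve("model", cfg))  # ('qwen-max', 'env')
```

Returning the layer name alongside the value is what makes the named-source design debuggable: a surprising model choice can be traced to the exact layer that won.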
The turn.ts event model defines 16 named event types
in the GeminiEventType enum, including: Content,
ToolCallRequest, ToolCallResponse,
ToolCallConfirmation, UserCancelled,
Error, ChatCompressed, Thought,
MaxSessionTurns, SessionTokenLimitExceeded,
Finished, LoopDetected, Citation,
Retry, and HookSystemMessage. The LoopDetected and
ChatCompressed events are first-class system conditions,
not error states.
The truncation recovery is equally detailed: two constants —
TRUNCATION_PARAM_GUIDANCE and
TRUNCATION_EDIT_REJECTION — handle the case where the
model's output is cut off mid-tool-call. The scheduler imports
diff and fast-levenshtein to verify
proposed file edits aren't corrupted by truncation.
Key files
packages/core/src/models/modelConfigResolver.ts (5-layer resolution),
packages/core/src/core/turn.ts (16 GeminiEventType values),
packages/core/src/core/coreToolScheduler.ts (truncation recovery),
packages/core/src/models/modelRegistry.ts,
packages/core/src/mcp/
OpenHands — Platform architecture with an ingenious retry strategy
OpenHands is the hardest repo to score fairly because the local snapshot is explicitly described as incomplete. The modern V1 agent core moved to a separate Software Agent SDK repository. But what remains is still architecturally interesting.
The retry logic (openhands/llm/retry_mixin.py) uses the
tenacity library with a documented, intentional quirk: on
LLMNoResponseError when temperature is 0, it
automatically bumps temperature to 1.0 on the next attempt.
The rationale: a deterministic model that returns nothing is stuck in a
degenerate fixed point. Adding randomness breaks the loop. This is one of
the more thoughtful retry strategies in the set.
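The temperature bump is easy to illustrate (a sketch of the idea only, not OpenHands' actual tenacity-based `retry_mixin.py`):

```python
class LLMNoResponseError(Exception):
    """Raised when the model returns an empty response."""

def call_with_retry(call, params: dict, max_attempts: int = 3):
    for attempt in range(max_attempts):
        try:
            return call(params)
        except LLMNoResponseError:
            if params.get("temperature", 0) == 0:
                # A deterministic model returning nothing is a fixed point;
                # add randomness so the next attempt can escape it.
                params["temperature"] = 1.0
            if attempt == max_attempts - 1:
                raise
```

At temperature 0 a failing request would fail identically forever; after the bump, each retry samples a genuinely different completion.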
fn_call_converter.py — marked LEGACY V0, removal
April 1, 2026 — converts between JSON function-calling and XML for
models that don't support native tool calls. The XML format uses
<function=name><parameter=key>value</parameter></function>
and uses </function as a stream-stop word. The
refine_prompt() function replaces 'bash' with
'powershell' automatically on Windows.
Most striking: OpenHands defines a CondensationRequestTool
— the agent itself can request history condensation, not
just the runtime. All tool calls carry a security_risk
attribute validated against a RISK_LEVELS dict.
The system prompt uses 9 Jinja2 .j2 templates
with named XML sections: <ROLE>,
<EFFICIENCY>, <SECURITY>,
<SECURITY_RISK_ASSESSMENT> (a separate included template),
<PULL_REQUESTS>, <PROBLEM_SOLVING_WORKFLOW>,
and more. The long-horizon variant adds
<TASK_MANAGEMENT> and
<TASK_TRACKING_PERSISTENCE> sections.
Claude Code — Undercover mode, a 10-state machine, and 15-file shell engineering
Claude Code's BashTool is not one file — it is a directory with 15 specialized modules. This is the most specific signal of how Claude Code treats coding-agent behavior as its own software domain:
- BashTool.tsx — main tool definition
- bashSecurity.ts — Zsh-specific attack detection
- bashPermissions.ts — permission gate logic
- commandSemantics.ts — semantic classification of commands
- destructiveCommandWarning.ts — explicit user warnings
- sedEditParser.ts + sedValidation.ts — sed-style inline edits
- modeValidation.ts — mode checks per command
- pathValidation.ts — path safety checks
- readOnlyValidation.ts — read-only mode enforcement
- shouldUseSandbox.ts — sandbox routing decision
- bashCommandHelpers.ts, commentLabel.ts, utils.ts
The bashSecurity.ts file alone covers Zsh-specific attack vectors
that no other agent in this set defends against explicitly. ZSH_DANGEROUS_COMMANDS
is a Set for O(1) lookup. Blocks include: zmodload,
emulate, sysopen/sysread/syswrite, zpty,
zsocket, all zf_* filesystem primitives, process
substitution <(/>(/=(, heredoc-in-substitution
$\(.*<<, and even PowerShell comment syntax <#
as "defense in depth against future changes." Tree-sitter parses the shell AST
to detect these reliably.
The query engine runs as a named state machine
(src/query/transitions.ts) with 10 terminal exit reasons
and 8 continue reasons. Terminal: 'completed',
'blocking_limit', 'image_error',
'model_error', 'aborted_streaming',
'aborted_tools', 'prompt_too_long',
'stop_hook_prevented', 'hook_stopped',
'max_turns'. Continue: 'tool_use',
'reactive_compact_retry',
'max_output_tokens_recovery',
'max_output_tokens_escalate',
'collapse_drain_retry', 'stop_hook_blocking',
'token_budget_continuation', 'queued_command'.
Claude Code contains an "undercover mode"
(isUndercover() in src/tools/BashTool/prompt.ts)
that activates when process.env.USER_TYPE === 'ant'.
Purpose: prevent the model from volunteering internal Anthropic codenames
in commit messages. The code comments note: "Defense-in-depth: undercover
instructions must survive even if the user has disabled git instructions
entirely." It is built with Bun's import { feature } from 'bun:bundle'.
Token budget tracking (src/query/tokenBudget.ts) uses
COMPLETION_THRESHOLD = 0.9 to nudge at 90% budget and
DIMINISHING_THRESHOLD = 500 — if the per-check token delta
drops below 500 for three consecutive checks, the agent is considered done.
Sub-agents skip budget tracking entirely when agentId is present.
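Both checks are small predicates over running token counts. A sketch using the two constants from the prose (the surrounding bookkeeping is assumed, not Claude Code's actual `tokenBudget.ts`):

```python
COMPLETION_THRESHOLD = 0.9   # nudge the agent at 90% of budget
DIMINISHING_THRESHOLD = 500  # per-check token delta floor

def budget_nudge(used: int, budget: int) -> bool:
    """True once the agent has consumed 90% of its token budget."""
    return used / budget >= COMPLETION_THRESHOLD

def looks_done(deltas: list[int]) -> bool:
    """True when the last three per-check token deltas all fell below 500 —
    the diminishing-returns signal described above."""
    return len(deltas) >= 3 and all(d < DIMINISHING_THRESHOLD
                                    for d in deltas[-3:])
```

The diminishing-returns check is a nice proxy for "the agent is polishing, not progressing": output volume per check collapses when real work is finished.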
Key files
src/tools/BashTool/bashSecurity.ts (Zsh attack catalog),
src/tools/BashTool/prompt.ts (undercover mode),
src/query/transitions.ts (10+8 state machine),
src/query/tokenBudget.ts (COMPLETION_THRESHOLD=0.9),
src/tools/TeamCreateTool/, src/QueryEngine.ts
Open Claude Code — Async generator clone with nightly verification
Open Claude Code 2.0 is a clean-room rebuild of Claude Code v2.1.91 via
"ruDevolution" — AI-powered decompilation of the published npm package. The
archive/ contains the decompiled 7.3MB CLI; v2/ is the
clean-room reimplementation: 61 files, 8,314 lines, 1,581 tests.
Its async generator agent loop (v2/src/core/agent-loop.mjs)
yields 13 event types and recursively calls itself after tool execution
(yield* run(null, { continuation: true })). The loop handles
streaming, token tracking, auto-compaction at 80% threshold, and 7 hook events:
PreToolUse, PostToolUse, PreToolUseFailure, PostToolUseFailure, Notification,
Stop, SessionStart. Exit code 2 blocks; exit code 49 halts the turn.
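The recursive-continuation shape of that loop can be sketched as a toy async generator (Python here for uniformity; event names and the single-tool turn are illustrative, not Open Claude Code's actual 13-event loop):

```python
import asyncio

async def agent_loop(prompt, tools, depth: int = 0):
    """Yield events for one turn; after a tool call, recurse as a
    continuation turn and re-yield the child's events."""
    yield {"type": "turn_start", "depth": depth}
    # ... a real loop would stream model output here ...
    if depth < 1:  # pretend the model requested a tool on the first turn
        yield {"type": "tool_use", "tool": "bash"}
        async for event in agent_loop(None, tools, depth + 1):  # continuation
            yield event
    yield {"type": "turn_end", "depth": depth}

async def collect():
    return [e async for e in agent_loop("hi", [])]
```

The caller consumes one flat event stream even though turns nest arbitrarily deep, which is what makes the `yield* run(null, { continuation: true })` pattern attractive.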
5 providers: Anthropic (primary), OpenAI, Google, AWS Bedrock,
Google Vertex. Request/response transforms normalize across API shapes.
25 tools with validateInput/call interface. File checkpointing via
checkpoints.mjs before dangerous ops. Git worktree isolation via
EnterWorktree/ExitWorktree for parallel agent tasks — unique among
these agents. Session export/import for "teleport" between machines.
Nightly release pipeline: Automated CI/CD detects new Claude Code
npm releases (03:00 UTC), runs 903+ tests, npm audit, and Claude Sonnet 4.6 AI-powered
change analysis. Only publishes if ALL gates pass. The rudevolution
submodule tracks 34,759+ function declarations with 95.7% naming accuracy.
Key files
v2/src/core/agent-loop.mjs (462 lines, async generator),
v2/src/tools/bash.mjs (148 lines, sandboxed shell),
v2/src/tools/agent.mjs (127 lines, worktree support),
v2/src/core/providers.mjs (5-provider multi-client),
v2/src/hooks/engine.mjs (7 hook types),
v2/src/ui/commands.mjs (40 slash commands),
archive/open_claude_code/cli.mjs (7.3MB decompiled)
DeerFlow — LangGraph harness with a 13-layer middleware stack and SSE streaming
DeerFlow is the only agent in this set built on LangGraph, and it shows.
The backend is a FastAPI application (backend/app/gateway/app.py)
that initializes a LangGraph runtime on startup: checkpointer, store,
StreamBridge, and RunManager all come up as async components in the
application lifespan handler.
Every agent turn passes through a 13-layer middleware stack
in agents/middlewares/:
- LoopDetectionMiddleware
- TokenUsageMiddleware
- MemoryMiddleware
- TodoMiddleware
- TitleMiddleware
- ClarificationMiddleware
- SubagentLimitMiddleware
- ViewImageMiddleware
- SandboxAuditMiddleware
- DeferredToolFilterMiddleware
- DanglingToolCallMiddleware
- ToolErrorHandlingMiddleware
- UploadsMiddleware
No other repo in this set has a composable middleware architecture; most
handle these concerns inline or not at all.
DeerFlow's loop detection (agents/middlewares/loop_detection_middleware.py)
has a noteworthy special case: calls to read_file have their
line numbers bucketed into 200-line groups before hashing, to
avoid false positives from paginated reads. On warn (3 repeats), it injects
a HumanMessage("you are repeating yourself — wrap up"). On hard
limit (5 repeats), it strips tool_calls entirely
from the response, forcing a plain-text answer.
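The bucketing trick can be shown in a few lines (assumed shape — the argument names and hashing here are illustrative; DeerFlow's real middleware hashes richer call structures):

```python
import hashlib
import json

def call_signature(tool: str, args: dict) -> str:
    """Hash a tool call for loop detection; read_file line arguments are
    bucketed into 200-line groups so paginated reads of the same file
    don't look like distinct calls — or, conversely, so that a genuine
    re-read of the same region IS detected as a repeat."""
    if tool == "read_file":
        args = dict(args)
        for key in ("start_line", "end_line"):  # illustrative arg names
            if key in args:
                args[key] = args[key] // 200
    blob = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()
```

Reads at lines 10 and 150 of the same file collide into one signature (bucket 0), while a read at line 250 (bucket 1) stays distinct.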
Skill self-evolution is explicit in the system prompt: triggers include
"5+ tool calls used," "user corrected approach," and "non-obvious errors
encountered." The prompt warns hard: "HARD ERROR. The system WILL
discard excess [sub-agent] calls and you WILL lose work." Skills cache
pre-loads in a daemon background thread (warm_enabled_skills_cache(timeout=5.0))
at startup.
Key files
backend/app/gateway/app.py (FastAPI + LangGraph init),
agents/middlewares/loop_detection_middleware.py (200-line bucket, tool_calls strip),
agents/middlewares/ (13-middleware stack),
backend/app/gateway/routers/runs.py (SSE streaming),
backend/langgraph.json
Hermes — Self-improving agent with multi-platform reach and RL infrastructure
Hermes by Nous Research is unique in this study: it is the only agent with a closed learning loop, multi-platform messaging support, and RL training infrastructure in the same codebase.
The skill system stores reusable procedures as SKILL.md
files in ~/.hermes/skills/. Skills are injected as
user messages (not system prompt) to preserve the prompt cache.
The agent can create, edit, and delete skills via skill_manager_tool.py,
and every agent-created skill is security-scanned before saving.
Two memory files persist knowledge across sessions: MEMORY.md
(agent notes) and USER.md (user model). Both are scanned for
prompt injection before loading. The context compressor
(agent/context_compressor.py) uses a five-step algorithm with
structured summaries (Goal, Progress, Decisions, Files, Next Steps) and
iterative update on repeated compression.
The tools/mixture_of_agents_tool.py implements MoA: four
reference models (Claude Opus, Gemini Pro, GPT-5, DeepSeek) run in parallel,
and an aggregator model synthesizes the results. This is an optional tool
the agent can invoke for hard problems — unique in this set.
The gateway covers 14+ messaging platforms: Telegram, Discord, Slack, WhatsApp, Signal, WeChat/WeCom, Matrix, Mattermost, Feishu/Lark, DingTalk, Email, SMS, HomeAssistant, and a generic Webhook adapter. Each platform module is a full adapter with auth, inbound message handling, allowlists, dedup, and ACK logic.
See the dedicated Hermes page for full coverage.
Which code looks best designed?
1. Claude Code
Best end-to-end product coherence. Every tool is a directory, not a file. Permissions, UX, security, and task management are all native product concerns.
2. Crush
Best structural cleanliness. Binary-embedded templates, SHA-256 loop detection, and the Go type system keep the design honest and maintainable.
3. Qwen Code
A very solid multi-provider CLI core. The five-layer config resolver and clean tool/MCP separation are the strongest generic engineering here.
4. Mux
Strongest large-surface product architecture among the provider-rich repos. Ambitious, broad, and still impressively organized.
5. Hermes
Most functionally unique. The learning loop, multi-platform, and RL infrastructure are unlike anything else here. The tradeoff is focus.
Pi Mono — The 438-file extension-first kernel
Pi Mono (@mariozechner/pi-coding-agent,
v0.66.1, MIT, by Mario Zechner) is a TypeScript
coding agent that deliberately omits features other agents bake in:
no MCP, no sub-agents, no permissions, no plan mode, no todos.
Instead it ships a 438-file kernel with 7 core tools,
support for 23 providers, tree-structured JSONL v3 sessions, and a
differential-rendering TUI.
The 7 tools (read, bash, edit, write, grep, find, ls) use TypeBox JSON schemas with AJV validation. The edit tool supports multiple disjoint edits per call with fuzzy matching (normalizes Unicode quotes, dashes, spaces), uniqueness validation, reverse-order application, and line ending preservation. A file mutation queue serializes concurrent writes to the same file — a subtle race condition that plagues many other agents.
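Reverse-order application is the key trick that makes multiple disjoint edits in one call safe: applying from the end of the file keeps the earlier spans' offsets valid. A minimal sketch (illustrative, not Pi Mono's actual edit tool, which works on fuzzy-matched text rather than raw offsets):

```python
def apply_edits(text: str, edits: list[tuple[int, int, str]]) -> str:
    """edits: (start, end, replacement) character spans, assumed disjoint.
    Sorting descending by start means each splice leaves all
    still-unapplied (earlier) offsets untouched."""
    for start, end, replacement in sorted(edits, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text

print(apply_edits("hello world", [(0, 5, "goodbye"), (6, 11, "moon")]))
# goodbye moon
```

Applied front-to-back instead, the first replacement would shift every later span and corrupt the result.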
The bash tool uses a pluggable BashOperations interface
with streaming output, process tree killing, 10MB stdout/stderr caps,
and temp file fallback for overflow. Extensions can intercept via
BashSpawnHook.
Sessions are tree-structured JSONL (v3) with
id/parentId fields enabling in-place
branching. Entry types include message,
thinking_level_change, model_change,
compaction (with firstKeptEntryId),
branch_summary, and label (bookmarks).
The /tree command navigates the tree; /fork
creates a new session file from a branch point.
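The id/parentId scheme means any branch is recoverable by walking parent links from a leaf. A sketch of that reconstruction (entry fields beyond id/parentId are illustrative, not Pi Mono's actual v3 schema):

```python
def branch_path(entries: list[dict], leaf_id: str) -> list[dict]:
    """Walk parentId links from a leaf back to the root, then reverse,
    yielding one linear branch out of the session tree."""
    by_id = {e["id"]: e for e in entries}
    path, cur = [], by_id.get(leaf_id)
    while cur is not None:
        path.append(cur)
        cur = by_id.get(cur.get("parentId"))
    return list(reversed(path))

log = [{"id": "1", "parentId": None, "text": "root"},
       {"id": "2", "parentId": "1", "text": "main"},
       {"id": "3", "parentId": "1", "text": "fork"}]
print([e["text"] for e in branch_path(log, "3")])  # ['root', 'fork']
```

Because entries only ever point at their parent, in-place branching is append-only: a `/fork` just starts writing entries whose parentId is the branch point.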
The agent runtime supports parallel tool execution
(default, configurable to sequential),
steering messages (delivered after current turn —
real-time interruption), followup messages (delivered
after agent stops), and a PendingMessageQueue with
two delivery modes. The event system exposes 20+ lifecycle events
including beforeToolCall/afterToolCall hooks.
23 providers across 10 API implementations: Anthropic,
OpenAI, Google (3 variants), Azure OpenAI, OpenAI Codex, GitHub Copilot,
xAI, Groq, Cerebras, OpenRouter, Vercel AI Gateway, ZAI, Mistral, MiniMax
(2 variants), HuggingFace, OpenCode (2 variants), Kimi Coding, and Amazon
Bedrock. The ModelRegistry (with
14,278 lines of auto-generated model definitions) supports
glob patterns, thinking level suffixes (model:high), alias
preference over dated versions, and ambiguity rejection.
The extension system uses jiti (TypeScript runtime executor) with virtual module support for compiled Bun binaries. Extensions can register tools, commands, shortcuts, flags, message renderers, custom providers, and subscribe to 20+ events. Pi Packages (installable via npm or git) auto-discover extensions, skills, prompts, and themes from their directory structure.
The TUI uses differential rendering — only changed terminal cells are redrawn at 60fps throttle (16ms min). It supports Kitty graphics protocol, hardware cursor via APC escape sequences, and overlay stack with focus management.
The web UI (Lit components) includes a full artifact system (HtmlArtifact, ImageArtifact, MarkdownArtifact, etc.), sandboxed iframes with runtime providers, a JavaScript REPL, and document extraction (PDF, DOCX, XLSX).
Key files
packages/coding-agent/src/core/tools/*.ts (TypeBox tool definitions),
packages/coding-agent/src/core/tools/edit-diff.ts (multi-edit, fuzzy matching),
packages/coding-agent/src/core/tools/file-mutation-queue.ts (per-file locking),
packages/agent/src/agent-loop.ts (parallel execution, steering/followup queues),
packages/ai/src/providers/models.generated.ts (14,278 lines, 23 providers),
packages/tui/src/tui.ts (differential rendering),
packages/coding-agent/src/core/extensions/ (jiti-based extension system),
packages/web-ui/src/tools/artifacts/ (artifact system, JS REPL)
Codex — OpenAI's production coding agent in Rust
Codex is OpenAI's open-source coding agent, implemented entirely in Rust as a Cargo workspace with 70+ crates across 3,805 files (Apache-2.0 license). It represents the most production-grade agent codebase OpenAI has released — the same runtime that powers their internal product, now available for inspection and contribution.
The sandboxing system is the most platform-specific in this set: macOS uses Seatbelt (sandbox-exec), Linux uses bubblewrap (bwrap), and Windows uses process tokens and job objects. Each platform gets a native sandbox implementation rather than a cross-platform abstraction. The sandbox profiles define fine-grained filesystem, network, and process isolation policies.
MCP is bidirectional — Codex can act as an MCP server (exposing its tools to external clients) and as an MCP client (loading external MCP servers). Multi-agent job execution is built-in: the agent can spawn sub-jobs with their own tool scopes and collect results asynchronously. IDE extensions are first-class: VS Code and JetBrains integrations ship alongside the CLI.
The core business logic lives in codex-rs/core, the TUI uses
Ratatui in codex-rs/tui, tool schemas are
defined in codex-rs/tools, platform sandboxes are in
codex-rs/sandboxing, and the execution policy rule engine is
in codex-rs/execpolicy.
Key files
codex-rs/core/ (business logic),
codex-rs/tui/ (Ratatui TUI),
codex-rs/tools/ (tool schemas),
codex-rs/sandboxing/ (Seatbelt/bubblewrap/Windows tokens),
codex-rs/execpolicy/ (rule engine)
Wintermolt — The 3 MB everything-agent in Zig
Wintermolt compiles to a single 3 MB native binary
with zero runtime (no Node.js, no Python, no Electron). It links only
libcurl and sqlite3, both pre-installed on
macOS and most Linux distributions. It can be cross-compiled to any
Zig target — including ARM boards like Jetson and Raspberry Pi — with
one command: zig build -Dtarget=aarch64-linux-gnu.
The codebase is ~18,400 lines across 51 Zig source files,
plus two prebuilt static libraries from sibling projects. The agentic
loop (src/agent/loop.zig, 646 lines) runs up to
25 tool iterations per turn with automatic fallback
(primary → Ollama → OpenAI) if the backend fails.
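The fallback-plus-budget shape is easy to show in miniature. Wintermolt is Zig, so this is a hedged TypeScript sketch; every name below, including the "TOOL:" convention, is illustrative:

```typescript
// Each request walks the backend chain in order — primary, then Ollama,
// then OpenAI — and the tool loop is capped at a fixed budget per turn.
type Backend = (prompt: string) => string; // throws on failure

function completeWithFallback(prompt: string, chain: Backend[]): string {
  let lastErr: unknown;
  for (const backend of chain) {
    try {
      return backend(prompt);
    } catch (err) {
      lastErr = err; // fall through to the next backend
    }
  }
  throw new Error(`all backends failed: ${String(lastErr)}`);
}

function agentTurn(prompt: string, chain: Backend[], maxIterations = 25): string {
  let observation = prompt;
  for (let i = 0; i < maxIterations; i++) {
    const reply = completeWithFallback(observation, chain);
    if (!reply.startsWith("TOOL:")) return reply; // no tool call -> final answer
    observation = `result of ${reply.slice(5)}`;  // stand-in for running the tool
  }
  return "iteration budget exhausted";
}
```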
6 AI backends share a unified wire protocol
(src/api/protocol.zig, 281 lines). Three hand-written
streaming parsers handle the different formats: SSE for Anthropic
(461 lines), SSE for OpenAI (287 lines), and NDJSON for Ollama
(250 lines). The DeepSeekClient is effectively a
universal OpenAI-compatible client reused for OpenAI, DeepSeek, Qwen,
Gemini, and custom endpoints.
16+ built-in tools span bash, file I/O, grep, glob, HTTP, web search, camera capture, image processing, Chrome DevTools Protocol browser automation (~895 lines), Pinecone RAG memory search, cron scheduling, Tailscale mesh queries, A2UI canvas rendering, and CLI-Anything harness generation. Tool dispatch uses a 3-layer system: built-in → runtime skills → MCP remote tools.
MCP is bidirectional — server (~214 lines) exposes
20+ tools via JSON-RPC 2.0 over stdio; client (~421 lines) loads
external MCP servers from config, spawns them as child processes, runs
the 3-step handshake, and prefixes discovered tools with
"servername__".
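The prefixing step is simple but load-bearing: it keeps remote tool names from colliding with built-ins or with tools from other servers. A sketch with assumed names (Wintermolt's actual implementation is Zig):

```typescript
// Tools discovered from an external MCP server are namespaced with
// "<servername>__" before registration.
interface DiscoveredTool { name: string; description: string }

function namespaceTools(server: string, tools: DiscoveredTool[]): DiscoveredTool[] {
  return tools.map((t) => ({ ...t, name: `${server}__${t.name}` }));
}

// Splitting the name back apart when the model invokes a namespaced tool:
function routeToolCall(qualified: string): { server: string; tool: string } | null {
  const sep = qualified.indexOf("__");
  if (sep <= 0) return null; // not an MCP tool -> fall through to built-ins / skills
  return { server: qualified.slice(0, sep), tool: qualified.slice(sep + 2) };
}
```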
The cron scheduler (src/agent/scheduler.zig,
756 lines) is SQLite-persisted with three schedule types (every/at/cron)
and auto-disables jobs after max_retries failures.
No other agent in this set has a built-in cron scheduler.
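The auto-disable rule fits in a few lines. Field names below are assumptions, not Wintermolt's actual SQLite schema:

```typescript
// A job accumulates consecutive failures and is disabled once it reaches
// its retry budget; a success resets the counter.
interface Job {
  name: string;
  maxRetries: number;
  failures: number;
  enabled: boolean;
}

function recordRun(job: Job, succeeded: boolean): Job {
  if (succeeded) return { ...job, failures: 0 };
  const failures = job.failures + 1;
  return { ...job, failures, enabled: failures < job.maxRetries };
}
```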
All sidecars (chat, web, menubar, gateway) follow the same
JSON-lines-over-stdio IPC pattern — Zig spawns a
child process (TypeScript/Swift) and communicates via clean stdin/stdout
pipes. The chat bridge uses a 7-tier priority binding system
(src/agent/router.zig, 585 lines).
Key files
src/agent/loop.zig (agentic loop, 646 lines),
src/agent/tools.zig (tool dispatch, 717 lines),
src/agent/scheduler.zig (cron, 756 lines),
src/tools/browser.zig (CDP, 895 lines),
src/web/bridge.zig (WebSocket, 1,062 lines),
src/mcp/client.zig (MCP client, 421 lines),
src/agent/rag.zig (Pinecone RAG, 577 lines)
Zaica — The focused Zig coding specialist
Zaica is a ~9,100-line Zig 0.15 coding assistant
with zero runtime dependencies beyond the standard library. It is
distributed via Homebrew for macOS (aarch64 + x86_64)
and Linux (aarch64 + x86_64). It uses std.http.Client
directly — no HTTP library needed.
The central abstraction is the Node in
src/node.zig (731 lines) — the generic agentic loop
used by both the REPL and sub-agents. In terminal mode, tools run
in parallel using std.Thread.spawn;
in silent mode (sub-agents, chain mode), they run
sequentially to avoid threads-inside-threads.
Zaica's most distinctive feature is its Wyhash-based loop detection: a ring buffer of tool call signatures detects repeating patterns of length 1, 2, or 3 within a 10-call window with 3-tier escalation (warning → stronger warning → force break). The same tool with different arguments gets a different hash — so legitimate iteration isn't flagged. This is the most sophisticated loop detection in this entire directory set.
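The detection idea translates directly: keep recent call signatures and test the tail for a short repeating period. This sketch uses plain strings where Zaica uses Wyhash values, and omits the 3-tier escalation:

```typescript
// A signature covers tool name AND arguments, so legitimate iteration
// (same tool, different file) produces different entries.
function signature(tool: string, args: unknown): string {
  return `${tool}:${JSON.stringify(args)}`;
}

// Returns the repeating period (1, 2, or 3) if the tail of the window is
// a pattern repeated at least three full times; null otherwise.
function detectLoop(history: string[], window = 10): number | null {
  const tail = history.slice(-window);
  for (const period of [1, 2, 3]) {
    const needed = period * 3; // require three full repetitions
    if (tail.length < needed) continue;
    const recent = tail.slice(-needed);
    let looping = true;
    for (let i = 0; i < needed - period; i++) {
      if (recent[i] !== recent[i + period]) { looping = false; break; }
    }
    if (looping) return period;
  }
  return null;
}
```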
Chain mode (src/chain.zig, 528 lines)
implements structured workflows via .chain.md files
with per-step tool filtering, variable substitution
({task}, {previous}), and max iterations.
No other agent in this set offers chain mode.
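The substitution step is the easiest piece to show. The {task}/{previous} variable names come from the chain-mode description above; the function itself is illustrative TypeScript, not Zaica's Zig:

```typescript
// Render one chain step's prompt: {name} placeholders are filled from the
// variables map; unknown placeholders pass through untouched.
function renderStep(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (m, name) => vars[name] ?? m);
}
```

In a chain, {previous} would carry the prior step's output forward, giving each step a fresh, filtered context.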
Instead of imperative state management, zaica uses a custom
reactive state graph via zefx
(Effector-inspired). Events trigger Store reducers, which trigger
watchers in a two-phase flush. The status bar is a watcher on
derived stores — a unique architecture for a coding agent.
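The event → reducer → watcher shape can be sketched in TypeScript (zefx itself is Zig; this elides the two-phase flush and derived-store machinery and only shows the data flow):

```typescript
// A store holds state and notifies watchers when its value changes.
class Store<T> {
  private watchers: Array<(v: T) => void> = [];
  constructor(private value: T) {}
  get(): T { return this.value; }
  set(next: T): void {
    if (next === this.value) return; // no-op updates don't flush
    this.value = next;
    for (const w of this.watchers) w(next);
  }
  watch(fn: (v: T) => void): void { this.watchers.push(fn); }
}

// An event is just a typed trigger bound to a store reducer.
function createEvent<P>(reduce: (payload: P) => void): (payload: P) => void {
  return reduce;
}
```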
The 3-tier permission model is elegantly simple:
[y]es all / [s]afe only / [n]o,
asked once per session. The bash wrapper redirects stdin from
/dev/null, kills process trees on timeout, and caps
output at 1 MB.
The config system uses a 6-layer JSON priority chain
(comptime defaults → provider presets → user config → project config
→ env vars → CLI flags) with deep object merging. The REPL
(src/repl.zig, 2,153 lines) implements manual line
editing with full UTF-8 and Cyrillic support.
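The 6-layer chain behaves like a left fold with deep merge. A TypeScript sketch under assumed semantics (objects merge key-by-key; scalars and arrays are replaced):

```typescript
type Config = Record<string, unknown>;

function isPlainObject(v: unknown): v is Config {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Later layers win; nested plain objects merge instead of replacing.
function deepMerge(base: Config, layer: Config): Config {
  const out: Config = { ...base };
  for (const [k, v] of Object.entries(layer)) {
    const prev = out[k];
    out[k] = isPlainObject(prev) && isPlainObject(v) ? deepMerge(prev, v) : v;
  }
  return out;
}

// defaults -> provider presets -> user config -> project config -> env -> CLI flags
function resolveConfig(layers: Config[]): Config {
  return layers.reduce((acc, layer) => deepMerge(acc, layer), {});
}
```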
Key files
src/node.zig (agentic loop, 731 lines),
src/repl.zig (full REPL, 2,153 lines),
src/chain.zig (structured workflows, 528 lines),
src/state.zig (reactive state, 600 lines),
src/tools.zig (7 tools + permissions, 631 lines),
src/client/sse.zig (SSE parser, 408 lines),
lib/zefx/ (reactive engine)
Dirac — Hash-anchored edits, AST precision, and 64.8% cost reduction
Dirac is a TypeScript coding agent — a fork of Cline — that takes a
fundamentally different approach to file editing. Where most agents use
line numbers (which drift when the file changes), Dirac uses
stable line hashes to anchor edits. When the model reads
a file, each line gets a deterministic hash. Edits are then specified as
anchor + end_anchor + replacement text rather than
start_line + end_line + text. This means edits survive file
shifts and multiple disjoint edits can be applied in a single batch
with no coordinate conflicts.
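The core idea fits in a toy example: a line's identity is its hash, so an edit names content, not coordinates. This sketch uses a truncated SHA-256 per line and ignores duplicate-line disambiguation, which the real scheme must handle:

```typescript
import { createHash } from "node:crypto";

function lineHash(line: string): string {
  return createHash("sha256").update(line).digest("hex").slice(0, 8);
}

interface AnchoredEdit { anchor: string; endAnchor: string; replacement: string[] }

// Replace the span [anchor..endAnchor] by looking the hashes up in the
// current file — the edit survives even if the file shifted since reading.
function applyEdit(lines: string[], edit: AnchoredEdit): string[] {
  const hashes = lines.map(lineHash);
  const start = hashes.indexOf(edit.anchor);
  const end = hashes.indexOf(edit.endAnchor, start);
  if (start === -1 || end === -1) throw new Error("anchor not found -> edit rejected");
  return [...lines.slice(0, start), ...edit.replacement, ...lines.slice(end + 1)];
}
```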
The EditExecutor (src/core/task/tools/handlers/edit-file/EditExecutor.ts)
resolves anchors by checking that the anchor name starts with a capital
letter, exists in the file's line hash list, and that the provided content
matches the actual file content at that hash. If any check fails, the edit
fails with a diagnostic. The BatchProcessor applies multiple
edits in reverse line order — highest line index first — so that earlier
edits don't shift the coordinates of later ones.
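The ordering argument is easiest to see with plain indices (Dirac resolves positions from hashes; indices here only make the shift visible):

```typescript
interface LineEdit { start: number; end: number; replacement: string[] }

// Apply the highest-index edit first: edits earlier in the file then still
// see their original coordinates, even when a replacement changes length.
function applyBatch(lines: string[], edits: LineEdit[]): string[] {
  const ordered = [...edits].sort((a, b) => b.start - a.start);
  let out = [...lines];
  for (const e of ordered) {
    out = [...out.slice(0, e.start), ...e.replacement, ...out.slice(e.end + 1)];
  }
  return out;
}
```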
Beyond hash-anchored text edits, Dirac has AST-aware tools that target specific symbols — classes, functions, interfaces — directly rather than by text position. This means edits are always syntactically valid; JSDoc comments, decorators, and type annotations are preserved automatically. This is the structural equivalent of the hash-anchored system: both prevent the "friction" of coordinate-based editing.
Dirac's token efficiency is a deliberate engineering target backed by
multiple mechanisms: hash-anchored multi-file batching (multiple files
edited in a single LLM roundtrip), get_file_skeleton for
project structure mapping without reading every line, ContextManager
truncation with half/quarter strategies, concurrent tool calling, and
a minimal PRIME DIRECTIVES system prompt. On 8 real-world refactoring
tasks, Dirac achieved 8/8 correct at an average cost of
$0.18 — versus $0.38–$0.73 for competitors. That is
roughly a 2.8x cost advantage — the 64.8% reduction in this section's title.
The state mutex pattern (via p-mutex) is
used to serialize all state modifications in the main Task loop. This
prevents race conditions between the concurrent tool executor and the
main task loop without sacrificing the performance benefit of parallel
tool calls. Every state modification — tool results, message updates,
task history writes — goes through withStateLock.
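The pattern is a classic promise-chain mutex. This standalone sketch mimics the withStateLock shape without the p-mutex dependency:

```typescript
// Each critical section is chained onto the previous one, so state
// mutations from concurrent tool executors are serialized.
class StateMutex {
  private tail: Promise<unknown> = Promise.resolve();

  withStateLock<T>(fn: () => Promise<T> | T): Promise<T> {
    const run = this.tail.then(fn, fn);      // run even if a prior section threw
    this.tail = run.catch(() => undefined);  // keep the chain alive on errors
    return run;
  }
}
```

Tool calls still run in parallel; only the moment each one writes its result back is serialized, which is why the pattern keeps the performance benefit.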
Dirac has an 8-type hook system: onTaskStart, onTaskComplete,
onTaskCancel, onTaskResume, preToolUse, postToolUse, preCompact, and preRequest.
Hooks are auto-discovered from AGENTS.md,
.claude/, or .agents/ directories. The hook executor
runs scripts with structured JSON via stdin and supports cancellation via
AbortController. Hooks can return a cancel decision,
a contextModification to alter behavior, or an
errorMessage.
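Interpreting a hook's reply is a small decision function over the three fields named above. The surrounding types are assumptions about shape, not Dirac's actual code:

```typescript
interface HookResult {
  cancel?: boolean;
  contextModification?: string;
  errorMessage?: string;
}

type HookOutcome =
  | { kind: "proceed"; extraContext?: string }
  | { kind: "cancelled"; reason: string };

function interpretHook(raw: string): HookOutcome {
  let result: HookResult;
  try {
    result = JSON.parse(raw) as HookResult;
  } catch {
    return { kind: "proceed" }; // malformed hook output shouldn't block the task
  }
  if (result.cancel) {
    return { kind: "cancelled", reason: result.errorMessage ?? "cancelled by hook" };
  }
  return { kind: "proceed", extraContext: result.contextModification };
}
```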
The git checkpoint system creates commits before risky
operations, enabling revert. Plan/Act mode separation is
first-class — Plan mode gathers information and presents a strategy before
asking for user approval to switch to Act mode. YOLO mode
(dirac -y) runs fully autonomously with auto-approval.
Shell command validation uses DIRAC_COMMAND_PERMISSIONS — a
JSON object with allow/deny glob patterns — more flexible than a ban list.
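Under the assumption that deny patterns win and anything unmatched is refused, the check is a two-pass glob scan. The semantics here are an interpretation of the description above, not Dirac's verified behavior:

```typescript
interface CommandPermissions { allow: string[]; deny: string[] }

// Anchored glob: "*" matches anything, everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function isCommandAllowed(cmd: string, perms: CommandPermissions): boolean {
  if (perms.deny.some((p) => globToRegExp(p).test(cmd))) return false; // deny wins
  return perms.allow.some((p) => globToRegExp(p).test(cmd));           // default deny
}
```

The flexibility over a flat ban list is visible in a config like `{"allow": ["git *"], "deny": ["git push*"]}` — read-only git is permitted while pushes stay blocked.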
The subagent system (SubagentToolHandler) spawns isolated
children with their own configuration. Skills are
auto-discovered from AGENTS.md, .claude/, and
.agents/. Provider support covers 40+ APIs
including Anthropic, OpenAI, Google, AWS Bedrock, Azure, and many
OpenAI-compatible gateways.
The most notable gap: Dirac does not implement MCP support. This is an explicit design decision, not an oversight. For users who need MCP integration, it is a real limitation. For users who want a tightly integrated, self-contained agent with the best structural edit accuracy in the TypeScript agent space, it is a feature.
Key files
src/core/task/index.ts (1,868 lines — agent loop, state mutex, tool orchestration),
src/core/task/tools/handlers/edit-file/EditExecutor.ts (hash-anchor resolution),
src/core/task/tools/handlers/edit-file/BatchProcessor.ts (reverse-order batch processor),
src/core/context/context-management/ContextManager.ts (half/quarter truncation),
src/core/hooks/hook-executor.ts (8-type hook system with streaming and cancellation),
src/core/prompts/system-prompt/template.ts (PRIME DIRECTIVES),
src/core/api/retry.ts (exponential backoff 2s, 4s, 8s),
cli/man/dirac.1.md (full CLI reference)
Important caveat
"Best designed" here means best aligned between code and product intent, not "best for every user." Kimi is more protocol-focused, OpenHands is more platform-shaped, DeerFlow is more compositional, and Neovate is more security-conscious about shell execution. Different goals produce different tradeoffs.