The Agent Harness Field Guide
This blog audits the code that is actually present in
coding-agents\: how each agent wires models, exposes tools,
runs shell commands, speaks MCP or ACP, and where Claude Code and Hermes
feel fundamentally different from the rest of the field. It also covers
OpenAI's own Codex CLI, a Rust-native agent with platform-specific
sandboxes and bidirectional MCP support.
What this deep dive covers
The interesting part of coding agents is not the marketing layer. It is the runtime beneath it: whether the tool layer is generic or bespoke, whether shell access is tightly guarded or casually wrapped, whether model support is truly abstracted or just superficially multiplexed, and whether the repo reads like a productized operating environment or a fast-moving integration shell.
I read the local repositories for Pochi, Neovate Code, Mux, Crush, Kimi CLI, Qwen Code, OpenHands, Claude Code, DeerFlow, Hermes Agent (by Nous Research), Codex CLI (by OpenAI), Pi Mono, Wintermolt, Zaica, Goose, Open Claude Code 2.0 (a clean-room implementation via AI decompilation), and Dirac, then mapped them against the same questions on every page of this site.
Methodology note
This is a local-only repo study. I did not use outside documentation beyond what is already checked into the worktree. That matters most for OpenHands, where the repo itself says the newer V1 agent core now lives elsewhere.
Repos audited
- Pochi - TypeScript monorepo with vendor-specific model adapters and built-in agents.
- Neovate Code - TypeScript CLI with AI SDK providers, MCP, and hardened bash tooling.
- Mux - large TypeScript desktop/browser agent platform with workspaces and provider routing.
- Crush - Go-based terminal product with custom tools, permissions, and provider metadata plumbing.
- Kimi CLI - Python terminal agent focused on Kimi platform flows plus ACP and MCP bridges.
- Qwen Code - Gemini CLI descendant with strong config resolution, declarative tools, and MCP lifecycle management.
- OpenHands - platform/runtime repo with sandbox and legacy CodeAct agent pieces still present locally.
- Claude Code - deeply integrated Bun + React/Ink runtime centered on Anthropic models and a huge internal tool surface.
- DeerFlow - LangGraph-based super-agent harness with middleware, subagents, SSE streaming, and config-driven model factories.
- Hermes Agent (Nous Research) - Python agent with persistent skill learning, 14+ messaging platform gateways (Telegram, Discord, Slack, WhatsApp, Signal, WeChat, Matrix, Mattermost, Feishu, DingTalk, Email, SMS, HomeAssistant, Webhook), MoA synthesis, 6 execution backends, and RL training infrastructure.
- Codex CLI (OpenAI) - Rust-native coding agent (3,805 files, 70+ crates) with macOS Seatbelt, Linux bubblewrap/Landlock, Windows sandbox, MCP client and server, multi-agent spawning, OpenAI Responses API plumbing, TOML-based configuration, IDE extensions, and a Ratatui TUI.
- Pi Mono - TypeScript minimalist kernel (438 files) with tree-structured JSONL v3 sessions, differential TUI rendering, Pi Packages (shareable bundles via npm/git), 23 providers across 10 APIs, parallel tool execution, file mutation queue, 4 run modes, MIT license, author Mario Zechner.
- Wintermolt - Zig 0.15 native binary (3 MB, zero runtime) with 6 AI backends, 16 tools, cron scheduling, Tailscale mesh, camera vision, browser automation, MCP bidirectional, chat bridges to 4 platforms, and a macOS menu bar app.
- Zaica - Zig 0.15 focused coding agent (~9,100 lines, zero runtime) with multi-provider LLM support, chain-mode structured workflows, parallel sub-agent dispatch, reactive state management (zefx), and Wyhash-based loop detection.
- Goose (AAIF/Linux Foundation) - Rust-native AI agent (v1.32.0, Apache-2.0) with 15+ providers, 5-layer security inspector stack, LLM-based AdversaryInspector, 4 GooseModes (Auto/Approve/SmartApprove/Chat), extension system, recipe framework, and MOIM injection.
- Open Claude Code 2.0 - Clean-room implementation of Claude Code via AI-powered decompilation (1,581 tests, async generator architecture, multi-agent teams, git worktree isolation, 7-type hook system).
- Dirac - TypeScript fork of Cline with hash-anchored parallel edits, AST-native precision, multi-file batching, 64.8% cost reduction, no MCP, hook system, git checkpoints, state mutex, and 40+ provider support. 8/8 on TerminalBench 2.0 evals.
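The "hash-anchored parallel edits" in the Dirac entry are easier to picture with a toy model. The sketch below is my own illustration of the general idea, not Dirac's implementation: `line_anchor`, `AnchoredEdit`, and `apply_edits` are hypothetical names, and the 8-character SHA-256 prefix is an arbitrary choice. Each edit addresses a line by a hash of its content instead of by line number, so a batch of edits can be resolved against the same original snapshot in any order, and a stale or ambiguous anchor is rejected instead of silently hitting the wrong line.

```python
import hashlib
from dataclasses import dataclass


def line_anchor(line: str) -> str:
    """Short content hash used to address a line (hypothetical scheme)."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()[:8]


@dataclass(frozen=True)
class AnchoredEdit:
    anchor: str       # hash of the line the edit expects to replace
    replacement: str  # new text for that line


def apply_edits(lines: list[str], edits: list[AnchoredEdit]) -> list[str]:
    """Apply each edit whose anchor matches exactly one original line."""
    # Index the ORIGINAL lines once; every edit resolves against this snapshot.
    index: dict[str, list[int]] = {}
    for i, line in enumerate(lines):
        index.setdefault(line_anchor(line), []).append(i)

    result = list(lines)
    for edit in edits:
        hits = index.get(edit.anchor, [])
        if len(hits) != 1:
            # Stale anchor (0 hits) or ambiguous anchor (2+ hits): refuse.
            raise ValueError(f"anchor {edit.anchor} matched {len(hits)} lines")
        result[hits[0]] = edit.replacement
    return result
```

Because this toy version only replaces single lines, positions never shift and the edit order does not matter. A real tool would also need to handle insertions, deletions, and duplicate lines.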
The landscape in one screen
Bespoke runtime products
Claude Code and Crush feel like full terminal operating environments, not thin wrappers. Their tool, permission, and UX layers are part of the product, not just adapters around a chat loop.
Most opinionated
Provider multiplexers
Mux, Neovate, and Qwen Code all build serious provider catalogs and shared abstractions. They want broad model reach more than a single model-native identity.
Most configurable
Protocol and adapter layers
Pochi and Kimi CLI stand out for their ecosystem bridges. Pochi ships vendor-specific packages for Codex, Qwen, Copilot, and others. Kimi invests heavily in ACP and in translating internal tool output into protocol-friendly shapes.
Most bridge-heavy
Agent frameworks and sandboxes
DeerFlow and OpenHands are less about one CLI persona and more about broader orchestration: sandboxes, middleware, long-running services, app servers, and task execution environments.
Most framework-like
Minimalist extension-first agents
Pi Mono ships a razor-sharp kernel (438 files) with tree-structured JSONL v3 sessions, differential TUI rendering, and 23 providers across 10 APIs — plus Pi Packages (shareable bundles via npm/git), parallel tool execution, a file mutation queue, and 4 run modes. MIT licensed by Mario Zechner.
Most minimalist
Self-improving multi-platform agents
Hermes Agent (Nous Research) is in a category of its own: a persistent skill-learning loop, six remote execution backends, 14+ messaging platform gateways (Telegram, Discord, Slack, WhatsApp, Signal, WeChat, Matrix, Mattermost, Feishu, DingTalk, Email, SMS, HomeAssistant, Webhook), MoA synthesis, and RL training infrastructure.
Most functionally unique
Zero-runtime native binaries
Wintermolt and Zaica are both written entirely in Zig 0.15 — no Node.js, no Python, no garbage collector. They compile to single native binaries (Wintermolt is 3 MB) with cross-compilation to any Zig target including ARM boards. Wintermolt goes wide (7 modes, cron, Tailscale, camera, browser, MCP, chat bridges). Zaica goes deep (chain-mode workflows, reactive state, Wyhash loop detection).
Most portable
OpenAI's own coding agent
Codex CLI is OpenAI's production coding agent — a Rust workspace of 70+ crates with platform-specific sandboxes (Seatbelt, bubblewrap/Landlock, Windows restricted tokens), bidirectional MCP (client and server), multi-agent job execution, and a configurable provider system that supports Ollama and LM Studio.
Most sandboxed
Extension-first security advocates
Goose (AAIF at Linux Foundation) takes a unique approach
to security: it uses an LLM-based AdversaryInspector that fires
a second LLM call to review tool calls against user-defined rules from
~/.config/goose/adversary.md. This is defense-in-depth for
multi-agent setups where parent agents delegate to sub-agents. With 15+ providers,
4 GooseModes (Auto/Approve/SmartApprove/Chat), and a recipe framework,
Goose is Rust-native with extensive feature gates (local-inference, aws-providers, otel).
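To make the "second LLM call" pattern concrete, here is a minimal sketch of an inspector that reviews a tool call against free-text rules. It is not Goose's code (Goose is Rust, and I am not reproducing its types or prompt): `ToolCall`, `build_review_prompt`, and `review_tool_call` are hypothetical names, the ALLOW/BLOCK reply convention is an assumption, and the reviewer is an injected function so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


# A reviewer takes a prompt and returns the model's raw text verdict.
# In a real harness this would be the second LLM call; here it is injected.
Reviewer = Callable[[str], str]


def build_review_prompt(rules: str, call: ToolCall) -> str:
    """Combine the user's rules and the pending call into one review prompt."""
    return (
        "You are reviewing a tool call against the user's rules.\n"
        f"Rules:\n{rules}\n\n"
        f"Tool: {call.name}\nArguments: {call.arguments!r}\n\n"
        "Answer with ALLOW or BLOCK on the first line."
    )


def review_tool_call(call: ToolCall, rules: str, reviewer: Reviewer) -> bool:
    """Return True only if the reviewer explicitly allows the call."""
    lines = reviewer(build_review_prompt(rules, call)).strip().splitlines()
    # Fail closed: an empty or unexpected answer counts as a block.
    return bool(lines) and lines[0].strip().upper() == "ALLOW"
```

The design choice worth noting is the fail-closed default: if the reviewing model returns anything other than an explicit allow, the call does not run. In the setup described above, `rules` would be the contents of the user's rules file.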
Fast takeaways
| Question | Best answer from this snapshot | Why |
|---|---|---|
| Which repo feels most different? | Claude Code | It is the least generic and the most integrated: Anthropic-first, huge tool catalog, plan/worktree/team flows, React terminal UI, permission system, and a massive central query runtime. |
| Which repos are most model-agnostic? | Mux, Neovate, Qwen Code, Goose | All invest in provider registries, routing layers, and shared config resolution instead of pinning themselves to one native model family. Goose ships 35+ provider modules across direct APIs, ACP bridges, and declarative JSON configs. |
| Which repo adapts multiple ecosystems most explicitly? | Pochi | It does not stop at a generic provider interface; it ships vendor-specific packages for Codex, Qwen Code, GitHub Copilot, Gemini CLI, and more. |
| Which shell tooling is most safety-conscious? | Neovate Code, Claude Code, and Goose | Neovate hard-codes command bans and high-risk detection (22-item banned list, quote-aware pipeline parser), while Claude layers permissions, tree-sitter AST analysis, and Zsh-specific attack detection over a richer command surface. Hermes uses supply-chain verification (cosign provenance) for its execution environment. Goose uniquely uses an LLM-based AdversaryInspector that fires a second LLM call to review tool calls against user-defined rules. |
| Which repo has the most unique capabilities? | Hermes Agent | Self-improving skill loop, 6 remote backends, multi-platform IM gateways, MoA synthesis across 4 frontier models, and RL training infrastructure — none of which appear anywhere else in this set. |
| Which repo is hardest to judge from local code alone? | OpenHands | The local repo still contains useful architecture, but its own docs say the newer V1 agent core moved to a separate Software Agent SDK repository. |
| Which code feels most polished? | Claude Code, Crush, Mux, Qwen Code | These four snapshots show the clearest internal consistency between product goals, tool design, configuration, and error handling. Crush is notable for being the only agent with native LSP diagnostics and Sourcegraph code search as first-class tools. |
| Which repo is the most extension-friendly? | Pi Mono | A tight 438-file kernel that deliberately ships without MCP, permissions, or sub-agents — expecting you to compose them via extensions. Pi Packages let you bundle and share configurations across projects via npm or git. |
| Which repo has the most platform-specific sandboxing? | Codex CLI | Three separate sandbox implementations — macOS Seatbelt, Linux bubblewrap/Landlock, and Windows restricted tokens — each with split-filesystem awareness and carveout support. Also the only agent in this set that doubles as an MCP server for other agents. |
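The "banned list plus quote-aware pipeline parser" combination from the shell-safety row can be sketched in a few lines. This is an illustration of the technique, not Neovate's code: the four entries in `BANNED` are placeholders, not its actual 22-item list, and `split_pipeline` and `banned_commands` are names I made up. The point is that a naive `command.split("|")` would cut inside quoted strings, while a quote-aware pass only splits on operators that the shell would actually treat as operators.

```python
import os
import shlex

OPERATORS = ("&&", "||", "|", ";")                    # longest operators first
BANNED = frozenset({"curl", "wget", "nc", "telnet"})  # illustrative entries only


def split_pipeline(command: str) -> list[str]:
    """Split on shell operators, ignoring operators inside quotes."""
    segments: list[str] = []
    current: list[str] = []
    quote = None
    i = 0
    while i < len(command):
        ch = command[i]
        if quote is not None:
            current.append(ch)
            # Inside double quotes a backslash escapes the next character.
            if ch == "\\" and quote == '"' and i + 1 < len(command):
                current.append(command[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
            continue
        if ch in ("'", '"'):
            quote = ch
            current.append(ch)
            i += 1
            continue
        if ch == "\\" and i + 1 < len(command):
            current.append(command[i:i + 2])  # keep the escaped character
            i += 2
            continue
        op = next((o for o in OPERATORS if command.startswith(o, i)), None)
        if op is not None:
            segments.append("".join(current).strip())
            current = []
            i += len(op)
            continue
        current.append(ch)
        i += 1
    if quote is not None:
        raise ValueError("unterminated quote")
    segments.append("".join(current).strip())
    return [s for s in segments if s]


def banned_commands(command: str) -> list[str]:
    """Return the first word of every pipeline segment that is on the ban list."""
    found = []
    for segment in split_pipeline(command):
        words = shlex.split(segment)
        if words and os.path.basename(words[0]) in BANNED:
            found.append(words[0])
    return found
```

This sketch deliberately ignores command substitution, backticks, background `&`, newlines, and leading `VAR=value` assignments, all of which a production checker would have to cover.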
Approximate codebase size by file count
File count is not the same thing as quality, but it does reveal where the implementation surface is broadest.
- Codex CLI - 3,805 files
- OpenHands - 2,774 files
- Mux - 2,226 files
- Claude Code - 2,137 files
- Qwen Code - 2,038 files
- Pochi - 1,315 files
- Kimi CLI - 899 files
- DeerFlow - 810 files
- Crush - 799 files
- Neovate - 582 files
- Hermes - ~450 files
- Pi Mono - 438 files
- Wintermolt - 51 Zig files (~18,400 lines)
- Zaica - ~13 files (~9,100 lines)
- Open Claude Code 2.0 - 61 files (~8,300 lines)
- Goose - Rust Cargo workspace (~6+ crates)
- Dirac - TypeScript monorepo (fork of Cline)
My high-level verdict
Best designed, if you value a coherent product runtime
Claude Code is the standout. It is not the most provider-flexible repo, but it is the clearest example of an agent built as its own operating model: tool schemas, permissioning, commands, tasking, worktrees, UI, feature flags, and retry logic all sit inside one deliberate runtime.
Best designed, if you value clean systems engineering
Crush is the nicest surprise. The Go codebase feels disciplined, modular, and product-minded without being bloated. Its provider plumbing, permissions, and TUI organization are easier to reason about than many faster-moving TypeScript peers.
Best multi-model architecture
Mux and Qwen Code lead here. Mux has a broad provider routing layer with desktop app ambitions, while Qwen Code has a particularly strong configuration and runtime model-resolution story.
Most extensible framework shape
DeerFlow wins on composability. It feels more like a harness for building agent systems than a single agent persona, which makes it powerful but also less opinionated than Claude Code or Crush.
Most functionally unique
Hermes Agent by Nous Research. The self-improving skill loop, 14+ messaging platform gateways, MoA tool (4 frontier models in parallel), and RL training infrastructure are not features in any other repo here. It is the only agent that explicitly tries to get better at your tasks over time.
Most portable — zero runtime, one binary
Wintermolt and Zaica are the only agents here that compile to a single native binary with zero runtime dependency. Wintermolt (3 MB, ~18,400 lines) has the broadest feature scope relative to its size of any agent in this set. Zaica (~9,100 lines) is the most focused coding specialist, with chain-mode workflows and best-in-class loop detection.
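Hash-based loop detection is simple enough to show in miniature. The sketch below is an assumption-laden stand-in for the idea and not Zaica's implementation: Zaica is written in Zig and uses Wyhash, while this version uses `hashlib.blake2b` from the Python standard library, and the `LoopDetector` class, window size, and repeat threshold are all hypothetical. Each tool call is reduced to a small fingerprint, and the agent is flagged as looping once the same fingerprint shows up too often in a sliding window.

```python
import hashlib
import json
from collections import deque


class LoopDetector:
    """Flag an agent that keeps issuing the same tool call."""

    def __init__(self, window: int = 8, max_repeats: int = 3):
        self._recent = deque(maxlen=window)  # fingerprints of the last N calls
        self._max_repeats = max_repeats

    @staticmethod
    def fingerprint(tool: str, arguments: dict) -> bytes:
        # Canonical JSON so that key order does not change the hash.
        payload = json.dumps([tool, arguments], sort_keys=True, separators=(",", ":"))
        return hashlib.blake2b(payload.encode("utf-8"), digest_size=8).digest()

    def record(self, tool: str, arguments: dict) -> bool:
        """Record a call; return True once it has repeated too often in the window."""
        fp = self.fingerprint(tool, arguments)
        self._recent.append(fp)
        return self._recent.count(fp) >= self._max_repeats
```

A harness would call `record` before executing each tool call and, on `True`, interrupt the turn or inject a message telling the model it is repeating itself. Comparing fixed-size fingerprints keeps the check cheap even when tool arguments are large.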
Most extension-friendly kernel
Pi Mono by Mario Zechner. A tight 438-file TypeScript kernel with tree-structured JSONL v3 sessions, differential TUI rendering, 23 providers across 10 APIs, Pi Packages (shareable bundles via npm/git), parallel tool execution, a file mutation queue, and 4 run modes. MIT licensed and deliberately minimal so you can build MCP, permissions, or sub-agents yourself.
Most security-conscious sandboxing
Codex CLI by OpenAI. Three platform-specific sandbox
implementations (macOS Seatbelt, Linux bubblewrap/Landlock, Windows
restricted tokens), split-filesystem awareness, an execution policy
engine with a rule DSL, bidirectional MCP (client and server), and a
strict clippy lint policy that bans unwrap_used and
expect_used across 70+ crates.
How to read the rest of this site
- Compare tool schemas, shell execution, MCP support, and recovery patterns.
- Read per-repo profiles, strengths, weaknesses, and fit.
- See who is genuinely provider-neutral and who writes model-specific logic.
- The dedicated page on why Claude Code feels like a category of its own.
- DeerFlow - The LangGraph-based super agent harness with 14-layer middleware, skill evolution, and sub-agent orchestration.
- OpenHands - The platform-shaped agent with Docker sandboxing and ingenious temperature-bumping retry logic.
- Deep dive into shell injection defense, prompt injection scanning, permissions, sandboxing, and loop detection.
- MCP and ACP implementations compared: transports, OAuth, lifecycle, and deferred tool loading.
- How agents delegate work, isolate children, enforce concurrency limits, and collect results.
- Hermes Agent - The completely separate deep dive on the most unusual agent in the set: self-improving, multi-platform, and RL-augmented.
- Pi Mono - The minimalist kernel: 438 files, tree-structured JSONL v3 sessions, differential TUI, Pi Packages, 23 providers across 10 APIs, parallel tool execution, file mutation queue, 4 run modes, MIT licensed.
- Codex CLI - OpenAI's production agent: 3,805 files, 70+ Rust crates, three platform-specific sandboxes, bidirectional MCP, multi-agent jobs, and IDE extensions.
- Wintermolt - The 3 MB everything-agent: 6 backends, 16 tools, cron, Tailscale, camera, browser, MCP, chat bridges, and a macOS menu bar app.
- Zaica - The focused specialist: chain-mode workflows, reactive state management, Wyhash loop detection, and a hand-crafted terminal REPL.
- Goose - The extension-first Rust agent: LLM-based AdversaryInspector, 4 GooseModes, 15+ providers, recipe framework, and MOIM injection.
- Wintermolt vs. Zaica - Head-to-head comparison: two agents, one language, opposite philosophies (platform vs. specialist, 18,400 lines vs. ~9,100).
- Open Claude Code 2.0 - Clean-room rebuild of Claude Code v2.1.91 via ruDevolution decompilation: async generator loop, 25 tools, 5 providers, nightly releases.
- Dirac - Hash-anchored parallel edits, AST-native precision, 64.8% cost reduction vs competitors, no MCP, 8-type hook system, git checkpoints.