AI Coding Guides Deep Dives
Repository Study • 18 agents • Local snapshot

The Agent Harness Field Guide

This blog audits the code that is actually present in the coding-agents\ directory: how each agent wires models, exposes tools, runs shell commands, and speaks MCP or ACP, and where Claude Code and Hermes feel fundamentally different from the rest of the field. It also covers OpenAI's own Codex CLI, a Rust-native agent with platform-specific sandboxes and bidirectional MCP support.

(Alright, ad over. Back to the serious technical analysis.)

What this deep dive covers

The interesting part of coding agents is not the marketing layer. It is the runtime beneath it: whether the tool layer is generic or bespoke, whether shell access is tightly guarded or casually wrapped, whether model support is truly abstracted or just superficially multiplexed, and whether the repo reads like a productized operating environment or a fast-moving integration shell.

I read the local repositories for Pochi, Neovate Code, Mux, Crush, Kimi CLI, Qwen Code, OpenHands, Claude Code, DeerFlow, Hermes Agent (by Nous Research), Codex CLI (by OpenAI), and Open Claude Code 2.0 (a clean-room implementation via AI decompilation), then mapped them against the same questions on every page of this site.

🔍

Methodology note

This is a local-only repo study. I did not use outside documentation beyond what is already checked into the worktree. That matters most for OpenHands, where the repo itself says the newer V1 agent core now lives elsewhere.

Repos audited

The landscape in one screen

🧠

Bespoke runtime products

Claude Code and Crush feel like full terminal operating environments, not thin wrappers. Their tool, permission, and UX layers are part of the product, not just adapters around a chat loop.

Most opinionated
🔌

Provider multiplexers

Mux, Neovate, and Qwen Code all build serious provider catalogs and shared abstractions. They want broad model reach more than a single model-native identity.

Most configurable
🧩

Protocol and adapter layers

Pochi and Kimi CLI stand out for their ecosystem bridges. Pochi ships vendor-specific packages for Codex, Qwen, Copilot, and others. Kimi invests heavily in ACP and in translating internal tool output into protocol-friendly shapes.

Most bridge-heavy
🏗️

Agent frameworks and sandboxes

DeerFlow and OpenHands are less about one CLI persona and more about broader orchestration: sandboxes, middleware, long-running services, app servers, and task execution environments.

Most framework-like

Minimalist extension-first agents

Pi Mono ships a razor-sharp kernel (438 files) with tree-structured JSONL v3 sessions, differential TUI rendering, and 23 providers across 10 APIs — plus Pi Packages (shareable bundles via npm/git), parallel tool execution, a file mutation queue, and 4 run modes. MIT licensed by Mario Zechner.

Most minimalist

Self-improving multi-platform agents

Hermes Agent (Nous Research) is in a category of its own: a persistent skill-learning loop, six remote execution backends, 14+ messaging platform gateways (Telegram, Discord, Slack, WhatsApp, Signal, WeChat, Matrix, Mattermost, Feishu, DingTalk, Email, SMS, HomeAssistant, Webhook), MoA synthesis, and RL training infrastructure.

Most functionally unique

Zero-runtime native binaries

Wintermolt and Zaica are both written entirely in Zig 0.15 — no Node.js, no Python, no garbage collector. They compile to single native binaries (Wintermolt is 3 MB) with cross-compilation to any Zig target including ARM boards. Wintermolt goes wide (7 modes, cron, Tailscale, camera, browser, MCP, chat bridges). Zaica goes deep (chain-mode workflows, reactive state, Wyhash loop detection).

Most portable

OpenAI's own coding agent

Codex CLI is OpenAI's production coding agent — a Rust workspace of 70+ crates with platform-specific sandboxes (Seatbelt, bubblewrap/Landlock, Windows restricted tokens), bidirectional MCP (client and server), multi-agent job execution, and a configurable provider system that supports Ollama and LM Studio.

Most sandboxed

Extension-first security advocates

Goose (AAIF at Linux Foundation) takes a unique approach to security: it uses an LLM-based AdversaryInspector that fires a second LLM call to review tool calls against user-defined rules from ~/.config/goose/adversary.md. This is defense-in-depth for multi-agent setups where parent agents delegate to sub-agents. With 15+ providers, 4 GooseModes (Auto/Approve/SmartApprove/Chat), and a recipe framework, Goose is Rust-native with extensive feature gates (local-inference, aws-providers, otel).

Most LLM-secured
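
To make the second-LLM-call idea in the Goose card concrete, here is a minimal Python sketch of the pattern. It is not Goose's implementation (Goose is Rust). `ToolCall`, `Verdict`, `inspect_tool_call`, and `fake_reviewer` are hypothetical names, and the ALLOW/BLOCK reply format is an assumption made for illustration. The reviewer is injected as a plain callable so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical types for illustration; these are not Goose's actual API.
@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Verdict:
    allowed: bool
    reason: str


def inspect_tool_call(call: ToolCall, rules: str, reviewer: Callable[[str], str]) -> Verdict:
    """Ask a second model whether `call` violates the user's rules.

    `reviewer` stands in for the second LLM call: it takes a prompt and
    returns text whose first line starts with ALLOW or BLOCK (assumed format).
    """
    prompt = (
        "You review tool calls against the user's rules.\n"
        f"Rules:\n{rules}\n\n"
        f"Tool: {call.name}\nArguments: {call.arguments!r}\n"
        "Answer with ALLOW or BLOCK on the first line, then a short reason."
    )
    reply = reviewer(prompt).strip()
    first_line, _, rest = reply.partition("\n")
    # Fail closed: anything that is not an explicit ALLOW is treated as a block.
    allowed = first_line.strip().upper().startswith("ALLOW")
    return Verdict(allowed=allowed, reason=rest.strip() or first_line.strip())


def fake_reviewer(prompt: str) -> str:
    """Offline stand-in for a model, so the sketch runs without network access."""
    if "rm -rf" in prompt:
        return "BLOCK\nDestructive delete is forbidden by the rules."
    return "ALLOW\nNo rule applies."
```

In a real harness the rules text would be read from the user's rules file and `reviewer` would wrap a provider call. The fail-closed default is a design choice in this sketch, not something the snapshot confirms.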

Fast takeaways

| Question | Best answer from this snapshot | Why |
| --- | --- | --- |
| Which repo feels most different? | Claude Code | It is the least generic and the most integrated: Anthropic-first, huge tool catalog, plan/worktree/team flows, React terminal UI, permission system, and a massive central query runtime. |
| Which repos are most model-agnostic? | Mux, Neovate, Qwen Code, Goose | All invest in provider registries, routing layers, and shared config resolution instead of pinning themselves to one native model family. Goose ships 35+ provider modules across direct APIs, ACP bridges, and declarative JSON configs. |
| Which repo adapts multiple ecosystems most explicitly? | Pochi | It does not stop at a generic provider interface; it ships vendor-specific packages for Codex, Qwen Code, GitHub Copilot, Gemini CLI, and more. |
| Which shell tooling is most safety-conscious? | Neovate Code, Claude Code, and Goose | Neovate hard-codes command bans and high-risk detection (22-item banned list, quote-aware pipeline parser), while Claude layers permissions, tree-sitter AST analysis, and Zsh-specific attack detection over a richer command surface. Hermes uses supply-chain verification (cosign provenance) for its execution environment. Goose uniquely uses an LLM-based AdversaryInspector that fires a second LLM call to review tool calls against user-defined rules. |
| Which repo has the most unique capabilities? | Hermes Agent | Self-improving skill loop, 6 remote backends, multi-platform IM gateways, MoA synthesis across 4 frontier models, and RL training infrastructure; none of these appear anywhere else in this set. |
| Which repo is hardest to judge from local code alone? | OpenHands | The local repo still contains useful architecture, but its own docs say the newer V1 agent core moved to a separate Software Agent SDK repository. |
| Which code feels most polished? | Claude Code, Crush, Mux, Qwen Code | These four snapshots show the clearest internal consistency between product goals, tool design, configuration, and error handling. Crush is notable for being the only agent with native LSP diagnostics and Sourcegraph code search as first-class tools. |
| Which repo is the most extension-friendly? | Pi Mono | A tight 438-file kernel that deliberately ships without MCP, permissions, or sub-agents, expecting you to compose them via extensions. Pi Packages let you bundle and share configurations across projects via npm or git. |
| Which repo has the most platform-specific sandboxing? | Codex CLI | Three separate sandbox implementations (macOS Seatbelt, Linux bubblewrap/Landlock, and Windows restricted tokens), each with split-filesystem awareness and carveout support. Also the only agent in this set that doubles as an MCP server for other agents. |
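
The "banned list plus quote-aware pipeline parser" approach credited to Neovate above can be pictured with a short Python sketch. This is an illustration of the general technique, not Neovate's code: the `BANNED` set is invented (the snapshot summary does not enumerate the 22 entries), and `split_pipeline` and `banned_commands` are hypothetical names.

```python
import shlex

# Illustrative ban list only; the source does not enumerate Neovate's 22 entries.
BANNED = {"curl", "wget", "nc", "telnet", "sudo"}


def split_pipeline(command: str) -> list[str]:
    """Split on |, ||, &&, ; and & that appear outside quotes."""
    segments, current, quote = [], [], None
    i = 0
    while i < len(command):
        ch = command[i]
        if quote:
            current.append(ch)
            # Inside double quotes a backslash escapes the next character.
            if ch == "\\" and quote == '"' and i + 1 < len(command):
                current.append(command[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "\\" and i + 1 < len(command):
            # Unquoted backslash: keep the escaped character with its segment.
            current.append(ch)
            current.append(command[i + 1])
            i += 1
        elif ch in "|;&":
            segments.append("".join(current))
            current = []
            # Skip the second character of || and &&.
            if ch in "|&" and i + 1 < len(command) and command[i + 1] == ch:
                i += 1
        else:
            current.append(ch)
        i += 1
    segments.append("".join(current))
    return [s.strip() for s in segments if s.strip()]


def banned_commands(command: str) -> list[str]:
    """Return banned executables (or unparseable segments) found in `command`."""
    found = []
    for segment in split_pipeline(command):
        try:
            tokens = shlex.split(segment)
        except ValueError:
            # Unbalanced quotes: treat the whole segment as suspicious.
            found.append(segment)
            continue
        if tokens and tokens[0] in BANNED:
            found.append(tokens[0])
    return found
```

The point of the quote tracking is that `echo "a | curl b" | grep a` is not flagged, while `cat f | curl -d @- http://x` is. This sketch deliberately ignores command substitution, backticks, redirections such as `2>&1`, and `VAR=value` prefixes, all of which a production checker would need to handle.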

Approximate codebase size by file count

File count is not the same thing as quality, but it does reveal where the implementation surface is broadest.

Codex CLI: 3,805 files
OpenHands: 2,774 files
Mux: 2,226 files
Claude Code: 2,137 files
Qwen Code: 2,038 files
Pochi: 1,315 files
Kimi CLI: 899 files
DeerFlow: 810 files
Crush: 799 files
Neovate: 582 files
Hermes: ~450 files
Pi Mono: 438 files
Wintermolt: 51 Zig files (~18,400 lines)
Zaica: ~13 files (~9,100 lines)
Open Claude Code 2.0: 61 files (~8,300 lines)
Goose: Rust Cargo workspace (~6+ crates)
Dirac: TypeScript monorepo (fork of Cline)

My high-level verdict

Best designed, if you value a coherent product runtime

Claude Code is the standout. It is not the most provider-flexible repo, but it is the clearest example of an agent built as its own operating model: tool schemas, permissioning, commands, tasking, worktrees, UI, feature flags, and retry logic all sit inside one deliberate runtime.

Best designed, if you value clean systems engineering

Crush is the nicest surprise. The Go codebase feels disciplined, modular, and product-minded without being bloated. Its provider plumbing, permissions, and TUI organization are easier to reason about than many faster-moving TypeScript peers.

Best multi-model architecture

Mux and Qwen Code lead here. Mux has a broad provider routing layer with desktop app ambitions, while Qwen Code has a particularly strong configuration and runtime model-resolution story.

Most extensible framework shape

DeerFlow wins on composability. It feels more like a harness for building agent systems than a single agent persona, which makes it powerful but also less opinionated than Claude Code or Crush.

Most functionally unique

Hermes Agent by Nous Research. The self-improving skill loop, 14+ messaging platform gateways, MoA tool (4 frontier models in parallel), and RL training infrastructure do not appear in any other repo here. It is the only agent that explicitly tries to get better at your tasks over time.
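
For readers unfamiliar with MoA (mixture of agents), the following Python sketch shows the general fan-out-then-synthesize shape. It is a generic illustration, not Hermes's implementation: `Model`, `mixture_of_agents`, and the aggregator prompt wording are all assumptions, and the models are plain callables so the example runs without any provider.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def mixture_of_agents(prompt: str, proposers: list[Model], aggregator: Model) -> str:
    """Fan the prompt out to every proposer in parallel, then synthesize.

    `proposers` must be non-empty; each call runs on its own worker thread.
    """
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        # map() preserves proposer order, so drafts line up with their models.
        drafts = list(pool.map(lambda model: model(prompt), proposers))
    numbered = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    return aggregator(
        f"Question:\n{prompt}\n\n{numbered}\n\n"
        "Write one answer that combines the strongest points of the drafts."
    )
```

With four proposers this matches the "4 models in parallel" shape described above. Threads are used here because provider calls are I/O-bound, which is a choice made for the sketch.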

Most portable — zero runtime, one binary

Wintermolt and Zaica are the only agents here that compile to a single native binary with no runtime dependencies. Wintermolt (3 MB, ~18,400 lines) packs the broadest feature surface for its size of any agent in this set. Zaica (~9,100 lines) is the most focused coding specialist, with chain-mode workflows and best-in-class loop detection.
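
Hash-based loop detection, as credited to Zaica above, usually means fingerprinting each tool call and watching for repeats. The Python sketch below shows one way that could work. It is an assumption-laden illustration rather than Zaica's algorithm: the sliding window, the repeat threshold, and the `LoopDetector` class are hypothetical, and `hashlib.blake2b` stands in for Wyhash, which is not in the Python standard library.

```python
import hashlib
import json
from collections import Counter, deque


class LoopDetector:
    """Flag a loop when the same tool call repeats too often in a recent window."""

    def __init__(self, window: int = 8, max_repeats: int = 3):
        self.recent = deque(maxlen=window)  # fingerprints of the latest calls
        self.max_repeats = max_repeats

    @staticmethod
    def fingerprint(tool: str, args: dict) -> str:
        # sort_keys makes the fingerprint independent of argument order.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        # blake2b stands in for Wyhash here; any fast stable hash would do.
        return hashlib.blake2b(payload.encode("utf-8"), digest_size=8).hexdigest()

    def record(self, tool: str, args: dict) -> bool:
        """Record a call and return True once it looks like a loop."""
        self.recent.append(self.fingerprint(tool, args))
        return Counter(self.recent)[self.recent[-1]] >= self.max_repeats
```

A harness would call `record` before each tool execution and interrupt the agent, or inject a corrective message, once it returns True.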

Most extension-friendly kernel

Pi Mono by Mario Zechner. A tight 438-file TypeScript kernel with tree-structured JSONL v3 sessions, differential TUI rendering, 23 providers across 10 APIs, Pi Packages (shareable bundles via npm/git), parallel tool execution, a file mutation queue, and 4 run modes. MIT licensed and deliberately minimal so you can build MCP, permissions, or sub-agents yourself.

Most security-conscious sandboxing

Codex CLI by OpenAI. Three platform-specific sandbox implementations (macOS Seatbelt, Linux bubblewrap/Landlock, Windows restricted tokens), split-filesystem awareness, an execution policy engine with a rule DSL, bidirectional MCP (client and server), and a strict clippy lint policy that bans unwrap_used and expect_used across 70+ crates.

How to read the rest of this site

1. Architecture
Compare tool schemas, shell execution, MCP support, and recovery patterns.

2. Agents
Read per-repo profiles, strengths, weaknesses, and fit.

3. Models
See who is genuinely provider-neutral and who writes model-specific logic.

4. Claude Code
The dedicated page on why Claude Code feels like a category of its own.

5. DeerFlow
The LangGraph-based super agent harness with 14-layer middleware, skill evolution, and sub-agent orchestration.

6. OpenHands
The platform-shaped agent with Docker sandboxing and ingenious temperature-bumping retry logic.

7. Security
Deep dive into shell injection defense, prompt injection scanning, permissions, sandboxing, and loop detection.

8. Protocols
MCP and ACP implementation compared: transports, OAuth, lifecycle, and deferred tool loading.

9. Subagents
How agents delegate work, isolate children, enforce concurrency limits, and collect results.

10. Hermes Agent
The completely separate deep dive on the most unusual agent in the set: self-improving, multi-platform, and RL-augmented.

11. Pi Mono
The minimalist kernel: 438 files, tree-structured JSONL v3 sessions, differential TUI, Pi Packages, 23 providers across 10 APIs, parallel tool execution, file mutation queue, 4 run modes, MIT licensed.

12. Codex CLI
OpenAI's production agent: 3,805 files, 70+ Rust crates, three platform-specific sandboxes, bidirectional MCP, multi-agent jobs, and IDE extensions.

13. Wintermolt
The 3 MB everything-agent: 6 backends, 16 tools, cron, Tailscale, camera, browser, MCP, chat bridges, and a macOS menu bar app.

14. Zaica
The focused specialist: chain-mode workflows, reactive state management, Wyhash loop detection, and a hand-crafted terminal REPL.

15. Goose
The extension-first Rust agent: LLM-based AdversaryInspector, 4 GooseModes, 15+ providers, recipe framework, and MOIM injection.

16. Zig Agents
Head-to-head comparison: two agents, one language, opposite philosophies (platform vs. specialist, 18,400 lines vs. ~9,100).

17. Open Claude Code
Clean-room rebuild of Claude Code v2.1.91 via ruDevolution decompilation: async generator loop, 25 tools, 5 providers, nightly releases.

18. Dirac
Hash-anchored parallel edits, AST-native precision, 64.8% cost reduction vs competitors, no MCP, 8-type hook system, git checkpoints.
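
As a taste of what the deeper pages cover, the temperature-bumping retry mentioned in the OpenHands entry can be pictured roughly as follows. This is a guess at the general pattern, not OpenHands's code: the function name, the "empty reply" failure condition, the step size, and the 1.0 cap are all assumptions made for the sketch.

```python
from typing import Callable


def call_with_temperature_bump(
    generate: Callable[[float], str],
    start: float = 0.0,
    step: float = 0.5,
    max_attempts: int = 3,
) -> tuple[str, float]:
    """Retry `generate`, raising the temperature after each empty reply.

    `generate` is a hypothetical stand-in for a model call that takes a
    sampling temperature and returns the reply text.
    """
    temperature = start
    for _ in range(max_attempts):
        reply = generate(temperature)
        if reply.strip():
            return reply, temperature
        # A deterministic setting tends to reproduce the same bad output,
        # so nudge sampling before trying again (capped at 1.0 here).
        temperature = min(1.0, temperature + step)
    raise RuntimeError(f"no usable reply after {max_attempts} attempts")
```

The idea is that retrying at temperature 0 with an identical prompt is likely to fail the same way, so each retry samples a little more freely.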