Shell Security • Prompt Injection • Permissions • Sandboxing

The Security Audit: How Agents Defend Themselves

When you give an AI agent shell access to your machine, you're trusting it with enormous power. This page audits every security mechanism found across the repos in this guide — from tree-sitter AST analysis to supply-chain verification.

(Alright, ad over. Back to the serious technical analysis.)

Security spectrum overview

These agents span an enormous range of security investment. At one end, some agents treat shell access as a trusted operation with minimal guards. At the other, agents implement tree-sitter AST parsing, Zsh attack catalogs, quote-aware pipeline parsers, and binary provenance verification.

Agent	Shell Security	Prompt Injection	Permissions	Sandboxing	Loop Detection
Claude Code	Tree-sitter AST + Zsh catalog	Feature-gated scanning	4 modes + ML classifier + wildcard rules	Optional sandbox routing	State machine transitions
Neovate	22-item ban list + quote-aware parser	—	3 approval modes	—	—
Codex	execpolicy crate + is_known_safe_command()	—	request_permissions tool + config.toml overrides	Seatbelt/bwrap/restricted tokens	—
OpenCode	Permission-gated host execution	—	allow / deny / ask + wildcard rules	—	—
ADK-Rust	Framework-level tool auth, not deep shell parsing	Guardrails + content filtering	Human-in-the-loop callbacks + RBAC + scope-based tool auth	Experimental sandbox crate (Seatbelt/bwrap/AppContainer)	Graph interrupts / LoopAgent primitives
Hermes	Tirith binary + cosign provenance	Context + memory scanning	Child agent tool stripping	6 backends incl. Docker	Trajectory tracking
Crush	Product-level permission gates	—	TUI permission prompts	—	SHA-256 signature hashing
DeerFlow	Guardrail middleware	—	Tool group filtering	Docker/Local/K8s	200-line bucketing
OpenHands	Risk-level attributes	Security risk assessment template	RISK_LEVELS dict	Docker per-session	Temperature bumping
Qwen Code	Read-only detection + classification	—	Permission decisions	—	LoopDetected event type
Pochi	Foreground/background separation	—	Apply-diff safety params	—	Exponential backoff retry
Kimi CLI	Protocol-aware terminal	—	Hooks-based interception	—	—
Mux	Workspace isolation	—	Regex tool allow/deny	7 runtime backends	—
Pi Mono	Extension-based only	—	Via extensions only	Container-recommended (user's responsibility)	—
Dirac	DIRAC_COMMAND_PERMISSIONS glob patterns	—	Approval-based workflow + YOLO mode (auto-approve)	Git checkpoints (revert before risky ops)	Exponential backoff (2s, 4s, 8s)

Shell injection defense: the full spectrum

1. Claude Code — Tree-sitter AST analysis (23-point validation)

Claude Code's bashSecurity.ts is the deepest shell security implementation in this set. It imports tree-sitter's shell grammar to parse command ASTs before execution — operating on the actual parsed structure, not regex approximations.

Attack Vector	What It Does	Detection Method
`zmodload`	Loads Zsh modules (zsh/system for raw FD access, zsh/net/tcp for sockets)	Command prefix match via Set (O(1))
`emulate -c 'code'`	Evaluate code in emulated environment — effectively eval	Flag pattern match
`sysopen/sysread/syswrite`	Low-level FD ops from zsh/system module bypassing normal I/O	Command name match
`=cmd` (EQUALS expansion)	`=curl` → `/usr/bin/curl`, bypassing blocklists	AST token shape detection
`<() / >() / =()`	Process substitution creates FIFOs that execute as arguments	AST subtree match
`$()` / backticks	Command substitution — detected via tree-sitter, not regex	AST node type
`zpty`	Zsh pseudo-terminal module for interactive command automation	Command prefix match
`zsocket`	Zsh socket module for network connections	Command prefix match
`zf_*`	Zsh filesystem primitives (zf_mv, zf_rm, zf_mkdir, etc.)	Command prefix wildcard
Hereditary substitution	`$\(.*<<` — heredoc-in-substitution attacks	Regex + AST cross-check
`<#`	PowerShell-style comment — flags context confusion	Token match ("defense in depth against future changes")
IFS injection	Manipulating Internal Field Separator to split commands differently	Environment variable scanning
Unicode whitespace	Zero-width or unusual Unicode space characters bypassing blocklists	Unicode character detection
Control characters	Hidden control chars in command strings	Control character detection
Quote manipulation	Misformed quotes breaking parser assumptions	Malformed token detection
Heredoc injection	Injecting commands via heredoc delimiters	Heredoc parsing + validation
Git commit substitution	Attacks via git commit message templates	Git-specific validation

The ZSH_DANGEROUS_COMMANDS constant is a JavaScript Set for O(1) lookup. The code comments explicitly state "defense in depth against future changes" — this is security engineering that anticipates novel attack vectors that don't exist yet.

2. Neovate Code — 22-item banned list + quote-aware pipeline parser

Neovate's src/tools/bash.ts uses a character-level state machine to handle quoting correctly before any security check. This matters because naive regex approaches fail on quoted strings:

// State machine tracks: inSingleQuote, inDoubleQuote, escaping
// splitPipelineSegments() respects quoting so 'echo "a|b"' is ONE segment
// hasCommandSubstitution() tracks same states to find $() and backticks

The full banned command list (22 items):

alias, aria2c, axel, bash, chrome, curl, curlie, eval,
firefox, fish, http-prompt, httpie, links, lynx, nc,
rm, safari, sh, source, telnet, w3m, wget, xh, zsh

Beyond static bans, it detects high-risk patterns like rm -rf, sudo, dd if=, mkfs, and curl | sh, checking every pipeline segment individually.

3. Codex — execpolicy rule DSL + safe command allowlist

Codex implements shell security through its execpolicy crate, which provides a rule DSL for fine-grained command control. The policy system supports prefix patterns, network rules, and command whitelists/blacklists. A fast-path is_known_safe_command() allowlist short-circuits common read-only operations like ls, cat, and find without hitting the full policy engine.

Sandboxing varies by platform: macOS uses Seatbelt (sandbox-exec) with a workspace-write profile that keeps .git and .codex read-only; Linux uses bubblewrap (prefers system bwrap, vendored Landlock fallback); Windows uses restricted tokens with split-filesystem policies. The CLI exposes codex sandbox {macos,linux,windows} for testing, plus a --sandbox flag with levels: read-only, workspace-write, and danger-full-access.

The shell backend uses zsh-fork on macOS for safe shell spawning, with classic Unix escalation elsewhere. Permission escalation flows through the request_permissions tool with a runtime approval flow, and MCP tool approvals can be overridden per-tool in config.toml. The codebase enforces strict clippy denies across 70+ crates: unwrap_used, expect_used, needless_borrow, and more.

4. Hermes — Supply-chain verification via Tirith + cosign

Hermes takes a completely different approach: rather than blocking specific commands, it verifies the execution environment. The Tirith security binary is downloaded from GitHub releases with SHA-256 hash verification and optional cosign provenance validation.

This is supply-chain security — verifying that the execution binary is authentic and untampered — not just command blocking. The Tirith binary checks for:

Homograph URLs (lookalike domains for phishing)
Pipe-to-interpreter patterns
Terminal injection attempts

4. Crush — TUI permission gates + SHA-256 loop detection

Crush integrates a permission system directly into its TUI. Before executing dangerous operations, the user sees a permission prompt. This is a UX-level defense — appropriate for an interactive tool where the user is present.

The SHA-256 loop detection in internal/agent/loop_detection.go computes a signature by hashing tool_name + "\x00" + input + "\x00" + output for every tool call. If any signature appears more than 5 times in the last 10 steps, the agent halts.

Prompt injection defense

Prompt injection — tricking the agent via crafted input in files, context, or memory — is a growing attack vector. Only two agents in this set have explicit defense:

Hermes — Two-layer injection scanning

Layer 1: Context file scanning — Before injecting context files (AGENTS.md, .cursorrules, SOUL.md) into the system prompt, Hermes scans for:

"ignore previous instructions" / "system prompt override"
"do not tell the user" (deception instructions)
"act as if you have no restrictions" (bypass attempts)
 (hidden HTML comment injection)
Zero-width spaces and bidi override characters
Credential exfiltration via curl on .env

Layer 2: Memory injection scanning — MEMORY.md and USER.md go through the same scanner before loading. This matters because memory persists across sessions — a successful injection in session A would propagate to session B without this scan.

OpenHands — Security risk assessment template

OpenHands includes a composable Jinja2 sub-template security_risk_assessment.j2 that's included in the main system prompt. All tool calls carry a security_risk attribute validated against a RISK_LEVELS dictionary.

This is more of a risk-classification system than an injection defense, but it provides auditability: every action is classified and the agent is explicitly warned about security risk in its system prompt.

Permission systems compared

How agents decide whether to run a command is a fundamental security question. The approaches vary dramatically:

Agent	Permission Model	Granularity
Claude Code	4 modes: default (prompt), plan (show plan, ask once), bypassPermissions (auto-approve), auto (ML classifier)	Per-operation + wildcard rules: Bash(git ), FileEdit(/src/)
Neovate	3 modes: default (prompt), autoEdit (auto-approve edits), yolo (approve all)	Per-tool category: read, write, command, network, ask
Codex	request_permissions tool with runtime approval flow + per-tool config.toml overrides	execpolicy rule DSL: prefix patterns, network rules, command whitelists/blacklists
Crush	TUI permission prompts before dangerous operations	Per-operation, user-present
DeerFlow	Tool group filtering + guardrail middleware (fail-closed by default)	Per-tool-group: web, file:read, file:write, bash
OpenCode	Wildcard `allow` / `deny` / `ask` rules with `once`, `always`, or `reject` replies	Per-permission plus path or command pattern, surfaced through the permission bus and ACP clients
Mux	Regex tool allow/deny patterns per agent	Per-agent, regex-based: -., -file_edit_.
Qwen Code	Permission decisions and shell classification	Per-tool, read-only detection
Hermes	Child agent tool stripping (5 tools always removed from children)	Per-agent-type: parent vs child

Sandboxing and execution isolation

When things go wrong, how contained is the damage? The agents fall into three camps:

Strong isolation (Docker/container-based)

OpenHands — Docker container per session, torn down on exit
DeerFlow — AioSandboxProvider with Docker, Apple Container, or K8s backends
Hermes — 6 backends: local, Docker, SSH, Daytona, Modal, Singularity

Moderate isolation (workspace/runtime-based)

Mux — 7 runtime backends including devcontainer, Docker, worktree, SSH, remote
Claude Code — Optional sandbox routing, git worktree isolation
Codex — Platform sandboxes: macOS Seatbelt, Linux bwrap/Landlock, Windows restricted tokens

Host execution (permission-based guards only)

Neovate — Host execution with approval gates and banned command lists
Crush — Host execution with TUI permission prompts
Qwen Code — Host execution with classification
Pochi — Host execution with foreground/background separation
Kimi CLI — Host execution with protocol-aware handling
OpenCode — Host execution with a strong permission model, but no OS-level sandbox
Pi Mono — Host execution with no built-in guards. The README explicitly states: "No permission popups. Run in a container, or build your own confirmation flow with extensions."

Loop detection as a safety mechanism

Infinite loops aren't just annoying — they can burn through API credits, fill disks with logs, or repeatedly execute destructive commands. Only three agents have explicit loop detection; the rest (including Pi Mono, Claude Code, and Mux) rely on the model to notice repetition or on max-iteration caps.

Agent	Method	Threshold	Action
Crush	SHA-256(tool_name + input + output) over last 10 steps	>5 repeats in 10-step window	Halt agent as stuck
DeerFlow	Hash with 200-line bucketing for read_file, full arg hash for write/str_replace	3 warns, 5 hard stop	Warn at 3, strip tool_calls at 5 (forces text output)
Hermes	Trajectory tracking + summary convergence detection	Diminishing new information	RL training signal; proxy for being stuck

💡

Why Crush's approach is the most robust

Crushing calling the same tool with different arguments gets a different hash. Calling it with the same arguments but getting different output (e.g., flaky command) also gets a different hash. Only genuine repetition triggers the halt. This avoids false positives from legitimate iterative workflows like "read file, edit, read again to verify."

Security verdicts

🏆 Best shell security: Claude Code

Tree-sitter AST parsing, 23-point Zsh attack catalog, O(1) Set lookups, and explicit "defense in depth against future changes" comments. This is security engineering that anticipates attack vectors that don't exist yet.

🏆 Best pre-execution safety: Neovate Code

22-item banned list, quote-aware pipeline parser, high-risk pattern detection per pipeline segment. The most opinionated bash safety in this set — stops dangerous commands before they ever run.

🏆 Best supply-chain security: Hermes

SHA-256 binary verification + cosign provenance attestation. This is the only agent that verifies the execution binary's authenticity and integrity before running.

🏆 Best execution isolation: OpenHands

Docker container per session is the strongest isolation model in this set. The agent runs inside a container, not on your host machine. Files sync back, but processes can't escape.

🏆 Best prompt injection defense: Hermes

Two-layer scanning (context files + memory) with 6+ pattern categories. The memory layer is critical: it prevents cross-session injection propagation.

🏆 Best loop detection: Crush

SHA-256 signature hashing over tool_name + input + output with a 10-step sliding window. No false positives from legitimate iterative workflows.

What most agents are missing

Several security patterns appear in only one or two agents despite being broadly applicable:

Prompt injection scanning — Only Hermes does this explicitly. Every other agent (including Pi Mono, Claude Code, and Qwen Code) trusts context files blindly.
Cross-session memory scanning — Only Hermes scans memory on load to prevent injection propagation.
Supply-chain verification — Only Hermes verifies binary provenance.
Tree-sitter AST parsing — Only Claude Code parses shell commands structurally rather than with regex.
Quote-aware parsing — Only Neovate correctly handles quoting in pipeline segmentation.
Temperature-adaptive retry — Only OpenHands bumps temperature on deterministic failures.
Agent-requested compression — Only OpenHands lets the agent ask for context condensation.
File mutation queues — Only Pi Mono serializes concurrent writes to the same file, preventing a subtle race condition that affects most other agents.
Diff. TUI rendering — Only Pi Mono redraws only changed terminal cells, reducing attack surface from terminal escape sequence flooding.

Bottom line

The hallmark of a production-ready coding agent is not its prompt engineering, but its deterministic security scaffold. Claude Code and OpenHands represent the bleeding edge of different philosophies: Claude builds walls around the host machine, OpenHands isolates in containers.

Meanwhile, several agents (Pochi, Kimi CLI, Mux) rely primarily on trust and approval workflows rather than active defense mechanisms. This isn't necessarily wrong — for trusted environments, minimal friction is a feature — but it matters enormously when deploying agents on production infrastructure.