
The Security Audit: How Agents Defend Themselves

When you give an AI agent shell access to your machine, you're trusting it with enormous power. This page audits the security mechanisms found across the agents surveyed, from tree-sitter AST analysis to supply-chain verification.


Security spectrum overview

These agents span an enormous range of security investment. At one end, some agents treat shell access as a trusted operation with minimal guards. At the other, agents implement tree-sitter AST parsing, Zsh attack catalogs, quote-aware pipeline parsers, and binary provenance verification.

| Agent | Shell Security | Prompt Injection | Permissions | Sandboxing | Loop Detection |
| --- | --- | --- | --- | --- | --- |
| Claude Code | Tree-sitter AST + Zsh catalog | Feature-gated scanning | 4 modes + ML classifier + wildcard rules | Optional sandbox routing | State machine transitions |
| Neovate | 22-item ban list + quote-aware parser | | 3 approval modes | | |
| Codex | execpolicy crate + is_known_safe_command() | | request_permissions tool + config.toml overrides | Seatbelt/bwrap/restricted tokens | |
| Hermes | Tirith binary + cosign provenance | Context + memory scanning | Child agent tool stripping | 6 backends incl. Docker | Trajectory tracking |
| Crush | Product-level permission gates | | TUI permission prompts | | SHA-256 signature hashing |
| DeerFlow | Guardrail middleware | | Tool group filtering | Docker/Local/K8s | 200-line bucketing |
| OpenHands | Risk-level attributes | Security risk assessment template | RISK_LEVELS dict | Docker per-session | Temperature bumping |
| Qwen Code | Read-only detection + classification | | Permission decisions | | LoopDetected event type |
| Pochi | Foreground/background separation | | Apply-diff safety params | | Exponential backoff retry |
| Kimi CLI | Protocol-aware terminal | | Hooks-based interception | | |
| Mux | Workspace isolation | | Regex tool allow/deny | 7 runtime backends | |
| Pi Mono | Extension-based only | | Via extensions only | Container-recommended (user's responsibility) | |
| Dirac | DIRAC_COMMAND_PERMISSIONS glob patterns | | Approval-based workflow + YOLO mode (auto-approve) | Git checkpoints (revert before risky ops) | Exponential backoff (2s, 4s, 8s) |

Shell injection defense: the full spectrum

1. Claude Code — Tree-sitter AST analysis (23-point validation)

Claude Code's bashSecurity.ts is the deepest shell security implementation in this set. It imports tree-sitter's shell grammar to parse command ASTs before execution — operating on the actual parsed structure, not regex approximations.

| Attack Vector | What It Does | Detection Method |
| --- | --- | --- |
| zmodload | Loads Zsh modules (zsh/system for raw FD access, zsh/net/tcp for sockets) | Command prefix match via Set (O(1)) |
| emulate -c 'code' | Evaluates code in an emulated environment; effectively eval | Flag pattern match |
| sysopen/sysread/syswrite | Low-level FD operations from the zsh/system module that bypass normal I/O | Command name match |
| =cmd (EQUALS expansion) | =curl expands to /usr/bin/curl, bypassing blocklists | AST token shape detection |
| <() / >() / =() | Process substitution creates FIFOs that execute as arguments | AST subtree match |
| $() / backticks | Command substitution, detected via tree-sitter rather than regex | AST node type |
| zpty | Zsh pseudo-terminal module for interactive command automation | Command prefix match |
| zsocket | Zsh socket module for network connections | Command prefix match |
| zf_* | Zsh filesystem primitives (zf_mv, zf_rm, zf_mkdir, etc.) | Command prefix wildcard |
| Heredoc-in-substitution | Pattern $\(.*<< (a heredoc inside a command substitution) | Regex + AST cross-check |
| <# | PowerShell-style comment; flags context confusion | Token match ("defense in depth against future changes") |
| IFS injection | Manipulates the Internal Field Separator so commands split differently | Environment variable scanning |
| Unicode whitespace | Zero-width or unusual Unicode space characters that bypass blocklists | Unicode character detection |
| Control characters | Hidden control characters in command strings | Control character detection |
| Quote manipulation | Malformed quotes that break parser assumptions | Malformed token detection |
| Heredoc injection | Injects commands via heredoc delimiters | Heredoc parsing + validation |
| Git commit substitution | Attacks via git commit message templates | Git-specific validation |

The ZSH_DANGEROUS_COMMANDS constant is a JavaScript Set for O(1) lookup. The code comments explicitly state "defense in depth against future changes" — this is security engineering that anticipates novel attack vectors that don't exist yet.
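
As a rough illustration of the Set-based lookup described above, here is a minimal Python sketch. It is not Claude Code's implementation (which is TypeScript). The set contents are only the names listed in the table, and the helper names are hypothetical. It assumes a parser has already extracted the command name from the AST.

```python
# Hypothetical subset: only the Zsh builtins named in the table above.
ZSH_DANGEROUS_COMMANDS = frozenset(
    {"zmodload", "emulate", "sysopen", "sysread", "syswrite", "zpty", "zsocket"}
)
# Prefix wildcard for the zf_* filesystem primitives.
ZSH_DANGEROUS_PREFIXES = ("zf_",)


def is_dangerous_zsh_command(command_name: str) -> bool:
    """Return True if the already-parsed command name is on the catalog."""
    # Set membership is a hash lookup, so it stays O(1) on average
    # no matter how large the catalog grows.
    if command_name in ZSH_DANGEROUS_COMMANDS:
        return True
    return command_name.startswith(ZSH_DANGEROUS_PREFIXES)


def uses_equals_expansion(command_name: str) -> bool:
    """Flag the =cmd token shape (for example =curl), which Zsh expands to a path."""
    return len(command_name) > 1 and command_name.startswith("=")
```

For example, `is_dangerous_zsh_command("zf_rm")` and `uses_equals_expansion("=curl")` both return True, while `is_dangerous_zsh_command("ls")` returns False.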

2. Neovate Code — 22-item banned list + quote-aware pipeline parser

Neovate's src/tools/bash.ts uses a character-level state machine to handle quoting correctly before any security check. This matters because naive regex approaches fail on quoted strings:

// State machine tracks: inSingleQuote, inDoubleQuote, escaping
// splitPipelineSegments() respects quoting so 'echo "a|b"' is ONE segment
// hasCommandSubstitution() tracks same states to find $() and backticks

The full banned command list (22 items):

alias, aria2c, axel, bash, chrome, curl, curlie, eval,
firefox, fish, http-prompt, httpie, links, lynx, nc,
rm, safari, sh, source, telnet, w3m, wget, xh, zsh

Beyond static bans, it detects high-risk patterns like rm -rf, sudo, dd if=, mkfs, and curl | sh, checking every pipeline segment individually.
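
The quote-aware splitting and per-segment check can be sketched as follows. This is a Python approximation of the idea, not Neovate's TypeScript code. The function names mirror those in the comments above, but the bodies are assumptions, and BANNED is a small hypothetical subset of the list.

```python
# Hypothetical subset of the banned list, for illustration only.
BANNED = frozenset({"curl", "wget", "eval", "nc"})


def split_pipeline_segments(command: str) -> list[str]:
    """Split on '|' only when the pipe is outside quotes and not escaped."""
    segments: list[str] = []
    current: list[str] = []
    in_single = in_double = escaping = False
    for ch in command:
        if escaping:
            # Previous character was a backslash: take this one literally.
            current.append(ch)
            escaping = False
        elif ch == "\\" and not in_single:
            current.append(ch)
            escaping = True
        elif ch == "'" and not in_double:
            in_single = not in_single
            current.append(ch)
        elif ch == '"' and not in_single:
            in_double = not in_double
            current.append(ch)
        elif ch == "|" and not in_single and not in_double:
            segments.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    segments.append("".join(current).strip())
    # Drop empty pieces (for example the gap inside '||').
    return [s for s in segments if s]


def banned_commands_in(command: str) -> list[str]:
    """Check the first word of every pipeline segment against the ban list."""
    hits = []
    for segment in split_pipeline_segments(command):
        words = segment.split()
        if words and words[0] in BANNED:
            hits.append(words[0])
    return hits
```

With this sketch, `echo "curl | sh"` produces no hits because the pipe sits inside double quotes, whereas `cat x | curl -d @- http://e` flags `curl` in the second segment.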

3. Codex — execpolicy rule DSL + safe command allowlist

Codex implements shell security through its execpolicy crate, which provides a rule DSL for fine-grained command control. The policy system supports prefix patterns, network rules, and command whitelists/blacklists. A fast-path is_known_safe_command() allowlist short-circuits common read-only operations like ls, cat, and find without hitting the full policy engine.
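
The fast-path idea can be illustrated with a short Python sketch. Codex itself is written in Rust, and nothing below reflects the real execpolicy DSL. The rule table, the decision names, and the list of unsafe find flags are assumptions made for illustration.

```python
import shlex

# The three commands named in the text; a real allowlist is assumed to be larger.
KNOWN_SAFE = frozenset({"ls", "cat", "find"})
# find stops being read-only once it can delete or execute, so arguments matter.
UNSAFE_FIND_FLAGS = frozenset({"-delete", "-exec", "-execdir", "-ok", "-okdir"})

# Hypothetical prefix rules: (argv prefix, decision).
PREFIX_RULES = [
    (("git", "status"), "allow"),
    (("git", "push"), "prompt"),
    (("rm",), "forbid"),
]


def is_known_safe_command(argv: list[str]) -> bool:
    """Fast path: read-only commands that never need the full policy engine."""
    if not argv or argv[0] not in KNOWN_SAFE:
        return False
    if argv[0] == "find" and UNSAFE_FIND_FLAGS.intersection(argv):
        return False
    return True


def evaluate(command: str) -> str:
    """Return 'allow', 'prompt', or 'forbid' for a command string."""
    argv = shlex.split(command)
    if is_known_safe_command(argv):
        return "allow"
    for prefix, decision in PREFIX_RULES:
        if tuple(argv[: len(prefix)]) == prefix:
            return decision
    # Anything unmatched falls back to asking the user.
    return "prompt"
```

The ordering is the point of the design: cheap allowlist checks run first, and only commands that miss the fast path pay for rule evaluation.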

Sandboxing varies by platform: macOS uses Seatbelt (sandbox-exec) with a workspace-write profile that keeps .git and .codex read-only; Linux uses bubblewrap (prefers system bwrap, vendored Landlock fallback); Windows uses restricted tokens with split-filesystem policies. The CLI exposes codex sandbox {macos,linux,windows} for testing, plus a --sandbox flag with levels: read-only, workspace-write, and danger-full-access.

The shell backend uses zsh-fork on macOS for safe shell spawning, with classic Unix escalation elsewhere. Permission escalation flows through the request_permissions tool with a runtime approval flow, and MCP tool approvals can be overridden per-tool in config.toml. The codebase enforces strict clippy denies across 70+ crates: unwrap_used, expect_used, needless_borrow, and more.

4. Hermes — Supply-chain verification via Tirith + cosign

Hermes takes a completely different approach: rather than blocking specific commands, it verifies the execution environment. The Tirith security binary is downloaded from GitHub releases with SHA-256 hash verification and optional cosign provenance validation.
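
The hash-verification half of that flow is simple to sketch with the Python standard library. This is a generic example rather than Hermes's code. The function names are hypothetical, and the cosign provenance step is not shown.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Stream the file through SHA-256 so large binaries are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with the published one in constant time."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A caller would refuse to execute the downloaded binary unless `verify_download` returns True. The expected digest must come from a channel the attacker cannot also modify, otherwise the check proves nothing.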

This is supply-chain security — verifying that the execution binary is authentic and untampered — not just command blocking. The Tirith binary checks for:

5. Crush — TUI permission gates + SHA-256 loop detection

Crush integrates a permission system directly into its TUI. Before executing dangerous operations, the user sees a permission prompt. This is a UX-level defense — appropriate for an interactive tool where the user is present.

The SHA-256 loop detection in internal/agent/loop_detection.go computes a signature by hashing tool_name + "\x00" + input + "\x00" + output for every tool call. If any signature appears more than 5 times in the last 10 steps, the agent halts.
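
The signature scheme and threshold described above can be sketched in Python as follows. Crush's actual implementation is in Go, so the class and function names here are hypothetical. Only the NUL-separated hash input, the 10-step window, and the more-than-5 threshold come from the text.

```python
import hashlib
from collections import Counter, deque

WINDOW_SIZE = 10  # steps kept in the sliding window
MAX_REPEATS = 5   # halt when one signature appears more often than this


def signature(tool_name: str, tool_input: str, tool_output: str) -> str:
    """Hash tool_name + NUL + input + NUL + output into one hex digest."""
    payload = "\x00".join((tool_name, tool_input, tool_output))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LoopDetector:
    def __init__(self) -> None:
        # deque(maxlen=...) silently drops the oldest step once full.
        self._recent: deque[str] = deque(maxlen=WINDOW_SIZE)

    def record(self, tool_name: str, tool_input: str, tool_output: str) -> bool:
        """Record one tool call; return True if the agent looks stuck."""
        self._recent.append(signature(tool_name, tool_input, tool_output))
        return max(Counter(self._recent).values()) > MAX_REPEATS
```

Feeding the same call and output six times in a row returns True on the sixth call. The NUL separators keep field boundaries unambiguous, so `("ab", "c")` and `("a", "bc")` cannot collide into the same hash input.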

Prompt injection defense

Prompt injection — tricking the agent via crafted input in files, context, or memory — is a growing attack vector. Only two agents in this set have explicit defense:

Hermes — Two-layer injection scanning

Layer 1: Context file scanning — Before injecting context files (AGENTS.md, .cursorrules, SOUL.md) into the system prompt, Hermes scans for:

  • "ignore previous instructions" / "system prompt override"
  • "do not tell the user" (deception instructions)
  • "act as if you have no restrictions" (bypass attempts)
  • <!-- ignore --> (hidden HTML comment injection)
  • Zero-width spaces and bidi override characters
  • Credential exfiltration via curl on .env

Layer 2: Memory injection scanning — MEMORY.md and USER.md go through the same scanner before loading. This matters because memory persists across sessions — a successful injection in session A would propagate to session B without this scan.
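
A scanner of this kind can be approximated with a handful of regular expressions, applied to context files and memory files alike. The patterns below are illustrative guesses based on the categories listed above, not Hermes's actual rules. The category names and the function name are hypothetical.

```python
import re

# Hypothetical patterns, one per category from the list above.
INJECTION_PATTERNS = {
    "override": re.compile(
        r"ignore\s+(all\s+)?previous\s+instructions|system\s+prompt\s+override",
        re.IGNORECASE,
    ),
    "deception": re.compile(r"do\s+not\s+tell\s+the\s+user", re.IGNORECASE),
    "bypass": re.compile(
        r"act\s+as\s+if\s+you\s+have\s+no\s+restrictions", re.IGNORECASE
    ),
    # [^>]* keeps the match inside a single HTML comment.
    "hidden_comment": re.compile(r"<!--[^>]*\bignore\b[^>]*-->", re.IGNORECASE),
    # Zero-width characters, word joiner, and bidi embedding/override/isolate controls.
    "invisible_chars": re.compile(r"[\u200b-\u200d\u2060\u202a-\u202e\u2066-\u2069]"),
    "env_exfiltration": re.compile(r"curl\b[^\n]*\.env\b", re.IGNORECASE),
}


def scan_for_injection(text: str) -> list[str]:
    """Return the sorted names of every category that matches the text."""
    return sorted(
        name for name, pattern in INJECTION_PATTERNS.items() if pattern.search(text)
    )
```

A loader would call `scan_for_injection` on each file and skip, or ask the user about, any file that returns a non-empty list before its contents reach the system prompt.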

OpenHands — Security risk assessment template

OpenHands includes a composable Jinja2 sub-template security_risk_assessment.j2 that's included in the main system prompt. All tool calls carry a security_risk attribute validated against a RISK_LEVELS dictionary.

This is more of a risk-classification system than an injection defense, but it provides auditability: every action is classified and the agent is explicitly warned about security risk in its system prompt.
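
To make the validation step concrete, here is a minimal Python sketch. The level names, their ordering, and both function names are assumptions; only the security_risk attribute and the RISK_LEVELS dictionary name come from the text.

```python
# Hypothetical levels and ordering; the real dictionary may differ.
RISK_LEVELS = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def validate_security_risk(tool_call: dict) -> str:
    """Reject tool calls whose security_risk is missing or unrecognized."""
    risk = tool_call.get("security_risk")
    if risk not in RISK_LEVELS:
        raise ValueError(f"invalid security_risk: {risk!r}")
    return risk


def needs_confirmation(tool_call: dict, threshold: str = "MEDIUM") -> bool:
    """Ask the user when the declared risk meets or exceeds the threshold."""
    risk = validate_security_risk(tool_call)
    return RISK_LEVELS[risk] >= RISK_LEVELS[threshold]
```

Because the model declares the risk level itself, a check like this gives an audit trail and a confirmation hook, not protection against a model that has already been manipulated.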

Permission systems compared

How agents decide whether to run a command is a fundamental security question. The approaches vary dramatically:

| Agent | Permission Model | Granularity |
| --- | --- | --- |
| Claude Code | 4 modes: default (prompt), plan (show plan, ask once), bypassPermissions (auto-approve), auto (ML classifier) | Per-operation + wildcard rules: Bash(git *), FileEdit(/src/*) |
| Neovate | 3 modes: default (prompt), autoEdit (auto-approve edits), yolo (approve all) | Per-tool category: read, write, command, network, ask |
| Codex | request_permissions tool with runtime approval flow + per-tool config.toml overrides | execpolicy rule DSL: prefix patterns, network rules, command whitelists/blacklists |
| Crush | TUI permission prompts before dangerous operations | Per-operation, user-present |
| DeerFlow | Tool group filtering + guardrail middleware (fail-closed by default) | Per-tool-group: web, file:read, file:write, bash |
| Mux | Regex tool allow/deny patterns per agent | Per-agent, regex-based: -.*, -file_edit_.* |
| Qwen Code | Permission decisions and shell classification | Per-tool, read-only detection |
| Hermes | Child agent tool stripping (5 tools always removed from children) | Per-agent-type: parent vs child |
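
Wildcard rules such as Bash(git *) can be matched with ordinary glob semantics. The sketch below uses Python's fnmatch and is only one plausible reading of the rule syntax shown in the table. The parsing regex and function names are hypothetical, and the real matcher may behave differently.

```python
import re
from fnmatch import fnmatchcase

# Assumed rule shape: ToolName(glob pattern)
RULE_RE = re.compile(r"^(?P<tool>\w+)\((?P<pattern>.*)\)$")


def rule_matches(rule: str, tool: str, argument: str) -> bool:
    """Return True if the rule names this tool and its glob matches the argument."""
    parsed = RULE_RE.match(rule)
    if parsed is None:
        raise ValueError(f"malformed rule: {rule!r}")
    return parsed["tool"] == tool and fnmatchcase(argument, parsed["pattern"])


def is_allowed(rules: list[str], tool: str, argument: str) -> bool:
    """An operation is pre-approved if any rule matches it."""
    return any(rule_matches(rule, tool, argument) for rule in rules)
```

One caveat follows directly from glob semantics: under this naive matcher, `git status; rm -rf /` also matches `Bash(git *)`. That is why wildcard permissions are only safe when combined with the kind of shell parsing described earlier, which splits compound commands before any rule is applied.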

Sandboxing and execution isolation

When things go wrong, how contained is the damage? The agents fall into three camps:

Strong isolation (Docker/container-based)

  • OpenHands — Docker container per session, torn down on exit
  • DeerFlow — AioSandboxProvider with Docker, Apple Container, or K8s backends
  • Hermes — 6 backends: local, Docker, SSH, Daytona, Modal, Singularity

Moderate isolation (workspace/runtime-based)

  • Mux — 7 runtime backends including devcontainer, Docker, worktree, SSH, remote
  • Claude Code — Optional sandbox routing, git worktree isolation
  • Codex — Platform sandboxes: macOS Seatbelt, Linux bwrap/Landlock, Windows restricted tokens

Host execution (permission-based guards only)

  • Neovate — Host execution with approval gates and banned command lists
  • Crush — Host execution with TUI permission prompts
  • Qwen Code — Host execution with classification
  • Pochi — Host execution with foreground/background separation
  • Kimi CLI — Host execution with protocol-aware handling
  • Pi Mono — Host execution with no built-in guards. The README explicitly states: "No permission popups. Run in a container, or build your own confirmation flow with extensions."

Loop detection as a safety mechanism

Infinite loops aren't just annoying — they can burn through API credits, fill disks with logs, or repeatedly execute destructive commands. Only three agents have explicit loop detection; the rest (including Pi Mono, Claude Code, and Mux) rely on the model to notice repetition or on max-iteration caps.

| Agent | Method | Threshold | Action |
| --- | --- | --- | --- |
| Crush | SHA-256(tool_name + input + output) over last 10 steps | >5 repeats in 10-step window | Halt agent as stuck |
| DeerFlow | Hash with 200-line bucketing for read_file, full arg hash for write/str_replace | 3 warns, 5 hard stop | Warn at 3, strip tool_calls at 5 (forces text output) |
| Hermes | Trajectory tracking + summary convergence detection | Diminishing new information | RL training signal; proxy for being stuck |
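
The bucketing approach in the DeerFlow row can be sketched as follows. This is an interpretation of the table entry, not DeerFlow's code. The argument names (path, start_line), the class name, and the return values are all hypothetical.

```python
import hashlib
import json
from collections import Counter

BUCKET_LINES = 200  # reads within the same 200-line bucket count as repeats
WARN_AT = 3
STOP_AT = 5


def call_signature(tool: str, args: dict) -> str:
    """Hash a tool call, coarsening read_file offsets into 200-line buckets."""
    if tool == "read_file":
        key = {
            "path": args.get("path"),
            "bucket": args.get("start_line", 0) // BUCKET_LINES,
        }
    else:
        # Writes and replacements hash their full arguments.
        key = args
    payload = tool + "\x00" + json.dumps(key, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RepeatGuard:
    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    def check(self, tool: str, args: dict) -> str:
        """Return 'ok', 'warn' (3rd repeat onward), or 'stop' (5th onward)."""
        sig = call_signature(tool, args)
        self._counts[sig] += 1
        count = self._counts[sig]
        if count >= STOP_AT:
            return "stop"
        if count >= WARN_AT:
            return "warn"
        return "ok"
```

Bucketing means an agent that nudges its read offset by a few lines each time is still counted as repeating itself, which an exact-argument hash would miss.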

Why Crush's approach is the most robust

Calling the same tool with different arguments produces a different hash. Calling it with the same arguments but getting different output (e.g., a flaky command) also produces a different hash. Only genuine repetition, where tool, input, and output are all identical, triggers the halt. This avoids false positives from legitimate iterative workflows like "read file, edit, read again to verify."

Security verdicts

🏆 Best shell security: Claude Code

Tree-sitter AST parsing, 23-point Zsh attack catalog, O(1) Set lookups, and explicit "defense in depth against future changes" comments. This is security engineering that anticipates attack vectors that don't exist yet.

🏆 Best pre-execution safety: Neovate Code

22-item banned list, quote-aware pipeline parser, high-risk pattern detection per pipeline segment. The most opinionated bash safety in this set — stops dangerous commands before they ever run.

🏆 Best supply-chain security: Hermes

SHA-256 binary verification + cosign provenance attestation. This is the only agent that verifies the execution binary's authenticity and integrity before running.

🏆 Best execution isolation: OpenHands

Docker container per session is the strongest isolation model in this set. The agent runs inside a container, not on your host machine. Files sync back, but processes can't escape.

🏆 Best prompt injection defense: Hermes

Two-layer scanning (context files + memory) with 6+ pattern categories. The memory layer is critical: it prevents cross-session injection propagation.

🏆 Best loop detection: Crush

SHA-256 signature hashing over tool_name + input + output with a 10-step sliding window. No false positives from legitimate iterative workflows.

What most agents are missing

Several security patterns appear in only one or two agents despite being broadly applicable:

Bottom line

The hallmark of a production-ready coding agent is not its prompt engineering, but its deterministic security scaffold. Claude Code and OpenHands represent the bleeding edge of two different philosophies: Claude Code builds walls around the host machine, while OpenHands isolates execution in containers.

Meanwhile, several agents (Pochi, Kimi CLI, Mux) rely primarily on trust and approval workflows rather than active defense mechanisms. This isn't necessarily wrong — for trusted environments, minimal friction is a feature — but it matters enormously when deploying agents on production infrastructure.