Codex CLI: OpenAI's Production Coding Agent
Codex CLI is OpenAI's own coding agent, built in Rust as a zero-dependency native binary. It ships with platform-specific sandboxes (macOS Seatbelt, Linux bubblewrap/Landlock, Windows restricted tokens), a Ratatui TUI, an MCP client and server, multi-agent spawning, OpenAI-native Responses API plumbing, and a configurable provider system that supports local models, Ollama, and LM Studio out of the box.
What is Codex CLI?
Codex CLI (@openai/codex) is a coding agent from OpenAI that runs
locally on your computer. It is distributed as a standalone native binary — no
Node.js, no Python, no JVM — built from a Rust workspace of
70+ crates totaling approximately 3,805 files
(roughly 1,418 Rust source files in the core workspace alone).
The project is licensed under the Apache-2.0 License.
The Rust implementation is the maintained Codex CLI — a successor to an
earlier TypeScript CLI that OpenAI no longer actively develops. The Rust version
ships with features the TypeScript version never supported: rich config.toml
parsing, MCP client and server modes, an exec headless automation CLI,
collaboration mode, and platform-specific sandbox hardening.
Installation
Codex is installed via npm (npm i -g @openai/codex) or Homebrew
(brew install --cask codex), but the distributed artifact is a
compiled Rust binary — the npm package is just a thin wrapper that
downloads the platform-specific native executable.
Code organization: a Cargo workspace of focused crates
Codex's codex-rs/ directory is a Cargo workspace with over 70 member
crates. This is not a monolith — it is a modular architecture where each crate
has a narrow responsibility. The key crates are:
| Crate | Role |
|---|---|
| core | Business logic — session management, tool routing, turn orchestration, thread rollout, model routing |
| tui | Full-screen Ratatui-based terminal UI — frames, panels, styles, tooltips |
| cli | Multi-tool CLI entry point — subcommands for codex, codex exec, codex sandbox, codex mcp |
| exec | Headless/non-interactive mode for automation — codex exec PROMPT runs until done and exits |
| tools | Shared tool schemas — Responses API tool primitives, JSON schemas, MCP tool adapters, dynamic tool parsing |
| sandboxing | Platform-specific sandbox enforcement — Seatbelt profiles, Landlock, Windows restricted tokens |
| config | Configuration loading — config.toml parsing, model provider resolution |
| codex-mcp | MCP client implementation — connecting to external MCP servers |
| mcp-server | MCP server implementation — lets other agents use Codex as a tool |
| codex-api | API client layer — OpenAI Responses API, retry logic, streaming |
| execpolicy | Execution policy engine — rule-based allow/deny evaluation for commands and network access |
| skills | Embedded system skills — bundled Markdown procedures installed into CODEX_HOME/skills/.system |
| app-server | App server for IDE integration — protocol handling for VS Code, Cursor, Windsurf extensions |
| app-server-client | Client library for communicating with the app server |
| ollama | Ollama integration — local model provider support |
| lmstudio | LM Studio integration — local model provider support |
| model-provider-info | Built-in model provider registry — OpenAI defaults plus user-defined providers from config |
| state | SQLite-backed state DB — session persistence, rollout tracking |
| login | ChatGPT authentication — sign-in flow, API key management, keyring store |
| linux-sandbox | Linux sandbox helper — bubblewrap integration with Landlock fallback |
| windows-sandbox-rs | Windows restricted-token sandbox — split-filesystem policy enforcement |
| collaboration-mode-templates | Templates for multi-agent collaboration workflows |
The workspace uses Rust 2024 edition with resolver = "2", fat LTO
in release mode, and symbol stripping for minimal binary size. The Cargo.toml
enforces a strict clippy lint policy: unwrap_used,
expect_used, needless_borrow, and dozens of other
pedantic checks are all set to deny. This is a codebase that takes
correctness seriously.
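The settings described above might look roughly like this in the workspace Cargo.toml. This is a hedged reconstruction, not a copy of the repository's manifest: the lint names, resolver version, LTO mode, and symbol stripping come from the text, while the exact layout is illustrative.

```toml
# Hedged sketch of codex-rs/Cargo.toml settings described in the text;
# the layout is illustrative, not copied from the repository.
[workspace]
resolver = "2"

[workspace.lints.clippy]
unwrap_used = "deny"
expect_used = "deny"
needless_borrow = "deny"
# ...dozens of further pedantic lints, all set to deny

[profile.release]
lto = "fat"       # fat link-time optimization across the whole workspace
strip = "symbols" # strip symbols for minimal binary size
```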
The tool layer: schemas, handlers, and runtimes
Codex's tool architecture is one of the most systematic in this entire study. Tools flow through three layers:
1. Definition — Tools are defined in codex-tools as ToolSpec,
ToolDefinition, and ResponsesApiTool structs.
Schemas use JSON Schema (JsonSchema via schemars)
with support for freeform text input and structured argument objects.
2. Conversion — Functions like tool_definition_to_responses_api_tool(),
mcp_tool_to_responses_api_tool(), and
dynamic_tool_to_responses_api_tool() convert internal specs
into OpenAI Responses API tool objects — the wire format the model receives.
3. Dispatch — At runtime, codex-core dispatches to handler structs
(ShellHandler, ApplyPatchHandler,
McpHandler, ListDirHandler, etc.) that
implement the ToolHandler trait with
PreToolUsePayload and PostToolUsePayload hooks.
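The dispatch layer can be pictured with a minimal sketch. The trait and payload names (ToolHandler, PreToolUsePayload, PostToolUsePayload) appear in the source; the signatures, fields, and the EchoHandler stand-in are assumptions made so the example is self-contained and runnable.

```rust
// Illustrative sketch of the handler pattern described above; the exact
// signatures in the real codex-core crate will differ.
struct PreToolUsePayload { tool_name: String, arguments: String }
struct PostToolUsePayload { tool_name: String, output: String }

trait ToolHandler {
    /// Hook called before the tool runs (e.g. policy checks, logging).
    fn pre_tool_use(&self, payload: &PreToolUsePayload);
    /// Execute the tool against its JSON-encoded arguments.
    fn run(&self, arguments: &str) -> String;
    /// Hook called after the tool runs (e.g. transcript recording).
    fn post_tool_use(&self, payload: &PostToolUsePayload);
}

/// A stand-in for handlers like ShellHandler or ListDirHandler.
struct EchoHandler;

impl ToolHandler for EchoHandler {
    fn pre_tool_use(&self, p: &PreToolUsePayload) {
        println!("pre: {} {}", p.tool_name, p.arguments);
    }
    fn run(&self, arguments: &str) -> String {
        format!("echo: {arguments}")
    }
    fn post_tool_use(&self, p: &PostToolUsePayload) {
        println!("post: {} -> {}", p.tool_name, p.output);
    }
}

/// Dispatch mirrors the flow described above: fire the pre hook,
/// run the tool, fire the post hook, return the output.
fn dispatch(handler: &dyn ToolHandler, tool_name: &str, arguments: &str) -> String {
    handler.pre_tool_use(&PreToolUsePayload {
        tool_name: tool_name.to_string(),
        arguments: arguments.to_string(),
    });
    let output = handler.run(arguments);
    handler.post_tool_use(&PostToolUsePayload {
        tool_name: tool_name.to_string(),
        output: output.clone(),
    });
    output
}

fn main() {
    let out = dispatch(&EchoHandler, "echo", "{\"text\":\"hi\"}");
    assert_eq!(out, "echo: {\"text\":\"hi\"}");
}
```

The pre/post hook split is what lets a single dispatcher apply cross-cutting concerns (approval checks, transcripts, telemetry) uniformly across every handler.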
Built-in tools
Codex ships with a rich catalog of first-class tools:
- shell / shell_command — command execution with zsh-fork backend on macOS, classic Unix escalation elsewhere
- apply_patch — unified diff patch application (simulated via a virtual CLI when invoked with --codex-run-as-apply-patch)
- list_dir — directory listing with gitignore awareness
- grep / tool_search — file content search with filtering
- tool_suggest — tool discovery and suggestion
- view_image — image viewing with detail level control
- js_repl — JavaScript REPL via the V8 engine (v8-poc crate)
- request_user_input — interactive prompts for user clarification
- request_permissions — runtime permission escalation
Multi-agent tools (v1 and v2)
Codex supports spawning and managing child agents:
- spawn_agent (v1 and v2) — create isolated child agents
- wait_agent (v1 and v2) — block on child completion with timeout
- send_input / send_message — send messages to running agents
- close_agent — terminate a child agent
- resume_agent — resume a paused agent
- list_agents — enumerate active child agents
- followup_task — assign follow-up work to an agent
The v2 API introduces SpawnAgentToolOptions and
WaitAgentTimeoutOptions for finer-grained control.
Shell execution: zsh-fork backend and escalation
Codex does not just pass commands to /bin/sh. The shell handler
has two backends:
- Zsh Fork Backend — On macOS, Codex uses a dedicated zsh-fork backend (tools/runtimes/shell/zsh_fork_backend.rs) that avoids fork-related vulnerabilities in the standard shell escalation path. This is a response to macOS-specific security considerations around process spawning.
- Classic Unix Escalation — On other platforms, the classic escalation path (tools/runtimes/shell/unix_escalation.rs) handles permission elevation and command validation.
The shell handler also integrates with the execpolicy engine, which
parses policy rules with a domain-specific language supporting prefix patterns,
network rules (with protocol granularity), and command whitelists/blacklists.
Safety checking also uses is_known_safe_command() from
codex-shell-command for a fast-path allowlist before deeper policy
evaluation.
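The two-stage check described above can be sketched as follows. The name is_known_safe_command comes from the source; the Decision enum, the prefix-rule representation, and the concrete rules are illustrative assumptions, not the real execpolicy DSL.

```rust
// Conceptual sketch of the two-stage safety check: a fast-path allowlist,
// then prefix-pattern policy rules, with "ask the user" as the fallback.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Decision { Allow, Deny, Ask }

/// Fast path: commands that are always safe to run without deeper checks.
/// (Hypothetical allowlist; the real one lives in codex-shell-command.)
fn is_known_safe_command(argv: &[&str]) -> bool {
    matches!(argv.first(), Some(&"ls") | Some(&"pwd") | Some(&"cat"))
}

/// A prefix rule: if the argv starts with `prefix`, apply `decision`.
struct PrefixRule { prefix: Vec<&'static str>, decision: Decision }

fn evaluate(argv: &[&str], rules: &[PrefixRule]) -> Decision {
    if is_known_safe_command(argv) {
        return Decision::Allow;
    }
    // First matching prefix wins; anything unmatched falls back to Ask,
    // i.e. the user is prompted for approval.
    for rule in rules {
        if argv.len() >= rule.prefix.len()
            && argv[..rule.prefix.len()] == rule.prefix[..]
        {
            return rule.decision;
        }
    }
    Decision::Ask
}

fn main() {
    let rules = [
        // More specific rules are listed first so they take precedence.
        PrefixRule { prefix: vec!["git", "push"], decision: Decision::Deny },
        PrefixRule { prefix: vec!["git"], decision: Decision::Allow },
    ];
    assert_eq!(evaluate(&["ls", "-la"], &rules), Decision::Allow);
    assert_eq!(evaluate(&["git", "status"], &rules), Decision::Allow);
    assert_eq!(evaluate(&["git", "push"], &rules), Decision::Deny);
    assert_eq!(evaluate(&["rm", "-rf", "/"], &rules), Decision::Ask);
}
```

The key design point is ordering: the cheap allowlist avoids invoking the full policy engine for the overwhelmingly common safe commands.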
Sandboxing: three platform-specific enforcement layers
Codex's sandbox story is one of the most sophisticated in this study. Rather than a one-size-fits-all approach, Codex implements platform-native enforcement:
macOS: Seatbelt (sandbox-exec)
Uses /usr/bin/sandbox-exec with Seatbelt profiles. The
workspace-write policy allows writes under configured roots
while keeping .git, resolved gitdir: targets,
and .codex read-only. Legacy user-preference-read access is
preserved for cfprefs-backed behavior. Network and filesystem roots are
controlled by SandboxPolicy.
Linux: bubblewrap + Landlock
Prefers system bwrap (found on PATH outside cwd) with a
vendored fallback compiled into the binary. Landlock is used as a fallback
when bubblewrap cannot create user namespaces (e.g., WSL1). Split filesystem
policies with carveouts (e.g., /repo = write,
/repo/a = none, /repo/a/b = write) automatically
route through bubblewrap. WSL2 uses the normal Linux path; WSL1 is not
supported for bubblewrap.
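The nested carveout semantics (/repo = write, /repo/a = none, /repo/a/b = write) can be modeled as a longest-prefix match over configured roots. This is a conceptual sketch under that assumption, not the real SandboxPolicy type.

```rust
// Illustrative model of split-filesystem policy: the deepest configured
// path prefix decides access, producing nested carveouts.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Access { Write, None }

fn access_for(path: &str, policy: &[(&str, Access)]) -> Access {
    policy
        .iter()
        // Keep rules whose root contains `path` (match on a path boundary).
        .filter(|(root, _)| {
            path == *root || path.starts_with(format!("{root}/").as_str())
        })
        // The longest (most specific) matching root wins.
        .max_by_key(|(root, _)| root.len())
        .map(|(_, access)| *access)
        // Unconfigured paths are not writable.
        .unwrap_or(Access::None)
}

fn main() {
    let policy = [
        ("/repo", Access::Write),
        ("/repo/a", Access::None),
        ("/repo/a/b", Access::Write),
    ];
    assert_eq!(access_for("/repo/src/main.rs", &policy), Access::Write);
    assert_eq!(access_for("/repo/a/secrets.txt", &policy), Access::None);
    assert_eq!(access_for("/repo/a/b/out.log", &policy), Access::Write);
    assert_eq!(access_for("/etc/passwd", &policy), Access::None);
}
```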
Windows: restricted tokens + split filesystem
The elevated setup/runner backend supports legacy ReadOnlyAccess::Restricted
and split filesystem policies. Backend-managed system read roots include
C:\Windows, C:\Program Files, and
C:\ProgramData when include_platform_defaults = true.
The unelevated restricted-token backend supports a narrower subset, and
policies that would require unreadable carveouts fail closed.
Sandbox CLI for testing
Codex exposes codex sandbox {macos,linux,windows} subcommands
so developers can test what happens when commands run under the sandbox
without involving the agent loop. The legacy aliases
codex debug seatbelt and codex debug landlock
are still supported.
Danger mode exists
codex --sandbox danger-full-access disables sandboxing entirely.
Codex explicitly warns users to only use this when already running in a
container or other isolated environment. The same setting can be persisted
via sandbox_mode = "danger-full-access" in
~/.codex/config.toml.
Model handling: OpenAI-first but provider-extensible
Codex is built by OpenAI, and its native wire protocol is the Responses API
(wire_api = "responses"). The legacy Chat Completions API
(wire_api = "chat") is explicitly no longer supported and produces
a migration error at config parse time. However, Codex is not hard-coded to
OpenAI's cloud:
Built-in provider defaults
The codex-model-provider-info crate ships with built-in
defaults for the OpenAI provider, so Codex works out-of-the-box. The
WireApi::Responses enum is currently the only supported
wire protocol.
User-defined providers in config.toml
Users can define additional providers under model_providers
in ~/.codex/config.toml. These override or extend the defaults
at runtime, enabling use of local models via Ollama, LM Studio, or any
OpenAI-compatible endpoint.
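A provider override might look something like the fragment below. The model_providers table and the wire_api key come from the text; the remaining key names, the top-level model selection, and the endpoint URL are hedged assumptions shown for illustration.

```toml
# Hypothetical ~/.codex/config.toml fragment; key names other than
# model_providers and wire_api are illustrative.
model_provider = "ollama"   # select the provider defined below
model = "qwen2.5-coder"     # illustrative local model name

[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses"      # the only wire protocol Codex supports
```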
Dedicated integration crates
Codex bundles dedicated provider integration crates:
codex-ollama and codex-lmstudio. These are
not thin config wrappers — they are full client implementations with
their own discovery, connection, and error-handling logic.
Retry logic is configurable: defaults include a 300,000 ms (five-minute)
stream idle timeout, a maximum of 5 stream retries, and 4 request retries,
with user-configured values hard-capped at 100. OpenTelemetry tracing is
available via the codex-otel crate.
MCP: client and server in one binary
Codex does not just consume MCP — it also provides it. Two separate crates handle this:
MCP Client (codex-mcp)
Codex functions as an MCP client that connects to MCP servers on startup.
Servers are configured in ~/.codex/config.toml under
mcp_servers. Per-tool approval overrides are supported via
mcp_servers.<name>.tools.<tool>.approval_mode.
Deferred tool loading is supported: MCP tools can be declared as deferred,
meaning they are not loaded into the prompt until the agent explicitly
requests them, reducing context pressure.
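A per-tool approval override follows the mcp_servers.<name>.tools.<tool>.approval_mode path quoted above. The server-definition keys and the approval-mode value in this fragment are assumptions added to make the example concrete.

```toml
# Hedged sketch: the approval_mode path is from the text; the command/args
# keys and the "never" value are illustrative assumptions.
[mcp_servers.docs]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/repo/docs"]

[mcp_servers.docs.tools.read_file]
approval_mode = "never"   # illustrative: auto-approve this one tool
```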
MCP Server (codex-mcp-server)
Run codex mcp-server and Codex becomes an MCP server, allowing
other MCP clients to use Codex itself as a tool. This can be tested with
npx @modelcontextprotocol/inspector codex mcp-server. The
codex mcp subcommand manages MCP server launchers (add/list/get/remove)
defined in config.
Subagent architecture: spawn, wait, send, close
Codex supports multi-agent workflows through a structured subagent API. The
handler code lives in two parallel module trees: multi_agents (v1)
and multi_agents_v2 (v2), reflecting an evolving API design:
| Tool | Function | V2 additions |
|---|---|---|
| spawn_agent | Create an isolated child agent with a focused prompt | v2 adds SpawnAgentToolOptions for fine-grained control |
| wait_agent | Block until a child agent completes or times out | v2 adds WaitAgentTimeoutOptions |
| send_input (v1) / send_message (v2) | Send messages/input to a running child agent | v2 renames and extends the messaging protocol |
| close_agent | Terminate a child agent | — |
| resume_agent | Resume a paused child agent | — |
| list_agents | Enumerate active child agents | — |
| followup_task | Assign follow-up work to an agent (v2 only) | v2-exclusive feature |
Codex also supports agent jobs — a higher-level workflow
abstraction via spawn_agents_on_csv_tool and
create_report_agent_job_result_tool, which allows spawning
multiple agents from CSV input and collecting structured results. This is
distinct from the lower-level spawn/wait API and is designed for batch
processing scenarios.
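The spawn/wait lifecycle can be modeled in miniature with OS threads standing in for child agents. The tool names (spawn_agent, wait_agent, list_agents) come from the source; the AgentPool type and everything else here is an illustrative assumption, since the real tools manage full Codex sessions.

```rust
// Conceptual model of the spawn/wait subagent lifecycle; threads stand in
// for isolated child agents.
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::Duration;

struct Agent { result_rx: Receiver<String> }

struct AgentPool { next_id: u64, agents: HashMap<u64, Agent> }

impl AgentPool {
    fn new() -> Self { Self { next_id: 0, agents: HashMap::new() } }

    /// spawn_agent: start an isolated worker with a focused prompt.
    fn spawn_agent(&mut self, prompt: String) -> u64 {
        let (tx, rx) = channel();
        thread::spawn(move || {
            // A real child agent would run a full turn loop here.
            let _ = tx.send(format!("done: {prompt}"));
        });
        let id = self.next_id;
        self.next_id += 1;
        self.agents.insert(id, Agent { result_rx: rx });
        id
    }

    /// wait_agent: block until the child finishes or the timeout elapses.
    fn wait_agent(&mut self, id: u64, timeout: Duration) -> Option<String> {
        let agent = self.agents.get(&id)?;
        match agent.result_rx.recv_timeout(timeout) {
            Ok(result) => {
                self.agents.remove(&id); // completed agents leave the pool
                Some(result)
            }
            Err(_) => None, // timed out (or worker vanished); agent stays listed
        }
    }

    /// list_agents: enumerate the ids of still-tracked children.
    fn list_agents(&self) -> Vec<u64> { self.agents.keys().copied().collect() }
}

fn main() {
    let mut pool = AgentPool::new();
    let id = pool.spawn_agent("summarize the diff".to_string());
    let result = pool.wait_agent(id, Duration::from_secs(1));
    assert_eq!(result.as_deref(), Some("done: summarize the diff"));
    assert!(pool.list_agents().is_empty());
}
```

The timeout on wait is the essential ingredient: it lets a parent fan out several children, poll them with bounded waits, and keep making progress even when one child stalls.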
Configuration: TOML, not JSON
Unlike many agents in this study that use JSON config, Codex uses
config.toml (the legacy TypeScript CLI used config.json,
but the Rust version deliberately switched). Key configuration areas:
- Sandbox mode — sandbox_mode = "read-only" | "workspace-write" | "danger-full-access"
- Model providers — [model_providers] table for user-defined providers
- MCP servers — [mcp_servers] table with per-server command, transport, and per-tool approval overrides
- Notification hooks — script run on turn completion; auto-detects WSL2 + Windows Terminal for native toast notifications
- SQLite state — sqlite_home config key or CODEX_SQLITE_HOME env var for state DB location
- Custom CA certificates — CODEX_CA_CERTIFICATE env var for enterprise proxy scenarios
- Plan mode reasoning — plan_mode_reasoning_effort for model-specific reasoning presets
- Notices — [notice] table for "do not show again" UI prompt flags
- Custom CA bundles — supports multiple PEM certs, OpenSSL TRUSTED CERTIFICATE labels, and X509 CRL sections
A generated JSON Schema for config.toml lives at
codex-rs/core/config.schema.json, enabling IDE autocomplete
and validation.
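Pulling a few of those areas together, a minimal config.toml might look like the sketch below. Only sandbox_mode, sqlite_home, plan_mode_reasoning_effort, and the [notice] table are named in the text; the values and the flag inside [notice] are hypothetical.

```toml
# Hedged ~/.codex/config.toml sketch; values and the [notice] key are
# illustrative, not documented defaults.
sandbox_mode = "workspace-write"
sqlite_home = "/home/me/.codex/state"   # or set CODEX_SQLITE_HOME instead
plan_mode_reasoning_effort = "high"     # illustrative preset value

[notice]
hide_onboarding_prompt = true           # hypothetical "do not show again" flag
```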
TUI: Ratatui with frames, styles, and tooltips
Codex's terminal UI is built with Ratatui (forked patches
for color query support) and organized into a frames/ directory
holding distinct UI panels. A dedicated styles.md file and
tooltips.txt file in the TUI crate suggest a design-conscious
approach — someone cared about how the terminal looks, not just what it does.
The TUI also features a prompt_for_init_command.md template,
suggesting an onboarding flow that guides new users through their first
interaction. Notifications surface as native toast on Windows (when WSL2
inside Windows Terminal is detected) and via configurable scripts on macOS
(terminal-notifier).
Skills system
Codex includes a codex-skills crate that manages embedded
system skills — procedural Markdown instructions bundled at compile time
via include_dir! and installed into
CODEX_HOME/skills/.system on first run. A marker file with a
fingerprint avoids redundant re-installation. This is a lightweight but
deliberate skills mechanism, similar in spirit to Claude Code's built-in
instructions and Hermes's SKILL.md files, but focused on
system-level guidance rather than learned procedures.
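The install-once pattern described above can be sketched as follows: fingerprint the embedded bundle, compare against a marker file, and copy only on mismatch. The include_dir! macro and the CODEX_HOME/skills/.system path come from the text; the hashing scheme, marker filename, and the static slice standing in for the bundle are assumptions.

```rust
// Illustrative install-once flow for embedded skills; a static slice
// stands in for the include_dir! bundle used by the real crate.
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Stand-in for the embedded bundle: (relative path, contents).
const SYSTEM_SKILLS: &[(&str, &str)] = &[
    ("review.md", "# Review procedure\nCheck the diff before approving."),
];

/// Fingerprint the bundle so a changed build triggers re-installation.
fn fingerprint() -> String {
    let mut hasher = DefaultHasher::new();
    SYSTEM_SKILLS.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Install skills into <codex_home>/skills/.system unless the marker file
/// already records the current fingerprint. Returns true if files were written.
fn install_system_skills(codex_home: &Path) -> std::io::Result<bool> {
    let target = codex_home.join("skills/.system");
    let marker = target.join(".fingerprint"); // hypothetical marker name
    let fp = fingerprint();
    if fs::read_to_string(&marker).map(|m| m == fp).unwrap_or(false) {
        return Ok(false); // up to date, skip the copy
    }
    fs::create_dir_all(&target)?;
    for (name, body) in SYSTEM_SKILLS {
        fs::write(target.join(name), body)?;
    }
    fs::write(&marker, fp)?;
    Ok(true)
}

fn main() -> std::io::Result<()> {
    let home = std::env::temp_dir().join("codex-skills-demo");
    install_system_skills(&home)?;
    // A second run finds a matching fingerprint and is a no-op.
    assert!(!install_system_skills(&home)?);
    Ok(())
}
```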
IDE integration: app server protocol
Codex is not just a CLI. The app-server,
app-server-client, app-server-protocol, and
app-server-test-client crates form an integration layer for
IDE extensions. Codex supports running as an extension in VS Code, Cursor,
and Windsurf, with a dedicated codex app command for the
desktop app experience. The app server protocol handles initialization,
client info reporting (the TUI reports codex-tui), and
turn lifecycle events.
Collaboration mode
Codex includes a collaboration-mode-templates crate, suggesting
support for multi-user or multi-instance collaboration scenarios. While the
exact user-facing flow is not fully documented in the local repo, the presence
of dedicated templates and collaboration tool specs
(create_close_agent_tool, agent job tools for CSV fanout and
reporting) indicates a deliberate investment in multi-agent orchestration
beyond simple parent-child spawning.
Authentication: ChatGPT plans and API keys
Codex's recommended setup is to sign in with a ChatGPT account (Plus, Pro,
Business, Edu, or Enterprise plans). The codex-login crate
handles the sign-in flow, while codex-keyring-store persists
credentials using the platform's native keyring. API key authentication is
also supported but requires additional setup via the OpenAI developer docs.
Strengths
Platform-native sandboxing
Three separate sandbox implementations — macOS Seatbelt, Linux bubblewrap/Landlock, and Windows restricted tokens — each with split-filesystem awareness and carveout support. This is a level of platform-specific hardening unmatched by any other agent in this study.
Modular workspace architecture
70+ crates with narrow responsibilities make the codebase auditable and
testable. The strict clippy lint policy (no unwrap_used,
no expect_used, no needless_borrow) signals
engineering discipline.
MCP bidirectional
Codex is both an MCP client and an MCP server. This means it can consume external tools and also be consumed as a tool by other agents. Few agents in this study offer this symmetry.
IDE and desktop presence
Codex runs in VS Code, Cursor, and Windsurf via an app server protocol,
and offers a desktop app via codex app. It bridges the gap
between terminal power users and developer tooling ecosystems.
Weaknesses
OpenAI-centric by design
While local providers (Ollama, LM Studio) are supported, the core wire
protocol is the OpenAI Responses API. The WireApi enum
currently has a single variant (Responses), and the legacy
Chat Completions API was explicitly removed. This is less flexible than
agents like Mux or Qwen Code that abstract across many API shapes.
Complexity surface area
With 3,800+ files and 70+ crates, Codex is one of the larger repos in
this study. The modular architecture helps, but the learning curve for
contributors is steep — particularly for understanding the interplay
between core, tools, execpolicy,
and sandboxing.
No built-in learning loop
Unlike Hermes, Codex does not learn from prior sessions or refine its own procedures over time. Its skills system is static (bundled at compile time) rather than adaptive. Memory, if any, is limited to the SQLite session history without automated skill extraction.
Rust-only contribution barrier
The entire runtime is Rust. While this ensures performance and safety, it limits the pool of potential contributors compared to TypeScript or Python agents where the barrier to entry is lower.
Where Codex fits in the landscape
Codex occupies a space between Claude Code and Crush: like Claude Code, it is a deeply integrated product with its own runtime, tool catalog, permission system, and TUI; like Crush, it is a native binary (Rust, not Go) with a strong emphasis on correctness, sandboxing, and cross-platform distribution. But Codex stands apart in its bidirectional MCP support, its multi-agent job execution framework, and its IDE extension ecosystem.
If Claude Code is the most Anthropic-native agent, Codex is the most OpenAI-native one. But unlike Claude Code's tight coupling to Anthropic's ecosystem, Codex at least attempts provider extensibility — even if the wire protocol remains Responses API-shaped.