AI Coding Guides Deep Dives
OpenAI • Rust Native • Apache-2.0 • 70+ Crates • Multi-Provider

Codex CLI: OpenAI's Production Coding Agent

Codex CLI is OpenAI's own coding agent, built in Rust as a zero-dependency native binary. It ships with platform-specific sandboxes (macOS Seatbelt, Linux bubblewrap/Landlock, Windows restricted tokens), a Ratatui TUI, an MCP client and server, multi-agent spawning, OpenAI-native Responses API plumbing, and a configurable provider system that supports local models, Ollama, and LM Studio out of the box.

(Alright, ad over. Back to the serious technical analysis.)

What is Codex CLI?

Codex CLI (@openai/codex) is a coding agent from OpenAI that runs locally on your computer. It is distributed as a standalone native binary — no Node.js, no Python, no JVM — built from a Rust workspace of 70+ crates totaling approximately 3,805 files (roughly 1,418 Rust source files in the core workspace alone). The project is licensed under the Apache-2.0 License.

The Rust implementation is the maintained Codex CLI — a successor to an earlier TypeScript CLI that OpenAI no longer actively develops. The Rust version ships with features the TypeScript version never supported: rich config.toml parsing, MCP client and server modes, an exec headless automation CLI, collaboration mode, and platform-specific sandbox hardening.

📦

Installation

Codex is installed via npm (npm i -g @openai/codex) or Homebrew (brew install --cask codex), but the distributed artifact is a compiled Rust binary — the npm package is just a thin wrapper that downloads the platform-specific native executable.

Code organization: a workspace of 70+ crates

Codex's codex-rs/ directory is a Cargo workspace with over 70 member crates. This is not a monolith — it is a modular architecture where each crate has a narrow responsibility. The key crates are:

  • core — Business logic: session management, tool routing, turn orchestration, thread rollout, model routing
  • tui — Full-screen Ratatui-based terminal UI: frames, panels, styles, tooltips
  • cli — Multi-tool CLI entry point: subcommands for codex, codex exec, codex sandbox, codex mcp
  • exec — Headless/non-interactive mode for automation: codex exec PROMPT runs until done and exits
  • tools — Shared tool schemas: Responses API tool primitives, JSON schemas, MCP tool adapters, dynamic tool parsing
  • sandboxing — Platform-specific sandbox enforcement: Seatbelt profiles, Landlock, Windows restricted tokens
  • config — Configuration loading: config.toml parsing, model provider resolution
  • codex-mcp — MCP client implementation: connecting to external MCP servers
  • mcp-server — MCP server implementation: lets other agents use Codex as a tool
  • codex-api — API client layer: OpenAI Responses API, retry logic, streaming
  • execpolicy — Execution policy engine: rule-based allow/deny evaluation for commands and network access
  • skills — Embedded system skills: bundled Markdown procedures installed into CODEX_HOME/skills/.system
  • app-server — App server for IDE integration: protocol handling for VS Code, Cursor, Windsurf extensions
  • app-server-client — Client library for communicating with the app server
  • ollama — Ollama integration: local model provider support
  • lmstudio — LM Studio integration: local model provider support
  • model-provider-info — Built-in model provider registry: OpenAI defaults plus user-defined providers from config
  • state — SQLite-backed state DB: session persistence, rollout tracking
  • login — ChatGPT authentication: sign-in flow, API key management, keyring store
  • linux-sandbox — Linux sandbox helper: bubblewrap integration with Landlock fallback
  • windows-sandbox-rs — Windows restricted-token sandbox: split-filesystem policy enforcement
  • collaboration-mode-templates — Templates for multi-agent collaboration workflows

The workspace uses Rust 2024 edition with resolver = "2", fat LTO in release mode, and symbol stripping for minimal binary size. The Cargo.toml enforces a strict clippy lint policy: unwrap_used, expect_used, needless_borrow, and dozens of other pedantic checks are all set to deny. This is a codebase that takes correctness seriously.
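A condensed sketch of the kind of workspace manifest this describes; the actual codex-rs/Cargo.toml is far larger and its exact layout may differ, but the edition, resolver, release profile, and denied clippy lints below are the ones named above:

```toml
# Sketch only; mirrors the settings described in the text, not the real file.
[workspace]
resolver = "2"

[workspace.package]
edition = "2024"

[profile.release]
lto = "fat"    # whole-program link-time optimization
strip = true   # strip symbols for a smaller binary

[workspace.lints.clippy]
unwrap_used = "deny"
expect_used = "deny"
needless_borrow = "deny"
```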

The tool layer: schemas, handlers, and runtimes

Codex's tool architecture is one of the most systematic in this entire study. Tools flow through three layers:

1. Schema definition

Tools are defined in codex-tools as ToolSpec, ToolDefinition, and ResponsesApiTool structs. Schemas use JSON Schema (JsonSchema via schemars) with support for freeform text input and structured argument objects.

2. Conversion to API format

Functions like tool_definition_to_responses_api_tool(), mcp_tool_to_responses_api_tool(), and dynamic_tool_to_responses_api_tool() convert internal specs into OpenAI Responses API tool objects — the wire format the model receives.
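For illustration, a converted function tool on the wire looks roughly like the JSON below. The overall shape follows the public Responses API function-tool format; the list_dir name and its parameters are invented for this example:

```json
{
  "type": "function",
  "name": "list_dir",
  "description": "List the contents of a directory",
  "parameters": {
    "type": "object",
    "properties": {
      "path": { "type": "string" }
    },
    "required": ["path"]
  }
}
```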

3. Handler dispatch

At runtime, codex-core dispatches to handler structs (ShellHandler, ApplyPatchHandler, McpHandler, ListDirHandler, etc.) that implement the ToolHandler trait with PreToolUsePayload and PostToolUsePayload hooks.

Built-in tools

Codex ships with a rich catalog of first-class tools:

  • shell / shell_command — command execution with zsh-fork backend on macOS, classic Unix escalation elsewhere
  • apply_patch — unified diff patch application (simulated via a virtual CLI when invoked with --codex-run-as-apply-patch)
  • list_dir — directory listing with gitignore awareness
  • grep / tool_search — file content search with filtering
  • tool_suggest — tool discovery and suggestion
  • view_image — image viewing with detail level control
  • js_repl — JavaScript REPL via V8 engine (v8-poc crate)
  • request_user_input — interactive prompts for user clarification
  • request_permissions — runtime permission escalation

Multi-agent tools (v1 and v2)

Codex supports spawning and managing child agents:

  • spawn_agent (v1 and v2) — create isolated child agents
  • wait_agent (v1 and v2) — block on child completion with timeout
  • send_input / send_message — send messages to running agents
  • close_agent — terminate a child agent
  • resume_agent — resume a paused agent
  • list_agents — enumerate active child agents
  • followup_task — assign follow-up work to an agent

The v2 API introduces SpawnAgentToolOptions and WaitAgentTimeoutOptions for finer-grained control.

Shell execution: zsh-fork backend and escalation

Codex does not just pass commands to /bin/sh. The shell handler has two backends: a zsh-fork backend on macOS, and a classic Unix escalation path on other platforms.

The shell handler also integrates with the execpolicy engine, which parses policy rules with a domain-specific language supporting prefix patterns, network rules (with protocol granularity), and command whitelists/blacklists. Safety checking also uses is_known_safe_command() from codex-shell-command for a fast-path allowlist before deeper policy evaluation.

Sandboxing: three platform-specific enforcement layers

Codex's sandbox story is one of the most sophisticated in this study. Rather than a one-size-fits-all approach, Codex implements platform-native enforcement:

macOS: Seatbelt (sandbox-exec)

Uses /usr/bin/sandbox-exec with Seatbelt profiles. The workspace-write policy allows writes under configured roots while keeping .git, resolved gitdir: targets, and .codex read-only. Legacy user-preference-read access is preserved for cfprefs-backed behavior. Network and filesystem roots are controlled by SandboxPolicy.

Linux: bubblewrap + Landlock

Prefers system bwrap (found on PATH outside cwd) with a vendored fallback compiled into the binary. Landlock is used as a fallback when bubblewrap cannot create user namespaces (e.g., WSL1). Split filesystem policies with carveouts (e.g., /repo = write, /repo/a = none, /repo/a/b = write) automatically route through bubblewrap. WSL2 uses the normal Linux path; WSL1 is not supported for bubblewrap.

Windows: restricted tokens + split filesystem

The elevated setup/runner backend supports legacy ReadOnlyAccess::Restricted and split filesystem policies. Backend-managed system read roots include C:\Windows, C:\Program Files, and C:\ProgramData when include_platform_defaults = true. The unelevated restricted-token backend supports a narrower subset, and policies that would require unreadable carveouts fail closed.

Sandbox CLI for testing

Codex exposes codex sandbox {macos,linux,windows} subcommands so developers can test what happens when commands run under the sandbox without involving the agent loop. The legacy aliases codex debug seatbelt and codex debug landlock are still supported.

⚠️

Danger mode exists

codex --sandbox danger-full-access disables sandboxing entirely. Codex explicitly warns users to only use this when already running in a container or other isolated environment. The same setting can be persisted via sandbox_mode = "danger-full-access" in ~/.codex/config.toml.

Model handling: OpenAI-first but provider-extensible

Codex is built by OpenAI, and its native wire protocol is the Responses API (wire_api = "responses"). The legacy Chat Completions API (wire_api = "chat") is explicitly no longer supported and produces a migration error at config parse time. However, Codex is not hard-coded to OpenAI's cloud:

Built-in provider defaults

The codex-model-provider-info crate ships with built-in defaults for the OpenAI provider, so Codex works out of the box. WireApi::Responses is currently the enum's only supported variant.

User-defined providers in config.toml

Users can define additional providers under model_providers in ~/.codex/config.toml. These override or extend the defaults at runtime, enabling use of local models via Ollama, LM Studio, or any OpenAI-compatible endpoint.
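A provider entry might look like the sketch below. The model_providers table and wire_api key are named in this article; the nested field names (name, base_url) and the Ollama URL are assumptions to verify against the config schema:

```toml
# Hypothetical ~/.codex/config.toml fragment for a local Ollama endpoint.
[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses"
```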

Dedicated integration crates

Codex bundles dedicated provider integration crates: codex-ollama and codex-lmstudio. These are not thin config wrappers — they are full client implementations with their own discovery, connection, and error-handling logic.

Retry logic is configurable: defaults include a 300,000 ms stream idle timeout, 5 stream max retries, and 4 request max retries, with hard caps of 100 for user-configured values. OpenTelemetry tracing is available via the codex-otel crate.
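Expressed as config, those defaults might look like the following; the key names here are assumptions mirroring the described defaults, not confirmed field names:

```toml
# Illustrative only; check the generated config schema for the real fields.
stream_idle_timeout_ms = 300_000  # 300,000 ms stream idle timeout
stream_max_retries = 5            # user-configured values capped at 100
request_max_retries = 4
```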

MCP: client and server in one binary

Codex does not just consume MCP — it also provides it. Two separate crates handle this:

MCP Client (codex-mcp)

Codex functions as an MCP client that connects to MCP servers on startup. Servers are configured in ~/.codex/config.toml under mcp_servers. Per-tool approval overrides are supported via mcp_servers.<name>.tools.<tool>.approval_mode. Deferred tool loading is supported: MCP tools can be declared as deferred, meaning they are not loaded into the prompt until the agent explicitly requests them, reducing context pressure.
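A minimal server entry could look like this. The per-tool approval path is quoted from the article; the command/args shape and the "never" value are assumptions based on common MCP config conventions:

```toml
# Hypothetical MCP server entry in ~/.codex/config.toml.
[mcp_servers.docs]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/docs"]

# Per-tool approval override, using the path given above:
[mcp_servers.docs.tools.read_file]
approval_mode = "never"
```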

MCP Server (codex-mcp-server)

Run codex mcp-server and Codex becomes an MCP server, allowing other MCP clients to use Codex itself as a tool. This can be tested with npx @modelcontextprotocol/inspector codex mcp-server. The codex mcp subcommand manages MCP server launchers (add/list/get/remove) defined in config.

Subagent architecture: spawn, wait, send, close

Codex supports multi-agent workflows through a structured subagent API. The handler code lives in two parallel module trees: multi_agents (v1) and multi_agents_v2 (v2), reflecting an evolving API design:

  • spawn_agent — create an isolated child agent with a focused prompt; v2 adds SpawnAgentToolOptions for fine-grained control
  • wait_agent — block until a child agent completes or times out; v2 adds WaitAgentTimeoutOptions
  • send_input (v1) / send_message (v2) — send messages/input to a running child agent; v2 renames and extends the messaging protocol
  • close_agent — terminate a child agent
  • resume_agent — resume a paused child agent
  • list_agents — enumerate active child agents
  • followup_task — assign follow-up work to an agent (v2-exclusive)

Codex also supports agent jobs — a higher-level workflow abstraction via spawn_agents_on_csv_tool and create_report_agent_job_result_tool, which allows spawning multiple agents from CSV input and collecting structured results. This is distinct from the lower-level spawn/wait API and is designed for batch processing scenarios.

Configuration: TOML, not JSON

Unlike many agents in this study that use JSON config, Codex uses config.toml (the legacy TypeScript CLI used config.json, but the Rust version deliberately switched). Key configuration areas include model_providers, mcp_servers, and sandbox_mode.

A generated JSON Schema for config.toml lives at codex-rs/core/config.schema.json, enabling IDE autocomplete and validation.
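A minimal sketch of a config.toml using only keys this article mentions; any real config should be validated against codex-rs/core/config.schema.json:

```toml
# Sketch only. sandbox_mode values are those discussed in this article.
sandbox_mode = "workspace-write"   # "danger-full-access" disables sandboxing entirely

# Additional providers and MCP servers hang off these tables:
# [model_providers.<name>]  and  [mcp_servers.<name>]
```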

TUI: Ratatui with frames, styles, and tooltips

Codex's terminal UI is built with Ratatui (forked patches for color query support) and organized into a frames/ directory holding distinct UI panels. A dedicated styles.md file and tooltips.txt file in the TUI crate suggest a design-conscious approach — someone cared about how the terminal looks, not just what it does.

The TUI also features a prompt_for_init_command.md template, suggesting an onboarding flow that guides new users through their first interaction. Notifications surface as native toasts on Windows (when WSL2 inside Windows Terminal is detected) and via configurable scripts on macOS (terminal-notifier).

Skills system

Codex includes a codex-skills crate that manages embedded system skills — procedural Markdown instructions bundled at compile time via include_dir! and installed into CODEX_HOME/skills/.system on first run. A marker file with a fingerprint avoids redundant re-installation. This is a lightweight but deliberate skills mechanism, similar in spirit to Claude Code's built-in instructions and Hermes's SKILL.md files, but focused on system-level guidance rather than learned procedures.

IDE integration: app server protocol

Codex is not just a CLI. The app-server, app-server-client, app-server-protocol, and app-server-test-client crates form an integration layer for IDE extensions. Codex supports running as an extension in VS Code, Cursor, and Windsurf, with a dedicated codex app command for the desktop app experience. The app server protocol handles initialization, client info reporting (the TUI reports codex-tui), and turn lifecycle events.

Collaboration mode

Codex includes a collaboration-mode-templates crate, suggesting support for multi-user or multi-instance collaboration scenarios. While the exact user-facing flow is not fully documented in the local repo, the presence of dedicated templates and collaboration tool specs (create_close_agent_tool, agent job tools for CSV fanout and reporting) indicates a deliberate investment in multi-agent orchestration beyond simple parent-child spawning.

Authentication: ChatGPT plans and API keys

Codex's recommended setup is to sign in with a ChatGPT account (Plus, Pro, Business, Edu, or Enterprise plans). The codex-login crate handles the sign-in flow, while codex-keyring-store persists credentials using the platform's native keyring. API key authentication is also supported but requires additional setup via the OpenAI developer docs.

Strengths

Platform-native sandboxing

Three separate sandbox implementations — macOS Seatbelt, Linux bubblewrap/Landlock, and Windows restricted tokens — each with split-filesystem awareness and carveout support. This is a level of platform-specific hardening unmatched by any other agent in this study.

Modular workspace architecture

70+ crates with narrow responsibilities make the codebase auditable and testable. The strict clippy lint policy (no unwrap_used, no expect_used, no needless_borrow) signals engineering discipline.

MCP bidirectional

Codex is both an MCP client and an MCP server. This means it can consume external tools and also be consumed as a tool by other agents. Few agents in this study offer this symmetry.

IDE and desktop presence

Codex runs in VS Code, Cursor, and Windsurf via an app server protocol, and offers a desktop app via codex app. It bridges the gap between terminal power users and developer tooling ecosystems.

Weaknesses

OpenAI-centric by design

While local providers (Ollama, LM Studio) are supported, the core wire protocol is the OpenAI Responses API. The WireApi enum currently has a single variant (Responses), and the legacy Chat Completions API was explicitly removed. This is less flexible than agents like Mux or Qwen Code that abstract across many API shapes.

Complexity surface area

With 3,800+ files and 70+ crates, Codex is one of the larger repos in this study. The modular architecture helps, but the learning curve for contributors is steep — particularly for understanding the interplay between core, tools, execpolicy, and sandboxing.

No built-in learning loop

Unlike Hermes, Codex does not learn from prior sessions or refine its own procedures over time. Its skills system is static (bundled at compile time) rather than adaptive. Memory, if any, is limited to the SQLite session history without automated skill extraction.

Rust-only contribution barrier

The entire runtime is Rust. While this ensures performance and safety, it limits the pool of potential contributors compared to TypeScript or Python agents where the barrier to entry is lower.

Where Codex fits in the landscape

Codex occupies a space between Claude Code and Crush: like Claude Code, it is a deeply integrated product with its own runtime, tool catalog, permission system, and TUI; like Crush, it is a native binary (Rust, not Go) with a strong emphasis on correctness, sandboxing, and cross-platform distribution. But Codex stands apart in its bidirectional MCP support, its multi-agent job execution framework, and its IDE extension ecosystem.

If Claude Code is the most Anthropic-native agent, Codex is the most OpenAI-native one. But unlike Claude Code's tight coupling to Anthropic's ecosystem, Codex at least attempts provider extensibility — even if the wire protocol remains Responses API-shaped.