OpenHands: The Platform-Shaped Agent
OpenHands is the hardest repo to score fairly — the local snapshot is explicitly described as incomplete, with the modern V1 agent core having moved to a separate Software Agent SDK repository. But what remains is still architecturally fascinating, including one of the most ingenious retry strategies in this set.
The important caveat
The V1 agent core is not in this repo
OpenHands' own documentation states that the newer V1 agent core moved to a separate Software Agent SDK repository. What remains locally includes the platform architecture, sandbox infrastructure, app/server code, and the legacy CodeAct agent — useful for understanding the platform shape, but not the full current product story.
That said, the local snapshot still reveals significant architectural decisions that differentiate OpenHands from every other agent in this set.
The ingenious temperature-bumping retry
The single most interesting thing in this repo is the retry logic in openhands/llm/retry_mixin.py. It uses the tenacity library with a documented, intentional quirk:
LLMNoResponseError at temperature 0 → bump to 1.0
When the model returns no response at all (empty stream, no tokens) and the temperature is set to 0, OpenHands automatically sets temperature = 1.0 on the next retry attempt.
The reasoning is explicit in the code comments: a fully deterministic model (temp=0) that returns nothing is stuck in a degenerate fixed point. Adding randomness breaks the loop. This is one of the more thoughtful LLM retry patterns in the set — it adapts the request rather than just retrying identically.
```python
# Intentional: on LLMNoResponseError at temp=0,
# set temperature = 1.0 on next retry.
# Rationale: deterministic model returning nothing
# is in a degenerate fixed point. Randomness breaks it.
```
This is the kind of production scar tissue you only get from running an agent at scale. Most agents just retry the same request and hope for a different result. OpenHands recognizes that identical requests to a deterministic model produce identical outputs — so it changes the model's behavior parameters.
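The shape of the pattern is easy to sketch. The real implementation lives in openhands/llm/retry_mixin.py and is built on tenacity; this stdlib-only version is illustrative, and the names LLMNoResponseError, call_llm, and MAX_ATTEMPTS are assumptions, not the repo's actual identifiers.

```python
# Sketch of the temperature-bumping retry (illustrative names, not the
# actual OpenHands implementation, which uses tenacity).

class LLMNoResponseError(Exception):
    """Raised when the model returns an empty response."""

MAX_ATTEMPTS = 3

def complete_with_retry(call_llm, prompt, temperature=0.0):
    for attempt in range(MAX_ATTEMPTS):
        try:
            return call_llm(prompt, temperature=temperature)
        except LLMNoResponseError:
            if attempt == MAX_ATTEMPTS - 1:
                raise
            # A deterministic model (temp=0) that returned nothing will
            # return nothing again; add randomness to escape the fixed point.
            if temperature == 0.0:
                temperature = 1.0
```

The key move is that the retry loop mutates the request parameters, not just the attempt counter.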
CondensationRequestTool — the agent requests its own compression
OpenHands defines a CondensationRequestTool that the agent itself can invoke to request history condensation. This is unusual: most agents have the runtime decide when to compress context. In OpenHands, the agent can notice it's running low on context and ask for compression.
This is a more agent-centric design philosophy: the LLM is trusted to know its own context state and make informed decisions about when to compact history. It requires the agent to understand the tradeoff (losing detail for more working room), but gives it autonomy over its own cognitive budget.
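A rough sketch of the pattern: the runtime exposes a no-argument tool, and invoking it triggers a condensation pass over older history. The schema, the request_condensation name, and the condense_history helper here are all illustrative assumptions, not OpenHands' actual API.

```python
# Illustrative sketch of agent-requested condensation; names and the
# summarization strategy are assumptions for demonstration only.

CONDENSATION_TOOL = {
    "name": "request_condensation",
    "description": (
        "Ask the runtime to summarize older conversation history, "
        "trading detail for more working context."
    ),
    "parameters": {"type": "object", "properties": {}},
}

def condense_history(events, keep_last=4):
    """Replace all but the most recent events with a one-line summary."""
    if len(events) <= keep_last:
        return events
    dropped = events[:-keep_last]
    summary = f"[condensed: {len(dropped)} earlier events summarized]"
    return [summary] + events[-keep_last:]
```

The point is who calls condense_history: here it runs because the model invoked the tool, not because the runtime hit a token threshold.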
9 Jinja2 prompt templates with XML sections
OpenHands has the most modular prompt system in this set: 9 Jinja2 .j2 templates in openhands/agenthub/codeact_agent/prompts/, assembled at runtime. The main template uses named XML sections:
| Section | Purpose |
|---|---|
| `<ROLE>` | Defines the agent's identity and capabilities |
| `<EFFICIENCY>` | Guidelines for efficient behavior |
| `<FILE_SYSTEM_GUIDELINES>` | File operation best practices |
| `<CODE_QUALITY>` | Code standards and testing expectations |
| `<VERSION_CONTROL>` | Git workflow expectations |
| `<PULL_REQUESTS>` | PR creation and review guidelines |
| `<PROBLEM_SOLVING_WORKFLOW>` | Systematic problem-solving approach |
| `<SECURITY>` | Security practices and risk awareness |
| `<EXTERNAL_SERVICES>` | Integration with external APIs and services |
| `<ENVIRONMENT_SETUP>` | Environment configuration and dependencies |
The security section includes {% include 'security_risk_assessment.j2' %} — a composable sub-template, not inline text. This is Jinja2's template composition at work, allowing security guidelines to be maintained separately from the main prompt.
The long-horizon variant (system_prompt_long_horizon.j2) extends the base to add <TASK_MANAGEMENT> and <TASK_TRACKING_PERSISTENCE> for the task_tracker tool — designed for multi-step, multi-session tasks that span hours or days.
Additional templates: in_context_learning_example.j2, microagent_info.j2, additional_info.j2, system_prompt_interactive.j2, system_prompt_tech_philosophy.j2.
fn_call_converter — LEGACY V0, removal April 1, 2026
The fn_call_converter.py file is marked LEGACY V0, removal April 1, 2026. It converts between JSON function-calling and XML for models that don't support native tool calls:
```
<function=name>
<parameter=key>value</parameter>
</function>
```
It uses </function as a stream-stop word for incremental parsing. The refine_prompt() function automatically replaces 'bash' with 'powershell' on Windows — an automatic platform adaptation that no other agent formalizes at the prompt conversion level.
Sandbox and Docker architecture
OpenHands is the most security-conscious agent in this set when it comes to execution isolation. Rather than running commands on the host machine with guards and blocklists, it provisions Docker containers as isolated execution environments:
Container-per-session model
Each agent session gets its own Docker container. Files, processes, and network access are all sandboxed. The container is torn down when the session ends. This is the strongest isolation model in this set.
File synchronization
The sandbox manager syncs file changes between the container and the host workspace. The agent works inside the container, but file edits are reflected back to the user's workspace in real time.
All tool calls carry a security_risk attribute validated against a RISK_LEVELS dictionary. This is defense in depth: even inside a sandbox, the agent's actions are classified and auditable.
CodeAct agent architecture
The CodeAct agent is OpenHands' primary agent loop. It follows the "code as action" paradigm: the agent writes and executes Python/bash code as its primary action mechanism, rather than calling predefined tools. This is more flexible than tool-based agents — the agent can write arbitrary code to solve problems — but requires stronger sandbox isolation.
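The loop shape can be sketched in a toy form: the model emits a code string, the runtime executes it, and the captured output becomes the next observation. Everything here is illustrative — real OpenHands executes inside an isolated Docker container, not via exec(), and model_step stands in for an LLM call.

```python
# Toy sketch of the code-as-action loop. Using exec() on untrusted code is
# exactly what the Docker sandbox exists to avoid; this is for shape only.
import io
import contextlib

def execute_code(code: str) -> str:
    """Run the agent's code and capture stdout as the observation."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

def codeact_loop(model_step, max_steps=5):
    observation = ""
    for _ in range(max_steps):
        action = model_step(observation)   # model returns code, or None when done
        if action is None:
            break
        observation = execute_code(action)
    return observation
```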
The local snapshot still contains the CodeAct agent's action definitions, observation types, and the agent hub structure. The modern V1 implementation lives in the separate Software Agent SDK, but the conceptual architecture is visible here.
Model support
OpenHands historically supported broad model flexibility via LiteLLM-style compatibility. The local repo shows:
- OpenAI — GPT-4, GPT-4o, o-series models
- Anthropic — Claude Sonnet, Opus, Haiku
- Google — Gemini Pro, Flash
- Open-source — Any LiteLLM-compatible model
- Local models — Ollama, vLLM, and other local inference servers
The fn_call_converter enables models that don't support native function calling to still participate as agents through the XML tool format.
App/server architecture
OpenHands includes a full web application with:
- Next.js frontend — React-based web UI for interacting with agents
- FastAPI backend — Python API server managing agent sessions
- WebSocket communication — Real-time streaming of agent actions and observations
- Session management — Persistent agent sessions that survive browser refresh
This makes OpenHands more of a platform than a CLI tool. You can run it as a self-hosted service with multiple users, each with their own agent sessions and sandboxed environments.
Where OpenHands is weaker
Hard to judge from local code alone
The most important modern agent core is not fully in this repo snapshot. The V1 agent SDK lives elsewhere, so any assessment based on the local code is inherently incomplete.
Heavier infrastructure requirements
Docker-based sandboxing means you need Docker running. This is fine for a self-hosted platform but rules out quick "just install and run" usage that CLI agents like Crush or Claude Code support.
Bottom line
OpenHands is the most platform-shaped agent in this set. It's not a CLI tool — it's a self-hostable service with Docker sandboxing, a web UI, session management, and a composable prompt template system.
The temperature-bumping retry strategy alone is worth studying: it's the kind of production scar tissue that separates serious agent operators from weekend wrappers. The CondensationRequestTool, which lets the agent request its own context compression, is an agent-centric design philosophy that trusts the LLM with cognitive budget decisions.
The caveat is that the local repo is a partial snapshot. The V1 agent core moved to a separate SDK repository, so this represents the platform architecture more than the current agent implementation.