Generic adapters vs model-native engineering
Almost every repo says it supports many models. The real question is how deep that support goes: shared config only, or real provider-specific behavior, auth, and prompt or tool adaptation?
The key split
Provider-agnostic systems
Mux, Neovate, and Qwen Code put significant effort into treating providers as interchangeable enough to route, catalog, and configure them through shared abstractions.
Model-native systems
Claude Code, Kimi CLI, and parts of DeerFlow or Pochi are more willing to write code that is obviously shaped around a specific model family or platform.
Model plumbing matrix
| Repo | Provider strategy | Model-specific tuning depth | Key evidence from the repo |
|---|---|---|---|
| Claude Code | Anthropic-centered | Very high | Anthropic SDK dependency, model-first runtime assumptions, and deep integration with Claude-style capabilities |
| Pochi | Shared config plus vendor-specific packages | High | Separate packages for Codex, Qwen Code, GitHub Copilot, Gemini CLI, and others |
| DeerFlow | Config-driven class loading | High | Factory chooses provider classes; vLLM provider preserves Qwen reasoning fields; Codex and Claude CLI-backed models appear in examples |
| Kimi CLI | Platform-centered with configurable providers | Medium-high | Moonshot and Kimi platform auth flows, managed model refresh, ACP handling |
| Mux | Provider catalog and routing factory | Medium | Known model catalog, provider constants, fetch wrapping, gateway and local-provider logic |
| Qwen Code | Unified resolver with special auth paths | Medium | ModelConfigResolver, models registry, runtime snapshots, Qwen OAuth path alongside generic APIs |
| Neovate Code | Large explicit provider matrix | Medium | Many provider files and shared provider types under one AI SDK-style layer |
| Crush | Generic provider abstraction with productized metadata handling | Medium | Provider metadata fetching and caching in Go; broad support without a single dominant native model identity |
| OpenHands | Historically broad via LiteLLM-style compatibility | Hard to judge | The local repo shows enough to infer flexibility, but not enough to fully score the current V1 agent core |
| Hermes | Model-neutral via OpenRouter with 200+ model support | High breadth | No native model bias; uses OpenRouter as the abstraction layer. MoA tool runs claude-opus, gemini-pro, gpt-5, deepseek simultaneously. |
The Qwen Code five-layer config resolver
Qwen Code's ModelConfigResolver in
packages/core/src/models/modelConfigResolver.ts is the most
rigorous model configuration system in this TypeScript set. It defines
five typed source layers with explicit precedence (highest to lowest):
1. Explicit ModelProviders config selection (highest authority); maps to the modelProvidersSource type.
2. CLI flags like --model and --openaiApiKey; maps to cliSource.
3. Environment variables such as OPENAI_API_KEY and OPENAI_MODEL; maps to envLayer.
4. User or workspace settings; maps to settingsSource.
5. Built-in fallback values; maps to defaultSource / computedSource.
Every resolved configuration field carries its source type, so you can always
trace where a value came from. The OAuth flow for Qwen models is gated by a
QWEN_OAUTH_ALLOWED_MODELS list — Qwen-specific auth is not
exposed for non-Qwen models even if someone builds a config that tries to use it.
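To make the precedence concrete, here is a minimal Python sketch of the layered-resolution-with-source-tracking pattern. The real ModelConfigResolver is TypeScript; the layer names below mirror its source types, but the field shapes and model names are illustrative:

```python
# A minimal sketch of five-layer precedence with source tracking.
# Layer names mirror Qwen Code's source types; everything else is invented.
from dataclasses import dataclass

@dataclass
class Resolved:
    value: str
    source: str  # e.g. "modelProvidersSource", "cliSource", "envLayer", ...

def resolve(field: str, layers: list[tuple[str, dict]]) -> Resolved:
    """Walk layers from highest to lowest precedence; the first hit wins,
    and the winning layer's name is recorded alongside the value."""
    for source_name, values in layers:
        if values.get(field) is not None:
            return Resolved(values[field], source_name)
    raise KeyError(f"no layer provides {field!r}")

layers = [
    ("modelProvidersSource", {"model": None}),          # explicit ModelProviders config
    ("cliSource",            {"model": "qwen3-coder"}), # --model flag
    ("envLayer",             {"model": None}),          # OPENAI_MODEL etc.
    ("settingsSource",       {"model": "qwen-plus"}),   # user/workspace settings
    ("defaultSource",        {"model": "qwen-coder"}),  # built-in fallback
]

print(resolve("model", layers))  # Resolved(value='qwen3-coder', source='cliSource')
```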
Kimi CLI's kosong package — provider-native message conversion
Kimi CLI is the only agent in this set that ships its own separate
abstraction package (kosong) for multi-provider message
conversion. Rather than a single generic converter, it has a dedicated file per
provider family:
| File | Handles | Key non-obvious detail |
|---|---|---|
| anthropic.py | Anthropic Messages API | Tracks tool_use_id for result correlation; handles Anthropic's tool_use block format specifically |
| google_genai.py | Google GenAI / Gemini | Strips the id field from function_call and function_response parts (Gemini rejects it); preserves thought_signature for thinking tokens |
| openai_responses.py | OpenAI Responses API | Handles function_call/function_call_output item types; tracks conversation state across multi-turn tool use |
The Google adapter is worth noting specifically: most agents that "support Google"
discover the id-field rejection in production and add a hotfix.
Kimi CLI has API snapshot tests for this specific case, meaning it was caught and
tested before shipping.
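The behavior those snapshot tests lock in is easy to sketch. Below is a hedged Python illustration of the normalization google_genai.py is described as performing; the part shapes are simplified stand-ins, not kosong's actual types:

```python
# Drop the "id" field from function_call / function_response parts (Gemini
# rejects it) while preserving "thought_signature" for thinking tokens.
# Part shapes here are illustrative, not kosong's real message types.
def normalize_gemini_parts(parts: list[dict]) -> list[dict]:
    cleaned = []
    for part in parts:
        part = dict(part)  # shallow copy; never mutate the caller's state
        for key in ("function_call", "function_response"):
            if isinstance(part.get(key), dict):
                payload = dict(part[key])
                payload.pop("id", None)  # Gemini rejects this field
                part[key] = payload
        # "thought_signature" is deliberately left untouched.
        cleaned.append(part)
    return cleaned

parts = [{"function_call": {"id": "call_1", "name": "read_file",
                            "args": {"path": "main.py"}},
          "thought_signature": "abc123"}]
print(normalize_gemini_parts(parts))
```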
Kimi CLI's compression prompt — structured XML output format
Kimi CLI's context compression is driven by a dedicated
prompts/compact.md prompt that instructs the model to produce
structured XML output — not free-form prose. The required output uses
named tags with explicit retention rules:
| Tag | Content | Retention rule |
|---|---|---|
| <current_focus> | Active task state | Always keep; this is what the agent is doing right now |
| <active_issues> | Errors, stack traces, working solutions | MUST KEEP errors and working solutions verbatim |
| <code_state> | Nested <file> blocks (final versions only) | Keep full if under 20 lines; otherwise signature plus key logic only. REMOVE failed attempts. |
| <completed_tasks> | What has been done | MERGE similar discussions; CONDENSE to outcomes |
| <environment> | System context | Keep stable facts; drop ephemeral state |
| <important_context> | Design decisions and TODO items | Keep design rationale; REMOVE redundant explanations |
Kimi also ships a prompts/init.md that instructs the model to
explore the project codebase and produce an AGENTS.md file: project
overview, build/test commands, code style, testing instructions, and security
considerations — using the project's native language. This is a structured
project-onboarding ritual that no other agent formalizes as a named prompt.
OpenHands — Nine Jinja2 prompt templates with XML sections
OpenHands has the most modular prompt system in the set: nine Jinja2
.j2 templates in openhands/agenthub/codeact_agent/prompts/,
assembled at runtime. The main template uses named XML sections:
<ROLE>, <EFFICIENCY>,
<FILE_SYSTEM_GUIDELINES>, <CODE_QUALITY>,
<VERSION_CONTROL>, <PULL_REQUESTS>,
<PROBLEM_SOLVING_WORKFLOW>, <SECURITY>,
<EXTERNAL_SERVICES>, and <ENVIRONMENT_SETUP>.
The security section includes
{% include 'security_risk_assessment.j2' %} — a composable sub-template,
not inline text.
The long-horizon variant (system_prompt_long_horizon.j2) extends the
base to add <TASK_MANAGEMENT> and
<TASK_TRACKING_PERSISTENCE> for the task_tracker
tool. Additional templates: in_context_learning_example.j2,
microagent_info.j2, additional_info.j2,
system_prompt_interactive.j2,
system_prompt_tech_philosophy.j2.
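As a rough illustration of how such a layout gets assembled at runtime, here is a minimal Jinja2 sketch. The loader path matches the directory above, but the template filename and render variables are illustrative, and OpenHands's actual agent wires in far more context:

```python
# A minimal sketch of runtime template assembly over the prompts directory.
# Filename and variables are illustrative, not OpenHands's actual entrypoint.
from jinja2 import Environment, FileSystemLoader

env = Environment(
    loader=FileSystemLoader("openhands/agenthub/codeact_agent/prompts"),
    trim_blocks=True,
    lstrip_blocks=True,
)

# {% include 'security_risk_assessment.j2' %} inside a template resolves
# against the same loader, so sub-templates compose with no extra code here.
template = env.get_template("system_prompt.j2")  # illustrative filename
print(template.render(cli_mode=False))           # illustrative variable
```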
Codex — OpenAI-native with provider extensibility
Codex is built by OpenAI, and its default wire protocol is the
Responses API (wire_api = "responses").
The legacy Chat Completions API (wire_api = "chat") is
explicitly no longer supported — Codex produces a migration error at
config parse time if you try to use it. However, Codex is not hard-coded
to OpenAI's cloud:
| Feature | Detail |
|---|---|
| User-defined providers | [model_providers] table in config.toml — supports Ollama, LM Studio, any OpenAI-compatible endpoint |
| Dedicated integration crates | codex-ollama and codex-lmstudio — full client implementations with discovery, connection, and error handling, not thin config wrappers |
| Retry defaults | 300,000 ms stream idle timeout, 5 stream max retries, 4 request max retries (hard cap: 100) |
| Plan mode reasoning | plan_mode_reasoning_effort config for model-specific reasoning presets (defaults to medium, supports none) |
| Tracing | OpenTelemetry via codex-otel crate |
| Provider registry | codex-model-provider-info — built-in defaults for OpenAI, user overrides from config at runtime |
The WireApi enum currently has a single variant
(Responses), which means Codex is more OpenAI-centric
than agents like Mux or Qwen Code that abstract across many API shapes.
But the dedicated Ollama and LM Studio crates show OpenAI is investing
in local model support beyond just cloud API access.
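For concreteness, a user-defined provider entry might look like the following config.toml sketch. The field names follow Codex's documented configuration shape, but treat the exact keys and values as assumptions that can vary by version:

```toml
# Select the custom provider by id; the model name is illustrative.
model = "llama3.3"
model_provider = "ollama"

# User-defined provider entry; any OpenAI-compatible endpoint works here.
[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses"  # "chat" now fails with a migration error at parse time
```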
Hermes — Maximum provider breadth via OpenRouter
Hermes takes the opposite approach from Claude Code. Rather than being shaped around one model family, it is model-neutral by design: OpenRouter provides access to 200+ models through a single API endpoint. There is no native model assumption anywhere in the codebase.
This unlocks a unique feature: the Mixture of Agents (MoA) tool.
Because Hermes has no model loyalty, it can run four different frontier models
(Claude Opus, Gemini Pro, GPT-5, DeepSeek v3) in parallel via
ThreadPoolExecutor, collect all four responses, and pass them
to a fifth aggregator model that synthesizes the best answer. This is the only
built-in MoA implementation in this set.
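The pattern itself is simple enough to sketch. Below is a minimal, hedged Python version of the fan-out-and-aggregate flow using OpenRouter's OpenAI-compatible endpoint; the model slugs and aggregation prompt are illustrative, and Hermes's built-in tool differs in detail:

```python
# A minimal sketch of the MoA fan-out/aggregate pattern described above.
# Model slugs and prompts are illustrative, not Hermes's actual choices.
import os
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
PROPOSERS = [  # illustrative OpenRouter slugs for the four models named above
    "anthropic/claude-opus-4", "google/gemini-2.5-pro",
    "openai/gpt-5", "deepseek/deepseek-chat-v3",
]
AGGREGATOR = "anthropic/claude-opus-4"

def ask(model: str, prompt: str) -> str:
    resp = requests.post(URL, headers=HEADERS, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def mixture_of_agents(prompt: str) -> str:
    # Fan out: query all proposer models in parallel.
    with ThreadPoolExecutor(max_workers=len(PROPOSERS)) as pool:
        drafts = list(pool.map(lambda m: ask(m, prompt), PROPOSERS))
    # Aggregate: a fifth call synthesizes the best answer from all drafts.
    numbered = "\n\n".join(f"[Draft {i + 1}]\n{d}" for i, d in enumerate(drafts))
    return ask(AGGREGATOR,
               f"Synthesize the best single answer to:\n{prompt}\n\n{numbered}")
```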
See the dedicated Hermes page for detailed coverage of the MoA tool and aggregation algorithm.
DeerFlow — LangGraph model factory
DeerFlow's model loading uses LangGraph's native model factory pattern.
The vLLM provider specializes to preserve Qwen's reasoning
fields, and the factory supports CLI-backed models (models where the actual
inference is done by a subprocess CLI command rather than an API call).
This extensibility is a result of the LangGraph architecture — any
LangChain-compatible model can be plugged in via config.
The langgraph.json file defines graph entrypoints and can
specify different models per node — meaning the planner, writer, and
tools can theoretically use different models without any code changes,
just configuration.
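A minimal sketch of that per-role pattern, written here with LangChain's generic init_chat_model helper rather than DeerFlow's own factory (which reads its own config files); the role names and model choices are illustrative:

```python
# Config-driven, per-role model selection: swap a node's model by editing
# config, not code. A sketch, not DeerFlow's actual factory.
from langchain.chat_models import init_chat_model

ROLE_MODELS = {
    "planner": {"model": "qwen3-235b", "model_provider": "openai"},       # e.g. a vLLM endpoint
    "writer":  {"model": "claude-sonnet-4", "model_provider": "anthropic"},
}

def model_for(role: str):
    """Build a LangChain-compatible chat model from the role's config entry."""
    return init_chat_model(**ROLE_MODELS[role])
```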
Claude Code
The purest example of model-native engineering. Instead of abstracting away Claude's shape, the repo leans into it and builds a full product around those assumptions.
Pochi
The most explicit multi-ecosystem adapter. It proves that "supports many models" can still mean dedicated code for each ecosystem, not just endpoint compatibility.
DeerFlow
The most flexible in a framework sense. It allows provider classes to specialize around thinking tokens, vLLM quirks, and CLI-backed model wrappers.
Codex
OpenAI-native but provider-extensible. Responses API by default, with
dedicated Ollama and LM Studio crates, configurable retry logic, and
plan-mode reasoning presets. The single-variant WireApi
enum makes it less generic than Mux, but the local model support is
more substantial than thin config adapters.
Auth and configuration maturity
Mux, Qwen Code, and Neovate do the best job of making model choice feel like a governed system instead of a loose settings file. Qwen is especially strong here: it resolves model configuration from multiple sources, tracks origin, and snapshots runtime state.
Claude Code and Kimi CLI are far more willing to admit that provider behavior is not generic. Claude's repo is structurally centered on Anthropic. Kimi includes explicit OAuth and managed platform logic for Moonshot and Kimi Code experiences.
DeerFlow approaches the problem from a harness angle. Instead of one canonical provider matrix, it loads configured model classes and lets middleware or role configuration shape how they are used. That is more extensible, but less immediately uniform.
My verdict on model handling
If you want the best provider-neutral design, look at Mux and Qwen Code. If you want the most honest model-native design, Claude Code is the clearest answer. If you want the most explicit multi-ecosystem adapter, Pochi wins. If you want the most extensible model factory, DeerFlow is the most interesting. If you want maximum model breadth (200+ models, no native preference), Hermes via OpenRouter wins outright. And if you want OpenAI-native with real local model support, Codex offers dedicated Ollama and LM Studio crates, not just config-level provider overrides.
The lesson is that "supports many models" is not one thing. Some repos are really doing transport compatibility. Some are doing routing. Some are doing full provider-specific product behavior. And some — like Hermes with MoA — are doing multi-model synthesis that requires genuine model-neutrality to work. The difference matters a lot when tool calling, reasoning settings, auth flows, or MCP behavior start to diverge.