Hermes Agent: The Outlier of the Group
Hermes, by Nous Research, was not covered in the original analysis of this repo set, an oversight worth correcting. It is the most architecturally distinct agent in the directory: not just a coding agent, but a self-improving, multi-platform autonomous system with a built-in learning loop.
Why Hermes is different
Every other agent in this set is primarily a coding assistant that happens to run in a terminal or browser. Hermes is something broader: a general agent designed to be always-on, learn from its own history, delegate work to subagents, run on remote infrastructure, and talk to you through whichever messaging platform you prefer, including while it is still working.
The caduceus symbol in the name
Hermes is named after the Greek messenger god, the patron of travelers, commerce, and transitions between worlds. The caduceus (☤) in the project branding is a deliberate choice: this is an agent designed to move information across boundaries, including between your laptop, a cloud VM, and a Telegram conversation.
The learning loop: skills, memory, and session search
The single most unusual thing about Hermes relative to every other agent in this study is its closed learning loop. After complex tasks, Hermes can create and refine skills (reusable Markdown-based procedures stored in ~/.hermes/skills/) and inject them back into subsequent sessions.
Skills as procedural memory
Each skill is a SKILL.md file in a structured directory:
~/.hermes/skills/
├── systematic-debugging/
│   └── SKILL.md        ← frontmatter + content
├── test-driven-development/
│   └── SKILL.md
└── code-review/
    └── SKILL.md
Frontmatter includes tags, related skills, authorship, and version. Skills are injected as user messages (not system prompt), which deliberately preserves the prompt cache for the entire session.
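A minimal sketch of how one of these skills might be parsed and injected as a user message; the frontmatter layout, `parse_skill`, and `skill_as_user_message` are illustrative assumptions, not Hermes' actual API:

```python
def parse_skill(text: str):
    """Split a SKILL.md into a frontmatter dict and a Markdown body."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

def skill_as_user_message(meta: dict, body: str) -> dict:
    # Injecting as a user message (not into the system prompt) keeps
    # the system-prompt prefix cache valid for the whole session.
    return {"role": "user",
            "content": f"[SKILL: {meta.get('name', 'unnamed')}]\n{body}"}

skill_md = """---
name: systematic-debugging
tags: debugging, process
version: 1
---
1. Reproduce the failure.
2. Bisect the cause.
3. Fix and add a regression test.
"""
meta, body = parse_skill(skill_md)
msg = skill_as_user_message(meta, body)
```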
Memory files: MEMORY.md + USER.md
Two separate memory stores are loaded as a frozen snapshot at session start:
- MEMORY.md: the agent's personal notes, environment facts, project conventions, tool quirks
- USER.md: what the agent knows about the user, including preferences, workflow habits, and communication style
Mid-session writes are durable (written to disk immediately) but do
not update the current session's system prompt. The snapshot
is refreshed on the next session start to avoid breaking the prefix cache.
The delimiter between entries is § (the section sign), chosen because it is unambiguous in plaintext.
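A sketch of the append/snapshot split under these constraints; the function names and demo layout are assumptions:

```python
import tempfile
from pathlib import Path

DELIM = "\n§\n"  # section sign: an unambiguous plaintext entry separator

def append_memory(path: Path, entry: str) -> None:
    # Durable mid-session write: hits disk immediately, though the
    # in-context snapshot only refreshes at the next session start.
    with path.open("a", encoding="utf-8") as f:
        f.write(entry.rstrip() + DELIM)

def load_snapshot(path: Path) -> list[str]:
    # Frozen snapshot loaded once at session start, preserving the
    # prefix cache for the rest of the session.
    if not path.exists():
        return []
    return [e for e in path.read_text(encoding="utf-8").split(DELIM) if e.strip()]

# Round-trip demo in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    mem = Path(d) / "MEMORY.md"
    append_memory(mem, "Project uses uv, not pip")
    append_memory(mem, "User prefers concise answers")
    entries = load_snapshot(mem)
```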
FTS5 session search
All conversation history is stored in SQLite using FTS5 full-text search.
The session_search_tool lets the agent find its own past
conversations by keyword, then summarizes relevant sessions using a cheap
auxiliary model instead of loading raw transcripts into context. This is a
compact, efficient long-term memory strategy relative to vector search.
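The same idea in miniature, using Python's sqlite3 with an FTS5 virtual table. The schema here is illustrative, not Hermes' actual table layout, and it assumes an SQLite build compiled with FTS5 (standard CPython builds include it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One row per session; FTS5 indexes the transcript text for keyword search.
con.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_id, content)")
con.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("s1", "debugged the docker networking issue"),
     ("s2", "wrote unit tests for the parser")],
)
# MATCH does the full-text lookup; rank orders by relevance.
rows = con.execute(
    "SELECT session_id FROM sessions WHERE sessions MATCH ? ORDER BY rank",
    ("docker",),
).fetchall()
```

Only matching session IDs come back; summarization of the hits would then go to the cheap auxiliary model rather than loading raw transcripts.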
Context compression: structured summaries
Hermes has a ContextCompressor class in
agent/context_compressor.py with a five-step algorithm that goes
significantly further than typical sliding-window approaches:
1. Replace the content of old tool result messages with the placeholder [Old tool output cleared to save context space]. This is a cheap pre-pass that requires no LLM call.
2. Keep the system prompt and the first exchange intact; these anchor the session's identity and cannot be summarized away.
3. Protect the most recent ~20K tokens, not a fixed message count. This is adaptive: a model with a large context window gets a larger tail budget.
4. Summarize the middle turns with a structured prompt that produces five named sections: Goal, Progress, Decisions, Files, Next Steps. The summary budget is proportional to the compressed content (capped at 12K tokens).
5. If a summary already exists from a prior compaction, update it rather than creating a fresh one, preserving knowledge across multiple compression events.

The summary is injected with the prefix "[CONTEXT COMPACTION] Earlier turns in this conversation were compacted to save context space…". This is transparent: the model is explicitly told it is reading a summary, not a live transcript.
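The steps above can be sketched as follows. Token counting and the summarizer call are stubbed, and all names (including the head/tail split) are illustrative rather than Hermes' actual ContextCompressor API:

```python
PLACEHOLDER = "[Old tool output cleared to save context space]"
TAIL_BUDGET = 20_000  # protect the most recent ~20K tokens

def approx_tokens(msg: dict) -> int:
    return len(msg["content"]) // 4  # crude 4-chars-per-token estimate

def compress(messages: list[dict], summarize) -> list[dict]:
    # Step 1: cheap pre-pass, no LLM call needed.
    for m in messages:
        if m["role"] == "tool":
            m["content"] = PLACEHOLDER
    # Step 2: keep system prompt + first exchange intact (assumed = 3 msgs).
    head, rest = messages[:3], messages[3:]
    # Step 3: protect a token budget at the tail, not a message count.
    tail, used = [], 0
    while rest and used + approx_tokens(rest[-1]) <= TAIL_BUDGET:
        used += approx_tokens(rest[-1])
        tail.insert(0, rest.pop())
    # Steps 4-5: summarize (or re-summarize) the middle, inject transparently.
    middle = []
    if rest:
        summary = summarize(rest)  # Goal / Progress / Decisions / Files / Next Steps
        middle = [{"role": "user",
                   "content": "[CONTEXT COMPACTION] Earlier turns in this "
                              "conversation were compacted to save context "
                              "space.\n" + summary}]
    return head + middle + tail

msgs = [
    {"role": "system", "content": "You are Hermes."},
    {"role": "user", "content": "Start the task."},
    {"role": "assistant", "content": "On it."},
    {"role": "user", "content": "x" * 100_000},  # old middle turn, too big for the tail
    {"role": "assistant", "content": "Done with step 1."},
]
compacted = compress(msgs, summarize=lambda turns: "Goal: finish the task.")
```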
Security: two layers of injection defense
Hermes is unusually security-minded about the content that gets loaded into its system prompt. Two separate scanning layers exist:
Context file scanning (AGENTS.md, .cursorrules, SOUL.md)
Before injecting context files into the system prompt, Hermes scans for:
- Prompt injection phrases ("ignore previous instructions", "system prompt override")
- Deception instructions ("do not tell the user")
- Bypass attempts ("act as if you have no restrictions")
- Hidden HTML comment injection (<!-- ignore -->)
- Invisible Unicode characters (zero-width spaces, bidi overrides)
- Credential exfiltration via curl/cat on .env
Blocked content is replaced with a visible warning instead of silently dropped.
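A toy version of such a scanner; the pattern list is a small illustrative subset, not Hermes' actual rule set:

```python
import re

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"system prompt override",
    r"do not tell the user",
    r"act as if you have no restrictions",
    r"<!--.*?-->",                          # hidden HTML comment payloads
    r"[\u200b\u200c\u200d\u202a-\u202e]",   # zero-width / bidi override chars
]

def sanitize(text: str) -> str:
    for pat in INJECTION_PATTERNS:
        if re.search(pat, text, re.IGNORECASE | re.DOTALL):
            # Replace with a visible warning rather than silently dropping.
            return "[BLOCKED: potential prompt injection detected]"
    return text
```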
Memory injection scanning (MEMORY.md, USER.md)
Memory content goes through a separate scanner before loading, checking for the same injection patterns. This matters because memory persists across sessions: a successful injection in session A would propagate to session B if memory were not scanned on load.
The Tirith security scanner (tools/tirith_security.py) goes
further: it wraps an external binary that checks commands for homograph URLs,
pipe-to-interpreter patterns, and terminal injection. The binary is
auto-downloaded from GitHub releases with SHA-256 verification and optional
cosign provenance validation.
Multi-platform gateway architecture
Most agents in this set are terminal-first with optional API access. Hermes reverses
the priority: it runs a persistent messaging gateway
(gateway/run.py) that connects to
14+ messaging platforms via adapters in gateway/platforms/:
Telegram, Discord, Slack, WhatsApp, Signal, WeChat/WeCom, Matrix, Mattermost,
Feishu/Lark, DingTalk, Email, SMS, HomeAssistant, and a generic Webhook.
| Platform | Adapter file | Notes |
|---|---|---|
| Telegram | gateway/platforms/telegram.py | Generates BotCommand menu from the central slash command registry |
| Discord | gateway/platforms/discord.py | Same slash command dispatch, adapted for Discord's slash command format |
| Slack | gateway/platforms/slack.py | Maps /hermes <subcommand> via slack_subcommand_map() |
| WhatsApp | gateway/platforms/whatsapp.py | Voice memo transcription supported |
| Signal | gateway/platforms/signal.py | Cross-platform conversation continuity |
| WeChat / WeCom | gateway/platforms/wecom.py | Enterprise WeChat (WeCom / 企业微信) integration |
| Matrix | gateway/platforms/matrix.py | Open federated messaging protocol |
| Mattermost | gateway/platforms/mattermost.py | Self-hosted team messaging |
| Feishu / Lark | gateway/platforms/feishu.py | WebSocket + webhook; interactive card button-click events as commands; ACK emoji reactions; dedup across restarts |
| DingTalk | gateway/platforms/dingtalk.py | Alibaba enterprise messaging |
| Email | gateway/platforms/email.py | SMTP/IMAP integration |
| SMS | gateway/platforms/sms.py | SMS message gateway |
| Home Assistant | gateway/platforms/homeassistant.py | Smart home device control via gateway integration |
| Webhook | gateway/platforms/webhook.py | Generic inbound/outbound webhook adapter for custom integrations |
The slash command registry pattern
All slash commands are defined as CommandDef objects in a central COMMAND_REGISTRY list. Every downstream consumer (CLI autocomplete, the Telegram BotCommand menu, the Slack subcommand map, gateway help text, gateway dispatch) derives from this single registry automatically. Adding an alias requires changing exactly one field in one place.
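The pattern is simple to sketch. Beyond CommandDef, COMMAND_REGISTRY, and slack_subcommand_map(), which the text names, the field and function names here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class CommandDef:
    name: str
    help: str
    aliases: list[str] = field(default_factory=list)

# Single source of truth for every platform surface.
COMMAND_REGISTRY = [
    CommandDef("status", "Show agent status"),
    CommandDef("model", "Switch model", aliases=["m"]),
]

def telegram_bot_commands() -> list[tuple[str, str]]:
    # Telegram's BotCommand menu, derived from the registry.
    return [(c.name, c.help) for c in COMMAND_REGISTRY]

def slack_subcommand_map() -> dict[str, str]:
    # Maps "/hermes <subcommand>" (including aliases) to the canonical name.
    return {alias: c.name
            for c in COMMAND_REGISTRY
            for alias in [c.name, *c.aliases]}
```

Adding an alias means appending one string to one `aliases` field; every consumer picks it up on the next derivation.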
Six execution backends
Hermes can run its terminal sessions across six different backends defined in
tools/environments/:
Local
Standard local process execution. The default for most users.
Docker
Isolated containerized execution. Files and processes are sandboxed per session.
SSH
Remote execution via SSH. The agent's context stays local but commands run on a remote server.
Daytona
Serverless persistent workspace. The environment hibernates when idle and wakes on demand. The agent can stop and resume across days without losing state.
Modal
Serverless compute platform. Good for GPU workloads and tasks that need to run in the cloud but cost nearly nothing between sessions.
Singularity
HPC container format. Relevant for ML research workflows on cluster environments.
Smart model routing and IterationBudget
Smart model routing
agent/smart_model_routing.py routes each turn to either the
configured "strong" model or a configured "cheap" model. The decision
uses a _COMPLEX_KEYWORDS set of 47 words, including debug, implement, refactor, traceback, analyze, benchmark, pytest, docker, kubernetes, delegate, subagent, and cron.
If the user message contains none of those keywords and no URL
pattern, choose_cheap_model_route() routes to the cheap model.
This saves cost on simple queries transparently.
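A sketch of the routing predicate, using an illustrative subset of the 47-keyword set:

```python
import re

# Illustrative subset; the real list has 47 entries.
_COMPLEX_KEYWORDS = {"debug", "implement", "refactor", "traceback",
                     "analyze", "benchmark", "pytest", "docker",
                     "kubernetes", "delegate", "subagent", "cron"}
_URL_RE = re.compile(r"https?://")

def choose_cheap_model_route(message: str) -> bool:
    """True if the message qualifies for the cheap model."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    # Route cheap only when no complexity keyword and no URL appear.
    return not (words & _COMPLEX_KEYWORDS) and not _URL_RE.search(message)
```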
IterationBudget: per-role turn limits
The IterationBudget class enforces turn limits with role-aware values: max_total=90 for the parent agent, max_total=50 for sub-agents. The budget is thread-safe via threading.Lock. Crucially, execute_code turns are refunded via refund(): code execution is treated as a "free" computation step rather than a reasoning step, so it does not consume budget. The number of parallel tool workers is capped at 8 via _MAX_TOOL_WORKERS.
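A minimal thread-safe sketch of this budget; refund() comes from the text, the other method names are assumptions:

```python
import threading

class IterationBudget:
    """Per-role turn budget, safe to share across tool-worker threads."""
    def __init__(self, max_total: int):
        self.max_total = max_total
        self.used = 0
        self._lock = threading.Lock()

    def consume(self) -> bool:
        # Returns False once the role's turn limit is exhausted.
        with self._lock:
            if self.used >= self.max_total:
                return False
            self.used += 1
            return True

    def refund(self) -> None:
        # Called for execute_code turns: "free" computation steps
        # rather than reasoning steps.
        with self._lock:
            self.used = max(0, self.used - 1)

parent_budget = IterationBudget(max_total=90)  # parent agent
child_budget = IterationBudget(max_total=50)   # sub-agents
```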
Interactive tools (clarify) are always run sequentially
via _NEVER_PARALLEL_TOOLS. Path-scoped tools
(read_file, write_file, patch)
are safe to parallelize if targeting non-overlapping paths, determined
by _paths_overlap().
Mixture-of-Agents (MoA) tool
Hermes includes a mixture_of_agents_tool in
tools/mixture_of_agents_tool.py, implementing the MoA methodology
from the arXiv paper "Mixture-of-Agents Enhances Large Language Model Capabilities"
(Wang et al., arXiv:2406.04692). The architecture:
Reference models (parallel): claude-opus-4.6, gemini-3-pro-preview, gpt-5.4-pro, deepseek-v3.2. These run in parallel, generating diverse initial responses to the same problem.
Aggregator model: claude-opus-4.6. It synthesizes the reference responses into a single high-quality output.
All reference model calls go through OpenRouter. The tool is specialized for "extremely difficult problems requiring intense reasoning" โ coding, mathematics, and complex analytical tasks. This means Hermes can optionally spawn four frontier models in a single tool call and synthesize their output rather than relying on one model's judgment.
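The two-stage flow in miniature; the model names come from the text, while `call_model` is a stand-in for the OpenRouter request and the function shape is assumed:

```python
from concurrent.futures import ThreadPoolExecutor

REFERENCE_MODELS = ["claude-opus-4.6", "gemini-3-pro-preview",
                    "gpt-5.4-pro", "deepseek-v3.2"]
AGGREGATOR = "claude-opus-4.6"

def mixture_of_agents(problem: str, call_model) -> str:
    # Stage 1: reference models answer the same problem in parallel.
    with ThreadPoolExecutor(max_workers=len(REFERENCE_MODELS)) as pool:
        drafts = list(pool.map(lambda m: call_model(m, problem),
                               REFERENCE_MODELS))
    # Stage 2: the aggregator synthesizes the drafts into one answer.
    prompt = problem + "\n\nReference answers:\n" + "\n---\n".join(drafts)
    return call_model(AGGREGATOR, prompt)

# Demo with a stub in place of real OpenRouter calls.
answer = mixture_of_agents("What is 2+2?", lambda model, prompt: f"[{model}] 4")
```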
Subagent delegation
The delegate_tool (tools/delegate_tool.py) lets Hermes
spawn isolated child agent instances for parallel workstreams. Key design decisions:
What children cannot do
Five tools are always stripped from child agents:
- delegate_task: no recursive delegation (depth ≤ 2)
- clarify: no user interaction from subagents
- memory: no writes to shared MEMORY.md
- send_message: no cross-platform side effects
- execute_code: children should reason step-by-step
Isolation and concurrency
Each child gets a fresh conversation with no parent history, its own
task_id, a focused system prompt built from the delegated goal,
and a restricted toolset. The parent sees only the delegation call and the
child's summary result โ never the intermediate tool calls.
Up to 3 children run concurrently via ThreadPoolExecutor.
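The restriction logic reduces to a filter plus a depth guard; the tool names come from the text, the helper names are assumed:

```python
RESTRICTED_FOR_CHILDREN = {"delegate_task", "clarify", "memory",
                           "send_message", "execute_code"}
MAX_DEPTH = 2

def child_toolset(parent_tools: list[str], depth: int) -> list[str]:
    # Past the depth limit, children cannot be spawned at all.
    if depth >= MAX_DEPTH:
        raise ValueError("delegation depth limit reached")
    # Children inherit everything except the five restricted tools.
    return [t for t in parent_tools if t not in RESTRICTED_FOR_CHILDREN]

tools = ["read_file", "write_file", "delegate_task", "clarify", "web_search"]
child_tools = child_toolset(tools, depth=1)
```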
Research and RL training infrastructure
Unlike every other agent in this study, Hermes includes infrastructure for training the next generation of tool-calling models:
| Component | File | Purpose |
|---|---|---|
| Batch runner | batch_runner.py |
Parallel batch trajectory generation for dataset creation |
| Trajectory compressor | trajectory_compressor.py |
Compresses agent trajectories for training efficiency |
| RL environments | environments/ |
Atropos RL training environments (Nous Research's RL framework) |
| RL CLI | rl_cli.py |
Command line for interacting with RL training pipelines |
| SWE runner | mini_swe_runner.py |
Runs agent on software engineering benchmark tasks |
The trajectory format serializes tool calls using
<tool_call>/</tool_call> XML tags wrapping
JSON, a format that matches the Hermes-style tool call convention used in
Nous Research model fine-tuning. This gives the agent a built-in path to
generate its own training data.
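The serialization itself is small; a sketch with an assumed JSON payload schema:

```python
import json

def serialize_tool_call(name: str, arguments: dict) -> str:
    # JSON payload wrapped in <tool_call> tags, matching the
    # Hermes-style convention (exact field names assumed here).
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>{payload}</tool_call>"

call = serialize_tool_call("read_file", {"path": "main.py"})
```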
Test names reveal production bug history
The test suite encodes historical bugs in file names:
test_860_dedup.py (message deduplication, issue #860),
test_413_compression.py (HTTP 413 payload-too-large triggers),
test_1630_context_overflow_loop.py (infinite loop caused by context overflows).
These are regression tests for real incidents: a sign of production usage rather than just research development.
Tool count and architecture at a glance
| Tool category | Files | Notable |
|---|---|---|
| Terminal / shell | terminal_tool.py, environments/ |
6 backends, process registry, signal handling |
| File operations | file_tools.py, file_operations.py |
Read/write/search/patch with patch parser |
| Web | web_tools.py, browser_tool.py |
Parallel + Firecrawl, Browserbase automation |
| Memory | memory_tool.py, session_search_tool.py |
MEMORY.md/USER.md + FTS5 session search |
| Skills | skill_manager_tool.py, skills_tool.py |
Create/edit/delete skills; skills hub integration |
| MCP | mcp_tool.py (~1050 lines) |
Stdio + HTTP transport, exponential backoff, MCP sampling support |
| Delegation | delegate_tool.py |
Isolated child agents, tool restriction, MAX_DEPTH=2 |
| Intelligence | mixture_of_agents_tool.py |
MoA: 4 reference models + 1 aggregator |
| Media / voice | tts_tool.py, transcription_tools.py, voice_mode.py |
TTS output, speech transcription, voice memo handling |
| Messaging | send_message_tool.py |
Cross-platform message dispatch from within agent tasks |
How Hermes compares to the rest of this set
| Capability | Hermes | Typical agent in this set |
|---|---|---|
| Cross-session learning | Yes: skills, MEMORY.md, FTS5 search | No persistent learning |
| Multi-platform messaging | Yes: Telegram, Discord, Slack, WhatsApp, Signal | Terminal / API only |
| Remote execution | Yes: Docker, SSH, Daytona, Modal, Singularity | Local process only |
| Subagent delegation | Yes: ThreadPoolExecutor, depth ≤ 2 | Rare; DeerFlow does this via LangGraph |
| Prompt injection defense | Two layers: context files + memory | Not present in other repos studied |
| MoA synthesis | Yes: 4 models + aggregator | Not present in other repos studied |
| RL training data generation | Yes: batch runner, trajectory compression | Not present in other repos studied |
| Coding terminal agent | Yes, but not the primary focus | Primary focus |
The tradeoff
Hermes pays for its breadth in focus. Its codebase is large and its feature surface is wide. The skills, memory, gateway, and RL subsystems all add complexity that a focused coding terminal agent like Crush or Claude Code does not carry. If you want a single sharp tool for coding, Hermes is not that. If you want a general-purpose agent that can run on a VPS, talk to you on Telegram, remember context across weeks, and train its own successor, nothing else in this repo set comes close.