โ† AI Coding Guides โ€บ Deep Dives
Nous Research • Self-Improving • Multi-Platform

Hermes Agent: The Outlier of the Group

Hermes, by Nous Research, was not covered in the original analysis of this repo set, an oversight worth correcting. It is the most architecturally distinct agent in the directory: not just a coding agent, but a self-improving, multi-platform autonomous system with a built-in learning loop.

(Alright, ad over. Back to the serious technical analysis.)

Why Hermes is different

Every other agent in this set is primarily a coding assistant that happens to run in a terminal or browser. Hermes is something broader: a general agent designed to be always-on, learn from its own history, delegate work to subagents, run on remote infrastructure, and talk to you through whichever messaging platform you prefer, including while it is still working.

☤

The caduceus symbol in the name

Hermes is named after the Greek messenger god, the patron of travelers, commerce, and transitions between worlds. The caduceus (☤) in the project branding is a deliberate choice: this is an agent designed to move information across boundaries, including between your laptop, a cloud VM, and a Telegram conversation.

The learning loop: skills, memory, and session search

The single most unusual thing about Hermes relative to every other agent in this study is its closed learning loop. After complex tasks, Hermes can create and refine skills (reusable Markdown-based procedures stored in ~/.hermes/skills/) and inject them into subsequent sessions.

Skills as procedural memory

Each skill is a SKILL.md file in a structured directory:

~/.hermes/skills/
├── systematic-debugging/
│   └── SKILL.md        ← frontmatter + content
├── test-driven-development/
│   └── SKILL.md
└── code-review/
    └── SKILL.md

Frontmatter includes tags, related skills, authorship, and version. Skills are injected as user messages rather than into the system prompt, a deliberate choice that keeps the prompt cache valid for the entire session.
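
For illustration, a skill file following that description might look like the sketch below; the exact frontmatter keys and skill content are assumptions, not copied from the repo:

---
name: systematic-debugging
tags: [debugging, methodology]
related: [test-driven-development]
author: hermes
version: 3
---

# Systematic Debugging

1. Reproduce the failure with a minimal command.
2. Read the full traceback before touching any code.
3. Form one hypothesis at a time; record what each test showed.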

Memory files: MEMORY.md + USER.md

Two separate memory stores are loaded as a frozen snapshot at session start:

  • MEMORY.md: the agent's personal notes, environment facts, project conventions, and tool quirks
  • USER.md: what the agent knows about the user (preferences, workflow habits, communication style)

Mid-session writes are durable (written to disk immediately) but do not update the current session's system prompt. The snapshot is refreshed on the next session start to avoid breaking the prefix cache. The delimiter between entries is § (the section sign), chosen because it is unambiguous in plaintext.
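
For a sense of the format, a §-delimited memory file might read like this (the delimiter convention is from the source; the entries are invented):

§ Project uses pnpm; plain npm install corrupts the lockfile.
§ Staging database is read-only after 18:00 UTC.
§ User prefers unified diffs over full-file rewrites.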

FTS5 session search

All conversation history is stored in SQLite with FTS5 full-text indexing. The session_search_tool lets the agent find its own past conversations by keyword, then summarize the relevant sessions with a cheap auxiliary model instead of loading raw transcripts into context: a compact long-term memory strategy that needs no vector database.
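
The underlying pattern is easy to sketch with SQLite's built-in FTS5 module; the schema below is illustrative, not Hermes's actual one:

import sqlite3

db = sqlite3.connect("sessions.db")
# FTS5 virtual table: message text is indexed, session_id is stored unindexed.
db.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
    "USING fts5(content, session_id UNINDEXED)"
)
db.execute(
    "INSERT INTO messages (content, session_id) VALUES (?, ?)",
    ("Fixed the flaky websocket reconnect test", "session-042"),
)
# MATCH is FTS5's full-text operator; rank orders results by BM25 relevance.
hits = db.execute(
    "SELECT session_id, snippet(messages, 0, '[', ']', '...', 8) "
    "FROM messages WHERE messages MATCH ? ORDER BY rank LIMIT 5",
    ("websocket",),
).fetchall()

Only matching session IDs and short snippets come back; summarizing the chosen sessions is then handed off to the cheap auxiliary model.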

Context compression: structured summaries

Hermes has a ContextCompressor class in agent/context_compressor.py with a five-step algorithm that goes significantly further than typical sliding-window approaches:

1. Prune old tool results. Replace the content of old tool result messages with the placeholder [Old tool output cleared to save context space]. This is a cheap pre-pass that requires no LLM call.

2. Protect the head. Keep the system prompt and first exchange intact; these anchor the session's identity and cannot be summarized away.

3. Protect the tail (token budget). Protect the most recent ~20K tokens, not a fixed message count. This is adaptive: a model with a large context window gets a larger tail budget.

4. Summarize the middle with an LLM. The middle turns are summarized with a structured prompt that produces five named sections: Goal, Progress, Decisions, Files, Next Steps. The summary budget is proportional to the compressed content, capped at 12K tokens.

5. Iterate on repeated compression. If a summary already exists from a prior compaction, the new pass updates it rather than creating a fresh one, preserving knowledge across multiple compression events.

The summary is injected with the prefix: "[CONTEXT COMPACTION] Earlier turns in this conversation were compacted to save context space…" This is transparent: the model is explicitly told it is reading a summary, not a live transcript.
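
Put together, the five steps reduce to a small skeleton. Everything below is a sketch: count_tokens and summarize_llm are assumed helpers, and the names and head size are guesses rather than the real ContextCompressor internals:

TAIL_BUDGET_TOKENS = 20_000
SUMMARY_CAP_TOKENS = 12_000
PLACEHOLDER = "[Old tool output cleared to save context space]"

def compress(messages, count_tokens, summarize_llm, prior_summary=None):
    # Step 2: the system prompt and first exchange stay intact.
    head, rest = messages[:3], messages[3:]
    # Step 3: protect a ~20K-token tail, not a fixed message count.
    tail, used = [], 0
    while rest and used + count_tokens(rest[-1]) <= TAIL_BUDGET_TOKENS:
        used += count_tokens(rest[-1])
        tail.insert(0, rest.pop())
    # Step 1: cheap pre-pass over the middle -- no LLM call needed.
    for msg in rest:
        if msg["role"] == "tool":
            msg["content"] = PLACEHOLDER
    # Steps 4-5: summarize the middle into named sections, updating any
    # summary left behind by a previous compaction instead of starting over.
    summary = summarize_llm(
        middle=rest,
        prior_summary=prior_summary,
        sections=("Goal", "Progress", "Decisions", "Files", "Next Steps"),
        max_tokens=SUMMARY_CAP_TOKENS,
    )
    note = {"role": "user", "content": "[CONTEXT COMPACTION] " + summary}
    return head + [note] + tail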

Security: two layers of injection defense

Hermes is unusually security-minded about the content that gets loaded into its system prompt. Two separate scanning layers exist:

Context file scanning (AGENTS.md, .cursorrules, SOUL.md)

Before injecting context files into the system prompt, Hermes scans for:

  • Prompt injection phrases ("ignore previous instructions", "system prompt override")
  • Deception instructions ("do not tell the user")
  • Bypass attempts ("act as if you have no restrictions")
  • Hidden HTML comment injection (<!-- ignore -->)
  • Invisible Unicode characters (zero-width spaces, bidi overrides)
  • Credential exfiltration via curl / cat on .env

Blocked content is replaced with a visible warning rather than being silently dropped.
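
The scanning layer can be approximated with a pattern list like the one below; the regexes mirror the categories above but are illustrative, not the project's actual rules:

import re

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"system prompt override",
    r"do not tell the user",
    r"act as if you have no restrictions",
    r"<!--.*?-->",                        # hidden HTML comment payloads
    r"(curl|cat)\s+[^\n]*\.env",          # crude .env exfiltration check
]
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u202e", "\u2066", "\u2067"}

def scan_context_file(text: str) -> str:
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE | re.DOTALL):
            return f"[BLOCKED: matched suspicious pattern {pattern!r}]"
    if any(ch in INVISIBLE_CHARS for ch in text):
        return "[BLOCKED: invisible Unicode characters detected]"
    return text

Returning a visible warning string, rather than an empty one, is what keeps a block from being a silent drop.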

Memory injection scanning (MEMORY.md, USER.md)

Memory content goes through a separate scanner before loading, checking for the same injection patterns. This matters because memory persists across sessions: a successful injection in session A would propagate to session B if memory is not scanned on load.

The Tirith security scanner (tools/tirith_security.py) goes further: it wraps an external binary that checks commands for homograph URLs, pipe-to-interpreter patterns, and terminal injection. The binary is auto-downloaded from GitHub releases with SHA-256 verification and optional cosign provenance validation.
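
The verify-before-trust step is simple to sketch; the URL and digest handling below are generic, not the actual Tirith download code:

import hashlib
import urllib.request

def fetch_verified(url: str, expected_sha256: str, dest: str) -> None:
    data = urllib.request.urlopen(url).read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"checksum mismatch: got {digest}")
    with open(dest, "wb") as f:
        f.write(data)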

Multi-platform gateway architecture

Most agents in this set are terminal-first with optional API access. Hermes reverses the priority: it runs a persistent messaging gateway (gateway/run.py) that connects to 14+ messaging platforms via adapters in gateway/platforms/: Telegram, Discord, Slack, WhatsApp, Signal, WeChat/WeCom, Matrix, Mattermost, Feishu/Lark, DingTalk, Email, SMS, HomeAssistant, and a generic Webhook.

Platform | Adapter file | Notes
Telegram | gateway/platforms/telegram.py | Generates the BotCommand menu from the central slash command registry
Discord | gateway/platforms/discord.py | Same slash command dispatch, adapted to Discord's slash command format
Slack | gateway/platforms/slack.py | Maps /hermes <subcommand> via slack_subcommand_map()
WhatsApp | gateway/platforms/whatsapp.py | Voice memo transcription supported
Signal | gateway/platforms/signal.py | Cross-platform conversation continuity
WeChat / WeCom | gateway/platforms/wecom.py | Enterprise WeChat (WeCom/企业微信) integration
Matrix | gateway/platforms/matrix.py | Open federated messaging protocol
Mattermost | gateway/platforms/mattermost.py | Self-hosted team messaging
Feishu / Lark | gateway/platforms/feishu.py | WebSocket + webhook; interactive card button-click events as commands; ACK emoji reactions; dedup across restarts
DingTalk | gateway/platforms/dingtalk.py | Alibaba enterprise messaging
Email | gateway/platforms/email.py | SMTP/IMAP integration
SMS | gateway/platforms/sms.py | SMS message gateway
Home Assistant | gateway/platforms/homeassistant.py | Smart home device control via the gateway
Webhook | gateway/platforms/webhook.py | Generic inbound/outbound webhook adapter for custom integrations
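
The layout implies one adapter class per platform behind a common interface; the base class below is an assumption about that shape, not code from gateway/platforms/:

from abc import ABC, abstractmethod

class PlatformAdapter(ABC):
    """One subclass per file in gateway/platforms/."""

    @abstractmethod
    async def listen(self) -> None:
        """Connect to the platform and forward inbound messages to the agent."""

    @abstractmethod
    async def send(self, chat_id: str, text: str) -> None:
        """Deliver an agent reply back to the originating conversation."""

Adding a fifteenth platform then means implementing these two methods, not touching the agent core.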
💡

The slash command registry pattern

All slash commands are defined as CommandDef objects in a central COMMAND_REGISTRY list. Every downstream consumer โ€” CLI autocomplete, Telegram BotCommand menu, Slack subcommand map, gateway help text, gateway dispatch โ€” derives from this single registry automatically. Adding an alias requires changing exactly one field in one place.
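
A minimal sketch of the pattern, with field names and example commands invented for illustration:

from dataclasses import dataclass

@dataclass(frozen=True)
class CommandDef:
    name: str
    description: str
    aliases: tuple[str, ...] = ()

COMMAND_REGISTRY = [
    CommandDef("help", "Show available commands"),
    CommandDef("model", "Switch the active model", aliases=("m",)),
    CommandDef("compact", "Compress the conversation context"),
]

# Every consumer derives from the one registry:
telegram_menu = [(c.name, c.description) for c in COMMAND_REGISTRY]
slack_subcommand_map = {
    alias: c.name for c in COMMAND_REGISTRY for alias in (c.name, *c.aliases)
}
help_text = "\n".join(f"/{c.name} - {c.description}" for c in COMMAND_REGISTRY)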

Six execution backends

Hermes can run its terminal sessions across six different backends defined in tools/environments/:

  • Local: standard local process execution. The default for most users.
  • Docker: isolated containerized execution. Files and processes are sandboxed per session.
  • SSH: remote execution via SSH. The agent's context stays local, but commands run on a remote server.
  • Daytona: serverless persistent workspace. The environment hibernates when idle and wakes on demand, so the agent can stop and resume across days without losing state.
  • Modal: serverless compute platform. Good for GPU workloads and tasks that need to run in the cloud but cost nearly nothing between sessions.
  • Singularity: HPC container format. Relevant for ML research workflows on cluster environments.

Smart model routing and IterationBudget

Smart model routing

agent/smart_model_routing.py routes each turn to either a configured "strong" model or a configured "cheap" model. The decision uses a _COMPLEX_KEYWORDS set of 47 words (debug, implement, refactor, traceback, analyze, benchmark, pytest, docker, kubernetes, delegate, subagent, cron, and more). If the user message contains none of those keywords and no URL pattern, choose_cheap_model_route() sends the turn to the cheap model, transparently saving cost on simple queries.
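
A sketch of that gate, assuming the routing check reduces to keyword and URL tests (the real module no doubt carries more nuance):

import re

_COMPLEX_KEYWORDS = {
    "debug", "implement", "refactor", "traceback", "analyze", "benchmark",
    "pytest", "docker", "kubernetes", "delegate", "subagent", "cron",
    # ... the real set has 47 entries
}
_URL_RE = re.compile(r"https?://\S+")

def choose_cheap_model_route(user_message: str) -> bool:
    """True when the turn is safe to send to the cheap model."""
    words = set(re.findall(r"[a-z0-9_-]+", user_message.lower()))
    return not (words & _COMPLEX_KEYWORDS) and not _URL_RE.search(user_message)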

IterationBudget โ€” per-role turn limits

The IterationBudget class enforces turn limits with role-aware values: max_total=90 for the parent agent, max_total=50 for sub-agents. The budget is thread-safe via threading.Lock. Crucially, execute_code calls include a refund() call: code execution turns don't consume budget, because they are "free" computation steps rather than reasoning steps. The max parallel tool workers is capped at 8 via _MAX_TOOL_WORKERS.
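
In sketch form (refund() and the limits are from the source; the other names are assumptions):

import threading

class IterationBudget:
    def __init__(self, role: str = "parent"):
        self.max_total = 90 if role == "parent" else 50
        self._used = 0
        self._lock = threading.Lock()

    def consume(self) -> bool:
        with self._lock:
            if self._used >= self.max_total:
                return False        # budget exhausted: stop the agent loop
            self._used += 1
            return True

    def refund(self) -> None:
        with self._lock:            # called after execute_code turns
            self._used = max(0, self._used - 1)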

Interactive tools (clarify) always run sequentially via _NEVER_PARALLEL_TOOLS. Path-scoped tools (read_file, write_file, patch) are safe to parallelize when they target non-overlapping paths, as determined by _paths_overlap().
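
One plausible reading of the overlap test, not the actual implementation:

from pathlib import PurePosixPath

def _paths_overlap(a: str, b: str) -> bool:
    """True if one path equals or contains the other (unsafe to parallelize)."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents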

Mixture-of-Agents (MoA) tool

Hermes includes a mixture_of_agents_tool in tools/mixture_of_agents_tool.py, implementing the MoA methodology from the arXiv paper "Mixture-of-Agents Enhances Large Language Model Capabilities" (Wang et al., 2406.04692). The architecture:

  • Reference models (run in parallel): claude-opus-4.6, gemini-3-pro-preview, gpt-5.4-pro, deepseek-v3.2. These generate diverse initial responses to the same problem.
  • Aggregator model: claude-opus-4.6. Synthesizes the reference responses into a single high-quality output.

All reference model calls go through OpenRouter. The tool is specialized for "extremely difficult problems requiring intense reasoning": coding, mathematics, and complex analytical tasks. This means Hermes can optionally spawn four frontier models in a single tool call and synthesize their output rather than relying on one model's judgment.
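
The fan-out/fan-in shape is straightforward; call_model(model, prompt) below is an assumed wrapper around an OpenRouter chat request, not the tool's real interface:

from concurrent.futures import ThreadPoolExecutor

REFERENCE_MODELS = [
    "claude-opus-4.6", "gemini-3-pro-preview", "gpt-5.4-pro", "deepseek-v3.2",
]
AGGREGATOR = "claude-opus-4.6"

def mixture_of_agents(problem: str, call_model) -> str:
    # Fan out: all reference models answer the same problem in parallel.
    with ThreadPoolExecutor(max_workers=len(REFERENCE_MODELS)) as pool:
        drafts = list(pool.map(lambda m: call_model(m, problem), REFERENCE_MODELS))
    candidates = "\n\n".join(
        f"### Candidate {i + 1}\n{d}" for i, d in enumerate(drafts)
    )
    # Fan in: the aggregator synthesizes one answer from the drafts.
    return call_model(
        AGGREGATOR,
        "Synthesize the candidate responses into one answer.\n\n"
        f"Problem:\n{problem}\n\nCandidates:\n{candidates}",
    )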

Subagent delegation

The delegate_tool (tools/delegate_tool.py) lets Hermes spawn isolated child agent instances for parallel workstreams. Key design decisions:

What children cannot do

Five tools are always stripped from child agents:

  • delegate_task: no recursive delegation (depth ≤ 2)
  • clarify: no user interaction from subagents
  • memory: no writes to shared MEMORY.md
  • send_message: no cross-platform side effects
  • execute_code: children should reason step by step

Isolation and concurrency

Each child gets a fresh conversation with no parent history, its own task_id, a focused system prompt built from the delegated goal, and a restricted toolset. The parent sees only the delegation call and the child's summary result, never the intermediate tool calls. Up to 3 children run concurrently via ThreadPoolExecutor.
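
The restriction-plus-fan-out logic reduces to a few lines; run_child(goal, tools) is an assumed helper that runs one isolated child and returns only its summary:

from concurrent.futures import ThreadPoolExecutor

STRIPPED_FROM_CHILDREN = {
    "delegate_task", "clarify", "memory", "send_message", "execute_code",
}

def delegate(goals, parent_tools, run_child, max_children=3):
    # Children get the parent's toolset minus the five restricted tools.
    child_tools = [t for t in parent_tools if t.name not in STRIPPED_FROM_CHILDREN]
    with ThreadPoolExecutor(max_workers=max_children) as pool:
        return list(pool.map(lambda g: run_child(g, child_tools), goals))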

Research and RL training infrastructure

Unlike every other agent in this study, Hermes includes infrastructure for training the next generation of tool-calling models:

Component | File | Purpose
Batch runner | batch_runner.py | Parallel batch trajectory generation for dataset creation
Trajectory compressor | trajectory_compressor.py | Compresses agent trajectories for training efficiency
RL environments | environments/ | Atropos RL training environments (Nous Research's RL framework)
RL CLI | rl_cli.py | Command line for interacting with RL training pipelines
SWE runner | mini_swe_runner.py | Runs the agent on software engineering benchmark tasks

The trajectory format serializes tool calls using <tool_call>/</tool_call> XML tags wrapping JSON, a format that matches the Hermes-style tool call convention used in Nous Research model fine-tuning. This gives the agent a built-in path to generate its own training data.
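
In that convention, a single serialized tool call looks roughly like this (the specific call is invented for illustration):

<tool_call>
{"name": "read_file", "arguments": {"path": "agent/context_compressor.py"}}
</tool_call>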

🔬

Test names reveal production bug history

The test suite encodes historical bugs in file names: test_860_dedup.py (message deduplication, issue #860), test_413_compression.py (HTTP 413 payload-too-large triggers), test_1630_context_overflow_loop.py (infinite loop caused by context overflows). These are regression tests for real incidents, a sign of production usage rather than just research development.

Tool count and architecture at a glance

Tool category | Files | Notable
Terminal / shell | terminal_tool.py, environments/ | 6 backends, process registry, signal handling
File operations | file_tools.py, file_operations.py | Read/write/search/patch with a patch parser
Web | web_tools.py, browser_tool.py | Parallel + Firecrawl, Browserbase automation
Memory | memory_tool.py, session_search_tool.py | MEMORY.md/USER.md + FTS5 session search
Skills | skill_manager_tool.py, skills_tool.py | Create/edit/delete skills; skills hub integration
MCP | mcp_tool.py (~1050 lines) | Stdio + HTTP transport, exponential backoff, MCP sampling support
Delegation | delegate_tool.py | Isolated child agents, tool restriction, MAX_DEPTH=2
Intelligence | mixture_of_agents_tool.py | MoA: 4 reference models + 1 aggregator
Media / voice | tts_tool.py, transcription_tools.py, voice_mode.py | TTS output, speech transcription, voice memo handling
Messaging | send_message_tool.py | Cross-platform message dispatch from within agent tasks

How Hermes compares to the rest of this set

Capability | Hermes | Typical agent in this set
Cross-session learning | Yes: skills, MEMORY.md, FTS5 search | No persistent learning
Multi-platform messaging | Yes: Telegram, Discord, Slack, WhatsApp, Signal, more | Terminal / API only
Remote execution | Yes: Docker, SSH, Daytona, Modal, Singularity | Local process only
Subagent delegation | Yes: ThreadPoolExecutor, depth ≤ 2 | Rare; DeerFlow does this via LangGraph
Prompt injection defense | Two layers: context files + memory | Not present in other repos studied
MoA synthesis | Yes: 4 reference models + aggregator | Not present in other repos studied
RL training data generation | Yes: batch runner, trajectory compression | Not present in other repos studied
Coding terminal agent | Yes, but not the primary focus | Primary focus
⚠️

The tradeoff

Hermes pays for its breadth in focus. Its codebase is large and its feature surface is wide. The skills, memory, gateway, and RL subsystems all add complexity that a focused coding terminal agent like Crush or Claude Code does not carry. If you want a single sharp tool for coding, Hermes is not that. If you want a general-purpose agent that can run on a VPS, talk to you on Telegram, remember context across weeks, and train its own successor, nothing else in this repo set comes close.