Source - Hermes Agent Loop (Nous Research Developer Guide)
URL: https://hermes-agent.nousresearch.com/docs/developer-guide/agent-loop
Author/Org: Nous Research
Ingested: 2026-04-14
Summary
Developer guide for the AIAgent class (run_agent.py) at the core of Nous Research's Hermes Agent system. Covers the full agent loop: prompt assembly, provider selection, turn lifecycle, tool dispatch, compression, budget tracking, and callback surfaces.
Key Takeaways
Scale
The AIAgent class is ~10,700 lines and handles prompt assembly, tool dispatch, and provider failover in a single orchestration engine.
API Mode Abstraction
Three execution modes with automatic resolution:
- chat_completions: OpenAI-compatible endpoints
- codex_responses: OpenAI Codex/Responses API format
- anthropic_messages: native Anthropic Messages API via an adapter
Mode priority: explicit constructor arg → provider detection → base URL heuristics → default (chat_completions).
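The resolution order above can be sketched as a small function. This is a minimal illustration, not the code in run_agent.py: the function name resolve_api_mode, the PROVIDER_MODES table, and the anthropic.com URL check are hypothetical stand-ins for whatever provider detection and base-URL heuristics Hermes actually uses.

```python
from typing import Optional

# Hypothetical provider -> mode table; the real detection rules are not in the guide.
PROVIDER_MODES = {
    "anthropic": "anthropic_messages",
    "openai-codex": "codex_responses",
}

VALID_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}


def resolve_api_mode(
    explicit_mode: Optional[str] = None,
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
) -> str:
    """Resolve the API mode using the documented priority order."""
    # 1. An explicit constructor argument always wins.
    if explicit_mode is not None:
        if explicit_mode not in VALID_MODES:
            raise ValueError(f"unknown api mode: {explicit_mode}")
        return explicit_mode
    # 2. Provider detection.
    if provider is not None and provider in PROVIDER_MODES:
        return PROVIDER_MODES[provider]
    # 3. Base-URL heuristics (hypothetical substring check).
    if base_url is not None and "anthropic.com" in base_url.lower():
        return "anthropic_messages"
    # 4. Fall back to the default mode.
    return "chat_completions"
```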
Turn Lifecycle (10 steps)
- Generate task_id if needed
- Append user message to history
- Build or reuse cached system prompt
- Check whether compression is needed (history exceeds 50% of the context window)
- Build API messages from conversation history
- Inject ephemeral prompt layers
- Apply Anthropic prompt caching markers if applicable
- Make interruptible API call
- Parse response → execute tools or return final response
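A stripped-down version of the loop helps show how steps 2, 3, 5, 8, and 9 fit together, and why a tool-calling response sends control back to the model instead of ending the turn. This is a sketch under stated assumptions: run_turn, call_model, and execute_tool are hypothetical names, and task_id generation, compression, ephemeral prompt layers, caching markers, and interrupt handling are deliberately omitted.

```python
from typing import Callable, List

Message = dict


def run_turn(
    history: List[Message],
    user_text: str,
    call_model: Callable[[List[Message]], Message],
    execute_tool: Callable[[dict], str],
    system_prompt: str,
    max_iterations: int = 90,
) -> str:
    """Hypothetical simplified turn: loop until the model stops calling tools."""
    # Step 2: append the user message to history.
    history.append({"role": "user", "content": user_text})
    for _ in range(max_iterations):
        # Steps 3 and 5: prepend the (cached) system prompt to the conversation.
        api_messages = [{"role": "system", "content": system_prompt}] + history
        # Step 8: call the model (interruptibility omitted in this sketch).
        reply = call_model(api_messages)
        history.append(reply)
        tool_calls = reply.get("tool_calls") or []
        # Step 9: no tool calls means this is the final response.
        if not tool_calls:
            return reply.get("content") or ""
        # Otherwise run each tool, record its result, and loop back to the model.
        for call in tool_calls:
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": execute_tool(call)}
            )
    return "Iteration budget exhausted."
```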
Interruptible Calls
API requests run in a background thread while the main thread monitors for interrupts. On interrupt, the request thread is abandoned and any partial result is discarded; it is never injected into the conversation history.
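The interrupt pattern can be approximated with Python's threading module. This sketch assumes the caller signals an interrupt by setting a threading.Event; interruptible_call is a hypothetical name and the polling interval is arbitrary. The key property matches the description: when interrupted, the function returns None and the abandoned thread's eventual result is never read, so nothing partial reaches the history.

```python
import threading
from typing import Any, Callable, Optional


def interruptible_call(
    request: Callable[[], Any],
    interrupt: threading.Event,
    poll_interval: float = 0.05,
) -> Optional[Any]:
    """Run request() on a background thread; return None if interrupted first."""
    result: dict = {}

    def worker() -> None:
        try:
            result["value"] = request()
        except Exception as exc:  # hand errors back to the main thread
            result["error"] = exc

    # Daemon thread so an abandoned request does not block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while thread.is_alive():
        if interrupt.is_set():
            # Abandon the thread; whatever it eventually produces is discarded.
            return None
        thread.join(timeout=poll_interval)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Marking the worker as a daemon thread is one way to let an abandoned request linger without blocking process exit; the real implementation may clean up differently.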
Tool Execution
- Single tool call: executed sequentially on the main thread
- Multiple tool calls: executed concurrently via ThreadPoolExecutor; results are reinserted in the original call order
- Interactive tools (e.g., clarify): force sequential execution
Agent-level tools are intercepted before reaching the tool registry: todo, memory, session_search, delegate_task (which spawns subagents).
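The dispatch rules can be sketched with concurrent.futures.ThreadPoolExecutor, whose map method yields results in input order even when calls finish out of order. The simplified call shape ({"name": ..., "args": ...}), the INTERACTIVE_TOOLS and AGENT_LEVEL_TOOLS sets, and the dispatch_tools and route helpers are assumptions for illustration; Hermes's real dispatch lives in model_tools.py and the AIAgent class.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

# Hypothetical sets; the guide names clarify as interactive and these four as agent-level.
INTERACTIVE_TOOLS = {"clarify"}
AGENT_LEVEL_TOOLS = {"todo", "memory", "session_search", "delegate_task"}


def dispatch_tools(calls: List[dict], run_tool: Callable[[dict], Any], max_workers: int = 8) -> list:
    """Execute tool calls and return results in the original call order."""
    # One call, or any interactive tool in the batch, forces sequential execution.
    if len(calls) <= 1 or any(c["name"] in INTERACTIVE_TOOLS for c in calls):
        return [run_tool(c) for c in calls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return list(pool.map(run_tool, calls))


def route(call: dict, agent_handler: Callable[[dict], Any], registry_handler: Callable[[dict], Any]) -> Any:
    """Intercept agent-level tools before the registry is consulted."""
    if call["name"] in AGENT_LEVEL_TOOLS:
        return agent_handler(call)
    return registry_handler(call)
```

In this sketch run_tool could be something like lambda c: route(c, handle_agent_tool, handle_registry_tool), where both handlers are hypothetical placeholders for the agent's own tool logic and the registry lookup.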
Iteration Budget
- Default: 90 turns (agent.max_turns)
- Subagents: capped at delegation.max_iterations (default 50)
- At 100% of the budget: the agent stops and returns a summary
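A budget counter makes the stop condition concrete. IterationBudget and subagent_budget are hypothetical names; only the defaults (90 for agent.max_turns, 50 for delegation.max_iterations) come from the guide, and producing the summary once the budget is exhausted is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class IterationBudget:
    """Hypothetical turn counter mirroring agent.max_turns."""
    max_turns: int = 90
    used: int = 0

    def consume(self) -> bool:
        """Record one turn; return False once the budget is exhausted."""
        if self.used >= self.max_turns:
            return False
        self.used += 1
        return True

    @property
    def fraction_used(self) -> float:
        return self.used / self.max_turns


def subagent_budget(delegation_max_iterations: int = 50) -> IterationBudget:
    # Subagents run against their own, smaller cap.
    return IterationBudget(max_turns=delegation_max_iterations)
```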
Compression
- Preflight trigger: >50% of context window
- Gateway auto-compression: >85%
- Procedure: flush memory to disk → summarize middle turns → preserve last N messages intact → keep tool call/result pairs together → generate new session lineage ID
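The compression procedure can be sketched as follows. The two thresholds come from the guide; everything else is an assumption: the function names, treating only leading system messages as the protected head, inserting the summary as an assistant message, and keeping tool call/result pairs together by moving the cut back to the nearest user message (a turn boundary, so no tool result is separated from the call that produced it). summarize and flush_memory stand in for the summarizer and the memory flush.

```python
import uuid
from typing import Callable, List, Optional, Tuple

PREFLIGHT_THRESHOLD = 0.50  # checked before each API call
GATEWAY_THRESHOLD = 0.85    # gateway auto-compression


def needs_compression(used_tokens: int, context_window: int,
                      threshold: float = PREFLIGHT_THRESHOLD) -> bool:
    return used_tokens > threshold * context_window


def compress(
    messages: List[dict],
    summarize: Callable[[List[dict]], str],
    flush_memory: Callable[[], None],
    keep_last: int = 6,
) -> Tuple[List[dict], Optional[str]]:
    """Summarize the middle of the history; return (new_messages, new_lineage_id)."""
    # Leading system messages are never summarized.
    head_end = 0
    while head_end < len(messages) and messages[head_end]["role"] == "system":
        head_end += 1
    # Preserve at least the last keep_last messages verbatim.
    tail_start = max(head_end, len(messages) - keep_last)
    # Move the cut back to a user message so no tool call is split from its result.
    while head_end < tail_start < len(messages) and messages[tail_start]["role"] != "user":
        tail_start -= 1
    middle = messages[head_end:tail_start]
    if not middle:
        return messages, None  # nothing to compress
    flush_memory()  # persist memory before anything is dropped
    summary = {"role": "assistant",
               "content": "[Summary of earlier conversation]\n" + summarize(middle)}
    new_lineage_id = uuid.uuid4().hex  # compressed history starts a new lineage
    return messages[:head_end] + [summary] + messages[tail_start:], new_lineage_id
```

Because the cut only moves backwards, the preserved tail can be longer than keep_last; that trade-off favours a well-formed history over an exact message count.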
Fallback Model
When the primary model fails (rate limits, server errors, auth failures), the agent tries fallback providers in the configured order and continues the conversation with the first provider that succeeds.
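The fallback chain can be sketched as a loop over providers in configured order. ProviderError, is_fallback_worthy, and call_with_fallback are hypothetical; the status-code mapping (429 for rate limits, 401/403 for auth failures, 5xx for server errors) is an assumed way of classifying the failures the guide lists, and any other error is re-raised rather than masked.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical provider error carrying an HTTP-style status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"status {status}")
        self.status = status


def is_fallback_worthy(err: ProviderError) -> bool:
    # Assumed mapping: auth failures, rate limits, and server errors.
    return err.status in (401, 403, 429) or 500 <= err.status < 600


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[list], dict]]],
    messages: list,
) -> Tuple[str, dict]:
    """Try each provider in configured order; return (provider_name, reply)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(messages)
        except ProviderError as err:
            if not is_fallback_worthy(err):
                raise  # e.g. a malformed request will not succeed elsewhere
            errors.append(f"{name}: {err}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```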
Message Format
All messages use OpenAI-compatible format internally:
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}
Reasoning content is stored in assistant_msg["reasoning"].
Alternation rules: strict user/assistant alternation; only tool-role messages may appear consecutively.
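A validator makes the format rules checkable. validate_messages and reasoning_of are hypothetical helpers; the extra requirement that each tool message answers a tool_call_id issued by the preceding assistant message is an assumption consistent with the OpenAI-compatible format, not something the guide states.

```python
from typing import List, Set


def validate_messages(messages: List[dict]) -> bool:
    """Return True if the history obeys the role alternation rules."""
    prev_role = None
    open_calls: Set[str] = set()  # tool_call ids issued by the latest assistant message
    for msg in messages:
        role = msg["role"]
        if role == "system":
            continue
        # Only tool messages may repeat back to back.
        if role == prev_role and role != "tool":
            return False
        if role == "tool":
            # Assumed extra check: each result answers a call from the preceding assistant turn.
            call_id = msg.get("tool_call_id")
            if call_id not in open_calls:
                return False
            open_calls.discard(call_id)
        elif role == "assistant":
            open_calls = {c["id"] for c in msg.get("tool_calls") or []}
        else:
            open_calls = set()
        prev_role = role
    return True


def reasoning_of(assistant_msg: dict) -> str:
    # Reasoning is stored alongside the content under the "reasoning" key.
    return assistant_msg.get("reasoning") or ""
```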
Callback Surfaces
tool_progress_callback, thinking_callback, reasoning_callback, clarify_callback, step_callback, stream_delta_callback, status_callback.
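One way to model these surfaces is a dataclass of optional hooks in which unset callbacks are no-ops. Only the seven callback names come from the guide; AgentCallbacks, fire, and the arguments passed to each callback are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Optional

Callback = Optional[Callable[..., Any]]


@dataclass
class AgentCallbacks:
    """Callback names from the guide; argument lists are hypothetical."""
    tool_progress_callback: Callback = None
    thinking_callback: Callback = None
    reasoning_callback: Callback = None
    clarify_callback: Callback = None
    step_callback: Callback = None
    stream_delta_callback: Callback = None
    status_callback: Callback = None

    def fire(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """Invoke a callback if set; unset callbacks are no-ops returning None."""
        if name not in {f.name for f in fields(self)}:
            raise AttributeError(f"unknown callback: {name}")
        cb = getattr(self, name)
        return cb(*args, **kwargs) if cb is not None else None
```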
Key Files
| File | Purpose |
|---|---|
| run_agent.py | AIAgent class (~10,700 lines) |
| prompt_builder.py | System prompt assembly |
| context_engine.py | Pluggable context management |
| context_compressor.py | Lossy summarization algorithm |
| prompt_caching.py | Anthropic caching markers |
| auxiliary_client.py | Auxiliary LLM for side tasks |
| model_tools.py | Tool schema collection and dispatch |
Related Pages
- Hermes Agent Loop — concept page distilling the mechanics
- HermesAgent — entity page for the system
- NousResearch — entity page for the org
- Compaction — overlapping concept (compression mechanics)
- Agent Loop (OpenClaw) — compare: OpenClaw's loop in the same knowledge base
- pi-mono Agent Loop — compare: pi-mono's inner loop