Hermes Agent Loop

The core execution cycle of HermesAgent, implemented in the AIAgent class (run_agent.py). The loop manages prompt assembly, provider selection, tool dispatch, context compression, and budget enforcement.

Entry Points

| Method | Returns |
|---|---|
| agent.chat() | Final response string |
| agent.run_conversation() | Dict: {messages, metadata, usage} |

chat() wraps run_conversation() and extracts final_response.
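
The wrapper relationship can be sketched as follows. Method names and the return-dict keys follow the doc; the stubbed internals and the exact location of final_response inside the return value are assumptions:

```python
class AIAgent:
    def run_conversation(self, user_message):
        # Real implementation runs the full turn loop; stubbed here.
        return {
            "messages": [{"role": "user", "content": user_message},
                         {"role": "assistant", "content": "ok"}],
            "metadata": {"final_response": "ok"},  # assumed location
            "usage": {"total_tokens": 12},
        }

    def chat(self, user_message):
        # chat() delegates to run_conversation() and extracts the final text.
        result = self.run_conversation(user_message)
        return result["metadata"]["final_response"]
```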

Turn Lifecycle

Each iteration:

  1. Generate task_id if not present
  2. Append user message to conversation history
  3. Build system prompt (or reuse cached)
  4. Preflight compression check — if history >50% of context window, compress before calling model
  5. Build API messages from conversation history
  6. Inject ephemeral prompt layers
  7. Apply prompt caching markers (Anthropic mode only)
  8. Interruptible API call — runs in background thread; main thread polls for cancellation
  9. Parse response — if tool calls present, execute them and loop; otherwise return final response
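
The steps above can be sketched as a single loop. The step numbers, the 50% preflight threshold, and the background-thread call follow the doc; every helper name on the agent is an assumption:

```python
import threading
import uuid

def run_turn_loop(agent, user_message):
    # Step 1: generate a task_id if not present.
    if agent.task_id is None:
        agent.task_id = uuid.uuid4().hex
    # Step 2: append the user message to conversation history.
    agent.history.append({"role": "user", "content": user_message})
    while True:
        # Steps 3-4: build (or reuse) the system prompt, then run the
        # preflight check: compress if history exceeds 50% of the window.
        system_prompt = agent.build_system_prompt()
        if agent.history_tokens() > 0.5 * agent.context_window:
            agent.compress_history()
        # Steps 5-7: assemble API messages (ephemeral layers and caching
        # markers would be injected here).
        messages = agent.build_api_messages(system_prompt)
        # Step 8: interruptible API call — run the request in a background
        # thread while the main thread polls for cancellation.
        box = {}
        worker = threading.Thread(
            target=lambda: box.update(resp=agent.call_model(messages)))
        worker.start()
        while worker.is_alive():
            if agent.cancelled():
                return None
            worker.join(timeout=0.05)
        response = box["resp"]
        # Step 9: execute tool calls and loop, or return the final response.
        if response.get("tool_calls"):
            agent.execute_tools(response["tool_calls"])
            continue
        agent.history.append({"role": "assistant",
                              "content": response["content"]})
        return response["content"]
```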

API Mode Selection

Priority order: explicit constructor argument → provider detection → base-URL heuristics → default (chat_completions)

| Mode | Client |
|---|---|
| chat_completions | openai.OpenAI |
| codex_responses | openai.OpenAI (Responses format) |
| anthropic_messages | anthropic.Anthropic via adapter |
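
A minimal sketch of the priority chain. The chain order and mode names follow the doc; the specific heuristic substrings are assumptions, not the real implementation:

```python
def select_api_mode(explicit=None, provider=None, base_url=""):
    """Resolve the API mode: explicit constructor arg, then provider
    detection, then base-URL heuristics, then the default."""
    if explicit:
        return explicit
    if provider == "anthropic":          # provider detection (assumed)
        return "anthropic_messages"
    if "anthropic" in base_url:          # hypothetical URL heuristics
        return "anthropic_messages"
    if "codex" in base_url or "/responses" in base_url:
        return "codex_responses"
    return "chat_completions"            # documented default
```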

Message Format & Alternation

All messages stored in OpenAI-compatible format. Strict alternation: User → Assistant → User → Assistant. Only tool role may appear consecutively (one per tool call result). Reasoning content stored in assistant_msg["reasoning"].
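
The alternation rule can be expressed as a small validator, a sketch of the invariant rather than Hermes code:

```python
def check_alternation(messages):
    """Enforce the rule above: no role appears twice in a row,
    except 'tool', which may repeat (one message per tool result)."""
    prev = None
    for msg in messages:
        role = msg["role"]
        if role == prev and role != "tool":
            return False
        prev = role
    return True
```

For example, user → assistant (tool_calls) → tool → tool → assistant is valid, while two consecutive user messages are not.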

Tool Dispatch

Agent-level tools (intercepted before registry):

| Tool | Effect |
|---|---|
| todo | Read/write agent-local task state |
| memory | Write to persistent memory files |
| session_search | Query session history |
| delegate_task | Spawn subagent with isolated context |
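
A sketch of the interception order described above: agent-level tool names are checked before falling through to the registry. The handler and registry method names are assumptions:

```python
AGENT_LEVEL_TOOLS = {"todo", "memory", "session_search", "delegate_task"}

def dispatch_tool(agent, name, args):
    """Agent-level tools are intercepted before the registry lookup."""
    if name in AGENT_LEVEL_TOOLS:
        return agent.handle_agent_tool(name, args)   # assumed handler
    return agent.registry.run(name, args)            # assumed registry API
```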

Iteration Budget

Iteration budgets are tracked as parent + child caps (see Comparisons below), bounding how many tool-call iterations a turn, including any delegated subagents, may consume.

Fallback / Provider Failover

On a primary model failure (rate limit, server error, or auth error), the agent attempts the fallback providers in their configured order. The conversation continues with whichever provider succeeds.
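
The failover order can be sketched as a loop over configured providers. The retryable failure classes follow the doc; the exception types and provider interface are assumptions:

```python
class RateLimitError(Exception): pass
class ServerError(Exception): pass
class AuthError(Exception): pass

RETRYABLE = (RateLimitError, ServerError, AuthError)

def call_with_failover(providers, messages):
    """Try the primary provider first, then each fallback in
    configured order; re-raise the last error if all fail."""
    last_error = None
    for provider in providers:
        try:
            return provider.complete(messages)   # assumed provider API
        except RETRYABLE as exc:
            last_error = exc                     # try the next provider
    raise last_error
```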

Compression

| Trigger | Threshold |
|---|---|
| Preflight (agent-side) | >50% of context window |
| Gateway auto-compression | >85% of context window |
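
The two thresholds reduce to a simple check; the function shape is a sketch, only the 50%/85% values come from the table:

```python
PREFLIGHT_THRESHOLD = 0.50   # agent-side, before calling the model
GATEWAY_THRESHOLD = 0.85     # gateway auto-compression

def needs_compression(history_tokens, context_window, at_gateway=False):
    """Apply the threshold from the table above for the given stage."""
    threshold = GATEWAY_THRESHOLD if at_gateway else PREFLIGHT_THRESHOLD
    return history_tokens > threshold * context_window
```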

Compression procedure:

  1. Flush memory to disk
  2. Summarize middle turns (lossy)
  3. Preserve last N messages intact
  4. Keep tool call / result pairs together
  5. Generate new session lineage ID

After each turn, messages are persisted to session store and memory changes are flushed to files — enabling later resumption.

Callback Surfaces

| Callback | Fires when |
|---|---|
| tool_progress_callback | Before/after each tool execution |
| thinking_callback | Model starts/stops thinking |
| reasoning_callback | Reasoning content returned |
| clarify_callback | clarify tool invoked |
| step_callback | After each agent turn |
| stream_delta_callback | Each streaming token |
| status_callback | State changes |
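
All of these surfaces are optional, which suggests a dispatch helper along these lines (the helper itself is an assumption; the callback names come from the table):

```python
def emit(agent, name, *args):
    """Invoke a callback only if the caller registered it, so every
    surface stays opt-in."""
    callback = getattr(agent, name, None)
    if callback is not None:
        callback(*args)
```

For example, the loop would call emit(agent, "tool_progress_callback", tool_name, "start") before a tool runs and again with "end" afterward.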

Key Source Files

| File | Role |
|---|---|
| run_agent.py | AIAgent class — main loop |
| prompt_builder.py | System prompt assembly |
| context_engine.py | Pluggable context management |
| context_compressor.py | Lossy summarization |
| prompt_caching.py | Anthropic caching markers |
| auxiliary_client.py | Side-task LLM calls |
| model_tools.py | Tool schema + dispatch |

Comparisons

| Aspect | Hermes Agent | OpenClaw | pi-mono |
|---|---|---|---|
| Tool concurrency | ThreadPoolExecutor | | |
| Compression trigger | 50% / 85% | | |
| Budget tracking | parent + child caps | | |
| Interruptible calls | background thread | | |
| Message format | OpenAI-compat | OpenAI-compat | OpenAI-compat |

See Agent Loop (OpenClaw) and pi-mono Agent Loop for the other implementations in this wiki.