Compaction: OpenClaw vs Claude Code, and Prompt Cache Impact

OpenClaw Compaction

OpenClaw has a single-level compaction mechanism — LLM summarization. When a session approaches the model's context limit or receives an overflow API error (request_too_large, context_length_exceeded, etc.), it:

  1. Optionally runs a silent memory flush (agents write critical info to persistent storage before the boundary)
  2. Sends older conversation turns to the model for summarization
  3. Writes the condensed summary back to the .jsonl session transcript (durable)
  4. Preserves tool call/result pairs through the summary (tool-pair invariant maintained)

Complementary mechanism — Session Pruning: Before each LLM call, OpenClaw separately trims aged tool outputs in-memory (soft-trim → ellipsis → hard-clear), without touching the transcript. This reduces the pressure that would otherwise trigger compaction earlier.

Configuration: model for summarization, identifierPolicy (strict/off/custom), notifyUser flag, plugin API registerCompactionProvider().


Claude Code Compaction

Claude Code runs a three-level hierarchy, cheapest-first, at the top of every query loop turn before the API call:

shouldAutoCompact?
  └─ yes → trySessionMemoryCompaction()
               └─ success → done
               └─ null   → Full LLM Compaction
  └─ no  → microcompactMessages()
               └─ time-based trigger? → Time-Based Microcompact
               └─ cached MC enabled? → Cached Microcompact (cache_edits API)
               └─ otherwise → no-op

Level 1 — Microcompaction: Prunes tool results only, no summarization.

Level 2 — Session Memory Compaction (experimental): Reuses the continuously-written session memory file as the summary — no API call needed. Keep boundary: ≥10K tokens / 5 messages, ≤40K tokens.

Level 3 — Full LLM Compaction: Sends the conversation for structured summarization into 9 sections (intent, concepts, files+snippets, errors, problem-solving, all user messages verbatim, pending tasks, current work, next step). Strips the <analysis> scratchpad; only <summary> enters context.

Auto-compact threshold: effective context window (model limit − 20K) − 13K tokens.
Circuit breaker: stops after 3 consecutive failures (MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES).


Side-by-Side Comparison

Dimension OpenClaw Claude Code
Levels 1 (LLM summarization only) 3 (micro → session-memory → LLM)
Lightweight pruning Session Pruning (separate, per-request) Microcompaction (integrated in hierarchy)
Avoids API call No Yes — Level 2 reuses session memory file
Trigger Context limit / overflow error Fixed token threshold, checked every turn
Circuit breaker Not documented Yes (3 failures)
Partial compaction Not documented Yes (from / up_to pivot)
Post-compact restore Not documented Yes — files, skills, plan, MCP tools re-injected
Config surface openclaw.json (model, identifierPolicy, notifyUser) Env vars + GrowthBook feature flags
Manual command /compact [guidance] /compact [guidance]

Prompt Cache Impact of Compaction

Every compaction that replaces the message array busts the server-side KV prompt cache — the cache key is system prompt + tools + model + messages prefix + thinking config, so any prefix change forces cold re-creation.

OpenClaw's docs do not describe a cache-preservation strategy for compaction.

Claude Code addresses this with six dedicated layers:

Layer Mechanism What it protects
1 — Cache-sharing fork Summarization request sent through a forked agent carrying the parent's CacheSafeParams; fork hits the parent's warm cache. skipCacheWrite: true shifts cache_control to second-to-last to avoid writing the fork's turn into the KVCC. Full LLM compaction API call
2 — Single cache_control marker Exactly one marker per request. Two markers would unnecessarily protect stale KV pages. Every API call
3 — TTL stability latch 5-min vs 1-h TTL locked at session start; never re-read mid-session. Mixing TTLs mid-session busts ~20K tokens. Entire session
4 — Cached microcompaction Deletes tool results via cache_edits blocks server-side — content unchanged, prefix identical → cache hit. Old deletion blocks pinned and re-sent every turn. Frequent tool-result pruning
5 — Post-compaction baseline reset notifyCompaction() nulls prevCacheReadTokens; notifyCacheDeletion() sets cacheDeletionsPending. Prevents break-detection from false-alarming on expected token drops. All compaction types
6 — Break detection Pre/post-call snapshot of the full cache key (system prompt hash, tool schema hash, cache_control hash, model, beta headers, etc.). Classifies drops as noise / expected / server eviction / explained break. Writes diff to temp file on genuine breaks. Ongoing observability

Net effect by compaction type:

Type Cache outcome
Cached microcompact No impact — server-side deletion only
Time-based microcompact Cache already cold; content mutation acceptable
Session memory compaction Busts cache; baseline reset prevents false break alerts
Full LLM compaction Fork reuses parent cache; only post-compact tail is cold
Partial from Cache-preserving (prefix unchanged)
Partial up_to Cache-invalidating; baseline reset applied

See Also