Compaction: OpenClaw vs Claude Code, and Prompt Cache Impact
OpenClaw Compaction
OpenClaw has a single-level compaction mechanism — LLM summarization. When a session approaches the model's context limit or receives an overflow API error (request_too_large, context_length_exceeded, etc.), it:
- Optionally runs a silent memory flush (agents write critical info to persistent storage before the boundary)
- Sends older conversation turns to the model for summarization
- Writes the condensed summary back to the
.jsonlsession transcript (durable) - Preserves tool call/result pairs through the summary (tool-pair invariant maintained)
Complementary mechanism — Session Pruning: Before each LLM call, OpenClaw separately trims aged tool outputs in-memory (soft-trim → ellipsis → hard-clear), without touching the transcript. This reduces the pressure that would otherwise trigger compaction earlier.
Configuration: model for summarization, identifierPolicy (strict/off/custom), notifyUser flag, plugin API registerCompactionProvider().
Claude Code Compaction
Claude Code runs a three-level hierarchy, cheapest-first, at the top of every query loop turn before the API call:
shouldAutoCompact?
└─ yes → trySessionMemoryCompaction()
└─ success → done
└─ null → Full LLM Compaction
└─ no → microcompactMessages()
└─ time-based trigger? → Time-Based Microcompact
└─ cached MC enabled? → Cached Microcompact (cache_edits API)
└─ otherwise → no-op
Level 1 — Microcompaction: Prunes tool results only, no summarization.
- Cached: deletes results server-side via the Cache Editing API — message content unchanged, cache prefix intact.
- Time-based: fires when the cache is certainly cold (long idle); mutates content directly.
Level 2 — Session Memory Compaction (experimental): Reuses the continuously-written session memory file as the summary — no API call needed. Keep boundary: ≥10K tokens / 5 messages, ≤40K tokens.
Level 3 — Full LLM Compaction: Sends the conversation for structured summarization into 9 sections (intent, concepts, files+snippets, errors, problem-solving, all user messages verbatim, pending tasks, current work, next step). Strips the <analysis> scratchpad; only <summary> enters context.
Auto-compact threshold: effective context window (model limit − 20K) − 13K tokens.
Circuit breaker: stops after 3 consecutive failures (MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES).
Side-by-Side Comparison
| Dimension | OpenClaw | Claude Code |
|---|---|---|
| Levels | 1 (LLM summarization only) | 3 (micro → session-memory → LLM) |
| Lightweight pruning | Session Pruning (separate, per-request) | Microcompaction (integrated in hierarchy) |
| Avoids API call | No | Yes — Level 2 reuses session memory file |
| Trigger | Context limit / overflow error | Fixed token threshold, checked every turn |
| Circuit breaker | Not documented | Yes (3 failures) |
| Partial compaction | Not documented | Yes (from / up_to pivot) |
| Post-compact restore | Not documented | Yes — files, skills, plan, MCP tools re-injected |
| Config surface | openclaw.json (model, identifierPolicy, notifyUser) |
Env vars + GrowthBook feature flags |
| Manual command | /compact [guidance] |
/compact [guidance] |
Prompt Cache Impact of Compaction
Every compaction that replaces the message array busts the server-side KV prompt cache — the cache key is system prompt + tools + model + messages prefix + thinking config, so any prefix change forces cold re-creation.
OpenClaw's docs do not describe a cache-preservation strategy for compaction.
Claude Code addresses this with six dedicated layers:
| Layer | Mechanism | What it protects |
|---|---|---|
| 1 — Cache-sharing fork | Summarization request sent through a forked agent carrying the parent's CacheSafeParams; fork hits the parent's warm cache. skipCacheWrite: true shifts cache_control to second-to-last to avoid writing the fork's turn into the KVCC. |
Full LLM compaction API call |
2 — Single cache_control marker |
Exactly one marker per request. Two markers would unnecessarily protect stale KV pages. | Every API call |
| 3 — TTL stability latch | 5-min vs 1-h TTL locked at session start; never re-read mid-session. Mixing TTLs mid-session busts ~20K tokens. | Entire session |
| 4 — Cached microcompaction | Deletes tool results via cache_edits blocks server-side — content unchanged, prefix identical → cache hit. Old deletion blocks pinned and re-sent every turn. |
Frequent tool-result pruning |
| 5 — Post-compaction baseline reset | notifyCompaction() nulls prevCacheReadTokens; notifyCacheDeletion() sets cacheDeletionsPending. Prevents break-detection from false-alarming on expected token drops. |
All compaction types |
| 6 — Break detection | Pre/post-call snapshot of the full cache key (system prompt hash, tool schema hash, cache_control hash, model, beta headers, etc.). Classifies drops as noise / expected / server eviction / explained break. Writes diff to temp file on genuine breaks. |
Ongoing observability |
Net effect by compaction type:
| Type | Cache outcome |
|---|---|
| Cached microcompact | No impact — server-side deletion only |
| Time-based microcompact | Cache already cold; content mutation acceptable |
| Session memory compaction | Busts cache; baseline reset prevents false break alerts |
| Full LLM compaction | Fork reuses parent cache; only post-compact tail is cold |
Partial from |
Cache-preserving (prefix unchanged) |
Partial up_to |
Cache-invalidating; baseline reset applied |
See Also
- Compaction — OpenClaw compaction detail
- Session Pruning — OpenClaw's per-request pruning counterpart
- Claude Code Compaction — full three-level reference
- Prompt Cache Management (Claude Code) — six-layer cache strategy detail