Compaction: OpenClaw vs Claude Code, and Prompt Cache Impact

#compaction #openclaw #claude-code #prompt-cache #context-management

OpenClaw Compaction

OpenClaw has a single-level compaction mechanism — LLM summarization. When a session approaches the model's context limit or receives an overflow API error (request_too_large, context_length_exceeded, etc.), it:

Optionally runs a silent memory flush (agents write critical info to persistent storage before the boundary)
Sends older conversation turns to the model for summarization
Writes the condensed summary back to the .jsonl session transcript (durable)
Preserves tool call/result pairs through the summary (tool-pair invariant maintained)

Complementary mechanism — Session Pruning: Before each LLM call, OpenClaw separately trims aged tool outputs in-memory (soft-trim → ellipsis → hard-clear), without touching the transcript. This reduces the pressure that would otherwise trigger compaction earlier.

Configuration: model for summarization, identifierPolicy (strict/off/custom), notifyUser flag, plugin API registerCompactionProvider().

Claude Code Compaction

Claude Code runs a three-level hierarchy, cheapest-first, at the top of every query loop turn before the API call:

shouldAutoCompact?
  └─ yes → trySessionMemoryCompaction()
               └─ success → done
               └─ null   → Full LLM Compaction
  └─ no  → microcompactMessages()
               └─ time-based trigger? → Time-Based Microcompact
               └─ cached MC enabled? → Cached Microcompact (cache_edits API)
               └─ otherwise → no-op

Level 1 — Microcompaction: Prunes tool results only, no summarization.

Cached: deletes results server-side via the Cache Editing API — message content unchanged, cache prefix intact.
Time-based: fires when the cache is certainly cold (long idle); mutates content directly.

Level 2 — Session Memory Compaction (experimental): Reuses the continuously-written session memory file as the summary — no API call needed. Keep boundary: ≥10K tokens / 5 messages, ≤40K tokens.

Level 3 — Full LLM Compaction: Sends the conversation for structured summarization into 9 sections (intent, concepts, files+snippets, errors, problem-solving, all user messages verbatim, pending tasks, current work, next step). Strips the <analysis> scratchpad; only <summary> enters context.

Auto-compact threshold: effective context window (model limit − 20K) − 13K tokens.
Circuit breaker: stops after 3 consecutive failures (MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES).

Side-by-Side Comparison

Dimension	OpenClaw	Claude Code
Levels	1 (LLM summarization only)	3 (micro → session-memory → LLM)
Lightweight pruning	Session Pruning (separate, per-request)	Microcompaction (integrated in hierarchy)
Avoids API call	No	Yes — Level 2 reuses session memory file
Trigger	Context limit / overflow error	Fixed token threshold, checked every turn
Circuit breaker	Not documented	Yes (3 failures)
Partial compaction	Not documented	Yes (`from` / `up_to` pivot)
Post-compact restore	Not documented	Yes — files, skills, plan, MCP tools re-injected
Config surface	`openclaw.json` (model, identifierPolicy, notifyUser)	Env vars + GrowthBook feature flags
Manual command	`/compact [guidance]`	`/compact [guidance]`

Prompt Cache Impact of Compaction

Every compaction that replaces the message array busts the server-side KV prompt cache — the cache key is system prompt + tools + model + messages prefix + thinking config, so any prefix change forces cold re-creation.

OpenClaw's docs do not describe a cache-preservation strategy for compaction.

Claude Code addresses this with six dedicated layers:

Layer	Mechanism	What it protects
1 — Cache-sharing fork	Summarization request sent through a forked agent carrying the parent's `CacheSafeParams`; fork hits the parent's warm cache. `skipCacheWrite: true` shifts `cache_control` to second-to-last to avoid writing the fork's turn into the KVCC.	Full LLM compaction API call
2 — Single `cache_control` marker	Exactly one marker per request. Two markers would unnecessarily protect stale KV pages.	Every API call
3 — TTL stability latch	5-min vs 1-h TTL locked at session start; never re-read mid-session. Mixing TTLs mid-session busts ~20K tokens.	Entire session
4 — Cached microcompaction	Deletes tool results via `cache_edits` blocks server-side — content unchanged, prefix identical → cache hit. Old deletion blocks pinned and re-sent every turn.	Frequent tool-result pruning
5 — Post-compaction baseline reset	`notifyCompaction()` nulls `prevCacheReadTokens`; `notifyCacheDeletion()` sets `cacheDeletionsPending`. Prevents break-detection from false-alarming on expected token drops.	All compaction types
6 — Break detection	Pre/post-call snapshot of the full cache key (system prompt hash, tool schema hash, `cache_control` hash, model, beta headers, etc.). Classifies drops as noise / expected / server eviction / explained break. Writes diff to temp file on genuine breaks.	Ongoing observability

Net effect by compaction type:

Type	Cache outcome
Cached microcompact	No impact — server-side deletion only
Time-based microcompact	Cache already cold; content mutation acceptable
Session memory compaction	Busts cache; baseline reset prevents false break alerts
Full LLM compaction	Fork reuses parent cache; only post-compact tail is cold
Partial `from`	Cache-preserving (prefix unchanged)
Partial `up_to`	Cache-invalidating; baseline reset applied