How Claude Code Uses Prompt Caching During Compaction
Source: raw/docs/claude-code-compaction-prompt-cache.md
Summary
Every compaction event risks wiping the server-side KV prompt cache. Claude Code applies six engineering layers to minimize this cost.
Layer 1 — Cache-sharing fork (compact.ts / tengu_compact_cache_prefix): The summarization API call goes through a forked agent that carries the parent's CacheSafeParams (system prompt + tools + model + messages prefix + thinking config). The fork hits the parent's warm cache instead of paying cold cache_creation tokens.
skipCacheWrite: true shifts the cache_control marker to the second-to-last message, so the fork's own summarization turn is never written into the KVCC. Fallback: direct streaming call without cache sharing.
Layer 2 — Exactly one cache_control marker per request (claude.ts / addCacheBreakpoints()): Only one marker per API call (last or second-to-last user/assistant message). Two markers would protect second-to-last KV pages unnecessarily, preventing immediate freeing of local-attention pages.
Layer 3 — Cache TTL stability (claude.ts / should1hCacheTTL() / tengu_prompt_cache_1h_config): TTL (5 min vs 1 h) is latched at session start into bootstrap state and never re-read, even if GrowthBook updates mid-session. Mixing TTLs mid-session busts ~20K tokens. 1h granted to Anthropic internal users and Claude.ai subscribers not on overage.
Layer 4 — Cached Microcompaction (microCompact.ts / CACHED_MICROCOMPACT): Uses cache_edits deletion blocks to remove old tool results server-side without touching message content. Previously-sent blocks are pinned and re-sent on every subsequent turn (server requires this to apply edits on top of cached prefix). Time-based fallback mutates content directly (cache is cold anyway) and calls resetMicrocompactState() to discard stale pinned-edit state.
Layer 5 — Post-compaction baseline reset (promptCacheBreakDetection.ts / notifyCompaction()): After any compaction, prevCacheReadTokens = null resets the break-detection baseline. Without this, the first post-compact response's token drop would be misread as a cache break. notifyCacheDeletion() similarly suppresses expected drops from cached microcompact deletions.
Layer 6 — Cache Break Detection (promptCacheBreakDetection.ts): Two-phase per API call:
- Pre-call
recordPromptState(): snapshots system prompt hash, tool schema hash,cache_controlhash, per-tool hashes, model, fast mode, beta headers, effort value, extra body params, global cache strategy. - Post-call
checkResponseForCacheBreak(): comparescache_read_input_tokensto previous turn. Drops <5% or <2,000 tokens ignored as noise. Larger drops with no pending changes → server eviction/TTL. Larger drops with pending changes → explained break. Writes before/after diff to temp file for debugging.
Cache impact by compaction type
| Compaction type | Cache impact | Mitigation |
|---|---|---|
| Cached microcompact | None | cache_edits API + pinned re-sent blocks |
| Time-based microcompact | Cold (long idle) | Direct content mutation + resetMicrocompactState() |
| Session memory compaction | Replaces message array | notifyCompaction() resets baseline |
| Full LLM compaction | Replaces message array | Fork reuses parent cached prefix + notifyCompaction() |
Partial compaction (from) |
Prefix preserved | No bust for kept portion |
Partial compaction (up_to) |
Cache invalidated | notifyCompaction() resets baseline |