How Claude Code Uses Prompt Caching During Compaction

Source: raw/docs/claude-code-compaction-prompt-cache.md

Summary

Every compaction event risks wiping the server-side KV prompt cache. Claude Code applies six engineering layers to minimize this cost.

Layer 1 — Cache-sharing fork (compact.ts / tengu_compact_cache_prefix): The summarization API call goes through a forked agent that carries the parent's CacheSafeParams (system prompt + tools + model + messages prefix + thinking config). The fork hits the parent's warm cache instead of paying cold cache_creation tokens.
skipCacheWrite: true shifts the cache_control marker to the second-to-last message, so the fork's own summarization turn is never written into the KVCC. Fallback: direct streaming call without cache sharing.

Layer 2 — Exactly one cache_control marker per request (claude.ts / addCacheBreakpoints()): Only one marker per API call (last or second-to-last user/assistant message). Two markers would protect second-to-last KV pages unnecessarily, preventing immediate freeing of local-attention pages.

Layer 3 — Cache TTL stability (claude.ts / should1hCacheTTL() / tengu_prompt_cache_1h_config): TTL (5 min vs 1 h) is latched at session start into bootstrap state and never re-read, even if GrowthBook updates mid-session. Mixing TTLs mid-session busts ~20K tokens. 1h granted to Anthropic internal users and Claude.ai subscribers not on overage.

Layer 4 — Cached Microcompaction (microCompact.ts / CACHED_MICROCOMPACT): Uses cache_edits deletion blocks to remove old tool results server-side without touching message content. Previously-sent blocks are pinned and re-sent on every subsequent turn (server requires this to apply edits on top of cached prefix). Time-based fallback mutates content directly (cache is cold anyway) and calls resetMicrocompactState() to discard stale pinned-edit state.

Layer 5 — Post-compaction baseline reset (promptCacheBreakDetection.ts / notifyCompaction()): After any compaction, prevCacheReadTokens = null resets the break-detection baseline. Without this, the first post-compact response's token drop would be misread as a cache break. notifyCacheDeletion() similarly suppresses expected drops from cached microcompact deletions.

Layer 6 — Cache Break Detection (promptCacheBreakDetection.ts): Two-phase per API call:

Cache impact by compaction type

Compaction type Cache impact Mitigation
Cached microcompact None cache_edits API + pinned re-sent blocks
Time-based microcompact Cold (long idle) Direct content mutation + resetMicrocompactState()
Session memory compaction Replaces message array notifyCompaction() resets baseline
Full LLM compaction Replaces message array Fork reuses parent cached prefix + notifyCompaction()
Partial compaction (from) Prefix preserved No bust for kept portion
Partial compaction (up_to) Cache invalidated notifyCompaction() resets baseline

See Also