Claude Code Compaction Policy

#claude-code #compaction #context-management

Source: raw/docs/claude-code-compaction-policy.md

Summary

Claude Code uses a three-level compaction hierarchy — cheapest first — to prevent conversation history from exceeding the model's context window.

Level 1: Microcompaction (src/services/compact/microCompact.ts) — prunes individual tool results without summarizing. Two sub-modes:

Cached microcompaction (CACHED_MICROCOMPACT feature flag): uses the Cache Editing API to delete old tool results server-side, which preserves the prompt cache prefix. Runs only on the main thread and applies to Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, and Write results.
Time-based microcompaction: fires when the gap since the last assistant message exceeds a threshold, since the cache is cold by then anyway. Clears all but the N most recent compactable tool results directly in the message content.
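
The time-based mode can be sketched roughly as follows. This is an illustration only, not the code in microCompact.ts: the `ToolResult` shape, the function name, the placeholder text, and the parameter names are all hypothetical, and the sketch assumes the compactable tool set is the same one listed for the cached mode.

```typescript
// Hypothetical shape; the real message types in microCompact.ts may differ.
interface ToolResult {
  toolName: string;
  content: string;
}

// Assumption: the compactable set matches the tools listed for cached mode.
const COMPACTABLE_TOOLS = new Set([
  "Read", "Bash", "Grep", "Glob", "WebSearch", "WebFetch", "Edit", "Write",
]);

// Assumed placeholder; the actual replacement text is not given in the source.
const CLEARED_PLACEHOLDER = "[tool result cleared]";

function clearOldToolResults(
  results: ToolResult[],
  msSinceLastAssistant: number,
  thresholdMs: number,
  keepRecent: number,
): ToolResult[] {
  // Below the threshold the cache is presumed warm, so nothing is touched.
  if (msSinceLastAssistant <= thresholdMs) return results;

  // Indices of compactable results, oldest first.
  const compactable = results
    .map((r, i) => (COMPACTABLE_TOOLS.has(r.toolName) ? i : -1))
    .filter((i) => i >= 0);

  // Everything except the last `keepRecent` compactable results is cleared.
  const toClear = new Set(
    compactable.slice(0, Math.max(0, compactable.length - keepRecent)),
  );

  return results.map((r, i) =>
    toClear.has(i) ? { ...r, content: CLEARED_PLACEHOLDER } : r,
  );
}
```

Results from tools outside the compactable set pass through unchanged, and the input array is returned as-is when the trigger does not fire.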

Level 2: Session Memory Compaction (src/services/compact/sessionMemoryCompact.ts, experimental). Avoids an API call by reusing the continuously written session memory file as the summary. Keeps recent messages verbatim, with minimums of 10,000 tokens and 5 text blocks and a maximum of 40,000 tokens. Falls back to Level 3 if the session memory is missing or empty, or if the post-compaction token count is still too large.

Level 3: Full LLM Compaction (src/services/compact/compact.ts) — sends conversation to model for structured summarization. 9-section output (intent, concepts, files+snippets, errors, problem-solving, all user messages verbatim, pending tasks, current work, next step). Strips <analysis> scratchpad; only <summary> enters context.
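
The scratchpad stripping described above might look like the sketch below. `extractSummary` is a hypothetical helper name, and the regex-based approach is an assumption; compact.ts may parse the response differently.

```typescript
// Hypothetical helper: keeps only the <summary> body from a model response.
function extractSummary(response: string): string | null {
  // Drop the scratchpad first so a stray "<summary>" inside it cannot match.
  const withoutAnalysis = response.replace(
    /<analysis>[\s\S]*?<\/analysis>/g,
    "",
  );
  const match = withoutAnalysis.match(/<summary>([\s\S]*?)<\/summary>/);
  // Return null when the response has no summary block at all.
  return match && match[1] !== undefined ? match[1].trim() : null;
}
```

Returning `null` for a missing summary lets the caller treat a malformed response as a compaction failure rather than injecting empty context.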

Token thresholds (autoCompact.ts)

Constant	Value	Meaning
`AUTOCOMPACT_BUFFER_TOKENS`	13,000	Gap below effective window where auto-compact fires
`WARNING_THRESHOLD_BUFFER_TOKENS`	20,000	Orange UI warning
`MAX_OUTPUT_TOKENS_FOR_SUMMARY`	20,000	Reserved for summary output
`MANUAL_COMPACT_BUFFER_TOKENS`	3,000	Blocking limit for `/compact`

Effective context window = model_context_window − 20,000. Auto-compact fires at effective window − 13,000.
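
As a worked example of the arithmetic above, the sketch below computes both values. The constant values come from the table; the function names are hypothetical, the 200,000-token window is an example input only, and the sketch assumes the 20,000 subtracted for the effective window is the summary output reservation.

```typescript
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000;
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;

// Assumption: the 20,000 subtracted here is the summary output reservation.
function effectiveContextWindow(modelContextWindow: number): number {
  return modelContextWindow - MAX_OUTPUT_TOKENS_FOR_SUMMARY;
}

// Token count at which auto-compact fires.
function autoCompactThreshold(modelContextWindow: number): number {
  return effectiveContextWindow(modelContextWindow) - AUTOCOMPACT_BUFFER_TOKENS;
}

// Example input only: a hypothetical 200,000-token model window.
const exampleWindow = 200_000;
console.log(effectiveContextWindow(exampleWindow)); // 180000
console.log(autoCompactThreshold(exampleWindow)); // 167000
```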

Triggers

Auto-compact: top of each query loop turn, before API call
Manual /compact slash command
Reactive: after prompt_too_long API error

Disable conditions

DISABLE_COMPACT / DISABLE_AUTO_COMPACT env vars
autoCompactEnabled: false
Query source is session_memory or compact (recursion guard)
Context Collapse mode active
Reactive-only experiment flag

Circuit breaker

MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3 — stops retrying after 3 consecutive failures.
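
A minimal sketch of such a breaker, assuming the counter resets on any success (implied by "consecutive" but not stated in the source). The class and method names are hypothetical.

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

// Hypothetical wrapper; the real state may live elsewhere in autoCompact.ts.
class AutoCompactBreaker {
  private consecutiveFailures = 0;

  // True while fewer than 3 failures have occurred in a row.
  canAttempt(): boolean {
    return this.consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
  }

  // Assumption: a success resets the streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }
}
```

A caller would check `canAttempt()` before each auto-compact and record the outcome afterwards, so a persistently failing compaction stops consuming API calls.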

Post-compaction context restoration

Re-injected as AttachmentMessages: up to 5 recently-read files (50K token budget, 5K/file), async agent status, current plan file, plan mode instructions, invoked skills (25K budget, 5K/skill, MRU order), deferred tool schemas, agent listings, MCP tool instructions, session start hooks.
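
The file and skill budgets above can be read as one selection rule with different limits. The sketch below is one possible interpretation, not the actual implementation: it assumes items over the per-item cap are truncated rather than skipped, and that selection stops once the total budget would be exceeded. All type and function names are hypothetical.

```typescript
interface Candidate { name: string; tokens: number; }
interface Limits { maxItems: number; totalBudget: number; perItemCap: number; }
interface Selected { name: string; tokens: number; truncated: boolean; }

// Candidates must be ordered most recent first (MRU order).
function selectWithinBudget(mostRecentFirst: Candidate[], limits: Limits): Selected[] {
  const selected: Selected[] = [];
  let used = 0;
  for (const item of mostRecentFirst) {
    if (selected.length >= limits.maxItems) break;
    // Assumption: oversized items are truncated to the cap, not dropped.
    const tokens = Math.min(item.tokens, limits.perItemCap);
    if (used + tokens > limits.totalBudget) break;
    selected.push({ name: item.name, tokens, truncated: tokens < item.tokens });
    used += tokens;
  }
  return selected;
}

// Limits taken from the text above.
const FILE_LIMITS: Limits = { maxItems: 5, totalBudget: 50_000, perItemCap: 5_000 };
// No item-count limit is stated for skills, so none is applied here.
const SKILL_LIMITS: Limits = { maxItems: Infinity, totalBudget: 25_000, perItemCap: 5_000 };
```

With these numbers, five files at 5K each use at most 25K of the 50K file budget, so under this reading the file count is the binding limit for files and the total budget is the binding limit for skills.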

Partial compaction

from direction: summarizes messages after pivot, keeps prefix (cache-preserving)
up_to direction: summarizes messages before pivot, keeps tail (invalidates cache; strips old compact boundaries from tail)
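
The two directions amount to splitting the history at the pivot and choosing which half to summarize. The sketch below uses hypothetical names and assumes the pivot is the index of the first message in the "after" half; the source does not say which side the pivot message itself falls on. It does not model the stripping of old compact boundaries.

```typescript
type Direction = "from" | "up_to";

interface Split<T> {
  kept: T[];
  toSummarize: T[];
}

// Assumption: `pivot` is the index of the first message in the "after" half.
function splitAtPivot<T>(messages: T[], pivot: number, direction: Direction): Split<T> {
  const before = messages.slice(0, pivot);
  const after = messages.slice(pivot);
  return direction === "from"
    ? { kept: before, toSummarize: after } // prefix untouched, cache preserved
    : { kept: after, toSummarize: before }; // prefix replaced, cache invalidated
}
```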

Execution order per turn

shouldAutoCompact?
  └─ yes → trySessionMemoryCompaction()
               └─ success → done
               └─ null   → compactConversation() [LLM]
  └─ no  → microcompactMessages()
               └─ time-based trigger? → time-based MC
               └─ cached MC enabled? → cachedMicrocompactPath
               └─ otherwise → no-op
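
The decision tree above can be restated as a pure function. This is a sketch of the branching only: `TurnState`, `Outcome`, and `chooseCompaction` are hypothetical names, and the boolean inputs stand in for the real checks (`shouldAutoCompact`, the result of `trySessionMemoryCompaction()`, and so on).

```typescript
type Outcome =
  | "session-memory-compaction"
  | "llm-compaction"
  | "time-based-microcompaction"
  | "cached-microcompaction"
  | "no-op";

// Hypothetical snapshot of the checks made at the top of a turn.
interface TurnState {
  shouldAutoCompact: boolean;
  sessionMemorySucceeded: boolean; // false models a null result
  timeBasedTriggered: boolean;
  cachedMicrocompactEnabled: boolean;
}

function chooseCompaction(state: TurnState): Outcome {
  if (state.shouldAutoCompact) {
    // Cheaper session memory path first, LLM summarization as the fallback.
    return state.sessionMemorySucceeded
      ? "session-memory-compaction"
      : "llm-compaction";
  }
  // Below the auto-compact threshold only microcompaction is considered.
  if (state.timeBasedTriggered) return "time-based-microcompaction";
  if (state.cachedMicrocompactEnabled) return "cached-microcompaction";
  return "no-op";
}
```

Note that the time-based check takes precedence over the cached path, matching the order of the branches in the tree.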