Claude Code Compaction Policy
Source: raw/docs/claude-code-compaction-policy.md
Summary
Claude Code uses a three-level compaction hierarchy — cheapest first — to prevent conversation history from exceeding the model's context window.
Level 1: Microcompaction (src/services/compact/microCompact.ts) — prunes individual tool results without summarizing. Two sub-modes:
- Cached microcompaction (CACHED_MICROCOMPACT feature flag): uses the Cache Editing API to delete old tool results server-side, preserving the prompt cache prefix. Only on the main thread; applies to Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write results.
- Time-based microcompaction: fires when the gap since the last assistant message exceeds a threshold (the cache is cold anyway). Clears all but the N most recent compactable tool results directly in the message content.
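The "clear all but the N most recent" rule of time-based microcompaction can be sketched as below. This is a minimal illustration, not the code in microCompact.ts: the ToolResult shape, the placeholder text, and the function name are hypothetical, and it assumes the compactable set is the same tool list given for the cached mode.

```typescript
// Hypothetical message shape; the real types in microCompact.ts are not shown in the source.
interface ToolResult {
  toolName: string;
  content: string;
}

// Tool names taken from the list above (assumed to define "compactable").
const COMPACTABLE_TOOLS = new Set([
  "Read", "Bash", "Grep", "Glob", "WebSearch", "WebFetch", "Edit", "Write",
]);

// Assumed placeholder text; the real replacement string is not given in the source.
const CLEARED_PLACEHOLDER = "[tool result cleared]";

// Clears every compactable tool result except the `keepRecent` most recent ones.
function clearOldToolResults(results: ToolResult[], keepRecent: number): ToolResult[] {
  // Indices of compactable results, in conversation order.
  const compactableIdx = results
    .map((r, i) => (COMPACTABLE_TOOLS.has(r.toolName) ? i : -1))
    .filter((i) => i >= 0);
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const keep = new Set(keepRecent > 0 ? compactableIdx.slice(-keepRecent) : []);
  return results.map((r, i) =>
    COMPACTABLE_TOOLS.has(r.toolName) && !keep.has(i)
      ? { ...r, content: CLEARED_PLACEHOLDER }
      : r,
  );
}
```

Results from tools outside the compactable set are left untouched, and the input array is not mutated.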
Level 2: Session Memory Compaction (src/services/compact/sessionMemoryCompact.ts, experimental) — avoids an API call by reusing the continuously-written session memory file as the summary. Keeps recent messages verbatim (≥10,000 tokens / 5 text blocks; ≤40,000 tokens). Falls back to Level 3 if memory is missing, empty, or post-compact count still too large.
Level 3: Full LLM Compaction (src/services/compact/compact.ts) — sends conversation to model for structured summarization. 9-section output (intent, concepts, files+snippets, errors, problem-solving, all user messages verbatim, pending tasks, current work, next step). Strips <analysis> scratchpad; only <summary> enters context.
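The scratchpad-stripping step can be illustrated with a small helper. This is a sketch under the assumption that the model output wraps its reasoning in a literal <analysis> element and the result in a literal <summary> element; the function name extractSummary is hypothetical and the real parsing in compact.ts may differ.

```typescript
// Returns the text inside <summary>...</summary>, discarding any <analysis> block.
// Returns null when no summary element is present.
function extractSummary(modelOutput: string): string | null {
  // Drop the scratchpad first so nothing inside it can leak into the result.
  const withoutAnalysis = modelOutput.replace(/<analysis>[\s\S]*?<\/analysis>/g, "");
  const match = withoutAnalysis.match(/<summary>([\s\S]*?)<\/summary>/);
  return match ? match[1].trim() : null;
}
```

A caller would treat a null result as a failed compaction rather than inserting the raw output into context.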
Token thresholds (autoCompact.ts)
| Constant | Value | Meaning |
|---|---|---|
| AUTOCOMPACT_BUFFER_TOKENS | 13,000 | Gap below effective window where auto-compact fires |
| WARNING_THRESHOLD_BUFFER_TOKENS | 20,000 | Orange UI warning |
| MAX_OUTPUT_TOKENS_FOR_SUMMARY | 20,000 | Reserved for summary output |
| MANUAL_COMPACT_BUFFER_TOKENS | 3,000 | Blocking limit for /compact |
Effective context window = model_context_window − 20,000. Auto-compact fires at effective window − 13,000.
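The arithmetic above can be written out as follows. This is a sketch, not the code in autoCompact.ts: the function names are hypothetical, and it assumes the 20,000 subtracted from the model window is the summary-output reservation (MAX_OUTPUT_TOKENS_FOR_SUMMARY).

```typescript
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000;

// Effective window: the model's context window minus the space reserved for summary output
// (assumption: this is the 20,000 in the formula above).
function effectiveContextWindow(modelContextWindow: number): number {
  return modelContextWindow - MAX_OUTPUT_TOKENS_FOR_SUMMARY;
}

// Token count at which auto-compact fires.
function autoCompactThreshold(modelContextWindow: number): number {
  return effectiveContextWindow(modelContextWindow) - AUTOCOMPACT_BUFFER_TOKENS;
}
```

For an illustrative 200,000-token model window (an assumed figure), this gives an effective window of 180,000 tokens and an auto-compact threshold of 167,000 tokens.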
Triggers
- Auto-compact: top of each query loop turn, before API call
- Manual: /compact slash command
- Reactive: after prompt_too_long API error
Disable conditions
DISABLE_COMPACT / DISABLE_AUTO_COMPACT env vars, autoCompactEnabled: false, query source is session_memory or compact (recursion guard), Context Collapse mode active, reactive-only experiment flag.
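The disable conditions amount to a single predicate. The sketch below is illustrative only: the AutoCompactInputs shape and the function name are hypothetical, and it assumes any non-empty value of an environment variable counts as "set", which the source does not specify.

```typescript
// Hypothetical bundle of the inputs named in the list above.
interface AutoCompactInputs {
  env: Record<string, string | undefined>;
  autoCompactEnabled: boolean;
  querySource: string;
  contextCollapseActive: boolean;
  reactiveOnlyExperiment: boolean;
}

function isAutoCompactDisabled(inputs: AutoCompactInputs): boolean {
  // Assumption: any non-empty value enables the flag.
  const envSet = (name: string): boolean => Boolean(inputs.env[name]);
  return (
    envSet("DISABLE_COMPACT") ||
    envSet("DISABLE_AUTO_COMPACT") ||
    !inputs.autoCompactEnabled ||
    // Recursion guard: compaction-related queries must not trigger compaction themselves.
    inputs.querySource === "session_memory" ||
    inputs.querySource === "compact" ||
    inputs.contextCollapseActive ||
    inputs.reactiveOnlyExperiment
  );
}
```

Passing the environment in as a plain record, rather than reading process.env inside the function, keeps the check easy to test.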
Circuit breaker
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3 — stops retrying after 3 consecutive failures.
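A circuit breaker of this kind can be sketched as a small counter. The class and method names below are hypothetical, and resetting the counter on success is an assumption inferred from the word "consecutive" rather than something the source states.

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

class AutoCompactBreaker {
  private consecutiveFailures = 0;

  // True while fewer than the maximum number of consecutive failures have occurred.
  canAttempt(): boolean {
    return this.consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
  }

  // Assumption: a successful compaction resets the streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }
}
```

A caller would check canAttempt() before each auto-compact attempt and skip the attempt once the breaker has tripped.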
Post-compaction context restoration
Re-injected as AttachmentMessages: up to 5 recently-read files (50K token budget, 5K/file), async agent status, current plan file, plan mode instructions, invoked skills (25K budget, 5K/skill, MRU order), deferred tool schemas, agent listings, MCP tool instructions, session start hooks.
Partial compaction
- from direction: summarizes messages after pivot, keeps prefix (cache-preserving)
- up_to direction: summarizes messages before pivot, keeps tail (invalidates cache; strips old compact boundaries from tail)
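The two directions differ only in which side of the pivot is summarized. The sketch below shows the split under the assumption that the pivot is an index and that the message at the pivot belongs to the "after" side; the names are hypothetical, and stripping old compact boundaries from the tail is not modeled.

```typescript
type Direction = "from" | "up_to";

interface PartialSplit<T> {
  toSummarize: T[];
  toKeep: T[];
}

// Splits a message list at `pivot` (assumed to be an index; the pivot message goes to the "after" side).
function splitAtPivot<T>(messages: T[], pivot: number, direction: Direction): PartialSplit<T> {
  const before = messages.slice(0, pivot);
  const after = messages.slice(pivot);
  return direction === "from"
    ? { toSummarize: after, toKeep: before } // prefix untouched, so the prompt cache stays valid
    : { toSummarize: before, toKeep: after }; // prefix replaced by a summary, so the cache is invalidated
}
```

With messages ["a", "b", "c", "d"] and pivot 2, the from direction summarizes ["c", "d"] and keeps ["a", "b"]; up_to does the reverse.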
Execution order per turn
shouldAutoCompact?
├─ yes → trySessionMemoryCompaction()
│   ├─ success → done
│   └─ null → compactConversation() [LLM]
└─ no → microcompactMessages()
    ├─ time-based trigger? → time-based MC
    ├─ cached MC enabled? → cachedMicrocompactPath
    └─ otherwise → no-op
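The decision tree above can be read as one pure function that picks a path per turn. This is a sketch for illustration: the TurnState fields and the returned labels are hypothetical stand-ins for the real calls, and the function only selects a path without performing any compaction.

```typescript
type CompactionPath = "session_memory" | "full_llm" | "time_based_mc" | "cached_mc" | "none";

// Hypothetical snapshot of the conditions checked in the tree above.
interface TurnState {
  shouldAutoCompact: boolean;
  sessionMemorySucceeded: boolean; // stands in for trySessionMemoryCompaction() returning non-null
  timeBasedTrigger: boolean;
  cachedMcEnabled: boolean;
}

function chooseCompactionPath(state: TurnState): CompactionPath {
  if (state.shouldAutoCompact) {
    // Cheaper session-memory path first; fall back to the full LLM summary.
    return state.sessionMemorySucceeded ? "session_memory" : "full_llm";
  }
  // Below the auto-compact threshold: only microcompaction is considered.
  if (state.timeBasedTrigger) return "time_based_mc";
  if (state.cachedMcEnabled) return "cached_mc";
  return "none";
}
```

The time-based check comes before the cached check, matching the order in the tree: when the cache is already cold there is no prefix worth preserving.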