Claude Code manages its context window through six distinct strategies that work together automatically: token counting, auto-compact, reactive compact, context collapse, snip, and micro-compact. Understanding these mechanisms is the difference between a session that degrades into confusion after 30 minutes and one that stays sharp across hours of complex refactoring. This guide documents exactly how each strategy works, when it triggers, what it preserves, and how you can influence the behavior to get better results from your coding sessions.
If you have used Claude Code for any sustained period, you have experienced the moment where the model seems to forget what you were working on, or where a long tool output disappears from the conversation. That is not a bug. It is context management doing its job — compressing lower-value information to make room for what matters right now. The question is whether you understand the system well enough to work with it rather than against it.
How Claude Code Counts Tokens
Every Claude Code session maintains a running token tally. Rather than counting tokens locally with a tokenizer (which would add latency and complexity), Claude Code reads the usage field from the API response after every model call. The API returns the exact number of input and output tokens consumed, and Claude Code adds these to its running total.
This approach has a practical advantage: it is always accurate. Local tokenizers can drift from the server-side tokenizer, especially after model updates. By reading the authoritative count from the API response, Claude Code avoids the class of bugs where local and server token counts diverge and compaction triggers at the wrong time.
The system reserves approximately 33,000 tokens as a buffer — roughly 16.5% of a 200K context window. This buffer exists because the model needs room to generate a response after the prompt is assembled. If the prompt consumed 100% of the context window, there would be zero tokens available for the response, and the call would fail. The 16.5% reserve ensures there is always room for a substantive response even when the context is nearly full.
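The reserve arithmetic is simple enough to sketch directly. This is a back-of-the-envelope model using the figures quoted in this article (200K window, ~33K buffer); the exact values are internal to Claude Code and may change between releases:

```python
CONTEXT_WINDOW = 200_000   # tokens; 1_000_000 with the 1M window
RESPONSE_BUFFER = 33_000   # reserved so the model can still generate a response

def compaction_threshold(window: int = CONTEXT_WINDOW,
                         buffer: int = RESPONSE_BUFFER) -> int:
    """Tokens the prompt may consume before auto-compact should fire."""
    return window - buffer

threshold = compaction_threshold()
print(threshold)                                   # 167000
print(round(threshold / CONTEXT_WINDOW * 100, 1))  # 83.5
```

At a 1M window the same arithmetic applies; only the window constant changes, which is why the threshold stops being something you hit in a typical session.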
You can check your current token usage at any time with the /cost command. This displays the running token count, the cost incurred so far, and how close you are to the compaction threshold. Use our free token counter tool to estimate token counts for text you plan to paste into a session — useful for deciding whether a large file dump will push you over the compaction threshold.
Strategy 1: Auto-Compact
Auto-compact is the primary context management mechanism. It triggers automatically when the running token count reaches approximately 83.5% of the context window (that is, total window minus the 33K buffer). When triggered, Claude Code takes the entire conversation history and asks the model to produce a compressed summary that preserves the essential information while discarding verbose intermediate steps.
The summary retains:
- What files were discussed and their current state
- What decisions were made and why
- What the current task is and what remains to be done
- Key code snippets or patterns that were established
- Error messages or issues that are still relevant
The summary discards:
- Full tool call outputs that have been superseded by later changes
- Exploratory conversation branches that led nowhere
- Verbose file contents that were read but not modified
- Intermediate debugging steps for issues that have been resolved
Since version 2.0.64, released in February 2026, auto-compact executes instantly. Earlier versions had a noticeable pause during compaction — sometimes several seconds — which interrupted the flow of work. The performance improvement came from optimizing how the summary prompt is constructed and from using a faster model for the summarization step. If you are running an older version of Claude Code, updating to the latest release eliminates the compaction delay entirely.
The practical implication: you do not need to manually manage your context window for most sessions. Auto-compact handles the transition seamlessly, and the model continues working with a compressed but accurate representation of the conversation. Based on our testing, the quality of work after auto-compact is indistinguishable from the quality before compaction for the vast majority of coding tasks.
Strategy 2: Reactive Compact
Reactive compact is the emergency fallback. It triggers when the API returns a context_length_exceeded error — meaning the prompt was too large for the model to process even with the buffer. This can happen when auto-compact did not trigger in time (for example, if a single tool call returned an unexpectedly large result that pushed the context past the limit in one step).
When reactive compact fires, it performs a more aggressive summarization than auto-compact. Where auto-compact tries to preserve nuance and detail, reactive compact prioritizes keeping the session functional at all costs. It discards more context, compresses harder, and produces a shorter summary so that the next API call succeeds.
You should rarely see reactive compact in normal usage. If it triggers frequently, that is a signal that your workflow is generating unusually large tool outputs — perhaps reading very large files, or running commands that produce extensive output. The fix is to be more surgical with your requests: read specific line ranges instead of entire files, pipe command output through head or tail, and avoid pasting multi-thousand-line files directly into the conversation.
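The fallback pattern amounts to a retry loop around the model call. Everything below is a hypothetical stand-in (the exception type, `call_model`, `aggressive_summary`); Claude Code's internals are not public:

```python
class ContextLengthExceeded(Exception):
    """Stand-in for the API's context_length_exceeded error."""

def call_with_fallback(messages, call_model, aggressive_summary):
    """Try the normal call; on overflow, hard-compact the history and retry."""
    try:
        return call_model(messages)
    except ContextLengthExceeded:
        # Emergency path: collapse the whole history into one short,
        # lossy summary so the retry is guaranteed to fit.
        compacted = [aggressive_summary(messages)]
        return call_model(compacted)
```

The design choice to summarize the entire history, rather than trim it message by message, is what makes the retry reliable: one short summary always fits.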
Strategy 3: Context Collapse
Context collapse is a targeted optimization that removes the internal details of tool calls while preserving their outcomes. When Claude Code reads a file, runs a command, or performs a search, the full tool call and its response are stored in the conversation. Over time, these accumulate and consume significant context.
Context collapse works by replacing the verbose tool interaction with a compact summary. For example, a file read that returned 200 lines of code might be collapsed to: “Read src/components/UserProfile.tsx (200 lines) — React component with useState for form state, useEffect for data fetching, renders a form with name/email/avatar fields.” The full file content is removed from context, but the model retains what it learned from reading the file.
This strategy is particularly effective because tool calls are often the largest individual items in the context. A single file read or command execution can consume thousands of tokens. Context collapse reclaims that space while keeping the semantic content — what the tool result meant, what was decided based on it — intact.
The key insight: context collapse keeps what was decided and drops how it was computed. If you ran a grep across the codebase and found that a function is used in 14 files, context collapse retains “function X is used in 14 files” but drops the full list of file paths and matching lines. If you need those details again later, you can re-run the search.
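As an illustration, context collapse is a swap: the verbose tool message is replaced by the conclusion drawn from it. The message shape and `summary` field here are invented for the sketch:

```python
def collapse_tool_result(message: dict) -> dict:
    """Keep the conclusion drawn from a tool result, drop the raw payload."""
    if message.get("role") == "tool" and "summary" in message:
        return {"role": "tool", "content": message["summary"]}
    return message

grep_result = {
    "role": "tool",
    "content": "src/a.ts:12: useX()\nsrc/b.ts:88: useX()",  # full match list
    "summary": "useX is referenced in 14 files",
}
print(collapse_tool_result(grep_result)["content"])
# useX is referenced in 14 files
```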
Strategy 4: Snip
Snip is surgical removal. While the other strategies operate on broad categories of content (all tool calls, the entire conversation history), snip targets specific messages or tool results that are no longer relevant to the current task.
For example, if you spent the first half of a session debugging a CSS issue and then pivoted to implementing a new API endpoint, the CSS debugging context is no longer relevant. Snip can remove those specific messages — the file reads, the style experiments, the browser output descriptions — without touching the API implementation context that is currently active.
Snip operates with precision that the broader compaction strategies cannot match. It does not summarize; it removes. This means there is zero information loss for the retained content and complete information loss for the removed content. The tradeoff is appropriate when the removed content is genuinely irrelevant to the current task direction.
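A minimal sketch of snip as message-level filtering, assuming each message carries a hypothetical topic tag (real Claude Code determines relevance internally rather than from explicit tags):

```python
def snip(messages: list[dict], dead_topic: str) -> list[dict]:
    """Drop every message tagged with a topic that is no longer active."""
    return [m for m in messages if m.get("topic") != dead_topic]

history = [
    {"topic": "css-debug", "content": "read styles.css"},
    {"topic": "api-endpoint", "content": "create /users route"},
    {"topic": "css-debug", "content": "tried flexbox fix"},
]
print(len(snip(history, "css-debug")))  # 1
```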
Strategy 5: Micro-Compact
Micro-compact operates at the level of individual tool results rather than the conversation as a whole. When a specific tool result is large but only a small portion of it is relevant going forward, micro-compact compresses that single result in place.
Consider a scenario where Claude Code runs npm test and the output is 500 lines, but only 3 tests failed. Micro-compact compresses that tool result from 500 lines to something like: “Ran 247 tests. 244 passed. 3 failed: UserAuth.test.ts line 45 (expected 401, got 200), PaymentFlow.test.ts line 112 (timeout after 5000ms), DataExport.test.ts line 78 (undefined is not a function).” The conclusion is preserved. The 244 lines of passing test output are gone.
Micro-compact is particularly valuable for iterative workflows. When you are running tests repeatedly, each run produces hundreds of lines of output but only the failures matter. Without micro-compact, four test runs could consume 2,000 lines of context. With it, the same information occupies perhaps 20 lines — a 100x compression ratio with zero loss of actionable information.
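The test-output example above can be sketched as a filter that keeps the totals and the failure lines and drops the passing noise. The line format is invented for illustration:

```python
def micro_compact_test_output(lines: list[str]) -> str:
    """Keep totals and failure lines; drop per-test pass output."""
    failures = [line for line in lines if line.startswith("FAIL")]
    passed = sum(1 for line in lines if line.startswith("PASS"))
    header = f"Ran {passed + len(failures)} tests. {passed} passed, {len(failures)} failed."
    return "\n".join([header, *failures])

raw = ["PASS auth.test.ts"] * 244 + [
    "FAIL UserAuth.test.ts:45 expected 401, got 200",
    "FAIL PaymentFlow.test.ts:112 timeout after 5000ms",
    "FAIL DataExport.test.ts:78 undefined is not a function",
]
print(micro_compact_test_output(raw).splitlines()[0])
# Ran 247 tests. 244 passed, 3 failed.
```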
The 1M Context Window: What Changes
Since March 2026, Claude Code supports a 1 million token context window when using Claude Opus 4.6 or Sonnet 4.6. This became generally available after a beta period, and it fundamentally changes how context management works in practice.
With a 200K context window, auto-compact triggers after roughly 30–60 minutes of active coding depending on how many files you read and how verbose your tool usage is. With a 1M context window, you can work for hours before hitting the compaction threshold. For many coding sessions, you will never trigger compaction at all.
The 1M window does not eliminate the need for context management — it raises the ceiling. If you are working on a massive refactoring task that touches 50 files, reading each file consumes tokens regardless of the window size. The six strategies still operate; they just trigger less frequently. And when they do trigger, they have more material to work with, which means the compressed summaries tend to be higher quality because there is more context available to inform the summarization.
The practical recommendation: if you are working on tasks that require holding large amounts of code in context simultaneously — cross-file refactoring, large-scale migrations, complex debugging that spans multiple services — the 1M window is a meaningful productivity improvement. For shorter, focused tasks (implementing a single component, fixing a specific bug), the 200K window is usually sufficient and the difference is negligible.
Custom Compaction with /compact
The /compact command gives you manual control over compaction. Running /compact with no arguments triggers an immediate compaction using the default summarization strategy. But the real power is in custom instructions.
Running /compact [instructions] tells the compaction process to prioritize specific information. For example:
- /compact focus on the database migration changes and ignore the CSS work — produces a summary weighted toward the migration context
- /compact preserve all file paths and function signatures discussed — ensures structural information survives compaction
- /compact summarize decisions only, drop all exploration — aggressive compression that keeps conclusions and drops the reasoning process
Custom compaction instructions are particularly useful at natural task boundaries. When you finish one phase of work and are about to start another, running /compact with instructions that emphasize the completed work and de-emphasize exploratory dead ends gives you a clean, focused context for the next phase.
One pattern that experienced Claude Code users employ: before starting a complex task, they run /compact preserve only the project structure and current task description to clear out irrelevant context from earlier in the session. This is especially valuable if you have been switching between multiple tasks in the same session and want to focus the model’s attention on the current one.
CLAUDE.md: Context That Survives Everything
CLAUDE.md is the single most important file for long-term context management in Claude Code. It is a markdown file at the root of your project that Claude Code reads at the start of every session and preserves through every compaction cycle. Content in CLAUDE.md is never summarized, never compressed, and never discarded.
This makes CLAUDE.md the right place for information that must be available in every session regardless of what else is happening:
- Project architecture decisions: “We use Server Components by default, client components only when state or interactivity is required.”
- Code conventions: “No any types. Use unknown with type narrowing. Named exports for all non-page components.”
- Forbidden patterns: “Never install shadcn/ui. Never use inline styles. Never use var.”
- Deployment context: “Docker standalone build, Traefik handles SSL, push to master triggers auto-deploy.”
- Key file locations: “Tool registry at src/data/tools-registry.ts, site config at src/config/site.ts.”
The distinction is critical: anything in CLAUDE.md is permanent context. Anything in the conversation is temporary context that will eventually be compressed or removed. If a piece of information must be available across sessions and survive compaction, it belongs in CLAUDE.md. If it is relevant only to the current task, it belongs in the conversation.
You can also create CLAUDE.md files in subdirectories. Claude Code reads the root CLAUDE.md plus any CLAUDE.md in the current working directory or its ancestors. This lets you set project-wide conventions in the root file and module-specific conventions in subdirectory files. For instance, a src/components/CLAUDE.md might specify component naming conventions, while the root CLAUDE.md covers the overall architecture.
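The lookup described above can be approximated with a walk from the working directory up to the project root. The exact discovery order inside Claude Code is an assumption here; this sketch returns root conventions first and the most specific file last:

```python
from pathlib import Path

def find_claude_md(cwd: Path, project_root: Path) -> list[Path]:
    """Collect CLAUDE.md files from cwd up to project_root, root first.

    Assumes cwd lies inside project_root.
    """
    found = []
    directory = cwd
    while True:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
        if directory == project_root:
            break
        directory = directory.parent
    return list(reversed(found))  # root conventions first, most specific last
```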
For a practical example of a production CLAUDE.md, see our comparison of Claude Code, Cursor, and GitHub Copilot where we discuss how each tool handles persistent project context.
Practical Tips for Managing Context Effectively
Based on extensive use of Claude Code across production codebases, here are the patterns that consistently produce the best results:
1. Read Specific Line Ranges, Not Entire Files
Every line you read consumes context tokens. If you need to understand a specific function, read only that function’s line range rather than the entire file. Claude Code’s Read tool accepts offset and limit parameters for exactly this reason. A 500-line file costs roughly 2,000 tokens to read in full. If you only need lines 45–80, you consume approximately 150 tokens instead — a 13x reduction.
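The savings are easy to estimate with the common ~4 characters-per-token heuristic. These are rough estimates only; the authoritative counts come from the API's usage field:

```python
CHARS_PER_TOKEN = 4    # common heuristic; real counts come from the API
AVG_LINE_LENGTH = 16   # characters per line; varies widely by codebase

def estimated_tokens(n_lines: int, line_length: int = AVG_LINE_LENGTH) -> int:
    """Rough token cost of reading n_lines of code."""
    return n_lines * line_length // CHARS_PER_TOKEN

full_file = estimated_tokens(500)        # whole 500-line file
snippet = estimated_tokens(80 - 45 + 1)  # just lines 45-80
print(full_file, snippet)  # 2000 144
```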
2. Use Grep Before Read
Before reading a file, use Grep to find the specific lines you need. This is faster, consumes less context, and gives you line numbers you can use for targeted reads. The pattern is: Grep to locate, Read with offset/limit to examine, Edit to modify. This three-step workflow minimizes context consumption at every stage.
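The locate-then-slice workflow can be reproduced with the standard library alone; Claude Code's Grep and Read tools behave analogously, so this sketch is only an approximation of their semantics:

```python
import re
from pathlib import Path

def grep(path: Path, pattern: str) -> list[int]:
    """Return 1-based line numbers whose text matches the pattern."""
    regex = re.compile(pattern)
    return [i for i, line in enumerate(path.read_text().splitlines(), 1)
            if regex.search(line)]

def read_range(path: Path, offset: int, limit: int) -> list[str]:
    """Read `limit` lines starting at 1-based line `offset`."""
    return path.read_text().splitlines()[offset - 1:offset - 1 + limit]
```

Grep returns the line numbers; read_range consumes only the slice you actually need, which is the whole point of the three-step workflow.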
3. Run /compact at Task Boundaries
When you finish implementing a feature and are about to start the next one, run /compact with instructions that summarize the completed work. This clears out the implementation details (which are now in the committed code, not needed in context) and frees space for the next task. The committed code is the source of truth; the context only needs to know what was done, not the line-by-line details of how.
4. Front-Load Critical Context in CLAUDE.md
Put the most important conventions and constraints at the top of your CLAUDE.md. While the entire file is read, information at the top tends to have stronger influence on the model's behavior. Lead with absolute rules (forbidden patterns, required conventions) and follow with reference information (file locations, architecture notes).
5. Prefer Multiple Short Sessions Over One Marathon
Even with the 1M context window, starting a fresh session for a new task is often more effective than continuing an existing session. A fresh session loads CLAUDE.md with no other context, giving the model a clean slate focused entirely on the new task. A continued session carries compressed artifacts from the previous task that may subtly influence the model’s approach. For truly independent tasks, a new session is almost always better.
6. Watch for Post-Compaction Drift
After auto-compact triggers, verify that the model still has accurate context for your current task by asking a specific question about the current state. For example: “What file are we currently modifying and what is the remaining work?” If the model’s answer is accurate, continue working. If it has lost critical context, re-establish it by reading the relevant files or providing a brief summary of the current state. This takes 10 seconds and can save you from minutes of confused work.
7. Use the /cost Command Proactively
Check /cost periodically during long sessions, especially before starting an operation that will consume significant context (like reading multiple large files). If you are at 70% utilization and about to read three 500-line files, compaction will trigger mid-operation. Better to run /compact first, then proceed with the reads on a clean context.
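The pre-flight check amounts to one addition and one comparison. The numbers below reuse this article's figures; in practice the current utilization comes from /cost, and the planned-read estimate is a guess:

```python
THRESHOLD = 200_000 - 33_000  # the article's window minus response buffer

def needs_compact_first(current_tokens: int, planned_read_tokens: int) -> bool:
    """True when a planned batch of reads would cross the compaction threshold."""
    return current_tokens + planned_read_tokens >= THRESHOLD

# At 70% utilization (140K of 200K), about to read three large ~10K-token files:
print(needs_compact_first(140_000, 3 * 10_000))  # True: run /compact first
```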
How the Six Strategies Work Together
The six strategies are not independent — they form a layered system where each handles a different scale of context pressure:
- Micro-compact handles individual tool results as they are generated, compressing verbose output in place
- Snip removes specific irrelevant messages as the conversation evolves
- Context collapse compresses tool call details across the session, keeping outcomes and dropping mechanics
- Auto-compact performs full conversation summarization when the token count approaches the threshold
- Reactive compact fires as an emergency fallback if the context overflows despite auto-compact
- Token counting underlies everything, providing the real-time utilization data that triggers each strategy at the right moment
In a typical long session, you will experience micro-compact and context collapse silently throughout, auto-compact once or twice at natural inflection points, and reactive compact never. Snip operates opportunistically when the conversation direction shifts. The entire system is designed to be invisible when working correctly — you should notice it only when you deliberately check with /cost or when the model’s context seems to have shifted.
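As a mental model (a toy, not Claude Code's actual control flow), the layering can be pictured as a dispatcher in which each kind of pressure maps to the strategy at its own scale:

```python
def manage_context(tokens_used: int, threshold: int, overflowed: bool,
                   big_tool_result: bool, topic_changed: bool) -> list[str]:
    """Map context pressure at each scale to the strategy that handles it."""
    actions = []
    if big_tool_result:
        actions.append("micro-compact")     # one verbose result, compressed in place
    if topic_changed:
        actions.append("snip")              # drop dead-topic messages
    if tokens_used >= threshold:
        actions.append("auto-compact")      # full-history summarization
    if overflowed:
        actions.append("reactive-compact")  # emergency fallback
    return actions or ["no-op"]

print(manage_context(170_000, 167_000, False, True, False))
# ['micro-compact', 'auto-compact']
```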
Common Mistakes That Waste Context
Certain patterns consume context disproportionately and trigger premature compaction:
- Reading entire files when you need one function: The most common context waste. Always use targeted reads.
- Pasting large code blocks into the conversation: If the code exists in a file, reference the file. Do not paste it. Claude Code can read it directly.
- Running commands with unbounded output: A bare find . or git log without limits can produce thousands of lines. Always constrain output with flags like --max-count, head, or tail.
- Asking the model to "show me" code it just wrote: The model already has the code in context. Asking it to repeat the code doubles the context consumption for zero information gain.
- Switching between unrelated tasks without compacting: If you finish a CSS task and start an API task, the CSS context is dead weight. Run /compact between task switches.
Context Management Across AI Coding Tools
Claude Code’s six-strategy approach is more granular than most competing tools. Cursor uses a fixed sliding window that drops older messages wholesale. GitHub Copilot Chat does not expose its context management to users at all. The advantage of Claude Code’s layered approach is that it preserves high-value information (decisions, outcomes, current state) while aggressively compressing low-value information (verbose tool output, superseded file contents, resolved debugging steps).
For a detailed comparison of how these tools handle long coding sessions differently, read our Claude Code vs Cursor vs GitHub Copilot comparison which covers context management, code quality, and workflow integration across all three platforms.
Conclusion
Claude Code’s context management is a six-layer system designed to keep your session productive without requiring you to think about token limits. Token counting provides the data. Micro-compact and snip handle granular optimization. Context collapse compresses tool interactions. Auto-compact performs full summarization at the threshold. Reactive compact catches edge cases. And CLAUDE.md sits above all of it as persistent context that survives every compaction cycle.
The developers who get the most out of Claude Code are the ones who understand this system well enough to work with it: using targeted reads instead of full-file dumps, running /compact at task boundaries, front-loading critical context in CLAUDE.md, and checking /cost proactively during long sessions. The 1M context window available since March 2026 raises the ceiling significantly, but the fundamentals of efficient context usage remain the same regardless of window size. Master these six strategies and your Claude Code sessions will stay sharp from the first prompt to the last commit.