Stop Burning Your Claude Code Context Window

The session slows down. Responses get strange. Claude starts forgetting things you told it twenty minutes ago. That is the context window filling up.

It is not a bug. It is the nature of how large language models work. Every token Claude processes counts against a hard limit. The question is whether you are spending those tokens on work that matters, or burning them on noise.

What Lives in Your Context Window

Before you type a single word, the context window is already filling up. Your CLAUDE.md instructions load automatically. Then add every file Claude reads during your session, every command it runs, the full output of each command, every message you send, and every response it gives back.

A single read on a large file can burn thousands of tokens instantly. Multiply that across a real working session and you see how fast the window fills up.
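To make that concrete, here is a minimal sketch that estimates what each item costs. It assumes a rough rule of thumb of about 4 characters per token, which is a heuristic and not Claude's actual tokenizer. The session contents and sizes are hypothetical, chosen only for illustration.

```python
# Rough heuristic (an assumption, not Claude's real tokenizer):
# about 4 characters of text or code per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def session_cost(items: dict[str, str]) -> dict[str, int]:
    """Estimate tokens per context item and add a TOTAL entry."""
    costs = {name: estimate_tokens(body) for name, body in items.items()}
    costs["TOTAL"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    # Hypothetical session contents, sized only for illustration.
    session = {
        "CLAUDE.md": "x" * 4_000,
        "read: large_module.py": "x" * 60_000,
        "command output: test run": "x" * 20_000,
        "your messages": "x" * 2_000,
    }
    for name, tokens in session_cost(session).items():
        print(f"{name:28} ~{tokens:>6} tokens")
```

Even with made-up numbers, the shape is typical: one large file read dwarfs everything you typed yourself.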

What Happens When It Fills Up

Claude Code runs auto-compaction: it summarizes your conversation history to make room. Not everything survives. This is worth understanding.

| Content | What happens after compaction |
| --- | --- |
| System prompt and output style | Always preserved |
| Project-root `CLAUDE.md` | Re-injected from disk |
| Auto memory files | Re-injected from disk |
| Path-scoped rules (`paths:` frontmatter) | Lost until that file is read again |
| Nested `CLAUDE.md` in subdirectories | Lost until you touch that folder |
| Skill bodies | Re-injected, capped at 5,000 tokens per skill |

The takeaway: instructions in your root CLAUDE.md survive. Instructions buried in nested files or path-scoped rules do not. Put the things you cannot afford to lose at the top level, and put the most important lines near the top of every file.
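If it helps to see the table as logic, here is a toy model of it. This is a sketch of the behavior described above, not Claude Code's actual implementation. The `Item` type, the kind labels, and the `summary_tokens` parameter are all hypothetical names invented for the example.

```python
from dataclasses import dataclass

# Behavior per content kind, transcribed from the table above.
# "keep" = preserved or re-injected, "drop" = lost until read again.
AFTER_COMPACTION = {
    "system_prompt": "keep",
    "root_claude_md": "keep",
    "auto_memory": "keep",
    "path_scoped_rule": "drop",
    "nested_claude_md": "drop",
    "skill_body": "keep_capped",
}
SKILL_CAP_TOKENS = 5_000  # per-skill cap stated in the table


@dataclass
class Item:
    kind: str    # hypothetical label, e.g. "root_claude_md"
    name: str
    tokens: int


def compact(items: list[Item], summary_tokens: int) -> list[Item]:
    """Toy compaction: apply the table, then add one conversation summary."""
    result = []
    for item in items:
        rule = AFTER_COMPACTION.get(item.kind)
        if rule == "keep":
            result.append(item)
        elif rule == "keep_capped":
            capped = min(item.tokens, SKILL_CAP_TOKENS)
            result.append(Item(item.kind, item.name, capped))
        # "drop" kinds and raw conversation items are not carried forward.
    result.append(Item("summary", "conversation summary", summary_tokens))
    return result
```

Feed it a root `CLAUDE.md`, a nested one, and a long conversation, and only the root file and a summary come out the other side. That is the argument for keeping critical instructions at the top level.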

Seven Ways to Reduce Context Window Burn

1. Use /clear between unrelated tasks

This is the simplest and most overlooked habit. When you finish a task and start something new, run /clear. It resets the conversation entirely. Long sessions with irrelevant context do not just waste tokens. They actively degrade response quality because Claude is working around all that noise.

2. Use /compact with instructions

When you want to keep working in the same session but reclaim space, run /compact followed by guidance on what to keep. For example: "/compact Focus on the API changes and ignore the debugging we already resolved." Claude summarizes the conversation based on your guidance. You control what survives.

3. Rewind and summarize a specific section

Press Esc + Esc or run /rewind, pick a checkpoint, and choose "Summarize from here." This condenses only the messages after that point, leaving earlier context intact. Useful when one branch of the conversation went long but the rest is still relevant.

4. Use /btw for quick questions

Need to check something without it living in your history forever? Use /btw. The answer appears in a dismissible overlay and never enters the conversation. You get the information without spending the tokens.

5. Delegate research to subagents

When Claude needs to explore a codebase or read a lot of files, it burns your context window doing it. Instead, ask it to delegate: "Use subagents to investigate the auth module and report back." Subagents run in their own separate context windows. They explore, summarize, and return a clean report to your main session, which stays focused on implementation.

Context is your fundamental constraint. Subagents are one of the most powerful tools available precisely because they protect it.
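The token accounting behind that is simple enough to sketch. The model below is a simplification with hypothetical names and numbers: it ignores the small cost of the delegation prompt itself and assumes the subagent returns a report of a fixed size.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """A context window, tracked only by how many tokens it holds."""
    name: str
    used: int = 0

    def add(self, tokens: int) -> None:
        self.used += tokens


def research_directly(main: Context, file_tokens: list[int]) -> None:
    """Main session reads every file itself: all tokens land in its window."""
    for tokens in file_tokens:
        main.add(tokens)


def research_via_subagent(
    main: Context, file_tokens: list[int], report_tokens: int
) -> Context:
    """A subagent reads the files in its own window; only the report returns."""
    sub = Context("subagent")
    for tokens in file_tokens:
        sub.add(tokens)
    main.add(report_tokens)
    return sub


if __name__ == "__main__":
    files = [4_000, 6_000, 2_500]  # hypothetical file sizes in tokens

    direct = Context("main")
    research_directly(direct, files)

    delegated = Context("main")
    research_via_subagent(delegated, files, report_tokens=600)

    print(f"direct:    {direct.used} tokens in main context")
    print(f"delegated: {delegated.used} tokens in main context")
```

The files still get read either way. The difference is whose window pays for them.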

6. Write a tight CLAUDE.md

Every token in CLAUDE.md loads at the start of every session. Treat it like a tight brief, not a running document. Put the most important instructions at the top, because compaction preserves the start of files and truncates from the bottom. If something is not actively shaping Claude's behavior, it does not belong there.
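One way to keep yourself honest is to put a number on it. The sketch below checks a CLAUDE.md against a budget. Both the 4-characters-per-token heuristic and the 2,000-token budget are assumptions picked for illustration, not limits defined by Claude Code.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4    # rough heuristic, not Claude's real tokenizer
BUDGET_TOKENS = 2_000  # hypothetical budget; pick your own


def check_brief(text: str, budget: int = BUDGET_TOKENS) -> tuple[int, bool]:
    """Return (estimated tokens, whether the text fits the budget)."""
    tokens = len(text) // CHARS_PER_TOKEN
    return tokens, tokens <= budget


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        tokens, ok = check_brief(path.read_text(encoding="utf-8"))
        status = "within budget" if ok else "over budget, consider trimming"
        print(f"{path}: ~{tokens} tokens ({status})")
    else:
        print(f"{path} not found in the current directory")
```

Run it from your project root whenever you add a rule. If the number keeps climbing, something in there has probably stopped earning its place.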

7. Keep large outputs out of context entirely

The biggest single lever is preventing large outputs from entering your context window in the first place. Tools like context-mode do this by routing fetched content and command output to a sandbox knowledge base. Only the relevant excerpts from a search query come back to your session.
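The general idea can be sketched in a few lines. To be clear, this is not how context-mode is implemented. It is a hypothetical in-memory stand-in, with an invented `OutputSandbox` class, that shows the pattern: store the full output elsewhere, hand back a short receipt, and return only matching lines on request.

```python
class OutputSandbox:
    """Hypothetical store that keeps full outputs out of the conversation."""

    def __init__(self) -> None:
        self._docs: dict[str, list[str]] = {}

    def store(self, key: str, output: str) -> str:
        """Save the full output and return only a one-line receipt."""
        lines = output.splitlines()
        self._docs[key] = lines
        return f"stored {key}: {len(lines)} lines"

    def search(self, query: str, limit: int = 5) -> list[str]:
        """Return up to `limit` matching lines, case-insensitively."""
        needle = query.lower()
        hits: list[str] = []
        for key, lines in self._docs.items():
            for number, line in enumerate(lines, start=1):
                if needle in line.lower():
                    hits.append(f"{key}:{number}: {line}")
                    if len(hits) == limit:
                        return hits
        return hits


if __name__ == "__main__":
    box = OutputSandbox()
    # Hypothetical noisy command output with one line that matters.
    log = "\n".join(
        [f"ok step {i}" for i in range(999)] + ["ERROR: auth token expired"]
    )
    print(box.store("test-run", log))  # the receipt, not the 1000 lines
    for hit in box.search("error"):
        print(hit)
```

A thousand lines of output went into the store. Two lines came back to the session.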

In practice: a 26-minute session that generated 128 KB of data had only 42 KB enter the context window. That is a 67% reduction and roughly 22 extra minutes of session life gained.
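The percentage is easy to check from the two figures quoted above. This snippet verifies only the reduction. The session-life estimate depends on details the numbers here do not capture.

```python
generated_kb = 128  # data the session produced
entered_kb = 42     # data that actually reached the context window

saved_kb = generated_kb - entered_kb  # 86 KB kept out of context
reduction = saved_kb / generated_kb   # 0.671875

print(f"kept out of context: {saved_kb} KB")
print(f"reduction: {reduction:.0%}")  # prints "reduction: 67%"
```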

The Mental Model That Makes It Click

Your context window is a workbench. Every file you open, every command you run, every message you exchange takes up space on that bench. When the bench fills up, Claude starts pushing older work off the edge.

Your job is to be disciplined about what earns space on that bench. Delegate the exploration. Summarize the history. Clear between tasks. Keep your standing instructions tight. Route large outputs around the bench entirely when you can.

The developers who get the most out of Claude Code are not the ones with the biggest context windows. They are the ones who treat context as the scarce resource it is.
