A simple, plain-English guide to one decision you make all the time: should you keep going in a long session that already has a lot of context loaded, or open a fresh one? It comes down to two things, quality and cost, and both are about to matter more as token subsidies end. Here is how to make the call.
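The golden rule: if less than about 15% of your context is relevant to the next turn, start a new session.
You can run with just that rule and already work smarter. But the rest of this guide explains how context actually works, so you can trust the call you are making and set up your agents the same way.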
What is context?
Context is everything the model re-reads on every turn: the setup instructions, the entire conversation so far (or a summary of it), and your newest message. Here is the part that surprises most people: AI models have no memory. Nothing carries over on its own. Every time you hit send, the whole conversation is packaged up and read again from scratch.
The most a model can read at once is its context window, often around 200,000 tokens. A token is roughly three quarters of a word, so that is about 150,000 words, a small book.
The setup and tools stay about the same size. The history is the part that grows. That single fact is the root of everything that follows.
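As a quick check of the window arithmetic above, here is a minimal sketch. The 0.75 words-per-token ratio is the rough rule of thumb from the text, not an exact property of any tokenizer, and the function name is illustrative.

```python
# Rough conversion only: real tokenizers vary by language and content.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


window_tokens = 200_000
print(tokens_to_words(window_tokens))  # 150000
```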
Why a long chat quietly gets expensive
Because the model re-reads everything on every turn, you are billed for the whole conversation on every message, not just the new part. On turn 1 you might send 3,000 tokens. By turn 50, the same short question rides on top of 150,000 tokens of history, and you pay to process all of it again. The cost of a single message keeps climbing even when the message itself is tiny.
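To see how quickly this adds up, here is a small sketch of the numbers above. It assumes, purely for illustration, that every turn adds a steady 3,000 tokens to the history (your message, the reply, and any tool output), which is what takes a chat from 3,000 tokens on turn 1 to 150,000 on turn 50. It ignores caching and output-token pricing.

```python
def input_tokens_per_turn(first_turn: int, growth_per_turn: int, turns: int) -> list[int]:
    """Tokens the model must read on each turn when the full history is re-sent."""
    return [first_turn + growth_per_turn * t for t in range(turns)]


per_turn = input_tokens_per_turn(first_turn=3_000, growth_per_turn=3_000, turns=50)

print(per_turn[0])    # 3000 tokens read on turn 1
print(per_turn[-1])   # 150000 tokens read on turn 50
print(sum(per_turn))  # 3825000 tokens read across the whole chat
```

Under these assumptions the chat holds 150,000 tokens of actual content by turn 50, yet about 3.8 million input tokens were processed along the way, roughly 25 times as much.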
This was easy to ignore while tokens were heavily subsidized. After June 15, 2026, when full rates apply, the climbing cost of a long conversation becomes a real line on your bill.
Context rot: more is not better
There is a second problem that has nothing to do with money. As the context fills up, models get worse at using it. They miss details buried in the middle, lose the thread, and start repeating themselves or contradicting decisions made earlier in the chat. Anthropic calls this context rot. You may also hear it called context drift.
It is not a bug you can patch. It comes from how the technology works: the model weighs every token against every other token, and as the pile grows, that attention gets stretched thin. It shows up in every model, although newer ones handle it more gracefully than older ones.
A practitioner rule of thumb: a model tends to stay reliable only up to roughly half to two thirds of its advertised window. Treat the headline number as a ceiling, not a comfortable working range.
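Applied to real window sizes, that rule of thumb looks like the sketch below. The half and two-thirds fractions come straight from the rule above; they are a practitioner heuristic, not a measured limit for any particular model, and the function name is illustrative.

```python
def reliable_range(window_tokens: int) -> tuple[int, int]:
    """Return (half, two thirds) of the advertised window, in tokens."""
    return window_tokens // 2, window_tokens * 2 // 3


for window in (200_000, 1_000_000):
    low, high = reliable_range(window)
    print(f"{window:,} window: plan for roughly {low:,} to {high:,} tokens")
```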
Automatic compaction, and a hidden cost
Modern tools try to help. When a conversation gets close to the limit, the coding CLI (Claude Code, Codex, and the agents inside CRHQ) automatically summarizes the older parts and replaces them with a shorter recap. This is called compaction. In CRHQ you do not control it: the underlying CLI decides when and how, and it generally does a good job.
Compaction is genuinely useful, but it has a side effect worth understanding. Picture a tool that compacts once the window gets roughly 70% full, then shrinks the history back down to a small recap. The session climbs again, gets compacted again, and keeps sawtoothing up and down for as long as you keep it open.
Here is the part that matters for your bill: the lower half of each climb is cheap, and the upper half is where tokens really add up. Early in a fresh session you are down in the cheap half. The longer a single chat runs, the more of its turns sit up in the pricey half, often re-reading a pile of context that no longer matters. Starting a new session for a new task drops you back to the bottom, into the cheap half again. Think in proportions, not token counts, because the exact numbers differ by model. Some windows are 200,000 tokens, some reach a million.
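The sawtooth, and the claim that the upper half of each climb carries most of the tokens, can be sketched with a toy simulation. Every number here is an assumption for illustration: a 200,000-token window, compaction at 70%, a 10,000-token recap, and a steady 3,000 tokens added per turn. Real tools choose their own thresholds and recap sizes.

```python
def simulate_context(turns: int, window: int = 200_000, start: int = 3_000,
                     growth: int = 3_000, compact_at_pct: int = 70,
                     recap: int = 10_000) -> list[int]:
    """Context size at the start of each turn, with compaction near the limit."""
    threshold = window * compact_at_pct // 100
    sizes = []
    context = start
    for _ in range(turns):
        sizes.append(context)      # the model reads this much on this turn
        context += growth          # the turn's question, reply, and tool output
        if context >= threshold:   # the tool compacts before the next turn
            context = recap
    return sizes


def upper_half_share(climb: list[int]) -> float:
    """Fraction of a climb's tokens that are read in its later half of turns."""
    midpoint = len(climb) // 2
    return sum(climb[midpoint:]) / sum(climb)


sizes = simulate_context(turns=100)
first_climb = sizes[:sizes.index(10_000)]        # turns before the first compaction
print(len(first_climb))                          # 46
print(round(upper_half_share(first_climb), 2))   # 0.74
```

With a steady climb, the later half of the turns accounts for about three quarters of all tokens read, which is why a fresh session that stays in the lower half is so much cheaper per turn.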
Caching, and why a long pause costs you
One thing works in your favor: caching. When you send the same beginning of a conversation again, the provider can reuse the work it already did and charges you a fraction of the price for that repeated part, often around one tenth.
The catch is that the cache stays warm only for a short time. If you keep chatting, replies land within minutes and stay cheap. But if you walk away and come back later, the cache expires. Your comeback message then has to reload the entire history at the full, higher rate. With a large context and a long pause, that single message can be surprisingly expensive.
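A rough sketch of that difference, measured in "full-price token equivalents" so that no real price list is needed. The one-tenth cached rate is the approximate figure from the text. Real pricing has more moving parts (for example, separate rates for writing to the cache and for output tokens), which this sketch deliberately ignores.

```python
def message_cost(history_tokens: int, new_tokens: int, cache_warm: bool,
                 cached_fraction: float = 0.10) -> float:
    """Cost of one message, measured in full-price input-token equivalents."""
    history_rate = cached_fraction if cache_warm else 1.0
    return history_tokens * history_rate + new_tokens


warm = message_cost(history_tokens=150_000, new_tokens=200, cache_warm=True)
cold = message_cost(history_tokens=150_000, new_tokens=200, cache_warm=False)
print(f"warm cache: ~{warm:,.0f} token-equivalents")
print(f"cold cache: ~{cold:,.0f} token-equivalents")
print(f"the comeback message costs ~{cold / warm:.1f}x more")
```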
So, continue or start fresh?
The honest answer is that there is no magic number of turns, and anyone who hands you one is guessing. Turn count is a bad measure anyway, because one task can be ten times heavier than another. The thing that actually matters is simple: how much of the current context is relevant to what you are about to do.
The test that makes it easy
Before your next message, ask yourself one question:
If I handed this task to a brand-new teammate right now, would I paste them this whole chat, or just a paragraph?
If you would paste most of it, the context is still pulling its weight, so continue. If you would only paste a paragraph, then most of what you are paying to re-send is dead weight, so start fresh, and that paragraph you imagined writing is your handoff into the new session.
This is exactly the case where continuing hurts: you have done three or four different things in one chat, and now you are starting something new where maybe 10 to 15% of the history matters. You keep paying to re-send the other 85 to 90% on every turn, and it cannot help; it can only crowd and confuse.
Signals that make the call
Early on, none of this applies. While the context is small, continuing is both cheaper and better, so do not overthink it. The whole decision only starts to matter once you are deep, which is exactly when most people stop paying attention.
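If you want to encode the relevance test for a script or an agent, a minimal sketch might look like this. The 15% threshold is the golden-rule figure from this guide. The 20,000-token "small context" cutoff is a made-up placeholder for "early on", and the relevant-token count is something you (or your agent) have to estimate; nothing measures it for you.

```python
def should_start_fresh(relevant_tokens: int, total_tokens: int,
                       threshold: float = 0.15,
                       small_context_tokens: int = 20_000) -> bool:
    """Apply the relevance test: True means open a new session."""
    if total_tokens <= small_context_tokens:
        return False  # early on, continuing is both cheaper and better
    return relevant_tokens / total_tokens < threshold


print(should_start_fresh(relevant_tokens=15_000, total_tokens=150_000))   # True
print(should_start_fresh(relevant_tokens=100_000, total_tokens=150_000))  # False
print(should_start_fresh(relevant_tokens=500, total_tokens=5_000))        # False
```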
What this means for your agents
This is not only a manual-chat concern. It matters even more when you structure agents, because in an agentic loop every reasoning step re-sends the whole accumulated history, and tool output (logs, file dumps, search results) piles up fast. Cost and rot compound quickest exactly where autonomous agents live.
The practical move is to design agents to delegate distinct tasks to fresh sessions rather than pouring everything into one immortal session. Starting fresh is not scary in CRHQ, because you do not start cold: persistent memory and project documents let a new session load the durable context (your standards, your decisions, your project facts) instead of rebuilding it from nothing.
That is the answer the whole industry is quietly converging on. Do not hoard context, and do not throw it away. Keep the small, high-value part and let the rest go. CRHQ's memory and project docs are that idea, built in.
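The delegation pattern described above can be sketched in a few lines. Everything here is hypothetical: `Session` is a stand-in class, not a CRHQ API, and context is counted in entries rather than tokens to keep the example small. The point is only the shape: each fresh session starts from the same small durable context, while a single long-lived session keeps growing.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for one agent session; not a real CRHQ API."""
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.context.append(task)
        result = f"done: {task}"     # placeholder for real model and tool calls
        self.context.append(result)  # replies and tool output pile up here
        return result


def run_in_fresh_sessions(durable_context: list[str], tasks: list[str]) -> list[int]:
    """Run each task in its own session; return each session's final context length."""
    lengths = []
    for task in tasks:
        session = Session(context=list(durable_context))  # copy, do not share
        session.run(task)
        lengths.append(len(session.context))
    return lengths


def run_in_one_session(durable_context: list[str], tasks: list[str]) -> list[int]:
    """Run every task in one long-lived session; return context length after each."""
    session = Session(context=list(durable_context))
    lengths = []
    for task in tasks:
        session.run(task)
        lengths.append(len(session.context))
    return lengths


durable = ["coding standards", "key decisions", "project facts"]
tasks = ["fix login bug", "write release notes", "refactor billing"]
print(run_in_fresh_sessions(durable, tasks))  # [5, 5, 5]
print(run_in_one_session(durable, tasks))     # [5, 7, 9]
```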
The short version
The longer version
- Context is everything the model re-reads each turn: setup, the full history (or its summary), and your new message.
- Models have no memory, so the entire history is re-sent and re-billed on every turn. Long chats cost more, and the cost climbs even when your messages are short.
- Quality also drops as the window fills. This is context rot (or context drift). It affects every model; newer ones handle it better but not perfectly.
- Tools auto-compact long chats by summarizing the old parts. You do not control this in CRHQ. After the first compaction you keep bouncing around the fuller, pricier half of the window.
- Caching makes repeated context roughly ten times cheaper, but only while it is warm. Returning after a long pause reloads everything near full price.
- There is no magic turn count. Decide by relevance: how much of the current context helps your next move.
- Golden rule: if less than 15% of your context is relevant to the next turn, start a new session.
- Use the "brief a new teammate" test. The paragraph you would paste becomes your handoff into the fresh session.
- For agents: delegate distinct tasks to fresh sessions, and lean on CRHQ memory and project docs to carry the durable context forward cheaply.
- All of this matters more after June 15, 2026, when subsidies end and you pay full rate for every re-sent token.
The takeaway is not "always start fresh" or "never let a chat die." It is that you, the human, are the only one who can cheaply judge what in the context still matters, because the model would have to read all of it again just to decide what to drop. Make that judgment on purpose, and both your output and your bill get better.
Common questions
When should I start a new AI session instead of continuing?
When less than about 15% of what is already in your chat is relevant to your next task. A quick test: if you would not paste this whole conversation to brief a new teammate on the next task, start fresh and paste just the paragraph that matters.
Does a longer chat actually cost more?
Yes. Models have no memory, so the entire conversation is re-sent and re-billed on every turn. A short question on turn 50 can be processing 150,000 tokens of history, so the per-message cost keeps climbing even when your messages stay short.
What is context rot?
It is the drop in quality as the context window fills up. The model starts missing details, losing the thread, and repeating or contradicting itself. It affects every model, though newer ones degrade more gently, and it is separate from cost.
Does automatic compaction solve the problem?
It helps by summarizing old history when the window gets full, but it does not return you to a cheap, clean state. After the first compaction you tend to keep bouncing around the fuller, pricier half of the window, so deciding to start fresh still matters.
How does this apply to AI agents?
Agentic loops re-send the whole history on every step and pile up tool output fast, so cost and rot compound quickest there. Design agents to delegate distinct tasks to fresh sessions, and use persistent memory and project documents so a new session loads the durable context instead of rebuilding it.