June 9, 2026

When should you start a new AI session?

A plain-English guide to how context works, why a long chat quietly gets more expensive and less reliable, and when to start fresh.

Tags: context, cost, how-to

A simple, plain-English guide to one decision you make all the time: should you keep going in a long session that already has a lot of context loaded, or open a fresh one? It comes down to two things, quality and cost, and both are about to matter more as token subsidies end. Here is how to make the call.

The golden rule

If less than 15% of what is in your chat is relevant to the next thing you ask, start a new session.

You can run with just that rule and already work smarter. But the rest of this guide explains how context actually works, so you can trust the call you are making and set up your agents the same way.

What is context?

Context is everything the model re-reads on every turn: the setup instructions, the entire conversation so far (or a summary of it), and your newest message. Here is the part that surprises most people: AI models have no memory. Nothing carries over on its own. Every time you hit send, the whole conversation is packaged up and read again from scratch.

The most a model can read at once is its context window, often around 200,000 tokens. A token is roughly three quarters of a word, so that is about 150,000 words, a small book.

The context: everything the model re-reads on each turn

System setup and instructions: fixed, small
Tool definitions: fixed, small
Conversation history, every earlier turn: grows with every message, never shrinks on its own
Your newest message: the only truly new part

The setup and tools stay about the same size. The history is the part that grows. That single fact is the root of everything that follows.
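To make that growth concrete, here is a minimal Python sketch of what the model reads on each turn. The `Session` class and every token size in it (2,000 for setup, 1,000 for tools, 50 per message, 450 per reply) are made up for illustration; they are not measurements from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Token counts for each part of the context (all sizes are illustrative)."""
    setup_tokens: int = 2_000    # system setup and instructions: fixed
    tool_tokens: int = 1_000     # tool definitions: fixed
    history_tokens: int = 0      # conversation history: grows every turn

    def send(self, message_tokens: int, reply_tokens: int) -> int:
        """Return the tokens read on this turn, then grow the history."""
        read_this_turn = (
            self.setup_tokens
            + self.tool_tokens
            + self.history_tokens
            + message_tokens
        )
        # Both your message and the reply become history for the next turn.
        self.history_tokens += message_tokens + reply_tokens
        return read_this_turn


session = Session()
reads = [session.send(message_tokens=50, reply_tokens=450) for _ in range(5)]
print(reads)  # [3050, 3550, 4050, 4550, 5050]
```

The message stays at 50 tokens the whole time. Only the history term changes, and it is the reason each turn reads more than the last.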

Why a long chat quietly gets expensive

Because the model re-reads everything on every turn, you are billed for the whole conversation on every message, not just the new part. On turn 1 you might send 3,000 tokens. By turn 50, the same short question rides on top of 150,000 tokens of history, and you pay to process all of it again. The cost of a single message keeps climbing even when the message itself is tiny.

What you pay to process on each turn

The problem is the gap between what a model with real memory would cost and what you actually pay. If the model truly remembered, each turn would cost about the same. In reality, every turn re-reads the full history, so by turn 50 a one-line question is processing 150,000 tokens.
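The arithmetic behind that gap can be sketched in a few lines of Python. This assumes, purely for illustration, that the context grows by a steady 3,000 tokens per turn, which is what takes you from 3,000 tokens on turn 1 to 150,000 on turn 50. Real conversations grow unevenly.

```python
TOKENS_PER_TURN = 3_000  # assumed steady growth per turn (illustrative)
TURNS = 50


def tokens_processed_on_turn(turn: int) -> int:
    """Everything so far is re-read, so turn N processes N turns of tokens."""
    return TOKENS_PER_TURN * turn


# What you actually pay to process across the whole session.
actual_total = sum(tokens_processed_on_turn(t) for t in range(1, TURNS + 1))

# What it would be if the model remembered and only read the new part.
if_remembered_total = TOKENS_PER_TURN * TURNS

print(tokens_processed_on_turn(50))        # 150000
print(actual_total)                        # 3825000
print(if_remembered_total)                 # 150000
print(actual_total / if_remembered_total)  # 25.5
```

Under these assumptions the session processes about 25 times more tokens than the conversation contains, and the multiple keeps rising the longer you go.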

This was easy to ignore while tokens were heavily subsidized. After June 15, 2026, when full rates apply, the climbing cost of a long conversation becomes a real line on your bill.

Context rot: more is not better

There is a second problem that has nothing to do with money. As the context fills up, models get worse at using it. They miss details buried in the middle, lose the thread, and start repeating themselves or contradicting decisions made earlier in the chat. Anthropic calls this context rot. You may also hear it called context drift.

It is not a bug you can patch. It comes from how the technology works: the model weighs every token against every other token, and as the pile grows, that attention gets stretched thin. It shows up in every model, although newer ones handle it more gracefully than older ones.

Multi-fact recall as the context fills (2026 models)

Fuller context, weaker recall, for every model. Today's best hold up far better than older ones did, but the direction is the same for all of them, and the drop steepens as you push toward the limit.

Multi-fact recall (MRCR v2, 8-needle), 2026, via CodingFleet. For historical contrast: on the NoLiMa test, GPT-4o fell from 99% to about 70% by just 32K tokens (Adobe), a level today's best models hold out to hundreds of thousands of tokens.

A practitioner rule of thumb: a model tends to stay reliable only up to roughly half to two thirds of its advertised window. Treat the headline number as a ceiling, not a comfortable working range.
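As a rough illustration of that rule of thumb, the Python sketch below classifies how full a window is. The function name, the labels, and the exact cutoffs at one half and two thirds are assumptions chosen to mirror the sentence above, not thresholds published by any vendor.

```python
def fill_status(context_tokens: int, window_tokens: int) -> str:
    """Classify window fill using the half / two-thirds rule of thumb."""
    fill = context_tokens / window_tokens
    if fill <= 0.5:
        return "comfortable"
    if fill <= 2 / 3:
        return "stretching"
    return "past the reliable range"


print(fill_status(60_000, 200_000))     # comfortable
print(fill_status(120_000, 200_000))    # stretching
print(fill_status(150_000, 200_000))    # past the reliable range
print(fill_status(150_000, 1_000_000))  # comfortable
```

The last two lines show why proportions matter more than raw counts: the same 150,000 tokens is past the reliable range in a 200,000-token window and comfortable in a million-token one.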

Automatic compaction, and a hidden cost

Modern tools try to help. When a conversation gets close to the limit, the coding CLI (Claude Code, Codex, and the agents inside CRHQ) automatically summarizes the older parts and replaces them with a shorter recap. This is called compaction. In CRHQ you do not control it: the underlying CLI decides when and how, and it generally does a good job.

Compaction is genuinely useful, but it has a side effect worth understanding. Picture a tool that compacts once the window gets roughly 70% full, then shrinks the history back down to a small recap. The session climbs again, gets compacted again, and keeps sawtoothing up and down for as long as you keep it open.

Here is the part that matters for your bill: the lower half of each climb is cheap, and the upper half is where tokens really add up. Early in a fresh session you are down in the cheap half. The longer a single chat runs, the more of its turns sit up in the pricey half, often re-reading a pile of context that no longer matters. Starting a new session for a new task drops you back to the bottom, into the cheap half again. Think in proportions, not token counts, because the exact numbers differ by model. Some windows are 200,000 tokens, some reach a million.

Where your turns happen as a session grows

Every climb has a cheap half and a pricey half. A session starts at the bottom, climbs, gets compacted, and climbs again. The longer it runs, the more of its turns sit up in the pricey upper half. Start fresh for a new task and you drop back into the cheap zone.
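The sawtooth can be simulated in a few lines of Python. Only the 70% compaction point comes from the example above. The recap size, the growth per turn, and the choice to split cheap from pricey at half the compaction point are hypothetical values picked for the sketch.

```python
WINDOW = 200_000
COMPACT_AT = WINDOW * 70 // 100  # compaction point from the example: 70% full
RECAP = 20_000                   # hypothetical size of the recap
GROWTH = 3_000                   # hypothetical tokens added per turn
MIDPOINT = COMPACT_AT // 2       # assumed split between cheap and pricey


def simulate(turns: int) -> list[int]:
    """Return the context size after each turn, compacting when too full."""
    sizes = []
    context = 0
    for _ in range(turns):
        context += GROWTH
        if context >= COMPACT_AT:
            context = RECAP  # older history is replaced by a short recap
        sizes.append(context)
    return sizes


def pricey_share(sizes: list[int]) -> float:
    """Fraction of turns that sit above the midpoint of the climb."""
    return sum(1 for size in sizes if size > MIDPOINT) / len(sizes)


print(pricey_share(simulate(20)))   # 0.0
print(pricey_share(simulate(200)))  # 0.545
```

With these numbers a 20-turn session never leaves the cheap half, while a 200-turn session spends more than half of its turns in the pricey one.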

Caching, and why a long pause costs you

One thing works in your favor: caching. When you send the same beginning of a conversation again, the provider can reuse the work it already did and charges you a fraction of the price for that repeated part, often around one tenth.

The catch is that the cache stays warm only for a short time. If you keep chatting, replies land within minutes and stay cheap. But if you walk away and come back later, the cache expires. Your comeback message then has to reload the entire history at the full, higher rate. With a large context and a long pause, that single message can be surprisingly expensive.

Cost to re-send the same history

The same action can be cheap or costly depending on timing. Warm means you kept chatting and the cache is still live. Cold means you paused long enough that it expired, so the next message reloads everything near full price.

Relative figures. Warm cache reads are commonly billed around one tenth of the full rate; exact numbers vary by provider and model.
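Here is a small Python sketch of the warm versus cold difference. It uses relative cost units rather than real prices: 10 units per token at the full rate and 1 unit per cached token, to match the "around one tenth" figure. The function name and the unit values are assumptions, and real billing varies by provider and model.

```python
FULL_UNITS = 10   # relative cost per token at the full rate (illustrative)
CACHED_UNITS = 1  # relative cost per cached token, about one tenth


def resend_cost(history_tokens: int, new_tokens: int, cache_warm: bool) -> int:
    """Relative cost of one message: the history plus the new part."""
    history_rate = CACHED_UNITS if cache_warm else FULL_UNITS
    # The new message is never cached, so it always pays the full rate.
    return history_tokens * history_rate + new_tokens * FULL_UNITS


warm = resend_cost(150_000, 100, cache_warm=True)
cold = resend_cost(150_000, 100, cache_warm=False)

print(warm)  # 151000
print(cold)  # 1501000
```

The same 100-token message costs roughly ten times more after the pause, and almost all of that is the history, not the message.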

So, continue or start fresh?

The honest answer is that there is no magic number of turns, and anyone who hands you one is guessing. Turn count is a bad measure anyway, because one task can be ten times heavier than another. The thing that actually matters is simple: how much of the current context is relevant to what you are about to do.

The test that makes it easy

Before your next message, ask yourself one question:

If I handed this task to a brand-new teammate right now, would I paste them this whole chat, or just a paragraph?

If you would paste most of it, the context is still pulling its weight, so continue. If you would only paste a paragraph, then most of what you are paying to re-send is dead weight, so start fresh, and that paragraph you imagined writing is your handoff into the new session.

This is exactly the case where continuing hurts: you have done three or four different things in one chat, and now you are starting something new where maybe 10 to 15% of the history matters. You keep paying to re-send the other 85 to 90% on every turn, and it cannot help; it can only crowd and confuse.

What you pay to re-send when you continue a long chat into a new task

You pay for all of it, every turn, but only the relevant slice helps. When the relevant part drops below about 15%, a fresh session almost always wins on both cost and quality.
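The 15% rule is simple enough to write down as a Python check. The token figures are rough estimates you would supply yourself, since nothing measures relevance for you, and the function name is just a label for this sketch.

```python
RELEVANCE_THRESHOLD = 0.15  # the golden rule: below 15% relevant, start fresh


def should_start_fresh(relevant_tokens: int, total_tokens: int) -> bool:
    """Apply the golden rule to a rough estimate of what still matters."""
    if total_tokens <= 0:
        return False  # nothing loaded yet, nothing to reset
    return relevant_tokens / total_tokens < RELEVANCE_THRESHOLD


# About a tenth of a 120,000-token chat still matters: start fresh.
print(should_start_fresh(12_000, 120_000))  # True

# Most of the chat still matters: continue.
print(should_start_fresh(90_000, 120_000))  # False

# Dead weight re-sent on every turn if you continue the first case anyway.
print(120_000 - 12_000)  # 108000
```

Treat the threshold as a judgment aid rather than a measurement. The teammate test is the practical way to estimate the relevant share.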

Signals that make the call

Stay: Same task, flowing well, tight back-and-forth. Continue. The context is relevant and the cache is warm.

New: Switching to a different task. Start fresh. Carry over a short summary only if any of it matters.

New: You notice rot: repetition, contradicting earlier decisions, losing the thread. Start fresh now, regardless of cost. You are already getting worse output.

New: Coming back after a long break to do more. A lean fresh start usually beats reloading a big, cold history.

New: The tool just auto-compacted and you are about to change direction. Treat that as your checkpoint. You are up in the fuller half now, so a pivot is a good moment to reset.

Early on, none of this applies. While the context is small, continuing is both cheaper and better, so do not overthink it. The whole decision only starts to matter once you are deep, which is exactly when most people stop paying attention.
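If you want the signals in one place, they can be sketched as a small Python decision function. The field names, the order of the checks, and the 10% cutoff for "early on" are assumptions made for this example. The guide itself gives no fixed number for when a context counts as small.

```python
from dataclasses import dataclass

SMALL_CONTEXT = 0.10  # hypothetical cutoff for "early on"


@dataclass
class Signals:
    context_fill: float  # fraction of the window in use, 0.0 to 1.0
    switching_task: bool = False
    noticed_rot: bool = False
    long_break: bool = False
    just_compacted: bool = False
    changing_direction: bool = False


def decide(s: Signals) -> str:
    """Turn the signals into 'continue' or 'start fresh'."""
    if s.noticed_rot:
        return "start fresh"  # output is already worse, regardless of cost
    if s.context_fill < SMALL_CONTEXT:
        return "continue"     # early on, continuing is cheaper and better
    if s.switching_task or s.long_break:
        return "start fresh"
    if s.just_compacted and s.changing_direction:
        return "start fresh"  # compaction plus a pivot is a checkpoint
    return "continue"


print(decide(Signals(context_fill=0.6)))                       # continue
print(decide(Signals(context_fill=0.6, switching_task=True)))  # start fresh
print(decide(Signals(context_fill=0.05, switching_task=True))) # continue
print(decide(Signals(context_fill=0.3, noticed_rot=True)))     # start fresh
```

Rot is checked first because it is the one signal that overrides cost entirely.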

What this means for your agents

This is not only a manual-chat concern. It matters even more when you structure agents, because in an agentic loop every reasoning step re-sends the whole accumulated history, and tool output (logs, file dumps, search results) piles up fast. Cost and rot compound quickest exactly where autonomous agents live.

The practical move is to design agents to delegate distinct tasks to fresh sessions rather than pouring everything into one immortal session. Starting fresh is not scary in CRHQ, because you do not start cold: persistent memory and project documents let a new session load the durable context, your standards, your decisions, your project facts, instead of rebuilding it from nothing.

That is the answer the whole industry is quietly converging on. Do not hoard context, and do not throw it away. Keep the small, high-value part and let the rest go. CRHQ's memory and project docs are that idea, built in.
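As a rough sketch of that pattern in Python: each task gets a fresh prompt built from the durable context plus a short handoff, and nothing from earlier tasks rides along. Every name here (`Task`, `build_fresh_prompt`, `run_in_fresh_sessions`, `fake_session`) is hypothetical and stands in for whatever your agent framework provides. None of it is CRHQ's actual API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    handoff: str  # the short paragraph you would give a new teammate


def build_fresh_prompt(durable_context: list[str], task: Task) -> str:
    """Durable context plus one handoff; no history from other tasks."""
    return "\n".join([*durable_context, f"Task: {task.name}", task.handoff])


def run_in_fresh_sessions(
    tasks: list[Task],
    durable_context: list[str],
    run_session: Callable[[str], str],
) -> dict[str, str]:
    """Run each task in its own session instead of one ever-growing one."""
    results = {}
    for task in tasks:
        prompt = build_fresh_prompt(durable_context, task)
        results[task.name] = run_session(prompt)
    return results


def fake_session(prompt: str) -> str:
    """Stand-in for a real model call; it only reports the prompt size."""
    return f"done ({len(prompt.split())} words of context)"


durable = [
    "Standard: all dates in ISO 8601.",
    "Decision: ship the CSV export first.",
]
tasks = [
    Task("fix export bug", "Export drops the header row when the table is empty."),
    Task("draft release notes", "Summarize the export fix for customers."),
]
results = run_in_fresh_sessions(tasks, durable, fake_session)
print(results)
```

The design choice is that context size per task is bounded by the durable context plus the handoff, so it does not grow with the number of tasks the agent has already completed.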

The short version

It matters for cost.

Every turn re-reads and re-bills the entire conversation, so a long chat keeps getting more expensive even when your messages are short.

It matters for quality.

As the window fills, the model loses focus and starts to drift, which is known as context rot.

The simple rule.

If most of what is already in your chat is not relevant to the next thing you ask, start a new session.

The longer version

Context is everything the model re-reads each turn: setup, the full history (or its summary), and your new message.
Models have no memory, so the entire history is re-sent and re-billed on every turn. Long chats cost more, and the cost climbs even when your messages are short.
Quality also drops as the window fills. This is context rot (or context drift). It affects every model; newer ones handle it better but not perfectly.
Tools auto-compact long chats by summarizing the old parts. You do not control this in CRHQ. After the first compaction you keep bouncing around the fuller, pricier half of the window.
Caching makes repeated context roughly ten times cheaper, but only while it is warm. Returning after a long pause reloads everything near full price.
There is no magic turn count. Decide by relevance: how much of the current context helps your next move.
Golden rule: if less than 15% of your context is relevant to the next turn, start a new session.
Use the "brief a new teammate" test. The paragraph you would paste becomes your handoff into the fresh session.
For agents: delegate distinct tasks to fresh sessions, and lean on CRHQ memory and project docs to carry the durable context forward cheaply.
All of this matters more after June 15, 2026, when subsidies end and you pay full rate for every re-sent token.

The takeaway is not "always start fresh" or "never let a chat die." It is that you, the human, are the only one who can cheaply judge what in the context still matters, because the model would have to read all of it again just to decide what to drop. Make that judgment on purpose, and both your output and your bill get better.

Common questions

When should I start a new AI session instead of continuing?

When less than about 15% of what is already in your chat is relevant to your next task. A quick test: if you would not paste this whole conversation to brief a new teammate on the next task, start fresh and paste just the paragraph that matters.

Does a longer chat actually cost more?

Yes. Models have no memory, so the entire conversation is re-sent and re-billed on every turn. A short question on turn 50 can be processing 150,000 tokens of history, so the per-message cost keeps climbing even when your messages stay short.

What is context rot?

It is the drop in quality as the context window fills up. The model starts missing details, losing the thread, and repeating or contradicting itself. It affects every model, though newer ones degrade more gently, and it is separate from cost.

Does automatic compaction solve the problem?

It helps by summarizing old history when the window gets full, but it does not return you to a cheap, clean state. After the first compaction you tend to keep bouncing around the fuller, pricier half of the window, so deciding to start fresh still matters.

How does this apply to AI agents?

Agentic loops re-send the whole history on every step and pile up tool output fast, so cost and rot compound quickest there. Design agents to delegate distinct tasks to fresh sessions, and use persistent memory and project documents so a new session loads the durable context instead of rebuilding it.