DESK · THEORY
Explainer · Beginner · June 2, 2026 · 3 min read

What is a context window?

The model's working memory: everything it can see right now, measured in tokens. If the thing it needs isn't in the window, the model doesn't ask for it. It guesses.

You paste a long document, ask three follow-up questions, and on the fourth the model contradicts something you told it at the top. It's not being lazy. The top of the conversation scrolled out of view, and the model can only reason about what's in view.

A context window is the slice of text a model can hold in mind at one moment. Everything in that slice (your prompt, the files you pasted, the model's own earlier replies) is fair game for the answer. Everything outside it might as well not exist.

What it is (in plain English)

A large language model doesn't have a memory of your conversation the way a person does. It has a window. Each time you send a message, the model reads the whole window from scratch, predicts the next stretch of text, and stops. The window is measured in tokens, which are chunks of text roughly the size of a short word. A model's window might hold a few thousand tokens or a few million.

There are two ways the thing you need ends up outside the window. Either you never put it there (the model has no idea what "ARR" means in your business unless you said so), or you put it there early and the conversation grew until it pushed that material off the top. In both cases the model fills the gap the only way it can: it guesses, fluently, and you can't always tell.
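For readers who want to see the "pushed off the top" case concretely, here is a minimal sketch in Python. It rests on assumptions that are not from any particular product: tokens are approximated by counting words (real tokenizers split text differently), the names estimate_tokens and fit_to_window are hypothetical, and dropping the oldest messages first is only one common way a chat tool might stay within its limit.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


def fit_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first
    # one that would push the total over the budget.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > window_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Hypothetical conversation: the definition comes first, then a long paste.
conversation = [
    "ARR means annual recurring revenue and ours is 4 million",
    "Here is the long document you asked about " + "filler " * 20,
    "What was our ARR again",
]

# With a deliberately tiny window, the opening definition no longer fits.
visible = fit_to_window(conversation, window_tokens=35)
```

In this toy run the question about ARR is still in view, but the message that defined ARR is not, so a model working from `visible` would have nothing to answer from except a guess.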

That's the whole concept. Not a hard drive. A desk. Only what's on the desk gets used.

Why CEOs care

This is the real reason a long task degrades, and the reason a "smarter model" often isn't the fix.

An AI agent running a multi-step job fills its own window as it works: tool output, half-finished steps, earlier reasoning. The decisions it made at the start scroll off. By the end it's guessing about choices it already made, which is why something that dazzled in a five-minute demo falls apart over a multi-hour job. The model didn't get worse. Its window got crowded.

And bigger windows are not a silver bullet. Independent testing of around 18 frontier models found they all degrade as the input grows, well before they hit their maximum window, and none of them read long context evenly. The industry calls it "context rot." A million-token window is not a filing cabinet you dump everything into and trust. Past a point, the more you stuff in, the worse the recall on any one piece.

So the lever isn't a fancier model. It's managing what the model sees.

Where you'll see it

What to do next

Three habits keep the window working for you instead of against you.

Keep tasks scoped. One bounded job with a clear "done" fills less of the window than a sprawling mandate, and fails less. Give the agent a written brief and a memory: a short CLAUDE.md file about your business plus memory across sessions means the model stops guessing the things only you know. And when a thread gets bloated and the answers start drifting, don't fight it. Start fresh.
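The "start fresh" habit can be sketched as a simple check. This is an illustration under stated assumptions, not how any specific tool works: the word-count token estimate is a rough proxy, the function names are hypothetical, and the 50% threshold is an arbitrary placeholder chosen only to reflect the point that quality can slip well before the window is full.

```python
def estimate_tokens(text: str) -> int:
    """Rough proxy for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def should_start_fresh(thread: list[str], window_tokens: int,
                       threshold: float = 0.5) -> bool:
    """Return True once the thread uses at least `threshold` of the window.

    The 0.5 default is an assumption for illustration, not a measured value.
    """
    used = sum(estimate_tokens(message) for message in thread)
    return used >= threshold * window_tokens


def start_fresh(brief: str, summary: str) -> list[str]:
    """Begin a new thread that carries only the brief and a short summary."""
    return [brief, summary]


# Hypothetical brief and a thread bloated with tool output.
brief = "Brief: we sell payroll software; ARR means annual recurring revenue."
thread = [brief] + ["tool output " * 30] * 4

if should_start_fresh(thread, window_tokens=400):
    thread = start_fresh(brief, "Summary so far: steps 1 to 4 done, step 5 pending.")
```

The design choice worth noticing is what survives the reset: the written brief and a short summary go into the new thread, and the accumulated tool output does not.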

If your agents keep falling apart on real work, the deeper diagnosis is in "Why most AI agents fall apart in real work."
