Key Concepts

Context Windows

The maximum text a model can see at once, and why it's so hard to extend

What it is

The context window is the maximum number of tokens an LLM can process in a single forward pass: everything the model can "see" when generating a response. This includes the system prompt, conversation history, any retrieved documents, and the current user message.
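The accounting above can be sketched as a simple budget. This is an illustrative example, not any real API: `count_tokens` is a crude whitespace proxy for a real tokenizer, and the 8,192-token window is an arbitrary assumption.

```python
# Everything competing for the context window, sketched as a token budget.
# count_tokens is a crude stand-in; real tokenizers count differently.

def count_tokens(text: str) -> int:
    """Rough proxy for a real tokenizer (whitespace split)."""
    return len(text.split())

def context_usage(system_prompt, history, retrieved_docs, user_message,
                  window=8192):
    parts = {
        "system": count_tokens(system_prompt),
        "history": sum(count_tokens(m) for m in history),
        "retrieved": sum(count_tokens(d) for d in retrieved_docs),
        "user": count_tokens(user_message),
    }
    used = sum(parts.values())
    # Whatever is left is the room the model has for its reply.
    return parts, used, window - used

parts, used, remaining = context_usage(
    "You are a helpful assistant.",
    ["Hi there", "Hello! How can I help?"],
    ["Some retrieved passage about context windows."],
    "Summarize the passage.",
)
```

The point is that the four components share one fixed budget: growing any of them shrinks the space for the rest, including the response itself.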

Extending context windows is technically hard for two reasons: the attention mechanism's compute scales quadratically with sequence length (doubling tokens → 4x compute), and training data rarely contains sequences millions of tokens long, so models don't learn to use very long contexts well.
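The quadratic scaling is just arithmetic over token pairs: each of the n tokens attends to all n tokens, so the attention cost grows as n². A minimal illustration:

```python
# Self-attention compares every token with every other token,
# so the number of comparisons grows as n^2.
def attention_pairs(n_tokens: int) -> int:
    return n_tokens * n_tokens

base = attention_pairs(4_096)      # e.g. a 4K-token context
doubled = attention_pairs(8_192)   # doubling the context...
ratio = doubled / base             # ...quadruples the attention work
```

This is why going from, say, 128K to 1M tokens is not an 8x cost increase at the attention layer but a 64x one, absent architectural tricks like sparse or linear attention.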

In practice, longer context windows enable document analysis, long conversations, and agentic tasks, which is why they matter commercially.

Why it matters

Context window size is one of the most important practical constraints when building AI products. It determines whether you need RAG (if your data doesn't fit), how long conversations can be before history needs to be summarized, and how much code an agent can hold in its working memory. "What fits in the context window?" is a question you'll answer constantly.
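The "what fits?" question above often reduces to a simple check before each request. A hedged sketch, where the 128K window, the reply headroom, and the strategy names are all assumptions for illustration:

```python
# Decide between stuffing documents directly into the prompt
# and falling back to retrieval (RAG) when they don't fit.
# Window size and headroom are illustrative defaults, not real limits.
def plan_strategy(doc_tokens: int, window: int = 128_000,
                  reply_headroom: int = 4_000) -> str:
    if doc_tokens + reply_headroom <= window:
        return "stuff-in-context"   # everything fits; no retrieval needed
    return "rag"                    # too big; retrieve only relevant chunks

plan_strategy(50_000)
plan_strategy(500_000)
```

The same check drives conversation management: once accumulated history pushes past the budget, older turns get summarized or dropped rather than sent verbatim.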

Related concepts

Resources

Deep Dive into LLMs like ChatGPT (section: hallucinations, tool use, knowledge/working memory, ~1:20:00)
youtube.com· Frames context windows as "working memory" vs. parameters as "vague recollections." Incredibly intuitive mental model.
20 min
Intro to Large Language Models
youtube.com· The "LLM as operating system" analogy, context window = RAM, training data = hard disk. Very memorable framing for CS-adjacent students.
60 min
What Is a Context Window?
ibm.com· Authoritative explainer covering tokens, self-attention mechanics, tradeoffs of larger windows, and current model comparisons. Updated regularly.
8 min
Why Larger LLM Context Windows Are All the Rage
research.ibm.com· Goes deeper into the engineering challenges, why bigger isn't always better, the "lost in the middle" problem, and IBM's research on context compression. Good intermediate-level follow-up.
10 min