Context Windows
The maximum text a model can see at once, and why it's so hard to extend
What it is
The context window is the maximum number of tokens an LLM can process in a single forward pass: everything the model can "see" when generating a response. This includes the system prompt, the conversation history, any retrieved documents, and the current user message.
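The budgeting this implies can be sketched in a few lines. This is a toy illustration, not any real API: the whitespace "tokenizer" is a stand-in for a real subword tokenizer, and the helper names are invented for the example.

```python
# Toy sketch of context-window budgeting. Whitespace splitting is a crude
# stand-in for a real subword tokenizer (e.g. BPE), so counts are illustrative.

def count_tokens(text: str) -> int:
    """Proxy tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def fits_in_context(parts: list[str], window: int) -> bool:
    """Everything the model 'sees' must fit in the window together."""
    return sum(count_tokens(p) for p in parts) <= window

# The window must hold all of these at once:
parts = [
    "You are a helpful assistant.",          # system prompt
    "User: hi. Assistant: hello.",           # conversation history
    "Retrieved: context windows explained",  # retrieved document
    "How do context windows work?",          # current user message
]
print(fits_in_context(parts, window=128))
```

When the total exceeds the window, something has to be dropped or summarized, which is why long-context applications spend so much effort on prompt budgeting.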
Extending context windows is technically hard for two reasons: the attention mechanism's compute scales quadratically with sequence length (doubling the tokens quadruples the compute), and training data rarely contains sequences millions of tokens long, so models don't learn to use very long contexts well.
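The quadratic cost comes from attention scoring every query token against every key token, an n-by-n computation. A minimal pure-Python sketch (toy embeddings, no scaling or softmax) makes the count visible:

```python
# Why attention is quadratic: scoring every token against every other token
# is an n x n loop. Toy dot-product scores only; real attention also applies
# scaling, softmax, and a weighted sum over values.

def attention_scores(embeddings: list[list[float]]) -> list[list[float]]:
    """Dot-product score of each token against every token: n * n entries."""
    n = len(embeddings)
    return [
        [sum(q * k for q, k in zip(embeddings[i], embeddings[j]))
         for j in range(n)]
        for i in range(n)
    ]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy token embeddings
scores = attention_scores(tokens)
assert len(scores) * len(scores[0]) == 9  # 3 tokens -> 9 score entries
assert len(attention_scores(tokens * 2)) ** 2 == 36  # 2x tokens -> 4x entries
```

Doubling the sequence from 3 to 6 tokens quadruples the score matrix from 9 to 36 entries, which is the scaling the paragraph above describes.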
In practice, longer context windows enable whole-document analysis, extended conversations, and multi-step agentic tasks, which is why they matter commercially.