PaPoo

What is a context window?

A context window is the amount of text a language model can read and keep in mind at once while it generates a response.

Why it matters

The context window sets the practical limit for what you can ask a model to consider in a single turn: your prompt, system instructions, chat history, retrieved documents, and the model’s own recent output all compete for that space.

In practice, it matters when you want the model to:

If the relevant information does not fit in the window, the model cannot directly attend to it. That is why long-context support is a core capability for chat assistants, coding tools, and retrieval-augmented generation systems.

How it works

A transformer language model processes a sequence of tokens, not raw characters or words. The context window is the maximum number of tokens, counting both the input and the output generated so far, that the model is designed to handle at once.
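Because limits are counted in tokens rather than characters, it helps to estimate token counts before sending text. The sketch below is only a rough illustration: it assumes the common rule of thumb of about four characters per token for English text, and the names estimate_tokens and fits_in_window are hypothetical. Real tokenizers vary by model and language, so exact counts require the model's own tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count.

    Assumption: ~4 characters per token, a heuristic for English text.
    This is NOT a real tokenizer and can be far off for code or other languages.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def fits_in_window(text: str, window_tokens: int) -> bool:
    """Return True if the estimated token count is within the window."""
    return estimate_tokens(text) <= window_tokens
```

An estimate like this is useful for quick sanity checks; anything close to the limit should be measured with the actual tokenizer.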

As the model reads tokens, it uses attention to weigh which earlier tokens are most relevant to the next token. Tokens inside the window can influence the output; tokens outside it cannot. If the input is longer than the window, something has to give: the app may truncate older messages, summarize them, or split the task into chunks.
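Of the options above, truncating older messages is the simplest to sketch. The example below is a minimal illustration, not how any particular app does it: truncate_history and count_tokens_stub are hypothetical names, and the stub counts whitespace-separated words as a stand-in for a real tokenizer. It always keeps the system instructions and then keeps the most recent messages that still fit.

```python
from typing import Callable, List


def count_tokens_stub(text: str) -> int:
    # Stand-in counter: whitespace-separated words, NOT real tokens.
    return len(text.split())


def truncate_history(
    system: str,
    history: List[str],
    window_tokens: int,
    count: Callable[[str], int] = count_tokens_stub,
) -> List[str]:
    """Keep the system text plus the newest messages that fit in the window."""
    budget = window_tokens - count(system)
    if budget < 0:
        raise ValueError("system instructions alone exceed the window")

    kept: List[str] = []
    # Walk from newest to oldest; stop at the first message that does not fit
    # so the kept history stays a contiguous run of recent messages.
    for message in reversed(history):
        cost = count(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost

    kept.reverse()  # restore chronological order
    return [system] + kept
```

For example, with a 5-"token" window, a 2-word system line, and messages of 3, 2, and 1 words, the oldest message is dropped and the two newest are kept. A production version would also reserve room for the model's reply.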

A useful way to think about it: the context window is not long-term memory. It is more like the model’s working scratch space for the current interaction.

Tiny concrete example

Suppose a chat app has a 16k-token context window and your conversation includes:

That already fills the window. If you add more text, the app must drop or compress something before sending the request. If the answer depends on an earlier message that was pushed out, the model may behave as if it never saw it.
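The arithmetic behind a full window can be sketched as a simple budget. The breakdown below is purely hypothetical (the part names and numbers are illustrative assumptions, not figures from this article), and it treats "16k" as 16,000 tokens even though some models define it as 16,384.

```python
# Assumption: "16k" is taken as 16,000 tokens for round numbers.
WINDOW_TOKENS = 16_000

# Hypothetical breakdown of what occupies the window.
parts = {
    "system_instructions": 1_000,
    "chat_history": 9_000,
    "retrieved_documents": 5_000,
    "reserved_for_reply": 1_000,
}


def remaining(window: int, used: dict) -> int:
    """Tokens still free in the window (negative means over budget)."""
    return window - sum(used.values())


def overflow_after_adding(window: int, used: dict, new_tokens: int) -> int:
    """How many tokens must be dropped or compressed to add new_tokens."""
    return max(0, sum(used.values()) + new_tokens - window)
```

With these numbers the window is exactly full, so adding a 500-token message means 500 tokens have to be dropped or compressed somewhere else first.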

Common pitfalls / when NOT to use it

In practice, most teams treat the context window as a budget: use it for the most relevant instructions and evidence, and offload the rest to retrieval, memory, or summaries.
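One way to picture the budget idea is a greedy selection of the most relevant snippets that fit. This is a sketch under stated assumptions: Snippet and pack_by_relevance are hypothetical names, the relevance scores are assumed to come from some upstream retrieval step, and the token counts are assumed to be known already.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    text: str
    relevance: float  # higher means more relevant; scoring is out of scope here
    tokens: int       # token cost, assumed precomputed


def pack_by_relevance(snippets: List[Snippet], budget_tokens: int) -> List[Snippet]:
    """Greedily pick the highest-relevance snippets that fit the budget."""
    chosen: List[Snippet] = []
    remaining = budget_tokens
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if snippet.tokens <= remaining:
            chosen.append(snippet)
            remaining -= snippet.tokens
    return chosen
```

Greedy selection is simple but not optimal: it can skip a combination of smaller snippets that would have used the budget better. Whatever does not fit is what gets offloaded to retrieval, memory, or summaries.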

Related terms
