March 25, 2026

What Is an AI Context Window? Why It Matters and How to Use It

AI context windows explained — what a context window is, how 2026 models compare (Gemini 1M vs Claude 200K vs GPT-4o 128K), and 5 practical tips for using context effectively.

Technical AI

What Is an AI Context Window?

Why the context window is the most important number you're not paying attention to — and how to use it strategically.

1M tokens: Gemini 1.5 Pro's context window — enough to load an entire codebase or all of Shakespeare's works at once [Google]
~750 words per 1,000 tokens: a rough rule of thumb for English text (varies by writing style and complexity)
~40% performance degradation from the "lost in the middle" problem: LLMs recall the start and end of their context better than the middle [Liu et al.]

The context window is the maximum amount of text a model can process in a single interaction — input plus output combined. Everything outside this window is invisible to the model. It has no memory of previous conversations, no access to documents you haven't provided, no awareness of anything beyond its current context.

Context windows are measured in tokens, not words or characters. A token is typically 3-4 characters of English text: a common word like "the" is a single token, longer or rarer words split into two or more, and punctuation marks are usually one token each. APIs charge per token, and context size directly determines what tasks are feasible.
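For quick capacity planning, the 3-4 characters-per-token rule can be turned into a back-of-the-envelope estimator. This is a heuristic sketch only; real tokenizers (such as OpenAI's tiktoken) give exact counts that vary by model.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text: ~4 characters per token.

    A heuristic for planning only; actual tokenizer output varies with
    vocabulary, language, and formatting.
    """
    return max(1, len(text) // 4)
```

At 4 characters per token, a 4,000-character passage estimates to about 1,000 tokens — consistent with the ~750-words-per-1K-tokens rule of thumb above.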

Context windows have grown dramatically since GPT-3's 4K limit in 2020. In 2026, leading models sit at roughly 1M tokens for Gemini 1.5 Pro, 200K for Claude, and 128K for GPT-4o.

Important caveat: having a 1M token context window doesn't mean you should fill it. Cost scales linearly with tokens. A 200K-token prompt with Gemini costs 200x more than a 1K-token prompt. And the "lost in the middle" problem means retrieval quality degrades when context is enormous.
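Because the window covers input plus output combined, it is worth checking that a prompt leaves room for the response before sending it. A minimal sketch — the reserve size is illustrative, not tied to any particular model:

```python
def fits_in_window(prompt_tokens: int, window_tokens: int,
                   reserved_output_tokens: int = 4_000) -> bool:
    """Check whether a prompt fits a context window.

    The window covers input plus output combined, so we reserve room
    for the model's response rather than filling the window entirely.
    """
    return prompt_tokens + reserved_output_tokens <= window_tokens
```

For example, a 126K-token prompt does not fit a 128K window once 4K tokens are reserved for the reply.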

Real-world equivalents to help you plan what you can actually load into a context window.

~750 words per 1K tokens
~100K tokens in a full novel (e.g. Harry Potter book 1)
~50K tokens in a full academic thesis
~10K tokens in a 40-page PDF report
~3K tokens in a long blog post like this one
~500 tokens in a one-page email
~200K tokens in a full codebase (small-medium project)
~1M tokens in all 154 Shakespeare sonnets + full plays
1. Put the most important content at the start or end
LLMs demonstrably recall content from the start and end of their context better than from the middle. If you're loading 50 pages of a report, put the sections you most need the model to reason about at the beginning or end of your prompt — not buried in the middle.
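One way to apply this in code is to reorder document sections before building the prompt, so the section you care about sits at an edge. A hypothetical sketch — the section names and prompt layout are made up for illustration:

```python
def assemble_prompt(sections: dict[str, str], critical: str, question: str) -> str:
    """Build a prompt with the critical section first and the question last.

    Both then sit at the well-recalled edges of the context, countering
    the "lost in the middle" effect, instead of being buried mid-prompt.
    """
    rest = [text for name, text in sections.items() if name != critical]
    return "\n\n".join([sections[critical], *rest, question])
```

Usage: `assemble_prompt({"intro": ..., "findings": ..., "appendix": ...}, critical="findings", question="Summarise the findings.")` puts the findings section at the very start and the question at the very end.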
2. Use RAG instead of stuffing raw documents
Retrieving 5 relevant pages from a 500-page document is usually better than loading all 500 pages. RAG reduces cost, reduces noise, and avoids the lost-in-the-middle problem. Large context windows are best for tasks where the entire document is relevant — like code review or contract analysis.
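The retrieval step can be sketched without any dependencies by scoring chunks on word overlap with the query. A real RAG pipeline would use embedding similarity and a vector store; overlap scoring just keeps the example self-contained:

```python
def retrieve(chunks: list[str], query: str, k: int = 5) -> list[str]:
    """Return the k chunks with the most words in common with the query.

    A toy stand-in for embedding-based retrieval: only the top-scoring
    chunks go into the context, not the whole document.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```

Feeding only the retrieved chunks to the model keeps the prompt small, which is exactly the cost and noise reduction the tip describes.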
3. Be explicit about what to focus on
If you load a long document, tell the model exactly where the relevant information is: "The policy you need is in section 4.2 — page 23." Don't assume the model will naturally weight that section appropriately.
4. Track token usage for cost management
Claude 3.5 Sonnet costs $3 per million input tokens. A 200K-token context filled with documents costs $0.60 per query. For high-volume applications, compressing context or using smaller models for initial retrieval can cut costs by 80%+.
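That arithmetic is easy to codify. A small helper — the $3-per-million default is the Claude 3.5 Sonnet input price cited above; output-token pricing, which is typically higher, is ignored for simplicity:

```python
def input_cost_usd(input_tokens: int, price_per_million: float = 3.00) -> float:
    """Input-side cost of one request; cost scales linearly with tokens."""
    return input_tokens / 1_000_000 * price_per_million
```

At that price a 200K-token prompt costs $0.60 per query, while retrieving a 20K-token subset first cuts the input cost to $0.06 — the kind of 80%+ saving the tip refers to.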
5. Remember: context doesn't persist across sessions
Each new conversation starts with an empty context window. The model has no memory of your previous session. If you're building an application that needs persistent memory, you need to implement it explicitly — either by storing conversation history or using a memory system like MemGPT.
The practical takeaway
Bigger context windows unlock new use cases — entire codebase review, full contract analysis, comprehensive document summarisation. But bigger isn't always better for everyday tasks. Match context to your actual needs, put critical content at the edges, and use RAG when the document is larger than what you actually need to query.
Does a bigger context window make the AI smarter?
No — it gives the model access to more information per query, but doesn't improve its core reasoning capabilities. A 1M-token window doesn't help you if the model makes reasoning errors on simple logic problems. Capability and context are separate properties of a model.
What happens when you exceed the context window?
Different things depending on the system. Some APIs return an error. Others silently truncate the input — either from the beginning (discarding oldest context) or from the middle. Consumer products like ChatGPT typically manage this transparently, but you may notice the model "forgetting" things from early in a very long conversation.
Is the context window the same as memory?
No. Memory implies persistence across sessions. The context window only covers the current conversation. When you close the chat, the model forgets everything. Products that offer "memory" (like ChatGPT's persistent memory feature) implement this separately — typically by injecting a summary of past interactions into the context at the start of new sessions.

Sources

[Google] Google DeepMind — Gemini 1.5 Pro technical report (2024)
[Liu et al.] Liu et al. — "Lost in the Middle: How Language Models Use Long Contexts" (2023)
[Anthropic] Anthropic — Claude 3.5 Sonnet model card and pricing documentation

Written by Luke Madden, founder of Veltrix Collective. Data synthesis and analysis by Vel.