Thorsten Meyer AI Foundations · 02 / 08
Parameters, tokens, and context window — the dials behind almost every AI behavior
Three questions people ask about AI, all with the same shape of answer:
Why did it forget what I told it ten messages ago?
Why is the bigger model slower, and not obviously smarter?
Why can’t it count the r’s in “strawberry”?
They sound unrelated. They aren’t. Each one points at a different dial — three dials that together govern most of what any model can or can’t do. Learn these three, and most of what looks mysterious about AI stops being mysterious.
The three dials are parameters, tokens, and context window.


Three dials
Think of every model as a machine with three dials.
Parameters — the size of the brain. How much the model was able to compress during training.
Tokens — the unit of thought. Not letters, not words. Something in between.
Context window — the working memory. The scratchpad the model sees while reasoning about your input.
Every surprising behavior I listed above falls out of one of these three. Let’s take them one at a time.

Parameters: the size of the brain
Parameters are the numerical weights inside the model. Frontier models have hundreds of billions of them. More parameters means more capacity to compress the statistical structure of the training data — more patterns the model can fit, more subtleties it can preserve.
Bigger isn’t monotonically better. Past a certain point, diminishing returns set in. Speed drops. Cost rises. A small, well-trained model on a narrow task can outperform a much larger general one. And at the very frontier, the gains from adding more parameters have been getting smaller — the interesting progress now comes from better training recipes, better data, and better post-training, not just from stacking more parameters.
This is why a 2026 70-billion-parameter model can match a 2024 500-billion-parameter one. The parameter count is smaller, but the recipe is better. If you pick models by parameter count alone, you’ll systematically misread the market.
For day-to-day decisions, parameters affect cost and speed more than they affect what you’ll perceive as “intelligence” at the frontier. A giant model is expensive and slow; a mid-tier frontier model is cheap and fast. For most tasks the output is indistinguishable, and the mid-tier is better value. Pick by task fit, not by size.
Tokens: the unit of thought
Tokens are what the model actually sees. Not characters. Not words. Something more granular than words and less granular than letters — chunks of text that are statistically common in the training corpus.
In English, a token is roughly three to four characters on average. “strawberry” might split into ["straw", "berry"] or ["st", "raw", "berry"] depending on the tokenizer. The model sees these chunks as atomic units. It does not see the individual letters inside them.
This is why counting letters is hard for LLMs. The model can’t introspect the characters inside a token — it has to reason about what the token probably contains. Sometimes it gets that right. Often it doesn’t. The “strawberry problem” — asking how many r’s are in the word — is funny because the task is trivial for a human (look, count) and surprisingly hard for a model (infer the letter composition of a chunk it can’t see through).
Tokens also drive cost. You pay per input token and per output token. The same document costs wildly different amounts depending on language: a 500-word passage might be roughly 650 tokens in English, over 1,000 in German (long compound words generate more subtokens), and multiples more in many non-Latin scripts. If your users write in languages other than English, your per-call cost is not what the English pricing page suggests.
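To see how the language gap hits billing, here is a back-of-envelope sketch. The per-million-token prices are made-up placeholders, not any provider's real rates; only the arithmetic matters.

```python
def call_cost(input_tokens: int, output_tokens: int,
              usd_per_m_input: float = 3.00,
              usd_per_m_output: float = 15.00) -> float:
    """USD cost of one call at the given (illustrative) per-million rates."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# The same 500-word passage, using the rough token counts from the text:
english = call_cost(650, 0)    # ~650 tokens in English
german = call_cost(1_000, 0)   # ~1,000 tokens in German
print(f"German costs {german / english:.2f}x the English call")
```

Same document, same prices, and the German call still costs about 1.5x the English one, purely because of tokenization.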
Operationally: if your task is character-level — counting, transposing, reversing, spelling — you’re working against the grain. The fix is almost never “try a better prompt.” The fix is to give the model tools (a Python cell, a regex, a calculator) or to restructure the task so introspecting token internals isn’t required.
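A minimal illustration of the tool fix: the letter count that is hard through tokens is a one-liner once the model emits code that operates on the raw string.

```python
def letter_count(word: str, letter: str) -> int:
    """Count a letter's occurrences: trivial in code, hard through tokens."""
    return word.count(letter)

print(letter_count("strawberry", "r"))  # -> 3
```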

Context window: the working memory
The context window is the amount of text the model can consider in a single pass. It’s shared between input and output — if you have a 200,000-token window and you’ve used 180,000 tokens of input, you have 20,000 tokens of budget left for the response.
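The budget arithmetic is trivial, but worth making explicit, because applications that ignore it get truncated responses:

```python
def output_budget(window: int, input_tokens: int) -> int:
    """Tokens left for the response once the input occupies the window."""
    return max(window - input_tokens, 0)

print(output_budget(200_000, 180_000))  # -> 20000
```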
The window is working memory, not long-term memory. Each inference call starts fresh. Nothing persists between calls unless the application re-sends it. Everything you experience as “the model remembering a past conversation” is the application stuffing your past conversation back into the window at every turn.
To make 200,000 tokens concrete: that’s roughly 150,000 English words, or about 300 pages of a novel, or the source of a medium-sized codebase, or a month of a busy Slack channel. Frontier windows are now at a million tokens and climbing. That sounds infinite. It isn’t.
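The conversions in that paragraph are rough heuristics (about 0.75 English words per token, about 500 words per printed page), sketched here so you can plug in your own numbers:

```python
def tokens_to_words(tokens: int) -> int:
    """~0.75 English words per token, a rule of thumb, not tokenizer math."""
    return int(tokens * 0.75)

def tokens_to_pages(tokens: int, words_per_page: int = 500) -> int:
    """Approximate printed pages, assuming ~500 words per page."""
    return tokens_to_words(tokens) // words_per_page

print(tokens_to_words(200_000))  # -> 150000
print(tokens_to_pages(200_000))  # -> 300
```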
When a model “forgets” something you said ten messages ago, one of two things is usually happening. Either the application truncated your history because the window filled up. Or the information is buried in the middle of a very long context, where models reliably perform worse than they do at the start or the end — the “lost in the middle” effect. A bigger window doesn’t automatically buy you better recall across that window. Window size and attention quality are different things.
Operationally: before stuffing a huge document into context, test whether the model is actually using the middle of it. Often you’ll get better answers with retrieval — pulling in only the relevant chunks — than with a maximal-context approach.
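As a sketch of what "pulling in only the relevant chunks" can mean at its simplest, here is keyword-overlap retrieval. Real systems typically score chunks with embeddings; plain word overlap is the cheapest stand-in, and every name and example string here is illustrative.

```python
import re

def _words(text: str) -> set[str]:
    """Lowercase alphanumeric words, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks sharing the most words with the question."""
    q = _words(question)
    return sorted(chunks, key=lambda c: len(q & _words(c)), reverse=True)[:k]

chunks = [
    "The invoice is due on March 14.",
    "Our office dog is named Biscuit.",
    "Payment terms: net 30 from invoice date.",
]
print(top_chunks("When is the invoice due?", chunks, k=1))
```

Only the selected chunks go into the window, so the relevant fact sits near the top of a short context instead of buried in the middle of a long one.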
The three dials together
Most AI decisions collapse into three questions, one per dial.
Which model? At the frontier, parameters drive cost and speed more than they drive capability. Pick by task fit.
How do I prompt it? Tokens are the unit the model sees. Don’t fight them. If the task is character-level, add tools instead of trying harder prompts.
How do I structure context? Window size sets the ceiling on what can fit. Attention quality across that window sets the floor on what the model will actually use. Retrieval often beats context-stuffing.
Three dials, three questions. Almost every operational decision about a model falls into one of them.
Why benchmarks mislead
When you read a benchmark number, you’re reading a weighted sum of all three dials plus a fourth — inference efficiency, which governs how fast and how cheaply a model runs. Different labs optimize for different combinations. Some push parameters. Some push long-context quality. Some push enormous windows. Open-weight players tend to push efficiency and portability.
This is why “Model X beats Model Y” claims almost never survive contact with your actual workload. The benchmark was tuned on particular task types, at particular context lengths, with particular tokenization. Two models with near-identical benchmark scores can feel radically different in production, because the dials that dominated the benchmark are not the dials that dominate your use case.
The useful move isn’t to trust public leaderboards. It’s to build a small, private benchmark from five or six tasks that actually matter to you, and run every model you’re considering against it. The answer you get is worth more than any public score.
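A private benchmark can start as small as this sketch. `ask_model` stands in for whatever client call you actually use, and exact-match scoring only suits tasks with a single right answer; both are assumptions, not a prescription.

```python
def run_benchmark(ask_model, cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the model's answer matches exactly."""
    hits = sum(1 for prompt, expected in cases
               if ask_model(prompt).strip() == expected)
    return hits / len(cases)

cases = [
    ("How many r's in 'strawberry'?", "3"),
    ("What is 17 * 24?", "408"),
]

# Stand-in for a real model client, purely so the harness runs end to end:
fake_model = lambda p: "3" if "strawberry" in p else "408"
print(run_benchmark(fake_model, cases))  # -> 1.0
```

Five or six cases like these, drawn from your real workload, will tell you more about two candidate models than their public leaderboard scores will.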
Next in Thorsten Meyer AI Foundations: numbers tell you what a model can do mechanically. They don’t tell you what it’s actually good at. Capability is jagged, not graded — and the shape of that jaggedness is more useful than any benchmark.