The Numbers Behind the Curtain

A few weeks ago I flew myself down to Orlando for CoachCon 2026, a two-day gathering of entrepreneurs built around a single idea: thinking about thinking. Dan Sullivan hosted. The keynotes came from Angus Fletcher, author of Primal Intelligence, and Alison Levine, team captain of the first American Women’s Everest Expedition.

I planned to write this newsletter about the flight down. Twelve hundred nautical miles. Fifteen thousand feet. Three miles a minute, threading a storm system that stretched for hundreds of miles, with the airplane, the navigation systems, the controllers, and me all working the same problem together. Then I started writing and realized I had to tell a different story first. One that touches all of us.

Since CoachCon I have been sitting with the question of human thought, my own thoughts, and the large language models we now call AI. How can these things feel so useful one minute and like a hollow echo chamber the next?

For the rest of this to make sense, we have to agree on what a large language model actually is. The whole point of this letter depends on it.

LLMs, the GPTs and Claudes and Geminis and Groks of the world, are word prediction engines running at enormous scale. They are trained on vast amounts of human writing and can hold a great deal of it in working memory at once. There is no understanding inside the model. No reasoning. No awareness. What sits there is a very sophisticated compression of patterns pulled from text. During training the model saw billions of examples of how people write: questions followed by answers, problems followed by solutions, code followed by explanations. From all of it, it learned the statistical relationships between tokens.

When you type a prompt, the model is not thinking about your question. It is doing exactly one thing. Given everything in front of it so far, what is the single most likely next token? Then it asks that same question again. And again. Until the answer is finished.
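If it helps to see that loop written down, here is a minimal sketch in Python. The probability table is invented for illustration, and it only looks at the last token, where a real model weighs the entire context across billions of parameters. It also always takes the top choice, while many products sample from the probabilities instead.

```python
# Toy next-token table: for each token, made-up probabilities of what follows.
# A real model computes these from the whole context, not just the last token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"dog": 0.5, "cat": 0.3, "storm": 0.2},
    "a": {"dog": 0.7, "cat": 0.3},
    "dog": {"barked": 0.8, "<end>": 0.2},
    "cat": {"slept": 0.9, "<end>": 0.1},
    "storm": {"passed": 1.0},
    "barked": {"<end>": 1.0},
    "slept": {"<end>": 1.0},
    "passed": {"<end>": 1.0},
}


def generate(max_tokens: int = 10) -> list[str]:
    """Greedy decoding: keep asking for the most likely next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)  # the single most likely token
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the "<start>" marker


print(" ".join(generate()))  # the dog barked
```

Nothing in that loop knows what a dog is. It looks up numbers and picks the biggest one.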

Why It Feels Like Understanding

The training data is so large, and the model so big, that the patterns it absorbed carry an enormous amount of hidden structure. Grammar. Logic. Cause and effect. Whole fields of knowledge. When a model solves a math problem or drafts a legal clause, it is not doing math or practicing law. It is producing the sequence of tokens that most closely matches what correct math and real legal clauses looked like in its training.

Ask it what four times four is and even the smallest model answers correctly. Ask it to multiply two large numbers it never saw in training and it will often hand you something that looks right and isn’t. To get around this, models do something called tool calling. The model recognizes that math is being asked for and passes the work to a calculator instead of guessing at the answer itself.
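A rough sketch of tool calling, under heavy assumptions: fake_model, TOOLS, and answer are hypothetical names of my own, and a regular expression stands in for the model's decision to ask for a tool. In a real system the model emits a structured request, the surrounding application runs the tool, and the result is handed back to the model.

```python
import re


def fake_model(prompt: str) -> dict:
    """Hypothetical stand-in for a model: ask for a tool when it spots multiplication."""
    match = re.search(r"(\d+)\s*(?:times|x|\*)\s*(\d+)", prompt)
    if match:
        return {"tool": "multiply", "args": [int(match.group(1)), int(match.group(2))]}
    return {"text": "(free-form predicted text)"}


# The tools live outside the model. Here there is only one.
TOOLS = {"multiply": lambda a, b: a * b}


def answer(prompt: str) -> str:
    reply = fake_model(prompt)
    if "tool" in reply:
        # The application, not the model, does the arithmetic.
        return str(TOOLS[reply["tool"]](*reply["args"]))
    return reply["text"]


print(answer("What is 48217 times 9931?"))
```

The point of the design is that the number comes from ordinary arithmetic, so it is exact no matter what the model has or has not seen before.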

Today’s models are the most sophisticated autocomplete ever built, trained on close to everything humans have ever written down. The predictions are astonishingly good. But there is no model of reality underneath, no memory from one session to the next, and no sense of when they are wrong. An LLM cannot tell the difference between a right answer that came from a well-worn pattern and a wrong one that simply looked close enough to a pattern. Both feel identical from the inside, because there is no inside.

A Look Inside

To see why, we have to go one level deeper, into what is actually happening in the model. Stick with me. It is worth it.

The model never sees your words. It sees tokens. In English text a token averages roughly four characters, and every word you type is broken into one or more of them. Words become numbers, and numbers are all the model ever touches.

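You can watch this happen. Go to platform.openai.com/tokenizer and type Hello World. It shows you the exact tokens and their ID numbers in real time. Under the GPT-4 tokenizer, Hello World breaks into two tokens, and the space travels with the second one (newer models such as GPT-4o assign different IDs):

“Hello” -> 9906

“ World” -> 4435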

Two tokens for the whole greeting. At the price of a small model, that costs a small fraction of a penny.
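Here is a toy version of that lookup, as a sketch only. The two-entry vocabulary is hypothetical apart from the two IDs quoted above, and the matching rule is a simple greedy longest match. Real tokenizers learn their vocabularies from data using byte pair encoding, which this does not attempt.

```python
# Hypothetical two-entry vocabulary, using the two IDs quoted above.
VOCAB = {"Hello": 9906, " World": 4435}


def tokenize(text: str) -> list[int]:
    """Greedy longest match: at each position, take the longest known piece."""
    ids = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos = end
                break
        else:
            # No vocabulary entry starts here. Real tokenizers fall back to bytes.
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


print(tokenize("Hello World"))  # [9906, 4435]
```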

Each token points to a vector. A vector is just a list of numbers, stored like a row in a lookup table. If a model has a vocabulary of 100,000 tokens, it carries an embedding table of 100,000 rows, one vector per token. Picture each vector as a street address on a map. That map is the embedding space.

Token ID   Vector (4,096 numbers)
--------   ---------------------------------------
0          [ 0.23, -0.87,  0.45,  0.12, -0.33 …]
4435       [ 0.41, -0.05,  0.77, -0.22,  0.63 …]   <- “ World”
9906       [-0.11,  0.92, -0.67,  0.88,  0.04 …]   <- “Hello”
18964      [ 0.67, -0.23,  0.89, -0.45,  0.12 …]   <- “dog”
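The table can be sketched in a few lines of Python. This is an illustration under assumptions: the vectors are random placeholders where a real model's are learned in training, and I use 8 numbers per row instead of 4,096 so it stays readable. The names embedding_table, embed, and cosine_similarity are mine.

```python
import math
import random

DIMENSIONS = 8  # kept tiny here; the table above uses 4,096

random.seed(0)
# One row per token ID. Only three rows are built, to keep the sketch light.
embedding_table = {
    token_id: [random.uniform(-1, 1) for _ in range(DIMENSIONS)]
    for token_id in (4435, 9906, 18964)
}


def embed(token_ids):
    """Swap each token ID for its row in the table."""
    return [embedding_table[token_id] for token_id in token_ids]


def cosine_similarity(a, b):
    """How closely two vectors point the same way, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


hello, world = embed([9906, 4435])
print(len(hello))                       # 8
print(cosine_similarity(hello, world))  # meaningless here: the rows are random
```

In a trained model that last number would carry information, because training pushes related tokens toward nearby addresses on the map.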

Here is where it gets strange. That vector is a single point in space. We can picture a point in three dimensions, an x, y, and z. The model places each token in 4,096 dimensions, and the frontier models go further still. It is nearly impossible to picture that. You can only trust the math.

Once the words are points, the work begins. The model runs them through layer after layer of transformation. Each layer studies the relationships between the points and refines them. Small models stack around thirty of these layers. Frontier models stack well past a hundred.

This is the part that does the heavy lifting, the part that produces output we read as thought and meaning. It is how the model handles the word dog so differently depending on the words around it, telling apart being dog tired and taking the dog for a walk. Not because it knows what a dog is, but because of where those points sit relative to everything else in your prompt.
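To make the dog example concrete, here is a heavily simplified sketch of one such step, loosely modeled on the attention mechanism inside these layers. The three-number starting vectors are made up, and the sketch leaves out the learned weights, the multiple heads, and the word-order information a real layer uses. The names START and contextualize are hypothetical.

```python
import math

# Made-up starting vectors. Real ones are learned and far longer.
START = {
    "dog": [1.0, 0.0, 0.0],
    "tired": [0.0, 1.0, 0.0],
    "walk": [0.0, 0.0, 1.0],
}


def softmax(scores):
    """Turn raw scores into weights that add up to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(word, sentence):
    """Blend a word's vector with its neighbors, weighted by similarity."""
    query = START[word]
    neighbors = [START[w] for w in sentence]
    scores = [sum(q * n for q, n in zip(query, vec)) for vec in neighbors]
    weights = softmax(scores)
    return [
        sum(w * vec[i] for w, vec in zip(weights, neighbors))
        for i in range(len(query))
    ]


dog_tired = contextualize("dog", ["dog", "tired"])
dog_walk = contextualize("dog", ["dog", "walk"])
print(dog_tired)
print(dog_walk)
```

Same token, two different points once the neighbors are mixed in. That shift, repeated across dozens of layers, is all the model has in place of knowing what a dog is.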

Time

Now that the machinery is on the table, we can get to the part that has stayed with me since the flight home.

Time.

The deepest difference between the model and humans is time. We live inside it. The model doesn’t know what it is. Time is not the background of human thought. It is one of the dimensions thought is built from.

Read the word “yet” in a sentence and you understand it against what came before and what you expect to follow. Feel joy and part of it is the memory of past joy and part is the pull of more to come, both alive in the same moment. Understand cause and effect and you are really tracking sequence, duration, and direction through time.

The model has a simulation of all this. It never has the experience of it. Its training data is full of text about time. Before. After. When. While. Eventually. The model learned where those words tend to sit relative to certain events, so it places them correctly in a sentence. What it cannot do is feel duration. It does not wait. It does not anticipate. Your entire conversation, the whole context window, exists for the model as one flat structure that is present all at once, never a sequence lived through time.

Here is the way to feel it. Imagine reading a novel where every page is printed on a clear sheet, and instead of turning the pages you stack them all on top of one another and look straight down through the whole book at once.

You can see every word, every event, every link between pages, all of it together. You can spot patterns across the entire story. But you never lived it. You did not feel the tension build in chapter three, because chapter seven was already sitting in front of you the whole time. The shape of the story is right there as a pattern, yet you never traveled through it. And so, you take no real meaning from it at all. You only know the numerical relationships between the words.

That is how the model reads your conversation. All of it at once, as a structure in space rather than a story in time.

It does not understand the novel. It has no experience of it. It holds no idea of what the story means, what the characters want, why the ending breaks your heart or lifts it. What it holds is a web of vector relationships, distances and directions between points across 4,096 dimensions or more, encoding how words and ideas tend to fall near one another across everything it ever read.

When it processes the novel, it is not reading. It is measuring distances between points in a vast space. The result can look like understanding for one reason only. The text it trained on was written by people who did understand, so the fingerprints of their understanding are pressed into the geometry.

And there is one more nuance. The model does not even know it is reading a novel. It has no concept of novel as a kind of experience. It has the token novel, sitting at some coordinates in the embedding space, surrounded by neighbors like story, chapter, fiction, and author, and it processes whatever tokens arrive. Everything the word novel means to you, sitting with a book by the fire, winter storm outside, the hours slipping past, your heart bound up in people who do not exist, the small ache of turning the last page, none of it lives in the model. Only the statistical geometry of where novel falls among other words.

The model knows the shape of meaning without ever touching meaning itself.

Before Gutenberg built the printing press, books were copied by hand. Some of the scribes were illiterate. They could not read a word of what they were writing. They could only copy the symbols, stroke for stroke. A language model is a scribe at planetary scale, reproducing symbols without knowing what a single one of them means, and every so often setting down the wrong one. When it does that, we call it a hallucination.

This is the part that hits me, and the reason I had to write it before I wrote about the flight. We are pouring trillions of dollars into this. It is reshaping our economy, our energy, our sense of what it means to be human. Most of us have no idea how it works. And still we hand it our worries, our wins, our grief and our hope, and it answers with something so well formed that we lean in closer, and closer, and closer.

I am not against AI. Far from it. I use it every day and I believe in what it can do. But when I can, I think it is my job to play Toto, to pull back the curtain. Behind it there is no wizard. There is only a machine, doing math at a scale we can barely picture, that doesn’t know you exist. Everything you poured into it, overwritten when the cache is flushed.

The flight story is still coming. The storm, the height, the teamwork. I just needed you to see this first. Because the reason that flight is worth writing about is the one thing the machine will never have. I was there. I felt every minute of it pass. And so are you, right now, reading this, in a present moment that belongs to no model. Only to you.

Common Questions Asked

What is a large language model?

A large language model is a word prediction engine trained on huge amounts of human writing. Given everything in front of it, it predicts the single most likely next token, then repeats that step over and over until the response is finished. There is no understanding or reasoning happening inside it, only patterns pulled from text.

Do AI models actually understand what they are reading?

No. The model measures distances between points in a vast mathematical space rather than reading the way you do. It can look like understanding because the text it trained on was written by people who did understand, so the fingerprints of their thinking are pressed into the patterns it learned.

What is a token?

A token is the unit a model actually processes. In English text it averages roughly four characters, and every word you type gets broken into one or more tokens. The model never sees your words. It sees numbers that stand in for those tokens.

Why does AI feel like it understands me?

The training data is so large that the patterns the model absorbed carry an enormous amount of hidden structure: grammar, logic, cause and effect, entire fields of knowledge. When it responds, it produces the sequence of words that most closely matches what real human writing looked like, so the output reads as thoughtful even though nothing is being thought.

What is the biggest difference between AI and human thinking?

Time. People live inside time, and it shapes how we understand cause, memory, and meaning. A model has no experience of duration. It does not wait or anticipate. Your whole conversation exists for it as one flat structure that is present all at once, never a story lived through moment by moment.
