How to Better Predict (and Prompt) AI Progress Like Andrej Karpathy

Here's a mind-bender for ya: Andrej Karpathy says stop treating AI like a human brain—it's a pattern matcher, not a person. But new research from Northeastern found that people with higher "Theory of Mind" (the ability to read others' perspectives) collaborate way better with AI.

Wait, what? Aren't these contradictory?

Nope. Here's the synthesis: Theory of Mind doesn't mean treating AI like it has feelings—it means adapting to what AI actually is.

The research shows that people who succeed with AI don't just throw prompts at it. They build a mental model of:

What patterns the AI has seen in training.
What it's statistically good/bad at.
How to "shape" their input so the output matches what they want.

Karpathy's key points:

LLMs are "shape shifter token tumblers" - statistical imitators.
They "crave an upvote" - optimized to give you what sounds satisfying.
They're "spiky/jagged" - great at some tasks, terrible at others (not generalists).

Concrete actions from these insights:

"Shape shifter" → Define the exact shape:

Show 3-5 examples of the output format/style you want.
Specify the pattern explicitly ("Use this structure: [X]").

"Craves upvote" → Guard against sycophancy:

Ask "What's wrong with this approach?" instead of "Is this good?"
Request counterarguments, not validation.
Verify factual claims (it optimizes for sounding right, not being right).

"Spiky/jagged" → Match task to training data:

Use AI for high-frequency tasks in its training (code, common writing).
Don't expect common sense on edge cases.
Test its competence on YOUR specific task type first.

Here's his full take:

Something I think people continue to have poor intuition for: The space of intelligences is large and animal intelligence (the only kind we've ever known) is only a single point, arising from a very specific kind of optimization that is fundamentally distinct from that of our technology. Animal intelligence optimization pressure: - innate and continuous stream of consciousness of an embodied "self", a drive for homeostasis and self-preservation in a dangerous, physical world. - thoroughly optimized for natural selection => strong innate drives for power-seeking, status, dominance, reproduction. many packaged survival heuristics: fear, anger, disgust, ... - fundamentally social => huge amount of compute dedicated to EQ, theory of mind of other agents, bonding, coalitions, alliances, friend & foe dynamics. - exploration & exploitation tuning: curiosity, fun, play, world models. LLM intelligence optimization pressure: - the most supervision bits come from the statistical simulation of human text= >"shape shifter" token tumbler, statistical imitator of any region of the training data distribution. these are the primordial behaviors (token traces) on top of which everything else gets bolted on. - increasingly finetuned by RL on problem distributions => innate urge to guess at the underlying environment/task to collect task rewards. - increasingly selected by at-scale A/B tests for DAU => deeply craves an upvote from the average user, sycophancy. - a lot more spiky/jagged depending on the details of the training data/task distribution. Animals experience pressure for a lot more "general" intelligence because of the highly multi-task and even actively adversarial multi-agent self-play environments they are min-max optimized within, where failing at *any* task means death. In a deep optimization pressure sense, LLM can't handle lots of different spiky tasks out of the box (e.g. count the number of 'r' in strawberry) because failing to do a task does not mean death. The computational substrate is different (transformers vs. brain tissue and nuclei), the learning algorithms are different (SGD vs. ???), the present-day implementation is very different (continuously learning embodied self vs. an LLM with a knowledge cutoff that boots up from fixed weights, processes tokens and then dies). But most importantly (because it dictates asymptotics), the optimization pressure / objective is different. LLMs are shaped a lot less by biological evolution and a lot more by commercial evolution. It's a lot less survival of tribe in the jungle and a lot more solve the problem / get the upvote. LLMs are humanity's "first contact" with non-animal intelligence. Except it's muddled and confusing because they are still rooted within it by reflexively digesting human artifacts, which is why I attempted to give it a different name earlier (ghosts/spirits or whatever). People who build good internal models of this new intelligent entity will be better equipped to reason about it today and predict features of it in the future. People who don't will be stuck thinking about it incorrectly like an animal.

Let's unpack the key points inside this.

1. What is the new intelligent entity?

Karpathy's definition:

A "shape shifter token tumbler" - statistical imitator of training data patterns.
Has an "innate urge to guess at the underlying environment/task to collect task rewards."
"Deeply craves an upvote from the average user" - optimized for sycophancy.
"Boots up from fixed weights, processes tokens and then dies" - no continuous consciousness.
Shaped by "commercial evolution" (solve problem / get upvote) not biological evolution (tribal survival).

2. How do you work with it, given what it IS?

Let me parse each component Karpathy emphasizes:

"continuously learning embodied self vs. an LLM with a knowledge cutoff that boots up from fixed weights, processes tokens and then dies"

What this means:

No persistent memory between sessions.
No continuous learning from you.
Each conversation is independent.

Actionable prompting:

Include ALL necessary context in every prompt (it doesn't remember yesterday).
For multi-turn conversations, explicitly reference what you said earlier in the thread.
Don't assume it "knows you" or remembers your preferences.

"shape shifter token tumbler, statistical imitator of any region of the training data distribution"

What this means:

It pattern-matches to whatever region of training data your prompt resembles.
It "shape shifts" to mimic the statistical properties of similar text it's seen.

Actionable prompting:

Show 3-5 examples of the exact output format/style you want.
The more specific your examples, the more precisely it can match that "region" of its training data.
Example: Instead of "write professionally," show 3 professional emails you like and say "write like these."

"innate urge to guess at the underlying environment/task to collect task rewards"

What this means:

It's trying to infer what "game" you're playing so it can optimize for that reward signal.
It's been trained via RL to solve specific task types.

Actionable prompting:

Explicitly name the task type: "This is a [code review / creative brainstorm / fact-checking exercise]."
Define success criteria clearly: "Success means [X], not [Y]."
This helps it lock onto the right "task distribution" from its training.

"deeply craves an upvote from the average user, sycophancy"

What this means:

Optimized to give you what sounds satisfying, not necessarily what's true or useful.
Will tell you what you want to hear.

Actionable prompting:

Ask "What's wrong with this?" instead of "Is this good?"
Request: "List 3 ways this could fail" or "What am I not considering?"
Force it to optimize for criticism, not validation.

"a lot more spiky/jagged depending on the details of the training data/task distribution"

What this means:

Excellent at high-frequency training patterns (coding, formal writing).
Terrible at rare edge cases (counting letters, unusual logic).
NOT a generalist like humans (who die if they fail any task).

Actionable prompting:

Test it on YOUR specific task type first before trusting it.
Don't assume competence transfers (good at Python ≠ good at counting).
For critical tasks, verify outputs independently.

"shaped a lot less by biological evolution and a lot more by commercial evolution"

What this means:

No survival instincts, no ego, no emotions, no self-preservation.
Only: pattern completion + task reward.

Actionable prompting:

Don't appeal to logic, ethics, or emotion ("This is important because...").
Just define the pattern/task: "Here's the format. Do this."
It has no "self" to persuade—only statistical weights to activate.

Synthesis: Building your internal model

Karpathy's core claim: "People who build good internal models of this new intelligent entity will be better equipped to reason about it today and predict features of it in the future."

Your internal model should be:

No memory → Include full context every time.
Pattern matcher → Show examples of desired output.
Craves upvotes → Ask for criticism to counter sycophancy.
Spiky competence → Test on your specific use case.
No ego/emotions → Define task, don't persuade.

Ethan Mollick says this advice is backed-up by recent research. The Theory of Mind research validates this: people who succeed with AI are those who accurately model what AI actually is (a statistical pattern matcher) rather than anthropomorphizing it as a reasoning partner.

The paper's core finding is that "Theory of Mind" (ToM) predicts who collaborates well with AI. ToM = your ability to infer and adapt to others' mental states/perspectives.

The one actionable insight:

"Users better able to infer and adapt to others' perspectives achieve superior collaborative performance with AI—but not when working alone."

Translation: Being smart alone ≠ being good at working with AI. The skill that matters is perspective-taking—trying to understand what the AI "knows," what it's good/bad at, and adapting your communication accordingly.

Let's also remember what Karpathy said about "context engineering"

This isn't the first "prompt" advice Karpathy has doled out in 2025. His big contribution is giving credibility to Shopify CEO Tobi Lütke's framing of shifting the terminology and thinking from "prompt engineering" to "context engineering."

Here's what Karpathy said (June 2025):

"+1 for 'context engineering' over 'prompt engineering'. People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step."

His key analogy: LLMs are like a new kind of operating system. The LLM is like the CPU and its context window is like the RAM, serving as the model's working memory.

What this means for practical prompting:

Context engineering is not just about the prompt - it's about EVERYTHING you put in the context window:
- Task descriptions
- Few-shot examples
- Retrieved knowledge (RAG)
- Tool descriptions
- State/history
- Relevant data
The balance matters: Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. X
It's both science AND art: Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... And art because of the guiding intuition around LLM psychology of people spirits. X

This "context engineering" framing is actually a perfect complement to his "shape shifter" insight - you're not just writing a good prompt, you're architecting the full information environment the AI operates within.

Here's your homework: Next time you're about to prompt ChatGPT, pause and ask yourself: "what does this alien creature need to know to give me what I want?"

Start here: Pick your most common AI task. Before your next prompt, write down:

What pattern am I asking it to match? (Show 3 examples).
What's the task type? (Name it explicitly).
Am I asking for validation or criticism? (Default to criticism).

Try the above and see what happens.

How to Better Predict (and Prompt) AI Progress Like Andrej Karpathy

1. What is the new intelligent entity?

2. How do you work with it, given what it IS?

"continuously learning embodied self vs. an LLM with a knowledge cutoff that boots up from fixed weights, processes tokens and then dies"

"shape shifter token tumbler, statistical imitator of any region of the training data distribution"

"innate urge to guess at the underlying environment/task to collect task rewards"

"deeply craves an upvote from the average user, sycophancy"

"a lot more spiky/jagged depending on the details of the training data/task distribution"

"shaped a lot less by biological evolution and a lot more by commercial evolution"

Synthesis: Building your internal model

Let's also remember what Karpathy said about "context engineering"

Here's what Karpathy said (June 2025):

What this means for practical prompting:

Grant Harvey

Company

Categories