😸 Dr. Fei-Fei Li, the Godmother of AI, on why we need "Spatial Intelligence"

PLUS: Is this the end of LLMs?
November 11, 2025
In Partnership with

Welcome, humans.

Quick heads up: We are planning a special Black Friday edition of The Neuron, which will feature our favorite tools as well as some from our partners. If you work at an AI company with a special offer for our readers, click below and fill out the form (or email partnerships@theneurondaily.com) and Mindy will help you get it set up!

Advertise in The Neuron here!

Now, in another sign of the times, there's a concerted effort to "save the em-dash" from the clutches of AI. Or, put another way, to "make the em-dash cool again." If only there were a "cool" form of punctuation that signified a pause that AI never uses …

Here’s what happened in AI today:

  1. Dr. Fei-Fei Li on how "spatial intelligence" will move us beyond today's AI.
  2. Intel's AI chief left to lead compute at OpenAI.
  3. Anthropic expects break-even in 2028, OpenAI profitability in 2030.
  4. Stanford found advanced AI models fail to recognize human false beliefs.

The Godmother of AI says the Future is "Spatial Intelligence"

DEEP DIVE: Dr. Fei-Fei Li Says AI Needs to Stop Being a "Wordsmith in the Dark"

The "Godmother of AI" just published an essay arguing that language models have hit a wall, and spatial intelligence is the key to unlocking the next era of AI.

Fei-Fei Li is a Stanford professor, co-founder of World Labs, and the researcher who built ImageNet (one of the three key ingredients that birthed modern AI), so when she drops a manifesto on AI, it’s kinda like if Kendrick Lamar dropped a mixtape: you pay attention.

Her thesis = today's AI can write, reason, and code brilliantly, but it's fundamentally blind to the physical world around it.

Think about it: ChatGPT can explain quantum physics, but can't tell you how far apart two objects are in a photo. It can write a screenplay, but can't mentally rotate a cube. It's like having a genius professor who's never left their study. In this case, the study is a rack of pipin’ hot NVIDIA servers.

So what's spatial intelligence? It's how humans and animals understand, navigate, and interact with the 3D world. It's why a firefighter can read a collapsing building through smoke. It's why you can catch keys tossed across a room without thinking. And it's why Watson and Crick built physical models to discover DNA's structure—some breakthroughs require manipulating space, not just "tokens."

Li argues that to unlock spatial intelligence, we need world models: a new type of AI with three core capabilities:

  • Generative: Creates virtual worlds that follow physics and geometry.
  • Multimodal: Processes images, video, depth maps, actions (not just words).
  • Interactive: Predicts what happens next when you take an action.

Where this leads:

  • Creativity tools (shipping now): World Labs' Marble already lets creators generate explorable 3D worlds from text prompts—very cool project, go check it out!
  • Robotics (mid-term): Robots that actually understand their environment, not just follow scripts (or VR-assisted directions).
  • Science (longer-term): AI that can simulate drug interactions in 3D or explore environments humans can't access.

Why it matters: Developing spatial intelligence could mean the difference between AI that talks about the world versus AI that actually understands (and exists in) it. Language made AI smart through what Karpathy called our "crappy evolution," but spatial intelligence will make it actually useful beyond a chat interface.

As Li puts it, borrowing Wittgenstein's line, "the limits of my language mean the limits of my world." For AI to escape those limits, it needs to see, touch, and move.

Check out the full essay on her Substack for the deeper dive into evolutionary neuroscience and why this took nature 500M years to figure out.

FROM OUR PARTNERS

How to Build Secure MCP Auth for Enterprise AI

As AI agents connect to enterprise systems through MCP, secure authorization is essential.

This guide explains how to implement OAuth 2.1 with PKCE, scopes, user consent, and token revocation to give agents scoped, auditable access without relying on API keys.
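For a taste of what that involves, here's a minimal Python sketch of just the PKCE piece (our own illustration of RFC 7636, not code from the guide): the client generates a random code_verifier, sends its hashed code_challenge with the authorization request, then proves possession of the verifier when redeeming the auth code.

    import base64, hashlib, secrets

    def make_pkce_pair():
        # Our own minimal illustration of PKCE (RFC 7636), not the guide's code.
        # code_verifier: high-entropy random string, 43-128 URL-safe chars.
        verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
        # code_challenge: BASE64URL(SHA-256(verifier)), padding stripped.
        digest = hashlib.sha256(verifier.encode("ascii")).digest()
        challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
        return verifier, challenge

    verifier, challenge = make_pkce_pair()
    # The agent sends `challenge` with the authorization request, keeps
    # `verifier` secret, and presents `verifier` when exchanging the
    # authorization code for a token.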

Learn how to design production-ready MCP auth for enterprise-grade AI systems.

Read the guide →

Prompt Tip of the Day

Daniel from xAI has a speed hack for ya when working with AI: you speak faster than you type, but you read faster than you listen. The sweet spot? Speak your prompts, then read the responses.

This "asymmetric" workflow is a game-changer for long prompts. Amanda Askell, an AI researcher at Anthropic, says she regularly uses prompts over 100 pages long—and that most people prompting are way too succinct. Her logic: if a task needs a handbook, give the model the handbook instead of making it play 20 questions with you.

Voice mode makes this possible. Multiple users swear by voice prompting for complex ideas. Some even report pacing and rambling for 20 minutes to dump all their context before hitting send. Others use voice transcription apps, paste the transcript, then ask the AI to tighten it up before executing. No typing friction = no reason to compress your thoughts.
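If you want to wire that transcribe-then-tighten flow up yourself, here's a rough Python sketch using OpenAI's SDK (the model names, file name, and prompt wording are our own assumptions, not anyone's official recipe):

    # A sketch of the "ramble, transcribe, tighten" workflow. Assumes the
    # openai package is installed and OPENAI_API_KEY is set.
    from openai import OpenAI

    client = OpenAI()

    # 1) Transcribe your rambling voice memo (file name is hypothetical).
    with open("braindump.m4a", "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio
        ).text

    # 2) Ask the model to tighten the ramble into a precise prompt.
    tightened = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite this rambling transcript as a clear, "
                        "detailed prompt. Keep every requirement; cut filler."},
            {"role": "user", "content": transcript},
        ],
    ).choices[0].message.content

    print(tightened)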

So then how do you do this? Here’s how to set it up depending on which AI you use:

  • ChatGPT: Tap the microphone icon (not the Advanced Voice Mode soundwave icon; if you do open Voice Mode, exit it when it starts talking), speak your prompt, then read the response.
  • Claude: Tap sound wave on mobile app, speak, read on-screen response.
  • Gemini: Hit the microphone button, dictate, then read the text (on Android w/ Gemini Live it’ll talk back to you, but read the responses instead of listen).
  • Grok: Tap voice icon (top right), speak, then read the captioned output (btw, the voice captioning feature makes this great for language learners).

This speak-and-read workflow feels wrong at first; we're conditioned to think voice mode = full voice conversations. But it's genuinely faster for complex prompts where you need detailed responses. Try it for your next research task or when building a multi-step prompt, and you'll never go back to typing.

Remember: "Be precise" doesn't = "be concise." Voice removes the friction of detailed context—so talk it out, get specific, and let the AI work with the full picture.

Treats to Try

Someone made an AI Agent tool for Minecraft called Steve that you can tell to "mine iron" or "build a castle" and multiple agents will spawn and automatically coordinate to divide the work and avoid conflicts.

  1. Fireworks AI lets you customize open models like DeepSeek and Kimi for your specific product without building an ML team, using their Eval Protocol that trains on your real production data—Vercel achieved 94% error-free code generation and 40x faster speeds. Free to try (through 11/24).
  2. The new TIME AI Agent lets you ask questions about 102 years of TIME journalism and get summaries, translations in 13 languages, or audio briefings—like creating a debate on "Is AI good for humanity?" (lol) using decades of TIME archives (read more).
  3. Gamma instantly creates presentations and websites from your rough ideas (raised $68M).
  4. Apollo-V0.1-4B-Thinking becomes any character you give it; insert a persona in the system prompt and it thinks and responds entirely as that character, whether it's a novelist crafting story arcs or a DM-ready NPC for your next campaign (Corey says this is his new D&D go-to!).
  5. Open-dLLM is the first fully open-source diffusion language model that generates code by iteratively refining multiple tokens in parallel instead of one at a time, letting you train, evaluate, and deploy your own diffusion model with their complete pipeline, data, and checkpoints for free (HuggingFace, read more, demo; the OP admits the demo's generated code is rough, but it shows how the approach works). See the toy sketch after this list.
    1. Related: This paper shows how diffusion models are "super learners"
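To picture what "refining multiple tokens in parallel" actually looks like, here's a toy Python loop that is entirely our own illustration, not Open-dLLM's code: a real diffusion LM would score every masked position with a neural net each pass; a lookup table with random confidence stands in here.

    import random

    # Toy parallel iterative decoding: start fully masked, then each step
    # propose every masked position at once and commit only the most
    # confident guesses. random.random() stands in for model confidence.
    TARGET = list("print('hello, world')")
    MASK = "_"

    seq = [MASK] * len(TARGET)
    step = 0
    while MASK in seq:
        step += 1
        proposals = {i: (TARGET[i], random.random())
                     for i, tok in enumerate(seq) if tok == MASK}
        # Commit the top ~25% most confident proposals this round;
        # the rest stay masked and get re-predicted next step.
        keep = max(1, len(proposals) // 4)
        best = sorted(proposals.items(), key=lambda kv: -kv[1][1])[:keep]
        for i, (tok, _conf) in best:
            seq[i] = tok
        print(f"step {step:2d}: {''.join(seq)}")

Each pass fills in several positions simultaneously and re-predicts the rest, which is why diffusion decoding can beat one-token-at-a-time generation on speed.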

Around the Horn

  1. Intel's AI chief Sachin Katti left the company to lead compute infrastructure at OpenAI.
  2. Anthropic believes it will break even in 2028, while OpenAI expects to turn profitable in 2030 (after, y’know, $70B+ in losses).
  3. Google is replacing Google Assistant with Gemini AI on Google TV Streamer devices starting this month.
  4. Microsoft partnered with top lifestyle influencers like Alix Earle to make its Copilot AI assistant more appealing to young consumers.
  5. Stanford researchers evaluated 24 advanced language models using 13,000 questions and found that even the most powerful AI systems often failed to recognize when humans hold false beliefs, revealing a key weakness in their reasoning abilities (paper).
  6. The things we can get robots to do rn are low-key amazing; watch this whole embodied avatar video to see how the Unitree G1 can mimic human movements in real time; there's also firefighting robot dogs AND firefighting drones… plus the IRON humanoid robot that set the internet on fire last week… and then there's THIS… the funniest robot video we've ever seen…
    1. Related: this guy who DIY'ed his own genius robot Halloween decorations with windshield wiper motors and a few other gizmos.

FROM OUR PARTNERS

Your AI is only as smart as your data. Get ready with CData Sync.

Start Your Free Trial Today!

NEW SECTION: Tuesday Tool Tip!

Grant and Corey just put together a beginner-friendly 12-minute explainer on how to get started with ComfyUI, the node-based tool for generating images on your own computer. It looks scary at first, but it's actually pretty simple once you get past the spaghetti-like interface.

We show you how to install it, load templates, and generate your first AI image in under 10 minutes (plus download time, because those models are chunky). Check it out!

A Cat’s Commentary

[image: cat caricature]

See you cool cats on X!

Get your brand in front of 500,000+ professionals here

Get the latest AI right in Your Inbox

Join 450,000+ professionals from top companies like Disney, Apple and Tesla. 100% Free.