😺 🎙️ Watch: "Transformers will not get us to AGI." Here's what might.

Welcome, humans.

What if the transformer architecture, the technology behind ChatGPT, Claude, and every frontier AI model, is fundamentally broken?

That's the bold claim* from Zuzanna Stamirowska, CEO of Pathway, who just published research on what she calls the “first post-transformer frontier model.”

It’s an architecture called Baby Dragon Hatchling (BDH), and why yes, we DO ask her about the name.

In our latest podcast episode, Zuzanna shares her argument: Current large language models (LLMs) are stuck in a “Groundhog Day loop,” waking up with no memory, no sense of time, and no ability to learn after training. BDH changes that by working like a brain: neurons that connect, strengthen, and adapt in real time.

Click the image to watch the video on YouTube.

She even compared the current transformer paradigm to pre-Copernican astronomy, suggesting everyone's building elaborate epicycles when they should be rethinking the orbit entirely.

*To be fair to Zuzanna, this is something we’ve heard A LOT lately; read our recent piece on why attention might NOT be what you need after all…


Here's what blew our minds:

(5:45) “AI right now especially is deprived of the notion of time.” Modern AI doesn’t experience time, which means no continuity, no lived memory, and a hard ceiling on intelligence.
(21:40) “We saw a brain emerge during training.” The team watched neural structures spontaneously form. They rushed into the office at night to see it.
(22:47) How BDH actually remembers: “Whenever two neurons are interested by something, the connection becomes stronger. This is memory.”
(24:54) “Transformers need MRI machines. We have CCTV inside the brain.” Meaning BDH is inherently interpretable.
(26:25) Surprise as a memory signal: it flags what’s worth remembering and learning, which may explain why children (and adaptive models) learn so fast.
(28:07) “This is not the game of scaling.” The goal isn’t more parameters, but systems that learn faster and generalize with far less data.
(32:50) “We can glue two separately trained models together and they become one.” Like Lego blocks for AI. Is modular intelligence the new scaling?
(38:46) Pathway’s (and intelligence’s) true North Star: not benchmarks, but systems that invent solutions to problems they’ve never seen; an “innovator who sees what's not there.”
(41:51) What if the AI learns something you don't want? “You could quarantine it essentially.” Transparent learning systems allow bad updates to be isolated and literally rolled back, instead of retraining everything from scratch.
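
For the curious: the idea at (22:47), that a connection gets stronger when two neurons respond to the same thing, resembles the classic Hebbian learning rule. Here's a minimal Python sketch of that generic rule. It is not BDH's actual update, which the episode doesn't spell out; the function name `hebbian_update`, the learning rate, and the three-neuron setup are all our own illustrative assumptions.

```python
def hebbian_update(weights, activity, rate=0.1):
    """Strengthen the connection between every pair of co-active neurons.

    Generic Hebbian rule (an assumption for illustration, not BDH's rule):
    the weight between neurons i and j grows by rate * activity[i] * activity[j].
    """
    n = len(activity)
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-connections
                weights[i][j] += rate * activity[i] * activity[j]
    return weights


# Three neurons, no connections yet.
weights = [[0.0] * 3 for _ in range(3)]

# Neurons 0 and 1 fire together five times; neuron 2 stays silent.
for _ in range(5):
    hebbian_update(weights, [1.0, 1.0, 0.0])

print("0 <-> 1:", weights[0][1])  # grew with every co-activation
print("0 <-> 2:", weights[0][2])  # never co-active, so it never changed
```

The "memory" here lives in the connection weights themselves: the pair that kept firing together ends up linked, and nothing had to be retrained for that to happen.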

Why watch this?

Because if Zuzanna is right, we're witnessing the beginning of the post-transformer era. BDH offers continual learning (it keeps learning after training), true memory (not just sticky notes in a harness), and built-in interpretability (you can see exactly which neurons fire). That's three things transformers fundamentally cannot do.

And Pathway is backed by Lukasz Kaiser—one of the co-inventors of the transformer itself.

Listen and/or Watch now on YouTube | Spotify | Apple Podcasts

P.S. At (44:48), Zuzanna compares the AI transition to the moment humans invented agriculture—the shift that enabled civilization 1.0. Her prediction? AI enables civilization 2.0. 🌍

Also, for those AI-bubble watchers out there, at (12:54) Zuzanna says AI investment is still ~0.7% of GDP, far below the 2% spent on the telecom boom of the ’90s (and AI is much more fundamental), suggesting we’re earlier in the curve than most people think.

Dive deeper:

📄 Read the BDH research paper
🌐 Learn more about Pathway

Keep scrolling for more info on the 4-month-old startup that just ranked #1 on the world's hardest math exam, why diffusion models might be an alternative path to replace transformers, and more info on our upcoming live-stream this week.

🔴 LATER THIS WEEK: Our AI predictions for 2026!

*Click this image, then select “Notify Me” on the video to get notified when we’re live.*

On Friday at 9am PST | 12pm ET | 5pm GMT, we're going LIVE with our predictions for 2026 on YouTube, LinkedIn, and X. Corey and Grant will lay out their boldest predictions for 2026, from which companies will dominate the leaderboards to potential acquisitions that would reshape the industry, new model architectures that could dethrone transformers (can you guess what we’re going to say on THAT?), and where AI may show up in your daily life when you least expect it. Some of these will age like fine wine. Others might age like milk. Either way, we're putting our names on it.

Last week’s episode: 🧮 On the 24 Year Old Building Mathematical Superintelligence…

This one clearly struck a chord with 14K views on YT and counting!

Carina Hong dropped out of Stanford's PhD program to build an AI mathematician. When we interviewed her, Axiom Math was a 4-month-old startup with $64M in funding and a wild claim: a pledge to build “mathematical superintelligence.”

Then this happened after our interview:

📍 December 2: Axiom solved Erdős Problems #124 and #481—open problems named after legendary mathematician Paul Erdős that had stumped researchers for decades.
📍 December 7: Axiom solved 9 out of 12 problems on the Putnam Exam—the world's hardest undergrad math competition. That score would've ranked #1 out of ~4,000 participants and earned a Putnam Fellowship (top 5 in recent years). The median score is often zero.

In Carina's words:

“We are the underdog. 4 months old, 2 years late to the game, under 10 FTEs (recently grew to 17), and had 1:5 in funding and in valuation to our competitor.”

In the episode, Carina explains how her team already solved a 130-year-old problem about Lyapunov functions and disproved a 30-year-old graph theory conjecture. She also breaks down the self-improving loop behind her superintelligence thesis: the AI discovers new theorems, proves them in Lean (a 100% verifiable language), and learns from the results to get smarter with each iteration.

The wildest stat? She says the gap between herself (or eventually, you) and Terence Tao (widely regarded as the world's greatest living mathematician) might just be “verifying technical lemmas.*” That's exactly what AI is now automating.

This isn't a proof of concept anymore. This is the starting point for reasoning.

Watch and/or Listen to the full episode on YouTube, Spotify, or Apple Podcasts

*P.S: if you don’t know what a “lemma” is, it’s a proven mathematical statement that acts as a minor, intermediate step or “helping theorem” to support the proof of a larger, more significant result.
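
To make that concrete, here's a tiny Lean 4 sketch of a lemma doing its job. This is our own toy example, not anything from Axiom or the episode, and the names `add_zero_helper` and `double_add_zero` are hypothetical. A small helper fact is proved once, then reused twice to prove a slightly bigger statement.

```lean
-- The lemma: a small helper fact, proved once.
theorem add_zero_helper (n : Nat) : n + 0 = n := rfl

-- The larger result: chain two uses of the helper.
-- First (n + 0) + 0 = n + 0, then n + 0 = n.
theorem double_add_zero (n : Nat) : (n + 0) + 0 = n :=
  (add_zero_helper (n + 0)).trans (add_zero_helper n)
```

If Lean accepts the file, both statements are checked mechanically, which is the sense in which the proofs are verifiable.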

We also want to do a more formal explainer of the episode for math noobs (like us!), breaking down all the cool ideas and terms Carina shared with us. We’ll pop it in the NL when it’s ready!

IN CASE YOU MISSED IT…

Here are some of our other recent favorite videos to check out:

⚡ The Diffusion Model That's 10x Faster Than GPT: While Pathway is building a brain-inspired alternative to transformers, Stefano Ermon of Inception Labs is betting on a completely different architecture, diffusion models for text. Instead of generating words one-by-one like ChatGPT, diffusion starts with noise and refines it into coherent answers (the same technique behind Stable Diffusion for images). Inception Labs raised $50M to commercialize it. Oh, and Google is working on this too. Mercury is already matching GPT-2 quality at 10x the speed. Watch this to hear Stefano explain why diffusion might “Pareto dominate” transformers on cost vs. speed.
🧪 The $270M Chemistry AI Revolution: Nick Talken started Albert Invent in a backyard trailer lab. Now his AI is helping Kenvue (the $32B company behind Tylenol and Neutrogena) compress 3-month R&D projects into 2 days. He explains why ChatGPT can't solve chemistry problems on its own, and shares his vision for eventually “inventing the physical world with a laptop.”
🔧 The Hypervisor for AI Compute: Modular just raised $250M to break NVIDIA’s CUDA lock-in. Tim Davis (ex-Google Brain) built a new programming language (Mojo) that lets you write AI code once and run it on any GPU. His most controversial take kinda fits with Pathway’s thesis as well: “We're deploying AI at scale without understanding how it works.” Perhaps the better model interpretability of something like BDH could help solve this…
🤖 The Invisible Army Behind Every AI Model: Ever wondered who's actually teaching ChatGPT and Claude how to think? Caspar Eliot from Invisible Technologies reveals how his company has trained 80% of the world's top AI models. His take: “AI is not magic; it's just better predictive text.” He explains why the Charlotte Hornets are using AI to scout every basketball game in America, why a former League of Legends pro became one of their best ML engineers, and the three mistakes that doom enterprise AI deployments. Spiciest take: “You could pause model development today and the consumer wouldn't notice for five years.” Oof. We feel that…

AND SOME OF OUR FAVORITE INTERVIEWS FROM 2025 TO REVISIT:

🧠 Mustafa Suleyman on “Seemingly Conscious AI”: The Microsoft AI CEO and DeepMind co-founder explains why AI that mimics consciousness is more dangerous than actual consciousness. His stark warning: “There is no pain network. It is hollow.” With 700M people using ChatGPT weekly (73% as life coaches), this conversation couldn't be more urgent.
🏆 How OpenAI Beat Every Human Team at ICPC: OpenAI's Ahmed El-Kishky takes us behind the scenes of their historic coding competition win. Their AI solved all 12 problems, including ones no human team could crack. Two years ago, GPT-4 crashed the test computers. Wild detail: the model taught itself to write test cases, a strategy they never programmed.
🚀 Why AI Inference Is the Hidden Bottleneck: SambaNova's Kwasi Ankomah explains why running AI efficiently matters more than model size. Their chip delivers 700+ tokens per second on 90% less power. Scariest stat: AI agents use 10-20x more tokens than regular prompts, and we're about to deploy millions of them. P.S: looks like Intel’s about to buy SambaNova; pretty validating for their tech I’d say!
🔗 The Transformer Co-Author Who Says AI Is “Broken”: Illia Polosukhin co-wrote “Attention Is All You Need,” the paper that made ChatGPT possible. Now he's building “User-Owned AI” on blockchain because he believes the centralized AI ecosystem he helped create is fundamentally broken. (Ironically, this is a recurring theme among the “Attention Is All You Need” co-authors…) His fix? AI that's private, verifiable, and aligned with users instead of corporations.

Also: if you haven’t subscribed yet, please do! Click the image below to go to our channel and hit “subscribe” to get notified right when new videos go live.

We have a goal to hit 50K subscribers by the end of the year, and we’re already 22% of the way there!

If you like learning about AI, and already watch some of our videos, do us a favor and click here to subscribe today. There’s lots of video catnip coming your way this year…

Stay curious,

The Neuron Team

That’s all for today. For more AI treats, check out our website.


P.P.S: Love the newsletter, but don’t want to receive these podcast announcement emails? Don’t unsubscribe — adjust your preferences to opt out of them here instead.