Google just announced a strategic partnership with Sakana AI, the Tokyo-based startup founded by former Google researchers. The deal includes a financial investment from Google to strengthen collaboration across three key areas: accelerating innovation using Google's Gemini and Gemma models, enhancing AI product quality through direct feedback loops, and deploying AI solutions in mission-critical industries like finance and government.
On paper, it looks like a standard big-tech-backs-promising-startup story. But dig deeper and this partnership reveals something far more interesting about where AI research might be heading—and why even the architects of today's dominant paradigm are looking for escape routes.
"I founded Sakana AI after my time at Google," wrote David Ha, Sakana's CEO, on X, "so it is incredibly meaningful to be able to partner with them now. It feels like a special connection to be working together again to advance the AI ecosystem in Japan."
That "special connection" goes deeper than corporate nostalgia. Ha previously led the Google Brain Research team in Japan (and has a storied research career to boot). And his co-founder, Llion Jones, is one of the eight authors of the 2017 "Attention Is All You Need" paper—the research that gave us the transformer architecture powering GPT, Claude, Gemini, and nearly every frontier AI model in existence.
Now Jones is publicly declaring he's "absolutely sick of transformers." And Google is investing in his company anyway.
What's going on here? Below we'll break it all down.
- Who Is Sakana AI?
- What Does Google DeepMind CEO Demis Hassabis Think Is Missing?
- So Why Would Google Partner With Sakana, Really?
- What This Partnership Could Produce
- What Jerry Tworek's Departure From OpenAI Tells Us
- The Bigger Picture: Are We Approaching a Paradigm Shift?
- So is Jerry Right About The Paradigm Lock-In vs. Exploration Problem?
Who Is Sakana AI?
Sakana AI was founded in August 2023 by David Ha, Llion Jones, and Ren Ito. The company's logo—a red fish swimming against the current—captures their mission: while the rest of the industry scales transformers ever larger, Sakana is exploring fundamentally different approaches inspired by nature.
"Ants move around and dynamically form a bridge by themselves, which might not be the strongest bridge, but they can do it right away and adapt to the environments," Ha told Bloomberg. "I think this sort of adaptation is one of the very powerful concepts that we see in natural algorithms."
The company has raised over $350M across seed, Series A, and Series B rounds, reaching a valuation of approximately $2.65B. Investors include Khosla Ventures, NVIDIA, Lux Capital, NTT, KDDI, Sony, and major Japanese financial institutions like Mitsubishi UFJ Financial Group (MUFG) and Daiwa Securities Group.
But Sakana isn't just another well-funded AI lab chasing the same benchmarks. Their research portfolio represents a deliberate bet that the current paradigm has fundamental limitations—and that biological inspiration might offer a way forward.
The AI Scientist
Perhaps Sakana's most ambitious project is The AI Scientist, a system designed to automate the entire research lifecycle: from generating hypotheses to designing experiments to writing and even peer-reviewing scientific papers.
The concept is audacious: what if AI could conduct AI research, potentially accelerating the pace of discovery itself?
The results have been mixed. Independent evaluations found critical shortcomings in novelty assessment and experimental execution. But Sakana's AI Scientist-v2 achieved something no AI system had done before: generating a paper that passed peer review at an ICLR 2025 workshop without any human modifications. (Sakana withdrew the paper post-acceptance in the interest of transparency.)
The system can produce a full research paper for approximately $6-15 with about 3.5 hours of human involvement—far outpacing traditional research timelines, even if the quality resembles what evaluators called "a rushed undergraduate paper."
Related tangent: Do you work in AI/ML and/or are you a grad student or teacher? This might be useful for you.
- Zechen Zhang (shared on X) built this Claude skill (ML Paper Writing Skill) which drafts machine learning papers from your research repo using best practices from top researchers—explores your code, applies frameworks like "What/Why/So What," and verifies citations via Semantic Scholar to prevent hallucinations.
- It was built by distilling research writing advice from leading ML researchers including Neel Nanda, Andrej Karpathy, Sebastian Farquhar, and others.
- It's basically an elite research writing partner that knows all the principles these experts teach—narrative structures, abstract formulas, clarity tips, and common pitfalls.
- What makes it valuable: It includes templates for major ML conferences (ICML, NeurIPS, ICLR, ACL, AAAI, COLM), conference checklists, reviewer evaluation criteria, and crucially, citation verification that prevents Claude from hallucinating papers by checking Semantic Scholar and CrossRef APIs.
- If there's no DOI, it marks citations as [PLACEHOLDER] for you to verify manually.
- You work with it iteratively in your IDE, pointing it at your research repo to generate first drafts that beat staring at a blank page. Check it out!
Continuous Thought Machines
More fundamentally novel is Sakana's work on the Continuous Thought Machine (CTM), a new neural network architecture that incorporates timing information at the neuron level, something conspicuously absent from transformers and most modern AI architectures.
The insight is biological: in real brains, the precise timing of when neurons fire relative to each other appears critical for cognition. Traditional artificial neural networks ignore this timing information entirely.
"At Sakana AI we decided to rethink an important feature at the heart of cognition: time," the company wrote in their CTM announcement. "Despite modern AI being based on the brain as 'artificial neural networks,' the overlap between AI research and neuroscience is surprisingly thin even today."
The CTM's results are striking for their interpretability. When solving mazes, the model's attention patterns literally trace the path through the maze, a human-like approach that emerges naturally without explicit programming. When classifying images, it examines different parts of the scene step-by-step before making decisions, with accuracy improving the longer it "thinks."
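To make the timing idea concrete, here's a toy sketch of our own (not Sakana's code, and far simpler than the real CTM). Two neurons can have identical total activity yet fire at different moments, and a basic synchronization score over their activation histories tells them apart. The function name `synchronization` and the example traces are hypothetical, chosen purely for illustration.

```python
import math


def synchronization(trace_a, trace_b):
    """Cosine similarity between two neurons' activation histories.

    Each trace is a list of activations, one per internal time step.
    This is an illustrative measure, not the CTM's actual formulation.
    """
    dot = sum(a * b for a, b in zip(trace_a, trace_b))
    norm_a = math.sqrt(sum(a * a for a in trace_a))
    norm_b = math.sqrt(sum(b * b for b in trace_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Three hypothetical neurons, each firing twice over four time steps.
neuron_a = [0.0, 1.0, 0.0, 1.0]
neuron_b = [0.0, 1.0, 0.0, 1.0]   # fires in step with neuron_a
neuron_c = [1.0, 0.0, 1.0, 0.0]   # same total activity, opposite timing

# A summary that ignores timing cannot distinguish the three neurons.
print(sum(neuron_a), sum(neuron_b), sum(neuron_c))

# A timing-aware score separates them.
print(synchronization(neuron_a, neuron_b))  # close to 1: in sync
print(synchronization(neuron_a, neuron_c))  # 0: never active together
```

The point of the toy: once you keep the history of when each neuron was active, relationships between neurons become a signal in their own right, which is information a single snapshot of activations throws away.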
For a deeper technical breakdown, see our explainer on Continuous Thought Machines.
ALE-Agent and Beyond
Sakana's ALE-Agent, released at the beginning of January 2026, tackles hard optimization problems and placed 21st out of 1,000 human participants in a live AtCoder Heuristic Competition in December of last year. This was a turning point for AI discovery of solutions to NP-hard problems with real-world applications.
The company has also developed evolutionary model merging techniques for combining existing models into better ones, AB-MCTS algorithms for enabling multiple frontier models to cooperate, energy-efficient language models designed for edge devices, and more.
It's a diverse portfolio unified by a single conviction: the transformer architecture, for all its success, isn't the end of the story.
What Does Google DeepMind CEO Demis Hassabis Think Is Missing?
The timing of Google's investment in Sakana becomes more interesting when you consider what Google DeepMind CEO Demis Hassabis said just recently about what's still needed for AGI.
In a January 2026 interview at Davos on the Big Technology podcast with Alex Kantrowitz, Hassabis was asked about the limitations of current large language models:
"I'm definitely a subscriber to the idea that maybe we need one or two more big breakthroughs before we'll get to AGI. And I think they're along the lines of things like continual learning, better memory, longer context windows or perhaps more efficient context windows would be the right way to say it. So don't store everything, just store the important things. That's what the brain does."
This is striking. The CEO of the AI lab inside the company that invented the transformer is publicly stating that transformers alone won't get us to AGI, and specifically calling out problems that Sakana's research targets.
One specific framing sits at the crux of his thinking:
"No matter what camp you're in, we're going to need large foundation models as the key component of the final AGI systems—of that I'm sure. So I don't subscribe to someone like Yann LeCun who thinks they're just sort of some kind of dead end. I think the only debate in my mind is: are they a key component or the only component."
In our opinion, this is the strategic logic of the Sakana investment in one sentence. Google is betting both ways:
- Large language models (LLMs) as a key component: As part of the deal, Sakana will use Gemini and Gemma extensively.
- But LLMs as not the only component: Perhaps Sakana's novel architectures and new ones still to be developed could be part of that solution or lead to the "one or two more breakthroughs" Hassabis says we need.
For example, later in the interview Hassabis talks about systems that can plan over "very long time horizons." CTM's step-by-step timing-based reasoning is one approach to that sequential planning problem. We're sure Sakana has other ideas the team is willing to explore as well.
Watch the full interview here; IMO, it's one of Demis' best from a marathon week of interviews he did at Davos.
The Goldfish Brain Problem
This issue cuts to the heart of why continual learning matters. Today's models are frozen after training; no matter how much they "learn" in a conversation, that knowledge evaporates the moment you close the window.
Hassabis went further, arguing that learning isn't just one capability among many; it's the defining feature of intelligence itself:
"Learning is a critical part of AGI. It's actually almost the defining feature. When we say 'general' we mean general learning. Can it learn new knowledge and can it learn across any domain? That's the general part. So for me, learning is synonymous with intelligence and always has been."
If learning is synonymous with intelligence, and current models can't actually learn from their interactions, then by Hassabis's own definition, we're missing something fundamental.
Above and throughout the interview, he lists exactly what those breakthroughs might be:
- Continual learning.
- Better/efficient memory ("store the important things—that's what the brain does").
- Long-term reasoning and planning (mentioned later in the interview).
- Hybrid/neurosymbolic systems (we'll get into this below).
So could this investment in Sakana help with any of these?
Hybrid Systems and Nature-Inspired Approaches
Interestingly, Hassabis expressed enthusiasm for hybrid approaches that combine neural networks with other techniques:
"I'm very interested in hybrid systems... or neurosymbolic, sometimes people call them. AlphaFold, AlphaGo are examples of that. So some of our most important work combines neural networks and deep learning with things like Monte Carlo tree search."
He also mentioned Google's work "using LLMs with things like evolutionary methods, AlphaEvolve, to actually go and discover new knowledge."
For hybrid architectures, AlphaGeometry 2 is explicitly "a neuro-symbolic hybrid system"—combining a Gemini-based language model with a symbolic geometry engine that's 100x faster than its predecessor. It solved 83% of all historical IMO geometry problems before the 2024 competition, compared to 53% for the previous version.
This type of structure is also precisely Sakana's territory. Their evolutionary model merging, AB-MCTS for multi-model cooperation, and nature-inspired architectures all represent the kind of hybrid, biologically inspired approaches Hassabis describes.
Previous Attempts at Continual Learning
What has Google actually done to attempt to solve this problem?
"We've done some work on this in the past with things like AlphaZero," Hassabis explained. "That learned from scratch... AlphaGo Zero also learned on top of the knowledge it already had. So we've done it in much narrower domains. Games are obviously a lot easier than the messy real world. So it remains to be seen if those kinds of techniques will really scale and generalize to the real world."
The challenge is clear: techniques that work beautifully in constrained game environments don't automatically transfer to the complexity of real-world problems. Sakana's work on the AI Scientist and ALE-Agent represents attempts to bridge that gap—applying evolutionary and self-improving techniques to open-ended scientific and engineering challenges.
Google has also published its own research on continual learning. Here are a few of its previous attempts:
Google's Decade-Long Quest to Solve These Problems
Hassabis wasn't speaking abstractly about what's missing. Google DeepMind has spent nearly a decade publishing foundational research on exactly these problems. Looking back, some of these papers may prove as consequential as the 2017 transformer paper itself—we just won't know until hindsight clarifies which approaches actually scale.
Continual Learning: The Forgetting Problem
The goldfish brain problem isn't new to DeepMind. In 2017, a team including Hassabis and Raia Hadsell (now VP of Research leading DeepMind's Frontier AI unit) published Elastic Weight Consolidation (EWC), which selectively slows learning on neural network weights that are important to previously learned tasks. The insight was borrowed from neuroscience: mammalian brains consolidate important memories during sleep, protecting them from being overwritten.
EWC worked—an agent could learn to play one Atari game, then another, without completely forgetting the first. But it didn't reach the performance of separate networks trained on each task individually.
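The core of EWC fits in a few lines. Here's a minimal sketch, assuming a plain list of weights and a per-weight importance score (the EWC paper estimates importance with the Fisher information; the numbers below are made up). The function names and the `lam` and `lr` parameters are our own illustrative choices, not anything from DeepMind's code.

```python
def ewc_penalty(weights, old_weights, importance, lam):
    """Quadratic penalty that anchors each weight to its old-task value."""
    return 0.5 * lam * sum(
        f * (w - w_old) ** 2
        for w, w_old, f in zip(weights, old_weights, importance)
    )


def ewc_step(weights, new_task_grad, old_weights, importance, lam, lr):
    """One gradient step on: new-task loss + EWC penalty."""
    updated = []
    for w, g, w_old, f in zip(weights, new_task_grad, old_weights, importance):
        # The penalty's gradient pulls important weights back toward w_old.
        total_grad = g + lam * f * (w - w_old)
        updated.append(w - lr * total_grad)
    return updated


# Toy run: weight 0 mattered for the old task, weight 1 did not.
old = [1.0, 1.0]
importance = [10.0, 0.0]   # made-up importance scores
weights = list(old)
for _ in range(50):
    # Pretend the new task pushes both weights down with a constant gradient.
    weights = ewc_step(weights, [1.0, 1.0], old, importance, lam=1.0, lr=0.1)

print(weights)  # weight 0 settles near 0.9; weight 1 drifts far from 1.0
```

The important weight barely moves because the penalty cancels the new task's pull, while the unimportant one is free to change. That same anchoring is one intuition for the performance gap: weights that can't move freely can't fully specialize to the new task.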
Progressive Neural Networks (2016) took a different approach: add new network "columns" for each task, freeze the old ones, and use lateral connections for knowledge transfer. The system was immune to forgetting by design, but parameters grew with every new task.
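A stripped-down sketch shows both the appeal and the cost of the column approach. It assumes one hidden layer per column and random, untrained weights; the class name `ProgressiveNet`, its methods, and the layer sizes are hypothetical. The `frozen` flag is only a marker here, since no training loop is included.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ProgressiveNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        self.n_in, self.n_hidden, self.n_out = n_in, n_hidden, n_out
        self.rng = random.Random(seed)
        self.columns = []

    def add_column(self):
        """Freeze every existing column, then add a new one for the next task."""
        for col in self.columns:
            col["frozen"] = True
        n_prev = len(self.columns)
        self.columns.append({
            "w_hidden": rand_matrix(self.n_hidden, self.n_in, self.rng),
            "w_out": rand_matrix(self.n_out, self.n_hidden, self.rng),
            # One lateral matrix per earlier column, reading its hidden layer.
            "laterals": [rand_matrix(self.n_out, self.n_hidden, self.rng)
                         for _ in range(n_prev)],
            "frozen": False,
        })

    def forward(self, x, task):
        """Output of column `task`, using hidden features of earlier columns."""
        hiddens = [relu(matvec(c["w_hidden"], x))
                   for c in self.columns[: task + 1]]
        col = self.columns[task]
        out = matvec(col["w_out"], hiddens[task])
        for lateral, h_prev in zip(col["laterals"], hiddens[:task]):
            out = [o + l for o, l in zip(out, matvec(lateral, h_prev))]
        return out

    def param_count(self):
        total = 0
        for c in self.columns:
            mats = [c["w_hidden"], c["w_out"]] + c["laterals"]
            total += sum(len(m) * len(m[0]) for m in mats)
        return total


net = ProgressiveNet(n_in=3, n_hidden=4, n_out=2)
for _ in range(3):
    net.add_column()
    print(len(net.columns), "columns,", net.param_count(), "parameters")
```

Because a column only ever reads from columns that came before it, adding a new task cannot change the output for an old one. The price is visible in the printed counts: each new column costs more than the last, since it also needs lateral weights to every earlier column.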
Progress & Compress (2018) combined both ideas: alternating between a "progress" phase (learn the new task) and a "compress" phase (distill knowledge into a stable knowledge base). It mimicked day/night learning cycles in the brain and achieved constant parameter count—but again, only in narrow domains like Atari games and handwritten alphabets.
The Titans Paper: Memory That Learns at Test Time
The most recent major push came in December 2024 with Titans, which introduced neural long-term memory modules that can actually learn to memorize at test time—not just during training. The key insight: separate short-term memory (attention mechanisms) from long-term memory (a neural network that updates its own weights based on how "surprising" new information is).
The results were striking: Titans scaled to 2 million+ token context windows and outperformed GPT-4 on the BABILong benchmark for long-context reasoning. It's the most direct attempt yet to address Hassabis's point that models should "just store the important things" rather than everything.
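Here's a toy illustration of "memory that updates its own weights based on surprise." It is loosely inspired by the idea described above and is not the Titans architecture: the memory is a single linear map, and surprise is simply the size of the prediction error. The class name `SurpriseMemory` and the `lr`, `momentum`, and `forget` parameters are our own assumptions.

```python
import math


class SurpriseMemory:
    """Linear associative memory whose weights change at inference time."""

    def __init__(self, dim, lr=0.5, momentum=0.0, forget=0.0):
        self.dim = dim
        self.lr = lr
        self.momentum = momentum   # how much past surprise carries over
        self.forget = forget       # fraction of memory decayed per write
        self.M = [[0.0] * dim for _ in range(dim)]
        self.S = [[0.0] * dim for _ in range(dim)]

    def recall(self, key):
        return [sum(m * k for m, k in zip(row, key)) for row in self.M]

    def write(self, key, value):
        """Update the memory toward key -> value; return the surprise."""
        prediction = self.recall(key)
        error = [p - v for p, v in zip(prediction, value)]
        surprise = math.sqrt(sum(e * e for e in error))
        for i in range(self.dim):
            for j in range(self.dim):
                # Gradient of 0.5 * ||M key - value||^2 with respect to M[i][j].
                grad = error[i] * key[j]
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                self.M[i][j] = (1.0 - self.forget) * self.M[i][j] + self.S[i][j]
        return surprise


memory = SurpriseMemory(dim=2)
# Seeing the same association repeatedly becomes less surprising each time.
for _ in range(3):
    print(memory.write([1.0, 0.0], [0.0, 2.0]))
# An unseen association is surprising again and triggers a large update.
print(memory.write([0.0, 1.0], [3.0, 0.0]))
```

Familiar inputs produce small errors and therefore small weight changes, while novel inputs produce large ones. That's the "store the important things" behavior in miniature, with no retraining step involved.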
Nested Learning and Hope: Self-Modifying Architectures
At NeurIPS 2025, Google Research published Nested Learning, which reframes neural networks as systems of interconnected optimization problems running at different timescales. The accompanying "Hope" architecture creates what they call "continuum memory systems"—essentially extending Titans from two levels of memory to an unbounded hierarchy of parameter updates.
The insight sounds abstract but has profound implications: transformers and memory modules are fundamentally the same thing (linear layers), just with different update frequencies. By controlling those frequencies, you can build systems that learn at multiple timescales simultaneously—the way humans do.
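The "different update frequencies" idea can be shown with a toy of our own; this is not the Hope architecture, just a sketch of parameters that learn on different clocks. Each `Level` holds one scalar and applies its accumulated gradient only once every `period` steps. The class, the `train` function, and the periods 1, 4, and 16 are all hypothetical.

```python
class Level:
    """One group of parameters with its own update frequency."""

    def __init__(self, name, period):
        self.name = name
        self.period = period    # apply an update once every `period` steps
        self.value = 0.0        # a single scalar parameter, for illustration
        self.grad_sum = 0.0
        self.updates = 0


def train(levels, targets, lr=0.1):
    """Fit sum(level.value) to the targets, each level on its own schedule."""
    for step, target in enumerate(targets, start=1):
        prediction = sum(level.value for level in levels)
        # For the loss 0.5 * error^2, the gradient is `error` for every level.
        error = prediction - target
        for level in levels:
            level.grad_sum += error
            if step % level.period == 0:
                # Slow levels act on the average gradient since their last update.
                level.value -= lr * level.grad_sum / level.period
                level.grad_sum = 0.0
                level.updates += 1
    return levels


levels = train(
    [Level("fast", 1), Level("medium", 4), Level("slow", 16)],
    targets=[1.0] * 32,
)
for level in levels:
    print(level.name, level.updates, round(level.value, 3))
```

The fast level reacts to every example, while the slow level changes only twice in 32 steps and reflects a longer-run average. Stacking many such levels is, very roughly, the intuition behind learning at multiple timescales at once.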
Long-Term Reasoning: Beyond Chat Completion
For planning and reasoning, DeepMind's AlphaProof (published in Nature, November 2025) combines a Gemini-based language model with the AlphaZero reinforcement learning algorithm to prove mathematical theorems in the formal language Lean. At the 2024 International Mathematical Olympiad, it solved the competition's hardest problem—one that only 5 of 600+ human competitors got full marks on.
The system uses "Test-Time RL"—generating and learning from millions of problem variants at inference time. It's the clearest demonstration yet that AI can do deep, multi-step reasoning, not just pattern matching. But it required problems to be manually translated into formal mathematical language, and took up to three days per solution. The messy real world doesn't come with Lean specifications.
For safer long-term planning, DeepMind's MONA (Myopic Optimization with Non-myopic Approval) addresses a subtle problem: as AI systems get better at planning, they might learn sophisticated strategies that humans can't understand well enough to evaluate. MONA restricts agents to only learn plans that human overseers explicitly approve in advance—trading some capability for interpretability.
Perhaps the most consequential research for AGI is Google's Genie, a foundation "world model" trained on unlabeled internet videos to generate interactive environments from any prompt: text, images, or sketches. Genie 2 (December 2024) extended this to photorealistic 3D worlds, and Genie 3 (August 2025) achieved real-time interaction at 720p and 24 fps.
DeepMind explicitly describes Genie as "a key stepping stone on the path to AGI" because it enables training AI agents in an unlimited curriculum of simulated environments. It's the infrastructure for the kind of long-term planning Hassabis describes—understanding causality, physics, and consequences before acting in the real world.
World Models for Long-Term Planning
Hassabis also discussed video generation models with Alex Kantrowitz as early forms of "world models", or systems that understand the mechanics and causality of the physical world.
"You can think of a video model that can generate you 10 seconds, 20 seconds of a realistic scene. It's sort of a model of the physical world, intuitive physics... It sort of intuitively understood how liquids and objects behave in the world."
Why does this matter for AGI? Planning:
"That would be I think essential for AGI because that would allow these systems to plan long-term in the real world... over perhaps very long time horizons, which of course we as humans can do. You know, 'I'll spend four years getting a degree so that I have more qualifications so that in 10 years I'll have a better job.' These are very long-term plans that we all do quite effortlessly, and at the moment these systems, we still don't know how to do."
Sakana's Continuous Thought Machine, with its step-by-step reasoning and timing-based processing, represents one approach to this problem: it lets models "think" through sequences rather than produce answers in a single forward pass. Many other approaches are in development or under active research.
The pattern across all of Google's previous papers on this topic: each technique worked in narrow domains (Atari, math competitions, generated 3D worlds) but none scaled to the "messy real world" Hassabis keeps referencing. They're proofs of concept, not production systems. Then again, the transformer paper was also a proof of concept in 2017, and it took years before the industry realized what it enabled.
The Scaling vs. Innovation Debate
When asked whether pre-training scaling alone could solve these problems, Hassabis hedged, then leaned toward one side:
"It remains to be seen whether just sort of scaling up existing ideas and technologies will be enough to do that. Or we need one or two more really big insightful innovations. I'm probably—if you were to push me—I would be in the latter camp."
He did qualify that foundation models will remain "a key component of the final AGI systems—of that I'm sure." But the debate, in his mind, is whether they're "a key component or the only component."
Translation: Google DeepMind's own CEO thinks we need new paradigms (at least one or two), not just bigger transformers. And now Google is investing in a company co-founded by someone who helped invent transformers but has publicly declared he's "absolutely sick" of them.
So Why Would Google Partner With Sakana, Really?
The strategic logic becomes clearer when you connect the dots:
1. Hedging Against Paradigm Shift
Google's core business still depends on search and advertising, but AI threatens to disrupt both. If the transformer paradigm hits fundamental limits—as several of its own creators suggest—Google needs exposure to whatever comes next. Sakana represents a relatively low-cost option on post-transformer research.
2. The Japan Connection
On the Big Technology pod, Hassabis specifically mentioned Google's partnership with Warby Parker, Gentle Monster, and Samsung for smart glasses, expected to launch by summer 2026. Japan is a critical market for consumer electronics, and Sakana's deep connections there (with MUFG, Daiwa, Sony, NTT, and the Japanese government) give Google a local partner for deploying AI in mission-critical sectors.
From Sakana's announcement: "For regulated sectors requiring the highest levels of security and data sovereignty—such as financial institutions and government organizations—we will deploy solutions utilizing Google's platform to promote AI adoption."
A win-win on the commercial front, we'd imagine.
3. Talent Pipeline
Both David Ha and Llion Jones came from Google. By maintaining a relationship with Sakana, Google keeps connections to a cohort of researchers who have demonstrated both technical depth and willingness to explore unconventional directions. If Sakana's research pans out, Google has a path to bring that talent—and technology—back in-house.
4. The AI Scientist as Force Multiplier
Sakana's work on automating scientific discovery has obvious implications for AI research itself. If AI can accelerate the pace of discovering new AI techniques, whoever gets there first gains a compounding advantage. Having Google's Gemini and Gemma models power Sakana's AI Scientist creates a feedback loop where Google's infrastructure accelerates research that could eventually benefit Google's models.
From Sakana's announcement: "Our existing breakthroughs, such as The AI Scientist and our ALE-Agent, have already demonstrated the power of utilizing these models, and we look forward to further pushing the limits of frontier models."
5. Addressing Hassabis's Concerns
Hassabis himself identified continual learning, better memory, and world models as the key missing pieces for AGI. Sakana's research portfolio (Continuous Thought Machines, evolutionary model merging, nature-inspired adaptation) directly addresses these gaps.
What This Partnership Could Produce
The announcement specifically highlights three collaboration areas:
- Accelerating Innovation: Sakana will use Gemini and Gemma to push further on automated scientific discovery and agentic AI. The AI Scientist-v2 already demonstrated the power of frontier models for research automation; with deeper integration into Google's model ecosystem, expect more ambitious experiments.
- Enhancing AI Product Quality: Sakana will provide "direct feedback on Google's AI ecosystem" based on their deployment experience. This suggests Sakana becomes a kind of advanced beta tester for Google's models, identifying limitations and edge cases that emerge from pushing models in novel directions.
- Mission-Critical Deployment: For financial institutions and government organizations requiring high security and data sovereignty, Sakana will deploy solutions on Google's platform. This is where Sakana's Japan connections become strategically valuable—it's a beachhead for Google Cloud in sectors that have historically been cautious about foreign tech platforms.
More speculatively: if Continuous Thought Machines or similar architectures prove valuable, we might see elements of Sakana's research incorporated into future Gemini versions. Google has a history of acquiring or licensing breakthrough research (see: DeepMind, Character.AI). This partnership could be the first step toward deeper integration.
This brings us to another major AI researcher who had some interesting thoughts to share this week...
What Jerry Tworek's Departure From OpenAI Tells Us
The Google-Sakana partnership takes on additional significance in light of another recent development: Jerry Tworek's departure from OpenAI after seven years.
Tworek led or contributed to many of OpenAI's biggest projects, including the reasoning models (Strawberry/o1) that defined the company's recent trajectory. His reflections on leaving offer a window into the broader dynamics shaping frontier AI research.
The Problem With "Doing the Same Thing"
"I am definitely extremely extremely sad that all the AI labs are trying to do the same thing as OpenAI is doing," Tworek said in a recent interview on the Core Memory podcast. "OpenAI is clearly a very successful company that got a lot of things right... But how many companies doing exactly the same thing do we need in the world?"
He continued:
"We have, I don't know, five pretty serious scaling AI companies right now doing exactly the same recipe and trying to build maybe maybe slightly differentiated product on top of the same technology. And maybe it's the right thing to do, but I would I would like there to be a little bit more diversity, a little bit more differences in the models."
This echoes Jones's concern about research "narrowing" despite unprecedented resources. The competitive dynamics of the AI race may be pushing labs toward consensus rather than exploration.
The Hard Part: From 1 to 100
Tworek articulated what might be the key challenge for novel AI research: taking ideas from proof-of-concept to frontier scale.
"A lot of academic research has done that where you take, you create some completely fresh idea, you show that it somewhat works and then you put it in," he explained.
"What I think I really specialize in with my team at OpenAI and what I think we did a really really great job is taking research from one to 100. Taking new ideas that are different that we haven't done before but have been a little bit proven and figure out how to get them to work reliably at scale."
This is precisely where the Google-Sakana partnership makes strategic sense. Sakana has innovative research (Continuous Thought Machines, evolutionary model merging, the AI Scientist) but limited compute and infrastructure compared to the hyperscalers. Google has world-class infrastructure and, as Demis Hassabis says, a deep bench of research talent; but you can always have more, right?
"Training frontier models combined with a lot of other things that also are involved there," Tworek said. "It can take you years if you don't do it well or it can take you months if you have a good algorithm."
By partnering with Google, Sakana gains access to Gemini, Gemma, and Google Cloud infrastructure to scale their research. And Google gets a window into what might be the next paradigm... and a talented team willing to explore ideas that don't fit neatly into org charts optimized for the current scaling race.
Why High Conviction Bets Matter
Tworek was candid about why he left OpenAI: his sense of "what the path forward is diverged at least in some meaningful enough way" from the company's chosen direction.
"If you do just one project it goes much much faster because you can focus more you can have conviction more," he observed. "And if it doesn't succeed you are kind of screwed. If it does, you have the best model in the world."
While frontier labs hedge their bets across multiple projects, in a way this partnership allows Sakana to hedge its own bets by using Gemini models for its more commercial endeavors in Japan. Could this free the Sakana research team to bet more heavily on nature-inspired approaches and post-transformer architectures? Google's investment suggests at least some inside the company see this as a hedge worth taking, but that remains to be seen. We'd love to talk to both parties to find out more about what they are most excited about doing together!
The Bigger Picture: Are We Approaching a Paradigm Shift?
Perhaps the most telling part of Hassabis's Davos interview came at the very end, when asked about what happens when AI masters human knowledge the way AlphaGo mastered the game of Go—and then you "let it loose":
"That's what to me would be the AGI moment... Then it will discover a new superconductor—room temperature superconductor that's possible in the laws of physics but we just haven't found that needle in the haystack—or a new source of energy, a new way to build optimal batteries. I think all of those things will become possible."
This vision, AI that doesn't just replicate human knowledge but goes beyond it into uncharted territory, is what makes Sakana's research agenda so strategically interesting.
Hassabis connected this directly to his AlphaFold work:
"AlphaFold... we solved all the protein structures that are kind of known to science. How have we done that? Well, because only a certain number of those in the almost infinite possibilities of protein structures are stable—and those are the ones you've got to find. So you've got to understand that topology, that information topology, and follow it. And then suddenly these problems that seem to be intractable... actually become very tractable if you understand the energy landscape or the information landscape around that."
This is exactly what Sakana's ALE-Agent demonstrated when it independently discovered "Virtual Power," a heuristic the contest designers hadn't anticipated. The agent moved beyond applying known techniques and navigated an information landscape to find novel solutions.
Which of Google's papers on these problems will look, in hindsight, like the 2017 transformer moment? The honest answer: we don't know yet. But several candidates stand out:
- Titans for memory: If neural long-term memory that learns at test time actually generalizes, it solves the goldfish brain problem directly.
- Nested Learning for continual learning: If self-modifying architectures with unbounded memory hierarchies work at scale, it's a fundamental shift in how we think about model architecture.
- Genie for world models: If AI agents can be trained in unlimited simulated environments, it opens the path to the kind of "let it loose" moment Hassabis described for AlphaGo-style discovery.
- AlphaProof for reasoning: If formal verification can scale beyond mathematics, it provides the grounding that prevents hallucination in high-stakes domains.
The Google-Sakana partnership fits into this picture as a hedge. Sakana's Continuous Thought Machine, with its neuroscience-inspired timing mechanisms, represents yet another approach to these problems—one developed outside Google's internal research culture. If it works, Google has a relationship. If their internal approaches work instead, Google still benefits.
So is Jerry Right About The Paradigm Lock-In vs. Exploration Problem?
Creators doubting their own creations is normal in the history of technology. The people who create breakthrough innovations often have the clearest view of their limitations. The inventors of the transistor weren't surprised when integrated circuits emerged. The pioneers of packet switching anticipated the internet's evolution.
But the AI industry has a particularly intense form of paradigm lock-in right now. Billions of dollars are flowing into transformer-based models. GPU clusters are being built at nation-state scale. Career incentives reward incremental improvements on established benchmarks. The infrastructure, economics, and social dynamics all push toward more of the same.
Against this backdrop, the fact that one of the transformer's co-authors is "absolutely sick of transformers"—and that Google is investing in his company anyway—is a signal worth paying attention to.
"The age of research," as Ilya Sutskever framed it on the Dwarkesh Patel pod, may not be as binary as he suggested. But as Tworek observed, "there's much more to explore in the world of AI and machine learning than is currently being explored. We settled on a transformer architecture series six years ago and people have been scaling transformers for quite a while and it's going pretty well... but is that it?"
The honest answer, even from the people who built the current paradigm: probably not.
The Google-Sakana partnership fits into this picture as a hedge. Sakana's ideas represent yet another approach to these problems, one developed outside of Google. If they work, Google has a relationship. If Google's internal approaches work instead, Google still benefits from Sakana using its models and contributing to the larger corpus of research.
So Google's bet on Sakana isn't a rejection of transformers, per se. After all, Sakana will use Gemini and Gemma extensively. It's a recognition that the most interesting AI research may come from people willing to, ahem, swim against the current.
Google is also betting big on AI for science, and Sakana's AI Scientist has already produced a paper that passed workshop peer review, at a cost of roughly $6-15 per paper. If the road to a true AI scientist runs through hybrid approaches like what Sakana is building, Google just bought itself a seat at that table too.
For more on Sakana's Continuous Thought Machine research, see our explainer on CTMs and Sakana's interactive report.
Watch Demis Hassabis's full Davos interview and Jerry Tworek's Core Memory podcast interview for the full context on the future of AI research.