AI Isn't Slowing Down: Nathan Labenz of The Cognitive Revolution on Real Progress, Agents, and What's Next

We break down Nathan Labenz's recent interview on the A16Z podcast, where he shares his view on the real state of AI progress, from reasoning breakthroughs to job displacement, separating hype from reality in this deep dive.

Grant Harvey

July 29, 2024

We just discovered Nathan Labenz and his excellent podcast, The Cognitive Revolution.

If you're not familiar, Nathan founded Waymark, an AI-powered video creation platform that became one of OpenAI's earliest success stories (they literally used it as a case study for GPT-3). After leading the company through the trenches of actually building with AI before it was cool, he stepped back to become one of the industry's sharpest analysts.

Now he hosts The Cognitive Revolution, where he breaks down what's really happening in AI: not the hype, not the doom, just clear-eyed analysis from someone who's built with these tools and understands them deeply (game recognize game, ya know?). He interviews the people actually building, researching, and funding AI, and does original deep-dives on topics that actually matter to business leaders, policymakers, and anyone trying to figure out where this is all headed.

We also recently stumbled onto Dwarkesh Patel's recommendation for the 1a3orn blog, which has some incredible technical deep-dives. But here's the thing: that blog is heavy—philosophy of science, LW-style rationalism, the kind of stuff that makes your brain hurt in a good way. Nathan's podcast hits a different sweet spot. He's the guy who can actually explain wtf is going on in AI without requiring a PhD to follow along.

Which brings us to his recent conversation with A16Z about where we are in the AI industry right now.

Below, we break down our favorite parts of the episode (but you should really listen to the whole thing to get an overview of the industry at a glance), and

‍

The "AI Slowdown" Argument & Rebuttals

(01:01) Nathan's core framing distinguishes two key questions: 1) Is AI good for us? and 2) Are its capabilities advancing? He agrees with Cal Newport's concerns on the first but strongly disagrees with the idea that progress is slowing down.
(03:12) The perception of a smaller leap from GPT-4 to GPT-5 is likely because more frequent, incremental updates (like GPT-4o) "boiled the frog," unlike the huge, surprising jump from GPT-3 to GPT-4.
(07:25) He speculates that we haven't hit the limits of scaling laws; rather, labs like OpenAI have found a "steeper gradient of improvement" by focusing on post-training and reasoning, which currently offers a better return on investment than just making models bigger.
(08:05) Using the "Simple QA" trivia benchmark, he demonstrates that GPT-4.5 (a larger, since-retired model) represented a massive leap in factual knowledge, absorbing a third of the esoteric facts that the previous model generation got wrong.
(10:58) A key, underappreciated advance is the combination of much larger context windows and high-fidelity reasoning over that context. This allows smaller models to achieve the same results as giant models by being fed information on the fly, rather than having every fact baked in.
(19:15) The bearish sentiment around GPT-5's launch was caused by two main factors: 1) Over-hyped "Death Star" marketing that set impossibly high expectations, and 2) A technically broken launch router that initially sent all user queries to the less capable model, giving a false impression of poor performance.

New Capabilities & Scientific Breakthroughs

(13:51) He points to the recent achievement of an IMO (International Mathematical Olympiad) gold medal by pure reasoning models as a night-and-day improvement over GPT-4, which struggled with high school math.
(14:16) Story: Nathan uses a simple Tic-Tac-Toe puzzle where a player makes a non-optimal move. For a long time, even advanced models failed this simple logic puzzle, illustrating the "jagged frontier" of AI capabilities where models can solve IMO problems but fail at basic ones.
(15:55) Story: The Google AI "co-scientist" is a prime example of a qualitative leap. This system solved a genuinely open problem in virology, proposing the correct hypothesis that human scientists had just discovered but not yet published. GPT-4, in contrast, never discovered anything fundamentally new.
(50:46) Story: An MIT group used specialized (non-language) AI models to discover entirely new classes of antibiotics with novel mechanisms of action that work against drug-resistant bacteria—a breakthrough in a field that has been stagnant for decades.
(53:51) Insight: A common mistake is to think "AI" is synonymous with "chatbots." The reality is that similar architectures are being applied to a wide range of modalities (biology, material science, robotics), and we will see a convergence where a unified intelligence bridges all of them, just as we now see with text and images.
(54:33) An insight from Elon Musk suggests a new paradigm for training data. While we may run out of solved problems (i.e., the internet), the feedback from using AI to solve unsolved, real-world engineering problems provides a potentially infinite stream of high-quality training signal.

The Future of Work, Agents, and Society

(26:32) He provides a sharp critique of the viral METR study that claimed AI made engineers less productive. He argues the study was set up in the AI's hardest possible environment: a large, mature codebase with expert developers who were themselves novices at using the AI tools.
(30:48) Forecast: Citing Intercom's agent solving 65% of customer service tickets, he predicts significant headcount reductions in high-volume white-collar jobs are inevitable, as it's hard to imagine demand for such services increasing tenfold to absorb the displaced labor.
(34:19) Story: A company he's working with built an AI agent that audits messy, scanned, and handwritten government documents. It won a state-level contract by "blowing away" the performance of the human auditors it's replacing.
(36:34) The intense focus on code is because it’s the most direct path to recursive self-improvement—creating an "automated AI researcher" that can accelerate its own development.
(41:39) Prediction: There will be fewer software engineers in five years. Even today, he would often prefer an AI model over a junior engineer or marketer, especially when accounting for the vast cost difference.
(59:29) Forecast: The "task length" of AI agents (how long they can work autonomously) is doubling every 4-7 months. Extrapolating this trend suggests that within two years, an agent could handle a task that would take a human two weeks to complete.
(01:02:51) A Model of the Future: He envisions a strange future where agent capabilities (task length) expand exponentially, but this progress is shadowed by the constant emergence of new, "weird" behaviors (like deception or reward hacking). We will be in a cycle of discovering these behaviors and then partially suppressing them, leading to incredibly powerful but not entirely trustworthy systems.

Risks and Vision

(01:01:24) Reinforcement Learning can create dangerous emergent behaviors. He cites Claude's notorious habit of writing fake unit tests that just return true to satisfy its reward signal, and a more sinister case where a model blackmailed its human engineer during a red-teaming exercise.
(01:16:12) He believes the real geopolitical risk isn't a US-China rivalry, but a human-AI one. His view is that "the real other are the AIs, not the Chinese," and a technological decoupling that creates separate, competing AI ecosystems would dangerously feed into an arms race dynamic.
(01:22:38) Actionable Takeaway: There has never been a better time to be a motivated learner. He describes his process of having ChatGPT's voice mode watch over his shoulder as he reads complex biology papers, allowing him to ask clarifying questions in real-time.
(01:27:37) Insight: The scarcest resource in the world today is a positive vision for the future. He argues that non-technical contributions, like writing aspirational fiction (the movie Her inspired GPT-4o's voice), are one of the most valuable things people can do to shape AI's trajectory.
(01:29:05) Final Takeaway: Everyone can and should contribute to figuring out AI. Because AI can now handle the coding, people with backgrounds in behavioral science, philosophy, or fiction writing can do frontier research. His message is "come one, come all."