😸 5 Major AI Debates, Explained...

PLUS: China drops TWO trillion-parameter models, OpenAI to burn $115B

Welcome, humans.

There is not one but TWO hunger strikes going on outside AI offices atm (one in front of Anthropic and another in front of Google DeepMind). The goal? Stop “ever more powerful AI systems” from being developed.

Naturally, you give people an opening and nano banana, and they’re going to meme the situation.

But it’s an interesting debate to be had: should we slow down the AI race, or keep it going? Because even if you hunger strike outside of every US AI Lab, there’s still all the major labs in China… and based on this weekend’s news, Chinese labs are shipping beefy trillion-parameter models like there’s no tomorrow.

Then again, based on today’s AI models, I don’t think we need to be worried about out of control superintelligence anytime soon…

Here’s what happened in AI today:

We break down the top debate inside the AI industry atm.
OpenAI projected $115B cash burn through 2029, $80B above forecasts.
Authors sued Apple over pirated books; Anthropic settled similar case for $1.5B.
New OpenAI paper blamed AI hallucinations on current training process.

It was the best of AI times, it was the worst of AI times… the Top debates in AI atm, explained…

Anyone else’s head spinning try to make sense of the industry right now?

See, there are a couple key “tug-of-wars” happening in AI that deserve calling out:

First up: Is AI fueling the economy and simultaneously crushing the job market by slowing hiring, or is it unproductive with no ROI? (P.S: both can be true; not mutually exclusive).

The AI adoption bull case: Derek Thompson’s breakdown and interview w/ Stanford researchers shows how recent job data suggests AI's impact on hiring is “plausibly yes” a major factor.
The AI adoption bear case: Mike Judge’s take that even where AI is being adopted the most (software), its not actually making us more productive, as demonstrated by the lack of “shovelware” (low quality but easy to “shove out” software).

Then there’s the AI labs trying to make their own chips (you can call this rift “To NVIDIA, or not to NVIDIA?). Examples:

OpenAI is set to start mass production of its own AI chips with Broadcom in a $10B deal.
Google is no longer only using its own chips to train its AI but selling them to other cloud providers.
Amazon is requiring Anthropic (who just raised $13B) to train new Claude models on its Trainium chips (which also inspired this banger meme on X, and that fair warning, features lots of hilarious four letter words; SemiAnalysis has a great piece on this partnership too).

This is all likely to reduce reliance on NVIDIA, who is expensive AF (pardon our french acronyms s’il vous plait), and whose $4.6 trillion market cap stock dipped on the OpenAI / Broadcom news.

The newest rift: To use evals, or not to use evals? If you don’t know, AI “evals” are quality control tests that ensure your AI systems work correctly and fairly before they make decisions that affect our daily life (like job applications, medical diagnoses, or loan approvals), the same way we safety-test cars or inspect restaurants.

And yeah, the industry’s pretty split on whether or not they actually matter:

Swyx summarized the debate well in a single tweet, pointing out how all the major labs eval based on “vibes” and not rigid testing, while Julia Neagu dove deep into the topic if you’re actually considering building with AI.
This is important, as everyone these days is trying to spin up what’s called “RL environments” (places where AI can run around and learn things via reinforcement learning, like Prime Intellect / Mechanize)

Shreya Shankar wrote this great piece in defense of evals, arguing that teams claiming they don't need systematic evaluation are usually already doing it through dogfooding and error analysis—they just don't call it “evals”—and that dismissing formal evaluation frameworks particularly harms new AI builders who lack the domain expertise to rely on intuition alone.

Allie K. Miller added her own retort in favor of evals, explaining 6 reasons why you actually really do want evals (is this AI good enough for job augmentation and/or replacement? Does it create business value? Does it help us defeat hype? Can the AI actually complete a goal? And are the AI labs actually improving their AI… or just changing things?).

Then there's the age old (well, in AI years, anyway) philosophical split: To go open source, or not?

While American AI labs are locking down their top models tighter than Fort Knox, Chinese companies just dropped two unique trillion-parameter models (that’s trillions of numbers determining AI responses, btw!) in one weekend:

Alibaba's Qwen3-Max-Preview…
And Moonshot's Kimi K2-Instruct…

…both of which claim major performance increases (read more about them here).

What gives?? In a nutshell: Chinese labs are moving fast and breaking things while US labs are moving slow(er) and fixing things (like GPT-5).

Paging Google DeepMind… we need Gemini 3 Pro, pronto! We need a Bat signal equivalent for Logan Kilpatrick and Demis Hassabis… maybe a giant (nano) banana??

Now, the new Qwen Max is actually closed behind their API atm… their first non-open-model (that we know of, anyway).

As the adage goes: open source when you're behind, closed source when you're ahead.

So are Chinese labs feeling confident they're gaining on US peers? Or is this just the reality of unleashing something so big: ya kinda need to actually monetize it??

Our take: This is all probably signs of a healthy and growing ecosystem. There’s lots of debate, there’s lots of opportunity, there’s lots to still be learned. As Mike Knoop of ARC-AGI says, there’s only really been two major breakthroughs in language models: the original transformer discovery (tech that powers ChatGPT), and chain of thought reasoning (the “thinking” mode). Translation? We’re still early, ppl.

FROM OUR PARTNERS

Saddle Up: Warp launches Warp Code

Building software with AI can feel like wrangling cattle: agents get stuck going in circles or bolt in the wrong direction.

Warp Code is here to change that.

Warp is the best way to code with AI agents from prompt to production. With Warp Code, new features like a built-in editor and code review join existing favorites like MCP and CLI tools, all in one place.

Why Warp?

Top-rated agent: #1 on Terminal-bench, top-3 on SWE-bench Verified
All the top models, one subscription: Opus, Sonnet, GPT-5, and more
Built-in code editor, code review, file tree, and syntax highlighting… you won’t miss your IDE

See why Warp is trusted by 700k devs and 56% of Fortune 500 engineering teams.

👉 Try for free or get Pro for $1 with code NEURON1

Prompt Tip of the Day.

Okay, so I know last week we told y’all not to use ChatGPT like Google. HOWEVER… there’s two killer features of using AI you should probably know (that make it, well, kinda like Google):

Use thinking before pretty much every question you ask. Claude with “extended thinking” does this automatically, but in ChatGPT, you have to select “GPT-5 Thinking” as your default model (bookmark this link). You'll get better answers for basically anything you ask, and it only slows response down a little bit (GPT-5 thinks ~2 minutes on average if using web search, while Claude thinks ~10-15 seconds).
Use web search, like, for any and all questions you ask that requires real time info. To do this, toggle on the “web search” feature (bookmark this link to toggle it on in GPT-5). With GPT-5 Thinking, it should automatically select web search if you say “look this up” or “using web search.”

If you do both of these things for most things you ask, you’re going to raise the average quality of the answer you get by a factor of (my completely vibed-based eval) 10x.

P.S: you don’t have to take our word for it—patron saint of AI engineering, Simon Willison, also says GPT-5 Thinking is good at search, calling it his research goblin lol.

Treats to Try.

*Asterisk = from our partners. Advertise in The Neuron here.

*Murf AI turns your text into human-like speech in 200+ voices, so you can narrate presentations or create audiobooks without recording anything yourself.
Sierra creates branded customer service agents that handle your support tickets, returns, and subscription changes 24/7 (raised $350M).
Baseten powers the infrastructure behind your AI apps, automatically scaling to handle millions of users without you managing servers (raised $150M).
Incerto automatically detects issues with your database, optimizes queries, and turns your text descriptions into SQL solutions with 90% accuracy.
Oasis 2.0 from Decart (who makes Mirage, which we covered last week) transforms your Minecraft world in real time, letting you instantly change environments from regular terrain to Swiss Alps or Burning Man while you play.
Earthly Insight gives you ChatGPT, Gemini, and Claude in one app while automatically donating 33% of your subscription to rewilding projects, so your daily AI tasks directly fund ecosystem restoration.

Around the Horn.

The Bessemer Cloud 100 Benchmark Report is out, revealing that top private cloud companies have crossed a historic $1.1 trillion in aggregate value (due to AI, of course!)—lots of great AI-related findings in the full report!
OpenAI projected a $115B cash burn through 2029, $80B higher than earlier forecasts, as it scales compute and data centers.
Mark Zuckerberg was caught on hot mic telling Trump he wasn't sure how much AI investment to promise for the U.S.
Authors sued Apple over alleged use of pirated books to train Apple Intelligence, while Anthropic agreed to pay $1.5B to settle a similar class action.
OpenAI published a paper arguing that large language models like ChatGPT hallucinate because training rewards guessing over admitting uncertainty.
Matt Berman had a great live show on Friday; check out the convo on universal basic income (which the U.S. state of Alaska has apparently had in a sense for 40+ years) and the dangers of AI psychosis specifically!