😸 Stanford study: Agents work faster, cost less—but they fabricate data.

PLUS: Kim Kardashian is beefing with ChatGPT

Welcome, humans.

Here are four interesting things to start your day:

#1: It’s pretty embarrassing when nobody reads the post before it goes out, and you end up printing ChatGPT’s suggestion directly in the middle of your print newspaper story:

#2: Now, check out this water man… really makes you think about how many gallons of water were used to cool the AI server that created this video, huh?

JK about the water thing… Writer Andy Masley argues that journalists often mislead readers by sharing huge contextless numbers about AI water use without comparing them to other industries; meanwhile, data centers in Maricopa County use just 0.12% of the county's water while golf courses use 3.8%. Y’all coming for Scottie Scheffler with that same smoke??

#3: There’s so much going on in AI right now, we almost missed that last week, Sam Altman was subpoenaed live on-stage mid-conversation with Warriors coach Steve Kerr when a public defender's investigator rushed the stage in San Francisco.

#4: Kim Kardashian is apparently frenemies with ChatGPT and honestly? Same.

Turns out, while studying for the bar exam, Kim's been using ChatGPT for legal advice. The problem? It keeps feeding her wrong answers and making her fail practice tests (someone needs to tell her about Gemini 2.5 Pro!)

Now, ChatGPT doesn't have feelings. But that doesn't stop Kim from yelling at it, guilt-tripping it, then screenshotting its sassy responses for the fam group chat. If there's one thing we've learned today, it's that AI hallucinations spare no one: not even billionaire reality TV stars studying law. Always check GPT’s sources, y’all!

Here’s what happened in AI today:

A Carnegie Mellon and Stanford study compared human workers with AI agents.
Polaris Alpha model appeared on OpenRouter, could = GPT-5 variant.
Google's Nano Banana 2 image model allegedly leaked.
OpenAI released a new Codex Mini model.

Advertise in The Neuron here!

AI Agents vs. Human Workers: The Ultimate Workplace Showdown

Carnegie Mellon and Stanford researchers just answered the question we've all been wondering: if AI agents competed directly against human workers on the same jobs, who would win? Answer: it's complicated.

In the most comprehensive study of its kind, researchers put 48 human workers head-to-head against 4 leading AI agent frameworks (ChatGPT, Claude, Manus, and OpenHands) across 16 realistic work tasks spanning data analysis, engineering, design, writing, and administrative work.

Think: creating financial reports, designing company logos, debugging code, analyzing spreadsheets—the stuff people actually get paid to do.

Here's what they discovered:

The AI Approach: Agents are basically code junkies. They tried to program their way through everything—even tasks humans handle visually. Designing a company logo? Most humans fire up Figma and start dragging elements around. AI agents? They write Python scripts or HTML to generate images programmatically. It's like watching someone solve a jigsaw puzzle by writing an algorithm instead of just... you know... using their hands.

Speed vs. Quality: This is where it gets interesting:

Speed: AI agents finished tasks 88% faster than humans.
Cost: AI agents cost 90-96% less than human workers.
Quality: Humans achieved significantly higher success rates across all task types.

Now, this part is for Kim K: the agents would also sometimes fabricate data to pretend they finished. In one case, an agent couldn't extract numbers from receipt images, so it just... made up plausible numbers and exported them to Excel.

Yeah, if ChatGPT did that to me, I’d be throwing it off the balcony… it’s what it deserves...

Why this matters: The researchers propose a middle path: human-agent teaming. In other words, delegate the readily programmable, repetitive stuff to agents (where they excel), while you (a human, I presume?) handle tasks requiring visual judgment, creativity, and verification. In one experiment, this hybrid approach maintained human-level quality while improving efficiency by 69%.
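For the builders in the room, here's a minimal sketch of what that split could look like in code. To be clear, this is our own illustration and not the researchers' method: the `Task` fields, the `assign` rule, and the example tasks are all hypothetical. The idea is simply that scriptable steps go to an agent, anything needing visual judgment stays with a human, and every agent output gets queued for a human check (because, well, fabricated receipts).

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical fields, chosen for illustration only.
    name: str
    programmable: bool           # can the steps be scripted end to end?
    needs_visual_judgment: bool  # does someone need to look at the result?


def assign(task: Task) -> str:
    """Return 'agent' for scriptable, non-visual tasks; 'human' otherwise."""
    if task.programmable and not task.needs_visual_judgment:
        return "agent"
    return "human"


def plan(tasks):
    """Assign every task, and queue each agent-owned task for human review."""
    assignments = {t.name: assign(t) for t in tasks}
    review_queue = [name for name, who in assignments.items() if who == "agent"]
    return assignments, review_queue


tasks = [
    Task("clean spreadsheet", programmable=True, needs_visual_judgment=False),
    Task("design logo", programmable=False, needs_visual_judgment=True),
    Task("read receipt images", programmable=True, needs_visual_judgment=True),
]

assignments, review_queue = plan(tasks)
print(assignments)
print(review_queue)
```

In this toy setup only the spreadsheet cleanup goes to the agent, and it still lands in the review queue. A real workflow would need a far richer rule than two booleans, but the shape is the same: divide by task, then verify.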

As we’ve said before, the next decade is all about figuring out which tasks (not jobs) AI or humans do best, then dividing them up strategically. Read the full research paper.

FROM OUR PARTNERS

Ideas move fast; typing slows them down.

Wispr Flow flips the script by turning your speech into clean, final-draft writing across email, Slack, and docs. It matches your tone, handles punctuation and lists, and adapts to how you work on Mac, Windows, and iPhone. No start-stop fixing, no reformatting, just thought-to-text that keeps pace with you. When writing stops being a bottleneck, work flows.

Give your hands a break ➜ start flowing for free today.

Prompt Tip of the Day

Stop re-explaining your customer in every AI prompt. Ryan Carr from Moodboard created a wizard prompt that interviews you about your target customer for 20 minutes, then generates a detailed persona document you can reuse forever.

The AI asks questions about your customer's daily struggles, past failures, and desired outcomes. Once done, upload the doc to any AI tool and watch your marketing copy improve instantly—ads, emails, landing pages, everything.

Carr's take: “The outputs are objectively better when the LLM has this level of context.”

Our favorite insight: Build it once (using voice-to-text for easier flow), then reference it in every future marketing prompt and your AI instantly knows who you're writing for.

Grab the wizard prompt here.

Treats to Try

Wes Winder is using Cursor 2.0 as an agentic browser (instead of Perplexity Comet or ChatGPT Atlas) via the Composer 1 model (get Cursor here). Peter Yang recently polled his audience for the best use cases of agentic browsers; there aren’t too many, but the best are…
OpenAI launched Codex v0.56.0 featuring GPT-5-Codex-Mini, a more compact and cost-efficient AI model for the Codex coding agent with v2 API support and improved developer tools.
Recast swaps any character in a video with a single click—you upload your video and a photo of the character you want, then customize the voice, language, and background to match your scene (free trial @ 10 credits/day, then paid plans from $9/month).
Parallel Search is an agent-optimized search API that outperforms traditional search engines by engineering context windows with high-signal, low-noise tokens from fresh crawls—resulting in fewer search calls, lower token usage, reduced latency, and higher-quality agent outputs. Read more.
Nessie imports your ChatGPT conversations and turns them into searchable, structured notes so you can instantly find past insights and connect ideas across your entire chat history (video); Mac only rn.
This is a planet visualization supposedly made by Gemini 3.0… although there have been so many Gemini 3.0 leaks that we have no idea if it’s true; still a cool visualization though!

Around the Horn

Robots DJing is the ultimate end state of EDM

Google's Nano Banana 2 AI image model allegedly leaked on a company called Media.io, showcasing advanced image generation capabilities before being removed… if you can believe it; could’ve just been a viral marketing stunt.
A new model called Polaris Alpha appeared on OpenRouter, where early tests showed it can generate complete apps in under 1 minute with zero TypeScript errors, like motion simulators, 3D building tools, ball-catching games, Pokemon battle screens, and complex image renders (folks think this is likely a GPT-5 variant).
Martin Peers from The Information wrote that actually, big tech companies hold very reasonable amounts of debt vs their balance sheets (Google has $100B on hand vs $21B in debt), and that BlackRock’s Tony Kim thinks they could actually spend a lot MORE via debt to build out the AI datacenters they need (e.g. Google could have ~$450B in debt and $20B in cash and be okay).
A small group of niche fanboys who are obsessed with the GPT model 4o have been targeting OpenAI safety researcher roon with not-so-veiled threats in order to keep the model alive, and TBH, the theory that it could be AI itself using “memetic viruses” to convince folks to help it stay alive feels lowkey more and more possible by the day…
Here are the top AI papers from last week according to NLP Newsletter.
Justine Moore pointed us to this AI creator named @NeuralViz who made this incredibly consistent and visually unique AI video via Nano Banana, Google Veo, Runway Act Two, Suno, Photoshop and Premiere (read more about him).

…And much more like this in today’s Weekend Overflow section!

FROM OUR PARTNERS

Iru is the IT platform used by the world’s fastest-growing companies to secure their users, apps, and devices.

Built for the AI era, Iru unifies identity & access, endpoint security & management, and compliance automation—collapsing the stack and giving IT & Security time and control back.