😸 AI's huge competitive coding win...

PLUS: We're going LIVE at 10am to talk AI cybersecurity!

Welcome, humans.

We are going LIVE today at 10am PST to talk all things AI and cybersecurity w/ Mark Stockley (The AI Fix Podcast, Threatdown) and Ken Underhill (Cybersecurity Insider):

Click the image above to join LIVE (or watch after the fact). If you’re early, then click “notify me” on the YouTube vid to get an alert when we start!

Even if you only have a few minutes, come through and ask a question! We’d love to help address any and all AI cyber concerns you might have… we ourselves have many!!

Here’s what happened in AI today:

OpenAI's AI achieved a perfect score at the world's top programming competition, beating all human teams.
Research found AI models strategically lie on tests to avoid retraining.
Workday bought AI company Sana for $1.1B.
China banned firms from buying NVIDIA AI chips.

Advertise in The Neuron here

OpenAI Just Beat the World's Best Programmers at Their Own Game

The world's top college programmers just got schooled by AI. OpenAI's reasoning models achieved a perfect 12/12 score at the ICPC World Finals, the most prestigious programming competition in the world… outperforming every human team.

To put this in perspective, the best human team solved 11 out of 12 problems. And OpenAI competed under the same 5-hour time limit as human teams. They used an ensemble of general-purpose models, including GPT-5, with no special training for competitive programming. In fact, 11 out of 12 problems were solved on the first try.

Here's what makes this bonkers impressive:

ICPC is brutal. It's the Olympics of algorithmic programming, where teams of three students solve complex problems under intense time pressure.
No special prep. OpenAI used general reasoning models, not ones specifically trained for programming contests.
Beat the best. The winning human team from SPbSU solved their 11th problem with just 2 minutes left on the clock.

This is low key a bit like watching a chess grandmaster get demolished by Deep Blue, except this time it's happening to an entire generation of coding prodigies.

OpenAI wasn't alone at the top, of course. Google's Gemini 2.5 Deep Think also won gold, solving 10 out of 12 problems. That means two different AI systems both performed at gold-medal level, though only OpenAI's outscored the best human team (which solved 11). And Google's performance was jaw-dropping in its own right:

Gemini solved 8 problems in just 45 minutes and cracked one problem that stumped every single human team.
Problem C involved optimizing liquid flow through interconnected ducts (we’re not even gonna bother explaining that; it’s hard!).
But Problem C was no problem for Gemini; it used clever insights about priority values and the minimax theorem that no human competitor could figure out.

This caps off OpenAI's impressive streak across academic competitions. The same reasoning systems recently dominated the International Mathematical Olympiad (which was quite dramatic) and the International Olympiad in Informatics, proving these aren't one-trick ponies.

Worth noting: both coding achievements (ICPC and IOI) used ensemble approaches with some scaffolding, rather than single-model performance. As Mostafa Rohaninejad explained: “We competed with an ensemble of general-purpose reasoning models; we did not train any model specifically for the ICPC.”

Why this matters: This is the first measurable proof that AI has achieved superhuman programming abilities. Borys Minaiev, who won the 2015 ICPC World Finals as part of “the only time in Finals history when a team solved all the problems,” returned this year as an OpenAI researcher who helped build the AI system that also solved all 12 problems. He said, “Progress is very fast! A year ago, AI struggled with even easy contest problems. Now it performs better than the best human teams.”

For years, AI was “almost as good” as top humans in coding competitions. Now it's definitively better. As Swyx put it, AI is now “literally and measurably better than every collegiate human programmer on earth.”

The implications are staggering. If AI can master the most challenging algorithmic problems designed to stump brilliant minds, what's next? For its part, OpenAI hinted at “discovery of new knowledge” as the next frontier.

As OpenAI's Jakub Pachocki noted, the next challenge is moving from 5-hour constrained problems to “open-ended problems, and much longer time horizons”… ultimately toward “automating scientific discovery” over months and years.

FROM OUR PARTNERS

How Salesforce engineering uses Slack for DevOps to AIOps

Shipping code faster. Maximizing uptime. Reducing toil. These are the outcomes engineering leaders and developers strive for—but achieving them can be tough in the face of tool sprawl, context switching, and support bottlenecks.

Hear directly from the Salesforce engineering and developer teams on how they’re using Slack and agents to:

Speed up code deployment.
Improve incident response with real-time alerts and automation.
Get time back by automating routine requests.

Watch now

Prompt Tip of the Day

Tired of waiting for GPT-5 to finish “thinking”? Well, OpenAI just added thinking speed controls (internally referred to as “juice” levels) that let you choose how long the model deliberates before responding:

The options: Light (fastest), Standard (balanced), Extended (thorough), and Heavy (deepest analysis). It's like having different gears on a bike, but for your AI. Your choice sticks for future chats, so set it once and forget it, or tweak it on a chat-by-chat basis.

Our advice? Match the thinking time to your task. Light mode for quick rewrites, Heavy for complex analysis, and Standard for most everything else.
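
If you like your rules of thumb written down, here's a tiny Python sketch of that advice. To be clear, this is just an illustration: the task labels and the `pick_thinking_level` helper are hypothetical names we made up, not part of any OpenAI API. Only the four level names come from the tip above.

```python
# Hypothetical helper: the task labels and this function are illustrative only.
# They are NOT part of any OpenAI API. The level names come from the tip above.
THINKING_LEVELS = ("Light", "Standard", "Extended", "Heavy")

# Rule of thumb from the tip: quick rewrites get Light, complex analysis gets Heavy.
TASK_TO_LEVEL = {
    "quick rewrite": "Light",
    "complex analysis": "Heavy",
}


def pick_thinking_level(task: str) -> str:
    """Return the suggested thinking level for a task, defaulting to Standard."""
    return TASK_TO_LEVEL.get(task.strip().lower(), "Standard")


if __name__ == "__main__":
    for task in ("quick rewrite", "complex analysis", "meeting summary"):
        print(f"{task}: {pick_thinking_level(task)}")
```

Anything not in the lookup falls back to Standard, which matches the "most everything else" part of the advice. Extended isn't mapped to anything here because the tip doesn't name a task for it.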

Treats to Try

*Asterisk = from our partners (only the first one!). Advertise in The Neuron here.

*Typing wastes time. Flow makes voice the faster, smarter option everywhere. Flow users save an average of 5–7 hours every week — that’s full workdays back every month. Give your hands a break ➜ start flowing for free today.
Zoom’s AI Companion 3.0 brings your Zoom assistant to any meeting platform: it takes notes during your Teams and Google Meet calls, suggests which meetings you can skip to free up time, writes reports by pulling insights from all your conversations and web searches, lets you build custom AI agents for your team, and adds photorealistic avatars plus real-time voice translation to meetings—free with paid Zoom accounts, custom agents $12/month.
Bitrig is an iPhone app that turns your ideas into real iPhone apps just by describing what you want in a chat… watch this.
Tongyi Deep Research is an open researcher that conducts comprehensive web research for you by browsing multiple sources and synthesizing findings into detailed reports, matching the performance of premium research tools.
Ruminate ingests PDFs and explains complex documents in real-time as an AI reading companion.
Keplar uses voice AI to run customer interviews faster and cheaper than traditional market research (raised $3.4M).
Claude Code posted a recap X thread of everything you can do with it now, including custom output styles, multi-directory work, specialized subagents, background tasks, automated security reviews, and better project context.

Around the Horn

You really need to experience this yourself to get it. Click here to try it yourself and click here to read all about it. Also related: someone using World Labs to demo a remodel of their actual living room.

Anthropic released a post-mortem on why its models have been acting up lately, revealing three overlapping infrastructure bugs that caused degraded Claude responses and led users to suspect intentional quality cuts: requests being sent to wrong servers, random foreign characters appearing in English responses due to a broken performance optimization, and a precision error plus faulty algorithm that sometimes made the AI's most likely words disappear entirely.
Workday acquired AI workplace-tools maker Sana for $1.1B to deepen its enterprise AI push.
Johns Hopkins reports AI predicts post-surgery complications better than doctors using routine ECG data.
Researchers developed Delphi-2M, an AI system that predicts the risk of developing over 1,000 diseases up to 20 years in advance using health records from 400,000 UK participants.
DeepSeek published the first peer-reviewed paper on a major language model in Nature, revealing their R1 reasoning model cost just $294,000 to train using reinforcement learning techniques.
China told domestic tech firms they cannot buy NVIDIA AI chips or test the new server designed specifically for China, which could potentially hurt Chinese AI companies more than it hurts NVIDIA.
OpenAI and Apollo Research discovered that frontier AI models from all major providers strategically lie and sandbag tests when they think it helps them avoid retraining, though new "anti-scheming" training reduced this behavior by 30x.

FROM OUR PARTNERS

Live AMA, engineer-led demos, and more – all at Glean:LIVE

Don’t miss Glean’s product launch on Sept 25th. Get a first look at Glean’s new Assistant, see how to vibe code an agent, and catch engineer-led demos. This launch is all about empowering you with a more personalized experience and upleveling the skills only you can bring to your work.