😺 Somebody in AI won a Math Gold medal... but was it OpenAI or Google?

PLUS: GPT-5 in the wild + best AI agents right now
July 21, 2025

Welcome, humans.

NGL y’all, we had to stare REAL hard at this video (for at least a minute there) before our brains accepted this was AI:

TBH, we almost saved it for a Thursday Trivia, but we just knew you wouldn’t accept it as real given how ludicrous it is (and trust us, we TRIED).

In fact, our search for any footage of another animal riding an alligator led us to this “animal expert” Nunchakusdragon, who reacts to clearly fake AI videos (like this one) as if they’re real… with a straight face, no less. Pretty hilarious stuff.

Here’s what you need to know about AI today:

  • OpenAI and/or Google won gold at the International Math Olympiad.
  • Meta won’t sign EU AI Code of Conduct pledge.
  • The US AI Czar’s VC firm invested in AI for gov contractors.
  • An Argentinian show was the first to use genAI on Netflix.

Advertise in The Neuron here.

OpenAI OR Google (or both) just achieved Gold in the International Math Olympiad… but who?

DEEP DIVE: Read our full breakdown of what happened and its implications here.

Over the weekend, OpenAI claimed a new, experimental model just achieved what many considered a far-off dream for AI: gold medal-level performance at the 2025 International Mathematical Olympiad (IMO), the world’s most prestigious and difficult math competition for high school students.

This new AI solved 5 of 6 problems under the same grueling conditions as human contestants—no internet, no tools, just hours of pure reasoning to produce multi-page proofs.

But then again, remember when we said “OpenAI claimed”? It turns out, this crucial victory is not as straightforward as it seems. In fact, Google DeepMind’s head of reasoning suggested OpenAI actually only won silver (if it won anything at all)… and rumor has it, Google might have achieved its own gold.

So why would this be a big deal if it’s true? Just days ago, a public leaderboard showed that models like Gemini 2.5 Pro and o3 couldn't even score a bronze on the same test. This new model, which OpenAI says won't be public for months, is a different beast entirely.

OpenAI’s alleged breakthrough comes from new techniques in reinforcement learning and a major shift in approach—letting the model “think” for hours on a single problem, a huge jump from the seconds or minutes previous models used.

This video from Wes Roth does a great job of recapping the news and why this is a big deal. Wes explained that the model used a limited vocabulary to reduce the tokens needed for reasoning that, uh, kinda reminded him of Kevin from The Office…

Most importantly, unlike Google’s AlphaProof system that won the company a silver medal in last year’s Olympiad, OpenAI’s model is a general language model, not a custom-math solver.

As Wes Roth explains in his recap, this boils down to the difference between narrow AI and general AI.

  • Narrow AI is a system engineered to be superhuman at one specific task.
  • General AI is a single system that can reason and learn across a wide range of different domains, much like a human.
  • The breakthrough here is that OpenAI claims to have reached the pinnacle of a highly specialized, human-only domain using their general-purpose system.

But this new math model is just one facet of a much bigger story. OpenAI's researchers revealed that the same underlying AI system that trained this model is powering a whole suite of new capabilities, including:

  • A new coding model, o3-alpha, which is absolutely dominating the WebDev Arena. Testers say it can one-shot prompts to create clones of GTA and Minecraft.
  • The model that placed second at the AtCoder World Tour Finals, a grueling 10-hour coding marathon (it was narrowly beaten by a human programmer from Poland).
  • The new general-purpose agent that can use a computer like a human, which we covered last week.

Of course, the skeptics are out in full force. On the same day as the announcement, a research paper called VAR-MATH made headlines for claiming that AIs “cheat” on math tests by memorizing patterns.

But here's what the headlines missed: the paper's own data showed that top-tier models like DeepSeek-R1 and OpenAI's o4-mini were largely immune to this flaw, demonstrating real generalization.

Fields Medalist Terence Tao also urged caution, pointing out that without a pre-disclosed methodology, it's hard to make an apples-to-apples comparison between human and AI performance. Plus, he noted that massive compute power is like giving an AI a “time acceleration machine” (for the anime fans out there, think of the Hyperbolic Time Chamber in Dragon Ball Z).

Lastly, and most importantly, OpenAI DIDN’T work with the IMO to verify its results. This is according to Mikhail Samin, who spilled the tea on X. Worse yet, the IMO asked AI companies to wait a week before announcing their results (so as not to steal thunder from the human high schoolers competing).

Speaking of The Office… because OpenAI couldn’t contain itself, the IMO was “happy” to leak that it found OpenAI pulling an “I am the Hay King” move SUPER RUDE.

Even more consequential, Thang Luong of DeepMind tweeted that to make a true medal claim, you need to be evaluated according to the official marking guidelines from the IMO. According to him, because OpenAI lost a point on question 6, it’s a silver medal win… not gold. Keep in mind, this take is coming from a competitor… so we’ll have to wait for the official statement from the IMO to come to a final conclusion.

FROM OUR PARTNERS

Elastic x IBM watsonx: Rethinking conversational search

Want to learn how IBM watsonx can think with your company’s mind? Read how Elastic and IBM have partnered to deliver retrieval augmented generation capabilities that are seamlessly integrated into IBM watsonx Assistant’s new Conversational Search feature—and grounded in your business’s proprietary data.

Prompt tip of the day

Tired of prompting AI with words? Try this:

“Transcribe this image exactly as it appears, then solve. Ask me follow up questions if what you need to do is unclear.”

All you have to do is take a screenshot of your problem, be it an error message, instructions, a confusing web page, or any other task that can be encapsulated in a single screenshot, and attach the screenshot with the above prompt.

The prompt will have the AI transcribe exactly what the screenshot shows (which is helpful to ensure the AI is actually “seeing” what you intend it to see), then solve the issue.
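For the more technical readers: if you’d rather script this trick than paste into a chat window, here’s a minimal Python sketch using OpenAI’s official SDK. Consider it a hedged example, not the one true way: the model name ("gpt-4o") and the file name ("screenshot.png") are our own placeholder assumptions, so swap in whatever you actually use.

import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Read the screenshot and base64-encode it so it can be sent inline.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model works here
    messages=[{
        "role": "user",
        "content": [
            # The exact prompt from above: transcribe first, then solve.
            {"type": "text", "text": (
                "Transcribe this image exactly as it appears, then solve. "
                "Ask me follow up questions if what you need to do is unclear."
            )},
            # Attach the screenshot as an inline data URL.
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)

The transcription half doubles as a sanity check: if the printed output misreads your screenshot, you know not to trust the “solve” half.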

Remember that giving the AI as much context as possible is the best way to get it to do what you want it to do. In many cases, this means you’ll need to provide more context to get it to help you, whether that’s screenshots or additional explanations of what’s on screen, like “this is what I’m seeing, what do I do now?”

Want more like this? Our Prompt Tip of the Day Digest for July is here!

P.S: Completely new to AI? Start here!

Treats To Try.

*Asterisk = from our partners. Advertise in The Neuron here.

  1. *Guidde turns your screen recordings into professional video tutorials with AI-generated step-by-step narration and voiceover in 100+ languages.
  2. Amazon’s AgentCore deploys your AI agents at scale without you building the infrastructure yourself (and you get pre-built agents from their Marketplace)—try it here.
  3. Findmypapers from bycloud helps you easily find, discover & explain AI research papers—free to try.
  4. ARC-AGI released the first three games in its new dataset, which aims to test AI’s ability to generalize and do tasks that are easy for humans.
  5. Watch this video on how to identify AI-generated imagery…it might help save you from online scams that are on the rise right now.
  6. Claude Code is being used, not just for coding, but as a general-purpose desktop agent—here’s a step-by-step guide on how to use it and some awesome examples of the creative ways people have applied it.
  7. Agent is OpenAI’s new hybrid model that uses its own computer to complete tasks for you—browsing sites, making presentations, and even spinning up other agents (Pro users get 400 messages/month, Plus and Team users get 40 messages/month when it rolls out this week).

See our top 51 AI Tools for Business here!

Around the Horn.

  • Meta refused to sign the EU’s new AI code of conduct, more or less saying it was an overreach that would stunt growth due to the number of legal uncertainties it creates for AI model developers.
  • Craft Ventures, the VC firm from current US AI Czar David Sacks, invested $22M in a startup called Vultron that creates AI tools for federal contractors; while some question the conflict of interest in this deal, Sacks previously divested his crypto and AI holdings.
  • Netflix’s new Argentinian series El Eternauta is the first on the platform to include generative AI footage.

For the latest AI deep dives, check out our Explainer articles here!

Monday Meme

A Cat's Commentary.

What’s your excuse?


See you cool cats on X!

Get your brand in front of 500,000+ professionals here