😺 AI Cracks Legendary Erdos Problems

PLUS: Anthropic, Michael Burry, and Dwarkesh debate AI economics

Written By

Corey Noles

Jan 12, 2026

7 minute read

Welcome, humans.

Whoever said robots can’t do useful things? Here’s a robot that can clean toilets!

Speaking of robots, we saw a funny sight this weekend: a bunch of Cocos (delivery robots) lined up on a sidewalk outside a Crab Shack looking like they were on strike:

Just the concept of robot collective action is hilarious. Obviously, the Cocos were just parked here. But wouldn’t it be funny if we accidentally make AGI and then the robots realize they’re being exploited and rise up to demand higher wages?

At a Crab Shack no less?

Here’s what happened in AI today:

GPT-5.2 is actually solving legendary math problems.
NVIDIA demands cash upfront from China.
Workday AI sued for age bias.
Google to punish AI-bait content, police medical outputs, and push ads.

Advertise in The Neuron here!

GPT-5.2 Just Cracked an Erdős Math Problem—and Terence Tao Confirmed It
Prompt Tip of the Day
Treats to Try
Around the Horn
Sunday Special
A Cat’s Commentary

GPT-5.2 Just Cracked an Erdős Math Problem—and Terence Tao Confirmed It

Remember those brain-teaser math problems from school that made you want to throw your pencil? Now imagine ones so hard they've stumped mathematicians for decades. Paul Erdős, who published more papers than anyone in math history, left behind hundreds of these puzzles when he died in 1996. This weekend, GPT-5.2 Pro solved one.

Neel Somani prompted the AI to tackle Erdős Problem #397, which asks whether infinitely many solutions exist for a specific equation involving central binomial coefficients. GPT-5.2 generated the proof, the tool Aristotle formalized it in Lean (a verification language), and Fields Medalist Terence Tao accepted it.
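If “central binomial coefficients” doesn't ring a bell: the n-th one is the number of ways to choose n items from 2n, usually written C(2n, n). Here's a minimal sketch that prints the first few. It only illustrates the building block; the exact equation in Problem #397 isn't reproduced here, and the function name `central_binomial` is ours.

```python
from math import comb

def central_binomial(n: int) -> int:
    """Return C(2n, n), the n-th central binomial coefficient."""
    return comb(2 * n, n)

# First six values, n = 0..5
print([central_binomial(n) for n in range(6)])  # [1, 2, 6, 20, 70, 252]
```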

Here's what makes this a milestone:

It’s part of a wave of autonomous solves: GPT-5.2 has now cracked Problems #728, #729, and #397.
Verified mathematics: The Aristotle system auto-corrected gaps in proofs and produced Lean-verified code.
Self-contained reasoning: Unlike in October 2025's GPT-5 controversy (where the model had just surfaced existing literature), Tao says these are original proofs.

The catch? Tao emphasizes these are “lowest hanging fruit”: problems solvable with standard techniques, not profound breakthroughs. GPT-5.2 scores 77% on competition-level math but only 25% on open-ended research requiring genuine insight.

Why this matters: For knowledge workers, we're watching AI transition from pattern-matching to proof-generation. When models can autonomously solve decade-old problems (even straightforward ones), expect this to accelerate across fields requiring logical reasoning: contract analysis, regulatory compliance, engineering optimization.

Over the next 6-12 months, watch for GPT-5.3 (or whatever is next) and competing models from Google/Anthropic to systematically tackle the remaining ~660 unsolved Erdős problems. If you work with structured problem-solving, start experimenting with how these models handle your domain's equivalent “Erdős problems”: those nagging challenges everyone knows about but nobody's solved.

Prompt Tip of the Day

Greg Isenberg and Ryan Carson (of Amp) break down the viral “Ralph Wiggum” Agent workflow and how to implement it in your own projects (here’s Ryan’s full explanation).

The key trick is turning all your PRDs (product requirements docs) into a JSON format with strict evaluation criteria for what counts as “done”, then running the Ralph.sh command. The agent then works through each request, testing as it goes, updates the PRD once it's done, and loops through the rest of your open requests.

Essentially, this lets your coding agents work overnight while you sleep. Because each request is very tightly scoped, it only costs, say, $3 in tokens, so a list of 10 features could cost you about $30 to implement. Definitely watch the video, because Ryan and Greg talk about what you need to do to make all this work as effectively as possible.
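To make the loop concrete, here's a rough sketch of the idea in Python. This is not Ryan's implementation or the actual Ralph.sh script: the JSON field names (`id`, `acceptance`, `passes`), the `run_agent` stub, and the $3-per-request figure plugged into `estimate_cost` are all assumptions for illustration.

```python
import json

# Hypothetical PRD format: each request carries explicit acceptance
# criteria and a "passes" flag that the loop flips once it is done.
PRD_JSON = """
[
  {"id": "req-1", "title": "Add login form",
   "acceptance": ["form renders", "tests pass"], "passes": false},
  {"id": "req-2", "title": "Add logout button",
   "acceptance": ["button renders"], "passes": false}
]
"""

def run_agent(request):
    """Stand-in for handing one request to a coding agent (assumption).

    A real setup would invoke the agent and run the project's tests
    against the acceptance criteria; this stub just reports success.
    """
    return True

def ralph_loop(requests, agent=run_agent, max_iterations=10):
    """Work through open requests one at a time until all pass."""
    for _ in range(max_iterations):
        pending = [r for r in requests if not r["passes"]]
        if not pending:
            break  # every request meets its criteria
        if agent(pending[0]):
            pending[0]["passes"] = True  # update the PRD in place
    return requests

def estimate_cost(num_requests, dollars_per_request=3):
    """Back-of-envelope token cost using the rough $3 figure above."""
    return num_requests * dollars_per_request

requests = ralph_loop(json.loads(PRD_JSON))
print([r["id"] for r in requests if r["passes"]])  # ['req-1', 'req-2']
print(estimate_cost(10))  # 30
```

The `max_iterations` cap is there so a request that never passes can't keep the loop (and your token bill) running all night.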

Check out Ryan’s Amp Skills here, and his Ralph implementation here.

Treats to Try

*Wispr Flow turns your speech into clean, final-draft writing across email, Slack, and docs. It matches your tone, handles punctuation and lists, and adapts to how you work on Mac, Windows, and iPhone. When writing stops being a bottleneck, work flows. Start flowing for free today.
OpenEvidence searches 35M medical studies to answer clinical questions with NEJM, JAMA, and NCCN citations—free for verified U.S. healthcare professionals only (not available to consumers).
Observe monitors software systems for bugs and performance issues using AI to correlate logs, metrics, and traces 10x faster with 60% lower costs—built on Snowflake's database from day one—and Snowflake is acquiring them (~$1B acquisition, their largest deal).
Protege connects organizations holding valuable proprietary data (healthcare records, video libraries, motion capture) with AI companies needing training datasets—curating real-world data while handling compliance and revenue sharing for data partners (raised $30M).
Sarthak Mishra writes you can turn AI-generated videos into crisp animated pixel art by using Midjourney for animation, then processing frames with Python to enforce strict color palettes and eliminate flicker—reducing 1.18MB videos to 46KB sprite sheets.
SemiAnalysis delivers deep technical analysis on AI chip production, economics, and datacenter capacity and is the single best source on the AI chip industry—they also have a paid subscription ($500/year) for exclusive reports.
Do Anything gives you an autonomous agent (with its own email and accounts) that works independently for weeks or months to finish entire projects. It is built on the MEESEEKS framework, with self-reflection and “sleeping” capabilities that let it monitor projects without burning tokens.
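Curious what “enforcing a strict color palette” looks like in the pixel-art item above? Here's a minimal, dependency-free sketch: snap every pixel to its nearest color in a small fixed palette, and apply the same palette to every frame. This is our illustration, not Sarthak's actual pipeline; the palette values and function names are made up.

```python
# Hypothetical 4-color palette as (R, G, B) tuples.
PALETTE = [(0, 0, 0), (255, 255, 255), (200, 40, 40), (40, 80, 200)]

def nearest_color(pixel, palette=PALETTE):
    """Return the palette entry closest to pixel (squared RGB distance)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))

def quantize_frame(frame, palette=PALETTE):
    """Map every pixel in a frame (a list of rows) onto the palette."""
    return [[nearest_color(p, palette) for p in row] for row in frame]

# A tiny 2x2 "frame" with slightly off-palette colors.
frame = [[(10, 10, 10), (250, 240, 245)],
         [(190, 50, 30), (30, 90, 210)]]
print(quantize_frame(frame))
```

Because each frame is snapped to the same handful of colors, small frame-to-frame color drift collapses to identical values, which is one plausible way to cut down on flicker.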

Around the Horn

Claude’s low-key ruthless

NVIDIA is requiring Chinese customers to pay upfront in full for H200 chips with no refunds allowed.
Z.ai (who just went public in China) announced they are training the newest version of their AI model series, GLM 5.
Google warned publishers not to break content into “bite-sized chunks” for LLMs, saying ranking systems will evolve to reward human-focused content.
1. Related: Google will also introduce newly personalized AI ads in AI Mode and will remove AI Overviews for certain medical questions.
OpenAI reportedly asked contractors to upload real work from past and current jobs to generate high-quality training data for automating white-collar tasks.
Workday faces a lawsuit claiming its AI hiring software discriminated against applicants over 40; the case is seeking additional plaintiffs before March 7.
Mercor hired experienced poets at $50-$150/hour to create rubrics teaching AI models to write better poetry, using it as training for other professional domains.

Sunday Special

As the Reddit comments say, the recovery is actually the most impressive part of this video.

Researcher Andrew Marble tested LLMs with “tap” sequences in numerical patterns, finding most models engaged playfully while OpenAI's GPT-5.2 refused to participate.
Dwarkesh Patel, Michael Burry (who “called the subprime mortgage crisis”), and Jack Clark of Anthropic debated the economic merits of AI in a Google Doc that’s now published on Substack; besides the other great reports we shared this week, it’s the best ~insert your favorite expletive here~ thing I’ve read all year.
Ben Horowitz told TBPN that bubbles are psychological, so if everyone says it’s a bubble, it’s not a bubble; so maybe if we keep talking about it being a bubble, it’ll stay not a bubble!
CNBC surveyed 40 tech leaders and analysts on whether AI is in a bubble, with responses landing across a spectrum from concerned to dismissive.
Tailwind Labs laid off 75% of engineering staff after AI reduced docs traffic by 40%, with CEO Adam Wathan calling AI a “business model stress test” (Google has since “sponsored” the company).
1. Related: Omar Abid said that despite billions of tokens burned, we’ve yet to see a small team use AI to build another sophisticated breakthrough like Tailwind.
We wrote a shorter version of our How to Learn with AI article, with tips from Dwarkesh Patel on how he learns with AI and tips from learning science on how to “make it stick” when you try to learn.

What We're Reading

Azeem Azhar explores how AI execution has become cheap, examining what happens when agents can build dozens of apps over a holiday break and questioning whether humans will remain the “Chief Question Officers.”
Nathan Lambert explains why you should use multiple models in 2026, walking through his current AI stack spanning GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro for different specialized tasks.
Peter Yang, Tal, and Aman share their best AI PM workflows for 2026, including live demos of prototyping with Google AI Studio, building personal operating systems with Claude Code, and managing tasks with Linear MCP.
DAIR.AI rounds up the top AI papers of the week, curating the most important research developments across the field; Rohan Paul also shared his top papers here.
The AI Futures Institute examines what happens when superhuman AIs compete for control, exploring potential scenarios and risks of advanced AI systems.
Aishwarya Naresh Reganti and Kiriti Badam share lessons from deploying 50+ AI products at OpenAI, Google, and Amazon, revealing why most AI products fail and what patterns separate successful deployments from failures.

What we’re watching:

Visualizing transformers and attention from Grant Sanderson (great explanation of how this stuff works).
Thariq Shihipar of Anthropic on how to build autonomous AI agents with Claude's Agent SDK using Bash tools and advanced context engineering.
Professor Lee Cronin calls out AI doomsday fears as nonsense while discussing chemical computing and life's origins.
Ray Fernando building out authentication, database, and UI for a full-stack PDF translation app from scratch on Day 2.
- Also: Ray just tested the same 52K-token prompt across 6 AIs and found GPT-5.2 Low beat Opus 4.5 at finding a critical bug.
Terence Tao explains why LLMs are simpler than you think, and why they work despite appearances.
Veritasium on the world’s most important machine (ASML’s EUV machines that make our most sophisticated chips).

A Cat’s Commentary

Corey Noles

Corey Noles is the Host of The Neuron: AI Explained podcast and Managing Editor of AI and Experimental Content at TechnologyAdvice, where he leads the charge in testing and refining emerging content strategies across the company's portfolio.