😸 OpenAI solved 5 of 10 "impossible" problems


Welcome, humans.

In honor of Valentine’s Day yesterday, there’s apparently a nightlife spot that opened up in Hell’s Kitchen, NY, that’s dedicated entirely to humans with AI companions so they can go out in person.

Now, the ironic part would be if two of the humans on dates with their AIs lock eyes and fall for each other instead. It’d be like that scene from When Harry Met Sally where they each go on a double date setting the other up with their friends Marie and Jess, only for Marie and Jess to fall for each other instead.

A more modern reference: this gives big Opalite music video vibes… where lonely Domhnall Gleeson and T-Swift meet on separate dates with their pet rock / cactus; then the rock and cactus end up together! Only in this version, I don’t think you’re gonna let your iPhones run off together… unless…

…you’re both rocking one of these bad boys!

Here’s what happened in AI today:

OpenAI's unreleased model solved at least 5 of 10 research-level math problems no AI had seen, and GPT-5.2 produced a verified physics breakthrough.
India reached 100M weekly ChatGPT users and approved a $1.1B state-backed AI VC fund.
Anthropic's Super Bowl ads drove Claude to No. 7 on the App Store with 148K downloads in three days.
OpenAI quietly removed "safely" and its "openly share" commitment from its IRS mission filings over the years.

OpenAI Claims Its Unreleased Model Solved Over Half the Hardest AI Math Test Ever Created
Prompt Tip of the Day
Treats to Try
Around the Horn
Sunday Special
A Cat’s Commentary

OpenAI Claims Its Unreleased Model Solved Over Half the Hardest AI Math Test Ever Created

Remember when AI winning a math olympiad felt like a big deal? Eleven of the world's top mathematicians (including a Fields Medalist) decided that was child's play. So they created First Proof, a set of 10 unpublished, research-level math problems pulled straight from their own work, and gave AI one week to solve them.

The catch: none of these problems had ever appeared on the internet. No training data shortcuts. No pattern matching. Just raw mathematical reasoning.

OpenAI's chief scientist Jakub Pachocki says an internal, unreleased model likely solved at least 5 of the 10 problems (originally claimed 6, but walked one back). For context, publicly available AI models like ChatGPT and Gemini could only solve 2. Somewhere, a tenured math professor is nervously refreshing their LinkedIn.

Here's what makes this significant:

The problems were genuinely hard. They spanned 10 different subfields, from algebraic topology to symplectic geometry, and each one took the mathematicians who wrote them weeks to months to solve.
There was real human oversight, but limited. OpenAI didn't feed the model proof strategies. They did have experts review outputs and asked the model to expand on some answers.
It all happened in one week. Pachocki called it a "chaotic sprint" and said the methodology "leaves a lot to be desired." Translation: imagine what happens when they actually try.

And that wasn't even OpenAI's only flex that day. On the same day, February 13th, they published a physics preprint where GPT-5.2 proposed a formula for gluon particle interactions that physicists had assumed for decades were impossible. Harvard and Cambridge researchers verified it. A UC Santa Barbara professor called it "journal-level research advancing the frontiers of theoretical physics."

As Ethan Mollick put it: the shift from "AI can't do science" to "of course AI does science" will follow the same pattern as every other AI transition. First the hype, then the skeptics, then the quiet adoption... then the breakthroughs start falling like dominoes.

The next round of First Proof problems drops March 14. And this time, OpenAI won't be caught off guard.

FROM OUR PARTNERS

Want to get the most out of ChatGPT?

ChatGPT is a superpower if you know how to use it correctly.

Discover how HubSpot's guide to AI can elevate both your productivity and creativity to get more things done.

Learn to automate tasks, enhance decision-making, and foster innovation with the power of AI.

Download the free guide

Prompt Tip of the Day

Both OpenAI and Google Gemini just launched support for "Skills"—reusable instruction playbooks that make AI agents way more reliable at complex tasks. OpenAI also shipped Shell (agents can now use a real computer) and Compaction (agents can work for hours without hitting memory limits).

Now get this: the developer tips from OpenAI's blog work for your everyday prompts too:

Say when NOT to. Don't just tell ChatGPT what you want—tell it what to avoid. Glean saw a 20% accuracy drop without negative examples, then full recovery once they added them.
Include a template. Show the AI what a good output looks like. OpenAI says this drove their biggest quality gains.
Be explicit, not clever. If you want a specific format, say so directly. OpenAI calls this "the simplest reliability lever you can pull."

Our favorite insight: Next time ChatGPT gives you something off, try adding: "Do NOT include [thing you keep getting]." It's simple, but it's the same principle powering enterprise-grade AI agents right now.
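If you reuse the same prompt a lot, the three tips above can be baked into a tiny helper. This is only a rough sketch of the idea, not anything from OpenAI's blog: the `build_prompt` function, its parameter names, and the sample task are all made up for illustration.

```python
def build_prompt(task, avoid=(), template=None):
    """Assemble one prompt from a task, things to avoid, and an example format.

    Hypothetical helper for illustration only; not an OpenAI API.
    """
    parts = [task.strip()]
    # "Say when NOT to": one explicit negative instruction per item.
    for item in avoid:
        parts.append(f"Do NOT include {item}.")
    # "Include a template" + "Be explicit": state the exact format you want.
    if template:
        parts.append("Use exactly this format:\n" + template.strip())
    # Blank lines between sections keep the prompt easy to read.
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Summarize this meeting transcript for my team.",
    avoid=["a greeting or sign-off", "more than five bullet points"],
    template="Decisions:\n- ...\nAction items:\n- owner: task",
)
print(prompt)
```

Paste the printed text into ChatGPT (or any other chatbot) ahead of your transcript. Swap in whatever you keep getting that you don't want.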

Want more tips like this? Check out our Prompt Tip of the Day Digest for February.

Treats to Try

*Asterisk = from our partners (only the first one!). Advertise to 600K readers here!

*Privacy Virtual Cards mask your real card info, set spending limits, and keep AI agent costs in check.
Willow Voice turns your voice into formatted, typo-free text anywhere on your computer — now with developer features that auto-tag files and recognize variables in Cursor and AI IDEs, so you can prompt 4x faster by talking instead of typing — free to try.
Lums lets you chat with your finances in plain English, predicts your cash balance 14 days out, and sends smart alerts like "your paycheck arrives in 3 days" or “you're $107 under grocery budget.”
Gro automates your LinkedIn outreach by finding verified leads, personalizing messages at scale, and flagging who's most likely to buy.
Cline CLI lets you run the same Cline coding agent from your terminal, with support for parallel sessions and CI/CD pipelines... free and open source.
Sway records your ideas, conversations, and meetings, then auto-generates summaries, key points, to-dos, and titles so you can think freely without manually sorting notes—free to try.
Inspector connects your coding agent (Claude Code, Cursor, or Codex) to a visual editor where you can move elements, edit text, and click-to-code while changes save directly to your local codebase, available for macOS.

Around the Horn

Anthropic's Super Bowl ads pushed Claude to No. 7 on the App Store (highest ever), with an 11% daily user boost and 148,000 downloads in three days. Separately, they partnered with CodePath to put Claude in coding courses at hundreds of colleges.
India now has 100 million weekly active ChatGPT users, making it OpenAI's second-largest market, Sam Altman said. Meanwhile, India approved a $1.1B state-backed VC fund for AI and deep-tech startups.
Simon Willison tracked OpenAI's IRS tax filings from 2016 to 2024 and found they quietly deleted "safely," all mentions of financial restraint, and their commitment to "openly share" from the mission statement.
Airbnb said AI now handles a third of its North American customer support and plans to go global; 80% of its engineers use AI tools daily.
The White House pressured a Utah lawmaker to kill an AI transparency bill. Separately, Dr. Oz proposed replacing rural health workers with AI avatars, drawing sharp criticism.
A new study tested AI on 240 real paid Upwork jobs and found it succeeded in only 3.75% of cases at best.
OpenAI retired its most "seductive" chatbot personality right before Valentine's Day, leaving users angry and grieving; but then again, there’s still the spicy model reportedly coming soon…

FROM OUR PARTNERS

AI in HR? It’s happening now.

Deel's free 2026 trends report cuts through all the hype and lays out what HR teams can really expect in 2026. You’ll learn about the shifts happening now, the skill gaps you can't ignore, and resilience strategies that aren't just buzzwords. Plus you’ll get a practical toolkit that helps you implement it all without another costly and time-consuming transformation project.

Get the free report today.

Sunday Special

Okay, OpenClaw laggards and security industry folks, this one’s for you: if you’re OpenClaw curious, let’s say, read the below.

This dude at ZioSec is giving all the deets on how to use OpenClaw from a security perspective, covering:
- The Security Blueprint: A deep dive into mitigating the "infinite attack surface" of OpenClaw by running it inside an isolated Virtual Machine (VM) with a strict firewall (Lulu), no admin (sudo) privileges, and dedicated user accounts to prevent full system compromise.
  - Want to apply this? Give your smartest thinking AI the OpenClaw docs link and ask it to walk you through this advice step by step.
- Infrastructure & Hosting: A comparison between hosting locally on a Mac Mini versus a Cloud VPS (Cloudflare), concluding that while the Mac wins on cost and model flexibility, Cloudflare offers superior "zero trust" security out of the box (other options include Hostinger, Digital Ocean, and Railway).
- Autonomous Teams: What this looks like in action = "Jerry," the AI employee created via OpenClaw who autonomously hired specialized agents and migrated their workflow from Notion to Slack for better real-time collaboration.
- Related: This IBM OpenClaw / Claude Opus 4.6 Enterprise Risk panel discussion on the rise of "Shadow AI" where employees use unapproved agents like OpenClaw (our favorite quote: “don’t say no, say how”), and whether the "move fast and break things" era has led to a security crisis (because, y’know, you’re breaking things).