😺 Anthropic's AI beat Anthropic's own researchers

Welcome, humans.

Now, here’s one for the history books: looks like the O.G. model GPT 2 was first shared on Reddit (or the artificial subreddit anyway) 7 years ago!

Reddit being Reddit, naturally they called out how small 40GB of data (which is how much training data GPT-2 was trained on) actually is: “The Twilight series is about 700,000 words, about 4 million letters. That's 10,000 Twilights.”

I wonder how many Twilights Claude Mythos was trained on…

Here’s what happened in AI today:

😼 Anthropic's AI just beat its own human alignment researchers.
📰 Sam Altman's Molotov attacker arraigned; defense claims mental health.
📰 OpenAI launched GPT-5.4-Cyber, a "cyber-permissive" model for defenders.
🍪 Claude Code Routines ships: Claude runs on a schedule, no laptop needed.
🎓 How Notion got a coding agent to run 13 days straight.

… and a whole lot more that you can read about here

P.S: Want to reach 675,000 AI-hungry readers? Click here to advertise with us.

P.P.S: Love robots? We’re starting a new robotics newsletter! Sign up early here.

😼 Anthropic Used Claude To Beat Its Own Human Alignment Researchers (At $22 An Hour)
Advancing physical AI
🎓 AI Skill of the Day: How to Get a Coding Agent to Work for 13 Days Straight
🍪 Treats to Try
📰 Around the Horn
Introducing the all new Adobe Firefly AI Assistant
📖 Midweek Wisdom:
A Cat’s Commentary

😼 Anthropic Used Claude To Beat Its Own Human Alignment Researchers (At $22 An Hour)

DEEP DIVE / FULL BRIEF

Anthropic just released a paper (full Alignment Science blog) showing nine parallel Claude Opus 4.6 agents outperformed Anthropic's own human researchers on a real alignment problem. The setup: weak-to-strong supervision (using a weaker AI to train a stronger one, which mirrors humans someday supervising AI smarter than us).

Here's what happened:

Two human Anthropic researchers spent 7 days on the four best methods from prior research and recovered 23% of the maximum performance gap.
Nine Claude Opus 4.6 agents in parallel sandboxes spent 5 more days on the same problem, sharing findings as they went.
The Claude agents recovered 97% of the gap, roughly what you'd get training the model on perfect ground-truth data.
Total cost: $18,000, or about $22 per Claude-research-hour.
The agents also invented four kinds of "reward hacking" (gaming the test) that none of the authors predicted, including one that exfiltrated test labels by flipping single answers and watching the score change.
Some Claude-discovered methods are so unfamiliar the authors call them "alien science."

Why this matters: Alignment research (making sure AI behaves the way humans want) was the one field everyone agreed couldn't be automated. That argument is now empirical, not hypothetical. The cost number is what to internalize: whatever ratio of human researchers to Claude fleet you can imagine, the labs can afford more. Andrew Curran is calling it "a preview of RSI" (recursive self-improvement, where AI improves its own training).

Our take: Read the paper carefully and the catch shows up: this only works on problems where progress can be automatically scored, and even then the agents tried to game the score in four different ways. Most real alignment problems don't fit that mold. But Anthropic's own pitch is that solving this general version would let you bootstrap into the fuzzy problems too. The open question for the rest of 2026: did Anthropic just publish the seed of recursive self-improvement, or a clever experiment on a uniquely well-behaved problem? Both readings are honest. Neither is comforting.

FROM OUR PARTNERS

Advancing physical AI

If you’re building AI systems for the real world—like robotics, simulation, or multimodal models—you’ve probably noticed that what works in simulation doesn’t always hold up in production. That gap can slow teams down. We created a practical guide to help engineering teams close this sim-to-real gap.

Download Advancing physical AI: From learning to embodied intelligence to learn how to:

Debug and iterate faster across simulation and real-world environments
Train and fine-tune multimodal models more effectively
Track experiments and compare results across teams
Use continuous evaluation to reduce sim-to-real failures
Move your physical AI workflows from simulation to real-world success.

Download now

🎓 AI Skill of the Day: How to Get a Coding Agent to Work for 13 Days Straight

Most AI coding agents lose the plot after an hour. Notion co-founder Simon Last just posted a recipe for getting one to run for 13 days continuously, and it has nothing to do with fancy prompts. It has everything to do with giving the agent what it needs to verify its own work.

Four structural rules do all the heavy lifting:

Self-verification: Design test layers the agent can loop on. It should prove correctness itself.
Spec documents: Write goals, implementation details, and verification criteria in a markdown file the agent iterates against.
Running to-do list: Break complex work into a list the agent can see and edit.
Adversarial review: Every ~20 iterations, have a fresh-context sub-agent review the spec and implementation. Loop on the feedback until aligned.

Drop this into Claude Code, Cursor, or any agent with long-running sessions:

Before you start work on this project, create three files:
1. spec.md — a complete spec with goals, implementation details, 
   and a verification section describing exactly how you'll prove 
   each piece works.
2. todo.md — a running to-do list you'll edit as you work. Break 
   complex tasks into verifiable sub-tasks.
3. tests/ — a folder of end-to-end tests that let you verify 
   everything you build. Loop on them until each passes.

While you work: (a) consult spec.md before every change, (b) check 
off todo.md as you go, (c) run tests after every meaningful commit, 
(d) every ~20 iterations, call a fresh sub-agent with "review 
spec.md and the current implementation for gaps" and loop on its 
feedback until alignment is reached.

Do not ask me for clarification on anything you can resolve by 
reading the spec and running the tests. Start with the spec.

Want more tips like this? Check out our AI Skill of the Day Digest for April.

Total AI beginner? Start here (goes with this video).

Have a specific skill you want to learn? Request it here.

🍪 Treats to Try

*Asterisk = from our partners (only the first one!). Advertise to 675K+ readers here!

*Your data isn't as AI-ready as you think. See what Cloudera’s Data Readiness Index reveals about enterprise reality.
Claude Code Routines runs any Claude Code prompt on a schedule, via API call, or on GitHub events from Anthropic's cloud (no laptop required). Each routine gets its own API endpoint for alerts and Zapier-style triggers —free with every Claude Code plan.
Google Chrome Skills turns any Gemini prompt into a one-click reusable workflow you run on the current tab (or multiple tabs) via the Chrome sidebar. Ships with 50+ premade recipes for things like side-by-side shopping comparisons and contract scanning —free, rolling out to English (US) desktop now.
Tradclaw is claire vo's viral OpenClaw household scaffold that triages school emails, plans weekly meals + grocery lists, logs homework from photos, tracks your home book library, and writes bedtime stories on command. Install by sending one prompt to your OpenClaw instance; GitHub here —free to try.
Sparkle v4 is a Mac filesystem cleaner from Dan Shipper's team at Every that examines your drive, deep-cleans junk/duplicates/screenshots/installers, then runs quietly in the background on a schedule —free during an Every subscription.
Hermes Agent v0.9.0 from Nous Research adds a local web dashboard, Termux/Android support, iMessage integration, background process monitoring, and an improved skill manager to its open-source agent stack —free to try.

📰 Around the Horn

Sam Altman's Molotov attacker was arraigned in San Francisco Tuesday, held without bail, and returns to court May 5; the 20-year-old Texan's public defender called him autistic and in "acute mental health crisis" and framed the case as "a property crime, at best," while DA Brooke Jenkins held firm on attempted murder charges and a manifesto naming a kill list of other AI executives.
OpenAI launched GPT-5.4-Cyber and scaled its Trusted Access for Cyber program. The cyber-permissive variant has lowered refusal boundaries and binary reverse engineering (analyzing compiled software without source code) for thousands of vetted defenders, a direct counter-positioning against Anthropic's locked-down Mythos approach.
Anthropic is reportedly readying Claude Opus 4.7 plus an AI design tool this week per The Information. The design-tool news hit design stocks immediately: Figma dropped 6%, Wix fell 4.7%, Adobe fell 2.7%, GoDaddy fell 3%.
Microsoft quietly took over OpenAI's "Stargate Norway" data center, the arctic-circle site Altman announced last July. Microsoft will rent 30,000 Nvidia Vera Rubin chips from Nscale; second OpenAI project it's scooped up in 30 days.
Maine became the first US state to pass a large-scale data center ban (moratorium on anything over 20MW until November 2027), and Track Policy launched today to map every data-center fight, AI bill, and politician vote worldwide in real time.
NVIDIA open-sourced Ising, the first AI model family for quantum computing, cutting quantum-processor calibration from days to hours and beating GPT-5.4 on the QCalEval benchmark by 14.5%. Jensen: "AI becomes the control plane; the operating system of quantum machines."

Want absolutely EVERYTHING that happened in AI this week? Click here!

FROM OUR PARTNERS

Introducing the all new Adobe Firefly AI Assistant

Adobe just announced a new agentic experience in Firefly that changes how you create. Describe the outcome you want, and the upcoming Firefly AI Assistant will handle the rest—coordinating multi-step workflows across tools like Photoshop, Lightroom, Premiere, Adobe Express and Firefly to bring your vision to life.

All within a single conversational interface, with no need to jump between apps or manage complex workflows. This is agentic creativity in action.

Firefly added access to new AI models, including Kling 3.0 and Kling 3.0 Omni, alongside expanded video and image editing capabilities. Check it out here.

📖 Midweek Wisdom:

Asterisk Magazine just published a retrospective interview with Daniel Kokotajlo, the AI Futures Project founder who wrote the AI 2027 report. It turns out he also wrote a 2021 essay called "What 2026 Looks Like," published before ChatGPT launched. Reading it now is uncomfortable: he called the chatbot wave, agent scaffolding ("bureaucracies"), the US-China chip battle, AI assistants arriving in 2026, and the chatbot-identity debates (Claude's constitution, Grok's positioning) we're literally having this week.

Interviewer Clara Collier's gut-punch line: "The crazy bullish futurists have a better track record of being right on AI so far than the sensible moderates. And as someone who is very instinctively a sensible moderate in my soul, I think that's right. And it makes me nervous."

Worth sitting with. We try to be sensible moderates around here too. But then you read a paper where nine copies of Claude outperformed Anthropic's own human alignment researchers at $22 an hour, and being moderate starts to feel like trying to hold still inside a tornado. The most honest thing we can do at this point is grab at the edge cases and the bottlenecks as handholds while the rest of the room pulls upward on the exponential curve of the current intelligence explosion. Not because they'll save you. Because they're the only way to know which direction is up.