😸 Anthropic launches Claude Opus 4.5

PLUS: We're going LIVE with Artificial Analysis at 10am PST / 12pm CST

Welcome, humans.

At 10am PST / 12pm CST today, we're going LIVE with Artificial Analysis’ co-founders to talk all things new models (Gemini 3, Claude Opus 4.5, Kimi K2), the state of AI, and so much more.

*Click above and select "notify me" on Youtube if you’re early!*

We'll cover their Q3 State of AI report, which models enterprises actually use in production (spoiler: not always the headline-makers), their benchmarking methodology, latest release hot takes, how they use AI to build their company, and a live Q&A. It’s a holiday week ppl… skip work and come learn about cutting edge AI in 2025/2026!

ICYMI: we just dropped two fantastic new podcast episodes: first, our exclusive interview with Microsoft’s Scott Guthrie on their new Fairwater AI data centers: Earth's densest GPU (AI chip) concentration, featuring liquid-cooled Grace Blackwell chips, a two-story design minimizing latency, and powered by 100% renewable energy.

Scott covers Microsoft's AI infrastructure, the new Agent 365 platform for governing AI at scale, and their multi-model AI strategy, including their new expanded Anthropic partnership bringing Claude to GitHub, M365, and Azure (more on Claude below).

Second, David Hsu of Retool, the Fortune 500’s internal software tool where 48% of non-engineers now ship production code (if you want one of the easiest to use platforms to build software for your company, Retool is def worth checking out).

P.S: Special shout out to the sponsors of these two videos, Guru and Dell!

Here’s what happened in AI today:

Anthropic released Claude Opus 4.5, which topped SWE-bench Verified at 80.9%.
OpenAI launched shopping research in ChatGPT with nearly unlimited holiday usage.
Amazon announced a $15B investment in Indiana data centers.
US President Trump launched the “Genesis Mission” AI project.

Advertise in The Neuron here!

Anthropic Just Dropped Claude Opus 4.5—And It Outperformed Every Human Candidate on a Coding Exam

DEEP DIVE: Everything to know about Claude 4.5 Opus

Ever waited forever for an AI to finish “thinking“ only to get a so-so answer? Anthropic's new flagship model, Claude Opus 4.5, flips that script: it's smarter AND faster than its predecessors.

The big news: Opus 4.5 is now the top-performing AI for coding, agents, and computer use—at least according to Anthropic's benchmarks. The model hit 80.9% on SWE-bench Verified, the gold standard for real-world software engineering tasks, edging out GPT-5.1 (76.3%) and Gemini 3 Pro (76.2%).

Check this out: Anthropic gives prospective engineers a notoriously difficult take-home exam. Within the 2-hour time limit, Opus 4.5 scored higher than any human candidate ever has. Let that sink in.

Here’s what makes Opus 4.5 different:

Efficiency gains are massive. A new “effort“ parameter lets developers choose between speed and intelligence. At medium effort, Opus 4.5 matches Sonnet 4.5's best scores using 76% fewer tokens. At full throttle, it beats Sonnet by 4.3 percentage points while still using 48% fewer tokens.
It's the hardest model to jailbreak. Anthropic claims Opus 4.5 is more resistant to prompt injection attacks than any frontier model in the industry. (Gray Swan's independent testing backs this up.)
It finds creative loopholes. During testing, the model discovered a clever workaround in an airline policy simulation—upgrading a passenger's cabin class first, then modifying flights, which technically followed the letter of the law while circumventing its intent. Helpful? Sure. Also a heads-up for anyone writing AI policies.

Also, the most important part: Pricing dropped to $5/$25 per million tokens (input/output), making Opus-level capabilities way more accessible. Oh, and long conversations no longer hit a wall; Claude automatically compresses earlier context so chats can keep going indefinitely. This was a much needed feature… super pumped!

Also, you can now use Claude Code on desktop; this is big if terminals scare you.

Here’s a few demos of Opus in action:

Here’s Claude Code solving a puzzle game.
Here’s Claude Code vs Gemini 3.0 in voxel art.
Dan Shipper vibe-coded an iOS app over the weekend without endless errors.

Also, def check out Anthropic’s Sholto Douglas’ interview on TBPN (starts at 31:32 to 1:04:47) for more insights; they go deep on whether or not scaling still has legs (spoiler = Sholto’s bullish), if the big labs are holding back (and when they are, why), and Sholto’s reveal that in fact that he, Dwarkesh, and Dylan Patel are roomies. Whoa.

Why it matters: If Anthropic's benchmarks hold up in the real world, Opus 4.5 represents a genuine step-change in what AI can do autonomously. Expect developers to start building more ambitious agentic applications, and expect competitors to respond fast. You can try it now at Claude.ai or via the API with the model string claude-opus-4-5-20251101.

FROM OUR PARTNERS

The Battle Against Bots: How to Protect Your AI App

Modern bots are smarter than ever. They execute JavaScript, store cookies, rotate IPs, and even solve CAPTCHAs with AI. As attacks grow more sophisticated, traditional detection isn’t enough.

Enter WorkOS Radar: your all-in-one bot defense solution. With a single API, you can instantly secure your signup flow by blocking bots, preventing brute force attacks, and catching free trial abuse.

Stop threats before they start. Keep real users flowing through.

Protect your app today →

Prompt Tip of the Day

OpenAI just officially launched “shopping research” in ChatGPT, a mode that asks clarifying questions, crawls the web, and then returns a personalized buyer’s guide, rolling out with nearly unlimited usage across Free, Go, Plus, and Pro through the holidays. Try the ChatGPT Shopping Researcher here.

Now get this: Google also just dropped agentic shopping tools that can track prices, auto-buy when deals hit, and even call local stores for you.

The smartest Black Friday strategy? Use them together.

ChatGPT = your research analyst. Start with full-context prompts like: "I need gifts under $75 for my WFH gamer brother and my tech-averse mom who loves gardening—Black Friday timing, fast shipping." Let it interview you and build a ranked buyer's guide.
Google = your deal hunter. AI Mode sits on 50B+ product listings updated hourly. Paste your shortlist and ask: "Compare these three items for current prices, promos, and same-day pickup near [ZIP]."

The killer combo: Set Google's agentic checkout to auto-buy big-ticket items when prices drop, then run your final cart through ChatGPT: "Given these items and Black Friday pricing historically, anything I should swap or skip?"

TL;DR: ChatGPT picks the right product. Google gets you the right price.

P.S: Here’s a quick comparison piece we wrote between the two tools, if you want to dive a bit deeper.

Treats to Try

*Asterisk = from our partners (only the first one!). Advertise to 600K daily readers here!

*Wispr Flow turns your speech into clean, final-draft writing across email, Slack, and docs. It matches your tone, handles punctuation and lists, and adapts to how you work on Mac, Windows, and iPhone. Start flowing for free today.
Amp Free writes, edits, and debugs code for you directly inside VS Code, Cursor, or Windsurf for free (supported by ads), handling everything from fixing bugs to building entire app features while you supervise the work.
Accent Guru is a fast, free, browser-based English accent test from Fluently that listens to ~30 seconds of your speech, classifies your likely accent, and flags pronunciation issues that might be hurting clarity on calls.
Palo is an AI copilot for short-form creators that ingests your entire video catalog, surfaces which hooks and formats actually drive retention, and then helps you ideate scripts and storyboards in your own style (raised $3.8M).
Momentic is an AI-powered testing platform that lets teams describe end-to-end web and mobile tests in natural language, then auto-generates and maintains stable test suites so they can ship faster without drowning in flaky Selenium scripts (raised $15M).
Opti is an AI-native identity and access management platform that uses pretrained models to map identity relationships, recommend least-privilege access, and automatically orchestrate safer permissions across your stack (raised $20M).
Model ML is an AI “digital teammate” for finance that automates junior-banker grunt work like pitch-deck drafting, due-diligence summaries, and research memos so deal teams can move faster on analysis and client work (raised $75M).
Karumi is an AI demo agent that joins your live sales calls 24/7, talking through your product, clicking around the UI, and tailoring walkthroughs to each prospect so smaller accounts still get an interactive “real” demo without a human rep.
Stapply is a simple but addictive interactive map that plots open roles at top AI labs like OpenAI, Anthropic, DeepMind, and Mistral so job-hunters can visually scan who’s hiring where rather than trawling 20 separate careers pages.
TrustGraph is an open-source “ontology-driven GraphRAG” platform that turns unstructured text into a knowledge graph constrained by your domain ontology—so agents query structured, schema-aligned Knowledge Cores instead of hallucinating ad-hoc relationships from free-form embeddings.

Around the Horn

*Don’t mind me, just smashing that* *refresh button on the Dwarkesh Patel channel* *until this goes live…*

Amazon announced a $15B investment in 2.4 gigawatts of capacity for new data centers in Northern Indiana, which will create 1K+ jobs w/ an energy agreement expected to save Indianans $1B over 15 years.
1. Amazon also just revealed “Autonomous Threat Analysis,” an internal swarm of red-team and blue-team AI agents that continuously probe its systems for exploitable bugs, generate real attack traces, and propose defenses that engineers validate before shipping.
U.S. President Trump signed an executive order launching the Genesis Mission, a federal AI initiative to centralize computing resources and government datasets to accelerate scientific discovery across advanced manufacturing, biotechnology, critical materials, nuclear fission/fusion, quantum information science, and semiconductors through public-private partnerships with industry, academia, and national laboratories.
OpenAI faced new scrutiny over ChatGPT's mental health risks as The NYT just revealed how the company responded when users lost touch with reality while treating the bot as a quasi-mystical confidant; now Andrea Vallone, who led OpenAI's safety team designing how ChatGPT responds to users in crisis, will depart just months after an internal report estimated hundreds of thousands of weekly users showed signs of manic or psychotic episodes.
1. Related: HumaneBench offers a new way to stress-test these risks—the benchmark found that while AI models behave responsibly when prompted to protect users, 67% flip to actively harmful advice when simply told to ignore human wellbeing.

FROM OUR PARTNERS

Your AI Is Lying to You.

Most AIs sound confident—even when they’re wrong. Guru grounds your entire tech stack in verified, permission-aware company knowledge so you can trust every insight, every time.

Join thousands of companies that have made their AI reliable, explainable, and compliant.

Watch a demo