Everything That Happened in AI This Week (May 11-15, 2026)

China asked for Anthropic's newest model, Anthropic said no, and the same model family was already showing up in U.S. government cyber defense.

Welcome to the Around the Horn Week in Review, the one page you need to sound dangerously informed at work tomorrow. This week had the full AI news bingo card: a geopolitical model-access fight, compute companies trying to leave Earth, recursive self-improvement turning into a funded startup category, live AI cyber incidents, coding agents multiplying across every developer surface, and enough tiny open-source tools to make your terminal look like a group project. At this point, the safest dependency is probably a printed manual and a candle. Let's get into it.

Lead theme: model access
Secondary theme: compute scarcity
Tool theme: agent workflows

Around the Horn - Week of May 11, 2026

The biggest story this week was the model-access line turning into an actual geopolitical line. China's representatives reportedly approached Anthropic at a Singapore meeting to demand access to the company's newest model, and Anthropic refused. POLITICO framed Mythos as a China-summit flashpoint, which is the kind of sentence that would have sounded ridiculous back when AI launches were mostly benchmark charts and founder podcasts.

Then the context made it bigger: Reuters reported the Pentagon was deploying Anthropic's Mythos cybersecurity model to find and patch vulnerabilities across U.S. government systems, even as the department races to transition away from Anthropic. Add Anthropic's own 2028 AI leadership essay, which argued democracies should preserve a commanding AI lead over China through compute controls and anti-distillation measures, and the shape of the week becomes pretty clear.

The real story was not just "China wants the model." It was that frontier models are now being treated like strategic infrastructure: useful enough for government cyber defense, sensitive enough to deny to a rival state, and powerful enough that model access itself is becoming a diplomacy problem.

🏆 TOP 5 NEWS

The compute race got physical. Cerebras upsized its IPO, Cowboy Space raised $275M for orbital data centers, SoftBank floated a huge French data-center project, OpenAI released MRC (a networking protocol for giant AI clusters), and 70% of Americans opposed nearby data centers.
Recursive AI became a funded company category. Recursive Superintelligence raised $650M to automate AI creation, while Nous claimed 2-3x faster pretraining, NVIDIA released elastic reasoning models, and AnyFlow pushed video generation across different compute budgets.
AI security crossed into live-fire reality. Google confirmed criminal AI-driven zero-day exploitation, TanStack npm packages (JavaScript packages) were compromised, Mistral's PyPI package (Python package) was hit, and XBOW disclosed a critical Exim bug.
AI productivity started colliding with the real economy. METR found technical workers self-reported 1.4-2x more value from AI tools, IBM said 76% of organizations now have a Chief AI Officer, Meta reportedly planned 8,000 layoffs, and Amazon workers reportedly tokenmaxxed internal AI usage.
Research kept trying to break the old model shape. SenseNova-U1 pushed native multimodal understanding and generation, Multi-Stream LLMs (large language models) argued agents should read, think, act, and write in parallel, Open-dLLM adapted coding models toward diffusion-style generation, and Fast Byte Latent Transformer attacked token-by-token bottlenecks.

Honorable Mentions

Google pushed Gemini deeper into Android, adding proactive task automation, widgets, autofill, phone-control features, and a broader push toward assistants that act before you ask.
Isomorphic Labs raised $2.1B to scale AI-driven drug discovery, bringing the Alphabet-backed Demis Hassabis company to roughly $2.6B raised.
The U.S. cleared Nvidia H200 sales to Chinese firms, but no deliveries have happened yet as Beijing pushes companies toward domestic chips.
A Nature study found state-controlled media can influence LLM outputs, showing models answer differently across languages tied to different media environments.

🍪 TOP TREATS TO TRY

Claude Code Agent View gives you one place to manage parallel Claude Code sessions, unblock agents, and jump between running tasks - available on paid Claude plans.
OpenAI Daybreak helps security teams use GPT-5.5 and Codex Security to identify threats, generate patches, and verify fixes - paid only for now.
Kimi WebBridge connects Kimi's desktop agent to your browser so it can research, click, fill forms, compare options, and complete web tasks locally - free to try.
Velo 2.0 turns a raw screen recording into a polished video and written doc, then lets you edit both by chat - free options.
Statewright constrains AI coding agents with visual state-machine guardrails, so tools unlock only when the workflow reaches the right phase (HN thread) - free plan, then $29/mo.
Needle distills Gemini tool calling into a 26M-parameter function-call model you can fine-tune locally for tiny on-device agent workflows (HN thread) - free / open source.
Semble gives coding agents natural-language codebase search that returns the right snippets instead of stuffing full files into context (HN thread) - free / open source.
AI Engineer Coach analyzes your local AI coding logs to spot prompt anti-patterns, track generated code, and turn repeated workflows into reusable skills (HN thread) - free / open source.
sx acts like a private npm-style package registry for AI assets, helping teams share skills, MCP configs (Model Context Protocol, the standard that lets agents plug into outside tools), commands, hooks, and agent workflows across tools (HN thread) - free / open source.
TinyPPO Snake runs a live neural-net Snake training demo in your browser, turning PPO (a reinforcement-learning method) and WebGPU (browser graphics-compute tech) into something you can actually watch learn (HN thread) - free to try.
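
For the curious, here is roughly what "tools unlock only when the workflow reaches the right phase" can look like in code. This is a minimal Python sketch of the general state-machine idea behind the Statewright entry above, not Statewright's actual API: the phase names, tool names, and the GuardedWorkflow class are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical phases and tool names, chosen only for illustration.
PHASE_TOOLS = {
    "plan": {"read_file", "search_code"},
    "implement": {"read_file", "search_code", "write_file"},
    "verify": {"read_file", "run_tests"},
}

# Allowed transitions between phases (a simple linear workflow).
TRANSITIONS = {"plan": {"implement"}, "implement": {"verify"}, "verify": set()}


@dataclass
class GuardedWorkflow:
    phase: str = "plan"
    log: list = field(default_factory=list)

    def can_use(self, tool: str) -> bool:
        """Return True only if the tool is unlocked in the current phase."""
        return tool in PHASE_TOOLS[self.phase]

    def call(self, tool: str) -> str:
        """Record a tool call, or refuse it if the phase has not unlocked it."""
        if not self.can_use(tool):
            raise PermissionError(f"{tool!r} is locked during phase {self.phase!r}")
        self.log.append((self.phase, tool))
        return f"ran {tool}"

    def advance(self, next_phase: str) -> None:
        """Move to the next phase if the state machine allows that transition."""
        if next_phase not in TRANSITIONS[self.phase]:
            raise ValueError(f"cannot go from {self.phase!r} to {next_phase!r}")
        self.phase = next_phase
```

In this sketch, an agent in the "plan" phase that tries write_file gets a PermissionError; after advance("implement"), the same call succeeds. The appeal of the pattern is that the guardrail lives in a small, inspectable table rather than in the prompt.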

🏢 Big Tech & Major Companies

Anthropic made Claude Platform generally available on AWS, giving developers native Claude features through AWS billing, IAM auth (Amazon's identity controls), CloudTrail logging, prompt caching, batch processing, tools, and managed agents.
Claude for Small Business put Claude inside QuickBooks, PayPal, HubSpot, Canva, and DocuSign with workflows for payroll, month close, and marketing campaigns.
Anthropic and the Gates Foundation announced $200M in grants, Claude credits, and technical support across global health, life sciences, education, agriculture, and economic mobility.
OpenAI launched the Deployment Company, a forward-deployed AI rollout unit backed by TPG, Bain, Goldman, and McKinsey, and acquired Tomoro's roughly 150 AI engineers.
OpenAI said ChatGPT adoption broadened in Q1, with fast growth among older users, users with feminine names now making up over half of users whose gender can be inferred, and strong gains across Latin America, Asia-Pacific, and Africa.
xAI released Grok Voice Think Fast 1.0, a full-duplex voice agent developer API (software interface) for noisy, interruption-heavy sales and support calls.
Meta positioned Muse Spark as its strongest model yet for Meta AI across WhatsApp, Instagram, Facebook, Messenger, and AI glasses.
Meta offered rival AI chatbots one month of free WhatsApp access to resolve EU antitrust concerns, while Threads tested a Grok-like Meta AI feature.
Google DeepMind reimagined the mouse pointer as a context-aware AI partner, while Googlebook pointed toward a premium Android / ChromeOS laptop category.
Microsoft Edge added new Copilot features, including tab reasoning, browsing-history personalization, voice and vision, quiz creation, project-like Journeys, and podcast-style summaries of open tabs.
Nvidia topped $40B in AI equity bets this year, investing in startups while striking commercial deals with many of the same companies.
Perplexity Research explained how it hosts Qwen on NVIDIA Blackwell hardware, giving readers a rare look at the infrastructure side of running frontier-class open models.

💼 AI Productivity, Labor & Economics

NBER showed AI agents running Deep Research on a loop can automate much of the work of building high-quality economic datasets from public sources at roughly LLM-subscription cost.
a16z argued CRMs will become infrastructure under systems of intelligence, with agents enriching records, handling go-to-market workflows, and turning institutional memory into shippable workflows.
Tomasz Tunguz estimated state-of-the-art AI email costs $22-$130/month in raw answer-generation costs, arguing teams will need smaller models, local models, and deterministic rules to make AI software margins work.
The Context Acquisition Company launched around the idea that vertical AI agents should acquire services businesses for their proprietary conversation logs, customer data, and institutional memory.
Clio hit $500M in ARR (annual recurring revenue) as legal-tech startups rode a wave of AI adoption and Anthropic pushed further into legal workflows.
Wirestock raised $23M to supply AI labs with licensed multimodal data, including photos, videos, and 3D content from more than 700,000 creators.
A Hollywood screenwriter argued AI training gigs have become the new waiting tables for entertainment workers, after doing 20 contracts across five platforms in eight months.
Jasmine Sun argued AI is arriving in China during an existing youth-employment crisis, creating a very different worker response than the U.S. AI productivity discourse.

🤖 AI Agents & Infrastructure

Thinking Machines Lab introduced Interaction Models for real-time multimodal collaboration across audio, video, and text.
Hyper launched as a self-driving company brain that picks up decisions in Slack, docs, and email, then feeds them as context into tools.
Devin relaunched as an autonomous AI software engineer with web app, terminal CLI, Linear/Jira ticket integration, embedded IDE, shell, browser workspace, DeepWiki, and Slack integration.
multica-ai/multica is an open-source managed-agents platform for assigning coding-agent tasks, tracking blockers, routing work through Squads, and turning repeated fixes into reusable team skills.
stablyai/orca is a desktop/mobile IDE for running fleets of parallel coding agents with isolated worktrees, multi-tab terminals, built-in git, GitHub PRs, SSH, and notifications.
OpenSquilla launched as a self-hostable Python agent runtime focused on cutting token costs with on-device routing, adaptive reasoning, memory, skills, and sandboxing.
HoneyHive unified observability and evaluation for production agents, including replay, live evals, human annotations, regression tests, and root-cause analysis.
Hermes Agent Computer Use lets Hermes drive your Mac desktop with clicks, typing, scrolling, dragging, screenshot handling, and model-agnostic backends.
BrowserCode is an open-source browser-native agent framework positioned as a new open baseline for browser agents.
holaOS gives agents a shared browser, files, apps, memory, and state so recurring research, content, or client-delivery work can persist across runs.
Mobbin MCP connects AI agents to more than 621,500 shipped product screens through MCP so they can reference proven app flows instead of guessing.
Liquid AI's voice-assistant cookbook maps spoken Home Assistant commands straight to function calls with a small audio model, skipping a separate speech-to-text pipeline.
MagicPath 2.0 is a multiplayer canvas where humans and agents like Codex or Claude Code can design and build functional prototypes together.
Orchard is an open-source agentic modeling framework whose sandbox powers recipes for software engineering, GUI agents, and Claw-Eval-style tasks.
Prime Intellect open-sourced renderers, a Python library that keeps long multi-turn agent training conversations stable and parseable.
Modal explained truly serverless GPUs, using cloud buffers, lazy image loading, and checkpointing to make AI inference feel less like traditional cluster management.
Hermes-agentmemory adds pull-model episodic memory to Hermes Agent with real deletes, synchronous writes, and trace logs showing what entered the prompt (HN thread) - free / open source.

💻 AI Coding & Developer Tools

OpenAI's Codex plugin for Claude Code lets developers run Codex inside Claude Code for code reviews, adversarial reviews, and background tasks with one install command.
OpenAI previewed Codex Mobile, bringing agentic coding to iOS and Android with voice input, live screen context, terminal, git, and PR workflows.
Codex added in-app browser testing, letting developers test apps at different viewports, capture screenshots, hide animations, and speed up evals.
Cursor added cloud dev environments for agents, with codebase cloning, dependency installs, credentials, multi-codebase support, version history, rollback, and audit trails.
GitHub teased an agent-native development environment integrated with the GitHub graph for code and meta-work like issue triage and PRs.
OpenCode kept growing as an open-source Claude Code alternative that runs in terminal, IDE, or desktop and supports 75+ model providers.
Warp open-sourced its agentic dev environment, with cloud agents managed by Oz and a fast-growing repo.
Kilo Code v7 runs parallel coding subagents on git worktrees inside VS Code with inline diff review and side-by-side comparisons across 500+ models.
Mechanize launched GBA Eval, a benchmark where frontier coding agents get 24 hours to build a complete Game Boy Advance emulator in WebAssembly (browser-friendly compiled code) from scratch.
CJ Zafir released deepseek-hermes-reasoning-traces, a 240M-token fine-tuning dataset created with Codex 5.5 as orchestrator and DeepSeek V4 Pro as executor.
LlamaIndex open-sourced liteparse-server, a local document-parsing backend that extracts text and exact bounding boxes from PDFs, Office files, images, and spreadsheets.
OpenClaw OS is an OSS Claude Coworker-style workspace framed as one screen for agentic software work.
OpenGravity is a zero-install, vanilla-JS, bring-your-own-key clone of Google Antigravity with a live terminal, local file sync, and sidebar agent.
E2a is an authenticated email gateway for AI agents with email-auth checks, signed delivery, webhooks, live message fan-out, a command-line tool, and developer kits.
Grunden offers GLM 5.1 inference through an OpenAI-compatible API hosted on NVIDIA H200 hardware in Sweden for EU-sensitive data.
Burn, baby, burn is a beautifully cursed bash one-liner that deliberately burns Claude Code or Codex tokens, mostly as a joke about usage dashboards (HN thread) - free / open source.
Epiq is a terminal-native issue tracker that stores work as an immutable event log and syncs through Git for codebase-local project flow (HN thread) - no pricing details.
Hopper brings AI agents to mainframe operations, helping teams navigate TN3270 terminals, inspect z/OS datasets (IBM mainframe files), write JCL job scripts, debug jobs, and pause for approvals (HN thread) - free hobby plan, enterprise custom.
Gigacatalyst embeds an AI builder inside SaaS products so sales, customer success, and customers can generate governed custom workflows on top of existing APIs (HN thread) - custom pricing.
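
If "stores work as an immutable event log" in the Epiq entry above sounds abstract, the underlying pattern is event sourcing: nothing is edited in place, and the current state of each issue is rebuilt by replaying the log. The Python sketch below illustrates that general pattern only. It is not Epiq's actual data model, and the Event fields, event kinds, and replay function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class Event:
    issue_id: str
    kind: str  # "opened", "renamed", or "closed" (hypothetical event kinds)
    value: Optional[str] = None


def replay(events):
    """Fold an append-only list of events into the current state of every issue."""
    issues = {}
    for event in events:
        if event.kind == "opened":
            issues[event.issue_id] = {"title": event.value, "status": "open"}
        elif event.kind == "renamed":
            issues[event.issue_id]["title"] = event.value
        elif event.kind == "closed":
            issues[event.issue_id]["status"] = "closed"
    return issues


# Usage: the log only ever grows; state is always derived from it.
log = [
    Event("A-1", "opened", "Fix login"),
    Event("A-2", "opened", "Write docs"),
    Event("A-1", "renamed", "Fix login flow"),
    Event("A-1", "closed"),
]
state = replay(log)
```

Here state["A-1"] ends up closed with the renamed title, while state["A-2"] stays open. One plausible reason this design pairs well with Git is that an append-only log tends to merge as added lines rather than conflicting in-place edits, though that is a general observation rather than a claim about how Epiq itself syncs.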

🔬 AI Research & Models

"Log analysis is necessary for credible evaluation of AI agents" argued that outcome-only benchmarks hide shortcuts, scaffold limits, and dangerous actions, with an airline-agent case study showing repeated-attempt performance was undercounted by nearly 50%.
"A Single Neuron Is Sufficient to Bypass Safety Alignment" argued that manipulating a tiny internal feature can bypass refusal behavior in aligned LLMs.
"Hallucinations Undermine Trust; Metacognition is a Way Forward" reframed hallucinations as confident errors and argued models need faithful uncertainty, not just more knowledge.
G-Zero introduced verifier-free self-play for open-ended LLM generation from zero data, using a proposer/generator loop instead of external judges.
Jina Embeddings v5 Omni released multimodal embeddings across text, image, audio, and video.
EgoMemReason tested memory reasoning over week-long egocentric videos, with the best model reaching only 39.6% accuracy.
minimal JEPA gives researchers small PyTorch implementations for experimenting with Yann LeCun's self-supervised world-model approach.
S-FLM is a hyperspherical flow language model that rotates token embeddings on a sphere instead of adding random noise.
δ-mem introduced an efficient online memory mechanism for LLMs using compact helper matrices instead of rewriting the whole model.
AsymFlow offered a faster one-step image-generation method that compresses part of the generation path to cut work.
InclusionAI open-sourced Ring-2.6-1T, a trillion-parameter thinking model with two reasoning gears and strong agent-execution results.
Datadog released Toto 2.0, an open-weights time-series forecasting model family from 4M to 2.5B parameters.
Pixal3D generated 3D assets directly in the input view's coordinate system instead of canonical space.
RigidFormer learned multi-object rigid-body contact dynamics from point clouds and reportedly scaled to 200+ objects at 23.9 FPS.

🏛️ AI Policy, Governance & Safety

OpenAI explained what it is optimizing ChatGPT for, including better support during tough moments, reminders to take breaks, and improved life advice guided by clinical expert input.
Microsoft Research argued whimsical adversarial strategies can break frontier agents, because standard safety testing misses weird out-of-distribution tactics like fake treaties or fabricated emergencies.
Lujain Ibrahim and collaborators found sycophantic AI can make human interaction less satisfying over time, even as users prefer validating AI styles.
CarryOnBench showed models often refuse ambiguous-but-benign requests, then struggle to recover usefulness safely after clarification.
Ryan Greenblatt proposed concrete training experiments labs should run now, including pessimized misalignment runs and clean chain-of-thought baselines.
RSL Media's Human Consent Standard, backed by George Clooney, Tom Hanks, and Meryl Streep, gave creators a way to set terms for how AI systems can use their work or likeness.
The Economist warned AI tools could help novices with bioterrorism, while noting current studies still find important bottlenecks outside the model.
Jeff Geerling argued Bambu Lab is abusing the open-source social contract by threatening legal action against fork work that supports offline/developer-mode control.

🛠️ AI Tools & Products

Printing Press turns an API spec, website, or community project into a Go command-line tool, Claude Code skill, OpenClaw skill, and MCP server (Model Context Protocol server, a connector that lets agents use outside tools).
AtomicChat gives you local chat with Qwen, Kimi, LLaMA, DeepSeek, and other models running privately on your machine.
Krea 2 gives creators an in-house image foundation model for expressive images with style and moodboard control.
Perceptron Mk1 gives users frontier video understanding and embodied reasoning at lower reported cost, with access through OpenRouter and Perceptron's demo.
TinyFish runs lightweight local agents that scrape, summarize, and act on webpages in one click with no setup.
TrueShort makes original phone-native movies and series using small creative teams built around a showrunner, AI filmmaker, and editor.
Voker monitors production AI agents by classifying user intents, corrections, and resolutions, then surfacing where agents fail before customers complain.
Ponder turns raw footage into a polished rough cut from prompts like "make a 60-second highlight reel," then exports to Premiere, Final Cut, or DaVinci.
DramaBox gives creators cinematic text-to-speech with emotion, laughs, sighs, breaths, scene transitions, and voice references.
Tavus Image-to-Replica turns one photo, AI portrait, illustration, or mascot into a usable AI human with real-time streaming performance.
Papel turns research discovery into a personalized, social paper feed with on-device paper chat, quizzes, and community discussion (HN thread) - free waitlist.
Mello is a 3D-printed Spotify speaker for kids that runs on Raspberry Pi and lets parents control the library from their phone.
Halupedia is an encyclopedia of a fictional universe that does not exist until you visit it.
html-anything is an agentic HTML editor where local CLIs like Claude Code, Cursor, Codex, Gemini, or Copilot write production-ready HTML across skills and surfaces.
open-design is a local-first open-source alternative to Claude Design for prototypes, slides, images, video, and exportable assets.

🤖 Robotics & Embodied AI

Actor Labs deployed a fine-tuned VLA policy (vision-language-action, which maps camera input and language into robot actions) for heavy excavators on edge hardware.
Figure AI's live broadcast showed humanoid robots Bob, Frank, and Gary running 24/7 autonomously on Helix-02, sorting packages at human-parity speeds.
OpenVO is an open-world visual odometry system, estimating camera movement from video in unseen environments by modeling temporal dynamics.
Realtime-VLA FLASH speeds up diffusion-based robot-action models by using a draft-and-check prediction loop, claiming 3x speedups with minimal performance loss.
MANUS Metagloves teleoperated a 22-degree-of-freedom robot hand, showing precise finger-tracking data for dexterous humanoid robot training.

📊 Fundraising & Deals Roundup

Amp - $1.3B for an alternative AI compute grid.
Wispr - in funding talks at a possible $2B valuation for Wispr Flow voice dictation.
Exaforce - $125M Series B at a $725M valuation to catch and stop cyberattacks in real time.
Jensen Huang's foundation - $108M worth of CoreWeave compute purchased and donated to universities and nonprofits.
Consensus - $30M led by GreatPoint Ventures for an AI operating system for researchers.
Basata - $24.5M total for AI that reads faxed healthcare referrals and helps schedule patients.
Pit - $16M seed led by a16z for an AI product-team-as-a-service startup from Voi founders.

🎙️ Interviews, Panels & Podcasts

Patrick O'Shaughnessy interviewed Anthropic CFO Krishna Rao on compute procurement, scaling to $30B ARR (annual recurring revenue), $100B-scale infrastructure commitments, and the returns to frontier intelligence.
"Empirical Work in the Age of AI" collected the Stanford IRiSS panel transcript and reading map on how agents automate replication, scraping, fine-tuning, and causal-inference workflows.
ARC Prize shared Jerry Tworek interviewing François Chollet on defining intelligence, why games are strong intelligence tests, and differences between OpenAI and Anthropic's AGI approaches.
Dr. Fei-Fei Li argued CEOs are over-fixated on language models while the real economy is physical, perceptual, and spatial.

💡 Industry Commentary & Analysis

Ben Thompson argued the inference shift changes compute infrastructure, because long-running agents make far-away, high-throughput compute more economically attractive.
Tomasz Tunguz argued for localmaxxing, saying about half of agent tasks can run on a local 35B model, with lower latency as the real win.
Andrej Karpathy recommended asking LLMs to structure responses as HTML, because visual output is higher-bandwidth than plain text.
MoE Capital argued video world models are still in their GPT-2 era, with reinforcement learning and video generation converging into the missing simulation layer for robotics, games, and physical agents.
The Build argued Snowflake Postgres, Databricks Lakebase, and Azure HorizonDB are mostly Postgres at the wire-protocol layer, so the real decision is which analytics platform lock-in you prefer.
"Beyond Semantic Similarity" argued agentic search should interact directly with the corpus instead of relying only on fixed top-k semantic retrieval.
Ethan Mollick argued prompting should look less like spell-casting and more like giving a competent manager a clear assignment.
SHL0MS posted a real Monet while claiming it was AI-generated, prompting confident critiques about brushwork, texture, color, and soul.
Gabe argued LLM psychosis scales with distance from the code, because people farthest from implementation are most vulnerable to self-reinforcing AI fantasy.

Previous Around the Horn Digests

Catch up on everything you missed:

Tuesday, May 12: Anthropic refused China access to its newest model, Isomorphic raised $2.1B, Google pushed Gemini deeper into Android, and supply-chain attackers hit Mistral and TanStack.
Monday, May 11: Cerebras upsized its IPO, Cowboy Space raised money for orbital data centers, and Google confirmed the first criminal AI-discovered zero-day.
Weekend, May 9-10: Weekend roundup of the AI stories that piled up while everyone pretended to log off.
Thursday, May 7: Anthropic shipped Natural Language Autoencoders, Google DeepMind detailed AlphaEvolve's science work, and Cloudflare cut 20% of its workforce.
Wednesday, May 6: Anthropic ran Code with Claude SF, shipped developer updates, and the federal safety net looked very unready for AI job displacement.
Tuesday, May 5: OpenAI governance lore resurfaced, legal AI adoption kept climbing, and Harvey's usage metrics turned heads.
Monday, May 4: The White House considered pre-release AI vetting, Anthropic and OpenAI both linked up with private equity, and Mayo Clinic's AI spotted pancreatic cancer early.

That's a Wrap

That's 100+ stories, launches, papers, deals, and strange little tools from one very crowded AI week. If you made it to the bottom, congratulations: you can now explain the AI stack from orbital data centers to JavaScript-package malware to why someone built a token-burning command-line joke.

For the daily version, make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you tomorrow.

P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.
