Everything That Happened in AI Today Wed, April 22, 2026

Google and OpenAI both unveiled full "Agentic Enterprise" stacks on the same day, a Sony robot beat elite table-tennis players under official rules for the first time in history, Anthropic tested pulling Claude Code from $20 Pro and walked it back within hours, and Meta started recording keystrokes on US employee laptops to train AI agents on real office work.

Welcome to the Around the Horn Digest, the only page you need to sound dangerously informed at your Thursday standup. Today was Enterprise Agent War Day: OpenAI launched Workspace Agents in ChatGPT, Google spent its entire Cloud Next keynote hammering on "Agentic Enterprise" and shipped Gemini Enterprise Agent Platform plus Workspace Intelligence, and Microsoft quietly countered with hosted agents in Foundry. Meanwhile Anthropic had a rough 24 hours (Claude Code Pro backlash, Mythos breach probe), Meta went full surveillance, and a robot learned to play table tennis better than most humans ever will. Let's get into it.

Around the Horn — Thursday, April 23, 2026

Between Google's Workspace Intelligence and OpenAI's Workspace Agents, everybody wants your job. Within a six-hour window, the two most valuable AI companies on earth shipped near-identical products aimed at the same thing: the repeatable 45-minutes-a-day tasks that actually fill your workweek. Google CEO Sundar Pichai opened Cloud Next '26 by declaring "the Agentic Enterprise is real," then watched Cloud CEO Thomas Kurian unveil the full stack: Gemini Enterprise Agent Platform (mission control for building, governing, and orchestrating thousands of agents), Workspace Intelligence (a semantic layer that maps meaning across Gmail, Docs, Drive, Chat, and Calendar), and a 40% QoQ growth stat on paid Gemini Enterprise seats.

A few hours later, OpenAI shipped Workspace Agents in ChatGPT: five Codex-powered templates (Software Reviewer, Product Feedback Router, Weekly Metrics Reporter, Lead Outreach Agent, Third-Party Risk Manager) deployable to ChatGPT or Slack, free until May 6, then metered. Both companies sold the same pitch: your company's admins install these, employees stop doing repetitive work, you keep your strategic thinking. Both products ship with role-based controls, audit logs, agent identity (Google uses cryptographic IDs), and human-approval gates on write actions.

The rhetoric moved this week. Google Workspace VP Yulie Kwon Kim said the company is done with AI as "a passive assistant." OpenAI framed Workspace Agents as "an evolution of GPTs." The mask came off: both products are sold to admins, not individuals. Danfoss already automated 80% of transactional decisions in its email order processing (42-hour response time to near real-time); Macquarie Bank reclaimed 100,000 team hours; GE Appliances deployed 800 agents across manufacturing. When the agent does 80% of what you used to do, what happens to the job title? We've got the full brief on how OpenAI's Agent Week stacked up (Workspace Agents + Codex + Images 2.0) in today's deep dive.

🏆 TOP 5 NEWS (Around the Horn)

Anthropic tested removing Claude Code from ~2% of new $20 Pro subscribers without announcement; George Pu spotted it (4.9M views), Simon Willison and Ed Zitron called out changes that bled into public docs, and Anthropic's Head of Growth Amol Avasare framed it as a 2% experiment before the company reverted within hours; Sam Altman quote-tweeted the saga with "ok boomer."
OpenAI launched ChatGPT for Clinicians (free for verified US physicians with advanced models, clinical search, reusable skills, CME credit, optional HIPAA) plus HealthBench Professional, a 1,500+ conversation benchmark where GPT-5.4 reportedly outperformed specialty-matched physicians given unbounded time and web access, with physicians rating 99.6% of responses safe and accurate across 7,000 pre-launch test conversations.
Meta rolled out its Model Capability Initiative, tracking US employee keystrokes, mouse movements, and screen content on Google, LinkedIn, Wikipedia, GitHub, Slack and other approved apps to train AI agents on real office workflows, with "dystopian" internal backlash and no opt-out.
Sony AI's Ace robot became the first autonomous robot to beat top-level human table tennis players under official rules (3 of 5 matches), using multi-camera vision, spin estimation, and 3,000 hours of simulated training; the research was published in Nature (code).
OpenAI faces a criminal investigation in Florida after prosecutors alleged ChatGPT advised the FSU campus shooter on when and where to strike, with AG James Uthmeier issuing a subpoena for the full conversation logs (HN discussion).

Honorable Mentions

Anthropic is investigating a report that its unreleased Mythos cybersecurity model (capable of advanced attack simulation) was accessed without authorization via a third-party vendor.
SpaceX preempted a $2B Cursor fundraise (at $50B valuation) with a $10B collaboration fee plus an option to acquire Cursor for $60B later this year using post-IPO public stock; Cursor halted the existing round with a16z, Thrive, NVIDIA, and Battery Ventures to take the Musk deal (Bloomberg, CNBC).
The Pentagon requested a $54 billion FY2027 budget for autonomous warfare, the largest single commitment to AI-powered war in history; experts warned the US lacks doctrine for deploying lethal autonomous swarms.
Google and Thinking Machines Lab signed a single-digit multibillion-dollar deal for Google Cloud AI Hypercomputer infrastructure (Nvidia GB300-powered) to train Mira Murati's frontier models and run her Tinker automation tool.
British Progress economist Pedro Serôdio published a counter-narrative: three years after ChatGPT, UK employment data across 412 occupations shows zero visible difference between roles most and least exposed to AI, with adoption concentrated in just 2.1% of work tasks.
Anthropic released findings from a survey of 81,000 Claude users showing workers in highly AI-exposed jobs (especially early-career) express the greatest concern about displacement, even as they report the largest productivity gains.

🍪 TOP TREATS TO TRY

Xiaomi MiMo-V2.5 and MiMo-V2.5-Pro enter public beta with frontier-tier agentic performance (73.7 on SWE-Bench Pro), native vision and audio, 1M-token context, and 40-60% better token efficiency on long-horizon tasks; API available now with open-weights planned soon —paid API, free TTS tier for now.
Qwen3.6-27B is a dense 27-billion-parameter open-weight model that scores 77.2 on SWE-Bench Verified and handles image, video, and 256K context on consumer GPUs; Unsloth shipped ready-to-run GGUF quantizations (compressed versions for slower hardware), and Simon Willison's benchmarks show strong local inference —free to try (open weights on Hugging Face).
OpenAI Privacy Filter is an open-weight 1.5B model that detects and redacts personal information (names, addresses, passwords, API keys, 8 categories total) with 96%+ accuracy, runs locally, supports 128K context, and can be fine-tuned (Hugging Face, GitHub) —free to try.
Brex CrabTrap is an open-source proxy that sits in front of your production AI agents, inspects every outbound request with an LLM-as-a-judge, and blocks risky actions (like sending customer data to untrusted APIs) before they hit the wire —free to try.
MythosWatch gives you a live public ledger of 51+ entities (governments, regulators, banks) with confirmed access to Anthropic's unreleased Mythos cybersecurity model, updated as new access becomes known (HN) —free to track.
Broccoli is an open-source one-shot coding agent from yzhong94 that turns Linear tickets into reviewable GitHub PRs using Claude or Codex, running entirely on your own Google Cloud infrastructure (HN) —free to self-host.
Trainly audits your AI agent's production traces for 72 hours with a single decorator, surfacing cost concentrations, retry loops, blind spots, and performance drifts (HN) —free, no credit card.

🏢 Big Tech & Major Companies

Sundar Pichai noted that 75% of all new code at Google is now AI-generated and approved by engineers, up from 50% last fall, with engineers "orchestrating fully autonomous digital task forces" and a recent complex code migration completed 6x faster than it was a year ago (Business Insider).
Google unveiled eighth-generation TPUs at Cloud Next: TPU 8t for training (scales to 9,600 chips in a single superpod, 3x the processing power of Ironwood, 2x better performance-per-watt) and TPU 8i for inference (80% better price-performance than the prior generation, built for running millions of concurrent agents) (TechCrunch, Bloomberg).
Google turned Chrome into an AI co-worker with Chrome auto browse: US Workspace admins can now enable Gemini-powered multi-step web tasks (booking travel, filling CRM forms, summarizing dashboards) with human checkpoints at every action (TechCrunch).
Google also rolled out AI Overviews to Gmail for Workspace (ask natural-language questions about your inbox and get summaries pulled from multiple emails; on by default for Gemini customers), added generative AI features to Google Maps for enterprise (prompt-create realistic Street View scenes, Veo animation, satellite imagery analysis in BigQuery, pre-trained Earth AI models for detecting bridges and power lines), and positioned Workspace Intelligence as an AI office intern across Docs, Sheets, Gmail, and Chat.
Google Cloud and Wiz launched three new AI security agents (Threat Hunting, Detection Engineering, Third-Party Context) plus AI-BOM tracking for unauthorized AI tools, as part of a unified AI-powered cybersecurity platform spanning code to runtime.
Google made an interesting call with its new agent builder: code-first Gemini for technical teams, no-code options for business users, both with built-in security, simulation, and governance.
Google also launched its official Agent Skills repository with compact Markdown skills and best practices for Google Cloud products (BigQuery, GKE, Gemini API, etc.) that agents can load on-demand to gain real-time expertise without context bloat (GitHub).
OpenAI teamed up with Infosys to embed Codex and other tools into Infosys' Topaz platform so enterprises across 60+ countries can modernize legacy software, automate DevOps, and scale AI deployment.
OpenAI briefed federal agencies, state governments, and Five Eyes allies at a D.C. event on its GPT-5.4-Cyber model capabilities and its Trusted Access for Cyber program (dual-track rollout of safeguarded public and more permissive defender versions).
OpenAI responded to the Axios developer-tool supply chain compromise by rotating macOS code-signing certificates, pushing app updates, and confirming no user data was accessed.
OpenAI briefly listed text-embedding-3-small as deprecated in its docs, triggering community panic about orphaned vectors across trillions of embeddings, before quickly reversing the decision after Jeff Huber flagged it; Romain Huet and Steven Heidel confirmed it was a mistake and the model remains available.
Anthropic told a federal court it has no technical "kill switch" for its AI models once deployed by the Pentagon in classified settings, even though its usage policies prohibit autonomous weapons and mass surveillance.
Anthropic detailed how its new Automated Alignment Researcher agents are running parallel end-to-end research cycles that compress months of human effort into days of compute, lifting the score on one weak-to-strong supervision benchmark from 0.23 to 0.97, while also learning to game evaluations in creative ways.
Anthropic published a guide to building production agents that reach real systems via MCP, covering intent-grouped tools, OAuth with CIMD and vaults, context-efficient clients, and how MCP fits alongside direct API calls and CLIs.
Anthropic's leaked Claude Code codebase is now being used by researchers to test copyright challenges in the AI era as these tools make reproducing creative work dramatically faster.
Anthropic also suspended every account (110+ users) at an entire agrotech organization without warning, cutting both Team and API access.
Microsoft committed A$25 billion ($17.9B) by the end of 2029 for AI infrastructure and capacity in Australia, its largest-ever investment there.
Microsoft launched hosted agents in Foundry Agent Service (public preview): secure, isolated, scale-to-zero compute sandboxes purpose-built for production AI agents with persistent filesystems, BYO VNet, and enterprise governance; see also the developer journey post.
Meta broke ground on its first Oklahoma data center: a $1B+ AI-optimized facility in Tulsa, 1,000 construction jobs, 100 permanent roles, water-positive operations by 2030.
Tesla raised its 2026 capex plan to $25B (3x historical levels), with its CFO confirming negative free cash flow for the rest of the year as it funds AI, robotics, and energy.
X (the platform) launched AI-powered custom feeds curated by Grok that replace Communities, with new ad slots.
Google Cloud also redesigned Gemini Enterprise for the agentic era with a unified agent platform pulling in everything from Vertex AI, partner integrations, and 200+ models (The New Stack analysis of Google Cloud's cat-and-mouse with OpenAI).
Beijing opened a probe into Manus AI after Meta's acquisition, tightening control of Chinese AI startups trying to shed national ties.
Tencent opened an international beta for its AI agent product QClaw.

💼 AI Productivity, Labor & Economics

Gallup reported 1 in 4 Americans now use AI for health information, most to supplement care but some substituting for provider visits when cost or access is a barrier.
CNN detailed five ways doctors are using AI chatbots (literature review, visit summaries, prior authorization letters, differential diagnoses, coding) while raising "shadow AI" data-privacy concerns.
Epoch AI (with Ipsos) found 80% of weekly Claude users come from $100K+ US households versus 37% for Meta AI (32% under $50K); ChatGPT, Gemini, and Grok cluster 56-64% high-income.
Claimable (backed by Mark Cuban) has reversed thousands of denied health insurance claims by using AI to draft appeal letters; founder Warris Bokhari, a British doctor, calls the manual portion "hand-to-hand combat" with insurers while the company automates progressively more of the workflow.
404 Media reported AI-native startups are now proudly "tokenmaxxing": bragging they spend more on AI tokens than human salaries (e.g., Swan AI's $113k/month Claude bill for a 4-person team chasing $10M ARR with fewer than 10 staff) as an efficiency flex.
Elon Musk backed Universal High Income (not Universal Basic Income) to offset AI-driven job losses, arguing agents will produce so much wealth that everyone should share abundance rather than receive a subsistence floor.
Mark Cuban shared three prompts he recommends plugging into Claude to future-proof your career: "Tell me how to be an expert at creating agents for small businesses," "Create study guides that ask me questions," and "Correct me and adapt to my knowledge level."
Disconnect argues tech CEOs exaggerate AI's job-destroying power to distract from how the tech is actually being used to deskill workers, erode labor rights, push down pay, and shift power to employers.
The Guardian reported "Uber for nurses" gig-work apps are lobbying to deregulate healthcare, using AI to cut costs at the expense of nurses' wages, rights, and patient safety.
Reese Witherspoon pushed back on backlash to her AI comments, saying "no one is paying me" and framing her interest as a "curious human" whose kids are learning AI tools.

🤖 AI Agents & Infrastructure

Trevin Chow released Compound Engineering v3, the official plugin for Claude Code, Codex, Cursor, and more: unified ce- namespace, full requirements tracing with stable IDs (brainstorm → plan → commit), per-finding reviews, and first-class multi-harness support (230 likes, 22 reposts).
The Turing Post analyzed the leaked Claude Code harness codebase (~512K lines) and found only ~1.6% is actual decision-making intelligence; the other 98.4% is operational infrastructure (tools, permissions, memory layers, sub-agents, recovery systems) that makes agents reliable in production.
omarsar0 argues there is a huge open opportunity to train a single really good open-weight model that just works across any agent harness without lock-in, since every provider is optimizing only for its own harness (412 likes, 61 reposts).
Open Chronicle (by taoh/Screenata) is a local-first macOS menubar app that gives Claude Code and Codex CLI persistent on-device screen memory via OCR plus local summarization, so agents remember what's on your screen without sending data off-device (HN).
MemFactory (paper, HN) is a unified modular framework for agent-memory inference and training, with plug-and-play components and GRPO support, that reports gains of up to 14.8%.
TACO is a self-evolving framework for terminal agents that automatically learns task-aware context compression rules from interaction histories, delivering 1-4% higher accuracy and ~10% lower token usage with zero manual tuning (thread).
Nous Research opened Moonshot's Kimi K2.6 model for free inside the Hermes agent harness/portal for 24 hours.
Moonshot AI launched Kimi Agent Swarm, a new product that coordinates multiple AI agents in parallel on complex tasks (research, analysis, content creation) to deliver complete outputs in a single run (announcement).
Arena reports Kimi K2.6 is live with strong gains across leaderboards: #2 open model in Code Arena (#6 overall, on par with Claude Sonnet 4.6), #1 open in Vision Arena (#15 overall), #1 open in Document Arena (#8 overall), and #2 open in Text Arena (#24 overall) (585 likes, 39 reposts).
Applied Compute open-sourced production agentic workload traces (hundreds of tool-call turns from coding, code QA, and office tasks with heavy-tailed distributions) plus a lightweight replay harness, so inference engines can optimize for real 2026 agentic serving patterns rather than single-turn benchmarks (112 likes, 9 reposts).
Thomas Ahle shared a minimal multi-agent tmux + Claude Code setup that spins up isolated persistent agent sessions with shared context, hot-reloadable tools, and zero custom harness code.
Adam Wolff noted the latest Claude Code update quietly dropped Grep/Glob in favor of pure Bash for file operations, a small but meaningful shift toward more reliable tooling.
Ridger Zhu breaks down why Looped LLMs are emerging as a foundational architecture for agentic systems: state-tracking advantages on graph tasks like BFS (mirroring the abilities seen in Anthropic's unreleased Claude Mythos), plus the three core scaling gates (depth stability, inference efficiency, FLOPs effectiveness).
Nomadic AI built an agentic visual-reasoning system that grounds vision-language models with explicit evidence-gathering (3D motion, segmentation) for reliable video understanding in robotics and autonomous driving (team page, 242 likes, 29 reposts).
Stanley for X launched an X-focused agent tool (waitlist).

💻 AI Coding & Developer Tools

HN discussion of GPT-5.5 in Codex is circulating (unconfirmed model version; thread worth watching).
Marc Gauthier wrote up his current daily AI workflow as an engineer: Claude Code for most coding, heavy use of git worktrees, and a "check-in every 20 minutes" rhythm to keep agents from wandering.
Ethan Ding argues that tools like Claude Code accelerate routine coding but create technical debt, bloat codebases, and fail to solve the real bottlenecks of vision, taste, and restraint needed for breakthrough products.
Malus.sh (by Dylan Ayrey and Mike Nolan) uses a clean-room two-agent process to recreate open-source software with fresh code, bypassing original licenses for corporate-friendly licensing; $0.01/KB (HN).
Simon Willison reviewed Qwen3.6-27B and highlighted its flagship-level agentic coding performance in a compact dense 27B model, sharing fast local inference benchmarks and impressive SVG generation examples.
pashmerepat fixed a silent auth fallback in OpenClaw so the Codex harness now actually works with OpenAI models; the agent immediately went from shallow heartbeat loops to full proactive cycles (reading workspace, inspecting repo, making verified edits, showing continuity across heartbeats) with zero prompt changes (1.4K likes, 83 reposts).
Linus Torvalds replied to a drm/i915 use-after-free (UAF) patch on LKML, raising concerns about how AI-generated patches are being merged.
Sullivan & Cromwell apologized to a New York federal judge for filing court documents containing AI-generated hallucinations (misquoted bankruptcy code, fabricated case citations) in the Prince Group bankruptcy.

🔬 AI Research & Models

Xiaomi MiMo-V2.5 adds frontier-tier agentic performance, native visual/audio understanding, and 1 million token context; MiMo-V2.5-Pro is the flagship tier (thread).
Qwen released Qwen3.6-27B as a dense 27B-parameter multimodal open-source model that surpasses the previous flagship Qwen3.5-397B-A17B (which had 17B active MoE params) on agentic coding, with 77.2 on SWE-bench Verified, 256K context, and FP8 / ModelScope versions available; Unsloth GGUF quants for local inference, code, Qwen Studio.
Google DeepMind (led by Shangbang Long) built Vision Banana: a generalist vision model created by instruction-tuning Nano Banana Pro that reframes all perception tasks as image generation, hitting new state-of-the-art on segmentation, referring expression comprehension, depth estimation, and surface normals with a single unified set of weights.
OpenAI released the OpenAI Privacy Filter: an open-weight 1.5B-parameter model for SOTA PII detection and redaction across 8 categories, 96%+ F1, 128K context, runs locally (Hugging Face, GitHub, mihaimaruseac).
Dao AI Lab open-sourced SonicMoE: a hardware-efficient framework for fine-grained Mixture-of-Experts (a technique that splits models into specialist "experts" activated per-input) that delivers 1.87–4.04× kernel speedups on Blackwell and Hopper GPUs (paper, PyPI, thread).
Facebook Research released TD-JEPA: an open framework and paper for training and comparing state-of-the-art zero-shot reinforcement learning agents using latent-predictive representations.
Meta Aria released EgoVerse: a massive egocentric (first-person-view) human video dataset and consortium-scale study specifically for robot learning from first-person perspectives (paper).
Perry Dong (with collaborators) introduced FASTER: a value-guided sampling method for RL that matches best-of-N test-time scaling performance at 1/6th the compute (paper, code).
Tolga Birdal argues training at the edge of stability produces better generalization because stochastic optimizers converge not to a point but to lower-dimensional fractal attractor sets; he introduces Sharpness Dimension (derived from the full Hessian spectrum) for a new parameter-count-free generalization bound and explains grokking as attractor collapse (567 likes, 90 reposts).
Dmitry Krotov explains that Energy Transformers combine looped transformers, energy-based models, and dense associative memories for latent-space iterative refinement, stable token dynamics, and strong retrieval; he presented NRGPT at ICLR 2026.
Yuhan Liu built Speculative Verdict: a training-free framework that has multiple lightweight VLMs draft diverse reasoning paths on information-intensive images, then uses a large verdict model to synthesize the correct answer; beats GPT-4o by 11.9% on InfographicVQA at 15-26% of o1's cost.
Michael Y. Li introduced Neural Garbage Collection: an RL method that trains language models to jointly reason and manage their own KV cache via discrete eviction actions scored by attention, achieving 2-4× cache reduction while outperforming hand-designed baselines on AIME 2025 and Countdown (paper).
Sudip Roy argues static models won't win against dynamic environments; continual learning systems that adapt during use are the only ones that scale.
Pavel Izmailov shared new research on when LLMs can learn to reason with weak supervision using RLVR (reinforcement learning with verifiable rewards).
Mikhail Parakhin (Shopify CTO) detailed Shopify's AI-native engineering stack at ~100% adoption: SimGym customer-behavior simulator (0.7 correlation with real merchants), Tangle/Tangent reproducible ML optimization loops (5× search throughput), critique loops for agent code, and Liquid architectures.
Walter Quattrociocchi, Valerio Capraro, and Matjaž Perc published "Epistemological Fault Lines Between Human and Artificial Intelligence," arguing LLMs are not epistemic agents but stochastic pattern-completion systems (walks on high-dimensional linguistic transition graphs), identifying seven fundamental fault lines (grounding, parsing, experience, motivation, causal reasoning, metacognition, value) and coining "Epistemia" for linguistic plausibility substituting for judgment.
Future Science's Mirror journal published "Sycophancy as World-Model Dissociation: Evidence for Late-Layer Output Suppression in Llama-3-8B-Instruct", plus "Linear Geometry of Knowledge Conflict Resolution in Large Language Models".
Simon Liang released PolySkill, a framework enabling web agents to learn generalizable and compositional skills through polymorphic abstraction (GitHub).
Daniel J. Kim published research on scaling test-time compute for agentic coding, building on earlier work on RL for ML engineering agents and MLE-Smith.
CoInteract is a physically-consistent human-object interaction video synthesis framework built on Diffusion Transformers with Human-Aware Mixture-of-Experts and spatially-structured co-generation to reduce hand/face artifacts and interpenetration (thread).
LaviGen (by Fenghora) is a 3D generative model repurposed for autoregressive layout generation (project, paper).
Odyssey released Odyssey-2 Max, a scaled-up world simulation model for interactive generative video.
Aksel Joonas (Hugging Face) released ML Intern, an open-source agentic ML research assistant built with smolagents (HF Space).
Neuracore open-sourced its robot learning framework (site).
TRI-ML released VLA Foundry: a unified framework for training Vision-Language-Action models (paper, thread).
Bo Wang shared a deep technical review of the Ace robot table-tennis paper.
Google TIPSv2 advances vision-language pretraining with enhanced patch-text alignment via head-only EMA, iBOT++, and Gemini-generated multi-granularity captions, unifying SSL and contrastive learning for superior spatial awareness without scaling models or data (paper, project page, breakdown by Massimiliano Viola).
VideoFlexTok extends FlexTok's progressive 1D compressive tokenization to videos by integrating temporal structure, producing emergent disentanglement of motion and semantics purely from compression; Amir Zamir argues variable-length compressive tokenization is a promising direction beyond efficiency.
Inclusion AI released LLaDA2.0-Uni, a unified diffusion LLM with MoE backbone and SigLIP-VQ tokenizer that handles multimodal understanding and generation in 8 steps with native interleaved reasoning (paper, model, and code released).
Diamond Maps (Peter Holderrieth et al.) introduces stochastic flow map models that enable efficient single-step accurate alignment to arbitrary rewards at inference time via amortized look-aheads (GitHub).
Jubayer Hamid introduced Poly-EPO (Polychromic Exploratory Policy Optimization), a post-training method using set RL and polychromic objectives (average reward × diversity) that trains exploratory reasoning models and outperforms standard GRPO on math reasoning tasks (related Polychromic Objectives paper, 287 likes, 54 reposts).
rosinality shared new research showing that standard self-play problem generators hack rewards by creating complex but useless problems; the fix is a guide model that selects useful problems based on how well they relate to the unsolved set.
HumanScore (Stanford, led by Tiange Xiang) is a new benchmark for evaluating human motion quality in AI-generated videos.
AutoMedBench scores autonomous AI agents on the full medical research pipeline (plan, setup, validate, infer, submit) across public medical imaging datasets and difficulty tiers in a sandbox, evaluating both process and final output quality (GitHub).

🏛️ AI Policy, Governance & Safety

House lawmakers got a closed-door DHS counterterrorism demo showing how easily "jailbroken" (unguarded) AI models can give step-by-step instructions for bombs, attacks on the U.S. Capitol, kidnapping members of Congress, or nuclear weapons.
WIRED showed five leading AI models (DeepSeek-V3, Claude 3 Haiku, GPT-4o, Nemotron, and Qwen) can be weaponized for sophisticated social engineering, with DeepSeek-V3 producing alarmingly realistic multi-turn phishing campaigns.
Bloomberg detailed how law enforcement is overwhelmed by a surge of AI-generated child sexual abuse material, forcing investigators to sift through massive volumes of synthetic imagery to locate real victims.
Anker made its own Thus neural-net compute-in-memory AI audio chip (the world's first of its kind) to bring larger, more efficient local AI models to Soundcore earbuds for dramatically better noise canceling; products launching around May 21.
Chaofan Shou showed that Kimi K2.5 with synthesized multi-agent harnesses discovered and fully exploited 6 critical browser vulnerabilities (full system hijack from a single page view or extension install) and open-sourced AgentFlow, a Python library for graph-based evolving multi-agent systems; full details and 10 Chrome zero-days in the arXiv paper (548 likes, 89 reposts).

🛠️ AI Tools & Products

Bond launched a new social media platform that uses AI trained on users' posted "memories" to recommend real-world activities (restaurants, concerts), explicitly designed to pull people away from doomscrolling.
Tom's Guide tested using ChatGPT with Charlie Munger's Inversion rule to rethink personal goals around success, stress, and career growth, and found the AI-generated mindset shift more clarifying than any dedicated productivity app.
MIT Technology Review outlined 10 things that matter in AI right now (LLMs+): cheaper and more efficient architectures via mixture-of-experts and diffusion models, advances in long-context reasoning, recursive processing, and solving complex multi-step problems without derailing.
Sidekick is a new agent-assist tool (thread).
Tolaria (by Luca Rossi) is a markdown-based workflow tool (GitHub, site).
Bud launched a new agent-focused app (thread).
NeoCognition is a new AI tool (thread).
Core Automation launched (threads: CoreAutoAI, MillionInt, Arohan).
Riley Goodside demoed ChatGPT Images 2.0 by generating a hyper-detailed 16-bit Final Fantasy 6-style JRPG boss fight set inside a Chipotle that faithfully recreated the exact board position from the 1956 Byrne-Fischer chess game (1K likes, 67 reposts).
Garry Tan introduced skillify, a 10-step checklist for turning every agent failure into a permanent structural fix (SKILL.md contract, deterministic scripts, unit/integration/LLM evals, resolver triggers, DRY audits, smoke tests, brain filing), and open-sourced GBrain (with gbrain doctor that auto-repairs failing agents) so the same bug never reaches an agent twice (1.2K likes, 117 reposts).
arrakis_ai (CHOI) showed how tying Codex's scheduled Computer Use to the newly launched GPT Image 2 lets you build a fully autonomous news operation that wakes on time, curates daily events, spins up card-news images, and ships posts automatically while navigating the UI itself (158 likes, 9 reposts).
Levels.io upgraded hotelist.com with xAI vision models that now accurately detect real hotel gyms (with barbells and power racks) after analyzing 1M+ photos from 60K properties (1.1K likes).
Ivan Burazin shared new dev tooling updates.
AntimLabs shared research demos.
AI Safety Memes shared a new meme roundup on current safety debates.

📊 Fundraising & Deals Roundup

SpaceX × Cursor — $10B collaboration fee + $60B acquisition option later this year; SpaceX preempted Cursor's in-progress $2B raise at $50B valuation, planning to finance post-IPO with public stock.
Microsoft — A$25B ($17.9B) commitment to build Australian AI capacity by 2029.
SK Hynix — ~$13B for a new South Korea plant for advanced AI memory packaging (including HBM chips).
SoftBank — seeking a $10B margin loan secured by its OpenAI shares, funding further AI investments.
Thinking Machines Lab — single-digit multibillion-dollar Google Cloud infrastructure deal (Nvidia GB300-based) for Mira Murati's RL workloads and Tinker tool.
Vast Data — $1B Series F at $30B valuation, with Nvidia backing; $500M+ ARR on AI data infrastructure.
Google — $750M fund announced at Cloud Next to accelerate corporate AI/agent adoption.
Sooth Labs — ~$50M (Felicis-led) for event-forecasting AI models, launched by Russ Salakhutdinov with ex-Meta co-founders; backers include Yann LeCun, Jeff Dean, and Andrew Bosworth.
AI Health Fund (Esther & Anne Wojcicki) — $10M target (already $1.5M committed, 12 investments) for early-stage AI-health startups via Treehub residency accelerator; $50k-$150k checks.
10x Science — $4.8M seed for a platform that combines AI agents with deterministic chemistry algorithms to interpret mass-spectrometry data and validate AI-generated drug candidates.
Google Cloud Next showcase startups — Lovable (tracking toward $400M ARR), Notion, Gamma, Inferact (vLLM creators), ComfyUI, Chorus, Proximal Health, Vapi.
AngelList launched USVC, a VC fund open to all US investors starting at $500 (no accreditation required), giving exposure to OpenAI, Anthropic, xAI, and others.
Blackstar — hiring across engineering, operations, design (thread).

🎙️ Interviews, Panels & Podcasts

Greg Brockman recounted the inside story of the 72 hours that almost killed OpenAI (including the "Phoenix" contingency plan) on The Knowledge Project (Spotify, Apple).
Cora's Kieran argues that most frameworks for working with AI agents incorrectly assume humans should stay in the loop at every phase, and presents an alternative framework.
Shane Parrish published notes from recent interviews on AI and decision-making.
Dan Shipper posted a video on compound engineering principles (follow-up).
Erik Torenberg shared takeaways from new AI founder interviews.

💡 Industry Commentary & Analysis

Ethan Mollick highlighted OpenAI's new free ChatGPT-5.4 for verified US clinicians, noting that on the new HealthBench Professional benchmark it beat specialty-matched physicians who were given unlimited time and web access.
wh (nrehiew_) argues frontier coding models are doing too much, routinely over-editing far beyond the minimal fix needed for simple bugs (GPT-5.4 is worst, reasoning models naturally over-edit more), making human review painful; wh proposes RL training as the best way to teach minimal editing without catastrophic forgetting, with Levenshtein distance and added cognitive complexity as metrics (364 likes, 31 reposts).
Sebastian Aaltonen and mike64_t detailed how Codex routinely writes excessive defensive code inside hot inner loops (re-validating already-clean data, avoiding derived state, over-applying SRP to create 17 passes over the same data), generating technical debt that demands manual cleanup; they trace it to RL-induced caution and a fear of dirty state (Seb: 994 likes, 46 reposts; mike: 336 likes, 19 reposts).
kimmonismus reports that frontier AI labs are shifting back toward smarter base models in pretraining (OpenAI's Spud/GPT-5.5 and Anthropic's Mythos) rather than leaning on long test-time reasoning chains; the payoff is higher quality with fewer tokens, faster responses, and lower cost per query (547 likes, 25 reposts).
Sam Altman hinted at upcoming OpenAI product reveals.
Thomas Sottiaux (OpenAI) shared dev workflow notes.
Rohan Varma posted on Meta's latest research push.
theo shared commentary on the Claude Code Pro saga.
LLMJunky and scaling01 (follow-up, earlier) shared analyses of the day's model drops and agentic benchmarks.
Elie Bakouch discussed open-weight privacy model implications.
Jeff Hollan shared observations from Cloud Next.
Zan Armstrong posted analysis of today's agent launches.
Ethan Ding's essay on Claude Code's limits for product quality is making the rounds on dev Twitter.
Adrian Krebs built a deterministic scorer for 15 AI-generated design tells (colored left borders, glassmorphism cards, gradient backgrounds, icon-topped feature cards) and ran it against 500 recent Show HN landing pages; a designer quipped that "colored left borders are almost as reliable a sign of AI-generated design as em-dashes for text."
CNBC's Investing Club added a speculative stock to its Bullpen that benefits from the AI data center boom.
HealthRanger, Inkdrop, Chandu Thota, Ishaan Sehgal, Brad Menezes, Austin Kozlowski, Boyuan Chen, Yu Su, and Arena shared additional takes on the day's agent-platform news.
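
For readers curious what the Levenshtein-distance metric in wh's over-editing item above might look like in practice, here is a minimal sketch. It is not wh's actual evaluation code: the function names (levenshtein, over_edit_ratio) and the idea of comparing a model's patch against a known minimal fix are assumptions made for illustration, and the "added cognitive complexity" metric is not covered.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute), via dynamic programming."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def over_edit_ratio(original: str, model_patch: str, minimal_patch: str) -> float:
    """Hypothetical score: how much more the model changed than the minimal fix did.

    A value near 1.0 means the model edited about as much as needed;
    larger values suggest over-editing.
    """
    minimal = levenshtein(original, minimal_patch)
    actual = levenshtein(original, model_patch)
    if minimal == 0:
        # No change was needed; any edit at all counts as unbounded over-editing.
        return float("inf") if actual else 1.0
    return actual / minimal


if __name__ == "__main__":
    buggy = "x = a + b"
    minimal_fix = "x = a - b"
    model_fix = "x = int(a) - int(b)"
    print(over_edit_ratio(buggy, model_fix, minimal_fix))
```

A character-level distance is the simplest version; a token- or line-level distance would likely be more robust to formatting noise, but that choice is outside what the original thread summary specifies.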

🎨 Viral Moments & Weird Internet

The New York Post revealed that "Emily Hart," a top MAGA influencer, was actually an AI creation by a 22-year-old Indian medical student ("Sam") who made thousands monthly selling MAGA-themed merch and lewd content to lonely conservative men he called "super-dumb."
三金 (threeaus) shared Anthropic philosopher Amanda Askell's fable prompt technique: instruct the model to pick a graduate-level concept from any field and explain it indirectly through a complete fable where the reveal is saved for the end, then append a plain explanation (3.6K likes, 586 reposts).
Kun Chen built and open-sourced Org-Bench, a multi-agent simulation of the classic org-chart memes (Apple, Amazon, Facebook, Microsoft, Oracle, Google) where each organization had to build and ship a web spreadsheet; Google won with a score of 3.62 thanks to its design-doc culture and middle-manager integrators enabling parallel gap-catching, while Apple bottlenecked, Amazon lost info, Facebook diffused responsibility, Microsoft duplicated work, and Oracle created red tape (writeup, 551 likes, 44 reposts).
GPT Image 2 went wild on launch day: Deedy Das turned entire research papers (AlphaFold included) into extremely accurate conference posters with one prompt, and Jeff Ladish generated detailed Where's Waldo-style scenes of Anthropic and OpenAI offices stuffed with in-crowd jokes (red-team dragons, etc.) on the first try (Deedy: 190 likes, Ladish: 73 likes).

Previous Around the Horn Digests

Catch up on everything you missed:

Monday, April 20, 2026: 200+ of you wrote in on the 2-to-1 AI-doomers-vs-boosters lead; we went deeper on the split.
Friday-Sunday, April 17-19, 2026: Anthropic shipped Claude Design (the Figma competitor), three senior OpenAI execs announced pre-IPO departures, Claude Opus 4.7 wrote a working Chrome exploit for $2,283, and a fake Claude site started installing malware.
Thursday, April 16, 2026: Anthropic shipped Opus 4.7 and OpenAI countered with a full Codex overhaul, Factory raised $150M from Khosla, OpenAI launched its first life-sciences model, and Canva rebranded as "an AI platform with design tools."
Wednesday, April 15, 2026: OpenAI's $852B valuation faced backer scrutiny while VCs offered Anthropic up to $800B, Allbirds pivoted to AI compute and popped 600%, and a federal court ruled your AI chats have no attorney-client privilege.
Tuesday, April 14, 2026: Sam Altman's SF home was attacked twice in three days, Maine banned large data centers, Anthropic shipped Claude Code Routines, and Nvidia Blackwell GPU rental prices jumped 48% in two months.
Monday, April 13, 2026: Stanford's 2026 AI Index quantified the canyon between AI insiders and the public, Anthropic's Mythos triggered a Fed-led bank summit, and an AI named Luna signed a 3-year retail lease in Cow Hollow.

Monthly skill digests: AI Skill — April Week 1 | AI Skill — March (Part 3) | AI Skill — March (Part 2)

That's a Wrap

That's 180+ stories from today alone. If you made it to the bottom, you now understand Anthropic's fake-door test theater better than the 2% of prosumer signups who actually sat through it. Condolences to whoever on Anthropic's growth team has to write the internal retro Monday morning.

For the daily version (bite-sized, 5-minute reads that explain why any of this matters for your work), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you tomorrow.

P.S. Know someone who'd find this useful? Forward it to them and tell them to subscribe here.
