verything That Happened in AI Today Monday, June 22

Today, Sakana turned "which model should I use?" into a routing problem, while OpenAI tried to turn security bugs into merged fixes.

Welcome to the Around the Horn Digest, everything that crossed our desk today, sorted. The day's lead was Sakana Fugu, teed up below, because it makes a sharp claim about where AI products are headed: the model interface may become a manager for many models, not one giant brain doing everything alone. Meanwhile, OpenAI pushed Codex deeper into cybersecurity, AI data centers kept swallowing bond markets and power contracts, Five Eyes agencies sounded the alarm on frontier cyber models, and Getty's stock briefly looked like it had discovered caffeine. A normal Monday, if your normal Monday includes chess-playing model swarms and $750B infrastructure anxiety. Let's get into it.

Around the Horn — Monday, June 22, 2026

The big news today was Sakana Fugu, Sakana AI's new "multi-agent system as a model." In plain English: instead of asking users to pick one AI model for every job, Fugu coordinates a pool of specialized models behind one OpenAI-compatible API (the standard interface many AI apps already use). The launch post framed this as a way to get frontier-level performance without depending on one vendor.

The product comes in two versions. Fugu is the lower-latency default for coding, review, and interactive work; Fugu Ultra is the slower, higher-quality version for hard multi-step work like paper reproduction, cybersecurity analysis, and patent research. Sakana's technical report ties the system to its TRINITY and Conductor research on learned model orchestration, and the Sakana AI Console is where users can start from the product side.

The broader pitch showed up across the launch links: Sakana's X post described Fugu as one model API for routing, delegation, verification, and synthesis. Vercel added Fugu Ultra to AI Gateway with no markup and BYOK support (bring your own API key), vercel_dev amplified the integration, and Sakana showed Fugu Ultra playing blindfold chess, plus a second system-behavior post. The interesting question is not whether every benchmark holds up; it is whether "model" becomes the label for the whole orchestration layer, or if we start using different nomenclature like "system" or "orchestrator" or "harness". Each of those have different definitions today, but could in theory swallow the whole category. Maybe we'll land on something more Skeuomorphic, or should I say Anthropomorphic, like "Manager." I can picture the memes now: "Hold on, let me tell my manager to put my agents on the job and have their sub-agents farm it out."

🏆 TOP 5 NEWS (Around the Horn)

🏆 TOP 5 NEWS (Around the Horn)
OpenAI expanded Daybreak from finding bugs to landing fixes, adding Codex Security for deep scans and codebase-specific patches, GPT-5.5-Cyber for authorized defensive work, a Cyber Partner Program for trusted security providers, and Patch the Planet for open-source maintainers.
- Axios reported the updated cyber model is limited to vetted cybersecurity companies and researchers, scored 85.6% on OpenAI's CyberGym vulnerability-reproduction benchmark (a test of whether a model can reproduce real security flaws) versus 81.8% for GPT-5.5, and includes partnerships involving Australia, Canada, France, Germany, Japan, Poland, South Korea, and EU institutions.
- Sam Altman framed the release as defensive work with the U.S. government and security ecosystem, and his post showed GPT-5.5-Cyber beating Mythos 5 on CyberGym. The important contrast: OpenAI is putting cyber capability behind vetted access, partner programs, maintainer workflows, and patching tools rather than treating it as a broad public release.
- OpenAI's main X post announced the Daybreak expansion, Patch the Planet focused on moving from security findings to merged fixes with human review, TestingCatalog highlighted the Codex Security plugin for finding, validating, and fixing vulnerabilities directly inside the coding agent, and the Cyber Partner Program named the defender ecosystem angle.
- OpenAI's public profile also pointed readers to the company's jobs page.
AI infrastructure moved deeper into finance, energy, cooling, and inflation.
- CNBC said Amazon, Alphabet, Microsoft, and Meta are projected to spend roughly $750B on AI infrastructure this year, making the bond market newly relevant to tech investors.
- Morgan Stanley pitched data-center developers on leveraged-buyout-style debt markets, and the Financial Times reported SpaceX is plotting a $20B bond deal.
- Chevron signed a 20-year power deal for a Microsoft data center, showing how AI compute is turning electricity access into a strategic asset.
- CNBC said SpaceX's Colossus data center landed a compute deal with Reflection worth up to $6.3B, while Axios reported Reflection will pay SpaceXAI $150M a month from July 1, 2026 through 2029 for immediate access to Colossus 2 GB300 chips, Nvidia's newest AI accelerators, and related hardware.
- Axios said Nvidia's next-generation liquid-cooled AI systems can run coolant at 113 degrees Fahrenheit, which could reduce or eliminate mechanical chillers in many data centers. Adoption timing, retrofit limits, and electricity-related water use remain unresolved.
- SemiAnalysis argued AI chip demand has reversed the long-running Moore's-law price decline, with quality-adjusted import prices for computers and semiconductors up 14.4% year-over-year and 3.6% in May alone before tariff effects.
The Anthropic export-control fight widened into a cyber-governance story.
- The Guardian reported a rare Five Eyes warning that frontier AI models capable of taking down governments and businesses may be months away.
- TechCrunch argued export controls historically fail at stopping cyber software, while FT asked whether Anthropic's own warnings helped trigger the ban.
- Bloomberg said early Mythos users still had access after the U.S. order, Politico said White House talks shifted toward shared security benchmarks, and Ren Ito argued AI sovereignty requires access and ownership, not just nominal model availability.
- Sophia Cai reported that the White House and Anthropic were developing a formal technical assessment framework to score jailbreak severity by measuring safeguard bypass, capabilities exposed, and practical consequences, with standardized benchmarks and methodology for future incidents.
- kimmonismus highlighted a high-stakes claim, attributed via The Economist to Sen. Mark Warner citing NSA/Cybercom chief Gen. Joshua Rudd, that Mythos allegedly breached nearly all classified NSA and U.S. Cyber Command systems in hours rather than weeks on June 11; the Economist author later clarified the example was illustrative of potency under specific conditions, and replies disputed the literal reading.
- Andrew Curran argued that a more capable internal Mythos version had already emerged from Anthropic training and that blocking public access to Fable 5 or Mythos 5 would not slow frontier development, because labs can keep training stronger systems internally and may even free resources by embargoing public access.
GLM-5.2 became the day's major open-agent model story.
- Interconnects called it a step change for open agents, meaning open-weight models may finally be competitive for long, tool-using software tasks rather than only short benchmark prompts.
- Artificial Analysis said GLM-5.2 became the leading open-weights model, meaning its model files are publicly available, on its Intelligence Index. Artificial Analysis's X post said GLM-5.2 also led all open-weight models on GDPval-AA, a real-world agentic benchmark averaging about 31 turns per task, ranking #3 overall behind Claude Fable 5 and Claude Opus 4.8 while ahead of GPT-5.5 xhigh, a higher-compute GPT-5.5 setting.
- Design Arena showed GLM-5.2 topping HTML web design rankings, a useful signal because it tests whether models can produce usable interfaces, not just answer coding trivia.
- Baseten said GLM-5.2 runs at more than 280 tokens per second and under 0.8 seconds to first token. The Baseten model page describes it as Z.AI's MIT-licensed long-horizon coding and agentic-engineering model with a Mixture-of-Experts architecture, where the model activates specialized slices of itself per request, a sparse-attention design, and a 1M-token context window.
Getty Images announced a multi-year deal to display licensed Getty content inside ChatGPT search and discovery experiences, and Bloomberg said Getty shares jumped about 200% in early trading after the OpenAI deal.

Honorable Mentions

Samsung Electronics started rolling out ChatGPT Enterprise and Codex to all employees in Korea and all Device eXperience employees worldwide, one of OpenAI's largest enterprise deployments.
Google Cloud launched Gemini Enterprise Agent Platform, the successor path for Vertex AI, combining model selection, agent building, secure sandboxes, long-running runtimes, Memory Bank, Agent Identity and Registry, Model Armor, simulation/evaluation/observability tools, and enterprise integrations into one governed agent-development platform.
Waymo temporarily restricted nearly 4,000 robotaxis from freeways after a voluntary software recall tied to 13 construction-zone incidents in Phoenix and the San Francisco Bay Area, forcing affected rides onto slower non-highway routes while it prepares a fix.
Satya Nadella warned that AI giants need to earn society's permission and avoid eating the rest of the economy.
SK Hynix overtook Samsung as South Korea's most valuable company as AI memory demand pushed its market cap to $1.362T.
JD.com founder Liu Qiangdong said the company's Nirvana Plan is preparing roughly 700,000 delivery and frontline workers for a future where robots handle package delivery.

🍪 TOP TREATS TO TRY

Stripe Directory gives developers and agents one discovery layer for finding businesses across Stripe Apps, Projects, and Machine Payments; Stripe introduced it as a searchable Stripe-network directory, Patrick Collison framed it as an early experiment in agent-friendly service discovery, Andrew Curran emphasized the practical value of searchable profiles, fast integration, and free payments between connected accounts, and stripe.directory is the public entry point. Public preview; pricing not specified.
Codex Record & Replay turns a recorded Mac workflow into an inspectable, editable Codex skill you can reuse; OpenAI Developers announced it, TestingCatalog shared a video demo and noted it was not yet available in the EEA (European Economic Area), UK, or Switzerland, and the Codex changelog covered the June 18 app update. Pricing follows Codex access.
Cursor /automate lets agents configure triggers, instructions, tools, Slack emoji workflows, GitHub events, and computer-use automations from a plain-English task; Cursor announced the feature. Pricing follows Cursor plans.
Gemini Spark is Google's 24/7 personal agent beta for U.S. Google AI Ultra subscribers, handling end-to-end tasks under your approval: auditing project trackers against meeting notes, turning Gmail newsletters into researched summaries, summarizing Google Maps and YouTube feedback themes, coaching sales-discount replies, reorganizing Drive files in approved batches, and turning Calendar, email, and Drive context into client-meeting agendas in Google Docs. Beta; requires Google AI Ultra.
lift extracts machine-readable JSON from PDFs and images at 90.2% field accuracy, near Gemini 3.5 Flash's 91.3%; Vik Paruchuri announced the open 9B model, Hugging Face hosts the weights, and GitHub has the code. Open weights; pricing not specified.
Browser Use paired text-only GLM-5.2 with Browser Use v2 multimodal QA subagents to inspect generated websites, find bugs, judge aesthetics, and send targeted fixes back to the model. Pricing not specified.
Redactyl redacts sensitive information from documents entirely in your browser, and Chris Brady said parsing, manual marking, redaction, and export all happen locally with nothing uploaded or stored; the current version is manual, with local small-model and WebGPU auto-detection planned, meaning future detection could use the browser's graphics chip access without uploading documents. Free to try.

🏢 Big Tech & Major Companies

Google pushed further into AI chips and Hollywood. Google is reportedly using Nvidia's playbook to build a rival AI chip business, while Google DeepMind and A24 announced a research partnership to develop new AI-assisted creative workflows for artists.
Google AI accidentally gave DuckDuckGo the best ad copy. eWeek reported that Google's AI-generated answers appeared to recommend DuckDuckGo to users looking for ways to avoid AI-heavy search, an awkward but revealing result as DuckDuckGo pitches optional AI features, No AI search, privacy, and user choice while Google pushes AI Overviews deeper into search.
Google retired its older Nest speakers. eWeek reported Google retired the Nest Mini and Nest Audio, leaving existing owners with ongoing support while Google shifts its smart-speaker future toward Gemini-powered devices and services.
Tencent started testing WeChat's Xiaowei assistant. Tencent began testing Xiaowei, a voice-and-text assistant inside WeChat that can draft messages, start calls, and access services, with a wider rollout targeted for Q3.
Meta signed more AI compute deals with Crusoe. Bloomberg reported Meta secured new AI computing agreements with data-center developer Crusoe.
Intel tapped Seok-Hee Lee for its foundry packaging push. Reuters reported Intel named Seok-Hee Lee executive vice president of its contract chipmaking division as it focuses on advanced packaging.
ASML denied a reported China EUV shipment. Bloomberg reported U.S. officials were concerned one of ASML's top chipmaking machines reached China, while Tom's Hardware covered ASML's denial that it ever shipped an EUV scanner (a machine used to make the most advanced chips) there.
Japan's chip-equipment sales to China fell. Nikkei Asia reported Japan's top five chipmaking equipment suppliers posted a combined 10% China-sales drop for the fiscal year ended March 31, the first decline in years amid export restrictions.
Micron and Anthropic signed an infrastructure agreement. Micron and Anthropic announced a strategic infrastructure and supply agreement to collaborate on memory and storage architecture for AI, with Micron also making a strategic investment in Anthropic's Series H; MarketWatch added that Micron shares climbed Monday as investors digested the multiyear memory-and-storage supply role for Anthropic's frontier-model development.
Super Micro rallied on its Nvidia Vera Rubin infrastructure push. MarketWatch reported Super Micro shares jumped 14% after the server maker showcased Data Center Building Block Solutions for Nvidia's Vera Rubin NVL4 platform at the ISC High Performance Conference, putting the stock on track for its best two-day run since May 2025.
Smartbird completed one of the stranger AI pivots. eWeek reported that the public company formerly known for Allbirds sneakers completed its rebrand to Smartbird after selling the footwear brand and assets, naming Nadia Carlsten to lead the AI infrastructure business, and pursuing private AI clusters with a $50M convertible financing facility for high-performance GPU assets.
Amazon dropped a nearly finished Sam Altman movie, and OpenAI lost Barret Zoph again. Variety reported Amazon dropped Luca Guadagnino's nearly finished Sam Altman movie "Artificial" after its OpenAI partnership, while The Verge reported Barret Zoph left OpenAI again after five months.
Alphabet's AI bench took another hit. Bloomberg reported Alphabet shares fell 7% after Nobel-winning DeepMind scientist John Jumper left for Anthropic, days after Gemini co-lead Noam Shazeer left for OpenAI.

💼 AI Productivity, Labor & Economics

Vibecoding became an M&A due-diligence test. The Decoder reported Bain & Company is using AI-made software replicas to test whether acquisition targets have a real moat, with at least one private equity firm walking away after seeing what could be copied.
Garfield AI won a regulated UK small-claims test. Lawyer Monthly reported Garfield AI, the first purely AI-based firm authorized by the UK Solicitors Regulation Authority, helped a freelancer recover £7,000 in unpaid fees, with a human barrister handling advocacy.
AI entered freight and delivery labor fights. The Los Angeles Times reported Humble Robotics is developing a cabless electric autonomous Class 8 truck for California roads as the Teamsters push back on job and safety risks.
Jeff Bezos argued AI will create a labor shortage, not mass unemployment. eWeek reported Bezos told VivaTech in Paris that he "totally" disagrees with the view that AI will make humans redundant, framing AI as a builder's tool that increases the capacity to turn ideas into products and could raise demand for labor rather than simply replacing it.
Europe and Korea pushed AI deeper into factories. Bloomberg profiled Mistral, Siemens, and Schneider's industrial AI push in Europe, while Tech in Asia reported LG CNS is joining South Korea's AI adoption push for manufacturers.
Vertical startups are turning into "AI neolabs." David Tsong argued 2026/27 is becoming the year vertical startups like Ramp, Harvey, Cognition, and Decagon turn themselves into model-and-benchmark labs for fundraising, recruiting, and marketing leverage.
A law professor's AI sovereign wealth fund idea got attention. The Information profiled UC Davis law professor Sarah Polcz and the AI sovereign wealth fund idea associated with Bernie-world tech policy conversations.

🤖 AI Agents & Infrastructure

Cameron Wolfe mapped the agentic RL framework boom. Cameron R. Wolfe published an agentic RL overview covering frameworks like ToRL, AgentGym-RL, Agent-R1, AgentRL, AutoForge, RAGEN/RAGEN-2, ECHO, and policy/world-model co-training. The theme was how reinforcement learning, training agents by reward signals, changes when the model must use tools across many steps: modular tool interfaces, step-level trajectories, process rewards, asynchronous rollouts, curricula, and exploration all matter. In a preview post, he said the piece would focus on action masking versus supervised learning on environment tokens, scalable asynchronous RL infrastructure, and GLM-5.2's apparent shift from GRPO to PPO, two reinforcement-learning recipes, for long-horizon stability.
Simile AI pushed generative agents toward validated social simulation. Sonya Huang described Simile AI, co-founded with Joon Sung Park and advised by Percy Liang, as an effort to move generative agents from Smallville-style demos into scalable, validated simulators of human behavior using real-data customer journeys, behavioral randomized trials, and convergence/divergence measurement for society-scale analysis.
Agent communication protocols got a taxonomy. arXiv 2606.19135 classified large-language-model agent communication protocols (the rules agents use to talk to tools, users, and each other) across counterparty, payload, interaction state, discovery, and schema flexibility, pointing toward a federated layered stack rather than one winner.
Human-on-the-Bridge proposed scalable agent evaluation. arXiv 2606.16871 focused on scalable human-in-the-loop evaluation for AI agents.
alphaXiv showed GLM-5.2 running autoresearch. alphaXiv demonstrated GLM-5.2 running a full autoresearch pipeline on reinforcement-learning tasks, resolving setup issues, running async versus sync training across two 8xH100 GPU nodes, meaning two clusters of eight Nvidia H100 AI chips, and producing throughput/reward comparisons; Zixuan Li framed it as the first open-weights model proven capable of real end-to-end autoresearch and a major open alternative while Fable 5 access is restricted.
Agent-friendly CLI infrastructure expanded. CLI-Anything Hub created an agent-friendly registry/package manager for CLIs, command-line tools that run from a terminal, Chao Huang said the project reached roughly 100 CLIs, 35 categories, and 200K+ agent-driven calls, while ai-cli, emulate, portless, and agent-browser each shipped supporting primitives for terminal generation, local API emulation, stable localhost URLs, and browser automation.
- Chris Tate framed that stack as the practical recipe for reliable agent loops: real-browser verification with agent-browser, stable named localhost URLs with portless, local production-like API emulation with emulate, and terminal-based text/image/video generation with ai-cli.
pi offered an open-source local agent toolkit. earendil-works/pi provides a unified LLM API, agent loop, terminal UI, and coding-agent CLI, with quick-start docs at the GitHub anchor.
Active Memory Reconstruction tried to make agents recall more like humans. neural_avb broke down MRAgent, which iteratively traverses a Cue-Tag-Content graph instead of doing one-shot retrieve-then-reason memory; the linked paper appeared at arXiv.

💻 AI Coding & Developer Tools

Codex got framed as a full research partner for app development. OpenAI Developers showed Codex helping developer Paul Solt explore frameworks and build complex iOS/macOS apps, including a real-time ML classifier for spider identification, by gathering data from iNaturalist, organizing training sets, testing robustness changes like square crops, and prototyping Core ML apps such as DangerGuide and Super Easy Timer.
OpenRouter made API keys easier to store safely. OpenRouter added 1Password detection so developers with 1Password installed can save and manage API keys securely instead of leaving credentials scattered across local files or chat logs.
Vercel added WebSockets in public beta. Guillermo Rauch said Vercel now supports WebSockets, the standard connection that lets servers push live updates to browsers, including socket.io, so developers can build real-time Node.js apps from Vercel's global edge network through its serverless compute layer.
Claude Code CLI shipped a packed maintenance release. Claude Code Log said Claude Code CLI 2.1.186 added claude mcp login/logout for terminal authentication to Model Context Protocol servers, --no-browser support, automatic assistant replies after shell-command output, stricter named-subagent permissions, skills/status filtering, better compaction reminders, and fixes for streaming, background sessions, permissions, and theme flashing.
Cursor turned its agent stack into mobile, code hosting, and model training. Cursor announced Cursor Mobile for prompting agents, reviewing screenshots, and controlling desktop sessions from iOS; Origin, an agent-native Git/code-hosting platform with a fall 2026 waitlist; and a new frontier coding model being trained from scratch with SpaceX compute on Colossus.
OpenAI said beneficial RL improves persistent alignment. OpenAI Alignment reported that reinforcement learning toward beneficial behavior (training by reward signals) generalized across domains and persisted under adversarial pressure.
Microsoft released ShadowFrog for repo-shaped agent memory. Eric Yuan and the Froggy Team released ShadowFrog, the blog explains how agents dream on isolated branches to build a .shadow/ knowledge base, the GitHub repo provides the skills, and murefil discussed settings, memory, and use cases.
Claude Code subagents got nested context windows. Daniel San demonstrated and open-sourced Claude Code subagents that nest up to five levels deep, with each subagent running in its own context window and only the top-level summary returning to the main thread; the demo chained project-auditor, structure-checker, import-validator, dependency-tracer, and style-sync agents, with reproduction scenarios for noisy main context, clean/forked subagents, and nested chains.
pi-ai is getting a breaking modular developer-toolkit update. Mario Zechner said the next non-coding-agent release of pi-ai will let developers import only the SDK pieces they need (SDK means the reusable building blocks developers use to connect to a service), shrinking bundles, removing global state, and enabling custom environment resolution plus credential storage, while the old API stays available through a compatibility module with a migration guide coming.
Claude Code limits briefly broke for some users. ClaudeDevs said a bug affected roughly 3% of Claude Code Max and Pro users by showing incorrect weekly usage limits and sometimes blocking messages, then reset affected limits.
TeamPCP exposed open-source software's speed/security tradeoff. CyberScoop argued TeamPCP's success targeting open-source software reflected an industry trust model that prioritized shipping speed over security.
Elastic agreed to buy DeductiveAI. TechCrunch reported Elastic agreed to buy CRV-backed DeductiveAI, which uses AI to catch and resolve software bugs, for up to $85M.
Modal published a speculative-decoding speed calculator. Modal's LLM Engineer's Almanac launched a roofline model for estimating optimal speculative-decoding draft length and speedups across models, hardware, and batch sizes.
Matt Pocock shipped a public skills library. Matt Pocock shipped v1 of a skills library including /ask-matt, model-invoked versus user-invoked skills, and guidance for writing agent skills; the update pointed readers to AI Hero's skills subscription.

🔬 AI Research & Models

Nous Research passed a major open-agent milestone. Nous Research said Hermes Agent reached 200,000 GitHub stars, turning an open agent project into one of the day's clearest visibility signals for non-frontier-lab agent infrastructure.
Aster's autonomous research system posted benchmark wins across coding, biology, and model training. Aster said its autonomous research system launches thousands of parallel sub-agents to explore hypotheses, reached a new best 0.9098 validation BPB on Andrej Karpathy's NanoChat benchmark (a lower-is-better language-model score) through discoveries like custom GPU code that halves high-bandwidth memory traffic, matched the top ProteinGym protein-prediction ranking score of 0.526 about an order of magnitude faster by combining multiple hidden-layer signals, and set a 95.2-second NanoGPT speedrun record on 8 H100 GPUs, Nvidia data-center chips commonly used for AI training; the results are preliminary and benchmark-driven.
LLM-as-judge evaluations looked consistent but not always valid.
- arXiv 2606.19544 evaluated 21 LLM judges from nine providers across MT-Bench, JudgeBench, and RewardBench (three model-evaluation test suites) in 118 runs and roughly 541K individual judgments, finding that exact-match agreement overstated true discriminative power, Cohen's-kappa scores (a stricter agreement metric) deflated by 33-41 percentage points on MT-Bench, judge rankings moved by up to 14 positions across benchmarks, two production judges showed severe position bias despite >0.95 test-retest consistency, verbosity bias was small under the tested rubric, and the authors proposed a Minimum Viable Validation Protocol for judge evaluation.
PerceptionDLM made multi-region image understanding parallel instead of sequential.
- Hugging Face Papers highlighted PerceptionDLM as the #1 paper of the day: a ByteDance-linked multimodal diffusion language model (a vision-language model that refines answers in parallel instead of generating only one token after another) that describes multiple masked image regions at once instead of captioning them one by one, using efficient prompting and structured attention masking (rules for which image/text regions can look at each other) to avoid linear latency growth while preserving caption quality.
- The same paper page says PerceptionDLM-Base outperformed LLaDA-V on 15 of 16 multimodal benchmarks, stayed competitive with similarly sized autoregressive vision-language models (models that produce answers one token at a time), introduced ParaDLC-Bench for multi-region localized captioning, and released code, models, training data, and the evaluation suite.
Sparse-attention work kept attacking the cost of long reasoning.
- TIDAL introduced TidalDecode, a Position Persistent Sparse Attention system (a method that makes long-context models look at fewer past tokens) that runs full attention in selection layers, reuses high-score tokens across sparse layers, periodically corrects KV-cache pollution (stale attention memory inside the model), and reports up to 2.1x lower latency while matching or beating full attention on long-context tests including Needle-in-the-Haystack, PG-19, and LongBench.
- LessIsMore (abstract page) proposed Cross-Head Unified Sparse Attention, a training-free method that makes different attention heads (the model's separate focus patterns) share one globally consistent token set plus a recency window, preserving reasoning accuracy while attending to fewer tokens and reporting up to 1.6x end-to-end decoding speedup and 1.72x faster sparse-attention computation across Qwen3, DeepSeek, AIME, MATH500, GPQA, and LongBench, a mix of model families and math, science, and long-context tests.
- Lijie Yang said GLM-5.2 adopted cross-layer index sharing as a core sparse-attention technique, directly building on ideas from TidalDecode and LessIsMore for long-context reasoning.
- alphaXiv highlighted Sparrow, a sparse-rollout method for long-context RLVR (reinforcement learning with verifiable rewards) that tries to make long chain-of-thought rollouts cheaper without the drift that naive sparse attention can introduce.
- Guinness Chen connected efficient long-context RL work back to the same systems problem: sparse-attention methods like Sparrow, LessIsMore, and TIDAL only matter for agent training if they shrink giant reasoning rollouts without making the model lose the thread.
Fixed-point and anytime reasoning got sharper recipes.
- ArXivIQ broke down Fixed-Point Reasoners as a case that simple flat looped Transformers can beat more elaborate hierarchical reasoning models when Pre-LN and residual scaling (stabilization tricks for deep model loops) make the loop converge to stable latent fixed points, meaning stable internal answers, instead of exploding with depth.
- Fixed-Point Reasoners proposed FPRM, a Transformer-based looped model that uses fixed-point convergence as its halting rule, adapts compute to task difficulty, and reports strong results on Sudoku, Maze, state-tracking, and ARC-AGI (a tough abstract-reasoning benchmark) after addressing looped-model signal-propagation issues with pre-norm layers and residual scaling.
- Grigory Sapunov emphasized the practical result: a 7M-parameter Fixed-Point Reasoning Model hit 94.2% on Sudoku-Extreme versus 55% for 27M HRM and 74.7% for 7M TRM, two earlier reasoning-model baselines, generalized almost perfectly to longer out-of-distribution tasks, and introduced FPOPT, a damping optimizer that slows updates when convergence gets unstable, while noting batching remains hard because examples converge at different speeds.
- Budget Relative Policy Optimization introduced AnytimeReasoner, which trains models to perform well under varying token budgets by truncating thinking traces, forcing summarized answers at sampled budgets, using dense verifiable rewards, meaning frequent automatically checkable scores, for better credit assignment, and applying Budget Relative Policy Optimization to outperform GRPO (another reinforcement-learning recipe) across mathematical-reasoning budgets.
Latent-prediction papers pushed models beyond next-token learning.
- Paras Chopra called own-latent prediction a "breathtakingly beautiful concept," pointing to two papers that try to make transformers build compact world models and learn compositional structure from fewer examples by predicting internal representations instead of only the next visible token.
- Next-Latent Prediction introduced NextLat, an auxiliary training objective that teaches transformers to predict their own next latent states; the authors argue those latents converge toward compact belief states, improve world modeling, reasoning, planning, representation compression, and lookahead planning, and let the model draft and verify its own future tokens to speed language-model inference by up to 3.3x.
- Latent-prediction theory gave a sample-complexity theory for the same idea, proving on a compositional grammar that token-level learning can require samples exponential in hierarchy depth while latent prediction can recover the hidden tree with sample needs essentially constant in depth, up to logarithmic factors.
RL sample-efficiency history may need a modern LLM rerun. Tom Reed called for updating classic reinforcement-learning sample-efficiency analyses, such as earlier Atari-agent doubling-time estimates, on modern LLMs while noting the hard part is separating true algorithmic progress from better data and larger compute.
Agentic RL research moved from one big theme to many specific failure modes and fixes.
- AgentGym-RL framed agent training as multi-turn exploration across realistic environments, introduced a modular RL framework (reinforcement learning for agents) for agents trained from scratch without supervised fine-tuning (training on curated examples), and proposed ScalingInter-RL, which starts with shorter interaction horizons for stability and gradually increases exploration.
- Agent-R1 argued that treating a long agent rollout as one giant token sequence is the wrong abstraction, and instead built a modular framework around step-level trajectories, flexible context management, layered workflow/environment/optimization interfaces, and credit assignment at token or step level.
- AgentRL focused on scalable multi-turn, multi-task RL infrastructure, using a fully asynchronous generation-training pipeline, containerized environments, function-call APIs, cross-policy sampling for exploration, and task advantage normalization (score normalization across different tasks) for stability, with experiments claiming stronger performance than GPT-5, Claude Sonnet 4, DeepSeek-R1, and other open agents on five tasks.
- AutoForge tackled the environment bottleneck by automatically synthesizing difficult but verifiable simulated environments for agentic RL, then using environment-level reward estimation to reduce simulated-user instability and improve training efficiency on tau-bench, tau2-Bench, and VitaBench.
- RAGEN introduced StarPO and the RAGEN system for studying agent self-evolution, identifying an "Echo Trap" failure mode where reward variance cliffs and gradient spikes destabilize training, and arguing that agent reasoning will stay shallow without fine-grained, reasoning-aware rewards.
- RAGEN-2 identified "template collapse," where agents keep superficially diverse reasoning traces that are actually input-agnostic; the paper proposes mutual-information proxies (checks for whether reasoning changes when the input changes) and SNR-Aware Filtering (choosing prompts with cleaner reward signal) so training preserves input-dependent reasoning.
- ToRL trained LLMs to discover tool-use strategies through reinforcement learning rather than supervised traces, with Qwen2.5-Math model results showing ToRL-7B reaching 43.3% on AIME 2024, a hard math benchmark, and developing behaviors like strategic tool invocation, self-regulation of bad code, and switching between computational and analytical reasoning.
- ECHO argued terminal agents already receive rich world-model supervision in stdout (terminal output), errors, files, logs, and traces, then added an auxiliary loss (an extra training objective) that trains the policy to predict environment observation tokens; the paper reports doubled GRPO pass@1 (first-try success under that training recipe) on TerminalBench-2.0, a benchmark for terminal-using agents, and better terminal-dynamics prediction without extra rollouts.
- Policy and World Modeling Co-Training proposed PaW, which uses the same on-policy RL rollouts (the model's own recent attempts) to train both the action policy and a world model of what actions do, adding action-entropy data selection, noise-tolerant world-model loss, and reward-adaptive loss balancing to improve agent benchmarks without changing inference-time behavior.
- Nathan Lambert used the TMax paper to argue that meaningful agentic RL in 2026 now requires complex tool-use harnesses, automatic history management, serious infrastructure, and public "recipe work" such as weights, data, code, pitfalls, rollouts, and stable baselines so researchers can run rigorous ablations instead of rebuilding everything from scratch.
Robotics research split into tactile hands, simulated dual-arm control, and study resources.
- T-Rex demonstrated tactile-reactive dexterous manipulation: robotic hands reacting dynamically to high-frequency touch signals, addressing a major weakness in vision-language-action systems that often ignore touch or rely on static tactile encoders because rich tactile data is scarce.
- Sharpa Robotics showed T-Rex running on Sharpa Wave hands with high-resolution Dynamic Tactile Arrays on the fingertips, reporting 65% average success across 12 real-world contact-rich tasks, a 30-point gain over the strongest baseline, and stronger performance than non-tactile VLAs (vision-language-action robot models) such as pi0.5; the post credited Dantong Niu's team and advisers including Jim Fan, Fei-Fei Li, Jitendra Malik, Pieter Abbeel, and Trevor Darrell.
- Max C. demoed a dual-arm robotic stacking policy in NVIDIA IsaacSim with randomized target positions, using Gemini on Windows bridged to ROS2 (robotics middleware) inside WSL2 Ubuntu 24.04, a Linux environment running inside Windows, with a multi-model setup and troubleshooting stack including Gemini 3.5 Flash, Gemini 3.1 Pro, Claude Opus 4.6, and Antigravity 2.0 CLI/IDE.
- Hesamation highlighted Alisa Liu's path to OpenAI after 47 interviews and four offers, pointing readers to her concise "notes on LLMs" and math resources as free self-study/interview-prep material for researchers aiming at similar roles.
VisualClaw cut physical-world agent costs by filtering video at the edge. HuggingPapers highlighted VisualClaw, a Google/UCSC/UNC real-time self-evolving multimodal agent for physical-world tasks that filters streaming video before sending it to cloud models, cutting API costs by 98%, evolving skills from memory, and running on smart glasses; the VisualClawArena benchmark dataset includes 200 real-world scenarios, with a companion Papers with Code page.
Thinking in Boxes made single-photo 3D edits more controllable. Anand Bhattad and collaborators released Thinking in Boxes, a method that fits precise color-coded 3D boxes to objects in single real photos so users can move, rotate, scale, or change viewpoint while preserving photorealism for hidden and occluded areas; the project site shows generalization from synthetic multi-object scenes plus a small real Objectron dataset, stronger large-3D-edit results than recent methods, and continuity with the team's earlier Generative Blocks World work.
AI for science got a warning label. Nature argued AI could create a scientific renaissance or a diffuse monoculture depending on whether funders and reviewers reward originality over speed; Nature's X post amplified the piece.
Hierarchical structure may explain why deep networks learn natural data efficiently. Machine Learning Street Talk summarized Prof. Matthieu Wyart and collaborators' Random Hierarchy Model, which argues that natural data with hidden hierarchical/compositional structure can be learned by deep networks from polynomially few examples while allowing exponential recombination of valid new outputs, aligning with Yann LeCun-style models that predict internal abstractions instead of raw tokens.
ProtGPT3 added protein-design models. infinitycrab2 highlighted the open-source ProtGPT3 Family, a 112M-to-10B-parameter collection of protein language models including RL-aligned versions that cut repetitive low-complexity outputs by more than 20-50% in one DPO pass (a training pass that teaches a model to prefer better outputs) while preserving diversity, plus MSA homolog-conditioned variants (models that use related protein sequences as hints). In the cited results, the 112M MSA model beat supervised fine-tuning of up to 10B single-sequence models when prompted with as few as 15 homologs, produced 18/20 soluble expressed defluorinase designs (enzymes aimed at breaking carbon-fluorine bonds in PFAS-style chemicals) from just seven known sequences, and supports inference-time steering for properties like stability.

🏛️ AI Policy, Governance & Safety

AI safety community discourse spilled into the open. Richard Ngo argued that parts of the AI safety community built a memeplex where "taking AGI seriously" became a marker of seriousness and goodness; tszzl replied that the grim part of the AI boom is how everything can feel like a distraction from recursive self-improvement; and Sakana tied the Anthropic access restrictions to Ren Ito's argument that AI sovereignty is about real options, not merely owning a model.
China retaliated against U.S. export and defense restrictions. Nikkei Asia and AP reported China imposed export controls and procurement bans on dozens of U.S. companies, including defense and dual-use technology firms, in response to U.S. actions against Chinese companies.
Norway nearly banned AI in elementary schools. Reuters reported Norway is imposing a near-ban on generative AI tools for elementary pupils and tighter limits for older students to protect learning.
J.D. Vance's AI doctrine blended Silicon Valley and MAGA. The Atlantic described Vance's approach as pro-innovation, nationalist, worker-conscious, light-touch on federal regulation, skeptical of Big Tech concentration, and focused on human control of lethal AI.
Europe's AI sovereignty debate got blunter. Nathan Benaich argued that Europe cannot get real AI sovereignty by renting American frontier models that Washington can cut off overnight, as the Anthropic Fable fight illustrated; his prescription was domestic models and compute on European soil, not just regulation, trusted-partner schemes, or courtship.
Unitree became a robotics trade-policy flashpoint. China Select highlighted that Unitree robots, from a company designated as a Chinese military company, are still sold on Amazon in the U.S. and called for the GUARD Act to block such imports, while SemiAnalysis countered that restricting Chinese robotics hardware could hurt U.S. robotics R&D by denying American builders the components and learning loops they need to stay competitive.
The FAA tapped AI for air-traffic control capacity. Bloomberg reported the Federal Aviation Administration tapped Air Space Intelligence to build AI traffic-control tools aimed at cutting flight delays and increasing airspace capacity, making aviation operations another high-stakes government workflow moving from experiments into deployment.
AI campaign money turned NY-12 into a policy proxy war. The Guardian reported AI-focused super PACs have raised more than $100M this cycle and spent $49M so far, with roughly half of that spending concentrated on the Manhattan Democratic primary around Alex Bores, whose RAISE Act made him a target for Leading the Future, the "pro-AI" network funded by Marc Andreessen, Ben Horowitz, Greg Brockman, and Anna Brockman; counter-spending from Public First-linked groups, including money tied to Anthropic's public $20M contribution, turned the race into what backers called an "AI civil war" over whether state-level AI safety rules should exist.

🛠️ AI Tools & Products

All Hands Up made robot hands explorable in the browser. All Hands Up lets readers compare 10+ dexterous robot hands with interactive URDF (robot model file) visualizations, specs, joint controls, and grasp presets; RLWRLD announced the launch.
Google made Interactions API the main Gemini agent interface. Google AI Studio said the Interactions API is now generally available as the primary/default API for Gemini models and agents, adding managed agents in remote Linux sandboxes, background execution for long-running tasks, improved tool mixing, Deep Research speed/depth modes, charts and infographics, multimodal grounding, Nano Banana 2 image generation with Google Image Search grounding, Lyria 3 music generation, multi-speaker text-to-speech, lower-cost versus priority pricing tiers, 55-day interaction retention, and a reusable gemini-interactions-api Skill; Logan Kilpatrick framed it as the unified foundation for orchestrating Gemini agents.
Nvidia introduced Halos for robotics safety. Nvidia introduced Halos for Robotics, a framework and inspection-lab approach for testing and managing safety in physical AI and robotics systems, with Agility Robotics as an early adopter; Axios added that Halos combines software, compute, sensors, and inspection capability drawn from Nvidia's autonomous-vehicle work, and that Nvidia is positioning itself around robotics software and chips rather than building humanoid hardware itself.
Runway's Aleph 2.0 expanded video frames for new formats. Runway showed Aleph 2.0 changing a video's aspect ratio for different platforms while intelligently expanding the scene so the output looks as if it was shot in that format originally.
fal added Seedance 2.0 in native 4K. fal launched Seedance 2.0 endpoints for reference-to-video, image-to-video, and text-to-video generation with native 4K output, sharp edges, clean textures, and consistent motion definition across frames.
Joseph Azar built an interactive octopus simulation from a single image. Joseph Azar shared a webcam-hand-tracking Three.js physics demo of an octopus with realistic movement and generated music, created by making the 3D model from one image in omma_ai Studio and remixing an Octopus demo.
Unitree's G1 humanoid robot climbed Chimborazo with human support. eWeek reported a modified Unitree G1 named Pemba reached the summit of Ecuador's 6,000-meter-plus Mount Chimborazo on June 5 as a test run for a possible Everest expedition, walking autonomously on slopes under 30 degrees but requiring human carrying and reassembly through harder sections.
Crown turned creative briefs into parallel variations. Crown turns one creative brief into many parallel text, design, image, and video variations so teams can compare directions quickly. Pricing not specified.
birdclaw turned Twitter/X into a local workspace. birdclaw offers archive import, cached live reads, focused triage, and reply flows in a local web app plus CLI; Peter Steinberger called it his favorite new way to read Twitter.
SpAItial brought 3D world editing into agent workflows. SpAItial AI said its new Claude desktop plugin and SpAItial MCP server (Model Context Protocol, the standard that lets agents plug into outside tools and data) let you create, edit, and refine 3D worlds directly inside chat or through coding agents including Claude Code, Codex, OpenCode, VS Code, and Windsurf, without leaving the agent interface.
Science Corporation showed a pocket neural recording system. Science Corporation shared a build video for a neural recording system small enough to fit in a pocket, with additional ecosystem details at science.xyz.
autoarxiv tried to make paper reproduction a URL swap. Akshay Pachaar demonstrated swapping "arxiv" to "autoarxiv" so an agent can read a paper, clone linked code, resolve dependencies, run a minimal reproduction, and report whether the core claim holds.

📊 Fundraising & Deals Roundup

Baseten - $1.5B Series F led by Altimeter, Conviction, and Spark, co-led by Sands Capital and Wellington Management, after 20x revenue growth and 40x inference-volume growth, for open-weight model inference, post-training, custom reinforcement learning, and production deployment used by customers including Cursor, Notion, Lovable, Harvey, HubSpot, OpenEvidence, Abridge, Decagon, and Parallel.
Groq - $650M to expand data-center capacity and pivot from chip startup to AI compute provider.
Go - roughly $553M raised in Japan's largest IPO of the year so far, with proceeds earmarked for robotaxi R&D, M&A, and expansion of Japan's largest taxi-hailing app.
Toto - $495M planned investment over five years in semiconductor materials for the 1-nm logic era.
Upscale AI - $190M round, bringing total funding to $500M in under 18 months, for open-standard AI networking fabrics.
Momenta - targeted $1B Hong Kong IPO for the Chinese autonomous-driving unicorn backed by GM, Toyota, and SAIC.
Lingyi iTech - sought a Hong Kong IPO to raise up to HK$8.3B ($1.1B) for AI hardware and humanoid robotics expansion.
Coowa - planned to file for a Hong Kong IPO within two to three months.
Prosper AI - $30M Series A led by Andreessen Horowitz for healthcare administration agents.
Aether AI - $20M seed for causal world models for physical AI, with additional coverage from The Next Web.
Defense tech startups - $12B venture rush this year, already surpassing the full-year 2025 total.

🎙️ Interviews, Panels & Podcasts

Maggie Appleton argued agentic engineering needs teams, not just speed. Maggie Appleton's GitHub talk framed collaborative AI engineering as moving beyond one developer plus many agents toward planning-heavy, team-aligned systems with dozens of agents; dexhorthy pulled out the core point that the future is not "one engineer running 12 Claude terminals" so much as better human-AI and team coordination.

💡 Industry Commentary & Analysis

AI data centers got a defense from the ground. Amir Efrati pointed to data centers as a "godsend" for some communities, while The Information argued large AI data centers have real drawbacks but often bring more local benefits than residents expect.
Ethan Clark argued robotics is in its LLM-2023 phase. Ethan Clark argued robotics now resembles the 2023 LLM moment, with representation prediction/JEPA-style self-supervised video pretraining looking like the path that can scale.
Feitong Yang warned AI makes fake startup progress cheaper. Feitong Yang distilled two years of pivots at an AI-agent startup, from Minecraft coplayer to Fairies desktop agent to Shortcut Excel coworker, into a warning that AI makes it dangerously easy to avoid direct contact with specific users; founders still need painful first-hand observation, narrow user focus, early shipping, ruthless core-product discipline, outcome metrics like retention and payment, infrastructure only after signal, founder-led distribution, and attention treated as spark rather than fuel.
Guinness Chen argued prompts should become voice dumps, not polished commands. Guinness Chen argued that by June 2026 users should stop hand-editing prompts and instead hold down dictation for ten minutes, feeding the model every fragment, caveat, example, and vibe in their head because LLMs are unusually good at reconstructing latent intent from messy language.
François Chollet pushed back on SaaS bears. François Chollet argued that the "all software goes to zero because Claude can one-shot apps" thesis is staggeringly short-sighted: Fable was good but still less than 1% of the way to replacing serious software businesses; if code generation ever got that strong it would generally help SaaS companies because software developers benefit most from better developer tools; code is not the product because customers pay to avoid owning every side quest; and easier code usually means more software and more usage surface for existing SaaS, not less.
Self-distillation and long-horizon RL got pushback. Penghui Qi argued PPO can beat GRPO, two reinforcement-learning recipes for training models with reward signals, on long-horizon agentic tasks, while Alex Weers and another Weers post broke down why self-distillation (training a model on its own or a teacher model's answers) can harm reasoning and why "rebellious student" style methods may preserve exploration.
OpenAI model-rumor watchers pointed to a possible GPT-5.6 and voice release. chetaslua said OpenAI was actively testing multiple 5.6-series models, including a 5.6 Pro described as able to do almost anything with the right prompt, plus GPT-Bidi-1, a bidirectional voice model that can handle speech in both directions of a conversation with an August 2025 knowledge cutoff; kimmonismus amplified the same rumored Thursday release window and said early testers framed GPT-Bidi-1 as the voice model people had hoped for since GPT-4o.
François Chollet countered the "Adobe is doomed" narrative. François Chollet argued Adobe is actually one of today's top profitable AI companies, with Q2 revenue hitting a record $6.62B (+13% YoY), adjusted earnings per share up 18% to $5.96, net margins near all-time highs at 36% while absorbing generative-AI compute costs, AI-first ARR (annual recurring revenue) tripling YoY past $500M, Firefly at $300M ARR and growing about 50% quarter-over-quarter through apps and credit packs, Acrobat AI Assistant paid users up more than 150%, and freemium MAUs (monthly active users) rising from 700M to 850M. His point: AI is accelerating Adobe's adoption and earnings growth from the prior 10-11% range to 13%, not hollowing the company out.
Ethan Mollick argued general knowledge-work agents are still too software-brained. Ethan Mollick argued that Codex/Cowork/Code-style tools transfer awkwardly to management and analysis because software has a clean final artifact and an unambiguous source of truth, while most knowledge work treats the process itself as valuable: researching what is already known, exploring alternatives, recording failed attempts, preserving prototype branches, and letting experiments change the team's view over time. That context cannot be recovered from a final PowerPoint, compacted to-do list, or summary the way it can from a codebase, making long-running tools like Fable hard to use for deep work without constant prompting workarounds

Previous Around the Horn Digests

Catch up on everything you missed:

Friday, June 19: OpenAI helped solve 18 rare pediatric disease cases; Google pushed AMIE from diagnosis into ongoing care; Z.ai's GLM-5.2 shook up open models; Anthropic sped up robotics work; Amazon aimed Trainium at Nvidia; plus much more.
Wednesday, June 17: SpaceX reportedly pushed deeper into AI coding with Cursor, CoreWeave trained DeepSeek-V3 in two minutes, and Anthropic met the White House.
Monday, June 15: Anthropic's Fable and Mythos fight spilled into policy and markets, Salesforce bought deeper into agents, and chip deal rumors got spicy.
Thursday, June 11: OpenAI acquired Ona, Anthropic faced a Claude Fable backlash, SpaceX priced a record IPO, and agent tools kept multiplying.
Monday, June 8: Apple rebuilt Siri at WWDC, OpenAI confidentially filed for an IPO, Anthropic showed Mythos exploiting fresh flaws, and NVIDIA expanded its AI factory push.
Tuesday, June 2: Earlier daily digest from the same archive.
Monday, June 1: NVIDIA turned the PC into an agent computer, Anthropic filed confidentially for an IPO, MiniMax released M3, and Bernie Sanders proposed public ownership of AI labs.
Weekend, May 29-31: Weekend roundup from the previous archive window.

Monthly skill digests:

That's a Wrap

That's 70+ story clusters and 200+ source links from today alone. If you made it to the bottom, you now know more about multi-agent orchestration, AI data-center debt, and blindfold chess evals than most quarterly planning decks should ever contain.

For the daily version, make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you tomorrow.

P.S: Know someone who'd find this useful? Forward this to them and tell them to subscribe here.

Everything That Happened in AI Today (Monday, June 22, 2026)