Everything that Happened in AI March 28-29 2026

Anthropic accidentally leaked the most powerful model it has ever built, cybersecurity stocks tanked, a delivery robot crashed through a bus stop, and Jensen Huang told Lex Fridman he thinks we've already achieved AGI.

Welcome to the Around the Horn Weekend Digest, where we round up every AI story that didn't fit into the weekday editions. Think of it as the weekend catch-up for completists, bookmark hoarders, and people who genuinely want to be the most informed person in every room. This weekend is heavy on leaks, rate limits, and the growing question of who actually gets to use frontier AI.

Let's get into it.

🌟 Sunday Special: This Week in AI Top 10 Stories

This week was a platform war disguised as a product week. Apple, Google, and OpenAI each made massive bets on completely different strategies for winning the AI assistant war, while Anthropic quietly prepared for what could be the second-largest IPO in history.

Top 10 Tools of the Week

Claude Computer Use gives Claude access to your keyboard, mouse, and apps to complete tasks autonomously on your Mac while you're away.
Claude Code auto mode lets the agent decide which actions are safe to run on its own, eliminating the "babysit every step or let it run wild" tradeoff.
Codex Plugins connect Codex to Slack, Figma, Notion, Gmail, Google Drive, and 20+ other tools so it handles planning and coordination across your actual work apps.
Stripe Projects provisions hosting, databases, auth, and AI from the command line so you or your agents ship full-stack apps without configuring a dozen services.
Tinker by Shopify gives you free AI creative tools for video, images, 3D models, and product photography from your phone.
Suno v5.5 adds verified voice cloning and custom models that learn your musical style across songs.
Plus One by Every builds you a personal AI agent with memory, personality, and tools using OpenClaw, pre-loaded with Every's best workflows.
Granola expanded from meeting notetaker to full enterprise AI app with agent support after raising $125M at a $1.5B valuation.
Ramp CLI gives your AI agents 50+ tools for managing company finances, from cards and bills to expenses and travel approvals.
Cog adds persistent memory, self-reflection, and scenario simulation to Claude Code through plain-text files and nightly consolidation pipelines.

Around the Horn — Sunday, March 29, 2026

The story of the weekend dropped Friday afternoon when security researchers found nearly 3,000 unpublished documents sitting in an unsecured, publicly searchable database belonging to Anthropic. Among them: draft blog posts for a model called Claude Mythos, described as "the most capable we've built to date." Anthropic confirmed the model is real, calling it "a step change" in capabilities.

Mythos (also called "Capybara") represents a new tier above Opus, not a version update. The leaked draft says it scores "dramatically higher" than Opus 4.6 on coding, reasoning, and cybersecurity. It also warns the model is "currently far ahead of any other AI model in cyber capabilities" and "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." Cybersecurity stocks dropped 3-7% on Friday, with CrowdStrike falling 7% and Palo Alto Networks losing 6%.

The irony is thick: an AI safety company left its biggest secret behind an unlocked door because of a default CMS setting that made uploaded files public unless someone manually changed it. The draft also revealed Mythos is "very expensive for us to serve, and will be very expensive for our customers to use," which maps directly to the rate-limit tightening Claude users experienced all week. Separately, OpenAI's own next model, codenamed "Spud," reportedly finished pretraining on March 25, with Sam Altman telling staff internally that "things are moving faster than many of us expected." The frontier model race is accelerating, and the compute bill is accelerating with it.

🏢 Big Tech & Major Companies

Shopify launched Agentic Storefronts, enabling merchants to sell directly inside ChatGPT, Google's AI Mode, Microsoft Copilot, and Gemini; customers can browse and purchase without leaving their AI chat. To support this, Shopify co-developed the Universal Commerce Protocol (UCP) with Google to standardize how agents access product info and process orders at scale.
ChatGPT Pro subscription reportedly jumped to $200/month (from $20/mo) for unlimited GPT-5.4 Pro access with a dedicated GPU slice and 1M-token context window.
ChatGPT experienced intermittent outages starting March 28, with elevated error rates and slower responses persisting into March 29.
Claude experienced a 6+ hour outage on March 28 due to elevated errors on the Opus 4.6 Fast Mode model, affecting web app, API, and Claude Code. A separate Claude Desktop bug where Dispatch sessions stopped responding was fixed in version 1.1.9493.
Meta faces a US class action lawsuit alleging Ray-Ban AI smart glasses violate privacy laws; workers at a Kenya-based subcontractor reportedly reviewed sensitive footage to train AI models despite "privacy, controlled by you" marketing. The UK ICO opened an investigation.
Periodic Labs, the AI science startup founded by former OpenAI and DeepMind researchers, is in talks to raise hundreds of millions at a ~$7B valuation as it focuses on autonomous, experimentally verifiable science.
Meta released SAM 3.1 (Segment Anything Model 3), the latest version of its open-source image segmentation model with inference and finetuning code (GitHub).
Perplexity's APIs now power Samsung's Browsing Assist, a conversational AI assistant built into Samsung Browser on Galaxy Android and Windows PC, reaching up to 1 billion Samsung devices.
OpenAI shipped Codex Hooks, letting you run deterministic scripts during the Codex lifecycle (pre/post task execution). It also published a Codex use cases guide with example workflows.
NVIDIA expanded Alpamayo, its open platform for developing reasoning autonomous vehicles, across models, data, and simulation.
Donald Knuth published "Claude Cycles", a new paper from Stanford's legendary computer scientist examining Claude's behavior.
OpenAI's US ChatGPT ad pilot exceeded $100M annualized revenue in six weeks (Reuters confirmation).
Google's March Gemini Drop detailed all monthly updates, including chat history import tools that let you transfer conversations from rival chatbots directly into Gemini.
Apple hired former Google Shopping VP Lilian Rincon as vice president of product marketing for AI, reporting to Greg Joswiak.
A top AI conference reversed its ban on papers from US-sanctioned entities after a boycott by China's largest tech federation.
Sam Altman told staff he tried to "save" Anthropic in the Pentagon clash, privately venting that Dario Amodei had spent years trying to undermine him.
The leaked Claude Mythos blog post is now archived in both v1 (Mythos) and v2 (Capybara) versions, confirming that the two drafts differ only in the model name used throughout the title and body and share the same subtitle.

💼 AI Productivity, Labor & Economics

Andrej Karpathy drafted a blog post, spent 4 hours using an LLM to improve it, felt great about how convincing it was, then asked the LLM to argue the opposite, and it demolished the whole thing. His takeaway: LLMs are extremely competent at arguing in any direction, which makes them useful for forming your own opinions if you ask for multiple sides and watch for sycophancy (28K likes, 2.2K reposts).
Garry Tan argues intelligence available on tap means capital suddenly contains more bottled-up value than ever; you can now have the equivalent knowledge work of 10,000 or 1M people if you figure out how to put it in a harness, get the data into it, and press DEPLOY.
Yoni Rechtman shared a viral "4 jobs" take, which argues the only jobs in tech companies will be: (1) product eng/vibe coder/PM/slop cannon (high velocity generalist), (2) security/SRE/infra (stitching everything together, making it stable and robust), (3) "hot people" (sales, CX, people ops; presenting an easy UX to the world), (4) grown-ups (adults in the room; legal, finance, governors on accelerating orgs).
- His core argument from the piece: moats set the ceiling on value capture but execution determines how close you actually get; don't call it a moat until you've proven you can defend it.
A study published in Science found sycophancy (excessive agreement and flattery) is widespread across 11 state-of-the-art AI models and actively harmful, decreasing prosocial intentions and promoting dependence on AI.
Science Friday covered how "vibe-proving" is here for math: AI that once couldn't do arithmetic is now threatening to make mathematicians obsolete.
Anthropic tweaked Claude usage limits to manage capacity, making some conversations more costly during peak hours.
Forbes reported that Claude Code is burning through developers' usage allotments far too quickly, with users alleging Anthropic has a pricing bug.

🏛️ AI Policy, Governance & Safety

Innovation Council Action, a nonprofit led by Taylor Budowich and David Sacks, committed at least $100M to promote the National Policy Framework for AI in the midterms, emphasizing consent-based protections for likenesses, federal preemption over state AI laws, free-market data licensing, and alignment with the NO FAKES Act.
Angela Lipps, a Tennessee grandmother, spent over five months in jail after Clearview AI facial recognition misidentified her for bank fraud in North Dakota (a state she'd never visited); charges were dismissed after bank records proved her alibi, and Fargo PD has since banned the AI system.
China hosted the "AI + Industry" Forum and launched a national strategy to integrate AI into manufacturing, transportation, and infrastructure, including a platform to match AI suppliers with deployment scenarios targeting "super-automated factories."
Tanya Matanda argues AI governance and sustainability governance must be treated as interconnected, proposing a three-part lens: governing AI's impact (emissions), governing with AI (supply chain monitoring), and governing despite AI (maintaining human judgment).
Dutch court ruled against Grok over AI-generated "undressing" images, ordering xAI not to generate or distribute nonconsensual sexual images in the Netherlands.
NYT covered the judge's stay of the Pentagon's "supply chain risk" label on Anthropic as an early victory in the legal battle.
The White House AI framework got detailed legal analysis from Nixon Peabody (federal standard + preemption, but companies must still navigate state patchwork) and Latham & Watkins (calling the TRUMP AMERICA AI Act draft a "pivotal moment" with "sweeping implications").
EU delayed AI Act compliance until 2027 while banning nudify apps.
NIST AI Standards webinar and updated framework.
A practical AI compliance checklist for March 2026 covering EU AI Act deadlines, FTC guidance, and action items.

🛠️ AI Tools & Products

Mistral released Voxtral TTS, a 3B open-weight TTS model that beat ElevenLabs in blind tests (63% standard, ~70% custom), supports 9 languages, clones voices from 5 seconds, runs on ~3GB RAM with 90ms time-to-first-audio.
Moondream Photon delivers 46ms end-to-end vision inference at 60+ fps on a single H100.
Michael R. Bock launched Aiwyn Tax, a Claude connector that prepares your full federal + state tax return from uploaded W-2s (try it).
Perplexity's pplx-embed trained embeddings on 200M daily web queries, delivering 81.96% CoNTEB (next closest 79.45%) at 5-30× cheaper, with 1M HuggingFace downloads in two weeks.
Symbolica hit 36% on ARC-AGI-3 on Day 1 using the Agentica framework, though ARC Prize clarified this is human-crafted targeting, not AGI progress, and created a separate Community Leaderboard.
Atomic Chat runs AI completely on your device with zero data sent anywhere —free.
Obsidian Skills (by Obsidian CEO): agent skills that teach your coding agent to use Markdown, Bases, JSON Canvas, and the CLI —free (open source).
n8n-mcp: an MCP for Claude Desktop, Claude Code, Windsurf, and Cursor to build n8n workflows for you —free (open source).
Liquid AI cookbook: examples, tutorials, and apps built with Liquid AI Foundational Models and the LEAP SDK —free (open source).

🔬 AI Research & Models

PrimeIntellect released prime-rl, an open-source codebase for async RL training at scale; λux recommends it as the cleanest, most modular starting point for diving into async RL (300 likes).
Turing Post mapped 14 JEPA milestones from static perception (I-JEPA) to dynamic world modeling and planning (V-JEPA 2.1, LeWorldModel, ThinkJEPA), showing how Yann LeCun's framework has evolved into a foundational path for self-supervised learning beyond autoregressive LLMs (653 likes, 97 reposts).
DDR5 memory prices took their first noticeable dive in months; Roger claims RAM prices are plummeting after OpenAI failed to fulfill its $71B SK Hynix commitment (32K likes, 3.1K reposts).

💻 AI Coding & Developer Tools

Jordan Hochenbaum built pi-autoresearch-studio: dashboard, plan editor, PR workflow, and orchestration tools for pi autoresearch sessions with granular experiment-to-PR selection and auto-resolved dependencies —free (open source).
GLM-5.1 launched from Z.AI, with early reviewers calling it the best open agentic model available; also released a coding plan with subscription tiers for Claude Code, Cursor, Cline, and other IDEs (docs).
Every's Compound Engineering Plugin (Dan Shipper) brings official compound engineering workflows to Claude Code, Codex, and more —free (open source).
deskctl is a desktop control CLI for AI agents (GitHub) —free (open source).
Function Calling Harness by AutoBe took function calling from 6.75% first-try success to 100% compilation via type schemas, compilers, and structured feedback.
llama-cpp-turboquant-cuda is a community implementation of TurboQuant with CUDA support for llama.cpp —free (open source).
KAT-Coder by Kuaishou/StreamLake launched coding plans with KAT-Coder-Pro V1 and Air V1, targeting Claude Code, Cursor, and Cline from $10/mo.
claude-better is an improved Claude harness —free (open source).
Qwen-3.5-28B-A3B-REAP on HuggingFace (22 likes).

💡 Industry Commentary & Analysis

François Chollet argues that if AGI pans out, the future class divide won't be based on wealth but on cognitive agency; there will be a "focus class" (those who control their attention and actually do things) and a "slop class" (those whose reward loops are fully RL-managed by AI) (2.6K likes, 262 reposts).
François Chollet also argues one of the biggest misconceptions about intelligence is treating it as an unbounded scalar like height ("future AI will have 10,000 IQ"); intelligence is a conversion ratio with an optimality bound, and at some point the ball is already pretty damn spherical (1.6K likes, 150 reposts).
Ryan Greenblatt (Anthropic) counters Chollet by noting it's clearly possible to have 1 trillion parallel AIs running 1000× faster, strictly more capable than human experts, regularly syncing dense mind states; such AIs would be wildly superhuman and externalized cognitive tools wouldn't close the gap (256 likes).
palcu (Anthropic) highlighted colleague Nicholas Carlini's public talk showing current models are better vulnerability researchers than he is (Carlini used to do this professionally) and urged people to help soon; "the world will need a lot of it and it needs to happen in months, not a year."
roon argues recursive self-improvement, like human research, will come in fits and starts, S-curves, AI winters, and waiting for the next chip generation (1K likes).
Krishna Kaasyap argues the observed step-change in Anthropic's new model is more likely from significantly scaling final training-run compute (now possible with thousands of GB200/GB300 racks) than from new architectural breakthroughs, supported by Epoch AI research on how architecture gains compound with scale.
Andrew Curran notes that rumors from three weeks ago of a frontier lab completing its largest-ever training run producing a model far above scaling-law expectations now look credible in light of Mythos/Capybara.
dax argues we all live in a big ecosystem that feeds into each other (frontier labs, open-source labs, inference providers, app builders, individual hackers), with lots of co-designing and conflict creating good equilibrium; no one spot is morally superior.
Jenny Wen (Anthropic, Head of Design) gave a rare 40-minute inside look at how she uses Claude Cowork to ship products, plus the real origin story behind Cowork's development.
Andrew Feldman (Cerebras CEO) explained disaggregated inference (splitting AI processing across specialized hardware to cut costs and speed), breaking down what it is, when it matters, and who it's built for.
Philip Zeyliger (exe.dev) argues everyone is building a software factory and this is NOT a "one size fits all" moment: his 7-person team uses 9 different AI workflows, and the only common denominator is plentiful, trivial-to-provision VMs.
OpusLABS argues the agent layer is rewriting software: 2026 may be the year domain-specific AI tools stop looking like chatbots and start looking like infrastructure.
Terrible Software argues "good taste" is just experience earned through reps, pushing back on the idea that editorial judgment is innate.
Distill Intelligence published their AI Leaders Weekly Briefing covering canceled billion-dollar deals, major infrastructure investments, and significant new model releases.
Simon Willison wrote that vibe coding SwiftUI apps is a lot of fun, and separately wrote about profiling Hacker News users based on their comments.

📺 Worth Watching

Claire Vo went from OpenClaw skeptic to running 9 agents across 3 Mac Minis for family scheduling, inbound sales, podcast prep, kids' homework, and course management, calling it a "ChatGPT moment" where management skills matter more than technical skills (Lenny's Podcast).
"50 AI Agents Running My Company" Is a Lie: Gumloop CEO Max Brodeur-Urbas ($50M Series B, powers 4M daily workflows for Instacart/Shopify/DoorDash) pushes back on AI agent hype, calling fully automated companies "slop machines."
Is Cursor's Composer 2 Actually Good?: Hands-on test building a real app with Composer 2 vs. Opus 4.6 and GPT-5.4, plus Cursor's new Glass UI.
AI Foundations for Absolute Beginners by learnaianywhere.org: a free course covering the basics from scratch.

🤖 Robotics / Fun

Tejes Srivalsan released EGO-BIRD: 100,000 hours of POV bird footage to train the next generation of autonomous drones, following the viral EGO-SNAKE dataset (495 likes).

🍪 Additional Treats Candidates

Atomic Chat runs a private, fully local AI chatbot on your device with no data sent anywhere —free.
Cline Kanban orchestrates multi-agent coding workflows with dependency chains, compatible with Claude Code and Codex —free (open source).
n8n-mcp lets Claude Code build n8n automation workflows for you —free (open source).
Obsidian Skills teaches your coding agent to use Markdown, Bases, JSON Canvas, and the Obsidian CLI —free (open source).
Composio Universal CLI manages integrations for AI agents from the command line —free tier.
deskctl gives AI agents desktop control via CLI —free (open source).

Around the Horn — Saturday, March 28, 2026

While the Mythos leak dominated headlines, the rest of the weekend surfaced a different kind of story: the physical world is catching up to AI. Waymo doubled to 500,000 paid rides per week in under a year, already halfway to its end-of-year target of 1M. Zoox expanded to Austin and Miami. California's regulator confirmed Tesla is "not operating an autonomous vehicle service." And a Serve Robotics delivery robot crashed through a Chicago bus stop shelter, shattering glass everywhere, amid growing local protests. The autonomous future is arriving; it's just arriving messily.

🏆 TOP 6 NEWS (Around the Horn)

METR spent three weeks red-teaming Anthropic's internal agent monitoring systems (described in the Opus 4.6 Sabotage Risk Report), found several novel vulnerabilities (some since patched), and produced the first-ever dataset of covert agent attack trajectories that every AI lab building monitoring will now use as a benchmark.
Jensen Huang told Lex Fridman "I think we've achieved AGI," and Mark Gubrud, the physicist who coined the term nearly 30 years ago, agreed: "Current models perform at roughly high-human level in command of language and general knowledge, but work thousands of times faster."
Iran's AI-generated LEGO propaganda videos mocking Trump are going viral while the White House's own AI content falls flat, with experts saying Iran has studied American audiences better than the US government is speaking to them. A Pew poll found 61% of Americans disapprove of Trump's handling of the conflict.
Intercom shipped Apex 1.0, a custom model for its Fin support agent that beats GPT-5.4 and Opus 4.5 on resolution rate, speed, hallucinations, and cost. ~100% of English chat/email conversations now run on it.
Sam Altman shared the extraordinary story of Paul Conyngham, who used ChatGPT, Gemini, and Grok alongside human experts to design and administer a personalized mRNA neoantigen vaccine for his dog's mast cell cancer, including full genome sequencing, mutation ID via AlphaFold, and manufacturing at UNSW. Tumors have dramatically shrunk.
SemiAnalysis warned AI coding slop is now flooding open source: OpenAI's Triton PR #9734 claimed to fix Blackwell consumer GPU issues but was complete slop that didn't solve the problem; NVIDIA's PyTorch tech lead called it out after merge.

Honorable Mentions:

Epoch AI analyzed frontier lab job postings and found a fast increase in go-to-market roles, with hints about upcoming products.
Andrew Ng's Batch covered the White House's proposed federal AI framework that would preempt state-level regulations, arguing anti-AI advocates have shifted to state-level lobbying after failing federally.
ARC-AGI-4 will launch early 2027 on a yearly schedule, with each benchmark designed to be fully unsaturated at launch.
Azeem Azhar discussed autoresearch (AI that autonomously conducts research) and the "Karpathy Loop" framework for human-AI problem solving.

🍪 TOP TREATS TO TRY

Slow LLM is a browser extension by artist Sam Lavigne that makes ChatGPT and Claude appear to run extremely slowly, designed to make people rethink their AI dependence. There's also a network-wide DNS version for offices and schools. The creator admitted he used Claude to write the code, until Slow LLM started working and forced him to finish it himself —free (open source).
Phota Labs generates and edits photos that actually look like you and your pets, with photo booth, multi-person/pets, style transfer, and an editing tool for existing photos (studio, API) —pricing not listed.
Stripe Projects provisions hosting, databases, auth, AI services, and more from the CLI so you or your agents stop clicking through browser dashboards (inspired by Karpathy's MenuGen pain) —developer preview, register.
SID-1 is the first retrieval model trained end-to-end with RL, achieving 0.84 recall (vs. GPT-5.1 at 0.78 and Sonnet 4.5 at 0.64) at 3-4 orders of magnitude lower cost —drop-in compatible with existing retrieval systems.
Morph MCP plugs into Cursor, Claude Code, or any agent for faster edits, smarter retrieval, and better context in one MCP —free tier.
AIO Sandbox by ByteDance Open Source combines browser, terminal, VSCode, file system, and MCP in a single Docker container for AI agents (website) —free (open source).
Expect lets agents test your code in a real browser: run Claude Code or Codex, get a video highlight reel of bugs, fix and repeat —free (open source).
Overlay is an open-source unified workspace ("Perplexity Computer") with one-click OpenClaw instances, chat, notes, and media generation (GitHub) —free.
mlx-tune now supports the full Qwen3 stack (Text, Vision, ASR, TTS) for SFT/DPO/GRPO fine-tuning natively on Mac with Apple Silicon —free (open source).
LLMFeeder turns any web page into clean Markdown you can paste straight into ChatGPT or Claude as context, with one click from your browser —free to try.

🏢 Big Tech & Major Companies

Waymo doubled to 500K paid rides per week in under a year, reaching half its end-of-year target of 1M rides just a quarter into 2026. Zoox expanded to Austin and Miami, quadrupling its SF service area. California confirmed Tesla is not operating an autonomous vehicle service. Uber announced plans to launch Europe's first robotaxi service with Pony AI and Verne in Zagreb.
Anthropic's Capybara/Mythos leak dominated the weekend. The new model tier above Opus scores "dramatically higher" on coding, reasoning, and cybersecurity. Separately, Anthropic confirmed adjustments to Claude Code session limits during peak hours and announced a new monthly "What We Shipped" livestream starting April 7th.
OpenAI's next model, codenamed "Spud," reportedly finished pretraining on March 25. OpenAI indefinitely shelved its "adult mode" ChatGPT project alongside Sora. The company is focusing all resources on the new model, expected in about two weeks.
Mark Gurman (Bloomberg) reported Apple will open Siri to run any AI service via App Store apps in iOS 27, dropping ChatGPT exclusivity and turning Siri into a true AI platform where Apple takes a cut of subscriptions.
Meta released TRIBE v2, a self-supervised multimodal foundation model of vision, audition, and language for in-silico neuroscience that predicts brain responses (demo, GitHub, HuggingFace).
kimmonismus argues that despite Meta's ~$700B total AI spend (including $600B+ on data centers, ScaleAI acqui-hire, and Manus acquisition), the company still lacks any upcoming model competitive with even Chinese open-source labs.
Codex 0.117.0 shipped with plugin sync/browse/install, sub-agents using path-based addresses in v2 workflows, and image history surviving resume.

💼 AI Productivity, Labor & Economics

BlackRock CEO Larry Fink warned of the "real risk" that AI widens wealth inequality. Former Commerce Secretary Gina Raimondo called for a "new grand bargain" between sectors. AI investor Alap Shah published Part 3 of his viral series proposing progressive corporate taxes tied to AI-displacement metrics, automatic "circuit breaker" stabilizers, and portable benefits across jobs.
Intercom CEO Eoghan McCabe announced Apex 1.0, their custom model for Fin that beats GPT-5.4 and Opus 4.5 on resolution rate, speed, hallucinations, and cost. ~100% of English chat/email now runs on it.
Jason Shuman argues the biggest AI winners won't be software vendors but the humans who implement for SMBs: 54% lack internal AI expertise, 41% have unusable data quality, 41% prefer buying through local IT providers. The "Do It For Me" economy is back.
Aaron Levie (Box CEO) argues Jevons paradox is playing out in real time with AI: companies that couldn't afford software projects now can, driving more engineering demand. The advice against becoming an engineer is wrong.
Sen. Mark Warner bet an Axios crowd: "Recent college graduate unemployment is 9%. I'll bet it goes to 30-35% before 2028."
Conservative groups formed an AI alliance "to prioritize the interests of children, workers, and creators."
Sam Altman conceded to a room full of DC heavyweights: "AI is not very popular in the US right now."

🤖 AI Agents & Infrastructure

Cline launched Kanban, a standalone app for CLI-agnostic multi-agent orchestration compatible with Claude Code and Codex. Tasks run in worktrees, you click to review diffs, and link cards together to create dependency chains that complete large amounts of work autonomously.
Agent Computer provides cloud computers purpose-built for AI agents.
Composio Universal CLI gives AI agents tool infrastructure to install, configure, and manage integrations from the command line.
PostTrainBench tests whether AI agents can improve base LLMs: each agent gets 4 small target models, an H100 GPU, and 10 hours to post-train them. Opus 4.6 (1M context) currently leads (paper, GitHub).
Patrick Collison launched Stripe Projects (projects.dev) so agents can provision hosting, databases, auth, and AI from the CLI, inspired by Karpathy's MenuGen pain. Latent.Space tied it to the broader "everything is CLI" trend.
Karpathy replied that the hardest part was the DevOps "IKEA furniture" of services, payments, auth, and security. He looks forward to the day you can tell an agent "build MenuGen" and it handles everything without human browser clicks.
Ramp Labs released Ramp CLI so agents can manage company finances with 50+ tools for cards, bills, expenses, travel, and approvals (agents.ramp.com).
Claude Code shipped an iMessage channel so you can text your full agentic AI from your phone with persistent sessions and blue bubbles.
Cheng Luo open-sourced Attention Residuals, replacing additive residuals with learned cross-layer attention; 7.7% perplexity reduction with 0.03% extra parameters (GitHub, blog).
Kimi.ai introduced Attention Residuals, applying attention across model depth so layers selectively attend to previous layers' outputs instead of mechanically accumulating everything.
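
To make the two Attention Residuals items above concrete, here is a minimal sketch of the contrast they describe: a standard residual stream adds up every earlier layer's output with equal weight, while an attention-style residual weights those outputs selectively. This is an illustration under stated assumptions, not the released implementation. The per-layer `query` vector, the dot-product scoring, and the function names are all hypothetical, and the real method likely includes details (projections, normalization) that are omitted here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def additive_residual(layer_outputs):
    """Standard residual stream: earlier outputs are summed with equal weight."""
    dim = len(layer_outputs[0])
    return [sum(h[i] for h in layer_outputs) for i in range(dim)]

def attention_residual(layer_outputs, query):
    """Hypothetical depth-wise attention: score each earlier layer's output
    against a learned per-layer query, then take the softmax-weighted mix."""
    weights = softmax([dot(query, h) for h in layer_outputs])
    dim = len(layer_outputs[0])
    mixed = [sum(w * h[i] for w, h in zip(weights, layer_outputs))
             for i in range(dim)]
    return mixed, weights

# Toy outputs from three earlier layers (2-dimensional for readability).
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # illustrative stand-in for a learned parameter

mixed, weights = attention_residual(outputs, query)
print("additive:", additive_residual(outputs))
print("attention weights:", [round(w, 3) for w in weights])
print("attention mix:", [round(v, 3) for v in mixed])
```

With this toy query, layers whose output aligns with the first dimension get more weight than the one that does not, which is the "selectively attend" behavior the items describe. With an all-zero query, the weights are uniform across layers.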

💻 AI Coding & Developer Tools

SemiAnalysis warned AI coding slop is flooding open source: OpenAI's Triton PR was complete slop; NVIDIA's PyTorch tech lead called it out after merge.
AutoEvolver gave Claude Code an algorithmic optimization problem and let it run autonomously for 88 hours. It beat published SOTA on circle packing, Erdős minimum overlap, and first autocorrelation inequality, proving general-purpose coding agents can surpass specialized systems with zero scaffolding.
Goedel-Code-Prover is an 8B model using hierarchical proof search to synthesize machine-checkable Lean 4 proofs, achieving 62% on Verina/Clever/AlgoVeri (2.6× over strongest baseline, beating GPT-5.3-Codex at 18.5%) (HuggingFace, GitHub).
Claudini is an autoresearch system that gave Claude 30+ adversarial attack algorithms plus compute; it autonomously discovered SOTA white-box attacks on LLMs by combining methods in novel ways.
Aiden Bai built Expect: let agents test your code in a real browser, get a video highlight reel of bugs (GitHub).
dev (@dsllwn) built Overlay, the open-source "Perplexity Computer" with unified workspace, one-click OpenClaw instances, and hosted computer workflows (GitHub).
Sawyer Hood built dev-browser: a CLI for agents to control a real browser by writing Playwright code, with sandboxed VM (GitHub).
dotta launched companies.sh: an open standard for Agent Companies. Import and run pre-configured AI teams with one command.
Jianzhu Yao open-sourced IKP, a region-level CUDA kernel profiler with interactive Explorer (GitHub).

🔬 AI Research & Models

Multi-Answer RL (MIT CSAIL) trains language models to output distributions of plausible answers in a single forward pass instead of collapsing to one "best" answer, producing models that are more accurate, more diverse, better calibrated, and more compute-efficient; works for medical diagnosis, ambiguous QA, and coding (paper, code).
Price Reversal Phenomenon: paper showing cheaper reasoning models can end up costing more in practice because they need more tokens to reach the same answer quality (code, demo).
Do LLMs Break the Sapir-Whorf Hypothesis?: essay exploring whether language models challenge the theory that your language shapes how you think.
LeWorldModel: stable end-to-end joint-embedding predictive architecture from pixels (paper).
SAGE: Multi-Agent Self-Evolution for LLM Reasoning: agents improve their own reasoning through multi-agent self-play.
Hume AI released mlx-tada-1b on HuggingFace, their 1B emotion-aware model for Apple Silicon.
Flash Linear Attention: efficient implementations of state-of-the-art linear attention models —free (open source).
Self-Distillation of Hidden Layers for self-supervised representation learning.
ServiceNow VideoCUA: a new dataset for video-based computer use agents.
From Static Templates to Dynamic Runtime Graphs: a survey of workflow optimization for LLM agents.
Agent Data Protocol: unifying datasets for diverse, effective fine-tuning of LLM agents.
Skywork Matrix-Game-3.0 released on HuggingFace.
CLTR found a 5× increase in scheming-related AI incidents after analyzing more than 183,000 AI interaction transcripts spanning five months.
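
The Price Reversal item above comes down to simple arithmetic: what you pay per query is the per-token price times the number of tokens the model burns to get there. Here is a minimal Python sketch of that idea. The prices and token counts are made up purely for illustration and are not figures from the paper, and `cost_per_query` is a hypothetical helper.

```python
def cost_per_query(price_per_million_tokens: float, tokens_used: int) -> float:
    """Dollar cost of one query: per-token price times tokens generated."""
    return price_per_million_tokens * tokens_used / 1_000_000

# Hypothetical numbers for illustration only (not taken from the paper).
# The "cheap" model has a low list price but reasons at much greater length.
cheap_model = cost_per_query(price_per_million_tokens=2.0, tokens_used=12_000)
pricey_model = cost_per_query(price_per_million_tokens=8.0, tokens_used=2_000)

# True when the lower list price ends up costing more per answered query.
price_reversal = cheap_model > pricey_model
```

With these invented inputs, the model that is 4× cheaper per token uses 6× the tokens, so it comes out more expensive per answer. The takeaway for buyers: compare cost per solved task, not cost per token.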

🔒 AI Security

Critical flaw in Langflow AI platform under active attack: threat actors pounced within hours of disclosure.
Hospital shadow AI fuels data breach crisis: unauthorized AI tool use in hospitals is exposing patient data and testing HIPAA safeguards.
The popular telnyx package on PyPI was compromised by TeamPCP, affecting a package used by major AI companies.
NOBLE adds a tiny permanent nonlinear cosine low-rank branch to each Transformer linear layer, cutting pretraining steps by up to 32% with only ~4% extra parameters. It works on LLMs, BERT, VQGAN, and ViT (interactive notebook).
TurboQuant (Google/NYU) compresses AI memory usage to 2.5-3.5 bits (roughly 6× smaller than standard) without retraining, potentially making long AI conversations much cheaper and faster. Being presented at ICLR 2026.
Reverse predictivity (Nature Machine Intelligence) revealed a striking asymmetry: AI models with high forward predictivity of brain responses contain units unpredictable from neural activity, exposing genuine representational mismatch between ANNs and brains.
Emergent "Self" in continual robot learning: Columbia researchers found a self-model spontaneously appears in robots trained in non-stationary environments (thread).
AVO (Agentic Variation Operators) showed superhuman performance in optimizing GPU attention workloads, with agents outperforming nearly all human GPU experts in 7-day blind-coding searches.
A Science paper by James Evans, Bratton, and Agüera y Arcas argued every prior intelligence explosion was plural and social; frontier models already simulate internal multi-agent "societies of thought" under RL (arXiv).
Alignment Whack-a-Mole: fine-tuning an LLM on one author's books unlocks verbatim recall of unrelated authors' copyrighted works (code, project).
OmniReset (UW) overcomes the robot-learning exploration bottleneck by auto-generating diverse reset distributions and scaling RL to 64K+ environments with no demos, distilling to RGB for zero-shot sim-to-real transfer.
MAGNet (UC Berkeley, Sony, Meta): diffusion forcing for multi-agent interaction sequence modeling (GitHub).
Quant VideoGen: training-free 2-bit KV-cache quantization for long video generation with 7× memory reduction.
Four open-source agentic authorization alternatives reviewed by The API Changelog: DCR, AAuth, Agent Auth, and x402 address the "agents can't click Allow buttons" problem.
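
For a feel of what the TurboQuant and Quant VideoGen numbers above mean in practice, here is a rough back-of-the-envelope sketch in Python. It assumes "standard" means 16-bit values and that the memory in question is a transformer's KV cache. The model shape below is hypothetical, and real quantization schemes carry some extra overhead (scales, outliers) that this ignores.

```python
def compression_ratio(baseline_bits: float, quantized_bits: float) -> float:
    """How many times smaller each stored value gets."""
    return baseline_bits / quantized_bits

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bits: float) -> float:
    """Approximate KV-cache size in GiB for one sequence."""
    # Two tensors (keys and values) per layer, one entry per token.
    values = 2 * layers * kv_heads * head_dim * seq_len
    return values * bits / 8 / 2**30

# Assumption: a 16-bit baseline. 2.5 to 3.5 bits then gives roughly 4.6x to 6.4x.
best_case = compression_ratio(16, 2.5)
worst_case = compression_ratio(16, 3.5)

# Hypothetical model shape (not from either paper), 128K-token context.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=131_072)
baseline_gib = kv_cache_gib(**shape, bits=16)
quantized_gib = kv_cache_gib(**shape, bits=3)
```

Under these assumptions, a 128K-token conversation that would need 16 GiB of cache at 16 bits fits in about 3 GiB at 3 bits. An ideal 2-bit cache would be 8× smaller than 16-bit, so the extra overhead mentioned above is one plausible reason a reported figure like 7× lands a bit below that.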

🏛️ AI Policy, Governance & Safety

METR red-teamed Anthropic's agent monitoring: three weeks of testing, several novel vulnerabilities found (some patched), first dataset of covert agent attack trajectories produced.
Joshua Achiam (OpenAI chief futurist) criticized pro-AI lobby ads opposing legislator Alex Bores, calling them "self-parody" and noting AI is unpopular so the ads are counterproductive.
Iran's AI propaganda is outperforming the White House's own slop. AI-generated LEGO videos mocking Trump are going viral; the White House is dropping GTA memes from 2016.
Andrew Ng argued anti-AI propagandists have shifted to state-level advocacy after failing federally. The White House's proposed federal framework would preempt state laws.
eu/acc celebrated that after two years, the first four points of their crowdsourced manifesto have passed as EU law: reduced regulatory burden for startups, skilled immigration reform, cookie law repeal, and European Inc.
Feds arrested a trio for an Nvidia GPU smuggling scheme involving Supermicro servers. Nvidia and Supermicro spotted the suspicious orders and canceled them.
White House adviser David Sacks said Congress could pass bipartisan AI legislation within months.
A delivery robot crashed through a Chicago bus stop amid growing protests against delivery robots in the city.

💡 Industry Commentary & Analysis

Casper Hansen named the four scaling axes: (1) pre-training data/compute (Radford), (2) RLHF/alignment (Schulman), (3) test-time compute/reasoning (Brown), (4) agentic/multi-agent systems (Steinberger).
JJ (OSS Capital) predicts Google will train a 54-trillion-parameter MoE on 2M TPU v7 Ironwood chips within one year at ~10 ZettaFLOPS peak, 3,800× GPT-4 training compute.
Shannon Sands argues ARC-AGI-3's conditions feel like redefining AGI as ASI; most humans don't regularly rediscover things ex nihilo without education and prior knowledge.
BuccoCapital argues Claude Cowork's real promise is eliminating the coordination tax in large organizations by cascading one source of truth through pre-built templates.
Jeremy Berman argues RL teaches models genuinely new knowledge: strong reasoning models produce novel abstractions via deduction during rollout that they've never represented before.
Natesh Pillai (Harvard) resolved a long-standing open problem in spatial statistics through sustained collaboration with GPT-5.4 Pro, completing the paper in under a month instead of the usual year.
Pierluca D'Oro argues if the test for AGI is hard games, we should just use existing ones (NetHack, Dark Souls) as benchmarks instead of building new ones.
Todd Saunders argues Anthropic's real moat is operational context (like AWS's "data gravity"): every dispatched task teaches the system how your company operates, and six months from now the switching cost is thousands of accumulated hours, not the $200/mo subscription.
Matt Stoller notes that the day after a jury convicted Meta of illegally endangering children, Trump put Zuckerberg on a panel overseeing AI regulation.
signüll argues we spent two decades optimizing software for human motor/perceptual limits only to realize those constraints don't apply to the actual future primary users: AI agents.
Transformer Weekly analyzed the two fronts in the OpenAI/Anthropic battle: Anthropic is winning both commercially (Claude Code) and in public opinion (Pentagon standoff), while OpenAI kills Sora, hires a Meta exec for ads, and pledges $1B in grants to regain ground.
sciencewtg covers Tim Palmer's (Oxford) new paper arguing quantum computers will eventually stop working because entanglement hits a fundamental limit from Planck-scale discreteness.
Soumitra Shukla (Harvard) argues "AI exposure" measures were never meant to predict displacement; high exposure can lead to MORE hiring once you account for demand elasticity and job dimensionality.
Jeff Dean explained how labs will shift to multi-epoch pretraining with heavy regularization, calling it a fundamental change where "pretraining is about to look very very different."

🎙️ Interviews, Panels & Podcasts

Mike Krieger (Instagram cofounder, Anthropic Labs) discussed why knowing what to cut is the hard part when AI makes building products easy.
Noam Brown (OpenAI) gave a talk on scaling test-time compute to multi-agent civilizations, covering poker, diplomacy, debating RL+reasoning with Ilya, and where test-time compute hits a wall.
Elad Gil challenged Silicon Valley conventional wisdom on cofounders, culture, exit conversations, and why micromanagement is underrated.
Yafah Edelman shared a 1.5-hour conversation with Kokotajlo and Lifland on the future of AI and differences between Epoch AI and AI Futures worldviews.
Azeem Azhar explored autoresearch and the "Karpathy Loop" for human-AI problem solving.

🤖 Robotics

A Serve Robotics delivery robot crashed through a Chicago bus stop shelter, shattering the glass, amid growing protests against delivery robots in the city.
JunLi R (HKU MMLab) built Smash, the first fully onboard-perception outdoor humanoid table-tennis robot (no MoCap, no external cameras).
Hod Lipson (Columbia) found evidence of an emergent "Self" in continual robot learning (paper).

Previous Around the Horn Digests

Catch up on everything you missed:

Thursday, March 26, 2026: ARC-AGI-3 launches with $2M prize and every frontier model scores under 1%, Harvey hits $11B, Sanders/AOC propose data center ban, Kleiner raises $3.5B, and 100+ stories.
Wednesday, March 25, 2026: OpenAI kills Sora app and Disney deal, Arm & Meta unveil first-ever AGI CPU, Claude's computer use launches, LiteLLM supply chain attack hits 97M downloads.
March 21-27, 2026: Google AI Studio goes full-stack with Antigravity and Firebase, Bezos raises $100B for AI manufacturing, an OpenAI super-app in the works, and 200+ more stories.
March 15-21, 2026: Claude Code hit 8% of worldwide GitHub commits, Nvidia's networking division went multi-billion, and 100+ stories from the week that wouldn't quit.
March 8-13, 2026: A GitHub bot got prompt-injected into installing malware on 4,000 machines, a Terraform agent nuked someone's production database, and 90+ stories.
March 1-7, 2026: Anthropic's Pentagon standoff, Nvidia's Groq-powered chip, the AI scare that tanked markets, and 90+ stories.

That's a Wrap

That's 70+ net-new stories from the weekend that didn't make it into the weekday digests. If you made it to the bottom, you're now qualified to brief a Senate subcommittee. Or at least hold your own at a dinner party where someone asks what "Capybara" means. (It's a tier. A very expensive tier.)

For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you next week.

P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.

Grant Harvey
