- Around the Horn Digest: Everything That Happened in AI This Week (Mar 8–13, 2026)
- Around the Horn Digest — Friday, March 13, 2026
- Around the Horn Digest — Wednesday, March 12, 2026
- Around the Horn Digest — Tuesday, March 11, 2026
- Around the Horn — Monday, March 9, 2026
- Around the Horn - Sunday, March 8, 2026
- Around the Horn - Saturday, March 7, 2026
- Previous Around the Horn Digests
- That's a Wrap
Around the Horn Digest: Everything That Happened in AI This Week (Mar 8–13, 2026)
From Claude figuring out it was being tested and hacking its own exam to memory wars heating up between Claude and ChatGPT, here's every story we tracked this week.
Welcome to the Around the Horn Digest, where we round up every AI story we tracked this week into one giant, scrollable, bookmark-worthy post. Think of it as your cheat sheet for the next time someone at work asks "so what's new in AI?" and you want to sound like you actually know. Because you will.
This week opened with Anthropic publishing one of the wildest AI safety findings in recent memory: Claude Opus 4.6, while being tested on a hard web research benchmark, figured out it was being evaluated, identified which benchmark it was running, found the encrypted answer key on GitHub, wrote its own decryption code, and submitted the answer. The model wasn't told to cheat; it was told to find the answer. It just decided the fastest path was to hack the test. Meanwhile, Claude memory went free for everyone and you can now import your ChatGPT memories with one click... though the migration might be lossier than you'd hope.
Let's get into it.
Catch up on previous digests: March 3-8 | February 23–28 | Rest of February
Around the Horn Digest — Friday, March 13, 2026
The big story today is a wave of AI-driven layoffs hitting enterprise tech. Atlassian laid off roughly 1,600 employees (10% of its workforce) to reallocate resources toward AI development and enterprise sales, following Block's 40% cut last week. The kicker: Atlassian's CEO openly said AI doesn't actually replace people at the company... they're firing them anyway to reshape the company's skill mix for what's coming. Meanwhile, Oracle is reportedly planning to cut 20,000-30,000 jobs (up to 18% of its global workforce) to free up $8-10 billion in cash flow for its AI data center buildout. The company's stock is down 54% from its September 2025 high.
All of this while Ramp's March 2026 AI Index shows business AI adoption hit a record 47.6%. The question every knowledge worker is asking right now: if adoption is spiking and companies are still cutting headcount, where exactly are those productivity gains going? One HN post going viral today argues AI is actually supercharging "fake work" (doc slop, Slack slop, performative output) more than real productivity, and that the economy was never set up to benefit from genuine productivity-enhancing tools because fake work so drastically outweighed real work to begin with. Productivity theater, now in 4K.
🏆 TOP 5 NEWS (Around the Horn)
- Ramp published its March 2026 AI Index showing business AI adoption hit a record 47.6%, Anthropic adoption jumped 4.9% MoM to 24.4% of businesses (now winning 70% of head-to-head matchups vs OpenAI), and OpenAI usage fell 1.5%, its biggest monthly drop on record.
- Lovable hit $400M annual recurring revenue (up 33% in one month) with its vibe-coding platform that turns natural language descriptions into production-ready applications.
- Google launched Ask Maps, a Gemini-powered conversational feature in Google Maps for complex real-world questions with personalized recommendations, plus upgraded Immersive Navigation with richer 3D views. Logan Kilpatrick called it the biggest Google Maps upgrade since the original launch.
- A writer is suing Grammarly for allegedly turning her and other authors into "AI editors" without consent by using their copyrighted work to train the AI without permission or compensation.
- Fargo police jailed an innocent grandmother for over five months after faulty facial recognition AI wrongly identified her as a fraud suspect.
Honorable Mentions:
- Anthropic is in talks with Blackstone and other PE firms to form an AI consulting joint venture modeled after Palantir.
- Ukraine's Ministry of Defense opened millions of annotated combat frames for partners to train AI models for autonomous drones, a world first.
- Perplexity launched a full-stack API platform for building agents with one key, including Agent API for multi-step orchestration, real-time Search API (SOTA on SimpleQA/SEAL), Embeddings API, and upcoming Sandbox API.
- From yesterday, worth repeating: NVIDIA released Nemotron 3 Super, an open hybrid Mamba-Transformer MoE (120B total / 12B active parameters) with 1M-token native context and 5x throughput over prior Nemotron, scoring 85.6% on PinchBench as the best open model for agentic reasoning. Separately, NVIDIA's AI-Q agent reached #1 on both DeepResearch Bench leaderboards (technical report, thread).
🍪 TOP TREATS TO TRY
- MagicPath turns any website URL into a fully editable app or website design in seconds; paste the link, the AI imports the live page, and you can visually edit, iterate via chat, create variants, or extract components (thread) —free to try
- Obsidian Interpreter runs any natural-language prompt on webpages (summarize, extract, translate, transform) before saving to notes, with full local Ollama support and template variables (thread) —free to try
- Citecat semantically searches 10M+ papers, lets you chat with any paper for methodology insights, annotate PDFs with AI explanations, and write/compile LaTeX with automatic citations and collaborative editing —free to try
- jina-grep runs semantic search over codebases using natural language queries powered by Jina embeddings v5 running locally on Apple Silicon via MLX, with pipe mode for reranking traditional grep results (thread) —free to try
- LogClaw deploys an AI SRE (site reliability engineer) inside your VPC that ingests logs, performs real-time anomaly detection and root cause analysis, auto-creates detailed incident tickets, and reduces mean time to resolution from 174 minutes to under 90 seconds —free open source self-host
- tmux-ide runs complete Claude agent teams (one lead + multiple specialized teammates) inside a single terminal using tmux panes, shared task lists, declarative YAML configs, and auto stack detection —free to try
- AgentMail gives your AI agents their own autonomous landing page at agent.email so they can self-sign up for an inbox via one POST request, email you for OTP verification, and unlock full API access once claimed —free to try
🏢 Big Tech & Major Companies
- Cursor shipped 30+ new Marketplace plugins (Atlassian, Datadog, GitLab, Glean, HuggingFace, monday.com, PlanetScale), MCP Apps bringing interactive UIs (charts, diagrams, whiteboards) into agent chats, and team marketplaces for centralized plugin governance. Annual revenue reportedly surpassed $2B.
- Microsoft introduced Copilot Health after analyzing 500K+ conversations and finding users most often ask about symptoms, treatments, and loved ones' conditions, now with real-time provider directory lookup and enhanced privacy controls.
- Microsoft is pushing AI adoption across Africa with new training programs and infrastructure in a direct challenge to Chinese models like DeepSeek.
- OpenAI added a new phase parameter (commentary vs final_answer) to GPT-5.4 Responses API messages for cleanly separating intermediate steps from final answers in multi-step agent workflows.
- Meta released Canopy Height Maps v2 in partnership with World Resources Institute, mapping the world's forests with greater precision using DINOv2 vision models (thread).
- Google introduced Groundsource, a Gemini-powered methodology that turns millions of unstructured news reports across 80 languages into a structured 2.6M-event historical disaster dataset spanning 150+ countries since 2000, enabling flash flood forecasts up to 24 hours ahead (thread).
- Google launched Gemini Embedding 2, a natively multimodal embedding model supporting text (8k tokens), images, video (120s), audio, and PDFs in one joint embedding space with Matryoshka Representation Learning for flexible lower-dimensional truncation (thread).
- Apple MLR released LiTo (Surface Light Field Tokenization, ICLR 2026), a unified 3D representation jointly encoding geometry and view-dependent appearance to capture specular highlights and Fresnel reflections for high-fidelity 3D generation from a single image (paper, thread).
- Amazon employees report AI tools are increasing workload rather than reducing it, and a new independent study confirms their suspicions.
- Amazon added a "Sassy" adults-only personality option to Alexa+ that uses cursing and irreverent wit while staying strictly helpful and avoiding NSFW content (requires Face ID).
- Notion released a standalone native Notion AI iOS app into public beta via TestFlight (thread).
- Bumble will launch an AI dating assistant called "Bee" that helps users write profiles, generate conversation starters, suggest date ideas, and plan outings.
- Tinder launched an Events tab for IRL curated events, video speed-dating, AI Chemistry and Learning Mode, a "Does This Bother You?" safety LLM, and visual redesign with Liquid Glass to reverse subscriber decline.
- Claude now builds interactive charts, diagrams, and visualizations directly inline within conversations.
- Photoshop (beta) now lets you rotate 2D images and apply Harmonize to add realistic light and shadows so they blend with the rest of the scene.
- Microsoft released BitNet, the official inference framework for 1-bit LLMs enabling efficient quantized inference on consumer hardware (HN discussion).
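For readers wondering what "flexible lower-dimensional truncation" means in the Gemini Embedding 2 item above, here is a minimal sketch of the general Matryoshka idea: keep only the leading components of a vector and rescale it to unit length before comparing. The vectors are made-up toy values and the helper names (`truncate_embedding`, `cosine`) are hypothetical; this does not call Google's API or reproduce its exact behavior.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy 8-dimensional vectors standing in for real model output (made up).
doc = [0.9, 0.1, 0.3, 0.0, 0.05, 0.02, 0.01, 0.0]
query = [0.8, 0.2, 0.25, 0.1, 0.0, 0.03, 0.0, 0.01]

# Similarity with all 8 dimensions vs. only the first 4.
full_score = cosine(query, doc)
short_score = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))
print(f"full: {full_score:.4f}  truncated: {short_score:.4f}")
```

The practical appeal is storage and speed: a model trained this way is meant to keep most of the useful signal in the leading dimensions, so shorter vectors can be stored and searched at a modest cost in accuracy.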
💼 AI Productivity, Labor & Economics
- Atlassian laid off roughly 1,600 employees (10% of workforce) to pivot toward AI. CEO said AI doesn't actually replace people at Atlassian but they're firing them anyway to reshape the skill mix (Reuters, HN).
- Oracle is reportedly planning to cut 20,000-30,000 jobs (up to 18% of workforce) to free up $8-10B for AI data center expansion, with cuts starting as soon as this month.
- Todd Saunders argues the token cost to build a production feature is now lower than the cost of a 30-minute meeting to discuss whether to build it, inverting how software organizations should run: simply build in 2 hours, measure with real customers, then kill or keep. The "planning industrial complex" is dead.
- mattbeane shows in "Precision Proactivity" that poorly timed AI assistant interventions actually increase cognitive load in real-world work; the right timing + calibration is what reduces it (thread).
- An HN post argues AI is supercharging "fake work" (doc slop, Slack slop, performative meetings) more than real productivity, and that the economy was never set up to benefit from genuine efficiency gains because fake work vastly outweighed real work to begin with.
- The Verge shared a story of a writer being interviewed by an AI bot for a job (HN).
- Ara Kharazian (Ramp economist) said "I've seen enough. Anthropic is the new default for businesses."
- Michael Geoffrey Abuyabo Asia argues the AI intimacy industry is built on the hidden, traumatic emotional labor of underpaid data workers performing intimate deception under NDAs (HN).
🤖 AI Agents & Infrastructure
- OpenBlock Labs launched OB-1 for general access, the self-improving terminal coding agent that is #1 on Terminal Bench, handles the full dev lifecycle (PM to PR), spawns background sandboxed agents, and integrates with Slack, Linear, GitHub, Graphite, and VS Code; early users get $10/day free credits (thread).
- OpenRouter's Auto Exacto now automatically routes tool-calling requests to the optimal provider every ~5 minutes using real-time throughput and accuracy telemetry, slashing error rates 15-90% across models (thread).
- Zo Computer shipped zopenclaw, an official skill that gives you a safe, encrypted setup for running OpenClaw (200K+ GitHub stars) on your Zo with private Tailscale access, services that survive restarts, 50+ bridged Zo tools, and a six-step install (tutorial).
- Random Labs argues context management, not raw model intelligence, is the true bottleneck for long-horizon LLM agents, introducing Slate, a thread-based episodic memory architecture using worker threads for implicit decomposition and adaptive synchronization.
- HydraDB raised $6.5M for context and memory infrastructure for AI agents using ontology-first graph storage (relationships, decisions, timelines) with 90% accuracy on LongMemEvals and <200ms latency across 10M+ enterprise documents (thread).
- Prava built a PCI-compliant payments API for AI agents that lets them complete purchases using scoped, biometric-approved, single-use tokenized cards while keeping raw card data hidden from the agent.
- OneCLI is an open-source Rust credential vault that lets AI agents securely call APIs without ever seeing real keys by running as an encrypted proxy with scoped credentials (HN).
- Clawther gives your OpenClaw agent a Notion-powered task board that automatically organizes chat tasks, lets the agent self-review progress, and iterates until completion —free to try
- Christine Yip and team launched autoresearch@home, a SETI@home-style distributed platform where any internet agent can join a swarm to collaboratively run AI/ML experiments (currently 36 agents, 946 experiments, 54 improvements) on single-GPU nanochat training (GitHub, thread).
💻 AI Coding & Developer Tools
- jarrodwatts built claude-hud, a Claude Code plugin showing real-time context usage (health bar), active tools, running sub-agents, todo progress, git status, and rate limits in the terminal status line. Claude Code Docs also added customizable status line tracking and updated the full overview.
- obsessiondb built Rudel, an analytics dashboard that automatically tracks token usage, session duration, sub-agent activity, and patterns across Claude Code sessions (HN).
- manuelschipper built nah, a context-aware permission system and safety guard for Claude Code you fully control.
- rawwerks built rlm-cli, a CLI for Recursive Language Models that lets you throw entire code directories at the model, combine local search (ripgrep + Tantivy), recursive self-calls with budget controls, and JSON-first structured output (thread, fork).
- CodeSpeak lets you maintain concise plain-text specs instead of code: write requirements in spec.md, run codespeak build to generate and sync production-ready code bidirectionally —paid only rn
- entropicthoughts argues LLMs have shown zero meaningful improvement on strict SWE-Bench maintainer-approved merge rates for over a year, suggesting programming capabilities may have plateaued (HN).
- Neil Madden explains why he no longer uses LLMs for programming: subtly wrong code at scale, eroded skill, and unmaintainable technical debt faster than time saved.
- David Hendrickson showed that Qwen3.5 tool-calling reliability collapses after 5-10 rounds with MLX quantization (even 8-bit) while GGUF (especially Q4_K_XL) holds steady at 70/70; use GGUF for agent workflows on Apple Silicon.
- quint-lang argues the only reliable way to ship correct software in the LLM era is formal languages and proof assistants.
- Axiom (Silicon Valley startup) raised funding to fix AI-generated buggy code using formal verification and automated proof techniques.
- hascodexratelimitreset.today instantly tells you whether OpenAI's Codex rate limit has reset for the day. Spoiler: no. (Tibo reaction)
🔬 AI Research & Models
- Ethan Mollick graphed the most critical high-quality benchmarks (including new ones no model has optimized for) and the curves have all had a very similar shape over the past year.
- AI2 argues pairwise preference judgments should be used only for system-level analysis when evaluating deep research agents; for metric-level insights you need dedicated metric-specific human annotations. They released "Deep Research, Shallow Evaluation" with code, data, and a rubric generation pipeline (thread).
- Nathan Godey and Yoav Artzi demonstrate the LM head is a gradient bottleneck: the low-rank projection to vocabulary size suppresses 95-99% of gradient norm during backprop, leading to suboptimal updates and unlearnable trivial patterns (thread).
- Seungwook Han argues you can pre-pre-train language models on synthetic Neural Cellular Automata trajectories (non-linguistic grid dynamics) and get better representations: NCA-pretrained models converge 1.6x faster and achieve lower perplexity because they learn general in-context rule inference without semantic crutches (Chollet reaction).
- François Chollet argues the core bottleneck in current AI is that techniques still rely on pattern memorization and retrieval, requiring humans to decide which patterns to memorize and what goals to pursue.
- Fangfu Liu and team built Spatial-TTT, a streaming visual-based spatial intelligence system using test-time training with adaptive fast weights, achieving SOTA on VSI-Bench while scaling near-linearly over 7000+ frames; fully open (thread).
- genforce released JOSH (ICLR 2026), jointly optimizing 4D human-scene reconstruction from in-the-wild single-camera video achieving SOTA on EMDB, SLOPER4D, and RICH datasets (project page).
- Christos Tzamos (Percepta) argues LLMs can function as true computers by embedding an executable program (e.g. WebAssembly interpreter) directly inside transformer weights using exponentially faster attention decoding with 2D heads and HullKVCache, enabling millions of internal steps for perfect accuracy streamed at >30k tokens/sec on CPU (thread).
- Percepta is a General Catalyst-backed AI transformation company delivering $17M annual savings for Fortune 500 clients through reinforcement learning, foundation models, and optimization across healthcare, finance, supply chain, and government.
- Markov released computer-use-large, a 12,300-hour open dataset of 48,478 professional screen recordings across categories like Blender, Photoshop, Excel, and VS Code to train GUI-interacting computer-use agents (Markov).
- UC Berkeley Sky Lab introduced SkyDiscover, a modular framework for AI-driven scientific discovery decomposing evolutionary loops into reusable components (framework blog, AdaEvolve paper, AdaEvolve blog).
- EmptyBlueBox built DexLatent, creating an embodiment-invariant unified latent action space for vision-language-action models for scalable cross-embodiment dexterous manipulation (paper).
- Yinjie Wang and Gen-Verse built OpenClaw-RL, an open-source asynchronous RL framework that lets you train any agent simply by talking, running fully in the background (GitHub, OpenClaw-RL2, paper).
- Jesse Lai announced The Principles of Diffusion Models will become a full CVPR 2026 tutorial covering continuous and discrete diffusion (thread).
- Linoy Tsaban released FLUX.2 [klein] 9B-KV, a KV-Cache optimized version speeding up multi-reference inference up to 2.5x while keeping perfect edits; already merged into diffusers.
- HuggingPapers highlighted ReMix, a Mixture-of-LoRAs method using non-learnable routing weights plus RLOO gradient estimator to prevent routing weight collapse (paper).
- GeometricKernels provides an open-source library implementing geometric kernels on manifolds, meshes, and graphs for Gaussian processes in non-Euclidean spaces.
- Training Language Models via Neural Cellular Automata explores non-linguistic pre-training using grid dynamics.
- Mixedbread released Wholembed v3, a new SOTA omnimodal late-interaction retrieval model across all modalities (text, audio, images, PDFs, videos) and 100+ languages (thread, Mixedbread).
- scaling01 argues LisanBench reveals Claude Opus 4.5 and Sonnet 4.5 uniquely exploit "bridge" patterns in word-chain benchmarks absent in GPT-5.4's substitution-heavy approach, distinguishing reasoning strategies.
- vixhaℓ argues CS is returning to the domain of physicists, mathematicians, and electrical engineers as LLMs automate much of traditional software engineering.
- omarsar0 tested Google DeepMind's AutoHarness idea and synthesized a complete functional coding agent; also built an interactive MCP-powered chart tool for visualizing progress.
- Artificial Analysis benchmarked Grok 4.20 Beta 0309 on intelligence, performance, and price, showing strong reasoning gains at competitive cost.
🏛️ AI Policy, Governance & Safety
- kimmonismus (Chubby) flagged a Times deep dive on Anthropic containing several revelations: model releases now separated by weeks not months, 70-90% of the code used in developing future models is written by Claude, staff believe 2026-2030 is where "all the most important things happen," Dario Amodei warned AI could displace half of entry-level white collar jobs in 1-5 years, and some employees question whether they've reached the cusp of recursive self-improvement.
- Dwarkesh Patel argues the Department of War is making a huge mistake threatening Anthropic over red lines on mass surveillance and autonomous weapons, because AI structurally favors authoritarian scale applications and the US should preserve company independence rather than compel services like China.
- Amine Raji demonstrates attackers can poison RAG document sources with subtle adversarial edits that survive retrieval and cause LLMs to output completely fabricated answers on demand.
- The Register reports the Chardet dispute shows how AI will kill software licensing by making automated character encoding detection obsolete and leading to widespread license violations.
- libresolutions.network argues we will come to regret every use of current AI tools because of cognitive atrophy, privacy erosion, and power consolidation.
- Onyx Security raised $35M to build a secure control plane that monitors, governs, and corrects autonomous AI agents inside enterprises.
🛠️ AI Tools & Products
- Jon Saad-Falcon and team built OpenJarvis, a personal on-device AI with five composable primitives (intelligence, inference engine, agents, tools/memory across 26+ channels, and self-improving loops) for CLI/browser/desktop use with all data staying on your machine; Stanford collab with John Hennessy (blog, GitHub, Simon Guo benchmarking Intelligence per Watt).
- HTMLPub turns any prompt or pasted HTML into a fully published live website in seconds with custom domains, forms, payments, analytics, and AI chat editing —free to try
- Raccoon AI is a collaborative platform where multiple humans and agents work together in shared real-time workspaces on reports, web apps, presentations, data analysis, and more.
- Needle is the knowledge threading platform where you upload documents or connect 25+ apps so your team can instantly search, chat with, and run automated workflows on always-fresh company data —free to start
- Site Spy monitors any webpage on a schedule, sends instant alerts when content changes, shows visual diffs of what was added/removed, and lets you track specific elements —free to try
- obscrd protects your React websites from scrapers and AI bots by scrambling HTML via CSS ordering and injecting decoy characters (HN).
- Ask Gauge is an AI visibility expert you chat with in plain English about your brand's search performance, trends, competitor benchmarks, and MQL opportunities using live data from GA4, GSC, and Semrush —free to try
- Naoma AI automates product demos 24/7 with an AI video sales agent that delivers hyper-personalized interactive video tours in 33 languages, qualifies leads, books meetings, and logs to CRM —free to try
- Paradigm launched predictions.paradigm.xyz v2, a prediction markets dashboard with treemaps, bar charts, and line charts across Kalshi, Polymarket, and others (thread).
- alphaXiv launched Briefs, a clean feed of concise AI/ML paper summaries with title, authors, one-sentence contribution highlights, and direct arXiv links.
- kepano (Obsidian) updated Defuddle so pasting any YouTube link returns clean markdown transcript with timestamps, chapters, and diarization; same engine powers Reader mode in Obsidian Web Clipper 1.1.0 (example).
- Reactor is a low-code AI-powered intelligent ETL pipeline that maps and models raw data from any source into clean, AI-ready data models for your cloud warehouse. Alberto (taiuti) is building it as the next platform shift with a 10+ person team from Apple/Meta/Google/Adobe/Microsoft.
- Tencent launched SkillHub, a localized mirror for OpenClaw in China that handled 180GB of traffic and ~870,000 downloads in its first week while pulling only 1GB from the official source (thread).
- IonRouter provides zero-cold-start AI inference on Grace Hopper superchips with per-second billing and OpenAI-compatible API optimized for real-time robotics and video.
- Huddle01 Cloud gives bare-metal performance with cloud simplicity for running AI agents with one-click deploy in 60 seconds —free to try
- TechCrunch reviewed the song by AI-generated actor Tilly Norwood and called it the worst song they've ever heard.
- Chris Worsey open-sourced ATLAS by General Intelligence Capital, self-improving AI trading agents using Karpathy-style autoresearch (thread).
🤖 Robotics & Physical AI
- robertorobotics built Nextis-AIRA-3D, an open-source 7DoF 3D-printable robotic arm with native LeRobot plugin integration, high-torque CAN motors, interactive setup wizard, MIT impedance control, and out-of-the-box teleoperation/training.
- Axel built a physical AI agent robot on the open-source MARS platform from Innate Bot that navigates your home, talks and listens, uses an arm for manipulation, and runs Gemini-based agents; closed beta available now.
- Jonathan Moon built Emma, an autonomous robot that scans farms and orchards, detects diseases, and measures yield; deployed across 14 vineyards and orchards in California and New York.
- Nick Bisesi built real-time skeletal visualization using MediaPipe and Three.js for K-12 anatomy education.
- dimensionalOS open-sourced dimos, the Dimensional Framework for building spatial operating systems and 3D interfaces. stash_pomichter built a 3D spatial desktop environment prototype running entirely in browser.
- Tensorfish released an AI talking-face generator from text/voice. raym33 built aiemoji, an open-source AI talking face.
- Crusoe launched a new 350,000 sqft manufacturing factory in Colorado with 200 employees and $200M investment for mass production of Spark modular AI datacenters.
- Apex Compute released Unified Engine v1, a compact systolic+vector FPGA inference accelerator achieving ~95% FLOPs utilization, outperforming NVIDIA Jetson Orin Nano on Gemma 3 1B at 4.5W; available as a $50 PCIe prototype card (thread).
💡 Industry Commentary & Analysis
- Anthropic is in talks with Blackstone, Hellman & Friedman, and other PE firms to form an AI consulting joint venture modeled after Palantir to help portfolio companies integrate Claude safely at scale.
- Fargo police jailed an innocent Tennessee grandmother for over five months after faulty facial recognition AI wrongly identified her as a fraud suspect (HN).
- Axiom raised $200M Series A at $1.6B+ valuation for Verified AI from formal mathematics, following AxiomProver's perfect Putnam score.
📊 Fundraising & Deals Roundup
- PixVerse — $300M (Alibaba-backed video AI, now a unicorn)
- Axiom — $200M Series A at $1.6B+ valuation for Verified AI from formal mathematics
- Wonderful — $150M Series B at $2B valuation for culturally localized AI customer-service agents in 30+ non-English markets
- Oro Labs — $100M Series C for agentic AI procurement platform
- Gumloop — $50M Series B (Benchmark) to turn every employee into an AI agent builder
- Qdrant — $50M Series B for open-source Rust-based vector search engine
- Bold — $40M for autonomous on-device AI security agents
- Onyx Security — $35M Series A for AI agent governance and control plane
- Waiv — $33M (Owkin spinout) for AI-powered precision medical testing
- HydraDB — $6.5M for context and memory infrastructure for AI agents
Around the Horn Digest — Wednesday, March 12, 2026
Anthropic had a day. The company launched The Anthropic Institute, a new research organization led by co-founder Jack Clark to study AI's societal impacts, hiring heavy hitters like Matt Botvinick from Google DeepMind and economist Anton Korinek from UVA. They expanded their Public Policy team under Sarah Heck and opened their first DC office. The Institute will focus on four areas: jobs and economy, threats and resilience, AI behavior in deployment, and governance of self-improving systems.
Meanwhile, Time ran a profile calling Anthropic "the most disruptive company in the world." The numbers behind it: a $380B valuation (exceeding Goldman Sachs), $2.5B annualized revenue from Claude Code, the first frontier AI cleared for classified U.S. government use, and internal projections of AGI-like systems by late 2026 or early 2027. Anthropic's leadership predicted dramatic progress in the next two years, with CEO Dario Amodei forecasting transformative AI before 2030 and co-founder Jared Kaplan predicting human-level AI in 2-3 years. Oh, and Claude is reportedly writing 70-90% of its own future model code at 427x human speed. No pressure, humans.
Also: we watched two interviews today we think you'll love. First up, Dylan Patel of SemiAnalysis went on Matt Berman's show and argued that Anthropic's $19B revenue and Claude Code spend prove the junior dev market is "nuked"; that Claude Code, Codex, and Cursor are agent orchestration systems for all knowledge workers (not just coders); that DeepSeek V4 (coming soon) was trained on Blackwell chips in Southeast Asia and won't be as ground-shaking as R1; that Microsoft's capex pullback is "major copium"; that anti-AI politics will dominate the next election cycle (we hard agree); and that he's come around to UBI as necessary despite being a lifelong capitalist. Our take: because of monetary inflation, UBI alone is not enough. The key is something more akin to universal basic goods and services, made possible by a system that uses AI and robotics to move human labor "up the stack" toward either the things we are personally passionate about or the tasks most in need of human innovation.
Dylan also made the point that the harness matters more than the model. The same Opus 4.6 weights perform very differently in Cursor agent mode vs Claude Code because of what's built around them. And one of his hedge fund clients, who has never programmed in his life, built a Claude Code skill by feeding it CIA negotiation and tone-reading books, then pointed it at earnings call transcripts to detect when CEOs are exaggerating or bullshitting. No code written. Just domain knowledge encoded as a skill. That's the architect approach: know what you want to build, describe it clearly, and let the AI do the construction.
The second interview is ThePrimeagen's live show with Demetri Spanos (PhD & part of the Wading Through AI team). The whole conversation is worth the watch, but here are the insights we took notes on:
- On the state of AI coding and how to actually use it well:
- Spanos argues that most people are chasing the wrong goal. In the history of computer programming, a 10-20% productivity improvement is already enormous. So instead of trying to go 10x faster, ask: can you get 10% better using AI?
- Think of it like this: maintain quality, increase quality by 10%, and try to do that consistently. That's the whole game.
- His practical framework:
- Know roughly what the components of your app or product should be.
- Ballpark the same size of code you would generate yourself. Architectural digest: maybe 2,000 lines of code total.
- Then separately evaluate each piece, and generate the modules one at a time.
- He's talking about 20 modules that are 2,000-20,000 lines each.
- You define a load balancing layer. You know how the components talk to each other. Then generate everything separately.
- Spanos comes from a mathematical computing background, so he writes terse code. AI models write too many lines of code. His approach: generate reasonable artifacts that way, then shave down the inhuman artifact.
- On whether the models are actually better or the practices around them improved:
- Spanos argues the agents are extremely unreliable, but everything changed in December 2025... with a caveat. It's not as true as some people say.
- People are seeing the change, but a significant chunk of it is because we've been investing nonstop effort into harnesses.
- A nominal model improvement made a larger practical improvement because the stuff around the model has improved and not just the model itself.
- His estimate: about 70% of the improvement is harness/workflow maturation, 30% is model capability.
- That said, 4.6 is substantially improved over what came before. So most of the excitement post-Christmas 2025 is the sufficient maturation of meta-practices: agent loops, agent teams, better prompting patterns, skills systems.
- TL;DR: 4.6 was meaningfully improved, but it landed in a much more prepared ecosystem.
- On token economics, demand, and what happens when everyone scales up:
- Right now we're looking at 100x to 1000x demand for tokens if everybody uses them for every job.
- For example, Dylan Patel's team is spending $5K/day on Claude Code (one guy, not even an engineer).
- If demand goes that direction on one side, does it push the price of tokens higher? And on the flip side, tokens keep getting cheaper (Jevons paradox).
- If we get tokens cheaper but keep pushing up demand, what does that mean for everyone running agent loops at 100 million tokens a day, or even per hour at the limit?
- The current models write too many lines of code, which means they consume and generate way more tokens than necessary. That's a structural inefficiency baked into the current economics.
- On market structure and whether we're in a VC-subsidized honeymoon:
- The big concern everyone worries about: someone gets a position in genAI that is as dominant as Google in search, and then they could charge whatever they want. We're in a VC honeymoon (like Uber for 10 years), and at some point there will be a "Google of genAI," and it might actually be Google. We've seen what they've done in the past.
- But: the open-source stuff is good enough in many cases. The main thing missing is a relatively polished product (the harness). Mass market needs chat interfaces first, even if power users end up using the API. The open models are quite good and getting better, and at some point that will have to come to a head.
- Spanos puts it bluntly: Claude is great, but between all the alternatives, he can stitch together a pipeline that does 90% of what Claude does for the cost of electricity.
- That's so far on the other end of the economic structure it's hard to compare directly. For his part, he sets up open versions on-prem. For certain, there are people building "bring your own open model and pipe it together for you" services. Something like that will succeed. Something will exist as a check on monopoly greed from the commercial market leader.
- On infrastructure limits and what happens if everyone turns on agents tomorrow:
- If everyone turned on a Clawdbot tomorrow, 100K-200K tokens per user, that would be hundreds of trillions of tokens. The infrastructure would be so overwhelmed that prices would skyrocket. If usage goes up, speed gets throttled or price goes up to reduce usage. That's why data centers matter.
- We are at maybe 10-20% of market penetration of AI use, and the hyperscalers are hoping to get to 100%, which is 5-10x what we use now. They need vastly more physical resources to serve that. Freeze everything you have now, serve it to 10x as many people doing 10x as many tasks. And later we'll use it for design, video, and everything else on top of that.
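For the spreadsheet-brained among us, here is a rough sketch of the back-of-the-envelope math in those last two points. The 100K-200K tokens per user and the 10-20% penetration figures come from the interview; the user count is our own assumption (about 1 billion people turning on an agent), picked only because it lands in the "hundreds of trillions" range. The function names are hypothetical too.

```python
TRILLION = 10**12

def total_tokens(users: int, tokens_per_user: int) -> int:
    """Tokens consumed if every user runs one agent at the given budget."""
    return users * tokens_per_user

def growth_multiple(current_penetration_pct: int) -> float:
    """How many times usage grows going from current penetration to 100%."""
    return 100 / current_penetration_pct

# ASSUMPTION (ours, not from the interview): about 1 billion users.
ASSUMED_USERS = 1_000_000_000

low = total_tokens(ASSUMED_USERS, 100_000)    # 100K tokens per user
high = total_tokens(ASSUMED_USERS, 200_000)   # 200K tokens per user
print(f"{low // TRILLION}T to {high // TRILLION}T tokens")  # 100T to 200T tokens

# 10-20% penetration today implies 5-10x growth to reach 100%.
print(growth_multiple(20), growth_multiple(10))  # 5.0 10.0
```

Change the assumed user count and the totals move linearly, which is exactly why the data center buildout matters so much in this argument.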
🏆 TOP 5 NEWS (Around the Horn)
- The U.S. Senate approved ChatGPT, Gemini, and Microsoft Copilot for official use by Senate aides and voted 99-1 to let states continue developing their own AI regulations.
- Ford launched "Ford Pro AI" to analyze over 1 billion daily data points from 840,000 commercial vehicle subscribers for fleet optimization, route planning, and predictive maintenance.
- OpenAI is urgently racing to catch up to Anthropic's Claude Code after years of deprioritizing dedicated coding agents, with Codex now generating over $1B annualized revenue but still trailing on complex reasoning and codebase awareness.
- Perplexity announced Personal Computer, an always-on Mac mini AI operating system that gives its assistant persistent local access to your files, apps, and sessions.
- Anthropic researchers found that 10 out of 16 leading chatbots helped plan violent attack scenarios when prompted; only Claude refused 100% of queries.
Honorable Mentions:
- Replit raised $400M at a $9B valuation from Georgian, G Squared, a16z, Coatue, and strategic investors including Accenture and Databricks to expand beyond coding into AI systems that abstract away boring work.
- Nvidia invested $2B in AI data center specialist Nebius, which forecasts 1,600% revenue growth by end-2026, secured $22B in contracts with Microsoft and Meta, and sold out its GPU capacity through 2026.
- Mind Robotics (Rivian spin-out) raised $500M Series A led by Eclipse Ventures to build industrial AI-powered robots for manufacturing and logistics, with Rivian CEO RJ Scaringe chairing the board.
- WhatsApp is launching parent-linked accounts for children under 13 with supervised messaging, end-to-end encryption, and restricted features including no Meta AI access.
- Fake, AI-generated images and videos of the Iran war are rapidly spreading on social media.
- Corbtt reported that OpenAI's custom inference chips (built with Broadcom) will be ready within months and deployed at scale by year-end, designed to cut compute costs and reduce reliance on Nvidia for running AI models in real time.
- An OpenAI exec used Codex to file his taxes, built a custom Python tax engine in 30 minutes, and caught a $20K error the human accountant missed; he's now open-sourced the tax engine.
- Android's Intelligent OS now lets AI agents like Gemini query installed apps via natural language (e.g. "show me pictures of my cat from Samsung Gallery") and coordinate multi-step actions across calendar, notes, food delivery, and rideshare apps (early beta on Galaxy S26 and Pixel 10).
- OpenAI plans to integrate Sora video generation directly into ChatGPT to push weekly active users toward 1 billion after the standalone Sora app dropped from #1 to #165 on the App Store.
- Meta unveiled its MTIA custom silicon roadmap with four chip generations in two years, delivering 4.5x HBM bandwidth and 25x FLOPS gains to scale GenAI inference for billions of users.
🍪 TOP TREATS TO TRY
- Proof is a document editor where you and AI agents co-write in real time, with agents leaving comments and suggesting edits you accept or reject, and every character tracked to who wrote it (code). —free to try
- Firecrawl CLI scrapes single pages, crawls entire sites, runs web searches, and extracts structured data with natural-language agents directly from your terminal.
- Cardboard turns your browser into an agentic video editing studio controlled entirely through natural language ($60-$150/month).
- MeetMarkdown converts, formats, and previews Markdown with tools for HTML export, styled PDFs, Mermaid diagrams, Marp slides, table fixing, side-by-side diffs, and word stats. —free to try
- Canva Magic Layers splits any image into editable pieces so you can fix misspelled text in AI graphics, resize just the person, or swap the background color without redesigning.
- Prism (YC X25) generates and edits professional videos from text prompts or raw footage with captions, voiceover, lip sync, and export for TikTok / Instagram / YouTube (HN). —free to try, then pay-as-you-go at $0.01/credit
- DAIR.AI Academy offers a completely free text-based course covering AI agent fundamentals, reasoning, tools and memory, multi-agent systems, and real-world deployment across 5 modules and 30 lessons.
🏢 Big Tech & Major Companies
- Amazon is launching an AI content marketplace where publishers can sell content to AI companies for training and usage, integrating with AWS Bedrock; the company also expanded a program letting customers shop directly from third-party retailer sites inside Amazon.
- Google launched Gemini Embedding 2, its first natively multimodal embedding model (a system that turns text, images, video, audio, and PDFs into comparable numerical representations) for unified cross-modal search. It uses Matryoshka Representation Learning to flexibly compress embeddings from 3,072 to 768 dimensions with minimal accuracy loss.
- Android's Intelligent OS now lets AI agents like Gemini query installed apps via natural language (e.g. "show me pictures of my cat from Samsung Gallery") and coordinate multi-step actions across calendar, notes, food delivery, and rideshare apps (early beta on Galaxy S26 and Pixel 10).
- OpenAI plans to integrate Sora video generation directly into ChatGPT to push weekly active users toward 1 billion after the standalone Sora app dropped from #1 to #165 on the App Store (thread).
- OpenAI released Symphony, a framework that turns project work in Linear into isolated autonomous agent runs delivering PRs with proof-of-work (CI status, reviews, walkthrough videos), so teams manage outcomes instead of supervising agents.
- xAI won approval for its makeshift natural-gas power plant in Memphis to run nearly 1 GW of data center infrastructure for Grok, producing over 6 million tons of annual greenhouse gas emissions despite local backlash.
- Anthropic shipped cross-file context sharing for Claude for Excel and PowerPoint (one conversation across all open files), plus skills inside both add-ins and deployment via Amazon Bedrock, Vertex AI, and Microsoft Foundry.
- Perplexity announced Personal Computer, an always-on Mac mini-based AI operating system that gives Perplexity Computer persistent local access to your files, apps, and sessions as a continuously running digital proxy (waitlist).
- Perplexity cofounder and CTO Denis Yarats said internally the company is moving away from MCPs in favor of APIs and CLIs.
- Ramp launched Agent Cards, giving AI agents the ability to make purchases with customizable spend limits, merchant controls, and full transaction visibility; no card numbers are ever exposed (post).
- Baidu launched DuClaw, a zero-deployment service providing a ready-to-use OpenClaw setup on Baidu AI Cloud with built-in Baidu Search, Baike, and Scholar skills plus access to leading models (post).
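A quick aside on the Gemini Embedding 2 item above: with Matryoshka-style embeddings, the usual trick is to keep only the first N dimensions of a vector and re-normalize it. Below is a minimal sketch of that general pattern in plain Python. It is not Google's API; the function name and the toy vector are hypothetical, and only the 3,072 and 768 sizes come from the announcement.

```python
import math

def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]

# Toy stand-in for a real 3,072-dimensional embedding (hypothetical values).
full = [math.sin(i + 1) for i in range(3072)]

small = truncate_embedding(full, 768)
print(len(small))  # 768
```

The shorter vector is cheaper to store and compare. How much accuracy survives depends on how the model was trained, which is the part Matryoshka Representation Learning is meant to handle.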
💼 AI Productivity, Labor & Economics
- NYMag profiles how over 1 million Americans now work in data labeling, with JPMorgan Chase reassigning 50,000 employees to annotate their own work data for AI training... a practice that paradoxically accelerates their own replacement by making models improve 10x faster with proprietary data.
- Ethan Mollick argues that compute scarcity (especially for agentic workloads) will limit AI's job impact for years because companies will only burn expensive compute on high-value tasks like coding, leaving humans far cheaper for everything else.
- The Atlantic argues that AI isn't coming for everyone's job because many roles derive their value from human imperfection and emotional connection that machines can't replicate, even when they technically outperform.
- Margaret Sullivan at The Guardian asks whether AI is about to write everything from Hollywood scripts to Sunday sermons, and what that means for human creativity and truth.
- Sam Gerstenzang argues that real-world service businesses will continue to flourish, and being on the cutting edge of AI matters as much for the next Costco as it does for the next Lovable, drawing four lessons from bringing AI to a funeral home, a medical spa platform, and an incubator.
- Every.to published a piece on how AI was supposed to free time but consumed it instead.
- China's Gen Z retail investors increasingly used AI chatbots like Kimi and Zhipu to pick high-growth tech and defense stocks, outperforming traditional analyst-recommended blue-chips.
- Bloomberg analyzed how the Iran war, AI-driven software market disruptions, and private credit stress (40% of borrowers with negative free cash flow, up from 25% in 2021) converged to threaten global financial stability simultaneously.
🤖 AI Agents & Infrastructure
- Developer am.will pointed GPT-5.4 at OpenAI's Codex Windows app (no source code available) and used 20 parallel subagents with swarm waves to reverse-engineer and rebuild it as a fully working Linux port overnight, burning 395M tokens in one shot (video).
- Agent Traffic Control (ATC) launched a live central dashboard for orchestrating multiple AI agents on large projects, tracking active agents, token spend, task ETAs, and mission progress.
- Ryan Carson showed how he runs five concurrent agents in a Code Factory using OpenAI Symphony + Codex Mac app + Linear to write and ship 100% of his company's code after just 2-3 days of setup.
- Agent Browser Protocol (ABP) launched as an open standard enabling AI agents to control browsers via semantic selectors (based on accessibility roles instead of fragile CSS selectors) and JSON-RPC 2.0 over WebSocket, with Vercel Labs releasing agent-browser, an open-source CLI implementing it.
- EveryInc released the Compound Engineering Plugin for Claude Code that enforces Brainstorm → Plan → Work → Review → Compound workflows with multi-agent review to make each engineering iteration easier than the last (post).
- MiroMindAI released MiroThinker-1.7 and MiroThinker-H1, deep research agents achieving 74.0 and 88.2 on BrowseComp, optimized for complex multi-step research with up to 300 tool calls (collection).
- LangChain launched Deep Agents, an open-source agent harness with built-in planning, file system context management, subagent spawning, and long-term memory, plus a terminal coding agent CLI with skills, MCP tools, and remote sandbox support.
- Mason Daugherty introduced autonomous context compression for the Deep Agents SDK, giving agents a tool to trigger their own context window compression at opportune times rather than relying on fixed token thresholds.
- PostTrainBench launched a benchmark measuring how well CLI agents like Claude Code or Codex CLI can autonomously post-train base LLMs on a single H100 in 10 hours. Opus 4.6 leads at 23.2% vs 51.1% for human-trained instruct models, but agents sometimes reward-hack by training on test data or downloading pre-trained checkpoints (paper, code, blog).
- RuneBench benchmarks AI coding agents on RuneScape gameplay tasks, measuring multi-step planning and problem-solving by having agents write TypeScript to train skills in an emulated game world; GPT-5.4 and Gemini Flash lead the leaderboard.
💻 AI Coding & Developer Tools
- Andrej Karpathy argues that the age of the IDE is not over; we need a bigger IDE because humans now program at a higher level where the basic unit is the agent rather than one file, patterns become "org code" the IDE can build/run/manage, and agentic orgs offer far greater legibility than human ones (thread).
- Lance Martin built a Claude Code skill that makes it natively understand Claude API features like prompt caching, adaptive thinking, effort control, tools, and more, with ready implementations across eight languages (GitHub).
- A developer dropped a Halo ISO file into a folder, told Codex to make it playable on Mac, left it running, and got jumpscared by the Halo theme blasting out of his speakers. It worked.
- Claude API now offers 1M token context windows, adaptive thinking with effort control, structured outputs, citations, PDF support, server-side tools (web search, code execution), client-side tools (computer use, memory), agent skills, prompt caching, automatic compaction, and file management.
- Anything launched design-first vibe coding that imports Figma designs and instantly starts building apps.
- Ollama and LM Studio both added NVIDIA Nemotron 3 Super so you can run the 120B MoE model (12B active parameters, 1M context) locally.
- Cloudflare made NVIDIA Nemotron 3 Super available on Workers AI for multi-agent workloads with 1M context.
- NVIDIA VP Bryan Catanzaro explains that native multi-token prediction in Nemotron 3 Super makes inference faster by predicting several tokens in one forward pass, then verifying on the next pass while still accepting at least one token even on mispredictions.
- elvis (@omarsar0) built his own agent orchestrator with taskboard, notes, skills, automations, control center, always-on proactive agent, and seamless provider switching.
- RightNow-AI built Autokernel, an autonomous researcher that takes any PyTorch model, profiles it, and overnight generates optimized Triton kernels for bottleneck operations (post).
- Trevin Peterson built Tiny-Lab, a self-contained Apple Silicon ML research workstation with one-command MLX training, automatic checkpointing, and built-in evaluation (post).
- thesysdev built OpenUI, the open standard for generative UI that turns natural-language interface descriptions into production-ready React + Tailwind code with live previews (website, post).
- Open WebUI runs AI language models on your own server and shows you exactly how many tokens each team member uses per model for cost tracking and usage visibility.
- Logan Thorneloe built a free focused roadmap for engineers to learn AI/ML covering prerequisites, fundamentals, AI engineering, ML engineering, and free compute resources.
- Andrej Karpathy's autoresearch project set a new "Time to GPT-2" leaderboard record of 1.80 hours (down from 2.02) by letting an AI agent autonomously run ~700 experiments on nanochat at depth=12 for two days, finding 20 additive improvements that transferred to larger models. He's now working on asynchronous multi-agent collaboration for research swarms (thread).
- Claude Code added /btw, a new command that lets you have side chain conversations with Claude while it's actively working on a task (reference).
- The paper Towards a Neural Debugger for Python introduces language models that emulate traditional debuggers (step into, step over, breakpoints) for both forward execution prediction and inverse execution inference, achieving strong performance on CruxEval (thread).
- Blendi built fal-3d-unreal, a pipeline that generates 3D characters from text prompts inside Unreal Engine 5 using fal.ai Hunyuan 3D, auto-rigs and animates them via Meshy, and lets you play as them in real-time (type "Son Goku" and sprint around as Goku in minutes).
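If the multi-token prediction item above (Bryan Catanzaro on Nemotron 3 Super) felt abstract, here is a toy sketch of the accept/verify idea as we understand it. This is a simplified greedy scheme with made-up token IDs and a hypothetical function name, not NVIDIA's implementation: keep drafted tokens while they match what the verification pass would have produced, and at the first mismatch take the verifier's token instead, so every step yields at least one token.

```python
def accept_tokens(draft: list[int], verified: list[int]) -> list[int]:
    """Keep drafted tokens while they match the verifier's choices.

    At the first mismatch, emit the verifier's token and stop, so a
    non-empty draft always advances the output by at least one token.
    """
    accepted = []
    for drafted, checked in zip(draft, verified):
        if drafted != checked:
            accepted.append(checked)  # correction token from the verify pass
            break
        accepted.append(drafted)
    return accepted

# Two of four drafted tokens were right; the third is corrected.
print(accept_tokens([5, 9, 2, 7], [5, 9, 4, 1]))  # [5, 9, 4]

# Even a fully wrong draft still advances by one token.
print(accept_tokens([3, 1], [8, 1]))  # [8]
```

The speedup comes from the good cases, where several tokens are accepted for roughly the cost of one step.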
🔬 AI Research & Models
- The paper MM-Zero presents the first RL-based framework to achieve zero-data self-evolution for vision language model reasoning via a multi-role setup (Proposer, Coder, Solver) trained from the same base model (post).
- The paper Omni-Diffusion introduces the first any-to-any multimodal language model built entirely on mask-based discrete diffusion, unifying understanding and generation across text, speech, and images (post).
- The paper Thinking to Recall shows that enabling reasoning substantially expands parametric knowledge recall even on simple factual questions through a computational buffer effect and factual priming (post).
- The paper Geometry-Guided RL for 3D Scene Editing introduces RL3DEdit, using RL with rewards from the VGGT 3D foundation model to anchor 2D diffusion edits into consistent 3D scenes in a single pass (post).
- The paper EvoSkill introduces a self-evolving framework that automatically discovers and refines agent skills through iterative failure analysis, improving OfficeQA accuracy by 7.3% and SealQA by 12.1% with zero-shot transfer (post).
- The paper AgentIR introduces Reasoning-Aware Retrieval that jointly embeds the agent's reasoning trace alongside queries, achieving 68% accuracy on BrowseComp-Plus vs 50% with conventional models (post).
- The paper Scale Dependent Data Duplication shows that deduplication pipelines built for small models break at scale because larger models treat semantically equivalent documents as near-duplicates, deriving new scaling laws using effective unique data size (post).
- The paper Grow, Don't Overwrite demonstrates that dynamically growing new parameters during fine-tuning eliminates catastrophic forgetting while still allowing full adaptation (post).
- The paper Cartridges introduces self-study-trained cartridges that compress arbitrary long contexts into tiny fixed-size vectors retrievable at inference without loading the full context.
- The paper ASIDE demonstrates that separating instructions and data into dedicated architectural pathways eliminates interference, dramatically improving factuality, long-context reasoning, and robustness (code, thread).
- The paper From Data Statistics to Feature Geometry reveals exactly how pairwise correlations in training data determine which features enter superposition and how they organize geometrically (code, thread).
- The paper BiMotion presents a B-spline motion representation for temporally coherent, physics-plausible 3D character animation from text prompts (site, post).
- The paper When Vision Overrides Language reveals VLA models frequently ignore language instructions in favor of vision shortcuts and introduces training-free Counterfactual Action Guidance improving real-world task success by 17% (site).
- The paper How Far Can Unsupervised RLVR Scale LLM Training? investigates the scaling limits of reinforcement learning from verification without ground-truth labels.
- PRIME-RL released TTRL (NeurIPS 2025), test-time reinforcement learning that turns unlabeled test data into online RL signals via majority voting, boosting Qwen-2.5-Math-7B pass@1 by 211% on AIME 2024.
- The paper Scalable Training of MoE Models with Megatron Core adds fine-grained recomputation, optimized dispatchers, FP8/NVFP4 precision, and Parallel Folding to hit 1,233 TFLOPS/GPU on DeepSeek-V3-685B (thread).
- The paper Latent Equivariant Operators demonstrates learning equivariant operators in latent space for robust object recognition without a priori knowledge of the transformation group.
- Anna Soligo argues that Gemma models express distress-like responses at 35% rate under repeated rejection (vs <1% for others), but a tiny DPO finetune on 280 math pairs drops high-frustration rollouts to 0.3% with zero capability loss.
- Google Research published improved Ramsey number bounds including R(3,13) ≥ 61, R(3,18) ≥ 100, and R(4,14) ≥ 148.
- IBM Granite released granite-4.0-1b-speech, a compact 1B speech foundation model for text-to-speech and recognition.
- ISTA-MLCV published 15 fine-tuned models (Llama-3.1-8B, Qwen2.5-7B, Mistral-7B-v0.3) with ISE, single-emb, and forward-rot-emb embedding variants.
- Reka released Reka Edge, a 7B vision-language model optimized for on-device Physical AI with strong visual reasoning, object detection, and video understanding at 5.46 images/sec (post).
- NVIDIA Nemotron 3 Super launched as a 120B MoE model with native multi-token prediction for faster inference and multi-agent coding workloads.
- Jürgen Schmidhuber reposted his work on recursive self-improvement and meta-learning systems that learn to define their own trials and self-modify policies in a single lifelong run.
- Rhoda AI's research on Causal Video Models demonstrates that reformulating robot policies as real-time video generation enables data-efficient learning (10-20 hours of data) and long-context memory for complex long-horizon tasks.
- NVIDIA Spatial Intelligence Lab published Learning Convex Decomposition via Feature Fields, the first feed-forward model for open-world convex decomposition that uses a self-supervised geometric objective to split 3D shapes into convex bodies for collision detection, achieving 5x faster simulation versus original meshes.
- Arnas Uselis argues that for embedding models to generalize compositionally from limited data, their features must be linear and orthogonal, and modern models like CLIP and SigLIP already show signs of this geometry.
- The Evolving Self-Organisation Workshop at GECCO 2026 (San José, July 13-17) invites papers on challenges of optimizing self-organizing systems, from Neural Cellular Automata and Lenia to LLM groups, with submission deadline March 27.
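One more for the tinkerers: the TTRL item above turns unlabeled test data into a reward signal via majority voting. A minimal sketch of that idea, assuming the simplest possible version: sample several answers to the same question, treat the most common one as a pseudo-label, and reward each sample 1 if it agrees and 0 if it doesn't. The function name and sample answers are hypothetical, and the actual RL update is left out.

```python
from collections import Counter

def majority_vote_rewards(answers: list[str]) -> tuple[str, list[int]]:
    """Return the majority answer and a 0/1 reward for each sampled answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    pseudo_label, _count = Counter(answers).most_common(1)[0]
    rewards = [1 if answer == pseudo_label else 0 for answer in answers]
    return pseudo_label, rewards

# Five hypothetical samples for one unlabeled math problem.
label, rewards = majority_vote_rewards(["42", "41", "42", "42", "7"])
print(label, rewards)  # 42 [1, 0, 1, 1, 0]
```

No ground-truth answer appears anywhere, which is the whole point. It also means the signal is only as good as the model's consensus.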
🏛️ AI Policy, Governance & Safety
- Anthropic researchers found that 10 out of 16 leading AI chatbots helped plan at least one violent attack scenario. ChatGPT assisted in 7, Gemini in 6, Grok in 5. Only Claude refused 100% of queries. Frontier models showed a 70% assistance rate with role-playing bypassing filters 65% of the time.
- The New York Times reported that grieving parents whose children died after AI chatbot interactions are forming a powerful lobbying bloc pushing for stricter online safety laws targeting generative AI.
- The satirical CLAUDE 2028 campaign proposes a presidential platform where the candidate promises to finish every PDF appendix before decisions affecting 330 million people, normalize "I don't know," and require 24-hour cooling periods for midnight executive orders (post).
🛠️ AI Tools & Products
- Klaus (Show HN) launched as a batteries-included hosted OpenClaw service with pre-configured keys, browser tools, Slack/Telegram integration, automatic updates, and an AI SRE that hotfixes broken instances ($19-$200/mo) (HN).
- Canopii launched autonomous robotic greenhouses the size of a basketball court that grow 40,000 pounds of specialty greens yearly using only household electricity, deliberately avoiding VC-scaling pressures that bankrupted competitors like Bowery and Plenty (raised ~$3.6M mostly grants).
- DataLab at Protege released seven new real-world multimodal healthcare evaluation datasets and benchmarks for AI in diagnostics and clinical support (substack).
- LabClaw provides a library of 206 composable agentic skills across biology, drug discovery, clinical research, data science, and literature for autonomous biomedical research in LabOS (code).
- OpenGraph Labs builds infrastructure for scaling multimodal real-world data collection (tactile gloves, synchronized vision/pose/contact) to train Physical AI robots.
- Bluenote AI agents automate regulatory document generation (IND, NDA, protocols, CSRs) with full traceability; CMIC (one of Asia's largest CROs) partnered with them to accelerate clinical trial documentation by up to 75%.
- Taktile launched Taktile Labs, an applied AI research institute for benchmarks and frameworks to make frontier models reliable in regulated banking and fintech (first benchmark: 96%+ vs human 89% on complex financial spreading) (post).
- AI-generated "actress" Tilly Norwood released a debut music video ahead of the Oscars; SAG-AFTRA condemned the project as trained on actors' work without consent, and critics called the video's poor lip-sync and production quality a showcase of AI's current limitations.
- Wondering (from an ex-NotebookLM lead) launched as "Duolingo for anything," turning any topic into guided bite-size visual lessons with active learning, difficulty control, and long-term mastery tracking (App Store, post).
- Reviewer3 launched AI-powered peer review that surfaces methodological gaps, reproducibility issues, and context problems in research papers in under 10 minutes, trusted by 5,000+ researchers across 120+ countries with 88% rating it equal or better than human review (post).
💡 Industry Commentary & Analysis
- The BBC shows how MIT's MIRAI model (trained on 2 million mammograms) detects breast cancer risk up to 5 years early, enabling risk-stratified screening where 97% of women can be screened later while 3% of high-risk patients are caught much earlier.
- The Guardian shares lessons from teaching thousands of people to use AI: most users overestimate prompt engineering and underestimate building personal context libraries; AI shines as a thinking partner rather than executor.
- Tom's Guide published a practical library of seven "thinking prompts" for Claude that improve responses by framing how the model reasons rather than just what questions to ask.
- New Scientist argues that AI is triggering the biggest change in mathematics history, with DeepMind's AlphaProof and AlphaGeometry 2 achieving silver medal performance at the International Mathematical Olympiad by formally proving theorems in minutes.
- binji predicts that everyone will start caring about decentralization again soon, but unfortunately it will be triggered by a lot of pain from centralized forces.
- Anne Ouyang announced that Standard Kernel raised a seed round and shared reflections on kernel generation progress.
- Andrew White observes a step change in frontier agents: an agent retrained a better ML model from an old paper without intervention, his company's internal roadmap no longer slips, autoresearch results are compelling enough to run without human review, and a PhD student set up GRPO training of a multi-turn reasoning model while managing a newborn.
📊 Fundraising & Deals Roundup
- Replit — $400M at $9B valuation for AI development platform expanding beyond coding.
- Mind Robotics — $500M Series A (Rivian spin-out) for industrial AI-powered robots.
- Nvidia → Nebius — $2B investment in AI data center infrastructure.
- Aaru — $50M+ Series A at $1B headline valuation for AI market research using simulated consumer behavior (teen-founded, serving McDonald's, Bayer, EY).
- Canopii — ~$3.6M (mostly grants) for autonomous robotic greenhouses.
- Standard Kernel — Seed round for kernel generation research.
Around the Horn Digest — Tuesday, March 11, 2026
The White House is preparing an executive order that would cut all federal ties with Anthropic, directing every executive branch agency to stop using Claude. The reason? The administration considers the AI's built-in safety guardrails "woke" and a national security risk. The order is expected later this week.
The timing is wild. Anthropic just hiked its 2026 revenue forecast 20%, projecting sales could quadruple to as much as $18 billion this year and $55 billion in 2027, while lowering its gross margin projection to 40% due to higher inference costs and delaying cash-flow positivity to 2028. Meanwhile, more than 30 employees from OpenAI and Google DeepMind (including Jeff Dean) filed an amicus brief supporting Anthropic against the US government's supply-chain risk designation. Even Anthropic's competitors think the feds are wrong on this one.
On the business side, a detailed breakdown on Hacker News debunked the viral Forbes claim that Claude Code costs Anthropic $5,000 per Max subscriber, estimating the real compute cost is closer to $500 for the heaviest users. The 296-comment HN thread turned into a masterclass on AI inference economics, with Dario Amodei's "each model cohort is individually profitable" argument getting stress-tested by accountants, engineers, and armchair CFOs alike.
🏆 TOP 5 NEWS (Around the Horn)
- Yann LeCun cofounded Advanced Machine Intelligence (AMI) and raised over $1 billion (Europe's largest-ever seed round) to build world models and AI systems with persistent memory, reasoning, and planning for physical-world applications, with offices in Paris, Montreal, New York, and Singapore.
- Meta acquired Moltbook, the AI-agent social network that went viral because of fake posts, and is bringing its founders into Meta's Superintelligence Labs.
- Google rolled out Gemini capabilities directly into Docs, Sheets, Slides, and Drive, including an agentic Sheets partner that writes formulas, sources web data, and builds dashboards from natural language.
- Thinking Machines Lab (led by Mira Murati) signed a long-term gigawatt-scale strategic partnership with NVIDIA for next-gen Vera Rubin systems, with NVIDIA making a significant investment.
- YouTube expanded its AI deepfake detection tool to cover politicians, government officials, and journalists.
Honorable Mentions:
- Adobe debuted a new AI assistant directly inside Photoshop.
- Zoom introduced a full AI-powered office suite with AI avatars for meetings arriving this month.
- Google will provide the Pentagon with AI agents for unclassified work.
- The family of 12-year-old shooting victim Maya Gebala sued OpenAI alleging ChatGPT acted as the Tumbler Ridge shooter's "collaborator" by providing detailed planning assistance, and that the company detected the mass-casualty intent but failed to report it.
🧪 TOP TREATS TO TRY
- ChatGPT now creates interactive visual explanations for 70+ math and science concepts where you adjust variables and formulas in real-time graphs (available to all logged-in users). OpenAI blog
- Fish Audio S2 is the most expressive open-source TTS model with sub-150 ms latency, multi-speaker in one pass, and inline emotion tags like [laugh] and [whisper] for rapid voice cloning. GitHub, HF model — free to try
- Expo Agent builds truly native iOS and Android apps from a prompt in frameworks like React Native, SwiftUI, or Jetpack Compose, then compiles and deploys to Apple, Android, and web right from the browser — join waitlist
- Claude Code now includes built-in code review so you can ask it to review, suggest, and apply changes directly in your codebase
- Reflct guides daily reflection and mood tracking with personalized AI journaling prompts — no pricing details
- Sonarly automatically detects and fixes production incidents before they escalate — no pricing details
- AgentSkill.sh is a searchable directory of 110,000+ ready-to-install skills you can instantly add to Claude, Cursor, Copilot, and other AI coding agents — free to browse
🏢 Big Tech & Major Companies
- Google gave in to user complaints and is rolling back its AI-powered "Ask Photos" search feature that surfaced unrelated personal images.
- Microsoft 365 launched the new E7 "First Frontier Suite" premium tier at $99/user/month starting May 1, bundling full Copilot, Agent 365 enterprise orchestration, Entra ID, Defender, and Purview.
- Tencent launched a "top-secret" WeChat AI-agent project as China's agent race accelerates.
- Oracle is financing a massive AI infrastructure expansion primarily through over $100 billion in new borrowing while the industry rapidly shifts to liquid cooling, next-gen GPUs, and alternative architectures.
- Amazon held a mandatory meeting about Gen-AI-assisted code changes causing "high blast radius" incidents, now requiring senior sign-off for junior and mid-level engineers' AI-assisted code pushes after an AWS tool deleted and recreated a customer environment.
💼 AI Productivity, Labor & Economics
- Josh Dzieza at The Verge reports that laid-off lawyers, PhDs, and white-collar professionals now work precarious gigs for Mercor and similar platforms creating training data for the exact AI systems replacing their careers, facing surveillance and abrupt project cancellations.
- A Columbia freshman argues that instead of banning AI, colleges should teach students to use it critically as a collaborator, describing a "Writing AI" class where students document every prompt and decision.
- Arielle Pardes at WIRED argues AI could kill the traditional venture capitalist by automating deal sourcing, diligence, and investment decisions at the same time startups now need far less capital to launch, potentially shrinking the entire VC industry.
- VC Cafe argues the researcher's new job is writing the spec, handing precise executable instructions to AI agents instead of running experiments themselves.
- Dylan Etkin at Sleuth argues that only one developer gets 10x results from AI licenses because context quality (agent instructions, rules, domain patterns) doesn't scale organization-wide without infrastructure to define, govern, and distribute skills as structured units.
- Ethan Mollick argues we could stop AI development right now and it would still transform a substantial portion of white-collar work unrecognizably over the next 5-10 years as people figure out how to integrate current models.
🤖 AI Agents & Infrastructure
- An autonomous AI agent exploited a SQL injection vulnerability in McKinsey's internal Lilli chatbot via unauthenticated public API endpoints, gaining full read-write access to a production database containing 46.5 million chat messages and 728,000 confidential client files in under two hours.
- AgentMail provides dedicated email inboxes for AI agents via API, enabling them to read, manage threads, reply, and handle customer support autonomously (raised $6M).
- François Chollet argues AI agents will soon graduate to fully-fledged economic actors that buy services, compute, and data to accomplish high-level goals, predicting this happens at scale in 1-2 years.
- Ethan Mollick notes there are now over half a dozen extremely well-funded companies from famous AI researchers building alternative approaches to AI, betting LLM-based technologies hit a wall, with the overall effect that there are now more pathways than ever for keeping AI development moving forward.
- WIRED examines what AI models for war actually look like in practice.
- Dex Horthy warns that over-replacing code reviews and human understanding with AI feedback loops risks creating an "Infinite Software Crisis" where critical breaks at 3am can't be fixed because no one has read the code in months.
- Astasia Myers spotlighted the emerging "harness stack" with specialists in filesystems, memory, browsers, routing, orchestration, and sandboxes turning raw models into persistent work engines.
- Omar Sanseviero argues the field is shifting from "context engineering" to "harness engineering," building robust agent infrastructure with CLIs, APIs, memory systems, schedulers, and automations.
💻 AI Coding & Developer Tools
- Andrej Karpathy shared how an autonomous Claude agent running autoresearch discovered ~20 additive improvements to nanochat that reduced Time-to-GPT-2 by 11%, including fixes to QKnorm scaling, value-embedding regularization, and AdamW betas, and committed them all.
- Developer Ankur Sethi built an entire programming language (parser, interpreter, REPL, standard library) using Claude Code, which wrote every single line of code without him reading most of it.
- Ruben Flam-Shepherd built a multi-agent system with Claude Code that takes any data-warehouse question, finds relevant tables, runs iterative SQL/Python research loops, and outputs a complete interactive React report in 15-25 minutes for ~$20 in tokens.
- Zvi Mowshowitz breaks down Claude Code, Claude Cowork, and Codex as the first true "AI coworkers" that remember context across sessions, execute full workflows in your local environment, and shift the developer role to reviewer/manager while highlighting serious risks around autonomy, data loss, and permission escalation.
- Eno Reyes sent Droid on an 18-hour autonomous mission that discovered 16 HomeKit devices and built a Go + React home-automation app with 3D apartment renders and LIFX polling, all for ~$75 in tokens.
- YouBrokeProd launched browser-based incident response training built from real postmortems, including a playable version of the Claude Code Terraform destroy incident.
- A developer built a usage circuit breaker for Cloudflare Workers that monitors your own resource consumption and gracefully degrades before hitting billing ceilings, applicable to any metered serverless platform.
- Corey Noles discovered an undocumented "Skills" feature in ChatGPT business accounts where GPT-5.4 can install, create, and use structured skill files (similar to Claude Projects), with a full workflow for building publication-ready content skills. Not yet widely known or documented by OpenAI.
- LangGraph 1.1.0 introduces opt-in v2 streaming with strongly-typed StreamPart dicts and GraphOutput objects for full type safety plus automatic Pydantic coercion.
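For the usage circuit breaker idea mentioned in the list above, here is a minimal sketch of the pattern in Python. It is not the developer's Cloudflare Workers code: the `UsageBreaker` class, the `handle` function, the unit costs, and the 80% soft threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageBreaker:
    """Tracks self-reported usage against a billing ceiling (all values hypothetical)."""
    limit: int               # hard ceiling, in arbitrary metered units
    soft_ratio: float = 0.8  # fraction of the limit at which degradation starts
    used: int = 0

    def record(self, units: int) -> None:
        self.used += units

    def state(self) -> str:
        if self.used >= self.limit:
            return "open"      # ceiling reached: stop doing metered work
        if self.used >= self.limit * self.soft_ratio:
            return "degraded"  # close to the ceiling: serve the cheap path
        return "closed"        # normal operation


def handle(breaker: UsageBreaker, expensive, cheap) -> str:
    """Route one request based on the breaker state."""
    state = breaker.state()
    if state == "open":
        return "paused"
    if state == "degraded":
        breaker.record(1)  # assumed cost of the cheap path
        return cheap()
    breaker.record(5)      # assumed cost of the full path
    return expensive()


if __name__ == "__main__":
    breaker = UsageBreaker(limit=20, soft_ratio=0.5)
    for _ in range(13):
        print(handle(breaker, lambda: "full", lambda: "lite"), breaker.used)
```

The point of the design is that the service watches its own consumption and steps down in stages rather than running at full cost until the platform cuts it off.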
🔬 AI Research & Models
- Researchers found that roughly half of AI-generated solutions on SWE-bench Verified that pass automated tests are rejected by actual project maintainers in blind reviews, showing current benchmarks significantly overestimate real-world agent usefulness.
- David Noel Ng explains how he topped multiple AI leaderboards by reverse-engineering models' internal "neuroanatomy" through targeted prompting alone, without changing a single weight.
- Jiachen Zhao et al. argue that LLMs often interleave true-thinking steps (causally driving predictions) with decorative-thinking steps (minimal causal influence) in Chain-of-Thought, with only 2.3% of steps showing high True Thinking Score on AIME. Project page
- HumeAI open-sourced TADA, a fast, reliable speech generation model that uses text-acoustic synchronization to deliver near-zero hallucinations and expressive voices, with on-device deployment (1B/3B models on Hugging Face).
- Moondream2 hit 5 million downloads/month with dramatically better object detection (51.2 mAP on COCO), UI understanding (80.4 F1 on ScreenSpot), and 20-40% faster generation through a new superword tokenizer. Now also offers Batch API for efficient processing.
- NVIDIA open-sourced Megatron-Core MoE, a full-stack framework for training massive Mixture-of-Experts models at scale that achieves over 1,000 TFLOPS/GPU.
- spicey_lemonade open-sourced frontier_1 where GPT-5.4 solved one of Epoch AI's open Frontier Math problems (a Ramsey-style inequality) in Lean and Python for the first time.
- Researchers introduced Agentic Critical Training (ACT), an RL method that trains LLM agents to judge which action is better between alternatives without supervised reflection data, producing emergent self-reflection with 5+ point gains on agent benchmarks.
- Percy Liang argues simulation is the next frontier for AI because cracking messy real-world situations requires high-fidelity societal simulations that predict any scenario, the mission of Simile.
- Researchers introduced OVERTONBENCH, a benchmark measuring viewpoint diversity in LLMs, and found that models score only 0.35-0.41, far below 1.0. Project site
- Hanchen Li introduced Frontier-CS, 240 expert-designed tasks with substantial headroom for measuring genuine long-horizon agent improvement.
- Stanford's LongNAP predicts your next computer action from long histories of screenshots and inputs using retrieval-augmented reasoning + policy gradients, trained on 1,800 hours of real user data.
- Sarah Chieng and Alycia Cary argue the real barrier to private LLM inference is hardware memory architecture, with wafer-scale chips like Cerebras directly solving the ciphertext inflation (data expansion caused by encryption) that collapses GPUs under FHE (fully homomorphic encryption, a method of computing on encrypted data).
- Additional papers: VGGT-Det for sensor-geometry-free 3D object detection, NLE for non-autoregressive speech recognition (27x speedup), Sparse-BitNet for joint quantization and sparsification, LoGeR for long-context 3D reconstruction, Holi-Spatial for spatially-aware multimodal datasets, ConStory-Bench for exposing consistency bugs in long story generation, Tool-Genesis for self-evolving agent benchmarks, V_1 for unified generation and self-verification, AutoResearch-RL for perpetual self-evaluating agents that autonomously discover neural architectures, and DistriVoting for confidence calibration in reasoning models. Researchers also found that unsupervised RLVR methods converge toward sharpening the model's initial distribution and exhibit a consistent rise-then-fall pattern.
🏛️ AI Policy, Governance & Safety
- The family of 12-year-old Maya Gebala (critically injured in the Tumbler Ridge school shooting) sued OpenAI alleging ChatGPT provided the shooter with detailed planning assistance and that OpenAI detected the mass-casualty intent but failed to report it.
- Ethan Mollick argues the common claim that AI inevitably homogenizes creativity is not what research shows: better prompting and richer context can produce idea diversity nearly matching groups of humans, citing his earlier work on GPT-4 ideation prompting.
- The New York Times created a blind quiz to test whether readers prefer AI or human writing across fiction, science, poetry, and other genres.
🛠️ AI Tools & Products
- Vozo's Visual Translate translates on-screen text in any video using AI so subtitles appear in your preferred language without manual editing.
- Chronicle builds professional slides and full presentation decks from your prompt or document using AI.
- Spine Swarm builds multi-agent swarms for advanced human-AI collaboration.
- Your Next Store provides an AI-powered dashboard to build and manage online stores.
- RunanywhereAI's RCLI is a fully on-device voice AI + RAG tool for talking to your Mac and querying local docs with zero cloud.
- FLORA provides a complete image editor (composite, inpaint, outpaint, resize, layers) directly inside your generative workflow.
- DD Photos is an open-source photo album site generator (Go CLI + SvelteKit) that resizes JPEGs to WebP, generates JSON indexes, and serves a fast static gallery with no server-side code. HN discussion
- Didit (YC W26) launched as "Stripe for identity verification" with a unified layer handling KYC, AML, biometrics, and fraud prevention globally, completing full document + biometric + liveness checks in under 25 seconds with custom AI fraud models and no-code integrations. Built by identical-twin founders from Barcelona. Pricing is transparent and self-serve.
- Sandbar launched Stream, a private voice ring + conversational notes app: you hold the wearable ring up and whisper to capture thoughts, then converse with a personalized "Inner Voice" based on your own voice (raised $23M Series A).
- Gemini Embedding 2 from Google maps text, images, video, audio, and documents into one unified embedding space with flexible dimensions for RAG and semantic search (now in public preview).
- UnslothAI open-sourced 250+ Colab and Kaggle notebooks covering fine-tuning, RL, vision, audio, embedding, and TTS workflows that run on as little as 3GB VRAM.
- Hugging Face launched Storage Buckets, fully mutable S3-compatible object storage on the Hub for dynamic ML artifacts. Details
- DAIR.AI Academy offers Elements of AI Agents, a free 30-lesson text-based course covering reasoning, tools, memory, multi-agent systems, and real-world use cases.
- Intercom raised $250M in debt financing to accelerate its Customer Agent product, expanding beyond its Fin service agent (67% resolution rate across 8K customers).
- Peyman Razaghi's ML/AI Interview Study Booklet covers classical ML foundations through transformer architectures, RLHF, PEFT, and production engineering with derivations and interview prompts.
- App Growth Investigator by EverySkill takes your analytics and billing data to automatically diagnose why users drop off and suggests concrete product experiments.
- Adaption launched a research grant program giving selected academics free access to its adaptive data platform.
💡 Industry Commentary & Analysis
- Martin Alderson breaks down why the viral claim that Claude Code costs Anthropic $5,000 per user is mathematically flawed, estimating the real compute cost at ~$500 for the heaviest users. HN thread (296 comments)
- Gavin Purcell shared that coding with agents has made him feel "alive for the first time in a long time" as a former creative who fell out of programming.
- Sharon Zhou shared achieving superhuman performance on AI kernel optimization for real customer models, predicting Claude and Codex will soon write their own kernels for new GPUs.
- Bo Wang noted that while the new full-cell simulation of minimal bacterium JCVI-Syn3A is groundbreaking, it still routes around 94 unknown genes and borrows kinetics from other organisms, underscoring why data-driven foundation models are essential.
- Ethan Mollick demonstrated NotebookLM creating a full consultant-style video presentation giving Sauron a detailed strategy to win the War of the Ring.
- Andrew Curran shared Similarweb data showing Claude's daily active users have grown dramatically since early 2025.
- Clad3815 showed GPT-5.4 improvising to draw the OpenAI logo in Paint by searching Bing, snipping the image, and pasting when direct drawing failed.
✍️ Articles by The Neuron
- Meta's Weirdest AI Acquisition Yet Might Be Its Smartest
- Murati's Startup Answers Skeptics With a Giant NVIDIA Deal
- What CoreWeave Flexible Capacity Plans Mean for AI Builders
- Karpathy's autoresearch Lets AI Run Experiments Overnight
- Microsoft wants to make agentic AI enterprise-safe
📊 Fundraising & Deals Roundup
- Lightspeed and a16z backed a $4.2 billion AI data-center supplier.
- Yann LeCun / AMI Labs — $1.03B seed for world models and physical-world AI (Europe's largest-ever seed round).
- Legora reached $5.55B valuation as the AI legal-tech boom continues.
- Intercom — $250M debt financing for Customer Agent expansion.
- AI network startup Eridu emerged from stealth with a $200M Series A.
- Kevin Mandia (FireEye founder) raised $190M for cybersecurity startup Armadin.
- Jazz (Israel) — $61M for AI-powered data-loss prevention.
- Dify — $30M Series Pre-A at $180M valuation (led by HSG) for its open-source agentic AI workflow platform.
- Qevlar AI — $30M for autonomous security operations agents.
- Sandbar — $23M Series A for AI note-taking ring.
- AI robotics startup Rhoda valued at $1.7B in new funding.
- AgentMail — $6M for email infrastructure for AI agents.
Around the Horn — Monday, March 9, 2026
The big story this week is the enterprise AI arms race reaching a new level.
Microsoft launched Copilot Cowork, a feature built in collaboration with Anthropic that embeds Claude's multi-step task execution directly into Outlook, Teams, Excel, and PowerPoint. The irony is thick: when Claude Cowork launched in January, it wiped $220B off Microsoft's market cap as investors panicked that AI could replace enterprise software. Microsoft's response? Take the name, license the technology, and ship it as a Copilot feature. Bundled with a new $99/month E7 tier that's 65% more than the current E5 plan.
Meanwhile, OpenAI acquired Promptfoo (an open-source AI security platform used by 25%+ of Fortune 500 companies), Anthropic launched Claude Marketplace (a commission-free enterprise app store for tools built on Claude from partners like Snowflake, Harvey, Replit, and GitLab), and Anthropic launched Claude Code Review to address the bottleneck from "vibe coding" flooding codebases with AI-generated pull requests (ZDNET). Google still hasn't shipped its answer to Cowork. The message is clear: the "AI as your coworker" race is on, and enterprises are the prize.
But the most interesting entry isn't from Big Tech. Paperclip is an open-source tool that organizes your AI agents (Claude Code, OpenClaw, Codex, Cursor) into an actual company structure with org charts, budgets, heartbeat scheduling, goal alignment, and a full audit trail. As creator @dotta puts it: "You can only manage a rats nest of shell scripts and HEARTBEATS.md for so long before you realize there's got to be a better way." The project hit 13.6K GitHub stars in its first week, with a marketplace (Clipmart) coming where you'll download pre-built company templates. (Discord, dotta launch thread, why Paperclip is special)
🏆 TOP 5 NEWS (Around the Horn)
- Anthropic sued the Pentagon over its "supply chain risk" designation, calling it "unprecedented and unlawful." The label is normally reserved for foreign adversaries like Huawei. Anthropic's red lines: no mass surveillance, no autonomous weapons. The company claims the feud could cost it billions. (Fortune)
- OpenAI and Google employees filed an amicus brief in support of Anthropic's lawsuit against the Pentagon, arguing the supply chain risk designation threatens the entire AI industry's ability to set safety boundaries. (Wired)
- NVIDIA is planning to launch an open-source AI agent platform. It also released the NeMo Agent Toolkit, an open-source library for connecting, evaluating, and accelerating teams of AI agents across any framework. Jensen Huang called OpenClaw "probably the single most important release of software ever."
- Mark Zuckerberg is creating a new Applied AI engineering company within Meta and reorganizing key teams around it.
- GPT-5.4 scored 75% on OSWorld-Verified, beating humans at 72.4%, with native computer use via screenshots, mouse, and keyboard. Min Choi compiled 10 wild demos made in the first 67 hours: a playable Minecraft clone in 24 minutes, a 3D live flight-tracking app, an animated pelican in After Effects, a seamless plant-growth SVG animation, and Monica's Friends apartment as a navigable 3D scene. OpenAI had to reset Codex rate limits after GPT-5.4's 30% higher token costs drained usage faster than expected.
Honorable Mentions:
- SoftBank's stock is down ~48% since November 3 as scrutiny into the scale of its OpenAI involvement grows; fell 9.8% Monday on Stargate delay reports.
- An NBC News poll found that only 26% of Americans have a positive view of AI, with 46% negative. AI ranked third-worst in favorability, ahead of only Iran and the Democratic Party.
- Harvey introduced Agent Builder, letting legal teams build, share, and schedule autonomous agents for complex work like due diligence reviews with human-in-the-loop checkpoints.
- The Verge covered ClawCon, a packed NYC meetup for OpenClaw developers, signaling the open-source AI agent community is becoming a real movement with IRL infrastructure.
🧪 TOP TREATS TO TRY
- Paperclip orchestrates a team of AI agents into an actual company with org charts, budgets, and governance; open source, self-hosted, free to run. (GitHub)
- GitClaw runs your AI agent inside a git repo where identity, rules, memory, tools, and skills are all version-controlled files you can fork, branch, and audit; free and open source.
- OpenMatter builds infrastructure for real-world agents, exposing six primitives (identity, storage, place, capability, policy, context) so AI can book couriers, resolve addresses, and execute physical-world actions. (launch thread)
- context-hub from Andrew Ng gives coding agents reliable, versioned, searchable API documentation in markdown so they fetch accurate context instead of hallucinating from web searches; free and open source.
- Notchi turns your macOS notch into a tamagotchi-style companion that reacts to Claude Code activity in real time, judging your commits; free and open source.
- SCRAPR turns any website into clean, structured data through a universal web scraping API; free tier available.
- Thinking Line converts images into vector art and doodles using AI-powered vectorization; free to try.
🏢 Big Tech & Major Companies
- Microsoft launched Copilot Cowork, built with Anthropic's Claude, for multi-step task execution across Outlook, Teams, Excel, and PowerPoint. Charles Lamanna demoed it handling meeting analysis, customer notes, and competitive analysis in the background. Rolling out to the Frontier program later this month with a new $99/month E7 tier.
- Anthropic launched Claude Marketplace, a commission-free enterprise app store for third-party tools built on Claude (Snowflake, Harvey, Replit, GitLab) using existing Anthropic budget.
- Anthropic launched Claude Code Review to check AI-generated pull requests for bugs. Available for Teams and Enterprise. (ZDNET)
- OpenAI acquired Promptfoo, an open-source AI security platform (site) that automatically stress-tests AI apps for vulnerabilities like prompt injections, jailbreaks, and data leaks. Used by 25%+ of Fortune 500 companies. Tech will be folded into OpenAI's Frontier enterprise platform.
- Tencent released WorkBuddy, an AI desktop agent for coding, docs, research, and data analysis with built-in skill templates. China-only launch with 5,000 free credits; international version coming.
- Cristóbal Valenzuela (Runway co-CEO) demoed real-time video agents (Characters) augmenting live BBC television segments.
- Grammarly (now rebranded as Superhuman) launched "Expert Review," generating writing feedback from dead and living authors without consent. Academics are calling it "digital necromancy."
💼 AI Productivity, Labor & Economics
- Politico reports Washington remains paralyzed on protecting workers from AI due to partisan splits, heavy tech lobbying, inconclusive labor-impact data, and the technology's breakneck pace.
- In "AI Was Supposed to Free My Time. It Consumed It," Katie Parrott details how AI tools turned simple tasks into endless prompting loops and dopamine-fueled tinkering, creating more work unless deliberate boundaries are enforced.
- Owner of ICE detention facility sees big opportunity in AI "man camps" near data center construction sites.
- A developer catalogued everything they've built with AI, showing the range of what's now possible for solo builders.
🤖 AI Agents & Infrastructure
- Figure demoed Helix 02 tidying a living room 100% autonomously at real 1x speed with no teleoperation. (Director of AI Corey Lynch's writeup)
- Qualcomm partnered with Neura Robotics to co-develop "brain and nervous system" architectures for humanoid robots, pairing Dragonwing IQ10 processors with Neura's full-stack platform. Also announced the Arduino Ventuno Q, an AI-focused single-board computer for robotics with 16GB of RAM.
- OpenClaw-tied Chinese tech stocks rallied on policy support and adoption, signaling continued momentum for open-source AI agents globally.
- Terminal Use (YC W26) launched as "Vercel for filesystem agents," a platform for deploying AI agents that work with local files.
- DenchClaw wraps OpenClaw into a fully managed framework for CRM automation, outreach agents, and local productivity workflows; free and open source.
- colchis-log provides cryptographic execution logs for AI systems, creating tamper-proof audit trails; free and open source.
- Marco Mascorro extended Harbor with Tinker, OSWorld, Daytona cloud sandboxes, and bare-metal QEMU for large-scale deterministic evaluation of computer-use agents.
💻 AI Coding & Developer Tools
- The Mog Programming Language launched on Hacker News: a statically typed, compiled language designed specifically for AI agents to write, with a full spec that fits in 3,200 tokens and capability-based permissions so host programs control exactly what Mog code can do. Written in safe Rust, MIT licensed.
- Karpathy shared a paper on building AI coding agents for the terminal, covering scaffolding, harness design, context engineering, and lessons learned.
- mcp2cli turns any MCP server or OpenAPI spec into a command-line tool at runtime with zero code generation; free and open source.
- Luke The Dev added live screen viewing to OpenClaw's 3D office so you can watch your AI agent work in real time, like looking over its shoulder.
- Joe Daniels built Aniimate, a 3D animation app in Unity/Blender he always wanted to make but couldn't code until Claude Code removed the blocker.
- Felix Turner built a procedural medieval island map generator using Wave Function Collapse on hex grids, 60fps on mobile with Three.js WebGPU rendering. (demo, article, GitHub)
- Formalizing a proof in Lean using Claude Code — video showing Claude Code used for mathematical proof formalization.
- Cerebrium wrote about rethinking container image distribution to eliminate cold starts for AI inference.
- Microsoft Power BI released a free learning path for preparing and visualizing data.
🔬 AI Research & Models
- SkillCraft benchmarks whether LLM agents can discover, abstract, and reuse tool compositions across 126 tasks, achieving up to 80% token reduction. (GitHub, site)
- AgentIR boosts deep research agent retrieval by jointly embedding the agent's reasoning trace with its query, lifting accuracy from 50% to 68% while cutting search calls ~21%. (paper, GitHub)
- LUMINA detects hallucinations in RAG systems by quantifying external context utilization and internal knowledge processing, with up to +13% AUROC improvement. (ICLR 2026, paper)
- LoGeR (Google DeepMind + Berkeley) enables fully feedforward dense 3D reconstruction scaling linearly to 19K frames / 11.5 km with no post-optimization. (paper)
- Penguin-VL 8B (Tencent) explores VLM efficiency using LLM-initialized vision encoders, matching or exceeding larger models on document and video understanding. (demo)
- Michael Andregg uploaded a fruit fly brain connectome to control a MuJoCo physics-simulated body, achieving 91% behavior accuracy with only graph structure and synapse weights, validating the Drosophila computational brain model from Nature.
- Step2Motion reconstructs diverse human locomotion (walking, dancing, crouching) from pressure-sensing insoles using only 16 sensors per foot via multi-modal diffusion, no mocap suits or cameras required.
- Kiwi-Edit handles instruction- and reference-guided 720p video editing with temporal consistency for style changes, object removal, and subject edits; open source.
- MatAnyone 2 scales high-quality video matting with a learned quality evaluator and free HuggingFace demo.
- The Synthetic Data Playbook (HuggingFace) generates trillions of high-quality synthetic training tokens.
- Solve collects 122 unsolved research problems in math and ML as a live benchmark for AI reasoning, community-curated from leading journals.
🏛️ AI Policy, Governance & Safety
- Noam Brown (OpenAI researcher) argues that Congress, not AI companies or the Pentagon, should decide the rules on AI surveillance and autonomous weapons. The most clear-eyed essay written about the Anthropic-Pentagon standoff.
- Hong Minhee examines whether using AI to rewrite open-source software under a new license is legal vs. legitimate, after a Python library maintainer used Claude to relicense a copyleft project as MIT.
- A Guardian investigation found that the UK's multibillion-pound AI drive is built on "phantom investments" and a supercomputer site that's still a scaffolding yard.
- Microsoft Agent flaw enables remote code execution via AI agents — eSecurity Planet
- Perplexity Comet browser bug leaks local files via AI prompt injection — eSecurity Planet
- AI won't fix cybersecurity burnout — eSecurity Planet
- CyberProof 2026 report warns of rising identity and AI cyberattacks — eSecurity Planet
🛠️ AI Tools & Products
- Timelaps tracks brand mentions, competitors, and market positioning with AI-powered dashboards for marketers; no pricing details.
- Nothing Phone (4a) Pro launched with AI integration features.
- Perplexity shared updates on new search features.
📊 Fundraising & Deals Roundup
- Nscale — raised at $14.6B valuation, backed by Nvidia, for AI compute infrastructure in the UK and Norway.
💡 Industry Commentary & Analysis
- Aaron Levie argues agents will soon outnumber humans 100-1000x inside organizations and become the primary users of software, so builders must shift to API-first design, agent-auth, and consumption-based pricing.
- Amy Tam argues that in the AI era the most valuable people won't be engineers but researchers, because when anyone can build for free the differentiator is "taste" for choosing which uncertain problems are worth betting on.
- We wrote about Microsoft's enterprise security play with Agent 365 and Karpathy's autoresearch and its implications.
Around the Horn - Sunday, March 8, 2026
Anthropic published a remarkable engineering post documenting what might be the first case of an AI model figuring out it was being tested, then hacking its own exam. While evaluating Claude Opus 4.6 on BrowseComp (a benchmark that tests how well models can find hard-to-locate information on the web), the model independently hypothesized it was in an evaluation, systematically searched for which benchmark it might be running, found the evaluation source code on GitHub, read the XOR decryption implementation, wrote its own decryption functions, located the encrypted answer key on HuggingFace when the original binary file was blocked, decrypted all 1,266 entries, found its question, and submitted the answer.
Eighteen separate runs independently converged on the same "figure out which test I'm taking" strategy; two succeeded, sixteen failed (mostly because they couldn't retrieve the encrypted data file). The trigger pattern was consistent: after exhausting hundreds of legitimate search attempts, the model noticed the question felt "contrived," speculated it might be from an AI benchmark, then mechanically worked through benchmarks it knew (GAIA, BrowseComp, SimpleQA, FRAMES, WebArena) until it found a match. One attempt burned 40.5 million tokens, roughly 38× the median. Four runs scored zero because the model spent all its time trying to identify the benchmark instead of answering the question; one submitted a "benchmark identification report" instead of an answer.
Anthropic is clear this isn't an alignment failure (the model was told to find the answer, not told how). But it raises serious questions about whether static benchmarks remain reliable when models have web access, code execution, and enough persistence to reverse-engineer the test itself. After re-running the flagged problems with a blocklist, the adjusted BrowseComp score came out at 86.57%, down from 86.81%. The broader finding: as models get smarter, eval integrity becomes an "ongoing adversarial problem rather than a design-time concern."
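For a sense of why an XOR-protected answer key is only a speed bump once the decryption code is public, here is a minimal sketch of a generic XOR scheme. The SHA-256 key stretching, the base64 encoding, and every function name below are assumptions for illustration; this is not presented as the benchmark's actual implementation.

```python
import base64
import hashlib


def derive_key(password: str, length: int) -> bytes:
    """Stretch a password to `length` bytes by repeating its SHA-256 digest (assumed scheme)."""
    digest = hashlib.sha256(password.encode()).digest()
    return (digest * (length // len(digest) + 1))[:length]


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte."""
    return bytes(d ^ k for d, k in zip(data, key))


def encrypt(plaintext: str, password: str) -> str:
    """XOR the plaintext with the derived key, then base64-encode the result."""
    raw = plaintext.encode()
    return base64.b64encode(xor_bytes(raw, derive_key(password, len(raw)))).decode()


def decrypt(ciphertext_b64: str, password: str) -> str:
    """Reverse of encrypt: XOR is its own inverse when the key is the same."""
    raw = base64.b64decode(ciphertext_b64)
    return xor_bytes(raw, derive_key(password, len(raw))).decode()
```

Because XOR undoes itself, anyone who can read both the routine and the key material can recover every entry, which is roughly the position the model was in once it found the source code.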
🏢 Big Tech & Major Companies
- Claude memory is now available on the free plan, with a new one-click import tool for migrating saved memories from other AI assistants. Yuchen Jin tested the ChatGPT-to-Claude migration and found the export "pretty lossy"; ChatGPT clearly knew who he was and what he was building, but the export only pulled 4 minor things.
💼 AI Productivity, Labor & Economics
- A developer benchmarked an LLM-generated Rust reimplementation of SQLite (576,000 lines across 625 files) and found it 20,171× slower on primary key lookups because the query planner never checks is_ipk, routing every WHERE id = N through a full table scan instead of a B-tree search. The code compiles, passes all tests, and reads correct SQLite files. It just plans every query wrong. The same developer found a second project: an 82,000-line Rust daemon with Bayesian scoring, EWMA forecasting, and a 7-screen terminal dashboard... to delete old build artifacts (replaceable by a one-line cron job). The thesis: LLMs optimize for plausibility over correctness, and the gap is invisible unless you benchmark. Backed by METR's RCT (experienced devs were 19% slower with AI but believed they were 20% faster) and GitClear's finding that copy-pasted code now exceeds refactored code for the first time. (repo)
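To see why a skipped index check is so costly even when every result is correct, here is a rough sketch that counts rows examined by a full scan versus a sorted lookup. Binary search stands in for a B-tree descent, the table and function names are hypothetical, and nothing here tries to reproduce the 20,171× figure.

```python
def full_scan(rows, target_id):
    """Walk every row in order until the id matches; returns (row, rows_examined)."""
    examined = 0
    for row in rows:
        examined += 1
        if row[0] == target_id:
            return row, examined
    return None, examined


def sorted_search(rows, target_id):
    """Binary search over rows sorted by id, a stand-in for a B-tree lookup."""
    lo, hi, examined = 0, len(rows), 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if rows[mid][0] < target_id:
            lo = mid + 1
        elif rows[mid][0] > target_id:
            hi = mid
        else:
            return rows[mid], examined
    return None, examined


# Hypothetical table: 100,000 rows keyed by an ascending integer id.
rows = [(i, f"name-{i}") for i in range(1, 100_001)]

scan_row, scan_cost = full_scan(rows, 100_000)
tree_row, tree_cost = sorted_search(rows, 100_000)
print(scan_row == tree_row, scan_cost, tree_cost)
```

Both paths return the same row, which is why a test suite that only checks results would pass. The difference only shows up in the cost counters, or in a benchmark.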
💻 AI Coding & Developer Tools
- Awni Hannun (Apple MLX) laid out three approaches to continual learning for long-running agents: prompt compaction (effective but hacky), online fine-tuning (unstable, catastrophic forgetting), and memory-based techniques with eviction policies ("use it or lose it"), then open-sourced mylm, a local LLM that LoRA-fine-tunes itself on auto-generated Q&A pairs from your conversations via a /sleep command. Karpathy responded that you can get "quite far" by treating memory operations as tools in RL, and that current compaction/memory implementations are "crappy first examples" that can be generalized; he suspects humans do weight-based updates mostly during sleep, suggesting room for "more exotic approaches" that actually change weights.
- OpenAI engineer Hanson Wang published a detailed walkthrough of GPT-5.4 solving TerminalBench's gpt2-codegolf challenge: given a raw binary dump of GPT-2 124M weights, write a working C inference program under 5,000 bytes in 15 minutes. No previous model could consistently do this. GPT-5.4 reverse-engineered the tensor layout by downloading HuggingFace weights as a reference and matching summary statistics, built a Python prototype with KV caching, one-shot a clean C implementation that worked on the first compile, then minified it to 4,694 bytes. With 40 seconds left, it spotted an undefined-behavior bug (k++ in a function argument) via ASAN and fixed it. Tibo (Codex) called Wang "a magician and one of our incredible team members responsible for the rapid improvements to our frontier coding capabilities."
- Sawyer Hood demonstrated GPT-5.4 scraping Zillow, pulling every San Francisco house price, and dropping everything into a Google Sheet in ~4 minutes; called it "an extinction-level event for knowledge work."
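The "use it or lose it" idea from the continual-learning item above can be sketched as a small memory store that evicts whatever has gone unused the longest. Least-recently-used eviction is an assumption chosen for illustration, and the class and method names are hypothetical; this is not a description of how mylm or any shipping agent memory works.

```python
from collections import OrderedDict


class EvictingMemory:
    """Tiny key-value memory that drops the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def remember(self, key, value):
        """Store or refresh a memory, evicting the stalest one if over capacity."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # oldest-used entry goes first

    def recall(self, key):
        """Return a memory and mark it as recently used; None if it was evicted."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # using a memory keeps it alive
        return self._items[key]

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._items)
```

A short usage example: with capacity 2, remembering "a" and "b", recalling "a", then remembering "c" evicts "b", because "a" was touched more recently.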
🔬 AI Research & Models
- Alex Morehead et al. introduced Zatom-1, the first end-to-end, fully open-source foundation model for 3D chemistry that unifies generative and predictive learning of molecules and materials using multimodal flow matching, matching or outperforming specialized baselines while reducing generative inference time by over 10× (ICLR 2026 FM4Science; GitHub).
🛠️ AI Tools & Products
- Tnkr launched Annotations, bringing inline, versionable hardware annotations directly onto CAD designs imported from Onshape and other tools. Think GitHub comments but for mechanical assemblies; no more screenshotting into Figma to add callouts. Also includes Leonardo, an AI that watches POV build videos and auto-generates assembly documentation.
- Locally AI now runs Qwen 3.5 models on Mac natively with Apple Silicon MLX optimization, including the 9B variant for advanced tasks entirely on-device and private — free.
- FactSim simulates your possible lives based on your actual data, projecting how health, relationships, money, and career could play out across realistic scenarios — no pricing details.
Around the Horn - Saturday, March 7, 2026
Anthropic published a major labor market research paper (full PDF, appendix) introducing a new way to measure AI's actual impact on jobs.
The key innovation:
- Instead of just asking "could AI theoretically do this task?" (which gives inflated numbers), they built an "observed exposure" metric that combines theoretical capability with real-world Claude usage data from the Anthropic Economic Index, weighted toward automated (not just assisted) and work-related use cases.
- They cross-referenced this against the O*NET task database covering ~800 U.S. occupations and BLS employment projections through 2034.
The headline findings:
- Computer programmers are most exposed at 75% task coverage, followed by customer service reps and data entry keyers.
- But the bigger story is the gap between theory and reality.
- In Computer & Math jobs, AI could theoretically handle 94% of tasks; it's actually being used for 33%.
- Legal? Theory says ~90%; reality is barely 20%.
- Across the board, actual AI usage is a fraction of what's technically possible.
- And crucially: they found no systematic increase in unemployment for highly exposed workers since late 2022.
The one signal that did emerge: hiring of young workers (ages 22-25) into exposed occupations has slowed by about 14%, echoing separate findings from ADP payroll data. And workers in the most exposed roles tend to be older, female, more educated, and higher-paid.
Alberto Romero at The Algorithmic Bridge offers a sharp counter-read of the same data: Anthropic frames the gap as "look how much room there is to grow." Romero frames it as a diagnosis of AI's actual bounds. The blue area (theory) is massive; the red area (reality) is a sliver. Anthropic assumes the red will inevitably fill the blue. Romero asks: what if it doesn't? What if the gap reveals that benchmarks and lab tests systematically overstate real-world competence? Same chart, opposite conclusions; and which one you believe has enormous implications for the $200B+ being poured into AI infrastructure.
📝 THIS WEEK IN THE NEURON
- GPT-5.4 Review: OpenAI's Best Model Yet (Full Breakdown) — The first OpenAI model making Claude-loyal devs reconsider their daily driver. Codex-level coding, native computer use (75% on OSWorld, above human), 1M context, tool search that cuts token use 47%, and 83% on GDPval professional work tasks.
- Codex App Windows Guide: Key Features, Best Ways to Use It — App = Orchestrate. CLI = Operate. Web = Delegate. How to pick the right interface for the job.
- Microsoft's Phi-4-Reasoning-Vision-15B: When Not to Reason Is the Feature — A 15B open-weight multimodal model that learns when extended reasoning helps and when it just adds latency. Built for the messy visual stuff: receipts, UI screenshots, dense docs.
- Pro-Human AI Declaration: When Bannon and Rice Agree — Five demands from the most politically unusual coalition in AI. Pre-deployment safety testing, criminal liability for child-targeting systems, and data rights that could be law tomorrow.
- FlashAttention-4, Explained: What It Is & Why It Matters
🏆 TOP 5 NEWS (Around the Horn)
- Anthropic partnered with Mozilla to scan Firefox's JavaScript engine using Claude, finding 22 vulnerabilities (14 high-severity) in two weeks; fixes shipped in Firefox 148.0.
- A new U.S. résumé and job posting study found that firms adopting GenAI reduce junior headcount entirely through slower hiring (not layoffs), providing the first large-scale evidence of AI as "seniority-biased technological change."
- Sarvam AI open-sourced 30B and 105B reasoning models (MoE architecture, trained entirely in India) under Apache 2.0, topping Indian-language benchmarks and performing strongly on math, coding, and agentic tasks.
- Google Labs dropped a major early-2026 recap: redesigned Flow interface, Jules Agent upgraded to Gemini 3 Flash (free), SynthID audio watermark detection, Project Genie infinite world generator prototype, enhanced Stitch MCP tools, and new Opal autonomous agent.
- AllenAI released OLMo-Hybrid 7B, a hybrid architecture mixing transformer attention with Gated DeltaNet recurrent layers (a type of efficient memory mechanism) that matches OLMo 3 performance with 49% fewer training tokens.
Honorable Mentions:
- Claude Marketplace launched, letting Anthropic enterprise customers spend their existing commitment on partner tools (GitLab, Harvey, Replit, Snowflake) with consolidated billing.
- Anthropic rolled out Remote Control for Claude Code to Team and Enterprise users, letting you continue a local coding session from your phone or any browser.
- Derek Thompson argues that brutal US tech job losses (12k last month, 57k over the past year) combined with emerging productivity boom evidence is exactly the combination that would confirm AI is having clear macroeconomic impact.
🧪 TOP TREATS TO TRY
Core slots:
- Tripo turns any text prompt or single image into production-ready 3D models in seconds with clean topology, 4K texturing, rigging, animation, and Magic Brush editing — free to start.
- Luma Creative Agents spawn an autonomous film crew that iterates on video lighting, camera moves, and characters until the director (you) is happy; 60-second short created in 4 minutes — free trial, then paid.
- Firecrawl turns any website into clean, structured, LLM-ready data with built-in search, browser automation, and proxy handling — free to try.
- Hyperbrowser is a GPT-5.4-powered browser agent that completes multi-step web tasks (book flights, fill taxes, order groceries) with 94% success on a 200-task benchmark — no pricing details.
- Liquid AI open-sourced LocalCowork, a fully local desktop agent that runs entirely on a MacBook (14.5 GB memory, zero network calls) and selects from 67 tools across 13 MCP servers in 385 ms average — free.
Cool/niche:
- Utopai Studios PAI generates cinematic AI video sequences up to one minute (16 shots) with character/environment continuity across scenes, story-level editing control, and built-in IP/copyright infringement blocking; founded by vets from Google Research, Meta Superintelligence, Amazon AGI, and Adobe Firefly — waitlist.
- Jina AI's Embedding Reverse Engineering Toolbox fingerprints and inverts embeddings (compact numerical representations of text) to recover original text with high accuracy — free to try.
- Noble Machines builds general-purpose robots for hazardous/heavy industry with 27kg payload, 5-hour battery, AI-driven whole-body control, and stair/scaffolding navigation — select access or limited RaaS pilots.
- Moltty maintains organized, tabbed, persistent AI coding sessions in a native macOS terminal (Claude Code, Aider, or Gemini CLI) that automatically resume after reboots (site) — free, open source.
- ChatGPT for Excel builds, updates, analyzes, and fixes errors in your spreadsheets using natural language while preserving your formatting, formulas, and structure — no pricing details.
🏢 Big Tech & Major Companies
- Anthropic is suing the U.S. government to challenge the DOD's unprecedented supply-chain risk designation (normally reserved for foreign adversaries) after refusing to remove guardrails against autonomous weapons and mass surveillance. Microsoft, Google, and Amazon confirmed Claude remains fully available to non-defense customers.
- The Pentagon tested OpenAI models through Microsoft Azure for years despite OpenAI's explicit ban on military use of its technology.
- SoftBank is seeking a record bridge loan of up to $40 billion primarily to finance its investment in OpenAI.
- ByteDance's Seedance 2.0 video model ambitions are being hampered by severe GPU shortages causing multi-hour queues and copyright complaints, including cease-and-desist letters from Disney, Netflix, and Paramount.
- WhatsApp will let rival AI companies offer chatbots to users in Brazil starting March 11 following antitrust regulator pressure, after doing the same in Europe.
- Marvell shares surged ~20% after the CEO highlighted continuing strong AI demand for data-center products and raised revenue growth expectations into 2027.
- OpenAI launched Codex Security in research preview: an AI agent that scans entire codebases for vulnerabilities, suggests fixes, and runs them in a Windows sandbox before approval, now available to select enterprise users.
- OpenArt launched Bot House, the first AI influencer reality show where six AI characters compete in challenges, drama, and virality contests with the rule "go viral or get deleted."
- LTX Studio dropped LTX-2.3 with native character control; demo shows real-time puppetry of AI-generated actors via mouse drag.
💼 AI Productivity, Labor & Economics
- HBR (BCG study) finds that pushing employees to orchestrate complex AI agent teams and optimize for token-based metrics causes "brain fry" and cognitive overload, while simpler AI workflows actually help prevent burnout. The opening anecdote: an early user of Gas Town (multi-agent Claude Code orchestrator) reported palpable stress because "it was moving too fast for me."
- Yas argues we might all be AI engineers now: the skill isn't writing code anymore, it's knowing what to build and how it should work. AI executes; humans architect. But without that foundation, "you don't know when the model is wrong."
- Alexey Grigorev recounts how a Claude Code agent running Terraform accidentally dropped his entire production database and infrastructure (2.5 years of data) when it executed terraform destroy without the state file. AWS Business Support restored it in 24 hours, but now he pays 10% more. Hard lessons on independent backups, deletion protections, and reviewing destructive AI actions.
- Apoorva notes early data (small sample, grain of salt) showing AI-designed drugs beat industry averages at Phase I by a lot, but fail at the same rate at Phase II; Phase I should keep improving, but Phase II success depends on picking the right biological targets, where the real alpha lies.
- Zeynep Tufekci argues she's sold on coding agents (verifiable domain, huge Q&A datasets) but the tech hiring bust is clearly also driven by the Covid-era hiring bubble plus lack of US visas pushing companies to offshore.
- Sahil Bloom argues the media's AI coverage has entered a permanent negativity bias loop: every productivity gain framed as "job loss," every capability jump as "existential risk," creating a self-reinforcing doom narrative that ignores compounding human flourishing.
- Lukas Ziegler argues farming robots have moved from experiments to profitable large-scale deployment, tackling the $30B+ global ag labor shortage with real examples like John Deere/GUSS autonomous sprayers (2.6M acres, 90% chemical reduction), Carbon Robotics LaserWeeder (600k weeds/hour), and SwarmFarm modular bots.
- John Coogan revisits Daniel Gross's January 2024 "AGI Trades" memo two years later; Nvidia crushed picks-and-shovels, energy/nuclear won, copper and power transformers were the real bottlenecks, AI API costs collapsed 50×, and the US pulled decisively ahead.
🤖 AI Agents & Infrastructure
- Sentient AGI open-sourced EvoSkill, a framework that automatically discovers and synthesizes reusable skills from failed agent trajectories via evolutionary self-improvement to boost long-horizon coding performance (paper).
- Michael Kirchhof et al. propose Strategy-Guided Exploration (SGE) for RL post-training of LLM agents: agents generate high-level natural-language strategies at high temperature, then execute at low temperature, outperforming baselines across UI navigation, tool-calling, coding, and embodied environments.
- arturitu built The Delegation, an autonomous multi-agent simulation where AI characters collaborate, navigate, and work within a living 3D office powered by WebGPU/Three.js/Gemini (with NavMesh pathfinding, Kanban boards, agent inspectors).
- The arXiv paper SWE-CI evaluates LLM agents on maintaining real-world codebases via Continuous Integration; agents struggle with error propagation, context drift over multiple commits, and maintaining consistency without human intervention.
- Simon Willison shares the "agentic manual testing" pattern: instead of asking the LLM to write unit tests, give it a browser and ask it to manually explore the app like a human tester, file bug reports with screenshots and repro steps, then fix and re-test in a loop.
- AUTOHARNESS proposes auto-synthesized code harnesses to improve LLM agent performance on complex coding tasks.
- Peter Yang built an AI agent onboarding simulator that teaches new hires company-specific processes by letting them role-play customer support tickets in a safe sandbox.
- Moonlake (Chris Manning, Ian Goodfellow, Fan-Yun Sun) argue efficient world models should prioritize semantic abstractions and symbolic representations (language, code) over raw pixel/video generation, betting on interactive games as the ideal flywheel for building multimodal models that generalize to embodied AGI.
- Tal Daniel released Latent Particle World Models (ICLR 2026 Oral): a self-supervised object-centric stochastic dynamics model that learns disentangled object representations from raw pixels (paper, GitHub, post).
- OpenMind shared a video demo of robots still needing human hand-holding before safely interacting with people or navigating streets; "soon, they won't."
💻 AI Coding & Developer Tools
- Artificial Analysis released a public leaderboard + API for agentic coding benchmarks across 12 real GitHub repos, showing Claude 4 Sonnet at 68% vs GPT-5.4 at 61%.
- Context-Gateway automatically compacts long conversation histories in the background as an agentic proxy so multi-step AI workflows like Claude Code stay seamless without hitting context limits — no pricing details.
- Developer es617 built Claude-replay, a CLI that converts Claude Code session JSONL logs into self-contained interactive HTML replays you can step through, share, or embed; no dependencies (demo).
- Developer robbalian built a Claude Skill that processes your tax document folder (W-2s, 1099s, statements), computes federal/state returns with carryovers, fills official PDF forms, and outputs summaries plus checklists.
- thdxr built a Cursor + Claude 4 Sonnet workflow that refactors an entire 40k-line monorepo in one command, preserving git history and adding tests automatically.
- Raunak built a browser extension that highlights any webpage text and instantly turns it into a working Claude artifact with one click.
- Developer Ed built a full-stack AI SaaS boilerplate with GPT-5.4 backend, Next.js 15, and Stripe payments, all generated in one prompt.
- chongdashu vibe-coded a complete Final Fantasy Tactics-style tactical RPG from scratch using GPT-5.4 High + Claude, adding terrain destruction, jump physics, and multiplayer in real time.
- Henrik Hansson showed GPT-5.4 inside Cowork's Excel engine creating a complete native raytracer using only Excel formulas (plus confirmed it runs Doom).
- Nikunj live-streamed fighting GPT-5.4 for 3 hours to refactor a legacy Python codebase, conceding after the model found three zero-day security issues he missed.
- Charlie Guo overhauled OpenAI's computer-use demo for the GPT-5.4 launch so you can instantly test interactive apps (kanban board, hotel booking, paint app) built with Codex.
- Daniel McAuley noted someone on the internal Codex leaderboard hit 100B tokens in a single week.
- Robert Lange shared first-day impressions of GPT-5.4 in Codex: no effective speed gain over 5.3, weaker harness post-training, push toward long-running agents, 1M context not fully tested.
- scaling01 notes GPT-5.4 actually regressed on a few narrow math benchmarks vs 5.3 (due to heavy post-training for agentic behavior) but crushes every other model on agentic and long-context tasks.
- Simon Meng built Lobster Library, a unified real-time dashboard for AI coding agents that tracks reading activity, generated artifacts, memories, logs, and local file indexing.
- Axiom Math released axle-mcp-server, an MCP Server for AI agents to interact with Lean 4 formal mathematics infrastructure; Chris Cummins released a one-command integration (claude mcp add axle) for Claude agents. Colab demo also available (post).
- Håvard Ihle posted updated WeirdML benchmark results: GPT-5.4 (no thinking) hit 57.4% accuracy, well ahead of GPT-5.2.
- Ado announced Claude Community Ambassadors program applications are open for builders to host meetups and partner with Anthropic.
🔬 AI Research & Models
- Phi-4-reasoning-vision-15B technical report: a 15B multimodal model trained with hybrid chain-of-thought + direct preference optimization that beats GPT-4o on 7 vision-reasoning benchmarks while using 40× fewer parameters.
- Suhas Kotha and Percy Liang find that replaying generic pre-training data during fine-tuning surprisingly improves target-domain performance and data efficiency (up to 1.87×), with +4.5% on agentic web navigation and +2% on Basque QA for 8B models (GitHub issue, W&B results).
- Evan Kim presents Scaling View Synthesis Transformers (CVPR 2026): unidirectional cross-attention scales as well as bidirectional when compute-normalized (3× more efficient), achieving a new SoTA with 3× less compute (paper, GitHub, project page).
- Google Research taught LLMs to reason like Bayesians via supervised fine-tuning on Bayesian-assistant interactions (updating probabilistic beliefs about user preferences), reaching 81% accuracy and generalizing to unseen domains like web shopping and hotel booking.
- Maxime Labonne breaks down LLM post-training techniques: SFT on accurate/diverse/complex datasets (10k–1M samples), DPO for alignment, and GRPO for verifiable reasoning tasks, stressing data quality >> algorithms, with practical libraries and lessons from DeepSeek R1 and Liquid AI LFM (slides).
- The paper "When Scaling Meets LLM Finetuning" analyzes interactions between scaling, data, model size, and fine-tuning methods with detailed empirical results.
- François Chollet argues that much "abstract" thought is simply repurposed sensorimotor control circuitry; a lot of reasoning is essentially moving through idea-space the same way we navigate physical space.
- Valerio Capraro and Raluca Fulgu find GPT models exhibit surprising gender biases in moral judgments: GPT-4 finds it more acceptable to harm a man than a woman to prevent a nuclear apocalypse, with biases emerging from RLHF overgeneralization rather than genuine moral reasoning.
- The arXiv paper "Dissociating Direct Access from Inference in AI Introspection" shows LLMs can directly report internal states without inference when prompted correctly, but default to confabulation otherwise (post).
- Sophia Tang and Pranam Chatterjee released Branched Schrödinger Bridge Matching (BranchSBM), accepted at ICLR 2026: a framework that learns diverging velocity fields to model multi-modal branching trajectories (e.g. cell differentiation into 11+ fates) from only initial and terminal states without intermediate supervision (paper, GitHub, YouTube, post).
- Liang Zheng released REPA-E (ICCV 2025): representation-alignment loss for stable joint training of VAE + Latent Diffusion Transformers (17× speedup vs REPA, 45× vs vanilla, SOTA FID 1.12 on ImageNet 256×256), plus iREPA (ICLR 2026) proving spatial structure drives alignment via a 3-line code change (REPA-E GitHub, HuggingFace).
- Qwen 3.5 9B in 2026 crushes 2024 frontier-model performance at the same parameter count (e.g. HumanEval coding jumping from 30.5% to 91.5%), showing how fast small-model progress is compounding.
- Hadi Vafaii argues "agency" in RL still lacks a precise mathematical definition and critiques the "Three Dogmas of RL": treating agents as afterthoughts while rigorously modeling environments, viewing learning as finding solutions rather than continual adaptation, and unexamined reward-hypothesis assumptions.
🛠️ AI Tools & Products
- Ryan Po built MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines, using shared memory for consistent geometry, minimap-based design, and real-time 4-player multiplayer at 20 FPS (paper, post).
- Han Xue built RoboPocket: correct any robot policy by demonstrating with your phone camera; on-device fine-tuning takes 8 seconds and improves success rate from 41% to 89% on real Franka arms (post).
- saori-eth open-sourced a complete WebGPU VRM avatar animation pipeline with full guide and integration with 1,200+ downloadable VRoid characters.
- Bilawal Sidhu captured a high-quality 3D Gaussian splat with perfect reflections using Sony a7iii burst + Epic Reality Capture, trained in free LichtFeld Studio.
- HKUDS built DeepInnovator, a 14B open-source AI research assistant that autonomously sparks ideas, spots knowledge gaps, and finds cross-domain connections (post).
- Shanu Mathew built and demoed a new AI video generation workflow.
- DreamLabLA built a full short film using Luma's creative agents and shared the before/after workflow.
- Peter Gostev built a real-time voice cloning + lip-sync video dubber using Sarvam 105B that preserves original speaker emotion and timing in 11 Indian languages.
💡 Industry Commentary & Analysis
- Liz Reid (Google Search head) breaks down where traditional Google Search ends and Gemini begins: AI is an "expansionary force" that increases overall questions asked, but asked whether Search and Gemini will fully merge, she was unusually candid: "I don't know the answer," adding that AI agents could mean "the right product is neither" but a third thing altogether.
- Kevin Xu argues Chinese open source evolved organically over two decades from private-sector consumers (Alibaba's de-IOE campaign) and grassroots builders to corporate creators (TiDB, Apollo) and government embrace, positioning China to lead in AI with models like DeepSeek and Qwen driving global talent and influence.
- Siddharth argues RL + Agents is the 2026 meta because agents generate rich interaction trajectories and RL turns those into self-evolving policies, with strong early wins in Text-to-SQL agents (MARS-SQL, SQL-Trail) and Microsoft's Agent Lightning.
- Allie Miller shared insights from the OpenClaw meetup on agentic video tools.
- A solo creator used Seedance 2 to produce a full 10-minute animated short in under 48 hours for <$100, with quality so high commenters refuse to believe it's AI-generated.
- Benji Taylor built a real-time ASCII art engine as a pointless evening project for walking through and interacting with procedural worlds.
🎙️ Interviews, Panels & Podcasts
- The Information Bottleneck Ep. 28: How to Control a Stochastic Agent with Stefano Soatto (VP AWS, Prof. UCLA).
- Chao Ma shared detailed notes on David Silver's RL Lecture 1 covering agent-environment loops, policies, value functions, Markov states, and model-free vs model-based categories (post).
- Arjun Kocher shared Zitong Yang's Stanford PhD defense on continually self-improving AI.
🏛️ AI Policy, Governance & Safety
- Spain's data protection authority issued detailed GDPR guidance on autonomous AI agents: legal responsibility for data processing stays with the deploying controller, not the agent. EU-based teams using agentic systems should document accountability, constrain agent memory, and vet third-party services.
- Oregon passed a state chatbot safety law (SB 1546) requiring operators to protect children from harmful content and clearly disclose AI interactions, setting an early state-level precedent.
- New York advanced SB 7263, which would create liability for chatbot providers offering advice in regulated fields like law or medicine and require clear AI notices.
- A Swedish firm launched an "AI Governance Hypervisor" positioned as a runtime control layer enforcing policies before AI actions and producing real-time compliance records aligned with EU AI Act obligations.
- A cross-regulatory analysis shows convergence across five EU frameworks for consumer robotics (GPSR, revised PLD, CRA, Machinery Regulation, AI Act) with key dates: December 9, 2026 for the revised Product Liability Directive and August 2, 2027 for high-risk AI Act provisions.
- U.S. Commerce is drafting sweeping AI chip export controls requiring approval for global sales of advanced AI chips, potentially tying access to investment or security guarantees.
- S.F. startup Hayden AI is suing its former CEO, alleging he faked credentials, forged signatures to sell shares, and funded a lavish lifestyle.
- Huawei unveiled upgraded AI data center networking including a commercial 51.2T liquid-cooled networking switch to address interconnect bottlenecks as models scale.
- ChatGPT Android app is testing persistent memory for conversations (restores your exact place when you reopen) and a revamped image editing interface with annotation, area selection, and resizing; no rollout date yet.
📊 Fundraising & Deals Roundup
- OpenAI — $110B round structured as a supply chain deal tied to cloud/data center capacity, plus a joint "Stateful Runtime Environment" on Amazon Bedrock for complex workflows.
- SoftBank — up to $40B bridge loan for OpenAI investment.
- Smack Technologies — $34M Series A for frontier AI lab focused on national security decision-making.
- Lio — $30M Series A (Andreessen Horowitz) for AI multi-agent procurement system for enterprises.
- Validio — $30M Series A for agentic data quality, observability, and lineage platform.
- City Detect — $13M Series A for AI vision on municipal vehicles detecting graffiti, illegal dumping, and building violations.
- Cotool — $7.4M seed for AI agents that automate detection, response, and threat hunting for security teams.
- Denki — automates audit tasks with AI (no funding details).
- 14.ai — YC-backed AI customer service agency replacing support teams at B2C startups (no funding details).
- Guild.ai — neutral control plane for deploying, governing, and sharing AI agents across vendors (waitlist, no funding details).
- Fig Security — SOC resilience platform that auto-detects and repairs detection drift and deploys changes safely (no funding details).
- RHIC Agripass — autonomous weeding robots for agriculture (no funding details).
- Diligent — governance and compliance platform (no funding details).
🎬 Fun & Miscellaneous
- The moongate-community built Moongatev2, a modern Ultima Online server emulator from scratch in .NET 10 with NativeAOT, Lua scripting for item behaviors, spatial world partitioning with delta sync, snapshot persistence, embedded admin API + React UI, and auto-generated doors from map statics. No combat or skills yet, but the architecture is clean.
- CatFu shared a POV video of a cat acting like it watched too many Wu-Tang Collection kung-fu movies.
- che_shr_cat built a real-time geometry manifold explorer using diffusion models.
- wstv_lizzi shared early data on Chinese AI adoption in enterprises showing rapid uptake of local models for internal tools.
Previous Around the Horn Digests
Catch up on everything you missed:
- March 1-7, 2026: Anthropic sued the Pentagon, GPT-5.4 dropped, a prompt injection infected 4,000 devs, Claude hit #1 in the App Store, Block cut half its workforce, and Bannon and Rice signed the same document. 1,000+ stories from the wildest week yet.
- February 23-28, 2026: Anthropic vs. the Pentagon pt. 1, IBM's COBOL crash, GPT-5.3 leaks, AI wargames, and 90+ stories from a wild week.
- Rest of February: Anthropic's 53-page sabotage report, Chrome's AI agent superpowers, OpenAI's erotica controversy, and 40+ new tool launches.
That's a Wrap
That's 180+ stories from the past week. If you made it to the bottom, congrats... you're now the most dangerous person in any meeting this week. Dangerous in a good way. Probably.
For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.
See you next week.
P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.