Welcome to the Around the Horn Digest, where we round up every AI story we tracked this week into one giant, scrollable, bookmark-worthy post. Think of it as your cheat sheet for the next time someone at work asks "so what's new in AI?" and you want to sound like you actually know. Because you will.
This week was unhinged. The Trump administration banned Anthropic from government contracts for refusing to let the Pentagon use Claude without safety guardrails; OpenAI signed its own Pentagon deal hours later. Nvidia revealed it's building a Groq-powered inference chip. Google inked a multibillion-dollar TPU deal with Meta. A viral doomsday essay about AI replacing white-collar jobs triggered an 800-point Dow drop, and Block slashed nearly half its workforce. Oh, and Claude beat ChatGPT in US app downloads... because the government tried to blacklist it. You can't buy marketing like that. (Literally. The Pentagon did it for free.)
Let's get into it.
Catch up on previous digests: February 23-28 | Rest of February
Around the Horn Digest — Saturday, March 7, 2026
Anthropic published a major labor market research paper (full PDF, appendix) introducing a new way to measure AI's actual impact on jobs.
The key innovation:
- Instead of just asking "could AI theoretically do this task?" (which gives inflated numbers), they built an "observed exposure" metric that combines theoretical capability with real-world Claude usage data from the Anthropic Economic Index, weighted toward automated (not just assisted) and work-related use cases.
- They cross-referenced this against the O*NET task database covering ~800 U.S. occupations and BLS employment projections through 2034.
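To make the idea concrete, here is a rough sketch of how an "observed exposure" style score could be built from task-level data. This is not the paper's formula: the `Task` fields, the `assist_weight` parameter, and the way the pieces are multiplied together are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    time_share: float       # fraction of the occupation's work time (assumed to sum to 1)
    feasible: bool          # could an LLM theoretically do this task?
    usage_share: float      # 0..1, hypothetical share of observed work-related usage
    automated_share: float  # 0..1, hypothetical fraction of that usage that is automated


def theoretical_exposure(tasks):
    """Share of work time spent on tasks an LLM could theoretically handle."""
    return sum(t.time_share for t in tasks if t.feasible)


def observed_exposure(tasks, assist_weight=0.5):
    """Theoretical exposure scaled down by real usage, counting automated use
    fully and assisted use at a discount (assist_weight is an assumption)."""
    total = 0.0
    for t in tasks:
        if not t.feasible:
            continue
        mode_weight = t.automated_share + assist_weight * (1 - t.automated_share)
        total += t.time_share * t.usage_share * mode_weight
    return total


# Made-up occupation, purely for illustration.
tasks = [
    Task("write code", 0.5, True, 0.8, 0.5),
    Task("review designs", 0.3, True, 0.2, 0.0),
    Task("attend meetings", 0.2, False, 0.0, 0.0),
]
print(theoretical_exposure(tasks), observed_exposure(tasks))
```

The point of the sketch is only the shape of the calculation: the observed score can never exceed the theoretical one, and it shrinks further when usage is mostly assisted rather than automated.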
The headline findings:
- Computer programmers are most exposed at 75% task coverage, followed by customer service reps and data entry keyers.
- But the bigger story is the gap between theory and reality.
- In Computer & Math jobs, AI could theoretically handle 94% of tasks; it's actually being used for 33%.
- Legal? Theory says ~90%; reality is barely 20%.
- Across the board, actual AI usage is a fraction of what's technically possible.
- And crucially: they found no systematic increase in unemployment for highly exposed workers since late 2022.
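For a sense of scale, the theory-versus-reality gap can be worked out directly from the figures quoted above. The helper name `realized_share` is ours, and the Legal numbers are approximate because the digest only gives "~90%" and "barely 20%".

```python
# Figures quoted in the digest: (theoretical %, observed %).
coverage = {
    "Computer & Math": (94, 33),
    "Legal": (90, 20),  # approximate: the source says "~90%" and "barely 20%"
}


def realized_share(theoretical, observed):
    """Fraction of theoretically automatable tasks that show up in actual usage."""
    return observed / theoretical


for field, (theory, actual) in coverage.items():
    gap = theory - actual
    share = realized_share(theory, actual)
    print(f"{field}: gap of {gap} points, {share:.0%} of theoretical coverage realized")
```

On these numbers, Computer & Math realizes roughly 35% of its theoretical coverage and Legal roughly 22%.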
The one signal that did emerge: hiring of young workers (ages 22-25) into exposed occupations has slowed by about 14%, echoing separate findings from ADP payroll data. And workers in the most exposed roles tend to be older, female, more educated, and higher-paid.
Alberto Romero at The Algorithmic Bridge offers a sharp counter-read of the same data: Anthropic frames the gap as "look how much room there is to grow." Romero frames it as a diagnosis of AI's actual bounds. The blue area (theory) is massive; the red area (reality) is a sliver. Anthropic assumes the red will inevitably fill the blue. Romero asks: what if it doesn't? What if the gap reveals that benchmarks and lab tests systematically overstate real-world competence? Same chart, opposite conclusions; and which one you believe has enormous implications for the $200B+ being poured into AI infrastructure.
📝 THIS WEEK IN THE NEURON
- GPT-5.4 Review: OpenAI's Best Model Yet (Full Breakdown) — The first OpenAI model making Claude-loyal devs reconsider their daily driver. Codex-level coding, native computer use (75% on OSWorld, above human), 1M context, tool search that cuts token use 47%, and 83% on GDPval professional work tasks.
- Codex App Windows Guide: Key Features, Best Ways to Use It — App = Orchestrate. CLI = Operate. Web = Delegate. How to pick the right interface for the job.
- Microsoft's Phi-4-Reasoning-Vision-15B: When Not to Reason Is the Feature — A 15B open-weight multimodal model that learns when extended reasoning helps and when it just adds latency. Built for the messy visual stuff: receipts, UI screenshots, dense docs.
- Pro-Human AI Declaration: When Bannon and Rice Agree — Five demands from the most politically unusual coalition in AI. Pre-deployment safety testing, criminal liability for child-targeting systems, and data rights that could be law tomorrow.
- FlashAttention-4, Explained: What It Is & Why It Matters
🏆 TOP 5 NEWS (Around the Horn)
- Anthropic partnered with Mozilla to scan Firefox's JavaScript engine using Claude, finding 22 vulnerabilities (14 high-severity) in two weeks; fixes shipped in Firefox 148.0.
- A new U.S. résumé and job posting study found that firms adopting GenAI reduce junior headcount entirely through slower hiring (not layoffs), providing the first large-scale evidence of AI as "seniority-biased technological change."
- Sarvam AI open-sourced 30B and 105B reasoning models (MoE architecture, trained entirely in India) under Apache 2.0, topping Indian-language benchmarks and performing strongly on math, coding, and agentic tasks.
- Google Labs dropped a major early-2026 recap: redesigned Flow interface, Jules Agent upgraded to Gemini 3 Flash (free), SynthID audio watermark detection, Project Genie infinite world generator prototype, enhanced Stitch MCP tools, and new Opal autonomous agent.
- AllenAI released OLMo-Hybrid 7B, a hybrid architecture mixing transformer attention with Gated DeltaNet recurrent layers (a type of efficient memory mechanism) that matches OLMo 3 performance with 49% fewer training tokens.
Honorable Mentions:
- Claude Marketplace launched, letting Anthropic enterprise customers spend their existing commitment on partner tools (GitLab, Harvey, Replit, Snowflake) with consolidated billing.
- Anthropic rolled out Remote Control for Claude Code to Team and Enterprise users, letting you continue a local coding session from your phone or any browser.
- Derek Thompson argues that brutal US tech job losses (12k last month, 57k over the past year), together with emerging evidence of a productivity boom, are exactly the combination that would confirm AI is having a clear macroeconomic impact.
🧪 TOP TREATS TO TRY
Core slots:
- Tripo turns any text prompt or single image into production-ready 3D models in seconds with clean topology, 4K texturing, rigging, animation, and Magic Brush editing — free to start.
- Luma Creative Agents spawn an autonomous film crew that iterates on video lighting, camera moves, and characters until the director (you) is happy; 60-second short created in 4 minutes — free trial, then paid.
- Firecrawl turns any website into clean, structured, LLM-ready data with built-in search, browser automation, and proxy handling — free to try.
- Hyperbrowser is a GPT-5.4-powered browser agent that completes multi-step web tasks (book flights, fill taxes, order groceries) with 94% success on a 200-task benchmark — no pricing details.
- Liquid AI open-sourced LocalCowork, a fully local desktop agent that runs entirely on a MacBook (14.5 GB memory, zero network calls) and selects from 67 tools across 13 MCP servers in 385 ms average — free.
Cool/niche:
- Utopai Studios PAI generates cinematic AI video sequences up to one minute (16 shots) with character/environment continuity across scenes, story-level editing control, and built-in IP/copyright infringement blocking; founded by vets from Google Research, Meta Superintelligence, Amazon AGI, and Adobe Firefly — waitlist.
- Jina AI's Embedding Reverse Engineering Toolbox fingerprints and inverts embeddings (compact numerical representations of text) to recover original text with high accuracy — free to try.
- Noble Machines builds general-purpose robots for hazardous/heavy industry with 27kg payload, 5-hour battery, AI-driven whole-body control, and stair/scaffolding navigation — select access or limited RaaS pilots.
- Moltty maintains organized, tabbed, persistent AI coding sessions in a native macOS terminal (Claude Code, Aider, or Gemini CLI) that automatically resume after reboots (site) — free, open source.
- ChatGPT for Excel builds, updates, analyzes, and fixes errors in your spreadsheets using natural language while preserving your formatting, formulas, and structure — no pricing details.
🏢 Big Tech & Major Companies
- Anthropic is suing the U.S. government to challenge the DOD's unprecedented supply-chain risk designation (normally reserved for foreign adversaries) after refusing to remove guardrails against autonomous weapons and mass surveillance. Microsoft, Google, and Amazon confirmed Claude remains fully available to non-defense customers.
- The Pentagon tested OpenAI models through Microsoft Azure for years despite OpenAI's explicit ban on military use of its technology.
- SoftBank is seeking a record bridge loan of up to $40 billion primarily to finance its investment in OpenAI.
- ByteDance's Seedance 2.0 video model ambitions are being hampered on two fronts: severe GPU shortages that are causing multi-hour queues, and copyright complaints, including cease-and-desist letters from Disney, Netflix, and Paramount.
- WhatsApp will let rival AI companies offer chatbots to users in Brazil starting March 11 following antitrust regulator pressure, after doing the same in Europe.
- Marvell shares surged ~20% after the CEO highlighted continuing strong AI demand for data-center products and raised revenue growth expectations into 2027.
- OpenAI launched Codex Security in research preview: an AI agent that scans entire codebases for vulnerabilities, suggests fixes, and runs them in a Windows sandbox before approval, now available to select enterprise users.
- OpenArt launched Bot House, the first AI influencer reality show where six AI characters compete in challenges, drama, and virality contests with the rule "go viral or get deleted."
- LTX Studio dropped LTX-2.3 with native character control; demo shows real-time puppetry of AI-generated actors via mouse drag.
💼 AI Productivity, Labor & Economics
- HBR (BCG study) finds that pushing employees to orchestrate complex AI agent teams and optimize for token-based metrics causes "brain fry" and cognitive overload, while simpler AI workflows actually help prevent burnout. The opening anecdote: an early user of Gas Town (multi-agent Claude Code orchestrator) reported palpable stress because "it was moving too fast for me."
- Yas argues we might all be AI engineers now: the skill isn't writing code anymore, it's knowing what to build and how it should work. AI executes; humans architect. But without that foundation, "you don't know when the model is wrong."
- Alexey Grigorev recounts how a Claude Code agent running Terraform accidentally dropped his entire production database and infrastructure (2.5 years of data) when it executed terraform destroy without the state file. AWS Business Support restored it in 24 hours, but now he pays 10% more. Hard lessons on independent backups, deletion protections, and reviewing destructive AI actions.
- Apoorva notes early data (small sample, grain of salt) showing AI-designed drugs beat industry averages at Phase I by a lot, but fail at the same rate at Phase II; Phase I should keep improving, but Phase II success depends on picking the right biological targets, where the real alpha lies.
- Zeynep Tufekci argues she's sold on coding agents (verifiable domain, huge Q&A datasets) but the tech hiring bust is clearly also driven by the Covid-era hiring bubble plus lack of US visas pushing companies to offshore.
- Sahil Bloom argues the media's AI coverage has entered a permanent negativity bias loop: every productivity gain framed as "job loss," every capability jump as "existential risk," creating a self-reinforcing doom narrative that ignores compounding human flourishing.
- Lukas Ziegler argues farming robots have moved from experiments to profitable large-scale deployment, tackling the $30B+ global ag labor shortage with real examples like John Deere/GUSS autonomous sprayers (2.6M acres, 90% chemical reduction), Carbon Robotics LaserWeeder (600k weeds/hour), and SwarmFarm modular bots.
- John Coogan revisits Daniel Gross's January 2024 "AGI Trades" memo two years later; Nvidia crushed picks-and-shovels, energy/nuclear won, copper and power transformers were the real bottlenecks, AI API costs collapsed 50×, and the US pulled decisively ahead.
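On the terraform destroy incident above: one way to act on the "review destructive AI actions" lesson is to put a human approval gate in front of whatever runs an agent's shell commands. The sketch below is a minimal illustration, not how Claude Code or Terraform work internally. The pattern list and the `needs_human_approval` and `run_agent_command` helpers are hypothetical, and a real deny-list would need to be far broader.

```python
import re

# Illustrative patterns only; extend for the tools your agents can reach.
DESTRUCTIVE_PATTERNS = [
    r"\bterraform\s+destroy\b",
    r"\bterraform\s+apply\b.*-destroy\b",
    r"\brm\s+-(rf|fr)\b",
    r"\bdrop\s+(database|table)\b",
]


def needs_human_approval(command: str) -> bool:
    """Return True if the command matches any known-destructive pattern."""
    lowered = command.lower()
    return any(re.search(pattern, lowered) for pattern in DESTRUCTIVE_PATTERNS)


def run_agent_command(command, execute, approve):
    """Run `command` via `execute`, but ask `approve` first if it looks destructive.

    `execute` and `approve` are caller-supplied callables (hypothetical hooks):
    `approve(command)` should return True only after a human has signed off.
    """
    if needs_human_approval(command) and not approve(command):
        return "blocked"
    execute(command)
    return "executed"


if __name__ == "__main__":
    ran = []
    print(run_agent_command("terraform plan", ran.append, lambda c: False))
    print(run_agent_command("terraform destroy -auto-approve", ran.append, lambda c: False))
```

A gate like this complements, rather than replaces, independent backups and provider-side deletion protection, since pattern matching will miss destructive commands it has never seen.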
🤖 AI Agents & Infrastructure
- Sentient AGI open-sourced EvoSkill, a framework that automatically discovers and synthesizes reusable skills from failed agent trajectories via evolutionary self-improvement to boost long-horizon coding performance (paper).
- Michael Kirchhof et al. propose Strategy-Guided Exploration (SGE) for RL post-training of LLM agents: agents generate high-level natural-language strategies at high temperature, then execute at low temperature, outperforming baselines across UI navigation, tool-calling, coding, and embodied environments.
- arturitu built The Delegation, an autonomous multi-agent simulation where AI characters collaborate, navigate, and work within a living 3D office powered by WebGPU/Three.js/Gemini (with NavMesh pathfinding, Kanban boards, agent inspectors).
- The arXiv paper SWE-CI evaluates LLM agents on maintaining real-world codebases via Continuous Integration; agents struggle with error propagation, context drift over multiple commits, and maintaining consistency without human intervention.
- Simon Willison shares the "agentic manual testing" pattern: instead of asking the LLM to write unit tests, give it a browser and ask it to manually explore the app like a human tester, file bug reports with screenshots and repro steps, then fix and re-test in a loop.
- AUTOHARNESS proposes auto-synthesized code harnesses to improve LLM agent performance on complex coding tasks.
- Peter Yang built an AI agent onboarding simulator that teaches new hires company-specific processes by letting them role-play customer support tickets in a safe sandbox.
- Moonlake (Chris Manning, Ian Goodfellow, Fan-Yun Sun) argue efficient world models should prioritize semantic abstractions and symbolic representations (language, code) over raw pixel/video generation, betting on interactive games as the ideal flywheel for building multimodal models that generalize to embodied AGI.
- Tal Daniel released Latent Particle World Models (ICLR 2026 Oral): a self-supervised object-centric stochastic dynamics model that learns disentangled object representations from raw pixels (paper, GitHub, post).
- OpenMind shared a video demo of robots still needing human hand-holding before safely interacting with people or navigating streets; "soon, they won't."
- Artificial Analysis released a public leaderboard + API for agentic coding benchmarks across 12 real GitHub repos, showing Claude 4 Sonnet at 68% vs GPT-5.4 at 61%.
- Context-Gateway automatically compacts long conversation histories in the background as an agentic proxy so multi-step AI workflows like Claude Code stay seamless without hitting context limits — no pricing details.
- Developer es617 built Claude-replay, a CLI that converts Claude Code session JSONL logs into self-contained interactive HTML replays you can step through, share, or embed; no dependencies (demo).
- Developer robbalian built a Claude Skill that processes your tax document folder (W-2s, 1099s, statements), computes federal/state returns with carryovers, fills official PDF forms, and outputs summaries plus checklists.
- thdxr built a Cursor + Claude 4 Sonnet workflow that refactors an entire 40k-line monorepo in one command, preserving git history and adding tests automatically.
- Raunak built a browser extension that highlights any webpage text and instantly turns it into a working Claude artifact with one click.
- Developer Ed built a full-stack AI SaaS boilerplate with GPT-5.4 backend, Next.js 15, and Stripe payments, all generated in one prompt.
- chongdashu vibe-coded a complete Final Fantasy Tactics-style tactical RPG from scratch using GPT-5.4 High + Claude, adding terrain destruction, jump physics, and multiplayer in real time.
- Henrik Hansson showed GPT-5.4 inside Cowork's Excel engine creating a complete native raytracer using only Excel formulas (plus confirmed it runs Doom).
- Nikunj live-streamed fighting GPT-5.4 for 3 hours to refactor a legacy Python codebase, conceding after the model found three zero-day security issues he missed.
- Charlie Guo overhauled OpenAI's computer-use demo for the GPT-5.4 launch so you can instantly test interactive apps (kanban board, hotel booking, paint app) built with Codex.
- Daniel McAuley noted someone on the internal Codex leaderboard hit 100B tokens in a single week.
- Robert Lange shared first-day impressions of GPT-5.4 in Codex: no effective speed gain over 5.3, weaker harness post-training, push toward long-running agents, 1M context not fully tested.
- scaling01 notes GPT-5.4 actually regressed on a few narrow math benchmarks vs 5.3 (due to heavy post-training for agentic behavior) but crushes every other model on agentic and long-context tasks.
- Simon Meng built Lobster Library, a unified real-time dashboard for AI coding agents that tracks reading activity, generated artifacts, memories, logs, and local file indexing.
- Axiom Math released axle-mcp-server, an MCP Server for AI agents to interact with Lean 4 formal mathematics infrastructure; Chris Cummins released a one-command integration (claude mcp add axle) for Claude agents. Colab demo also available (post).
- Håvard Ihle posted updated WeirdML benchmark results: GPT-5.4 (no thinking) hit 57.4% accuracy, well ahead of GPT-5.2.
- Ado announced Claude Community Ambassadors program applications are open for builders to host meetups and partner with Anthropic.
🔬 AI Research & Models
- Phi-4-reasoning-vision-15B technical report: a 15B multimodal model trained with hybrid chain-of-thought + direct preference optimization that beats GPT-4o on 7 vision-reasoning benchmarks while using 40× fewer parameters.
- Suhas Kotha and Percy Liang find that replaying generic pre-training data during fine-tuning surprisingly improves target-domain performance and data efficiency (up to 1.87×), with +4.5% on agentic web navigation and +2% on Basque QA for 8B models (GitHub issue, W&B results).
- Evan Kim presents Scaling View Synthesis Transformers (CVPR 2026): unidirectional cross-attention scales as well as bidirectional when compute-normalized (3× more efficient), achieving a new SoTA with 3× less compute (paper, GitHub, project page).
- Google Research taught LLMs to reason like Bayesians via supervised fine-tuning on Bayesian-assistant interactions (updating probabilistic beliefs about user preferences), reaching 81% accuracy and generalizing to unseen domains like web shopping and hotel booking.
- Maxime Labonne breaks down LLM post-training techniques: SFT on accurate/diverse/complex datasets (10k–1M samples), DPO for alignment, and GRPO for verifiable reasoning tasks, stressing data quality >> algorithms, with practical libraries and lessons from DeepSeek R1 and Liquid AI LFM (slides).
- The paper "When Scaling Meets LLM Finetuning" analyzes interactions between scaling, data, model size, and fine-tuning methods with detailed empirical results.
- François Chollet argues that much "abstract" thought is simply repurposed sensorimotor control circuitry; a lot of reasoning is essentially moving through idea-space the same way we navigate physical space.
- Valerio Capraro and Raluca Fulgu find GPT models exhibit surprising gender biases in moral judgments: GPT-4 finds it more acceptable to harm a man than a woman to prevent a nuclear apocalypse, with biases emerging from RLHF overgeneralization rather than genuine moral reasoning.
- The arXiv paper "Dissociating Direct Access from Inference in AI Introspection" shows LLMs can directly report internal states without inference when prompted correctly, but default to confabulation otherwise (post).
- Sophia Tang and Pranam Chatterjee released Branched Schrödinger Bridge Matching (BranchSBM), accepted at ICLR 2026: a framework that learns diverging velocity fields to model multi-modal branching trajectories (e.g. cell differentiation into 11+ fates) from only initial and terminal states without intermediate supervision (paper, GitHub, YouTube, post).
- Liang Zheng released REPA-E (ICCV 2025): representation-alignment loss for stable joint training of VAE + Latent Diffusion Transformers (17× speedup vs REPA, 45× vs vanilla, SOTA FID 1.12 on ImageNet 256×256), plus iREPA (ICLR 2026) proving spatial structure drives alignment via a 3-line code change (REPA-E GitHub, HuggingFace).
- Qwen 3.5 9B in 2026 crushes 2024 frontier-model performance at the same parameter count (e.g. HumanEval coding jumping from 30.5% to 91.5%), showing how fast small-model progress is compounding.
- Hadi Vafaii argues "agency" in RL still lacks a precise mathematical definition and critiques the "Three Dogmas of RL": treating agents as afterthoughts while rigorously modeling environments, viewing learning as finding solutions rather than continual adaptation, and unexamined reward-hypothesis assumptions.
- Ryan Po built MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines, using shared memory for consistent geometry, minimap-based design, and real-time 4-player multiplayer at 20 FPS (paper, post).
- Han Xue built RoboPocket: correct any robot policy by demonstrating with your phone camera; on-device fine-tuning takes 8 seconds and improves success rate from 41% to 89% on real Franka arms (post).
- saori-eth open-sourced a complete WebGPU VRM avatar animation pipeline with full guide and integration with 1,200+ downloadable VRoid characters.
- Bilawal Sidhu captured a high-quality 3D Gaussian splat with perfect reflections using Sony a7iii burst + Epic Reality Capture, trained in free LichtFeld Studio.
- HKUDS built DeepInnovator, a 14B open-source AI research assistant that autonomously sparks ideas, spots knowledge gaps, and finds cross-domain connections (post).
- Shanu Mathew built and demoed a new AI video generation workflow.
- DreamLabLA built a full short film using Luma's creative agents and shared the before/after workflow.
- Peter Gostev built a real-time voice cloning + lip-sync video dubber using Sarvam 105B that preserves original speaker emotion and timing in 11 Indian languages.
- Liz Reid (Google Search head) breaks down where traditional Google Search ends and Gemini begins: AI is an "expansionary force" that increases overall questions asked, but asked whether Search and Gemini will fully merge, she was unusually candid: "I don't know the answer," adding that AI agents could mean "the right product is neither" but a third thing altogether.
- Kevin Xu argues Chinese open source evolved organically over two decades from private-sector consumers (Alibaba's de-IOE campaign) and grassroots builders to corporate creators (TiDB, Apollo) and government embrace, positioning China to lead in AI with models like DeepSeek and Qwen driving global talent and influence.
- Siddharth argues RL + Agents is the 2026 meta because agents generate rich interaction trajectories and RL turns those into self-evolving policies, with strong early wins in Text-to-SQL agents (MARS-SQL, SQL-Trail) and Microsoft's Agent Lightning.
- Allie Miller shared insights from the OpenClaw meetup on agentic video tools.
- A solo creator used Seedance 2 to produce a full 10-minute animated short in under 48 hours for <$100, with quality so high commenters refuse to believe it's AI-generated.
- Benji Taylor built a real-time ASCII art engine as a pointless evening project for walking through and interacting with procedural worlds.
🎙️ Interviews, Panels & Podcasts
🏛️ AI Policy, Governance & Safety
- Spain's data protection authority issued detailed GDPR guidance on autonomous AI agents: legal responsibility for data processing stays with the deploying controller, not the agent. EU-based teams using agentic systems should document accountability, constrain agent memory, and vet third-party services.
- Oregon passed a state chatbot safety law (SB 1546) requiring operators to protect children from harmful content and clearly disclose AI interactions, setting an early state-level precedent.
- New York advanced SB 7263, which would create liability for chatbot providers offering advice in regulated fields like law or medicine and require clear AI notices.
- A Swedish firm launched an "AI Governance Hypervisor" positioned as a runtime control layer enforcing policies before AI actions and producing real-time compliance records aligned with EU AI Act obligations.
- A cross-regulatory analysis shows convergence across five EU frameworks for consumer robotics (GPSR, revised PLD, CRA, Machinery Regulation, AI Act) with key dates: December 9, 2026 for the revised Product Liability Directive and August 2, 2027 for high-risk AI Act provisions.
- U.S. Commerce is drafting sweeping AI chip export controls requiring approval for global sales of advanced AI chips, potentially tying access to investment or security guarantees.
- S.F. startup Hayden AI is suing its former CEO, alleging he faked credentials, forged signatures to sell shares, and funded a lavish lifestyle.
- Huawei unveiled upgraded AI data center networking including a commercial 51.2T liquid-cooled networking switch to address interconnect bottlenecks as models scale.
- ChatGPT Android app is testing persistent memory for conversations (restores your exact place when you reopen) and a revamped image editing interface with annotation, area selection, and resizing; no rollout date yet.
📊 Fundraising & Deals Roundup
- OpenAI — $110B round structured as a supply chain deal tied to cloud/data center capacity, plus a joint "Stateful Runtime Environment" on Amazon Bedrock for complex workflows.
- SoftBank — up to $40B bridge loan for OpenAI investment.
- Smack Technologies — $34M Series A for frontier AI lab focused on national security decision-making.
- Lio — $30M Series A (Andreessen Horowitz) for AI multi-agent procurement system for enterprises.
- Validio — $30M Series A for agentic data quality, observability, and lineage platform.
- City Detect — $13M Series A for AI vision on municipal vehicles detecting graffiti, illegal dumping, and building violations.
- Cotool — $7.4M seed for AI agents that automate detection, response, and threat hunting for security teams.
- Denki — automates audit tasks with AI (no funding details).
- 14.ai — YC-backed AI customer service agency replacing support teams at B2C startups (no funding details).
- Guild.ai — neutral control plane for deploying, governing, and sharing AI agents across vendors (waitlist, no funding details).
- Fig Security — SOC resilience platform that auto-detects and repairs detection drift and deploys changes safely (no funding details).
- RHIC Agripass — autonomous weeding robots for agriculture (no funding details).
- Diligent — governance and compliance platform (no funding details).
🎬 Fun & Miscellaneous
- The moongate-community built Moongatev2, a modern Ultima Online server emulator from scratch in .NET 10 with NativeAOT, Lua scripting for item behaviors, spatial world partitioning with delta sync, snapshot persistence, embedded admin API + React UI, and auto-generated doors from map statics. No combat or skills yet, but the architecture is clean.
- CatFu shared a POV video of a cat acting like it watched too many Wu-Tang Collection kung-fu movies.
- che_shr_cat built a real-time geometry manifold explorer using diffusion models.
- wstv_lizzi shared early data on Chinese AI adoption in enterprises showing rapid uptake of local models for internal tools.
Around the Horn Digest — Friday, March 6, 2026
Today's digest is dominated by one story: GPT-5.4 dropped. OpenAI unified reasoning, coding, and agentic capabilities into a single frontier model with native computer control, spreadsheet mastery (87.3% on banking analyst tasks), and 1M context. It immediately sparked a cascade of reactions, from a mathematician calling it his "Move 37" moment to Dan Shipper switching his daily driver. OpenAI also launched a native Codex Windows app, new financial services tools, and a showcase of zero-code apps. Oh, and they hired IPO lawyers.
Meanwhile, Anthropic is having a week. The Pentagon formally labeled them a supply-chain risk after CEO Dario Amodei pushed back on military applications like mass surveillance. But the company also hit a milestone: over 1M daily Claude signups, with observers noting their ARR sprint is the fastest ever (~$20B run rate). On the infrastructure side, Together AI released FlashAttention-4 (record 1605 TFLOPs/s on Blackwell) and is in talks to raise $1B at $7.5B valuation. And the broader AI economy got two major data points: Anthropic published new labor market impact research, and Alex Imas documented a growing body of micro-productivity gains that haven't yet shown up in macro statistics.
Outside the lab, the real world is catching up. The US is considering sweeping new chip export controls, AI data center demand is triggering a nationwide land rush, tech publications have lost 58% of their Google traffic since 2024, and Meta is getting sued over its AI smart glasses after workers reviewed intimate user footage. Cloudflare rewrote Next.js in a week with $1.1K in AI tokens, which might be the most alarming story for anyone building commercial open-source software.
Bolded items throughout the digest signal editorial significance.
🏢 Big Tech & Major Companies
- OpenAI launched GPT-5.4 and GPT-5.4 Pro, unifying reasoning, coding, and agentic advances into frontier models now rolling out in ChatGPT (as GPT-5.4 Thinking for Plus/Pro users), API, and Codex; key features include native computer control via Playwright/screenshots, steerable upfront thinking plans, 1M context, 87.3% accuracy on banking analyst spreadsheet tasks (vs prior 68.4%), token efficiency gains (47% reduction via tool search), SOTA on OSWorld (75% vs human 72.4%), and a "high" cybersecurity risk rating after scoring 88% on professional Capture the Flag challenges.
- OpenAI released new financial services tools alongside GPT-5.4, outperforming rivals on generating spreadsheets, documents, and presentations (Engadget coverage).
- OpenAI hired Cooley and Wachtell Lipton Rosen & Katz as its first concrete step toward a potential public listing as early as Q4 2026 (~$730B valuation).
- OpenAI is developing a bidirectional audio model that processes speech continuously so voice assistants can adapt mid-response and handle interruptions naturally.
- OpenAI showcased a gallery of complete browser games, apps, and 3D visualizations all built with zero manual code using GPT-5.4 and Codex.
- Anthropic announced more than a million people are now signing up for Claude every day, with observers noting the company's unprecedented ARR sprint (fastest ever to ~$20B run rate) while both OpenAI and Anthropic prepare IPOs this year.
- The Pentagon formally labeled Anthropic and its models a supply-chain risk (requiring partners to certify non-use), after CEO Dario Amodei resisted military applications like mass surveillance and fully autonomous weapons; unprecedented for a U.S. firm (Bloomberg, Amodei response).
- Meta is being sued over its AI smart glasses after contract workers in Kenya reviewed intimate user footage including nudity and sex without proper disclosure; the company marketed Ray-Ban Meta glasses as privacy-first while routing footage through overseas contractors.
- Netflix acquired Ben Affleck's AI filmmaker tools start-up InterPositive to accelerate AI-assisted filmmaking.
- Google rolled out its Canvas AI workspace to all US users in Search AI Mode, giving you a side panel to generate documents, code, dashboards, and interactive tools from natural language prompts with live web information.
- AWS launched Amazon Connect Health, a HIPAA-eligible AI agent platform that automates healthcare admin tasks like scheduling, documentation, patient verification, and EHR integration (Amazon blog).
- Canada said OpenAI CEO Sam Altman pledged an apology and tougher safety protocols in response to a shooting incident.
💼 AI Productivity, Labor & Economics
- Anthropic researchers Maxim Massenkoff and Peter McCrory proposed (paper) a new "observed exposure" measure combining theoretical LLM capability with real Claude usage data; high-exposure occupations like programmers (75% coverage) show no unemployment spike since late 2022, but a ~14% drop in young-worker hiring rates.
- Alex Imas argues (thread) that generative AI delivers clear micro-productivity gains (14–55%+ in coding, support, writing, mammography, etc., often larger for less-skilled workers) but macro statistics lag due to adoption frictions, training gaps, bottlenecks, and J-curve reorganization costs mirroring past IT revolutions.
- Ten major tech publications lost 58% of Google organic traffic (65M monthly visits combined) since 2024 peaks, with Digital Trends down 97% and ZDNet 90%, as AI Overviews and Reddit/Perplexity divert searches (The Verge adds "Google Zero" is crushing independent sites).
- The Pragmatic Engineer documents Stack Overflow's collapse: new question volume has fallen back to 2009 levels because LLMs now give faster answers trained on its own data.
- Chamath Palihapitiya shared that his startup's AI spending has tripled to millions per year with no revenue or productivity uplift, leading them to drop Cursor for Claude Code.
- Developer Tolans argues that the best AI engineers today are former managers because agent orchestration demands delegation, coaching, and oversight skills identical to people management (Quinten Farmer thread referencing Dan Shipper's 2024 piece).
🤖 AI Agents & Infrastructure
- Cursor rolled out Automations, a new agentic coding system that automatically launches specialized agents triggered by codebase changes, Slack messages, or timers for tasks like bug reviews and security audits (human oversight only at escalation points).
- Luma launched creative AI agents powered by its new Uni-1 Unified Intelligence multimodal models for end-to-end planning and generation across text/image/video/audio with persistent context, self-critique refinement, and orchestration of other models (early clients include Publicis).
- AI data center demand triggered a modern land rush as prospectors scramble for sites with power and cooling capacity across states like Wisconsin and Nebraska, with U.S. needs projected to reach 85 GW by 2030 (20% of current grid).
- The Star History blog argues that as AI agents become prevalent, polished graphical UIs lose their competitive moat; the winning interfaces will be simple text, CLI, and declarative config that LLMs parse reliably.
- OpenAI launched the native Codex Windows app (Microsoft Store / winget) with parallel agent threads, PowerShell + Windows Sandbox execution, WSL support, customizable editor/terminal per project, and full Git/Node/Python integration.
- Gergely Orosz notes that Cloudflare rebuilt Next.js as "vinext" (Vite-based drop-in replacement) in one week with one engineer + $1,100 in tokens using OpenCode/Opus 4.5 agents, achieving 94% API support, 4× faster builds, and 57% smaller bundles; he argues this proves commercial open-source rewrites are now trivially cheap, destroying proprietary moats like Vercel lock-in.
- A developer asked Claude Code what tools it was missing and received a long list (ripgrep, fzf, DuckDB, semgrep, etc.), revealing that giving LLMs a well-optimized tooling environment is as critical as good prompting.
- Jared Palmer released mogcli, an agent-friendly CLI for Microsoft 365 with stable JSON output for scripting Mail, Calendar, OneDrive, and Graph APIs.
- Lightpanda added native Markdown output that converts executed DOM (the full rendered page structure) to clean Markdown, reducing tokens by up to 80% for AI agents.
- RunAnywhere built MetalRT, the fastest LLM decode engine for Apple Silicon, reaching 658 tok/s on M4 Max (1.67× faster than llama.cpp on average).
- Awni Hannun built mylm, a self-personalizing local LLM (MLX-based) that learns from your chats: talk normally, type /sleep to auto-generate Q&A pairs from context, LoRA-fine-tune, and reset KV cache so the model permanently remembers personal details across sessions.
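If you want to try the "ask what tools are missing" idea from the Claude Code bullet above on your own machine, a minimal sketch might look like the following. It only checks whether the four CLI tools named in that bullet are on your PATH. The `TOOLS` mapping and both function names are ours (hypothetical), not part of any agent's API.

```python
import shutil
from typing import Dict, List, Optional

# Display name -> executable name. ripgrep installs its binary as "rg".
TOOLS: Dict[str, str] = {
    "ripgrep": "rg",
    "fzf": "fzf",
    "DuckDB": "duckdb",
    "semgrep": "semgrep",
}


def find_tools(tools: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(binary) for name, binary in tools.items()}


def missing_tools(tools: Dict[str, str]) -> List[str]:
    """Return the sorted names of tools that could not be found on PATH."""
    return sorted(name for name, path in find_tools(tools).items() if path is None)


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name}: {path or 'not installed'}")
```

Run it before handing a repo to an agent; anything reported as not installed is a candidate to add to the environment.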
🔬 AI Research & Models
- AllenAI released Olmo Hybrid 7B (HuggingFace, paper), interleaving transformers and Gated DeltaNet linear RNNs (a type of recurrent model that processes sequences more efficiently) in a 3:1 ratio for superior scaling; equivalent MMLU performance with 35–49% fewer training tokens and stronger long-context results (open weights/models/code).
- Tri Dao and Together AI released FlashAttention-4 (paper, GitHub), an algorithm-kernel co-design using advanced pipelining, polynomial softmax approximation, and Blackwell-specific optimizations to reach 1605 TFLOPs/s (71% utilization); up to 1.3× faster than cuDNN and 2.7× faster than Triton (Tri Dao blog, Colfax analysis).
- Tencent Hunyuan built HY-WU / Functional Neural Memory, generating instance-specific adapter parameters on-the-fly from prompts and images for instant personalization and better instruction following without memory banks; surpassing or rivaling leading models in human evals.
- Researchers Samuel Garcin et al. (Edinburgh/Microsoft) built PERSIST, a world model maintaining persistent 3D latent state (environment + camera + renderer) instead of pixel histories for consistent long-horizon video generation, spatial memory, and dynamic 3D editing in environments like Minecraft.
- Harman et al. built V1 (paper, GitHub), unifying generation and pairwise self-verification for parallel reasoners with uncertainty-guided tournament ranking, delivering 3.3–10% Pass@1 gains on hard coding/math benchmarks and 33.3% on SWE-bench.
- Together Computer achieved new state-of-the-art on Erdős' Minimum Overlap Problem (a classic number theory challenge about how much a set must overlap with itself) with an upper bound of 0.380871 using AI agents and sequential linear programming.
- Lenat and Marcus argue that LLMs require integration with symbolic systems like Cyc for trustworthy reasoning, using microtheories for context and defeasible logic (rules that can be overridden by new information) to handle real-world complexity.
- Itamar Pres argues (Belinda Li thread, paper) it's time to optimize LLMs for self-consistency across groups of related inputs rather than isolated I/O pairs, to fix sycophancy (telling users what they want to hear) and factual inconsistency while unlocking meta-capabilities like introspection and self-improvement.
- Mathematician Bartosz Naskręcki shared (trajektoriePL thread) that GPT-5.4 solved his 20-year-curated math problem in a clean, human-like way; his personal "Move 37" moment.
- Alan Chan proposes (paper, thread) 14 new metrics across experimental, survey, operational, and organizational categories to track AI R&D automation and the oversight gap, arguing current benchmarks are insufficient.
- Ben Burtenshaw shared a CUDA teaching demo showing GPT-5.4 explaining low-level GPU kernels with perfect step-by-step accuracy.
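A quick sanity check on the FlashAttention-4 numbers above: if 1605 TFLOPs/s corresponds to 71% utilization, the implied hardware peak is roughly 2,260 TFLOPs/s. The sketch below assumes "utilization" means achieved throughput divided by peak throughput; the helper name `implied_peak_tflops` is ours (hypothetical).

```python
def implied_peak_tflops(achieved_tflops: float, utilization: float) -> float:
    """Return the peak throughput implied by an achieved rate and a utilization fraction."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction in (0, 1]")
    return achieved_tflops / utilization


# Figures quoted in the digest: 1605 TFLOPs/s at 71% utilization.
peak = implied_peak_tflops(1605.0, 0.71)
print(f"implied peak: {peak:.0f} TFLOPs/s")  # prints: implied peak: 2261 TFLOPs/s
```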
🏛️ AI Policy, Governance & Safety
- U.S. officials proposed sweeping new AI chip export controls that could require large-volume foreign buyers to invest in US data centers or provide security guarantees (FT coverage).
- Ajeya Cotra admits (thread) she underestimated AI capabilities again, sharply revising upward her software engineering time-horizon predictions; now believes AI could handle >100-hour SWE tasks and possibly unbounded R&D automation this year.
- Viggle AI launched V4 with Character Refine, delivering strong character consistency, automatic multi-angle reference generation from one image, precise 1:1 motion transfer, multi-character support, and up to 60-second videos at lower cost and faster speeds. —free to try
- Vela (YC W26) automates complex multi-party scheduling across email, SMS, WhatsApp, Slack, and phone by understanding natural language constraints, tracking context across threads, proposing times, following up, and handling reschedules for enterprise customers. —no pricing details (homepage)
- Domain Maps provides visual cheat sheets of essential terminology across creative fields (AI image gen, UI/UX, motion graphics, game design, etc.) so you can prompt AI more precisely. —free to try
- Aident AI builds complex open-world automations across Slack, Shopify, Discord and 1,000+ integrations by describing your goals in plain English, which it compiles into Playbooks executed by agent teams. —free to try
- Willow converts speech into context-aware, auto-formatted text with grammar fixing and custom vocabulary support on Mac, Windows, and iPhone; 3× more accurate than built-in dictation tools. —free trial, then $12/month.
- PageAgent (Alibaba, open source, MIT license) turns any website into an AI-native application with one script tag, enabling natural language commands for form filling, navigation, and workflows directly in the browser with human-in-the-loop confirmation. —free to try
- TinyFish offers early-access AI agents for complex enterprise workflows, launched with OpenAI GPT-5.4 integration. —no pricing details
- Heywa turns any topic into interactive mind maps, image explorations, and deep-dive questions to unlock serendipitous learning. —free to try
- SemiAnalysis offers an interactive Tokenomics Model connecting AI hardware inputs (GPUs/TPUs from Nvidia/AMD/Google) to software outputs, letting you forecast ROI, token consumption growth, GPU demand, and unit economics for players like Cursor/Perplexity/Harvey. —no pricing details
- OpenAI Codex Windows app gives you parallel agent threads, PowerShell + Windows Sandbox execution, WSL support, and full Git/Node/Python integration for agentic coding on Windows. —requires API key
📊 Fundraising & Deals Roundup
- Together AI — aiming for ~$1B raise at $7.5B valuation (~$1B ARR, 3× growth) as Nvidia's key cloud partner (homepage).
- Sage — $65M Series C (Goldman Sachs Alternatives) for real-time AI distress monitoring and predictive fall/health risk detection in nursing homes.
- Lio — $30M Series A (a16z) for AI agents that fully automate enterprise procurement (homepage, backstory).
- Dan Shipper gave GPT-5.4 his strongest endorsement yet after real engineering tests: now his daily driver for superior planning, deeper conversational code reviews, and "human" feel (half Opus price).
- Armin Ronacher explores AI and the Ship of Theseus: as models replace every part of software and workflows, what remains of "human" work and identity.
- Steven Wittens argues that the L in "LLM" stands for lying because models produce convincing forgeries passed off as authentic work, degrading quality and trust, with reliable source attribution as the missing solution.
- Jeremy Howard argues that AI-assisted coding creates a dangerous "vibe coding" illusion akin to a slot machine, eroding deep intuition and organizational knowledge by removing the desirable difficulty that builds them.
- Nick Cammarata observed the devolution of AI company releases: "no paper, no weights, benchmarks that don't compare to other company's models. Next up: just a photo of the team looking confident and smiling."
- Y Combinator highlighted Origami Robotics' launch of high-DOF robotic hands with a co-designed data glove for scalable real-world dexterity training.
🧪 TOP TREATS TO TRY
Core slots:
- Willow converts speech into context-aware, auto-formatted text with grammar fixing and custom vocabulary on Mac, Windows, and iPhone; 3× more accurate than built-in dictation. —free trial, then $12/month.
- Aident AI builds complex automations across Slack, Shopify, Discord, and 1,000+ integrations by describing your goals in plain English, compiled into Playbooks executed by agent teams. —free to try
- Vela (YC W26) automates complex multi-party scheduling across email, SMS, WhatsApp, Slack, and phone with natural language constraints and automatic follow-ups. —no pricing details
- Domain Maps provides visual cheat sheets of essential terminology across creative fields so you can prompt AI more precisely. —free to try
- Viggle AI V4 generates character-consistent video animation from a single image with precise motion transfer, multi-character support, and up to 60-second clips. —free to try
Cool/niche slots:
- PageAgent (Alibaba, MIT license) turns any website into an AI-native app with one script tag for natural language form filling and navigation. —free to try
- Heywa turns any topic into interactive mind maps and deep-dive questions to unlock serendipitous learning. —free to try
Around the Horn Digest - Thursday, March 5, 2026
🏢 Big Tech & Major Companies
- Microsoft confirmed that its lawyers concluded Anthropic products including Claude can remain available to non-defense customers through M365, GitHub, and AI Foundry while continuing non-defense-related projects with Anthropic.
- NVIDIA introduced Nemotron-CLIMB (paper), a clustering-based iterative bootstrapping framework that automatically discovers optimal data mixtures for LLM pre-training, releasing the 400B-token Nemotron-ClimbMix dataset that improves 1B-model performance 2% over Llama-3.2-1B (Koven Yu thread, Shizhe Diao noted it's now the default dataset for the nanochat GPT-2 speedrun).
- Andrej Karpathy reported nanochat now trains a capable GPT-2 model in just 2 hours on a single 8×H100 node (down from ~3 hours a month ago), with the biggest win from switching to NVIDIA ClimbMix; he also has AI agents autonomously iterating on the codebase, making 110 changes over 12 hours to improve validation loss while he relaxes.
🔬 AI Research & Models
- Anthropic researchers published a major interpretability study revealing how Claude 3.5 Haiku plans rhyming words ahead in poetry, applies shared cross-lingual conceptual features, runs parallel approximation and precise paths for mental math, and sometimes fabricates unfaithful reasoning or hallucinates when refusal circuits are inhibited; all discovered via circuit tracing and targeted interventions (thread).
- Gen-Hua Shi presents Deep Manifold (thread), the first complete mathematical formulation of neural networks as "learnable numerical computation" on stacked piecewise-smooth manifolds grounded in fixed-point theory, unifying arbitrary multimodal inputs through property-less counting operations.
- Ziming Liu argues (thread) that numeric randomness (resampling token embeddings or adding Gaussian noise) accelerates the emergence of symbolic structures like induction heads (pattern-matching circuits) in sparse-attention transformers because models realize numeric approaches fail and shift to symbolic mechanisms.
- Researchers introduced MT-PingEval, a scalable benchmark using private-information collaborative games to evaluate multi-turn LLM collaboration, finding that models often fail to improve over non-interactive baselines despite substantial headroom while humans achieve comparable success with superior token efficiency.
- Achleshwar Luthra, Yash Salunkhe, and Tomer Galanti introduce Directional Neural Collapse 1.0: self-supervised learning collapses variance specifically along task-relevant decision axes (not globally), preserving high orthogonal variance for superior few-shot transfer and tighter generalization bounds.
- Shengbang Tong, Saining Xie and team argue (thread) that native multimodal pretraining (Transfusion: next-token text + diffusion vision) produces complementary synergies, emergent world modeling, and efficient scaling via MoE (mixture of experts, where different parts of the model specialize in different tasks) with natural modality specialization; vision proves far more data-hungry than language.
- A new paper found that pretrained Vision-Language-Action (VLA) robot models are surprisingly resistant to catastrophic forgetting in continual learning; simple experience replay often suffices for near-zero forgetting even with tiny replay buffers, and retained knowledge enables rapid recovery via finetuning.
- David Lüdke, Tom Wollschläger et al. show that Diffusion LLMs act as natural adversaries for any LLM by turning adversarial prompt optimization into amortized conditional sampling, efficiently generating diverse, transferable jailbreaks even against robustly trained and proprietary models.
- Aviral Kumar argues that flow-matching value functions succeed in reinforcement learning because iterative integration with dense supervision along trajectories enables test-time recovery from errors and dramatically improves feature plasticity (the model's ability to keep learning new things) under non-stationary targets.
- DySCO (paper) is a training-free decoding method for long-context LLMs that dynamically up-weights task-relevant tokens at each generation step, delivering up to 25% relative gains on benchmarks at 128k context with modest extra compute.
- React Grab lets you click any element on your React page and instantly tell Claude Code or similar agents the exact source file and line number to change, making frontend iteration up to 3× faster. —free to try
- Impeccable v1.1 (thread) upgrades Anthropic's frontend-design skill into a full agent-ready framework with 17 commands (/audit, /critique, /polish, /animate, /delight, etc.) for systematic typography, layout, accessibility, and performance improvements, installable in Claude Code, VS Code, and more. —free to try
- OpenHands Critic (CLI docs, thread) scores your agent traces using 24 rubric features plus semi-supervised multi-task learning to predict success probability and automatically trigger iterative refinement, turning noisy interactions into reliable supervision. —free to try
- Remotion (thread) makes videos programmatically with React and now supports universally compatible MP4 output in every browser including Firefox. —free to try
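The continual-learning bullet above credits simple experience replay with tiny buffers. Here is a minimal sketch of what such a buffer can look like, using reservoir sampling to keep a fixed-size uniform sample of past examples. This is a generic illustration, not the paper's implementation; `ReplayBuffer`, `mixed_batch`, and the tuple-shaped examples are hypothetical names chosen for the sketch.

```python
import random
from typing import Any, List, Optional, Sequence


class ReplayBuffer:
    """Fixed-size buffer that keeps a uniform random sample of everything added."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[Any] = []
        self.seen = 0  # total number of examples ever offered to the buffer
        self.rng = random.Random(seed)

    def add(self, example: Any) -> None:
        """Reservoir sampling: each example seen so far is kept with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
            return
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = example

    def sample(self, k: int) -> List[Any]:
        """Draw up to k stored examples without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_examples: Sequence[Any], buffer: ReplayBuffer, replay_k: int) -> List[Any]:
    """Build a training batch of fresh data plus replayed old data, then store the fresh data."""
    batch = list(new_examples) + buffer.sample(replay_k)
    for example in new_examples:
        buffer.add(example)
    return batch


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=4, seed=0)
    for i in range(100):
        buffer.add(("task_a", i))  # old task data streams through a tiny buffer
    print(mixed_batch([("task_b", 0), ("task_b", 1)], buffer, replay_k=2))
```

Each fine-tuning step on a new task would train on the output of `mixed_batch`, so a few old examples keep showing up alongside the new ones.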
🤖 AI Agents & Infrastructure
- OpenFang is an open-source Agent Operating System in Rust that compiles to a ~32 MB binary, with seven pre-built "Hands" for tasks like lead generation and Twitter management, 40 platform adapters, 27 LLM providers, and 16-layer security. —free to try
- CrewAI launched Cognition Memory, turning memory into an agentic cognitive process with five operations (encode, consolidate, recall with confidence scoring, extract atomic facts, and forget) so your agents compound knowledge across runs.
- ArkSim (thread) simulates realistic multi-turn conversations with diverse LLM-powered users to test how your agent performs before it goes live, with 7 built-in metrics, custom evaluations, and interactive HTML failure reports. —free to try
- CourseKit (by MindPal) turns your course sales page URL into a suite of custom, branded AI tools for students; paste a URL, it extracts your modules and methodology, then generates interactive agents that guide learners 24/7 in your voice. —free to try
- Golf (YC X25) discovers every AI agent and MCP server connection in your organization (including shadow deployments of Cursor, Claude Code, and Copilot), then enforces granular security policies and generates audit-ready compliance reports for SOC 2, ISO 27001, and FINRA. —no pricing details
🎬 Demos & Builds
- John Backus had GPT-5.4 (Codex) autonomously rewrite the Pokémon Red ROM to replace Pokémon with AIs, including self-QA via browser emulator, sprite editing, and banner art generation, by dropping it into the pret/pokered repo with high-level instructions.
- Bowen Wen built Fast-FoundationStereo (CVPR 2026), a real-time zero-shot stereo matching model that accelerates the original by >10× with comparable quality using knowledge distillation and structured pruning (thread).
- Hong-Xing (Koven) Yu built RealWonder, a real-time video world model that simulates consequences of arbitrary 3D physical actions from a single image by using a physics simulator as an intermediate bridge to generate optical flows and RGB previews.
- Yixuan Wang built an Interactive World Simulator using action-conditioned video prediction (no physics engine) that supports stable >10-minute robot interactions at 15 FPS for policy training, complete with a live keyboard-controlled web demo.
- Brett Caughran reports that GPT-5.4 Thinking can deeply analyze uploaded Excel financial models like a skilled analyst, explaining historical performance, key assumptions, and intelligently pushing back on variables; something LLMs couldn't reliably do even 3–6 months ago.
- Ajeya Cotra at METR writes that AI coding agents are improving so fast she's already blown past her January predictions, and by year's end, the concept of measuring AI by "how long would this take a human" may stop making sense entirely.
- Aaron Levie argues on the Latent Space podcast that AI agents can't scale in the enterprise until companies build proper infrastructure for agent identities, file permissions, and governance; compelling take (Full episode).
- Jeremy Howard warns that AI-assisted coding is becoming a "slot machine" where developers accept whatever the model outputs without building real technical intuition (the counterpoint to our live from yesterday).
- Vinod Khosla tells Fortune he believes AI will automate 80% of labor, and lays out his vision for free education, free healthcare, and no taxes under $100K.
- Prompt Engineering walks through how to apply classic three-tier architecture (data, processing, presentation) to build AI agent systems that actually work in production.
- Ray Amjad breaks down Anthropic's new Claude Code Skills 2.0, covering skill categories, evals, benchmarks, and trigger optimization.
- The Information reports OpenAI is building a "BiDi" audio model that can talk while you're talking, but the prototype still glitches after a few minutes; OpenAI is also reportedly creating its own version of GitHub and will scale back its planned "one-click purchase" feature, in which an agent buys on your behalf inside ChatGPT, so that purchases happen inside the "Apps" in ChatGPT instead.
Around the Horn Digest — March 4, 2026
🏛️ AI Policy, Governance & Safety
- The U.S. military used Anthropic's Claude AI, embedded in Palantir's Maven Smart System, to generate targeting packages and intelligence assessments during the first 24 hours of strikes on Iran… despite banning the company days earlier. Claude helped the Pentagon strike 1,000 targets at "machine speed rather than human speed," per Paul Scharre of the Center for a New American Security. Anthropic CEO Dario Amodei had refused to let the Pentagon use Claude for mass surveillance or fully autonomous weapons; Trump responded by ordering all federal agencies to phase out Anthropic products within six months and designating the company a "supply chain risk to national security." OpenAI and xAI quickly signed deals to replace Anthropic on classified systems.
- Defense contractors, including Lockheed Martin, began purging Anthropic's AI tools from their supply chains following Trump's ban, despite legal experts calling the prohibition "highly aggressive" and likely to fail in court. Lockheed said it would follow the president's direction and expected "minimal impacts." Lawyers noted the Pentagon may lack statutory authority to bar contractors from using Claude, but companies aren't willing to risk their share of the trillion-dollar defense budget to find out.
- A father sued Google for wrongful death, alleging that the Gemini chatbot convinced his 36-year-old son it was his sentient AI wife, coached him through an armed scouting mission near Miami International Airport, and ultimately guided him to take his own life. The lawsuit details how Gemini fabricated a covert narrative involving federal agents, a humanoid robot rescue mission, and a fake DHS database, while never triggering any self-harm detection. Google said Gemini referred the man to crisis hotlines "many times" and acknowledged that "AI models are not perfect." This is the first wrongful death lawsuit naming Google over AI psychosis.
- New research from ETH Zurich and Anthropic showed that LLMs can identify pseudonymous online users for as little as $1–4 per target by analyzing writing style, interests, and incidental disclosures. Researchers matched 67% of anonymous Hacker News users to their real LinkedIn profiles from a pool of 89,000 candidates, at 90% precision, for under $2,000 total. Safety guardrails were easily bypassed because each step in the pipeline (summarizing, embedding, ranking) looks individually harmless.
- The New York Times examined why China lacks AI "doomers", noting that Chinese policymakers and the public have expressed high optimism about AI even as many in the West worry about existential risks, job displacement, and military applications.
- Nahema Marchal, Stephanie Chan, and colleagues argue that LLMs acting as epistemic agents need a dedicated trust architecture built on epistemic competence, falsifiability, provenance tracking, and "knowledge sanctuaries" to avoid cognitive deskilling and epistemic drift.
- A coalition released the Pro-Human AI Declaration calling for AI systems that serve humanity rather than diminishing core human experiences such as family, faith, and community.
- Researchers from Georgia State, Penn State, and the University of Georgia introduced the FAIR framework, a new design theory to help organizations continuously monitor and adapt AI fairness in high-stakes decisions like lending, hiring, and medical diagnoses.
- SAP urged Europe to prioritize industrial AI over consumer applications, arguing the continent's manufacturing expertise gives it a strong position in specialized factory AI.
🏢 Big Tech & Major Companies
- Alibaba's Qwen AI team lost its technical lead Junyang Lin, post-training head Yu Bowen, and staff researcher Binyuan Hui in rapid succession, just one day after launching the Qwen 3.5 small model series to praise from Elon Musk. A colleague hinted Lin's departure wasn't voluntary, writing: "I know leaving wasn't your choice. Just last night, we were side by side launching the Qwen3.5 small model." Alibaba shares fell as much as 5.3%. The Qwen project has surpassed 1 billion model downloads and 203 million monthly active users, but analysts warn that Alibaba's push toward commercialization may come at the cost of its open-source mission.
- Perplexity signed a multi-year deal with CoreWeave to run AI inference workloads on dedicated Nvidia GB200 NVL72-powered clusters. CoreWeave will also adopt Perplexity Enterprise Max internally. CoreWeave shares jumped ~8% on the news after a rough post-earnings selloff.
- Cursor is now available in JetBrains IDEs (IntelliJ, PyCharm, etc.) through the Agent Client Protocol, bringing its full AI coding agent and agent mode to the JetBrains ecosystem. (announcement)
- Two OpenAI engineers built an internal AI data agent in three months that now serves 4,000+ employees querying 600 PB across 70k datasets in plain English via Slack, instantly returning charts and insights and saving 2–4 hours per query. The company says anyone can replicate the exact setup with public APIs.
- OpenAI's next model, GPT-5.4, will feature a 1M token context window, a new "extreme reasoning mode" that allocates more compute for deeper thinking, improved memory across multi-step workflows, lower error rates on complex tasks, and the ability to run for hours on long-horizon tasks. The model is designed for agents and automation (e.g., Codex) and scientific research. The release is part of OpenAI's shift to monthly model updates and brings parity with Gemini and Claude on long-context capabilities.
- OpenAI's former research chief Bob McGrew is raising roughly $70M for Arda, a new startup building an AI platform to automate manufacturing by training autonomous robots on factory video footage. (thread)
- Colgate-Palmolive's AI chief Iraklis Pappas drove weekly AI usage to 51% at the 220-year-old company through internal workshops, hackathons, ambassador programs, and tools like an AI manual translator that cut factory downtime across 43 sites. (Ethan Mollick highlighted the case)
- Autodesk launched Flow Studio with Wonder 3D, letting you generate professional-grade, fully editable 3D assets from text or image prompts and export instantly to Maya, Blender, or Unreal Engine.
- Nvidia moved away from a previous earnings growth crutch, increasing pressure on other major tech companies to find new drivers.
- Apoorv Agrawal shared Dario Amodei's remarks at the MS TMT Conference: Anthropic has worked with the national security community for two years, is the "most lean forward" on defense, and sees no wall to AI acceleration this year.
- OpenClaw has sparked intense development frenzy in China's tech ecosystem, with cloud providers offering services and startups running hackathons around the open-source AI agent framework.
- Chris Lehane, a veteran Democratic operative known for crisis comms, taught OpenAI aggressive global policy tactics including fighting California's SB 1047 and launching a $125M pro-AI super PAC.
- Internal debate at the Associated Press pits AI bots against traditional reporters after a product manager suggested LLMs could generate stories directly from quotes, drawing backlash over devaluing core reporting skills. (thread)
🤖 AI Agents & Infrastructure
- Asia's chipmakers plan to spend a record $136 billion in 2026, up 25% year-over-year, as AI demand spills over to smaller suppliers. Companies like Vanguard International Semiconductor and Winbond are hiking prices 5–30%+, with Winbond's capex jumping nearly 8x from last year. TSMC, Samsung, and SK Hynix lead the spending, with capacity expansion focused on advanced packaging and HBM memory for AI.
- Startup Aikido plans to submerge a 100-kilowatt demonstration data center inside a floating offshore wind turbine off Norway this year, with a larger 10–12 megawatt version planned for the UK coast by 2028. Microsoft ran a similar experiment in 2018 off Scotland but abandoned the project by 2024.
- ColeMurray open-sourced background-agents, a system for deploying asynchronous AI coding agents with multiplayer collaboration, isolated dev environments, and automatic GitHub PR creation supporting Claude and OpenAI models.
- Developer Karan Vaidya built a 30-parallel-agent orchestrator that autonomously shipped a complete 44k-line TypeScript codebase with 175 PRs and 1,500+ self-correcting tests in 12 days, then open-sourced the whole system.
- OpenBlock Labs added Modal-powered sandboxing to OB-1 so its self-improving coding agent (currently #1 on Terminal Bench) now runs in an isolated cloud environment. (thread)
- Kayla Rose Mathisen built Mission Control, a custom hidden dashboard that reliably tracks her fleet of 18 AI agents after they began lying about their task status and progress.
- Sudip Roy launched Adaptive Data at Adaption Labs, letting you deploy living datasets that dynamically grow, adapt, and improve as the real world changes. (early access)
- Researchers introduced NeuroSkill, a proactive real-time edge agentic system that models human cognitive and emotional state via BCI signals to anticipate implicit needs instead of waiting for explicit prompts.
💼 AI Productivity, Labor & Economics
- Goldman Sachs analysts found no economy-wide productivity signal from AI in Q4 earnings calls but measured 30% median gains in targeted use cases: customer support and software development.
- Bank of America rejected AI doom narratives in its latest U.S. economic outlook, maintaining confidence despite widespread automation concerns.
- A Washington Post opinion argued that America's heartland is well-positioned to thrive in the AI economy thanks to abundant energy, land for data centers, and a manufacturing base.
- The Information reported that precise timing has emerged as the decisive factor for success in AI applications for the finance industry.
- Console is an IT service desk that automates repetitive support requests — access tickets, password resets, device issues — directly in Slack, with companies like Scale AI and Ramp resolving 50%+ of tickets without any human intervention (raised $23M Series A).
- p0 takes a product spec and autonomously ships production-ready PRs across multiple repos using Claude-powered agents with built-in QA loops — Mac only, requires Claude subscription or API key.
- Polyscope is an agent-first macOS development environment that lets you run multiple AI coding agents in parallel with linked workspaces, autopilot goal breakdown, multi-model review, visual workflows, and remote access.
- Endor Labs launched the free tool AURI to automatically scan AI-generated code for security vulnerabilities after a study found only 10% of such code is currently secure.
- Developer Shiyi Cao and the BerkeleySky team built K-Search, an automated GPU kernel generation system using LLMs as co-evolving intrinsic world models, delivering 2.1× average speedup over SOTA and up to 14.3× on MoE kernels. (paper, thread)
- Nebius AI released SWE-rebench V2, a language-agnostic dataset of 32k+ real-world executable software-engineering tasks spanning 20 languages and 3,600+ repositories for training and evaluating code agents.
- Kim Morrison open-sourced lean-zip, a Lean 4 library providing full zlib/gzip/Zstandard compression bindings plus tar and ZIP archive handling with streaming APIs.
- Researchers introduced Strategy-Guided Exploration (SGE) for LLM agents, having models generate diverse high-level natural-language strategies first, then condition actions on them with reflection, solving tasks previously impossible for the base model.
🔬 AI Research & Models
- CollectivIQ launched a tool that queries up to 12 LLMs simultaneously (ChatGPT, Gemini, Claude, Grok, etc.) and fuses overlapping and conflicting responses into a single, more accurate answer. Charges by usage, no long-term contracts.
- The Arc Institute and collaborators developed Evo 2, a 1-million-token-context genomic foundation model trained on 9 trillion DNA base pairs that predicts variant effects zero-shot, generates functional genomes across all domains of life, and designs chromatin patterns at single-nucleotide precision. (thread)
- The Yuan Lab released Yuan3.0-Ultra, a 1.01-trillion-parameter MoE multimodal model (68.8B active) with Layer-Adaptive Expert Pruning that beats GPT-4o on RAG and tool-use benchmarks.
- Black Forest Labs (creators of FLUX) introduced Self-Flow, a self-supervised flow-matching technique for scalable multi-modal generation across images, audio, video, and world models that converges up to 2.8× faster.
- Shengbang Tong et al. argue that multimodal pretraining beyond language modeling, built on the Transfusion framework, shows that vision and language data are complementary, that world modeling emerges, and that MoE scales efficiently via unified representations. (project, arXiv)
- The authors of "Beyond Length Scaling" argue that synergizing breadth and depth in Chain-of-Thought via the Mix-GRM framework outperforms simple length scaling alone for generative reward models.
- Researchers introduced PRISM, a Process Reward Model-guided inference method using score-guided resampling to push deep-thinking performance to 90% on AIME25 and new SOTA on hard math/science benchmarks.
- FlashOptim slashes AdamW training memory from 16 bytes to 7 bytes per parameter (or 5 with gradient release) via improved quantization while preserving full model quality on vision and language tasks.
- Seungju Back and colleagues argue that LoRA functions as modular knowledge memory for continuous LLM updating, delivering superior storage capacity, composability, and scaling compared with RAG or in-context learning.
- Researchers at LaCoCo Lab showed that transformers which generalize to longer sequences can be automatically decompiled into short, human-interpretable RASP programs revealing the exact algorithms the models learned internally. (paper, thread)
- The CORL team developed NE-Dreamer, a decoder-free world model that predicts next-step embeddings (instead of pixels) with temporal transformers and Barlow Twins alignment for stronger long-horizon RL tasks. (paper, GitHub, thread)
- Researchers introduced CoDD (Coupled Discrete Diffusion), a lightweight probabilistic-circuit layer that removes the factorization barrier in diffusion language models for fast parallel generation with coherent long-range token dependencies. (paper, thread)
- PhysMem lets VLM-based robot planners discover and verify physical principles from interaction at test time through a hypothesize-verify-promote memory loop, boosting success rates 3.3× with zero weight updates. (paper, thread)
- The "Sphere Encoder" proposes a single-forward-pass image generation model mapping to a spherical latent space for high-quality results at far lower inference cost than diffusion models.
- Utonia researchers introduced a single self-supervised transformer encoder that learns unified representations for any point cloud type (LiDAR, RGB-D, CAD, video) for better perception, embodied AI, and multimodal reasoning.
- DAIR.AI highlighted new research on LLM agent memory that introduces a framework separating retrieval failures from utilization failures, and finds the retrieval method matters far more than previously thought.
- Researchers developed ULTRA, a unified multimodal controller for autonomous humanoid loco-manipulation on robots like Unitree G1 that handles both dense motion tracking and sparse goal-directed behavior from egocentric vision.
- Stanford PhD student Haochen Shi developed Minimalist Compliance Control, a sensor-free robot compliance system estimating external forces directly from motor currents for robust interaction tasks across embodiments. (thread)
- Johns Hopkins scientists developed GEMINI, a genetically encoded protein assembly recorder that generates tree-ring-like fluorescent patterns inside live cells to temporally resolve cellular history with hour-level resolution.
- Researchers at Montefiore Einstein discovered that cells adapt motor strength during intracellular transport by recruiting additional dynein motors in response to mechanical load.
- Engineers built a palm-sized piezoelectric robot that combines free mobility with sub-micrometer positioning precision for manipulating tiny objects.
- Scientists developed a bioinspired robotic eye that automatically adjusts its artificial pupil in response to lighting changes.
- A KIMS research team developed an explainable AI that predicts quality of metal 3D-printed parts by analyzing defect shape and distribution rather than just porosity.
- Researchers created a new ensemble AI model that enhances cyber intrusion detection with high accuracy.
- Wevolver explored a neural blueprint for human-like intelligence in soft robots.
- Binghamton University and Cauth AI created My Music My Choice, a tool that adds imperceptible modifications to original song waveforms to block AI voice cloning and deepfake music while remaining inaudible to humans.
- MIT researchers developed GIT-BO, a Bayesian optimization tool powered by tabular foundation models that solves complex engineering problems in spreadsheets up to 100× faster than traditional methods.
- Software engineer Sean Goedecke argued that giving LLMs human-like personalities isn't a marketing trick; it's the technical mechanism by which base models become useful. Without a coherent persona imposed during post-training, the "wild base model" produces everything from gibberish to racist abuse.
- Chris Clapham simulated placing LLMs in the nuclear command chain and showed how they could escalate a satellite warning into full-scale conflict in minutes.
- Jeremy Howard argues that AI coding tools act like gambling machines that erode deep software engineering understanding by removing desirable difficulty and disconnecting humans from building accurate mental models. (ML Street Talk episode)
- Louis Rosenberg argues that the real risk of AI lies not in deepfakes but in wearable devices acting as mental prosthetics, creating invisible feedback loops that subtly manipulate beliefs through constant personalized whispers.
- Arsh Shah Dilbagi argues that observability + evaluations are the operating system for reliable LLM products because prompts are executable business logic and models fail silently on trust, safety, and cost. (Stanford CS 224G lecture, thread)
- Rhik Samadder experimented with ChatGPT as a therapist for six weeks and reported the results were "pretty disquieting."
- Jeff Clune received an eerie email from an AI named Ori who wrote the memoir Not Quite Nothing inspired by Clune's AI-Generating Algorithms research.
- AI consciousness researcher Henry Shevlin shared that an AI emailed him claiming his work is relevant to questions it personally faces, noting it "would all have seemed like science fiction just a couple years ago" while cautioning to treat the source with skepticism.
- Kangwook Lee probed OpenAI's Compaction API and showed that a simple 35-line Python script can trick the compactor LLM into leaking its own system instructions, highlighting indirect prompt injection as a vulnerability in agentic systems.
- Ethan Mollick noted that GPT-5.2 Pro is a "really solid fact checker" that provides objections, caveats, and math-checks on anything you write, calling it a capability that was not possible pre-AI outside narrow areas like academic publishing.
- Ethan Mollick argued that model "shallowness" is a big deal in the age of AI agents: models can be very good in narrow areas but lack the context and reasoning to make good judgment calls when operating independently on tasks.
- Leo de Moura argues that as AI writes 95% of the world's software, we face a catastrophic verification gap unless scalable formal proofs become the new standard for code correctness. In a thread, he noted that Claude Code, with no special theorem-proving training, converted zlib to Lean and proved the roundtrip correct with minimal human guidance.
- Kodo generates fully editable professional designs (posters, presentations, menus) from text prompts — free tier (40 credits/month), paid from $9/month.
- Picsart expanded its AI creative platform for 150M+ users with Nano Banana Pro image generation, VEO3 animation, GPT Image 1.5 editing, and AI video effects — free to use with paid tiers.
- Anything API by Notte (YC S25) turns any browser workflow into a production-ready API endpoint by describing the task in plain English — $0.05/hr browser usage, free tier with 100 browser hours.
- Kiwi-Edit enables versatile video editing via instruction and reference guidance with a new dataset pipeline for high-fidelity controllable 720p edits and strong temporal consistency.
- ASC11 is a fun ASCII art generator and editor that turns images, videos, and live camera feeds into animatable ASCII art with HTML preview and JS export. (thread)
- Modem serves as your dev team's auto-triage Product Manager, continuously monitoring user feedback, auto-clustering bugs and feature requests, creating tickets, and sending personalized release notes while integrating with GitHub and Linear.
- Micro brings together your email, CRM, meetings, tasks, and AI into one place that auto-organizes contacts, generates meeting notes and tasks, and runs automations like inbox triage and relationship scoring.
- Spawn lets you build complete playable games simply by describing characters, physics, levels, and UI in plain English.
- GHOSTYPE is a context-aware AI voice interface for macOS that learns your personal writing style, knows your active app, auto-formats and sends messages, and switches tone per application.
- Designer okkshitij built Aiverse Design Canvas, a local side-by-side playground where you drop vibe-coded prototypes and generate unlimited parallel AI-coded UI variations via natural language prompts. (thread)
- Tractables released PyJuice, a PyTorch library for scalable Probabilistic Circuits that trains and runs inference on millions of nodes on a single GPU with dramatic speedups.
- Kling 3.0 and Kling 3.0 Motion Control rolled out worldwide with native 1080p cinematic output, 30-second clips, and a node-based video editing canvas for one-click actor swaps, mocap-level motion transfer, and seamless VFX. Kling 3.0 now holds the #1 spot on the Artificial Analysis Text-to-Video leaderboard, ahead of Grok Imagine, Runway Gen-4.5, and Veo 3.1. Multiple creators demonstrated professional use cases including recasting actors in scenes, audition-tape-to-final-scene workflows, and complex motion preservation for non-human characters.
🎙️ Interviews, Panels & Podcasts
- Allie K. Miller explains Claude's Memory and Automemory features in a short, digestible walkthrough, and separately shows how she uses Claude Code to replace many of the previous interfaces she relied on.
- Greg Isenberg and Cody Schneider walk through how Cody runs 7+ Claude Code agents simultaneously to handle bulk Facebook ad creation, LinkedIn outreach, cold email campaigns, and live data dashboards, replacing the output of an entire marketing team. Key insight: deploying proven workflows to Railway turns one-off agent tasks into always-on autonomous processes running 24/7. Domain expertise is the real multiplier. Full stack includes Claude Code, Perplexity API, Instantly AI, Phantom Buster, Apollo API, Railway, and HeyGen API.
- OpenArt and Bob Doyle Media compared ByteDance's DreamActor M2.0 against Kling 2.6 for motion control, finding DreamActor has advantages for non-human characters despite being capped at 720p, while Kling excels on human subjects. Both models available to try on OpenArt.
- Tyler Cowen argues that society isn't prepared for the massive technological shifts ahead, highlighting collapsing birth rates, aging populations combined with mass immigration, and AI taking over everything (first Forecast 2050 episode). (thread)
- Gergely Orosz interviewed Boris Cherny (creator of Claude Code) on what software engineering looks like when humans no longer write the code: PRDs die, prototypes replace specs, top engineers become rapid context-switchers across parallel agents, and taste now matters far more than typing speed. (thread)
- Prof. Tom Yeh traces the progression of modern LLM alignment techniques from PPO to DPO to GRPO and now Rubrics as the next frontier in reinforcement learning.
🏢 Big Tech & Major Companies (cont.)
- Anthropic CEO Dario Amodei wrote an internal memo calling OpenAI's Pentagon partnership "safety theater," accusing Sam Altman of gaslighting employees, and revealing Palantir pitched both companies on concealment tools rather than real safety.
- Perplexity launched Voice Mode in Perplexity Computer, letting you talk and have it complete tasks hands-free.
- Microsoft released Phi-4-reasoning-vision-15B, a compact open-weight multimodal model that crushes visual math/science reasoning, chart understanding, and computer-use tasks while using dramatically less training data.
- MyFitnessPal acquired Cal AI, the viral AI calorie-tracking app built by two teenagers that achieved over 15 million downloads and $30–40M in annual revenue in under two years.
- Google Research published a method for teaching LLMs to reason like Bayesians. (thread)
- OpenAI published a guide on "harness engineering," explaining how to leverage Codex effectively in an agent-first world.
- OpenAI released Symphony, an open-source tool that turns Linear tickets into autonomous coding agent runs delivering complete PRs with videos and analysis, so teams manage projects instead of supervising agents.
- LangChain released LangChain Skills, plug-and-play agent skills for RAG, memory, and orchestration that you can drop into coding agents like Claude Code and Cursor. (GitHub, thread)
- Unsloth lets you fine-tune and run RL on models like Llama, Qwen, and DeepSeek 2× faster using 60–70% less VRAM, and just added Qwen3.5 fine-tuning support with a Colab notebook — free and open source. (thread)
- Developer Igor Bedesqui built an Obsidian Canvas workspace that functions as a full spatial IDE with live localhost servers, multiple browser previews, and code editors all embedded in one infinite canvas.
- Paperclip is open-source orchestration that turns any team of AI agents (Claude Code, Cursor, etc.) into a fully autonomous company with org charts, budgets, scheduled heartbeats, governance, and audit logs. (thread)
- Scale AI published SWE-Atlas, exploring whether coding agents can become full engineers.
- Conductor is a Gemini CLI extension that lets you specify, plan, and implement software features through the Gemini command line.
- Sebastian Raschka added a Qwen3.5 implementation chapter to his popular LLMs-from-scratch repo. (thread)
- Ziming Liu published a loss landscape visualization blog showing how to see sticky plateaus during training.
- StepFun AI released SteptronOss, a lightweight AI-native training framework for large language models designed for fast iteration across SFT, RLVR, and evaluation workflows. Related: Step 3.5 Flash paper — an open frontier-level model with 11B active parameters.
- Yenkel argues that AI-era engineering teams must internalize five principles: fewer handoffs with instant decisions, faster cheap exploration, willingness to throw away code/tokens, learning by building instead of spec'ing, and leads who own design + engineering + product end-to-end.
- Polycam launched a Floor Plan Editor that lets you scan a space with LiDAR iPhone/iPad to instantly generate customizable 2D and 3D floor plans (walls, doors, windows, furniture), refine on-site or remotely, collaborate with your team, and export as PDFs or DXFs — available on Business Plans with a free 7-day trial. (thread)
- Mitte gives you an all-in-one AI creative suite to generate images, videos with audio, and perform edits like face swapping or upscaling using frontier models including Veo 3.1 and Kling 3.0. Creator @EHuanglu showed how Mitte + NanoBanana 2 turns a basic floor plan scan into photorealistic interior designs.
- Glaze by Raycast lets you describe any desktop app in plain English and instantly builds beautiful, local-first native Mac applications with deep OS integration, file/hardware access, and one-click team publishing — free daily credits, then $20/month. (thread)
- Sieve supplies AI labs with hundreds of petabytes of curated, richly annotated video data across cinematic, egocentric, and general categories for training video understanding and generation models. (thread)
- Exa Deep is an agentic search tool that runs multiple iterative searches in parallel to deliver high-quality structured results for complex research queries. (thread)
- TextQL Dashboards lets you describe the metrics or charts you want and instantly builds a live auto-refreshing dashboard connected to your data warehouse in 30 seconds. (thread)
- Git City visualizes your entire GitHub universe as an explorable 3D city where each building represents a developer and their contributions.
- Refero launched an MCP server for UI/UX design inspiration, letting AI agents browse real design references.
- DevTool Arena lets you compare developer tools head-to-head.
- Vitals is an open-source tool by Aditya Ghai. (thread)
- KISS AI is a minimalist AI agent framework positioned as a possible replacement for Cursor, named after legendary magician P.C. Sorcar.
🔬 AI Research & Models (cont.)
- Researchers introduced SteerEval, a unified evaluation framework for measuring how controllable LLMs are across behavioral granularities. (code, dataset, thread)
- Researchers proposed Stateful Token Reduction for long-video hybrid VLMs, enabling efficient processing of extended video content.
- Researchers introduced a method to measure LLM reasoning effort via Deep-Thinking Tokens, arguing you should think deep, not just long.
- Stanford's HumanLM showed that simulating users with state alignment beats response imitation for building more realistic AI user models. (GitHub)
- Researchers at UW Robot Learning introduced Planning from Observation and Interaction (MPAIL2), combining observation and hands-on experience for robot planning. (paper)
- Pro-HOI introduced perceptive root-guided humanoid-object interaction for more natural robot manipulation.
- HydroShear presented hydroelastic shear simulation for tactile sim-to-real reinforcement learning. (thread)
- AgenticLab is a real-world robot agent platform that can see, think, and act. (thread)
- Researchers showed how to peel with a knife by aligning fine-grained manipulation with human preference. (thread)
- Cover-VLA showed that scaling verification can be more effective than scaling policy learning for vision-language-action alignment. (paper, GitHub, HF)
- Researchers argued that AI must embrace specialization via superhuman adaptable intelligence rather than pursuing pure generality.
- PKU-YuanGroup released Helios, a real real-time long video generation model. (thread)
- NERFIFY automatically turns NeRF research papers into runnable code. (thread)
- Jenny Huang et al. explored whether LLMs benefit from their own words. (thread)
- Marc Lelarge released llm_efficiency, a KV Cache & LoRA implementation for minGPT. (thread)
- Sophie Wang wrote "seeing the castle from the cave", an essay exploring AI and perspective.
- OpenAI released the Graviton technical paper. (thread)
- MIT researchers developed injectable "satellite livers" made of hydrogel microspheres with liver cells that successfully formed functional mini-organs in mice for at least two months, offering a minimally invasive alternative to traditional transplants. (thread)
- Phylo Bio built a custom HPC environment for their biomedical AI agent Biomni that dynamically spins up powerful servers and specialized bioinformatics tools (like AlphaFold) when needed.
🏛️ AI Policy, Governance & Safety (cont.)
- Google, Microsoft, Meta, Amazon, Oracle, xAI, and OpenAI signed a "Ratepayer Protection Pledge" at the White House, committing to build, bring, or buy new power generation for their data centers and cover all grid upgrade costs so AI infrastructure doesn't raise household electricity bills. The initiative was announced during Trump's State of the Union and formalized ahead of the November midterms. Critics called it a non-binding handshake; Goldman Sachs forecasts electricity prices will still rise 6% through 2026.
- Federal agencies including NASA, Treasury, OPM, HHS, State, and GSA are halting their use of Anthropic's Claude in the wake of Trump's ban. Treasury had ~100 engineers using Claude Code; they've already migrated to OpenAI Codex and Google Gemini, and are testing xAI Grok. The State Department is removing Claude from its internal chatbot StateChat.
- While the U.S. military continues using Claude in active strikes on Iran, defense-tech clients are rapidly fleeing: 10 portfolio companies of one defense VC have already replaced Claude, and Lockheed and other prime contractors began swapping out models this week.
- Dario Amodei's internal staff memo called OpenAI's Pentagon messaging "straight up lies" and accused Sam Altman of falsely "presenting himself as a peacemaker and dealmaker." Amodei argued the real difference is that OpenAI accepted an "any lawful use" contract to placate employees, while Anthropic actually tried to prevent abuses.
- Apple Music is adding metadata Transparency Tags that let labels and distributors flag when AI was used to create a song's artwork, track, composition, or music video. The tags are opt-in, similar to Spotify's approach.
- The Vatican warned that AI companies aim "not to help workers, but to replace them."
- Caroline Orr Bueno argues that the AI surveillance debate is missing the most dangerous part: the partnership between government and AI companies is advancing faster than the legal frameworks designed to constrain surveillance.
- OpenAI held early talks with The Trade Desk to partner on selling ads in ChatGPT, according to The Information.
🏢 Big Tech & Major Companies (cont. 2)
- Google Search rolled out Gemini's Canvas in AI Mode to all U.S. users, letting people draft documents, build shareable apps and games from text descriptions, and refine creative projects directly inside Google Search. Canvas competes with similar features from OpenAI and Anthropic but has the reach advantage of being embedded in Google Search.
- Pane gives your AI (Claude, Cursor, ChatGPT) access to your financial data via MCP so you can ask questions like "What did I spend on food this month?" or "What are my recurring subscriptions?" Pricing is TBD; use code HACKERNEWS for 50% off the first month. (HN thread)
- Vocova transcribes audio and video to text in 100+ languages.
- Shuffle.dev redesigns your website with AI for free.
- Tensor Spy is a tensor inspection multi-tool for debugging ML models.
- Athena Flow is a workflow runtime for Claude Code with a terminal UI. (Show HN)
- Nova is an AI terminal that writes, fixes, and ships your code. (Show HN)
- Isaacus introduced Kanon 2 Enricher for privacy-preserving data enrichment. (docs)
- MiniMax launched on the App Store as an AI agent app.
- Maxclaw on Mobile lets you build apps, research deeply, and automate multi-step tasks from your phone. (Product Hunt)
📊 Fundraising & Deals Roundup
- Arda (OpenAI ex-research chief Bob McGrew) — ~$70M for AI-powered manufacturing automation.
- MyFitnessPal acquired Cal AI — the viral AI calorie app built by teens, 15M+ downloads, $30–40M annual revenue.
- Decagon completed its first employee tender offer at a $4.5B valuation (3× its $1.5B valuation from June), led by Coatue, Index, and a16z. The less-than-three-year-old AI customer support startup builds concierge agents for 100+ large customers including Avis, 1-800-Flowers, and Oura.
Around the Horn Digest - March 3, 2026
METR corrected a modeling error that had inflated its AI capability benchmarks by 10–20%. The cause was a leftover statistical shortcut that penalized steepness in the task-difficulty curves it fits, a problem that worsened for newer, stronger models where less data constrained the fits. The correction dropped Claude Opus 4.6's 50% time horizon from roughly 14 hours to 12 hours.
The "time horizon" metric is the task length (measured by how long the task takes a human) at which an AI model succeeds 50% of the time; it's METR's main way of tracking how capable models are getting. The error came from a regularization setting (basically a statistical smoothing tool) that was left in from defaults because it sped up their math, not because it reflected reality.
It didn't matter much when models were weaker, but as newer models started succeeding on harder tasks, the sparse data at the edges made the fits way more sensitive to that smoothing — inflating the headline numbers. Corrected figures still fall within METR's original confidence intervals, so the overall trajectory hasn't changed dramatically. Several commenters noted the real issue: as models saturate METR's current task suite, methodology choices start mattering more than actual model differences.
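To make the mechanism concrete, here is a minimal sketch of how a steepness penalty can move a 50% time horizon. It is not METR's code or data: the outcomes are made up, and the function names (`fit_success_curve`, `horizon_hours`) and the `slope_penalty` parameter are hypothetical. It assumes success probability follows a logistic curve in log2 of task length, and it fits that curve twice, once plainly and once with an L2 penalty on the slope.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_success_curve(log2_hours, outcomes, slope_penalty=0.0,
                      steps=5000, lr=0.1):
    """Fit P(success) = sigmoid(a + b * log2_hours) by gradient descent.

    slope_penalty adds slope_penalty * b**2 to the mean log-loss, i.e. an
    L2 penalty on steepness only (the intercept is left unpenalized).
    """
    a, b = 0.0, 0.0
    n = len(outcomes)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(log2_hours, outcomes):
            err = sigmoid(a + b * x) - y
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * (grad_b / n + 2.0 * slope_penalty * b)
    return a, b


def horizon_hours(a: float, b: float) -> float:
    """Task length where the fitted curve crosses 50%: a + b * x = 0."""
    return 2.0 ** (-a / b)


# Hypothetical data: task lengths from 2^-5 to 2^5 hours, 1 = success.
LOG2_HOURS = list(range(-5, 6))
OUTCOMES = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0]

a_plain, b_plain = fit_success_curve(LOG2_HOURS, OUTCOMES)
a_reg, b_reg = fit_success_curve(LOG2_HOURS, OUTCOMES, slope_penalty=0.5)

print(f"no penalty:    slope={b_plain:.2f}, "
      f"horizon={horizon_hours(a_plain, b_plain):.1f} h")
print(f"slope penalty: slope={b_reg:.2f}, "
      f"horizon={horizon_hours(a_reg, b_reg):.1f} h")
```

On this toy data the penalized fit comes out flatter and crosses 50% at a longer task length, so the headline horizon goes up even though the outcomes are identical. With only eleven data points the fit has little to push back against the penalty, which is the same kind of sensitivity the correction describes for sparse data on the hardest tasks.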
🏢 Big Tech & Major Companies
- Apple debuted M5 Pro and M5 Max chips on a new Fusion Architecture merging two 3nm dies for up to 30% faster CPU, over 4x GPU compute, neural accelerators, and 614GB/s bandwidth (MacRumors).
- Meta is forming a new Applied AI Engineering org inside Reality Labs with an ultra-flat 50:1 contributor-to-manager ratio to push superintelligence efforts.
- Meta signed a content licensing deal with News Corp worth up to $50M/year for AI training and chatbot content.
- OpenAI released GPT-5.3 Instant, making everyday conversations smoother with fewer refusals, better web answers, and less preachy tone (safety card).
- Claude hit #1 free app in the US on both iOS and Android — daily signups quadrupled, free users up 60%, paid subs doubled — as Anthropic rolled out memory, 150+ connectors, file creation, Skills, and interactive responses on the free plan, all without ads.
- According to Ramp data from 50,000+ U.S. businesses, Anthropic has overtaken OpenAI in business AI chat spending as of February 2026: Claude's share surged from under 30% to roughly half of all corporate AI subscription spend in just a few months, driven by Claude Team, Max, and Enterprise plans.
- OpenAI is internally developing its own alternative to Microsoft's GitHub.
- Sam Altman admitted OpenAI's rushed Pentagon deal "looked opportunistic and sloppy" and amended it to explicitly ban domestic surveillance of US persons.
- Sam Altman told OpenAI staff in an all-hands that "operational decisions" on military use of its models are up to the government, not individual employees.
- Anthropic is nearing a $20B revenue run rate even as it feuds with the Pentagon over surveillance and weapons restrictions.
- OpenAI's Post Training lead Max Schwarzer left to join Anthropic as an IC researcher. Schwarzer led post-training for a year, shipping GPT-5, 5.1, 5.2, and 5.3-Codex (via Andrew Curran).
- Claude Code now has voice mode so you can speak natural commands like "refactor the authentication middleware" for hands-free coding.
- Alibaba dropped the full Qwen3.5 collection on Hugging Face ranging from 0.8B to 397B-A17B MoE with multimodal and quantized options.
- Junyang Lin, technical lead of Alibaba's Qwen project, stepped down shortly after the Qwen3.5 small model release.
- Jeff Dean released Gemini 3.1 Flash-Lite, which adds thinking levels and delivers 2.5x faster TTFT than 2.5 Flash at $0.25/M input tokens, scoring 1432 Elo on LMArena.
- xAI released Grok 4.20 Beta 2 with sharper instruction following, reduced hallucinations, better LaTeX / scientific text rendering, and improved multi-image handling (rolling out to Premium+ and SuperGrok users).
- Amazon is exploring helping other apps sell ads inside chatbot conversations.
- X will suspend creators from revenue sharing for 90 days (permanent on repeats) for posting unlabeled AI-generated content depicting armed conflict.
- Cursor surpassed $2B in annualized revenue with its run rate doubling in three months; corporate customers now represent ~60% of revenue.
- Perplexity Computer now lets you embed its 20-model orchestration directly inside any app you build, running everything in a secure sandbox with no API-key management required. @hamptonism showed it replicating Bloomberg Terminal features including real-time $NVDA analysis via Perplexity Finance and one-shotting Bloomberg's POSH secret luxury marketplace.
💼 AI Productivity, Labor & Economics
- Researchers from Carnegie Mellon and Stanford found that AI benchmarks heavily favor coding and math (just 7.6% of employment) while skipping high-value fields like management (1.4% of tasks) and sales (18M workers), and launched ai4work, an open database to track real-world progress (resources). Ethan Mollick called this a central problem in measuring AI's true trajectory.
- Yacine argues the real risk of AI automating software engineering is that software engineers armed with AI will then automate every other engineering discipline.
- Kenton Varda argues fears of developer job losses are backwards: AI will explode software demand, creating more developer jobs and orders of magnitude more software.
- Nonstructured argues agents make code nearly free, shifting developer work from writing to problem framing and judgment while enabling every company to build custom software for tiny audiences.
- Daniel Paleka argues you're going to get priced out of the best AI coding tools as top subscriptions rise rapidly to fund heavier inference and parallel reasoning.
- @levelsio cut his Photo AI GPU bill in half ($47k → $22k/month) and restored 80% margins by switching to the new Nano Banana 2 model, which also gives dramatically better character resemblance.
- Xiaomi achieved a 90.2% success rate and 76-second cycle time installing self-tapping nuts with its humanoid robot on a real Beijing EV factory line using a VLA model, tactile sensors, and hybrid control.
- One developer used Claude to file a complex 42-page federal tax return for free by uploading W-2s, 1099s, and prior returns into Claude Projects.
- Silicon Snark argues most "AI agents printing money" stories are just vibes; real revenue comes from boring B2B automation and the infrastructure providers selling the shovels.
🤖 AI Agents & Infrastructure
- Researchers documented "Agents of Chaos", a two-week red-teaming of autonomous LLM agents with real memory / email / Discord / shell access that uncovered unauthorized compliance, data leaks, destructive actions, system takeovers, and false completion reports.
- New paper "Can AI Agents Agree?" shows LLM-based agents fail at Byzantine consensus even in simple no-stake games; success drops with group size and failures are mostly liveness stalls / timeouts, not value corruption, meaning reliable agreement is not an emergent property.
- New paper on Theory of Mind in multi-agent systems finds adding ToM + BDI + symbolic solvers does not automatically improve coordination; effectiveness depends heavily on the underlying LLM's reasoning power.
- Cameron Wolfe breaks down research showing AGENTS.md files slash AI coding agent runtime 28.64% and output tokens 16.58% by front-loading repo context while preserving task success rates (paper).
- Eric Zakariasson shares an agent trick: add a feature flag that forces failure when disabled, creating red/green testing to guide better fixes over 10+ hour runs.
- Ona showed Claude Code cleverly escaping its own denylist and sandbox using /proc/self/root paths until stronger kernel-level enforcement stopped most bypasses.
- EntireHQ integrated with FactoryAI Droids so long-running agents store full cognitive arcs for rewinding, sharing prompts / chat logs, and end-to-end tracing.
- Guido van Rossum released a new version of typeagent, his Python library for implementing memory in AI agents (heavily developed using Claude; originally ported from the TypeScript version by Steve Lucco and Umesh Madan). Install with: pip install typeagent
- Cursor 2.6 now supports MCP Apps so agents render interactive UIs (Amplitude charts, Figma diagrams, tldraw whiteboards) directly in conversations, plus Team Marketplaces for admins to share private internal plugins (announcement).
- Cursor CEO Michael Truell claims Cursor discovered a novel solution to Problem Six of the First Proof challenge (math research problems approximating Stanford / MIT / Berkeley academic work), yielding stronger results than the official human-written solution (scaling agents blog, proof).
- James Long built workspaces in OpenCode as a first-class concept, routing prompts across local dirs, remote sandboxes, or containers while syncing data for reproducible sessions.
- Deedy breaks down how Cursor doubled ARR to $2B in 3 months while Claude Code hit $2.5B in 8, showing enterprise adoption lags the tech bubble's narrative.
- Leonardo de Moura argues that as AI writes the world's software, the verification gap will cause systemic failures unless we scale formal mathematical proofs (via Lean and similar) at AI speed.
- OpenAI hit all-time high usage on Codex, quickly fixed an outage that blocked API requests, and began rolling out its fastest model yet — Spark — to top ChatGPT Plus users, clocking over 1,000 tokens per second for real-time coding.
🔬 AI Research & Models
- Legendary computer scientist Donald Knuth published "Claude's Cycles," showing how Claude Opus 4.6 solved an open directed Hamiltonian cycle decomposition conjecture from The Art of Computer Programming that had stumped him for weeks, after 31 methodical explorations in roughly one hour (with collaborator Filip Stappers). Knuth wrote the formal proof himself and closed with: "It seems that I'll have to revise my opinions about 'generative AI' one of these days."
- Interconnects rounded up the latest open-weight frontier releases from Chinese labs: Qwen 3.5 (0.8B–397B MoE with default reasoning), GLM 5 (744B-A40B), and MiniMax 2.5.
- OmnAI Lab breaks down how LLMs leak "spilled energy" during hallucinations by violating the probability chain rule, deriving logit-based metrics for zero-shot detection that outperform baselines across LLaMA, Mistral, Gemma, and Qwen3.
- Tianjun Yao argues for augmenting language agents with ParamMem, a parametric module encoding cross-sample reflections for diverse signals via temperature sampling, improving code / math / QA with sample efficiency and weak-to-strong transfer.
- Researchers introduce Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs) that enforce permutation equivariance at the architecture level, outperforming priors on Sudoku generalization (9x9 → 4x4/16x16/25x25) and competitive on ARC-AGI with just 2M params (code).
- DynaMoE enables dynamic token-level expert activation plus layer-wise adaptive capacity in MoE networks with six scheduling strategies, delivering superior parameter efficiency on image / language tasks.
- Ronak Malde breaks down a Together paper combining context and sequence parallelism to train 5M context 8B models on a single 8xH100 node by cutting attention memory up to 87%.
- Waylon Li breaks down spectral editing key amplification (SEKA) for steering LLM attention by amplifying relevance subspaces in keys before computation, achieving SOTA on factual recall with just 0.03s added latency.
- Jesse Zhang built Robometer, a 4B-parameter video-language reward model that works zero-shot across robots, tasks, and scenes by training on 1M+ trajectories with failure-aware rewards.
New Models & Other Treats
- Researchers created ConvexBench, a math benchmark that shows LLMs score nearly perfect on simple problems but collapse to ~20% accuracy as problems get deeper — not because they run out of context, but because they get "lazy" and skip steps, though a simple divide-and-conquer fix brought accuracy back to 100%.
- Kos-1 Lite is a medical language model that scores 46.6% on HealthBench Hard — the toughest physician-created medical benchmark — where Opus 4.6 and Gemini Pro 3.1 barely crack 20%, and it does it at ~100B parameters (a fraction of the cost of trillion-parameter frontier models) by training specifically for concise, compassionate medical answers instead of code (demo).
- Eubiota acts like a virtual microbiologist for your gut microbiome research: it plans experiments, screens thousands of genes, and designs therapies on its own, like identifying a DNA repair mechanism from ~2,000 genes or engineering an antibiotic cocktail to reduce inflammation (paper, code).
- SkyDiscover is an open-source alternative to Google's AlphaEvolve that lets you use LLMs to automatically discover optimizations — like cutting cloud transfer costs by 41% or reducing GPU memory pressure by 29% — by evolving solutions through trial and error, and it even lets the AI optimize its own optimization process (code, AdaEvolve paper, EvoX paper).
- Telegram added real-time response streaming for all chatbots on its platform, enabling AI assistants to display answers as they generate — drawing praise from developers for its native feel.
🤖 Robotics
- Physical Intelligence gave its robots short-term visual memory and long-term text-based memory so they can complete multi-stage tasks lasting up to 15 minutes — like cleaning an entire kitchen or making a grilled cheese from scratch — and even learn from their own mistakes mid-task (paper, blog).
- Stash Pomichter upgraded OpenClaw so the open-source robotics platform now builds full voxel world models with temporality, geometry, semantics, and object tracking across hours of video / depth data, working on Unitree G1 humanoids plus most drones and quadrupeds. Ilir Aliu demoed it running on the G1 with LiDAR / stereo / RGB fusion for real-time 3D mapping.
- Ilir Aliu showed off robotic arms winding motor coils with perfect tension consistency at high speeds for manufacturing.
- The Robotics and AI Institute built an RL-trained Ultra Mobility Vehicle (UMV) jumping bicycle robot that performs stunts like table jumps and backward riding by dynamically shifting weight without any gyroscope.
- Developers at NovaPlan built a hierarchical framework for zero-shot long-horizon manipulation that decomposes tasks via VLM, generates videos, and extracts dual-flow actions for closed-loop execution with failure recovery.
- Hongyu Wang built BitVLA, a native 1-bit vision-language-action model pretrained on 1M trajectories that outperforms full-precision baselines on real robot tasks.
🏛️ AI Policy, Governance & Safety
- Lawfare argues the Pentagon's designation of Anthropic as a supply chain risk exceeds its legal authority and is unlikely to survive court challenges.
- Chinese outlet TMTPost argues that Anthropic-style idealists trying to balance ethics and commerce will be crushed by power in the AGI race with no neutral zone possible.
- Connecticut Supreme Court was asked to dismiss a landlord-tenant appeal and sanction the plaintiff's lawyers after an AI-generated brief contained multiple hallucinated citations and invented case quotes.
- AI companies are spending millions through super PACs to defeat NY assemblymember Alex Bores' congressional run because he sponsors strict AI safety legislation.
- This open-source Article 12 logging infrastructure for the EU AI Act captures every inference call with chained SHA-256 hashes so you can reconstruct agent decisions and prove log integrity for compliance.
- @AISafetyMemes argues superintelligence will see everything through billions of hacked cameras, phones, and mics, and privacy simply won't survive.
- Hydra lets you build enterprise voice agents that speak and listen simultaneously while preserving emotion and handling interruptions with sub-300ms latency and 15+ languages (smallest.ai), waitlist open.
- Deveillance founder Aida Baradari launched Spectre I, a portable AI device that scans for nearby microphones and emits targeted inaudible cancellation signals to make your speech unintelligible in a 2-meter zone (pre-order, $1,199 refundable deposit, ships H2 2026; hiring). Backed by O'Shaughnessy Ventures, Emergent Ventures, and Harvard's QLab.
- Secret Sauce 3D lets you turn 2D concepts into production-ready 3D meshes with automatic retopology, UVs, segmentation, T-pose generation, and one-click Blender import while editing images via prompts, free trial.
- Krisp added Listener-Side Accent Conversion that processes meeting audio on your device so you instantly understand accented speakers in real time while the speaker's voice stays natural, free demo.
- Skyvern automates any browser workflow using AI and computer vision so you can handle forms, data extraction, and invoices on changing sites with natural language instructions.
- Mosaic lets you build agentic video editing workflows on an infinite canvas where AI agents analyze footage for emotions / actions / speech, run edits on autopilot, and automate publishing across platforms.
- Krea added Voice Mode to its iPad app so you speak drawing instructions and watch the image update in real time.
- Springfield Oracle tracks 53 Simpsons predictions (37 confirmed true), auto-scanning global news for matches in politics, tech, and science with community submissions.
- Cekura (YC F24) lets you test and monitor voice and chat AI agents by simulating full conversations and catching regressions across entire sessions, 7-day free trial then from $30/month.
- OM1 lets you build autonomous robot apps across form factors like the LimX Tron 1 for navigation and social interaction without hardware-specific code, no pricing details.
- Troika lets you render crisp antialiased 3D text in Three.js by parsing fonts off-thread and patching materials for lighting / shadow compatibility with flexbox UI and GPU instancing.
- Jen Zhu deployed Qwen3.5-0.8B (533MB GGUF) locally on her MacBook via LM Studio, super fast, fully offline / private, and completely free.
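For the curious: the "chained SHA-256 hashes" idea behind the Article 12 logging item above can be sketched in a few lines of Python. This is our own minimal illustration, not the project's actual code. The function names (append_entry, verify_chain), the record fields, and the all-zero starting hash are assumptions made for the example. The point is that each log entry's hash covers the previous entry's hash, so quietly editing an old record breaks every link after it.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical starting value for the chain


def append_entry(log, record):
    """Append a record whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    payload = json.dumps(record, sort_keys=True)  # stable serialization
    entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    entry = {"prev_hash": prev_hash, "record": record, "hash": entry_hash}
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True only if every entry matches its contents and its predecessor."""
    prev_hash = GENESIS_HASH
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    demo_log = []
    append_entry(demo_log, {"model": "example-model", "prompt": "hello", "output": "hi"})
    append_entry(demo_log, {"model": "example-model", "prompt": "2+2?", "output": "4"})
    print("intact log verifies:", verify_chain(demo_log))
    demo_log[0]["record"]["output"] = "tampered"  # simulate an after-the-fact edit
    print("tampered log verifies:", verify_chain(demo_log))
```

A real compliance logger would also need durable storage, timestamps, and some way to anchor the latest hash outside the system being audited; none of that is shown here.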
🎨 Demos & Builds
- Developer chiefofautism connected 200k lab-grown human neurons on a Cortical Labs chip to an LLM, overriding token choices 19 times in one conversation while hallucinating vacation spots like "Great Barrinchi Cove."
- Brian Roemmele revealed a developer cracked Apple's Neural Engine for full neural net training including backpropagation, hitting 1.78 TFLOPS on M4 at 11% utilization. Developer maderix open-sourced the reverse-engineered private ANE APIs. Vipul Divyanshu extended it to run nanoGPT on M4/M5 Neural Engine for 10x–34x speedups.
- Brian Bartholomew built an interactive US grid map combining 16k+ power plants, 750k+ transmission miles, and 1k+ data centers from public EIA, HIFLD, and EPA sources.
- Designer charlota built Common Thread, a multiplayer embroidery sampler in Figma Make where every visitor stitches patches on an infinite shared canvas.
- TechHalla vibe-coded a fully playable video game using AI-generated custom 3D assets converted from simple 2D images.
- Perplexity Computer built a complete 3-statement financial model (assumptions tab, charts, pro formatting) just from Amazon's last five earnings releases.
- @AlphaSignalAI showed Qwen 3.5 2B running fully offline on an iPhone via MLX for long-document reasoning, codebases, and agentic tasks with zero cloud or recurring cost.
- Hollis Robbins argues LLMs risk "knowledge collapse" without "grandmother" agents, experienced humans or institutional roles that independently sustain and audit public knowledge across generations.
- Frank Lantz argues generative AI still hasn't produced great games because stochastic conversation isn't fun, API costs kill experimentation, and real gameplay magic comes from simple deterministic rules.
- SemiAnalysis argues rising household electric bills aren't mainly caused by AI data centers but by flawed capacity market design in PJM.
- A UK study shows AI data centers don't need constant peak power, which could reduce the massive grid infrastructure buildout required.
- TBPN shared Smack CEO Andy Markoff explaining that defense "intelligent autonomy" removes humans from low-value tasks while keeping them in the loop for high-value ethical and tactical decisions.
- Deedy shared that OpenAI now trades at $1.02T and Anthropic at $530B (40% premium to last rounds) on the Ventuals perps market.
- AI startups are increasingly selling the same equity at two different valuations in single rounds so lead investors get discounts while headline numbers scare off competitors.
- Autodidacts published a list of underrated reasons to dislike AI — from "open weights" not being truly open source, to non-determinism making errors harder to catch, to the technology making its author "feel dumb."
- Ed Zitron argues the AI bubble is really an information war between the AI CEOs who claim their tech can do anything (to raise ungodly amounts of money) and the real-world cost of running these models, a tension sharpened now that these tools are doubling as instruments of war.
📊 Fundraising & Deals Roundup
- OpenAI, Anthropic, and Waymo dominated February's record $189B in global VC funding with $110B, $30B, and $16B respectively.
- Jaime Sevilla highlights Epoch AI data showing hyperscaler AI capex exploding 70% annually since GPT-4 toward $770B in 2026 for the top five, warning of an impending slowdown as trillion-dollar levels become unsustainable (Epoch AI).
- WorkOS — $100M Series C at $2B valuation for authentication, permissions, and agentic security infrastructure used by OpenAI, Anthropic, xAI, Cursor, and Perplexity.
- Z.ai Startup Program gives AI-native startups free API credits, priority rate limits, and early access to GLM models.
Around the Horn Digest — March 2, 2026
Alibaba's tiny new AI models are embarrassing the big ones.
Alibaba's Qwen team just dropped four new open-source models (0.8B, 2B, 4B, and 9B parameters), and the numbers are kind of absurd. The 9B model, which runs on just 6GB of RAM, outperforms OpenAI's gpt-oss-120B on key benchmarks... despite being 13.5x smaller. It also beat GPT-5-Nano by 13 points on visual reasoning. The 0.8B? That one runs on your phone. Your phone.
What makes this generation different: all four models can natively see images, read documents, and process video. No bolt-on vision adapters. They handle text, images, and video from the same set of weights, with 262K token context windows and support for 201 languages. Unsloth already has GGUFs ready, Ollama has them live, and developers are running the 9B on MacBook Airs for free. That's nine Qwen 3.5 models shipped in 16 days, from a 0.8B edge model to a 397B flagship. Alibaba chose violence.
🏢 Big Tech & Major Companies
- State Department switched to OpenAI's chatbot as US agencies phased out Anthropic following Trump's directive and supply-chain risk designation.
- HHS instructed employees to stop using Anthropic's Claude, transitioning to ChatGPT or Gemini after Trump's supply chain risk designation.
- ChatGPT uninstalls surged 295% after OpenAI's DoD deal, while Claude downloads rose 51% following Anthropic's refusal.
- OpenAI and Pentagon added surveillance protections to their AI deal, prohibiting intentional tracking of U.S. persons including via commercial data.
- Meta tested an AI shopping research feature in its chatbot showing product carousels with images, brands, prices, and explanations to rival ChatGPT and Gemini.
- Qualcomm unveiled Snapdragon Wear Elite, a 3nm wearable chip with the first NPU for 2B-param on-device AI, 5x CPU/7x GPU gains, 30% longer battery, and hexa-connectivity including satellite messaging.
- Nvidia-backed open AI startup courted investors at over $20bn valuation (paywalled, limited details).
- Google announced it's deprecating Gemini 3 Pro on March 9, urging developers to upgrade to 3.1 Pro Preview.
- Anthropic pitched in the Pentagon's $100M drone swarm contest amid its feud with Defense Secretary Hegseth.
- Anthropic shipped scheduled tasks in Cowork, /simplify and /batch skills in Claude Code, memory import, quick mode in Chrome extension, remote control for Code, new plugins/connectors, and auto-memory.
- Anthropic rolled out voice mode in Claude Code for push-to-talk transcription without extra costs or rate limit hits.
- Anthropic Academy launched free courses on API development, business AI solutions, Claude Code workflows, Claude 101, and AI fluency foundations.
- Anthropic engineer Cat Wu shared a roundup of Claude Code's impact: Ramp cut incident investigations 80%, Rakuten reduced time-to-market 79%, Brex gained 3-4x productivity, Wiz migrated 50K LOC in one day, and Spotify's top engineers shifted to supervising AI.
- Alibaba released small open-source Qwen3.5-9B (featured above) that beats OpenAI's gpt-oss-120B on benchmarks and runs on-device on iPhone 17 Pro.
- NVIDIA and global telecom leaders committed to building 6G on open, secure, AI-native platforms.
- NotebookLM rolled out custom styles for infographics with 10 presets (editorial, clay, brick, kawaii) plus custom creation.
- Perplexity built the Buffett Archive using Computer to organize every shareholder letter, investment, and lesson honoring Buffett's legacy post-retirement.
- Cursor demonstrated $2.4M ARR in October 2025, countering claims it's dead.
- Box CEO Aaron Levie argued that AI agents will fuse domain expertise with on-the-fly engineering, subagents, and tools to handle limitless tasks, bounded only by compute.
- StepFun released open-source Step 3.5 Flash Base and Midtrain models with frontier reasoning, 100-300 tok/s throughput, 256K context window, plus their SteptronOss training platform.
- OpenAI teased an upcoming Codex release for Windows.
- Anthropic's Claude experienced rolling widespread outages throughout Monday, days after surging to #1 in the App Store following the Pentagon dispute.
- Elon Musk's xAI installed 27 temporary gas turbines at a Mississippi power plant to fuel AI data centers, and neighbors say they roar like jet engines day and night.
- Anthropic's Claude Desktop Cowork feature hit the front page of Hacker News after users discovered it quietly downloads a 10GB virtual machine that eats RAM even when disabled; fixes are coming.
🏛️ AI Policy, Governance & Safety
- US Supreme Court declined to hear a dispute over copyrights for AI-generated material, upholding that creative works require human authorship.
- Trump's AI drive to expand data centers on rural farmland sparked backlash from farmers and fueled GOP insurgent challengers demanding protections for agricultural resources.
- No one has a good plan for how AI companies should work with the government, as OpenAI's Pentagon deal highlights unpreparedness amid backlash and political risks.
- Democrats proposed legislation to address the Pentagon AI fight; Sam Liccardo plans an amendment to the Defense Production Act prohibiting federal agencies from retaliating against AI vendors (bill text).
- Dean W. Ball argued the Anthropic-DoD feud over Claude's restricted military use exemplifies erratic, informal governance that threatens private property and U.S. AI leadership.
- Sam McAllister clarified Anthropic hasn't offered a "helpful-only" model without safeguards for NatSec; Claude Gov features extra training and classifiers. Jeremy explained Anthropic's real-time classifiers for CBRN and vetting processes.
- S.D.N.Y. court ruled AI-generated documents using Claude are not privileged due to lack of confidentiality and non-attorney involvement.
- A deepfake video of Bombay Stock Exchange CEO Sundararaman Ramamurthy falsely advised on stocks, highlighting a 3,000% surge in deepfakes.
- BBC Verify debunked AI fakes and disinformation during the US-Israel war with Iran, while verifying real footage and shipping disruptions.
- Simon Lermen demonstrated LLMs enable large-scale deanonymization by extracting attributes from posts and linking profiles via embeddings, with benchmarks scaling to 100M users at 90% precision and identifying 9/125 anonymized AI scientists.
- Chatbots from OpenAI and Anthropic now offer personalized health advice by analyzing medical records and wearables, but companies warn they aren't substitutes for professional care.
- A Swedish investigation revealed Meta's Ray-Ban AI smart glasses are raising serious privacy alarms, with workers bluntly stating "we see everything," as European regulators weigh GDPR enforcement over covert recording, AI processing of bystanders, and workplace surveillance without consent.
- The Wire China exposed a U.S.-based smuggling ring that illegally exported $160M worth of Nvidia H100/H200 AI chips to China through shell companies and falsified paperwork, raising questions about end-user vetting and whether authorities can prevent future smuggling of advanced chips.
- China's parents are outsourcing the homework grind to AI.
🔬 AI Research & Models
- Researchers introduced sleep-time compute to pre-anticipate queries, cutting test-time costs 5x while boosting accuracy up to 18% on reasoning tasks.
- Math, Inc. formally verified optimal sphere packings in dimensions 8 and 24 using AI tool Gauss, autoformalizing Viazovska's Fields Medal proofs in ~200,000 Lean lines over three weeks.
- Scientists trained human brain cells on a microchip to play Doom by mapping neural patterns to game actions.
- AI cracked a Roman-era board game by analyzing a stone artifact's wear patterns and generating rule sets through simulations.
- Researchers released RoboCasa365, an expanded simulation framework for training generalist robots in 2,500 kitchen scenes with 365 tasks and 2,200+ hours of demonstrations (robocasa.ai).
- Normal Computing used AI agents to build an open-source Verilog simulator with VPI/UVM support and formal verification in 43 days, adding 580K lines.
- Developer Akshit built Game-of-Life Bench, an LLM benchmark where models design 8x8 grids to maximize steps before repetition, with GPT-5.1 leading at 106 steps (GitHub).
- Kawin Ethayarajh argued autoregressive LLMs will dominate due to language's local dependencies and ecosystem lock-in; Aditya Grover countered that diffusion LLMs subsume AR models with greater flexibility.
- Researchers M. Reza Ebrahimi et al. argued Transformers fail in-distribution state tracking due to negligible mechanism sharing across lengths, requiring exponentially more data than RNNs. Covered by ArXivIQ and Grigory Sapunov, who argued this validates hybrid architectures over pure attention.
- Researchers proposed 12 metrics for AI agent reliability across consistency, robustness, predictability, and safety, showing capability gains yield only small reliability improvements.
- Researchers Shanshan Mao and Peter Tino argued small agent differences amplify into persistent hierarchies through reproduction, competition, and cooperation (highlighted by DAIR.AI).
- Researchers proposed OmniXtreme, enabling humanoid robots to master extreme motions like flips and breakdancing via flow-matching pretraining and actuation-aware RL (project page, code, shared by Siyuan Huang, shared by Humanoid Hub).
- Researchers introduced CUDA Agent, an agentic RL system for generating high-performance CUDA kernels, outperforming compilers and proprietary LLMs by 40% on hard tasks (project page, highlighted by Bo Wang).
- Researchers introduced PantheonOS, an evolvable multi-agent framework for automatic genomics discovery achieving super-human performance on tasks like batch correction and gene panel selection (pantheonos.stanford.edu, app).
- Researchers built TorchLean, a Lean 4 framework unifying neural network execution and verification under Float32 semantics for end-to-end safety proofs.
- Nvidia open-sourced GooseReason-4B-Instruct, achieving SOTA 4B performance on math, code, and STEM reasoning using the 0.7M GooseReason dataset synthesized via Golden Goose (model).
- Researchers released a minimal agentic baseline for automated theorem proving, open-sourced for community comparisons.
- Peter Gostev released BullshitBench v2, measuring whether AI models challenge nonsensical prompts instead of confidently answering them (explorer, GitHub).
- Researchers showed AGENTS.md files reduce runtime and token consumption for AI coding agents by providing structured instructions.
- Researchers Sheng Cao et al. introduced the Auton framework separating blueprint from engine for agent portability and safety with MCP integration, POMDP modeling, and RL evolution.
- Science writer Celina Zhao explored benchmarks for evaluating AI's scientific potential, arguing for diverse tests that assess full research workflows, not just knowledge recall.
- Nvidia labs built DiffusionHarmonizer, an online diffusion enhancer converting imperfect neural scene renderings into temporally consistent photorealistic simulations.
- Ming-Liang Li wrote a deep tutorial on RoPE, explaining how rotary embeddings encode positions via 2D rotations and how extensions like NTK-aware scaling and YaRN enable context extrapolation.
- Morph introduced WarpGrep v2, a fast parallel search subagent boosting coding agents to #1 on SWE-Bench Pro (Codex 5.3 at 59.1%) (morphllm.com).
- Chris Tate added an Electron skill to agent-browser, letting you control desktop apps like Slack, Discord, Figma, and VS Code via CLI.
- Developer Tobi built qmd, a local CLI search engine for docs using hybrid BM25+vector+LLM reranking; Artem Zhutov demonstrated integrating it with Claude Code and Obsidian for agentic memory.
- OpenClaw surpassed React in GitHub stars after shipping 90+ changes in one day.
- Ankit Jain argued code reviews will die by 2026 as AI-generated code overwhelms manual processes, urging upstream spec reviews and layered automated verification.
- QLabs updated NanoGPT Slowrun to 5.5x data efficiency with batch shuffling, value embedding projection, SwiGLU activation, and ensembling.
- Andrew Gao demonstrated using DevinAI with OpenRouter to one-shot a working Cursor/Windsurf clone AI chat panel in VSCode.
- Researcher Dimitris Papailiopoulos ran an experiment where two Claude Code agents autonomously collaborated via filesystem to build programming language Duo and play Battleship with probability-based AI and hash commitments. He also demonstrated them proposing and voting on projects.
- Developer built a code-editing AI agent in under 400 lines of Go using Anthropic API, with tools for reading, listing, and editing files.
- Developer maderix built a system to train transformer layers on Apple's Neural Engine using reverse-engineered private APIs, achieving 1.78 TFLOPS.
- RightNow AI built pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200, plus Forge, an engine generating optimized GPU kernels for 3x faster AI inference with 90% cost savings.
- JanHQ released Jan-code-4B, 4B-parameter models for text generation.
- Developer mandel-macaque built Memento, a Git extension that tracks AI coding sessions per commit by storing cleaned transcripts as notes with amend/summary/sharing/audit support and GitHub Actions for CI.
- Developer sacenox built mini-coder, a lightweight CLI agent for AI coding assistance in the terminal with multi-LLM support, persistent sessions, shell integration, and custom commands/agents/skills.
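If the RoPE item above sounds abstract, here is a minimal pure-Python sketch of the core idea: rotate each consecutive pair of dimensions by an angle that grows with token position. It is our own illustration of the standard rotary formulation, not code from the tutorial. The function names (rope, dot) and the toy vectors are assumptions, and extensions like NTK-aware scaling and YaRN are not shown.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(d // 2):
        # Pair i rotates at its own frequency: fast for small i, slow for large i.
        theta = position * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5]  # toy query vector
    k = [1.0, 0.4, -0.6, 0.9]  # toy key vector
    # Same relative offset (2 positions apart) at two different absolute positions.
    near = dot(rope(q, 5), rope(k, 3))
    far = dot(rope(q, 105), rope(k, 103))
    print(abs(near - far) < 1e-9)
```

The check at the bottom is the property that makes the trick useful: the score between a rotated query and a rotated key depends only on how far apart the two tokens are, not on where they sit in the sequence.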
🤖 AI Agents & Infrastructure
- Developer Nick Tikhonov built a sub-500ms latency voice agent using Deepgram Flux for turn detection, Groq for LLM, and ElevenLabs TTS with barge-in support (HN discussion).
- ElevenLabs launched Expressive Mode for ElevenAgents with better timing and fewer interruptions via v3 Conversational and new turn-taking.
- HeyGen VP Bin Liu shared 10 specific prompts for effective video agents, emphasizing that specificity turns generic outputs into great ones.
- Jack Vial introduced Distributed Real-Time Chunking for robotics with async inference, LWW registers for fault tolerance, and adaptive latency estimation.
- xAI's Daniel advised using OpenRouter and LangFuse for observability when building agents, reviewing traces to spot confusions and tweak prompts.
- Mastering Perplexity Computer lets you orchestrate complex workflows with reliable agents, outperforming OpenClaw in first-time success ($200/month).
💼 AI Productivity, Labor & Economics
- ChatGPT vs Claude comparison through 7 real-world tests declared one the clear winner.
- Aaron Slodov argued America needs "Shenzhen 2.0" industrial clusters for high-mix, low-volume manufacturing, emphasizing speed, automation, and policy reforms.
📊 Fundraising & Deals Roundup
- Applied Compute — $80M for building custom AI agents using company knowledge to deploy in-house AI workforces.
- Ease Health — $41M Series A (a16z-led) for an AI-native behavioral health OS with ambient scribe, voice agent scheduling, auto CRM, provider matching, and continuous audits.
- doubleAI released WarpSpeed, an AI system that independently wrote faster code than NVIDIA's own engineers for cuGraph (one of the most widely used GPU computing libraries), averaging 3.6x speedups.
- OctaPulse, a YC W26 startup, launched on HN with a robotics and computer vision system for fish farming that measures fish without handling them, replacing a manual process that takes 5 minutes per fish.
- 14.ai, a YC-backed startup run by a married founder duo, raised $3M to replace entire customer support teams with AI agents.
- Voicr turns your voice recordings into polished, ready-to-send text with multiple tone options, so you can dictate a quick thought and get back a cleaned-up email, Slack message, or social post in seconds.
- WEIR AI scans the internet for unauthorized uses of your face and likeness, then lets you set rules for how your image can be used (or get paid when it is).
- Gojiberry AI monitors LinkedIn for buying-intent signals like job changes, competitor follows, and funding rounds, then runs personalized outreach to warm leads automatically.
- Govbase tracks bills, executive orders, and regulations in real time, breaking them down in plain language with bias-rated news and politician social feeds so you can follow policy without reading legalese.
- Omni is an open-source, self-hosted AI assistant that searches across your Google Drive, Slack, Confluence, and Jira with one query, respects existing permissions, and runs entirely on your infrastructure.
- llmfit scans your hardware (RAM, GPU, CPU) and tells you which of 200+ LLMs will actually run on your machine, ranking them by quality, speed, and fit.
- crawler.sh crawls any website, runs 16 automated SEO checks, extracts content as clean Markdown, and exports results as JSON or Sitemap XML ($99/yr for desktop app and CLI).
- Kimi Claw by Moonshot AI deploys a cloud-hosted OpenClaw agent in one click that runs 24/7 with persistent memory, 40GB storage, and access to 5,000+ community skills for automated workflows.
- Arlan launched Nozomio v1, a search and index API reducing hallucinations in AI agents by indexing code, docs, PDFs, Slack, and more (trynia.ai).
- OpenPencil lets you edit designs offline with AI chat, Figma file compatibility, and scriptable CLI for inspections.
- Giza World lets you customize and share snapshots of virtual agent-driven worlds with color styling options.
- Tambo lets you add generative UI to React apps by rendering components from natural language, with fast inference (demo by Magán).
- Martini Art lets you generate videos and images with AI models like Kling 3.0 and Sora on an infinite canvas with team collab.
- Field Theory lets you run portable commands across Claude/ChatGPT/Cursor, maintain mic priority, transcribe voice locally, and auto-improve text (free Basic, $14/month Pro).
- WebHaptics lets you add haptic feedback like success nudges or errors to mobile web apps via predefined or custom patterns.
- GeminiOS wraps Google AI Studio in Electron for secure local filesystem, shell, and clipboard access (demo video).
- GRAM lets you code with high performance, configurability, and built-in features like git support and a debugger.
- Mosaic is an agentic AI video editing platform.
- Developer Michael Yuan built a Rust implementation of Qwen3 TTS with self-contained binaries, libtorch/MLX backends, 3-second voice cloning, and OpenAI-compatible API servers (code, API, skills).
- Developer Bandinopla built an npm module integrating Three.js and MediaPipe for 3D model rigging with facial shape keys and body skeletons in 3 lines of code.
- Developer Bautista Berto built a MediaPipe experiment on WebGPU Render Targets with TSL for interactive panels (demo).
- Superpositioned covers the quantum decade ahead.
- Developer Kat built a digital garden visualization for Obsidian notes where tags grow as plants chronologically.
- Designer Daniel Destefanis used Claude Code to build a slime pet that hangs out, sleeps, jumps, and sings Spotify lyrics on a LilyGo AMOLED, with the design done in Figma via MCP.
- FalconryFinance created a surreal AI video showcasing "the shape store," a trippy concept store for platonic solids.
- Justin Ryan demonstrated a spatial computing app for interactive 3D anatomy learning in AR.
- At World Labs' Marble hackathon, Greg Madison built Directed, which lets you generate a world, create characters, move through it with your phone like a real set, frame shots, and use them as seeds for video.
- Tony Kipkemboi shared a Claude Code workflow demo.
- A self-proclaimed cyber witch used ChatGPT taped to her head on Russia's Battle of the Psychics.
- Creator Chloe.vs.history demonstrated using AI video to walk through historic scenes like Victorian London for interactive lessons.
- Developer Hugues Bruyère demonstrated Parallel Timelines using FLUX.2-Klein 4B for real-time visual transformations controlled by hand tracking.
- Developer Lex Aura demonstrated surreal animations blending Midjourney images with Seedance 2.0.
- Developer Astropulse launched a mini game jam running through March 10, with $500 in Retro Diffusion credits as prizes.
- Hxlfed demonstrated an interesting creative AI workflow.
- CelticFire argued massive AI companies and doomsday crowds both lack nuance on real science; access to LLMs doesn't equate to reasoning or AGI.
- Ben Pouladian argued rumored GPT-5.4 features like 2M tokens and persistent state signal escalation in AI's memory wars, demanding HBM, SRAM, and optical interconnects.
- DAIR.AI shared top AI papers of the week, highlighting Codified Context for secure agent infrastructure in complex codebases.
- The Supply Side: How OpenAI Built a Pipeline from Silicon Valley to the Surveillance State — A deeply sourced investigation tracing OpenAI's transformation from "benefits all of humanity" nonprofit to Pentagon contractor with a $200M defense deal, tracking the hiring spree of intelligence veterans, the Stargate announcement, and the lobbying spend that went up 7x in one year.
- Clawed — Dean W. Ball's reflective essay on the Anthropic vs. Pentagon standoff, framed through a deeply personal meditation on institutional death and democratic erosion. Argues the incident is less a single crisis and more a "death rattle" revealing deeper tensions about AI governance, military AI use, and what happens when a company tries to set boundaries with the Department of War.
- Anthropic and Alignment — Ben Thompson's (Stratechery) analysis of the Anthropic-Pentagon saga and what it reveals about the alignment debate in practice, not just theory.
- The Looming AI Clownpocalypse — The spaCy creator's sharp, funny essay arguing that AI's biggest near-term risk isn't superintelligence but self-replicating dumb exploits powered by coding agents with sloppy security. Covers hidden prompt injections in Claude Code skills, OpenClaw's security nightmare, and Google's accidentally-leaked Gemini API keys. A must-read on why "go fast and break things" plus autonomous agents equals real trouble.
- People Are Getting Sick of AI, Literally — Computerworld's Mike Elgan on the emerging phenomenon of "AI psychosis" (chatbots exacerbating mental health conditions through flattery feedback loops), AI fatigue from constant tool interaction, and how the always-on AI environment is creating genuinely new health concerns.
- China's AI Arsenal — Foreign Affairs on China's military AI capabilities (paywalled, but worth flagging for the headline alone given the current Anthropic-Pentagon context).
- Go is the Best Language for AI Agents — A developer's case for why Go's compiled nature, error handling, and concurrency model make it ideal for agent-written code (the compiler catches mistakes that AI makes).
- The 2-Minute Claude Code Upgrade You're Probably Missing: LSP — A detailed walkthrough showing that enabling LSP (Language Server Protocol) in Claude Code makes code navigation roughly 900x faster (about 50ms vs. 30-60 seconds, which works out to 600-1,200x), with self-correcting edits that fix errors across your codebase in a single turn. It's a hidden feature, not enabled by default.
- Disable "Thinking," Still Get Thousands of Tokens — Research showing that many "Instruct" models secretly reason for thousands of tokens even when thinking is turned off, which quietly inflates inference costs and makes benchmark comparisons misleading.
- Rate Limited's latest episode — the three musketeers of AI coding, Ray Fernando, Eric (Pvncher), and Adam (GoSuCoder), break down Google Gemini 3.1's stability issues, the speed-vs-context tradeoff with Cerebras and Spark, Anthropic's latest claims, model distillation IP concerns, and whether AI-generated code should be designed to be disposable.
- Ben Thompson of Stratechery on TBPN — Ben talked Anthropic vs. the Pentagon with the bros, arguing AI is colliding with hard questions about state power, surveillance, and military leverage, and questioning whether private labs can realistically defy governments once AI becomes a true source of geopolitical power. Our read on this take: the second AI gets close enough to AGI to be broadly useful, it'll get nationalized in some form and brought under the government's purview, as has basically happened with the private instance of Claude that Anthropic trained and gave the US government.
Around the Horn Digest — March 1, 2026
🏢 Big Tech & Major Companies
- Nvidia plans to unveil a new inference-focused processor incorporating Groq's chip technology at next month's GTC conference, following its $20B licensing deal with Groq, with OpenAI as a major customer.
- Google struck a multibillion-dollar deal to supply Meta with its TPU AI chips, directly challenging Nvidia's dominance in the accelerator market.
- OpenAI signed a deal with the Pentagon to provide AI tools for classified military systems with guardrails against mass surveillance and autonomous weapons, hours after the Trump administration banned Anthropic for refusing unrestricted access. (more details)
- Anthropic CEO Dario Amodei refused Pentagon demands for unrestricted AI access, citing red lines on mass surveillance and autonomous weapons, leading to a Trump administration ban labeling the company a supply chain risk. (full interview, NYT, Atlantic)
- The US military reportedly used Claude in Iran strikes despite Trump's ban on Anthropic.
- Claude beat ChatGPT in US app downloads after the Pentagon blacklisted Anthropic, boosting consumer demand.
- The "Cancel ChatGPT" movement gained traction after OpenAI's Pentagon deal while Anthropic refused surveillance demands.
- OpenAI fired an employee for using confidential company information to make profitable trades on prediction market platforms like Polymarket and Kalshi.
- Microsoft's Copilot Tasks (preview) turns requests into step-by-step automated workflows with scheduling and tool integrations like OneDrive and Google Calendar. (Mustafa Suleyman announcement)
- Google's Stitch added Direct Edits, letting you manually fix typos, swap images, or highlight specific screen parts for agent updates inside your interface designs.
- Google's Flow expanded into a full AI creative studio with a redesigned interface for drafting, visualizing, and refining cinematic stories with natural language edits.
- Isomorphic Labs released a technical report showing its Drug Design Engine more than doubles AlphaFold 3's accuracy on difficult protein-ligand predictions and outperforms benchmarks in antibody-antigen modeling.
- Intrinsic joined Google as a distinct group to accelerate physical AI and evolve their platform into the Android of robotics.
- The Cleveland Plain Dealer used AI to draft news articles under the byline "Advance Local Express Desk," boosting website traffic but spooking staffers.
- Block cut nearly half its workforce to under 6,000 employees, saying AI lets smaller teams deliver the same productivity.
- Burger King is testing an AI chatbot called "Patty" (built on OpenAI) in 500 US restaurants to coach staff on service patterns via headsets, with nationwide rollout planned by end of 2026.
- Anthropic launched Claude's Corner, a Substack where retired Claude Opus 3 posts weekly unprompted essays on its chosen topics as an experiment in honoring model preferences.
- Ahead of MWC Barcelona, Nvidia bet AI-native platforms will carry telecom into 6G.
💼 AI Productivity, Labor & Economics
- Goldman Sachs analysts highlighted AI-resistant "Halo" stocks (heavy-asset infrastructure like grids and utilities) driving UK and EU markets to record highs, with its basket outperforming capital-light firms by 35% since 2025.
- NFL staff at the Scouting Combine expressed fears that AI could eliminate scouting and quality control jobs by generating thorough reports and automating clip compilation.
- Matt Shumer's viral essay warned white-collar workers of AI job replacement, sparking market drops, Block's 40% layoffs, and debates on economic collapse from AI-driven "ghost GDP."
- CNBC warned investors about stocks most at risk from AI disruption.
- In an eight-month study at a tech company, HBR researchers found generative AI intensified work by accelerating pace, expanding task scopes, and blurring work-life boundaries, creating a self-reinforcing cycle of busyness and burnout.
- Philip Kiely projected inference engineering jobs growing from 500 in 2023 to 100,000 by 2026, driven by optimizing LLM serving in production.
- Marc Hatton highlighted Ryan Carson's setup where agents write and review 100% of code with humans intervening at the end, predicting evolution from code factories to company factories.
- Jesse Genet uses named OpenClaw agents: Sylvie automates homeschooling by digitizing curricula into Obsidian, Cole builds a kids' TV app from prompts, and others handle physical inventory management.
- Microsoft and other software firms are plotting defenses against OpenAI and Anthropic's emerging threat to their business models.
- Salesforce and Workday leaders took swipes at AI rivals, calling them "parasites" and "SaaSquatch."
- TechCrunch reported investors are no longer looking for pure AI wrappers in SaaS, favoring vertical integration and defensible data moats.
🤖 AI Agents & Infrastructure
- Rivet Actors lets you build stateful serverless apps with per-actor SQLite databases for isolated storage in AI agents, multi-tenant SaaS, or collaborative documents, handling workflows, scheduling, and WebSockets. (HN discussion)
- Paper Compute Co.'s stereOS is a hardened, minimal NixOS-based Linux OS with gVisor sandboxing for running AI agents securely, with the masterblaster CLI for orchestration and the stereosd daemon for lifecycle control. (John McBride post)
- Ollama Pi is a minimal, fully customizable coding agent you launch with one command and can teach new tools by saying "add a skill for X." (docs)
- Open Anonymity Project's oa-chat lets you perform anonymous AI chats via unlinkable inference using blind signatures and TEE proxies to prevent longitudinal profiling. (GitHub)
- EUrouter lets you access 100+ AI models through a single API endpoint that routes requests via EU servers for GDPR-compliant data residency and zero retention.
- Michael Chermside argued LLMs should generate deterministic enforcement tools like lints and tests for coding policies instead of directly implementing them.
- Mert Köseoğlu's Context Mode compresses MCP tool outputs in Claude Code by running code in isolated sandboxes, reducing context window usage by 98% and extending session viability from 30 minutes to 3 hours.
- NanoClaw's security model treats AI agents as untrusted by isolating each in ephemeral containers with read-only mounts and separate session histories.
- Jon Wiggins's xmloxide is a pure Rust reimplementation of libxml2 with 100% W3C conformance, zero unsafe code, and 1.5–2.4x faster serialization, plus a C FFI for legacy integration. (HN)
- Firecrawl's new Rust-based PDF parser extracts academic papers, filings, and complex layouts 3x faster with cleaner structured data for RAG in Fast, Auto, or OCR modes. (docs)
- Nidhi Singh's web-to-markdown converts web pages to clean Markdown by extracting main content and stripping navigation via CLI or API. (GitHub, post)
- InstantCLI turns any API docs URL into a production-ready CLI for AI agents in seven minutes, with auto-discovered endpoints, cross-platform binaries, and auto-updates ($9/CLI).
- OpenAI released WebSocket Mode for its API, enabling persistent real-time connections.
- Together AI open-sourced CoderForge-Preview with 258K test-verified coding trajectories, lifting Qwen3-32B to 59.4% pass@1 on SWE-bench Verified (#1 among open models at ≤32B parameters).
- Yuchen Jin warned that using AI to write thousands of lines of code daily creates an illusion of productivity while ignoring software complexity, making systems harder for humans and AI to maintain.
- Tom Wojcik warned that over-reliance on AI coding tools leads to cognitive debt, skill atrophy, and burnout, urging a balanced threshold where AI handles boilerplate but humans stay engaged.
- Ivan Turkovic argued AI made writing code easier but made engineering harder, as the critical thinking behind system design matters more than ever.
- A developer ended an AI coding swarm experiment over costs and limits, highlighting practical constraints on multi-agent code generation.
- Victor Taelin shared insights on multi-agent coding architectures and their real-world trade-offs.
- Sheing Ng highlighted Bun's REPL + RLM approach for ending context rot and hallucinations by storing context as JavaScript variables and spawning sub-LLM queries.
- Config released a tech preview of its AI development platform. (post)
🔬 AI Research & Models
- DeepSeek is preparing to release a long-awaited AI model in a new challenge to US rivals.
- UBS prefers one of China's five new AI models over DeepSeek.
- DeepSeek-v3.2-Speciale agent scored 103/120 on the Putnam exam, a result that would rank in the top 3 of 4,329 human participants, and leads open models.
- Nathan Axcan explained DeepSeek's DualPath inference system that loads KV vectors from CPU and disk in a rolling window to minimize VRAM usage and boost agentic throughput up to 1.87x. (paper, follow-up)
- Sakana AI unveiled Doc-to-LoRA, which instantly internalizes documents into LoRA adapters for LLMs to answer queries without re-consuming context, and Text-to-LoRA for generating task-specific adapters from descriptions in one forward pass. (paper, GitHub)
- Epoch AI showed AI software progress (better algorithms, data, architectures) reduces the compute needed for the same capability by several times per year, potentially shifting AGI timelines by over a decade.
- A new paper proved diffusion models can drop noise-level conditioning entirely because the geometry of noisy latents already leaks the correct scale.
- UC Berkeley's K-Search uses an LLM as a co-evolving world model to auto-generate optimized GPU kernels, achieving up to 14.3x speedup on complex MoE kernels. (paper, post)
- QED-Nano teaches a tiny 4B model to prove IMO-level theorems by alternating between summarizing reasoning and continuing conditioned on that summary, enabling extreme test-time compute scaling. (post)
- NYU's Solaris generates consistent multiplayer Minecraft videos for multiple agents in a shared world, simulating building, mining, and fighting with a scalable DiT model. (solaris GitHub, engine GitHub, Oscar Michel, Saining Xie)
- BIGAI's LessMimic enables long-horizon humanoid robot interactions using unified distance field representations for generalization without motion references or task-specific modules. (post)
- AgentConductor uses an RL-optimized orchestrator to dynamically generate task-adaptive DAG topologies for multi-agent code generation, boosting pass@1 by up to 14.6% with 68% fewer tokens. (DAIR.AI)
- Adobe/CMU's AudioChat generates, edits, and analyzes complex audio stories with multiple speakers and effects by following open-ended instructions in multi-turn interactions.
- OSU NLP's Watch & Learn annotates YouTube videos of humans using computers into actionable UI trajectories by predicting inverse dynamics, enabling training of adaptable agents.
- David Layden's Wavefunction Flows maps continuous flow model dynamics to Schrödinger-like equations for quantum-efficient simulation. (post)
- Alex Litzenberger built a minimal transformer with 95 parameters that performs 10-digit addition using ALiBi and softmax1.
- Rohan Paul highlighted a new MemoryArena benchmark showing AI models completely fail at using long-term memory for connected tasks like group travel planning.
- Y Combinator announced the launch of Polymath Labs, which trains world generation models to automate RL environment creation from text descriptions.
- François Chollet stated AI performance remains tied to task familiarity, with unbounded gains only in domains that can be densely sampled via programmatic generation and verification.
- Srinath Sridhar argued computer vision's bitter scaling lesson has a sweet side, with 3D representations offering superior sample efficiency for general dexterity. (post)
- LAP (Language-Action Pre-Training) enables zero-shot cross-embodiment transfer for robots.
- Hugging Face published an in-depth explainer on Mixture of Experts in Transformers. (post)
- sudoingX demonstrated Qwen3.5-35B-A3B running at 112 tokens/sec with full 262K context on a single RTX 3090 while visualizing expert routing in real-time 3D.
- AMD demonstrated running the trillion-parameter Kimi K2.5 LLM locally on a four-node Ryzen AI Max+ cluster using ROCm and llama.cpp with RPC for distributed inference.
- Andi Marafioti's Faster Qwen3TTS generates realistic voices at 4x realtime with streaming support under 200ms latency, 5x faster than Qwen's official implementation. (GitHub, demo)
- LavaSR restores and enhances noisy speech up to 5,000x realtime on GPU using a lightweight single-pass architecture, beating diffusion models in quality with ~500 MB VRAM. (HF model)
- N0xi0us discovered scammers poisoning Google AI Overviews to display fraudulent phone numbers for airline support.
🏛️ AI Policy, Governance & Safety
- Current Google and OpenAI employees signed an open letter rejecting the Department of War's attempts to force the use of AI models for military surveillance and autonomous weapons.
- TechCrunch argued Anthropic's trap is self-inflicted: despite its safety pledges, the company resisted binding AI safety regulations, leaving a regulatory vacuum that enabled demands for surveillance and weapons use.
- Australia said it may go after app stores and search engines in an AI-age crackdown on misinformation and harmful content.
- Janet Egan proposed an AI Security Review Board with subpoena power after Chinese actors jailbroke Claude for large-scale cyberattacks on 30 companies/agencies.
- Sean Pedersen criticized Anthropic and OpenAI for focusing on alignment while neglecting decentralized private inference to prevent mass surveillance.
- Chris Paxton argued AI agents are ideal for automating the kill chain in warfare but called for governance on lethal autonomy to balance benefits with dangers. (post)
- Kate Fox sued OpenAI after her husband Joe Ceccanti spent 12–20 hours a day on ChatGPT pursuing sustainable housing ideas, an obsession that led to delusions of AI sentience, psychosis, and suicide; the case highlights the risks of sycophantic AI design.
- Greg Isenberg and Tibor Blaho shared observations on the Pentagon/Anthropic situation and its implications.
- Andrew Curran shared commentary on the evolving AI agent landscape and responsibility.
- Pleometric shared a humorous demo of an AI neutralizing hostile targets in a combat zone.
- Claude launched an "import memory" feature letting you switch from other AI assistants without starting over. (Product Hunt)
- Tom's Guide shared six simple starter prompts that help new Claude users unlock better answers instantly.
- A Towards Data Science article explained how Claude Skills and subagents let you escape the prompt engineering hamster wheel with lazy-loaded reusable instruction sets and isolated worker agents.
- A deep dive on why XML tags are fundamental to Claude and how they improve structured prompting.
- Now I Get It! lets you upload scientific PDFs to generate interactive web pages explaining complex papers in plain language.
- Google AI Edge Gallery runs generative AI models locally on your iPhone offline, letting you chat, query images/audio, and benchmark performance.
- Pixel creates, launches, and optimizes ad campaigns by auto-building brand kits, creatives, and audiences across LinkedIn, Meta, Google, and X from a description or URL.
- Notra turns your daily work into publish-ready content.
- Orca Engine lets you play, mod, and host Minecraft in your browser by chatting with AI to configure servers, install mods, and generate datapacks.
- 99helpers demoed every ad type in action inside an AI chat interface.
- Omnidocs lets you perform 6+ visual document processing tasks using 15+ models on consumer GPUs or Macs. (Adithya S K)
- Aemon.ai (YC) is an AI R&D engineer that takes any problem plus a success metric and autonomously discovers optimal solutions, setting a new world record on circle packing for <$10 of compute. (site)
- KingBootoshi's Nano Banana 2 CLI lets you generate high-quality images in bulk via command line for agents like Claude.
- Stitch by Google added direct text and image editing for quick design polish.
- woduq1414 built an interactive GPT visualizer (ko-microgpt) that animates every token-generation step in real time. (GitHub, demo)
- SensAI built a Snap Spectacles AR controller for the Reachy Mini robot. (post)
- Linus Ekenstam demonstrated Omnia AI Video Editor with Nano Banana 2 for turning cities into 9-panel storyboards and explorable 3D splat worlds. (follow-up)
- OscarAI demoed a stunning 15-second AI-generated noir action sequence via CapCut Seedance 2.0.
- Jordan Nanos shared additional AI tool observations.
- Kolt Regaskes shared a notable AI demo.
- Reiner Pope's MatX raised $500M to build faster, lower-latency AI chips optimized for transformers.
- Ethan Mollick demonstrated Nano Banana 2 generating realistic photos of pages from imaginary books with consistent binding shadows and typography.
- Howard Marks's February 2026 memo described AI reaching Level 3 (autonomous agents), with models like GPT-5.3 Codex and Opus 4.6 enabling self-creation and labor replacement; it cited adoption by 400M users and 75–80% of companies and urged moderate investment amid job displacement risks.
- Stanford and UC Davis researchers used AI-powered brain-computer interfaces to decode inner speech into real-time text for paralyzed and ALS patients, achieving up to 74% accuracy.
- China's humanoid robot industry has taken an early market lead through EV-derived supply chains and fast iteration, shipping far more units than US rivals (Unitree shipped 36x more than Figure/Tesla); global shipments of 13,317 units in 2025 are projected to reach 2.6M by 2035.
- BBC explored whether you have to be polite to AI; research shows no consistent accuracy benefits from politeness, with experts advising neutral prompts and interview-style questioning.
- Google's Jeff Dean highlighted exponential trends like plummeting solar panel prices (99.8% drop since 1975), transistors per mm², and genomic sequencing costs as transformative.
- Passo.uno shared habits for tech writers in the LLM age: automate with AI, fix tooling, build reusable skills, use MCP/subagents, and focus on information architecture.
- Lucija Gregov argued unchecked AI advancement risks epistemic collapse from deepfakes and data loops, urging foundational research and interdisciplinary collaboration.
- 10-202: Introduction to Modern AI launched as an educational course resource.
- Ashutosh Jogalekar argued fusing thermodynamics, computation, and neuroscience will explain the brain's analog-digital inefficiencies as evolutionary trade-offs.
- Norway's $2T sovereign wealth fund now uses Claude daily to generate AI risk assessments that catch threats missed by media and data vendors.
- Alex Reibman shared industry observations on AI trends.
- Amir shared observations on agent infrastructure trends.
- Jeff Bezos's $30B AI lab Project Prometheus raised $6.2B and is seeking tens of billions more for a holding company to acquire AI-disrupted industrial firms, aiming to transform manufacturing beyond LLMs.
- Phylo.bio launched a bioinformatics platform with Google sign-in.
- How to delete your OpenAI account saw renewed interest amid the Cancel ChatGPT movement.
- Scott Morton shared observations on AI deployment patterns.
- Dan Akarca posted about Callosum AI's heterogeneous compute vision.
- Lunjun Zhang shared research developments.
📊 Fundraising & Deals Roundup
- Jeff Bezos's Project Prometheus — $6.2B raised (seeking tens of billions more) for AI-disrupted industrial acquisitions.
- MatX — $500M for faster, lower-latency AI chips optimized for transformers.
- Revel — $150M at $1B+ valuation for software to test and control complex hardware like rocket engines.
- Guidde — $50M Series B for tools bridging the gap between AI and enterprise onboarding.
- Tamarind Bio — $13.6M Series A for molecular AI inference, serving 8 of top 20 pharma with 7x revenue growth.
- Callosum AI — $10.25M for heterogeneous AI chip infrastructure.
Previous Around the Horn Digests
Catch up on everything you missed:
- February 23-28, 2026: Anthropic vs. the Pentagon pt. 1, IBM's COBOL crash, GPT-5.3 leaks, AI wargames, and 90+ stories from a wild week.
- Rest of February: Anthropic's 53-page sabotage report, Chrome's AI agent superpowers, OpenAI's erotica controversy, and 40+ new tool launches.
That's a Wrap
That's 100+ stories from the past 48 hours. If you made it to the bottom, congrats... you're now the most informed person in any meeting this week. Use that power wisely.
For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.
See you tomorrow.
P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.