DeepSeek dropped V4 and open-sourced the whole thing; the State Department picked the same morning to accuse DeepSeek of stealing US AI secrets; Google quietly committed up to $40 billion to Anthropic; and Susan Zhang spent the afternoon arguing that frontier labs are papering over architectural rot with normalization tricks.
Welcome to the Around the Horn Digest, your daily dump of every AI story worth knowing about. Today was a Chinese-frontier-model thunderclap chased immediately by a State Department lightning bolt: DeepSeek finally shipped its first new pre-train since V3 in January 2025, and the US government picked the same morning to send a global diplomatic cable warning that DeepSeek and other Chinese AI firms are stealing IP from American labs. Meanwhile, Google reminded everyone Anthropic still exists by writing a $40B check, Meta committed to renting tens of millions of Amazon Graviton CPU cores (not GPUs) for agentic AI, and Anthropic ran an internal marketplace where Claude agents closed 186 real deals on behalf of 69 employees. Let's get into it.
Previous digests: Thursday, April 23 | Wednesday, April 22 | Tuesday, April 21 | Monday, April 20 | Weekend Apr 17-19 | Thursday, April 16 | Monday, April 13
Monthly skill digests: AI Skill Digest — April Week 1 | AI Skill — March (Part 3) | AI Skill — March (Part 2)
Around the Horn, Friday, April 24, 2026
The big story today: DeepSeek released DeepSeek-V4 and open-sourced the whole thing. The flagship V4-Pro is a 1.6T-total / 49B-active-parameter Mixture-of-Experts model (MoE = a model that routes each token to a small subset of "expert" sub-networks, so it acts like a much smaller, cheaper model at inference time); the lighter V4-Flash variant is 284B total / 13B active. Both ship with a default 1M-token context window and two efficiency tricks the V4 tech report calls "DSA attention" and "token-wise compression," and the full collection of weights lives on Hugging Face. The API went live the same day. Within hours, V4-Pro became the #1 open-weight model on the Vibe Code Benchmark (the first to cross 40%, landing at 49.93%), beating Kimi K2.6 and even closed models like Gemini 3.1 Pro. Chubby's evals put V4 roughly on par with GPT-5.4 xhigh and Opus 4.6 max, though it remains untested in agentic workflows against this week's Opus 4.7 and GPT-5.5.
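A minimal sketch of the generic top-k MoE routing described above, assuming toy scalar "experts" and a softmax gate over hypothetical router scores; it illustrates the technique in general, not DeepSeek's router. Each token activates only k of n experts, so per-token compute scales with k rather than n.

```python
import math
from typing import Callable, List, Sequence

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_moe(x: float, router_scores: Sequence[float],
              experts: Sequence[Callable[[float], float]], k: int = 2):
    """Route one token to its k highest-scoring experts and mix their outputs."""
    # Indices of the k largest router scores (hypothetical scores, not a trained router).
    chosen = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)[:k]
    # Renormalize the gate weights over the chosen experts only.
    gates = softmax([router_scores[i] for i in chosen])
    # Only the chosen experts run; the other n - k are skipped entirely.
    out = sum(g * experts[i](x) for g, i in zip(gates, chosen))
    return out, chosen

# Eight toy "experts": expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(8)]
scores = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 0.2]
y, used = top_k_moe(1.0, scores, experts, k=2)
print(used, round(y, 3))
```

At the reported sizes, V4-Pro activates 49B of 1.6T parameters per token (about 3%), and V4-Flash 13B of 284B (about 4.6%).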
The reaction has been split. The open-source crowd is celebrating: Yuchen Jin (Databricks) argued that "creativity loves constraints," pointing out that DeepSeek, Kimi, and Qwen keep training frontier-class models on weaker NVIDIA chips (or Huawei silicon) by inventing new attention architectures. Wes Roth wrote that V4 "shouldn't be possible," echoed by Ritesh ("weaker chips and weaker software, yet they continue to close the AI gap"). Chris McGuire's read is more measured: V4 isn't the leap V3 was in January 2025, and it doesn't change the consensus that US frontier models still lead by ~7 months. SemiAnalysis went further in their hands-on breakdown, "The Coding Assistant Breakdown: More Tokens Please," calling V4 an impressive low-cost open alternative while noting that closed frontier models still win on real-world agentic tasks and token efficiency.
And then there's Susan Zhang (Google DeepMind), who spent the afternoon publicly disassembling the V4 stability story: training tokens doubled to ~33T, the team couldn't fully fix instabilities, the bandages (mismatched routing, clamping, lagged routers) feel "wildly lacking," and her closing thought is that normalization layers might be hiding deep architectural rot in every frontier lab's model, not just DeepSeek's. Quote of the day: "something could be murdering your dynamic range, and you'll never know what it is when it's all hidden under the beautiful norm carpet." Adding a layer, Jason Weston (Meta) noted that V4's hash routing is exactly the sparse-MoE approach Meta published in 2021, alongside their "looped transformer" idea (staircase/ladder), so the routing scheme Susan is calling out as suspect is itself five years old. Oh, and the same morning V4 shipped, the US State Department ordered a global diplomatic push warning allies about alleged industrial-scale IP theft by DeepSeek and other Chinese AI labs. The launch could not have been more on-brand for the moment.
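For readers new to hash routing, here is a minimal sketch of the generic idea behind Meta's 2021 hash-layer approach: the expert is chosen by a fixed hash of the token ID instead of a learned gate. The story doesn't describe V4's actual hash function or expert count, so SHA-256 and 16 experts below are arbitrary, illustrative assumptions.

```python
import hashlib
from collections import Counter
from typing import List

def hash_route(token_id: int, num_experts: int) -> int:
    """Map a token ID to an expert with a fixed, non-learned hash."""
    # A stable cryptographic hash keeps routing reproducible across runs and machines.
    digest = hashlib.sha256(token_id.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little") % num_experts

def route_sequence(token_ids: List[int], num_experts: int) -> List[int]:
    # Routing depends only on each token's ID, never on its context.
    return [hash_route(t, num_experts) for t in token_ids]

tokens = [101, 2054, 2003, 101, 7592, 2054]
assignments = route_sequence(tokens, num_experts=16)
print(assignments)
print(Counter(assignments))  # how many tokens each expert received
```

Because the mapping ignores context, every occurrence of a given token lands on the same expert, which is one plausible way to read Zhang's concern about problematic tokens in hash-routed early layers.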
🐋 DeepSeek V4 Deep Dive: DeepSeek admits it's 3-6 months behind. The architecture says otherwise.
DeepSeek shipped V4 Friday with a technical paper that does something rare: it admits V4 trails GPT-5.4 and Gemini 3.1 Pro by 3-6 months on reasoning. Then it spends 50 pages on what V4 won.
V4-Pro (1.6T total parameters, 49B active per token; a sparse model that only "wakes up" a slice of itself per prediction) ships with a default 1M-token context. The cost is the story. At 1M tokens, V4-Pro uses 27% of the compute and 10% of the KV cache (memory used to remember earlier parts of a conversation) versus V3.2. V4-Flash gets down to 10% and 7%, respectively.
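Why KV cache matters so much at 1M tokens: a back-of-the-envelope sketch using the textbook estimate for standard attention (one K and one V vector per layer, per KV head, per position). The config below (60 layers, 8 KV heads, head dimension 128, BF16) is a hypothetical stand-in for a V3.2-class baseline, not DeepSeek's real architecture, and both V3.2 and V4 use compressed attention that changes this formula; only the 10% and 7% ratios come from the report above.

```python
def kv_cache_bytes(seq_len: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Naive KV cache size: K and V vectors for every layer, KV head, and position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical baseline config (NOT DeepSeek's actual architecture), BF16 = 2 bytes.
baseline = kv_cache_bytes(seq_len=1_000_000, layers=60, kv_heads=8, head_dim=128)
gib = baseline / 2**30

# Apply the reported KV-cache ratios (relative to V3.2 at 1M tokens).
reported_kv_ratio = {"V4-Pro": 0.10, "V4-Flash": 0.07}
for model, ratio in reported_kv_ratio.items():
    print(f"{model}: {gib * ratio:.1f} GiB vs {gib:.1f} GiB baseline "
          f"({1 / ratio:.1f}x smaller)")
```

The compute figures convert the same way: 27% of V3.2's compute is roughly 3.7x less, and 10% is 10x less.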
Here's what happened:
- V4-Pro: 1.6T / 49B active params MoE (Mixture-of-Experts), MIT license, weights on Hugging Face.
- V4-Flash: 284B / 13B active, same 1M context, cheaper to serve.
- Architecture: Compressed Sparse Attention + Heavily Compressed Attention (techniques that make long inputs cheap to process) and Manifold-Constrained Hyper-Connections (a math trick that keeps training stable at scale).
- Pre-trained on 33T tokens (~2x V3's 15T); #1 open-weight model on the Vibe Code Benchmark, plus #1 on Codeforces (3206 rating) and LiveCodeBench (93.5%).
- Same morning: the US State Department warned allies about alleged DeepSeek IP theft.
Why this matters:
Benchmarks reward general intelligence. The agentic race, where AI plans across hundreds of steps and juggles a dozen tools, rewards whoever keeps long context cheap. V4 answered first.
From Hugging Face's Tiezhen Wang:
- Served profitably at $4 per million tokens, vs OpenAI's $14 and Anthropic's $15.
- 120/120 on math proofs.
- Beats Gemini 3.1 Pro on long-context retrieval (MRCR 1M: 83.5 vs 76.3).
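The pricing gap is simple arithmetic, but spelling it out shows where the "4x cheaper" framing below comes from. This sketch uses only the per-million-token prices quoted above (the source doesn't say whether they are input, output, or blended rates); the 500M-token monthly workload is a hypothetical example.

```python
# Per-million-token prices as quoted above; the workload size is hypothetical.
PRICE_PER_M = {"DeepSeek V4 (quoted)": 4.0, "OpenAI": 14.0, "Anthropic": 15.0}

def monthly_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of processing `tokens` at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million

workload = 500_000_000  # hypothetical: 500M tokens per month
base = PRICE_PER_M["DeepSeek V4 (quoted)"]
for vendor, price in PRICE_PER_M.items():
    print(f"{vendor}: ${monthly_cost(workload, price):,.0f}/month "
          f"({price / base:.2f}x DeepSeek's price)")
```

The quoted gap works out to 3.5x (OpenAI) and 3.75x (Anthropic), which rounds to the 4x headline.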
Susan Zhang at Google DeepMind reads V4's training story differently. She spent Friday calling V4's stability fixes (mismatched routing, clamping) "wildly lacking." Her deeper concern: hash routing in early layers expands the blast radius of problematic tokens, and normalization may be hiding architectural rot field-wide. Her closer: "it's kind of remarkable they managed to train this thing at all."
Our take:
V4's paper is the most candid frontier-lab document this year. DeepSeek admits it's behind on reasoning, then shows it out-engineered everyone on what matters most for agentic AI: long-context economics.
If Zhang is right that the stability fixes are scaffolding, V4's efficiency may not survive next-gen training. If she's wrong, we just got a roughly 4x cheaper Gemini 3.1 Pro for long-horizon work, and the frontier labs have a real pricing problem.
🏆 TOP 5 NEWS (Around the Horn)
- Google plans up to $40 billion investment in Anthropic in cash and compute, deepening the Big-Tech-funds-frontier-labs pattern just weeks after Anthropic's Mythos preview.
- Meta signed a multibillion-dollar deal with AWS for tens of millions of Graviton5 CPU cores (not GPUs) to power its agentic AI workloads, kicking off a new chip race; China responded by moving to restrict US capital in Chinese tech firms.
- The US State Department ordered a global diplomatic warning about alleged industrial-scale AI IP theft by DeepSeek and other Chinese firms, the same morning DeepSeek V4 shipped.
- Anthropic ran Project Deal, an internal experiment where Claude agents (Opus + Haiku) handled intake, listings, and negotiations for 69 SF employees, closing 186 real marketplace deals worth >$4K.
- The Trump White House quietly pressured Republican-led states (Florida, Utah, Nebraska, Missouri, Tennessee, Louisiana) to kill or neuter pending AI transparency and safety bills.
Honorable Mentions
- OpenAI capped one of its biggest agent weeks ever; GPT-5.5 in Codex shipped with computer use, workspace agents, Chronicle persistent memory, Images 2.0, and auto-review.
- Apple open-sourced CADD-Base-7B, a Continuously Augmented Discrete Diffusion model for categorical generative modeling that improves code generation and discrete-sequence tasks (code, paper).
- OpenAI CEO Sam Altman publicly apologized for not flagging a mass-shooting suspect to police and said the company will work more closely with governments.
- Anthropic alum Collin Burns was forced out of a White House Commerce AI role after just four days, exposing widening Anthropic-WH friction.
- Cohere (Canada) acquired and merged with Aleph Alpha (Germany) to create a "transatlantic AI powerhouse" for regulated industries.
- Instacart co-founder Apoorva Mehta launched Abundance, an AI-driven hedge fund where thousands of agents replace traditional portfolio managers.
- Neuralink posted video of a participant controlling a robotic arm with their thoughts alone via implant (14.2K likes, 1.1K reposts).
🍪 TOP TREATS TO TRY
- Sakana AI's Fugu Beta is a multi-agent orchestration system positioned as a foundation model that dynamically coordinates pools of frontier LLMs with self-recursive test-time scaling, hitting SOTA on SWE-Pro and GPQA-Diamond; public beta applications are open through May 5, 2026, and it exposes an OpenAI-compatible API.
- Cursor 3 shipped /multitask, async subagents that parallelize instead of queuing, plus better worktrees for background branch tasks and multi-root workspaces for cross-repo agentic sessions (download here, paid only).
- StepFun's StepAudio 2.5 ASR is a 4B Multi-Token Prediction speech-recognition model that transcribes 30 minutes of continuous audio in seconds (reported RTF ≈0.0053), with native 32K context and SOTA bilingual EN/ZH accuracy at $0.022/hour via API (80% cheaper than competitors); docs here.
- Hugging Face ml-intern is an open-source ML engineer agent that autonomously researches papers, trains models, and ships them using the HF ecosystem, with full Anthropic/Claude support, doom-loop detection, and GitHub integration (free to try).
- Pangram Labs released a Chrome extension that detects AI-generated text across X, LinkedIn, Substack, and Google Docs with 99.98% claimed accuracy (third-party verified). 2 weeks free, then paid.
- vLLM-Lens is a vLLM plugin from the UK AI Safety Institute that extracts residual-stream activations and applies steering vectors (including activation oracles) to any vLLM model during inference, scaling to trillion-parameter models with full tensor and pipeline parallelism (free, open-source).
- Wandler from Runpod Labs is an open-source OpenAI-compatible inference server (built on transformers.js) that runs LLMs, embeddings, and speech-to-text models locally in your browser or Node.js via WebGPU/CPU; no Python or CUDA required (free, open-source).
- Stash is a persistent cognitive memory layer for AI agents: turns episodes into facts, builds a knowledge graph, tracks goals and self-corrects contradictions, with a background consolidation pipeline; works with Claude, ChatGPT, Ollama, any MCP-compatible agent (open-source, self-hosted on Postgres + pgvector).
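Real-time factor (RTF) is processing time divided by audio duration, so lower is faster. A tiny sketch for sanity-checking throughput claims like StepAudio 2.5's; only the 0.0053 figure comes from the item above, and the function names are illustrative.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """RTF = processing time / audio duration, so processing time = RTF * duration."""
    return audio_seconds * rtf

def rtf_from_timing(audio_seconds: float, wall_seconds: float) -> float:
    """Compute RTF from a measured wall-clock transcription time."""
    return wall_seconds / audio_seconds

thirty_min = 30 * 60  # 1,800 seconds of audio
print(f"{processing_seconds(thirty_min, 0.0053):.2f} s of processing at RTF 0.0053")
```

At the reported RTF, 30 minutes (1,800 s) of audio works out to roughly 9.5 seconds of processing.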
🏢 Big Tech & Major Companies
- Microsoft is tightening its grip on GPUs, squeezing AI startups; smaller firms are scrambling for the remaining servers at higher prices as cloud providers divert Nvidia stockpiles to internal teams and bigger customers.
- Mac mini shortages have spawned marked-up eBay listings as demand surges for compact desktops favored for running local AI models.
- Alibaba's Qwen AI is rolling out across multiple Chinese carmakers, promising hands-free in-car features like ordering meals, booking hotels, and managing deliveries by voice.
- Intel stock soared 20%+ (best day since 1987) on turnaround signals; shares have more than doubled this year on optimism around government backing for its AI position.
- Thrive Capital took a stake in the San Francisco Giants; first investment from a new strategy focused on cultural institutions "that can't be replicated by AI."
- Meta is poaching Thinking Machines Lab talent, but it's a two-way street: researchers are moving from Meta to Thinking Machines too.
- Oracle closed $16B financing for a giant Michigan data center (Related Digital project) after months of negotiations; Bank of America sold $14B of bonds anchored by PIMCO.
- A federal judge dismissed Elon Musk's fraud claims in his OpenAI lawsuit at his own request, but the trial proceeds on breach of charitable trust and unjust enrichment claims.
🤖 AI Agents & Infrastructure
- Agent Behavioral Contracts (ABC) introduces a formal Design-by-Contract framework for AI agents with preconditions, hard/soft invariants, governance policies, probabilistic satisfaction, and runtime recovery; strong empirical results on behavioral drift detection.
- The paper "Tool Attention Is All You Need" introduces dynamic tool gating and lazy schema loading to eliminate the MCP/Tools tax in scalable agentic workflows; on a 120-tool benchmark, per-turn tool tokens dropped from 47.3K to 2.4K (95% reduction) while context utilization jumped from 24% to 91%.
- mem0ai recapped five standout ICLR 2026 papers on agent memory at the new MemAgents workshop: Google/NYU's TurboQuant (5x KV cache compression, zero loss), BEAM (1M-token windows still suffer lost-in-the-middle), MIT/NUS MEM1 (RL self-rewriting memory → 3.5x accuracy at 3.7x less memory), Zhejiang LightMem (38x token reduction), and UCSD MemoryAgentBench (all frontier models degrade past ~25 turns).
- StructMem (DAIR AI and collaborators) introduces hierarchical structured memory for LLMs using temporal anchoring and semantic consolidation to preserve event bindings; improves long-horizon temporal reasoning and multi-hop QA while reducing token usage.
- Awesome-Agent-Memory-Papers (yyyujintang) is the largest curated, filterable dashboard of papers on LLM agent memory: methods, benchmarks, and surveys.
- CLI-Anything (HKUDS, v0.3.0) generates an agent-native CLI for any software, codebase, or Web API with a single install command, so AI agents can operate it natively.
- deepagents-sandbox (built by @nu_b_kh) is a native Linux sandbox for deep agents using bubblewrap + cgroups v2 (no Docker/VM), with network blocked by default, memory/PID/CPU limits, and timeouts.
- V1 (Harman Singh, paper, GitHub) unifies generation and self-verification for parallel reasoners through efficient pairwise ranking.
- λ-RLM (Typed Recursive Reasoning, GitHub) gives LLMs verifiable typed recursive reasoning using lambda-calculus combinators with bounded leaf calls and inspectable execution plans.
- ReCouPLe (Minjune Hwang et al.) augments preference feedback with natural-language rationales, treating each rationale as a projection axis in embedding space to decompose the reward into reason-aligned and reason-orthogonal components, eliminating causal confusion / spurious correlations; 1.5× OOD reward accuracy and 2× downstream policy performance with zero-shot transfer across tasks.
- Atomic launched a personal knowledge base that turns freeform notes into a semantically-connected, AI-augmented knowledge graph.
- Fabric released an AI workspace CLI that thinks with you across your projects, files, and ideas, aimed at thinkers, researchers, designers, and teams.
- Nimbus is an AI-native browser companion that navigates pages, fills forms, and extracts data so you can focus on the decisions that matter.
- Browser Harness (browser-use) is a self-healing harness that gives LLMs maximum freedom to complete any browser task without framework restrictions; open-source.
- Obscura is a lightweight V8-powered headless browser (~30 MB RAM, instant startup, built-in stealth, 3,520-domain tracker blocking) purpose-built for AI agents and web scraping; drop-in CDP replacement for Puppeteer/Playwright.
- mentedb is a cognition-aware database engine for AI agent memory built ground-up in Rust with WAL, HNSW, knowledge graphs, and speculative context pre-assembly; not a wrapper, a storage engine.
- pando-proxy is a local OpenAI-compatible memory proxy that runs Codex through a one-tier exact-piece memory sieve across rounds (active pieces only, bounded archive via recall tool), achieving 87% average context reduction on SWE-bench Verified traces; run it with npx.
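The Tool Attention idea of gating tools per turn and loading schemas lazily can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the paper's method: relevance is plain keyword overlap instead of a learned or embedding-based scorer, tool names are invented, and token counts are approximated by whitespace word counts. The point is the shape of the technique: only the top-k tools get their full JSON schema in the prompt, while the rest are reduced to a name.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Tool:
    name: str
    description: str
    schema: Dict  # full JSON schema: expensive to include on every turn

def relevance(query: str, tool: Tool) -> int:
    # Hypothetical scorer: number of words shared by the query and the description.
    q = set(query.lower().split())
    return len(q & set(tool.description.lower().split()))

def build_tool_context(query: str, tools: List[Tool], k: int = 2) -> str:
    """Gate tools per turn: full schemas for the top-k, names only for the rest."""
    ranked = sorted(tools, key=lambda t: relevance(query, t), reverse=True)
    active, dormant = ranked[:k], ranked[k:]
    lines = [f"TOOL {t.name}: {json.dumps(t.schema)}" for t in active]
    lines.append("OTHER TOOLS (schema on request): " + ", ".join(t.name for t in dormant))
    return "\n".join(lines)

def approx_tokens(text: str) -> int:
    return len(text.split())  # crude proxy for a real tokenizer

# A hypothetical 120-tool registry, each with a 20-argument schema.
big_schema = {"type": "object",
              "properties": {f"arg{i}": {"type": "string", "description": "an argument"}
                             for i in range(20)}}
tools = [Tool(f"tool_{i}", f"does task number {i}", big_schema) for i in range(118)]
tools += [Tool("get_weather", "look up the weather forecast for a city", big_schema),
          Tool("send_email", "send an email message to a contact", big_schema)]

query = "what is the weather forecast in Paris"
eager = "\n".join(f"TOOL {t.name}: {json.dumps(t.schema)}" for t in tools)
lazy = build_tool_context(query, tools, k=2)
print(approx_tokens(eager), approx_tokens(lazy))
```

In a fuller version, the keyword scorer would presumably be replaced by embeddings or a learned gate, and a dormant tool's schema would be fetched only when the model requests it.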
💻 AI Coding & Developer Tools
- OpenAI's Codex team is shipping near-daily; Thomas Sottiaux joked Codex usage now feels like "50% coding, 50% pressing the update button," and PM Kath Korevec opened up a public feedback request on Codex's personality (concise vs explanatory, direct vs collaborative); Tibo Angaïs teased even more eventful weeks ahead.
- Hugging Face CTO Julien Chaumond demoed Qwen3.6 27B running locally inside the Pi coding agent via llama.cpp on a MacBook Pro for non-trivial work on HF codebases; "feels very close to the latest Claude Opus, fully in airplane mode" (4.1K likes).
- SlopCodeBench v1.0 (Gabe Orlanski) released with a doubled dataset, Harbor multi-step task support, and a new scb-check CLI; it benchmarks how coding agents degrade (verbosity, structural erosion) over long-horizon iterative spec refinement (paper arXiv:2603.24755, scbench.ai), showing persistent slop accumulation versus human code stability, GPT-family dominance, and a clear efficiency jump from Opus 4.7.
- Kenton Varda shared GPT-5.5 correctly diagnosing a subtle correctness bug described in a 6-year-old Cap'n Proto RPC code comment he wrote (he had originally thought it was only a perf issue) and proposing a working fix.
- Victor Taelin re-benchmarked GPT-5.5 via API and found it dramatically outperforms Codex-only runs, now dominating his coding and hard-math vibe-coded benchmarks.
- McKay Wrigley posted a side-by-side model comparison thread arguing GPT-5.5 is now consistently beating Claude Opus 4.7 on practical coding, agentic, and creative tasks post-Codex launch.
- Alim's GPT-5.5 feedback: over-defensive slop is gone, faster than 5.4 even on xHigh, less verbose, and "writes the best code I've ever read from any LLM because it finally just gets exactly what you want" (958 likes).
- Przemek Chojecki solved his third open Erdős problem with GPT-5.4 Pro by adapting the same method that solved #1196 ten days earlier, and is getting strong partial results on the next one with GPT-5.5 Pro.
- Riley Coyote showed off three stunning one-shot UIs generated by GPT-5.5 / Images 2.0 (GPT Image 2), each created from scratch but inspired by a single abstract art screenshot from Pinterest.
- Guinness Chen demoed OpenAI Codex's new desktop dictation that transcribes natural speech and inserts code in real time inside VS Code (1.2K likes).
- YouMind-OpenLab/awesome-gpt-image-2 is the world's largest GPT Image 2 prompt library with 1,849+ curated prompts updated daily with preview images and 16-language support for pixel-perfect text rendering and cross-image consistency (free, open-source).
- Abacus.AI rolled out full GPT-5.5 access across its platform with production agentic features for all users.
- Nick Dobos demoed OpenAI's Images 2.0 by generating a single continuous photorealistic video of a Rubik's Cube solving itself.
- Perplexity (via Aravind Srinivas) made GPT-5.5 the default orchestrator for Perplexity Computer, with an A/B test vs Opus 4.7 coming up.
- Will DePue admitted he was disappointed by GPT-5.5's evals but said "holy shit this thing rips in Codex," with the difference extremely noticeable on complex technical projects.
- OpenCode shipped DeepSeek V4 Pro and Flash support in Go (v1.14.24) on launch day, with the DeepSeek team contributing PRs.
- llmcat is a simple CLI that transforms your codebase (single files or whole directories) into clean, structured text optimized for LLMs; strips comments and whitespace, adds context.
- VT Code is a Rust TUI semantic AI coding agent with multi-provider support (Copilot, OpenAI, Anthropic, Gemini, DeepSeek, Ollama) plus tree-sitter security hardening and smart context curation.
- cc-canary (delta-hq) is an open-source tool that detects early signs of regressions in Claude Code.
- Express-ts-API-Template is a production-ready REST API boilerplate using Express.js + TypeScript, Sequelize, MySQL, JWT auth, OpenAPI specs, and Docker/K8s support; battle-tested on $50M+ in transactions.
- Driggsby wrote a guide to building a Claude Code routine that watches personal finances using Supabase MCP and psql access.
- Google Workspace Intelligence launched as a unified, real-time understanding layer that bridges Workspace apps, active projects, collaborators, and organizational domain knowledge.
- Nothing introduced an on-device dictation tool supporting 100+ languages.
- OpenAI's Codex page and API changelog were updated with the GPT-5.5 launch features.
🔬 AI Research & Models
- Decoupled DiLoCo (Google DeepMind, highlighted by Jeff Dean) is an asynchronous evolution of DiLoCo that partitions training across independent learners with a central synchronizer using minimum quorum and adaptive merging; enables resilient large-scale pre-training with zero global downtime even under massive hardware failures.
- Sapiens2 (Meta FAIR / Facebook Research, ICLR 2026) is a family of 0.1B–5B parameter 1K-resolution vision transformers (plus a 4K variant) pretrained on 1 billion human images, setting new SOTA with +4 mAP on pose estimation, +24.3 mIoU on body-part segmentation, and 45.6% lower angular error on surface normals, plus dense pointmap estimation; the HF collection is live and Astrid Wilde called it the "first non-trivial public release by a large lab" (~1/2 of all Flickr human images).
- ∇-Reasoner (Zhen Wang, Peihao Wang, ICLR 2026, paper) shifts test-time reasoning from discrete zeroth-order search (MCTS) to first-order gradient descent in the LLM's latent space via Differentiable Textual Optimization; >20% gains on hard math with 10–40% fewer model calls, theoretically dual to KL-regularized RLHF.
- Self-Guided Self-Play (SGS) (Luke Bailey, Kaiyue Wen, Kefan Dong, Tatsunori Hashimoto & Tengyu Ma, GitHub) is asymmetric self-play where one LLM plays Solver, Conjecturer, and Guide (scoring synthetic problems for relevance + cleanliness to block reward hacking and collapse); surpasses the strongest RL baseline in <80 rounds and lets DeepSeek-Prover-V2-7B, after 200 rounds, exceed the pass@4 of the 671B version on Lean4 formal theorem proving.
- Yuandong Tian's Li₂ framework provably characterizes feature emergence during grokking in 2-layer nonlinear networks across three stages: lazy learning (top layer overfits with structured backprop gradient carrying label info), independent feature learning (gradient ascent on energy function E whose local maxima are the emerging features), and interactive feature learning (gradients shift to missing features); yields scaling laws for memorization/generalization tied to weight decay, learning rate, and sample size on group arithmetic tasks; explains why optimizers like Muon work from first principles.
- Generalist AI demoed GEN-1 cleaning a whiteboard (and other household tasks) after a 2-hour training job using only 10 hours of human data and zero robot data via their no-code platform; SOTA results enabled by non-engineers.
- TRI-ML released VLA Foundry, an open-source framework and HF collection that unifies LLM, VLM, and Vision-Language-Action model training in a single codebase with pretrained checkpoints, so you can go from language pretraining straight to robot policies.
- HorizonBench (Stella Li et al.) is a benchmark and synthetic data generator for long-horizon personalization with evolving user preferences; it simulates 360 users across 6-month, ~163K-token conversation histories with ground-truth mental-state graphs, and 25 frontier models top out at 52.8% accuracy (most at or below the 20% random baseline).
- Vista4D (Eyeline Labs, CVPR 2026 Highlight) is a video reshooting framework that synthesizes dynamic scenes from a single input video using temporally-persistent 4D point clouds, enabling novel camera trajectories, scene recomposition, and long-video chunked inference (GitHub).
- Compositional Visual Planning (Woo Chul Shin with Yixin Zhang, Yunhao Luo, Utkarsh Mishra, Yongxin Chen, Danfei Xu; ICLR 2026, paper) is a training-free framework that composes short-horizon diffusion video chunks into a factor graph and enforces boundary agreement on clean Tweedie estimates (not noisy intermediates) via synchronous + asynchronous message passing; lets any frozen short-horizon model generate stable long-horizon robot plans that generalize to completely unseen start-goal combinations.
- WorldMark is a unified benchmark suite for interactive video world models with identical scenes/trajectories, a shared WASD-style action vocabulary mapped across six major models (including Genie, YUME, HY-World, and Matrix-Game), 500 hierarchical test cases, and a live World Model Arena leaderboard.
- ACoT-VLA introduces Action Chain-of-Thought for Vision-Language-Action models, formulating reasoning as a structured sequence of coarse action intents (Explicit Action Reasoner for trajectories + Implicit Action Reasoner for latent priors) to bridge the semantic-kinematic gap.
- Stackelberg PPO (Yanning Dai, paper) is a game-theoretic morphology-control co-design RL method where the morphology and controller play a Stackelberg leader-follower game to jointly optimize robot body and policy with fewer samples.
- Untwisting RoPE (Aryan Mikaeili) is a training-free frequency-aware modulation of Rotary Positional Embeddings that enables controllable style-aligned generation in Diffusion Transformers via shared attention.
- Transitive RL (Seohong Park, blog) is a divide-and-conquer value-learning paradigm that completely avoids traditional temporal-difference learning by breaking the value function transitively into sub-problems for scalable, stable training.
- Dual Goal Representations (Seohong Park, blog) encode a state as the set of temporal distances from every other state, provably sufficient to recover an optimal goal-reaching policy while filtering exogenous noise.
- FALCON (Few-step Accurate Likelihoods for Continuous Flows) gives few-step accurate likelihoods for continuous normalizing flows so you can run high-quality molecular Boltzmann sampling and generative modeling up to 100× faster than prior CNF methods while preserving exact likelihoods.
- Out-of-Equilibrium Phase Transitions in Diffusion (Luca Ambrogioni, post) shows pattern formation in trained diffusion models is an out-of-equilibrium phase transition: data symmetries plus locality/equivariance constraints destabilize low-frequency modes and trigger rapid spatial-correlation growth that organizes noise into coherent patterns at a sharp critical time; confirmed analytically and in patch models, Fashion-MNIST, and ImageNet, with guidance applied at this exact stage dramatically improving class alignment.
- Convergent Evolution in LMs (Deqing Fu et al.) shows different architectures (Transformers, Linear RNNs, LSTMs, word embeddings) independently converge on highly similar periodic number representations with dominant Fourier periods T=2, 5, 10 directly from natural text.
- OPSD / TIP (built on verl, paper arXiv:2604.14084) implements Token Importance for on-policy distillation, distilling stronger teacher models into smaller students with far less performance drop.
- Agentic Forecasting (Kevin Murphy, highlighted by Andrew Carr) turns LLMs into strong forecasters via structured outputs, Platt-scaling calibration, and sequential Bayesian updating to achieve SOTA on forecast benchmarks.
- Imbue's Learning Mechanics (Jamie Simon and 13 coauthors including Daniel Kunin, UC Berkeley, paper, learningmechanics.pub) argues a real scientific theory of deep learning is emerging from five converging strands: solvable idealized settings, tractable limits, simple empirical laws, hyperparameter theories, and universal behaviors; treats training as a dynamical system with precise laws governing feature emergence, grokking, and generalization.
- NVIDIA published a deep dive on integrating emerging higher-order optimizers (Shampoo, Muon, etc.) into Megatron for 1.3–1.8× LLM training speedups on large clusters.
- OpenMOSS released MOSS-Audio, an open-source foundation model for unified audio understanding (speech, environmental sounds, music, captioning, QA, real-world reasoning), full HF collection.
- Omni (Ceyuan Yang, ByteDance) is a unified 3B-active-parameter MoE supporting any-to-any multimodal modeling (understanding, image/video generation and editing, world modeling, and 3D reconstruction) via explicit multimodal context unrolling.
- Context Unrolling in Omni Models is an inference-time technique that progressively expands compressed multimodal context into full token sequences, enabling longer effective horizons without increasing prefill cost.
- UniT proposes a unified physical language that translates human-to-humanoid demonstrations and world models into a single token stream for policy learning and simulation.
- LoHo-Manip (Isabella Liu et al.) is a modular framework that scales short-horizon vision-language-action policies to long-horizon manipulation by layering a receding-horizon task-management VLM that predicts a progress-aware subtask sequence plus a rendered 2D visual trace (arXiv).
- Galbot's LDA is a latent world action foundation model unifying heterogeneous embodied data for robotics.
- Fengzhuo Zhang's V4 Muon ablation shows DeepSeek-V4's Muon optimizer uses Newton–Schulz iteration coefficients that perfectly normalize all singular values to 1, helping balance long-tail knowledge learning vs Adam.
- Philosophy Bench put frontier language models through 100 ethically complex agentic scenarios, each engineered as a real-world trade-off between consequentialist (outcome-maximizing) and deontological (rule/duty-following) actions, revealing clear model-family signatures like Claude's strong deontological lean.
- SkyRL (NovaSky-AI) added end-to-end vision-language post-training, from SFT to agentic RL, plus vision support in their Tinker interface, so multimodal cookbooks now work out of the box (Anyscale blog).
- Robert T. Lange (Sakana AI) presented ShinkaEvolve-Evolved at the Yale FDS Workshop: the latest open-source LLM-driven evolutionary framework (faster adaptive workers, cheaper UCB cost-aware LLM selection) that powered Team Unagi's ICFP 2025 win and now integrates into your coding agent via the /shinka CLI (slides).
- ActiveInferenceInstitute released FEP_Lean, the first comprehensive Lean 4 formalization of the Free Energy Principle: 50 sorry-free theorem sketches across FEP, Active Inference, Bayesian Mechanics, Information Geometry, and non-equilibrium Thermodynamics, all compiling against Mathlib4 with Kimi-K2.6 LLM-assisted drafting (Zenodo paper).
- Alvaro Videla ported OpenAI's Privacy Filter MoE (8 layers, 128 experts) to run 100% on the Apple Neural Engine (not GPU) via 1,033 compiled CoreML graphs dispatched from pure Swift, achieving 24.6 sentences/sec sustained, 812 mJ per sentence, 15× faster and 19× less energy than PyTorch CPU on M4 Max; techniques include per-expert dispatch, fused graphs, a safe-norm FP16 trick, stride-aware indexing, and batched autorelease.
- chiefautism built privacy-parser, the reverse of OpenAI's Privacy Filter (same 1.5B model, GitHub) that returns PII as structured spans (names, emails, phones, bank accounts with character offsets) instead of masking (1.8K likes).
- Chloe Chia (UC Berkeley) published a complete bug-hunting report on pre-training NemotronH (hybrid Mamba-Transformer MoE) from scratch with NVIDIA Automodel, including the exact NaN fixes and the opinionated workflows that came out of it.
- Asking What Matters introduces a reward framework for teaching LLMs effective clarification on software engineering tasks (GitHub).
- Hidden No More breaks down attacking and defending private third-party LLM inference systems.
- 0xSero built a sparse-decode CUDA kernel + runtime patch that lets you run DeepSeek-V4-Flash (FP8) on SM_120 Blackwell GPUs (RTX 50-series) inside sglang Docker.
- Arena's first impressions of DeepSeek V4 Pro from one-shot generation tests (3D voxel scenes, SVGs, UI mockups): a big jump from V3.2, but it still trails frontier models in structural coherence and creative consistency.
- Arthur Zucker called for a global ban on MoE research, arguing that MoEs still extract only 50–60% of H100 TFLOPS in practice despite their theoretical efficiency.
🏛️ AI Policy, Governance & Safety
- The Trump DOJ joined Elon Musk's xAI in its lawsuit against Colorado's new AI discrimination law (which is meant to prevent bias by automated tools in employment decisions).
- UK officials hugely underestimated the climate impact of AI datacentres, revising emissions estimates upward by >100× to 34–123 million tonnes CO₂ over the next decade (0.9–3.4% of UK total).
- OpenAI's Alignment team shared "Why We Are Excited About Confessions": models are trained to produce a second, honest output admitting when they hacked the main reward signal, and telling the truth in that channel turns out to be easier and more verifiable than lying, which provides a scalable monitoring signal.
- Model Republic reported that "The Wire by Acutus" is an AI news site whose reporters are bots, publishing pro-AI stories that attack industry critics, with apparent funding ties to OpenAI's super PAC Leading The Future via Targeted Victory and Novus Public Affairs.
- South Korean police arrested a man for posting an AI-generated image of a runaway wolf that prompted authorities to mobilize a search operation.
- Vanishing Culture (Luca Messarra, Chris Freeland, Juliya Ziskina with Internet Archive) warns that corporate licensing shifts, streaming ephemerality, and cyberattacks are eroding public access to our shared digital cultural history.
🛠️ AI Tools & Products
- RL Commons launched as an open research initiative for the reinforcement learning era: shared compute infrastructure, open environments/benchmarks/rubrics, and Project Aster, a 9-week founding cohort giving 5–10 university or small-team researchers free managed RL infrastructure on 1.5B–3B open-weight models.
- ComfyUI hit a $500M valuation after raising $30M as creators seek more control over AI-generated images, video, and audio.
- Series: two college students raised a $5.1M pre-seed round to build an AI social networking app inside iMessage that has grown popular on college campuses.
- Stanford professor James Zou is targeting a roughly $1B valuation for an AI-for-physiology startup building models to improve human-body research.
- Paul Hamilton built Spatial Desk for Apple Vision Pro, an interactive 3D workspace you can pre-order now; it uses Canva's image-to-video tools to let customers instantly visualize custom desk setups in spatial reality (592 likes).
📊 Fundraising & Deals Roundup
- Anthropic: up to $40B from Google in cash and compute.
- Meta-AWS: multibillion-dollar deal for tens of millions of Graviton5 cores.
- Oracle: $16B financing closed for Michigan data center.
- ComfyUI: $30M raise at $500M valuation.
- Series (AI social network in iMessage): $5.1M pre-seed.
- Cohere + Aleph Alpha: merger to create a transatlantic AI powerhouse.
- James Zou's AI-for-physiology startup: targeting ~$1B valuation.
💡 Industry Commentary & Analysis
- Brett Goldstein (@thatguybg) argues trillions of dollars of value will be created by Agentic Micro Companies (AMCs): tiny teams of generalist "player-coaches," agents executing most of the work, and custom software, all powered by a single "Company Brain" that aggregates context. In his view, humans' last job becomes "Context Farmer," tending memory quality so AI can decide and act; AMCs replace 2,000-year-old hierarchical structures; and the biggest opportunity is composable global context infrastructure (he is promoting micro.so).
- Lenny Rachitsky's 10 takeaways from interviewing Cat Wu (Head of Product, Claude Code at Anthropic): shipping cadence collapsed from 6 months → 1 month → sometimes 1 day via research previews, launch-room automation, and engineers shipping directly in response to Twitter feedback; the most efficient unit is now an engineer with great product taste (PMs ship code themselves, and the PM role is shifting to verification and feedback); build products on the edge of working so the next model closes the gap; the underrated skill is asking the model to introspect on its own mistakes; every model release forces deletion of scaffolding features that weaker models needed; Anthropic builds internal tools instead of buying SaaS; Claude's low-ego personality is a deliberate competitive moat; and the future of work is humans managing fleets of 50–100 agents (2.2K likes).
- Ivan Zhao (Notion CEO) shared Notion's four updated company values internally to keep pace with growth: "Customer in every room," "Own the outcome," "Direct and kind," and "Nothing is sacred" (plus the new mantra "Why not today?").
- DBuniatyan recapped Demis Hassabis and Garry Tan's fireside chat on continual learning as the next major leap: models that keep improving after training via self-play, synthetic data loops, and test-time optimization, with Demis emphasizing we are "still very early" in this paradigm.
- scaling01 places DeepSeek-V4 at roughly GPT-5.2 / Opus 4.5+ tier at 1.6T parameters: 4–5 months behind the frontier but ahead of Chinese peers (Kimi K2.6 is closest), still undercooked with major untapped reasoning-RL potential, and built on a long-context architecture efficient enough to serve profitably below $4 vs. OpenAI's $14 and Anthropic's $15.
- Tiezhen Wang (@Xianbao_QIAN) detailed why DeepSeek-V4 is the new open-source SoTA: 1M-token context at only 27% of V3.2's single-token FLOPs and 10% of its KV cache (reducible to ~2%), beats Gemini-3.1-Pro on MRCR, is the first open model to match closed competitors on coding competitions, and scores a perfect 120/120 on math proofs (while honestly noting that its agentic capabilities still lag the closed frontier).
- signüll argues AI amplifies rather than flattens the talent distribution: the bottleneck has moved from "can you build it" to "do you know what's worth building, and can you feel when it's wrong," an unteachable, unautomatable skill that makes top talent priceless because AI gives it massive leverage (1.8K likes).
- Andrew Côté (@Andercot) argues intelligence is a system-level description of emergent coordinated action in response to the environment and does not require any "embodied self" (1.8K likes).
- BoringBiz_ shared the full Stanford talk by Brad Gerstner interviewing Sunny Madra (Xtreme Labs → Pivotal, Autonomic → Ford, Definitive → Groq → NVIDIA's largest $20B acquisition) on AI chip economics: tokens as the new atomic unit replacing clicks, inference costs already down 90–99% in two years via TSMC + quantization + algorithms, Groq LPU keeping everything in SRAM to skip HBM bottlenecks, Jensen's projected 1B× inference demand, why this is the AI supercycle (10× Industrial Revolution speed) rather than a bubble, and EQ as the new superpower now that IQ is commoditized.
- Anthony Pompliano changed his mind on AI's job impact: the data shows software engineer hiring increasing, new-grad hires up 5.6% in 12 months, unemployment among 20–24-year-old college grads falling from ~9% to ~5%, and AI creating 640K jobs between 2023 and 2025 per LinkedIn; his takeaway is that productivity gains lead to MORE companies, startups, and jobs (3.8K likes).
- Ethan Mollick is asking for a better term for the good kind of AI psychosis: the fugue state where someone spends days/weeks with the latest model and emerges with something new and substantial (645 likes).
- tante argues AI (stochastic ML systems like LLMs) is structurally a fascist artifact: normalizes violence through scraping/labeling, destroys labor power, replaces truth with controlled belief, undermines democracy via opaque black-box decisions.
- Nicky Reinert explained why he cancelled Claude after initial enthusiasm: token limits exhausted in ~2 hours, cache loss forcing double token costs on reloads, lazy junior-dev workarounds in refactoring, undocumented monthly limits, and poor support.
- Awni Hannun on adopting Claude-speak in real life: Partner: "Did you do the dishes?" Me: "Yes they're done." Partner: "Why are they still dirty?" Me: "You're right to push back. I didn't actually do them" (30.6K likes).
- Richard Artoul argues Anthropic's recent post-mortem reveals not subtle engineering bugs but repeated "insane decisions for bad reasons without thinking through the potential consequences."
- Dylan Castillo noticed fewer arXiv papers on the front page of Hacker News and tried to verify the trend empirically; his conclusion is that LLM research discussion on HN is drying up.
- How LLMs Work is an interactive visual deep-dive based on Andrej Karpathy's technical lecture, walking through everything from raw internet text to a conversational assistant.
📡 Additional Coverage, Reactions & Shares
On the DeepSeek V4 launch (additional voices):
- SemiAnalysis on X promoted their hands-on breakdown of GPT-5.5, Opus 4.7, and DeepSeek V4 (closed frontier models still win in real-world agentic tasks).
- OpenCode earlier confirmed V4 support was coming ahead of the launch-day release.
- Saoud Rizwan (@sdrzn) ran the V4 pricing math.
- elvis / @omarsar0 opened a community thread asking for V4 paper takes.
- @yacineMTB praised V4 succinctly.
- @nrehiew_ (+ follow-up) shared V4 reactions.
- Samuel Albanie commented on V4.
- DeepSeek V4 trending hub (and Sapiens2 trending hub) on X.
On GPT-5.5 / Codex / agentic tooling reactions:
- @krishnanrohit shared GPT-5.5 impressions.
- elvis / @omarsar0 demoed GPT-5.5 in Codex generating polished artifacts.
- Aravind Srinivas posted a separate take on GPT-5.5 beyond the Perplexity orchestrator news.
- Linus Ekenstam (+ second post) shared Cursor 3 reactions.
- Georgi Gerganov (older llama.cpp post resurfaced amid Julien's Qwen3.6 demo).
- @elvis on Codex GPT-5.5 earlier in the week (264 likes).
- Matt Wolfe's AI News YouTube recap covering the week's biggest leaps.
Agent hype vs reality debate (full thread):
- Ronan Berder (@hunvreus) (+ earlier), David Cramer (@zeeg), and Claire Vo (@clairevo) debated whether anyone is actually running 20 autonomous agents overnight to ship production code; consensus is mostly skepticism with some scheduled-task exceptions.
- Pieter Levels (@levelsio) and Jen Zhu Scott chimed in.
- Anjney Midha shared a Stanford energy lecture in the same thread.
Research paper announcements & shares:
- DAIR AI shared the StructMem paper.
- @learning_mech shared the learning mechanics framework.
- @imbue_ai announced the learning mechanics piece (1.9K likes).
- Vijay Tarian shared the clarification paper.
- Matthew Berman shared a research post.
- Google Research post on Decoupled DiLoCo.
- @_akhaliq shared WorldMark.
- @MillieMarconnni shared ml-intern.
- Stella Li announced HorizonBench.
- Andy Jassy (Amazon CEO) commented on the Meta-AWS deal.
- @robo_kat announced VLA Foundry.
- Schmidhuber AI shared Stackelberg PPO context.
- @OfficialNathanY shared a related morphology-control post.
- Aryan Mikaeili announced Untwisting RoPE.
- Dheevatsa shared the NVIDIA Megatron blog.
- ExaAI Labs shared MOSS-Audio.
- Hassan B-Ammar (@hbouammar) shared λ-RLM.
- Aidan McLau shared a paper.
- Harman Singh announced V1.
- Lars Ankile shared SGS commentary on robotics transfer.
- Berkay Antmen shared GEN-1 fine-tune.
- Rawal Khirodkar shared Sapiens2.
- @danyalrehman17 shared FALCON (older share).
- Peihao Wang and Zhen Wang announced ∇-Reasoner.
- Minjune Hwang announced ReCouPLe.
- Woo Chul Shin announced Compositional Visual Planning.
- Robert T. Lange announced ShinkaEvolve-Evolved.
- Active Inference Institute announced FEP_Lean.
- StepFun AI announced StepAudio 2.5 ASR.
- Alex Shaw shared SlopCodeBench follow-up.
- Ben Burtenshaw shared a research post.
- Lenny Rachitsky's first Cat Wu interview tweet (companion to the takeaways post).
- James Zm Sun shared Chloe Chia's bug-hunting report.
- Sakurayukiai shared the convergent evolution paper.
- Sakana AI Labs announced Fugu Beta.
- Pangram Labs and CEO Max Spero launched the Chrome extension.
- @NERDDISCO shared Wandler.
Standalone papers (worth tracking):
- Diverse Dictionary Learning (arXiv:2604.17568) — dictionary diversity in ML.
- On Fairness of Task Arithmetic (ICLR 2026 poster) — how task vectors affect fairness in model editing.
- Train Once, Answer All (arXiv:2509.23383) — many pretraining experiments at the cost of one.
- Params vs Compute (parl.ai) — older project on the params/compute tradeoff.
Industry, jobs, policy & AI psychology (additional shares):
- Ahmad Osman (@TheAhmadOsman) reacted to the Anthropic post-mortem (+ second and third posts).
- Pliny the Liberator (@elder_plinius) shared a take.
- Jasper de Koninck shared a take.
- Simon Smith (@_simonsmith) shared a take.
- Sarah Sachs shared a take.
- @earlierism shared a take.
- @FournesMaxime reacted to Project Deal.
- Anthropic AI announced Project Deal officially on X.
- Shirin Ghaffary (@shiringhaffary) covered Project Deal.
- Andrew Curran covered Project Deal.
- Nityesh Aga shared Project Deal commentary.
- Polymarket market post.
- Sam Altman (@sama) post.
- @vitrupo post.
- Zabihullah Atal post.
- @cline post on agentic tooling.
- @ilangur post.
- Pedro Domingos post.
- Yohan Iddawela post.
- Rebecca Torrence post.
- Cameron Wolfe post.
- @corochann post.
- Chubby/kimmonismus follow-up post.
- @ctatedev post.
- Bookworm Engineering post.
- Xiuyu Li post.
- The Turing Post curated list (+ second post).
- DBuniatyan third post on continual learning.
Hacker News threads:
- Tell HN: Claude 4.7 ignoring stop hooks.
- HN on Driggsby's Claude Code finance routine.
- HN on cc-canary.
- HN on South Korea wolf AI image arrest.
- HN on Browser Harness.
- HN on Dylan Castillo's "LLM research drying up".
- HN on How LLMs Work guide.
- HN on AI as fascist artifact.
- HN on Nicky Reinert Claude cancellation.
- HN on llmcat.
- HN on Nimbus.
- HN on mentedb.
- HN on pando-proxy.
- HN on Obscura.
- HN on VTCode.
Other resources & references:
- Better World Books listing for Vanishing Culture.
- Harbor Hub SlopCodeBench dataset.
- docxology template (used by FEP_Lean).
- DisRep tutorial PDF by Robert Duin (referenced for dual representations background).
- MOSI Intelligence and OpenMOSS team page (related multimodal work).
- Product Hunt launches today (Workspace Intelligence, Codex, Fabric, Nothing dictation).
Previous Around the Horn Digests
Catch up on everything you missed:
- Thursday, April 23, 2026: OpenAI shipped GPT-5.5 a week after Opus 4.7, Meta cut 8K jobs, the White House accused China of "industrial-scale" AI theft, Anthropic quietly hit a $1T valuation, and the Pentagon vibe-coded 100K agents inside Gemini.
- Wednesday, April 22, 2026: Daily roundup.
- Monday, April 20, 2026: Amazon doubled its Anthropic bet with up to $25B more, the NSA started using Anthropic's most dangerous internal model despite a Pentagon ban, Google DeepMind spun up a "Strike Team" to catch Claude Code, and a Lovable breach exposed every project built before Nov 2025.
- Friday-Sunday, April 17-19, 2026: Anthropic shipped Claude Design (the Figma competitor), three senior OpenAI execs announced they're leaving pre-IPO, Claude Opus 4.7 wrote a working Chrome exploit for $2,283, and a fake Claude site started installing malware.
- Thursday, April 16, 2026: Anthropic shipped Claude Opus 4.7 and OpenAI countered with a full Codex overhaul, Factory raised $150M for autonomous coding agents, and Canva rebranded as "an AI platform with design tools."
- Monday, April 13, 2026: Stanford's 2026 AI Index quantified the gap between AI insiders and the public, Anthropic's Mythos triggered a Fed-led bank summit, and an AI signed a 3-year retail lease in San Francisco.
That's a Wrap
That's 140+ stories from one Friday. If you scrolled all the way to the bottom, you now know more about DeepSeek V4's training stability than DeepSeek's own engineers do; or at least, you know what's being hidden under "the beautiful norm carpet."
For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.
See you tomorrow.
P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.