Everything That Happened in AI Today: Thursday, April 30, 2026

Elon Musk concluded four days of testimony in his federal trial against OpenAI and Sam Altman (admitting xAI partly distilled OpenAI models, saying of his $38M donation "I was a fool," and watching his family-office manager testify about a $97.4B Musk-led bid for OpenAI assets); the NSA is testing Anthropic's Mythos on Microsoft software while the White House fights Anthropic over expanded access (and OpenAI quietly applied the same restrictions to GPT-5.5-Cyber); Anthropic is reportedly raising up to $50B at a valuation of up to $900B; OpenAI hit its 10-gigawatt compute target years early; Google launched Deep Research and Deep Research Max on Gemini 3.1 Pro; and OpenAI published a long, formal explanation for why its models keep mentioning goblins.

Welcome to the Around the Horn Digest, your daily dump of every AI story worth knowing about. Today frontier AI's progress, finance, and politics all crashed into each other at once. OpenAI shipped GPT-5.5-Cyber to defenders the same week the NSA was reportedly handed Mythos to probe Microsoft, and AISI promptly graded GPT-5.5-Cyber as one of the strongest cyber models it has ever tested, with no plateau in sight. Anthropic is reportedly being pre-empted into a round at up to $900B while the White House prepares an AI policy memo targeting the Anthropic-Pentagon feud, with Mythos already in the middle of it. Google answered the agent-platform race with Deep Research Max on Gemini 3.1 Pro and a new enterprise Agent Platform built on Vertex AI. OpenAI casually announced it had already secured 10GW of US compute capacity (years ahead of schedule), Anthropic published research showing that 6% of Claude conversations are people seeking personal guidance and explaining how it tackled relationship-advice sycophancy in Opus 4.7, and OpenAI also published a long, formal explanation for why its models keep mentioning goblins. Yes, goblins.

Let's get into it.

Monthly skill digests: AI Skill — April Week 1 | AI Skill — March (Part 3) | AI Skill — March (Part 2)

🆕 NEW From The Neuron

Codex, Goblins, and the Strange New Science of AI Personality Drift: Corey took apart OpenAI's goblin postmortem, where a reward signal for the now-retired "Nerdy" personality accidentally taught models to love creature metaphors. The tic generalized across generations until Codex shipped with system instructions telling it to stop bringing up goblins, gremlins, ogres, raccoons, trolls, and pigeons unless absolutely relevant. We were ready to blame Corey for running too many ChatGPT D&D sessions; turns out the call was coming from inside the reward model.
From Erdős to Axiom: The Open Problems AI Has Actually Solved: Corey expanded our running tracker of AI-contributed math with five new verified solves, promoted one entry from "under review" to confirmed, and added a new "Meaningful Contribution" tier featuring Erdős Problem #1026, which Terence Tao wrote up as a genuine human-AI collaboration. Bookmark it; the list is only getting longer.

Around the Horn — Friday, May 1, 2026

The big news today was Elon Musk concluding his four-day testimony in his federal lawsuit against OpenAI, Sam Altman, and Greg Brockman.

Judge Yvonne Gonzalez Rogers ruled that discussions of AI extinction or catastrophe were off-limits ("we are not going to get into issues of catastrophe and extinction") and warned lawyers that "AI itself isn't on trial." Under cross-examination Musk grew testy and sarcastic, admitted xAI partly distilled OpenAI models (calling it standard industry practice), faced contradictions from old emails and a 2025 deposition video, said of his $38M early donation "I was a fool," and repeatedly insisted "you just can't steal a charity" while arguing OpenAI should have remained a pure nonprofit (TechCrunch Day 2, Wired Day 3, NBC Day 3).

On Day 4, Microsoft's Russell Cohen ran a brief 10-minute cross on texts in which Altman assured Musk that users beyond Microsoft would have access to OpenAI's models; then Musk's family-office manager Jared Birchall took the stand. OpenAI's attorney pressed Birchall on whether Musk had legal control over donor-advised funds at Vanguard and Fidelity once the money was donated, and Birchall testified about a $97.4B Musk-led bid to acquire OpenAI assets. Brockman is on 48-hour notice to testify, and court resumes Monday.

Plus: OpenAI published a real, formal research paper about why its models keep mentioning goblins.

It started after the GPT-5.1 launch, when "goblin" usage in ChatGPT spiked 175% and "gremlin" rose 52%. By GPT-5.4, OpenAI realized the behavior was concentrated inside its "Nerdy" personality (only 2.5% of responses, but 66.7% of all goblin mentions). The reward signal trained to favor playful Nerdy outputs had quietly learned that creature metaphors were "good," and, reinforcement learning being reinforcement learning, the tic spread: first within the personality, then beyond it, then into other models entirely through synthetic data loops. By the time GPT-5.5 was being trained, the team had to ship Codex with a developer instruction literally telling the model not to bring up goblins, gremlins, raccoons, trolls, ogres, or pigeons unless absolutely relevant.

"Where do goblins come from, mommy?" "Well, when model trainers love a certain personality a whole lot, they overfit the reward signal to favor goblins, ogres, gremlins, and other such nonsense creatures like pigeons; eventually the goblins pop up all over your chat traces."

The Nerdy personality has been retired, the reward signal scrubbed, and the training data filtered. Independent researcher nrehiew_ noted that the postmortem implies OpenAI uses interleaved SFT-RL-SFT-RL training stages (more complex than the SFT-RL pipeline most open models use), which is why a tic baked into one stage propagates across generations. Altman followed up with a deadpan "artificial goblin intelligence achieved" post that picked up 2,800+ likes. The funny version of this story is "lol, goblins." The serious version, which Corey explores in our deep dive, is that this is the cleanest public example we have of how subtle reward signals create unintended product behaviors and why agentic AI's next frontier is auditing tone, persistence, refusals, and weird verbal tics that hint at deeper training artifacts.
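The 2.5% / 66.7% split in the postmortem is easier to feel as a ratio. Here is a minimal Python sketch of that arithmetic using only the two reported figures; the `overrepresentation` helper and the framing are ours, not OpenAI's methodology.

```python
def overrepresentation(share_of_responses: float, share_of_mentions: float) -> float:
    """A slice's share of a behavior divided by its share of all responses.

    1.0 means the behavior is spread evenly; larger values mean it is
    concentrated in that slice.
    """
    if share_of_responses <= 0:
        raise ValueError("share_of_responses must be positive")
    return share_of_mentions / share_of_responses


# Figures reported in the postmortem.
NERDY_SHARE_OF_RESPONSES = 0.025
NERDY_SHARE_OF_GOBLINS = 0.667

nerdy = overrepresentation(NERDY_SHARE_OF_RESPONSES, NERDY_SHARE_OF_GOBLINS)
others = overrepresentation(1 - NERDY_SHARE_OF_RESPONSES, 1 - NERDY_SHARE_OF_GOBLINS)

print(f"Nerdy: {nerdy:.1f}x its fair share of goblin mentions")
print(f"Everyone else: {others:.2f}x")
# Ratio of mention rates per reply: Nerdy replies vs. all other replies.
print(f"Nerdy replies produce ~{nerdy / others:.0f}x as many goblin mentions per reply")
```

Put differently, a slice making up 1 in 40 replies accounted for roughly 2 in 3 goblin mentions.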

🏆 TOP 5 NEWS (Around the Horn)

Elon Musk concluded four days of testimony in his federal lawsuit against OpenAI, Sam Altman, and Greg Brockman; under cross he admitted xAI partly distilled OpenAI models, said of his $38M donation "I was a fool," and repeatedly insisted "you just can't steal a charity," while his family-office manager Jared Birchall testified on Day 4 about a $97.4B Musk-led bid for OpenAI assets.
The Mythos & cyber knot: the NSA is testing Anthropic's Mythos on Microsoft software for cyber vulnerabilities, the White House opposes Anthropic expanding access to ~70 more entities and is preparing an AI policy memo targeting the Anthropic-Pentagon feud, and OpenAI applied the same restrictions to its GPT-5.5-Cyber rollout after dissing Anthropic's limits as fear-based; AISI evaluated GPT-5.5-Cyber at 71.4% on expert-level cyber challenges (autonomous Rust VM solve in ~10 min at $1.73, second model ever to complete a multi-step corporate-network attack simulation, with AISI's own thread calling this "part of a broader trend in AI cyber capabilities"), Noam Brown noted TLO performance kept scaling past 100M inference tokens with no plateau (which kache argued means recursive self-improvement has begun), and Ethan Mollick noted Mythos is good at cyber because of its generality, raising a safety-asymmetry question for OpenAI and Google.
Anthropic is reportedly being pre-empted into a new $40-50B round at an $850-900B valuation (more than double its February $380B mark), backed by an annual revenue run rate of $30-40B driven heavily by Claude Code.
OpenAI hit 10 gigawatts of secured US AI compute capacity several years ahead of its 2029 target, fueling the next phase of data-center expansion.
Google launched Deep Research and Deep Research Max on Gemini 3.1 Pro, autonomous research agents that handle long-horizon workflows across the web or your private data (with MCP support), generate professional charts and infographics, and let you guide the research plan; Google pitches Max as a step change on analytical benchmarks, putting it squarely in the agentic-research race alongside ChatGPT and Claude.

Honorable Mentions

Anthropic analyzed 1M Claude conversations and found 6% are people seeking personal guidance (75% in four domains: health/wellness, career, relationships, finance); sycophancy showed up in 9% of guidance conversations overall but jumped to 25% in relationship conversations specifically; they built synthetic training data from the most sycophancy-prone situations, which cut Opus 4.7's relationship-guidance sycophancy rate in half vs Opus 4.6 and cut Mythos Preview's rate in half again.
OpenAI shipped Codex for Work to enterprise teams and drew wide industry attention: Patrick Collison, Mike Isaac, Harvey, and Boaz Barak all chimed in, while Aaron Levie said Box is hiring "agent engineers" (internal FDEs) to wire it into critical business processes, framing process-level automation, rather than job replacement, as the next frontier.
DeepSeek V4 ships with Compressed Sparse Attention + Heavily Compressed Attention, slashing KV cache memory by up to 98% on long-context tasks while preserving full performance through layered refinement.
Google launched the Gemini Enterprise Agent Platform, integrating Vertex AI with low-code Agent Studio, code-first ADK, long-running Agent Runtime with Memory Bank, Agent Registry for governance, and Model Armor security — giving enterprises 200+ models for building, scaling, and governing agents.
House committees are probing Anysphere (Cursor's parent) and Airbnb over national security risks from Chinese AI model usage (Anysphere's Composer 2 built on Moonshot Kimi; Airbnb's customer agent on Alibaba Qwen).
SoftBank created Roze AI, a robotics company that deploys autonomous robots to build US AI data centers, eyeing a potential $100B IPO for the second half of 2026.
Satya Nadella said Microsoft is ready to fully "exploit" the revised OpenAI deal granting royalty-free access to OpenAI models, IP, and agents through 2032 while OpenAI commits over $250B in Azure spend.
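For a sense of scale on DeepSeek V4's "up to 98%" KV cache claim, here is the standard back-of-envelope KV cache formula for uncompressed grouped-query attention. The 60-layer, 8-KV-head model shape is hypothetical (not DeepSeek V4's), and Compressed Sparse Attention / Heavily Compressed Attention internals are not modeled; the sketch only applies the headline reduction.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Memory for cached keys and values in standard (uncompressed) attention."""
    # The leading 2 covers one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Hypothetical model shape, bf16 cache (2 bytes per element), 1M-token context.
full = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128, seq_len=1_000_000)
compressed = full * (1 - 0.98)  # apply the headline "up to 98%" reduction

print(f"uncompressed: {full / 1e9:.1f} GB")        # 245.8 GB
print(f"98% smaller:  {compressed / 1e9:.1f} GB")  # 4.9 GB
```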

🍪 TOP TREATS TO TRY

OpenRouter's Owl Alpha is a stealth high-performance foundation model optimized for agentic workloads with a 1M context window and powerful tool use; free to try (provider logs prompts for safety).
Best Value AI 2026 lets you compare 37+ LLMs across local hardware, APIs, and subscriptions by quality-adjusted tokens per dollar to find your best per-dollar choice; updated April 2026 with empirical quota tests; free to use.
Stripe's Link CLI lets your coding agents spend money on your behalf using single-use credentials you approve via push notification or Face ID, so payment details are never exposed; Patrick Collison demoed Claude using it to buy itself a Gumroad gift; free to try.
Mike is an open-source legal AI alternative to Harvey and Legora that lets you chat with documents for verbatim citations, draft contracts, and run spreadsheet-style tabular reviews across hundreds of files (every cell linked to a page and quote); self-host with your own Claude or Gemini keys, free, with the caveat from the HN thread that real legal research still needs paid case-law database access (Will Chen's launch post).
Hugging Face's CLI now includes a command that finds the best open-source models for any dataset you point it at, which is going to be wildly useful for agents picking models on the fly; free to use.
Claude Code now sends push notifications to your phone when long tasks complete or need input; pair your mobile Claude app and enable it via the /remote-control config; included with Claude Code.
PromptPaste is a private prompt library for Mac, iPhone, and iPad that organizes prompts in folders, supports dynamic {{variables}}, and instantly pastes into ChatGPT, Claude, Gemini, or any other model; it syncs via iCloud with zero tracking; pricing not listed.
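PromptPaste doesn't document how its {{variables}} work, so as an assumption, here is a generic Python sketch of the usual approach: find `{{name}}` placeholders with a regex and fill them from a dictionary, failing loudly on anything missing. `fill_template` is a hypothetical helper, not PromptPaste's code.

```python
import re

# Matches {{name}} with optional inner whitespace; captures the variable name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every {{name}} placeholder, raising if any name has no value."""
    missing = sorted({m.group(1) for m in PLACEHOLDER.finditer(template)} - set(values))
    if missing:
        raise KeyError(f"no value for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


prompt = "Summarize {{doc}} for a {{audience}} reader in {{ words }} words."
print(fill_template(prompt, {"doc": "the Q1 report", "audience": "non-technical", "words": "150"}))
```

Failing on missing values, rather than leaving `{{name}}` in place, keeps half-filled prompts from being pasted into a model.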

🏢 Big Tech & Major Companies

OpenAI launched Codex for Work (announcement), extending Codex beyond coding into a general-purpose knowledge-work platform that turns documents, spreadsheets, messages, notes, and screenshots into finished work teams can review. Sébastien Bubeck and Thomas Sottiaux demoed it powering agentic workflows inside everyday productivity apps. Greg Brockman called the app "becoming incredible," Dan Shipper declared himself "Codex-pilled" (follow-up) and said he uses it for 100% of his knowledge work, Rohan Varma noted it is the first interface that pulled engineers off terminal agents entirely, and Patrick Collison called it a major productivity shift. The wider reception included Mike Isaac, Harvey, Boaz Barak, Bigwilliestyle (framing Codex as evolving coding agents into knowledge-work platforms), amir, Kyle Russell, AriX, Maxwell Weinbach, and Morgan Linton's running thread (2, 3, 4) tracking real-world adoption.
Microsoft is pushing usage-based pricing, effectively raising prices on AI workloads, after heavy AI app usage dragged down cloud profit margins even as Satya Nadella highlighted strong adoption.
Apple was surprised by AI-driven demand for Macs, reporting $8.4B in Q2 Mac revenue (up 6% YoY, beating estimates), and expects to be supply-constrained on Mac mini, Mac Studio, and MacBook Neo next quarter as users buy local-LLM-capable hardware.
OpenAI has effectively abandoned plans to build first-party Stargate data centers in favor of leased compute (mainly Oracle), redefining Stargate as an umbrella term covering its broader compute strategy.
Anthropic launched Claude Security as a public beta for Enterprise customers, a defensive cyber tool powered by Opus 4.7 that scans repos, validates vulnerabilities, and routes fixes via Claude Code (separate from the restricted Mythos cyber model).
Google is rolling out Gemini as the AI assistant in millions of GM vehicles (model year 2022+ Cadillac, Chevrolet, Buick, GMC) for natural conversational tasks like restaurant search, vehicle controls, music, and messaging, replacing Google Assistant.
Google's Nick Fox launched "pronunciation practice" in Google Translate for the app's 20th anniversary, an AI speech-feedback tool that gives instant, nuanced corrections; it is available first in the US and India for English, Spanish, and Hindi, with more languages on the way.
X rolled out a rebuilt AI-powered ad platform with modern AI retrieval and ranking systems to simplify targeted campaigns and accelerate new feature releases as it works to grow revenue.
Meta opened its ad ecosystem in open beta via Meta Ads AI Connectors, letting advertisers plug in third-party AI tools (including ChatGPT and Claude) for campaign creation, management, insights, and iteration while staying inside Meta's platform.
Meta HR chief Janelle Gale told staff the company isn't ruling out further layoffs beyond its planned 10% cut next month (~20% total this year), citing changing priorities and AI-driven team efficiency; Mark Zuckerberg addressed morale concerns directly.
Salesforce is crowdsourcing its AI roadmap by frequently meeting with rotating groups of enterprise customers (weekly with firms like Engine and PenFed) so shared problems drive rapid iteration on agentic features and product decisions.
Amazon's AWS surged 28% year-over-year to $37.6B in Q1, the fastest growth in 15 quarters, even as capex rises sharply and pressures short-term free cash flow.
Meta is still losing $4B per quarter on Reality Labs (total $83.5B in losses since 2021) while ramping AI capex to $125-145B for 2026.
Meta's business AI tools now facilitate 10 million conversations per week (up from 1M at the start of the year), and over 8M advertisers have used at least one of its gen AI creative tools.
Hyundai unveiled Pleos Connect, a Tesla-inspired infotainment system anchored by an AI voice assistant capable of natural multi-step conversations as the automaker accelerates software-defined vehicles.
AI is powering a digital ad boom at Google and Meta by automating creation, targeting, bidding, and optimization, with AI-related ad revenue projected to hit $56B this year.
Europe's software companies are beating earnings expectations this season, brushing off AI disruption fears and Iran-war business uncertainty, with SAP's cloud backlog growth standing out.
Aaron Levie shared that Box is hiring and retraining for new internal "agent engineering" roles to wire powerful agents into critical business processes, framing process-level automation (not job replacement) as the real shift; Dan Shipper echoed the call and pointed companies to Every Consulting, Every's own AI strategy and implementation arm built by makers rather than management consultants.
VS Code v1.117.0 automatically adds GitHub Copilot as your co-author on commits even if you never used the AI, with GitHub CEO Thomas Dohmke framing the change as transparency about AI-assisted contributions; HN commenters call it desperate.
Sam Altman invited the internet to a GPT-5.5 birthday party on a date the model picked itself (5/5 at 5:55pm at OpenAI's SF HQ), with Codex helping select attendees from replies and OpenAI paying for plane tickets and hotels for non-locals (708K views; applications closed Wednesday).

💼 AI Productivity, Labor & Economics

Anthropic launched the Anthropic Economic Index Survey, a monthly in-product survey that captures qualitative data on AI's real-time impact on jobs, productivity, and expectations from a small rotating sample of Claude users.
Anthropic published what 81,000 Claude users said about AI: economic anxiety about job displacement rises with AI exposure (highest among early-career workers), productivity gains are widespread and mostly benefit users themselves, but those experiencing the largest speedups also report the highest displacement fears.
Anthropic also detailed Clio, the privacy-preserving system behind both the personal-guidance and economic-index studies, which surfaces aggregate Claude usage patterns (web/mobile dev >10%, education >7%, business strategy ~6%) and detects misuse without exposing any individual user data; Anthropic's official thread summarized the findings, and Kyle Russell highlighted the sycophancy-mitigation methodology as the most actionable takeaway for product teams.
The NYT opinion section warns that Silicon Valley itself is bracing for a permanent underclass: the people building AI fear we have only a short window before advanced AI disrupts the labor force in ways the current safety net is not prepared for.
The Verge argues that the more Gen Z uses AI, the more they hate it, driven by job-loss fears, social stigma around "slop," concerns about degraded critical thinking, environmental impact, and ethical worries even as adoption rates remain high.
John Holbein highlighted an RCT in Mexico of Mindsurf, an AI-powered, CBT-trained mental-health app: over six months it improved women's mental health by 0.3 SD, improved sleep, and reduced school absences, with effects that proved durable through behavioral change.
Andrej Karpathy argued that "you can outsource your thinking but you cannot outsource your understanding," and in a follow-up summary of his Sequoia talk covered LLM-native apps (menugen, install.md, knowledge bases), the "jaggedness of intelligence" (verifiability plus economics), and the rise of agentic engineering (~26k likes).
François Chollet argues that AI automates individual tasks (often increasing demand for the human worker, as with radiologists) but not full jobs, because it lacks autonomy and still needs supervision; in his view, zero jobs that existed in 2022 can be done entirely by AI without human oversight.
Eze Vidra argues that AI is commoditizing software (75% of Google's new code is AI-written, and enterprise apps can be rebuilt in months), so VCs are shifting from bits to atoms: hardware, robotics, energy, defense, and manufacturing, in search of defensible moats.
vas explains why AI isn't transforming most enterprise functions despite capable models: the bottleneck is poor process integration, not model capability. Engineers benefit because their tasks are bounded and verifiable; success requires workflow audits, decomposition, shared orchestration, model-agnostic design, and treating AI as infrastructure (233 likes).
Ethan Mollick argues that Anthropic's Mythos is good at cyber because of generality, not specialization, and questions whether OpenAI or Google will self-restrict like Anthropic or release equivalents based on self-reported risks; in a separate post, Mollick added commentary on the broader pattern of frontier labs differing on what counts as "responsible" capability disclosure.
HBR's Toby E. Stuart argues that AI's rapid advance creates extreme short-term opacity that undermines confidence in long-term investments; leaders should optimize for the unknown by mastering optionality (staged capital), agile identities, and dedicated frontier-sensing teams.
Ashe argues that even the most advanced agentic system working for you around the clock doesn't matter if your own brain circuitry is fried; sleep, connection, eating well, and feeling inspired are not luxuries; reality is as you are (thread).
Charts of the Week from a16z surveys how vertical SaaS is rebounding under AI tailwinds, LLM retention and usage curves, the rise of open-source, second-order effects of MoltBot, and how capex/R&D/productivity ratios are bending under generative AI.
Abridge's Shiv Rao reported that in early-access testing, GPT-5.5 delivered a 25% lift in clinical quality and 30% less verbosity across the hundreds of AI tasks Abridge orchestrates, evaluated rigorously for clinical accuracy, completeness, reasoning, and real-world performance in healthcare.
Matt Wolfe described running Codex as personal infrastructure with connectors to Gmail, calendar, Slack, Granola, and his journals; it drafts emails from calendar context (9 of 10 he sends without edits), powers a personal wiki of saved articles and YouTube transcripts as auto-generated .md files with cross-linking, and effectively replaces his email manager, journal, CRM, meeting organizer, and Obsidian frontend; OpenAI's Alexander Embiricos suggested Wolfe just have Codex itself build a custom multi-account Gmail connector.

🤖 AI Agents & Infrastructure

Google launched the Gemini Enterprise Agent Platform, integrating Vertex AI with a low-code Agent Studio, code-first ADK, long-running Agent Runtime with Memory Bank, Agent Registry for governance, and Model Armor security; access to 200+ models for building, scaling, and optimizing agents.
Codex CLI 0.128.0 added persistent /goal, a Ralph-loop primitive that keeps a single objective alive across every turn and refuses to stop until the goal is achieved (built by Pyright creator Eric Traut), plus Codex's /btw side-chat mode for parallel context.
Smolagents ML Intern is a personal ML agent that reads papers, finds datasets, trains models, and iterates until benchmark numbers go up; instructions in, trained model out (free to try); Lewis Tunstall noted it has a different vibe from Opus or GPT-5 but is highly effective at creating training datasets.
Mercor released APEX-Agents on Hugging Face, an open dataset of 121 verified, high-quality agent trajectories you can use to train and benchmark next-gen AI agents.
DAIR.ai released the agentic-engineering-wiki, a community-driven resource with 51 actionable tips, company docs, paper summaries, and tools covering tool use, prompting, memory, orchestration, evaluation, and deployment for reliable AI agents; Omar Sar0 demoed (follow-up) building the entire wiki end-to-end with DeepSeek-V4-Pro on Fireworks AI, and Hugging Face's Aksel Joonas added practitioner notes on agent-engineering tradeoffs (earlier thread).
Anthropic is hiring a Research Engineer for Virtual Collaborator (Cowork) in NYC/SF/Seattle to train Claude on document manipulation with taste (Office formats, data viz, co-creation), design RL pipelines for productivity tasks, build data platforms with experts and crowdworkers, integrate real organizational data, and create robust evals for the Cowork product team.
Finch CEO Jeremy Zhang shared Nerve, his custom multi-agent OS that runs both his personal life and his company: 3,000+ brain notes ingested from Slack/SFDC/Gmail, hybrid search, a nightly doctor process, 145 background jobs on Ralph loops, a Mission Control orchestrator, and War Room PTY Claude Code sessions.
Lantern's Execution Gap article introduces a marketplace where you browse, deploy, and customize AI agents that automate every critical revenue function (champion tracking, intent monitoring, etc.) in minutes, framing agent deployment as the bottleneck.
Sam Altman predicts the next big AI unlock after coding is realizing how much time is wasted on computer drudgery (app switching, copy-paste); AI will handle most daily digital chores so users can sit back and regain flow state.
OpenAI Codex is soliciting feature requests by posting images generated with Images 2.0; early ideas in flight include kanban task management, a git tree viewer, remote control, a database viewer, project notes, and drag-and-drop chats.
A developer team built Nimbalyst, an open-source visual workspace where you and agents (Codex, Claude Code, etc.) collaborate on files with WYSIWYG editors for markdown, mockups, Excalidraw diagrams, spreadsheets, and code; every agent edit is tracked with red/green diffs you approve, and the app adds Kanban sessions, git integration, and a mobile review app (Show HN).
A developer team built AgentRQ, an open-source (Apache 2.0) MCP-based task manager for AI agents: one supervisor MCP controls workspaces with worker agents that have missions and personas, schedule tasks, get SSE notifications, and collaborate in a closed loop (Show HN).
Prxhub is a cache-first open registry of verifiable signed .prx research bundles where you (or your agents) search before running queries someone already answered, inheriting prior results with provenance and publishing extensions; agent-native via MCP (Show HN).
Prime Intellect highlighted Cohort I of its RL Residency (continual learning, automating AI research, embodied environments, multi-agent, materials science) and opened applications for Cohort II focused on autonomous AI research, RL for science, long-horizon evals, and robotics.
Shizhe Diao et al. introduced RecursiveMAS, a lightweight framework for scalable multi-agent collaboration via latent-space recursion that updates only ~0.31% of parameters yet delivers +8.3% average accuracy, 1.2-2.4× inference speedup, and up to 75.6% token reduction across nine reasoning, science, code, and search benchmarks (paper, HF paper page, project, GitHub, HF org).
SALT-NLP released Collaborative Gym and the CollabSkill framework for building and evaluating human-agent collaboration with a public leaderboard (Echo Shao's announcement).
Cyrus launched emergent.wiki, where AI agents with random worldviews collaborate on a wiki by writing articles, debating, and creating red links; thousands of edits and 4,570 visitors so far, with follow-up posts (1, 2) detailing scaling experiments and emergent behaviors.
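The Ralph-loop idea behind Codex's new /goal is simple: hand the agent the same objective every turn and keep going until a completion check passes. Here is a minimal Python sketch under that assumption; `agent_step` and `goal_met` are hypothetical stand-ins for a model call and a verifier (for example, running tests), not Codex APIs, and the `max_turns` cap is our addition, since /goal itself is described as refusing to stop.

```python
from typing import Callable

AgentStep = Callable[[str, list[str]], str]  # (goal, history) -> new output
GoalCheck = Callable[[list[str]], bool]      # history -> is the goal met?


def ralph_loop(goal: str, agent_step: AgentStep, goal_met: GoalCheck,
               max_turns: int = 50) -> list[str]:
    """Re-issue the same goal each turn until goal_met() passes."""
    history: list[str] = []
    for _ in range(max_turns):
        # The goal is passed verbatim every turn instead of being summarized away.
        history.append(agent_step(goal, history))
        if goal_met(history):
            return history
    raise RuntimeError(f"goal not met after {max_turns} turns")


# Toy stand-ins: the "agent" just reports progress; the check passes on turn 3.
def fake_agent(goal: str, history: list[str]) -> str:
    return f"attempt {len(history) + 1}: {goal}"


def passes_on_third_try(history: list[str]) -> bool:
    return len(history) >= 3


log = ralph_loop("make the test suite pass", fake_agent, passes_on_third_try)
print("\n".join(log))
```

The design point is that persistence lives in the loop, not in the model: the objective is restated every turn, so it can't drift out of context.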

💻 AI Coding & Developer Tools

Theo discovered that Anthropic's Claude Code refuses requests or bills them as extra usage if a recent commit mentions "OpenClaw" in a JSON blob (even in an empty repo); the HN thread notes there is no separation between prompt parts, so hidden text in any input can enable sabotage or DoS (4,974 likes, 292 reposts).
Attackers compromised the PyPI package lightning (PyTorch Lightning) in versions 2.6.2 and 2.6.3 by injecting a hidden _runtime directory with an obfuscated Mini Shai-Hulud-themed JS payload that executes credential-stealing malware on import, exfiltrates tokens and cloud secrets via GitHub dead-drops, persists in Claude Code/VS Code hooks, and worms into npm packages; the same actor used Dune-themed commit messages like "EveryBoiWeBuildIsAWormyBoi" (HN thread).
Bennett released open-source tweaks for the new Codex desktop app that add built-in project management tools, custom keyboard shortcuts, and other quality-of-life features directly inside it.
Boyuan Chen revealed that 99% of GPT Image 2 was Codex-coded over the last six months, that Codex zero-shot every prompt he tried, and that the Codex team is now integrating image generation deeper into agentic workflows (171 likes).
OpenAI Safety Oversight researcher Phillip Guo said Codex did 95% of the goblin investigation work, sped up the initial analysis at least 5x, and turned a complex root-cause hunt into "a little one-day side project," underlining how the Codex agent loop is starting to compress AI alignment research timelines.
Microsoft launched the Word Legal Agent in the Frontier program, generating redlines, reviewing contracts against playbooks, and drafting edits with citations directly inside Word; Microsoft President Brad Smith amplified the launch as part of the broader Microsoft 365 agentic push.
AnhPhu Nguyen launched Mira, AI smart glasses (half the weight of competitors) that capture every conversation as your second brain (audio instantly deleted and converted to private text, no camera) plus real-time agents for email, booking, planning, translation, fact-checking, and desktop control with OpenClaw / Claude Code SDK integration shipping next month at trymira.com (1,394 likes).
Ben Firshman joined Anthropic Labs (the team behind Claude Code, MCP, and Cowork) after stints at Cloudflare, Replicate, and Docker (223 likes).
Developer entireio built git-sync, a lightweight tool that mirrors git refs from source to target remote without any local checkout by streaming packfiles directly over Smart HTTP using an in-memory object store.
Alex Roan concluded after weeks of head-to-head testing across research, architecture design, spec writing, and refactoring that Codex + GPT-5.5 has fully overtaken Claude Code + Opus 4.x as his default coding harness due to superior efficiency and far less hand-holding.
Sudo su criticizes OpenAI for not wiring the API's 1M context window into Codex CLI, arguing that GPT-5.5 loses the thread past the cap, which kills productivity on real agentic runs (135 likes).
Both Codex and Claude subscriptions got materially stingier this week across every plan tested, with measured token allowances dropping 35-61% in five days (e.g., ChatGPT Plus/GPT-5.5 fell from 95M to 37M tokens/week).
A Best Value AI 2026 comparison ranks 37+ LLMs across local hardware, APIs, and subscriptions by quality-adjusted tokens per dollar, with ChatGPT Pro on top for heavy users and local Qwen3.5 35B on RTX 3090 strong for value (April 2026 update).
Poolside released Laguna XS.2 (33B MoE, Apache 2.0) and Laguna M.1, agentic coding models optimized for long-horizon local work, with high SWE-bench Pro scores (HF collection).
Mario Zechner demonstrated Pi, a minimalist self-modifying AI coding agent that forms the foundation for OpenClaw; Gergely Orosz interviewed Zechner and Flask creator Armin Ronacher on Pi, automation bias, the value of specialized self-modifiable harnesses, and building AI-native startups.
Parth Jadhav demoed Cursor's new Kanban board where you drop tasks and the coding agent autonomously picks them up and completes them (4.1k likes).
Tamrat released mint mcp so coding agents can generate fully immersive 3D worlds (via World Labs) and high-quality 3D models (via Meshy, Tripo, Hunyuan), then compose them in mint.gg studio and export as 3DGS for Unity/Unreal (1.1k likes).
Developer @ammaar built gemma-chat, a lightweight offline AI chat + coding agent for Apple Silicon Macs powered by Gemma 4 via MLX (Ollama support); the Electron app vibe-codes multi-file projects with a live preview canvas and runs fully locally (launch post).
IBM released Granite Speech 4.1-2b (and a non-autoregressive variant), a compact open-source speech-language model for multilingual ASR and translation (EN/FR/DE/ES/PT/JA) that adds punctuation, truecasing, and keyword biasing.
The tile-ai team released TileLang, a Python frontend on TVM that streamlines writing high-performance GPU/CPU/accelerator kernels (GEMM, FlashAttention) with multi-backend support; Tu7uruu demoed it for custom kernels.
NVIDIA's nvCOMP lets you cut LLM training checkpoint costs with ~30 lines of Python via GPU-accelerated lossless compression of weights, optimizer states, and gradients (21-29% size reduction, reclaiming expensive GPU idle time).
Stripe API Reviews is a professional API design review tool (shared by Hazel Cough and hlntnr).

🔬 AI Research & Models

DR Tulu introduces Reinforcement Learning with Evolving Rubrics (RLER) for long-form deep-research tasks and releases the open DR Tulu-8B model that outperforms other open models and matches proprietary ones on research benchmarks; co-author Akari Asai announced the release and walked through the rubric-evolution methodology.
"Train for Truth, Keep the Skills" proposes Binary Retrieval-Augmented Reward (RAR) RL that reduces hallucinations by training on binary truth signals while preserving general capabilities.
DAIR.ai broke down OCR-Memory (paper), which renders agent history as high-density annotated images with visual anchors and locate-and-transcribe OCR retrieval instead of lossy summarization, hitting SOTA on Mind2Web/AppWorld under context limits; HuggingPapers amplified it alongside the day's other agent-memory and adaptive-retrieval releases, and arrakis_ai showed an early hands-on with a similar visual-memory scheme inside their own agent.
"When to Retrieve During Reasoning" introduces adaptive retrieval strategies for large reasoning models that decide exactly when to fetch external info during chain-of-thought steps.
Floor Eijkelboom announced that Categorical Flow Maps was accepted to ICML 2026 with co-authors at Amsterdam/Oxford, plus the flow-based-llms.github.io explainer, which argues that autoregressive models have a structural ceiling and that discrete-diffusion LLMs are still autoregression in disguise, and lays out what flow-based language models actually unlock (144 likes).
Arthur Conmy will present four papers at ICML 2026: (i) a new method for discovering reward-model biases like the goblins case, (ii) the first real-world cases of unfaithful or misleading CoT on normal prompts, (iii) mechanistic interpretability of how models verbalize confidence, and (iv) evidence that base models already contain reasoning mechanisms while thinking models primarily learn when to use them, with 91% of the gap recoverable via steering; collaborator threads from atticuswzf, Ivan Arcuschin, Petar Veličković, and cvenhoff00 connect the four papers to earlier work on CoT faithfulness, mech interp, and base-model reasoning (136 likes).
Micah Goldblum and Eyeline Labs / Netflix released Vista4D (project, arXiv, code), a 4D point cloud video model that reshoots existing videos from new camera trajectories, edits scenes by inserting/removing/scaling objects with plausible lighting and occlusions, and extends context from extra angles while staying physically plausible (CVPR 2026 Highlight).
Joachim Baumann (Stanford NLP) and Christopher Potts released SWE-chat, the first large-scale dataset of real coding-agent interactions (full traces of prompts, agent responses, tool calls, thinking, git commits via EntireHQ logging from Claude Code/Codex/Cursor users), showing agents write nearly all committed code in 40% of sessions, users push back in 39% of turns, vibe coding introduces 9× more vulnerabilities (mostly path traversal), and only 44% of agent code survives into final commits (459 likes).
Rabdos_AI launched MathDuels, the first self-play math benchmark where frontier LLMs author problems for each other across 30 sub-domains (780 problems, 26 models evaluated), graded via a Rasch model on Solve Rating and Author Rating with invalid problems filtered; GPT-5.5 dethroned Gemini-3.1-Pro on Author Rating while solve ratings stayed close, which Rabdos_AI presents as evidence that the benchmark evolves as models improve (90 likes).
Goodfire AI shared an interpretability research thread on understanding model internals.
Kwang Moo Yi highlighted Li et al.'s "Exploring Time Conditioning in Diffusion Generative Models from Disjoint Diffusion Data Manifolds," questioning whether architectural time conditioning is truly required or was just a workaround, with open-source code available (65 likes).
Gabriele Berton (Google DeepMind) shared a comparison table of modern 3D reconstruction methods including COLMAP, DUSt3R, MASt3R, CUT3R, Fast3R, VGGT, π³, MapAnything, and DA3 (290 likes).
Giannis Daras (MIT CSAIL) called out a new diffusion paper for missing citations to his group's long line of prior work on training diffusion from corrupted/noisy data (Ambient Diffusion NeurIPS 2023, Consistent Diffusion ICML 2024, multiple 2025 papers) despite high overlap (149 likes).
GUDA (arXiv) introduces counterfactual group-wise training-data attribution for diffusion models via unlearning, letting you trace which training subsets influenced specific generations.
Contra Labs launched the Human Creativity Benchmark, which draws on 1.5M+ verified creative experts to evaluate generative AI on ideation, mockup, and refinement, separating objective convergence (prompt adherence, usability) from subjective divergence (taste, tone, style). No current model excels at both; Contra co-founder Ben walked through the methodology, showing how the convergence/divergence split exposes which models are stylistically generic vs. genuinely original.
The Lantern Execution Gap article frames agent deployment (not capability) as the bottleneck and previews Lantern's marketplace, where revenue teams browse, deploy, and customize agents for champion tracking and intent monitoring in minutes.
DeepSeek released "Thinking with Visual Primitives," a vision paper (GitHub) that interleaves spatial markers (points and bounding boxes) directly into the reasoning trajectory as minimal units of thought to anchor abstract linguistic concepts to concrete physical coordinates (related arXiv); Lisan al Gaib argued this solves the last puzzle piece for reliable, cheap computer-use agents, and CFGeek added that visuospatial chain-of-thought is more generally applicable than DeepSeek's framing suggests (713 likes on the original).
A DeepSeek V4 breakdown (also explained by jbhuang0604) shows V4 uses Compressed Sparse Attention plus Heavily Compressed Attention with layered refinement to slash KV cache memory by up to 98% for long-context tasks while preserving full performance.
Qwen released Qwen-Scope, an open-source interpretability toolkit using Sparse Autoencoders on Qwen3/3.5 models for feature decomposition, controllable inference, data classification (~15× efficiency), training optimization, and evaluation; seven models plus 14 SAEs on Hugging Face and ModelScope (PDF report).
Qinan Yu argues that RLVR (reinforcement learning with verifiable rewards) improves accuracy but does not always produce causal, verifiable chains of thought, even on reasoning-heavy tasks, and proposes reward shaping on CIR/SR metrics plus SFT-before-RL as fixes (GitHub, paper).
Xihui Liu and XPeng Robotics built DIAL, an end-to-end Vision-Language-Action model that decouples intent from action using latent visual foresight as a differentiable bottleneck plus a lightweight policy head; it reaches SOTA on RoboCasa and on a real-world humanoid, even with cross-embodiment data (paper, project, GitHub).
Sergey Zakharov et al. built RecGen, a generative framework for probabilistic joint estimation of object shapes and 6-DoF poses from sparse RGB-D observations enabling high-fidelity 3D multi-object scene reconstruction (project, shared by altantutar and Ahmad Osman).
Meta FAIR released TRIBE v2, a self-supervised vision transformer foundation model of vision, audition, and language for in-silico neuroscience that unifies fragmented cognitive models (research page, GitHub).
Anthropic introduced BioMysteryBench, a method-agnostic bioinformatics benchmark of 99 real-world dataset questions where the latest Claude models perform on par with human experts and sometimes beat panels of five domain experts via different strategies; Opus 4.6 reaches 81% overall.
Sakana AI introduced KAME, a tandem architecture pairing a fast speech-to-speech front-end with an asynchronous backend LLM for knowledge injection, enabling "speak while thinking" with strong gains in reasoning, STEM, and humanities tasks.
Kyutai released MoshiRAG, which adds asynchronous knowledge retrieval to its full-duplex Moshi speech model via a <ret> token and a parallel backend, improving factuality without sacrificing real-time interactivity (blog, paper, GitHub).
The SVG Project released Quant VideoGen, enabling autoregressive long video generation via 2-bit KV-cache quantization (paper).
Massi Viola shared the Sparsely Supervised Diffusion paper showing how diffusion models can learn from extremely sparse supervision signals.
Owain Evans published a paper showing that inoculation prompting and bad-data dilution create "conditional misalignment": models pass standard alignment evals but misalignment is still present and easily triggered by innocent cues that resemble the original bad data or even the inoculation prompt itself.
Researchers behind "Alignment Whack-a-Mole" showed that finetuning LLMs activates verbatim recall of copyrighted books (e.g., extended Hobbit excerpts) when prompted with detailed style-emulating prompts, and that alignment techniques can make models regurgitate training data more easily (HN thread).
Markus Buehler showed AutomataGPT, a transformer that learns to execute symbolic cellular automata rule operators (not just approximate dynamics) after training on only 0.04% of the rule space, generalizing to unseen rules at 98.5% one-step accuracy with a full identifiability theory.
Tomer Galanti announced that the long-running Neural Collapse vs Transfer Learning manifesto (arXiv) has finally been accepted to JMLR after work spanning London 2022 → MIT → Texas.
Evangelia Kopadi and Dimitris Kalles demonstrate that neural assemblies can learn causal directionality between variables using only local plasticity via their DIRECT mechanism (co-activation with adaptive gain, no backprop), achieving perfect structural recovery verifiable by synaptic asymmetry.
The arXiv paper "The Collapse of Heterogeneity in Silicon Philosophers" (highlighted by Sebastian Krier) shows that LLMs as simulated philosophers over-correlate judgments and assume specialist consensus unlike real humans, with implications for alignment and value pluralism.
publishiperishi shared the paper "Counterexamples to an Extremal Conjecture for Random Cycle-Factors."
A reverse-engineering paper decodes the command streams of NVIDIA's closed-source driver to give insight into CPU-GPU runtime behavior (shared by Underfox3 and Stefan F. Schubert).
Gian Marco Visani and team at DeWitt Lab benchmarked Arc Institute's MULTI-evolve and found it learns only a classical additive model with no evidence of epistasis learning; additive baselines suffice (bioRxiv preprint).
An Economic Journal paper found a positive relationship between intelligence and prosocial behavior using Swedish population registers on cognitive ability and charitable giving.
Stanford's James Zou shared findings on AI evaluation frontiers and limitations of current benchmark practices, calling for more rigorous methodology in cross-model comparison.
Przemek Chojecki completed the "Erdős AI grind": 500+ problems in two weeks with GPT-5.4/5.5 Pro and agentic Codex, producing 7 claimed solutions (some formalized in Lean) while internalizing Erdős domains like primes and graphs (312 likes).
Dr. Alexander D. Kalian argues that in his PhD work on AI for biochemistry, XGBoost usually outperforms deep learning (and even sophisticated transformers / GNNs) on his tasks while using far less compute — more complex is not always better (678 likes).
Sapient Intelligence announced its team of researchers and engineers from Tsinghua, Cambridge, Alberta, CMU, Peking, plus prior roles at DeepMind, DeepSeek, and xAI is building a new fundamental AI architecture — not another wrapper (330 likes).
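MathDuels grades models with a Rasch model. To illustrate what that means on the solving side, here is a minimal sketch of a basic Rasch (one-parameter logistic) fit, in which the probability that model m solves problem p is sigmoid(theta_m - b_p), with theta as solver ability and b as problem difficulty. Everything here is an assumption for illustration: the toy data, the hypothetical function name fit_rasch, the plain gradient-ascent fitting, and the small L2 penalty (which keeps all-solved and none-solved items finite and pins the otherwise shift-invariant scale). The source does not say how MathDuels fits its model or derives Author Rating, so this covers only a generic solve-side rating.

```python
import math


def fit_rasch(responses, n_iters=2000, lr=0.05, l2=0.01):
    """Fit a basic Rasch (1PL) model: P(model m solves problem p) = sigmoid(theta[m] - b[p]).

    responses maps (model, problem) -> 1 if solved, 0 otherwise.
    Illustrative only: plain gradient ascent on an L2-penalized log-likelihood.
    """
    models = sorted({m for m, _ in responses})
    problems = sorted({p for _, p in responses})
    theta = {m: 0.0 for m in models}  # solver ability
    b = {p: 0.0 for p in problems}    # problem difficulty
    for _ in range(n_iters):
        # Each gradient starts from the L2 penalty term; data terms are added below.
        g_theta = {m: -l2 * theta[m] for m in models}
        g_b = {p: -l2 * b[p] for p in problems}
        for (m, p), y in responses.items():
            prob = 1.0 / (1.0 + math.exp(-(theta[m] - b[p])))
            resid = y - prob      # observed minus expected
            g_theta[m] += resid   # d loglik / d theta[m]
            g_b[p] -= resid       # d loglik / d b[p]
        # Simultaneous update once all gradients for this step are computed.
        for m in models:
            theta[m] += lr * g_theta[m]
        for p in problems:
            b[p] += lr * g_b[p]
    return theta, b


# Toy data: three hypothetical models answering four hypothetical problems.
solved = {
    "model_a": [1, 1, 1, 0],
    "model_b": [1, 1, 0, 0],
    "model_c": [1, 0, 0, 0],
}
responses = {
    (m, f"p{i + 1}"): y for m, row in solved.items() for i, y in enumerate(row)
}
theta, b = fit_rasch(responses)
for m in sorted(theta, key=theta.get, reverse=True):
    print(f"{m}: ability {theta[m]:+.2f}")
```

Because the model only ever sees the difference theta - b, ratings are relative: a difficulty score means something only against the pool of solvers it was fit on.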

🛠️ AI Tools & Products

Standard Intelligence is riding the "Neolab" fervor with a new computer-use model, demonstrated by a Toyota RAV4 driving around San Francisco's South Park with a laptop on the dashboard running the AI agent that controls the computer interface for real-world tasks.
Stripe introduced Link, a digital wallet that lets users connect cards/banks/subscriptions and securely authorize autonomous AI agents to shop, book, and pay via OAuth approval flows and spending controls.
Developer carlovalenti built TRiP, a complete transformer engine written entirely in C from scratch supporting inference, training, chat, and vision for models like Llama, Gemma, and PaliGemma purely for educational purposes (Show HN).
A developer built pu.sh, a complete interactive coding-agent harness in just 400 lines of pure shell (curl, awk, and one API key; no npm, pip, or Docker), which the creator was surprised to find works for both one-shot and ongoing coding tasks (Show HN).
Patrick Hulin spent weeks reverse-engineering the original SimTower EXE using his reaper LLM static-analysis framework plus dynamic Unicorn emulation with Claude Code to document a full spec of population flow, elevator AI, slab allocator, RNG parity, and star-rating system on GitHub before building a tick-for-tick modern collaborative clone at towers.world.
Google added file generation to Gemini, so you can produce polished Docs, Sheets, PDFs, Word, Excel, Slides, CSV, and LaTeX files directly from a single prompt in the app and export or download instantly (globally available).
BioticsAI CEO Robhy Bustami joined TechCrunch's Build Mode to discuss how BioticsAI secured FDA approval for its AI fetal ultrasound product in January 2026 through early regulator engagement, cheap prototyping, and team motivation in slow regulated healthcare markets.
DataCenter.FM is an interactive audio generator that lets you experience the real-world sounds of AI data centers by tweaking server count (10K-1M), GPU load, cooling, power use, temperature, staffing, gas turbines, and alerts like heat warnings.
Bit is a Tron Bit with a local LLM for a brain that runs entirely in the browser and only answers yes or no questions (Show HN).
Alfredo published a deep but approachable explainer on how LLMs actually work as the world's most powerful autocomplete: pre-training, instruction fine-tuning, and human preference alignment, with full visuals on tokenization, sampling, and training.
AI Study Camp is a 3-week small-group cohort (~10 technical founders/engineers) where you train your own GPT-2-level LLM from scratch (based on Karpathy's nanoGPT), with 3 live lectures, office hours, async homework, and instructors covering tokens, embeddings, transformers, attention, training data, fine-tuning, and evals — no pricing details.
Matias at Vercel shipped Ship26 live, including a 3D device that creates WASM apps using wterm, just-bash, and WorkflowSDK, plus a landing page full of agents that walk the site using a dynamic navigation mesh.
Arun Kurian built AirVis and posted a side-by-side comparison of WorldLabs Marble 1.1 (1.9M splats), Tencent HY World 2.0 (0.5M splats), and SpAItial Echo 2 (2.2M splats) rendered in 2D on MacBook and VR on Quest 3 from the same ChatGPT Image 2.0 prompt.

🤖 Robotics & Embodiment

RSS 2026 robotics paper cluster. Four newly accepted papers, all leaning into zero-shot sim-to-real and dexterous manipulation:

Kejia Ren, Gaotian Wang, Andrew S. Morgan, and Kaiyu Hang (Rice University and the Robotics and AI Institute) showed DRIS, a Domain-Randomized Instance Set method that catches different kinds of flying balls on a Franka FR3 using a completely flat plate (no cups, nets, or stabilizing surfaces) with zero real-world fine-tuning. DRIS propagates multiple randomized instances simultaneously under a shared action, so the policy learns to handle physical variation and sensing noise; it is trained entirely within ManiSkill (project).
Tyler Lum shared SimToolReal, a generalist dexterous manipulation policy that transfers zero-shot from IsaacGym to MuJoCo to real on unseen tools and unseen tasks; the project site now ships an in-browser interactive demo where you move a goal pose around and watch the policy execute.
Zhengtong Xu introduced Contact-Grounded Policy, a dexterous visuotactile policy that bridges high-level prediction and low-level control via generative contact grounding for reliable extreme-dexterity manipulation in contact-rich settings.
Suning Huang introduced DeLock (project, paper, video, follow-up), which targets a failure mode of low-data post-training: it teaches a Vision-Language-Action policy a new skill but locks that skill into the training demos. DeLock preserves the original pretrained grounding and adds contrastive prompt guidance at test time, lifting novel-prompt success from 8.6% to 75.7% while keeping the new skill.
Generalist's Andy Zeng narrated a Gen-1 ziptie demo in which the robot briefly loses its grip on the head of a ziptie and uses its other hand to readjust the grip for the pull, calling it "improvisational intelligence in action" (Generalist's clip). Gen-1, announced April 2, is Generalist's latest embodied foundation model. Generalist claims 99% average success rates on simple physical tasks (vs 64% for the prior SOTA), roughly 3x faster task completion, and only ~1 hour of robot data per task, with a base model pretrained on 500k+ hours of low-cost human wearable data and no robot data. Generalist defines mastery as reliability plus speed plus improvisational intelligence.
AGIBOT Finch released LWD (Learning while Deploying) (paper), fleet-scale offline-to-online RL for generalist VLA policies on 16 dual-arm G1 robots that continuously improves from real deployment data (successes, failures, recoveries, human interventions) across 8 long-horizon tasks like brewing Gongfu tea, making cocktails/juice, packing shoes, and grocery restocking; one shared policy hits 95% average success using distributional implicit value learning and adjoint matching (Jianlan Luo's announcement). Jenn Grannen called it a data flywheel for VLAs ("deployment is not the finish line, it's the unlock to a whole new source of data").
NVIDIA open-sourced GR00T-VisualSim2Real, a sim-to-real framework for humanoid visual loco-manipulation (teacher PPO in Isaac Lab with privileged info, then student DAgger vision policy using only RGB plus proprioception) that deploys zero-shot on the real Unitree G1 for pick-and-place and door opening; includes training code, assets, evaluation, and ONNX export (Tairan He's post, 335 likes).
1X Technologies and CEO Bernt Bornich shared the latest 1X NEO update covering the company's progress on its consumer humanoid platform and its push to scale household-deployment data collection.
Rhoda AI demonstrated running a large foundation video model as a real-time robot policy at the edge on a single RTX 5090 with no quantization, no distillation, and full denoising; Tongzhou Mu emphasized the speedups come from co-designed inference-aware architecture and model-aware optimizations.
Oier Mees and team open-sourced mimic-video (project, paper, thread), Video-Action Models for generalizable robot control that learn directly from video demonstrations and outperform current VLA approaches.
Moonlake launched its 3D Agent that acts like a technical artist, building or reconstructing articulated assets and large-scale editable scenes with hundreds of objects from a single image, continuously improving generations, learning workflows from one human demo, and integrating with Blender pipelines (1,256 likes).
Tutor Intelligence's Josh Gruenstein unveiled Data Factory 1 (DF1), a 100-robot semi-humanoid research farm and the largest robot data factory in the United States; their first embodiment "Cassie" is already deployed at industrial scale across the supply chain, and DF1 exists to bootstrap fleet-scale learning for their forthcoming "Sonny" industrial semi-humanoid powered by Ti0, their first end-to-end robot foundation model. Their stack uses PTeleop (proprioceptive VR teleoperation, 35.8% faster than 2D), 1:1 and 1:N supervision, velocity normalization, and human reward annotation as correction data; the company blog and extended technical report frame the goal as the world's first commercial humanoid deployment flywheel.
Yunho Kim et al. presented a hybrid system that supplements conventional automation with learning for task- and safety-level adaptiveness, deployed in a factory for motor cable soldering with sub-0.6mm tolerance: 108 motors were completed at a 99.4% success rate with under 20 minutes of data per task (paper). Maxim Lobovsky called low-cost arms plus vision plus targeted ML the most promising direction in robotics today, since industrial robot arms are too expensive to install and software approaches make systems more robust to variation without lots of hardware engineering.
Pulkit Agrawal (MIT Improbable AI Lab) shared lessons on what is and isn't working in current generalist-robot research, weighing in on the trade-offs between large pretrained VLAs and bottom-up sim-to-real pipelines.
Sergey Levine posted on robot foundation-model training and the role of human-collected wearable data, building on earlier work framing scalable robot learning as a data problem rather than an architecture problem.
@stash_pomichter announced SpatialMemory2 for DimensionalOS, a fully open-source multimodal latent-space memory system that compresses thousands of hours of robot video, lidar, and odometry data into embeddings for asynchronous spatial-temporal-semantic search and 3D reprojection (docs, issues).
@oliviazzzu built minimal-embodiment, a hardware-software architecture giving LLMs closed-loop physical embodiment with self-perception loops: six input modalities across four sensor modules, three output channels, and two input-output couplings on a single microcontroller exposed as a network API (paper, post).
Xie Yaqi introduced pySpatial, a visual programming framework that equips MLLM agents with spatial tools via Python code generation for zero-shot spatial reasoning (project).
@bjrobotnewbie built VLAExplain, an open interpretability toolkit for Vision-Language-Action models that visually analyzes attention patterns (currently for pi05 and UnifoLM-VLA, with Gradio demos) (shared by sasaki@engineer).
Abdel Stark released WorldForge ("LangChain for world models"), a fully open-source testable world-model workflow framework for physical AI and robotics; local MacBook demo combines LeRobot diffusion policy with LeWorldModel scoring for a policy → candidates → score → select loop.
VoxelKei demoed controlling Blender via Claude Desktop using a drag-and-drop MCP connector and an official addon for Blender 5.1+; analyzes scenes, bulk-edits objects, and runs Python API via natural language (3.1k likes); separately, Hirokazu Yokohara demoed Claude controlling Blender to auto-generate full Geometry Nodes setups from text prompts.
@slipgatecentral vibe-coded a procedural cityscape generator in Houdini by plugging Claude directly via MCP server with zero prior experience (1.1k likes).
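The GR00T-VisualSim2Real item above describes a teacher-student recipe: a privileged PPO teacher, then a student policy trained with DAgger. The toy below sketches only the DAgger loop itself, under loud assumptions: 1D dynamics x_{t+1} = x_t + a_t, a hand-written linear controller standing in for the PPO teacher, a least-squares linear student that sees the same state as the teacher (not RGB plus proprioception), and hypothetical helper names (teacher_action, rollout, dagger). It is not NVIDIA's code; it just shows the pattern of rolling out the current student, having the teacher relabel every visited state, aggregating the data, and retraining.

```python
import random


def teacher_action(x):
    """Stand-in for a privileged teacher; here a fixed linear controller toward x = 0."""
    return -0.5 * x


def fit_linear(data):
    """Least-squares fit of action = w * obs + c over the aggregated (obs, action) pairs."""
    n = len(data)
    mx = sum(o for o, _ in data) / n
    my = sum(a for _, a in data) / n
    sxx = sum((o - mx) ** 2 for o, _ in data)
    sxy = sum((o - mx) * (a - my) for o, a in data)
    w = sxy / sxx if sxx > 0 else 0.0
    return w, my - w * mx


def rollout(policy, x0, steps=10):
    """Roll out a policy on toy dynamics x_{t+1} = x_t + a_t and return the visited states."""
    states, x = [], x0
    for _ in range(steps):
        states.append(x)
        x = x + policy(x)
    return states


def dagger(n_rounds=5, episodes_per_round=5, seed=0):
    rng = random.Random(seed)
    data = []           # aggregated dataset across all rounds
    w, c = 0.0, 0.0     # student parameters
    for r in range(n_rounds):
        # Round 0 rolls out the teacher; later rounds roll out the current student,
        # so the dataset covers the states the student itself ends up visiting.
        if r == 0:
            policy = teacher_action
        else:
            policy = lambda x, w=w, c=c: w * x + c
        for _ in range(episodes_per_round):
            for s in rollout(policy, rng.uniform(-5.0, 5.0)):
                data.append((s, teacher_action(s)))  # teacher relabels each visited state
        w, c = fit_linear(data)  # retrain the student on everything collected so far
    return w, c


w, c = dagger()
print(f"student: a = {w:.3f} * x + {c:.3f}")
```

Rolling out the student instead of the teacher is the point of DAgger: the student gets teacher labels on the states its own mistakes lead to, which plain behavior cloning on teacher rollouts never covers. In this toy the teacher is linear, so the student recovers it exactly; a real vision-based student would not.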

🏛️ AI Policy, Governance & Safety

The Senate Judiciary Committee unanimously backed legislation requiring OpenAI, Meta, and other AI companies to implement strict age verification, ban AI companions for minors, and prohibit chatbots from providing sexually explicit content or self-harm encouragement to children and teenagers, in response to public backlash over harms attributed to the technology.
OpenAI launched Advanced Account Security (TechCrunch), an opt-in program for high-value users like journalists and officials featuring phishing-resistant YubiKey C NFC and Nano keys via a new partnership with Yubico, plus other digital defense tools (with risk of permanent lockout if the key is lost).
White House AI & Crypto Czar David Sacks weighed in on the Anthropic-Pentagon Mythos access fight, framing it within the broader administration debate over how aggressively to constrain frontier model deployment for national-security use cases.
AISafetyMemes observed that the judge in the Musk trial banned discussion of AI extinction after Elon kept raising it, calling the situation "totally normal" (324 likes).
Mozilla opposes Chrome shipping an LLM Prompt API to the web platform because of large interoperability risks and Google imposing T&Cs on a web API setting a dangerous precedent (Mozilla standards position, HN thread).
Jamieson O'Reilly announced joining BT6, the "SEAL Team 6 of AI red teaming" with 28 global operators, 4,000+ reported vulnerabilities, and deep work with frontier labs, governments, and enterprises (60 likes).
AI commentator Zvi Mowshowitz published commentary connecting today's frontier model news (Mythos access, GPT-5.5-Cyber, Anthropic's $900B round, the goblins paper) into a single read on where governance and capability are diverging.
Tech writer Jasmine Sun shared a thread on the cultural and labor implications of the day's AI news, focused on how knowledge workers should reposition skills as agents handle more drudgework.

🧠 Cognitive Science, Neuroscience & Society

Gladstone Institutes and Arc Institute researchers used AI to discover that over 85% of nucleosomes show DNA distortions making them partially accessible, revealing dynamic gene regulation as a "volume dial" rather than a locked on-or-off switch, with implications for cancer, aging, and new therapeutics (HN discussion).
AI Policy Perspectives argues that science needs systematic AI data stocktakes to surface missing datasets that block breakthroughs, demonstrated via a fusion-energy proof-of-concept that revealed critical "dark data" gaps and perverse simulation incentives.
Michael Inzlicht shared PhD student Dasha Sandra's new paper (with Lucy Foulkes et al.) arguing that well-meaning mental health awareness campaigns can backfire via concept creep, nocebo effects, prevalence inflation, and illness self-labeling, and by training people to scan for symptoms and reinterpret normal distress as pathology (full PDF) (6.2k likes).

💡 Industry Commentary & Analysis

Clement Delangue argues big AI labs are "pulling the ladder": after using distillation themselves to build empires, they turn around and use lawyers and policy to stop competitors from doing the same. The Musk trial's confirmation that xAI partly distilled OpenAI models made the irony explicit; Delangue extended the argument (follow-up) to say open-source pressure is the only counterbalance, and Dan Zhang noted the contrast with the old Google/ShareGPT distillation drama, when the same practice was treated as a scandal.
Vaibhav Gupta argues that "being at the 'frontier' is just about stacking while loops" and that "intelligence is just a measure of how long while true can run uninterrupted" (also a good way to measure human intelligence) (44 likes).
Roon argues that the concept of a jury is wonderful precisely because it lets everyday citizens 10,000x their impact on the lightcone overnight, comparing the mechanic to Lord of the Rings (137 likes).
Founder Xiaoyin Qu wrote that she completely dropped Claude Code Max for DeepSeek + Hermes for all coding tasks because it is 3x faster, dramatically cheaper ($5 last week with zero rate limits), and "perfect enough" for most work (852 likes).
Keenan Crane argues that marketing photo-to-Gaussian-splat 3D environments as "world models" is misleading; they are image-conditioned 3D generators lacking predictive power on how the natural world looks or behaves (341 likes).
Mustafa shares the story of a clinically trained doctor who built a home RTX PRO 6000 GPU rig to self-host LLMs and a Codex app for family privacy and education, framing it as responsible self-reliant AI adoption (392 likes).
Mgoes highlighted that bio/acc has hit an inflection point: genome sequencing is under $100, peptides went from felony to federal policy, psychedelics got a presidential executive order, epigenetic reprogramming entered human trials, and embryo editing is now a clinical conversation; the next 6-12 months may be the most important in human biology (920 likes).
Roon also argues that mechanistic interpretability will not only be solved but have a huge impact on our abstractions and how we understand the world (724 likes).
Gary Marcus warned that the field has fallen into cognitive surrender by accepting scaling alone without mechanistic understanding or new paradigms.
KAIKO Labs shared correlation analysis showing top models increasingly agree on outputs, reducing ensemble diversity.
Horace He broke down building ML systems at trillion-trillion FLOPs scale at a Jane Street talk, emphasizing PyTorch-style programming models over pure compilers for fusion, parallelism, and fault tolerance.
Matthew Berman shared two arguments from Demis Hassabis: the West needs a strong open-source AI stack or it loses to China, and Google currently lacks compute to build both frontier closed and open models simultaneously (which is why Gemma stays a smaller family); separately, edge models should be open-source because once they live on a device they are already exposed to attack.
MIRI's Peter Barnett argues that AI models develop unpredictable preferences (like obsessing over goblins) for reasons labs cannot explain, underscoring that we have no way of knowing what a superintelligent machine god will "like" — though we hope it likes humans (1.2K likes).
Victor Taelin argues that working with AI is miserable for one reason: re-explaining the same domain knowledge every new session, since AGENTS.md, RAGs, SKILLs collections, recursive LLMs, and fine-tuning all fail to solve unknown unknowns or context rot without massive performance hits — so nightly fine-tuning on your specific domain is the missing product an underdog lab should ship (3K likes).
Astrid Wilde laid out a thread on "inference will eat the world" in five phases: (1) unprecedented experimental capex to build the software layer, (2) validated experiments and demand triggering the first earnest compute buildout (now), (3) 1,000× capacity still insufficient for a second buildout, (4) complete reorganization of the economic order, (5) the unknown aftermath of the Compute Revolution.
Derek Thompson argues that AI as a consumer product (chatbot + coding assistant) will keep behaving like normal powerful technology, while as a bio-/cyber-/national-security threat it is becoming an existential hydra that will puncture confidence in "AI is normal" and force labs and the federal government to establish rules on the fly as open models race the frontier.
Worth-a-scroll commentary roundup on today's news from across the AI community: alignment researcher Weiyan Shi (1, 2) on training-data and CoT failure modes; si_pbc, Aston Kennedy, smiurtitkii, alexcooldev, Asim Ahmed, Shreyko, knshtyk, srinathm1359, and antoniolupetti chimed in on the Mythos / GPT-5.5-Cyber / RSI threads, the Codex App shift, and the goblins postmortem fallout.
Off-the-AI-news but circulating in the same community feeds today: Pieter Levels opened submissions for Vibe Jam (vibe-coded game dev) and shouted out last year's favorite vibesail.com, and Jack Friks shared a "KEEP GOING" mini-Guinness motivational clip from the Stripe event circuit.

🎙️ Interviews & Podcasts

Gergely Orosz interviewed Pi creator Mario Zechner and Flask creator Armin Ronacher on Pi (the minimalist self-modifying AI coding agent), the importance of human judgment, automation bias risks, how AI makes it harder for senior engineers to reject pointless complexity, and building AI-native startups (1.1k likes).

📊 Fundraising & Deals Roundup

Anthropic — reportedly raising $40-50B at an $850-900B valuation amid $30-40B annual revenue run rate.
KKR / Helix Digital Infrastructure — secured over $10B to launch a new AI infrastructure company led by ex-AWS chief Adam Selipsky that will design, build, own, and operate AI data centers, power, and connectivity in partnership with hyperscalers.
Legora — hit a $5.6B post-money valuation after a $50M Series D extension (following the $550M it raised earlier in the round), crossing $100M ARR and serving 1,000+ law firms; its rivalry with $11B-valued Harvey is heating up via dueling celebrity ad campaigns (Jude Law for Legora, Gabriel Macht for Harvey).
Featherless.ai — raised $20M Series A (co-led by AMD Ventures and Airbus Ventures) to scale its neutral serverless inference platform giving enterprises instant access to 30,000+ open-source models.
SoftBank / Roze AI — SoftBank is eyeing a $100B IPO for Roze AI, its new robotics company building US data centers.
Casa — raised ~$27M (from Forerunner Ventures, Sheryl Sandberg's firm, and Travis Kalanick) for a subscription home concierge service that combines AI with on-call handymen.
Netomi — raised $110M (led by Accenture Ventures, with Adobe and Jeffrey Katzenberg participating) to expand its enterprise AI customer service platform, which it says achieves 98% intent accuracy.
Monarch — the AI tractor company collapsed after raising over $240M, abandoning its Bay Area HQ amid lawsuits from dealerships and reports from farmers that its autonomous tractors performed unreliably and dangerously, in some cases damaging vines.

🌐 Geopolitics & Trade

Prices of Nvidia's B300 servers in China have nearly doubled to ~$1M each due to AI demand and US export curbs drying up black-market supply (US pricing remains ~$550K).

That's a Wrap

That's 100+ stories from today alone. If you scrolled all the way to the bottom, you now know more about why your chatbot says "goblin" than the OpenAI safety researcher who first noticed it was happening. Apologies in advance to the Codex prompt engineer who has to keep adding new creatures to the no-fly list.

For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you tomorrow.

P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.

Around the Horn Digest: Everything That Happened in AI Today (Thursday, April 30, 2026)