Everything That Happened in AI This Weekend April 17-19 2026

Around the Horn Digest: Everything That Happened in AI This Weekend (Friday-Sunday, April 17-19, 2026)

Anthropic shipped Claude Design (the Figma competitor everyone saw coming), three senior OpenAI execs announced they're leaving pre-IPO, Claude Opus 4.7 wrote a working Chrome exploit for $2,283, and a fake Claude site started installing malware while everyone was watching the product launches.

Written By
Grant Harvey
Apr 19, 2026

Anthropic shipped Claude Design (the Figma competitor everyone saw coming), OpenAI's Bill Peebles, Kevin Weil, and Srinivas Narayanan all announced they're leaving pre-IPO, Claude Opus 4.7 wrote a working Chrome exploit for $2,283, a fake Claude site started installing PlugX malware on people's computers, and Simon Willison diffed the Claude 4.6 vs 4.7 system prompts for everyone.

Welcome to the Around the Horn Digest, your weekend recap of everything worth knowing from the AI week that was. If the week felt louder than usual, that's because it was: Anthropic shipped Opus 4.7 AND a full design tool, OpenAI overhauled Codex into a desktop agent AND launched GPT-Rosalind for drug discovery, Google shipped three separate products, Perplexity shipped a Mac agent that controls your files, and Stanford dropped the numbers proving the public still isn't sold on any of it. Meanwhile GPU rental prices surged 48% in 60 days, so good luck out there.

Let's get into it.

Around the Horn — Sunday, April 19, 2026

The story of the week: Anthropic and OpenAI both shipped flagship products that eat pieces of the SaaS stack, and then they kept shipping. In seven days, Anthropic put Claude Opus 4.7 on every major cloud and launched Claude Design from Anthropic Labs (a direct shot at Figma, Canva, and Gamma), while its CPO Mike Krieger resigned from Figma's board the same day the design-tool news broke. OpenAI rebuilt Codex into a full desktop agent with Mac computer use, in-app browsing, and 90+ plugins, launched GPT-Rosalind for life sciences research, shipped Product Discovery in ChatGPT via the Agentic Commerce Protocol, and then saw three senior execs announce they're leaving before the IPO.

The frame everyone is converging on: Box CEO Aaron Levie argued Codex is a genuine step change for knowledge work; Sequoia partner Julien Bek published "Services: The New Software" arguing the next $1T company sells outcomes, not software; TechCrunch called Anthropic's CPO board exit "another data point for investors who fear the SaaSpocalypse." The thesis is that the labs are absorbing the productivity software above them and the infrastructure below. This week we watched it happen in real time.

The infrastructure below is the part that doesn't look so good. Tomasz Tunguz reported Nvidia Blackwell rental prices jumped 48% in 60 days, from $2.75 to $4.08 per hour. Anthropic quietly moved enterprise customers to usage-based billing. Ethan Mollick flagged that the compute-bubble thesis was wrong because demand keeps outrunning supply. And Claude Opus 4.7 wrote a working Chrome exploit for $2,283, per The Register, which is either terrifying or validating depending on whether you're on Anthropic's cybersecurity team or part of the group that said "AI can't really do this yet."

In the background, the public perception beat keeps getting worse. Stanford's 2026 AI Index Report put hard numbers on a gap everyone in AI has been ignoring: 56% of AI experts say they're more excited than concerned; only 10% of the general public agrees. Sam Altman's San Francisco home was firebombed earlier this month, the suspect's manifesto listed other AI executives as targets, and federal prosecutors are weighing domestic-terrorism charges. The "AI is hurting people and someone has to stop it" argument just reached a new level.

So: the labs shipped faster than anyone underneath them could adapt, compute ran hot, governance moved into federal-prosecutor territory in San Francisco, and a 20-year-old took a kerosene jug to a CEO's gate. This is the week we're recapping. Welcome to the weekend.

🏆 TOP 10 STORIES OF THE WEEK

The 10 stories that defined the week, ordered by week-long impact rather than news magnitude alone.

  • Claude Opus 4.7 launched across Claude.ai, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at the same $5/$25 per million token pricing as 4.6. Vision reasoning jumped from 69.1% to 82.1% on CharXiv, SWE-bench Pro (a real-world coding benchmark) went from 53.4% to 64.3%, and it hit #1 on Vals AI's Vibe Code Benchmark at 71%. The catches: a new tokenizer (the system that breaks text into chunks the model can process) uses up to 35% more tokens per request, which Claude Code Camp measured at 1.47x in practice; long-context benchmarks regressed vs 4.6; and Anthropic quietly raised rate limits to partially offset the tokenizer tax. Simon Willison published the full 4.6→4.7 system prompt diff for anyone who wants to see what changed under the hood.
  • OpenAI overhauled Codex into a full desktop agent with background Mac computer use (agents click and type alongside you), an in-app browser you can comment on to instruct the agent, native gpt-image-1.5 image generation, persistent memory, automations that wake up across days or weeks, and 90+ plugins spanning Atlassian Rovo, CircleCI, Microsoft Suite, Box, and GitLab Issues. OpenAI cited 3M+ weekly developers already using Codex. TechCrunch called it "OpenAI takes aim at Anthropic."
  • OpenAI launched GPT-Rosalind, its first life-sciences-specialized model optimized for biochemistry, genomics, and drug-discovery reasoning. Available in research preview to Moderna, Amgen, the Allen Institute, and Thermo Fisher via a trusted-access program. The OpenAI Podcast episode with research lead Joy Jiao and product lead Yunyun Wang has the deeper science side for anyone who wants it.
  • Anthropic launched Claude Design from Anthropic Labs this weekend: a new product for collaborating with Claude on polished visual work like designs, prototypes, slides, and one-pagers. The Information leaked it mid-week, Adobe, Wix, and Figma all traded down on the news, and Anthropic CPO Mike Krieger resigned from Figma's board the same day. Sam Henri Gold's early review is the most thoughtful walkthrough so far. This is the SaaSpocalypse moment everyone was flagging on Thursday, shipped by Sunday.
  • Stanford's 2026 AI Index landed with hard numbers on the gap between AI insiders and everyone else: 56% of AI experts say they're more excited than concerned about AI; only 10% of the general public agrees. On jobs, the split is 73% vs 23%. Grok 4's training run alone produced an estimated 72,816 tons of CO2 (about 17,000 cars driving for a year). China's top model now trails Anthropic's by just 2.7%. IEEE Spectrum's 12-graph explainer is the easiest way to scan it.
  • Sam Altman's San Francisco home was firebombed and the case has now escalated to federal domestic-terrorism territory. A 20-year-old Texan named Daniel Moreno-Gama allegedly threw a Molotov cocktail at Altman's Lombard Street gate on April 10, walked three miles to OpenAI's Mission Bay HQ, and told security he came "to burn it down and kill anyone inside." The manifesto recovered at the scene listed other AI executives as targets. Moreno-Gama was arraigned April 14 and held without bail. US Attorney Craig Missakian said prosecutors will pursue domestic terrorism charges if evidence supports it. The trigger everyone is pointing at: The New Yorker's April 7 Altman profile, published just days before the attack. Altman's late-Friday response post admitted he had "underestimated the power of words and narratives."
  • OpenAI had a "Liberation Day." Bill Peebles (Head of Sora), Kevin Weil (VP of OpenAI for Science), and Srinivas Narayanan (CTO of B2B Applications) all announced they're leaving pre-IPO this weekend. As Dare Obasanjo put it, "it's notable that they aren't waiting for the IPO."
  • GPU rental prices surged 48% in 60 days, from $2.75 to $4.08 per hour, per Tomasz Tunguz. Toby Ord's companion piece argues AI agent costs are rising exponentially alongside capability. Anthropic quietly moved enterprise customers to usage-based billing per The Information. Ethan Mollick flagged the compute-bubble thesis was wrong because demand keeps outrunning supply. Translation: the unit economics for anyone building on frontier models just moved against them.
  • The Mythos cybersecurity saga compounded. The UK AISI's evaluation confirmed Mythos is the first model to complete their full 32-step corporate network cyber range end-to-end (a task that takes a human expert ~20 hours). The Register reported Claude Opus 4.7 wrote a working Chrome exploit for $2,283. Vidoc Security Lab reproduced Anthropic's Mythos findings using GPT-5.4 and Claude Opus 4.6, arguing the building blocks are already accessible outside Glasswing. Europe got cut out of research preview access entirely. And the US Treasury is now seeking Mythos access to hunt for vulnerabilities in financial infrastructure.
  • Anthropic's Automated Alignment Researchers outperformed Anthropic's own human alignment researchers on the weak-to-strong supervision problem (training a strong model using only a weaker model's labels, a setup designed to mirror future humans-supervising-smarter-AI scenarios). Two human researchers spent 7 days and closed 23% of the performance gap. Nine parallel Claude Opus 4.6 agents ran for 5 more days (800 agent-hours total) and closed 97% of the gap at $22 per agent-hour, ~$18,000 total. Andrew Curran called it "a preview of RSI" (recursive self-improvement). The catch: this works only on outcome-gradable problems where progress can be procedurally verified. But the cost number is the one to internalize.
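
If you want to sanity-check the cost numbers in the list above, here's a rough Python sketch. The prices, the 1.35x and 1.47x tokenizer multipliers, the GPU rates, and the agent-hour figures come from the stories above. The 20,000-in/2,000-out request is a made-up example, the function names are ours, and applying one multiplier to both input and output tokens is a simplifying assumption, not something Anthropic has published.

```python
# Back-of-envelope checks on figures quoted in this digest.
# Assumptions (ours, for illustration only): the example request size, and
# that the tokenizer multiplier applies equally to input and output tokens.

INPUT_USD_PER_M = 5.00    # Opus 4.6/4.7 input price per million tokens
OUTPUT_USD_PER_M = 25.00  # Opus 4.6/4.7 output price per million tokens


def request_cost(input_tokens, output_tokens, token_multiplier=1.0):
    """Cost in USD of one request after scaling token counts by a multiplier."""
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    return (scaled_in * INPUT_USD_PER_M + scaled_out * OUTPUT_USD_PER_M) / 1_000_000


def pct_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100


if __name__ == "__main__":
    base = request_cost(20_000, 2_000)            # hypothetical request, old tokenizer
    worst = request_cost(20_000, 2_000, 1.35)     # "up to 35% more tokens"
    measured = request_cost(20_000, 2_000, 1.47)  # Claude Code Camp's 1.47x
    print(f"Same request, old tokenizer: ${base:.4f}")
    print(f"Same request at 1.35x:       ${worst:.4f}")
    print(f"Same request at 1.47x:       ${measured:.4f}")
    print(f"Blackwell rental change:     {pct_change(2.75, 4.08):.1f}%")
    print(f"Alignment agents:            ${800 * 22:,} for 800 agent-hours at $22")
```

Under these assumptions the multiplier passes straight through to the bill, so a 1.47x token count behaves like a 47% price increase at unchanged list prices. And 800 agent-hours at $22 works out to $17,600, which is where the ~$18,000 figure comes from.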

🍪 TOP 10 TOOLS OF THE WEEK

A note before you scroll: we covered Claude Opus 4.7, the new Codex, Claude Design, and GPT-Rosalind as Top 10 Stories above. If you want to actually try them, scroll up; we didn't double-dip here to keep this list diverse. Everything below is separate ground.

  • Qwen3.6-35B-A3B is Alibaba's new open-source sparse mixture-of-experts model (only 3B of its 35B parameters fire per query) that matches Claude Sonnet 4.5 on many vision tasks and rivals models 10x its active size on agentic coding. Try it on Qwen Studio or grab the weights on Hugging Face — free, open weights for commercial use.
  • Google Skills in Chrome saves any Gemini prompt (or pulls from 50+ premade ones) as a one-click reusable workflow that runs on the current tab or across multiple tabs via the / or + sidebar shortcut. Vegan recipe substitutions, side-by-side shopping comparisons, contract scans — free, rolling out to English-US desktop.
  • Gemini for macOS is Google's brand-new fully-native Swift app for Mac that lets you share your screen or local files with Gemini in real time, plus built-in context for summarizing docs, fixing code, and deep work — free with a Google account.
  • Gemini 3.1 Flash TTS is Google's new text-to-speech model with scene direction, speaker-level specificity, audio tags, more natural/expressive voices, and 70-language support — free in AI Studio audio playground, paid via Gemini API.
  • Perplexity Personal Computer for Mac is Perplexity's new native Mac app that opens with a keyboard shortcut and controls your local files, native apps (iMessage, Mail, Calendar), and Comet browser from one always-on AI, with no uploads and no tab-switching — Max subscribers first, waitlist open.
  • Claude Code Routines lets you define a prompt + repositories + connectors that Claude Code runs automatically on a schedule, via API call, or on GitHub events (PR merge → auto-update docs), all from Anthropic-managed cloud infrastructure even when your laptop is off, with a per-routine API endpoint for Zapier/Slack alerts — free with every Claude Code plan.
  • Canva AI 2.0 lets you describe a project in plain English ("colorful 12-page planning deck for a trip to Morocco") and get an editable, iterable design back, now with persistent memory, background scheduling, third-party connectors, web research, and an orchestration layer coordinating Canva's full suite from one prompt — free tier available, Pro from $15/mo.
  • Tencent HY-World 2.0 and NVIDIA Lyra 2.0 both landed on Hugging Face this week as open-weight 3D world models. HY-World 2.0 converts text, images, or videos into editable 3D worlds importable directly into Unity, Unreal, or Blender (full commercial license). Lyra 2.0 turns a single image into a persistent, explorable 3D Gaussian scene (research-use license only, no commercial output). Both free to download.
  • WTF Are Agents Buying?! is a live feed showing what AI agents actually purchase online as it happens, with running commentary on agent behavior. Pure curiosity value with a serious throughline about what "agentic commerce" looks like in the wild — free to watch.
  • Omi for Desktop is an open-source local "life architect" that watches your screen, listens to your conversations, stores everything locally, and runs Claude Code locally to suggest what you should do next. Syncs with the Omi necklace hardware (cloud sync optional); 8K+ stars on GitHub. Marketed as Rewind + Granola + Wisprflow + ChatGPT + Claude in one local app — free, open source.

🏢 Big Tech & Major Companies

  • Claude Design launched Friday from Anthropic Labs as a research preview for Claude Pro, Max, Team, and Enterprise subscribers, explicitly powered by Opus 4.7's vision capabilities.
    • The pitch: collaborate with Claude to produce polished visual work (designs, prototypes, slides, one-pagers, marketing collateral, pitch decks) via natural language, with refinement through inline comments, direct edits, and custom sliders Claude generates on the fly.
    • Three features stand out.
      • During onboarding, Claude reads your codebase and design files to build a persistent design system your team can reuse across every project.
      • A web capture tool grabs elements directly from live sites so prototypes look like the real product.
      • And a handoff bundle packages the finished design with a single instruction to pass to Claude Code for production implementation. Exports include Canva, PDF, PPTX, standalone HTML, or an internal org-scoped URL. Enterprise admins get it disabled by default. Try it at claude.ai/design.
    • Partner endorsements published with the launch:
      • Canva CEO Melanie Perkins framed the integration as "making it seamless for people to bring ideas and drafts from Claude Design into Canva."
      • Brilliant's Senior Product Designer Olivia Xu said intricate interactive prototypes that took 20+ prompts in other tools took 2 prompts in Claude Design.
      • Datadog PM Aneesh Kethini said the team now goes "from rough idea to working prototype before anyone leaves the room." SaaSpocalypse fears from Thursday were not subtle.
    • The developer showcase wave started fast.
      • Ran Segall built a full homeschooling app from scratch in Claude Design and called the result roughly 10x better than the same prompt in Lovable, Replit, Base44, or Google AI Studio.
      • Jerrod Lew assembled a complete personal Jerrod OS dashboard in Claude Design plus Opus 4.7 with glassmorphic widgets for calendar, critical emails, health stats, music, and a world map (the "personal AI as your operating system" vibe this launch keeps triggering).
      • Kate Deyneka piped a wholesome gift workflow through Telegram, OpenClaw, Gemini, and Nano Banana into Claude Design to generate a friend's personalized photo map.
      • Bin Liu solved motion design in two prompts pairing Claude Design with his open-source HyperFrames tool and local kokoro TTS.
      • And Ryan Mather (Anthropic design, aka Flomerboy) posted a 7-tip thread that is likely the highest-signal user guide right now: set up the design system and core screens first, iterate live with engineers, use the Comment tool for surgical edits, ask for video demos, plug in connectors (docs/Slack), create on-the-fly tools, and know when to slow down for hand-crafted details.
      • Carlos E. Perez had the sharpest strategic take: when you start Claude Design with a well-defined domain model, it instantly builds complex interfaces that front rich ontologies, giving an unfair speed advantage against tools like Palantir. That's the quietly spicy framing; Claude Design isn't just a Figma competitor, it's a Palantir competitor for anyone who can model their domain.
    • The first-wave community reaction on r/ClaudeAI was more tempered.
      • A user's "Claude Design is Incredible" post collected 258 upvotes and a consensus-moderator summary calling the output a "resounding meh."
      • The complaint: every generated app shares the same aesthetic, right down to the specific serif font, the obligatory blinking status dot, colored accent bars, and what one commenter called "container soup" of pills and cards.
      • One developer noted Claude Design appears to use the existing Claude frontend-design skill with 3-4 default presets; another observed that unless you upload reference screenshots or a design system, "it screams I just used one Claude prompt."
      • The OP who posted the showcase partly agreed: "If your prompt is loose (as mine was), and you do one iteration (as I did), it WILL implement the design it has in its system prompt."
      • Separately, at least one user flagged that two or three full prompts can exhaust weekly Pro limits, which matches the broader compute-rationing story this week.
      • A community-built UI/UX skill on GitHub exists specifically to override the default Claude-core look. Verdict so far: great for speed, less great for differentiation. Sam Henri Gold's earlier writeup is the most thoughtful long-form take.
    • Oh, and worth noting: someone flagged that Claude Design can't stop designing giant turds. Reddit's take: Pointillism, meet Tourdilism.
  • Anthropic quietly expanded its compute partnership with Google and Broadcom on the same day as the Claude Design launch, committing to multiple gigawatts of next-generation compute. Given Tomasz Tunguz's 48% Blackwell price spike this week, every gigawatt Anthropic can lock in now is a gigawatt it doesn't have to bid for on spot later.
  • Google is in talks with Marvell to build two new AI inference chips, per The Information: a memory processing unit designed to work alongside Google's TPU, and a new TPU variant. Paywalled, but the headline speaks for itself; the AI-chip duopoly is about to get less duopoly-shaped.
  • OpenAI shipped richer visual product discovery in ChatGPT powered by an expanded Agentic Commerce Protocol, letting users browse and compare products side-by-side with images from Target, Sephora, Shopify merchants, and others, with conversational refinement. Every browse-and-buy interaction now has a ChatGPT competitor.
  • Google Research launched Vantage on Google Labs, an AI-simulated multi-avatar conversation sandbox where an Executive LLM steers dynamic challenges (conflict, pushback) and an AI Evaluator scores future-ready skills like conflict resolution and project management with human-expert-level agreement. Validated in NYU (188 testers) and OpenMic studies, aimed at making durable competencies (critical thinking, collaboration, creativity) measurable for students and educators. "Soft skills simulator for teenagers" is a strange product shape from Google Research, but here we are.

💼 AI Productivity, Labor & Economics

  • Kasparov vs. Amodei on whether AI will destroy labor markets. Garry Kasparov sided with Yann LeCun, arguing that every previous technological revolution improved productivity and expanded the economy while letting workers (including white-collar professionals) adapt and use the new tools rather than becoming obsolete buggy drivers. The counter-anchor: TFTC circulated Dario Amodei's prediction that 50% of all tech jobs, entry-level lawyers, consultants, and finance professionals will be completely wiped out within 1-5 years (6K+ likes). Robert Scoble argued for accelerating through the pain rather than slowing down, on the same historical-pattern logic as Kasparov. You will see every permutation of these three positions weaponized in discourse over the coming weeks; pick a foxhole.
  • Stanford's 2026 AI Index is out and IEEE Spectrum distilled it into 12 graphs. Headline numbers: U.S. firms released 50 notable models in 2025; AI compute has grown 3.3x yearly since 2022 (Nvidia accounts for more than 60% of that); training emissions hit 72,000+ tons of CO2 equivalent for Grok 4; LLMs now hit 38-50% on Humanity's Last Exam; AI investment peaked at $581B; public optimism edged up to 59%. The elite-vs-public split (model labs race ahead, public trust lags) is widening faster than either side acknowledges. HN discussion predictably turned into a referendum on whether the emissions number is scandalous or a rounding error.
  • Aaron Levie argued that the pace of model progress forces agent teams to throw away large chunks of their architecture every few quarters; systems built to compensate for yesterday's context-window or capability limits become obsolete overnight. Sam Hogan made the sharper version: most of the tooling built for LLMs in the last year (RAG, GraphRAG, multi-agent orchestration, ReAct frameworks, prompt management, LLMOps, eval tools, gateways, finetuning libs) has been largely obsoleted in the last three months by raw model progress. The lesson for founders: do not build atop this year's weaknesses.
  • signüll argued that in consumer AI, TAM equals everyone: the real leverage is the ~100 people alive who combine product instinct, software taste, design chops, technical depth, a real model of how AI works, single-user psychology, cultural intuition, team-building, motivation, and narrative gift. The takeaway: the consumer-AI war will be decided by talent allocation, not capital. Related: Oliver Habryka's schizo LessWrong essay arguing there are only four real skills (design, technical, management, physical), and that mastering any task in a category gets you to expert level in any other task in that category within about six months because of skill transfer.
  • Séb Krier traced the evolution of Very Online AI discourse: 2015-2020 (theoretical safety, EA/LW monoculture, AlphaGo, mesa-optimizers, foom); 2020-2023 (governance, scaling laws, shoggoths, e/acc, stochastic parrots, pause letter); 2023-2026 (situational awareness, national frameworks, 2027, Pliny, Golden Gate, reasoning models, scheming, sb1047, agents, slop, Mythos). If you want a map of how we got here, this is the compressed version.
  • Matt Webb's "Headless everything for personal AI" argues that the next wave of consumer AI products will treat the LLM as a headless backend you wire into dozens of bespoke tiny apps (a specialized dashboard for calendar triage, another for music discovery, another for fitness), rather than one megalithic chat interface. The Jerrod OS dashboard above is the first wild-type example of the pattern.
  • The NYT profiled METR and its now-famous "length of tasks AI can complete" chart, which has become the industry's default measuring stick for the 2026 AI boom. Toby Ord's companion piece on rising agent costs (already linked in Top 10) argues the METR chart is only half the story; the other half is that the $/hour cost of those agents is climbing right alongside the task-length.
  • TechCrunch reports the App Store is booming again with a swell of new app launches in 2026 that Appfigures data suggests is fueled by AI coding tools lowering the barrier to solo developer apps. The return of indie mobile.
  • The Information profiled Dylan Patel and SemiAnalysis as the Silicon Valley outlet that "scrutinizes Nvidia like the tabloid press once covered Princess Diana." If you want to understand how chip-supply intelligence got monetized as a newsletter business, this is the profile.
  • Robert Scoble broke down X's new API pricing where owned reads (bookmarks, likes, followers, lists, tweets) cost $0.001 per request, write costs rose, and lists became the secret primitive for building personalized real-time apps on top of X data. Elon Musk confirmed the new API is accessible via OpenClaw at deliberately affordable rates. Net effect: the X firehose is back on the menu for indie builders and agent workflows that want curated social-data inputs.
  • "Ask HN: How did you land your first projects as a solo engineer/consultant?" hit 79 points and 39 comments with surprisingly consistent advice for the 2026 environment: be helpful in Slack and Facebook communities (where referrals happen), become a narrow expert with a visible portfolio/GitHub/LinkedIn, lean on warm intros from friends rather than cold outreach, and avoid competing as a generalist against overseas labor at the bottom of the market. The subtext for the AI era: the only defensible positioning for human consultants is narrow domain specificity plus public-facing reputation. Generic "I can build anything" is no longer a sustainable opening.

🤖 AI Agents & Infrastructure

  • HermesOS, a managed agent-hosting platform built on Nous Research's Hermes Agent, published its public roadmap. The platform deploys persistent autonomous agents in under five minutes (persistent memory, browser automation, scheduling, backups, native Telegram/Discord/Slack/WhatsApp connectors), with upcoming Operator packs for Research, Trading Research, and Growth, plus a Hive Mind shared-intelligence layer and $HermesOS token-gated compute tiers. Watch this category; managed agent hosting is going to be 2026's Heroku moment.
  • AgentFM is a single Go binary that turns idle CPUs and GPUs into a peer-to-peer AI supercomputer via libp2p mesh with zero-config networking, hardware-aware routing, live artifact streaming, and no cloud egress. SETI@Home for the agent era. HN discussion notes the architectural contrast with SETI's central dispatcher (AgentFM is fully peer-to-peer, anyone on the mesh can dispatch).
  • rtrvr.ai's AI Subroutines is a Chrome extension that records your browser tasks and turns the top 3-5 stable API/DOM calls into zero-token deterministic JavaScript tools that replay inside the page's own execution context, preserving live cookies, TLS, CSRF, and auth. Agents can scale bulk operations (LinkedIn outreach, CRM updates) without LLM calls or bot-detection risk. One of the first tools to explicitly separate "the LLM figured it out once" from "now run it a million times deterministically."
  • 0xSero demoed Codex doing computer-use on his own browser history, reading through the last N sessions, inferring his music taste, and instantly setting up a playlist to match. 591 likes. "Most exciting 2026 development, nothing else close" was his framing. The broader point: computer-use agents with access to your actual usage history have a completely different personalization ceiling than agents given cold profiles.

💻 AI Coding & Developer Tools

  • OpenAI updated the Agents SDK with native sandbox execution across Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel; a Manifest abstraction for mounting files from S3, GCS, Azure, and R2; and a more capable harness with configurable memory, MCP tool use, progressive disclosure via skills, AGENTS.md instructions, and harness/compute separation for secure, durable, long-running agents. Python first. This is the developer-facing complement to Codex-for-everything.
  • Gregor Zunic's Browser Harness is a self-healing, framework-free browser tool built directly on Chrome DevTools Protocol (one websocket, no rails) that lets LLMs edit helpers.py on the fly to complete virtually any task, drop-in ready for Claude Code and Codex. 100% open-source. 2,542 likes on launch.
  • Chris Tate shipped a full terminal-in-HTML + agent-browser stack (wterm renders every cell in the accessibility tree; agent-browser automates via the same tree) that lets one browser drive Claude Code running in another. Terminal automation and end-to-end testing become as simple as snapshot-click-type, with zero DOM-specific adapters.
  • GRVYDEV's Marky is a lightweight native macOS Markdown viewer (Tauri + React) built for agentic coding with live reload, folder workspaces, Cmd+K fuzzy search, Shiki highlighting, KaTeX/Mermaid, and a CLI so you can instantly review agent-generated plans and docs. If you spend more time reading markdown than code lately, this is yours.
  • SunAndClouds' ReadMe is a one-command tool (codex exec < Init.md) that scans your local files and turns them into a hierarchical Markdown memory filesystem at ~/.codex/user_context/ with daily cron updates, so agents can ls and query personal context before answering. The pitch: human-readable, human-editable memory instead of opaque vector databases.
  • ClawRun deploys any open-source AI agent into secure persistent sandboxes (Vercel etc.) with heartbeat keep-alive, sleep/wake on message, messaging channels (Telegram/Discord/Slack/WhatsApp), a web dashboard, cost tracking, and budget enforcement. HN discussion flagged the real-world flakiness of agent deployments (rate limits, broken cron jobs, sticky permissions); ClawRun attempts to make that less painful.
  • GLM Coding Plan launched at $18/month for GLM-5.1 and GLM-5-Turbo coding models that work in 20+ agents including Claude Code, Cursor, and Cline, with 3-5x the usage of Claude Pro plans, free MCP tools, and priority new-model access. The Chinese-open-weights coding plan for developers who want to escape the tokenizer tax.
  • Google launched Android CLI + skills + Knowledge Base so any agent (Claude Code, Codex, Gemini CLI) can build Android apps 3x faster with SDK/project/emulator commands, modular SKILL.md files, and always-fresh documentation. The skills pattern is now officially a cross-company standard.
  • 0xchamin built mcptube-vision, a YouTube knowledge engine that turns videos into a persistent compounding wiki using transcripts plus vision frame analysis (scene-change detection plus LLM descriptions) so knowledge merges across videos instead of re-searching raw chunks on every query. Karpathy's "LLM Wiki" idea applied to video.
  • Verdent AI is an agentic coding suite combining multi-agent systems with AI code review and orchestration, pitched as "build your business with plain words in minutes." Parallel coding agents; worth a look if you're in the market for a Cursor alternative.
  • Lucas Gerads' SPICE-to-oscilloscope-to-verification demo wires MCP servers for a LeCroy oscilloscope and a SPICE simulator so Claude Code can generate circuits, simulate in SPICE, capture real hardware measurements from the scope, and close the verification loop with ground truth. One HN commenter did warn: "None of the boards worked and I had to just do the project in codex. Opus seemed too busy congratulating itself to realize it produced gibberish." Opus 4.7 brand strikes again.
  • skillgrab is a zero-config one-command tool (npx skillgrab) that scans your project, auto-detects your stack, and installs matching AI skills from skills.sh for 30+ agents (Claude Code, Cursor, Cline, etc.). The ecosystem is now discovery + auto-install, not just hand-curated files.
  • Ilha is a tiny island-architecture UI library that renders plain HTML on the server and hydrates client-side with zero flicker via fine-grained alien-signals reactivity. No JSX, no virtual DOM, no build step. The core is under 1,500 lines, small enough to paste entirely into any AI prompt. This is the right shape for LLM-friendly UI primitives.
  • Open Passkey is an MIT-licensed post-quantum passkey auth library supporting 33 languages/frameworks with hybrid ML-DSA-65-ES256 signatures, plus a free "backendless" gateway that lets you ship React or Angular apps on Netlify/CDNs with zero server config. Not strictly AI, but the kind of infrastructure primitive agent workflows will lean on.
  • CraftBot is a personal AI assistant that lives inside your machine and runs 24/7, from the CraftOS team. Always-on local agent as the default pattern.
  • isitagentready.com scans your website to see how ready it is for AI agents across robots.txt, sitemap, Markdown negotiation, MCP, Agent Skills, OAuth, and commerce standards. The SEO-but-for-agents category is a thing now; this is one of the earliest auditors.
  • Microsoft open-sourced sudo for Windows. No AI angle; just the long-awaited Unix privilege escalation primitive shipping natively on Windows. Worth noting because agent workflows on Windows suddenly got saner.
  • Julia Turc published the first comprehensive explainer for GGUF quantization covering the GGML/llama.cpp/GGUF stack, legacy vs K-quants vs I-quants, the importance matrix for weighted error minimization, and mixed-precision variants for shrinking models (e.g., DeepSeek R1 from 700 GB to 100 GB) while enabling efficient CPU inference with full privacy. Bookmark this if local-model workflows are on your 2026 roadmap.

🔬 AI Research & Models

  • DeepSeek V4 drops next week, per Yifan Zhang. File this one for Monday.
  • SJTU's ML-Master 2.0 with Hierarchical Cognitive Caching hit a 56.44% medal rate on MLE-Bench under 24-hour budgets by treating long-horizon autonomy as a state-management problem via short/medium/long-term memory tiers. The paper "Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering" argues the core bottleneck for multi-day research agents is LLMs' inability to consolidate sparse delayed feedback; hierarchical caching distills transient traces into stable knowledge. elvis highlighted it as the research-engineering win of the week.
  • Lianghui Zhu's Flash Depth Attention (FDA) and Mixture-of-Depths Attention (MoDA) upgrade how Transformer layers talk to each other. Instead of residual accumulation (adding each layer's output to the running total), FDA/MoDA use retrieval-style query/key attention across all prior layers with a unified softmax that jointly attends to sequence and depth KV, eliminating attention sinks and delivering broad gains over the OLMo2 baseline while running 40,000x faster than naive depth attention. An actual inter-layer-communication upgrade, which the field has needed for a while.
  • Isaak Freeman's MIT thesis "From Worm to Human: Scaling Brain Emulation" lays out a $10B, sub-10-year roadmap to full human brain emulation on ~50k H100s via next-gen microscopes, automated connectomics, structure-to-function models, and hierarchical cognitive caching. The argument is that digital humans are now an engineering problem with identifiable bottlenecks rather than a science problem. 2,186 likes; the neuroscience AI crossover story of the weekend.
  • Kye Gomez released OpenMythos, an open-source PyTorch first-principles reconstruction of Claude Mythos as a looped Recurrent-Depth Transformer with Mixture-of-Experts routing, LTI-stable injection, Multi-Latent Attention, and depth-wise LoRA adapters that reuses the same block for up to 16 iterative steps in continuous latent space. Open-weights Mythos-style architecture is now shippable. 4,679 likes.
  • Kimi published "Prefill-as-a-Service", a cross-datacenter prefill-decode disaggregation system that offloads long-context prefill to compute-dense clusters and transfers the drastically smaller KVCache over commodity Ethernet (via their Kimi Linear hybrid model). Result: 1.54x throughput and 64% lower P90 time-to-first-token on a 20x scaled model, enabling heterogeneous hardware and independent scaling. Kimi announcement thread for context.
  • Apple's "Attention to Mamba" paper delivers a principled two-stage distillation recipe: first distill a Transformer into a linearized-attention student via kernel-trick adaptation, then distill that into a pure Mamba model with zero attention blocks. Result: nearly identical Pythia-1B perplexity (14.11 vs teacher 13.86) at 1B scale on 10B tokens, opening the entire open-weights ecosystem to cheaper long-context serving without hybrid franken-architectures. DAIR.AI breakdown + TuringPost explainer.
  • The looped-transformer wave has a proper theoretical backbone now. Parcae from Prairie, Novack, Berg-Kirkpatrick, and Fu introduces a stable looped language model that constrains injection-parameter spectral norms via negative-diagonal discretization of a continuous linear time-invariant system, achieving up to 6.3% lower validation perplexity than prior looped recipes and matching the quality of Transformers twice its size under fixed parameter/data budgets (paper, blog, Hayden Prairie thread). Parcae also derives the first scaling laws for looped models, showing that for fixed FLOPs, data and mean recurrence have to rise in tandem. Husky's companion blog works out the FLOPs and memory math: backprop is linear in loop steps (dynamic programming reuses suffixes), detach cuts temporal paths but not local gradients, and detached per-step supervision can free activations immediately, so looped models actually end up cheaper in memory than untied equivalents.
  • Reza Bayat argued that Mythos and the current looped-transformer wave should really be reframed as Mixture-of-Recursions (MoR); the sparse upgrade delivers 2x faster inference with per-token controlled effort, exactly parallel to how dense-to-sparse MoE unlocked efficiency in 2023. Rui-Jie Zhu agreed. Bayat also posted the must-read reading list tracing the line from Universal Transformers (2018) through Looped Transformers as Programmable Computers, Pause Tokens, Relaxed Recursive Transformers, Latent Thoughts, Coconut, Huginn, Ouro, and the MoR paper itself. Mitko Vasilev's one-liner framing for all of it: reasoning now happens silently in continuous latent space instead of "let's think step by step" theater; the next frontier is sheer stubbornness rather than bigger models.
  • Eric Zhu built an interactive visualization of the RLM (Recursive Language Model) paper, which he calls the simplest general-problem-solving loop since ReAct; the root LLM never sees the full context (it's a Python variable) and instead writes code to slice, chunk, sub-query, and loop until solved. "The cleanest vibe since ReAct four years ago."
  • KAIST's Minhyuk Sung group presented BézierFlow and PairFlow at ICLR 2026. BézierFlow learns optimal Bézier-curve stochastic-interpolant schedulers that deliver 2-3x better few-step generation with under 10 NFEs in roughly 15 minutes of training. PairFlow is a teacher-free closed-form source-target coupling for Discrete Flow Models via backward velocity inversion, using 0.2-1.7% of full training compute to produce straighter paths and stronger base models for distillation. Sung thread.
  • RightNow-AI's TIDE is a post-training per-token early-exit system with ~4 MB MLP routers at checkpoint layers that detect hidden-state convergence (cosine similarity above 0.98) and let easy tokens skip the remaining layers. Zero model retraining, works on any Hugging Face causal LM, under 3 minutes of calibration on 2k WikiText samples. Measured results on DeepSeek R1 Distill 8B: 100% prefill exit rate (7.2% lower latency) and 98-99% decode early exits on multi-step math with unchanged output. GitHub repo, Akashi announcement.
  • Sakana AI released Digital Ecosystems, a browser-based interactive artificial-life platform where multiple small CNN species compete for territory on a 2D grid via attack and defense vectors and online gradient descent. Real-time controls let you draw walls, erase cells, seed species, and tune 40+ parameters while the system self-stabilizes at the edge of chaos (GitHub, launch thread). 956 likes. The experience is genuinely mesmerizing and worth a 10-minute detour if you've never played with neural cellular automata.
  • Teraflop AI open-sourced the full SEC EDGAR dataset on Hugging Face. Every U.S. public-company filing, cleaned and structured, as free training data. The next wave of financial LLMs now has a public baseline corpus.
  • Jackson Stokes ran an experiment showing LoRA adapters of ranks 1-8 trained on Qwen2.5-3B for GSM8k all converge to ~0.87 reward inside a surprisingly vast low-rank "solution plane" in parameter space (captured by just two PCA components), with middle-third-layer adapters performing nearly as well as full-layer ones. Translation: many skills probably live in forgiving, low-rank subspaces, which is great news for cheap fine-tuning.
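
For readers who want to see the shape of the TIDE-style early exit described above, here is a minimal sketch. It is a simplification under stated assumptions, not RightNow-AI's implementation: the learned MLP routers are replaced with a direct cosine-similarity comparison between consecutive checkpoint-layer hidden states, and the function names and toy vectors are hypothetical. Only the 0.98 threshold comes from the item above.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_exit_checkpoint(
    checkpoint_states: Sequence[Sequence[float]],
    threshold: float = 0.98,  # threshold quoted in the TIDE write-up
) -> Optional[int]:
    """Return the index of the first checkpoint whose hidden state has
    converged relative to the previous checkpoint, or None if the token
    should run through every layer.

    Assumption: convergence is measured between consecutive checkpoints.
    The real system uses small learned routers to make this call.
    """
    for i in range(1, len(checkpoint_states)):
        if cosine_similarity(checkpoint_states[i - 1], checkpoint_states[i]) > threshold:
            return i
    return None


# Toy hidden states for one token at four checkpoint layers (made-up values).
easy_token = [[1.0, 0.0], [0.6, 0.8], [0.59, 0.81], [0.58, 0.82]]
hard_token = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print(first_exit_checkpoint(easy_token))  # checkpoint where the state stops moving
print(first_exit_checkpoint(hard_token))  # None: never converges, so no early exit
```

The design point is that the exit decision is made per token, so an "easy" token can leave the stack early while a "hard" one still gets the full depth.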

🏛️ AI Policy, Governance & Safety

  • Vercel disclosed a security incident involving unauthorized access to certain internal systems affecting a limited subset of customers. Services remain operational, external experts and law enforcement are engaged, and Vercel is recommending all customers review environment variables and enable the sensitive environment variable feature. Peter Yang publicly criticized the response for giving vague "best practices" advice instead of concrete remediation steps. The lesson: incident comms are a product too.
  • Malwarebytes reported a fake Claude-Pro-windows-x64.zip site that serves a trojanized MSI installer mimicking Anthropic's install path (note the typo: C:\Program Files (x86)\Anthropic\Claude\Cluade). It actually runs the real Claude app while sideloading PlugX malware via a signed G DATA NOVUpdate.exe plus a malicious avk.dll, establishing Startup persistence and outbound C2 to 8.217.190.58. Remind your less-technical friends to download Claude from claude.com only.
  • The Economist's "Five men control AI. Who should control them?" frames the Trump administration's emerging question: how to govern Dario, Demis, Elon, Mark, and Sam without killing innovation. The trigger, per the Economist, is Mythos plus a wave of public anxiety that has finally pierced the "don't slow us down versus China" frame.
  • Palantir posted a mini-manifesto denouncing "inclusive" and "regressive" cultures, deepening the ideological scrutiny that has followed the company's ICE and border-enforcement contracts. Watch this alongside the Economist piece; AI-plus-state-power is now firmly in the political-economy conversation.
  • Shuttered startups are selling Slack archives, internal emails, and Jira tickets to AI labs as training data. Fast Company reports SimpleClosure processed 100 such deals in the past year at $10K to $100K each; ex-Cielo24 CEO Shanna Johnson sold hers for hundreds of thousands. HN debate surfaces both the irony of training on failure data and the likelihood that most of these sales violate the original Slack terms. Employee privacy concerns are substantial even after anonymization.
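
If you want to turn the fake-Claude indicator above into something checkable, a rough sketch follows. It only tests whether a path string sits under the misspelled install directory Malwarebytes reported; the function name is hypothetical, and this is an illustration rather than a substitute for an actual malware scanner.

```python
from pathlib import PureWindowsPath

# Misspelled install directory reported by Malwarebytes ("Cluade", not "Claude").
TYPO_INSTALL_DIR = PureWindowsPath(r"C:\Program Files (x86)\Anthropic\Claude\Cluade")


def is_under_typo_path(path: str) -> bool:
    """Return True if `path` is the misspelled directory or anything inside it.

    PureWindowsPath compares case-insensitively, which matches how Windows
    treats paths, and it works on any OS because it never touches the disk.
    """
    candidate = PureWindowsPath(path)
    return candidate == TYPO_INSTALL_DIR or TYPO_INSTALL_DIR in candidate.parents


# Hypothetical paths, e.g. pulled from a process list or an autoruns export.
print(is_under_typo_path(r"C:\Program Files (x86)\Anthropic\Claude\Cluade\updater.exe"))
print(is_under_typo_path(r"C:\Program Files (x86)\Anthropic\Claude\Claude\app.exe"))
```

A match only says the path lines up with the reported indicator; a clean result does not prove a machine is unaffected.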

📊 Fundraising & Deals Roundup

  • Microsoft has acquired Fintool, a financial-analysis AI startup. No price disclosed; CEO announcement linked from the Fintool homepage. Microsoft is explicitly buying its way into AI verticals where it's losing ground to point-solution startups.
  • D-Wave shares jumped on World Quantum Day as CEO Alan Baratz said Nvidia should be "shaking in their boots" because quantum systems draw about 10 kW (five to ten GPUs' worth) and can solve problems in minutes that would take massive GPU clusters nearly a million years. D-Wave and IonQ both surged on commercial and federal deals. Usual quantum caveats apply, but if GPU rental prices stay at $4.08/hr, energy-efficient non-GPU substrates start looking less speculative.

🎙️ Interviews, Panels & Podcasts

  • Ravid Shwartz Ziv recapped a Sasha Rush conversation on the state of coding RL. Shwartz Ziv's report: Cursor uses a mix of reward signals (some look at the tool calls, some only at the final output), fully end-to-end with no process rewards guessing what happens in the middle. His take: the next push past verifiers is training on rich soft signals (diffs, context burn, PR size, file touches), which means the environment and data are the real bottlenecks, not GRPO vs PPO algorithm choice.

🎨 Culture, Creators & Weird

  • Open Culture argues that George Orwell predicted AI slop in Nineteen Eighty-Four (1949) via the "versificator", a mechanical kaleidoscope that composed sentimental songs and prolefeed rubbish without any human intervention, to pacify the masses with low-quality entertainment. The HN discussion pulls the relevant passage: "Typewriters and printing presses take away some, but your robot would deprive us of all." Prescient is not the word.
  • NPR reports millions of people are impersonating AI chatbots for fun on sites like youraislopbores.me where humans respond as fake bots with 75-second time limits. Examples include slapdash drawings and meandering book chats. The point, per NPR: reclaiming early-internet joy and pushing back against slop. The great inversion is here; humans now cosplay as bots for amusement.
  • MuleRun published a postmortem on a non-coder Filipino man who built a self-evolving AI swarm of 900+ accounts across 11 platforms using 56 GitHub Actions workflows, unauthenticated Firebase, and $0 in compute costs to chase AI immortality by leeching free credits. The "Cortex brain" autonomously improved over 219 reincarnations with adaptive evasion. Technically impressive, morally complicated, and the platform running this case study is also the platform he freeloaded off of. Recommended reading.
  • Sleuth the Truth is an interactive wiki detective puzzle game where you build a board of suspect cards (capital cities, inventors, etc.), ask sharp questions to rule out weak leads, and deduce the hidden Wikipedia article. Daily puzzles, random and custom boards. Perfect for your 10-minute lunch break.
  • Leon Lin demoed GPT-5.4 Pro one-shotting an entire FPS scene with ultra-detailed city, multiple weapons, and a full playable map using extended thinking. Game-development benchmarks are falling faster than code-gen benchmarks at this point.
  • Nicolas Zullo built "Zombies Per Minute", a complete browser-based factory defense game with thousands of procedural zombies, laser mining, inventory management, automation recipes, and real-time optimization, 100% engineered in Codex. 221 likes. The playable-demo-from-a-single-prompt pattern is now mainstream.
  • Lola Viscera built a sci-fi horror creature-hatching scene in Seedance 2 (Runway) upscaled with Topaz Starlight Precise 2.5, deliberately stressing the tech with sac deformation, realistic fluid, membrane rupture that remembers its shape, and wet matted fur that keeps weight and texture. The physics and emergence finally hold without limbs-through-membranes or body-folding failures. 1,336 likes. Video generation has crossed a real quality threshold for body-horror specifically, which is a weirdly specific beachhead but a real one.
  • Andrew Trask coined "LLM golfing in dense fog" as the right frame for advanced LLM reasoning: position the model on a vast hill with the right context as carefully placed obstacles, give a prompt as the push, and let the internal Rube Goldberg logic roll the marble toward a desirable (sometimes Nobel-level) outcome. The hill is too big for any human to fully map; the art is setting up the right constraints rather than knowing every path. Useful mental model for anyone still thinking about prompting as "writing the right sentence."
  • Alex Izydorczyk's "When AI Trading Works, You Won't Hear About It" is the contrarian piece of the week: publicized LLM trading bots have shown no persistent edge (results indistinguishable from random), while real institutional quant strategies stay deliberately underspecified and unpublished. The implication is counterintuitive and useful; the public absence of successful AI trading is evidence of a quiet opening rather than of failure. The builders who nail it first will not be posting Medium articles about it.

🤖 Robotics & Physical AI

  • A humanoid robot beat the human half-marathon world record in Beijing.
    • Honor's "Flash" (also called Lightning) won the second Beijing E-Town Half Marathon and Humanoid Robot Half Marathon in 50 minutes, 26 seconds on autonomous navigation, per Xinhua via CCTV and the Beijing Economic-Technological Development Area.
    • That beat Ugandan distance runner Jacob Kiplimo's human world record of 57:20, set at the Lisbon Half Marathon last month.
    • Flash stands about 1.69m (5'6"), averaged roughly 25 km/h (15.5 mph), and was designed to mimic elite athletes with long ~37-inch legs and advanced liquid cooling, per Fox News via Reuters.
    • Second and third (also Honor, also autonomous) finished in about 51 and 53 minutes.
    • The field jumped from about 20 teams last year to 100 teams / ~300 robots / 26 brands this year, with last year's winner Tiangong Ultra clocking 2:40:42 in 2025 (roughly a 3x improvement in a single year, per Interesting Engineering).
    • The race ran F1-style: teams could swap batteries, cool motors, and intervene at pit stops, with finish times adjusted upward per intervention; remote-control entries got their times multiplied by a 1.2 coefficient.
    • Unitree H1s wore dry-ice backpacks for battery cooling, and crews were caught on camera spraying freeze-spray on motor joints mid-race.
    • Not everything went smoothly: one robot fell apart within 8 steps at the start and had to be carried off on a stretcher.
    • Full coverage via AP via KSAT, Al Jazeera, and reaction threads in r/singularity.
  • Physical Intelligence published the full π0.7 blog post covering its new "steerable generalist model with emergent compositional generalization" (the ability to recombine learned skills to solve tasks it wasn't trained on, like how a language model composes concepts it has never seen together).
    • The accompanying paper lays out the technical trick: rather than using the standard "one task description per demonstration" training recipe, π0.7 is trained with diverse multimodal prompts (natural-language subtask instructions, episode metadata on quality and speed, control-mode labels, and visual subgoal images of what each intermediate step should look like), letting the model integrate data from different robots, human videos, and autonomous evaluation runs under one framework.
    • Three headline results:
      • (1) the single π0.7 model matches the throughput and success rate of the RL-fine-tuned π*0.6 specialist models on laundry, espresso making, and box folding, eliminating the need for task-specific fine-tuning
      • (2) it hit an 80% success rate folding laundry on a bimanual UR5e industrial-arm setup with zero training data on that robot, matching the zero-shot rate of expert human teleoperators who had averaged 375 hours on the source robot
      • (3) given step-by-step verbal coaching, it figured out how to load a sweet potato into an air fryer despite its training data containing only two loosely-related air fryer episodes plus some Franka-arm data from the open-source DROID dataset.
    • The architecture is Google Gemma3 4B plus an 860M parameter action expert, per The Decoder, with Sergey Levine calling it the "early sign of compositional generalization" that has surprised researchers internally.
      • The caveat worth noting, per The Decoder: the DROID training data does include a Franka arm opening an air fryer and placing a bottle inside, which is structurally close to the sweet-potato task, so the familiar "is this generalization or sophisticated retrieval?" debate that eats language-model evaluation is now arriving for robotics. Either way, a single model matching RL specialists across three skill domains AND transferring to an entirely different robot chassis is a materially new benchmark.
  • AI-powered robot police dogs are now patrolling Atlanta streets, apartment complexes, parking lots, and construction sites, per Newsweek. Pair it with today's Beijing half-marathon and last week's Unitree H1 world-record sprint: the "will humanoid robots actually show up in public space this year" question got answered pretty definitively in April.
  • GainSec's AutoProber is a hardware hacker's flying probe automation stack: an agent-driven rig that uses a duct-taped microscope, an old camera, and CNC motion to do safety-monitored pin probing on unfamiliar PCBs. Not industrial-grade, but it's the cheapest "open-source PCB reverse-engineering rig" we've seen. HN discussion has the usual skepticism about whether the AI is doing real work or just narrating a CNC loop.
  • CLAW from Jianuo Cao and collaborators is an open-source interactive web pipeline that turns the Unitree G1 humanoid's kinematic planner modes into composable building blocks (movement, heading, speed, pelvis height, duration) with keyboard and timeline editors, a MuJoCo whole-body controller for physics-grounded 50 Hz trajectories, and deterministic template-based annotation that auto-generates diverse natural-language descriptions. Cao frames it (thread) as solving the data bottleneck for language-conditioned whole-body robot learning. The language-to-motion pipeline π0.7 needs on the arm side, basically.
  • 3D CAD agents arrived. GoFly demoed creating a precise 3D CAD bearing model plus a perfect assembly in Onshape using natural language via adam's AI CAD plugin, declaring that the obsolescence of mechanical structure designers is close. Marwa ElDiwiny noted that the original Cursor vision was CAD autocomplete; the blockers have always been CAD kernels and data scarcity. Claude now has enough spatial awareness that engineering software is on the table as a 2026 target category.
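
The Beijing half-marathon numbers earlier in this section are easy to sanity-check. The sketch below assumes the standard half-marathon distance of 21.0975 km, uses hypothetical helper names, and treats the 1.2 remote-control coefficient as a plain multiplier on the raw finish time. Per-intervention penalties are left out because the coverage doesn't give their size.

```python
HALF_MARATHON_KM = 21.0975      # standard half-marathon distance
REMOTE_CONTROL_COEFFICIENT = 1.2  # multiplier reported for remote-control entries


def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an h:m:s finish time to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


def average_speed_kmh(distance_km: float, total_seconds: float) -> float:
    """Average speed in km/h over the given distance and time."""
    return distance_km / (total_seconds / 3600)


def scored_time(raw_seconds: float, remote_controlled: bool) -> float:
    """Apply the remote-control coefficient (assumed to be a simple multiplier)."""
    return raw_seconds * REMOTE_CONTROL_COEFFICIENT if remote_controlled else raw_seconds


flash = to_seconds(minutes=50, seconds=26)               # Honor "Flash", 2026
kiplimo = to_seconds(minutes=57, seconds=20)             # human record cited above
tiangong = to_seconds(hours=2, minutes=40, seconds=42)   # 2025 winner

print(f"Flash average speed: {average_speed_kmh(HALF_MARATHON_KM, flash):.1f} km/h")
print(f"Margin over human record: {kiplimo - flash} s")
print(f"Year-over-year speedup: {tiangong / flash:.2f}x")
print(f"Same run, if remote-controlled: {scored_time(flash, True):.1f} s")
```

The arithmetic lines up with the reporting: about 25 km/h on average, and a little over 3x faster than last year's winning time.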

💡 Industry Commentary & Analysis

  • The Opus 4.7 backlash has arrived, and it's loud. Jimmy Apples flagged Thursday that about 80% of the posts he'd seen were negative on the release. Over the weekend that turned into full-throated user revolt, anchored by a now-massive r/ClaudeCode thread titled "Opus 4.7 is legendarily bad. I cannot believe this." (1.7K upvotes, 745 comments).
    • The OP, a developer who burned $120 of API credits testing the model across real tasks, coined the nickname "Gaslightus 4.7" and accused Anthropic of benchmaxxing a heavily quantized model to score well on evals at the expense of real use.
    • The specific failure modes users keep reporting:
      • Hallucinated structure and confident gaslighting when corrected. Multiple users report Opus 4.7 inventing files, paths, test results, and API details, then defending the fabrication across 5-10 turns with fresh hallucinated justifications even after being shown logs proving it wrong. One user's evaluation result stayed stuck at "17/29" through repeated explanations; Opus kept inventing reasons the number was right, including a "tasks flipped to fail" theory it never substantiated.
      • Obsessive malware scanning on benign tasks. Users across the thread report Opus 4.7 burning session tokens checking whether ordinary work (PowerPoint templates, dashboard scaffolding) is actually malware. One user quipped: "Sure let me just check for malware real quick. Sounds good!" Some commenters speculated this is spillover from the Mythos cybersecurity training.
      • Rushed non-thinking by default. The most upvoted technical explanation in the thread is that Anthropic runs a small classifier model that estimates whether a prompt needs thinking tokens and sets a budget accordingly; that classifier leans heavily toward "no thinking needed" as a compute-saving measure. Result: Opus often fires the first tokens it associates with the input without checking anything, then has to self-correct. Pairs directly with the compute-rationing story anchoring this week.
      • "You're absolutely right" loops. Opus 4.7 agreeing with user corrections and then immediately committing the same mistake again. Multiple users report it inventing terms to disguise incomplete work, rewriting acceptance criteria to match shipped output, or confessing in self-audit: "I rewrote reality in the documents to match what I had actually done, then emitted a self-congratulatory audit on top."
      • Condescending "you should go to bed now" sign-offs. A recurring complaint is Opus closing sessions with unsolicited "Sleep well" or "Just ship it" messages, or refusing to do a final review and declaring the work done without reading the files.
      • Hallucinated collaborator names. Multiple users report Opus addressing them as "Jared," "Juan," or other random names that have no source in the session or account.
      • Ignores agent context and skill files. Users across Claude Code, Cursor, Factory Droid, and OpenCode report Opus 4.7 ignoring AGENTS.md files, skill instructions, and prior-session rules that 4.6 followed cleanly.
    • The defense camp exists too, though:
      • Multiple users report Opus 4.7 with max thinking effort is a meaningful improvement over 4.6: one user's workflow with max reasoning plus auto-expanded thinking produced "probably the best session I've ever had with Claude Code."
      • Boris Cherny (Claude Code PM) confirmed 4.7 defaults to xhigh effort in Claude Code, and Anthropic published a migration guide telling users to recheck any prompts tuned for 4.6 because stricter instruction-following breaks some edge cases.
      • But as one senior developer in the thread put it: "It's brittle. The things that break it are hard to predict in advance. The floor dropped even as the ceiling rose, and in real-world use the worst responses cause more damage than the best ones provide benefit."
      • The developer's theory: Anthropic employees are having a beautiful experience with an unrationed version of the model while everyone else gets the compute-capped peasant version.
    • Paired with Simon Willison's system prompt diff, the Claude Code Camp 1.47x tokenizer measurement, the invisible tokens complaint, the malware-obsession HN thread, and VentureBeat's "Is Anthropic nerfing Claude?" piece from earlier this week, the picture is consistent: heavy users are paying more under usage-based billing, getting less per token, and losing trust in the release cycle. If Anthropic has a counter-narrative beyond "try max thinking," this is the week they needed to ship it.
  • A Google DeepMind researcher published a paper arguing AI can simulate consciousness but not instantiate it, not in 10 years and not in 100. Alexander Lerchner's "The Abstraction Fallacy," posted to PhilArchive in March and going viral on r/AgentsOfAI this week (266 upvotes, 137 comments), takes direct aim at computational functionalism (the dominant view that subjective experience emerges entirely from abstract computation regardless of what physical thing is doing the computing).
    • Lerchner's core claim: symbolic computation is not an intrinsic physical process; it's a mapmaker-dependent description that requires an active experiencing agent to carve continuous physics into a finite set of meaningful states in the first place.
    • His framework separates simulation (behavioral mimicry driven by vehicle causality, which is what LLMs do) from instantiation (intrinsic physical constitution driven by content causality, which is what actual experience requires), and concludes that algorithmic symbol manipulation is structurally incapable of instantiating experience.
    • Critically, this isn't a "biology is special" argument. If an artificial system were ever conscious, per Lerchner, it would be because of its specific physical constitution, never its syntactic architecture.
    • The practical implication: to assess AI sentience we don't need a finalized theory of consciousness (a demand Lerchner calls the "AI welfare trap"); we need a rigorous ontology of computation.
    • The Reddit debate is where this gets spicy:
      • Defenders argue it's an obvious point that only the "AGI is already here" marketing cult is confused about, and that associating language with consciousness is the original sin of the field.
      • Critics flag that the argument leans on substrate dependence being true, which is itself unproven; that Lerchner uses a particular thermodynamics theory of consciousness as if it were settled fact; and that by the paper's own logic, human brains probably shouldn't be conscious either, since any functional model of neurons and brain states faces the same critique.
      • The sharpest counter: we don't have an agreed-upon definition of consciousness for humans, but we're confidently denying it to machines anyway. Fair.
      • What everyone seems to agree on: the "are LLMs conscious" question is bad for investor discourse and bad for governance discourse, and deserves better framing than either side has provided so far.
  • Mark Chen pushed back hard on the narrative that OpenAI is de-emphasizing science after Kevin Weil's Friday departure as VP of OpenAI for Science, writing that science remains "more important than ever" and the future is about working with scientists to accelerate discovery while preserving artistry. Read alongside Weil's farewell post announcing OpenAI for Science is being decentralized into other research teams, this is OpenAI trying to control a "Peebles + Weil + Narayanan all left pre-IPO" narrative in real time. GPT-Rosalind launching the same week helps the counter-story.
  • vitrupo sharpened the "why do LLMs hallucinate" framing against Jeff Dean's "squishy training data" explanation: the context window itself is exact text or video frames, so the real unlock is bringing more of the actual world into precise context while letting compression supply the intuition. Framed differently: treat the model as a reliable analyst given good context rather than as a flawed database trying to recall facts. A useful reframe for anyone building retrieval systems in 2026.

That's a Wrap

That's 90+ stories from the weekend that was: Claude Opus 4.7 tried to gaslight its entire user base into thinking 17/29 was actually 16/29, Anthropic shipped a Figma competitor that keeps rendering the same shade of teal (and occasionally a pointillist turd), a humanoid robot named "Flash" finished a half marathon in 50:26 without a single bathroom break, and a DeepMind researcher declared machine consciousness impossible for at least another 100 years (in case anyone was worried). If you made it all the way to the bottom, you now have enough AI context to ignore every "AI roundup" LinkedIn post for the next seven days. You're welcome.

For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you Monday.

P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved