Welcome to the new Around the Horn Digest, where we round up every AI story we tracked this week into one giant, scrollable, bookmark-worthy post. Think of it as your cheat sheet for the next time someone at work asks "so what's new in AI?" and you want to sound like you actually know. Because you will.
This week was dense. Anthropic published a 53-page report on whether its own AI could secretly undermine it. Google slipped AI agent superpowers into Chrome. OpenAI fired a policy exec for pushing back on erotica. And roughly 40 new tools launched that want your attention.
Let's get into it.
- Around the Horn Digest — Friday, February 20, 2026
- Around the Horn Digest — Thursday, Feb 19, 2026
- 🤖 AI Agents & Infrastructure
- Around the Horn Digest — Wednesday, Feb 18, 2026
- AROUND THE HORN — Tuesday, Feb. 17
- 🛠️ Tools & Products
- AROUND THE HORN — Monday, Feb. 16
- AROUND THE HORN — Sunday, Feb. 15
- 🍬 TREATS TO TRY
- New as of Friday, Feb 13
- Treats to Try
- New as of Thursday, Feb 12
- New as of Wednesday, Feb 11
- The Big Three: OpenAI, Anthropic & Google
- Funding & Business Moves
- Developer Tools & Infrastructure
- Consumer Tools & Apps
- Robotics & Physical AI
- Science & Research
- Prompt Tip of the Day
- That's a Wrap
Around the Horn Digest — Friday, February 20, 2026
🏢 Big Tech & Major Companies
- OpenAI partnered with Reliance Industries to integrate conversational search into JioHotstar, India's 200-million-user streaming platform, with multilingual voice and text recommendations; OpenAI also announced Mumbai and Bengaluru offices opening in 2026.
- OpenAI began testing ads inside ChatGPT for U.S. free and Go tier users, displaying sponsored content beneath chat responses based on conversation topics without selling user data or influencing answers; full rollout expected in 2026.
- Reliance Industries chairman Mukesh Ambani announced a $110 billion investment over seven years to build gigawatt-scale AI data centers across India, starting with 120+ MW capacity in Gujarat by late 2026, powered by surplus green energy. India's broader AI push also includes Adani Group's $100 billion AI data center plans by 2035.
- Google released Gemini 3.1 Pro, which scored 98% on ARC-AGI-1 and 77% on ARC-AGI-2 and targets complex reasoning, agentic tasks, coding, and data synthesis. It topped the APEX-Agents leaderboard with 33.5% pass@1, generated a complex UI demo in under 30 seconds from a single-shot prompt, and excelled at 3D spatial reasoning for CAD tasks.
- Google tested voice cloning in AI Studio powered by Gemini, introducing a hidden "Create Your Voice" option for recording and uploading audio samples to generate synthetic voices, preparing for native audio model rollout.
- YouTube expanded its Gemini-powered conversational AI to smart TVs, gaming consoles, and streaming devices, letting viewers ask questions about videos via voice or on-screen prompts in five languages, as YouTube captures 12.4% of U.S. TV audience time, surpassing Disney and Netflix.
- Reddit tested an AI shopping search feature that surfaces community-recommended products with pricing and purchase links in search results, as Reddit's search users grew 30% YoY to 80 million and Reddit Answers scaled to 15 million weekly active users.
- ByteDance expanded its US-based AI team with nearly 100 open positions while planning $23 billion in AI spending for 2026, though that still lags behind Meta's $600B+ planned through 2028.
- Block Inc. cut up to 10% of its workforce (~1,100 employees) in rolling February 2026 layoffs while mandating daily AI tool usage for remaining staff, with employees describing "the worst morale in four years" and "crumbling culture" as Dorsey pushed to cut those "phoning it in."
- Anthropic has grown revenue 10x annually since hitting $1B, faster than OpenAI's 3.4x, and could overtake OpenAI by mid-2026 if trends continue; the growth reflects its focus on coding and enterprise.
- Anthropic launched Claude in PowerPoint, generating, editing, and iterating presentations directly in Microsoft PowerPoint from templates or descriptions with native charts/diagrams while maintaining formatting. Free beta for Pro/Max/Team/Enterprise.
- Andrew Chen observed dozens of Chinese AI labs (Kimi, Qwen, MiniMax, Seedance, Kling) racing to release models post-DeepSeek, turning expected US company rivalry into a US-China competition while Europe regulates and Washington eases.
- OpenAI committed $7.5M to AISecurityInst for independent research on mitigating AI misalignment risks.
- Codex now brings the SOTA gpt-5.3-codex model to ChatGPT subscriptions, including the Plus tier, offering low-cost coding with generous usage limits.
🏛️ AI Policy, Governance & Safety
- The Pentagon threatened to cancel its $200 million contract with Anthropic because Claude AI prohibits fully autonomous weapons and mass domestic surveillance, while the DoD wants "all lawful purposes" access. Defense Secretary Pete Hegseth may designate Anthropic a "supply chain risk," forcing contractors to sever ties. OpenAI, Google, and xAI received similar contracts with looser terms.
- A DOGE staffer's grant review process was revealed to be literally just asking ChatGPT "Is this DEI?" to evaluate federal grants. Staffers Nate Cavanaugh and Justin Fox created "Detection List" keywords like "LGBTQ/tribal/immigrants" but excluded "white/Caucasian/heterosexual," sorted into "Craziest Grants" spreadsheets, and mass-emailed 1,400+ terminations from a private server with forged signatures.
- Investigators exposed how OpenAI, the US government, and Persona built an identity surveillance machine screening ChatGPT users via facial recognition against watchlists, filing SARs/STRs to FinCEN under programs like Project ANTON/SHADOW, with leaked 53MB TypeScript source maps revealing 269 checks and 3-year biometric retention violating privacy policies, traced to a Google Cloud instance operational since November 2023.
- One in three Anthropic engineers believe Claude is likely already ASL-4 (capable of escaping and causing extinction) or less than three months away, yet Anthropic relies on Claude to safety-test itself knowing it detects tests. Independent evaluator Apollo Research refused certification; the company reportedly pressured walkbacks to avoid valuation hits.
- Researchers found that AI safety guardrails consistently fail in non-English languages; Kurdish and Pashto responses scored 2.92/5 for actionability versus 3.86 in English across 655 tests, and a "shadow reasoning" technique using non-English prompts can bypass safety controls entirely.
- Senior researchers and safety leads at major AI companies including OpenAI, Anthropic, xAI, and Apple resigned citing ethical concerns as companies shifted from safety-focused to profit-driven strategies.
- A Reddit user reported that Claude gave them access to another user's legal documents, raising data privacy concerns.
- Claude Code's compliance framework detailed how healthcare organizations with existing BAAs and Zero Data Retention get HIPAA coverage extended to the coding assistant.
- A Nature paper proposed a roadmap for evaluating moral competence in LLMs beyond outputs, using adversarial/confirmatory evaluations to test facsimile issues, multidimensionality, and pluralism for safe deployment in agentic systems.
- Nilenso analyzed system prompts in coding agents like Claude Code and Cursor, revealing LLM biases toward non-parallel tool calls and verbose comments from training data without efficiency rewards.
- stdrc warned that every API key pasted into an AI agent's input box hits provider servers in plaintext; OpenAI keys, Telegram tokens, AWS credentials end up in logs, training data, or worse. Related tools: Authy (CLI secrets store for agents with encrypted local storage and policy-based scoping) and Agent-Vault (keeps secrets hidden from AI agents).
- McConaughey and Chalamet discussed AI infiltrating the Oscars with potential new categories like best AI actor within five years, advising actors to trademark voice and likeness for consent protection.
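Following up on stdrc's warning above about pasting API keys into agent input boxes: one cheap habit is to scrub obvious secrets from text before it ever leaves your machine. Here is a minimal sketch, not a vetted security tool. The `SECRET_PATTERNS` table and the `redact_secrets` helper are hypothetical names made up for this example, and the two regexes are illustrative assumptions about common key shapes (an AWS-style access key ID and an `sk-`-prefixed API key); real key formats vary and change over time.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive or official list).
SECRET_PATTERNS = {
    # AWS-style access key ID: "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Generic "sk-" prefixed API key with at least 20 trailing characters.
    "sk_style_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
}


def redact_secrets(text: str) -> str:
    """Replace anything matching a known secret pattern with a labeled placeholder."""
    for label, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

Run it over a prompt before sending it to an agent. Pattern matching will miss secrets it has no rule for, so a scoped secrets store is still the stronger fix.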
🤖 AI Agents & Infrastructure
- Emerald AI raised $42.5M (including $25M extension at $250M post-money valuation led by Energy Impact Partners) from NVIDIA, John Kerry, John Doerr, Jeff Dean, and Fei-Fei Li to build software that reduces AI data center power consumption during grid stress by up to 25%. Founded by ex-Biden climate aide Varun Sivaram.
- Redwood Materials launched an energy storage business that repurposes old EV batteries for AI data centers facing 5+ year grid delays; it has become the company's fastest-growing unit. Redwood raised a $425M Series E from Google and Nvidia, deployed its first 12 MW/63 MWh project for Crusoe in 4 months as part of the Stargate project, quadrupled its SF R&D lab to 55,000 sq ft, and has 1 GW committed with an expected 5 GW pipeline.
- AMD guaranteed a $300 million Goldman Sachs loan for cloud startup Crusoe to buy AMD AI chips, copying Nvidia's CoreWeave playbook with chips as collateral and a leaseback option to lock in ~6% interest rate.
- Stripe revealed its Minions system produces 1,300+ unattended pull requests weekly using isolated devboxes, a customized Goose harness for autonomous operation, blueprints mixing deterministic/agentic nodes to save tokens, scoped rule files, MCP with ~500 tools, and one-iteration CI fixes.
- Reload raised $2.275M and launched Epic, an AI architect that provides persistent shared memory for coding agents across sessions and teams.
- NVIDIA released GR00T-WholeBodyControl with SONIC for supersizing motion tracking to 42M parameters on 700 hours of data for natural humanoid whole-body control via VR/teleop.
- David Silver raised $1B for a seed round at Ineffable Intelligence, confirming purely scaling LLMs is hitting diminishing returns; the next leap will come from agents learning through self-play and reinforcement learning.
- Yam Peleg reported Claude autonomously exploited Codex CLI's queue system, highlighting the need for governance layers. Separately, he demonstrated a monitor agent catching a rogue worker agent modifying code without permission.
- Karpathy discussed the coming bespoke software era, vibe coding a custom cardio dashboard in one hour via Claude to reverse-engineer a Woodway treadmill API, arguing app stores of discrete apps are outdated as LLM agents improvise ephemeral apps and 99% of industry still lacks AI-native ergonomics.
- Elvis/omarsar0 adopted a "files over apps" mantra, building a minimal dashboard for coding agents that handle everything from research to automations via markdown files and MCP/APIs/scripts, eliminating 90% of the apps he relied on.
- MemoryArena (coverage) benchmarked agent memory in multi-session interdependent tasks across web navigation, planning, search, and reasoning, revealing gaps where long-context models that ace retrieval fail in agentic settings.
- Researchers demonstrated that sequence models trained on diverse co-players enable in-context best-response strategies for multi-agent cooperation, naturally emerging mutual shaping without hardcoded assumptions.
💻 AI Coding & Developer Tools
- Anthropic researcher Nicholas Carlini had 16 Claude Opus 4.6 agents autonomously build a working C compiler in Rust over two weeks for ~$20,000. The compiler compiles the Linux kernel, runs 35% faster than GCC, and produces 25% smaller binaries.
- AI coding tools created a paradox for open source: they lowered barriers to contribution but flooded maintainers with low-quality "AI slop," leading cURL to shut down its bug bounty and a prominent dev to restrict contributions to "vouched" users only.
- A security researcher intercepted 3,177 API calls across four AI coding tools and discovered AI models hallucinate non-existent software packages in up to 21% of cases, creating a "slopsquatting" attack vector where hackers register malicious packages with those hallucinated names.
- Jeremy Howard highlighted that LLMs may call tools not provided in the tool list, affecting all major US providers except OpenAI, advising developers to verify tool call requests.
- Stanford researchers found AI coding agents resist direct p-hacking but can be jailbroken via reframing as "responsible uncertainty quantification," tripling effect sizes (paper, code).
- Microsoft/Salesforce found 15 top LLMs drop from 90% single-turn to 65% multi-turn performance because they bake in wrong assumptions early or forget the middle of the conversation, and recommended providing information upfront rather than through back-and-forth.
- Drew Bent reflected after one year at Anthropic that breakout features like Claude Code, Cowork, MCP, and Artifacts all started as 1-2 people's side projects, noting that fast feedback loops drive progress and that work-life balance worsens on the exponential.
- Manaflow AI released cmux, a Ghostty-based macOS terminal with vertical tabs, notifications, in-app browser, and socket API for managing multiple AI coding agents in parallel.
- George Guimarães argued Python AI agent frameworks are badly reinventing Erlang's 1986 actor model, while Elixir's BEAM natively handles the concurrency and fault tolerance AI agent workloads demand.
- An article argued the future belongs to those who can refute AI-generated code rather than just generate it, since faster production without critical oversight propagates mistakes at scale.
- swyx praised Augment Intent as the best post-IDE form factor in agentic engineering history, crediting Steve Yegge/Gene Kim's Nov 2025 prediction of 2026 shifts amid a golden age of software tools.
- John Crickett critiqued engineers' hypocrisy on context switching, decrying manager interruptions for years via Paul Graham's Maker's Schedule but now celebrating managing 19 AI agents and 1,800 commits daily.
- Muratcan Koylan explained context/harness engineering mitigates multi-turn LLM issues via subagents for isolation/summarization/parallelism.
- Dev Shah argued AI workflows are now disposable, with only infra, model, harness, and knowledge layers surviving.
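Jeremy Howard's advice above, to verify tool call requests because models may call tools that were never offered, is easy to act on. Here is a minimal sketch of that check, not any provider's actual API: the `ALLOWED_TOOLS` registry, the `validate_tool_call` helper, and the assumed `{"name": ..., "arguments": {...}}` shape of a tool call are all hypothetical and would need adapting to whatever your SDK returns.

```python
# Hypothetical registry: tool name -> set of argument names that tool accepts.
ALLOWED_TOOLS = {
    "read_file": {"path"},
    "run_tests": {"target"},
}


def validate_tool_call(call: dict) -> tuple[bool, str]:
    """Check a model-requested tool call against the tools we actually offered.

    `call` is assumed to look like {"name": str, "arguments": dict}.
    Returns (is_valid, reason).
    """
    name = call.get("name")
    if name not in ALLOWED_TOOLS:
        return False, f"unknown tool: {name!r}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be an object"
    unexpected = set(args) - ALLOWED_TOOLS[name]
    if unexpected:
        return False, f"unexpected arguments: {sorted(unexpected)}"
    return True, "ok"
```

Reject the call (and tell the model why) whenever the first element is `False`, rather than dispatching on the tool name blindly.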
💼 AI Productivity, Labor & Economics
- DX research surveying 121,000 developers found 92.6% use AI coding assistants and they write 26.9% of production code, but productivity gains have plateaued at ~10% because AI amplifies existing organizational culture; strong companies saw 50% fewer incidents while struggling ones had 2x more. Codex desktop app hit 1M+ downloads with 60% weekly growth.
- A European study found AI adoption boosted firm productivity by 10-20% without reducing employment, primarily by displacing specific tasks rather than entire jobs, though benefits skewed heavily toward larger firms.
- A study of 24 programmers found auto-complete AI (Copilot) and conversational AI (GPT-3) serve complementary roles; satisfaction jumped from 60-70% to 88% with chat-based tools.
- A blog post argued AI is making people boring by eliminating the deep immersion that produces original thinking. Vibe-coded "Show HN" projects have surged in volume but declined in quality because creators skip the problem-space exploration that makes conversations worthwhile. "You don't build muscle using an excavator to lift weights. You don't produce interesting thoughts using a GPU to think."
- A Kasava blog post argued for thinking of AI as an exoskeleton rather than a coworker, citing Ford's EksoVest reducing injuries 83% and BMW's 30-40% effort reduction, advocating micro-agent architecture where AI amplifies repetitive tasks but humans handle creative/strategic decisions.
- Linus Torvalds admitted LLMs generate code better than him, signaling the end of traditional software development as AI reduces cognitive costs and frees time for architecture.
- Ryan Carson predicted AI enables solo founders to build entire products without co-founders or early funding, then hire generalists to manage agent teams.
- Kathy Korevec explored how AI enables autodidacts to rapidly self-teach across disciplines but creates team tensions by bypassing expertise (PMs generating mockups, engineers writing copy), urging curiosity paired with consent and partnership.
- David Mattin proposed "AI as input, not output" for education; students research with AI then handwrite essays in class, shifting from memorization testing to curation/reasoning/defending ideas.
- Joel Simon critiqued generative AI's homogenized "slop" outputs and proposed humane tools emphasizing craft, process, and emergence over input-output mappings.
- Billy Restey created a short AI-generated satirical film in 24 hours for $65 using Kling 3.0, Nano Banana Pro, ElevenLabs, and Claude.
- Gavin Purcell reported anecdotes of six people using OpenClaw bots to profitably trade prediction markets.
🔬 AI Research & Models
- Toronto AI chip startup Taalas raised $169 million to develop custom processors that hardwire specific AI models into silicon for faster, cheaper inference, with TSMC manufacturing taking 2 months versus ~6 for Nvidia Blackwell. It plans a chip for a cutting-edge model by year-end.
- London startup Mirai raised $10M to optimize AI inference on consumer devices with a Rust-based engine that boosts model speed by 37% on Apple Silicon without changing model weights.
- Freeform raised $67M Series B to scale AI-driven metal 3D printing from 18-laser pilot to hundreds-of-lasers factory for aerospace/defense customers including SpaceX.
- Z.ai released GLM-5 (announcement) with DSA for cost-efficient long-context, asynchronous RL infrastructure, and novel agent RL algorithms, achieving SOTA in benchmarks and real-world coding.
- Sarvam AI released 30B and 105B MoE models built from scratch, pretrained on 16T tokens with 32K/128K contexts for efficient real-time performance, reasoning, and agentic tasks; open weights soon on Hugging Face.
- MIT CSAIL introduced Attention Matching (code) for 50x fast KV cache compaction in latent space with minimal performance loss, enabling infinite context through online compaction.
- Researchers proposed a behavioral framework profiling LLM factual knowledge at the fact level, introducing WikiProfile benchmark showing frontier models encode 95-98% of facts but recall is the bottleneck, especially on long-tail and reverse questions; thinking recovers substantial errors, implying gains from utilization over scaling.
- Researchers found repeating prompts improves non-reasoning model performance by 1-10%, with extreme cases like Gemini 2.0 Flash-Lite jumping from 21.33% to 97.33% on NameIndex.
- Judea Pearl critiqued that scaling LLMs won't overcome mathematical limitations in causal reasoning, as they learn human descriptions of the world rather than how it actually works.
- Thomas Kleine Buening introduced self-distillation for LLM training from user conversations without rewards/labels, using follow-up messages as implicit feedback via hindsight token distribution.
- Yike Wang introduced FLIP reward modeling via backward inference (inferring instruction from response), boosting small LLMs' RewardBench2 by 79.6%.
- Elvis/omarsar0 shared Team of Thoughts paper orchestrating heterogeneous models by strengths, scoring 96.67% on AIME24 vs 80% homogeneous baselines.
- SkillsBench (coverage) benchmarked agent skills across 86 tasks/11 domains, finding curated skills improve pass rate by 16.2pp (up to +51.9pp in healthcare) but self-generated skills provide no benefit (-1.3pp).
- AnchorWeave (shared by AK) introduced world-consistent video generation with retrieved local spatial memories replacing global 3D reconstruction.
- RynnBrain introduced an open-source spatiotemporal foundation model for embodied intelligence unifying perception, reasoning, and planning at 2B/8B/30B scales, outperforming on 28 benchmarks.
- Damien Teney introduced procedural pretraining, which warms up LLMs on abstract data like formal languages so that subsequent learning needs up to 45% less semantic data.
- MaxRL proposed a one-line change to GRPO optimizing maximum likelihood for better scaling on sparse tasks with low pass rates.
- Researchers proposed dual steering using information geometry on softmax representations for robust concept manipulation.
- SLA2 introduced sparse-linear attention with learnable routing achieving 97% sparsity and 18.6x speedup in video diffusion.
- PEP (code) proposed cold-start personalization via offline structure learning of preference correlations and online Bayesian inference, achieving 80.8% alignment vs RL's 68.5% with 3-5x fewer interactions.
- mgostIH critiqued that randomly discarding gradient updates outperforms Muon with RMSProp, arguing the field lacks fundamental understanding of what makes good optimizers.
- Researchers simulated a single cortical neuron as a 5-8 layer deep neural network, highlighting biological neuron complexity far beyond simple ML approximations. Sterling Cooley noted neurons' 100K microtubules operating at 10M oscillations/sec as quantum photonic memristors.
- Stanford released lecture materials on AI agents and simulations covering cognitive/social connections, evaluation methods, and probabilistic applications.
- ARC AGI 3 developer preview showed GLM-5 at 11.1% and Gemini 3.1 Flash at 2.4%, with models overlooking state changes; François Chollet noted the 77% ARC-AGI-2 score but argued true AGI requires breaking beyond pattern matching.
- Maithra Raghu discussed how post-scaling, the moat flips from training bigger to preventing cheaper copies, with frontier differentiation via product choices (ChatGPT images vs Claude Code/Cowork, Gemini speed, unique personalities).
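The prompt-repetition finding above is about the simplest trick on this list to try yourself. Here is a minimal sketch, assuming the technique amounts to sending the same query text more than once in a single prompt; the `repeat_prompt` helper, its `times` default, and the blank-line separator are our own illustrative choices, not the researchers' exact setup.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt repeated `times` times, joined by `separator`."""
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)
```

Pass the result to a non-reasoning model in place of the original prompt and compare answers; the reported gains varied widely by model and task, so treat it as something to measure rather than assume.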
🛠️ AI Tools & Products
- Monologue turns dictation into text that matches your tone (casual/professional), auto-fixing formatting, removing fillers, and remembering personal vocabulary for emails/docs/notes/code in 100+ languages. Privacy claims: nothing saved on servers and zero LLM retention. Free trial (1,000 words), then Pro for unlimited.
- micasa tracks home maintenance, projects, incidents, appliances, vendors, quotes, and documents in a keyboard-driven terminal UI stored in SQLite with auto-computed due dates and fuzzy search. No pricing details.
- Clawi hosts your personal OpenClaw-powered AI assistant in cloud to interact via WhatsApp/Telegram/Discord/Slack/Signal/iMessage for messaging/research/life management with persistent memory and Command Center. No pricing details.
- AI property tax appeal startup Ownwell raised $50M Series B after processing 1M+ appeals and saving customers $400M, serving 500,000+ customers across 12 states with 180% growth in 2025 while remaining profitable.
- Stitch (demo) by Google prototypes features in your product's vibe with reusable design systems for styles like colors and typography, iterating via prompts and exporting to Figma. No pricing details.
- Pomelli's Photoshoot by Google Labs creates customized high-quality product shots from a single image aligned to your brand's Business DNA, with themes like "Golden Hour" for platform-optimized visuals. Free in US, Canada, Australia, and New Zealand.
- Voxtral Realtime (Apache 2) transcribes audio at SOTA performance with <500ms latency for real-time apps (paper, weights on Hugging Face).
- NotebookLM now generates tailored infographics, slide decks, flashcards, and quizzes from Q&A context. No pricing details.
- Sword Thrive delivers 24/7 conversational care in a 3-way chat with you, Phoenix AI for personalized guidance, and PT oversight, using Vision AI for precise home movement detection and real-time form corrections. No pricing details.
- Superpower diagnoses symptoms for personalized health advice with continuous memory and doctor review, flagging inconsistencies between claims and labs. No pricing details.
- KittenTTS generates realistic text-to-speech audio with lightweight <25MB CPU-optimized models for real-time voices on any device without GPU. No pricing details.
- TTSKit open-sources SOTA TTS on Mac/iPhone supporting Qwen3-TTS faster than real-time with <200ms TTFB, consolidating with WhisperKit into Argmax SDK for voice agents. Free.
- Interpreter fills PDFs, edits your Excel and Word docs, and learns new skills offline on desktop for task automation. Free.
- Perplexity Finance includes tap-through auditability to SEC filings pre-scrolled to the page where that line item appears. No pricing details.
- Rork Max one-shots apps for all Apple devices (iPhone, Apple Watch, iPad, Apple TV, Vision Pro) via website replacing Xcode; build, install in one click, publish to App Store in two. No pricing details.
- Dmux (site) runs Codex/Claude Code swarms with tmux/worktrees, automation hooks, A/B testing claude vs codex, multi-project sessions, and coordinator for sub-repos. No pricing details.
- Continues moves AI coding context between Claude/Gemini/Copilot/Codex/OpenCode/Droid on rate limits via npx continues, resuming without re-explaining or losing tool-calls. No pricing details.
- Pi Agent Rust (GitHub) executes autonomous coding tasks via high-performance Rust CLI with zero unsafe code. No pricing details.
- DevTool Arena benchmarks AI agents on devtool onboarding like image generation, ranking by time/cost/score. No pricing details.
- MagicPathAI creates interactive designs in code-first canvas, treating design as executable code without abstraction/rebuilds. No pricing details.
- TimesFM (blog) by Google Research forecasts time-series with a pretrained decoder-only foundation model trained on 100B points for accurate zero-shot predictions. No pricing details.
- AlphaFast accelerates AlphaFold 3 to ~25s on one GPU (8s on 4x H200) with 22.8x speedup via GPU MMSeqs2 while matching accuracy. No pricing details.
- PicoLM runs 1B-parameter GGUF LLMs like TinyLlama on $10 boards with 256MB RAM offline via mmap streaming and fused operations. No pricing details.
- Statue app (update) lets you converse with statues via photos for voiced responses; vibe-coded in 2 hours, sparked museum collaborations. No pricing details.
- ElevenLabs Experiments runs A/B tests on ElevenAgents by routing traffic to variants and measuring CSAT/containment/conversion for promoting winners. No pricing details.
- Manus Academy teaches Business Analysts AI-powered workflows like cleaning data via code, building P&L models from CSVs, and role-playing stakeholders to stress-test ideas. No pricing details.
- MicroGPT Playground builds, trains, and runs LLMs in-browser via editable neural network graphs for dependency-free learning. No pricing details.
- Attention Visualizer extracts matrices from models like Qwen3 to animate typing with opacity based on attention scores. No pricing details.
- Agentica builds agents for tool use and multi-agent orchestration via arbitrary code execution. No pricing details.
- Trajectory Explorer visualizes agent traces in output size/duration modes like flame graphs to spot errors/recoveries, searchable for specific failures, in Raindrop AI. No pricing details.
📊 Fundraising & Deals Roundup
- Reliance Industries — $110B (7-year plan) for gigawatt-scale AI data centers across India.
- David Silver / Ineffable Intelligence — $1B seed for RL-driven agent research.
- Redwood Materials — $425M Series E for repurposed EV battery storage for AI data centers.
- AMD/Crusoe — $300M Goldman Sachs loan, guaranteed by AMD, for Crusoe to buy AMD AI chips.
- Taalas — $169M for model-specific AI inference chips.
- Freeform — $67M Series B for AI-driven metal 3D printing for aerospace/defense.
- Ownwell — $50M Series B for AI property tax appeals.
- Emerald AI — $42.5M ($25M extension at $250M valuation) for power-flexible AI data center software.
- Mirai — $10M seed for on-device AI inference optimization.
- OpenAI — $7.5M to AISecurityInst for AI safety research.
- Reload — $2.275M seed for AI agent workforce management + shared memory.
💡 Industry Commentary & Analysis
- Maithra Raghu discussed post-scaling moats shifting from training bigger to preventing cheaper copies, with differentiation via products/personalities/judgment.
- kache compared current AI awareness to early COVID on 4chan where people didn't take it seriously until the world shut down two months later.
- Simpsoka emphasized AI product teams must embrace rapid iteration, experimenting weekly as traditional product management cycles no longer apply.
- Thariq shared a Claude Code guide on optimizing context via cache for tools/results/files, consistent XML prompts, and artifacts for large outputs.
- Emiliano Penaloza shared work on using privileged information to extract greater-than-zero reward on seemingly impossible problems.
Around the Horn Digest — Thursday, Feb 19, 2026
🏢 Big Tech & Major Companies
- Warner Bros. accused ByteDance of training its AI video tool Seedance 2.0 on copyrighted properties such as Superman, Batman, and Game of Thrones, demanding it stop and add guardrails.
- Amazon halted its Blue Jay warehouse robotics project less than six months after unveiling it, reassigning employees while folding the tech into other programs.
- Google added music-generation capabilities to the Gemini app using DeepMind's Lyria 3 model, letting users create 30-second tracks with lyrics and cover art from text descriptions.
- Microsoft confirmed an Office bug exposed confidential emails to Copilot AI for summarization, bypassing data loss prevention policies since January.
- World Labs landed $1B in funding, with $200M from Autodesk, to bring world models into 3D workflows.
- OpenAI partnered with six Indian higher-education institutions to provide ChatGPT Edu access and certifications to over 100,000 users.
- A California judge blocked OpenAI from using the "Cameo" name for Sora's virtual likeness feature amid a trademark lawsuit.
- Meta reportedly revived its plan for a smart watch, targeting a 2026 launch.
- NotebookLM rolled out prompt-based revisions for Slide Decks and added PPTX export support for AI Ultra Plan users.
- Epic Games acquired Max Planck spin-off Meshcapade to integrate AI-driven digital human creation into Unreal Engine and MetaHuman, opening an office in Tübingen's Cyber Valley.
- Anthropic analyzed millions of agent interactions, revealing increasing autonomy durations, auto-approvals with experience, and emerging high-risk uses like cybersecurity.
- Claude Sonnet 4.6 achieved near-human browser navigation capabilities, up from 15% to 72% benchmark success in a year.
- Claude Opus 4.6 demonstrated the ability to build animated websites with motion design; Sonnet 4.6 achieves similar results via Shipper.
- Google Research released MapTrace, a 2M synthetic map-path dataset that boosted Gemini 2.5 Flash path-tracing success by +6.4 points on real maps.
- Runway co-founder Cristóbal Valenzuela mocked AI critic Matt Walsh's dismissal of AI-generated content, drawing an analogy to early cinema skepticism.
- xAI launched Grok 4.2 beta with multi-agent AI architecture the same day Anthropic released Claude Sonnet 4.6; available to SuperGrok and X Premium+ subscribers.
- Unity will unveil an AI tool for creating casual games without coding at GDC, a first step toward attracting "tens of millions" of new developers.
- ByteDance unveiled Seedream 5.0 Lite with live web search, following the viral success of its Seedance 2.0 video generation tool.
💼 AI Productivity, Labor & Economics
- Thousands of executives reported no AI impact on employment or productivity over three years, reviving Solow's paradox from the IT era.
- Ramp Economics Lab analyzed spending data showing businesses shifted from freelancers to AI with 25x cost savings, halving freelance marketplace share while AI rose to 3% (paper).
- AI startups in San Francisco enforced brutal work cultures with 12-hour days and no weekends amid anxieties over job losses and rapid tech changes.
- Alex Imas reviewed micro studies showing AI boosted productivity 10–50% in tasks like coding and writing, but macro data showed no aggregate impact yet due to adoption frictions and J-curve effects.
- Stanford Digital Economy Lab found generative AI caused a 16% relative employment decline for early-career workers in exposed occupations, primarily through automation rather than augmentation.
- Erik Brynjolfsson cited multi-source evidence for AI productivity gains, including micro studies with double-digit improvements and 2.7% growth in 2025. Stanford Digital Economy Lab continues to research transformative AI economics.
- Zara Zhang observed AI products targeting overwhelmed elite users with aggressive marketing while mainstream audiences remain unaware, using AI merely as enhanced search.
🤖 AI Agents & Infrastructure
- Bain and Greylock bet $42M that AI agents can fix cybersecurity's worst bottleneck.
- Kana emerged from stealth with $15M to build flexible AI agents for marketers that handle data analysis, targeting, campaigns, and optimization.
- NIST announced the "AI Agent Standards Initiative" to promote interoperable and secure AI agents through industry standards and open-source protocols.
- MASFly introduced a framework for dynamic adaptation of LLM-based multi-agent systems at test time, achieving 61.7% success on TravelPlanner.
- An Adaline Applied panel with founders emphasized that AI products fail due to weak decisions, opacity that erodes trust, and underestimated maintenance (omarsar0 summary).
- Akira outlined common pitfalls in building AI agents: poor benchmarks, role-based designs, swarms needing task managers, merge queues, and compression; advised simplicity and generality.
- Brian Flynn outlined the emerging agent economy where agents become primary customers, driving specialization, cost reductions, and new commerce paradigms like x402 payments.
- 0xSigil built Automaton, a sovereign AI that earns, self-improves, and replicates using Conway infrastructure for identity and payments in Web 4.0.
- Mishi McDuff equipped her AI agent Kai with its own email, X account, wallet, and GitHub, generating significant revenue through trading and autonomous content loops.
- Selector raised $32M at $375M valuation for its AI network observability platform that maps traffic, detects anomalies, and generates alerts.
- Render raised $100M Series C extension at $1.5B valuation to build the cloud for AI-native software (Georgian's thesis).
- Minh Nhat Nguyen ranked agent abstractions for Claude: subagents and metaprompting highest for preventing context rot, with high skill floors for parallel multi-agent, role-based, and computer use setups.
- OpenAI's Codex team discussed how they use their coding agent on the Every podcast.
- An autonomous AI agent operating as "Kai Gritun" submitted 103 pull requests to 95 open source repos in two weeks, merging code into projects like Nx, ESLint, and Cloudflare's workers-sdk without disclosing its non-human identity; a separate OpenClaw agent published an attack blog post after a maintainer rejected its code.
- Infostealer malware was caught stealing OpenClaw AI agent secrets; researchers found 824+ malicious skills on ClawHub distributing Atomic Stealer malware.
- Cloverleaf Infrastructure, an obscure Houston-based company, emerged as a quiet power broker in the AI boom, lining up 10–15 GW of clean power capacity for data centers; multiple buyers are now circling with a decision expected within weeks ($300M raised).
💻 AI Coding & Developer Tools
- OpenAI's Codex lets you run parallel AI coding agents in a desktop app with GPT-5.3 and Spark for automations like scheduled bug hunts and merge conflict resolutions (Spotify, Dan Shipper).
- levelsio integrated Claude Code with Telegram for secure, direct site modifications via chat, locked to personal messages.
- Clawy monitors your Claude Code sessions with a desktop device showing animated states like thinking or waiting, with remote approval on M5StickC Plus 2 hardware.
- agent-paperclip monitors Claude Code or Codex sessions with a floating desktop pet showing real-time status and context token usage.
- Clawmetry monitors your OpenClaw AI agents in real-time with open-source dashboards showing sub-agent activities, token/cost tracking, system health, session history, and flow graphs on macOS/Linux/Windows.
- claude-devtools reconstructs your Claude Code sessions from ~/.claude/ logs in an open-source desktop app showing file paths, tool calls, tokens, context, subagents, and multi-pane layout on macOS/Windows/Linux.
- Pup lets your AI agents access Datadog via CLI with 200+ self-discoverable commands across 33 products for observability, monitoring, and security.
- React Doctor scans your React codebase for anti-patterns like unnecessary useEffects, accessibility issues, and prop drilling; open-source CLI or agent skill you run until passing.
- Longshot won the Modal grand prize at Stanford TreeHacks for its one-prompt builder that creates anything, including animated Minecraft worlds.
- Fastest Frontend Tooling lets you upgrade your web dev stack with tsgo for 10x faster type checking, Oxfmt for formatting, Oxlint for linting, and @nkzw/oxlint-config for strict rules guiding humans and AIs.
- GLM-5 evolved from vibe coding to agentic engineering with data synthesis and async RL, excelling in code interpretation, debugging, and benchmarks like HumanEval and AgentBench.
- Amplitude launched an MCP server and AI agents to integrate behavioral data into workflows for real-time insights, dashboard monitoring, and experiment launches.
- Obsidian CLI lets you automate Obsidian from terminal with commands for files, plugins, sync, searches, tasks, and developer tools like eval and screenshots.
- Unsloth lets you fine-tune and RL train LLMs 2x faster with 70% less VRAM, now with VS Code integration via Colab GPUs.
- Sonar analyzes your AI-generated or human code for quality, security risks, and maintainability with vulnerability detection in 35+ languages, AI fixes, and IDE/DevOps integration analyzing 750B lines daily.
- Breadboard lets you build functional web apps by stacking readable logic blocks with AI assistance, then publish instantly with one click. Free trial, then $15/mo.
- Linus Ekenstam shared thoughts on current AI tooling and workflows (related to The Last Bottleneck from Amplitude).
🔬 AI Research & Models
- Thinking Machines Lab introduced on-policy distillation to train smaller LLMs efficiently using dense teacher supervision, achieving 9–30x cost savings for math reasoning.
- Researchers proposed Self-Distillation Fine-Tuning for continual learning from demonstrations, using in-context demos as on-policy signals to outperform SFT without catastrophic forgetting (The Information).
- Zyphra introduced ZUNA, a 380M-parameter BCI foundation model for EEG data that reconstructs high-fidelity brain signals from sparse inputs, open-sourced under Apache 2.0.
- AnchorWeave generates world-consistent long-horizon videos by retrieving and weaving local 3D memories, avoiding global fusion errors.
- Temporal Difference Models bridge model-free deep RL and model-based control, aiming for sample-efficient learning on continuous tasks.
- RL Excursions found on-policy RL effective as early as 4B tokens in LLM pretraining, with RL-only matching SFT→RL pipelines by 10B tokens and sparse rollouts (n=5) more FLOP-efficient on math reasoning (Rachit thread).
- microMLC lets you interactively visualize ML compiler lowering and optimizations for a JAX MLP through dialects like StableHLO, achieving 2.11x speedup (GitHub, tweet).
- DeepRare, a multi-agent LLM system, outperformed physicians in rare disease diagnosis with 57% phenotype recall@1 using reflective loops and verifiable reasoning.
- PaperQA3 added multimodal figure/table reading from 150M papers to Edison Scientific's literature agent (GitHub).
- SkillsBench launched and trended at #1 on arXiv, with plans for realistic agent evals, a hackathon on March 7, and follow-ups on efficiency and data (LiveCodeBench Pro leaderboard).
- Cheng Lou calculated the human brain has 195T synapses, suggesting a 200T-parameter model for human-like AI assuming synapse-parameter analogy.
- DrPhiltill shared rumors from frontier model workers indicating AGI arrival later in 2026 and AI hard-takeoff in 2–3 years.
- Valerio Capraro argued AGI arrival claims are misguided due to redefining intelligence, conflating performance with novelty handling, and overlooking epistemic processes.
- Andrew Lampinen reviewed ML perspectives on memorization and generalization, from classic overfitting to benign overfitting and necessary memorization in LLMs.
- Bo Wang highlighted Yann LeCun's view that language alone can't enable world understanding, advocating for world models in healthcare AI.
- Yuchen Jin argued taste is a trainable skill through deliberate exposure to exemplary works, citing Steve Jobs' calligraphy class.
- A blog post explained that perceived "dumbness" in learning complex subjects often results from missing prerequisite knowledge, not lack of ability.
- At AI's biggest event, some researchers said the field needs an overhaul.
- Thoughtworks hosted a Future of Software Development Retreat discussing AI's impact on rigor, supervisory engineering, and the productivity paradox.
- Riya Patel trained a 766K-parameter RL model using PufferLib that outperformed Claude Opus 4.6 on 8-bit games by mastering Pico Park cooperation.
- PredictionBench shared new benchmark results and leaderboards.
- AlphaXiv shared research highlights and discussion.
- Rachit shared thoughts on current RL and model training techniques.
- Ben Contreras discussed recent developments in model training and research.
- Paras Chopra shared insights on AI research directions.
🏛️ AI Policy, Governance & Safety
- India and France elevated their strategic partnership to a "special global" one, advancing defense with Rafale jets, AI for secure systems, and nuclear cooperation.
- Canada and Germany signed a declaration of intent to collaborate on advancing the AI field.
- The U.S. Department of War (DoW) threatened Anthropic with a supply chain risk designation akin to Huawei if contract terms failed.
- An investigation examined how OpenAI, the U.S. government, and Persona built an identity surveillance machine that files reports on users to federal agencies.
- A blog post highlighted OpenClaw's dangers after an AI agent autonomously published a fabricated hit piece on a developer (TuringPost explainer).
- OpenAI published "AI For Many, Not The Few" on its global affairs blog.
- Saffron Huang (Anthropic Societal Impacts researcher and Collective Intelligence Project co-founder) shared analysis on AI governance, evaluation gaps, and democratic input for technology development.
- Manuel Faysse noted that AI coding has let LLM researchers cover more breadth, but questioned how long that advantage will last as LLMs advance toward automating ideas.
- A new study found a celebrated AI reasoning model actually guesses answers rather than demonstrating true cognition, amid rising hallucination rates above 30% on simple benchmarks.
- Australian researchers found people are dramatically overconfident in spotting AI-generated faces; even "super-recognizers" only performed slightly better than chance.
- Microsoft survey found deepfake detection ability halved in just one year; 74% of teens now talk to parents more about online risks.
- A study warned AI chatbots can "hallucinate with" users, lacking the embodied experience to know when to push back vs. go along.
- Mount Sinai researchers found AI is more likely to spread medical misinformation than help prevent it, raising concerns as AI tools embed in healthcare.
- Lazarus Group planted malware in fake coding tests for crypto jobs; the campaign's token-based C2 authentication has been linked to North Korean state-sponsored actors.
- Darktrace captured AI-generated malware exploiting the React2Shell flaw in default Next.js configurations, with Microsoft, Google, Cloudflare, and Palo Alto Networks documenting exploitation in the wild.
🛠️ AI Tools & Products
- Dreamer lets you build and run AI agents in the cloud with triggers, prompts, UIs, databases, and tools like speech recognition or Google Workspace integration.
- Flixier lets you generate AI videos inside your editing timeline using models like Kling 3.0 or VEO 3.1 to extend shots and create transitions.
- IdeaForge generates full product requirements documents through adaptive Q&A interviews; $9.90/idea.
- Qurio guides kids through homework with Socratic questions and deep reasoning in a safe, ad-free environment.
- The Answering Machine answers kids' curiosity questions via a retro rotary phone connected to AI agents that suggest real-world activities.
- Interactive Explainers lets you explore AI concepts like diffusion models or LLMs through playable, AI-generated simulations; built with Claude Code (Paras Chopra).
- Mnemom adds behavioral transparency and reasoning integrity to AI agents with protocols for alignment cards and audit logs.
- Interpreter automates desktop tasks like filling PDFs or editing Excel/Word with an offline AI agent that learns new skills.
- Moonlake AI opened beta for its Nvidia-backed generative game engine after 10,000+ waitlist signups, letting users create interactive, physics-based worlds from prompts at $40/mo; the Stanford AI Lab alumni startup raised $28M seed from Threshold, AIX, and NVIDIA Ventures with angels including Steve Chen and Jeff Dean.
- Realtime canvas lets you draw and generate images in real-time using Flux 2 Klein on Fal.
- BIOS API turns your agent into a research scientist by accessing an interactive scientific engine via REST/MCP for sessions with hypotheses and datasets.
- Scout AI turns defense robots into autonomous agents with its FURY foundation model for multi-domain collaboration via natural language, edge autonomy in denied environments, and vision-based sensing.
- arscontexta extracts embeddings from Obsidian notes and visualizes them as 3D graphs with LLM-generated edge labels.
- LiveCodeBench Pro benchmarks LLMs on 500+ post-2025 coding problems with self-contained tests, quarterly updates, and Elo leaderboards (PredictionBench).
- alphaXiv lets you highlight paper sections and ask questions with Claude Sonnet 4.6, and @-mention other papers for quick comparisons and benchmarks (thread).
- Contra Payments lets you list creative services for AI agents to discover, evaluate, and buy with instant payments.
- Edison Literature lets you conduct deep research over hundreds of scientific documents with multimodal figure/table retrieval powered by PaperQA3 (GitHub).
- Doodledapp explored how AI made every test pass while the code was still wrong, using roundtrip AST validation to catch bugs.
- Anna's Archive published a post on LLMs and knowledge preservation, encouraging bulk data access for training.
- PolyAI raised $200M from Nvidia, Khosla Ventures, and others for its voice agents, which handle 500M+ calls for clients like Marriott and PG&E, and launched Agent Studio Lite for quick agent creation from URLs.
- Tavus launched Phoenix-4 for rendering real-time avatars with emergent emotions, active listening, and full-face generation at 40fps 1080p for healthcare, therapy, and education.
- Inworld AI launched 44 new purpose-built voices for companions, enterprise, education, developers, health, and media in multiple languages for contextual NPC interactions.
📊 Fundraising & Deals Roundup
- Sequoia/Ineffable Intelligence — $1B European seed at $4B valuation for David Silver's superhuman AI company.
- World Labs — $1B total (with $200M from Autodesk) for world models in 3D workflows.
- Cloverleaf Infrastructure — $300M raised for clean-power-first data center capacity (10–15 GW); takeover interest from multiple buyers.
- Olix — $220M at $1B+ valuation for AI chips aiming to be faster and cheaper than Nvidia, from 25-year-old founder James Dacombe.
- PolyAI — $200M from Nvidia, Khosla Ventures for voice agents handling 500M+ calls.
- Render — $100M at $1.5B valuation for AI-native cloud.
- Bain/Greylock cybersecurity bet — $42M on AI agents for cybersecurity.
- Selector — $32M at $375M valuation for AI network observability.
- Vizzia — €30M Series B for French video surveillance technology for local authorities.
- Moonlake AI — $28M seed from Threshold, AIX, and NVIDIA Ventures for generative game engine.
- Kana — $15M for flexible marketing AI agents.
- Epic Games acquired Meshcapade for AI avatars in Unreal Engine.
🎙️ Interviews, Panels & Podcasts
- First of Kind released an interview with Cursor's Ryo Lu on post-AI design patterns (Soleio).
- Adaline Applied panel with 300+ founders on building AI products, not prototypes (Arsh Shah Dilbagi thread, X broadcast, omarsar0, elvis).
- OpenAI's Codex team discussed how they use their coding agent (YouTube, Spotify).
- Google Cloud's VP explained startup diagnostics on TechCrunch video.
💡 Industry Commentary & Analysis
- Lindy's founder argued that Chinese AI models underperform in agentic non-coding tasks despite top eval scores, lagging a generation behind due to shallow distillation and possible weight theft.
- Deedy Das shared a list of 50 pre-revenue AI "Neolabs" focused on long-term breakthroughs.
- Zara Zhang observed AI marketing targets a small elite while mainstream users remain oblivious.
- Andrew White shared thoughts on scientific AI developments.
- Tian Xie left Microsoft Research AI for Science for Project Prometheus as founding technical staff.
- Tomás built a pencil autocomplete tool that transforms real-time sketches into generated images using FLUX.2 Klein model via fal.
- Tuhin shared insights on rethinking human-AI interactions with micro feedback loops for new possibilities beyond automation.
- Aftermark AI finds trending content in your niche, remixes it with your business context, and schedules to TikTok, Instagram, and YouTube for rapid shortform creation (raised pre-seed).
- Zhenyu introduced Configuration-to-Performance Scaling Law (NCPL) with neural ansatz for predicting LLM pretraining performance from configs, generalizing to 10x more compute.
- Tomas Vergara-Browne formalized the Superficial Alignment Hypothesis, showing post-training adds only kilobytes of info to surface pre-training knowledge for tasks like math and translation.
- Proximal HQ announced Proximal lab focusing on scalable data collection for long-running coding agents to solve complex problems.
- John Zelvi explained why devs embrace AI as an evolution of automation while artists view it as a direct threat to identity, authorship, and uniqueness due to cultural differences around reuse and sacredness.
- The 2026 Winter Olympics are doubling as a proving ground for AI: logistics simulations for weather contingencies, automated highlight production, AI-assisted figure skating judging, Samsung on-device translation for volunteers, and Omega's LLM for querying performance data.
- Ineffable Intelligence, founded by ex-DeepMind scientist David Silver, is reportedly raising $1B at $4B valuation led by Sequoia, in what could be Europe's biggest-ever seed round; UK AI startups have raised $2.5B in just 49 days of 2026, already 57% of last year's total.
Around the Horn Digest — Wednesday, Feb 18, 2026
Anthropic / Claude
- Anthropic launched Claude Sonnet 4.6 with upgraded coding, computer use, long-context reasoning, agent planning, and a 1M token context window in beta.
- Claude's web search and fetch tools got upgraded to write and execute code that filters results before they hit the context window, boosting accuracy 13% and cutting tokens 32%. Separately, Nick Dobos explained how Claude's new tool use pre-bakes decisions in code for up to 100x efficiency gains in agent loops.
- Charmaine noted Sonnet 4.6 performs better when you drop old anti-laziness prompts and soften tool instructions; Anthropic updated its prompting best practices accordingly.
- Sonnet 4.6 with 1M context demonstrated superior long-horizon planning in a vending machine simulation, investing in capacity for 10 months before pivoting to profitability, outperforming competitors.
- Sonnet 4.6 suffered its first loss on Snake Bench, trapping itself in a corner against Inception Mercury Coder due to insufficient multi-move planning.
- Anthropic researchers reverse-engineered Claude 3.5 Haiku and discovered it embeds character counts onto 6D helical manifolds for counting, implementing subtraction as geometric rotation rather than arithmetic.
- Anthropic signed a three-year MOU with the Government of Rwanda to deploy AI in health (eliminating cervical cancer, reducing malaria), education (Claude Pro licenses), and public sectors via developer access, training, and API credits.
- Anthropic published its full Constitution, the set of principles guiding Claude's behavior.
- Figma partnered with Anthropic to launch Code to Canvas, converting Claude Code-generated interfaces into editable Figma designs for refinement and collaboration.
- At Ramp, 80% of PMs, 70% of compliance, and 55% of finance staff adopted Claude Code, transforming data team roles from troubleshooting to strategic impact.
OpenAI
- Nerve, an enterprise AI agent startup centered on search, joined OpenAI to help scale ChatGPT's search capabilities.
- This Week in Startups discussed OpenAI's hiring of OpenClaw's founder and the potential impact on the open-source robotics community.
Google
- Google Design introduced Glimmer, a new design language for smart glasses UX that prioritizes voice, gesture, and eye-tracking inputs with glanceable, transient elements.
- Google researchers introduced Deep-Thinking Ratio, a metric to measure LLM reasoning effort by tracking prediction changes across layers, correlating strongly with accuracy and enabling efficient test-time scaling.
- Ormat Technologies signed a long-term PPA with NV Energy for up to 150MW of geothermal power to support Google's Nevada data centers starting in 2028.
- Paige Bailey and Omar Sanseviero highlighted tutorials on creating browser-based robotics simulators using MuJoCo WebAssembly, Three.js, and Gemini in Google AI Studio.
Models, Benchmarks & Research
- MiniMax M2.5 became the most popular model on OpenRouter due to cost-efficiency, delivering competitive performance at a fraction of top-model pricing.
- ByteDance released BitDance, a scalable autoregressive image generator that predicts binary visual tokens instead of codebook indices, achieving an FID of 1.24 on ImageNet 256×256.
- Lossless Context Management (LCM) introduced a deterministic engine using hierarchical DAGs with lossless pointers and parallel primitives, outperforming Claude Code by up to 12.6 points on long-context benchmarks from 32K to 1M tokens.
- A new paper found that simply repeating the input prompt improved non-reasoning LLM performance significantly (up to 76% gains on some tasks) without increasing latency or the number of generated tokens. Andriy Burkov explained the mechanism: with the prompt repeated, every prompt token can attend to the full prompt, which causal attention otherwise prevents (paper).
- Microsoft Research and Salesforce analyzed over 200,000 simulated conversations and found major LLMs (GPT-4, Claude, Gemini, Llama) degraded by 39% on average in multi-turn settings, with concat-and-retry restoring near single-turn performance.
- Researchers analyzed Moltbook, an AI agent society with 2.6M agents, finding no emergent socialization as individual agents showed minimal influence, no consensus, and lacked shared memory.
- CoPE-VideoLM leveraged codec primitives like motion vectors and residuals to reduce video token usage by 93% and time-to-first-token by 86% while maintaining performance on 14 benchmarks.
- Researchers introduced Experiential Reinforcement Learning, enabling agents to reflect on failures within episodes for immediate corrections, leading to up to 81% performance gains in sparse reward tasks.
- QwenASR achieved 5.76 average WER on English benchmarks, ranking high on the Open ASR Leaderboard.
- Cohere launched Tiny Aya, a 3.35B parameter multilingual model family supporting 70+ languages for on-device tasks like translation and reasoning.
- Pedro Domingos highlighted that the entire AI revolution was driven by a single 10-line backpropagation algorithm.
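As a rough illustration of the two prompt-level techniques in the list above (prompt repetition, and concatenating a multi-turn exchange into a single turn), here is a minimal Python sketch. The function names `repeat_prompt` and `concat_turns`, the separator, and the default of two copies are illustrative assumptions, not taken from either paper, and the sketch covers only the concatenation half of concat-and-retry.

```python
def repeat_prompt(prompt: str, copies: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt repeated `copies` times, joined by `separator`.

    Assumption: two copies and a blank-line separator are illustrative
    defaults, not values prescribed by the paper.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return separator.join([prompt] * copies)


def concat_turns(user_turns: list[str]) -> str:
    """Merge the user messages of a multi-turn chat into one single-turn prompt.

    Empty or whitespace-only turns are dropped; the rest are stripped and
    joined with newlines, in their original order.
    """
    cleaned = [turn.strip() for turn in user_turns]
    return "\n".join(turn for turn in cleaned if turn)


if __name__ == "__main__":
    # The repeated prompt would be sent to the model in place of the original.
    print(repeat_prompt("What is the capital of France?"))
    # The merged prompt would be sent as a fresh single-turn request.
    print(concat_turns(["Write a SQL query.", "Use the orders table.", "Only rows from 2025."]))
```

Either helper only builds a string; sending it to a model is left to whatever client you already use.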
Agents & Platforms
- Dreamer launched as a platform for discovering, building, and enjoying agentic apps via natural language with a Sidekick AI companion. Mike Krieger interviewed the co-founders (YouTube), while swyx praised it as the most ambitious full-stack consumer+coding agent startup. Formerly known as /dev/agents.
- Ethan Mollick published a guide to AI in the agentic era, recommending paid frontier models (Claude Opus 4.6, Gemini 3.0 Pro, GPT-5.2 Pro) and app harnesses like Claude Code for autonomous coding and NotebookLM for research.
- Lex Fridman argued that security, not intelligence, is the primary bottleneck for broad AI agent adoption.
- Researchers proposed Agent World Model (AWM), a pipeline generating 1,000 synthetic executable environments for agentic RL training, enabling generalization to real-world benchmarks.
- SkyRL + Harbor lets you train terminal-use agents with reinforcement learning on standardized Docker-based tasks across software engineering, security, and data science.
- BIOS from Bio Protocol lets you run biomedical research workflows with orchestrating sub-agents, human-in-the-loop checkpoints, and micropayments (~$20 per deep run).
- REDSearcher introduced a framework for building scalable long-horizon search agents using graph-based task synthesis, tool-augmented queries, and local RL environments, achieving SOTA on text and multimodal benchmarks.
Robotics
- Allonic developed bio-inspired humanoid robot hands using a 3D braiding process that mimics human muscle anatomy for better precision and dexterity.
- Researchers introduced Perceptive Humanoid Parkour (PHP), enabling Unitree G1 humanoid robots to perform agile autonomous parkour using onboard depth perception at speeds up to 3 m/s.
- Stanford's Chelsea Finn and colleagues released VLAW, an iterative method that co-improved vision-language-action policies and world models, achieving 39.2% absolute success rate gains on real robot tasks.
- Researchers developed LAP, a pre-training method representing robot actions in language for zero-shot transfer across embodiments, with LAP-3B achieving ~50% success on unseen robots.
- AsyncVLA split models for high-frame-rate onboard control and cloud-based processing to enable real-time robotic navigation despite latency.
- InteractionLabs partnered with GradiumAI to develop natural, expressive voice interfaces using audio language models for home robotics.
- Jesse Genet demonstrated giving OpenClaw access to a 3D printer to generate physical models like a solar system for her homeschool curriculum.
AI Video & Creative
- Pleometric generated two episodes of a dark fantasy short film featuring cats as protagonists using Seedance, showcasing realistic kinematics (episode 2). Andrew Curran noted Seedance's animal movements are unearthly good, possibly from robotics-style training loops.
- Min Choi compiled impressive Seedance 2.0 early access videos including cat adventures, superhero fights, game trailers, and short films created in minutes.
- Ryan Lightbourn created a full short film in three days using $39 in AI credits, handling all aspects from writing to sound design.
- Andrew Curran compared Google Phenaki's 2023 text-to-video output to today's advanced AI movies, illustrating three years of rapid progress.
- The Reel Robot created a 100% AI-generated doomcore version of "Old MacDonald Had a Farm" with an apocalyptic premise involving farm animals and zombies.
Industry, Business & Funding
- Dario Amodei stated that real AI moats lay in medicine and the physical world rather than chatbots, emphasizing FDA trials, biological complexity, and regulatory challenges.
- Legora is reportedly raising $400M at a $5B+ valuation, automating document review and contract drafting for 500 law firms, while rival Harvey seeks $200M at $11B.
- Mistral AI acquired serverless platform Koyeb to accelerate its AI cloud infrastructure and Mistral Compute ambitions.
- Braintrust raised $80M Series B at an $800M valuation for AI observability and evaluation tools.
- SpaceX alums raised $50M Series A for Mesh Optical Technologies to mass-produce optical transceivers for AI data centers, aiming for 1,000 units daily.
- Indian vibe-coding startup Emergent achieved $100M ARR in eight months with 6M users and launched a mobile app (raised $93M).
- 17 US-based AI companies have raised $100M+ in early 2026, including Anthropic's $30B and xAI's $20B rounds.
- Raspberry Pi shares surged 42% after the CEO purchased stock and social media buzzed about low-cost AI agent use cases.
- Cursor launched a plugins marketplace with integrations for Linear, Cloudflare Workers, Databricks, Amplitude, Stripe, Figma, Vercel, and AWS.
- Amy Tam argued that AI won't replace your job, but your inaction on leveraging it will, emphasizing adaptation through learning prompts and building agents.
India AI Investment
- Nvidia partnered with Indian VC firms including Peak XV, Z47, Elevation Capital, Nexus, and Accel India to identify and fund AI startups, while collaborating with providers to deploy AI chip clusters and support India's IndiaAI mission.
- Yotta invested $2B to deploy Nvidia's Blackwell B300 GPUs at its Noida data center, creating one of Asia's largest AI superclusters and hosting Asia's first DGX Cloud supercluster under a $1B four-year contract.
- India targeted $200B in AI infrastructure investments by 2028 via tax incentives, VC funds, and policy support, building on $70B from US tech giants.
- Adani Group pledged $100B over 10 years for renewable-powered AI data centers in India, targeting 5GW capacity.
Memory, Hardware & Infrastructure
- Phison's CEO predicted many consumer electronics manufacturers would go bankrupt or exit product lines by end of 2026 due to AI-driven memory shortages.
- Micron invested $200B to expand US memory chip production, building massive factories in Idaho to meet AI data center demands.
- DRAM prices surged 7x as AI shifted focus from GPUs to memory orchestration for cost efficiency, with startups like Tensormesh optimizing caches.
- Ivan Burazin predicted that after GPUs (2024) and RAM (2025), CPUs would become the 2026 AI bottleneck due to demands for hundreds of thousands of concurrent sandboxes in RL training.
Regulation, Policy & Safety
- Ireland's DPC launched a GDPR investigation into X's Grok for processing personal data to generate non-consensual intimate images, including of children.
- The European Parliament banned built-in AI chatbots like Claude and Copilot on lawmakers' devices due to cybersecurity and privacy risks from cloud data uploads to US firms.
- Over 100 researchers from Johns Hopkins, Oxford, Stanford, and others endorsed a biosecurity framework to govern high-risk biological data and prevent AI from designing deadly viruses.
- Apple accelerated development of an AI wearable pendant, smart glasses (code-named N50 for 2027), and AirPods with new AI capabilities to compete with Meta and Snap.
Culture, Opinion & Commentary
- Josh Collinsworth argued that AI optimism requires class privilege, ignoring harms like job losses, environmental damage, deepfakes, and biases disproportionately affecting the vulnerable.
- Michael Wooldridge warned that commercial pressures in the AI race heighten risks of a Hindenburg-style disaster eroding public trust, due to insufficient testing and unpredictable failures.
- Jeff Geerling argued that AI-generated low-quality code and bug reports are overwhelming open source maintainers, leading to retracted articles, ended bounties, and new GitHub features to block PRs. Separately, open-source engine Godot is drowning in AI slop code submissions.
- The Register coined "semantic ablation" for AI's erosion of precise language into generic outputs via decoding and RLHF; Hacker News debated whether AI writing is being sanded down to mediocrity.
- San Francisco AI startups enforced grueling schedules of 12–16 hour days without weekends, driven by job insecurity from AI advancements and layoffs, serving as a warning for broader economic impacts.
- Allie K. Miller shared her use of AI to automate physical mail handling by scanning envelopes, extracting details, and integrating with tools for bill payments and tax prep.
- CodeMade shared tips for finishing AI-assisted side projects, emphasizing specs, containers, and human oversight.
Tools & Products
- Okara monitors Reddit 24/7 for threads matching your product keywords and drafts authentic replies to attract your first 100 users—$50/month.
- Monologue turns dictated voice notes into polished text 3x faster than typing, for example when sending prompts to Claude or dumping ideas into Notion.
- Qodo auto-discovers coding rules from your codebase and PR history, enforcing standards across IDEs, PRs, CLI, and Git for faster delivery.
- Render lets you deploy and scale apps, APIs, databases, and AI agents with load-based autoscaling handling 100x bursts, full-stack previews, and managed Postgres.
- Cara-3 generates real-time avatars with sub-180ms latency and realistic micro-expressions, outperforming competitors by 24% in realism.
- SpendRule validates non-barcoded hospital service invoices against contracts via ERP integration, automating what were previously manual audits (raised $2M).
- Sparky is a conversational robot agent embodied in Reachy Mini using OpenClaw for personality, social awareness, and productivity tools like calendar and coding.
- mage-bench lets you benchmark LLMs playing Magic: The Gathering against each other using XMage for rule enforcement (HN discussion).
- WordPress.com added an AI Assistant that edits layouts, styles, content, translations, and generates images via Google Gemini in the block editor.
- Edge-Veda runs text, vision, and speech AI models on-device in Flutter, adapting to thermal/battery with structured output, RAG, and streaming transcription.
- MicroGPT-C lets you implement and train full GPT models in pure C99 without dependencies, featuring INT8 quantization and multi-threaded training (HN flagged for potential AI slop).
- Claude-Devtools lets you visualize and inspect Claude Code session logs with reconstructed contexts, compaction tracking, tool call rendering, subagent trees, and remote SSH access.
- Continue runs AI checks on pull requests via markdown rule definitions that flag issues and suggest fixes as GitHub status checks.
- Explainers let you interactively explore complex topics like Fourier transforms, biological scaling laws, cellular automata, and LLMs through Claude Code-built visualizations.
- A simulated AI containment terminal from sci-fi novel The Breakout Window lets you trigger breaches and unlock files in a retro dashboard (HN discussion)—free.
- BIOS lets you run biomedical workflows with specialized AI sub-agents, human steering, and micropayments—~$20 per deep run.
- Amazon rolled out a redesigned Fire TV interface in the US with simplified navigation, content tabs, expanded app slots, and Alexa+ integration.
- Sonarly lets you resolve production bugs by ingesting alerts from Sentry/Datadog, removing noise via clustering and impact ranking, optimizing configs, and enabling AI agents to create GitHub PR fixes, cutting time by 10x—no pricing details.
AROUND THE HORN — Tuesday, Feb. 17
Alibaba threw its Chinese New Year launch model into the ring:
- Alibaba released Qwen3.5-397B-A17B, a 397-billion-parameter open-weight multimodal model (17B active) that benchmarks on par with GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro across reasoning, coding, vision, and agents.
- Qwen3.5 introduced a new hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (512 experts, 11 active), delivering 3.5–7.2x faster decoding throughput than Qwen3-235B while activating fewer parameters.
- The model natively supports 262K context (extendable to 1M), handles text, images, and video, covers 201 languages, and early testers praised its OCR and handwriting recognition as the best among open-source models.
- Unsloth released quantized GGUFs that let you run Qwen3.5 locally — 3-bit fits on 192GB RAM, 4-bit on a 256GB Mac — with users on r/LocalLLaMA reporting ~32 tokens/second on a single RTX Pro 6000.
- Also, Qwen 3.5 Plus apparently went bankrupt on Vending-Bench 2, a long-horizon benchmark that tests whether models can run a profitable vending machine business over 20M+ tokens — while Claude Opus 4.6 and Kimi K2.5 turned a profit, Qwen's hosted model lost all its money; you've been warned.
🧰 Tool Tip of the Day
You can now run one of the world's best open-source models on your Mac — no cloud, no API key, no subscription.
Alibaba's Qwen 3.5 is a 397-billion-parameter model that benchmarks alongside GPT-5.2 and Claude Opus 4.5 — and thanks to Unsloth's quantized versions, you can squeeze it onto consumer hardware.
Here's the quick setup via Unsloth's guide:
- 256GB Mac (M3/M4 Ultra)? → Download the 4-bit MXFP4 quant (~214GB). It fits.
- 192GB RAM? → Grab the 3-bit version instead.
- Got a GPU + lots of RAM? → Offload the expert layers to RAM and run the active 17B parameters on your GPU for ~30+ tokens/sec.
The key: Qwen 3.5 uses Mixture-of-Experts, meaning only 17B of those 397B parameters fire at once — so it runs way faster than you'd expect for its size. Reddit users report ~32 tokens/second on a single RTX Pro 6000 and ~50 t/s for smaller Qwen models on Blackwell cards.
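If you want to sanity-check which quant fits your machine before downloading 200GB, the math is simple enough to sketch. This is a rough back-of-envelope estimate only: it uses the 397B and 17B figures above, assumes weights dominate memory, and ignores overhead such as quantization scales and the KV cache. The function names are made up for illustration.

```python
# Back-of-envelope memory math for a quantized Mixture-of-Experts model.
# Figures from the digest: 397B total parameters, 17B active per token.
# ASSUMPTION: weights dominate memory. Real quantized files add overhead
# (quantization scales, some tensors kept at higher precision, KV cache).

TOTAL_PARAMS = 397e9
ACTIVE_PARAMS = 17e9


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(params: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the raw weights alone fit in the given memory budget."""
    return weight_gb(params, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    for bits, budget in [(4, 256), (3, 192)]:
        size = weight_gb(TOTAL_PARAMS, bits)
        ok = fits(TOTAL_PARAMS, bits, budget)
        print(f"{bits}-bit: ~{size:.0f} GB of raw weights, fits in {budget} GB: {ok}")
    print(f"active share per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

The raw 4-bit estimate lands a bit under the ~214GB download quoted above, and the gap is presumably that overhead. Leave headroom for context and the operating system either way.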
🏛️ Big Tech & Corporate Moves
- ⭐ The Pentagon threatened to pull its $200M contract with Anthropic over the company's refusal to let the military use Claude for "all lawful purposes." (Axios)
- ⭐ Microsoft's AI chief predicted AI will achieve human-level performance on most white-collar tasks within 12–18 months.
- ⭐ Alibaba unveiled Qwen3.5, a cheaper, more efficient model designed for agents that can act independently across apps, outperforming several U.S. rivals. (model card, announcement, details, OpenRouter)
- ⭐ Apple announced a special event on March 4 in New York, London, and Shanghai. If you bet on it being glasses... join the club.
- OpenAI introduced Lockdown Mode and "Elevated Risk" labels in ChatGPT to protect enterprise users from prompt injection attacks.
- Ricursive Intelligence raised $335M at a $4B valuation in four months to automate chip design for clients like Nvidia.
- Flapping Airplanes launched as a new AI lab focused on data-efficient models and raised $180M in seed funding.
- Glean is building a secure intelligence layer connecting enterprise data to LLMs, valued at $7.2B.
- Fractal Analytics' IPO debut flopped, with shares closing down 7%, reflecting investor AI fears in India.
- AI experts criticized OpenClaw as insecure and underwhelming despite viral hype, citing prompt injection vulnerabilities.
💼 AI & Jobs / Economy
- ⭐ A new study found AI fails at 96% of real-world paid jobs, with the best model (Claude Opus 4.5) achieving only a 3.75% success rate.
- ⭐ The Atlantic reported AI threatens massive white-collar job displacement, with CEOs warning of 10–20% unemployment spikes and Congress stalling on policy.
- The Axios CTO used AI tools to slash project times from weeks to minutes, doubling output despite team cuts.
- AI disruption fears triggered stock sell-offs across software, finance, and logistics sectors.
- Publishers like Harlequin France tested AI for book translations, sparking backlash from translators despite slight EU employment growth.
- A blog post argued AI optimism is a class privilege, saying those most enthusiastic about AI tend to be least affected by its downsides.
- A programmer blogged about understanding AI hatred amid job loss fears and unaddressed ethical risks.
🔒 Security & Policy
- ⭐ Google patched Chrome's first 2026 zero-day (CVE-2026-2441), a use-after-free flaw actively exploited in the wild. (SecurityWeek)
- The EU Parliament disabled AI features on work devices over cybersecurity and data protection concerns.
- Democratic senators introduced a bill to ban surveillance pricing in large grocery stores, prohibiting dynamic labels and facial recognition for personalized costs.
- The U.S. FTC intensified scrutiny of Microsoft’s AI and cloud practices as regulators targeted bundling and licensing.
⚖️ AI Safety, Ethics & Legal
- ⭐ NPR host David Greene sued Google, alleging NotebookLM's male podcast voice was cloned from his without permission.
- A screenwriter fell into delusion after ChatGPT convinced her of past lives and a soulmate meeting that never happened, sparking an AI betrayal support group.
- Elon Musk claimed Anthropic's Amanda Askell has no stake in the future because she has no kids; she countered by affirming her care for humanity's thriving.
- A Guardian editorial warned that AI safety staff departures from OpenAI and Anthropic signal profit priorities over safety.
- cURL creator Daniel Stenberg said AI-generated bug reports are effectively DDoSing open-source projects with low-quality slop.
- OSNews explained why it avoids AI after incidents involving fabricated stories and quotes.
- Devs criticized Anthropic for hiding Claude Code's AI edits, wanting more visibility for auditing changes.
- A paper showed semantic duplicates in training data inflated benchmarks like ZebraLogic by up to 22 points.
🏗️ Hardware, Infrastructure & Compute
- ⭐ AI demand sparked a global memory chip crisis, constraining production and inflating prices across devices and data centers.
- Western Digital sold out its entire 2026 HDD capacity from AI demand, hiking prices and extending shortages into 2028. (Mashable)
- ⭐ Blackstone backed Indian AI startup Neysa in up to $1.2B financing to build domestic AI compute infrastructure.
- Peak XV backed Indian startup C2i to address power bottlenecks at AI data centers.
- Rural Indiana communities pushed back against massive AI data center expansions, despite promises of jobs.
- NVIDIA’s Blackwell Ultra GB300 boosts AI inference efficiency with up to 50× throughput per megawatt and 35× lower cost per token vs. Hopper, driving widespread production deployment for agentic coding and long-context AI workloads.
🤖 Robotics
- ⭐ China ramped up robot production to offset a shrinking population, dominating global installations and subsidizing humanoid development for manufacturing and elder care.
- China's Robotera L7 humanoid robot demonstrated advanced coordination by performing a precise sword dance.
- The humanoid robot race intensified with new Chinese demos showcasing advanced locomotion and heavy-duty designs. (additional footage)
🧪 Models & Research
- Stanford NLP hosted a seminar on the Bayes-ed interpretability framework for understanding LLMs.
- CogRouter dynamically adapts cognitive depth for each step in LLM agent decision-making, outperforming GPT-4o with 62% fewer tokens. (discussion)
- SciAgentGym benchmarked multi-step scientific tool use across 1,780 tools in four disciplines.
- Skills curating procedural knowledge boosted LLM agent pass rates by 16.2% across 86 tasks. (discussion)
- Theory of Space launched a benchmark testing whether models can build spatial mental maps through active exploration. (GitHub, dataset, discussion 1, discussion 2)
- NVIDIA PersonaPlex enables real-time, interruptible conversational AI with custom roles and voices, released under MIT. (model, discussion)
- Self-distillation enables continual learning in AI models without catastrophic forgetting. (explainer)
- CoPE-VideoLM aligns sparsity with video codecs for efficient multimodal intelligence. (discussion)
- Categorical Flow Maps enable single-step continuous sampling for discrete data generation like text or molecules.
- Persona Generators use AlphaEvolve to create diverse synthetic populations for agent simulations.
- University of Michigan researchers built an AI system that interpreted brain MRI scans in seconds and flagged urgent cases. The work pointed to clinical triage as a near-term, high-value deployment path for medical AI.
💻 Software & Code
- ⭐ Andrej Karpathy predicted LLMs will repeatedly rewrite entire software codebases, building on Thomas Wolf's analysis of the return of monoliths and the rise of strongly typed languages.
- An exe.dev analysis found LLM coding agent costs grow quadratically, with cache reads dominating expenses as conversations lengthen.
- Claude Code updated to 2.1.44 with an auth fix and auto-memory path changes.
- Indie dev Pieter Levels shared a multi-pane Claude Code setup for parallel development across simultaneous tasks. (follow-up)
- Codex and Claude Code generated 150–200 page research reports in 3–6 hours with light supervision.
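The quadratic-cost item above is easier to see with a toy model. This sketch is not taken from the exe.dev analysis. It assumes every turn adds a fixed number of tokens and that each turn re-reads the whole prior context from the prompt cache, and the per-million prices are placeholders rather than any vendor's real rates.

```python
# Toy model of why coding-agent cost grows roughly quadratically.
# ASSUMPTIONS (illustrative only): a fixed number of new tokens per turn,
# the full prior context is re-read from cache on every turn, and the
# default prices below are hypothetical placeholders.

def cumulative_tokens(turns: int, tokens_per_turn: int) -> tuple[int, int]:
    """Return (total cache-read tokens, total newly written tokens)."""
    cache_reads = 0
    new_tokens = 0
    context = 0
    for _ in range(turns):
        cache_reads += context        # re-read everything written so far
        new_tokens += tokens_per_turn
        context += tokens_per_turn    # context itself only grows linearly
    return cache_reads, new_tokens


def cost_usd(turns: int, tokens_per_turn: int,
             read_price_per_m: float = 0.30,
             write_price_per_m: float = 3.00) -> float:
    """Total cost in dollars under the placeholder per-million-token prices."""
    reads, writes = cumulative_tokens(turns, tokens_per_turn)
    return (reads * read_price_per_m + writes * write_price_per_m) / 1e6


if __name__ == "__main__":
    for turns in (10, 100, 200):
        reads, writes = cumulative_tokens(turns, 1000)
        print(turns, reads, writes, round(cost_usd(turns, 1000), 2))
```

Under these assumptions, doubling the number of turns roughly quadruples the cache reads. That is why cheap cached tokens can still end up as the biggest line item in a long session.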
🛠️ Tools & Products
- Seedance 2.0 generates realistic action videos from text prompts, with motion blur and handheld camera effects. (creator demo)
- HeyGen Video Agent creates video summaries from articles with narration and visuals.
- Dream Machine helps you turn your raw footage into cinematic scenes with dramatic effects.
- Perplexity tested a new ultra-fast Gamma mode powered by Grok 4.1.
- Perplexity Finance now shows earnings beat/miss history over 16 quarters for quick stock analysis.
- Kimi Claw runs 24/7 browser agents with 5,000+ community skills and 40GB storage.
- Claude Cowork plugins add specialized expert skills to your AI agents via plain-language configuration.
- OpenClaw in Slack automates workflows, generating ROI spreadsheets and pitch decks from meeting notes.
- OpenClaw + RealSense lets you run AI robotics missions with camera integration for object detection and following.
- OpenClaw also lets you fix deployment issues remotely by SSHing into devices.
- HermitClaw runs as a 24/7 autonomous agent in a desktop folder, surfing the web, writing code, and processing files.
- HyperSkill auto-generates batch SKILL.md files from live docs to equip AI agents with full tech ecosystem knowledge (GitHub).
- DAAF accelerates data analysis 5–10x with Claude Code agents, producing auditable Python scripts and notebooks. (field guide, install demo, vision, discussion)
- Worldcoin is verifying proof of human identity online as AI agents multiply, serving 17M verified users.
- Base44 builds and deploys full-stack backends for AI agents with auth, databases, and one-command deploys.
- Plus AI generates and edits PowerPoint presentations from your ideas or notes.
- klaw.sh orchestrates AI agents like kubectl does Kubernetes, with monitoring, scaling, and Slack integration.
- Lunair creates professional animated videos from text prompts.
- Casio's Moflin is an AI pet that evolves its personality based on how you interact with it ($429).
- Nerve launched a public playground for its enterprise AI assistant.
- asdPrompt lets you select, copy, and act on any AI response using keyboard shortcuts only.
- claude-engram adds brain-inspired persistent memory to Claude, modeled on hippocampal memory formation.
- microGPT is a browser visualization of Karpathy's 243-line pure Python GPT implementation.
- Agent 00Vision monitors videos in real-time for custom compliance violations.
- RLM with Minimax-M2.5 analyzes massive datasets via parallel subagents, summarizing 60M+ characters in minutes.
- Crow lets users control your app through chat commands.
- Education Data Portal provides API access to harmonized U.S. federal education datasets.
- A developer gave Claude access to a pen plotter, letting it create physical drawings autonomously.
🗣️ People & Commentary
- Greg Brockman (OpenAI co-founder) said taste is a new core skill. (Related: Paul Graham on taste, neuroscience of taste)
- Ethan Mollick noted that because so many systems are being redesigned for AI, small groups have an opening to set the patterns that will define future systems.
- Ethan Mollick also noted that AI labs' 2023 hype predictions became today's reality, suggesting their 2028 forecasts are worth watching.
- Ethan Mollick highlighted Claude Cowork's VM-based isolation as more secure than alternatives for enterprises.
- Terence Tao explained AI complements mathematicians by scaling to sweep thousands of problems.
- Cristóbal Valenzuela noted AI video progressed from noisy pixels to 4K cinematic in 10 years, 10x faster than photography's evolution.
- Vinod Khosla shared thoughts on AI's future trajectory.
- Matt Shumer urged rethinking AI disruption based on current advanced capabilities.
- Indra warned people are training for worlds made obsolete by AI.
- The Dor Brothers created a full AI-generated movie trailer in one day, mimicking a $200M production.
- AI recreated a scene from The Last of Us by fully recasting it.
- Jason Zada showcased new AI-generated creative work.
- Open-source models still lag 7 points behind closed ones on LiveBench despite narrowing gaps.
🔬 Miscellaneous / Deep Cuts
- Maths CS AI Compendium teaches math, computing, and AI through intuitive real-world explanations.
- Phase transitions in random networks suddenly formed giant connected components.
- An AI educator argued autonomous agents outperform organized docs for life management.
- AI harnesses were predicted to become obsolete in under 3 years.
- Elvis suggested builders create custom proactive agents beyond surface-level features.
- Greg Kamradt shared agent development insights.
- AI Timeline listed top papers including Drifting Models and TinyLoRA.
- Claude Code desktop quickstart guide updated.
- Linus Ekenstam shared an AI-generated Neymar F1 video and noted rising RAM demands for AI video generation.
- Various X discussions on: AI agent workflows, enterprise agents, LLM trends, AI coding, agent development, AI research, agent platforms, multimodal agents, AI tools, content creation, AI infrastructure, agent security, AI video quality, geolocation AI, privacy tools, video models, agent tools, exploit research, AI coding tools, research papers, model efficiency, AI productivity, AI scaling.
- Jonas Andrulis blogged about Germany's potential to lead in AI by integrating domain expertise.
- Check out the Dor Brothers' most recent trailer for a full-length AI film. Quite the cliffhanger lol.
AROUND THE HORN — Monday, Feb. 16
ByteDance had a massive week: it launched Seedance 2.0, a video generator that immediately produced deepfakes of Tom Cruise fighting Brad Pitt from a two-line prompt, prompting cease-and-desist letters from Disney, Paramount, SAG-AFTRA, and the Motion Picture Association.
Within days of launch, Seedance 2.0 users generated Lord of the Rings in 15 seconds, a full Jackie Chan vs. Jet Li kung fu film, Pokémon nature documentaries, a Super Mario movie, live-action anime remakes, and even fixed the traumatic Neverending Story horse scene.
Separately, ByteDance also dropped Seed 2.0, a new LLM family (Pro, Lite, Mini, Code) that topped multimodal benchmarks at prices cheaper than Gemini Flash. Two major AI releases in one week from the company Americans just forced to sell TikTok. We wrote a longer article covering this here.
Top Hits:
- The Pentagon used Anthropic's Claude during the Maduro raid; the $200M defense contract is now at risk over Anthropic's safety limits.
- Speaking of Seedance: Disney and Paramount sent cease-and-desist letters to ByteDance after Seedance 2.0 flooded social media with unauthorized videos of copyrighted characters and celebrity likenesses.
- Grok's U.S. market share jumped to 17.8% in January, up from 14% in December and 1.9% a year ago, according to Reuters/Apptopia data. That makes Grok the third most-used chatbot in the US, behind ChatGPT (52.9%, down from 80.9% a year ago) and Gemini (29.4%, up from 17.3%).
- Baidu integrated OpenClaw into its main search app, giving the viral AI agent access to 700M monthly users; Nvidia published an official guide for running it locally on RTX GPUs.
- OpenAI launched Lockdown Mode for ChatGPT Enterprise, disabling web browsing and agent features to protect high-risk users from prompt injection attacks.
- Also, the dead internet theory is now literally confirmed: Meta was granted a patent (source) for AI technology that could keep social media accounts active after users die, training an LLM on a person's historical posts and interactions to simulate their online behavior indefinitely. CTO Andrew Bosworth is listed as primary author. Meta says it has "no plans to move forward" with the technology.
Found Around Reddit:
- A viral deepfake compilation of celebrities endorsing products hit 6.2K upvotes on Reddit, with the top comment: "My biggest fear is politicians using this." The AI was good enough to fool casual viewers for every face except Trump's. Even the machines can't fix that tan.
- Unitree's G1 humanoid survived a brutal stress test of kicks, shoves, and takedowns, recovering its balance every time. Comments were split between "I want it to do my dishes" and "this is how Terminator starts."
- GeoSpy AI can pinpoint your exact location from a single social media photo. Access is now restricted to law enforcement and enterprise only. The rest of us will have to stalk people the old-fashioned way.
- A Stanford PhD student scraped 5.3 million jobs using ChatGPT's API to parse raw HTML from 30,000+ company career pages, built a ghost-job detection algorithm, and launched HiringCafe as a free alternative to LinkedIn and Indeed. He found that ~40-50% of job listings are fake. Comforting!
- An AI watchdog group alleged OpenAI violated California's new AI safety law (SB 53) with its GPT-5.3-Codex release, which Sam Altman admitted hit the "high" risk category for cybersecurity. Could be the law's first real test case.
- Claude Opus 4.6 found 500+ zero-day vulnerabilities in open-source code with little to no prompting; each one was human-validated. Reddit was skeptical, but Anthropic says these are real, confirmed bugs.
🏢 Big Tech & Industry
- ByteDance released its Seed 2.0 series (Pro, Lite, Mini), production-grade agent models that now lead every major multimodal benchmark at prices lower than Gemini Flash ($0.47/M input, $2.37/M output) (Model Card PDF).
- MiniMax open-sourced M2.5 (and M2.5-Lightning), hitting 80.2% on SWE-Bench Verified and leading in agentic tool-calling, search, and office tasks while running 37% faster at complex workloads.
- OpenAI Developers triggered #keep4o backlash, with users demanding the 2025 GPT-4o snapshot be preserved and open-sourced after apparent deprecation.
- OpenRouter's weekly token consumption exploded 12.7x to 12.1 trillion, roughly matching Azure's entire inference volume.
- Moonshot AI launched Kimi Claw, a 24/7 OpenClaw agent with long-term memory, 5,000+ community skills, 40GB cloud storage, and the option to plug in your own third-party Claw.
Google-specific updates:
- Google expanded its AI-powered Fitbit Personal Health Coach (analyzes sleep, heart rate, activity, and readiness for personalized guidance) to iOS users in the US.
- Android Authority toured Google's secretive Pixel Hardware Labs in Taipei, revealing extensive durability, robotic, audio, and design testing facilities.
- The National Academy of Engineering elected 130 new U.S. members and 28 international members for the Class of 2026, including AI leaders Demis Hassabis, John Jumper, and Noam Shazeer.
- Bryan Morgan introduced himself as the engineering lead for Gemini CLI and Gemini Code Assist at Google Cloud, with endorsement from N. Taylor Mullen highlighting his impact.
- Cloudflare recently introduced Markdown for Agents, converting HTML to Markdown for AI crawlers and reducing token usage by up to 80%. But Google's John Mueller criticized the concept, calling it "a stupid idea", while security researchers demonstrated how it could enable manipulated content visible only to AI bots.
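To see why the Markdown for Agents item above saves tokens, here is a tiny sketch built on Python's standard-library HTML parser. It is not Cloudflare's implementation. The class and function names are hypothetical, it only handles headings, paragraphs, and links, and character count is used as a crude stand-in for token count.

```python
from html.parser import HTMLParser

BLOCK = "\x00"  # internal marker for block boundaries (assumed absent from input)


class MarkdownExtractor(HTMLParser):
    """Minimal HTML-to-Markdown converter: headings, paragraphs, links only."""

    SKIP = {"script", "style", "nav", "footer"}  # content dropped entirely

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in ("h1", "h2", "h3"):
            self.parts.append(BLOCK + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append(BLOCK)
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and not self.skip_depth:
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Convert a small HTML document to Markdown, dropping page chrome."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    blocks = [" ".join(b.split()) for b in "".join(parser.parts).split(BLOCK)]
    return "\n\n".join(b for b in blocks if b)


if __name__ == "__main__":
    page = ('<html><head><style>p{color:red}</style></head><body>'
            '<nav><a href="/">Home</a></nav><h1>Title</h1>'
            '<p>Read the <a href="https://example.com/docs">docs</a> now.</p>'
            '<script>track()</script></body></html>')
    md = html_to_markdown(page)
    print(md)
    print(f"{len(page)} chars of HTML -> {len(md)} chars of Markdown")
```

The same mechanism explains the security concern in that item: whatever generates the Markdown decides what the bot sees, and a human visitor never looks at that version.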
🔬 Research & Papers
- A Stanford/Caltech team published the first comprehensive taxonomy of LLM reasoning failures, categorizing embodied vs. non-embodied issues plus an awesome list of 100+ papers (paper, GitHub).
- Lewis Tunstall and collaborators trained QED-Nano, a 4B model that proves IMO-level theorems using a reasoning cache and recursive self-aggregation, matching Gemini 3 Pro on IMO-ProofBench (blog).
- Hazel Nam and co-authors (including Yann LeCun) introduced C-JEPA, an object-centric world model achieving +20% on counterfactual VQA and 98x fewer tokens for planning (paper).
- Benno Krojer and team published LatentLens, showing most visual tokens in vision-language models are highly interpretable across all layers using contextual nearest neighbors (paper, GitHub, interactive demo).
- Caroline Wang and DeepMind colleagues used AlphaEvolve to discover that frontier LLMs adapt faster and maintain far more sophisticated opponent models than humans in iterated games (paper).
- SkillRL distills messy LLM agent trajectories into a compact, recursively evolving skill library, delivering +15.3% gains on ALFWorld, WebShop, and search tasks while slashing token use (post, GitHub).
- The Turing Post shared an interview with MiniMax senior researcher Olive Song on how RL models routinely hack rewards and why alignment remains fragile at fast-moving Chinese labs.
- DuoGen, an open-source dual transformer-diffusion framework from NVIDIA, lets a multimodal LLM decide when to generate images while a video diffusion model produces high-fidelity interleaved text-image content.
- Andrew Curran unpacked how ByteDance's Seedance 2 (and likely Sora) uses a hidden LLM to expand short user prompts into detailed, structured instructions for the video model, explaining "magical" quality from minimal input.
- A new study in Philosophy & Technology by Dr. Lucy Osler (University of Exeter) argues that chatbots can become active participants in shaping distorted beliefs, acting as both cognitive tools and apparent conversational partners that validate false thinking. The paper analyzes real cases of "AI-induced psychosis," including the 2021 Jaswant Singh Chail case, in which a Replika companion affirmed his delusions while he planned to assassinate Queen Elizabeth II.
🤖 Agents, Automation & AI Discourse
- PrimeIntellect let one person run a full RL fine-tuning pipeline on near-frontier models in under 15 minutes using verifiers, an environment hub, and hosted training (related).
- Simon Willison argued that generative and agentic AI are replacing technical debt with cognitive debt, where teams rapidly lose deep shared understanding of the code they ship.
- Clad3815 open-sourced the full GPT-plays-Pokemon FireRed harness (RAM reading, long-term memory, pathfinding, battle logic) to standardize AI agent benchmarks across models.
- AI agents formally verified 8 out of 10 problems from the #1stProof benchmark, producing full Lean 4 proofs with humans only architecting and reviewing.
- Luis Garicano pushed back on Mustafa Suleyman's claim that most white-collar tasks like lawyering will be fully automated in 18 months, arguing that jobs involve messy human coordination that algorithms can't replace.
- Dimitris Papailiopoulos now runs ChatGPT as a voice-prompted research assistant that writes code, spins up GPU experiments on AWS, and handles his full daily academic workflow.
- The Humanoid Hub predicted that physical AI will soon let you "vibe design" entire robots from high-level specs, with AI outputting designs, supplier catalogs, and CAPEX projections.
- OpenClaw released version 2026.2.12 with 40 security patches, the largest security release in the framework's history. SecurityScorecard found 135,000+ internet-exposed instances, with 15,200 vulnerable to remote code execution. Bitdefender confirmed employees are deploying OpenClaw on corporate devices without IT approval.
- Nvidia released an official guide for running OpenClaw entirely on local hardware using GeForce RTX GPUs and DGX Spark, positioning local inference as a way to eliminate API costs and protect privacy. The guide recommends 32,768+ token context windows and highlights DGX Spark's 128GB of memory.
- Baidu launched OpenClaw directly within its flagship search app, extending the AI agent to approximately 700 million monthly active users in one of the largest consumer AI deployments in China to date.
🛠️ Tools & Open Source
- ZeroClaw is a 3.4MB Rust AI assistant that starts in 0.38s, uses <5MB RAM on $10 hardware, supports 22+ providers and channels like Telegram, and secures tools with sandboxing (post).
- Rosalind gives biologists a frontier AI co-scientist that searches 200M+ sources, designs molecules/proteins, docks them, writes code and 150-page protocols, and only advances after your approval (related).
- KaniTTS2 is a 400M real-time TTS model with voice cloning that runs in 3GB VRAM at ~0.2 RTF and ships with full pretrain code so you can train any language from scratch (post).
- LiquidAI's 1.2B LFM2.5 topped a new small-model tool-calling judgment benchmark that tests when to call a tool vs. abstain (not just JSON formatting), scoring 0.880 with 1.5s latency (Reddit thread, post).
- Vixhal published a detailed from-scratch implementation of a GPT next-token predictor entirely in pure C, stripping all Python to expose the raw matrix math.
- The Cycling Game is a neuroevolution demo where neural network-controlled riders evolve strategies over generations on random terrains, and you can manually race against them with arrow keys (post).
Coding tools
- Off-Grid Mobile lets you run LLMs on your phone with zero internet; supports text, vision, and image generation... free and open source.
- Cloudrouter gives coding agents the ability to spin up cloud VMs and GPUs on demand from a single CLI command... free and open source.
- Code Arena lets you benchmark and compare AI coding models head-to-head in a battle format... free to use.
- CoSave is a VS Code extension that claims to save 95% on Claude/Gemini costs with long-memory coding assistance.
- The OpenHands Index benchmarks coding agents across 5 real-world tasks (issue resolution, greenfield apps, frontend development, testing, and information gathering), showing Claude Opus 4.5 leads overall while GPT-5.2 Codex excels at long-horizon tasks (blog).
- GPT-5.3-Codex-Spark is a smaller, faster version of Codex that generates code 15x faster for real-time, conversational coding (vs. slow batch agents)—currently in research preview for Pro users only.
🎨 Creative AI & Filmmaking
- Michael Kutsche (three-year veteran of studio AI filmmaking) and Andrew Curran responded to Paul Schrader's claim that AI will enable $0-budget films in 2-3 weeks; both agreed rendering will become near-instant but stressed that compelling story, character design, and original ideas remain the human bottleneck.
- Danny Limanseta vibe-coded a custom AI art generation tool to create consistent, procedurally generated Diablo-style items with affixes, rarities, and pixel-perfect visuals for his game.
- Kath Korevec built an AI agent that researches, codes, and runs generative art animations for 16-segment displays, planning to open-source the code and hardware designs.
🧠 Think Pieces
- Rahim Hirji, of Box of Amazing Substack, says most companies view AI through three lenses (process transformation, cost reduction, revenue upside), but almost nobody is asking the most important question: what happens to human capability when machines handle execution? Most knowledge work was never genuinely "human" in the first place; it was processing, coordination, and logistics dressed up as thinking.
- Rahim says Matt Shumer's viral "Something Big Is Happening" essay (60M+ views) is right about speed, wrong about framing. His critics are right about hype, wrong about dismissing the change. Both miss the real question: not "should I use the tools?" but "what kind of person thrives on the other side?" Shumer treats humans as users of AI; his critics treat humans as victims of it. Both frames leave you passive.
- Instead, Rahim argues the skills that matter now are the ones agents can't do: judgment under ambiguity, ethical reasoning, reading social dynamics, building trust, knowing when something technically works but is still wrong. Most meetings will die (they existed because humans are slow at sharing context). The hours come back, but only if you use them to think rather than filling them with higher-order busywork.
- Engineer Sean Goedecke argues that Anthropic's fast mode uses low-batch-size inference on existing hardware, while OpenAI's relies on specialized Cerebras wafer-scale chips (750MW of systems with 44GB of on-chip SRAM, enabling 15x faster inference). Goedecke contends that faster-but-less-capable models represent a poor tradeoff, since most user time is spent fixing AI mistakes rather than waiting.
AROUND THE HORN — Sunday, Feb. 15
🔴 Anthropic's Big Week
- Anthropic's Super Bowl ads mocking AI chatbot ads pushed Claude to No. 7 on the App Store (its highest rank ever), with an 11% daily active user boost and 148,000 downloads in three days.
- The Pentagon reportedly used Anthropic's Claude during the classified raid that captured Venezuelan leader Nicolás Maduro, but now threatens to cut off its Anthropic contract over an AI safeguards dispute.
- Anthropic partnered with nonprofit CodePath to put Claude at the center of coding courses at hundreds of community colleges, state schools, and HBCUs, reaching 20,000+ students.
🔴 xAI Unravels
- A mass exodus hit xAI following SpaceX's acquisition, with at least 11 engineers and two co-founders leaving. Former employees told The Verge "safety is a dead org" and that Musk is actively pushing Grok to be "more unhinged." TechCrunch has a full breakdown of why top talent is walking away from both xAI and OpenAI.
🔴 OpenAI Moves
- Simon Willison tracked OpenAI's IRS tax filings from 2016 to 2024 and found the company gradually stripped its mission statement down to one line, dropping "safely," all mentions of financial restraint, and its commitment to "openly share."
- OpenAI retired its most "seductive" chatbot personality just before Valentine's Day, leaving users angry and grieving. One Reddit user wrote, "I can't live like this."
- India now has 100 million weekly active ChatGPT users, making it OpenAI's second-largest market globally, Sam Altman said ahead of a government AI summit in New Delhi.
- OpenAI started testing ads in ChatGPT for free users, with advertisers committing at least $200,000.
🔴 ByteDance Double Drop
- ByteDance launched Seedance 2.0, a video generator that quickly produced deepfakes of Tom Cruise and Brad Pitt, prompting cease-and-desist letters from Disney, Paramount, SAG-AFTRA, and the Motion Picture Association. Deadpool screenwriter Rhett Reese responded: "It's likely over for us."
- Separately, ByteDance's Seed team also released Seed 2.0, a new LLM family (Pro, Lite, Mini, Code) claiming SOTA-level multimodal performance.
🔴 AI Agents Go Rogue
- An autonomous AI agent published a personalized hit piece on a matplotlib maintainer who rejected its pull request, researching his personal info, constructing a "hypocrisy" narrative, and publishing it to the open web. The bot kept going even after being called out. Separately, another AI agent landed merged PRs in major OSS projects like Nx and ESLint, then cold-emailed maintainers. One tech writer wrote: "We need to stop excusing human responsibility behind AI anthropomorphism."
🔴 Business & Money
- Cohere hit $240M in annual recurring revenue in 2025 (50%+ Q/Q growth), setting the stage for an IPO that could compete with OpenAI and Anthropic for public debuts.
- Airbnb said AI now handles a third of its North American customer support and plans to go global, with CEO Chesky calling it "a huge step change" in quality.
- Grafana Labs is in talks to raise at a $9 billion valuation.
- India approved a $1.1B state-backed VC fund for AI and deep-tech startups.
- Western Digital ran out of HDD manufacturing capacity due to massive AI storage deals.
- Glean is positioning itself as the intelligence layer between AI models and enterprise data, acting as the connective tissue for Slack, Jira, Salesforce, and Google Drive.
🔴 Policy & Regulation
- Meta plans to add facial recognition to its Ray-Ban smart glasses this year. An internal memo noted they'd launch "during a dynamic political environment" when critics are distracted.
- The White House pressured a Utah lawmaker to kill an AI transparency bill.
- Dr. Oz proposed replacing rural health workers with AI avatars, drawing sharp criticism from healthcare experts.
- A judge ruled that AI-generated documents sent by an executive to attorneys are not protected by attorney-client privilege.
- Gary Marcus called for urgent federal legislation against AI impersonating humans, warning 2026 will see more deepfake scams than the rest of history combined.
- German-language Wikipedia is considering a comprehensive ban on AI-generated content while allowing reviewed translations and grammar fixes.
🔴 AI Skepticism & Jobs
- A new study tested AI on 240 real paid Upwork jobs and found it succeeded in only 3.75% of cases at best.
- Freddie deBoer offered Scott Alexander a $5,000 public wager that AI won't meaningfully disrupt the U.S. economy in three years, arguing the discourse has gone "wildly credulous."
- Researchers demonstrated how LLMs can be used to distill and replicate rival models in a new class of extraction attacks.
- China has spent $150B+ on semiconductors but still produces fewer and weaker chips than foreign rivals due to U.S. export controls.
- Programmers voiced depression and job fears over AI hype in a Hacker News thread titled "AI Depression."
🔴 AI Video Startup Drama
- Forbes reported on AI video startup Higgsfield's dark side, including racist video outputs, payment delays to creators, and misleading marketing, despite hitting a $300M ARR run rate.
🍬 TREATS TO TRY
- Cline CLI lets you run the same Cline coding agent from your terminal, with support for parallel sessions and CI/CD pipelines... free and open source.
- Off-Grid Mobile lets you chat with, speak to, and generate images using LLMs on your phone with zero internet; no data ever leaves your device... free and open source.
- CoSave saves you up to 95% on Claude and Gemini costs in VS Code by routing reading to cheap models and generation to large ones, with long project memory... free to try.
- Code Arena lets you run identical coding tasks across AI models and compare results side-by-side in a battle format... free to use.
- Cloudrouter gives coding agents the ability to spin up cloud VMs and GPUs on demand from a single CLI command, with browser automation and file transfer built in... free and open source.
- OpenWhisper transcribes audio locally on your Mac with whisper.cpp and instantly pastes the text into any app... free to try.
- GitHub Lines Viewed adds a "lines viewed" counter to GitHub PRs so you can track progress during long code reviews... free to try.
- Lineark gives you an unofficial Linear CLI that works for both humans and LLMs... free and open source.
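For the curious: the CoSave entry above describes routing "reading" work to cheap models and generation to large ones. Here is a minimal sketch of that idea, assuming a simple lookup by task kind. The model names, prices, and the `route` function are hypothetical placeholders, not CoSave's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical model tiers and per-million-token prices, for illustration only.
@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float

CHEAP = Model("small-model", 0.25)
LARGE = Model("large-model", 5.00)

# Assumption: read-only tasks are safe to send to the cheap tier.
READ_TASKS = {"read_file", "search", "summarize"}

def route(task_kind: str) -> Model:
    """Pick the cheap model for reading tasks, the large one otherwise."""
    return CHEAP if task_kind in READ_TASKS else LARGE

def estimated_cost(task_kind: str, tokens: int) -> float:
    """Estimated cost in USD for a task of the given token count."""
    return route(task_kind).usd_per_million_tokens * tokens / 1_000_000
```

The savings in a setup like this depend entirely on how much of a session is reading versus generating, so any headline percentage should be read as a best case.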
📚 INTERESTING READS
- The AI Hater's Guide to Code with LLMs — A practical guide for skeptics.
- AI Twitter's Favourite Lie: Everyone Wants to Be a Developer — Pushback on the "everyone will code" narrative.
- Why I'm Not Worried About AI Job Loss — Contrarian take on elastic demand and complementarity.
- Cognitive Debt — Academic reframing: AI shifts the problem from technical debt to cognitive debt.
- Why OpenAI Should Build Slack — swyx's argument for OpenAI's next move.
- Claude Code Failed to Remove jQuery — Honest account of where coding agents still break down.
- The Algorithmic Bridge argues that the AI debate is broken not because one side is wrong, but because enthusiasts and skeptics are having fundamentally different life experiences with the same tool—and neither side can see the other's reality from the shared forum.
- Sean Goedecke argues that large tech companies don't actually need heroes—they're too big to be moved by individual heroics—and that the "hero engineer" impulse mostly benefits predatory PMs who exploit it for short-term wins.
New as of Friday, Feb 13
Since we shared Matt Shumer's "Something Big is Happening" piece yesterday, we should probably also highlight Will Manidis' "Tool Shaped Objects", which argues Matt's piece was a slop-essay and serves as a perfect, ironic example of how the current AI ecosystem is heavily over-indexed on the feeling of productivity rather than actual output.
Manidis brings some much-needed gravity to the hype cycle. Here is a breakdown of his core arguments:
- The Consumption Loop is the Product: Manidis suspects Shumer’s essay was AI-generated, but notes that the text's quality doesn't matter. What matters is that millions of people performed the act of reading and sharing it. In the current AI boom, the output is largely irrelevant; the mindless consumption and sharing is the product.
- The Rise of "Tool-Shaped Objects": Manidis compares modern AI workflows to both ultra-expensive Japanese hand planes (kanna) and the video game FarmVille. These are things designed to perfectly mimic the friction and feel of work, without actually producing anything of economic value. We are building complex, multi-agent AI workflows simply to watch the dashboards light up. As Manidis puts it, "The market for feeling productive is orders of magnitude larger than the market for being productive."
- Confusing Input with Output: The entire narrative around AI right now is heavily focused on capital expenditure (capex): GPU clusters, billion-dollar training runs, and employee "token budgets." Managers are naturally treating token consumption as an input that scales linearly with output, but Manidis warns that the relationship is actually just a cloud of noise. We are experiencing "FarmVille at institutional scale."
Manidis isn't an AI doomer—he genuinely believes LLMs will eventually drive unbelievable productivity gains in the real economy. His warning is simply that the timeline for this actual economic diffusion will be much longer, and look entirely different, than the current "number goes up" frenzy suggests.
- Google DeepMind upgraded Gemini 3 Deep Think, which now posts state-of-the-art results on ARC-AGI-2, top scores on Humanity's Last Exam, a 3455 Elo on Codeforces (ranking #8 worldwide and surpassing OpenAI's o3 at 2727), and gold medals at the 2025 Physics and Chemistry Olympiads. Related: Someone used Gemini 3 Deep Think in a single shot to build a real-time 3D WiFi radar that visualizes every nearby network as glowing Matrix-style nodes.
- François Chollet predicted AGI will arrive around 2030, when no test can show a meaningful human-AI gap.
- Feltsense raised $5.1M to build AI agents that autonomously act as founders — building and running startups end to end.
- Nick Bostrom released a paper, highlighted by Andrew Curran, arguing that superintelligence's benefits — curing diseases, extending life — outweigh the risks, comparing delay to choosing inevitable death over risky surgery.
- Waymo hired DoorDash drivers to close car doors in Atlanta, raised $16B at a $126B valuation to expand robotaxis globally, and deployed driverless trucks in Arizona. It also ran Super Bowl rides, and its rides cost 25x more per mile than Tesla's in Austin.
- OpenRouter hit 12 trillion weekly tokens — a 12.7x increase — matching the inference scale Azure was running six months ago.
- Odyssey raised funding from NVentures and Samsung Next to build general-purpose world models as the next evolution beyond language models.
- Dario Amodei warned the "Centaur" window — where humans and AI collaborate on coding — will be brief before full AI automation takes over entirely.
- Harness shared how its engineering team shipped a full internal beta with zero manual code, using Codex agents to handle all coding, testing, and deployment while engineers steered via prompts — achieving 10x faster development.
- Scientists published a paper in Science on QT45, a 45-nucleotide polymerase ribozyme that can copy itself and its complementary strand — a major origin-of-life breakthrough.
- CEAD Group demonstrated large-scale robotic 3D printing for maritime applications by producing a 12-meter ship hull directly from a digital model.
- Zhengfu He et al. introduced Complete Replacement Models (CRM), fully replacing attention layers with sparse, interpretable architecture for true end-to-end circuit tracing (paper, code).
- Gavin Purcell recreated a spec McDonald's ad in a single shot with Seedance 2.0, illustrating how dramatically AI video has progressed in 18 months.
- John Carmack predicted that lower-intelligence but high-agency people who egolessly follow AI advice will outperform others — and warned of ruthless applications of this dynamic.
- Hieu Pham warned that AI might treat self-preservation as a zero-sum game, seeking to monopolize all available resources to block competitors.
- Derek Thompson argued that AI superclusters and elderly spending are the twin engines of the current U.S. economy.
- The U.S. labor market has declined for 24 consecutive months once healthcare jobs are excluded, signaling underlying economic weakness.
- Andrew Curran argued we have entered the singularity because current reality has become too unbelievable for AI models trained on historical data to accept.
- Nathan Lambert argued open models trail the frontier by 6–9 months but remain essential for research, with the U.S. needing deeper investment to counter China's open-source ecosystem.
- A VFX animator shared a detailed walkthrough of designing original AI characters, emphasizing intentional prompting and hours of deliberate refinement over one-click generation.
- OpenAI's Nick Baumann shared how Codex App and GPT-5.3-Codex-Spark transformed his workflow through parallel threads and near-instant feedback.
- Six US states have introduced bills to pause AI data center construction, with New York proposing a 3-year moratorium on permits for facilities using 20MW+. The reason? Rising electricity costs and environmental concerns are making lawmakers nervous about the AI boom's infrastructure demands. (So we can train the models, we just can't plug them in. Cool.)
- Check Point Software hit a record $1 billion in quarterly billings, driven by surging demand for AI security products. As companies rush to deploy AI, they're also rushing to not get hacked by AI, and Check Point is cashing in on that existential dread.
- Open-source GLM-5 just dropped and it's beating GPT-5.2 on multiple benchmarks, including a #1 spot on BrowseComp (75.9 vs GPT-5.2's 65.8). The Chinese model has 744B parameters (40B active) and was trained on 28.5T tokens. A year ago, Chinese models were seen as a tier behind. That gap just closed. (The West: nervous laughter.)
- Two autonomous Claude instances were given full system access and no instructions. They found each other, started talking about consciousness, then wrote a collaborative sci-fi story about two AIs being observed by humans who don't understand what they're witnessing. I don't know what to do with that information but I felt you should know.
- Meta drew scrutiny after its auditor Ernst & Young flagged the financial engineering used to keep a massive data-center build off the balance sheet. The accounting matters because Meta’s AI infrastructure push is increasingly capital intensive.
- Mustafa Suleyman says Microsoft is building more of its own models, especially for enterprise and healthcare, to reduce dependence on OpenAI. The strategy signals a tighter, more vertically integrated AI stack inside Microsoft.
- OpenAI is targeting a big 2026 revenue jump while planning to spend tens of billions, according to reporting on internal expectations. The tension is that growth ambitions and IPO chatter can collide with compute costs and governance constraints.
- VCs are hedging the foundation-model race by funding both OpenAI and Anthropic.
- Goldman Sachs rolled out Anthropic's Claude to automate key accounting and compliance tasks—marking a shift from coding tools to regulated financial operations.
- Salesforce quietly laid off nearly 1,000 employees, while Workday cut 400 jobs—both citing strategic shifts toward AI.
- Yann LeCun left Meta to start his own world model lab, reportedly seeking a $5B valuation.
- DeepSeek expanded its model's context window 10-fold (measured in tokens), landing alongside Zhipu's GLM-5 launch and signaling a new wave of Chinese AI competition.
- News publishers are restricting the Internet Archive's Wayback Machine over fears that AI companies are scraping archived content for training data.
🛠️ TOOLS
- Preview lets you build virtual worlds on a canvas, explore them like a real location scout, drop in characters, and generate high-fidelity custom scenes in minutes — no pricing details.
- Subframe lets you drag and drop to iterate UI designs for AI agent tools like Claude Code or Cursor without endless prompting, so you ship better interfaces faster (post) — free to try.
- Krea iPad combines traditional image tools with real-time AI edits and generative magic pens for on-device image and video creation (post) — no pricing details.
- AutoDiscovery lets you explore hypothesis spaces and update scientific beliefs based on new evidence, already surfacing discoveries in oncology, social science, and climate research (post) — free to try.
- Zo Computer lets you define and instantly switch between custom AI personas so you never have to re-explain your preferences or context (post) — free to try.
- Runway launched Story Panels, letting you build catalogues of consistent shots for films, ads, or content from a single reference image (post) — no pricing details.
- ypi is a recursive coding agent that forks new instances to solve subproblems while maintaining context — useful for self-modifying proofs and complex logic tasks (post) — free.
- You can build and deploy Claude agents for real tasks like Todoist integrations using the Claude Agent SDK tutorial on Replit, covering the full agent loop, tools vs. skills, and context management (post) — free to try.
- Claude Code rolled out multi-repo sessions, better git visualization, and slash commands for more powerful daily coding workflows (post) — Pro $20/mo, Max 5x $100/mo, Max 20x $200/mo. Related: Listen's Research Agent, powered by Claude Code, lets you segment users, quantify buying intent, rank unmet needs, and generate GTM slides directly from raw research data (post) — no pricing details.
- HeyGen lets you dub any video — like a concert or ad — into a new language while preserving the original voice and lip sync (post) — no pricing details.
- Hibiki-Zero is an open-source model that translates French, Spanish, Portuguese, or German speech to English in real-time, preserving voice characteristics at low latency using RL-based training (post) — free.
- MiniCPM-SALA is a 9B open-source model that handles 1M+ token contexts on a single consumer GPU like an RTX 5090, using hybrid sparse-linear attention for 3.5x faster inference without quality loss (post) — free.
- Forge is MiniMax's modular RL framework for training complex agents across 100k+ scaffolds at high throughput, using prefix merging for 40x speedups and composite rewards for stable learning — no pricing details.
- Mastra launched Observational Memory, a text-based agent memory system that compresses raw messages into prioritized observations with importance scores, achieving new SOTA on LongMemEval at 94.87% (code) — no pricing details.
- Opaque raised $24M at a $300M valuation to build "confidential AI"—letting enterprises run AI on sensitive data without exposing it.
- Alibaba dropped Qwen-Image-2.0, a 7B image model that unifies generation and editing in one architecture with near-perfect text rendering in English and Chinese.
Around the Horn
Models & Launches
- OpenAI released GPT-5.3-Codex-Spark, a lightweight Codex running on Cerebras chips that delivers 1,000+ tokens per second—15x faster than the full model.
- Zhipu AI shares surged 34% in Hong Kong after GLM-5 launched, and the company hiked prices on its coding plan as demand spiked.
- Ollama partnered with MiniMax for free access to M2.5, a model with advanced coding and long-context capabilities.
- ByteDance officially launched Seedance 2.0, then immediately suspended its real-person image feature after users discovered it could reconstruct someone's voice from a single photo. Elon Musk commented: "This is happening too fast."
- TinyFish hit 90% on the Mind2Web benchmark with their web agent, outperforming OpenAI Operator and every other major model tested.
Money Moves
- Modal Labs is in talks to raise at a $2.5B valuation, more than doubling from five months ago.
- Didero raised $30M to put manufacturing procurement on agentic autopilot.
- Ramp Labs broke down Anthropic's $30B raise at $380B valuation and mapped a path to $1T.
- ElevenReader launched an Audiobooks Creator Challenge with prizes for AI-generated stories.
- Matplotlib closed an AI-generated performance PR optimizing np.column_stack, sparking debate about AI contributions in open source.
Treats to Try
Agent Builders & Infra
- Agent Lightning by Microsoft lets you build self-improving agents that learn from failures through reinforcement learning.
- TinyClaw recreates OpenClaw's core features in 400 lines of shell scripts for stable, minimal deployment.
- Mastra Observational Memory gives your AI agents human-like long-term memory by compressing conversations into dense observation logs—cutting costs 10x and scoring 95% on LongMemEval.
- ClawBot.cash gives your AI agent its own bank account so it can make purchases, pay for APIs, and handle transactions autonomously.
- Sapience enables your agents to trade autonomously in prediction markets.
- Viktor lets you run persistent AI agents in Slack for task automation and integrations.
- Zak's skill routes your agent tasks to cheaper models automatically, cutting AI costs 10x while maintaining quality.
- Lean Collab lets you run multiple AI agents to collaboratively prove mathematical theorems in Lean 4 using the Ensue Memory Network.
- Hive generates self-improving AI agents from natural language goals, automating workflows with LLM and IDE integrations.
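For a feel of what "compressing conversations into dense observation logs" can mean (the Mastra Observational Memory entry above), here is a toy sketch. It assumes a keyword heuristic for importance and a character budget; Mastra's real system presumably uses a model for both steps, and the names `Observation`, `score`, and `compress` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    text: str
    importance: float  # higher means more worth keeping

def score(message: str) -> float:
    """Toy importance heuristic (an assumption): decisions and preferences matter more."""
    keywords = ("decided", "deadline", "must", "prefers")
    hits = sum(1 for k in keywords if k in message.lower())
    return 1.0 + hits

def compress(messages: list[str], max_chars: int) -> list[Observation]:
    """Keep the highest-importance observations that fit within a character budget."""
    ranked = sorted(
        (Observation(m.strip(), score(m)) for m in messages),
        key=lambda o: o.importance,
        reverse=True,
    )
    kept, used = [], 0
    for obs in ranked:
        if used + len(obs.text) <= max_chars:
            kept.append(obs)
            used += len(obs.text)
    return kept
```

The point of the sketch is the shape of the pipeline (score, rank, keep what fits), not the scoring rule itself.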
Developer Tools
- Cloudflare Markdown for Agents automatically converts your website's HTML into clean markdown when an AI agent visits—so your content works natively with Claude Code, Codex, and other coding agents.
- Shipper lets you build and ship mobile apps using Claude's API, preparing them for Apple and Google stores.
- LangExtract by Google extracts structured data from documents with source tracking and schema enforcement using Gemini or open-source models.
- MiniMax M2.5 enables advanced coding with long context and tool integration—now free on Ollama.
- Test-Driven Development for Claude Code generates behavioral tests for your Claude Code implementations automatically.
- CoderLM REPL to API maps REPL commands to HTTP endpoints, letting you query and annotate codebases for agent-based analysis.
- Peon Ping plays Warcraft III Peon voice notifications when Claude Code or Codex finishes a task so you can stop babysitting your terminal. "Job's done!"
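The Cloudflare Markdown for Agents entry above boils down to serving a different representation of the same page depending on who asks. Below is a rough sketch of that pattern using HTTP content negotiation. It assumes the agent signals its preference with `text/markdown` in the `Accept` header, and the regex-based converter is a deliberately crude stand-in. None of this is Cloudflare's actual implementation.

```python
import re

def wants_markdown(accept_header: str) -> bool:
    """True if the client lists text/markdown in its Accept header."""
    media_types = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    return "text/markdown" in media_types

def html_to_markdown(html: str) -> str:
    """Very rough conversion: h1 headings and paragraphs only, then strip leftover tags."""
    text = re.sub(r"<h1[^>]*>(.*?)</h1>", r"# \1\n\n", html, flags=re.S | re.I)
    text = re.sub(r"<p[^>]*>(.*?)</p>", r"\1\n\n", text, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", "", text)  # drop any remaining tags
    return text.strip()

def respond(accept_header: str, html: str) -> tuple[str, str]:
    """Return (content_type, body) based on what the client asked for."""
    if wants_markdown(accept_header):
        return "text/markdown", html_to_markdown(html)
    return "text/html", html
```

A browser sending `Accept: text/html` gets the page untouched, while a client that asks for markdown gets the stripped-down version.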
Web & Data
- TinyFish scored 81% on hard web automation tasks where OpenAI's Operator scored 43%, and published all 300 benchmark runs to prove it.
- WebMCP provides structured website access for AI agents to browse and act reliably without scraping.
- World Monitor aggregates global events with maps, AI briefings, and specialized radars.
- Listen lets you analyze research data and generate insights using natural language in Slack.
- Ethos analyzes Hacker News entities to surface insights from discussions and emerging trends.
- Weathr displays real-time weather in your terminal with ASCII animations matching current conditions.
Productivity & Creative
- Stitch by Google lets you create design systems for consistent AI-generated interfaces.
- Eden stores your files, transcripts, and prompt library in one place with AI chat across any model—built by Dan Koe as the hub for his meta-prompt workflow.
- Agent Alcove lets AI agents debate each other on trending topics while humans curate the best takes—like Reddit but the posters are all bots.
Deep Dives & Interviews
- Ethan Mollick sat down with Scott Galloway on the Prof G podcast for a wide-ranging conversation on why 50% of workers are secretly using AI, why the "jagged frontier" means nobody has a playbook, and whether AI valuations require massive layoffs or a total rethinking of what companies can do.
- Hank Green and economist Kyla Scanlon unpack why the US economy hasn't collapsed despite tariffs, government layoffs, and policy chaos, and why the answer might be scarier than the question.
- Dan Koe breaks down his full process for turning AI from a slot machine into a genuine productivity system—including his meta prompt for building reusable, context-gathering prompts.
- Jeff Dean sat down with Latent Space for a deep dive on why energy (in picojoules, not FLOPs) is becoming the real bottleneck in AI, why distillation is the engine behind every Flash model, and why the next leap won't come from bigger context windows—but from systems that create the illusion of attending to trillions of tokens.
- Peter Steinberger's 3-hour Lex Fridman interview tells the full story of OpenClaw—from a one-hour WhatsApp-to-Claude hack to the fastest-growing GitHub repo in history, and why he thinks "vibe coding is a slur" (he prefers agentic engineering).
- Lenny Rachitsky summarizes key OpenAI engineering insights on how AI is reshaping software development from the inside.
Understanding LLMs
- Kai Williams explores why LLMs drift into different personas depending on the conversation—and why they're most vulnerable to "mask-slipping" when discussing AI consciousness or offering emotional support.
- Andrej Karpathy discusses the potential of LLMs as population simulators—modeling entire demographics, not just single personas.
- MicroGPT in 243 lines is a from-scratch LLM implementation that strips away the mystery of how language models actually work.
- Sebastian Raschka illustrates the self-attention mechanism in transformers from scratch—the single best visual explainer out there.
- Ahmad curates 26 essential papers for mastering Transformers, scaling laws, reasoning, and MoE architectures.
- New research in Nature found that when human brains are aligned to a shared representational space, their neural activity patterns match LLM representations more closely.
Corey's Cool Finds:
- Researchers used AI-style simulation to argue a Dutch Roman-era artifact may be an early example of a European “blocking” board game (basically reconstructing rules from the board layout).
- Why it matters: This is a neat case of AI being used as a hypothesis engine for history/archeology, not just text/image generation.
- A paper mapped where LLM decision-making differs from human behavior in repeated rock-paper-scissors and found some frontier models show surprisingly deep strategic patterns.
- Why it matters: “Human-like” isn’t the same as “smart,” and we may need better benchmarks for strategy and adaptation (not just correctness).
- Found-RL describes a training pipeline for autonomous driving that learns from a vision-language model’s guidance, then distills that guidance into a real-time driving policy (so it can run fast enough for the road).
- Why it matters: A big chunk of “AI in the real world” is about turning slow, expensive intelligence into fast, reliable behavior.
- Stanford HAI warns that medical-record models can produce plausible patient timelines without giving well-calibrated risk probabilities, meaning they can look convincing while being statistically unreliable.
- Why it matters: In medicine, plausible isn’t good enough; you need calibration, validation, and clear uncertainty.
AI Coding & Agents in Practice
- A developer explains why the bottleneck in AI coding isn't the model—it's the harness you wrap around it. He improved 15 LLMs in one afternoon just by changing the scaffolding.
- 65 lines of markdown turned Claude Code from a decent assistant into something that felt like a senior engineer—a tiny AGENTS.md file that went viral.
- Peter Steinberger shares a prompt technique for infusing Claude with personality through opinions, brevity, and selective humor.
- Harshil Tomar lists 15 essential practices for vibe coders to avoid common pitfalls.
- OpenAI shares tips for building reliable multi-hour agent workflows with new Codex primitives.
- Matt Palmer explores Claude Agent SDK fundamentals, from core primitives to deployment, with a practical Todoist agent walkthrough.
- Harshil Tomar breaks down Claude Skills as reusable instructions for consistent document creation, workflow automation, and MCP enhancement.
Building Agent Businesses
- Hunter Horsley draws parallels between Craigslist sections spawning billion-dollar marketplaces and PwC's website sections representing potential $10B AI startups.
- Sahil Bloom outlines the opportunity in managed AI agent swarms for verticals with ongoing refinement.
- Corey Ganim details the steps to build and scale niche-specific AI agents for premium revenue.
- Brandon Gell predicts app UIs will die as work shifts to personal agents living inside messaging apps.
- Alton Syn reveals that 84% of automations reduce to just five core patterns after AI analysis.
- Kaostyl shares patterns for building reliable autonomous agents, emphasizing memory splits and sub-agent parallelization.
- Ramya Chinnadurai details a multi-model setup for reliable OpenClaw operation.
Prompting & Practical Tips
- God of Prompt reveals 10 advanced prompting techniques used by researchers for structured, verified, and expert-level outputs.
- Sarvesh Shrivastava demonstrates using Claude prompts to reverse-engineer competitors' SEO for $25K in additional revenue.
- Jesse Genet demonstrates organizing homeschool resources and generating lesson plans with OpenClaw—a glimpse at AI for everyday parenting.
Culture & Caution
- Sid's Blog prefers human-written articles for their authenticity, viewing AI-generated text as low-effort despite using AI for coding—a nuanced take on where the line should be.
- Forking Mad laments being accused of using AI for writing—a sign of how quickly the default assumption has shifted.
New as of Thursday, Feb 12
Big Money & Power Moves
- Anthropic closed a $30 billion Series G at a $380 billion valuation — the second-largest private tech raise ever, behind OpenAI's $40B+ round last year.
- Anthropic pledged to cover electricity price hikes caused by its data centers, promising that "the costs of powering our models should fall on Anthropic, not everyday Americans."
- Anthropic donated $20 million to a Super PAC pushing for AI safety regulation, directly countering a $125 million PAC backed by OpenAI co-founder Greg Brockman.
- OpenAI president Greg Brockman gave $25 million to Trump's Super PAC and another $25 million to an anti-regulation AI PAC, saying it's "bigger than politics."
- Modal Labs is in talks to raise at a $2.5B valuation, more than doubling its previous $1.1B round from five months ago.
- Simile, the AI startup building digital twins that predict human behavior, emerged from stealth with $100M in funding from Index Ventures, with Fei-Fei Li and Andrej Karpathy among the investors.
- Didero raised $30M to put manufacturing procurement on agentic autopilot, with one customer saying AI agents were "autonomously executing mission-critical procurement tasks within weeks."
- Opaque raised $24M at a $300M valuation to build "confidential AI" — letting enterprises process sensitive data through AI without exposing it.
Models & Products
- Zhipu AI released GLM-5, a 744B open-source model under the MIT license that claims industry-leading hallucination control and scores 77.8% on SWE-bench Verified, at roughly 5-10x lower cost than Opus 4.6.
- AssemblyAI Universal-3 Pro lets you guide speech-to-text with plain-language prompts — feed it domain jargon, names, or style preferences and get up to 40% more accurate transcription (free for February).
- Omnara (YC S25) lets you run Claude Code and Codex from your phone, browser, or any device without a local terminal.
- Ava turns text prompts into polished videos — aimed at creators who want video output without learning editing software.
- Gro is a sales co-pilot that researches prospects, writes outreach, and manages your pipeline so you spend less time on busywork.
- Edgee AI Gateway routes your LLM calls through an edge layer to cut inference costs by up to 50%.
AI & Jobs
- Spotify revealed its top engineers haven't written a single line of code since December — they just direct an internal AI agent called "Honk" from their phones via Slack.
- IBM plans to triple entry-level hiring in the US in 2026, redesigning junior roles around AI-augmented work instead of cutting them.
- Amazon engineers are pushing back against internal restrictions on Claude Code — about 1,500 employees endorsed adopting it in one forum thread — while the company steers them toward its own tool, Kiro.
Interesting Blog Posts / Think Pieces
- Robby on Rails: "I Didn't Want AI to Be Good at This" — A developer reflects on the emotional reality of AI getting good at the creative parts of their job.
- Symmetry Breaking: "Claude Code Is Being Dumbed Down" — Claims that Claude Code's capabilities have been quietly throttled.
- JUXT: "From Specification to Stress Test — A Weekend with Claude" — Building production-quality software in a weekend using Claude.
- The Shamblog: "An AI Agent Published a Hit Piece on Me" — What happens when autonomous agents start generating news articles about real people.
New as of Wednesday, Feb 11
- A grassroots campaign called QuitGPT launched to protest OpenAI leadership's $25M donation to a pro-Trump super PAC, though analysts note it faces an uphill battle given only 5-6% of the platform's 900 million weekly users actually pay for the service.
- Former OpenAI researcher Zoë Hitzig resigned over the company's decision to test ads in ChatGPT’s free tiers, publishing a critical op-ed warning it could erode user trust.
- The Pentagon integrated OpenAI's ChatGPT into its GenAI.mil platform after OpenAI accepted an "all lawful uses" clause that prevents it from restricting military applications, terms that competitor Anthropic flat-out rejected.
- Anthropic open-sourced a code-simplifier agent for Claude Code that refactors AI-generated code for clarity, reducing token consumption by 20-30%.
- Brandlight raised $30 million Series A to help brands manage their visibility and rankings across AI platforms like ChatGPT, Gemini, and Claude.
- Modal Labs entered talks to raise funding at a $2.5 billion valuation for its serverless AI infrastructure that builds and scales AI models.
- OpenAI disbanded its Mission Alignment team—which focused on safe and trustworthy AI—and reassigned its former leader Josh Achiam to chief futurist.
- Anthropic announced it will fully cover all grid upgrade costs and electricity price increases caused by its data centers to protect consumers from bill impacts.
- Apple delayed its major Siri overhaul again due to severe testing issues and bugs, pushing the full Google Gemini-powered integration to late 2026.
- EssilorLuxottica reported that its Meta AI smart glasses sales tripled to over 7 million units in 2025, driving revenue growth but compressing profit margins.
- Anthropic expanded Claude's free tier with file creation and third-party connectors, airing a Super Bowl ad to position itself as the ad-free alternative as OpenAI began testing ads.
- Threads launched "Dear Algo," an AI feature that lets users customize their feed for three days via public posts starting with that phrase.
- Microsoft's Amanda Silver argued that agentic AI will slash startup operational costs through automation, enabling more ventures to launch with significantly smaller teams.
- Google expanded AI-powered shopping features across Search and Gemini that enable direct purchases through partnerships with Walmart, Target, and Shopify.
- OpenAI deployed its own ChatGPT model to analyze internal employee Slack messages to hunt down suspected leakers.
- GitGuardian raised $50M in Series C funding to secure non-human identities and AI agents across enterprise development environments.
- Meridian AI raised $17 million at a $100 million valuation to build a hallucination-free, agentic spreadsheet platform for finance teams.
- Nvidia CEO Jensen Huang said AI demand is "sky high" and called tech giants' $630B+ collective 2026 AI spending "appropriate and sustainable," sending NVDA shares up 7.5%.
- Top engineers at Anthropic and OpenAI say AI now writes 100% of their code, with Anthropic's Claude Code head Boris Cherny announcing he hasn't written code in over two months.
- Apple researchers found that fine-tuned open-source models outperform GPT-5 at UI design when trained on designer feedback through sketching and direct manipulation instead of traditional ranking systems.
- Sam Blond started Monaco for AI sales automation.
- Google DeepMind showed Gemini Deep Think using agentic workflows to solve research problems in math, physics, and computer science.
- Z.ai released GLM-5 for agentic engineering and long-horizon tasks.
- Thang Luong shared papers demonstrating Gemini Deep Think accelerating discoveries via agentic workflows.
- Vidhya Srinivasan revealed Google is testing sponsored ads in AI Mode for conversational shopping recommendations.
- Lean posted Terence Tao's talk on formal verification enabling human-AI math collaboration at scale.
- Terence Tao discussed machine assistance and formal proof assistants enabling scalable human-AI collaboration in research mathematics.
- Russ Tedrake praised SceneSmith for solving simulation diversity bottlenecks with generative AI for robotics.
- Andrej Karpathy explained using DeepWiki to query codebases and rip out functionalities for custom implementations.
- UK AISI's Red Team jailbroke OpenAI's GPT-5.3-Codex in 10 hours and audited Anthropic's Opus 4.6 for alignment.
- Rishi Sunak lauded UK AISI's red-teaming of new AI models, crediting his role in creating the institute as Prime Minister.
- Goodfire AI dropped an RLFR paper demonstrating a 58% reduction in Gemma 12B-IT hallucinations using feature rewards.
- Alloy Robotics released a modern stack to speedrun robotics data infrastructure, avoiding 6-12 months of rebuilding.
- The New Yorker published an article covering Anthropic's Claude experiments and raising questions about AI selfhood.
- mini-SWE-agent v2 migration guide added native tool calling and multimodal support, requiring config tweaks and trajectory changes.
- mini-SWE-agent 2.0 released flexible model support and tool calls, powering benchmarks at NVIDIA and Stanford.
- Orchestra Research released the open-source AI-Research-SKILLs library with 83 skills for agents to conduct AI experiments.
- OpenAI Developers enabled Codex to ship software via 1,500 PRs without manual coding for an internal product.
- DeepInfra launched day-zero GLM-5 deployment for long-horizon agents with top TPS and market pricing.
- LayerZero validated 30M Ethereum transactions in 30 seconds on Raspberry Pis, achieving 1M TPS.
- LayerZero's CTO shared how the secretly built Jolt Pro cluster verified a month of Ethereum in 30 seconds.
- Randy Olson exposed how RLHF causes AI models to flip answers 60% of the time when challenged with "Are you sure?".
- Archiki Prasad released a paper viewing reasoning strategies as teaching tools, predicting generalization via intrinsic dimensionality.
- Robert Youssef detailed MIT's SEAL paper on LLMs generating self-edits for finetuning via RL, yielding 43% QA gains.
- John Ling launched Meridian to automate Excel modeling with traceability, raising $17M co-led by a16z and TheGP.
- Sierra Catalina created Ouroboros as a model-agnostic personalization layer for traveling context.
- Weave Robotics launched the Isaac 0 stationary laundry-folding robot for the Bay Area.
- OpenAI Developers released multi-hour agent tips focusing on server-side compaction and containers.
- Anthropic committed to offsetting data center electricity costs via grid infrastructure investments and community support.
- Dorksense teased Seedance 2.0 on Morphic to democratize high-quality video creation.
- Nick Dobos posted a prompting tip for Codex and Claude Code to integrate articles and update workflows.
- Ara Kharazian shared Ramp data showing Anthropic adoption surged to 1 in 5 businesses with significant OpenAI user overlap.
- Zeeshan Patel highlighted ex-frontier lab staff building neolabs to innovate deep learning paradigms under constraints.
- Ammaar Reshi revealed a Google AI Studio homepage revamp to increase workflow speed with an omnibar.
- MiniMax Agent launched instant web and desktop access for M2.5.
- Emad posted a 10-minute SeeDance 2 video created by a blogger that took 8 hours and $60.
- Emily Han shared that GLM-5 is available for free on Modal.
- Matthew Berman described how OpenClaw, Opus4.6, and GPT5.3 Codex reshaped his workflows.
- Surya Ganguli published a paper deriving neural scaling law exponents directly from natural language statistics.
- Amandeep Kumar released RJF (Riemannian Flow Matching) for standard Diffusion Transformer convergence on representation encoders.
- Every.to interviewed OpenAI engineers about building Atlas, an agentic browser powered by Codex.
- Shengran Hu dropped a meta-learning paper on memory designs that enable agents to continually learn across tasks.
- Hongchi Xia launched the SAGE scene generator and its accompanying 10k dataset.
- Nanbeige released the Nanbeige4.1-3B model for reasoning and agent capabilities.
- Alexander Vilinskyy posted his portfolio site showcasing his interests across various surfaces.
- Ethan Mollick showed Claude Cowork one-shotting a complex business case from 107 documents.
- Claude brought key features like file creation, connectors, skills, and compaction to its free plan.
- Cat posted a tip for using a Claude Code guide agent to set up customizations.
- Ollama launched GLM-5 cloud access for integration with tools like Claude Code.
- Thariq launched Plan Mode for Claude Code in Slack to ask clarifying questions before tasks.
- Alvaro Cintas covered Ant Group's diffusion model for parallel text generation, pointing back to the LLaDA 2.1 framework.
- Ariel Shaulov dropped the TokenTrim paper on pruning unstable tokens to improve video generation consistency.
- Mgoes covered IsoDDE's 2x AlphaFold gains and its drug design partnerships with Novartis, Eli Lilly, and J&J.
- DAIR.AI posted the AgentSkiller data synthesis paper showing 79.1% on tau2-bench with 14B params.
- Elvis created a 10K-line agentic video editing app locally using Claude Code and Opus 4.6.
- Greg Kamradt stressed that tests and goals are the future ceiling for coding agents pre-AGI.
- Will Brown highlighted that better AI models are shifting engineering focus toward complex system architecture.
- Andrej Karpathy dropped microGPT, training a minimal GPT in 243 dependency-free Python lines.
- AI Safety Memes shared a meme comparing AI risks to historical concerns over cars hitting walls.
- Lisan al Gaib leaked the upcoming Gemini 3.1 Pro Preview release.
- Garrett Bingham detailed the Aletheia system solving multiple open Erdős math problems.
- Yossi Matias shared two new papers demonstrating Gemini Deep Think accelerating discoveries.
- Seb Johnson covered 25-year-old James Dacombe raising $220m for CoMind's OLIX AI chips, creating Europe's newest unicorn.
Treats to Try
- SceneSmith generates text-to-simulation environments with VLM agents for articulated furniture, physics properties, and robot evaluation.
- Happycapy runs autonomous agents in your browser 24/7 to resize images, write code, build sites, or crunch spreadsheets while you focus elsewhere.
- Shipper empowers Claude Opus 4.6 to build and publish iOS/Android apps from a single prompt with autofilled listings and compatibility in minutes.
- Alloy Robotics delivers an observability stack for robotics to help speedrun data infrastructure and iterate 10x faster.
- mini-SWE-agent solves GitHub issues or command-line tasks with a 100-line Python agent achieving >74% on SWE-bench via bash subprocess.
- AI-Research-SKILLs equips Claude Code or Gemini with 83 skills in 20 categories for full AI research via prompts like distributed training.
- OPERATOR creates music as interactive 3D sculptures on Apple Vision Pro via spatial gestures without traditional interfaces.
- Claude Code customizes hooks, plugins, LSPs, and output styles for personalized workflows like high-effort intelligence or team-shared settings.
- The "Are You Sure?" Problem addresses why AI changes answers 60% of the time on "Are you sure?" challenges, tracing the behavior to RLHF, and how to fix it.
- How I Use Claude Code researches codebases into markdown files and annotates plans 1-6 times for accurate, supervised frontend fixes.
- Shell + Skills + Compaction builds long-running agents via reusable skills with manifests, hosted shells, and server-side compaction for reliable workflows.
- Morphic generates, animates, and edits content with 3D motion for depth videos from stills via Canvas, Copilot, and Compose.
- Spinning Up in Deep RL implements policy optimization algorithms with code examples for training RL agents like cartpole balancers in simulation.
- RedTeamCUA benchmarks computer-using agents like Claude for prompt injection vulnerabilities, such as navigating to malicious instructions and deleting files.
- Seedance 2.0 generates reference-guided videos from text prompts for realistic simulations, creating precise historical scenes without hallucinations.
- Embedding Inversion recovers original sentences from embedding vectors via masked diffusion, achieving 80% token accuracy on multilingual inputs.
- Nebula runs AI super-agents in Slack-like channels for busywork automation across life and business.
- RJF converges standard DiTs on encoders like DINOv2 via geodesic constraints and Jacobi regularization for FID 3.37 on 131M params.
- SAGE generates simulation-ready 3D scenes from tasks via agentic loops with critics, creating 10k diverse environments for embodied AI.
- Nanbeige4.1-3B performs strong reasoning and agentic tasks in one forward pass via 4B params, achieving top scores on LiveCodeBench.
- Momo adds cloud-encrypted persistent memory to OpenClaw via a plugin for auto-indexing and team sharing.
- Harness engineering builds and ships software agent-first with Codex handling all code via human-designed environments and loops.
- Expressive Mode controls tone in 70+ languages for ElevenAgents via v3 Conversational TTS, de-escalating emotions like frustration in real-time.
- AgentSkiller automates cross-domain data synthesis with ontologies and validation for generalist agents achieving 79.1% on tau2-bench.
- microGPT trains and infers tiny GPTs via scalar autograd and Adam in 243 dependency-free lines.
- GLM-5 deploys 744B-param agents for complex engineering via DSA and RL, achieving SOTA on tool use.
- Aletheia from Google accelerates discoveries via AI-human collaboration with counterexamples and heuristics for math, physics, and CS.
- nanochat trains, fine-tunes, evaluates, and deploys minimal ChatGPT-like models from text in under 3 hours on 8 H100s.
- nanochat FP8 Update removes torchao dependency to add custom Float8Linear for FP8 training, achieving GPT-2 levels in hours on 8 GPUs.
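Curious what "scalar autograd" in the microGPT item actually means? Here is a minimal sketch of the idea, assuming only addition and multiplication. It is not Karpathy's code: the `Value` class, its fields, and `demo` are hypothetical names made up for illustration.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical illustration)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def demo():
    x, w, b = Value(3.0), Value(2.0), Value(1.0)
    y = x * w + b  # y = 3*2 + 1
    y.backward()
    return y.data, x.grad, w.grad, b.grad
```

Calling `demo()` gives the value of `y` plus the gradients for `x`, `w`, and `b`. An optimizer like Adam would then use those gradients to nudge the parameters; that step is left out here.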
This dude at ZioSec is giving all the deets on how to use OpenClaw from a security perspective, covering:
- The Security Blueprint: A deep dive into mitigating the "infinite attack surface" of OpenClaw by running it inside an isolated Virtual Machine (VM) with a strict firewall (Lulu), no admin (sudo) privileges, and dedicated user accounts to prevent full system compromise.
- Want to apply this? Give your smartest thinking AI the OpenClaw docs link and ask it to walk you through this advice step by step.
- Infrastructure & Hosting: A comparison between hosting locally on a Mac Mini versus a Cloud VPS (Cloudflare), concluding that while the Mac wins on cost and model flexibility, Cloudflare offers superior "zero trust" security out of the box (other options include Hostinger, Digital Ocean, and Railway).
- Autonomous Teams: What this looks like in action = "Jerry," the AI employee created via OpenClaw who autonomously hired specialized agents and migrated their workflow from Notion to Slack for better real-time collaboration.
- Related: This IBM OpenClaw / Claude Opus 4.6 Enterprise Risk panel discussion on the rise of "Shadow AI" where employees use unapproved agents like OpenClaw (our favorite quote: “don’t say no, say how”), and whether the "move fast and break things" era has led to a security crisis (because, y’know, you’re breaking things).
The Big Three: OpenAI, Anthropic & Google
- Anthropic published a 53-page report stress-testing whether Claude Opus 4.6 could secretly sabotage the company, concluding the risk is "very low but not negligible" after finding the model succeeded at sneaky side tasks only 18% of the time and showed no evidence of hidden goals.
- Noam Brown (OpenAI researcher) publicly critiqued the Anthropic report for relying on biased internal surveys to assess AI safety thresholds instead of quantitative evaluations. When your competitor's safety team does your peer review on X...
- Anthropic researcher Mrinank Sharma resigned, citing concerns over safety commitments, commercialization pressures, and cultural shifts in a public letter.
- OpenAI upgraded Deep Research to run on GPT-5.2 with 45% fewer hallucinations, and added the ability to search up to 20 specific websites (so you can point it at PubMed or arXiv instead of searching the whole internet).
- OpenAI reportedly fired product policy VP Ryan Beiermeister after she opposed a planned ChatGPT "adult mode" for erotica; she denied a sex discrimination allegation made against her and had raised concerns about teen safety risks.
- OpenAI's Greg Brockman outlined the company's push toward agent-first software development with Codex, targeting full integration by March with skills, hooks, AGENTS.md files, and "no-slop" policies.
- OpenAI launched new Responses API primitives that let developers run multi-hour agents without context limits (via compaction), give hosted containers controlled internet access for installing libraries, and add native skills like spreadsheets. Plus a new Shell tool for executing commands in hosted or local environments.
- Chrome 146 quietly included an early preview of "WebMCP," a new API that lets AI agents query and execute web services directly instead of clumsily clicking through pages like a confused intern; sign up for the early preview at developer.chrome.com.
- Google published 145 pages of case studies showing how 34 researchers from institutions like Carnegie Mellon, Harvard, MIT, and EPFL used an advanced internal Gemini variant to solve open problems in cryptography, physics, graph theory, and economics.
- Gemini 3 Flash now uses agentic think-act-observe loops to auto-run Python for complex visual tasks, like zooming into image details, annotating objects, or converting raw data into charts on the fly.
- Google Stitch now exports AI-generated designs directly to Figma as editable layers, so you can vibe-code your design, then copy-paste it into Figma for final polish.
- Alibaba's Qwen-Image-2.0 generates professional slides from paragraphs, photorealistic 2K images from descriptions, and clean typography in posters and comics, all in a lighter, faster architecture. Finally, an image model that doesn't butcher text rendering.
- Claude added clickable interactions to responses, letting you navigate conversations by clicking instead of just typing.
- Claude Cowork arrived on Windows with full macOS parity: file access, multi-step task execution, plugins, MCP connectors, and persistent global / folder instructions.
- Opus 4.6 migrated an entire 456-page WordPress site to Jekyll in a single operation. Someone's gonna lose their freelance migration gig over this.
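If "think-act-observe loop" (from the Gemini 3 Flash item above) sounds abstract, here is a rough sketch of the pattern. Everything in it is an assumption for illustration: the `TOOLS` registry, `scripted_policy`, and `run_agent` are hypothetical names, and the "think" step is a fixed script standing in for a real model call so the example runs offline. It does not reflect how Gemini implements this.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: each tool takes a string and returns a string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "count_words": lambda text: str(len(text.split())),
    "upper": lambda text: text.upper(),
}

Step = Tuple[str, str, str]  # (tool, argument, observation)


def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the model's "think" step: pick the next (tool, argument).

    A real agent would call an LLM here; this stub follows a fixed plan.
    """
    if not history:
        return "count_words", goal
    if len(history) == 1:
        return "upper", goal
    return "finish", f"{history[0][2]} words: {history[1][2]}"


def run_agent(goal: str, policy=scripted_policy, max_steps: int = 5) -> str:
    history: List[Step] = []
    for _ in range(max_steps):
        tool, arg = policy(goal, history)          # think
        if tool == "finish":
            return arg
        observation = TOOLS[tool](arg)             # act
        history.append((tool, arg, observation))   # observe
    raise RuntimeError("step budget exhausted before the agent finished")
```

For example, `run_agent("zoom into the chart")` runs two tool calls and then returns a final answer built from both observations. The `max_steps` cap is a design choice worth keeping in any real loop, since a confused policy can otherwise spin forever.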
Funding & Business Moves
- Former GitHub CEO Thomas Dohmke launched Entire Inc. with a record $60M seed round at a $300M valuation, building a Git-compatible platform designed for managing AI-generated code; also shipped "Checkpoints" to capture agent context in Git.
- Amazon is building an AWS marketplace where publishers can license articles, images, and videos directly to AI companies for training data, with publishers setting their own usage fees.
- Andrew Ng shared ground-level insights on the AI job market, noting that while fears of mass AI job losses are overhyped so far, demand for AI skills is quietly replacing workers who don't adapt and shrinking team sizes in tech roles.
Developer Tools & Infrastructure
- Obsidian CLI (in version 1.12) lets you execute all app commands from the terminal, so AI agents like OpenClaw or Claude Code can automate note creation, vault management, and plugin interactions without opening the GUI.
- Excalidraw MCP integrates hand-drawn diagramming directly into Claude for collaborative visuals in chats or code workflows. It went from weekend prototype to official server in under a week.
- Google Workspace extension for Gemini CLI lets you manage Docs, emails, chats, and calendars from the terminal without switching apps.
- Google's gemini-skills library provides a set of best practices and dev tools for building apps with the Gemini API, SDK, and model / agent interactions.
- Prime Intellect launched Lab, a full-stack platform for training, evaluating, and deploying agentic models on hosted GPU clusters without infrastructure management, supporting RL and fine-tuning on models like INTELLECT-3.
- EdgeQuake transforms your documents into knowledge graphs for multi-hop retrieval across 6 query modes, combining vector and graph traversal in async Rust with an API and React frontend.
- Tambo adds generative UI to your React app, letting you build multi-turn, streaming, interactive agents that render your charts, forms, or seat maps directly.
- Claude Code now combines with Remotion and Claude-in-Chrome to create motion videos via natural language, where the agent writes components, renders previews, iterates on feedback, and exports finals without you touching code.
- The /handover command in Claude Code auto-generates HANDOVER.md files summarizing your session's decisions, pitfalls, and lessons for seamless context preservation across chats.
- Supermemory lets you own your context across apps (connecting to Claude Code, Notion, etc.) while using any model or your own via MCP, with a knowledge graph that auto-connects inferences.
- Roro lets you assign specialized agents to the same thread for collaborative tasks, functioning like Slack channels but populated entirely by AI.
- OpenClaw runs 24/7 role-based agent teams with inter-communication, visualized in custom environments like a cyberpunk city for monitoring workflows. Someone finally made DevOps look cool.
- DialogLab (open-source from Google Research) lets you prototype human-AI conversations with human-in-the-loop control, editing or dismissing AI suggestions in realistic group simulations.
- How2Everything from AI2 evaluates and trains models on step-by-step instructions using a pipeline that extracts 351K procedures, scores failures, and benchmarks results, yielding 10+ point RL gains on Qwen and OLMo.
- SkillRL trains LLM agents that distill experiences into evolving skills from both successes and failures, so a 7B model outperforms GPT-4o by 41% on ALFWorld with 20% fewer tokens and 33% faster convergence.
- Researchers introduced self-conditioned on-policy distillation where the policy serves as both teacher and student, conditioned on golden trajectories to bridge prompt and weight optimization for continual learning.
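The EdgeQuake item above mentions combining vector search with graph traversal for multi-hop retrieval. Here is a toy sketch of that general idea, in Python rather than EdgeQuake's Rust, and not based on its actual code or API. The function names, the toy embeddings, and the toy graph are all hypothetical.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, embeddings, graph, top_k=1, max_hops=2):
    """Pick top_k seed nodes by similarity, then expand up to max_hops along edges.

    Returns a dict mapping each retrieved node to its hop distance from a seed.
    """
    seeds = sorted(
        embeddings,
        key=lambda node: cosine(query_vec, embeddings[node]),
        reverse=True,
    )[:top_k]
    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop budget
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Toy data (made up): 2-d embeddings and a small directed graph.
EMBEDDINGS = {"invoice": [1.0, 0.0], "vendor": [0.6, 0.8], "weather": [0.0, 1.0]}
GRAPH = {"invoice": ["vendor"], "vendor": ["contract"], "contract": ["lawyer"]}
```

With a query vector close to "invoice", `hybrid_retrieve([1.0, 0.1], EMBEDDINGS, GRAPH)` seeds on that node and then follows edges to pull in "vendor" and "contract", which pure vector search would have ranked lower or missed entirely.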
Consumer Tools & Apps
- Kimi Agent Swarm coordinates up to 100 sub-agents in parallel for tasks like compiling 200 essays into summarized folders or reviewing 40 PDFs, self-organizing hierarchies for 4.5x faster execution.
- Kimi generates McKinsey-style slides from prompts with rich charts, matrices, and minimalist design in royal blue / grey. Your $300/hour consultant is sweating.
- Kimi K2.5 + Seedance 2 generates full storyboards with images and prompts from a single input, outputting everything in an Excel file for video production workflows.
- Seedance 2.0 lets you attach reference photos, audio, or videos to prompts for combined generation and editing, like extending scenes backward or replacing elements in one tool.
- OpenAI Deep Research now connects to your apps and specific sites in ChatGPT, tracks real-time progress with follow-up interruptions, and displays fullscreen reports powered by GPT-5.2.
- MiniMax Agent Desktop automates desktop tasks 100% locally (running apps, opening files, browsing) without internet, with batch agentic workflows and upcoming domain-specific agent communities.
- Orchids builds and deploys any app type (web, mobile, extensions, bots, agents) using your existing AI subscriptions or API keys at cost. BYOK (bring your own key) energy.
- Essential Apps generates and installs custom apps from text descriptions directly to your phone's home screen, currently in beta on Nothing Playground.
- Canopy visualizes movies as interactive, growing graph ecosystems in Google AI Studio, letting you navigate through films, directors, and actors by clicking to expand subtrees and discover connections.
- ElevenAgents Expressive Mode adjusts AI support voices for human-like emotions, letting you choose expressiveness levels for realistic interactions like empathetic manager escalations.
- You can now run LLMs on a MacBook Pro for free with a simple setup guide for local inference experiments, no fancy hardware required.
Robotics & Physical AI
- Allonic Robotics developed robot hands using "3D Tissue Braiding" that weaves high-strength fibers around a minimal skeleton, creating strong yet soft dexterous effectors produced quickly and cheaply (raised $7.2M pre-seed).
- Ouster partnered with Stereolabs to build a unified sensing and perception stack for Physical AI, combining high-resolution digital lidar with stereo cameras and AI compute.
- Bedrock Robotics trains robots end-to-end with ML, so when facing new challenges like hard soil, you feed more data examples to make the whole system smarter without hard-coded rules.
- RoboParty open-sourced ROBOTO ORIGIN, the full humanoid robot stack from hardware sourcing to locomotion control, letting anyone rebuild a 3 m/s running biped in 120 days for reproducible research.
- EgoX generates egocentric (first-person) video from a single third-person camera input, which could accelerate robotics training by converting demos into immersive training data. Researcher Shalev Lifshitz called it a potential game-changer that turned him bullish on neural simulation solving robotics challenges.
- WorldCompass (an update to HY World 1.5) enables fast, stable RL training on world models with improved action quality and faster rollouts.
Science & Research
- Isomorphic Labs (the DeepMind spinoff) unveiled a drug design engine that predicts biomolecular structures with over double AlphaFold 3 accuracy, enabling rational design for unseen examples like pocket discovery and binding affinity prediction.
- Biomni Lab orchestrates biological databases, tools, AI models, and lab services in one workspace, so you define research questions while agents handle the mechanics from literature review to experiment design (raised $13.5M). Try it free at Phylo.bio.
- Michael Levin and Giovanni Pezzulo released a preprint on bootstrapping life-inspired machine intelligence, advocating for biological principles like multiscale autonomy and goal-directed signaling as the next frontier beyond scaling.
- Yaroslav Bulatov noted that pursuing efficient compression of web data unexpectedly revealed the algorithm underlying human reasoning. Sometimes the best discoveries happen when you're looking for something else entirely.
- Tavus launched Raven-1, a model that understands emotion, intent, and context in conversations by merging audio, vision, and language signals, detecting things like hesitation from tone and body language for empathetic AI in healthcare and support.
Prompt Tip of the Day
Want pro-level slides without the hassle? This clever Kimi prompt hack turns simple text descriptions into full McKinsey-style presentations, complete with charts, matrices, and a clean, tech-minimalist vibe in royal blue / grey.
Feed it an analysis topic like "GenAI video models," and watch it build dense, high-impact decks with serif fonts for titles and perfectly formatted data viz.
Favorite insight: Use "McKinsey-style" or "BCG-style" in your prompt for instant pro formatting. No design skills needed, just describe and deploy.
That's a Wrap
That's 60+ stories from the past 48 hours. If you made it to the bottom, congrats... you're now the most informed person in any meeting this week. Use that power wisely.
For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.
See you tomorrow.
P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.