The Neuron Under the Hood Digest—June 2025

Supercharge your AI technical knowledge with The Neuron's June 2025 Under the Hood Digest. This month's lineup features hands-on breakdowns of Anthropic's multi-agent architecture, Meta's breakthrough physics-learning models, and compression techniques that squeeze 95% of the fat out of LLMs. Whether you're building production systems or pushing research boundaries, these deep-dive insights deliver real-world engineering impact—one breakthrough at a time.

Grant Harvey

July 29, 2024

Each month, our team hunts down the most ingenious, game-changing, and legitimately useful technical discoveries from the cutting edge of AI development—and explains why they're worth your attention. This is your monthly engineering intel, an inside look at the methodologies and architectural decisions that separate the AI systems that actually work from the ones that crash and burn.

These are engineering breakthroughs we've scrutinized, curated, and occasionally kicked ourselves for not inventing first. From context window hacks that unlock massive parallelization to compression wizardry that puts GPT-4 on a Raspberry Pi, June's digest is loaded with technical discoveries that move the needle.

🔧 Want more brainy deep-dives? Check out our May Intelligent Insights Digest here.

Fair warning: Don't just browse and bounce. Archive this stuff. Some of these techniques might seem esoteric now (hello, tool-testing agents), but they could fundamentally change how you approach AI system design.

Fire up your favorite IDE, stretch those debugging fingers, and get ready to level up your AI engineering game. June's technical insights are here—and they hit different.

‍

June 16, 2025

Anthropic shared their mult-agent research system; here’s Simon Willison’s take; the TL;DR is that their genius approach of using parallel sub-agents with separate 200k token contexts to overcome single-agent limitations convinced him of the system’s value (like using “tool-testing agents” to rewrite flawed tool descriptions and cutting research time by 90% through parallel execution).
Multiverse Computing compresses your AI models by up to 95% so they run 4-12x faster and cost 50-80% less, turning large language models into lightweight versions that can run on your phone, laptop, or even a Raspberry Pi (raised $215M).
Meta released V-JEPA 2, a 1.2B-parameter world model that learned physics from watching videos and can now control robots it's never seen before in brand new environments, needing just 62 hours of robot data to unlock zero-shot transfer that works 65-80% of the time for pick-and-place tasks (paper, code).
Sakana’s new text-to-LoRA tool lets you customize AI models by typing what you want it to be good at, like “help with math problems” or “write marketing copy,” and it creates a specialized version in seconds (paper).
A survey of 600+ software developers found that while 78% report productivity gains from AI coding tools, 76% refuse to ship AI-generated code without human review due to frequent hallucinations and trust issues.
A recent study estimated that AI wrote 30% of US Python code on GitHub by late 2024, worth an estimated $9.6-14.4 billion in annual value (paper).
Andrew Ng shared how he defines the “new breed of GenAI Application Engineers” and what skills make them so highly sought after. mastering AI building blocks (which stay relevant for years) matters more than chasing the latest coding assistants (which become obsolete in 1-2 years), plus the one interview question that predicts success in this highly sought-after position.

‍

June 4, 2025

AgenticSeek is a fully open AI agent that browses the web, codes, and plans tasks locally on your device (try this instead of paying cloud subscriptions).
Here’s how to master Claude Code in 30 minutes, and everything to know about OpenAI Codex in 5 minutes as well as a slightly longer, 19-min in-depth tutorial from Helena Liu.
If you’re going to get real serious about building AI agents, then you’re going to need to learn about building MCP servers—and this video explains that REAL well.
Also, Claude released some new agent-building capabilities in the API, as did Mistral.
Landing AI from Andrew Ng has a new document extractor agent—try it here.
GitHub Agent Mode completes entire coding projects for you by automatically editing multiple files, running terminal commands, and fixing errors as they happen (read more).
Are you a software engineer who wants to become an AI engineer? Then this podcast interview w/ Janvi Kalra is for you!
If you’re an AI engineer and “want to be 5 years ahead” in AI land, you could read all of DeepSeek’s papers.
This is another good X thread of “the top 50” AI papers since 2017.

‍

April 2025

Anthropic published their best practices for “agentic coding” with Claude—Simon Willison pointed out there’s a term called “ultrathink” that’s like a cheat code to get Claude to spend its 32K context window thinking before answering.
Orpheus turns your text into emotional speech that can instantly clone any voice, completely open-source and free.
Gemma 3 QAT shrinks Google’s high-performance Gemma AI models small enough to run on your home GPU (or on your phone w/ Google AI edge)—try it on ollama, Hugging Face, Kaggle and if you’re technical, read this + this.
This is super important: while LLMs are powerful tools for engineers, they risk atrophying problem-solving skills—so developing focus, not just AI proficiency, will be crucial for tackling novel challenges in the future.

‍

March 2025

LG released South Korea’s first “reasoning model, called EXAONE Deep-32B (GitHub)—it excels at complex math, decoding programming challenges, and tasks in both Korean and English (supposedly it outperform QwQ-32B, too).
SmolDocling converts your document images to structured text with all original formatting intact.
CloudFlare launched AI Labyrinth, a tool that traps unauthorized AI crawler bots by redirecting them to convincing but irrelevant AI-content, wasting their computational resources while protecting data and identifying malicious actors.
- Related: AI crawlers are overwhelming FOSS (Free and Open Source Software) infrastructure, so websites are fighting back with tools like Anubis that block AI crawlers.
Here’s a list of all the AI dev tools on the market right now—read more.

‍

February 2025

This ultra-scale playbook walks you through how to efficiently train large language models across hundreds or thousands of GPUs.
Hibiki converts your speech into another language instantly while keeping your original voice (code, models, paper)—currently only French to English atm.
Microsoft’s OmniParser-v2.0 turns screenshots into maps showing exactly what's clickable and what each button does, so your tools know how to navigate any interface.
McKay Wrigley released part one of a two-part series on how to build full-stack apps with o1-Pro through a 6-prompt workflow (~4 hour course FYI!).
Microsoft has a free AI Agents for Beginners course that teaches you how to code AI agents through 10 hands-on lessons (multiple languages supported).

Firecrawl, an AI startup that helps turns any website into organized data, went viral after it posted a job offering for an autonomous agent that can research trending AI models and build sample apps with them with a $15K salary.
Prototype helps you set up Django web applications instantly with Docker and OpenAI, by converting a text description into a working codebase (learn more here).
Nia helps you understand and navigate your codebase faster through smart search and Slack-based answers.
Kokoro turns your text into natural-sounding speech in 8 languages using a lightweight 82M parameter model that you can run anywhere.
HuggingFace created an open-source Deep Research tool, but they weren’t the only ones. Here’s five others:
1. Dzhng’s version (code).
2. Nickscamara’s version (code).
3. RLanceMartin’s version—works with DeepSeek R1 (code).
4. Hwchase17’s version, GPT Researcher (code).
5. Mattschumer’s version (code).

‍

January 2025

‍

Here’s a wild use of Deepseek: you can extract its reasoning and apply it to any other model (like GPT 3.5) via the deepseek-reasoner API—here’s how.
Onit lets you chat with AI models anywhere on your Mac, with support for local processing and multiple providers—currently Mac-only and early stage, so may be buggy (open-source ver).
Llama.vim is a plugin that provides AI-powered code completion similar to GitHub Copilot, but running locally on your computer (Neovim, VSCode).

In this video, Dave Ebbelaar reveals why most companies fail at building AI agents—instead of complex self-directing systems, 80% of AI applications just need simple workflows with predefined steps.
NVIDIA Blueprints are pre-built AI application templates that come with all the code and instructions you need to customize them—kind of like getting a partially assembled IKEA furniture set instead of raw lumber.
Anthropic released Citations as part of its API that lets Claude now cite what claims came from what sources in its output—very helpful for verifying data.

Check out this deep dive that covers the core concepts, tools, planning abilities, and evaluation methods of agents.
Here’s a curated list of resources for understanding how AI agents can take actions on your computer or phone.
Here’s a list of ready-made “agent recipes” inspired by this article from Anthropic on when (and when not) to use an agent.
Check out this 2025 AI engineer reading list, a collection of 50 essential papers across 10 key areas in AI engineering for practical study.
HuggingFace has a free agent course that teaches you how to build AI agents, and they recently released SmolAgents, their own minimalist framework for creating agents with just a few lines of code.

‍

2024

Check out Gemini’s Multimodal Live API: real-time audio/video streaming with tool integration (official docs here).
Rememberall helps you maintain continuous conversations with your GPTs by storing and recalling context from past chats when you mention @rememberall.

Hermes 3 lets you fine-tune how your AI assistant behaves without special prompting - it processes your natural requests with clear reasoning and works with external tools like data searches and code.
Command R7B helps you chat, code, answer questions and work with external data sources in 23 languages while taking up less computing power than bigger models.
Sakana AI developed a technique called CycleQD that reduces LLM memory usage by up to 75% using “neural attention memory modules” (paper here).
Prompt Wizard from Microsoft automates the creation and refinement of prompts for language models through a feedback-driven process that learns from its own improvements (code here).
‍Deepthought-8B is an 8B-parameter Llama-based model that documents its reasoning in JSON format, requires 16GB+ VRAM to run, and includes features like test-time compute scaling and programmable reasoning patterns (try here). HuggingFace link here, or chat with it here.

MagicQuill lets you edit images precisely by letting you draw what you want to add or remove, then automatically understanding and applying your intended changes (can run locally, should work on ~8GB of VRAM).
Here’s a ~19 min tutorial for how to 2x your output coding using o1 Pro.
ContextCite helps you verify generated content by tracking and highlighting which specific sources the AI used to create its responses (here’s a demo, read more about it here).

Meta released Llama 3.3-70B-Instruct, which matches their largest model's performance (405B) but runs on just 17% of the parameters, making it more efficient while outperforming competitors on key benchmarks.
Florence-VL outperforms other top multimodal AI models at understanding images, whether you're asking questions, reading text, or analyzing charts (demo it here).
SmolVLM is a compact multimodal model for image-text tasks requiring only 5GB GPU RAM.
Motion Prompting is a new method to generate videos by drawing motion paths that control how objects and cameras move in the scene.
FSQ OS Places is a dataset of 100M businesses locations worldwide, from their addresses to operating hours to social media profiles to power location-based services, market research, and mapping applications.

‍

Cake snaps together open-source AI tools like Legos, letting companies build and deploy custom AI projects fast (raised $13M).
Lightning Studios gives you ready-to-run AI project templates—click start and your project runs instantly on cloud GPUs.
Windsurf Editor combines VS Code with coding assistance. better terminal command handling, separate chat/code modes, and high-quality code outputs using their Cascade base model—all available in the free tier.
NVIDIA released a multimodal PDF data extraction tool featuring the Capability to process text, graphs, charts and tables—there’s also Docling.

‍

‍

Fileserver is a tool that interfaces with your computer's file system to perform common operations like reading, writing, moving, and searching files on your desktop from within Claude.
Researchers released a new multilingual LLM evaluation benchmark called INCLUDE covering 44 languages with a focus on regional knowledge to test if AI models understand local cultural contexts.
Realtime from Trigger.dev shows your users live progress updates while their tasks are running, like showing a progress bar when uploading files or streaming AI responses as they're generated.
Rerank 3.5 from Cohere finds the most relevant information in your search results by reordering documents based on how well they actually match your query.
Qwen Agent helps you build chatbot interfaces using Gradio 5, requiring Python 3.10 or higher for running the graphical user interface, and QwQ-32B-Preview reasons through complex problems with a 32K context window.

‍

‍

Really cool project - here are the details. Or, if you prefer, you can try it directly in their INTELLECT-1 chatbot here.
QwQ-Preview is Alibaba’s answer to OpenAI’s o1 reasoning model—you can run it on HuggingFace, and use it for commercial purposes under Apache 2.0, but it’s definitely a preview as it does some weird stuff like change languages.
1. Fun fact: OpenAI is now trying to trademark OpenAI o1 (paperwork here) to shield its IP from copycats.
AgentAuth helps you connect your AI agents with other apps.
Someone made an open-source version of OpenAI’s canvas (code here).
llama.cpp is an inference engine to run large language models locally on your own hardware (guide here), though many users prefer more user-friendly tools built on top of it like Ollama or LM Studio for simpler deployment.

‍

‍

MCP is an open-source protocol to connect data to any LLM. Full blog post here… and here’s a demo of why this is great.
LlaVA-o1 is an open-source vision model that uses GPT-o1's step-by-step reasoning approach, but for images instead of text—breaking down complex visual questions into logical steps to get better answers (paper).
There’s also Fireworks, who released f1 and f1-mini, new compound AI models that excel at complex reasoning tasks (like o1), with early access to the API via sign-up form.
Toolhouse adds capabilities like web search and code execution to your AI apps in three lines of code.
LTX-Video is an open-source tool that turns your text or images into high-quality videos in real-time, generating them faster than you can watch them (at 24 FPS).