Ai2's Olmo 3 isn't just a new state-of-the-art open model; it's a fully transparent "model flow" that releases every dataset, recipe, and checkpoint, finally giving developers the blueprint to build (and understand) their own reasoning engines.
Usually, when a new AI model drops, we get a flashy blog post and a set of "weights" (the final, trained brain of the AI). It’s like a chef giving you a delicious cake but refusing to show you the recipe, the ingredients, or the kitchen where it was baked.
Olmo 3 is different.
The team at the Allen Institute for AI (Ai2) just dropped Olmo 3, a family of 7B and 32B parameter models. But they didn’t just give us the cake. They gave us the farm where the wheat was grown, the blueprints for the oven, the chef’s diary, and the exact temperature of the kitchen.
This is a deep dive (based on their supporting technical paper) into the most transparent AI release in history. We are going to unpack everything—from the "Duplodocus" tool they built to deduplicate the internet, to the "Delta Learning" technique used to teach the model to think.
Buckle up. We’re going fully open-source.
Technical terms decoded in this issue:
Okay, let's get into it!
Here's the TL;DR: The Allen Institute for AI (Ai2) released Olmo 3, a new family of 7B and 32B models. Unlike "open weights" models (like Llama) that only give you the final product, Olmo 3 releases the entire "Model Flow"—including the data, training recipes, and intermediate checkpoints.
WHY IT’S IMPORTANT: This is a massive win for transparency. Most "open" models are black boxes—we don't know what data they were trained on. Olmo 3 allows developers to inspect the exact data that went into the model, making it the safest bet for compliance-heavy industries.
Ai2 isn't just giving you the data; they are giving you a flashlight to look inside the black box. They launched OlmoTrace, a tool integrated directly into their Playground. It finds spans of the model's output that appear verbatim in the training data and shows you the documents they came from. If the model hallucinates or states a weird fact, you can check right away where it might have picked it up. This closes the loop between "What the AI said" and "What the AI read."
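To make the idea concrete, here is a minimal, brute-force sketch of verbatim span matching in Python. It assumes a tiny in-memory corpus and whitespace tokenization. The function names `trace_output` and `contains_span` are hypothetical and are not part of OlmoTrace. The real tool searches trillions of tokens, which requires a proper index rather than this linear scan.

```python
def contains_span(doc_words: list[str], span: list[str]) -> bool:
    """True if `span` appears as a contiguous run of words inside `doc_words`."""
    k = len(span)
    return any(doc_words[j:j + k] == span for j in range(len(doc_words) - k + 1))


def trace_output(output: str, documents: list[str], min_words: int = 4) -> list[tuple[str, list[int]]]:
    """Greedily find long output spans that occur verbatim in at least one document.

    Returns (span_text, ids_of_documents_containing_it) pairs. Brute force:
    fine for a toy corpus, hopeless at pretraining scale.
    """
    out_words = output.lower().split()
    doc_words = [d.lower().split() for d in documents]
    spans: list[tuple[str, list[int]]] = []
    start = 0
    while start < len(out_words):
        end, hits = start, []
        # Extend the span one word at a time while some document still contains it.
        while end < len(out_words):
            candidate = out_words[start:end + 1]
            matching = [i for i, words in enumerate(doc_words) if contains_span(words, candidate)]
            if not matching:
                break
            end, hits = end + 1, matching
        if end - start >= min_words:
            spans.append((" ".join(out_words[start:end]), hits))
            start = end  # resume scanning after the matched span
        else:
            start += 1
    return spans


corpus = ["the capital of france is paris and it is lovely",
          "water boils at 100 degrees celsius at sea level"]
print(trace_output("I think the capital of france is paris obviously", corpus))
# [('the capital of france is paris', [0])]
```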
WHAT TO DO:
For Founders: Explore "mid-training." Because Ai2 released the intermediate checkpoints, you can insert your proprietary data during the training process (not just after), allowing for much deeper customization than standard fine-tuning.
How to use it: You can try it out on the Ai2 playground, or download the models here and run them with LM Studio. They're also on Hugging Face if you want to download and run them with another tool.
Now, let's dive into the full details from the technical report.
Most "Open Source" AI isn't actually open. Llama 3? That’s "Open Weights." You can use the model, but you don't know exactly what data trained it.
Olmo 3 introduces the concept of Model Flow. This is the full lifecycle of the model, including every stage, checkpoint, datapoint, and dependency.
Why does this matter?
If you want to customize an AI, usually you just tweak the final version. But with Olmo 3, you can intervene at any stage. Want to change how the model learns math? Go back to the "Midtraining" stage. Want to change how it filters safety data? Go to the "Post-training" stage.
The Family Portrait:
Let’s break down how they built this beast, step by step.
The Base model is the bedrock. It comes in 7B and 32B sizes. The goal? High performance across the board, but specifically designed to be "post-trainable"—meaning it’s primed to learn reasoning later on.
Training a model starts with data. Ai2 curated Dolma 3 Mix, a massive dataset of 5.9 trillion tokens.
Here is the secret sauce of Dolma 3:
The internet is full of garbage and repeated text. To fix this, the team built a custom tool called Duplodocus (written in Rust, because obviously). It performs deduplication in three stages:
Result: They shrank the web corpus by 75%, leaving only the high-quality stuff.
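The three stages aren't listed here, so what follows is only a generic illustration of two standard deduplication techniques: exact hashing and MinHash near-duplicate detection. It is not Duplodocus itself, which is written in Rust. The function names, the 5-word shingles, 64 hash functions, and the 0.8 similarity threshold are all assumptions chosen for readability.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Overlapping k-word windows; near-duplicate texts share most of them."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(text: str, num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    return [
        min(int(hashlib.md5(f"{seed}:{g}".encode()).hexdigest()[:16], 16) for g in grams)
        for seed in range(num_hashes)
    ]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """The fraction of matching MinHash slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    seen_digests: set[str] = set()
    kept: list[str] = []
    kept_signatures: list[list[int]] = []
    for doc in docs:
        # Exact duplicates share the same content hash.
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        # Near duplicates have highly similar MinHash signatures.
        signature = minhash_signature(doc)
        if any(estimated_jaccard(signature, s) >= threshold for s in kept_signatures):
            continue
        kept.append(doc)
        kept_signatures.append(signature)
    return kept
```

Comparing each new document against every kept one is quadratic. At web scale, pipelines bucket signatures with locality-sensitive hashing, so only likely matches are ever compared.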
Most models just filter out "bad" data. Olmo 3 does something smarter. They identified the highest quality data (using a classifier trained on OpenHermes and UltraChat) and upsampled it.
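Here is a minimal sketch of quality-based upsampling. It assumes each document already has a score from a quality classifier. The bucket count and the per-bucket repeat factors are hypothetical values for illustration, not Ai2's settings. A factor of 0 drops a bucket entirely, so one knob covers both filtering and upsampling.

```python
def upsample_by_quality(
    scored_docs: list[tuple[str, float]],
    repeats_per_bucket: tuple[int, ...] = (0, 1, 1, 2, 3),
) -> list[str]:
    """Split documents into equal-size quality buckets (lowest scores first)
    and repeat each document according to its bucket's factor."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1])
    n_buckets = len(repeats_per_bucket)
    mix: list[str] = []
    for rank, (doc, _score) in enumerate(ranked):
        bucket = rank * n_buckets // len(ranked)
        mix.extend([doc] * repeats_per_bucket[bucket])
    return mix


docs = [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.3), ("e", 0.7)]
print(upsample_by_quality(docs))
# ['d', 'c', 'e', 'e', 'b', 'b', 'b']
```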
PDFs are usually a nightmare for AI. They are visual soup. Ai2 used a tool called olmOCR. Instead of just copying the text, it renders the PDF as an image and uses a vision model to extract the text, preserving the structure, math formulas, and layout. This unlocked a massive dataset of scientific papers whose content plain text extraction would have mangled.
The model architecture is standard (Decoder-only Transformer), but with a twist to handle long conversations:
The Result:
Olmo 3 Base 7B outperforms Llama 3.1 8B and Qwen 2.5 7B on math and code, while Olmo 3 Base 32B rivals the leaders of the 32B parameter class.
You might notice Olmo 3 compares itself to Qwen 3 VL in the benchmarks. Why compare a text model to a Vision-Language model? In this Reddit thread, the Olmo authors revealed a pro-tip: Qwen 3 VL is "secretly an amazing text-only model," especially at the 32B size. By beating it, Olmo 3 isn't just winning against other text models; it's beating the best multimodal heavyweights, too.
After pretraining on the generic web, the model is smart, but unfocused. Enter Stage 2: Midtraining.
They trained for another 100 billion tokens on a specific mix called Dolma 3 Dolmino Mix. This stage is all about "Capability Boosts."
How do you know if a new dataset is good without spending $100k training a model? You use Microannealing.
The team found that existing open datasets had restrictive licenses (often because they were generated by Llama, which has a specific license). So, they recreated them:
This is vital. A lot of AI models "cheat" by accidentally training on the test questions.
Ai2 ran a massive N-gram Decontamination sweep. They scanned their training data for any 8-word sequences that matched the benchmark questions (like GSM8K or MMLU) and deleted them.
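Here is a minimal sketch of 8-gram decontamination over lowercased, whitespace-split words. The function names are hypothetical. This version drops any training document that shares an 8-word sequence with a benchmark item. The text doesn't specify whether Ai2 removed whole documents or only the overlapping spans, and real pipelines also normalize punctuation before matching.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All contiguous n-word sequences in the text (empty if it is shorter than n)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def decontaminate(train_docs: list[str], benchmark_items: list[str], n: int = 8) -> list[str]:
    """Keep only training documents that share no n-gram with any benchmark item."""
    contaminated: set[tuple[str, ...]] = set()
    for item in benchmark_items:
        contaminated |= ngrams(item, n)
    return [doc for doc in train_docs if not (ngrams(doc, n) & contaminated)]
```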
Most models get confused if you paste a 50-page document. Olmo 3 extends its context window from 8,192 tokens to 65,536 tokens (64k).
The Recipe for Long Context:
The Result: On the RULER benchmark (the gold standard for long context), Olmo 3 32B scores a 96.1 at 4k length and holds strong up to 64k, beating Llama 3.1 8B and Apertus.
This is the crown jewel. Olmo 3 Think is designed to compete with "reasoning" models that "think" before they speak (generating hidden chains of thought).
This uses a three-stage post-training recipe: SFT → DPO → RLVR.
They curated a dataset called Dolci Think.
This is where it gets technical and fascinating. They used DPO (Direct Preference Optimization).
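For reference, the standard DPO objective measures how much more the model being trained (the policy) favors the chosen answer over the rejected one, relative to a frozen reference model. Below is a scalar Python sketch of that per-pair loss. The inputs are summed log-probabilities of each full response. `beta = 0.1` is an illustrative default and not necessarily the value Ai2 used.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_log_ratio - rejected_log_ratio)
    # Numerically stable softplus(-z) = log(1 + exp(-z)).
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))
```

When the policy matches the reference exactly, the loss is log 2 (about 0.693). It falls as the policy shifts probability toward the chosen answer and away from the rejected one.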
The Innovation: Delta Learning 📉
Usually, you want to show the model a Great Answer and a Good Answer. Ai2 found that didn't work well for reasoning.
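Instead, they used Delta Learning: They pair a Great Answer (from a smart model like Qwen 32B) with a Terrible Answer (from a tiny model like Qwen 0.6B). The model learns from the quality gap (the "delta") between the two answers, and a wide gap gives a clear signal about what better reasoning looks like.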
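Building the Delta Learning pairs is conceptually simple. In this sketch, the strong and weak models are hypothetical stand-in callables. In practice they would be something like the larger and smaller Qwen models named above. Pairs where both models produce the same answer are skipped because they carry no preference signal. That filter is our assumption, not a documented Ai2 step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response from the strong model
    rejected: str  # response from the weak model


def build_delta_pairs(
    prompts: list[str],
    strong_model: Callable[[str], str],
    weak_model: Callable[[str], str],
) -> list[PreferencePair]:
    pairs = []
    for prompt in prompts:
        chosen, rejected = strong_model(prompt), weak_model(prompt)
        if chosen.strip() == rejected.strip():
            continue  # identical answers give DPO nothing to contrast
        pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs
```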
Reinforcement Learning (RL) is hard when the reward is subjective. "Write a funny poem" is hard to grade. "Solve this math problem" is easy to grade.
Olmo 3 Think uses RLVR (Reinforcement Learning with Verifiable Rewards).
The Infrastructure: OlmoRL
To train this, they built OlmoRL, enabling Active Sampling.
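Active Sampling isn't spelled out here, so the following is an assumption about the general idea, not OlmoRL's code. With group-relative advantages (as in GRPO), a prompt whose sampled answers all get the same reward (all right or all wrong) contributes zero advantage. The sampler therefore keeps drawing prompts until the batch is filled with informative groups. `active_sample_batch`, `rollout`, and `reward_fn` are hypothetical names.

```python
from typing import Callable, Iterable


def active_sample_batch(
    prompt_stream: Iterable[tuple[str, str]],
    rollout: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    batch_size: int,
    group_size: int,
) -> list[tuple[str, list[str], list[float]]]:
    """Collect `batch_size` prompt groups whose rewards are not all identical."""
    batch = []
    for prompt, answer in prompt_stream:
        completions = [rollout(prompt) for _ in range(group_size)]
        rewards = [reward_fn(c, answer) for c in completions]
        if max(rewards) == min(rewards):
            # Every sample scored the same, so each advantage (reward minus group mean) is zero.
            continue
        batch.append((prompt, completions, rewards))
        if len(batch) == batch_size:
            break
    return batch
```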
Key Finding: RL works best when applied after DPO. If you skip DPO, the model isn't "primed" enough to learn from the RL signal.
While "Think" models are cool, sometimes you just want a quick answer. Olmo 3 Instruct is built for speed and utility.
A modern assistant needs to use tools (calculators, web search). Ai2 trained Olmo 3 Instruct on two types of data:
Here is a weird quirk of AI: Models learn that "Longer = Better." DPO tends to make models yammer on forever.
For the Instruct model, Ai2 applied Length Control. During the DPO phase, they penalized the model if the "Chosen" answer was significantly longer than the "Rejected" answer.
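Here is one simple way to enforce this kind of length control, sketched as an assumption rather than Ai2's exact mechanism. Filter the preference data before DPO by dropping any pair whose chosen answer is much longer than its rejected answer. That way, length can't become the signal the model latches onto. The 100-word cutoff and the use of word counts instead of tokens are assumptions.

```python
def length_controlled_pairs(
    pairs: list[tuple[str, str, str]],
    max_extra_words: int = 100,
) -> list[tuple[str, str, str]]:
    """Keep (prompt, chosen, rejected) triples whose chosen answer is at most
    `max_extra_words` longer than the rejected answer."""
    kept = []
    for prompt, chosen, rejected in pairs:
        extra_words = len(chosen.split()) - len(rejected.split())
        if extra_words <= max_extra_words:
            kept.append((prompt, chosen, rejected))
    return kept
```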
Here is the most scientifically interesting part of the release.
Olmo 3 RL-Zero is a model trained via Reinforcement Learning directly from the Base model, skipping SFT entirely.
Why do this?
DeepSeek proved you can get reasoning behaviors purely from RL (DeepSeek R1-Zero). Ai2 wanted to reproduce this in a fully open environment to prove it wasn't a fluke or the result of data contamination.
The Experiment:
The "Spurious Reward" Check:
To prove the model wasn't just memorizing answers, they ran a "Negative Control." They gave the model Random Rewards (rewarding it for nothing). The model did not improve, which is strong evidence that the RL-Zero gains come from genuine learning, not dataset leakage.
Check out this livestream video they did with Hugging Face for the launch for more juicy insights.
This release is all about the code that built them.
Olmo-Core:
They released the training code. It is fast.
Usually, the cost to train these models is a closely guarded corporate secret. Because Ai2 is fully open, the authors spilled the beans on Reddit.
If you're wondering where the Mixture of Experts (MoE) version is, the Ai2 team confirmed on Reddit that it is actively on the roadmap. (MoE is an efficient architecture that routes each token to a small subset of "expert" sub-networks instead of activating the entire network.) One researcher called it "one of my regrets" that it didn't land this year, so expect an efficient, sparse Olmo-MoE in 2026.
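Here is a toy sketch of top-k routing for a single token. It assumes the router's scores are already computed; in a real MoE layer, a learned linear layer produces them from the token's hidden state. Each "expert" is just a function here. Only the k selected experts run, which is where the compute savings come from. The names and the k=2 default are illustrative, not a description of any planned Olmo design.

```python
import math
from typing import Callable


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: float,
    router_logits: list[float],
    experts: list[Callable[[float], float]],
    k: int = 2,
) -> tuple[float, list[int]]:
    """Send the token to its top-k experts and mix their outputs by renormalized gate weights."""
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    output = sum(g * experts[i](token) for g, i in zip(gates, top))
    return output, top
```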
Olmo 3 Think-32B is currently the strongest fully open thinking model on the planet. It beats Qwen 2.5-32B-Instruct. It beats Google's Gemma 2 27B. It narrows the gap to the closed-source giants.
But more importantly, Ai2 just handed the keys to the kingdom to every researcher, developer, and student. You don't just get the brain; you get the memories, the textbooks, and the teachers.
Open Source just got a whole lot more open. And for that, the entire world should be thankful.
Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved