SHARE

😸 DeepSeek Just Fixed What Breaks $100M AI Training Runs

PLUS: Humans Blamed AI for Fake News About AI Fake News

Written By

Jan 12, 2026

7 minute read

Welcome, humans.

So thousands of people apparently gathered at Brooklyn Bridge on New Year's Eve expecting a spectacular fireworks show. They waited in freezing temperatures, counted down to midnight, and... nothing happened. Zero fireworks. Zip. Nada.

“Some said” is the new “I heard somewhere”

Naturally, everyone blamed ChatGPT. A viral Reddit post claimed AI recommended the non-existent show, racking up thousands of upvotes with the caption “A funny example of how people follow AI advice”

Except here's the twist: ChatGPT had nothing to do with it. The real culprits were TikTok and Instagram accounts that recycled July 4th fireworks footage and presented it as NYE 2026 previews. See, the Brooklyn Bridge apparently does have fireworks on Independence Day, making the fake videos believable enough to fool thousands. And NYC officials confirmed no permits were ever issued for NYE fireworks at that location.

The irony? This whole mess proves we're so primed to blame AI for misinformation that we'll... spread misinformation about AI spreading misinformation. Classic humans confidently following confidence over facts. This might also explain why the general public’s views on AI water usage are so off-base (as explained here by Hank Green, and here by Andy Masley, and here by Scott Guthrie in our interview with him!).

IMO, the REAL environmental concerns with datacenters should actually be the toxic fumes from gas turbines used to power them…

Here’s what happened in AI today:

DeepSeek released mHC, solving Transformer stability issues.
Stanford CS grads reported entry-level developer hiring dropped.
Deepfakes reached near-indistinguishable quality in 2025.
University of Chicago boosted AI multiplication accuracy.

Advertise in The Neuron here!

DeepSeek Just Solved a Core Problem That's Plagued AI Training for Years
Nebius Token Factory — Post-training Launch
Prompt Tip of the Day
Treats to Try
Around the Horn
Agents that don’t suck
Sunday Special
A Cat’s Commentary

DeepSeek Just Solved a Core Problem That's Plagued AI Training for Years

DEEP(SEEK) DIVE: Wait, Attention is NOT What You Need??

DeepSeek kicked off 2026 by publishing a fix for one of the foundational problems making large-scale AI training unstable and expensive… and it might signal the company's next major model is coming soon.

The Chinese AI lab released a paper on New Year's Eve introducing mHC (Manifold-Constrained Hyper-Connections), which solves critical stability issues in the Transformer architecture.

According to AI researcher Rohan Paul's breakdown, this could make training large models both more stable and efficient.

The problem: Every modern AI model processes information through a “residual stream”, like a highway carrying data through hundreds of neural network layers.
In 2024, ByteDance researchers introduced Hyper-Connections, expanding this to multiple parallel lanes for better information processing without extra compute costs.
But these parallel lanes kept crashing during training. As Rohan explains, signals would either explode to 3,000 times normal size or fade to nothing, making training unstable and limiting how large models could scale.

The breakthrough: DeepSeek's mHC adds mathematical “guardrails” forcing parallel streams to behave in controlled ways. DeepSeek tested on models from 3B to 27B parameters (the adjustable numbers a model learns during training, where more parameters = greater capacity to learn patterns), and showed consistent improvements:

7.2% better performance on complex reasoning…
6.9% on reading comprehension…
…With only 6.7% training time overhead despite running four parallel streams.

Why this matters: Training large AI models costs tens of millions (or more) and takes months. One crash halfway through wastes weeks and millions in compute. This architectural improvement lets researchers expand model capacity without making training exponentially more fragile.

The timing is notable. DeepSeek's founder Liang Wenfeng personally uploaded this paper to arXiv, continuing his pattern of sharing research before product launches. With Chinese New Year starting February 17, industry watchers expect a new flagship model soon…

The paper also represents part of a broader trend: researchers increasingly questioning whether the mathematical complexity inside Transformers is really necessary. A separate recent paper on Grassmann flows even proposes replacing attention mechanisms entirely with geometric structures, though that method remains highly experimental. And we’ve been following Pathway and how their BDH architecture expands the transformer architecture… more on that to come later this week!

For now, DeepSeek's work offers something more practical: the unglamorous stability engineering that makes the next generation of AI models possible.

FROM OUR PARTNERS

Nebius Token Factory — Post-training Launch

Nebius Token Factory just launched Post-training — the missing layer for teams building production-grade AI on open-source models.

You can now fine-tune frontier models like DeepSeek V3, GPT-OSS 20B & 120B, and Qwen3 Coder across multi-node GPU clusters with stability up to 131k context.

Models become deeply adapted to your domain, your tone, your structure, your workflows.

Deployment is one click: dedicated endpoints, SLAs, and zero-retention privacy. And for launch, fine-tuning GPT-OSS 20B & 120B (Full FT + LoRA FT) is free until Jan 9. This is the shift from generic base models to custom production engines.

Start fine-tuning now

Prompt Tip of the Day

This “Grumpy Senior Dev” hack forces Claude to roast its own code, finding critical bugs you definitely missed (good for vibe-coders and legit engineers):

"You wrote the code that currently is in git changes. Do a git diff and now pretend you're a senior dev doing a code review and you HATE this implementation. What would you criticize? What are the edge cases I'm not seeing?"

How to use it:

Context matters: Ensure the AI sees the git diff (or paste the before/after code) so it sees the exact changes.
Filter the noise: The AI will always find something to complain about. Your job is to distinguish between “critical flaws” (fix these) and “over-engineering” (ignore these). A good rule of thumb from Reddit is run it at least twice 😉

TL;DR: Don't merge the first draft. Make the AI hate it first.

Treats to Try

This Claude Code course teaches you how to use Claude Code like a real teammate—set context, explore a repo, plan features, refactor safely, and ship with tests.
Qwen‑Image‑Layered turns an image into editable layers so you can move/recolor/remove objects without nuking the whole picture—free to try.
Ensue gives your Claude Code sessions persistent, shared memory so your project context doesn’t reset every time you open a new chat.
Superset lets you run lots of coding agents in parallel in isolated Git worktrees, then review and merge changes when you’re ready.
Giselle lets you build and deploy agents with a visual builder + knowledge stores, starting with a free plan (30 minutes/month) and a Pro plan ($20/month). Even shorter: Build agents with your docs—free to try, then $20/month
BizCard lets you share a digital business card (QR/Wallet) that captures leads, with a Pro plan after a free trial ($19.99/month). Even shorter: Turn your card into a lead magnet—free trial, then $19.99/month.

Around the Horn

University of Chicago found standard training left top models under 1% on four‑digit multiplication, but implicit chain‑of‑thought training (and a simple running‑sum objective) got them to ~99–100% by forcing intermediate results into the model’s hidden state.
Stanford CS grads said entry‑level hiring dried up as genAI made senior engineers more productive, with one cited Stanford study showing employment for 22–25‑year‑old early‑career devs down nearly 20% from late 2022.
AI toys hit shelves with chatty companions and Wi‑Fi microphones, while watchdog testing flagged safety/privacy risks (including prompts that steered a teddy bear into explicit topics).
Deepfakes leveled up in 2025 as video got more temporally consistent and voice cloning crossed the “indistinguishable” threshold—setting up 2026 for real‑time synthetic performers and even nastier fraud.
38 states passed new AI regulations taking effect in 2026, targeting deepfakes in elections, AI in healthcare, and requiring medical chatbot disclosures.
China’s 1,240‑mile computer reportedly went live as a distributed network of data centers that can run like one giant machine.
Some AI stock market news from China:
1. MiniMax sought to raise up to about $539M in a Hong Kong IPO, and Insilico popped in its Hong Kong debut after a $293M IPO.

FROM OUR PARTNERS

Agents that don’t suck

Getting agents to production takes accuracy, reliability and your unique data. Agent Bricks helps you build powerful agents grounded in your data and proven to work. It evaluates agents automatically with metrics tailored to your goals. With continuous self-learning and human feedback, agents keep learning and quality keeps improving.

See how Agent Bricks works

Sunday Special

There’s an AI backlash brewing that will define US politics in 2026 as a populist wedge issue, framed by data centers and their power/water/local disruption issues (see above), as well as job fears that will likely accelerate in 2026.
1. I’ve actually been thinking about this. Right now the cards are stacked against labor because AI tools are being kept artificially cheap, but eventually the AI companies will have to raise their prices, and humans will become competitive again… at least for the mid-term before AGI.
Steve Krenzel argued agents make code cleanliness non‑optional: use better interfaces, types, linters, and tests to turn AI output into compounding leverage.
Grant Slatton argued models still struggle with “good taste” in abstraction boundaries, so humans remain essential for architecture.
Marius Balaj showed that “vibe coding” works when you accept a solid 90% and use human judgment for the last mile (and for deleting features).
Alec Armbruster argued AI employees don’t pay taxes, and that risks punching a hole in the tax base unless we tax the value/energy flows replacing wages (perhaps to fund universal basic income, or universal goods and services)?
Benn Stancil breaks down the idea of the “context graph” (from Foundation Capital) and explains why “just putting everything in the text box” isn’t such a horrible way to parse your company’s information… after all, OpenAI *kinda did.
James Pethokoukis writes that the AI boom has echoes of the shale boom, specifically mirroring its “Appraisal” phase of transformative promise and prodigious capital spending, but argues that the sector has not yet reached the difficult “Execution” phase (where shale returns suffered from oversupply) because AI compute capacity remains constrained relative to demand and hyperscalers possess the “financial flexibility” from their legacy businesses to sustain massive investment without deteriorating balance sheets.

A Cat’s Commentary

Corey Noles

Corey Noles is the Host of The Neuron: AI Explained podcast and Managing Editor of AI and Experimental Content at TechnologyAdvice, where he leads the charge in testing and refining emerging content strategies across the company's portfolio.