China's AI Giants Drop Two Trillion-Parameter Models in One Weekend
The weekend brought a surprise AI arms race between Chinese labs, with both Alibaba and Moonshot AI releasing trillion-parameter models that claim to rival or beat Western competitors.
First up: Alibaba previewed Qwen3-Max-Preview, their biggest model yet with over 1 trillion parameters (that’s a lotta params, fam). According to their benchmarks, it beats their previous heavyweight Qwen3-235B-A22B-2507 across the board—better conversations, stronger instruction following, and improved agentic tasks.

You can try it yourself via Qwen Chat or the Alibaba Cloud playground (heads up: they likely train on whatever data you share in the playground). It's also live on OpenRouter for those who prefer a unified interface. Pricing is tiered by input length (i.e., how much context you send): first 32K tokens = $0.861 per million input tokens, 32K-128K = $1.434, 128K-252K = $2.151.
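If you're estimating spend, the tier math is simple. Here's a minimal sketch, assuming (as Alibaba Cloud's tiered pricing usually works) that the listed rates are USD per million input tokens and that the rate is picked by the request's total input size rather than charged marginally per tier:

```python
# Hedged sketch of Qwen3-Max-Preview input-cost estimation.
# Assumption: rates are USD per 1M input tokens, selected by the
# request's total input size (not billed marginally across tiers).

TIERS = [  # (max input tokens for this tier, USD per 1M input tokens)
    (32_000, 0.861),
    (128_000, 1.434),
    (252_000, 2.151),
]

def input_cost_usd(input_tokens: int) -> float:
    """Cost of one request's input, using the rate for its size tier."""
    for ceiling, rate_per_million in TIERS:
        if input_tokens <= ceiling:
            return input_tokens * rate_per_million / 1_000_000
    raise ValueError("input exceeds the 252K-token limit")

print(f"${input_cost_usd(10_000):.6f}")   # 10K-token prompt, $0.861 tier
print(f"${input_cost_usd(200_000):.6f}")  # 200K-token prompt, $2.151 tier
```

So a 200K-token prompt costs roughly 50x a 10K-token one: bigger context, steeper rate.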
The catch?
A couple: Unlike some competitors, Qwen3-Max has no “thinking” mode. Also, Simon Willison noted it's not open weights—only available through their chat app and paid API.
Early demos show it handling complex visual tasks like one-shotting a voxel pagoda garden. And when Willison tested it with his O.G. prompt, “Generate an SVG of a pelican riding a bicycle,” the results were... interesting.
Susan Zhang pointed out some hallucination issues, too: “it certainly hallucinates extensive thinking traces... mixing a bunch of search results that don't seem consistent with one another.”
Meanwhile, Moonshot AI wasn't sitting idle.
They released Kimi K2-Instruct-0905, a mixture-of-experts model with 32 billion activated parameters from a total of 1 trillion (read more).

Kimi's upgrades:
- Context window extended from 128K to 256K tokens.
- Enhanced frontend coding capabilities.
- Improved tool-calling accuracy (claims 100% on their turbo API).
- Better integration with coding agents like Claude Code and Roo Code.
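That tool-calling story is why K2 slots so neatly into coding agents: Moonshot's API is OpenAI-compatible, so agents hand it tools via the familiar function-calling schema. A minimal sketch of such a request body; the model id and the `run_shell` tool are hypothetical examples for illustration, so check Moonshot's docs for the real values:

```python
import json

# Hypothetical tool in the OpenAI-style function-calling schema.
# Coding agents describe their shell/file tools to the model this way.
RUN_SHELL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical tool name, for illustration
        "description": "Run a shell command and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run"},
            },
            "required": ["command"],
        },
    },
}

def k2_tool_request(user_prompt: str) -> dict:
    """Build a chat-completion request body that offers K2 the shell tool."""
    return {
        "model": "kimi-k2-0905",  # assumed model id; verify against the docs
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [RUN_SHELL_TOOL],
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

print(json.dumps(k2_tool_request("List the files in the repo."), indent=2))
```

POST that JSON to the chat-completions endpoint with your API key, and a tool call comes back as a structured `tool_calls` entry rather than free text, which is exactly what the accuracy claim above is measuring.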
The benchmarks are impressive: 69.2% on SWE-Bench Verified, 55.9% on SWE-Bench Multilingual, and 44.5% on Terminal-Bench. But then again, benchmarks, what are they good for? It’s all about the vibes, man! And the vibes on Kimi have been vibin’…
For example:
- Ray Fernando is using K2 w/ Cline + Groq to make a fun UI (if you’re technical, here’s how to set that up btw)—you can watch him try it out here.
- Ivan Fioravanti is also “having fun” using K2 w/ Claude Code.
- You can also now use K2 as an agent via GenSpark (they have a cool agent marketplace, you can use K2 specifically with the AI Developer).
Try Kimi K2 at kimi.com or grab the weights from Hugging Face. For blazing-fast 60-100 TPS, check out their turbo API.
Chubby on X says: “Scaling works—and the official release will surprise you even more.”
Our take:
Where you at, OpenAI? Not ready to insta-drop and model-top these new open-source competitors? It seems the days of flip-flopping model-topping to swipe each other's good news are gone... whatever else that tells you about the state of the industry, it shows we are certainly in a new phase.
So while the big US AI labs are optimizing their user experiences, the Chinese labs are pushing boundaries with trillion-parameter models that challenge Western dominance. Paging Google DeepMind… we need you! Release the Gemini 3 Pro! We need a Bat signal for Logan Kilpatrick and Demis Hassabis… maybe like a giant (nano) banana??