The weekend brought a surprise AI arms race between Chinese labs, with both Alibaba and Moonshot AI releasing trillion-parameter models that claim to rival or beat Western competitors.
First up: Alibaba previewed Qwen3-Max-Preview, their biggest model yet with over 1 trillion parameters (that’s a lotta params, fam). According to their benchmarks, it beats their previous heavyweight Qwen3-235B-A22B-2507 across the board — better conversations, stronger instruction following, and improved agentic tasks.

You can try it yourself via Qwen Chat or the Alibaba Cloud playground (heads up: they may train on whatever data you share in the playground). It's also live on OpenRouter for those who prefer a unified interface. Pricing is tiered by input length (i.e., how much context you send), in USD per million input tokens: first 32K tokens = $0.861, 32K-128K = $1.434, 128K-252K = $2.151.
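If you're budgeting, here's a back-of-napkin cost helper based on those tiers. One assumption worth flagging: this sketch bills the whole request at the rate of whichever tier its total input length falls into, which is how we read the tiering (verify against Alibaba Cloud's pricing page before trusting it).

```python
# Hypothetical cost estimator for Qwen3-Max-Preview's tiered input pricing.
# Assumes the entire request is billed at the rate of the tier its total
# input length lands in (unverified -- check Alibaba Cloud's pricing page).

TIERS = [  # (tier ceiling in input tokens, USD per 1M input tokens)
    (32_000, 0.861),
    (128_000, 1.434),
    (252_000, 2.151),
]

def input_cost_usd(input_tokens: int) -> float:
    """Estimated input cost in USD for a single request."""
    for ceiling, rate_per_million in TIERS:
        if input_tokens <= ceiling:
            return input_tokens / 1_000_000 * rate_per_million
    raise ValueError("input exceeds the 252K-token limit")

print(f"${input_cost_usd(50_000):.4f}")  # 50K-token prompt, billed at the 32K-128K rate
```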
The catch?
A couple: Unlike some competitors, Qwen3-Max has no “thinking” mode. Also, Simon Willison noted it's not open weights — only available through their chat app and paid API.
Early demos show it handling complex visual tasks like one-shotting a voxel pagoda garden. And when Willison tested it with his O.G. prompt, “Generate an SVG of a pelican riding a bicycle,” the results were... interesting.
Susan Zhang pointed out some hallucination issues, too: “it certainly hallucinates extensive thinking traces... mixing a bunch of search results that don't seem consistent with one another.”
Meanwhile, Moonshot AI wasn't sitting idle.
They released Kimi K2-Instruct-0905, a mixture-of-experts model that activates 32 billion parameters per token out of 1 trillion total.
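Quick refresher on why “activated parameters” matters: in a mixture-of-experts model, a router scores all the experts for each token and only the top few actually run, which is how you get 1T parameters on disk but only 32B doing work per forward pass. A toy top-k routing sketch (the sizes below are made up for illustration, not K2's real config):

```python
import numpy as np

# Toy MoE routing: a gate scores every expert per token, only the top-k run,
# so activated parameters are a small fraction of total parameters.
rng = np.random.default_rng(0)

num_experts, top_k, d_model = 8, 2, 16       # made-up sizes, not K2's config
x = rng.standard_normal(d_model)             # one token's hidden state
gate = rng.standard_normal((num_experts, d_model))
experts = rng.standard_normal((num_experts, d_model, d_model))

scores = gate @ x                            # one routing score per expert
chosen = np.argsort(scores)[-top_k:]         # indices of the top-k experts
weights = np.exp(scores[chosen])
weights /= weights.sum()                     # softmax over the winners

# Only the chosen experts do any work; the other six sit idle for this token.
y = sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))
print(f"ran {top_k}/{num_experts} experts = {top_k/num_experts:.0%} of expert params activated")
```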

Kimi's upgrades:
- Context window extended from 128K to 256K tokens.
- Enhanced frontend coding capabilities.
- Improved tool-calling accuracy (they claim 100% on their turbo API; see the sketch after this list).
- Better integration with coding agents like Claude Code and Roo Code.
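Since K2 speaks standard OpenAI-style tool calling, wiring it up is mostly boilerplate. A minimal sketch below; the base URL and model id are our best guesses from memory of Moonshot's docs, and the get_weather tool is purely hypothetical, so verify the specifics before copying.

```python
from openai import OpenAI

# Hedged sketch: Moonshot exposes an OpenAI-compatible API, so the standard
# SDK works. Base URL and model id are assumptions -- check Moonshot's docs.
client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",   # assumed international endpoint
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",               # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kimi-k2-0905-preview",            # assumed model id for K2-Instruct-0905
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)    # expect a get_weather call back
```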
The benchmarks are impressive: 69.2% on SWE-bench Verified, 55.9% on SWE-bench Multilingual, and 44.5% on Terminal-Bench. But then again, benchmarks, what are they good for? It’s all about the vibes, man! And the vibes on Kimi have been vibin’…
For example:
- Ray Fernando is using K2 w/ Cline + Groq to make a fun UI (if you’re technical, here’s how to set that up btw)—you can watch him try it out here.
- Ivan Fioravanti is also “having fun” using K2 w/ Claude Code.
- You can also now use K2 as an agent via GenSpark (they have a cool agent marketplace; you can use K2 specifically with the AI Developer).
Try Kimi K2 at kimi.com or grab the weights from Hugging Face. For blazing-fast 60-100 TPS, check out their turbo API.
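For a sense of what 60-100 TPS actually feels like, quick napkin math on response latency:

```python
# Back-of-napkin: time to stream a 2,000-token response at the turbo API's
# claimed 60-100 tokens/second.
for tps in (60, 100):
    print(f"{tps} TPS -> {2000 / tps:.0f}s for a 2,000-token response")
# 60 TPS -> 33s, 100 TPS -> 20s
```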
Chubby on X says: “Scaling works—and the official release will surprise you even more.”
Our take:
Where you at, OpenAI? Not ready to insta-drop and model-top these new open-source competitors? It seems the days of flip-flopping model-topping to try and swipe each other's good news are gone... whatever else that tells you about the state of the industry, it shows we are certainly in a new phase.
So while the big US AI labs are optimizing their user experiences, the Chinese labs are pushing boundaries with trillion-parameter models that challenge Western dominance. Paging Google DeepMind… we need you! Release the Gemini 3 Pro! We need a Bat signal for Logan Kilpatrick and Demis Hassabis… maybe like a giant (nano) banana??