
😸 The Top 10 WILDEST GPT-5.2 Demos

PLUS: GPT-5.2 vs Opus 4.5 vs Gemini 3 LIVE!

Written By

Corey Noles

Dec 15, 2025

7 minute read

Welcome, humans.

🔴 We’re LIVE RIGHT NOW to go fully hands-on with GPT-5.2 and compare it to Gemini 3.0 and Opus 4.5.

Thanks to everyone who came out to yesterday’s marketing tool round-up (and an extra special thank you to the folks who stayed a whole extra hour to mess around w/ GPT-5.2 in the playground)!

We’ve turned both yesterday’s stream AND our stream with Andrew Hsu of Speak into blogs for you to skim through here:

Here’s what happened in AI today:

We break down the wildest demos of GPT-5.2.
Trump signed an AI executive order establishing national AI standards.
DeepMind signed a UK government deal for frontier AI applications.
Google released its Deep Research agent for you to use in your own apps.

Advertise in The Neuron here!

Check out these WILD GPT-5.2 Demos
Agents that don’t suck
Prompt Tip of the Day
Treats to Try
Around the Horn
Turn AI Into Your Income Stream
Friday Trivia
A Cat’s Commentary

Check out these WILD GPT-5.2 Demos

So yesterday Sam Altman and company unveiled GPT-5.2 as “the smartest generally-available model in the world,” and within hours X filled up with wild demos, leaderboards, and benchmark takes that alternated between “this is unreal” and “this is overhyped.”

The craziest GPT-5.2 demos so far fall into two camps: things that look like magic, and things that look like magic but fall apart under scrutiny.

It one-shot a full 3D graphics engine. Pietro Schirano asked GPT-5.2 Thinking (on high reasoning effort) to build a 3D graphics engine and got a single-file program with interactive camera controls and 4K export in one shot—no iterative prompt-chaining required.
Then it turned into a glowing physics toy. Building on that, Flavio Adamo had GPT-5.2 generate the “Hex Bounce” scene—glowing balls ricocheting inside a rotating hexagon in three.js—that you can poke at in the Hex Bounce Glow playground, which has quickly become a go-to stress test for AI-written physics code.
It reads entire game scripts without losing the plot. To test long-context, Hangsiin fed GPT-5.2 an entire video-game script and reported that it stayed coherent and useful when answering detailed questions, outperforming earlier “long-context” models that tended to forget key details.
It plays Pokémon Crystal live on stream. In a very literal “agentic” demo, Clad3815 wired GPT-5.2 up to play Pokémon Crystal (Hard Mode) on stream, turning the model into a game-playing agent whose decisions you can watch in real time on Twitch.
It helps scientists think through real problems. Derya Unutmaz, an immunologist, says one of his first tests with GPT-5.2 Pro was asking it for the most innovative, first-principles questions needed to understand the immune system—using the model as a collaborator to surface deep research directions he might not have phrased the same way himself.
It can derive nontrivial math conditions on the fly. In a detailed thread, Sebastien Bubeck calls GPT-5.2 OpenAI’s best science model yet, pointing to scores like 92.4% on GPQA and 52.9% on ARC-AGI-2—and showing it independently derive the optimal 1.75/L step-size condition for smooth convex optimization, even using code to search for counterexamples.
It built a 5,000-cell financial model… that valued a lemonade stand at $2.7B. On the “looks amazing, still wrong” side, Linas Beliūnas asked GPT-5.2 to build his entire financial model: 30 reasoning tokens, 5,000+ cells, 18 interconnected sheets, modularized projections, dynamic scenarios, sensitivity tables, and pretty charts. As he put it in his LinkedIn write-up, “none of the numbers added up”—the DCF priced his lemonade stand at $2.7B.
It makes genuinely beautiful, long-context writing—and keeps up over huge inputs. Elie Bakouch shared charts showing GPT-5.2 dramatically outperforming GPT-5.1 on long-context benchmarks, while other testers highlighted literary prompts where 5.2 sustains style and plot over multi-page inputs instead of degrading into repetition.
And yes, it still generates mesmerizing coding toys. Beyond the flagship demos, posts from builders like Jeremy Mack and Flavio Adamo show GPT-5.2 Purple and its siblings generating ball-physics sandboxes and neon-lit three.js scenes that feel closer to interactive art pieces than “hello world” tutorials.

So GPT-5.2 LOOKS LIKE the most capable generally-available model today… but then again, so do Gemini 3 and Opus 4.5. So we’re going to test all three of them on today’s livestream to find out how capable they really are… and which is the best for your use case. Come join!

FROM OUR PARTNERS

Agents that don’t suck

Are your agents working? Most agents never reach production.

Agent Bricks helps you build high-quality agents grounded in your data. We mean “high-quality” in the practical sense: accurate, reliable and built for your workflows.

Generic benchmarks don’t cut it. Agent Bricks measures performance on the tasks that matter to your business.

Evaluate agents automatically, and keep improving accuracy with human feedback. With research-backed techniques for building, evaluating and optimizing, you can turn your business data into production agents faster — with governance built in from day one.

See how Agent Bricks works

Prompt Tip of the Day

Use GPT-5.2 as your workflow architect, not just your answer machine.

Instead of asking, “Help me with X,” try this two-step meta-prompt:

Step 1 – Design the workflow
“You’re my AI workflow architect. I want to <goal> (e.g. ‘review long contracts’ / ‘debug large codebases’ / ‘plan experiments’).

List 3 different workflows I could use with GPT-5.2 to do this reliably.
For each, include: the core loop (what I send you each step), which tools/context to attach, and where a human should verify or override the AI.
Then recommend the one you think I should start with and explain why in 3 sentences.”

Step 2 – Turn it into reusable prompts
“Great, let’s implement Workflow #<n>.

Write me a single ‘master prompt’ I can save and reuse for this workflow.
Then write 3–5 short follow-up prompts (‘check for errors’, ‘summarize’, ‘turn this into code’, etc.) that I can paste in as buttons/macros.”

The move here is simple: stop improvising new prompts every time, and make GPT-5.2 design and document a repeatable system for your use case—complete with built-in verification steps so you don’t end up with a $2.7B lemonade stand.
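If you'd rather keep the two-step meta-prompt in a script than in a notes file, here's a minimal sketch in Python. It only fills in the blanks (<goal> and <n>) and prints the text for you to paste into a chat; it doesn't call any API. The names `DESIGN_PROMPT`, `IMPLEMENT_PROMPT`, `design_prompt`, and `implement_prompt` are ours for illustration, not part of any tool.

```python
from string import Template

# Hypothetical template names; the wording follows the two prompts above.
DESIGN_PROMPT = Template(
    "You're my AI workflow architect. I want to $goal.\n\n"
    "List $n_options different workflows I could use with GPT-5.2 to do this reliably.\n"
    "For each, include: the core loop (what I send you each step), "
    "which tools/context to attach, and where a human should verify or override the AI.\n"
    "Then recommend the one you think I should start with and explain why in 3 sentences."
)

IMPLEMENT_PROMPT = Template(
    "Great, let's implement Workflow #$choice.\n\n"
    "Write me a single 'master prompt' I can save and reuse for this workflow.\n"
    "Then write 3-5 short follow-up prompts ('check for errors', 'summarize', "
    "'turn this into code', etc.) that I can paste in as buttons/macros."
)


def design_prompt(goal: str, n_options: int = 3) -> str:
    """Build the Step 1 prompt for a given goal."""
    goal = goal.strip()
    if not goal:
        raise ValueError("goal must not be empty")
    return DESIGN_PROMPT.substitute(goal=goal, n_options=n_options)


def implement_prompt(choice: int) -> str:
    """Build the Step 2 prompt for the workflow number you picked."""
    if choice < 1:
        raise ValueError("choice must be 1 or greater")
    return IMPLEMENT_PROMPT.substitute(choice=choice)


if __name__ == "__main__":
    # Paste the first prompt, read the options, then paste the second.
    print(design_prompt("review long contracts"))
    print()
    print(implement_prompt(2))
```

Swap in your own goal, run it once, and save the output wherever you keep reusable prompts.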

Treats to Try

*Asterisk = from our partners (only the first one!). Advertise to 600K readers here!

*Ideas move fast; typing slows them down. Wispr Flow matches your tone, handles punctuation and lists, and adapts to how you work on Mac, Windows, and iPhone. No start-stop fixing, no reformatting, just thought-to-text that keeps pace with you. When writing stops being a bottleneck, work flows.
Give your hands a break ➜ start flowing for free today.
Gemini Deep Research conducts autonomous web research for you—it searches, finds gaps, searches again, then delivers a comprehensive report with citations—paid only rn ($2 per million input tokens).
Worktrace observes your team’s day-to-day work and suggests concrete automations you can deploy, instead of making you guess where to use AI (raised $9M).
Roboflow Rapid builds you a computer vision model in minutes: upload a video, describe what you care about, and get a working vision model and API instead of hand-labeling images for hours (demo).
Stripe’s Agentic Commerce Suite gives you the power to upload a catalog once and sell through many AI agents with unified discovery, checkout, and fraud protection.
AutoGLM is an open-source agent that understands Android phone workflows—like navigating maps or batching notifications (code, HuggingFace).
Amp’s “Look At This” tool sends big PDFs and images to a helper model and returns only the relevant bits, saving your main coding agent’s context window—no pricing details shared.

Around the Horn

Salesforce is hiking fees on apps that sync or copy customer data out of its platform, squeezing data-integration providers such as Fivetran.
US President Trump signed a new AI executive order that aims to replace a patchwork of state AI laws with a single, “minimally burdensome” national standard.
Google DeepMind’s UK government deal will apply frontier AI to better public services, faster science, and tougher national security.
Agility Robotics will roll out its Digit humanoid robots in a Mercado Libre warehouse in San Antonio, Texas under a year-long exclusive deal.
Anthropic is now taking applications for its May and July 2026 AI safety Fellows cohorts, offering four-month paid stints focused on alignment and safety work (they have a security track too!).

FROM OUR PARTNERS

Turn AI Into Your Income Stream

The AI economy is booming, and smart entrepreneurs are already profiting. Subscribe to Mindstream and get instant access to 200+ proven strategies to monetize AI tools like ChatGPT, Midjourney, and more. From content creation to automation services, discover actionable ways to build your AI-powered income. No coding required, just practical strategies that work.

Subscribe to Get Your Free Guide

Friday Trivia

We didn’t have time to fit Thursday Trivia in the newsletter yesterday (emergency NL, OpenAI has a new model, y’know how it is), so today you’re getting it on Friday!

A.

B.

Which is AI, and which is real?

The answer is below, but place your vote to see how your guess compares to everyone else’s (no cheating now!)

A is AI, and B is real.

B is AI, and A is real.

A Cat’s Commentary

Trivia Answer: A is real (y’all are cute!), and B is AI.

Also, can I just say? Shout out to this couple, who totally DOMINATES the first row of results for “selfie couple horizontal instagram.”

That’s all for today!

Corey Noles

Corey Noles is the Host of The Neuron: AI Explained podcast and Managing Editor of AI and Experimental Content at TechnologyAdvice, where he leads the charge in testing and refining emerging content strategies across the company's portfolio.