😺 Which AI models are the best right now?

PLUS: Bank fraud, AI lies, and AI's zero-click web...

Written By

Grant Harvey

Jul 24, 2025

6 minute read

Welcome, humans.

Any video game fans in the house? Because this one HITS:

If you like that (especially the music), the creator shared how they made it in Suno (and even shared a link to the full song). If you like the art technique, they used Midjourney: they sketched what they wanted to see, then used image-to-image and image-to-video to generate the clips.

Another video game AI video trend right now? Recreating popular games as community theater productions (including backstage):

These were made with Veo 3 Fast. There are quite a few ways to generate AI videos these days; we also just read that Google’s Veo 3 is now available via API (text-to-video, anyway). This means that developers can now make apps with one of the most powerful video tools ever invented.

This isn’t cheap, though. At $0.75 per second, an 8-second video with audio costs $6, so a 5-minute video would cost $225.
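If you want to budget your own clips, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function name is ours, and the $0.75-per-second rate is just the figure quoted above, so treat it as an assumption and check current pricing before relying on it.

```python
# Assumed rate: the $0.75 per second (video with audio) quoted above.
PRICE_PER_SECOND_USD = 0.75


def video_cost_usd(seconds: float, price_per_second: float = PRICE_PER_SECOND_USD) -> float:
    """Estimate the API cost in USD for a given number of seconds of generated video."""
    return seconds * price_per_second


if __name__ == "__main__":
    print(f"8-second clip:  ${video_cost_usd(8):.2f}")
    print(f"5-minute video: ${video_cost_usd(5 * 60):.2f}")
```

Swap in a different per-second rate to compare against other video models.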

Here’s what you need to know about AI today:

OpenAI's Sam Altman warned of imminent AI voice fraud crisis in banking.
xAI planned to raise $12B for new data center “Colossus 2.”
Microsoft poached two dozen Google DeepMind AI researchers.
Amazon acquired AI wearable startup Bee for its $50 listening device.

Advertise in The Neuron here.

What’s the best AI model you can use right now?
Ditch the vibes, get the context.
Prompt Tip of the Day
Treats To Try.
Around the Horn.
New from Depot: Claude Code Sessions
Midweek Wisdom
A Cat's Commentary.

What’s the best AI model you can use right now?

Now, speaking of things from the distant past (in AI land anyway), the API version of Grok 4 has been added to the LM Arena leaderboard, and apparently tied for #3 in text, was #1 in math, #2 in coding, and #3 in hard prompts. Here’s the leaderboard.

For context, Grok 4 was beaten in text by Gemini 2.5 Pro, o3, and ChatGPT-4o. In coding, Gemini 2.5 Pro, DeepSeek-R1, and Claude Opus 4 took the top spots. And in hard prompts, Gemini 2.5 Pro and ChatGPT-4o edged it out.

According to this leaderboard, here’s the top AI in every category:

Text: Gemini 2.5 Pro takes the crown, with ChatGPT o3 and 4o tied for second. Grok 4 is indeed tied for third (along with GPT-4.5).
WebDev (real world coding tasks): It's a three-way tie at the top between Gemini 2.5 Pro, DeepSeek-R1, and Claude Opus 4 (Grok was much lower here, at 14).
Vision: Gemini 2.5 Pro again, followed by ChatGPT-4o, o3, and GPT-4.5 all tied for second. Google's really flexing here.
Search: Gemini 2.5 Pro (grounding version) ties with Perplexity's Sonar Reasoning Pro. Makes sense that search-focused companies would excel here.
Text-to-Image & Image Edit: GPT Image 1 (OpenAI's image model) dominates both categories.

The interesting pattern? Gemini 2.5 Pro is absolutely crushing it across multiple categories, while different specialized models take the lead in their respective domains. It's like watching the Olympics of AI: some athletes are decathletes, others are specialists who just happen to be really, really good at their one thing.

Side note: did you know ChatGPT’s ImageGen lets you easily pick your favorite style now?

We know orange cats are popular, but are they perhaps trying to get our attention? Who’s to say… Ollie (our mascot) is flattered regardless.

TBH, we’ve pretty much stopped using the LM Arena leaderboard for our own personal rankings. Instead, we look to Artificial Analysis. And according to AA’s benchmarking, Grok 4 actually scored a 73 on their Intelligence Index, making it the #1 smartest model available. Not tied for first. Just straight-up first place.

The price reality check: At $6 per million tokens, Grok 4 is delivering the highest intelligence score on the market. Compare that to the other top performers: Gemini 2.5 Pro (70 intelligence) costs $3.40, and o4-mini (high) (70 intelligence) costs $1.90.

Best bang for your buck? If you want maximum intelligence and don't mind paying for it, Grok 4 is your answer. If you want about 96% of the performance for roughly half the price or less, Gemini 2.5 Pro (about 57% of Grok 4's price) and o4-mini (high) (about 32%) are solid alternatives.

If you want the cheapest best option, that would be DeepSeek R1 (68 intelligence, $0.91-$0.96/million tokens) followed by Gemini 2.5 Flash (Reasoning) (65 intelligence, $0.99/million tokens).

Think about that math: DeepSeek R1 delivers 93% of Grok 4's intelligence for ~15% of the cost. Gemini 2.5 Flash delivers 89% of the intelligence for 16% of the cost. Not bad at all. To compare all AI API models / providers, check here.
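Want to check that math yourself (or redo it when prices change)? Here's a minimal sketch that compares each model to Grok 4 using only the scores and per-million-token prices quoted above. The function and variable names are ours, and for DeepSeek R1 we assume the low end of the quoted $0.91-$0.96 range.

```python
# Figures as quoted above: (model, intelligence score, USD per million tokens).
BASELINE = ("Grok 4", 73, 6.00)
MODELS = [
    ("Gemini 2.5 Pro", 70, 3.40),
    ("o4-mini (high)", 70, 1.90),
    ("DeepSeek R1", 68, 0.91),  # assumption: low end of the quoted $0.91-$0.96 range
    ("Gemini 2.5 Flash (Reasoning)", 65, 0.99),
]


def relative_to_baseline(score: float, price: float,
                         base_score: float = BASELINE[1],
                         base_price: float = BASELINE[2]) -> tuple[float, float]:
    """Return (score, price) as percentages of the baseline model's score and price."""
    return 100 * score / base_score, 100 * price / base_price


if __name__ == "__main__":
    for name, score, price in MODELS:
        score_pct, price_pct = relative_to_baseline(score, price)
        print(f"{name}: {score_pct:.1f}% of the intelligence "
              f"for {price_pct:.1f}% of the cost")
```

A percentage of the top score is a crude yardstick, since a few index points can matter a lot on hard tasks, but it's a quick way to shortlist models before you test them on your own workload.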

P.S. Keep an eye out for OpenAI’s new model called o3-alpha that’s in testing now…

FROM OUR PARTNERS

Ditch the vibes, get the context.

Augment Code's powerful AI coding agent meets professional software developers exactly where they are, delivering production-grade features and deep context into even the gnarliest of codebases.

Whether you’re starting a new project or getting up to speed on a codebase that is new to you, with Augment Code you can:

Keep using VS Code, JetBrains, Android Studio, or even Vim.
Index and navigate millions of lines of code.
Get instant answers about any part of your codebase.
Build with the AI agent that gets you, your team, and your code.

Ditch the vibes and get the context you need to engineer what’s next.

Prompt Tip of the Day

One of our team members just discovered a killer combo that's like giving your AI superpowers. They combined yesterday’s tip about limiting reasoning steps to 5 words max with asking for “1 clarifying question at a time” using the o3 model.

The magic happened during a website migration project: instead of getting overwhelmed with massive explanations and dozens of follow-up questions, they saved tons of time.

To apply this yourself, open GPT o3 and say: "[Your Goal here] Think step-by-step but keep each reasoning step under 5 words. Ask me 1 clarifying question at a time to make sure you understand what I need."
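If you reuse this tip a lot, you could wrap it in a tiny template so you only type your goal. This is a minimal sketch: `build_prompt` and its `max_words_per_step` parameter are hypothetical names of our own, not part of any API, and the function only builds the text you would paste into the chat.

```python
def build_prompt(goal: str, max_words_per_step: int = 5) -> str:
    """Combine a goal with the short-reasoning and one-question-at-a-time instructions."""
    return (
        f"{goal} Think step-by-step but keep each reasoning step under "
        f"{max_words_per_step} words. Ask me 1 clarifying question at a time "
        "to make sure you understand what I need."
    )


if __name__ == "__main__":
    print(build_prompt("Help me migrate my website to a new host."))
```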

Treats To Try.

*Asterisk = from our partners. Advertise in The Neuron here.

*Luma AI turns your text into videos and lets you completely restyle any video's background, characters, or setting in post-production.
Qwen3-Coder builds complete web apps and simulations from your descriptions, handles massive codebases (1M context), plus now includes a new Qwen Code CLI for command-line coding assistance—try it here.
Levio takes your raw talking head video and automatically adds subtitles, B-roll footage, and transitions, then lets you chat to make changes.
Jeeva finds leads, writes personalized emails, and follows up until they book meetings with you.
Youware is a “vibe community” that turns your text prompts into working apps and websites, then lets you share them with a community of 100K+ creators for feedback.
Storm from Stanford turns any topic you type into a fully-cited research report with sources and references—free to try.

See our top 51 AI Tools for Business here!

Around the Horn.

A tale of two AI infrastructure buildout tweets (Sam, Elon).

OpenAI published a blog post about its new 4.5 gigawatt data center partnership with Oracle, which brings Stargate's total AI infrastructure capacity to over 5 gigawatts, running 2M+ chips, creating 100K+ jobs, and helping OpenAI exceed its original $500B U.S. infrastructure commitment (allegedly, this deal costs something like $30B, which is about 3x OpenAI’s current revenue but roughly a tenth of its current valuation).
- Possibly related: OpenAI will reopen its previous $40B funding round on July 28th to take new backers, since $30B of SoftBank’s previous commitment relies on OpenAI completing a for-profit conversion by the end of this year.
xAI is actively trying to raise $12B for its own data center buildout (featured above), which Elon says will aim for 50M units of H100 equivalent AI compute (but with much better power-efficiency) online within 5 years.
Pew Research tracked 900 U.S. adults and found Google's AI summaries are creating “zero-click” searches where users click through to websites only 8% of the time (versus 15% without AI summaries) and are 10 percentage points more likely to end their browsing session entirely.
Amazon acquired AI startup Bee that makes a $50 wearable device which listens to conversations and creates reminders and to-do lists.
Microsoft hired around two dozen employees from Google DeepMind's AI research lab to strengthen its Copilot assistant and Bing search engine.
OpenAI CEO Sam Altman warned of a “significant impending fraud crisis” as AI voice cloning could bypass bank security authentication systems.

For the latest AI deep dives, check out our Explainer articles here!

FROM OUR PARTNERS

New from Depot: Claude Code Sessions

Claude Code is a game-changer, but using it in team environments can be frustrating. Now you can start debugging something locally, hand it off to a teammate, and they can pick up exactly where you left off. Finally, AI pair programming that actually works.

Check out how it works