😺 Google’s sharpest brain yet? 🧠

Welcome, humans.

Reddit is rolling up its sleeves and testing an AI-powered search feature that blends its discussion-driven content with product discovery and recommendations.

It lets users ask natural language questions and get shopping-relevant results right from Reddit. It’s an intriguing twist on traditional e-commerce search, one that leans on authentic user opinions rather than algorithmically curated storefronts. (…and a creepy vibe for retail sales clerks almost as awkward as Anthropic’s Dario Amodei refusing to shake Sam Altman’s hand.)

Here’s what happened in AI today:

Google released Gemini 3.1 Pro, scoring 98% on ARC-AGI-1.
OpenAI started testing ads in ChatGPT after the first response.
DeepMind co-founder David Silver raised $1B for an RL-focused AI agent startup.
Anthropic's revenue grew 10x annually, outpacing OpenAI's trajectory.

ICYMI: 🎙️ NEW EPISODE: The Man Who Built GitHub Copilot Just Gave His First Interview About Gemini CLI

Taylor Mullen built GitHub Copilot at Microsoft. Now he's a Principal Engineer at Google, and his team ships 100 to 150 features every single week using AI to build itself.

In his first-ever in-depth interview about Gemini CLI, Taylor reveals the origin story, demos live bug-fixing, and shares the exact techniques his team uses to go from 10x to 100x—including the viral "Ralph Wiggum" method that's taking over developer Twitter.

Whether you code or not, you'll walk away understanding what the command line is, why it's having a massive comeback, and how to use it to do way more with AI than a chatbot ever could. Watch / Listen now: YouTube | Spotify | Apple Podcasts

P.S.: We broke down all the insights from this video into a Deep Dive about Gemini CLI. Check it out!

Google's "Small" Update Just Made Gemini the Smartest AI You Can Buy
Migrate to Delve and get a $2,000 VISA card in your inbox
Prompt Tip of the Day
Treats to Try
Around the Horn
Case Study: Dan’s 1-Year Transformation To 6-Figure AI Consultant
Intelligent Insights

Google's "Small" Update Just Made Gemini the Smartest AI You Can Buy

DEEP DIVE: Everything you need to know about Gemini 3.1 Pro.

Google released Gemini 3.1 Pro today, and the ".1" is doing some serious heavy lifting. According to Artificial Analysis, an independent benchmarking firm, 3.1 Pro now sits at #1 on their overall Intelligence Index (which is like a giant benchmark of all the other major benchmarks put together), ahead of Claude Opus 4.6 and GPT-5.2.

First, a few benchmarks: Gemini 3.1 Pro hit 98% on ARC-AGI-1 (a benchmark originally designed to measure progress toward AGI) and 77% on ARC-AGI-2 (the follow-up test; we’re now on ARC-AGI-3, which measures agentic “action efficiency,” or how quickly an AI can learn and take the next correct action to solve puzzles). On top of that, it topped the APEX-Agents leaderboard for complex reasoning, coding, and agentic tasks.

Here's how the Big Three stack up right now:

Overall intelligence: Gemini 3.1 Pro (57) > Claude Opus 4.6 (53) > GPT-5.2 (51)
Coding: Gemini 3.1 Pro (56) > Claude Sonnet 4.6 (51) > GPT-5.2 (49)
Agentic tasks: Claude Opus 4.6 (68) > GPT-5.2 (60) > Gemini 3.1 Pro (59)
Hallucination resistance: Gemini 3.1 Pro (30) blows everyone away; the next best score is 13
The exact numbers here don’t matter, but the ORDER does.

Translation: Google now has the smartest and most factually reliable model. Claude still dominates agentic work (complex multi-step tasks), and GPT-5.2 sits comfortably in between. Price-wise, Gemini 3.1 Pro costs $4.50 per million tokens, which is cheaper than GPT-5.2 ($4.80) and roughly half the price of Claude Opus 4.6 ($10).
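
To see what those prices mean for an actual workload, here's a rough sketch of the arithmetic. It assumes the figures quoted above are flat, blended per-million-token rates (real pricing usually splits input and output tokens). The `PRICE_PER_MILLION` table, the `estimate_cost` helper, and the 50M-token workload are our own illustrative choices, not any provider's API.

```python
# Prices quoted above, in USD per million tokens.
# Assumption: one flat blended rate per model; real pricing
# typically charges input and output tokens differently.
PRICE_PER_MILLION = {
    "Gemini 3.1 Pro": 4.50,
    "GPT-5.2": 4.80,
    "Claude Opus 4.6": 10.00,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost of processing `tokens` tokens."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


if __name__ == "__main__":
    monthly_tokens = 50_000_000  # hypothetical monthly workload
    for model in PRICE_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, monthly_tokens):,.2f}")
```

Swap in your own token volume to see how quickly the gap between models adds up.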

Now here's what's actually new under the hood:

A "medium" thinking mode. Gemini 3 Pro only had "low" and "high." The new middle setting gives you solid reasoning without waiting minutes for an answer. On "high," the model now acts like a mini version of Deep Think, Google's advanced reasoning system.
Fewer hallucinations, by a lot. The model card shows meaningful improvement, and the Artificial Analysis numbers confirm it. Gemini 3.1 Pro's factual accuracy is in a league of its own right now.
Better coding, with a caveat. Benchmarks show 3.1 Pro leading on coding, but developers on Reddit note that while it's great at one-shot problem solving, it's less great at extended back-and-forth sessions, where Claude still has the edge.
AI Studio is now full-stack. AI Studio now supports servers, databases, and multiplayer apps (this is huge). Also, Google's Antigravity agent is now built in.

It's rolling out to the Gemini app, GitHub Copilot, NotebookLM, Vertex AI, Gemini CLI, and more. Harvey is apparently already testing it for legal research (as are others, we’re sure).

Want to try it yourself? AI Studio is free. Here are three things worth testing:

The thinking levels: Run the same hard question on low, medium, and high. Ask it to solve a tricky word problem or logic puzzle and watch the quality difference.
Hallucination stress test: Ask it for specific stats from a real report (e.g., "What were the key findings from Stanford's 2024 AI Index?"). See if it hedges when it should, or confidently makes things up.
Head-to-head: Take your most-used prompt / workflow and run it in Gemini, Claude, and ChatGPT side by side. That'll tell you more than any benchmark.
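
If you'd rather script the head-to-head than juggle browser tabs, a tiny harness like this does the job. It's a sketch, not a real integration: `run_head_to_head` is our own hypothetical helper, and the `fake_*` functions are offline stand-ins you'd replace with real calls to each provider's SDK.

```python
import time
from typing import Callable, Dict


def run_head_to_head(
    prompt: str, models: Dict[str, Callable[[str], str]]
) -> Dict[str, dict]:
    """Send the same prompt to every model; record each answer and its latency."""
    results = {}
    for name, ask in models.items():
        start = time.perf_counter()
        answer = ask(prompt)
        results[name] = {
            "answer": answer,
            "seconds": time.perf_counter() - start,
        }
    return results


# Placeholder "models" so the sketch runs without API keys.
# Replace each one with a function that calls the real service.
def fake_gemini(prompt: str) -> str:
    return f"[gemini] {prompt}"


def fake_claude(prompt: str) -> str:
    return f"[claude] {prompt}"


def fake_chatgpt(prompt: str) -> str:
    return f"[chatgpt] {prompt}"


if __name__ == "__main__":
    models = {
        "Gemini": fake_gemini,
        "Claude": fake_claude,
        "ChatGPT": fake_chatgpt,
    }
    for name, result in run_head_to_head("Summarize this memo.", models).items():
        print(f"{name} ({result['seconds']:.3f}s): {result['answer']}")
```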

We’ll write more about this next week as we benchmark it against Sonnet/Opus 4.6 and GPT-5.3 Codex in a series of agentic coding tasks over the weekend!

FROM OUR PARTNERS

Migrate to Delve and get a $2,000 VISA card in your inbox

Delve is the AI-native compliance platform that actually does the work for you, auto-collecting evidence from AWS, GitHub, and your stack so you don’t have to chase screenshots or babysit integrations. Use AI in the security questionnaire tooling, in the copilot, and everywhere else to make compliance feel less dreadful. Welcome to the new age.

The proof is in the pudding:

Bland → Switched, got compliant, and unlocked $500k ARR in 7 days
11x → Streamlined audits and moved faster on enterprise deals
micro1 → Scaled compliance without adding headcount

Bonus: Delve will handle your migration for free. Zero-touch. No disruption. No starting over.

If you’re dreading opening your current SOC 2 tool, that’s your sign.

Book a demo here and trigger a migration to get $2,000 sent straight to your inbox as soon as you’re onboarded.

Prompt Tip of the Day

Here's a dead-simple trick that quietly improves your AI’s performance across Gemini, GPT, Claude, and Deepseek, with zero extra latency and zero extra output tokens (i.e. the words / code generated): just repeat your prompt twice.

Researchers at Google found that simply sending your query twice in a row (back-to-back, no separator needed) wins on 47 out of 70 benchmark tests, with 0 losses.

Since large language models (the AI models behind ChatGPT, Gemini and Claude, abbreviated to LLMs) process tokens left-to-right, early tokens can't “see” what comes later. Repeating the prompt lets every part of your query attend to every other part — essentially giving the model a second pass at understanding your full request before it ever generates a response.
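
Here's a toy illustration of that idea. It isn't how any real model is implemented; it just applies the left-to-right rule (a token can see itself and everything before it) to a prompt and to its doubled version. The `visible_tokens` helper and the four-token prompt are made up for the example.

```python
def visible_tokens(tokens: list[str], position: int) -> list[str]:
    """Under left-to-right attention, a token sees itself and earlier tokens only."""
    return tokens[: position + 1]


# A prompt in an "awkward" order: the options come before the question.
prompt = ["options:", "A", "B", "question?"]
doubled = prompt + prompt

# Single prompt: the first token can't see anything that comes after it.
first_pass = visible_tokens(prompt, 0)

# Doubled prompt: the second copy of that same token (position 4) can see
# the whole first copy, including the question at the end.
second_pass = visible_tokens(doubled, len(prompt))

if __name__ == "__main__":
    print(first_pass)
    print(second_pass)
```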

Here's how to use it, ranked from easiest to most powerful:

Basic repeat: Copy-paste your prompt twice. That's it. Works great for Q&A, multiple choice, and retrieval tasks.
Verbose repeat: Add "Let me repeat that:" between copies. Slightly more natural, similar gains.
Triple repeat: Add "Let me repeat that one more time:" and paste a third copy. Best for hard tasks like finding a specific item buried in a long list.
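
If you're sending prompts through code rather than a chat box, the three variants are easy to build. This is a minimal sketch: `repeat_prompt` is our own hypothetical helper, the spacing around the connecting phrases is our choice, and we're assuming the triple version builds on the verbose one.

```python
def repeat_prompt(prompt: str, style: str = "basic") -> str:
    """Build a repeated prompt in one of the three styles described above."""
    if style == "basic":
        # Two copies back-to-back, no separator.
        return prompt + prompt
    if style == "verbose":
        return f"{prompt} Let me repeat that: {prompt}"
    if style == "triple":
        # Assumption: the third copy is added on top of the verbose form.
        return (
            f"{prompt} Let me repeat that: {prompt} "
            f"Let me repeat that one more time: {prompt}"
        )
    raise ValueError(f"unknown style: {style!r}")


if __name__ == "__main__":
    question = "Which of these names appears 25th in the list? ..."
    for style in ("basic", "verbose", "triple"):
        print(f"--- {style} ---")
        print(repeat_prompt(question, style))
```

Send the returned string as your message in place of the original prompt; nothing else about the request needs to change.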

Our favorite insight: the gains are most dramatic when your question and context are in an awkward order — like when the answer choices come before the question. In one test, Gemini 2.0 Flash-Lite's accuracy on a name-retrieval task jumped from 21% to 97% just from repeating the prompt. Wild.

One caveat: this works best when reasoning is off. If you're using chain-of-thought or "think step by step" prompting (which “extended reasoning” or “thinking” modes do automatically), the model already kind of re-reads the prompt on its own, so the gains are smaller (though still neutral-to-positive!).

Try it on your next long, context-heavy prompt and see what happens!

Want more like this? Check out our Prompt Tip of the Day Digest for February here.

Treats to Try

*Asterisk = from our partners (only the first one!).

*On Mac, Windows, and iPhone, Flow turns speech into paste-ready writing so your ideas land while they are fresh.
Anthropic launched Claude in PowerPoint, generating, editing, and iterating presentations directly in Microsoft PowerPoint from templates or descriptions with native charts/diagrams while maintaining formatting. Free beta for Pro/Max/Team/Enterprise.
Lindy is a proactive AI assistant that triages your inbox, preps for meetings, takes notes, and sends follow-ups automatically.
SearchSeal tracks what AI chatbots like ChatGPT and Gemini say about your brand, so you can optimize for the new search.
Figr is a product-aware AI design tool that thinks through UX edge cases, user flows, and decisions before generating prototypes; basically a design co-pilot that catches what you'd miss in review (trusted by 500+ teams).

Around the Horn

New research surveying 121K developers found 92.6% use AI coding assistants and they now write 26.9% of production code, but productivity gains have plateaued at ~10% because AI amplifies existing organizational strengths or weaknesses.
OpenAI began testing ads inside ChatGPT for U.S. free and Go tier users, with sponsored content appearing after the very first response rather than deeper into conversations.
Block cut up to 10% of its workforce (~1,100 employees) in rolling layoffs while mandating daily AI tool usage for remaining staff, with employees describing "the worst morale in four years."
DeepMind co-founder David Silver raised $1B in a seed round for Ineffable Intelligence, signaling that simply scaling large language models (LLMs) is hitting diminishing returns and the next leap comes from agents that learn through reinforcement learning (RL), or trial-and-error self-play.
Anthropic grew revenue 10x annually since hitting $1B, faster than OpenAI's 3.4x, and could overtake OpenAI by mid-2026 if trends continue. Yuchen: “This is why Sam and Dario can’t hold hands” LOL.
Reliance Industries announced $110B over seven years to build gigawatt-scale AI data centers across India, powered by its 10 GW surplus of green energy.

NEW: Want more? Check out our new Around the Horn Digest for February here

FROM OUR PARTNERS

Case Study: Dan’s 1-Year Transformation To 6-Figure AI Consultant

Dan had no tech background and no business experience – just a love for AI and a hunch it could become something more. Through The AI Consultancy Project, he landed his first clients, found a niche, and built a real business. This case study breaks down his journey – the early stumbles, the system that worked, and how he made the leap.

Read the Full Case Study — Free →

Intelligent Insights

Jeremy Howard warned that LLMs may call tools not included in the provided list, affecting all major U.S. providers except OpenAI; developers should verify every tool call request.
Andrej Karpathy vibe coded a custom cardio dashboard in one hour using Claude to reverse-engineer a treadmill API, arguing app stores of discrete apps are outdated as LLMs improvise personalized software on the fly.
Marginalia argued AI is making people boring by eliminating the deep problem immersion that produces original thinking. "You don't build muscle using an excavator to lift weights. You don't produce interesting thoughts using a GPU to think."
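
On Jeremy Howard's tool-call warning: the fix boils down to a simple guard that checks every tool name the model asks for against the list you actually provided. Here's a minimal sketch, assuming a tool call arrives as a dict with `name` and `arguments` keys (real provider formats differ). `ALLOWED_TOOLS`, the tool names in it, and `validate_tool_call` are all hypothetical.

```python
ALLOWED_TOOLS = {"search_docs", "get_weather"}  # hypothetical tool names


def validate_tool_call(call: dict, allowed: set[str] = ALLOWED_TOOLS) -> dict:
    """Return the call unchanged, or raise if the tool was never offered."""
    name = call.get("name")
    if name not in allowed:
        raise PermissionError(f"model requested unlisted tool: {name!r}")
    return call


if __name__ == "__main__":
    ok = {"name": "get_weather", "arguments": {"city": "Paris"}}
    bad = {"name": "delete_files", "arguments": {"path": "/"}}

    print(validate_tool_call(ok)["name"])
    try:
        validate_tool_call(bad)
    except PermissionError as err:
        print(f"blocked: {err}")
```

Run a check like this before executing anything the model requests, not after.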

There are tons more insights in today’s Around the Horn Digest! Go check it out.