Google Escalates AI Infrastructure War with Ironwood, Its Most Powerful Chip Ever

Google just dropped Ironwood, its seventh-generation AI chip, signaling a major push to dominate the "age of inference" with 4x better performance and a massive commitment from Anthropic to use up to 1 million of its new TPUs.

Google's new AI chip is a monster, and Anthropic plans to use up to 1 million of them.

The AI arms race just got a new heavyweight contender. Google Cloud announced that Ironwood, its seventh-generation custom AI chip (TPU), is now generally available. It’s an absolute beast built for what Google is calling the "age of inference"—a fancy way of saying we’ve moved from just training models to actually using them at massive scale.

The headline numbers are staggering. Ironwood delivers over 4x better performance than its predecessor for both training and inference. And to prove it’s not just talk, Anthropic (the maker of Claude) plans to access up to 1 million of these new TPUs to power its models.

So, just how powerful is this thing? Google’s specs sound like something out of a sci-fi movie, so we broke it down for you. An Ironwood "pod" connects thousands of chips that act like a single AI super-brain.

  • The Brain: Imagine 9,216 of the most powerful AI chips working together as one unit. This allows them to run some of the largest, most complex AI models in existence as a single, coordinated operation.
  • The Data Highway: To prevent traffic jams between all those chips, Google built a custom "Inter-Chip Interconnect" that moves data at 9.6 Terabits per second. Think of it as a private, thousand-lane data superhighway.
  • The Memory: All those chips share a staggering 1.77 Petabytes of memory—enough to store the equivalent of 40,000 high-definition movies. This means the entire AI model can be loaded and accessed instantly.

The bottom line is this: Google is making a serious play for the AI infrastructure crown long held by Nvidia. While training AI models gets all the hype, the real money and compute-intensive work in the long run will come from inference—running those models for millions of users to get instant answers, generate images, and power AI agents.

Customers are already seeing huge gains. ZoomInfo reported a 60% improvement in price-performance for its data pipelines on Google's new Axion-based instances, and Vimeo saw a 30% performance boost for video transcoding on the same hardware.

Here's what this means for you. For developers and founders, the AI infrastructure market is no longer a one-horse race. Google is building a powerful, cost-effective alternative to GPUs. With tools like vLLM now making it easier to switch between chips, you should no longer default to one provider. Test your workloads on both—you might find Google’s new hardware gives you a massive edge in speed and cost for your specific use case.

Below, we dive deeper into the news and tell you everything you need to know about ye olde Ironwood...

Why "The Age of Inference"? 

For years, the primary challenge was training ever-larger models, a process that consumed astronomical amounts of computing power. Now, the industry is entering what Google has dubbed the "age of inference," where the focus is shifting from creation to application. The new bottleneck is serving these powerful models to millions of users efficiently, reliably, and cost-effectively.

In a direct challenge to the market's incumbents, Google Cloud today released its next-generation custom silicon designed to conquer this new frontier: Ironwood, its seventh-generation Tensor Processing Unit (TPU), and a new family of Axion Arm-based CPUs. Backed by a massive commitment from AI leader Anthropic, which plans to access up to one million TPUs, Google is signaling its intent to not just compete, but to lead the infrastructure layer of the AI revolution.

Ironwood: A Quantum Leap in Performance and Scale

Ironwood is a chip purpose-built for the most demanding AI workloads. Google claims it offers more than four times the performance for both training and inference compared to its previous generation, and a tenfold peak performance improvement over its TPU v5p.

These are not just incremental gains; they represent a significant leap in capability. But the true power of Ironwood lies not in the individual chip, but in the system-level architecture Google has built around it, known as AI Hypercomputer.

A single Ironwood "pod" can interconnect up to 9,216 individual chips, allowing them to function as a single, cohesive supercomputer. To understand the scale of this engineering feat, Google provided several analogies:

  • A Unified Super-Brain: Imagine 9,216 of the world's most specialized AI processors working in perfect unison. This massive scale is necessary because today's frontier models are too large to fit on a single chip. Ironwood's pod architecture allows one of the largest AI models in existence to run as a single, coordinated operation, eliminating the communication bottlenecks that cripple traditional distributed systems.
  • A Thousand-Lane Data Superhighway: The key to making these 9,216 chips act as one is Google's proprietary Inter-Chip Interconnect (ICI). This dedicated network operates at a staggering 9.6 Terabits per second (Tb/s). This "data superhighway" ensures that calculations from one chip are instantly available to all others, preventing the digital traffic jams that slow down complex computations.
  • Unprecedented Memory Access: The entire pod shares a record-breaking 1.77 Petabytes (PB) of High Bandwidth Memory (HBM). This is the system's working memory, where the AI model itself resides. With 1.77 PB—enough to store the text of millions of books or 40,000 high-definition movies—the entire model is instantly accessible to every chip, dramatically accelerating processing speed.
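Those analogies hold up to a back-of-the-envelope check. A short sketch (the ~44 GB-per-HD-movie figure is our assumption, not Google's):

```python
# Sanity-check the pod memory analogy: 1.77 PB of shared HBM across 9,216 chips.
# Assumption (ours, not Google's): one high-definition movie ≈ 44 GB.

PB = 10**15  # petabyte, decimal convention
GB = 10**9   # gigabyte

pod_hbm_bytes = 1.77 * PB
chips_per_pod = 9216

per_chip_gb = pod_hbm_bytes / chips_per_pod / GB
movies = pod_hbm_bytes / (44 * GB)

print(f"HBM per chip: {per_chip_gb:.0f} GB")          # ~192 GB per chip
print(f"Equivalent HD movies: {movies:,.0f}")          # ~40,000 at 44 GB each
```

Dividing the pod total back down also tells you each individual Ironwood chip carries roughly 192 GB of HBM—itself far more than most single accelerators ship with today.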

This vertically integrated design—from custom silicon to networking to memory—is how Google achieves its performance claims. This isn't just about raw speed; it's about efficiency. By minimizing data transfer delays and maximizing hardware utilization, Google aims to significantly lower the Total Cost of Ownership (TCO) for running cutting-edge AI services.

The system is also engineered for resilience. Google’s Optical Circuit Switching (OCS) technology acts as a dynamic fabric that can instantly reroute data traffic around hardware interruptions, ensuring near-constant availability for critical AI services, backed by a fleet-wide uptime of approximately 99.999% since 2020.

Anthropic’s Billion-Dollar Bet and Broad Customer Adoption

The most significant endorsement of Ironwood comes from Anthropic, the creator of the Claude family of models. The AI research lab announced plans to access up to 1 million TPUs to train and serve its models.

"As demand continues to grow exponentially, we're increasing our compute resources as we push the boundaries of AI research and product development," said James Bradbury, Head of Compute at Anthropic. "Ironwood’s improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect."

This partnership is a massive vote of confidence and provides Google with a foundational, large-scale customer to validate its technology. Other customers are already reporting impressive results. Lightricks, a creative technology company, is enthusiastic about using Ironwood to generate higher-fidelity images and video for its millions of users.

Axion: Powering the AI Application Backbone

While specialized accelerators like Ironwood handle the heavy lifting of model processing, modern applications require a robust backbone of general-purpose computing. For this, Google introduced new instances powered by its custom Arm-based Axion CPUs.

The new offerings include:

  • N4A: Now in preview, this virtual machine is designed for price-performance on workloads like microservices, databases, and data analytics.
  • C4A metal: Google's first Arm-based bare-metal instance, providing dedicated physical servers for specialized tasks like Android development or complex simulations.

These chips are not just an afterthought; they are a critical part of Google's holistic strategy. While Ironwood serves the model, Axion runs the application servers, handles data ingestion and preparation, and manages the operational infrastructure that surrounds the AI.

Early adopters of Axion are seeing significant efficiency gains. Vimeo reported a 30% performance improvement for its video transcoding workload on N4A instances compared to comparable x86 VMs. ZoomInfo, a data intelligence platform, measured an even more impressive 60% improvement in price-performance for its core data processing pipelines.

The Software Layer: Making Power Accessible

Hardware is only half the equation. Google has also invested heavily in a co-designed software layer to make Ironwood's immense power accessible to developers. Key announcements include:

  • vLLM Support: Enhanced support for TPUs in the popular vLLM inference library allows developers to switch between GPUs and TPUs with only minor configuration changes, lowering the barrier to adoption.
  • GKE Integration: A new GKE Inference Gateway intelligently load-balances requests across TPU servers, which Google claims can reduce latency by 96% and serving costs by up to 30%.
  • MaxText Enhancements: The open-source LLM framework has been updated to make it easier to implement the latest training and reinforcement learning techniques.
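The "minor configuration changes" claim boils down to a pattern worth internalizing: your serving script stays put, and a single accelerator setting flips. Here's a minimal, library-free sketch of that pattern (the function and flag names are ours for illustration, not vLLM's actual API):

```python
# Hypothetical sketch of backend-agnostic serving config.
# Names here are illustrative, not vLLM's real interface.
from dataclasses import dataclass

@dataclass
class EngineConfig:
    model: str
    accelerator: str = "gpu"   # flip to "tpu"; everything else stays the same
    tensor_parallel: int = 1

def launch_args(cfg: EngineConfig) -> list:
    """Translate one portable config into backend-specific launch arguments."""
    args = ["--model", cfg.model,
            "--tensor-parallel-size", str(cfg.tensor_parallel)]
    if cfg.accelerator == "tpu":
        args += ["--device", "tpu"]  # only this branch differs per backend
    return args

gpu_args = launch_args(EngineConfig("my-model", accelerator="gpu"))
tpu_args = launch_args(EngineConfig("my-model", accelerator="tpu"))
print(gpu_args)
print(tpu_args)
```

The point of the pattern: when the backend choice lives in one config field rather than scattered through your serving code, benchmarking the same workload on GPUs and TPUs becomes a one-line change instead of a rewrite.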

This focus on open software and seamless integration is crucial for attracting developers and making the AI Hypercomputer a viable alternative to more established ecosystems.

A Strategic Play in the AI Infrastructure Market

Today’s announcements are the culmination of a decade of investment in custom silicon at Google, a journey that began with the first TPU, which, in turn, unlocked the invention of the Transformer architecture that underpins modern AI.

With the industry shifting to the "age of inference," the battle for AI infrastructure supremacy is entering a new phase. It's no longer just about who can build the biggest training cluster, but who can provide the most efficient, scalable, and cost-effective platform for deploying AI to the world. With Ironwood and Axion, Google has laid down a powerful marker, demonstrating that it has the technology, the strategy, and the key partners to be a dominant force in this new era.

Related news: Vertex AI Agent Builder now lets you build AI agents faster (via one-click deployment) with configurable context layers, scale them in production with new observability tools that track performance metrics and debug issues, and govern them securely with native agent identities and safeguards. Also, check out our coverage of Google's Project Suncatcher, which aims to shoot a bunch of TPUs into the sky to create datacenter satellites in space; very cool stuff! 


See you cool cats on X!

Get your brand in front of 550,000+ professionals here
www.theneuron.ai/newsletter/

Get the latest AI right in Your Inbox

Join 550,000+ professionals from top companies like Disney, Apple and Tesla. 100% Free.