Executive summary
The age of inference is dawning. As generative models graduate from lab demos to embedded services and proactive agents, organisations require massive inference throughput, low latency and a trustworthy supply chain. Google’s April 2025 unveiling of its Ironwood Tensor Processing Unit (TPU) and Axion Arm‑based CPU reflects this shift. Ironwood is the company’s first TPU built primarily for inference, offering over 4× higher performance per chip than its predecessor and forming superpods of 9,216 chips that deliver ≈42.5 exaFLOPS (FP8) and 1.77 PB of shared memory (techradar.com; venturebeat.com). Axion, Google’s custom Arm server CPU, claims up to 50 % better performance and 60 % better energy efficiency compared with x86 (newsroom.arm.com) and powers the new N4A and C4A compute instances (networkworld.com). Together, these chips enable Google’s AI Hypercomputer – a tightly integrated stack of compute, networking, cooling and software that the company says yields 353 % ROI and 28 % lower IT costs (hpcwire.com).
This white paper examines Ironwood and Axion through the lens of Thorsten Meyer’s advocacy for private, responsible and sovereign AI infrastructure. It explores technical specifications, early deployments (Anthropic, Spotify, Paramount, Vimeo, ZoomInfo), and the broader competitive landscape involving Nvidia’s Blackwell NVL72, AWS’s Trainium2, Microsoft’s Maia/Cobalt and AMD’s MI300X. We also analyse Europe’s AI sovereignty initiatives – from EU AI Factories and the InvestAI fund to corporate demand for sovereign AI. Finally, we provide strategic recommendations for enterprises seeking to harness agentic AI while maintaining control over data and compute.
Key takeaways
- Inference‑first hardware – Ironwood’s 4,614 FP8 TFLOPs per chip, 192 GB of HBM3e and 7.3 TB/s bandwidth allow large models to be served with minimal batching, reducing latency (techradar.com). Superpods of 9,216 chips deliver 42.5 exaFLOPS and 9.6 Tb/s interconnect bandwidth (venturebeat.com), surpassing competitors’ memory and bandwidth.
- Custom general‑purpose compute – Axion uses Arm Neoverse V2 cores and claims 50 % higher performance and 60 % better energy efficiency vs x86 (newsroom.arm.com). Google’s N4A and C4A instances deliver 2× better price‑performance than x86 VMs and up to 80 % better performance per watt (networkworld.com).
- ROI and reliability – Google’s AI Hypercomputer architecture integrates Pathways, sparse compute, GKE Cluster Director and the Inference Gateway to achieve a 353 % three‑year ROI, 28 % lower IT costs and 55 % more efficient IT teams (hpcwire.com). Optical circuit switching delivers 99.999 % availability, letting pods scale reliably (venturebeat.com).
- Competitive landscape – Nvidia’s Blackwell NVL72 combines 72 GPUs with 36 Grace CPUs, providing 130 TB/s of NVLink bandwidth and 37 TB of memory (nvidia.com). AWS’s Trainium2 clusters promise 30–40 % better price‑performance, with one million chips contracted by Anthropic by 2026 (nextplatform.com). Microsoft’s Maia 100 packs 105 billion transistors, and its larger successor, Maia 200, is not expected in volume until 2026 (datacenterdynamics.com). AMD’s MI300X offers 192 GB of HBM3 and >5 TB/s bandwidth, delivering ≈50 % faster inference than the MI250 (neysa.ai).
- Sovereign AI urgency – Europe holds only ≈4 % of global AI compute compared with the US’s 70 %, with about 57,000 accelerators and €10 billion in annual investment (seglerconsulting.com). EU AI Factories and InvestAI aim to build 15 AI factories and several gigafactories with >100,000 processors (digital-strategy.ec.europa.eu), but public funding remains small relative to US tech giants and leaves structural gaps (seglerconsulting.com). Surveys show 62 % of European organisations want sovereign AI solutions, yet only 16 % treat sovereignty as a board‑level concern (newsroom.accenture.com).
- Agentic AI adoption – McKinsey reports that although 80 % of companies use generative AI, few see bottom‑line impact; agentic AI (autonomous agents orchestrating workflows) may break this paradox (mckinsey.com). Predictions suggest 25 % of organisations will pilot agentic AI in 2025, growing to 50 % by 2027 (genesishumanexperience.com). Microsoft proposes an open agentic web with standardised protocols, emphasising multi‑agent orchestration (blogs.microsoft.com).
1 The age of inference and agentic AI
1.1 From training to serving
The last decade of AI investment focused on training large language models (LLMs) and generative models. Today, the challenge is shifting to serving these models at scale. Chatbots, image generators, multimodal assistants and proactive agents must respond in real time, often across billions of queries per day. This “age of inference” demands hardware optimised for low‑latency inference rather than peak training throughput. Google engineers note that Ironwood is their first TPU purpose‑built for inference, delivering 2× performance per watt compared with the previous Trillium generation (reuters.com).
1.2 Agentic AI and the gen‑AI paradox
While generative AI adoption is widespread, many enterprises struggle to translate experiments into productivity gains. McKinsey calls this the gen‑AI paradox: companies use generative AI but often see negligible bottom‑line impact because AI remains isolated from core workflows (mckinsey.com). Agentic AI – autonomous systems that pursue goals, make decisions and orchestrate multiple tools – promises to break this paradox. McKinsey stresses that real value arises when organisations redesign processes and integrate agents with data, APIs and humans (mckinsey.com). Gartner predicts a sharp rise in agentic pilots: 25 % of companies using generative AI will pilot agentic AI in 2025, increasing to 50 % by 2027 (genesishumanexperience.com). Microsoft envisions an open agentic web with standard protocols (Model Context Protocol, NLWeb) and multi‑agent orchestration (blogs.microsoft.com).
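To make “agentic” concrete, the sketch below shows the control loop most agent frameworks share: a model repeatedly chooses a tool, observes the result and stops when it judges the goal met. This is a minimal illustration, not any vendor’s API – `call_model`, `TOOLS` and `search_crm` are hypothetical stand‑ins, with a toy policy in place of a real LLM.

```python
# Minimal agentic loop: choose a tool, observe, repeat until the goal is met.
# call_model is a toy stand-in for an LLM; TOOLS is a hypothetical registry.
# Real deployments would use a model API and a protocol such as MCP.

TOOLS = {
    "search_crm": lambda query: f"3 churned accounts matching {query!r}",
}

def call_model(goal: str, history: list) -> dict:
    # Toy policy: look something up first, then declare the goal met.
    if not history:
        return {"tool": "search_crm", "args": goal}
    return {"answer": f"Goal met using: {history[-1]['result']}"}

def run_agent(goal: str, max_steps: int = 8) -> str:
    history = []
    for _ in range(max_steps):
        decision = call_model(goal, history)
        if "answer" in decision:              # the model judges the goal achieved
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["args"])
        history.append({"tool": decision["tool"], "result": result})
    return "stopped: step budget exhausted"   # guardrail against runaway loops

print(run_agent("enterprise accounts at churn risk"))
```

The step budget and explicit tool registry are the two governance hooks that matter at enterprise scale: they bound agent behaviour and make its capabilities auditable.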
2 Ironwood TPU: architecture and differentiation
2.1 Technical specifications
Ironwood builds on six generations of TPUs but marks a pivot toward inference‑optimised design. Key specifications (FP8 precision) include:
- 4,614 TFLOPs per chip with support for FP8 and bfloat16; per‑chip memory of 192 GB of HBM3e providing 7.3 TB/s of bandwidth (techradar.com).
- Improved SparseCore for efficient sparse matrix multiplication and a more advanced Matrix Multiplication Unit (MXU), co‑designed with Google’s Pathways runtime to handle 8K+ sequence lengths (techradar.com).
- Enhanced Inter‑Chip Interconnect (ICI) delivering up to 1.2 TB/s per chip and 9.6 Tb/s across a 9,216‑chip superpod, outpacing Nvidia’s NVLink bandwidth (venturebeat.com).
- Superpods combine 9,216 chips into a single high‑performance cluster providing ≈42.5 exaFLOPS and 1.77 PB of shared HBM3e (venturebeat.com). Optical circuit switching ensures 99.999 % reliability, allowing pods to be networked at scale (venturebeat.com).
- Power and cooling – each rack supports ~1 MW and uses liquid cooling and 400 V DC power distribution (venturebeat.com).
These specifications enable large models (e.g., 7B to 500B parameters) to be served with minimal batching, reducing latency and inference costs. Compared with prior TPUs, Ironwood delivers more than 4× performance at the chip level and 10× the peak performance of the v5p generation (techrepublic.com).
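The headline pod numbers are internally consistent with the per‑chip figures. The short Python check below is a worked calculation, not vendor data; decimal units (1 exaFLOP = 10⁶ TFLOPs, 1 PB = 10⁶ GB) are assumed.

```python
# Worked check: pod-level figures follow from the per-chip specs.
chips_per_pod = 9_216
fp8_tflops_per_chip = 4_614
hbm_gb_per_chip = 192

pod_exaflops = chips_per_pod * fp8_tflops_per_chip / 1e6  # 1 exaFLOP = 1e6 TFLOPs
pod_hbm_pb = chips_per_pod * hbm_gb_per_chip / 1e6        # 1 PB = 1e6 GB (decimal)

print(f"{pod_exaflops:.1f} exaFLOPS (FP8)")  # -> 42.5 exaFLOPS
print(f"{pod_hbm_pb:.2f} PB shared HBM3e")   # -> 1.77 PB
```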
2.2 Integration with the AI Hypercomputer
Google positions Ironwood as a component of its AI Hypercomputer – a system‑level design that tightly integrates compute, networking, storage, cooling and software. Key elements include:
- Pathways runtime orchestrating model partitioning across chips for efficient scheduling and dynamic sparsity.
- Inference Gateway that routes requests across accelerators to balance load and cut latency; Google claims it reduces inference latency by 96 % and costs by 30 % (venturebeat.com). A toy routing sketch follows this list.
- GKE Cluster Director automating deployment, maintenance and scheduling of large models across pods, reducing maintenance windows from hours to minutes (venturebeat.com).
- Optical circuit switching enabling pods to form clusters with high reliability; traffic can be rerouted around network faults without performance degradation (venturebeat.com).
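The sketch below illustrates the core idea behind latency‑aware request routing: send each request to the replica with the fewest requests in flight. This is an illustrative toy scheduler, not Google’s Inference Gateway; real gateways also account for token budgets, model variants and prefix‑cache locality.

```python
import heapq

# Least-loaded routing, in the spirit of an inference gateway (illustrative).

class Router:
    def __init__(self, replicas):
        self._heap = [(0, name) for name in replicas]  # (in_flight, replica)
        heapq.heapify(self._heap)

    def route(self) -> str:
        """Pick the replica with the fewest in-flight requests."""
        in_flight, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (in_flight + 1, name))
        return name

    def complete(self, name: str) -> None:
        """Decrement a replica's in-flight count when a request finishes."""
        self._heap = [(n - 1 if r == name else n, r) for n, r in self._heap]
        heapq.heapify(self._heap)

router = Router(["pod-a", "pod-b", "pod-c"])
print([router.route() for _ in range(5)])  # ['pod-a', 'pod-b', 'pod-c', 'pod-a', 'pod-b']
```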
The AI Hypercomputer yields measurable business benefits: a Google‑commissioned ROI study reports a 353 % three‑year return on investment, 28 % lower IT costs and 55 % more efficient IT teams compared with conventional hardware (hpcwire.com). These figures, while company‑provided, illustrate the value of system‑level co‑design rather than isolated chip upgrades.
2.3 Early adopters and case studies
- Anthropic – In October 2025, Anthropic signed a deal to use up to one million TPUs, securing more than 1 GW of computing capacity by 2026 (tomshardware.com). Anthropic notes that renting Google Cloud hardware avoids massive capital expenditure and provides superior price‑performance relative to building its own datacentres.
- ZoomInfo & Vimeo – Early customers using Ironwood or the N4A instances report dramatic efficiency gains: ZoomInfo saw 60 % better price‑performance, while Vimeo achieved a 30 % improvement in video processing (venturebeat.com). Other companies, such as Rise AI, reported 20 % lower energy consumption (techrepublic.com).
- Lightricks & Essential AI – Both startups adopted Ironwood pods in 2025, citing rapid scaling and cost advantages for generative video and enterprise agentic workloads (techrepublic.com).
3 Axion CPU: custom general‑purpose compute
3.1 Design and claims
Google’s Axion CPU is its first internally designed server processor. Built on Arm’s Neoverse V2 (codenamed Demeter), it emphasises performance per watt and tight integration with Google’s networking/security stack. Key claims include:
- Up to 30 % higher performance than the fastest Arm instances on the market, and up to 50 % higher performance with 60 % lower energy consumption compared with x86 instances (newsroom.arm.com; nextplatform.com).
- A Titanium microcontroller handles networking and security tasks, freeing CPU cycles for applications (arstechnica.com).
- Neoverse V2 cores support Armv9 features, including scalable vector extensions useful for AI inference. While not targeted at training, Axion can handle CPU‑based inference and conventional workloads such as databases, web servers and data analytics (arstechnica.com).
3.2 N4A and C4A instances
Google Cloud offers two main families of Axion‑powered instances:
- N4A – general‑purpose VMs with up to 64 vCPUs, 512 GB of DDR5 memory and 50 Gbps networking. Google claims 2× better price‑performance and 80 % better performance per watt vs comparable x86 VMs (networkworld.com). These are ideal for scale‑out microservices, Java apps and containerised workloads.
- C4A – compute‑optimised instances (bare metal and VM) offering up to 96 vCPUs and 768 GB of memory with 100 Gbps networking (datacenterdynamics.com). Early customers include Paramount, which reported 33 % faster video encoding, and Spotify, which saw 250 % performance gains in certain workloads (datacenterdynamics.com). Google says C4A VMs provide 10 % better price‑performance than other Arm VMs and 65 % better price‑performance than x86 (datacenterdynamics.com); the arithmetic sketch below translates such claims into cost terms.
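Vendor “X % better price‑performance” claims are easiest to interpret as bill impact. The snippet below makes that translation under the simplifying assumption that price‑performance means work per dollar; real savings also depend on workload mix, discounts and migration costs.

```python
# "X % better price-performance" translated into spend, assuming
# price-performance = work per dollar (a deliberate simplification).

def cost_saving(price_perf_gain_pct: float) -> float:
    """Fractional cost reduction for a fixed amount of work."""
    return 1 - 1 / (1 + price_perf_gain_pct / 100)

for label, gain in [("C4A vs x86, +65 %", 65), ("N4A vs x86, 2x (+100 %)", 100)]:
    print(f"{label}: {cost_saving(gain):.0%} lower spend for the same work")
# C4A vs x86, +65 %: 39% lower spend for the same work
# N4A vs x86, 2x (+100 %): 50% lower spend for the same work
```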
3.3 Strategic role
Axion marks Google’s ambition to control both general‑purpose and accelerator compute. Owning the CPU allows Google to optimise the full stack – from the Titanium controller to hypervisor and network – reducing licensing costs and mitigating reliance on external suppliers. It also aligns with energy‑efficiency goals; Arm’s per‑watt performance is attractive as datacentres face power constraints.
4 Competitive landscape: Nvidia, AWS, Microsoft & AMD
The AI hardware race is intensifying. Understanding competitor offerings helps contextualise Ironwood and Axion.
4.1 Nvidia Blackwell NVL72 & GB200 Superchips
Nvidia’s Blackwell generation (B200/B100 GPUs) underpins the NVL72 system, which combines 72 Blackwell Ultra GPUs and 36 Grace CPUs in a single rack. The system offers 130 TB/s of NVLink bandwidth, 37 TB of memory (including 20 TB of GPU memory) and 576 TB/s of memory bandwidth (nvidia.com). Nvidia claims NVL72 delivers 10× user responsiveness, 5× throughput per megawatt and 50× overall AI‑factory output relative to the previous generation (nvidia.com). The architecture emphasises scale‑out training and inference; like Ironwood pods, NVL72 uses optical cables and supports multi‑rack clusters.
4.2 AWS Trainium2 & shared inference
Amazon’s Trainium2 is the company’s second‑generation AI accelerator. A NextPlatform analysis notes that AWS offers 30–40 % better price‑performance than Nvidia’s H100 for inference, with clusters of up to 500,000 Trainium2 chips fully subscribed and plans to expand to one million chips by 2026 (nextplatform.com). AWS emphasises energy efficiency and offers shared inference – a feature that consolidates inference requests across customers for higher utilisation; the batching sketch below illustrates the idea. Trainium3 is expected to double performance and will power a 65 MW data‑centre capacity expansion (nextplatform.com).
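The core mechanism behind cross‑customer consolidation is simple: pool requests from many tenants and serve them in one batch so accelerators stay busy. The sketch below is a toy illustration of that idea, not AWS’s implementation; all names are made up.

```python
from collections import deque

# Toy cross-tenant request consolidation ("shared inference" in spirit):
# requests from many customers are pooled and drained in batches.

class SharedBatcher:
    def __init__(self, max_batch: int = 8):
        self.max_batch = max_batch
        self._pending = deque()

    def submit(self, tenant: str, prompt: str) -> None:
        self._pending.append((tenant, prompt))

    def next_batch(self) -> list:
        """Drain up to max_batch requests, mixing tenants freely."""
        batch = []
        while self._pending and len(batch) < self.max_batch:
            batch.append(self._pending.popleft())
        return batch

b = SharedBatcher(max_batch=4)
for i in range(6):
    b.submit(f"tenant-{i % 3}", f"prompt {i}")
print(b.next_batch())  # one batch mixing requests from three tenants
```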
4.3 Microsoft Maia, Cobalt & open agentic web
Microsoft is developing the Maia 100 inference accelerator and the Cobalt Arm server CPU. The Maia 100 has 105 billion transistors and offers ≈1,600 TFLOPs (MXInt8) and 3,200 TFLOPs (MXFP4) (datacenterdynamics.com). However, reports indicate that the larger Maia 200 will not reach mass production until 2026 (datacenterdynamics.com). Microsoft emphasises an open agentic web, backing protocols such as the Model Context Protocol (MCP) and its own Natural Language Web (NLWeb) (blogs.microsoft.com) to enable multi‑agent orchestration.
4.4 AMD MI300X
AMD’s MI300X is a chiplet‑based GPU accelerator (its MI300A sibling combines CPU and GPU chiplets), offering 192 GB of HBM3 memory, >5 TB/s of bandwidth and an energy‑efficient design (neysa.ai). AMD claims up to 50 % faster inference compared with its MI250 accelerator (neysa.ai). While the MI300X software ecosystem is less mature than Nvidia’s, its larger memory capacity is attractive for hosting bigger models on a single device, reducing the need for multi‑GPU partitioning.
5 European AI sovereignty and digital infrastructure
5.1 AI Factories and InvestAI
Europe’s policy makers are increasingly vocal about digital sovereignty—the ability to control AI infrastructure, data and standards without reliance on foreign suppliers. In June 2024, the European Commission announced AI Factories under the AI Continent Action Plan. These factories repurpose EuroHPC supercomputers to provide AI‑optimised compute for startups and SMEs, with at least 15 AI factories expected by 2025–2026 and plans for AI gigafactories containing >100,000 processors (digital-strategy.ec.europa.eu). The InvestAI facility aims to mobilise €20 billion in public and private capital to build five AI gigafactories and an ecosystem of compute, data and talent.
Despite these initiatives, Europe lags behind. An analysis from Segler Consulting estimates that Europe holds only 4 % of global AI compute while the US commands 70 % (seglerconsulting.com). Europe’s ≈57,000 high‑end accelerators and €10 billion of annual investment are dwarfed by US tech giants spending tens of billions every year (seglerconsulting.com). Public investments alone cannot bridge this gap; the report stresses the need for structural reforms, risk‑tolerant venture capital and talent development (seglerconsulting.com).
5.2 Corporate demand for sovereign AI
An Accenture survey of 300 European executives reveals that 62 % of organisations intend to use sovereign AI solutions, with the highest interest in banking, public services, life sciences and telecoms (newsroom.accenture.com). However, only 16 % treat AI sovereignty as a board‑level priority (newsroom.accenture.com). Barriers include cost, regulatory complexity and scarcity of local compute. The EU’s €1.1 billion Apply AI strategy aims to address these issues by funding pilot projects and creating an Apply AI Alliance (aibusiness.com), but analysts note that the sum is modest relative to global competition and primarily symbolic (aibusiness.com).
5.3 Implications for private and responsible AI
Thorsten Meyer emphasises private, responsible AI and digital sovereignty. For European enterprises, the choice of inference infrastructure intersects with regulatory compliance (GDPR, AI Act), data residency and vendor lock‑in. Using Google’s Ironwood and Axion via cloud may raise sovereignty concerns if compute and data remain outside EU jurisdiction. Conversely, building or partnering on local compute (via AI factories, on‑premises hypercomputers) could support sovereign agentic AI but requires substantial capital and expertise. Balancing performance, cost, sustainability and sovereignty will be a key strategic decision.
6 Strategic recommendations
6.1 Optimise for inference and agentic workloads
- Benchmark on real workloads – Organisations should evaluate Ironwood, NVL72, Trainium2, Maia 100 and MI300X on their specific models (LLMs, diffusion, retrieval‑augmented generation). FP8 or FP4 performance metrics are not directly comparable across vendors; real‑world latency, throughput, total cost of ownership (TCO) and system‑software maturity matter more.
- Invest in agentic architecture – Adopt an agentic AI mesh that combines centralised generative models with decentralised autonomous agents. This architecture should support long‑context memory, retrieval across private datasets and open standards (MCP, NLWeb). Avoid vendor lock‑in by designing modular interfaces and exploring open‑source inference stacks such as vLLM.
- Leverage sparsity and quantisation – Ironwood’s SparseCore and FP8 support enable efficient low‑precision inference. Enterprises should integrate sparsity‑aware training and quantisation‑aware pipelines to exploit new hardware, reducing compute and energy costs; the sketch after this list shows the basic mechanics.
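The mechanics of low‑precision serving are easy to demonstrate. Below is a minimal post‑training quantisation sketch using symmetric int8 weights – a simpler stand‑in for the FP8 paths that new accelerators expose – showing the trade‑off such pipelines manage: a 4× smaller representation versus a measurable reconstruction error.

```python
import numpy as np

# Minimal post-training quantisation: symmetric int8, one scale per tensor.
# A stand-in for hardware FP8 paths; toy data, not a production pipeline.

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)  # toy weight matrix

scale = np.abs(w).max() / 127.0                       # map max |w| to int8 range
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = w_q.astype(np.float32) * scale                # dequantised weights

print(f"scale={scale:.4f}, mean abs error={np.abs(w - w_hat).mean():.5f}")
```

Quantisation‑aware pipelines extend this idea with per‑channel scales, calibration data and error feedback during training, which is what makes aggressive FP8/int8 serving viable for large models.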
6.2 Plan for digital sovereignty
- Assess data residency and compliance – Identify which workloads require data to remain within EU borders and evaluate whether cloud regions meet sovereignty requirements. Consider hybrid strategies combining public cloud with on‑premises or EU‑based hyperscalers.
- Engage in EU AI factory programmes – Collaborate with AI Factories and InvestAI to access subsidised compute. Advocate for fair access and transparent pricing to ensure SMEs benefit. Participate in shaping the governance and standards of AI gigafactories.
- Drive board‑level commitment – Elevate AI sovereignty from IT to executive discussions. Align investments in sovereign AI with corporate sustainability goals and risk management. Leverage partnerships with European cloud providers and open‑source communities to build resilience.
6.3 Cultivate responsible agentic AI
- Adopt ethical frameworks – McKinsey and the WEF warn that agentic AI could reshape 39 % of job‑market skills by 2030 (genesishumanexperience.com). Organisations should implement ethical guidelines, transparency requirements and human‑in‑the‑loop controls when deploying agents.
- Upskill workforce – Train employees to work alongside AI agents, focusing on data stewardship, prompt engineering and agent orchestration. Encourage cross‑functional teams (IT, legal, HR) to manage AI governance.
- Monitor agent sprawl – As agents proliferate, implement governance mechanisms to prevent duplication, security risks and unintended behaviours. Use open standards and registries to track agent capabilities and responsibilities; a minimal registry sketch follows this list.
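One lightweight way to implement the registry idea is to record every deployed agent with an accountable owner, declared capabilities and the tools it may call, so duplicates and over‑privileged agents surface early. The schema below is illustrative, not a standard.

```python
from dataclasses import dataclass, field

# Illustrative agent registry: a governance hook against agent sprawl.

@dataclass
class AgentRecord:
    name: str
    owner: str                                   # accountable team
    capabilities: list = field(default_factory=list)
    allowed_tools: list = field(default_factory=list)

REGISTRY: dict = {}

def register(agent: AgentRecord) -> None:
    if agent.name in REGISTRY:
        raise ValueError(f"duplicate agent: {agent.name}")  # sprawl guard
    REGISTRY[agent.name] = agent

register(AgentRecord("invoice-triage", owner="finance-it",
                     capabilities=["classify"], allowed_tools=["erp.read"]))
print(sorted(REGISTRY))
```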
7 Conclusion
Google’s Ironwood TPU and Axion CPU represent a step change in AI infrastructure, signalling a shift from training‑dominated compute to inference‑first, agentic workloads. Ironwood’s 42.5 exaFLOPS pods, 1.77 PB of shared memory and optical switching deliver unprecedented inference throughput, while Axion’s Arm‑based design offers energy‑efficient general‑purpose compute and mitigates supply‑chain dependence (techradar.com; newsroom.arm.com). These chips underpin Google’s AI Hypercomputer, which claims remarkable ROI and cost savings (hpcwire.com). Early adopters like Anthropic, Vimeo, ZoomInfo and Paramount validate the performance and economic benefits.
However, the AI hardware race remains dynamic. Nvidia, AWS, Microsoft and AMD each offer compelling alternatives with varying strengths—be it Nvidia’s memory bandwidth, AWS’s price‑performance or AMD’s high‑memory chips. For enterprises, choosing the right platform depends on workload characteristics, software ecosystem and risk tolerance. Additionally, European organisations must weigh performance and cost against digital sovereignty. Although EU AI Factories and InvestAI signal progress, Europe’s compute deficit and modest funding highlight the need for public‑private collaboration and structural reforms (seglerconsulting.com; aibusiness.com).
To navigate this landscape, organisations should prioritise benchmarking on real workloads, design agentic AI architectures that remain vendor‑agnostic, plan for data residency and compliance, and cultivate responsible AI practices. The age of inference will reward those who combine technological excellence with strategic foresight and ethical stewardship.