substack.com

CPUs are Back: The Datacenter CPU Landscape in 2026

2026-02-09 · 18:18 UTC ·SemiAnalysis ·49 min read

Brief

Datacenter CPUs are becoming strategically important again, not because GPUs stopped mattering, but because AI systems have become more CPU-hungry around the GPU cluster. SemiAnalysis argues that since late 2025, reinforcement learning, agentic inference, retrieval-heavy workloads, multimodal preprocessing, and the management of petabyte-scale data pipelines have all increased CPU intensity. In training, CPUs execute RL environments, compile and verify code, run math and physics simulations, and keep expensive accelerator clusters fed. In inference, agents generate far more external API calls, database lookups, and internet traffic than classic chatbot serving. The article’s most concrete infrastructure example is Microsoft’s Fairwater site for OpenAI, where a 48 MW CPU/storage building supports a 295 MW GPU cluster, implying CPU demand is no longer incidental to AI buildouts. That demand is also colliding with mainstream cloud consolidation: newer cloud-native CPUs can replace old Intel Cascade Lake fleets at 10:1 socket consolidation ratios or better while using less than one-fifth the power, freeing capacity for GPUs.

On product competitiveness, the piece is notably bullish on AMD and skeptical of Intel. It presents Venice as AMD’s strongest step yet: eight TSMC N2 Zen 6c CCDs, 256 cores, 16 memory channels, 1.64 TB/s with MRDIMM-12800, restored 4 MB L3 per Zen 6c core, and a claimed 1.7x perf/W gain over Turin. SemiAnalysis also highlights AMD’s willingness to keep serving the mainstream 8-channel enterprise segment with a Venice SP8 platform, just as Intel has reportedly cancelled Diamond Rapids-SP. Intel’s roadmap is framed as architecturally ambitious but commercially compromised: Diamond Rapids adopts a more AMD-like multi-die topology with four compute building blocks and two memory/I/O hub dies, but loses SMT on P-cores, which the authors think badly undermines datacenter throughput. Clearwater Forest, meanwhile, is treated as a costly learning vehicle for 18A and Foveros Direct rather than a volume winner, with only 17% better performance than Sierra Forest after a delay into H1 2026.

The broader market structure is also shifting. Hyperscalers are no longer just Arm licensees; they are some of the most capable datacenter CPU vendors. AWS Graviton5 doubles to 192 Neoverse V3 cores and feeds Trainium3 head nodes, Microsoft’s Cobalt 200 jumps to 132 Neoverse V3 cores and 50% better performance, and Google is splitting Axion between higher-performance C4A and scale-out N4A variants. NVIDIA is evolving from Grace to Vera with far higher coherent bandwidth and memory capacity to support GPU-centric systems, while Arm itself is crossing a line from IP licensor to CPU supplier with Phoenix for Meta and possibly OpenAI-linked systems. For anyone tracking AI infrastructure, the important takeaway is that the next datacenter bottleneck may not be just accelerators or power delivery, but the surrounding layers of CPUs, DRAM, packaging, and interconnect needed to make GPU clusters useful.

Why it matters

SemiAnalysis argues datacenter CPUs re-emerged as a bottleneck in late 2025 because AI workloads now require large CPU fleets for reinforcement learning environments, RAG/agent tool use, data preprocessing, and storage orchestration; Microsoft’s “Fairwater” complex reportedly pairs a 48 MW CPU-and-storage building with a 295 MW GPU cluster, roughly a 1:6 CPU-to-GPU power ratio.

Key details

Intel said on its Q4 2025 earnings call that datacenter CPU demand unexpectedly accelerated, prompting higher 2026 foundry tool capex and wafer reallocation from PC to server; SemiAnalysis says frontier AI labs are competing with cloud providers for commodity x86 servers, Intel is considering Xeon price hikes, and AMD expects server CPU TAM growth in the “strong double digits” in 2026.
AMD’s 2026 EPYC Venice moves to 16 DDR5 memory channels and supports MRDIMM-12800 for 1.64 TB/s bandwidth, which the article says is 2.67x Turin; the top Venice part uses eight TSMC N2 Zen 6c CCDs for 256 cores, and AMD claims over 1.7x better performance per watt versus the top 192-core Turin in SPECrate2017_int_base.
Intel’s Diamond Rapids shifts from a giant logically monolithic mesh to four compute dies plus two I/O and memory hub dies, for 16 DDR5 channels, PCIe 6/CXL 3, and up to 256 printed cores with about 192 expected in mainstream SKUs; SemiAnalysis argues removal of SMT on Intel P-cores will materially hurt throughput, estimating a 192-core/192-thread Diamond Rapids may be only about 40% faster than a 128-core/256-thread Granite Rapids.
Intel’s Clearwater Forest is portrayed as a weak bridge product: delayed from H2 2025 to H1 2026, using Foveros Direct hybrid bonding with up to 288 cores, but only 35 GB/s of vertical bandwidth per 4-core cluster to the base die and just a 17% performance gain over Sierra Forest at equal core counts, despite higher cost and packaging complexity.
NVIDIA’s Grace remains highly specialized for GPU head-node use, with 900 GB/s bidirectional NVLink-C2C, up to 480 GB CPU memory, 72 active Arm Neoverse V2 cores, and 500 GB/s LPDDR5X bandwidth; SemiAnalysis says branch prediction/pathology in Neoverse V2 can cause large slowdowns in AI/HPC code, with NVIDIA’s tuning guide citing up to 50% speedups from code-locality optimization.

Cleaned source text

title: CPUs are Back: The Datacenter CPU Landscape in 2026

author: SemiAnalysis

content_type: newsletter

publication: substack.com

published: 2026-02-09T18:18:05+00:00

source_url: gmail://19c43a26e9bd238f

word_count: 10988

RL and Agent Usage, Context Memory Storage, DRAM Pricing Impacts, CPU Interconnect Evolution, AMD Venice, Verano, Florence, Intel Diamond Rapids, Coral Rapids, Arm Phoenix + Venom, Graviton 5, Axion

͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏

Forwarded this email? Subscribe here for more

CPUs are Back: The Datacenter CPU Landscape in 2026

Gerald Wong and Dylan Patel

Feb 9| | | ∙| | Preview

READ IN APP

Since 2023, the datacenter story has been simple. GPUs and networking are king. The arrival and subsequent explosion of AI Training and Inference have shifted compute demands away from the CPU. This meant that Intel, the primary supplier of server CPUs, failed to ride the wave of datacenter buildout and spending. Server CPU revenue remained relatively stagnant as hyperscalers and neoclouds focused on GPUs and datacenter infrastructure.

At the same time, the same hyperscalers have been rolling their own ARM-based datacenter CPUs for their cloud computing services, closing off a significant addressable market for Intel. And within their own x86 turf, Intel’s lackluster execution and uncompetitive performance to rival AMD has further eroded market share. Without a competent AI accelerator offering, Intel was left to tread water while the rest of the industry feasted.

Over the last 6 months this has changed massively. We have posted multiple reports to Core Research and the Tokenomics Model about soaring CPU demand. The primary drivers we have shown and modeled are reinforcement learning and vibe coding’s incredible demand on CPUs. We have also covered major CPU cloud deals by multiple vendors with AI labs. We also have modeling of how many CPUs of what types are being deployed.

Intel Q4’25 DCAI Revenue. Source: Intel

However, Intel’s recent rallies and changing demand signals in the latter part of 2025 have shown that CPUs are now relevant again. In their latest Q4 earnings, Intel saw an unexpected uptick in datacenter CPU demand in late 2025 and are increasing 2026 capex guidance on foundry tools and prioritizing wafers to server from PC to alleviate supply constraints in serving this new demand. This marks an inflection point in the role of CPUs in the datacenter, with AI model training and inference using CPUs more intensively.

Datacenter CPU Core Count Trend. Source: SemiAnalysis Estimates

2026 is an exciting year for the datacenter CPU, with many new generations launching this year from all vendors amid the boom in demand. As such, this piece serves to paint the CPU landscape in 2026. We lay the groundwork, covering the history of the datacenter CPU and the evolving demand drivers, with deep dives on datacenter CPU architecture changes from Intel and AMD over the years.

We then focus on the 2026 CPUs, with comprehensive breakdowns on Intel’s Clearwater Forest, Diamond Rapids and AMD’s Venice and their interesting convergence (and divergence) in design, discussing the performance differences and previewing our CPU costing analysis.

Next, we detail the ARM competition, including NVIDIA’s Grace and Vera, Amazon’s Graviton line, Microsoft’s Cobalt, Google’s Axion CPU lines, Ampere Computing’s merchant ARM silicon bid and their acquisition by Softbank, ARM’s own Phoenix CPU design and look at Huawei’s home grown Kunpeng CPU efforts.

For our subscribers, we provide our datacenter CPU roadmap to 2028 and detail the datacenter CPUs beyond 2026 from AMD, Intel, ARM and Qualcomm. We then look ahead to what the future looks like for datacenter CPUs, discuss the effects of the DRAM shortage, what NVIDIA’s Bluefield-4 Context Memory Storage platform means for the future of general purpose CPUs, and the key trends to look out for in the CPU market and CPU designs going forward.

Upgrade to paid

The Role and Evolution of Datacenter CPUs

The PC Era

Intel Pentium Pro. Source: Intel

The modern version of the datacenter CPU can be traced back to the 1990s following the success of Personal Computers in the prior decade, bringing basic computing into the home. As PC processing power grew with Intel’s i386, i486 and Pentium generations, many tasks normally computed by advanced workstation and mainframe computers from the likes of DEC and IBM were instead done on PCs at a fraction of the cost. Responding to this need for higher performance “mainframe replacements”, Intel began to release PC processor variants that had more performance and larger caches for higher prices, starting with the Pentium Pro in 1995 that had multiple L2 cache dies co-packaged with the CPU in a Multi-Chip Module (MCM). The Xeon brand then followed suit in 1998, with the Pentium II Xeons that similarly had multiple L2 cache dies added to the CPU processor slot. While mainframes still continue today in the IBM Z lines used for bank transaction verifications and such, they remain a niche corner of the market that we will not cover in this piece.

The Dot Com Era

The 2000s brought the internet age, with the emergence of Web 2.0, e-mail, e-commerce, Google search, smartphones with 3G broadband data, and the need for datacenter CPUs to serve the world’s internet traffic as everything went online. Datacenter CPUs grew into a multi-billion dollar segment. On the design front, after the GHz wars were over with the end of Dennard scaling, attention shifted to multi-core CPUs and increased integration. AMD integrated the memory controller into the CPU silicon, and high-speed IO (PCIe) came directly from the CPU as well. Multi-core CPUs were especially suited for datacenter workloads, where many tasks could be run in parallel across different cores.

We will detail the evolution of how these multiple cores are connected in the interconnect section below. Simultaneous Multi-Threading (SMT) was also introduced in this time by both AMD and Intel, partitioning a core into two logical threads that could operate independently while sharing most core resources, further improving performance in parallelizable datacenter workloads. Those looking for more performance would turn to Multi-socket CPU servers, with Intel’s Quick Path Interconnect (QPI) and AMD’s HyperTransport Direct Connect Architecture in their Opteron CPUs providing coherent links between up to eight sockets per server.

The Virtualization and Cloud Computing Hyperscaler Era

The next major inflection point came with cloud computing in the late 2000s, and was the primary growth driver for datacenter CPU sales throughout the 2010s. Much like how GPU Neoclouds are operating today, computing resources began consolidating toward public cloud providers and hyperscalers such as Amazon’s Web Services (AWS) as customers traded CapEx for OpEx. Spurred by the effects of the Great Recession, many enterprises could not afford to buy and run their own servers to run their software and services.

Cloud computing offered a far more palatable “pay as you use” business model with renting compute instances and running your workloads on 3rd-party hardware, which allowed spending to dynamically adjust with usage that varied over time. This scalability was more favorable than procuring one’s own servers, which needed to be utilized fully at all times to maximize ROI. The Cloud also enabled more streamlined services to emerge, such as serverless computing from the likes of AWS Lambda that automatically allocates software to computing resources, sparing the customer from having to decide on the appropriate number of instances to spin up before running a particular task. With nearly everything handled by them behind the scenes, Clouds turned compute into a commodity.

Pat Gelsinger, VMware CEO 2012-2021, Intel CEO 2021-2024. Source: X @PGelsinger

The key feature for a secure and resource efficient Cloud to work at all is CPU hardware virtualization. In essence, virtualization allows a single CPU to run multiple independent and secure instances of Virtual Machines (VMs) orchestrated through hypervisors such as VMware ESXi. Multi-core CPUs could be partitioned such that each VM would be assigned to a single core or logical thread, with the hypervisor able to migrate instances onto different cores, sockets or servers over the network to optimize for CPU utilization while keeping data and instructions secured from other instances operating on the same CPU.

The need for virtualization for the cloud, combined with CPU designers implementing SMT to boost performance was eventually exploited with the Spectre and Meltdown vulnerabilities in 2018. When two instances ran on threads running on the same physical core, it was possible for an attacker to snoop and piece together data from the other thread using the CPU cores branch prediction functions, a performance boosting technique that guessed, fetched and executed instructions ahead of the running program to keep the CPU busy. With security in the cloud potentially compromised, providers rushed to disable SMT to stop the attack vector. Despite patches and hardware fixes, the performance loss of up to 30% without SMT would haunt Intel and show up in untimely design decisions down the road which we detail below.

The AI GPU and CPU Consolidation Era

Following the COVID boom that boosted internet traffic with way more Zoom calls, e-commerce and more time spent online, datacenter CPU growth was at an all-time high. In the five years leading up to ChatGPT’s launch in November 2022, Intel shipped over 100 Million Xeon Scalable CPUs to cloud and enterprise datacenters.

From then on, AI model training and inference serving would upend the CPU’s role in the datacenter, causing widespread changes in CPU deployment and design strategies. Computing AI models requires lots of matrix multiplication, an operation that can be easily parallelized and done at massive scales on GPUs which had large arrays of vector units originally used to render 3D graphics for games and visualizations.

While accelerator nodes still used host CPUs, the highly structured and relatively simple compute requirements did not take advantage of the CPU’s ability to run branchy, latency sensitive code. And with tens of vector units compared with thousands on GPUs, performance and efficiency was 100-1000x worse on CPU, especially when AI-specific GPUs added MatMul focused Tensor Cores to the mix. Despite Intel’s efforts to add more vector and matrix support with doubled AVX512 ports and dedicated AMX accelerator engines, the CPU was relegated to a support role in the datacenter. However, the internet still had to be served while power in the datacenter got prioritized to GPU compute. As a result, CPUs evolved with the times and were split into two categories.

Head Nodes

The head node CPU’s role is to manage the attached GPUs and keep them fed with data. High per-core performance with large caches and high bandwidth memory and IO are desired to keep tail latencies as low as possible. Dedicated designs such as NVIIDA’s Grace were made with coherent memory access for GPUs to utilize CPU memory as model context Key Value Cache expansions, requiring extremely high CPU to GPU bandwidth. For head nodes, 1 CPU is usually paired with 2 or 4 GPUs in each compute node. Examples include:

1 Vera CPU to 2 Rubin GPUs per superchip

1 Venice CPU to 4 MI455X GPUs per compute tray

1 Graviton5 CPU to 4 Trainium3 per compute tray

2 x86 CPUs to 8 TPUv7 per node

Cloud-Native Socket Consolidation

As GPUs hogged more datacenter power budgets, the need to serve the rest of the internet as efficiently as possible accelerated the development of “Cloud-Native” CPUs. The goal is maximum throughput and requests served per socket at the best efficiency (throughput per Watt). Instead of adding more, newer CPUs to boost total throughput, old, less efficient servers are decommissioned and replaced with a far smaller number of cloud-native CPUs that met the total throughput requirement while sipping a fraction of the power, lowering operating costs and freeing up power budget for more GPU compute.

AMD Turin Dense 7:1 Socket Consolidation. Source: AMD

Socket consolidation ratios of 10:1 or greater can be achieved. Millions of Intel Cascade Lake servers bought during the COVID cloud spend are being retired for the latest AMD and Intel CPUs that process at the same performance level but at less than a fifth of the power.

Design wise, these Cloud-native CPUs target higher core counts with area and power efficient medium-sized cores, and have less cache and IO capabilities compared to traditional CPUs. Intel brought their Atom cores to the datacenter with Sierra Forest. AMD’s Bergamo used a more area and power efficient layout of their Zen4 core. Power efficient ARM-based designs such as AWS Graviton saw great success, while Ampere Computing targeted cloud-native compute with the Altra and AmpereOne lines.

The RL and Agentic Era

Microsoft “Fairwater” GPU and CPU buildings. Source: Google Earth

Now, CPU usage is accelerating again to support AI training and inference beyond head nodes. We can already see evidence of this in Microsoft’s “Fairwater” datacenters for OpenAI. Here, a 48MW CPU and storage building supports the main 295MW GPU cluster. This means tens of thousands of CPUs are now needed to process and manage the Petabytes of data generated by the GPUs, a use case that wouldn’t have otherwise been required without AI.

The evolution of AI computing paradigms has caused this increase in CPU usage intensity. In pretraining and model fine-tuning, CPUs are used to store, shard and index data to be fed to the GPU clusters for matrix multiplication. CPUs are also used for image and video decode in multimodal models, although more fixed function media acceleration is being integrated directly into GPUs.

Reinforcement Learning Training Loop. CPUs used in RL Environment (Green). Source: SemiAnalysis

Use of Reinforcement Learning techniques for model improvement increases CPU demand further. From our deep dive on Reinforcement Learning, we see that in an RL training loop, the “RL Environment” needs to execute the actions generated by the model and calculate the appropriate reward. To do this in areas such as coding and mathematics, lots of CPUs are needed in parallel to perform code compilation, verification, interpretation, and tool use. CPUs are also heavily involved in complex physics simulations and verifying generated synthetic data at high precision. The growing complexity of RL environments needed to scale models further thus necessitates large high-performance CPU clusters located close to the main GPU clusters to keep them busy and minimize GPU idle time. This increasing reliance on RL and CPUs in the training loop is creating a new bottleneck, as AI accelerators are improving in Performance per Watt at a far greater rate than CPUs, meaning a future GPU generation such as Rubin may require an even higher ratio of CPU to GPU power than the 1:6 ratio seen in Fairwater above.

Scaling Reinforcement Learning: Environments, Reward Hacking, Agents, Scaling Data

Dylan Patel and AJ Kourabi| | ·

June 8, 2025

Read full story

On the inference side, the rise of Retrieval Augmented Generative (RAG) models that search and use the internet along with agentic models that invoke tools and query databases has significantly increased the need for general-purpose CPU compute to service these requests. With the ability to send out API calls to multiple sources, each agent can essentially use the internet far more intensively than a human can by doing simple Google searches. AWS and Azure are doing massive CPU buildouts of their own Graviton and Cobalt lines of CPUs as well as purchasing even more x86 general purpose servers for this stepfold increase in internet traffic.

As we go through 2026, the demands on datacenter CPU and DRAM are only getting stronger. Frontier AI labs are running out of CPUs for their RL Training needs and are scrambling for CPU allocation by competing directly with the cloud providers for commodity x86 CPU servers. Intel, facing the unexpected depletion of their CPU inventory, is looking to raise prices across their Xeon line while they ramp additional tools to shore up CPU production. AMD has been increasing their supply capability to grow and take share in a server CPU TAM it believes will grow in the “strong double digits” in 2026. We will discuss how the CPU landscape evolves beyond 2026 for our subscribers below.

History of Multi-Core CPU Interconnects

To appreciate the design changes and philosophies of the 2026 CPUs, we have to understand how multi-core CPUs work and the evolution of interconnects as core counts grew. With multiple cores comes the need to connect those cores together. Early dual-core designs such as Intel’s Pentium D and Xeon Paxville in 2005 simply consisted of two independent single cores, with core-to-core communication done off-package via the Front Side Bus (FSB) to a Northbridge chip that also housed the memory controllers. AMD’s Athlon 64 X2, also in 2005, could be considered a true dual-core processor with two cores and an integrated memory controller (IMC) on the same die, allowing the cores to communicate with each other and to memory and IO controllers directly within the silicon through on-die NoC (Network on Chip) data fabrics.

Intel Tulsa Die Shot. Source: Intel, Hot Chips 2006

Intel’s subsequent Tulsa generation included 16MB of L3 cache shared between the two cores and functions as an on-die core to core data fabric. As we will see later, these on-die data fabrics will become a crucial factor in datacenter CPU design as core counts grow in the hundreds.

Crossbar Limits

As designers tried to increase core counts further, they ran into scaling limits of these early interconnects. As minimal latency and uniformity was desired, crossbar designs were used in an all-to-all fashion, where every core has a discrete link to all other cores on die. However, the number of links increased greatly with more cores, increasing complexity.

2 cores: 1 connection

4 cores: 6 connections

6 cores: 15 connections

8 cores: 28 connections

The practical limit for most designs ended at 4 cores, with higher core count processors achieved with multi-chip modules and dual-core modules that shared and L2 cache and data fabric socket between core pairs. The crossbar wiring is usually done in the metal lines above the shared L3 caches, saving area. Intel’s 6-core Dunnington in 2008 used three dual-core modules with 16MB of shared L3.

AMD Opteron Istanbul 6-core die. Source: AMD

AMD launched their 6-core Istanbul in 2009 with a 6-way crossbar and 6MB L3. Their 12-core Magny-Cours in 2010 used two 6-core dies, with the 16-core Interlagos consisting of two dies each with four Bulldozer dual-core modules.

Intel’s Ring Bus

Intel Nehalem-EX Ring Interconnect. Source: Intel, Hot Chips 2009