Read Briefing · 2026-04-05

Briefing

45 items · 2026-04-05T23:51

MUST READ

Read these first.

Epoch AI 2025-08-11 4 min read

How much power will frontier AI training demand in 2030?

Why it matters

The EPRI-collaboration white paper (published Aug 2025) projects the largest individual frontier training runs will likely draw 4–16 GW of power by 2030, implying per-run power growth of roughly 2.2–2.9x/year.

Key details

Drivers: training compute has grown ~4–5x/year historically; hardware energy efficiency improved ~40%/yr for leading AI GPUs (26%/yr across broader GPUs); training-duration growth is expected to slow to ~10–20%/yr—these factors combine to produce the net power forecast.
System-level impact: >100 GW of total global AI capacity and >50 GW in the US by 2030 is plausible (≈5% of US generation), and multi-datacenter/distributed training may mitigate single-site multi-GW constraints.

Brief

Frontier AI training power demand is projected to reach 4–16 GW for the largest single training runs by 2030, per an EPRI-linked white paper (Aug 2025). The forecast combines 4–5x/year compute scaling with continued GPU efficiency gains (~26–40%/yr) and a slowdown in training-duration growth (10–20%/yr), yielding net power growth of ~2.2–2.9x/year and >100 GW total AI capacity globally.
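How the drivers combine can be sketched as a simple ratio: power grows with compute and shrinks with efficiency gains and with longer runs. This decomposition is an assumption made here for illustration, not the white paper's model, and the function name is hypothetical. It lands in the same neighborhood as the reported 2.2–2.9x/year but does not reproduce that range exactly.

```python
def power_growth(compute_growth, efficiency_gain, duration_growth):
    """Annual growth factor of power drawn by the largest training run.

    Assumes power ~ compute / (energy efficiency * training duration).
    All arguments are yearly multipliers (e.g. 1.4 means +40%/yr).
    """
    return compute_growth / (efficiency_gain * duration_growth)

# Inputs from the summary: 4-5x/yr compute, ~40%/yr efficiency for
# leading AI GPUs, 10-20%/yr training-duration growth.
low = power_growth(4.0, 1.40, 1.20)
high = power_growth(5.0, 1.40, 1.10)
print(f"power growth: {low:.2f}x to {high:.2f}x per year")
```

The gap between this rough bracket and the published range suggests the paper's forecast uses additional inputs or different pairings of the drivers.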

By Josh You, David Owen

OpenAI 2025-01-21 2 min read

Announcing The Stargate Project

Why it matters

$500 billion committed over four years to build U.S. AI infrastructure, with $100 billion to be deployed immediately and a stated goal of creating "hundreds of thousands" of American jobs and strategic national-security capability.

Key details

Initial equity funders are SoftBank, OpenAI, Oracle, and MGX; SoftBank is the financial lead, OpenAI the operational lead, and Masayoshi Son will serve as chairman.
Key technology partners include Arm, Microsoft, NVIDIA, and Oracle; buildout starts in Texas and OpenAI posted RFP/RFQ materials for land/power and A/E site design on Jan 31, 2025.

Brief

The Stargate Project is a $500 billion initiative led by SoftBank and OpenAI to build U.S. AI infrastructure over four years, deploying $100 billion immediately to expand compute, create hundreds of thousands of jobs, and bolster national-security capabilities. Initial equity includes SoftBank, OpenAI, Oracle, and MGX; partners include NVIDIA, Microsoft, Arm, and Oracle; buildout begins in Texas and RFP/RFQ details were released Jan 31, 2025.

substack.com 2026-02-16 55 min read

InferenceX v2: NVIDIA Blackwell Vs AMD vs Hopper - Formerly InferenceMAX

Why it matters

SemiAnalysis’ InferenceX v2 benchmarks nearly 1,000 frontier GPUs across NVIDIA Hopper/Blackwell/Blackwell Ultra and AMD Instinct SKUs, adding the first third-party full-Pareto tests of GB300 NVL72/B300 and multi-node MI355X FP4/FP8 disaggregated serving with wide expert parallelism.

Key details

NVIDIA’s rack-scale Blackwell systems dominate state-of-the-art MoE inference: GB200/GB300 NVL72 delivered up to 98-100x higher realized performance than a strong H100 disagg+wideEP baseline at ~116 tok/s/user, and 9.7x to 65x better tokens-per-dollar than Hopper depending on interactivity.
AMD’s MI355X is competitive in narrower cases: on FP8 disaggregated serving with SGLang+MoRI it roughly matches B200 running SGLang, and in single-node FP8 serving MI355X often beats B200 on perf/TCO; however, it falls well behind when compared against NVIDIA’s more mature Dynamo+TensorRT-LLM stack.
The biggest AMD weakness is software composability: FP4 + disaggregated prefill + wide expert parallelism performs worse than theory predicts, and in some 1k/1k scenarios MI355X with MTP only barely beats B200 without MTP, while B200 with Dynamo TRT-LLM + MTP remains clearly ahead.

Brief

InferenceX v2 is SemiAnalysis’ expanded open-source benchmark suite for frontier LLM inference, aimed at measuring not just peak throughput but the full tradeoff curve between interactivity, throughput, cost, and energy efficiency across modern GPU systems. The new release covers all six recent NVIDIA western SKUs and all AMD western GPU SKUs from the past three years, using close to 1,000 GPUs per full sweep. Its headline result is that NVIDIA’s rack-scale Blackwell systems—especially GB200 and GB300 NVL72—massively outperform both Hopper and AMD once production-style inference techniques are enabled, particularly for mixture-of-experts models such as DeepSeek R1. SemiAnalysis argues that the real differentiator is not just chip FLOPS, but the ability to combine disaggregated prefill, wide expert parallelism, FP4 quantization, and mature multi-node serving software such as Dynamo and TensorRT-LLM. In that setup, GB200/GB300 NVL72 can deliver up to 98-100x the realized performance of strong H100 baselines and as much as 65x better tokens-per-dollar. The architectural reason is straightforward: NVL72 keeps 72 GPUs inside a single 900 GB/s-per-GPU NVLink domain, allowing expert-parallel all-to-all traffic and weight loading to stay on a far faster fabric than the 400-800 Gbit/s scale-out interconnect used between standard nodes.
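The two fabric figures above use different units (GB/s versus Gbit/s), so a quick conversion makes the gap explicit. This is a unit-conversion sketch only; it assumes both figures are comparable per-GPU rates, which the summary does not state.

```python
def gbytes_to_gbits(gbytes_per_s):
    """Convert GB/s to Gbit/s (8 bits per byte)."""
    return gbytes_per_s * 8

nvlink_gbit = gbytes_to_gbits(900)  # NVLink bandwidth per GPU, in Gbit/s

# Ratio against the 800 and 400 Gbit/s scale-out figures.
advantage = [nvlink_gbit / scale_out for scale_out in (800, 400)]
print(f"NVLink: {nvlink_gbit} Gbit/s, {advantage[0]:.0f}x to {advantage[1]:.0f}x scale-out")
```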

AMD’s position is more nuanced. MI355X appears genuinely competitive in single-node FP8 and in FP8 disaggregated serving when compared specifically against NVIDIA running SGLang, and SemiAnalysis notes rapid improvement in AMD’s software stack over the prior two months, including ~2x gains in some DeepSeek R1 FP4 configurations and >20% throughput-per-GPU gains from MoRI in the 20-45 tok/s/user band. But the report’s core criticism is that AMD still lacks “composability”: isolated optimizations work, yet combining FP4, disaggregated serving, and wideEP causes performance to degrade sharply. That leaves MI355X far behind B200 once NVIDIA’s production stack, especially Dynamo + TRT-LLM, enters the picture. The report also critiques AMD’s upstream support model, noting MI355X still depends on an old forked vLLM 0.10.1 ROCm image while the official 0.15.1 image hard-fails, and citing insufficient CI hardware donations to projects like vLLM and SGLang.

The article is also valuable for its system-level economics. It shows why disaggregated prefill improves utilization by separating compute-heavy prefill from memory-bound decode, and why wideEP reduces redundant model weight loading across MoE deployments. It frames Anthropic’s “fast mode” as an inference scheduling decision rather than a hardware mystery: serving the same model at 2.5x higher tok/s/user naturally drives 6-12x higher cost per token because accelerator hourly cost is fixed while batching efficiency falls. Multi-token prediction is presented as the most powerful software lever in the current stack, often slashing cost per million tokens by multiples while preserving accuracy on benchmark checks like GSM8k. For anyone trying to understand who actually wins the AI infrastructure race, the main lesson is that rack-scale topology, interconnect bandwidth, software maturity, and inference orchestration now matter as much as the raw silicon.

By SemiAnalysis

substack.com 2026-02-18 48 min read

The Secret Rules Behind ERCOT Prices with Andrew Reimers

Why it matters

ERCOT’s real-time co-optimization (RTC), launched on Dec. 5, 2025, shifted operating reserves from primarily day-ahead physical obligations to real-time physical awards, with day-ahead reserve awards becoming largely financial positions that participants arbitrage against real-time outcomes.

Key details

Andrew Reimers of Potomac Economics said ERCOT’s post-Uri “conservative operations” posture—carrying larger operating reserves to avoid outages and even conservation warnings—can distort prices in both directions: it can suppress scarcity prices and weaken generation investment signals, or, as in summer 2023, elevate prices when reserves are held out of the energy market.
Potomac estimated that ERCOT’s pre-RTC handling of ECRS (ERCOT Contingency Reserve Service) in summer 2023 created “billions of dollars of excess costs” because thousands of megawatts that had previously participated in the energy market were sequestered as reserves, making the energy market appear scarce even when physical capacity existed.
Potomac supports NPRR 1309, which defines DRRS (Dispatchable Reliability Reserve Service) as an operating reserve product, but recommends dismissing NPRR 1310 “with prejudice” because it would create an hourly capacity-like product intended to drive future resource adequacy through day-ahead and real-time procurement signals that Reimers argues are structurally misaligned with future investment needs.

Brief

ERCOT’s market design is becoming one of the most important pieces of Texas infrastructure policy because, in an energy-only market, reliability policy doubles as investment policy. In this interview, Potomac Economics deputy director Andrew Reimers explains that scarcity pricing and the operating reserve demand curve are supposed to send the signal for new generation and flexible capacity to get built. The problem is that Texas has spent the post-Uri period leaning toward more conservative operations—keeping more reserves online to reduce outage risk and even avoid politically toxic conservation alerts. That may improve short-run comfort, but it can also undermine the scarcity prices that tell investors the market needs more supply. Reimers argues this tension is especially acute in ERCOT because Texas is electrically isolated, has unusually high penetrations of wind, solar, and storage, and cannot rely on neighbors the way PJM, CAISO, or New York can.

The major recent structural change is real-time co-optimization, which went live on Dec. 5, 2025 after years of advocacy from Potomac and others. Before RTC, operating reserves were procured in the day-ahead market and carried as real-time physical obligations, which could create severe price distortions because reserved capacity was effectively withheld from the real-time energy market. Reimers points to summer 2023, when ERCOT’s addition of ECRS pulled thousands of megawatts out of the energy stack and produced what Potomac estimated were billions of dollars in excess consumer costs. Potomac now supports NPRR 1309, which treats DRRS as a true operating reserve product, but strongly opposes NPRR 1310, which would layer a capacity-like resource adequacy mechanism into hourly reserve procurement. His core objection is methodological: real-time or day-ahead operating conditions are appropriate for pricing present scarcity, but not for setting durable forward signals to induce efficient future investment.

The conversation also highlights how batteries and forecasting are reshaping ERCOT’s operational logic. Solar has lowered prices during high-load daylight hours and shifted reliability stress to the evening net-load ramp, where roughly 15 GW of batteries now help bridge the loss of solar output. But Reimers says forecast error—not peak demand alone—is now the key variable, especially with wind, whose output is harder to predict. That is why Potomac favors a multi-interval real-time market that looks ahead one to two hours, allowing batteries to preserve state of charge when the system expects tighter conditions later. He argues ERCOT’s current four-hour duration requirement for non-spin is a crude substitute that can misprice storage and unintentionally push batteries into energy sales rather than reserve provision. More broadly, Reimers warns that out-of-market programs—from emergency reserves to residential demand response and firm-fuel constructs—can dilute the transparency and allocative efficiency of scarcity pricing by paying resources through side channels instead of through the core market.

By Texas Energy and Power Newsletter

The Texas Energy and Power Newsletter 2026-01-23 2 min read

Another Winter Storm Bears Down on Texas | Reading and Podcast Picks — Jan. 23, 2026

Why it matters

Texas faces a major winter storm the weekend of Jan 23, 2026; ERCOT projected at least 7 GW more generation than peak demand as of Friday, while GridStatus reported >9 GW of gas-and-coal plant outages Friday (down from ~12 GW earlier).

Key details

Battery storage on ERCOT has surged to about 17,000 MW today versus 220 MW five years ago, a change the newsletter highlights as materially improving resilience compared with Winter Storm Uri in 2021.

Brief

Texas' power grid faces a winter storm the weekend of January 23, 2026 that is expected to be milder than 2021's Uri. As of Friday, ERCOT projected ≥7 GW of generation above peak demand while >9 GW of gas and coal capacity was on outage, and battery capacity has grown to ~17,000 MW from 220 MW five years ago.

By Texas Energy & Power Media

divenewsletter.com 2026-02-13 5 min read

AEP’s contracted large-load pipeline doubled to 56 GW overall; its Texas data…

Why it matters

AEP’s contracted large-load pipeline doubled to 56 GW overall; its Texas data center pipeline jumped to 36 GW by the end of December (from 13 GW in Q3).

Key details

Exelon’s capital plan is $41.3 billion driven by transmission, and the company has a ‘line of sight’ on an additional $12–17 billion of transmission buildout over the next 10 years.
Treasury issued interim FEOC guidance clarifying how to calculate a project’s material assistance cost ratio (MACR) and thresholds; separately, the GAO said DOE ‘does not have a plan’ to oversee roughly $21.5 billion in IIJA clean energy demonstration no‑year funds.

Brief

Utility-sector headlines: AEP’s contracted large-load pipeline surged to 56 GW (Texas data-center pipeline 36 GW by December vs. 13 GW in Q3), while Exelon is planning $41.3B in capital spending with $12–17B more in transmission ‘line of sight.’ Treasury issued interim FEOC/MACR guidance and GAO warned DOE lacks an oversight plan for ~$21.5B in IIJA demonstration funds.

By Utility Dive

ponoko.com 2026-02-05 6 min read

SpaceX Formalizes Plan To Build 1 Million Satellite Orbital Data Center System

Why it matters

SpaceX has formalized a plan to launch up to 1,000,000 satellites as orbital data centers (Ponoko report, published 2026-02-05), raising engineering and operational concerns about collision risk, space debris, and the need for new orbital traffic-management systems.

Key details

India announced a policy offering zero taxes through 2047 to attract foreign cloud providers (Amazon, Google, Microsoft) to run AI workloads in-country, a bid that could draw billions in investment but faces constraints from power shortages, water stress, and infrastructure limits.
Former Google engineer Linwei Ding was convicted for stealing GPU and TPU trade secrets—copying technical documents on AI data-center designs, accelerator architectures, and cluster management—highlighting national-security stakes around AI infrastructure IP.

Brief

SpaceX's plan to build a 1,000,000‑satellite constellation as orbital data centers (reported 2026-02-05) promises massive distributed AI compute but intensifies collision and debris risks, requiring new traffic-management and debris‑mitigation engineering. Concurrently, India is offering zero taxes through 2047 to host foreign AI workloads, courting major cloud vendors despite power/water limits, while a conviction of ex‑Google engineer Linwei Ding underscores rising espionage threats to accelerator and data‑center IP.

By Robin

Article 2025-07-21 4 min read

Build AI in America

Why it matters

Anthropic's 'Build AI in America' report projects U.S. frontier AI training demand of 20–25 GW by 2028 and argues the sector needs at least 50 GW of electric capacity (training + inference) by 2028; it estimates single-model training centers will require ~2 GW in 2027 and ~5 GW in 2028.

Key details

The report highlights a U.S. energy buildout gap vs China (China added >400 GW last year vs 'several dozen' GW in the U.S.) and recommends an 'all of the above' energy strategy including next-gen geothermal, advanced nuclear, and natural gas plus steps to keep rates low for consumers.
Policy recommendations are organized into two pillars: (1) build large-scale training infrastructure — use federal lands, accelerate NEPA, expedite transmission and grid interconnections; (2) enable nationwide AI deployment — speed permitting (geothermal/nuclear/gas), create National Interest Transmission Corridors, build strategic reserves of grid components and turbines, and expand workforce training.

Brief

Anthropic's 'Build AI in America' report sets concrete infrastructure targets and policy fixes to keep U.S. AI leadership: it projects 2 GW and 5 GW single-model training sites in 2027–2028, 20–25 GW of frontier training demand by 2028, and a need for ~50 GW total capacity; it recommends accelerated permitting, federal land use, transmission upgrades, and supply-chain and workforce programs.

Epoch AI 2025-09-17 8 min read

Data on Frontier AI Data Centers

Why it matters

Epoch AI's Frontier Data Centers hub (last updated Feb 19, 2026) tracks ~2.5M H100-equivalents (H100e) of operational capacity, which it estimates is ~15–17% of global shipped AI compute as of Nov 2025.

Key details

Tracked gigawatt-scale build times run 1–3.6 years from groundbreaking to 1 GW facility power; xAI projects Colossus 2 can be built in 12 months and Epoch expects the first GW-scale datacenters online in early 2026.
Microsoft's Fairwater campus is projected to consume 3.3 GW by late 2027 (roughly equivalent to 3–4 large nuclear reactors), exceeding Los Angeles' ~2.4 GW average in 2023; OpenAI's Stargate and Meta's Holly Ridge campuses will reach Central Park-scale and ~4× Central Park by 2026–2030 respectively.

Brief

Epoch AI's Frontier Data Centers database (updated Feb 19, 2026) compiles satellite imagery, permits, and public documents to estimate power, compute, timelines and costs for major U.S. AI campuses. It reports ~2.5M H100-equivalents (H100e) tracked (15–17% of shipped compute as of Nov 2025), an 80% confidence range of ±50%, GW-scale builds in 1–3.6 years, and site-level projections such as Fairwater at 3.3 GW.

Epoch AI 2025-11-04 15 min read

What you need to know about AI data centers

Why it matters

US AI data-center buildout is projected to need about 20–30 GW of power by late 2027 (≈5% of current US generation if 30 GW), while 30 GW would be an outsized share of smaller grids (≈25% Japan, 50% France, 90% UK).
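The grid-share comparison can be inverted to see what average generation each percentage implies. This is a back-of-envelope sketch: the implied grid sizes are derived only from the quoted shares, not from any grid statistics, and the variable names are illustrative.

```python
AI_DEMAND_GW = 30  # upper end of the 20-30 GW projection

# Share of each grid that 30 GW would represent, as quoted in the summary.
quoted_share = {"US": 0.05, "Japan": 0.25, "France": 0.50, "UK": 0.90}

# Average generation implied by each share: demand / share.
implied_grid_gw = {c: AI_DEMAND_GW / s for c, s in quoted_share.items()}
for country, gw in implied_grid_gw.items():
    print(f"{country}: ~{gw:.0f} GW average generation implied")
```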

Key details

OpenAI’s Stargate Abilene is emblematic: ~1 GW average power (comparable to Seattle), ~2×10^21 FLOP/s (>250× the computing power of the cluster used to train GPT‑4), on a 3.5 km² site, ~$32 billion in construction/IT cost, ~3,000 on‑site workers and ~2 years to build.
AI racks have very high power density: NVIDIA NVL72 GB200 racks pack 72 GPUs and can exceed ~100 kW per rack versus ~10 kW per rack in non‑AI centers; this drives adoption of liquid cooling and large chiller/cooling‑tower systems and notable water use (US data centers used ~17.4 billion gallons in 2023).
Implications: centralized, gigawatt‑scale training will likely remain feasible for ~2 years (largest projected run ≈2.5M H100‑equivalent GPUs; Microsoft/OpenAI Fairwater capacity may exceed that), but reliance on gas/diesel risks local emissions and makes large campuses visible and hard to fully secure (cooling gear is satellite‑identifiable).

Brief

AI data centers are becoming some of the largest infrastructure projects ever: concentrated campuses like OpenAI’s Stargate Abilene require roughly 1 GW of continuous power, enormous land (3.5 km²), and huge capital (~$32B). Aggregate US demand for frontier training could reach 20–30 GW by late 2027, driving siting toward jurisdictions with abundant, fast‑deployable power (e.g., Texas, China). What distinguishes AI centers is very high power density — racks such as NVIDIA’s NVL72 hold 72 GPUs and can draw >100 kW each — necessitating liquid cooling, chiller systems, and significant water management. Climate and security effects are currently localized (AI uses ~1% of US power now) but could grow to ~5% by 2027 if fossil fuels remain primary sources; satellite‑visible cooling infrastructure also complicates secrecy and physical security.

By Ben Cottier, Yafah Edelman

OpenAI 2026-01-20 5 min read

Stargate Community

Why it matters

OpenAI’s Stargate aims for 10 GW U.S. AI infrastructure by 2029 and, as of Jan 2026 (one year after launch), is already “well beyond halfway” to that goal; the first operational site is in Abilene, Texas, with additional sites planned across Texas, New Mexico, Wisconsin, and Michigan.

Key details

OpenAI commits to paying its own way on energy so Stargate campuses “don’t increase your electricity prices,” with partners funding generation and storage: Oracle/Vantage + WEC (Wisconsin) using solar + batteries and a dedicated rate; Oracle + Related + DTE (Michigan) adding battery storage financed by the project; SB Energy to fund generation/storage in Milam County, Texas.
Community and environmental commitments include low-water/closed-loop cooling (Abilene’s annual water use ≈ half a single day of Abilene’s municipal use), a minimum $175M partner investment in Wisconsin infrastructure/water projects, and workforce development via OpenAI Academies (first Academy launching in Abilene in spring 2026); Microsoft announced related community-first AI commitments on Jan 13, 2026.

Brief

OpenAI’s Stargate program targets 10 GW of U.S. AI capacity by 2029 and reports it is already beyond halfway to that target one year after the Jan 2025 launch; the Abilene, Texas site is already training and serving frontier models. Stargate pledges to fund incremental generation, storage, and grid upgrades, minimize water use with closed-loop cooling, and create workforce pipelines via OpenAI Academies (first in Abilene, spring 2026).

OpenAI 2026-01-09 2 min read

OpenAI and SoftBank Group partner with SB Energy

Why it matters

OpenAI and SoftBank Group each invested $500 million ($1 billion total) into SB Energy, and OpenAI signed a 1.2 GW data center lease for the initial Milam County buildout.

Key details

SB Energy is developing multi-gigawatt data center campuses with initial facilities expected to enter service starting in 2026; the Milam County site is designed to minimize water usage and will create thousands of construction jobs.
SB Energy secured $800 million of Redeemable Preferred Equity from Ares and formed a non-exclusive preferred partnership with OpenAI to combine OpenAI’s first-party data center design with SB Energy’s energy delivery under the broader $500 billion Stargate commitment.

Brief

OpenAI and SoftBank Group each committed $500 million to SB Energy, and OpenAI leased 1.2 GW for a Milam County AI data center, part of SB Energy’s multi-gigawatt campus program. Initial facilities target service entry in 2026 and emphasize reduced water usage and job creation. The partnership pairs OpenAI’s data center designs with SB Energy’s integrated energy delivery, and Ares provided $800 million in preferred equity.

OpenAI 2025-10-13 2 min read

OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators

Why it matters

OpenAI and Broadcom signed a term sheet to co-develop and deploy 10 gigawatts of OpenAI‑designed AI accelerators in racks across OpenAI facilities and partner data centers, with deployment targeted to begin in H2 2026 and complete by end of 2029.

Key details

Broadcom will provide end-to-end networking (Ethernet, PCIe, optical) for Ethernet-based scale-up and scale-out racks; OpenAI says the custom chips will embed learnings from its frontier models, and the company reports ~800 million weekly active users.

Brief

OpenAI and Broadcom announced a multi-year collaboration to co-develop racks featuring 10 GW of OpenAI‑designed AI accelerators paired with Broadcom Ethernet, PCIe and optical networking, aiming to start deployments in the second half of 2026 and finish by end of 2029. OpenAI intends to bake model-driven architecture choices into the custom silicon to improve performance and efficiency for large-scale training and inference.

Reddit 2026-02-27 1 min read

Google just spent $1B on Form Energy's battery that runs on rust. The price? A game-changing $33 per kWh.

Why it matters

Google committed about $1.0 billion to buy a 300 MW, 100-hour iron‑air battery (30,000,000 kWh / 30 GWh) from Form Energy for its Pine Island data center, implying an all-in capital cost of ~$33.33 per kWh.

Key details

The $33/kWh implied price substantially undercuts projected 2026 utility-scale LFP costs ($80–$140/kWh) and current Tesla Megapack pricing ($280–$327/kWh); Form Energy's long-term commercial target is ~$20/kWh. Iron‑air stores energy by reversible rusting (iron + oxygen), using cheap, abundant materials to enable low marginal cost for added hours.

Brief

Google's purchase of a roughly $1 billion, 300 MW / 100‑hour (30 GWh) iron‑air battery from Form Energy for its Pine Island data center implies an all-in capital cost near $33.33/kWh. The iron‑air system uses reversible rusting (iron + oxygen) to deliver long‑duration storage, aiming to beat lithium‑ion on multi‑day firming costs and target ~$20/kWh commercially.
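The implied cost figure follows directly from the quoted price, power, and duration. A minimal check, assuming the roughly $1 billion covers the full 300 MW / 100-hour system (the function name is illustrative):

```python
def implied_cost_per_kwh(total_cost_usd, power_mw, duration_hours):
    """Capital cost per kWh of storage for a battery of given power and duration."""
    energy_kwh = power_mw * 1_000 * duration_hours  # MW -> kW, then kW * h
    return total_cost_usd / energy_kwh

energy_gwh = 300 * 100 / 1_000  # 300 MW * 100 h = 30,000 MWh = 30 GWh
cost = implied_cost_per_kwh(1.0e9, 300, 100)
print(f"{energy_gwh:.0f} GWh at ${cost:.2f}/kWh")
```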

OpenAI 2025-07-22 2 min read

Stargate advances with 4.5 GW partnership with Oracle

Why it matters

OpenAI and Oracle agreed on July 22, 2025 to develop 4.5 GW of additional Stargate data center capacity in the U.S., adding to OpenAI’s Stargate program.

Key details

Combined with Stargate I in Abilene, TX, the partnership brings over 5 GW under development that will run more than 2 million chips; Oracle began delivering Nvidia GB200 racks in June 2025 (the month before the announcement) and early training/inference workloads are already running.
OpenAI estimates the new 4.5 GW buildout will create over 100,000 construction and operations jobs; this advances a White House announcement (January) committing $500 billion to reach 10 GW of U.S. AI infrastructure over four years.

Brief

Stargate is expanding via a new OpenAI–Oracle agreement to add 4.5 GW of U.S. AI data‑center capacity, bringing total Stargate development to over 5 GW and supporting more than 2 million chips. Oracle has started delivering Nvidia GB200 racks and OpenAI is running early training/inference workloads; OpenAI projects 100,000+ jobs and cites a prior White House $500B/10 GW commitment.

Blog 2025-09-30 15 min read

How to Rack 30 Petabytes of Storage

Why it matters

The Standard Intelligence team built a 30 PB storage cluster to hold ~90 million hours of video for model pretraining. One-time capex was ≈ $426,500 (2,400 HDDs, 100 NetApp DS4246 chassis, 10 CPU head nodes) and the monthly in-house cost is $29.5k ($17.5k internet+power plus $12k depreciation), versus an estimated $1.13M/month on AWS and a Cloudflare bulk estimate of $270k/month, making it roughly 40x cheaper than AWS.

Key details

Infrastructure and performance: 100Gbps DIA from Zayo, Mellanox ConnectX-4 100GbE NICs, recommended Broadcom 9305-16E HBAs; team reports near-saturation of 100 Gbps for read/write and ~4 Gbps per chassis in tested HBA configs.
Software and tradeoffs: a minimalist stack (200 lines of Rust writer, nginx reader, SQLite metadata, XFS-formatted disks) intentionally avoids replication/complex systems (Ceph/MinIO) because training data tolerates ~5% loss; proximity to a local SF datacenter and simple design reduced operational friction.

Brief

The Standard Intelligence team deployed a 30 PB storage cluster behind a 100 Gbps link (2,400 HDDs, 100 DS4246 chassis, 10 head nodes) to store ~90M hours of video for pretraining, costing ~$426.5k in capex and $29.5k/month ($17.5k internet+power plus $12k depreciation). They used simple software (200-line Rust writer, nginx, SQLite, XFS), prioritized simplicity over redundancy, and report saturating their 100G link while avoiding complex systems like Ceph.
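The cost comparison can be reproduced from the quoted monthly figures. A minimal sketch; the implied depreciation period at the end is an inference from capex and monthly depreciation, not something the summary states.

```python
# Monthly figures quoted in the summary, in USD.
internet_and_power = 17_500
depreciation = 12_000
in_house = internet_and_power + depreciation  # 29,500

alternatives = {"AWS": 1_130_000, "Cloudflare (bulk)": 270_000}
ratios = {name: cost / in_house for name, cost in alternatives.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the in-house monthly cost")

# Depreciation period implied by the quoted capex (an inference, not from the source).
implied_months = 426_500 / depreciation
print(f"implied depreciation period: {implied_months:.1f} months")
```

The AWS ratio comes out slightly under the rounded "~40x" headline, and the implied depreciation period is close to three years.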

By Standard Intelligence Team

substack.com 2026-02-06 38 min read

Building a Datacenter Part II

Why it matters

Crucible Capital argues Nvidia’s 2025 OCP announcement of 800V DC distribution for Blackwell, Rubin, and Rubin Ultra is a structural inflection for AI datacenters, because traditional 415V/480V AC designs suffer repeated AC↔DC conversions that leak roughly 10-20% of power and become increasingly impractical as racks move from 10-20 kW historically toward 100 kW to 1 MW+ densities.

Key details

The article ties the power transition directly to Nvidia’s rack roadmap: Blackwell GPUs are cited at about 1.35 kW per GPU, Rubin at up to 3.6 kW per GPU, a single Rubin rack at roughly 900 kW, and SuperMicro’s planned Kyber rack at 1.1 MW for Rubin Ultra shipments expected in 2027.
Solid-state transformers using SiC or GaN switches are presented as a key enabling technology for 800V DC, with claimed benefits including 30-50% smaller footprint than conventional transformers, 150% more power flow through existing conductors, and up to 200 kg of copper savings per rack; the article cites a $115 million SST market in 2025 projected to reach $375 million by 2033 at a 16% CAGR.
For short-duration power quality and transient smoothing, the piece says traditional VRLA UPS systems are too slow for AI loads that can spike to 3x nominal draw within a millisecond; it recommends supercapacitors with microsecond-to-millisecond response, up to 10 kW/kg power density, and more than 1 million cycles, despite costs of $2,500-$10,000/kWh versus $271-$500/kWh for traditional UPS storage.
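The solid-state transformer market figures above can be checked for internal consistency with the standard compound-growth formula. A minimal sketch, assuming an eight-year span from 2025 to 2033:

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

years = 2033 - 2025
implied = cagr(115, 375, years)      # growth rate implied by the endpoints
projected = 115 * 1.16 ** years      # 2033 value implied by a 16% CAGR
print(f"implied CAGR: {implied:.1%}, 2033 value at 16%: ${projected:.0f}M")
```

Both directions agree with the quoted numbers to within rounding.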

Brief

Nvidia’s shift to 800V DC power architecture is the centerpiece of Crucible Capital’s thesis about the next phase of AI datacenter design. The argument is that AI workloads have pushed facilities beyond the comfortable operating envelope of legacy AC distribution. Traditional datacenters were built around 415V/480V three-phase AC and modest rack densities of roughly 5-20 kW, but modern AI clusters are now driving rack power into the hundreds of kilowatts and, on the authors’ timeline, toward 1 MW-class systems with Rubin Ultra in 2027. In that regime, repeated AC-to-DC and DC-to-AC conversions become a serious efficiency penalty, with the article citing 10-20% aggregate losses. Nvidia’s proposed architecture converts grid AC to 800V DC once at the site perimeter, distributes DC through busways, and then steps down locally to 54V/12V at the rack. That reduces conversion stages, current, copper requirements, and thermal waste. The piece also emphasizes the strategic role of OCP and Nvidia reference architectures in standardizing not just GPUs, but the entire rack-power-cooling stack, effectively forcing suppliers such as Schneider Electric, SuperMicro, Vertiv, and others to align on Nvidia-led design choices.

The second major theme is that electrical redesign and cooling redesign are inseparable. As rack densities move from traditional 10-50 kW toward 100 kW-1 MW+, the rack itself becomes the constraint rather than the silicon. The article highlights several enabling technologies: solid-state transformers based on silicon carbide or gallium nitride, supercapacitors for sub-second transient smoothing, LFP battery systems for longer backup and load shifting, and 800V-compatible rack sidecars and power shelves. Claimed benefits include 30-50% smaller SST footprint, 150% more power flow through existing conductors, over 98% DC-DC conversion efficiency, up to 60% more compute space from removing legacy PSUs, and 45% less copper in redesigned racks. The article also surfaces a supply-chain angle, especially around gallium, noting China’s dominance in gallium separation as of 2024 and framing advanced power electronics as a critical-minerals problem as much as an electrical-engineering one.

Cooling is treated as the practical bottleneck. Air cooling is portrayed as maxing out around 15-20 kW per rack, making liquid systems mandatory for current and future AI clusters. Direct-to-chip liquid cooling is already standard for Nvidia H200/B200/B300 systems, and Nvidia’s warm-water approach at 45°C is positioned as especially important because it can reduce or eliminate chillers. The authors cite a specific startup example, Elkhorn, whose water-based system reportedly achieved a COP of about 11.8 at an AI datacenter in Newport, Washington, halving cooling power from roughly 170 kW/MW of IT load to 85 kW/MW and improving PUE from 1.18 to 1.09. Beyond facility mechanics, the article stresses software and maintenance: as rack density grows by a claimed 90x over a datacenter’s life, operators need sub-second telemetry, asset-level monitoring, and maintenance orchestration to protect hardware that can cost tens of millions of dollars per MW-scale cluster. Overall, the report is strongest as a synthesis of how AI demand is pushing simultaneous changes in power electronics, storage, cooling, supply chains, and operational software.

By Crucible Capital

substack.com 2026-02-09 49 min read

CPUs are Back: The Datacenter CPU Landscape in 2026

Why it matters

SemiAnalysis argues datacenter CPUs re-emerged as a bottleneck in late 2025 because AI workloads now require large CPU fleets for reinforcement learning environments, RAG/agent tool use, data preprocessing, and storage orchestration; Microsoft’s “Fairwater” complex reportedly pairs a 48 MW CPU-and-storage building with a 295 MW GPU cluster, roughly a 1:6 CPU-to-GPU power ratio.

Key details

Intel said on its Q4 2025 earnings call that datacenter CPU demand unexpectedly accelerated, prompting higher 2026 foundry tool capex and wafer reallocation from PC to server; SemiAnalysis says frontier AI labs are competing with cloud providers for commodity x86 servers, Intel is considering Xeon price hikes, and AMD expects server CPU TAM growth in the “strong double digits” in 2026.
AMD’s 2026 EPYC Venice moves to 16 DDR5 memory channels and supports MRDIMM-12800 for 1.64 TB/s bandwidth, which the article says is 2.67x Turin; the top Venice part uses eight TSMC N2 Zen 6c CCDs for 256 cores, and AMD claims over 1.7x better performance per watt versus the top 192-core Turin in SPECrate2017_int_base.
Intel’s Diamond Rapids shifts from a giant logically monolithic mesh to four compute dies plus two I/O and memory hub dies, for 16 DDR5 channels, PCIe 6/CXL 3, and up to 256 printed cores with about 192 expected in mainstream SKUs; SemiAnalysis argues removal of SMT on Intel P-cores will materially hurt throughput, estimating a 192-core/192-thread Diamond Rapids may be only about 40% faster than a 128-core/256-thread Granite Rapids.
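The Venice bandwidth figure above can be reproduced from channel count and transfer rate. This is a rough sketch, not AMD's own calculation: it assumes 8 bytes (64 data bits) per transfer per channel and takes Turin as 12 channels of DDR5-6400, which is not stated in the summary.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64 data bits per memory channel


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak memory bandwidth in decimal GB/s."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


venice = peak_bandwidth_gbs(16, 12800)  # 16 channels, MRDIMM-12800 (from the text)
turin = peak_bandwidth_gbs(12, 6400)    # assumption: 12 channels, DDR5-6400

print(venice)                    # 1638.4
print(round(venice / turin, 2))  # 2.67
```

The result, about 1.64 TB/s and a 2.67x ratio, agrees with the figures quoted from the article.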

Brief

Datacenter CPUs are becoming strategically important again, not because GPUs stopped mattering, but because AI systems have become more CPU-hungry around the GPU cluster. SemiAnalysis argues that since late 2025, reinforcement learning, agentic inference, retrieval-heavy workloads, multimodal preprocessing, and the management of petabyte-scale data pipelines have all increased CPU intensity. In training, CPUs execute RL environments, compile and verify code, run math and physics simulations, and keep expensive accelerator clusters fed. In inference, agents generate far more external API calls, database lookups, and internet traffic than classic chatbot serving. The article’s most concrete infrastructure example is Microsoft’s Fairwater site for OpenAI, where a 48 MW CPU/storage building supports a 295 MW GPU cluster, implying CPU demand is no longer incidental to AI buildouts. That demand is also colliding with mainstream cloud consolidation: newer cloud-native CPUs can replace old Intel Cascade Lake fleets at 10:1 socket consolidation ratios or better while using less than one-fifth the power, freeing capacity for GPUs.

On product competitiveness, the piece is notably bullish on AMD and skeptical of Intel. It presents Venice as AMD’s strongest step yet: eight TSMC N2 Zen 6c CCDs, 256 cores, 16 memory channels, 1.64 TB/s with MRDIMM-12800, a restored 4 MB of L3 per Zen 6c core, and a claimed perf/W gain of over 1.7x versus the top 192-core Turin. SemiAnalysis also highlights AMD’s willingness to keep serving the mainstream 8-channel enterprise segment with a Venice SP8 platform, just as Intel has reportedly cancelled Diamond Rapids-SP. Intel’s roadmap is framed as architecturally ambitious but commercially compromised: Diamond Rapids adopts a more AMD-like multi-die topology with four compute building blocks and two memory/I/O hub dies, but loses SMT on P-cores, which the authors think badly undermines datacenter throughput. Clearwater Forest, meanwhile, is treated as a costly learning vehicle for 18A and Foveros Direct rather than a volume winner, with only 17% better performance than Sierra Forest after a delay into H1 2026.

The broader market structure is also shifting. Hyperscalers are no longer just Arm licensees; they are some of the most capable datacenter CPU vendors. AWS Graviton5 doubles to 192 Neoverse V3 cores and feeds Trainium3 head nodes, Microsoft’s Cobalt 200 jumps to 132 Neoverse V3 cores and 50% better performance, and Google is splitting Axion between higher-performance C4A and scale-out N4A variants. NVIDIA is evolving from Grace to Vera with far higher coherent bandwidth and memory capacity to support GPU-centric systems, while Arm itself is crossing a line from IP licensor to CPU supplier with Phoenix for Meta and possibly OpenAI-linked systems. For anyone tracking AI infrastructure, the important takeaway is that the next datacenter bottleneck may not be just accelerators or power delivery, but the surrounding layers of CPUs, DRAM, packaging, and interconnect needed to make GPU clusters useful.

By SemiAnalysis

substack.com 2026-02-13 100 min read

Dario Amodei — "We are near the end of the exponential"

Why it matters

In a 142-minute interview published on 2026-02-13, Anthropic CEO Dario Amodei said he is roughly 90% confident that within 10 years there will be “a country of geniuses in a data center,” and said his weaker near-term hunch is 1-3 years, with coding likely reaching end-to-end capability in 1-2 years for verifiable tasks.

Key details

Amodei argued that the same “Big Blob of Compute” thesis he held in 2017 still explains progress: the main drivers are raw compute, quantity of data, data quality and breadth, training duration, scalable objective functions, and training-stability techniques such as normalization and conditioning; he said RL scaling is now showing the same log-linear behavior previously seen in pretraining.
He said Anthropic has seen revenue compound at an extraordinary pace—roughly $0 to $100 million in 2023, $100 million to $1 billion in 2024, and $1 billion to $9-10 billion in 2025—with “another few billion” added in January 2026 alone, which he presented as evidence that diffusion is fast even if not instantaneous.
On coding productivity, Amodei distinguished between weak and strong milestones: he said his earlier prediction that AI would write 90% of code within 3-6 months has already happened in some places, including Anthropic, but emphasized that this is far short of 90-100% of end-to-end software engineering tasks such as compiling, testing, environment setup, writing design docs, and setting technical direction.

Brief

Dario Amodei’s core claim is that AI progress is still following a fairly simple scaling story, but that the world has not internalized how close that trajectory may be to transformative outcomes. He frames today’s systems as the continuation of a thesis he has held since 2017: intelligence emerges less from bespoke clever tricks than from pouring compute through sufficiently broad data distributions under objectives that scale cleanly. In his telling, the old pretraining scaling laws have not broken; instead, reinforcement learning now appears to be extending the same pattern. He points to public reports of log-linear performance improvements with more RL training on verifiable tasks such as AIME-style math and says Anthropic is seeing similar behavior across a wider range of domains. That leads him to treat the current pretraining-plus-RL stack not as a dead end but as evidence that the main recipe is still working.

Where Amodei departs from many skeptics is in his willingness to map those scaling curves directly onto very aggressive capability forecasts. He says he is around 90% confident that within 10 years we will have what he calls “a country of geniuses in a data center,” and his personal hunch is much shorter—roughly 1-3 years. He is most confident on tasks with verifiable feedback, especially software engineering, where he says end-to-end coding is effectively guaranteed inside a decade and likely 1-2 years away. He is somewhat less certain on domains where verification is weaker—scientific discovery, Mars mission planning, novel writing—but he argues that generalization from verifiable to less verifiable domains is already visible. On the contentious issue of continual learning, he takes a surprisingly minimalist stance: these systems may not need human-like lifetime learning to become economically dominant. Broad pretraining, RL generalization, and long-context in-context learning may already cover most of the gap, with continual learning potentially arriving as an extra capability rather than a prerequisite.

A major theme of the interview is the distinction between capability growth and economic diffusion. Amodei repeatedly argues that the technology curve and the adoption curve are both exponential, but not identical. He dismisses the idea that diffusion nullifies AI progress, yet insists deployment will still be bottlenecked by enterprise procurement, compliance, security review, organizational change management, and the physical pace of closing loops in real workflows. As evidence that adoption is already very fast, he offers Anthropic’s revenue numbers: roughly $100 million in 2023, $1 billion in 2024, and $9-10 billion in 2025, with another few billion allegedly added in January 2026 alone. Even so, he says that is not the same as instant absorption of AGI-scale capability into GDP. The same logic underlies his defense of Anthropic’s compute posture. If model capability reaches “country of geniuses” status in 2026 or 2027 but monetization lags by 1-5 years, overcommitting to trillion-dollar annual compute purchases could bankrupt a lab that is directionally right but off by a single year.

That leads into his economic model of frontier labs, which he portrays less as speculative money furnaces than as businesses whose losses are largely a timing artifact of compute scale-up. His stylized picture is that individual model generations can already have strong positive gross margins on inference, but firms remain unprofitable because every profitable model finances a much larger next model. In a more mature equilibrium, he expects a small-number-of-firms market analogous to cloud infrastructure: very high barriers to entry, differentiated products, positive but not monopolistic margins, and a meaningful though not dominant share of compute continuously devoted to R&D. He pushes back on the view that profitable AI labs must be underinvesting, arguing that scaling returns are approximately log-linear, so there is a rational interior optimum rather than a need to push 95-100% of resources into training. Notably, he also quantifies the physical side of the buildout: industry AI power demand at 10-15 GW in 2026, rising 3x yearly toward ~300 GW by 2029, implying multi-trillion-dollar annual compute expenditures if that trajectory persists.
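The power trajectory in the last sentence is a plain compounding calculation. The sketch below assumes growth compounds once per year from a 2026 base; the function name is illustrative.

```python
def project_power_gw(start_gw: float, yearly_multiple: float, years: int) -> float:
    """Compound a starting power level by a fixed yearly multiple."""
    return start_gw * yearly_multiple ** years


# 10-15 GW in 2026, tripling each year through 2029 (three steps).
for start in (10, 15):
    print(start, project_power_gw(start, 3, 2029 - 2026))
# 10 270
# 15 405
```

Three years of 3x growth turns 10-15 GW into 270-405 GW, so the quoted ~300 GW sits near the low end of that range.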

On policy and geopolitics, Amodei is simultaneously pro-build and hawkish. He rejects a blanket 10-year moratorium on state AI laws absent federal replacement, arguing that the world may face meaningful AI-enabled bioterrorism and autonomy risks well before then. His preferred sequence is transparency first, then targeted regulation such as mandatory biological-risk classifiers if threat evidence hardens. He also worries that AI may create offense-dominant equilibria in which one actor can do catastrophic harm, making traditional balance-of-power assumptions unreliable. That concern carries into his China stance: he does not think both the US and China should simply race to build symmetric “countries of geniuses,” because advanced AI could stabilize authoritarian control internally and destabilize deterrence externally. At the same time, he distinguishes restricting frontier compute and chips from restricting downstream benefits, suggesting AI-enabled drugs and development should spread widely, especially to the developing world. The broader implication is that Amodei sees the bottleneck after AGI less in invention than in governance, distribution, and institutional adaptation—and believes those questions are arriving on a timeline measured in years, not decades.

By Dwarkesh Patel

TechCrunch 2026-03-24 1 min read

Crusoe makes big battery buys for its data centers | TechCrunch

Why it matters

Crusoe said it will buy 12 GWh of Form Energy’s 100-hour iron-air batteries, with deliveries starting in 2027; this follows Form’s reported 30 GWh Google deal in Minnesota that The Information said was worth about $1 billion.

Key details

Form Energy’s battery chemistry uses iron pebbles and atmospheric oxygen: discharge oxidizes the iron into rust to generate electricity, and charging reverses the process by electrically reducing the rust and releasing oxygen.
Crusoe is also expanding its Redwood Materials partnership after operating a 12 MW / 63 MWh second-life battery microgrid installation since June; Redwood will add another 8 MW using repurposed EV batteries.

Brief

Crusoe is scaling energy storage for data center infrastructure with two distinct battery strategies: long-duration storage from Form Energy and second-life EV battery systems from Redwood Materials. The 12 GWh Form purchase is a notable commercial validation of 100-hour iron-air storage, while Crusoe’s existing 12 MW / 63 MWh microgrid and planned 8 MW expansion show how reused EV packs may complement large-scale backup and load-shifting needs for AI-oriented power demand.
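The two battery systems differ mainly in duration, which follows from energy divided by power. The sketch below assumes the quoted energy and power are nameplate values; the helper names are illustrative.

```python
def implied_power_mw(energy_mwh: float, duration_h: float) -> float:
    """Power a battery can sustain for its full rated duration."""
    return energy_mwh / duration_h


def implied_duration_h(energy_mwh: float, power_mw: float) -> float:
    """Hours a battery can discharge at rated power."""
    return energy_mwh / power_mw


print(implied_power_mw(12_000, 100))  # Form: 12 GWh over 100 h -> 120.0 MW
print(implied_duration_h(63, 12))     # Redwood: 63 MWh at 12 MW -> 5.25 h
```

On these numbers, the Form purchase corresponds to roughly 120 MW of 100-hour storage, while the existing Redwood microgrid is closer to a five-hour system.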

By Tim De Chant

Carbon Brief 2026-03-19 9 min read

China Briefing 19 March 2026: China joins nuclear pledge | Energy approach ‘vindicated’ | New ecological code

Why it matters

China’s final 15th five-year plan, published 13 March 2026, kept renewables at the center of the energy system, added an explicit reference to the new ecological and environmental code, called for actively promoting geothermal energy, and was paired with a new law requiring future long-term plans to account for “environmental constraints”.

Key details

China joined the international pledge to triple global nuclear capacity from 2020 to 2050, but its domestic buildout is lagging: it missed targets of 58GW by 2020 and 70GW by 2025, had only 62GW of nuclear capacity at end-2025, and may also miss its 110GW by 2030 goal, according to Bloomberg citing World Nuclear Association China director Francois Morin.
China’s response to the US-Israel war on Iran included an immediate March ban on exports of refined fuels such as petrol, diesel, and aviation fuel; the country had built a crude surplus of 1.2m barrels per day in January-February 2026 and holds an estimated 1.4bn-barrel stockpile, improving resilience to disruption around the Strait of Hormuz.
The geopolitical shock has strengthened the case for China’s clean-energy strategy: analysts cited by Politico, CNBC, and Inside Climate News argued that large renewable additions and crude stockpiles reduce China’s exposure to gas and oil volatility, though coal is still expected to provide system flexibility and oil remains critical as a petrochemical feedstock.

Brief

China’s March 2026 climate-and-energy policy package reinforces a familiar but increasingly formalized strategy: rapid clean-energy expansion, continued attention to energy security, and gradual legal codification of long-term decarbonization. The final 15th five-year plan changed little from the earlier draft, but it sharpened the institutional backdrop by referencing the newly passed ecological and environmental code and by adding geothermal energy to the list of promoted technologies. National Energy Administration head Wang Hongzhi framed 2026-2030 as both the decisive period for peaking carbon emissions and a critical phase for building a “new energy system,” emphasizing market-based pricing reform to enable fossil-fuel replacement without compromising reliability.

The briefing’s most important new development is legal, not just technological. China’s ecological and environmental code gives statutory footing to the “dual-carbon” goals of peaking by 2030 and neutrality by 2060, while embedding total-emissions caps, carbon-intensity controls, carbon trading, footprint management, and climate-related enforcement into a more rules-based governance structure. At the same time, China signed the global pledge to triple nuclear capacity by 2050, even though its own nuclear rollout has undershot recent targets, reaching only 62GW by end-2025 versus a 70GW target. The Middle East oil shock underscores why Beijing continues to pursue a diversified system: crude stockpiles and renewables cushion import risk, but oil and coal still retain strategic roles in petrochemicals, industrial feedstocks, and power-system flexibility.

By Anika Patel

Stratechery by Ben Thompson 2026-03-17 32 min read

An Interview with Nvidia CEO Jensen Huang About Accelerated Computing

Why it matters

Jensen Huang framed Nvidia as an "accelerated computing" company rather than a GPU vendor, arguing that AI agents will need existing human software such as SQL databases, Excel, Photoshop, Synopsys, and Cadence to be "super-accelerated"; he said Nvidia’s long-standing model is to rewrite CPU-era algorithms for GPUs to deliver 10x, 50x, or 100x speedups across more industries.

Key details

Huang said building a 1-gigawatt AI factory now costs roughly $50-60 billion, with about $15-17 billion going to land, power, and shell infrastructure and the rest to compute, networking, and storage; he argued Nvidia must help customers design the entire stack because otherwise projects are overdesigned and capital risk is too high.
On model architecture, Huang said transformers are insufficient for all domains because attention scales quadratically and long-context KV cache becomes unwieldy; he cited Nvidia’s Nemotron 3 hybrid transformer-plus-SSM architecture for better intelligence and efficiency, and pointed to geometry-aware approaches like cuEquivariance for physically constrained domains such as protein, chemical, and simulation workloads.
Huang argued the key AI shift over the last year was better reasoning, reflection, retrieval, and search, which reduced hallucinations and enabled grounded tool use; he said Nvidia engineers now use coding agents universally, with many "not hav[ing] generated a line of code in a while," and described coding as a distinct modality that must be grounded in successful execution rather than token-by-token plausibility.

Brief

Nvidia’s 2026 strategy, as described by Jensen Huang in this Stratechery interview after GTC, is to expand the meaning of accelerated computing from chips and training clusters into full-stack AI factories. Huang’s core thesis is that agents will not live solely inside chat interfaces or model APIs; they will increasingly operate the software humans already use, from SQL databases to desktop applications and EDA tools. That pushes Nvidia beyond selling GPUs into accelerating legacy software, designing CPUs, integrating networking and storage, and helping customers construct entire data-center-scale systems. He quantified the scale of the new buildout in unusually concrete terms: a 1 GW AI factory could cost $50-60 billion, of which roughly $15-17 billion is just land, power, and shell. His point is that customers will not risk tens of billions of dollars unless Nvidia can de-risk throughput, utilization, and system integration across cooling, power, networking, and compute. That framing is particularly relevant to the data-center and infrastructure buildout around AI, because Huang treats power as the fundamental economic constraint and argues system-level co-design is the only way to maximize intelligence produced per watt and per dollar.
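Huang's cost figures translate into per-watt and share terms as follows. This is a back-of-the-envelope sketch that treats the quoted ranges as independent bounds, which is an assumption; the names are illustrative.

```python
TOTAL_BUSD = (50, 60)  # total cost of a 1 GW AI factory, in $ billions
INFRA_BUSD = (15, 17)  # land, power, and shell, in $ billions


def usd_per_watt(total_busd: float, gigawatts: float = 1.0) -> float:
    """Capital cost per watt of facility capacity."""
    return (total_busd * 1e9) / (gigawatts * 1e9)


def share_bounds(part, whole):
    """Lowest and highest possible share of `part` in `whole`."""
    return (part[0] / whole[1], part[1] / whole[0])


def remainder_bounds(part, whole):
    """Lowest and highest amount left for compute, networking, and storage."""
    return (whole[0] - part[1], whole[1] - part[0])


print([usd_per_watt(t) for t in TOTAL_BUSD])     # [50.0, 60.0]
print(share_bounds(INFRA_BUSD, TOTAL_BUSD))      # (0.25, 0.34)
print(remainder_bounds(INFRA_BUSD, TOTAL_BUSD))  # (33, 45)
```

Read this way, land, power, and shell account for roughly a quarter to a third of the total, leaving about $33-45 billion per gigawatt for compute, networking, and storage.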

On the technical side, Huang drew a line between the first wave of generative AI and the current wave of useful AI. He said reasoning, reflection, retrieval, and search improved enough over the last year to ground models, reduce hallucinations, and enable paid applications—especially coding agents. His description of coding as a separate modality was notable: code must be generated in coherent chunks and validated by execution, not merely by token probability. He also argued that transformers alone are not enough for future workloads, citing quadratic attention costs, KV-cache bloat, and the need for new architectures for continuous motion, geometric symmetry, and physical constraints. He named Nemotron 3’s transformer-plus-SSM hybrid and cuEquivariance as examples. At the infrastructure layer, he defended Nvidia’s CPU effort as complementary to GPUs rather than a reversal: the goal is to prevent expensive accelerators from sitting idle. Vera, he said, emphasizes very high single-thread performance and 3x higher bandwidth-per-CPU than prior designs to support agent tool use over NVLink. The Groq acquisition fits the same logic: Nvidia wants finer-grained heterogeneous inference, including disaggregating pieces of decode attention, to push latency-sensitive coding and enterprise agent workloads beyond what a general GPU-only setup can economically deliver.

The interview also highlighted constraints and geopolitics. Huang said the bottleneck in 2026 is not one thing but everything at once—power, fabs, supply chain, and site readiness—though he sounded confident in Nvidia’s planning across hundreds of suppliers. More striking was his China argument: keeping an American AI stack present in China is, in his view, strategically essential because Chinese labs and open-source communities are too important to ignore. He explicitly praised DeepSeek, Kimi, and Qwen as technically meaningful contributors and warned that exclusion could let rival ecosystems harden across chips, platforms, models, and applications. That makes the conversation highly relevant not just as an Nvidia profile, but as a window into how the dominant AI infrastructure vendor thinks about power markets, data-center economics, hardware/software co-design, and U.S.-China competition.

By Ben Thompson

Epoch AI 2025-11-04 4 min read

Introducing the Frontier Data Centers Hub

Why it matters

Epoch AI launched the Frontier Data Centers Hub, an open-source database tracking 13 major U.S. AI data centers via satellite imagery, public permits, and other open sources; together these sites account for about 2.5 million H100-equivalents, roughly 15% of the approximately 15 million H100e delivered globally by late 2025.

Key details

The hub estimates total facility power rather than just IT load, noting facility power is typically about 1.3x server power; it projects five sites to cross 1 GW in 2026: Anthropic–Amazon New Carlisle (January), xAI Colossus 2 (February), Microsoft Fayetteville (March, borderline), Meta Prometheus (May), and OpenAI Stargate Abilene (July).
Epoch AI infers power capacity from visible cooling infrastructure such as chillers and cooling towers, cross-checking satellite and aerial imagery against permits and public disclosures; across tracked facilities, time from construction start to 1 GW ranges from 1 to 3.6 years, with xAI targeting 12 months for Colossus 2.
On compute, Epoch AI estimates xAI’s Colossus 2 will reach 1.4 million H100e in 2026 versus about 100,000 H100e for the top facilities in mid-2024, while Meta Hyperion and Microsoft Fairwater are projected to reach roughly 5 million H100e each when fully built.
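A few of the ratios above can be checked directly. The sketch below uses only the rounded figures quoted in this summary, so small differences from Epoch AI's own numbers are expected; the constant and function names are illustrative.

```python
FACILITY_TO_SERVER_RATIO = 1.3  # facility power is ~1.3x server power (from the text)


def server_power_mw(facility_mw: float) -> float:
    """Approximate server (IT) power behind a given total facility power."""
    return facility_mw / FACILITY_TO_SERVER_RATIO


print(round(server_power_mw(1000)))  # 769 MW of servers in a 1 GW facility
print(round(2.5 / 15 * 100, 1))      # 16.7 (% of global H100e, from rounded inputs)
print(1_400_000 / 100_000)           # 14.0x growth over top mid-2024 sites
```

The rounded inputs give a share closer to 17% than the quoted 15%, which suggests the underlying totals are somewhat different from the rounded ones shown here.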

Brief

Epoch AI’s Frontier Data Centers Hub is a useful new primary-source-style dataset on the physical buildout of frontier AI infrastructure, focusing on power, compute, construction timing, and capital intensity. The most interesting contribution is methodological: rather than relying mainly on company announcements, the project estimates capacity from high-resolution satellite imagery of cooling systems and corroborates those estimates with permits and hardware assumptions, giving outside observers a way to independently track opaque hyperscale projects. The numbers underscore how quickly the industry is moving from large campuses to nation-scale power loads: multiple facilities are expected to exceed 1 GW in 2026, and the leading site, xAI’s Colossus 2, is projected at 1.4 million H100-equivalents. Epoch also highlights phased commissioning, projecting Microsoft Fairwater’s first phase in March 2026 and full operation by September 2027, while estimating top-end capital costs above $100 billion for the largest campuses.

By The Epoch AI Team

substack.com 2026-02-04 4 min read

Where the Grid Goes from Here | Reading and Podcast Picks - Feb. 4, 2026

Why it matters

During Texas’s early-2026 winter storm, battery storage contributed a morning peak injection that helped lower wholesale prices; the grid’s increased solar and battery capacity since 2021 and winterization of plants reduced stress compared with Winter Storm Uri.

Key details

ERCOT projects peak demand rising from about 87 GW in 2025 to roughly 145 GW by 2031, driven mainly by large new users; 5,302 MW of data-center demand was added since 2022 and forecasts show data centers could exceed 24,000 MW by decade’s end.
Experts warn that while resource diversity (renewables + storage) improved resilience, rapidly growing large loads pose a ‘stress test’ for infrastructure — Matthew Boms (Texas Advanced Energy Business Alliance) says system capacity must keep pace with demand growth.

Brief

Texas power grid performance during the early-2026 winter storm benefited from rapid additions of solar and battery storage and winterization measures, with batteries injecting power at the morning peak and reducing prices. ERCOT now forecasts peak demand climbing from ~87 GW (2025) to ~145 GW (2031), driven largely by data centers (5,302 MW added since 2022; >24,000 MW projected).
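The ERCOT forecast implies a steady annual growth rate, computed here as a compound annual growth rate. The sketch assumes six growth years between 2025 and 2031; the function name is illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


rate = cagr(87, 145, 2031 - 2025)
print(f"{rate * 100:.1f}% per year")  # 8.9% per year
```

Peak demand growing close to 9% a year is far above what most U.S. grids have historically planned for, which is why the summary treats large loads as a stress test.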

By Texas Energy and Power Newsletter

substack.com 2026-02-03 5 min read

The Private Company Bringing Nuclear Enrichment Back to America (Scott Nolan, CEO of General Matter)

Why it matters

General Matter, founded by ex-SpaceX engineer and Founders Fund partner Scott Nolan, secured the historic Paducah, Kentucky enrichment site and a $900 million DOE award less than a year after emerging from stealth (article pub. 2026-02-03).

Key details

The U.S. now relies on imports for most of its commercial enrichment after once controlling ~86% of global capacity; about 20% of U.S. electricity comes from nuclear, and roughly a quarter of reactor fuel faces a hard stop when Russian HEU/LEU supplies are banned in 2028.
General Matter is targeting production of LEU and HALEU (noting 20% enrichment as a critical threshold for some next-gen reactors), applying a SpaceX-style rapid-build playbook and projecting that nuclear could grow 3–4x by 2050.

Brief

General Matter, led by Scott Nolan, is rebuilding U.S. uranium enrichment capacity by reviving the Paducah site and winning a $900M DOE award, aiming to head off a 2028 supply cliff when Russian supplies are banned. The U.S. has fallen from ~86% of global enrichment to near zero commercially, a quarter of reactor fuel is at risk, and the company is focused on LEU and HALEU (with 20% enrichment as a key threshold).

By The Generalist

The Texas Energy and Power Newsletter 2026-01-21 33 min read

Is Texas Ready for Winter Now? (with Will McAdams)

Why it matters

With the five-year anniversary of Winter Storm Uri approaching in early February 2026, Will McAdams says the Texas grid is better prepared for the incoming arctic blast but faces a bigger challenge: absorbing rapid load growth (ERCOT’s large-load interconnection queue grew nearly 300% last year).

Key details

Weatherization and enforcement strengthened: ERCOT runs seasonal inspections with “hundreds” of inspectors, the PUC imposed conservative ambient-weather standards (e.g., −17°C design standard for Panhandle plants) and can fine generators up to $1,000,000 per day per incident for noncompliance.
Battery capacity has jumped from about 1–1.5 GW in 2021 to roughly 11–15 GW today (many projects due online before summer); McAdams argues batteries would have arrested the frequency freefall in Uri and now act as system and price 'shock absorbers.'
Distributed resources and ADER (Aggregate Distributed Energy Resource) pilots are advancing (seven ADER pilots running); the major remaining hurdles are telemetry/interoperability and settling at locational (nodal) prices so DERs can capture their true value. Unlocking DERs could save roughly $1,850 per customer over 10 years, according to a recent study.

Brief

Five years after Winter Storm Uri, Texas’s grid is materially stronger: regulators introduced stringent winterization standards (ambient-temperature design specs down to −17°C in the Panhandle), ERCOT runs seasonal inspections with hundreds of inspectors, and the PUC gained enforcement power, including fines of up to $1M per day per incident. Uri’s cascading failure (a rapid loss of ~4 GW in under 30 minutes that triggered frequency collapse and four days of outages) motivated these changes.

The system has also added dispatchable battery capacity (from ~1–1.5 GW in 2021 to ~11–15 GW now), which McAdams says would have arrested the frequency freefall during Uri and stabilized the system. The defining issue ahead is unprecedented load growth: ERCOT’s interconnection queue includes roughly 2,000 projects (~435 GW) and surged ~300% last year. Policy and market innovations (ADER virtual power plants, nodal settlement for DERs, residential demand-response telemetry) are positioned to mobilize distributed resources, reduce peak stress, defer wires investment, and lower customer bills if properly implemented.

By Matt Boms

The Texas Energy and Power Newsletter 2026-01-08 26 min read

More Power that's Faster and Fairer — Roundtable Discussion

Why it matters

Data-center driven load growth is reshaping Texas: ERCOT's current peak is ~85.5 GW while panelists say the interconnection queue contains ~225 GW of large load by 2030, creating planning stress (roundtable published 2026-01-08).

Key details

Speed-to-power is now treated as a grid resource — participants (Matt Boms, Joshua Rhodes, Micalah Spenrath) urged fast, close-to-load solutions such as DERs, demand response, backup power, and energy waste reduction to relieve transmission bottlenecks.
Cost and procurement pressures: Joshua Rhodes noted recent capital cost inflation (new natural gas plants cost ~2.5x what they did a few years ago; transformers roughly 2x), making expensive rapid buildouts long-lived fiscal commitments and necessitating new cost-sharing models.
Policy and programs moving fast: Texas tools include Senate Bill 6, active PUC dockets, and ERCOT interconnection adjustments; distributed battery pilots (ADER, with vendors like Tesla referenced) and private PPAs/co‑located solar+storage are highlighted as scalable near-term responses.

Brief

The January 8, 2026 Energy Capital roundtable (hosts Matt Boms, Joshua Rhodes, Micalah Spenrath) framed 2025 as a year in which uncertain, rapid load growth — especially from data centers and AI-related facilities — upended conventional planning. Panelists contrasted ERCOT's ~85.5 GW peak today with roughly 225 GW of large-load requests in the queue by 2030, warning that timelines for transmission, interconnection, and generation can't keep pace without new approaches. Speakers argued speed is itself a grid constraint and recommended prioritizing fast, local flexibility: distributed energy resources, demand response, backup power, and distributed batteries (ADER pilots with vendors like Tesla were cited). They also flagged policy levers (Senate Bill 6, PUC dockets), private PPAs and co‑located solar+storage as near-term solutions, and stressed that high near-term capital costs (e.g., gas plants ~2.5x, transformers ~2x) require fair cost-sharing and community-focused siting to preserve reliability and equity.

By Matt Boms

The Texas Energy and Power Newsletter 2025-12-21 3 min read

Power Projects Canceled as Demand Rises, Reading & Podcast Picks, December 21, 2025

Why it matters

Cleanview report (Michael Thomas) finds 1,891 U.S. power projects canceled in 2025, removing 266 GW of planned capacity — roughly 150% the size of ERCOT — and an estimated $400 billion in lost investment.

Key details

Primary cancellation drivers cited: local opposition, inadequate transmission, battery saturation, tariffs, and federal actions targeting renewables; Cleanview notes grid operators have also cleaned up interconnection queues.
ERCOT board approved a $9.4 billion, 765-kilovolt transmission "superhighway" across the Houston region to more than double capacity compared with existing 345-kV lines (reported Dec 21, 2025).

Brief

Cleanview's December 2025 report (Michael Thomas) finds 1,891 U.S. power projects canceled in 2025, eliminating 266 GW of planned capacity — about 150% of ERCOT — and an estimated $400 billion in investment. Causes include local opposition, transmission shortfalls, battery saturation, tariffs and federal policy; ERCOT approved a $9.4B, 765-kV transmission "superhighway".

By Texas Energy & Power Media

The Texas Energy and Power Newsletter 2025-12-03 2 min read

Texas Large Load Queue Continues Phenomenal Growth, Texas Grid Roundup #83

Why it matters

ERCOT's large-load interconnection queue reached 225 GW as of Dec 2025, up from 99 GW in Feb 2025, with roughly 30 GW added in the last two months.

Key details

Only 6.6 GW are energized in 2025 (plus 0.6 GW scheduled for 2026); PUC staff filed a discussion draft to regulate the interconnection process (comments due Dec 19) and the ERCOT Board will vote on the eastern half of the Strategic Transmission Expansion Plan to build 765-kV lines.

Brief

The Texas large-load queue hit 225 GW (up from 99 GW in Feb), adding ~30 GW in two months; only 6.6 GW are energized, with 0.6 GW due in 2026. PUC staff filed an interconnection discussion draft (comments due Dec 19), and the ERCOT Board will vote on the eastern 765-kV plan.

By Texas Energy & Power Media

Renew Economy 2026-02-18 3 min read

New big battery kicks off commercial operations next to outage-prone Queensland coal plant

Why it matters

Stanwell began commercial operations (announced Feb 2026) at the 300 MW, 2-hour (600 MWh) Tarong Battery—164 Tesla Megapacks—built for $514 million adjacent to the Tarong coal power station to provide sub-second firming and boost the site’s output to 2.1 GW.

Key details

Queensland coal reliability was poor in Apr–Sep 2025: an average 26% of coal capacity offline, 69 outages (61 unplanned); Gladstone had 33 unscheduled outages and Tarong North averaged only 33% availability (planned outage).
Stanwell plans more storage: a separate 300 MW / 1,200 MWh battery due by mid‑2027 and a target of 5 GW of battery storage built/contracted by 2035.

Brief

Stanwell’s Tarong Battery entered commercial operations in February 2026: a 300 MW, two‑hour (600 MWh) lithium‑ion system of 164 Tesla Megapacks costing $514 million, sited beside Tarong coal station to deliver sub‑second firming and raise the site to 2.1 GW. The build responds to high coal outages in Apr–Sep 2025 and is one piece of Stanwell’s plan for 5 GW of batteries by 2035.

By Sophie Vorrath

Epoch AI 2024-09-19 2 min read

The power required to train frontier AI models is doubling annually

Why it matters

Epoch AI (Luke Emberson & Robi Rahman, Sep 19, 2024) estimates training power draw for frontier models has grown ~2.1x per year (90% CI 1.9–2.2x) using a log-linear fit over 45 frontier models (2010+).

Key details

The increase is driven mainly by rising GPU counts (per-GPU power up only a few %/yr); training compute grew ~4x/yr, while hardware efficiency (12x over 10 years), low‑precision adoption (8x) and longer runs (4x) reduced power per unit compute.
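
As a rough consistency check on the figures above (a sketch, not Epoch AI's fitting method), the offsetting factors can be annualized and divided out of the compute growth rate. The assumption that the 8x low-precision gain and the 4x run-length gain accrued over the same 10 years as the 12x hardware gain is ours, not stated in the summary.

```python
# Figures quoted in the summary above.
compute_growth_per_year = 4.0    # training compute, ~4x/yr
hardware_efficiency_gain = 12.0  # stated as "over 10 years"
low_precision_gain = 8.0         # ASSUMPTION: accrued over the same 10 years
longer_runs_gain = 4.0           # ASSUMPTION: accrued over the same 10 years
years = 10


def annualized(total_factor: float, n_years: int) -> float:
    """Convert a cumulative multiplicative gain into a per-year factor."""
    return total_factor ** (1.0 / n_years)


# Combined yearly reduction in power needed per unit of training compute.
offset_per_year = annualized(
    hardware_efficiency_gain * low_precision_gain * longer_runs_gain, years
)

# Power growth implied by compute growth net of the offsets.
implied_power_growth = compute_growth_per_year / offset_per_year

print(f"offset per year: {offset_per_year:.2f}x")
print(f"implied power growth per year: {implied_power_growth:.2f}x")
```

Under these assumptions the implied rate lands close to the reported ~2.1x/year, at the upper edge of the stated 1.9–2.2x interval.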

Brief

Power required to train frontier AI models has increased roughly 2.1x/year (90% CI 1.9–2.2x) across 45 frontier models since 2010, per Epoch AI (Sep 19, 2024); analysis uses 211 models and a log-linear fit.

By Luke Emberson, Robi Rahman

OpenAI 2026-01-15 2 min read

Strengthening the US AI supply chain through domestic manufacturing

Why it matters

OpenAI published an RFP on 2026-01-15 with a submission deadline of June 2026 to fund U.S.-based manufacturing for AI supply chains across data-center inputs, consumer electronics, and robotics.

Key details

The RFP builds on OpenAI’s Stargate initiative (launched ~one year ago), which has announced planned capacity already well over halfway to its 10‑gigawatt commitment.
Targeted components include modules, tooling and final assembly, compute/power/cooling/data-center hardware, and robotics inputs such as gearboxes, motors, and power electronics; proposals will be reviewed on a rolling basis to inform procurement and infrastructure planning.

Brief

OpenAI launched a Request for Proposals on January 15, 2026 to accelerate U.S. manufacturing for AI infrastructure, building on its Stargate initiative (nearly one year old) that has pledged capacity already well over halfway toward a 10‑gigawatt commitment. The RFP targets domestic production of data‑center inputs, compute/power/cooling hardware, consumer‑electronics modules, and robotics components; proposals are due June 2026.

OpenAI 2025-10-27 3 min read

Seizing the AI opportunity

Why it matters

OpenAI warns of an "electron gap": China added 429 GW of new power capacity in 2024 while the US added 51 GW, and urges a national project to build 100 GW/year of new energy capacity to sustain AI leadership.

Key details

OpenAI is building Stargate sites in Texas, New Mexico, Ohio, and Wisconsin that will add nearly 7 GW of compute capacity and over $400 billion in investment over the next three years, contributing toward a pledged $500 billion and 10 GW commitment by end of 2025.
Usage and economic impacts: weekly users doubled from >400 million to >800 million in seven months; OpenAI's internal analysis estimates the first $1 trillion in AI infrastructure could raise GDP by >5% over three years, and the US will need an estimated 20% of its current skilled trades workforce over five years to build/operate new data center and energy infrastructure.

Brief

OpenAI argues the US must rapidly expand electricity generation and skilled labor to preserve AI leadership, noting China added 429 GW in 2024 versus the US's 51 GW and urging a 100 GW/year national build. It cites user growth from >400M to >800M weekly, a $500B/10 GW Stargate pledge (nearly 7 GW and over $400B over the next three years), and an estimated need for 20% of the current US skilled-trades workforce over five years.

OpenAI 2025-10-06 2 min read

AMD and OpenAI announce strategic partnership to deploy 6 gigawatts of AMD GPUs

Why it matters

OpenAI and AMD signed a multi-year, multi-generation agreement to deploy 6 gigawatts of AMD Instinct GPUs, with an initial 1 gigawatt deployment of AMD Instinct MI450 series racks starting in 2H 2026.

Key details

AMD issued OpenAI a warrant for up to 160 million shares of AMD common stock, with the first tranche vesting on the initial 1 GW deployment and additional tranches vesting as purchases scale to 6 GW and as share-price and OpenAI technical/commercial milestones are met.
AMD says the partnership builds on prior MI300X/MI350X collaborations, targets rack-scale AI solutions across future GPU generations, and is expected to deliver 'tens of billions' in revenue and be highly accretive to AMD's non-GAAP EPS.

Brief

AMD and OpenAI announced a multi-year, multi-generation partnership to deploy 6 gigawatts of AMD Instinct GPUs, beginning with a 1 GW roll-out of MI450 series racks in 2H 2026. AMD granted OpenAI warrants for up to 160 million shares tied to deployment, share-price, and technical milestones; AMD forecasts 'tens of billions' in revenue and accretive non-GAAP EPS.

OpenAI 2025-10-01 2 min read

Samsung and SK join OpenAI’s Stargate initiative to advance global AI infrastructure

Why it matters

On 2025-10-01 OpenAI announced Samsung Electronics and SK hynix joined its Stargate initiative after a Seoul meeting with President Lee Jae‑myung, Samsung Executive Chairman Jay Y. Lee, SK Chairman Chey Tae‑won, and OpenAI CEO Sam Altman.

Key details

Samsung and SK will scale advanced memory chip production targeting 900,000 DRAM wafer starts per month and explore new AI data centers in Korea via an MoU with the Ministry of Science and ICT and partnerships with SK Telecom, Samsung C&T, Samsung Heavy Industries, and Samsung SDS; OpenAI will deploy ChatGPT Enterprise and APIs into partner operations.

Brief

OpenAI's Stargate initiative, joined on Oct 1, 2025 by Samsung and SK, commits to scaling advanced DRAM production (targeting 900,000 wafer starts/month) and expanding AI data-center capacity in Korea through an MoU with MSIT and partnerships with SK Telecom and Samsung affiliates; deployments include ChatGPT Enterprise and APIs to support enterprise workflows and regional AI growth.

OpenAI 2025-05-07 4 min read

OpenAI’s response to the Department of Energy on AI infrastructure

Why it matters

On May 7, 2025 OpenAI submitted proposals to the U.S. Department of Energy urging urgent federal investment in AI infrastructure, estimating “hundreds of billions” of global capital is available and saying deployment will create “tens of thousands” of skilled‑trade and other jobs.

Key details

OpenAI launched the Stargate program in 2025; the first Stargate supercomputing campus is under development in Abilene, Texas, and OpenAI is pursuing additional U.S. sites plus an "OpenAI for Countries" program to attract foreign investment into Stargate.
OpenAI recommends DOE enable public solicitations for AI hubs, streamlined permitting (categorical exclusions, shot clocks, surge staffing, AI‑powered permitting tools), predictable leases, competitive electricity tariffs, targeted tax incentives, and notes its January 2025 multi‑year partnership with Los Alamos National Laboratory (LANL).

Brief

OpenAI submitted a May 7, 2025 response to the DOE RFI arguing the U.S. must act quickly to build AI supercomputing hubs, citing “hundreds of billions” in private capital and potential for “tens of thousands” of jobs. The proposal highlights Stargate (first campus in Abilene, Texas), a Jan 2025 LANL partnership, and recommends streamlined permitting, lease predictability, and financial incentives for co‑located federal‑land deployments.

OpenAI 2024-11-04 2 min read

OpenAI’s comments to the NTIA on data center growth, resilience, and security

Why it matters

OpenAI submitted comments to the NTIA on 2024-11-04 noting its forecast that constructing and operating a single 5 GW data center could create or support about 40,000 jobs and add $17–$20 billion to a state’s GDP.

Key details

OpenAI warns about $175 billion in global infrastructure funds waiting to be committed and says if those flows don't back US projects they may go to China-backed projects that could entrench autocratic power.
The letter ties US AI leadership to past broadband policy (citing the 1996 Telecommunications Act), and urges investment in AI infrastructure—cleaner energy grids, nuclear power, and domestic semiconductor manufacturing—to spur reindustrialization and competitiveness.

Brief

OpenAI’s Nov 4, 2024 comments to the NTIA argue that data center policy is critical for U.S. competitiveness: their outside forecast estimates a single 5 GW data center would support ~40,000 jobs and contribute $17–$20 billion in state GDP. OpenAI highlights $175 billion in global infrastructure capital and urges US-focused investments in energy, nuclear, and semiconductor capacity to retain AI leadership.

OpenAI 2022-12-08 5 min read

Discovering the minutiae of backend systems

Why it matters

OpenAI blog post (published 2022-12-08) profiles a backend engineer responsible for large-scale supercomputing clusters, addressing low-level issues like NUMA locality, Nvidia GPUDirect, CPU pinning, and NIC problems (example: pushing >30 Gbps could trigger a kernel panic).

Key details

The team operates at unprecedented scale—models train on "billion-dollar" supercomputers—and an upstream kernel change from their work reportedly saved ~6 days of compute across the fleet per week.
Author joined OpenAI mid-2020, tracks motivating progress via Slack (:meow_party:) with >400 tagged posts (~4/week); typical week: meetings on Tuesday, remainder split between debugging, design docs, hotfixes, and coding.

Brief

An OpenAI backend engineer (blog post published 2022-12-08) describes managing the company’s large-scale supercomputing clusters—debugging NUMA, GPUDirect, CPU pinning, and NIC issues (e.g., >30 Gbps kernel panic) and shipping fixes from hotfixes to upstream kernel changes; one tweak reportedly saved ~6 days of compute across the fleet per week, speeding research.

OpenAI 2025-10-30 2 min read

Expanding Stargate to Michigan

Why it matters

OpenAI announced a new Stargate campus in Saline Township, Michigan (published 2025-10-30); Related Digital will develop it with construction expected to begin in early 2026 and create more than 2,500 union construction jobs.

Key details

The Michigan site is part of OpenAI's 4.5 GW partnership with Oracle; combined with six U.S. Stargate sites (with Oracle and SoftBank) this brings planned capacity to over 8 GW and more than $450 billion in investment over the next three years toward a $500 billion, 10 GW commitment announced in January.
The campus will use a closed-loop cooling system to significantly reduce water consumption, and DTE Energy will serve the site using existing excess transmission capacity; any grid upgrades will be funded by the project rather than local ratepayers.

Brief

The Stargate Michigan campus in Saline Township, announced 2025-10-30, will be developed by Related Digital with construction starting in early 2026 and >2,500 union construction jobs. It's part of OpenAI's 4.5 GW Oracle partnership and—together with six U.S. Stargate sites (Oracle, SoftBank)—pushes planned capacity past 8 GW and >$450B investment, using closed-loop cooling and DTE's excess transmission capacity.

OpenAI 2025-09-22 2 min read

OpenAI and NVIDIA announce strategic partnership to deploy 10 gigawatts of NVIDIA systems

Why it matters

OpenAI and NVIDIA signed a letter of intent (Sept 22, 2025) to deploy at least 10 gigawatts of NVIDIA systems—described as “millions of GPUs”—for OpenAI’s next-generation AI infrastructure.

Key details

NVIDIA intends to invest up to $100 billion in OpenAI progressively as each gigawatt is deployed; the first gigawatt is targeted to come online in H2 2026 on NVIDIA’s Vera Rubin platform.

Brief

OpenAI and NVIDIA announced a strategic partnership (Sept 22, 2025) to deploy a minimum of 10 GW of NVIDIA systems—characterized as millions of GPUs—to power training and inference for next-generation models. NVIDIA may invest up to $100 billion incrementally tied to gigawatt deployments; the initial 1 GW phase is slated for H2 2026 on the Vera Rubin platform, with roadmap co-optimization and partner coordination (Microsoft, Oracle, SoftBank).

OpenAI 2025-07-31 2 min read

Introducing Stargate Norway

Why it matters

OpenAI announced Stargate Norway on 2025-07-31: a Narvik-based project delivered with partners Nscale and Aker in a 50/50 joint venture as part of the OpenAI for Countries program.

Key details

The facility is planned for 230 MW capacity (with ambitions to expand by 290 MW) and aims to host 100,000 NVIDIA GPUs by end of 2026; it will run on renewable hydropower, use closed-loop direct-to-chip liquid cooling, and reuse excess heat for local low-carbon enterprises.
OpenAI says weekly active ChatGPT users in Norway quadrupled over the past year (majority under 35); OpenAI may be an initial offtaker and the site will prioritize access for Norwegian startups, researchers, and regional public/private users across the UK, Nordics and Northern Europe.

Brief

Stargate Norway is OpenAI’s announced Narvik data-center project (2025-07-31) built with Nscale and Aker as a 50/50 JV, targeting 230 MW of compute (expandable by 290 MW) and 100,000 NVIDIA GPUs by end-2026. The site will be fully renewable (hydropower), employ closed-loop direct-to-chip liquid cooling, and pipe waste heat to local low-carbon industry while prioritizing regional researchers and startups.

Renew Economy 2026-02-25 3 min read

New transformer in works for Australia’s most powerful battery, but return to full service pushed out again

Why it matters

A transformer at the Waratah Super Battery experienced a “significant internal fault” in mid-October 2025 that damaged windings, caused an overpressure event and ruptured the tank wall, forcing the unit to self-drain; investigations into the root cause are ongoing.

Key details

Until a replacement transformer, being manufactured by Wilson Transformer Company, arrives in Q3 2026, the 850 MW / 1,680 MWh facility is running at less than half capacity (350 MW, 740 MWh), with a second transformer kept offline as a precaution; full capacity is targeted by end-2026.
A detailed design review by Consolidated Power Projects and Wilson Transformer Company confirmed the replacement design meets requirements, and remediation of the second transformer will be incorporated into a defined works program to restore all three transformers later in 2026.

Brief

The Waratah Super Battery (850 MW, 1,680 MWh) suffered a transformer failure in mid-October 2025 when an internal fault caused winding damage and a tank rupture; the site is operating at ~350 MW/740 MWh while Wilson Transformer Company manufactures a replacement due Q3 2026. Consolidated Power Projects and Wilson completed a design review, and remediation work is planned to return full System Integrity Protection Scheme (SIPS) capability by end-2026.

By Sophie Vorrath

Epoch AI 2025-10-28 27 min read

Could decentralized training solve AI’s power problem?

Why it matters

A decentralized 10 GW training run across 23 datacenters (sited near under-utilized gas plants) joined by a 4,800 km fiber ring could support ≈51,000 NVL72 racks (~3.7M GPUs) with ≈2×10^22 FLOP/s, enabling a ~5e28-FLOP pretraining run in ~100 days at 30% MFU for a 72 trillion-parameter model.

Key details

Maintaining <5% training slowdown requires <250 ms per gradient synchronization: a bidirectional 6 Pbps link between adjacent sites (≈1.1 Pbit per sync), which the authors estimate would cost ≈$410M to deploy (fiber ~$290M + switches ~$120M)—under 1% of the ~$90B datacenter construction estimate—making decentralization technically feasible but operationally complex.
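
The headline numbers above can be cross-checked with simple arithmetic. This is an illustrative sketch rather than the authors' model; the 16 bits per parameter used for the sync payload is our assumption, chosen because it reproduces the quoted ≈1.1 Pbit.

```python
# Cluster size: racks and GPUs.
racks = 51_000
gpus_per_rack = 72                 # NVL72
gpus = racks * gpus_per_rack

# Training compute delivered over the run.
peak_flop_per_s = 2e22             # quoted aggregate peak
mfu = 0.30                         # model FLOP utilization
run_days = 100
total_flop = peak_flop_per_s * mfu * run_days * 86_400

# Gradient payload per synchronization.
params = 72e12
bits_per_param = 16                # ASSUMPTION: 16-bit gradients
payload_pbit = params * bits_per_param / 1e15

# Network cost relative to datacenter construction.
fiber_cost = 290e6
switch_cost = 120e6
datacenter_cost = 90e9
network_share = (fiber_cost + switch_cost) / datacenter_cost

print(f"GPUs: {gpus:,}")
print(f"total training compute: {total_flop:.2e} FLOP")
print(f"sync payload: {payload_pbit:.2f} Pbit")
print(f"network share of build cost: {network_share:.2%}")
```

The results agree with the summary to rounding: about 3.7M GPUs, roughly 5e28 FLOP, about 1.1 Pbit per sync, and a network share below 1%.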

Brief

Decentralized training: Epoch AI models a 10 GW cluster assembled from 23 datacenters tied to spare capacity at U.S. gas plants and linked by a 4,800 km fiber ring. For a 72T-parameter (≈1.1 Pbit per sync) model they target ~250 ms all-reduce (6 Pbps per link + ~24 ms propagation) to keep network overhead <5%, with network cost ≈$410M versus $90B in datacenter build costs.

By Jaime Sevilla, Anton Troynikov

Epoch AI 2024-11-02 15 min read

Data movement bottlenecks to large-scale model training: Scaling past 1e28 FLOP

Why it matters

Epoch AI estimates training runs exceeding 2e28 FLOP (≈15 trillion parameters for Chinchilla-optimal dense models) cannot be efficiently utilized within a 3-month window due to data-movement bottlenecks; a strict 'latency wall' appears around 2e31 FLOP that is infeasible without changing training logic or hardware latency.

Key details

Data- and model-parallel communication drive the limits: practical runs mix data/pipeline/tensor parallelism (e.g., Falcon-180B used 64× data × 8× pipeline × 8× tensor = 4096 GPUs). Using t_latency = 9 μs, n_layer = 120 and B = 4M tokens yields N ≈ 400 trillion parameters → ≈2e31 FLOP.
Workarounds are algorithmic not just hardware: aggressive batch-size scaling (paper assumes B ∼ N^(1/3); Zhang et al. 2024 report B = 17.75·D^0.47, which could push the utilization collapse from ~2e28 to ~3e30 FLOP) or reducing model depth may defer limits; latency improvements alone are unlikely to suffice.
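
The parameter-to-FLOP conversions above follow from the Chinchilla rule D = 20N. A minimal sketch, assuming the common approximation C ≈ 6ND for dense-transformer training compute (the summary does not state which formula the paper uses), so that C ≈ 120N²:

```python
import math


def chinchilla_flop(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Training FLOP for a dense model, assuming C = 6 * N * D with D = 20 * N."""
    return 6.0 * n_params * (tokens_per_param * n_params)


def params_for_flop(flop: float, tokens_per_param: float = 20.0) -> float:
    """Invert C = 6 * tokens_per_param * N**2 to recover N."""
    return math.sqrt(flop / (6.0 * tokens_per_param))


# Utilization-collapse point: ~15 trillion parameters.
collapse_flop = chinchilla_flop(15e12)

# Latency wall: ~400 trillion parameters.
latency_wall_flop = chinchilla_flop(400e12)

# Falcon-180B layout: data x pipeline x tensor parallel degrees.
falcon_gpus = 64 * 8 * 8

print(f"15T params  -> {collapse_flop:.1e} FLOP")
print(f"400T params -> {latency_wall_flop:.1e} FLOP")
print(f"Falcon-180B GPUs: {falcon_gpus}")
```

With these assumptions, 15T parameters gives a few times 1e28 FLOP and 400T parameters gives about 2e31 FLOP, in line with the quoted thresholds.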

Brief

Data movement bottlenecks constrain LLM training scale: Epoch AI models show efficient, 3-month training runs break down past ~2e28 FLOP and hit a latency-imposed upper bound near 2e31 FLOP. The analysis models intra-GPU HBM bandwidth and inter-GPU communication across tensor/data/pipeline/expert parallelism, uses Chinchilla D=20N, and numeric examples (tlat=9 μs, nlayer=120, B=4M) to quantify limits and mitigations.

By Ege Erdil

Epoch AI 2024-08-20 93 min read

Can AI scaling continue through 2030?

Why it matters

Epoch AI (Aug 20, 2024) projects that training runs of ~2e29 FLOP will likely be feasible by 2030 if labs can marshal power, chips, data, and networking; a model at this scale would exceed GPT‑4 in training compute by roughly as much as GPT‑4 exceeds GPT‑2.

Key details

Power: single‑campus 1–5 GW would support ~1e28–3e29 FLOP; geographically distributed 2–45 GW could support ~2e28–2e30 FLOP; a 2e29 FLOP run is estimated to need ~6 GW after efficiency gains.
Chips & data: median estimate of ~100M H100‑equivalent GPUs could enable ~9e29 FLOP (range 20M–400M → 1e29–5e30 FLOP); effective data stock projected at 400T–20,000T tokens (400 trillion–20 quadrillion) → ~6e28–2e32 FLOP; latency wall estimated at ~3e30–1e32 FLOP.
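
To see how a token stock maps to a compute ceiling, the sketch below applies a Chinchilla-style rule (20 tokens per parameter, C ≈ 6ND) to the 400 trillion–20 quadrillion token range. Both the rule and its use here are our assumptions for illustration; the article's own conversion may differ, so the outputs should only match the quoted range to within an order of magnitude.

```python
def flop_from_tokens(tokens: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training FLOP that a token budget can support.

    ASSUMPTION: parameters N = tokens / tokens_per_param, compute C = 6 * N * D.
    """
    n_params = tokens / tokens_per_param
    return 6.0 * n_params * tokens


low_tokens = 400e12    # 400 trillion tokens
high_tokens = 20e15    # 20 quadrillion tokens

low_flop = flop_from_tokens(low_tokens)
high_flop = flop_from_tokens(high_tokens)

print(f"{low_tokens:.0e} tokens -> {low_flop:.1e} FLOP")
print(f"{high_tokens:.0e} tokens -> {high_flop:.1e} FLOP")
```

Under this simple rule the range comes out somewhat below the quoted ~6e28–2e32 FLOP but within the same orders of magnitude.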

Brief

Epoch AI models four bottlenecks—power, chip manufacturing (packaging/HBM), data availability (text+multimodal), and the latency wall—using public forecasts and uncertainty ranges. They find on‑trend 4x/year compute growth could reach ≈2e29 FLOP by 2030 (median), with ~100M H100‑equivalents and ~6 GW power plausible; upper/lower bounds span ~1e28–1e32 FLOP depending on grid, fab, and data assumptions.

By Jaime Sevilla, Tamay Besiroglu
