substack.com

Building a Datacenter Part II

Brief

Nvidia’s shift to 800V DC power architecture is the centerpiece of Crucible Capital’s thesis about the next phase of AI datacenter design. The argument is that AI workloads have pushed facilities beyond the comfortable operating envelope of legacy AC distribution. Traditional datacenters were built around 415V/480V three-phase AC and modest rack densities of roughly 5-20 kW, but modern AI clusters are now driving rack power into the hundreds of kilowatts and, on the authors’ timeline, toward 1 MW-class systems with Rubin Ultra in 2027. In that regime, repeated AC-to-DC and DC-to-AC conversions become a serious efficiency penalty, with the article citing 10-20% aggregate losses. Nvidia’s proposed architecture converts grid AC to 800V DC once at the site perimeter, distributes DC through busways, and then steps down locally to 54V/12V at the rack. That reduces conversion stages, current, copper requirements, and thermal waste. The piece also emphasizes the strategic role of OCP and Nvidia reference architectures in standardizing not just GPUs, but the entire rack-power-cooling stack, effectively forcing suppliers such as Schneider Electric, SuperMicro, Vertiv, and others to align on Nvidia-led design choices.
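Here is a minimal sketch of why higher-voltage distribution cuts current and copper. It assumes a hypothetical 1 MW rack and a single idealized conductor of fixed resistance. It compares the current needed at 54V (the rack-level voltage mentioned above) with the current needed at 800V, and shows how much the resistive (I²R) loss in that same conductor shrinks. Real busway and cable sizing also depends on ampacity, temperature rise, and electrical code, and none of those are modeled here.

```python
# Minimal sketch: current and relative resistive loss at two DC distribution voltages.
# Assumptions (hypothetical): a 1 MW rack and one conductor with fixed resistance.

def dc_current_amps(power_w: float, voltage_v: float) -> float:
    """Current drawn by a DC load: I = P / V."""
    return power_w / voltage_v


def i2r_loss_reduction(v_low: float, v_high: float) -> float:
    """Same power, same conductor: I^2 * R loss scales as 1 / V^2,
    so this is how many times smaller the loss is at v_high than at v_low."""
    return (v_high / v_low) ** 2


RACK_POWER_W = 1_000_000.0  # hypothetical 1 MW-class rack

for volts in (54.0, 800.0):
    print(f"{volts:>5.0f} V DC -> {dc_current_amps(RACK_POWER_W, volts):,.0f} A")

print(f"I^2R loss, 54 V vs 800 V: ~{i2r_loss_reduction(54.0, 800.0):.0f}x lower at 800 V")
```

Current scales as 1/V and resistive loss as 1/V². At a fixed conductor size, the 800V path therefore carries about 1,250 A instead of roughly 18,500 A, with about 219x lower I²R loss. In practice designers trade some of that headroom for thinner conductors, which is where the copper savings come from.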

The second major theme is that electrical redesign and cooling redesign are inseparable. As rack densities move from traditional 10-50 kW toward 100 kW-1 MW+, the rack itself becomes the constraint rather than the silicon. The article highlights several enabling technologies: solid-state transformers based on silicon carbide or gallium nitride, supercapacitors for sub-second transient smoothing, LFP battery systems for longer backup and load shifting, and 800V-compatible rack sidecars and power shelves. Claimed benefits include 30-50% smaller SST footprint, 150% more power flow through existing conductors, over 98% DC-DC conversion efficiency, up to 60% more compute space from removing legacy PSUs, and 45% less copper in redesigned racks. The article also surfaces a supply-chain angle, especially around gallium, noting China’s dominance in gallium separation as of 2024 and framing advanced power electronics as a critical-minerals problem as much as an electrical-engineering one.
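For the load-shifting role, here is a rough sketch of what each delivered kWh costs. It uses the LFP figures quoted later in the key details: $75-$300/kWh for cells, 6,000+ cycles, and 95%+ round-trip efficiency. It also makes three simplifying assumptions: it counts cell cost only (no inverter, balance of system, or installation), it treats every rated cycle as a full cycle, and it uses a hypothetical $0.05/kWh charging price.

```python
# Minimal sketch: rough cost of each kWh an LFP battery delivers.
# Assumptions: cell cost only (no inverter, balance of system, or installation),
# every rated cycle is a full cycle, and a hypothetical $0.05/kWh charging price.

def capex_per_kwh_cycled(usd_per_kwh: float, cycles: int) -> float:
    """Cell capital cost spread across lifetime cycles."""
    return usd_per_kwh / cycles


def cost_per_kwh_delivered(usd_per_kwh: float, cycles: int,
                           round_trip_eff: float, charge_price: float) -> float:
    """Capex per cycle plus charging energy, which exceeds output by 1 / efficiency."""
    return capex_per_kwh_cycled(usd_per_kwh, cycles) + charge_price / round_trip_eff


CHARGE_PRICE = 0.05  # hypothetical $/kWh for grid or off-peak charging

for cell_price in (75.0, 300.0):
    total = cost_per_kwh_delivered(cell_price, cycles=6000,
                                   round_trip_eff=0.95, charge_price=CHARGE_PRICE)
    print(f"${cell_price:.0f}/kWh cells: ${total:.3f} per kWh delivered")
```

With these inputs, the charging-energy term is comparable to or larger than the cell-capex term. That is why round-trip efficiency matters alongside cycle life.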

Cooling is treated as the practical bottleneck. Air cooling is portrayed as maxing out around 15-20 kW per rack, making liquid systems mandatory for current and future AI clusters. Direct-to-chip liquid cooling is already standard for Nvidia H200/B200/B300 systems, and Nvidia’s warm-water approach at 45°C is positioned as especially important because it can reduce or eliminate chillers. The authors cite a specific startup example, Elkhorn, whose water-based system reportedly achieved a COP of about 11.8 at an AI datacenter in Newport, Washington, halving cooling power from roughly 170 kW/MW of IT load to 85 kW/MW and improving PUE from 1.18 to 1.09. Beyond facility mechanics, the article stresses software and maintenance: as rack density grows by a claimed 90x over a datacenter’s life, operators need sub-second telemetry, asset-level monitoring, and maintenance orchestration to protect hardware that can cost tens of millions of dollars per MW-scale cluster. Overall, the report is strongest as a synthesis of how AI demand is pushing simultaneous changes in power electronics, storage, cooling, supply chains, and operational software.
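The Elkhorn figures can be checked against each other with two definitions. Cooling power equals heat removed divided by COP, and PUE equals total facility power divided by IT power. This sketch assumes the cooling system removes roughly 1 kW of heat per kW of IT load. It also lumps all non-cooling overhead into one fixed term, backed out of the quoted baseline (PUE 1.18 with 170 kW/MW of cooling).

```python
# Minimal sketch: relate cooling COP to cooling power and PUE.
# Assumptions: heat to remove ~= IT load, and all non-cooling overhead is a fixed
# term inferred from the quoted baseline (PUE 1.18 with 170 kW/MW of cooling).

def cooling_power_kw(heat_kw: float, cop: float) -> float:
    """Electrical power spent on cooling = heat removed / COP."""
    return heat_kw / cop


def pue(it_kw: float, cooling_kw: float, other_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT power."""
    return (it_kw + cooling_kw + other_kw) / it_kw


IT_KW = 1000.0  # normalize to 1 MW of IT load

# Non-cooling overhead implied by the baseline: 180 kW total overhead minus 170 kW cooling.
other_kw = (1.18 - 1.0) * IT_KW - 170.0

new_cooling = cooling_power_kw(IT_KW, cop=11.8)  # COP of ~11.8 quoted for Elkhorn
print(f"cooling: {new_cooling:.1f} kW/MW")
print(f"PUE: {pue(IT_KW, new_cooling, other_kw):.3f}")
```

Under those assumptions, a COP of 11.8 gives about 85 kW of cooling per MW and a PUE of about 1.095. That is in line with the quoted 1.09, so the reported numbers are internally consistent.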

Why it matters

Crucible Capital argues Nvidia’s 2025 OCP announcement of 800V DC distribution for Blackwell, Rubin, and Rubin Ultra is a structural inflection for AI datacenters, because traditional 415V/480V AC designs suffer repeated AC↔DC conversions that leak roughly 10-20% of power and become increasingly impractical as racks move from 10-20 kW historically toward 100 kW to 1 MW+ densities.

Key details

  • The article ties the power transition directly to Nvidia’s rack roadmap: Blackwell GPUs are cited at about 1.35 kW per GPU, Rubin at up to 3.6 kW per GPU, a single Rubin rack at roughly 900 kW, and SuperMicro’s planned Kyber rack at 1.1 MW for Rubin Ultra shipments expected in 2027.
  • Solid-state transformers using SiC or GaN switches are presented as a key enabling technology for 800V DC, with claimed benefits including 30-50% smaller footprint than conventional transformers, 150% more power flow through existing conductors, and up to 200 kg of copper savings per rack; the article cites a $115 million SST market in 2025 projected to reach $375 million by 2033 at a 16% CAGR.
  • For short-duration power quality and transient smoothing, the piece says traditional VRLA UPS systems are too slow for AI loads that can spike to 3x nominal draw within a millisecond; it recommends supercapacitors with microsecond-to-millisecond response, up to 10 kW/kg power density, and more than 1 million cycles, despite costs of $2,500-$10,000/kWh versus $271-$500/kWh for traditional UPS storage.
  • For longer-duration support, the article favors LFP-based BESS with 95%+ round-trip efficiency and 6,000+ cycles, priced at roughly $75-$300/kWh for cells, and notes EIA projected 18 GW of battery storage installations in 2025; it frames BESS as useful for peak shaving, demand response revenue, renewable integration, and buffering gas turbine ramp limitations near datacenters.
  • Cooling becomes the dominant operational bottleneck as rack densities rise: air cooling is described as topping out near 15-20 kW per rack, liquid cooling is already standard for Nvidia H200/B200/B300 deployments, and Nvidia’s future reference architecture is described as mandating 100% liquid cooling with 45°C inlet water, integrated row-based CDUs, and 20-30% cooling-energy reductions from eliminating PSU fans and reducing conversion losses.
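The SST market projection can be sanity-checked with the compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. This sketch assumes that 2025 to 2033 counts as eight compounding years.

```python
# Minimal sketch: check the quoted SST market figures with the CAGR formula.
# Assumption: 2025 -> 2033 is treated as eight compounding years.

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0


def project(start: float, rate: float, years: float) -> float:
    """Value after compounding `start` at `rate` for `years`."""
    return start * (1.0 + rate) ** years


implied = cagr(115e6, 375e6, 2033 - 2025)
print(f"implied CAGR: {implied:.1%}")
print(f"$115M at 16% for 8 years: ${project(115e6, 0.16, 8) / 1e6:.0f}M")
```

The implied rate comes out just under 16%. Compounding $115M at 16% for eight years lands near $377M, so the quoted figures agree with each other to within rounding.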
Cleaned source text

title: Building a Datacenter Part II

author: Crucible Capital

content_type: newsletter

publication: substack.com

published: 2026-02-06T16:04:19+00:00

source_url: gmail://19c33b6aaff0340f

word_count: 8481

Adopting 800V DC: The Future of Power Systems, Racks, Cooling, and Management Systems




Kelly Greer and Meltem Demirors

Feb 6


Note: this report was modified from its original version. For the downloadable PDF report, including footnotes and additional graphics, please visit the pretty version.

"I think Moore's Law will be dying within a decade." - Gordon Moore, 2015

Intro: From the PC Revolution to Cloud Compute

It’s true, frontier models were not around when we were born, and it goes without saying that demand for compute has gone from 0 to 100 over the course of our lifetimes. As students of history, we find value in reflecting on the course of compute and how underlying systems have kept pace. This feels especially timely now that site infrastructure is at an inflection point: the industry is adopting a high-voltage direct current power system, as ordained by Nvidia, which we’ll discuss at length after this intro. Before that, if you care to join us, please sit back and enjoy a brief ramble on the history of compute. We promise it will eventually lead into a detailed dissection of cutting-edge datacenter power and cooling systems. Let us indulge.



From 1990 to 2000, compute adoption was PC adoption and PC adoption was compute adoption. This was thanks to the IBM personal computer, which debuted in 1981 on the heels of the microprocessor's introduction in the 1970s. Total global FLOPs in the 1990s are estimated to have been in the low petaFLOPs (10-50 million MFLOPs), vs. today’s capacity of 1-10 trillion TFLOPs.¹

An early 1980s IBM personal computer

Does this look familiar? We were budding millennials at the time, but we’re told these 1980s-era devices could be used to write book reports, access dial-up internet to share files, and even play dice games. The first digital businesses emerged in the early 1980s, with the Boston Computer Exchange’s (BoCoEx) seminal launch in 1982. That launch predated the 1991 public introduction of the World Wide Web and the 1995 launches of Amazon and eBay.

Personal computers became all the rage thanks to the Windows 95 and 98 user interfaces, falling unit prices, and the continued rise of the internet and online commerce. Amazon.com, a growing online retailer, had built up IT infrastructure for its own marketplace and figured it could sell that infrastructure to other retailers. That brings us to the 2000s and the dawn of cloud services, marked by AWS’s 2006 launch. Google’s GCP and Microsoft’s Azure would follow in the coming years.

The cloud era laid the groundwork for what would become a cornerstone of today’s compute market. Global datacenter capacity grew steadily from 24 GW in 2006, focused on basic Infrastructure as a Service (IaaS), to roughly double that figure in 2022, driven primarily by cloud services expansion. Then ChatGPT launched in November 2022, after which compute supply and demand growth reached another level of escape velocity.² Goldman Sachs estimates 2025 datacenter capacity at 70 GW in the US alone and forecasts 122-146 GW of demand by 2030.³

Side Bar: The Boston Computer Exchange & Compute Markets

The Boston Computer Exchange (often abbreviated as BCE or BoCoEx) was founded in 1982 as the world's first e-commerce company. It operated as an online marketplace for buying, selling, and trading used computers, well before the public internet era. It initially used a bulletin board system with a database accessible via platforms like Delphi, YellowData, and later CompuServe's Electronic Mall. Sellers uploaded inventory, buyers browsed listings, and transactions were finalized by phone.

BCE took a commission (a simple business) and added escrow services to handle disputes. It created the BoCoEx Index, a weekly price report for computer models that became a standard reference in publications like Computerworld and PCWeek. It also built an automated auction system, demonstrated at COMDEX in 1986. At its peak, BCE licensed its technology to about 150 affiliated "Computer Exchanges" worldwide, including in the US, Chile, Sweden, and Russia. After eight years, the company was sold to ValCom, a computer retailer that redirected its focus toward liquidating excess inventory. It was later acquired by Compaq Corporation, which was itself acquired by Hewlett-Packard, after which BCE ceased operations and was shut down.

We love studying the formation of new markets, having built, traded, and invested in many of them, from commodities to crypto. We find this case study especially relevant given the resurgence of new forms of GPU marketplaces today.

Crucible are investors in OrnnCx, a compute derivatives desk that manages bespoke contracts for all risk vectors within the GPU value chain. Crucible has also invested in Compute Index, a venue that facilitates term and on-demand GPU-hour transactions.

Fluidstack, now one of the fastest-growing neoclouds in the world, originated in 2022 as a UK-based peer-to-peer GPU-hour marketplace and expanded into cloud deployments upon the launch of ChatGPT. The San Francisco Compute Company arose in recent years as one of the first San Francisco-based GPU-hour spot marketplaces serving AI startups. Andromeda began as a vehicle to provide compute to NFDG portfolio companies and is now a bellwether in GPU services for startups.

There’s a lot of activity in this space now. We are excited about the opportunity to optimize financial flows in the compute economy, including stablecoin payment flows, billing automation, and settlement financing, which Internet Backyard is seeking to solve.

Waves of Exponential Demand

Exponential growth in token consumption translates directly into demand for parallel GPU clusters for both training and inference. As a result, data centers have moved from tens of megawatts of load in the early 2010s to tens of gigawatts by the mid-2020s.

Each GPU generation delivers higher performance but also increases peak power draw per device. As a result, modern AI racks are moving from tens of kW to hundreds, and soon 1,000+ kW, per rack. In dense AI racks, configurations of eight high-power GPUs per server and multiple servers per rack routinely push power demands to hundreds of kW per rack, far above traditional data center designs. This rising power density forces the adoption of advanced cooling, such as direct-to-chip liquid systems, to manage the additional heat, which adds further indirect energy overhead. The racks are melting every time you talk with your AI waifu girlfriend. You’d better marry her and solve our population decline problem or Elon will shadow ban you forever.
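Here is a back-of-envelope sketch of how GPU counts turn into rack power. Only the ~1.35 kW per Blackwell GPU figure comes from the article. The eight servers per rack and the 1.3x overhead multiplier (for CPUs, memory, NICs, fans, and conversion losses) are hypothetical placeholders.

```python
# Minimal sketch: back-of-envelope rack power from GPU count.
# Only the ~1.35 kW per Blackwell GPU comes from the article; the server count
# and the overhead multiplier for CPUs, memory, NICs, and fans are hypothetical.

AIR_COOLING_LIMIT_KW = 20.0  # upper end of the ~15-20 kW air-cooled range


def rack_power_kw(servers_per_rack: int, gpus_per_server: int,
                  kw_per_gpu: float, overhead_factor: float) -> float:
    """GPU power scaled up by a multiplier for everything else in the rack."""
    return servers_per_rack * gpus_per_server * kw_per_gpu * overhead_factor


kw = rack_power_kw(servers_per_rack=8, gpus_per_server=8,
                   kw_per_gpu=1.35, overhead_factor=1.3)
print(f"rack: {kw:.1f} kW, about {kw / AIR_COOLING_LIMIT_KW:.1f}x the air-cooling ceiling")
```

Even this modest hypothetical configuration lands above 100 kW, several times the 15-20 kW that air cooling can handle.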

Source: Crucible Compute Research

As demand for virtual compute via waifus and LLMs pushes the physical world to produce FLOPs more efficiently, it’s helpful to look at how spending on physical compute systems breaks down. As illustrated above, power and cooling systems comprise ~35% of capex and an even greater share (the majority) of datacenter operating expenditure.

At the macro level, projections now suggest that AI-driven data centers could claim a double-digit share of new generation capacity in the coming years (good thing we are also going to space). This underscores how tightly AI compute growth is coupled to a steep, system-wide rise in power consumption. Note that it isn’t simply a “more power” problem. That is naive, and frankly, makes you look stupid. It is a more power, better power systems, and better heat management problem. So please do not tell us how SMRs or solar will solve all of the power problems facing data centers. Instead, continue reading and be slightly less stupid, if you please.

As lovers of complex systems and investors in companies that enable them, we find value in understanding and dissecting the minutiae of the evolution in compute architecture that makes this growth possible at all.

The progression of Nvidia’s reference architecture is a good heuristic to study as the company’s designs represent the de facto standard for power & cooling systems and compute rack design. And no, we aren’t just saying this because Nvidia is the largest company in the world and is decidedly cleaning up on margins vs. the rest of the datacenter value chain to date.

More importantly, Nvidia’s leadership in setting the output and per-rack kW consumption of modern compute systems has pushed site developers and manufacturers at large to adapt to the company’s increasingly onerous power demands. Despite recent competition from AMD and Google on their respective GPU and TPU performance, the market still follows Nvidia’s lead on three fronts: higher critical IT load with each Nvidia generation, high-density liquid cooling, and, most recently, a shift to high-voltage direct current (DC) power systems. That last shift has reverberating effects on multiple parts of the datacenter.

Put differently, Nvidia jumps and every other manufacturer says “how high” to either serve or compete. See the power progression across the last six generations of GPUs.

Why are we showing you this? As new generations of Nvidia chips continue to push the boundaries of physics (every architecture is named after a scientist for a reason), they also push the boundaries of how much power and heat can be safely managed in the racks where GPUs are installed. Thermodynamics, baby! We have some sexy pictures of stacked racks later on... we hope that is a sufficiently compelling carrot to encourage you to march on.

Rack Rack City: A Note on Thermal Density

Rack power density is decidedly the antagonist driving all other datacenter systems forward, because the electrical and cooling systems that feed each rack cannot scale as fast as the GPU power draw of each next-gen chip. Let me bend your ear on the thermodynamics of dense computing clusters...

Computing is an exothermic process: the primary physical output is heat, and the digital / virtualized output is FLOPs. GPU racks convert almost all of the electrical power they consume into heat. Traditional air-cooled designs hit practical limits around 15–20 kW per rack (think fans on your home PC, a throwback for our fellow nerds who built gaming PCs to flex on other nerds at LAN parties). Liquid cooling can support more power draw, but per-rack heat limits still force operators to leave space or spread GPUs across more racks. That constrains overall cluster scale, adds cost, and constrains the physical networking meshes that link GPUs across clusters into consolidated resource pools (we will cover those in Part 3). Physics rules everything around me, especially in compute.
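The sensible-heat balance Q = ṁ·c_p·ΔT shows why air runs out of headroom. This sketch computes the volume flow of air and of water needed to carry away a hypothetical 100 kW of rack heat. It uses approximate room-temperature fluid properties and assumes temperature rises of 15 K for air and 10 K for water.

```python
# Minimal sketch: coolant flow needed to carry rack heat away, from Q = m_dot * c_p * dT.
# Assumptions: approximate room-temperature properties and hypothetical
# temperature rises (15 K for air, 10 K for water) across the rack.

AIR_DENSITY = 1.2        # kg/m^3
AIR_CP = 1005.0          # J/(kg*K)
WATER_DENSITY = 997.0    # kg/m^3
WATER_CP = 4186.0        # J/(kg*K)
M3S_TO_CFM = 2118.88     # cubic feet per minute in one m^3/s


def volumetric_flow_m3s(heat_w: float, density: float, cp: float, delta_t_k: float) -> float:
    """Volume flow (m^3/s) that absorbs heat_w while warming by delta_t_k."""
    mass_flow_kg_s = heat_w / (cp * delta_t_k)
    return mass_flow_kg_s / density


HEAT_W = 100_000.0  # hypothetical 100 kW rack; nearly all electrical input becomes heat

air = volumetric_flow_m3s(HEAT_W, AIR_DENSITY, AIR_CP, delta_t_k=15.0)
water = volumetric_flow_m3s(HEAT_W, WATER_DENSITY, WATER_CP, delta_t_k=10.0)

print(f"air:   {air:.2f} m^3/s (~{air * M3S_TO_CFM:,.0f} CFM)")
print(f"water: {water * 1000:.2f} L/s")
print(f"air needs ~{air / water:,.0f}x the volume flow of water")
```

For the same heat, air needs roughly 5.5 m³/s (on the order of 11,700 CFM), while water needs about 2.4 L/s. That is more than a thousandfold gap in volume flow.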

There’s also a safety hazard: supplying 100 kW or more to a single rack requires multiple high-capacity circuits or higher-voltage distribution, increasing complexity, cost, and fault risk at the rack and row level. AI designs are pushing toward hundreds of kW per rack (per the GPU power progression above, a single Rubin rack will draw 900 kW). At that point the rack becomes limited by what the site’s power and cooling backbone can safely deliver. Rack density, not silicon performance, becomes the hard ceiling on how far AI data centers can scale at a powered site.
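To see why 100 kW and up means multiple circuits, here is a sketch of per-phase current for a balanced three-phase feed, I = P / (√3 · V_LL · PF). It assumes 415V line-to-line, a 0.95 power factor, and a hypothetical 63 A per circuit with no derating. Real designs follow local electrical code and add safety margins.

```python
import math

# Minimal sketch: per-phase current for a balanced three-phase AC feed,
# I = P / (sqrt(3) * V_LL * PF), and how many circuits that implies.
# Assumptions (hypothetical): 415 V line-to-line, power factor 0.95,
# and 63 A per circuit with no derating; real designs follow local electrical code.

def three_phase_line_current(power_w: float, v_line_to_line: float, power_factor: float) -> float:
    return power_w / (math.sqrt(3) * v_line_to_line * power_factor)


def circuits_needed(power_w: float, v_ll: float, pf: float, amps_per_circuit: float) -> int:
    return math.ceil(three_phase_line_current(power_w, v_ll, pf) / amps_per_circuit)


for rack_kw in (20, 100, 900):
    amps = three_phase_line_current(rack_kw * 1000.0, 415.0, 0.95)
    n = circuits_needed(rack_kw * 1000.0, 415.0, 0.95, amps_per_circuit=63.0)
    print(f"{rack_kw:>4} kW rack: ~{amps:,.0f} A per phase -> {n} x 63 A circuit(s)")
```

With these assumptions, a 20 kW rack fits on one circuit, a 100 kW rack needs three, and a 900 kW rack needs 21. That is the kind of complexity that pushes designs toward higher-voltage distribution.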

Source: Vertiv

As the chart above illustrates, racks are being stacked with GPUs slated to require 1,000 kW (that’s 1 MW kek) of density, so rack design and materials need upgrading. Rack power density is forcing racks to become structurally stronger and thermally integrated, with different materials, geometries, and mounting features to dissipate and manage such intense power density. In rack city we throwin’ hundreds, hundreds (of kW of power).

Check Out Our Rack: A Visual Guide

We’re going to be talking about racks a lot, so let’s give you some high-level terminology and a good visual. The rack itself is a big, heavy metal box that holds all the various components: GPUs, CPUs, storage drives, networking gear, power systems, cooling systems, and more. The heat density of new racks requires the use of (expensive) steel that can hold up to high temperatures.

We will come back to the manifolds (cooling) and busbars (power) when we get to cooling and power distribution.

Once the rack is bolted to the floor with seismic bracing, the internals can be installed, usually in pre-built trays that slide in. Then all of the power cables, liquid cooling piping, ethernet cables, and more get plugged in. And then you hope it doesn’t melt, flood, or blow up. SICK.

Jensen with Nvidia’s beautiful Blackwell MGX reference architecture: GPUs (between his hands), CPUs, network and storage hardware, D2C cooling, and power cables.

OCP: Codifying Hardware Standards (and Moats)

Now you know racks. Let’s build on that. Most people think of Nvidia as a company that makes GPUs. In reality, Nvidia is building a tightly coupled hardware and software design ecosystem, reinforced through its contributions to the Open Compute Project (hereafter OCP). Nvidia uses OCP as the catalyst to turn its GPU “AI factory” designs into open reference standards for racks, power, cooling, and networking. That way, operators can build interoperable, repeatable data center architectures instead of bespoke one-offs.

Nvidia (and other OEMs) use OCP to codify data center architecture via their own standard reference implementation. The entire compute value chain - co-los hosting compute, OEMs making parts, and hyperscalers building their own vertically integrated operations - can follow the playbook rather than re‑invent the wheel. OEMs like Vertiv and SuperMicro follow Nvidia’s lead and then build compliant power, cooling, and rack ecosystems that further codify these standards across the industry. The evolution of datacenter systems is a spectacle of coordination between these behemoths.

Using an OEM reference architecture, like Nvidia’s, streamlines deployment because you inherit validated rack, power, networking, and cooling blueprints (which many compute customers are already used to). It also improves supply chain flexibility and resale or repurposing options, since multiple manufacturers target the same open specs rather than a closed, vendor-specific architecture (EXCUSE ME, I HAVE A COMPLICATED ORDER).

Cleverly, the concept of reference architecture also reinforces Nvidia’s entrenchment as the Arrakis of compute, ensuring its control over spice production. Standardization produces profound positive *and* negative externalities. These must be weighed carefully, especially as other chip makers vying for Nvidia’s crown promote a shift toward more heterogeneous standards. Is Nvidia House Harkonnen or House Atreides? We will leave it to you to decide. (We lean Atreides; Jensen, you are our Duke Leto.)

The End of the Beginning

If you’ve made it this far, we salute your bravery. That was just the warm-up. In this second report on “How to Build a Datacenter,” we explore the complex, layered regime shifts in datacenter power and cooling systems. We also point out where we see room for investment, or have already placed our chips, as early-stage investors, avid traders, shrewd lenders, and owners and operators of a growing Nvidia B300 HGX cluster.

Given its gravity, we focus most here on Nvidia’s announced move to 800V direct current power systems in its forthcoming reference architecture, and on the second-order implications:

new transformers (solid state transformers)

new materials (gallium anyone?)

supercapacitor and BESS systems alongside the traditional UPS system

streamlined (finally?) liquid cooling systems, and

how to monitor and maintain increasingly dense racks

Without further ado, join us on this wild ride. Yeehaw.

Power Systems and the Dawn of the 800V DC Era

First, let us walk through a datacenter power system. In this diagram, "server" stands in for the rack that holds the GPUs. Everything outside of it comprises the power system needed to feed electrons to the server rack.

Graphic design by @melt_dem

Let’s take each component of the power system from the top:

1. A utility company delivers high-voltage power, and a transformer steps it down from high to medium voltage for the site to consume. Hyperscale sites typically sit near high-voltage transmission lines, so they need on-site substations (containing transformers, switchgear, and other equipment), which are either self-built or sourced via the utility. Depending on size and voltage class, substations cost $500k to $50M.

2. The switchgear transfers medium voltage power to a second transformer adjacent to the data hall (i.e. the room full of server racks) to step down from medium to low voltage.

3. Alongside the transformer is a backup generator (diesel, baby) that the site taps into during a power outage. When that happens, the Automatic Transfer Switch (ATS) shifts the load from utility power to the generator.

4. From here, power distributes along dual paths:

Toward the cooling equipment (more on this later)

Toward the racks, as follows:

Power flows through a UPS (uninterruptible power supply) system. This is basically a backup system with 5 to 10 minutes of battery storage that kicks in instantly during power outages or fluctuations until the generator comes online.

From the UPS, power is supplied directly to the server racks via Power Distribution Units (PDUs), which is a fancy way of saying power strips on steroids, or growth hormones, or Chinese peptides. Make America strong again!

At long last, electricity from the PDU is delivered to the chip via power supply units (PSU) - a converter box that converts AC (alternating current) electricity into stable DC (direct current) - and voltage regulator modules (VRM) - a precision valve that fine tunes and stabilizes the voltage to feed CPUs and GPUs.
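The walkthrough above maps onto a simple model: end-to-end efficiency is the product of each stage's efficiency, so every extra conversion compounds the loss. This sketch compares a legacy AC chain with an 800V DC chain. All per-stage values are hypothetical placeholders; the 98% power-shelf figure is chosen to echo the >98% DC-DC efficiency cited in the brief. The VRM is omitted because both architectures keep it next to the chip.

```python
import math

# Minimal sketch: end-to-end efficiency of a power chain is the product of its stages.
# Every per-stage efficiency below is a hypothetical placeholder, not a vendor figure;
# the VRM step is left out because both architectures keep it at the chip.

def chain_efficiency(stages: dict[str, float]) -> float:
    return math.prod(stages.values())


legacy_ac = {
    "MV->LV transformer": 0.985,
    "double-conversion UPS (AC->DC->AC)": 0.94,
    "PDU and cabling": 0.99,
    "server PSU (AC->DC)": 0.94,
}
dc_800v = {
    "AC->800V DC conversion at the perimeter": 0.98,
    "800V DC busway": 0.995,
    "rack power shelf (800V->54V DC)": 0.98,
}

for name, chain in (("legacy AC", legacy_ac), ("800V DC", dc_800v)):
    eff = chain_efficiency(chain)
    print(f"{name:>9}: {eff:.1%} delivered, {1 - eff:.1%} lost over {len(chain)} stages")
```

With these placeholder numbers, the AC chain loses around 14% before the VRM and the DC chain loses under 5%. The point is the compounding, not the specific values.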

HOLY ACRONYMS. GOD BLESS DATA CENTER ENGINEERS.
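Sizing ride-through storage comes down to energy (power × time), and for supercapacitors also to power density. This sketch compares two cases. The first is a UPS carrying a hypothetical 1 MW load for 10 minutes, the upper end of the 5-10 minute window described above. The second is a supercapacitor bank absorbing a spike to 3x nominal that lasts a hypothetical 0.5 s. The cost ranges and the up-to-10 kW/kg power density are the figures quoted in the key details. Ignoring depth of discharge, efficiency, and aging is a simplifying assumption.

```python
# Minimal sketch: size ride-through storage by energy (P * t) and, for supercapacitors,
# by power density. Load and spike duration are hypothetical; the $/kWh ranges and the
# up-to-10 kW/kg power density are the figures quoted in the key details.

def bridge_energy_kwh(power_kw: float, seconds: float) -> float:
    """Energy needed to supply power_kw for a given number of seconds."""
    return power_kw * seconds / 3600.0


def cost_range(energy_kwh: float, usd_per_kwh: tuple[float, float]) -> tuple[float, float]:
    low, high = usd_per_kwh
    return energy_kwh * low, energy_kwh * high


LOAD_KW = 1000.0  # hypothetical 1 MW block of IT load

# UPS: carry the full load for 10 minutes until generators come online.
ups_kwh = bridge_energy_kwh(LOAD_KW, 10 * 60)
ups_cost = cost_range(ups_kwh, (271.0, 500.0))

# Supercapacitor: absorb a spike to 3x nominal (2x extra) lasting a hypothetical 0.5 s.
surge_kw = 2 * LOAD_KW
cap_kwh = bridge_energy_kwh(surge_kw, 0.5)
cap_cost = cost_range(cap_kwh, (2500.0, 10000.0))
cap_mass_kg = surge_kw / 10.0  # power-limited: at most 10 kW per kg, so this is a floor

print(f"UPS: {ups_kwh:.1f} kWh, ${ups_cost[0]:,.0f}-${ups_cost[1]:,.0f}")
print(f"supercap: {cap_kwh:.3f} kWh, ${cap_cost[0]:,.0f}-${cap_cost[1]:,.0f}, >= {cap_mass_kg:.0f} kg")
```

The UPS needs about 167 kWh and costs tens of thousands of dollars. The supercapacitor needs well under 1 kWh, so its high $/kWh matters far less than its power rating. At up to 10 kW/kg, that rating implies at least 200 kg for a 2 MW surge.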