
Disaggregating Power in Data Centers


As artificial intelligence (AI) continues to push the limits of computational performance, power consumption in data centers is reaching unprecedented levels. According to the latest Stanford AI Index Report, the largest AI models now exceed 1 trillion parameters and are trained on over 15 trillion tokens. Training these models can take up to 100 days, consume 38 billion petaFLOPs of compute, and cost as much as $192 million—with power usage surpassing 25 megawatts per training run.
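To put those figures in perspective, here is a rough back-of-envelope energy estimate using only the numbers cited above; the duration and power draw are the report's upper bounds, not measured values for any specific model:

```python
# Back-of-envelope scale check using the figures cited above (illustrative only).
TRAINING_DAYS = 100    # reported upper-bound training duration
AVG_POWER_MW = 25      # reported power draw per training run, in megawatts

hours = TRAINING_DAYS * 24
energy_gwh = AVG_POWER_MW * hours / 1000   # MW * h = MWh; /1000 -> GWh

print(f"Energy per training run: ~{energy_gwh:.0f} GWh")
# Roughly 60 GWh for a single run, which is why on-site generation such as
# nuclear is being considered at all.
```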

To meet these extreme demands, cloud giants like Amazon, Google, Meta, and Microsoft are turning to nuclear power for reliable energy at scale. Yet, supplying enough power is only part of the challenge. The real bottleneck lies within the server racks, where power electronics compete for space with processors, memory, and networking gear. As power density increases, so does the need for smarter power distribution strategies.

Rethinking Data Center Architecture for AI

According to Maury Wood, VP of Strategic Marketing at Vicor, the solution might be as simple as disaggregating compute and power infrastructure.

Data center architects are now focused on maximizing compute density—often measured in petaFLOPS per liter—within industry-standard racks. Higher compute density reduces the latency and bandwidth limitations that occur when processors are too far from memory or network interfaces. These latency and bandwidth bottlenecks are particularly problematic during large-scale AI training, which depends on fast, non-blocking all-to-all communications across multiple processors in a “superpod” or compute cluster.

Bringing components closer together enables the use of passive copper cables instead of optical transceivers, cutting both power consumption and costs. For example, a single 800G QSFP-DD or OSFP transceiver consumes about 15 W—and a supercomputer might use tens of thousands of these. Eliminating them can save up to 20 kW per rack.
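A quick sanity check of that estimate; the per-rack transceiver count below is a hypothetical value chosen to match the 20 kW figure, not a number from the article:

```python
# Sketch of the optics-power argument above. The per-rack module count is an
# illustrative assumption; real counts depend on network topology and radix.
TRANSCEIVER_POWER_W = 15       # typical 800G QSFP-DD / OSFP module power
MODULES_PER_RACK = 1300        # hypothetical count for a densely cabled rack

savings_kw = TRANSCEIVER_POWER_W * MODULES_PER_RACK / 1000
print(f"Optics power eliminated per rack: ~{savings_kw:.0f} kW")
# Roughly 20 kW per rack when passive copper replaces optical transceivers.
```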

(Figure: Data Center Power)

Liquid Cooling: A Key Enabler of Dense AI Compute

To support this density, data centers are moving away from air cooling and toward direct liquid cooling. In previous systems, air-cooled trays with 10 fans and large heatsinks could house only one GPU per rack unit (RU). Today, liquid-cooled systems use low-profile cold plates, fitting up to four GPUs per RU—a 4X increase in compute density.

Liquid cooling also:

  • Reduces acoustic noise
  • Cuts fan power consumption
  • Maintains lower processor temperatures
  • Improves mean time between failures (MTBF)
  • Enables higher processor clock speeds

All of these benefits translate to faster and more cost-effective AI training.

Power Distribution: Moving to ±400 V DC

Legacy rack designs that use three-phase 480-V AC require 30% of rack space for power conversion equipment: AC-DC rectifiers, DC-DC converters, battery backup units (BBUs), capacitors, and uninterruptible power supplies (UPS).

To reclaim this space, hyperscalers are now exploring ±400-V DC distribution directly to AI server racks. By relocating rectification and UPS systems outside the compute racks, it’s possible to pack more compute hardware inside.

(Figure: Data Center Power)

For example, a 48-RU rack could house 36 CPUs and 72 GPUs, delivering up to 720 petaFLOPS, or 0.5 petaFLOPS per liter. This architecture maximizes performance while minimizing costs and physical space.
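The density figure follows from simple geometry. A minimal sketch, assuming typical rack dimensions (roughly 600 mm wide, 1,070 mm deep, 44.5 mm per RU), since the article does not specify them:

```python
# Compute-density arithmetic for the example 48-RU rack. Rack dimensions are
# assumed typical values, not figures from the article.
RACK_PFLOPS = 720      # cited peak compute for the example rack
RU_HEIGHT_M = 0.0445   # ~44.5 mm per rack unit
RACK_WIDTH_M = 0.60    # assumed standard rack width
RACK_DEPTH_M = 1.07    # assumed deep rack for AI servers

volume_l = 48 * RU_HEIGHT_M * RACK_WIDTH_M * RACK_DEPTH_M * 1000
print(f"Rack volume: ~{volume_l:.0f} L")
print(f"Compute density: ~{RACK_PFLOPS / volume_l:.2f} petaFLOPS/L")
# With these assumptions the density comes out at roughly 0.5 petaFLOPS per liter.
```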

Efficiency Gains with DC Power

In conventional setups, BBU/UPS systems perform AC-DC conversion to charge batteries, and then DC-AC conversion to power servers—wasting energy in the process. A ±400-V DC system eliminates this dual conversion, reducing inefficiencies and hardware complexity.
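The efficiency argument can be sketched with round numbers. The stage efficiencies below are assumed for illustration, not measured values:

```python
# Illustrative comparison of conversion losses in the two architectures.
EFF_RECTIFIER = 0.96   # assumed AC-DC rectifier efficiency
EFF_INVERTER = 0.96    # assumed DC-AC inverter efficiency (UPS output stage)

double_conversion = EFF_RECTIFIER * EFF_INVERTER   # legacy UPS path: AC -> DC -> AC
direct_dc = EFF_RECTIFIER                          # single facility-level rectification to +/-400 V DC

print(f"Legacy double-conversion path: {double_conversion:.1%}")
print(f"Direct DC distribution:        {direct_dc:.1%}")
# Dropping the second stage recovers a few percent of facility power, which is
# substantial at tens of megawatts per data hall.
```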

Challenges of High-Voltage DC Distribution

However, 400-V DC isn’t considered Safety Extra Low Voltage (SELV), which introduces safety and regulatory hurdles. To future-proof for 800-V DC, racks may need three-conductor power feeds (−400 V, GND, +400 V), increasing cable complexity and cost.

(Figure: Data Center Power)

  • At 140 kW per rack, 400 V DC requires 350 A, needing 500 MCM copper cables (~$14/foot)
  • At 800 V DC, only 175 A is needed, so 3/0 AWG cables (~$5/foot) suffice

While 800-V systems are cheaper to wire, their ecosystem is less mature than that of 400-V DC. That is changing fast, thanks to the automotive industry's shift toward 800-V EV platforms.
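A minimal sketch of the arithmetic behind the cable comparison above; the run length is a hypothetical value, while the cost-per-foot figures are the article's estimates:

```python
# Current draw and cable cost per rack feed at the two candidate DC voltages.
RACK_POWER_KW = 140
RUN_LENGTH_FT = 50    # hypothetical cable run per rack

# (conductor, $/ft) pairs are the article's estimates for the required gauge.
options = {
    400: ("500 MCM", 14.0),
    800: ("3/0 AWG", 5.0),
}

for voltage_v, (gauge, cost_per_ft) in options.items():
    amps = RACK_POWER_KW * 1000 / voltage_v
    cost = cost_per_ft * RUN_LENGTH_FT
    print(f"{voltage_v} V DC: {amps:.0f} A over {gauge}, ~${cost:.0f} per {RUN_LENGTH_FT} ft feed")
```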

Managing Massive Currents

Inside the rack, converting 400 V DC to 50 V DC at 140 kW means managing 2,800 A of current. This requires large silver-plated copper busbars, which add cost and weight. One potential solution? Liquid cooling the busbars, leveraging the existing rack cooling infrastructure.

This strategy can:

  • Reduce cross-sectional busbar area by up to 5X
  • Lower resistance and power loss
  • Maintain a tighter voltage drop window
  • Reduce stress on point-of-load converters

However, connector design becomes critical at these currents to avoid thermal failures.
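A rough I²R sketch shows why the 2,800 A figure is the hard part; the busbar resistance below is an assumed illustrative value, not one from the article:

```python
# I^2*R loss and voltage drop on the 50 V intra-rack bus at full load.
RACK_POWER_W = 140_000
BUS_VOLTAGE_V = 50
BUSBAR_RESISTANCE_OHM = 50e-6   # assumed ~50 micro-ohm end-to-end busbar run

current_a = RACK_POWER_W / BUS_VOLTAGE_V        # ~2,800 A
loss_w = current_a**2 * BUSBAR_RESISTANCE_OHM   # conduction loss
drop_v = current_a * BUSBAR_RESISTANCE_OHM      # voltage drop along the bar

print(f"Bus current:     {current_a:.0f} A")
print(f"Conduction loss: ~{loss_w:.0f} W")
print(f"Voltage drop:    ~{drop_v:.2f} V")
# Liquid-cooling the busbar carries this heat away and keeps the copper cooler
# (lower resistivity), which is what permits the smaller cross-section noted above.
```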

Toward the Next Generation: ORv3 and High-Power Racks
#

Industry groups like the Open Compute Project (OCP) are tackling these challenges through specifications like Open Rack V3 (ORv3) and the High Power Rack (HPR). These standards aim to streamline power and thermal engineering for next-generation AI supercomputers.

High-density power modules with low thermal resistance and coplanar surfaces—optimized for liquid-cooled cold plates—will be essential. The future of AI data centers depends on precisely these kinds of innovations in power disaggregation and thermal management.
