
AI Factories Drive Global Battle Over Supercomputing Interconnects


The explosive growth of Artificial Intelligence (AI) and the pursuit of exascale supercomputing have triggered a new global infrastructure race. This battle isn’t just about processor cores—it’s about the interconnect networks that link them.

Often described as the “central nervous system” of modern supercomputers, the communication backbone has become the critical bottleneck for scaling performance.

“Interconnection networks are the centerpiece of HPC systems. AI is possible thanks to the significant advances in HPC in recent years,”
— Ramon Beivide, Universidad de Cantabria / Barcelona Supercomputing Center

As purpose-built AI Factories scale to hundreds of thousands of compute nodes, high-speed interconnects are the decisive enabler of future breakthroughs. The interconnect market, valued at $40.2 billion in 2024, is expected to expand rapidly with AI and machine learning driving infrastructure demand.

NVIDIA InfiniBand: The Integrated AI Factory

At the heart of this competition lies NVIDIA’s vertically integrated model, built on the InfiniBand standard. Long championed by Mellanox (acquired by NVIDIA in 2020 for $7B), InfiniBand remains the dominant HPC interconnect.

Why InfiniBand matters:

  • Designed for Remote Direct Memory Access (RDMA), enabling “zero-copy” networking
  • Prevents GPU idle time in distributed AI training
  • Offers In-Network Computing via SHARP, offloading data tasks to the network hardware
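The in-network computing idea behind SHARP, reducing data inside the switches rather than at the endpoints, can be illustrated with a toy simulation. This is a conceptual sketch, not NVIDIA's SHARP API; the function name and the `radix` parameter are illustrative assumptions:

```python
from typing import List, Tuple

def in_network_reduce(buffers: List[List[int]], radix: int = 2) -> Tuple[List[int], int]:
    """Toy model of switch-offloaded reduction: at every tree level,
    each 'switch' sums the buffers of up to `radix` children and
    forwards a single combined buffer upward, so no link ever carries
    more than one message per level."""
    level = [list(b) for b in buffers]
    levels = 0
    while len(level) > 1:
        level = [
            [sum(vals) for vals in zip(*level[i:i + radix])]
            for i in range(0, len(level), radix)
        ]
        levels += 1
    return level[0], levels

# Four nodes, each contributing a gradient buffer: the "switches"
# deliver the element-wise sum after two tree levels, instead of the
# root host receiving and summing four separate messages itself.
total, levels = in_network_reduce([[1, 2], [3, 4], [5, 6], [7, 8]])
```

The payoff grows with scale: the root's inbound traffic stays bounded by the switch radix rather than growing with the number of nodes.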

However, as AI scales to hundreds of thousands of endpoints, InfiniBand faces challenges in addressing capacity and scaling efficiency. Despite this, leaders like the Barcelona Supercomputing Center continue deep research collaborations with NVIDIA on next-gen InfiniBand.

Ultra Ethernet Consortium: The Open-Standards Alternative

To counter proprietary lock-in, the Ultra Ethernet Consortium (UEC) was formed in 2023 by AMD, Broadcom, Cisco, HPE, Intel, Meta, and Microsoft.

Their mission: evolve Ethernet into a true HPC and AI fabric.

Key advances include:

  • RDMA over Converged Ethernet (RoCE)
  • The new Ultra Ethernet Transport (UET) protocol, making RDMA a native Ethernet feature

Ethernet’s ubiquity and lower cost make it an attractive rival to InfiniBand. As Beivide points out, mixed-protocol environments in proprietary systems can be cumbersome, potentially tipping adoption toward Ultra Ethernet in future AI Factories.

Supercomputing Standards in Action

On the June 2025 TOP500 list, two interconnect standards dominate:

  • NVIDIA InfiniBand — powers many top commercial and international systems, including Spain’s MareNostrum 5 with InfiniBand NDR200.
  • HPE Slingshot-11 (Ethernet-based) — drives U.S. Department of Energy exascale leaders like El Capitan, Frontier, and Aurora.

Both solutions will coexist for now, shaping the near-term evolution of HPC networks.

Huawei UnifiedBus: A New Challenger

Huawei has entered the race with its UnifiedBus (UB) protocol, announced at Huawei Connect 2025 in Shanghai.

Key highlights:

  • SuperPoDs and SuperClusters built for AI scaling
  • New Ascend AI chips designed for sustainable high-performance computing
  • UnifiedBus protocol for ultra-low latency and massive scalability

Huawei envisions AI clusters with 10,000+ NPUs acting as a single computer, with its Atlas 950 SuperCluster integrating 520,000 NPUs — one of the most ambitious AI infrastructures ever planned.

The Future of the Fabric: Physics Sets the Limits

Regardless of whether the winner is InfiniBand, Ultra Ethernet, or UnifiedBus, all face the same physical challenges:

  • Minimizing latency through optimized network topologies
  • Maximizing throughput via multiple routing paths
  • Overcoming the “power wall” of electrical signaling
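To make the topology point concrete, here is a minimal sketch comparing the worst-case hop count (diameter) of a ring and a square 2D torus with the same node count. The numbers are pure graph geometry, not measurements of any vendor fabric:

```python
import math

def ring_diameter(n: int) -> int:
    """Worst-case hops between two nodes on a bidirectional ring of n nodes."""
    return n // 2

def torus2d_diameter(n: int) -> int:
    """Worst-case hops on a square bidirectional 2D torus of n nodes."""
    side = math.isqrt(n)
    assert side * side == n, "sketch assumes a square torus"
    # Up to side//2 hops in each of the two dimensions.
    return 2 * (side // 2)

# With 1,024 nodes, a ring needs up to 512 hops; a 32x32 torus needs 32.
ring_hops = ring_diameter(1024)
torus_hops = torus2d_diameter(1024)
```

The same reasoning drives the fat-tree and dragonfly topologies used in production fabrics: adding dimensions or hierarchy shrinks the worst-case path, and with it the latency floor.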

The industry is shifting toward Co-Packaged Optics (CPO), which promises a 3.5x reduction in power consumption. Broadcom calls CPO “essential for the next generation of AI networks,” suggesting that success at the physical layer may determine future leadership more than protocol dominance.
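The claimed 3.5x saving can be put in perspective with back-of-envelope arithmetic. The per-bit energy and port figures below are illustrative assumptions for the sake of the calculation, not Broadcom data:

```python
def fabric_optics_watts(ports: int, gbps_per_port: float, pj_per_bit: float) -> float:
    """Optical I/O power of a switch: bits per second times energy per bit."""
    return ports * gbps_per_port * 1e9 * pj_per_bit * 1e-12

# Assumed figures: 512 ports of 800 Gb/s at ~15 pJ/bit for pluggable optics.
pluggable_w = fabric_optics_watts(512, 800, 15.0)
cpo_w = fabric_optics_watts(512, 800, 15.0 / 3.5)  # 3.5x lower energy per bit
```

Under these assumptions a single switch's optics drop from several kilowatts to under two, and across the tens of thousands of switches in an AI Factory that difference compounds into megawatts.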

Conclusion: Interconnects Decide the AI Future

As AI Factories scale into the millions of endpoints, the race for interconnect dominance is intensifying. Proprietary InfiniBand, open Ultra Ethernet, and Huawei’s UnifiedBus each bring strengths and limitations.

The outcome will shape:

  • Exascale supercomputers
  • Global AI infrastructure
  • Geopolitical control over the next era of computing

One thing is clear: in the age of AI, the future of performance lies not just in the CPU or GPU, but in the fabric that connects them all.
