
AI Factories Drive Global Battle Over Supercomputing Interconnects


The explosive growth of Artificial Intelligence (AI) and the pursuit of exascale supercomputing have triggered a new global infrastructure race. This battle isn’t just about processor cores—it’s about the interconnect networks that link them.

Often described as the “central nervous system” of modern supercomputers, the communication backbone has become the critical bottleneck for scaling performance.

“Interconnection networks are the centerpiece of HPC systems. AI is possible thanks to the significant advances in HPC in recent years,”
— Ramon Beivide, Universidad de Cantabria / Barcelona Supercomputing Center

As purpose-built AI Factories scale to hundreds of thousands of compute nodes, high-speed interconnects are the decisive enabler of future breakthroughs. The interconnect market, valued at $40.2 billion in 2024, is expected to expand rapidly with AI and machine learning driving infrastructure demand.

NVIDIA InfiniBand: The Integrated AI Factory

At the heart of this competition lies NVIDIA’s vertically integrated model, built on the InfiniBand standard. Long championed by Mellanox (acquired by NVIDIA in 2020 for $7B), InfiniBand remains the dominant HPC interconnect.

Why InfiniBand matters:

  • Designed for Remote Direct Memory Access (RDMA), enabling “zero-copy” networking
  • Prevents GPU idle time in distributed AI training
  • Offers In-Network Computing via SHARP, offloading data tasks to the network hardware
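The in-network computing idea behind SHARP, reducing data inside the switches rather than at the endpoints, can be illustrated with a toy simulation. This is a conceptual sketch, not NVIDIA's SHARP API; the function name and the `radix` parameter are illustrative assumptions:

```python
from typing import List, Tuple

def in_network_reduce(buffers: List[List[int]], radix: int = 2) -> Tuple[List[int], int]:
    """Toy model of switch-offloaded reduction: at every tree level,
    each 'switch' sums the buffers of up to `radix` children and
    forwards a single combined buffer upward, so no link ever carries
    more than one message per level."""
    level = [list(b) for b in buffers]
    levels = 0
    while len(level) > 1:
        level = [
            [sum(vals) for vals in zip(*level[i:i + radix])]
            for i in range(0, len(level), radix)
        ]
        levels += 1
    return level[0], levels

# Four nodes, each contributing a gradient buffer: the "switches"
# deliver the element-wise sum after two tree levels, instead of the
# root host receiving and summing four separate messages itself.
total, levels = in_network_reduce([[1, 2], [3, 4], [5, 6], [7, 8]])
```

The payoff grows with scale: the root's inbound traffic stays bounded by the switch radix rather than growing with the number of nodes.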

However, as AI scales to hundreds of thousands of endpoints, InfiniBand faces challenges in addressing capacity and scaling efficiency. Despite this, leaders like the Barcelona Supercomputing Center continue deep research collaborations with NVIDIA on next-gen InfiniBand.

Ultra Ethernet Consortium: The Open-Standards Alternative

To counter proprietary lock-in, the Ultra Ethernet Consortium (UEC) was formed in 2023 by AMD, Broadcom, Cisco, HPE, Intel, Meta, and Microsoft.

Their mission: evolve Ethernet into a true HPC and AI fabric.

Key advances include:

  • RDMA over Converged Ethernet (RoCE)
  • The new Ultra Ethernet Transport (UET) protocol, making RDMA a native Ethernet feature

Ethernet’s ubiquity and lower cost make it an attractive rival to InfiniBand. As Beivide points out, mixed-protocol environments in proprietary systems can be cumbersome, potentially tipping adoption toward Ultra Ethernet in future AI Factories.

Supercomputing Standards in Action

On the June 2025 TOP500 list, two interconnect standards dominate:

  • NVIDIA InfiniBand — powers many top commercial and international systems, including Spain’s MareNostrum 5 with InfiniBand NDR200.
  • HPE Slingshot-11 (Ethernet-based) — drives U.S. Department of Energy exascale leaders like El Capitan, Frontier, and Aurora.

Both solutions will coexist for now, shaping the near-term evolution of HPC networks.

Huawei UnifiedBus: A New Challenger

Huawei has entered the race with its UnifiedBus (UB) protocol, announced at Huawei Connect 2025 in Shanghai.

Key highlights:

  • SuperPoDs and SuperClusters built for AI scaling
  • New Ascend AI chips designed for sustainable high-performance computing
  • UnifiedBus protocol for ultra-low latency and massive scalability

Huawei envisions AI clusters with 10,000+ NPUs acting as a single computer, with its Atlas 950 SuperCluster integrating 520,000 NPUs — one of the most ambitious AI infrastructures ever planned.

The Future of the Fabric: Physics Sets the Limits

Regardless of whether the winner is InfiniBand, Ultra Ethernet, or UnifiedBus, all face the same physical challenges:

  • Minimizing latency through optimized network topologies
  • Maximizing throughput via multiple routing paths
  • Overcoming the “power wall” of electrical signaling
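To make the topology point concrete, here is a minimal sketch comparing the worst-case hop count (diameter) of a ring and a square 2D torus with the same node count. The numbers are pure graph geometry, not measurements of any vendor fabric:

```python
import math

def ring_diameter(n: int) -> int:
    """Worst-case hops between two nodes on a bidirectional ring of n nodes."""
    return n // 2

def torus2d_diameter(n: int) -> int:
    """Worst-case hops on a square bidirectional 2D torus of n nodes."""
    side = math.isqrt(n)
    assert side * side == n, "sketch assumes a square torus"
    # Up to side//2 hops in each of the two dimensions.
    return 2 * (side // 2)

# With 1,024 nodes, a ring needs up to 512 hops; a 32x32 torus needs 32.
ring_hops = ring_diameter(1024)
torus_hops = torus2d_diameter(1024)
```

The same reasoning drives the fat-tree and dragonfly topologies used in production fabrics: adding dimensions or hierarchy shrinks the worst-case path, and with it the latency floor.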

The industry is shifting toward Co-Packaged Optics (CPO), which promises a 3.5x reduction in power consumption. Broadcom calls CPO “essential for the next generation of AI networks,” suggesting that success at the physical layer may determine future leadership more than protocol dominance.
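The claimed 3.5x saving can be put in perspective with back-of-envelope arithmetic. The per-bit energy and port figures below are illustrative assumptions for the sake of the calculation, not Broadcom data:

```python
def fabric_optics_watts(ports: int, gbps_per_port: float, pj_per_bit: float) -> float:
    """Optical I/O power of a switch: bits per second times energy per bit."""
    return ports * gbps_per_port * 1e9 * pj_per_bit * 1e-12

# Assumed figures: 512 ports of 800 Gb/s at ~15 pJ/bit for pluggable optics.
pluggable_w = fabric_optics_watts(512, 800, 15.0)
cpo_w = fabric_optics_watts(512, 800, 15.0 / 3.5)  # 3.5x lower energy per bit
```

Under these assumptions a single switch's optics drop from several kilowatts to under two, and across the tens of thousands of switches in an AI Factory that difference compounds into megawatts.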

Conclusion: Interconnects Decide the AI Future

As AI Factories scale into the millions of endpoints, the race for interconnect dominance is intensifying. Proprietary InfiniBand, open Ultra Ethernet, and Huawei’s UnifiedBus each bring strengths and limitations.

The outcome will shape:

  • Exascale supercomputers
  • Global AI infrastructure
  • Geopolitical control over the next era of computing

One thing is clear: in the age of AI, the future of performance lies not just in the CPU or GPU, but in the fabric that connects them all.
