As AI large language models (LLMs) evolve rapidly from unimodal to multimodal, their parameter scale is growing roughly tenfold per year, and leading industry models have entered the era of trillion-parameter and even ten-trillion-parameter scale. Correspondingly, there is an urgent demand for ultra-large-scale AI clusters, with hundred-thousand-card intelligent computing clusters gradually becoming the industry norm. For example, xAI’s data center for training Grok3, VAST, has deployed 200,000 NVIDIA H100/H200 GPUs. In this context, ultra-large-scale, ultra-high-throughput, ultra-high-reliability, and ultra-low-latency intelligent computing networks have become a key component of AI infrastructure. Intelligent computing networks comprise the following two types of networks:
- Scale-Out Network: Builds large-scale AI clusters through horizontal scaling; typically used to carry Data Parallel (DP) and Pipeline Parallel (PP) traffic.
- Scale-Up Network: Scales vertically beyond the limit of eight high-speed interconnected GPUs within a single server, delivering ultra-high-bandwidth, ultra-low-latency GPU interconnection over a wider range; typically used to carry Tensor Parallel (TP) and Expert Parallel (EP) traffic.
China Mobile Cloud’s New Intelligent Computing Network Architecture: An Open Intelligent Computing Network for Hundred-Thousand-Card Clusters
Through independent innovation, China Mobile Cloud has launched a new intelligent computing network architecture for hundred-thousand-card clusters, HPN1.0. Its core advantage lies in the openness of the architecture: in its technology roadmap, it chooses the more open Ethernet route, and in the selection and development of network equipment it likewise follows the principle of open disaggregation, developing intelligent computing switches based on the white-box ecosystem together with open-architecture supernodes. Only by adhering to openness in the network technology architecture can the strengths of the domestic and international industrial ecosystems be fully leveraged to create globally leading, cost-competitive AI network infrastructure.
(I) Scale-Out Network: Open Architecture Drives Technological Innovation
Based on the open Ethernet technology route and the white-box switch industry ecosystem, China Mobile Cloud has created a Scale-Out network architecture that meets the requirements of ultra-large scale, ultra-high throughput, ultra-high reliability, and ultra-low latency through multi-dimensional technological innovations in network protocols, network operating system software, and network design. This architecture has reached the leading level in the industry.
Ultra-Large Scale: Adopting a multi-rail, multi-plane, three-layer CLOS networking scheme. Intra-PoD communication uses a two-layer multi-rail and single-layer multi-plane design, and the Spine layer runs a 7:1 oversubscription ratio. A single PoD can accommodate up to 57,000 400G GPU cards. Compared to the intelligent computing network architecture of a leading domestic cloud vendor, touted as a benchmark for AI network architecture design, the GPU capacity within a single PoD is increased by nearly 4 times, minimizing cross-PoD communication traffic and significantly optimizing network-wide bandwidth utilization and inter-GPU communication latency. At the same time, the larger single-PoD scale makes it easier to support million-card clusters.
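For intuition, a back-of-the-envelope sizing along these lines is sketched below. The switch radix, leaf port split, rail count, and the way oversubscription is applied are assumptions chosen for the arithmetic, not published HPN1.0 design parameters.

```python
# Illustrative two-tier (leaf/spine), rail-optimised PoD sizing using
# 51.2 Tbps switches with 400G ports. All parameters are assumptions made
# for the arithmetic; they are not published HPN1.0 design values.

RADIX = 128            # 400G ports per 51.2 Tbps switch
LEAF_DOWNLINKS = 64    # assumed GPU-facing ports per leaf (1:1 at the leaf)
RAIL_PLANES = 8        # assumed rail planes (one per GPU position in a server)

def pod_gpu_ports(spine_oversub: float) -> int:
    """Approximate number of 400G GPU attachments a single PoD can offer."""
    # With k:1 oversubscription toward the core, a spine can devote
    # k/(k+1) of its radix to leaf-facing downlinks.
    spine_downlinks = int(RADIX * spine_oversub / (spine_oversub + 1))
    leaves_per_plane = spine_downlinks        # one link from each leaf to each spine
    return leaves_per_plane * LEAF_DOWNLINKS * RAIL_PLANES

print(pod_gpu_ports(1.0))  # non-blocking core:   32768
print(pod_gpu_ports(7.0))  # 7:1 toward the core: 57344 -- roughly the quoted 57,000
```

Under these assumptions, it is precisely the 7:1 spine oversubscription that lifts a single PoD from roughly 33,000 to roughly 57,000 400G GPU attachments.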
Ultra-High Throughput: The switches are China Mobile Cloud’s self-developed PanShi intelligent computing switches, built on the industry’s highest-performing 51.2T chip and supporting ultra-high access bandwidth of 3.2 Tbps per 8-card GPU server. Based on China Mobile Cloud’s independently innovated Full Adaptive Routing Ethernet (FARE) protocol (https://tools.ietf.org/html/draft-xu-idr-fare), jointly promoted with domestic and international industry partners in the IETF, bandwidth utilization reaches as high as 95%, about 1.6 times that of traditional Ethernet and on par with NVIDIA’s proprietary Spectrum-X adaptive-routing solution, reaching the top level in the industry.
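The intuition behind fully adaptive routing can be sketched as follows. This is a generic least-congested next-hop illustration, not the FARE mechanism specified in draft-xu-idr-fare.

```python
# Generic sketch of adaptive routing versus static ECMP. Static ECMP pins a
# flow to one equal-cost port by hashing its 5-tuple, so elephant flows can
# collide; adaptive routing steers each packet (or flowlet) to the currently
# least-loaded equal-cost port. Illustration only, not the FARE protocol.

import random
from dataclasses import dataclass

@dataclass
class Port:
    name: str
    queue_bytes: int = 0   # instantaneous congestion signal

def ecmp_pick(ports: list[Port], flow_hash: int) -> Port:
    """Static ECMP: the same flow always maps to the same port."""
    return ports[flow_hash % len(ports)]

def adaptive_pick(ports: list[Port]) -> Port:
    """Adaptive routing: choose the shallowest queue, breaking ties randomly."""
    least = min(p.queue_bytes for p in ports)
    return random.choice([p for p in ports if p.queue_bytes == least])

ports = [Port("eth1", 9000), Port("eth2", 0), Port("eth3", 4000)]
print(ecmp_pick(ports, flow_hash=0x5A).name)   # may land on a congested port
print(adaptive_pick(ports).name)               # eth2, the least-loaded port
```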
Ultra-High Reliability: To prevent single points of failure in the network from interrupting GPU training tasks, each GPU is served by a 2×200G RDMA Ethernet NIC with dual-plane redundant access, eliminating the single point of failure inherent in single-port access and achieving network availability of over 99.9%.
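A toy calculation showing why dual-plane access raises availability is given below; the per-plane availability values are illustrative assumptions, not measured figures.

```python
# Toy availability arithmetic for dual-plane access: if a single access plane
# is up with probability p and the two planes fail independently, at least one
# plane is up with probability 1 - (1 - p)^2. The p values are assumptions.

def dual_plane_availability(p: float) -> float:
    return 1 - (1 - p) ** 2

for p in (0.99, 0.995, 0.999):
    print(f"single plane {p:.3f}  ->  dual plane {dual_plane_availability(p):.6f}")
# 0.990 -> 0.999900,  0.995 -> 0.999975,  0.999 -> 0.999999
```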
Ultra-Low Latency: Through the combined application of traffic forwarding path optimization and precise flow control, end-to-end latency is kept within 10 microseconds, on par with the best level in the industry.
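A rough latency budget illustrating how a sub-10-microsecond target decomposes is sketched below; all per-component latencies are assumptions for illustration, not measured HPN1.0 values.

```python
# Rough end-to-end latency budget for a 3-hop leaf-spine-leaf path. All
# component latencies below are illustrative assumptions, not measurements.

NIC_NS = 1_000            # assumed RDMA NIC latency at each end
SWITCH_HOP_NS = 700       # assumed per-hop Ethernet switch forwarding latency
FIBER_NS_PER_M = 5        # ~5 ns per metre of fibre propagation
MTU_BYTES = 4096
SERIALIZATION_NS = MTU_BYTES * 8 / 400  # one 4 KB packet at 400 Gbps ~ 82 ns

hops, fibre_m = 3, 200
total_ns = 2 * NIC_NS + hops * SWITCH_HOP_NS + fibre_m * FIBER_NS_PER_M + SERIALIZATION_NS
print(f"{total_ns / 1000:.1f} us")  # ~5.2 us, leaving headroom under a 10 us budget
```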
(II) Scale-Up Network: Open Architecture Drives Rapid Commercial Implementation
The open Ethernet technology route is adopted to meet the two core demands of the Scale-Up network: ultra-high bandwidth and ultra-low latency. In terms of ultra-high bandwidth, Ethernet has obvious advantages: its SerDes rate doubles every two years, 112G SerDes is already in large-scale commercial deployment, and 224G SerDes is expected to be commercially available this year. Single-chip Ethernet switching capacity has likewise doubled roughly every two years for many years: 51.2 Tbps chips have been widely commercially available since last year, and 102.4 Tbps Ethernet chips are expected to reach commercial readiness this year. In contrast, the latest commercially available generation of PCIe is PCIe 5.0, with only 32 Gbps per lane; PCIe single-chip switching capacity doubles only every three years, and the current highest-performing PCIe 5.0 switching chip offers only 4.6 Tbps of switching capacity.
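The growth-rate gap described above can be made concrete with a short extrapolation. The starting capacities and doubling periods are the figures cited in this paragraph; the projection itself is only a rough sketch, not a product roadmap.

```python
# Extrapolating single-chip switching capacity from the figures cited above:
# Ethernet starting at 51.2 Tbps and doubling roughly every two years, PCIe
# starting at ~4.6 Tbps (highest-performing PCIe 5.0 switch) and doubling
# roughly every three years.

def capacity_tbps(start_tbps: float, years: float, doubling_period: float) -> float:
    return start_tbps * 2 ** (years / doubling_period)

for years in (0, 2, 4, 6):
    eth = capacity_tbps(51.2, years, 2)
    pcie = capacity_tbps(4.6, years, 3)
    print(f"+{years}y: Ethernet ~{eth:.0f} Tbps, PCIe ~{pcie:.1f} Tbps, gap ~{eth / pcie:.0f}x")
```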
In terms of ultra-low latency, the forwarding latency of mainstream 51.2 Tbps Ethernet chips is already below 400 nanoseconds, and with continued optimization of the forwarding pipeline it is expected to fall to around 300 nanoseconds, comparable to the low-latency level of NVSwitch. Beyond the technical advantages, another important reason for choosing the Ethernet route is the industrial ecosystem, especially the domestic one: PCIe switching chip technology is controlled by a very small number of foreign manufacturers, and its domestic industrial ecosystem is extremely weak compared with Ethernet’s.
The GPU clusters interconnected by the Scale-Up network form a “Super GPU,” also known as an AI supernode. In selecting and developing AI supernode products, China Mobile Cloud has worked with domestic and international manufacturers to develop supernode solutions based on a completely open architecture. The solution adopts a “building block” modular design, combining highly standardized GPU servers (such as 8-card air-cooled or liquid-cooled GPU servers) and intelligent computing switches (such as 51.2T air-cooled or liquid-cooled intelligent computing switches) from different manufacturers, interconnected through standard AEC active copper cables and optical fibers. The main configurations are as follows (a simple sizing sketch is given after the list):
- 64-card air-cooled supernode (expected to be commercially available in the second half of 2025): Consists of 2 computing cabinets (each containing 4 8-card air-cooled servers) and 1 switching cabinet (containing 4 51.2T air-cooled switches), interconnected by AEC active copper cables. Compared to AOC optical fiber solutions, it reduces power consumption and cost by more than 50%, supports 800GB/s inter-card bandwidth and hundred-nanosecond-level latency, and targets distributed AI inference clusters.
- 128-card liquid-cooled supernode (expected to be commercially available in the first half of 2026): Consists of 2 computing cabinets (each containing 8 8-card liquid-cooled servers) and 1 switching cabinet (containing 8 51.2T liquid-cooled or air-cooled switches), interconnected by AEC active copper cables. Compared to AOC optical fiber solutions, it reduces power consumption and cost by more than 50%, supports 800GB/s inter-card bandwidth and hundred-nanosecond-level latency, and targets both distributed AI inference clusters and large-scale AI training clusters.
- 1024-card liquid-cooled supernode (expected to be commercially available in the second half of 2026): Consists of 16 sets of 64-card air-cooled supernodes or 8 sets of 128-card liquid-cooled supernodes interconnected by optical fibers through second-tier switching cabinets. To support this second-tier interconnection, the number of switching ports in the first-tier switching cabinet of each 64-card or 128-card supernode must be doubled.
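As a simple sanity check on these building-block configurations, the sketch below tallies GPU counts against aggregate switching capacity. Reading the quoted 800 GB/s inter-card bandwidth as bidirectional (i.e. 3.2 Tbps of injection per direction per GPU) is an assumption made here for the comparison.

```python
# Building-block sizing check for the supernode configurations above. Cabinet
# contents follow the text; treating 800 GB/s inter-card bandwidth as
# bidirectional (3.2 Tbps per direction per GPU) is an assumption of this sketch.

SWITCH_TBPS = 51.2

def check(name: str, servers: int, switches: int) -> None:
    gpus = servers * 8                       # 8-card GPU servers
    need_tbps = gpus * 800 * 8 / 1000 / 2    # per-direction injection bandwidth
    have_tbps = switches * SWITCH_TBPS
    print(f"{name}: {gpus} GPUs need ~{need_tbps:.1f} Tbps, "
          f"{switches} switches offer {have_tbps:.1f} Tbps")

check("64-card air-cooled supernode", servers=8, switches=4)        # 204.8 vs 204.8
check("128-card liquid-cooled supernode", servers=16, switches=8)   # 409.6 vs 409.6
```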
To better understand the value that open-architecture supernodes bring to the industry, the following is a simple comparison between closed-architecture supernodes represented by NVIDIA NVL72 and the open-architecture supernodes pioneered by China Mobile Cloud:
(1) Closed-Architecture Supernodes Represented by NVIDIA NVL72:
- High R&D Costs: Products based on this architecture are highly customized, requiring tens of millions of yuan in R&D investment. For many companies intending to enter this field, committing such a large R&D outlay without clear orders poses a significant risk.
- High O&M Costs: The computing trays, switching trays, and cable trays of this architecture use proprietary plug-in interconnections, which are easily damaged and complex to troubleshoot, leading to high O&M costs.
- High Vendor Lock-in Risk: The system’s hardware and software are highly customized, so users face extremely high vendor lock-in risks, and subsequent technology upgrades and system expansion are subject to many vendor-imposed restrictions.
- Significant Power Consumption and Heat Dissipation Challenges: The power consumption of a single cabinet is as high as 120kW, requiring a matching liquid-cooling system and high-voltage power distribution retrofits. Only a very small number of existing data centers meet these adaptation conditions, which greatly limits its applicable scope.
(2) Open-Architecture Supernodes Jointly Developed by China Mobile Cloud and Industry Partners:
- Low R&D Costs: The hardware design is highly standardized, and R&D investment can be kept within the range of several million yuan. Compared to closed-architecture supernodes such as NVIDIA NVL72, R&D investment is reduced by an order of magnitude, significantly lowering the investment risk for enterprises.
- Low O&M Costs: Standard AEC copper cables connect the computing and switching nodes, making fault location simple and O&M costs low.
- Low Vendor Lock-in Risk: GPU servers and switches from different vendors can interconnect and interoperate. Phased expansion has no technical barriers, and vendor lock-in risk is almost zero, giving enterprises great flexibility of choice.
- Acceptable Power Consumption and Heat Dissipation: For a 64-card supernode, single-cabinet power consumption is in the range of 40-60kW. Data center retrofitting only requires upgrading the local power distribution system, and efficient air cooling can meet the heat dissipation requirements. Total cost of ownership (TCO) is reduced by more than 50% compared to closed-architecture supernodes such as NVL72.
Open-Architecture Supernodes Become the Focus of Industry Attention, Reshaping the Technological Landscape of AI Infrastructure
At the China Mobile Cloud Intelligent Computing Conference held from April 10th to 11th, the open-architecture supernode exhibition area attracted widespread attention with its “completely open hardware ecosystem + modular design” concept. Technical experts from domestic and international manufacturers conducted in-depth exchanges on its technical details, application scenarios, and commercialization paths, expressing high expectations for the first commercial deployment of 64-card supernodes in China.
With the accelerated popularization of cutting-edge AI inference model technologies such as Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT), supernodes for inference scenarios will usher in an explosive growth in market demand. Relying on the open Ethernet network technology route and adopting a highly standardized hardware combination across vendors, open-architecture supernodes provide flexible air-cooling and liquid-cooling options, enabling rapid response to the market demand for distributed AI inference clusters.
Summary
With the innovative practice of intelligent computing network architecture and the integration of global industrial ecosystem resources, China Mobile Cloud will reshape the technological landscape of AI infrastructure and promote AI infrastructure towards a more open future.