
NVIDIA Plans to Offer Next-Gen Rubin Accelerator Samples This September


Market intelligence indicates that NVIDIA will begin providing samples of its next-generation Rubin AI accelerator to customers as early as September this year. This comes just six months after the launch of Blackwell Ultra, marking an incredibly rapid pace of development. The Rubin R100 GPU and the new Vera CPU both leverage TSMC’s 3nm process, HBM4 memory, and a chiplet design, delivering comprehensive upgrades in performance, power efficiency, and architecture.

NVIDIA Rubin System

Rubin R100 GPU: A Leap in AI Acceleration

The Rubin R100 GPU is NVIDIA’s next AI accelerator after the Blackwell architecture, designed to meet the escalating computational demands of data centers. The R100 uses TSMC’s N3P (3nm performance-enhanced) process, which offers approximately 20% higher transistor density, 25%-30% lower power consumption, and 10%-15% better performance than the 4nm process used for the Blackwell B100. This process advance improves the R100’s performance per watt, giving it a significant edge in intensive AI training and inference tasks. The R100 also introduces a chiplet design for the first time, integrating multiple smaller dies to improve manufacturing yield and design flexibility. Its 4x reticle design (versus Blackwell’s 3.3x reticle) allows a larger total silicon area, enabling the integration of more compute units and memory interfaces.

In terms of memory, the R100 features 8 HBM4 stacks, totaling 288GB of capacity and delivering up to 13TB/s of bandwidth, a substantial improvement over the Blackwell B100’s HBM3E (approximately 8TB/s). HBM4 employs 12-layer or 16-layer stacking built from 24Gb or 32Gb DRAM dies; a 12-high stack of 24Gb dies yields 36GB per stack, which matches the 288GB total across 8 stacks and provides ample memory for large language models and complex AI inference. The R100 also uses TSMC’s CoWoS-L packaging technology, supporting a 100x100mm substrate that can accommodate up to 12 HBM4 stacks, laying the groundwork for future Rubin Ultra expansions. Its I/O die is built on the N5B (5nm enhanced) process, further optimizing data transfer efficiency.
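A quick back-of-envelope check of the memory figures above. All inputs come from this article; the per-stack values are derived estimates, not officially confirmed numbers:

```python
# HBM4 configuration reported for the Rubin R100 (per-stack figures are derived).
total_capacity_gb = 288      # total capacity across all HBM4 stacks
num_stacks = 8
total_bandwidth_tbs = 13.0   # aggregate HBM4 bandwidth, TB/s
b100_bandwidth_tbs = 8.0     # Blackwell B100 HBM3E bandwidth, per the article

per_stack_gb = total_capacity_gb / num_stacks           # 36 GB per stack
per_stack_tbs = total_bandwidth_tbs / num_stacks        # ~1.6 TB/s per stack
bandwidth_uplift = total_bandwidth_tbs / b100_bandwidth_tbs  # ~1.6x over B100

# Cross-check: a 12-high stack of 24Gb dies is 12 * 24 Gb = 288 Gb = 36 GB,
# consistent with the per-stack capacity derived above.
stack_from_dies_gb = 12 * 24 / 8

print(per_stack_gb, per_stack_tbs, stack_from_dies_gb)
```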


Vera CPU: Powering the AI Ecosystem

Complementing the Rubin GPU, the Vera CPU represents a comprehensive upgrade to the Grace CPU. Based on custom ARM Olympus cores, it boasts 88 cores and 176 threads, exceeding Grace’s 72 cores (144 threads). Vera’s memory bandwidth reaches 1.8TB/s, 2.4 times that of Grace, and its memory capacity is 4.2 times larger, significantly enhancing data processing capabilities. Vera connects with Rubin GPUs via NVLink-C2C high-speed interconnect, offering a bandwidth of 1.8TB/s to ensure efficient inter-chip communication. Its performance is approximately twice that of Grace, making it particularly well-suited for AI inference, data pre-processing, and multi-threading tasks. NVIDIA has optimized the ARM instruction set and microarchitecture to make Vera more adaptable to the backend demands of AI workloads.
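The Vera-versus-Grace ratios above imply a Grace baseline the article never states directly. A small sanity check (the Grace bandwidth here is back-calculated, not an official figure):

```python
# Vera CPU figures from the article; the Grace baseline is derived.
vera_bw_tbs = 1.8      # Vera memory bandwidth, TB/s
vera_bw_ratio = 2.4    # "2.4 times that of Grace"
vera_cores, grace_cores = 88, 72

implied_grace_bw = vera_bw_tbs / vera_bw_ratio   # 0.75 TB/s implied for Grace
core_uplift = vera_cores / grace_cores           # ~1.22x more cores

# Note: NVLink-C2C to the Rubin GPUs also runs at 1.8 TB/s, matching Vera's
# local memory bandwidth, so CPU-GPU transfers need not bottleneck DRAM-bound work.
print(implied_grace_bw, round(core_uplift, 2))
```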

Accelerated Roadmap and Future Platforms

Since announcing the Rubin architecture at Computex 2024, NVIDIA has consistently accelerated its product roadmap. The Rubin R100 is expected to enter mass production in Q4 2025, with related DGX and HGX systems deployed in H1 2026. In H2 2026, NVIDIA will launch the Vera Rubin NVL144 platform, comprising 144 Rubin GPUs and multiple Vera CPUs housed in liquid-cooled Oberon racks. Drawing 600kW, the platform will deliver 3.6 exaFLOPS of FP4 inference performance and 1.2 exaFLOPS of FP8 training performance, a 3.3x improvement over the Blackwell GB300 NVL72. In 2027, the Rubin Ultra NVL576 platform will feature 576 Rubin Ultra GPUs, each equipped with 16 HBM4e stacks and 1TB of memory, delivering 15 exaFLOPS of FP4 inference performance and 5 exaFLOPS of FP8 training performance, a 14x improvement over the GB300. This platform will also adopt NVLink 7 interconnect and ConnectX-9 NICs (1.6Tbps), vastly improving system scalability.
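The platform figures above can be cross-checked against each other. The per-GPU numbers and GB300 baselines below are derived from the article's stated totals and improvement factors, not official per-chip specs:

```python
# Platform totals from the article (FP4 inference, exaFLOPS).
nvl144_fp4_exa, nvl144_gpus = 3.6, 144
ultra_fp4_exa, ultra_gpus = 15.0, 576

# Derived per-GPU FP4 throughput, in petaFLOPS.
per_gpu_nvl144_pf = nvl144_fp4_exa * 1000 / nvl144_gpus  # 25 PFLOPS per Rubin GPU
per_gpu_ultra_pf = ultra_fp4_exa * 1000 / ultra_gpus     # ~26 PFLOPS per Rubin Ultra GPU

# Implied GB300 NVL72 baseline from each stated improvement factor; the two
# estimates should roughly agree if the article's ratios are self-consistent.
implied_gb300_a = nvl144_fp4_exa / 3.3   # ~1.09 exaFLOPS
implied_gb300_b = ultra_fp4_exa / 14     # ~1.07 exaFLOPS

print(per_gpu_nvl144_pf, round(per_gpu_ultra_pf, 1))
```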

To ensure the rapid rollout of Rubin, NVIDIA is deepening its collaboration with supply chain partners like TSMC and SK Hynix. TSMC plans to increase its CoWoS packaging capacity to 80,000 wafers per month by Q4 2025 to support demand for Rubin and Apple’s M5 SoC, among other products. SK Hynix completed the tape-out of HBM4 in October 2024, delivered 12-layer HBM4 samples to NVIDIA, and is poised for mass production in 2025. The Rubin GPU and Vera CPU completed tape-out at TSMC in June 2025, with trial production samples to be provided in September, and mass production commencing in early 2026.

Power Efficiency and Market Dominance

The surging power demands of data centers make energy efficiency a core design principle. The Rubin R100 reduces the energy consumed per operation through its 3nm process and HBM4 memory, complemented by liquid cooling and high-density racks for thermal management. Although the Vera Rubin NVL144 platform draws 600kW, its compute density means its performance per watt significantly surpasses previous generations. According to market analysis, the global AI data center market will reach $200 billion in 2025, with NVIDIA’s Blackwell and Rubin series expected to dominate. Hyperscalers such as Microsoft, Google, and Amazon have already pre-ordered Blackwell chips through the end of 2025, and the early arrival of Rubin will further solidify NVIDIA’s market leadership.
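Dividing the stated NVL144 throughput by its rack power gives a rough efficiency figure. This is a derived estimate from the article's numbers, not a published specification:

```python
# Vera Rubin NVL144 figures from the article.
nvl144_fp4_flops = 3.6e18   # 3.6 exaFLOPS FP4 inference
nvl144_power_w = 600e3      # 600 kW rack power

# Rack-level efficiency: ~6 TFLOPS of FP4 per watt.
fp4_flops_per_watt = nvl144_fp4_flops / nvl144_power_w
print(fp4_flops_per_watt / 1e12, "TFLOPS/W (FP4)")
```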


NVIDIA has already planned the Feynman architecture for 2028, continuing its tradition of naming chips after scientists. The successful deployment of Rubin and Vera will support emerging workloads such as AI inference, training, and agentic AI, driving AI technology toward artificial general intelligence. The sample delivery in September 2025 and mass production deployment in 2026 will enable NVIDIA to continue leading the global AI market, injecting powerful momentum into the future of data centers and AI applications.
