
NVIDIA Unveils Rubin CPX GPU with 128GB VRAM and Long-Context AI Inference Power



NVIDIA Rubin GPU Concept

NVIDIA has announced its Rubin CPX GPU, a next-generation AI accelerator with a staggering 128GB of GDDR7 VRAM. Built on the upcoming Rubin architecture, the CPX is engineered for long-context inference and agent-based AI workloads, pushing performance to new levels.

Although the Rubin CPX is currently a paper launch, the GPU is expected to officially debut in late 2026.

Key Specs of the Rubin CPX GPU

  • 128GB GDDR7 VRAM for large-scale inference workloads.
  • NVFP4 data precision delivering up to 30 PFlops peak compute.
  • Millions of tokens supported in long-context AI inference.
  • 3x faster attention performance vs. NVIDIA GB300 NVL72.
  • 4 NVENC + 4 NVDEC engines for video acceleration.

The Rubin CPX marks a major step toward GPUs designed specifically for AI inference rather than gaming or traditional HPC.
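To put the 128GB figure and million-token contexts in perspective, here is a rough, back-of-envelope KV-cache estimate. The model dimensions and FP8 cache precision below are illustrative assumptions, not published Rubin CPX or model specs; this is a sketch of the sizing logic, not a benchmark.

```python
# Back-of-envelope KV-cache sizing for long-context inference.
# All model dimensions here are illustrative assumptions, not
# published Rubin CPX (or any specific model's) figures.

def kv_cache_gb(context_tokens: int,
                num_layers: int = 80,      # assumed transformer depth
                num_kv_heads: int = 8,     # assumed GQA key/value heads
                head_dim: int = 128,       # assumed per-head dimension
                bytes_per_value: int = 1   # FP8 cache (use 2 for FP16)
                ) -> float:
    """GB of memory needed to hold keys and values for one sequence."""
    per_token = num_layers * num_kv_heads * head_dim * 2  # K and V
    return context_tokens * per_token * bytes_per_value / 1e9

for tokens in (128_000, 1_000_000, 4_000_000):
    print(f"{tokens:>9,} tokens -> ~{kv_cache_gb(tokens):6.1f} GB of KV cache")
```

Under these assumptions, a one-million-token context already needs on the order of 160GB of KV cache even at FP8, which is why per-GPU memory capacity (and sharding context across many GPUs) matters so much for long-context inference.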

Rubin and Vera: Tape-Out Completed at TSMC

NVIDIA confirmed that both the Rubin GPU and Vera CPU have successfully taped out at TSMC, meeting roadmap milestones. CFO Colette Kress noted that the Rubin platform includes:

  • Rubin GPU (successor to Blackwell).
  • Vera CPU (next-gen data center CPU).
  • CX9 SuperNIC for networking.
  • NVLink 144 / Spectrum-X switch chips.
  • Silicon photonics for integrated packaging.

Vera Rubin NVL144

Rubin GPUs will pair with HBM4 high-bandwidth memory (8-stack), starting with the R100 GPU on TSMC’s 3nm EUV process. An upgraded Rubin Ultra with 12-stack HBM4 is scheduled for 2027, with Rubin-based RTX 60 series GPUs expected for consumer markets.

The Rubin–Vera combo forms NVIDIA’s next-gen superchip ecosystem, offering 6th-gen NVLink with 3.6TB/s bandwidth and 1.6Tbps networking via CX9 NICs.

New AI Server Lineup: Vera Rubin NVL144

NVIDIA also introduced its next-generation AI servers, designed to scale Rubin and Vera into massive data center deployments.

Vera Rubin NVL144

  • 36 Vera CPUs + 144 Rubin GPUs per rack.
  • 1.4 PB/s HBM4 memory bandwidth.
  • Up to 75TB storage.
  • Delivers 3.5 EFlops at NVFP4 precision → 3.3x faster than GB300 NVL72.

Vera Rubin NVL144 CPX

  • Adds 144 Rubin CPX GPUs alongside the 144 Rubin GPUs and 36 Vera CPUs per rack.
  • 1.7 PB/s HBM4 bandwidth.
  • 100TB high-speed storage.
  • Supports Quantum-X800 InfiniBand or Spectrum-X Ethernet.
  • Peak performance: 8 EFlops → 7.5x boost over GB300 NVL72.

NVIDIA projects these servers could turn $100M in investment into $5B in returns for enterprises.
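As a quick sanity check, the quoted rack-level numbers are roughly self-consistent. The sketch below uses only figures from this article, and assumes the 8 EFlops CPX rack is approximately the base NVL144 compute plus the added CPX GPUs' own NVFP4 throughput:

```python
# Consistency check of the rack-level figures quoted above,
# using only numbers from this article (assumptions noted inline).

nvl144_eflops = 3.5        # Vera Rubin NVL144, NVFP4
nvl144_cpx_eflops = 8.0    # Vera Rubin NVL144 CPX, NVFP4
cpx_gpus_per_rack = 144    # Rubin CPX GPUs in the CPX rack
cpx_pflops_each = 30       # peak NVFP4 per Rubin CPX GPU

# Assumption: CPX rack ~= base NVL144 plus the CPX GPUs' own compute.
est_cpx_rack = nvl144_eflops + cpx_gpus_per_rack * cpx_pflops_each / 1000
print(f"Estimated CPX rack: {est_cpx_rack:.2f} EFlops (quoted: 8)")

# Both speedup claims imply roughly the same GB300 NVL72 baseline.
print(f"Implied GB300 NVL72: {nvl144_eflops / 3.3:.2f} EFlops (from the 3.3x claim)")
print(f"Implied GB300 NVL72: {nvl144_cpx_eflops / 7.5:.2f} EFlops (from the 7.5x claim)")
```

Both checks land within a few percent of the quoted figures, and the two speedup claims point at the same implied GB300 NVL72 baseline of a little over 1 EFlops at NVFP4.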

AMD Strikes Back with MI450 GPU

While NVIDIA dominates the AI GPU landscape, AMD is preparing its MI450 GPU, aiming to deliver “unbeatable leading AI performance.”

Forrest Norrod, AMD’s EVP of Data Center Solutions, stated that the MI450 will surpass both NVIDIA Blackwell and the upcoming Rubin GPUs.

Key points on the AMD MI450:

  • Built for training + inference + distributed inference.
  • Positioned as AMD’s “EPYC moment” for AI, echoing the success of its Zen-based CPUs.
  • Part of a unified UDNA architecture, designed to serve both AI and gaming GPUs.

If AMD delivers on its claims, the MI450 could challenge NVIDIA’s dominance and reshape competition in the AI accelerator market.

AMD GPU

Final Thoughts

The race for AI supremacy is accelerating:

  • NVIDIA Rubin CPX promises 128GB VRAM and unmatched long-context inference.
  • Rubin + Vera superchips are already in production pipelines at TSMC.
  • NVIDIA’s NVL144 servers bring multi-EFlop scale to AI clusters.
  • AMD MI450 seeks to disrupt NVIDIA’s momentum with “unbeatable” AI performance.

With Rubin set for a late 2026 launch, Rubin Ultra in 2027, and Feynman GPUs in 2028, the next three years will redefine the AI hardware landscape.
