
NVIDIA Blackwell Ultra: First GPU to Support PCIe 6.0


NVIDIA has officially unveiled Blackwell Ultra, the upgraded version of its Blackwell GPU architecture, which includes the B300 and GB300 models. Still aimed at AI workloads and high-performance computing (HPC), Blackwell Ultra pushes performance further with expanded HBM3E memory, fast NVLink interconnects, and, for the first time, native PCIe 6.0 support.

This GPU is set to launch later this year, ahead of NVIDIA’s next-generation “Rubin” architecture.

NVIDIA Blackwell Ultra


Key Specifications of NVIDIA Blackwell Ultra

  • Process technology: TSMC 4NP
  • Transistor count: 208 billion
  • Architecture: Dual-die design, connected with NV-HBI (10 TB/s bandwidth)
  • Compute units: 160 SMs (128 CUDA cores each)
  • Tensor cores: 640 (5th generation)
  • Cache: Shared, fully consistent L2 cache
  • Interconnect: NVLink Gen5 (1.8 TB/s)
  • GPU-CPU connection: NVLink-C2C (900 GB/s)

While these core specs remain unchanged from the original Blackwell, the major upgrade lies in PCIe 6.0 support.
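The totals implied by the spec list can be checked with a few lines of arithmetic. This is only a sketch: the constant names are illustrative, and the derived figures are computed from the numbers quoted above rather than taken from NVIDIA.

```python
# Figures quoted in the spec list above; the variable names are illustrative.
SM_COUNT = 160
CUDA_CORES_PER_SM = 128
TENSOR_CORES = 640

# Derived values (computed here, not quoted by the article).
total_cuda_cores = SM_COUNT * CUDA_CORES_PER_SM
tensor_cores_per_sm = TENSOR_CORES // SM_COUNT

print(f"Total CUDA cores: {total_cuda_cores}")        # 20480
print(f"Tensor cores per SM: {tensor_cores_per_sm}")  # 4
```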


First GPU With PCIe 6.0 Support

Blackwell Ultra is the first GPU to officially enable PCIe 6.0, which doubles the per-lane signaling rate from PCIe 5.0's 32 GT/s to 64 GT/s. The extra bandwidth speeds up communication with CPUs and other accelerators, which matters for AI training and data-intensive HPC applications.
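To put the doubling in concrete terms, the sketch below estimates raw one-direction link bandwidth for the two generations. It assumes an x16 link, which the article does not state, and it ignores encoding and protocol overhead, so real throughput will be lower. The function name `raw_gb_per_s` is hypothetical.

```python
# Per-lane signaling rates in GT/s, as defined by the PCIe specifications.
GT_PER_LANE = {"PCIe 5.0": 32, "PCIe 6.0": 64}


def raw_gb_per_s(gen: str, lanes: int = 16) -> float:
    """Raw one-direction bandwidth in GB/s, ignoring encoding/FLIT overhead.

    The x16 default is an assumption; the article does not give a link width.
    """
    return GT_PER_LANE[gen] * lanes / 8  # 8 bits per byte


gen5 = raw_gb_per_s("PCIe 5.0")
gen6 = raw_gb_per_s("PCIe 6.0")
print(f"PCIe 5.0 x16: {gen5:.0f} GB/s per direction")  # 64
print(f"PCIe 6.0 x16: {gen6:.0f} GB/s per direction")  # 128
print(f"Ratio: {gen6 / gen5:.1f}x")                    # 2.0x
```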

In addition, HBM3E memory has been expanded, and power consumption rises along with it:

  • Capacity increased from 192 GB → 288 GB
  • Bandwidth: 8 TB/s
  • Power consumption raised from 1,200 W → 1,400 W

According to NVIDIA, the original Blackwell design already had PCIe 6.0 capability, but it had not been activated until now.

NVIDIA Blackwell Ultra GPU


Performance Improvements

Two major performance boosts define the Blackwell Ultra upgrade:

  1. NVFP4 Dense Performance

    • Improved by 50%
    • Now delivers 15 PFLOPS (sparse performance remains at 20 PFLOPS)
  2. SFU (Special Function Unit) EX2 Acceleration

    • Attention acceleration throughput more than doubled
    • From 5 TF/s → 10.7 TF/s

Performance in other formats such as FP8, FP16, and TF32 remains largely unchanged.
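The two headline gains can be sanity-checked from the quoted figures. In this sketch the dense NVFP4 baseline is back-computed from the stated 50% improvement, so it is a derived value rather than a number the article quotes. The variable names are illustrative.

```python
# Figures quoted in the list above.
dense_nvfp4_pflops = 15.0   # Blackwell Ultra, dense NVFP4
dense_gain = 0.50           # stated improvement over the original Blackwell
sfu_before, sfu_after = 5.0, 10.7

# Derived: the baseline implied by "improved by 50%" (not quoted in the article).
implied_baseline_pflops = dense_nvfp4_pflops / (1 + dense_gain)

# Derived: the SFU EX2 speedup factor.
sfu_speedup = round(sfu_after / sfu_before, 2)

print(f"Implied dense NVFP4 baseline: {implied_baseline_pflops:.1f} PFLOPS")
print(f"SFU EX2 speedup: {sfu_speedup}x")
```

The SFU ratio comes out slightly above 2x, so "doubled" slightly understates the quoted figures.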



Deployment: GB300 NVL72 Server

The GB300 NVL72 will be the primary deployment platform for Blackwell Ultra.

  • Liquid-cooled rack design
  • Each node includes two B300 GPUs + one Grace CPU

The design targets dense, power-hungry deployments in large-scale AI training environments.
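The node layout above gives a rough picture of a full rack. The sketch below assumes the "72" in the product name is the number of GPUs per rack, which the article does not state explicitly. It combines that with the per-GPU HBM3E capacity and power figures quoted earlier. All names are illustrative.

```python
# ASSUMPTION: "72" in the product name means 72 GPUs per rack.
GPUS_PER_RACK = 72

# Quoted in the article.
GPUS_PER_NODE = 2
GRACE_CPUS_PER_NODE = 1
HBM3E_GB_PER_GPU = 288
WATTS_PER_GPU = 1400

nodes = GPUS_PER_RACK // GPUS_PER_NODE
grace_cpus = nodes * GRACE_CPUS_PER_NODE
total_hbm_gb = GPUS_PER_RACK * HBM3E_GB_PER_GPU
gpu_power_kw = GPUS_PER_RACK * WATTS_PER_GPU / 1000  # GPUs only, not the whole rack

print(f"Nodes: {nodes}, Grace CPUs: {grace_cpus}")
print(f"Total HBM3E: {total_hbm_gb} GB")
print(f"GPU power alone: {gpu_power_kw:.1f} kW")
```

Under that assumption the GPUs alone draw on the order of 100 kW per rack, which is consistent with the choice of liquid cooling.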



Jetson Thor Development Kit: Powered by T5000

Alongside Blackwell Ultra, NVIDIA also introduced the Jetson Thor development kit, aimed at developers working in robotics and industrial automation.

  • Form factor: Mini-PC
  • Module: Jetson T5000
  • CPU: Arm Neoverse-V3AE, 64-bit, 14 cores
  • Cache: 1MB L2 per core (14MB total) + 16MB shared L3
  • GPU: Blackwell-based, 2560 CUDA cores + 96 Tensor cores
  • Performance: FP4 sparse compute power of 2070 TFLOPS (a 70% improvement)

Compared to the lower-tier Jetson T4000 module, the T5000 offers:

  • 67% more CUDA cores (1536 → 2560)
  • 50% more Tensor cores (64 → 96)
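The percentages in the comparison, and the cache totals in the spec list, follow directly from the quoted counts. A minimal check, using a hypothetical helper named `uplift_pct`:

```python
def uplift_pct(old: int, new: int) -> int:
    """Percentage increase from old to new, rounded to the nearest whole percent."""
    return round((new - old) / old * 100)


# Core counts quoted in the article (T4000 → T5000).
cuda_uplift = uplift_pct(1536, 2560)
tensor_uplift = uplift_pct(64, 96)

# Cache figures from the T5000 spec list: 1 MB L2 per core, 14 cores, 16 MB shared L3.
l2_total_mb = 14 * 1
total_cache_mb = l2_total_mb + 16

print(f"CUDA cores: +{cuda_uplift}%")      # +67%
print(f"Tensor cores: +{tensor_uplift}%")  # +50%
print(f"L2 total: {l2_total_mb} MB, L2 + L3: {total_cache_mb} MB")
```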

The Jetson Thor Dev Kit is priced at $3,499 (~25,000 RMB), available for pre-order now, with shipments starting November 20.



Final Thoughts

With PCIe 6.0 support, HBM3E memory expanded to 288 GB, and 50% higher dense NVFP4 throughput, NVIDIA’s Blackwell Ultra marks a significant step forward for AI acceleration and HPC workloads.

Meanwhile, the Jetson Thor empowers developers with more compute power in robotics and edge AI. Together, these launches reinforce NVIDIA’s leadership in GPU innovation and AI infrastructure.
