
AMD Instinct MI350 Officially Announced: 185 Billion Transistors and 288GB HBM3e Memory


At the Hot Chips 2025 conference, AMD revealed the full specifications of its highly anticipated Instinct MI350 series GPU accelerators. Built on the advanced CDNA 4 architecture, the MI350 is engineered for large language model (LLM) training, AI inference, and high-performance computing (HPC).

With 185 billion transistors, 288GB of HBM3e memory, and industry-leading performance, the MI350 represents one of AMD’s most ambitious pushes yet to challenge NVIDIA’s dominance in AI computing.

*Figure: AMD MI350 specifications*


AMD MI350 Architecture: CDNA 4 and 3D Packaging

The MI350 is built using a 3D multi-chip module (MCM) design, combining TSMC’s N3P and N6 process technologies with CoWoS-S packaging for high-density interconnects.

Each GPU package integrates:

  • 8 Accelerator Complex Dies (XCDs) for compute
  • 2 I/O Dies (IODs) for Infinity Fabric interconnect and HBM3e controllers

This design enables massive parallelism and memory throughput, making it ideal for AI workloads that demand high bandwidth and fast interconnects.


Memory and Bandwidth: 288GB HBM3e + 8TB/s

One of the biggest upgrades in the MI350 series is its memory system:

  • 288GB of HBM3e memory
  • 8 TB/s memory bandwidth (up from 6 TB/s on the MI325X)
  • 36GB per stack, 12-Hi package design
  • 256MB Infinity Cache for reduced latency

This allows the MI350 to train massive AI models and process larger contexts for inference, a key bottleneck in today’s generative AI workloads.
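The quoted figures can be cross-checked with some quick arithmetic. A minimal sketch, noting that the stack count and per-stack bandwidth below are derived from the totals above, not separately published numbers:

```python
# Back-of-envelope check of the MI350 memory figures quoted above.
# Stack count and per-stack bandwidth are derived, not official specs.

TOTAL_CAPACITY_GB = 288     # HBM3e capacity per package
STACK_CAPACITY_GB = 36      # 12-Hi stack
TOTAL_BANDWIDTH_TBPS = 8.0  # aggregate memory bandwidth

stacks = TOTAL_CAPACITY_GB // STACK_CAPACITY_GB       # 288 / 36 = 8 stacks
per_stack_bw_gbps = TOTAL_BANDWIDTH_TBPS / stacks * 1000
layer_gb = STACK_CAPACITY_GB / 12                     # capacity per DRAM layer

print(f"{stacks} stacks, {per_stack_bw_gbps:.0f} GB/s each, {layer_gb:.0f} GB per layer")
```

The numbers are self-consistent: eight 36GB stacks give 288GB, with each stack contributing roughly 1 TB/s to the 8 TB/s total.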


Compute Performance: Up to 10 PFLOPS

The MI350 delivers record-breaking AI performance across precision formats:

  • 2.5 PFLOPS FP16/BF16 matrix compute
  • 5 PFLOPS FP8 compute
  • 10 PFLOPS using MXFP6/MXFP4 formats
  • 78.6 TFLOPS FP64 vector performance

At Hot Chips, AMD highlighted that the MI355X variant achieved up to 35x throughput gains over the MI300X in Llama 3.1 405B inference, underlining its AI-first optimization.
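Notice the pattern in the matrix throughput figures: each halving of the data format's width doubles the peak rate. A quick sanity check of the quoted ladder:

```python
# The precision ladder above doubles peak matrix throughput each time
# the data format halves in width, anchored at the FP16/BF16 rate.

FP16_PFLOPS = 2.5

ladder = {
    "FP16/BF16":   FP16_PFLOPS,       # 16-bit baseline
    "FP8":         FP16_PFLOPS * 2,   # half the bits, double the rate
    "MXFP6/MXFP4": FP16_PFLOPS * 4,   # quarter-width microscaling formats
}

for fmt, pflops in ladder.items():
    print(f"{fmt}: {pflops} PFLOPS")
```

This matches the 2.5 / 5 / 10 PFLOPS figures above, and explains why low-bit microscaling formats are where the headline number comes from.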


Interconnect and Scalability

The MI350 series uses 4th-gen Infinity Fabric, delivering:

  • 1075 GB/s aggregate bandwidth per card
  • Up to 8-card interconnect with ~20% higher communication rate

Form factors:

  • MI350X (air-cooled) – 1000W TDP, fits in 10U racks
  • MI355X (liquid-cooled) – 1400W TDP, high-density 5U racks

In a standard cluster, a single rack can provide 80 PFLOPS of FP8 compute power and 2.25TB of HBM3e memory, positioning AMD as a serious player in large-scale AI clusters.
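The rack-level claim follows directly from the per-card figures. A minimal sketch, assuming the 80 PFLOPS figure counts 10 PFLOPS of FP8 per card (i.e., the 5 PFLOPS dense rate doubled with structured sparsity, which is an assumption made here to reconcile the numbers):

```python
# Rack-level aggregates implied by the per-card figures above.
# FP8_PER_CARD assumes 5 PFLOPS dense x2 with sparsity -- an assumption
# to reconcile the 80 PFLOPS rack claim, not a published spec.

CARDS_PER_RACK = 8
HBM_PER_CARD_GB = 288
FP8_PER_CARD_PFLOPS = 10

rack_hbm_tb = CARDS_PER_RACK * HBM_PER_CARD_GB / 1024   # GB -> TB (binary)
rack_fp8_pflops = CARDS_PER_RACK * FP8_PER_CARD_PFLOPS

print(f"{rack_hbm_tb} TB HBM3e, {rack_fp8_pflops} PFLOPS FP8 per rack")
```

Eight cards at 288GB each gives exactly 2.25TB, matching the rack figure quoted above.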


AMD vs. NVIDIA: Competitive Advantage

AMD claims several advantages over NVIDIA’s latest GPUs:

  • 1.6x memory capacity vs. NVIDIA GB200
  • 2x FP64 performance for HPC workloads
  • Comparable FP8/FP16 throughput
  • Flexible multi-instance GPU partitioning, allowing multiple 70B parameter models to run on one card

This gives AMD a unique edge in AI inference efficiency and HPC double-precision tasks, areas where NVIDIA has traditionally dominated.
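The multi-instance claim is plausible on memory grounds alone. An illustrative sizing sketch (weights only; KV cache and activation overheads, which are workload-dependent, are deliberately ignored):

```python
# Illustrative sizing for the multi-instance claim above: how many
# 70B-parameter model replicas fit in 288GB of HBM3e at different
# weight precisions. KV cache and activation overheads are ignored.

HBM_GB = 288
PARAMS_BILLIONS = 70

for bits, name in [(16, "FP16"), (8, "FP8"), (4, "FP4")]:
    weights_gb = PARAMS_BILLIONS * bits / 8   # 1B params ~ bits/8 GB
    replicas = int(HBM_GB // weights_gb)
    print(f"{name}: ~{weights_gb:.0f} GB weights -> {replicas} replicas")
```

At FP8, a 70B model's weights occupy roughly 70GB, so several replicas fit in 288GB before accounting for runtime overheads.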

*Figure: AMD Instinct roadmap*


Availability and Roadmap

The AMD Instinct MI350 will ship to partners and hyperscale data centers in Q3 2025.

Looking ahead, AMD confirmed that the Instinct MI400 series is already in development, targeting a 2026 release, reinforcing AMD’s commitment to an annual accelerator refresh cycle to keep pace with generative AI’s exponential growth.

*Figure: AMD Instinct GPU chiplets*


Conclusion: A New AI Powerhouse from AMD

The AMD Instinct MI350 series is not just another GPU accelerator—it’s a strategic leap in memory capacity, bandwidth scaling, and AI-optimized compute performance.

With 288GB of HBM3e, up to 10 PFLOPS performance, and scalable Infinity Fabric interconnects, the MI350 positions AMD as a strong rival to NVIDIA in the AI and HPC space.

As the race for generative AI dominance intensifies, AMD’s MI350 launch signals that the battle for AI datacenters is far from over.
