According to media reports, AMD is set to release its new Instinct MI350 series AI accelerators this Thursday. This product marks another significant advancement for the company in the field of artificial intelligence hardware. The new series, based on TSMC’s 3nm process and AMD’s latest CDNA 4 architecture, promises exceptional AI computing performance, positioning it in direct competition with NVIDIA’s Blackwell series. The MI350 series not only achieves breakthroughs in hardware specifications but also enhances AI application compatibility and efficiency through an optimized ROCm software ecosystem, providing powerful support for data centers and hyperscale AI computing.
MI350 Series: Hardware Highlights #
A core highlight of the MI350 series is its high-performance hardware configuration. A single card features up to 288GB of HBM3E memory with a memory bandwidth of 8TB/s, a 12.5% increase in capacity and a 33.3% increase in bandwidth over its predecessor, the MI325X (256GB, 6TB/s). In terms of compute, the MI350 series supports a range of floating-point precision formats, including FP16, FP8, FP6, and FP4: FP16 performance reaches 18.5 PFlops, FP8 reaches 37 PFlops, and FP6/FP4 reaches 74 PFlops. Compared to the MI300X, that is roughly a 7.4x improvement in FP16 performance, and the maximum model size the series can handle has grown from 714 billion to 4.2 trillion parameters, nearly a sixfold increase. These specifications allow it to efficiently serve the training and inference demands of trillion-parameter large language models and mixture-of-experts models.
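The headline generational ratios quoted above can be sanity-checked with quick arithmetic; all figures below are taken directly from this article:

```python
# Sanity-check the generation-over-generation ratios quoted above.
# All figures come from this article; units are GB, TB/s, and parameters.
mi325x = {"memory_gb": 256, "bandwidth_tbs": 6}
mi350 = {"memory_gb": 288, "bandwidth_tbs": 8}

mem_gain = mi350["memory_gb"] / mi325x["memory_gb"] - 1
bw_gain = mi350["bandwidth_tbs"] / mi325x["bandwidth_tbs"] - 1
print(f"memory: +{mem_gain:.1%}, bandwidth: +{bw_gain:.1%}")
# -> memory: +12.5%, bandwidth: +33.3%

# Supported model size: 714 billion -> 4.2 trillion parameters
print(f"model capacity: {4.2e12 / 714e9:.2f}x")
# -> model capacity: 5.88x (the article's "nearly six times")
```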
CDNA 4 Architecture and Software Ecosystem Advancements #
The CDNA 4 architecture is key to the MI350 series’ performance leap. Compared to the CDNA 3-based MI325X, CDNA 4 adds support for the FP4 and FP6 low-precision data types, significantly reducing computational cost, particularly for large-model quantization and inference. The 3nm process further raises transistor density and power efficiency; estimated single-card power consumption exceeds 1000W, in the same range as NVIDIA’s B200 (1000W) and below the GB200 (1700W). The MI350 series also uses advanced packaging and supports eight-card configurations on a single platform, for a total memory capacity of up to 2.3TB and aggregate bandwidth of up to 64TB/s, ample resources for hyperscale AI workloads.
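The eight-card platform totals follow directly from the single-card specs quoted above; a quick sketch (decimal TB):

```python
# Aggregate memory figures for an eight-card MI350 platform,
# derived from the single-card specs quoted above (decimal units).
CARDS = 8
HBM_PER_CARD_GB = 288   # HBM3E per card
BW_PER_CARD_TBS = 8     # memory bandwidth per card

total_memory_tb = CARDS * HBM_PER_CARD_GB / 1000  # 2.304 TB, the "up to 2.3TB" above
total_bw_tbs = CARDS * BW_PER_CARD_TBS            # 64 TB/s aggregate
print(f"{total_memory_tb:.3f} TB HBM3E, {total_bw_tbs} TB/s")
# -> 2.304 TB HBM3E, 64 TB/s
```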
On the software front, AMD continues to optimize its ROCm open software stack to support the MI350 series. The latest version, ROCm 6.2, delivers a 2.4x improvement in inference performance and a 1.8x improvement in training performance over ROCm 6.0, with support for technologies such as the FP8 data type, Flash Attention 3, and kernel fusion. AMD works with the open-source community to integrate mainstream AI frameworks such as PyTorch, Triton, and ONNX into ROCm, so the MI350 series can run popular generative AI models like Stable Diffusion 3 and Llama 3.1, as well as the millions of models on the Hugging Face platform. This progress narrows the gap with NVIDIA’s CUDA ecosystem and gives developers a more flexible development environment.
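As a minimal sketch of what this PyTorch-on-ROCm integration looks like in practice: PyTorch’s ROCm builds expose AMD GPUs through the same `torch.cuda` API used on NVIDIA hardware, so typical CUDA-style code runs unchanged (the example below falls back to CPU when no accelerator is present):

```python
# Minimal sketch, assuming a PyTorch build with ROCm support: AMD GPUs are
# exposed through the familiar torch.cuda API, so CUDA-style code runs as-is.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" also covers ROCm builds
dtype = torch.float16 if device == "cuda" else torch.float32  # FP16 on the accelerator

a = torch.randn(1024, 1024, device=device, dtype=dtype)
b = torch.randn(1024, 1024, device=device, dtype=dtype)
c = a @ b  # matmul dispatched to rocBLAS on ROCm, cuBLAS on CUDA, BLAS on CPU

print(device, tuple(c.shape))
```

On a ROCm system, `torch.version.hip` is set instead of `torch.version.cuda`, which is one way to confirm which backend the build targets.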
Strategic Positioning and Future Outlook #
The launch of the MI350 series is not just a hardware upgrade; it also reflects AMD’s strategic positioning in the AI market. Against NVIDIA’s Blackwell B200 (192GB HBM3E, 8TB/s bandwidth), the MI350 series leads in memory capacity by 50% with comparable bandwidth, and AMD claims roughly a 35x generational improvement in inference performance over its CDNA 3-based predecessors, positioning the series between Blackwell and Blackwell Ultra. AMD CTO Mark Papermaster stated in his ISC25 keynote that the MI350 series, through architectural and packaging innovation, is expected to meet the company’s goal of a 30x improvement in energy efficiency by 2025. That goal rests on the low-power characteristics of the 3nm process and CDNA 4’s optimizations for low-precision computing, giving the MI350 a higher performance-per-watt ratio in high-performance computing (HPC) and AI training.
The MI350 series is expected to officially launch in the second half of 2025, with initial products including the MI355X accelerator, which partners such as Dell, Lenovo, and HPE will integrate into server platforms. AMD also plans to introduce the CDNA 5-based MI400 series in 2026, further raising performance and efficiency. AMD’s Instinct accelerators are already deployed across data center and HPC customers, and the release of the MI350 series will further strengthen AMD’s competitiveness in the data center AI market.
However, AMD still faces challenges in the AI hardware sector. Supply constraints on HBM3E memory may limit initial MI350 production capacity. On delivery, AMD’s lead time is reportedly around 26 weeks versus more than 52 weeks for NVIDIA, a gap that gives AMD a selling point while underscoring the intense market demand for high-performance AI chips. And while the ROCm ecosystem is maturing rapidly, it still trails CUDA in end-to-end functionality. AMD is working with over 100 AI application developers to accelerate ecosystem development, though it is too early to judge the actual impact.
The launch of the MI350 series is a crucial step for AMD in the AI hardware competition. The combination of the 3nm process, CDNA 4 architecture, and 288GB HBM3E memory provides powerful support for processing hyperscale AI models, while the continuous optimization of the ROCm ecosystem offers developers a flexible software environment. Compared to its predecessors, the MI350 series achieves a leap in performance, efficiency, and model processing capability. The competition with NVIDIA’s Blackwell series will also drive technological advancements in the AI hardware market. In the future, AMD’s annual product roadmap and continuous architectural innovation will further solidify its position in AI computing, bringing more high-performance, cost-effective solutions to the industry.