AMD RDNA 3 GPU Architecture Deep Dive: The Ryzen Moment for GPUs

On November 3, AMD revealed key details of its upcoming RDNA 3 GPU architecture and the Radeon RX 7900-series graphics cards. It was a public announcement that the whole world was invited to watch. Shortly after the announcement, AMD took press and analysts behind closed doors to dig a little deeper into what makes RDNA 3 tick — or is it tock? No matter.

We’re allowed to talk about the additional RDNA 3 details and other briefings AMD provided now, which almost certainly has nothing to do with Nvidia’s impending launch of the RTX 4080 on Wednesday. (That’s sarcasm, just in case it wasn’t clear. This sort of thing happens all the time with AMD and Nvidia, or AMD and Intel, or even Intel and Nvidia now that Team Blue has joined the GPU race.)

RDNA 3 and GPU Chiplets

Navi 31 consists of two core pieces, the Graphics Compute Die (GCD) and the Memory Cache Dies (MCDs). There are similarities to what AMD has done with its Zen 2/3/4 CPUs, but everything has been adapted to fit the needs of the graphics world.

For Zen 2 and later CPUs, AMD uses an Input/Output Die (IOD) that connects to system memory and provides all of the necessary functionality for things like the PCI Express interface, USB ports, and more recently (Zen 4) graphics and video functionality. The IOD then connects to one or more Core Compute Dies (CCDs, or alternatively “Core Complex Dies,” depending on the day of the week) via AMD’s Infinity Fabric, and the CCDs contain the CPU cores, cache, and other elements.

A key point in the design is that the working sets of typical general computing algorithms (the stuff that runs on the CPU cores) will mostly fit within the various L1/L2/L3 caches. That's why mainstream desktop CPUs up through Zen 4 get by with only two 64-bit memory channels for system RAM.

The CCDs are small, and the IOD can range from around 125mm^2 (Ryzen 3000) to as large as 416mm^2 (EPYC xxx2 generation). Most recently, the Zen 4 Ryzen 7000-series CPUs have an IOD made using TSMC N6 that measures just 122mm^2 with one or two 70mm^2 CCDs manufactured on TSMC N5, while the EPYC xxx4 generation uses the same CCDs but with a relatively massive IOD measuring 396mm^2 (still made on TSMC N6).

GPUs have very different requirements. Large caches can help, but GPUs also really like having gobs of memory bandwidth to feed all the GPU cores. For example, even the beastly EPYC 9654 with a 12-channel DDR5 configuration ‘only’ delivers up to 460.8 GB/s of bandwidth. The fastest graphics cards like the RTX 4090 can easily double that.
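To put numbers on that gap, here's a rough back-of-the-envelope sketch of how peak memory bandwidth falls out of bus width and transfer rate. The memory speeds are our assumptions rather than figures from AMD's briefing: DDR5-4800 for the EPYC 9654, and a 384-bit bus with 21 Gbps GDDR6X for the RTX 4090. The helper name is purely illustrative.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mtps / 1000  # MB/s -> GB/s


# Assumption: twelve 64-bit DDR5 channels running at 4800 MT/s.
epyc_9654 = peak_bandwidth_gbs(12 * 64, 4800)

# Assumption: 384-bit GDDR6X interface at 21 Gbps per pin (21000 MT/s).
rtx_4090 = peak_bandwidth_gbs(384, 21000)

print(f"EPYC 9654: {epyc_9654:.1f} GB/s")
print(f"RTX 4090:  {rtx_4090:.1f} GB/s")
print(f"Ratio:     {rtx_4090 / epyc_9654:.2f}x")
```

Under those assumptions the EPYC result matches the 460.8 GB/s quoted above, and the GPU lands at a bit more than twice that, despite having half the bus width.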

In other words, AMD needed to do something different for GPU chiplets to work effectively. The solution ends up being almost the reverse of the CPU chiplets, with memory controllers and cache being placed on multiple smaller dies while the main compute functionality resides in the central GCD chiplet.

The GCD houses all the Compute Units (CUs) along with other core functionality like video codec hardware, display interfaces, and the PCIe connection. The Navi 31 GCD has up to 96 CUs, which is where the typical graphics processing occurs. But it also has Infinity Fabric links along the top and bottom edges (connected via some sort of bus to the rest of the chip) that in turn connect to the MCDs.

The MCDs, as the name implies (Memory Cache Dies), primarily contain the large L3 cache blocks (Infinity Cache), plus the physical GDDR6 memory interface. They also need to contain Infinity Fabric links to connect to the GCD, which you can see in the die shot along the center-facing edge of the MCDs.

The GCD uses TSMC’s N5 node and packs 45.7 billion transistors into a 300mm^2 die. The MCDs, meanwhile, are built on TSMC’s N6 node, each packing 2.05 billion transistors into a die that’s only 37mm^2 in size. Cache and external interfaces are some of the elements of modern processors that scale the worst, and we can see that overall the GCD averages 152.3 million transistors per mm^2, while the MCDs only average 55.4 million transistors per mm^2.
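Those density figures are simple division, and it's easy to check them yourself. The short sketch below uses only the transistor counts and die areas quoted above; the function and variable names are ours, chosen for illustration.

```python
def density_mtr_per_mm2(transistors_billions: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_billions * 1000 / area_mm2


gcd_density = density_mtr_per_mm2(45.7, 300)  # GCD on TSMC N5
mcd_density = density_mtr_per_mm2(2.05, 37)   # one MCD on TSMC N6

print(f"GCD: {gcd_density:.1f} MTr/mm^2")
print(f"MCD: {mcd_density:.1f} MTr/mm^2")
print(f"GCD is {gcd_density / mcd_density:.2f}x denser than an MCD")
```

The gap reflects more than the node difference between N5 and N6: the MCDs are dominated by cache and memory interface circuitry, which is the part of the design that scales worst.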

