AMD CEO Lisa Su unveiled the first details about the company’s EPYC Milan-X processors, which come with a 3D-stacked L3 cache called 3D V-Cache, during its Accelerated Data Center event today. AMD says that its new cache-stacking technology, which it will add to the existing Zen 3-powered EPYC Milan models to create the new Milan-X chips, will bring up to 768MB of total L3 cache per chip. That means there will soon be dual-socket servers with an eye-popping 1.5GB of L3 cache in the system. AMD also shared a few examples of workloads that will benefit, and an impressive benchmark result that shows a 66% performance improvement.
The chips will come to market in Q1 2022, but they are available as a preview instance in Azure now. Microsoft has also released its own performance projections, which we cover at the end of this article.
As a quick refresher, AMD introduced its 3D V-Cache technology at Computex 2021, showing a Zen 3-based Ryzen prototype outfitted with an additional chunk of L3 cache. 3D V-Cache uses a novel hybrid bonding technique to fuse an additional 64MB of 7nm SRAM cache vertically atop each Ryzen compute chiplet, tripling the amount of L3 cache per Ryzen chip. AMD claims that brings up to a 15% performance improvement in some games, meaning those chips will vie for the title of Best CPU for gaming when they come to market early next year. We’ve since learned many more details about those chips, including deep-dive info on the packaging tech at a Hot Chips presentation earlier this year.
| Processor | Cores/Threads | Base Clock | Boost Clock | TDP | L3 Cache (L3 + 3D V-Cache) |
|---|---|---|---|---|---|
| Epyc 7773X | 64/128 | 2.2 GHz | 3.5 GHz | 280 W | 768 MB |
| Epyc 7573X | 32/64 | 2.8 GHz | 3.6 GHz | 280 W | 768 MB |
| Epyc 7473X | 24/48 | 2.8 GHz | 3.7 GHz | 240 W | 768 MB |
| Epyc 7373X | 16/32 | 3.05 GHz | 3.8 GHz | 240 W | 768 MB |
As with the consumer variants, AMD stacks a single 6x6mm layer of L3 cache directly over the L3 cache already present on each CCD (compute chiplet).
Each CCD has 32MB of L3 cache before the modification. The vertically stacked L3 cache slice contributes another 64MB, bringing the total to 96MB per CCD. The Milan-X chips will stretch up to 64-core models with eight CCDs, which brings the total to 768MB of L3 cache per chip. AMD has confirmed that its chips support higher stacks of L3, and HardwareLuxx has even found server BIOS settings that enable up to four cache stacks per chip with existing AMD EPYC Milan servers.
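The capacity figures above follow from simple multiplication. The short Python sketch below reproduces them using only the numbers quoted in this article; the constant and function names are illustrative and do not come from AMD.

```python
# Figures quoted in the article; all names here are illustrative.
BASE_L3_PER_CCD_MB = 32     # L3 already present on each Zen 3 CCD
STACKED_L3_PER_CCD_MB = 64  # one 3D V-Cache slice stacked on each CCD
CCDS_PER_CHIP = 8           # eight CCDs on the 64-core models
SOCKETS_PER_SERVER = 2      # dual-socket server


def l3_per_ccd_mb() -> int:
    """Total L3 per CCD: on-die cache plus one stacked slice."""
    return BASE_L3_PER_CCD_MB + STACKED_L3_PER_CCD_MB


def l3_per_chip_mb(ccds: int = CCDS_PER_CHIP) -> int:
    """Total L3 per chip for a given CCD count."""
    return ccds * l3_per_ccd_mb()


per_ccd = l3_per_ccd_mb()
per_chip = l3_per_chip_mb()
per_server = SOCKETS_PER_SERVER * per_chip

print(f"L3 per CCD:    {per_ccd} MB")
print(f"L3 per chip:   {per_chip} MB")
print(f"L3 per server: {per_server} MB ({per_server / 1024} GB)")
```

The per-server total works out to 1,536MB, which is the 1.5GB figure for a dual-socket system.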
The stacked L3 cache adds roughly 10% to overall L3 latency, which is comparable to the latency impact of simply adding capacity with standard on-die techniques. That’s partly because the additional L3 cache slice is somewhat ‘dumb’: all the control circuitry resides on the existing CCD, which helps reduce the latency overhead. In addition, the larger cache raises the L3 hit rate and so reduces trips to main memory. That relieves bandwidth pressure on main memory and lowers effective latency, improving application performance on multiple axes.
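To see why a slightly slower but larger cache can still come out ahead, consider the textbook average-memory-access-time model (hit time plus miss rate times miss penalty). The sketch below is only an illustration: the 10% latency overhead comes from the article, but the L3 latency, DRAM penalty, and hit rates are hypothetical placeholders, not AMD measurements.

```python
def average_access_time_ns(l3_latency_ns: float,
                           l3_hit_rate: float,
                           dram_penalty_ns: float) -> float:
    """Textbook AMAT for one cache level: hit time + miss rate * miss penalty."""
    return l3_latency_ns + (1.0 - l3_hit_rate) * dram_penalty_ns


# HYPOTHETICAL inputs, chosen only to illustrate the trade-off.
L3_LATENCY_NS = 10.0     # assumed baseline L3 latency
DRAM_PENALTY_NS = 100.0  # assumed extra cost of going to main memory
BASE_HIT_RATE = 0.60     # assumed hit rate with 32MB per CCD
STACKED_HIT_RATE = 0.80  # assumed hit rate with 96MB per CCD

LATENCY_OVERHEAD = 0.10  # roughly 10% overhead, as quoted in the article

baseline = average_access_time_ns(L3_LATENCY_NS, BASE_HIT_RATE, DRAM_PENALTY_NS)
stacked = average_access_time_ns(L3_LATENCY_NS * (1.0 + LATENCY_OVERHEAD),
                                 STACKED_HIT_RATE, DRAM_PENALTY_NS)

# Extra hit rate needed just to cancel out the added L3 latency.
break_even_gain = (L3_LATENCY_NS * LATENCY_OVERHEAD) / DRAM_PENALTY_NS

print(f"baseline average access time: {baseline:.1f} ns")
print(f"stacked average access time:  {stacked:.1f} ns")
print(f"break-even hit-rate gain:     {break_even_gain:.2%}")
```

Under these assumed numbers, the larger cache only needs to lift the hit rate by about one percentage point to pay for its added latency; anything beyond that is a net gain. Real results depend entirely on the workload's working set.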
AMD uses the same Zen 3 cores as normal; the control circuitry for 3D V-Cache was added as a forward-looking design choice during the initial design phases. AMD uses the existing EPYC Milan chips as the building block, so the chips will drop into the SP3 sockets in EPYC servers (a BIOS update is required). That reduces qualification time and speeds time to market.
AMD reiterated many of the benefits of the solder-less hybrid bonding technique that enables 3D V-Cache, like a 200X interconnect density increase over 2D chiplets and a 15X density increase and 3X energy efficiency gain over micro-bump 3D packaging. AMD says hybrid bonding also improves thermals, transistor density, and interconnect pitch over other 3D approaches, making it the most flexible active-on-active silicon stacking tech.
Additionally, AMD says no software modifications are required to leverage the increased cache capacity, though it is working with several partners to create certified software packages. Those packages might see further performance optimizations, too.
AMD says Milan-X will provide up to a 50% uplift in certain ‘targeted workloads’ that largely consist of product development software. That includes computational fluid dynamics (CFD), finite element analysis (FEA), structural analysis, and electronic design automation (EDA), with the latter used for chip design.
AMD touted the performance of the existing EPYC Milan models, showing two EPYC 75F3 chips beating two Intel Xeon 8362 chips in three workloads, but those benchmarks don’t include Milan-X.
AMD avoided a head-to-head comparison between Milan-X and Intel’s chips, instead showing a 66% performance uplift with a 16-core Milan-X over its standard 16-core EPYC chip in a chip design (EDA) RTL verification workload with Synopsys VCS. We included the test endnotes at the bottom of the article.
AMD says that Milan-X will benefit a broader selection of workloads, too. The company also listed several ISVs that are already working on certified software packages, including Altair, Cadence, and Synopsys. Those certified solutions will be ready at launch.
AMD hasn’t yet released official specs or pricing, but we’ll update as that information becomes available. The chips come to market in Q1 2022.
Update: Azure HBv3 VMs with Milan-X CPUs
Microsoft has issued documentation for the Milan-X HBv3 VMs that includes a technical overview, along with the following performance projections and VM size details:
- Up to 80% higher performance for CFD workloads
- Up to 60% higher performance for EDA RTL simulation workloads
- Up to 50% higher performance for explicit finite element analysis workloads
- Up to 120 AMD EPYC 7V73X CPU cores (EPYC with 3D V-cache, “Milan-X”)
- Up to 96 MB L3 cache accessible per core, shared across each CCD (3x larger than standard Milan CPUs, and 6x larger than “Rome” CPUs)
- 350 GB/s DRAM bandwidth (STREAM TRIAD), up to 1.8x amplification (~630 GB/s effective bandwidth)
- 448 GB RAM
- 200 Gbps HDR InfiniBand (SRIOV), Mellanox ConnectX-6 NIC with Adaptive Routing
- 2 x 900 GB NVMe SSD (3.5 GB/s (reads) and 1.5 GB/s (writes) per SSD, large block IO)
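The effective bandwidth figure in the list above is the measured DRAM bandwidth multiplied by the quoted amplification factor. A minimal Python check, using only the numbers from Microsoft's list (the variable names are illustrative):

```python
# Values taken from the Microsoft figures listed above; names are illustrative.
stream_triad_gbps = 350.0   # measured DRAM bandwidth, GB/s (STREAM TRIAD)
amplification = 1.8         # "up to" factor attributed to the larger L3 cache

# Effective bandwidth as seen by a workload whose data largely fits in L3.
effective_gbps = stream_triad_gbps * amplification

print(f"effective bandwidth: ~{effective_gbps:.0f} GB/s")
```

Because the 1.8x factor is an upper bound, the roughly 630 GB/s result should be read as a best case for cache-friendly workloads rather than a sustained figure.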
This article was created by AposTube