Posted on 08 June, 2021
An EPYC Step
AMD have been making great strides in the server grade CPU market over the past few years. Having struggled to compete in this sector for some time, AMD changed the landscape with the launch of the EPYC range of CPUs in 2017, and we finally saw real competition in x86 again.
With the launch of the first wave of EPYC CPUs, codenamed Naples, AMD certainly raised eyebrows. By offering products that had more cores, memory channels and PCIe lanes than previous generations, AMD had a genuinely competitive server grade range of CPUs, with great performance-to-cost ratios too.
Showing that they were not a one-trick pony, AMD continued this with the updated second gen CPUs, codenamed Rome, now built on a 7nm manufacturing process. AMD underlined their performance advantage by offering SKUs with up to 64 cores and boosting CPU cache up to 256MB too. With the recent launch of the 3rd generation CPUs, this time codenamed Milan, AMD have continued to refine and improve the range. At this point it is clear that AMD are becoming a market leader, with a roadmap of innovation that paints a bright future for the company.
The AMD EPYC CPU core architecture, called Zen, has also been refined between generations. Naples used the original Zen cores; these were followed by Zen2 for Rome and now Zen3 for Milan. To explain this in further detail: each AMD EPYC CPU is a System On Chip (SOC) comprising Zen cores, memory and I/O controllers, and security features. The architecture is linked together with AMD Infinity Fabric, which carries data between all linked components such as the CPU cores, memory and PCIe bus.
For the first gen Naples CPUs, the SOC consisted of 14nm dies. As seen in Fig 1 below, there are 4 dies, each containing up to 8 cores, for up to 32 cores in total. Each core can run 2 threads (AMD SMT – Simultaneous Multi-Threading) and Infinity Fabric links all the subsystems together.
Figure 1 - AMD Naples SOC with Infinity Fabric
For Rome, AMD made significant changes to the SOC. The architecture changed to a central I/O die through which all off-chip data passes. Surrounding the central I/O die are eight 7nm chiplets called Core Complex Dies (CCDs). The CCDs communicate with the central I/O die over Infinity Fabric links. Each CCD in turn contains two Core Complexes (CCXs), and it is within these CCXs that the actual Rome Zen2 cores reside. Each CCX can have up to 4 cores. For example, in the top end 64 core SKU, each of the eight CCDs has 2 CCXs with 4 cores in each, for a total of 64.
Figure 2 - AMD Rome CCDs with central I/O Die
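The core count arithmetic above can be written out as a quick sanity check. This is only a minimal sketch: the function and variable names are illustrative and do not come from any AMD tool, and the figures are the ones quoted in the text.

```python
# Illustrative sketch: names are hypothetical, figures come from the article.

SMT_THREADS_PER_CORE = 2  # each core can run 2 threads with AMD SMT


def rome_style_cores(ccds: int, ccx_per_ccd: int, cores_per_ccx: int) -> int:
    """Multiply out a CCD/CCX chiplet layout to get the physical core count."""
    return ccds * ccx_per_ccd * cores_per_ccx


# Top end Rome SKU: 8 CCDs x 2 CCXs x 4 cores
rome_cores = rome_style_cores(ccds=8, ccx_per_ccd=2, cores_per_ccx=4)
rome_threads = rome_cores * SMT_THREADS_PER_CORE

# Top end Naples SKU: 4 dies with up to 8 cores each
naples_cores = 4 * 8

print(f"Rome:   {rome_cores} cores / {rome_threads} threads")
print(f"Naples: {naples_cores} cores")
```

Lower core count SKUs follow the same multiplication with fewer CCDs or fewer active cores per CCX, although the exact combination used for each SKU is not covered here.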
The overall architecture for Milan is similar but with refinements made to improve performance, efficiency and security. The chip layout is almost identical, but the L3 cache on each CCD is now a single unified pool for all of its cores rather than being split into two 16MB segments, as illustrated by Fig 3. This means even a single core can use the full 32MB of L3 cache, improving performance and latency. Applications with datasets suited to large caches, and larger virtual machines, will benefit from this approach.
Figure 3 - Zen 2 vs Zen 3 L3 cache
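To make the cache change concrete, the short sketch below compares how much L3 a single core can reach against the total on a 64 core part. The constant and function names are hypothetical; the 16MB, 32MB and eight-CCD figures are taken from the text.

```python
# Illustrative sketch: names are hypothetical, sizes (in MB) come from the article.

CCDS = 8              # chiplets on the top end 64 core part
ZEN2_SEGMENT_MB = 16  # Rome: each CCD's L3 is split into two 16MB segments
ZEN2_SEGMENTS_PER_CCD = 2
ZEN3_POOL_MB = 32     # Milan: one unified 32MB pool per CCD


def l3_reachable_by_one_core(generation: str) -> int:
    """L3 capacity (MB) that a single core can use directly."""
    if generation == "zen2":
        return ZEN2_SEGMENT_MB
    if generation == "zen3":
        return ZEN3_POOL_MB
    raise ValueError(f"unknown generation: {generation}")


zen2_total = CCDS * ZEN2_SEGMENTS_PER_CCD * ZEN2_SEGMENT_MB
zen3_total = CCDS * ZEN3_POOL_MB

print("Total L3 (MB):", zen2_total, "vs", zen3_total)
print("Per-core reach (MB):",
      l3_reachable_by_one_core("zen2"), "vs", l3_reachable_by_one_core("zen3"))
```

The total stays at 256MB in both cases; what changes is that one core can now reach twice as much of it.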
Arriving in Milan
With Milan we see some improvements in base clock speeds, and the high end 64 core SKUs now go up to 280W TDP. This is up from 240W on Rome (although there was a mid gen refresh 280W CPU), which allows for higher performance CPUs. Updates also include improvements to AMD’s Infinity Fabric, security improvements and the larger L3 cache available to each core. AMD have also introduced memory interleaving for 6 memory channel configurations. On previous gens, 4 and 8 channel interleaving was supported, but now configurations populating 6 DIMM channels can also benefit. Memory interleaving allows higher memory bandwidth by spreading memory accesses across channels. When memory is interleaved, contiguous memory accesses go to different memory banks, so one access does not have to wait for the previous one to finish.
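The idea behind interleaving can be illustrated with a deliberately simplified model. The sketch below assumes plain round-robin mapping with a hypothetical 256 byte granule; real memory controllers use more elaborate address hashing, so treat this as an illustration of the principle rather than a description of how EPYC maps addresses.

```python
# Simplified model of channel interleaving.
# Assumptions (not from AMD documentation): round-robin mapping, 256 byte granule.

GRANULE_BYTES = 256  # hypothetical interleave granule


def channel_for(address: int, channels: int, granule: int = GRANULE_BYTES) -> int:
    """Return the memory channel that would serve a given address."""
    return (address // granule) % channels


def channels_hit(start: int, length: int, channels: int) -> list[int]:
    """Channels touched, in order, by a contiguous read of `length` bytes."""
    return [channel_for(addr, channels)
            for addr in range(start, start + length, GRANULE_BYTES)]


# A contiguous read spanning six granules
six_way = channels_hit(0, 6 * GRANULE_BYTES, channels=6)
not_interleaved = channels_hit(0, 6 * GRANULE_BYTES, channels=1)

print("6 channel interleave:", six_way)          # [0, 1, 2, 3, 4, 5]
print("no interleave:       ", not_interleaved)  # [0, 0, 0, 0, 0, 0]
```

With six channels each consecutive granule lands on a different channel and can be fetched in parallel, whereas without interleaving every access queues behind the previous one on the same channel.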
Due to these improvements in the core design, AMD are claiming a raw performance increase of ~19%. It will be interesting to see how this bears out in our Boston Labs testing!
The Milan range still uses the same Socket SP3 (also known as LGA-4094), which means there is no need to change server platform or motherboard when thinking about upgrades. If you have a platform that can support Rome then Milan will be no problem. All that will be needed is a BIOS update to support the new 7003 processors. The only upgrade path that isn’t possible is a 7001 based server to the newer 7003.
AMD use a 7xxx naming convention across SKUs. The Naples line-up used a 7xx1 format, Rome followed suit with 7xx2 and the latest and greatest are the 7xx3 series. The infographic below shows how each SKU is named.
Figure 4 - AMD EPYC SKU naming convention
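As a rough illustration of the generation digit in this naming scheme, the sketch below reads the fourth character of a model number. The function name is hypothetical, and only the 7xx1/7xx2/7xx3 pattern described in the text is decoded; the meaning of the other positions is not modelled.

```python
# Illustrative sketch: decodes only the generation digit described in the article.

GENERATIONS = {"1": "Naples", "2": "Rome", "3": "Milan"}


def epyc_generation(sku: str) -> str:
    """Map a 7xxN model number to its generation codename."""
    model = sku.strip().upper()
    if len(model) < 4 or not model.startswith("7"):
        raise ValueError(f"not a 7xxx EPYC model number: {sku!r}")
    digit = model[3]  # fourth character carries the generation
    if digit not in GENERATIONS:
        raise ValueError(f"unknown generation digit {digit!r} in {sku!r}")
    return GENERATIONS[digit]


for sku in ("7532", "7742", "7543", "7763"):
    print(sku, "->", epyc_generation(sku))
```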
Below we can see the full CPU stack for Milan CPUs. New to the range are 12, 28 and 56 core CPUs, core densities which were not part of the previous generation of EPYC.
Figure 5 - AMD EPYC 7003 Milan SKUs
AMD are positioning the SKUs as shown in the infographic below, in three groups: Core Performance, which are the high clock speed SKUs; Core Density, which offer the highest number of cores; and Balanced & Optimised, which are the general all-purpose offerings with a mix of core counts.
Figure 6 - Milan CPU categories
Below we can see a summary table of the differences between the 3 generations of CPU stacks. The highlighted entries show the Milan changes, which may seem incremental, but the changes under the hood do result in performance gains, which are discussed later in this article.
Figure 7 - Comparing EPYC generations
Where AMD really shine is in the high-end SKUs. AMD already offered a 64 core SKU in the previous 7002 Rome series and have built upon that with Milan; in the x86 market there is nothing able to match this now or in the near future. The highest core count SKU in Intel’s 3rd Gen Xeon Scalable range, available today, carries 40 cores, 12 more than the 2nd Gen. Comparing the raw numbers of the flagship SKUs from these 2 microprocessor giants, AMD have the 64 core 7763 running at a base clock of 2.45GHz with a boost frequency of 3.5GHz, and it boasts 128 lanes of PCIe 4.0 in either a single or a dual processor setup.
Intel’s 8380, by contrast, offers 40 cores at a base of 2.3GHz and a turbo of 3.4GHz. Whilst Intel have implemented PCIe 4.0, a single CPU offers 64 lanes, reaching the full 128 only in a dual processor setup.
Therefore, at this time it is difficult for Intel to compete with AMD in a top trumps battle based on the datasheet numbers. In practice, however, things might not be as clear cut; as we all know, only real-world testing can prove the better product for each individual application. Results can vary significantly and often surprise us - the proof, as they say, “is in the pudding”.
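The datasheet figures quoted above can be laid side by side in a few lines. This is a minimal sketch; the class and field names are hypothetical and only the numbers given in the text are used.

```python
# Illustrative sketch: structure and names are hypothetical, values are from the article.
from dataclasses import dataclass


@dataclass(frozen=True)
class Flagship:
    name: str
    cores: int
    base_ghz: float
    boost_ghz: float
    pcie4_lanes_1p: int  # PCIe 4.0 lanes in a single processor system
    pcie4_lanes_2p: int  # PCIe 4.0 lanes in a dual processor system


EPYC_7763 = Flagship("AMD EPYC 7763", 64, 2.45, 3.5, 128, 128)
XEON_8380 = Flagship("Intel Xeon 8380", 40, 2.3, 3.4, 64, 128)

core_ratio = EPYC_7763.cores / XEON_8380.cores
lane_ratio_1p = EPYC_7763.pcie4_lanes_1p / XEON_8380.pcie4_lanes_1p

for cpu in (EPYC_7763, XEON_8380):
    print(f"{cpu.name}: {cpu.cores} cores, {cpu.base_ghz}/{cpu.boost_ghz}GHz, "
          f"{cpu.pcie4_lanes_1p} lanes (1P), {2 * cpu.cores} cores in a 2P server")

print(f"Core count ratio: {core_ratio:.1f}x, single socket lane ratio: {lane_ratio_1p:.1f}x")
```

These are paper comparisons only and say nothing about per-core performance.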
We have received some of these promising AMD Milan CPUs at Boston Labs and were keen to put them through their paces. Today we will be focusing on the 32 and 64 core SKUs, namely the 7543 and the top end 7763. We have grabbed some of the Rome gen SKUs for comparison purposes: the 32 core 7532 and the 64 core 7742. Strictly speaking, the 7542 would be the ideal match for the newer 7543 as it has a 2.9/3.4 base and boost clock speed, but due to availability the 7532 is the only 32 core Rome SKU we could get our hands on, so we will bear this in mind in the benchmark results.
As you can see here, the 64 core gets a nice boost in base frequency with a slightly higher Max Boost Clock. The TDP is somewhat higher of course but we should see a decent increase in performance.
Starting with Cinebench R23, in the multi core test the 7763 leads out of the gate as expected and shows an increase of around 11% when compared to the Rome equivalent 7742. The single core benchmark shows similar results.
The V-Ray CPU benchmark renders a scene on the CPU, a job typically performed by GPUs. The benchmark is, however, tuned for CPUs to run such workloads and this test should scale well with more cores. In the results below we can see a significant increase in performance when transitioning from Rome to Milan. Going from 78321 to 112051 is around a 43% increase, so this is great performance! The 32 core 7543 also shows an increase of a similar level.
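The uplift can be reproduced directly from the two V-Ray scores quoted above. The helper name is hypothetical; the scores are the ones from the text.

```python
# Illustrative sketch: percentage uplift between two benchmark scores.

def percent_increase(old_score: float, new_score: float) -> float:
    """Relative improvement of new_score over old_score, in percent."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score - old_score) / old_score * 100


rome_vray = 78321
milan_vray = 112051

uplift = percent_increase(rome_vray, milan_vray)
print(f"V-Ray uplift: {uplift:.1f}%")  # V-Ray uplift: 43.1%
```

The same helper applies to any of the other scores in this article where the raw numbers are available.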
The PassMark CPU benchmark performs a wide variety of calculations. The benchmark score is based on how fast the CPU can perform a complex series of instructions: the faster the processor completes the tasks, the higher the score. The tests cover several areas such as mathematical, compression, encryption, physics, multithreaded and single threaded workloads. The pattern continues, with the 7763 scoring around 20% higher than its predecessor and the 32 core showing a 30% improvement. These CPUs are clearly better all round at performing a wide variety of computational tasks and workloads.
Finally, we take a look at High Performance Linpack, which shows around a 25% performance leap between the 64 core SKUs and a more modest 15% increase on the 32 cores.
Since the launch of EPYC back in 2017, AMD have been establishing themselves once again in the enterprise CPU market space. Whilst there were some turbulent years prior with Opteron, we can safely say AMD are at the forefront when it comes to server grade CPUs in terms of performance.
Whilst the 3rd generation Milan series is an evolution rather than a revolution in terms of the overall processor architecture, we do see some very good performance gains as highlighted above. AMD have built on the foundation of the previous generations and thus far the competition can't keep up in this market segment. With Milan, AMD have the core counts, the larger cache and the clock speed boost across SKUs. It's an exciting time for the enterprise grade CPU market as competition leads to innovation. So far AMD have delivered on their roadmap with great results. Looking forward, the next stage will see Zen4 with the Genoa launch, using a 5nm fabrication process, so it will be fascinating to see what they come up with.
Boston Labs is our onsite R&D and test facility where we develop new products and evaluate the latest technology. New and improved technologies are emerging all the time, and this can be a daunting situation for customers planning their future projects. Making the right decision about new hardware is a difficult proposition, made even harder when clients are unable to test and understand the hardware first before making their purchase. Boston Labs enables our customers to test-drive the latest hardware on-premises or remotely. The AMD CPUs mentioned in this article are available for testing right now!
We have a range of server solutions that can take advantage of these new CPUs. The range includes storage servers, graphics servers, workstations, HPC platforms and many more targeting a wide range of market segments. If you’d like to know more please get in contact by emailing [email protected] or call us on 01727 876100 and one of our experienced sales engineers will gladly guide you through building the perfect solution just for you.
Sukhdip Mander, Field Application Engineer, Boston Limited