Boston Labs Unboxes Grace Hopper GPU ARS-111GL-NHR

Posted on 12 April, 2024

At Boston we have the opportunity to supply high-end servers to suit almost any workload. Today, Boston Labs have the pleasure of unboxing the ARS-111GL-NHR – an air-cooled Grace Hopper server featuring the NVIDIA GH200 Tensor Core GPU.

Unlike most other servers, Grace Hopper comes as a combined CPU/GPU module, allowing high-end compute workloads to take advantage of both computing models. The GH200 Grace Hopper™ Superchip pairs a 72-core NVIDIA Grace CPU, supporting up to 480GB of ECC Low Power DDR5, with the Hopper GPU. Add an NVIDIA BlueField-3 or NVIDIA ConnectX-7 add-in card for extreme-performance network connectivity plus a dedicated IPMI port, and the complete package is an all-round computing powerhouse.

Packaging

The chassis is based on a Supermicro design and arrives in standard Supermicro 1U server packaging, which is double boxed as standard for any system, making sure everything is kept in pristine condition during transport.

The box comes with foam packaging to absorb and cushion any shocks to the device, making sure that the system stays safe in transit with even the most eager of couriers. For rack mounting, the rails for the chassis come in the box and are ready to be attached. There is also an accessory box for cables as well as other items (mostly screws) that may be required for the server and rail kit.

The top of the foam packaging holds the accessory box and rails while protecting the server. This can be removed to expose the server itself.

Exterior

Now that the system has been removed from the packaging, let’s take a look at the exterior of the enclosure.

The system comes with a plastic film attached to the top of the chassis to help prevent any scratches to the lid. Again, this comes as standard with any new system from Supermicro.

This film should always be removed prior to installation in a rack, as doing so frees up the air vents and allows airflow to the components in the system once it is online.

Firstly, let’s take a look at the rear. From left to right we have the PCIe slots, where the ConnectX-7 NIC for this system is located; the middle houses the I/O with a Mini DisplayPort, 1 x 1GbE RJ45 port with dedicated IPMI and 1 x USB 3.0 port; and the right side has the dual redundant power supplies (more on those later).

There is space for two more PCIe 5.0 x16 AOCs to be added here. The recommended NICs for this server are either an NVIDIA BlueField-3 or an NVIDIA ConnectX-7, the latter being installed in this particular system. It has 2 x 200Gb/s ports, which means it can support speeds of up to 400Gb/s in total. Whilst this is a remarkable data transfer speed, some customers are opting for even more, with multiple 400Gb/s links not being uncommon in HPC and AI.

The Mini DisplayPort (mDP) is something which differs from most servers in the Supermicro line-up today, where VGA remains the main type of display output as it’s a simple video interface commonly available in most datacentres for debugging. Mini DP has its advantages with regards to physical size and capability, but in most cases this port is only going to be used for setup, debugging, and emergency use.

The 1 x RJ45 1GbE LAN port also provides IPMI server management via an ASPEED AST2600 BMC chip. IPMI is used for remote monitoring with KVM and other management tools.

For more information on IPMI and how you can use it in your deployments, we have an introductory article here.
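To give a flavour of what remote management looks like in practice, here is a minimal, illustrative Python sketch (not taken from this system) that shells out to the standard ipmitool utility to query the BMC over its dedicated LAN port. The address and credentials below are placeholders – on this server the real ones are printed on the pull-out service tag covered later.

```python
# Minimal sketch: query the BMC over its dedicated IPMI LAN port using
# the standard ipmitool utility. Host and credentials are placeholders.
import subprocess

BMC_HOST = "192.168.1.100"   # placeholder BMC address
BMC_USER = "ADMIN"           # placeholder username
BMC_PASS = "changeme"        # placeholder password

def ipmi(*args: str) -> str:
    """Run an ipmitool command over the lanplus interface and return its output."""
    cmd = ["ipmitool", "-I", "lanplus",
           "-H", BMC_HOST, "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(ipmi("chassis", "status"))   # power state, last power event, etc.
    print(ipmi("sdr", "list"))         # sensor readings: temperatures, fans, voltages
```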

There is also 1 x blue USB 3.0 port, which allows for transfer speeds of up to 5Gb/s (625MB/s). This minimal set of external connectors is not a mistake but a design choice: it leaves plenty of space for both optimal cooling and the maximum number of add-in card slots.

The two Power Supply Units sit in fixed locations and each delivers 2000W to the server. This provides PSU redundancy should one fail for any reason, or should one feed drop out in a datacentre with an A/B feed. In either scenario, power will not be suddenly disconnected and workloads lost. Titanium Level efficiency ratings mean that as little energy as possible is lost to power conversion and heat, saving on electricity costs and helping the environment too.

Moving to the front of the server, from left to right, we have a pull-out service tag which shows the IPMI credentials and MAC address for this system. This is located just to the left of the server LEDs. The LEDs themselves show server status while operational, and there is also a soft power button located to the right of these. Finally, we have up to 8 x NVMe drive bays for externally accessible storage.

The service tag contains the unique password for the IPMI login, as well as other information specific to this system that is needed to gain access to the management platform. It’s also a good place for an asset tag or other marking to distinguish this server from others. The LEDs are able to show statuses such as power failure, NIC 1 and 2 activity, power, drive activity and Unit Identification (UID), with information and warnings indicated by different colours.

The 8 x NVMe drives are hot-swappable, meaning that they can be removed even while the system is powered on. They are E1.S form factor, which takes up minimal space yet still allows for several TB of storage per drive and excellent cooling.
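As a quick illustration of checking that a hot-swapped drive has been picked up by the operating system, the hypothetical Python sketch below walks the standard Linux sysfs tree and lists each NVMe controller and namespace along with its capacity.

```python
# Illustrative sketch (Linux only): list NVMe controllers and namespaces
# from sysfs, e.g. to confirm a hot-swapped E1.S drive has appeared.
from pathlib import Path

def list_nvme() -> None:
    for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
        model = (ctrl / "model").read_text().strip()
        for ns in sorted(ctrl.glob(f"{ctrl.name}n*")):
            sectors = int((ns / "size").read_text())   # reported in 512-byte sectors
            print(f"{ns.name}: {model}  {sectors * 512 / 1e12:.2f} TB")

if __name__ == "__main__":
    list_nvme()
```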

Interior

The server lid can be removed by undoing several screws; then, applying pressure to the top buttons and sliding the lid towards the front of the system gives us access to the components inside.

At first glance, the most striking feature is the heatsinks. These are much larger than traditional CPU or GPU heatsinks, as they cool the 72-core NVIDIA Grace CPU and the NVIDIA GH200 Tensor Core GPU together. The central heatsink towards the middle of the chassis is the one cooling the CPU, whereas the larger heatsinks lower down cool the GPU.

The motherboard is a Supermicro G1SMH-G, a DC-MHS DNO Type 2 board with a compact form factor of 8.3" x 12" (21cm x 30.48cm). It is quite small in the chassis but allows for the airflow required by the 1U design. At the time of writing, this chassis is only used for this specific SuperServer design. The memory is built into the Superchip package itself, giving 576GB of coherent memory per node: 480GB of LPDDR5X attached to the CPU and 96GB of HBM3 on the GPU for accelerated applications.
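For anyone wanting to sanity-check that memory split from the operating system, the illustrative Python sketch below (assuming the nvidia-ml-py bindings and NVIDIA drivers are installed) prints the host memory Linux reports for the Grace CPU alongside the HBM3 capacity reported by the GPU via NVML.

```python
# Illustrative sketch: report the GH200 memory split as seen by the OS.
# Assumes the nvidia-ml-py package (imported as pynvml) is installed.
import os
import pynvml

# Host (Grace CPU) LPDDR5X as reported by Linux.
host_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
print(f"Host memory: {host_bytes / 1e9:.0f} GB")

# GPU (Hopper) HBM3 as reported through NVML.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"GPU memory:  {mem.total / 1e9:.0f} GB")
pynvml.nvmlShutdown()
```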

The front of the chassis has an additional panel which opens separately to give further access to the front internals.

The rear section of the chassis top can also be removed by undoing six screws from the top and two from the sides, giving full access to the rear components.

There is a lot of space in the chassis, which allows the passive heatsinks and fans to cool the system within the 1U form factor. The fans themselves are 9 x FAN-0248L4, which can run at between 28,000 and 31,000 RPM. Luckily, the system has fan control to ensure that the fans spin at a much lower speed when this extreme level of cooling is not required. The system can be noisy under load, so bear this in mind when deploying it in smaller in-house data centres. These fans are not hot-swappable, so if they need to be replaced the system will have to be powered off to do so.
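If you want to keep an eye on those fan speeds remotely, a short example along the same lines as the earlier IPMI sketch is shown below, reading the fan sensor group through ipmitool (again with placeholder address and credentials).

```python
# Illustrative sketch: read the current fan speeds from the BMC's fan
# sensor group via ipmitool. Placeholder address and credentials.
import subprocess

cmd = ["ipmitool", "-I", "lanplus",
       "-H", "192.168.1.100", "-U", "ADMIN", "-P", "changeme",
       "sdr", "type", "Fan"]
print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
```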

Because the AOCs are not attached directly to the motherboard, MCIO cables connect them instead. These carry the connections for the NVMe drives, the I/O board and the PCIe cards in the riser. The motherboard has 8 x MCIO connectors for this purpose, which enables great versatility in how each port is used.

The power distribution board (PDB) uses power cables to connect to the motherboard, the I/O board and the PCIe riser card. It distributes power from the 2 x 2000W power supplies across the system to where it is needed.

The riser card, the I/O AOC, NICs and power supplies are all located at the rear of the system.

As introduced earlier, the two PWS-2K09A-1R power supplies deliver redundant input power. Both fit into the rear of the system, connecting directly into the PDB on the left side when facing towards the front of the system.

The I/O AOC is located in the middle of the system. It has several power cables and 1 x MCIO cable to carry data between itself and the motherboard. On the right side of the system, directly opposite the I/O AOC, are the riser cards for the PCIe cards. These use 2 x (8x MCIO to x16 PCIe) adapters, allowing full-bandwidth data transfer to each attached PCIe card without a drop in performance.

Our sample is fitted with the NVIDIA ConnectX-7 900-9X7AH-0078-DTZ PCIe NIC, providing dual 200Gb/s ports.
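Once the system is racked and cabled, a simple, illustrative Python sketch such as the one below can read each network interface's reported link speed from Linux sysfs to confirm the ConnectX-7 ports have come up at the expected rate.

```python
# Illustrative sketch (Linux only): print each network interface's link
# speed from sysfs. Speeds are reported in Mb/s; down links are skipped.
from pathlib import Path

for iface in sorted(Path("/sys/class/net").iterdir()):
    try:
        mbps = int((iface / "speed").read_text())
    except (OSError, ValueError):
        continue  # no speed reported (e.g. loopback) or link down
    if mbps <= 0:
        continue  # speed unknown
    print(f"{iface.name}: {mbps / 1000:.0f} Gb/s")
```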

Conclusion

The ARS-111GL-NHR, built around the GH200 Grace Hopper Superchip, allows for unparalleled AI compute power when it comes to combined CPU and GPU performance. The 1U design maximises performance for the space it occupies, and up to 200Gb/s of network speed per NIC port gives an incredible amount of data transfer to and from the system. With IPMI support for remote access and enhanced cooling, this really does put the “super” into “superchip”.

If you’d like to learn more, Boston Labs has previously posted more details on the architecture and why it is so significant to modern high performance computing and AI; you can find it here.

Equally, if you’d like a test drive of NVIDIA’s Grace series or have enquiries regarding other related technologies, then Boston Labs is standing by to assist.

Please get in touch with our knowledgeable sales and technical team at [email protected] or call us on 01727 876100 and begin your journey with AI today.

Author:
Peter Wilsher
Field Application Engineer

Tags: nvidia, supermicro, grace hopper, unboxing, gh200, boston labs, grace hopper superchip, AI

