vScaler ResNet Benchmarks on NVIDIA DGX-2

Posted on 2019/02/18
The NVIDIA® DGX-2™ arrived at vScaler's internal lab and they couldn't wait to test it and integrate their cloud system with it!

 
Paving the way for modern AI, the NVIDIA DGX-2 is recognised as 'the World's most powerful Deep Learning system'. Targeted squarely at deep learning, the platform delivers unprecedented levels of compute, building on its predecessor, the NVIDIA® DGX-1™, with twice as many GPUs. As AI models grow increasingly complex, the DGX-2 packs 16 of the world's most advanced GPUs, so model complexity and size are no longer constrained by the limits of traditional architecture.


System Specs: NVIDIA DGX-2

  • 16 x fully connected NVIDIA® Tesla® V100 32GB GPUs
  • 2 high-speed Ethernet networking cards
  • 1.5TB system memory
  • 30TB NVMe SSDs
  • Latest-generation CPUs for faster, more resilient boot and storage management


vScaler 

vScaler is a private Cloud platform built on Open Source technology that enables the creation of a secure, scalable, cost-effective, flexible IT infrastructure. Find out more about vScaler here.

Integrating vScaler was seamless: the team flashed the system with a preconfigured image they had already been using for their DeepOps integration. This provided all the tools needed to access NVIDIA's GPU container registry with Kubernetes, along with similar optimisation options.


TensorFlow Benchmarking for ResNet Models

To assess the performance of the DGX-2, vScaler ran TensorFlow benchmarks on ResNet, a family of models widely used across the industry to benchmark training and inference performance. vScaler selected two common models, ResNet-50 and ResNet-152, which are 50- and 152-layer Residual Network models respectively. To ensure the GPUs were fully utilised, each model was run with a range of batch sizes and GPU counts; every configuration was tested 3 times over 20 epochs, and vScaler recorded the average result. To understand the results better, you can view a full list here!
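The test procedure above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not vScaler's actual tooling: `run_once` is a hypothetical callable standing in for a single benchmark run (for example, one TensorFlow training job) that returns throughput in images/second, and the model labels are placeholders. The harness visits every combination of model, batch size and GPU count, runs each one three times over 20 epochs and keeps the mean.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple

# (model label, batch size, GPU count) -> mean images/second
Config = Tuple[str, int, int]


def sweep(
    run_once: Callable[[str, int, int, int], float],
    models: Iterable[str],
    batch_sizes: Iterable[int],
    gpu_counts: Iterable[int],
    repeats: int = 3,
    epochs: int = 20,
) -> Dict[Config, float]:
    """Benchmark every (model, batch size, GPU count) combination.

    `run_once(model, batch_size, num_gpus, epochs)` is a hypothetical
    callable that performs one training run and returns its throughput
    in images/second. Each configuration is run `repeats` times and the
    mean throughput is recorded.
    """
    results: Dict[Config, float] = {}
    for model, batch, gpus in product(models, batch_sizes, gpu_counts):
        samples = [run_once(model, batch, gpus, epochs) for _ in range(repeats)]
        results[(model, batch, gpus)] = mean(samples)
    return results


if __name__ == "__main__":
    # Stand-in runner for illustration only: it returns a dummy value,
    # not a real measurement.
    def dummy_run(model: str, batch: int, gpus: int, epochs: int) -> float:
        return float(batch * gpus)

    table = sweep(dummy_run, ["resnet50", "resnet152"], [64, 256], [1, 16])
    for config, throughput in sorted(table.items()):
        print(config, throughput)
```

Recording the mean of several repeats, rather than a single run, makes each reported figure less sensitive to run-to-run variation.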


Results

When analysing the benchmark results for each model, some important trends emerge, the most significant being the near-linear scaling of performance as the GPU count increases. For example, consider ResNet-50 training with a batch size of 256. With a single GPU we achieved 899 images/second, and with 16 GPUs this rose to 12,497 images/second. That is close to a 14x speedup (12,497 / 899 ≈ 13.9), which represents ~87% scaling efficiency. The ResNet-152 results show the same scaling behaviour, with 86% efficiency when scaling from 1 to 16 V100 GPUs.
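The speedup and efficiency figures follow from two ratios: speedup is the multi-GPU throughput divided by the single-GPU throughput, and scaling efficiency is that speedup divided by the number of GPUs (1.0 would be perfect linear scaling). A minimal sketch using the ResNet-50 numbers quoted above; the function name `scaling` is purely illustrative.

```python
from typing import Tuple


def scaling(single_gpu: float, multi_gpu: float, num_gpus: int) -> Tuple[float, float]:
    """Return (speedup, efficiency) for a multi-GPU run.

    speedup    = multi-GPU throughput / single-GPU throughput
    efficiency = speedup / num_gpus  (1.0 means perfect linear scaling)
    """
    speedup = multi_gpu / single_gpu
    efficiency = speedup / num_gpus
    return speedup, efficiency


# ResNet-50, batch size 256: 899 images/s on 1 GPU, 12497 images/s on 16 GPUs
speedup, efficiency = scaling(899, 12497, 16)
print(f"{speedup:.1f}x speedup, {efficiency:.0%} efficiency")
# prints "13.9x speedup, 87% efficiency"
```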

 

If you would like to test the NVIDIA® DGX-2 yourself, then get in touch with us and we can book you into our Boston Labs today!


