Case Study: Nvidia achieves near-linear scaling with 32 GPUs using GigaIO FabreX

A GigaIO Case Study

Preview of the Nvidia Case Study

Nvidia - Customer Case Study

Nvidia needed to determine whether a single server could support a very large GPU farm and how well such a configuration would scale for multi-GPU machine learning workloads. GigaIO addressed this challenge using its FabreX rack-scale composable infrastructure with a 1U dual-socket AMD Rome server and two Accelerator Pooling Appliances, each populated with 8 NVIDIA GPUs, for a total of 32 GPUs. The software stack included NVIDIA’s NCCL library, with ResNet-50 as the benchmark workload.

With GigaIO FabreX, Nvidia’s 32-GPU configuration was an industry first for a single-server GPU farm of this size. The system delivered over 12.5x single-GPU performance at 16 GPUs (78% of ideal linear scaling) and over 20x single-GPU performance at 32 GPUs (63% of ideal linear scaling). The results demonstrated the scalability, efficiency, and resource-pooling benefits of GigaIO’s solution for high-performance AI and ML workloads.
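
The efficiency percentages follow from dividing the measured speedup by the GPU count, since ideal linear scaling would give an N-fold speedup on N GPUs. The short Python sketch below reproduces that arithmetic. The function name `scaling_efficiency` and the `reported` table are illustrative assumptions rather than anything from the case study, and the inputs are the lower bounds quoted above ("over 12.5x", "over 20x"), so the results are lower bounds as well.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Return measured speedup as a fraction of ideal linear speedup."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Lower-bound speedups over a single GPU, as quoted in the summary.
# (Hypothetical table; the case study reports only these two points.)
reported = {16: 12.5, 32: 20.0}

for gpus, speedup in reported.items():
    eff = scaling_efficiency(speedup, gpus)
    print(f"{gpus} GPUs: {speedup}x speedup -> {eff:.1%} of linear")

# Prints:
# 16 GPUs: 12.5x speedup -> 78.1% of linear
# 32 GPUs: 20.0x speedup -> 62.5% of linear
```

Rounded to whole percentages, these values match the 78% and 63% figures in the summary.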


