NVIDIA Run:ai
An NVIDIA Run:ai Case Study
NVIDIA faced the challenge of managing and allocating a shared on‑prem fleet of NVIDIA DGX systems and T4 GPUs, more than 200 GPUs in all, for more than 100 researchers in a fully air‑gapped defense environment, where inference workloads demanded extremely low latency and maximum throughput. They engaged NVIDIA Run:ai to deploy its Kubernetes‑based Atlas orchestration platform together with NVIDIA Triton Inference Server and NVIDIA GPU tooling, simplifying shared cluster management and dynamic GPU allocation.
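For readers less familiar with how Run:ai slots into Kubernetes, the Python sketch below shows one way an inference pod can be handed to the Run:ai scheduler with a Triton container. It is a minimal illustration, not this deployment's actual manifest: the project label, namespace, image tag, model-repository path, and GPU count are all assumptions.

# Minimal sketch: submit a Triton inference pod to the Run:ai scheduler
# via the official Kubernetes Python client. Names and sizes are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # assumes kubeconfig access to the cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="triton-inference",
        labels={"project": "team-a"},  # hypothetical Run:ai project label
    ),
    spec=client.V1PodSpec(
        # Hand placement to Run:ai's scheduler instead of the default kube-scheduler.
        scheduler_name="runai-scheduler",
        containers=[
            client.V1Container(
                name="triton",
                image="nvcr.io/nvidia/tritonserver:23.10-py3",
                command=["tritonserver", "--model-repository=/models"],  # path is an assumption
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # one full GPU for this sketch
                ),
            )
        ],
    ),
)

# Run:ai conventionally maps projects to namespaces named runai-<project>;
# the project name here is hypothetical.
client.CoreV1Api().create_namespaced_pod(namespace="runai-team-a", body=pod)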
NVIDIA Run:ai grouped the GPUs into logical build, train, and inference pools and applied advanced scheduling, including gang scheduling and topology awareness, so jobs automatically received the right compute and memory. The solution drove GPU utilization to roughly 95–100%, gave the research teams on‑demand access through a private managed GPU cloud, and boosted inference throughput by 4.7x while accelerating training turnaround and improving capacity planning.
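Gang scheduling, one of the techniques named above, admits a distributed job only when every one of its workers can be placed at once, so a job never deadlocks while holding a partial GPU allocation. The toy sketch below illustrates that all-or-nothing idea; the node names, GPU counts, and first-fit policy are assumptions for illustration, not Run:ai's implementation.

# Conceptual sketch of gang scheduling: place all workers or none.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_gpus: int

def gang_schedule(gpus_per_worker: int, workers: int, nodes: list[Node]) -> bool:
    """Admit a job only if every worker fits simultaneously; otherwise roll back."""
    placed: list[Node] = []
    for _ in range(workers):
        # Naive first-fit; a real scheduler would also weigh topology
        # (NVLink islands, NUMA locality) when ranking candidate nodes.
        node = next((n for n in nodes if n.free_gpus >= gpus_per_worker), None)
        if node is None:
            # The gang cannot run yet: release every tentative placement.
            for n in placed:
                n.free_gpus += gpus_per_worker
            return False
        node.free_gpus -= gpus_per_worker
        placed.append(node)
    return True

nodes = [Node("dgx-0", 8), Node("dgx-1", 8)]
print(gang_schedule(gpus_per_worker=8, workers=2, nodes=nodes))  # True: both workers fit
print(gang_schedule(gpus_per_worker=8, workers=1, nodes=nodes))  # False: pool is now exhausted

The rollback step is the essential design choice: without it, a large distributed job could starve the cluster by pinning GPUs it cannot yet use, which is exactly the waste that pooled scheduling is meant to eliminate.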