Kubernetes
Case Studies
A Kubernetes Case Study
OpenAI, the San Francisco-based artificial intelligence research lab, needed a way to run and scale machine learning experiments quickly, portably, and at low cost across cloud and on-premises environments. To solve this, OpenAI adopted **Kubernetes** as a batch scheduler and workload manager for its distributed, containerized deep learning experiments.
With **Kubernetes**, OpenAI first ran clusters on AWS, then migrated them to Azure, and later deployed hybrid clusters with control planes in Azure and nodes in its own data centers. The result was greater portability, lower costs, and much faster experimentation. Researchers were able to launch projects in just two or three days and scale them to hundreds of GPUs within one to two weeks, versus months previously, while some teams saw a 10x increase in scale without significant engineering overhead.
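The batch-scheduling pattern described above can be sketched as a Kubernetes Job that requests GPUs for a set of worker pods. This is a hypothetical, minimal example, not OpenAI's actual configuration: the Job name, container image, node label, and GPU count are all illustrative.

```yaml
# Hypothetical sketch of a batch training experiment as a Kubernetes Job.
# The image, labels, and resource numbers are illustrative assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-experiment
spec:
  parallelism: 4            # run 4 worker pods concurrently
  completions: 4            # the Job finishes when 4 pods succeed
  backoffLimit: 2           # retry failed pods up to 2 times
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: example.com/research/trainer:latest   # hypothetical image
        command: ["python", "train.py"]
        resources:
          limits:
            nvidia.com/gpu: 8   # requires the NVIDIA device plugin on nodes
      nodeSelector:
        accelerator: nvidia-gpu   # hypothetical node label for GPU nodes
```

Because a Job is a declarative object, the same manifest can be submitted unchanged to a cloud cluster or an on-premises one, which is the portability the case study highlights.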
Christopher Berner
Head of Infrastructure