Case Study: OpenAI scales deep learning experiments faster with Kubernetes

A Cloud Native Computing Foundation Case Study

Preview of the OpenAI Case Study

OpenAI: Launching and scaling up experiments, made simple

OpenAI, an artificial intelligence research lab, needed portable, efficient, and cost-effective infrastructure to run and scale its deep learning experiments across both the cloud and its own data centers. To solve this, it adopted Kubernetes, the open-source container orchestration project hosted by the Cloud Native Computing Foundation (CNCF).

The solution involved using Kubernetes as a batch scheduling system, paired with an autoscaler, to dynamically manage OpenAI's hybrid cloud infrastructure. Kubernetes provided a consistent API that let OpenAI move experiments between clusters with little friction, cutting experiment launch times from months to days and allowing individual workloads to scale up by 10x or 50x. The results included major cost reductions, lower latency, and access to specialized on-premise hardware.
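To make the batch-scheduling pattern concrete, the sketch below shows how an experiment might be expressed as a Kubernetes Job so the cluster autoscaler can add nodes on demand. This is an illustrative manifest, not OpenAI's actual configuration; the names, image, and resource counts are hypothetical.

```yaml
# Minimal sketch: a deep learning experiment as a Kubernetes batch Job.
# The Job controller schedules pods; a cluster autoscaler can add nodes
# when pending pods cannot fit on existing capacity.
apiVersion: batch/v1
kind: Job
metadata:
  name: dl-experiment            # hypothetical experiment name
spec:
  parallelism: 8                 # run up to 8 worker pods at once
  completions: 8                 # the Job finishes after 8 successful pods
  backoffLimit: 2                # retry failed pods up to twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # placeholder image
        resources:
          requests:
            nvidia.com/gpu: 1    # one GPU per worker (assumes the NVIDIA device plugin)
          limits:
            nvidia.com/gpu: 1
```

Because the manifest is just declarative data against the Kubernetes API, the same Job can be submitted unchanged to a cloud cluster or an on-premise one, which is the portability the case study describes.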


OpenAI

Christopher Berner

Head of Infrastructure
