Case Study: OpenX serves 2.5 million predictions per second in under 15 milliseconds with TensorFlow

How OpenX Trains and Serves for a Million Queries per Second in under 15 Milliseconds

OpenX, a major independent advertising exchange, needed to process more than one million ad requests per second while keeping machine learning inference under a strict 15-millisecond latency limit. The goal was a system that prioritizes traffic for its buyers and sellers, but the legacy on-premises infrastructure was slow to update and prone to training-serving skew, which made it difficult to deploy models quickly and reliably at that scale.

Using the TensorFlow ecosystem, OpenX built a new solution that uses TensorFlow Extended (TFX) to orchestrate its ML pipelines and TensorFlow Serving to deploy models. This allowed the team to train on petabytes of data daily and deploy models frequently without engineering support. The result is a TensorFlow Serving deployment on Google Kubernetes Engine that processes 2.5 million prediction queries per second, each in under 15 milliseconds, while enabling faster model improvements and delivering greater value to customers.
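To give a rough sense of what these figures imply, the arithmetic below applies Little's law (average requests in flight = arrival rate x time in system) to the stated throughput and latency limit. It is a back-of-the-envelope sketch, not a description of OpenX's actual capacity planning: because each query finishes in under 15 milliseconds, the result is an upper bound, and the per-replica throughput used for the replica count is a purely hypothetical number chosen for illustration.

```python
import math


def in_flight_requests(queries_per_second: int, latency_ms: int) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return queries_per_second * latency_ms / 1000


def replicas_needed(queries_per_second: int, per_replica_qps: int) -> int:
    """Smallest number of replicas whose combined throughput covers the load."""
    return math.ceil(queries_per_second / per_replica_qps)


# Figures stated in the case study.
QPS = 2_500_000
LATENCY_LIMIT_MS = 15

# Hypothetical assumption, not from the case study: throughput of one serving replica.
ASSUMED_PER_REPLICA_QPS = 5_000

if __name__ == "__main__":
    print("Upper bound on requests in flight:", in_flight_requests(QPS, LATENCY_LIMIT_MS))
    print("Replicas at the assumed per-replica rate:", replicas_needed(QPS, ASSUMED_PER_REPLICA_QPS))
```

Under these numbers, at most about 37,500 prediction requests are in flight across the deployment at any moment. The replica count scales inversely with whatever per-replica throughput is actually measured.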

OpenX

Larry Price

TensorFlow
