Case Study: Superhuman achieves 80% lower embedding inference latency with Baseten

A Baseten Case Study


Superhuman, the AI-native email app for productivity, needed a way to deliver instant AI features without disrupting users’ workflows. After replacing off-the-shelf models with dozens of custom and fine-tuned embedding models, the team faced several challenges: low-latency inference, global scaling, and support for heterogeneous model architectures, all with a lean engineering team that didn’t want to build GPU infrastructure in-house. To solve this, Superhuman turned to Baseten and its embedding inference stack.

Baseten deployed Superhuman’s models on Baseten Embeddings Inference, along with autoscaling, multi-cloud capacity management, and performance-optimized client tooling. In just one week, Baseten helped Superhuman cut P95 latency by 80%, reaching a 100 ms P95 response time across dozens of custom models and freeing engineers to focus on product work instead of infrastructure.
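P95 latency, the headline metric here, is the response time below which 95% of requests complete, so it captures tail latency rather than the average. A minimal sketch of how a team might compute it from timing samples (the sample data below is illustrative, not Superhuman's actual numbers):

```python
import random

def p95(latencies_ms):
    """Return the 95th-percentile value from a list of latency samples."""
    ordered = sorted(latencies_ms)
    # Index below which 95% of the observed samples fall.
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

# Simulated per-request latencies in milliseconds; in practice these
# would come from timing real embedding inference requests.
samples = [80 + random.random() * 40 for _ in range(1000)]
print(f"P95 latency: {p95(samples):.1f} ms")
```

Tracking P95 (rather than the mean) matters for interactive features like email AI, where a slow tail is what users actually notice.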



Loïc Houssier, Chief Technology Officer, Superhuman
