Baseten
A Baseten Case Study
Superhuman, the AI-native email app for productivity, needed a way to deliver instant AI features without disrupting users’ workflows. After replacing off-the-shelf models with dozens of custom and fine-tuned embedding models, the team faced challenges with low-latency inference, global scaling, and support for heterogeneous model architectures. As a lean engineering team, it also did not want to build GPU infrastructure in-house. To address these needs, Superhuman turned to Baseten and its embedding inference stack.
Baseten deployed Superhuman’s models on Baseten Embeddings Inference and paired the deployment with autoscaling, multi-cloud capacity management, and performance-optimized client tooling. In just one week, Baseten helped Superhuman cut P95 latency by 80%, reaching a P95 response time of 100 ms across dozens of custom models and freeing engineers to focus on product work instead of infrastructure.
Loïc Houssier
Chief Technology Officer