Case Study: Rime achieves sub-300 ms p99 latency and 100% uptime with Baseten

A Baseten Case Study

Rime serves its speech synthesis API with stellar uptime using Baseten

Rime, a San Francisco-based company building a speech synthesis API with 200+ distinct voices, needed a fast, reliable way to bring its custom text-to-speech models to market. After training its own best-in-class models, the team required enterprise-grade inference infrastructure that could support real-time use cases, strict uptime requirements, and low p99 latency, especially for large audio responses.

Baseten provided distributed model serving infrastructure with multi-region deployment, smart batching, CPU/GPU workload separation, and flexible GPU scaling. With Baseten, Rime achieved p99 latencies below its 300 ms SLA, maintained 100% uptime in 2024, and was able to support compliance-sensitive enterprise customers while scaling efficiently across different NVIDIA GPU types.


Rime

Lily Clifford

Co-Founder and CEO

