Case Study: Zed Industries achieves 45% lower latency and 3.6x higher throughput with Baseten

A Baseten Case Study

Preview of the Zed Industries Case Study

Zed Industries serves 2x faster code completions with the Baseten Inference Stack

Zed Industries, the team behind the high-performance Zed code editor, needed ultra-fast inference for its Edit Prediction feature ahead of launch. Working against a tight timeline, the team wanted lower latency, higher throughput, multi-region capacity, and more visibility than its previous inference provider could offer. Zed turned to Baseten’s inference platform and performance optimization support to power Zeta, its code generation LLM.

Baseten worked closely with Zed’s team to optimize the deployment using TensorRT-LLM, KV caching, custom speculative decoding, lookahead decoding, and globally distributed GPUs with geo-aware routing. The result was 100% uptime, 45% lower p90 latency, 3.6x higher throughput, and unlimited autoscaling across regions, with Zed later achieving over 2x faster Edit Prediction on Baseten than with its prior provider.


Zed Industries

Nathan Sobo

CEO & Co-founder

Baseten

