Case Study: Fireworks.ai achieves 4x higher throughput and up to 50% lower latency with Amazon Web Services

An Amazon Web Services Case Study


Fireworks.ai Delivers 4x Throughput for Generative AI and Cuts Latency by up to 50% Using AWS and NVIDIA

Fireworks.ai worked with Amazon Web Services to optimize the compute power behind its demanding generative AI inference engine. The company had been using Amazon EC2 P4d instances and needed a more flexible, cost-optimized way to meet growing performance demands while keeping latency and costs under control.

Amazon Web Services helped Fireworks.ai move to Amazon EC2 P5 instances, enabling up to 4x higher throughput per instance and up to 4x lower costs for some customers. The results included a 30–50% latency reduction for one summarization model, more than 2x faster backend latency and a doubled completion acceptance rate for Sourcegraph’s Cody, and strong price-performance gains across customer use cases.


Fireworks.ai

Dmytro Dzhulgakov

Co-Founder and Chief Technology Officer
