Case Study: Cursor achieves 1000 tokens/sec fast code edits with Fireworks AI's Speculative Decoding API

A Fireworks AI Case Study


How Cursor built Fast Apply using the Speculative Decoding API

Cursor, an AI-native IDE, found that existing frontier models like GPT-4 were slow and error-prone when handling large code edits, disrupting developer workflows. To overcome this, Cursor partnered with Fireworks AI to build its "Fast Apply" feature.

Fireworks AI deployed Cursor's custom fine-tuned Llama-3-70b model using their Speculative Decoding API. By generating and verifying multiple tokens in parallel, the deployment reached approximately 1,000 tokens per second, a 13x speedup over standard inference and a 9x improvement over Cursor's previous GPT-4 deployment, allowing developers to apply code changes near-instantly.
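The case study does not publish Fireworks' implementation, but the core idea of speculative decoding can be sketched in a few lines. In this toy Python illustration (all model names and rules are invented for the example), a cheap "draft" model proposes a block of tokens, and a slower "target" model verifies the whole block at once, accepting the longest matching prefix and substituting its own token at the first mismatch. Because most draft tokens are typically accepted, the expensive model runs far fewer sequential steps, which is the source of the speedup.

```python
# Toy sketch of speculative decoding (NOT Fireworks' actual API).
# Both "models" here are hypothetical deterministic next-token rules
# over integer tokens, chosen so the example is self-contained.

def draft_model(prefix):
    # Fast but imperfect draft: always predicts successor mod 10.
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Slower "accurate" model: same rule, except it emits 0 after a 7.
    return 0 if prefix[-1] == 7 else (prefix[-1] + 1) % 10

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens, letting the draft model propose k at a time."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model cheaply proposes k tokens sequentially.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model verifies all k positions; in a real system this
        #    is a single parallel forward pass over the drafted sequence.
        accepted, ctx = [], list(out)
        for t in draft:
            expected = target_model(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)  # target's correction; stop here
                break
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n_tokens]

def greedy_decode(prompt, n_tokens):
    """Baseline: run the target model one token at a time."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out[len(prompt):]
```

The key property, visible even in this sketch, is that speculative decoding is lossless: its output exactly matches what the target model would have produced on its own, only with fewer sequential target-model steps.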
