Case Study: Cursor achieves 1000 tokens/sec fast code edits with Fireworks AI's Speculative Decoding API

A Fireworks AI Case Study


How Cursor built Fast Apply using the Speculative Decoding API

Cursor, an AI-native IDE, found that existing frontier models like GPT-4 were slow and error-prone when handling large code edits, disrupting developer workflows. To overcome this, Cursor partnered with Fireworks AI to build its "Fast Apply" feature.

Fireworks AI deployed Cursor's custom fine-tuned Llama-3-70b model using their Speculative Decoding API. By generating and verifying multiple tokens in parallel, the deployment reached approximately 1,000 tokens per second, a 13x speedup over standard inference and a 9x improvement over Cursor's previous GPT-4 deployment, allowing developers to apply code changes near-instantly.
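The case study does not publish Fireworks' implementation, but the core idea of speculative decoding can be sketched in a few lines. In this toy Python illustration (all model names and rules are invented for the example), a cheap "draft" model proposes a block of tokens, and a slower "target" model verifies the whole block at once, accepting the longest matching prefix and substituting its own token at the first mismatch. Because most draft tokens are typically accepted, the expensive model runs far fewer sequential steps, which is the source of the speedup.

```python
# Toy sketch of speculative decoding (NOT Fireworks' actual API).
# Both "models" here are hypothetical deterministic next-token rules
# over integer tokens, chosen so the example is self-contained.

def draft_model(prefix):
    # Fast but imperfect draft: always predicts successor mod 10.
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Slower "accurate" model: same rule, except it emits 0 after a 7.
    return 0 if prefix[-1] == 7 else (prefix[-1] + 1) % 10

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens, letting the draft model propose k at a time."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model cheaply proposes k tokens sequentially.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model verifies all k positions; in a real system this
        #    is a single parallel forward pass over the drafted sequence.
        accepted, ctx = [], list(out)
        for t in draft:
            expected = target_model(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)  # target's correction; stop here
                break
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n_tokens]

def greedy_decode(prompt, n_tokens):
    """Baseline: run the target model one token at a time."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out[len(prompt):]
```

The key property, visible even in this sketch, is that speculative decoding is lossless: its output exactly matches what the target model would have produced on its own, only with fewer sequential target-model steps.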
