A Groq Case Study
Willow, an AI voice dictation startup, struggled with reliability and latency while self-hosting its LLMs: weekly outages caused by GPU instability, plus slow responses on long prompts, were eroding user trust and growth. To solve this, Willow moved its LoRA fine-tuned Llama-3.1-8b workload to Groq's cloud infrastructure (GroqCloud) for the real-time performance and uptime it needed.
Groq ran Willow's LoRA fine-tuned model on a dedicated GroqCloud instance, using its LPU architecture and speculative decoding to raise token throughput and cut latency. The results: zero downtime, responses 300–500 ms faster, fewer support requests, higher user retention, and near-on-demand model weight swaps. Together, these measurable gains made real-time voice reliable and scalable for Willow.
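To give a sense of why speculative decoding cuts latency, here is a minimal toy sketch of the greedy variant: a fast draft model proposes a few tokens ahead, and the slower target model verifies them in one pass, accepting the longest agreeing prefix. The two "models" below are hypothetical stand-ins (simple next-token rules), not Groq's actual implementation.

```python
def draft_next(ctx):
    # Toy fast draft model: predicts the next token as last + 1.
    return ctx[-1] + 1

def target_next(ctx):
    # Toy slow target model: same rule, but caps tokens at 5,
    # so it eventually disagrees with the draft.
    return min(ctx[-1] + 1, 5)

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens after prompt, verifying k drafted tokens per step."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft model cheaply proposes k tokens ahead.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target model verifies the proposals (a single batched
        #    forward pass in a real system) and keeps the agreeing prefix.
        accepted, ctx = [], list(out)
        for t in proposal:
            expected = target_next(ctx)
            if t != expected:
                accepted.append(expected)  # take the target's token, stop
                break
            accepted.append(t)
            ctx.append(t)
        out.extend(accepted)
    return out[:len(prompt) + n_tokens]

print(speculative_decode([1], 6))  # → [1, 2, 3, 4, 5, 5, 5]
```

When draft and target agree, several tokens land per target-model pass instead of one, which is where the latency win comes from; a disagreement still yields one correct token, so output quality matches the target model alone.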
Lawrence Liu
CTO & Co-Founder