Case Study: Lyft achieves full-control, highly scalable operational logging with Elastic (self-managed Elasticsearch)

An Elastic Case Study

Preview of the Lyft Case Study

Migrating from Splunk Cloud to Amazon ES to Self-Managed Elasticsearch

Lyft, a fast-growing rideshare company handling hundreds of millions of rides annually with 200+ microservices and 10,000+ EC2 instances, needed reliable, scalable operational logging to keep its services running. The company moved off Splunk Cloud because of retention limits, ingest backlogs, and cost, then onto Amazon Elasticsearch Service. There, AWS-imposed limits (older Elasticsearch versions, EBS performance, node caps, and a lack of direct cluster access that forced slow support escalations) created reliability and control challenges.

Lyft first transitioned to Amazon ES in about a month. Then, wanting full feature access and operational control, the team migrated to a self-managed Elasticsearch deployment in two weeks. Removing ingest limits enabled growth from ~100K to 1.5M events per minute. Self-management also restored access to the full APIs, gave the team more freedom to make performance choices, and sped up cluster recovery, giving the Observability team the control it needed to keep clusters green and engineers productive.



Lyft

Michael Goldsby

Software Engineer


Elastic
