Elastic
An Elastic Case Study
Workday Search needed a low‑maintenance, scalable way to collect and analyze operational and correctness metrics as it evolved from a monolith to microservices at large scale. Their existing setup (statsd → InfluxDB → Grafana) became a bottleneck as log and metric volumes exploded, and the team faced hard correctness requirements: per‑tenant on‑disk encryption, multi‑datacenter search, and strict reindex/query SLAs.
They replaced InfluxDB with a dedicated Elasticsearch “metrics” cluster fed by Logstash, Marvel, and application logs. A grok/json/ruby parsing pipeline handles the incoming data, a Scala StatsLogger keeps emission consistent, and the results are exposed to Grafana, Kibana, and Nagios. The pipeline now handles billions of data points (two‑week totals of ~15B logs and ~9.3B metrics) and enabled index rotation and curation. With it, the team drove incremental indexing coverage from ~50% to >90% and reached 94% of tenants reindexed in under 12 minutes (100% in under 3 hours). Its correctness metrics (set diffs, Kendall tau) uncovered multi‑shard discrepancies in search results and helped fix them.
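The case study names set diffs and Kendall tau as correctness metrics but does not say how Workday computed them. As a rough illustration only, the sketch below compares two ranked result lists for the same query, for example one from a single-shard index and one from a multi-shard index. The function names, the document IDs, and the choice of tau-a over shared documents are all assumptions made for this example, not details from the source.

```python
from itertools import combinations


def result_set_diff(expected, actual):
    """Return (missing, unexpected) document IDs between two result lists."""
    expected_set, actual_set = set(expected), set(actual)
    return expected_set - actual_set, actual_set - expected_set


def kendall_tau(expected, actual):
    """Kendall tau-a over the documents both rankings share.

    Assumes each list holds unique document IDs (no ties).
    Returns None when fewer than two documents are shared.
    """
    rank_in_actual = {doc: i for i, doc in enumerate(actual)}
    shared = [doc for doc in expected if doc in rank_in_actual]
    if len(shared) < 2:
        return None

    concordant = discordant = 0
    # Each pair (a, b) is taken in `expected` order, so a precedes b there.
    for a, b in combinations(shared, 2):
        if rank_in_actual[a] < rank_in_actual[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical result lists for one query.
single_shard = ["d1", "d2", "d3", "d4"]
multi_shard = ["d1", "d3", "d2", "d5"]

missing, unexpected = result_set_diff(single_shard, multi_shard)
tau = kendall_tau(single_shard, multi_shard)
print(missing, unexpected, tau)
```

A tau of 1.0 means the shared documents appear in the same order in both lists, and -1.0 means the order is fully reversed. Emitting the diff sizes and tau as metrics per query would make ordering drift visible on a dashboard, though whether Workday did it this way is not stated.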
Bodecker DellaMaria
Software Engineer