Case Study: Fastly achieves scalable distributed health checking with HashiCorp Serf

A HashiCorp Case Study

Preview of the Fastly Case Study

Fastly, the edge cloud and CDN provider, needed a highly accurate and timely way to health-check servers across its globally distributed points of presence. Because Fastly’s PoPs are space- and power-constrained and must stay available even under heavy load, it had to avoid both leaving unhealthy hosts in rotation and mistakenly removing healthy ones during traffic spikes. To address this, Fastly worked with HashiCorp and used Serf as part of its production health-checking approach.

The solution used Serf’s gossip-based distribution to share health signals across hosts, while each node applied local filtering, anomaly detection, and hysteresis to make stable up/down decisions for its own server. This architecture helped Fastly detect failures quickly, reduce flapping, and avoid cascading outages that could take an entire PoP out of service. The case study gives no hard numeric results, but it reports that the system has run successfully in production and that Serf proved easy to use, scalable, and robust.
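
To make the idea of hysteresis concrete, here is a minimal Python sketch of a per-node up/down decision. It is an illustration only and not Fastly's implementation: the case study does not describe the actual filter or thresholds, so the class name `HysteresisHealth`, the exponentially weighted moving average, and every numeric value below are assumptions. The gossip layer is not modeled; `sample` stands in for whatever health signal a node derives locally.

```python
from dataclasses import dataclass


@dataclass
class HysteresisHealth:
    """Toy up/down decision: smoothing plus two thresholds (all values hypothetical)."""

    alpha: float = 0.5           # weight given to the newest sample in the moving average
    down_threshold: float = 0.3  # an "up" host goes down only below this smoothed score
    up_threshold: float = 0.7    # a "down" host comes back only above this smoothed score
    score: float = 1.0           # smoothed health score in [0, 1]; starts fully healthy
    is_up: bool = True

    def observe(self, sample: float) -> bool:
        """Fold one health sample (0.0 = bad, 1.0 = good) into the state."""
        # Local filtering: smooth the raw signal so one bad probe does not flip the state.
        self.score = self.alpha * sample + (1 - self.alpha) * self.score
        # Hysteresis: the threshold that applies depends on the current state,
        # so scores between the two thresholds never cause a transition.
        if self.is_up and self.score < self.down_threshold:
            self.is_up = False
        elif not self.is_up and self.score > self.up_threshold:
            self.is_up = True
        return self.is_up


def run(samples):
    """Return the up/down decision after each sample, starting from a healthy host."""
    health = HysteresisHealth()
    return [health.observe(s) for s in samples]
```

With these assumed values, a single failed sample between healthy ones leaves the host up, while two consecutive failures take it down and two consecutive successes are needed to bring it back. The gap between the two thresholds is what suppresses flapping.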


Fastly

Lorenzo Saino

Senior Software Engineer


HashiCorp
