Case Study: The Factory achieves high-availability observability across multiple data centers with Grafana

A Grafana Case Study

Preview of The Factory Case Study

Bootstrapping a cloud native multi-data center observability stack

The Factory, a DevOps cloud engineering team, needed to make its observability stack resilient across two data centers after realizing that its monitoring tools were available in only one location. Working with Grafana and tools like Grafana Loki, Grafana Tempo, Grafana Agent, Prometheus, Alertmanager, and Consul, the team set out to ensure it could still see what was happening in its applications even if a component or an entire campus failed.

Grafana helped The Factory build a duplicated, cloud-native observability setup spanning two data centers. The setup combines service discovery, DNS failover, replicated Grafana instances, Prometheus scraping via Consul, dual-write logging with Loki, tracing through Grafana Agent into two Tempo endpoints, and clustered Alertmanager. The result is a fully functioning high-availability observability stack: The Factory can lose an individual service or an entire data center and still maintain visibility. The case study does not cite a specific quantitative performance gain.
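To make the dual-write logging idea concrete, here is a minimal Python sketch that sends the same log line to two Loki instances, one per data center, and reports which writes succeeded. It is an illustration under stated assumptions, not The Factory's implementation: the endpoint hostnames and the helper names (`build_push_payload`, `http_post`, `dual_write`) are hypothetical, and the only real interface used is the JSON body shape of Loki's `/loki/api/v1/push` endpoint. In practice dual-writing is typically configured in the log shipper rather than written by hand.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List, Optional

# Hypothetical endpoints, one Loki instance per data center (not from the case study).
LOKI_ENDPOINTS: List[str] = [
    "http://loki.dc1.example.internal:3100/loki/api/v1/push",
    "http://loki.dc2.example.internal:3100/loki/api/v1/push",
]


def build_push_payload(labels: Dict[str, str], line: str,
                       ts_ns: Optional[int] = None) -> bytes:
    """Encode one log line in the JSON shape accepted by Loki's push API."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Loki expects the timestamp as a string of nanoseconds since the epoch.
    body = {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}
    return json.dumps(body).encode("utf-8")


def http_post(url: str, payload: bytes) -> None:
    """Default sender: POST the payload as JSON; raises on network or HTTP errors."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5):
        pass


def dual_write(payload: bytes, endpoints: List[str],
               send: Callable[[str, bytes], None] = http_post) -> Dict[str, bool]:
    """Send the same payload to every endpoint independently.

    A failure in one data center must not prevent the write to the other,
    so each send is attempted and its outcome recorded separately.
    """
    results: Dict[str, bool] = {}
    for url in endpoints:
        try:
            send(url, payload)
            results[url] = True
        except Exception:
            results[url] = False
    return results
```

A caller would build a payload with something like `build_push_payload({"app": "demo"}, "hello")` and pass it to `dual_write` with `LOKI_ENDPOINTS`. The `send` parameter is injectable so the failure handling can be exercised without a live Loki instance.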



The Factory

Bram Vogelaar

DevOps Cloud Engineer


Grafana
