Case Study: Gremlin achieves real-time chaos monitoring and faster recovery with Datadog

A Datadog Case Study

How Gremlin monitors its own Chaos Engineering service with Datadog

Gremlin, a Chaos Engineering company, needed a way to monitor its own platform and demonstrate reliability across diverse environments, catching failures before they reached customers. The challenge was to get real-time, actionable visibility into system health and key user flows while running intentional failure experiments against microservices, VMs, and APIs.

Gremlin solved this by integrating with Datadog. It uses template variables to build dynamic dashboards, Datadog Synthetic Monitoring to watch critical user journeys, and a Gremlin–Datadog integration that publishes chaos events to Datadog and annotates graphs in real time. The result is faster detection and troubleshooting during experiments, clear visual correlation between attacks and system behavior, and greater confidence to run controlled failures that uncover issues so they can be fixed before they affect customers.


Gremlin

Matthew Fornaciari

Co-Founder and CTO