A Datadog Case Study
PagerDuty, the incident-management platform, was moving to rapid continuous deployment (up to ~30 deploys per day) and relied on canary releases monitored in Datadog to catch regressions. After the company decoupled developers from manual deploys, a change that broke background task processing slipped through: the alerts auto-resolved during a rollback, so the faulty code reached production and caused a ~10-minute outage.
PagerDuty addressed this by scripting automated canary checks against Datadog metrics (initially with a simple Ruby script) and integrating those checks into its build pipeline. After a short canary window, the script validates key metrics; if a check fails, it halts the rollout and notifies the committer. The result was safer, faster continuous deployment with less manual babysitting of deploys, quicker fixes, and a lower risk of similar incidents.
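The gating step described above can be sketched roughly as follows. This is a minimal illustration in Python, not PagerDuty's Ruby script, which the case study does not reproduce. The names `Check`, `evaluate_canary`, and `gate_rollout`, the query strings, and the thresholds are all hypothetical. Metric retrieval and notification are passed in as plain callables, so the sketch makes no assumptions about the Datadog API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Check:
    """One canary rule (hypothetical): a metric query and its highest acceptable value."""
    name: str
    query: str        # placeholder query string, passed through to fetch_metric
    max_value: float  # placeholder threshold


def evaluate_canary(checks: List[Check],
                    fetch_metric: Callable[[str], float]) -> List[str]:
    """Return one message per check whose metric exceeds its threshold."""
    failures = []
    for check in checks:
        value = fetch_metric(check.query)
        if value > check.max_value:
            failures.append(f"{check.name}: {value} > {check.max_value}")
    return failures


def gate_rollout(checks: List[Check],
                 fetch_metric: Callable[[str], float],
                 notify: Callable[[str, List[str]], None],
                 committer: str) -> bool:
    """Return True if the rollout may continue; otherwise notify the committer and return False."""
    failures = evaluate_canary(checks, fetch_metric)
    if failures:
        notify(committer, failures)
        return False
    return True
```

In a build pipeline, a wrapper would presumably wait out the canary window, call `gate_rollout`, and exit with a non-zero status when it returns `False` so that the remaining deploy stages do not run. Reading metric values directly at the end of the window, rather than checking whether an alert is currently firing, means a result does not depend on alert state.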
Mia Henderson
Site Reliability Engineer