Case Study: PagerDuty achieves safe, 30x-a-day deployments with Datadog

A Datadog Case Study

Preview of the PagerDuty Case Study

How PagerDuty deploys safely with Datadog

PagerDuty, the incident-management platform, was moving to rapid continuous deployment (up to ~30 deploys/day) and relied on canary releases monitored in Datadog to catch regressions. After the team decoupled developers from manual deploys, a change that broke background task processing slipped through: the alerts auto-resolved during a rollback, so the faulty code went on to reach production and caused a ~10-minute outage.

PagerDuty addressed this by scripting automated canary checks against Datadog metrics (initially using a simple Ruby script) and integrating those checks into their build pipeline. After a short canary window, the script validates key metrics; on failure, it halts the rollout and notifies the committer. The result was safer, faster continuous deployment with less manual babysitting of deploys, quicker fixes, and a lower risk of similar incidents.
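
The gating step described above can be sketched roughly as follows. This is a minimal illustration in Python, not PagerDuty's Ruby script: the `MetricCheck` type, the `failed_checks` and `gate_rollout` functions, the metric names, and the thresholds are all hypothetical. The metric lookup (`fetch_latest`) and the notification (`notify`) are passed in as plain callables, so no Datadog API call is assumed here. Treating a missing metric as a failure is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MetricCheck:
    """One canary rule: a metric query and the bounds it must stay within."""
    query: str                         # hypothetical metric query string
    min_value: Optional[float] = None  # fail if the value drops below this
    max_value: Optional[float] = None  # fail if the value rises above this


def failed_checks(
    checks: List[MetricCheck],
    fetch_latest: Callable[[str], Optional[float]],
) -> List[str]:
    """Return one human-readable message per check that did not pass."""
    failures = []
    for check in checks:
        value = fetch_latest(check.query)
        if value is None:
            # Assumption: no data from the canary counts as a failure.
            failures.append(f"{check.query}: no data")
        elif check.min_value is not None and value < check.min_value:
            failures.append(f"{check.query}: {value} is below {check.min_value}")
        elif check.max_value is not None and value > check.max_value:
            failures.append(f"{check.query}: {value} is above {check.max_value}")
    return failures


def gate_rollout(
    checks: List[MetricCheck],
    fetch_latest: Callable[[str], Optional[float]],
    notify: Callable[[str, List[str]], None],
    committer: str,
) -> bool:
    """Run after the canary window: True to continue the rollout, False to halt."""
    failures = failed_checks(checks, fetch_latest)
    if failures:
        notify(committer, failures)  # tell the committer what failed
        return False
    return True
```

In a build pipeline, a wrapper around `gate_rollout` would typically exit with a non-zero status when it returns `False`, so that the pipeline stops the rollout on its own instead of depending on an alert that may auto-resolve.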



PagerDuty

Mia Henderson

Site Reliability Engineer

