Case Study: PagerDuty achieves high availability and near-zero downtime with Percona XtraDB Cluster (Percona)

A Percona Case Study

Preview of the PagerDuty Case Study

PagerDuty Relies on Percona for Its MySQL Cluster in the Cloud

PagerDuty, a service that routes alerts and manages on-call schedules for IT and DevOps teams, ran its primary MySQL infrastructure on EC2 using MySQL Community Edition with DRBD-based synchronous replication. That setup had three problems: failovers were manual and caused at least two minutes of downtime (longer when a cold server had to be spun up), EBS storage could suffer network-related outages, and recovery was slow, with MySQL dumps taking about 15 hours to restore.

PagerDuty migrated to Percona’s stack: Percona Server with Percona XtraDB Cluster (a three-node cluster on EC2 behind HAProxy), Percona XtraBackup, and Percona Toolkit. The new architecture provides fast, automated failover with no customer impact, makes it safe to use EC2 instance store for higher-performance local disks, and shortens recovery considerably: binary restores take 2–3 hours, and resyncing from another node can get replication running in minutes. Overall, the move eliminated the earlier downtime pain points and delivered reliable, production-proven high availability.
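
As a rough illustration of how automated failover behind a proxy can work in a three-node cluster, the sketch below picks the first node that reports itself as healthy. It is not PagerDuty's actual configuration, which the summary does not describe. The node names, the priority order, and the `is_healthy` and `pick_writer` helpers are hypothetical. The status variables `wsrep_cluster_status` and `wsrep_local_state` are standard Galera status variables (a value of 4 means Synced), and the dictionaries stand in for the output of `SHOW GLOBAL STATUS` on each node.

```python
from typing import Dict, List, Optional, Tuple

SYNCED = 4  # Galera wsrep_local_state: 1=Joining, 2=Donor/Desynced, 3=Joined, 4=Synced


def is_healthy(status: Dict[str, str]) -> bool:
    """A node is routable only if it is in the primary component and fully synced."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and int(status.get("wsrep_local_state", 0)) == SYNCED
    )


def pick_writer(nodes: List[Tuple[str, Dict[str, str]]]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if none qualify."""
    for name, status in nodes:
        if is_healthy(status):
            return name
    return None


# Hypothetical status snapshots: db1 is acting as a donor, so traffic moves to db2.
cluster = [
    ("db1", {"wsrep_cluster_status": "Primary", "wsrep_local_state": "2"}),
    ("db2", {"wsrep_cluster_status": "Primary", "wsrep_local_state": "4"}),
    ("db3", {"wsrep_cluster_status": "Primary", "wsrep_local_state": "4"}),
]

print(pick_writer(cluster))
```

In a real deployment this decision would normally be made by the proxy's own health checks rather than by application code; the sketch only shows the selection rule.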



PagerDuty

Doug Barth

Operations Engineer

