Case Study: Medium achieves reduced DynamoDB throttling and improved performance with Datadog

A Datadog Case Study

Preview of the Medium Case Study

How Medium monitors DynamoDB performance

Medium, a fast-growing publishing platform, relies on DynamoDB to scale its infrastructure but ran into performance problems caused by throttling. Capacity provisioned for a whole table hid the limits that apply to each partition, so a “hot key” (such as a viral post) could exhaust a single partition’s throughput. Unless anticipated and managed, this led to high latency and user-facing errors.
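To make the per-partition limit concrete, here is a minimal arithmetic sketch. It assumes AWS's historical partitioning guidance (roughly 3,000 read units, 1,000 write units, and 10 GB per partition); those constants, the function names, and the sample numbers are illustrative assumptions and do not come from the case study.

```python
import math

# Assumed per-partition ceilings from AWS's historical guidance.
# These are illustrative; they are not figures reported by Medium.
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000
MAX_GB_PER_PARTITION = 10


def estimate_partitions(rcu, wcu, size_gb):
    """Estimate the partition count from provisioned throughput and table size."""
    by_throughput = math.ceil(rcu / MAX_RCU_PER_PARTITION + wcu / MAX_WCU_PER_PARTITION)
    by_size = math.ceil(size_gb / MAX_GB_PER_PARTITION)
    return max(by_throughput, by_size, 1)


def per_partition_limits(rcu, wcu, size_gb):
    """Divide table-level capacity evenly across the estimated partitions."""
    n = estimate_partitions(rcu, wcu, size_gb)
    return {
        "partitions": n,
        "rcu_per_partition": rcu / n,
        "wcu_per_partition": wcu / n,
    }


# Hypothetical table: 6,000 RCU and 2,000 WCU provisioned, 80 GB stored.
limits = per_partition_limits(rcu=6000, wcu=2000, size_gb=80)
print(limits)
```

Under these assumptions the table is split into 8 partitions, so a single hot key can use only 750 of the 6,000 provisioned read units before it is throttled.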

To address this, Medium uses Datadog together with an ELK pipeline. The team estimates the partition count to compute per-partition limits, logs and surfaces the hottest keys, and reports a custom throttling metric from the application to Datadog for real-time alerts. They also front DynamoDB with Redis, set staged/prod alerting (email/Slack/PagerDuty), track backups with a custom metric, and plan automated capacity tuning. The result is better visibility, fewer user-facing failures, faster response to incidents, and opportunities to optimize costs.
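The hot-key and throttling steps above can be sketched roughly as follows. This is not Medium's code: it swaps the ELK pipeline for an in-memory counter, and the metric name `app.dynamodb.throttled`, the `table` tag, and the helper names are hypothetical. The datagram follows the DogStatsD counter format, and 8125 is the default DogStatsD port.

```python
import socket
from collections import Counter


def hottest_keys(accessed_keys, n=3):
    """Return the n most frequently accessed keys with their counts."""
    return Counter(accessed_keys).most_common(n)


def throttle_datagram(table, count=1, metric="app.dynamodb.throttled"):
    """Format a DogStatsD counter datagram (metric and tag names are hypothetical)."""
    return f"{metric}:{count}|c|#table:{table}"


def report_throttle(table, host="127.0.0.1", port=8125):
    """Send one throttle event over UDP to a local Datadog Agent, assumed to be running."""
    payload = throttle_datagram(table).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical access log: the key of every item read in some time window.
keys = ["post:42", "post:7", "post:42", "post:42", "post:7", "post:9"]
top = hottest_keys(keys, n=2)
print(top)
print(throttle_datagram("posts"))
```

In an application, `report_throttle` would be called from the code path that catches a throttled DynamoDB request, so that an alert can be set on the resulting counter.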

