Case Study: Coursera achieves high ElastiCache performance and prevents cache evictions with Datadog monitoring

A Datadog Case Study


How Coursera monitors ElastiCache and Memcached performance

Coursera, a leading online education platform with thousands of courses and millions of learners, uses Amazon ElastiCache (Memcached) as a read-through cache on top of Cassandra to serve course and membership metadata at scale. The challenge was to keep cache memory healthy, avoid evictions and hot-key overloads, and quickly detect network saturation or node anomalies. Any drop in hit rate sends more reads through to Cassandra, which sharply increases backend load and degrades the user experience.
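To make the dependency between hit rate and backend load concrete, here is a minimal Python sketch of the read-through pattern. It assumes an in-process dict in place of Memcached and a plain function in place of the Cassandra query; `ReadThroughCache`, `expected_backend_rps`, and `load_course` are hypothetical names for illustration, not Coursera's code. Every miss turns into exactly one backend read, so backend load scales with (1 − hit rate).

```python
from collections.abc import Callable, Hashable
from typing import Generic, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Serve reads from the cache; on a miss, load from the backend and populate."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader          # stands in for a Cassandra query (assumption)
        self._store: dict[K, V] = {}   # stands in for Memcached (assumption)
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> V:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._loader(key)      # each miss costs exactly one backend read
        self._store[key] = value       # populate so the next read of this key hits
        return value

    def evict(self, key: K) -> None:
        """Simulate an eviction: the next read of this key goes to the backend."""
        self._store.pop(key, None)

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def expected_backend_rps(request_rps: float, hit_rate: float) -> float:
    """Backend reads per second implied by a request rate and a cache hit rate."""
    return request_rps * (1.0 - hit_rate)


if __name__ == "__main__":
    backend_calls: list[str] = []

    def load_course(course_id: str) -> dict[str, str]:
        backend_calls.append(course_id)  # record each trip to the backend
        return {"id": course_id}

    cache = ReadThroughCache(load_course)
    for cid in ["c1", "c2", "c1", "c1", "c2"]:
        cache.get(cid)
    print(cache.hit_rate, len(backend_calls))  # 0.6 2
```

With illustrative numbers, at 10,000 requests per second a hit rate falling from 99% to 95% raises backend reads from about 100 to about 500 per second: a fivefold increase from a four-point drop. Evictions have the same effect, because an evicted key's next read is a miss.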

Coursera adopted Datadog to collect ElastiCache and application metrics and correlate them, tracking memory usage, the distribution of gets and sets, network throughput, and events. The team set targeted alerts on evictions, available memory, hit rate, and swap usage, routed through PagerDuty, Slack, or email. The result was unified dashboards and timeboards for fast root-cause analysis, proactive per-node and per-cluster alerts, fewer evictions, consistently high hit rates, and faster incident response and capacity planning.
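The alert conditions above can be sketched as a check over two successive samples of a node's counters. This is a hedged illustration of the logic only, not Datadog's monitor API (Datadog monitors are configured in Datadog itself): the `NodeSample` fields mirror Memcached `stats` counters (`get_hits`, `get_misses`, `evictions`, `bytes`, `limit_maxbytes`), `swap_bytes` assumes a source such as ElastiCache's `SwapUsage` metric, and every threshold in `Thresholds` is a hypothetical placeholder. Routing to PagerDuty, Slack, or email is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSample:
    """One reading of a cache node; cumulative counters mirror Memcached `stats`."""
    get_hits: int        # cumulative
    get_misses: int      # cumulative
    evictions: int       # cumulative
    bytes_used: int      # Memcached `bytes`: bytes currently stored
    limit_maxbytes: int  # configured memory limit (assumed > 0)
    swap_bytes: int      # assumption: e.g. taken from ElastiCache SwapUsage


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical alert thresholds; tune per workload."""
    min_hit_rate: float = 0.95
    max_new_evictions: int = 0
    min_free_ratio: float = 0.10
    max_swap_bytes: int = 50 * 1024 * 1024


def evaluate(prev: NodeSample, cur: NodeSample,
             t: Thresholds = Thresholds()) -> list[str]:
    """Return alert messages for the interval between two samples."""
    alerts: list[str] = []
    # Hit rate over the interval, from counter deltas rather than lifetime totals.
    hits = cur.get_hits - prev.get_hits
    lookups = hits + (cur.get_misses - prev.get_misses)
    if lookups and hits / lookups < t.min_hit_rate:
        alerts.append(f"hit rate {hits / lookups:.1%} below {t.min_hit_rate:.0%}")
    new_evictions = cur.evictions - prev.evictions
    if new_evictions > t.max_new_evictions:
        alerts.append(f"{new_evictions} evictions since last sample")
    free_ratio = 1 - cur.bytes_used / cur.limit_maxbytes
    if free_ratio < t.min_free_ratio:
        alerts.append(f"only {free_ratio:.1%} memory free")
    if cur.swap_bytes > t.max_swap_bytes:
        alerts.append(f"swap usage {cur.swap_bytes} bytes")
    return alerts


if __name__ == "__main__":
    prev = NodeSample(1000, 10, 5, 500, 1000, 0)
    cur = NodeSample(1900, 110, 8, 950, 1000, 0)
    for message in evaluate(prev, cur):
        print(message)
```

Computing the hit rate from counter deltas makes the alert reflect the most recent interval; a lifetime ratio would barely move after a sudden drop on a long-running node.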
