Case Study: Connectifier restores MongoDB performance and uptime with Datadog

A Datadog Case Study

How Connectifier unfroze MongoDB with Datadog

Connectifier, a rapidly growing recruiting platform that runs across dozens of machines and processes hundreds of millions of datapoints, began experiencing severe performance degradation in August that forced its SREs to restart services several times a day. With suspects ranging from memory leaks to database contention, the team needed to correlate metrics across services to find the root cause, so it began using Datadog to collect those metrics in one place.

Using Datadog (and contributing a patch to its open-source agent so the agent could handle MongoDB timeouts), Connectifier quickly traced the problem to MongoDB's WiredTiger cache: growth in the cache consistently preceded spikes in read tickets and latency. The team enlarged the WiredTiger cache on the running servers to stop the immediate outages, added tcmalloc and cursor metrics for deeper visibility, and ultimately upgraded to MongoDB 3.0.6 to eliminate the underlying bug.

The result was stable performance, alerting that let the team act before outages occurred, and a smoother troubleshooting workflow built on Datadog's integrations and open-source agent.


Open case study document...

Connectifier

Ben McCann

Co-Founder


Datadog

90 Case Studies