Datadog
90 Case Studies
A Datadog Case Study
Connectifier, a rapidly growing recruiting platform that runs across dozens of machines and processes hundreds of millions of datapoints, began experiencing severe performance degradation in August that forced SREs to reboot services multiple times a day. Suspecting anything from memory leaks to database contention, the team needed better cross-service correlation to find the root cause and started using Datadog to gather and correlate metrics.
Using Datadog (and contributing a patch to its open-source agent to handle MongoDB timeouts), Connectifier quickly traced the issue to MongoDB’s WiredTiger cache: cache growth preceded spikes in read tickets and latency. They increased the WiredTiger cache live to stop the immediate outages, added tcmalloc and cursor metrics for deeper visibility, and ultimately upgraded to MongoDB 3.0.6 to eliminate the bug. The result was stable performance, effective alerting to preempt outages, and a smoother troubleshooting workflow enabled by Datadog’s integrations and open-source agent.
Ben McCann
Co-Founder