Case Study: Synadia catches one-in-a-million NATS data loss bugs with Antithesis

An Antithesis Case Study

Preview of the Synadia Case Study

Hunting for one-in-a-million bugs in NATS

Synadia, the company behind the open-source CNCF project NATS, needed to ensure the dependability of its distributed messaging system. Despite a rigorous seven-layer testing strategy that included traditional methods such as chaos testing, the team struggled to find exceedingly rare, complex bugs in the Raft consensus layer that could cause catastrophic data loss in durable message queues. To hunt for these one-in-a-million bugs, they turned to the Antithesis testing platform.

Antithesis systematically searched for and reproduced deeply nested failure scenarios that traditional methods missed. In Synadia's very first experiment, Antithesis discovered a terrifying data loss bug triggered by a specific sequence of failures during server recovery and restarts. This find allowed Synadia to fix a critical issue, significantly improving NATS's reliability. As a result, Antithesis has become an indispensable tool for Synadia: it makes development more linear by catching severe bugs early, and it helps the team quickly reproduce flaky tests and customer issues.


Synadia

Marco Primi
