Case Study: Leibniz Supercomputing Centre achieves stable supercomputer monitoring with Icinga

An Icinga Case Study

Preview of the Leibniz Supercomputing Centre (LRZ) Case Study

Leibniz Supercomputing Centre - Customer Case Study

The Leibniz Supercomputing Centre (LRZ), a leading European supercomputing center, faced the challenge of ensuring the stable operation of its powerful and complex SuperMUC-NG system. Its legacy monitoring tool was insufficient, and the center needed a way to identify issues proactively without impacting the performance of the supercomputer, which supports critical scientific research.

By implementing Icinga 2 in a high-availability setup with a master-satellite hierarchy, LRZ gained a robust and stable monitoring solution. The deployment monitors over 7,800 hosts and 76,000 services, enabling LRZ to prevent problems, share vital data efficiently across departments, and safeguard critical infrastructure such as its cooling system. The result is a streamlined, reliable monitoring process that is crucial to the supercomputer's operation. The system administrators are highly satisfied and plan further expansion with Icinga.



Leibniz Supercomputing Centre (LRZ)

Markus Michael Müller

System Administrator High Performance Systems Department

