Cloudera
A Cloudera Case Study
GBIF (the Global Biodiversity Information Facility) is an international research infrastructure that aggregates free, open biodiversity data from nearly 1,600 institutions in over 130 countries. As the data grew, spanning everything from museum specimens to citizen-contributed photos, its MySQL-based systems became fragmented and could not support near-real-time ingestion, indexing, and large-scale analysis across billions of records. The result was a bottleneck for data sharing and research.
GBIF moved to a Cloudera Hadoop-based data lake (using HBase, Solr, and Hive, among other components), which enables near-real-time processing, quality control, and indexing, with updates applied at about 10,000 records per second. The platform supports filtered searches across 1.4 billion records with unrestricted exports. The move saved roughly one full-time equivalent (about 20% of team capacity) and significantly improved access and analysis for researchers worldwide.
Tim Robertson
Informatics Team Lead