Case Study: GBIF achieves real-time, scalable access to 1.4B biodiversity records with Cloudera

A Cloudera Case Study

Preview of the GBIF Case Study

Global Biodiversity Information Facility (GBIF) facilitates free and open access to biodiversity data in real time

GBIF is an international research infrastructure that aggregates free, open biodiversity data from nearly 1,600 institutions in over 130 countries. As its data grew, ranging from museum specimens to citizen photos, its MySQL-based systems became fragmented and could no longer support near-real-time ingestion, indexing, and large-scale analysis across billions of records, which created a bottleneck for data sharing and research.

GBIF moved to a Cloudera Hadoop-based data lake (using HBase, Solr, Hive, etc.), enabling near-real-time processing, quality control, and indexing, with updates applied at about 10,000 records per second. The platform supports filtered searches across 1.4 billion records with unrestricted exports. The move saved roughly one full-time equivalent (~20% of team capacity) and significantly improved access and analysis for researchers worldwide.



GBIF

Tim Robertson

Informatics Team Lead

