Case Study: Uber achieves faster, fresher ETL with Onehouse's Apache Hudi

A Onehouse Case Study

Preview of the Uber Case Study

Meeting the Demands of a High-Growth, Real-Time Business with the Efficiency of Incremental Processing

Uber, the global ride-hailing leader, faced significant data management challenges as it scaled. Real-time and near-real-time workloads such as ETA prediction and fraud detection depended on fresh data, but Uber's traditional ETL pipelines were inefficient and costly. These bulk ingestion workflows were poorly suited to incremental updates, which drove up operating expenses and made it hard to maintain the data freshness its services required. To address this, Uber turned to Onehouse.

Onehouse provided the solution through Apache Hudi, the open-source framework first developed at Uber, which turns a data lake into a transactional data lakehouse. With Hudi, Uber processes only new or updated data incrementally, sharply reducing how often unchanged files are recopied. The implementation yielded substantial results: Uber achieved 100% strong data consistency, reduced pipeline runtimes by 50%, improved SLAs by 60%, and cut ETL costs for critical tables by up to 79%, while significantly improving data quality and observability.
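
The idea of processing only new or updated data can be illustrated with a minimal sketch in plain Python. This is not Hudi's API: the Record type, the commit_time field, and the incremental_upsert helper are hypothetical names invented for illustration, and the in-memory dictionary only stands in for a table. The sketch assumes each change carries a monotonically increasing commit marker and that the pipeline remembers the last marker it processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str          # record key, e.g. a trip identifier (hypothetical)
    value: int        # payload; a single integer keeps the sketch small
    commit_time: int  # hypothetical monotonically increasing commit marker


def incremental_upsert(target, changes, checkpoint):
    """Upsert only records committed after `checkpoint`.

    Returns the new checkpoint and the number of records applied.
    """
    applied = 0
    new_checkpoint = checkpoint
    # Apply changes in commit order so the latest write to a key wins.
    for rec in sorted(changes, key=lambda r: r.commit_time):
        if rec.commit_time <= checkpoint:
            continue  # already handled by an earlier run; skip it
        target[rec.key] = rec.value
        applied += 1
        new_checkpoint = max(new_checkpoint, rec.commit_time)
    return new_checkpoint, applied


# First run: nothing has been processed yet, so both records are applied.
source = [Record("trip-1", 10, 1), Record("trip-2", 20, 2)]
table = {}
checkpoint, first_run = incremental_upsert(table, source, checkpoint=0)

# Second run: one update arrived; only that record is applied.
source.append(Record("trip-1", 15, 3))
checkpoint, second_run = incremental_upsert(table, source, checkpoint)
```

In this sketch the second run touches one record instead of three, which is the source of the savings over a bulk reload. A real lakehouse table would also avoid scanning the old changes at all, something this toy loop does not attempt.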

