Case Study: Baidu Achieves 50x Faster Queries with Alluxio

An Alluxio Case Study

Preview of the Baidu Case Study

Baidu Queries Data 30 Times Faster with Alluxio

Baidu, the leading Chinese-language internet search provider, needed a faster way to run ad-hoc queries across petabytes of distributed data. Its existing Hadoop and Spark SQL-based workflows still left many queries taking minutes or even hours, especially when data had to move across data centers, making interactive analysis difficult for product teams.

To solve this, Baidu deployed Alluxio as a compute-local, in-memory storage layer alongside Spark SQL. Alluxio caches "hot" data near the compute nodes and supports tiered storage at scale. The result was a dramatic performance boost: typical queries dropped from more than 1,000 seconds on Hive to about 20 seconds with Alluxio (roughly a 50x reduction), some workloads ran 30 times faster, and the team met its goal of interactive query response times under 30 seconds.
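The caching idea described above can be illustrated with a small sketch. The Python below is a hypothetical two-tier LRU cache, not Alluxio's implementation or API. The class name `TieredCache`, its capacities, and the `fetch_remote` callback are assumptions made for illustration only. Reads are served from the fast tier when possible. Hot blocks found in the slower tier are promoted, and blocks evicted from the fast tier are demoted rather than discarded, so only cold misses go back to remote storage.

```python
from collections import OrderedDict


class TieredCache:
    """Hypothetical two-tier cache: a small fast tier (e.g. memory) over a
    larger, slower tier (e.g. SSD), backed by remote storage.

    Illustration of the hot-data caching idea only; not Alluxio's API.
    """

    def __init__(self, fast_capacity, slow_capacity, fetch_remote):
        self.fast = OrderedDict()  # block_id -> data, least recently used first
        self.slow = OrderedDict()
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity
        self.fetch_remote = fetch_remote  # assumed callable that reads remote storage
        self.remote_reads = 0

    def _put_slow(self, block_id, data):
        self.slow[block_id] = data
        self.slow.move_to_end(block_id)
        if len(self.slow) > self.slow_capacity:
            self.slow.popitem(last=False)  # drop the least recently used block

    def _put_fast(self, block_id, data):
        self.fast[block_id] = data
        self.fast.move_to_end(block_id)
        if len(self.fast) > self.fast_capacity:
            old_id, old_data = self.fast.popitem(last=False)
            self._put_slow(old_id, old_data)  # demote instead of discarding

    def read(self, block_id):
        if block_id in self.fast:
            self.fast.move_to_end(block_id)  # refresh recency on a fast-tier hit
            return self.fast[block_id]
        if block_id in self.slow:
            data = self.slow.pop(block_id)  # promote a hot block to the fast tier
        else:
            self.remote_reads += 1  # cold miss: go to remote storage
            data = self.fetch_remote(block_id)
        self._put_fast(block_id, data)
        return data


# Usage: a repeated access pattern over three blocks.
cache = TieredCache(fast_capacity=2, slow_capacity=4, fetch_remote=lambda b: f"data-{b}")
for block in ["a", "b", "a", "c", "a", "b"]:
    cache.read(block)
print(cache.remote_reads)
```

In this run, only the first reads of `a`, `b`, and `c` reach remote storage; repeated reads of hot blocks are served from the local tiers, which is the kind of effect the case study describes for keeping hot data near compute.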



Shaoshan Liu, Senior Architect, Baidu

