Alluxio
20 Case Studies
A Alluxio Case Study
Baidu, the leading Chinese-language internet search provider, needed a faster way to run ad-hoc queries across petabytes of distributed data. Its existing Hadoop and Spark SQL-based workflows still left many queries taking minutes or even hours, especially when data had to move across data centers, making interactive analysis difficult for product teams.
To solve this, Baidu implemented Alluxio as a compute-local in-memory storage layer alongside Spark SQL, using it to cache “hot” data near compute nodes and support tiered storage at scale. The result was a dramatic performance boost: typical queries dropped from more than 1,000 seconds on Hive to about 20 seconds with Alluxio, with some workloads running 30 times faster and meeting the team’s goal of interactive query response times under 30 seconds.
Shaoshan Liu
Senior Architect