Alluxio
An Alluxio Case Study
Barclays needed a faster, more flexible way to work with large datasets in Spark. Their existing process loaded data from a relational database into Spark for analysis, but Spark’s in-memory cache did not survive job restarts. Each restart forced a reload that could take half an hour or more, which slowed iterative data science work.
Barclays implemented Alluxio as an in-memory storage layer integrated with Spark, using it to keep raw and processed data available across iterations without reloading from the RDBMS. With Alluxio, the team could reuse data for ETL, model training, and evaluation, cutting workflow iteration time from hours to seconds and dramatically reducing waiting time, network traffic, and database load.
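The load-once, reuse-across-jobs pattern described above can be sketched roughly as follows. This is a minimal illustration in PySpark, not Barclays' actual code: the host names, the JDBC URL, the `/datasets/` path layout, and the helper names `alluxio_uri` and `load_or_reuse` are all assumptions made for the example. The only Spark calls used are the standard `spark.read` / `DataFrame.write` APIs.

```python
# Hypothetical endpoints for illustration; none come from the case study.
ALLUXIO_MASTER = "alluxio-master"   # assumed host name
ALLUXIO_PORT = 19998                # Alluxio's default master RPC port
JDBC_URL = "jdbc:postgresql://db-host:5432/analytics"  # assumed source database


def alluxio_uri(dataset, master=ALLUXIO_MASTER, port=ALLUXIO_PORT):
    """Build the alluxio:// URI under which a dataset is kept (assumed layout)."""
    return f"alluxio://{master}:{port}/datasets/{dataset}"


def load_or_reuse(spark, table):
    """Return a DataFrame for `table`, reading the RDBMS only if Alluxio has no copy.

    `spark` is an existing SparkSession supplied by the caller.
    """
    uri = alluxio_uri(table)
    try:
        # Fast path: an earlier job already wrote the data to Alluxio.
        return spark.read.parquet(uri)
    except Exception:
        # Slow path: the read failed (typically because the path does not
        # exist yet), so pull from the database once and keep a copy in
        # Alluxio. A real job would catch Spark's AnalysisException instead
        # of a bare Exception.
        df = (
            spark.read.format("jdbc")
            .option("url", JDBC_URL)
            .option("dbtable", table)
            .load()
        )
        df.write.mode("overwrite").parquet(uri)
        return spark.read.parquet(uri)
```

Because the data lives in Alluxio rather than in a single Spark application's cache, a restarted job calling `load_or_reuse(spark, "raw_table")` would take the fast path instead of querying the database again. Reading `alluxio://` paths requires the Alluxio client jar on Spark's classpath.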