Case Study: Bilibili boosts AI training efficiency with Alluxio

An Alluxio Case Study

Preview of the Bilibili Case Study

When AI Meets Alluxio at Bilibili: Building an Efficient AI Platform for Data Preprocessing and Model Training

Bilibili, a leading video community in China, needed a more efficient way to handle AI data preprocessing and model training across massive datasets. Its Coeus AI platform faced four problems: containers crashing while data was being downloaded, application code that had to be changed to access OSS and HDFS, datasets too large to fit on a single machine, and slow, repeated pulls of the same data from remote storage.

To solve these issues, Bilibili deployed Alluxio as an intermediate data layer on its Kubernetes-based AI platform, placing it between OSS/HDFS and the training workloads and relying on Alluxio FUSE, the unified namespace, serverless deployment, and distributed caching. Training became more stable and efficient: one audio language recognition workload dropped from 242.56 hours with OSS S3Fuse to 64.17 hours with Alluxio (about 3.8 times faster), nearly matching local SSD performance, and a video portrait matting model completed each epoch in about 18 hours with an IoU improvement of about 2%.
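As a rough illustration of why a FUSE mount avoids storage-specific application code, the sketch below reads training samples with ordinary file I/O. It is not Bilibili's code: the function name iter_samples, the .wav suffix, and any mount path passed to it (for example /mnt/alluxio) are assumptions made for this example. The real mount point depends on how Alluxio FUSE is configured in the training pod.

```python
from pathlib import Path


def iter_samples(root, suffix=".wav"):
    """Yield (relative_path, raw_bytes) for each matching file under root.

    `root` is assumed to be a directory exposed through a FUSE mount
    (hypothetical example: /mnt/alluxio/datasets/audio). Only standard
    file I/O is used, so the same loop works on a local SSD directory
    and needs no OSS or HDFS client library.
    """
    root = Path(root)
    # Sort for a deterministic order across runs.
    for path in sorted(root.rglob(f"*{suffix}")):
        if path.is_file():
            with open(path, "rb") as f:
                yield path.relative_to(root).as_posix(), f.read()
```

A training script could call iter_samples on the mounted dataset directory and feed the bytes to its decoder. With a caching layer behind the mount, repeated reads would be expected to come from cache rather than from remote storage, although the code itself cannot tell the difference.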


