Case Study: Unisound accelerates AI training and improves GPU utilization with Alluxio and Fluid

An Alluxio Case Study

Preview of the Unisound Case Study

Speeding Up the Atlas Supercomputing Platform with Fluid + Alluxio

Unisound, an artificial intelligence company building a large-scale Atlas supercomputing platform for AI workloads, was facing serious storage and I/O challenges as its user base grew. With compute decoupled from storage on a Lustre-based architecture, the team ran into bandwidth bottlenecks, slow access to massive numbers of small files, metadata pressure, and storage redundancy, all of which reduced GPU utilization and lengthened model training times.

To address this, Unisound implemented Alluxio with Fluid as a cloud-native cache layer between compute and storage, integrated into its Kubernetes-based stack and atlasctl workflow. Alluxio provided tiered caching, metadata and data warmup, and POSIX access through Alluxio FUSE, while Fluid handled orchestration and cache management. The result was significantly faster training and lower storage traffic: noise reduction workloads saw nearly 10x faster first reads with warmup and about 90% GPU utilization, while OCR tests reduced warm-cache node I/O from 1300 Mb/s to nearly zero and increased GPU usage from 69.59% to 91.46%.
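The case study does not include Unisound's actual manifests, but a setup like the one described is typically declared through Fluid's Kubernetes CRDs: a Dataset pointing at the underlying storage, an AlluxioRuntime defining the cache tiers, and a DataLoad to warm the cache before training. The sketch below is illustrative only; the mount path, namespace, and quota values are assumptions, not taken from the case study.

```yaml
# Hypothetical Fluid manifests illustrating the pattern described above.
# The Lustre mount path, sizes, and names are placeholders.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: training-data
spec:
  mounts:
    - mountPoint: local:///mnt/lustre/datasets   # underlying storage (assumed path)
      name: datasets
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: training-data
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM         # memory tier for hot data
        path: /dev/shm
        quota: 20Gi
      - mediumtype: SSD         # SSD tier for warm data
        path: /var/lib/alluxio/ssd
        quota: 200Gi
---
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad                  # cache warmup, so first reads hit Alluxio
metadata:
  name: training-data-warmup
spec:
  dataset:
    name: training-data
    namespace: default
```

Once the Dataset is bound, Fluid exposes it to training pods as a PersistentVolumeClaim of the same name, and Alluxio FUSE provides the POSIX view the jobs read from.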


