Pure Storage
A Pure Storage Case Study
The RISELab team at the University of California, Berkeley builds high-performance genomic analysis tools (ADAM on Apache Spark) to accelerate DNA sequencing for research and clinical use. The workload produces petabytes of data: a single sample is about 300 GB, and projects can scale to tens of thousands of participants. Spinning-disk HDFS storage became a bottleneck, causing low compute utilization, limited scalability, costly capacity upgrades, and failures on data-heavy tasks such as bit matching.
Berkeley added Pure Storage FlashBlade to decouple scalable, high-bandwidth flash storage from compute. FlashBlade delivered a 3x end-to-end performance improvement and 17x faster variant calling at half the cost, and it cut a genome-index load from 30 minutes to 11 minutes. It also supports concurrent hybrid workflows, reliably handles massive file counts, and simplifies management, so researchers can run workloads that were previously impossible.