Case Study: UC Berkeley achieves 3x faster genomic sequencing and scalable analysis with Pure Storage FlashBlade

A Pure Storage Case Study

Preview of the UC Berkeley Case Study

UC Berkeley Researchers Extend the Boundaries of Genomics Using FlashBlade

The RISELab team at the University of California, Berkeley builds high-performance genomic analysis tools (ADAM on Apache Spark) to accelerate DNA sequencing for research and clinical use. The workload produces petabytes of data: a single sample is about 300 GB, and projects can scale to tens of thousands of participants. As a result, spinning-disk HDFS storage became a bottleneck, causing low compute utilization, limited scalability, costly capacity upgrades, and failures on data-heavy tasks like bit matching.

Berkeley added Pure Storage FlashBlade to decouple scalable, high-bandwidth flash storage from compute. FlashBlade delivered a 3x end-to-end performance improvement and 17x faster variant calling at half the cost, and cut a genome-index load from 30 minutes to 11. It also supports concurrent hybrid workflows, reliably handles massive file counts, and simplifies management, so researchers can run workloads that were previously impossible.


