Waterline Data
A Waterline Data Case Study
GlaxoSmithKline (GSK), the global healthcare and life sciences company, needed a better way to manage the massive volumes of data used by its scientists for R&D, clinical trials, manufacturing, and regulatory approvals. With millions of files spread across a multi-petabyte Cloudera data lake and more than 2,100 data sources, the company struggled with duplicate, overlapping, and cluttered data that slowed analysis and delayed clinical and regulatory work. Waterline Data helped address this challenge with its Smart Data Catalog and data discovery capabilities.
Waterline Data implemented Waterline Data Catalog and Waterline Data Fingerprinting™ so GSK could identify duplicate and overlapping datasets, understand data lineage, and automate ongoing de-duplication. The solution enabled scientists to search data more easily using business terms, improved self-service access, and reduced storage, server, and license costs tied to redundant data. Waterline Data also helped GSK reduce time to analysis, accelerate regulatory approvals, and save millions of dollars each year.
Mark Ramsey
Senior Vice President, R&D Chief Data Officer