Case Study: National Institutes of Health (NIH) achieves scalable, efficient and accurate TCGA-to-ICGC genome data transfers with Denodo Technologies

A Denodo Technologies Case Study

Preview of the National Institutes of Health Case Study

Curing Advanced Data Ailments Using Data Virtualization to Aid Worldwide War on Cancer

The National Institutes of Health (NIH), through the National Cancer Institute and the National Human Genome Research Institute, needed to share The Cancer Genome Atlas (TCGA) sequencing data with the International Cancer Genome Consortium (ICGC). Moving hundreds of millions of rows from multiple sources (XML, Oracle, MySQL) and reformatting them into ICGC’s required formats with custom Perl scripts proved unscalable, costly to maintain, and error-prone.

NIH implemented data virtualization to connect directly to the source systems, apply the TCGA-to-ICGC mappings, produce more than 50 final views (over 100 million rows), and run a quarterly FTP upload of CSV files. This eliminated redundant copies, sped up development, improved accuracy, and created reusable workflows that scaled across 25 cancer types. The approach was later extended to similar projects such as TARGET.
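
The mapping-and-export step described above can be illustrated with a minimal sketch. It uses a plain in-memory Python mapping rather than data virtualization views, so it is not the NIH or Denodo implementation. The field names in `FIELD_MAP` and the helper names `map_record` and `rows_to_csv` are hypothetical and chosen only for illustration; the actual TCGA and ICGC schemas are not given in this preview.

```python
import csv
import io

# Hypothetical source-to-target field mapping. These column names are
# invented for illustration and are not taken from the TCGA or ICGC schemas.
FIELD_MAP = {
    "source_patient_id": "target_donor_id",
    "source_tissue_site": "target_primary_site",
    "source_sex": "target_donor_sex",
}


def map_record(source_row):
    """Rename mapped source fields to target names; unmapped fields are dropped."""
    return {
        target: source_row.get(source, "")
        for source, target in FIELD_MAP.items()
    }


def rows_to_csv(source_rows):
    """Apply the mapping to each row and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(FIELD_MAP.values()),
        lineterminator="\n",
    )
    writer.writeheader()
    for row in source_rows:
        writer.writerow(map_record(row))
    return buffer.getvalue()
```

Keeping the mapping in one declarative table, separate from the export logic, is roughly the property the case study credits for reusability: a new cancer type or a similar project would need a new mapping, not a new script.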
