Case Study: United States Census Bureau achieves a secure enterprise data lake for petabyte-scale, real-time analytics with Cloudera

A Cloudera Case Study

Preview of the United States Census Bureau Case Study

U.S. Census Bureau: Embracing the digital age with an Enterprise Data Lake

The U.S. Census Bureau faced a massive data challenge for the 2020 census. For the first time, the count would be conducted largely online, creating petabytes of response, operational, and administrative data while serving roughly 330 million residents. The bureau needed to capture, store, analyze, and secure huge, diverse datasets in real time to improve accuracy, reduce redundant collection, and support fast operational decisions across a nationwide operation.

To meet that need, the bureau implemented an Enterprise Data Lake (EDL) built on Cloudera technologies: Cloudera DataFlow for ingestion and real-time analytics, Hortonworks Data Platform as the data lake, and HDFS, Apache Ranger, and Apache Atlas, plus encryption, for security and governance. The EDL was deployed in a hybrid model on AWS GovCloud with on-demand processing clusters. It enabled petabyte-scale processing, faster decision-making, stronger data governance and sharing across agencies, improved data quality and operational efficiency, and reduced infrastructure costs.



United States Census Bureau

Kevin Smith

Chief Information Officer

