Cloudera
A Cloudera Case Study
The U.S. Census Bureau faced a massive data challenge for the 2020 census. For the first time, the count would be conducted largely online, generating petabytes of response, operational, and administrative data while covering roughly 330 million residents. The bureau needed to capture, store, analyze, and secure huge, diverse datasets in real time to improve accuracy, reduce redundant collection, and support fast operational decisions across a nationwide operation.
To meet that need, the bureau implemented an Enterprise Data Lake (EDL) built on Cloudera technologies: Cloudera DataFlow for ingestion and real-time analytics; Hortonworks Data Platform as the data lake; and HDFS, Apache Ranger, and Apache Atlas, plus encryption, for security and governance. The EDL was deployed in a hybrid model on AWS GovCloud with on-demand processing clusters. It enabled petabyte-scale processing, faster decision-making, stronger data governance and sharing across agencies, improved data quality and operational efficiency, and reduced infrastructure costs.
Kevin Smith
Chief Information Officer