Case Study: DataSift achieves real-time, petabyte-scale social-data analytics with Cloudera Enterprise

A Cloudera Case Study

Preview of the DataSift Case Study

DataSift: Allowing Enterprises to Benefit from Social Data Using Hadoop

DataSift is a social data platform that helps companies mine insights from public social media, processing Tweets, Facebook posts, blogs and forums. The company grew out of TweetMeme and soon found that relational databases could not keep up: it needed to ingest, store and analyze massive, fast-growing volumes of semi-structured social data, and to retain historical records for rewind-style analysis. It also needed expert help to tune and scale HBase and Hadoop for production use.

DataSift partnered with Cloudera and built a Hadoop/HBase-based pipeline, using MapReduce and Cloudera Enterprise, to handle both real-time and historical processing. The platform now processes ~600 million interactions (2+ TB) per day and stores over 1 PB. Queries that used to take weeks now run in minutes, and the platform supports NLP and sentiment analysis, correlates social metrics with business data, and delivers actionable results (including predicting a stock tipping point minutes before it occurred).
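
To put the quoted volumes in perspective, the following is a rough back-of-envelope sketch, not anything taken from DataSift's own tooling. It assumes decimal units (1 TB = 10^12 bytes), treats "2+ TB" as exactly 2 TB (so the results are lower bounds), and assumes interactions arrive evenly over a 24-hour day. The function name `daily_scale` and the dictionary keys are illustrative only.

```python
# Back-of-envelope scale figures derived from the numbers quoted in the case study.
# Assumptions (not from the source): decimal units, exactly 2 TB/day,
# and a uniform arrival rate across the day.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_scale(interactions_per_day, bytes_per_day, stored_bytes):
    """Return average rate, average record size, and stored/daily volume ratio."""
    return {
        # Sustained average rate if traffic were perfectly even.
        "interactions_per_second": interactions_per_day / SECONDS_PER_DAY,
        # Mean payload size per interaction.
        "avg_bytes_per_interaction": bytes_per_day / interactions_per_day,
        # How many days of ingest at today's rate equal the stored volume.
        "days_of_ingest_in_store": stored_bytes / bytes_per_day,
    }


if __name__ == "__main__":
    figures = daily_scale(
        interactions_per_day=600_000_000,  # "~600 million interactions"
        bytes_per_day=2 * 10**12,          # "2+ TB", taken as 2 TB
        stored_bytes=10**15,               # "over 1 PB", taken as 1 PB
    )
    for name, value in figures.items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the averages come to roughly 6,900 interactions per second and about 3.3 KB per interaction, and the stored volume equals about 500 days of ingest at the current daily rate. The last figure is only a ratio: it says nothing about how long the archive actually took to build, since daily volume presumably grew over time.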



DataSift

Nicholas Halstead

Chief Technical Officer and Founder


Cloudera
