Cloudera
A Cloudera Case Study
DataSift is a social data platform that helps companies mine insights from public social media, processing Tweets, Facebook posts, blogs and forums. The company grew out of TweetMeme and quickly outgrew relational databases. It needed a way to ingest, store and analyze massive, fast-growing volumes of semi-structured social data, and to retain historical records for rewind-style analysis. It also required expert help to tune and scale HBase and Hadoop for production use.
DataSift partnered with Cloudera and built a Hadoop/HBase-based pipeline using MapReduce and Cloudera Enterprise to handle both real-time and historical processing. The platform now processes about 600 million interactions (more than 2 TB) per day and stores over 1 PB of data. Queries that once took weeks now run in minutes. The platform also supports natural language processing (NLP) and sentiment analysis, correlates social metrics with business data, and delivers actionable results, including predicting a stock's tipping point minutes before it occurred.
Nicholas Halstead
Chief Technical Officer and Founder