Case Study: Pinterest achieves petabyte-scale self-service Hadoop and 30–60% higher throughput with Qubole

A Qubole Case Study

Preview of the Pinterest Case Study

Pinterest Builds a Self-Service Platform for Hadoop Using Qubole Data Service

Pinterest processes massive amounts of data to power its personalized discovery engine: over 30 billion Pins, ~20 TB of new logs per day, and about 10 PB in S3. As usage grew, Amazon EMR became unstable at scale. Proprietary Hive limitations, dependency management, and the need to onboard non-technical users also made Hadoop hard to operate as a self-service platform.

Pinterest implemented an executor abstraction and migrated its jobs to Qubole Data Service, which provides on-demand, horizontally scalable clusters with strong Hive integration, 100% spot-instance support, and simplified user access. The move delivered stable petabyte-scale performance with 30–60% higher throughput than EMR. The platform supports 100+ MapReduce users running more than 2,000 jobs per day, spans six clusters (3,000+ nodes), and processes nearly a petabyte daily, while reducing operational overhead and speeding onboarding.



Pinterest

Mohammad Shahangian

Data Engineer, Pinterest

