Case Study: Scribd achieves faster, more scalable data pipelines with Databricks on AWS

Moving to the cloud enables reading without limits

Scribd, the online reading platform with over 60 million titles, needed a more scalable way to support real-time data processing and personalization. Its legacy Hadoop infrastructure was rigid and hard to maintain, and it struggled with large batch and streaming datasets. The results were performance issues, small-file problems, and collaboration silos.

Databricks helped Scribd move to AWS and adopt Delta Lake as a unified Lakehouse platform. With Databricks, Scribd streamlined batch and streaming pipelines, improved collaboration through interactive notebooks, and simplified infrastructure management. The result was 30–50% better performance for most Spark workloads and an estimated 30–50% reduction in operational costs, while enabling fresher data and more personalized customer experiences.

Scribd

R Tyler Croy

Director of Platform Engineering

Databricks
