Databricks
A Databricks Case Study
MediaMath is a demand-side media buying and data management platform that serves over a billion ads and tracks billions of events daily. The team needed to turn a promising proof-of-concept, the Audience Index Report, into a scalable production web service. The report compares observed and expected site visitors by demographic segment, and producing it required heavy ETL, complex joins across user and site data, and aggregation over 30 days of activity at massive scale.
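To make the observed-versus-expected comparison concrete, here is a minimal sketch in plain Python. It assumes the index is a segment's share of site visitors divided by its share of a baseline population, scaled so that 100 means parity. The case study does not state MediaMath's exact formula, and the function name `audience_index` and the sample counts are hypothetical.

```python
def audience_index(site_counts, baseline_counts):
    """Return {segment: index}, where 100 means the segment appears on the
    site at the same rate as in the baseline population.

    Assumed formula (not taken from the case study):
        index = 100 * (site share of segment) / (baseline share of segment)
    """
    site_total = sum(site_counts.values())
    baseline_total = sum(baseline_counts.values())
    if site_total == 0 or baseline_total == 0:
        raise ValueError("both populations must be non-empty")

    index = {}
    for segment, baseline_n in baseline_counts.items():
        if baseline_n == 0:
            continue  # expected share is zero, so the ratio is undefined
        observed_share = site_counts.get(segment, 0) / site_total
        expected_share = baseline_n / baseline_total
        index[segment] = 100.0 * observed_share / expected_share
    return index


# Hypothetical counts of distinct users per demographic segment.
site = {"18-24": 300, "25-34": 500, "35-44": 200}
baseline = {"18-24": 2000, "25-34": 5000, "35-44": 3000}
print(audience_index(site, baseline))
```

With these made-up counts, the 18-24 segment indexes at about 150 (over-represented on the site), 25-34 at 100 (parity), and 35-44 at roughly 67 (under-represented).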
The solution used PySpark on Databricks. Segment and pixel state were stored as S3 sequence files (UDB), read as RDDs and converted to DataFrames, then joined and aggregated to compute the required counts. The results were written to an AWS PostgreSQL RDS instance via Spark's JDBC connector. Databricks notebooks and the job scheduler simplified development, orchestration, and monitoring. As a result, the team condensed hundreds of terabytes into a consumable 30-day report that now serves clients reliably, and delivery of new reports became faster.
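The pipeline described above might be sketched in PySpark roughly as follows. This is an illustration under stated assumptions, not MediaMath's code: the tab-separated record layouts, the column names, and the helpers `parse_segment_record`, `parse_pixel_record`, `window_start`, and `build_report` are all hypothetical, since the case study does not describe the UDB format or the report schema.

```python
from datetime import date, timedelta


def parse_segment_record(kv):
    """(key, value) pair from a sequence file -> (user_id, segment_id).

    Assumption: the value is tab-separated text "user_id<TAB>segment_id".
    """
    _key, value = kv
    user_id, segment_id = value.split("\t")
    return user_id, segment_id


def parse_pixel_record(kv):
    """(key, value) pair -> (user_id, site_id, event_date).

    Assumption: the value is "user_id<TAB>site_id<TAB>YYYY-MM-DD".
    """
    _key, value = kv
    user_id, site_id, event_date = value.split("\t")
    return user_id, site_id, event_date


def window_start(today, days=30):
    """First day of the trailing reporting window, as an ISO date string."""
    return (today - timedelta(days=days)).isoformat()


def build_report(spark, segments_path, pixels_path, jdbc_url, table, props, today):
    # Imported here so the helpers above can be tested without Spark installed.
    from pyspark.sql import functions as F

    sc = spark.sparkContext

    # 1. Read the sequence files as RDDs of (key, value) pairs and parse them.
    segments_rdd = sc.sequenceFile(segments_path).map(parse_segment_record)
    pixels_rdd = sc.sequenceFile(pixels_path).map(parse_pixel_record)

    # 2. Convert the RDDs to DataFrames with named columns.
    segments = spark.createDataFrame(segments_rdd, ["user_id", "segment_id"])
    pixels = spark.createDataFrame(pixels_rdd, ["user_id", "site_id", "event_date"])

    # 3. Keep the trailing 30 days (ISO date strings compare correctly as text),
    #    join events to segments, and count distinct users per site and segment.
    recent = pixels.where(F.col("event_date") >= window_start(today))
    report = (
        recent.join(segments, on="user_id")
        .groupBy("site_id", "segment_id")
        .agg(F.countDistinct("user_id").alias("observed_users"))
    )

    # 4. Write the aggregate to PostgreSQL through Spark's JDBC writer.
    report.write.jdbc(url=jdbc_url, table=table, mode="overwrite", properties=props)
    return report


print(window_start(date(2016, 3, 31)))  # 2016-03-01
```

In a Databricks notebook, `spark` is already defined, so a call would pass it along with the S3 paths, a `jdbc:postgresql://...` URL, and a `props` dictionary holding `user`, `password`, and `driver` (`org.postgresql.Driver`). Scheduling the notebook as a job would then refresh the table on whatever cadence the report needs.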