A Databricks Case Study
Elsevier Labs, the research arm of scientific publisher Elsevier, needed a scalable data platform to run complex natural language processing across terabytes of text and petabytes of related content. Prior attempts with Apache Storm and disparate AWS tools were slowed by manual data movement, a steep learning curve that discouraged code reuse, and difficulty presenting results. These problems limited who could contribute and stretched project timelines to weeks.
By standardizing on Apache Spark via Databricks, the team gained a single, collaborative workspace with simple cluster management, mounted S3 access, and multi-language notebooks that make it easy to reuse libraries and share results. The change cut typical project time from weeks to days, expanded the pool of active contributors from a few specialists to over 15 users, cut the time to integrate external tools (e.g., CoreNLP) from days to a single day, and improved the visibility and presentation of findings across the organization.
Ron Daniel
Labs Director, Elsevier