Case Study: The National Archives achieves large‑scale web & social media archiving and blazing‑fast full‑text search with MirrorWeb

A MirrorWeb Case Study

Preview of The National Archives Case Study

The National Archives - Customer Case Study

The National Archives faced a rapidly growing and increasingly complex UK Government Web Archive (UKGWA) and needed to modernise how web and social media content was captured, stored and made searchable. After a procurement process, it chose MirrorWeb and its AWS-hosted UKGWA service for the company's cloud expertise and social media archiving capabilities, with the aim of delivering a reliable, comprehensive public search and replay service.

MirrorWeb migrated the legacy archive to AWS in two weeks using AWS Snowball devices and custom ingest hardware, then built a new public site and a full-text, faceted search stack using Elasticsearch and its cloud data pipeline, WarpPipe. MirrorWeb indexed the entire collection at scale, spinning up a cluster of more than 1,000 nodes to process about 120 TB and index 14 billion documents in about 10 hours. It also enabled near-real-time capture of hundreds of social media accounts, improved deduplication and search accuracy, and delivered fast replay and high-traffic capacity for the UKGWA.
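To put the indexing figures in perspective, the quoted totals can be converted into rough sustained rates. The sketch below is only back-of-the-envelope arithmetic on the numbers in the summary. It assumes exactly 1,000 nodes (the text says "1000+"), exactly 10 hours, and decimal terabytes, so the per-node values are upper estimates and the average document size is indicative only. The function name `throughput` is illustrative and not part of any MirrorWeb tooling.

```python
# Figures quoted in the case study summary, treated as approximate.
DOCS = 14_000_000_000   # documents indexed
DATA_TB = 120           # terabytes processed (decimal TB assumed)
HOURS = 10              # "about 10 hours"
NODES = 1000            # assumed lower bound; the text says "1000+"


def throughput(docs, data_tb, hours, nodes):
    """Return rough sustained rates implied by the quoted totals."""
    seconds = hours * 3600
    docs_per_sec = docs / seconds
    gb_per_sec = data_tb * 1000 / seconds  # 1 TB = 1000 GB (decimal)
    return {
        "docs_per_sec": docs_per_sec,
        "gb_per_sec": gb_per_sec,
        "docs_per_sec_per_node": docs_per_sec / nodes,
        "mb_per_sec_per_node": gb_per_sec * 1000 / nodes,
        # 1 TB = 1e9 KB (decimal), so this is the mean size per document.
        "avg_doc_kb": data_tb * 1e9 / docs,
    }


rates = throughput(DOCS, DATA_TB, HOURS, NODES)
for name, value in rates.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the cluster as a whole sustains roughly 390,000 documents and 3.3 GB per second, which works out to a few hundred documents and a few megabytes per second per node. This illustrates why the job was spread across many modest workers rather than run on a handful of large machines.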



The National Archives

John Sheridan

Digital Director

