A scalable platform for pandemic data integration, analytics, and preparedness


We present a scalable online platform for collecting high-dimensional epidemiological data and transforming those data into a consistent schema to enable distributed analyses. The platform was originally developed to meet the demand for high-volume, accurate collection of epidemiological line list data in the early months of the COVID-19 pandemic. It has since proven amenable to rapid adjustment as new variables became relevant, for example variants of concern and vaccination status in COVID-19 cases, as well as clinical data. The platform is based on a microservices architecture deployed to the cloud. We discuss this architecture and the choices that motivated it, as well as the steps an independent group needs to take to run its own copy of the platform in a local environment. We also describe the data governance challenges of providing appropriate privacy to people in multiple jurisdictions while fulfilling the project’s goal of enabling open data sharing and rapid science during health emergencies.


Alex Benjamin

John Brownstein

Aaron Cansler

Shruti Chandra

Emmanuel Cornet

Abhishek Dasgupta

Rasmi Elasmar

Timothé Faudot

Felix Jackson

Moritz Kraemer

Anya Lindström Battle

Graham Lee

Kernie Obimakinde Consortium

Allyson Pemberton

David Pigott

Stephen Ratcliffe

Leslie Leland

Kelly Moran

Samuel Scarpino

James Sheldon

Gal Wachtel

In Development

Currently in development, launching early 2021.