We present Global.health, a scalable online platform for collecting high-dimensional epidemiological data and transforming those data into a consistent schema to enable distributed analyses. Global.health was originally developed to meet the demand for high-volume, accurate collection of epidemiological line list data in the early months of the COVID-19 pandemic. It has since proven amenable to rapid adjustment as new variables became relevant, for example variants of concern and vaccination status in COVID-19 cases, as well as clinical data. The Global.health platform is based on a microservices architecture deployed to the cloud. We discuss this architecture and the choices that motivated it, as well as the steps an independent group would need to take to run its own copy of Global.health in its local environment. We also describe the data governance challenges of protecting the privacy of people in multiple jurisdictions while fulfilling the project’s goal of enabling open data sharing and rapid science during health emergencies.