As the COVID-19 pandemic emerged, governments worldwide turned to epidemiologists for insights to guide response strategies. These insights relied on open-access, high-quality, international and real-time epidemiological data. However, detailed public health data were rarely reported in a standardised form, making comparisons across regions difficult. Considerable time and effort were therefore needed to process and standardise epidemiological data for each country before any multi-country analysis could be conducted.
Global.health changed this by collating disparate linelist datasets from around the world into a single online platform, which allows both visualisation of and easy access to anonymised, real-time epidemiological data. Our COVID-19 dataset currently contains rich information on over 50 million anonymised cases, with over a billion data points. Our platform is reproducible and can be adapted to future emerging infectious diseases.
We present the challenges of building this open-access data platform, including developing the tools for designing standards and assuring data quality, scaling the database as case counts exploded, and providing open-access data to encourage global use whilst preserving privacy. We also discuss research opportunities enabled by this multi-country database and the emerging need for data and analytics to be decentralised.