Transparent data for effective research: Beilstein open science symposium 2021


As the COVID-19 pandemic emerged, governments around the world turned to epidemiologists for insights to guide their response strategies. These insights relied on open-access, high-quality, international, real-time epidemiological data. However, detailed public health data were rarely reported in a standardised form, making comparisons across regions difficult. Considerable time and effort were therefore needed to process and standardise epidemiological data for each country before any multi-country analysis could be conducted. We changed this by collating disparate linelist datasets from around the world into a single online platform, which allows both visualisation of and easy access to anonymised, real-time epidemiological data. Our COVID-19 dataset currently contains rich information on over 50 million anonymised cases, with over a billion data points. The platform is reproducible and can be adapted to any future emerging infectious disease.

We present the challenges of building this open-access data platform: developing tools for designing standards and assuring data quality, scaling the database as case counts exploded, and providing open-access data to encourage global use whilst preserving privacy. We’ll also discuss the research opportunities enabled by this multi-country database and the emerging need for decentralised data and analytics.


Felix Jackson
Gal Wachtel
David M. Pigott
Moritz U.G. Kraemer
