Currently in development, launching early 2021.
|Id evento caso (Case ID)||100%|
|Fecha diagnostico (Date of diagnosis)||93.46%|
|Fecha de inicio de síntomas (Date of symptom onset)|||
|(Municipality & Department)|||
|Cuidado intensivo (ICU admission)||100%|
*as of 13/06/2021; $ line list includes 100% of cases reported for Argentina by the World Health Organization
In the following case study we take a deep dive into COVID-19 line-list data from Argentina, one of the more than 130 countries included in the Global.health platform. The case study covers the provenance of the data, the transformations applied to fit the Global.health schema, and the key characteristics and limitations of the data. Although it addresses only one country, the design of Global.health lets users quickly ask similar questions about any country on the platform, and we discuss how users can conduct such an investigation on their own. Stay tuned for more of these data deep dives. Here is another example for Peru and Colombia.
What is the provenance of the data?
The Ministry of Health of the Government of Argentina (Ministerio de Salud) collects and shares individual-level case data for COVID-19 through an interactive website. The platform was launched on May 15th, 2020, and the data are updated daily. The Dirección Nacional de Epidemiología y Análisis de Situación de Salud is responsible for the data. A data dictionary and further information can be found here.
Where can I find the original data and how is the data transformed?
Raw data can be downloaded from the link here, and details of the parser that transforms the data to our standard schema can be found here. We geocode cases within Argentina by adding centroids (latitude and longitude) from a manual lookup table provided by the Instituto Geográfico Nacional. To geocode countries included in travel information, a second lookup table maps ISO-2 country codes to the longitude and latitude of country centroids obtained from Google Maps. We ingest this database once per day and check for updates to previously ingested cases going back one month.
How complete is the data compared to aggregated data sources?
It appears that the individual-level case data provided by the Ministry of Health of the Government of Argentina are complete compared with the aggregated data provided to the World Health Organization (WHO): for example, on June 12th, 2021 the dataset included 4,111,147 confirmed cases, slightly more than the 4,066,156 reported by WHO on that day (about 101.1%).
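The comparison above is a simple ratio of the two counts. A minimal sketch of that arithmetic, using the two figures quoted in the text (the function name `completeness_ratio` is a hypothetical helper, not part of the Global.health codebase):

```python
def completeness_ratio(line_list_cases: int, reference_cases: int) -> float:
    """Return the line-list count as a percentage of the reference count."""
    if reference_cases <= 0:
        raise ValueError("reference_cases must be positive")
    return 100.0 * line_list_cases / reference_cases


# Figures quoted in the text for June 12th, 2021.
ratio = completeness_ratio(4_111_147, 4_066_156)
print(f"{ratio:.1f}%")  # prints "101.1%"
```

A value above 100% means the line list holds more confirmed cases than the aggregated count for the same day.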
Key characteristics and limitations of this database:
The line-list dataset from Argentina provides 25 metadata fields for each patient. These include a unique ID (‘id_evento_caso’), which helps track each case through time (e.g., from symptom onset to hospitalisation). Two types of geographic metadata are provided: residential information at the country, province, and department levels (‘residencia_pais_nombre’, ‘residencia_provincia_nombre’, ‘residencia_departamento_nombre’, respectively), and the province where the confirmatory COVID-19 test was carried out (‘carga_provincia_nombre’). Wherever residential information is provided, it is used as the location of the case because it carries a higher level of geographic detail; when it is missing, or when the country of residence is not Argentina, the test location is used instead. When the country of residence is other than Argentina, travel in the past 30 days is assumed and that country is added to the travel history associated with the case.
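The location rule described above can be sketched as a small function. This is an illustration under stated assumptions, not the actual Global.health parser: the function name `choose_location`, the output keys, the literal value "Argentina" for the country field, and the treatment of empty strings as missing are all assumptions.

```python
def _clean(value):
    """Normalise a raw field: None or whitespace-only becomes an empty string."""
    return (value or "").strip()


def choose_location(row: dict) -> dict:
    """Pick the case location and any implied travel history for one row."""
    country = _clean(row.get("residencia_pais_nombre"))
    province = _clean(row.get("residencia_provincia_nombre"))
    department = _clean(row.get("residencia_departamento_nombre"))
    test_province = _clean(row.get("carga_provincia_nombre"))

    # Assumption: residents of Argentina are recorded with this exact name.
    resident_in_argentina = country.lower() == "argentina"

    if resident_in_argentina and province:
        # Residential data is preferred because it reaches department level.
        location = {"province": province, "department": department or None}
        source = "residence"
    else:
        # Fall back to the province where the confirmatory test was done.
        location = {"province": test_province or None, "department": None}
        source = "test_location"

    # A foreign country of residence is treated as travel in the past 30 days.
    travel_country = country if country and not resident_in_argentina else None

    return {"location": location, "source": source,
            "travel_history_country": travel_country}
```

For example, a row whose country of residence is Chile would be placed at the test province and would carry Chile as travel history.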
The date of diagnosis (‘fecha_diagnostico’) is provided in 93.46% of cases and is used as the date of confirmation of the patient in our database. Where it is missing, the date of symptom onset (‘fecha_inicio_sintomas’) is used instead; when neither the date of diagnosis nor the date of symptom onset is provided, the date of case opening (‘fecha_apertura’) is used. We indicate which field was used in the notes field of the Global.health database. The dataset contains some unrealistic dates (e.g. dates of diagnosis in the years 2001 and 2012). The Global.health ingestion process automatically ignores cases dated earlier than the earliest allowed date of 1st November 2019. In addition, according to the WHO the first COVID-19 case in Argentina was detected on March 3rd, 2020, but the dataset contains a small number of cases with a confirmation date before this but after the earliest allowed date (i.e. between 1st November 2019 and 3rd March 2020).
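The fallback order and the cutoff described above can be sketched as follows. This is a minimal illustration, not the actual ingestion code: the function names are hypothetical, dates are assumed to arrive as ISO `YYYY-MM-DD` strings, and unparseable values are treated as missing.

```python
from datetime import date
from typing import Optional, Tuple

# Earliest allowed date stated in the text.
EARLIEST_ALLOWED = date(2019, 11, 1)

# Fields tried in order of preference.
DATE_FIELDS = ("fecha_diagnostico", "fecha_inicio_sintomas", "fecha_apertura")


def parse_date(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date string; return None if empty or malformed."""
    if not value or not value.strip():
        return None
    try:
        return date.fromisoformat(value.strip())
    except ValueError:
        return None


def confirmation_date(row: dict) -> Optional[Tuple[date, str]]:
    """Return (date, source field) for a row, or None if it should be skipped."""
    for field in DATE_FIELDS:
        parsed = parse_date(row.get(field))
        if parsed is not None:
            # The first available date decides; too-early cases are ignored.
            if parsed < EARLIEST_ALLOWED:
                return None
            return parsed, field
    return None
```

Returning the field name alongside the date makes it straightforward to record in a notes field which source was used.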
Whether the patient has been hospitalized is indicated solely by the presence or absence of a date of hospitalization. ICU admission and death, however, are each indicated in a binary field (‘cuidado_intensivo’ and ‘fallecido’, respectively), with the corresponding dates provided in the ‘fecha_cui_intensivo’ and ‘fecha_fallecimiento’ fields. The date fields are left empty when no such information exists. When ICU admission and/or death is reported, an associated date is reported 100% of the time.
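A sketch of how these three outcomes could be read from one row. Several details are assumptions, since the text does not state them: the hospitalization date field is assumed to be named `fecha_internacion`, the binary fields are assumed to hold "SI" for yes, and the output keys are hypothetical.

```python
def _clean(value):
    """Normalise a raw field: None or whitespace-only becomes an empty string."""
    return (value or "").strip()


def is_yes(value) -> bool:
    """Assumption: the binary fields use 'SI' (yes) and 'NO' (no)."""
    return _clean(value).upper() == "SI"


def outcome_events(row: dict) -> dict:
    """Collect hospitalization, ICU, and death events present in a row."""
    events = {}

    # Hospitalization has no flag: a non-empty date is the only signal.
    # 'fecha_internacion' is an assumed field name.
    hospital_date = _clean(row.get("fecha_internacion"))
    if hospital_date:
        events["hospital_admission"] = hospital_date

    # ICU admission and death have a flag plus a separate date field.
    if is_yes(row.get("cuidado_intensivo")):
        events["icu_admission"] = _clean(row.get("fecha_cui_intensivo")) or None
    if is_yes(row.get("fallecido")):
        events["death"] = _clean(row.get("fecha_fallecimiento")) or None

    return events
```

Keeping the flag check separate from the date lookup means an event is still recorded if a date is ever missing.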
Importantly, we ingest only cases classified (‘clasificacion_resumen’) as confirmed, amounting to more than 4.1M of the more than 12.8M cases recorded in the dataset. More than 7.6M cases are currently listed as discarded.
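Filtering on the classification field can be sketched with the standard `csv` module. The label "Confirmado" is an assumption about how confirmed cases are written in the raw file, and the three sample rows are toy data for illustration only.

```python
import csv
import io


def confirmed_rows(csv_file):
    """Yield only the rows whose summary classification is 'confirmed'."""
    for row in csv.DictReader(csv_file):
        # Assumption: confirmed cases carry the label "Confirmado".
        label = (row.get("clasificacion_resumen") or "").strip().lower()
        if label == "confirmado":
            yield row


# Toy input, not real records.
sample = io.StringIO(
    "id_evento_caso,clasificacion_resumen\n"
    "1,Confirmado\n"
    "2,Descartado\n"
    "3,Sospechoso\n"
)
kept = [row["id_evento_caso"] for row in confirmed_rows(sample)]
print(kept)  # prints ['1']
```

Reading the file as a stream and yielding rows one at a time avoids loading a multi-million-row file into memory.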
The dataset also reports metadata not commonly seen in other COVID-19 databases, including whether the patient was placed on mechanical ventilation (‘asistencia_respiratoria_mecanica’) and whether the patient was treated through the public or private health system (‘origen_financiamiento’). Both fields were reported with 100% completeness.
How to filter, view, and download this data:
To access the most up-to-date version of the data described above, please follow this link, where you can also see a visualisation of the data.