Datasets Comparison

Version 2

Corona-virus disease (COVID-19) Data-set with Improved Measurement Errors of Referenced Official Data Sources

Published: 4 May 2020 | Version 2 | DOI: 10.17632/nw5m4hs3jr.2
Description

This dataset is the result of a study on the quality of the official datasets available for COVID-19. We used comparative statistical analysis to evaluate the accuracy of data collection by one national organisation (Chinese Center for Disease Control and Prevention) and two international organisations (World Health Organization; European Centre for Disease Prevention and Control), based on the size of their systematic measurement errors. The data were collected using text mining techniques and by reviewing reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, standard country codes (M49 code), Alpha-2 codes, Alpha-3 codes, latitude, and longitude, plus additional attributes such as population. The data for China are presented in more detail in a separate sheet, extracted from the reports attached to the main page of the CCDC website. The dataset also benefits from major corrections to the referenced datasets and official reports: adjusting report dates (which lagged by one or two days), removing four negative values, detecting unreasonable changes to historical data in newer reports (revealed by comparing the daily reports), and correcting systematic measurement errors (which grew as the number of infected countries increased). An aggregated root mean square error was used, alongside the comparative statistical analysis, to identify the main problematic parts of the datasets and to evaluate the errors. The result is a combined dataset with reduced systematic measurement errors and some new attributes, such as daily mortality and fatality rates, in addition to the usual attributes of SARS-CoV-2 and coronavirus disease. This dataset can be considered a comprehensive and reliable source of COVID-19 data for further studies.
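As a rough illustration of the corrections and derived attributes mentioned in the description, the sketch below differences a cumulative series into daily counts, flags negative daily values, and computes a fatality rate. It rests on several assumptions that the description does not confirm: that negative values arise from differencing cumulative counts, that the fatality rate is cumulative deaths divided by cumulative cases, and that flagging is a reasonable first step before removal. All function names and figures are hypothetical and are not taken from the dataset.

```python
def daily_from_cumulative(cumulative):
    """Difference a cumulative series into daily counts.

    The first day is kept as reported because it has no predecessor.
    """
    if not cumulative:
        return []
    return [cumulative[0]] + [
        later - earlier for earlier, later in zip(cumulative, cumulative[1:])
    ]


def negative_days(daily):
    """Return the indices of days whose daily count is negative."""
    return [index for index, value in enumerate(daily) if value < 0]


def fatality_rate(deaths, cases):
    """Cumulative deaths over cumulative cases (assumed definition).

    Returns None when there are no cases, to avoid dividing by zero.
    """
    return deaths / cases if cases > 0 else None


# Hypothetical cumulative counts for one country over five days.
cum_cases = [10, 25, 40, 38, 60]   # day 4 was revised downward
cum_deaths = [0, 1, 1, 2, 3]

daily_cases = daily_from_cumulative(cum_cases)
flagged = negative_days(daily_cases)
rates = [fatality_rate(d, c) for d, c in zip(cum_deaths, cum_cases)]

print(daily_cases)
print(flagged)
print(rates[-1])
```

Flagging rather than silently dropping the negative days keeps the decision about how to correct them (for example, redistributing the revision over earlier days) separate from detection.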

Institutions

Universidade Nova de Lisboa

Categories

Epidemiology, Biostatistics, Data Quality Analysis, Coronavirus, Measurement Error Estimation, COVID-19

Licence

Creative Commons Attribution 4.0 International

Version 3

COVID-19 Combined Data-set with Improved Measurement Errors

Published: 13 May 2020 | Version 3 | DOI: 10.17632/nw5m4hs3jr.3
Description

Public health decision-making on policies aimed at controlling the COVID-19 pandemic depends on complex epidemiological models that must be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset, obtained from official data sources, with reduced systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes, such as daily mortality and fatality rates, to the usual attributes of official data sources. We used comparative statistical analysis to evaluate the measurement errors of official COVID-19 data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), the World Health Organization (WHO), and the European Centre for Disease Prevention and Control (ECDC). The data were collected using text mining techniques and by reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, international numeric country codes, Alpha-2 codes, Alpha-3 codes, latitude, and longitude, plus additional attributes such as population. The improved dataset benefits from major corrections to the referenced datasets and official reports: adjusting reporting dates (which lagged by one to two days), removing negative values, detecting unreasonable changes to historical data in newer reports, and correcting systematic measurement errors, which have grown as the pandemic spreads and more countries contribute data to the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China are presented separately and in more detail; they were extracted from the reports attached to the main page of the CCDC website.
This dataset is a comprehensive and reliable source of worldwide COVID-19 data. It can be used in epidemiological models that assess the magnitude and timeline of confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders, and other social distancing measures, and the pandemic's turning point. It can also support economic and social impact analysis, helping national and local authorities decide how to implement an adaptive response: when to reopen the economy and schools, ease business and social distancing restrictions, design economic programs, or allow sports events to resume.
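The paired comparison mentioned in the description can be pictured with a minimal sketch: the root mean square error of one attribute between two sources, computed over the (country, date) records that both sources report. The function name, the choice of keys, and the sample figures are assumptions made for illustration; the description does not specify how records were matched or aggregated.

```python
import math


def paired_rmse(source_a, source_b):
    """RMSE of one attribute over the keys present in both sources.

    Each source maps a (country_code, date) key to a reported value.
    """
    shared = sorted(set(source_a) & set(source_b))
    if not shared:
        raise ValueError("no overlapping (country, date) keys")
    squared = [(source_a[key] - source_b[key]) ** 2 for key in shared]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical cumulative confirmed-case counts, not taken from the dataset.
who = {
    ("PT", "2020-03-10"): 39,
    ("PT", "2020-03-11"): 41,
    ("PT", "2020-03-12"): 59,
}
ecdc = {
    ("PT", "2020-03-10"): 39,
    ("PT", "2020-03-11"): 44,
    ("PT", "2020-03-12"): 55,
}

rmse = paired_rmse(who, ecdc)
print(round(rmse, 3))
```

A large RMSE for a given attribute and pair of sources points to where the sources disagree most, which is the kind of signal the description says was used to locate the main data problems.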

Institutions

Universidade Nova de Lisboa

Categories

Statistics, Epidemiology, Public Health, Biostatistics, Data Quality Analysis, Coronavirus, Measurement Error Estimation, Severe Acute Respiratory Syndrome Coronavirus 2, COVID-19

Licence

Creative Commons Attribution 4.0 International