Dataset and Hierarchical Clustering Code to Track the Next COVID-19 Epicenter

Published: 29 June 2020 | Version 1 | DOI: 10.17632/7tyw5d3ccm.1
Contributors:
Ricardo Rios, Tatiane Nogueira, Danilo Coimbra, Ajith Abraham, Rodrigo Mello

Description

The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) maintains an online repository (available at https://github.com/CSSEGISandData/COVID-19) with worldwide information on the absolute number of new confirmed, recovered, and death cases related to COVID-19 (Coronavirus Disease 2019), the disease caused by the SARS-CoV-2 virus (coronavirus). From the whole dataset, we focused our analysis on the daily time series summaries, which contain the accumulated numbers of confirmed, death, and recovered cases for each country. Because some countries (e.g., Australia, Canada, and China) were reported at the province/state level, we aggregated their province/state records into a single time series per country. After this step, our dataset was composed of 186 countries and an extra time series containing cases registered on the Diamond Princess cruise ship. Next, we removed the Diamond Princess time series and all time series on recovered cases, in order to focus on confirmed and death cases. We also reorganized the daily records: instead of using accumulated cases, we calculated the lagged differences between consecutive days, which gives the number of new cases per day. Besides the dataset, we also share our source code, designed to cluster time series from different countries with similar behavior.
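The two preprocessing steps described above (aggregating province/state records per country, then taking lagged differences) can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame, its column names (country, province, day, cases), and its values are hypothetical toy data, not the JHU CSSE file layout or the code shared with this dataset.

```r
# Toy cumulative counts in long format (hypothetical values, not JHU data).
# Country "A" is reported at province level; country "B" is not.
cumulative <- data.frame(
  country  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  province = c("A1", "A1", "A1", "A2", "A2", "A2", "", "", ""),
  day      = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  cases    = c(1, 3, 6, 0, 2, 2, 5, 5, 9)
)

# Step 1: sum province/state rows into one series per country and day.
by_country <- aggregate(cases ~ country + day, data = cumulative, FUN = sum)

# Step 2: order by country, then by day, so differences follow time.
by_country <- by_country[order(by_country$country, by_country$day), ]

# Step 3: lagged differences between consecutive days (daily new cases).
daily <- lapply(split(by_country$cases, by_country$country), diff)
print(daily)
```

Note that diff() returns one value fewer than its input, so each differenced series starts on the second day of the accumulated series.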

Files

Steps to reproduce

If you are interested in reproducing our results, you can run the following R code:

#### Before running clustering-covid.R, select one of the following datasets:
# load(file="data/confirmed-day-by-country.Rdata")
# load(file="data/death-day-by-country.Rdata")
###

The previous code plots all maps. If you are interested in seeing all similar countries for every cluster, you can run tree-clustering.R, again selecting the dataset as shown in the previous example.
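For readers who want a feel for what grouping countries by hierarchical clustering looks like before running the scripts, here is a minimal base R sketch. It is not the content of clustering-covid.R or tree-clustering.R: the matrix below is hypothetical toy data, and the Euclidean distance, average linkage, and choice of two clusters are assumptions made only for illustration.

```r
# Toy matrix of daily new cases: one row per country, one column per day
# (hypothetical values standing in for a loaded dataset).
daily <- rbind(
  A = c(0, 1, 2, 4, 8),
  B = c(0, 1, 3, 4, 7),
  C = c(9, 7, 5, 2, 1),
  D = c(8, 7, 4, 3, 1)
)

# Assumption: Euclidean distance and average linkage; the shared scripts
# may rely on a different dissimilarity measure or linkage.
d    <- dist(daily, method = "euclidean")
tree <- hclust(d, method = "average")

# Cut the dendrogram into two groups and list the countries in each group.
groups   <- cutree(tree, k = 2)
clusters <- split(names(groups), groups)
print(clusters)

# plot(tree)  # uncomment to draw the dendrogram
```

With the real data, the object created by load() would take the place of the toy matrix. Since load() invisibly returns the names of the objects it creates, wrapping the call in print() is one way to find out what a given .Rdata file contains.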

Institutions

  • Universidade de Sao Paulo Campus de Sao Carlos
  • Universidade Federal da Bahia

Categories

Applied Sciences, Health Sciences

Licence