Artificial Intelligence to Model the COVID-19 Country Infection Trends

Published: 7 December 2020 | Version 2 | DOI: 10.17632/7tyw5d3ccm.2
Ricardo Rios,


The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) organized an online repository with worldwide information on the absolute number of new confirmed, recovered, and death cases related to COVID-19 (Coronavirus Disease 2019), the disease caused by the SARS-CoV-2 virus. From the whole dataset, we focused our analysis on the daily time series summaries, which contain the accumulated numbers of confirmed, death, and recovered cases for each country. Because some countries (e.g., Australia, Canada, and China) are reported at the province/state level, we aggregated those records into a single time series per country. We also reorganized the daily records: instead of using accumulated cases, we calculated the lagged differences between consecutive days. Besides the dataset, we share the source code we designed to cluster time series from different countries with similar behavior. To reproduce our results, run the source code "tree-clustering.R". Our main contribution is the function "calc.dend.dists", available in "distances-dendrogram.R". For more information, visit our project
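The two preprocessing steps described above (aggregating province/state rows per country, then taking lagged differences of the accumulated counts) can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy table, its column names, and the variable names are invented here and are not taken from the shared dataset or source code.

```r
# Toy accumulated counts; rows are provinces/states, columns are days.
# All values and names are invented for illustration only.
cumulative <- data.frame(
  country = c("Australia", "Australia", "Brazil"),
  day1 = c(1, 0, 2),
  day2 = c(3, 1, 5),
  day3 = c(6, 1, 9)
)

# Sum province/state rows so that each country has a single series.
by_country <- rowsum(cumulative[, -1], group = cumulative$country)

# Turn accumulated totals into daily new cases with lagged differences.
# apply() over rows returns one column per country, so transpose back.
daily <- t(apply(by_country, 1, diff))
colnames(daily) <- colnames(by_country)[-1]
print(daily)
```

Note that diff() drops the first day, since it has no previous value to subtract; whether the shared .Rdata files keep or drop that first record is not stated in the description.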


Steps to reproduce

To reproduce our results, run the following R code:

    #### Before running clustering-covid.R, select one of the following datasets:
    # load(file="data/confirmed-day-by-country.Rdata")
    # load(file="data/death-day-by-country.Rdata")
    ###

The previous code plots all maps. To see all similar countries for every cluster, run tree-clustering.R, again selecting the dataset as shown above.
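To give a rough idea of what listing the similar countries in every cluster looks like, here is a minimal sketch that uses base R hierarchical clustering. It is not the authors' "calc.dend.dists" function or the contents of tree-clustering.R: the Euclidean distance, the average linkage, the number of clusters, and the toy series labelled A to D are all assumptions made for this example.

```r
# Toy daily series; one row per hypothetical country, one column per day.
daily <- rbind(
  A = c(1, 2, 4, 8, 16),
  B = c(1, 2, 4, 9, 15),
  C = c(10, 9, 8, 7, 6),
  D = c(11, 9, 8, 6, 6)
)

# Euclidean distance and average linkage are stand-ins chosen for this sketch.
tree <- hclust(dist(daily), method = "average")

# Cut the dendrogram into two groups and list the members of each group.
groups <- cutree(tree, k = 2)
members <- split(names(groups), groups)
print(members)
```

With real data, the number of groups passed to cutree() would have to be chosen by inspecting the dendrogram, for example with plot(tree).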


Universidade de Sao Paulo Campus de Sao Carlos, Universidade Federal da Bahia


Applied Sciences, Health Sciences