Markov chain representation dataset of SARS-CoV-2 genome

Published: 13-05-2020 | Version 1 | DOI: 10.17632/8jt93ggv9w.1
Maria Coutinho,
Ivanovitch Silva,
Luiz Affonso Guedes,
Marcelo Fernandes


COVID-19, the disease caused by the SARS-CoV-2 virus, has been spreading aggressively around the world since the end of 2019. It has been declared a pandemic by the World Health Organization, and as of May 13, 2020, there were more than 4 million cases and more than 250 thousand deaths. This work therefore presents a new dataset containing the Markov chain representation of the SARS-CoV-2 genome sequences from NCBI (1557 instances). The dataset also provides Markov chain representations of other viruses from the Virus-Host DB (11540 different viruses) and of three Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).
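The description above does not state the order of the Markov chain or how the transition probabilities were estimated, so the following is only a minimal sketch of one common way to build such a representation. It assumes a first-order chain over the four nucleotides A, C, G and T, with transition probabilities taken as normalized counts of adjacent pairs. The function name `transition_matrix` and the toy sequence are hypothetical and are not part of the dataset.

```python
from collections import Counter

# Assumption: states are the four nucleotides; other symbols (e.g. N) are skipped.
NUCLEOTIDES = "ACGT"


def transition_matrix(sequence, alphabet=NUCLEOTIDES):
    """Estimate first-order transition probabilities from a single sequence.

    Returns a nested dict where matrix[a][b] is the estimated probability
    that symbol b follows symbol a.
    """
    seq = sequence.upper()
    counts = Counter()
    # Count every adjacent pair (a, b) whose symbols are both in the alphabet.
    for a, b in zip(seq, seq[1:]):
        if a in alphabet and b in alphabet:
            counts[(a, b)] += 1

    matrix = {}
    for a in alphabet:
        row_total = sum(counts[(a, b)] for b in alphabet)
        # Normalize each row; a state that never occurs as a predecessor
        # gets an all-zero row.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in alphabet
        }
    return matrix


# Toy sequence for illustration only, not a real genome fragment.
toy = "AACGTTAC"
P = transition_matrix(toy)
for state in NUCLEOTIDES:
    print(state, P[state])
```

Under these assumptions each genome reduces to a 4 x 4 matrix (16 values), regardless of its length. A higher-order chain would use k-mers as states instead of single nucleotides.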