k-mers 1D and 2D representation dataset of SARS-CoV-2 nucleotide sequences

Published: 26 May 2020 | Version 2 | DOI: 10.17632/f5y9cggnxy.2
Raquel de M. Barbosa,
Marcelo Fernandes


The dataset provides five types of k-mer genome representation: k-mers count 1D, k-mers probability 1D, k-mers count 2D, k-mers probability 2D, and k-mers image. It comprises 1557 SARS-CoV-2 virus instances. In addition, it provides a data stream of 11540 viruses from the Virus-Host DB dataset and three other Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).
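To make the count and probability representations concrete, here is a minimal sketch of how such vectors can be derived from a nucleotide sequence. It is not the authors' pipeline, and the exact encoding used in the dataset is not described here. Several details are assumptions: the vocabulary is the 4^k k-mers over `ACGT` in lexicographic order, windows containing other symbols (such as `N`) are skipped, and the 2D layout is a frequency chaos game representation (FCGR) style grid of size 2^k × 2^k with an assumed corner convention. The function names `kmer_counts_1d`, `kmer_probability_1d`, and `kmer_2d` are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

# Assumed FCGR corner convention: base -> (row bit, column bit).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def all_kmers(k: int) -> list[str]:
    """All 4**k k-mers over ACGT, in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_counts_1d(seq: str, k: int) -> list[int]:
    """Sliding-window k-mer counts as a 1D vector of length 4**k.

    Windows containing symbols outside ACGT (e.g. N) are not in the
    vocabulary, so they are simply never looked up (an assumption).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(kmer, 0) for kmer in all_kmers(k)]


def kmer_probability_1d(seq: str, k: int) -> list[float]:
    """Counts normalised to sum to 1 (all zeros if no valid k-mer)."""
    counts = kmer_counts_1d(seq, k)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def kmer_2d(vector_1d: list, k: int) -> list[list]:
    """Place a 1D k-mer vector on a 2**k x 2**k grid, FCGR style.

    Each base contributes one row bit and one column bit; the first
    base of the k-mer is the least significant bit, as in chaos game
    representation. The mapping is a bijection, so every cell is filled.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for kmer, value in zip(all_kmers(k), vector_1d):
        row = col = 0
        for j, base in enumerate(kmer):
            r_bit, c_bit = CORNERS[base]
            row |= r_bit << j
            col |= c_bit << j
        grid[row][col] = value
    return grid


if __name__ == "__main__":
    seq = "ACGTAC"
    print(kmer_counts_1d(seq, 2))
    # [0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
    for row in kmer_2d(kmer_probability_1d(seq, 2), 2):
        print(row)
```

For `"ACGTAC"` and k = 2, the windows are AC, CG, GT, TA, and AC, so the count vector has 2 at the AC position and 1 at CG, GT, and TA. Applying `kmer_2d` to the count vector or the probability vector gives the corresponding 2D count or 2D probability matrix under the assumed layout.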