CGR images of virus families

Published: 21 September 2023 | Version 1 | DOI: 10.17632/5pv49vxsjk.1
Shahina Afzal


Chaos Game Representation (CGR) is an alignment-free method that converts textual biological sequences, such as genome sequences, into graphical images, which can then be used to classify the sequences. Human pathogenic viruses are an active area of research. This dataset contains 1600 CGR images from 8 human pathogenic virus families (200 images per family): Adenoviridae, Anelloviridae, Coronaviridae, Flaviviridae, Papillomaviridae, Picornaviridae, Poxviridae and Retroviridae. The images are organized into three folders named "train", "test" and "validate". The "train" folder has 1120 images (140 per family), the "test" folder has 320 images (40 per family), and the "validate" folder has 160 images (20 per family).
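
For readers unfamiliar with CGR, the following is a minimal sketch of how a nucleotide sequence can be mapped to image coordinates. It is not the procedure used to generate this dataset, which the description does not specify. The corner assignment (A, C, G, T), the starting point at the centre of the unit square, the function names `cgr_points` and `cgr_grid`, and the default grid size are all assumptions chosen for illustration. Each point is placed halfway between the previous point and the corner of the current nucleotide.

```python
# Assumed corner layout of the unit square (a common convention, not
# taken from the dataset): A bottom-left, C top-left, G top-right,
# T bottom-right.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence):
    """Return the CGR points for a nucleotide sequence as (x, y) tuples."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_grid(sequence, size=64):
    """Rasterize the CGR points into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int((1 - y) * size), size - 1)  # flip y: row 0 is the top
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    print(cgr_points("ACGT"))
```

The grid returned by `cgr_grid` can be saved or displayed as a grayscale image with any imaging library. Real genomes contain many thousands of bases, so the visit counts form the characteristic fractal-like patterns that image classifiers can learn from.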