Chaos Game Representation (CGR images) of SARS-CoV-2 Variants (Alpha, Beta, Delta, Gamma and Omicron)

Published: 12 December 2022 | Version 2 | DOI: 10.17632/2x546shhwk.2
Contributor:
Shahina Afzal

Description

Currently available genome sequence classification methods are based on text or sequence alignment techniques. Our aim is to build an image-based genome sequence classifier using deep learning. In 1990, H. J. Jeffrey proposed Chaos Game Representation (CGR), a method that converts long one-dimensional sequences into two-dimensional images. This dataset contains CGR images of genomic sequences of five SARS-CoV-2 variants: alpha, beta, delta, gamma, and omicron. The dataset is divided into three folders named train, test, and validate. Each folder contains five subfolders named alpha, beta, delta, gamma, and omicron. The "train" folder has 17,500 images (3,500 in each subfolder), the "test" folder has 5,000 images (1,000 in each subfolder), and the "validate" folder has 2,500 images (500 in each subfolder), for 25,000 images in total. The genomic sequences of these variants were downloaded from the GISAID database and then converted to CGR images using a Python script.
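The conversion script itself is not part of this description, so the following is only a minimal sketch of how a CGR image can be computed from a nucleotide sequence. The corner layout, the 64 x 64 grid size, the handling of ambiguous bases, and the function names `cgr_points` and `cgr_grid` are all assumptions made for illustration and may differ from the script used to build the dataset.

```python
# Assumed corner layout (one common convention); the dataset's script may differ.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the list of CGR points for a nucleotide sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Assumption: skip ambiguous symbols such as N.
            continue
        # Move halfway from the current point towards the base's corner.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_grid(sequence, size=64):
    """Bin CGR points into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int((1 - y) * size), size - 1)  # image rows grow downwards
        grid[row][col] += 1
    return grid
```

The resulting grid of counts can be scaled to grey levels and saved as an image with any imaging library. Each point depends on all preceding bases, which is why a long one-dimensional sequence produces a characteristic two-dimensional pattern.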

Institutions

University of Kerala

Categories

Genome

Licence