Overlapping and touching G banded chromosome dataset

Published: 16 August 2022| Version 1 | DOI: 10.17632/h5b3zbtw8v.1


The Automated Karyotyping System (AKS) is a computerized tool used in cytogenetics. G-banded metaphase images are typically the input to such systems, and these images usually contain overlapping and touching chromosomes. Segmenting overlapping and touching chromosomes is therefore of great research interest in AKS research and development. However, few datasets in this domain are publicly available, and this contribution addresses that gap. The dataset contains 500 images of overlapping and touching chromosomes extracted from G-banded metaphase images, together with their respective masks (label images). The G-banded metaphase images were prepared from volunteer blood samples at the Regional Cancer Centre, Thiruvananthapuram, Kerala, India, through a series of cytogenetic laboratory procedures. Each sample contains exactly two overlapping or touching chromosomes. The label images are the target segmentation maps: each map identifies chromosome 1, chromosome 2, and the overlapping region between them. This dataset can be used to design algorithms for segmenting overlapping and touching chromosomes.
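As a rough illustration of how the label images might be consumed, the sketch below converts an RGB label image into a single-channel class map. It assumes the color scheme given under "Steps to reproduce" (red, green, blue) and that painted pixels are close to pure colors. The function name `rgb_mask_to_classes`, the class indices, and the threshold of 128 are illustrative choices, not part of the dataset.

```python
import numpy as np

# Assumed class indices (not defined by the dataset):
# 0 = background, 1 = chromosome 1 (red),
# 2 = chromosome 2 (green), 3 = overlapping region (blue).
def rgb_mask_to_classes(mask, threshold=128):
    """Convert a (..., 3) RGB label image to a class-index map.

    A pixel is assigned to a class only when exactly one of its
    channels is at or above the threshold; anything else stays 0.
    """
    mask = np.asarray(mask)
    red = mask[..., 0] >= threshold
    green = mask[..., 1] >= threshold
    blue = mask[..., 2] >= threshold
    classes = np.zeros(mask.shape[:-1], dtype=np.uint8)
    classes[red & ~green & ~blue] = 1
    classes[green & ~red & ~blue] = 2
    classes[blue & ~red & ~green] = 3
    return classes

# Tiny synthetic label image standing in for a real 256 x 256 mask.
demo = np.zeros((2, 2, 3), dtype=np.uint8)
demo[0, 0] = (255, 0, 0)   # chromosome 1
demo[0, 1] = (0, 255, 0)   # chromosome 2
demo[1, 0] = (0, 0, 255)   # overlapping region
class_map = rgb_mask_to_classes(demo)
```

Because the function indexes only the last axis, it should also accept a whole batch of masks of shape (N, H, W, 3). Whether the real label images contain anti-aliased or mixed-color pixels at region borders is not stated, so the threshold may need adjusting.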


Steps to reproduce

Step 1. G-banded metaphase images are prepared from volunteer blood samples through a series of cytogenetic laboratory procedures.
Step 2. After preprocessing, chromosome clusters are segmented out from the metaphases.
Step 3. Chromosome clusters in which two chromosomes overlap or touch are manually selected.
Step 4. Label images are prepared by painting the individual chromosomes in Photoshop: chromosome 1 is painted red, chromosome 2 is painted green, and the overlapping region between the chromosomes is painted blue.
Step 5. The input image and the corresponding label image are saved in .png format.
Step 6. For ease of access, and to prepare the data for loading into a deep learning pipeline, the images are read into NumPy arrays.
Step 7. The NumPy arrays of the individual samples (500 in total) are compiled into two arrays, one for the original images and one for the masks (label images), each of dimensions (500, 256, 256, 3).
Step 8. The two arrays, one for the original images and one for the masks, are then stacked to form the final dataset and saved together as a single .npz file.
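Steps 6 to 8 can be sketched roughly as follows. This is a minimal illustration, not the authors' script: it uses small synthetic arrays in place of the decoded .png files, and the helper name `build_dataset`, the file name, and the archive keys `images` and `masks` are assumptions. The key names actually used in the published .npz file are not stated here.

```python
import os
import tempfile

import numpy as np

def build_dataset(images, masks, path):
    """Stack per-sample (H, W, 3) arrays and save both stacks in one .npz file."""
    x = np.stack(images).astype(np.uint8)   # shape (N, H, W, 3)
    y = np.stack(masks).astype(np.uint8)    # shape (N, H, W, 3)
    if x.shape != y.shape:
        raise ValueError("image and mask stacks must have the same shape")
    # Key names are hypothetical; the published file may use different ones.
    np.savez_compressed(path, images=x, masks=y)
    return x.shape

# Synthetic stand-ins for the decoded images (the real dataset has N = 500).
rng = np.random.default_rng(0)
n = 4
images = [rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8) for _ in range(n)]
masks = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(n)]

path = os.path.join(tempfile.mkdtemp(), "chromosomes_demo.npz")
saved_shape = build_dataset(images, masks, path)

# Reload the archive and pull both arrays back out.
with np.load(path) as data:
    loaded_images = data["images"]
    loaded_masks = data["masks"]
```

When working with the published file, listing `data.files` after `np.load` shows which keys it really contains.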


Regional Cancer Centre Thiruvananthapuram, College of Engineering Karunagappally


Computer Science Applications, Cytogenetics, Karyotype, Computer Engineering, Automated Segmentation, Chromosome Analysis