Published: 4 July 2022 | Version 1 | DOI: 10.17632/p2g5sk8brb.1
Enric Moreu


Dataset size: 41.2 GB.

Synth-Colon is a synthetic dataset for polyp segmentation. It is the first dataset generated using zero annotations from medical professionals. The dataset is composed of 20 000 images with a resolution of 500×500 pixels. Synth-Colon also includes realistic colon images generated with our CycleGAN and the Kvasir training set images. In addition, it can be used for the colon depth estimation task because it includes depth and 3D information for each image.

Synth-Colon includes:
• Synthetic images of the colon and one polyp.
• Masks indicating the location of the polyp.
• Realistic images of the colon and polyps generated using our CycleGAN baseline and the Kvasir dataset.
• Depth images of the colon and polyp.
• 3D meshes of the colon and polyp in OBJ format.

The 3D colon and polyps are procedurally generated using Blender, 3D software that can be automated via scripting. Our 3D colon structure is a cone composed of 2454 faces. Vertices are randomly displaced following a uniform distribution to simulate the tissues in the colon. Additionally, the colon structure is modified by displacing 7 segments. For the textures we use a base color of [0.80, 0.13, 0.18] (RGB). For each sample, the color is shifted to other tones by adding 20% uniform noise to each channel.

A single polyp is used in every image and is placed inside the colon, either on the colon's walls or in the middle. Polyps are distorted spheres with 16384 faces. Samples with polyps occupying less than 2.6% of the image are removed. This results in an average polyp size of 5.87% across the dataset, which is within the range of the real datasets: the dataset with the smallest average polyp size is CVC-300 (3.36%) and the one with the largest is Kvasir (16.46%).

Lighting is composed of a white ambient light, two white dynamic lights that project glare onto the walls, and three negative lights that project black light at the end of the colon.
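The per-sample randomization and filtering steps described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' Blender script: the function names are invented, the 20% color noise is assumed to be an additive U(-0.2, 0.2) offset per channel (clamped to [0, 1]), and the vertex displacement range (`max_offset`) is an assumed parameter because the source does not state it. Only the base color and the 2.6% polyp-size threshold come from the description.

```python
import random

# Base colon color reported for Synth-Colon (RGB, 0-1 range).
BASE_COLOR = (0.80, 0.13, 0.18)
# Samples whose polyp covers less than this fraction of the image are removed.
MIN_POLYP_FRACTION = 0.026


def jitter_color(base=BASE_COLOR, amount=0.20, rng=random):
    """Shift each RGB channel with uniform noise.

    Assumption: "20% uniform noise" is read as an additive offset drawn
    from U(-amount, amount) per channel, clamped to the valid [0, 1] range.
    """
    return tuple(min(1.0, max(0.0, c + rng.uniform(-amount, amount))) for c in base)


def displace_vertices(vertices, max_offset=0.05, rng=random):
    """Displace each (x, y, z) vertex by independent uniform noise per axis.

    max_offset is a hypothetical value; the source does not give the range.
    """
    return [tuple(coord + rng.uniform(-max_offset, max_offset) for coord in v) for v in vertices]


def polyp_fraction(mask):
    """Fraction of pixels marked as polyp (non-zero) in a 2D binary mask."""
    total = sum(len(row) for row in mask)
    polyp = sum(1 for row in mask for px in row if px)
    return polyp / total


def keep_sample(mask, threshold=MIN_POLYP_FRACTION):
    """Keep a sample only if the polyp covers at least `threshold` of the image."""
    return polyp_fraction(mask) >= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    print("colon color:", jitter_color(rng=rng))
    # 500x500 mask with a 90x90 polyp: 8100 / 250000 = 0.0324, above the 2.6% threshold.
    mask = [[1 if r < 90 and c < 90 else 0 for c in range(500)] for r in range(500)]
    print("polyp fraction:", polyp_fraction(mask), "kept:", keep_sample(mask))
```

Passing an explicit `random.Random` instance makes each generated sample reproducible from a seed, which is convenient when regenerating or debugging individual images.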
We found that having a dark area at the end of the colon helps the generative models understand its structure. The 3D scene must resemble real colon images; otherwise, the models will not properly translate the images to the real-world domain.

Dataset website:
Source code:


Steps to reproduce

To generate the dataset, follow the steps in


Dublin City University


Medical Imaging, Image Segmentation, Colorectal Cancer, Endoscopy, Polyp