VitralColor-12: A Synthetic Twelve-Color Segmentation Dataset from GPT-Generated Stained-Glass Images (including pixel location and lightness neighborhoods)

Published: 21 October 2025| Version 4 | DOI: 10.17632/c89n64y2x5.4

Description

VitralColor-12 is a synthetic dataset for color classification and segmentation. It uses generative AI models, specifically GPT-5 and DALL·E 3, to generate images of stained glass. This approach simplifies labeling because the dark steel structure that supports the glass acts as a guide, dividing each image into regions that contain a single color. After generating the images, we use at least one hand-labelled centroid per color to cluster all pixels automatically by Euclidean distance, followed by morphological operations: erosion and dilation with two iterations per operation and a kernel size of 2 pixels. This process lets us label a classification dataset and generate segmentation maps automatically.

The dataset comprises 910 images: 70 generated images and, for each of them, 12 pixel segmentation maps (one per color), which together include 9,509,524 labeled pixels. We include tables with pixel values in the RGB, HSL, CIELAB, and YCbCr color representations, along with lightness values for the 4- and 8-neighborhoods, enabling detailed color analysis and the training of machine learning algorithms in different color spaces. We also include descriptive statistics, ΔE76 and ΔE94 color differences, and CIELAB a vs. b chromaticity. These support the distribution and applicability of the dataset and show realistic perceptual structure, including warm, neutral, and cold colors and the high contrast between black and white. The resulting perceptual clusters reinforce its utility for color segmentation and classification.

If you find the VitralColor-12 dataset useful, please cite:

Rivera, M. M., Guerrero-Mendez, C., Lopez-Betancur, D., Saucedo-Anaya, T., Sánchez-Cárdenas, M., & Gómez-Jiménez, S. (2025). VitralColor-12: A Synthetic Twelve-Color Segmentation Dataset from GPT-Generated Stained-Glass Images. Data, 10(10), 165. https://doi.org/10.3390/DATA10100165

@article{Rivera2025,
  author = {Martín Montes Rivera and Carlos Guerrero-Mendez and Daniela Lopez-Betancur and Tonatiuh Saucedo-Anaya and Manuel Sánchez-Cárdenas and Salvador Gómez-Jiménez},
  title = {VitralColor-12: A Synthetic Twelve-Color Segmentation Dataset from GPT-Generated Stained-Glass Images},
  journal = {Data},
  volume = {10},
  issue = {10},
  pages = {165},
  month = {10},
  year = {2025},
  doi = {10.3390/DATA10100165},
  issn = {2306-5729},
  publisher = {Multidisciplinary Digital Publishing Institute},
  keywords = {color benchmark,color segmentation,generative AI,synthetic dataset,synthetic images},
  url = {https://www.mdpi.com/2306-5729/10/10/165},
}
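
As an illustration of the ΔE76 statistic mentioned in the description, the following minimal Python sketch converts 8-bit RGB values to CIELAB and computes ΔE76, which is the Euclidean distance between two CIELAB colors. It is not the authors' code. It assumes the RGB values are sRGB and uses the D65 white point; the function names are hypothetical.

```python
import math

# Assumption: D65 reference white (2-degree observer). The white point used
# for the dataset's CIELAB tables is not stated on this page.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c8):
    """Convert one 8-bit sRGB channel to linear light in [0, 1]."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (L*, a*, b*)."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)


# Example: difference between pure black and pure white.
black_white_difference = delta_e76(rgb_to_lab((0, 0, 0)), rgb_to_lab((255, 255, 255)))
```

Because ΔE76 is a plain Euclidean distance, it can be computed directly on the CIELAB columns of the tables; ΔE94 adds weighting terms and is not covered by this sketch.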

Steps to reproduce

All steps to reproduce the VitralColor-12 dataset are described in: https://doi.org/10.3390/data10100165
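
For orientation, the labeling procedure summarized in the Description (nearest hand-labelled centroid by Euclidean distance, then erosion and dilation with two iterations each and a kernel size of 2 pixels) can be sketched in plain Python as below. This is a minimal sketch and not the authors' implementation; the linked article is the authoritative reference. Several points are assumptions: distances are computed in RGB, erosion is applied before dilation on each per-color mask, the kernel is a 2×2 square anchored at its top-left corner, and pixels outside the image are ignored. All function names are hypothetical.

```python
import math


def nearest_centroid_labels(image, centroids):
    """Assign each pixel the label of its closest centroid.

    image: rows of (r, g, b) tuples.
    centroids: list of (label, (r, g, b)); a color may have several centroids.
    Assumption: Euclidean distance is measured in RGB.
    """
    return [
        [min(centroids, key=lambda c: math.dist(px, c[1]))[0] for px in row]
        for row in image
    ]


def color_mask(labels, label):
    """Binary mask that is True where a pixel carries the given label."""
    return [[value == label for value in row] for row in labels]


def erode(mask, k=2):
    """Keep a pixel only if the whole k x k window starting at it is True."""
    h, w = len(mask), len(mask[0])
    return [
        [
            all(mask[y + dy][x + dx]
                for dy in range(k) for dx in range(k)
                if y + dy < h and x + dx < w)  # out-of-image pixels are ignored
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(mask, k=2):
    """Set a pixel if any pixel in the reflected k x k window is True."""
    h, w = len(mask), len(mask[0])
    return [
        [
            any(mask[y - dy][x - dx]
                for dy in range(k) for dx in range(k)
                if y - dy >= 0 and x - dx >= 0)  # out-of-image pixels are ignored
            for x in range(w)
        ]
        for y in range(h)
    ]


def clean_mask(mask, k=2, iterations=2):
    """Assumed order: all erosions first, then all dilations."""
    for _ in range(iterations):
        mask = erode(mask, k)
    for _ in range(iterations):
        mask = dilate(mask, k)
    return mask
```

A per-color segmentation map would then be obtained with, for example, `clean_mask(color_mask(nearest_centroid_labels(image, centroids), label))`. Eroding before dilating removes small isolated regions that were assigned to the wrong color, at the cost of also removing genuine regions narrower than the effective kernel.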

Institutions

Universidad Politecnica de Aguascalientes, Universidad Autonoma de Zacatecas

Categories

Computer Vision, Image Processing, Image Segmentation, Color, Image Classification
