Goldenhar-CFID: A Novel Dataset for Craniofacial Anomaly Detection in Goldenhar Syndrome
Description
The Goldenhar Syndrome Craniofacial Image Dataset (Goldenhar-CFID) is a high-resolution dataset designed for the automated detection and classification of craniofacial abnormalities associated with Goldenhar Syndrome (GS). It comprises 4,483 images, categorized into seven distinct classes of craniofacial deformities. This dataset serves as a valuable resource for researchers in medical image analysis, deep learning, and clinical decision-making.

Dataset Characteristics
Total Images: 4,483
Number of Classes: 7
Image Format: JPG
Image Resolution: 640 x 640 pixels
Annotation: Each image is manually labeled and verified by medical experts
Data Preprocessing: Auto-orientation and histogram equalization applied for enhanced feature detection
Augmentation Techniques: Rotation, scaling, brightness adjustments, flipping, and contrast modifications

Categories and Annotations
The dataset includes images categorized into seven craniofacial deformities:
Cleft Lip and Palate – Congenital anomaly where the upper lip and/or palate fails to develop properly.
Epibulbar Dermoid Tumor – Benign growth on the eye’s surface, typically at the cornea-sclera junction.
Eyelid Coloboma – Defect characterized by a partial or complete absence of eyelid tissue.
Facial Asymmetry – Uneven development of facial structures.
Malocclusion – Misalignment of the teeth and jaws.
Microtia – Underdeveloped or absent outer ear.
Vertebral Abnormality – Irregular development of spinal vertebrae.

Dataset Structure and Splitting
The dataset consists of four main subdirectories:
Original – Contains 547 raw images.
Unaugmented Balanced – Contains 210 images per class.
Augmented Unbalanced – Includes 4,483 images with augmentation.
Augmented Balanced – Contains 756 images per class.

The dataset is split into:
Training Set: 80%
Validation Set: 10%
Test Set: 10%
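
The description gives the 80/10/10 proportions but does not say how the split was produced. As a minimal sketch, assuming a per-class (stratified) split with a fixed random seed, the partition could be reproduced roughly as follows. The function name stratified_split, its parameters, and the (path, label) input format are hypothetical and not part of the dataset.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Split (path, label) pairs into train/val/test sets, class by class.

    Assumption: the 80/10/10 split is applied within each class so that
    class proportions are preserved; the dataset description does not
    confirm this.
    """
    # Group image paths by their class label.
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": [], "val": [], "test": []}

    for label in sorted(by_class):
        paths = sorted(by_class[label])  # sort first so shuffling is deterministic
        rng.shuffle(paths)

        n_train = int(len(paths) * train_frac)
        n_val = int(len(paths) * val_frac)

        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains after train and val goes to the test set.
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]

    return splits
```

How the (path, label) pairs are gathered depends on the layout inside each subdirectory, which the description does not specify. Because the fractions are truncated to whole images per class, any remainder falls into the test set, so the realized proportions can differ slightly from exactly 80/10/10.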