Datasets Comparison
Version 2
A Comprehensive High-Resolution Dataset for Analyzing Craniofacial Features Syndrome: Images for Feature Detection.
Description
Categories
Licence
Creative Commons Attribution 4.0 International
Version 3
Goldenhar-CFID: A Novel Dataset for Craniofacial Anomaly Detection in Goldenhar Syndrome
Description
The Goldenhar Syndrome Craniofacial Image Dataset (Goldenhar-CFID) is a high-resolution dataset designed for the automated detection and classification of craniofacial abnormalities associated with Goldenhar Syndrome (GS). It comprises 4,483 images, categorized into seven distinct classes of craniofacial deformities. It is intended for researchers working in medical image analysis, deep learning, and clinical decision support.
Dataset Characteristics:
Total Images: 4,483
Number of Classes: 7
Image Format: JPG
Image Resolution: 640 x 640 pixels
Annotation: Each image is manually labeled and verified by medical experts
Data Preprocessing: Auto-orientation and histogram equalization applied for enhanced feature detection
Augmentation Techniques: Rotation, scaling, brightness adjustments, flipping, and contrast modifications
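The histogram equalization step and two of the listed augmentations (flipping and brightness adjustment) can be sketched on a grayscale image stored as rows of 8-bit intensities (0 to 255). This is a minimal, dependency-free illustration under stated assumptions: the description does not say which library, color space, or parameters were used, and the names `equalize_histogram`, `flip_horizontal`, and `adjust_brightness` are hypothetical. Auto-orientation (applying the EXIF orientation tag) is not shown; with Pillow it is commonly done via `ImageOps.exif_transpose`, and Pillow's `ImageOps.equalize` offers a ready-made equalizer.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 8-bit intensities.
Image = List[List[int]]


def equalize_histogram(img: Image) -> Image:
    """Global histogram equalization for a non-empty 8-bit grayscale image."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    # Histogram over the 256 possible intensity levels.
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    # Smallest non-zero CDF value; the darkest occupied level maps to 0.
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to stretch, so return an unchanged copy.
        return [row[:] for row in img]
    # Look-up table spreading the CDF over the full 0..255 range.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * 255)) for c in cdf]
    return [[lut[v] for v in row] for row in img]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right (one of the listed augmentations)."""
    return [row[::-1] for row in img]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Shift every intensity by delta, clipping to the valid 0..255 range."""
    return [[min(255, max(0, v + delta)) for v in row] for row in img]


if __name__ == "__main__":
    sample = [[50, 50, 50], [100, 200, 200]]
    print(equalize_histogram(sample))
    print(adjust_brightness(flip_horizontal(sample), 30))
```

Global equalization is used here only because it is the simplest form; an adaptive variant such as CLAHE would be an equally plausible reading of "histogram equalization" in the description.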
Categories and Annotations
The dataset includes images categorized into seven craniofacial deformities:
Cleft Lip and Palate – Congenital anomaly in which the upper lip, the palate, or both fail to develop properly.
Epibulbar Dermoid Tumor – Benign growth on the eye’s surface, typically at the cornea-sclera junction.
Eyelid Coloboma – Defect characterized by a partial or complete absence of eyelid tissue.
Facial Asymmetry – Uneven development of facial structures.
Malocclusion – Misalignment of the teeth and jaws.
Microtia – Underdeveloped or absent outer ear.
Vertebral Abnormality – Irregular development of spinal vertebrae.
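For training a classifier, the seven categories need a fixed name-to-index mapping. The dataset description does not publish a class-index order, so the sketch below is an assumption: it uses the order in which the classes are listed above (which is also alphabetical). `CLASS_NAMES`, `CLASS_TO_INDEX`, and `one_hot` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical class order: the order in which the categories are listed.
CLASS_NAMES: Tuple[str, ...] = (
    "Cleft Lip and Palate",
    "Epibulbar Dermoid Tumor",
    "Eyelid Coloboma",
    "Facial Asymmetry",
    "Malocclusion",
    "Microtia",
    "Vertebral Abnormality",
)

# Map each class name to its integer label.
CLASS_TO_INDEX: Dict[str, int] = {name: i for i, name in enumerate(CLASS_NAMES)}


def one_hot(name: str) -> List[int]:
    """Encode a class name as a one-hot vector with one slot per class."""
    vec = [0] * len(CLASS_NAMES)
    vec[CLASS_TO_INDEX[name]] = 1
    return vec


if __name__ == "__main__":
    print(CLASS_TO_INDEX["Microtia"])
    print(one_hot("Eyelid Coloboma"))
```

Whatever order is chosen, it should be saved alongside any trained model so that predicted indices can be mapped back to the correct deformity.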
Dataset Structure and Splitting
The dataset consists of four main subdirectories:
Original – Contains 547 raw images.
Unaugmented Balanced – Contains 210 images per class.
Augmented Unbalanced – Includes 4,483 images with augmentation.
Augmented Balanced – Contains 756 images per class.
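The balanced subsets are given as per-class counts, so their totals follow from multiplying by the seven classes. The short calculation below uses only the figures stated above; the constant and function names are hypothetical.

```python
from typing import Dict

NUM_CLASSES = 7

# Image counts as stated in the dataset description.
ORIGINAL_TOTAL = 547
UNAUGMENTED_BALANCED_PER_CLASS = 210
AUGMENTED_UNBALANCED_TOTAL = 4483
AUGMENTED_BALANCED_PER_CLASS = 756


def subset_totals() -> Dict[str, int]:
    """Total image count per subdirectory, expanding per-class figures."""
    return {
        "Original": ORIGINAL_TOTAL,
        "Unaugmented Balanced": UNAUGMENTED_BALANCED_PER_CLASS * NUM_CLASSES,
        "Augmented Unbalanced": AUGMENTED_UNBALANCED_TOTAL,
        "Augmented Balanced": AUGMENTED_BALANCED_PER_CLASS * NUM_CLASSES,
    }


if __name__ == "__main__":
    for name, total in subset_totals().items():
        print(f"{name}: {total}")
```

This gives 1,470 images for Unaugmented Balanced and 5,292 for Augmented Balanced. The 4,483 figure listed under Total Images matches the Augmented Unbalanced subset.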
The dataset is split into:
Training Set: 80%
Validation Set: 10%
Test Set: 10%
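An 80/10/10 split can be reproduced per class so that each split keeps the class proportions. The description does not say how the published split was made (stratified or not, rounding rule, random seed), so the sketch below is an assumption: it floors the training and validation sizes, assigns the remainder to the test set, and uses a fixed seed. `split_counts` and `stratified_split` are hypothetical helpers operating on lists of file names.

```python
import random
from typing import Dict, List, Sequence, Tuple

Split = Tuple[List[str], List[str], List[str]]


def split_counts(n: int) -> Tuple[int, int, int]:
    """Return (train, val, test) sizes for an 80/10/10 split of n items.

    Train and validation sizes are floored; the remainder goes to test.
    """
    n_train = n * 80 // 100
    n_val = n * 10 // 100
    return n_train, n_val, n - n_train - n_val


def stratified_split(files_by_class: Dict[str, Sequence[str]], seed: int = 42) -> Split:
    """Shuffle and split each class 80/10/10, then merge the per-class parts."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[str] = []
    val: List[str] = []
    test: List[str] = []
    for cls in sorted(files_by_class):  # sorted for deterministic iteration
        files = list(files_by_class[cls])
        rng.shuffle(files)
        n_train, n_val, _ = split_counts(len(files))
        train.extend(files[:n_train])
        val.extend(files[n_train:n_train + n_val])
        test.extend(files[n_train + n_val:])
    return train, val, test


if __name__ == "__main__":
    # Hypothetical file names: 756 images for each of the 7 classes.
    data = {f"class{c}": [f"class{c}_{i:04d}.jpg" for i in range(756)] for c in range(7)}
    tr, va, te = stratified_split(data)
    print(len(tr), len(va), len(te))
```

With 756 images per class this yields 604 training, 75 validation, and 77 test images per class. If augmented images are derived from the same source photograph, splitting by source image before augmentation keeps near-duplicate variants of one photo from appearing in both the training and test sets.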
Institutions
East West University
Categories
Biochemical Disorders, Genetics, Diagnosis, Object Detection, Deep Learning
Licence
Creative Commons Attribution 4.0 International