Photorealistic Synthesis of Oral Lichen Planus Lesions: Code and Example Dataset
Description
This dataset accompanies the manuscript "Photorealistic Synthesis of Oral Lichen Planus and Lichenoid Lesions Enhances Deep-Learning Segmentation in Intra-Oral Photographs", submitted to Computerized Medical Imaging and Graphics.
Research Context: Medical image analysis for oral potentially malignant disorders (OPMDs) is hindered by the scarcity of annotated datasets. This work addresses the problem by combining CycleGAN with a Realistic Enhancement Algorithm (REA) to synthesize photorealistic oral lichen planus/lichenoid lesion (OLP/OLL) images on photographs of healthy buccal mucosa.
What This Dataset Contains:
1. codes.zip: source code for the CycleGAN+REA synthesis pipeline (buccal mucosa segmentation, CycleGAN lesion generation, REA histogram matching, Laplacian pyramid blending, and QC screening) and for segmentation inference.
2. models.zip: pretrained weights for the CycleGAN generator, the TensorFlow-based buccal mucosa segmentation model, the OLP lesion segmentation model used for QC, and the three Experiment 1 architectures (MANet+MiT-B2, FPN+MiT-B0, PSPNet+EfficientNet-B0).
3. data.zip: de-identified example images, comprising healthy buccal mucosa inputs and real lesion images with ground-truth masks.
Key Findings:
* REA combines masked histogram matching with Laplacian pyramid blending to reduce CycleGAN artifacts (color drift and boundary discontinuities).
* 520 dental professionals achieved only 58.22% accuracy when discriminating real from synthetic images (chance = 50%), indicating high realism.
* Synthetic augmentation improved segmentation by up to 2.08 mIoU points.
How to Use:
1. Unzip codes.zip, data.zip, and models.zip.
2. Install the dependencies listed in requirements_simplified.txt.
3. Run realistic_lesion_pipeline.py for lesion synthesis.
4. Run oral_lesion_segmentation.py for segmentation inference.
Data Restrictions: Due to ethics restrictions, the full clinical dataset cannot be released. This repository provides minimal examples that demonstrate the pipeline's functionality.
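The masked histogram matching named in the key findings can be illustrated as CDF-based quantile mapping restricted to masked pixels. This is a minimal NumPy sketch under stated assumptions, not the code shipped in codes.zip: the function name `match_histogram_masked` is hypothetical, the CIELAB L* channels are assumed to be extracted beforehand (for example with OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2LAB)`), and which image and region serve as the reference (for instance, the background of the original photograph) is an assumption left to the caller.

```python
import numpy as np


def match_histogram_masked(source, reference, source_mask, reference_mask):
    """Hypothetical helper: remap the masked pixels of `source` so that their
    empirical CDF matches that of the masked pixels of `reference`.

    source, reference: 2-D arrays (e.g. CIELAB L* channels, assumed).
    source_mask, reference_mask: boolean arrays of matching shapes.
    Pixels outside `source_mask` are returned unchanged.
    """
    out = source.astype(np.float64).copy()
    src_vals = out[source_mask]
    ref_vals = reference[reference_mask].astype(np.float64)
    if src_vals.size == 0 or ref_vals.size == 0:
        return out  # nothing to match

    # Unique source values, which unique value each pixel maps to, and the CDF.
    src_uniq, src_idx, src_counts = np.unique(
        src_vals, return_inverse=True, return_counts=True
    )
    src_cdf = np.cumsum(src_counts) / src_vals.size

    ref_uniq, ref_counts = np.unique(ref_vals, return_counts=True)
    ref_cdf = np.cumsum(ref_counts) / ref_vals.size

    # For each source CDF level, look up the reference value at the same level.
    matched = np.interp(src_cdf, ref_cdf, ref_uniq)
    out[source_mask] = matched[src_idx]
    return out
```

Restricting both histograms to masks keeps the lesion itself from skewing the luminance statistics that are being aligned.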
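The Laplacian pyramid blending half of the REA can be sketched in the same spirit. This is a simplified illustration, not the repository's implementation: it uses 2×2 block averaging and pixel replication in place of the Gaussian pyramid filters (such as OpenCV's `cv2.pyrDown`/`cv2.pyrUp`) a production version would typically use, it requires image sides divisible by 2**levels, and it softens the mask with a Gaussian of σ = 8.0 as one plausible reading of "σ=8.0 feathering". The names `gaussian_feather` and `laplacian_blend` are hypothetical.

```python
import numpy as np


def gaussian_feather(mask, sigma=8.0):
    """Soften a binary mask with a separable Gaussian blur (edge-padded)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(mask.astype(np.float64), radius, mode="edge")
    # Blur along rows, then along columns; "valid" mode trims the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def _down(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))


def _up(img):
    """Double resolution by repeating each pixel in a 2x2 block."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def laplacian_blend(fg, bg, mask, levels=4):
    """Blend fg into bg with a soft (H, W) mask in [0, 1].

    fg, bg: arrays of shape (H, W) or (H, W, C); H and W must be
    divisible by 2**levels in this simplified sketch.
    """
    fg = np.asarray(fg, dtype=np.float64)
    bg = np.asarray(bg, dtype=np.float64)
    m = np.asarray(mask, dtype=np.float64)
    if fg.ndim == 3:
        m = m[..., None]  # broadcast the mask over color channels
    bands = []
    for _ in range(levels):
        fg_small, bg_small, m_small = _down(fg), _down(bg), _down(m)
        # Laplacian band = this level minus the upsampled coarser level;
        # bands are mixed with the mask at the matching resolution.
        bands.append(m * (fg - _up(fg_small)) + (1 - m) * (bg - _up(bg_small)))
        fg, bg, m = fg_small, bg_small, m_small
    # Mix the coarsest (low-pass) level, then collapse the pyramid.
    out = m * fg + (1 - m) * bg
    for band in reversed(bands):
        out = _up(out) + band
    return out
```

Mixing each frequency band with a correspondingly blurred mask is what hides the seam: low frequencies transition over a wide region while fine texture transitions sharply. For organ-masked blending, one option (assumed, not confirmed by the source) is to multiply the feathered lesion mask by the binary buccal mucosa mask before calling `laplacian_blend`, so that blending never extends beyond the mucosa.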
Associated Publication: Theppitak S, Wongsapai M, Jaidee E, Sakdapreecha C, Ittichaicharoen J, Pongsiriwet S, Warin K, Suebnukarn S, Wuttisarnwattana P. "Photorealistic Synthesis of Oral Lichen Planus and Lichenoid Lesions Enhances Deep-Learning Segmentation in Intra-Oral Photographs", Computerized Medical Imaging and Graphics. [Under Review]
Steps to reproduce
1. Data Collection
Oral photographs were collected from five sources: (1) Thammasat University Oral and Maxillofacial Surgery Center, (2) Chiang Mai University Faculty of Dentistry, (3) Intercountry Centre for Oral Health community screening programs, (4) Samutprakarn Hospital, and (5) publicly available web-sourced images. All clinical images were acquired with professional dental cameras (e.g., Nikon D5200) or smartphones. Lesion images were biopsy-validated as OLP/OLL by oral medicine specialists. Ethical approval was obtained from the respective institutional committees.
2. Data Annotation
Ground-truth lesion masks were created with the Labelbox annotation tool. Biomedical engineers performed the initial freehand annotations, which oral medicine specialists then reviewed and confirmed.
3. Software Environment
* Python 3.10+
* numpy, PyYAML, opencv-python-headless
* torch, torchvision, segmentation-models-pytorch, timm
* tensorflow, keras
* Install via: pip install -r requirements_simplified.txt
4. CycleGAN Training
CycleGAN was trained to translate healthy buccal mucosa into OLP/OLL lesions using 204 healthy and 201 lesion images. Parameters: Adam optimizer, learning rate 2×10⁻⁴, batch size 1, λ = 10 for the cycle-consistency loss, and 200 epochs with cosine annealing decay.
5. REA Pipeline Execution
Run realistic_lesion_pipeline.py to process healthy images through: (1) buccal mucosa segmentation (FPN+EfficientNet-B5), (2) CycleGAN lesion generation, (3) masked background luminance histogram matching in the CIELAB L* channel, (4) organ-masked Laplacian pyramid blending with σ = 8.0 feathering, and (5) QC screening (an image is rejected if its lesion area is <500 pixels or <1% of the image).
6. Segmentation Model Training
Twenty decoder-encoder combinations were trained with Dice+BCE loss, the Adam optimizer (lr = 1×10⁻⁴), batch size 16, cosine annealing with warm restarts, and early stopping (patience = 30). Evaluation used five-fold cross-validation with 70/10/20 train/val/test splits.
7. Segmentation Inference
Run oral_lesion_segmentation.py with the pretrained weights to generate predictions and compute IoU metrics.
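The cosine schedules mentioned in steps 4 and 6 follow a standard closed form. As a pure-Python sketch, assuming a minimum learning rate of 0 and per-epoch updates (neither is stated above), the learning rate at each epoch can be computed directly; the 10-epoch restart period for the warm-restart variant is a hypothetical value. PyTorch provides the same schedules as `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.

```python
import math


def cosine_annealing_lr(epoch, base_lr=2e-4, total_epochs=200, min_lr=0.0):
    """Learning rate after `epoch` epochs of single-cycle cosine decay."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def cosine_warm_restarts_lr(epoch, base_lr=1e-4, period=10, min_lr=0.0):
    """Cosine decay that jumps back to base_lr every `period` epochs (T_mult = 1).

    period=10 is a hypothetical value; the source does not state it.
    """
    t_cur = epoch % period  # epochs since the last restart
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t_cur / period))
```

For the CycleGAN settings in step 4, this gives 2×10⁻⁴ at epoch 0, half that at epoch 100, and 0 at epoch 200.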
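The QC rule in step 5 reduces to two area thresholds on the lesion mask predicted for each synthetic image: failing either one rejects the image. A minimal sketch, assuming the mask comes from the OLP lesion segmentation (QC) model and is nonzero on lesion pixels; `passes_qc` and the constant names are hypothetical.

```python
import numpy as np

MIN_LESION_PIXELS = 500      # reject if lesion area < 500 pixels
MIN_LESION_FRACTION = 0.01   # reject if lesion area < 1% of the image


def passes_qc(lesion_mask):
    """Return True if a synthetic image's predicted lesion mask survives QC.

    lesion_mask: 2-D array, nonzero where the QC model predicts lesion (assumed).
    """
    area = int(np.count_nonzero(lesion_mask))
    image_pixels = lesion_mask.shape[0] * lesion_mask.shape[1]
    # Keep only if BOTH the absolute and the relative thresholds are met.
    return area >= MIN_LESION_PIXELS and area / image_pixels >= MIN_LESION_FRACTION
```

The absolute threshold dominates for small images and the relative one for large images: at 1000×1000 pixels, a lesion must cover at least 10,000 pixels to pass.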
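The Dice+BCE objective in step 6 can be written out explicitly for a single binary mask. This NumPy sketch assumes an unweighted sum of the two terms and a Dice smoothing constant of 1.0; the actual weighting and smoothing used in training are not stated here, and segmentation-models-pytorch ships its own `DiceLoss` and `SoftBCEWithLogitsLoss` classes that a PyTorch training script would more likely use. The name `dice_bce_loss` is hypothetical.

```python
import numpy as np


def dice_bce_loss(logits, target, smooth=1.0):
    """Binary cross-entropy plus soft Dice loss for one binary mask.

    logits: raw model outputs (any shape); target: {0, 1} array of the same shape.
    Equal weighting of the two terms is an assumption.
    """
    logits = np.asarray(logits, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Numerically stable BCE-with-logits: max(x, 0) - x*z + log(1 + exp(-|x|)).
    bce = np.mean(np.maximum(logits, 0.0) - logits * target
                  + np.log1p(np.exp(-np.abs(logits))))
    # Sigmoid written via tanh, which cannot overflow for large |x|.
    probs = 0.5 * (1.0 + np.tanh(0.5 * logits))
    intersection = np.sum(probs * target)
    dice = (2.0 * intersection + smooth) / (np.sum(probs) + np.sum(target) + smooth)
    return bce + (1.0 - dice)
```

BCE supplies a well-behaved per-pixel gradient, while the Dice term directly rewards overlap and is less dominated by the large background region typical of lesion masks.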
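For the IoU metrics in step 7, binary-mask IoU and its mean over test images might be computed as follows. Two conventions here are assumptions rather than details from the source: an empty prediction on an empty ground truth scores 1.0, and mIoU is taken as the per-image average (the text does not say whether the reported mIoU averages per image, per class, or over the pooled test set). `binary_iou` and `mean_iou` are hypothetical names.

```python
import numpy as np


def binary_iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumption: empty prediction on an empty mask counts as perfect
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(pairs):
    """Average per-image IoU over (prediction, ground-truth) pairs (assumed aggregation)."""
    return float(np.mean([binary_iou(p, g) for p, g in pairs]))
```

For example, a prediction covering the top half of a 4×4 image scored against a ground truth covering the left half overlaps on 4 of 12 pixels, giving an IoU of 1/3.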
Institutions
- Chiang Mai University
- Royal Thai Government Ministry of Public Health
- Samutprakarn Hospital
- Thammasat University