Endoscopic real-synthetic over- and underexposed frames for image enhancement
In endoscopy examinations, it is very common to encounter exposure errors caused by light reflections on the inner walls of hollow organs. For instance, when the tip of the endoscope (which carries a light source) points at folds, these structures reflect the light, causing overexposure, while an underexposed region may appear at the other end of the frame. This is a serious problem for physicians, who are responsible for detecting anomalies, blood, or polyps that may indicate warning cases. Current methods for correcting exposure errors require paired data, i.e., corrupted frames together with their respective ground truth (a non-corrupted, clean image). For natural images, paired datasets such as LOL and MIT-Adobe FiveK, which contain common real-life images, have been proposed; they allow researchers to both train and evaluate their models against standardized ground-truth images. To the best of our knowledge, no such paired datasets are publicly available for the endoscopic domain. This is due to the difficulty of producing the same frame both with and without exposure errors, whether naturally or with the help of a photo-editing expert. Hence, our work aims to create, through the use of GANs, a paired dataset of real images free of exposure errors together with versions of the same images containing an exposure error, i.e., a corrupted frame and its ground truth. The dataset consists of three separate subsets containing i) normal endoscopic frames (without exposure errors), ii) synthetic overexposed frames, and iii) synthetic underexposed frames. In total, we provide 1,231 real-overexposed pairs and 985 real-underexposed pairs; 2,216 paired frames, i.e., 4,432 frames. Since our method for creating the synthetic data is stochastic, the exposure intensity varies across the data.
Steps to reproduce
The synthetic data was created from three state-of-the-art datasets (EAD2020, EAD2.0, and HyperKvasir). This raw data was first filtered to keep only frames informative for our purposes. Then, using an object detector trained on labeled data, we classified the unlabeled data, splitting it into normal (without exposure errors), overexposed, and underexposed frames. Next, we applied style transfer to the normal frames to induce exposure errors, producing what we call "synthetic" frames. Finally, we performed a quality evaluation based on the SSIM and PSNR metrics to discard synthetic frames that were not properly "corrupted". The final results were evaluated quantitatively by us and qualitatively by medical experts. Every image in the normal dataset has a corresponding exposed (either over- or underexposed) synthetic version in its respective subset. Both paired datasets were split into 70% for training, 27% for testing, and 3% for validation.
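As an illustration of the quality-evaluation and splitting steps above, the sketch below filters (real, synthetic) pairs by PSNR and then applies a 70/27/3 split. This is a minimal NumPy-only sketch, not the pipeline used to build the dataset: the SSIM check would be analogous, and the `lo`/`hi` thresholds in `filter_pairs` are hypothetical values, not the dataset's actual cut-offs.

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak signal-to-noise ratio between a real frame and its synthetic version."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def filter_pairs(pairs, lo=10.0, hi=35.0):
    """Keep (real, synthetic) pairs whose PSNR falls inside a plausible band:
    above `hi` the synthetic frame is barely corrupted, below `lo` the content
    is destroyed. The band here is a hypothetical choice for illustration."""
    return [(r, s) for r, s in pairs if lo <= psnr(r, s) <= hi]

def split_dataset(pairs, fracs=(0.70, 0.27, 0.03)):
    """Split the paired frames into train/test/validation subsets."""
    n_train = int(round(fracs[0] * len(pairs)))
    n_test = int(round(fracs[1] * len(pairs)))
    return (pairs[:n_train],
            pairs[n_train:n_train + n_test],
            pairs[n_train + n_test:])
```

With this rounding, the 2,216 paired frames reported above would yield 1,551 training, 598 test, and 67 validation pairs; in practice the split should be done on shuffled pairs.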