NDB-UFES: An oral cancer and leukoplakia dataset composed of histopathological images and patient data
The dataset contains histopathological images of oral squamous cell carcinoma and leukoplakia (represented by samples with and without epithelial dysplasia). It also contains sociodemographic data (gender, age, and skin color) and clinical data (tobacco use, alcohol consumption, sun exposure, fundamental lesion, type of biopsy, lesion color, lesion surface, and lesion diagnosis). A supplementary dataset consisting of patches extracted from the captured histopathological images is also provided. The data were collected between 2010 and 2021 from patients managed at the Oral Diagnosis project (NDB) of the Federal University of Espírito Santo (UFES), Brazil.
Steps to reproduce
Microscopic analysis is performed by two or three oral pathologists, who reach a consensus histopathological diagnosis by considering sociodemographic, clinical, and image data together with histopathological data. A total of 237 samples (image and metadata) were labeled, covering 77 lesions from 69 patients. Additionally, a total of 3763 histopathological image patches were relabeled.
The aim of this dataset is to provide open access to histopathological images and metadata of oral potentially malignant disorders and oral cancer for testing machine learning and deep learning algorithms. It may also be useful for educational purposes, e.g. to train dental students or to standardize the diagnosis of oral epithelial dysplasia and squamous cell carcinoma of the oral cavity among oral pathology specialists from the same center.
This dataset is complementary material to the study by de Lima et al. (2023), which shows that curated demographic and clinical data improve the performance of artificial intelligence models in the automated classification of oral cancer.
L. M. de Lima et al., “On the importance of complementary data to histopathological image analysis of oral leukoplakia and carcinoma using deep neural networks,” Intell. Med., 2023. https://doi.org/10.1016/j.imed.2023.01.004
The complete dataset, consisting of the histopathological images with patient information together with the derived image patches and source code, is available at https://github.com/lmlima/IM_ComplementaryData_OralLeukoplakia
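The sample, lesion, and patient counts quoted above can be sanity-checked once the metadata has been downloaded. The following is only a minimal sketch under stated assumptions: the column names (`patient_id`, `lesion_id`, `image_file`, `diagnosis`) and the toy rows are hypothetical placeholders invented for illustration, and the actual metadata file in the repository may use a different layout.

```python
import csv
import io

# Toy rows invented for illustration only. The column names below are
# hypothetical and may not match the real NDB-UFES metadata file.
TOY_CSV = """patient_id,lesion_id,image_file,diagnosis
P01,L01,img_001.png,OSCC
P01,L01,img_002.png,OSCC
P01,L02,img_003.png,Leukoplakia with dysplasia
P02,L03,img_004.png,Leukoplakia without dysplasia
"""


def summarize(rows):
    """Count samples, distinct lesions, distinct patients, and samples per diagnosis."""
    rows = list(rows)
    # A lesion is identified by the (patient, lesion) pair, in case lesion
    # identifiers are only unique within a patient.
    lesions = {(r["patient_id"], r["lesion_id"]) for r in rows}
    patients = {r["patient_id"] for r in rows}
    per_diagnosis = {}
    for r in rows:
        per_diagnosis[r["diagnosis"]] = per_diagnosis.get(r["diagnosis"], 0) + 1
    return {
        "samples": len(rows),
        "lesions": len(lesions),
        "patients": len(patients),
        "per_diagnosis": per_diagnosis,
    }


# Replace io.StringIO(TOY_CSV) with an open file handle to use real metadata.
summary = summarize(csv.DictReader(io.StringIO(TOY_CSV)))
print(summary)
```

If the real metadata follows a comparable one-row-per-image layout, the same function would be expected to report 237 samples, 77 lesions, and 69 patients, matching the figures given in this description.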