PAD-UFES-20: a skin lesion dataset composed of patient data and clinical images collected from smartphones

Published: 07-07-2020 | Version 1 | DOI: 10.17632/zr7vgbcyr2.1
Andre Pacheco,
Gustavo R. Lima,
Amanda S. Salomão,
Breno Krohling,
Igor P. Biral,
Gabriel Giorisatto De Angelo,
Fábio C. R. Alves Jr,
José G. M. Esgario,
Alana C. Simora,
Pedro B. C. Castro,
Felipe B. Rodrigues,
Patricia H. L. Frasson,
Renato A. Krohling,
Helder Knidel,
Maria C. S. Santos,
Rachel B. Espírito Santo,
Telma L. S. G. Macedo,
Tania R. P. Canuto,
Luíz F. S. de Barros


Summary description

- The PAD-UFES-20 dataset was collected in conjunction with the Dermatological and Surgical Assistance Program (in Portuguese: Programa de Assistência Dermatológica e Cirúrgica, PAD) at the Federal University of Espírito Santo (UFES, Brazil), a nonprofit program that provides free skin lesion treatment, in particular to low-income people who cannot afford private treatment.
- The dataset consists of 2,298 samples of six different types of skin lesions. Each sample consists of a clinical image and up to 22 clinical features, including the patient's age, skin lesion location, Fitzpatrick skin type, and skin lesion diameter.
- The diagnoses recorded are: Basal Cell Carcinoma (BCC), Squamous Cell Carcinoma (SCC), Actinic Keratosis (ACK), Seborrheic Keratosis (SEK), Bowen's disease (BOD), Melanoma (MEL), and Nevus (NEV). Because Bowen's disease is considered SCC in situ, we clustered BOD and SCC together, which results in six skin lesion types in the dataset: three skin cancers (BCC, MEL, and SCC) and three skin diseases (ACK, NEV, and SEK).
- All BCC, SCC, and MEL lesions are biopsy-proven. The remaining lesions may have a clinical diagnosis based on the consensus of a group of dermatologists. In total, approximately 58% of the samples in this dataset are biopsy-proven. This information is described in the metadata.
- The images in the dataset have different sizes because they were collected using different smartphone devices. All images are available in .png format.
- The metadata associated with each skin lesion is composed of up to 26 features. All features are available in a CSV document in which each line represents a skin lesion and each column a metadata feature.
- In total, there are 1,373 patients, 1,641 skin lesions, and 2,298 images in the dataset. Each image/sample has a reference to the patient and the skin lesion in the metadata.
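
The structure described above (one CSV with references from each image/sample to a patient and a lesion, a diagnosis label, and a biopsy indicator) can be checked with a short script. The following is a minimal sketch, not the authors' code. It assumes one row per image/sample, and the column names (`patient_id`, `lesion_id`, `img_id`, `diagnostic`, `biopsed`) are assumptions that should be verified against the header of the actual metadata file. The rows in `TOY_CSV` are made up purely for illustration.

```python
import csv
import io

# Assumed column names; verify them against the real CSV header.
PATIENT_COL = "patient_id"
LESION_COL = "lesion_id"
IMAGE_COL = "img_id"
LABEL_COL = "diagnostic"
BIOPSY_COL = "biopsed"

# The three classes the description lists as skin cancers.
CANCER_LABELS = {"BCC", "MEL", "SCC"}


def normalize_label(label):
    """Fold Bowen's disease (BOD) into SCC, as the description states."""
    label = label.strip().upper()
    return "SCC" if label == "BOD" else label


def summarize(rows):
    """Count patients, lesions, images, labels, and biopsy-proven samples."""
    patients, lesions, images = set(), set(), set()
    per_label = {}
    biopsied = 0
    total = 0
    for row in rows:
        total += 1
        patients.add(row[PATIENT_COL])
        # Key lesions by (patient, lesion) in case lesion ids repeat
        # across patients (an assumption, not confirmed by the source).
        lesions.add((row[PATIENT_COL], row[LESION_COL]))
        images.add(row[IMAGE_COL])
        label = normalize_label(row[LABEL_COL])
        per_label[label] = per_label.get(label, 0) + 1
        if row[BIOPSY_COL].strip().lower() == "true":
            biopsied += 1
    return {
        "patients": len(patients),
        "lesions": len(lesions),
        "images": len(images),
        "per_label": per_label,
        "biopsied_fraction": biopsied / total if total else 0.0,
        "cancer_samples": sum(
            n for lab, n in per_label.items() if lab in CANCER_LABELS
        ),
    }


# Made-up rows for illustration only; these are not taken from the dataset.
TOY_CSV = "\n".join([
    "patient_id,lesion_id,img_id,diagnostic,biopsed",
    "P1,L1,P1_L1_a.png,BCC,True",
    "P1,L1,P1_L1_b.png,BCC,True",
    "P1,L2,P1_L2_a.png,NEV,False",
    "P2,L1,P2_L1_a.png,BOD,True",
])

summary = summarize(csv.DictReader(io.StringIO(TOY_CSV)))
print(summary)
```

To run the same check on the real metadata, the `io.StringIO(TOY_CSV)` argument could be replaced with a file handle opened via `open(path, newline="")`, where `path` points to the downloaded CSV. If the assumptions hold, the counts should match the totals stated above (1,373 patients, 1,641 lesions, 2,298 images) and the biopsy fraction should be close to 58%.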

Ethics statement

The dataset was collected in conjunction with the Dermatological and Surgical Assistance Program (PAD) of the Federal University of Espírito Santo. The program is managed by the Department of Specialized Medicine and was approved by the university ethics committee (nº 500002/478) and by the Brazilian government through Plataforma Brasil (nº 4.007.097), the Brazilian agency responsible for research involving human beings. In addition, all data were collected with patient consent, and patient privacy is fully preserved.

If you have any questions, do not hesitate to get in touch with us.