A Brazilian dataset for screening the risk of the Chronic Kidney Disease

Published: 08-02-2021 | Version 3 | DOI: 10.17632/2gkg7vvcrm.3
Alvaro Sobrinho,
Leandro Dias da Silva,
Angelo Perkusich,
Andressa Queiroz,
Maria Eliete Pinheiro


We collected the medical data (60 real-world medical records) from the physical medical records of adult subjects (age ≥ 18) treated at the University Hospital Prof. Alberto Antunes of UFAL, Brazil. The data collection was approved by the Brazilian ethics committee of the Federal University of Alagoas, approval number 47350313.9.0000.5013. The 60 real-world medical records cover four risk classes: low risk (30 records), moderate risk (11 records), high risk (16 records), and very high risk (3 records). An experienced nephrologist, with more than 30 years of experience in CKD diagnosis and treatment in Brazil, labeled the risk classes based on the KDIGO guideline. The only changes we made to the original data were translating the dataset into English and converting the gender of subjects from a string to a binary representation, to enable the use of different machine learning algorithms. To decrease the impact of class imbalance and improve the data analysis, we augmented the dataset with 54 additional records by duplicating real-world medical records and carefully modifying their attributes, i.e., increasing each CKD biomarker by 0.5. We selected the constant 0.5 solely to distinguish each new instance from its original while keeping the same label. The experienced nephrologist also reviewed the augmented data to increase confidence in the data augmentation. The dataset does not contain duplicated or missing values. We therefore provide two datasets: original and augmented.
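The augmentation step described above (copy a record, add 0.5 to each CKD biomarker, keep the label) can be illustrated with a minimal Python sketch. It is not the authors' code. The column names (`creatinine`, `urea`, `albuminuria`, `risk`), the function `augment`, and the sample values are hypothetical and chosen only for illustration; the actual dataset columns may differ. The description also does not state which records were selected for duplication, so the sketch simply copies whatever records it is given.

```python
from copy import deepcopy

# Hypothetical biomarker column names; the real dataset columns may differ.
BIOMARKERS = ("creatinine", "urea", "albuminuria")
OFFSET = 0.5  # the constant stated in the dataset description


def augment(records, biomarkers=BIOMARKERS, offset=OFFSET):
    """Return shifted copies of `records`; the originals are left untouched."""
    augmented = []
    for record in records:
        new_record = deepcopy(record)
        for name in biomarkers:
            # Shift each biomarker so the copy differs from its original.
            new_record[name] = record[name] + offset
        # All other fields, including the risk label, are kept as-is.
        augmented.append(new_record)
    return augmented


# Made-up values for illustration only; they are not taken from the dataset.
originals = [
    {"age": 54, "gender": 1, "creatinine": 1.0, "urea": 40.0,
     "albuminuria": 30.0, "risk": "moderate"},
    {"age": 71, "gender": 0, "creatinine": 3.5, "urea": 95.0,
     "albuminuria": 310.0, "risk": "very high"},
]

copies = augment(originals)
combined = originals + copies  # analogous to the "augmented" dataset
```

Because every copy differs from its source record by the offset, the combined list contains no exact duplicates, which is consistent with the statement that the dataset has no duplicated values.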