Biomarker Dataset for Prediabetes Classification Using Interpretable Machine Learning
Description
This dataset contains clinical and biomarker measurements from 604 adult participants who attended the DiabHealth rural diabetes screening clinic between 2002 and 2015. Variables cover demographics (Gender, Age), cardiovascular history (CVD-Revised, HT-Status), glycemic markers (ScreenGlucose, HbA1c), lipid profile (Triglyceride, TC, HDL, LDL), inflammatory markers (CRP, IL-6, IL-1Beta, IL-10, MCP-1, IGF-1), oxidative stress markers (8-OHdG, GSH, GSSG, GSH/GSSG), and mitochondrial-related peptides (Humanin, MOTS-c, p66Shc). In the associated manuscript, prediabetes and control status were derived from ScreenGlucose using the ADA fasting glucose criteria (5.6–6.9 mmol/L for prediabetes; <5.6 mmol/L for controls). All data are de-identified and were used to develop and validate interpretable machine learning models for prediabetes classification.
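As an illustration of the labelling rule described above, the following is a minimal sketch, not the authors' code. The function name `label_glycemic_status` is hypothetical, and the sketch assumes that any ScreenGlucose value from 5.6 up to (but not including) 7.0 mmol/L counts as prediabetes, which matches the stated 5.6–6.9 range when glucose is reported to one decimal place.

```python
from typing import Optional


def label_glycemic_status(screen_glucose_mmol_l: float) -> Optional[str]:
    """Label a fasting glucose value (mmol/L) using the ADA cut-offs quoted above.

    Returns "control" for values below 5.6, "prediabetes" for values from 5.6
    up to but not including 7.0 (an assumed reading of the 5.6-6.9 range),
    and None for values at or above 7.0, which fall outside both groups.
    """
    if screen_glucose_mmol_l < 5.6:
        return "control"
    if screen_glucose_mmol_l < 7.0:
        return "prediabetes"
    return None


# Example: label a few hypothetical ScreenGlucose readings.
for value in (5.1, 5.6, 6.9, 7.2):
    print(value, label_glycemic_status(value))
```

Returning `None` for out-of-range values keeps the labelling step separate from any decision about which records to drop.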
Steps to reproduce
The dataset is a de-identified analytic subset derived from the DiabHealth rural diabetes screening program, which prospectively collected data between 2002 and 2015. From the original cohort, we excluded participants with known diabetes, a fasting glucose ≥ 7.0 mmol/L, or an age > 85 years, as well as records with missing key biomarkers. Prediabetes and control groups were then defined using the ADA fasting glucose criteria. Derived variables (e.g., the GSH/GSSG ratio) were computed from the raw measures. The original study procedures, recruitment, and laboratory protocols are described in detail in Jelinek et al. (2006, 2014), cited in the associated manuscript.
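The exclusion and derivation steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the original processing code: the `KnownDiabetes` field, the `REQUIRED_FIELDS` list standing in for "key biomarkers", and the function names are all hypothetical, and records are represented as plain dictionaries with `None` marking a missing value.

```python
from typing import Optional

# Assumption: stand-in for the "key biomarkers" whose absence excludes a record.
REQUIRED_FIELDS = ("Age", "ScreenGlucose", "GSH", "GSSG")


def is_eligible(record: dict) -> bool:
    """Apply the exclusion criteria described above to one participant record."""
    if record.get("KnownDiabetes"):  # hypothetical flag for known diabetes
        return False
    if any(record.get(name) is None for name in REQUIRED_FIELDS):
        return False
    if record["ScreenGlucose"] >= 7.0:  # fasting glucose >= 7.0 mmol/L
        return False
    if record["Age"] > 85:
        return False
    return True


def gsh_gssg_ratio(gsh: float, gssg: float) -> Optional[float]:
    """Compute the GSH/GSSG ratio; return None when GSSG is zero."""
    if gssg == 0:
        return None
    return gsh / gssg


def build_analytic_subset(records: list) -> list:
    """Keep eligible records and attach the derived GSH/GSSG ratio to each."""
    subset = []
    for record in records:
        if not is_eligible(record):
            continue
        row = dict(record)  # copy so the input records are left unchanged
        row["GSH/GSSG"] = gsh_gssg_ratio(row["GSH"], row["GSSG"])
        subset.append(row)
    return subset


# Hypothetical records, for illustration only (not values from the dataset).
example_records = [
    {"Age": 60, "ScreenGlucose": 5.8, "GSH": 10.0, "GSSG": 2.0, "KnownDiabetes": False},
    {"Age": 90, "ScreenGlucose": 5.2, "GSH": 8.0, "GSSG": 2.0, "KnownDiabetes": False},
    {"Age": 55, "ScreenGlucose": 7.4, "GSH": 9.0, "GSSG": 3.0, "KnownDiabetes": False},
    {"Age": 50, "ScreenGlucose": 5.0, "GSH": None, "GSSG": 1.0, "KnownDiabetes": False},
]
analytic_subset = build_analytic_subset(example_records)
print(len(analytic_subset), analytic_subset[0]["GSH/GSSG"])
```

The published file already reflects these steps, so a sketch like this is mainly useful for checking that a loaded copy satisfies the stated criteria.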
Institutions
- Khalifa University of Science and Technology