Biomarker Dataset for Prediabetes Classification Using Interpretable Machine Learning

Name: Biomarker Dataset for Prediabetes Classification Using Interpretable Machine Learning
Creator: Maher Maalouf
Published: 2025-12-19T13:32:51.587Z
Keywords: Biomarker, Diabetes, Machine Learning, Oxidative Stress, Metabolism, C-Reactive Protein, Clinical Diabetes, Mitochondrial Function, Endocrinologist, Prediabetes, Inflammatory Cytokine

Maalouf, Maher; Jelinek, Herbert

doi:10.17632/x8z62gkhhw.1

Published: 19 December 2025 | Version 1 | DOI: 10.17632/x8z62gkhhw.1

Contributors:

Maher Maalouf, Herbert Jelinek

Description

This dataset contains clinical and biomarker measurements from 604 adult participants who attended the DiabHealth rural diabetes screening clinic between 2002 and 2015. Variables include demographics (Gender, Age), cardiovascular history (CVD-Revised, HT-Status), glycemic markers (ScreenGlucose, HbA1c), lipid profile (Triglyceride, TC, HDL, LDL), inflammatory markers (CRP, IL-6, IL-1Beta, IL-10, MCP-1, IGF-1), oxidative stress markers (8-OHdG, GSH, GSSG, GSH/GSSG), and mitochondrial-related peptides (Humanin, MOTS-c, p66Shc). Prediabetes and control status in the associated manuscript were derived from ScreenGlucose according to ADA fasting glucose criteria (5.6–6.9 mmol/L for prediabetes; <5.6 mmol/L for controls). All data are de-identified and were used to develop and validate interpretable machine learning models for prediabetes classification.
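The labelling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `classify_glucose` is hypothetical, and the upper bound is written as `< 7.0` on the assumption that it matches the ≥ 7.0 mmol/L exclusion stated under "Steps to reproduce".

```python
from typing import Optional

# ADA fasting glucose thresholds quoted in the description (mmol/L).
PREDIABETES_LOWER = 5.6
DIABETES_LOWER = 7.0  # assumption: 5.6-6.9 is read as "at least 5.6 and below 7.0"


def classify_glucose(screen_glucose: Optional[float]) -> Optional[str]:
    """Return 'control', 'prediabetes', or None for a ScreenGlucose value.

    None means the value is missing or falls in the range that the
    dataset excludes (fasting glucose >= 7.0 mmol/L).
    """
    if screen_glucose is None:
        return None
    if screen_glucose < PREDIABETES_LOWER:
        return "control"
    if screen_glucose < DIABETES_LOWER:
        return "prediabetes"
    return None


# Hypothetical example values, not taken from the dataset.
labels = [classify_glucose(g) for g in (5.2, 5.6, 6.9, 7.0)]
```

Values at or above 7.0 mmol/L return `None` rather than a third label because such participants are not part of this analytic subset.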

Steps to reproduce

The dataset is a de-identified analytic subset derived from the DiabHealth rural diabetes screening program, which prospectively collected data between 2002 and 2015. From the original cohort, we excluded participants with known diabetes, fasting glucose ≥ 7.0 mmol/L, or age > 85 years, as well as records missing key biomarkers. Prediabetes and control groups were defined using ADA fasting glucose criteria. Derived variables (e.g., GSH/GSSG ratio) were computed from the raw measures. The original study procedures, recruitment, and laboratory protocols are described in detail in Jelinek et al. (2006, 2014), cited in the associated manuscript.
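The exclusion steps and the derived ratio can be sketched as follows. This is an illustrative outline under stated assumptions, not the original pipeline: the `KnownDiabetes` flag, the `REQUIRED_FIELDS` tuple (the source does not say which biomarkers count as "key"), and the function names are all hypothetical. The other keys follow the variable names listed in the description.

```python
from typing import Dict, List, Optional

# Assumption: the source does not enumerate the "key biomarkers";
# this tuple is a placeholder for whichever fields are required.
REQUIRED_FIELDS = ("Age", "ScreenGlucose", "GSH", "GSSG")


def keep_record(rec: Dict[str, Optional[float]]) -> bool:
    """Apply the stated exclusion criteria to one participant record."""
    if rec.get("KnownDiabetes"):  # hypothetical flag for known diabetes
        return False
    if any(rec.get(field) is None for field in REQUIRED_FIELDS):
        return False  # missing key biomarker
    if rec["ScreenGlucose"] >= 7.0:  # fasting glucose >= 7.0 mmol/L
        return False
    if rec["Age"] > 85:  # age > 85 years
        return False
    return True


def add_gsh_gssg_ratio(rec: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Return a copy of the record with the derived GSH/GSSG ratio."""
    out = dict(rec)
    # Guard against division by zero; the ratio is left undefined in that case.
    out["GSH/GSSG"] = rec["GSH"] / rec["GSSG"] if rec["GSSG"] else None
    return out


def build_subset(records: List[Dict[str, Optional[float]]]) -> List[Dict[str, Optional[float]]]:
    """Filter the cohort, then compute derived variables on what remains."""
    return [add_gsh_gssg_ratio(r) for r in records if keep_record(r)]


# Hypothetical records for illustration only; not values from the dataset.
cohort = [
    {"Age": 50, "ScreenGlucose": 5.2, "GSH": 800.0, "GSSG": 200.0, "KnownDiabetes": False},
    {"Age": 61, "ScreenGlucose": 7.4, "GSH": 700.0, "GSSG": 350.0, "KnownDiabetes": False},
    {"Age": 90, "ScreenGlucose": 5.9, "GSH": 650.0, "GSSG": 130.0, "KnownDiabetes": False},
    {"Age": 47, "ScreenGlucose": 6.1, "GSH": None, "GSSG": 150.0, "KnownDiabetes": False},
]
subset = build_subset(cohort)
```

Filtering before computing the ratio keeps records with missing GSH or GSSG from ever reaching the division.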

Institutions

Khalifa University of Science and Technology
