Diabetes Risk Prediction Dataset: Demographics, Lifestyle, and Clinical Indicators

Published: 13 October 2025 | Version 2 | DOI: 10.17632/xv25yjbzkm.2
Contributors:

Description

📄 Dataset Description

This dataset contains demographic, lifestyle, and clinical health indicators collected for the purpose of predicting diabetes risk using machine learning techniques. The data integrates both behavioral and physiological variables to provide a comprehensive view of factors influencing diabetes onset.

📄 Content

1. Demographics: Gender, Age
2. Lifestyle Factors: Physical Activity, Smoking Status, Alcohol Intake, Family History, Hypertension
3. Clinical Measurements: Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Cholesterol, Diabetes Pedigree Function
4. Target Variable: Outcome (Diabetic / Non-diabetic)

📄 Objective

The primary goal of this dataset is to support the development and evaluation of classification models that can predict whether an individual is diabetic or non-diabetic based on measurable risk factors.

📄 Potential Use Cases

1. Building and testing machine learning models for diabetes prediction.
2. Exploring correlations between lifestyle choices and diabetes prevalence.
3. Conducting data analysis in healthcare research and medical decision support systems.
4. Developing early risk assessment tools for clinical practice.

📄 Format

File type: CSV
Rows: Each row represents an individual record.
Columns: Attributes include demographic, lifestyle, clinical features, and diabetes outcome.

📄 Notes

All values are anonymized. The dataset is intended for educational, research, and analytical purposes only. Users should validate models against external datasets before applying in clinical settings.
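As a starting point for the use cases above, the following is a minimal sketch of reading a CSV in this layout and separating the target from the other attributes, using only the Python standard library. It assumes the target column header is spelled exactly `Outcome`. The other headers, the way outcome values are encoded, and the sample rows are illustrative assumptions, not taken from the published file. The helper names `load_records` and `class_balance` are hypothetical.

```python
import csv
import io
from collections import Counter

TARGET = "Outcome"  # assumed header name, based on the description above


def load_records(handle, target=TARGET):
    """Read CSV rows and split each one into (features, label)."""
    reader = csv.DictReader(handle)
    if target not in (reader.fieldnames or []):
        raise ValueError(f"column {target!r} not found in {reader.fieldnames}")
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(target))  # remove the target from the feature dict
        features.append(row)
    return features, labels


def class_balance(labels):
    """Count how many records fall under each outcome value."""
    return Counter(labels)


# Placeholder rows, invented purely to exercise the functions above;
# they are NOT values from the dataset.
sample = io.StringIO(
    "Gender,Age,Glucose,BMI,Outcome\n"
    "F,40,100,25.0,Non-diabetic\n"
    "M,55,160,31.5,Diabetic\n"
    "F,62,150,29.0,Diabetic\n"
)
features, labels = load_records(sample)
print(class_balance(labels))
```

With the real file, the in-memory sample would be replaced by an open file handle, for example `open("diabetes.csv", newline="")`, where the file name is hypothetical. Checking the class balance before training is a cheap way to see whether accuracy alone would be a misleading metric for a classifier built on this data.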

Files

Institutions

Daffodil International University

Categories

Public Health, Machine Learning

Licence