Clinical Stroke Risk Prediction Dataset

Published: 2 January 2026 | Version 1 | DOI: 10.17632/2d9332pzfr.1

Description

Context

According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths. This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row provides relevant information about one patient.

Attribute Information

1) id: unique identifier
2) gender: "Male", "Female" or "Other"
3) age: age of the patient
4) hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension
5) heart_disease: 0 if the patient doesn't have any heart disease, 1 if the patient has a heart disease
6) ever_married: "No" or "Yes"
7) work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
8) Residence_type: "Rural" or "Urban"
9) avg_glucose_level: average glucose level in blood
10) bmi: body mass index
11) smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*
12) stroke: 1 if the patient had a stroke, 0 if not

*Note: "Unknown" in smoking_status means that the information is unavailable for this patient.
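The attribute list above can be turned into a simple per-record check. The following is a minimal sketch, not part of the dataset itself: it assumes the data is distributed as a CSV file with one header row, the function name `validate_row` is hypothetical, and the two sample records are invented purely for illustration. The spelling "Govt_job" for the government work category is also an assumption that should be confirmed against the actual file.

```python
import csv
import io

# Allowed values per categorical column, taken from the attribute list.
# The spelling "Govt_job" is an assumption; confirm it against the actual file.
CATEGORICAL = {
    "gender": {"Male", "Female", "Other"},
    "ever_married": {"No", "Yes"},
    "work_type": {"children", "Govt_job", "Never_worked", "Private", "Self-employed"},
    "Residence_type": {"Rural", "Urban"},
    "smoking_status": {"formerly smoked", "never smoked", "smokes", "Unknown"},
}
BINARY = ("hypertension", "heart_disease", "stroke")  # encoded as 0 or 1
NUMERIC = ("age", "avg_glucose_level", "bmi")


def validate_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for col, allowed in CATEGORICAL.items():
        if row.get(col) not in allowed:
            problems.append(f"{col}: unexpected value {row.get(col)!r}")
    for col in BINARY:
        if row.get(col) not in {"0", "1"}:
            problems.append(f"{col}: expected 0 or 1, got {row.get(col)!r}")
    for col in NUMERIC:
        try:
            float(row.get(col, ""))
        except (TypeError, ValueError):
            problems.append(f"{col}: not numeric ({row.get(col)!r})")
    return problems


# Two invented records for illustration only; they are not taken from the dataset.
SAMPLE = """id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
1,Male,54,0,1,Yes,Private,Urban,110.5,27.3,never smoked,0
2,Female,70,1,0,Yes,Self-employed,Rural,190.2,N/A,Unknown,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = [validate_row(r) for r in rows]
for record, problems in zip(rows, report):
    print(record["id"], problems or "ok")
```

A check like this flags non-numeric placeholders (such as the invented "N/A" in the second sample record) before any modelling, so that missing values can be handled deliberately rather than silently.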

Steps to reproduce

Clinical Stroke Risk Prediction Dataset: 22,419 patient records with 12 columns: id, gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, smoking_status, and stroke (0/1). The data are used to predict stroke risk.
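One way to confirm that a downloaded copy matches this description is to count records and compare the header against the 12 listed columns. The sketch below makes the same assumption of a CSV file with a header row. The function name `summarize` and the three-row sample are hypothetical, and column names are compared case-insensitively because the capitalisation in the file may differ from the description.

```python
import csv
import io

# Column names from the description, lowercased for case-insensitive comparison.
EXPECTED_COLUMNS = [
    "id", "gender", "age", "hypertension", "heart_disease", "ever_married",
    "work_type", "residence_type", "avg_glucose_level", "bmi",
    "smoking_status", "stroke",
]
EXPECTED_ROWS = 22419  # record count stated in the description


def summarize(handle):
    """Count records and stroke cases in an open CSV file-like object."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    columns = [name.lower() for name in fieldnames]
    # Locate the stroke column regardless of capitalisation.
    stroke_key = next((n for n in fieldnames if n.lower() == "stroke"), None)
    n_rows = 0
    n_stroke = 0
    for row in reader:
        n_rows += 1
        if stroke_key is not None and row.get(stroke_key) == "1":
            n_stroke += 1
    return {
        "columns_match": sorted(columns) == sorted(EXPECTED_COLUMNS),
        "n_rows": n_rows,
        "rows_match": n_rows == EXPECTED_ROWS,
        "n_stroke": n_stroke,
    }


# Three invented records for illustration only; they are not taken from the dataset.
SAMPLE = """id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
1,Male,54,0,1,Yes,Private,Urban,110.5,27.3,never smoked,0
2,Female,70,1,0,Yes,Self-employed,Rural,190.2,31.0,Unknown,1
3,Female,8,0,0,No,children,Urban,85.1,17.4,Unknown,0
"""

summary = summarize(io.StringIO(SAMPLE))
print(summary)
```

On the real file, the same function can be called on a handle opened with `open(path, newline="")`, where `path` is wherever the download was saved. `rows_match` is expected to be true only for the full dataset, not for the small sample shown here.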

Institutions

  • National Institute of Textile Engineering and Research

Categories

Public Health, Health, Healthcare Research, Health Care Environment in Health Care System
