Multiclass Diabetes Dataset

Published: 26 February 2024 | Version 1 | DOI: 10.17632/jpp8bsjgrm.1
Contributors:
Abdus Sahid, Mozaddid Ul Hoque Babar, Md Palash Uddin

Description

This refined dataset is based on the "Diabetes Dataset" uploaded by Ahlam Rashid to Mendeley Data (https://data.mendeley.com/datasets/wj9rwkp9c2/1). The original dataset contains 1000 subjects divided into three classes: diabetic, non-diabetic, and predict-diabetic. Of these, 844 are diabetic, 103 are non-diabetic, and 53 are predict-diabetic, an extreme class imbalance of roughly 16 diabetic subjects for every predict-diabetic subject.

We found 174 duplicate subjects in the original dataset and removed them, leaving 826 unique subjects: 690 diabetic, 96 non-diabetic, and 40 predict-diabetic. From these, we kept all 96 non-diabetic and all 40 predict-diabetic subjects and selected 128 of the 690 diabetic subjects. The resulting dataset of 264 subjects has a moderate class imbalance (128 diabetic to 40 predict-diabetic, a ratio of 3.2:1).
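The deduplication and per-class selection described above can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes two subjects are duplicates when their full records are identical, and, because the description does not say how the 128 diabetic subjects were chosen, it uses seeded random sampling without replacement as a hypothetical selection rule. The helper names (`drop_duplicates`, `subsample_per_class`, `imbalance_ratio`, `make_toy`) are illustrative, and `make_toy` builds placeholder records that only reproduce the published per-class counts; it contains no real subject data.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence, Tuple

# A record is a tuple of hashable fields; the class label sits at a known index.
Record = Tuple[Hashable, ...]


def drop_duplicates(records: Sequence[Record]) -> List[Record]:
    """Keep the first occurrence of each fully identical record, preserving order."""
    seen = set()
    unique: List[Record] = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def subsample_per_class(
    records: Sequence[Record],
    label_index: int,
    caps: Mapping[Hashable, int],
    seed: int = 0,
) -> List[Record]:
    """Keep at most caps[label] records per class, sampled without replacement.

    Classes missing from `caps` are kept whole. Seeded random sampling is an
    assumed (hypothetical) selection rule, not the documented procedure.
    """
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Record]] = {}
    for rec in records:
        by_class.setdefault(rec[label_index], []).append(rec)
    selected: List[Record] = []
    for label, group in by_class.items():
        cap = caps.get(label, len(group))
        selected.extend(group if cap >= len(group) else rng.sample(group, cap))
    return selected


def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def make_toy(label: str, n_unique: int, n_dup: int) -> List[Record]:
    """Hypothetical placeholder records: n_unique distinct rows plus n_dup repeats."""
    rows = [(label, i) for i in range(n_unique)]
    return rows + rows[:n_dup]


if __name__ == "__main__":
    # Per-class counts taken from the description; the records themselves are synthetic.
    raw = (
        make_toy("diabetic", 690, 154)
        + make_toy("non-diabetic", 96, 7)
        + make_toy("predict-diabetic", 40, 13)
    )
    unique = drop_duplicates(raw)
    selected = subsample_per_class(unique, label_index=0, caps={"diabetic": 128}, seed=42)
    for name, rows in [("raw", raw), ("deduplicated", unique), ("selected", selected)]:
        labels = [r[0] for r in rows]
        print(name, len(rows), dict(Counter(labels)), round(imbalance_ratio(labels), 2))
```

Running the script prints the subject count, per-class counts, and largest-to-smallest class ratio at each stage; with the published counts the ratio falls from about 15.9 (844/53) in the raw data to 3.2 (128/40) after selection. Real data would need its own definition of which columns identify a duplicate subject.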

Institutions

  • Hajee Mohammad Danesh Science and Technology University

Categories

Applied Sciences, Health Informatics, Diabetes

Licence