Breast Cancer Diagnostic Dataset (Tabular) for Machine Learning Classification

Published: 19 September 2025 | Version 1 | DOI: 10.17632/dbz42w9x8h.1
Contributor:

Description

This dataset contains clinical features of breast tumors, designed for use in machine learning and AI-based classification tasks. Each row represents a patient sample with numeric attributes describing tumor characteristics. The dataset includes diagnosis labels (benign vs. malignant), making it suitable for supervised classification as well as feature selection and explainable AI studies.

The dataset may be applied to:

  • Predictive modeling for early breast cancer detection
  • Feature importance and interpretability studies
  • Benchmarking traditional ML models vs. LLM-based tabular learning approaches

Researchers and practitioners may use this dataset for academic, educational, or applied AI purposes.

Files

Steps to reproduce

(1) Download the dataset file Breast_cancer_dataset.csv from this repository.
(2) Open the file using any spreadsheet software (e.g., Microsoft Excel, Google Sheets) or load it into Python/R for analysis.
(3) Each row corresponds to a patient case, and each column represents a clinical feature (such as tumor size, texture, etc.).
(4) The diagnosis column indicates whether the tumor is benign (0) or malignant (1).
(5) Researchers can directly apply machine learning models (e.g., logistic regression, decision trees, random forest, deep learning) for classification tasks.
(6) Preprocessing (e.g., normalization or scaling) may be required before training models.
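As an illustration of steps (2), (4), and (6), the following is a minimal Python sketch that uses only the standard library. It rests on several assumptions not confirmed by this page: that the label column header is spelled exactly `diagnosis`, that every other column is numeric, and that the file has a header row. The helper names `read_rows` and `standardize`, and the feature names `radius` and `texture` in the embedded sample, are hypothetical and made up for the example; the sample values are not taken from the dataset.

```python
import csv
import io
import statistics
from collections import Counter
from pathlib import Path

LABEL_COLUMN = "diagnosis"  # assumed header name; check the actual CSV


def read_rows(handle, label_column=LABEL_COLUMN):
    """Split CSV rows into numeric feature vectors and 0/1 labels."""
    reader = csv.DictReader(handle)
    feature_names = [name for name in reader.fieldnames if name != label_column]
    features, labels = [], []
    for row in reader:
        # Assumes every non-label column parses as a number.
        features.append([float(row[name]) for name in feature_names])
        labels.append(int(float(row[label_column])))
    return feature_names, features, labels


def standardize(features):
    """Rescale each column to zero mean and unit population standard deviation."""
    columns = list(zip(*features))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 instead.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in features
    ]


# Made-up sample (hypothetical column names and values) so the sketch
# runs without the real file.
SAMPLE_CSV = """radius,texture,diagnosis
10.0,15.0,0
12.0,17.0,0
20.0,25.0,1
22.0,27.0,1
"""

sample_names, sample_X, sample_y = read_rows(io.StringIO(SAMPLE_CSV))
sample_scaled = standardize(sample_X)

if __name__ == "__main__":
    path = Path("Breast_cancer_dataset.csv")  # file name from step (1)
    if path.exists():
        with path.open(newline="") as handle:
            names, X, y = read_rows(handle)
        X_scaled = standardize(X)
        print(f"{len(X_scaled)} rows, {len(names)} features")
        print("class counts (0 = benign, 1 = malignant):", Counter(y))
```

The standardized rows can then be passed to a classifier of choice. When evaluating a model, the column means and deviations would normally be computed on the training split only and reused for the test split, so that no information leaks from held-out cases.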

Institutions

  • University of Dhaka Faculty of Science

Categories

Breast Cancer, Hemigraphis / Strobilanthes / Relatives, Machine Learning, Medical Dermatology, Healthcare Quality, Healthcare Research, Safety in Healthcare, Confidentiality in Healthcare, Emergency Medical Technician in Emergency Medical Service, Medical Bacteriology, Meta Dataset, Binary Classification

Licence