Breast Cancer Diagnostic Dataset (Tabular) for Machine Learning Classification
Description
This dataset contains clinical features of breast tumors, designed for use in machine learning and AI-based classification tasks. Each row represents a patient sample with numeric attributes describing tumor characteristics. The dataset includes a diagnosis label (benign vs. malignant), making it suitable for supervised learning tasks such as classification, feature selection, and explainable AI studies. The dataset may be applied to:
- Predictive modeling for early breast cancer detection
- Feature importance and interpretability studies
- Benchmarking traditional ML models against LLM-based tabular learning approaches

Researchers and practitioners may use this dataset for academic, educational, or applied AI purposes.
Steps to reproduce
(1) Download the dataset file Breast_cancer_dataset.csv from this repository.
(2) Open the file using any spreadsheet software (e.g., Microsoft Excel, Google Sheets) or load it into Python/R for analysis.
(3) Each row corresponds to a patient case, and each column represents a clinical feature (such as tumor size or texture).
(4) The diagnosis column indicates whether the tumor is benign (0) or malignant (1).
(5) Researchers can directly apply machine learning models (e.g., logistic regression, decision trees, random forests, deep learning) for classification tasks.
(6) Preprocessing (e.g., normalization or scaling) may be required before training models.
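Steps (2) to (6) can be sketched in Python using only the standard library. This is a minimal sketch, not the dataset authors' workflow. It assumes that `diagnosis` holds 0/1 and that every other column is numeric (if the file also contains an ID or other non-numeric column, drop it first). The helper names (`load_rows`, `split_train_test`, `fit_standardizer`, `transform`) and the demo column names `size` and `texture` are hypothetical. Scaling is done with z-score standardization, fitted on the training split only so that held-out rows do not influence it.

```python
import csv
import io
import math
import random
from typing import List, Sequence, TextIO, Tuple

# Per the dataset description: 0 = benign, 1 = malignant.
LABEL_COLUMN = "diagnosis"


def load_rows(source: TextIO) -> Tuple[List[str], List[List[float]], List[int]]:
    """Split the CSV into feature names, numeric feature rows and 0/1 labels.

    Assumption: every column other than LABEL_COLUMN is numeric.
    """
    reader = csv.DictReader(source)
    fieldnames = reader.fieldnames or []
    if LABEL_COLUMN not in fieldnames:
        raise ValueError(f"missing label column {LABEL_COLUMN!r}")
    feature_names = [name for name in fieldnames if name != LABEL_COLUMN]
    features, labels = [], []
    for row in reader:
        features.append([float(row[name]) for name in feature_names])
        labels.append(int(float(row[LABEL_COLUMN])))
    return feature_names, features, labels


def split_train_test(features, labels, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [features[i] for i in train_idx],
        [features[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        variance = sum((x - mean) ** 2 for x in col) / len(col)
        stds.append(math.sqrt(variance) or 1.0)  # constant column: avoid dividing by 0
    return means, stds


def transform(rows, means, stds) -> List[List[float]]:
    """Apply z-score scaling, (x - mean) / std, column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


if __name__ == "__main__":
    # Made-up rows with hypothetical column names, NOT real data. For the real file:
    #   with open("Breast_cancer_dataset.csv", newline="") as f: names, X, y = load_rows(f)
    demo_csv = io.StringIO(
        "size,texture,diagnosis\n"
        "1.0,10.0,0\n"
        "2.0,12.0,0\n"
        "3.0,14.0,1\n"
        "4.0,16.0,1\n"
        "5.0,18.0,1\n"
    )
    names, X, y = load_rows(demo_csv)
    X_train, X_test, y_train, y_test = split_train_test(X, y)
    means, stds = fit_standardizer(X_train)  # fit on training rows only
    X_train_scaled = transform(X_train, means, stds)
    X_test_scaled = transform(X_test, means, stds)
    print(names, len(X_train_scaled), len(X_test_scaled))
```

The scaled rows can then be passed to any of the classifiers listed in step (5). In practice, scikit-learn's `train_test_split`, `StandardScaler`, and `LogisticRegression` cover the same steps. The pure-Python version simply makes each step explicit.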
Institutions
- University of Dhaka Faculty of Science