Student Performance Prediction Dataset.

Published: 19 November 2025 | Version 1 | DOI: 10.17632/98hdyhxf58.1
Contributors:
Godwin Otu

Description

This dataset is the research-ready, preprocessed version of the publicly available Student Performance dataset originally compiled by Cortez and Silva (2008). The original dataset includes academic achievement records from two Portuguese secondary schools, along with demographic, socio-economic, behavioral, and school-related attributes. The raw data used in this research were obtained from Kaggle at: https://www.kaggle.com/datasets/henryshan/student-performance-prediction

For the purposes of the study, several preparation steps were applied to produce the research-ready version uploaded here: data cleaning, label preparation, feature selection, transformation of the academic grades (G1, G2, G3), harmonization of categorical variables, and formatting into a machine-learning-friendly structure. No synthetic information or additional records were introduced. The uploaded dataset is exactly the data used in the final experiments of the study, so the results can be fully reproduced.

This dataset supports the article “Explainable Machine Learning for Student Academic Performance Prediction in Data-Constrained Educational Settings.”

The authors of the study do not claim ownership of the original dataset. All rights, authorship, and credit for the original data belong to the original creators and the Kaggle uploader. This upload serves solely to preserve the exact version of the dataset used, for reproducibility in accordance with open-science practices.

Keywords: Student Performance Prediction, Machine Learning, Fairness Auditing, Interpretability Stability, Dataset Constraint.

Original Source Credit: Cortez, P., & Silva, A. (2008). “Using Data Mining to Predict Secondary School Student Performance.” Available through Kaggle at: https://www.kaggle.com/datasets/henryshan/student-performance-prediction
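The description names the preparation steps but does not give their exact parameters, so the following is only a minimal sketch of what three of them (data cleaning, harmonization of categorical variables, and label preparation) could look like in pandas. It is not the authors' pipeline. The function name `prepare`, the label column `passed`, the pass mark of 10, and the toy column names `school` and `sex` are all assumptions made for illustration; only the grade columns G1, G2, and G3 come from the description. Feature selection and the grade transformation are left out because the description does not specify them.

```python
import pandas as pd

# Assumption: a final grade of 10 or more counts as a pass.
# The dataset description does not state the threshold used in the study.
PASS_MARK = 10

GRADE_COLS = ["G1", "G2", "G3"]  # grade columns named in the description


def prepare(df: pd.DataFrame, categorical_cols: list[str]) -> pd.DataFrame:
    """Hypothetical preparation sketch: clean, harmonize, label, encode."""
    out = df.copy()

    # Data cleaning: coerce grades to numbers and drop rows missing any grade.
    for col in GRADE_COLS:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=GRADE_COLS)

    # Harmonization: trim whitespace and lowercase categorical values so that
    # "GP" and " gp " are treated as the same category.
    for col in categorical_cols:
        out[col] = out[col].str.strip().str.lower()

    # Label preparation: binary pass/fail target derived from the final grade.
    out["passed"] = (out["G3"] >= PASS_MARK).astype(int)

    # Machine-learning-friendly structure: one-hot encode the categoricals.
    out = pd.get_dummies(out, columns=categorical_cols, dtype=int)
    return out.reset_index(drop=True)


# Toy rows invented for this example; they are not taken from the dataset.
sample = pd.DataFrame(
    {
        "school": ["GP", " gp ", "MS", "MS"],
        "sex": ["F", "M", "f", "F"],
        "G1": [12, 8, 15, None],
        "G2": [11, 9, 14, 14],
        "G3": [13, 7, 16, 16],
    }
)
prepared = prepare(sample, categorical_cols=["school", "sex"])
print(prepared)
```

Passing the categorical columns explicitly, rather than inferring them from dtypes, keeps the sketch independent of how a particular pandas version stores text columns.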

Categories

Artificial Intelligence, Education, Data Science, Machine Learning

Licence