Student grade prediction dataset

Published: 16 June 2022 | Version 1 | DOI: 10.17632/wf8568hxb7.1


The dataset provides a collection of 160 instances belonging to two classes ('pass' = 136 and 'fail' = 24). The data is an anonymised, statistically sound and reliable representation of the original data collected from students studying computer science modules at a UK university. Each instance is made up of 19 features plus the class label. Eight of the features represent students' online behaviour, including bio information, retrieved from the Virtual Learning Environment (VLE). The other eleven features represent students' neighbourhood influence, retrieved from the Office for Students database. The data has been compiled and made available in de facto and de jure standard open formats (CSV and JSON). The data was collected and used in a research study undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. To encourage reproducibility of the reported experiments and results, the data is provided in the exact training-validation-testing splits used in the experiments.
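The counts above imply a strong class imbalance, which is worth checking before training a model. The following Python sketch works only from the figures stated in the description. The `check_split` helper is illustrative: the file path and the label column name `class` are assumptions, not names confirmed by the dataset, so consult the README for the actual file layout.

```python
import csv
from collections import Counter

# Figures stated in the dataset description.
N_PASS, N_FAIL = 136, 24
N_BEHAVIOUR_FEATURES, N_NEIGHBOURHOOD_FEATURES = 8, 11


def class_summary(counts):
    """Return the total, per-class proportions and majority-class baseline."""
    total = sum(counts.values())
    proportions = {label: n / total for label, n in counts.items()}
    # A model that always predicts the majority class reaches this accuracy.
    baseline = max(proportions.values())
    return total, proportions, baseline


def check_split(path, label_column="class", n_features=19):
    """Count class labels in one CSV split and verify the column count.

    Assumption: `path` and `label_column` are hypothetical placeholders;
    the real file and column names are defined by the dataset's README.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        fieldnames = reader.fieldnames or []
    if len(fieldnames) != n_features + 1:
        raise ValueError(
            f"expected {n_features + 1} columns, found {len(fieldnames)}"
        )
    return Counter(row[label_column] for row in rows)


total, proportions, baseline = class_summary({"pass": N_PASS, "fail": N_FAIL})
print(total, proportions, baseline)
```

Because 'pass' accounts for 136 of the 160 instances, always predicting 'pass' already scores 85% accuracy, so metrics such as recall on the 'fail' class are likely to be more informative than accuracy alone.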


Steps to reproduce

The data contains two categories of information: student behavioural statistics obtained from the Blackboard VLE, and student neighbourhood classification information obtained from the Office for Students young participation database. The latter was obtained by searching for each student's postcode in the database to retrieve the classification measures, such as POLAR4 and TUNDRA, that are used to characterise each area within the United Kingdom. Further details of how to generate the data are available in the README file supplied with this data.
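The postcode lookup described above can be sketched as a simple join between student records and a classification table. This is a minimal illustration under stated assumptions, not the procedure the authors used: the record layout, the field names, the lookup structure, and the decision to drop the postcode after the join are all hypothetical, and the sample values are placeholders rather than real data.

```python
def normalise_postcode(postcode):
    """Upper-case a postcode and remove whitespace so lookups are consistent."""
    return "".join(postcode.split()).upper()


def attach_neighbourhood(students, lookup):
    """Add neighbourhood classification fields to each student record.

    students: list of dicts with a 'postcode' key (hypothetical layout).
    lookup:   dict mapping a normalised postcode to a dict of
              classification values (hypothetical structure).
    Students whose postcode is not in the lookup receive None values.
    """
    field_names = sorted({name for values in lookup.values() for name in values})
    enriched = []
    for student in students:
        match = lookup.get(normalise_postcode(student["postcode"]), {})
        # Assumption: the postcode itself is dropped once the join is done.
        record = {key: value for key, value in student.items() if key != "postcode"}
        for name in field_names:
            record[name] = match.get(name)
        enriched.append(record)
    return enriched


# Placeholder values for illustration only; not taken from the dataset.
lookup = {"AB12CD": {"POLAR4": 2, "TUNDRA": 3}}
students = [{"id": 1, "postcode": "ab1 2cd"}, {"id": 2, "postcode": "ZZ9 9ZZ"}]
enriched = attach_neighbourhood(students, lookup)
print(enriched)
```

Normalising the postcode before the lookup avoids missed matches caused by differences in spacing or letter case.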


Edge Hill University


Machine Learning, Academic Achievement, Student Attitude, At-Risk Student, Student Behavior