Student Performance Metrics Dataset

Published: 7 October 2024 | Version 1 | DOI: 10.17632/5b82ytz489.1
Contributors:
Tahmid Hasan

Description

The Student Performance Metrics Dataset provides a collection of academic and non-academic attributes for evaluating factors that influence student performance in higher education. It enables researchers to analyse relationships between student demographics, academic achievements, socio-economic factors, and extracurricular activities.

Dataset Attributes:
Department: The academic department the student is enrolled in (e.g., Computer Science, Business).
Gender: The gender of the student.
HSC: Score obtained in higher secondary education.
SSC: Score obtained in secondary school education.
Income: Monthly income of the student's family.
Hometown: The type of area where the student resides (e.g., urban, rural).
Computer: Proficiency level in computer usage.
Preparation: Time spent on study preparation outside class hours.
Gaming: Time spent on gaming activities daily.
Attendance: Regularity in class participation.
Job: Indicates whether the student has a part-time job.
English: Proficiency in English communication skills.
Extra: Participation in extracurricular activities.
Semester: Current semester the student is enrolled in.
Last: Performance in the last semester.
Overall: Cumulative Grade Point Average (CGPA).

Purpose and Use Cases: The dataset serves as a resource for educational research, enabling trend analysis and the development of predictive models for academic success. Researchers can explore the impact of socio-economic status, gender, and extracurricular activities on student performance. Potential use cases include building machine learning models to predict performance and analysing factors that contribute to student success or dropout risk.

Limitations: This dataset does not cover all potential influences on student performance, such as personal motivation or health. Future studies can extend it with additional variables.

Acknowledgments: This dataset is compiled as an open resource for academic research. Proper citation is appreciated in academic works that use this dataset.
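The kind of trend analysis described above could start from a simple grouped summary. The following is a minimal sketch in Python with Pandas, not the author's analysis: the rows are invented purely for illustration, the value formats (e.g., "City"/"Village", CGPA on a 4-point scale) are assumptions, and only the column names come from the attribute list.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from the dataset.
# Column names follow the attribute list; the value formats are assumptions.
toy = pd.DataFrame(
    {
        "Hometown": ["City", "Village", "City", "Village"],
        "Job": ["No", "Yes", "No", "No"],
        "Last": [3.2, 2.8, 3.6, 3.0],
        "Overall": [3.1, 2.9, 3.5, 3.0],
    }
)


def summarise(df: pd.DataFrame):
    """Return mean Overall (CGPA) per Hometown value and the Last/Overall correlation."""
    mean_by_hometown = df.groupby("Hometown")["Overall"].mean()
    last_vs_overall = df["Last"].corr(df["Overall"])  # Pearson by default
    return mean_by_hometown, last_vs_overall


mean_by_hometown, last_vs_overall = summarise(toy)
print(mean_by_hometown)
print(f"Pearson correlation (Last vs Overall): {last_vs_overall:.2f}")
```

With the real file loaded into a DataFrame, the same function could be applied directly, provided the columns hold numeric values in the form assumed here.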

Steps to reproduce

Data Collection: The data were collected through a survey of undergraduate students across various departments in a university. Students filled out a structured questionnaire covering their demographics, academic performance, and extracurricular activities.

Survey Design: The questionnaire included the following sections:
Demographic Information: Gender, Hometown, Family Income.
Academic Performance: SSC and HSC scores, Department, Current Semester.
Behavioural and Social Attributes: Gaming habits, Preparation time, Attendance, Part-time jobs, and English proficiency.
Extracurricular Activities: Participation in sports, clubs, or other extracurricular engagements.

Data Entry and Cleaning: The survey responses were entered into a spreadsheet and then cleaned to handle missing values and ensure consistency. Inconsistent or ambiguous responses were cross-checked with participants for validation.

Data Preprocessing: All categorical variables (e.g., Gender, Department, Hometown) were encoded using one-hot encoding. Numerical values (e.g., SSC, HSC scores, Income) were standardised for uniformity.

Data Validation: The dataset was validated by comparing it against known academic records and cross-referencing demographic data to ensure accuracy. Discrepancies were addressed, and the final dataset was reviewed for completeness.

Software and Tools Used:
Data Entry: Microsoft Excel/Google Sheets.
Data Cleaning and Preprocessing: Python (Pandas and NumPy libraries).
Statistical Analysis and Visualisation: Python (Matplotlib, Seaborn).

Reproducibility: To reproduce this dataset, a similar survey needs to be conducted with students using the same questionnaire. The same preprocessing steps, such as encoding categorical variables and normalising numerical data, should be applied to ensure consistency with the original dataset.
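The Data Preprocessing step could look roughly like the sketch below. It is one possible reading, not the author's script: it assumes that "standardised" means z-scoring (zero mean, unit variance, population standard deviation), the helper name `preprocess` is hypothetical, and the rows are invented for illustration only.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from the dataset.
toy = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Female"],
        "Hometown": ["City", "Village", "City"],
        "SSC": [4.0, 5.0, 4.5],
        "HSC": [3.5, 4.5, 4.0],
    }
)


def preprocess(df: pd.DataFrame, categorical: list, numerical: list) -> pd.DataFrame:
    """One-hot encode the categorical columns and z-score the numerical ones.

    Hypothetical helper: z-scoring is an assumed interpretation of "standardised".
    """
    # Each categorical column becomes one 0/1 indicator column per category.
    out = pd.get_dummies(df, columns=categorical, dtype=int)
    # Subtract the column mean and divide by the population standard deviation.
    means = df[numerical].mean()
    stds = df[numerical].std(ddof=0)
    out[numerical] = (df[numerical] - means) / stds
    return out


processed = preprocess(toy, ["Gender", "Hometown"], ["SSC", "HSC"])
print(processed)
```

If the original pipeline used min-max scaling instead, only the two lines that compute and apply `means` and `stds` would need to change.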

Institutions

Universiti Malaya

Categories

Education, Higher Education, Academic Achievement, Socioeconomic Status, Extracurricular Activity, Demographic Analysis, Predictive Modeling, Student Performance

Licence