Ontology-Based Dataset for Student and Course Performance Monitoring, Data Retrieval, and Decision-Making
Description
This dataset was developed to support the study “Adaptive Ontology-Enabled Data Retrieval Model for Learning Analytics Integration Across Heterogeneous Educational Platforms.” It addresses the challenge of integrating heterogeneous educational data sources, including Learning Management Systems (LMS) and Student Information Systems (SIS), into a unified framework for learning analytics applications. The research hypothesizes that an ontology-driven, graph-based model integrates diverse data more effectively than traditional relational database approaches, and enables more accurate and flexible retrieval, because it preserves semantic relationships and supports interoperable cross-platform querying.
The dataset is organized into six sheets:
SECI_1 – enrolment records linking students, lecturers, faculties, and programmes.
SECI_2 – course metadata, credit hours, and prerequisite structures.
CPI_1 – grade distributions across semesters, cohorts, groups, and lecturers.
SAI_1 – detailed individual assessment scores, totals, and grades.
SAI_2 – monthly student attendance by course and semester, together with the lecturer responsible for teaching the group.
SAI_3 – lecture engagement indicators measured through discussion posts.
These sheets capture performance and engagement across conventional courses and classroom/lecturer activities.
The ontology-based retrieval model harmonized the heterogeneous data without schema conflicts. A laboratory experimental session involving four domain experts was conducted to verify the retrieval results. The confusion matrix evaluation yielded accuracy above 99.8% and precision above 99.7% at both participating institutions (Institution A and Institution B), with only a small number of false positives and false negatives recorded.
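For readers who want to recompute the reported metrics from their own retrieval runs, accuracy and precision follow from the confusion matrix counts in the standard way. The sketch below is a minimal illustration; the function names are hypothetical and the counts in the usage lines are made-up placeholders, not results from this study.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all retrieval decisions (positive and negative) that were correct."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def precision(tp: int, fp: int) -> float:
    """Share of retrieved records that were actually relevant."""
    if tp + fp == 0:
        raise ValueError("no records were retrieved")
    return tp / (tp + fp)


# Placeholder counts for illustration only (NOT taken from the dataset).
example = {"tp": 90, "fp": 10, "fn": 5, "tn": 95}
print("accuracy:", accuracy(**example))
print("precision:", precision(example["tp"], example["fp"]))
```

Both functions raise an error on empty input rather than returning zero, so a retrieval run that produced no results is not silently scored.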
How the Data Can Be Interpreted
Records represent validated student and course performance data that can be interpreted across three themes: (1) managing resource allocation, including course groups, lecturer assignments, and prerequisite structures; (2) monitoring course performance through grade distributions and pass/fail analysis; and (3) monitoring student performance, attendance, and online engagement activities. Ontology cross-linking enables flexible semantic queries, such as “What is the total number of students who passed a course?” or “What was the student attendance for the previous month?” The dataset demonstrates the value of ontology-based integration for scalable, accurate, and flexible learning analytics across heterogeneous educational platforms.
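To make the first example question concrete, the sketch below answers “how many students passed a course?” by matching triple patterns over a tiny in-memory graph. It is a plain-Python stand-in for the join a SPARQL query would perform against the graph repository, not the study's actual query. The predicate names (ofStudent, forCourse, hasStatus) and all identifiers are hypothetical and are not taken from the SPC_Academic_Performance ontology.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples; names are illustrative, not from the real ontology.
TRIPLES: Set[Triple] = {
    ("result:R1", "ofStudent", "student:S1"),
    ("result:R1", "forCourse", "course:C101"),
    ("result:R1", "hasStatus", "Pass"),
    ("result:R2", "ofStudent", "student:S2"),
    ("result:R2", "forCourse", "course:C101"),
    ("result:R2", "hasStatus", "Fail"),
    ("result:R3", "ofStudent", "student:S3"),
    ("result:R3", "forCourse", "course:C101"),
    ("result:R3", "hasStatus", "Pass"),
    ("result:R4", "ofStudent", "student:S1"),
    ("result:R4", "forCourse", "course:C102"),
    ("result:R4", "hasStatus", "Pass"),
}


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def students_who_passed(triples: Set[Triple], course: str) -> Set[str]:
    """Join three patterns on the shared result node."""
    for_course = {s for s, _, _ in match(triples, p="forCourse", o=course)}
    passed = {s for s, _, _ in match(triples, p="hasStatus", o="Pass")}
    hits = for_course & passed
    return {o for s, _, o in match(triples, p="ofStudent") if s in hits}


print(len(students_who_passed(TRIPLES, "course:C101")))
```

Modelling each result as its own node keeps the student, course, and status linked, which is what allows the same graph to answer per-course and per-student questions without restructuring the data.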
Steps to reproduce
Methods and Protocols
The following structured workflow describes how the data were gathered and how the research can be reproduced:
1. Requirement Analysis – Qualitative interviews were conducted with lecturers and administrators to identify the key data needed for learning analytics.
2. Data Extraction – Datasets were exported from the LMS and SIS platforms into structured Excel sheets (six categories covering enrolment, prerequisites, course outcomes, student performance, attendance, and engagement).
3. Preprocessing – Records were cleaned and anonymized, and grading codes, course identifiers, and data formats were standardized. Missing and inconsistent values were normalized to ensure consistency across institutions.
4. Ontology Mapping – The SPC_Academic_Performance ontology was developed to semantically align educational records across institutions. To prepare the dataset for semantic integration, users should align the provided SPC ontology with the XLSX dataset using OpenRefine.
5. Validation – The ontology structure and semantic consistency were validated in Protégé using TDDonto2 and Description Logic (DL) axioms. In addition, domain experts cross-checked the ontology mappings and retrieval outputs to ensure alignment with institutional monitoring requirements.
6. Integration – The semantically aligned datasets were deployed into a graph-based repository environment, such as Neo4j or GraphDB, to support RDF storage and semantic data retrieval.
7. Querying – SPARQL queries were developed to address institutional learning analytics requirements, including ongoing assessment monitoring, attendance-performance analysis, course performance evaluation, and student engagement tracking.
8. Evaluation – Confusion matrix counts (TP, FP, FN, TN) were used to measure retrieval accuracy.
In summary, other researchers can reproduce this process by:
1. Conducting requirement analysis through interviews to identify relevant learning analytics questions.
2. Exporting LMS, SIS, and MOOC data into a structured tabular format.
3. Mapping the datasets to the SPC_Academic_Performance ontology (which is openly available).
4. Using ontology validation tools (e.g., TDDonto2 in Protégé) to check consistency.
5. Loading the integrated data into a graph database (e.g., Neo4j, GraphDB) and executing SPARQL queries to replicate the retrieval tests.
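The preprocessing and mapping steps can be sketched in a few lines of standard-library Python. This is only an illustration of the idea under stated assumptions: the workflow above uses OpenRefine for the alignment, and the source does not say how records were anonymized, so the salted hash here is an assumed pseudonymization approach. The namespace, column names (student_id, course_code, grade), and property names are all hypothetical placeholders, and course codes are assumed to contain no internal spaces.

```python
import hashlib
from typing import Dict, List, Optional

BASE = "http://example.org/spc#"  # placeholder namespace, NOT the real ontology IRI


def pseudonymize(student_id: str, salt: str) -> str:
    """Replace a student ID with a salted hash (assumed approach)."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    return "stu_" + digest[:12]


def normalize_grade(raw: Optional[str]) -> Optional[str]:
    """Trim and upper-case a grade code; blank or missing becomes None."""
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    return cleaned or None


def row_to_triples(row: Dict[str, Optional[str]], salt: str) -> List[str]:
    """Turn one cleaned spreadsheet row into N-Triples-style lines."""
    student = pseudonymize(row["student_id"], salt)
    course = row["course_code"].strip().upper()
    subject = f"<{BASE}result_{student}_{course}>"
    triples = [
        f"{subject} <{BASE}ofStudent> <{BASE}{student}> .",
        f"{subject} <{BASE}forCourse> <{BASE}{course}> .",
    ]
    grade = normalize_grade(row.get("grade"))
    if grade is not None:  # skip the grade triple when the value is missing
        triples.append(f'{subject} <{BASE}hasGrade> "{grade}" .')
    return triples


# Hypothetical input row for illustration only.
row = {"student_id": "2021-0042", "course_code": " csc101 ", "grade": " b+ "}
for line in row_to_triples(row, salt="demo-salt"):
    print(line)
```

Omitting the grade triple for missing values, rather than writing an empty literal, keeps absent data distinguishable from recorded data once the triples are loaded into the graph repository.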