SSC-BanglaTutor: A Curriculum-Aligned Bengali Dataset for Intelligent Tutoring Systems

Name: SSC-BanglaTutor: A Curriculum-Aligned Bengali Dataset for Intelligent Tutoring Systems
Creator: Eshraque Jabid Ifti
Published: 2025-10-27T01:12:43.477Z
Keywords: Computer Science, Artificial Intelligence, Education, Natural Language Processing, Intelligent Tutoring System, Large Language Model

Ifti, Eshraque Jabid; Ifty, Fihab; Hasan, Mehadi; Shil, Rahul Chandra; Saha, Utshab Kumar; Tanvir, Kazi; Rahman, Mahfujur; Gomes, Dipta

doi:10.17632/krn9bzypsn.2

Published: 27 October 2025 | Version 2 | DOI: 10.17632/krn9bzypsn.2

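Contributors: Eshraque Jabid Ifti, Fihab Ifty, Mehadi Hasan, Rahul Chandra Shil, Utshab Kumar Saha, Kazi Tanvir, Mahfujur Rahman, Dipta Gomes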

Description

This dataset comprises a Bengali-language educational corpus specifically curated to support the fine-tuning and evaluation of AI-driven, hint-based tutoring systems aligned with the Secondary School Certificate (SSC) science curriculum of Bangladesh. It contains a total of 11,286 structured question–answer–hint entries, distributed across three core science subjects:

- Biology: 4,859 entries (14 chapters)
- Chemistry: 3,034 entries (12 chapters)
- Physics: 3,393 entries (14 chapters)

Each entry includes:

- A question written in Bengali
- Five progressively ranked hints guiding learners from general to specific concepts
- A convergence metric estimating the probability of a correct response at each hint
- Correct and distractor answers based on common student misconceptions
- Curriculum-aligned topic tags mapped to the SSC syllabus

All data are encoded in UTF-8 JSON Lines (.jsonl) format, ensuring compatibility with Bengali NLP tools and large-scale AI training pipelines. The dataset’s structured design supports personalized feedback, enabling adaptive learning, retrieval-augmented generation (RAG), and fine-tuning of large language models (LLMs) for education in low-resource languages.
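Because the files are UTF-8 JSON Lines, each line can be parsed as an independent JSON object. The following is a minimal sketch of a loader, not the authors' tooling. The description does not state the field names or file names used in the dataset, so the `"subject"` key and the `physics.jsonl` path in the usage note are hypothetical placeholders that should be replaced after inspecting the actual files.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a UTF-8 JSON Lines file."""
    # Explicit UTF-8 so Bengali text decodes the same way on every platform.
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_key(records: Iterable[dict], key: str) -> Counter:
    """Tally records by the value stored under `key`.

    `key` is an assumption about the schema; records that lack it are
    counted under "unknown". Values must be hashable (e.g. strings).
    """
    return Counter(record.get(key, "unknown") for record in records)
```

A call such as `count_by_key(read_jsonl("physics.jsonl"), "subject")` (hypothetical path and key) would give a quick check of entry counts against the per-subject totals listed above.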
