Textbook Dataset from NCTB

Published: 11 September 2023 | Version 1 | DOI: 10.17632/gktc5y2sy2.1


We created this dataset to support the development of a Bangla question-answering system tailored to our project's objectives. It contains approximately 3,000 question-and-answer pairs that human annotators selected from NCTB (National Curriculum and Textbook Board) textbooks for classes six to ten. Each passage averages 387 words, giving enough context for meaningful question answering. The annotators also collected answers for a variety of question types to keep the dataset reliable and relevant for Bangla. The dataset is organized into training and validation subsets, stored as CSV files that pair multiple passages with their corresponding questions and annotated answers. It is intended as a resource for researchers and developers working on context-aware Bangla question answering and on Bangla language processing more broadly.
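
As a rough illustration of how the CSV layout described above might be consumed, the sketch below groups question-and-answer pairs under their passage. The column names (`context`, `question`, `answer`) and the sample rows are assumptions made for the example; they are not taken from the dataset, so check the header row of the actual files before reusing this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real CSV headers may differ.
PASSAGE_COL = "context"
QUESTION_COL = "question"
ANSWER_COL = "answer"


def group_by_passage(csv_file):
    """Group (question, answer) pairs under the passage they belong to."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row[PASSAGE_COL]].append((row[QUESTION_COL], row[ANSWER_COL]))
    return dict(grouped)


# Tiny in-memory stand-in for one CSV file (made-up rows, not real data).
sample = io.StringIO(
    "context,question,answer\n"
    "Passage A,Question 1?,Answer 1\n"
    "Passage A,Question 2?,Answer 2\n"
    "Passage B,Question 3?,Answer 3\n"
)
grouped = group_by_passage(sample)
```

With a downloaded file, the same function can be given a handle opened with `newline=""` and `encoding="utf-8"`, assuming the Bangla text in the files is UTF-8 encoded.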



BRAC University, Department of Computer Science and Engineering


Natural Language Processing, Answer Extraction, Bengali Language, Reading Comprehension, Textbook, Text Comprehension, Text Processing