EDUBOT: A Comprehensive General Science Question Answering Dataset in the Bangla Language
Description
Chatbots are increasingly used for a wide range of services that require interaction in natural language. Bangla Natural Language Processing (NLP) remains a notable challenge in Bangladesh because of the complexity of the language. To help address this, Edubot was created as an educational resource for developing question-answering systems in Bangla, focused specifically on general science. The dataset is intended to support educational tools that can answer science-related questions in Bangla across subjects such as biology, chemistry, physics, and environmental science. We developed a comprehensive dataset of 3,379 general science questions paired with answers, all written in Bangla. The data is organized into two columns and distributed in CSV format for ease of use. Its key components are:
1. Sources: the dataset draws on a variety of resources, including textbooks, articles, and other educational materials.
2. Questions: each question is designed to assess comprehension and reinforce understanding of general science concepts.
3. Answers: each question is paired with a clear, accurate answer, along with the answer's starting position in the dataset for easy reference and verification.
The dataset is well suited to advancing research and development in Bangla NLP, particularly the creation of machine learning and AI-driven educational chatbots and conversational AI systems.
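Because the data is distributed as a two-column CSV file, a minimal loading sketch may be useful as a starting point. It assumes that the first column holds the question and the second the answer, that the file begins with a header row, and that it is UTF-8 encoded; the real column order, header names, and file name may differ. The file name `edubot.csv` and the helper names `read_qa_pairs` and `load_edubot_csv` are hypothetical, and the sample row in the demo is made up for illustration rather than taken from the dataset.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_qa_pairs(lines: Iterable[str], has_header: bool = True) -> List[Tuple[str, str]]:
    """Parse two-column CSV rows into (question, answer) tuples.

    Assumption: column 0 is the question and column 1 is the answer.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the (assumed) header row
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        question, answer = row[0].strip(), row[1].strip()
        if question and answer:  # drop rows with an empty question or answer
            pairs.append((question, answer))
    return pairs


def load_edubot_csv(path: str) -> List[Tuple[str, str]]:
    # Hypothetical helper: "utf-8-sig" reads UTF-8 and tolerates a leading BOM,
    # which matters because Bangla text requires a Unicode encoding.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_qa_pairs(f)


if __name__ == "__main__":
    # Made-up sample row ("What is the chemical formula of water?"), not from the dataset.
    sample = "question,answer\nপানির রাসায়নিক সংকেত কী?,H2O\n"
    pairs = read_qa_pairs(io.StringIO(sample))
    print(pairs[0][1])  # → H2O
    # With the actual file: pairs = load_edubot_csv("edubot.csv")
```

Separating the parser from the file-opening helper keeps the parsing logic testable with in-memory text, and the resulting list of pairs can be turned into a simple exact-match lookup with `dict(pairs)` before moving on to a learned model.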