TroubleQA-BE: A Bilingual Troubleshooting Question Answering Dataset for Bangla and English
Description
TroubleQA-BE, abbreviated BTQA in this description, is a bilingual troubleshooting question answering dataset for research and development in Natural Language Processing (NLP), multilingual artificial intelligence, conversational agents, and automated technical support systems. It contains 1,006 manually curated question-answer pairs: 503 in Bangla and 503 in English.
The dataset covers troubleshooting and technical assistance scenarios that come up in everyday technology use, including mobile devices, computers, software applications, internet connectivity, operating systems, hardware functionality, account management, and general technical support. Each question is paired with a concise, contextually relevant answer written to simulate a real-world technical support interaction.
BTQA is meant to support downstream NLP and machine learning tasks such as question answering (QA), chatbot development, retrieval-augmented generation (RAG), semantic search, multilingual information retrieval, intent understanding, cross-lingual learning, and large language model (LLM) fine-tuning. It is particularly relevant to Bangla NLP research, because publicly available domain-specific Bangla QA datasets remain limited compared to English resources. Because it includes aligned troubleshooting content in both languages, BTQA allows comparative multilingual experiments and supports the development of more inclusive AI systems for low-resource languages.
The dataset is distributed as CSV files with UTF-8 encoding, for compatibility with multilingual text processing pipelines and machine learning frameworks. All entries were manually reviewed for linguistic clarity, consistency, and practical relevance.
BTQA is intended for academic research, education, benchmarking of multilingual NLP systems, and building customer support applications that operate across multiple languages. It offers researchers, students, and developers in academia and industry a structured, domain-focused bilingual troubleshooting corpus.
Files
Steps to reproduce
The dataset was manually curated by collecting troubleshooting questions and answers in both Bangla and English from common technical support scenarios and everyday technology usage contexts. Questions were written to represent realistic problems with devices, software, internet connectivity, operating systems, and general technical assistance. Each question was paired with a concise, contextually relevant answer written to simulate a real-world support interaction. The question-answer pairs were organized into structured CSV files with UTF-8 encoding for compatibility with multilingual text processing systems. Entries were then manually reviewed and cleaned to keep language, readability, and formatting consistent across both languages. The final dataset is divided into a Bangla subset and an English subset of 503 question-answer pairs each, for a total of 1,006 QA pairs.
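The counts and encoding described above can be checked after download with a short script. The following is a minimal sketch, not the authors' tooling. It assumes each subset is a CSV file whose header row contains columns named question and answer; those column names are hypothetical, since the description does not state the actual headers, and should be changed to match the files.

```python
import csv
from pathlib import Path

# Hypothetical column names: the real CSV headers are not stated in the
# dataset description and may differ.
QUESTION_COL = "question"
ANSWER_COL = "answer"

# Size of each language subset, as stated in the dataset description.
EXPECTED_PAIRS_PER_LANGUAGE = 503


def load_qa_pairs(csv_path):
    """Read a UTF-8 CSV file and return a list of (question, answer) tuples."""
    pairs = []
    # "utf-8-sig" reads plain UTF-8 and also strips a byte-order mark if present.
    with Path(csv_path).open(encoding="utf-8-sig", newline="") as handle:
        for row in csv.DictReader(handle):
            question = (row.get(QUESTION_COL) or "").strip()
            answer = (row.get(ANSWER_COL) or "").strip()
            pairs.append((question, answer))
    return pairs


def check_subset(pairs, expected=EXPECTED_PAIRS_PER_LANGUAGE):
    """Return a list of problems found; an empty list means the subset passed."""
    problems = []
    if len(pairs) != expected:
        problems.append(f"expected {expected} pairs, found {len(pairs)}")
    for index, (question, answer) in enumerate(pairs, start=1):
        if not question or not answer:
            problems.append(f"row {index} has an empty question or answer")
    return problems
```

Calling load_qa_pairs on each subset file and passing the result to check_subset should return an empty list for both languages if the files match the description. The file paths are left to the caller because the file names are not given here.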
Institutions
- Daffodil International University, Dhaka Division, Dhaka