MedBanglaTrust3: A Bengali Dataset for Explainable and Trustworthy AI Health Suggestions
Description
MedBanglaTrust3 is a curated, expert-validated dataset developed to facilitate machine learning, deep learning, and natural language processing (NLP) tasks focused on evaluating the trustworthiness of AI-generated health suggestions in the Bengali (Bangla) language. This dataset is specifically tailored for low-resource language modeling and is particularly relevant in the context of cyberchondria, where users excessively rely on online health advice without clinical verification. It contains symptom-specific prompts along with responses from OpenAI's ChatGPT and Google's Gemini search-generated summaries, manually labeled into three trustworthiness levels: Highly Relevant, Partially Relevant, and Not Relevant. The dataset supports the development of explainable AI (XAI) systems, text classification models, and context-aware AI assistants for healthcare use in underrepresented languages.

Objective: To enable research and development of trust classification models, automated health dialogue systems, and responsible AI assistants by providing labeled, real-world AI-generated responses in Bangla that reflect varying degrees of medical relevance and accuracy.

Dataset Composition:
- Language: Bengali (Bangla)
- Final Validated Instances: 6,660

Class Distribution (Balanced):
1. Highly Relevant – 2,220 responses
2. Partially Relevant – 2,220 responses
3. Not Relevant – 2,220 responses
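The stated composition can be checked after loading the data. The following is a minimal Python sketch, not part of the dataset's own tooling. The description does not specify a file format or column names, so the function takes a plain list of label strings, and it assumes the labels are stored using the three class names exactly as written above. The helper name `check_distribution` is hypothetical.

```python
from collections import Counter

# Class names and per-class counts as stated in the dataset description.
# Assumption: labels in the released files use these exact strings.
EXPECTED_COUNTS = {
    "Highly Relevant": 2220,
    "Partially Relevant": 2220,
    "Not Relevant": 2220,
}
EXPECTED_TOTAL = 6660


def check_distribution(labels):
    """Compare a list of label strings with the stated class distribution.

    Returns a list of human-readable problems; an empty list means the
    labels match the description.
    """
    counts = Counter(labels)
    problems = []

    # Flag any label that is not one of the three documented classes.
    for name in sorted(counts.keys() - EXPECTED_COUNTS.keys()):
        problems.append(f"unexpected label {name!r} ({counts[name]} rows)")

    # Flag any documented class whose count differs from the stated one.
    for name, expected in EXPECTED_COUNTS.items():
        if counts[name] != expected:
            problems.append(f"{name}: expected {expected}, found {counts[name]}")

    # Flag a mismatch in the overall number of instances.
    if len(labels) != EXPECTED_TOTAL:
        problems.append(f"total: expected {EXPECTED_TOTAL}, found {len(labels)}")

    return problems


if __name__ == "__main__":
    # Synthetic stand-in for the real label column, for illustration only.
    demo_labels = [name for name, n in EXPECTED_COUNTS.items() for _ in range(n)]
    issues = check_distribution(demo_labels)
    print("distribution matches" if not issues else "\n".join(issues))
```

In practice, the synthetic `demo_labels` list would be replaced with the label column read from the dataset files. Because the three classes are the same size, a classifier that always predicts one class reaches an accuracy of 2,220 / 6,660 = 1/3, which is a useful baseline when interpreting results.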