Bengali Colloquial Dataset of Primary Medical Issues for Improving Health System

Published: 9 July 2021 | Version 3 | DOI: 10.17632/4tt953xwk2.3
Dr. M. F. Mridha


This dataset was created for Bangladesh, where it is intended to help people. It supports medical specialist classification and Bengali Named Entity Recognition (NER), and can serve multiple purposes.


Steps to reproduce

This dataset is built from the Chief Complaints (CC) that doctors and medical agents record when they hear patients' problem statements. No documents or stored records exist of how people actually describe their problems in everyday speech; doctors and other medical agents write down only the CC. We collected these CC and used them to compose sample utterances, based on a qualitative study and in-depth interviews with the doctor and GP at the Bangladesh University of Business and Technology (BUBT) Medical Center, to learn how people express their problems at the initial stage of the primary-level health system. The dataset may be used only in health-system applications of Artificial Intelligence, Machine Learning, Deep Learning, and NLP, to make such systems more efficient; it must not be used for treatment purposes. The raw data part covers where we collected the samples used to build the data format, following the same qualitative study and in-depth interviews with the doctor and GP at the BUBT Medical Center.


Bangladesh University of Business and Technology


Health System, Disease, Symptom, Bengali Language, Patient Care, Asian Health