Bangla-MedER: Bangla Medical Entity Recognition Dataset
Description
The Bangla-MedER dataset is a collection of 2,980 annotated Bangla texts for medical entity recognition. It covers six entity types: Medicine/Chemical Name, Organ, Disease, Hormone, Pharmacological Class, and Common Medical Terms. Bangla is spoken mainly in Bangladesh and parts of India, and this dataset is intended to support research in natural language processing (NLP) and medical text mining for the language. The texts were assembled from a variety of online medical resources, including blogs and websites, and can be used to train and evaluate medical entity recognition models. The dataset can also support other NLP tasks, such as entity extraction, text classification, and information retrieval, in medical informatics and healthcare data processing. An English translation of the Bangla dataset is provided in a separate .csv file.
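As a rough illustration of how the six entity types might be checked when loading the data, the following Python sketch counts rows per entity type and rejects any label outside the six listed above. The column names "entity" and "label", the function name count_entity_types, and the sample rows are all assumptions made for illustration; the actual file layout is not specified here and may differ.

```python
import csv
import io
from collections import Counter

# The six entity types named in the dataset description.
ENTITY_TYPES = {
    "Medicine/Chemical Name",
    "Organ",
    "Disease",
    "Hormone",
    "Pharmacological Class",
    "Common Medical Terms",
}


def count_entity_types(csv_file):
    """Count rows per entity type and reject unknown labels.

    Assumes hypothetical columns "entity" and "label"; the real
    files may use a different schema.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        label = row["label"].strip()
        if label not in ENTITY_TYPES:
            raise ValueError(f"unexpected entity type: {label!r}")
        counts[label] += 1
    return counts


# Made-up English rows standing in for the real data (not from the dataset).
SAMPLE = """entity,label
insulin,Hormone
liver,Organ
diabetes,Disease
thyroxine,Hormone
"""

sample_counts = count_entity_types(io.StringIO(SAMPLE))
```

With a real file, the same function could be called on an open file handle, for example open(path, encoding="utf-8", newline=""), where path is whichever .csv file is being inspected and the column names are adjusted to match it.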