Bangla-MedER: Bangla Medical Entity Recognition Dataset

Name: Bangla-MedER: Bangla Medical Entity Recognition Dataset
Creator: Rubel Sheikh
Published: 2025-10-17T14:20:06.330Z
Keywords: Natural Language Processing, Machine Learning, Bengali Language, Deep Learning

Sheikh, Rubel; Rafiq, Shifat Ara; Hasan, Md. Mehedi; Ahmed, Shakil; Aurpa, Tanjim Taharat; Akter, Farzana

doi:10.17632/jt4gywvwtj.1

Published: 17 October 2025 | Version 1 | DOI: 10.17632/jt4gywvwtj.1

Contributors:

Rubel Sheikh

Description

The Bangla-MedER dataset is a collection of 2980 Bangla texts annotated for medical entity recognition. The annotations cover six entity types: Medicine/Chemical Name, Organ, Disease, Hormone, Pharmacological Class, and Common Medical Terms. Bangla is spoken mainly in Bangladesh and parts of India, and this dataset is intended to support research on natural language processing (NLP) and medical text mining for the language. The texts were assembled from a variety of online medical resources, including blogs and websites, and can be used to train and evaluate medical entity recognition models. The dataset is also suitable for related NLP tasks such as entity extraction, text classification, and information retrieval in medical informatics and healthcare data processing. An English translation of the Bengali dataset is provided in a separate .csv file.
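As a rough illustration of how the six entity types might be tallied once the files are downloaded, the sketch below counts labels in a CSV. The file layout is not documented here, so the column names `token` and `label`, the `O` tag for non-entities, and the sample rows are all assumptions made for illustration; they would need to be adjusted to the actual files.

```python
import csv
import io
from collections import Counter

# The six entity types named in the dataset description.
ENTITY_TYPES = [
    "Medicine/Chemical Name",
    "Organ",
    "Disease",
    "Hormone",
    "Pharmacological Class",
    "Common Medical Terms",
]


def count_entity_labels(csv_file, label_column="label"):
    """Count how often each known entity type occurs in one CSV column.

    `label_column` is a hypothetical column name; labels that are not
    among the six entity types (for example an "O" tag) are ignored.
    """
    counts = Counter({name: 0 for name in ENTITY_TYPES})
    for row in csv.DictReader(csv_file):
        label = row[label_column].strip()
        if label in counts:
            counts[label] += 1
    return counts


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """token,label
paracetamol,Medicine/Chemical Name
liver,Organ
fever,Disease
insulin,Hormone
the,O
"""

sample_counts = count_entity_labels(io.StringIO(SAMPLE_CSV))
for entity_type in ENTITY_TYPES:
    print(f"{entity_type}: {sample_counts[entity_type]}")
```

To run the same count on a downloaded file, the `io.StringIO` object could be replaced by `open(path, encoding="utf-8", newline="")`, where `path` points at the actual CSV; UTF-8 is assumed here because the texts are in Bangla.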
