ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition
Published: 8 April 2026 | Version 4 | DOI: 10.17632/gbkszkt8z3.4
Contributors:
Description
We developed ANCHOLIK-NER, a Bangla Regional Named Entity Recognition dataset focusing on the Sylhet, Chittagong, Barishal, Mymensingh, and Noakhali dialects. It comprises 17,405 sentences, evenly distributed across the five regions, with entities categorized into 10 types. The dataset was sourced from publicly available resources and supplemented with manual translations.
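The record does not describe the file format. If the data use the common token-per-line BIO layout, you could check the sentence and entity-type statistics above with a short script. The following is a minimal sketch under that assumption. The layout (one `token tag` pair per line, blank lines between sentences, tags such as `B-PER`/`I-PER`/`O`) and the helper names `read_conll` and `entity_type_counts` are hypothetical and not part of the dataset. The last line only confirms the arithmetic: 17,405 sentences split into exactly 3,481 per region. It does not verify the actual per-region counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# ASSUMPTION (hypothetical layout): one "token<whitespace>tag" pair per line,
# BIO tags such as B-PER / I-PER / O, and a blank line between sentences.
# The real ANCHOLIK-NER files may use a different format.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group token/tag lines into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        # The tag is the last whitespace-separated field; the rest is the token.
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def entity_type_counts(sentences: List[Sentence]) -> Counter:
    """Count entity mentions per type, treating each B- tag as one mention."""
    counts: Counter = Counter()
    for sent in sentences:
        for _, tag in sent:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts


# Arithmetic check: 17,405 sentences over five regions.
per_region, remainder = divmod(17_405, 5)
```

For real files, pass an open file handle to `read_conll`. The number of keys in `entity_type_counts(...)` can then be compared against the 10 entity types stated above.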
Institutions
- Bangladesh University of Engineering and Technology, Dhaka District, Dhaka
- Ahsanullah University of Science and Technology, Dhaka District, Dhaka
- Southeast University, Dhaka Division, Dhaka
Categories
Natural Language Processing, Dialect, Bangladesh