ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition

Published: 13 March 2025 | Version 2 | DOI: 10.17632/gbkszkt8z3.2

Description

We developed ANCHOLIK-NER, a Bangla regional Named Entity Recognition dataset covering the Sylhet, Chittagong, Barishal, Mymensingh, and Noakhali dialects. It comprises 17,405 sentences, evenly distributed across the five regions, with entities categorized into 10 types. The raw sentences were collected from two publicly available datasets and by web scraping online newspapers and articles.

Institutions

Bangladesh University of Engineering and Technology, Ahsanullah University of Science and Technology, Southeast University

Categories

Natural Language Processing, Dialect, Bangladesh

Licence