ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition
Published: 11 February 2025 | Version 1 | DOI: 10.17632/gbkszkt8z3.1
Description
We developed ANCHOLIK-NER, a Bangla regional Named Entity Recognition (NER) dataset covering the Sylhet, Chittagong, and Barishal dialects. It comprises 10,443 sentences, evenly distributed across the three regions, with entities categorized into 10 types. The sentences are drawn from both formal (57.54%) and informal (42.46%) texts, so that NER models trained on the dataset can capture regional linguistic nuances.
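As a rough illustration of what these figures imply, the following Python sketch derives approximate sentence counts from the numbers stated above. It assumes the formal/informal percentages apply to sentences (the description does not say so explicitly), and the function names are hypothetical, not part of the dataset's tooling.

```python
# Figures taken from the dataset description.
TOTAL_SENTENCES = 10_443
REGIONS = ("Sylhet", "Chittagong", "Barishal")
FORMAL_SHARE = 0.5754  # assumption: share of sentences, not of source documents


def per_region_count(total, regions):
    """Split `total` as evenly as possible across `regions`."""
    base, remainder = divmod(total, len(regions))
    return {
        region: base + (1 if i < remainder else 0)
        for i, region in enumerate(regions)
    }


def approximate_source_counts(total, formal_share):
    """Estimate formal/informal sentence counts from a rounded percentage."""
    formal = round(total * formal_share)
    return {"formal": formal, "informal": total - formal}


region_counts = per_region_count(TOTAL_SENTENCES, REGIONS)
source_counts = approximate_source_counts(TOTAL_SENTENCES, FORMAL_SHARE)

print(region_counts)
print(source_counts)
```

Under these assumptions, each region contributes 3,481 sentences, and the formal/informal split works out to roughly 6,009 and 4,434 sentences. The latter two values are estimates recovered from rounded percentages, not counts reported by the authors.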
Institutions
Bangladesh University of Engineering and Technology, Ahsanullah University of Science and Technology, Southeast University
Categories
Natural Language Processing, Dialect, Bangladesh