ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition

Published: 11 February 2025 | Version 1 | DOI: 10.17632/gbkszkt8z3.1
Contributors:

Description

We developed ANCHOLIK-NER, a Bangla regional named entity recognition (NER) dataset covering the Sylhet, Chittagong, and Barishal dialects. It comprises 10,443 sentences, distributed evenly across the three regions, with entities annotated in 10 types. The sentences are drawn from both formal (57.54%) and informal (42.46%) texts, and the dataset is intended to help NER models capture regional linguistic variation.
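As a quick sanity check on the figures above, the following Python sketch derives per-region and per-source sentence counts from the stated totals alone. The helper names are hypothetical and not part of the dataset release. The formal/informal counts are only approximate, because the published percentages are rounded to two decimal places.

```python
# Back-of-the-envelope breakdown of the ANCHOLIK-NER corpus size,
# using only the figures stated in the dataset description.
TOTAL_SENTENCES = 10_443
REGIONS = ("Sylhet", "Chittagong", "Barishal")
SOURCE_SHARES = {"formal": 0.5754, "informal": 0.4246}  # stated (rounded) percentages


def per_region_count(total: int, regions: tuple[str, ...]) -> dict[str, int]:
    """Split the total evenly across regions; fail if it does not divide exactly."""
    share, remainder = divmod(total, len(regions))
    if remainder:
        raise ValueError("total does not divide evenly across regions")
    return {region: share for region in regions}


def approx_source_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Approximate sentence counts per source type from rounded percentages."""
    return {name: round(total * fraction) for name, fraction in shares.items()}


if __name__ == "__main__":
    print(per_region_count(TOTAL_SENTENCES, REGIONS))
    # → {'Sylhet': 3481, 'Chittagong': 3481, 'Barishal': 3481}
    print(approx_source_counts(TOTAL_SENTENCES, SOURCE_SHARES))
    # → {'formal': 6009, 'informal': 4434}
```

Since 10,443 is exactly 3 × 3,481, "evenly distributed" corresponds to 3,481 sentences per dialect, and the two approximate source counts sum back to the full 10,443.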

Institutions

Bangladesh University of Engineering and Technology, Ahsanullah University of Science and Technology, Southeast University

Categories

Natural Language Processing, Dialect, Bangladesh

Licence