BanglaDial: A Merged and Imbalanced Text Dataset for Bengali Regional Dialect Analysis

Published: 24 February 2025 | Version 1 | DOI: 10.17632/sx6ybcps2n.1

Description

This dataset was gathered from online repositories and academic papers. It contains sentences in 12 distinct regional dialects of Bangladesh. The dataset is imbalanced, reflecting real-world dialect distributions, which depend on the specific speaker groups and populations. It supports research in dialect classification, machine translation, and regional language analysis.

Dialect-wise sentence distribution:

Chittagong: 8,819
Kishoreganj: 8,751
Narail: 7,829
Tangail: 6,793
Rangpur: 5,909
Narsingdi: 5,862
Standard Bangla: 4,545
Barisal: 4,270
Sylhet: 3,922
Mymensingh: 3,212
Noakhali: 2,500
Rajshahi: 891

Total: 63,303
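Because the dataset is imbalanced, it can help to quantify the skew before training a dialect classifier. The following is a minimal sketch, not part of the dataset itself. It uses only the sentence counts listed above and computes each dialect's share, the ratio between the largest and smallest class, and inverse-frequency class weights. The weighting formula (total / (number of classes × class count)) is one common heuristic chosen here for illustration, and the names `counts` and `class_weights` are hypothetical.

```python
# Sentence counts per dialect, copied from the distribution in the description.
counts = {
    "Chittagong": 8819,
    "Kishoreganj": 8751,
    "Narail": 7829,
    "Tangail": 6793,
    "Rangpur": 5909,
    "Narsingdi": 5862,
    "Standard Bangla": 4545,
    "Barisal": 4270,
    "Sylhet": 3922,
    "Mymensingh": 3212,
    "Noakhali": 2500,
    "Rajshahi": 891,
}


def class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count).

    This is an assumed heuristic for illustration, not a method
    prescribed by the dataset authors.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


total = sum(counts.values())
weights = class_weights(counts)

# Ratio between the most and least represented dialects.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Print dialects from largest to smallest with share and weight.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {n:6d}  {n / total:6.1%}  weight={weights[name]:.2f}")

print(f"Total sentences: {total}")
print(f"Imbalance ratio (largest / smallest): {imbalance_ratio:.1f}")
```

Under this heuristic, the smallest class (Rajshahi) receives the largest weight and the largest class (Chittagong) the smallest. The resulting dictionary could be passed to a loss function or sampler that accepts per-class weights. Whether reweighting, resampling, or leaving the distribution untouched is appropriate depends on the task.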


Institutions

Daffodil International University

Categories

Natural Language Processing, Machine Translation, Language Identification, Bengali Language
