Bangla Regional Dialects Speech Dataset

Published: 26 May 2026 | Version 1 | DOI: 10.17632/777wsgjgtm.1
Contributors:
M Sakib Rahman, Dr. Nasima Begum

Description

The Bangla Regional Dialects Speech Dataset is a curated speech corpus of regional Bangla dialect audio recordings collected from multiple areas of Bangladesh. It is designed to support research and development in speech recognition, natural language processing (NLP), dialect classification, speaker analysis, and other AI-based language technologies.

The dataset contains speech recordings from 5 regions of Bangladesh, representing diverse regional accents and pronunciations of the Bangla language. A total of 19 speech categories are included. Each region contains 1330 unique Bangla sentences, resulting in a total of 6650 recorded speech samples across the complete dataset.

All audio files are provided in:
- .wav audio format
- Mono channel
- 16 kHz sampling rate

The dataset also includes transcription metadata files for efficient training and evaluation of machine learning and deep learning models.

This dataset can be used for:
- Automatic Speech Recognition (ASR)
- Bangla dialect identification
- Speech-based AI systems
- NLP research
- Linguistic analysis
- Audio classification
- Deep learning research

The dataset aims to contribute to the advancement of Bangla language technology and regional dialect research by providing a structured, high-quality regional speech resource for researchers and developers.

Keywords: Bangla speech, Bengali speech recognition, ASR, speech dataset, dialect recognition, Bangla NLP, audio dataset, 16kHz speech, mono audio, regional dialects, machine learning dataset
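
The stated audio format (mono, 16 kHz, .wav) can be checked per file before training. The following is a minimal sketch using only Python's standard-library wave module; it assumes the files are PCM-encoded WAV, which the description does not state explicitly. The function names describe_wav and matches_stated_format are illustrative and are not part of the dataset.

```python
import wave

EXPECTED_CHANNELS = 1      # mono, as stated in the description
EXPECTED_RATE_HZ = 16000   # 16 kHz, as stated in the description


def describe_wav(path):
    """Return channel count, sampling rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {"channels": channels, "rate_hz": rate, "duration_s": frames / rate}


def matches_stated_format(path):
    """True if the file is mono and sampled at 16 kHz."""
    info = describe_wav(path)
    return (info["channels"] == EXPECTED_CHANNELS
            and info["rate_hz"] == EXPECTED_RATE_HZ)
```

Running matches_stated_format over every extracted file is a cheap way to catch files that would otherwise need resampling or downmixing.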

Steps to reproduce

1. Download the dataset from Mendeley Data.
2. Extract the dataset zip file.
3. Use the "metadata.csv" file to map audio files to their transcriptions, region labels, and categories.
4. Select the required dataset version (Fully processed, Processed, or Original Raw Dataset) based on the experiment.
5. Load the .wav audio files (Mono, 16 kHz) using any audio processing library (e.g., Librosa or Torchaudio).
6. Preprocess the audio if required (normalization, padding, feature extraction such as MFCC or Mel spectrogram).
7. Split the dataset into training, validation, and test sets.
8. Train a machine learning or deep learning model for ASR or dialect classification tasks.
9. Evaluate model performance using standard metrics such as accuracy, WER (Word Error Rate), or F1-score.
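
Steps 3, 7, and 9 can be sketched with the Python standard library alone. This is an illustration under assumptions, not the authors' pipeline: the column name "region" is hypothetical (the actual header of "metadata.csv" is not given here), and the 80/10/10 split ratio and the fixed seed are arbitrary choices, since the steps do not prescribe them.

```python
import csv
import random
from collections import defaultdict


def load_metadata(csv_path):
    """Read metadata rows as dictionaries keyed by the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def stratified_split(rows, label_key="region", ratios=(0.8, 0.1, 0.1), seed=0):
    """Split rows into train/validation/test, keeping each label's share.

    label_key is a hypothetical column name; adjust it to the real header.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to test
    return train, val, test


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Splitting per region keeps all five dialects represented in every subset, which matters for dialect classification. Whitespace tokenization in word_error_rate is a simplification; Bangla text may need normalization before scoring.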

Categories

Linguistics, Computer Science, Artificial Intelligence, Natural Language Processing, Speech Recognition, Machine Learning