Bangla Voice Dataset: Simple, Complex, and Compound Structures

Name: Bangla Voice Dataset: Simple, Complex, and Compound Structures
Creator: Md Abdullah-Al-Kafi Kafi
Published: 2025-09-02T19:30:39.086Z
Keywords: Linguistics, Computer Science, Natural Language Processing, Audio Analysis

Kafi, Md Abdullah-Al-Kafi; Moni, Raka; Raza, Dewan Mamun

doi:10.17632/2wn7c48dtp.3

Published: 2 September 2025 | Version 3 | DOI: 10.17632/2wn7c48dtp.3

Contributors:

Md Abdullah-Al-Kafi Kafi, Raka Moni, Dewan Mamun Raza

Description

The dataset is a comprehensive resource designed for linguistic analysis, natural language processing (NLP), and speech recognition tasks specifically tailored for the Bangla language. It comprises the following key features:

Textual Data:
  Sentence Types: The corpus includes a balanced collection of simple, complex, and compound sentences, carefully curated to represent diverse syntactic structures and real-world language usage in Bangla.
  Diversity: Sentences cover a wide range of topics and contexts, ensuring linguistic richness and variety.

Voice Data:
  Audio Recordings: Each sentence is paired with high-quality voice recordings by native Bangla speakers, ensuring accurate pronunciation, intonation, and regional linguistic nuances.

Annotation:
  Sentence Labeling: Each sentence is tagged as simple, complex, or compound, aiding in syntactic analysis and supervised learning applications.

Applications:
  Speech Recognition and Synthesis: Ideal for training and evaluating speech-to-text and text-to-speech systems for Bangla.
  Language Modeling: Supports NLP tasks such as machine translation, sentiment analysis, and syntactic parsing.
  Educational Use: Useful for linguistic research, Bangla grammar teaching, and phonetic studies.

Compliance: The dataset adheres to ethical guidelines, ensuring informed consent from all contributors.

This dataset serves as a valuable asset for researchers, developers, and educators seeking to advance technologies and studies involving the Bangla language.
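The description says each sentence carries one of three labels and is paired with a recording, but this page does not state the file layout. As a minimal sketch only, the following assumes a hypothetical CSV manifest with columns named sentence, label, and audio_path (these names and the sample rows are illustrative placeholders, not taken from the dataset) and shows how the label balance could be checked before training.

```python
import csv
import io
from collections import Counter

# The three sentence-structure labels named in the dataset description.
LABELS = ("simple", "complex", "compound")


def read_manifest(handle):
    """Parse a HYPOTHETICAL CSV manifest: sentence, label, audio_path.

    The column names are assumptions; adjust them to the real files.
    """
    rows = []
    for row in csv.DictReader(handle):
        label = row["label"].strip().lower()
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        rows.append(
            {
                "sentence": row["sentence"],
                "label": label,
                "audio_path": row["audio_path"],
            }
        )
    return rows


def label_counts(rows):
    """Count rows per label, reporting zero for labels that never occur."""
    counts = Counter(r["label"] for r in rows)
    return {label: counts.get(label, 0) for label in LABELS}


# Tiny in-memory stand-in for a manifest file (placeholder data only).
SAMPLE = """sentence,label,audio_path
placeholder sentence 1,simple,audio/0001.wav
placeholder sentence 2,complex,audio/0002.wav
placeholder sentence 3,compound,audio/0003.wav
placeholder sentence 4,Simple,audio/0004.wav
"""

rows = read_manifest(io.StringIO(SAMPLE))
counts = label_counts(rows)
print(counts)
```

With a real manifest, `io.StringIO(SAMPLE)` would be replaced by a file opened with `encoding="utf-8"` and `newline=""` so that Bangla text and quoted fields are read correctly.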

Institutions

Daffodil International University
