Bangla Voice Dataset: Simple, Complex, and Compound Structures

Published: 9 December 2024 | Version 1 | DOI: 10.17632/2wn7c48dtp.1
Contributors:
Md Abdullah-Al-Kafi Kafi

Description

The dataset is a comprehensive resource designed for linguistic analysis, natural language processing (NLP), and speech recognition tasks specifically tailored for the Bangla language. It comprises the following key features:

Textual Data:
  Sentence Types: The corpus includes a balanced collection of simple, complex, and compound sentences, carefully curated to represent diverse syntactic structures and real-world language usage in Bangla.
  Diversity: Sentences cover a wide range of topics and contexts, ensuring linguistic richness and variety.

Voice Data:
  Audio Recordings: Each sentence is paired with high-quality voice recordings by native Bangla speakers, ensuring accurate pronunciation, intonation, and regional linguistic nuances.

Annotation:
  Sentence Labeling: Each sentence is tagged as simple, complex, or compound, aiding in syntactic analysis and supervised learning applications.

Applications:
  Speech Recognition and Synthesis: Ideal for training and evaluating speech-to-text and text-to-speech systems for Bangla.
  Language Modeling: Supports NLP tasks such as machine translation, sentiment analysis, and syntactic parsing.
  Educational Use: Useful for linguistic research, Bangla grammar teaching, and phonetic studies.

Compliance: The dataset adheres to ethical guidelines, ensuring informed consent from all contributors.

This dataset serves as a valuable asset for researchers, developers, and educators seeking to advance technologies and studies involving the Bangla language.
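
The pairing of sentence text, sentence-type label, and voice recording described above could be represented roughly as follows. This is a minimal sketch under stated assumptions: the `Utterance` record, its field names, the `audio/...` paths, and the placeholder sentences are all hypothetical, since the actual file layout is not specified here. Only the three label names come from the description.

```python
from collections import Counter
from dataclasses import dataclass

# The three sentence-type labels named in the dataset description.
LABELS = ("simple", "complex", "compound")


@dataclass(frozen=True)
class Utterance:
    """One sentence with its label and paired recording (hypothetical schema)."""
    text: str        # Bangla sentence; placeholders are used in this sketch
    label: str       # must be one of LABELS
    audio_path: str  # hypothetical path to the paired voice recording

    def __post_init__(self):
        # Reject labels outside the three documented sentence types.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def label_counts(items):
    """Count utterances per label, reporting zero for absent labels."""
    counts = Counter(u.label for u in items)
    return {label: counts.get(label, 0) for label in LABELS}


# Placeholder records, not real dataset entries.
samples = [
    Utterance("<sentence 1>", "simple", "audio/0001.wav"),
    Utterance("<sentence 2>", "complex", "audio/0002.wav"),
    Utterance("<sentence 3>", "compound", "audio/0003.wav"),
    Utterance("<sentence 4>", "simple", "audio/0004.wav"),
]

print(label_counts(samples))
```

A per-label count like this is one simple way to check how balanced the three sentence types are before using the data for supervised learning.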

Files

Categories

Linguistics, Computer Science, Natural Language Processing, Audio Analysis

Licence