BEASC: Bangla emotional audio-speech corpus - A speech emotion recognition corpus for the low-resource Bangla language

Name: BEASC: Bangla emotional audio-speech corpus - A speech emotion recognition corpus for the low-resource Bangla language
Creator: Rakesh Kumar Das
Published: 2022-02-09T15:14:14.150Z
Keywords: Machine Learning, Hidden Markov Models, Audio Signal Processing, Human-Computer Interaction, Bangladesh, Convolutional Neural Network, Long Short-Term Memory Network

Das, Rakesh Kumar; Islam, Nahidul; Ahmed, Md. Rayhan

doi:10.17632/t9h6p943xy.2


Published: 9 February 2022 | Version 2 | DOI: 10.17632/t9h6p943xy.2

Description

BEASC is an audio-speech emotion recognition corpus for the Bangla language. It consists of voice recordings from 34 speakers aged 19 to 57 (mean = 28.75, standard deviation = 9.346), balanced by gender with 17 males and 17 females. The corpus contains 1224 speech recordings covering four basic human emotions: Angry, Happy, Sad, and Surprise. Each emotion was recorded for three sentences: i. ‘১২ টা বেজে গেছে’ (“It's already 12 o'clock”), ii. ‘আমি জানতাম এমন কিছু হবে’ (“I knew something like this would happen”), and iii. ‘এ কেমন উপহার’ (“What kind of gift is this?”). Three trials were kept for each emotional expression, so the total number of utterances is three sentences × three repetitions × four emotions × 34 speakers = 1224 recordings. Happy and sad speech is labeled as normal intensity, while angry and surprise speech is labeled as strong intensity.

The audio files are in WAV format, and the full dataset is 619 MB. The files are divided into 34 folders, one per speaker, and each folder contains that speaker's 36 recordings (3 sentences × 3 repetitions × 4 emotions). BEASC is balanced, with 306 recordings per emotion. Whereas most existing datasets in other languages are recorded in a closed studio or cover a single sentence, BEASC was recorded on smartphones and therefore preserves a slightly noisy, real-life acoustic environment. The corpus can be used with both shallow machine learning and deep learning models, such as HMM, CNN, LSTM, and Transformer architectures.

Each data file has a unique filename that follows the naming procedure of the well-known RAVDESS dataset. A filename consists of seven two-digit numerical identifiers separated by hyphens (e.g., 03-01-01-01-02-02-02.wav), and each identifier encodes the level of a different experimental factor. The identifiers are ordered: Modality - Statement type - Emotion - Emotion Intensity - Statement - Repetition - Actor.wav. For example, the filename “03-01-01-01-02-02-02.wav” refers to: Audio only (03) - Scripted (01) - Happy (01) - Normal intensity (01) - 2nd Statement (02) - 2nd Repetition (02) - 2nd Actor, Female (02).
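The design arithmetic and the naming scheme can be sanity-checked with a short Python sketch. It enumerates the 3 × 3 × 4 × 34 grid described above and splits a filename into its seven two-digit codes. The helper names (`parse_filename`, `describe`, `design_counts`, `BeascName`, `DOCUMENTED_CODES`) are hypothetical and are not part of the dataset. The description only documents the codes used in its example (Audio only = 03, Scripted = 01, Happy = 01, Normal intensity = 01), so the sketch leaves every other code numeric rather than guessing. It also assumes that sentences, repetitions, and actors are numbered from 01 upward.

```python
"""Sketch: check the BEASC design counts and parse RAVDESS-style filenames.

Assumptions (not stated in the dataset description): sentences and
repetitions are numbered 1-3 and actors 1-34. Only the codes used in the
documented example filename are labeled; all other codes stay numeric.
"""
import re
from collections import Counter
from itertools import product
from pathlib import PurePath
from typing import NamedTuple

EMOTIONS = ("Angry", "Happy", "Sad", "Surprise")
# Intensity per emotion, as stated in the description.
INTENSITY = {"Happy": "Normal", "Sad": "Normal",
             "Angry": "Strong", "Surprise": "Strong"}

# Only the codes spelled out in the worked example "03-01-01-01-02-02-02.wav".
DOCUMENTED_CODES = {
    "modality": {3: "Audio only"},
    "statement_type": {1: "Scripted"},
    "emotion": {1: "Happy"},
    "intensity": {1: "Normal"},
}

# Seven two-digit ASCII numbers separated by hyphens.
_NAME_RE = re.compile(r"[0-9]{2}(?:-[0-9]{2}){6}")


class BeascName(NamedTuple):
    """Hypothetical container for the seven identifiers, in filename order."""
    modality: int
    statement_type: int
    emotion: int
    intensity: int
    statement: int
    repetition: int
    actor: int


def parse_filename(name: str) -> BeascName:
    """Split e.g. '03-01-01-01-02-02-02.wav' (or a path to it) into codes."""
    stem = PurePath(name).stem  # drops any folder prefix and the .wav suffix
    if not _NAME_RE.fullmatch(stem):
        raise ValueError(f"not a seven-field BEASC filename: {name!r}")
    return BeascName(*(int(part) for part in stem.split("-")))


def describe(parsed: BeascName) -> dict:
    """Label documented codes; leave undocumented ones as two-digit strings."""
    return {field: DOCUMENTED_CODES.get(field, {}).get(code, f"{code:02d}")
            for field, code in parsed._asdict().items()}


def design_counts():
    """Enumerate 3 sentences x 3 repetitions x 4 emotions x 34 speakers."""
    cells = list(product(range(1, 4), range(1, 4), EMOTIONS, range(1, 35)))
    per_actor = Counter(actor for _, _, _, actor in cells)
    per_emotion = Counter(emotion for _, _, emotion, _ in cells)
    per_intensity = Counter(INTENSITY[emotion] for _, _, emotion, _ in cells)
    return len(cells), per_actor, per_emotion, per_intensity


if __name__ == "__main__":
    total, per_actor, per_emotion, per_intensity = design_counts()
    print(total)                     # 1224
    print(set(per_actor.values()))   # {36}
    print(dict(per_emotion))         # 306 per emotion
    print(dict(per_intensity))       # 612 Normal, 612 Strong
    print(describe(parse_filename("03-01-01-01-02-02-02.wav")))
```

Keeping undocumented codes numeric is deliberate: the codes for Angry, Sad, Surprise, and strong intensity are not given in the description and should be checked against the released files before mapping them to labels.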

Institutions

Stamford University Bangladesh, United International University