Mexican Emotional Speech Database (MESD)

Name: Mexican Emotional Speech Database (MESD)
Creator: Mathilde Marie Duville
Published: 2022-02-17T13:45:31.797Z
Keywords: Speech Adaptation, Speech Analysis, Voice Output, Prosody, Culture, Speech Perception, Emotion, Male, Adult, Female, Mexico, Child, Speech Generation, Human Voice

Duville, Mathilde Marie; Alonso-Valerdi, Luz María; Ibarra-Zarate, David I.

doi:10.17632/cy34mh68j9.5

Published: 17 February 2022 | Version 5 | DOI: 10.17632/cy34mh68j9.5

Contributors:

Mathilde Marie Duville, Luz María Alonso-Valerdi, David I. Ibarra-Zarate

Description

The Mexican Emotional Speech Database (MESD) provides single-word utterances for anger, disgust, fear, happiness, neutral, and sadness affective prosodies with Mexican cultural shaping. The utterances were produced by adult and child non-professional actors: 3 female, 2 male, and 6 child voices are available (female mean age ± SD = 23.33 ± 1.53, male mean age ± SD = 24 ± 1.41, and child mean age ± SD = 9.83 ± 1.17).

Words for emotional and neutral utterances come from two corpora: corpus A, composed of nouns and adjectives that are repeated across emotional prosodies and types of voice (female, male, child), and corpus B, which consists of words controlled for age of acquisition, frequency of use, familiarity, concreteness, valence, arousal, and discrete emotion dimensionality ratings.

The audio recordings took place in a professional studio with the following materials: (1) a Sennheiser e835 microphone with a flat frequency response (100 Hz to 10 kHz), (2) a Focusrite Scarlett 2i4 audio interface connected to the microphone with an XLR cable and to the computer, and (3) the digital audio workstation REAPER (Rapid Environment for Audio Production, Engineering, and Recording). Audio files were stored as 24-bit samples at a sample rate of 48000 Hz. The amplitude of the acoustic waveforms was rescaled to the range -1 to 1.

Two speaker-embedded naturalness-reduced versions were created from the human emotional utterances of female voices from corpus B. Naturalness was progressively reduced from the human voices to level 1 and then to level 2. On stressed syllables, duration and median pitch were edited to reduce the difference between stressed and unstressed syllables. On whole utterances, the F2/F1 and F3/F1 ratios were lowered by editing the F2 and F3 frequencies. The intensities of harmonics 1 and 4 were also reduced.

24 utterances per emotion are available for each type of voice, corpus, and level of naturalness. They are shared as audio files in WAV format.
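The counts stated above imply a fixed number of files, which can serve as a quick sanity check after downloading. The sketch below only multiplies out the figures given in the description. It assumes that every combination of voice type, corpus, and emotion is complete, that "neutral" counts as one of the six categories, and that both naturalness-reduced levels cover all six categories. The last assumption in particular is not confirmed by the description. The names `EMOTIONS`, `VOICES`, `CORPORA`, and `expected_counts` are hypothetical and are not part of the dataset or its README.

```python
from itertools import product

# Categories taken from the description; treating "neutral" as a sixth
# category alongside the five emotions is an assumption.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness"]
VOICES = ["female", "male", "child"]
CORPORA = ["A", "B"]
UTTERANCES_PER_CELL = 24  # "24 utterances per emotion" for each combination


def expected_counts():
    """Return (human, naturalness_reduced) counts implied by the description."""
    # Human recordings: one cell of 24 utterances per voice type, corpus, emotion.
    human = sum(UTTERANCES_PER_CELL for _ in product(VOICES, CORPORA, EMOTIONS))
    # Naturalness-reduced versions: female voices, corpus B only, levels 1 and 2.
    # Assumption: each level covers all six categories, including neutral.
    reduced_levels = [1, 2]
    reduced = sum(UTTERANCES_PER_CELL for _ in product(reduced_levels, EMOTIONS))
    return human, reduced


human, reduced = expected_counts()
print(f"human utterances: {human}")                  # 3 * 2 * 6 * 24
print(f"naturalness-reduced utterances: {reduced}")  # 2 * 6 * 24
print(f"total: {human + reduced}")
```

If the number of WAV files in a download differs from these figures, the README is the authoritative source for which combinations are actually provided.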
Please see the README for an explanation of the audio file nomenclature. The MESD seems to be the first set of single-word emotional utterances that includes both adult and child voices for the Mexican population. Additionally, the MESD provides naturalness-reduced versions of emotional utterances.

Citation

M. M. Duville, L. M. Alonso-Valerdi, and D. Ibarra-Zarate, “The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning,” 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, p. 4, 2021.

Duville, M.M.; Alonso-Valerdi, L.M.; Ibarra-Zarate, D.I. Mexican Emotional Speech Database Based on Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of Affective Prosody. Data 2021, 6, 130. https://doi.org/10.3390/data6120130

Institutions

Instituto Tecnologico y de Estudios Superiores de Monterrey
