Mexican Emotional Speech Database (MESD)

Published: 10 October 2021 | Version 2 | DOI: 10.17632/cy34mh68j9.2


The Mexican Emotional Speech Database (MESD) provides single-word utterances spoken with anger, disgust, fear, happiness, neutral, and sadness affective prosodies shaped by Mexican culture. The utterances were recorded by non-professional adult (female and male) and child actors: 3 female, 2 male, and 6 child voices are available (mean age ± SD: female 23.33 ± 1.53 years, male 24 ± 1.41 years, children 9.83 ± 1.17 years).

Words for emotional and neutral utterances come from two corpora. Corpus A is composed of nouns and adjectives that are repeated across emotional prosodies and voice types (female, male, child). Corpus B consists of words controlled for age of acquisition, frequency of use, familiarity, concreteness, valence, arousal, and ratings on discrete emotion dimensions. In particular, words from corpus B are nouns and adjectives whose subjective age of acquisition is under 9 years. Neutral-uttered words have valence and arousal ratings strictly greater than 4 but lower than 6 (on a 9-point scale), whereas emotional-uttered words have valence and arousal ratings ranging from 1 to 4 or from 6 to 9. Furthermore, a rating greater than 2.5 (on a 5-point scale) on a discrete emotion dimension allowed the word to be uttered with the corresponding anger, disgust, fear, happiness, or sadness prosody. Finally, words from corpus B were selected so that emotional prosodies do not differ in frequency of use, familiarity, or concreteness.

The audio recordings took place in a professional studio with the following equipment: (1) a Sennheiser e835 microphone with a flat frequency response (100 Hz to 10 kHz), (2) a Focusrite Scarlett 2i4 audio interface connected to the microphone with an XLR cable and to the computer, and (3) the digital audio workstation REAPER (Rapid Environment for Audio Production, Engineering, and Recording). Audio files were stored with 24-bit samples at a sample rate of 48 kHz.
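As a rough illustration, the corpus B rating thresholds can be written as a small Python filter. This is a sketch under stated assumptions, not the authors' selection procedure: `WordRatings`, `in_emotional_range`, and `eligible_prosodies` are hypothetical names; the ranges "1 to 4" and "6 to 9" are read as closed intervals; an emotional word is assumed to need both valence and arousal in those ranges; and the matching of frequency of use, familiarity, and concreteness across prosodies is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness")


@dataclass
class WordRatings:
    """Hypothetical container for one word's normative ratings."""
    valence: float  # 9-point scale
    arousal: float  # 9-point scale
    discrete: dict[str, float] = field(default_factory=dict)  # emotion -> 5-point rating


def in_emotional_range(rating: float) -> bool:
    # Assumption: "from 1 to 4, or from 6 to 9" is read as two closed intervals.
    return 1 <= rating <= 4 or 6 <= rating <= 9


def eligible_prosodies(r: WordRatings) -> set[str]:
    """Return the prosodies a corpus B word could be uttered with under the stated thresholds."""
    # Neutral: valence and arousal both strictly between 4 and 6.
    if 4 < r.valence < 6 and 4 < r.arousal < 6:
        return {"neutral"}
    # Emotional: valence and arousal both in an extreme range (assumption: both required).
    if in_emotional_range(r.valence) and in_emotional_range(r.arousal):
        # A discrete emotion rating strictly above 2.5 allows the matching prosody.
        return {e for e in EMOTIONS if r.discrete.get(e, 0.0) > 2.5}
    # Ratings that fit neither rule: not eligible under these criteria.
    return set()
```

For example, a word rated 2.0 for valence, 7.0 for arousal, 3.1 for fear, and 2.5 for sadness would be eligible only for the fear prosody, since the sadness rating is not strictly greater than 2.5.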
Utterances are shared as 864 audio files in WAV format, named according to the following pattern: <emotion>_<type of voice>_<word corpus>_<word>, where:

- <emotion>: Anger, Disgust, Fear, Happiness, Neutral, or Sadness
- <type of voice>: F (female), M (male), or C (child)
- <word corpus>: A (corpus A) or B (corpus B)
- <word>: the entire word in lowercase letters

The MESD appears to be the first set of single-word emotional utterances that includes both adult and child voices for the Mexican population.

If you use this dataset in your work, please cite the related paper:
M. M. Duville, L. M. Alonso-Valerdi, and D. Ibarra-Zarate, “The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning,” 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, p. 4, 2021.
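The file naming pattern can be parsed mechanically, for instance to build a label table for training a classifier. The following Python sketch assumes that file names follow the pattern exactly, with a `.wav` extension (matched case-insensitively), capitalized emotion labels as listed, and no underscore inside the word itself; `MesdFile` and `parse_mesd_filename` are hypothetical names, not something shipped with the dataset.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Hypothetical parser for the pattern <emotion>_<type of voice>_<word corpus>_<word>.wav
STEM_PATTERN = re.compile(
    r"(?P<emotion>Anger|Disgust|Fear|Happiness|Neutral|Sadness)"
    r"_(?P<voice>[FMC])"
    r"_(?P<corpus>[AB])"
    r"_(?P<word>[^_]+)"  # assumption: the word itself contains no underscore
)

VOICE_TYPES = {"F": "female", "M": "male", "C": "child"}


class MesdFile(NamedTuple):
    emotion: str  # e.g. "Anger"
    voice: str    # "female", "male" or "child"
    corpus: str   # "A" or "B"
    word: str     # the word as written in the file name


def parse_mesd_filename(path: str) -> MesdFile:
    """Split a file name into its labels; raise ValueError if it does not match the pattern."""
    p = Path(path)
    if p.suffix.lower() != ".wav":
        raise ValueError(f"not a WAV file name: {p.name!r}")
    match = STEM_PATTERN.fullmatch(p.stem)
    if match is None:
        raise ValueError(f"name does not follow the MESD pattern: {p.name!r}")
    return MesdFile(
        emotion=match["emotion"],
        voice=VOICE_TYPES[match["voice"]],
        corpus=match["corpus"],
        word=match["word"],
    )
```

With a hypothetical file name, `parse_mesd_filename("Sadness_C_B_ejemplo.wav")` returns `MesdFile(emotion='Sadness', voice='child', corpus='B', word='ejemplo')`. Matching on `Path.stem` keeps any directory prefix out of the labels, and using `[^_]+` for the word lets accented Spanish characters such as ñ or á through.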



Speech Analysis, Prosody, Culture, Emotion, Male, Adult, Female, Mexico, Child