Data for: Sound context modulates perceived vocal emotion

Published: 13 January 2020 | Version 1 | DOI: 10.17632/br978jz9b6.1
Marco Liuni, Gregory Bryant, Jean-Julien Aucouturier, Emmanuel Ponsot


-- Stimuli (mono sound files, 44.1 kHz/16-bit) --

- Two singers, one man and one woman, each recorded nine utterances of the pure vowel /a/ (duration = 1.5 s) at three levels of pitch (A3, C4 and D4 for the man, one octave higher for the woman), in vocalizations corresponding to three levels of portrayed anger / vocal arousal.
- A professional musician recorded nine guitar chords (duration = 3 s), all permutations of the same chord (tonic, perfect fourth, perfect fifth), harmonically consistent with the above vocal pitches.
- Guitar tracks were distorted using a commercial distortion plugin (Multi by Sonic Academy) with a unique custom preset (.fxb file provided), yielding nine distorted samples.
- Nine noise samples were generated by filtering white noise with the spectral envelope estimated frame-by-frame on the noise samples.

Both vocalizations and context stimuli were normalized in loudness to comfortable and equal levels. The 18 vocalizations and 4 backgrounds (no context, noise, clean guitar, distorted guitar) were then superimposed to create 72 different stimuli (onset of the vocalization = 30 ms after the onset of the context).

-- Procedure --

On each trial, participants listened to one stimulus (vocalization + background) and evaluated its perceived emotional arousal and valence on two SAM scales (Bradley & Lang, 1994). The main experiment included 360 randomized trials divided into 6 blocks.

- First training block: 20 trials composed of vocalizations with no background (randomly selected from the subsequent set of stimuli). Listeners received a score out of 100 (actually a random number between 70 and 90) and were asked to maintain this performance in subsequent trials, despite the sounds being thereafter embedded in background noise.
- Subsequent blocks: 18 vocalizations presented five times in four different contexts, with a 1 s inter-trial interval.
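The stimulus and trial counts described above can be checked with a short sketch. It assumes the nine utterances per singer are the full crossing of three pitch levels with three levels of portrayed arousal, which the description implies but does not state outright. All variable names, labels and the seed are hypothetical and are not taken from the dataset files.

```python
from itertools import product
import random

# Hypothetical labels; the dataset's own file naming may differ.
SINGERS = ["man", "woman"]
PITCH_LEVELS = [1, 2, 3]      # A3, C4, D4 for the man; one octave higher for the woman
AROUSAL_LEVELS = [1, 2, 3]    # three levels of portrayed anger / vocal arousal
CONTEXTS = ["no_context", "noise", "clean_guitar", "distorted_guitar"]
REPETITIONS = 5               # each vocalization is presented five times per context

# Assumption: 9 utterances per singer = 3 pitch levels x 3 arousal levels.
vocalizations = list(product(SINGERS, PITCH_LEVELS, AROUSAL_LEVELS))

# Each vocalization is superimposed on each of the 4 backgrounds.
stimuli = list(product(vocalizations, CONTEXTS))


def build_trials(seed=0):
    """Return the main-experiment trial list in randomized order."""
    rng = random.Random(seed)
    trials = [stimulus for stimulus in stimuli for _ in range(REPETITIONS)]
    rng.shuffle(trials)
    return trials


trials = build_trials()
print(len(vocalizations), len(stimuli), len(trials))
```

The three counts correspond to the 18 vocalizations, 72 stimuli and 360 main-experiment trials reported in the description. The sketch does not model which of the nine chord permutations accompanies a given vocalization, since the description does not specify that pairing.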
To motivate continued effort, participants were asked to maximize accuracy during the practice phase, and at the end of each block they received fake feedback scores, on average slightly below that of the training phase (a random number between 60 and 80). Participants were informed that they could receive a financial bonus if their accuracy was above a certain threshold (all participants received the bonus regardless of their score).

-- Statistical analyses --

Valence and emotional arousal ratings given on the SAM scale were coded from 0 to 100. We analyzed the effect of context on these ratings with two repeated-measures ANOVAs, conducted separately on negativity (100 - valence) and emotional arousal, using level of portrayed vocal arousal in the voice and context as independent variables. Post-hoc tests were Tukey HSD. All statistical analyses were conducted using R (R Core Team, 2013). Huynh-Feldt corrections (ε̃) for degrees of freedom were used where appropriate. Effect sizes are reported using partial eta-squared (ηp²).
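As a rough illustration of two quantities named above, the following sketch computes negativity from a valence rating and partial eta-squared from ANOVA sums of squares, or equivalently from an F statistic and its degrees of freedom. The original analyses were run in R; this Python version is only an illustrative sketch, the function names are hypothetical, and the numbers in the usage lines are made up for the example rather than taken from the dataset.

```python
def negativity(valence):
    """Convert a valence rating coded on 0-100 into negativity (100 - valence)."""
    if not 0 <= valence <= 100:
        raise ValueError("valence must be coded between 0 and 100")
    return 100 - valence


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form using the F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Made-up example values, for illustration only.
print(negativity(30))
print(partial_eta_squared(20.0, 80.0))
print(partial_eta_squared_from_f(5.0, 2, 40))
```

The two effect-size functions agree whenever F = (SS_effect / df_effect) / (SS_error / df_error), which is why the last two example calls describe the same hypothetical effect.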



Cognitive Science, Audio Signal Processing, Acoustic Behavior Associated with Vocalization