Data for: Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers

Published: 12 May 2021 | Version 2 | DOI: 10.17632/w54rh87pjx.2
Contributors:
Michelle Cohn, Melina Sarian, Georgia Zellou, Kristin Predeck

Description

This study tests whether individuals vocally align toward emotionally expressive prosody produced by two types of interlocutors: a human and a voice-activated artificially intelligent (voice-AI) device. Participants (n=66) completed a word shadowing experiment with 18 interjections (e.g., “Awesome”) produced with emotionally neutral and emotionally expressive prosody by both a human talker and a voice-AI system (Amazon’s Alexa). Results show increases in participants’ word duration, mean f0, and f0 variation in response to emotional expressiveness, suggesting that participants align toward a general ‘positive-emotional’ speech style. Additionally, we observe two general differences in Alexa-shadowed speech (shorter duration, larger f0 variation), as well as differences by model talker in the adjustments for emotional expressiveness: larger increases in word duration for Alexa, but smaller increases in mean f0 and f0 variation. Post hoc analyses suggest that nearly all model talker differences can be attributed to acoustic differences in the model talkers’ productions. Furthermore, we observe an effect of participant gender: female participants show greater alignment toward emotionally expressive Alexa productions. Taken together, these findings are relevant for understanding the role of emotion in theories of speech communication, for models of human vocal alignment, and for technology personification frameworks.
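Note on the acoustic measures: the description reports per-word duration, mean f0, and f0 variation, but does not specify the measurement pipeline. As a rough illustration only, the sketch below shows one way such per-word measures might be extracted in Python with the praat-parselmouth library; the library choice, the file name, and the treatment of unvoiced frames are assumptions, not the authors’ procedure.

```python
import numpy as np
import parselmouth  # Praat bindings for Python (praat-parselmouth)

def word_acoustics(wav_path):
    """Return duration (s), mean f0 (Hz), and f0 standard deviation (Hz)
    for a single shadowed word recording (illustrative sketch)."""
    snd = parselmouth.Sound(wav_path)
    # Default Praat pitch settings; floor/ceiling would normally be tuned
    # to the talker's range.
    pitch = snd.to_pitch()
    f0 = pitch.selected_array["frequency"]
    f0 = f0[f0 > 0]  # drop unvoiced frames, which Praat reports as 0 Hz
    return {
        "duration_s": snd.duration,
        "mean_f0_hz": float(np.mean(f0)) if f0.size else float("nan"),
        "f0_sd_hz": float(np.std(f0)) if f0.size else float("nan"),
    }

# Hypothetical usage with a made-up file name:
# print(word_acoustics("participant01_awesome_expressive.wav"))
```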

Files

Institutions

University of California Davis

Categories

Emotion, Human-Computer Interaction, Laboratory Phonetics

Licence