EmoAds: A Comprehensive Audio Dataset for Emotional Context Analysis in Advertisements

Published: 28 October 2023 | Version 2 | DOI: 10.17632/svr3fzk2nt.2


The dataset is a valuable resource that combines textual and audio data to support emotion recognition and the evaluation of advertisement effectiveness. It can help address questions about how advertisements affect users' emotions and purchase intent. Data collection involved a survey gathering real-world user feedback on advertisements, which yielded a textual dataset of emotion-related statements. The dataset also contains 100 audio advertisements in WAV format, first segmented into 60-second segments; further division into 6-second snippets produced 1,000 audio snippets in total. The dataset was enriched with several audio feature extraction techniques: MFCC, Short Time Energy, Zero Crossing Rate, Power Spectral Density, and Spectrogram Analysis. The survey's dependent variables are Arousal (A), Valence (V), Dominance (D), Liking (L), and Purchase (P); the independent features are the coefficient values produced by the audio feature extraction techniques. To simplify the use of classification and prediction algorithms, the rating scale for the emotional statements was generalized from a 5-point (1-5) scale to a 3-point (1-3) scale. Potential applications include emotion recognition, advertisement evaluation, algorithm benchmarking, and advertising strategy optimization: companies, marketers, and advertisers can use the dataset to refine their advertising strategies and to better understand public perception and purchase rates, and it can serve as a standard benchmark for testing algorithms. Noteworthy correlations may exist between audio features and user-reported emotions or purchase intent, and such findings can inform the creation of more effective advertisements and content.
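The rating-scale generalization described above can be sketched as a simple mapping. The exact binning used by the dataset authors is not published, so the 1-2 → 1, 3 → 2, 4-5 → 3 scheme below is an illustrative assumption:

```python
def generalize_rating(rating: int) -> int:
    """Collapse a 1-5 survey rating to a 1-3 scale (assumed binning)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be in 1..5")
    if rating <= 2:
        return 1   # low
    if rating == 3:
        return 2   # neutral
    return 3       # high

# Example: generalize one (hypothetical) survey response for the five
# dependent variables Arousal, Valence, Dominance, Liking, Purchase.
response = {"Arousal": 4, "Valence": 2, "Dominance": 3, "Liking": 5, "Purchase": 1}
generalized = {k: generalize_rating(v) for k, v in response.items()}
```

Collapsing the scale this way reduces the number of target classes from five to three, which simplifies downstream classification and prediction.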


Steps to reproduce

The data collection procedure involved gathering both textual and audio data. For the textual data, a structured survey was conducted in which users provided feedback on various advertisements by rating their emotional responses on statements covering Arousal, Valence, Dominance, Liking, and Purchase. The rating scale was reduced from a 5-point (1-5) scale to a 3-point (1-3) scale. For the audio data, 100 distinct advertisements were collected in WAV format and segmented first into 60-second segments and then into 6-second snippets. MFCC, Short Time Energy, Zero Crossing Rate, Power Spectral Density, and Spectrogram Analysis were applied to extract audio features, yielding coefficient values that represent audio properties. The dataset combines these coefficients as independent features with user-reported emotions and purchase intent as dependent features. It serves as a resource for emotion recognition, advertisement evaluation, and algorithm benchmarking, and provides insight into the effect of advertisements on user emotions and purchase intent.
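The segmentation and feature-extraction steps above can be sketched with NumPy/SciPy on a synthetic signal. The dataset's exact parameters (sample rate, frame sizes, MFCC settings) are not published, so the values here are illustrative assumptions; MFCC extraction (commonly done with librosa.feature.mfcc) is omitted to keep the sketch dependency-light:

```python
import numpy as np
from scipy.signal import welch, spectrogram

SR = 16_000          # assumed sample rate in Hz (not stated in the description)
SNIPPET_SEC = 6      # each 60-second segment is split into 6-second snippets

def split_into_snippets(signal: np.ndarray, sr: int = SR,
                        snippet_sec: int = SNIPPET_SEC) -> list:
    """Split a 1-D audio signal into fixed-length snippets, dropping any tail."""
    n = sr * snippet_sec
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def extract_features(snippet: np.ndarray, sr: int = SR) -> dict:
    """Compute simple versions of the independent features named above."""
    ste = float(np.mean(snippet ** 2))                            # Short Time Energy
    zcr = float(np.mean(np.abs(np.diff(np.sign(snippet))) > 0))   # Zero Crossing Rate
    _, psd = welch(snippet, fs=sr)                                # Power Spectral Density
    _, _, sxx = spectrogram(snippet, fs=sr)                       # Spectrogram Analysis
    return {"ste": ste, "zcr": zcr, "psd_mean": float(psd.mean()),
            "spectrogram_shape": sxx.shape}

# Example: a synthetic 60-second "segment" (a 440 Hz tone) yields ten
# 6-second snippets, each reduced to a feature dictionary.
segment = np.sin(2 * np.pi * 440 * np.arange(60 * SR) / SR)
snippets = split_into_snippets(segment)
features = [extract_features(s) for s in snippets]
```

In the real pipeline each of the 100 WAV advertisements would be loaded from disk before segmentation, and the per-snippet feature vectors would be joined with the survey's dependent variables to form the final dataset.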


BRAC University


Machine Learning, Emotion, Audio Analysis, Deep Learning, Long Short-Term Memory Network