UrduSER: A Dataset for Urdu Speech Emotion Recognition

Published: 17 December 2024 | Version 3 | DOI: 10.17632/jcpfjnk5c2.3

Description

Speech Emotion Recognition (SER) is a rapidly evolving research field that aims to identify and categorize emotional states by analyzing speech signals. Because SER has significant socio-cultural and commercial importance, researchers increasingly apply machine learning and deep learning techniques to advance it. A high-quality dataset is an essential resource for SER studies in any language. Although Urdu is the 10th most spoken language globally, robust Urdu SER datasets are scarce, which leaves a research gap. Existing Urdu SER datasets are often limited by small size, a narrow emotional range, and repetitive content, which reduces their applicability in real-world scenarios.
To address this gap, the Urdu Speech Emotion Corpus (UrSEC) was developed. The dataset comprises 3500 Urdu speech signals from 10 professional actors, with equal numbers of male and female speakers drawn from diverse age groups. It covers seven emotional states: Angry, Fear, Boredom, Disgust, Happy, Neutral, and Sad. The speech samples were curated from a wide collection of Pakistani Urdu drama serials and telefilms available on YouTube, ensuring diversity and natural delivery. Unlike conventional datasets, which rely on predefined dialogs recorded in controlled environments, UrSEC features unique and contextually varied utterances, making it more realistic and better suited to practical use.
To ensure balance and consistency, the dataset contains 500 samples per emotional class, with each actor contributing 50 samples per emotion (10 actors × 7 emotions × 50 samples = 3500). An accompanying Excel file provides detailed metadata for each recording: file name, duration, format, sample rate, actor details, emotional state, and the corresponding Urdu dialog. This metadata lets researchers organize and filter the dataset for their specific needs.
The UrSEC dataset underwent rigorous validation, integrating expert evaluation and model-based validation to ensure its reliability, accuracy, and overall suitability for advancing research and development in Urdu Speech Emotion Recognition.
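The balanced design described above (10 actors, 7 emotions, 50 samples per actor per emotion) can be checked programmatically once the metadata is loaded. The following is a minimal sketch, not part of the dataset's own tooling. The field names `actor` and `emotion` and the helper `check_balance` are hypothetical, since the actual column headers of the Excel file are not stated here, and the rows are synthetic stand-ins rather than real data.

```python
from collections import Counter
from itertools import product

# Emotion labels and design constants as stated in the description.
EMOTIONS = ["Angry", "Fear", "Boredom", "Disgust", "Happy", "Neutral", "Sad"]
N_ACTORS = 10
SAMPLES_PER_ACTOR_EMOTION = 50


def check_balance(records, actor_key="actor", emotion_key="emotion"):
    """Count samples per emotion and per (actor, emotion) pair.

    `actor_key` and `emotion_key` are assumed field names; adjust them to
    match the real metadata columns. Returns the two counters and a flag
    that is True only when the design matches the stated balance.
    """
    per_emotion = Counter(r[emotion_key] for r in records)
    per_pair = Counter((r[actor_key], r[emotion_key]) for r in records)
    balanced = (
        set(per_emotion) == set(EMOTIONS)
        and len({actor for actor, _ in per_pair}) == N_ACTORS
        and len(per_pair) == N_ACTORS * len(EMOTIONS)
        and all(n == SAMPLES_PER_ACTOR_EMOTION for n in per_pair.values())
    )
    return per_emotion, per_pair, balanced


# Synthetic stand-in for the metadata rows (not real data from the dataset).
records = [
    {"actor": f"actor_{a:02d}", "emotion": e}
    for a, e in product(range(1, N_ACTORS + 1), EMOTIONS)
    for _ in range(SAMPLES_PER_ACTOR_EMOTION)
]

per_emotion, per_pair, balanced = check_balance(records)
print(len(records))          # total number of rows
print(per_emotion["Angry"])  # rows for one emotion class
print(balanced)              # whether every (actor, emotion) cell has 50 rows
```

With the real metadata, the rows could instead be obtained with something like `pandas.read_excel(path).to_dict("records")`, passing the file's actual column names as `actor_key` and `emotion_key`. A plain `Counter` is used here so that the check does not depend on any particular spreadsheet layout.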

Institutions

Islamia University, COMSATS Institute of Information Technology

Categories

Speech Recognition, Machine Learning, Urdu Language, Pakistan, Speech Signal Analysis, Convolutional Neural Network, Deep Learning
