Dataset of Emotional Patterns and Synthetic Labeled Corpus for Anxiety and Depression Detection in Spanish
Description
This dataset provides a comprehensive collection of textual narratives in Spanish designed for Affective Computing and Natural Language Processing (NLP) research. The data collection was conducted at the Benemérita Universidad Autónoma de Puebla (BUAP) among students and researchers. The repository consists of four main components:
1. Raw Narrative Data (Dataset_Emotional_Patterns_Clean.csv): Contains 12 open-ended responses per participant, triggered by scenarios strategically designed to evoke the six basic emotions proposed by Paul Ekman (joy, sadness, fear, anger, surprise, and disgust).
2. Synthetic Labeled Corpus (Synthetic_Labeled_Corpus.csv): A processed dataset optimized for Machine Learning tasks, featuring text instances labeled for detecting anxiety, depression, or neutral states. This subset is particularly useful for training classification models in mental health contexts.
3. Data Dictionary (Data_Dictionary.csv): A detailed mapping that links the English metadata headers with the original Spanish survey questions and data types.
4. Survey Instrument (Survey_Instrument_ES.pdf): The original visual interface and informed consent provided to participants during the data collection process.
This dataset addresses the scarcity of specialized Spanish-language corpora for emotion and mental health detection, providing a valuable resource for developing and benchmarking affective models.
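As a first sanity check on the labeled corpus, it can help to count how many instances carry each of the three labels. The sketch below is only illustrative: the column names `text` and `label` are assumptions (the real headers are documented in Data_Dictionary.csv), and the four rows are made-up stand-ins for Synthetic_Labeled_Corpus.csv, not actual records.

```python
import csv
import io
from collections import Counter

# Hypothetical column name; check Data_Dictionary.csv for the real header.
LABEL_COL = "label"

def label_distribution(csv_file):
    """Count how many rows carry each label in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[LABEL_COL].strip().lower() for row in reader)

# Tiny made-up sample standing in for Synthetic_Labeled_Corpus.csv.
sample = io.StringIO(
    "text,label\n"
    "No puedo dejar de preocuparme,anxiety\n"
    "Nada me motiva últimamente,depression\n"
    "Hoy fui al mercado,neutral\n"
    "Siento el pecho apretado,anxiety\n"
)
counts = label_distribution(sample)
print(counts)
```

With the real file, the same function could be called on `open("Synthetic_Labeled_Corpus.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption worth verifying, since the text contains accented Spanish characters.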
Steps to reproduce
1. Data Collection: Survey design based on Ekman's six basic emotions to trigger emotional narratives.
2. Anonymization: Removal of personally identifiable information using Python (pandas).
3. Synthesis & Labeling: Generation of a derived corpus focused on mental health (Anxiety/Depression) for multi-class classification tasks.
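The anonymization step could look roughly like the following pandas sketch. It is not the authors' script: the identifier columns (`name`, `email`), the response column `q1_joy`, and the sequential `participant_id` are all hypothetical and chosen only to illustrate dropping direct identifiers while keeping rows distinguishable.

```python
import pandas as pd

# Hypothetical identifier columns; the real survey headers are not given here.
PII_COLUMNS = ["name", "email"]

def anonymize(df):
    """Drop direct identifiers and add a sequential participant ID."""
    out = df.copy()
    # Assumed ID scheme: 1..N in row order, carrying no personal information.
    out.insert(0, "participant_id", list(range(1, len(out) + 1)))
    # Remove only the identifier columns that are actually present.
    return out.drop(columns=[c for c in PII_COLUMNS if c in out.columns])

# Made-up rows standing in for the raw survey export.
raw = pd.DataFrame({
    "name": ["Ana", "Luis"],
    "email": ["ana@example.org", "luis@example.org"],
    "q1_joy": ["Me sentí feliz cuando...", "Recuerdo el día que..."],
})
clean = anonymize(raw)
print(list(clean.columns))
```

Dropping columns removes only structured identifiers. Free-text narratives can still mention names or places, so a manual or pattern-based review of the responses would likely be needed as well.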
Institutions
- Benemérita Universidad Autónoma de Puebla, Puebla, Puebla City