Dataset of Emotional Patterns and Synthetic Labeled Corpus for Anxiety and Depression Detection in Spanish

Name: Dataset of Emotional Patterns and Synthetic Labeled Corpus for Anxiety and Depression Detection in Spanish
Creator: Luis Yael Méndez Sánchez
Published: 2026-02-26T08:28:39.924Z
Keywords: Social Sciences, Psychology, Computer Science, Natural Language Processing, Machine Learning

doi:10.17632/mrkr9t2rfn.1

Published: 26 February 2026 | Version 1 | DOI: 10.17632/mrkr9t2rfn.1

Description

This dataset provides a comprehensive collection of textual narratives in Spanish designed for Affective Computing and Natural Language Processing (NLP) research. The data collection was conducted at the Benemérita Universidad Autónoma de Puebla (BUAP) among students and researchers. The repository consists of four main components:

1. Raw Narrative Data (Dataset_Emotional_Patterns_Clean.csv): Contains 12 open-ended responses per participant, triggered by scenarios strategically designed to evoke the six basic emotions proposed by Paul Ekman (joy, sadness, fear, anger, surprise, and disgust).
2. Synthetic Labeled Corpus (Synthetic_Labeled_Corpus.csv): A processed dataset optimized for Machine Learning tasks, featuring text instances labeled for detecting anxiety, depression, or neutral states. This subset is particularly useful for training classification models in mental health contexts.
3. Data Dictionary (Data_Dictionary.csv): A detailed mapping that links the English metadata headers with the original Spanish survey questions and data types.
4. Survey Instrument (Survey_Instrument_ES.pdf): The original visual interface and informed consent provided to participants during the data collection process.

This dataset addresses the scarcity of specialized Spanish-language corpora for emotion and mental health detection, providing a valuable resource for developing and benchmarking affective models.
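As a first look at the Synthetic Labeled Corpus, one might check how the three classes (anxiety, depression, neutral) are distributed before training a classifier. The sketch below is illustrative only: the column names `text` and `label`, the helper `label_counts`, and the sample rows are assumptions made for this example, not taken from the dataset; the actual headers are documented in Data_Dictionary.csv.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for Synthetic_Labeled_Corpus.csv.
# The column names and rows are invented for illustration; check
# Data_Dictionary.csv for the real headers.
SAMPLE_CSV = """text,label
"No puedo dejar de preocuparme por el examen.",anxiety
"Ya nada me interesa como antes.",depression
"Hoy fui al mercado con mi hermana.",neutral
"Siento que el corazón se me acelera sin motivo.",anxiety
"""


def label_counts(csv_file, label_column="label"):
    """Count how many rows carry each class label."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


counts = label_counts(io.StringIO(SAMPLE_CSV))
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

To run the same check on the published file, the in-memory sample could be replaced with `open("Synthetic_Labeled_Corpus.csv", newline="", encoding="utf-8")`, adjusting `label_column` to the header actually used.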

Steps to reproduce

1. Data Collection: Survey design based on Ekman's six basic emotions to trigger emotional narratives.
2. Anonymization: Removal of personally identifiable information using Python (pandas).
3. Synthesis & Labeling: Generation of a derived corpus focused on mental health (Anxiety/Depression) for multi-class classification tasks.
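The anonymization step could look roughly like the following. This is a minimal sketch, not the author's actual script: the column names (`name`, `email`, `response_1`), the `anonymize` helper, the e-mail pattern, and the sample rows are all hypothetical. It drops directly identifying columns, assigns a sequential participant ID, and masks e-mail addresses that appear inside free-text responses.

```python
import re

import pandas as pd

# Hypothetical raw survey export; column names and rows are illustrative only.
raw = pd.DataFrame({
    "name": ["Ana Pérez", "Luis Gómez"],
    "email": ["ana@example.com", "luis@example.com"],
    "response_1": [
        "Me sentí feliz, escríbeme a ana@example.com",
        "Tuve miedo durante el temblor.",
    ],
})

# Simple e-mail pattern (an assumption; real PII detection needs more rules).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize(df, id_columns, text_columns):
    """Drop identifying columns and mask e-mails in free-text columns."""
    out = df.drop(columns=id_columns)  # returns a new DataFrame
    out.insert(0, "participant_id", list(range(1, len(out) + 1)))
    for col in text_columns:
        out[col] = out[col].str.replace(EMAIL_RE, "[EMAIL]", regex=True)
    return out


clean = anonymize(raw, id_columns=["name", "email"], text_columns=["response_1"])
print(clean)
```

Masking a single pattern is rarely sufficient for narrative text; names, phone numbers, and place references mentioned in responses would need additional handling or manual review.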

Institutions

Benemérita Universidad Autónoma de Puebla
Puebla, Puebla City
