Sentiment, Emotion analysis for mental health based on text

Published: 4 June 2026 | Version 3 | DOI: 10.17632/dp338mw226.3

Description

This dataset contains 160,000 social media text records, curated and adapted from two distinct sources, for emotion recognition and sentiment analysis tasks. The text entries capture nuanced human emotions and psychological states expressed on digital platforms. Each record is assigned exactly one of ten classes, which span standard emotions, general sentiments, and critical psychological states such as depression. The dataset is intended for training machine learning models, deep learning architectures (e.g., BERT, RoBERTa, BiLSTM), and ensemble meta-classifiers.

Steps to reproduce

This dataset was created through a process of data integration, curation, and filtration from two distinct social media textual datasets. To reproduce or replicate this dataset, follow these sequential steps:

1. Data Sourcing & Collection:
   - Identify and retrieve two independent raw datasets containing social media text (such as tweets, posts, or comments) labeled with emotions and mental health indicators.

2. Data Merging & Formatting:
   - Load both datasets into a data processing environment (e.g., using Python Pandas or R).
   - Standardize the column names across both datasets to 'Text' (for the social media posts) and 'Emotion' (for the target labels).
   - Concatenate/merge the two datasets into a single unified dataframe.

3. Label Standardization & Curation:
   - Analyze the target labels from both original sources.
   - Map and harmonize overlapping or synonymous labels into 10 distinct categorical classes: 'love', 'happiness', 'sadness', 'Normal', 'hate', 'anger', 'Depression', 'fun', 'surprise', and 'worry'.
   - Filter out any records with ambiguous, corrupted, or irrelevant emotional labels to maintain dataset integrity.

4. Data Cleaning & Deduplication (Raw Text Preservation):
   - Remove any duplicate text entries to avoid data leakage during model training.
   - Drop rows with missing values (NaN) in either the 'Text' or 'Emotion' columns.
   - Keep the social media text in its completely RAW format (preserving original spelling, punctuation, and linguistic nuances) without applying heavy preprocessing, tokenization, or stopword removal, making it ideal for deep learning architectures like BERT/RoBERTa.

5. Final Export:
   - Reset the dataframe index.
   - Export the final curated corpus of 160,000 records into a single comma-separated values file named "Emotion_Sentiment_DataSet.csv".
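
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration under stated assumptions, not the contributors' actual script: the two toy source frames, their original column names (`content`, `sentiment`, `post`, `status`), the `LABEL_MAP` entries, and the helper names `standardize`, `curate`, and `export` are all hypothetical. Only the 'Text' and 'Emotion' column names, the ten class labels, and the output file name come from the description.

```python
import pandas as pd

# The ten target classes, spelled and cased as listed in the steps above.
TARGET_CLASSES = [
    "love", "happiness", "sadness", "Normal", "hate",
    "anger", "Depression", "fun", "surprise", "worry",
]

# Hypothetical mapping from raw source labels to the target classes.
# The real mapping is not published; these entries are placeholders.
LABEL_MAP = {"joy": "happiness", "normal": "Normal", "depression": "Depression"}


def standardize(df, text_col, label_col):
    """Step 2: rename one source's columns to the shared 'Text'/'Emotion' schema."""
    renamed = df.rename(columns={text_col: "Text", label_col: "Emotion"})
    return renamed[["Text", "Emotion"]]


def curate(source_a, source_b):
    """Steps 2-5: merge, harmonize labels, clean, and reindex."""
    merged = pd.concat([source_a, source_b], ignore_index=True)

    # Step 4: drop rows with a missing text or label.
    merged = merged.dropna(subset=["Text", "Emotion"])

    # Step 3: map synonymous labels, then keep only the ten target classes.
    harmonized = merged["Emotion"].map(lambda label: LABEL_MAP.get(label, label))
    merged = merged.assign(Emotion=harmonized)
    merged = merged[merged["Emotion"].isin(TARGET_CLASSES)]

    # Step 4: deduplicate on the raw, unmodified text (first occurrence wins).
    merged = merged.drop_duplicates(subset="Text")

    # Step 5: reset the index before export.
    return merged.reset_index(drop=True)


def export(df, path="Emotion_Sentiment_DataSet.csv"):
    """Step 5: write the curated corpus to a single CSV file."""
    df.to_csv(path, index=False)


# Toy stand-ins for the two raw sources (hypothetical data and column names).
raw_a = pd.DataFrame({
    "content": ["I love this", "so sad today", "I love this", None],
    "sentiment": ["love", "sadness", "love", "fun"],
})
raw_b = pd.DataFrame({
    "post": ["feeling okay", "can't get out of bed", "so sad today", "???"],
    "status": ["normal", "depression", "sadness", "unknown"],
})

curated = curate(
    standardize(raw_a, "content", "sentiment"),
    standardize(raw_b, "post", "status"),
)
print(curated)
```

Deduplicating on 'Text' alone keeps the first occurrence when the same post appears with different labels in the two sources; whether the original curation resolved such conflicts this way is not stated, so treat that choice as an assumption. Calling `export(curated)` would write the file under the name given in step 5.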

Categories

Computer Science, Artificial Intelligence, Mental Health, Data Science, Natural Language Processing, Machine Learning, Emotion Expression, Text Mining, Sentiment Analysis, Transformer-Based Deep Learning
