Multiclass Social Media Dataset for Mental Health and Emotion Classification

Published: 4 June 2026 | Version 2 | DOI: 10.17632/rw57wvg6z2.2
Contributor:

Description

This dataset comprises 42,518 unique, cleaned text records intended for mental health detection and emotion classification. Each record is a social media text expression labeled with one of six emotional or psychological states: Positive, Neutral, Anger, Depression, Sadness, or Anxiety. The corpus contains no duplicate entries and has been pre-cleaned, so it is suited to training and benchmarking machine learning models, deep learning models (BERT, RoBERTa, BiLSTM), and ensemble meta-classifiers in computational psychology and NLP.

Files

Steps to reproduce

To reproduce or use this final cleaned mental health text dataset, follow these steps in order:

1. Text Data Aggregation:
   - Gather text sequences originating from online social platforms or emotion-labeled textual corpora containing mental health expressions.

2. Rigorous Data Auditing & Deduplication:
   - Load the entire corpus into a data-handling pipeline (e.g., Python Pandas).
   - Execute a strict deduplication check on the 'Text' column using functions like `drop_duplicates(subset=['Text'])` to eliminate redundant data and prevent data leakage during downstream machine learning workflows.

3. Label Alignment (6 Distinct Classes):
   - Categorize and map the text inputs into 6 standardized targets representing distinct behavioral and mental health states: 'Positive', 'Neutral', 'Anger', 'Depression', 'Sadness', and 'Anxiety'.
   - Verify that there are no overlapping labels or misspelled target classes.

4. Structural Cleaning & Quality Control:
   - Identify and discard rows with empty text segments or missing values (`dropna()`) in the target variables.
   - Ensure the raw linguistic characteristics (like slang or emotional context) are preserved while maintaining a clean, structured dataframe layout.

5. Verification and Final Output:
   - Confirm that the final dataset contains exactly 42,518 unique records.
   - Reset the row indexing and save the artifact as a comma-separated values file named "Mental_Health_6Class_Final_Cleaned.csv".
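Steps 2 to 5 above could be sketched in pandas roughly as follows. This is a minimal illustration under stated assumptions, not the contributor's actual script: the function name `clean_corpus` and the label column name `Label` are hypothetical (only the 'Text' column is named in the steps), and the rows in the demo are made up purely to exercise the function.

```python
import pandas as pd

# The six target classes listed in the dataset description.
LABELS = ["Positive", "Neutral", "Anger", "Depression", "Sadness", "Anxiety"]


def clean_corpus(df, text_col="Text", label_col="Label"):
    """Apply the auditing steps to a raw dataframe and return a cleaned copy.

    `label_col="Label"` is an assumed column name, not taken from the source.
    """
    # Step 4: discard rows with missing text or missing labels.
    out = df.dropna(subset=[text_col, label_col])

    # Step 4: discard rows whose text is only whitespace. The stripped copy is
    # used for the test only, so the stored text itself is left unmodified.
    non_blank = out[text_col].astype(str).str.strip() != ""
    out = out[non_blank]

    # Step 3: fail loudly on misspelled or unexpected target classes.
    unknown = set(out[label_col].unique()) - set(LABELS)
    if unknown:
        raise ValueError(f"Unexpected labels: {sorted(map(str, unknown))}")

    # Step 2: keep the first occurrence of each distinct text.
    out = out.drop_duplicates(subset=[text_col])

    # Step 5: reset the row index so it runs from 0 to len(out) - 1.
    return out.reset_index(drop=True)


if __name__ == "__main__":
    # Made-up rows for illustration only; not drawn from the dataset.
    raw = pd.DataFrame(
        {
            "Text": ["I feel great today", "I feel great today", None, "   ",
                     "cannot stop worrying"],
            "Label": ["Positive", "Positive", "Sadness", "Neutral", "Anxiety"],
        }
    )
    cleaned = clean_corpus(raw)
    print(cleaned)
    # With the real corpus, step 5 would additionally check
    # len(cleaned) == 42518 and then write the file:
    # cleaned.to_csv("Mental_Health_6Class_Final_Cleaned.csv", index=False)
```

Raising an error on unknown labels, rather than silently dropping those rows, is a design choice in this sketch: it surfaces misspelled classes so they can be mapped to the correct target by hand.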

Institutions

Categories

Computer Science, Artificial Intelligence, Mental Health, Natural Language Processing, Machine Learning, Mental Disorder, Text Mining, Deep Learning, Sentiment Analysis, Meta Dataset

Licence