Multiclass Social Media Dataset for Mental Health and Emotion Classification
Description
This dataset comprises 42,518 unique, cleaned text records intended for mental health detection and emotion classification. Each record is a social media text expression labeled with one of six emotional or psychological states: Positive, Neutral, Anger, Depression, Sadness, or Anxiety. The corpus contains no duplicate entries and has been pre-cleaned, making it suitable for training and benchmarking Machine Learning, Deep Learning (BERT, RoBERTa, BiLSTM), and ensemble meta-classifier models in computational psychology and NLP.
Steps to reproduce
To reproduce or utilize this final cleaned mental health text dataset, follow these sequential steps:
1. Text Data Aggregation:
   - Gather text sequences originating from online social platforms or emotion-labeled textual corpora containing mental health expressions.
2. Rigorous Data Auditing & Deduplication:
   - Load the entire corpus into a data-handling pipeline (e.g., Python Pandas).
   - Execute a strict deduplication check on the 'Text' column using functions like `drop_duplicates(subset=['Text'])` to completely eliminate redundant data and prevent data leakage during downstream machine learning workflows.
3. Label Alignment (6 Distinct Classes):
   - Categorize and map the text inputs into 6 standardized targets representing distinct behavioral and mental health states: 'Positive', 'Neutral', 'Anger', 'Depression', 'Sadness', and 'Anxiety'.
   - Verify that there are no overlapping labels or misspelled target classes.
4. Structural Cleaning & Quality Control:
   - Identify and discard rows with empty text segments or missing values (`dropna()`) in the target variables.
   - Ensure the raw linguistic characteristics (like slang or emotional context) are preserved while maintaining a clean, structured dataframe layout.
5. Verification and Final Output:
   - Confirm the final shape of the dataset stands precisely at 42,518 unique records.
   - Reset the row indexing and save the artifact as a comma-separated values file named "Mental_Health_6Class_Final_Cleaned.csv".
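The auditing, label-check, and cleaning steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the label column name `Label` and the helper name `clean_corpus` are assumptions (only the 'Text' column is named above). Missing values are dropped before deduplication here so that a row with a missing label cannot shadow a valid duplicate of the same text.

```python
import pandas as pd

# The six target classes listed in the description.
LABELS = ["Positive", "Neutral", "Anger", "Depression", "Sadness", "Anxiety"]


def clean_corpus(df, text_col="Text", label_col="Label"):
    """Drop missing/empty rows, validate labels, deduplicate, reindex.

    `label_col` defaults to "Label", which is an assumed column name.
    """
    # Discard rows with a missing text or a missing target label.
    out = df.dropna(subset=[text_col, label_col])

    # Discard rows whose text is empty or whitespace-only. The text itself
    # is left unmodified so slang and emotional context are preserved.
    out = out[out[text_col].astype(str).str.strip() != ""]

    # Verify that every label is one of the six expected classes.
    unexpected = set(out[label_col]) - set(LABELS)
    if unexpected:
        raise ValueError(f"Unexpected labels: {sorted(unexpected)}")

    # Remove duplicate texts (first occurrence is kept) to avoid leakage
    # between train and test splits downstream.
    out = out.drop_duplicates(subset=[text_col])

    # Reset the row index so it runs 0..n-1 in the saved file.
    return out.reset_index(drop=True)
```

With a raw file at a hypothetical path such as `raw_corpus.csv`, the result could then be checked and saved with `cleaned = clean_corpus(pd.read_csv("raw_corpus.csv"))`, followed by `assert len(cleaned) == 42518` and `cleaned.to_csv("Mental_Health_6Class_Final_Cleaned.csv", index=False)`.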
Institutions
- National University Bangladesh, Dhaka Division, Dhaka