BanglaSportsEmotion: A Multi-Class Sentiment Dataset for Bangla Sports Commentary

Published: 12 February 2026 | Version 2 | DOI: 10.17632/ykxsndr53y.2
Contributors:
Soumik Paul Jisun Soumik, Md Jalal Uddin Chowdhury, Md Mahdi Hossain Hira

Description

BanglaSportsEmotion addresses a critical gap in Bangla natural language processing by providing the first comprehensive, multi-sport sentiment corpus designed specifically for emotion analysis in sports commentary. It is a manually annotated dataset of 8,582 Bangla sports comments collected from Facebook and YouTube. The dataset enables researchers to develop and benchmark sentiment analysis models for Bangla, particularly in the sports domain, where fan emotions range from celebration to criticism.

Data Collection Sources and Scope

Data was collected from various Bangla sports-related online platforms to ensure broad coverage and diversity:
1. Sources: Publicly accessible comments from sports-related Facebook pages/groups and YouTube channels (examples include bdcrictime.com discussions, T Sports video comments, and RabbitHoleBD sports threads).
2. Initial raw volume: ≈ 16,000 raw comments were collected before filtering.
3. Final released volume: 8,582 comments after deduplication, spam removal, and relevance filtering.
4. Sports coverage: Cricket, football, volleyball, hockey, and other sports.
5. Geographic scope: Comments about Bangladeshi national teams, international teams, club football, and various sporting events.
6. Time period: Recent comments reflecting current fan discourse and language usage.

Class Definitions:
i. Joy (Label 0) - Positive emotions such as happiness, excitement, celebration, or praise for a team or player.
ii. Anger (Label 1) - Negative emotions directed toward one's own team, players, or performance.
iii. Support (Label 2) - Encouragement, loyalty, or backing for a team or player regardless of the outcome.
iv. Toxic (Label 3) - Harsh, offensive, or sarcastic remarks, often directed at opponents or rival fans.

Key Features:
1. Fairly balanced class distribution
2. Multi-sport coverage for broader generalizability
3. Clear annotation guidelines for reproducibility
4. High inter-annotator agreement
5. Captures nuanced emotions, including the semantic overlap between the Anger and Toxic classes

Use Cases:
i. Sentiment analysis model development for Bangla
ii. Low-resource NLP research
iii. Sports analytics and fan engagement studies
iv. Benchmark evaluation for transformer and classical ML models
v. Cross-lingual sentiment analysis studies

File Format

CSV file with two columns:
- Comment text (Bangla)
- Class label (0: Joy; 1: Anger; 2: Support; 3: Toxic)
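A minimal loading sketch for the two-column CSV format, using only the Python standard library. It assumes the comment text is the first column and the integer label the second; the file name `bangla_sports_emotion.csv` and the helper names `read_rows`, `load_dataset`, and `class_distribution` are hypothetical, and since the description does not say whether the CSV has a header row, the sketch skips a first row whose label field is not an integer. The Bangla sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Label ids as stated in the dataset description.
LABELS = {0: "Joy", 1: "Anger", 2: "Support", 3: "Toxic"}


def read_rows(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse (comment, label) pairs from CSV lines.

    Assumption: column 1 is the Bangla comment, column 2 the integer label.
    A first row whose label is not an integer is treated as a header.
    """
    rows: List[Tuple[str, int]] = []
    for i, record in enumerate(csv.reader(lines)):
        if len(record) < 2:
            continue  # skip blank or malformed lines
        text, raw_label = record[0].strip(), record[1].strip()
        try:
            label = int(raw_label)
        except ValueError:
            if i == 0:
                continue  # assumed header row
            raise
        if label not in LABELS:
            raise ValueError(f"unexpected label {label!r} on row {i}")
        rows.append((text, label))
    return rows


def load_dataset(path: str) -> List[Tuple[str, int]]:
    # Hypothetical file path; utf-8-sig also strips a byte-order mark if present.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_rows(f)


def class_distribution(rows: List[Tuple[str, int]]) -> Dict[str, Tuple[int, float]]:
    """Return {class name: (count, share)} to check how balanced the classes are."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {
        LABELS[k]: (counts[k], counts[k] / total if total else 0.0)
        for k in sorted(LABELS)
    }


if __name__ == "__main__":
    # Invented sample rows, not taken from the dataset.
    sample = io.StringIO(
        "comment,label\n"
        "দারুণ জয়!,0\n"
        "এভাবে খেললে হবে না,1\n"
        "পাশে আছি সবসময়,2\n"
        "দারুণ খেলা,0\n"
    )
    data = read_rows(sample)
    for name, (n, share) in class_distribution(data).items():
        print(f"{name}: {n} ({share:.1%})")
```

On the real file, `class_distribution(load_dataset("bangla_sports_emotion.csv"))` would give per-class counts and shares, which is a quick way to check the "fairly balanced" claim before choosing between plain and stratified train/test splits.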

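The description reports high inter-annotator agreement but does not name the metric or give a value. For two annotators labelling the same comments, Cohen's kappa is a common choice: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below assumes that two-annotator setup; it is not the authors' reported procedure, the function name `cohens_kappa` is hypothetical, and the label sequences in the example are invented.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    a[i] and b[i] are the labels (0: Joy, 1: Anger, 2: Support, 3: Toxic)
    that annotators A and B assigned to item i.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of P_A(label) * P_B(label).
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Invented labels for six comments from two hypothetical annotators.
    annotator_a = [0, 0, 1, 2, 3, 3]
    annotator_b = [0, 1, 1, 2, 3, 0]
    print(round(cohens_kappa(annotator_a, annotator_b), 3))
```

Here p_o = 4/6 and p_e = 9/36, so kappa = 5/9. With more than two annotators per item, Fleiss' kappa or Krippendorff's alpha would be the usual substitutes.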
Categories

Computer Science, Computational Linguistics, Data Science, Natural Language Processing, Machine Learning, Bengali Language, Deep Learning
