BanglaSentimentEmotionTextCorpus: A Dataset for Analyzing Sentiment and Emotion from Social Media, News Portals, and Literature in Bengali Language

Published: 27 January 2025 | Version 2 | DOI: 10.17632/kztpv8g89p.2
Contributor:
Rownuk Ara Rumy

Description

The dataset consists of 34,812 Bengali posts and comments sourced from social media (Facebook, Twitter, and Instagram), Bengali news portals, and literature. Microblog posts and comments from Facebook, Twitter, and Instagram capture informal, emotionally rich text. Newspaper and magazine articles provide formal, sentiment-related information expressed through opinions. Online literature, including Bengali novels, poems, and blogs, contributes semantic relationships and linguistic nuances.

All text was collected from public sources with automated scripts. Structured social media data was obtained through APIs, and text-only content was scraped from websites using Selenium scripts written in Python. Data collection complied with privacy, data collection, and ethics requirements.

The dataset contains 5 emotion classes and 5 sentiment classes. Among the emotion classes, "Creepy" is the most frequent with 12,000 entries, followed by "Unbiased" with 8,500 entries, "Joyful" with 7,500 entries, "Bullying" with 4,000 entries, and "Surprise" with 2,500 entries. Among the sentiment classes, "Negative" is the most frequent with 8,000 entries, followed by "Neutral" with 7,000 entries, "Strongly Negative" with 6,800 entries, "Positive" with 5,500 entries, and "Strongly Positive" with 4,500 entries.
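The class counts in the description point to an uneven label distribution, which matters when training or evaluating classifiers on this corpus. The following Python sketch uses only the figures stated in the description to compute each class's share and the ratio between the largest and smallest class. The names `EMOTION_COUNTS`, `SENTIMENT_COUNTS`, and `summarize` are illustrative and are not part of the dataset. The stated counts sum to 34,500 (emotion) and 31,800 (sentiment) rather than 34,812, so they are presumably rounded or partial, and the resulting shares should be read as approximate.

```python
# Per-class counts exactly as stated in the dataset description.
# These appear to be rounded figures, so treat the derived shares as approximate.
EMOTION_COUNTS = {
    "Creepy": 12000,
    "Unbiased": 8500,
    "Joyful": 7500,
    "Bullying": 4000,
    "Surprise": 2500,
}
SENTIMENT_COUNTS = {
    "Negative": 8000,
    "Neutral": 7000,
    "Strongly Negative": 6800,
    "Positive": 5500,
    "Strongly Positive": 4500,
}


def summarize(counts):
    """Return (total, shares, imbalance_ratio) for a label -> count mapping.

    shares maps each label to its fraction of the total;
    imbalance_ratio is the largest class count divided by the smallest.
    """
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    imbalance = max(counts.values()) / min(counts.values())
    return total, shares, imbalance


for name, counts in (("emotion", EMOTION_COUNTS), ("sentiment", SENTIMENT_COUNTS)):
    total, shares, imbalance = summarize(counts)
    print(f"{name}: {total} labeled entries, max/min class ratio {imbalance:.2f}")
    for label, share in shares.items():
        print(f"  {label}: {share:.1%}")
```

Under these figures the emotion labels are considerably more skewed than the sentiment labels, so per-class metrics or class weighting may be worth considering for the emotion task.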

Institutions

Stamford University Bangladesh

Categories

Emotion, Sentiment Analysis

Licence