BABSA: A Large Scale Bangla Aspect Based Sentiment Analysis Dataset

Name: BABSA: A Large Scale Bangla Aspect Based Sentiment Analysis Dataset
Creator: M F R Siam Rohman
Published: 2025-12-08T12:34:04.062Z
Keywords: Natural Language Processing, Sentiment Analysis

Rohman, M F R Siam; Wafa, Narmeen

doi:10.17632/j7yb2sv263.1

Published: 8 December 2025 | Version 1 | DOI: 10.17632/j7yb2sv263.1

Contributors: M F R Siam Rohman, Narmeen Wafa

Description

BABSA (Bangla Aspect-Based Sentiment Analysis) is a large-scale, manually annotated dataset for fine-grained aspect-level sentiment analysis in Bangla. The dataset contains 15,860 quality-controlled instances spanning 21 domains, including book reviews, product reviews, news, politics, and social commentary.

Dataset files:

final_set.csv: Primary release file containing 15,860 manually annotated instances after quality control and filtering.
total.csv: Complete collection of 24,653 instances prior to filtering, provided for transparency and reproducibility.

Schema of final_set.csv (column: description):

text_content: Full Bangla text (review, comment, or news excerpt)
AnnotatedAspect: Comma-separated list of aspect terms or phrases extracted from the text
AnnotatedSentiment: Comma-separated list of sentiment polarities (positive, neutral, or negative)
MacroCategory: Domain/topic category (one of 21 predefined categories)

Schema of total.csv (column: description):

text_content: Full Bangla text (review, comment, or news excerpt)
AnnotatedAspect: Comma-separated list of aspect terms or phrases extracted from the text
AnnotatedSentiment: Comma-separated list of sentiment polarities (positive, neutral, or negative)

Text content was aggregated from four publicly available Bangla NLP corpora (BanglaBook, SentNoB, EmoNoBa, Sazzed) and a web-scraped Bangla news corpus (January–June 2025). All aspect-level annotations (aspect terms, boundaries, and sentiment labels) are original contributions created through a three-pass manual annotation protocol, achieving inter-annotator agreement of Cohen's κ ≥ 0.84.
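Because AnnotatedAspect and AnnotatedSentiment are both comma-separated lists, a reader will usually want to turn each row into (aspect, sentiment) pairs. The following is a minimal sketch of one way to do that, not an official loader. It assumes that the two lists are aligned by position (the first sentiment belongs to the first aspect, and so on), that the files are UTF-8 encoded, and that the header row uses the column names given in the schema; none of these points is confirmed by the description. The function names split_labels, pair_aspects, and iter_pairs are hypothetical.

```python
import csv
from typing import Iterator, List, Tuple


def split_labels(cell: str) -> List[str]:
    """Split a comma-separated cell into stripped, non-empty items."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def pair_aspects(aspects: str, sentiments: str) -> List[Tuple[str, str]]:
    """Pair each aspect with the sentiment at the same position.

    Assumption: the two lists are positionally aligned. Rows where the
    counts differ are rejected rather than silently truncated.
    """
    aspect_list = split_labels(aspects)
    sentiment_list = split_labels(sentiments)
    if len(aspect_list) != len(sentiment_list):
        raise ValueError(
            f"{len(aspect_list)} aspects but {len(sentiment_list)} sentiments"
        )
    return list(zip(aspect_list, sentiment_list))


def iter_pairs(path: str) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Yield (text, [(aspect, sentiment), ...]) for each row of a BABSA CSV.

    Assumption: UTF-8 encoding and a header row with the column names
    text_content, AnnotatedAspect, and AnnotatedSentiment.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row["text_content"], pair_aspects(
                row["AnnotatedAspect"], row["AnnotatedSentiment"]
            )
```

With final_set.csv in the working directory, `for text, pairs in iter_pairs("final_set.csv"): ...` would walk the release file row by row. Raising on a length mismatch is a deliberate choice here: if an aspect phrase itself contains a comma, the naive split would misalign the labels, and it seems safer to surface such rows than to guess.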

Institutions

North South University
