BABSA: A Large Scale Bangla Aspect Based Sentiment Analysis Dataset
Description
BABSA (Bangla Aspect-Based Sentiment Analysis) is a large-scale, manually annotated dataset for fine-grained, aspect-level sentiment analysis in Bangla. The dataset contains 15,860 quality-controlled instances spanning 21 domains, including book reviews, product reviews, news, politics, and social commentary.

Dataset files:
- final_set.csv: Primary release file containing 15,860 manually annotated instances after quality control and filtering.
- total.csv: Complete collection of 24,653 instances prior to filtering, provided for transparency and reproducibility.

Schema of final_set.csv (column: description):
- text_content: Full Bangla text (review, comment, or news excerpt)
- AnnotatedAspect: Comma-separated list of aspect terms or phrases extracted from the text
- AnnotatedSentiment: Comma-separated list of sentiment polarities (positive, neutral, or negative)
- MacroCategory: Domain/topic category (one of 21 predefined categories)

Schema of total.csv (column: description):
- text_content: Full Bangla text (review, comment, or news excerpt)
- AnnotatedAspect: Comma-separated list of aspect terms or phrases extracted from the text
- AnnotatedSentiment: Comma-separated list of sentiment polarities (positive, neutral, or negative)

Text content was aggregated from four publicly available Bangla NLP corpora (BanglaBook, SentNoB, EmoNoBa, Sazzed) and a web-scraped Bangla news corpus (January–June 2025). All aspect-level annotations (aspect terms, boundaries, and sentiment labels) are original contributions created through a three-pass manual annotation protocol, achieving inter-annotator agreement of Cohen's κ ≥ 0.84.
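
Because AnnotatedAspect and AnnotatedSentiment are stored as comma-separated lists, a reader will likely want to split them into (aspect, sentiment) pairs. The following is a minimal Python sketch of one way to do that, not an official loader. It assumes that the i-th aspect corresponds to the i-th sentiment label and that aspect terms contain no commas themselves; neither is stated in the description above. The helper names (split_list, pair_aspects, read_pairs) and the sample row, including its category value, are hypothetical placeholders rather than dataset content.

```python
import csv
import io


def split_list(field):
    """Split a comma-separated field into stripped, non-empty items."""
    return [item.strip() for item in field.split(",") if item.strip()]


def pair_aspects(row):
    """Pair each aspect with a sentiment label by position.

    Assumption: the i-th aspect matches the i-th sentiment label.
    """
    aspects = split_list(row["AnnotatedAspect"])
    sentiments = split_list(row["AnnotatedSentiment"])
    if len(aspects) != len(sentiments):
        raise ValueError(
            f"{len(aspects)} aspects but {len(sentiments)} sentiment labels"
        )
    return list(zip(aspects, sentiments))


def read_pairs(handle):
    """Return one list of (aspect, sentiment) pairs per CSV row."""
    return [pair_aspects(row) for row in csv.DictReader(handle)]


# Made-up sample in the final_set.csv column layout (not real data).
sample = io.StringIO(
    "text_content,AnnotatedAspect,AnnotatedSentiment,MacroCategory\n"
    '"(placeholder text)","story, price","positive, negative",Placeholder\n'
)
pairs = read_pairs(sample)
print(pairs)
```

To run the same logic on the released file, the in-memory sample could be replaced with open("final_set.csv", encoding="utf-8", newline=""), assuming the file is UTF-8 encoded. Rows where the two lists differ in length raise a ValueError so they can be inspected rather than silently misaligned.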
Files
Institutions
- North South University