BECS: A dataset for multi-class sentiment analysis of Bangla e-commerce reviews
Description
The BECS (Bangla E-Commerce Sentiment) dataset is a textual resource for multi-class sentiment analysis in Bangla, developed to address the lack of publicly available e-commerce sentiment datasets for this low-resource language. It contains 15,584 Bangla reviews collected from Facebook, YouTube, and Daraz, reflecting real customer opinions on online shopping experiences. The data were refined from an initial 20,604 reviews through cleaning, normalization, deduplication, and anonymization. Each review was manually annotated as Positive, Negative, or Neutral by trained native Bangla speakers to support annotation reliability, and the three classes are reasonably balanced. The dataset is released in CSV format and has been validated through baseline experiments using traditional machine learning and transformer-based models.

# BECS is suitable for applications in:
- Bangla sentiment analysis and opinion mining
- Benchmarking machine learning and transformer-based NLP models
- E-commerce review analysis and customer feedback modeling
- Low-resource language research and multilingual NLP studies

# Key Features
- Total entries: 15,584 Bangla reviews
- Data type: Text
- Sentiment categories: Positive (6,019), Negative (3,841), and Neutral (5,724)
- Data format: CSV / XLSX
- Source platforms: Facebook, YouTube, Daraz
- Language: Bangla (with minimal code-mixing)
- Annotation: Manual, expert-validated

# Potential Uses
This dataset can be used to train, evaluate, and benchmark sentiment analysis models for Bangla text, to conduct transfer learning experiments, and to support comparative studies in low-resource NLP. It also serves as an open-access resource for researchers, developers, and educators working on e-commerce analytics, opinion mining, and equitable language technology development.
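A quick sanity check after downloading the CSV is to count the rows per sentiment label and compare them with the figures listed under Key Features. The sketch below is a minimal illustration using only the Python standard library. The column name `sentiment` and the helper names `count_labels` and `class_shares` are assumptions made for this example and are not taken from the released files, so check the actual CSV header before use.

```python
import csv
from collections import Counter

# Class counts as stated in the dataset description.
EXPECTED_COUNTS = {"Positive": 6019, "Negative": 3841, "Neutral": 5724}


def count_labels(csv_path, label_column="sentiment"):
    """Count rows per sentiment label in a CSV file.

    `label_column` is a hypothetical column name; adjust it to match
    the header of the released file.
    """
    # "utf-8-sig" reads plain UTF-8 and also strips a leading BOM if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        return Counter(row[label_column].strip() for row in reader)


def class_shares(counts):
    """Return each label's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {label: round(n / total, 3) for label, n in counts.items()}


if __name__ == "__main__":
    # The stated per-class counts should add up to the stated total.
    assert sum(EXPECTED_COUNTS.values()) == 15584
    print(class_shares(EXPECTED_COUNTS))
```

Based on the stated counts, the classes make up roughly 39% (Positive), 25% (Negative), and 37% (Neutral) of the data, so a stratified train/test split may be worth considering when benchmarking models.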
Files
Institutions
- Leading University, Sylhet Division, Sylhet