BanglaBlend: A Large-Scale Nobel Dataset of Bangla Sentences Categorized by Saint(Sadhu) and Common(Cholito) Form of Bengali Language

Published: 9 December 2024 | Version 3 | DOI: 10.17632/7rx9mk8v4m.3

Description

The BanglaBlend dataset is a collection of Bangla (Bengali) sentences categorized into two forms: Saint (Sadhu) and Common (Cholito). It comprises 7,350 annotated Bangla sentences and is distributed in preprocessed form, with several data preprocessing techniques already applied. The dataset is designed to facilitate research and development in natural language processing (NLP) and computational linguistics, particularly for Bangla, a language widely spoken in Bangladesh and parts of India. In particular, it can be used for NLP tasks such as text summarization, text classification, sentiment analysis, and automatic language translation.
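As a first step before any classification experiment, it may help to check how the sentences are distributed across the two forms. The following is a minimal Python sketch, not part of the dataset itself. It assumes the data can be read as a CSV with one sentence per row; the column names `sentence` and `label` and the label strings `Sadhu` and `Cholito` are hypothetical and should be replaced with whatever the distributed files actually use. The two sample rows are illustrative stand-ins and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="label"):
    """Count how many rows carry each label in an open CSV file object.

    `label_column` is an assumed column name; adjust it to the real header.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column].strip() for row in reader)


# In-memory stand-in for the real file. These rows are illustrative only:
# the same short sentence written in the Sadhu and Cholito forms.
sample = io.StringIO(
    "sentence,label\n"
    "তাহারা যাইতেছে,Sadhu\n"
    "তারা যাচ্ছে,Cholito\n"
)

counts = count_labels(sample)
for label, n in sorted(counts.items()):
    print(label, n)
```

To run this against a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `count_labels`; the explicit UTF-8 encoding matters because the sentences are in Bangla script.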

Institutions

Daffodil International University

Categories

Data Science, Natural Language Processing, Machine Learning, Bengali Language, Sentence Processing

Licence