BanglaBlend: A Large-Scale Novel Dataset of Bangla Sentences Categorized by Saint (Sadhu) and Common (Cholito) Forms of the Bengali Language
Published: 9 December 2024 | Version 3 | DOI: 10.17632/7rx9mk8v4m.3
Contributors:
Description
The BanglaBlend dataset is a collection of Bangla (Bengali) sentences categorized into two forms: Saint (Sadhu) and Common (Cholito). It comprises 7,350 annotated Bangla sentences and has been preprocessed using several data preprocessing techniques. The dataset is designed to support research and development in natural language processing (NLP) and computational linguistics, particularly for Bangla, a language widely spoken in Bangladesh and parts of India. In particular, it can be used for NLP tasks such as text summarization, text classification, sentiment analysis, and automatic language translation.
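As a first step toward the text classification use mentioned above, one might load the labeled sentences and check how they are distributed across the two forms. The following is a minimal sketch only: the CSV layout, the column names `sentence` and `label`, the label strings `Sadhu` and `Cholito`, and the four sample sentences are all assumptions made for illustration and are not taken from the dataset files, so they should be checked against the actual download.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a dataset file. The real file name, column names,
# and label strings are assumptions; the sentences are illustrative examples
# of the two forms, not rows copied from BanglaBlend.
SAMPLE_CSV = """sentence,label
তাহারা বাড়ি যাইতেছে,Sadhu
তারা বাড়ি যাচ্ছে,Cholito
আমি ভাত খাইয়াছি,Sadhu
আমি ভাত খেয়েছি,Cholito
"""


def load_rows(handle):
    """Read (sentence, label) pairs from a CSV file-like object."""
    reader = csv.DictReader(handle)
    return [(row["sentence"].strip(), row["label"].strip()) for row in reader]


def label_counts(rows):
    """Count how many sentences fall under each form."""
    return Counter(label for _, label in rows)


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
print(counts)
```

To run this against a real file, the `io.StringIO(SAMPLE_CSV)` argument could be replaced with `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded file and the column names are adjusted to match it.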
Institutions
Daffodil International University
Categories
Data Science, Natural Language Processing, Machine Learning, Bengali Language, Sentence Processing