BengaliTenseCorpus: A comprehensive corpus in Bengali texts categorized in Present, Past, and Future
Published: 6 December 2024 | Version 1 | DOI: 10.17632/w9mdy6tw84.1
Contributor:
Description
The BengaliTenseCorpus was collected from publicly accessible Bangla blogs, Facebook pages, magazines, books, and news articles, supplemented by some self-authored sentences, to give a diverse representation of contemporary language use. A critical aspect of the dataset's curation was keeping the three tense categories (past, present, and future) close to evenly balanced. The dataset comprises 13,500 Bangla sentences in three classes: 4,550 present tense, 4,460 past tense, and 4,490 future tense. Each sentence carries a numerical label: 0 for present tense, 1 for past tense, and 2 for future tense.
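The labeling scheme and class balance described above can be checked with a short Python sketch. It uses only the counts and label values stated in the description; the names LABEL_TO_TENSE, REPORTED_COUNTS, class_shares, and count_labels are hypothetical helpers introduced for illustration, and nothing here assumes a particular file name or column layout for the published files.

```python
from collections import Counter

# Label scheme stated in the description: 0 = present, 1 = past, 2 = future.
LABEL_TO_TENSE = {0: "present", 1: "past", 2: "future"}

# Sentence counts per class as reported in the description.
REPORTED_COUNTS = {"present": 4550, "past": 4460, "future": 4490}


def class_shares(counts):
    """Return each class's share of the total as a fraction."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def count_labels(labels):
    """Count tense names for an iterable of numeric labels (0, 1, 2)."""
    return Counter(LABEL_TO_TENSE[label] for label in labels)


total = sum(REPORTED_COUNTS.values())
shares = class_shares(REPORTED_COUNTS)

print(f"total sentences: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The reported counts sum to 13,500, and each class holds roughly one third of the sentences. Once the labels have been read from the downloaded files, passing them to count_labels gives a per-class tally that can be compared against REPORTED_COUNTS.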
Institutions
Daffodil International University
Categories
Data Science, Natural Language Processing, Machine Learning