BengaliTenseCorpus: A comprehensive corpus of Bengali texts categorized into Present, Past, and Future tense

Published: 6 December 2024 | Version 1 | DOI: 10.17632/w9mdy6tw84.1
Contributor:

Description

The BengaliTenseCorpus was compiled from a variety of publicly accessible Bangla blogs, Facebook pages, magazines, books, and news articles, supplemented by sentences written by the dataset creators, to give a diverse representation of contemporary language use. A key aim during curation was to keep the sentences balanced across the three tense categories: past, present, and future. The dataset comprises 13,500 Bangla sentences divided into three classes: present tense (4,550 sentences), past tense (4,460 sentences), and future tense (4,490 sentences). Each sentence carries a numerical label: 0 for present tense, 1 for past tense, and 2 for future tense.
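The label scheme and class counts above can be captured in a few lines of code, which is handy when decoding model predictions or checking how balanced the classes are. The sketch below is illustrative only: it uses the label values and counts stated in the description, while the names `LABEL_TO_ID`, `CLASS_COUNTS`, `class_shares`, and `imbalance_ratio` are hypothetical and not part of the dataset. The actual file format and column names of the released files are not assumed here.

```python
# Hypothetical helper constants; label values follow the dataset description:
# 0 = present tense, 1 = past tense, 2 = future tense.
LABEL_TO_ID = {"present": 0, "past": 1, "future": 2}
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}

# Per-class sentence counts as reported in the description.
CLASS_COUNTS = {"present": 4550, "past": 4460, "future": 4490}


def class_shares(counts):
    """Return each class's share of the corpus as a fraction of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Ratio of the largest class to the smallest (1.0 means perfectly balanced)."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    print("total sentences:", sum(CLASS_COUNTS.values()))
    for name, share in class_shares(CLASS_COUNTS).items():
        print(f"label {LABEL_TO_ID[name]} ({name}): {share:.2%}")
    print("largest/smallest class ratio:", round(imbalance_ratio(CLASS_COUNTS), 3))
```

With the reported counts, every class falls close to one third of the corpus, and the largest class exceeds the smallest by about 2%, which is why the corpus can be treated as roughly balanced.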

Institutions

Daffodil International University

Categories

Data Science, Natural Language Processing, Machine Learning

Licence