Bangla News Dataset
Published: 9 December 2019 | Version 2 | DOI: 10.17632/xp92jxr8wn.2
Contributors:
Aisha Khatun
Description
A corpus of Bangla newspaper articles, collected with a custom web crawler and covering 12 different topics. The dataset contains more than 28.5 million word tokens. The number of unique words is around 3% of the entire vocabulary of the dataset. The dataset is imbalanced. 20% of the dataset was set aside as a held-out set.
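The description does not say how the held-out 20% was selected or how words were tokenized. As a minimal sketch only, the Python below assumes a per-topic (stratified) split, which is one common choice for imbalanced data, and whitespace tokenization. The function names, the (topic, text) record layout, and the toy records are all hypothetical and are not taken from the dataset files. The unique-to-total token ratio shown is one possible reading of the 3% figure.

```python
import random
from collections import defaultdict


def stratified_holdout(records, holdout_fraction=0.2, seed=0):
    """Split (topic, text) pairs into train and held-out lists, topic by topic.

    Assumption: the split is stratified by topic; the dataset description
    does not confirm this.
    """
    by_topic = defaultdict(list)
    for topic, text in records:
        by_topic[topic].append((topic, text))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, heldout = [], []
    for topic in sorted(by_topic):
        items = by_topic[topic]
        rng.shuffle(items)
        n_held = round(len(items) * holdout_fraction)
        heldout.extend(items[:n_held])
        train.extend(items[n_held:])
    return train, heldout


def unique_token_ratio(texts):
    """Share of distinct tokens among all tokens.

    Assumption: tokens are whitespace-separated, which is a simplification.
    """
    tokens = [tok for text in texts for tok in text.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Toy, made-up records standing in for an imbalanced corpus.
records = [("sports", f"খেলা খবর {i}") for i in range(10)]
records += [("economy", f"অর্থনীতি খবর {i}") for i in range(5)]

train, heldout = stratified_holdout(records)
print(len(train), len(heldout))
print(unique_token_ratio(text for _, text in train))
```

Splitting within each topic keeps the topic proportions of the held-out set close to those of the full corpus, so rare topics are not accidentally left out of evaluation.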
Institutions
Shahjalal University of Science and Technology
Categories
Natural Language Processing, Bengali Language