Bangla News Dataset

Name: Bangla News Dataset
Creator: Aisha Khatun
Published: 2019-12-09T15:55:07.433Z
Keywords: Natural Language Processing, Bengali Language

Khatun, Aisha; Rahman, Anisur; Islam, Md. Saiful

doi:10.17632/xp92jxr8wn.2

Published: 9 December 2019 | Version 2 | DOI: 10.17632/xp92jxr8wn.2

Contributors:

Aisha Khatun, Anisur Rahman, Md. Saiful Islam

Description

A corpus of Bangla newspaper articles, collected with a custom web crawler and covering 12 different topics. The dataset contains more than 28.5 million word tokens. The number of unique words is around 3% of the entire vocabulary of the dataset. The dataset is imbalanced. 20% of the dataset was set aside as a held-out set.
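The figures above (token count, unique words, a 20% held-out portion of an imbalanced dataset) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the description does not state the file format, the tokenization, or whether the held-out split was stratified by topic. The `(text, topic)` record layout, whitespace tokenization, the per-topic split, and the function names `corpus_stats` and `stratified_holdout` are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def corpus_stats(texts):
    """Return (total tokens, distinct tokens) using whitespace tokenization (an assumption)."""
    counts = Counter(token for text in texts for token in text.split())
    return sum(counts.values()), len(counts)


def stratified_holdout(records, holdout_fraction=0.2, seed=0):
    """Split (text, topic) records so every topic gives about the same share to the held-out set.

    Hypothetical helper: splitting per topic keeps an imbalanced topic
    distribution similar in both parts.
    """
    by_topic = defaultdict(list)
    for text, topic in records:
        by_topic[topic].append((text, topic))

    rng = random.Random(seed)
    train, heldout = [], []
    for topic in sorted(by_topic):
        items = by_topic[topic]
        rng.shuffle(items)
        cut = round(len(items) * holdout_fraction)
        heldout.extend(items[:cut])
        train.extend(items[cut:])
    return train, heldout


# Toy records standing in for real articles (illustrative only, deliberately imbalanced).
records = [(f"খবর {i} খেলা", "sports") for i in range(10)]
records += [(f"খবর {i} অর্থনীতি", "economy") for i in range(5)]

train, heldout = stratified_holdout(records)
total_tokens, unique_tokens = corpus_stats(text for text, _ in records)
print(len(train), len(heldout), total_tokens, unique_tokens)
```

Splitting within each topic rather than over the whole corpus is a design choice suited to imbalanced data: a plain random 20% sample could leave a small topic with very few held-out articles.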

Institutions

Shahjalal University of Science and Technology
