Human Abnormality Detection Based on Bengali Text

Published: 12 October 2020 | Version 1 | DOI: 10.17632/gz2tbhpr37.1
Ohi Md Abu Quwsar


The dataset contains Bengali sentences labeled as normal or abnormal. Multiple specialists classified the sentences. The dataset is distributed as a CSV file with two columns, "Abnormality" and "Sentence". Each Bengali sentence has one of two target values: 1 (abnormal) or 0 (normal). The dataset comprises 14414 Bengali sentences, of which 14312 are normal and 102 are abnormal. For more details, please see the source paper of the dataset.
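As a rough illustration of the CSV layout described above, the following sketch counts the labels in the "Abnormality" column. The function name `count_labels` and the in-memory sample rows are assumptions made for this example; the sample sentences are placeholders, not rows from the dataset.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file):
    """Count how many rows carry each value of the "Abnormality" column.

    `csv_file` is any text file object whose header row contains the
    two columns "Abnormality" and "Sentence".
    """
    reader = csv.DictReader(csv_file)
    return Counter(int(row["Abnormality"]) for row in reader)


# Tiny made-up sample in the same two-column layout (not real dataset rows).
sample = io.StringIO(
    "Abnormality,Sentence\n"
    "0,placeholder sentence one\n"
    "1,placeholder sentence two\n"
    "0,placeholder sentence three\n"
)
counts = count_labels(sample)
print(counts[0], counts[1])
```

To run this against the downloaded file, pass a handle opened with `encoding="utf-8"` and `newline=""`; the actual file name is not stated here, so substitute whatever name the download uses. Given the counts reported above, abnormal sentences would make up roughly 0.7% of the rows, so anyone training a classifier on this data should expect a heavily imbalanced label distribution.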


Steps to reproduce

The data was gathered from volunteers who held conversations on social media and messaging platforms such as Twitter, Facebook, WhatsApp, and Messenger. The data was collected as Bengali text. The raw sentences contained many irrelevant tags, spaces, emojis, etc., which were removed by an automated script.
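The cleaning script itself is not included in this record. The sketch below shows one plausible way to perform the kind of cleaning described above; it is not the authors' script. It assumes that "tags" means HTML-style markup, and the function name `clean_sentence` and the emoji code-point ranges are choices made for this example.

```python
import re

# Assumption: "tags" are HTML-style markup such as <b> or <br/>.
TAG_RE = re.compile(r"<[^>]+>")

# Common emoji and pictograph ranges (an approximation, not exhaustive).
# Bengali letters (U+0980 to U+09FF) fall outside these ranges and are kept.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF"   # pictographs, emoticons, transport, etc.
    "\U0001F1E6-\U0001F1FF"    # regional indicator (flag) symbols
    "\u2600-\u27BF"            # miscellaneous symbols and dingbats
    "\uFE0F]"                  # emoji variation selector
)

WHITESPACE_RE = re.compile(r"\s+")


def clean_sentence(text):
    """Remove tags and emojis, then collapse runs of whitespace."""
    text = TAG_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_sentence("<b>আমি</b>   ভালো 😀 আছি "))
```

Tags and emojis are replaced with a space rather than deleted outright so that neighboring words do not get joined; the final whitespace pass then collapses any resulting runs of spaces.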


Bangladesh University of Business and Technology


Semantics, Natural Language Processing, Machine Learning, Bengali Language, Natural Language Semantics, Sociality, Deep Learning