Human Abnormality Detection Based on Bengali Text

Published: 12 October 2020 | Version 1 | DOI: 10.17632/gz2tbhpr37.1
Contributors:
Muhammad Firoz Mridha,
Ohi Md Abu Quwsar

Description

The dataset contains Bengali sentences labeled as normal or abnormal, with the labels assigned by multiple specialists. It is provided in CSV format with two columns, "Abnormality" and "Sentence". Each sentence has one of two target values: 1 (abnormal) or 0 (normal). The dataset comprises 14,414 Bengali sentences, of which 14,312 are normal and 102 are abnormal. For more details, please refer to the source paper of the dataset.
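The CSV layout described above can be read with Python's standard library. The sketch below is a minimal illustration, not part of the published dataset: it assumes the header row uses exactly the column names "Abnormality" and "Sentence", that the file is UTF-8 encoded, and the function names `read_rows` and `class_summary` are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def read_rows(f: TextIO) -> List[Tuple[int, str]]:
    """Parse (label, sentence) pairs from the "Abnormality" and "Sentence" columns."""
    rows: List[Tuple[int, str]] = []
    for record in csv.DictReader(f):
        label = int(record["Abnormality"].strip())  # 1 = abnormal, 0 = normal
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        rows.append((label, record["Sentence"]))
    return rows


def class_summary(rows: List[Tuple[int, str]]) -> Dict[str, float]:
    """Count normal/abnormal sentences and the fraction that is abnormal."""
    counts = Counter(label for label, _ in rows)
    total = len(rows)
    return {
        "total": total,
        "normal": counts[0],
        "abnormal": counts[1],
        "abnormal_fraction": counts[1] / total if total else 0.0,
    }


# Example usage ("dataset.csv" is a hypothetical file name; use the downloaded CSV):
#   with open("dataset.csv", encoding="utf-8-sig", newline="") as f:
#       print(class_summary(read_rows(f)))
```

With the published counts, the abnormal class is 102 / 14,414 ≈ 0.71% of the data, so a classifier that always predicts "normal" would already reach about 99.3% accuracy; per-class metrics such as recall on the abnormal class are more informative here.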


Steps to reproduce

The data was gathered from conversations that volunteers held on social media platforms such as Twitter, Facebook, WhatsApp, and Messenger, and was collected as Bengali text. The raw sentences contained many irrelevant tags, extra spaces, emojis, and similar noise, which were removed by an automated script.
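The cleaning script itself is not published, so the following is only a hypothetical sketch of that kind of preprocessing. It assumes "tags" means markup such as `<br>`, strips emoji by Unicode block, and collapses repeated whitespace; the patterns and the name `clean_sentence` are illustrative assumptions, not the authors' implementation. Bengali characters (U+0980 to U+09FF) lie outside every removed range.

```python
import re

# Hypothetical patterns; the dataset's actual cleaning script is not published.
TAG_RE = re.compile(r"<[^>]+>")  # markup tags such as "<b>" or "<br/>"
EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # emoji variation selector
    "]"
)
SPACE_RE = re.compile(r"\s+")  # runs of spaces, tabs, and newlines


def clean_sentence(text: str) -> str:
    """Remove markup tags and emoji, then collapse whitespace to single spaces."""
    text = TAG_RE.sub(" ", text)       # replace tags with a space so words do not merge
    text = EMOJI_RE.sub("", text)      # drop emoji code points
    return SPACE_RE.sub(" ", text).strip()
```

Zero-width joiners (U+200D) are deliberately left alone because Bengali text can use them in conjuncts; a side effect is that multi-part emoji sequences may leave a stray joiner behind.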