Human Abnormality Detection Based on Bengali Text
The dataset contains Bengali sentences, each labeled as normal or abnormal. The labels were assigned by multiple specialists. The dataset is distributed as a CSV file with two columns, "Abnormality" and "Sentence". Each Bengali sentence has a single target value: 1 (abnormal) or 0 (normal). The dataset comprises 14,414 Bengali sentences, of which 14,312 are normal and 102 are abnormal. For more details, please refer to the source paper of the dataset.
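A minimal sketch of loading the CSV with Python's standard `csv` module and checking the class balance. It assumes the file has a header row with exactly the column names "Abnormality" and "Sentence" and stores labels as the integers 0 and 1; the function names `load_labeled_sentences` and `class_summary` are hypothetical and not part of the dataset release.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def load_labeled_sentences(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Read (sentence, label) pairs from CSV lines, e.g. an open file object.

    Assumes (hypothetically) a header row with 'Abnormality' and 'Sentence'
    columns and integer labels 0 (normal) / 1 (abnormal).
    """
    rows: List[Tuple[str, int]] = []
    for record in csv.DictReader(lines):
        label = int(record["Abnormality"].strip())
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r}")
        rows.append((record["Sentence"], label))
    return rows


def class_summary(rows: List[Tuple[str, int]]) -> Dict[str, float]:
    """Count normal and abnormal sentences and the share of abnormal ones."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return {
        "total": total,
        "normal": counts[0],
        "abnormal": counts[1],
        # Guard against division by zero for an empty input.
        "abnormal_ratio": counts[1] / total if total else 0.0,
    }
```

To read the actual file, pass `open(path, encoding="utf-8-sig", newline="")`, where `path` is wherever the CSV was saved; `utf-8-sig` also tolerates a leading byte-order mark. On the full dataset the summary should report 14,312 normal and 102 abnormal sentences, an abnormal share of roughly 0.7%.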
Steps to reproduce
The data was gathered from conversations that volunteers held on social media sites such as Twitter, Facebook, WhatsApp, and Messenger. The sentences were collected as Bengali text. The raw sentences contained many irrelevant tags, extra spaces, emojis, and similar noise, which were removed by an automated script.
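The cleaning step can be approximated with regular expressions. The sketch below assumes that "tags" means HTML-like markup plus @mentions and #hashtags, and that emojis fall in the Unicode blocks listed; the authors' actual script and its rules are not described here, so `clean_text` and its patterns are hypothetical.

```python
import re

# Hypothetical cleaning rules: they only approximate "tags, spaces, emojis".
TAG_RE = re.compile(r"<[^>]+>|[@#]\S+")  # HTML-like tags, @mentions, #hashtags
EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flag emojis)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # emoji variation selector
    "]+"
)
# Zero-width joiner (U+200D) is deliberately NOT removed: Bengali text uses it
# in some conjunct forms, so stripping it could alter real words.
WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace tags and emojis with spaces, then collapse whitespace runs.

    Bengali letters (U+0980 to U+09FF) are outside every removed range.
    """
    text = TAG_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()
```

For example, `clean_text("<b>আমি</b>  ভালো আছি 😀 #test")` returns `"আমি ভালো আছি"`. Removed spans are replaced with a space rather than deleted outright so that words on either side of a tag are not glued together; the final whitespace collapse then tidies the result.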