ALERT (Analysis of Linguistic Extremism in Religious Text)

Published: 9 May 2025 | Version 2 | DOI: 10.17632/2pwtrtcc72.2
Contributors:
SUHANA BINTA RASHID

Description

The widespread dissemination of religiously aggressive content on social media platforms poses a significant threat to social cohesion and communal harmony. Social media has become a prevalent venue for discussing diverse topics, including religion, and these discussions frequently turn into debates. Such debates often fuel animosity, incite violence, and spread life-threatening messages that disrupt societal peace and security. To address this challenge, we developed ALERT, a novel Bengali dataset with accompanying English translations, for identifying and classifying religious aggression in text. The data were collected from several online platforms, including Facebook, YouTube, blogs, online news portals, and group discussions. We applied multiple preprocessing stages, including the removal of duplicates, special characters, and emojis, to improve the coherence of the dataset. Each instance was annotated by two of four annotators with diverse academic, religious, and racial backgrounds, and any discrepancies were resolved by expert review.

The ALERT dataset is a collection of 4,003 Bangla texts categorized as:
1. hate speech (1,007)
2. vandalism (998)
3. life-threatening (994)
4. no aggression (1,004)

The dataset is structured with the following fields:
• Annotator 1
• Annotator 2
• Final Annotation
• Text
• English Translation

The dataset contains a mix of formal and informal Bangla text, reflecting how people communicate in real life. Rather than labeling content simply as aggressive or non-aggressive, it provides finer-grained categories that support more precise content moderation. The dataset is publicly accessible for research purposes to promote innovation and collaboration within the Bengali NLP community.
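
The field layout and preprocessing steps described above lend themselves to a quick sanity check after download. The Python sketch below is illustrative only and rests on several assumptions: that the data are exported as a CSV file whose header row uses the five field names exactly as listed, and that the label columns hold the category names as plain strings (neither the file format nor the label encoding is stated here). The cleaning rule in `clean_text` is a hypothetical stand-in for the authors' unspecified preprocessing, and `clean_text`, `load_rows`, and `summarize` are hypothetical helpers, not part of the dataset release.

```python
import csv
import io
import re
from collections import Counter

# Column names as listed in the ALERT description; the exact header
# spelling in the released file is an assumption.
FIELDS = ["Annotator 1", "Annotator 2", "Final Annotation", "Text", "English Translation"]

# Hypothetical cleaning rule (not the authors' documented pipeline): keep
# Bengali-script characters (U+0980-U+09FF), the danda marks (U+0964-U+0965),
# ASCII letters and digits, and whitespace; everything else, including
# emojis and other symbols, is replaced by a space.
_DISALLOWED = re.compile(r"[^\u0980-\u09FF\u0964\u0965A-Za-z0-9\s]")


def clean_text(text: str) -> str:
    """Remove emojis/special characters and collapse runs of whitespace."""
    return " ".join(_DISALLOWED.sub(" ", text).split())


def load_rows(fh, clean: bool = True) -> list[dict]:
    """Read ALERT-style rows from a CSV file object and drop exact duplicate texts.

    With clean=True the Text field is passed through clean_text before the
    duplicate check; with clean=False the released text is only stripped.
    """
    reader = csv.DictReader(fh)
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    seen: set[str] = set()
    rows = []
    for row in reader:
        text = clean_text(row["Text"]) if clean else row["Text"].strip()
        if not text or text in seen:
            continue  # skip empty texts and duplicates
        seen.add(text)
        row["Text"] = text
        rows.append(row)
    return rows


def summarize(rows: list[dict]) -> tuple[Counter, int]:
    """Count final labels and the rows on which the two annotators disagreed."""
    labels = Counter(row["Final Annotation"] for row in rows)
    disagreements = sum(1 for row in rows if row["Annotator 1"] != row["Annotator 2"])
    return labels, disagreements


if __name__ == "__main__":
    # Placeholder rows with neutral Bangla text ("sample sentence one/two");
    # the label strings are assumed, not taken from the released file.
    sample = io.StringIO(
        "Annotator 1,Annotator 2,Final Annotation,Text,English Translation\n"
        "hate speech,hate speech,hate speech,নমুনা বাক্য এক!!,sample sentence one\n"
        "no aggression,vandalism,no aggression,নমুনা বাক্য দুই 😀,sample sentence two\n"
        "hate speech,hate speech,hate speech,নমুনা বাক্য এক,sample sentence one\n"
    )
    rows = load_rows(sample)
    labels, disagreements = summarize(rows)
    print(len(rows), dict(labels), disagreements)
```

Running the file prints the number of retained rows, the label counts, and the disagreement count for the placeholder sample. For the real data, open the file with `open(path, newline="", encoding="utf-8-sig")` (the `utf-8-sig` codec also reads files without a byte-order mark) and pass `clean=False`, since the released text has already been preprocessed; the `Final Annotation` counts can then be compared against the published per-class figures. Treating every annotator disagreement as a case that went to expert review is itself an assumption about how the adjudication was recorded.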

Institutions

Chittagong University of Engineering and Technology

Categories

Cybersecurity, Natural Language Processing, Machine Learning, Deep Learning

Licence