A Context-Aware Bengali Toxic Comments Dataset
Description
This dataset provides a context-aware collection of Bengali toxic and non-toxic comments curated from the official Facebook pages of two leading Bangladeshi news portals, Prothom Alo and News24, from April to June 2025. All data were obtained from publicly available sources, and no private or personally identifiable information was collected or retained. In total, 1,004 comments were collected, covering diverse domains such as politics, sports, entertainment, and social issues.

Unlike conventional toxic comment datasets, each entry pairs the comment under analysis with its surrounding context. The dataset has the following attributes:
- News title: the headline of the source article.
- Metadata: the detailed text of the corresponding news article.
- Target comment: the primary comment being analyzed.
- Predecessor comment: the immediate parent comment, where available.
- Successor comment: the next comment in the thread.
- User name: the display name of the commenter.
- Label: toxic or non-toxic, assigned through majority voting.

The dataset contains 507 toxic and 497 non-toxic samples, a nearly balanced class distribution.

Toxic comments are defined as those containing hate speech, offensive or abusive language, sarcasm, personal attacks, or threats. For example, one toxic comment states: “শালা জানোয়ার টারে একদম ঠিকমতো বাটাম দেওয়া হোক,” which translates to “That bastard should be properly beaten.” Non-toxic comments, by contrast, are neutral, informative, constructive, or polite, and contain no harmful expressions. For instance, one non-toxic comment states: “চরিত্র খারাপ মনে হয় না, তবে বদমেজাজি মনে হয়, রাগ কন্ট্রোল করতে পারে না,” translated as “Doesn't seem to have a bad character, but appears short-tempered and cannot control anger.”
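The record layout, majority-vote labeling, and class balance described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the field names, the `majority_vote`, `build_input`, and `class_shares` helpers, the `[SEP]` separator, and the use of three annotators are hypothetical and are not taken from the dataset files, whose actual column names and annotation protocol may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommentRecord:
    # Hypothetical field names mirroring the attributes described above.
    news_title: str
    metadata: str
    target_comment: str
    predecessor_comment: Optional[str]  # None when there is no parent comment
    successor_comment: Optional[str]    # None when there is no following comment
    user_name: str
    label: str  # "toxic" or "non-toxic"


def majority_vote(votes):
    """Return the label chosen by most annotators; raise on empty input or a tie."""
    if not votes:
        raise ValueError("no votes given")
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        raise ValueError("tie between labels")
    return counts[0][0]


def build_input(record, sep=" [SEP] "):
    """Join the title, available context comments, and target into one string."""
    parts = [
        record.news_title,
        record.predecessor_comment,
        record.target_comment,
        record.successor_comment,
    ]
    # Skip context fields that are missing for this comment.
    return sep.join(p for p in parts if p)


def class_shares(label_counts):
    """Convert per-class counts into fractions of the whole dataset."""
    total = sum(label_counts.values())
    return {label: n / total for label, n in label_counts.items()}


# Assumed example: three annotators voting on one comment.
label = majority_vote(["toxic", "non-toxic", "toxic"])

# Placeholder record; the strings stand in for real Bengali text.
record = CommentRecord(
    news_title="headline",
    metadata="article text",
    target_comment="target",
    predecessor_comment=None,
    successor_comment="next",
    user_name="display name",
    label=label,
)
model_input = build_input(record)

# Class counts as reported in the description.
counts = {"toxic": 507, "non-toxic": 497}
shares = class_shares(counts)
print(label, model_input, {k: round(v, 3) for k, v in shares.items()})
```

With the reported counts, the toxic class makes up about 50.5% of the 1,004 samples and the non-toxic class about 49.5%.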
Institutions
- Jahangirnagar University