FanSpeak - Bangla Toxic Sports Comment Dataset

Name: FanSpeak - Bangla Toxic Sports Comment Dataset
Creator: MD Arafat Alam Khandaker
Published: 2025-10-02T14:19:45.194Z
Keywords: Natural Language Processing

Khandaker, MD Arafat Alam; Raha, Ziyan Shirin; Bin Moin, Mukaffi; Dipta, Dipta; Hasib, Khan Md

doi:10.17632/vt27sjpnf7.1

Published: 2 October 2025 | Version 1 | DOI: 10.17632/vt27sjpnf7.1

Description

The FanSpeak dataset contains 3,970 Bangla sports-related comments collected from online platforms, annotated into six categories: Individual, Team, Nation, Racism, Officials, and Fanbase. This dataset supports research in natural language processing (NLP) for Bangla, a low-resource language, with applications in toxic comment detection, targeted hate speech classification, and sentiment understanding in online sports discussions. While centered on sports, it also provides broader insights into online toxicity, discourse patterns, and multilingual content moderation systems.
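
As a rough illustration of how the six-category annotation might be sanity-checked after download, the sketch below tallies labels from a CSV file and rejects any label outside the six categories listed above. The file layout is an assumption: the column names `comment` and `label` are hypothetical and are not taken from the dataset, and the sample rows are placeholders rather than real data.

```python
import csv
import io
from collections import Counter

# The six annotation categories named in the dataset description.
CATEGORIES = ("Individual", "Team", "Nation", "Racism", "Officials", "Fanbase")


def count_labels(csv_file, label_column="label"):
    """Tally rows per category; raise on a label outside the six categories.

    `label_column` is a hypothetical column name, not taken from the dataset.
    """
    counts = Counter({name: 0 for name in CATEGORIES})
    # Row numbering starts at 2 because line 1 of the file is the header.
    for row_number, row in enumerate(csv.DictReader(csv_file), start=2):
        label = row[label_column].strip()
        if label not in counts:
            raise ValueError(f"row {row_number}: unexpected label {label!r}")
        counts[label] += 1
    return counts


# Placeholder rows standing in for real data (not actual dataset content).
sample = io.StringIO(
    "comment,label\n"
    "placeholder text 1,Team\n"
    "placeholder text 2,Officials\n"
    "placeholder text 3,Team\n"
)
counts = count_labels(sample)
print(counts["Team"], counts["Officials"], sum(counts.values()))
```

With the real files, the same function could be pointed at an opened file handle (UTF-8 encoded, since the comments are in Bangla) once the actual column names are known; the total would then be expected to match the 3,970 comments stated in the description.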

Institutions

Ahsanullah University of Science and Technology, Bangladesh University of Business and Technology, University of New South Wales Canberra at ADFA
