FanSpeak - Bangla Toxic Sports Comment Dataset
Published: 2 October 2025 | Version 1 | DOI: 10.17632/vt27sjpnf7.1
Contributors:
Description
The FanSpeak dataset contains 3,970 Bangla sports-related comments collected from online platforms, annotated into six categories: Individual, Team, Nation, Racism, Officials, and Fanbase. This dataset supports research in natural language processing (NLP) for Bangla, a low-resource language, with applications in toxic comment detection, targeted hate speech classification, and sentiment understanding in online sports discussions. While centered on sports, it also provides broader insights into online toxicity, discourse patterns, and multilingual content moderation systems.
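The description implies a simple record structure: each comment is paired with one of six category labels. Before training a classifier, it helps to load the records and tally how many comments fall into each category. The sketch below does this under stated assumptions. It assumes a CSV export with hypothetical `comment` and `label` columns and one label per row. The released files may be laid out differently (for example as spreadsheets, or with other column names), so adjust `load_rows` to match the actual header. The sample rows are toy placeholders, not FanSpeak data.

```python
import csv
import io
from collections import Counter

# The six categories listed in the FanSpeak description.
CATEGORIES = ("Individual", "Team", "Nation", "Racism", "Officials", "Fanbase")


def load_rows(handle, text_col="comment", label_col="label"):
    """Read (comment, label) pairs from a CSV file handle.

    Assumption: the column names "comment" and "label" are hypothetical;
    check the header of the released file and pass the real names.
    """
    reader = csv.DictReader(handle)
    return [(row[text_col], row[label_col].strip()) for row in reader]


def category_distribution(rows):
    """Count comments per category, rejecting labels outside the six listed."""
    counts = Counter()
    for _, label in rows:
        if label not in CATEGORIES:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    # Report every category in a fixed order, including ones with zero rows.
    return {c: counts[c] for c in CATEGORIES}


if __name__ == "__main__":
    # Toy placeholder rows for illustration only, not real FanSpeak data.
    sample = io.StringIO(
        "comment,label\n"
        "placeholder comment 1,Team\n"
        "placeholder comment 2,Individual\n"
        "placeholder comment 3,Team\n"
    )
    rows = load_rows(sample)
    dist = category_distribution(rows)
    print(dist)
    print(sum(dist.values()))
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` so Bangla text and embedded line breaks are read correctly. If the file really stores one labeled comment per row, the total printed count should come to 3,970. A mismatch would suggest a different layout, such as multi-label rows.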
Institutions
Ahsanullah University of Science and Technology, Bangladesh University of Business and Technology, University of New South Wales Canberra at ADFA
Categories
Natural Language Processing