Datasets Comparison

Version 1

Towards Comprehensive Cyberbullying Detection: A Dataset Incorporating Aggressive Texts, Repetition, Peerness, and Intent to Harm

Published: 10 November 2023 | Version 1 | DOI: 10.17632/wmx9jj2htd.1

Contributors: Naveed Ejaz, Salimur Choudhury, Fakhra Razi

Description

The increasing usage of social media networks has raised concerns about the growing frequency of cyberbullying incidents. Cyberbullying is characterized by aggressive, repetitive, and intentional communication among peers. However, most existing datasets for cyberbullying detection only focus on aggressive texts classified as aggressive or non-aggressive, disregarding the other three aspects of cyberbullying. This paper proposes a new dataset incorporating all four aspects of cyberbullying to address this gap. The text messages are sourced from a real dataset*, while the users' data is generated synthetically. The resulting dataset contains messages exchanged randomly among different pairs of users, thus inculcating repetition. Additionally, the degree of peerness, defined and calculated to measure the likelihood of two users being peers, is used. As a result, this dataset encompasses all four aspects of cyberbullying by providing repeated aggressive messages among users along with quantitative values of the degree of peerness and intent to harm.

* Text messages sourced from: Elsafoury, "Cyberbullying datasets," Mendeley.com, 2020. [Online]. Available: https://data.mendeley.com/datasets/jf4pzyvnpj/1.

Licence

Creative Commons Attribution 4.0 International

Version 2

A Comprehensive Dataset for Automated Cyberbullying Detection

Published: 22 January 2024 | Version 2 | DOI: 10.17632/wmx9jj2htd.2

Contributors: Naveed Ejaz, Salimur Choudhury, Fakhra Razi

Description

Cyberbullying is characterized by aggressive, repetitive, and intentional communication among peers. However, most existing datasets for cyberbullying detection only focus on aggressive texts classified as aggressive or non-aggressive, disregarding the other three aspects of cyberbullying. This comprehensive dataset incorporates all four aspects of cyberbullying. It is an updated version of the dataset presented in our paper [1] and has been developed using the same methodology. In this updated version, we present complete and enhanced data along with the code to generate the data. The aggressive and non-aggressive messages compiled from different sources [2, 3] have also been shared. If you use this dataset, please cite our paper [1].

[1] Ejaz, Naveed, Fakhra Razi, and Salimur Choudhury. "Towards comprehensive cyberbullying detection: A dataset incorporating aggressive texts, repetition, peerness, and intent to harm." Computers in Human Behavior (2023): 108123.

Text messages sourced from:

[2] Elsafoury, "Cyberbullying datasets," Mendeley.com, 2020. [Online]. Available: https://data.mendeley.com/datasets/jf4pzyvnpj/1.

[3] R. Kumar, A. N. Reganti, A. Bhatia, and T. Maheshwari, "Aggression-annotated Corpus of Hindi-English Code-mixed Data," in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 7-12, 2018.

Licence

Creative Commons Attribution 4.0 International
