This dataset combines several datasets from different sources for the automatic detection of cyber-bullying. The data come from sources such as Kaggle, Twitter, Wikipedia Talk pages, and YouTube. Each record contains a text and a label indicating whether it is bullying or not. The data cover several types of cyber-bullying, including hate speech, aggression, insults, and toxicity.
Steps to reproduce
The data are ready to use with any classifier of your choice. The code used with this dataset is available in this repository: https://github.com/ewulczyn/wiki-detox
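
As a rough illustration of preparing the data for a classifier, the sketch below loads labeled text from CSV and makes a stratified train/test split using only the Python standard library. It is not taken from the repository above. The column names `text` and `label`, the 0/1 label encoding, the in-memory sample rows, and the helper names `load_rows` and `stratified_split` are all assumptions made for illustration; the actual files may use a different layout.

```python
import csv
import io
import random
from collections import Counter, defaultdict

# Hypothetical sample standing in for one of the dataset's CSV files.
# The column names "text" and "label" are assumptions, not the real schema.
SAMPLE_CSV = """text,label
you are an idiot,1
have a nice day,0
nobody likes you,1
thanks for the help,0
shut up loser,1
see you tomorrow,0
you are worthless,1
great video,0
"""


def load_rows(handle, text_col="text", label_col="label"):
    """Read (text, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row[text_col], int(row[label_col])) for row in reader]


def stratified_split(rows, test_fraction=0.25, seed=0):
    """Split rows into train/test sets, sampling each label separately."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one example per label for testing.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


rows = load_rows(io.StringIO(SAMPLE_CSV))
train_rows, test_rows = stratified_split(rows)
label_counts = Counter(label for _, label in rows)
```

To use a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and adjust the column names to match. The resulting `train_rows` and `test_rows` lists can then be passed to whichever classifier you prefer.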