Arabic Hate Speech Dataset 2023

Published: 21 February 2024 | Version 3 | DOI: 10.17632/mcnzzpgrdj.3
Contributors: Qasem Abu Al-Haija

Description

Description of the Jordanian Hate Speech Corpus (JHSC): The folder consists of two CSV files:

1. annotated-hatetweets-4-classes_train.csv, which contains 302,766 labeled tweets
2. annotated-hatetweets-4-classes_test.csv, which contains 100,923 labeled tweets

Each file contains three features:

1. Tweet id: Unique ID given to each tweet (removed before training).
2. Text: The tweet text in Arabic, cleaned and pre-processed.
3. Label: One of 4 labels:
   a. Negative: No hate speech is included in the tweet.
   b. Neutral: A general tweet (advertisement, prayer, or no sentiment).
   c. Positive: Hate speech is present (bullying, sarcasm, racism, etc.).
   d. Very positive: Severe hate speech is present, including phrases that can incite fights or have a very harmful influence on people and society.
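
The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' pipeline: the column headers "Tweet id", "Text", and "Label" are assumptions taken from the feature names in the description (the actual CSV headers may differ), and the sample rows are placeholders rather than real data.

```python
import csv
import io
from collections import Counter

# Assumed column headers, based on the feature names in the description.
# The real CSV files may use different header strings.
TEXT_COL = "Text"
LABEL_COL = "Label"


def load_tweets(handle):
    """Read (text, label) pairs from an open CSV handle.

    The tweet ID column is simply not carried over, mirroring the note
    that IDs are removed before training. Labels are lower-cased and
    stripped so that "Very positive" and "very positive " match.
    """
    rows = []
    for row in csv.DictReader(handle):
        rows.append((row[TEXT_COL], row[LABEL_COL].strip().lower()))
    return rows


def label_counts(rows):
    """Count how many tweets carry each label."""
    return Counter(label for _, label in rows)


# Tiny in-memory stand-in for one of the CSV files (placeholder text only).
sample = io.StringIO(
    "Tweet id,Text,Label\n"
    "1,example a,Negative\n"
    "2,example b,Neutral\n"
    "3,example c,Positive\n"
    "4,example d,Very positive\n"
    "5,example e,Negative\n"
)

rows = load_tweets(sample)
counts = label_counts(rows)
```

To use this on the published files, replace the in-memory sample with something like `open("annotated-hatetweets-4-classes_train.csv", encoding="utf-8", newline="")`, adjusting the column names if the headers differ. Checking `counts` first is a quick way to see how balanced the four classes are before training.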

Files

Steps to reproduce

Please cite the following paper if you use our dataset: Ahmad A, Azzeh M, Alnagi E, Abu Al-Haija Q, Halabi D, Aref A and AbuHour Y (2024) Hate speech detection in the Arabic language: corpus design, construction, and evaluation. Front. Artif. Intell. 7:1345445. doi: 10.3389/frai.2024.1345445

Institutions

Princess Sumaya University for Technology

Categories

Artificial Intelligence, Natural Language Processing, Machine Learning, Information Classification, Detection System, Deep Learning

Funding

Amman, Jordan

Ministry of Higher Education and Scientific Research

ICT-Ict/1/2021

Licence