Arabic Hate Speech Dataset 2023
Description
Description of the Jordanian Hate Speech Corpus (JHSC).

The folder consists of two CSV files:
1. annotated-hatetweets-4-classes_train.csv, which contains 302,766 labeled tweets.
2. annotated-hatetweets-4-classes_test.csv, which contains 100,923 labeled tweets.

Each file contains three features:
1. Tweet id: a unique ID given to each tweet (removed before training).
2. Text: the tweet text in Arabic, cleaned and pre-processed.
3. Label: one of four labels:
   a. Negative: the tweet contains no hate speech.
   b. Neutral: a general tweet (advertisement, prayer, or no sentiment).
   c. Positive: the tweet contains hate speech, such as bullying, sarcasm, or racism.
   d. Very positive: the tweet contains severe hate speech, including phrases that can provoke fights or have a very harmful influence on people and society.
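The file layout described above can be read with a few lines of pandas. The following is a minimal sketch, not the authors' pipeline. The header names ("Tweet id", "Text", "Label"), the spelling of the label values inside the CSV files, and the integer codes in LABEL_TO_ID are assumptions made for illustration; load_jhsc is a hypothetical helper, and its arguments should be adjusted to the actual file headers.

```python
import io

import pandas as pd

# Label names come from the dataset description; the integer codes are
# an arbitrary, hypothetical encoding chosen for this sketch.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2, "very positive": 3}


def load_jhsc(source, id_col="Tweet id", text_col="Text", label_col="Label"):
    """Read one JHSC split, drop the tweet ID, and add an integer label column.

    The default column names are assumptions; pass the real header names
    if they differ.
    """
    df = pd.read_csv(source, encoding="utf-8")
    # The description states that the tweet ID is removed before training.
    df = df.drop(columns=[id_col], errors="ignore")
    # Discard rows with missing text or label.
    df = df.dropna(subset=[text_col, label_col])
    # Normalise label spelling before mapping; unknown labels become NaN.
    normalised = df[label_col].astype(str).str.strip().str.lower()
    df["label_id"] = normalised.map(LABEL_TO_ID)
    return df.reset_index(drop=True)


# Tiny in-memory stand-in for a CSV file, so the sketch runs without the data.
sample = io.StringIO(
    "Tweet id,Text,Label\n"
    "1,نص تجريبي,Negative\n"
    "2,نص آخر,Very positive\n"
)
demo = load_jhsc(sample)
```

With the real files, the same helper would be called as load_jhsc("annotated-hatetweets-4-classes_train.csv") and load_jhsc("annotated-hatetweets-4-classes_test.csv"). If the Label column already stores numeric codes rather than names, the mapping step should be replaced accordingly.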
Files
Steps to reproduce
Please cite the following paper if you use our dataset: Ahmad A, Azzeh M, Alnagi E, Abu Al-Haija Q, Halabi D, Aref A and AbuHour Y (2024) Hate speech detection in the Arabic language: corpus design, construction, and evaluation. Front. Artif. Intell. 7:1345445. doi: 10.3389/frai.2024.1345445
Institutions
Categories
Funding
Amman, Jordan
Ministry of Higher Education and Scientific Research
ICT-Ict/1/2021