Arabic Hate Speech Dataset 2023

Name: Arabic Hate Speech Dataset 2023
Creator: Ashraf Ahmad
Published: 2023-09-11T17:09:02.217Z
Keywords: Artificial Intelligence, Natural Language Processing, Machine Learning, Information Classification, Detection System, Deep Learning

Ahmad, Ashraf; Azzeh, Mohammad; Elnagi, Eman; Abu Al-Haija, Qasem; Halabi, Dana; Aref, Abdullah; Abu Hour, Yousef

doi:10.17632/mcnzzpgrdj.2

Published: 11 September 2023 | Version 2 | DOI: 10.17632/mcnzzpgrdj.2

Contributors: Ashraf Ahmad, Mohammad Azzeh, Eman Elnagi, Qasem Abu Al-Haija, Dana Halabi, Abdullah Aref, Yousef Abu Hour

Description

Description of the Jordanian Hate Speech Corpus (JHSC):

The folder consists of two CSV files:
1. annotated-hatetweets-4-classes_train.csv, which contains 302,766 labeled tweets
2. annotated-hatetweets-4-classes_test.csv, which contains 100,923 labeled tweets

Each file contains three features:
1. Tweet id: a unique ID given to each tweet (removed before training).
2. Text: the tweet text in Arabic, cleaned and pre-processed.
3. Label: one of four labels:
   a. Negative: the tweet contains no hate speech.
   b. Neutral: a general tweet (ad, prayer, no sentiment).
   c. Positive: hate speech is present (bullying, sarcasm, racism, etc.).
   d. Very positive: severe hate speech is present, including phrases that can provoke fights or have a very bad influence on people and society.
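The file layout described above could be read along the following lines. This is a minimal sketch, not the authors' loading code: the column headers ("Tweet id", "Text", "Label"), the exact label strings stored in the files, and the helper name `load_tweets` are all assumptions made for illustration, and the in-memory sample is placeholder data rather than real tweets.

```python
import csv
import io
from collections import Counter

# Label meanings paraphrased from the dataset description.
# ASSUMPTION: the strings actually stored in the CSV files may differ.
LABEL_MEANING = {
    "negative": "no hate speech",
    "neutral": "general tweet, no sentiment",
    "positive": "hate speech present",
    "very positive": "severe hate speech present",
}


def load_tweets(file_obj, id_col="Tweet id", text_col="Text", label_col="Label"):
    """Read a CSV file object, drop the tweet ID, and return (texts, labels).

    The default column names are hypothetical; pass the real headers
    once they are known.
    """
    reader = csv.DictReader(file_obj)
    texts, labels = [], []
    for row in reader:
        row.pop(id_col, None)  # the ID is removed before training
        texts.append(row[text_col])
        labels.append(row[label_col].strip().lower())
    return texts, labels


# Tiny in-memory stand-in for one of the CSV files (placeholder rows).
sample = io.StringIO(
    "Tweet id,Text,Label\n"
    "1,placeholder text one,Negative\n"
    "2,placeholder text two,Very positive\n"
    "3,placeholder text three,Negative\n"
)
texts, labels = load_tweets(sample)
label_counts = Counter(labels)
```

With the real files, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption and should be checked against the downloaded data.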

Files

Steps to reproduce

Please cite the following paper if you use our dataset: Ahmad, A.; Azzeh, M.; Alnagi, E.; Abu Al-Haija, Q.; Halabi, D.; Aref, A.; AbuHour, Y. Hate Speech Detection in the Arabic Language: Corpus Design, Construction and Evaluation. Preprints 2023, 2023090497. https://doi.org/10.20944/preprints202309.0497.v1

Institutions

Princess Sumaya University for Technology

Funders

Amman, Jordan
Ministry of Higher Education and Scientific Research
Libya
Grant ID: ICT-Ict/1/2021
