Arabic Sentiment Datasets

Published: 20 September 2024 | Version 1 | DOI: 10.17632/6w9g62xc67.1
Contributor:
Tamara Alqablan

Description

This dataset is designed for sentiment analysis (SA) in Arabic and is intended as a resource for developing and evaluating SA models. The dataset contains [briefly describe the content, e.g., number of entries, types of sentiments (positive, negative, neutral), sources of the data like social media, reviews, etc.]. It has been curated with the linguistic characteristics of Arabic text in mind, so that it can be used to train, validate, and benchmark machine learning and natural language processing models. Although sentiment analysis datasets exist for many languages, this one focuses on Arabic and supports research on sentiment in Arabic-speaking communities.

To assess feature selection approaches for sentiment analysis, the dataset can be used alongside well-known datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu/), which hosts a range of datasets commonly used to evaluate feature selection techniques.

The dataset is in line with previous work such as the publicly available Arabic sentiment lexicon constructed by Al-Moslmi et al. [1]. It also draws inspiration from established Arabic corpora such as the Opinion Corpus for Arabic (OCA) by Rushdi-Saleh et al. [2] and Ar-Twitter, a corpus for sentiment analysis of Arabic tweets described by Abdulla et al. [3].

References:

[1] Al-Moslmi, T., Albared, M., Al-Shabi, A., Omar, N., Abdullah, S.: Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis. Journal of Information Science, 44(3), 345–362 (2018).
[2] Rushdi-Saleh, M., Martín-Valdivia, M.T., Ureña-López, L.A., Perea-Ortega, J.M.: OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology, 62(10), 2045–2054 (2011).
[3] Abdulla, N., Mahyoub, N., Shehab, M., Al-Ayyoub, M.: Arabic sentiment analysis: Corpus-based and lexicon-based. In: Proceedings of the IEEE Conference on Applied Electrical Engineering and Computing Technologies (AEECT) (2013).

Categories

Feature Selection, Sentiment Analysis

Licence