Political Arabic Article Dataset

Published: 8 Jan 2020 | Version 1 | DOI: 10.17632/spvbf5bgjs.1
Contributor(s):

Description of this data

PAAD (Political Arabic Article Dataset) is a collection of political Arabic text covering the modern Arabic used in newspapers, blogs, and social networks. PAAD can be used in different Arabic NLP tasks such as Text Classification, Target, Article Orientation and Word Embedding. The text contains alphabetic and numeric characters, English words, and symbols. The documents in the dataset are categorized into 3 classes: Reform "اصلاحي", Conservative "محافظ" and Revolutionary "ثوري". The number of documents in each class is:
Reform = 80
Conservative = 58
Revolutionary = 68
PAAD contains 206 articles in total. The articles were collected manually, and Python scripts were used specifically to produce the Excel files. There are two Excel files: the first holds the original raw data, and the second holds the same data after the following Arabic normalization:
1- إأٱآا = ا
2- ي = ى
3- ؤ ئ = ء
4- ة = ه
5- Remove diacritics such as (ُ,ْ,َ,ِ,ّ,~,ً,ٍ,ٌ)
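
The five rules above can be sketched in Python as follows. This is only an illustrative sketch, not the script used to build the dataset: the function name normalize_arabic is hypothetical, each rule is read as "characters on the left are replaced by the character on the right", and the diacritics are taken to be the Unicode range U+064B to U+0652 plus the "~" character listed in rule 5.

```python
# Illustrative sketch of the normalization rules; not the author's script.
# Assumption: each rule reads "left-hand characters -> right-hand character".
_CHAR_MAP = {
    "\u0625": "\u0627",  # alef with hamza below -> bare alef (rule 1)
    "\u0623": "\u0627",  # alef with hamza above -> bare alef (rule 1)
    "\u0671": "\u0627",  # alef wasla            -> bare alef (rule 1)
    "\u0622": "\u0627",  # alef with madda       -> bare alef (rule 1)
    "\u064A": "\u0649",  # yeh -> alef maksura (rule 2, direction as listed)
    "\u0624": "\u0621",  # waw with hamza -> hamza (rule 3)
    "\u0626": "\u0621",  # yeh with hamza -> hamza (rule 3)
    "\u0629": "\u0647",  # teh marbuta -> heh (rule 4)
}

# Rule 5: fathatan through sukun (U+064B..U+0652), plus the listed "~".
_DIACRITICS = [chr(code) for code in range(0x064B, 0x0653)] + ["~"]

# Mapping a character to None deletes it during translation.
_TABLE = str.maketrans({**_CHAR_MAP, **{mark: None for mark in _DIACRITICS}})


def normalize_arabic(text: str) -> str:
    """Apply the character mappings and strip the listed diacritics."""
    return text.translate(_TABLE)
```

Applying the function to each raw article should give text comparable to the normalized Excel file, provided the assumptions above match what the original scripts did.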
How to use it:
___________

  1. Unzip compressed resources.
  2. There are three main folders, each labelled with a code for its category: Reform = S, Conservative = M and Revolutionary = T.
  3. Each folder contains a set of article files corresponding to its category.
  4. There are 2 Excel files: the first holds the raw data in a single file with the label for each article, and the second holds the same data with Arabic normalization applied.
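
As a rough sketch of steps 2 and 3, the folder layout can be read into (text, label) pairs like this. The function name load_paad, the root path argument, and the UTF-8 encoding are assumptions made for illustration; the folder codes S, M and T come from the description above.

```python
from pathlib import Path

# Folder codes as given in step 2.
FOLDER_TO_LABEL = {"S": "Reform", "M": "Conservative", "T": "Revolutionary"}


def load_paad(root):
    """Return a list of (article_text, label) pairs from the unzipped folders.

    Assumptions: `root` is the directory containing the S, M and T folders,
    and every article is a plain-text file encoded as UTF-8.
    """
    records = []
    for folder, label in FOLDER_TO_LABEL.items():
        for path in sorted(Path(root, folder).glob("*")):
            if path.is_file():  # skip any nested directories
                records.append((path.read_text(encoding="utf-8"), label))
    return records
```

If the archive matches the description, calling load_paad on the unzipped directory should return 206 pairs, 80 of them labelled Reform, 58 Conservative and 68 Revolutionary.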

    Cite this dataset

    Abd, dhafar (2020), “Political Arabic Article Dataset”, Mendeley Data, v1 http://dx.doi.org/10.17632/spvbf5bgjs.1

Categories

Natural Language Processing, Machine Learning, Arabic Language, Categorization, Text Processing, Targeting, Sentiment Analysis

Licence

CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?
You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.
