Multi-Label Arabic Dataset

Name: Multi-Label Arabic Dataset
Creator: Nawal Aljedani
Published: 2020-09-02T17:17:19.666Z
Keywords: Natural Language Processing, Machine Learning, Classification System, Arabic Language, Categorization, Text Processing

Aljedani, Nawal; Alotaibi, Reem; Taileb, Mounira

doi:10.17632/rxhpvwwmbz.1


Published: 2 September 2020 | Version 1 | DOI: 10.17632/rxhpvwwmbz.1

Contributors: Nawal Aljedani, Reem Alotaibi, Mounira Taileb

Description

The dataset is a collection of hierarchical multi-label Arabic texts from the Islamic domain. It consists of 26,470 instances distributed over 578 labels arranged in a hierarchy. Features were ranked with the BR-Chi-Square feature selection method, and the top-ranked 1000, 2000, 3000, 4000, 5000, 6000, 7000, and 8000 features were selected to form eight feature sets for evaluation. The processed dataset is available for each of these feature sets in the ARFF file format, which is suitable for the MULAN multi-label classification tool, together with an XML file that defines the hierarchical structure of the labels.
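
As a rough illustration of how the label hierarchy file might be read outside MULAN, the sketch below parses a small inline XML sample with Python's standard library and builds a child-to-parent map. It assumes the dataset's XML file follows MULAN's usual label-file layout (nested `label` elements with a `name` attribute under the `http://mulan.sourceforge.net/labels` namespace); the sample label names and the helper functions `parse_label_hierarchy` and `ancestors` are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed MULAN label-file layout.
# The label names are placeholders, not labels from the actual dataset.
SAMPLE_XML = """
<labels xmlns="http://mulan.sourceforge.net/labels">
  <label name="root_a">
    <label name="child_a1"/>
    <label name="child_a2">
      <label name="leaf_a2x"/>
    </label>
  </label>
  <label name="root_b"/>
</labels>
"""

# Namespace prefix in ElementTree's "{uri}tag" notation (assumed, see lead-in).
MULAN_NS = "{http://mulan.sourceforge.net/labels}"


def parse_label_hierarchy(xml_text):
    """Map each label name to its parent's name (None for top-level labels)."""
    root = ET.fromstring(xml_text.strip())
    parents = {}

    def walk(element, parent_name):
        # Visit only direct <label> children, then recurse into each of them.
        for child in element.findall(MULAN_NS + "label"):
            name = child.get("name")
            parents[name] = parent_name
            walk(child, name)

    walk(root, None)
    return parents


def ancestors(label, parents):
    """Return the ancestors of a label, nearest parent first."""
    chain = []
    current = parents[label]
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


parents = parse_label_hierarchy(SAMPLE_XML)
print(ancestors("leaf_a2x", parents))
```

To apply the same idea to the real file, the inline sample would be replaced by the contents of the dataset's XML file; the resulting parent map can then be used, for example, to expand an instance's labels with all of their ancestors.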
