Multi-Label Arabic Dataset

Published: 2 September 2020 | Version 1 | DOI: 10.17632/rxhpvwwmbz.1
Nawal Aljedani


The dataset is a collection of hierarchical multi-label Arabic texts from the Islamic domain. It consists of 26,470 instances distributed over 578 labels arranged in a hierarchy. Features were ranked using the BR-Chi-Square feature selection method, and the top-ranked 1000, 2000, 3000, 4000, 5000, 6000, 7000, and 8000 features were selected for evaluation purposes. The processed dataset is available for each of these feature sets in the ARFF file format, suitable for the MULAN multi-label classification tool, along with an XML file that defines the hierarchical structure of the labels.
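
For readers unfamiliar with this kind of feature ranking, the following is a minimal sketch of one common reading of binary-relevance chi-square selection: each label is treated as a separate binary problem, a chi-square score is computed per feature and label, and the per-label scores are combined into one ranking. The description above does not specify the details, so the binary (present/absent) features, the use of the maximum score across labels, and the function names `chi_square_2x2` and `br_chi_square_rank` are all assumptions made for illustration, not the procedure used to build the dataset.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: feature present, label present
    b: feature present, label absent
    c: feature absent,  label present
    d: feature absent,  label absent
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant feature or constant label carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def br_chi_square_rank(X, Y, k):
    """Return the indices of the k highest-scoring features.

    X: list of rows of 0/1 feature indicators (assumed binary).
    Y: list of rows of 0/1 label indicators, one column per label.
    Each feature is scored by its maximum chi-square over all labels
    (an assumed aggregation; averaging is another common choice).
    """
    n_features = len(X[0])
    n_labels = len(Y[0])
    scores = []
    for f in range(n_features):
        best = 0.0
        for l in range(n_labels):
            a = b = c = d = 0
            for x_row, y_row in zip(X, Y):
                if x_row[f] and y_row[l]:
                    a += 1
                elif x_row[f]:
                    b += 1
                elif y_row[l]:
                    c += 1
                else:
                    d += 1
            best = max(best, chi_square_2x2(a, b, c, d))
        scores.append(best)
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return ranked[:k]


# Toy data: features 0 and 1 track the labels, feature 2 is uninformative.
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
top = br_chi_square_rank(X, Y, 2)
```

On this toy input the two informative features are kept and the uninformative one is dropped. Under the same assumptions, the released feature sets would correspond to calling the ranking with k set to 1000 through 8000.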



Natural Language Processing, Machine Learning, Classification System, Arabic Language, Categorization, Text Processing