Optimizing Semantic Deep Forest for Tweet Topic Classification

Published: 23-02-2021 | Version 1 | DOI: 10.17632/c2b8mj2hsf.1
Kheir Eddine Daouadi,
Rim Zghal Rebaï,
Ikram Amous


The dataset includes:

Dataset_1: A set of English tweets labeled as political or apolitical (our manually labeled tweets + weakly labeled tweets).
Dataset_2: A set of English tweets labeled as discrimination or non-discrimination, published by [1] (manually labeled tweets + weakly labeled tweets).
Dataset_3: A set of Portuguese tweets labeled as political or apolitical, published by [2] (manually labeled tweets + weakly labeled tweets).

Please read the ---Read me---.txt files, which contain a description of each dataset. The files in the dataset include the tweet content and the corresponding class topic.

References:
[1] S. Yuan, X. Wu, Y. Xiang, Incorporating pre-training in long short-term memory networks for tweet classification, Social Network Analysis and Mining 8 (1) (2018) 52. doi:10.1007/s13278-018-0530-1.
[2] B. de Sousa Pereira Amorim, A. L. F. Alves, M. G. de Oliveira, C. de Souza Baptista, Using supervised classification to detect political tweets with political content, in: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, ACM, 2018, pp. 245–252. doi:10.1145/3243082.3243113.