Reddit Ideological and Extreme Bias Dataset - Part 3

Published: 28 February 2024 | Version 1 | DOI: 10.17632/f7knr8r94w.1
Kamalakkannan Ravi


Data 2: Dataset of articles posted in the Liberal, Conservative, and Restricted (private or banned) subreddits. In total, we collected a corpus of 1.3 million articles. We collected these news articles to study radicalized communities through the news they share. Part 3 contains Data 2 (Raw and Unlabeled Data: the remaining 36 of the 76 .json files).


Steps to reproduce

Data 1:
1. All articles (Raw) - 226,010
2. Sampled class-balanced articles - 45,108
3. Annotated articles - 4,000
Each folder has two files: Liberal.json and Conservative.json.

Data 2:
1. Raw and Labeled Data - 377,144 (folder has 3 .json files for the Liberal, Conservative, and Restricted classes)
2. Raw and Unlabeled Data - 922,522 (folder has 76 .json files)

Cite: Ravi, K., Vela, A. E., & Ewetz, R. (2022, December). Classifying the Ideological Orientation of User-Submitted Texts in Social Media. In 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 413-418). IEEE.
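After downloading, one way to check a folder against the article counts listed above is to count the records in each .json file. The following Python sketch is illustrative only and is not part of the dataset. This page does not describe the internal layout of the files, so the sketch assumes each file holds either a single JSON document (a top-level list of records) or JSON Lines (one object per line). The function names load_records and count_records are hypothetical helpers introduced here.

```python
import json
from pathlib import Path


def load_records(path):
    """Load the records stored in one .json file.

    Assumption: the file is either one JSON document or JSON Lines.
    The dataset page does not say which, so both are tried.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: fall back to one object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A top-level list is treated as the list of records;
    # any other top-level value is treated as a single record.
    return data if isinstance(data, list) else [data]


def count_records(folder):
    """Return {file name: record count} for every .json file in a folder."""
    return {
        p.name: len(load_records(p))
        for p in sorted(Path(folder).glob("*.json"))
    }
```

Calling count_records on an extracted folder and summing the values gives a total to compare with the figures above. If the files use a different layout (for example, a top-level object keyed by article ID), load_records would need to be adapted. The larger files are read fully into memory, which may be slow for the raw data.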


University of Central Florida


Computational Linguistics, Social Media, Social Collaborative Computing, Computer Modeling in Social Science, Ideology