Automated Personality Prediction

Published: 12 February 2024 | Version 1 | DOI: 10.17632/3sndbd4p84.1
Contributor: Fatima Habib

Description

This dataset contains preprocessed texts from Reddit together with the corresponding Big Five personality scores for 1,608 users of the platform, covering more than 27,000 comments.

Steps to reproduce

The data is extracted from the personality-focused PANDORA dataset. The texts are split into three files, used respectively to train a large language model, validate it, and evaluate the results.
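
As an illustration of this kind of three-way partition, the following is a minimal sketch and not the contributor's actual procedure. The description does not state the split ratios or whether the split was made per user or per comment, so the 80/10/10 proportions, the per-user grouping, the function name `split_by_user`, and the record fields `user`, `text` and `openness` are all assumptions made for the example.

```python
import random


def split_by_user(records, train_frac=0.8, val_frac=0.1, seed=0):
    """Partition records into train/validation/test by user.

    Every user is assigned to exactly one partition, so no author's
    comments appear in more than one of the three outputs.
    The fractions are illustrative assumptions, not the dataset's real ratios.
    """
    users = sorted({r["user"] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(users)

    n_train = int(len(users) * train_frac)
    n_val = int(len(users) * val_frac)
    groups = {
        "train": set(users[:n_train]),
        "validation": set(users[n_train:n_train + n_val]),
        "test": set(users[n_train + n_val:]),
    }
    return {
        name: [r for r in records if r["user"] in members]
        for name, members in groups.items()
    }


# Toy records with hypothetical field names: 10 users, 3 comments each.
records = [
    {"user": f"u{i % 10}", "text": f"comment {i}", "openness": 50.0}
    for i in range(30)
]
splits = split_by_user(records)
```

Grouping by user rather than by comment is one possible design choice: it keeps a model from being evaluated on authors it has already seen during training.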

Institutions

  • National University of Computer and Emerging Sciences

Categories

Social Media, Personality, Computational Aspects, Language
