Brazilian Tweets Dataset about Volleyball Labeled by Sentiment

Published: 12-09-2020 | Version 1 | DOI: 10.17632/k2mby4pg5p.1
Matheus Cardoso,
Anita M. R. Fernandes


This dataset contains Brazilian Portuguese phrases about volleyball and their respective sentiment polarity. All sentences were manually labeled by the author as negative, neutral, or positive. The dataset contains 12,506 texts: 1,032 negative, 10,176 neutral, and 1,298 positive.

The polarities are represented as:
* Negative: -1
* Neutral: 0
* Positive: 1

The texts were preprocessed with the following steps:
* Removing stopwords
* Removing numbers
* Removing symbols (#, @, %, &, !, etc.)
* Removing line breaks
* Removing URLs, links, and mentions
* Removing punctuation
* Converting to lowercase

The dataset was built from texts extracted from Twitter, to be used for training and testing sentiment analysis classification models for the Portuguese language as used in Brazil. The data is provided in CSV (comma-separated values) format.
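The preprocessing steps described above can be approximated in Python, which may help when preparing new tweets so that they match the form of the texts in this dataset. This is a minimal sketch, not the authors' actual pipeline: the function name `preprocess`, the regular expressions, and the tiny `STOPWORDS` set are all assumptions made for illustration, since the description does not state which stopword list or tooling was used. The order of the steps is also an assumption; lowercasing is applied first here so that stopword matching is case-insensitive.

```python
import re

# Hypothetical, deliberately small stopword set for illustration only;
# the stopword list actually used for the dataset is not stated.
STOPWORDS = {"a", "o", "e", "de", "do", "da", "que", "em",
             "um", "uma", "para", "com", "no", "na"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")  # URLs and links
MENTION_RE = re.compile(r"@\w+")                # @user mentions
DIGIT_RE = re.compile(r"\d+")                   # numbers
SYMBOL_RE = re.compile(r"[^\w\s]")              # punctuation and symbols


def preprocess(text: str) -> str:
    """Apply an approximation of the listed cleaning steps to one text."""
    text = text.lower()                # convert to lowercase
    text = URL_RE.sub(" ", text)       # remove URLs and links
    text = MENTION_RE.sub(" ", text)   # remove mentions
    text = DIGIT_RE.sub(" ", text)     # remove numbers
    text = SYMBOL_RE.sub(" ", text)    # remove punctuation and symbols
    text = text.replace("_", " ")      # "_" counts as a word character in \w
    # split() also discards line breaks and repeated whitespace
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    sample = "O jogo de vôlei foi ÓTIMO!!! @cbv https://t.co/abc #volei 2020\nvamos"
    print(preprocess(sample))  # prints: jogo vôlei foi ótimo volei vamos
```

Accented characters such as "ô" and "ó" are preserved because `\w` is Unicode-aware for Python `str` patterns. A fuller stopword list for Portuguese would remove more function words than this illustrative set does.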