Indonesian Tweets Dataset for Identifying Emotion Changes Among Twitter Users Following the Onset of the COVID-19

Published: 9 November 2022 | Version 1 | DOI: 10.17632/x8t4gn6mt6.1
Apriantoni Apriantoni


Tweet data were collected using the Twitter API, querying a point location with a radius of 10 km to capture high tweet intensity in a strategic location. The collection point was the Setiabudi area, Jakarta, Indonesia, with the latitude rounded to -6.22 and the longitude to 106.83; this point was chosen via probability sampling to enhance the analysis process, as this area was the most affected by COVID-19 cases in Indonesia during the pandemic. Data collection was divided into two periods: before the COVID-19 outbreak (December 2019 to March 2020) and the beginning of the COVID-19 outbreak (March 2020 to June 2020). This study takes March 14, 2020 as the first day of the COVID-19 pandemic in Indonesia, according to the rules released by the Indonesian government.

Because no context-specific filtering was applied during collection, this mechanism resulted in large and varied data. Three data reduction steps were then performed to obtain appropriate data dimensions: a) selecting active users based on tweet intensity, b) removing tweets with fewer than five words, and c) eliminating data based on the suitability of discussion topics. The data used in the modeling process have passed all three stages of data reduction.

In line with our work, there are three labeling processes: discussion topic, emotion, and sentiment. Discussion topics were labeled through topic modeling with the Latent Dirichlet Allocation (LDA) algorithm. For emotion and sentiment, three annotators manually labeled sample data, and the final class label was decided by majority vote. For emotion labeling, each annotator was asked to annotate each tweet as "Happiness", "Love", "Fear", "Sadness", or "Anger". For sentiment labeling, each tweet was annotated with one of three predetermined categories: "Positive", "Negative", or "Neutral".
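The reduction steps (a) and (b) and the majority-vote labeling described above can be illustrated with a minimal Python sketch. This is not the authors' code. The record layout (dicts with "user" and "text" keys), the function names, and the activity threshold MIN_TWEETS_PER_USER are assumptions made for illustration; the description gives neither the threshold for an "active" user nor the handling of a three-way disagreement among annotators, so the sketch returns None in that case. Step (c), filtering by topic suitability, is omitted because its criteria are not described.

```python
from collections import Counter

MIN_WORDS = 5            # from the description: tweets with fewer than five words are removed
MIN_TWEETS_PER_USER = 3  # hypothetical threshold; the description does not state one


def reduce_tweets(tweets, min_tweets_per_user=MIN_TWEETS_PER_USER, min_words=MIN_WORDS):
    """Apply reduction steps (a) and (b) to a list of {"user": ..., "text": ...} dicts."""
    # (a) keep only tweets from users whose tweet count reaches the activity threshold
    counts = Counter(t["user"] for t in tweets)
    active = [t for t in tweets if counts[t["user"]] >= min_tweets_per_user]
    # (b) drop tweets with fewer than `min_words` whitespace-separated words
    return [t for t in active if len(t["text"].split()) >= min_words]


def majority_vote(labels):
    """Return the label given by a strict majority of annotators, or None if there is none."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


if __name__ == "__main__":
    # Made-up records, not taken from the dataset.
    sample = [
        {"user": "u1", "text": "one two three four five"},
        {"user": "u1", "text": "too short"},
        {"user": "u1", "text": "this tweet has more than five words"},
        {"user": "u2", "text": "this user only posted a single tweet"},
    ]
    print(reduce_tweets(sample))
    print(majority_vote(["Fear", "Fear", "Sadness"]))
```

Counting words by whitespace splitting is a simplification; a tokenizer that handles URLs, mentions, and hashtags would likely give different counts on real tweets.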



Institut Teknologi Sepuluh Nopember


Social Sciences, Computer Science, Natural Language Processing, Social Network Analysis, Collective Behavior, Longitudinal Analysis, Emotion, Behavior Change, Text Mining, Twitter, Sentiment Analysis, COVID-19