Covid Twitter Emotion Analysis Data

Published: 16-07-2020 | Version 1 | DOI: 10.17632/47hy8yyky5.1
Nikhil Matta


Twitter data was collected using Twitter’s Application Programming Interface (API) and Tweepy, a Python library for accessing the Twitter API. Tweets were collected using keywords related to COVID-19, such as Coronavirus, ncov, Wuhan, China, Covid-19, Epidemic, Pandemic, and SocialDistancing. Only geo-tagged tweets in English were collected. During exploratory data analysis, we noticed that many tweets consisted of only a few isolated words rather than complete sentences, and analyzing the emotion of such tweets might not give an accurate picture of the emotions expressed. Thus, only tweets with at least 6 words were kept, which significantly reduced the number of tweets. The final collection contained over 1 million tweets spanning February, March, April, May, and June. The tweets were then further processed to remove HTML text, ‘@’ mentions, URL links, and #hashtags. The data was analyzed using a machine learning model, and the tweets were categorized into various emotions. The dataset provides the count of tweets per country per emotion for these 5 months.
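
The filtering, cleaning, and counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: the regular expressions, the function names, the example emotion labels, and the choice to count words on the raw tweet text are all assumptions, since the description does not specify them.

```python
import html
import re
from collections import Counter

# Hypothetical patterns: the dataset description does not give the exact rules.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")

MIN_WORDS = 6  # threshold stated in the description


def has_enough_words(text, min_words=MIN_WORDS):
    """Return True if the text has at least `min_words` whitespace-separated tokens."""
    return len(text.split()) >= min_words


def clean_tweet(text):
    """Strip HTML remnants, URLs, @mentions, and #hashtags, then collapse whitespace."""
    text = html.unescape(text)  # decode entities such as &amp;
    for pattern in (HTML_TAG, URL, MENTION, HASHTAG):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def count_by_country_and_emotion(records):
    """Count tweets per (country, emotion) pair.

    `records` is an iterable of (country, emotion) tuples; the emotion labels
    would come from a classifier, which is not shown here.
    """
    return Counter(records)
```

For example, `clean_tweet("@user Stay home &amp; stay safe! #SocialDistancing https://t.co/abc")` yields `"Stay home & stay safe!"`, and `count_by_country_and_emotion` applied to a list of labelled tweets gives the kind of per-country, per-emotion tallies that the dataset reports.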