Annotated Indonesian Tweets Dataset containing Abusive Words for Sentiment Analysis
999 Indonesian tweets containing abusive words, manually labeled with positive or negative sentiment. The dataset also includes a list of 20 abusive words frequently used in Indonesian tweets.
Steps to reproduce
We compiled a list of swear words frequently used on Twitter through an online survey of Indonesian Twitter users. The list contains 20 swear words, some of which share the same meaning but are spelled differently. We then crawled Twitter with TAGS 6.1, using the list as the search keywords. We excluded retweets, removed unnecessary fields, and omitted mentioned usernames, then manually labeled each tweet with a positive or negative sentiment label. We aimed to keep the amounts of positive and negative data balanced; the final set has 404 positive and 595 negative tweets.
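The retweet exclusion and username removal described above can be sketched in Python as follows. This is only an illustration, not the procedure actually used to build the dataset: the function names are hypothetical, and it assumes that retweets appear as text beginning with "RT @" and that a mentioned username is an "@" followed by letters, digits, or underscores.

```python
import re

# Assumption: a mention is "@" followed by word characters (letters, digits, underscore).
MENTION_RE = re.compile(r"@\w+")


def is_retweet(text: str) -> bool:
    """Return True if the tweet text looks like a retweet (assumed to start with 'RT @')."""
    return text.lstrip().startswith("RT @")


def strip_mentions(text: str) -> str:
    """Remove mentioned usernames and collapse the leftover whitespace."""
    without_mentions = MENTION_RE.sub("", text)
    return " ".join(without_mentions.split())


def clean_tweets(texts):
    """Drop retweets, then strip mentions from the remaining tweets."""
    return [strip_mentions(t) for t in texts if not is_retweet(t)]


if __name__ == "__main__":
    # Made-up sample texts, not taken from the dataset.
    sample = ["RT @a: halo", "@budi kamu di mana", "halo  @x dunia"]
    print(clean_tweets(sample))
```

The manual sentiment labeling is a human step and is not covered by this sketch.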