Corpus of Sarcasm in Twitter Conversations

Published: 21 February 2018 | Version 1 | DOI: 10.17632/fn2mmff85g.1
Gavin Abercrombie


A corpus of two-part author-audience Twitter conversations with manually annotated sarcasm polarity labels. The corpus is provided as a CSV file with the columns author, audience, label, where 'author' is the ID number of the target tweet, 'audience' is the ID number of the other tweet in the conversation, and 'label' is the hand-annotated sarcasm class label: positive (1) or negative (0).
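As an illustration of the file layout described above, the following is a minimal Python sketch for reading the three columns. It is not part of the dataset. The names Conversation and read_corpus are hypothetical, the sample IDs are made up, and it is an assumption that the file may start with a header row (any row whose first field is not numeric is skipped). IDs are kept as strings so that no tool downstream rounds them.

```python
import csv
import io
from typing import Iterable, NamedTuple


class Conversation(NamedTuple):
    author_id: str    # ID number of the target tweet
    audience_id: str  # ID number of the other tweet in the conversation
    sarcastic: bool   # True for label 1, False for label 0


def read_corpus(lines: Iterable[str]) -> list[Conversation]:
    """Parse rows in the format author, audience, label."""
    rows = []
    for record in csv.reader(lines):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        author, audience, label = (field.strip() for field in record)
        if not author.isdigit():
            continue  # assumed header row, not a tweet ID
        rows.append(Conversation(author, audience, label == "1"))
    return rows


# Made-up sample rows, only to show the expected shape of the file.
sample = io.StringIO("author,audience,label\n111,222,1\n333,444,0\n")
corpus = read_corpus(sample)
```

For the real file, an open file handle can be passed in place of the in-memory sample, for example open(path, newline="") with whatever path the CSV was saved to.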


Steps to reproduce

In accordance with Twitter's terms of service, only the ID number of each tweet in the corpus is available here. The text and associated metadata can be retrieved from the Twitter API using these ID numbers.
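When recovering tweets by ID, it can help to deduplicate the IDs and group them into batches before querying. The sketch below covers only that preparation step and makes no network calls. The function name batched_ids is hypothetical, and the default batch size of 100 is an assumption about the per-request limit of the lookup endpoint, which should be checked against the current API documentation. Some tweets may no longer be retrievable, so the recovered set may be smaller than the corpus.

```python
from typing import Iterable, Iterator


def batched_ids(tweet_ids: Iterable[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield lists of unique tweet IDs, at most batch_size per list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(tweet_ids))
    for start in range(0, len(unique), batch_size):
        yield unique[start:start + batch_size]


# Made-up IDs, only to show how the batches come out.
example_batches = list(batched_ids([str(n) for n in range(250)]))
```

Each batch can then be passed to whichever API client is in use, with the returned tweets matched back to the author and audience columns by ID.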


Social Sciences, Computer Science, Computational Linguistics, Data Science, Natural Language Processing, Statistical Natural Language Processing