Amazon Rainforest Wildfires Rumor Detection

Published: 6 December 2022 | Version 1 | DOI: 10.17632/m7k4gsffry.1
Matthias Bogaert


The data set contains information about the Amazon rainforest wildfires that took place in 2019. Twitter data was collected between August 21, 2019 and September 27, 2019 based on the following hashtags: #PrayforAmazonas, #AmazonRainforest, and #AmazonFire. The data set is intended for detecting whether a tweet is a rumor or not, as given by the 'label' column: a tweet identified as a rumor is labeled 1, and 0 otherwise.

The tweets were labeled by two independent annotators using the following guidelines. Whether a tweet is a rumor depends on three aspects: (1) A rumor is a piece of information that is unverified or not confirmed by official sources; it does not matter whether the information later turns out to be true or false. (2) More specifically, a tweet is a rumor if the information is unverified at the time of posting. (3) For a tweet to be a rumor, it should contain an assertion, meaning the author of the tweet commits to the truth of the message. In sum, the annotators marked a tweet as a rumor if it consisted of an assertion giving information that was unverified at the time of posting. In practice, to check whether the information in a tweet had been verified or confirmed by official sources at the moment of tweeting, the annotators used BBC News and Reuters. After all the tweets were labeled, the annotators revisited the tweets they disagreed on to produce the final tweet label.

Besides the label indicating whether a tweet is a rumor or not (i.e., 'label'), the data set contains the tweet itself (i.e., 'full_text') and additional metadata (e.g., 'created_at', 'favorite_count'). In total, the data set contains 1,392 observations, of which 184 (13%) are identified as rumors. Researchers can use this data set to build rumor detection models (i.e., statistical, machine learning, and deep learning models) using both unstructured (i.e., textual) and structured data.
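Because only 184 of the 1,392 tweets are rumors, a model trained on this data may benefit from class weighting. The following is a minimal sketch, not part of the published data set, that counts the labels and derives inverse-frequency class weights. It assumes the data is available as a CSV file with a 'label' column; the file format, the `load_rows` path argument, and all function names are hypothetical choices made for illustration.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read the data set into a list of dicts.

    Assumption: the data is stored as a UTF-8 CSV file with a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def label_counts(rows, label_column="label"):
    """Count tweets per class; 1 means rumor, 0 means non-rumor."""
    return Counter(int(row[label_column]) for row in rows)


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Class counts as reported in the description: 184 rumors out of 1,392 tweets.
reported = Counter({0: 1392 - 184, 1: 184})
rumor_share = reported[1] / sum(reported.values())
weights = balanced_class_weights(reported)
```

With the real file, `label_counts(load_rows(path))` would replace the hard-coded `reported` counter. The resulting weights up-weight the rumor class and can be passed to any learner that accepts per-class weights.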

Institutions: Universiteit Gent

Categories: Social Media

Funding: Bijzonder Onderzoeksfonds UGent, Fonds Wetenschappelijk Onderzoek