SpanishTweetsCOVID-19: A Social Media Enriched Covid-19 Twitter Spanish Dataset

Published: 04-11-2020 | Version 2 | DOI: 10.17632/nv8k69y59d.2
Contributors: Antonela Tommasel, Juan M. Rodriguez, Daniela Godoy

Description

This dataset presents a large-scale collection of millions of Twitter posts in Spanish related to the coronavirus pandemic. The collection was built by monitoring public posts written in Spanish that contain a diverse set of hashtags related to COVID-19, as well as tweets shared by official Argentinian government offices, such as ministries and secretariats at different levels. Data was collected between March and August 2020 using the Twitter API and will be updated periodically. In addition to tweet IDs, the dataset includes information about mentions, retweets, media, URLs, hashtags, replies, users, and content-based user relations, which allows the dynamics of the shared information to be observed. Data is presented in separate tables that can be analysed individually or combined. The dataset aims to serve as a source for studying the effects of the coronavirus on people through social media, including the impact of public policies; the perception of risk and related disease consequences; the adoption of guidelines; the emergence, dynamics, and propagation of disinformation and rumours; the formation of communities and other social phenomena; and the evolution of health-related indicators (such as fear, stress, sleep disorders, or changes in children's behaviour). The dataset can therefore be useful to multidisciplinary researchers in fields such as data science, social network analysis, social computing, medical informatics, and the social sciences.

Steps to reproduce

The raw data for the Twitter posts was retrieved from the Twitter API using our own tool, Faking it!, which internally uses Twitter4J to integrate with the Twitter API. Faking it! can also be used to rehydrate the data collection. In all cases, long integer values (longs) are encoded as radix-32 strings. The code for processing and analysing the raw data and the shared tables is also available in the Faking it! repository at https://github.com/knife982000/FakingIt.
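
Since identifiers are stored as radix-32 strings, they have to be decoded before they can be matched against numeric tweet or user IDs. The following is a minimal sketch in Python, assuming the encoding follows the convention of Java's Long.toString(value, 32), that is, the digits 0-9 followed by the lowercase letters a-v. This convention is an assumption and should be checked against the Faking it! repository. The function names decode_radix32 and encode_radix32 are illustrative and are not part of the dataset or the tool.

```python
# Digit alphabet assumed to match Java's Long.toString(value, 32):
# 0-9 followed by a-v. This is an assumption, not confirmed by the dataset.
DIGITS = "0123456789abcdefghijklmnopqrstuv"


def decode_radix32(text: str) -> int:
    """Convert a radix-32 string (for example, an encoded tweet ID) to an integer."""
    # Python's built-in int() accepts bases 2 to 36 and a leading minus sign.
    return int(text, 32)


def encode_radix32(value: int) -> str:
    """Convert an integer to a radix-32 string (inverse of decode_radix32)."""
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    value = abs(value)
    out = []
    while value:
        value, remainder = divmod(value, 32)
        out.append(DIGITS[remainder])
    return sign + "".join(reversed(out))


if __name__ == "__main__":
    # Hypothetical numeric ID, used only to illustrate the round trip.
    example_id = 1245000000000000000
    encoded = encode_radix32(example_id)
    print(encoded, decode_radix32(encoded) == example_id)
```

With these helpers, an ID column read from one of the tables can be mapped through decode_radix32 to obtain numeric IDs suitable for rehydration, provided the assumed digit alphabet matches the one used by the tool.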