Tracking the Global Pulse: The first public Twitter dataset from FIFA World Cup
Description
The first public large-scale multilingual Twitter dataset related to the FIFA World Cup 2022, comprising over 28 million posts in 69 languages, including Arabic, English, Spanish, French, and many others. This dataset aims to facilitate future research in sentiment analysis, cross-linguistic studies, event-based analytics, meme and hate speech detection, fake news detection, and social manipulation detection.

The file 🚨Qatar22WC.csv🚨 contains tweet-level and user-level metadata for the collected tweets.

🚀Codebook for FIFA World Cup 2022 Twitter Dataset🚀

| Column Name | Description |
|---|---|
| `day`, `month`, `year` | Date on which the tweet was posted |
| `hou`, `min`, `sec` | Hour, minute, and second of the tweet timestamp |
| `age_of_the_user_account` | Age of the user account in days |
| `tweet_count` | Total number of tweets posted by the user |
| `location` | User-defined location field |
| `follower_count` | Number of followers the user has |
| `following_count` | Number of accounts the user is following |
| `follower_to_Following` | Follower-to-following ratio |
| `favouite_count` | Number of tweets the user has liked |
| `verified` | Boolean indicating whether the user is verified (1 = Verified, 0 = Not Verified) |
| `Avg_tweet_count` | Average number of tweets the user posts per day |
| `list_count` | Number of lists the user is a member of |
| `Tweet_Id` | Tweet ID |
| `is_reply_tweet` | ID of the tweet being replied to (if applicable) |
| `is_quote` | Boolean indicating whether the tweet is a quote |
| `retid` | Retweet ID if the tweet is a retweet; NaN otherwise |
| `lang` | Language of the tweet |
| `hashtags` | The keyword or hashtag used to collect the tweet |
| `is_image` | Boolean indicating whether the tweet has an associated image |
| `is_video` | Boolean indicating whether the tweet has an associated video |

Examples of use case queries are provided in the file 🚨fifa_wc_qatar22_examples_of_use_case_queries.ipynb🚨, accessible via: https://github.com/khairied/Qata_FIFA_World_Cup_22

🚀 Please cite this as: Daouadi, K. E., Boualleg, Y., Guehairia, O. & Taleb-Ahmed, A. (2025). Tracking the Global Pulse: The first public Twitter dataset from FIFA World Cup. Journal of Computational Social Science.
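As a rough illustration of how the codebook columns might be used, the following sketch streams the CSV and counts tweets per language, tweets per calendar day, and retweets. It is not taken from the project's notebook. The function names (`row_timestamp`, `summarize`, `summarize_file`) are hypothetical, and the sketch assumes that the file has a header row matching the column names in the codebook, that the date and time fields are numeric, and that a missing `retid` is stored as an empty string or the text `nan`.

```python
import csv
from collections import Counter
from datetime import datetime


def row_timestamp(row):
    """Build a datetime from the split date/time columns of one row.

    Assumption: values are numeric strings; int(float(...)) also accepts
    values that were serialized as floats, such as "20.0".
    """
    parts = [int(float(row[name]))
             for name in ("year", "month", "day", "hou", "min", "sec")]
    return datetime(*parts)


def summarize(rows):
    """Count tweets per language and per day, plus the number of retweets."""
    per_lang, per_day, retweets = Counter(), Counter(), 0
    for row in rows:
        per_lang[row["lang"]] += 1
        per_day[row_timestamp(row).date().isoformat()] += 1
        # Assumption: a non-retweet has an empty or "nan" retid.
        if row["retid"].strip().lower() not in ("", "nan"):
            retweets += 1
    return per_lang, per_day, retweets


def summarize_file(path="Qatar22WC.csv"):
    """Stream the CSV row by row so the whole file need not fit in memory."""
    with open(path, newline="", encoding="utf-8") as f:
        return summarize(csv.DictReader(f))
```

With the dataset downloaded, `per_lang, per_day, retweets = summarize_file()` followed by `per_lang.most_common(10)` would list the ten most frequent languages. Streaming with `csv.DictReader` is chosen here only because the file holds over 28 million rows; a dataframe library would work equally well if memory allows.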
Files
Steps to reproduce
All steps are detailed in the GitHub project: https://github.com/khairied/Qata_FIFA_World_Cup_22