Tracking the Global Pulse: The first public Twitter dataset from FIFA World Cup

Published: 27 May 2025 | Version 2 | DOI: 10.17632/gw3mcnbkwr.2
Contributors: K. E. Daouadi, Y. Boualleg, O. Guehairia, A. Taleb-Ahmed

Description

The first public, large-scale, multilingual Twitter dataset related to the FIFA World Cup 2022, comprising over 28 million posts in 69 unique languages, including Arabic, English, Spanish, French, and many others. The dataset aims to facilitate research on sentiment analysis, cross-linguistic studies, event-based analytics, meme and hate speech detection, fake news detection, and social manipulation detection.

The file `Qatar22WC.csv` contains tweet-level and user-level metadata for the collected tweets.

Codebook for FIFA World Cup 2022 Twitter Dataset

| Column Name | Description |
|---|---|
| `day`, `month`, `year` | Date on which the tweet was posted |
| `hou`, `min`, `sec` | Hour, minute, and second of the tweet timestamp |
| `age_of_the_user_account` | Age of the user account, in days |
| `tweet_count` | Total number of tweets posted by the user |
| `location` | User-defined location field |
| `follower_count` | Number of followers the user has |
| `following_count` | Number of accounts the user follows |
| `follower_to_Following` | Ratio of follower count to following count |
| `favouite_count` | Number of tweets the user has liked |
| `verified` | Boolean indicating whether the user is verified (1 = verified, 0 = not verified) |
| `Avg_tweet_count` | Average number of tweets per day posted by the user |
| `list_count` | Number of lists the user is a member of |
| `Tweet_Id` | Tweet ID |
| `is_reply_tweet` | ID of the tweet being replied to (if applicable) |
| `is_quote` | Boolean indicating whether the tweet is a quote tweet |
| `retid` | Retweet ID if the tweet is a retweet; NaN otherwise |
| `lang` | Language of the tweet |
| `hashtags` | Keyword or hashtag used to collect the tweet |
| `is_image` | Boolean indicating whether the tweet contains an image |
| `is_video` | Boolean indicating whether the tweet contains a video |

Examples of use-case queries are described in the file `fifa_wc_qatar22_examples_of_use_case_queries.ipynb`, accessible at: https://github.com/khairied/Qata_FIFA_World_Cup_22

Please cite this as: Daouadi, K. E., Boualleg, Y., Guehairia, O. & Taleb-Ahmed, A. (2025). Tracking the Global Pulse: The first public Twitter dataset from FIFA World Cup. Journal of Computational Social Science.
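A minimal sketch of reading `Qatar22WC.csv` with Python's standard library, assuming the column names in the codebook, a comma-separated file with a header row, missing values stored as empty strings or `NaN`, and flags stored as `1`/`0` (or `True`/`False`). None of these encodings is confirmed here, so inspect the real file first. The helper names (`has_value`, `to_int`, `tweet_timestamp`, `tweet_type`, `load_rows`) and the sample rows are hypothetical. The sketch rebuilds a `datetime` from the split date and time columns and labels each tweet as a retweet, reply, quote, or original post using `retid`, `is_reply_tweet`, and `is_quote`.

```python
import csv
import io
from datetime import datetime

# Hypothetical rows using a subset of the codebook columns; all values are made up.
SAMPLE = """day,month,year,hou,min,sec,Tweet_Id,is_reply_tweet,is_quote,retid,lang
20,11,2022,16,5,9,1001,,0,,en
20,11,2022,16,6,30,1002,1001,0,,ar
21,11,2022,9,0,0,1003,,0,999,es
21,11,2022,9,1,2,1004,,1,,fr
"""

# Assumed encodings for "no value" / "false"; adjust after inspecting the real file.
EMPTY_VALUES = {"", "nan", "none", "0", "0.0", "false"}


def has_value(field):
    """Return True if a field holds a real value (not empty, NaN, 0, or False)."""
    return field is not None and field.strip().lower() not in EMPTY_VALUES


def to_int(field):
    """Parse integers that may have been written as floats, e.g. '16.0'."""
    return int(float(field))


def tweet_timestamp(row):
    """Rebuild the tweet datetime from the day/month/year and hou/min/sec columns."""
    return datetime(
        to_int(row["year"]), to_int(row["month"]), to_int(row["day"]),
        to_int(row["hou"]), to_int(row["min"]), to_int(row["sec"]),
    )


def tweet_type(row):
    """Label a tweet as 'retweet', 'reply', 'quote', or 'original'."""
    if has_value(row["retid"]):
        return "retweet"
    if has_value(row["is_reply_tweet"]):
        return "reply"
    if has_value(row["is_quote"]):
        return "quote"
    return "original"


def load_rows(f):
    """Stream (Tweet_Id, timestamp, type, lang) tuples from an open CSV file."""
    for row in csv.DictReader(f):
        yield row["Tweet_Id"], tweet_timestamp(row), tweet_type(row), row["lang"]


if __name__ == "__main__":
    for record in load_rows(io.StringIO(SAMPLE)):
        print(record)
```

To process the real file, pass `open("Qatar22WC.csv", newline="", encoding="utf-8")` to `load_rows` in place of the in-memory sample (UTF-8 is an assumption). Streaming rows through `csv.DictReader` avoids holding all 28 million rows in memory at once.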

Steps to reproduce

All steps are detailed in the GitHub project: https://github.com/khairied/Qata_FIFA_World_Cup_22
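The authoritative steps are in the repository. As a separate, hypothetical illustration of simple aggregate queries over the same CSV (not reproduced from the authors' notebook), the sketch below counts tweets per `lang` value and per calendar day, computes the share of tweets posted by verified accounts, and recomputes a follower-to-following ratio. It assumes the codebook column names, numeric `1`/`0` values in `verified`, and that `follower_to_Following` means follower count divided by following count; the function names and sample rows are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical rows using a subset of the codebook columns; all values are made up.
SAMPLE = """day,month,year,lang,verified,follower_count,following_count
20,11,2022,en,1,5000,100
20,11,2022,ar,0,10,0
21,11,2022,en,0,250,500
18,12,2022,es,0,40,20
"""


def language_counts(rows):
    """Number of tweets per value of the `lang` column."""
    return Counter(row["lang"] for row in rows)


def tweets_per_day(rows):
    """Number of tweets per (year, month, day) date."""
    return Counter(
        (int(float(r["year"])), int(float(r["month"])), int(float(r["day"])))
        for r in rows
    )


def verified_share(rows):
    """Fraction of tweets posted by verified accounts (assumes numeric 1/0 flags)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(float(r["verified"]) == 1 for r in rows) / len(rows)


def follower_ratio(row):
    """Follower count divided by following count; None when following is zero."""
    following = float(row["following_count"])
    if following == 0:
        return None
    return float(row["follower_count"]) / following


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(language_counts(rows).most_common())
    print(tweets_per_day(rows))
    print(verified_share(rows))
    print([follower_ratio(r) for r in rows])
```

Returning `None` for a zero following count avoids a division error; how the published `follower_to_Following` column handles that case is not stated here, so compare the recomputed values against it before relying on them.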

Institutions

Université de Lille, Université Mohamed Khider de Biskra, Université de Tébessa Faculté des Sciences Exactes et des Sciences de la Nature et de la Vie

Categories

English, French Language, Sport, Multilingualism, Arabic Language, Japanese Language, Spanish Language, Portuguese Language, FIFA World Cup, Twitter

Licence