Storm-related social media data

Published: 21-03-2020 | Version 1 | DOI: 10.17632/5c3cpnvgx3.1
Rob Grace


This dataset makes available the qualitative content analysis of crisis social media datasets collected over a six-hour period during a severe storm and tornado that struck Central Pennsylvania on May 1st, 2017. Three datasets were collected from Twitter using location, keyword, and network filtering techniques. Only 2% of the 22,706 total tweets overlap among the three datasets, providing researchers with a broader scope of information than is normally available when tweets are collected during a crisis using location (i.e., geotag-based) and keyword filtering, alone or in combination. Each data collection technique is described in detail, including network filtering, which collects data from networks of social media users associated with a geographic area. The three datasets are manually labelled for information content and toponym usage. The 22,706 tweets, dehydrated to tweet IDs for privacy, are labelled for relevance (on-topic and off-topic) and for 19 types of crisis-related information organized into six categories: infrastructure damage, service disruption, personal experience, weather updates, weather forecasts, and weather warnings. Tweets in each dataset are also labelled for toponym usage (with or without toponyms), location (local, non-local, and generic toponyms), and granularity (hyperlocal, municipal, and regional toponyms). The comprehensively labelled datasets provide researchers with detailed resources for analysing crisis-related information and volunteered location information posted during a hyperlocal crisis event.
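The overlap figure quoted in the description can be checked once the tweet IDs are loaded. The following is a minimal sketch in Python, not the author's procedure: it assumes that "overlap" means a tweet ID appearing in more than one of the three datasets, measured against the number of unique IDs. The function name `overlap_share` and the toy IDs are hypothetical and are not taken from the published files.

```python
from collections import Counter
from typing import Iterable


def overlap_share(*datasets: Iterable[str]) -> float:
    """Return the fraction of unique tweet IDs found in more than one dataset.

    Assumption: "overlap" means an ID occurs in at least two datasets.
    """
    counts = Counter()
    for ids in datasets:
        # Deduplicate within a dataset so each ID counts once per dataset.
        counts.update(set(ids))
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n > 1)
    return shared / len(counts)


# Toy IDs for illustration only (hypothetical, not from the published data).
location_ids = ["101", "102", "103"]
keyword_ids = ["103", "104", "105"]
network_ids = ["106", "107", "108", "109", "101"]

share = overlap_share(location_ids, keyword_ids, network_ids)
print(f"{share:.1%} of unique IDs appear in more than one dataset")
```

With the real data, each list would be replaced by the tweet IDs read from the corresponding location, keyword, or network file. IDs are kept as strings here to avoid precision loss if the files are opened in tools that treat large integers as floating-point numbers.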