2019 NYC Blackout tweets sentiment data

Published: 14-07-2020 | Version 2 | DOI: 10.17632/y3jkjj9yzz.2
Lingyao Li


The dataset contains real-time sentiment records derived from tweets about the 2019 Manhattan blackout, which occurred on July 13, 2019, at approximately 6:47 pm (EST). The Twitter Standard Search API was used with the key search terms “power outage,” “blackout,” and “power cut” to retrieve related tweets. The original data were stored in JavaScript Object Notation (.json) files, which were converted to Excel (.xlsx) files for subsequent processing. Location-related terms including “New York,” “NY,” “NYC,” “Manhattan,” and “Time Square” were used for filtering and cleaning. The final dataset is saved in .xlsx format and includes 189,709 records on the ETC timeline from 6:47 pm on July 13 to 8:00 pm on July 15. Under the Twitter Developer Policy (https://developer.twitter.com/en/developer-terms/agreement-and-policy#id34), user information (screen name, verified status, description, user-input location, favorites, followers, etc.) and message text may not be distributed. During validation, some tweets were found to be incorrectly classified by the Watson NLP API. To improve the accuracy of sentiment extraction, tweets in our n=410 validation set that the API had misclassified were corrected according to our manual labels. We attach the sentiment results for the sole purpose of validating our research results. The full library of word patterns used in this research is also attached; this file lists the word patterns that we input into Python to classify the tweets into categories of identified behavioral responses.
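The location filtering and pattern-based classification described above could be sketched in Python roughly as follows. This is only an illustration, not the authors' code: the location terms are those quoted in the description, but the case-insensitive word-boundary matching, the function names, and the two category patterns are assumptions made for the example. The actual word patterns are in the attached library file.

```python
import re

# Location terms quoted in the dataset description.
LOCATION_TERMS = ["New York", "NY", "NYC", "Manhattan", "Time Square"]

# Assumption: case-insensitive matching on word boundaries, so that a short
# term such as "NY" does not match inside a longer word such as "many".
LOCATION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in LOCATION_TERMS) + r")\b",
    re.IGNORECASE,
)

# Hypothetical category patterns, for illustration only; the real lists are
# in the word-pattern library attached to the dataset.
CATEGORY_PATTERNS = {
    "transport": [r"\bsubway\b", r"\btrain\b"],
    "elevator": [r"\bstuck in (?:an|the) elevator\b"],
}


def mentions_location(text):
    """Return True if the text contains any of the location terms."""
    return LOCATION_RE.search(text) is not None


def classify(text):
    """Return the sorted names of all categories whose patterns match the text."""
    return sorted(
        name
        for name, patterns in CATEGORY_PATTERNS.items()
        if any(re.search(pattern, text, re.IGNORECASE) for pattern in patterns)
    )
```

A tweet would be kept only if `mentions_location` returns True, and `classify` may assign it to zero, one, or several categories.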