Panama City, Panama road traffic incidents 2014-2022 - Social Media Dataset (in Spanish)
The raw data set consists of 200,410 tweets in Spanish from the road traffic social reporting account @traficocpanama (1_raw_data_200410.csv). The tweets were collected between January 2014 and May 2022 using the Python programming language with the Selenium and Tweepy modules.

The raw data set was first filtered to keep only tweets with at least 3 words, and stop words (see stop-words.csv) were then removed. This brought the number of tweets to 192,707 (2_preliminar_data_192707.csv).

The second cut-off was made with a machine learning classification model that selected tweets related to one of the following:
1) Accidents (in Spanish: Choques, accidentes, colisiones, vuelcos, atropellos; in English: Crashes, accidents, collisions, overturns, run-overs).
2) Obstacles (in Spanish: Tranques, huelgas, motines, paros, protestas, trabajos en vía, cierres; in English: Traffic jams, strikes, riots, stoppages, protests, road works, closures).
3) Dangers (in Spanish: Incendios, inundaciones, lluvias fuertes; in English: Fires, floods, heavy rains).
This brought the number of tweets to 120,000 (3_sample_class_120000.csv).

Finally, a machine learning incident categorization model was trained on 51,000 tweets to distinguish between three categories: accident, obstacle, and danger (4_sample_categ_51000.csv).

This data set is intended for academic use and research in natural language processing (NLP) in Spanish, especially for road traffic incident detection.
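The first preprocessing step (keep tweets with at least 3 words, then remove stop words) might look roughly like the following minimal sketch. The description does not specify how words were counted or tokenized, so the regex tokenizer, the small STOP_WORDS set, and the function names tokenize and preprocess are all hypothetical illustrations; the real stop-word list is the one shipped in stop-words.csv.

```python
import re

# Hypothetical stop-word subset for illustration only; the dataset's actual
# list is provided in stop-words.csv.
STOP_WORDS = {"de", "la", "el", "en", "y", "a", "los", "las", "por", "con", "del"}


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then split on runs of non-word characters.
    # Python 3's \w is Unicode-aware, so "í", "ó" and "ñ" stay inside tokens.
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def preprocess(tweets: list[str], stop_words: set[str], min_words: int = 3) -> list[str]:
    kept = []
    for tweet in tweets:
        tokens = tokenize(tweet)
        # Step 1: discard tweets with fewer than min_words words.
        if len(tokens) < min_words:
            continue
        # Step 2: remove stop words from the tweets that survived step 1.
        kept.append(" ".join(tok for tok in tokens if tok not in stop_words))
    return kept


if __name__ == "__main__":
    sample = [
        "Choque en la vía",
        "Tranque",
        "Inundación en Calle 50 por lluvias fuertes",
    ]
    print(preprocess(sample, STOP_WORDS))
    # → ['choque vía', 'inundación calle 50 lluvias fuertes']
```

Here the one-word tweet "Tranque" is dropped by the length filter before stop words are removed, following the order of operations stated above.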