Panama City, Panama road traffic incidents 2014-2022 - Social Media Dataset (in Spanish)

Published: 22 June 2022 | Version 2 | DOI: 10.17632/tmwrd45m7x.2


The raw data set is composed of 200,410 tweets in Spanish from the road traffic social reporting account @traficocpanama (1_raw_data_200410.csv). The tweets were collected between January 2014 and May 2022 using the Python programming language with the Selenium and Tweepy modules.

The raw data set was first processed by keeping only tweets with at least 3 words and then removing stop words (see stop-words.csv), which brought the number of tweets to 192,707 (2_preliminar_data_192707.csv). The second cut-off was made with a machine learning classification model that selected tweets related to:
1) Accidents (in Spanish: Choques, accidentes, colisiones, vuelcos, atropellos; in English: Crashes, accidents, collisions, overturns, run-overs).
2) Obstacles (in Spanish: Tranques, huelgas, motines, paros, protestas, trabajos en vía, cierres; in English: Traffic jams, strikes, riots, stoppages, protests, road works, closures).
3) Dangers (in Spanish: Incendios, inundaciones, lluvias fuertes; in English: Fires, floods, heavy rains).
This brought the number of tweets to 120,000 (3_sample_class_120000.csv). Finally, a machine learning incident categorization model was trained on 51,000 tweets across the categories accident, obstacle, and danger (4_sample_categ_51000.csv).

This data set is intended for academic use and research in natural language processing (NLP) in Spanish, especially road traffic incident detection.
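The first processing step described above (keeping tweets with at least 3 words, then removing stop words) might look roughly like the sketch below. It is only an illustration, not the authors' code: the function name `preprocess`, the whitespace tokenization, the case-insensitive stop-word matching, and the sample tweets are all assumptions, and the real stop-word list is the one in stop-words.csv.

```python
def preprocess(tweets, stop_words, min_words=3):
    """Keep tweets with at least `min_words` words, then drop stop words.

    Hypothetical helper: the tokenization and matching rules are assumptions.
    """
    stop = {w.lower() for w in stop_words}
    cleaned = []
    for text in tweets:
        words = text.split()  # assumption: split on whitespace
        if len(words) < min_words:
            continue  # tweet is too short, discard it
        # assumption: stop words are matched case-insensitively
        kept = [w for w in words if w.lower() not in stop]
        cleaned.append(" ".join(kept))
    return cleaned


# Made-up sample tweets and a tiny stand-in stop-word list
sample = [
    "Choque en la Via Brasil",
    "Tranque fuerte",
    "Lluvias fuertes en el Corredor Sur",
]
result = preprocess(sample, ["en", "la", "el"])
print(result)
```

Here the two-word tweet is discarded by the length filter, and the stop words are removed from the remaining tweets. The order of the two operations follows the description: the word count is checked before stop words are removed.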



Universidad Tecnologica de Panama


Traffic Accident, Road Traffic Safety, Twitter, Road Safety