Enriched Traffic Datasets for Madrid

Published: 27 January 2025 | Version 2 | DOI: 10.17632/697ht4f65b.2
Contributors:

Description

DESCRIPTION OF THE RESEARCH AND DATA: This work presents the Madrid Traffic Dataset (MTD), a comprehensive resource for analyzing and modeling traffic patterns in Madrid. The dataset integrates data from traffic sensors, weather observations, calendar information, road infrastructure, and geolocation data to support advanced studies of urban mobility and predictive modeling. In addition to the core data sources, the dataset includes temporal sequences and a traffic adjacency matrix, enabling time-series analysis and graph-based modeling.

-COMPLETE DATASET: The complete version of the MTD includes data from 554 traffic sensors distributed across the Madrid region, covering 30 months (June 2022 to November 2024).
-SUBSET DATASET: A more compact version derived from the complete dataset, restricted to 300 traffic sensors and 17 months of data (June 2022 to October 2023). This subset is intended for researchers who need a lighter dataset.

DATA ORGANIZATION: The dataset is organized in a main directory containing a subfolder identified by the hash of the configuration data. This subfolder holds all key components: datasets, temporal sequences, adjacency matrices, and configuration files. Keeping every resource under the configuration hash that produced it makes the data easy to locate and the processing reproducible. For more details, see [Submitted to IEEE Internet of Things Journal].

Files

Steps to reproduce

Data collection process and methods summary for these datasets:
-Traffic Intensity Data Collection: Traffic intensity data were obtained from the Madrid Open Data Portal. These data were first stored in CSV files, keeping only the columns for sensor ID, record date and time, and traffic intensity.
-Sensor Location Data: The geographical position of each sensor was collected from the same Madrid Open Data Portal. The coordinates were added in Well-Known Text (WKT) format to facilitate integration with other geospatial data.
-Labor Calendar Data: Madrid's labor calendar data, which distinguish workdays, holidays, Saturdays, and Sundays, were also obtained from the Open Data Portal and integrated. This step is crucial for analyzing how traffic patterns vary with the type of day.
-Meteorological Data: Climatic variables such as temperature, precipitation, and wind were incorporated and aligned with the traffic records by date, so that the influence of weather on traffic can be analyzed.
-Road Information Data: Road information from OpenStreetMap, processed through OSMnx, was used to enrich the traffic data with road infrastructure attributes. The data were transformed into a GeoDataFrame, and a KDTree was applied for nearest-point search on the road network, linking each traffic record to a specific location on that network.
-Data Optimization and Cleaning: The data were cleaned and organized to remove inconsistencies, outliers, duplicates, and low-activity sensors.
-Optimization for Temporal and Predictive Analysis: Columns were refined and transformed to prepare the data for machine learning. This included trigonometric encoding of time features, standardization of numerical attributes, one-hot encoding of categorical features, ordinal encoding of ordinal features, and handling of passthrough features.
-Temporal Sequences and Adjacency Matrix: The dataset includes temporal sequences that capture time-series patterns across predefined intervals, facilitating predictive modeling and temporal analysis. Additionally, a graph-based adjacency matrix was generated to represent the spatial relationships and connectivity between traffic sensors, enabling graph-based machine learning techniques.
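
As an illustration of the trigonometric encoding and temporal sequences mentioned above, the following is a minimal sketch in plain Python. The function names (`encode_cyclical`, `make_sequences`), the window lengths, and the toy values are assumptions made for this example; they are not taken from the dataset or its configuration files.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic feature (e.g. hour of day) to a (sin, cos) pair.

    Values that are close on the cycle (23:00 and 00:00) end up close
    in the encoded space, which a raw integer hour does not achieve.
    """
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def make_sequences(series, input_len, horizon):
    """Slice a series into sliding (input window, target window) pairs."""
    n_windows = len(series) - input_len - horizon + 1
    if n_windows <= 0:
        raise ValueError("series is too short for the requested window sizes")
    inputs = [series[i:i + input_len] for i in range(n_windows)]
    targets = [
        series[i + input_len:i + input_len + horizon] for i in range(n_windows)
    ]
    return inputs, targets


# Hour-of-day encoded on a 24-hour cycle.
hour_features = [encode_cyclical(hour, 24) for hour in range(24)]

# Synthetic intensity values for one sensor (NOT real dataset values).
toy_intensity = [float(v) for v in range(10)]

# Hypothetical window sizes: 4 past steps used to predict the next 2.
inputs, targets = make_sequences(toy_intensity, input_len=4, horizon=2)
```

With ten time steps, an input length of 4, and a horizon of 2, the sketch yields five overlapping windows. The interval lengths actually used in the published sequences are defined by the dataset's configuration and may differ.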
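
The description above does not state how the adjacency matrix was computed. One common choice in graph-based traffic modeling is a thresholded Gaussian kernel over pairwise sensor distances, sketched below under that assumption. The coordinates, `sigma_km`, and `threshold` are hypothetical illustration values, and the sketch uses brute-force pairwise distances rather than the KDTree search used for road matching.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = (math.radians(x) for x in a)
    lat2, lon2 = (math.radians(x) for x in b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def gaussian_adjacency(coords, sigma_km, threshold):
    """Build a symmetric weighted adjacency matrix without self-loops.

    The weight is exp(-(d / sigma)^2); weights below `threshold` are
    set to zero so that distant sensors are treated as unconnected.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(coords[i], coords[j])
            w = math.exp(-((d / sigma_km) ** 2))
            if w >= threshold:
                adj[i][j] = adj[j][i] = w
    return adj


# Hypothetical sensor positions near Madrid (NOT real sensor locations):
# two points roughly half a kilometre apart and one several kilometres away.
sensors = [(40.4168, -3.7038), (40.4200, -3.7000), (40.4900, -3.6000)]
adjacency = gaussian_adjacency(sensors, sigma_km=1.0, threshold=0.1)
```

In this toy case the two nearby sensors receive a nonzero edge weight, while the distant third sensor is disconnected from both. A matrix built from road-network connectivity rather than straight-line distance would be an equally plausible reading of the description.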

Institutions

Universidad de Zaragoza

Categories

Data Analysis, Calendering, Spain, Weather, Data Processing, Traffic Congestion, Road Network

Funding

Gobierno de Aragón

T64_23R

Agencia Estatal de Investigación

PID2020-113037RB-I00

Licence