Enriched Traffic Datasets for Madrid

Published: 11 May 2026 | Version 4 | DOI: 10.17632/697ht4f65b.4

Description

DESCRIPTION OF THE RESEARCH AND DATA

This work presents the Madrid Traffic Dataset (MTD), a comprehensive resource for the analysis and modeling of traffic patterns in Madrid. The dataset integrates traffic sensor measurements, weather observations, labor calendar information, road infrastructure attributes, and geolocation data to support urban mobility studies and predictive modeling. In addition to the core tabular data, this release includes temporal sequences and traffic adjacency matrices, enabling time-series analysis and graph-based machine learning approaches.

COMPLETE DATASET

The complete version of the MTD includes data from 554 traffic sensors distributed across the Madrid region, covering 30 months, from June 2022 to November 2024.

SUBSET DATASET

A compact version derived from the complete dataset is also provided. It focuses on 300 traffic sensors and covers 17 months, from June 2022 to October 2023. This subset is intended for researchers who need a lighter dataset for experimentation.

DATA ORGANIZATION

The dataset is organized into folders identified by configuration data hashes. Each folder contains processed datasets, temporal sequences, adjacency matrices, sensor coordinate files, and configuration files. This structure supports traceability, reproducibility, and comparison between the complete and subset versions.

For more details, see the associated article: Gómez, I. and Ilarri, S., "Advanced Prediction of Traffic at Different Temporal Scales Using Heterogeneous Data Sources," IEEE Open Journal of Intelligent Transportation Systems, 2025. DOI: 10.1109/OJITS.2025.3637305.
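To illustrate how sensor coordinates can relate to a traffic adjacency matrix, the sketch below builds a weighted adjacency matrix from latitude/longitude pairs using a thresholded Gaussian kernel on great-circle distances. This is one common construction in graph-based traffic modeling and is only an assumption here: the description does not state how the matrices shipped with the dataset were computed. The function names, the coordinates, and the `sigma_km` and `threshold` values are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances (km) between all sensors."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    # Clip guards against tiny floating-point excursions outside [0, 1].
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def gaussian_adjacency(dist_km, sigma_km=1.0, threshold=0.1):
    """Gaussian-kernel weights; weak links and self-loops are set to zero."""
    weights = np.exp(-(dist_km ** 2) / (sigma_km ** 2))
    weights[weights < threshold] = 0.0
    np.fill_diagonal(weights, 0.0)
    return weights


# Hypothetical coordinates: two nearby points in central Madrid, one far away.
lats = [40.4168, 40.4200, 40.6000]
lons = [-3.7038, -3.7000, -3.5000]
adjacency = gaussian_adjacency(haversine_matrix(lats, lons))
print(adjacency.round(3))
```

With these illustrative values, the two nearby sensors are connected and the distant one is isolated. A road-network-based distance would be a natural alternative to straight-line distance when connectivity matters more than proximity.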

Steps to reproduce

DATA COLLECTION PROCESS AND METHODS

Traffic intensity data were obtained from the Madrid Open Data Portal. The original CSV files were processed by selecting the sensor identifier, timestamp, and traffic intensity fields.

Sensor location data were obtained from the Madrid Open Data Portal. Sensor coordinates were integrated using geospatial representations to support spatial analysis and road-network matching.

Labor calendar data for Madrid were incorporated to classify records according to day type, including working days, holidays, Saturdays, and Sundays.

Meteorological observations, including temperature, precipitation, and wind variables, were integrated by aligning weather records with traffic measurements by date and time.

Road infrastructure information was obtained from OpenStreetMap and processed with OSMnx. The road network was transformed into a geospatial representation, and nearest-road matching was performed using spatial search methods such as KDTree. This linked each traffic sensor record with road attributes such as road class, number of lanes, one-way information, maximum speed, and segment length.

Data cleaning and optimization steps were applied to remove inconsistencies, outliers, duplicates, and sensors with insufficient activity.

For predictive modeling, temporal and categorical features were transformed using trigonometric time encoding, numerical standardization, one-hot encoding, ordinal encoding, and passthrough handling where appropriate.

Temporal sequences were generated to capture time-series patterns across predefined windows and prediction horizons. Traffic adjacency matrices were generated to represent spatial relationships and connectivity between sensors, enabling graph-based machine learning experiments.

The configuration files included in each dataset folder document the preprocessing parameters used to generate the corresponding complete or subset dataset.
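Two of the steps above, trigonometric time encoding and window/horizon sequence generation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used to build the dataset: the function names, the 24-hour period, and the window and horizon values are hypothetical, and the actual parameters are the ones recorded in each folder's configuration files.

```python
import numpy as np


def cyclic_encode(values, period):
    """Map a periodic quantity (e.g. hour of day) to a (sin, cos) pair."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)


def make_sequences(series, window, horizon):
    """Slice a 1-D series into input windows and single-step targets.

    Each input holds `window` consecutive values; its target is the value
    `horizon` steps after the last element of that window.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    inputs = np.stack([series[i:i + window] for i in range(n_samples)])
    targets = np.array([series[i + window + horizon - 1] for i in range(n_samples)])
    return inputs, targets


# Hypothetical example: hours of the day and a toy intensity series.
hour_sin, hour_cos = cyclic_encode(np.arange(24), period=24)
inputs, targets = make_sequences(np.arange(10), window=4, horizon=2)
print(inputs.shape, targets.shape)
```

The sine/cosine pair places 23:00 next to 00:00 in feature space, which a raw hour value from 0 to 23 does not. In the toy series, the first window holds the values 0 to 3 and its target is the value two steps past the window's end.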

Categories

Data Analysis, Calendering, Spain, Weather, Data Processing, Traffic Congestion, Road Network
