Monterrey-LCS-PM2.5: A Dataset for Hybrid Calibration and Spatial Transferability Analysis

Published: 24 April 2026 | Version 2 | DOI: 10.17632/pyg6sxxfnn.2
Contributors:
Edgar Tello-Leal,

Description

This dataset contains high-frequency atmospheric monitoring data collected over 18 months (January 2023 – July 2024) in the Monterrey Metropolitan Area (MMA), Mexico. The data were generated to evaluate the resilience, aging, and spatial transferability of low-cost sensor (LCS) networks under extreme semi-arid conditions. The dataset includes collocated measurements from three Plantower PMSA003 sensors and federal-grade reference instruments (Beta-ray Attenuation Monitors, BAM) across four strategic urban sites. It supports the research presented in the manuscript "A Scalable and Resilient Hybrid XGB-L Framework for Long-term Urban $PM_{2.5}$ Monitoring: Enhancing Data Integrity across Complex Physicochemical Gradients."

Steps to reproduce

Data Dictionary (Column Definitions)

- timestamp: Date and hour of the observation (UTC-6).
- station_id: Unique identifier for the monitoring site.
- PM25_ref: Reference $PM_{2.5}$ concentration ($\mu g/m^3$) from BAM instruments.
- LCS_raw_mean: Ensemble average of three collocated LCS units.
- LCS_dry: Physically-corrected LCS signal based on hygroscopic growth theory.
- RH: Relative Humidity (%).
- Temp: Ambient Temperature (°C).
- WS, WD: Wind Speed (m/s) and Wind Direction (°).
- days_since_install: Cumulative days of sensor field operation (aging proxy).
- LCS_dry_lag1: Previous hour $LCS_{dry}$ value for temporal memory.

Usage Notes

This dataset is intended for researchers working on:

- Machine Learning calibration for environmental sensors.
- Long-term drift compensation and hardware aging analysis.
- Spatial transferability of calibration models in complex urban terrains.
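As a rough illustration of how the columns above might be used, the following Python sketch rebuilds a previous-hour lag of LCS_dry per station and forms leave-one-station-out splits for a spatial transferability check. It is not the authors' pipeline. The inline sample values, the station IDs S1 and S2, the timestamp format, and the helper names load_rows, add_lag1, and leave_one_station_out are all assumptions made for the example. Treating a missing previous hour as None is likewise an assumption; the published LCS_dry_lag1 column may handle gaps differently.

```python
import csv
import io
from datetime import datetime, timedelta

# Made-up sample in the documented column layout; values are illustrative only.
# The timestamp format is an assumption, not taken from the dataset files.
SAMPLE_CSV = """timestamp,station_id,PM25_ref,LCS_dry
2023-01-01 00:00,S1,12.0,10.0
2023-01-01 01:00,S1,14.0,11.5
2023-01-01 03:00,S1,13.0,12.5
2023-01-01 00:00,S2,20.0,18.0
2023-01-01 01:00,S2,22.0,19.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with typed fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["timestamp"] = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M")
        rec["PM25_ref"] = float(rec["PM25_ref"])
        rec["LCS_dry"] = float(rec["LCS_dry"])
        rows.append(rec)
    return rows


def add_lag1(rows):
    """Attach the previous-hour LCS_dry of the same station.

    The lag is None when the previous hour is absent, so a gap in the
    record never borrows a value from an older observation.
    """
    by_key = {(r["station_id"], r["timestamp"]): r["LCS_dry"] for r in rows}
    for r in rows:
        prev_hour = r["timestamp"] - timedelta(hours=1)
        r["LCS_dry_lag1"] = by_key.get((r["station_id"], prev_hour))
    return rows


def leave_one_station_out(rows):
    """Yield (held_out_station, train_rows, test_rows) for each station."""
    for held_out in sorted({r["station_id"] for r in rows}):
        train = [r for r in rows if r["station_id"] != held_out]
        test = [r for r in rows if r["station_id"] == held_out]
        yield held_out, train, test


rows = add_lag1(load_rows(SAMPLE_CSV))
folds = list(leave_one_station_out(rows))
```

Each fold trains on every station except one and evaluates on the held-out station, which is one common way to probe whether a calibration model transfers to a site it has not seen. Keying the lag on (station, timestamp) rather than on row order avoids leaking values across stations or across data gaps.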

Categories

Air Pollution, Data Science, Fine Particulate Matter
