Longitudinal indoor air quality dataset (CO2, PM, Temp, RH) in South African classrooms across six infrastructure types (2023-2025)
Description
Research Hypothesis & Context: In the resource-constrained South African education sector, temporary infrastructure (shipping containers and prefabricated units) is frequently used as a permanent solution to classroom overcrowding. This study hypothesises that these temporary structures offer significantly poorer Indoor Air Quality (IAQ) and thermal comfort than permanent brick infrastructure, potentially affecting learner cognition and health. The dataset supports a longitudinal comparison of these building types on occupied and unoccupied days. It captures the "real-world" learning environment, including external factors particular to the region, such as rolling national power outages (load shedding), which dictate when mechanical ventilation is available.

Methodology & Data Collection: Data was collected via a custom LoRaWAN IoT network deployed in primary schools in Stellenbosch, South Africa.
Timeframe: February 2023 to January 2025 (24 months), capturing multiple seasonal cycles.
Sampling: Intervals range from 11 minutes (early deployment) to 3 minutes (primary deployment), giving high-resolution data for most of the record.
Sensors: DHT22 (temperature and relative humidity), COZIR-LP-5000 (CO2), and PMS5003 (PM1.0, PM2.5, PM10).

Data Interpretation: The dataset comprises ~2.8 million data points across six infrastructure types: Container (with and without a wood-panel wall retrofit), Mobile/Prefab, and Brick (First Floor, Second Floor, Single Storey).

Files and Structure:
rawmeasurements.csv: The primary time series for CO2, temperature, relative humidity, and PM. This file exceeds 100 MB and contains ~2.8 million rows, so it may require statistical software (Python, R, Stata) to process.
roomdetails.csv: Static metadata linking roomcode to physical attributes (dimensions, window size, building materials, and orientation).
weatherinfo.csv: Daily ambient conditions, including Temperature, Wind Speed/Direction, and Solar Irradiance (GHI, DNI, DHI) for energy modelling, as well as ASHRAE 55-2023 thermal comfort metrics: the prevailing mean outdoor air temperature (Tpma) and the 80% thermal acceptability limits. The full weather data is not published for copyright reasons but may be requested.
occupancyschedules.csv: High-resolution binary arrays (0/1) estimating room occupancy based on school timetables.
powerofftimes.json: Logs of power outage events, allowing researchers to correlate spikes in CO2 or temperature with forced HVAC outages.
CodeBook.xlsx: The master dictionary for all variable codes and units.

Notable Findings & Usage: The data reveals distinct thermal profiles: uninsulated containers exhibit extreme temperature fluctuations compared with brick structures. High CO2 accumulation rates observed during the winter months highlight ventilation deficits when windows are closed to conserve heat.

Research Potential:
Public Health: Using CO2 as a proxy for viral transmission risk in crowded spaces.
Building Physics: Energy modelling using the provided Solar Irradiance and power outage data.
Policy: Providing evidence-based recommendations for school infrastructure procurement in developing economies.
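Because rawmeasurements.csv exceeds 100 MB, one option is to stream it row by row rather than load it whole. The following is a minimal sketch using only the Python standard library that aggregates a daily mean CO2 value per room. The column names 'roomcode', 'timestamp', and 'co2', the ISO-style timestamp format, and the sample rows are all assumptions made for illustration; the actual names and formats should be taken from CodeBook.xlsx.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_mean_co2(csv_file, room_col="roomcode", time_col="timestamp", co2_col="co2"):
    """Stream a measurements CSV and return {(roomcode, date): mean CO2}.

    The default column names are hypothetical; check CodeBook.xlsx.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        raw = (row.get(co2_col) or "").strip()
        if not raw:
            continue  # missing reading, e.g. a lost LoRaWAN packet
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric entry
        # Assumes ISO-like timestamps such as "2024-06-03 08:00:00" (SAST).
        day = datetime.fromisoformat(row[time_col]).date()
        key = (row[room_col], day)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Tiny in-memory stand-in for the real file (made-up values).
sample = io.StringIO(
    "roomcode,timestamp,co2\n"
    "R1,2024-06-03 08:00:00,600\n"
    "R1,2024-06-03 08:03:00,800\n"
    "R1,2024-06-04 08:00:00,\n"
    "R2,2024-06-03 08:00:00,450\n"
)
means = daily_mean_co2(sample)
```

For the real file, the same function can be called on an open file handle, for example `with open("rawmeasurements.csv", newline="") as f: means = daily_mean_co2(f)`. Memory use then scales with the number of room-days rather than the number of rows.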
Steps to reproduce
1. Experimental Setup and Hardware: Data was collected using custom-built IoT sensing nodes deployed across six infrastructure types in Stellenbosch primary schools. Sensors included the DHT22 (temperature and relative humidity), COZIR-LP-5000 (CO2), and PMS5003 (laser-scattering estimates of PM1.0, PM2.5, and PM10). Readings were transmitted over a LoRaWAN network (868 MHz) in 12-byte packets to a central gateway. The sampling interval was initially set to 11 minutes (early 2023) and was shortened to 3 minutes (2024) to capture high-resolution environmental dynamics.

2. Data Structure and Linkage: The dataset is structured as a star schema, with rawmeasurements.csv acting as the central fact table. To reproduce the full analysis context, the files should be merged as follows:
rawmeasurements.csv (Primary Data): Contains the time-series sensor data. Join to roomdetails.csv on the 'roomcode' column. Note that room codes may carry a suffix (-a pre-construction, -b during construction, -c post-construction) marking periods in which a classroom underwent retrofitting (e.g., adding wood-panel insulation to a container).
roomdetails.csv (Static Metadata): Provides physical attributes, including dimensions (length, depth, height), orientation (doorface_orientation), and material composition (material_wall, material_roof). It also flags whether a room is 'alonestanding' or attached to a block.
weatherinfo.csv (Environmental Context): Contains daily weather data. Join to the raw measurements on Date to obtain the daily mean, maximum, and minimum of each weather feature. Includes Solar Irradiance (GHI, DNI, DHI) for energy modelling and ASHRAE 55-2023 thermal comfort metrics for exposure analyses. The full weather data is not published but can be purchased or requested.
occupancyschedules.csv and schedulecodes.csv (Occupancy Logic): Each 'roomcode' and 'Date' combination has an assigned 'schedulecode'. Use this code to look up the specific day type in schedulecodes.csv. For high-resolution analysis, map the schedulecode to occupancyschedules.csv, which provides a 5-minute binary array (0 = Empty, 1 = Occupied) derived from school timetables.
powerofftimes.json (Load Shedding): Contains the start and end timestamps of national power outages. Map these timestamps against rawmeasurements.csv to identify periods in which mechanical ventilation (air conditioners and fans) was forcibly disabled.

3. Data Processing Notes: All times are reported in South African Standard Time (SAST, UTC+2). The dataset contains raw sensor output, so researchers should filter for the valid ranges defined in CodeBook.xlsx. Occasional packet loss is inherent to LoRaWAN transmission; time-series interpolation may be required, depending on the analysis goal.
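The linkage steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' processing code: the JSON layout of powerofftimes.json (a list of objects with "start" and "end" ISO timestamps), the 288-slot occupancy array starting at 00:00, and all helper names and sample values are hypothetical. The source also does not state whether roomdetails.csv is keyed on the full or the suffix-free room code, so the sketch only separates the suffix for grouping.

```python
import json
from bisect import bisect_right
from datetime import datetime

# Suffix meanings as given in the dataset description.
PHASES = {"a": "pre-construction", "b": "during construction", "c": "post-construction"}


def split_roomcode(roomcode):
    """Split e.g. 'C1-b' into ('C1', 'during construction'); no suffix gives phase None."""
    base, sep, suffix = roomcode.rpartition("-")
    if sep and suffix in PHASES:
        return base, PHASES[suffix]
    return roomcode, None


def load_outages(json_text):
    """Parse outage events (assumed layout) into sorted, merged (start, end) intervals."""
    events = json.loads(json_text)
    raw = sorted(
        (datetime.fromisoformat(e["start"]), datetime.fromisoformat(e["end"]))
        for e in events
    )
    merged = []
    for start, end in raw:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching events are folded into one interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def in_outage(ts, intervals, starts):
    """True if ts falls in [start, end) of any merged outage interval."""
    i = bisect_right(starts, ts) - 1
    return i >= 0 and ts < intervals[i][1]


def occupancy_at(ts, schedule):
    """Return the 0/1 flag for ts, assuming 288 five-minute slots from 00:00."""
    return schedule[(ts.hour * 60 + ts.minute) // 5]


# Made-up sample inputs; timestamps are naive and taken to be SAST.
outage_json = (
    '[{"start": "2023-06-01T10:00:00", "end": "2023-06-01T12:30:00"},'
    ' {"start": "2023-06-01T12:00:00", "end": "2023-06-01T14:00:00"}]'
)
intervals = load_outages(outage_json)
starts = [start for start, _ in intervals]
schedule = [1 if 96 <= i < 168 else 0 for i in range(288)]  # occupied 08:00 to 14:00

reading_time = datetime(2023, 6, 1, 13, 0)
room, phase = split_roomcode("C1-b")
flags = {
    "outage": in_outage(reading_time, intervals, starts),
    "occupied": occupancy_at(reading_time, schedule),
}
```

Merging the outage intervals first keeps the binary search valid even if logged events overlap; each measurement is then classified in logarithmic time, which matters at ~2.8 million rows.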