DigitalExposome: A Dataset for Wellbeing Classification using Environmental Air Quality and Human Physiological data
Description
The DigitalExposome dataset is a collection of multi-sensor data designed to explore the relationship between urban environmental factors and human wellbeing. A total of 42,437 samples were collected from 40 participants who undertook the experiment. Key features of the DigitalExposome dataset include:
Environmental Data: Measurements of air pollutants (e.g., Particulate Matter (PM1.0, PM2.5 and PM10), Carbon Monoxide, Ammonia and Nitrogen Dioxide) and noise levels. These environmental factors are crucial for assessing the impact of pollution and other urban stressors on health.
Physiological Data: A range of physiological responses recorded using an E4 Empatica, including Electrodermal Activity (EDA), Heart Rate (HR), Heart Rate Variability (HRV) and Blood Volume Pulse (BVP). These signals provide insights into the body's responses to environmental stressors.
Perceived Wellbeing Data: Self-reported responses from participants regarding their overall wellbeing, capturing their emotional and mental states in response to various environmental exposures. Wellbeing was assessed using a 5-point Likert scale, where participants rated their feelings from 1 (very negative) to 5 (very positive). This scale quantifies subjective experience, allowing researchers to analyse trends in emotional valence (positive or negative feelings) across different conditions.
The dataset was developed as part of the DigitalExposome framework, which integrates data from multiple sources, including environmental sensors, physiological sensors and self-reported wellbeing responses. This framework aims to provide a deeper understanding of how urban exposures, such as air pollution, noise and other environmental stressors, can impact individual health and wellbeing.
The three data collection devices operated at different sampling rates, which were accounted for in the processing stage. After data collection and integration, the combined dataset from the 40 participants was normalised and prepared for analysis. The physiological data recorded by the E4 Empatica included Heart Rate (HR) at 1 Hz, Electrodermal Activity (EDA) at 4 Hz and Blood Volume Pulse (BVP) at 64 Hz. Heart Rate Variability (HRV) was not recorded at a fixed rate; it was provided as a sequence of time intervals between detected heartbeats. Environmental air quality data was sampled at 0.2 Hz (one reading every 5 seconds). To ensure uniformity across all data sources, the EDA and BVP signals from the Empatica device were downsampled to 1 Hz to align with the HR measurements, and the environmental data was upsampled to the same 1 Hz rate. Self-reported mental wellbeing data collected via smartphone was likewise extracted and upsampled to 1 Hz to maintain consistency with both the environmental and physiological data.
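The alignment described above can be illustrated with a minimal pandas sketch. It is not the authors' processing code: the description does not state which resampling methods were used, so averaging within each second (for downsampling) and carrying the last reading forward (for upsampling) are assumptions, as are the function names, the column names and the synthetic input signals.

```python
import numpy as np
import pandas as pd

ONE_SECOND = pd.Timedelta(seconds=1)
START = pd.Timestamp("2023-01-01 00:00:00")  # arbitrary start time for the demo


def synthetic_signal(rate_hz, seconds, seed):
    """Random placeholder signal sampled at rate_hz (illustrative data only)."""
    n = int(rate_hz * seconds)
    index = pd.date_range(START, periods=n, freq=pd.Timedelta(seconds=1 / rate_hz))
    return pd.Series(np.random.default_rng(seed).random(n), index=index)


def align_to_1hz(hr, eda, bvp, env, labels):
    """Bring every stream onto the 1 Hz grid defined by the HR signal.

    Assumed methods (not confirmed by the dataset description):
    - EDA (4 Hz) and BVP (64 Hz) are downsampled by averaging each second.
    - Environmental readings (0.2 Hz) and wellbeing labels are upsampled
      by repeating the most recent value until the next one arrives.
    """
    grid = hr.index
    return pd.DataFrame({
        "HR": hr,
        "EDA": eda.resample(ONE_SECOND).mean(),
        "BVP": bvp.resample(ONE_SECOND).mean(),
        "NO2": env.reindex(grid, method="ffill"),
        "Label": labels.reindex(grid, method="ffill"),
    })


# Twenty seconds of synthetic data at the sampling rates given in the description.
hr = synthetic_signal(1, 20, seed=0)
eda = synthetic_signal(4, 20, seed=1)
bvp = synthetic_signal(64, 20, seed=2)
env = synthetic_signal(0.2, 20, seed=3)  # one reading every 5 seconds

# Two hypothetical self-reports on the 1-5 scale, at 0 s and 12 s.
labels = pd.Series([3, 4], index=[START, START + pd.Timedelta(seconds=12)])

aligned = align_to_1hz(hr, eda, bvp, env, labels)
print(aligned.head(7))
```

Each row of the resulting frame represents one second. Averaging a 64 Hz BVP waveform over a second discards most of its shape, so analyses that rely on pulse morphology would need features extracted from the raw signal before this step.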
Steps to reproduce
The experimental study design and approach are discussed in the following journal article: Johnson, T., Kanjo, E. & Woodward, K. DigitalExposome: quantifying impact of urban environment on wellbeing using sensor fusion and deep learning. Comput. Urban Sci. 3, 14 (2023). https://doi.org/10.1007/s43762-023-00088-9