Household-level occupancy profiles generated using Bayesian Neural Networks for occupancy uncertainty in residential buildings

Published: 13 August 2024 | Version 2 | DOI: 10.17632/xhpgjnr4bx.2

Description

This dataset comprises time-series occupancy profiles for residential dwellings, recorded at 10-minute intervals: availability (presence at home) and metabolic rates based on the activities performed. The data is structured into CSV files in two categories, metabolic rates and availability, and covers two dwelling types. For house-type dwellings, there are 1000 CSV files in each category, numbered sequentially from 1 to 1000. For apartment-type dwellings, there are 500 CSV files in each category, numbered sequentially from 1001 to 1500.

The metabolic rate files (DateTime_Metrate_1.csv to DateTime_Metrate_1000.csv for house-type dwellings and DateTime_Metrate_1001.csv to DateTime_Metrate_1500.csv for apartment-type dwellings) contain two columns, DateTime and metrate. The DateTime column records the timestamp at 10-minute intervals, and the metrate column records the metabolic rate, in watts per person, of the activity performed during that interval.

The availability files (DateTime_Availability_1.csv to DateTime_Availability_1000.csv for house-type dwellings and DateTime_Availability_1001.csv to DateTime_Availability_1500.csv for apartment-type dwellings) contain two columns, DateTime and availability. The DateTime column records the timestamp at 10-minute intervals, and the availability column is a binary indicator: 1 denotes that the individual is available (e.g., at home) and 0 denotes that they are not available (e.g., outside).

The data originates from Time Use Survey (TUS) data, which records detailed activities of individuals over specified periods.
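The naming scheme above can be captured in a small helper. This is only a sketch: the function name `profile_files` is hypothetical, and it assumes that the metabolic rate and availability files sharing a number belong to the same profile, which the numbering suggests but the description does not state outright.

```python
def profile_files(profile_id):
    """Map a profile number to its dwelling type and the two CSV file names.

    Numbers 1-1000 are house-type and 1001-1500 are apartment-type,
    following the dataset description. Pairing the two files by number
    is an assumption of this sketch.
    """
    if 1 <= profile_id <= 1000:
        dwelling = "house"
    elif 1001 <= profile_id <= 1500:
        dwelling = "apartment"
    else:
        raise ValueError(f"profile_id must be in 1..1500, got {profile_id}")
    return (
        dwelling,
        f"DateTime_Metrate_{profile_id}.csv",
        f"DateTime_Availability_{profile_id}.csv",
    )


dwelling, metrate_file, availability_file = profile_files(1001)
```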
Activities are classified by metabolic rate: Low Metabolic Activity (LA) for activities below 100 W/person, Medium Metabolic Activity (MA) for activities exceeding 150 W/person, NotActive for idle, resting, and sleeping states, and Outside Activity, which is captured indirectly through the availability status. This dataset supports research in occupancy prediction under uncertainty, energy consumption modelling in buildings, and smart home automation systems.

To use the dataset, join the metabolic rate and availability files on the DateTime column. The joined series gives both the activity level and the presence status of an individual at each time step, and can be used to train and validate occupancy models.
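A minimal sketch of that join, using only the Python standard library. The column names come from the description above; the sample rows and the timestamp format are made up for illustration, and the helper names `read_column` and `join_profiles` are hypothetical. The real files may format DateTime differently.

```python
import csv
import io

# Hypothetical sample contents standing in for one DateTime_Metrate_<n>.csv
# and one DateTime_Availability_<n>.csv file. Values are illustrative only.
METRATE_CSV = """DateTime,metrate
2024-01-01 00:00:00,70
2024-01-01 00:10:00,70
2024-01-01 00:20:00,120
"""
AVAILABILITY_CSV = """DateTime,availability
2024-01-01 00:00:00,1
2024-01-01 00:10:00,1
2024-01-01 00:20:00,0
"""


def read_column(handle, column, cast):
    """Read one value column from a CSV handle, keyed by the DateTime string."""
    return {row["DateTime"]: cast(row[column]) for row in csv.DictReader(handle)}


def join_profiles(metrate_handle, availability_handle):
    """Inner-join the two series on DateTime.

    Returns (DateTime, metrate, availability) tuples for timestamps present
    in both files. Sorting the raw strings gives chronological order only
    for ISO-like timestamps, which is assumed here.
    """
    metrate = read_column(metrate_handle, "metrate", float)
    availability = read_column(availability_handle, "availability", int)
    shared = sorted(metrate.keys() & availability.keys())
    return [(ts, metrate[ts], availability[ts]) for ts in shared]


rows = join_profiles(io.StringIO(METRATE_CSV), io.StringIO(AVAILABILITY_CSV))
share_available = sum(avail for _, _, avail in rows) / len(rows)
```

With files on disk, the `io.StringIO` objects would be replaced by handles from `open(path, newline="")`.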

Steps to reproduce

The data is generated by a Probabilistic Occupancy Model (POM), which is developed from Time Use Survey (TUS) data and uses Bayesian Neural Networks (BNNs) to predict occupancy under uncertainty. BNNs add probabilistic elements to traditional neural networks, so that the uncertainty in the model parameters can be quantified.

The first step is data collection and preprocessing. The TUS data, which includes detailed records of activities performed by individuals over specified periods, is the primary data source. Input features and the corresponding output labels are extracted from it at 10-minute intervals. The input features might include the time of day, day of the week, individual demographic information (such as age, gender, and occupation), and historical activity patterns. Activities are then classified by metabolic rate: those below 100 W/person are classified as Low Metabolic Activity (LA), those exceeding 150 W/person as Medium Metabolic Activity (MA), idle, resting, and sleeping states as NotActive, and activities performed outside the house as Outside Activity.

The next step is designing the neural network architecture. In Bayesian inference, prior beliefs about the model parameters are updated based on new data. The posterior probability distribution is derived using Bayes' theorem, which combines the likelihood of the data with the prior probability of the parameters. Because direct Bayesian inference is intractable, Variational Inference (VI) is used to approximate the posterior distribution. The approximation is parameterised by a set of variational parameters, which are optimised to minimise the Kullback-Leibler (KL) divergence between the variational and true posterior distributions.
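The KL divergence to the true posterior cannot be evaluated directly, so VI is usually carried out by maximising the evidence lower bound, whose regularisation term is the KL divergence between the variational distribution and the prior. When both are Gaussian, that term has a closed form. The sketch below illustrates it under assumptions that are not stated in the description: an independent Gaussian per weight and a standard normal prior. The function names are hypothetical.

```python
import math


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) for univariate Gaussians.

    q = N(mu_q, sigma_q^2) is the variational distribution and
    p = N(mu_p, sigma_p^2) is the reference distribution (here, the prior).
    """
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
        - 0.5
    )


def kl_mean_field(q_params, prior_mu=0.0, prior_sigma=1.0):
    """Sum the per-parameter KL terms, assuming independent Gaussian factors.

    q_params is a list of (mean, standard deviation) pairs, one per weight
    or bias. The N(0, 1) prior is an assumption of this sketch.
    """
    return sum(kl_gaussian(mu, sigma, prior_mu, prior_sigma) for mu, sigma in q_params)


# Two illustrative weights: one matching the prior, one with a shifted mean.
penalty = kl_mean_field([(0.0, 1.0), (1.0, 1.0)])
```

The divergence is zero when the variational factor equals the prior and grows as the two move apart, which is what makes it act as a regulariser during training.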
The model is trained on the processed TUS data, with the likelihood of each data point evaluated under Gaussian distributions over the weights and biases. To limit bias and overfitting, five-fold cross-validation is applied. Model performance is evaluated using several criteria: prediction accuracy, loss (such as log-loss), uncertainty, the confusion matrix, AUC-ROC (Area Under the Receiver Operating Characteristic curve), and the weighted average F1 score.

Finally, the occupancy state profiles are extracted from the posterior distribution as the probability of each occupancy state at each time interval. This captures the uncertainty and variability in occupancy patterns and gives a more robust and realistic estimate of occupancy than deterministic models. Following these steps, the Probabilistic Occupancy Model (POM) can be developed from TUS data and Bayesian Neural Networks (BNNs), quantifying the inherent uncertainty in occupancy predictions.
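One common way to obtain state probabilities from a BNN is Monte Carlo averaging: draw several sets of weights from the approximate posterior, run a forward pass for each, and average the resulting class probabilities. The sketch below illustrates the idea for a single time interval and is not the authors' implementation. The forward pass is replaced by Gaussian noise added to fixed logits, and the logit values, noise level, and sample count are all made-up assumptions.

```python
import math
import random

# Occupancy states named in the description.
STATES = ["LA", "MA", "NotActive", "Outside"]


def softmax(logits):
    """Convert logits to probabilities, shifting by the maximum for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logits(mean_logits, std, rng):
    """Stand-in for one forward pass with weights drawn from the posterior."""
    return [rng.gauss(mean, std) for mean in mean_logits]


def predictive_distribution(mean_logits, std, n_samples, rng):
    """Average the softmax outputs over n_samples stochastic passes."""
    totals = [0.0] * len(mean_logits)
    for _ in range(n_samples):
        probs = softmax(sample_logits(mean_logits, std, rng))
        totals = [t + p for t, p in zip(totals, probs)]
    return [t / n_samples for t in totals]


def entropy(probs):
    """Shannon entropy in nats, used here as a simple uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
state_probs = predictive_distribution([0.2, -1.0, 2.0, 0.5], std=0.5, n_samples=200, rng=rng)
most_likely = STATES[max(range(len(STATES)), key=state_probs.__getitem__)]
uncertainty = entropy(state_probs)
```

Repeating this for every 10-minute interval would give a probability profile per state, and sampling from those probabilities is one way such profiles could be turned into individual time series.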

Institutions

University College Dublin

Categories

Multi-Scale Modeling, Urban Energy Consumption, Uncertainty Analysis, Residential Building
