A Home Assistant-Based Residential HEMS Dataset for HVAC Control, Shiftable Appliance Scheduling, PV Generation, and Real-Time Pricing

Published: 9 May 2026 | Version 1 | DOI: 10.17632/23s96gn2j7.1
Contributor:
Mozhgan Rahmatinia

Description

This dataset provides a simulated data environment for residential home energy management systems (HEMS). It combines household variables generated with Home Assistant automation and time-synchronized exogenous data retrieved from online sources. It was developed to support research on HVAC control, thermal comfort, shiftable appliance scheduling, photovoltaic (PV) generation, real-time electricity pricing, and residential demand-side energy management.

The household variables were generated with a Home Assistant scenario representing a four-person household. They include occupancy count, washing machine and dishwasher requests, appliance energy profiles, desired indoor temperature, active comfort indicator, base load, HVAC operating state, and HVAC load. These data were combined with online exogenous signals: CAISO real-time electricity prices, NSRDB-based PV generation, and outdoor temperature for the Los Angeles / SP-15 California region.

The dataset has a 30-minute time resolution and covers the period from 2020-01-01 00:00:00 to 2022-09-07 23:30:00. It includes a full dataset file and predefined train, evaluation, real-test, and comprehensive-test splits to support reproducible model development and evaluation.

This dataset is not itself an optimized controller or an intelligent HEMS. It is a reusable simulation-based data environment in which researchers can implement, train, test, and compare their own control, optimization, and machine-learning methods for residential HEMS applications.
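The stated time range and 30-minute resolution imply a fixed number of rows, which can serve as a quick sanity check after loading the full dataset file. The sketch below works out that count with the Python standard library only. It assumes the series has no missing or duplicated timestamps and that timestamps are naive (no daylight-saving adjustments); neither assumption is confirmed by the description, and the function name is illustrative.

```python
from datetime import datetime, timedelta

# Period and resolution as stated in the dataset description.
START = datetime(2020, 1, 1, 0, 0, 0)
END = datetime(2022, 9, 7, 23, 30, 0)
STEP = timedelta(minutes=30)


def expected_row_count(start: datetime, end: datetime, step: timedelta) -> int:
    """Number of timestamps from start to end inclusive, spaced by step.

    Assumption: a gap-free, duplicate-free index with naive timestamps.
    """
    return (end - start) // step + 1


expected_rows = expected_row_count(START, END, STEP)
print(expected_rows)  # 981 days * 48 half-hour slots per day = 47088
```

If the row count of the loaded file differs from this value, the index probably contains gaps, duplicates, or time-zone shifts that should be inspected before training.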

Steps to reproduce

The dataset was constructed in three main stages.

First, a residential household scenario was simulated in Home Assistant using virtual entities and automation rules to represent a four-person household. The simulation generated occupancy count, appliance requests, base load, desired temperature, active comfort status, HVAC operating state, and appliance/HVAC energy-use profiles.

Second, online exogenous variables were collected and synchronized with the household simulation timeline. The real-time electricity price was obtained from the EnergyOnline CAISO Real-time Price page for the TH_SP15 / SP-15 California region and converted from USD/MWh to USD/kWh by dividing by 1000. PV generation and outdoor temperature were prepared using NSRDB/NREL data for a Los Angeles-area location, selected to be geographically consistent with the SP-15 electricity price region.

Third, all simulated and online variables were synchronized to a common 30-minute time resolution. The final dataset was split into train, evaluation, real-test, and comprehensive-test subsets. The full dataset includes a split column for reproducibility, while the separate split files contain only the corresponding subset.

Additional documentation is provided in HA_Data_Dictionary.xlsx and the Home Assistant scenario documentation file.
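The price conversion, the 30-minute alignment, and the split selection described above can be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the description does not say how sub-30-minute price samples were aggregated, so the mean per slot is assumed here; the column name `split`, the label `train`, and all sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def usd_per_mwh_to_usd_per_kwh(price_mwh: float) -> float:
    """Convert a price from USD/MWh to USD/kWh (1 MWh = 1000 kWh)."""
    return price_mwh / 1000.0


def floor_to_30min(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 30-minute slot."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)


def mean_per_30min_slot(samples):
    """Average (timestamp, value) samples within each 30-minute slot.

    Assumption: averaging is one plausible alignment rule; the dataset
    description does not state which rule was actually used.
    """
    slots = defaultdict(list)
    for ts, value in samples:
        slots[floor_to_30min(ts)].append(value)
    return {slot: sum(vals) / len(vals) for slot, vals in sorted(slots.items())}


def select_split(rows, name):
    """Keep rows whose (hypothetical) 'split' field equals name."""
    return [row for row in rows if row["split"] == name]


# Made-up price samples in USD/MWh, for illustration only.
raw_prices = [
    (datetime(2020, 1, 1, 0, 0), 30.0),
    (datetime(2020, 1, 1, 0, 15), 50.0),
    (datetime(2020, 1, 1, 0, 30), 20.0),
]
prices_kwh = mean_per_30min_slot(
    (ts, usd_per_mwh_to_usd_per_kwh(p)) for ts, p in raw_prices
)

# Made-up rows with a hypothetical split label.
rows = [
    {"timestamp": datetime(2020, 1, 1, 0, 0), "split": "train"},
    {"timestamp": datetime(2022, 9, 7, 23, 30), "split": "real-test"},
]
train_rows = select_split(rows, "train")
```

Applying the unit conversion before aggregation or after it gives the same mean, so the order of those two steps does not matter for this rule. The actual column names and split labels should be taken from HA_Data_Dictionary.xlsx.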

Categories

Energy Engineering, Computer Engineering, Smart City, Building, Smart Grid

Licence