Training and Test Dataset for Early Stage Energy Prediction Model using Advanced Machine Learning Approaches

Published: 7 August 2023 | Version 1 | DOI: 10.17632/jtrg6hv366.1


The dataset accompanies the article "Convolutional Neural Network to Learn Building Shape Representations for Early Stage Energy Prediction". It contains EnergyPlus simulation data, processed into a structure suitable for training machine learning (ML) models with three approaches: an artificial neural network (ANN), a component-based machine learning (CBML) model, and a convolutional neural network (CNN) model. The dataset contains the raw data, the programs to reproduce the data, the trained ML models, and their predictions. It also contains an autoencoder model used to reduce the dimensionality of the raw three-dimensional input to the CNN model.


Steps to reproduce

Requirements:
* Extract the download and browse to the downloaded directory. All program locations are relative to the downloaded directory.
i. The following instructions have been tested only in a Linux environment.
ii. Python 3 must be installed with the libraries pandas, tensorflow, shapely, scikit-learn, scikit-image, and SALib.
iii. mono-devel is required to generate EnergyPlus models.
iv. EnergyPlus needs to be installed in the home directory. For other locations, please change the location in Programs/DataCollection/
v. The dataset contains only the ML models with the least validation losses. The models trained for hyperparameter tuning have been removed to reduce the dataset size.

1. Data collection: The following steps generate samples, create EnergyPlus models, run simulations, read the EnergyPlus output, and preprocess the data for the ML models:

(can be skipped if new samples are not required)
    python Programs/DataCollection/
    cd Programs/DataCollection/CSharp
    mono IDFWrite.exe folder:{full/path/to/directory}/TrainingData python:{path/to/python}
    mono IDFWrite.exe folder:{full/path/to/directory}/TestData python:{path/to/python}

(can be skipped if new samples are not generated)
    cd Programs/DataCollection
    python

(can be skipped if new samples are not generated)
    cd Programs/DataCollection/CSharp
    mono IDFRead.exe folder:{full/path/to/directory}/TrainingData
    mono IDFRead.exe folder:{full/path/to/directory}/TestData

(needs to be run to create the TensorFlow dataset for training the ML models)
    cd Programs/DataCollection
    python

2. Train the ANN, CBML, and CNN models for 8 iterations:

(only run if re-training of the ML models is required)
    cd Programs/MLModels
    python
    python
    python

3. Train the autoencoder and the CNN model on the encoded data:

(only run if re-training of the autoencoder is required)
    python
    python {full/path/to/directory} TrainingData
    python {full/path/to/directory} TestData
    python

To make predictions, use the following Python script. Set the working directory to {full/path/to/directory}/Programs/MLModels

    from {ann/cbml/cnn/cnnAutoencoder} import {ANN/CBML/CNN/CNNAutoEncoder} as Model
    m = Model('/Volumes/ga54dax/MLCNN', f'Samples_{NumberOfSamples:05}')
    m.Predict(f"{path/to/testdata/folder}", f"{path/to/new/outputfile.csv}")
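The prediction script above is a template with placeholders. The following is a minimal sketch of how it might be filled in, not part of the dataset's programs. It assumes that the module names and class names in the template pair up in the order listed (ann with ANN, cbml with CBML, and so on), that the first constructor argument is the root directory holding the trained models, and that Programs/MLModels is importable. The helper names load_model, predict, and sample_folder are hypothetical.

```python
import importlib

# Module name -> class name, paired in the order given in the template
# (assumption: the two placeholder lists correspond one-to-one).
MODEL_CLASSES = {
    "ann": "ANN",
    "cbml": "CBML",
    "cnn": "CNN",
    "cnnAutoencoder": "CNNAutoEncoder",
}


def sample_folder(number_of_samples):
    """Zero-pad the sample count to five digits, as in f'Samples_{n:05}'."""
    return f"Samples_{number_of_samples:05}"


def load_model(kind, model_root, number_of_samples):
    """Import the chosen model class from the dataset's code and construct it.

    Programs/MLModels must be importable, for example by starting an
    interactive Python session in that directory or adding it to sys.path.
    """
    if kind not in MODEL_CLASSES:
        raise ValueError(
            f"unknown model kind {kind!r}; expected one of {sorted(MODEL_CLASSES)}"
        )
    module = importlib.import_module(kind)
    model_class = getattr(module, MODEL_CLASSES[kind])
    return model_class(model_root, sample_folder(number_of_samples))


def predict(kind, model_root, number_of_samples, test_data_dir, output_csv):
    """Call Predict(test_data_dir, output_csv) on the selected model."""
    model = load_model(kind, model_root, number_of_samples)
    model.Predict(test_data_dir, output_csv)
```

With these helpers, a call such as predict("cnn", model_root, 500, test_data_dir, output_csv) would look for the model under the folder name Samples_00500, since the format specifier 05 pads the integer with leading zeros to five digits.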
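Before running any of the steps, it may help to confirm that the listed requirements are present. The sketch below is an illustration under stated assumptions rather than a script shipped with the dataset: it checks that each required library can be located and that a mono executable is on the PATH. The function names missing_modules and mono_available are hypothetical. Note that scikit-learn and scikit-image are imported as sklearn and skimage.

```python
import importlib.util
import shutil
import sys

# Package name (as listed in the requirements) -> importable module name.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "tensorflow": "tensorflow",
    "shapely": "shapely",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "SALib": "SALib",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the package names whose module cannot be located."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def mono_available():
    """True if a `mono` executable is on PATH (used to run the .exe tools)."""
    return shutil.which("mono") is not None


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major == 3)
    print("Missing libraries:", missing_modules() or "none")
    print("mono on PATH:", mono_available())
```

This only verifies that the modules can be found, not that their versions match those used to produce the dataset, and it does not check the EnergyPlus installation.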


Machine Learning, Building