Data increment and enrichment to train component-based machine learning model for early stage energy prediction
Description
This dataset is used to train the component-based machine learning (CBML) models described in the article. The article examines how increasing and enriching the training data affects the ability of machine learning (ML) models to generalise. Please read the full article for details of the ML models. There are seven training datasets (BaseCase, E-1, E-2, E-3, I-1, I-2 and I-3) and one test dataset. The trained ML components are saved in the 'Models' folder of each dataset.
Steps to reproduce
To train the components of the component-based machine learning model on the different training datasets: run the program "Programs/RunAnalysis.py" using Python. This step trains the ML components on every training dataset and makes predictions on the test dataset. Additional columns are added to each component's data file; for example, Building.csv gets the additional columns '{trainingDataset}_Predicted_Energy Demand'. The program also evaluates the performance of the ML components on the test dataset.

To assess the performance of the component-based machine learning model: compare the simulated energy values with the machine learning predictions to obtain the performance of the ML model trained on each training dataset. The relevant values are in TestData/Building.csv. For example, to assess the performance of the CBML model trained on the BaseCase dataset, compare the column 'Energy Demand' with 'BaseCase_Predicted_Energy Demand'.
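The comparison described above can be scripted. The following is a minimal sketch, not part of the published programs, assuming that TestData/Building.csv contains a numeric 'Energy Demand' column plus one '{trainingDataset}_Predicted_Energy Demand' column per training dataset that has been run, with no empty cells. The choice of RMSE, MAE and R² is an assumption for illustration; the article may report different metrics. The helper names `predicted_column`, `metrics`, `evaluate` and `evaluate_file` are hypothetical.

```python
import csv
import math
from typing import Dict, Iterable, List

# Training dataset names as listed in the dataset description.
TRAINING_DATASETS = ["BaseCase", "E-1", "E-2", "E-3", "I-1", "I-2", "I-3"]
# Simulated (reference) energy column in TestData/Building.csv.
TARGET = "Energy Demand"


def predicted_column(dataset: str) -> str:
    """Name of the prediction column, following '{trainingDataset}_Predicted_Energy Demand'."""
    return f"{dataset}_Predicted_{TARGET}"


def metrics(actual: List[float], predicted: List[float]) -> Dict[str, float]:
    """Assumed metrics: RMSE, MAE and coefficient of determination (R^2)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when all simulated values are identical.
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


def evaluate(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, float]]:
    """Compute metrics for every training dataset whose prediction column is present."""
    rows = list(rows)
    results: Dict[str, Dict[str, float]] = {}
    for dataset in TRAINING_DATASETS:
        column = predicted_column(dataset)
        if not rows or column not in rows[0]:
            continue  # skip datasets that have not been run yet
        actual = [float(row[TARGET]) for row in rows]
        predicted = [float(row[column]) for row in rows]
        results[dataset] = metrics(actual, predicted)
    return results


def evaluate_file(path: str = "TestData/Building.csv") -> Dict[str, Dict[str, float]]:
    """Read the test component file and evaluate all available prediction columns."""
    with open(path, newline="") as handle:
        return evaluate(csv.DictReader(handle))
```

After running "Programs/RunAnalysis.py", calling `evaluate_file()` from the dataset root would return one dictionary of metrics per training dataset, so the models trained on BaseCase, E-1 to E-3 and I-1 to I-3 can be compared side by side. The standard library `csv` module is used instead of pandas only to keep the sketch dependency-free.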