Datasets Comparison
Version 1
SMART-CCPP Dataset
Description
The SMART-CCPP dataset comprises four input features—ambient temperature (AT), ambient barometric pressure (BP), dew point temperature (DP), and steam turbine LP exhaust vacuum (V)—along with one target variable, the electrical power output (EP). The original dataset included 8,124 samples; however, following preprocessing and cleaning, 3,234 valid data points were retained.
Data were collected over a five-year period (2020–2025) across all four seasons to capture varying climatic conditions. The raw measurements were recorded at 40 ms intervals and resampled to four-hour intervals during preprocessing to keep the dataset at a manageable size.
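As an illustration of the resampling step, the sketch below downsamples a 40 ms time series to four-hour intervals with pandas. The data are synthetic, and averaging within each four-hour window is an assumption: the description does not state how the raw readings were aggregated.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the raw sensor log: 8 hours sampled every 40 ms
# (720,000 rows). The values are random, not taken from the dataset.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=720_000, freq="40ms")
raw = pd.DataFrame(
    {
        "AT": rng.normal(20.0, 1.0, len(index)),   # ambient temperature
        "EP": rng.normal(450.0, 5.0, len(index)),  # electrical power output
    },
    index=index,
)

# Downsample to four-hour intervals. Taking the mean of each window is an
# assumption; the source only says the data were resampled.
resampled = raw.resample("4h").mean()

print(resampled.shape)
```

Other aggregations (median, first reading in each window) fit the same pattern by swapping the final method call.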
To guarantee data quality prior to model training, several preprocessing steps were conducted using Python’s pandas and scikit-learn libraries:
Data Cleaning: Samples affected by missing values, instrument malfunctions, grid instabilities, load transients, or maintenance activities were removed. To ensure reliability, only data from full-load operation were preserved.
Feature Scaling: All features and the target variable were normalized using Min-Max scaling so that they share a common range.
Correlation Analysis: A correlation matrix was generated to examine relationships between the features and the target variable.
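The three steps above can be sketched with pandas and scikit-learn roughly as follows. This is a minimal illustration on synthetic data, not the authors' actual pipeline: the `full_load` flag column and the `preprocess` helper are hypothetical names, and the real cleaning criteria (malfunctions, grid instabilities, transients, maintenance) are reduced here to missing values plus a single boolean flag.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

COLUMNS = ["AT", "BP", "DP", "V", "EP"]  # four features plus the target


def preprocess(df, full_load_col="full_load"):
    """Clean, Min-Max scale, and correlate (hypothetical helper)."""
    # Data cleaning: drop rows with missing values, keep full-load rows only.
    cleaned = df.dropna(subset=COLUMNS)
    cleaned = cleaned[cleaned[full_load_col]]

    # Feature scaling: Min-Max scaling of the features and the target.
    scaler = MinMaxScaler()
    scaled = pd.DataFrame(
        scaler.fit_transform(cleaned[COLUMNS]),
        columns=COLUMNS,
        index=cleaned.index,
    )

    # Correlation analysis: Pearson correlation matrix of all columns.
    corr = scaled.corr()
    return scaled, corr, scaler


# Synthetic demo data (random values, not from the dataset).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame(
    {
        "AT": rng.uniform(5, 40, n),
        "BP": rng.uniform(990, 1030, n),
        "DP": rng.uniform(-5, 25, n),
        "V": rng.uniform(30, 80, n),
        "EP": rng.uniform(420, 500, n),
        "full_load": rng.random(n) > 0.2,  # hypothetical operating-mode flag
    }
)
demo.loc[0, "AT"] = np.nan  # simulate one missing reading

scaled, corr, scaler = preprocess(demo)
print(corr["EP"].sort_values())
```

When the scaled data feed a model, fitting the scaler on the training split only and reusing it on the test split avoids leaking test-set statistics into training.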
Institutions
Salahaddin University-Hawler
Categories
Energy Consumption
Licence
Creative Commons Attribution 4.0 International
Version 2
SMART-CCPP Dataset
Description
The SMART-CCPP dataset comprises four input features—ambient temperature (AT), ambient barometric pressure (BP), dew point temperature (DP), and steam turbine LP exhaust vacuum (V)—along with one target variable, the electrical power output (EP). The original dataset included 8,124 samples; however, following preprocessing and cleaning, 3,234 valid data points were retained.
Data were collected over a five-year period (2020–2025) across all four seasons to capture varying climatic conditions. The raw measurements were recorded at 40 ms intervals and resampled to four-hour intervals during preprocessing to keep the dataset at a manageable size.
To guarantee data quality prior to model training, several preprocessing steps were conducted using Python’s pandas and scikit-learn libraries:
Data Cleaning: Samples affected by missing values, instrument malfunctions, grid instabilities, load transients, or maintenance activities were removed. To ensure reliability, only data from full-load operation were preserved.
Feature Scaling: All features and the target variable were normalized using Min-Max scaling so that they share a common range.
Correlation Analysis: A correlation matrix was generated to examine relationships between the features and the target variable.
Institutions
Salahaddin University-Hawler
Categories
Energy Consumption
Licence
Creative Commons Attribution 4.0 International