Datasets Comparison

Version 1

SMART-CCPP Dataset

Published: 21 August 2025 | Version 1 | DOI: 10.17632/y25g22ydct.1
Contributor: Asaad Saber Ameen

Description

The SMART-CCPP dataset comprises four input features and one target variable. The inputs are ambient temperature (AT), ambient barometric pressure (BP), dew point temperature (DP), and steam turbine LP exhaust vacuum (V); the target is the electrical power output (EP). The original dataset included 8,124 samples; after preprocessing and cleaning, 3,234 valid data points were retained. Data were collected over a five-year period (2020–2025) across all four seasons to capture varying climatic conditions. Although the raw measurements were initially recorded at 40 ms intervals, they were resampled to four-hour intervals during preprocessing to ensure manageability.

To guarantee data quality prior to model training, several preprocessing steps were conducted using Python’s pandas and scikit-learn libraries:

Data Cleaning: Samples affected by missing values, instrument malfunctions, grid instabilities, load transients, or maintenance activities were removed. To ensure reliability, only data from full-load operation were preserved.

Feature Scaling: All features and the target variable were standardized and normalized using Min-Max scaling to maintain consistency.

Correlation Analysis: A correlation matrix was generated to examine relationships between the features and the target variable.

Institutions

Salahaddin University-Hawler

Categories

Energy Consumption

Licence

Creative Commons Attribution 4.0 International

Version 2

SMART-CCPP Dataset

Published: 22 August 2025 | Version 2 | DOI: 10.17632/y25g22ydct.2
Contributor: Asaad Saber Ameen

Description

The SMART-CCPP dataset comprises four input features and one target variable. The inputs are ambient temperature (AT), ambient barometric pressure (BP), dew point temperature (DP), and steam turbine LP exhaust vacuum (V); the target is the electrical power output (EP). The original dataset included 8,124 samples; after preprocessing and cleaning, 3,234 valid data points were retained. Data were collected over a five-year period (2020–2025) across all four seasons to capture varying climatic conditions. Although the raw measurements were initially recorded at 40 ms intervals, they were resampled to four-hour intervals during preprocessing to ensure manageability.

To guarantee data quality prior to model training, several preprocessing steps were conducted using Python’s pandas and scikit-learn libraries:

Data Cleaning: Samples affected by missing values, instrument malfunctions, grid instabilities, load transients, or maintenance activities were removed. To ensure reliability, only data from full-load operation were preserved.

Feature Scaling: All features and the target variable were standardized and normalized using Min-Max scaling to maintain consistency.

Correlation Analysis: A correlation matrix was generated to examine relationships between the features and the target variable.
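The resampling, scaling, and correlation steps described above could be sketched roughly as follows with pandas and scikit-learn. This is an illustration only, not the contributor's actual pipeline: the function name `preprocess`, the use of a mean when resampling, and the synthetic hourly data are all assumptions, and the cleaning step here handles only missing values (filtering for instrument malfunctions, transients, or full-load operation would need plant-specific flags that the description does not specify).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Column names follow the abbreviations used in the dataset description.
COLUMNS = ["AT", "BP", "DP", "V", "EP"]


def preprocess(raw: pd.DataFrame, interval: pd.Timedelta) -> pd.DataFrame:
    """Resample to a fixed interval, drop incomplete rows, Min-Max scale to [0, 1].

    Hypothetical helper: aggregating by mean is an assumption, since the
    description does not say how the resampling was aggregated.
    """
    resampled = raw[COLUMNS].resample(interval).mean()
    cleaned = resampled.dropna()  # removes intervals with no valid readings
    scaled = MinMaxScaler().fit_transform(cleaned)
    return pd.DataFrame(scaled, index=cleaned.index, columns=COLUMNS)


# Synthetic stand-in data (NOT values from the dataset): 96 hourly readings.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=96, freq=pd.Timedelta(hours=1))
at = rng.normal(20.0, 8.0, size=96)
raw = pd.DataFrame(
    {
        "AT": at,
        "BP": rng.normal(1013.0, 5.0, size=96),
        "DP": at - rng.uniform(2.0, 10.0, size=96),
        "V": rng.normal(55.0, 10.0, size=96),
        "EP": 480.0 - 1.5 * at + rng.normal(0.0, 3.0, size=96),
    },
    index=index,
)
raw.iloc[4:8] = np.nan  # simulate one four-hour block of missing readings

processed = preprocess(raw, pd.Timedelta(hours=4))

# Pearson correlation matrix between the four features and the target.
correlation = processed.corr()
print(correlation.round(2))
```

Because Min-Max scaling is a linear transformation of each column, the Pearson correlation matrix is the same whether it is computed before or after scaling.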

Institutions

Salahaddin University-Hawler

Categories

Energy Consumption

Licence

Creative Commons Attribution 4.0 International