PV SCADA data

Published: 3 June 2026 | Version 1 | DOI: 10.17632/9fdvyjrf67.1
Contributors:
Alireza Rostamipour, Amir Hossein Ghayeni

Description

This repository contains three complementary datasets collected from an operational utility-scale photovoltaic (PV) farm and used to develop and evaluate a reinforcement learning-based controller for PV plant optimization.

Dataset 1 – Inverter SCADA Data
Half-hourly records from eight string inverters, spanning 1 April to 29 June 2025. Variables include DC power (kW), AC power (kW), DC voltage (V), DC current (A), module temperature (°C), inverter efficiency (%), and inverter status codes. After preprocessing (timestamp standardization, nighttime removal that retains only records with GHI > 50 W/m², and short-gap filling), the dataset contains approximately 18,000 valid daytime samples per inverter.

Dataset 2 – Meteorological Data
Measurements at 15-minute resolution from three on-site weather stations, resampled to 30-minute intervals via linear interpolation to align with the SCADA timestamps. Variables include global horizontal irradiance (GHI, W/m²), direct normal irradiance (DNI, W/m²), diffuse horizontal irradiance (DHI, W/m²), ambient temperature (°C), wind speed (m/s), relative humidity (%), and additional environmental parameters. All timestamps are provided in UTC.

Dataset 3 – Grid Export Data
Half-hourly records of grid-side measurements, including exported energy (kWh), reactive power (kvar), curtailment flag, power factor, and frequency (Hz).

Preprocessing
The steps applied across all datasets are timestamp standardization, removal of nighttime records (GHI ≤ 50 W/m²), forward-filling of short gaps (at most 2 consecutive steps), and correction of temperature units. A refined Performance Ratio (PR) was computed per inverter following the IEC 61724 definition, yielding an average PR of 12.63% across all inverters (range: 11.31%–14.20%). This reflects real-world losses, including module temperature effects (average module temperature 43.2 °C), soiling, mismatch, and inverter efficiency variations.
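
For readers who want to recompute a PR from the inverter and irradiance records, the standard IEC 61724 ratio of final yield to reference yield can be sketched as below. This is a minimal illustration and not the authors' code: the function name, its parameters, and the fixed 0.5-hour step are assumptions, and the "refined" PR reported above may include corrections that are not reproduced here. IEC 61724 defines the reference yield from in-plane irradiance; which irradiance series was used for the reported values is not stated.

```python
def performance_ratio(ac_power_kw, irradiance_w_m2, rated_power_kw,
                      step_hours=0.5, g_stc_w_m2=1000.0):
    """Return PR = final yield / reference yield for one inverter.

    ac_power_kw      -- AC power samples (kW), one per time step
    irradiance_w_m2  -- irradiance samples (W/m^2) on the same time steps
    rated_power_kw   -- rated DC power of the array (kW); hypothetical input,
                        not listed among the dataset variables
    step_hours       -- sampling interval in hours (0.5 for half-hourly data)
    g_stc_w_m2       -- reference irradiance at standard test conditions
    """
    if len(ac_power_kw) != len(irradiance_w_m2):
        raise ValueError("power and irradiance series must have equal length")

    # Energy delivered over the period (kWh) and irradiation received (Wh/m^2).
    energy_kwh = sum(ac_power_kw) * step_hours
    irradiation_wh_m2 = sum(irradiance_w_m2) * step_hours

    # Both yields are expressed in equivalent hours.
    final_yield_h = energy_kwh / rated_power_kw
    reference_yield_h = irradiation_wh_m2 / g_stc_w_m2
    if reference_yield_h == 0:
        raise ValueError("reference yield is zero; PR is undefined")

    return final_yield_h / reference_yield_h
```

The function returns a fraction; multiply by 100 to compare with the percentages quoted in the description.
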
These datasets provide a realistic representation of utility-scale PV plant operating conditions and serve as the empirical basis for the reinforcement learning experiments described in the associated publication.
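
Two of the preprocessing steps named in the description, the daytime filter (GHI > 50 W/m²) and the forward-fill limited to 2 consecutive steps, can be sketched in plain Python as follows. The function names, the `ghi` record key, and the use of `None` for missing values are assumptions made for illustration; the published files may use different column names and the authors' pipeline may differ in detail. In particular, this sketch fills the first two steps of a longer gap and leaves the rest missing, which is one possible reading of "maximum 2 consecutive steps".

```python
GHI_THRESHOLD_W_M2 = 50.0   # daytime threshold stated in the description
MAX_FILL_STEPS = 2          # longest run of missing steps that is filled


def forward_fill(values, limit=MAX_FILL_STEPS):
    """Replace None entries with the last valid value, up to `limit` in a row."""
    filled = []
    last_valid = None
    missing_run = 0
    for value in values:
        if value is not None:
            last_valid = value
            missing_run = 0
            filled.append(value)
        else:
            missing_run += 1
            if last_valid is not None and missing_run <= limit:
                filled.append(last_valid)
            else:
                filled.append(None)  # leading gap or gap longer than the limit
    return filled


def keep_daytime(records, threshold=GHI_THRESHOLD_W_M2):
    """Keep only records whose GHI is strictly above the threshold."""
    return [r for r in records
            if r.get("ghi") is not None and r["ghi"] > threshold]
```

Applying `forward_fill` to each variable before calling `keep_daytime` is one reasonable ordering; the description does not state the order used.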

Categories

Photovoltaic Performance
