Dataset Name: PV Yield Forecasting Data Set

Data Format: CSV, consisting of time-stamped records.

Features:
  Time Series Data: Date and time stamp for each record, ranging from January 1990 to December 2014.
  PV Output Readings: Measurements of photovoltaic (PV) output in kWh.
  Environmental Factors: Solar irradiance, temperature, humidity, and cloud cover.
  Processed Features: Columns prepared as inputs for the XGBoost and LSTM models, including seasonal components.

Time Range and Frequency: Data spans 1990 to 2014 at hourly recording frequency.

Pre-processing and Cleaning: Anomalies and missing values have been handled, and features have been selected and cleaned for use as model input.

Purpose: The data is used to develop and test a novel forecasting framework that combines XGBoost, time series decomposition, and LSTM for both short-term and long-term solar yield prediction.

Performance Metrics: Includes columns for prediction accuracy and nRMSE values for the developed framework, along with the corresponding values for benchmark models.

Comparative Model Data: Results from LSTM, FRNN, NNE, and other models for comparative analysis.

Dataset Size and Scope: Contains approximately [X million] records, covering various geographical locations and types of PV plants.

Additional Notes: This dataset is part of a comprehensive study that aims to improve the integration of PV plant output with the main power grid, as detailed in our research paper.
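
Example (illustrative only): The description above reports nRMSE values but does not state how the RMSE is normalized. The sketch below shows one way such a value could be computed from observed and predicted PV output. The function name `nrmse`, the `norm` parameter, and the sample values are assumptions made for illustration; they are not taken from the dataset or the accompanying paper. Two common normalizations (range and mean of the observed values) are offered because the convention used in the study is not given here.

```python
import numpy as np


def nrmse(observed, predicted, norm="range"):
    """Root-mean-square error divided by a scale of the observed values.

    norm="range" divides by (max - min) of the observations,
    norm="mean" divides by their mean. Which convention the study
    uses is NOT stated in the dataset description (assumption).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")

    rmse = np.sqrt(np.mean((observed - predicted) ** 2))

    if norm == "range":
        scale = observed.max() - observed.min()
    elif norm == "mean":
        scale = observed.mean()
    else:
        raise ValueError("norm must be 'range' or 'mean'")

    if scale == 0:
        raise ValueError("normalization scale is zero")
    return rmse / scale


# Hypothetical hourly PV output values in kWh (not from the dataset).
observed = [0.0, 1.0, 3.0, 4.0]
predicted = [0.0, 2.0, 3.0, 3.0]

print(nrmse(observed, predicted, norm="range"))
print(nrmse(observed, predicted, norm="mean"))
```

When comparing against the benchmark values in the dataset, the same normalization should be applied to every model, since range-based and mean-based nRMSE are not interchangeable.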