High-Quality Crop Phenology NDVI Time Series from Copernicus HR-VPP
Description
This dataset contains high-quality NDVI time series derived from the “Copernicus High Resolution Vegetation Phenology and Productivity” (HR-VPP) product for extensive irrigated crops located in two major Mediterranean irrigation areas in northeastern Spain (Monegros and Zaidín, Aragón). The dataset covers six growing seasons (2018–2023) and includes 1,673 agricultural plots representing six cropping systems: barley, wheat, peas, maize monoculture, maize double cropping, and sunflower double cropping.
NDVI observations were extracted from Sentinel-2 HR-VPP products with a spatial resolution of 10 m and a temporal frequency of 5 days. A rigorous cloud-masking strategy based on the QFLAG2 quality layer was applied to retain only high-quality observations. Missing observations were reconstructed using a Kalman filter-based approach, evaluated in controlled simulations of missing data. In addition, the dataset incorporates a multi-level quality control workflow that combines clustering based on the dynamic time warping (DTW) technique, K-means clustering, and outlier detection methods based on functional depth (Fraiman-Muniz, h-modal, random projection, and Tukey depth) to remove anomalous phenological trajectories.
Two complementary datasets are provided in CSV format:
• Imputed NDVI time series that include both original and reconstructed observations.
• Filtered and labeled NDVI time series that contain only representative phenological trajectories and cluster assignments obtained using the DTW method.
The dataset is intended for use in crop phenology, remote sensing, precision agriculture, irrigation management, machine learning, time series analysis, and the validation of phenological methodologies or data imputation techniques.
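As an illustration of why DTW suits the comparison of phenological trajectories, the following minimal Python sketch computes a DTW distance between NDVI profiles. It is not the authors' code: the published workflow was implemented in R, and the local cost (absolute difference), the absence of a warping window, the function name `dtw_distance`, and all NDVI values below are assumptions made for this example.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with |x - y| as local cost (assumed)
    and no warping-window constraint (assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical NDVI profiles: "late" has the same shape as "early" but is
# shifted by one time step; "flat" has no seasonal cycle at all.
early = [0.2, 0.4, 0.8, 0.8, 0.4, 0.2]
late = [0.2, 0.2, 0.4, 0.8, 0.8, 0.4]
flat = [0.2] * 6

# Point-by-point comparison, for contrast with the warped alignment.
pointwise = sum(abs(x - y) for x, y in zip(early, late))

print(dtw_distance(early, late), pointwise, dtw_distance(early, flat))
```

In this toy case the shifted profile stays close to the reference under DTW while the flat profile does not, which is the property a clustering step needs when sowing dates differ between plots of the same crop.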
These data were generated as part of the doctoral thesis “Modelización y teledetección para la actualización de la demanda hídrica de cultivos en grandes zonas regables ante el reto de mejorar la gestión del agua en un contexto de cambio climático”, which is part of the project “Ajuste de ciclos y coeficientes de cultivos para la optimización de la gestión del agua en grandes zonas regables en un contexto de cambio climático” (LAIKcA; PID2021-124029OR-I00). Alexey Valero-Jorge gratefully acknowledges the PRE2022-102328 grant funded by MICIU/AEI/10.13039/501100011033 and FSE+.
Steps to reproduce
1. Download the Copernicus High Resolution Vegetation Phenology and Productivity (HR-VPP) NDVI products corresponding to Sentinel-2 tile 30TYM for the 2018–2023 agricultural seasons through the WEkEO platform.
2. Apply cloud masking using the QFLAG2 quality layer, retaining only pixels classified as “Clear Land” (QFLAG2 = 1).
3. Obtain agricultural plot boundaries and crop declarations from the official SIGPAC database for the study areas of Monegros and Zaidín (Aragón, Spain).
4. Select agricultural plots with:
   o consistent crop declarations during the study period,
   o and an area ≥ 5 ha to reduce mixed-pixel effects.
5. Extract plot-level NDVI values by calculating the median of valid Sentinel-2 pixels within each plot boundary for every acquisition date.
6. Construct NDVI time series for each plot and identify missing observations generated after cloud masking.
7. Perform missing-data imputation using the Kalman filter implemented in R. The imputation procedure was evaluated under controlled missing-data simulations (10–40% missing values).
8. Apply the first filtering stage using Dynamic Time Warping (DTW) distance and K-means clustering to identify divergent phenological trajectories.
9. Apply the second filtering stage using functional depth measures:
   o Fraiman-Muniz depth,
   o h-modal depth,
   o random projection depth,
   o and random Tukey depth.
10. Remove anomalous trajectories detected by the multi-level filtering workflow and retain only representative phenological series.
11. Export the final datasets in CSV format, including:
   • imputed NDVI time series,
   • filtered and labeled NDVI time series,
   • and DTW cluster assignments.
12. The workflow was implemented in R using the packages:
   • terra,
   • sf,
   • zoo,
   • dtw,
   • fda.usc,
   • dplyr,
   • tidyr,
   • purrr,
   • and TSGenerator.
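Steps 2, 5 and 6 (cloud masking, plot-level median, gap identification) can be sketched in a few lines of Python. This is a simplified illustration, not the authors' R implementation: it works on plain lists rather than rasters and plot polygons, the function names `plot_median_ndvi` and `build_series` are hypothetical, and the dates, pixel values and the non-clear flag value `2` are invented for the example. Only the rule "keep QFLAG2 = 1" comes from the steps above.

```python
from statistics import median

CLEAR_LAND = 1  # QFLAG2 value retained in step 2; every other value is masked


def plot_median_ndvi(ndvi_pixels, qflag2_pixels):
    """Median NDVI of the pixels flagged Clear Land within one plot.

    Returns None when cloud masking leaves no valid pixel for that date.
    """
    valid = [v for v, q in zip(ndvi_pixels, qflag2_pixels) if q == CLEAR_LAND]
    return median(valid) if valid else None


def build_series(acquisitions):
    """Build a date -> plot-level NDVI mapping and list the dates left empty."""
    series = {
        date: plot_median_ndvi(ndvi, flags) for date, ndvi, flags in acquisitions
    }
    gaps = [date for date, value in series.items() if value is None]
    return series, gaps


# Hypothetical plot with four pixels and three acquisitions 5 days apart.
# The flag value 2 is an arbitrary stand-in for "not Clear Land".
acquisitions = [
    ("2021-05-01", [0.61, 0.65, 0.63, 0.10], [1, 1, 1, 2]),
    ("2021-05-06", [0.12, 0.09, 0.11, 0.10], [2, 2, 2, 2]),
    ("2021-05-11", [0.70, 0.74, 0.72, 0.71], [1, 1, 2, 1]),
]

series, gaps = build_series(acquisitions)
print(series)
print(gaps)
```

The dates collected in `gaps` are the observations that the imputation in step 7 would then have to reconstruct.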
Institutions
- Centro de Investigación y Tecnología Agroalimentaria de Aragón, Aragon, Zaragoza
- Universidad de Zaragoza, Aragon, Zaragoza
Funders
- Agencia Estatal de Investigación, Ministerio de Ciencia, Innovación y Universidades, Madrid. Grant ID: PRE2022-102328
- Agencia Estatal de Investigación, Ministerio de Ciencia, Innovación y Universidades, Madrid. Grant ID: PID2021-124029OR-I00