Data, R Scripts and Random Forest Models for Winter Catch Crop Monitoring from Sentinel-2 NDVI Time Series in Germany
Description
This dataset supplements the article "Large-scale winter catch crop monitoring with Sentinel-2 time series and machine learning–An alternative to on-site controls?" in Computers and Electronics in Agriculture by C. Schulz, A.-K. Holtgrave, and B. Kleinschmit (2021) (https://doi.org/xxxxxxxx). It consists of a zip file with the following folders: data (parcels, filled and unfilled time series tables, feature extraction results, and prediction results; csv, shp), model (random forest models for catch crop prediction; rds), and R (R script files for random forest model training and prediction with RStudio; r). The algorithms and random forest (RF) models developed for this study were integrated as Docker containers into the timeStamp software prototype, which allows large-scale automated catch crop analysis at the parcel level (timestamp.lup-umwelt.de). This software stores the raster data from the GTS² archive as parcel-wise clipped image time series in a PostGIS database. All further processing steps were performed with the statistical computing language R (RStudio Team, 2020). For raster data manipulation within the PostGIS database and for downloading NDVI time series, we used the packages rpostgis (Bucklin and Basille, 2019) and RPostgreSQL (Conway et al., 2017). For time series filling and calculation of the predictors, we used the packages zoo (Zeileis et al., 2020), hydroGOF (Zambrano-Bigiarini, 2020), tsoutliers (de Lacalle, 2019), and changepoint (Killick et al., 2016). For RF modelling, we used the package caret (Kuhn et al., 2020).
Steps to reproduce
To extract NDVI time series vectors from the GTS² archive, the following steps were conducted for each parcel: a) download of the Sentinel-2 (S2) image time series, b) calculation of the NDVI for each image, c) computation of the mean NDVI pixel value, d) calculation of an equidistant time series with daily time steps, and e) linear temporal interpolation of the missing values within the time series. Steps d) and e) produced two univariate NDVI time series tables with daily resolution, each keyed by a unique parcel ID. One table contains only the real observations (i.e., the unfilled NDVI time series). This table was used for quality assessment regarding the number of available observations and the length of data gaps within the time series. The other table contains the real observations together with the interpolated values (i.e., the filled NDVI time series). Filling the gaps is necessary for the uniform calculation and comparison of temporal metrics across datasets with different observation dates and observation gaps.
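Steps d) and e), together with the quality assessment on the unfilled series, can be sketched for a single parcel as follows. This is a minimal illustration and not the script shipped in the R folder: the observation dates and NDVI values are invented, all variable names are hypothetical, and base R's approx() stands in for the zoo-based filling used in the study so that the example runs without additional packages.

```r
# Hypothetical mean NDVI observations for one parcel (invented values).
obs <- data.frame(
  date = as.Date(c("2019-09-01", "2019-09-06", "2019-09-16", "2019-09-21")),
  ndvi = c(0.30, 0.40, 0.60, 0.50)
)

# Step d) equidistant daily time series; days without an observation stay NA.
days <- seq(min(obs$date), max(obs$date), by = "day")
unfilled <- rep(NA_real_, length(days))
unfilled[match(obs$date, days)] <- obs$ndvi

# Quality assessment on the unfilled series:
# number of real observations and length of the longest data gap in days.
n_obs <- sum(!is.na(unfilled))
runs <- rle(is.na(unfilled))
longest_gap <- if (any(runs$values)) max(runs$lengths[runs$values]) else 0L

# Step e) linear temporal interpolation of the missing days.
# Assumption: base R approx() is used here in place of the zoo package.
filled <- approx(
  x = as.numeric(obs$date),
  y = obs$ndvi,
  xout = as.numeric(days)
)$y

# One row per day, mirroring the unfilled and filled tables described above.
parcel_series <- data.frame(date = days, ndvi_unfilled = unfilled, ndvi_filled = filled)
```

Keeping both columns side by side makes it easy to check that interpolation only touches days without a real observation, which is why the unfilled series remains the basis for counting observations and measuring gaps.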