Subsets according to time windows for Coffee Leaf Rust Incidence modeling. Paper: Discovering weather periods and crop properties favorable for coffee rust incidence from feature selection approaches

Published: 08-06-2021 | Version 1 | DOI: 10.17632/wpy54dw6t7.1


The climate dataset was processed to generate four data subsets corresponding to four time windows: 3, 4, 7, and 14 consecutive days. We use time windows to split each climate variable into consecutive sub-periods within the main period of 14 days before the date of prediction (DP), and each sub-period becomes a new attribute. The number of sub-periods depends on the window size: a window of w days yields 14 - w + 1 sub-periods per climatic variable, e.g., the 4-day window generates 11 new sub-periods. The index of each attribute indicates the days covered by the window, e.g., tMin11-8 corresponds to the minimum temperature from day 11 to day 8 before DP.

We called the generated subsets 3D, 4D, 7D, and 14D. Each subset has 439 instances, and its dimension depends on the window size: 14D has 13 variables (8 climatic), 7D has 69 (64 climatic), 4D has 93 (88 climatic), and 3D has 101 (96 climatic). In every subset, the remaining five columns are the target and the four non-climatic predictors.

The target variable is the predicted Coffee Leaf Rust Incidence (pCLRI). The predictors are the remaining experiment variables: current CLRI (cCLRI), shade, host growth (hGrowt), and management (mgmt), plus the climatic variables: maximum (tMax), minimum (tMin), and daily average (tAvg) air temperature; average (hAvg) and minimum (hMin) relative humidity; daily precipitation (pre); thermal amplitude (tAmp), the difference between the maximum and minimum temperatures; and a rainy-day indicator (rDay), which marks each day as rainy or not (precipitation greater than or equal to 1 mm). The files contain no null values.
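The windowing scheme can be illustrated with a minimal sketch that reproduces the attribute naming (e.g. `tMin11-8`) and the climatic feature counts per subset (96, 88, 64, and 8). It rests on several assumptions not stated in the description: index 0 of each daily series is the day immediately before DP; each window is summarised with its mean (the actual aggregation rule is not given); and the names `add_derived_variables`, `window_features`, and `raw` are hypothetical. The weather values are synthetic placeholders, not data from the dataset.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

MAIN_PERIOD = 14  # days in the main period before the date of prediction (DP)
WINDOW_SIZES = (3, 4, 7, 14)  # windows behind the 3D, 4D, 7D and 14D subsets


def add_derived_variables(daily: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Add thermal amplitude (tAmp) and the rainy-day flag (rDay) to the daily series."""
    out = dict(daily)
    out["tAmp"] = [hi - lo for hi, lo in zip(daily["tMax"], daily["tMin"])]
    # A day counts as rainy when precipitation is greater than or equal to 1 mm.
    out["rDay"] = [1 if p >= 1.0 else 0 for p in daily["pre"]]
    return out


def window_features(
    daily: Dict[str, Sequence[float]],
    window: int,
    agg: Callable[[Sequence[float]], float] = mean,  # ASSUMPTION: aggregation rule
) -> Dict[str, float]:
    """Summarise every variable over consecutive sub-periods of the main period.

    ASSUMED layout: daily[var][i] holds the value on day i + 1 before DP.
    """
    features: Dict[str, float] = {}
    for var, values in daily.items():
        if len(values) != MAIN_PERIOD:
            raise ValueError(f"{var}: expected {MAIN_PERIOD} daily values")
        # Slide the window one day at a time: MAIN_PERIOD - window + 1 sub-periods.
        for start in range(MAIN_PERIOD, window - 1, -1):
            end = start - window + 1
            chunk = values[end - 1:start]  # days end..start before DP
            features[f"{var}{start}-{end}"] = agg(chunk)
    return features


# Synthetic (hypothetical) 14-day series for the six measured climatic variables.
raw = {
    "tMax": [26.0] * 14,
    "tMin": [16.0] * 14,
    "tAvg": [21.0] * 14,
    "hAvg": [80.0] * 14,
    "hMin": [55.0] * 14,
    "pre": [0.0, 2.5] * 7,
}
daily = add_derived_variables(raw)  # 8 climatic variables in total

for w in WINDOW_SIZES:
    print(w, len(window_features(daily, w)))  # prints 3 96, 4 88, 7 64, 14 8
```

Adding the four non-climatic predictors and the target (five columns) to these counts gives the subset dimensions of 101, 93, 69, and 13. A different aggregation, such as a sum for rDay to count rainy days, can be passed through `agg` without changing the windowing logic.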