Phenotypic, Genotypic and Environmental Data: Non-linear kernels, dominance, and envirotyping data increase the accuracy of genome-based prediction in multi-environment trials
The phenotypic data consist of grain yield (ton/ha) records from two data sets of tropical maize hybrids in Brazil (HEL and USP). HEL includes a set of 247 maize hybrids from a core of 452 F1 hybrids obtained by crossing 106 inbred lines. These hybrids were evaluated in 2015 at five sites in Brazil (S1-S3 in the southern region and S4-S5 in the mid-west region). USP includes a set of 570 single-cross hybrids evaluated across eight environments (E1-E8), formed by combining 2 locations, 2 years, and 2 nitrogen levels.

For both sets, the parent lines were genotyped with an Affymetrix Axiom Maize Genotyping Array of 616 K SNPs (Single Nucleotide Polymorphisms) (Unterseer et al. 2014). Standard quality controls (QC) were then applied to the data by removing markers with a Call Rate < 0.95. After this step, the remaining missing data in the lines were imputed with the Synbreed package (Wimmer et al. 2012) using the algorithms from the Beagle 4.0 software (Browning and Browning 2008). Finally, markers with a Minor Allele Frequency (MAF) ≤ 0.05 were removed, resulting in a total of 52,811 high-quality SNPs for HEL and 54,113 high-quality SNPs for USP.

The envirotyping pipeline used in this study was based on the core functions of the R package EnvRtype (available at https://github.com/allogamous/EnvRtype [verified 5 July 2020]). For the function W.matrix(), described on the package webpage, we used five time intervals delimited by the breakpoints (0, 14, 35, 65, 90, 120). A total of 248 envirotype covariables was obtained for USP and 243 for HEL.
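The marker filtering and the time-interval definition described above can be illustrated with a minimal Python sketch. It is not the code used to build these data sets, which relied on Synbreed, Beagle, and EnvRtype in R. The sketch assumes a 0/1/2 allele-dosage matrix with lines in rows and markers in columns, assumes that markers are kept when their call rate is at least 0.95 and their MAF is above 0.05, and uses column-mean imputation only as a placeholder for the Beagle algorithms. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def call_rate(genotypes):
    """Fraction of non-missing calls per marker (markers are columns)."""
    return 1.0 - np.isnan(genotypes).mean(axis=0)


def minor_allele_frequency(genotypes):
    """MAF per marker for 0/1/2 dosage coding, ignoring missing calls."""
    freq = np.nanmean(genotypes, axis=0) / 2.0
    return np.minimum(freq, 1.0 - freq)


def quality_control(genotypes, min_call_rate=0.95, min_maf=0.05):
    """Call-rate filter, placeholder imputation, then MAF filter."""
    kept = genotypes[:, call_rate(genotypes) >= min_call_rate]
    # Assumption: column-mean imputation stands in for Beagle here.
    col_means = np.nanmean(kept, axis=0)
    imputed = np.where(np.isnan(kept), col_means, kept)
    # Markers with MAF <= min_maf are removed.
    return imputed[:, minor_allele_frequency(imputed) > min_maf]


def time_windows(breakpoints):
    """Turn n + 1 ordered breakpoints into n consecutive (start, end) windows."""
    return list(zip(breakpoints[:-1], breakpoints[1:]))


# Hypothetical toy matrix: 4 lines x 4 markers, NaN marks a missing call.
toy = np.array([
    [0, 2, 0, np.nan],
    [1, 2, 0, np.nan],
    [2, 2, 0, 1],
    [1, 2, 1, 0],
], dtype=float)

filtered = quality_control(toy)
windows = time_windows((0, 14, 35, 65, 90, 120))
```

In this toy case the fourth marker fails the call-rate filter and the second marker is monomorphic, so two markers remain. The six breakpoints yield five consecutive windows, which matches the five time intervals stated above.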
Steps to reproduce
All code and steps are available at https://github.com/gcostaneto/KernelMethods