Data and scripts for the study “Combining genotyping approaches improves resolution for association mapping: a case study in tropical maize under water stress conditions”

Published: 31 May 2024 | Version 1 | DOI: 10.17632/6pb9prrbbb.1
Júlio César DoVale, Roberto Fritsche-Neto


This dataset comprises all R code and the phenotypic and molecular data necessary to replicate this study. In short: Genome-wide Association Studies (GWAS) identify genomic variations related to specific phenotypes, typically analyzed with Single Nucleotide Polymorphism (SNP) markers. Genotyping platforms such as genomic hybridization microarrays (SNP-Chip or SNP-Array) or genotyping-by-sequencing (GBS) are effective in genotyping many samples with hundreds of thousands of SNPs. However, these approaches can introduce bias in analyses of tropical maize germplasm, because the temperate line B73 is commonly used as the reference genome. An alternative that overcomes this limitation is a simulated genome called “Mock,” adapted to the population and created with bioinformatics tools. A few recent studies have shown that SNP-Array, GBS, and Mock yield similar results for population structure, definition of heterotic groups, tester selection, and genomic hybrid prediction. However, no studies have been identified thus far comparing the results generated by these different genotyping approaches for GWAS. Therefore, this study aims to test the equivalence among the three genotyping scenarios in identifying genes with significant effects in GWAS. To achieve this, maize was used as the model species, with 360 inbred lines from a public panel genotyped both by SNP-Array (Affymetrix platform) and by GBS. The GBS data were used to perform SNP calling with the temperate inbred line B73 as the reference genome (GBS-B73) and with a simulated genome “Mock” obtained in silico (GBS-Mock). The study encompassed four above-ground traits, with plants grown under two levels of water supply: well-watered (WW) and water-stressed (WS). In total, 46, 34, and 31 SNPs were identified in the SNP-Array, GBS-B73, and GBS-Mock scenarios, respectively, across the two water levels.
Overall, the identified candidate genes varied across scenarios but had the same functionality. For SNP-Array and GBS-B73, genes with functional similarity were identified even when the physical positions of the SNPs did not coincide. These genes and regions are involved in various processes and responses with applications in plant breeding. In terms of accuracy, combining genotyping scenarios rather than using each in isolation is feasible and recommended, as it increased accuracy for all traits under both water supply conditions. In this sense, the combination of the GBS-B73 and GBS-Mock scenarios is worth highlighting, not only because it increases the resolution of GWAS results but also because it reduces genotyping costs and makes it possible to conduct genomic breeding methods.



Universidade Federal do Ceara


Quantitative Genetics, Plant Breeding