Genome-Wide Association Study of Metabolic Traits in the Duckweed Spirodela polyrhiza

Published: 27 July 2024 | Version 1 | DOI: 10.17632/xwsxpfcysd.1

Description

This dataset archives the raw data and scripts for a genome-wide association study (GWAS) and downstream analyses of metabolite contents in the giant duckweed Spirodela polyrhiza. In our study, we aim to identify the genetic basis of free metabolite contents and growth in S. polyrhiza. A total of 42 free metabolites were extracted from 137 genotypes, quantified using LC-MS, and correlated with fitness parameters. Genetic associations with these metabolic traits were determined using GWAS, and candidate gene expression was checked with qPCR. All scripts were written in R (version 4.2.0), and all GWAS were run with the vcf2gwas platform in a conda environment on macOS (version 10.14.6).

Steps to reproduce

The folder Figure2 contains raw data files and an R script for PCA and correlation analysis of metabolic traits, together with all PDF files created by these analyses:
correlation_137_genotypes.txt: input data file containing mean genotype values for fitness parameters and free metabolite concentrations for correlation analysis
pop_growth_PCA_98_genotypes.txt: input data file for PCA of population effects on fitness parameters
pop_metabolites_PCA_97_genotypes.txt: input data file for PCA of population effects on free metabolite levels

The folder Figure3 contains text files and an R script used to generate Manhattan and QQ plots from previous GWAS on selected free metabolite contents. The generated Manhattan and QQ plots are stored as PDF files in the same folder.

The folder Figure4 contains text files and an R script used to generate Manhattan and QQ plots from previous GWAS, to quantify genetic marker effects and to analyse gene expression. Raw RT-qPCR gene expression data are stored in the subfolder raw_data, whereas the subfolder primer_efficiency_calculation contains a script and files used to calculate primer efficiencies for RT-qPCR:
L.Glutamine_umol_g_DW_mod_sub_metabolites_97_genotypes_SVs_reformatted_new_imputed.assoc.txt and L.Serine_umol_g_DW_mod_sub_metabolites_97_genotypes_SVs_reformatted_new_imputed.assoc.txt: input data files for generating Manhattan and QQ plots from structural-variation-based GWAS on serine and glutamine contents
SV_effects.txt: input data for relating the presence of a significant genetic marker to plant phenotype
qPCR_data_root_frond.txt and qPCR_inter_plate_calibration.txt: data files used to quantify expression of the candidate gene SpUBP7

The folder GWAS contains example code used for the GWAS analysis and four subfolders. Among them, SNPs and SVs contain all output files from SNP-based and structural-variation-based GWAS, respectively, on fitness parameters and free metabolite levels. Genotypic and phenotypic information on the analysed genotypes is stored in the folders genotypic_data and phenotypic_data, respectively.

The folder heritability analysis contains an R script and data files used to quantify broad-sense heritability of metabolic traits.

The folder metabolite_quantification contains Excel files used to quantify free metabolite contents. The subfolder raw_data contains all measurement files from the LC-MS analysis of metabolite contents. Files with the integrated peak areas are stored in the folder peak_areas.

The folder Supplemental_figures contains R scripts and data files for analysing trait distributions (subfolder FigureS1-4), correlations between fitness data and metabolite levels (subfolders FigureS5, FigureS6 and FigureS7) and clustering of metabolite classes (FigureS8).
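
For orientation, broad-sense heritability of a trait is commonly estimated from replicated genotype measurements as H2 = Vg / (Vg + Ve), where Vg is the variance among genotypes and Ve the residual variance. The Python sketch below illustrates one such estimate using a balanced one-way ANOVA. It is not taken from the archived R script, whose exact model is not described here; the function name, the input layout and the example values are assumptions made purely for illustration.

```python
from statistics import mean


def broad_sense_heritability(replicates_by_genotype):
    """Estimate H2 = Vg / (Vg + Ve) from a balanced one-way ANOVA.

    replicates_by_genotype maps a genotype label to a list of replicate
    measurements of one trait. This hypothetical layout requires the same
    number of replicates for every genotype.
    """
    groups = list(replicates_by_genotype.values())
    g = len(groups)                      # number of genotypes
    r = len(groups[0]) if groups else 0  # replicates per genotype
    if g < 2 or r < 2 or any(len(v) != r for v in groups):
        raise ValueError("need >= 2 genotypes with equal replicate counts >= 2")

    grand_mean = mean(x for v in groups for x in v)

    # Between-genotype and within-genotype sums of squares.
    ss_between = r * sum((mean(v) - grand_mean) ** 2 for v in groups)
    ss_within = sum((x - mean(v)) ** 2 for v in groups for x in v)

    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (r - 1))

    # Method-of-moments variance components; a negative Vg is truncated to 0.
    var_e = ms_within
    var_g = max((ms_between - ms_within) / r, 0.0)

    total = var_g + var_e
    return var_g / total if total > 0 else float("nan")


# Made-up example values, not taken from the dataset.
example = {"A": [1, 3], "B": [5, 7], "C": [9, 11]}
print(round(broad_sense_heritability(example), 3))  # 0.882
```

This version reports heritability on a single-measurement basis. Studies that analyse genotype means often divide Ve by the number of replicates instead, and unbalanced designs are usually handled with a mixed model rather than the simple ANOVA shown here.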

Institutions

Westfälische Wilhelms-Universität Münster, Johannes Gutenberg-Universität Mainz

Categories

Aquatic Plant, Metabolomics, Quantitative Genetics

Funding

Deutsche Forschungsgemeinschaft

427577435
