Data for: Genotyping-by-sequencing and ecological niche modeling illuminate phylogeography, admixture, and Pleistocene range dynamics in quaking aspen (Populus tremuloides)

Published: 13-04-2020 | Version 3 | DOI: 10.17632/jhkhvdgyfy.3
Justin Bagley,
Neander Heming,
Eliecer Gutierrez


In support of the manuscript by Bagley et al. (2020; see below) on quaking aspen phylogeography and ecological niche modeling (ENM), this accession provides:

1) the in-house laboratory protocol used to extract DNA from aspen leaf tissues (modified from Strauss Lab);
2) the Supporting Information files for the corresponding manuscript (Bagley et al. 2020);
3) code used to conduct independent runs of the TASSEL-GBSv2 SNP discovery pipeline (Glaubitz et al. 2014) on our final (combined) genotyping-by-sequencing (GBS) dataset;
4) resulting SNP variant files from TASSEL-GBSv2 and final filtered variant call format (VCF) and genotype data files used during our genomic analyses;
5) R script and metadata used during basic population genomics analyses of the final filtered SNP data (final VCF file); and
6) unfiltered vs. filtered species occurrence data files and computer code (R scripts) used during our ENM analyses of our focal taxon, Populus tremuloides.

REFERENCES

Bagley, J. C., Heming, N. M., Gutiérrez, E. E., Devisetty, U. K., Mock, K. E., Eckert, A. J., & Strauss, S. H. (2020). Genotyping-by-sequencing and ecological niche modeling illuminate phylogeography, admixture, and Pleistocene range dynamics in quaking aspen (Populus tremuloides). Ecology and Evolution.

Glaubitz, J. C., Casstevens, T. M., Lu, F., Harriman, J., Elshire, R. J., Sun, Q., & Buckler, E. S. (2014). TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One, 9(2), e90346.


Steps to reproduce

OVERVIEW

Molecular laboratory methods:
-------------------------------------
- Conduct DNA extraction from leaf tissues as described in the enclosed in-house protocol
- Follow other methods for sequencing and dataset construction listed in the Materials and Methods section of the manuscript

Genomic data analyses involved:
-------------------------------------
- Preparing code and data files, installing requisite software, and setting up directory structure on local machines (Mac) and a high-performance supercomputing cluster (Linux)
- Running the SNP discovery pipeline TASSEL-GBSv2 on our final dataset (raw sequence files from our GBS experiment and from plates sequenced by Schilling et al. 2014; see text for details)
- Conducting various phylogenomic and population genetic analyses in TreeMix and R
- Plotting results, exploring the data/results, and conducting statistical analyses in R
- Running analyses to estimate sample ploidy levels from mapped NGS reads in the final dataset using the program nQuire
- Re-running some analyses of population structure and genetic diversity clines after removal of putative polyploid individuals (triploids, tetraploids) to assess potential impacts of polyploidy on inferences

Ecological niche modeling analyses involved:
-------------------------------------
- Preparing code and installing software on local machines (Mac)
- Preparing the environmental data in R
- Preparing the occurrence data in R
- Preparing minimum convex polygons (MCPs) for the full-species and cluster datasets, and a minimum concave polygon (MCcP) for the full-species occurrence data
- Extracting cluster coordinates from within MCP-based calibration areas (see text, R scripts, Appendix S1, and Data S2 for details)
- Tuning MaxEnt model parameters (feature classes, FCs, and regularization multipliers, RM) using the ENMevaluate function of ENMeval
- Running final ENMs on the species and cluster datasets, using parameters selected in ENMeval
- Projecting the final ENMs onto different climate scenarios (time-slices)
- Plotting results and calculating metrics describing the models

As mentioned in the text and shown in the R script files, the R analyses relied largely on the R packages raster, ENMwizard, and ENMeval. For the ENM analyses, only the occurrence files and R scripts, together with the climate/paleoclimate data layers, are needed to replicate our results, because the results files and other analysis files (calibration area shapefiles) are all generated by the R scripts.

Other information necessary for reproducing our analyses is provided in the main text, in Appendix S1 and the Data S1 and S2 files of the Supporting Information, and in the README file in this accession. Contact the corresponding author (JCB) for additional information on files or analyses.
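ILLUSTRATIVE SKETCH (not part of the accession)

The MCP steps listed under the ecological niche modeling analyses can be pictured with the short Python sketch below. It is not the authors' code: the actual analyses were run in R with the scripts in this accession. The sketch builds a minimum convex polygon around a set of occurrence points (using Andrew's monotone chain algorithm, chosen here for brevity) and then keeps only the candidate points that fall inside it. All coordinates and function names are hypothetical, and longitude/latitude are treated as planar coordinates, which is a simplification.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product OA x OB (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def minimum_convex_polygon(points: List[Point]) -> List[Point]:
    """Return the convex hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it would create a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def inside_convex(polygon: List[Point], p: Point) -> bool:
    """True if p is inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(polygon)
    return all(cross(polygon[i], polygon[(i + 1) % n], p) >= 0 for i in range(n))


# Hypothetical occurrence records; these values are NOT from the accession.
occurrences = [(-120.0, 40.0), (-110.0, 45.0), (-100.0, 50.0),
               (-105.0, 38.0), (-112.0, 42.0), (-100.0, 40.0)]
mcp = minimum_convex_polygon(occurrences)

# Hypothetical candidate coordinates to be restricted to the MCP.
candidates = [(-108.0, 43.0), (-90.0, 30.0)]
kept = [p for p in candidates if inside_convex(mcp, p)]
print(mcp)
print(kept)
```

For real occurrence data, a geographic library that handles map projections would be the safer choice; the planar version above is only meant to show the logic of building a calibration area and extracting the points within it.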