POPULATION GENETIC STRUCTURE OF TWO CRYPTIC DUCKWEED SPECIES
The data presented here accompany the study "Population Genetic Structure of Two Cryptic Duckweed Species (Lemna minor & L. turionifera) in Alberta Using a Genotyping-by-Sequencing Approach," which is about to be published in the journal Aquatic Botany. We used genotyping-by-sequencing to investigate the population genetic structure of two duckweed species, Lemna minor and L. turionifera Landolt, in Alberta. The analysis uses four datasets, each provided as a .vcf file:
Lemna_minor_&_turionifera_lm550refgenom: samples of both L. minor and L. turionifera, aligned to the L. minor 5500 reference genome (An et al., 2018).
Lemna_minor_lm550refgenom: L. minor samples only, aligned to the L. minor 5500 reference genome (An et al., 2018).
Lemna_turionifera_lm550refgenom: L. turionifera samples only, aligned to the L. minor 5500 reference genome (An et al., 2018).
Lemna_minor_lm8627refgenom: L. minor samples only, aligned to the L. minor 8627 reference genome (An et al., 2018).
Samples of both species were collected from various locations across six watersheds and 12 river basins in Alberta.
For more information, please contact the corresponding author, Kanishka M. Senevirathna, firstname.lastname@example.org
Steps to reproduce
1. Sampling
To investigate potential correlations between the population structure of Lemna populations and geographical and environmental factors, we collected samples of Lemna minor and L. turionifera from various locations in six watersheds and 12 river basins in Alberta (refer to Figure 1 and Supplementary Table 1). Sampling was a collaborative effort between the Alberta Biodiversity Monitoring Institute and researchers from the University of Lethbridge (see Senevirathna et al., 2021).
2. DNA Extraction and Genotyping-by-Sequencing (GBS)
For GBS analysis, we extracted DNA from fronds of 48 L. minor and 144 L. turionifera individuals using the Geneaid Genomic DNA Mini Kit (Plant: GP100; FroggaBio Inc.). A total of 192 Lemna DNA samples from 50 locations were then sent to the Genomic Analysis Platform at Université Laval (Quebec City, QC, Canada) for library preparation using the PstI and MspI enzymes. Sequencing was performed on an Ion Torrent Ion S5 sequencer with 540 chips, following the protocol described by Abed et al. (2019).
3. SNP Discovery
To identify single nucleotide polymorphisms (SNPs), we used the Fast-GBS pipeline (Torkamaneh et al., 2017). After demultiplexing with Sabre, we used Cutadapt (Martin, 2011) to remove adapter sequences, primers, and other unwanted sequences from the high-throughput sequencing reads. We then aligned the reads with the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2010) against three reference genomes: L. minor 5500, L. minor 8627, and L. gibba 7742 (An et al., 2018). The results were similar across all three reference genomes. Therefore, we selected the L. minor 8627 (800 Mb) genome sequence as the reference for subsequent analyses. SAMtools was used for file conversion and indexing (Li, 2011).
Post-processing of the aligned reads, haplotype construction, and variant calling were performed using Platypus (Rimmer et al., 2014). The settings included a minimum depth of coverage (minDP ≥ 2), a maximum number of alignment mismatches (n = 5), a maximum threshold for missing data (MaxMD = 50%), and a minimum minor allele frequency (MinMAF ≥ 0.05). Filtered SNP files were generated using VCFtools v. 0.1.11 (Danecek et al., 2011). To construct the final datasets, we removed SNPs with over 50% missing data (using the max-missing option of vcftools: "--max-missing 0.5") and individuals with more than 40% missing SNPs. Three datasets were subsequently produced: the first dataset contained both L. minor and L. turionifera, while the remaining two datasets contained each species separately (L. minor or L. turionifera).
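The two missing-data thresholds described above (SNPs with over 50% missing data removed, then individuals with more than 40% missing SNPs removed) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on VCFtools. The function name filter_genotypes, its parameter names, the toy genotype matrix, and the order in which the two filters are applied are all assumptions made for illustration.

```python
# Illustrative sketch only: the study used VCFtools for filtering.
# Names, toy data, and filter order are hypothetical.

MISSING = {"./.", ".|.", "."}  # common VCF encodings of a missing genotype


def site_call_rate(row):
    """Fraction of individuals with a called (non-missing) genotype at one SNP."""
    return sum(gt not in MISSING for gt in row) / len(row)


def filter_genotypes(genotypes, samples, min_site_call_rate=0.5, max_ind_missing=0.4):
    """Filter a SNP-by-individual genotype matrix on missing data.

    genotypes: list of rows, one per SNP, each a list of genotype strings.
    samples:   list of sample names, one per column.
    """
    # Step 1: keep SNPs whose call rate is at least the threshold
    # (the same sense as vcftools "--max-missing 0.5").
    kept_sites = [row for row in genotypes if site_call_rate(row) >= min_site_call_rate]

    # Step 2: drop individuals with too many missing SNPs among the kept sites.
    n_sites = len(kept_sites)
    keep_cols = []
    for j in range(len(samples)):
        n_missing = sum(row[j] in MISSING for row in kept_sites)
        if n_sites == 0 or n_missing / n_sites <= max_ind_missing:
            keep_cols.append(j)

    kept_samples = [samples[j] for j in keep_cols]
    filtered = [[row[j] for j in keep_cols] for row in kept_sites]
    return filtered, kept_samples


# Toy example with four hypothetical individuals and four SNPs.
samples = ["A", "B", "C", "D"]
genotypes = [
    ["0/0", "0/1", "./.", "1/1"],  # 75% called: kept
    ["./.", "./.", "./.", "0/1"],  # 25% called: removed
    ["0/1", "0/0", "./.", "0/0"],  # 75% called: kept
    ["1/1", "./.", "0/0", "0/1"],  # 75% called: kept
]
filtered, kept_samples = filter_genotypes(genotypes, samples)
print(kept_samples)  # ['A', 'B', 'D']
```

In the toy example, individual C is missing at two of the three retained SNPs (about 67%), so it exceeds the 40% threshold and is dropped, while individual B (one of three missing, about 33%) is retained.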
Alberta Conservation Association Research Grant
015-00-90-281 [RAL and TMB]
Natural Sciences and Engineering Research Council of Canada Discovery Grants
RGPIN-2015-05486 [RAL] and 2019-05068 [TMB]