VCF files of LD-pruned genome-wide SNPs and InDels in abaca (Musa textilis Née)
The study that produced this dataset aimed to discover and analyze polymorphisms in abaca (Musa textilis) that are vital for varietal authentication and for cross-species genotyping of advantageous traits such as disease resistance, climate change resilience and enhanced agronomic performance. This dataset contains the resulting genotype calls in abaca (within M. textilis and between M. textilis and other Musa spp.), stored in variant call format (VCF) files. The genotypes are in the form of SNPs or InDels. VCF files whose names start with 'Mtextilis' contain genotypes mined within M. textilis, and those starting with 'Musa' contain genotypes mined between Musa spp. VCF files containing only SNPs or only InDels carry the 'SNPsonly' and 'Indelsonly' labels, respectively. The 'AP' label indicates that the reference genome used for mapping and variant calling is a polished version of Galvez et al. (2020)'s reference genome. The 'minQ40' label indicates that the VCF file contains only SNPs and InDels with a mapping quality of at least 40. The 'geno0.1' and 'pruned' labels indicate that only variants with at most 10% missing genotypes, and that were retained (pruned in) under the linkage disequilibrium thresholds, were selected.
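As an illustration of the labeling scheme above, the following minimal Python sketch decodes the labels found in a file name. The function name describe_vcf_name and the example file name are hypothetical; the actual order of the labels and the separators used in the dataset's file names are not stated here, so the sketch only checks whether each label is present.

```python
def describe_vcf_name(filename: str) -> dict:
    """Decode the dataset's file-name labels into a small description.

    Only the labels documented in the description are recognised; the
    label order and separators are assumptions, so plain substring
    checks are used instead of positional parsing.
    """
    info = {}

    # The prefix tells whether genotypes were mined within M. textilis
    # or between Musa spp.
    if filename.startswith("Mtextilis"):
        info["scope"] = "within M. textilis"
    elif filename.startswith("Musa"):
        info["scope"] = "between Musa spp."
    else:
        info["scope"] = "unknown"

    # Variant type label.
    if "SNPsonly" in filename:
        info["variant_type"] = "SNP"
    elif "Indelsonly" in filename:
        info["variant_type"] = "InDel"
    else:
        info["variant_type"] = "unknown"

    # Remaining labels are simple presence flags.
    info["polished_reference"] = "AP" in filename
    info["min_mapping_quality_40"] = "minQ40" in filename
    info["max_10pct_missing"] = "geno0.1" in filename
    info["ld_pruned"] = "pruned" in filename
    return info


# Hypothetical file name, built only from the documented labels.
example = "Mtextilis_AP_SNPsonly_minQ40_geno0.1_pruned.vcf"
print(describe_vcf_name(example))
```

Substring checks keep the sketch independent of the exact naming layout, at the cost of possible false matches if a label happens to occur inside another part of a file name.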
Steps to reproduce
SNPs and InDels were mined in abaca through whole-genome resequencing of 11 abaca varieties and accessions. The resulting reads, together with the M. textilis and Musa spp. sequence reads produced by Sambles et al. (2020), were mapped to the abaca reference genome (Galvez et al., 2021), and variants were called using the BWA, Samtools and BCFtools programs. The mined variants were then filtered to retain those that are biallelic, have a mapping quality (MQ) of at least 40, are in approximate linkage equilibrium (LD-pruned) and have at most 10% missing genotypes. Principal component analysis and phylogenetic analysis of the filtered genotypes enabled genetic differentiation between abaca varieties and accessions, and between abaca, Musa troglodytarum and M. acuminata/M. balbisiana.
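The per-site filters described above (biallelic, MQ of at least 40, at most 10% missing genotypes) can be sketched in plain Python as follows. This is not the pipeline used to produce the dataset, only an illustration of the criteria. It assumes that MQ is stored as a key in the INFO column and that each sample column begins with a GT field; the function names, the record contents and the chromosome name 'chr1' are hypothetical. LD pruning is not covered, because it requires comparing genotypes across sites rather than inspecting one record at a time.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'DP=50;MQ=45' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_filters(record: str, min_mq: float = 40.0,
                   max_missing: float = 0.1) -> bool:
    """Return True if one tab-separated VCF data line meets all three filters."""
    cols = record.rstrip("\n").split("\t")
    alt, info, fmt, samples = cols[4], cols[7], cols[8], cols[9:]

    # Biallelic: exactly one alternate allele.
    if alt == "." or "," in alt:
        return False

    # Mapping quality (assumed to be the INFO/MQ key).
    mq = parse_info(info).get("MQ")
    if mq in (None, "", ".") or float(mq) < min_mq:
        return False

    # Missingness: fraction of samples whose GT contains '.'.
    keys = fmt.split(":")
    if "GT" not in keys or not samples:
        return False
    gt_index = keys.index("GT")
    missing = sum(1 for s in samples if "." in s.split(":")[gt_index])
    return missing / len(samples) <= max_missing


# Hypothetical record with 10 samples, one of which is missing.
genotypes = ["0/1:0,1,2"] * 9 + ["./.:0,0,0"]
record = "chr1\t100\t.\tA\tG\t60\t.\tDP=50;MQ=45\tGT:PL\t" + "\t".join(genotypes)
print(passes_filters(record))
```

The thresholds are exposed as parameters (min_mq, max_missing) so that the same check can be tried with other cut-offs.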