Variations dataset of Lychee

Published: 26 July 2021 | Version 1 | DOI: 10.17632/v37bv5jt6g.1
Contributor:
junting feng

Description

The VCF files contain all cleaned SNPs and small InDels across the whole genome of the lychee population.

Steps to reproduce

Resequencing data for the 72 accessions were mapped to the reference genome using BWA-MEM [8] (v.0.7.17). The mapped reads were then sorted by genomic coordinate using samtools [10]. Sequence data generated from different Illumina lanes were combined using ‘samtools merge’. After merging, duplicates were removed using Picard [51] (v.2.5.0), and HaplotypeCaller from the Genome Analysis Toolkit (GATK) [11] (v.3.8) was used to produce a GVCF file for each individual. GenotypeGVCFs was then used for joint calling of SNPs. After quality control of the SNPs using bcftools [52], the SNPs were hard-filtered with GATK VariantFiltration using the expression (DP < 300 || DP > 3000 || QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0), and only biallelic SNPs were selected for further analysis.
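
The hard-filter expression above can be read as "flag a site if any one of the listed conditions is true". The following Python sketch restates that logic so the thresholds are easy to check against a VCF record's INFO annotations. It is an illustration only, not the GATK implementation: the function names fails_hard_filter and is_biallelic_snp are hypothetical, DP is assumed to be the site-level INFO depth, and a missing annotation is assumed not to trigger its condition.

```python
from typing import Mapping, Optional, Sequence

# Thresholds copied from the VariantFiltration expression in the text.
# Each entry: (INFO key, predicate that is True when the site should be flagged).
HARD_FILTER_CHECKS = [
    ("DP", lambda v: v < 300 or v > 3000),
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("MQ", lambda v: v < 40.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


def fails_hard_filter(info: Mapping[str, Optional[float]]) -> bool:
    """Return True if any hard-filter condition holds for this site.

    Assumption (not from the source): an annotation that is absent or None
    is skipped rather than treated as a failure.
    """
    for key, is_bad in HARD_FILTER_CHECKS:
        value = info.get(key)
        if value is not None and is_bad(value):
            return True
    return False


def is_biallelic_snp(ref: str, alts: Sequence[str]) -> bool:
    """Return True for a site with one single-base REF and one single-base ALT."""
    bases = {"A", "C", "G", "T"}
    return (
        len(alts) == 1
        and ref.upper() in bases
        and alts[0].upper() in bases
    )


if __name__ == "__main__":
    # Made-up annotation values, for illustration only.
    site = {"DP": 1200, "QD": 25.0, "FS": 1.2, "MQ": 60.0,
            "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
    keep = not fails_hard_filter(site) and is_biallelic_snp("A", ["G"])
    print("keep site:", keep)
```

A site is retained here only when it passes every threshold and is a biallelic SNP, mirroring the final selection step described above.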