Variation dataset of lychee
The VCF files contain all cleaned SNPs and small InDels across the whole genome of the lychee population.
Steps to reproduce
Resequencing data for the 72 accessions were mapped to the reference genome using BWA mem (v.0.7.17). The mapped reads were then sorted by genomic coordinate using samtools. Sequence data generated from different Illumina lanes were combined using ‘samtools merge’. After merging, duplicates were removed using Picard (v.2.5.0), and HaplotypeCaller from the Genome Analysis Toolkit (GATK, v.3.8) was then used to produce a GVCF file for each accession. Finally, GenotypeGVCFs was used for joint calling of SNPs. After quality control of the SNPs using bcftools, the SNPs were hard filtered with GATK VariantFiltration using the expression (DP < 300 || DP > 3000 || QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0), and only biallelic SNPs were selected for further analysis.
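To make the hard-filter logic concrete, the following is a minimal Python sketch of the site-level decision described above. It is an illustration only, not the pipeline that produced the dataset: the function names (fails_hard_filter, is_biallelic_snp, keep_site) are hypothetical, the thresholds are copied from the filter expression, and the handling of missing annotations (treated as not triggering a condition) is an assumption of this sketch.

```python
# Thresholds copied from the filter expression in the text.
# All function names here are hypothetical helpers for illustration.
BASES = {"A", "C", "G", "T"}

# Each entry: (INFO annotation name, test that returns True when the site fails).
HARD_FILTER_CHECKS = [
    ("DP", lambda v: v < 300 or v > 3000),
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("MQ", lambda v: v < 40.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


def fails_hard_filter(info):
    """Return True if any hard-filter condition is met.

    `info` maps INFO annotation names to numeric values. A missing
    annotation is treated as not triggering its condition (assumption).
    """
    return any(
        name in info and test(info[name]) for name, test in HARD_FILTER_CHECKS
    )


def is_biallelic_snp(ref, alts):
    """True for a single-base REF with exactly one single-base ALT."""
    return len(alts) == 1 and ref in BASES and alts[0] in BASES


def keep_site(ref, alts, info):
    """Keep only biallelic SNPs that pass every hard-filter condition."""
    return is_biallelic_snp(ref, alts) and not fails_hard_filter(info)


if __name__ == "__main__":
    # Made-up annotation values, used only to exercise the functions.
    example = {"DP": 1200, "QD": 25.0, "FS": 1.2, "MQ": 60.0,
               "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
    print(keep_site("A", ["G"], example))
```

The checks are kept in a single table so that each threshold can be compared directly against the expression in the text.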