Genome-wide ambiguous variant-calls from the misalignments of SNPs-stricken next-generation sequencing reads
A gzipped variant call format (VCF) file of genome-wide ambiguous variant calls arising from the misalignment of SNP-stricken next-generation sequencing reads. These variant calls are the 'complement' of, i.e. absent from, the common (>= 1% MAF) SNPs in the gnomAD v3.1 database for seven populations: afr (African/African American), amr (Latino/Admixed American), asj (Ashkenazi Jewish), eas (East Asian), fin (Finnish), nfe (Non-Finnish European) and sas (South Asian). The simulated data sets use two common read formats: 1x100bp and 2x150bp. The 1x100bp format was chosen to mimic the short insert sizes of fragmented DNA, and the 2x150bp format to mimic good-quality DNA fragments.

Within the VCF, the "SM" information tag gives the name of the sample from which the variant was ambiguously called. Sample names follow the format "<read-length>-<population>"; for example, "100-sas" denotes the 100bp read-length data set simulated with common (>= 1% MAF) SNPs specific to the South Asian population.
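As a rough illustration of how the "SM" tag and the sample naming convention might be used, the following minimal sketch counts ambiguous variant calls per sample in the gzipped VCF using only the Python standard library. It assumes that "SM" is stored in the INFO column (the 8th VCF column) as `SM=<name>`, and that several sample names, if present on one record, are comma-separated; neither detail is confirmed by this description. The helper names (`parse_sample_name`, `info_field`, `count_by_sample`) are hypothetical, and this is not the authors' own tooling.

```python
import gzip
from collections import Counter

# Population codes as listed in the dataset description.
POPULATIONS = {
    "afr": "African/African American",
    "amr": "Latino/Admixed American",
    "asj": "Ashkenazi Jewish",
    "eas": "East Asian",
    "fin": "Finnish",
    "nfe": "Non-Finnish European",
    "sas": "South Asian",
}


def parse_sample_name(name):
    """Split a '<read-length>-<population>' sample name, e.g. '100-sas'."""
    read_length, population = name.split("-", 1)
    if population not in POPULATIONS:
        raise ValueError(f"unknown population code: {population!r}")
    return int(read_length), population


def info_field(info, key):
    """Return the value of KEY from a VCF INFO column, or None if absent."""
    for entry in info.split(";"):
        k, _, v = entry.partition("=")
        if k == key:
            return v
    return None


def count_by_sample(path):
    """Count variant records per SM value in a gzipped VCF (hypothetical helper)."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            sm = info_field(fields[7], "SM")  # assumption: SM lives in INFO
            if sm is None:
                continue
            # Assumption: multiple sample names, if any, are comma-separated.
            for sample in sm.split(","):
                counts[sample] += 1
    return counts
```

For instance, `parse_sample_name("100-sas")` returns `(100, "sas")`, and `count_by_sample("ambiguous_calls.vcf.gz")` (a hypothetical file name) would return a `Counter` keyed by sample name. A dedicated VCF library could replace the manual parsing; plain string handling is used here only to keep the sketch dependency-free.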
Steps to reproduce
Please refer to the Online Methods of the main paper for steps to reproduce the list of ambiguous variant-calls. Thank you.