SARS-CoV-2 Intra-host Mutational Landscape: A Curated Dataset of iSNVs

Published: 30 April 2024 | Version 2 | DOI: 10.17632/8nvgtrkzdm.2
Fatima Mostefai,


This dataset, derived from 128,423 high-quality SARS-CoV-2 NGS libraries, is a comprehensive collection of intra-host single nucleotide variants (iSNVs) processed through a rigorous workflow to ensure accuracy and reliability. Key steps include stringent quality control, variant calling, and artifact removal using metrics such as Strand Bias Likelihood (S) and Alternative Allele Frequency (AAF). Refined to exclude sequencing artifacts, the dataset offers a valuable resource for understanding SARS-CoV-2 intra-host mutational dynamics. We also provide a file listing the genomic positions we recommend masking for accurate iSNV calling; these 477 positions are highly recurrent strand bias artifacts.
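As a rough illustration of how the masking file might be applied downstream, the sketch below drops variant calls at masked positions and below a minimum AAF. It is not the authors' pipeline: the one-position-per-line mask layout, the record keys `pos` and `aaf`, the function names, and the 0.05 threshold are all assumptions made for this example, and the Strand Bias Likelihood metric is not reproduced here.

```python
def load_mask_positions(lines):
    """Parse one genomic position per line (assumed layout), skipping blanks and '#' comments."""
    positions = set()
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        positions.add(int(text))
    return positions


def filter_isnvs(records, masked, min_aaf=0.05):
    """Keep calls that are outside the masked positions and at or above min_aaf.

    min_aaf=0.05 is a placeholder threshold, not a value taken from the dataset.
    """
    return [r for r in records if r["pos"] not in masked and r["aaf"] >= min_aaf]


# Toy inputs standing in for the real mask file and iSNV table (hypothetical values).
mask = load_mask_positions(["# hypothetical mask file", "100", "", "250"])
calls = [
    {"pos": 100, "alt": "T", "aaf": 0.30},  # masked position: dropped
    {"pos": 180, "alt": "G", "aaf": 0.02},  # below min_aaf: dropped
    {"pos": 300, "alt": "A", "aaf": 0.12},  # kept
]
kept = filter_isnvs(calls, mask)
print(kept)
```

With the real files, the toy list passed to `load_mask_positions` would be replaced by an open file handle, after checking the actual column layout of the distributed mask file.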



Institut De Cardiologie de Montreal, Universite de Montreal


Single Nucleotide Polymorphism, Severe Acute Respiratory Syndrome Coronavirus 2