Gene-Level DRACH Motif Atlas of SARS-CoV-2 (2020 - 2025): N, Spike, ORF6, 5′ UTR, 3′ UTR

Published: 28 October 2025 | Version 1 | DOI: 10.17632/6y75tk8j5b.1
Contributor:
Tahir Bhatti

Description

This dataset contains per-genome DRACH motif annotations for 9,356,279 SARS-CoV-2 genomes across five genomic regions: 5′ UTR, Spike (S), ORF6, Nucleocapsid (N), and 3′ UTR. For each genome and region, it provides:

seq_id: Genome identifier
year: Collection year (2020–2025)
drach_count: Number of DRACH motifs ([AGT][AG]AC[ACT])
drach_positions: 1-based genomic positions of motifs
drach_sequences: Actual DRACH k-mers (e.g., GGACT)
drach_density_per_kb: Motif count normalized per kilobase

Data are derived from the Wuhan-Hu-1 reference (NC_045512.2) and processed from a globally representative aligned FASTA. Files are provided in tab-separated (TSV) format, with compressed .zst versions for efficient storage.

Summary:

SARS-CoV-2 Gene-Level DRACH Motif Analysis Summary
=======================================================

5′ UTR
------------------------------
Total genomes: 9,356,279
Mean DRACH density: 21.48 motifs/kb
Median DRACH density: 22.64 motifs/kb
Temporal trend (2020–2025): -8.5% (decline)

Spike (S)
------------------------------
Total genomes: 9,356,279
Mean DRACH density: 21.42 motifs/kb
Median DRACH density: 21.72 motifs/kb
Temporal trend (2020–2025): -1.3% (decline)

ORF6
------------------------------
Total genomes: 9,356,279
Mean DRACH density: 21.44 motifs/kb
Median DRACH density: 21.50 motifs/kb
Temporal trend (2020–2025): +0.9% (increase)

Nucleocapsid (N)
------------------------------
Total genomes: 9,356,279
Mean DRACH density: 28.03 motifs/kb
Median DRACH density: 28.57 motifs/kb
Temporal trend (2020–2025): +1.3% (increase)

3′ UTR
------------------------------
Total genomes: 9,356,279
Mean DRACH density: 12.33 motifs/kb
Median DRACH density: 13.10 motifs/kb
Temporal trend (2020–2025): -5.9% (decline)

This dataset accompanies our upcoming article.
A preprint of the full-genome analysis is available (https://doi.org/10.21203/rs.3.rs-7926428/v1). Please note that this is a large-scale analysis and is therefore best suited to population-level studies of genomes from around the globe. Data were processed by TahirHB@Hotmail.Com
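
To illustrate how the per-region columns relate to one another, the following is a minimal Python sketch that scans a single region sequence with the DRACH pattern given in the description. It is not the pipeline used to build the dataset. The function name scan_drach and the region_start parameter are hypothetical, the input is assumed to be an ungapped region sequence, and overlapping motifs are counted, which the description does not specify.

```python
import re

# DRACH consensus in the DNA alphabet, as written in the description:
# D = A/G/T, R = A/G, then A, C, and H = A/C/T.
# Assumption: the zero-width lookahead reports overlapping motifs
# (e.g. GGACTGACT holds GGACT and TGACT); the dataset's own overlap
# policy is not stated.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def scan_drach(seq, region_start=1):
    """Return DRACH annotations for one ungapped region sequence.

    region_start is a hypothetical parameter: the 1-based reference
    coordinate of the first base of the region.
    """
    # Normalise case and map RNA 'U' to 'T' so the DNA pattern applies.
    seq = seq.upper().replace("U", "T")
    positions = []
    kmers = []
    for match in DRACH.finditer(seq):
        # match.start() is 0-based within the region; shift to 1-based.
        positions.append(region_start + match.start())
        kmers.append(match.group(1))
    count = len(kmers)
    # Motifs per kilobase of region sequence; empty input yields 0.0.
    density = 1000.0 * count / len(seq) if seq else 0.0
    return {
        "drach_count": count,
        "drach_positions": positions,
        "drach_sequences": kmers,
        "drach_density_per_kb": density,
    }


if __name__ == "__main__":
    # Toy sequence, not taken from the dataset.
    print(scan_drach("TTGGACTAA", region_start=1))
```

As a consistency check on the density definition, and assuming the 5′ UTR spans 265 nt in NC_045512.2, six motifs give 6 / 265 × 1000 ≈ 22.64 motifs/kb, which matches the median reported for that region.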

Categories

Virology, Severe Acute Respiratory Syndrome Coronavirus 2, m6A Methylation
