Recombination shapes the 2022 monkeypox outbreak
Supplemental File 1. Newick tree of the MPXV sequence dataset shown in Supplemental File 2.
Supplemental File 2. Rooted phylogenetic tree of MPXV genomes from the 2022 outbreak. Viral sequences were aligned using MAFFT, and the phylogenetic tree was visualized using FigTree. The tree was rooted by including the camelpox virus sequence (MZ300860) as an outgroup. MPXV genomes of groups U, M, and I are labeled in green, blue, and red, respectively. Recombinant MPXV isolates identified by TR analysis are labeled in orange.
Supplemental File 3. Genomic sequences of MPXV (B.1 clade) from January 1 to July 20, 2022 (N=415) used in this study. The camelpox virus sequence (MZ300860) is included for the phylogenetic analysis in Supplemental File 2.
Supplemental File 4. TRAL analysis of the annotated TRs in the MPXV genomic sequences of Supplemental File 3 and Figure S6.
Supplemental Table 1. Summary of MPXV genome sequence information in this study.
Figure S1. Sequences that contain SNP pairs in strong linkage disequilibrium, for which the upper 95% confidence bound of D’ is above 0.98 and the lower bound is above 0.7.
Figure S2. TR C in the intergenic region between monkeypox A47R and A49R. (A) Map of TR C and the A47R and A49R genes. Arrows indicate gene direction. (B) Location of TR C in the 3’-untranslated region of the A47R transcript. The 3’ termination sequence UUUUUNU for poly(A) tail addition is boxed in red. Reference MPXV sequence is OP19276.
Figure S3. TR B in the intergenic region between monkeypox A41L and A42R. (A) Map of TR B and the A41L and A42R genes. Arrows indicate gene direction. Sequences b and c are shown in (B) and (C). (B) Sequence of the TR B insertion in the monkeypox genome, as in (A), compared to the vaccinia sequence. (C) Alignment of the monkeypox and vaccinia A42R promoter. The TAAAT and AAAAAA elements are boxed in red. Reference MPXV sequence is OP19276.
Figure S4. TR D in the intergenic region between monkeypox B18R and B19R. (A) Map of TR D and the B18R and B19R genes. Arrows indicate gene direction. Sequences b, c, and d are shown in (B), (C), and (D). (B) Alignment of monkeypox sequence b with the vaccinia B19R promoter, including the TAAAT and AAAAAAA elements found only in vaccinia. (C and D) Sequences of the monkeypox B19R promoter. Sequence of the TR D insertion in the monkeypox genome, as in (A). Reference MPXV sequence is OP19276.
Figure S5. Comparison of the vaccinia K3L and monkeypox C3L sequences. A stop codon introduced at bp 130-132 causes C3L to encode a truncated protein compared to K3L.
Figure S6. (A) Table of divergence value ranges of TR A to F. The value for TR B is not available because its sequence is AT only. (B) Each box (TR A, C, D, E, F) contains individual TRNs, colored according to the associated parental viral sequence: IRBA22-14 (I, ON755040), MUW1527495 (M, ON019276), or USA_2022_CA002 (U, ON954773), as in Figure 1F. TRs and TRNs were validated by the TRAL algorithm with p-value < 10^-4.
Steps to reproduce
Sequence information
Genomic sequences of MPXV (B.1 clade) from January 1 to July 20, 2022 (N=415) were obtained from the NCBI database and are available in Supplemental File 3. FASTA files of viral sequences were downloaded and analyzed with Tandem Repeats Finder version 4.09. Statistical significance of TRs was assessed using a cut-off alignment score, (2 × match% - 7 × mismatch% - 7 × indel%) × 100, by comparison with random sequences generated by simulation. Only repeats with a Tandem Repeats Finder score above 100 were considered TRs. Small TRs (<6 bp, microsatellites) were excluded from the TR analysis.
Phylogenetic tree
FASTA files of MPXV sequences were first aligned using MAFFT 7. Phylogenetic relationships between MPXV genomes were analyzed using the neighbor-joining method and the Jukes-Cantor substitution model, with the bootstrap resampling number set to five. The rectangular phylogenetic tree was generated by exporting the tree file from MAFFT in Newick format. FigTree (version 1.4.4) was used to display the cladogram after rooting with the camelpox virus sequence (MZ300860) as the outgroup.
Statistical approaches for TR significance testing
We used the Tandem Repeat Annotation Library (TRAL 2.0), an open-source Python 3 library, to identify TR seed motifs from sequence profile databases via seven de novo TR algorithms (HHrepID, TRED, T-REKS, TRUST, XSTREAM, PHOBOS, TRF). Briefly, circular profile hidden Markov models (cpHMMs) were built from these TR seeds and then used to annotate TR regions in MPXV sequences. The mutation process was described by Markov models of substitution based on Kimura’s two-parameter (K2P) model, from which the divergence value was calculated. All annotated TRs were statistically validated using a likelihood ratio test (LRT) that contrasts the model for the evolutionary origin of putative TR units with the scenario in which the putative TRs were observed by random chance.
The log-likelihood for a putative TR with n repeat units of length l (lnL1) and the log-likelihood of a putative random TR under the null model (lnL0) can be calculated using equations (3) and (4) described previously. The maximized log-likelihoods under each of the nested models can be used to construct the LRT statistic, 2(lnL1-lnL0), whose distribution was established empirically by Monte Carlo simulations to test statistical significance (p-value). Each test set consisted of 1,000 simulated TRs. Gaps were treated as ambiguity characters.
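The empirical test described above can be illustrated with a minimal Python sketch. It is not the TRAL implementation: the function names `lrt_statistic` and `empirical_p_value` are hypothetical, the log-likelihoods are assumed to have been computed elsewhere, and the p-value is taken as the plain fraction of simulated null statistics that are at least as large as the observed one, which is an assumption about how the Monte Carlo distribution is used.

```python
from typing import Sequence


def lrt_statistic(lnl_tr: float, lnl_null: float) -> float:
    """Return the LRT statistic 2 * (lnL1 - lnL0) for one putative TR.

    lnl_tr   -- maximized log-likelihood under the TR model (lnL1)
    lnl_null -- maximized log-likelihood under the null model (lnL0)
    """
    return 2.0 * (lnl_tr - lnl_null)


def empirical_p_value(observed: float, null_stats: Sequence[float]) -> float:
    """Fraction of simulated null statistics >= the observed statistic.

    null_stats holds LRT statistics computed on simulated (random) TRs;
    how those simulations are produced is outside this sketch.
    """
    if not null_stats:
        raise ValueError("null_stats must contain at least one simulated statistic")
    exceed = sum(1 for s in null_stats if s >= observed)
    return exceed / len(null_stats)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the MPXV data.
    simulated = [0.0, 0.4, 1.1, 2.5, 3.0]
    stat = lrt_statistic(-120.0, -121.25)
    print(stat, empirical_p_value(stat, simulated))
```

With plain counting, the resolution of the p-value is one over the number of simulated TRs, so the size of the simulated set bounds the smallest nonzero p-value this sketch can report.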