Large Scale Insect Phylogenomics Analyses reveal new insights into insect Evolution
Description
Large-scale phylogenomic analysis plays a pivotal role in clarifying phylogenetic relationships within broad species groups, improving precision and resolving power beyond those of earlier multi-locus efforts. Here, we constructed a robust, time-calibrated phylogenetic tree encompassing 694 insect species across 19 orders, using 683 Benchmarking Universal Single-Copy Orthologs (BUSCO) genes. We analyzed this multigene dataset with both concatenation and coalescent tree-building methods. Our data suggest a sister-group relationship between Odonata and Neoptera (modern winged insects), challenging the longstanding hypothesis of a monophyletic Palaeoptera comprising Odonata and Ephemeroptera. We also dated the last common ancestor of all winged insects to the Middle Silurian (423 million years ago, Ma), a timeline that closely parallels the emergence of tracheophytes (450.8-419.3 Ma) and provides insight into the temporal dynamics of insect evolution. By examining systematic discrepancies at controversial nodes and considering the biological variability that stems from incomplete lineage sorting, our study makes the evaluation of phylogenetic hypotheses more reliable. Our work therefore contributes to the phylogenetic mapping of Insecta and highlights the importance of sophisticated analytical methods for reconstructing evolutionary history. [Insecta; divergence time; concordance and discordance]
Steps to reproduce
Data Collection
We obtained genome assemblies from InsectBase 2.0 and the NCBI Genome Database (NCBI Resource Coordinators et al. 2018; Mei et al. 2022). To ensure the quality and comprehensiveness of the gene sets used for ortholog identification, we assessed genome completeness with BUSCO v5.2.0 (Manni et al. 2021), using the official Insecta ortholog set from OrthoDB (Insecta_odb10.2020-09-10) (Zdobnov et al. 2021). We retained genome assemblies with BUSCO completeness scores of at least 80% for subsequent analysis. This yielded 694 genome assemblies representing 414 genera from 146 families across 19 orders, plus two Diplura species serving as outgroups (Table S1).
Data Processing and Alignment
We identified orthologous groups (OGs) using BUSCO v5.2.0 (Manni et al. 2021). To ensure broad species representation, we kept only single-copy OGs present in more than 95% of the species (662 species), resulting in 698 single-copy OGs. Amino acid sequences for each OG were aligned with MAFFT v7.505 (Rozewicki et al. 2019) using the parameters "--localpair --maxiterate 1000". We then trimmed each OG alignment using TrimAl v1.4 (Capella-Gutiérrez et al. 2009) with the settings "-resoverlap 0.5 -seqoverlap 50". To infer maximum likelihood (ML) gene trees, we ran IQ-TREE 2 (Minh et al. 2020b) with ModelFinder (Kalyaanamoorthy et al. 2017) to select the best-fitting substitution model, and assessed branch support with 1000 ultrafast bootstrap replicates (UFBoot2; Hoang et al. 2018) and 1000 replicates of the Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT; Anisimova et al. 2011).
To refine the dataset, we took the best-scoring ML tree for each gene and flagged any tree in which a single branch exceeded 20% of the total tree length (i.e., the sum of all branch lengths). This relative branch length test helps detect misaligned or misidentified orthologs, which tend to produce abnormally long branches (Dos Reis et al. 2012; Springer and Gatesy 2018). We detected 15 ortholog alignments with at least one branch exceeding 20% of the total tree length and excluded them from further analysis, leaving 683 OGs.
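The relative branch length test can be illustrated with a minimal Python sketch. This is not the script used in the study: the function names are hypothetical, and the sketch assumes each gene tree is an unquoted Newick string (as IQ-TREE writes to its .treefile output) in which every branch length follows a colon. It applies the 20% threshold stated above.

```python
import re

# Branch lengths follow ':' in Newick; support values such as "95.2/100"
# sit before the ':' and are therefore not captured.
# Assumption: taxon labels are unquoted and contain no ':' characters.
_BRANCH_LENGTH = re.compile(r":([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick):
    """Return all branch lengths found in a Newick string as floats."""
    return [float(value) for value in _BRANCH_LENGTH.findall(newick)]


def max_relative_branch_length(newick):
    """Return the longest branch as a fraction of the total tree length."""
    lengths = branch_lengths(newick)
    total = sum(lengths)
    if total == 0:
        # No branch lengths (or all zero): nothing to flag.
        return 0.0
    return max(lengths) / total


def has_long_branch(newick, threshold=0.20):
    """True if any single branch exceeds `threshold` of the total tree length."""
    return max_relative_branch_length(newick) > threshold


def flag_gene_trees(trees, threshold=0.20):
    """Given {gene_id: newick}, return the sorted gene ids that fail the test."""
    return sorted(
        gene for gene, newick in trees.items()
        if has_long_branch(newick, threshold)
    )
```

Calling flag_gene_trees on a dictionary of per-gene Newick strings returns the identifiers of the alignments that would be excluded under this criterion. For production use, a dedicated tree parser (for example from Biopython or ete3) would be a safer choice than a regular expression.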