Compilation and Alignment of Eukaryotic Arginyl-tRNA Synthetases

Published: 18 August 2025 | Version 4 | DOI: 10.17632/ts4jbw9nft.4
Contributor: Gabor Igloi

Description

The compilation of eukaryotic arginyl-tRNA synthetase sequences was last updated in 2022. Since then, the accumulated sequence data for this housekeeping gene product have permitted the manually curated addition of more than 3600 protein sequences to the collection.

In a previous report, the molecular distinction between fungal and plant arginyl-tRNA synthetases was used to reveal fungal contamination in the plant transcriptome database (Igloi 2019). Although that observation concerned only a handful of examples, the entries have not been corrected, and one cannot escape the realization that such errors, and similar ones, may be perpetuated. The problems of propagating, detecting and correcting errors across the sequence database network are likely to be exacerbated by its exponential growth.

Despite its modest size, this updated compilation of data from Arthropod, other Metazoan and non-Metazoan organisms, as well as green plants, now permits a robust multiple protein sequence alignment and a derived phylogenetic analysis. Phylogenetic clustering within taxonomic orders and classes is sufficiently tight to raise doubts about the identity of organisms that miscluster. The details of their source, feeding habits, parasites and predators, as well as the annotations of the sequence submitted to the database, can then be examined more closely to speculate about misidentification, sample contamination or misannotation.

The compilation has been updated by BLASTN searches of the GenBank TSA databases for Arthropoda, Other Metazoans, Non-Metazoans and Green Plants (Embryophyta) (as of mid-2024). An update of Chordata and Fungi is no longer within the scope of manual curation. Following multiple sequence alignment, the sequences were examined visually. In view of the established N-terminal sequence variation (Igloi 2020), N termini were not modified.
Conspicuous internal insertions (as opposed to those that are species-specific and confirmed by alignment) were apparent but were not examined further. Their occurrence could be due to conceptually mis-spliced genomic data. Sequences that were more than approximately 10% C-terminally incomplete were removed from the alignment. The derived phylogenetic trees were examined visually, and obvious misclustering was investigated further by BLASTP searches of the GenBank protein database with the suspect sequence as query. Entries for Arthropoda, Metazoans and Non-Metazoans that were shown to be contaminants, misidentifications or of otherwise questionable origin have been retained in the compilation and have also been grouped separately. Entries with evident contamination in the Plant TSA database (as determined previously; Igloi 2019) have been removed from the compilation and appended separately.

Igloi GL (2019) Plant Syst Evol 305:563–568. doi: 10.1007/s00606-019-01586-2
Igloi GL (2020) Gene Reports 20:100778. doi: 10.1016/j.genrep.2020.100778
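
The removal of sequences that are more than approximately 10% C-terminally incomplete, as described above, could be approximated programmatically along the following lines. This is only a sketch under stated assumptions, not the procedure used for the compilation, which relied on visual examination. It assumes the alignment is available as a mapping of sequence name to gapped sequence, and it measures incompleteness against "core" columns, here defined (hypothetically) as columns occupied in at least half of the sequences. The function names and the `min_occupancy` parameter are illustrative.

```python
from typing import Dict, List, Tuple

GAP_CHARS = "-."  # characters treated as alignment gaps (assumption)


def core_columns(alignment: Dict[str, str], min_occupancy: float = 0.5) -> List[int]:
    """Indices of columns in which at least `min_occupancy` of sequences have a residue."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    cols = []
    for i in range(length):
        occupied = sum(1 for s in seqs if s[i] not in GAP_CHARS)
        if occupied / len(seqs) >= min_occupancy:
            cols.append(i)
    return cols


def c_terminal_missing_fraction(aligned_seq: str, cols: List[int]) -> float:
    """Fraction of core columns lying beyond the last residue of the sequence."""
    if not cols:
        return 0.0
    last = max(
        (i for i, ch in enumerate(aligned_seq) if ch not in GAP_CHARS),
        default=-1,  # a sequence consisting only of gaps is missing everything
    )
    missing = sum(1 for c in cols if c > last)
    return missing / len(cols)


def filter_c_terminally_incomplete(
    alignment: Dict[str, str],
    max_missing: float = 0.10,  # the approximate 10% threshold from the description
    min_occupancy: float = 0.5,  # hypothetical definition of a core column
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split an alignment into (kept, removed) by C-terminal completeness."""
    cols = core_columns(alignment, min_occupancy)
    kept: Dict[str, str] = {}
    removed: Dict[str, str] = {}
    for name, seq in alignment.items():
        frac = c_terminal_missing_fraction(seq, cols)
        if frac > max_missing:
            removed[name] = seq
        else:
            kept[name] = seq
    return kept, removed
```

Only gaps after the last residue count towards the missing fraction, so N-terminal variation and internal gaps do not cause a sequence to be removed, in keeping with the decision to leave N termini unmodified. Sequences flagged in `removed` would still merit a visual check before being discarded.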

Categories

Molecular Biology, Evolutionary Biology, Amino Acyl tRNA Synthesis
