Evolutionary study of Metalloproteinases

Published: 28 June 2023 | Version 1 | DOI: 10.17632/8rgh56jjyy.1
Contributors:
Simone Oliveira

Description

This dataset is the product of the article entitled "Metalloproteinases in restorative dentistry: an in silico study for an ideal animal model".

Abstract: Dentin degradation, providing restorative procedures with greater longevity. Among MMPs, collagenases and gelatinases are intrinsic constituents of the fibrillar network of the organic matrix of human dentin and are the most abundant MMPs in this tissue. In this study, we obtained 176,077 sequences of mammalian MMPs from the UniProt database. After data curation, 3,178 sequences were aligned and used in phylogenetic reconstruction to search for the model organism evolutionarily closest to humans. We also annotated (n=235) and re-annotated (n=27) several MMP sequences. From these results, we inferred the most appropriate model organisms for studies of collagenases and gelatinases in restorative dentistry.

Steps to reproduce

Mammalian MMP sequences were retrieved from the UniProt database using a search term based on Kapoor et al. (2016), listed below:

(MMP OR collagenase OR matrix OR metalloproteinase OR metallopeptidase OR (interstitial AND collagenase) OR (neutrophil AND collagenase) OR stromelysin OR metalloelastase OR gelatinase OR matrilysin OR MT-MMP OR (Macrophage AND metalloelastase)) AND (taxonomy_id:40674)

An archaeal metalloproteinase sequence was chosen as an outgroup to aid MMP phylogenetic tree construction. A complete interstitial collagenase sequence was also selected and downloaded from the UniProt database.

Curated sequences

The mammalian MMP sequences downloaded from UniProt were filtered and pre-processed before building the multifasta file used for phylogenetic tree construction. These steps are described below.

Sequence size filter

We generated a preliminary report of the sequences' basic statistics (minimum length, maximum length, average length) with an in-house script. We then kept only the sequences whose length fell within the average length +/- 50%, removing discrepant sequences that could influence the phylogenetic tree construction.

The RPS-BLAST program was used to identify each sequence's conserved domains. The Conserved Domains Database was used because it is a curated and publicly available domain database. An in-house script was developed to select sequences that have at least three of these specific domains: matrixin, hemopexin, pg-binding, fibronectin.

The remaining sequences were submitted to the CD-HIT program to remove duplicated sequences and generate a mammalian MMP multifasta file.
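The size and domain filters described above can be sketched in Python as follows. This is not the authors' in-house script, only a minimal illustration under stated assumptions: "average length +/- 50%" is read as keeping lengths between 0.5 and 1.5 times the mean, the per-sequence domain sets are assumed to have been extracted from the RPS-BLAST output beforehand, and all function names and the domain label spellings are hypothetical.

```python
from statistics import mean

# Domain labels as written in the text; how they map onto actual
# Conserved Domains Database hit names is an assumption.
TARGET_DOMAINS = {"matrixin", "hemopexin", "pg-binding", "fibronectin"}


def read_fasta(lines):
    """Parse FASTA-formatted lines into a {sequence_id: sequence} dict."""
    records, current = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def length_stats(records):
    """Return (minimum, maximum, average) sequence length."""
    lengths = [len(seq) for seq in records.values()]
    return min(lengths), max(lengths), mean(lengths)


def filter_by_length(records, tolerance=0.5):
    """Keep sequences whose length is within mean * (1 +/- tolerance)."""
    _, _, average = length_stats(records)
    low, high = average * (1 - tolerance), average * (1 + tolerance)
    return {sid: seq for sid, seq in records.items() if low <= len(seq) <= high}


def filter_by_domains(records, domains_by_id, min_domains=3):
    """Keep sequences carrying at least min_domains of the target domains.

    domains_by_id is a hypothetical {sequence_id: set_of_domain_names}
    mapping, assumed to come from parsed RPS-BLAST results.
    """
    return {
        sid: seq
        for sid, seq in records.items()
        if len(TARGET_DOMAINS & domains_by_id.get(sid, set())) >= min_domains
    }
```

In use, the output of `filter_by_length` would be passed to `filter_by_domains`, and the surviving records written back to a multifasta file before deduplication with CD-HIT.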
MSA alignment and phylogenetic tree construction

The previously generated mammalian multifasta file was aligned with MAFFT using standard parameters, and the alignment was then used as input for the FastTree 2 program for phylogenetic reconstruction with 2,000 bootstrap replicates. The iTOL software was used to identify the MMP classes through different colors. We retrieved each of the available functional MMP class annotations. The MMP sequences that appeared in clades with different colors were manually re-annotated.

Conserved Domains Tree

A manual cladogram was constructed to mirror the same relationships between each MMP. A sequence with complete domains was chosen for each MMP and submitted to InterPro to locate each domain's position.

We used only the original sequences' annotations to infer the model organisms' phylogenetic tree. Specifically, we restricted the scope of MMP classes to collagenases (MMP-1, MMP-8 and MMP-13) and gelatinases (MMP-2 and MMP-9) only. The phylogeny was reconstructed with MAFFT as the aligner and FastTree 2 for the tree construction. The following organisms were used: Homo sapiens, Bos taurus, Mus musculus and Rattus norvegicus.
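The alignment and tree-building step could be driven from Python roughly as shown below. This is a sketch, not the authors' pipeline: it assumes that "standard parameters" means calling MAFFT with no extra options, that the 2,000 bootstrap corresponds to FastTree's `-boot` resampling option, and that the executables are available on the PATH as `mafft` and `FastTree` (the FastTree binary name varies between installations). The file names in the usage note are hypothetical.

```python
import subprocess


def mafft_command(input_fasta):
    """MAFFT with default parameters; the alignment is written to stdout."""
    return ["mafft", input_fasta]


def fasttree_command(alignment_fasta, resamples=2000):
    """FastTree on a protein alignment; the Newick tree is written to stdout.

    Mapping the text's "2,000 bootstrap" onto -boot is an assumption.
    """
    return ["FastTree", "-boot", str(resamples), alignment_fasta]


def run_to_file(command, output_path):
    """Run a command and redirect its standard output to output_path."""
    with open(output_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)
```

For example, `run_to_file(mafft_command("mmps.fasta"), "mmps.aln.fasta")` followed by `run_to_file(fasttree_command("mmps.aln.fasta"), "mmps.nwk")` would produce a Newick file that can then be uploaded to iTOL for coloring by MMP class.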

Institutions

Instituto Oswaldo Cruz, Universidade do Estado do Rio de Janeiro, Fundacao Oswaldo Cruz, Universidade Estadual de Campinas Faculdade de Odontologia de Piracicaba

Categories

Bioinformatics, Dental Restorative Material

Funding

Fundação de Amparo à Pesquisa do Estado de São Paulo

2019/20576-0

Licence