Genome-wide analysis of the Firmicutes illuminates the diderm/monoderm transition

Published: 30 October 2020 | Version 1 | DOI: 10.17632/3pcn9779gc.1
Najwa TAIB


SUPPORTING DATA: Data from the manuscript of Taib et al., "Genome-wide analysis of the Firmicutes illuminates the diderm/monoderm transition"

1- UBA-FIRMICUTES/UBA_Firmicutes.faa
Annotated proteomes of the 1,639 UBA Firmicutes from Parks et al. (2017).

2- DB_TaxIds.xls
TaxId, names and taxonomy of the taxa used to build the four databanks used in the analyses. There are 4 sheets in the file:
- Firmicutes DB LARGE: 1,869 Firmicutes taxa, both reference and UBAs.
- Firmicutes DB SMALL: 316 Firmicutes taxa, both reference and UBAs.
- DB BACTERIA: 358 bacterial taxa.
- OUTGRP: 13 bacterial taxa used to root the reference tree of Firmicutes in Figure 1.

3- OM_ProteinIDsConcatenations.xlsx
Accession numbers of the OM proteins used to build the trees in Figure 5 and Supplementary Figure 4.

4- spo0A-vs-DBSMALL.xls
Accession numbers of the spo0A domain homologues identified in Firmicutes DB SMALL and mapped in Supplementary Figure 1.

5- 15964FAMILIES/
The folder contains the protein families based on an 80% coverage-35% identity cutoff, and present in at least 5 Firmicutes. For each family, two files are available: a fasta file of the sequences included in the family and a table with the taxonomy of each protein sequence.

6- 3500HCLUSTERS/
Tables corresponding to each of the 3500 clusters generated by the hierarchical clustering approach. Each file contains the families belonging to the cluster with their annotation and taxonomic distribution.

7- PFAMDOMAINS/
Three folders corresponding to the three approaches used for Pfam domain annotation: ALL, COLLAPSED and SINGLE. Each folder contains the fasta files and the tables with protein accession numbers, taxids and the taxonomy of the proteins where the domains were identified.

8- TREES/
Data related to the five phylogenies presented. For each tree, there is the corresponding alignment (.fasta), the supermatrix (.phy), the newick file (.treefile), and for three trees the corresponding pdf.

9- OMCluster_AccNum.xls
Accession numbers of the proteins represented in the OM cluster figure (Figure 4).
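
Several of the items above (the annotated proteomes, the per-family sequence files and the Pfam domain files) are distributed as fasta files. As a minimal sketch of how such a file could be read with the Python standard library only, the function below collects sequences by identifier. The function name `parse_fasta` and the two example records are illustrative assumptions and are not taken from the dataset; the exact header layout of the deposited files is not described here, so the sketch simply keeps the first whitespace-delimited token of each header as the identifier.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} mapping.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Store the final record.
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record input, made up for illustration only.
example = """>protA some description
MKV
LLA
>protB
MST
""".splitlines()

records = parse_fasta(example)
```

To read one of the deposited files instead, an open file handle can be passed in place of the list of lines, for example `with open(path) as handle: records = parse_fasta(handle)`, where `path` points to a local copy of the file.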