Ancient origin and constrained evolution of the division and cell wall (dcw) gene cluster across Bacteria

Published: 30 August 2022 | Version 1 | DOI: 10.17632/4y5mzppzmb.1
Contributors:
Daniela Megrian, Najwa Taib, A Jaffe, Jillian F. Banfield, Simonetta Gribaldo

Description

SUPPORTING DATA: Data from the manuscript of Megrian et al., "Ancient origin and constrained evolution of the division and cell wall (dcw) gene cluster across Bacteria"

- CONCAT/
  - concat_12DCW.treefile: Phylogeny presented in Figure 6 and Supplementary Figure 14, in Newick format.
  - concat_12DCW_collapsed.treefile: Same phylogeny as concat_12DCW.treefile, but with each phylum collapsed into a single branch.
  - concat_12DCCW.aln: Concatenated alignment used to reconstruct the phylogeny. Corresponds to the concatenation of 12 dcw cluster proteins.
  - RENAME_CONCAT.txt: Annotation file to rename the labels of the Newick file on iTOL (https://itol.embl.de).
  - cleaned_trimmed_single_alignments/: Directory containing the cleaned and trimmed single alignments used for the concatenation.
- PARALOGS/
  - *.treefile: Phylogenies presented in Figure 4 and Supplementary Figures 2, 3 and 4, in Newick format.
  - *.trim: Trimmed alignments used to reconstruct the phylogenies.
  - *.aln: Alignments used to reconstruct the phylogenies (before trimming).
  - *.fasta: Sequences aligned to reconstruct the phylogenies.
- PastML/
  - pastml_raw_output.tab: Output of the PastML inferences. Columns correspond to contiguous pairs of dcw cluster genes. Rows correspond to node names in the reference phylogeny. A value of 0 indicates absence of the pair at the corresponding node; 1 indicates presence.
  - pastml_ref_tree.treefile: Reference phylogeny used for the inference. Node names are indicated.
- SGT_before_cleaning
  - *_raw.treefile: Single-gene phylogenies obtained after the homology searches, before cleaning.
  - *_raw.trim: Trimmed alignments used to reconstruct the phylogenies.
  - *_raw.aln: Alignments used to reconstruct the phylogenies (before trimming).
  - *_raw.fasta: Sequences aligned to reconstruct the phylogenies.
  - RENAME_SGT.txt: Annotation file to rename the labels of the Newick files on iTOL (https://itol.embl.de).
- OTHER_REF_TREES
  - CORE: Contains CONCAT and SGT_before_cleaning data, based on 63 core genome markers.
  - RNAPOL_IF2: Contains CONCAT and SGT_before_cleaning data, based on RNApol+IF2 markers.
  - RPROTS: Contains CONCAT and SGT_before_cleaning data, based on 16rprot markers.
- CPR
  - CPR.treefile: Phylogeny presented in Figure 5. Obtained from a supermatrix that contains 302 sequences (27 Chloroflexi + 275 CPRs) and 2126 amino acid positions. The concatenated proteins are: MurG, MurF, MurE, MurD, MurC, MraZ, MraY, MraW, FtsZ, FtsW, FtsI, FtsA. IQ-TREE v2.3.1 was used to infer the ML tree. Best-fit model: LG+F+R10, chosen according to BIC. Ultrafast bootstrap with 1000 replicates.
- TREES_CLEANING
  - cleanTrees.R: R script used for cleaning phylogenies.
  - SAMPLE_FILES: Files needed to run the script.
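
As an illustration of how the presence/absence table described for pastml_raw_output.tab might be read, here is a minimal Python sketch. It assumes, without having been checked against the actual file, that the table is tab-separated, that it has a header row naming the gene pairs, and that the first column holds the node names. The function name and the sample node and pair labels are hypothetical.

```python
import csv
import io


def load_presence_table(handle):
    """Parse a tab-separated presence/absence table.

    Assumed layout (not verified against the real file): a header row whose
    first cell labels the node column and whose remaining cells name gene
    pairs, followed by one row per node with 0/1 values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    pair_names = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        node, values = row[0], row[1:]
        # Keep only the gene pairs inferred as present (value 1) at this node.
        table[node] = {
            pair for pair, value in zip(pair_names, values) if value.strip() == "1"
        }
    return table


# Hypothetical miniature table; node and pair names are invented for the demo.
sample = "node\tmraZ-mraW\tftsA-ftsZ\nN1\t1\t1\nN2\t0\t1\n"
presence = load_presence_table(io.StringIO(sample))
print(sorted(presence["N1"]))
print(sorted(presence["N2"]))
```

To try the same function on the real data, one would pass an open file handle, for example `open("PastML/pastml_raw_output.tab", newline="")`, and adjust the parsing if the actual header or delimiter differs from the assumptions above.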

Institutions

University of California Berkeley, Institut Pasteur, Sorbonne Universite

Categories

Microbiology, Evolutionary Biology, Bioinformatics, Bacteria, Cell Wall, Division
