Protein data

Published: 5 April 2024 | Version 1 | DOI: 10.17632/6rnzgxxrzt.1
Contributor:
Lea Bou Dagher

Description

This folder contains 10 directories, one for each taxon: Bacillales, Bacteroidales, Corynebacteriales, Enterobacterales, Escherichia, Hyphomicrobiales, Methanococcales, Pseudomonadales, Sulfolobales, Thermococcales.

Each directory contains 2 sub-directories:

1. Protein family data
• Each protein family is named according to the PDB ID and the RefSeq Prot ID (PDBID_ProtID) of the protein used as seed to assemble the protein family.
• For each protein family, the following files are provided:
  • PDBID_ProtID.pdb: the PDB structure of the protein used as seed to build the protein family,
  • PDBID_ProtID.fst: the sequences of the seed protein and its homologues,
  • PDBID_ProtID.mafft: the multiple alignment of the protein family obtained with MAFFT,
  • PDBID_ProtID.mafft.BMGE45.fst: the multiple alignment trimmed with BMGE,
  • PDBID_ProtID.mafft.BMGE45.fst.treefile: the maximum likelihood tree of the family inferred with IQ-TREE.

2. Alphafold2 predictions
• Each protein family is named according to the PDB ID and the RefSeq Prot ID (PDBID_ProtID) of the protein used as seed to assemble the protein family.
• Each protein family file contains a number of sub-directories equal to the number of sequences in the protein family. The name of each such sub-directory is the number of the sequence in order of appearance in the PDBID_ProtID.fst file.
• Each sub-directory contains the ranked_0.pdb file corresponding to the best structure predicted by AlphaFold2.
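The numbering rule above (prediction sub-directories named by the order of appearance of sequences in PDBID_ProtID.fst) can be illustrated with a minimal Python sketch that pairs each sequence with the expected location of its ranked_0.pdb. Several points are assumptions rather than facts stated in this description: that the .fst file is in FASTA format with ">" header lines, that the prediction data for a family has been extracted to an ordinary directory, and that numbering starts at 1 (exposed as the first_index parameter so it can be changed). The function names are hypothetical.

```python
from pathlib import Path


def read_fasta_headers(fst_path):
    """Return the header lines (without '>') in file order.

    Assumes the .fst file is FASTA-formatted.
    """
    headers = []
    with open(fst_path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


def map_predictions(fst_path, predictions_dir, first_index=1):
    """Pair each sequence with the expected path of its ranked_0.pdb.

    Returns a list of (number, header, path) tuples, where path is None
    if no ranked_0.pdb exists at the expected location.

    first_index is an assumption: the description says sub-directories are
    numbered by order of appearance, but not whether counting starts at 0 or 1.
    """
    predictions_dir = Path(predictions_dir)
    pairs = []
    for offset, header in enumerate(read_fasta_headers(fst_path)):
        number = first_index + offset
        model = predictions_dir / str(number) / "ranked_0.pdb"
        pairs.append((number, header, model if model.is_file() else None))
    return pairs
```

With a family's .fst file and its extracted prediction directory at hand, calling map_predictions(fst_path, predictions_dir) returns one tuple per sequence. If every path comes back as None, the likely cause is a wrong first_index or a predictions_dir that points one level too high or too low.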

Categories

Protein, Protein Structure

Licence