Genomic Discovery and Structure-Activity Exploration of a Novel Family of Enzyme-Activated Covalent Cyclin-Dependent Kinase Inhibitors

Published: 19 June 2024 | Version 1 | DOI: 10.17632/rkc3k3zmby.1
Contributors:
Jack Davison, Michalis Hadjithomas, Stuart Romeril, Yoon Choi, John Biggins, Paola Castaldi, Nadia Chacko, Lawrence Chan, Jared Cumming, Thomas Downes, Eric Eisenhauer, Venkatesh Endalur Gopinarayanan, Fan Fei, Benjamin Fontaine, Srishti Gurnani, Audrey Hecht, Christopher Hosford, Ashraf Ibrahim, Annika Jagels, Camil Joubran, Ji-Nu Kim, John Lisher, Daniel Liu, James Lyles, Matteo Mannara, Gordon Murray, Emilia Musial, Mengyao Niu, Roberto Olivares-Amaya, Marielle Percuoco, Susanne Saalau, Kristen Sharpe, Anjali Sheahan, Neroshan Thevakumaran, James Thompson, Dawn Thompson, Aric Wiest, Stephen Wyka, Jason Yano, Gregory Verdine

Description

Phylogenetic sequences and tree files for the publication "Genomic Discovery and Structure-Activity Exploration of a Novel Family of Enzyme-Activated Covalent Cyclin-Dependent Kinase Inhibitors".

Figure 2: Phylogenetic comparison of the candidate ros BGC ETaG protein from P. flanaganii with yeast and human CDK protein isoforms. Note that the ETaG is a second copy of, and is most closely related to, the corresponding housekeeping Pho85.

Figure 4: Multiprotein species phylogenetic analysis of 100 fungal housekeeping proteins from 83 fungal strains that were fermented to identify better producers of roseopurpurins.

Supplemental Figure 2: Maximum likelihood phylogenetic analysis of concatenated marker genes, which placed the ros production strain Penicillium sp. (LifeBase accession GN013094.0) in the P. restrictum clade of Penicillium sect. Exilicaulis.

Files

Steps to reproduce

To create the CDK tree (Figure 2), CDK protein sequences from humans (CDK1 to CDK20) and S. cerevisiae were imported from UniProt (Supplemental Table 5 or see table below). All sequences were used to identify CDK homologs in P. flanaganii using DIAMOND BlastP v.2.0.9 on default settings with an E-value cutoff of 1e-5. Kinase domains were extracted from each sequence, aligned using the L-INS-i method in MAFFT v .508, and subsequently trimmed using trimAl v.1.4 with the -automated1 parameter. Maximum likelihood (ML) phylogenetic reconstruction was performed using IQ-TREE 2 v.2.2.0.3 (--seed 12345, 1000 bootstrap replicates). Sequence homologs in P. flanaganii that were placed outside of the human CDK clade were removed, and the resulting tree was rooted at the midpoint.

To create a fungal species tree (Figure 4), single-copy protein markers were identified in our genomes using BUSCO v.2 and the Dikarya database. We selected a set of 100 complete protein marker sequences that were shared among all fungal genomes of interest and used them to create a multilocus phylogenetic species tree. Each marker protein set was individually aligned using MAFFT v .508 (--maxiterate 1000), and the resulting alignment was trimmed with trimAl v.1.4 (-automated1). The trimmed alignments were then concatenated to create a super-alignment, which was used for maximum likelihood (ML) phylogenetic reconstruction with FastTreeMP v2.1.11 (-gamma parameter, 1000 bootstrap replicates). The tree was subsequently rooted at the midpoint.

To create a Penicillium species tree (Supplemental Figure 2), we imported the CaM, BenA, and RPB2 gene sequences for 24 Penicillium species in sect. Exilicaulis (with a focus on the P. restrictum clade) from public sources (Supplemental Figure 2) and from our proprietary P. flanaganii genome. Datasets containing sequences from each gene were individually aligned using the L-INS-i method in MAFFT v .508 and subsequently trimmed using trimAl v.1.4 (-automated1). The resulting trimmed alignments were then concatenated and used for maximum likelihood (ML) phylogenetic reconstruction using IQ-TREE 2 v.2.2.0.3 (K2P+G model, --seed 12345, 1000 bootstrap replicates). The tree was subsequently rooted at the midpoint.
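The search, align, trim, and tree steps described for the CDK tree could be assembled roughly as in the Python sketch below. It only builds the command lines and does not run them. All file names and the function name are hypothetical, and the executable names (diamond, mafft, trimal, iqtree2) are assumed to be on the PATH. The text does not say whether the 1000 bootstrap replicates were ultrafast or standard, so the use of -B (ultrafast bootstrap) here is an assumption; -b would request standard nonparametric bootstraps instead.

```python
import shlex
from pathlib import Path


def cdk_tree_commands(query_faa, db_dmnd, kinase_fasta, workdir="cdk_tree"):
    """Build (but do not run) command lines for the CDK tree steps.

    All paths are hypothetical placeholders, not files from the dataset.
    """
    work = Path(workdir)
    hits = work / "cdk_hits.tsv"
    aligned = work / "kinase_domains.aln.fasta"
    trimmed = work / "kinase_domains.trim.fasta"
    return {
        # Homolog search with an E-value cutoff of 1e-5, tabular output.
        "search": ["diamond", "blastp", "--query", str(query_faa),
                   "--db", str(db_dmnd), "--evalue", "1e-5",
                   "--outfmt", "6", "--out", str(hits)],
        # L-INS-i corresponds to --localpair --maxiterate 1000.
        # MAFFT writes the alignment to stdout; redirect it to `aligned`.
        "align": ["mafft", "--localpair", "--maxiterate", "1000",
                  str(kinase_fasta)],
        # Automated trimming heuristic.
        "trim": ["trimal", "-in", str(aligned), "-out", str(trimmed),
                 "-automated1"],
        # Assumption: -B 1000 (ultrafast bootstrap); the source does not
        # specify the bootstrap type.
        "tree": ["iqtree2", "-s", str(trimmed), "--seed", "12345",
                 "-B", "1000", "--prefix", str(work / "cdk")],
    }


if __name__ == "__main__":
    commands = cdk_tree_commands("cdk_queries.faa", "p_flanaganii.dmnd",
                                 "kinase_domains.fasta")
    for step, argv in commands.items():
        print(f"{step}: {shlex.join(argv)}")
```

Each list can be passed to subprocess.run once the placeholder paths point at real files; the MAFFT step additionally needs its standard output captured into the alignment file.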
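The concatenation of trimmed alignments into a super-alignment is not tied to a named tool in the steps above. A minimal pure-Python sketch of one way to do it follows; the function names are hypothetical and this is not necessarily how the authors performed the step. Padding a missing taxon with gaps is a defensive assumption, since the text states that the selected markers were shared among all genomes.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first header token."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate(alignments, gap="-"):
    """Join per-marker alignments column-wise into one super-alignment.

    `alignments` is a list of {taxon: aligned_sequence} dicts, one per marker.
    A taxon absent from a marker is padded with gap characters (assumption).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    # Toy data only; not sequences from the dataset.
    marker_a = read_fasta(">s1\nMK-L\n>s2\nMKAL\n")
    marker_b = read_fasta(">s1\nGG\n>s2\nG-\n")
    for taxon, seq in concatenate([marker_a, marker_b]).items():
        print(f">{taxon}\n{seq}")
```

Taxa are matched by the first token of the FASTA header, so the same strain or species label must be used across all marker files for the columns to line up.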

Institutions

LifeMine Therapeutics

Categories

Phylogenetics

Licence