Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA-Ligand interactions

Published: 21 May 2024 | Version 3 | DOI: 10.17632/8yg88x7rdk.3
Contributors:

Description

This dataset accompanies the publication "Comparative Analysis of RNA 3D Structure Prediction Methods: Towards Enhanced Modeling of RNA-Ligand Interactions." The primary objective of the study was to evaluate how accurately various methods model RNA structures, with a particular focus on RNA-small molecule complexes and their ligand-binding sites. We assessed six RNA 3D structure prediction programs (DeepFoldRNA, RhoFold, BRiQ, FARFAR2, SimRNA, and Vfold2), using the RNA sequence as the common input for all methods. FARFAR2, SimRNA, and Vfold2 were run both with and without secondary structure information. BRiQ requires secondary structure restraints to operate and was therefore run only with them.

The dataset is organized into sub-directories named after each method. For SimRNA, FARFAR2, and Vfold2, directories for runs without secondary structure input carry the method's name alone, whereas runs that included secondary structure information carry an '_ss' suffix (e.g., SimRNA_ss). Where secondary structures were used, we supplied ideal secondary structures derived from the reference structures with the x3dna-dssr program v1.9.10. All secondary structures were inspected manually and corrected wherever x3dna-dssr had introduced anomalies.

During the final stages of preparing this publication, AlphaFold 3 was released. To benchmark all ML-based methods (AlphaFold 3, DeepFoldRNA, and RhoFold), we assembled two new datasets, Blind set 1 (B1) and Blind set 2 (B2) (see Supplementary Table S2).
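The directory naming convention can be decoded programmatically when iterating over the dataset. The sketch below is a hypothetical helper (`parse_run_dir` and the three method sets are names introduced here, not part of the dataset) that maps a directory name to the method and whether secondary structure was used. It assumes BRiQ runs always used secondary structure regardless of whether their directory carries the '_ss' suffix, and that DeepFoldRNA and RhoFold directories never carry it.

```python
# Hypothetical helper, not shipped with the dataset.
SEQUENCE_ONLY = {"DeepFoldRNA", "RhoFold"}      # sequence-only input
SS_REQUIRED = {"BRiQ"}                          # always run with secondary structure
SS_OPTIONAL = {"FARFAR2", "SimRNA", "Vfold2"}   # run both ways; '_ss' marks restrained runs


def parse_run_dir(name: str) -> tuple[str, bool]:
    """Return (method, used_secondary_structure) for a run directory name."""
    has_suffix = name.endswith("_ss")
    base = name[: -len("_ss")] if has_suffix else name
    if base in SS_OPTIONAL:
        return base, has_suffix
    if base in SS_REQUIRED:
        # Assumption: BRiQ always used secondary structure, suffix or not.
        return base, True
    if base in SEQUENCE_ONLY and not has_suffix:
        return base, False
    raise ValueError(f"unrecognized run directory: {name!r}")


for d in ["SimRNA", "SimRNA_ss", "Vfold2_ss", "BRiQ", "RhoFold"]:
    print(d, "->", parse_run_dir(d))
```

Raising on unknown names (rather than silently skipping them) makes it obvious if the directory layout differs from the one assumed here.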


Steps to reproduce

For both DeepFoldRNA and RhoFold, only the sequence was used as input. Both programs were installed locally together with the Rfam, RNAcentral, and NCBI Nucleotide sequence databases, which are used to build multiple sequence alignments; from these alignments, the programs predict secondary structures and other restraints used during modeling. We also installed locally the external dependencies required by DeepFoldRNA: PETfold, rMSA, SimRNA, QRNAS, and SPOT-RNA-1D. DeepFoldRNA generated up to six models per simulation, while RhoFold produced a single model per RNA.

SimRNA was run with and without secondary structure restraints. For each RNA, we ran eight independent replica-exchange Monte Carlo simulations, each started from a different random seed and using ten replicas. Each simulation was run for 16 million iterations. The lowest-energy structures from the top three clusters (72) were then used in further analysis.

FARFAR2 was run for one million Monte Carlo cycles, both with and without secondary structure restraints, and generated up to five representative models for benchmarking.

BRiQ was run only with secondary structure restraints. It generates a single model per simulation, and that model was used in further analysis.

Vfold2 was run with and without secondary structure restraints. Without secondary structure input, Vfold2 internally uses Vfold-2D to generate multiple alternative secondary structures, each of which yields an ensemble of 3D models labeled npk1, npk2, pk1, pk2, etc., where 'pk' and 'npk' denote secondary structures generated with and without pseudoknot (PK) prediction, respectively. Up to five models from these ensembles were selected for further analysis.
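The SimRNA post-processing step (keeping the lowest-energy structure from each of the top three clusters) can be illustrated with a short Python sketch. This is not SimRNA's own clustering tool: it assumes clustering has already been performed, that cluster IDs are ranked so that 1 is the top cluster (for example, by cluster size), and that each decoy's SimRNA energy is known. The `Decoy` class and `select_cluster_representatives` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoy:
    name: str
    cluster_id: int  # assumed ranking: 1 = top cluster
    energy: float    # SimRNA energy; lower is better


def select_cluster_representatives(decoys, n_clusters=3):
    """Return the lowest-energy decoy from each of the top n_clusters clusters."""
    best = {}
    for d in decoys:
        if d.cluster_id > n_clusters:
            continue  # ignore clusters outside the top n_clusters
        current = best.get(d.cluster_id)
        if current is None or d.energy < current.energy:
            best[d.cluster_id] = d
    # Report representatives in cluster-rank order.
    return [best[c] for c in sorted(best)]


decoys = [
    Decoy("a", 1, -350.2),
    Decoy("b", 1, -362.8),
    Decoy("c", 2, -340.0),
    Decoy("d", 3, -355.1),
    Decoy("e", 4, -400.0),
]
print([d.name for d in select_cluster_representatives(decoys)])  # ['b', 'c', 'd']
```

In this toy example, decoy `e` has the lowest energy overall but is not selected, because it belongs to a cluster outside the top three.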
With secondary structure restraints, Vfold2 generated a single ensemble of models, from which up to the first five were chosen.

For both the B1 and B2 datasets, we ran the AlphaFold 3 server (http://alphafoldserver.com/) with default settings.

Among the six methods tested, Vfold2 was the only one that failed to generate models for some RNAs: Vfold2 and Vfold2_ss produced models for 90 and 123 of the 139 cases, respectively.
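Expressed as coverage, the Vfold2 counts can be recomputed directly from the numbers reported for the 139 cases. The `coverage` function below is a hypothetical helper introduced here. It only restates the reported counts as percentages and failure counts.

```python
def coverage(n_modeled: int, n_total: int) -> float:
    """Percentage of cases for which a method returned at least one model."""
    return 100.0 * n_modeled / n_total


TOTAL = 139  # number of cases in the benchmark
for label, n in [("Vfold2", 90), ("Vfold2_ss", 123)]:
    print(f"{label}: {n}/{TOTAL} = {coverage(n, TOTAL):.1f}% ({TOTAL - n} failures)")
# Vfold2: 90/139 = 64.7% (49 failures)
# Vfold2_ss: 123/139 = 88.5% (16 failures)
```

Supplying secondary structure therefore raised Vfold2's coverage by 33 cases.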

Categories

RNA, RNA Structure, Structure Prediction

Licence