Mi3-GPU: MCMC-based inverse Ising inference on GPUs for protein covariation analysis

Published: 7 May 2020 | Version 1 | DOI: 10.17632/ftbcfy2p35.1
Contributor(s): Allan Haldane, Ronald M. Levy

Description of this data

Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model from the covariation observed between sites, and it has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU (“mee-three”, for MCMC Inverse Ising Inference), software that solves the inverse Ising problem for protein-sequence datasets with few analytic approximations by parallel Markov-Chain Monte Carlo (MCMC) sampling on GPUs. We also provide tools for analyzing and preparing protein-family Multiple Sequence Alignments (MSAs) to account for finite-sampling issues, which are a major source of error or bias in inverse Ising inference. Our method is “generative” in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match those of the dataset MSA, up to the limits imposed by finite sampling. Our GPU implementation makes it possible to build models that reproduce the covariation patterns of the observed MSA with a precision that more approximate methods cannot attain. The main components of the method are a GPU-optimized algorithm that greatly accelerates MCMC sampling and a multi-step Quasi-Newton parameter-update scheme that uses a “Zwanzig reweighting” technique. We demonstrate that the software produces generative models for typical protein-family datasets, with sequence length L ~ 300 and 21 residue types (tens of millions of inferred parameters), in short running times.
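
The sampling and reweighting ideas described above can be illustrated with a small, CPU-only Python sketch. It is not Mi3-GPU's code or API: every name below (coupling, potts_energy, metropolis_chain, sample_msa, pair_marginals, zwanzig_weights) is hypothetical, the sizes are toy values rather than L ~ 300 and 21 residue types, and it assumes the Potts sign convention E(S) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j) with P(S) proportional to exp(-E(S)). It shows three pieces: Metropolis sampling of synthetic sequences from independent chains (which a GPU can run in parallel), the bivariate marginals f_ij(a, b) that a generative model must reproduce, and Zwanzig reweighting, which reuses sequences sampled under the current parameters to estimate marginals under trial parameters through weights w_n proportional to exp(-[E_new(S_n) - E_old(S_n)]) instead of re-running MCMC. The Quasi-Newton update itself is not shown.

```python
import math
import random

# Hypothetical toy layout: h[i][a] are fields, J[(i, j)][a][b] (only i < j) are couplings.
# Assumed sign convention: E(S) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j), P(S) ~ exp(-E(S)).

def coupling(J, i, a, j, b):
    """Look up J_ij(a, b) whether i < j or i > j."""
    return J[(i, j)][a][b] if i < j else J[(j, i)][b][a]

def potts_energy(seq, h, J):
    """Total Potts energy of one sequence (list of residue indices 0..q-1)."""
    L = len(seq)
    e = sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e += coupling(J, i, seq[i], j, seq[j])
    return e

def metropolis_chain(h, J, q, n_steps, rng):
    """One single-site Metropolis chain from a random start; returns the final sequence."""
    L = len(h)
    seq = [rng.randrange(q) for _ in range(L)]
    for _ in range(n_steps):
        i, new = rng.randrange(L), rng.randrange(q)
        old = seq[i]
        # Energy change from mutating site i only: its field plus its couplings to other sites.
        dE = h[i][new] - h[i][old] + sum(
            coupling(J, i, new, j, seq[j]) - coupling(J, i, old, j, seq[j])
            for j in range(L) if j != i)
        if dE <= 0 or rng.random() < math.exp(-dE):
            seq[i] = new  # accept the mutation
    return seq

def sample_msa(h, J, q, n_seqs, n_steps, seed=0):
    """Synthetic MSA: one sequence per independent chain (serial here, parallel on a GPU)."""
    rng = random.Random(seed)
    return [metropolis_chain(h, J, q, n_steps, rng) for _ in range(n_seqs)]

def pair_marginals(seqs, q, weights=None):
    """(Optionally weighted) bivariate marginals f_ij(a, b) for every pair i < j."""
    if weights is None:
        weights = [1.0] * len(seqs)
    total = sum(weights)
    L = len(seqs[0])
    f = {(i, j): [[0.0] * q for _ in range(q)]
         for i in range(L) for j in range(i + 1, L)}
    for s, w in zip(seqs, weights):
        for (i, j), fij in f.items():
            fij[s[i]][s[j]] += w / total
    return f

def zwanzig_weights(seqs, h, J, h_new, J_new):
    """Weights that reweight samples drawn under (h, J) toward the model (h_new, J_new)."""
    dE = [potts_energy(s, h_new, J_new) - potts_energy(s, h, J) for s in seqs]
    shift = min(dE)  # subtracting a constant avoids overflow; it cancels on normalization
    return [math.exp(-(d - shift)) for d in dE]

if __name__ == "__main__":
    L, q = 3, 2
    h = [[0.0, 1.0]] * L
    J = {(i, j): [[0.0, 0.0], [0.0, -0.5]] for i in range(L) for j in range(i + 1, L)}
    msa = sample_msa(h, J, q, n_seqs=500, n_steps=100)
    J_trial = {k: [[0.0, 0.0], [0.0, -0.6]] for k in J}  # a small trial perturbation
    w = zwanzig_weights(msa, h, J, h, J_trial)
    print(pair_marginals(msa, q)[(0, 1)], pair_marginals(msa, q, w)[(0, 1)])
```

In a fit, trial parameters would be adjusted until the reweighted marginals approach the dataset MSA marginals. Reweighting is typically reliable only while the trial parameters stay close to those used for sampling; otherwise the weights concentrate on a few sequences and fresh MCMC samples are needed.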


This data is associated with the following publication:

Mi3-GPU: MCMC-based inverse Ising inference on GPUs for protein covariation analysis

Published in: Computer Physics Communications

Cite this dataset

Haldane, Allan; Levy, Ronald M. (2020), “Mi3-GPU: MCMC-based inverse Ising inference on GPUs for protein covariation analysis”, Mendeley Data, v1, http://dx.doi.org/10.17632/ftbcfy2p35.1


Categories

Computational Physics, Protein Evolution

Licence

GPLv3

The files associated with this dataset are licensed under the GNU General Public License, version 3.

The GNU General Public License is a free, copyleft license for software and other kinds of works.
