Mi3-GPU: MCMC-based inverse Ising inference on GPUs for protein covariation analysis

Published: 07-05-2020| Version 1 | DOI: 10.17632/ftbcfy2p35.1
Allan Haldane,
Ronald M. Levy


Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model based on observed site-covariation, which has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU (“mee-three”, for MCMC Inverse Ising Inference) software for solving the inverse Ising problem for protein-sequence datasets with few analytic approximations, by parallel Markov-Chain Monte Carlo sampling on GPUs. We also provide tools for analysis and preparation of protein-family Multiple Sequence Alignments (MSAs) to account for finite-sampling issues, which are a major source of error or bias in inverse Ising inference. Our method is “generative” in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match the dataset MSA statistics up to the limits imposed by the effects of finite sampling. Our GPU implementation enables the construction of models which reproduce the covariation patterns of the observed MSA with a precision that is not possible with more approximate methods. The main components of our method are a GPU-optimized algorithm to greatly accelerate MCMC sampling, combined with a multi-step Quasi-Newton parameter-update scheme using a “Zwanzig reweighting” technique. We demonstrate the ability of this software to produce generative models on typical protein family datasets for sequence lengths L ~ 300 with 21 residue types with tens of millions of inferred parameters in short running times.