MHCflurry: open-source class I MHC binding affinity prediction

Published: 03-09-2018 | Version 2 | DOI: 10.17632/8pz43nvvxh.2
Timothy O'Donnell


Predicting the binding affinity of major histocompatibility complex I (MHC I) proteins and their peptide ligands is important for vaccine design. We introduce an open-source package for MHC I binding prediction, MHCflurry. The software implements allele-specific neural networks that use a novel architecture and peptide encoding scheme. When trained on affinity measurements, MHCflurry outperformed the standard predictors NetMHC 4.0 and NetMHCpan 3.0 overall and particularly on non-9-mer peptides in a benchmark of ligands identified by mass spectrometry. The released predictor, MHCflurry 1.2.0, uses mass spectrometry datasets for model selection and showed competitive accuracy with standard tools, including the recently released NetMHCpan 4.0, on a small benchmark of affinity measurements. MHCflurry’s prediction speed exceeded 7,000 predictions per second, 396 times faster than NetMHCpan 4.0. MHCflurry is freely available to use, retrain, or extend, includes Python library and command line interfaces, may be installed using package managers, and applies software development best practices.

DEPOSITED DATA
----------------------
* Curated training and model selection dataset: data_curated.20180219.tar.bz2, derived from IEDB and other sources
* MS benchmark dataset: abelin_peptides.mhcflurry_no_mass_spec.csv.bz2, derived from Abelin et al. Immunity 2017.
* MHCflurry 1.2.0 models: models_class1.20180225.tar.bz2
* MHCflurry (no MS) models: models_class1_selected_no_mass_spec.20180225.tar.bz2
* MHCflurry (train-MS) models: models_class1_trained_with_mass_spec.20180228.tar.bz2

See also:

UPDATES
----------------------
* Version 2: The abelin_peptides.all_predictions.csv.bz2 given in Version 1 corresponded to an earlier preprint of the MHCflurry paper, not the final version published in Cell Systems. This file has been replaced with abelin_peptides.mhcflurry_no_mass_spec.csv.bz2, which gives the predictions analyzed in the Cell Systems paper.
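
The MS benchmark file listed under DEPOSITED DATA is distributed as a bz2-compressed CSV. The sketch below shows one way to look at its header and first few rows without unpacking it to disk, using only the Python standard library. It assumes the file is comma-separated with a single header row; the column names are not documented on this page, so the code does not rely on any of them. The helper name `peek_bz2_csv` is hypothetical and is not part of MHCflurry.

```python
import bz2
import csv
from itertools import islice


def peek_bz2_csv(path, n_rows=5):
    """Return (header, rows) for the first n_rows data rows of a .csv.bz2 file.

    Assumption: the file is comma-separated with one header row.
    """
    # bz2.open in text mode decompresses while reading, so the full file
    # is never held in memory at once.
    with bz2.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file has no rows
        rows = list(islice(reader, n_rows))
    return header, rows
```

After downloading the file, a call such as `peek_bz2_csv("abelin_peptides.mhcflurry_no_mass_spec.csv.bz2")` would return the column names and a handful of rows, which is usually enough to decide how to load the full table.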