MHCflurry 2.0: Improved pan-allele prediction of MHC Class I-presented peptides by incorporating antigen processing
Computational prediction of the peptides presented on MHC class I proteins is an important tool for studying T cell immunity. The data available to develop such predictors have expanded with the use of mass spectrometry to identify naturally presented MHC ligands. In addition to elucidating binding motifs, the identified ligands also reflect the antigen processing steps that occur prior to MHC binding. Here, we developed an integrated predictor of MHC I presentation that combines new models for MHC I binding and antigen processing. Considering only peptides that the binding model first predicts to bind strongly to MHC, the antigen processing model is trained to discriminate published mass spectrometry-identified MHC I ligands from unobserved peptides. The integrated model outperformed its two individual components, as well as NetMHCpan 4.0 and MixMHCpred 2.0.2, on held-out mass spectrometry experiments. Our predictors are implemented in the open-source MHCflurry package, version 2.0 (github.com/openvax/mhcflurry).

See the paper at: https://doi.org/10.1016/j.cels.2020.06.010

Data S1. MULTIALLELIC benchmark dataset with predictions (CSV, gzipped).
Data S2. MONOALLELIC benchmark dataset with predictions (CSV, gzipped).
Data S3. Training data for MHCflurry 2.0 binding affinity (BA) predictor (CSV).
Data S4. Training data for the variant of MHCflurry BA evaluated on the MONOALLELIC benchmark in Fig. S2 (CSV).
Data S5. Training data for MHCflurry 2.0 antigen processing (AP) predictors (CSV).
Data S6. Training data for MHCflurry 2.0 presentation score (PS) predictors (CSV).
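To make the idea of an integrated predictor concrete, the sketch below combines a binding affinity (BA) score and an antigen processing (AP) score into a single presentation score (PS) and ranks candidate peptides by it. This is an illustration only, not the MHCflurry implementation: the logistic form of the combination, the function names `presentation_score` and `rank_peptides`, the weights, the placeholder peptide labels, and the input scores are all assumptions made for this example.

```python
import math


def presentation_score(ba_score, ap_score, w_ba=4.0, w_ap=2.0, bias=-3.0):
    """Combine BA and AP scores (each assumed to lie in [0, 1]) into one score.

    Assumption: a logistic combination with made-up weights. These are
    illustrative placeholders, not values fitted on any real data.
    """
    z = bias + w_ba * ba_score + w_ap * ap_score
    return 1.0 / (1.0 + math.exp(-z))


def rank_peptides(candidates):
    """Return (peptide, score) pairs sorted from most to least likely presented.

    `candidates` is a list of (peptide, ba_score, ap_score) tuples.
    """
    scored = [(pep, presentation_score(ba, ap)) for pep, ba, ap in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: placeholder peptide labels with invented scores.
candidates = [
    ("pep_A", 0.9, 0.8),  # strong binder, favorable processing
    ("pep_B", 0.2, 0.5),  # weak binder
    ("pep_C", 0.9, 0.1),  # strong binder, unfavorable processing
]

ranked = rank_peptides(candidates)
for peptide, score in ranked:
    print(f"{peptide}\t{score:.3f}")
```

In this toy setting, two peptides with the same binding score are separated by their processing score, which is the kind of additional signal the integrated model is meant to capture.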