APPDFT: An antigen prediction method based on data fusion and transformer

Published: 21 July 2022 | Version 1 | DOI: 10.17632/fwxg5mgntn.1
Juntao Deng


Identifying immunogenic MHC ligands is paramount for medical treatment. Here we present all training and testing data used by APPDFT. Quantitative comparisons demonstrate that APPDFT significantly (P<0.00005) outperforms NetMHCpan4.1 on the independent antigen presentation prediction testing dataset. When applied to immunogenicity prediction, APPDFT also achieves state-of-the-art performance on the neoantigen and SARS-CoV-2 testing datasets. The source code and trained models are available at our GitHub repository.

Training data:
1. ELSA_full_trainingData: single-allelic eluted ligand data obtained from NetMHCpan4.1
2. IM_full_trainingData: immunogenicity data obtained from DeepHLApan
3. ELIM_full_trainingData: fusion of ELSA_full_trainingData and IM_full_trainingData
4. BA_full_trainingData: binding affinity data obtained from NetMHCpan4.1
5. fivefold_val_flags(ELSA): 5-fold cross-validation flags for baseline-EL
6. fivefold_val_flags(ELIM): 5-fold cross-validation flags for APPDFT and baseline-IG

Testing data:
1. EL_full_testingData: independent eluted ligand data obtained from NetMHCpan4.1
2. IM_full_testingData: independent immunogenicity data obtained from DeepHLApan
3. TESLA_full_testingData: neoantigen data obtained from DeepImmuno
4. SarsCov2-con/un_full_testingData: SARS-CoV-2 data obtained from DeepImmuno

Other:
pseudoSequence(ELIM): pseudo sequences for the MHC-I proteins that appear in the training and testing data.
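As a rough illustration of how a fused dataset and its 5-fold validation flags relate to each other, the sketch below concatenates two toy record lists and assigns each record a fold index. It is only a sketch under stated assumptions: the file format of the datasets is not described here, so the field names (`peptide`, `allele`, `label`, `source`), the helper functions `fuse` and `fivefold_flags`, and the toy records are all hypothetical, and this is not necessarily how the published ELIM data or flags were produced.

```python
import random


def fuse(el_records, im_records):
    """Concatenate two record lists, tagging each record with its origin.

    Hypothetical helper: the 'source' field is an assumption, not a
    documented column of ELIM_full_trainingData.
    """
    fused = [dict(r, source="EL") for r in el_records]
    fused += [dict(r, source="IM") for r in im_records]
    return fused


def fivefold_flags(n_records, seed=0):
    """Return one fold index (0..4) per record.

    Fold sizes differ by at most one; the assignment is shuffled with a
    fixed seed so that it is reproducible.
    """
    flags = [i % 5 for i in range(n_records)]
    random.Random(seed).shuffle(flags)
    return flags


# Toy records with made-up placeholder peptides (not taken from the dataset).
el = [{"peptide": "AAAAAAAAA", "allele": "HLA-A*02:01", "label": 1},
      {"peptide": "CCCCCCCCC", "allele": "HLA-B*07:02", "label": 0}]
im = [{"peptide": "DDDDDDDDD", "allele": "HLA-A*02:01", "label": 1}]

fused = fuse(el, im)
flags = fivefold_flags(len(fused))

# Records whose flag equals k form the validation split of fold k;
# all remaining records form that fold's training split.
k = 0
val = [r for r, f in zip(fused, flags) if f == k]
train = [r for r, f in zip(fused, flags) if f != k]
```

Keeping the flags in a separate list that is aligned by position with the records mirrors the way the flags are distributed here as their own files, so the same split can be reused across models.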



Tsinghua University


Immunology, Bioinformatics, Immunogenicity, Antigen Presentation, Major Histocompatibility Complex