APPDFT: An antigen prediction method based on data fusion and transformer
Identifying immunogenic MHC ligands is paramount for medical treatment. Here we present all training and testing data adopted by APPDFT. Quantitative comparison results demonstrate that APPDFT significantly (P<0.00005) outperforms NetMHCpan4.1 on the independent antigen presentation prediction testing dataset. When applied to immunogenicity prediction, APPDFT also achieves state-of-the-art performance on the neoantigen and SARS-CoV-2 testing datasets. The source code and trained models are available at our GitHub repository (github.com/ddd9898/APPDFT).

Training data:
1. ELSA_full_trainingData: single-allelic eluted ligand data obtained from NetMHCpan4.1
2. IM_full_trainingData: immunogenicity data obtained from DeepHLApan
3. ELIM_full_trainingData: fusion of ELSA_full_trainingData and IM_full_trainingData
4. BA_full_trainingData: binding affinity data obtained from NetMHCpan4.1
5. fivefold_val_flags(ELSA): 5-fold cross-validation flags for baseline-EL
6. fivefold_val_flags(ELIM): 5-fold cross-validation flags for APPDFT and baseline-IG

Testing data:
1. EL_full_testingData: independent eluted ligand data obtained from NetMHCpan4.1
2. IM_full_testingData: independent immunogenicity data obtained from DeepHLApan
3. TESLA_full_testingData: neoantigen data obtained from DeepImmuno
4. SarsCov2-con/un_full_testingData: SARS-CoV-2 data obtained from DeepImmuno

Other:
pseudoSequence(ELIM): pseudo sequences for the MHC-I proteins appearing in the training and testing data.
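
The fivefold_val_flags files listed under the training data assign samples to cross-validation folds. As a minimal sketch, assuming (this is not stated in the description) that each flag is an integer from 0 to 4 naming the fold in which a sample is held out, train/validation index splits could be derived as shown here. The function name fivefold_splits and the toy flag values are hypothetical; the actual file layout should be checked against the repository.

```python
from typing import Iterator, List, Sequence, Tuple


def fivefold_splits(
    flags: Sequence[int], n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) for each fold.

    Assumption: flags[i] is the fold (0 .. n_folds - 1) in which
    sample i is used for validation.
    """
    for fold in range(n_folds):
        # Samples flagged with the current fold are held out for validation.
        val_idx = [i for i, f in enumerate(flags) if f == fold]
        # All remaining samples form the training set for this fold.
        train_idx = [i for i, f in enumerate(flags) if f != fold]
        yield train_idx, val_idx


# Toy flags for ten samples (illustrative values, not taken from the dataset).
flags = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
splits = list(fivefold_splits(flags))

for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

Using precomputed flags rather than a fresh random split keeps the folds identical across models, which is presumably why separate flag files are provided for the ELSA and ELIM training sets.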