Data for: Connecting MHC-I-binding motifs with HLA alleles via deep learning

Published: 10 September 2021 | Version 3 | DOI: 10.17632/c249p8gdzd.3

Description

This dataset contains the research data supporting the study “Connecting MHC-I-binding motifs with HLA alleles via deep learning”.

1. MHCI_res182_seq.json: the peptide-binding cleft sequence of each MHC-I allele, extracted from the IPD-IMGT/HLA database (version 3.41.0)
2. MHCI_res182_onehot.npy: the one-hot encoding of the peptide-binding cleft sequence of each MHC-I allele
3. dataframe.tar.gz: this folder contains the training, validation, and benchmark datasets
   [the common columns]
   1. sequence: the peptide sequence
   2. peptide_length: the length of the peptide sequence
   3. mhc: the MHC-I allele
   4. meas: the binding affinity (assay data only)
   5. value: the value (between 0 and 1) converted from the binding affinity
   6. bind: the binding label
   7. source: the data source (assay, MS, random decoy, or peptide decoy)

   1. train_hit.csv
      1. data: measurements extracted from IEDB for the training process
      2. columns
         1. the common columns
         2. MHCfovea: the prediction score of MHCfovea (used for ScoreCAM analysis)
   2. train_decoy_{1-90}.csv
      1. data: artificial decoy peptides for the training process; the number of records in each file is approximately equal to the number of eluted peptides
      2. columns
         1. the common columns
   3. valid.csv
      1. data: measurements extracted from IEDB and decoy peptides for validation
      2. columns
         1. the common columns
         2. batch_size_{16, 32, 64}_and_learning_rate_{0.00100, 0.00010, 0.00001}: for optimizing the batch size and learning rate hyperparameters
         3. DE_{1, 5, 10, 15, 30}_and_{30, 60, 90}: for optimizing the D-E ratio of the training and downsized datasets; the first number is the D-E ratio of the downsized dataset and the second number is the D-E ratio of the training dataset
   4. benchmark.csv
      1. data: eluted peptides extracted from IEDB and decoy peptides for the testing process
      2. columns
         1. the common columns
         2. NetMHCpan4.1, MHCflurry2.0, MixMHCpred2.1, MHCfovea: the prediction score of each predictor
4. allele_expansion.tar.gz: this folder contains data for the allele expansion
   1. peptides.csv: the peptides used for allele expansion
   2. output/{MHC-I group}
      1. allele.json: the alleles of the MHC-I group
      2. motif.npy: the binding motifs for each allele
      3. prediction.npy: the prediction score for each allele (in the same order as peptides.csv)
5. MHCfovea-v1.0.0.zip: the source code of MHCfovea
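
The file and column name patterns in the description can be expanded programmatically. The following is a minimal sketch only, not part of the dataset or of MHCfovea. It assumes that the brace patterns expand to literal strings exactly as written (for example `batch_size_16_and_learning_rate_0.00100`), which should be checked against the actual CSV headers. All function names are hypothetical, and the sample row is a synthetic placeholder rather than real data.

```python
import csv
import io
from itertools import product

# The seven common columns listed in the description.
COMMON_COLUMNS = [
    "sequence", "peptide_length", "mhc", "meas", "value", "bind", "source",
]


def decoy_file_names():
    """Expand train_decoy_{1-90}.csv into the 90 individual file names."""
    return [f"train_decoy_{i}.csv" for i in range(1, 91)]


def hyperparameter_columns():
    """Expand the batch size / learning rate pattern of valid.csv.

    Assumption: the values appear in the header exactly as printed in
    the description, including trailing zeros.
    """
    batch_sizes = ["16", "32", "64"]
    learning_rates = ["0.00100", "0.00010", "0.00001"]
    return [
        f"batch_size_{b}_and_learning_rate_{lr}"
        for b, lr in product(batch_sizes, learning_rates)
    ]


def de_ratio_columns():
    """Expand DE_{downsized}_and_{training} (assumed literal formatting)."""
    downsized = [1, 5, 10, 15, 30]  # D-E ratio of the downsized dataset
    training = [30, 60, 90]         # D-E ratio of the training dataset
    return [f"DE_{d}_and_{t}" for d, t in product(downsized, training)]


def missing_common_columns(csv_text):
    """Return the common columns that are absent from a CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in COMMON_COLUMNS if name not in header]


# Synthetic placeholder row, used only to exercise the header check.
SAMPLE_CSV = (
    "sequence,peptide_length,mhc,meas,value,bind,source\n"
    "AAAAAAAAA,9,PLACEHOLDER_ALLELE,,1.0,1,MS\n"
)

if __name__ == "__main__":
    print(len(decoy_file_names()), "decoy files")
    print(hyperparameter_columns())
    print(de_ratio_columns())
    print("missing:", missing_common_columns(SAMPLE_CSV))
```

Running the header check on each extracted CSV before analysis is one way to confirm that a file matches the layout described above; an empty list means all seven common columns are present.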

Files