BigMHC Training and Evaluation Data
Description
All training and evaluation data used in the BigMHC study (https://doi.org/10.1101/2022.08.29.505690). All code is freely available at https://github.com/KarchinLab/bigmhc

--------------------------------------------------------------------------------

CSV Columns

mhc - MHC-I allele
pep - peptide sequence for epitope data, or mutated peptide sequence for neoepitope data
tgt - target value: 1 (presented/immunogenic) or 0 (non-presented/non-immunogenic)

All other columns are the outputs of MHC-I epitope presentation and immunogenicity predictors.

--------------------------------------------------------------------------------

datasets.zip contains:
- el_test.csv - epitope presentation evaluation data
- el_train.csv - epitope presentation training data
- el_val.csv - epitope presentation validation data
- im_test.csv - immunogenicity transfer learning evaluation data
- im_train.csv - immunogenicity transfer learning training data
- im_val.csv - immunogenicity transfer learning validation data
- iedb.csv - infectious disease epitope evaluation data
- summary.csv - table of positives and negatives across each allele for each dataset
- pseudoseqs.csv - one-hot encoded MHC representations

eltrainval_models.zip contains the models used to evaluate BigMHC on el_test.csv (the production models can be found in the GitHub repository).

--------------------------------------------------------------------------------

SHA-256 sums:
datasets.zip - 33a51710aa55ff1af99944ee617fc9f5e2a5224f4dfd2df85dd36d6fe7f44c5d
eltrainval_models.zip - e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223
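The downloaded archives can be checked against the published SHA-256 digests, and the tgt column of an extracted CSV can be tallied per allele, with a short Python sketch like the one below. It uses only the standard library. The local file paths are assumptions, the function names (sha256_of, verify_archive, count_targets) are hypothetical, and the per-allele tally is only similar in spirit to summary.csv, whose exact layout is not described here. The sketch also assumes each CSV has a header row naming the mhc and tgt columns.

```python
import csv
import hashlib
from collections import defaultdict
from pathlib import Path

# SHA-256 digests as published in this record.
EXPECTED_SHA256 = {
    "datasets.zip": "33a51710aa55ff1af99944ee617fc9f5e2a5224f4dfd2df85dd36d6fe7f44c5d",
    "eltrainval_models.zip": "e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path):
    """Compare a downloaded archive against the digest published for its file name."""
    return sha256_of(path) == EXPECTED_SHA256[Path(path).name]


def count_targets(csv_path):
    """Tally positives (tgt == 1) and negatives (tgt == 0) per MHC-I allele.

    Assumption: the CSV has a header row containing 'mhc' and 'tgt' columns.
    Returns a dict mapping allele -> (positives, negatives).
    """
    counts = defaultdict(lambda: [0, 0])  # allele -> [positives, negatives]
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            label = int(float(row["tgt"]))  # tolerate "1" as well as "1.0"
            if label not in (0, 1):
                raise ValueError(f"unexpected tgt value: {row['tgt']!r}")
            counts[row["mhc"]][0 if label == 1 else 1] += 1
    return {allele: tuple(pair) for allele, pair in counts.items()}


if __name__ == "__main__":
    # Hypothetical local paths: adjust to where the files were downloaded/extracted.
    for archive in ("datasets.zip", "eltrainval_models.zip"):
        if Path(archive).exists():
            print(archive, "OK" if verify_archive(archive) else "MISMATCH")
    if Path("el_test.csv").exists():
        for allele, (pos, neg) in sorted(count_targets("el_test.csv").items()):
            print(allele, pos, neg)
```

Reading the archive in fixed-size chunks keeps memory use flat regardless of file size, which matters if the model archive is large.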