BigMHC Training and Evaluation Data

Published: 14 April 2023 | Version 4 | DOI: 10.17632/dvmz6pkzvb.4

Description

All training and evaluation data used in the BigMHC study (https://doi.org/10.1101/2022.08.29.505690). All code is freely available at https://github.com/KarchinLab/bigmhc

--------------------------------------------------------------------------------
CSV Columns

mhc - MHC-I allele
pep - peptide sequence if epitope data, or mutated peptide sequence if neoepitope data
tgt - target value of 1 (presented/immunogenic) or 0 (non-presented/non-immunogenic)

manafest.csv columns also include wtp (wild-type peptide) and gene (the name of the mutated gene).
pseudoseqs.csv columns include the MHC-I allele along with the index and amino acid of the aligned positions.
All other columns are the outputs of MHC-I epitope presentation and immunogenicity predictors.

--------------------------------------------------------------------------------
datasets.zip contains the curated training, validation, and testing datasets along with a summary of the number of negatives and positives for each allele in each dataset (summary.csv):

- el_test.csv - epitope presentation evaluation data and all evaluated model predictions
- el_train.csv - epitope presentation training data
- el_val.csv - epitope presentation validation data
- im_test.csv - immunogenicity transfer learning evaluation data and all model predictions
- im_train.csv - immunogenicity transfer learning training data
- im_val.csv - immunogenicity transfer learning validation data
- iedb.csv - infectious disease epitope evaluation data and all model predictions
- summary.csv - table of positives and negatives across each allele for each dataset
- manafest.csv - neoepitope immunogenicity data validated using MANAFEST assays
- pseudoseqs.csv - one-hot encoded MHC representations

el.csv.zip contains the predictions of the BigMHC production models and all other methods on all EL data (train, val, test). The pMHCs were filtered so that all other methods can score them (e.g. peptide lengths 8-11).
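The column layout above can be read with Python's standard csv module. The sketch below tallies negatives and positives per allele, roughly mirroring what summary.csv reports, and applies a simple 8-11 residue length check. It assumes each file has a header row containing the mhc, pep, and tgt columns described above and that tgt parses as 0 or 1; the helper names and the sample rows are hypothetical and are not part of the released data or the BigMHC code. The length check only illustrates one criterion and is not the authors' actual filtering procedure.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def count_by_allele(handle: TextIO) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: map each allele (mhc) to (negatives, positives)."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(handle):
        # Assumes tgt is 0 or 1; float() also accepts values written as "1.0".
        label = int(float(row["tgt"]))
        counts[row["mhc"]][label] += 1
    return {allele: (neg, pos) for allele, (neg, pos) in counts.items()}


def filter_by_length(handle: TextIO, lo: int = 8, hi: int = 11) -> List[dict]:
    """Hypothetical helper: keep rows whose peptide (pep) length is in [lo, hi]."""
    return [row for row in csv.DictReader(handle) if lo <= len(row["pep"]) <= hi]


if __name__ == "__main__":
    # Invented rows for illustration only; not taken from the dataset.
    sample = (
        "mhc,pep,tgt\n"
        "HLA-A*02:01,AAAAAAAAA,1\n"
        "HLA-A*02:01,CCCCCCC,0\n"
        "HLA-B*07:02,DDDDDDDDDDDD,0\n"
    )
    print(count_by_allele(io.StringIO(sample)))
    print(len(filter_by_length(io.StringIO(sample))))
```

With real data, extract datasets.zip first and pass an open file instead, e.g. `with open("el_train.csv", newline="") as fh: count_by_allele(fh)`.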
eltrainval_models.zip contains the models used to evaluate BigMHC on el_test.csv (the production models can be found in the GitHub repository).

--------------------------------------------------------------------------------
sha256 sums are below:

datasets.zip - 0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa
el.csv.zip - cb94b42406b96a3b13d941cf87dd43f4f53e9ebfbc3d1619f0e43327f1fb6395
eltrainval_models.zip - e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223
manafest.csv - accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99
pseudoseqs.csv - cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a
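The published sums can be checked after download. The following Python sketch streams each file through hashlib.sha256 and compares the hex digest against the values listed for datasets.zip, el.csv.zip, eltrainval_models.zip, manafest.csv, and pseudoseqs.csv. It assumes the five files sit, still compressed, in a single directory under exactly those names; the function names are hypothetical and not part of the BigMHC repository.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Checksums copied from the sha256 list in this description.
EXPECTED_SHA256: Dict[str, str] = {
    "datasets.zip": "0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa",
    "el.csv.zip": "cb94b42406b96a3b13d941cf87dd43f4f53e9ebfbc3d1619f0e43327f1fb6395",
    "eltrainval_models.zip": "e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223",
    "manafest.csv": "accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99",
    "pseudoseqs.csv": "cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Union[str, Path] = ".") -> Dict[str, bool]:
    """Report, per file, whether it exists in `directory` and matches its checksum."""
    root = Path(directory)
    return {
        name: (root / name).is_file() and sha256_of(root / name) == expected
        for name, expected in EXPECTED_SHA256.items()
    }


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

Hashing in 1 MiB chunks keeps memory use flat for the larger archives. On most Unix systems, `sha256sum <file>` prints the same digest.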

Institutions

Johns Hopkins Medicine, Johns Hopkins University

Categories

Immunology, Bioinformatics, Mass Spectrometry, Cancer, Immunoassay, Epitope, Major Histocompatibility Complex