BigMHC Training and Evaluation Data

Published: 20 January 2023 | Version 3 | DOI: 10.17632/dvmz6pkzvb.3

Description

All training and evaluation data used in the BigMHC study (https://doi.org/10.1101/2022.08.29.505690). All code is freely available at https://github.com/KarchinLab/bigmhc

--------------------------------------------------------------------------------

CSV Columns

mhc - MHC-I allele
pep - peptide sequence for epitope data, or mutated peptide sequence for neoepitope data
tgt - target value of 1 (presented/immunogenic) or 0 (non-presented/non-immunogenic)

manafest.csv columns also include wtp (wild-type peptide) and gene (the name of the mutated gene).

pseudoseqs.csv columns include the MHC-I allele along with the index and amino acid of the aligned positions.

All other columns are the outputs of MHC-I epitope presentation and immunogenicity predictors.

--------------------------------------------------------------------------------

datasets.zip contains the curated training, validation, and testing datasets along with a summary of the number of negatives and positives for each allele in each dataset (summary.csv):

- el_test.csv - epitope presentation evaluation data and all model predictions
- el_train.csv - epitope presentation training data
- el_val.csv - epitope presentation validation data
- im_test.csv - immunogenicity transfer learning evaluation data and all model predictions
- im_train.csv - immunogenicity transfer learning training data
- im_val.csv - immunogenicity transfer learning validation data
- iedb.csv - infectious disease epitope evaluation data and all model predictions
- summary.csv - table of positives and negatives across each allele for each dataset
- manafest.csv - neoepitope immunogenicity data validated using MANAFEST assays
- pseudoseqs.csv - one-hot encoded MHC representations

eltrainval_models.zip contains the models used to evaluate BigMHC on el_test.csv (the production models can be found in the GitHub repository).

--------------------------------------------------------------------------------

sha256 sums are below:

datasets.zip - 0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa
eltrainval_models.zip - e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223
manafest.csv - accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99
pseudoseqs.csv - cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a
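The published sha256 sums can be checked after download. The following is a minimal sketch, assuming the four files sit in one local directory under the names listed above. The helper names (sha256_of, verify_downloads) are illustrative and not part of the BigMHC code base.

```python
import hashlib
from pathlib import Path

# Checksums copied from the dataset description.
EXPECTED_SHA256 = {
    "datasets.zip": "0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa",
    "eltrainval_models.zip": "e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223",
    "manafest.csv": "accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99",
    "pseudoseqs.csv": "cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = (sha256_of(path) == expected)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Reading in chunks keeps memory use flat for the larger archives. A MISMATCH result usually points to an incomplete download or a different dataset version.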
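The mhc, pep and tgt columns described above are enough to tally positives and negatives per allele, similar in spirit to summary.csv. This sketch assumes the CSV files are comma-delimited with a header row using exactly those column names, which the description implies but does not state. The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def count_targets_by_allele(handle):
    """Count rows with tgt == 1 and tgt == 0 for each value of the mhc column."""
    counts = defaultdict(lambda: {"positives": 0, "negatives": 0})
    for row in csv.DictReader(handle):
        # Accept "1" as well as "1.0"; the exact formatting is an assumption.
        label = int(float(row["tgt"]))
        key = "positives" if label == 1 else "negatives"
        counts[row["mhc"]][key] += 1
    return dict(counts)


# Illustrative rows only: allele names and peptides are placeholders.
SAMPLE = """mhc,pep,tgt
HLA-A*02:01,AAAAAAAAA,1
HLA-A*02:01,CCCCCCCCC,0
HLA-A*02:01,DDDDDDDDD,0
HLA-B*07:02,EEEEEEEEE,1
"""

summary = count_targets_by_allele(io.StringIO(SAMPLE))
for allele, tally in sorted(summary.items()):
    print(allele, tally["positives"], tally["negatives"])
```

To run it on a real file, pass an open file handle instead, for example `open("el_train.csv", newline="")`. Extra columns such as the predictor outputs in the test files are ignored.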

Institutions

Johns Hopkins Medicine, Johns Hopkins University

Categories

Immunology, Bioinformatics, Mass Spectrometry, Cancer, Immunoassay, Epitope, Major Histocompatibility Complex
