BigMHC Training and Evaluation Data

Published: 19 January 2023 | Version 2 | DOI: 10.17632/dvmz6pkzvb.2

Description

All training and evaluation data used in the BigMHC study (https://doi.org/10.1101/2022.08.29.505690). All code is freely available at https://github.com/KarchinLab/bigmhc

--------------------------------------------------------------------------------

CSV Columns

mhc - MHC-I allele
pep - peptide sequence (epitope data) or mutated peptide sequence (neoepitope data)
tgt - target value: 1 (presented/immunogenic) or 0 (non-presented/non-immunogenic)

All other columns are the outputs of MHC-I epitope presentation and immunogenicity predictors.

--------------------------------------------------------------------------------

datasets.zip contains:

- el_test.csv - epitope presentation evaluation data
- el_train.csv - epitope presentation training data
- el_val.csv - epitope presentation validation data
- im_test.csv - immunogenicity transfer learning evaluation data
- im_train.csv - immunogenicity transfer learning training data
- im_val.csv - immunogenicity transfer learning validation data
- iedb.csv - infectious disease epitope evaluation data
- summary.csv - table of positives and negatives across each allele for each dataset
- pseudoseqs.csv - one-hot encoded MHC representations

eltrainval_models.zip contains the models used to evaluate BigMHC on el_test.csv (the production models can be found in the GitHub repository).

--------------------------------------------------------------------------------

sha256 sums are below:

datasets.zip - 33a51710aa55ff1af99944ee617fc9f5e2a5224f4dfd2df85dd36d6fe7f44c5d
eltrainval_models.zip - e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223
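The checksums and column layout described above can be checked with a short script. The following is a minimal sketch, not part of the BigMHC code base: the helper names (`sha256_of`, `verify`, `split_columns`) are hypothetical, the archives are assumed to have been downloaded to local paths of your choosing, and the CSV files are assumed to have a header row containing the `mhc`, `pep`, and `tgt` column names listed in the description.

```python
import csv
import hashlib
import io

# Expected digests, copied from the dataset description.
EXPECTED_SHA256 = {
    "datasets.zip": "33a51710aa55ff1af99944ee617fc9f5e2a5224f4dfd2df85dd36d6fe7f44c5d",
    "eltrainval_models.zip": "e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223",
}

# Columns documented in the description; every other column is a predictor output.
CORE_COLUMNS = ("mhc", "pep", "tgt")


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Compare a local file (assumed path) against the published digest for `name`."""
    return sha256_of(path) == EXPECTED_SHA256[name]


def split_columns(csv_text):
    """Parse CSV text; return (predictor column names, rows as dicts).

    Assumes a header row is present; values are left as strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    predictors = [name for name in header if name not in CORE_COLUMNS]
    return predictors, list(reader)
```

For example, `verify("datasets.zip", "datasets.zip")` would return `True` only if the file in the current directory matches the published digest. Passing the text of one of the extracted CSV files to `split_columns` separates the three documented columns from the predictor outputs.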

Institutions

Johns Hopkins Medicine, Johns Hopkins University

Categories

Immunology, Bioinformatics, Mass Spectrometry, Cancer, Immunoassay, Epitope, Major Histocompatibility Complex

Licence