Large database for the analysis and prediction of spliced and non-spliced peptide generation by proteasomes

Published: 14-04-2020 | Version 1 | DOI: 10.17632/nr7cs764rc.1
Michele Mishto,
Gerd Specht,
Juliane Liepe,
Hanna Roetschke


The following dataset is linked to the paper “Large database for the analysis and prediction of spliced and non-spliced peptide generation by proteasomes” by Specht et al., published in Scientific Data in 2020. The Mendeley dataset contains three files:

- ProteasomeDB.csv. This database is the core of the manuscript. It reports cis and trans spliced as well as non-spliced peptide products identified in the in vitro digestions of 55 synthetic substrates with different proteasome isoforms and conditions. Our database accounts for 22,333 product sequences (7,305 non-spliced, 7,323 cis spliced and 7,705 trans spliced product sequences), which are defined as peptide product sequences unique within each sample. Several product sequences are identified in more than one sample of the same substrate. Therefore, the number of unique peptide sequences, i.e. peptide sequences unique within each substrate, is smaller than the number of product sequences. In particular, our database contains 14,433 unique peptide sequences (3,834 non-spliced, 5,011 cis spliced and 5,588 trans spliced unique peptide sequences). The samples were measured by three different mass spectrometers in two independent proteomics centres. The related mass spectrometry files have been deposited in the PRIDE repository with the dataset identifier PXD016782.
- proteasomeDB_sql_script.txt, which is the SQL script to create the database.
- ProteasomeDB.sql, which was created with the mysqldump command. It contains the entire database (including table, schema, script) in SQL format.

For any further queries, please contact directly:
- Juliane Liepe, Max Planck Institute for Biophysical Chemistry
- Michele Mishto, King's College London
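
The difference between product sequences (unique within each sample) and unique peptide sequences (unique within each substrate) described for ProteasomeDB.csv can be illustrated with a minimal sketch. The column names used here (`substrate`, `sample`, `sequence`, `product_type`) and the toy rows are assumptions made for illustration only; they are not taken from the actual file header and should be replaced with the real column names.

```python
# Minimal sketch: counting "product sequences" versus "unique peptide sequences".
# The column names and toy rows below are hypothetical, not the real CSV schema.

TOY_ROWS = [
    {"substrate": "S1", "sample": "A", "sequence": "PEPTIDE", "product_type": "non-spliced"},
    # Same sequence reported twice in the same sample: counted once.
    {"substrate": "S1", "sample": "A", "sequence": "PEPTIDE", "product_type": "non-spliced"},
    # Same sequence in another sample of the same substrate:
    # a new product sequence, but not a new unique peptide sequence.
    {"substrate": "S1", "sample": "B", "sequence": "PEPTIDE", "product_type": "non-spliced"},
    {"substrate": "S1", "sample": "B", "sequence": "PEPKDE", "product_type": "cis spliced"},
    # Same sequence from a different substrate: counted separately in both tallies.
    {"substrate": "S2", "sample": "C", "sequence": "PEPTIDE", "product_type": "non-spliced"},
]


def count_product_sequences(rows):
    """Count sequences that are unique within each sample."""
    return len({(r["substrate"], r["sample"], r["sequence"]) for r in rows})


def count_unique_peptides(rows):
    """Count sequences that are unique within each substrate."""
    return len({(r["substrate"], r["sequence"]) for r in rows})


if __name__ == "__main__":
    print("product sequences:", count_product_sequences(TOY_ROWS))
    print("unique peptide sequences:", count_unique_peptides(TOY_ROWS))
```

To apply the same counting to the real file, the rows could be read with `csv.DictReader` from the Python standard library, after adjusting the dictionary keys to match the actual header of ProteasomeDB.csv.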