VarDB2.0 - a proteogenomics database

Published: 5 May 2022 | Version 1 | DOI: 10.17632/kdfjcyz5z2.1
Contributor:
Yafeng Zhu

Description

The VarDB2.0 database contains variant peptide sequences derived from COSMIC mutations, non-synonymous SNPs, and three-frame translated lncRNAs and pseudogenes.

The database is used in the following paper. Please cite the paper if it contributes to your work:
Rong Xiang, Leyao Ma, Mingyu Yang, Zetian Zheng, XiaoFang Cheng, Fujian Jia, Fanfan Xie, Fuqiang Li, Kui Wu, Yafeng Zhu*. Increased expression of peptides from non-coding genes in cancer proteomics datasets suggests potential tumor neoantigens. Communications Biology. April 2021.

File information:
VarDB2.gtf - a GTF file containing the genomic coordinates of the transcripts included in VarDB2.0.
VarDB2.zip - a zipped FASTA file (VarDB2.fasta) of variant peptide sequences.
CanProVar_peptide.fa.txt.zip - a zipped FASTA file of variant peptides from the CanProVar database.
Note: CanProVar_peptide.fa is already included in VarDB2.fasta.

Header format examples:

e.g. variant peptide from non-synonymous SNPs
>CanProVar_rs6673641_ZMYM6_P134S_missense_15
CITRHSSPACLPPPSKK
The last integer "_15" in the header gives the position of the substituted amino acid in the peptide sequence. "P134S" gives the substitution and its position in the protein sequence.

e.g. variant peptide from COSMIC mutation
>COSMIC:ENST00000342988:SMAD4:p.K519E:Substitution-Missense:4
The last integer "4" in the header gives the position of the substituted amino acid in the peptide sequence. "p.K519E" gives the substitution and its position in the protein sequence.

e.g. variant peptide from three-frame translated pseudogenes
>PGOHUM_ENST00000359512.8_PAR_Y_WASH6P_RF2
"RF2" indicates that the sequence is translated from the second reading frame of its transcript (starting from the second nucleotide), with stop codons allowed in the middle.

e.g. variant peptide from three-frame translated lncRNA
>lncRNA_lnc-MLKL-27:11_RF2
"RF2" indicates that the sequence is translated from the second reading frame of its transcript (starting from the second nucleotide), with stop codons allowed in the middle.
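The header conventions described above can be read programmatically. The following is a minimal Python sketch, not part of the dataset or the paper: the function name `parse_vardb_header` and the returned dictionary keys are hypothetical, and the field layout is inferred only from the four example headers given in the description. Real headers in VarDB2.fasta may contain variations this sketch does not handle.

```python
import re


def parse_vardb_header(header):
    """Parse a VarDB2.0 FASTA header into a dict.

    Hypothetical helper: the field layout is assumed from the four
    example headers in the dataset description only.
    """
    header = header.strip().lstrip(">")

    if header.startswith("COSMIC:"):
        # COSMIC:<transcript>:<gene>:<protein change>:<mutation type>:<peptide position>
        _, transcript, gene, change, mut_type, pos = header.split(":")
        return {
            "source": "COSMIC",
            "transcript": transcript,
            "gene": gene,
            "protein_change": change,
            "mutation_type": mut_type,
            "peptide_pos": int(pos),  # 1-based position within the peptide
        }

    if header.startswith("CanProVar_"):
        # CanProVar_<SNP id>_<gene>_<protein change>_<mutation type>_<peptide position>
        _, snp_id, rest = header.split("_", 2)
        # Split from the right so a gene name containing "_" stays intact
        # (assumes the mutation type itself contains no underscore).
        gene, change, mut_type, pos = rest.rsplit("_", 3)
        return {
            "source": "CanProVar",
            "snp_id": snp_id,
            "gene": gene,
            "protein_change": change,
            "mutation_type": mut_type,
            "peptide_pos": int(pos),  # 1-based position within the peptide
        }

    # Three-frame translated pseudogenes (PGOHUM) and lncRNAs end in _RF1/_RF2/_RF3.
    match = re.match(r"^(PGOHUM|lncRNA)_(.+)_RF([123])$", header)
    if match:
        return {
            "source": match.group(1),
            "transcript": match.group(2),
            "reading_frame": int(match.group(3)),
        }

    raise ValueError("Unrecognized VarDB2.0 header: " + header)


if __name__ == "__main__":
    info = parse_vardb_header(">CanProVar_rs6673641_ZMYM6_P134S_missense_15")
    peptide = "CITRHSSPACLPPPSKK"
    # The peptide position is 1-based, so subtract 1 to index the string.
    print(info["protein_change"], peptide[info["peptide_pos"] - 1])
```

For the CanProVar example, residue 15 of the peptide is the substituted serine from "P134S", which is consistent with reading the trailing integer as a 1-based position. The prefix is checked before any splitting because lncRNA identifiers such as "lnc-MLKL-27:11" also contain a colon.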

Files

Institutions

Sun Yat-Sen University

Categories

Proteomics, Proteogenomics

Licence