The Influence of Transcript Assembly on Proteogenomics Discovery of Microproteins

Published: 29 January 2018 | Version 1 | DOI: 10.17632/sjbnjr7brz.1
Max Shokhirev


Supplementary dataset for "The Influence of Transcript Assembly on Proteogenomics Discovery of Microproteins". This dataset contains paired RNA-Seq reads simulated with flux-simulator, in fastq.gz format, together with the flux-simulator parameter file, hg19.par. These files are located in the flux_simulator folder. The reads were generated from the human hg19 genome in order to test transcript assembly, and the hg19 RefSeq annotation was used to define genes (see hg19_refseq.gtf). The hg19 chromosome sequence files (e.g. chr1.fa) are also included for completeness. These files are located in the hg19 folder.


Steps to reproduce

To regenerate the reads, download and install flux-simulator (V1.2.1 with Flux Library 1.22). Before running it, change GEN_DIR in hg19.par to point to the directory containing the hg19 sequence files and the hg19_refseq.gtf file. Then run flux-simulator with the supplied hg19.par parameter file, using the hg19 genomic sequence and annotation. Because flux-simulator generates reads stochastically, the regenerated reads should have distributions similar to those of the included reads used for testing, but may vary on a gene-by-gene basis.
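
The GEN_DIR change described above can be made by hand in a text editor, or scripted. The following is a minimal Python sketch, not part of the dataset. It assumes that hg19.par lists each parameter on its own line as the parameter name followed by spaces or tabs and then the value. The function names set_gen_dir and update_par_file, and the example path /data/hg19, are hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path

# Assumption: each parameter sits on its own line as "NAME<spaces or tabs>VALUE".
_GEN_DIR_LINE = re.compile(r"^(GEN_DIR)([ \t]+)(.*)$", re.MULTILINE)


def set_gen_dir(par_text: str, genome_dir: str) -> str:
    """Return the parameter text with the GEN_DIR value set to genome_dir."""
    # A function replacement avoids backslashes in the path being read as escapes.
    new_text, count = _GEN_DIR_LINE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{genome_dir}", par_text
    )
    if count == 0:
        # No GEN_DIR line was found, so append one at the end.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"GEN_DIR\t{genome_dir}\n"
    return new_text


def update_par_file(par_path: str, genome_dir: str) -> None:
    """Rewrite the parameter file in place with the new GEN_DIR value."""
    path = Path(par_path)
    path.write_text(set_gen_dir(path.read_text()), genome_dir) if False else \
        path.write_text(set_gen_dir(path.read_text(), genome_dir))


# Example (hypothetical path; adjust to where the hg19 folder was unpacked):
# update_par_file("flux_simulator/hg19.par", "/data/hg19")
```

After the parameter file is updated, Flux Simulator is typically started by passing the parameter file with its -p option (for example, flux-simulator -p hg19.par); consult the documentation for the installed version to confirm the exact invocation.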


Salk Institute for Biological Studies


Genomics, Proteogenomics