Cis-regulatory mutations with driver hallmarks in major cancers

Published: 22 January 2021 | Version 2 | DOI: 10.17632/4kx5sfx9vz.2
Contributor:
Zhongshan Cheng

Description

Jan-11-2021

Two types of datasets are provided for each cancer type:

Varscan-called somatic mutation data: raw somatic mutations called with default Varscan parameters, without any filtering.
Gene-level ASE data: gene-level allele-specific expression (ASE) based on RNA-seq data of tumor samples.

In our paper, these data were further filtered and then used to test for association between gene-level ASE and the occurrence of somatic mutations within different regulatory regions. See the Methods section of our iScience paper for details: Zhongshan Cheng, Michael Vermeulen, Micheal Rollins-Green, Brian DeVeale, Tomas Babak. 2021. Cis-regulatory mutations with driver hallmarks in major cancers. iScience.

Dataset annotations

Headers for the somatic mutation dataset, derived from whole-genome sequencing (WGS) data using the software Varscan (dataset: Cancer_type_varscan_mutations.csv):

chrom="the chromosome on which the mutation resides"
position="mutation position on the chromosome (hg19)"
ref="reference allele for the mutation"
var="mutated allele for the mutation"
normal_reads1="sequence reads for the reference allele in normal WGS"
normal_reads2="sequence reads for the mutated allele in normal WGS"
normal_var_freq="variant allele frequency in normal WGS"
normal_gt="normal genotype at this site"
tumor_reads1="sequence reads for the reference allele in tumor WGS"
tumor_reads2="sequence reads for the mutated allele in tumor WGS"
tumor_var_freq="variant allele frequency in tumor WGS"
tumor_gt="tumor genotype at this site"
somatic_p_value="Varscan somatic mutation P value"
gp="TCGA WGS sample ID"

Headers for the gene-level ASE dataset (dataset: Cancer_type_gene_level_ASE.csv):

transcript_id="assembled transcript IDs for gene-level ASE"
ASE_Reads_Hap1="RNA-seq read sum for phased haplotype 1"
ASE_Read_Hap2="RNA-seq read sum for phased haplotype 2"
SNP_Read_on_Hap1_2="for each SNP phased into two haplotypes, its allele reads on each haplotype"
SNPs="SNPs phased into two haplotypes for the assembled transcript"
SNP_Alleles="two alleles of each SNP phased into each haplotype"
TCGA_Sample_ID="TCGA RNA-seq ID"
transcript_st="assembled transcript start position (hg19)"
transcript_end="assembled transcript end position (hg19)"
chr="chromosome information for assembled transcript"
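As a rough illustration of how the two tables might be read, the Python sketch below parses rows using the header names listed above and derives a tumor variant allele fraction and a haplotype 1 read fraction. It rests on several assumptions: that the files are comma-delimited, that the read-count columns hold plain integers, and that the column names match the annotations exactly (including `ASE_Read_Hap2` as spelled here). The inline rows are invented placeholders, not TCGA data, and the `max_p` cutoff is a hypothetical threshold, not the filtering used in the paper.

```python
import csv
import io

# Invented rows that only mimic the documented headers; NOT real TCGA data.
MUTATIONS_CSV = """chrom,position,ref,var,tumor_reads1,tumor_reads2,somatic_p_value,gp
chr1,12345,A,G,20,10,0.0004,SAMPLE_A
chr2,67890,C,T,28,2,0.2,SAMPLE_A
"""

ASE_CSV = """transcript_id,ASE_Reads_Hap1,ASE_Read_Hap2,TCGA_Sample_ID
TX_1,80,20,SAMPLE_A
TX_2,0,0,SAMPLE_A
"""


def fraction(part, rest):
    """Return part / (part + rest), or None when both counts are zero."""
    total = part + rest
    return part / total if total else None


def load_mutations(handle, max_p=0.05):
    """Yield mutation rows with somatic_p_value <= max_p.

    max_p is a hypothetical cutoff chosen for illustration only.
    Each yielded row gains a 'tumor_vaf' key recomputed from read counts.
    """
    for row in csv.DictReader(handle):
        if float(row["somatic_p_value"]) > max_p:
            continue
        ref_reads = int(row["tumor_reads1"])  # reference-allele reads in tumor
        var_reads = int(row["tumor_reads2"])  # mutated-allele reads in tumor
        row["tumor_vaf"] = fraction(var_reads, ref_reads)
        yield row


def load_ase(handle):
    """Yield ASE rows with a 'hap1_fraction' key (share of reads on haplotype 1)."""
    for row in csv.DictReader(handle):
        hap1 = int(row["ASE_Reads_Hap1"])
        hap2 = int(row["ASE_Read_Hap2"])
        row["hap1_fraction"] = fraction(hap1, hap2)
        yield row


mutations = list(load_mutations(io.StringIO(MUTATIONS_CSV)))
ase = list(load_ase(io.StringIO(ASE_CSV)))
```

To apply the same functions to a downloaded file, pass an opened file handle (for example `open(path, newline="")`) in place of the `io.StringIO` object. It is worth checking the first line of each file against the header names first, since the parsing here depends on them.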

Steps to reproduce

Details on how these allele-specific expression and somatic mutation data were generated can be found in the Methods section of our iScience paper: Zhongshan Cheng, Michael Vermeulen, Micheal Rollins-Green, Brian DeVeale, Tomas Babak. 2021. Cis-regulatory mutations with driver hallmarks in major cancers. iScience.