Mesothelioma survival prediction based on a six-gene transcriptomic signature
Description
Background: Mesothelioma is an aggressive, fatal cancer that is inextricably linked to asbestos exposure. Recent trials combining the immune checkpoint inhibitors ipilimumab and nivolumab have significantly improved treatment outcomes. However, durable treatment responses remain restricted to a subset of patients (15-20%), highlighting the need for strategies that better predict treatment response. Method: We performed RNAseq on a large tumor biobank (n=167) from a genetically diverse mouse model, the CC-MexTAg (CCMT) model, and compared the gene expression profiles of tumors from mice with different overall survival to develop a prognostic gene signature. Results: While the variation in tumor gene expression was not associated with the 3-fold variation in overall survival of CC-MexTAg mice, we identified two distinct tumor clusters characterized by immune and non-immune phenotypes, of which the immune cluster tumors showed greater potential to respond to cancer therapies. We used 20 hub genes associated with this tumor phenotype to develop a 6-gene signature that could predict survival in four independent mesothelioma datasets (Bueno, NCI, TCGA and Creaney) and showed potential to indicate response to cancer immunotherapy. The shared data include R markdown files for performing gene set enrichment analysis (GSEA), CIBERSORT and WGCNA on RNAseq data from the CCMT mouse model (CCMT data analysis_part 1 and 2). The folder Gene_signature_development_validation_part 3 includes the R markdown file for developing and validating the 6-gene signature by interrogating five independent human mesothelioma datasets.
Files
Steps to reproduce
The FASTQ sequence files (stored in the GSE232512 repository) were aligned against the mouse reference genome (GRCm38) using Kallisto, generating TSV and h5 format files that can be read by the tximport package to create a count data object in RStudio. The analysis has three main parts, which must be run in the order given. Count data and clinical data tables from the human mesothelioma datasets cannot be uploaded to this repository, but the datasets can be requested from the owner of each dataset.
Part 1: To reproduce the results of part 1, run the R markdown code in RStudio from the same folder (CCMT data analysis folder) as the input files. The human gene signature list and the Tilsed et al. and Zemek et al. responder gene lists can be requested from the authors of the respective papers. To perform the CIBERSORT analysis, the cell subset matrix_input and the normalised count data must be uploaded to the CIBERSORT website: https://cibersortx.stanford.edu/.
Part 2: The R markdown code uploaded in the CCMT data analysis folder reproduces the WGCNA results. The CytoHubba plugin for Cytoscape and the STRING database are needed for hub gene identification.
Part 3: The hub genes identified in part 2 are the input for the R markdown code uploaded in Gene_signature_development_validation_part 3, which produces the 6-gene signature. Human datasets in raw count format were used to validate the 6-gene signature. Count data must be normalised and VST-transformed using the DESeq2 package.
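The import and transformation steps described above could be sketched in R roughly as follows. This is a minimal illustration rather than the code shipped in the repository: the function names (import_kallisto, vst_from_counts), the one-folder-per-sample layout, the tx2gene table and the intercept-only design are all assumptions made for the example. It assumes the tximport and DESeq2 Bioconductor packages are installed (reading h5 files additionally needs rhdf5).

```r
library(tximport)
library(DESeq2)

# Hypothetical helper: import Kallisto output into a gene-level count object.
# Assumes kallisto_dir holds one sub-folder per sample, each with abundance.h5,
# and that tx2gene is a two-column data frame (transcript ID, gene ID).
import_kallisto <- function(kallisto_dir, tx2gene) {
  sample_dirs <- list.dirs(kallisto_dir, recursive = FALSE)
  files <- file.path(sample_dirs, "abundance.h5")
  names(files) <- basename(sample_dirs)
  stopifnot(all(file.exists(files)))
  tximport(files, type = "kallisto", tx2gene = tx2gene)
}

# Hypothetical helper: normalise and VST-transform a raw count table.
# counts: numeric genes x samples table of raw counts (no annotation columns).
# sample_info: data frame whose row names match the column names of counts.
vst_from_counts <- function(counts, sample_info) {
  counts <- round(as.matrix(counts))
  stopifnot(identical(rownames(sample_info), colnames(counts)))
  # Intercept-only design: an assumption, used because no comparison is
  # modelled at this step.
  dds <- DESeqDataSetFromMatrix(countData = counts,
                                colData = sample_info,
                                design = ~ 1)
  dds <- estimateSizeFactors(dds)  # library-size normalisation
  vsd <- vst(dds, blind = TRUE)    # variance-stabilising transformation
  assay(vsd)                       # genes x samples matrix of VST values
}
```

For the mouse data, the object returned by import_kallisto could instead be passed to DESeqDataSetFromTximport before the same vst call. Note that vst fits its trend on a subset of 1000 genes by default, so very small count tables would need varianceStabilizingTransformation instead.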