Integrative analysis using molecular profiles (CHIP)

Published: 26 September 2016 | Version 2 | DOI: 10.17632/xdz2kv8zzb.2
Contributors:
MJN Jones,
Sandy Miller

Description

Integrated genomic profiling of 259 men with prostate cancer (Cambridge discovery cohort and Stockholm validation cohort). Includes a 100-feature gene set that reliably differentiates five subgroups (iClusters) of prostate cancer. The analysis uses the CHIP protocol.

Steps to reproduce

Previous studies have shown that most heritable gene expression traits are predominantly controlled by cis-acting proximal loci (< 1 Mb), and that these signals are consistently more abundant and stable than more distal trans effects (Curtis et al., 2012). For an integrative analysis combining copy number (CN) and gene expression data, we selected features displaying linear correlations between CN state and local transcript expression levels, to identify genome-wide expression quantitative trait loci (eQTL) in the discovery data set.

These eQTL features were used in a joint latent variable framework for integrative analysis (iClusterPlus (Mo et al., 2013); see Methods), which identified five distinct molecular subtypes (iCluster1–5) in the Cambridge cohort with characteristic copy number and gene expression profiles (Fig. 1). These were driven by a core set of 100 genes that had both CN and mRNA level changes. We confirmed this by comparing the results for alternative numbers of clusters (2–11) and features (100 to 1000). The five-cluster solution (k = 4 in iClusterPlus, which yields k + 1 = 5 clusters; 100 features) describes 60% of the total observed variance.

The same 100 gene features were used to train a classifier and to partition the Stockholm data set into five patient subtypes with characteristic profiles, similar to those described in the discovery cohort. We assessed transcript and copy number levels for these 100 classifying genes in both the discovery and validation cohorts (Fig. 2). Expression and copy number aberrations were clearly consistent across the trained clusters, with the exception of subsets of genes in iClusters 2 and 5, which displayed marked copy number amplification in the Stockholm cohort. There was consistent copy number loss and downregulation of expression of genes on chromosome 8 (e.g. MTMR9, LSM1 and ER1) in two particular subgroups, iCluster 1 and 3, while iCluster 3 was characterised uniquely by copy number gain and upregulation of neighbouring genes on chromosome 8 (e.g. RIPK2, SPIDR and IMPA1). By contrast, iCluster 4 had consistent copy number loss and downregulation of genes on chromosome 13 (e.g. TRIM13, PHF11 and SUGT1).

Finally, we considered the sample groups identified by our integrative analysis (Fig. 3A) as ‘true’ clusters with clinical relevance, and compared these ‘true’ clusters to the sample groupings suggested by either copy number or gene expression data alone. We used two different measures to determine the similarity of the alternative clusterings to the ‘true’ clusters. Based on both the Adjusted Rand Index (ARI) (Hubert and Arabie, 1985) and the Variation of Information Index (VII) (Meilă, 2007), sample clustering based on CN data is more similar to the integrative (‘true’) clustering than is expression-based clustering.
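The comparison of clusterings described above can be illustrated with a minimal Python sketch of the two similarity measures, computed from their standard definitions. This is not the code used in the study. The function names and the toy label vectors are hypothetical and chosen only for illustration. A higher ARI and a lower variation of information both indicate closer agreement with the reference partition.

```python
from collections import Counter
from math import comb, log


def _contingency(a, b):
    """Joint and marginal cluster counts for two labelings of the same samples."""
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same samples")
    return Counter(zip(a, b)), Counter(a), Counter(b)


def adjusted_rand_index(a, b):
    """Adjusted Rand Index (Hubert and Arabie, 1985); 1.0 means identical partitions."""
    n = len(a)
    if n < 2:
        return 1.0  # no sample pairs to compare
    joint, rows, cols = _contingency(a, b)
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_joint - expected) / (max_index - expected)


def variation_of_information(a, b):
    """Variation of information (Meila, 2007) in nats; 0.0 means identical partitions."""
    joint, rows, cols = _contingency(a, b)
    n = len(a)
    vi = 0.0
    for (i, j), c in joint.items():
        # VI = -sum_ij p_ij * [log(p_ij / p_i) + log(p_ij / p_j)]
        vi -= (c / n) * (log(c / rows[i]) + log(c / cols[j]))
    return vi


# Hypothetical toy labels for ten samples (NOT data from the study).
integrative = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]   # reference ('true') clusters
cn_only = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5]       # copy-number-only clustering
expr_only = [1, 2, 2, 3, 3, 1, 4, 5, 5, 4]     # expression-only clustering

for name, labels in [("CN only", cn_only), ("expression only", expr_only)]:
    ari = adjusted_rand_index(integrative, labels)
    vi = variation_of_information(integrative, labels)
    print(f"{name}: ARI={ari:.3f}, VI={vi:.3f}")
```

Both measures are invariant to how clusters are numbered, so the labels from separate clustering runs do not need to be matched beforehand.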