Regional heterogeneity in gene expression, regulation and coherence in hippocampus and prefrontal cortex across development and in schizophrenia. Collado-Torres et al, Neuron, 2019.

Published: 2 May 2019 | Version 1 | DOI: 10.17632/3j93ybf4md.1
Contributor(s):

Description of this data

This dataset contains the Supplementary Figures and Tables for the "Regional heterogeneity in gene expression, regulation and coherence in hippocampus and prefrontal cortex across development and in schizophrenia" article by Collado-Torres et al, Neuron, 2019.

Please cite the original article as well as the data itself if appropriate.

For clarifications about the figures or data, please contact us by creating an issue at https://github.com/LieberInstitute/brainseq_phase2/issues.

Thank you!

Experiment data files

  • Main Figure Files
    Cite
    • Figure 1
      Cite

      Figure 1. (A) Replication rate with consistent fold change for the adult and prenatal differences between brain regions using the BrainSpan dataset (n=16 for adult, n=36 for prenatal) across Bonferroni p-value thresholds for the exon, gene and exon-exon junction features. The labels for each point denote the number of features considered at the given Bonferroni p-value threshold. (B) Volcano plots of the differential expression signal between brain regions for each feature type and age group stratified by whether the differential expression replicated in BrainSpan (p<0.05 and consistent fold change). (C) Differentially expressed features grouped by gene ID for the adult and prenatal age groups.

    • Figure 2
      Cite

      Figure 2 (A) Estimated cell type RNA fractions for fetal quiescent neurons (FQN), neurons, and oligodendrocytes. See Figure S11 A for the other five cell type RNA fractions estimated with RNA deconvolution. (B) Differentially expressed features between brain regions across development grouped by gene ID. (C) GABRD is differentially expressed between DLPFC and HIPPO across development. Y-axis shows the residualized expression after removing modeled covariates. PCW: post-conception weeks.

    • Figure 3
      Cite

      Figure 3 (A) Venn diagram of differentially expressed genes with higher expression in neurotypical controls than SCZD affected individuals by brain region (DLPFC: FDR <10%, HIPPO: FDR <20%). (B) Similar to (A) but for genes with higher expression in SCZD than in neurotypical controls. (C) T-statistics for the top 400 differentially expressed genes in HIPPO compared to their t-statistics in DLPFC. Pink points: FDR<5% in DLPFC or HIPPO, blue points: FDR<5% in both brain regions. (D) KCNA1 is differentially expressed in DLPFC. (E) Gene set enrichment (GSE) analysis on DLPFC with biological process ontology terms and HIPPO (F); Count: number of enriched genes for the given ontology term. Gene Ratio: ratio of genes enriched for the ontology term among the number of genes that are contained in the given ontology term. In (E) and (F) terms are colored by their p-value of enrichment.

    • Figure 4
      Cite

      Figure 4 (A) HIPPO eQTLs grouped by gene ID. (B) region-dependent eQTLs grouped by SNP ID. (C) Region-dependent eQTLs group by gene ID. (D) Unique risk schizophrenia GWAS index SNPs from PCG2 (Pardiñas et al., 2018) by brain region that are in eQTLs. (E) Top HIPPO eQTL among schizophrenia risk SNPs. This exon-exon junction skips exon 4/14 of the FANCL gene. (F) Index schizophrenia risk SNP corresponding to the proxy SNP from (E).

  • Supplementary Figures
    Cite
    • FigureS01_RNAseq_overview
      Cite

      Figure S1. BrainSeq Phase II RNA-seq overview. RNA-seq samples from the hippocampus (HIPPO) and dorsolateral prefrontal cortex (DLPFC) across the human lifespan were sequenced. The individuals were also genotyped with some being patients with SCZD and others being neurotypical controls. Differences in expression across age, across HIPPO and DLPFC and between SCZD status were assessed. Expression was evaluated at 4 feature levels: gene, exons, exon-exon junctions and transcripts. The data is also available to perform expressed regions analyses.

    • FigureS02_data_n
      Cite

      Figure S2. RNA-seq samples in BrainSeq Phase II and BrainSpan. (A) All BrainSeq Phase II RNA-seq samples. (B) Neurotypical control samples in BrainSeq Phase II. (C) Neurotypical control samples in BrainSpan. (D) SCZD samples in BrainSeq Phase II.

    • FigureS03_expression_cuts
      Cite

      Figure S3. Determination of the low expression filters. (Left) Mean expression and the number of features passing the cutoff. (Right) Mean expression compared to the number of expressed samples (non-zero expression) among the features that pass the cutoff. The maximum suggested cutoff (highlighted in red) was used for each feature type.

    • FigureS04_expressed_genes
      Cite

      Figure S4. Expressed features grouped by gene ID. Venn diagram of expressed features grouped by Ensembl gene ID. See Figure S48 for further details on the 955 genes that are only measured at the gene summarization level.

    • FigureS05_concordance
      Cite

      Figure S5. Concordance across the top 5,000 features between BrainSeq Phase II and BrainSpan differences between DLPFC and HIPPO in prenatal and adult age groups. Across all features, the concordance was higher for the adult age group than the prenatal age group, although both showed higher concordance than expected by chance. The decreased concordance in the prenatal age group likely resulted from differences in prenatal dissections between the two studies. BrainSeq Phase II prenatal dissections contained adjacent parahippocampal neocortex (but not the amygdala), while the BrainSpan hippocampus dissections apparently did not include neocortex.

    • FigureS06_reprate_regspec
      Cite

      Figure S6. Region-specific differentially expressed replication rate with BrainSpan. (A) Replication rate when considering directionality only instead of consistent directionality and p-value < 0.05. Consistent directionality is determined by the sign of the t-statistics comparing DLPFC and HIPPO in each dataset. (B) Number and percent (C) of differentially expressed features across the different Bonferroni p-value thresholds.

    • FigureS07_subsample_adult
      Cite

      Figure S7. Sub-sampling sensitivity analysis for differentially expressed genes between DLPFC and HIPPO in adults. In the prenatal regional analyses, we used 28 samples per brain region (56 total) for our gene differential expression analysis across DLPFC and HIPPO. To evaluate the scenario where we had a reduced number of adult samples, we sub-sampled the adult samples to the same number of prenatal samples (28 per brain region). Out the 100 sub-sampled iterations, 82 resulted in tractable models using our original differential expression methods, and we observed 70 genes in prenatal samples with bonferroni-adjusted p-values less than 1%, out of which 32 replicated in BrainSpan (same t-statistic sign and p-value <5% in BrainSpan). Across our 82 sub-sampled replicates, we observed a mean of 353.9 (A) genes with P-bonferroni <1% with an average of 217.9 genes also replicating in BrainSpan (B). These averages are significantly higher than 70 at P-bonferroni <1% in fetal (p = 5.08x10-13 ) and 32 that further replicated in BrainSpan (p=3.06x10-18). The figure shows the number of genes with P-bonferroni <1% between DLPFC and HIPPO in the 82 adult sub-sampled gene-level analyses with (A) and without replication in BrainSpan (B). The orange line denotes the number of genes in our prenatal gene analysis with P-bonferroni <1% (70 and 32, respectively).

    • FigureS08_venn_regionspecific
      Cite

      Figure S8. Region-specific differentially expressed features. Venn diagrams of differentially expressed features between DLPFC and HIPPO grouped by Ensembl gene ids. (A) DE features in prenatal samples with higher expression in DLPFC. (B) DE features in adult samples with higher expression in DLPFC. (C) DE features in prenatal samples with higher expression in HIPPO. (D) DE features in adult samples with higher expression in HIPPO.

    • FigureS09-10_go_regionspecific
      Cite

      Figure S9. Enriched GO terms for region-specific adult genes. Gene ontology enrichment results for differentially expressed features between HIPPO and DLPFC in adult samples. Enriched (A) biological processes, (B) molecular function, (C) cellular component ontologies and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways at the gene, exon or junction feature levels separated by higher expression in HIPPO or DLPFC. See table Table S14 for the full list of enriched terms. -------------------------------------------------------------------------------------------------------------------- Figure S10. Enriched GO terms for region-specific prenatal genes. Gene ontology enrichment results for differentially expressed features between HIPPO and DLPFC in prenatal samples. Enriched (A) biological processes, (B) molecular function, (C) cellular component ontologies and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways at the gene, exon or junction feature levels separated by higher expression in HIPPO or DLPFC. See table Table S14 for the full list of enriched terms.

    • FigureS11_cellcomp_box
      Cite

      Figure S11. Cell type estimated fractions across developmental age groups. (A) Estimated RNA fractions from RNA deconvolution based on the Darmanis et al, 2015 reference (Darmanis et al., 2015). (B) Estimated RNA proportions from DNAm data using a subset of samples for which we had measured DNAm.

    • FigureS12-17_pcs_dev
      Cite

      Figure S12. Top PCs by brain region and age. Principal components for each feature across the 614 samples used for the development analysis colored by pre- and postnatal status as well as brain region. Top 8 PCs for (A) gene, (B) exon, (C) exon-exon junctions and (D) transcripts. -------------------------------------------------------------------------------------------------------------------- Figure S13. Top PCs by brain region and sex. Principal components for each feature across the 614 samples used for the development analysis colored by sex and brain region. Top 8 PCs for (A) gene, (B) exon, (C) exon-exon junctions and (D) transcripts. -------------------------------------------------------------------------------------------------------------------- Figure S14. Top PCs by race. Principal components for each feature across the 614 samples used for the development analysis colored by race. Top 8 PCs for (A) gene, (B) exon, (C) exon-exon junctions and (D) transcripts. -------------------------------------------------------------------------------------------------------------------- Figure S15. Top PCs by brain region and age in BrainSpan. Principal components for each feature across 79 the BrainSpan samples used for the development replication analysis colored by pre- and postnatal status as well as brain region. Top 8 PCs for (A) gene, (B) exon, (C) exon-exon junctions and (D) transcripts. -------------------------------------------------------------------------------------------------------------------- Figure S16. Top PCs by brain region and sex in BrainSpan. Principal components for each feature across the 79 BrainSpan samples used for the development analysis colored by sex and brain region. Top 8 PCs for (A) gene, (B) exon, (C) exon-exon junctions and (D) transcripts. -------------------------------------------------------------------------------------------------------------------- Figure S17. Top PCs by race in BrainSpan. Principal components for each feature across the 79 BrainSpan samples used for the development replication analysis colored by race. Top 8 PCs for (A) gene, (B) exon, (C) exon-exon junctions and (D) transcripts.

    • FigureS18_cellcomp
      Cite

      Figure S18. Cell type estimated RNA fractions across developmental age groups by brain region. Each panel shows the estimated RNA fraction for a given cell type across the six developmental age groups with a regression line shown per brain region (A: fetal replicating neurons, B: fetal quiescent neurons, C: OPC, D: neurons, E: astrocytes, F: oligodendrocytes, G: microglia, H: endothelial). While there are significant differences between brain regions at these age groups (Figure S11), the trends over age are similar (P-bonferroni >5%) across both brain regions except for the RNA fraction from astrocytes in adults aged 20 to 50 (E).

    • FigureS19_go_dev
      Cite

      Figure S18. Cell type estimated RNA fractions across developmental age groups by brain region. Each panel shows the estimated RNA fraction for a given cell type across the six developmental age groups with a regression line shown per brain region (A: fetal replicating neurons, B: fetal quiescent neurons, C: OPC, D: neurons, E: astrocytes, F: oligodendrocytes, G: microglia, H: endothelial). While there are significant differences between brain regions at these age groups (Figure S11), the trends over age are similar (P-bonferroni >5%) across both brain regions except for the RNA fraction from astrocytes in adults aged 20 to 50 (E).

    • FigureS20_pcs_gold
      Cite

      Figure S20. PMI and PCA EDA with all 900 samples. Post mortem interval (PMI) measured in hours for (A) HIPPO and (B) DLPFC separated by SCZD status. Using all 900 BrainSeq Phase II RNA-seq samples, the first gene-level principal component (PC) is associated with brain region (C), gene assignment rate (D) and mitochondrial mapping rate (E). The second PC is associated with age (F), in particular with the prenatal vs postnatal age boundary. Samples in the bottom row are colored by brain region and RNA-seq preparation kit.

    • FigureS21_qSVA_overview
      Cite

      Figure S21. qSVA overview. Workflow illustration for a complete analysis using degradation data and case-control data to implement the qSVA method. The degradation data, a few samples across time, are used to infer expressed regions that are associated with degradation. We save the genome coordinates of the top 1000 regions. We then quantify the expression of those regions in our case-control data (BrainSeq Phase II) to obtain a degradation matrix. Using this matrix we compute the quality surrogate variables (qSVs) that we can then use as adjustment covariates in a case-control differential expression analysis. We identify the qSVs by performing PCA on the degradation matrix and choosing k number with the BE algorithm (Buja and Eyuboglu, 1992).

    • FigureS22_pcs_qsv
      Cite

      Figure S22. qSVA EDA for HIPPO and DLPFC. (A) First quality surrogate variable (qSV) from the DLPFC samples (n=379) is associated with RIN and SCZD diagnosis status (B). Similarly, the first qSV from the HIPPO samples (n=333) is associated with RIN (D) and SCZD diagnosis status (E). There is no strong association with age for the first qSV for both DLPFC (C) and HIPPO (F).

    • FigureS23_sczd_cellprop
      Cite

      Figure S23. RNA cell fraction by SCZD diagnosis. By brain region, RNA cell fractions grouped by SCZD diagnosis with the corresponding Bonferroni-adjusted p-value shown for each comparison. There are significant differences between SCZD cases and controls in the endothelial fraction in both brain regions as well as the astrocytes in DLPFC (P-bonferroni <5%). This analysis was carried out with the same samples used for the SCZD case-control differential expression analysis by brain region region.

    • FigureS24_sczd_qsv
      Cite

      Figure S24. RNA cell fraction associations with quality surrogate variables by brain region. For the 15 DLPFC quality surrogate variables (qSVs) and the 16 HIPPO qSVs we determined if they were significant (P-bonferroni<5%) associated with each of the eight cell RNA fractions as visualized in this figure (red panels: p-bonferroni<5%). All cell types except fetal replicating neurons (FRN) were associated with at least one qSV in either brain region and all qSVs associated with at least one RNA fraction in both DLPFC and HIPPO. Thus, the SCZD case-control analysis adjusting for the quality surrogate variables already incorporated RNA fraction information.

    • FigureS25_dequal
      Cite

      Figure S25. DEqual plots. Degradation quality (DEqual) plots for each brain region (HIPPO: left column, DLPFC: right column) comparing against a naïve diagnosis model (A and B), adjusting for covariates including known quality-associated metrics (C and D) and the final model adjusting for the region-specific qSVs and covariates (E and F). Correlation between the log2 fold change by SCZD status (x-axis) and log2 fold change by degradation in HIPPO or DLPFC is shown in the top left corner of each panel. DEqual plots are defined in Jaffe et al. (Jaffe et al., 2017).

    • FigureS26_venn_casecontrol
      Cite

      Figure S26. Feature DE by SCZD status. Venn diagrams of differentially expressed features between neurotypical controls and SCZD affected individuals by brain region (DLPFC: FDR <10%, HIPPO: FDR <20%). Exons (A and B), exon-exon junctions (C and D) and transcripts (E and F) with higher expression in neurotypical controls (A, C and E) or in SCZD cases (B, D and F).

    • FigureS27_venn_casecontrol_feat
      Cite

      Figure S27. Model and feature DE by SCZD status. Venn diagrams of differentially expressed features (FDR<10%) between neurotypical controls and SCZD affected individuals by brain region and studies grouped by gene id. Differentially expressed features in HIPPO (A) or DLPFC (B). Exons and exon-exon junctions differentially expressed by SCZD status (FDR<10%) grouped by gene ID (C), gene and exons (D), as well as genes and exon-exon junctions (E) across brain regions. (F) Differentially expressed genes in BrainSeq Phases I and II as well as in the CommonMind Consortium (CMC).

    • FigureS28_scattertop
      Cite

      Figure S28. SCZD case-control t-statistics across regions and datasets for top SCZD results at the gene level. T-statistics for the top 400 differentially expressed genes between neurotypical controls and SCZD cases for either DLPFC (A and B) or HIPPO (C and D) compared against the t-statistics in BrainSeq Phase 1 DLPFC (A and C) and the CommonMind Consortium dataset (B and D). Pink points: FDR<5% in either of the two sets being compared, blue points: FDR<5% in both sets. See Figure S29 A-D for all the genes.

    • FigureS29_scatterall
      Cite

      Figure S29. SCZD case-control t-statistics across regions and datasets at the gene level. T-statistics for all genes between neurotypical controls and SCZD cases for either DLPFC (A and B) or HIPPO (C and D) compared against the t-statistics in BrainSeq Phase 1 DLPFC (A and C) and the CommonMind Consortium dataset (B and D). (E) BrainSeq Phase II DLPFC versus HIPPO. Pink points: FDR<5% in either of the two sets being compared, blue points: FDR<5% in both sets.

    • FigureS30_noqsva
      Cite

      Figure S30. SCZD case-control t-statistics across regions and datasets at the gene level. T-statistics for all genes between neurotypicalcontrols and SCZD cases for DLPFC (A, C, E) or HIPPO (B, C, F) without adjusting for quality surrogate variables compared against the t-statistics with the qSVA adjustment (A and B), the CommonMind Consortium dataset (C and D) and in BrainSeq Phase 1 DLPFC (E and F). (G) BrainSeq Phase II DLPFC versus HIPPO without qSVA adjustment. Pink points: FDR<5% in either of the two sets being compared, blue points: FDR<5% in both sets. The t-statistics between SCZD cases and controls with and without qSVA adjustment were correlated (⍴: HIPPO=0.72, DLPFC=0.74) with 8 out of 48 (16.7%) and 137 out of 245 (55.9%) significantly differentially expressed genes (FDR<5%) when adjusting for qSVs also being differentially expressed without qSVA adjustment, respectively in HIPPO and DLPFC. In both brain regions, though particularly in DLPFC, there was a higher number of genes at FDR<5% (HIPPO: 63 vs 48, DLPFC: 1,084 vs 245) in the analysis without qSVA adjustment compared to the original qSVA-adjusted analysis. The correlation of the t-statistics for each brain region against the CMC dataset (as previously analyzed in hg19) increases (HIPPO: 0.334 versus 0.0783, DLPFC: 0.306 versus 0.138) when not adjusting for qSVs compared to adjusting for them. Similarly, the correlation between DLPFC and HIPPO increases (0.564 vs 0.276) when not adjusting for qSVs. We believe these observations reflect confounding by degradation.

    • FigureS31-32_gse_casecontrol
      Cite

      Figure S31. GSE DLPFC results. Gene set enrichment (GSE) results for differentially expressed features between SCZD cases and neurotypical controls in DLPFC. Enriched (A) biological processes, (B) molecular function, (C) cellular component ontologies and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Biological process is duplicated here from Figure 3 E to facilitate comparisons. See Table S2 for the full list of enriched GO terms across the different ontologies. -------------------------------------------------------------------------------------------------------------------- Figure S32. GSE HIPPO results. Gene set enrichment (GSE) results for differentially expressed features between SCZD cases and neurotypical controls in HIPPO. Enriched (A) biological processes, (B) molecular function, (C) cellular component ontologies and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Biological process is duplicated here from Figure 3 F to facilitate comparisons. See Table S2 for the full list of enriched GO terms across the different ontologies.

    • FigureS33_go_casecontrol
      Cite

      Figure S33. SCZD enriched GOs. Gene ontology enrichment results for differentially expressed features between SCZD cases and neurotypical controls either in the DLPFC or HIPPO brain region. Enriched (A) biological processes, (B) molecular function, (C) cellular component ontologies and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways at the gene, exon or junction feature levels separated by higher expression in SCZD cases or neurotypical controls. See table Table S14 for the full list of enriched terms.

    • FigureS34_shifts
      Cite

      Figure S33. SCZD enriched GOs. Gene ontology enrichment results for differentially expressed features between SCZD cases and neurotypical controls either in the DLPFC or HIPPO brain region. Enriched (A) biological processes, (B) molecular function, (C) cellular component ontologies and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways at the gene, exon or junction feature levels separated by higher expression in SCZD cases or neurotypical controls. See table Table S14 for the full list of enriched terms.

    • FigureS35_casecontrol_interaction
      Cite

      Figure S35. Differential expression by interaction between SCZD diagnosis status and brain region. (A) Differentially expressed features grouped by gene id excluding 29 un-annotated DE exon-exon junctions. (B) The SCLO2A1 gene has a significantly different relationship between expression and SCZD status across DLPFC (decreased in SCZD versus controls) and HIPPO (increased in SCZD versus controls); LFC: log2 fold change. Enriched biological processes (C) and cellular components (D) among the genes with DE signal at the exon and exon-exon junction levels. No terms were enriched among the 111 DEGs with gene support.

    • FigureS36_go_corr_gene
      Cite

      Figure S36. Enriched GOs among highly correlated genes in DLPFC and HIPPO adults. Gene ontology enrichment results for genes with significant correlation between HIPPO and DLPFC (FWER<5%) among the matched individuals from the SCZD case-control analysis (n=265 individuals). Enriched (A) biological processes, (B) molecular function, (C) cellular component ontologies and (D) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways at the gene level either with the raw expression values [log2(RPKM + 0.5)] or the cleaned expression values [log2(RPKM + 0.5) with covariates and qSVs removed while retaining the SCZD diagnosis effect]. See table Table S14 for the full list of enriched terms.

    • FigureS37_corr_gene
      Cite

      Figure S37. Correlated genes in DLPFC and HIPPO adults compared vs SCZD case-control t-statistic. Scatter plot of the correlation between HIPPO and DLPFC compared against the SCZD t-statistic with significantly correlated genes (FWER<5%) labeled in red. (A) HIPPO and (B) DLPFC SCZD t-statistics. The correlation was calculated with the cleaned expression values: log2(RPKM + 0.5) with covariates and qSVs removed while retaining the SCZD diagnosis effect.

    • FigureS38_corr_indv
      Cite

      Figure S38. Correlation differences in HIPPO and DLPFC by SCZD diagnosis for all feature levels. Correlation for each expression feature across HIPPO and DLPFC per individual (n=265) shows that SCZD affected individuals have decreased correlation at the (A) gene, (B) exon, (C) exon-exon junction and (D) transcript levels compared to neurotypical controls. The correlation was calculated with the cleaned expression values: log2(RPKM + 0.5) for genes and exons, log2(RP10M + 0.5) for exon-exon junctions and log2(TPM + 0.5) for transcripts with covariates and qSVs removed while retaining the SCZD diagnosis effect.

    • FigureS39_corr_sex
      Cite

      Figure S39. Correlation differences in HIPPO and DLPFC by sex in autosomal chromosomes. Correlation for each expression feature across HIPPO and DLPFC per individual (n=265) shows no significant (p-value > 0.05) different by sex at the gene (A), exon (C) and transcript levels (D). Among annotated junctions in the autosomal chromosomes (C) there is a slight decrease in the correlation in females compared to males. The schizophrenia-related coexpression findings (Figure S38) were supported by signal at all four feature summarizations suggesting a more robust association than sex which was only supported at the junction level.

    • FigureS40_corrsczd_indv
      Cite

      Figure S40. Correlation differences in HIPPO and DLPFC by SCZD diagnosis for SCZD differentially expressed genes. For genes differentially expressed by SCZD status, their correlation across HIPPO and DLPFC per individual (n=265) shows that SCZD have decreased correlation for the (A) HIPPO SCZD DEGs, but not for the (B) DLPFC, (C) BrainSeq Phase 1 (BSP1), or (D) CommonMind Consortium (CMC) DEGs.

    • FigureS41_go_corr_indv
      Cite

      Figure S41. Highlighted GO and KEGG pathways with significant correlation differences in HIPPO and DLPFC by SCZD diagnosis. Sets of genes grouped by their membership to GO terms (level 3) and KEGG pathways that show differential correlation among individuals with SCZD and neurotypical controls. Highlighted significantly (FDR<5%) differentially correlated GO terms (A: 0044326, B: 1900454, C: 0002253) and KEGG pathway (D: hsa04142) among HIPPO and DLPFC. The correlation was calculated with the cleaned gene expression values: log2(RPKM + 0.5) with covariates and qSVs removed while retaining the SCZD diagnosis effect.

    • FigureS42_venn_eqtls
      Cite

      Figure S42. eQTLs by feature and risk analysis. (A) HIPPO eQTLs grouped by gene ID. (B) region-dependent eQTLs grouped by SNP ID. (C) region-dependent eQTLs group by gene ID. (D) Brain samples by individual used for the SNP GWAS risk analysis. (E) Input information for the SNP GWAS risk analysis. (F) SNP GWAS risk analysis results summary by brain region. (G) Number of risk GWAS loci that are eQTLs in BrainSeq Phase II that pair to genes by brain region.

    • FigureS43_hippo_eqtl
      Cite

      Figure S43. Example GWAS PGC2 index risk SNP eQTLs in HIPPO. An exon-exon junction (A) and a transcript (B) from the gene NDRG4 are in eQTL associations with the schizophrenia risk snp rs42945 in HIPPO (and DLPFC: not shown). Residual expression is log2(RP10M + 1) or log2(TPM + 1) with SCZD diagnosis, sex, five ancestry PCs and expression PCs removed.

    • FigureS44_interaction_eqtl
      Cite

      Figure S44. Brain-region dependent eQTLs that involve GWAS PCG2 index risk SNPs. GWAS PCG2 index risk snps rs324015 (A and D), rs12293670 (B) and rs4144797 (C) have brain-region dependent eQTL associations with DLPFC and HIPPO. They involve either an exon (A) or exon-exon junctions (B-D) for genes LRP1, NRGN and NGEF. Two of the exon-exon junctions are un-annotated (B, C) though they are in the same strand as the only gene they overlap: NRGN and NGEF, respectively. Residual expression is log2(RPKM + 1) or log2(RP10M + 1) with SCZD diagnosis, sex, five ancestry PCs and expression PCs removed while retaining the brain region effect. Orange: DLPFC, blue: HIPPO.

    • FigureS45_twas_z
      Cite

      Figure S45. TWAS Z scores across DLPFC and HIPPO for each expression summarization feature. TWAS Z-scores for expression features with TWAS weights in either brain region with color denoting whether the TWAS association for the feature was significant (FDR<5%) using the PGC2+CLOZUK GWAS (Pardiñas et al., 2018). The shape of the point denotes whether the given expression feature had a TWAS in both brain regions or not. The results are separated by whether the best GWAS SNP is in a risk loci for schizophrenia or not.

    • FigureS46_twas_z_gandal
      Cite

      Figure S46. TWAS Z scores across DLPFC and HIPPO compared to Gandal et al’s TWAS Z scores. TWAS Z-scores at the gene summarization level with TWAS weights in either DLPFC or HIPPO compared to the TWAS results from Gandal et al (Gandal et al., 2018). The color denoting whether the TWAS association for the feature was significant (FDR<5%) using the PGC2+CLOZUK GWAS (Pardiñas et al., 2018). The shape of the point denotes whether the given expression feature had a TWAS in both the given brain region and the Gandal et al TWAS results (Gandal et al., 2018). The results are separated by whether the best GWAS SNP is in a risk loci for schizophrenia or not.

    • FigureS47_twas_z_vs_szd_t
      Cite

      Figure S47. TWAS Z scores compared against the SCZD case-control differential expression t-statistics. For each expression summarization feature level and brain region, we compared the TWAS z-score against the SCZD case-control t-statistic. The color of the point denotes whether the best GWAS (PGC2+CLOZUK) SNP is genome wide significant (p-value < 5x10-8). The results are separated by whether the best GWAS SNP is in a risk loci for schizophrenia or not. The correlation between the TWAS z-score and the SCZD t-statistic is shown in each panel on the top left.

    • FigureS48_check_expr
      Cite

      Figure S48. Exploration of the genes only expressed at the gene summarization level. Of the 955 genes observed only in genes in Figure S4, there are no exons or junctions for 1 (0.1%) and 868 (90.9%) of the genes. We verified that indeed the feature with the highest expression for each of these genes did not pass the feature-specific expression cutoffs (exon = 0.3 RPKM, junction = 0.46 RP10M, transcript = 0.38 TPM) as shown in this figure. The 955 genes themselves have low levels of expression with a range from 0.2501 to 0.679 mean RPKM (mean=0.369) when the cutoff at the gene level was 0.25 mean RPKM. We note that only 68 (7.1%) of these 955 genes are protein coding, in contrast with 415 (43.5%), 160 (16.8%) and 110 (11.5%) which are processed pseudogenes, lincRNAs or antisense. Thus, it is expected that the more fine-grained features (exon, junctions and transcripts) show even lower levels of expression.

  • Supplementary Tables
    Cite
    • TableS01
      Cite

      Table S1. Development DEGs adjusting for RNA fraction sensitivity results. We re-ran our development DEG analysis for each expression feature adjusting for the first seven cell types (to keep a full rank design model, dropping the endothelial fraction). Across features, we observed a high degree of concordance across our initial analysis and this re-analysis at a threshold of Bonferroni<1%. This high concordance can be explained by the observation that, while the cell fractions are significantly different by brain regions, they are not significantly different (Bonferroni>5%) by age between DLPFC and HIPPO for each of the six age groups we considered, with the only exception being astrocytes for individuals 20 to 50 years old (Figure S18). Almost all of the regional differences in expression within age groups (prenatal and postnatal) or across age groups retained significance in sensitivity analyses adjusting for the RNA composition proportions (>93%). See file SupplementaryTable1.xlsx.

    • TableS02
      Cite

      Table S2. SCZD case-control GSE enriched GO terms for DLPFC and HIPPO. See file SupplementaryTable2.txt (TSV).

    • TableS03
      Cite

      Table S3. Protein coding enrichment sensitivity analyses results for enrichment among expressed genes and among SCZD case-control differentially expressed genes. We found that features passing our expression cutoffs are enriched for protein coding genes compared to the other 44 annotation categories that were all non-coding using Gencode v25. Overall, in the annotation, there were 19,950 protein coding genes out of 58,037 genes (34.4%), but 14,653 out of the 24,652 genes passing our expression cutoff were protein coding (59.4%). Unlike non-coding genes, the majority of protein coding genes pass the expression cutoff (73.4%, 14,653/19,950; see sheet 1). In contrast the percent of lincRNAs retained decreased from 13% to 8% after the expression cuts with 1,965 (26.1%) out of 7,539 lincRNAs passing the expression cutoff. Among genes used for our SCZD case-control differential expression analysis (FDR <5%) in each brain region, we observe a significant enrichment (p-value <5%) for protein coding genes in the differentially expressed genes (sheet 2). See file SupplementaryTable3.xlsx.

    • TableS04
      Cite

      Table S4. Schizophrenia case-control t-statistic shifts in prenatal and postnatal life. This table complements Figure S34 with DLPFC(1) representing the BrainSeq Phase I polyA+ DLPFC data and DLPFC(2) representing the BrainSeq Phase II RiboZero DLPFC data. See file SupplementaryTable4.xlsx.

    • TableS05
      Cite

      Table S5. Sensitivity analysis of SCZD case-control differential expression against sex differences in autosomal chromosomes. We performed differential expression analyses for sex for each brain region (FDR<5%) using the exact same samples we used for the SCZD case-control comparison. Focusing on features in the autosomal chromosomes, we found non-significant overlaps between sex and schizophrenia effects (chi-square test, bonferroni p-value > 0.05) for nearly all features except for junctions in hippocampus (2 overlapped). These two junctions are novel junctions in the mitochondrial chromosome. Overall, the t-statistics from both models are uncorrelated. The following table summarizes these results which suggest little overlap in differential expression between sex and schizophrenia on the autosomes. See file SupplementaryTable5.xlsx.

    • TableS06
      Cite

      Table S6. Housekeeping genes depletion among highly correlated genes between DLPFC and HIPPO. To identify housekeeping genes we used the list of 514 unique housekeeping gene symbols from RSeQC http://rseqc.sourceforge.net/ (Wang et al., 2012). Among the highly correlated genes (FWER<5%) from Figure S37, we found a significant depletion (odds ratio = 0.4) for housekeeping genes with a p-value of 0.01084 (chi-square test). See file SupplementaryTable6.xlsx.

    • TableS07
      Cite

      Table S7. Differentially (FDR<5%) correlated GO terms and KEGG pathways at the individual level between SCZD affected individuals and neurotypical controls. See file SupplementaryTable7.xlsx.

    • TableS08
      Cite

      Table S8. DLPFC, HIPPO and brain-region dependent eQTL replication with GTEx and BrainSeq Phase I DLPFC polyA+. (A) This table shows the number of eQTL SNP-feature pairs that were identified for each of the three eQTL analyses in BrainSeq Phase II, the number and percent of them measured in GTEx or BrainSeq Phase I DLPFC polyA+, as well as the number and percent of them that are either P<5% in the replication dataset, concordant directionality, or both. Furthemore it shows the gene level replication results when subsetting our samples to the self-reported CAUC individuals with either P<5% and P<0.1%. (B) Self reported race sample sizes for the eQTL analysis. (C) Gene level eQTLs at FDR<1% in the CAUC-only analysis compared to the main mixed-raced analysis eQTLs. See file SupplementaryTable8.xlsx. For the genome wide significant (FDR<1%) eQTL snp-feature pairs and their replication statistics see Table S15.

    • TableS09
      Cite

      Table S9. TWAS summary tables. (A) Summary of the TWAS results with the PGC2+CLOZUK GWAS for both DLPFC and HIPPO across each of the expression summarization features. (B) Evaluation of the risk loci with eQTLs at FDR<1% in the TWAS results. See file SupplementaryTable9.xlsx. See Table S17 for the detailed TWAS results.

    • TableS10
      Cite

      Table S10. Demographic and sequencing metrics by group for the RNA-seq samples. See file SupplementaryTable10.xlsx.

    • TableS11
      Cite

      Table S11. Differentially expressed features for each of the three main models: (1) HIPPO versus DLPFC developmental differences among neurotypical controls, (2) HIPPO versus DLPFC in both adults and prenatal age groups among neurotypical controls, (3) SCZD cases versus neurotypical controls for both DLPFC and HIPPO. See file SupplementaryTable11.zip.

    • TableS12
      Cite

      Table S12. Regression calibration design matrix for the RNA deconvolution. This table has the Z coefficients for the RNA deconvolution using the single cell data from Darmanis et al (Darmanis et al., 2015). For each of the 158 genes it also includes the gene GencodeID, Symbol and the cell type it distinguishes. See file SupplementaryTable12.txt (TSV).

    • TableS13
      Cite

      Table S13. Genomic coordinates of the top 1000 degradation-associated expressed regions across DLPFC and HIPPO. See file SupplementaryTable13.xlsx.

    • TableS14
      Cite

      Table S14. GO and KEGG enrichment analysis results. This table contains the results for the “DLPFC and HIPPO adult differences” (Figure S9), “DLPFC and HIPPO prenatal differences” (Figure S10), “DLPFC and HIPPO differential development” (Figure S19), “SCZD case-control differences” (Figure S33), and “DLPFC and HIPPO high correlation” (Figure S36). See file SupplementaryTable14.txt (TSV).

    • TableS15
      Cite

      Table S15. Genome wide significant eQTL snp-feature pairs. This table contains the significant (FDR<1%) eQTL snp-feature pairs in DLPFC, HIPPO and brain-region dependent (interaction) analyses at each of the expression summarization feature levels. The columns are: SNP (SNP id), feature_id (for exons, we also include the Gencode Exon ID), statistic, p-value, FDR, beta (coefficient), (gene) symbol, Ensembl Gene ID, (feature) type, classification “class” such as whether the exon-exon junction is observed in Gencode v25 (‘InGen’), gencode transcript IDs (“gencodeTx”), and “gene_type” (protein coding, antisense, etc). The feature levels and models with either GTEx, BrainSeq Phase I or CAUC-only replication statistics have additional columns (see Table S8) with the “gtex_”, “bsp1_” and “cauc_” prefixes, respectively. See file SupplementaryTable15.tar.gz (also available from https://s3.us-east-2.amazonaws.com/libd-brainseq2/SupplementaryTable15_eQTL.tar.gz).

    • TableS16
      Cite

      Table S16. Risk loci focused eQTL snp-feature pairs. rAggr SNP output and eQTL (FDR<1%) results for all four expression summarization feature levels focused on the risk loci. (A) DLPFC, (B) HIPPO. See file SupplementaryTable16.xlsx.

    • TableS17
      Cite

      Table S17. TWAS genome wide significant results. (A) Significant (FDR<5%) TWAS results based on the PGC2+CLOZUK GWAS (Pardiñas et al., 2018) for each of the four expression summarization feature levels in DLPFC and HIPPO. (B) Paired TWAS results for comparing them across DLPFC and HIPPO and for the gene level results, the TWAS results from Gandal et al (Gandal et al., 2018). (C) Presence/absence of the Gusev et al (Gusev et al., 2018) TWAS 83 unique genes from CMC or CMC splicing in our TWAS results with either the PGC2+CLOZUK GWAS (Pardiñas et al., 2018) or the PGC2 GWAS (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). (D) Similar to (A) but with the PGC2 GWAS. (E) Similar to (B) but with the PGC2 GWAS. See file SupplementaryTable17.xlsx.

Steps to reproduce

Please see the source code at https://github.com/LieberInstitute/brainseq_phase2 and https://github.com/LieberInstitute/qsva_brain for how to reproduce the files and figures from this dataset.

Related links

Latest version

  • Version 1

    2019-05-02

    Published: 2019-05-02

    DOI: 10.17632/3j93ybf4md.1

    Cite this dataset

    Collado-Torres, Leonardo; Burke, Emily E; Peterson, Amy; Shin, Joo Heon; Straub, Richard E; Rajpurohit, Anandita; Semick, Stephen A; Ulrich, William S; Consortium, BrainSeq; Price, Amanda J; Valencia, Cristian; Tao, Ran; Deep-Soboslay, Amy; Hyde, Thomas M; Kleinman, Joel E; Weinberger, Daniel R; Jaffe, Andrew E (2019), “Regional heterogeneity in gene expression, regulation and coherence in hippocampus and prefrontal cortex across development and in schizophrenia. Collado-Torres et al, Neuron, 2019.”, Mendeley Data, v1 http://dx.doi.org/10.17632/3j93ybf4md.1

Statistics

Views: 4989
Downloads: 1027

Institutions

Lieber Institute for Brain Development

Categories

Schizophrenia, Development Studies, RNA Sequencing, Brain, Human

Licence

CC BY NC 3.0 Learn more

The files associated with this dataset are licensed under a Attribution-NonCommercial 3.0 Unported licence.

What does this mean?
You are free to adapt, copy or redistribute the material, providing you attribute appropriately and do not use the material for commercial purposes.

Report