BindDB: LincRNAs Case Study
Epigenetic Classification of Novel Genes Provides Insights into Functional Subgroups
The introduction of NGS platforms has greatly facilitated the discovery of novel genes, especially non-coding genes, which previous methodologies failed to detect. By combining epigenetic analysis with the mapping of regions of novel transcription, thousands of novel long intervening non-coding RNAs, or lincRNAs, residing in intergenic regions were discovered (Guttman et al., 2009; Khalil et al., 2009). The understanding that histone marks can point to areas of active and functional transcription enabled the use of a minimal epigenetic profile ("K4-K36") to locate regions in which H3K4me3 marks a promoter region and is followed by H3K36me3 within the potential coding region. However, the epigenetic regulation of active genes could be far more complex. We applied the BindDB analysis capabilities to this novel group of genes to ask whether their promoters are regulated similarly to those of protein-coding genes, and to test whether their epigenetic profile could be expanded beyond the basic "K4-K36" profile. We uploaded a list of 2074 genomic regions classified as lincRNAs in the Ensembl annotation (Hubbard et al., 2002) according to the algorithms of Guttman et al. (2009). The list was supplied in BED format, since many of the regions are not annotated with gene symbols but do include strand information from which promoter regions can be deduced. These regions were merged by BindDB into 1729 non-overlapping proximal promoter regions. Strikingly, almost all factors were noticeably enriched, beyond H3K4me3 (1.55-2.65-fold, log2 scale, see Supplemental Experimental Procedures) and H3K36me3 (0.04-1.25-fold) (Figure S3). The low enrichment of H3K36me3 persisted when we examined only lincRNA bodies compared to all gene bodies in a separate analysis (0.54-1.92), indicating that lincRNAs have H3K36me3 levels similar to those of protein-coding genes.
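The step from stranded BED regions to non-overlapping proximal promoter regions can be illustrated with a minimal Python sketch. This is not BindDB's implementation: the function names, the 1 kb upstream and downstream window sizes, and the example coordinates are all assumptions made for illustration, since the text does not state how the proximal promoter window is defined.

```python
from typing import List, Tuple

# BED-style record: (chrom, start, end, strand), 0-based, half-open coordinates.
Region = Tuple[str, int, int, str]
Interval = Tuple[str, int, int]


def promoter_window(region: Region, upstream: int = 1000,
                    downstream: int = 1000) -> Interval:
    """Derive a promoter window around the TSS implied by the strand.

    The window sizes are hypothetical defaults, not values from the paper.
    """
    chrom, start, end, strand = region
    if strand == "+":
        tss = start                      # transcription begins at the region start
        lo, hi = tss - upstream, tss + downstream
    elif strand == "-":
        tss = end                        # transcription begins at the region end
        lo, hi = tss - downstream, tss + upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return (chrom, max(0, lo), hi)       # clamp at the chromosome start


def merge_overlapping(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals that overlap on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        # Strict '<' means intervals that merely touch are kept separate.
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up example regions (not taken from the lincRNA list).
example_regions: List[Region] = [
    ("chr1", 5000, 9000, "+"),
    ("chr1", 5500, 12000, "+"),    # promoter overlaps that of the first region
    ("chr1", 20000, 30000, "-"),
    ("chr2", 500, 4000, "+"),      # window is clamped at position 0
]
example_promoters = merge_overlapping(
    [promoter_window(r) for r in example_regions]
)
```

In this toy example, four input regions collapse into three promoter intervals because the first two promoter windows overlap, which is the same kind of reduction as the one from 2074 regions to 1729 promoter regions described above.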
The clustered heatmap view of all factors and histone modifications across all lincRNA promoters (Figure 1B) gives us two important insights into this large group. First, many of these promoters have quite an extensive epigenetic profile involving active histone modifications and transcription factors. Second, as many as 20% of lincRNA annotations do not display such a profile (Figure 1B, orange star). This group also exhibits little to no RNA expression. It may include lincRNAs specific to non-ESC types, false positives resulting from the extensive statistical methods applied in search of the "K4-K36" profile, and some regions that have only the minimal "K4-K36" profile. We can speculate that this minimal profile may not be enough to support bona fide regulated transcription, at least not in ESCs, and that the deposition of these histone modifications may instead be a byproduct of another process.