BindDB: Pseudogenes Case Study
As a negative control for BindDB, we uploaded a list of 76 pseudogene names, identified by the 'ps' suffix on their gene symbols in the RefSeq annotation, to the BindDB multi-gene query form and chose to query their proximal promoter regions. Overall, these pseudogenes displayed the epigenetic profile of inactive chromatin. Sorting the positive-results table by the "Enrichment" column shows depletion of epigenetic marks that are associated with active transcription, such as H3K4me1, H3K36me3, H3K27ac and H3K79me2 (Figure S2A). Consistent with this, the heatmap shows very little evidence of factor binding at most of the pseudogenes, and the predominantly yellow sidebar indicates that there is no evidence of RNA expression from these regions either (Figure S2B).

Yet a handful of pseudogenes, including Rpl34-ps and Pisd-ps1/2, display evidence of H3K4me3 and RNAP II binding as well as RNA expression (Figure S2B, top cluster). These pseudogenes are also bound by many transcription factors, including members of the pluripotency network such as OCT4, indicating that this small group of genes is likely expressed in ESCs and may be inaccurately annotated as "pseudogenes". The annotation of Rpl34-ps completely overlaps the non-pseudogene copy of the same gene, so it shares the epigenetic profile of Rpl34 itself, a key ribosomal protein gene that is expressed in ESCs. Pisd-ps1/2, in turn, is classified as a non-coding transcript in both the RefSeq and Ensembl annotations, so the label "pseudogene" is misleading.

Clicking the gene name Pisd-ps1 on the BindDB results page opens the UCSC Genome Browser (Kent et al., 2002) at the gene's genomic location, together with the ENCODE/LICR Histone Mods (ENCODE, 2012) and RNA-seq tracks for E14 ESCs (Mortazavi et al., 2008), giving a more focused view of this gene (Figure S2C). Its active epigenetic profile, including OCT4 binding, makes it an interesting candidate for studying ncRNA in ESCs.
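The pseudogene list used above was assembled from gene symbols carrying a 'ps' suffix. The following Python sketch shows one way such a selection could be made from a list of symbols; it is not BindDB's or the authors' actual procedure. The regular expression (a "-ps" suffix optionally followed by digits), the function name `select_pseudogenes` and the sample symbols are our own illustrative assumptions.

```python
import re
from typing import Iterable, List

# Assumption: a pseudogene symbol ends in "-ps", optionally followed by digits
# (e.g. "Rpl34-ps", "Pisd-ps1"). The exact rule behind the 76-gene list is not given.
PS_SUFFIX = re.compile(r"-ps\d*$")


def select_pseudogenes(symbols: Iterable[str]) -> List[str]:
    """Return symbols with a '-ps' suffix, de-duplicated, in first-seen order."""
    seen = set()
    selected = []
    for sym in symbols:
        # Anchoring to the end of the symbol avoids matching names such as "Psmb1".
        if PS_SUFFIX.search(sym) and sym not in seen:
            seen.add(sym)
            selected.append(sym)
    return selected


if __name__ == "__main__":
    # Illustrative symbols only, not the actual query list.
    symbols = ["Rpl34", "Rpl34-ps", "Pisd", "Pisd-ps1", "Pisd-ps2", "Pisd-ps1", "Psmb1"]
    print(select_pseudogenes(symbols))  # ['Rpl34-ps', 'Pisd-ps1', 'Pisd-ps2']
```

The resulting list of symbols is what would then be pasted into the multi-gene query form.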
Running the same analysis on a larger cohort of ~5500 non-overlapping pseudogenes from the Ensembl annotation reassuringly reproduced the same general lack of factor binding and histone modifications at their promoters (data not shown).
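The criterion used to define "non-overlapping" pseudogenes is not specified here. As a minimal sketch, assuming it means that a pseudogene's interval does not overlap any annotated gene on the same chromosome (strand ignored, half-open BED-style coordinates), the filter below would exclude cases like Rpl34-ps, whose annotation lies entirely within Rpl34. The `Feature` type, the function names and all coordinates in the usage example are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based start, inclusive
    end: int    # end, exclusive (half-open, BED-style)


def build_index(genes: Iterable[Feature]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome: sorted gene starts and a running maximum of gene ends."""
    by_chrom = defaultdict(list)
    for g in genes:
        by_chrom[g.chrom].append((g.start, g.end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)
    return index


def overlaps_any(p: Feature, index: Dict[str, Tuple[List[int], List[int]]]) -> bool:
    """True if p overlaps at least one indexed gene (strand is ignored)."""
    if p.chrom not in index:
        return False
    starts, max_ends = index[p.chrom]
    # Genes at positions [0, i) start before p.end.
    i = bisect_left(starts, p.end)
    # One of them overlaps p iff the largest end among them lies past p.start.
    return i > 0 and max_ends[i - 1] > p.start


def non_overlapping(pseudogenes: List[Feature], genes: Iterable[Feature]) -> List[Feature]:
    """Keep only pseudogenes that overlap no gene in `genes`."""
    index = build_index(genes)
    return [p for p in pseudogenes if not overlaps_any(p, index)]


if __name__ == "__main__":
    # Hypothetical coordinates; "GeneA-ps" mimics a pseudogene nested inside its parent gene.
    genes = [Feature("GeneA", "chr3", 1000, 5000),
             Feature("GeneB", "chr3", 8000, 9000),
             Feature("GeneC", "chr7", 100, 200)]
    pseudo = [Feature("GeneA-ps", "chr3", 1200, 4800),
              Feature("PsA", "chr3", 5000, 6000),   # abuts GeneA but does not overlap it
              Feature("PsB", "chr3", 8500, 8600),   # inside GeneB
              Feature("PsC", "chr1", 0, 50)]        # no genes on chr1
    print([p.name for p in non_overlapping(pseudo, genes)])  # ['PsA', 'PsC']
```

Sorting gene starts per chromosome and keeping a running maximum of gene ends lets each pseudogene be checked with one binary search instead of a scan over every gene, which keeps the filter fast for thousands of pseudogenes against a genome-wide annotation.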