Transcriptional regulatory network of developing mouse telencephalon
In mouse, a vesicle forms at the anterior part of the developing embryo at day/stage 9.5 (M9.5) and develops further into the telencephalon at day/stage 10.5 (M10.5). The equivalent stages in the chick embryo are referred to as HH17 and HH24. We sequenced RNA populations from the telencephalic region at the early (M9.5 and HH17) and late (M10.5 and HH24) stages of mouse and chick embryos. Four samples were sequenced for each stage of telencephalon development. The resulting RNA sequencing reads were used to assemble transcripts and to count their abundance. The read counts for each transcript were then used to compute its differential expression between the M9.5 and M10.5 stages in mouse. Likewise, each chick transcript was compared between the HH17 and HH24 stages.
Genes with significant p-values and a positive log2-fold change show increased expression at developmental stage B (late) compared to stage A (early) and are referred to as up-regulated (UP). Likewise, genes with significant p-values and a negative log2-fold change show decreased expression at stage B compared to stage A and are referred to as down-regulated (Down, DN). Genes with p-values above 0.05 were considered non-significant, that is, unchanged between stage A and stage B, and are referred to as no change (NC). Genes with read counts of roughly less than five in fewer than four samples were considered not expressed and are referred to as NE.
These four groups of genes were further categorized into sixteen groups based on the expression status of the mouse and chick gene orthologs. The sixteen comparative gene groups, the UP, DN, NC, and NE mouse gene groups, and a gene group composed of all differentially expressed genes in mouse (DEG) were submitted to the iRegulon Cytoscape plugin to predict their transcriptional regulatory factors. These gene groups and the iRegulon prediction results for each group are provided as datasets.
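The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used to produce the datasets: the function names, the group label format (for example "UP-DN"), and the input layout are hypothetical, and the not-expressed rule is read here as "fewer than four samples reach about five reads", which is one interpretation of the threshold stated in the text.

```python
from itertools import product

P_CUTOFF = 0.05   # p-values above this are treated as non-significant (from the text)
MIN_READS = 5     # approximate read-count floor (from the text)
MIN_SAMPLES = 4   # assumed reading: a gene needs this many samples at or above the floor

STATUSES = ("UP", "DN", "NC", "NE")

# Sixteen comparative groups: mouse status paired with chick ortholog status.
# The "MOUSE-CHICK" label format is hypothetical.
COMBINED_GROUPS = [f"{m}-{c}" for m, c in product(STATUSES, STATUSES)]


def classify(counts, log2fc, pvalue):
    """Return UP, DN, NC or NE for one gene in one species."""
    samples_with_reads = sum(1 for c in counts if c >= MIN_READS)
    if samples_with_reads < MIN_SAMPLES:
        return "NE"
    if pvalue > P_CUTOFF or log2fc == 0:
        return "NC"
    return "UP" if log2fc > 0 else "DN"


def combined_group(mouse_status, chick_status):
    """Label one ortholog pair with its comparative group."""
    return f"{mouse_status}-{chick_status}"
```

Under these assumptions, a mouse gene classified as UP whose chick ortholog is classified as NE would fall into the "UP-NE" group.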
Using the significant iRegulon prediction results, we reconstructed a transcriptional regulatory network for mouse telencephalon development, which is also provided as a network file. In addition, we provide a single Excel file containing results for some of the gene groups in which we found meaningful connections between the predicted transcription factors and their differentially expressed target genes.
Steps to reproduce
The iRegulon method was used to identify transcription factors that directly regulate the genes belonging to each of the gene groups (folder genelist) (Sande et al., 2014). The iRegulon plugin identifies regulons through motif discovery of known transcription factor (TF) binding sites in a set of co-regulated genes. A regulon consists of a TF and its direct transcriptional targets, which share its binding sites in their cis-regulatory control elements. iRegulon was used to detect motifs using nearly ten thousand position weight matrices (PWMs) in the 500 bases, 10 kb, and 20 kb upstream of the transcription start site of every gene. The output of iRegulon is a list of motifs/tracks enriched in the input genes, along with the candidate TFs binding to the motifs, the set of direct target genes among the input genes, the motif ranking scores, the Area Under the Curve (AUC), and the normalized enrichment score (NES). Each mouse gene group (UP, DN, NC, NE, and those defined by the 16 comparative groups) was submitted to the iRegulon plugin in Cytoscape to predict conserved motifs in the 500 bases, 10 kb, and 20 kb of their upstream regions independently (Sande et al., 2014; Shannon et al., 2003). Each gene of a predicted regulon was then paired with the TFs that have binding motifs in its upstream region.
We identified 1,532 TFs targeting 10,891 of the 12,902 analyzed genes, connected by 425,009 edges. To reduce the complexity of the network for visualization, we filtered out target genes that are not differentially expressed in mouse and not expressed in chick. Furthermore, we removed TFs with fewer than 20 predicted target genes in the network. Finally, we analyzed the resulting network of 608 TFs targeting 2,908 genes, connected by 101,479 edges.
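The two network filtering steps can be illustrated with the short Python sketch below. It is not the code used to build the provided network file. The function name and the input layout (a list of TF and target pairs plus per-gene status dictionaries using the UP, DN, NC, and NE labels) are hypothetical. It also assumes that a target is dropped only when it is both not differentially expressed in mouse and not expressed in chick, and that TF target counts are taken after the target filter; if the two conditions were applied separately, the `and` in the filter would become `or`.

```python
from collections import defaultdict

MIN_TARGETS = 20  # TFs with fewer predicted targets are removed (from the text)


def filter_network(edges, mouse_status, chick_status, min_targets=MIN_TARGETS):
    """Filter (tf, target) edges as described for the visualized network.

    edges: iterable of (tf, target) pairs (hypothetical input layout)
    mouse_status, chick_status: dicts mapping gene -> UP, DN, NC or NE
    """
    # Step 1: drop targets that are neither differentially expressed in mouse
    # nor expressed in chick (assumed reading: both conditions must hold).
    kept = []
    for tf, target in edges:
        not_de_in_mouse = mouse_status.get(target) not in ("UP", "DN")
        not_expressed_in_chick = chick_status.get(target, "NE") == "NE"
        if not (not_de_in_mouse and not_expressed_in_chick):
            kept.append((tf, target))

    # Step 2: count distinct remaining targets per TF and drop sparse TFs.
    targets_per_tf = defaultdict(set)
    for tf, target in kept:
        targets_per_tf[tf].add(target)

    return [(tf, t) for tf, t in kept if len(targets_per_tf[tf]) >= min_targets]
```

The returned edge list can be written out as a two-column table and imported into Cytoscape as a network.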