SALL4 controls cell fate in response to DNA base composition. Pantier et al.
Mammalian genomes contain long domains with distinct average compositions of A/T versus G/C base pairs. In a screen for proteins that might interpret base composition by binding to AT-rich motifs, we identified the stem cell factor SALL4, which contains multiple zinc fingers. Mutation of the domain responsible for AT binding drastically reduced SALL4 genome occupancy and prematurely up-regulated genes in proportion to their AT content. Inactivation of this single AT-binding zinc-finger cluster mimicked defects seen in Sall4-null cells, including precocious differentiation of embryonic stem cells and embryonic lethality in mice. In contrast, deletion of two other zinc-finger clusters was phenotypically neutral. Our data indicate that loss of pluripotency is triggered by down-regulation of SALL4, leading to de-repression of a set of AT-rich genes that promotes neuronal differentiation. We conclude that base composition is not merely a passive by-product of genome evolution, but constitutes a signal that aids control of cell fate.

This repository contains the Python scripts and source code used for the bioinformatic analyses, the raw Western blot and microscopy images, and the other unprocessed and processed data used to generate the figures.

"Figure 1.zip", "Figure 4.zip", "Figure 5.zip", "Figure 6.zip", "Figure S1.zip", "Figure S2.zip", "Figure S3.zip", "Figure S4.zip", "Figure S5.zip" and "Figure S6.zip" contain the raw Western blot and microscopy images, together with the processed and unprocessed data, for the panels of the corresponding figures.

"bioinformatics.zip" contains the Jupyter notebook (.ipynb) files for Figures 1, 2, 3, 6 and 7. Each notebook contains the source code to generate the main and supplemental figure panels, and the "data" sub-folder contains the processed data associated with each figure.

"selex_scripts.zip" contains the Python scripts and all the information needed to reproduce the analysis of the HT-SELEX data.
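The abstract's central quantitative claim, that genes were up-regulated in proportion to their AT content, rests on a per-gene base-composition metric compared against expression change. The snippet below is a minimal sketch of that idea, assuming AT content is the fraction of A/T among unambiguous bases of a sequence and that the comparison is a Pearson correlation with log2 fold change. The function names, toy sequences and fold-change values are hypothetical; this is not the code in "bioinformatics.zip", which may define AT content over different regions (for example gene bodies or flanking domains) and use different statistics.

```python
"""Hypothetical sketch: relate per-gene AT content to expression change.

Not the authors' notebook code. Sequences and log2 fold changes below
are invented toy values used only to illustrate the calculation.
"""
import math


def at_content(seq: str) -> float:
    """Fraction of A/T among unambiguous bases (A, C, G, T).

    N and any other IUPAC symbols are ignored rather than counted.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return at / total


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy input: gene -> (sequence, log2 fold change mutant vs WT)
TOY_GENES = {
    "gene_a": ("ATATATATGC", 2.1),
    "gene_b": ("ATGCATGCAT", 0.9),
    "gene_c": ("GCGCATGCGC", -0.4),
    "gene_d": ("GCGCGCGCGN", -1.0),
}

at_values = [at_content(seq) for seq, _ in TOY_GENES.values()]
log2fc = [fc for _, fc in TOY_GENES.values()]
r = pearson_r(at_values, log2fc)
print(f"Pearson r (AT content vs log2FC): {r:.2f}")
# prints: Pearson r (AT content vs log2FC): 0.99
```

A rank-based statistic (Spearman) or binning genes by AT content would be more robust to outliers on real genome-wide data; Pearson is used here only to keep the sketch dependency-free.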
Raw and processed high-throughput sequencing data are deposited in ArrayExpress under the following accessions:
E-MTAB-7343: RNA-seq of WT, S4KO, ZFC4mut and ZFC4Δ ESCs
E-MTAB-7655: RNA-seq of WT, S4KO, ZFC4mut and ZFC1-2Δ ESCs
E-MTAB-9197: SALL4 ChIP-seq in WT, S4KO, ZFC4mut and ZFC1-2Δ ESCs
E-MTAB-9198: Time-course RNA-seq during differentiation (days 0, 2 and 5) of WT, S4KO, ZFC4mut and ZFC1-2Δ ESCs
E-MTAB-9202: RNA-seq of S4KO cells carrying Sall4 cDNA or EGFP cDNA under a doxycycline-inducible promoter
E-MTAB-9236: HT-SELEX of recombinant C2H2 zinc-finger domains of SALL4
E-MTAB-9245: ATAC-seq in WT ESCs