MC-4C: Enhancer hubs and loop collisions identified from single-allele topologies
Chromatin folding is increasingly recognized as a regulator of genomic processes such as gene activity. Chromosome conformation capture (3C) methods have been developed to unravel genome topology through the analysis of pair-wise chromatin contacts and have identified many genes and regulatory sequences that, in populations of cells, are engaged in multiple DNA interactions. However, pair-wise methods cannot discern whether contacts occur simultaneously or in competition on the individual chromosome. We present a novel 3C method, Multi-Contact 4C (MC-4C), that applies Nanopore sequencing to study multi-way DNA conformations of tens of thousands of individual alleles, in order to distinguish between cooperative, random and competing interactions. MC-4C can uncover previously missed structures in sub-populations of cells. It reveals unanticipated cooperative clustering between regulatory chromatin loops, anchored by enhancers and gene promoters, and CTCF and cohesin-bound architectural loops. For example, we show that the constituents of the active β-globin super-enhancer cooperatively form an enhancer hub that can host two genes at a time. We also find cooperative interactions between further dispersed regulatory sequences of the active protocadherin locus. When applied to CTCF-bound domain boundaries, we find evidence that chromatin loops can collide, a process that is negatively regulated by the cohesin release factor WAPL. Loop collision is further pronounced in WAPL knockout cells, suggestive of a “cohesin traffic jam”. In summary, single-molecule multi-contact analysis methods can reveal how the myriad regulatory sequences spatially coordinate their actions on individual chromosomes. Insight into these single-allele higher-order topological features will facilitate interpreting the consequences of natural and induced genetic variation and help uncover the mechanisms shaping our genome.
Steps to reproduce
Dataset format
Datasets are provided as gzipped tab-delimited files, named using the following format: MC4C_<Cell_Type>-<Viewpoint_Name>.tsv.gz
Datasets are already filtered for PCR duplicates, so each read uniquely represents a single allele. Each row in a .gz file represents a single fragment. Each fragment is described by the following columns (given as headers):
ReadID: A numeric identifier (starting from 1) representing each sequenced read.
Chr: A number between 1 and #chr in the genome, representing the chromosome to which the corresponding fragment is mapped.
MappedBegin and MappedEnd: Begin and end coordinates of the closest restriction site in the genome where the fragment is mapped.
Strand: A numeric value representing the strand of the mapped fragment in the reference genome (i.e. 1 --> Forward, -1 --> Reverse).
MQ: Mapping quality, representing the confidence of the aligner (i.e. BWA-SW) in mapping that fragment.
SeqRunIndex: A numeric identifier for the sequencing run of a given viewpoint. This column can usually be ignored because, after the PCR duplicate filter, data from the same viewpoint but different sequencing runs can be combined.

Dataset information (meta-data)
A meta-data file is provided for each dataset to give additional information that could be useful in analysing MC-4C datasets. These files are named <Dataset_Name>_run-info.txt. Each file contains the following rows:
SequencingRuns: List of sequencing runs used for this dataset (delimited by ;).
ViewpointRegion: Viewpoint coordinates for this dataset (format: <chromosome>:<primerBegin>-<primerEnd>).
RestrictionEnzymes: Pair of restriction enzyme names used to prepare the MC-4C library for this dataset.
ReferenceGenome: Reference genome used to process this dataset (i.e. hg19 or mm9).
LocusofInterest: Begin and end coordinates of the locus of interest (LOI) used in multi-contact analysis of MC-4C.
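
The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the MC-4C pipeline: the function names load_reads and parse_viewpoint_region and the min_mq parameter are hypothetical, and the code assumes that the header row uses exactly the column names listed above and that every column holds integer values.

```python
import csv
import gzip
from collections import defaultdict

# Column names as listed in the dataset description.
# Assumption: every column holds integer values.
INT_COLUMNS = ("ReadID", "Chr", "MappedBegin", "MappedEnd",
               "Strand", "MQ", "SeqRunIndex")


def load_reads(path, min_mq=0):
    """Group fragments by ReadID; each read stands for a single allele.

    Fragments with MQ below min_mq (a hypothetical filter) are skipped.
    """
    reads = defaultdict(list)
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            fragment = {col: int(row[col]) for col in INT_COLUMNS}
            if fragment["MQ"] >= min_mq:
                reads[fragment["ReadID"]].append(fragment)
    return dict(reads)


def parse_viewpoint_region(text):
    """Split '<chromosome>:<primerBegin>-<primerEnd>' into its three parts."""
    chrom, span = text.strip().split(":")
    begin, end = span.split("-")
    return chrom, int(begin), int(end)
```

Calling load_reads on a file named MC4C_<Cell_Type>-<Viewpoint_Name>.tsv.gz would return a dictionary that maps each ReadID to the list of fragments captured on that allele, which is the natural unit for multi-contact analysis. parse_viewpoint_region only handles the ViewpointRegion value itself; how keys and values are separated inside the run-info file is not specified here, so that step is left out.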