Decoding the Inversion Symmetry Underlying Transcription Factor DNA-Binding Specificity and Functionality in the Genome

Published: 15 April 2019 | Version 1 | DOI: 10.17632/8ks7pm2jw6.1
Laurel Coons


[Column E and Column F] We overlapped the location coordinates of the 81,922 0-nt to 5-nt variant 13-nt ERE or HRE DNA elements in the genome with the location coordinates of the ChIPSeq or ChIPExo peaks in each experiment (157 ER experiments at 0-nt to 5-nt variant ERE DNA elements; 194 KR experiments at 0-nt to 5-nt variant HRE DNA elements) to determine the absolute number of times each 0-nt to 5-nt variant 13-nt ERE or HRE DNA element occurred within an experiment. The entire 13-nt DNA element was required to lie within the peak boundaries.

[Column V] Multiple peak selection criteria were used (L4, L8, L10, L15, L20), where Lx represents an x-fold greater tag density at peaks than in the surrounding 10-kb region. This provides a low-to-high stringency analysis of the data. By examining the data across this range of peak selection criteria, we adjust for both the risk of excess background noise and the risk of filtering out low-amplitude information. All peaks are observed at L4; [Column V] identifies whether a given peak is also observed at L8, L10, L15, or L20.

The ChIPSeq or ChIPExo peaks were annotated to regions in the genome using HOMER. The remaining columns are:
[Column A] Peak ID
[Column B] Chromosome
[Column C] Peak start position
[Column D] Peak end position
[Column G] Strand
[Column H] Peak Score
[Column I] FDR/Peak Focus Ratio/Region Size
[Column J] Annotation (i.e. Exon, Intron, ...)
[Column K] Detailed Annotation (Exon, Intron etc. + CpG Islands, repeats, etc.)
[Column L] Distance to nearest RefSeq TSS
[Column M] Nearest TSS: Native ID of annotation file
[Column N] Nearest TSS: Entrez Gene ID
[Column O] Nearest TSS: Unigene ID
[Column P] Nearest TSS: RefSeq ID
[Column Q] Nearest TSS: Ensembl ID
[Column R] Nearest TSS: Gene Symbol
[Column S] Nearest TSS: Gene Aliases
[Column T] Nearest TSS: Gene description
[Column U] Nearest TSS: Gene type
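The containment rule described above (an element counts only when all 13 nt fall inside a peak) can be illustrated with a minimal Python sketch. This is not the code used to build the dataset. The function name, the tuple layouts, the example coordinates and sequences, the use of inclusive coordinates, and the choice to count each genomic occurrence once even if several peaks contain it are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def count_elements_in_peaks(elements, peaks):
    """Count, per element sequence, the genomic occurrences fully inside a peak.

    elements: iterable of (sequence, chrom, start, end), inclusive coordinates (assumed)
    peaks:    iterable of (chrom, start, end), inclusive coordinates (assumed)
    """
    # Group peaks by chromosome so each element is only compared to peaks on its own chromosome.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    counts = Counter()
    for seq, chrom, start, end in elements:
        # The whole element must lie within the peak boundaries; partial overlap does not count.
        contained = any(
            p_start <= start and end <= p_end
            for p_start, p_end in peaks_by_chrom.get(chrom, [])
        )
        if contained:
            counts[seq] += 1
    return counts

# Hypothetical toy data, not taken from the dataset.
peaks = [("chr1", 100, 300), ("chr2", 50, 150)]
elements = [
    ("GGTCACAGTGACC", "chr1", 120, 132),  # fully inside the chr1 peak
    ("GGTCACAGTGACC", "chr1", 295, 307),  # straddles the peak end, so not counted
    ("GGTCACAGTGACC", "chr2", 50, 62),    # starts exactly at the peak start, counted
    ("GGTCACAGTGACT", "chr3", 10, 22),    # no peaks on chr3
]
counts = count_elements_in_peaks(elements, peaks)
```

With 81,922 elements and many peaks per experiment, a sorted or interval-tree lookup would scale better than this linear scan; the scan is kept here only to make the containment test explicit.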


Steps to reproduce

Experiments were obtained from the Gene Expression Omnibus and converted to fastq format using fastq-dump v2.4.5; original experiment names were retained. Reads were first selected using a cross-correlation analysis. The sequencing reads were then uniquely mapped, allowing no more than two mismatches, to the reference genome [mouse genome build 38 (mm10) or human genome build 37 (hg19)] using Bowtie v1.1.2. Mapped reads were deduplicated using MarkDuplicates.jar from the Picard tools package v1.96. Peak selection was performed using Hypergeometric Optimization of Motif Enrichment (HOMER) v4.7.2. See "DNA Sequence Constraints Define Functionally Active Steroid Nuclear Receptor Binding Sites in Chromatin" (Coons et al., 2017) for additional details (Supplemental Table 6).
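As a rough guide to how the mapping and peak-selection settings above translate into command lines, the Python sketch below only assembles argument lists; it does not run anything. The exact commands used for this dataset are not given here, so the flag choices are assumptions: Bowtie 1's `-v 2` (at most two mismatches) and `-m 1` (discard reads with more than one reportable alignment) are one way to express unique mapping with two mismatches, and HOMER's `findPeaks -L` sets the fold enrichment over the local region, which is assumed to correspond to the L4 to L20 criteria. The `-style factor` setting, the function names, and all file paths are hypothetical.

```python
from typing import List

# Stringency levels named in the description (L4, L8, L10, L15, L20).
STRINGENCY_FOLDS = [4, 8, 10, 15, 20]

def bowtie_unique_args(index_prefix: str, fastq_path: str, sam_path: str,
                       max_mismatches: int = 2) -> List[str]:
    """Argument list for a Bowtie 1 run reporting only uniquely mapped reads (assumed flags)."""
    return ["bowtie", "-v", str(max_mismatches), "-m", "1", "-S",
            index_prefix, fastq_path, sam_path]

def findpeaks_args(tag_dir: str, fold: int, out_path: str) -> List[str]:
    """Argument list for HOMER findPeaks at a given local fold enrichment (assumed flags)."""
    return ["findPeaks", tag_dir, "-style", "factor", "-L", str(fold), "-o", out_path]

# Hypothetical paths, for illustration only.
mapping_cmd = bowtie_unique_args("hg19", "experiment.fastq", "experiment.sam")
peak_cmds = [findpeaks_args("experiment_tags", fold, f"peaks_L{fold}.txt")
             for fold in STRINGENCY_FOLDS]
```

Each list could be passed to `subprocess.run` once the tools are installed. The fastq conversion, cross-correlation read selection, sorting, and Picard deduplication steps are left out of this sketch.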


National Institutes of Health, Duke University, National Institute of Environmental Health Sciences


Combinatorics, Genomics, Bioinformatics, Information Theory, Transcription, DNA, Steroid Hormones, Steroid, DNA Computing, Computational Genomics, Nuclear Receptor, Binding Protein, Complementary DNA, DNA Sequencing, Gene Expression, Protein P53, DNA Binding Protein, Nuclear Hormone Receptor, Complement, Transcription Factor, Gene Transcription, DNA Transcription, Chip Technology, Big Data, Combinational Logic, Transcription Binding Site, Computational Bioinformatics, Data Acquisition in Bioinformatics, Bioinformatics Programming Language, Gene Regulation, Molecular Mechanism of Gene Regulation, Functional Genomics, High-Throughput Sequencing, Molecular Symmetry, High Throughput Analysis, Genetic Research, Inversion, P53, Binding, Functional Analysis, Symmetry Breaking, Symmetry Detection, Steroid Hormone Receptor, Application of Big Data