Decoding the Inversion Symmetry Underlying Transcription Factor DNA-Binding Specificity and Functionality in the Genome
[Column E and Column F] We overlapped the genomic coordinates of the 81,922 0-nt to 5-nt variant 13-nt ERE or HRE DNA elements with the coordinates of the ChIPSeq or ChIPExo peaks in each experiment (157 ER experiments at 0-nt to 5-nt variant ERE DNA elements; 194 KR experiments at 0-nt to 5-nt variant HRE DNA elements) to determine the absolute number of times each 0-nt to 5-nt variant 13-nt ERE or HRE DNA element occurred within an experiment. The entire 13-nt DNA element was required to lie within the peak boundaries.

[Column V] Multiple peak selection criteria were used (L4, L8, L10, L15, L20), where Lx represents an x-fold greater tag density at peaks than in the surrounding 10-kb region. This provides a low-to-high stringency analysis of the data. By examining the data across a range of peak selection criteria, we account for both the risk of excess background noise and the risk of filtering out low-amplitude information. All peaks are observed at L4; [Column V] identifies whether a given peak is also observed at L8, L10, L15 or L20.

The ChIPSeq or ChIPExo peaks were annotated to regions in the genome using HOMER (annotatePeaks.pl).
[Column A] Peak ID
[Column B] Chromosome
[Column C] Peak start position
[Column D] Peak end position
[Column G] Strand
[Column H] Peak Score
[Column I] FDR/Peak Focus Ratio/Region Size
[Column J] Annotation (i.e. Exon, Intron, ...)
[Column K] Detailed Annotation (Exon, Intron etc. + CpG Islands, repeats, etc.)
[Column L] Distance to nearest RefSeq TSS
[Column M] Nearest TSS: Native ID of annotation file
[Column N] Nearest TSS: Entrez Gene ID
[Column O] Nearest TSS: Unigene ID
[Column P] Nearest TSS: RefSeq ID
[Column Q] Nearest TSS: Ensembl ID
[Column R] Nearest TSS: Gene Symbol
[Column S] Nearest TSS: Gene Aliases
[Column T] Nearest TSS: Gene description
[Column U] Nearest TSS: Gene type
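
The containment rule for Columns E and F and the stringency tiers for Column V can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the tuple layouts, the inclusive coordinate convention, and the use of ">=" for "x-fold greater" are all assumptions made for illustration, and the example sequences and coordinates are made up.

```python
from collections import Counter, defaultdict

# Stringency tiers named in the description (L4, L8, L10, L15, L20).
TIERS = (4, 8, 10, 15, 20)


def count_elements_in_peaks(occurrences, peaks):
    """Count, per element sequence, the occurrences lying entirely inside a peak.

    occurrences: iterable of (sequence, chrom, start, end)  -- hypothetical layout
    peaks:       iterable of (chrom, start, end)            -- hypothetical layout
    Coordinates are assumed to be inclusive and in the same system.
    """
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    counts = Counter()
    for seq, chrom, start, end in occurrences:
        # The whole 13-nt element must fall within the peak boundaries;
        # partial overlap does not count.
        if any(p_start <= start and end <= p_end
               for p_start, p_end in peaks_by_chrom[chrom]):
            counts[seq] += 1
    return counts


def tiers_passed(fold_enrichment):
    """Return the Lx labels met by a peak's fold enrichment over the local region.

    Treating "x-fold greater" as >= x is an assumption.
    """
    return [f"L{t}" for t in TIERS if fold_enrichment >= t]


# Made-up example data.
example_peaks = [("chr1", 100, 300)]
example_occurrences = [
    ("GGTCACAGTGACC", "chr1", 150, 162),  # fully inside the peak
    ("GGTCACAGTGACC", "chr1", 295, 307),  # extends past the peak end
    ("GGTCACAGTGACA", "chr2", 150, 162),  # no peak on this chromosome
]
example_counts = count_elements_in_peaks(example_occurrences, example_peaks)
example_tiers = tiers_passed(12.5)
```

In this example only the first occurrence is counted, because the second extends past the peak end and the third lies on a chromosome with no peaks. A peak with 12.5-fold enrichment passes L4, L8 and L10 but not L15 or L20.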
Steps to reproduce
Experiments were obtained from Gene Expression Omnibus and converted to fastq format using fastq-dump v2.4.5; original experiment names were retained. Reads were first selected using a cross-correlation analysis (trim_and_filter_SE.pl). The sequencing reads were then mapped uniquely, allowing for no more than two mismatches, to the reference genome [mouse genome build 38 (mm10) or human genome build 37 (hg19)] using Bowtie v1.1.2. Mapped reads were deduplicated using MarkDuplicates.jar from the Picard tools package v1.96. Peak selection was performed using Hypergeometric Optimization of Motif Enrichment (HOMER) v4.7.2. See "DNA Sequence Constraints Define Functionally Active Steroid Nuclear Receptor Binding Sites in Chromatin" (Coons et al., 2017) for additional details (Supplemental Table 6).
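
As a rough guide, the mapping and peak-calling steps could be assembled as in the following Python sketch. The exact command-line options used are not stated here, so every option shown is an assumption: Bowtie's -v 2 and -m 1 are one way to express "no more than two mismatches" and "mapped uniquely", and HOMER's findPeaks -L sets the fold enrichment over the local region. The function name, file names, and the "-style factor" choice are hypothetical. The read filtering, deduplication, and format conversion steps are omitted.

```python
def build_commands(sample, bowtie_index, genome, dedup_alignments,
                   fold_thresholds=(4, 8, 10, 15, 20)):
    """Assemble illustrative command lines; nothing is executed.

    sample:           hypothetical sample name used to derive file names
    bowtie_index:     basename of a Bowtie 1 index (e.g. for hg19 or mm10)
    genome:           HOMER genome name passed to annotatePeaks.pl
    dedup_alignments: alignment file left after the deduplication step
    """
    fastq = f"{sample}.fastq"
    sam = f"{sample}.sam"
    tag_dir = f"{sample}_tags"

    commands = [
        # Bowtie 1: at most 2 mismatches (-v 2), drop reads with more than
        # one reportable alignment (-m 1), write SAM output (-S).
        ["bowtie", "-v", "2", "-m", "1", "-S", bowtie_index, fastq, sam],
        # HOMER tag directory built from the deduplicated alignments.
        ["makeTagDirectory", tag_dir, dedup_alignments],
    ]
    for fold in fold_thresholds:
        peaks = f"{sample}_L{fold}_peaks.txt"
        # One findPeaks run per stringency tier; -L is the local fold threshold.
        commands.append(["findPeaks", tag_dir, "-style", "factor",
                         "-L", str(fold), "-o", peaks])
    # annotatePeaks.pl writes to standard output, so a caller would redirect it.
    commands.append(["annotatePeaks.pl", f"{sample}_L4_peaks.txt", genome])
    return commands


example_commands = build_commands("sample1", "hg19", "hg19", "sample1.dedup.bam")
```

Each entry is an argument list that could be passed to subprocess.run once the paths are verified. Building the commands separately from running them makes it easier to check the options against each tool's documentation first.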