Data on the G-quadruplex topology and binding ability of G-quadruplex-forming sequences found in the promoter regions of biomarker proteins, and their relation to the presence of nuclear localization signals in the proteins
An aptamer is a nucleic acid ligand that binds specifically to its target molecule. Aptamers are typically obtained by an in vitro screening process called SELEX (Systematic Evolution of Ligands by Exponential Enrichment) [1, 2]. However, SELEX sometimes fails to yield aptamers because of PCR bias [3, 4] and the limited diversity of the random library [5]. To overcome these problems, we previously designed a SELEX-free aptamer identification method called “G-quadruplex (G4) promoter-derived aptamer selection (G4PAS)” [6]. In the G4PAS procedure, putative G4-forming sequences (PQSs) are searched for computationally in the promoter region of a target biomarker protein in human genomic DNA, and the identified DNA sequences are characterized as aptamer candidates against the gene product encoded downstream of the promoter. For this characterization, the identified PQSs were chemically synthesized, and their binding ability was investigated by surface plasmon resonance (SPR) measurement and gel-shift assay. The G4 topology of the PQSs was investigated by circular dichroism measurement. Additionally, the presence of a nuclear localization signal in each target protein was predicted in silico using web tools (NLSdb [7] and cNLS Mapper [8]). This data set summarizes all the PQS DNA sequences, the dissociation constants (KD) obtained by SPR measurement, the results of the gel-shift assay, and the results of the nuclear localization signal prediction, in order to address the possibility that these PQS regions bind to the target proteins in vivo. These data should help increase the success rate of G4PAS. Moreover, because G4 motifs in genomic DNA are suggested to be involved in gene regulation in vivo [9, 10], this data set is also potentially beneficial for the cell biology field.
[1] C. Tuerk, L. Gold, Science 249 (1990) 505–510.
[2] A.D. Ellington, J.W. Szostak, Nature 346 (1990) 818–822.
[3] M. Polz, C. Cavanaugh, Applied and Environmental Microbiology 64 (1998) 3724–3730.
[4] T. Kanagawa, Journal of Bioscience and Bioengineering 96 (2003) 317–323.
[5] S.J. Klug, M. Famulok, Molecular Biology Reports 20 (1994) 97–107.
[6] W. Yoshida, T. Saito, T. Yokoyama, S. Ferri, K. Ikebukuro, PLoS ONE 8 (2013) e65497.
[7] R. Nair, P. Carter, B. Rost, Nucleic Acids Research 31 (2003) 397–399.
[8] S. Kosugi, M. Hasebe, M. Tomita, H. Yanagawa, Proceedings of the National Academy of Sciences of the United States of America 106 (2009) 10171–10176.
[9] H.J. Lipps, D. Rhodes, Trends in Cell Biology 19 (2009) 414–422.
[10] D. Varshney, J. Spiegel, K. Zyner, D. Tannahill, S. Balasubramanian, Nature Reviews Molecular Cell Biology 21 (2020) 459–474.
Steps to reproduce
Known biomarker proteins were chosen as the targets. DNA sequences spanning 1 kbp upstream to 1 kbp downstream of the transcription start site of the gene encoding each target protein were extracted using the UCSC Genome Browser (https://genome.ucsc.edu/). Putative G-quadruplex-forming sequences were then extracted using QGRS Mapper (http://bioinformatics.ramapo.edu/QGRS/index.php) with the criterion “G2 < N1-7 G2 < N1-7 G2 < N1-7 G2 <”.
The binding of the extracted DNA sequences to the target protein was investigated by surface plasmon resonance (SPR) and gel-shift assay. For the SPR assay, the oligonucleotides were diluted in TBS buffer (10 mM Tris-HCl, 150 mM NaCl, 100 mM KCl; pH 7.4), heated to 95 °C for 10 min, and then cooled gradually to 25 °C over 30 min. After the heat treatment, they were further diluted to various concentrations in TBS buffer and used for the SPR assay. The SPR signal of the reference cell without protein immobilization was subtracted from that of the protein-immobilized cell. The DNA was allowed to associate for 120 s in TBS and to dissociate for 120 s in 1 M NaCl at a flow rate of 30 µL/min at 25 °C. The dissociation constant (KD) was calculated by curve fitting using BIAevaluation software (GE Healthcare).
For the gel-shift assay, FITC-labeled oligonucleotides were diluted to 1 µM in TBS, heated to 95 °C for 10 min, and then cooled gradually to 25 °C. The heat-treated oligonucleotides and target proteins were mixed in TBS at final concentrations of 500 nM and 1 µM, respectively. The mixed samples were incubated with shaking (1200 rpm) for 30 min at 25 °C. Each sample was electrophoresed in a 12% polyacrylamide gel in TBE buffer (90 mM Tris, 90 mM boric acid, 2 mM EDTA, pH 8.16), and the gel was then scanned using a Typhoon 8600 (GE Healthcare, Chicago, IL, USA).
The topology of the G-quadruplex structures was investigated by circular dichroism (CD) spectroscopy. The DNA oligonucleotide samples were diluted to 2 µM in Tris buffer (10 mM Tris-HCl, 150 mM NaCl; pH 7.4) or TBS buffer, heated to 95 °C for 10 min, and then cooled gradually to 25 °C over 30 min. A 50 µL aliquot of the prepared sample was added to a quartz cell (Micro cell 50 µL 10 mm Path UV; Agilent Technologies, CA), and CD spectra were measured in the range of 220–320 nm using a J-820 spectropolarimeter (JASCO, Tokyo, Japan) with an optical path length of 10 mm at 20 °C.
To search for nuclear localization signals, the amino acid sequences of all target proteins, including their isoforms, were obtained from UniProt (https://www.uniprot.org). The obtained sequences were subjected to NLS prediction using the web tools NLSdb (https://rostlab.org/services/nlsdb/) and cNLS Mapper (http://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi). Prediction by cNLS Mapper was carried out with a cut-off score of 4.0 over the entire region of each protein sequence.
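The PQS extraction step described above can be approximated offline with a regular expression. The following is a minimal sketch and not a reimplementation of QGRS Mapper: it assumes the search criterion means four runs of at least two guanines separated by loops of 1 to 7 nucleotides, it does not compute a G-score, and the names PQS_PATTERN, reverse_complement and find_pqs are hypothetical helpers introduced only for illustration.

```python
import re

# Assumption: the criterion is read as four G-runs of length >= 2,
# separated by three loops of 1 to 7 nucleotides of any base.
# The lookahead wrapper allows overlapping candidates to be reported.
PQS_PATTERN = re.compile(r"(?=(G{2,}(?:[ACGT]{1,7}?G{2,}){3}))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pqs(seq):
    """Return (start, sequence) pairs for candidate PQSs on the given strand.

    Start positions are 0-based offsets into the input string.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in PQS_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from the data set.
    region = "TTGGAGGTGGCGGTT"
    for start, candidate in find_pqs(region):
        print(start, candidate)
    # G-rich motifs on the opposite strand appear as C-rich runs,
    # so the reverse complement is scanned as well.
    for start, candidate in find_pqs(reverse_complement(region)):
        print("minus strand:", start, candidate)
```

Because the loops are matched lazily and each start position yields at most one candidate, this sketch reports fewer alternatives than a tool that enumerates every overlapping motif, so its output should be treated as a rough pre-screen rather than a substitute for the QGRS Mapper results in this data set.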