iSTOP datasets (Billon, Bryant et al, Molecular Cell, 2017)

Published: 30 Jul 2017 | Version 1 | DOI: 10.17632/xbdtvf6bvj.1

Description of this data

iSTOP datasets in 8 eukaryotic species (H. sapiens, M. musculus, R. norvegicus, D. rerio, C. elegans, D. melanogaster, A. thaliana and S. cerevisiae).

Species names and genome assembly IDs are specified in each file name. Each row represents a targetable genomic coordinate within a gene. All ORFs were validated to have start and stop codons, an appropriate sequence length, and no internal stop codons. Columns are defined as follows:

  • gene: a single gene name
  • chr: chromosome name
  • strand: the strand of the targeted base in the coding sequence
  • genome_coord: the genomic coordinate of the targeted base
  • codon: the targeted codon
  • n_isoforms: the number of isoforms considered for the gene
  • percent_isoforms: the percentage of the gene’s isoforms that are targeted at this coordinate
  • percent_NMD: the percentage of isoforms predicted to undergo nonsense-mediated decay (NMD), i.e. isoforms whose coding sequence is targeted at least 55 bases upstream of the final exon-exon junction
  • rel_pos_largest_isoform: the relative position of the targeted site within the largest isoform targeted at this coordinate (0 = beginning of the coding sequence, 1 = end of the coding sequence)
  • no_upstream_G: TRUE indicates that the base immediately 5’ of the targeted C is not a G
  • RFLP_Loss: restriction enzymes that uniquely cut the genomic sequence within ±50 bases of the targeted base before editing
  • RFLP_Gain: restriction enzymes that uniquely cut the genomic sequence within ±50 bases of the targeted base after editing
  • sgNGG, sgNGA, sgNGCG, sgNGAG, sgNNGRRT, sgNNNRRT: the 20 bp guide sequence for the corresponding PAM (the targeted C is lowercase)
  • sgNGG_off_targets, sgNGA_off_targets, sgNGCG_off_targets, sgNGAG_off_targets, sgNNGRRT_off_targets, sgNNNRRT_off_targets: the number of off-target locations in the genome, determined by searching for matching sequences with up to two mismatches allowed in the first 8 bases of the guide sequence
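The ORF validation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard genetic code (stop codons TAA, TAG, TGA) and an ATG start codon, and it reads "appropriate sequence length" as a whole number of codons. The function name `is_valid_orf` is hypothetical.

```python
# Minimal ORF sanity check. Assumptions: standard genetic code, ATG start,
# and "appropriate sequence length" meaning a whole number of codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_valid_orf(cds: str) -> bool:
    """Return True if `cds` starts with ATG, ends with a stop codon,
    has a length that is a multiple of 3, and has no internal stop."""
    cds = cds.upper()
    # Need at least a start and a stop codon, and whole codons only.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop codon before the final codon is an internal stop.
    return not any(codon in STOP_CODONS for codon in codons[:-1])


print(is_valid_orf("ATGAAATAA"))     # True
print(is_valid_orf("ATGTAAAAATAA"))  # False: internal TAA
```

Real annotations can include alternative start codons or non-standard genetic codes, which this sketch deliberately ignores.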
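To make the isoform-level columns concrete, the sketch below derives values analogous to `percent_isoforms`, `percent_NMD` and `rel_pos_largest_isoform` from per-isoform summaries. The `Isoform` record and its fields are hypothetical, and so are several modelling choices: the "at least 55 bases" threshold, using all of the gene's isoforms as the denominator for `percent_NMD`, and taking "largest" to mean the longest coding sequence. Treat it as an illustration of the definitions, not the published method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed threshold: NMD is predicted when the introduced stop lies at
# least this many bases upstream of the final exon-exon junction.
NMD_DISTANCE = 55


@dataclass
class Isoform:
    """Hypothetical per-isoform summary. Positions are 0-based offsets from
    the first base of the CDS in the spliced transcript."""
    cds_length: int
    target_offset: Optional[int]         # None if this isoform is not targeted
    last_junction_offset: Optional[int]  # None for single-exon transcripts


def predicts_nmd(iso: Isoform) -> bool:
    if iso.target_offset is None or iso.last_junction_offset is None:
        return False
    return iso.last_junction_offset - iso.target_offset >= NMD_DISTANCE


def summarize(isoforms: List[Isoform]) -> Tuple[float, float, float]:
    """Return (percent_isoforms, percent_NMD, rel_pos_largest_isoform)."""
    targeted = [iso for iso in isoforms if iso.target_offset is not None]
    percent_isoforms = 100.0 * len(targeted) / len(isoforms)
    percent_nmd = 100.0 * sum(predicts_nmd(iso) for iso in isoforms) / len(isoforms)
    # Relative position (0 = CDS start, 1 = CDS end) in the longest targeted isoform.
    largest = max(targeted, key=lambda iso: iso.cds_length)
    rel_pos = largest.target_offset / largest.cds_length
    return percent_isoforms, percent_nmd, rel_pos


isoforms = [
    Isoform(cds_length=300, target_offset=30, last_junction_offset=200),    # NMD predicted
    Isoform(cds_length=600, target_offset=450, last_junction_offset=480),   # too close to junction
    Isoform(cds_length=240, target_offset=None, last_junction_offset=100),  # not targeted
]
print(summarize(isoforms))  # approximately (66.67, 33.33, 0.75)
```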
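The `RFLP_Loss` and `RFLP_Gain` columns can be illustrated with a small before/after comparison of a C→T edit. This sketch makes three assumptions: "uniquely cut" means exactly one recognition site in the window; a loss is one site before editing and none after (a gain is the reverse); and only five real enzymes with palindromic recognition sites are checked, so scanning one strand is enough. The real columns are computed over ±50 bases (101 bp) of genomic sequence; the short windows and the function names below are hypothetical.

```python
# Illustrative subset of real restriction enzymes whose recognition sites
# are palindromic, so a single-strand scan also covers the other strand.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
    "MseI": "TTAA",
}


def count_occurrences(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of `site` in `seq`."""
    k = len(site)
    return sum(seq[i:i + k] == site for i in range(len(seq) - k + 1))


def rflp_changes(window: str, target_index: int):
    """Return (lost, gained) enzyme names for a C->T edit at `target_index`.
    Assumption: 'lost' = exactly one site before editing and none after;
    'gained' = none before and exactly one after."""
    before = window.upper()
    if before[target_index] != "C":
        raise ValueError("expected the targeted base to be C")
    after = before[:target_index] + "T" + before[target_index + 1:]
    lost, gained = [], []
    for name, site in ENZYMES.items():
        n_before = count_occurrences(before, site)
        n_after = count_occurrences(after, site)
        if n_before == 1 and n_after == 0:
            lost.append(name)
        elif n_before == 0 and n_after == 1:
            gained.append(name)
    return lost, gained


print(rflp_changes("GGGTCGAGGG", 4))  # (['TaqI'], [])
print(rflp_changes("GGGCTAAGGG", 3))  # ([], ['MseI'])
```

A full implementation would need a complete enzyme list and a reverse-complement scan for non-palindromic sites.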
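One reading of the off-target rule above: a genomic site counts if it differs from the guide by at most two bases within the first 8 positions and matches exactly over the remaining bases. The sketch below implements that reading on a toy sequence. It is an assumption-laden illustration, not the search used to build the dataset: it scans one strand only, ignores the PAM, and counts the on-target site among the matches, so one is subtracted to get off-targets. The function names and the toy sequences are hypothetical.

```python
SEED_LEN = 8             # mismatches are tolerated only in the first 8 bases
MAX_SEED_MISMATCHES = 2


def site_matches(guide: str, site: str) -> bool:
    """True if `site` differs from `guide` by at most 2 bases within the
    first 8 positions and matches exactly everywhere else."""
    if len(site) != len(guide):
        return False
    seed_mismatches = sum(a != b for a, b in zip(guide[:SEED_LEN], site[:SEED_LEN]))
    return seed_mismatches <= MAX_SEED_MISMATCHES and guide[SEED_LEN:] == site[SEED_LEN:]


def count_matching_sites(guide: str, genome: str) -> int:
    """Count matching windows on the given strand only (no PAM check)."""
    # Uppercase both: the targeted C is lowercase in the dataset's guides.
    guide, genome = guide.upper(), genome.upper()
    n = len(guide)
    return sum(site_matches(guide, genome[i:i + n]) for i in range(len(genome) - n + 1))


guide = "ACGTACGTGGGGcCCCTTTT"
genome = "NNNNN".join([
    "ACGTACGTGGGGCCCCTTTT",  # on-target site
    "TTGTACGTGGGGCCCCTTTT",  # 2 seed mismatches: counted
    "TTTTACGTGGGGCCCCTTTT",  # 3 seed mismatches: not counted
    "ACGTACGTGGGGCCCCTTTA",  # mismatch outside the seed: not counted
])
hits = count_matching_sites(guide, genome)
print(hits - 1)  # off-targets, excluding the on-target site -> 1
```

A brute-force scan like this is fine for illustration but far too slow for whole genomes, where an index-based aligner would be used instead.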
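As a usage sketch, the columns can be combined to pick stringent candidates with Python's standard `csv` module. This assumes a comma-separated file with a header row using the column names above, `TRUE`/`FALSE` values in `no_upstream_G`, and empty or `NA` cells where no guide exists. The filter thresholds, the function names, and the two sample rows are hypothetical and not taken from the dataset.

```python
import csv
import io

# Values treated as missing; assumption: empty cells or R-style NA.
MISSING = {"", "NA"}


def stringent_ngg_targets(rows):
    """Yield rows that target every isoform, are predicted to cause NMD in
    every isoform, lack an upstream G, and have an NGG guide with zero
    reported off-targets. Thresholds are illustrative only."""
    for row in rows:
        guide = row["sgNGG"].strip()
        off_targets = row["sgNGG_off_targets"].strip()
        if guide in MISSING or off_targets in MISSING:
            continue
        if (float(row["percent_isoforms"]) == 100
                and float(row["percent_NMD"]) == 100
                and row["no_upstream_G"].strip().upper() == "TRUE"
                and int(off_targets) == 0):
            yield row


def targeted_c_index(guide):
    """0-based index of the lowercase (targeted) C in a guide, or None."""
    for i, base in enumerate(guide):
        if base.islower():
            return i
    return None


# Hypothetical two-row excerpt with a subset of the columns; not real data.
SAMPLE = """gene,chr,strand,genome_coord,codon,percent_isoforms,percent_NMD,no_upstream_G,sgNGG,sgNGG_off_targets
GENE1,chr1,+,1000,CAA,100,100,TRUE,ACGTACGTACGTAcAGTACG,0
GENE2,chr2,-,2000,TGG,50,0,FALSE,NA,NA
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in stringent_ngg_targets(rows):
    print(row["gene"], row["sgNGG"], targeted_c_index(row["sgNGG"]))
# GENE1 ACGTACGTACGTAcAGTACG 13
```

For a real file, replace `io.StringIO(SAMPLE)` with an opened file handle (`open(path, newline="")`).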


Steps to reproduce

Instructions for reproducing all computational analyses are available on GitHub (

This data is associated with the following publication:

CRISPR-Mediated Base Editing Enables Efficient Disruption of Eukaryotic Genes through Induction of STOP Codons

Published in: Molecular Cell


Cite this dataset

Ciccia, Alberto (2017), “iSTOP datasets (Billon, Bryant et al, Molecular Cell, 2017)”, Mendeley Data, v1, doi: 10.17632/xbdtvf6bvj.1




Categories: Biological Sciences, Genetic Engineering


Licence: CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?

You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.