iSTOP datasets (Billon, Bryant et al., Molecular Cell, 2017)

Published: 30 July 2017 | Version 1 | DOI: 10.17632/xbdtvf6bvj.1
Alberto Ciccia


iSTOP datasets for 8 eukaryotic species (H. sapiens, M. musculus, R. norvegicus, D. rerio, C. elegans, D. melanogaster, A. thaliana and S. cerevisiae). The species name and genome assembly ID are given in each file name. Each row represents a targetable genomic coordinate within a gene. All ORFs were validated to have start and stop codons, an appropriate sequence length, and no internal stop codons. Columns are defined as follows:

gene – a single gene name
chr – chromosome name
strand – the strand of the targeted base in the coding sequence
genome_coord – the genomic coordinate of the targeted base
codon – the targeted codon
n_isoforms – the number of isoforms considered for the gene
percent_isoforms – the percentage of the gene’s isoforms that are targeted at this coordinate
percent_NMD – the percentage of isoforms predicted to undergo nonsense-mediated decay (NMD), i.e. isoforms whose coding sequence is targeted at least 55 bases upstream of the final exon-exon junction
rel_pos_largest_isoform – the relative position of the targeted site in the largest isoform targeted at this coordinate (0 = beginning of the coding sequence, 1 = end of the coding sequence)
no_upstream_G – TRUE if the base immediately 5’ of the targeted C is not a G
RFLP_Loss – restriction enzymes that uniquely cut the genomic sequence within +/- 50 bases of the targeted base before editing (the site is lost upon editing)
RFLP_Gain – restriction enzymes that uniquely cut the genomic sequence within +/- 50 bases of the targeted base after editing (the site is gained upon editing)
sgNGG, sgNGA, sgNGCG, sgNGAG, sgNNGRRT, sgNNNRRT – the 20 bp guide sequence for the corresponding PAM (the targeted C is lowercase)
sgNGG_off_targets, sgNGA_off_targets, sgNGCG_off_targets, sgNGAG_off_targets, sgNNGRRT_off_targets, sgNNNRRT_off_targets – the number of off-target locations in the genome, determined by searching for matching sequences with up to two mismatches allowed in the first 8 bases of the guide sequence
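
The percent_NMD rule can be illustrated with a minimal sketch. It assumes that positions are 0-based coordinates in the same spliced transcript, that the 55-base threshold is inclusive, and that the percentage is taken over whichever isoforms the caller supplies; the released pipeline may differ on each point, and predicts_nmd and percent_nmd are hypothetical names, not part of the published code.

```python
from typing import Optional, Sequence

# Distance (in bases) taken from the percent_NMD column description.
NMD_DISTANCE = 55


def predicts_nmd(stop_pos: int, last_junction_pos: Optional[int],
                 distance: int = NMD_DISTANCE) -> bool:
    """Return True if a stop codon created at stop_pos is predicted to trigger NMD.

    Assumption: both positions are 0-based coordinates in the same spliced
    transcript. last_junction_pos is None for single-exon isoforms, which have
    no exon-exon junction and are therefore never flagged. The inclusive
    comparison (>=) is also an assumption.
    """
    if last_junction_pos is None:
        return False
    return last_junction_pos - stop_pos >= distance


def percent_nmd(calls: Sequence[bool]) -> float:
    """Percentage of the supplied isoforms predicted to undergo NMD."""
    if not calls:
        return 0.0
    return 100.0 * sum(calls) / len(calls)


if __name__ == "__main__":
    # Invented toy isoforms: (position of the new stop, position of the final junction).
    isoforms = [(100, 400), (380, 400), (120, None)]
    calls = [predicts_nmd(stop, junction) for stop, junction in isoforms]
    print(calls, percent_nmd(calls))
```

Under these assumptions only the first toy isoform is flagged: its new stop lies 300 bases upstream of the final junction, while the second lies only 20 bases upstream and the third has no junction at all.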

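The column layout lends itself to simple filtering. The sketch below assumes each file is comma-separated with a header row that uses the column names above, that missing cells are written as an empty string or NA, and that percent_NMD runs from 0 to 100; none of these details are stated in this description, and select_targets and targeted_index are hypothetical helpers. It keeps rows that have a guide for the chosen PAM, full predicted NMD and no G immediately 5’ of the target, and reports where the lowercase target C sits in the guide.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Assumption: how empty cells are encoded in the files.
MISSING = {"", "NA"}


def targeted_index(guide: str) -> int:
    """Return the 0-based index of the lowercase (targeted) C in a guide."""
    hits = [i for i, base in enumerate(guide) if base == "c"]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one lowercase 'c' in {guide!r}")
    return hits[0]


def select_targets(stream: TextIO, pam: str = "NGG",
                   min_percent_nmd: float = 100.0) -> Iterator[Dict[str, object]]:
    """Yield rows that have a guide for `pam`, enough NMD and no upstream G."""
    guide_column = "sg" + pam  # e.g. sgNGG, sgNGA, sgNNGRRT
    for row in csv.DictReader(stream):  # assumption: comma-separated with a header
        guide = row.get(guide_column) or ""
        nmd = row.get("percent_NMD") or ""
        if guide in MISSING or nmd in MISSING:
            continue  # no guide for this PAM, or no NMD value
        if float(nmd) < min_percent_nmd:
            continue
        if (row.get("no_upstream_G") or "").upper() != "TRUE":
            continue  # a G sits immediately 5' of the targeted C
        yield {
            "gene": row["gene"],
            "chr": row["chr"],
            "genome_coord": int(row["genome_coord"]),
            "guide": guide,
            "target_index": targeted_index(guide),
        }


if __name__ == "__main__":
    # Invented toy rows for illustration only; not taken from the datasets.
    toy = (
        "gene,chr,strand,genome_coord,percent_NMD,no_upstream_G,sgNGG\n"
        "GENEA,chr1,+,1000,100,TRUE,TTAGGAcAATCGTAGCTAGT\n"
        "GENEB,chr2,+,2000,50,TRUE,AAAAcAAAAAAAAAAAAAAA\n"
    )
    for hit in select_targets(io.StringIO(toy)):
        print(hit)
```

Reading rows lazily through csv.DictReader keeps memory use flat, which may matter for the larger genomes; passing pam="NNGRRT" (or another PAM from the list above) switches the guide column under the same assumptions.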

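A minimal sketch of the off-target matching criterion, as read from the column description: the last 12 bases of a 20-base guide must match exactly, and at most two mismatches are tolerated in the first 8. It assumes a plain-string genome, scans both strands, ignores the PAM, and counts every matching site, including the intended target; how the published pipeline handles these details is not stated here, and matches_rule, count_sites and reverse_complement are hypothetical names.

```python
# Complement table for uppercase DNA; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (case-folded to uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches_rule(guide: str, site: str, head: int = 8,
                 max_head_mismatches: int = 2) -> bool:
    """True if `site` matches `guide` under the assumed off-target rule.

    The guide is upper-cased so the lowercase targeted C compares normally.
    Bases after the first `head` must match exactly; the first `head` bases
    may carry up to `max_head_mismatches` mismatches.
    """
    guide, site = guide.upper(), site.upper()
    if len(site) != len(guide):
        return False
    if guide[head:] != site[head:]:
        return False
    head_mismatches = sum(g != s for g, s in zip(guide[:head], site[:head]))
    return head_mismatches <= max_head_mismatches


def count_sites(guide: str, sequence: str) -> int:
    """Count windows on both strands of `sequence` that satisfy matches_rule.

    Simplifications: no PAM check, and the on-target site is counted too.
    """
    n = len(guide)
    total = 0
    for strand in (sequence.upper(), reverse_complement(sequence)):
        total += sum(matches_rule(guide, strand[i:i + n])
                     for i in range(len(strand) - n + 1))
    return total
```

To turn count_sites into an off-target count under these assumptions, subtract the on-target site when the guide’s own locus lies within the scanned sequence; a genome-scale search would normally use a dedicated aligner rather than this linear scan.
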
Steps to reproduce

Instructions for reproducing all computational analyses are available on GitHub (


Biological Sciences, Genetic Engineering