Table S6. Inclusion of 43583 human exons in GTEx V6p consortium data.
We were interested in the percent inclusion of repeat-derived exons in human tissues. Repeat-derived exons were defined as any exon for which either the 5' or the 3' splice site is derived from an Alu or a LINE repeat. We quantified these exons and all other exons of the same genes in the V6p data of the GTEx consortium. In total, we covered 43583 exons across 52 tissues and sub-tissues. Positions refer to hg19 and are 0-based.
Steps to reproduce
We retrieved the positions of all exons of genes with a LINE- or Alu-derived exon from the UCSC Table Browser (hg19). For each of these exons, we identified junction-spanning reads within a 2 nt grace window around the splice site and used them to identify the 5' and 3' splice sites of the upstream and downstream exons. We identified internal exons by restricting the data to exons with both upstream and downstream junctions. We allowed only a single exon inclusion isoform across tissues (i.e. identical flanking exons) and chose the isoform with more junction reads. To ensure that sequencing depth and gene expression were sufficient to calculate exon inclusion, we only used exons with at least 200 reads across the 8,555 samples (average of upstream and downstream junctions, or skipping junctions). We calculated the percent spliced in as PSI = 50*(upstream + downstream junctions) / (skipping junction + 0.5*(upstream + downstream junctions)), and inclusion within each tissue as the average of all samples. If an exon was absent in a given tissue, as judged by the absence of any junction-spanning read and of any read for the skipping junction, it was treated as 'data not available' for that tissue.
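The PSI calculation and the per-tissue averaging described above can be sketched in Python as follows. This is a minimal illustration, not the code used to generate the table. The function names (psi, tissue_inclusion, passes_depth_filter) and the input layout (one tuple of upstream, downstream and skipping junction read counts per sample) are hypothetical. Two points are assumptions: samples without any informative read are skipped when averaging within a tissue, and the 200-read threshold is applied to the summed counts across samples, passing if either the averaged inclusion junctions or the skipping junction reach it.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# One sample: (upstream junction reads, downstream junction reads, skipping junction reads)
Counts = Tuple[int, int, int]

MIN_READS = 200  # depth threshold stated in the text


def psi(upstream: int, downstream: int, skipping: int) -> Optional[float]:
    """Percent spliced in for one sample, or None if no informative read exists."""
    denominator = skipping + 0.5 * (upstream + downstream)
    if denominator == 0:
        return None
    return 50.0 * (upstream + downstream) / denominator


def tissue_inclusion(samples: Sequence[Counts]) -> Optional[float]:
    """Average PSI over the samples of one tissue.

    Returns None ('data not available') when no sample has any inclusion or
    skipping junction read. Skipping read-free samples is an assumption.
    """
    values = [psi(u, d, s) for u, d, s in samples]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


def passes_depth_filter(samples: Sequence[Counts]) -> bool:
    """One possible reading of the 200-read filter (assumption): summed over
    all samples, either the average of the upstream and downstream junction
    reads or the skipping junction reads must reach MIN_READS."""
    inclusion = sum(0.5 * (u + d) for u, d, _ in samples)
    skipping = sum(s for _, _, s in samples)
    return inclusion >= MIN_READS or skipping >= MIN_READS
```

For example, a sample with 10 upstream, 10 downstream and 10 skipping reads gives PSI = 50*20 / (10 + 10) = 50, since the two inclusion junctions are averaged before being compared with the single skipping junction.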