Comparison of L1 insertions in hg19 with six other mammalian genomes

Published: 21-04-2018 | Version 1 | DOI: 10.17632/56sxpgs4d9.1
Contributor: Jan Attig

Description

The table contains the genomic position and the result of a cross-species comparison for all RepeatMasker-annotated L1 elements in hg19. The file is gzip-compressed. The first eleven columns correspond to the information provided by RepeatMasker. The next four columns provide the annotation of each L1 as used in the associated publication. The remaining columns give the result of the liftover of each L1 to the respective genome (hg38, gorGor5, rheMac8, mm10, rn6, bosTau8, canFam3), each as one of four categories: absent, degenerate, notLINE, or present. 'degenerate' means the L1 was lifted to the genome but was heavily truncated (size less than 33% of the human element). 'notLINE' means the L1 was lifted, but the liftover position is not recognised as a LINE repeat by RepeatMasker in the target genome.
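As a quick sanity check of the column layout described above, the liftover categories can be tallied per genome. The following is a minimal sketch using only the Python standard library. It assumes the file is tab-delimited and that the seven liftover columns are the last seven columns of each row; neither the delimiter nor the presence of a header line is stated in this description, and the function name is hypothetical.

```python
import csv
import gzip
from collections import Counter

# The four categories named in the description.
CATEGORIES = {"absent", "degenerate", "notLINE", "present"}


def tally_liftover_categories(path, n_genomes=7, delimiter="\t"):
    """Count liftover categories in the last `n_genomes` columns.

    Assumptions (not confirmed by the dataset description): the table is
    tab-delimited and the liftover columns are the trailing columns.
    Rows whose trailing columns are not all valid categories (for
    example a header line) are skipped.
    """
    counts = [Counter() for _ in range(n_genomes)]
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            calls = row[-n_genomes:]
            if len(row) < n_genomes or not all(c in CATEGORIES for c in calls):
                continue
            for counter, call in zip(counts, calls):
                counter[call] += 1
    return counts
```

Calling `tally_liftover_categories(path)` with the path to the downloaded file returns one `Counter` per liftover column, in column order. If the real file uses a different delimiter, pass it through the `delimiter` argument.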

Steps to reproduce

We tested for the presence of orthologous positions with the UCSC Genome Browser LiftOver tool, using the respective all-chain BLASTZ files. Human L1 elements from the hg19 RepeatMasker annotation were first lifted to hg38. We then tested for the presence of each L1 element by retrieving the orthologous genomic loci in the genomes of rhesus macaque (rheMac8), gorilla (gorGor5), mouse (mm10), rat (rn6), dog (canFam3) and cow (bosTau8).

To curate the LiftOver results and safeguard against misannotation caused by errors in the genome lift, we checked for every liftover position whether the element overlaps a LINE annotated by RepeatMasker in the respective genome, and refer to an element as ‘present’ in a species only if at least 33% of the lifted genomic position is LINE-derived according to RepeatMasker. All other elements are ‘notLINE’ if they were not identified by RepeatMasker, ‘degenerate’ if LiftOver reported them as ‘partially-deleted’, or ‘absent’ if LiftOver reported them as ‘deleted’. Elements from hg19 that were not ‘present’ in hg38 were discarded entirely.

After manual inspection of the liftover results, we converted the LiftOver annotation to phylogenetic groups as follows. We denoted elements that are ‘absent’ in all other species as human- and primate-specific. We denoted additional elements as primate-specific if they were ‘present’, ‘degenerate’ or ‘notLINE’ in at least one of the two primate species, and ‘absent’ or ‘notLINE’ in all of the others. We denoted elements as specific for the euarchontoglires branch if the element was ‘absent’ or ‘notLINE’ in the two laurasiatherian species, and ‘present’ or ‘degenerate’ in mouse or rat. The remaining elements were all lifted to at least one of the two laurasiatherian species, and were hence present in the last common ancestor of the species we surveyed. Of these, elements present in one laurasiatherian species but absent in the other were denoted as found in ‘one distant species’, and elements present in both as found in ‘two distant species’. All remaining elements were either reported as degenerate in both species, or the liftover results were ‘unclear’ (for example, if the element was lifted to many species but did not overlap with the LINE annotation in any of them).
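The decision rules above can be sketched as a small Python function. This is an illustration of the written rules only, not the script used to produce the table: the grouping was done after manual inspection, so a strict rule-based version may disagree with the published annotation in edge cases. The function names, the group label strings, the `"lifted"` value and the `line_fraction` argument are hypothetical, and the per-genome status follows the wording of this section rather than the column definitions in the Description.

```python
PRIMATES = ("gorGor5", "rheMac8")
RODENTS = ("mm10", "rn6")
LAURASIATHERIA = ("canFam3", "bosTau8")


def liftover_status(liftover_result, line_fraction=0.0):
    """Per-genome category from a LiftOver outcome.

    `liftover_result` is assumed to be 'deleted', 'partially-deleted' or
    'lifted' (a hypothetical encoding); `line_fraction` is the fraction
    of the lifted interval annotated as LINE by RepeatMasker.
    """
    if liftover_result == "deleted":
        return "absent"
    if liftover_result == "partially-deleted":
        return "degenerate"
    return "present" if line_fraction >= 0.33 else "notLINE"


def phylogenetic_group(calls):
    """Map per-genome categories (dict keyed by assembly) to a group.

    Returns None for elements that are not 'present' in hg38, which the
    text says were discarded. Label strings are illustrative.
    """
    if calls["hg38"] != "present":
        return None
    non_primates = RODENTS + LAURASIATHERIA
    if all(calls[g] == "absent" for g in PRIMATES + non_primates):
        return "human-/primate-specific"
    primate_hit = any(calls[g] != "absent" for g in PRIMATES)
    if primate_hit and all(calls[g] in ("absent", "notLINE") for g in non_primates):
        return "primate-specific"
    laur_missing = all(calls[g] in ("absent", "notLINE") for g in LAURASIATHERIA)
    rodent_hit = any(calls[g] in ("present", "degenerate") for g in RODENTS)
    if laur_missing and rodent_hit:
        return "euarchontoglires-specific"
    # Remaining elements: decide on the two laurasiatherian species only.
    laur = sorted(calls[g] for g in LAURASIATHERIA)
    if laur == ["present", "present"]:
        return "two distant species"
    if laur == ["absent", "present"]:
        return "one distant species"
    if laur == ["degenerate", "degenerate"]:
        return "degenerate in both"
    return "unclear"
```

The rules are checked in the order in which the text states them, so an element only reaches the ‘distant species’ tests after failing the primate and euarchontoglires tests. Combinations the text does not spell out, such as ‘present’ in one laurasiatherian species and ‘degenerate’ in the other, fall through to ‘unclear’ in this sketch.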