Datasets and codes for repeats annotation in genome

Published: 23 May 2018 | Version 1 | DOI: 10.17632/k88h5xnhcb.1
Contributors:
Lu Zeng, Joy Raison

Description

1) Files beginning with 'all_retrovirus.' in 'report_run' are the data we used to identify retroviruses (see supplementary 1.5.3 for details).
2) Files beginning with 'GB_TE.new' in 'GBTE_data' are the index files we used to identify reverse transcriptase and TE sequences from NCBI (see supplementary 1.5.3 for details).
3) 'report_run' contains the code used to run reportsJ.pl (see supplementary 1.5.3 for details).
4) Files beginning with 'sprot.' in 'report_run' are the index files we used to identify proteins (see supplementary 1.5.3 for details).
5) 'Vertebrate_use.fa' contains the vertebrate repeat consensus sequences downloaded from Repbase; we used it as the CENSOR library (see supplementary 1.5.1 for details).
6) 'our_known_reps_20130520' was used in the first CENSOR run (see supplementary 1.5.1 for details).
7) 'RepBase20.04.fasta' was used in the last step of TE annotation and contains the CENSOR TE references.
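
Several of the files listed above, such as 'Vertebrate_use.fa' and 'RepBase20.04.fasta', are sequence libraries with FASTA-style names. As a quick sanity check after download, a library can be summarised with a few lines of Python. This is a minimal sketch only: it assumes the files are plain-text FASTA, the helper names read_fasta and summarize are hypothetical and not part of the dataset or of reportsJ.pl, and the example records are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count the records and residues in a FASTA library."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return {
        "sequences": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
    }


# Illustrative records only; these are NOT taken from the dataset.
example = [">repA hypothetical", "ACGTAC", "GT", ">repB hypothetical", "TTGA"]
print(summarize(example))
```

To apply the same check to a downloaded library, pass an open file handle instead of the example list, for instance `with open("Vertebrate_use.fa") as fh: print(summarize(fh))`, assuming the file sits in the working directory.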

Files

Institutions

The University of Adelaide

Categories

Bioinformatics

Licence