Datasets and codes for repeats annotation in genome

Published: 23-05-2018 | Version 1 | DOI: 10.17632/k88h5xnhcb.1
Contributors:
Lu Zeng,
Dan Kortschak,
Joy Raison,
Terry Bertozzi,
David Adelson

Description

1) Files beginning with 'all_retrovirus.' in 'report_run' are the data we used to identify retroviruses (see details in Supplementary 1.5.3).
2) Files beginning with 'GB_TE.new' in 'GBTE_data' are the index files we used to identify reverse transcriptase and TE sequences from NCBI (see details in Supplementary 1.5.3).
3) 'report_run' contains the code used to run reportsJ.pl (see details in Supplementary 1.5.3).
4) Files beginning with 'sprot.' in 'report_run' are the index files we used to identify proteins (see details in Supplementary 1.5.3).
5) 'Vertebrate_use.fa' contains the vertebrate repeat consensus sequences downloaded from Repbase; we used it as the CENSOR library (see details in Supplementary 1.5.1).
6) 'our_known_reps_20130520' was used in the first CENSOR run (see details in Supplementary 1.5.1).
7) 'RepBase20.04.fasta' was used in the last step of TE annotation and contains the CENSOR TE references.
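
Several of the files listed here ('Vertebrate_use.fa', 'RepBase20.04.fasta') appear from their names to be FASTA libraries. As a quick sanity check after download, a minimal Python sketch such as the following could count the sequences in such a file. It assumes standard FASTA layout (header lines starting with '>'), which this record does not state explicitly; the function names read_fasta and summarize are hypothetical helpers and are not part of the deposited code.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Assumes standard FASTA: each record starts with a '>' header line,
    followed by one or more sequence lines.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Return basic counts for a FASTA library (hypothetical helper)."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return {
        "n_sequences": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
    }


# Small in-memory example; for a real file, pass an open file handle instead,
# e.g. `with open("Vertebrate_use.fa") as handle: summarize(handle)`.
example = [">rep1 example", "ACGT", "AC", ">rep2", "GG"]
print(summarize(example))
```

Because the reader works on any iterable of lines, the same function can be applied to an open file handle without loading the whole library into memory first.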

Files