Sterechinus neumayeri v1 genome annotation
Description
Genome annotation was performed using the BRAKER3 pipeline (Gabriel et al., 2023). A repeat library was generated with RepeatModeler (Smit & Hubley, 2010). The repeat families were compared to known echinoderm protein-coding gene models using BLAST, and any repeat family with a significant hit (e-value < 5e-5) was removed. The resulting repeat library was used to identify and mask repeats with RepeatMasker prior to annotation (Smit et al., 2010). BRAKER3 was run using protein models from S. purpuratus (available from Echinobase https://download.xenbase.org/echinobase/Genomics/Spur5.0/sp5_0_GCF.gff3.gz ; last accessed 2/15/24), L. variegatus (available from Echinobase https://download.xenbase.org/echinobase/Genomics/Lvar3.0/Lvar3_0_GCF_proteins.fa.gz ; last accessed 2/15/24), and L. pictus (available from Echinobase https://download.xenbase.org/echinobase/Genomics/Lpic2.1/Lpic2_1_GCF_proteins.fa.gz ; last accessed 2/15/24), along with publicly available transcriptomic data from the NCBI Sequence Read Archive and an in-house RNA-seq dataset spanning embryogenesis. The gene models were named using the notation SNE_XXXXXX and assessed for completeness with BUSCO version 4 using the metazoan gene set (Simão et al., 2015). For gene regulatory network (GRN) gene curation, a list of GRN genes was first compiled (supplemental file X). Then, OrthoFinder2 (Emms & Kelly, 2019) was used to identify orthologs among S. neumayeri, L. variegatus, S. purpuratus, and the recently published genome annotation for P. lividus (Marlétaz et al., 2023).
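The repeat-library filtering step described above could be sketched roughly as follows. This is an illustration only, not the script used to produce this annotation. It assumes the BLAST results are in standard 12-column tabular format (outfmt 6, e-value in column 11) with repeat families as the queries; the function names and the demo sequences and hits are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

EVALUE_CUTOFF = 5e-5  # threshold stated in the description


def families_with_hits(blast_lines: Iterable[str],
                       cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Return query IDs with at least one hit whose e-value is below cutoff.

    Assumes BLAST tabular output (outfmt 6): 12 tab-separated columns,
    query ID in column 1 and e-value in column 11.
    """
    flagged: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        if float(fields[10]) < cutoff:
            flagged.add(fields[0])
    return flagged


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_repeat_library(fasta_lines: Iterable[str],
                          blast_lines: Iterable[str],
                          cutoff: float = EVALUE_CUTOFF) -> List[Tuple[str, str]]:
    """Keep only repeat families without a significant protein hit."""
    flagged = families_with_hits(blast_lines, cutoff)
    return [(header, seq) for header, seq in read_fasta(fasta_lines)
            if header.split()[0] not in flagged]


# Hypothetical demo data: two repeat families and two BLAST hits.
demo_fasta = [
    ">rnd-1_family-1#LINE/L2", "ACGTACGT",
    ">rnd-1_family-2#Unknown", "GGGTTTAA", "CCC",
]
demo_blast = [
    "rnd-1_family-2#Unknown\tprotA\t45.0\t120\t60\t2\t1\t360\t10\t129\t1e-20\t95.0",
    "rnd-1_family-1#LINE/L2\tprotB\t30.0\t40\t28\t0\t1\t120\t5\t44\t0.5\t25.0",
]
kept = filter_repeat_library(demo_fasta, demo_blast)
for header, seq in kept:
    print(f">{header}\n{seq}")
```

In this toy input, the second family has a hit below the cutoff and is dropped, while the first family's only hit (e-value 0.5) is not significant, so it is retained for masking.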
Funding
U.S. National Science Foundation
OPP-191661
U.S. National Science Foundation
OPP-2038149
U.S. National Science Foundation
OPP-2038088
U.S. National Science Foundation
OPP-2225144
U.S. National Science Foundation
OPP-1916665