Dataset 1 - Protein Libraries Of Seven Databases From Cnidaria Omics Data After Duplicates Removal
Description
Non-duplicated protein libraries from seven databases of Cnidaria:
Db1 – 6 proteomes derived from sequenced genomes of Anthozoa
Db2 – 2 proteomes derived from sequenced genomes of Medusozoa
Db3 – 46 whole body/non-specific transcriptomes of Anthozoa
Db4 – 24 whole body/non-specific transcriptomes of Medusozoa
Db5 – 25 transcriptomes specific to the tentacles of Anthozoa
Db6 – 7 transcriptomes specific to the tentacles of Medusozoa
Db7 – 2 transcriptomes specific to the nematocysts of Anthozoa
Steps to reproduce
1. A total of 8 proteomes derived from genomic data were obtained from the UniProt Proteome Database (https://www.uniprot.org/), and 104 transcriptomes were collected from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/).
2. The transcriptomes were translated into candidate protein sequences with TransDecoder v5.7.1 (https://github.com/TransDecoder/TransDecoder), using a minimum open reading frame length of 50 amino acids.
3. Seven protein databases were constructed, categorized by species and tissue type (whole body/non-specific, tentacles and nematocysts) for both Anthozoa and Medusozoa.
4. Duplicates were removed from each database with SeqKit v2.6.1 (https://bioinf.shenwei.me/seqkit/download/).
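The duplicate-removal step can be illustrated with a minimal Python sketch. It is not the SeqKit implementation and not necessarily the exact procedure used for this dataset: it assumes duplicates are defined as records with identical amino acid sequences (the record does not say whether duplicates were matched by sequence or by identifier), and it keeps the first occurrence of each sequence. The function names `read_fasta` and `remove_duplicate_sequences` are hypothetical and chosen only for this example.

```python
from typing import Iterable, Iterator, List, Tuple

FastaRecord = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def remove_duplicate_sequences(records: Iterable[FastaRecord]) -> List[FastaRecord]:
    """Keep the first record for each distinct sequence (assumed criterion)."""
    seen = set()
    unique: List[FastaRecord] = []
    for header, sequence in records:
        if sequence in seen:
            continue  # identical sequence already kept
        seen.add(sequence)
        unique.append((header, sequence))
    return unique


if __name__ == "__main__":
    example = [
        ">protein_a", "MKT", "LLV",
        ">protein_b", "MKTLLV",
        ">protein_c", "MAA",
    ]
    for header, sequence in remove_duplicate_sequences(read_fasta(example)):
        print(f">{header}\n{sequence}")
```

In this toy input, `protein_b` has the same sequence as `protein_a` once the wrapped lines are joined, so only `protein_a` and `protein_c` are kept. Matching is exact and case-sensitive here; a different criterion (for example, matching by identifier) would produce a different library.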