Dataset 1 - Protein Libraries Of Seven Databases From Cnidaria Omics Data After Duplicates Removal

Name: Dataset 1 - Protein Libraries Of Seven Databases From Cnidaria Omics Data After Duplicates Removal
Creator: Alexandre Barroso
Published: 2024-12-06T22:05:18.441Z
Keywords: Peptides, Biodiscovery, Omics, Antimicrobial

Barroso, Alexandre; Agüero-Chapin, Guillermin; Sousa, Rita; Marrero-Ponce, Yovani; Antunes, Agostinho

doi:10.17632/grwy638mtr.1

Published: 6 December 2024 | Version 1 | DOI: 10.17632/grwy638mtr.1

Description

Non-duplicated protein libraries from seven databases of Cnidaria:

Db1: 6 proteomes derived from sequenced genomes of Anthozoa
Db2: 2 proteomes derived from sequenced genomes of Medusozoa
Db3: 46 whole body/non-specific transcriptomes of Anthozoa
Db4: 24 whole body/non-specific transcriptomes of Medusozoa
Db5: 25 transcriptomes specific to the tentacles of Anthozoa
Db6: 7 transcriptomes specific to the tentacles of Medusozoa
Db7: 2 transcriptomes specific to the nematocysts of Anthozoa
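As a quick consistency check, the per-database counts above can be tallied and compared with the totals given under Steps to reproduce (8 genome-derived proteomes and 104 transcriptomes). The Python sketch below simply encodes the listed composition. The tuple layout and the `DATABASES` and `totals_by` names are illustrative choices, not part of the dataset.

```python
from collections import Counter

# Composition of the seven databases as listed in the description:
# (input data type, clade, tissue category, number of source datasets).
# Tissue is None for genome-derived proteomes, which are not tissue-specific.
DATABASES = {
    "Db1": ("proteome", "Anthozoa", None, 6),
    "Db2": ("proteome", "Medusozoa", None, 2),
    "Db3": ("transcriptome", "Anthozoa", "whole body/non-specific", 46),
    "Db4": ("transcriptome", "Medusozoa", "whole body/non-specific", 24),
    "Db5": ("transcriptome", "Anthozoa", "tentacles", 25),
    "Db6": ("transcriptome", "Medusozoa", "tentacles", 7),
    "Db7": ("transcriptome", "Anthozoa", "nematocysts", 2),
}


def totals_by(field_index: int) -> Counter:
    """Sum the number of source datasets grouped by one tuple field."""
    counts: Counter = Counter()
    for record in DATABASES.values():
        counts[record[field_index]] += record[3]
    return counts


if __name__ == "__main__":
    by_type = totals_by(0)
    print(by_type["proteome"], by_type["transcriptome"])  # 8 104
    by_clade = totals_by(1)
    print(by_clade["Anthozoa"], by_clade["Medusozoa"])  # 79 33
```

The two totals by data type match the 8 proteomes and 104 transcriptomes reported for data collection.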

Steps to reproduce

1. A total of 8 proteomes derived from genomic data were obtained from the UniProt Proteome Database (https://www.uniprot.org/), and 104 transcriptomes were collected from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/).
2. Open reading frames were predicted from the transcriptomes with TransDecoder v5.7.1 (https://github.com/TransDecoder/TransDecoder), using a minimum open reading frame length of 50 amino acids.
3. Seven protein databases were constructed, categorized by taxon and tissue type (whole body/non-specific, tentacles, and nematocysts) for both Anthozoa and Medusozoa.
4. Duplicate sequences were removed with SeqKit v2.6.1 (https://bioinf.shenwei.me/seqkit/download/).
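Steps 2 and 4 rely on TransDecoder and SeqKit. With those tools the corresponding commands would presumably resemble `TransDecoder.LongOrfs -t transcripts.fasta -m 50` (where `-m` sets the minimum protein length) and `seqkit rmdup -s` (assuming duplicates were defined by identical sequence), although the exact options used are not stated here. To make the two filters concrete, the Python sketch below reimplements simplified versions of them. It scans only the forward strand for complete ORFs (start codon to stop codon) of at least 50 amino acids, without TransDecoder's reverse-strand search, partial-ORF handling, or coding-potential scoring, and it keeps only the first record for each distinct protein sequence. All function names are hypothetical.

```python
MIN_ORF_AA = 50  # minimum ORF length (amino acids) stated in step 2

# Standard genetic code, built from the canonical TCAG codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(nt: str) -> str:
    """Translate a nucleotide string codon by codon; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def forward_orfs(nt: str, min_aa: int = MIN_ORF_AA) -> list[str]:
    """Return complete forward-strand ORFs (Met up to a stop) of >= min_aa residues.

    Simplified stand-in for TransDecoder: only the three forward frames are
    scanned, and only ORFs terminated by a stop codon are reported.
    """
    nt = nt.upper()
    proteins = []
    for frame in range(3):
        peptide = translate(nt[frame:])
        # Every segment except the last one is followed by a stop codon.
        for segment in peptide.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start >= min_aa:
                proteins.append(segment[start:])
    return proteins


def dedupe_by_sequence(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (name, sequence) record for each distinct sequence."""
    seen: set[str] = set()
    unique = []
    for name, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((name, seq))
    return unique


if __name__ == "__main__":
    # Hypothetical transcript: ATG + 59 GCT codons + TAA encodes a 60-aa ORF.
    orfs = forward_orfs("ATG" + "GCT" * 59 + "TAA")
    print(len(orfs), len(orfs[0]))  # 1 60
    records = [("orf1", orfs[0]), ("orf1_copy", orfs[0]), ("other", "MKV")]
    print([name for name, _ in dedupe_by_sequence(records)])  # ['orf1', 'other']
```

Raising `min_aa` shows the trade-off behind the 50-amino-acid threshold: shorter cutoffs retain more short peptides at the cost of more spurious ORFs.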

Institutions

Universidade do Porto Centro Interdisciplinar de Investigacao Marinha e Ambiental, Universidad San Francisco de Quito Colegio de Ciencias de la Salud, Universidad Panamericana Aguascalientes Facultad de Ingenieria, Universidade do Porto Faculdade de Ciencias
