Dataset_1

Published: 1 March 2021 | Version 1 | DOI: 10.17632/df8w8dct3b.1

Description

Dataset_1 provides seven FASTA files containing protein databases. The composite database, “All_Databases_5950827_sequences.fasta”, contains protein sequences retrieved from public databases related to cephalopod salivary glands, together with proteins identified from our original data. It comprises a total of 5,950,827 protein sequences and is composed of six smaller databases, labelled A to F, which draw on the following sources:

Database_A_19087_sequences.fasta: protein database from proteogenomic analyses of the O. vulgaris salivary apparatus, built by Fingerhut et al. (2018).
Database_B_16990_sequences.fasta: antimicrobial peptides from a non-redundant database collected by Aguilera-Mendoza et al. (2015).
Database_C_2427_sequences.fasta: proteins identified with Proteome Discoverer by searching our 12 LTQ raw files against the UniProt database for the Metazoa taxonomic selection (2018_07 release).
Database_D_84778_sequences.fasta: proteins identified with TransDecoder from de novo transcriptome assemblies of the posterior salivary glands of 16 cephalopods.
Database_E_5106635_sequences.fasta: proteins identified with a six-frame translation tool from the same de novo transcriptome assemblies of the posterior salivary glands of 16 cephalopods.
Database_F_720910_sequences.fasta: proteins obtained with a six-frame translation tool from the transcripts profiled in the transcriptome of O. vulgaris but not included by the authors in Database_A_19087_sequences.fasta.
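The sequence counts embedded in the file names can be checked against the downloaded files. The following is a minimal Python sketch, not part of the dataset itself: the helper names `count_fasta_records` and `check_dataset` and the directory layout are assumptions made for illustration, and the sketch assumes that every record starts with a single header line beginning with ">".

```python
from pathlib import Path

# Expected record counts, taken from the numbers embedded in the file names.
EXPECTED_COUNTS = {
    "Database_A_19087_sequences.fasta": 19087,
    "Database_B_16990_sequences.fasta": 16990,
    "Database_C_2427_sequences.fasta": 2427,
    "Database_D_84778_sequences.fasta": 84778,
    "Database_E_5106635_sequences.fasta": 5106635,
    "Database_F_720910_sequences.fasta": 720910,
}
COMPOSITE_NAME = "All_Databases_5950827_sequences.fasta"
COMPOSITE_COUNT = 5950827


def count_fasta_records(path):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_dataset(directory):
    """Return {file name: (expected, observed)} for each dataset file found.

    Hypothetical helper: files missing from `directory` are skipped.
    """
    expected = dict(EXPECTED_COUNTS)
    expected[COMPOSITE_NAME] = COMPOSITE_COUNT
    results = {}
    for name, n_expected in expected.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = (n_expected, count_fasta_records(path))
    return results


# The six component counts add up to the composite total.
assert sum(EXPECTED_COUNTS.values()) == COMPOSITE_COUNT
```

Calling `check_dataset("path/to/Dataset_1")` (a placeholder path) returns the expected and observed counts for each file present, so any mismatch from a truncated download is easy to spot. Counting header lines streams each file rather than loading it into memory, which matters for the larger databases.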

Steps to reproduce

For more information on how this dataset was obtained, please refer to the data article: Almeida, Daniela; Domínguez-Pérez, Dany; Matos, Ana; Agüero-Chapin, Guillermin; Castaño, Yuselis; Vasconcelos, Vitor; Campos, Alexandre; Antunes, Agostinho. 2020. "Data Employed in the Construction of a Composite Protein Database for Proteogenomic Analyses of Cephalopods Salivary Apparatus" Data 5, no. 4: 110. https://doi.org/10.3390/data5040110.

Institutions

Universidade do Porto Centro Interdisciplinar de Investigacao Marinha e Ambiental

Categories

Proteogenomics
