The patellogastropod shell proteome

Published: 16-07-2021 | Version 1 | DOI: 10.17632/tsdg7zt35g.1
Donald Colgan


Transcriptome data for Patelloida mimula and MGF data for shell and tissue proteomics analyses from various patellogastropods.

The results of the 1D nanoLC ESI MS/MS liquid chromatography/mass spectrometry are presented for individual specimens (in MGF format) in the “MGF data” folder. Note that the prefix DC is added to the sample numbers.

The spreadsheets output from the proteomics analyses are presented in the subfolders of the “Peptide and protein reports ex SearchGUI” folder. The subfolder “Lottia gigantea proteome on tissue extractions” contains default protein and peptide reports from comparisons with the Lotgi1_GeneModels_FilteredModels1_aa (at ttps:// The subfolder “Lottia gigantea shell proteome comparisons” includes reports for comparisons with the sequences in “Lottia_shell_proteome_associated_proteins.fas” (see below). Reports for comparisons with the file “P_mimula_getorf_output.fasta” are in the subfolder “Patelloida mimula overall comparisons”, and those for “P_mimula_shell_proteome_associated_proteins.fsa” are in “Patelloida mimula shell protein comparisons”.

The “Tables and combined files” folder includes summaries of the combined protein reports for comparisons with the L. gigantea shell-associated proteins (“combined_lottia_shell_protein_reports.csv”) and possible descriptions and Blast2GO identifications of contigs in the Patelloida mimula transcriptome (“P_mimula_orfs_with_possible_identification_blast2GO.csv”). The “Proteins found in all tissues” file lists the proteins found confidently in all the tissue samples using either the Lottia gigantea gene models or the identified ORFs in the Patelloida mimula transcriptome. The “mascot search summary” file lists proteins identified in shell extractions using MASCOT searches (Matrix Science, Boston, MA) of general protein databases conducted by Matthew Fitzhenry of the Australian Proteomics Analysis Facility.

There are five files in the folder “Transcriptome and sequence data”. “PM1_ATV9B_TAGCTT_L001_R1 (paired) trimmed (paired) contig list.fa” is the fasta format transcriptome assembly. Amino acid sequences from the getorf analysis of these data are shown in “P_mimula_getorf_output.fasta” (getorf parameters: minimum length of 200 and “translation of regions between START and STOP codons”, with searches on both strands). The “Peroxidases_mafft_aligned.fasta” file contains gastropod peroxidases aligned by the program MAFFT. Patelloida mimula shell-associated proteins found in the transcriptome are shown in “P_mimula_shell_proteome_associated_proteins.fsa” and L. gigantea shell-associated proteins in “Lottia_shell_proteome_associated_proteins.fas”.
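The getorf settings described above (regions between START and STOP codons, both strands, minimum length of 200) can be illustrated with a minimal Python sketch. It is not the getorf implementation and is not expected to reproduce “P_mimula_getorf_output.fasta” exactly. It assumes the standard genetic code, ATG as the only start codon, a minimum length counted in nucleotides from the start codon up to (not including) the stop codon, and that an ORF needs an explicit stop. The function names `find_orfs`, `translate` and `reverse_complement` are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: no alternative table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_nt=200):
    """Translate START..STOP regions of at least min_nt bases on both strands.

    Simplification (assumption): an ORF must begin with ATG and end at an
    in-frame stop codon; ORFs running off the end of the contig are dropped.
    """
    seq = seq.upper()
    proteins = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the current open ATG, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if i - start >= min_nt:
                        proteins.append(translate(strand[start:i]))
                    start = None
    return proteins
```

For example, `find_orfs(contig)` could be called on each sequence of the contig list to obtain candidate protein sequences; how getorf itself treats sequence ends and length counting should be checked against its documentation.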


Steps to reproduce

The transcriptome was prepared from mRNA isolated using Trizol according to the manufacturer’s instructions. Next-generation DNA sequencing of paired-end 101 base pair reads was conducted on an Illumina MiSeq instrument by the Australian Genome Research Facility. The data were filtered in CLC Genomics Workbench by trimming reads by quality scores (limit 0.05), removing terminal ambiguities, and omitting reads entirely when they had more than two internal ambiguities. CLC Genomics Workbench was then used for assembly with a word length of 20, a bubble size of 30 and the “fast” algorithm. ORFs in transcriptome assemblies were identified by getorf in Galaxy, with a minimum length of 200 and “translation of regions between START and STOP codons”, with searches on both strands. Blast2GO searches were used for preliminary functional prediction.

For shell extractions, 5 to 10 mg of material (the whole shell or fragments) was used from individual specimens. The material was cleaned with four washes of 12.5% sodium hypochlorite solution over a period of three hours, with sonication for five minutes at each change. The acid-soluble fraction of proteins was extracted from the cleaned shell material by dissolving it in 0.6 ml of 25% acetic acid overnight at room temperature in an open tube. The solution was concentrated by centrifugation using Amicon Ultra-0.5 micro-filters (Ultracel-3 membrane, 3 kDa exclusion limit).

The remaining proteomics experimentation was conducted by the Australian Proteomics Analysis Facility (APAF) using their standard approach for 1D nanoLC ESI MS/MS liquid chromatography/mass spectrometry. The programs MS-GF+, Comet and Andromeda were used in the package SearchGUI 3.3.17 to analyse MGF format spectrum data. PeptideShaker 1.16.43 was used with various sets of protein sequences to search for peptide spectrum matches (“PSMs”), peptides and proteins in the SearchGUI output. Decoy sequences were generated by reversing the sequences in the dataset. Default protein and peptide reports (with non-validated matches) were exported from PeptideShaker as spreadsheets.
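The decoy step can be illustrated with a short Python sketch that appends a reversed copy of every protein to a target database. This is only an illustration of the idea, not the procedure used by SearchGUI: the helper names `read_fasta` and `with_reversed_decoys` are hypothetical, and the `_REVERSED` accession tag is an assumption about how decoys are labelled.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def with_reversed_decoys(records, tag="_REVERSED"):
    """Return targets, each followed by a decoy whose sequence is reversed.

    The tag is appended to the accession (the first word of the header);
    the tag value is an assumption made for this example.
    """
    out = []
    for header, seq in records:
        out.append((header, seq))
        accession, _, rest = header.partition(" ")
        decoy_header = accession + tag + (" " + rest if rest else "")
        out.append((decoy_header, seq[::-1]))
    return out
```

Applied to the text of a file such as “P_mimula_shell_proteome_associated_proteins.fsa”, `with_reversed_decoys(read_fasta(text))` would give a combined target and decoy list of twice the original size, which is the kind of database a target-decoy search uses to estimate false matches.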