Composite Dataset of Input and Output Files from Complex Similarity Network Analysis of Secreted Cysteine-Rich peptides/proteins Without Annotation (SCRs-WA)

Published: 27 February 2025 | Version 2 | DOI: 10.17632/mjnn6kjgkh.2
Contributors:
Dany Domínguez Pérez

Description

This dataset contains a composite collection of bioactive peptide sequences and Complex Similarity Network (CSN) analysis outputs, designed to explore the functional relationships of 1,872 Secreted Cysteine-Rich peptides/proteins Without Annotation (SCRs-WA). The dataset integrates eight peptide classes, including antimicrobial peptides (AMPs), defensins, venoms/toxins, and non-AMP controls, to establish a reference chemical space for functional inference. It includes both input sequence data (FASTA format) and CSN-derived output files, which facilitate the visualization and clustering of peptide sequences based on structural and functional similarities:

1- FileSM1: FileSM1_12449_All_8_datasets.fasta
Content: A FASTA file containing 12,449 peptide sequences across eight datasets:
(i) Low-toxicity antimicrobial peptides (AMPs)
(ii) Defensins
(iii) Animal venoms and toxins
(iv) Cytotoxic peptides
(v) Haemolytic peptides
(vi) Non-AMPs (negative controls)
(vii) Cnidarian toxin candidates from S. savaglia
(viii) Secreted Cysteine-Rich ORFs Without Annotation (mSCRs-WA)
Usage:
- Serves as the primary input dataset for complex similarity network (CSN) analysis.
- Enables homology searches, functional annotation, and comparative analyses.

Output Files from CSN Analysis

2- FileSM2: FileSM2_HSPN_Topology_GraphML.zip
Content: A compressed ZIP file containing GraphML representations of the Half-Space Proximal Network (HSPN):
- HSPN_clusters_projection.graphml → Clustered projection of peptide connectivity based on similarity metrics.
- HSPN_peptide_classes_projection.graphml → Projection of peptide classes (AMPs, toxins, defensins, etc.), highlighting their network positioning.
Visualization: Can be opened in Gephi v0.10 or any GraphML-compatible tool. Nodes represent peptide sequences, edges indicate functional similarity, and clusters reflect shared bioactivity profiles.
Usage:
- Facilitates visual exploration of sequence relationships.
- Enables functional annotation transfer by identifying clusters with known bioactive peptides.

3- FileSM3: FileSM3_Clusters_Composition_Analysis.xlsx
Content: A spreadsheet detailing cluster composition in the HSPN analysis, including:
- Cluster ID and size
- Distribution of peptides across eight datasets
- Functional annotation insights for each cluster
Usage:
- Helps identify key functional groups within the CSN framework.
- Provides quantitative insights into peptide distribution and classification.

4- FileSM4: FileSM4_HSPN_Connections_Analysis.xlsx
Content: A spreadsheet detailing functional connections between peptides, including:
- Pairwise similarity scores
- Network centrality measures (e.g., harmonic centrality, degree centrality)
- Annotations of linked sequences
Usage:
- Supports similarity-based functional inference.
- Helps track peptide relationships and connectivity patterns within the network.
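The FASTA input and the GraphML outputs described above can be inspected with the Python standard library alone. The following is a minimal sketch, not the authors' pipeline: the helper names `read_fasta` and `graphml_summary` are hypothetical, and the in-memory demo records are made up so the block runs without the real files.

```python
import io
import xml.etree.ElementTree as ET


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def graphml_summary(handle):
    """Return (node_count, edge_count) for a GraphML document."""
    root = ET.parse(handle).getroot()
    nodes = edges = 0
    for element in root.iter():
        # GraphML tags are namespaced; compare the local name only.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "node":
            nodes += 1
        elif local_name == "edge":
            edges += 1
    return nodes, edges


# Made-up stand-ins for the real files, used only to exercise the helpers.
demo_fasta = io.StringIO(">pep1 demo\nACDC\nCK\n>pep2 demo\nGCC\n")
records = list(read_fasta(demo_fasta))

demo_graphml = io.StringIO(
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="pep1"/><node id="pep2"/>'
    '<edge source="pep1" target="pep2"/>'
    "</graph></graphml>"
)
counts = graphml_summary(demo_graphml)

print(len(records), counts)
```

With the real data, passing an open handle to FileSM1_12449_All_8_datasets.fasta to `read_fasta` should yield 12,449 records if the file matches its description. The two `.graphml` files would first need to be extracted from FileSM2_HSPN_Topology_GraphML.zip, for example with the standard `zipfile` module.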

Files

Steps to reproduce

A curated dataset of 12,449 representative peptides was assembled to investigate the mature peptides from a subset of 1,872 Secreted Cysteine-Rich ORFs Without Annotation (SCRs-WA), along with 248 putative cnidarian toxins from Savalia savaglia. The dataset comprises eight subsets. Subsets (i) to (vi) are well-characterized peptide classes: (i) low-toxicity antimicrobial peptides (AMPs), (ii) defensins, (iii) animal venoms and toxins, (iv) cytotoxic peptides, (v) haemolytic peptides, and (vi) non-AMPs (negative controls). The remaining subsets, (vii) and (viii), consist of the putative cnidarian toxins and the SCRs-WA mature peptides/proteins currently under examination.
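Because the peptides under examination are described as cysteine-rich, a quick per-sequence check of cysteine content can be useful when browsing the subsets. The sketch below is illustrative only: the function name `cysteine_stats` and the demo sequences are hypothetical, and the source does not state which cysteine count or fraction was used to define "cysteine-rich", so no threshold is applied.

```python
def cysteine_stats(sequence):
    """Return (cysteine_count, cysteine_fraction) for one peptide sequence."""
    seq = sequence.upper()
    n_cys = seq.count("C")
    # Guard against empty sequences to avoid division by zero.
    fraction = n_cys / len(seq) if seq else 0.0
    return n_cys, fraction


# Made-up sequences, not taken from the dataset.
demo = {"demo_scr": "ACCKCGC", "demo_ctrl": "AKLGAKLG"}
stats = {name: cysteine_stats(seq) for name, seq in demo.items()}

for name, (n_cys, fraction) in stats.items():
    print(f"{name}: {n_cys} Cys, fraction {fraction:.2f}")
```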

Institutions

Stazione Zoologica Anton Dohrn

Categories

Neural Networks (Biological Sciences), Sequence Analysis

Licence