SeqLengthPlot Outputs on toxin candidates identified by DeTox in the Paired-End Transcriptome of Savalia savaglia

Published: 27 July 2024 | Version 2 | DOI: 10.17632/5kky464sf2.2
Contributors:
Dany Domínguez Pérez

Description

This dataset contains the output folder produced by SeqLengthPlot when applied to the toxin candidates identified by DeTox in the paired-end transcriptome of Savalia savaglia. The folder seq_length_DeTox_output_Ss_PE_candidate_toxins contains:
• seq_above99aa.fasta: FASTA file with the toxin candidates of 100 aa and above, obtained by splitting the input FASTA file at the given threshold.
• seq_below100bp.fasta: FASTA file with the toxin candidates shorter than 100 aa, obtained by splitting the input FASTA file at the given threshold.
• seq_length_distribution_above99aa.png: PNG histogram of the lengths of toxin candidates of 100 aa and above, on a linear scale.
• seq_length_distribution_above99_log.png: PNG histogram of the lengths of toxin candidates of 100 aa and above, on a logarithmic scale.
• seq_length_distribution_below100aa.png: PNG histogram of the lengths of toxin candidates shorter than 100 aa, on a linear scale.
• seq_length_distribution_below100_log.png: PNG histogram of the lengths of toxin candidates shorter than 100 aa, on a logarithmic scale.
• seq_length_stats_by_threshold_100.txt: Text file with detailed length statistics for the toxin candidates in the input FASTA file: the total number of sequences, the number of sequences of 100 aa and above, the number of sequences below 100 aa, and the minimum and maximum lengths of each group.
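The kind of summary described for seq_length_stats_by_threshold_100.txt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SeqLengthPlot code: the function name summarize_lengths, the dictionary keys, and the example lengths are all hypothetical, and the layout of the real text file may differ.

```python
def summarize_lengths(lengths, threshold=100):
    """Summarize sequence lengths on either side of a threshold.

    Hypothetical helper for illustration; not part of SeqLengthPlot.
    """
    lengths = list(lengths)
    above = [n for n in lengths if n >= threshold]  # 100 aa and above
    below = [n for n in lengths if n < threshold]   # shorter than 100 aa

    def span(values):
        # (min, max) of a group, or None when the group is empty
        return (min(values), max(values)) if values else None

    return {
        "total": len(lengths),
        "n_at_or_above": len(above),
        "n_below": len(below),
        "range_at_or_above": span(above),
        "range_below": span(below),
    }


# Made-up lengths, purely for illustration (not values from this dataset)
example = summarize_lengths([42, 99, 100, 250, 1312])
print(example)
```

The lengths passed in could come from any FASTA reader; a sequence of exactly 100 aa is counted in the "100 aa and above" group, matching the file naming (above99 / below100).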

Steps to reproduce

The folder seq_length_DeTox_output_Ss_PE_candidate_toxins is the output of running the Python-based script SeqLengthPlot.py on the toxin candidates identified by DeTox in the paired-end transcriptome of Savalia savaglia, using a length cutoff of 100 amino acids (aa).
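For readers who want to see what a length-cutoff split amounts to, the following is a minimal standard-library sketch. It is not the SeqLengthPlot.py implementation and does not reproduce its command-line interface; the function names read_fasta and split_by_length and the two output file names are hypothetical, chosen only for this example.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(fasta_path, out_dir, threshold=100):
    """Write records with >= threshold residues and shorter records to two files.

    Returns (n_at_or_above, n_below). Output names are hypothetical.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    above_path = out_dir / "at_or_above_threshold.fasta"  # hypothetical name
    below_path = out_dir / "below_threshold.fasta"        # hypothetical name
    n_above = n_below = 0
    with open(above_path, "w") as above, open(below_path, "w") as below:
        for header, seq in read_fasta(fasta_path):
            if len(seq) >= threshold:
                above.write(f">{header}\n{seq}\n")
                n_above += 1
            else:
                below.write(f">{header}\n{seq}\n")
                n_below += 1
    return n_above, n_below
```

Calling split_by_length("candidates.fasta", "out", threshold=100) on a protein FASTA file (the input path here is a placeholder) would place sequences of 100 aa and above in one file and shorter ones in the other, mirroring the two FASTA outputs listed in the description.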

Institutions

Stazione Zoologica Anton Dohrn

Categories

Protein Annotation, Sequence Analysis

Funding

This work was supported by Centro Ricerche ed Infrastrutture Marine Avanzate in Calabria (CRIMAC) - Fondo FSC 2014-2020 - Piano Stralcio «Ricerca e Innovazione 2015-2017» – Programma Nazionale Infrastrutture di Ricerca (PNIR), CUP C64I20000320001.
