SeqLengthPlot assessment on the de novo paired-end transcriptome assembly of Savalia savaglia

Published: 10 October 2024 | Version 3 | DOI: 10.17632/ngxzc3dcfp.3
Contributors:
Dany Domínguez Pérez

Description

This dataset contains the output folder produced by running SeqLengthPlot on the de novo paired-end transcriptome assembly of the false black coral Savalia savaglia. The folder seq_length_Assembly_Ss_PE.Trinity contains:
• seq_above199bp.fasta: FASTA file containing the transcripts of 200 bp and above, obtained by splitting the input FASTA file.
• seq_below200bp.fasta: FASTA file containing the transcripts below 200 bp, obtained by splitting the input FASTA file.
• seq_length_distribution_above99bp.png: PNG image showing a histogram of transcripts with lengths of 200 bp and above on a linear scale.
• seq_length_distribution_above199_log.png: PNG image showing a histogram of transcripts with lengths of 200 bp and above on a logarithmic scale.
• seq_length_distribution_below200bp.png: PNG image showing a histogram of transcripts with lengths below 200 bp on a linear scale.
• seq_length_distribution_below200_log.png: PNG image showing a histogram of transcripts with lengths below 200 bp on a logarithmic scale.
• seq_length_stats_by_threshold_200.txt: Text file containing detailed statistics on the transcript lengths in the input FASTA file: the total number of sequences, the number of sequences of 200 bp and above, the number of sequences below 200 bp, and the minimum and maximum length in each group.
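The kind of summary stored in seq_length_stats_by_threshold_200.txt can be illustrated with a minimal Python sketch. This is not the SeqLengthPlot code: the function name length_stats, the dictionary keys, and the toy lengths are assumptions made for illustration only.

```python
def length_stats(lengths, threshold=200):
    """Summarise sequence lengths on either side of a length threshold.

    Hypothetical helper for illustration; not part of SeqLengthPlot.
    """
    lengths = list(lengths)
    # Sequences at or above the threshold go in one group, the rest in the other.
    above = [n for n in lengths if n >= threshold]
    below = [n for n in lengths if n < threshold]

    def min_max(values):
        # Return (None, None) for an empty group rather than raising.
        return (min(values), max(values)) if values else (None, None)

    return {
        "total": len(lengths),
        "n_at_or_above": len(above),
        "n_below": len(below),
        "min_max_at_or_above": min_max(above),
        "min_max_below": min_max(below),
    }


# Toy lengths, not values taken from the dataset.
example = length_stats([150, 199, 200, 1024, 3500])
print(example)
```

A sequence of exactly 200 bp falls in the "200 bp and above" group, which matches the file naming (above199 versus below200).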

Steps to reproduce

The folder seq_length_Assembly_Ss_PE.Trinity is the output of running the Python-based script SeqLengthPlot.py on the de novo paired-end transcriptome assembly of Savalia savaglia (Assembly_Ss_PE.Trinity.fasta) with a length cutoff of 200 base pairs (bp). The input FASTA file was previously assembled with Trinity v2.15.1 using a single k-mer in forward mode.
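The splitting step described above can be sketched in plain Python as follows. This is an illustrative approximation under stated assumptions, not the SeqLengthPlot.py implementation or its command-line interface: the helper names read_fasta and split_fasta_by_length are hypothetical, the output file names simply mirror those listed in the description, and the demonstration uses a toy FASTA file rather than Assembly_Ss_PE.Trinity.fasta.

```python
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (hypothetical helper)."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta_by_length(in_path, out_dir, threshold=200):
    """Write sequences >= threshold and < threshold to two FASTA files."""
    os.makedirs(out_dir, exist_ok=True)
    # File names mirror the dataset description (assumed naming pattern).
    above_path = os.path.join(out_dir, f"seq_above{threshold - 1}bp.fasta")
    below_path = os.path.join(out_dir, f"seq_below{threshold}bp.fasta")
    n_above = n_below = 0
    with open(above_path, "w") as above, open(below_path, "w") as below:
        for header, seq in read_fasta(in_path):
            if len(seq) >= threshold:
                above.write(f">{header}\n{seq}\n")
                n_above += 1
            else:
                below.write(f">{header}\n{seq}\n")
                n_below += 1
    return above_path, below_path, n_above, n_below


# Demonstration on a toy FASTA file: one 150 bp and one 300 bp sequence,
# the latter wrapped over two lines.
with tempfile.TemporaryDirectory() as tmp:
    toy = os.path.join(tmp, "toy.fasta")
    with open(toy, "w") as handle:
        handle.write(">short\n" + "A" * 150 + "\n")
        handle.write(">long\n" + "ACGT" * 50 + "\n" + "ACGT" * 25 + "\n")
    above_path, below_path, n_above, n_below = split_fasta_by_length(
        toy, os.path.join(tmp, "out")
    )
    above_name = os.path.basename(above_path)
    below_name = os.path.basename(below_path)
    print(above_name, n_above, below_name, n_below)
```

To try the sketch on the real assembly, the toy path would be replaced by the path to Assembly_Ss_PE.Trinity.fasta; results should be checked against the files in this dataset rather than assumed identical.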

Institutions

Stazione Zoologica Anton Dohrn

Categories

Transcriptomics, Protein Annotation, Sequence Analysis

Funding

This work was supported by Centro Ricerche ed Infrastrutture Marine Avanzate in Calabria (CRIMAC) - Fondo FSC 2014-2020 - Piano Stralcio «Ricerca e Innovazione 2015-2017» – Programma Nazionale Infrastrutture di Ricerca (PNIR), CUP C64I20000320001.

Licence