Development of a pipeline for the search and annotation of lncRNA in transcriptomic data and its application for the analysis of maize

Published: 31 May 2022 | Version 1 | DOI: 10.17632/fnk8pmp2yz.1
Artem Pronozin,
Dmitry Afonnikov


Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have demonstrated the diversity and importance of lncRNA functions in plants, including involvement in the regulation of gene expression and in the homeostasis of plant physiological parameters. However, structural and functional features are known for only a small number of lncRNAs and have been experimentally confirmed only in isolated cases. To expand knowledge about lncRNAs in other species, computational pipelines have recently been actively developed that run standardized data processing steps without user intervention up to the final result. This makes it possible to implement broader functionality for lncRNA identification and analysis. In the present work, we propose ICAnnoLncRNA, a pipeline for automatic prediction, classification, and annotation of plant lncRNAs. The pipeline was applied to the analysis of 877 maize transcriptome libraries. More than 9 million lncRNAs were predicted and classified into 3 classes with respect to their localization in the genome, structural features, tissue specificity, and homology with other organisms.


Steps to reproduce

blast.outfmt6 - Blastn results. Contains homologs of known lncRNA sequences from the LncAPDB library.
index_and_newindex.fasta - index of the PNRD, CANTATAdb, GREENC, PlncDB, and EVLncRNA databases compared with the new index for the LncAPDB library.
LncAPDB.fasta - lncRNA sequences of the LncAPDB library in FASTA format.
lncrna_class.tmap - novel lncRNAs divided into gffcompare classes.
lncrna_coordinats.bed - coordinates of novel lncRNAs on chromosomes.
lncrna_intron_size.tsv - intron sizes of novel lncRNAs and their coordinates on the genome.
new_lncrna.fasta - novel lncRNA sequences in FASTA format.
new_lncrna_locus.loci - loci of lncRNA sequences on the genome.
transcriptome_lib.txt - maize transcriptome libraries.
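
The Blastn results file can be inspected with a short script. The following is a minimal sketch, not part of the ICAnnoLncRNA pipeline. It assumes that blast.outfmt6 uses the default 12-column tabular layout of BLAST+ "-outfmt 6", which this record does not state explicitly. The names Hit, parse_outfmt6, and best_hits are hypothetical helpers introduced only for illustration. The sketch keeps, for each query sequence, the hit with the highest bit score.

```python
import csv
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """Subset of the 12 default "-outfmt 6" columns (assumed layout)."""
    qseqid: str      # query sequence ID (column 1)
    sseqid: str      # subject sequence ID (column 2)
    pident: float    # percent identity (column 3)
    length: int      # alignment length (column 4)
    evalue: float    # expect value (column 11)
    bitscore: float  # bit score (column 12)


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per tab-separated line, skipping blanks and comments."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        yield Hit(row[0], row[1], float(row[2]), int(row[3]),
                  float(row[10]), float(row[11]))


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Return the highest-bit-score hit for each query ID."""
    best: Dict[str, Hit] = {}
    for hit in parse_outfmt6(lines):
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best
```

For example, opening the file with `open("blast.outfmt6", newline="")` and passing the handle to `best_hits` would give a dictionary mapping each query ID to its top-scoring hit. If the file was produced with a custom column list, the column indices above would need to be adjusted.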


Institute of Cytology and Genetics SB RAS (Institut citologii i genetiki SO RAN)


Transcriptome, Pipeline, Database, Long Noncoding RNA