Rubus moschus genome assembly ver. 1
Description
Genome assembly of Rubus moschus generated and used in Sochor et al., "Plant kleptomaniacs: geographic genetic patterns in the amphi-apomictic Rubus ser. Glandulosi (Rosaceae) reveal complex reticulate evolution of Eurasian brambles". doi: https://doi.org/10.1101/2024.01.16.575855
Steps to reproduce
Long reads were generated with Oxford Nanopore Technologies (ONT) sequencing, using the Rapid Sequencing Kit SQK-RAD004 on a MinION device with two R9.4.1 flow cells, following the manufacturer's instructions. Genomic DNA was extracted with the Invisorb Spin Plant Mini Kit (Invitek Molecular, Berlin) and size-selected for fragments >40 kbp with the Short Read Eliminator XL Kit (Circulomics). Basecalling was performed in MinKNOW 21.02.1 using the high-accuracy DNA basecalling model. In addition, short, high-accuracy reads were generated from the same specimen by Macrogen Europe (Amsterdam) on an Illumina NovaSeq 6000 platform in a 2 × 150 bp paired-end configuration; libraries were prepared with the TruSeq DNA PCR-Free kit with a 350 bp insert size.

The ONT data were quality-checked in FastQC 0.11.9 (Andrews, 2010), and adapter sequences were trimmed in LongQC 1.2.0c (Fukasawa et al., 2020). NanoFilt 2.8.0 (De Coster et al., 2018) was then used to filter reads by a minimum length of 150 bp and a minimum average quality score of 5, and to trim 10 nucleotides from the start of each read.

The whole plastome sequence was assembled from the Illumina data in GetOrganelle 1.7.6.1 (Jin et al., 2020) with k-mer sizes of 21, 45, 65, 85 and 105 and the Embryophyta plant plastome database. Its completeness was checked visually in an alignment with publicly available Rubus plastomes.

ONT reads mapping to the plastome sequence were identified with minimap2 2.24 (Li, 2018) and removed, and only the non-plastome reads were used for de novo genome assembly in SMARTdenovo (Liu et al., 2021) with default parameters except for a k-mer size of 19. The assembly was then polished in Medaka 1.6.0 (https://github.com/nanoporetech/medaka) with the basecalled ONT reads, and in NextPolish 1.3.0 (Hu et al., 2019) with the Illumina data.
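The read-filtering criteria above (minimum length 150 bp, minimum average quality 5, 10 nt cropped from the read start) can be illustrated with the minimal Python sketch below. It is not NanoFilt's source code. It assumes Phred+33 FASTQ, that the head crop is applied before the length and quality checks, and that the mean quality is averaged in error-probability space rather than as a plain mean of Phred scores. These assumptions are common for nanopore QC but may differ in detail from NanoFilt. The helper names (`Read`, `read_fastq`, `mean_phred`, `filter_reads`) are hypothetical.

```python
import math
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string


def read_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Parse FASTQ lines, assuming well-formed 4-line records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield Read(header.strip()[1:].split()[0], seq, qual)


def mean_phred(qual: str) -> float:
    """Mean quality: average the per-base error probabilities, then convert
    back to a Phred score (assumption; a plain mean of Phred values would
    overstate the quality of reads with a few very bad stretches)."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(reads: Iterable[Read], min_length: int = 150,
                 min_quality: float = 5.0, headcrop: int = 10) -> Iterator[Read]:
    """Crop `headcrop` bases from each read start, then keep the read only if
    the cropped read passes the length and mean-quality thresholds."""
    for r in reads:
        seq, qual = r.seq[headcrop:], r.qual[headcrop:]
        if len(seq) < min_length:
            continue
        if mean_phred(qual) < min_quality:
            continue
        yield Read(r.name, seq, qual)
```

For example, a read whose cropped part is 90 bases at Q40 followed by 100 bases at Q0 has an arithmetic mean Phred score of about 19, yet its probability-space mean is below Q3, so this sketch discards it at the threshold of 5.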
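The removal of plastome-derived ONT reads before assembly can be sketched as follows. This assumes that minimap2 was run with its nanopore preset and default PAF output, for example `minimap2 -x map-ont plastome.fasta reads.fastq > reads_vs_plastome.paf` (file names hypothetical). The exact exclusion criterion is not stated above, so the sketch treats any read with at least one alignment as plastid. It also includes an optional, hypothetical `min_query_fraction` threshold on the fraction of the read covered by a single alignment (PAF columns 3 and 4 relative to column 2).

```python
from typing import Iterable, Iterator, Set


def plastid_read_ids(paf_lines: Iterable[str],
                     min_query_fraction: float = 0.0) -> Set[str]:
    """Collect IDs of reads with at least one PAF alignment to the plastome.

    PAF columns used: 1 query name, 2 query length, 3 query start, 4 query end.
    min_query_fraction is a hypothetical extra filter, not taken from the source.
    """
    ids = set()
    for line in paf_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        qname, qlen = cols[0], int(cols[1])
        qstart, qend = int(cols[2]), int(cols[3])
        if qlen and (qend - qstart) / qlen >= min_query_fraction:
            ids.add(qname)
    return ids


def drop_reads(fastq_lines: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Yield FASTQ lines of reads whose ID is not in `exclude`.

    Assumes 4-line records; a truncated trailing record is silently dropped.
    """
    it = iter(fastq_lines)
    for record in zip(it, it, it, it):  # group lines into 4-line records
        read_id = record[0][1:].split()[0]  # strip '@', drop description
        if read_id not in exclude:
            yield from record
```

Streaming the FASTQ, rather than loading it into memory, keeps memory use bounded by the set of excluded IDs, which is small for plastome hits.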
Finally, the contigs were scaffolded into chromosome-scale sequences with the scaffold function of RagTag 2.1.0 (Alonge et al., 2022), using two reference genome sequences: R. ulmifolius 'Burbank Thornless' genome ver. 1 (https://www.dnazoo.org/assemblies/Rubus_ulmifolius) and R. occidentalis genome ver. 3 (VanBuren et al., 2018).
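Reference-guided scaffolding orders and orients contigs and inserts gaps between them. RagTag records this arrangement in an AGP file (assumed here to be its `ragtag.scaffold.agp` output). The minimal sketch below summarizes such a file per scaffold: number of placed contigs, number of gaps and total length. It assumes the standard tab-separated AGP v2 layout, in which column 5 is the component type (`N` or `U` for gaps, other codes such as `W` for sequence components). The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_agp(agp_lines: Iterable[str]) -> Dict[str, Tuple[int, int, int]]:
    """Return {scaffold: (n_components, n_gaps, length)} from AGP v2 lines."""
    components = defaultdict(int)
    gaps = defaultdict(int)
    length = defaultdict(int)
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        obj, obj_end, comp_type = cols[0], int(cols[2]), cols[4]
        length[obj] = max(length[obj], obj_end)  # scaffold length = last end coord
        if comp_type in ("N", "U"):
            gaps[obj] += 1
        else:
            components[obj] += 1
    return {o: (components[o], gaps[o], length[o]) for o in length}
```

Scaffolds with a single component and no gaps correspond to contigs that were not joined to any others.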
Funding
Czech Science Foundation
21-01233S