SNPs of Cassava Diversity South American collection

Published: 27 September 2022 | Version 1 | DOI: 10.17632/r4bn3r9k9x.1

Description

Single nucleotide polymorphisms (SNPs) detected in a cassava diversity panel from a South American genebank collection, using RAD-tag libraries developed by the Beijing Genomics Institute (BGI) with the EcoRI restriction enzyme (recognition site: 5’-G/AATTC-3’). The RAD-Seq libraries from the Latin American cassava landraces were sequenced on the Illumina® HiSeq 2000 next-generation sequencing platform (at BGI, Hong Kong, China).
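To illustrate what the EcoRI recognition site means for a RAD-tag library, the following minimal sketch locates G/AATTC sites in a DNA string and splits the string at the cut positions. It is only an in-silico illustration, not part of the laboratory protocol; the function names `ecori_cut_positions` and `digest` are hypothetical and chosen for this example.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic)
CUT_OFFSET = 1         # the enzyme cuts after the first base: G/AATTC


def ecori_cut_positions(sequence):
    """Return 0-based string indices at which the top strand is cut."""
    seq = sequence.upper()
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = seq.find(ECORI_SITE, start + 1)
    return positions


def digest(sequence):
    """Split a sequence into the fragments produced by a complete digest."""
    bounds = [0] + ecori_cut_positions(sequence) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy sequence with two EcoRI sites.
    print(digest("AAGAATTCTTGAATTCGG"))
```

RAD-Seq reads start next to such cut sites, which is why only a reduced, reproducible fraction of the genome is sampled across accessions.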

Steps to reproduce

The cassava reference genome v6.1 was downloaded from the Phytozome website (www.phytozome.net) and the GATK pipeline was used to map RAD-Seq reads against the cassava reference genome for discovering SNPs and small InDels. For SNP detection, NGSEP was used with its default settings: 1) minimum genotype quality (40); 2) maximum value allowed for a base quality score (30); and 3) maximum number of alignments allowed starting at the same reference site (100). We first removed SNPs located in repetitive regions of the genome, and then removed SNPs genotyped in fewer than 85% of the samples, requiring a minimum Phred quality score of 40 (Q40) for the genotype call in each sample. We then excluded indels, multiallelic variants and monomorphic variants, leaving only biallelic SNPs in the final data set. Finally, we excluded SNPs with other variants within 10 bp. The filters, the functional annotation of variants, and the conversion of the variant call format (VCF) files to other formats for downstream analysis were also performed in NGSEP.
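The filtering logic described above can be sketched in a few lines of Python. This is not NGSEP code and does not reproduce its implementation; it is a minimal illustration on in-memory records. The `Variant` class, the function names and the representation of repeats as `(chrom, start, end)` tuples are hypothetical. Two points are assumptions made here: a genotype call below Q40 is treated as missing when computing the proportion of genotyped samples, and the 10 bp proximity check is run against all input variants, including indels.

```python
from dataclasses import dataclass

MIN_GQ = 40           # minimum Phred genotype quality per sample
MIN_CALL_RATE = 0.85  # minimum fraction of samples genotyped
MIN_DISTANCE = 10     # SNPs with another variant within this many bp are dropped


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alts: list   # alternative alleles
    calls: list  # one entry per sample: (allele_a, allele_b, gq) or None


def confident_calls(v):
    """Genotype calls that are present and reach the quality threshold."""
    return [c for c in v.calls if c is not None and c[2] >= MIN_GQ]


def is_biallelic_snp(v):
    """True for a single-base substitution with exactly one alternative allele."""
    return len(v.ref) == 1 and len(v.alts) == 1 and len(v.alts[0]) == 1


def is_polymorphic(v):
    """True if more than one allele is observed among confident calls."""
    return len({a for c in confident_calls(v) for a in c[:2]}) > 1


def in_repeat(v, repeats):
    return any(ch == v.chrom and s <= v.pos <= e for ch, s, e in repeats)


def has_close_neighbour(v, all_variants):
    # Quadratic scan: fine for a sketch, too slow for a genome-wide VCF.
    return any(o is not v and o.chrom == v.chrom
               and abs(o.pos - v.pos) <= MIN_DISTANCE for o in all_variants)


def filter_variants(variants, repeats=()):
    """Apply the filters in the order given in the text."""
    kept = []
    for v in variants:
        if in_repeat(v, repeats):
            continue
        if len(confident_calls(v)) / len(v.calls) < MIN_CALL_RATE:
            continue
        if not is_biallelic_snp(v) or not is_polymorphic(v):
            continue
        if has_close_neighbour(v, variants):
            continue
        kept.append(v)
    return kept
```

In practice the records would be parsed from the VCF files in this data set rather than built by hand, and a sorted sweep over positions would replace the quadratic neighbour check.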

Institutions

Centro Internacional de Agricultura Tropical

Categories

Single Nucleotide Polymorphism, Genotyping
