Sequence and annotation of 42 cannabis genomes reveals extensive copy number variation in cannabinoid synthesis and pathogen resistance genes

Published: 20 Jan 2020 | Version 1 | DOI: 10.17632/3sx3v5zvrm.1
Contributor(s):

Description of this data

Cannabis is a diverse and polymorphic species. To better understand cannabinoid synthesis inheritance and its impact on pathogen resistance, we shotgun sequenced and assembled a Cannabis trio (sibling pair and their offspring) utilizing long read single molecule sequencing. This resulted in the most contiguous Cannabis sativa assemblies to date. These reference assemblies were further annotated with full-length male and female mRNA sequencing (Iso-Seq) to help inform isoform complexity, gene model predictions and identification of the Y chromosome. To further annotate the genetic diversity in the species, 40 male, female, and monoecious cannabis and hemp varietals were evaluated for copy number variation (CNV) and RNA expression. This identified multiple CNVs governing cannabinoid expression and 82 genes associated with resistance to Golovinomyces chicoracearum, the causal agent of powdery mildew in cannabis. Results indicated that breeding for plants with low tetrahydrocannabinolic acid (THCA) concentrations may result in deletion of pathogen resistance genes. Low THCA cultivars also have a polymorphism every 51 bases while dispensary grade high THCA cannabis exhibited a variant every 73 bases. A refined genetic map of the variation in cannabis can guide more stable and directed breeding efforts for desired chemotypes and pathogen-resistant cultivars.

Experiment data files

Related links

Latest version

  • Version 1

    2020-01-20

    Published: 2020-01-20

    DOI: 10.17632/3sx3v5zvrm.1

    Cite this dataset

    McKernan, Kevin (2020), “Sequence and annotation of 42 cannabis genomes reveals extensive copy number variation in cannabinoid synthesis and pathogen resistance genes”, Mendeley Data, v1 http://dx.doi.org/10.17632/3sx3v5zvrm.1

Statistics

Views: 19
Downloads: 0

Categories

Genomics

Licence

CC BY 4.0 Learn more

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?
You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.

Report