Sequence and annotation of 42 cannabis genomes reveals extensive copy number variation in cannabinoid synthesis and pathogen resistance genes
Cannabis is a diverse and polymorphic species. To better understand cannabinoid synthesis inheritance and its impact on pathogen resistance, we shotgun sequenced and assembled a Cannabis trio (sibling pair and their offspring) utilizing long read single molecule sequencing. This resulted in the most contiguous Cannabis sativa assemblies to date. These reference assemblies were further annotated with full-length male and female mRNA sequencing (Iso-Seq) to help inform isoform complexity, gene model predictions and identification of the Y chromosome. To further annotate the genetic diversity in the species, 40 male, female, and monoecious cannabis and hemp varietals were evaluated for copy number variation (CNV) and RNA expression. This identified multiple CNVs governing cannabinoid expression and 82 genes associated with resistance to Golovinomyces chicoracearum, the causal agent of powdery mildew in cannabis. Results indicated that breeding for plants with low tetrahydrocannabinolic acid (THCA) concentrations may result in deletion of pathogen resistance genes. Low THCA cultivars also have a polymorphism every 51 bases while dispensary grade high THCA cannabis exhibited a variant every 73 bases. A refined genetic map of the variation in cannabis can guide more stable and directed breeding efforts for desired chemotypes and pathogen-resistant cultivars.