Data from: Over eight hundred cannabis strains characterized by the relationship between their subjective effects, perceptual profiles, and chemical compositions
Commercial cannabis strains have multiplied in recent years as a consequence of regional changes in legislation for medicinal and recreational use. The lack of a standardized system to label plants and seeds hinders the consistent association of particular strains with the psychoactive effects they elicit. We analyzed a large publicly available dataset in which users freely reported their experience with different strains, including subjective effects and flavour associations. Metrics of strain similarity based on self-reported effects and flavour tags allowed machine learning classification into three major clusters associated with species (Cannabis sativa, Cannabis indica, and hybrids). Synergy between terpene and cannabinoid content was suggested by significant correlations between psychoactive effects and flavour tags. The use of predefined tags was validated by applying Latent Semantic Analysis (LSA) to unstructured written reviews, which also yielded breed-specific topics consistent with their purported medicinal and subjective effects. While cannabinoid content was variable even within individual strains, terpene profiles matched the perceptual characterizations made by the users and could be used to predict psychoactive effects. Our work represents the first data-driven synthesis of self-reported and chemical information on a large number of cannabis strains. Since terpene content is robustly inherited and less influenced by environmental factors, flavour perception could represent a reliable marker for the prediction of psychoactive effects of cannabis. Our novel methodology helps meet the demand for reliable strain classification and characterization in the context of an ever-growing market for medicinal and recreational cannabis.
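As a rough illustration of what a tag-based strain similarity metric could look like, the sketch below computes pairwise cosine similarity between tag-count vectors. This is only an assumption for illustration: the abstract does not state which similarity metric was used, and the strain names, tags, and counts are invented rather than taken from the dataset.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    # Only tags present in both vectors contribute to the dot product.
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical tag counts, invented for illustration (not from the dataset).
strain_tags = {
    "strain_a": Counter({"relaxed": 8, "sleepy": 5, "earthy": 4}),
    "strain_b": Counter({"relaxed": 6, "sleepy": 7, "earthy": 3}),
    "strain_c": Counter({"energetic": 9, "uplifted": 6, "citrus": 5}),
}


def similarity_matrix(tags):
    """Return a dict mapping (strain_x, strain_y) to their cosine similarity."""
    names = sorted(tags)
    return {
        (x, y): cosine_similarity(tags[x], tags[y])
        for x in names
        for y in names
    }


sim = similarity_matrix(strain_tags)
```

A matrix of this kind could then be passed to any clustering or classification method; strains that share many effect and flavour tags score close to 1, while strains with no tags in common score 0.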