In silico discovery of terpenoid metabolism in Cannabis sativa

Due to their efficacy, cannabis-based therapies are currently prescribed for the treatment of many different medical conditions. Interestingly, treatments based on cannabis flowers or their derivatives have been shown to be very effective, while therapies based on drugs containing THC alone lack therapeutic value and lead to increased side effects, likely because they lack other pivotal entourage compounds found in the phytocomplex. Among these compounds are terpenoids, which are not produced exclusively by cannabis plants, so other plant species likely share many of the enzymes involved in their metabolism. In the present work, 23,630 transcripts from the canSat3 reference transcriptome were scanned for evolutionarily conserved protein domains and annotated according to their predicted molecular functions. A total of 215 evolutionarily conserved genes encoding enzymes presumably involved in terpenoid metabolism are described, together with their expression profiles in different cannabis tissues at different developmental stages. The resource presented here will aid future investigations on terpenoid metabolism in Cannabis sativa.


Introduction
Due to its efficacy 1 , cannabis is nowadays prescribed by physicians for the treatment of neurological, psychiatric, immunological, cardiovascular, gastrointestinal, and oncological conditions 2-7 . Although therapies based on cannabis flowers or their derivatives are recognized to be very effective, treatments centered on drugs containing Δ9-tetrahydrocannabinol (THC) alone lack efficacy and lead to increased side effects 8,9 . This discrepancy seems to result from the absence of the synergistic effects of additional pivotal compounds found in the phytocomplex, the so-called entourage effect 10 . Among these molecules are other cannabinoids and terpenoids, which are thought to play major roles in modulating the effects of THC 11 .
Terpenes are hydrocarbon (isoprenoid) molecules classified as monoterpenes, sesquiterpenes, diterpenes or carotenoids, depending on the number of isoprene (C5) units from which they are synthesized. Terpenoids are terpene derivatives, typically carrying additional oxygen-containing functional groups, and often have a strong odor that helps plants protect themselves against potential predators 12 . Terpenoids do not only have important functions when working in concert with cannabinoids; they have been widely investigated in many different plant species and are being exploited as antifungal, antibacterial, antioxidant, anti-inflammatory, anti-stress, anticancer and analgesic agents 13-18 . However, whilst the gene networks controlling the biosynthesis of cannabinoids and their precursors have been extensively studied 19-22 , the biosynthetic pathway of terpenoid molecules in Cannabis sativa has only recently begun to be elucidated. Only two genes have been characterized, one encoding (-)-limonene synthase and the other (+)-α-pinene synthase 23 , two enzymes responsible for the conversion of geranyl pyrophosphate into limonene and pinene, respectively 24,25 . Remarkably, while cannabinoids are only found in cannabis plants, terpenoids are also produced by a variety of other plant species, which therefore likely share many of the enzymes involved in their metabolism.
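The classification above rests on simple arithmetic: each isoprene unit contributes five carbon atoms, so the class follows from the number of units. The sketch below is illustrative only; the helper `classify_terpene` is hypothetical, and the triterpene entry is included because squalene appears among the annotated terpenes.

```python
from typing import Tuple

# Hypothetical helper: map the number of C5 isoprene units to the carbon
# count and the conventional terpene class. Classes not named in the text
# (e.g. hemiterpenes, sesterterpenes) fall back to "other".
TERPENE_CLASSES = {
    2: "monoterpene",    # C10, e.g. limonene, pinene, myrcene
    3: "sesquiterpene",  # C15, e.g. bisabolene, germacrene, cadinene
    4: "diterpene",      # C20, e.g. ent-kaurene
    6: "triterpene",     # C30, e.g. squalene
    8: "tetraterpene",   # C40, the carotenoids, e.g. lycopene, phytoene
}


def classify_terpene(isoprene_units: int) -> Tuple[int, str]:
    """Return (number of carbon atoms, terpene class)."""
    carbons = 5 * isoprene_units  # each isoprene unit contributes 5 carbons
    return carbons, TERPENE_CLASSES.get(isoprene_units, "other")


for units in (2, 3, 4, 8):
    print(units, classify_terpene(units))
```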
In the present work, evolutionarily conserved genes encoding enzymes predicted to be involved in terpenoid metabolism have been identified within the transcripts of the canSat3 reference transcriptome of Cannabis sativa 21 . Moreover, by taking advantage of available gene expression data 26 , the expression of these enzymes was profiled in cannabis tissues at different developmental stages. The data note presented here provides researchers with a collection of candidate genes that should help accelerate future investigations on terpenoid metabolism in Cannabis sativa.

Gene expression analysis
Gene expression profiles from cannabis tissues at different developmental stages were downloaded from the NCBI GEO repository (https://www.ncbi.nlm.nih.gov/geo/). Gene expression heatmaps were generated, and unsupervised hierarchical clustering was performed, with GENE-E 3.0.213 29 .
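The source does not state which distance metric or linkage GENE-E was run with. As a minimal sketch, assuming log2(RPKM + 1) values, 1 - Pearson correlation as the distance and average linkage (common choices for expression heatmaps, not confirmed settings), unsupervised clustering of gene profiles could look like this in plain Python. Gene names, tissues and values are hypothetical.

```python
import math
from itertools import combinations


def log_transform(row):
    """log2(RPKM + 1), which dampens the effect of very highly expressed genes."""
    return [math.log2(v + 1.0) for v in row]


def correlation_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (norm_a * norm_b)


def average_linkage(profiles, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    pair_dist = {
        (i, j): correlation_distance(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }

    def cluster_distance(c1, c2):
        # Average of all gene-to-gene distances between the two clusters.
        total = sum(pair_dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a].extend(clusters[b])  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical RPKM values; columns: flower, leaf, root, stem.
rpkm = {
    "gene_A": [120.0, 5.0, 3.0, 2.0],
    "gene_B": [300.0, 10.0, 8.0, 4.0],
    "gene_C": [2.0, 3.0, 150.0, 90.0],
    "gene_D": [1.0, 4.0, 200.0, 120.0],
}
genes = list(rpkm)
profiles = [log_transform(values) for values in rpkm.values()]
clusters = average_linkage(profiles, k=2)
print([[genes[i] for i in c] for c in clusters])
```

Correlation distance groups genes by the shape of their profile across tissues rather than by absolute level, which is why the flower-enriched and root-enriched genes separate here despite different magnitudes.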

Results
Although the Cannabis sativa reference genome and transcriptome have been publicly released 21 , only a few genes have been characterized and surveyed for their molecular functions. To define the possible roles of these genes, 40,197 canSat3 transcript sequences were downloaded from the cannabis genome browser (http://genome.ccbr.utoronto.ca/), translated in silico, and scanned for evolutionarily conserved protein domains for functional annotation (Figure 1). To identify the genes presumably playing a role in terpenoid metabolism, annotated transcripts were filtered for gene ontology (GO) categories involved in terpene biosynthesis and catabolism using the AmiGO 2 reference database. A total of 288 transcripts representing 215 different genes were predicted to be involved in the metabolism of bisabolene, cadinene, carotene, copaene, ent-kaurene, farnesol, geraniol, germacrene, lycopene, limonene, myrcene, phytoene, pinene, squalene, and other terpenoids (Supplementary table 1). Functional characterization of this subset confirmed an enrichment in GO categories involved in different terpene biosynthetic and catabolic processes (Figure 2).
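The filtering step, and the collapse of 288 transcripts into 215 genes, can be sketched as a set intersection followed by deduplication. This is a minimal sketch under stated assumptions: the annotation table is invented, transcript IDs are assumed to follow a `<gene>.<isoform>` pattern, and the three GO identifiers are listed as we understand them and should be verified against AmiGO 2. A real filter would also include their descendant terms.

```python
# Hypothetical annotation table: transcript ID -> GO term IDs assigned
# by functional annotation (values invented for illustration).
annotations = {
    "GENE1.1": {"GO:0016114", "GO:0005737"},
    "GENE1.2": {"GO:0016114"},
    "GENE2.1": {"GO:0006721"},
    "GENE3.1": {"GO:0006412"},
}

# Terpenoid-related GO biological process terms. A real filter would also
# include their descendant terms, e.g. as retrieved from AmiGO 2.
TERPENOID_TERMS = {
    "GO:0006721",  # terpenoid metabolic process
    "GO:0016114",  # terpenoid biosynthetic process
    "GO:0016115",  # terpenoid catabolic process
}


def gene_of(transcript_id):
    """Assumption: transcript IDs follow the pattern '<gene>.<isoform>'."""
    return transcript_id.rsplit(".", 1)[0]


def filter_terpenoid(annotations, terms):
    """Return matching transcripts and the distinct genes they represent."""
    transcripts = sorted(t for t, go in annotations.items() if go & terms)
    genes = sorted({gene_of(t) for t in transcripts})
    return transcripts, genes


transcripts, genes = filter_terpenoid(annotations, TERPENOID_TERMS)
print(len(transcripts), "transcripts ->", len(genes), "genes")
```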
Terpenoids are produced by several plant species, and in several types of plant tissue, as a defense against predators 30 . Similar to other biological compounds, their abundance directly correlates with the expression levels of the enzymes involved in their metabolism. […] were constitutively expressed in all tissues. These results highlight which enzymes are expressed in specific tissues and provide a rationale for further investigations on the molecular basis of terpenoid metabolism.
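The distinction between tissue-specific and constitutively expressed genes can be made quantitative. The source does not describe such a metric; the sketch below swaps in the tau tissue-specificity index, a commonly used measure, computed on log2(RPKM + 1) values. The labelling rule and both thresholds (`min_rpkm`, `specific_cutoff`) are arbitrary assumptions, and the expression values are hypothetical.

```python
import math


def tau(expression):
    """Tissue-specificity index: 0 = uniform across tissues, 1 = one tissue only.

    Computed on log2(RPKM + 1) values; requires at least two tissues.
    """
    values = [math.log2(v + 1.0) for v in expression]
    peak = max(values)
    if peak == 0:
        return 0.0  # not expressed anywhere
    return sum(1.0 - v / peak for v in values) / (len(values) - 1)


def expression_pattern(expression, min_rpkm=1.0, specific_cutoff=0.8):
    """Hypothetical labelling rule; both thresholds are assumptions."""
    if tau(expression) >= specific_cutoff:
        return "tissue-specific"
    if all(v >= min_rpkm for v in expression):
        return "constitutive"
    return "intermediate"


# Hypothetical RPKM profiles across four tissues.
print(expression_pattern([10.0, 12.0, 9.0, 11.0]))  # even expression
print(expression_pattern([250.0, 0.0, 0.0, 0.5]))  # one dominant tissue
```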

Discussion
The active principles of plants have been exploited by humans for centuries, with Cannabis sativa being one of the oldest plants used for medicinal purposes 31 . Surprisingly, unlike whole-plant extracts, medicinal products containing THC alone have been found to lack efficacy and to cause increased side effects 8,9 . These shortcomings are thought to arise because such products lack other important co-factors typically found in the phytocomplex, such as terpenoids and other cannabinoids 10 , that contribute to the synergistic effects seen with whole-plant extracts.
While genes involved in cannabinoid biosynthesis have been widely investigated 19-22 , the gene network controlling terpenoid metabolism has only recently begun to be elucidated, with the genes encoding (-)-limonene synthase and (+)-α-pinene synthase being the only two characterized so far 23 . Here, Cannabis sativa transcripts 21 were scanned for evolutionarily conserved protein domains and annotated according to their presumptive molecular functions. As a result, 215 evolutionarily conserved genes were predicted to be involved in terpenoid metabolism. Furthermore, in silico gene expression profiling 26 of these enzymes in cannabis tissues at different developmental stages revealed gene clusters with distinct expression patterns. For instance, cluster 3 genes (Figure 3) displayed high expression specifically in hemp flowers, which may be of particular interest because different cannabis strains are thought to differ in their entourage effects.
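Scanning transcripts for conserved protein domains first requires translating each nucleotide sequence into protein. The source does not name the translation tool or its settings; the sketch below assumes the standard genetic code and keeps the longest ATG-initiated open reading frame across all six frames, a common heuristic rather than the confirmed method. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; ambiguous codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf_protein(transcript):
    """Longest ATG-initiated ORF across all six reading frames, as protein.

    A trailing ORF without a stop codon is kept, since transcripts may be partial.
    """
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            for segment in protein.split("*"):  # stop codons delimit ORFs
                start = segment.find("M")  # first methionine = start codon
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


print(longest_orf_protein("CCATGGCTTGGTAAGG"))
```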
Since the current cannabis reference transcriptome is still at a preliminary stage 21 , it is likely that false negatives have caused some important transcripts to be missing. For example, the two genes encoding (-)-limonene synthase and (+)-α-pinene synthase 23 align to the same transcript, which is predicted to encode myrcene synthase (PK25781.1 in Supplementary table 1), and therefore cannot be discriminated. Overcoming this issue at the whole-genome level will require a complete version of the reference transcriptome. Until then, individual transcripts must be validated with classic low-throughput techniques, such as molecular cloning followed by Sanger sequencing.
Nevertheless, the data presented here should ease future investigations on terpenoid metabolism in Cannabis sativa by providing researchers with a collection of candidate genes. For instance, one of these genes was predicted to encode β-bisabolene synthase (PK05069.1 in Supplementary table 1). Bisabolene is used as an antimicrobial agent 32 , as well as a biofuel 33 ; however, prior to this report nothing was known about the gene network controlling its metabolism in Cannabis sativa. Once future studies integrate gene expression data with chemical analysis, a more complete picture of the molecular basis of terpenoid metabolism should emerge.

Supplementary table 1. Evolutionarily conserved terpenoid metabolism transcripts
List of putative terpenoid metabolism genes obtained with Blast2GO.

Supplementary table 2. Terpenoid metabolism gene profiling in different tissues and developmental stages
Gene expression matrix of predicted terpenoid metabolism genes. Expression values are given in RPKM.
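Supplementary table 2 reports expression in RPKM (reads per kilobase of transcript per million mapped reads). The source does not describe how the GEO data were quantified; the sketch below only illustrates the standard definition of the unit, with invented numbers.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads),
    i.e. read_count / (length in kb) / (mapped reads in millions).
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript in a library
# with 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Because RPKM normalizes for both transcript length and sequencing depth, values can be compared between genes within a sample; comparisons across samples still depend on library composition.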

Data availability
Processed gene expression data can be found in the NCBI GEO repository (https://www.ncbi.nlm.nih.gov/geo/) with accession number GSE93201.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
The present study is appropriate as a data note, and may provide a useful shortcut for future studies on terpenoid biosynthesis in cannabis. The author identifies transcripts putatively involved in terpenoid biosynthesis, including processes of both primary and specialized metabolism. The rationale is well explained, and the methods are technically sound.
Major changes: More information is required in the Materials and Methods for the study to be replicable. Two versions of the canSat3 transcriptome are available, and the authors should specify whether they used the full or representative transcript set. The authors must also provide the NCBI accession numbers for the GEO dataset they used. There is an unsupported statement in the Results section, which reads: "Similar to other biological compounds, their abundance directly correlates with the expression levels of the enzymes involved in their metabolism." In a complex system, this is not obvious and requires textual support.
Minor changes: The Introduction generally provides a good rationale for the study. However, the definitions provided for 'terpene' and 'terpenoid' in the second sentence of the second paragraph do not reflect the definitions used in reference 12, nor are they the traditional definitions for those terms. References 24 and 25 address enzymes characterized in other organisms, and so do not support the statement regarding their biosynthetic activities in cannabis. That information is provided in reference 23. The final statement of the second introductory paragraph is unsupported and requires citation. The Figure 2 caption states that the transcripts were taken from the reference genome, but the Materials and Methods indicate that a transcriptome was used. Readers should note that the specific products of terpene synthases and other enzymes in specialized metabolism are currently difficult to predict using in silico methods. A recent paper (Booth et al., 2017) includes biochemical and phylogenetic analysis of some of the candidate genes highlighted here, and may be of interest to readers.
Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.