Whole genome sequencing of colonies derived from cannabis flowers and the impact of media selection on benchmarking total yeast and mold detection tools

Background: Cannabis products are subjected to microbial testing for human pathogenic fungi and bacteria. These testing requirements often rely on non-specific colony forming unit (CFU/g) specifications without clarity on which medium, selection or growth times are required. We performed whole genome sequencing to assess the specificity of colony forming units (CFU) derived from three different plating media: Potato Dextrose Agar (PDA), PDA with chloramphenicol and Dichloran Rose Bengal with chloramphenicol (DRBC).
Methods: Colonies were isolated from each medium type and their whole genomes sequenced to identify the diversity of microbes present on each medium selection. Fungal Internal Transcribed Spacer (ITS3) and bacterial 16S rRNA (16S) quantitative polymerase chain reactions (qPCR) were performed to correlate these CFUs with fungal- and bacterial-specific qPCR.
Results: Each plating medium displayed a ten-fold difference in CFU counts. PDA with chloramphenicol showed the highest diversity and the highest concordance with whole genome sequencing. According to ITS3 and 16S qPCR confirmed with whole genome sequencing, DRBC undercounted yeast and mold while PDA without chloramphenicol overcounted CFUs due to bacterial growth without selection.
Conclusions: Colony forming unit regulations lack specificity. Each medium produces significant differences in CFU counts. These are further dependent on subjective interpretation, failure to culture most microbes, and poor selection between bacteria and fungi. Given that the most human-pathogenic microbes found on cannabis are endophytes, which culture fails to detect, molecular methods offer a solution to this long-standing quantification problem in the cannabis testing field.


Introduction
Total yeast and mold testing is required in many states to assess the safety of cannabis prior to the sale of cannabis flowers and cannabis-infused products. Cannabis is an inhaled product, and cases of cannabis-transmitted aspergillosis have been reported in the clinical literature (Bal et al., 2010; Gargani et al., 2011; McKernan et al., 2015, 2016; Remington et al., 2015; Ruchlemer et al., 2015). Cannabis is a unique matrix, in that antibiotic cannabinoids can make up to 20% of the flowers' weight, and many fungi infecting cannabis are endophytes. Endophytes are not easily cultured from the plant without lysing open plant cell walls. The conditions that lyse open plant cell walls also lyse open fungal cell walls, thus impacting the viability of the microbes in the lysis and homogenization processes required for testing. Cannabis flowers contain both bacteria and fungi, further complicating fungal quantification by colony forming units (CFU) that lack speciation. Antibiotic selections are often used to reduce background bacteria, but many of these antibiotics (e.g. chloramphenicol) inhibit the growth of the most human pathogenic fungi found on cannabis (Fusarium, Pythium and Aspergillus) (Smith & Marchant, 1968; Day et al., 2009; Joseph et al., 2015).
As part of an AOAC Emergency Response Validation (ERV) in the State of Michigan, we investigated the impact of medium selection on surveying total yeast and mold on cannabis. Cured cannabis flowers were homogenized and tested on 3 different plating media. These media were chosen as they are referenced in the FDA Bacteriological Analytical Manual (BAM) (https://www.fda.gov/food/laboratory-methods-food/bacteriological-analytical-manual-bam). These data were compared to ITS3- and 16S-based qPCR and whole genome sequencing. To further complement these cannabis flower samples, organisms were acquired from the American Type Culture Collection (ATCC) and plated as pure monocultures on different plating media to confirm the differential growth on each medium.

Plating
Samples originated from Steadfast Analytical Laboratories (Hazel Park, MI) and were tested independently at a laboratory within the Michigan Coalition of Independent Cannabis Testing Laboratories. Briefly, 10 grams of dried cannabis flowers were sampled from three lots of homogenized cannabis containing high, medium and low quantities of fungal and bacterial CFUs, as measured using culture-based techniques with chloramphenicol selection. Ten grams of homogenized flower were soaked with 90 mL of Tryptic Soy Broth (TSB, Medicinal Genomics #420205) in a filtered Nasco Whirl-Pak bag (#B01385). Samples were homogenized by hand, and then 0.1 mL of solution was plated onto three media (DRBC, PDA with chloramphenicol, PDA, at 1:100 dilution). Two additional dilutions were prepared (10 mL into 90 mL) and the same plating protocol was followed. All plates were incubated for 5 days at 25°C.

qPCR
ITS3 qPCR was performed as described in McKernan et al. with two modifications. Briefly, 1 ml of homogenate from a Whirl-Pak bag was collected and briefly micro-centrifuged to enrich for live organisms. This pellet was resuspended in 200 μl of ddH2O, lysed with the addition of 12 μl of Thaumatin-like protein (TLP) and incubated at 37°C for 30 minutes. This enzymatic lysis step (glucanase) ensures more complete lysis of fungal cell walls (Medicinal Genomics part #420206, McKernan et al., 2015). Then 12.5 μl of MGC Lysis buffer was added, and the sample was vortexed and incubated for 5 minutes at 25°C. Lysed samples were micro-centrifuged and 200 μl of supernatant was aspirated and added to 250 μl of Medicinal Genomics binding buffer (MGC part #420001) for magnetic bead isolation. The samples were incubated with the Medicinal Genomics magnetic bead mixture for 10 minutes, magnetically separated and washed two times with 70% ethanol. The beads were dried at 37°C for 5 minutes to remove excess ethanol and eluted with 25 μl of ddH2O.
Quantitative PCR was performed using Medicinal Genomics PathoSEEK Total Yeast and Mold detection assay (MGC# 420103) and Medicinal Genomics PathoSEEK Total Aerobic Count Assay (MGC# 420106) according to the manufacturers' instructions on a BioRad CFX96 thermocycler.

DNA isolation from colonies for whole genome sequencing
A total of 45 colonies were picked with a pipette tip and introduced into 200 μl of ddH2O with 12.5 μl of MGC TLP (MGC part #420206). TLP is a glucanase active at 37°C. Samples were digested for 30 minutes at 37°C, then 12.5 μl of MGC Lysis buffer was added, and the samples were vortexed and incubated for 5 minutes at 25°C. Lysed samples were micro-centrifuged and 200 μl of supernatant was aspirated and added to 250 μl of MGC binding buffer (MGC part #420001) for magnetic bead isolation. The samples were incubated with the bead mixture for 10 minutes, magnetically separated and washed 2 times with 70% ethanol. The beads were dried at 37°C for 5 minutes to remove excess ethanol and eluted with 25 μl of ddH2O.

REVISED Amendments from Version 1: We have updated the manuscript to clarify human pathogens and plant pathogens. We have added references to the extensive prior art in the field scrutinizing ITS classification of cannabis microbes per Dr. Punja's suggestions. We have also expanded on the challenges of assessing endophytes with culture-based methods and how our study was restricted to only those colonies that could be cultured. Any further responses from the reviewers can be found at the end of the article.

Library construction for whole genome sequencing

Fragmentation
Genomic DNA (gDNA) was quantified with a Qubit (Thermo Fisher Scientific) and normalized to 4-8 ng/μl in 13 μl of TE buffer. Libraries were generated using enzymatic fragmentation with the NEB Ultra II kits (NEB part #E7103). Briefly, 3.5 μl of 5X NEB fragmentation buffer and 1 μl of Ultra II fragmentation enzyme mix were added to 13 μl of DNA. This reaction was tip-mixed 10 times, vortexed, and quickly centrifuged. Fragmentation was performed in a BioRad CFX96 thermocycler for 3.5 minutes at 37°C, followed by 30 minutes at 65°C. The reaction was kept on ice until ready for adaptor ligation.

[...] at least 10 times. The mixture was incubated for 5 minutes at 25°C. The PCR plate was placed on an appropriate magnetic stand (Medicinal Genomics #420202) to separate the beads from the supernatant. After the solution was clear (about 5 minutes), the supernatant was carefully removed and discarded. We were careful not to disturb the beads containing target DNA molecules. The magnetic beads were washed by adding 200 μl of 70% ethanol to the PCR plate while on the magnetic stand. This was followed by incubation at room temperature for 30 seconds, and then careful removal and discarding of the supernatant. The ethanol wash was repeated once for a total of 2 washes. Trace amounts of ethanol were removed. The beads were air dried for ~7 minutes while the PCR plate was on the magnetic stand with the lid open. The PCR plate was then removed from the magnet and target DNA was eluted from the beads into 10 μl of H2O, then 9 μl of cleaned DNA was transferred to a fresh well.

PCR amplification
A volume of 12.5 μl of 2x NEBNext Q5 Hot Start Master Mix (New England Biolabs #M0492S) was added to 9 μl of ligated DNA, then 3.5 μl of NEB 8 bp index primer/universal primer was added to the mix. The cycling program began with an initial denaturation step at 98°C for 30 seconds, followed by six cycles of denaturation, annealing and extension, cycling between 98°C for 10 seconds and 65°C for 75 seconds. A final 5-minute step at 65°C was performed, followed by a hold at 4°C.
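The cycling program above can be written down as a small data structure, which makes it easy to check the programmed run time. This is only an illustrative sketch: the variable and function names are hypothetical, the 65°C step is assumed to be a combined annealing/extension step, and ramp times and the final 4°C hold are ignored.

```python
# Thermocycler program transcribed from the text as (temperature in °C, seconds).
# Names are illustrative and not part of any instrument software.
INITIAL_DENATURATION = (98, 30)
CYCLE_STEPS = [(98, 10), (65, 75)]  # denaturation, then assumed anneal/extend
N_CYCLES = 6
FINAL_EXTENSION = (65, 300)


def programmed_seconds() -> int:
    """Sum of programmed hold times; ramping and the 4 °C hold are excluded."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


if __name__ == "__main__":
    total = programmed_seconds()
    print(f"Programmed hold time: {total} s ({total / 60:.1f} min)")
```

Under these assumptions the programmed hold time is 30 + 6 × 85 + 300 seconds.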

PCR reaction cleanup
AMPure XP beads were resuspended at room temperature with a brief vortex. A volume of 15 μl of resuspended AMPure XP beads was added to the PCR reactions (~25 μl). To mix well, we pipetted up and down at least 10 times. The mixture was incubated for 5 minutes at room temperature. The PCR plate was put on an appropriate magnetic stand to separate the beads from the supernatant. After the solution was clear (about 5 minutes), the supernatant was carefully removed and discarded. We were careful not to disturb the beads containing the target DNA. A volume of 200 μl of 70% ethanol was added to the PCR plate while on the magnetic stand. The mix was incubated at room temperature for 30 seconds, and then the supernatant was carefully removed and discarded. The ethanol wash was repeated once more. The beads were air dried for 7 minutes while the PCR plate was on the magnetic stand with the lid open. The target DNA molecules were eluted from the beads into 15 μl of nuclease-free H2O, and 15 μl was transferred into a fresh well.

Sample quality control
Libraries were evaluated on an Agilent TapeStation prior to pooling for Illumina sequencing. Sequencing was performed by GeneWiz, Cambridge MA. A total of 473 million paired reads (2 × 150 bp) were generated, averaging over 10 million read pairs per sample and a total sequence of 141 Gb.
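The yield figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the 473 million figure counts read pairs and that all 45 picked colonies were sequenced as separate libraries; both are our reading of the text, and the function names are illustrative.

```python
# Back-of-the-envelope check of the reported sequencing yield.
READ_PAIRS = 473_000_000  # total paired reads, from the text
READ_LENGTH_BP = 150      # 2 x 150 bp reads, from the text
N_SAMPLES = 45            # colonies picked for sequencing (assumed = libraries)


def total_bases(read_pairs: int, read_length_bp: int) -> int:
    """Each pair contributes two reads of read_length_bp bases."""
    return read_pairs * 2 * read_length_bp


def pairs_per_sample(read_pairs: int, n_samples: int) -> float:
    """Average number of read pairs per sample for an even pool."""
    return read_pairs / n_samples


if __name__ == "__main__":
    print(f"Total yield: {total_bases(READ_PAIRS, READ_LENGTH_BP) / 1e9:.1f} Gb")
    print(f"Pairs per sample: {pairs_per_sample(READ_PAIRS, N_SAMPLES) / 1e6:.1f} M")
```

Both values agree with the reported totals of roughly 141 Gb and over 10 million read pairs per sample.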

Analysis
Fastq files were uploaded to OneCodex for k-mer analysis and Simpson's diversity index analysis for each genome (Extended data: Supplementary [...] assembled with MegaHit v.1.2.9 (Li et al., 2015, 2016). The Nextflow mapping and assembly pipeline is published on GitHub. Quast 5.0 was used to calculate the assembly quality statistics (Gurevich et al., 2013). Sequencing data is deposited in NCBI under Project ID PRJNA725256.
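Among the assembly quality statistics that QUAST reports is N50. As a reminder of what that number means, here is a minimal sketch of the conventional N50 definition; it is not QUAST's implementation, and the contig lengths in the usage example are made up for illustration.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # unreachable for positive lengths; kept for safety


if __name__ == "__main__":
    # Hypothetical contig lengths (bp), not taken from the study.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A fragmented low-coverage assembly has many short contigs and therefore a low N50.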

Results
Each colony that was imaged on plates and chosen for whole genome sequencing and OneCodex analysis is displayed in Figure 1 (DRBC), Figure 2 (PDA-chloramphenicol) and Figure 3 (PDA no chloramphenicol). A link to each OneCodex analysis and its respective NCBI submission ID is available in Supplementary Table 1 - Sheet Summary (Extended data, McKernan et al., 2021). Some of the colonies from the plate merged with other colonies, producing mixtures of genomes as evident in the OneCodex pie charts. These merged colonies were further evidenced by the display of bimodal sequence coverage (clusters of contigs at 1000X and 10X coverage) and compared with the plating images (Figure 4). A heatmap of sequencing read speciation and purity is seen in Figure 5. While merged colonies can be difficult to resolve visually, whole genome sequencing can resolve simple metagenomes and still extract additional diversity information from the samples. Colonies that were noticeably mixed according to sequence analysis and colony visual inspection were more prevalent among the colonies from PDA without selection (Table 1). A Simpson's diversity index analysis demonstrated that PDA with chloramphenicol (CAMP) provides the highest diversity score (Figure 6). While the DRBC had 100-fold lower CFU counts than PDA without selection, it predominantly displayed fungal colonies (80%) while PDA without selection was biased toward bacteria (22%). PDA with chloramphenicol displayed more fungi (55%) than bacteria and also produced a half log more fungal colonies than DRBC with chloramphenicol (Table 2).
One fungal sample (Cladosporium) presented delayed Ct (31.79) with PathoSEEK Total Yeast and Mold (ITS3-TYM) qPCR primers. Scrutiny of the primer sequences against the Cladosporium genome shows proper primer binding locations but missing probe sequences. This genome has low coverage (10X) and the repetitive ITS qPCR target regions are often poorly assembled in low-coverage genomes. This may explain the missing probe sequence in the low-coverage fragmented assembly. Additionally, some significantly delayed PathoSEEK Total Aerobic Count (TAC) signal was observed in fungal colonies. This is the result of the use of the lytic enzyme (TLP), which is cloned and expressed in E. coli and contains some background E. coli DNA. This background E. coli DNA produces signals that can be seen in blank preparations. In some cases, this signal is elevated due to mixed colonies observed in the sequencing data.
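For readers unfamiliar with the Simpson's diversity index referred to above, the following sketch computes it from per-taxon read counts. It assumes the 1 − Σp² (Gini-Simpson) form, where higher values mean higher diversity; the exact variant computed by OneCodex is not stated here, and the read counts in the example are hypothetical.

```python
from typing import Sequence


def simpson_diversity(counts: Sequence[float]) -> float:
    """Return 1 - sum(p_i^2), where p_i is the fraction of reads in taxon i."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    # Hypothetical read counts per taxon, for illustration only.
    pure_colony = [1000]
    merged_colony = [500, 500]
    print(simpson_diversity(pure_colony))    # a single taxon gives no diversity
    print(simpson_diversity(merged_colony))  # an even two-taxon mix scores higher
```

With this form, a pure colony scores 0 and the score approaches 1 as more taxa share the reads evenly, which is why merged colonies and more permissive media raise the index.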
The qPCR method represents an increase in selectivity in assessing fungal and bacterial CFU compared to DRBC, where only ~92% of the colonies were fungal colonies. Quantitative PCR identified all fungi and never mistook one for bacteria. In a minority of cases we had visually mixed colonies. Even if we discount the mixed colonies and count only the single bacterial colony out of 13 on DRBC, we obtain 92% (12/13) fungal colonies on DRBC, where qPCR delivered perfect results. As a comparison, quantitative PCR demonstrated differences of over 10 Ct (1,024-fold) between the TYM and TAC signals on fungal colonies. The majority of the residual TAC signal observed in fungi can be normalized and discounted using the background E. coli TLP DNA signal measured in blank preparations.
To confirm these observations, several Aspergillus species and Botrytis cinerea were ordered from ATCC and plated on various plating media in the absence of background cannabis matrix (Table 3 and Figure 7). In all cases DRBC showed reduced CFU counts.
[...] techniques, but ITS-based methods can predict their inclusion and exclusion organisms in silico and a priori, where culture-based methods cannot.
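The conversion between a Ct difference and a fold difference used above follows from the doubling of product in each PCR cycle. The sketch below assumes 100% amplification efficiency (exact doubling per cycle), which is an idealization; the function name and the example Ct values are illustrative.

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Fold difference in starting template implied by two Ct values.

    efficiency=1.0 means perfect doubling per cycle (an assumption);
    real assays typically fall somewhat below this.
    """
    return (1.0 + efficiency) ** abs(ct_a - ct_b)


if __name__ == "__main__":
    # Hypothetical Ct values 10 cycles apart.
    print(fold_difference(20.0, 30.0))
    # Fraction of fungal colonies when 1 of 13 colonies is bacterial.
    print(f"{12 / 13:.0%}")
```

A 10-cycle gap therefore corresponds to 2^10 = 1,024-fold under perfect doubling, and 12 fungal colonies out of 13 is about 92%.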

Discussion
In this study the DRBC selection reduced bacterial growth more than PDA with chloramphenicol, but also reduced the fungal CFU 5-fold in the process. This has important implications for chloramphenicol-sensitive cannabis endophytes like Aspergillus, Pythium and Fusarium. Cannabis endophytes are an important consideration in this work, as endophytes can colonize both the inside and outside of the plant and methods used to quantitatively assess them need to lyse open plant cell walls. These conditions also lyse open pathogen cell walls and cell membranes, rendering the pathogens nonculturable. Many of the pathogens listed for cannabis testing are documented plant endophytes, including E. coli, Salmonella, Listeria and Aspergillus (Li et al., 2013; Wright et al., 2013; Kljujev et al., 2018a, 2018b). This presents challenges when attempting to benchmark molecular methods against culture-based platforms incapable of detecting endophytic pathogenic risk. This sequencing was performed only on colonies that were identified through culture and thus does not include the complete endophytic diversity of the cannabis samples.
Both media types (PDA and DRBC) are referenced in the FDA Bacteriological Analytical Manual. States exclusively considering DRBC for ease of colony visualization should be aware of the species-specific sensitivities of using a single medium type, and consider species-specific testing for such human pathogenic organisms to complement the partial yeast and mold test offered by a single selection-based medium. PCR-based techniques can identify more organisms than DRBC alone, as no selection occurs provided thorough cell lysis is achieved for qPCR analysis. This is not a surprising result, as dichloran was developed as a medium component designed to suppress the growth of rapidly growing molds and bacteria (Henson, 1981).
Plating also suffers from a very limited dynamic range. Because it is difficult to count colonies when more than 100 colonies are present on a plate, multiple dilutions are often required to cover the full range of CFU counts one may encounter with a test that is attempting to quantify 10,000 CFU/gram. This results in multiplying diluted CFU counts 10-, 100- and even 1,000-fold to back-estimate the total CFU count. In this scenario a single colony can swing the CFU count from passing to failing (9 colonies × 1,000-fold dilution vs 10 colonies × 1,000-fold dilution). Quantitative PCR has a linear dynamic range over 5-6 orders of magnitude and no such multiplication is required. Thus, qPCR provides a more accurate enumeration of actual CFU counts.
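The single-colony swing described above can be made concrete with a small back-calculation. This is a sketch under stated assumptions: the dilution factor is taken to already include the plated volume, a result at or above the 10,000 CFU/g figure is treated as failing, and the function names are illustrative rather than part of any regulatory method.

```python
ACTION_LIMIT_CFU_PER_G = 10_000  # figure quoted in the text; treated here as a limit


def cfu_per_gram(colonies: int, dilution_factor: int) -> int:
    """Back-estimate CFU/g by multiplying the plate count by the total dilution."""
    return colonies * dilution_factor


def passes(colonies: int, dilution_factor: int,
           limit: int = ACTION_LIMIT_CFU_PER_G) -> bool:
    """Assumed rule: a sample passes only if its estimate is below the limit."""
    return cfu_per_gram(colonies, dilution_factor) < limit


if __name__ == "__main__":
    for colonies in (9, 10):
        estimate = cfu_per_gram(colonies, 1_000)
        verdict = "pass" if passes(colonies, 1_000) else "fail"
        print(f"{colonies} colonies at 1,000-fold: {estimate} CFU/g -> {verdict}")
```

At a 1,000-fold dilution each colony on the plate represents 1,000 CFU/g, so the counting resolution near the limit is a full 10% of the limit itself.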
In-vitro inclusion and exclusion testing with ITS3 qPCR on ATCC-sourced organisms demonstrated over 96% inclusion (50 yeast and mold) and zero bacterial cross reactivity (30 bacteria [...]

This project contains the following extended data: Summary [...]

Open Peer Review
○ The inclusion of the 3 selection media should be elaborated on. Why were these particular 3 media selected? Provide references to show where, or in what capacity, they may have been used in previously published work. There is abundant published work on the use of PDA with antibiotics to isolate fungi in the plant pathology literature. In fact, it is a standard medium used for isolation in labs worldwide. The addition of dichloran and rose bengal has also been used to restrict the growth of certain groups of fungi and bacteria as a semiselective medium for isolation, in particular from soil samples. Therefore, it would not be expected to provide a broad spectrum of recovery of fungi and yeasts. It is surprising that this medium would be used to assess total yeast and mold counts in cannabis.
○ The comparison of the 3 media in this study sheds light on the differences in levels of recovery of fungi and yeasts. This is an important finding: not all media behave in the same manner. To observe a 10-fold difference in recovery between these media is quite significant as it illustrates the potential for under-representation in the recovery process.
○ The use of whole genome sequencing to apply to the identification of colony-forming units is a definite plus for this work. It shows the ability to rapidly identify what is present on the culture media with regard to molds that originated from the samples.
○ There are several prior reports of authors having recovered a range of fungi from cannabis buds and identified them using the ITS region. Please include these as a reference by which to compare the fungi and yeasts identified in the present study. It is important to build a body of knowledge on the exact identity of the genera and species found on cannabis and how prevalent they are.
○ The report of endophytes in cannabis should be accompanied by a reference citation. These particular microbes are more difficult to recover in culture media and therefore a molecular approach has merit.
○ The inclusion of confirmed ATCC culture specimens to demonstrate differences in growth on the 3 media is a good confirmatory experiment.
○ The cannabis samples that originated from Steadfast Analytical Laboratories would have had an analysis of total yeast and mold conducted on them. Is it possible to have these results compared to those from the present study to show how the commercial lab testing may differ from the current study? Or was that not an objective of the current study?
○ During the preparation of samples for the ITS3 qPCR procedure, was there a subset of samples included that did not contain the TLP lysis step to show that it made a difference? Or is that included in prior published work?
○ For the 45 colonies that were selected for whole genome sequencing, could the identified genus and species be presented in a separate table? Perhaps in accordance with the media from which they were derived? These would be a summary of what is shown in Figures 1, 2, 3. This is in addition to the OneCodex analysis and the NCBI submission ID available in Supplementary Table 1. It also helps clarify the data shown in Figure 5.
○ The Simpson's diversity index analysis shown in Figure 6 is extremely helpful to show the differences between the 3 media types in recovery.
○ The results from qPCR of the homogenate that was collected from the Whirl-Pak bags and subjected to PathoSEEK. How did this compare with the colony identification of the same sample plated on the 3 different media with regards to the identification of the genus and species present? Can this be shown in a Table?
○ The use of DRBC, if conducted by testing laboratories, is worrisome. It is known that the addition of dichloran and rose bengal is specifically used to discourage certain types of microbes from growing when used for recovery of specific types of fungi from soil samples. The inclusion of DRBC in a testing laboratory for cannabis TYM counts should be discouraged, as shown in the present work where total CFUs recovered were significantly lower compared to PDA with chloramphenicol. DRBC would significantly under-estimate the TYM counts as shown in Figure 4.
○ In Table 2, the headings seem incorrect as there are two with "PDA with chloramphenicol" and one should be "PDA w/o chloramphenicol".
○ Table 3 and Figure 7 clearly show how DRBC provides reduced growth compared to PDA.
○ Overall, this is an informative study and the results merit publication. Once the items identified by the reviewer are addressed, this study will be a good addition to the slowly expanding studies showing how complex the assessment of total yeast and mold levels in cannabis is. The information from these types of studies should guide government agencies on the pitfalls of certain methods used to assess TYMC.

Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes

Response:
We were not involved in the selection of these media types. The study was initiated on PDA in Michigan. After we completed the PDA study, the organizers informed us they were switching to DRBC based on Steadfast having used chloramphenicol based culture platforms to categorize the samples before shipping them to other labs in Michigan. We suspect the 3 media types were chosen due to their presence in the FDA BAM. We have added a sentence to clarify this.

○ The comparison of the 3 media in this study sheds light on the differences in levels of recovery of fungi and yeasts. This is an important finding: not all media behave in the same manner. To observe a 10-fold difference in recovery between these media is quite significant as it illustrates the potential for under-representation in the recovery process.
[...] Figures 1, 2, 3. This is in addition to the OneCodex analysis and the NCBI submission ID available in Supplementary Table 1. It also helps clarify the data shown in Figure 5.
Response: This is an important point. This information does exist in Figure 5, but we failed to explain the sample nomenclature that makes it clear. We have added a sample key to describe which samples are DRBC, PDA-CAMP and PDA-no-CAMP.

○ The Simpson's diversity index analysis shown in Figure 6 is extremely helpful to show the differences between the 3 media types in recovery.
[...] Figure 4.
Response: We were only allowed to ship cultures on plates across state lines. As a result, we have Cq scores for the colonies that were picked and isolated in Figures 1, 2, 3 under the TYM and TAC Cq columns on the right. This only informs on the inclusion and exclusion capabilities of the primers for the colonies harvested but loses quantitative information. We were not allowed to ship homogenized matrix in the mail to assess the Cq prior to plating. Labs local to Michigan have performed this comparison and are free to publish those results. The summary of the results communicated to us was that the qPCR had better concordance with PDA with CAMP and overestimated CFUs on the DRBC Low samples. This differs significantly from Michigan's stated intentions with the ERV, where they voiced concerns about molecular methods undercounting risk (https://help.medicinalgenomics.com/hubfs/Regulatory%20Info%20for%20Sales/Michigan%20MRA%2 [...]). The opposite turned out to be true. DRBC is undercounting risk compared to qPCR. The MRA was also led to believe that Klebsiella was not an appropriate validation organism as it was not commonly found on cannabis, despite it having been published by Thompson et al. previously. Candida albicans (which we have never seen documented on cannabis) was prioritized as a CRM. We have added some language to address this to the best of our ability.

○ In Table 2, the headings seem incorrect as there are two with "PDA with chloramphenicol" and one should be "PDA w/o chloramphenicol".
[...] chemotype effects on culture. This is a very good idea, as we have seen chemotype-specific effects on cannabis microbiology, and such effects have been published for Trema orientalis (https://pubmed.ncbi.nlm.nih.gov/34035994/).
One of the concerns with plating is that the antibiotic cannabinoids and terpenes may be liberated from trichomes in the aggressive lab homogenization and media saturation process. This may influence the viability of some of the microbes. This aggressive homogenization and fluid saturation is not what a consumer experiences.
Many publications demonstrate the antibiotic nature of cannabinoids and how different cannabinoids exhibit different antibiotic properties. We should therefore expect different chemotypes to plate differently, given that there is no purification step prior to plating (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7105690/). Molecular methods lyse open cells and purify the DNA away from such potential growth inhibitors with ethanol extractions.
To confirm these samples were indeed a mixture of different cannabis samples, we were sent DNA from these mixtures and performed 10 Mb SureSelect capture and deep Illumina sequencing on these samples to understand how well they were mixed and how diverse they were. The read genotypes indeed suggested that more than a single cannabis sample was present, and likely more than 4, in each of the High, Medium and Low categories.
We have made these data public for anyone who is interested, but felt that including them would bloat this manuscript with confirmatory data and distract from the core focus of the study.