A draft reference assembly of the Psilocybe cubensis genome

We describe the use of high-fidelity single molecule sequencing to assemble the genome of the psychoactive Psilocybe cubensis mushroom. The genome is 46.6Mb, 46% GC, and in 32 contigs with an N50 of 3.3Mb. The BUSCO completeness scores are 97.6% with 1.2% duplicates. The Psilocybin synthesis cluster exists in a single 3.2Mb contig. The dataset is available from NCBI BioProject with accessions PRJNA687911 and PRJNA700437.


Introduction
There are hundreds of mushrooms capable of synthesizing the psychoactive compound psilocybin. This compound has been classified as a "breakthrough therapy" for depression by the FDA (Johnson and Griffiths 2017). The psilocybin pathway was identified by Fricke et al., but to date no public references exist in NCBI with N50s longer than 50kb (Fricke et al. 2017;Blei et al. 2018;Fricke et al. 2019a;Fricke et al. 2019b;Blei et al. 2020;. A more contiguous genome assembly can assist in further resolution of the genetic diversity in the fungi's secondary metabolite production.

DNA isolation
Dried stems from Psilocybe cubensis strain P.envy. The strain name is anecdotal reported to have been grown axenically (unknown conditions) and obtained in Somerville, MA, US. These samples were used for isolation of high molecular weight (HMW) DNA using a modified CTAB/Chloroform and SPRI protocol. Briefly, 300mg of stem sample were ground to a fine powder using a -80C frozen mortar and pestle. 150 mg of powder was then aliquoted into 2 mL conical tubes (USA Scientific) with 1.5 mL cetrimonium bromide. These tubes were then incubated at room temperature on a tube rotator for 10 minutes. 6 uL of RNase A (Promega 4 mg/mL) was then added and both tubes were incubated at 37°C for one hour, vortexing every 15 minutes. Following this incubation, 7.5 uL Proteinase K (New England Biolabs 20 mg/mL) was added and the tubes were incubated at 60°C for 30 minutes, vortexing every 10 minutes. At the conclusion of the Proteinase K incubation, both tubes were incubated on ice for 10 minutes. The samples were then centrifuged for 5 minutes at 14000 rpm. 600 uL of supernatant was removed from each tube and added to 600 uL chloroform. The tubes were then vortexed until opaque and spun for 5 minutes at 14000 rpm. 400 uL of the aqueous layer was removed using a wide bore tip and added to a 1.5 mL Eppendorf tube. 400 uL MIP (marijuana infused products) Solution B and 400 uL DNA Binding Beads (Medicinal Genomics PN 420004) were added to the Eppendorf tube and inverted to homogenize. The tubes were then incubated at room temperature on the tube rotator for 15 minutes. The tubes were then removed from the rotator and placed on a magnetic tube rack for 3 minutes. The supernatant was removed, the beads were washed twice with 1 mL of 70% ethanol and allowed to dry for 5 minutes. The beads were then eluted in 100 uL of 56°C Elution Buffer (Medicinal Genomics PN 420004) using a wide bore tip and incubated at 56°C for 5 minutes. Following this incubation, the tubes were returned to the magnetic rack, the supernatant of both tubes were removed using a wide bore tip and pooled in a fresh Eppendorf tube. HMW DNA length was quantified on an Agilent TapeStation and produced a DIN of 8.1. Qubit Fluorometer (Thermo Fisher Scientific) quantified 55ng/ul. Nanodrop Spectrophotometer (Thermo Fisher Scientific) quantified 104ng/ul with 260/280nm ratio of 1.85 and 260/230nm of 0.95.

Sequencing
Sequencing libraries were constructed according to the manufacturer's instructions for Pacific Biosciences Sequel II HiFi sequencing. 773,735 CCS reads were generated. Quast (Gurevich et al. 2013) was used to assess the quality of the input fasta sequence file (N50 = 13.9Kb) and the output assembly fasta file (3.33Mb N50).

REVISED Amendments from Version 1
To address the reviewers' very helpful comments 1. We have included more descriptions of the SNP calling including a Github version of the code one can run to reproduce this.
2. We have expanded the analysis to include variant calling on the HiFi reads mapped back to their own reference and in doing so recalled the SNPs from the MGC unknown strain mapped to the P.envy reference utilizing the same code for a controlled comparison. This leveraged different variant callers and produced more variants.
3. We have improved the readability of figure tracks.
4. Adjusted Title language, mushroom number and Fungi vs Fungus typo.
5. Addressed the Anexic growth and bacterial contamination concerns.
6. Clarified these citations are to NCBI/SRA submissions which currently do not have a DOI publication associated with them.
7. We have clarified the text to underscore the importance of other variants in the pathway. We have included his suggested reference here and added links to a SNPeff file that can be used to compare these variants to those he listed.
8. In performing SNP calling on the HiFi reads mapped back to the reference we note that 98% of the variants found are heterozygous variants with balanced alleles. Coverage maps can also be viewed in the CoGe genome browser provided to address additional questions regarding aneuploidy/multiple nuclei but coverage for all scaffold is very consistent with the exception of scaffold_26 which is mitochondria.
Any further responses from the reviewers can be found at the end of the article
Three Illumina genome assemblies (Reynolds et al., McKernan et al., Fricke et al.) were additionally aligned using MUMmer for whole genome alignment plots ( Figure 2).

Structural variation
The N-methyltransferase gene responsible for Psilocybin production in P.envy contains a structural variation not seen in previous P. cubensis surveys (Figure 3). Illumina read mapping of the McKernan et al. P. cubensis assembly in NCBI (NCBI Project: PRJNA687911) demonstrates multiple read pairs spanning a 4.6kb insertion in the HiFi P. cubensis strain P.envy (SRA submission SRP299420). This insertion extends the 3' end of the P.envy N-methyltransferase gene. The 4.6kb insertion is also observed as a deletion in Psilocybe cyanescens and as a deletion in RNA-Seq data from Torrens-Spence et al. (NCBI Project: PRJNA450675) (Reynolds et al. 2018). Other SNPs also exist in these genes and should be considered in context of this deletion. Further work is required to understand the biological significance of this variation.

Conclusions
A highly contiguous Psilocybe cubensis genome has been generated. The N50 contigs lengths are 75 fold more contiguous than the existing assembly available at JGI. This reference can aid in the identification of genetic variation that may impact psilocybin, psilocin, norpsilocin, baeocystin, norbaeocystin and aeruginascin production. McKernan and colleagues present on the first highly contiguous draft genome for the magic mushroom Psilocybe cubensis. We commend their use of High accuracy long read sequencing and an advanced bioinformatics pipeline to build a much more complete picture of the P. cubensis genome and for making it openly available to the public with the promise the genetic architecture of tryptamine expression in magic mushrooms.

Data availability
The methods employed are state of the art and the authors provide sufficient access to the data to enable peers and the public to replicate the experiment. While they acknowledge that the HiFi sequencing approach comes with great advantages, in particular greatly improved contiguity and and BUSCO completeness scores compared to other P. cubensis genomes published to date, the authors did not acknowledge that fungi can have multiple nuclei in a cell, sometimes with completely different haplotypes. As such, we posit that their assembly could possibly be a metagenome assembly, rather than the assembly of a single genome, thus providing an alternative explanation to the large insertion detected in the norbaeocystin methyltranferase (psiM) gene.
Perhaps a means to provide a remedy to this is to provide some additional background on the P. cubensis Penis Envy (PE) strain, in particular the alleged origin of the PE mutant and its probable clonal propagation. While anecdotal at best, the fabled "mutation" of PE appeared and was selected from a phenotype of an amazonian P. cubensis accession, as a towering fruiting body with a pale cap and missing partial veil, which was then preserved via clonal propagation. The mutant is also described as being more potent that most other P. cubensis strain, leading to the hypothesis that it had a skewed drug to prodrug ratio (psilocyn/psilocybin) which would hint to a mutation in the psiK gene as opposed to the large insertion in psiM.
Other putative mechanisms could be polymorphisms at other loci involved in the psilocybin biosynthetic pathways as well as ancillary genes involved in the SAM salvage pathway (e.g. ref 1), a list of putative functional SNPs that may interact with the large insertion is shown here from an earlier version of the P.cubensis genome. Genotyping several strains at the 4.6kb insertion and ancillary SNPs may help shed light on the mechanism behind the higher perceived potency of PE compared to other P. cubensis strains and other species in the Psilocybe and Panaeolus genus, In that vein, the authors may gain additional insight by including the P. serbica var bohemica genome to their comparative analysis, provided that chemotypic information associated with each accessions is made available. The study was undertaken in order to provide a high quality reference genome for the species. To date, the genomes in the species and genus are fragmented more than is desirable for basic and applied comparative investigations of genome content and architecture. Are the protocols appropriate and is the work technically sound?

○
The Pacific Biosciences Sequel II HiFi methods used for sequencing are among the best for generating near-chromosome level assembly. Assembly was performed with cutting-edge Peregrine Assembler, and the annotation appropriately used the fungus-specific FunAnnotate pipeline. Single nucleotide polymorphisms (SNP) were called with appropriate software, but parameters were not detailed in the text. This and structural variation were not intended to be exhaustive, but provide intriguing statistics and examples to warrant follow-up investigations. Are sufficient details of methods and materials provided to allow replication by others?
○ Parameters for SNP calling would have to be further detailed in order to allow replication of raw SNP numbers between two isolates. Are the datasets clearly presented in a useable and accessible format?
○ Figure 1 and both tables are clear and informative. Figure 2 readability would be improved by increasing the size of the axis titles. Figure 3 is not sufficiently informative or simple to acquire meaning as it is currently presented. This figure would benefit from marking the "tracks" clearly, but number and perhaps with additional labels for RNAseq, P. cyanescens, and annotation as the IGV display is too small to read as is. Is there significance to the locus that is presented in Figure 3? If not, then perhaps indicate it is a "representative" locus.

Other comments:
The title might flow better as "A draft reference assembly of...".

○
In the introduction, "several mushrooms" might better be stated "about 200 mushroom species". ○ In the Introduction " fungi's " would be more syntactically correct as "fungus' " ○ Given that dried stems were used for genomic DNA isolation, it is expected that some additional microbial DNA might be present. Authors should note if the mushroom was produced axenically, or if contaminant reads were filtered to either prevent or address presence of additional species' genomes in the assembly.

○
Citations are incomplete in the last sentence of "Assembly and annotation" section, and second sentence of "Structural variation" section.

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly