Sequencing data from Massachusetts General Hospital shows Cas9 integration into the genome, highlighting a serious hazard

The ability to edit a specific gene within our genomes using guided nucleases (Cas9/ZFN/TALEN; CaZiTa) presents huge opportunities for curing many genetic disorders. Delivery of this ‘drug’ within cells is a critical step for such therapies. The ability of recombinant adeno-associated virus (rAAV) to enter cells makes it a perfect choice as a vector for gene therapy. A plasmid comprising the rAAV, the CaZiTa, and guide RNAs (for CRISPR) is expected to enter the cell, edit the target gene(s), remain episomal, and thus fade away with time. However, the rather obvious danger of integration of the plasmid into the genome, if the episomal hypothesis is incorrect, is under-reported. A recent report has highlighted that bacterial genes from a plasmid were integrated into bovine genomes. Massachusetts General Hospital has recently published data on CRISPR edits (Accid:PRJNA563918), noting ‘high levels of AAV integration (up to 47%) into Cas9-induced double-strand breaks’. However, there is no mention of Cas9 integration. Here, the same data from Massachusetts General Hospital are shown to contain Cas9 integrated at the exact edit sites provided for two genes, TMC1 and DMD. In addition, one sample appears to be mis-annotated as ‘no gRNA’, since Cas9 integrations have been detected in that sample. There is an important distinction between AAV and CaZiTa integration: while AAV integration can be tolerated, Cas9 integration is a huge, and unacceptable, danger.


Introduction
Nuclease-based gene-editing techniques (Cas9/ZFN/TALEN; CaZiTa) introduce a double-stranded break at a specified location under the guidance of DNA-binding proteins (ZFN/TALEN) or RNA (CRISPR) 1 . Delivery within cells is a critical step for such gene therapies 2 . The ability of recombinant adeno-associated virus (rAAV) to enter cells makes it a perfect choice as a vector for gene therapy 3 . A plasmid comprising the rAAV, CaZiTa, guide RNAs (for CRISPR), etc. is expected to enter the cell, edit, and remain episomal 4 . However, the rather obvious danger of integration of the plasmid into the genome is ignored. A recent pre-print 5 has highlighted that bacterial genes from the template plasmid (pCR2.1-TOPO) have integrated into bovine genomes 6 .
Recently, Massachusetts General Hospital published data on CRISPR edits (Accid:PRJNA563918), and concluded that 'AAV integration should be recognized as a common outcome for applications that utilize AAV for genome editing' 7 . However, there was no report of Cas9 integration. Here, the same data are shown to contain Cas9 integrated at the exact edit sites. The same caveat applies to all three CaZiTa gene therapies 1 .

Results and discussion
Cas9 integration in transmembrane channel like 1 (TMC1)
TMC1, a transmembrane protein, is required for proper functioning of cochlear hair cells, and has been implicated in hearing loss and prelingual deafness 10 . In mice, this gene is located on chr19 (Accid:NG_008213.1). Table 1 lists the samples sequenced for targeting this gene. As expected, the non-injected controls have no Cas9 reads. The sample SRR10068671 (marked with triple asterisks) has probably been mis-annotated as one with "no gRNA", since Cas9 integrations are detected in that sample. From the Cas9 integration site, the guide RNA (gRNA) for the TMC1 gene can be deduced to be "CATGGTAATGTCCCTCCTGGGGA", although this information is not yet available. The sequences for these reads are provided as extended data. As a specific example, the sequence SRR10068639.63180.1 encodes the 26 aa peptide SPEKLLMYHHDPQTYQKLKLIMEQYG from Cas9, merged with the TMC1 gene (Figure 1).
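The read screening summarised in the caption of Table 1 (pool the paired reads, remove duplicates, translate open reading frames, and keep reads that match Cas9 over 10 aa) can be illustrated with a minimal Python sketch. This is not the script used for the analysis: the function names are hypothetical, the 10 aa match is assumed here to mean an exact shared 10-residue stretch, translation uses the standard genetic code in all six frames, and the subsequent matching of Cas9-positive reads to TMC1 or DMD is omitted.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_translations(seq):
    """Yield the translation of a read in all six reading frames."""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            codons = (strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3))
            # Codons containing ambiguous bases (e.g. N) become 'X'.
            yield "".join(CODON_TABLE.get(c, "X") for c in codons)


def peptide_kmers(protein, k=10):
    """Return the set of all k-residue windows of a protein sequence."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def matching_stretches(segment, kmers, k=10):
    """Return maximal stretches of a stop-free segment covered by Cas9 k-mers."""
    stretches = []
    i = 0
    while i + k <= len(segment):
        if segment[i:i + k] in kmers:
            j = i
            # Extend the run while the next window is also a Cas9 k-mer.
            while j + 1 + k <= len(segment) and segment[j + 1:j + 1 + k] in kmers:
                j += 1
            stretches.append(segment[i:j + k])
            i = j + 1
        else:
            i += 1
    return stretches


def screen_reads(reads, cas9_protein, k=10):
    """Map each unique read to the Cas9-matching peptides it encodes."""
    kmers = peptide_kmers(cas9_protein, k)
    hits = {}
    for read in set(reads):  # de-duplicate the pooled paired reads
        found = [peptide
                 for translation in six_frame_translations(read)
                 for segment in translation.split("*")  # stop-free stretches
                 for peptide in matching_stretches(segment, kmers, k)]
        if found:
            hits[read] = found
    return hits
```

Calling `screen_reads(reads, cas9_protein)` with the pooled reads of a sample and a Cas9 protein sequence returns only the reads that encode at least 10 consecutive Cas9 residues, together with the matched peptides; those reads would then be aligned to the target gene to locate the junction.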

Cas9 integration in dystrophin
Duchenne muscular dystrophy (DMD), a genetic disorder associated with progressive muscle degeneration (dystrophy), is caused by aberrations in the dystrophin protein, whose gene is located on chromosome X in humans and mice 11 . According to the data (Accid:PRJNA563918), different exons and introns have been targeted using two variants of Cas9, SpCas9 and SaCas9, which differ in certain characteristics 12 . Empirically, SaCas9 does seem to have more samples without any integration, but this may be due to chance. The sequences for these reads are provided as extended data. As a specific example, the sequence SRR10068622.33932.1 encodes the 12 aa peptide LDATLIHQSITG from Cas9, merged with the DMD gene (Figure 2).

AAV integration in TMC1 and DMD
The integration of AAV into the genome has already been noted 7 . This has been replicated in this study as well. The AAV genome used was Accid:MK163936.1. The reads can be found in Extended data: TMC1.aav.fa (N=5000) and DMD.aav.fa (N=14000).

Conclusions
Gene-editing based therapies provide revolutionary hope for curing many diseases. However, ensuring the safety of any such endeavours must be paramount to avoid doing more harm 13,14 . Plasmid integration is one such potential hazard 5 . Recently, Massachusetts General Hospital reported high levels of in vivo AAV integration into the genome while providing sequencing data (Accid:PRJNA563918), and concluded that 'AAV integration should be recognized as a common outcome for applications that utilize AAV for genome editing' 7 . In this study, in addition to AAV, Cas9 integration is shown in the same samples. The same caveat applies to gene therapies using CaZiTa 1 . Off-target edits (OTE) are an important aspect of CRISPR-Cas gene editing 15-19 . Such integrations, found here by targeted amplicon sequencing, will only get worse due to OTEs, which are hard to detect 20-22 . Another problem is pre-existing immunity to Cas9 proteins 23,24 . This problem can be mitigated by delivering the CaZiTa as protein 25 , but that would seriously restrict the use-cases, and would still suffer from OTEs, large deletions, complex rearrangements and translocations 26 , or even the inclusion of fragments from exosomes 27 . There is an important distinction between AAV and CaZiTa integration: while AAV integration can be tolerated (integration at chromosome breakage points 28 , though there is debate on its role in hepatocellular carcinoma 29,30 ), Cas9 integration is a huge, and unacceptable, danger.

Table 1. Integration of Cas9 into the edited gene. The paired reads are merged into a single file and de-duplicated. Open reading frames from these reads are then matched to Cas9 proteins over 10 aa, and the matching reads are subsequently matched to the gene of interest (TMC1 or DMD), which helps identify the exact edit site. The sample SRR10068671 (marked with triple asterisks) has probably been mis-annotated as one with "no gRNA", since Cas9 integrations are detected in that sample. These integrations happen for both SpCas9 and SaCas9. From the Cas9 integration site, the gRNA for the TMC1 gene can be deduced to be "CATGGTAATGTCCCTCCTGGGGA", although this information is not yet available. CN, cortical neuron; CA, cochlea, adult.

This project contains the following extended data:

Rakesh Chatrikhi
Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Chakraborty studied sequencing data on CRISPR-induced DNA edits from Massachusetts General Hospital to show Cas9 integration into the genome. The author analyzed previously published data on CRISPR-induced DNA edits to highlight integration of Cas9 into the genome, which was not reported in the previously published study that generated the sequencing data. This study highlights risks involved in using gene-editing techniques such as the CRISPR-Cas9 based system and raises important caveats that need to be considered when using the technology for therapeutics. However, the author needs to address the following points to strengthen the manuscript:
The author needs to quantitatively compare the levels of AAV and Cas9 integration and assess whether the difference is statistically significant. This is important because the author is using the same samples and sequencing data to discuss Cas9 integration into the genome, which was not reported earlier, and such a comparison is required to effectively highlight the results of this study. The author needs to make a clear comparison of AAV and Cas9 integration from the data in the Results and Discussion section.
The author needs to support the argument that Cas9 integration is a serious hazard and a huge, unacceptable danger. This is important because the author argues that Cas9 integration is hazardous throughout the manuscript (title, abstract and conclusion). The author needs to include multiple references to previous work in the literature that has shown or highlighted the toxicity of Cas9 integration into the genome and that it is hazardous.
In the Conclusions section, the author needs to briefly discuss the alternative approaches that could be used to avoid Cas9 integration. For example, the author needs to elaborate on the use of purified Cas9 ribonucleoproteins as a substitute for plasmids. The author included a sentence on this topic in the Conclusions section, but needs to elaborate on the advantages of using Cas9 ribonucleoproteins. For example, in addition to avoiding Cas9 integration, this method has higher efficiency (Farboud B et

5. The author needs to include the rationale for focusing on the genes TMC1 and DMD. Were those genes studied in the previous study that generated the data (Ref 7)? This needs to be clearly stated. If there are more genes studied in Ref 7, then the author needs to include the rationale of choosing these two genes over other genes in the study.