Method Article

A unified GenomeSpace recipe to identify essential genes and associated subnetworks from Genome-Scale CRISPR-Cas9 knockout screens

[version 1; peer review: 2 approved with reservations]
PUBLISHED 12 Oct 2018

This article is included in the Genomics and Genetics gateway.

This article is included in the Bioinformatics gateway.

This article is included in the GenomeSpace collection.

Abstract

We present a unified GenomeSpace recipe that combines the results of a high-throughput CRISPR genetic screen with a biological network to return a subnetwork that suggests a mechanistic explanation of the screen’s results. The explanatory subnetwork is found by network propagation, a popular systems biology approach. We demonstrate our pipeline on an alpha-toxin screen, revealing a subnetwork that is both highly interconnected and highly enriched for hits in the screen.

Keywords

genetic screen, network biology, network propagation, CRISPR

Introduction

The rise of next-generation sequencing and CRISPR gene editing technologies has opened up new opportunities for high-throughput genetic screens. Systems biology and molecular networks are increasingly important for analyzing the mechanisms implicated by these screens. Here we present a GenomeSpace recipe that provides a standardized pipeline for combining the analysis of a screen with networks; it represents a logical next step in providing user-friendly bioinformatics workflows for these types of screens.

This recipe processes the results of a genome-wide CRISPR-Cas9 knockout screen. In such a screen, single guide RNAs (sgRNAs) direct the Cas9 nuclease to a target gene, where it introduces a double-strand DNA break (Koike-Yusa et al., 2014). Error-prone repair of the break frequently introduces small insertions or deletions that disrupt the coding sequence, so the targeted gene no longer yields a functional product. If a cell receives an sgRNA targeting a gene that is essential for its survival, that cell will die, and the sgRNA will be depleted from the population. Thus, by sequencing the sgRNAs and looking for depletion of the sgRNAs targeting a particular gene, we can infer the essentiality of that gene. Because a large number of sgRNAs can be introduced in a single screen, the essentiality of many genes can be tested at once. However, normalizing and processing the read counts is challenging: several sgRNAs usually target each gene, with differing efficiencies, and sequencing biases differ substantially between sgRNAs. For these reasons, the MAGeCK method (Li et al., 2014) was developed to handle data from such screens.
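To make the multiple-sgRNA problem concrete, the Python sketch below computes a per-sgRNA log2 fold change between already-normalized control and treatment counts, then summarizes each gene by the median over its sgRNAs, so that one inefficient sgRNA does not mask a real depletion. This is a deliberately simplified illustration, not MAGeCK’s method (MAGeCK ranks genes with a modified robust rank aggregation, α-RRA); the pseudocount, the function names, and the toy counts are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def sgrna_log2_fold_changes(control, treatment, pseudocount=1.0):
    """Per-sgRNA log2 fold change of treatment over control.

    Both arguments map sgRNA id -> normalized read count. The pseudocount
    (a hypothetical choice) avoids taking the log of zero.
    """
    return {
        sg: math.log2((treatment[sg] + pseudocount) / (control[sg] + pseudocount))
        for sg in control
    }


def gene_depletion_scores(lfc, sgrna_to_gene):
    """Median log2 fold change over all sgRNAs targeting each gene.

    More negative scores mean stronger depletion, i.e. stronger evidence that
    the gene is essential under the screen conditions.
    """
    per_gene = defaultdict(list)
    for sg, value in lfc.items():
        per_gene[sgrna_to_gene[sg]].append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


# Hypothetical toy counts: GeneA has two strongly depleted sgRNAs and one
# inefficient sgRNA; GeneB is unaffected.
sgrna_to_gene = {"A_1": "GeneA", "A_2": "GeneA", "A_3": "GeneA",
                 "B_1": "GeneB", "B_2": "GeneB"}
control = {"A_1": 100, "A_2": 100, "A_3": 100, "B_1": 100, "B_2": 100}
treatment = {"A_1": 3, "A_2": 7, "A_3": 99, "B_1": 100, "B_2": 99}

scores = gene_depletion_scores(sgrna_log2_fold_changes(control, treatment), sgrna_to_gene)
print(scores)  # GeneA is about -3.66, GeneB is close to 0
```

Taking the median rather than the mean means the single unaffected sgRNA for GeneA barely moves its score.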

On the systems biology side of the analysis, we employ network propagation to identify subnetworks representing inferred mechanisms implicated by the CRISPR screen. Network propagation has become an essential tool in many network applications; it has been used to identify mechanisms of cancer (Leiserson et al., 2015), to implicate genes in genome-wide association studies (Qian et al., 2014), and to find functional modules (Vanunu et al., 2010). Network propagation represents genes as nodes in the graph of a biological network and performs a random walk along the edges of the graph, starting from a set of query nodes. We expect genes implicated in a phenotype to lie in regions of the network that represent mechanisms relevant to the screen conditions, so the random walk is likely to land on relevant genes. Genes near the query nodes are therefore implicated by association. For a review of the many flavors and applications of network propagation, see Cowen et al. (2017).
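Random walk with restart is one common flavor of network propagation that matches the description above (the recipe itself uses heat diffusion). The pure-Python sketch below iterates the walk until the visiting probabilities stop changing. The restart probability of 0.5, the dict-of-sets graph representation, and the function name are illustrative assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk that restarts at the seeds.

    adjacency: dict mapping each node to the set of its neighbours (undirected graph).
    seeds: query nodes, e.g. screen hits; restarts land uniformly on one of them.
    restart: probability of jumping back to a seed at each step (hypothetical default).
    """
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(max_iter):
        # A fraction `restart` of the probability mass returns to the seeds.
        nxt = {n: restart * start[n] for n in nodes}
        # The remaining mass moves to a uniformly chosen neighbour; isolated
        # nodes (no neighbours) simply drop their remaining mass.
        for n in nodes:
            if adjacency[n]:
                share = (1.0 - restart) * p[n] / len(adjacency[n])
                for m in adjacency[n]:
                    nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Path graph A - B - C - D with A as the only seed.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
probs = random_walk_with_restart(path, {"A"})
print(probs)  # probability decreases with distance from the seed
```

Genes that the walk visits often, like B here, are the ones "implicated by association" with the query nodes.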

Methods

An overview of the pipeline appears in Figure 1. The recipe begins with the raw sgRNA read counts, which are the input to the MAGeCK module in GenePattern (www.genepattern.org). After normalizing the data, MAGeCK detects differential read counts for each sgRNA using a negative binomial (over-dispersed Poisson) model. It then detects statistically significant underrepresentation of the sgRNAs targeting each gene to infer that the gene is essential for cell survival. The reasoning is that if an sgRNA targets an essential gene, the cells carrying that sgRNA will not replicate, so the sgRNA will be underrepresented compared with sgRNAs targeting other genes.


Figure 1. Workflow for the recipe.
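MAGeCK offers median-based normalization of sgRNA read counts before testing. As an illustration, the sketch below implements DESeq-style median-of-ratios size factors (Anders & Huber, 2010), which we assume is close to, but not necessarily identical with, MAGeCK’s implementation. sgRNAs with a zero count in any sample are skipped because their geometric mean is zero; the function names and toy counts are hypothetical.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """DESeq-style size factor for each sample.

    counts: dict mapping sample name -> dict mapping sgRNA id -> raw read count.
    Each sample's factor is the median, over sgRNAs, of its count divided by
    the geometric mean of that sgRNA's counts across all samples.
    """
    samples = list(counts)
    log_geo_means = {}
    for sg in counts[samples[0]]:
        values = [counts[s][sg] for s in samples]
        if all(v > 0 for v in values):  # skip sgRNAs with any zero count
            log_geo_means[sg] = sum(math.log(v) for v in values) / len(values)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][sg]) - lg for sg, lg in log_geo_means.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = median_ratio_size_factors(counts)
    return {s: {sg: c / factors[s] for sg, c in row.items()} for s, row in counts.items()}


# Hypothetical toy example: the treatment library was sequenced twice as deeply.
raw = {
    "control": {"sg1": 100, "sg2": 200, "sg3": 50},
    "treatment": {"sg1": 200, "sg2": 400, "sg3": 100},
}
factors = median_ratio_size_factors(raw)
normalized = normalize(raw)
print(factors)  # the treatment factor is twice the control factor
```

Using the median of ratios keeps a handful of strongly depleted or enriched sgRNAs from distorting the correction for sequencing depth.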

This recipe shows how GenomeSpace seamlessly integrates multiple bioinformatics tools into a single, easily reproducible pipeline. The publicly available preprocessed knockout screen files and sgRNA library from Koike-Yusa et al. (2014) can be transferred directly from GenomeSpace to the MAGeCK module (Li et al., 2014) ported to GenePattern. By exporting the resulting list of significant genes to GenomeSpace, the data can be imported directly into Cytoscape without downloading the files locally. Cytoscape’s integrated plugins for NDEx (Pratt et al., 2015), Network Diffusion (Carlin et al., 2017; Cowen et al., 2017), and GeneMANIA (Montojo et al., 2010) allow the remainder of the recipe to be completed within its user-friendly environment.

After we determine a set of essential genes, we pass their identifiers via GenomeSpace to Cytoscape (Shannon et al., 2003). Because all analysis in Cytoscape is based on networks, a relevant reference molecular network must be imported; the recipe uses the NDEx database (Pratt et al., 2015) to identify such a network. In this case, we choose the National Cancer Institute’s Pathway Interaction Database (Schaefer et al., 2009). The set of essential genes, i.e., the hits from the genetic screen, is imported from GenomeSpace as a table and then used as the seed nodes for network propagation.
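Only hits that also appear in the reference network can act as seeds, so it is worth checking which hits the network lacks. The sketch below reads a tab-separated gene summary table and keeps genes below an FDR cutoff. It assumes the column names `id` and `neg|fdr` of MAGeCK’s negative-selection gene summary (verify these against your own output file); the 0.05 cutoff, the function name, and the toy gene names are hypothetical.

```python
import csv
import io


def select_seed_genes(gene_summary_text, network_nodes,
                      fdr_column="neg|fdr", fdr_cutoff=0.05):
    """Split significant hits into network seeds and hits absent from the network.

    gene_summary_text: contents of a tab-separated gene summary table.
    network_nodes: gene identifiers present in the reference network.
    """
    reader = csv.DictReader(io.StringIO(gene_summary_text), delimiter="\t")
    hits = {row["id"] for row in reader if float(row[fdr_column]) < fdr_cutoff}
    seeds = hits & set(network_nodes)
    return seeds, hits - seeds


# Hypothetical toy table and network.
summary = (
    "id\tnum\tneg|fdr\n"
    "GeneA\t4\t0.001\n"
    "GeneB\t4\t0.01\n"
    "GeneC\t4\t0.9\n"
    "GeneD\t4\t0.02\n"
)
seeds, missing = select_seed_genes(summary, {"GeneA", "GeneB", "GeneC", "GeneE"})
print(sorted(seeds), sorted(missing))  # ['GeneA', 'GeneB'] ['GeneD']
```

A large `missing` set would indicate that the chosen network covers the screen poorly, or that gene identifiers do not match between the table and the network.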

The propagation process starts with a single unit of “heat” on each node representing a gene that is underrepresented in the screen, and therefore essential for cell growth. We use a heat diffusion process, treating the network as an unweighted, undirected graph. Heat diffusion smooths the original signal over the network by iteratively passing the signal on each node to its neighbors, which identifies regions of the network with a high concentration of hits. We use a time parameter of 0.1; this parameter represents how long the heat is allowed to diffuse over the graph, and 0.1 is a common choice (see Paull et al., 2013). The recipe employs the network diffusion service built natively into Cytoscape (Carlin et al., 2017).
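Under the standard heat-kernel formulation, which we assume here, diffusing for time t on an unweighted, undirected graph gives h(t) = exp(-tL) h0, where L = D - A is the graph Laplacian (degree matrix minus adjacency matrix) and h0 holds one unit of heat on each seed. The pure-Python sketch below applies the matrix exponential to h0 through its Taylor series, which converges quickly for a small time such as 0.1. It is an illustration, not the code of Cytoscape’s diffusion service; the function name, graph representation, and number of series terms are assumptions.

```python
def heat_diffusion(adjacency, seeds, t=0.1, terms=30):
    """Approximate h(t) = exp(-t L) h0 on an unweighted, undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    seeds: nodes that start with one unit of heat each (the screen hits).
    t: diffusion time; 0.1 matches the value used in the recipe.
    terms: number of Taylor-series terms kept (an assumption; ample for small t).
    """
    def laplacian_times(v):
        # (L v)[n] = degree(n) * v[n] - sum of v over the neighbours of n.
        return {n: len(adjacency[n]) * v[n] - sum(v[m] for m in adjacency[n])
                for n in adjacency}

    term = {n: (1.0 if n in seeds else 0.0) for n in adjacency}  # k = 0 term: h0
    heat = dict(term)
    for k in range(1, terms):
        # k-th term (-t)^k L^k h0 / k!, built from the previous term.
        lv = laplacian_times(term)
        term = {n: -t * lv[n] / k for n in adjacency}
        for n in adjacency:
            heat[n] += term[n]
    return heat


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
heat = heat_diffusion(path, {"A"})
print(heat)  # most heat stays on A and decays with distance; the total stays 1
```

Because every column of L sums to zero on an undirected graph, diffusion only redistributes heat; larger t spreads it further from the seeds.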

Next, we keep the 200 genes with the most heat after diffusion, which yields a subnetwork with a high concentration of hits. Finally, to understand the composition of this subnetwork, we apply the GeneMANIA Cytoscape plugin (Montojo et al., 2010), which reports the functional Gene Ontology categories enriched in a given network.
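The cutoff step amounts to ranking nodes by heat and keeping the edges among the top-ranked ones. The sketch below assumes a graph stored as a dict mapping each node to the set of its neighbours, plus a dict of heat values; k defaults to the recipe’s 200, while the function name and toy values are hypothetical.

```python
def top_k_subnetwork(adjacency, heat, k=200):
    """Induced subnetwork on the k nodes with the most heat.

    adjacency: dict mapping each node to the set of its neighbours.
    heat: dict mapping each node to its heat after diffusion.
    Ties in heat are broken by the order in which nodes appear in `heat`.
    """
    hottest = set(sorted(heat, key=heat.get, reverse=True)[:k])
    # Keep only edges whose two endpoints both survive the cutoff.
    return {n: adjacency[n] & hottest for n in hottest}


graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
         "D": {"C", "E"}, "E": {"D"}}
heat = {"A": 0.6, "B": 0.3, "C": 0.05, "D": 0.04, "E": 0.01}
sub = top_k_subnetwork(graph, heat, k=3)
print(sub)  # nodes A, B, C with edges A-B and B-C
```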

Use case

We used a previously published CRISPR study (Koike-Yusa et al., 2014) to illustrate the use of this pipeline. In that study, the authors grew mouse embryonic stem cells in the presence of alpha-toxin, so the screen was designed to expose the genes involved in the mechanism of resistance to the toxin. The largest connected component of the subnetwork formed by the top 200 genes after propagation appears in Figure 2.


Figure 2. The final subnetwork associated with alpha toxin resistance.

The black nodes represent genes that are significantly depleted in the CRISPR screen. Grey nodes represent genes that are closely associated with the hits in the network; their size is scaled by the strength of that association.
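Cutting the network at the top 200 genes can leave several disconnected pieces, which is why Figure 2 shows only the largest connected component. A breadth-first search is enough to find it. The sketch assumes the subnetwork is stored as a dict mapping each node to the set of its neighbours, with every neighbour also present as a key; the function name and toy graph are hypothetical.

```python
from collections import deque


def largest_connected_component(adjacency):
    """Return the induced subnetwork on the largest connected component.

    adjacency: dict mapping each node to the set of its neighbours; every
    neighbour must also be a key. Ties keep the first component found.
    """
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return {n: adjacency[n] & best for n in best}


# Two pieces: A - B - C and X - Y.
pieces = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "X": {"Y"}, "Y": {"X"}}
lcc = largest_connected_component(pieces)
print(sorted(lcc))  # ['A', 'B', 'C']
```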

The results of the GeneMANIA enrichment suggest that DNA repair is the single most important gene set in the response to alpha-toxin. This is consistent with the finding of Bantel et al. (2001) that alpha-toxin causes an influx of monovalent ions that can cause DNA fragmentation. In the absence of DNA repair machinery, the cells cannot recover from this stress and therefore die. The complete table of Gene Ontology terms significantly enriched in the subnetwork appears in Table S1.

Variations

Several variations can be used depending on the user’s preferences. For example, alternative tools such as DESeq (Anders & Huber, 2010) and edgeR (Robinson et al., 2010) can be used to identify the hits. Also, although the final biological interpretation of the subnetworks was performed with the GeneMANIA plugin, the gene list can be exported and interpreted with another annotation tool. Another approach is to export the gene list and corresponding heats using GenomeSpace and apply the Molecular Signatures Database (MSigDB) gene set overlap tool (http://software.broadinstitute.org/gsea/msigdb/index.jsp; Liberzon et al., 2011) to the genes in the identified subnetwork.
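Over-representation tools of this kind commonly report a one-sided hypergeometric p-value for the overlap between the subnetwork and each annotated gene set. The sketch below computes that generic statistic with exact integer arithmetic; we do not claim it reproduces the exact statistics, background gene sets, or multiple-testing corrections of GeneMANIA or MSigDB, and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(universe_size, gene_set_size, subnetwork_size, overlap):
    """One-sided p-value P(X >= overlap) for X ~ Hypergeometric.

    universe_size: number of annotated genes considered (N).
    gene_set_size: genes in the annotation category, e.g. a GO term (K).
    subnetwork_size: genes in the identified subnetwork (n).
    overlap: subnetwork genes that belong to the category (k).
    """
    upper = min(gene_set_size, subnetwork_size)
    favourable = sum(
        comb(gene_set_size, x) * comb(universe_size - gene_set_size, subnetwork_size - x)
        for x in range(overlap, upper + 1)
    )
    return favourable / comb(universe_size, subnetwork_size)


# Toy numbers: all 3 subnetwork genes fall in a 3-gene category out of 10 genes.
print(hypergeometric_enrichment_p(10, 3, 3, 3))  # 1/120, about 0.00833
```

In practice this p-value is computed for every gene set tested and then corrected for multiple testing before terms such as DNA repair are reported as enriched.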

Data and software availability

The recipe and the Koike-Yusa et al. datasets are publicly available at http://recipes.genomespace.org/view/75. GenomeSpace, an open-source bioinformatics tool, serves as a data highway that allows seamless transfer of information between tools, and can be found at http://www.genomespace.org/. The MAGeCK algorithm has been wrapped as a GenePattern module, which can be run locally or on the public GenePattern servers at http://genepattern.org. Additionally, GenePattern supports Jupyter notebooks through GenePattern Notebook (http://genepattern-notebook.org/). Finally, Cytoscape and all associated plugins (i.e., GeneMANIA and NDEx) can be found at http://www.cytoscape.org/.

How to cite this article
Carlin DE, Kim F, Ideker T and Mesirov JP. A unified GenomeSpace recipe to identify essential genes and associated subnetworks from Genome-Scale CRISPR-Cas9 knockout screens [version 1; peer review: 2 approved with reservations]. F1000Research 2018, 7:1636 (https://doi.org/10.12688/f1000research.16290.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions are required to address specific details and improve the paper’s academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 1
Reviewer Report 03 Sep 2019
Xiaowei Wang, Department of Radiation Oncology, Washington University School of Medicine, St Louis, MO, USA 
Approved with Reservations
The authors present a bioinformatic recipe for network-based functional analysis of CRISPR screening data. CRISPR screening has quickly become a mainstream tool for functional genomics studies. However, bioinformatics tools are lacking in this emerging field. This study provides a timely ...
How to cite this report
Wang X. Reviewer Report For: A unified GenomeSpace recipe to identify essential genes and associated subnetworks from Genome-Scale CRISPR-Cas9 knockout screens [version 1; peer review: 2 approved with reservations]. F1000Research 2018, 7:1636 (https://doi.org/10.5256/f1000research.17795.r52690)
Reviewer Report 24 Jun 2019
Shouhong Guang, School of Life Sciences, University of Science and Technology of China, Hefei, China 
Approved with Reservations
The CRISPR gene editing technology followed by next generation sequencing technology has brought up new means for high throughput genetic screening. However, a streamlined and systematic method for downstream data analysis is required to examine these high throughput screenings. In ...
How to cite this report
Guang S. Reviewer Report For: A unified GenomeSpace recipe to identify essential genes and associated subnetworks from Genome-Scale CRISPR-Cas9 knockout screens [version 1; peer review: 2 approved with reservations]. F1000Research 2018, 7:1636 (https://doi.org/10.5256/f1000research.17795.r49975)