CellMap visualizes protein-protein interactions and subcellular localization

Many tools visualize protein-protein interaction (PPI) networks. The tool introduced here, CellMap, adds one crucial novelty by visualizing PPI networks in the context of subcellular localization, i.e. the location in the cell or cellular component in which a PPI happens. Users can upload images of cells and define areas of interest against which PPIs for selected proteins are displayed (by default on a cartoon of a cell). Annotations of localization are provided by the user or through our in-house database. The visualizer and server are written in JavaScript, making CellMap easy to customize and to extend by researchers and developers.

In the new version of the manuscript, we have highlighted the origin of the data sources for the example deployment of the portal on http://cell.dallago.us and the behavior of the visualization.

Introduction
Many tools visualize different aspects of protein-protein interaction (PPI) networks; the most prominent might be Cytoscape 1.
Existing visualizations of large PPI networks continue to be difficult to use. Some proteins interact with many hundreds or thousands of others. Often referred to as 'PPI hairballs', such hubs get in the way of understanding large data sets. Many ways have been proposed to resolve such hairballs through the addition of biologically meaningful dimensions such as pathways 2 or time 3.
Another dimension was first introduced a decade ago, namely the overlay of PPIs with subcellular localization 4. Combining PPI networks with protein location provides an intuitive way of laying out PPI networks on a graphical representation of the cell, and might reduce the clutter from PPI hairballs. This decade-old solution 4 no longer copes with today's data in terms of scalability, customizability, or ease of use.
CellMap, the prototype introduced here, takes up the idea of PPI visualization constrained by protein location, and provides a simple visual interface for users to explore protein location inside a cell. It presents this information in a graphically pleasant way and offers several customization features. The framework has been optimized to simplify future developments, such as the addition of further data dimensions (e.g. inclusion of protein trafficking). An instance of the tool is available at http://cell.dallago.us. It uses localization data from a previous publication that includes protein localizations of the human proteome 5 and PPI data from the Human Integrated Protein-Protein Interaction rEference (HIPPIE) resource 6.

Implementation
The CellMap prototype is an integrated portal that exposes API calls to retrieve images (representing cells) and protein information, as well as a frontend to visualize protein location and PPI data. The portal is written entirely in JavaScript: the backend runs on the JavaScript runtime Node.js (https://nodejs.org) and the frontend uses vanilla JavaScript. The portal is deployed to the public through a Docker container. Docker is a technology that allows shipping of packaged services such as web applications to customers and users without the need to install dependencies other than the Docker engine (available through: https://www.docker.com). For the representation of cell images as maps, the Leaflet framework is used. Leaflet is a JavaScript-based tool used to represent maps (http://leafletjs.com).
Data about proteins are stored as JSON documents in a MongoDB (http://mongodb.com) database. All information about the interaction partners and the subcellular localization of a protein is stored in a single JSON document, making the data structure simple to understand for non-experts and enabling them to deploy prototypes using their own data. Figure 1 schematically represents the protein data model (for a specific example of a protein object, see http://cell.dallago.us/api/proteins/search/Q99943).
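To illustrate the single-document idea, the following is a minimal sketch of what such a protein object might look like. The field names (`uniprotId`, `geneName`, `proteinName`, `localizations`, `interactions`), the identifiers, and all values are hypothetical placeholders chosen for illustration; they are not the schema of Figure 1 and not real annotations.

```javascript
// Hypothetical protein document. All field names and values are illustrative
// placeholders, not the schema from Figure 1 and not real annotations.
const exampleProtein = {
  uniprotId: "X00001",
  geneName: "GENE1",
  proteinName: "Example protein 1",
  // Ordered list: the first entry is treated as the default location.
  localizations: ["nucleus", "cytoplasm"],
  // One entry per interaction partner, with a reliability score.
  interactions: [
    { partner: "X00002", score: 0.9 },
    { partner: "X00003", score: 0.4 },
  ],
};

// Default location used when only one location can be shown at a time.
function defaultLocalization(protein) {
  return protein.localizations.length > 0 ? protein.localizations[0] : null;
}

// Number of interaction partners stored in the document.
function partnerCount(protein) {
  return protein.interactions.length;
}

// The whole record survives a JSON round trip as a single document.
const roundTripped = JSON.parse(JSON.stringify(exampleProtein));
console.log(defaultLocalization(roundTripped), partnerCount(roundTripped));
```

Keeping localizations and interaction partners in one object means that a user-supplied data set only needs to produce one JSON record per protein.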

Operation
In CellMap, users can choose to upload new maps (images of cells). They can modify the location of regions of interest (ROIs) for a selected map (Figure 2), and visualize the locations of selected proteins on a map or render protein-protein interaction networks from a set of selected proteins.
To maintain a consistent coloring scheme for different cellular compartments throughout a set of different images, each compartment is assigned a unique color through the hash of the compartment's name (e.g. light blue = vacuole, Figure 3B). Using this coloring approach, users might eventually learn to associate color with compartment. When proteins are loaded into the map, they are assigned pseudo-random coordinates representing a point that lies within the boundaries of the ROI in which they are localized (Figure 3D). A circle of a given radius is placed on the randomly generated point (Figure 3E-F), and the circle is filled with the same color as the compartment in which the protein is located (Figure 3B and 3F).
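The two steps described above (a color derived from the compartment name, and a pseudo-random point inside an ROI) can be sketched as follows. This is an illustration under stated assumptions, not CellMap's actual code: the hash function (a djb2 variant), the mapping of the hash to an HSL hue, and the use of rejection sampling with a ray-casting point-in-polygon test are all choices made here for the example. ROIs are assumed to be simple polygons given as arrays of `[x, y]` vertices.

```javascript
// Deterministic string hash (djb2 variant); an assumed choice of hash.
function hashString(text) {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0; // keep unsigned 32-bit
  }
  return hash;
}

// Same name always gives the same color. With only 360 hues, two names
// can collide, so uniqueness is not guaranteed by this sketch.
function compartmentColor(name) {
  const hue = hashString(name.toLowerCase()) % 360;
  return `hsl(${hue}, 70%, 50%)`;
}

// Ray-casting test: count how often a horizontal ray crosses polygon edges.
function pointInPolygon([x, y], polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Rejection sampling: draw points in the bounding box until one is inside.
function randomPointInRoi(polygon, rng = Math.random, maxTries = 1000) {
  const xs = polygon.map((p) => p[0]);
  const ys = polygon.map((p) => p[1]);
  const minX = Math.min(...xs);
  const maxX = Math.max(...xs);
  const minY = Math.min(...ys);
  const maxY = Math.max(...ys);
  for (let attempt = 0; attempt < maxTries; attempt++) {
    const candidate = [
      minX + rng() * (maxX - minX),
      minY + rng() * (maxY - minY),
    ];
    if (pointInPolygon(candidate, polygon)) return candidate;
  }
  return null; // no point found, e.g. for a degenerate polygon
}

// Usage: an L-shaped ROI (placeholder coordinates) and a compartment name.
const roi = [[0, 0], [4, 0], [4, 2], [2, 2], [2, 4], [0, 4]];
console.log(compartmentColor("vacuole"), randomPointInRoi(roi));
```

Because the color depends only on the compartment name, it stays the same across different cell images without storing a color table.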
Users can choose between two visualization options: the subcellular location in the context of the protein-protein interaction viewer (PPI viewer, Figure 4A, http://cell.dallago.us/ppi), and the protein subcellular location viewer (Map viewer, Figure 4B, http://cell.dallago.us/map). The two viewers can load the same images of cells (maps) and collect localization data from the same source (in the publicly available instance, the localization data published in 5). The PPI viewer offers the possibility to overlay networks between the proteins being visualized. The map viewer displays all locations reported for a given protein simultaneously, while the PPI viewer displays only one location at a time (by default: the first localization in the array of localizations as described in the protein data model, Figure 1); users can manually change the location by clicking on the protein circle and selecting a new location from the information box (Figure 5).
To facilitate the retrieval of proteins and their interacting partners, CellMap provides basic search functionalities. Users can search for proteins by their UniProt identifiers, by their gene identifiers or by their protein names. When performing the search, the page renders a grid containing boxes, each representing a different protein (Figure 7). Inside each box, the UniProt identifier of the protein that matched the search criterion is displayed. Starting at the top-right of every box, a smaller colored square is displayed for each compartment in which that protein is localized. For proteins annotated to be in a single compartment, the border of the outer box (representing one protein as indicated by the UniProt ID in the center of the box) takes the color of that compartment (2nd box in Figure 7). Clicking on one of the colored squares filters the results by the compartment represented by that color. In the bottom-right of each box, the total number of PPI partners is shown.
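The search and filtering behavior described above can be sketched as follows. This is a hypothetical illustration, not the portal's implementation: the field names, the placeholder proteins, and the case-insensitive substring matching are assumptions made for the example.

```javascript
// Placeholder proteins; field names and values are assumptions.
const proteins = [
  { uniprotId: "X00001", geneName: "GENE1", proteinName: "Example kinase",
    localizations: ["nucleus", "cytoplasm"], interactions: ["X00002", "X00003"] },
  { uniprotId: "X00002", geneName: "GENE2", proteinName: "Example receptor",
    localizations: ["vacuole"], interactions: ["X00001"] },
  { uniprotId: "X00003", geneName: "KIN3", proteinName: "Another protein",
    localizations: ["nucleus"], interactions: [] },
];

// Match the query against UniProt identifier, gene identifier, or protein name.
function searchProteins(list, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return list.filter((p) =>
    [p.uniprotId, p.geneName, p.proteinName].some(
      (field) => typeof field === "string" && field.toLowerCase().includes(needle)
    )
  );
}

// Equivalent of clicking a colored square: keep proteins in that compartment.
function filterByCompartment(list, compartment) {
  return list.filter((p) => p.localizations.includes(compartment));
}

// Data needed to draw one result box in the grid.
function toResultBox(protein) {
  return {
    label: protein.uniprotId,
    compartments: protein.localizations,
    // The border is colored only when exactly one compartment is annotated.
    borderCompartment:
      protein.localizations.length === 1 ? protein.localizations[0] : null,
    partnerCount: protein.interactions.length,
  };
}

const hits = searchProteins(proteins, "kin");
console.log(filterByCompartment(hits, "nucleus").map(toResultBox));
```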

Discussion
Some CellMap functionality is exemplified by a heat shock protein (HSPA4; Heat shock 70 kDa protein 4, UniProt identifier P34932) with many interaction partners (338, according to HIPPIE, http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/query.php?s=HSPA4) in different compartments. The objective was to showcase how CellMap can simplify PPI hairballs. We visualize the same PPI network using CellMap (Figure 8A) and Cytoscape 1 in the form of the Cytoscape.js version used by HIPPIE (Figure 8B) and the Cytoscape desktop version (Figure 8C).
None of the three viewers solves the PPI hairball problem completely. Without zooming in, the information density for 338 protein pairs is too high to be helpful. HIPPIE's layout for Cytoscape.js (Figure 8B) clearly improves over the standard Cytoscape desktop version (Figure 8C) by centering the view around HSPA4, the protein of interest. In CellMap (Figure 8A) the biologically relevant differences between pairs from the same and from different compartments remain visible.
By using a biologically relevant dimension (protein localization), instead of drawing nodes in positions based on edge weight (force layout of Cytoscape), some aspects of the protein and its partners become obvious at first glance, e.g. that HSPA4 interacts with many nuclear and cytoplasmic proteins, as well as with proteins that are secreted (extra-cellular) and located in the Endoplasmic Reticulum (ER, Figure 8). This may suggest the hypothesis that HSPA4 is an important hub involved in processes spanning several compartments. A similar hypothesis is presented in our supplementary material (Figure SOM_1), where we analyze the visualization of the FOXO3 protein through CellMap.
One disadvantage of CellMap compared with the Cytoscape.js view is that the protein identifiers are not visible at all on the static image (protein identifiers become visible through mouse-over events in CellMap). However, in the image shown (Figure 8) the Cytoscape.js names also remain unreadable. Another problem with CellMap is the display of the numbers on edges (experimental reliability of the PPI as given by HIPPIE). In our view, this information is extremely important when looking at interactions, but we still lack a more sophisticated mechanism to visualize these numbers.
CellNetVis 8 is a recent tool that also connects localization with PPI networks. It emphasizes the way PPI networks are laid out through the adaptation of a so-called force-directed layout (using the tool While). Although CellMap and CellNetVis are founded on a similar idea, user experience and focus differ substantially. For instance, CellMap can be driven by data from users, who define the number of compartments on a map and provide localizations. In contrast, CellNetVis uses a fixed subset of compartments and an ad hoc diagram for the cell. Additionally, CellMap comes with out-of-the-box data for the human proteome and allows the community to grow the tool by enriching datasets (images and localizations), whereas CellNetVis has a per-use approach, allowing users to visualize networks stored in specialized XGMML files. Another unique aspect of CellMap is its openness to further biologically meaningful dimensions (beyond location, such as time or pathways) that increase the usefulness of PPI visualization tools for creating new testable hypotheses.

Conclusions
CellMap is a prototype portal that explores the idea of using protein subcellular location as the basis for more complete visualizations of biological data, such as protein-protein interactions (PPIs). Using this paradigm, we claim that additional information, such as pathways, can be layered on top of the current visualization of subcellular location to potentially generate meaningful biological insights. The source code for the portal is publicly available and an instance of the portal with location data from a previous publication about the subcellular localization of the human proteome 5 and protein-protein interaction data from

The tool provides an interesting feature to help declutter visualizations of biological networks using localization information. Some comments:
It would be good if the names of the databases used were stated in the last paragraph of the introduction.
The tool would be more intuitive for new users if it provided descriptions of the various colors used on the site, with the same explanation as in the paper. Examples are the colored boxes that represent localizations in the search results and the dot colors used for the protein visualization on the cell map.
It is unclear from the paper which types of interactions might be shown in the represented networks.
Also, it is unclear from the paper what happens to the network visualization in cases where the identified proteins are present in multiple locations.
Is the rationale for developing the new software tool clearly explained?

Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Author Response 29 Jan 2018
Christian Dallago, Technical University of Munich, Germany
Dear Dr. Luna,
Thank you very much for your input on our work.
We have submitted a new version of the manuscript, which should address points one and four of your comment.
As to the second point: we have created a new feature item for our next release that adds a button to the map viewer to display a modal with the legend. As of now, a legend is available by scrolling down to the second half of the page in the map or ppi viewers (e.g. http://cell.dallago.us/map/573c87c182a9e1ae1e37d08e?p=P04637) and expanding the "Legend" tab. We understand that this can be overlooked and should be improved, therefore we thank you for the input.
As to the third point: in this manuscript, we focus on discussing the software implementation and visualization abilities of CellMap, rather than the data sources used in the example deployment hosted on http://cell.dallago.us. More information about the types of interactions reported by the HIPPIE data source can be found in the latest paper describing HIPPIE (http://nar.oxfordjournals.org/content/early/2016/10/28/nar.gkw985) and directly on the HIPPIE information page (http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/information.php#sources). Please feel free to suggest any other changes to both our manuscript and tool. Thank you for your valuable input on the tool!

With regard to point 1: we apologize for the broken database connection. Unfortunately, the deployment system missed that flag and thus didn't restart the service. We have fixed the issue and the website is now running. Up until now, I have not identified any other issues that could prevent the web server from running properly.
With regard to point 2: the software presented in this paper has a dual purpose. On the one hand, we want to give users the ability to explore protein localization and protein-protein interactions from two known sources (HIPPIE for PPI, and subcellular localization from a publication, which describes localization for the human proteome based on a consensus of experimental data and state-of-the-art prediction models (http://doi.org/10.1038/ncomms8866)). On the other hand, we want to propose a system that can be reused on user-defined data (as long as it complies with the format the visualization tool digests, as shown in Figure 1) and be integrated as a JavaScript visualization tool in different portals. For now, we would like to avoid a direct integration of the portal with external tools via, for example, API calls. In an upcoming version of the portal, we will offer scripts to populate the database from different sources for the two data entities (protein localization and interaction).
PSICQUIC generates interaction data on demand, which can later be downloaded. Obtaining the data requires some time: a user inputs one specific protein identifier, selects the databases from which to collect interaction data, submits a cluster job and finally gets access to the data. Searching for protein P45381 identified 80 interactions in all online databases. After several hours, the job was not finished, so we decided to lower the number of databases to fetch information from. Reducing the number of databases produced results quickly. The results page of PSICQUIC presents a table of interactions and visualizes a graph, which we could not load due to lack of compatibility with the Chrome browser. We believe it would be interesting to present CellMap at the level of this resource and will contact the authors of the tool to discuss what the best approach in this regard would be. Fetching the data from PSICQUIC as it is now and putting it into the portal also requires normalizing the PSICQUIC data and mapping it to protein localization data. Writing a parser for the PSI-MITAB tables is straightforward; the normalization and mapping of identifiers should occur externally to CellMap. We will create a guide on how this can be done in the next days and put it on the landing page of CellMap. Integrating protein-molecule data and displaying these entities meaningfully is an interesting idea for the future development of the CellMap tool.
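As a rough illustration of the parsing step mentioned above, a PSI-MITAB parser could be sketched as follows. This is not the guide or script announced in the response; it assumes the tab-separated PSI-MITAB 2.5 layout with 15 columns, in which the first two columns hold the interactor identifiers as `database:identifier` (alternatives separated by `|`) and the last column holds the confidence value. The function names and the sample line with placeholder identifiers are hypothetical. Normalization and identifier mapping are deliberately left out.

```javascript
// Extract the first "database:identifier" entry from a MITAB field.
function firstIdentifier(field) {
  const first = field.split("|")[0];
  const sep = first.indexOf(":");
  return sep === -1
    ? { db: null, id: first }
    : { db: first.slice(0, sep), id: first.slice(sep + 1) };
}

// Parse one MITAB 2.5 line (15 tab-separated columns assumed).
function parseMitabLine(line) {
  const columns = line.split("\t");
  if (columns.length < 15) return null; // not a complete MITAB 2.5 row
  return {
    interactorA: firstIdentifier(columns[0]),
    interactorB: firstIdentifier(columns[1]),
    detectionMethod: columns[6],
    interactionType: columns[11],
    sourceDatabase: columns[12],
    confidence: columns[14],
  };
}

// Parse a whole table, skipping blank lines and '#' header lines.
function parseMitab(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .map(parseMitabLine)
    .filter((row) => row !== null);
}

// Usage with a placeholder row ("-" marks an empty MITAB field).
const sample = [
  "#ID A\tID B\t(remaining header columns omitted)",
  ["uniprotkb:X00001", "uniprotkb:X00002|ensembl:PLACEHOLDER",
    "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "score:0.5"].join("\t"),
].join("\n");
console.log(parseMitab(sample));
```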
With regard to point 3: the data about protein localization stems from a publication of our group (http://doi.org/10.1038/ncomms8866). The data on protein subcellular localization for humans published through this paper was the starting point for the development of CellMap. In the current manuscript, we focused more on describing the visualization tool, rather than going into detail about how the localization data was retrieved (which in this case is by building a consensus over experimental (where available) and predicted localizations for 6 subcellular compartments). This is again because we didn't want to develop a tool around this specific data source, but rather offer the possibility to change the origin of the localization data in the future.
We appreciate the suggestions for further data sources and data entities that can be used and integrated into CellMap. In upcoming releases, we will make sure to offer a bigger variety of data sources and scripts to populate and update the information on protein subcellular localization, and protein-protein interaction data used by the visualization tool. Additionally, we will contact the authors of PSICQUIC to discuss if it would be possible to integrate CellMap in the results page of a cluster job.
Competing Interests: No competing interests were disclosed.