RCy3: Network biology using Cytoscape from within R

RCy3 is an R package in Bioconductor that communicates with Cytoscape via its REST API, providing access to the full feature set of Cytoscape from within the R programming environment. RCy3 has been redesigned to streamline its usage and future development as part of a broader Cytoscape Automation effort. Over 100 new functions have been added, including dozens of helper functions specifically for intuitive data overlay operations. Over 40 Cytoscape apps have implemented automation support so far, making hundreds of additional operations accessible via RCy3. Two-way conversion with networks from igraph and graph ensures interoperability with existing network biology workflows and dozens of other Bioconductor packages. These capabilities are demonstrated in a series of use cases involving public databases, enrichment analysis pipelines, shortest path algorithms and more. With RCy3, bioinformaticians will be able to quickly deliver reproducible network biology workflows as integrations of Cytoscape functions, complex custom analyses and other R packages.


Introduction
In the domain of biology, network models serve as useful representations of interactions, whether social, neural or molecular. Since 2003, Cytoscape has provided a free, open-source software platform for network analysis and visualization that has been widely adopted in biological and biomedical research fields 1 . The Cytoscape platform supports community-developed extensions, called apps, that can access third-party databases, offer new layouts, add analytical algorithms, support additional data types, and much more 2,3 .
In 2011, the CytoscapeRPC app was created to enable R-based workflows to exercise Cytoscape v2 functionality via functions in the corresponding RCytoscape R package over XML-RPC communications protocols 4 . In 2015, the CyREST app was created to enable R-based workflows to exercise Cytoscape v3 functionality. This was achieved by the first version of the RCy3 R package, which re-implemented much of RCytoscape's organization, data structures and syntax over REST communications protocols 3,5 .
Here, we describe version 2.0 of the RCy3 package, which is better aligned with Cytoscape's CyREST API. We have rewritten every function, deprecated 43 functions and added over 100 new functions. This work provides a more intuitive and productive experience for Cytoscape users learning about the RCy3 package, and it positions RCy3 to take advantage of future Cytoscape Automation 6 development and evolution. The goal of this paper is to describe the implementation and operation of the updated RCy3 package and to provide detailed use cases relevant to network biology applications in common and advanced bioinformatics workflows.

Design and Implementation
RCy3 is a component of Cytoscape Automation. At the core of Cytoscape Automation is CyREST, which implements an interface to the Cytoscape Java application via the REST protocol 6 . A collection of GET, POST, PUT and DELETE operations covers practically the complete feature set of the Cytoscape desktop software. Additional features, including those provided by user-installed Cytoscape apps, are covered by a separate command-line interface called Commands (Figure 1). For version 2.0, we redesigned the RCy3 package to parallel the CyREST and Commands APIs, both to standardize the syntax and organization of its functions and to streamline its future development. RCy3 functions are grouped into categories to aid parallel development with the Cytoscape Automation APIs and to facilitate navigation and comprehension of the overall package (Table 1).

Amendments from Version 2
In response to reviewer suggestions, we added another network-related package to Table 2 and swapped in "TRUE" for "T" in a code snippet.

[Table 1, excerpt] Tables: Managing table columns and table column functions, like map and rename, as well as loading and extracting table data.

The basics
All RCy3 functions are ultimately implemented as calls to a small set of utility functions that execute the CyREST or Commands REST protocol (e.g., cyrestGET and commandsPOST). The internals of these functions handle the composition of operations and parameters to be sent via httr functions to CyREST, as well as the processing of results from JSON to appropriate R objects using RJSONIO.

In most RCy3 functions there is an optional argument for base.url. This is the URL used by RCy3 to connect to the Cytoscape desktop application via CyREST, and it defaults to "http://localhost:1234/v1". The default CyREST port is 1234, and it can be changed in Cytoscape through Edit/Preferences/Properties or by command-line (see CyREST setup guide). If you change the CyREST port, you should reflect the change in the base.url argument per function call or change each function's default value using the default package.
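As a minimal sketch of the port change described above, the snippet below builds a base.url for a given CyREST port and passes it explicitly to a single call. The names cyrest_port and check_connection are illustrative assumptions and not part of RCy3. cytoscapePing is the RCy3 function used here to test the connection, and it only succeeds while Cytoscape is running.

```r
# Build the CyREST base URL for a given port (1234 is the default).
cyrest_port <- 1234L  # change this if CyREST was moved to another port
base.url <- sprintf("http://localhost:%d/v1", cyrest_port)

# Illustrative helper (not part of RCy3): ping Cytoscape at an explicit
# base.url. Returns TRUE when CyREST answers and FALSE otherwise,
# including when RCy3 is not installed.
check_connection <- function(base.url) {
  if (!requireNamespace("RCy3", quietly = TRUE)) return(FALSE)
  tryCatch({
    RCy3::cytoscapePing(base.url = base.url)
    TRUE
  }, error = function(e) FALSE)
}

# check_connection(base.url)  # run this with Cytoscape open
```

Passing base.url per call, as here, keeps the port choice visible in the script; changing package-wide defaults is the alternative mentioned in the text.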
The second most common argument in RCy3 functions is network. If left as NULL (default), the currently selected network in the Cytoscape application is referenced. Network name or SUID (session unique identifier) can thus be explicitly specified or inferred from the current state of the application. The current network can also be controlled and retrieved by setCurrentNetwork and getCurrentNetwork. Given a base.url and network (when needed), the majority of RCy3 functions simply validate parameters and construct arguments in order to call one of the cyrest* or commands* functions.
The commandsRun function is a special RCy3 function that allows users to directly issue commands via Cytoscape's command-line syntax (e.g., "node list network=current"), including commands implemented by Cytoscape app developers (see Use cases). This single function can perform hundreds of operations made available by both Cytoscape and automation-enabled apps 6 . Over 40 of these RCy3-supported apps are currently registered in the Cytoscape App Store 2 . The cyrestAPI and commandsAPI functions open interactive Swagger documentation for the CyREST and Commands programmatic interfaces. Cytoscape Automation can be performed via these Swagger web pages. The same operations and parameters are supported by the cyrest* and commands* functions in RCy3. Command-line syntax can also be run from the Automation panel in Cytoscape (see manual.cytoscape.org/en/stable/Command_Tool.html).
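To illustrate how command-line syntax relates to the Commands REST interface, here is a simplified sketch that turns a command string into the kind of URL a GET request could use. The function command_to_url is hypothetical and does not reproduce RCy3's internals. It assumes that arguments have the form name=value without spaces or special characters, and the /commands/namespace/command?arguments layout is stated here as an assumption about the Commands API.

```r
# Hypothetical helper (not part of RCy3): translate a Cytoscape command
# string into a Commands-style URL.
# Assumes: tokens are separated by whitespace, every token containing "="
# is an argument, and no value needs quoting or URL encoding.
command_to_url <- function(cmd, base.url = "http://localhost:1234/v1") {
  tokens <- strsplit(trimws(cmd), "\\s+")[[1]]
  is_arg <- grepl("=", tokens, fixed = TRUE)
  path   <- paste(tokens[!is_arg], collapse = "/")  # namespace/command
  query  <- paste(tokens[is_arg], collapse = "&")   # name=value pairs
  url    <- paste(base.url, "commands", path, sep = "/")
  if (nzchar(query)) url <- paste(url, query, sep = "?")
  url
}

command_to_url("node list network=current")
# "http://localhost:1234/v1/commands/node/list?network=current"
```

With Cytoscape running, the same command can be issued directly from R with commandsRun("node list network=current").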

Generic and specific
The primary goal of RCy3 is to provide wrappers for every feature made available by CyREST and Commands. However, we also have a secondary goal of providing useful and intuitive functions for common workflows in R. So, in addition to the generic functions implemented to parallel the CyREST and Commands APIs, we have also implemented sets of specific helper functions.
As an example, consider the common Cytoscape operation of mapping network data values to visual style properties. CyREST has a POST endpoint for /styles/{style name}/mappings that takes a JSON data structure defining the mapping. We implemented updateStyleMapping which takes a style.name and mapping arguments and sends them out via cyrestPOST. We also implemented mapVisualProperty to help construct the mapping argument. With these generic functions one can perform any of the hundreds of visual style mappings supported by Cytoscape, including new ones added in the future. However, these functions are not simple to use, requiring knowledge of specific property names, like "NODE_FILL_COLOR", and mapping data structures. To simplify usage for common situations, we therefore also implemented specific functions for over a dozen of the most commonly used mappings (e.g., setNodeColorMapping). With autocomplete in tools like RStudio, after just typing setNode... a script author is presented with a series of intuitively named functions with obvious arguments.
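The contrast between the generic and specific routes can be sketched as follows. Both wrappers aim to apply the same continuous color mapping; the column name "score", the breakpoints, the colors and the wrapper names color_by_generic and color_by_specific are hypothetical choices made for illustration. The RCy3 calls are wrapped in functions so that nothing is sent until Cytoscape is running with a network that has a numeric "score" node column.

```r
# Shared inputs: a numeric node column and one color per breakpoint.
# The column name "score" is a hypothetical example.
column      <- "score"
breakpoints <- c(0, 20)
colors      <- c("#FFFFFF", "#FF0000")
style.name  <- "default"

# Generic route: build the mapping structure with mapVisualProperty,
# then send it to the style with updateStyleMapping.
color_by_generic <- function() {
  mapping <- RCy3::mapVisualProperty("node fill color", column, "c",
                                     breakpoints, colors)
  RCy3::updateStyleMapping(style.name, mapping)
}

# Specific route: a single helper call for the same mapping.
color_by_specific <- function() {
  RCy3::setNodeColorMapping(column, breakpoints, colors,
                            style.name = style.name)
}

# color_by_generic()   # run either one with Cytoscape open
# color_by_specific()
```

The generic route extends to any visual property that Cytoscape exposes, while the specific helper trades that generality for a shorter, self-describing call.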

Networks in R
Networks are a popular visualization option in R, often implemented as graph models by igraph and Bioconductor's graph (i.e., graphNEL). RCy3 can create networks in Cytoscape from igraph, graphNEL or dataframe objects (createNetworkFrom*). Likewise, igraph and graphNEL objects can be created from networks (create*FromNetwork), and dataframes from node and edge tables in Cytoscape (getTableColumns).
In the case of createNetworkFromDataFrames, two dataframes are accepted as arguments, one for nodes and one for edges. The nodes dataframe must include a column named "id", and the edges dataframe must include "source" and "target" columns. Additional columns are imported as node and edge attributes into Cytoscape. The function can also work with just one dataframe. If a dataframe of only edges is passed to createNetworkFromDataFrames, then a connected network will be created with all of the nodes. If a dataframe of only nodes is passed, then a network with no connections, only nodes, will be created.
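The dataframe requirements above can be sketched with a small example. The node names, attribute columns and the wrapper load_into_cytoscape are hypothetical and chosen only for illustration; the call to createNetworkFromDataFrames is wrapped in a function so that it runs only once Cytoscape is open.

```r
# Node table: the "id" column is required; every other column is
# imported as a node attribute.
nodes <- data.frame(
  id    = c("node 0", "node 1", "node 2", "node 3"),
  group = c("A", "A", "B", "B"),
  score = c(20L, 10L, 15L, 5L),
  stringsAsFactors = FALSE
)

# Edge table: "source" and "target" are required; every other column is
# imported as an edge attribute.
edges <- data.frame(
  source = c("node 0", "node 0", "node 0", "node 2"),
  target = c("node 1", "node 2", "node 3", "node 3"),
  weight = c(5.1, 3.0, 5.2, 9.9),
  stringsAsFactors = FALSE
)

# Check the required columns, and that every edge endpoint is a known node.
stopifnot("id" %in% names(nodes),
          all(c("source", "target") %in% names(edges)),
          all(c(edges$source, edges$target) %in% nodes$id))

# Hypothetical wrapper: send both tables to a running Cytoscape instance.
load_into_cytoscape <- function(nodes, edges) {
  RCy3::createNetworkFromDataFrames(nodes, edges,
                                    title = "my first network",
                                    collection = "DataFrame Example")
}

# load_into_cytoscape(nodes, edges)  # run this with Cytoscape open
```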
RCy3 can also import network file formats supported by Cytoscape natively (e.g., SIF, xGMML and CX 7 ) and via user-installed apps (e.g., GPML 8 and adjacency matrices). With these functions RCy3 can interoperate with any other Bioconductor packages that deal with networks in a standardized manner, providing advanced network visualization options and advanced network analytics from the Cytoscape ecosystem (see Table 2).

    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("RCy3")
    library(RCy3)

Launch Cytoscape and keep it running whenever using RCy3. Confirm that you have everything installed and that RCy3 is communicating with Cytoscape via CyREST.

As with any R package, one can access the documentation and browse over a dozen vignettes included in the RCy3 package:

    help(package=RCy3)
    browseVignettes("RCy3")

Use cases
The following sections demonstrate a variety of common and advanced network biology use cases as runnable R code snippets. The code for these use cases is also available as an online Rmd notebook and Rmd file in the Cytoscape Automation repository (see Data availability). The first set focuses on fundamental Cytoscape operations that are common to most use cases:
• Loading networks (from R objects, Cytoscape files and public databases)
• Visualizing network data
• Filtering by node degree or data
• Saving and exporting networks
Additionally, there are examples that demonstrate analytical workflows, relying not only on Cytoscape, but also on Cytoscape apps and other R packages:
• Building maps of enrichment analysis results using EnrichmentMap and AutoAnnotate
• Visualizing integrated network analysis using BioNet
• Performing advanced graph analytics using RBGL

Loading Networks. Networks come in all shapes and sizes, in multiple formats from multiple sources. The following code snippets demonstrate just a few of the myriad ways to load networks into Cytoscape using RCy3.
    # Load demo Cytoscape session file
    openSession() # default file = galFiltered.cys
    net.suid <- getNetworkSuid() # get SUID for future reference

    # Filter for neighbors of high degree nodes
    createDegreeFilter(filter.name = "degree filter",
                       criterion = c(0,9),
                       predicate = "IS_NOT_BETWEEN")
    selectFirstNeighbors() # expand selection to first neighbors
    createSubnetwork(subnetwork.name = "first neighbors of high degree nodes")

    # Filter for high edge betweenness
    createColumnFilter(filter.name = "edge betweenness",
                       type = "edges",
                       column = "EdgeBetweenness",
                       4000,
                       "GREATER_THAN",
                       network = net.suid)
    createSubnetwork(subnetwork.name = "high edge betweenness")

Saving and exporting networks. There are local and cloud-hosted options for saving and sharing network models and images. The Cytoscape session file (CYS) includes all networks, collections, tables and styles. It retains every aspect of your session, including the size of the application window. Network and image exports include only the currently active network. Export to NDEx requires account information you can obtain from ndexbio.org. Files are saved to the current working directory by default, unless a full path is provided.

    # Saving sessions
    saveSession("MySession") #.cys
    ## Leave filename blank to update previously saved session file

    # Exporting to NDEx, a.k.a. "Dropbox" for networks
    exportNetworkToNDEx(username, password, TRUE)
    ## Account information (username and password) is required to upload
    ## Use updateNetworkInNDEx if the network has previously been uploaded

Building maps of enrichment analysis results. This workflow illustrates how to plot an annotated map of enrichment results using the EnrichmentMap Pipeline Collection of apps in Cytoscape 9 . An enrichment map is a network visualization of related gene sets in which nodes are gene sets (or pathways) and edge weight indicates the overlap in member genes 10 .
Following the construction of the enrichment map, AutoAnnotate clusters redundant gene sets and uses WordCloud 11 to label the resulting cluster (Figure 2). The code uses the Commands interface to invoke EnrichmentMap and AutoAnnotate apps. After installing apps, run commandsAPI() to open the live Swagger documentation to browse and execute command-line syntax.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.

The package is very well documented in both analysis vignettes on the Bioconductor landing page of the package and in the article itself.
The implemented functionality covers a range of useful features including network import/export, network layout, and integration with network and enrichment analysis packages.

Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics, Cancer genomics, Gene expression data analysis

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

James Denvir
Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, USA

This article presents an R package, RCy3, designed to control and automate the network visualization software application Cytoscape. The article describes both a low-level ("generic") and a high-level ("specific") API and presents use-cases with some code samples.
Overall this appears to be an API with great potential utility for the bioinformatics community, and the article presents the package clearly and with several useful and informative examples.
I have a couple of minor comments and suggestions, which I hope will serve to improve the article.
The abstract mentions the potential for RCy3 to deliver reproducible workflows. In my opinion, reproducibility is among the most important benefits of code/script-based workflows over workflows performed using "point and click" GUI-based applications. However, the article does not elaborate on this outside of the abstract. A couple of sentences in the discussion describing the potential for RCy3 to enhance reproducibility would be worthwhile.
The article mentions that the Cytoscape App store currently lists over 40 "RCy3-supported apps". A quick note on what is needed for a third party app to be "RCy3-supported" might be pertinent here. This may be already sufficiently covered in existing publications (e.g. reference 6), in which case a simple note to that effect would suffice.
The code block for the section "Building maps of enrichment analysis results" is not quite as clear as the other code blocks in the article. A comment indicating that commandsAPI() opens a web browser page (externally; at least in my environment) and comments after the print() statements showing the expected output, as in the other blocks of code, would be helpful. The uses of file.path and paste(... sep='/') are a little confusing here (to be honest, I was surprised to find these actually worked on Windows systems); brief comments added to the code might aid readability here.

Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes