SLiMScape 3.x: a Cytoscape 3 app for discovery of Short Linear Motifs in protein interaction networks

Short linear motifs (SLiMs) are small protein sequence patterns that mediate a large number of critical protein-protein interactions, involved in processes such as complex formation, signal transduction, localisation and stabilisation. SLiMs show rapid evolutionary dynamics and are frequently the targets of molecular mimicry by pathogens. Identifying enriched sequence patterns due to convergent evolution in non-homologous proteins has proven to be a successful strategy for computational SLiM prediction. Tools of the SLiMSuite package use this strategy, using a statistical model to identify SLiM enrichment based on the evolutionary relationships, amino acid composition and predicted disorder of the input proteins. The quality of input data is critical for successful SLiM prediction. Cytoscape provides a user-friendly, interactive environment to explore interaction networks and select proteins based on common features, such as shared interaction partners. SLiMScape embeds tools of the SLiMSuite package for de novo SLiM discovery (SLiMFinder and QSLiMFinder) and identifying occurrences/enrichment of known SLiMs (SLiMProb) within this interactive framework. SLiMScape makes it easier to (1) generate high quality hypothesis-driven datasets for these tools, and (2) visualise predicted SLiM occurrences within the context of the network. To generate new predictions, users can select nodes from a protein network or provide a set of Uniprot identifiers. SLiMProb also requires additional query motif input. Jobs are then run remotely on the SLiMSuite server (http://rest.slimsuite.unsw.edu.au) for subsequent retrieval and visualisation. SLiMScape can also be used to retrieve and visualise results from jobs run directly on the server. SLiMScape and SLiMSuite are open source and freely available via GitHub under GNU licenses.

This article is included in the Cytoscape apps channel.
Many protein-protein interactions (PPIs) are mediated by a short linear motif (SLiM) in one protein interacting with globular domains in another 1. The past decade has seen the development of many computational methods and tools for predicting SLiMs from protein sequences and/or PPI data 2,3. SLiMs are short (2-15 amino acids in length) and degenerate (with few residues determining specificity) 4, which makes them hard to identify against a backdrop of highly conserved structural domains. These features also impart remarkable evolutionary plasticity, and convergent (i.e. independent) evolution of new SLiM occurrences is common 5,6. Some of the best examples of this come from viruses, which often hijack host cellular processes via molecular mimicry of host SLiMs 7. An effective approach for de novo SLiM discovery is to explicitly model this convergent evolution and look for enriched sequence patterns in non-homologous proteins 5,8,9. SLiMFinder combines this approach with a robust statistical model, which enables a good estimation of the probability that an observed enrichment is by chance 6,9-11. "Query" SLiMFinder (QSLiMFinder) extends this approach to define the motif search space on a specific protein, such as a viral interactor, and look for enrichment in the remaining non-homologous proteins in the dataset 11.
Combining motif discovery tools with biological knowledge has recently identified a new motif ("ABBA") that binds the anaphase-promoting complex or cyclosome (APC/C) ubiquitin ligase 12. Nevertheless, despite the improved performance and potential of these methods, they are yet to deliver the promised windfall of new motifs 4. In large part, this is due to the difficulty in constructing appropriate datasets for SLiM discovery 2,6. SLiM discovery relies on a careful balance of maximising the SLiM-containing signal in the data whilst removing noise by reducing the search space, either in terms of PPI or protein regions 11. Cytoscape is a well-developed platform for the interactive generation and exploration of PPI datasets 13. Cytoscape is a useful resource for SLiM discovery, enabling visual groupings of proteins that share a common interaction partner and may also share a SLiM-mediated mechanism of binding. Such proteins can either be used as input for de novo SLiM discovery approaches 5,6 or explored for enrichment of known motifs 14.
SLiMScape brings these tools together in a friendly environment that allows the user to browse, define and explore protein nodes whose interactors are enriched for over-represented motifs.  27 . The app is written for the Java SE Runtime Environment 7 (Java 7). An active internet connection is required to submit or retrieve server jobs, although subsequent analysis can be performed offline. Cytoscape may be closed whilst jobs are queued and/or running on the server.

Loading input data
SLiMScape is designed to be run directly from within Cytoscape, or to visualise the results of a previous SLiMSuite server job. As such, all three SLiMSuite programs will accept three different inputs to identify the primary dataset of proteins for analysis (Figure 1):
1. A selection of nodes from an existing Cytoscape network view. Node 'name' attributes must be Uniprot identifiers or accession numbers.
2. A list of Uniprot identifiers or accession numbers, separated by commas, whitespace and/or new lines.
3. The Job ID of a previous SLiMSuite server job. This may have been submitted via SLiMScape or run directly on the server.
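The second input type, a free-text list of identifiers, can be illustrated with a minimal Python sketch. This is not the SLiMScape implementation (the app itself is written in Java); the function name `parse_identifiers` and the choice to drop duplicates while preserving order are assumptions made for illustration only.

```python
import re


def parse_identifiers(text):
    """Split free text on commas, whitespace and/or new lines.

    Hypothetical helper: duplicates are dropped and first-seen order is
    kept, which is an assumption rather than documented app behaviour.
    """
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    seen = set()
    ordered = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            ordered.append(token)
    return ordered


# Accession numbers taken from the use case described later in the text.
print(parse_identifiers("Q9UKB1, Q9Y297\nQ84940  Q9UKB1"))
```

Keeping the original order matters if the first identifier is later treated specially, for example when choosing a query protein.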

SLiMFinder. A set of proteins is the only required input for SLiMFinder.
QSLiMFinder. QSLiMFinder needs an additional query protein input, which is used to build the motif space 11. This should be the Uniprot accession number of one of the input proteins. If no accession number is given, the first protein returned by Uniprot will be used as the query. This is not necessarily the protein with the first accession number provided, so omitting the query should be avoided. The query protein(s) used will be reported in the job's log output at the server.

Setting parameters
The main input parameters for the SLiM programs are specified in the "Settings" panel (Figure 1B):
• Disorder masking: masks residues with an IUPred 19 disorder score < 0.2. This threshold can be modified by adding an iucut=X option to the custom parameters box.
• Conservation masking: masks residues with low relative local conservation 20. By default, this uses GOPHER 21 to generate a Clustal Omega 22 alignment of predicted eukaryotic orthologues from the April 2015 release of the Quest For Orthologues reference proteomes 23. Different protein databases for GOPHER can be selected by adding an orthdb=X command to the custom parameters box. (See GOPHER server for details.)
• Feature masking: masks residues which occur in Uniprot-annotated domain and transmembrane features. A different set of Uniprot features may be masked by adding ftmask=LIST to the custom parameters box.
• SLiMBuild settings: the maximum motif length (number of defined positions and maximum wildcard spacer length) and whether to return ambiguous motifs. Amino acid equivalences for motif ambiguity can be set using the custom parameters box.
• Custom parameters: additional commandline options can be provided as "Custom parameters" (Figure 1B). These may be used to modify/supersede the options above, or to use SLiMSuite features that have been left out of the dialogue box for clarity. Please see the SLiMSuite documentation for a full list of commandline options. New features added to the SLiMSuite servers are instantly available through the custom parameters box.
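The disorder masking option described above amounts to a per-residue threshold test. The following Python sketch shows the idea under stated assumptions: the function name `apply_disorder_mask`, the use of "X" as the masking character and the example scores are all hypothetical, and the real masking is performed by SLiMSuite on the server using IUPred predictions.

```python
def apply_disorder_mask(sequence, scores, iucut=0.2, mask_char="X"):
    """Mask residues whose disorder score is below the cutoff.

    `scores` holds one predicted disorder score per residue. The default
    cutoff of 0.2 follows the text; the masking character is an assumption.
    """
    if len(sequence) != len(scores):
        raise ValueError("need exactly one score per residue")
    return "".join(
        mask_char if score < iucut else residue
        for residue, score in zip(sequence, scores)
    )


# Invented sequence and scores, for illustration only.
masked = apply_disorder_mask("MDSGLS", [0.1, 0.5, 0.6, 0.7, 0.15, 0.3])
print(masked)  # XDSGXS
```

Raising `iucut` masks more of the sequence, shrinking the motif search space to the most strongly disordered regions.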

Running the program
Once all required inputs are provided and parameters are set, the desired program can be executed by clicking on the "Run X" button (where X is the name of the program being run). A popup window indicating processing will appear if a new server run is commencing (Figure 2). Jobs typically take a few minutes to run, although larger jobs (>100 proteins) may take several hours to complete depending on server load and the program settings. The server Job ID for this run will be displayed and also populate the Job ID box in the input panel (Figure 1A). This popup gives three progress options:
• Stop and return to Cytoscape. The job will continue running on the server but Cytoscape can be used as usual in the meantime. Additional jobs may be sent to the server whilst waiting for one to complete.
• Monitor job progress at the SLiMSuite server. This will open the job's status page at the SLiMSuite server in the user's default web browser.
• Check for job completion. If complete, this will load the results for visualisation. Otherwise, the popup will reload.
Alternatively, a previous server Job ID can be entered in the Job ID box and loaded by clicking the "Retrieve" button. If complete, this will load the data into Cytoscape for visualisation. Otherwise, the running popup will appear. If an inappropriate Job ID is provided, or a job has crashed, an appropriate message should appear. If in doubt, the Monitor button in the progress popup ( Figure 2) can be used to check that a job has executed correctly.
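The submit, check and retrieve cycle described above follows a common polling pattern, sketched below in Python. Everything here is hypothetical: the status strings ("queued", "running", "finished", "failed"), the function name `wait_for_job` and the `check_status` callable are placeholders, not the SLiMSuite REST API. A real client would replace `check_status` with a request to the server.

```python
import time


def wait_for_job(job_id, check_status, interval=30.0, max_checks=120,
                 sleep=time.sleep):
    """Poll a job until it finishes, fails or the check budget runs out.

    `check_status` is a caller-supplied function returning a status string
    for `job_id`; the status vocabulary is an assumption for this sketch.
    Returns True when finished and False if still pending after
    `max_checks` checks.
    """
    for _ in range(max_checks):
        status = check_status(job_id)
        if status == "finished":
            return True
        if status == "failed":
            raise RuntimeError("job %s failed on the server" % job_id)
        sleep(interval)
    return False


# Stubbed usage: a fake server that finishes on the third check.
fake_states = iter(["queued", "running", "finished"])
done = wait_for_job("JOB123", lambda job: next(fake_states),
                    interval=0, sleep=lambda seconds: None)
print(done)  # True
```

Returning False instead of blocking forever mirrors the option to stop and return to Cytoscape while the job continues on the server.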

Output
Once finished, tables showing the discovered SLiMs are presented in the Cytoscape control panel, in a new tab named after the SLiMSuite Job ID (Figure 1C). In each case, two tables are provided: (1) a table summarising overall statistics for each motif in the dataset, and (2) details of the individual occurrences (if any) of motifs in the input proteins. Table fields are identical for SLiMFinder and QSLiMFinder, whereas SLiMProb has slightly different output (Table 1). The Job ID produced by the server is presented in the "Job ID" box. The SLiMScape panel only shows a subset of the (Q)SLiMFinder/SLiMProb results fields; full output can be accessed at the server by clicking the "Full results" button and viewing the "main" or "occ" tabs. Results can also be accessed by entering the Job ID directly at the SLiMSuite server.
If a Job ID is retrieved without any nodes selected, a new network is created. By default, this will be named "SLiMOutput"; it is recommended to rename it in the network tab if multiple networks are to be analysed. Edges in this new network represent homology as detected with BLAST+ (E < 1e-4) 24. The subnetworks defined by these edges correspond to the "Unrelated Protein Clusters (UPC)" used by the SLiMChance statistics to correct for evolutionary relationships and explicitly model convergent evolution.
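Since the UPC correspond to the subnetworks defined by the homology edges, they can be pictured as the connected components of that graph. The Python sketch below illustrates this with a simple depth-first search. It is an illustration of the graph concept only, not the SLiMSuite clustering code; the function name and the protein labels are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into clusters linked by at least one chain of edges.

    `nodes` is an iterable of protein identifiers and `edges` an iterable
    of (a, b) pairs standing in for detected homology.
    """
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        cluster = set()
        while stack:
            node = stack.pop()
            cluster.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(cluster)
    return clusters


# Invented example: A, B and C are linked by homology; D is unrelated.
clusters = connected_components(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
print([sorted(c) for c in clusters])  # [['A', 'B', 'C'], ['D']]
```

In this toy example the four proteins form two clusters, so a motif found in A, B and C would count as support from a single unrelated cluster rather than three independent proteins.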
When nodes are selected in the Cytoscape network graph, the visual representation of nodes will be updated upon results retrieval. Due to the wide range of possible applications and user requirements, SLiMScape node formatting has been kept deliberately simple to avoid confusion. SLiM presence in a node is indicated by a change in colour and shape, from the native settings to a red diamond (Figure 1C). A darker shade of red indicates that multiple SLiMs are present in that node. Only selected nodes will be altered and any nodes without SLiM occurrences remain as they were; if a network has already been formatted (e.g. by an earlier SLiMScape run), the old formatting is not removed first. To remove SLiMScape formatting, select the appropriate nodes, visit the Node tab of the Style control panel and "Remove Bypass" for the relevant properties. Users can also apply additional bypass or mapping styles to manually combine results from different runs. Nodes that are present in the results but missing from an existing network will not be added to it; if nodes need to be added, users should create a new network by retrieving results without any nodes selected, and then merge the networks. Missing nodes will only arise when importing results from a SLiMSuite run that was created directly on the server, or with a different network.
It should be noted that although modified node attributes will be retained, SLiMScape tabs are not saved with a Cytoscape session. It is therefore recommended to rename modified networks with the Job ID if multiple runs have been performed on the same data.

Dependencies
The app relies on the Apache HTTP Client library to make HTTP requests to the SLiMSuite RESTful API server. The built-in Java class (java.net.HttpURLConnection) was avoided as it does not support cancellation, which is important for a responsive user interface, particularly with large data sets and the substantial processing times these require.

Use cases
Cytoscape can be used to import PPI data from a number of databases. The IntAct database at the European Bioinformatics Institute (EBI) 25 is particularly suitable for SLiMScape analyses because the majority of nodes are mapped to Uniprot accession numbers. Other databases can be used but some additional mapping on to Uniprot identifiers might be required.
The proteins 'F-box and WD repeat domain containing 11' (Gene Symbol: FBXW11; Uniprot: Q9UKB1) and 'beta-transducin repeat containing E3 ubiquitin protein ligase' (Gene Symbol: BTRC; Uniprot: Q9Y297) are two proteins of the SCF(beta-TRCP)-ubiquitin ligase complex, which recognise phosphodegron motifs (ELM DEG_SCF_TRCP1_1 18 ) via their WD40 domains. We used the File » Import » Network » Public Databases function of Cytoscape to search for records matching FBXW11 and BTRC and imported the 184 records (184 edges; 86 nodes) from IntAct ( Figure 3A). This network was then reduced to direct human and viral PPI of BTRC and FBXW11 and duplicate edges compressed.
Working on the principle that WD40-interaction motifs are most likely to be found in proteins that interact with both WD40 proteins, Cytoscape was used to reduce the network further to nodes interacting with both proteins ( Figure 3B). These interactors were then input to SLiMFinder for de novo SLiM prediction, using disorder and Uniprot feature masking. SLiMFinder returned a variant of the phosphodegron motif, DSG (P < 0.05), which was found in 14 of the 21 proteins ( Figure 3C).
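The network reduction step, keeping only nodes that interact with both WD40 proteins, is a set intersection over neighbour lists. In the paper this was done interactively in Cytoscape; the Python sketch below shows the equivalent operation on an edge list. The function name and the interactor labels P1 to P3 are hypothetical.

```python
def shared_interactors(edges, hub_a, hub_b):
    """Return the nodes that have an edge to both hub_a and hub_b.

    `edges` is an iterable of undirected (a, b) interaction pairs.
    The hubs themselves are excluded from the result.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    shared = neighbours.get(hub_a, set()) & neighbours.get(hub_b, set())
    return shared - {hub_a, hub_b}


# Invented interactions: only P1 binds both hubs.
ppi = [("BTRC", "P1"), ("FBXW11", "P1"), ("BTRC", "P2"),
       ("FBXW11", "P3"), ("BTRC", "FBXW11")]
print(sorted(shared_interactors(ppi, "BTRC", "FBXW11")))  # ['P1']
```

The resulting identifiers could then be supplied as the protein list for a SLiMFinder run.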
The HIV Vpu protein interacts with both proteins and is therefore of particular interest as a potential molecular mimic. We repeated the analysis using QSLiMFinder with Vpu as the query. As previously observed 11, restricting the motif search space with QSLiMFinder increases the sensitivity of the search and returns the DSG motif with greater significance (P < 10^-5) in addition to a number of different variants of the same motif (Figure 3D).  Figure 4B).
Visual inspection of the motif distribution suggested that the DSG motif is actually more specific for the interaction than the defined ELM, showing a greater enrichment in the proportion of joint interactors versus interaction partners of only BTRC or FBXW11. This observation was confirmed by subsequent SLiMProb analysis of six different groups of proteins (Figure 5). However, when the sequence composition of the proteins was taken into consideration, it became clear that the defined ELM was considerably more enriched across all BTRC/FBXW11 interactors than the simpler DSG motif. This was particularly pronounced when looking at evolutionarily conserved instances (i.e. results from SLiMProb analyses with conservation masking) (Figure 5C). This demonstrates the power of combining Cytoscape and SLiMSuite to get insights that are not obvious from either tool in isolation.
Also of interest is the other viral protein in the dataset, the NSP1 protein of rotavirus A (Uniprot: Q84940), which is reported to interact with FBXW11 but not BTRC. It has a DSG.SD sequence, which is intriguingly similar to the DEG_SCF_TRCP1_1 motif. Indeed, this motif was recently reported to be another case of molecular mimicry, targeting the beta-TrCP subunit 26. The evolutionary dynamics of SLiMs are complex; further work will need to be done to establish whether any of the other DSG instances that fail to match the more specific ELM definition represent hitherto undescribed variants of the motif or non-functional background sequence patterns.
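The distinction between a simple motif and a more specific definition can be made concrete by treating motifs as regular expressions, with "." as a wildcard position. The Python sketch below scans sequences for all occurrences of a pattern. It only illustrates pattern matching, not the SLiMProb statistics; the sequences are invented, and the stricter pattern `DSG..S` is a hypothetical stand-in, not the ELM DEG_SCF_TRCP1_1 definition.

```python
import re


def find_occurrences(pattern, sequences):
    """Map each protein name to its (1-based position, match) hits.

    A lookahead is used so that overlapping occurrences are reported.
    Proteins without any occurrence are left out of the result.
    """
    regex = re.compile("(?=(%s))" % pattern)
    hits = {}
    for name, sequence in sequences.items():
        found = [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
        if found:
            hits[name] = found
    return hits


# Invented sequences for illustration only.
proteins = {"protA": "MKDSGLPSAA", "protB": "MAAAK", "protC": "DSGDSG"}
simple = find_occurrences("DSG", proteins)
strict = find_occurrences("DSG..S", proteins)  # hypothetical stricter pattern
print(sorted(simple))                   # ['protA', 'protC']
print(sorted(set(simple) - set(strict)))  # ['protC']
```

Proteins that match the simple pattern but not the stricter one, such as protC here, are the kind of instances that would need further work to classify as motif variants or background.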

Summary
SLiM discovery is a challenging task that requires high quality data in addition to appropriate bioinformatics tools. There is often a disconnect between users with the biological knowledge to construct the former and those with the computational experience to run the latter. Embedding the SLiM discovery tools of SLiMSuite within the interactive environment of Cytoscape will help to bridge that gap, enabling new patterns to be identified and new questions to be formulated.

Data availability
SLiMSuite results for the figures in this manuscript can be retrieved by entering the Job ID indicated in the text through the SLiMScape app, or at: http://www.slimsuite.unsw.edu.au/servers.php. Job IDs for Figure 5 are given in Table 2.
The integration of some of the SLiMSuite tools with the Cytoscape network analysis platform is likely to be of potential importance in the discovery of SLiMs. We believe that the manuscript is suitable for indexing. However, we have the following general queries:

Software availability
Are there any other apps or plugins available for Cytoscape with similar utility? If so, has the performance of SLiMScape been compared with them?
On p. 4 of version 1 of the paper, the authors refer to the "SLiMSuite documentation" for the complete list of commandline options. It would be good if a link to the documentation were added somewhere in the manuscript.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
SLiMFinder is one of the state-of-the-art tools for linear sequence motif discovery. The idea of short linear sequence motifs is intuitive, has been demonstrated as a binding mode for many domains and may govern PTMs to some extent. However, because of low sequence complexity, it is very hard to find motifs in a reliable, i.e. statistically sound, manner, and de novo motif discovery is thus beyond the reach of biologists who are not motif experts. Cytoscape 3 was a major change in network analysis. Apps are required that run easily and reliably and rest on a solid algorithmic basis, which is often computationally intensive.
This manuscript reports the update of the SLiMFinder plugin to a SLiMScape app and provides a very detailed description of how to use it, together with a test case searching for FBXW11 and BTRC WD40 domain binding motifs (within the space of their interaction partners from IntAct). This is a very useful report for a very useful tool, and the nice documentation, including the available .cys sessions, will stimulate its use.

Two minor points:
The practice for storing data on the external server is a little unclear. I suspect that many potential users would consider an option to ensure that data are removed after analysis, or after some time, quite important.
In practice, output files will be renamed by the user, but it is always nice to give unique names to exported files, e.g. coupled to the Job ID or something systematic, so that many outputs can be saved, e.g. as a batch, and then processed with a script. If all are named "SLiMOutput" then this may get tedious.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.

Author Response 27 Aug 2015
Richard Edwards, University of New South Wales, Australia
These are excellent points and we will update the documentation accordingly. The quick/simple answers are: