Software Tool Article
Revised

Identifier Mapping in Cytoscape

[version 2; peer review: 3 approved]
PUBLISHED 06 Aug 2018

This article is included in the Cytoscape gateway.

Abstract

Identifier Mapping, the association of terms across disparate taxonomies and databases, is a common hurdle in bioinformatics workflows. The idmapper app for Cytoscape simplifies identifier mapping for genes and proteins in the context of common biological networks. This app provides a unified interface to different identifier resources accessible through a right-click on the table's column header. It also provides an OSGi programming interface via Cytoscape Commands and CyREST that can be utilized for identifier mapping in scripts and other Cytoscape apps, and supports integrated Swagger documentation.

Keywords

Cytoscape, ID Mapping, Identifiers, BridgeDb

Revised Amendments from Version 1

The following reviewer comments are addressed in this version:
* Clarification about relationship to and reliance upon BridgeDb project app, databases and web services.
* Updates to Table 1 and caption
* Clarification of persistent selection behavior in GUI
* Added Use Case 3: Identifiers and symbols
* Explanation of "force single"
* Example of R code with and without the custom function
* Clarification on how regular expressions are used for data source inference
* Consistent references to “Uniprot-TrEMBL”
* Described how results are added to Table Panel
* Changed “singular” to “single”
* Updated documentation on available species

See the authors' detailed response to the review by Nadezhda Doncheva
See the authors' detailed response to the review by Augustin Luna
See the authors' detailed response to the review by Ruth Isserlin

Introduction

Cytoscape is an integrated network visualization tool and analysis platform1,2. Within its common workflows, identifier mapping remains a challenge when working with biological data from different sources. This problem has been addressed by the BridgeDb project3, which created clients and services to translate between various identifiers. The original BridgeDb app4 for Cytoscape was written to provide an exhaustive set of functions to match the full capabilities of BridgeDb. Though this provided the needed functionality, its basic usage was unnecessarily complex. The idmapper app is a useful alternative, providing access to a commonly used subset of BridgeDb databases via web services by means of a simplified interface bundled into Cytoscape. Now, without any installation or configuration, Cytoscape users can right-click on a table header to map that column’s data to a different namespace (Figure 1). Although the breadth of coverage is smaller than that of the full-featured BridgeDb app, it still covers over a dozen identifier data sources maintained by BridgeDb, including Ensembl, Entrez Gene, HGNC, KEGG, Uniprot-TrEMBL and various species-specific sources. Because idmapper supports Cytoscape’s new CyREST interface, identifier mapping can be included in scripted workflows and driven from R or Python programs.


Figure 1. Simplified dialog for ID Mapping.

Four options are presented to the user when accessing idmapper from within the Cytoscape GUI, each with common default or inferred values to reduce the number of steps required of the user.

Implementation

Inferring the data source

From within Cytoscape, a user initiates an ID mapping operation by right-clicking on the header of a column containing identifiers in the Table Panel. Based on the specified species, a list of data sources is provided to the user. In the most common cases, idmapper can guess the type of identifier from its format and presents that type as the default selection. Table 1 shows the supported data sources and example identifier formats. The app looks at the first ten entries and chooses the source whose regular expression, as provided by BridgeDb, matches them. If there is no match (or if more than one system is matched), then it simply chooses the first option in the list as the default selection.
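The inference step described above can be sketched in Python. This is a minimal illustration, not the app's code: the patterns below are simplified stand-ins for the regular expressions supplied by BridgeDb, the function name infer_source is hypothetical, and the rule that every sampled entry must match is an assumption.

```python
import re

# Illustrative patterns only (assumption); the app relies on the
# regular expressions provided by BridgeDb, which are not reproduced here.
PATTERNS = {
    "Ensembl": re.compile(r"^ENS[A-Z]*G\d{11}$"),
    "Entrez Gene": re.compile(r"^\d+$"),
    "Uniprot-TrEMBL": re.compile(r"^[OPQ]\d[A-Z0-9]{3}\d$"),
    "MGI": re.compile(r"^MGI:\d+$"),
}


def infer_source(values, sources, sample_size=10):
    """Guess the data source of a column from its first few entries.

    `sources` is the non-empty, ordered list of data sources offered
    for the chosen species. Falls back to the first source when there
    is no match or when the match is ambiguous.
    """
    sample = [v for v in values[:sample_size] if v]
    matches = [
        s for s in sources
        if s in PATTERNS and sample
        and all(PATTERNS[s].match(v) for v in sample)
    ]
    if len(matches) == 1:
        return matches[0]
    return sources[0]


sources = ["Entrez Gene", "Ensembl", "Uniprot-TrEMBL"]
guess = infer_source(["ENSG00000139618", "ENSG00000141510"], sources)
```

With the two Ensembl-style identifiers above, only the Ensembl pattern matches, so that source becomes the default; a column of unrecognized strings falls back to the first entry in the list.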

Table 1. Supported Data Sources.

The parameter names of supported data sources, their species exclusivity and an example identifier. Note that Ensembl support is only for gene identifiers, not proteins.

Data Source      Species                     Example
Ensembl          Any                         ENSG00000139618
Entrez Gene      Any                         11234
KEGG Genes       Any                         syn:ssr3451
UniGene          Any                         Hs.553708
Uniprot-TrEMBL   Any                         P62158
FlyBase          Drosophila melanogaster     FBgn0011293
HGNC             Homo sapiens                DAPK1
MGI              Mus musculus                MGI:2442292
RGD              Rattus norvegicus           2018660
SGD              Saccharomyces cerevisiae    S000028457
TAIR             Arabidopsis thaliana        AT1G01030
WormBase         Caenorhabditis elegans      WBGene00000001
ZFIN             Danio rerio                 ZDB-GENE-041118-11

Cytoscape tasks

There are two different tasks supported by the idmapper app. ColumnMappingTask is activated by the right-click mouse event on a table header. It infers the current table and column from the information that comes from the mouse event, triggering a dialog (see GUI use case) that collects the information needed to make a call to BridgeDb web services. Please refer to the BridgeDb project for details about their services and sources3. In order to support automation, we added MapColumnCommandTask as an analog that is exposed specifically for Commands and CyREST access. These tasks eventually result in the same algorithms being invoked.

Use cases

Cytoscape graphical user interface (GUI)

The idmapper app provides the same basic functionality as the BridgeDb app with fewer steps. Users do not have to install it, launch it, make configuration decisions or think about which database they are accessing. The app comes bundled with every Cytoscape release. As such, its usage in Cytoscape via the interactive GUI (graphical user interface) is documented in the Cytoscape manual: http://manual.cytoscape.org/en/stable/Node_and_Edge_Column_Data.html#mapping-identifiers.

To map an identifier from one source to another, right-click on the header of the column containing your identifiers. Select the option to Map Column to bring up the idmapper dialog (Figure 1).

The idmapper dialog presents a few choices the user can override before performing ID mapping. The default Species is determined by the previous selection made by the user per network, providing persistent behavior across multiple searches. The available choices for the identifier data sources are determined by the species. The Map from data source is automatically selected based on an inspection of the first ten identifiers found in the column clicked on by the user. This can be overridden using the pull-down menu. The To data source must be selected by the user; Ensembl is presented by default. Finally, the Force single checkbox offers to simplify the results of ID mapping by ignoring one-to-many cases and only keeping the first result (arbitrarily determined by the BridgeDb web service result). If the option is off, a list of results will appear in the column. This can easily be overridden by clicking the toggled checkbox. The result of the mapping is appended to the node table in a column named after the target data source, e.g., "Ensembl". If a column by that name already exists, a parenthesized number is appended to the name to ensure it is unique, e.g., "Ensembl(1)".
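The Force single and column naming behavior described above can be illustrated with a short Python sketch. The function names unique_column_name and collapse_results are hypothetical, and incrementing the number past 1 when "Ensembl(1)" is also taken is an assumption; the text only confirms the first suffix.

```python
def unique_column_name(target, existing):
    """Return `target`, or `target(n)` if that name is already in use."""
    if target not in existing:
        return target
    n = 1
    # Assumption: keep counting until an unused name is found.
    while f"{target}({n})" in existing:
        n += 1
    return f"{target}({n})"


def collapse_results(results, force_single):
    """Keep only the first mapped identifier when force_single is set.

    `results` is the ordered list returned for one source identifier;
    with force_single off, the full list is kept for the table cell.
    """
    if not results:
        return None
    return results[0] if force_single else list(results)


columns = {"name", "shared name", "Ensembl"}
new_column = unique_column_name("Ensembl", columns)
```

Here new_column is "Ensembl(1)" because "Ensembl" already exists in the table; for a column set without "Ensembl" the target name is used unchanged.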

Cytoscape command line interface

The command interface does not use the same tasks as the GUI. In the GUI use case, the app knows the current context of where the command was activated, i.e., the network, table and column. This information must be provided explicitly as parameters to the command interface to perform the same operation. Thus, in addition to species, mapFrom, mapTo and forceSingle, the command line operation of idmapper also requires networkName, table and columnName (see next section for more details).

Cytoscape automation

In the scripting environment, idmapper provides all of its functionality in a single call (Figure 2). This means that identifier mapping can be incorporated into Cytoscape automation workflows with a single additional command. The scripting version of the command includes extra parameters for columnName, networkName and table, which are implicit in the GUI version from the location of the mouse event.

The map column function takes the following parameters:

  • columnName (string): Specifies the column name where the source identifiers are located

  • forceSingle (string, optional): When multiple identifiers can be mapped from a single term, this forces a single result

  • mapFrom (string): Specifies the data source describing the existing identifiers

  • mapTo (string): Specifies the data source identifiers to be returned as a result in a new column

  • networkName (string, optional): Which network is used in the mapping.

  • species (string): The common or Latin name of the species to which the identifiers apply, e.g., Human, Homo sapiens, Mouse, Mus musculus, Rat, Rattus norvegicus, Frog, Xenopus tropicalis, Zebra fish, Danio rerio, Fruit fly, Drosophila melanogaster, Mosquito, Anopheles gambiae, Arabidopsis, Arabidopsis thaliana, Yeast, Saccharomyces cerevisiae, E. coli, Escherichia coli, Tuberculosis, Mycobacterium tuberculosis, Worm, Caenorhabditis elegans

  • table (string, optional): Which table is used as the source of the identifiers, e.g., "node" for the default node table
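As a sketch of how these parameters could be sent over REST without a dedicated package, the following Python builds (but does not send) a request for the map column command. The base URL with port 1234 and the /v1/commands/idmapper/map%20column path are assumptions about a default local CyREST installation, and build_map_column_request is a hypothetical helper name; check the Swagger documentation of your Cytoscape instance before relying on either.

```python
import json
from urllib import request
from urllib.parse import quote


def build_map_column_request(column_name, map_from, map_to, species,
                             force_single=True, network_name=None,
                             table=None,
                             base_url="http://localhost:1234/v1"):
    """Build a POST request for the idmapper 'map column' command."""
    args = {
        "columnName": column_name,
        "mapFrom": map_from,
        "mapTo": map_to,
        "species": species,
        # forceSingle is documented as a string parameter.
        "forceSingle": "true" if force_single else "false",
    }
    # Optional parameters are only sent when given.
    if network_name is not None:
        args["networkName"] = network_name
    if table is not None:
        args["table"] = table
    url = f"{base_url}/commands/idmapper/{quote('map column')}"
    return request.Request(
        url,
        data=json.dumps(args).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_map_column_request("name", "Ensembl", "Entrez Gene", "Human",
                               table="node")
# With Cytoscape running, the request could be sent with:
#     request.urlopen(req)
```

Keeping request construction separate from sending makes it possible to inspect the URL and JSON body before any call reaches Cytoscape.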


Figure 2. Swagger documented function.

The functionality of idmapper is contained in this single function: map column.

With Cytoscape running, the map column function can be called from any scripting environment or programming language that supports REST calls. In the case of R and Python scripts, there are dedicated packages to make this even easier. The RCy3 package wraps this command in an R function called mapTableColumn to conform to other table functions (https://www.bioconductor.org/packages/release/bioc/html/RCy3.html). The py2cytoscape library similarly provides this command as a Python function, cyclient.idmapper.map_column (https://github.com/cytoscape/py2cytoscape). The advantage of using one of these dedicated packages is more concise syntax and adherence to language-specific conventions. In RCy3, for example, the custom mapTableColumn function simplifies the call, conforms to other RCy3 functions and returns a dataframe with the map.from and map.to columns, while the generic commandsPOST function relies on the composition of a command string using the idmapper parameters defined in Figure 2:

(RCy3 generic): commandsPOST(paste('idmapper map column',
                        'columnName="name"', 'forceSingle="true"',
                        'mapFrom="Ensembl"', 'mapTo="Entrez Gene"',
                        'species="Human"', 'table="node"', sep=" "))

(RCy3 custom): mapTableColumn(column="name", species="Human",
                map.from="Ensembl", map.to="Entrez Gene")

A sample script demonstrates how to map identifiers via RCy3, covering the most common use cases (https://github.com/cytoscape/RCy3/blob/master/vignettes/Identifier-mapping.Rmd).

Case 1: Species-specific considerations

The Yeast Perturbation sample network provided with Cytoscape can be loaded from the Starter Panel and provides gene identifiers of the form “YDL194W”. These are actually Ensembl-supported identifiers for Yeast, distinct from the typical “ENSXXXG00000123456” form as presented in Table 1. This presents a special case that users will need to be aware of when selecting species and source database or mapFrom in the GUI. (Ensembl has special cases for Yeast, Worm and Fly identifiers in addition to the standard terms that start with ENS.) In terms of automation, you could generate a new column of Entrez Gene IDs in this network with these calls:


(RCy3): mapTableColumn(column="name", species="Yeast", 
                map.from="Ensembl", map.to="Entrez Gene")

(py2cytoscape): cyclient.idmapper.map_column(source_column="name", species="Yeast", 
                source_selection="Ensembl", target_selection="Entrez Gene")

Case 2: From proteins to genes

When working with protein interaction networks, for example those from the STRING database (see http://apps.cytoscape.org/apps/stringapp), you may want to translate protein identifiers (e.g., Uniprot-TrEMBL) to gene identifiers. The idmapper app supports this case as well, but one should be aware of the assumptions involved when making this translation. Since a single gene can encode many proteins, several protein identifiers may map to the same gene, giving many-to-one mappings in your results. For all human networks imported from STRING using the StringApp5, the following commands will perform an ID mapping from Uniprot-TrEMBL (proteins) to Ensembl (genes):

(RCy3): mapTableColumn(column="canonical name", species="Human",
                map.from="Uniprot-TrEMBL", map.to="Ensembl")

(py2cytoscape): cyclient.idmapper.map_column(source_column="canonical name",
                species="Human", source_selection="Uniprot-TrEMBL",
                target_selection="Ensembl")
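After such a mapping, it can be useful to check which genes received more than one protein. The Python sketch below groups source identifiers by their mapped target; the function name many_to_one and the identifiers are placeholders for illustration, not real Uniprot-TrEMBL or Ensembl entries, and the input is assumed to be a plain dictionary built from the two table columns.

```python
from collections import defaultdict


def many_to_one(mapping):
    """Return targets that were reached from more than one source ID.

    `mapping` maps each source identifier to its mapped target,
    or to None when no mapping was found.
    """
    groups = defaultdict(list)
    for source_id, target_id in mapping.items():
        if target_id is not None:
            groups[target_id].append(source_id)
    return {t: ids for t, ids in groups.items() if len(ids) > 1}


# Placeholder identifiers (hypothetical), not real database entries.
protein_to_gene = {
    "PROT_A": "GENE_1",
    "PROT_B": "GENE_1",
    "PROT_C": "GENE_2",
    "PROT_D": None,
}
shared = many_to_one(protein_to_gene)
```

In this example only GENE_1 is reported, since two placeholder proteins map to it; unmapped entries are ignored.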

Case 3: Identifiers and symbols

In contrast to gene names and symbols, identifiers provide a more reliable means of specifying a particular gene. All data integration should be performed using identifiers as keys. Nevertheless, names and symbols play an important role in making results easier to read and understand. The idmapper app is primarily concerned with identifiers. However, relying on a subset of commonly used sources from BridgeDb (Table 1) it does provide one exception. HGNC symbols, when used properly, can serve as identifiers in ID mapping and more generally can be added when starting from any other human ID source:

(RCy3): mapTableColumn(column="canonical name", species="Human", 
                map.from="Ensembl", map.to="HGNC")
(py2cytoscape): cyclient.idmapper.map_column(source_column="canonical name", 
                species="Human", source_selection="Ensembl",
                target_selection="HGNC")

Limitations

The idmapper app provides easy access to a critical subset of ID mapping functionality originally covered by the BridgeDb app. When users run into the limitations of idmapper, they still have the option of installing and using the full-featured BridgeDb app from https://apps.cytoscape.org/apps/bridgedb. The main limitations are the restricted sets of supported species and data sources. The BridgeDb app includes more of both, as well as a means to access custom data sources.

Software availability

Treister A and Pico AR. Identifier Mapping in Cytoscape [version 2; peer review: 3 approved]. F1000Research 2018, 7:725 (https://doi.org/10.12688/f1000research.14807.2)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Version 2
Reviewer Report 08 Aug 2018
Nadezhda Doncheva, Novo Nordisk Foundation Center for Protein Research & Center for non-coding RNA in Technology and Health, University of Copenhagen, Copenhagen, Denmark 
Approved
The idmapper is a simple, but very useful app for Cytoscape that significantly enhances the functionality of Cytoscape for users of the GUI and the CyREST interface. My concerns about the article have been addressed by the authors and I ...
Doncheva N. Reviewer Report For: Identifier Mapping in Cytoscape [version 2; peer review: 3 approved]. F1000Research 2018, 7:725 (https://doi.org/10.5256/f1000research.16859.r36844)
Reviewer Report 07 Aug 2018
Ruth Isserlin, Donnelly Centre for Cellular and Biomolecular Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada 
Approved
This version addressed all my concerns.  The idmapper app ...
Isserlin R. Reviewer Report For: Identifier Mapping in Cytoscape [version 2; peer review: 3 approved]. F1000Research 2018, 7:725 (https://doi.org/10.5256/f1000research.16859.r36842)
Version 1
PUBLISHED 11 Jun 2018
Reviewer Report 19 Jul 2018
Nadezhda Doncheva, Novo Nordisk Foundation Center for Protein Research & Center for non-coding RNA in Technology and Health, University of Copenhagen, Copenhagen, Denmark 
Approved with Reservations
The paper describes idmapper, a simple yet very useful app for converting node identifiers from one data source to another. It is provided as part of the widely used network analysis and visualization software Cytoscape and the mapping functionality can ...
Doncheva N. Reviewer Report For: Identifier Mapping in Cytoscape [version 2; peer review: 3 approved]. F1000Research 2018, 7:725 (https://doi.org/10.5256/f1000research.16116.r34907)
  • Author Response 06 Aug 2018
    Alexander Pico, Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
    06 Aug 2018
    Author Response
    Nadya, thank you for the review. We have prepared a version 2 of the paper that addresses your main comments and corrections. A couple additional notes that are not reflected ... Continue reading
Reviewer Report 03 Jul 2018
Augustin Luna, cBio Center, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA 
Approved
This Cytoscape app provides functionality that is widely useful for Cytoscape users for converting network identifiers to different databases. Some points not completely clear:
  1. The list of regular expressions used for inference, do they come from
...
Luna A. Reviewer Report For: Identifier Mapping in Cytoscape [version 2; peer review: 3 approved]. F1000Research 2018, 7:725 (https://doi.org/10.5256/f1000research.16116.r34906)
  • Author Response 04 Jul 2018
    Alexander Pico, Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
    04 Jul 2018
    Author Response
    Augustin, thanks for the review.

    1. The regular expressions come from BridgeDb (which in turn gets them from identifiers.org). If there isn't a match to the regular expressions (or if more ... Continue reading
Reviewer Report 12 Jun 2018
Ruth Isserlin, Donnelly Centre for Cellular and Biomolecular Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada 
Approved with Reservations
The paper entitled "Identifier Mapping in Cytoscape: idmapper" by Adam Treister and Alexander Pico presents a new implementation of id mapping available directly through Cytoscape with no additional configuration.  

There is a lot of discussion throughout the ...
Isserlin R. Reviewer Report For: Identifier Mapping in Cytoscape [version 2; peer review: 3 approved]. F1000Research 2018, 7:725 (https://doi.org/10.5256/f1000research.16116.r34908)
  • Author Response 26 Jun 2018
    Alexander Pico, Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
    26 Jun 2018
    Author Response
    Ruth, thank you so much for your thorough review. These clarifications, fixes and additions have greatly improved the article.  Version 2 should be released soon, addressing all the issues you ... Continue reading