Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform

Open PHACTS is a pre-competitive project to answer scientific questions developed recently by the pharmaceutical industry. High-quality biological interaction information in the Open PHACTS Discovery Platform is needed to answer multiple pathway-related questions. To address this, updated WikiPathways data has been added to the platform. This data includes information about biological interactions, such as stimulation and inhibition. The platform's Application Programming Interface (API) was extended with appropriate calls to reference these interactions. These new methods of the Open PHACTS API are available now.

Amendments from Version 1
• The most significant change in this revision is the addition of two simple example workflows.
• Supplementary File 1 was added, which includes example result outputs for the two remaining API calls not shown in the main manuscript.

Introduction
Targeting proteins to ideally restore normal biological processes is a common starting point in drug discovery 1 . The Open PHACTS Discovery Platform (OPDP) was designed to help identify protein targets and information about their associations with each other 2-4 . The OPDP supports target identification and validation by including target-target interactions from WikiPathways 5-7 . Within these interaction networks, proteins that share a downstream path allow investigation of alternative drug target combinations. Even the knowledge of which biological pathways participate in disease-related processes provides insight into the pathway topology between the targets. The importance of, and need for, access to interaction information for real-world research questions was outlined in a recent Open PHACTS paper 8 .
The Open PHACTS project was born out of the desire to integrate pharmacological data from multiple precompetitive sources to efficiently address scientific questions that cannot be answered with single data sources 8 . It integrates data using linked data approaches 3 from chemical and biological sources such as ChEBI, ChEMBL, UniProt, and WikiPathways 6 . However, the OPDP did not previously include calls to access specific upstream and downstream interaction effects. This information is needed for questions related to drug repositioning and repurposing. Upstream or downstream targets may be interesting alternatives, with similar therapeutic effect, to targets for which it is particularly hard to develop a drug agent. Thus, finding a target that has already been drugged or is more drug tractable will be advantageous. Here we describe how to identify alternative targets in the same cellular pathway using the OPDP against the WikiPathways data.
The WikiPathways basic drawing tools also contain generic arrows and T-bar annotations that give the user the ability to create basic diagrams without the semantic meaning of MIM or SBGN notations. The interactions connecting these nodes are captured, but the only explicit information is that each is a directed interaction from a source to a target. To handle more complicated enzyme reaction drawings, where no single line directly connects targets in a cascade of enzymatic reactions, a query was developed that recognizes these types of reactions. However, this is not implemented in the current Open PHACTS Application Programming Interface (API).

Implementation
Version 2.1 of the OPDP API contains three new calls for interactions and their pathways. The first call, /pathway/getInteractions, returns all interactions involved in a pathway. To use this feature, the user specifies a pathway URI and OPDP returns its interactions including information about direction and the connected entities. The direction information is relayed as a starting node having a wp:source annotation, while the end of the interaction has the wp:target annotation. In its simplest form, this means that if gene product A is interacting with a gene product B, then we have wp:source for product A and wp:target for product B. However, the presented new methods also support interactions with multiple sources and targets for more complex interactions that are more accurately represented this way.
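The source and target annotations described above can be pictured with a small data model. The following Python sketch is illustrative only: the class name, field names, and the ex: URIs are hypothetical and are not taken from the OPDP API or the WikiPathways RDF. It shows the simple case (A acting on B) next to an interaction with more than one source.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interaction:
    """One directed interaction (hypothetical model, not the API schema)."""
    interaction_id: str
    sources: Tuple[str, ...]  # nodes carrying the wp:source annotation
    targets: Tuple[str, ...]  # nodes carrying the wp:target annotation


def as_edges(interaction):
    """Flatten an interaction into plain (source, target) pairs."""
    return [(s, t) for s in interaction.sources for t in interaction.targets]


# Simplest form: gene product A interacts with gene product B.
simple = Interaction("ex:interaction1", ("ex:geneProductA",), ("ex:geneProductB",))

# A more complex interaction with two sources and one target.
multi = Interaction(
    "ex:interaction2",
    ("ex:geneProductA", "ex:geneProductC"),
    ("ex:geneProductB",),
)
```

Flattening to pairs is a convenience for simple graph traversal; it discards the fact that the sources of a complex interaction act together, so the grouped form should be kept when that matters.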
The second added call, /pathways/interactions/byEntity, returns the interactions that involve a given entity, including their direction. An entity is specified by a URI and can be a metabolite, protein, gene product, or RNA. API options allow the user to select only upstream or only downstream interactions. If a direction is not specified in the call, all the adjacent interactions will be retrieved regardless of their direction. The results also specify the interaction type (e.g. inhibition, stimulation, conversion). Vocabularies.wikipathways.org also identifies catalysis and binding events as well as a more generic directedInteraction in the case where the type of the interaction is not identified. This ability to select the interaction direction is specifically what allows users to answer scientific questions around upstream and downstream effects, such as those defined by Open PHACTS. The third API call, /pathways/interactions/byEntity/count, is a helper function that returns the number of interactions for a target.
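The direction option can be illustrated locally, without calling the platform. The sketch below assumes that an "up" interaction is one in which the queried entity is a target and a "down" interaction is one in which it is a source; that reading, the record layout, and the ex: URIs are assumptions made for illustration, not the documented behaviour of the API.

```python
def select_interactions(interactions, entity, direction=None):
    """Return the interactions adjacent to `entity`.

    direction: "up", "down", or None for both (assumed semantics).
    """
    if direction not in (None, "up", "down"):
        raise ValueError("direction must be 'up', 'down', or None")
    selected = []
    for record in interactions:
        entity_is_target = entity in record["targets"]  # something acts on it
        entity_is_source = entity in record["sources"]  # it acts on something
        if direction == "up":
            keep = entity_is_target
        elif direction == "down":
            keep = entity_is_source
        else:
            keep = entity_is_target or entity_is_source
        if keep:
            selected.append(record)
    return selected


# Toy records with made-up identifiers and types.
toy_interactions = [
    {"id": "ex:i1", "type": "Stimulation",
     "sources": ["ex:upstreamNode"], "targets": ["ex:entity"]},
    {"id": "ex:i2", "type": "Inhibition",
     "sources": ["ex:entity"], "targets": ["ex:downstreamNode"]},
    {"id": "ex:i3", "type": "Conversion",
     "sources": ["ex:other1"], "targets": ["ex:other2"]},
]

upstream_ids = [r["id"] for r in select_interactions(toy_interactions, "ex:entity", "up")]
all_ids = [r["id"] for r in select_interactions(toy_interactions, "ex:entity")]
```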

Operation
The OPDP API calls are backed by SPARQL searches against the loaded WikiPathways RDF. The required and optional query parameters are given in the Open PHACTS documentation (https://dev.openphacts.org/docs/2.1). As in previous versions, the API uses HTTP GET to call methods and needs a (free) application ID and key (see https://dev.openphacts.org/signup) 3 .
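Because the methods are plain HTTP GET calls, a request is just a URL with query parameters. The following sketch only builds such a URL and sends nothing. The base URL, the parameter names (uri, app_id, app_key, _format), and the pathway URI pattern are assumptions for illustration and should be checked against the API documentation cited above.

```python
from urllib.parse import urlencode

# Assumed base URL for version 2.1 of the API; verify before use.
API_BASE = "https://beta.openphacts.org/2.1"


def build_call(method_path, uri, app_id, app_key, **options):
    """Build the GET URL for one API method (parameter names are assumed)."""
    params = {
        "uri": uri,
        "app_id": app_id,
        "app_key": app_key,
        "_format": "json",
    }
    params.update(options)  # any further optional parameters
    return f"{API_BASE}{method_path}?{urlencode(params)}"


# Hypothetical credentials and an assumed URI pattern for pathway WP1544.
example_url = build_call(
    "/pathway/getInteractions",
    "http://identifiers.org/wikipathways/WP1544",
    app_id="my-id",
    app_key="my-key",
)
```

The resulting string can be passed to any HTTP client, for example urllib.request.urlopen, once real credentials are supplied.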
To ensure multiple URI schemes can be used to identify genes, proteins, and metabolites, the Open PHACTS platform uses an Identifier Mapping Service (IMS) 6 . This ensures that users can supply Ensembl, NCBI Gene, and other identifiers for genes; UniProt, Ensembl, and others for proteins; and HMDB, ChEBI, CAS registry number, and PubChem for metabolites. Furthermore, it supports identifiers.org formatted URIs, further simplifying entering identifiers 13 .
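An identifiers.org formatted URI combines a namespace with a local identifier. The helper below is a minimal sketch (the function name is hypothetical) that reproduces the two AKT2 URIs quoted elsewhere in this article, one for NCBI Gene and one for Ensembl.

```python
def identifiers_org_uri(namespace, identifier):
    """Build an identifiers.org style URI from a namespace and a local ID."""
    namespace = namespace.strip()
    identifier = str(identifier).strip()
    if not namespace or not identifier:
        raise ValueError("namespace and identifier must be non-empty")
    return f"http://identifiers.org/{namespace}/{identifier}"


akt2_ncbigene = identifiers_org_uri("ncbigene", 208)
akt2_ensembl = identifiers_org_uri("ensembl", "ENSG00000105221")
```

Whether a given namespace can be resolved by the platform depends on the link sets loaded into the IMS.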

Example queries
We demonstrate the platform with three example calls. All API calls require an application ID and an application key, which can be acquired by creating a free Open PHACTS account. The first example applies the calls to the PI3K/AKT pathway for cell growth regulation, which contains important targets for cancer treatment 14 . The AKT protein has a central role and usefully shows the ability of the /pathways/interactions/byEntity and /pathway/getInteractions calls to return connected elements. The API calls can aid drug discovery by taking a target, in this case AKT, and easily identifying other connected proteins that could potentially be used as drug targets with a common downstream effect. Figure 1 shows the web interface of the API call that returns the connectivity of the AKT2 target to both upstream and downstream proteins or gene products. This method allows the user to identify connections to other targets in the pathway. The results of that API call (Figure 2) show the AKT2 interaction with microRNA. A helper method (Figure 3), /pathways/interactions/byEntity/count, is also included. It returns the number of interactions in which an entity participates. This helps the user get a sense of how prevalent the queried entity is in the interactions of pathways found on WikiPathways. An example result for this query can be found in Supplementary Figure 1.
The other call implemented, /pathway/getInteractions (Figure 4), returns all interactions in the MicroRNAs in cardiomyocyte hypertrophy pathway 15 . This pathway has interaction details for AKT, mTOR, and PI3K, which are all important targets in cancer research 16 . For each interaction, the participants are given along with whether it is a directed or undirected interaction. An example result for this query can be seen in Supplementary Figure 2.

Example workflows
To demonstrate the basic use of the introduced API methods, we developed two workflows, available in the Supplementary Material. One uses Python to return a file with the results in a table and the other uses an HTML webpage with the ops.js JavaScript client library 17 . More involved workflows have been developed for KNIME and Pipeline Pilot 18,19 .
The Python script example uses the Open PHACTS /pathway/getInteractions API call and prompts the user to enter the number of the WikiPathways pathway to query, such as 1544 for WikiPathways pathway WP1544. Invoking the API call with the pathway identifier returns information about the directed interactions in the pathway: the interaction ID used by WikiPathways, the interaction type, and URIs for the source and target of the interaction. To convert the URIs into something more readable, a SPARQL query is then executed against the WikiPathways SPARQL endpoint to get labels for the source and target of the interaction. The results are written to a file with the interaction ID, interaction type, URIs for the source and target, as well as alias IDs, the curl command for the API call, the pathway ID used, and the number of interactions returned.
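The label lookup and table output steps can be sketched as follows. This is not the script from the Supplementary Material. It assumes that node labels are exposed through rdfs:label in the WikiPathways RDF, uses hypothetical example.org URIs and column names, and writes to an in-memory buffer instead of querying the endpoint or creating a file.

```python
import csv
import io


def build_label_query(uris):
    """Build a SPARQL query asking for the label of each URI (rdfs:label assumed)."""
    values = " ".join(f"<{uri}>" for uri in uris)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?node ?label WHERE {\n"
        f"  VALUES ?node {{ {values} }}\n"
        "  ?node rdfs:label ?label .\n"
        "}"
    )


def write_table(interactions, labels, handle):
    """Write one tab-separated row per interaction, adding labels where known."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["interaction", "type", "source", "source_label",
                     "target", "target_label"])
    for record in interactions:
        writer.writerow([
            record["id"], record["type"],
            record["source"], labels.get(record["source"], ""),
            record["target"], labels.get(record["target"], ""),
        ])


# Toy data standing in for an API response and a SPARQL result.
rows = [{"id": "http://example.org/int1", "type": "Inhibition",
         "source": "http://example.org/A", "target": "http://example.org/B"}]
labels = {"http://example.org/A": "gene A", "http://example.org/B": "gene B"}

query = build_label_query(sorted(labels))
buffer = io.StringIO()
write_table(rows, labels, buffer)
```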
The second example uses an HTML5 webpage and the ops.js JavaScript client library to retrieve interactions for a particular gene, using the URI for the gene's Ensembl identifier and the /pathways/interactions/byEntity API method. The ops.js library passes the returned JSON with interaction information to a callback function, where the interacting source and target are extracted and the interacting entity determined. For each interacting entity, which may be a protein, RNA, or small compound, a call to the /pathways/interactions/byEntity/count method is made to return the number of interactions that entity has.
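The logic of that callback can be outlined in Python, the language used for the other sketches here, although the actual workflow is written in JavaScript. The record layout, the ex: URIs, and the count_lookup stand-in for the /pathways/interactions/byEntity/count call are all hypothetical.

```python
def interacting_partners(query_uri, interactions):
    """Return the entity on the other side of each interaction with query_uri."""
    partners = set()
    for record in interactions:
        if record["source"] == query_uri:
            partners.add(record["target"])
        elif record["target"] == query_uri:
            partners.add(record["source"])
    return sorted(partners)


def count_for_each(partners, count_lookup):
    """Call count_lookup once per partner, as the workflow does with the count method."""
    return {partner: count_lookup(partner) for partner in partners}


# Toy interactions; a real workflow would take these from the API response.
toy = [
    {"source": "ex:gene", "target": "ex:proteinX"},
    {"source": "ex:rnaY", "target": "ex:gene"},
    {"source": "ex:rnaY", "target": "ex:proteinX"},
]


def toy_count(uri):
    """Stand-in for the remote count call: counts occurrences in the toy data."""
    return sum(uri in (record["source"], record["target"]) for record in toy)


partners = interacting_partners("ex:gene", toy)
counts = count_for_each(partners, toy_count)
```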

Summary
While the calls identified here are simple calls, workflow tools make it possible to take advantage of the integrative nature of the OPDP to make API calls in succession. Two such workflow tools that work with the OPDP are KNIME and Pipeline Pilot. With these tools, it is possible to perform a directional query of a target and identify alternative targets that can then be queried against the chemistry calls to identify active compounds for these alternative targets. The client libraries ops.js, ops4j, and ropenphacts also support Open PHACTS and the interaction calls for pathways. This allows users to perform API calls to the OPDP using their preferred language or platform, such as JavaScript, Java, or R.
The addition of interactions with direction information allows the OPDP to answer more of the pre-defined scientific questions 2 . The directional information allows the user to explore how proteins and gene products are connected with one another and to easily access this information. This is illustrated in the example queries using the cancer target AKT.
Some comments: In the first example, the query for AKT2 in Figure 1 uses http://identifiers.org/ncbigene/208 but the result in Figure 2 shows http://identifiers.org/ensembl/ENSG00000105221. I guess the API should be able to accept different commonly used identifiers; however, neither the article nor the web interface mentions the acceptable identifiers. In addition, it would be nice if the input of the query accepted simple identifiers, instead of requiring the full URI to be constructed.
The result in figure 2 is difficult to interpret. It would be nice to expand the query to get the basic information for those interacting entities. For example, the gene symbol or the compound name.
The examples in figure 3 and figure 4 didn't show the query results.
Simple queries in the examples may be difficult to show the usefulness of the new implemented APIs. A more sophisticated application which contains either bulk query or pipeline query should be helpful for the readers to understand the demand of using the APIs.

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 11 Oct 2018
Ryan Miller, Maastricht University, The Netherlands
1. The Open PHACTS API includes the Identifier Mapping Service component, which allows use of many identifier schemes, as long as suitable link sets are available (at http://data.openphacts.org/1.5/ims/linksets/). These sets include many of the popular gene, protein, and metabolite identifiers: for genes, proteins, and RNAs, Ensembl, Entrez Gene, UniProt, and others; for metabolites, the ID sources can include HMDB, ChEBI, PubChem, CAS registry numbers, and ChemSpider.
2. Retrieving specific gene symbols or compound names, in addition to the identifier URI, resource, and identifier, can be part of a workflow in workflow tools like KNIME or Pipeline Pilot. We did create a simple workflow to address this. In this example, we use the /pathway/getInteractions API call to return the directed interactions for a pathway and the resulting IDs for the interactions, and then return the more human-readable labels for the URIs.
3. Additional example output figures were added to the supplementary materials section to reflect the addition of example query results for the remaining calls.

I assume that when "up" or "down" interactions are not specified for return, then both are returned. Is this correct?
Are the example interactions (i.e. inhibition, stimulation, conversion) the only ones available in the API? If not, is there a listing of the interaction types with a description of each interaction type?
It would be clearer to just state the call function name rather than "the first and third calls" and "the api call" in the example queries section.
Additional examples of the result output for all described function calls would be helpful.
Additional examples that highlight the integrative nature of Open PHACTS would be nice. For example, to show how results of the currently described API calls can be paired with the other data available, such as the ChEBI data from the platform.

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

responsible for the drawing and graphical elements. The WPRDF side is there to describe the semantic elements of the RDF. We have updated the Methods section and clarified this in the first "Implementation" paragraph.
The API is REST-like and dedicated clients are not required. However, a few client libraries are already available for various languages and platforms: JavaScript (ops.js), Java (ops4j), ropenphacts, and KNIME nodes. Personal experience with auto-generated packages has been negative, but based on code generation in general, not the Open PHACTS API. Supported libraries are referenced in the summary sections along with the workflow environments.
We intended to clarify that this simple example referred to single-source, single-target interactions, but that the framework also supports interactions with more than one source and/or target. Such complex interactions occur frequently. The last sentence of the third paragraph under Methods: Implementation has been updated to make this more clear.
The Open PHACTS API includes the Identifier Mapping Service component, which allows use of many identifier schemes, as long as suitable link sets are available (at http://data.openphacts.org/1.5/ims/linksets/). These sets include many of the popular gene, protein, and metabolite identifiers: for genes, proteins, and RNAs, Ensembl, Entrez Gene, UniProt, and others; for metabolites, the ID sources can include HMDB, ChEBI, PubChem, CAS registry numbers, and ChemSpider. The caption of Figure 1 has been extended and a paragraph added to the Operation section to reflect this.
Yes, you are correct: if the user does not specify the directions for the interactions, all the immediate interactions are retrieved regardless of their direction. This can be found in the last paragraph of the Methods section and updated here to clarify.
Vocabularies.wikipathways.org also identifies catalysis and binding events as well as a more generic directedInteraction in the case where the type of the interaction is not identified. This can be found in the last paragraph of the Methods: Implementation.
The text under the Example Queries section has been updated to use the names of the API calls used.
Additional figures that show example outputs for the remaining figures can be found in the supplemental info section.
Workflow tools make it possible to take advantage of the integrative nature of the OPDP to make API calls in succession. With these tools, it is possible to perform a directional query of a target and identify alternative targets that can then be queried against the chemistry calls to identify active compounds for these alternative targets. An Example Workflows section has been added describing two simple workflows and the code for the workflows can be found in the supplementary information.

Competing Interests: No competing interests were disclosed.