Method Article

AnalogExplorer2 – Stereochemistry sensitive graphical analysis of large analog series

[version 1; peer review: 2 approved]
PUBLISHED 09 Oct 2015

This article is included in the Cheminformatics gateway.

Abstract

AnalogExplorer is a computational methodology for the extraction and organization of series of structural analogs from compound data sets and their graphical analysis. The method is suitable for the analysis of large analog series originating from lead optimization programs. Herein we report AnalogExplorer2, which is designed to explicitly take stereochemical information into account during graphical analysis, and describe a freely available deposition of the original AnalogExplorer program, AnalogExplorer2, and exemplary compound sets to illustrate their use.

Keywords

Medicinal chemistry, analog series, computational design, graphical analysis, structure-activity relationships, open access software

Introduction

In medicinal chemistry, analog series are typically analyzed in R-group tables. Once analog series become so large that they are difficult to represent and study in conventional R-group tables, computational tools are indispensable for their exploration. Therefore, different computational methods have been introduced for graphical analysis of analog series [1–9]. Many of these approaches are based on the determination of maximum common substructures (MCS) of compound series and focus on substituents of common cores, while others employ the matched molecular pair [10] formalism to define analog series. Among these computational methods is AnalogExplorer [9], which has been designed to systematically organize and graphically analyze analog series and associated structure-activity relationship (SAR) information. AnalogExplorer initially identifies analog series on the basis of hierarchical molecular scaffolds [11] and then determines their MCS for further analysis. Accordingly, the method is not limited to the study of individual analog series but can also be applied to extract series from structurally heterogeneous compound sets. For example, AnalogExplorer is directly applicable to late-stage lead optimization sets that often contain multiple series with large numbers of analogs.

Herein we introduce AnalogExplorer2, an extension of the approach, which explicitly considers stereoisomers during graphical analysis, providing a detailed account of stereochemistry and its influence on SARs. AnalogExplorer2 is publicly available. Hence, we also report an open access deposition of the original AnalogExplorer program and AnalogExplorer2 as well as exemplary data sets assembled to illustrate the workflow of graphical analysis and help users become familiar with the program [12].

Methodology

Organization of analog series

AnalogExplorer systematically determines substitution sites or site combinations in analog series and divides series into subsets having varying R-groups at the same site(s). Analog series are initially identified on the basis of hierarchical molecular scaffolds [11]. For a series of analogs sharing the same scaffold, the maximum common substructure (MCS) is then determined, as illustrated in Figure 1, and used for R-group decomposition in order to index and identify all substitution sites and the respective R-groups for each compound in a series.


Figure 1. Maximum common substructure (MCS).

Shown are six structurally analogous compounds represented by their MCS (right) with five substitution sites (R1–R5). For each compound, the corresponding substituents are highlighted in blue.

On the basis of the MCS, an analog series is divided into subsets of compounds with varying R-groups at the same substitution site or site combination. The organization is compound-based such that each member of a series occurs in only one subset. Unique compound subsets provide the basis for graphical analysis, as discussed below. Further methodological details are provided in the original AnalogExplorer reference [9].
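The compound-based division into subsets can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: it assumes that R-group decomposition has already been carried out and that its result is available as a mapping from compound identifiers to the substituents found at each indexed site. The compound names, R-groups, and the function `group_by_site_combination` are hypothetical.

```python
from collections import defaultdict

# Hypothetical R-group decomposition result: compound ID -> {site index: R-group}.
# A site is listed only if the compound carries a substituent at that site.
decomposition = {
    "cpd1": {1: "CH3"},
    "cpd2": {1: "Cl"},
    "cpd3": {1: "CH3", 2: "OCH3"},
    "cpd4": {1: "F", 2: "OH"},
    "cpd5": {3: "NH2"},
}


def group_by_site_combination(decomposition):
    """Assign each compound to exactly one subset, keyed by its substituted sites."""
    subsets = defaultdict(list)
    for compound_id, r_groups in decomposition.items():
        key = tuple(sorted(r_groups))  # e.g. (1, 2) for substituents at R1 and R2
        subsets[key].append(compound_id)
    return dict(subsets)


subsets = group_by_site_combination(decomposition)
for sites, members in sorted(subsets.items()):
    print(sites, members)
```

Because the key is derived from the compound itself, every compound lands in exactly one subset, which mirrors the compound-based organization described above.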

Graphical components

AnalogExplorer consists of three graphical components. The complete graph (Figure 2a) captures all possible substitution sites and site combinations for a series following R-group decomposition (by design it is a directed acyclic graph). Each node represents a substitution site or site combination and all compounds with varying R-groups at the site(s). The root node 0 corresponds to a (hypothetical) compound with no R-group at any site. Node 1 represents analogs that only contain R-groups at R1, node 12 represents compounds with R-groups at R1 and R2, etc. Nodes are arranged in different layers. For example, layer 1 consists of all nodes with one substitution site and layer 2 of all nodes with two sites. Edges between nodes in adjacent layers indicate all possible subset relationships, i.e. an edge is drawn if the substitution site(s) represented by a node is a subset of the site combination of another node. As indicated in Figure 2a, nodes are scaled in size according to the number of analogs in the subset they represent and color-coded according to the mean potency of the analogs. In addition, node border thickness indicates the potency range covered by a subset. Furthermore, white (empty) nodes correspond to possible site combinations for which no analogs are currently available within a given series.

In the reduced graph, all empty nodes and connecting edges are removed for clarity (Figure 2b). Thus, the reduced graph provides a convenient format for the analysis of individual series.

As the third graphical component, R-group trees are provided for each substitution site and site combination, as illustrated in Figure 2c. In the R-group tree, substitution sites for a given subset are arranged in different layers, the order of which is determined by the number of unique R-groups at each site. All R-groups are displayed in the tree. Each leaf node represents an analog (colored according to its potency). Intermediate nodes represent subsets of analogs sharing the same substituents at corresponding site(s) (and are colored by mean analog potency).
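The construction of the complete and reduced graphs can be sketched in a few lines of Python. This is an illustrative sketch and not the JUNG-based implementation: the functions `build_complete_graph` and `reduce_graph`, the example potencies, and the decision to retain the root node in the reduced graph are assumptions made for this example.

```python
from itertools import combinations
from statistics import mean


def build_complete_graph(n_sites, subsets):
    """Enumerate every site combination as a node and link adjacent layers.

    subsets maps a sorted tuple of site indices to the potency values
    of the analogs that belong to that subset.
    """
    sites = range(1, n_sites + 1)
    nodes = {}
    for layer in range(n_sites + 1):  # layer 0 holds only the root node ()
        for combo in combinations(sites, layer):
            potencies = subsets.get(combo, [])
            nodes[combo] = {
                "layer": layer,
                "count": len(potencies),  # would drive node size
                "mean": mean(potencies) if potencies else None,  # node color
                "range": max(potencies) - min(potencies) if potencies else None,  # border
            }
    # Edge a -> b if b is one layer below a and contains all sites of a.
    edges = [
        (a, b)
        for a in nodes
        for b in nodes
        if len(b) == len(a) + 1 and set(a) <= set(b)
    ]
    return nodes, edges


def reduce_graph(nodes, edges):
    """Drop empty nodes and their edges (assumption: the root node is retained)."""
    kept = {n: attrs for n, attrs in nodes.items() if attrs["count"] > 0 or n == ()}
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges


# Hypothetical series with three substitution sites and four analogs (pKi values).
subsets = {(1,): [6.0, 7.0], (1, 2): [8.0], (3,): [5.5]}
nodes, edges = build_complete_graph(3, subsets)
reduced_nodes, reduced_edges = reduce_graph(nodes, edges)
print(len(nodes), len(edges))
print(sorted(reduced_nodes), reduced_edges)
```

Since edges only ever point from one layer to the next, the graph is acyclic by construction, in line with the description above.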


Figure 2. AnalogExplorer graphs.

(a) Shown is the complete graph for a series of 25 analogs active against the serotonin 7 receptor. (b) The reduced graph is obtained from (a) by removing all empty nodes and the edges between them. (c) The R-group tree for substitution site combination 1345 is shown. All R-groups are provided for individual tree nodes. Stereoisomers and their corresponding pKi values are given at the bottom. Abbreviation: OoM, order of magnitude.

Given its design, AnalogExplorer provides a systematic hierarchical organization of all possible substitution sites or site combinations for an analog series (complete graph) and enables the elucidation of SAR patterns within the hierarchy (reduced graph) and at further increased resolution for analog subsets (R-group trees). The approach is particularly suitable for the analysis of large analog series because subsets of such series associated with interesting SAR information can be selectively displayed and analyzed.
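As a rough illustration of the R-group tree for one analog subset, the following Python sketch builds a nested tree in which each layer corresponds to one substitution site. It is a simplification under stated assumptions: the source only says that the layer order is determined by the number of unique R-groups per site, so the ascending order used here is an assumption, and the compounds, R-groups, potencies, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical subset for site combination (1, 2):
# compound ID -> ({site index: R-group}, potency)
subset = {
    "cpd1": ({1: "CH3", 2: "OH"}, 6.0),
    "cpd2": ({1: "CH3", 2: "F"}, 7.0),
    "cpd3": ({1: "CH3", 2: "Cl"}, 8.0),
    "cpd4": ({1: "Et", 2: "OH"}, 5.0),
}


def site_order(subset):
    """Order sites by their number of unique R-groups (ascending is an assumption)."""
    sites = sorted(next(iter(subset.values()))[0])

    def n_unique(site):
        return len({r_groups[site] for r_groups, _ in subset.values()})

    return sorted(sites, key=n_unique)


def build_tree(subset, order):
    """Recursively split the subset by the R-group found at each site in turn."""
    node_mean = mean(potency for _, potency in subset.values())
    if not order:
        # Leaf node: the analog(s) carrying exactly this R-group combination.
        return {"compounds": sorted(subset), "mean": node_mean}
    site, rest = order[0], order[1:]
    children = {}
    for r_group in sorted({r_groups[site] for r_groups, _ in subset.values()}):
        members = {c: v for c, v in subset.items() if v[0][site] == r_group}
        children[r_group] = build_tree(members, rest)
    # Intermediate node: analogs sharing the substituents chosen so far.
    return {"site": site, "mean": node_mean, "children": children}


order = site_order(subset)
tree = build_tree(subset, order)
print(order, tree["mean"])
```

Here the intermediate node for R1 = CH3 carries the mean potency of the three analogs below it, while each leaf carries the potency of a single analog.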

Stereochemical information

The explicit consideration of stereochemistry during graphical analysis at the level of R-group trees is the major methodological enhancement of AnalogExplorer2 (in addition to further increased consistency of compound mapping to the MCS, considering intra-molecular symmetry). In the original R-group tree structure, nodes located in the same layer and originating from the same parent node are associated with distinct R-groups. Therefore, stereoisomers having the same substituents are combined into a single leaf node. Hence, if a terminal node is associated with more than one compound, stereoisomers are present. In AnalogExplorer2, stereoisomers are explicitly considered, as illustrated in Figure 2c. Each stereoisomer is represented by a single node, and stereoisomers belonging to the same subset (i.e. compounds with different stereochemistry at the same site) are identified by a unique index (e.g. ‘1’ for the three stereoisomers in Figure 2c). If different subsets of stereoisomers are present in an R-group tree, incremental indices are used to identify and distinguish them (e.g. ‘1’, ‘2’ etc.).
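The indexing of stereoisomer subsets can be sketched as follows in Python. The sketch assumes that a stereo-agnostic representation of the R-groups is already available for each compound (in practice this would come from a cheminformatics toolkit); the records, stereo labels, and the function `index_stereoisomers` are hypothetical and only illustrate the incremental indexing idea.

```python
from collections import defaultdict

# Hypothetical leaf-level records:
# compound ID -> (R-groups without stereo information, stereo descriptor)
compounds = {
    "cpd1": (("CH3", "C(C)O"), "R"),
    "cpd2": (("CH3", "C(C)O"), "S"),
    "cpd3": (("Cl", "C(C)N"), "R"),
    "cpd4": (("F", "OH"), None),
    "cpd5": (("Cl", "C(C)N"), "S"),
}


def index_stereoisomers(compounds):
    """Give each group of stereoisomers a shared, incremental index (1, 2, ...)."""
    groups = defaultdict(list)
    for compound_id, (flat_r_groups, _stereo) in compounds.items():
        groups[flat_r_groups].append(compound_id)

    indices = {}
    next_index = 1
    for members in groups.values():  # dicts preserve insertion order
        if len(members) > 1:  # same substituents, several compounds: stereoisomers
            for compound_id in members:
                indices[compound_id] = next_index
            next_index += 1
        else:
            indices[members[0]] = None  # no stereoisomer partner in this tree
    return indices


indices = index_stereoisomers(compounds)
print(indices)
```

Each compound remains its own leaf; only the shared index marks which leaves are stereoisomers of one another.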

Implementation

Routines for scaffold, analog, and MCS identification, R-group decomposition, and indexing of substitution sites are implemented in Java using the OpenEye OEChem toolkit version 2.0.2 (OpenEye Scientific Software; http://www.eyesopen.com). Therefore, this toolkit is required to execute the program. All graphical components of AnalogExplorer and AnalogExplorer2 are implemented using the open source Java package JUNG version 2.0.1 (http://jung.sourceforge.net/). Potential inconsistencies with subsequent versions of OEChem or JUNG can be avoided by using the specified versions.

Program use

The executable program takes standard SD files as input and generates complete or reduced graphs for all or individual series, depending on the user’s preference. The initial graph layout is produced by the DAGLayout algorithm of JUNG (http://jung.sourceforge.net/) and is usually modified interactively for graphical analysis. The number of compounds assigned to each node and their mean potency can be viewed by navigating the graph. R-group trees representing compound subsets are generated together with the complete or reduced graph. In each R-group tree, the substituents associated with individual nodes, compounds (leaf nodes), and corresponding potency values can also be viewed. Subsets of stereoisomers, if available, are depicted using numerical indices, as discussed. Furthermore, an output file is generated reporting compounds belonging to individual subsets.

Exemplary applications

AnalogExplorer2 can be used for different types of SAR analysis, as illustrated by a few exemplary applications. Compound data were taken from ChEMBL [13] version 20. Figure 3 displays the reduced graph for a series of 45 alpha-1a adrenergic receptor ligands with a total of seven substitution sites and the R-group tree for an exemplary three-site combination. The tree reveals a clear SAR pattern (with increasing potency of analogs from the bottom left to the right) and identifies six (uniquely indexed) pairs of stereoisomers among these analogs. Figure 4 provides a corresponding representation for a series of 64 matrix metalloproteinase 9 inhibitors and compares R-group trees for three substitution sites. It becomes apparent that substituents attached to R4 alone or in combination with other sites consistently yield compounds having only low potency. In Figure 5, target-specific reduced graphs are compared for a series of 32 analogs with a total of nine substitution sites and activity against two dipeptidyl peptidases (DPP4 and DPP8). The graphs generated for the same analog series display different compound potency distributions, reflecting a selectivity tendency for DPP4 over DPP8. Figure 6 shows reduced graphs for six different analog series with activity against the same kinase. The graphs reveal different structural content and a different degree of chemical exploration among these series as well as differences in the SAR information they provide. In Figure 7, two of these series are combined into a single new series by re-calculating the MCS. The combined series comprises 43 analogs with a total of eight substitution sites, and its reduced graph captures the structural organization and activity information of this combined series.
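The target-specific comparison described for Figure 5 amounts to annotating the same nodes with potencies against two targets. A minimal Python sketch of such a comparison is given below. It is not part of the published program: the potency values, the selection of nodes, and the function `mean_potency_difference` are hypothetical and only illustrate how a per-node selectivity tendency could be quantified.

```python
from statistics import mean

# Hypothetical pIC50 values of the same analogs against two targets,
# organized by node (tuple of substitution sites, e.g. (1, 7) for node 17).
potencies = {
    (1,): {"DPP4": [8.0, 7.0], "DPP8": [5.0, 6.0]},
    (1, 7): {"DPP4": [9.0], "DPP8": [6.5]},
    (2,): {"DPP4": [6.0, 6.0], "DPP8": [6.0, 6.0]},
}


def mean_potency_difference(potencies, target_a, target_b):
    """Per node: mean potency against target_a minus mean potency against target_b."""
    return {
        node: mean(values[target_a]) - mean(values[target_b])
        for node, values in potencies.items()
    }


delta = mean_potency_difference(potencies, "DPP4", "DPP8")
for node, difference in delta.items():
    # A positive difference (in log units) indicates a tendency toward DPP4.
    print(node, difference)
```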


Figure 3. Alpha-1a adrenergic receptor ligands.

At the top, the MCS for a series of 45 analogs active against the alpha-1a adrenergic receptor is shown. In addition, the corresponding reduced graph (middle, left) and R-group tree for substitution site combination 247 (middle, right) are displayed. At the bottom, six pairs of stereoisomers are shown. For each compound, its pKi value is given.


Figure 4. Matrix metalloproteinase 9 inhibitors.

Shown is the reduced graph for a series of 64 analogous inhibitors of matrix metalloproteinase 9. R-group trees of three substitution sites (nodes 3, 4 and 5; dashed box) are shown on the right. For each R-group tree, substituents are provided.


Figure 5. Analogs with multi-target activities.

Shown are reduced graphs for analog series with inhibitory activity against dipeptidyl peptidase IV (DPP4; left) and VIII (DPP8; right). Three representative compounds associated with node 17 (dashed box) are shown at the bottom. For each compound, the potency value (pIC50) for DPP4 and DPP8 is reported in a green and red box, respectively.


Figure 6. Multiple series of serine/threonine-protein kinase PIM2 inhibitors.

Shown are reduced graphs for six different series of PIM2 kinase inhibitors (capital letters A–F represent series identifiers). For each series, the corresponding MCS and reduced graph are shown.


Figure 7. Combined series.

Series A and E from Figure 6 are combined into a single series by determining their MCS, yielding eight substitution sites (left). The reduced graph of the combined series, which consists of 43 analogs, is shown on the right.

Software and data availability

The following tools and data sets are made publicly available without restrictions via a deposition on the ZENODO open access platform [12]. Three executable files of the original AnalogExplorer program are provided for different applications including the analysis of multiple analog series from a given compound set, analysis of an individual series, and selectivity analysis (according to Figure 5). With the exception of the OpenEye OEChem library, jar files of the required external libraries are also provided. In addition, all compound sets analyzed in the original publication [9] are deposited. These compound sets were taken from ChEMBL version 18. Furthermore, three executable files are made available for AnalogExplorer2 (for multiple analog series, individual series, and selectivity analysis) as well as the compound sets for which graph representations are reported herein. These compounds were taken from ChEMBL version 20. A “readme” document with detailed explanations is also provided as a part of the deposition.

Conclusions

The AnalogExplorer method was designed for the systematic organization and graphical analysis of large series of analogs, which frequently originate from lead optimization efforts. Herein, an extension of the methodology has been introduced. AnalogExplorer2 explicitly accounts for all stereoisomers during graphical analysis and SAR exploration. The AnalogExplorer2 program is made freely available to the scientific community.

Comments on this article (1)

Version 1, published 09 Oct 2015
  • Author Response (F1000Research Advisory Board Member), 04 Nov 2015
    Jürgen Bajorath, Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
    We thank both reviewers for their constructive comments and appreciate the positive feedback. We agree that a future release of this program should best make use of a publicly available ...
How to cite this article
Hu Y, Zhang B, Vogt M and Bajorath J. AnalogExplorer2 – Stereochemistry sensitive graphical analysis of large analog series [version 1; peer review: 2 approved]. F1000Research 2015, 4(Chem Inf Sci):1031 (https://doi.org/10.12688/f1000research.7146.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 04 Nov 2015
Carleton Sage, Arena Pharmaceuticals Inc, San Diego, CA, USA 
Approved
The authors provide an extension to AnalogExplorer incorporating stereochemistry, an important extension of the method. This method of analysing complicated compound data sets is sophisticated and easy to use. The approach to communicating results is intuitive both from the perspective ...
Sage C. Reviewer Report For: AnalogExplorer2 – Stereochemistry sensitive graphical analysis of large analog series [version 1; peer review: 2 approved]. F1000Research 2015, 4(Chem Inf Sci):1031 (https://doi.org/10.5256/f1000research.7699.r11071)
Reviewer Report 15 Oct 2015
Peter Ertl, Novartis Institutes for Biomedical Research, Basel, Switzerland 
Approved
Analysis of diverse molecular datasets is one of the most common tasks that a computational chemist in the pharmaceutical industry, but also in academia, faces daily. The tool presented in this manuscript, AnalogExplorer2, supports this process and makes it ...
Ertl P. Reviewer Report For: AnalogExplorer2 – Stereochemistry sensitive graphical analysis of large analog series [version 1; peer review: 2 approved]. F1000Research 2015, 4(Chem Inf Sci):1031 (https://doi.org/10.5256/f1000research.7699.r10815)