Data Note

Comprehensive knowledge base of two- and three-dimensional activity cliffs for medicinal and computational chemistry

[version 1; peer review: 3 approved]
PUBLISHED 25 Jun 2015

This article is included in the Cheminformatics gateway.

This article is included in the Data: Use and Reuse collection.

Abstract

Activity cliffs are formed by pairs or groups of structurally similar or analogous active compounds with large differences in potency. They can be defined in two or three dimensions by comparing graph-based molecular representations or compound binding modes, respectively. Through systematic analysis of publicly available compound activity data and ligand-target X-ray structures, we have determined, in a series of studies, all currently available two- and three-dimensional activity cliffs (2D- and 3D-cliffs, respectively). Furthermore, we have systematically searched for 2D extensions of 3D-cliffs. Herein, we specify the different categories of activity cliffs we have explored and introduce an open access data deposition in ZENODO (doi: 10.5281/zenodo.18490) that makes the entire knowledge base of current activity cliffs freely available in an organized form.

Keywords

Active compounds, X-ray structures, activity cliffs, data mining, structure-activity relationships, computational methods

Introduction

The activity cliff concept has attracted increasing interest in chemical informatics and medicinal chemistry [1–5]. A consensus definition of activity cliffs [1–4] refers to pairs or groups of structurally similar or analogous active compounds with large differences in potency [4,5]. Defining activity cliffs therefore requires the specification of similarity and potency difference criteria. Two-dimensional activity cliffs (2D-cliffs) have mostly been defined on the basis of Tanimoto similarity [6], calculated by comparing molecular fingerprint representations [2]. More recently, 2D-cliffs have also been defined on the basis of substructure relationships, preferably employing the matched molecular pair (MMP) formalism [7,8], leading to the introduction of MMP-cliffs [9]. An MMP is defined as a pair of compounds that are distinguished only by a structural change at a single site [7], i.e., the exchange of a substructure, termed a chemical transformation [8]. For the definition of MMP-cliffs, transformation size restrictions have been introduced to limit transformations to small chemical changes typically observed in analog series [9]. By applying well-defined similarity and potency difference criteria, 2D-cliffs can be systematically extracted from compound databases [10].
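The MMP idea can be illustrated with a minimal Python sketch. It assumes that each compound has already been fragmented at a single site into (core, substituent) pairs by an external tool; the function name `find_mmps`, the fragment strings, and the optional `max_sub_size` limit are hypothetical placeholders and do not reproduce the published transformation size restrictions.

```python
from collections import defaultdict
from itertools import combinations


def find_mmps(fragmentations, max_sub_size=None):
    """Find matched molecular pairs from precomputed single-site fragmentations.

    fragmentations maps a compound ID to a list of (core, substituent, size)
    tuples, where size is the heavy-atom count of the substituent.
    Two compounds form an MMP if they share a core and differ in the substituent.
    """
    # Index compounds by core so that pairs sharing a core are found directly.
    by_core = defaultdict(list)
    for cpd, frags in fragmentations.items():
        for core, sub, size in frags:
            # Optional (illustrative) size restriction on the exchanged fragment.
            if max_sub_size is None or size <= max_sub_size:
                by_core[core].append((cpd, sub))

    pairs = set()
    for core, members in by_core.items():
        for (c1, s1), (c2, s2) in combinations(members, 2):
            if c1 != c2 and s1 != s2:
                a, b = sorted([(c1, s1), (c2, s2)])
                pairs.add((a[0], b[0], core, a[1], b[1]))
    return sorted(pairs)


# Hypothetical toy input: cpd_A and cpd_B share a core, cpd_C does not.
example = {
    "cpd_A": [("c1ccccc1[*]", "[*]Cl", 1)],
    "cpd_B": [("c1ccccc1[*]", "[*]OC", 2)],
    "cpd_C": [("C1CCCCC1[*]", "[*]Cl", 1)],
}
mmps = find_mmps(example)
```

Indexing by core avoids comparing every compound with every other compound; an MMP-cliff would additionally require that the paired compounds meet the potency difference criterion.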

The vast majority of 2D-cliffs (close to or more than 95%, depending on the molecular representation and similarity measure used) are not formed in isolation (i.e., in the absence of structural neighbors with significant potency variations), but rather in a coordinated manner involving series of compounds with varying potency that form multiple overlapping cliffs [4,5,11]. In activity cliff network representations, in which nodes represent compounds and edges represent activity cliffs, coordinated cliffs emerge as individual clusters of varying composition and size [11], which can be isolated for further analysis.
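As a sketch of how clusters can be isolated from such a network, the following Python function treats each activity cliff as an undirected edge and returns the connected components. The function name and the toy compound labels are assumptions for illustration; the default minimum cluster size of three compounds follows the cluster definition used for Table 1.

```python
from collections import defaultdict


def cliff_clusters(cliffs, min_size=3):
    """Return activity cliff clusters as sets of compound IDs.

    cliffs is an iterable of (compound_a, compound_b) pairs, one per cliff.
    A cluster is a connected component of the cliff network.
    """
    neighbors = defaultdict(set)
    for a, b in cliffs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, clusters = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first traversal collects all compounds reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(component)
    return clusters


# Toy network: A-B-C form a coordinated cluster, D-E an isolated cliff.
toy_cliffs = [("A", "B"), ("B", "C"), ("D", "E")]
clusters = cliff_clusters(toy_cliffs)
```

With the default threshold, the isolated cliff D-E is not reported as a cluster.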

In addition to 2D-cliffs, three-dimensional activity cliffs (3D-cliffs) can also be defined by comparing compound binding modes in X-ray structures [12]. This requires the superposition of structures of a given target available in different crystallographic ligand-target complexes and the assessment of the 3D similarity of bound ligands [12]. Three-dimensional activity cliffs can be further extended by taking 2D ligand information into account. This can be accomplished by systematically searching compound activity classes for analogs of 3D-cliff partners [13]. For example, for each cliff partner, MMPs with database compounds sharing the same activity can be determined and qualifying analogs can be assigned to the 3D-cliff [13], leading to what we term herein a 3D-cliff-MMP extension. Figure 1 shows an exemplary 2D-cliff (MMP-cliff), activity cliff cluster, 3D-cliff, and 3D-cliff-MMP extension.

Figure 1. Exemplary 2D- and 3D-cliffs.

Different categories of activity cliffs formed by inhibitors of tyrosine kinase ABL are shown. MMP-cliffs are used to represent 2D-cliffs. For each compound, the ChEMBL or Protein Data Bank (PDB) ID and its negative logarithmic potency value are reported. (a) An exemplary MMP-cliff (structural modification highlighted in red) taken from an activity cliff cluster (dashed blue box) is shown. In an activity cliff network, nodes represent compounds and edges represent cliffs. Nodes are colored according to potency values using a continuous color spectrum from red (lowest potency) via yellow (intermediate) to green (highest potency). In network representations, coordinated activity cliffs emerge as clusters. (b) An exemplary 3D-cliff and its 2D extension (3D-cliff-MMP) are shown. The extension results from MMP-based mapping of analogs from ChEMBL to 3D-cliff compounds. Structural differences between 3D-cliff compounds and their 2D (MMP) partners are highlighted in red.

In medicinal chemistry, 2D-cliffs are often considered in the context of structure-activity relationship (SAR) analysis and compound design [2,3]. For SAR exploration, activity cliff clusters are of particular interest because they provide more SAR information than 2D-cliffs studied individually. Furthermore, 3D-cliffs are of prime interest for structure-based design and for computational chemistry applications including, for example, the calibration of scoring functions or free energy (perturbation) calculations. Finally, 2D extensions of 3D-cliffs bridge different applications in medicinal and computational chemistry and help to identify candidate compounds for further analysis.

Methods and materials

The activity cliff information provided herein is the result of recent surveys and systematic analyses of 2D-cliffs including clusters [14], 3D-cliffs [15], and extensions of 3D-cliffs [16]. Table 1 summarizes the different activity cliff categories. All 2D-cliffs reported herein originated from the most recent release of ChEMBL (version 20) [17,18] and all 3D-cliffs from the Protein Data Bank (PDB; accessed December 2014) [19]. MMP extensions of 3D-cliffs were identified in ChEMBL (version 19).

Table 1. Activity cliff categories and statistics.

Reported are the number of 2D-cliffs belonging to three different categories (FP, fingerprint; Tc, Tanimoto coefficient), corresponding activity cliff clusters (comprising at least three compounds), 3D-cliffs for different potency measurement-dependent data sets, and corresponding 3D-cliff-MMPs (giving the total number of MMPs detected for 3D-cliffs from each data set).

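Dimension | Category | Type / data set | #Cliffs/clusters | #Targets
2D (Ki) | FP Tc-based, MACCS | Cliffs | 34,813 | 320
2D (Ki) | FP Tc-based, MACCS | Cliff clusters | 1402 |
2D (Ki) | FP Tc-based, ECFP4 | Cliffs | 31,975 | 339
2D (Ki) | FP Tc-based, ECFP4 | Cliff clusters | 1462 |
2D (Ki) | MMP | Cliffs | 17,111 | 301
2D (Ki) | MMP | Cliff clusters | 1267 |
3D | 3D-cliffs | Ki | 236 | 26
3D | 3D-cliffs | IC50 | 292 | 43
3D | 3D-cliffs | Ki/IC50 | 595 | 58
3D | 3D-cliff-MMPs | Ki | 1017 | 18
3D | 3D-cliff-MMPs | IC50 | 1113 | 35
3D | 3D-cliff-MMPs | Ki/IC50 | 2608 | 48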

For all activity cliffs, a potency difference of at least 100-fold between cliff partners was consistently required. For 2D-cliffs, only (assay-independent) Ki values were considered as potency measurements. For 3D-cliffs, Ki and IC50 measurements were considered separately (using a Ki and an IC50 value-based data set, respectively). In addition, 3D-cliffs were also determined in a combined Ki/IC50 data set (taking into consideration that 3D-cliffs provide a much smaller knowledge base than 2D-cliffs; vide infra). For 3D-cliff analysis, a crystallographic resolution limit of 3.0 Å was applied.
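The potency difference criterion can be written down in a few lines of Python. This is only a sketch: the function names are hypothetical, and potency values are assumed to be given in the same concentration unit (for example, nM). A 100-fold difference corresponds to a difference of two units on the negative logarithmic (pKi or pIC50) scale.

```python
import math


def fold_difference(potency_a, potency_b):
    """Fold difference between two potency values (same unit, both > 0)."""
    return max(potency_a, potency_b) / min(potency_a, potency_b)


def log_difference(potency_a, potency_b):
    """Absolute difference in negative logarithmic potency (e.g., delta pKi)."""
    return abs(math.log10(potency_a) - math.log10(potency_b))


def meets_potency_criterion(potency_a, potency_b, min_fold=100.0):
    """True if the pair differs in potency by at least min_fold."""
    return fold_difference(potency_a, potency_b) >= min_fold
```

For example, `meets_potency_criterion(1.0, 100.0)` holds for a 1 nM and a 100 nM compound, whereas a 1 nM and a 99 nM compound would not qualify.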

Two-dimensional activity cliffs were determined using three different molecular representations: the extended connectivity fingerprint with bond diameter 4 (ECFP4) [20], molecular access system (MACCS) structural keys [21], and transformation size-restricted MMPs (MMP-cliffs) [9]. As similarity criteria for ECFP4- and MACCS-based activity cliffs, Tanimoto coefficient threshold values of 0.55 and 0.85 were applied, respectively [2,14]. For an MMP-cliff, our preferred 2D-cliff definition [3,4], the formation of a transformation size-restricted MMP served as the similarity criterion. By definition, 2D-cliffs do not contain stereochemical information.
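A minimal Python sketch of the fingerprint-based similarity criterion follows. It assumes that fingerprints are supplied as sets of "on" bit positions computed elsewhere (for example, with a cheminformatics toolkit); the function names and the toy bit sets are hypothetical, while the threshold values are those stated above.

```python
# Tanimoto coefficient thresholds stated in the text for each fingerprint.
TC_THRESHOLDS = {"ECFP4": 0.55, "MACCS": 0.85}


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def meets_similarity_criterion(fp_type, bits_a, bits_b):
    """True if the pair reaches the threshold for the given fingerprint type."""
    return tanimoto(bits_a, bits_b) >= TC_THRESHOLDS[fp_type]


# Toy fingerprints sharing 4 of 5 on-bits (Tc = 0.8).
fp_a = {1, 2, 3, 4}
fp_b = {1, 2, 3, 4, 5}
```

The toy pair would qualify under the ECFP4 threshold but not under the stricter MACCS threshold, which illustrates why the two representations yield different cliff populations.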

For the identification of 3D-cliffs, the normalized overlap of atomic property density functions calculated for a pair of bound ligands was used as a measure of 3D similarity, taking conformational, positional, and atomic property differences into account [12]. A calculated 3D similarity of at least 80% was required as a threshold for 3D-cliff formation [12,15].

Data description

Activity cliff statistics are reported in Table 1. A total of 17,111 MMP-cliffs, 31,975 ECFP4-based 2D-cliffs, and 34,813 MACCS-based 2D-cliffs were identified, formed in each case by compounds active against more than 300 targets. The corresponding numbers of activity cliff clusters (comprising at least three compounds) were 1267, 1462, and 1402, respectively. Therefore, a very large knowledge base of well-defined 2D-cliffs is currently available. In addition, on the basis of Ki and IC50 measurements, 236 and 292 3D-cliffs were detected, formed by crystallographic ligands of 26 and 43 targets, respectively. The combined Ki/IC50 data set yielded 595 3D-cliffs for 58 targets. Although many more 2D- than 3D-cliffs are currently available, as one would expect, the number of 3D-cliffs is larger than we anticipated and hence provides substantial opportunities for structural and computational studies. Table 2 provides details for the 61 different targets for which 3D-cliffs were detected. Furthermore, more than 1000 3D-cliff-MMPs were identified for each of the Ki and IC50 data sets and 2608 for the combined set (Table 1). Hence, for many 3D-cliffs, active analogs are available whose SAR characteristics and possible interaction patterns can be explored on the basis of 3D-cliff information, for example, by superposing them onto the 3D-cliff compounds with which they form transformation size-restricted MMPs.

Table 2. Targets with available 3D-cliffs.

A total of 61 targets are listed for which 3D-cliffs were detected. For each target, the ChEMBL ID, UniProt accession ID (UniProtID) [22], and the number of available 3D-cliffs are reported. 3D-cliffs were separately determined on the basis of only Ki or IC50 measurements (available for active compounds) as well as for the combined Ki and IC50 data set (Ki/IC50). In addition, the number of MMP-cliffs (if available; defined on the basis of Ki values) is also reported for each target.

Target ID (ChEMBL) | UniProtID | Target name | #3D-cliffs (Ki, IC50, Ki/IC50), #MMP-cliffs (Ki)
202 | P00374 | Dihydrofolate reductase | 0350
204 | P00734 | Thrombin | 1582166296
205 | P00918 | Carbonic anhydrase II | 8624397
206 | P03372 | Estrogen receptor alpha | 32103
235 | P37231 | Peroxisome proliferator-activated receptor gamma | 01224
239 | Q07869 | Peroxisome proliferator-activated receptor alpha | 0010
242 | Q92731 | Estrogen receptor beta | 10210
244 | P00742 | Coagulation factor X | 211281130
260 | Q16539 | MAP kinase p38 alpha | 0111368
262 | P49841 | Glycogen synthase kinase-3 beta | 10211
267 | P12931 | Tyrosine-protein kinase SRC | 02220
275 | Q07343 | Phosphodiesterase 4B | 0220
279 | P35968 | Vascular endothelial growth factor receptor 2 | 12412
280 | P45452 | Matrix metalloproteinase 13 | 02131
283 | P08254 | Matrix metalloproteinase 3 | 00218
284 | P27487 | Dipeptidyl peptidase IV | 04540
286 | P00797 | Renin | 0877
288 | Q08499 | Phosphodiesterase 4D | 0220
301 | P24941 | Cyclin-dependent kinase 2 | 326417
335 | P18031 | Protein-tyrosine phosphatase 1B | 13820
1782 | P14324 | Farnesyl diphosphate synthase | 0223
1808 | P12821 | Angiotensin-converting enzyme | 20219
1827 | O76074 | Phosphodiesterase 5A | 0112
1862 | P00519 | Tyrosine-protein kinase ABL | 02018
1871 | P10275 | Androgen Receptor | 10015
1892 | Q04609 | Glutamate carboxypeptidase II | 20242
1918 | P39086 | Glutamate receptor ionotropic kainate 1 | 30343
1966 | Q02127 | Dihydroorotate dehydrogenase | 1142
2147 | P11309 | Serine/threonine-protein kinase PIM1 | 07721
2179 | P04062 | Beta-glucocerebrosidase | 1018
2288 | Q13526 | Peptidyl-prolyl cis-trans isomerase NIMA-interacting 1 | 03426
2308 | Q9UKM7 | Endoplasmic reticulum mannosyl-oligosaccharide 1,2-alpha-mannosidase | 0117
2360 | P00492 | Hypoxanthine-guanine phosphoribosyltransferase | 1017
2524 | P06280 | Alpha-galactosidase A | 3030
2527 | O96017 | Serine/threonine-protein kinase Chk2 | 012120
2534 | O15530 | 3-phosphoinositide dependent protein kinase-1 | 0450
2835 | P23458 | Tyrosine-protein kinase JAK1 | 1017
3267 | P48736 | PI3-kinase p110-gamma subunit | 0028
3286 | P00749 | Urokinase-type plasminogen activator | 72126
3587 | Q02750 | Dual specificity mitogen-activated protein kinase kinase 1 | 011110
3589 | P55263 | Adenosine kinase | 0111
3717 | P08581 | Hepatocyte growth factor receptor | 03530
3835 | P51955 | Serine/threonine-protein kinase NEK2 | 0440
3880 | P07900 | Heat shock protein HSP 90-alpha | 039430
3922 | P50579 | Methionine aminopeptidase 2 | 04433
3959 | P16083 | Quinone reductase 2 | 0120
3975 | P09467 | Fructose-1,6-bisphosphatase | 0110
3991 | P08709 | Coagulation factor VII | 4042
4073 | P09237 | Matrix metalloproteinase 7 | 1013
4393 | P39900 | Matrix metalloproteinase 12 | 1024
4439 | P36897 | TGF-beta receptor type I | 010187
4581 | P52732 | Kinesin-like protein 1 | 043110
4588 | P22894 | Matrix metalloproteinase 8 | 13313
4617 | P11086 | Phenylethanolamine N-methyltransferase | 60667
4618 | P09960 | Leukotriene A4 hydrolase | 032370
4630 | O14757 | Serine/threonine-protein kinase Chk1 | 0881
4722 | O14965 | Serine/threonine-protein kinase Aurora-A | 0441
4822 | P56817 | Beta-secretase 1 | 2575772
5147 | P54760 | Ephrin type-B receptor 4 | 0440
5879 | O60760 | Hematopoietic prostaglandin D synthase | 0330
1795117 | Q8TEK3 | Histone-lysine N-methyltransferase, H3 lysine-79 specific | 20232

Data availability

The activity cliff information described above is made freely available in four separate data files containing 2D-cliffs, 3D-cliffs, 3D-cliff-MMP extensions, and superpositions of complex X-ray structures and 3D ligands for selected targets:

  • (1) 2D-Cliffs_and_Cliff-Clusters.xlsx (Excel format): 2D-cliffs and clusters belonging to different categories are separately recorded using ChEMBL IDs.

  • (2) 3D-Cliffs.xlsx (Excel format): 3D-cliffs from the Ki, IC50, and Ki/IC50 data sets are separately provided using PDB IDs for compounds and UniProt [22] IDs for targets.

  • (3) 3D-Cliff_Extension.xlsx (Excel format): Analogs of 3D-cliff compounds identified by MMP search are reported. For each of the three data sets, all 3D-cliff-MMPs are listed.

  • (4) Superpositions.zip (all files in MOL2 format): For each target in Table 2, superpositions of complex X-ray structures and 3D ligands are provided.

These data sets are contained in an open access ZENODO deposition [23]. The deposition also contains a README document that details the data organization and information provided.
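As a starting point for working with the superposition files, the following Python sketch reads atom records from MOL2-formatted text and computes the geometric center of the parsed atoms. It relies only on the standard layout of the `@<TRIPOS>ATOM` section (atom ID, atom name, x, y, z, atom type); the function names are hypothetical, and the internal file names and organization of Superpositions.zip are not assumed here (the README in the deposition describes them).

```python
def read_mol2_atoms(text):
    """Return (atom_name, x, y, z, atom_type) tuples from MOL2 text.

    Atoms from all @<TRIPOS>ATOM sections are collected, so a multi-molecule
    file yields one combined list.
    """
    atoms, in_atoms = [], False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@<TRIPOS>"):
            # Only lines inside an ATOM section are parsed as atom records.
            in_atoms = line == "@<TRIPOS>ATOM"
            continue
        if in_atoms and line:
            fields = line.split()
            x, y, z = (float(v) for v in fields[2:5])
            atoms.append((fields[1], x, y, z, fields[5]))
    return atoms


def centroid(atoms):
    """Geometric center of a non-empty list of parsed atoms."""
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in (1, 2, 3))


# Small hand-written MOL2 fragment used only to exercise the parser.
SAMPLE = """@<TRIPOS>MOLECULE
ligand
 2 1 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 C1          0.0000    0.0000    0.0000 C.3     1  LIG  0.0000
      2 O1          2.0000    0.0000    0.0000 O.3     1  LIG  0.0000
@<TRIPOS>BOND
     1     1     2    1
"""
sample_atoms = read_mol2_atoms(SAMPLE)
```

Because the deposited ligands are already superposed, coordinates parsed in this way for two cliff partners of the same target should be directly comparable without further alignment.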

Conclusions

In this study, we have discussed different categories of activity cliffs (including cliff extensions) and reported the distribution of cliffs belonging to these categories. Given the cliff definitions applied herein, the activity cliff information we provide as an open access deposition is up to date and comprehensive. We hope that this large knowledge base of activity cliffs will be helpful in the practice of medicinal chemistry and structure-based drug design as well as in further evaluating and advancing computational methods.

Data availability

ZENODO: Knowledge base of two- and three-dimensional activity cliffs, doi: 10.5281/zenodo.18490 [23]

how to cite this article
Hu Y, Furtmann N, Stumpfe D and Bajorath J. Comprehensive knowledge base of two- and three-dimensional activity cliffs for medicinal and computational chemistry [version 1; peer review: 3 approved]. F1000Research 2015, 4(Chem Inf Sci):168 (https://doi.org/10.12688/f1000research.6661.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 16 Jul 2015
Gerhard Müller, Medicinal Chemistry, Mercachem, Nijmegen, The Netherlands 
Approved
Y. Hu, N. Furtmann, D. Stumpfe, and J. Bajorath report on a conceptual extension of their well-exemplified activity cliff concept from the previously established graph-based two-dimensional representation to a three-dimensional version by factoring crystallographically solved high-resolution complex structures of respective ...
HOW TO CITE THIS REPORT
Müller G. Reviewer Report For: Comprehensive knowledge base of two- and three-dimensional activity cliffs for medicinal and computational chemistry [version 1; peer review: 3 approved]. F1000Research 2015, 4(Chem Inf Sci):168 (https://doi.org/10.5256/f1000research.7155.r9508)
Reviewer Report 09 Jul 2015
Veer Shanmugasundaram, Center of Chemistry Innovation & Excellence, Pfizer PharmaTherapeutics Research & Development, Groton, CT, USA 
Approved
Recently there has been a lot of interest and effort in curating databases and providing MMP information for analysis of various properties by both academic and industrial groups as noted by some references below
 
http://dx.doi.org/10.1021/ci5005256 
http://dx.doi.org/10.1021/jm400223y 
http://dx.doi.org/10.1021/jm500317a 
 
Further, a few companies have started providing ...
HOW TO CITE THIS REPORT
Shanmugasundaram V. Reviewer Report For: Comprehensive knowledge base of two- and three-dimensional activity cliffs for medicinal and computational chemistry [version 1; peer review: 3 approved]. F1000Research 2015, 4(Chem Inf Sci):168 (https://doi.org/10.5256/f1000research.7155.r9206)
Reviewer Report 06 Jul 2015
Alexandre Varnek, Department of Chemistry, University of Strasbourg, Strasbourg, France 
Approved
This is short but nice paper describing the data related to 2D and 3D activity cliffs for large variety of biological targets. The data can freely be downloaded from ZENODO which opens a way for numerous computational experiments. They also ...
HOW TO CITE THIS REPORT
Varnek A. Reviewer Report For: Comprehensive knowledge base of two- and three-dimensional activity cliffs for medicinal and computational chemistry [version 1; peer review: 3 approved]. F1000Research 2015, 4(Chem Inf Sci):168 (https://doi.org/10.5256/f1000research.7155.r9335)