Comprehensive knowledge base of two- and three-dimensional activity cliffs for medicinal and computational chemistry

Activity cliffs are formed by pairs or groups of structurally similar or analogous active compounds with large differences in potency. They can be defined in two or three dimensions by comparing graph-based molecular representations or compound binding modes, respectively. Through systematic analysis of publicly available compound activity data and ligand-target X-ray structures, we have determined, in a series of studies, all currently available two- and three-dimensional activity cliffs (2D- and 3D-cliffs, respectively). Furthermore, we have systematically searched for 2D extensions of 3D-cliffs. Herein, we specify the different categories of activity cliffs we have explored and introduce an open access data deposition in ZENODO (doi: 10.5281/zenodo.18490) that makes the entire knowledge base of current activity cliffs freely available in an organized form.


Introduction
The activity cliff concept has experienced increasing interest in chemical informatics and medicinal chemistry [1][2][3][4][5] . A consensus definition of activity cliffs [1][2][3][4] refers to pairs or groups of structurally similar or analogous active compounds with large differences in potency 4,5 . For the definition of activity cliffs, the specification of similarity and potency difference criteria is required. Two-dimensional activity cliffs (2D-cliffs) have mostly been defined on the basis of Tanimoto similarity 6 comparing molecular fingerprint representations 2 . More recently, 2D-cliffs have also been defined on the basis of substructure relationships, preferably employing the matched molecular pair (MMP) formalism 7,8 , leading to the introduction of MMP-cliffs 9 . An MMP is defined as a pair of compounds that are only distinguished by a structural change at a single site 7 , i.e., the exchange of a substructure, termed a chemical transformation 8 . For the definition of MMP-cliffs, transformation size restrictions have been introduced to limit transformations to small chemical changes typically observed in analog series 9 . Applying well-defined similarity and potency difference criteria, 2D-cliffs can be systematically extracted from compound databases 10 .
The vast majority of 2D-cliffs (i.e., close to or more than 95%, depending on the molecular representations and similarity measure used) are not formed in isolation (i.e., in the absence of structural neighbors with significant potency variations), but rather in a coordinated manner involving series of compounds with varying potency forming multiple and overlapping cliffs 4,5,11 . In activity cliff network representations where nodes represent compounds and edges activity cliffs, coordinated cliffs emerge as individual clusters of varying composition and size 11 , which can be isolated for further analysis.
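The network view described above can be illustrated with a minimal sketch. It assumes that a cluster corresponds to a connected component of the cliff network and that cliffs are supplied as pairs of compound identifiers; the function name `cliff_clusters` and the toy identifiers are hypothetical and are not taken from the deposited data.

```python
from collections import defaultdict


def cliff_clusters(cliff_pairs, min_size=3):
    """Return connected components of the cliff network with >= min_size compounds.

    Nodes are compounds and edges are activity cliffs. Treating a cluster as a
    connected component is an assumption made for this illustration.
    """
    # Build an undirected adjacency map from the cliff pairs.
    neighbors = defaultdict(set)
    for a, b in cliff_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    clusters = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first traversal collects every compound reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(component)
    return clusters


# Toy example: A-B and B-C are overlapping (coordinated) cliffs; D-E is isolated.
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
clusters = cliff_clusters(pairs)
```

In this toy network, the isolated cliff D-E is not reported because it involves only two compounds, whereas A, B, and C form a single cluster.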
In addition to 2D-cliffs, three-dimensional activity cliffs (3D-cliffs) can also be defined by comparing compound binding modes in X-ray structures 12 . This requires the superposition of structures of a given target available in different crystallographic ligand-target complexes and the assessment of the 3D similarity of bound ligands 12 . Three-dimensional activity cliffs can be further extended by taking 2D ligand information into account. This can be accomplished by systematically searching compound activity classes for analogs of 3D-cliff partners 13 . For example, for each cliff partner, MMPs with database compounds sharing the same activity can be determined and qualifying analogs can be assigned to the 3D-cliff 13 , leading to what we term herein a 3D-cliff-MMP extension. Figure 1 shows an exemplary 2D-cliff (MMP-cliff), activity cliff cluster, 3D-cliff, and 3D-cliff-MMP extension.
In medicinal chemistry, 2D-cliffs are often considered in the context of structure-activity relationship (SAR) analysis and compound design 2,3 . For SAR exploration, activity cliff clusters are of particular interest because they provide more SAR information than 2D-cliffs studied individually. Furthermore, 3D-cliffs are of prime interest for structure-based design and also for computational chemistry applications including, for example, the calibration of scoring functions or free energy (perturbation) calculations. Last but not least, 2D extensions of 3D-cliffs bridge different applications in medicinal and computational chemistry and help to identify candidate compounds for further analysis.

Methods and materials
The activity cliff information provided herein is the result of recent surveys and systematic analyses of 2D-cliffs including clusters 14 , 3D-cliffs 15 , and extensions of 3D-cliffs 16 . Table 1 summarizes the different activity cliff categories. All 2D-cliffs reported herein originated from the most recent release of ChEMBL (version 20) 17,18 and all 3D-cliffs from the Protein Data Bank (PDB; accessed December, 2014) 19 . MMP extensions of 3D-cliffs were identified in ChEMBL (version 19).
For all activity cliffs, a potency difference of at least 100-fold between cliff partners was consistently required. For 2D-cliffs, only (assay-independent) Ki values were considered as potency measurements. For 3D-cliffs, Ki and IC50 measurements were separately considered (using a Ki and an IC50 value-based data set, respectively). In addition, 3D-cliffs were also determined in a combined Ki/IC50 data set (taking into consideration that 3D-cliffs provide a much smaller knowledge base than 2D-cliffs; vide infra). For 3D-cliff analysis, a crystallographic resolution limit of 3.0 Å was applied.
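The potency difference criterion can be made concrete with a short sketch. It assumes that both potency values are given in the same concentration unit (e.g., nM) and come from the same measurement type; the function names are hypothetical. A 100-fold difference is equivalent to a difference of two units on the negative decadic logarithmic scale.

```python
import math


def fold_difference(potency_a, potency_b):
    """Ratio of the weaker to the stronger potency value (same unit, e.g. nM)."""
    if potency_a <= 0 or potency_b <= 0:
        raise ValueError("potency values must be positive concentrations")
    return max(potency_a, potency_b) / min(potency_a, potency_b)


def meets_potency_criterion(potency_a, potency_b, min_fold=100.0):
    """True if the two compounds differ in potency by at least `min_fold`."""
    return fold_difference(potency_a, potency_b) >= min_fold


def delta_log_potency(potency_a, potency_b):
    """Absolute potency difference in logarithmic units (e.g. delta pKi)."""
    return abs(math.log10(potency_a / potency_b))


# Toy values in nM: 1 nM vs. 100 nM is exactly 100-fold; 5 nM vs. 250 nM is 50-fold.
qualifies = meets_potency_criterion(1.0, 100.0)
too_small = meets_potency_criterion(5.0, 250.0)
```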
Two-dimensional activity cliffs were determined using three different molecular representations including the extended connectivity fingerprint with bond diameter 4 (ECFP4) 20 , molecular access system (MACCS) structural keys 21 , and transformation size-restricted MMPs (MMP-cliffs) 9 . As similarity criteria for ECFP4- and MACCS-based activity cliffs, Tanimoto coefficient threshold values of 0.55 and 0.85 were applied, respectively 2,14 . For an MMP-cliff, our preferred 2D-cliff definition 3,4 , the formation of a transformation size-restricted MMP served as a similarity criterion. By definition, 2D-cliffs do not contain stereochemical information.
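The fingerprint-based similarity criterion can be sketched as follows. The sketch assumes that each fingerprint is represented as a set of on-bit indices; fingerprint generation itself (e.g., with a cheminformatics toolkit) is not shown, and the function names and toy bit sets are hypothetical. Only the two threshold values are taken from the text.

```python
# Threshold values quoted in the text; the lookup table is an illustrative construct.
TANIMOTO_THRESHOLDS = {"ECFP4": 0.55, "MACCS": 0.85}


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / union


def meets_similarity_criterion(bits_a, bits_b, fingerprint):
    """True if the pair reaches the threshold for the given fingerprint type."""
    return tanimoto(bits_a, bits_b) >= TANIMOTO_THRESHOLDS[fingerprint]


# Toy bit sets sharing 5 of 7 distinct on-bits.
fp_a = {1, 4, 7, 9, 12, 15}
fp_b = {1, 4, 7, 9, 12, 20}
similarity = tanimoto(fp_a, fp_b)
```

With these toy bit sets the coefficient is 5/7, which would satisfy the ECFP4 threshold but not the MACCS threshold. In practice, each threshold only applies to bit sets generated with the corresponding fingerprint.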
For the identification of 3D-cliffs, the normalized overlap of atomic property density functions calculated for a pair of bound ligands was used as a measure of 3D similarity, taking conformational, positional, and atomic property differences into account 12 . A calculated 3D similarity of at least 80% was required as a threshold for 3D-cliff formation 12,15 .

Data description
Activity cliff statistics are reported in Table 1. A total of 17,111 MMP-cliffs, 31,975 ECFP4-, and 34,813 MACCS-based 2D-cliffs were identified, formed by compounds active against more than 300 targets in each case. The corresponding number of activity cliff clusters (comprising at least 3 compounds) was 1267, 1462, and 1402, respectively. Therefore, a very large knowledge base of well-defined 2D-cliffs is currently available. In addition, on the basis of Ki and IC50 measurements, 236 and 292 3D-cliffs were detected, formed by crystallographic ligands of 26 and 43 targets, respectively. The combined Ki/IC50 data set yielded 595 3D-cliffs for 58 targets. Although many more 2D- than 3D-cliffs are currently available, as one would expect, the number of 3D-cliffs is larger than we anticipated, hence providing substantial opportunities for structural and computational studies. Table 2 provides details for the 61 different targets for which 3D-cliffs were detected. Furthermore, more than 1000 3D-cliff-MMPs were identified for each of the Ki and IC50 data sets and 2608 for the combined set (Table 1). Hence, for many 3D-cliffs, active analogs are available whose SAR characteristics and possible interaction patterns can be explored based upon 3D-cliff information, for example, by superposing them onto 3D-cliff compounds with which they form transformation size-restricted MMPs.

Data availability
The activity cliff information described above is made freely available in four separate data files containing 2D-cliffs, 3D-cliffs, 3D-cliff-MMP extensions, and superpositions of complex X-ray structures and 3D ligands for selected targets:
(1) 2D-Cliffs_and_Cliff-Clusters.xlsx (Excel format): 2D-cliffs and clusters belonging to different categories are separately recorded using ChEMBL IDs.
(2) 3D-Cliffs.xlsx (Excel): 3D-cliffs from the Ki, IC50, and Ki/IC50 data sets are separately provided using PDB IDs for compounds and UniProt 22 IDs for targets.
Table 2, superpositions of complex X-ray structures and 3D ligands are provided.
These data sets are contained in an open access ZENODO deposition 23 . The deposition also contains a README document that details the data organization and information provided.

Conclusions
In this study, we have discussed different categories of activity cliffs (including cliff extensions) and reported the distribution of cliffs belonging to these categories. Given the cliff definitions applied herein, the activity cliff information we provide as an open access deposition is up-to-date and comprehensive. We hope that this large knowledge base of activity cliffs will be helpful in the practice of medicinal chemistry and structure-based drug design as well as in further evaluating and advancing computational methods.

Data availability
ZENODO: Knowledge base of two- and three-dimensional activity cliffs, doi: 10.5281/zenodo.18490 23

Author contributions
JB designed the study; NF, YH, and DS collected, organized, and deposited the data; JB wrote the manuscript; all authors examined the manuscript.

Competing interests
No competing interests declared.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Gerhard Müller
Medicinal Chemistry, Mercachem, Nijmegen, The Netherlands
Y. Hu, N. Furtmann, D. Stumpfe, and J. Bajorath report on a conceptual extension of their well-exemplified activity cliff concept from the previously established graph-based two-dimensional representation to a three-dimensional version by factoring crystallographically solved high-resolution complex structures of the respective ligands into the comparative analysis of sub-structural molecular changes linked to changes in activity. With this extension into three-dimensional space, they account for ligand-target interactions, thus providing activity cliff-forming compound sets with a design-relevant context that is of immediate assistance when embedded in a molecular design campaign. This clearly is a useful enrichment of the toolbox of computational as well as medicinal chemists.
The authors provide all the relevant molecular similarity and potency difference criteria underlying these analyses that are essential for the definition of activity cliff-forming compound clusters, together with a sound description of all methodological details. While the traditional 2D cliffs were based on, e.g., Tanimoto similarity and the size-restricted matched molecular pair concept introduced earlier by the Bajorath group, 3D cliffs are based on mutual molecular similarity of ligands for which high-resolution complex structures are available. The 3D activity cliffs obtained were further enriched to 3D-cliff-MMP extended sets by including molecular similarity considerations for active compounds for which no X-ray structure is available.
In total, approximately 17,000 MMP cliffs have been identified for more than 300 distinct biological targets, revealing close to 1,300 activity clusters. Cumulatively, 600 3D cliffs have been detected involving approximately 60 distinct protein targets. In Table 2, the authors provide a comprehensive overview of the target landscape and underlying statistics for the detected 3D activity cliffs. In addition, all relevant data are accessible and an open access data deposition in ZENODO has been established.
It is especially the extended 3D-cliff-MMP datasets, in which ligands with experimentally determined binding modes and interaction patterns serve as probe compounds for closely related active analogues with available SAR information, that bear a huge potential to extrapolate medicinal chemists' understanding of structure-activity relationships from a purely comparative framework into a 3D direct design concept. An immediate interrogation of the binding site's functionalities becomes amenable to a previously restricted indirect design approach. It will be interesting to see whether the 3D activity cliff concept introduced in this contribution can be extended to a better understanding of structure-selectivity relationships of compound sets acting, e.g., at different isoforms of densely populated target families. In summary, this contribution lays the basis for migrating a formerly indirect design-restricted tool for comprehensive SAR analysis into 3D space, actually into the binding pocket of the investigated ligand sets, thus increasing the interpretability and the feasibility of identified cliff information for the community of practicing medicinal chemists.
No competing interests were disclosed.