Promiscuity progression of bioactive compounds over time

Ye Hu; Swarit Jasial; Jürgen Bajorath

doi:10.12688/f1000research.6473.2

Research Article

[version 2; peer review: 3 approved]

Ye Hu¹, Swarit Jasial¹, Jürgen Bajorath ¹

PUBLISHED 15 Jun 2015

¹ Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, Bonn, D-53113, Germany

This article is included in the Cheminformatics gateway.

Abstract

In the context of polypharmacology, compound promiscuity is rationalized as the ability of small molecules to specifically interact with multiple targets. To study promiscuity progression of bioactive compounds in detail, nearly 1 million compounds and more than 5.2 million activity records were analyzed. Compound sets were assembled by applying different data confidence criteria and selecting compounds with activity histories over many years. On the basis of publication dates, compounds and activity records were organized on a time course, which ultimately enabled monitoring data growth and promiscuity progression over nearly 40 years, beginning in 1976. Surprisingly low degrees of promiscuity were consistently detected for all compound sets and there were only small increases in promiscuity over time. In fact, most compounds had a constant degree of promiscuity, including compounds with an activity history of 10 or 20 years. Moreover, during periods of massive data growth, beginning in 2007, promiscuity degrees also remained constant or displayed only minor increases, depending on the activity data confidence levels. Considering high-confidence data, bioactive compounds currently interact with 1.5 targets on average, regardless of their origins, and display essentially constant degrees of promiscuity over time. Taken together, our findings provide expectation values for promiscuity progression and magnitudes among bioactive compounds as activity data further grow.

Keywords

Polypharmacology, compound promiscuity, pharmaceutical targets, publicly available activity data, data growth, data confidence levels, promiscuity progression

Corresponding author: Jürgen Bajorath

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2015 Hu Y et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

How to cite: Hu Y, Jasial S and Bajorath J. Promiscuity progression of bioactive compounds over time [version 2; peer review: 3 approved]. F1000Research 2015, 4(Chem Inf Sci):118 (https://doi.org/10.12688/f1000research.6473.2) First published: 13 May 2015, 4(Chem Inf Sci):118 (https://doi.org/10.12688/f1000research.6473.1) Latest published: 15 Jun 2015, 4(Chem Inf Sci):118 (https://doi.org/10.12688/f1000research.6473.2)

Amendments from Version 1

Comments to all three reviews have been posted to address more general points that were raised or points that we felt should not be specifically considered in our revision (e.g., issues that would likely modify the focus of our analysis). In our revision, the following points made by the reviewers have been addressed: The discussion of promiscuity and polypharmacology has been extended, the number of publications from which ChEMBL data originated was analyzed over time, a few definitions and analysis details have been further clarified, and promiscuity was also analyzed for compounds with varying logP values (as a measure of lipophilicity) and molecular weight. Additional results are presented in a revised Figure 1c and a new Figure 5 of the revision. The source data sets are made available via an open access deposition as specified in the manuscript.

See the authors' detailed response to the review by John A. Lowe III
See the authors' detailed response to the review by Christopher Southan
See the authors' detailed response to the review by Georgia B. McGaughey

Introduction

Polypharmacology is an emerging theme in pharmaceutical research and refers to the property of many bioactive compounds or drugs to act on multiple physiological targets, modulate different signaling pathways, and elicit multi-target-dependent pharmacological effects^1–3. Typically, polypharmacology is not considered to include toxic or other undesired side effects. The molecular basis of polypharmacology is provided by compound promiscuity, which is defined as the ability of small molecules to specifically interact with multiple targets^4,5. It should be emphasized that this form of “specificity pattern promiscuity” is distinct from non-specific interactions or assay artifacts^6–8. In light of the latter problems, it is important to identify compound classes that are frequently responsible for artificial activity readouts^7,8, e.g. through reactivity under assay conditions. Even in the absence of interaction artifacts, the experimental assessment of promiscuity, e.g. by systematic compound profiling on target sets or families, might be affected by assay confidence limits or detection techniques⁹, as is the case with any screening experiment. Hence, it might sometimes be difficult to clearly distinguish between “assay promiscuity” and true target promiscuity. Furthermore, not all of the interactions between a compound and multiple targets necessarily make positive contributions to polypharmacology, a point that also merits consideration. Nevertheless, compound promiscuity, as defined herein, is a sine qua non for polypharmacology.

In addition to experimental studies, promiscuity can also be assessed computationally by mining the rapidly increasing amounts of compound activity data that become available and systematically collecting target annotations for compounds^3–5. For computational analysis, it is also of critical importance to carefully consider activity data integrity and confidence levels to arrive at reliable promiscuity estimates⁵. For compound data mining, public repositories are essential including ChEMBL¹⁰, the major public source of data from medicinal chemistry, PubChem’s BioAssay database¹¹, the major source of screening data, and DrugBank¹², which collects target annotations for drug candidates and drugs. Systematic computational analysis of promiscuity has been largely dependent on these resources (although proprietary pharmaceutical data have also been used).

In recent years, computational investigations have provided different promiscuity estimates, depending on the specific aims, study design, and data selection criteria that were applied. Drugs have been the major focal point of these studies. Early estimates on the basis of drug-target networks have suggested that a drug interacts with two targets on average¹³. Recently, it has been proposed that drugs directed against different target families bind to an average of two to seven targets, depending on their primary target families, and that more than 50% of current drugs bind to more than five targets³. For bioactive compounds, analysis of high-confidence activity data indicated that they interact with an average of one to two targets, with most promiscuous compounds being annotated with two to five targets from the same target family^5,14. Moreover, the analysis of high-confidence activity data from 1085 PubChem confirmatory bioassays for 439 targets revealed that a confirmed hit interacted with only two targets on average, although nearly 80% of these active PubChem compounds were tested in more than 50 different assays¹⁵. Taken together, computational analyses of bioactive compounds from medicinal chemistry and screening sources indicated the presence of lower degrees of promiscuity overall than was detected for drugs.

These findings could be rationalized based on the assumption that drugs might often be more extensively tested against different targets than average bioactive compounds. However, this would not explain the relatively low degree of promiscuity observed for active compounds from screening libraries, many of which are extensively tested. Furthermore, promiscuity estimates from computational analysis are occasionally questioned in light of data sparseness¹⁶, i.e. the fact that available active compounds have not been tested against all targets (comprehensive testing being the vision and ultimate goal of chemogenomics¹⁷). Data incompleteness might in principle lead to an underestimation of the degree of promiscuity. However, it remains unclear how significant such deviations might be. In fact, given that millions of activity annotations are already available at present, it should be possible to deduce statistically meaningful trends from such large data samples. Such promiscuity trends might be detected by monitoring promiscuity over time as activity data grow. In a recent study, this type of analysis has been carried out for approved drugs¹⁸. For a set of 518 drugs, promiscuity was quantified over different time intervals considering activity data at different confidence levels. When only high-confidence activity records were considered, an increase in the average degree of promiscuity from 1.5 to 3.2 targets per drug was detected over a period of 14 years (from 2000 to 2014). By contrast, when all available activity data were considered, regardless of confidence levels, partially unrealistic increases in promiscuity were observed, ranging from six targets per drug on average in 2000 to more than 28 targets in 2014¹⁸. For individual high-profile drugs, literally hundreds of target annotations were detected when no confidence criteria were applied. This study showed how dramatic the influence of data confidence levels on promiscuity assessment could be.
Furthermore, when considering the results obtained on the basis of high-confidence activity data, the findings also corroborated conclusions drawn from earlier studies discussed above, which indicated that detectable promiscuity of active compounds and drugs might be lower overall than often assumed (and that these observations might not be largely determined by data incompleteness).

To further refine current promiscuity estimates, we report herein a detailed analysis of the degree of promiscuity of current bioactive compounds monitored over time, spanning a period of 39 years. Special attention was paid to compounds that were first recorded many years ago and are still available. Promiscuity was viewed in light of data growth and monitored using high- and low-confidence activity data. A large number of compounds qualified for this analysis and clear trends were detected. The results of our analysis are presented in the following.

Materials and methods

Growth of compound activity data

The ChEMBL database¹⁰ that was analyzed collects large numbers of compounds and activity data, mainly from the medicinal chemistry literature and the PubChem BioAssay database¹¹. The current ChEMBL version (v.20) contains 1,463,270 structurally distinct compounds with activity against 10,774 targets. From 1,148,942 assays, a total of 13,520,737 activity records originated, as reported in Table 1. To systematically explore data growth over time, our analysis focused on data for which publication dates were available, which included 913,972 compounds, 10,142 targets, 872,577 assays, and 5,258,052 activity records (Table 1). The growth of these data was monitored on an annual basis. For each year, the number of new entries that became available and the total (cumulative) number of entries was recorded.
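The annual bookkeeping described above (new entries per year plus a running total) can be sketched in a few lines of Python. This is only an illustration and not the authors' code; the function name `yearly_growth` and the input format (one first-publication year per entry) are assumptions made for the example.

```python
from collections import Counter
from itertools import accumulate


def yearly_growth(first_years):
    """Return {year: (new_entries, cumulative_entries)}.

    `first_years` holds one first-publication year per entry
    (compound, target, assay, or activity record).
    """
    new_per_year = Counter(first_years)
    years = sorted(new_per_year)
    totals = accumulate(new_per_year[y] for y in years)
    return {y: (new_per_year[y], total) for y, total in zip(years, totals)}


# Toy input with hypothetical years; years without entries are simply absent.
growth = yearly_growth([1976, 1976, 1977, 1979, 1977, 1977])
print(growth)  # {1976: (2, 2), 1977: (3, 5), 1979: (1, 6)}
```

Counting each entry only in the year it first appears keeps the cumulative curve from double-counting entries that are reported again later.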

Table 1. ChEMBL v.20 statistics.

For ChEMBL v.20 and subsets for which specific publication dates were available, the total number of compounds, targets, assays, and activity records (activities) is shown.

Number of	Total	With publication dates
Compounds	1,463,270	913,972
Targets	10,774	10,142
Assays	1,148,942	872,577
Activities	13,520,737	5,258,052

Data sets of varying confidence levels

To investigate compound promiscuity over time as well as the effect of data confidence levels on promiscuity degrees, two data sets with different confidence levels were assembled from ChEMBL v.20. For the high-confidence data set, a series of selection criteria was applied. Compounds with direct interactions (i.e. assay relationship type “D”) with human single-protein targets at the highest confidence level (i.e. assay confidence score 9) were collected. The two ChEMBL parameters ‘assay relationship type’ and ‘assay confidence score’ qualitatively and quantitatively describe, respectively, the level of confidence that the activity against a given target is evaluated in a relevant assay system. Accordingly, type “D” and score 9 represent the highest level of confidence for activity data. In addition, two types of activity measurements were considered: assay-independent equilibrium constants (K_i values) and assay-dependent IC₅₀ values. To ensure a high level of data integrity, only compounds with explicitly defined K_i and/or IC₅₀ values were selected. Hence, approximate measurements qualified by “>”, “<”, or “~” were disregarded. Furthermore, activity records including the comments “inactive”, “inconclusive”, or “not active” were discarded. Thus, this compound set exclusively contained high-confidence activity data. By contrast, the low-confidence data set comprised all compounds with reported interactions with human single-protein targets, regardless of their confidence levels and activity measurement types.
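The selection criteria above amount to a pair of record-level filters. The following Python sketch shows one way to express them; it is not the authors' implementation. The dictionary keys (`organism`, `target_type`, `relationship_type`, `confidence_score`, `measurement`, `relation`, `value`, `comment`) are hypothetical stand-ins chosen for the example and are not meant to reproduce the ChEMBL schema, and matching comments exactly after normalization is likewise an assumption.

```python
EXCLUDED_COMMENTS = {"inactive", "inconclusive", "not active"}


def in_low_confidence_set(record):
    """Any reported activity against a human single-protein target qualifies."""
    return (
        record.get("organism") == "Homo sapiens"
        and record.get("target_type") == "SINGLE PROTEIN"
    )


def is_high_confidence(record):
    """Apply the additional high-confidence criteria to one activity record."""
    comment = (record.get("comment") or "").strip().lower()
    return (
        in_low_confidence_set(record)
        and record.get("relationship_type") == "D"       # direct interaction
        and record.get("confidence_score") == 9          # highest confidence
        and record.get("measurement") in {"Ki", "IC50"}  # accepted value types
        and record.get("relation") == "="                # no ">", "<", "~"
        and record.get("value") is not None              # explicit value only
        and comment not in EXCLUDED_COMMENTS
    )


# Hypothetical record used only to exercise the filters.
example = {
    "organism": "Homo sapiens", "target_type": "SINGLE PROTEIN",
    "relationship_type": "D", "confidence_score": 9,
    "measurement": "IC50", "relation": "=", "value": 35.0, "comment": None,
}
print(is_high_confidence(example))                        # True
print(is_high_confidence({**example, "relation": ">"}))   # False
```

Expressing the high-confidence filter as the low-confidence filter plus extra conditions mirrors the text: every high-confidence record also satisfies the low-confidence criteria.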

Monitoring compound activity records over time

On the basis of the high- and low-confidence data sets, the progression of compound promiscuity was quantified. Activity records with publication dates were assigned to individual compounds. For each year, the activity records reported up to and including that year were assembled cumulatively. For instance, if a compound was reported to be active against target A in 1990, targets B and C in 2000, and target D in 2005, the cumulative activity records for this compound consisted of target A in 1990, targets A, B, and C in 2000, and targets A, B, C, and D in 2005. Thus, the degree of promiscuity of this compound increased from 1 to 3 and then to 4. For the assessment of compound promiscuity, potency values were not taken into account. The degree of promiscuity was assessed on the basis of qualifying activity records. If a compound was tested in various assays against the same target in different years, yielding the same or different potency values, only the first year of reported activity was recorded. For a given year, the average degree of promiscuity was calculated over all qualifying compounds. In addition, subsets of compounds for which activity data first became available in 1994 (20 year activity history) or 2004 (10 year history) were separately monitored.
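The cumulative scheme can be made concrete with a short Python sketch that reproduces the worked example (targets A, B, C, and D). It is an illustration under stated assumptions rather than the authors' code: the function names and the `(compound, target, year)` tuple format are invented for the example, and "qualifying compounds" for a given year is read here as compounds with at least one activity record up to that year.

```python
def first_report_years(records):
    """Map compound -> {target: first year of reported activity}."""
    first = {}
    for compound, target, year in records:
        targets = first.setdefault(compound, {})
        if target not in targets or year < targets[target]:
            targets[target] = year
    return first


def promiscuity_in_year(first, compound, year):
    """Number of distinct targets reported for `compound` up to `year`."""
    return sum(1 for y in first.get(compound, {}).values() if y <= year)


def average_promiscuity(first, year):
    """Average degree over compounds with at least one record by `year`."""
    degrees = [promiscuity_in_year(first, c, year) for c in first]
    degrees = [d for d in degrees if d > 0]
    return sum(degrees) / len(degrees) if degrees else 0.0


records = [
    ("X", "A", 1990), ("X", "B", 2000), ("X", "C", 2000), ("X", "D", 2005),
    ("X", "A", 1995),  # repeat measurement against A: first year is kept
    ("Y", "A", 2000),  # second hypothetical compound with a single target
]
first = first_report_years(records)
print([promiscuity_in_year(first, "X", y) for y in (1990, 2000, 2005)])  # [1, 3, 4]
print(average_promiscuity(first, 2005))  # 2.5
```

Storing only the first year per compound-target pair means that repeated measurements against the same target never change the degree of promiscuity.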

Results and discussion

Growth of compounds, targets, assays, and activity records

In ChEMBL v.20, publication dates were reported for 913,972 compounds, 10,142 targets, 872,577 assays, and 5,258,052 activity records (Table 1). Initially, the growth of these source data was analyzed over time. Figure 1 reports the number of new entries that became available each year since 1976 and the total (cumulative) number of entries for each year. As shown in Figure 1a, only 3188 compounds were reported in 1976. In 1977, 6496 compounds were published, yielding a total of 9684 compounds. Steady growth in compound numbers was then observed until 2006, after which growth became nearly exponential, with ~50,000–80,000 compounds becoming available in 2007 and subsequent years. The number of new compounds published in 2014 was much lower, probably because not all new compounds and activity data published in 2014 had been deposited in the database by the end of the year. Similar growth trends were observed for targets (Figure 1b), assays (Figure 1c), and activity records (Figure 1d). The cumulative number of papers published over time was found to parallel the growth of assay data, as shown in Figure 1c. In a related study, compounds and targets published in the scientific and patent literature between 1991 and 2012 were analyzed in detail on the basis of the commercial GOSTAR database¹⁹.

Figure 1. Growth of compounds, targets, assays, and activity records.

The growth of compounds (a), targets (b), assays (c), and activity records (d) is reported. In (a), the number of new compounds becoming available each year is provided using blue bars (scale on the left vertical axis) and the cumulative number of compounds is given as a red line (scale on the right). Corresponding representations are used in (b)–(d). In (c), the cumulative number of publications (green) from which ChEMBL data originated is also reported (scale on the left).

In Table 2, the numbers of compounds, targets, assays, and activity records available in 1976 and 2014 are compared. Within this 39-year period, available activity records increased most significantly from 13,999 to 5,258,052 (by a factor of ~376). For compounds and assays, growth factors were comparable (~287 and ~261, respectively). The number of targets increased by a factor of ~79.

Table 2. Data growth.

The numbers of compounds, targets, assays, and activity records available in 1976 and 2014 are compared.

Number of	1976	2014	Increase (fold)
Compounds	3188	913,972	286.7
Targets	128	10,142	79.2
Assays	3347	872,577	260.7
Activities	13,999	5,258,052	375.6
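The fold increases in Table 2 are simple ratios of the 2014 and 1976 counts. As a quick arithmetic check, the Python sketch below recomputes them from the tabulated numbers; the helper name `fold_increase` is an assumption introduced for the example.

```python
def fold_increase(start, end):
    """Ratio of the final to the initial count, rounded to one decimal."""
    return round(end / start, 1)


# (count in 1976, count in 2014), taken from Table 2
table2 = {
    "Compounds": (3188, 913972),
    "Targets": (128, 10142),
    "Assays": (3347, 872577),
    "Activities": (13999, 5258052),
}

folds = {name: fold_increase(n1976, n2014) for name, (n1976, n2014) in table2.items()}
print(folds)
# {'Compounds': 286.7, 'Targets': 79.2, 'Assays': 260.7, 'Activities': 375.6}
```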

Overall, significant increases in the number of compounds, targets, assays, and activity records were observed, especially from 2007 on, thus providing a sound basis for the analysis of compound promiscuity progression over time.

High- and low-confidence data sets

Based on the selection criteria detailed above, two sets of compounds with high- and low-confidence activity data were assembled. In the low-confidence set, compounds with any reported activities against human single-protein targets were included, without applying additional data confidence criteria. By contrast, for the high-confidence set, additional criteria were applied including assay confidence levels as well as the type and integrity of potency measurements. As reported in Table 3, the high-confidence set contained 154,062 compounds active against 1449 targets, yielding a total of nearly 234,000 activity records with publication dates. In the low-confidence set, 361,159 compounds active against 2552 targets were available, yielding a total of nearly 782,000 activity records. Datasets of this magnitude were expected to reveal statistically relevant trends in promiscuity progression.

Table 3. Data with different confidence levels.

The numbers of compounds, targets, assays, and activity records with available publication dates are reported for the high- and low-confidence data sets, respectively.

Number of	High-confidence set	Low-confidence set
Compounds	154,062	361,159
Targets	1449	2552
Assays	27,876	141,319
Activities	233,971	781,707

Compound promiscuity over time

Global estimate. For compounds in the high- and low-confidence data sets, the average degree of compound promiscuity was determined over the years, as reported in Figure 2. Early on, compounds from both data sets were mostly associated with single-target activities (corresponding to a promiscuity degree of 1). Beginning in 2004, a difference in promiscuity between the high- and low-confidence sets became apparent. However, only a limited increase in promiscuity was observed for compounds from both data sets. From 1976 to 2014, the average degree of promiscuity increased from 1 to 1.5 for the high- and from 1 to 2.2 for the low-confidence data set, thus indicating an overall low degree of promiscuity among bioactive compounds. More interestingly, the average degree of promiscuity for compounds in the high-confidence set only increased by 0.4 (i.e. by less than one target) after 1994 and essentially remained constant between 2004 and 2014, although the amount of available compounds and activity data dramatically increased after 2006 (Figure 1).

Figure 2. Compound promiscuity over time.

For compounds in the high- and low-confidence data sets, the average degree of compound promiscuity is reported over different years.

Promiscuity on a per-compound basis. In addition to the global assessment of compound promiscuity, progression of promiscuity was also monitored for individual compounds. Table 4 reports the number of compounds with increasing degrees of promiscuity over time. Strikingly, a total of 151,786 (i.e. 98.5%; high-confidence set) and 352,466 (97.6%; low-confidence set) compounds displayed constant degrees of promiscuity over time. Exemplary compounds are shown in Figure 3. These compounds were active against varying numbers of targets, yet their degrees of promiscuity remained constant until 2014. It is unlikely that such large numbers of compounds with a constant degree of promiscuity over many years have simply not been tested in various assays. For example, the compound shown at the bottom left in Figure 3 (CHEMBL340211) was reported to be active against two targets in 1993. However, no additional high-confidence activity data became available for this compound during the following 21 years. An abundance of such examples exists for compounds active across current targets.

Table 4. Increasing promiscuity.

The number of compounds with increasing degrees of promiscuity (∆Promiscuity) is reported for the high- and low-confidence data sets. For example, “0” indicates that the degree of promiscuity remained constant over time and “5” that the degree of promiscuity increased by five target annotations.

∆Promiscuity	#Compounds (high-confidence set)	#Compounds (low-confidence set)
0	151,786	352,466
1	1239	4099
2	469	1721
3	220	816
4	102	398
5	65	305
6–10	130	698
11–20	40	283
21–50	9	137
> 50	2	236
Total	154,062	361,159

Figure 3. Compounds with constant promiscuity.

Shown are eight exemplary compounds from the high-confidence data set that displayed a constant degree of promiscuity over different time periods. For each compound, its ChEMBL ID, the degree of promiscuity, and the first year in which target-specific activities were reported are given. For example, “2 | 1993” (lower left) indicates that this compound was first reported in 1993 to be active against two targets and that this degree of promiscuity (i.e., 2) has remained constant until 2014.

Increases in promiscuity were only observed for 2276 and 8693 compounds in the high- and low-confidence sets, respectively (Table 4). Moreover, only 181 (high-confidence set) and 1354 (low-confidence set) compounds - a minute fraction of all monitored compounds - gained more than five target annotations over the years.
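The percentages and subtotals quoted for Table 4 follow directly from the tabulated counts. The Python sketch below recomputes them as a consistency check; the function name `summarize` and the dictionary layout are assumptions made for the example, while the counts are copied from Table 4.

```python
# Compound counts per ∆Promiscuity bin, copied from Table 4
HIGH = {"0": 151786, "1": 1239, "2": 469, "3": 220, "4": 102, "5": 65,
        "6-10": 130, "11-20": 40, "21-50": 9, ">50": 2}
LOW = {"0": 352466, "1": 4099, "2": 1721, "3": 816, "4": 398, "5": 305,
       "6-10": 698, "11-20": 283, "21-50": 137, ">50": 236}

MORE_THAN_FIVE = ("6-10", "11-20", "21-50", ">50")


def summarize(counts):
    """Return (total, percent constant, number increased, number gaining > 5)."""
    total = sum(counts.values())
    constant = counts["0"]
    percent_constant = round(100 * constant / total, 1)
    increased = total - constant
    gained_more_than_five = sum(counts[b] for b in MORE_THAN_FIVE)
    return total, percent_constant, increased, gained_more_than_five


print(summarize(HIGH))  # (154062, 98.5, 2276, 181)
print(summarize(LOW))   # (361159, 97.6, 8693, 1354)
```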

Compounds with 20 year activity history. Subsets of compounds reported to be active since 1994 were assembled. From the high- and low-confidence sets, 1040 and 19,351 qualifying compounds were obtained, respectively. Promiscuity progression over the subsequent 20 years was separately analyzed for these compound subsets. Figure 4a shows that the degree of promiscuity of the 1040 compounds from the high-confidence data set essentially remained constant, with an increase from 1.1 (1994) to only 1.2 (2014), hence representing lower promiscuity than the global degree of promiscuity determined for the high-confidence set. For the 19,351 compounds from the low-confidence set, the degree of promiscuity only increased from 1.3 to 1.6, which was also lower than the global degree of promiscuity for this set (Figure 4b). Hence, on the basis of activity data monitored over the course of 20 years, compound promiscuity only slightly increased and promiscuity rates were lower than might have been anticipated, although large amounts of activity data became available over time.

Figure 4. Promiscuity of compounds available since 1994.

The average degree of promiscuity was compared for all high- (a) and low-confidence (b) set compounds (solid lines) and subsets of compounds reported to be active beginning in 1994 (dashed lines).

Compounds with varying molecular weight and lipophilicity

For compounds in the high- and low-confidence data sets, molecular weight (MW) and octanol/water (o/w) partition coefficient (logP) values were calculated using the Molecular Operating Environment²⁰, and the corresponding degrees of promiscuity were determined, as shown in Figure 5.

Figure 5. Average degree of promiscuity for compounds with varying molecular weight and logP values.

The average degree of promiscuity is reported for compounds in the high- and low-confidence data sets with increasing (a) molecular weight and (b) logP values. For each property value range, the number of corresponding compounds is provided at the bottom.

Figure 5a reports the average degree of promiscuity for compounds of increasing MW. Seven MW ranges were defined. The majority of compounds in the high- and low-confidence data sets had MW between 300 and 600 Da. In both data sets, the degree of promiscuity tended to decrease with increasing MW, consistent with previous observations⁵. For compounds with MW between 300 and 600 Da, the average promiscuity rates remained nearly constant.
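Averaging promiscuity over property value ranges, as in Figure 5, is a plain binning operation. The Python sketch below illustrates the idea; it is not the authors' procedure. The text states only that seven MW ranges were defined, so the bin edges in `EDGES` are hypothetical placeholders, as are the function name and the toy input values.

```python
from bisect import bisect_right

# Hypothetical bin edges in Da (six edges give seven ranges); the actual
# ranges used for Figure 5a are not specified in the text.
EDGES = [200, 300, 400, 500, 600, 700]


def average_by_range(pairs, edges=EDGES):
    """Return [(compound_count, average_degree or None)] per value range.

    `pairs` is an iterable of (property_value, promiscuity_degree).
    A value equal to an edge falls into the range starting at that edge.
    """
    n_bins = len(edges) + 1
    sums = [0] * n_bins
    counts = [0] * n_bins
    for value, degree in pairs:
        i = bisect_right(edges, value)
        sums[i] += degree
        counts[i] += 1
    return [(counts[i], sums[i] / counts[i] if counts[i] else None)
            for i in range(n_bins)]


# Toy (MW, promiscuity degree) pairs with made-up values
toy = [(150, 1), (250, 2), (250, 4), (300, 1), (650, 3), (900, 1)]
print(average_by_range(toy))
# [(1, 1.0), (2, 3.0), (1, 1.0), (0, None), (0, None), (1, 3.0), (1, 1.0)]
```

The same function applies to logP values if a suitable list of edges is supplied through the `edges` parameter.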

In addition, the influence of lipophilicity on compound promiscuity was analyzed, as reported in Figure 5b (logP values were calculated as a measure of lipophilic character). Compounds covered a wide range of logP values. The majority of compounds in the high- and low-confidence data sets had logP values between 0 and 6. In general, average promiscuity rates first increased over low logP value ranges and then decreased over intermediate to high value ranges. Slightly different trends were observed for the high- and low-confidence sets. For compounds in the high-confidence set, hydrophilic compounds (i.e., having logP values between -4 and 0) displayed the highest degree of promiscuity. Compounds with logP values between 0 and 4 in the low-confidence set displayed above average promiscuity rates, i.e., 2.2. For compounds with further increasing logP values, promiscuity degrees declined. In the high-confidence set, compounds with logP values greater than 4 had an essentially constant degree of promiscuity. Surprisingly, these findings did not reflect the generally expected trend that compound promiscuity correlates with lipophilicity²¹. However, in another study, decreasing assay hit rates were also observed for compounds with increasing lipophilicity²².

Current promiscuity levels for bioactive compounds

Up-to-date promiscuity levels were determined for all qualifying compounds, the subsets of compounds for which activity data first became available in 1994 (20 year activity history), and compound subsets for which activity data first became available in 2004 (10 year history). The results are reported in Table 5. The degree of promiscuity was consistently low in all cases and differences in promiscuity were only marginal. For the high-confidence set, the average degree of promiscuity ranged from 1.3 (20 year activity history) through 1.5 (all compounds) to 1.7 (10 year activity history). For the low-confidence set, it ranged from 1.6 (20 year history) through 2.0 (10 year history) to 2.2 (all compounds). Thus, bioactive compounds generally displayed only a low degree of promiscuity, regardless of the data set from which they originated.

Table 5. Current promiscuity rates.

For the high- and low-confidence data sets, the current average degree of promiscuity is reported for all compounds and compound subsets with activity records available since 1994 and 2004, respectively.

Data set	Compound subset	Avg. promiscuity rate
High-confidence set	All 154,062 compounds	1.5
	1040 compounds with activity available since 1994	1.3
	9979 compounds with activity available since 2004	1.7
Low-confidence set	All 361,159 compounds	2.2
	19,351 compounds with activity available since 1994	1.6
	101,370 compounds with activity available since 2004	2.0

Conclusions

Currently available activity data provide an unprecedented source of information for the analysis of bioactive compounds. To assess the promiscuity of bioactive compounds in detail, available activity data have been assigned on the basis of publication dates to individual years, thus enabling the study of data growth and compound promiscuity on a time scale and in context. Monitoring compound promiscuity over time was expected to reveal sound trends concerning promiscuity progression and evolving magnitudes. Furthermore, to take data confidence explicitly into account, high- and low-confidence compound data sets were separately generated and analyzed. Data growth and promiscuity progression were ultimately monitored over nearly 40 years (beginning in 1976), both at a global level, as well as focusing on individual compounds or subsets of compounds (from the high- and low-confidence sets) with a 20 year or 10 year activity history. The analysis provided a perhaps unexpectedly clear picture and revealed generally low degrees of promiscuity for bioactive compounds, regardless of their activities and origins. Moreover, only minor increases in promiscuity over time were detected for compounds from all sets and subsets, although activity data dramatically increased since 2007. For the high-confidence set, the average degree of promiscuity only increased from 1 to 1.5 over time. Furthermore, even for the low-confidence set, an increase in the degree of promiscuity to only 2.2 was detected. Interestingly, in both cases, promiscuity was constant over time for most compounds. Moreover, for the high-confidence set, the degree of promiscuity essentially remained constant between 2004 and 2014, despite massive data growth. 
Given the extensive time course followed, the large data volumes accumulated, and the consistent trends detected, these findings could hardly be solely attributed to data incompleteness (although conclusions drawn from data mining might well be affected by data integrity and/or sparseness issues). In our systematic analysis, bioactive compounds were found to display only low degrees of promiscuity, with a surprisingly small influence of data confidence levels, and very limited promiscuity progression over time. The observed trends are anticipated to remain stable as compounds and activity data continue to grow at high rates and provide reference points for future studies of compound and drug promiscuity as the molecular basis of polypharmacology.

Data availability

The data selection criteria specified in the Materials and methods section make it possible to reproduce all data sets from ChEMBL v.20, including publication dates. The resulting data set statistics are provided in the first part of the Results and discussion section. However, the data sets generated for this study are also made freely available. ZENODO: Sets of ChEMBL compounds with high or low confidence activity data, DOI: 10.5281/zenodo.18182²³

Author contributions

JB conceived the study, YH and SJ planned and performed the analysis, YH and JB wrote the manuscript. All authors agreed to the final content of the manuscript.

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

References

1. Paolini GV, Shapland RH, van Hoorn WP, et al.: Global mapping of pharmacological space. Nat Biotechnol. 2006; 24(7): 805–815.
2. Boran AD, Iyengar R: Systems approaches to polypharmacology and drug discovery. Curr Opin Drug Discov Devel. 2010; 13(3): 297–309.
3. Jalencas X, Mestres J: On the origins of drug polypharmacology. Med Chem Commun. 2013; 4(1): 80–87.
4. Hu Y, Bajorath J: Compound promiscuity: what can we learn from current data? Drug Discov Today. 2013; 18(13–14): 644–650.
5. Hu Y, Bajorath J: High-resolution view of compound promiscuity [v2; ref status: indexed, http://f1000r.es/1ig]. F1000Res. 2013; 2: 144.
6. McGovern SL, Caselli E, Grigorieff N, et al.: A common mechanism underlying promiscuous inhibitors from virtual and high-throughput screening. J Med Chem. 2002; 45(8): 1712–1722.
7. Baell JB, Holloway GA: New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem. 2010; 53(7): 2719–2740.
8. Baell J, Walters MA: Chemistry: Chemical con artists foil drug discovery. Nature. 2014; 513(7519): 481–483.
9. Dimova D, Hu Y, Bajorath J: Matched molecular pair analysis of small molecule microarray data identifies promiscuity cliffs and reveals molecular origins of extreme compound promiscuity. J Med Chem. 2012; 55(22): 10220–10228.
10. Gaulton A, Bellis LJ, Bento AP, et al.: ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012; 40(Database issue): D1100–D1107.
11. Wang Y, Xiao J, Suzek TO, et al.: PubChem’s BioAssay Database. Nucleic Acids Res. 2012; 40(Database issue): D400–D412.
12. Law V, Knox C, Djoumbou Y, et al.: DrugBank 4.0: Shedding new light on drug metabolism. Nucleic Acids Res. 2014; 42(Database issue): D1091–D1097.
13. Yildirim MA, Goh KI, Cusick ME, et al.: Drug-target network. Nat Biotechnol. 2007; 25(10): 1119–1126.
14. Hu Y, Bajorath J: Promiscuity profiles of bioactive compounds: potency range and difference distributions and the relation to target numbers and families. Med Chem Commun. 2013; 4: 1196–1201.
15. Hu Y, Bajorath J: What is the likelihood of an active compound to be promiscuous? Systematic assessment of compound promiscuity on the basis of PubChem confirmatory bioassay data. AAPS J. 2013; 15(3): 808–815.
16. Mestres J, Gregori-Puigjané E, Valverde S, et al.: Data completeness--the Achilles heel of drug-target networks. Nat Biotechnol. 2008; 26(9): 983–984.
17. Rognan D: Chemogenomic approaches to rational drug design. Br J Pharmacol. 2007; 152(1): 38–52.
18. Hu Y, Bajorath J: Monitoring drug promiscuity over time [v2; ref status: indexed, http://f1000r.es/4oa]. F1000Res. 2014; 3: 218.
19. Southan C, Varkonyi P, Boppana K, et al.: Tracking 20 years of compound-to-target output from literature and patents. PLoS One. 2013; 8(10): e77142.
20. Molecular Operating Environment (MOE), 2014.09. Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7. 2014.
21. Leeson PD, Springthorpe B: The influence of drug-like concepts on decision-making in medicinal chemistry. Nat Rev Drug Discov. 2007; 6(11): 881–890.
22. Tarcsay Á, Keserű GM: Contributions of molecular properties to drug promiscuity. J Med Chem. 2013; 56(5): 1789–1795.
23. Hu Y, Jasial S, Bajorath J: Sets of ChEMBL compounds with high or low confidence activity data. Zenodo. 2015.

Article Versions (2)

version 2

Revised

Published: 15 Jun 2015, 4:118

https://doi.org/10.12688/f1000research.6473.2

version 1

Published: 13 May 2015, 4:118

https://doi.org/10.12688/f1000research.6473.1

© 2015 Hu Y et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

how to cite this article

Hu Y, Jasial S and Bajorath J. Promiscuity progression of bioactive compounds over time [version 2; peer review: 3 approved]. F1000Research 2015, 4(Chem Inf Sci):118 (https://doi.org/10.12688/f1000research.6473.2)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Version 2

Reviewer Report 09 Jul 2015

Christopher Southan, IUPHAR/BPS Guide to PHARMACOLOGY, Center for Integrative Physiology, University of Edinburgh, Edinburgh, UK

Approved

https://doi.org/10.5256/f1000research.7096.r9427

While I could still take issue with some of the responses, the authors ...

Reviewer Report 15 Jun 2015

John A. Lowe III, JL3Pharma LLC, Stonington, CT, USA

Approved

https://doi.org/10.5256/f1000research.7096.r9042

I approve the new ...

Version 1

Reviewer Report 29 May 2015

John A. Lowe III, JL3Pharma LLC, Stonington, CT, USA

Approved

https://doi.org/10.5256/f1000research.6945.r8802

The authors investigate the potential growth of off-target activity over time as new assays become available. They control for multiple potential confounds, and possibly the most important is data quality enabling confidence in the results. They note that prior data indicated screening compounds typically bind to at least two targets, while drugs may bind up to seven, but this result might be skewed by insufficient data quality. Their study reported here uses very large datasets and controls for the growth of compound number as well as new data over time. They also deliberately analyze high- and low-quality (or confidence) data separately. This careful analysis gives them a much clearer picture of the changes in compound promiscuity over time, and reveals a low level of off-target activity and only a slight increase with time. Despite new assays becoming available at an increasing rate, compound promiscuity has not increased significantly, a result that will surprise many readers, but which the authors have documented admirably. I highly recommend this manuscript for indexing.

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author Response (F1000Research Advisory Board Member) 15 Jun 2015

Jürgen Bajorath, Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113, Germany

We thank the reviewer very much for these encouraging comments (that ease the pain often felt when going through massive amounts of compound activity data while striving for a rigorous analysis …).
Competing Interests: No competing interests were disclosed.

Reviewer Report 26 May 2015

Christopher Southan, IUPHAR/BPS Guide to PHARMACOLOGY, Center for Integrative Physiology, University of Edinburgh, Edinburgh, UK

Approved with Reservations

https://doi.org/10.5256/f1000research.6945.r8663

The analysis presented in this paper is of considerable interest and should be indexed. However, I think there are many confounding factors within the ChEMBL data that the authors have not addressed sufficiently. I will pick up some of these ...

1. Polypharmacology usually implies that the effects mediated via the multiple targets are therapeutically “positive”. Is this the authors' implication also? Otherwise the term implicitly extends to toxicity and side effects.
2. Figure 1 should include the distribution that underlies the other three, namely papers per year.
3. While the databases used were different, a published tracking of compound output from papers showed much less increase over 20 years than in Figure 1 (PMID: 24204758), although the target growth pattern was similar. Have the authors checked that ChEMBL did not pick up new journal coverage from 2008 that would spike the increases?
4. I would like more detail on how the filtration methodology in the paper is used to extract and score (a flow chart would help). Let me pose a hypothetical case of two compounds. The first ranks target A at an IC50 of 20 nM and target B at 30 nM. The second compound is 1 nM and 500 nM for the same two targets. Do the two cases get the same promiscuity score? (It would be confounding if they did.)
5. What happens when compound-target-assay values are identical for different publication years (not uncommon in ChEMBL)? Do you score only the first year?
6. I’m confused by the use of “release date” (as for a database); surely “publication date” is meant?
7. For Figure 3, I suggest the dominant explanation for apparently constant promiscuity is simply “publish-and-forget” (i.e. researchers typically do not re-test compounds published by others). As we know, re-testing leading to the publication of new results (promiscuous or not) will be largely dependent on whether structures become reference compounds, are advanced into development, or become drugs. So could the “papers-per-compound” relationship be plotted to provide insight into this?
8. There are other confounding trends that could be tested for, for example targets-per-paper (i.e. < cross-screening over the years might correlate with apparent promiscuity <) and orthologue vs paralogue cross-screening (i.e. if the average human:rodent ratio changes over time for the low-confidence set).
9. Why not select kinase inhibitors as a control subset? We would expect these to exhibit the highest promiscuity and they would thus be an important methodological cross-check.
10. In terms of other obvious hypothesis checks, why not split by LogP (as might increase promiscuity) and Mw (as might decrease it)?
11. While appreciating the academic imperative, I do wish this team could have merged some of their previous papers that appear to address essentially the same theme. For example, comparing drug (ref. 18) vs non-drug promiscuity in the same standardised study is better (easier to review even :) than splitting the result sets.

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response (F1000Research Advisory Board Member) 15 Jun 2015

Jürgen Bajorath, Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113, Germany

1. This view is shared by us.

2. The number of papers published per year was analyzed and found to essentially parallel the growth of assays (Figure 1c of the revision).

3. The study (PMID: 24204758) published by Southan et al. is now referenced but not directly comparable to our current analysis. The authors analyzed the commercial GOSTAR database that includes the GVKBIO Medicinal Chemistry Database and Target Class Database. Figure 1 in this study reports the number of compounds linked to human protein targets in journals published over different years. Promiscuity analysis is not reported. On a close look, the trends of compound growth over time reported in the Southan et al. paper and our current study are actually rather similar. In addition, we note that GVKBIO apparently covers 120 journals while ChEMBL covers 47.

4. The data filtering criteria for the collection of high- and low-confidence data sets have been detailed in the section “Data sets of varying confidence levels”. No promiscuity score was calculated taking potency into account. The degree of compound promiscuity was assessed on the basis of qualifying activity records, as stated in our paper. In the hypothetical example given by the referee, the two compounds would share the same degree of promiscuity (2).

5. We did not score compound-target-assay values, only recorded them. Combination of compounds, targets, and assays analyzed in our study were unique for individual years. Thus, there were no identical values for the same compound-target-assay combination in different years. However, there were cases where a compound was tested in various assays against the same target in different years, yielding the same or different potency values. In these cases, we only recorded the first year, as specified in the revision.

6. Yes, “release date” means “publication date”, as clarified in the revision.

7. This is a (quite plausible) hypothesis, like others put forward in trying to rationalize promiscuity. Our view is data-centric, as commented on in our study. The papers-per-compound ratio in ChEMBL typically is close to one (with relatively small sets of standards/references used in drug target assays being an exception). However, we cannot deduce much from this ratio because inactivity records are sparse in the literature and are not considered in ChEMBL (and other repositories). However, they are available in PubChem. As reported previously (PMID: 23605807), approximately 77% of screening hits in PubChem have been tested in at least 50 confirmatory assays. Yet, the detectable average degree of promiscuity for screening hits is only 2.5, thus only slightly larger than for ChEMBL compounds.

8. There certainly is always more one could possibly do.

9. The promiscuity of kinase inhibitors has been explored in previous studies.
For example, in reference 5, compound promiscuity for five well-known target families including kinases was reported. It was observed that kinase inhibitors did not display a higher degree of promiscuity than compounds directed against other major therapeutic targets; a trend we have consistently observed (please, also consider PMID: 25051177). In the context of promiscuity analysis, kinase inhibitors are a good example for cases where expectations/hypotheses are not necessarily consistent with results obtained on the basis of current data.

10. As suggested, the degree of promiscuity was further analyzed for compounds with varying logP values and molecular weight, as reported in new Figure 5 of our revision.

11. The in part unexpected results reported in reference 18 had inspired us to also have a close look at promiscuity of bioactive compounds on a time course; as we know it takes some thought and time to conceptualize such studies and understand which questions to ask next.
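
The counting rule described in responses 4 and 5 can be illustrated with a minimal sketch. The record layout, the function name `degree_and_first_years`, the years, and the repeated measurement for target B are hypothetical and chosen only for illustration; the potency values of 20, 30, 1, and 500 nM are those of the referee's example. Both records of each compound are assumed to qualify under the data confidence criteria. Potency is carried along but does not enter the degree of promiscuity, and only the first year of each compound-target pair is kept.

```python
def degree_and_first_years(records):
    """Summarize the activity records of one compound.

    `records` holds (target_id, potency_nM, year) tuples that are assumed
    to have passed the data confidence filters already.
    Returns (degree of promiscuity, {target_id: first year reported}).
    """
    first_year = {}
    for target, _potency_nm, year in records:
        # Potency is ignored on purpose: the degree is a target count.
        if target not in first_year or year < first_year[target]:
            first_year[target] = year
    return len(first_year), first_year


# Referee's hypothetical case; years and the third record are made up.
compound_1 = [("A", 20.0, 2010), ("B", 30.0, 2010)]
compound_2 = [("A", 1.0, 2010), ("B", 500.0, 2012), ("B", 450.0, 2014)]

print(degree_and_first_years(compound_1))
print(degree_and_first_years(compound_2))
```

Under these assumptions both compounds receive a degree of 2, and the later re-test of the second compound against target B does not change either its degree or the year recorded for that target.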
Competing Interests: No competing interests were disclosed.

Reviewer Report 14 May 2015

Georgia B. McGaughey, Vertex Pharmaceuticals, Boston, MA, USA

Approved

https://doi.org/10.5256/f1000research.6945.r8662

I believe this paper should be indexed as I have not seen such a methodical and quantified examination of promiscuity before. The article is well written and easy to read. Figures are compelling.

Although there is compelling data included suggesting that over time, promiscuity generally doesn’t increase for a given compound, I'm a bit skeptical on concluding that promiscuity may not have markedly increased over the past few decades based on merely ChEMBL. I, however, recognize that those in academia (or in a biotech company where large receptor screening may not be part of the business model) may not have access to an orthogonal data set. Frankly, other than ChEMBL, I’m not sure where else one would go to look for off target data. There are purchasable databases (e.g. Integrity). However, more public data is relatively sparse. Even a (young) small-ish biotechnology company would not have enough data to utilize.

One topic that has been raised in the literature over the past few years is the concept of phenotypic versus target-based drug discovery approaches to developing new medicines. I would have liked to see some differentiation between promiscuity of target-based versus phenotypic-based projects. Is that something the authors can go back to and annotate their data set?

Additionally, discussion around the differences between promiscuity and polypharmacology should be elaborated upon. I realize this is raised in the "introduction", but I would have liked to see more attention paid to this topic.

Finally, will the data sets be publicly available with annotations?

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author Response (F1000Research Advisory Board Member) 15 Jun 2015

Jürgen Bajorath, Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113, Germany

1. We absolutely agree with the referee that it would be highly desirable to have more pharmaceutically relevant data sets available in the public domain, for instance, large profiling data sets, which we would love to analyze. We also note that ChEMBL data have been, and continue to be, incorporated in a number of different public databases. Concerning commercial databases, we have made a decision not to analyze (and publish about) databases that are not publicly available (although they are occasionally offered to us).

2. It is also true that publicly available phenotypic data sets would be of great help for the field moving forward. Unfortunately, very little phenotypic data is currently available. For ChEMBL activity records, it is not possible to trace potential phenotypic origin at present (we assume very little target-based activity data derived from phenotypic assays is currently available, if any).

3. The discussion on promiscuity and polypharmacology has been further extended in our revision, as suggested.

4. The high- and low-confidence data sets will be made publicly available as an open access deposition, as stated in the revised manuscript.
Competing Interests: No competing interests were disclosed.

Reviewer Reports

	Invited Reviewers
	1	2	3
Version 2 (revision) 15 Jun 15		read	read
Version 1 13 May 15	read	read	read

Georgia B. McGaughey, Vertex Pharmaceuticals, Boston, USA
Christopher Southan, University of Edinburgh, Edinburgh, UK
John A. Lowe III, JL3Pharma LLC, Stonington, USA

Reviewer Report

09 Jul 2015 | for Version 2

Christopher Southan, IUPHAR/BPS Guide to PHARMACOLOGY, Center for Integrative Physiology, University of Edinburgh, Edinburgh, UK

Approved

While I could still take issue with some of the responses, the authors have shown sufficient diligence in formulating them for me to accept these revisions.

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

15 Jun 2015 | for Version 2

John A. Lowe III, JL3Pharma LLC, Stonington, CT, USA

Approved

I approve the new version of the article.

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

29 May 2015 | for Version 1

John A. Lowe III, JL3Pharma LLC, Stonington, CT, USA

Approved

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

26 May 2015 | for Version 1

Christopher Southan, IUPHAR/BPS Guide to PHARMACOLOGY, Center for Integrative Physiology, University of Edinburgh, Edinburgh, UK

Approved With Reservations

1. Polypharmacology usually implies that the effects mediated via the multiple targets are therapeutically “positive”. Is this the authors' implication also? Otherwise the term implicitly extends to toxicity and side effects.
2. Figure 1 should include the distribution that underlies the other three, namely papers per year.
3. While the databases used were different, a published tracking of compound output from papers showed much less increase over 20 years than in Figure 1 (PMID: 24204758), although the target growth pattern was similar. Have the authors checked that ChEMBL did not pick up new journal coverage from 2008 that would spike the increases?
4. I would like more detail on how the filtration methodology in the paper is used to extract and score (a flow chart would help). Let me pose a hypothetical case of two compounds. The first ranks target A at an IC50 of 20 nM and target B at 30 nM. The second compound is 1 nM and 500 nM for the same two targets. Do the two cases get the same promiscuity score? (It would be confounding if they did.)
5. What happens when compound-target-assay values are identical for different publication years (not uncommon in ChEMBL)? Do you score only the first year?
6. I’m confused by the use of “release date” (as for a database); surely “publication date” is meant?
7. For Figure 3, I suggest the dominant explanation for apparently constant promiscuity is simply “publish-and-forget” (i.e. researchers typically do not re-test compounds published by others). As we know, re-testing leading to the publication of new results (promiscuous or not) will be largely dependent on whether structures become reference compounds, are advanced into development, or become drugs. So could the “papers-per-compound” relationship be plotted to provide insight into this?
8. There are other confounding trends that could be tested for, for example targets-per-paper (i.e. < cross-screening over the years might correlate with apparent promiscuity <) and orthologue vs paralogue cross-screening (i.e. if the average human:rodent ratio changes over time for the low confidence set).
9. Why not select kinase inhibitors as a control subset? We would expect these to exhibit highest promiscuity and they would thus be an important methodological cross-check.
10. In terms of other obvious hypothesis checks, why not split by logP (as might increase promiscuity) and Mw (as might decrease it)?
11. While appreciating the academic imperative, I do wish this team could have merged some of their previous papers that appear to address essentially the same theme. For example, comparing drug (ref. 18) vs non-drug promiscuity in the same standardised study is better (easier to review even :) than splitting the result sets.

Competing Interests

No competing interests were disclosed.

Author Response F1000Research Advisory Board Member

15 Jun 2015

Jürgen Bajorath, Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113, Germany

1. This view is shared by us.

2. The number of papers published per year was analyzed and found to essentially parallel the growth of assays (Figure 1c of the revision).

3. The study (PMID: 24204758) published by Southan et al. is now referenced but not directly comparable to our current analysis. The authors analyzed the commercial GOSTAR database that includes the GVKBIO Medicinal Chemistry Database and Target Class Database. Figure 1 in this study reports the number of compounds linked to human protein targets in journals published over different years. Promiscuity analysis is not reported. On a close look, the trends of compound growth over time reported in the Southan et al. paper and our current study are actually rather similar. In addition, we note that GVKBIO apparently covers 120 journals while ChEMBL covers 47.

4. The data filtering criteria for the collection of high- and low-confidence data sets have been detailed in the section “Data sets of varying confidence levels”. No promiscuity score was calculated taking potency into account. The degree of compound promiscuity was assessed on the basis of qualifying activity records, as stated in our paper. In the hypothetical example given by the referee, the two compounds would share the same degree of promiscuity (2).

5. We did not score compound-target-assay values, only recorded them. Combinations of compounds, targets, and assays analyzed in our study were unique for individual years. Thus, there were no identical values for the same compound-target-assay combination in different years. However, there were cases where a compound was tested in various assays against the same target in different years, yielding the same or different potency values. In these cases, we only recorded the first year, as specified in the revision.

6. Yes, “release date” means “publication date”, as clarified in the revision.

7. This is a (quite plausible) hypothesis, like others put forward in trying to rationalize promiscuity. Our view is data-centric, as commented on in our study. The papers-per-compound ratio in ChEMBL typically is close to one (with relatively small sets of standards/references used in drug target assays being an exception). However, we cannot deduce much from this ratio because inactivity records are sparse in the literature and are not considered in ChEMBL (and other repositories), although they are available in PubChem. As reported previously (PMID: 23605807), approximately 77% of screening hits in PubChem have been tested in at least 50 confirmatory assays. Yet, the detectable average degree of promiscuity for screening hits is only 2.5, thus only slightly larger than for ChEMBL compounds.

8. There certainly is always more one could possibly do.

9. The promiscuity of kinase inhibitors has been explored in previous studies.
For example, in reference 5, compound promiscuity for five well-known target families including kinases was reported. It was observed that kinase inhibitors did not display a higher degree of promiscuity than compounds directed against other major therapeutic targets, a trend we have consistently observed (see also PMID: 25051177). In the context of promiscuity analysis, kinase inhibitors are a good example of cases where expectations/hypotheses are not necessarily consistent with results obtained on the basis of current data.

10. As suggested, the degree of promiscuity was further analyzed for compounds with varying logP values and molecular weight, as reported in new Figure 5 of our revision.

11. The partly unexpected results reported in reference 18 inspired us to also take a close look at the promiscuity of bioactive compounds on a time course; as we know, it takes some thought and time to conceptualize such studies and understand which questions to ask next.
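
The counting scheme described in responses 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record layout, the compound and target names, and the functions `first_year_per_target` and `promiscuity_by_year` are hypothetical, and every record is assumed to have already passed the confidence filters. The sketch also assumes that promiscuity is accumulated over all years up to a chosen cut-off. Potency is stored but deliberately ignored, so the referee's two hypothetical compounds both reach a degree of 2.

```python
from collections import defaultdict

# Hypothetical activity records: (compound, target, assay, year, potency_nM).
# All records are assumed to be "qualifying" (already filtered for confidence).
records = [
    ("cpd_1", "target_A", "assay_1", 2009, 20.0),
    ("cpd_1", "target_B", "assay_2", 2011, 30.0),
    ("cpd_2", "target_A", "assay_3", 2010, 1.0),
    ("cpd_2", "target_B", "assay_4", 2010, 500.0),
    ("cpd_2", "target_B", "assay_5", 2013, 450.0),  # same target, later year
]


def first_year_per_target(records):
    """Map each compound to {target: first year an activity was recorded}."""
    first_seen = defaultdict(dict)
    for compound, target, _assay, year, _potency in records:
        previous = first_seen[compound].get(target)
        # Only the earliest year counts for a compound-target pair.
        if previous is None or year < previous:
            first_seen[compound][target] = year
    return first_seen


def promiscuity_by_year(records, year):
    """Count distinct targets per compound first recorded in or before `year`.

    Potency values are not used: each qualifying target counts once.
    """
    first_seen = first_year_per_target(records)
    return {
        compound: sum(1 for first in targets.values() if first <= year)
        for compound, targets in first_seen.items()
    }


print(promiscuity_by_year(records, 2010))  # {'cpd_1': 1, 'cpd_2': 2}
print(promiscuity_by_year(records, 2015))  # {'cpd_1': 2, 'cpd_2': 2}
```

In this toy data set, the repeated measurement of `cpd_2` against `target_B` in 2013 does not change its degree of promiscuity, because only the first year of a compound-target pair is recorded.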

Competing Interests

No competing interests were disclosed.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

1. Paolini GV, Shapland RH, van Hoorn WP, et al.: Global mapping of pharmacological space. Nat Biotechnol. 2006; 24(7): 805–815.

2. Boran AD, Iyengar R: Systems approaches to polypharmacology and drug discovery. Curr Opin Drug Discov Devel. 2010; 13(3): 297–309.

3. Jalencas X, Mestres J: On the origins of drug polypharmacology. Med Chem Commun. 2013; 4(1): 80–87.

4. Hu Y, Bajorath J: Compound promiscuity: what can we learn from current data? Drug Discov Today. 2013; 18(13–14): 644–650.

5. Hu Y, Bajorath J: High-resolution view of compound promiscuity. [v2; ref status: indexed, http://f1000r.es/1ig]. F1000Res. 2013; 2: 144.

6. McGovern SL, Caselli E, Grigorieff N, et al.: A common mechanism underlying promiscuous inhibitors from virtual and high-throughput screening. J Med Chem. 2002; 45(8): 1712–1722.

7. Baell JB, Holloway GA: New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem. 2010; 53(7): 2719–2740.

8. Baell J, Walters MA: Chemistry: Chemical con artists foil drug discovery. Nature. 2014; 513(7519): 481–483.

9. Dimova D, Hu Y, Bajorath J: Matched molecular pair analysis of small molecule microarray data identifies promiscuity cliffs and reveals molecular origins of extreme compound promiscuity. J Med Chem. 2012; 55(22): 10220–10228.

10. Gaulton A, Bellis LJ, Bento AP, et al.: ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012; 40(Database issue): D1100–D1107.

11. Wang Y, Xiao J, Suzek TO, et al.: PubChem’s BioAssay Database. Nucleic Acids Res. 2012; 40(Database issue): D400–D412.

12. Law V, Knox C, Djoumbou Y, et al.: DrugBank 4.0: Shedding new light on drug metabolism. Nucleic Acids Res. 2014; 42(Database issue): D1091–D1097.

13. Yildirim MA, Goh KI, Cusick ME, et al.: Drug-target network. Nat Biotechnol. 2007; 25(10): 1119–1126.

14. Hu Y, Bajorath J: Promiscuity profiles of bioactive compounds: potency range and difference distributions and the relation to target numbers and families. Med Chem Commun. 2013; 4: 1196–1201.

15. Hu Y, Bajorath J: What is the likelihood of an active compound to be promiscuous? Systematic assessment of compound promiscuity on the basis of PubChem confirmatory bioassay data. AAPS J. 2013; 15(3): 808–815.

16. Mestres J, Gregori-Puigjané E, Valverde S, et al.: Data completeness--the Achilles heel of drug-target networks. Nat Biotechnol. 2008; 26(9): 983–984.

17. Rognan D: Chemogenomic approaches to rational drug design. Br J Pharmacol. 2007; 152(1): 38–52.

18. Hu Y, Bajorath J: Monitoring drug promiscuity over time [v2; ref status: indexed, http://f1000r.es/4oa]. F1000Res. 2014; 3: 218.

19. Southan C, Varkonyi P, Boppana K, et al.: Tracking 20 years of compound-to-target output from literature and patents. PLoS One. 2013; 8(10): e77142.

20. Molecular Operating Environment (MOE), 2014.09. Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7. 2014.

21. Leeson PD, Springthorpe B: The influence of drug-like concepts on decision-making in medicinal chemistry. Nat Rev Drug Discov. 2007; 6(11): 881–890.

22. Tarcsay Á, Keserű GM: Contributions of molecular properties to drug promiscuity. J Med Chem. 2013; 56(5): 1789–1795.

23. Hu Y, Jasial S, Bajorath J: Sets of ChEMBL compounds with high or low confidence activity data. Zenodo. 2015.
