Monitoring drug promiscuity over time

Drug promiscuity and polypharmacology are much discussed topics in pharmaceutical research. Experimentally, promiscuity can be studied by profiling of compounds on arrays of targets. Computationally, promiscuity rates can be estimated by mining of compound activity data. In this study, we have assessed drug promiscuity over time by systematically collecting activity records for approved drugs. For 518 diverse drugs, promiscuity rates were determined over different time intervals. Significant differences between the number of reported drug targets and the promiscuity rates derived from activity records were frequently observed. On the basis of high-confidence activity data, an increase in average promiscuity rates from 1.5 to 3.2 targets per drug was detected between 2000 and 2014. These promiscuity rates are lower than often assumed. When the stringency of data selection criteria was reduced in subsequent steps, non-realistic increases in promiscuity rates from ~6 targets per drug in 2000 to more than 28 targets were obtained. Hence, estimates of drug promiscuity significantly differ depending on the stringency with which target annotations and activity data are considered.


Introduction
Promiscuous compounds specifically interact with multiple biological targets 1 . As such, they are distinct from compounds that exhibit assay liabilities or engage in various non-specific interactions. Compound promiscuity is often functionally relevant and represents the molecular origin of polypharmacology 2 , a concept that experiences increasing interest in drug discovery. Drugs are often, but not always, found to act on multiple targets and modulate multiple cellular pathways and/or signaling cascades. Such effects might often substantially contribute to therapeutic efficacy, for example, in cancer treatment 3 . The potentially far reaching consequences of drug polypharmacology for therapy, the frequency of these effects, and likely pros and cons are just beginning to be understood.
Experimentally, promiscuity can be assessed by profiling of compounds or drugs on arrays of biological targets 1,2 , although such studies might often only provide an incomplete picture of in vivo effects. The same applies to computational estimates of promiscuity. Given the increasingly large amounts of compound activity data that are becoming available, the promiscuity of drugs and bioactive compounds can be explored through data mining by systematically evaluating activity annotations 1 . For the assessment of compound and drug promiscuity, public databases such as ChEMBL 4 , the major repository of compounds and activity data from medicinal chemistry, the PubChem BioAssay database 5 , the major repository of screening data, and DrugBank 6 , which collects approved and experimental drugs, have become indispensable resources.
Computational analyses reported thus far have suggested different degrees of promiscuity among bioactive compounds and drugs, dependent on the compound sources used and the methods applied. For example, drug-target network analysis has indicated that a drug might on average act on two targets 7 . Other computational studies have suggested that drugs might on average interact with two to seven targets depending on the target classes the drugs are active against 8 . In addition to varying compound sources and analysis concepts, taking activity measurement characteristics and data confidence criteria into account is also of critical importance for compound promiscuity analysis. For example, it has been shown that the increase in the number of compounds with activity against targets from different families in ChEMBL has mostly resulted from assay-dependent IC 50 but not (assay-independent) K i measurements (equilibrium constants) 9 . In addition, by exclusively considering high-confidence activity data, it has been found that the majority of promiscuous bioactive compounds interact with two to five targets from the same target family, are predominantly active in the sub-µM range, and display potency differences within one or two orders of magnitude against their targets 10 . This represents a prevalent promiscuity profile among bioactive compounds. On the basis of high-confidence activity data, it has also been calculated that compounds from ChEMBL interact on average with one to two targets and compounds from PubChem confirmatory assays with two to three targets 11 . By contrast, target annotation analysis has suggested that approved drugs interact on average with close to six targets, whereas experimental drugs (including candidates in clinical trials) interact with one to two targets 11 . The reasons for this apparent discrepancy in target numbers between drugs at different development stages are currently unknown.
As increasing amounts of activity data become available, it is likely that recently detected promiscuity rates might further increase. However, the magnitude of such increases as a consequence of data incompleteness 12 is difficult to predict, especially considering the low promiscuity rates that can currently be confirmed on the basis of high-confidence data 1,11 .
In this study, we further extend the computational analysis of promiscuity by evaluating the progression of drug promiscuity rates over time, which required a systematic assessment of activity records with release dates. Different data selection criteria were applied and the calculated promiscuity rates were compared to available drug target annotations. Small to moderate increases in drug promiscuity over time were detected when high-confidence activity data were considered. Lowering the stringency of data selection criteria led to unrealistic estimates of promiscuity rates and their progression.

Data collection
From ChEMBL (release 18) 4 , compounds with direct interactions (i.e., assay relationship type "D") with human targets at the highest confidence level (i.e., assay confidence score 9) were collected. The two ChEMBL parameters 'assay relationship type' and 'assay confidence score' qualify and quantify the level of confidence that the activity against a given target is evaluated in a relevant assay system, respectively. Accordingly, type "D" and score 9 represent the highest level of confidence for activity data. In addition, two types of activity measurements were considered including assay-independent equilibrium constants (K i values) and assay-dependent IC 50 values. To ensure a high level of data integrity, only compounds with explicitly defined K i or IC 50 values were selected. Hence, approximate measurements such as ">", "<", and "~" were disregarded. Compounds with multiple K i or IC 50 measurements for the same target were retained if all these values fell within the same order of magnitude. Otherwise, the target activity was omitted from further consideration. Structures of all qualifying bioactive compounds were standardized using the Molecular Operating Environment (MOE) 13 .
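The consistency rule for multiple measurements can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple layout and the function name `consistent_activities` are hypothetical, and "within the same order of magnitude" is interpreted here, as an assumption, to mean that the largest value is at most ten times the smallest.

```python
from collections import defaultdict


def consistent_activities(measurements):
    """Keep compound-target pairs whose exact measurements agree.

    measurements: iterable of (compound, target, relation, value_nM) tuples
    (hypothetical layout). Values are assumed to be positive.
    """
    grouped = defaultdict(list)
    for compound, target, relation, value_nM in measurements:
        # Approximate measurements (">", "<", "~") are disregarded.
        if relation == "=":
            grouped[(compound, target)].append(value_nM)
    # Assumption: "same order of magnitude" means max <= 10 * min.
    return {
        pair: values
        for pair, values in grouped.items()
        if max(values) <= 10 * min(values)
    }


example = [
    ("c1", "t1", "=", 10.0),
    ("c1", "t1", "=", 50.0),     # within a factor of 10: pair is retained
    ("c1", "t2", "=", 5.0),
    ("c1", "t2", "=", 900.0),    # more than a factor of 10: pair is omitted
    ("c1", "t3", ">", 10000.0),  # approximate value: disregarded
]
retained = consistent_activities(example)
```

A stricter reading (identical integer part of the logarithmic value) would retain fewer pairs; the factor-of-ten reading is used here only because it is simple to state.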

Amendments from Version 1
In this version, all comments by the third reviewer have been addressed (please, also see our comments in response to Reviewer 1).

1. It is "order of magnitude".
2. The majority of these 282 drugs were only annotated with a single target, which is stated in the revision.
3. Target family profiles have been provided for the top five of the 14 drugs with the largest changes in promiscuity in a new Table 4. Targets of these drugs belong to a wide range of related or unrelated families.
4. F1000Research has been informed to re-install the ZENODO link.
5. No pre-defined potency cut-off was applied in this study. As suggested, potency distributions are reported for compounds in three data sets with varying confidence levels in a new Figure 10.
6. The set of 26 substructures indicative of PAINS liability has been searched against all drugs. The promiscuity rates of PAINS-positive drugs over time are reported for three data sets in a new Figure 11.

Activity records were organized by release date into 14 time intervals, as illustrated in Figure 1. All activity records reported before 2000 were assigned to 2000, the starting point of our analysis, and all activity data released after 2012 were assigned to the last period ">2012". For each time interval, the cumulative activity profile was recorded. Hence, changes in the promiscuity rate of a drug were successively determined over the years. Cumulative activity profiles were compared to target annotations available in DrugBank.
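The interval assignment and the cumulative activity profile can be sketched as follows. This is an illustrative sketch rather than the authors' code; the names `INTERVALS`, `interval_index`, and `cumulative_profile` and the `(year, target)` record layout are assumptions.

```python
# 14 time intervals: 2000, 2001, ..., 2012, and ">2012".
INTERVALS = [str(year) for year in range(2000, 2013)] + [">2012"]


def interval_index(year):
    """Map a release year to the index of its time interval."""
    if year < 2000:
        return 0                   # records before 2000 are assigned to 2000
    if year > 2012:
        return len(INTERVALS) - 1  # records after 2012 go to ">2012"
    return year - 2000


def cumulative_profile(records):
    """Return the cumulative number of distinct targets per interval.

    records: iterable of (release_year, target) pairs for one drug.
    """
    first_seen = {}
    for year, target in records:
        idx = interval_index(year)
        first_seen[target] = min(idx, first_seen.get(target, idx))
    return [
        sum(1 for idx in first_seen.values() if idx <= i)
        for i in range(len(INTERVALS))
    ]


# Hypothetical drug with three targets first reported in 2001, 2005, and 2009.
profile = cumulative_profile(
    [(2001, "I"), (2005, "II"), (2009, "III"), (2006, "I")]
)
```

Because the profile is cumulative, a repeated report of a known target (target I in 2006 above) does not change the count.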

Low-confidence data sets
In order to investigate the effect of activity data confidence levels on drug promiscuity, two data sets with lower confidence were assembled from ChEMBL (release 18). For the generation of low-confidence data sets, two criteria that influence the compound data integrity, i.e., the confidence level of activity and the type of activity measurements, were disregarded in subsequent steps. In low-confidence set 1, the criterion of activity measurement type was not considered. Hence, in addition to K i and IC 50 values, all other potency annotations were equally considered (including "%max", "Efficacy", "EC 50 ", "K d ", and "Residual Activity") for all compounds with 'direct interactions' with human targets and assay confidence score 9. In addition, the consistency and quality of potency measurements were not considered. In low-confidence set 2, the confidence level of activity (assay relationship type and assay confidence score) was not considered, in addition to the type of activity measurements. Therefore, the stringency of activity data and compound selection decreased from the high-confidence set via low-confidence set 1 to low-confidence set 2.
Progression of drug promiscuity over time was systematically evaluated on the basis of all three data sets.
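The decreasing stringency of the three data sets can be expressed as nested record filters. The sketch below is a simplified illustration under stated assumptions: the dictionary keys and the function name `qualifies` are hypothetical, the restriction to human targets is taken as already applied, and the cross-measurement consistency check of the high-confidence set is not modeled because it operates on groups of records rather than single records.

```python
def qualifies(record, data_set):
    """Check whether one activity record passes the filter of a data set.

    record: dict with hypothetical keys "relationship_type",
    "confidence_score", "type", and "relation".
    data_set: "high", "low1", or "low2".
    """
    direct_high = (
        record["relationship_type"] == "D" and record["confidence_score"] == 9
    )
    exact_ki_ic50 = record["type"] in ("Ki", "IC50") and record["relation"] == "="
    if data_set == "high":
        return direct_high and exact_ki_ic50
    if data_set == "low1":
        return direct_high  # measurement type is no longer considered
    if data_set == "low2":
        return True         # confidence level is no longer considered either
    raise ValueError(f"unknown data set: {data_set}")


ki_record = {"relationship_type": "D", "confidence_score": 9,
             "type": "Ki", "relation": "="}
ec50_record = {"relationship_type": "D", "confidence_score": 9,
               "type": "EC50", "relation": "="}
weak_record = {"relationship_type": "H", "confidence_score": 4,
               "type": "IC50", "relation": "="}
```

Every record accepted by a more stringent filter is also accepted by the less stringent ones, which is why the sets can only grow from the high-confidence set to low-confidence set 2.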

Results and discussion
Bioactive compounds and approved drugs
On the basis of the selection criteria described above, a total of 143,424 bioactive compounds with high-confidence activity data were obtained from ChEMBL. These compounds were active against 1376 different targets and yielded 219,602 compound-target interactions, as reported in Table 1. The standardized structures were transformed into canonical SMILES strings 14 . The compound set assembled in this way exclusively utilized high-confidence activity data (high-confidence data set).
Approved small molecule drugs with available structure and activity information were collected from the latest release of DrugBank (version 4.1) 6 . To synchronize the activity analysis in ChEMBL and DrugBank, all reported 'drug action' targets, metabolizing enzymes, transporters, and carriers were assembled for approved drugs. In some instances, drug target activity might refer to a group of related proteins. For example, atomoxetine was annotated with N-methyl-D-aspartate (NMDA) receptor including seven subtypes. Accordingly, seven UniProt 15 accession IDs (UniProtIDs) were associated with NMDA receptor. Thus, the maximal number of target annotations was collected for approved drugs on the basis of UniProtIDs. Drug structures were also standardized using MOE and transformed into canonical SMILES strings.

Monitoring drug activity records over time
Most compound activity data in ChEMBL are extracted from medicinal chemistry literature and patent sources 4 . Therefore, the release dates of activity data are frequently recorded in this database. However, DrugBank does not report dates for individual target annotations. To systematically monitor drug promiscuity over time, all approved drugs from DrugBank were mapped to ChEMBL by comparing canonical SMILES strings. If a drug (D) and a bioactive compound (B) shared the same SMILES string, a match was obtained. It should be noted that the name of a drug in DrugBank and ChEMBL might differ (i.e., matching by drug/compound name is not reliable). For each match, activity data release dates of compound B were recorded and assigned to drug D. Each activity record represented a target annotation (the terms target activity and target annotation are synonymously used).

[Figure 1: hypothetical example of a cumulative activity profile over the 14 time intervals, with targets I, II, and III and cumulative target counts of 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3.]

On average, an approved drug was annotated with 7.5 targets. Compared to a recent analysis of promiscuity rates 11 , which also included a previous release of DrugBank, the average promiscuity rate of approved drugs further increased from 5.9 to 7.5, while the degree of promiscuity among bioactive compounds remained essentially constant. To monitor drug promiscuity over time, all approved drugs were mapped to bioactive compounds in ChEMBL for which release dates of activity records were reported (as detailed in the Methods section). For 518 of the 1429 approved drugs taken from DrugBank, high-confidence activity data released over different years were found in ChEMBL. These 518 drugs provided the basis for our time-dependent promiscuity analysis.

Data inconsistency
For the 518 qualifying drugs, we first compared their target annotations in DrugBank and the total number of targets derived from high-confidence activity records in ChEMBL. As reported in Figure 2a, most of the drugs had different numbers of targets in DrugBank and ChEMBL. [Figure 2b-Figure 2d: mazindol (3), terazosin (7), imatinib (24).]

Furthermore, from DrugBank 4.1, 1429 approved drugs were obtained that were annotated with 1657 target proteins corresponding to 10,679 drug-target interactions (Table 1). Thus, there were approximately 100 times more bioactive compounds than approved drugs. However, with 1657 targets, drugs covered a larger target space than bioactive compounds (1376 targets). On average, a bioactive compound was active against ~1.5 targets (219,602 interactions for 143,424 compounds).
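The average promiscuity rates follow directly from the counts reported for Table 1 (interactions divided by compounds or drugs). The short calculation below only restates that arithmetic; the variable names are chosen for illustration.

```python
# Counts as reported in the text for Table 1.
bioactive_compounds = 143_424
compound_target_interactions = 219_602
approved_drugs = 1_429
drug_target_interactions = 10_679

# Average number of targets per bioactive compound and per approved drug.
avg_targets_per_compound = compound_target_interactions / bioactive_compounds
avg_targets_per_drug = drug_target_interactions / approved_drugs

# Ratio of bioactive compounds to approved drugs.
compound_to_drug_ratio = bioactive_compounds / approved_drugs

print(f"{avg_targets_per_compound:.2f} targets per bioactive compound")
print(f"{avg_targets_per_drug:.2f} targets per approved drug")
print(f"{compound_to_drug_ratio:.1f} bioactive compounds per approved drug")
```

Rounded to one decimal place, these ratios correspond to about 1.5 targets per bioactive compound and 7.5 targets per approved drug.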
Differences in promiscuity rates were quantified, as reported in Figure 3a. Among the 486 drugs (~94%) with varying degrees of promiscuity in DrugBank and ChEMBL, 48 and 58 drugs differed by one and two targets, respectively. By contrast, the promiscuity rates of nearly half of the drugs (247; ~48%) varied by more than five targets. Moreover, for the 10 drugs shown in Figure 3b, the promiscuity rates differed by more than 30 targets, which reflected a particularly high degree of data inconsistency. All of these drugs were annotated with many more targets in DrugBank than targets derived from high-confidence activity records in ChEMBL. The extreme case was olanzapine, the promiscuity rate of which differed by 47 targets between the two databases.
In addition to comparing the number of target annotations, the activity profiles of drugs were further examined to determine the consistency of the annotations. As reported in Figure 4, 175 drugs (~34%) had non-overlapping sets of targets in these two databases, which was another surprising finding. The remaining 343 drugs had overlapping yet distinct target sets. However, the majority of these drugs shared only one or two targets, reflecting substantial discrepancies between target annotations.
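The two comparisons described above, the difference in target counts and the overlap of target sets, reduce to simple set operations once targets are expressed with common identifiers. The following sketch assumes targets are given as UniProt accession strings; the function name and the example identifiers are hypothetical placeholders, not data from the study.

```python
def compare_target_sets(drugbank_targets, chembl_targets):
    """Compare the target annotations of one drug in two databases."""
    drugbank_targets = set(drugbank_targets)
    chembl_targets = set(chembl_targets)
    shared = drugbank_targets & chembl_targets
    return {
        # Difference in promiscuity rates between the two sources.
        "count_difference": abs(len(drugbank_targets) - len(chembl_targets)),
        # Targets annotated in both sources.
        "shared": sorted(shared),
        # False corresponds to non-overlapping target sets.
        "overlapping": bool(shared),
    }


# Placeholder identifiers for illustration only.
overlapping_case = compare_target_sets(
    {"ID_A", "ID_B", "ID_C", "ID_D"}, {"ID_A", "ID_E"}
)
disjoint_case = compare_target_sets({"ID_A", "ID_B"}, {"ID_C"})
```

In this sketch, a drug counts as overlapping as soon as one target is shared, even if most of its annotations differ, which matches the distinction between non-overlapping and overlapping yet distinct target sets.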
For the study of changes in drug promiscuity over time, accessing original activity records and their release dates was an essential requirement, as rationalized above. Such information is not available in DrugBank.

Changes in drug promiscuity over time
For individual time intervals, the distribution of drug promiscuity rates was determined, as reported in Figure 6a. The box plots reveal an increase in drug promiscuity rates over time, with a maximal rate of six targets per drug in 2000 and 24 targets per drug in interval >2012. However, median promiscuity rates only slightly increased from one (until 2005) to two (beginning in 2006) targets per drug. Average promiscuity rates, shown in Figure 6b, slightly but steadily increased over time from 1.5 to 3.2 targets per drug. The larger relative increase of average than median promiscuity rates indicated that the average values were influenced by small numbers of drugs with large numbers of targets, i.e., a small subset of highly promiscuous drugs, consistent with earlier observations 11 . On the basis of median values, detectable increases in drug promiscuity over time were limited.
Changes in promiscuity over time were also monitored for individual drugs. For each drug, the increase in the cumulative promiscuity rates from its first to its most recent activity records was determined (for the hypothetical example in Figure 1, the increase in promiscuity rates is 2). For the 518 drugs, increases are reported in Table 2. Surprisingly, for 282 drugs (~54%), no increase in promiscuity was detected on the basis of high-confidence activity records. This indicated that the majority of these drugs did not receive additional high-confidence activity annotations since their first records were released. The majority of these 282 drugs were annotated with only a single target. Exemplary drugs with constant promiscuity rates are shown in Figure 7. For the remaining 236 drugs, increasing numbers of targets were detected. However, in most cases, the increase in target numbers was limited, i.e., the promiscuity rates of 197 drugs increased by one to five targets (Table 2). There were only 14 drugs with an increase in promiscuity rates by 10 or more targets.
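Two points from this analysis can be illustrated with a small sketch: the per-drug increase in promiscuity derived from a cumulative profile, and the sensitivity of the average (but not the median) to a few highly promiscuous drugs. The function name and the example values are hypothetical and are not taken from the data sets.

```python
from statistics import mean, median


def promiscuity_increase(profile):
    """Increase from the first to the most recent cumulative target count.

    profile: cumulative number of targets per time interval; intervals
    before the first activity record have a count of 0 and are skipped.
    """
    observed = [count for count in profile if count > 0]
    return observed[-1] - observed[0] if observed else 0


# Hypothetical profile: one target at first report, three targets at the end.
increase = promiscuity_increase([0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3])

# Hypothetical promiscuity rates: most drugs have one or two targets,
# one drug has many.
rates = [1, 1, 1, 2, 2, 24]
average_rate = mean(rates)
median_rate = median(rates)
```

With these illustrative values the single drug with 24 targets raises the average well above the median, which is the effect described for the small subset of highly promiscuous drugs.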
The five drugs with the largest increases in promiscuity rates are shown in Figure 8.

Figure 7. Drugs with constant promiscuity over time. Shown are 12 exemplary drugs having a constant promiscuity rate on the basis of high-confidence activity data. For each drug, the year of its first activity report and the number of targets it was active against are given. For example, brimonidine was first reported to be active against a single target in 1997.

For each drug, the number of target families was also determined and followed over time. Table 3 reports the number of drugs with increasing target family annotations. For the majority of drugs, the number of target families increased by one or two. For the top five drugs with the largest changes in promiscuity (Figure 8), target family profiles are provided in Table 4. The first activity records of each of these five drugs belonged to only one target family, including the protein kinase family, a GPCR subfamily, and transporter subfamilies. By the time of their most recent activity records, the number of target families had increased by three to nine, spanning a wide range of related or unrelated target families. This indicated that these drugs might have been tested against a large panel of targets over time and that a number of activities have been confirmed at a high level of confidence. For 47 drugs, the number of target families remained constant.
Drug promiscuity on the basis of low-confidence data sets
Two compound sets with lower activity data confidence were also assembled from ChEMBL, as described above. The composition of these sets is summarized in Table 5. Low-confidence set 1, in which the types of activity measurements were not specified, contained a total of 605,206 compounds active against 2144 targets, yielding more than 2,600,000 interactions. Low-confidence set 2, in which, in addition, the confidence level of activity was undefined, consisted of a larger number of 936,924 compounds active against 3934 targets, yielding more than 6,000,000 interactions. All 518 drugs were mapped to these two low-confidence data sets. The cumulative distribution of these drugs over time is reported in Figure 9a. The number of drugs with activity annotations in 2000 increased from 78 (high-confidence set) to 194 (low-confidence set 1) and 335 (low-confidence set 2). On average, ~26 and ~15 drugs became available during each year for low-confidence sets 1 and 2, respectively. Figure 9b compares the distribution of average drug promiscuity rates for the three data sets over time. In contrast to the high-confidence data set, in which drug promiscuity only slightly increased over the years, the average promiscuity rates of drugs in both low-confidence sets were higher and significantly increased. In low-confidence set 2, the average promiscuity rate was 6.3 targets per drug in 2000 and further increased to 28.2 targets (>2012). Thus, by reducing the stringency of selection criteria for activity records, high average promiscuity rates were obtained. The large increases in average promiscuity rates seen in Figure 9b, which ultimately resulted in 18 (low-confidence set 1) or nearly 30 (set 2) targets per drug, are most likely artificial in nature. The comparison reveals how the choice of different activity data selection criteria, or the lack of well-defined criteria, might bias promiscuity analysis.
It should be noted that only K i and IC 50 values were considered here, although all other types of potency annotations were included in low-confidence sets 1 and 2. In general, the distribution of the high-confidence set was comparable to that of low-confidence set 1. The majority of negative logarithmic potency values ranged from ~4.5 (i.e., ~32 µM) to 7.0 (i.e., 100 nM). By contrast, the majority of potency values in low-confidence set 2 were confined to a narrow range.
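The correspondence between negative logarithmic potency values and concentrations quoted above can be checked with a short conversion. The sketch assumes potency values are expressed as the negative decadic logarithm of a molar concentration; the function names are chosen for illustration.

```python
import math


def p_value_to_molar(p_value):
    """Convert a negative logarithmic potency value to a molar concentration."""
    return 10 ** (-p_value)


def molar_to_p_value(concentration_molar):
    """Convert a molar concentration to a negative logarithmic potency value."""
    return -math.log10(concentration_molar)


lower_bound_uM = p_value_to_molar(4.5) * 1e6  # micromolar
upper_bound_nM = p_value_to_molar(7.0) * 1e9  # nanomolar

print(f"4.5 corresponds to {lower_bound_uM:.1f} µM")
print(f"7.0 corresponds to {upper_bound_nM:.0f} nM")
```

A value of 4.5 thus corresponds to roughly 32 µM and a value of 7.0 to 100 nM, in agreement with the range stated in the text.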

PAINS substructures
Compounds that are reactive or cause other non-specific effects in a variety of assays are typically false positives and have been termed pan assay interference compounds (PAINS) 16 . Baell and Holloway described a set of 26 substructures that are indicative of PAINS liability 16 . This set of substructures was utilized as a filter to identify drugs that contain PAINS substructures in our three data sets with varying activity confidence levels. A total of 23 drugs (i.e., ~4.4%) were found to contain PAINS substructures. Figure 11 reports the average promiscuity rates of PAINS-positive drugs compared to all available drugs over time. It can be seen that drugs with potential PAINS liability in the two low-confidence sets displayed much higher degrees of promiscuity than the global rates. For example, in low-confidence set 2, the latest average promiscuity rate (i.e., >2012) was ~67.0 for drugs containing PAINS substructures, compared to ~28.2 for all drugs.

Imatinib represented a striking example of the presence of unreliable target annotations under non-stringent data selection criteria (Figure 9c). In both low-confidence sets, and especially in set 2, dramatic increases were observed between 2005 and 2008, ultimately leading to 406 and 689 targets for imatinib in sets 1 and 2, respectively (hence exceeding the total number of targets in the human kinome). By contrast, on the basis of high-confidence activity data, the final (>2012) promiscuity rate of imatinib was 24.
In addition, the distributions of potency values were compared across different data sets, as reported in Figure 10.

Table 4. For the top five drugs with the largest changes in promiscuity over time, the total number of targets and families is reported. The family with the first target annotation of a drug is shown in bold. Target family abbreviation: MAPEG, membrane-associated proteins in eicosanoid and glutathione metabolism.

[Figure 9: (a) number of drugs and (b) average number of targets per drug, by year, for the high-confidence set, low-confidence set 1, and low-confidence set 2.]

By contrast to the ~67.0 targets per drug reached in low-confidence set 2, PAINS-positive drugs in the high-confidence set displayed a degree of promiscuity comparable to the global rate. These findings suggest that PAINS-related effects might also be controlled by applying rigorous data confidence criteria.

Conclusions
The analysis reported herein was designed to monitor drug promiscuity over time through computational data mining. It was facilitated by systematically collecting available activity records with release dates for approved drugs from the ChEMBL database. For more than 500 drugs, it was possible to assess promiscuity rates over a time course. Current promiscuity rates derived from high-confidence ChEMBL activity records are typically much lower than those calculated from target annotations available in DrugBank, which merits further consideration. Data selection criteria for the assignment of drug targets might at least in part be responsible for the observed differences. On the basis of high-confidence activity data, an increase in the average drug promiscuity rates from only 1.5 to 3.2 targets per drug was observed. The magnitude of average promiscuity rates was influenced by a small subset of highly promiscuous drugs. Thus, increases in average drug promiscuity over time were generally small. However, they frequently involved targets from at least two families. By contrast, for low-confidence data sets, calculated promiscuity rates were much higher and dramatic increases in apparent drug promiscuity were observed over the years. From our point of view, such trends are unreliable. These observations further emphasize the need for well-defined and stringent data selection criteria for promiscuity analysis. Taken together, the findings reported herein reveal a small to moderate increase in detectable drug promiscuity over time while the volumes of compound activity data rapidly grow.

Data availability
The high-confidence and the two low-confidence drug data sets are made available in ZENODO. For each drug in each set, the ChEMBL activity records are provided for individual time intervals.
ZENODO: Drug activity data, doi: 10.5281/zenodo.11576 17

Author contributions
JB conceived the study, YH planned and performed the analysis, YH and JB wrote the manuscript.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Figure 11. Drugs containing PAINS substructures. Shown is the distribution of average promiscuity rates over time for all drugs (solid lines) and for drugs that contain PAINS substructures (dashed lines) in three data sets with varying activity confidence levels, respectively.