Dimension reduction of Malaria Box data allows efficient compound prioritization

Background: New anti-malarial drugs are needed to meet the challenge of artemisinin resistance and to achieve malaria elimination and eradication. The new anti-malarial compounds are expected to have many desirable properties, such as activity against multiple stages of Plasmodium, low host cytotoxicity, and low propensity for resistance development, but whether and how these properties might be linked to each other is not clear. A better understanding of the relationship between activities of compounds against different stages of Plasmodium could help in the development of strategies to prioritize compounds with maximum potential for further development. Methods: We utilized the large amount of data that has recently been generated on 400 anti-malarial Malaria Box compounds and performed statistical analyses, such as rank correlation, hierarchical clustering, and principal-component analyses, to test associations between activities against different stages of Plasmodium, other pathogens, and human cells. Results: We found significant positive correlations between the activities of compounds against different stages of Plasmodium. Our results also show toxicity associated with assays conducted at higher compound concentrations. Principal-component analyses (PCA) of the data allowed differentiation of Plasmodium-specific activity from general toxicity and predicted success in in vitro evolution of resistance. We found that a single principal component can capture most of the desirable properties of Malaria Box compounds and can be used to rank compounds from most desirable to least desirable activity profile. Conclusions: Here, we provide a systematic strategy to prioritize Malaria Box compounds for further development. This approach may be applied for prioritization of anti-malarial compounds in general.

Introduction
Malaria killed about half a million people in the year 2015, and 70% were children under the age of five [1]. The emergence and spread of resistance towards frontline anti-malarial drugs in South-East Asia has created an urgent need to discover new drugs. In addition, new drugs are needed to meet the objective of malaria elimination and global eradication, for which the currently available drugs are not adequate [2]. The desirable characteristics of new clinical candidates, also called the Target Compound Profile (TCP), include high potency and fast killing of the asexual erythrocytic stage for quick relief of symptoms, long plasma half-life to reduce treatment duration, activity against the sexual stages to prevent transmission, activity against the liver stage to avoid relapse and for prophylactic use, activity against multiple species of Plasmodium, and reduced propensity for the development of resistance [3]. New anti-malarial drugs must also be safe for mass administration, and for children and pregnant women, who are most vulnerable to malaria [3].
It is currently not well understood how TCP properties are related to each other. This makes it difficult to assess whether it would be feasible for a single compound to have all TCP properties and what strategies could be adopted to find such candidates. With the discovery of thousands of active compounds from the high-throughput assays against the erythrocytic stage of P. falciparum [4][5][6][7][8], it has become imperative to find a prioritization strategy that can identify the most promising candidates for further development. For a subset of anti-malarial compounds identified from high-throughput screens, the so-called "Malaria Box", many of the TCP properties have been assessed. The Malaria Box is a set of 400 compounds selected based on their potent activity against the erythrocytic stage of P. falciparum, chemical diversity and commercial availability [9]. These compounds were made available free of cost to researchers, thus catalysing a number of studies, including the screening of these compounds against multiple Plasmodium stages, eukaryotic pathogens and human cells [10]. Some of these compounds have also been tested for their propensity for resistance generation [11]. Here, we utilized the large amount of data generated on Malaria Box compounds and found significant associations between different TCP properties. Based on these observations, we propose a prioritization strategy for anti-malarial compounds for further development.

Methods
The screening data on Malaria Box compounds were obtained from Van Voorhis et al. [10], who compiled the previously published data on Malaria Box compounds (55 assays), as well as their own data (236 assays). We rank transformed all assay values, such that higher values represent higher inhibition.
In case multiple assays were available for a given stage or concentration, their median values were taken: there were nine assays reporting EC50 values against the asexual stage of P. falciparum, one assay against the asexual stage at high compound concentration (10 µM), five gametocytocidal assays conducted at 0.5-1 µM compound concentrations, ten gametocytocidal assays conducted at 2.5-5 µM compound concentrations and six gametocytocidal assays conducted at 10-12.5 µM compound concentrations. There was one assay each at lower and higher compound concentrations against liver (5 µM and 50 µM, respectively) and ookinete stages (1 µM and 10 µM, respectively). Values were also similarly combined for parasites with multiple assays, such as Babesia sp. and Mycobacterium tuberculosis.
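The rank transformation and median combination described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used for the analysis; the function names, the example values, and the choice to rank each assay first and then take the median of the ranks are assumptions, since the order of the two steps and the handling of ties and missing values are not specified here.

```python
from statistics import median

def rank_transform(values, higher_is_more_active=True):
    """Average ranks, oriented so that a higher rank means higher inhibition."""
    # For EC50-type readouts a lower value is more potent, so flip the sign first.
    oriented = [v if higher_is_more_active else -v for v in values]
    order = sorted(range(len(oriented)), key=lambda i: oriented[i])
    ranks = [0.0] * len(oriented)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values; ties share the mean rank.
        while j + 1 < len(order) and oriented[order[j + 1]] == oriented[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def combine_assays(ranked_assays):
    """Per-compound median across assays of the same stage and concentration."""
    return [median(per_compound) for per_compound in zip(*ranked_assays)]

# Made-up EC50 values (in µM) for four hypothetical compounds in two assays
ec50_a = [0.2, 1.5, 0.7, 3.0]
ec50_b = [0.3, 0.9, 0.9, 2.0]
ranked = [rank_transform(a, higher_is_more_active=False) for a in (ec50_a, ec50_b)]
combined = combine_assays(ranked)
print(combined)
```

The first compound has the lowest EC50 in both assays and therefore receives the highest combined value, matching the convention that higher values represent higher inhibition.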
The data on the in vitro resistance evolution of Malaria Box compounds were obtained from Corey et al. [11].
All statistical analyses were performed in the R software v3.3.1 (https://www.r-project.org/). The R commands hclust and prcomp were used for hierarchical clustering and PCA, respectively. PCA was performed on the activity data against different Plasmodium stages and human cells. Rank correlation values were used to create the distance matrix for the hierarchical clustering.
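As an illustration of how rank correlations can be turned into a distance matrix for clustering, the sketch below computes Spearman correlations between assays and converts them to distances as 1 minus the correlation. The conversion formula is an assumption (the exact transformation is not stated here), the assay names and values are made up, and the code is plain Python rather than the R workflow; in R, a matrix of this kind would be passed through as.dist to hclust.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_distance(assays):
    """Pairwise distance 1 - rho between assays (assumed transformation)."""
    names = list(assays)
    return {(a, b): 1.0 - spearman(assays[a], assays[b]) for a in names for b in names}

# Made-up inhibition values for five hypothetical compounds
assays = {
    "asexual_low": [1, 2, 3, 4, 5],
    "gametocyte_low": [2, 1, 4, 3, 5],
    "liver": [5, 4, 3, 2, 1],
}
dist = correlation_distance(assays)
print(round(dist[("asexual_low", "gametocyte_low")], 3))
```

With this choice, assays that rank compounds similarly end up close together and assays that rank them in opposite order end up farthest apart, so clusters group assays with similar activity profiles.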

Relationship between activity of Malaria Box compounds in different assays
Van Voorhis et al. have recently reported their large-scale screening data on Malaria Box compounds, which were compiled along with the previously published data [10]. We first reduced the dimensionality of these data by combining variables that describe activity against the same Plasmodium stage and pathogen. Assays conducted at higher compound concentrations may provide different results from those performed at lower concentrations, and thus assays conducted at different concentrations were combined separately.
Figure 1 provides an overview of the relationship between different properties of Malaria Box compounds in the form of a correlation matrix. Multiple observations can be made from this matrix. There is a moderate but significant correlation between activity against the asexual stage and the gametocyte stage of P. falciparum (Spearman rank correlation 0.43, between EC50 values and % inhibition of gametocytes at 1 µM compound concentration). The correlation between EC50 values against the asexual stage and gametocytocidal activity is lower when gametocytocidal activity is screened at higher compound concentrations (Spearman rank correlation 0.17 at 10 µM). The gametocytocidal activity at higher concentrations shows a higher correlation with inhibitory activity against different pathogens, including M. tuberculosis, and against human fibroblast cells (Figure 1). Assays conducted at higher concentrations against asexual, liver and ookinete stages also show a higher correlation with toxicity against human cells and other cell types (Figure 1). These observations suggest that assays conducted at high compound concentration show general toxicity against a wide variety of cells, including human cells; thus hits identified from these assays should be used with caution.
To further understand the relationship between activities against different Plasmodium stages, we performed hierarchical clustering of the data. Three major clusters were evident (Figure 2). Cluster 1 consists of asexual and gametocyte assays conducted at high compound concentrations. Cluster 2 consists of asexual and gametocyte assays conducted at low compound concentrations. Cluster 3 consists of assays conducted against liver and ookinete stages. Separate clustering of assays against asexual and gametocyte stages at different compound concentrations again suggests general toxicity of assays conducted at higher compound concentrations. There are two possible reasons why liver and ookinete stages cluster together. These two stages may be physiologically more similar to each other, or the clustering may reflect the fact that these assays were conducted against P. berghei, whereas the other assays were conducted against P. falciparum.

PCA allows differentiation of general and specific toxicity
Given the possible confounding roles of compound concentration and Plasmodium species used in the screening, the prioritization of compounds that have pan-stage activity, but low host cytotoxicity, becomes difficult. We thus tested whether principal-component analysis (PCA) may be utilized to differentiate Plasmodium-specific activity from general toxicity. PCA of the data from different Plasmodium stages and human cells led to the identification of principal components, which showed different properties with respect to general and specific activity. PC1 showed high correlation with assays conducted at higher compound concentrations and against a variety of cell types, including human cells (Figure 3), suggesting that PC1 is related to general toxicity. PC3, on the other hand, showed higher correlations with assays conducted at low compound concentrations, but negative or lower correlation with assays conducted at high compound concentrations in Plasmodium, against different pathogens and human cells (Figure 3), suggesting that PC3 is related to specific activity against Plasmodium across different stages. PC2 showed high positive correlation with the liver and ookinete stage assays, but showed negative correlations with asexual and gametocyte stage assays, suggesting that this component reflects activity against these two stages or against P. berghei, in which these assays were performed.
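For readers less familiar with PCA, the quantities discussed above can be written out explicitly. The following is a sketch of standard covariance-based PCA, which is what R's prcomp computes under its default settings (centering, no scaling); whether scaling was applied in this analysis is not stated, and the symbols X, V, Λ and T are introduced here only for illustration.

```latex
% X: n compounds (rows) by p assays (columns), rank-transformed inhibition values
\begin{align}
\tilde{X} &= X - \mathbf{1}\,\bar{x}^{\top}
  && \text{(centre each assay column)} \\
S &= \frac{1}{n-1}\,\tilde{X}^{\top}\tilde{X} = V \Lambda V^{\top}
  && \text{(eigendecomposition of the covariance matrix)} \\
T &= \tilde{X} V
  && \text{(scores: column $k$ holds the PC$k$ value of each compound)} \\
\text{explained}_k &= \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
  && \text{(fraction of variance captured by PC$k$)} \\
r_{jk} &= \frac{v_{jk}\,\sqrt{\lambda_k}}{\sqrt{S_{jj}}}
  && \text{(Pearson correlation of assay $j$ with PC$k$)}
\end{align}
```

Under this reading, a compound's PC3 value is its entry in the third column of T, and the sign pattern of the correlations r_{jk} across assays is what distinguishes a "general toxicity" component from a "Plasmodium-specific" one. If Figure 3 reports rank correlations rather than Pearson correlations, the last line would be replaced by the Spearman correlation between each assay and the corresponding score column.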

General toxicity predicts in vitro resistance evolution
In vitro resistance evolution has recently been attempted against 30 Malaria Box compounds with three independent lines for each compound [11]. We next tested whether the Plasmodium-specific activity or general toxicity estimated from the principal components might predict in vitro resistance evolution. Compounds for which resistance could not be developed showed significantly higher PC1 values (Figure 4A). These compounds also showed higher human toxicity (Figure 4B) and enrichment of probe-like compounds, which have chemical properties associated with higher nonspecific activity [9] (Figure 4C). These results suggest that general toxicity of compounds may lead to lower success in in vitro resistance evolution. On the other hand, high PC3 values were associated with a higher likelihood of in vitro resistance generation (Wilcoxon test p = 0.02, not shown).
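The group comparison above rests on a rank-based two-sample test. As a hedged sketch, the code below computes only the Mann-Whitney U statistic that underlies the Wilcoxon rank-sum test, together with the fraction of cross-group pairs in which the first group has the higher value; it does not compute a p-value (in R that would come from wilcox.test). The function name and all PC1 values are made up for illustration and are not taken from the data.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: number of (a, b) pairs with a > b; ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Made-up PC1 values for two hypothetical groups of compounds
pc1_no_resistance = [2.1, 1.4, 0.9, 1.8]   # resistance could not be evolved
pc1_resistance = [0.3, -0.5, 1.0, -1.2]    # resistance was evolved

u = mann_whitney_u(pc1_no_resistance, pc1_resistance)
# Share of cross-group pairs where the "no resistance" compound has the higher PC1
share = u / (len(pc1_no_resistance) * len(pc1_resistance))
print(u, share)
```

A share near 0.5 would indicate no separation between the groups, while values near 1 indicate that compounds without evolved resistance almost always have the higher PC1 value.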

Prioritization of Malaria Box compounds
In general, our results suggest that compounds that show high PC3 values should be prioritized for further development, including target identification by in vitro resistance evolution. Table 1 lists the top 20 Malaria Box compounds with the highest PC3 values. These compounds show high activity against multiple stages at a low concentration, but low activity against human cells. In total, 11 of these compounds show favourable oral bioavailability values. Some of these are active against other pathogens (Table 1).
The values of three principal components for all Malaria Box compounds are available in Dataset 1 [14].
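Once per-compound component values are available, the prioritization step reduces to sorting. The sketch below shows this in Python under stated assumptions: the function name, the dictionary layout, the compound identifiers and the scores are all hypothetical and do not reflect the actual layout or contents of Dataset 1.

```python
def top_by_component(scores, component="PC3", n=20):
    """Return compound IDs sorted by the chosen component, highest first."""
    ranked = sorted(scores, key=lambda cid: scores[cid][component], reverse=True)
    return ranked[:n]

# Hypothetical compound IDs with made-up component values
scores = {
    "cmpd_A": {"PC1": 2.0, "PC3": 0.4},
    "cmpd_B": {"PC1": -0.5, "PC3": 1.9},
    "cmpd_C": {"PC1": 0.1, "PC3": -0.7},
}
shortlist = top_by_component(scores, component="PC3", n=2)
print(shortlist)
```

The same function could be pointed at PC1 to list the compounds with the strongest general-toxicity signal, which may be useful as an exclusion list.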
Figure 3. Principal components (PC) differentiate general toxicity from Plasmodium-specific activity. The three principal components PC1, PC2 and PC3 explained 30%, 16% and 13% of the variation in the data, respectively. PC1 showed positive correlation across different assays, suggesting that it reflects general toxicity. PC3 showed higher positive correlation only with assays conducted at lower compound concentration, but lower or negative correlation with assays conducted at higher compound concentration, other parasites and human cells, suggesting that it reflects pan-stage specific activity against Plasmodium.

Discussion
The wide availability of the Malaria Box has catalysed a number of studies on these compounds [10]. Prioritization of compounds based on a large number of variables is not straightforward. Here, we analysed these data and found that a single variable (PC3) can capture most of the desirable compound properties: activity against multiple Plasmodium stages and low host cytotoxicity, thus greatly simplifying the task of compound prioritization. Our analyses suggest that hits from screening at high compound concentrations can reflect general toxicity, and thus such screens should be avoided. The idea that hits identified from multiple assays can be treated with more confidence [10] therefore needs to be considered carefully when hits are identified from high-concentration assays. The consensus approach might lead to the selection of compounds with general toxicity.
We found significant correlation between activity against the asexual stage and the gametocyte stage of P. falciparum (Spearman rank correlation 0.43), which suggests that it might be easier to find compounds that have activity against both these stages, even though the two stages have different growth properties. The correlations of the asexual stage with the liver and ookinete stages were low (Figure 1 and Figure 2). This could reflect different physiological states of liver and ookinete stages from asexual and gametocyte stages, but it might also reflect that liver and ookinete stage assays were performed in P. berghei, rather than P. falciparum. Thus, the development of higher throughput liver and ookinete stage assays in P. falciparum could be valuable. It is important to note that the correlation values that we report should be considered an underestimate, as inhibition values for assays against the same stage show large variability, e.g. the median rank correlation among nine EC50 values against the asexual stage of P. falciparum was 0.51. The possible reasons for this variability have previously been discussed [10].

Table 1. Top Malaria Box compounds with the highest PC3 values. These compounds show high activity against multiple stages at a low concentration, but low activity against human cells. The mouse oral bioavailability was obtained by measuring the plasma concentration of the compounds with a single high oral dose (140 μM/kg) [10]. Compounds with favourable plasma concentration (plasma Cmax > 1 μg/ml) are indicated. Activity of compounds against other parasites is also indicated (column headers: Leishmania donovani, Trypanosoma cruzi, Brugia malayi, Schistosoma mansoni). The oral bioavailability data and compound activity data against other parasites were obtained from Van Voorhis et al. [10]. The PC3 values for all Malaria Box compounds are available in Dataset 1 [14].

The difficulty in the evolution of in vitro resistance is considered a very desirable property of a compound [12], given that a number of anti-malarial drugs are becoming less effective because of resistance generation [13]. Our observations suggest caution in interpreting the results of in vitro resistance evolution experiments. The failure to obtain resistance in vitro could be because of general toxicity of the compound on the erythrocyte hosts. Thus we suggest that the host toxicity of compounds should be thoroughly evaluated before conducting the labour-intensive in vitro resistance evolution experiments.
While we have prioritized compounds according to their pan-stage activity and low human toxicity, we would like to stress that compounds that show activity across pathogens and human cells may also be potential leads, if their toxicity could be managed. One possibility to reduce the toxicity of a compound is to identify its target in the parasite and its human ortholog, and utilize the three-dimensional structures of the compound with the target to modify the compound to increase selectivity. However, target identification of these compounds might be more difficult using in vitro resistance development.

Open Peer Review
The article presents an analysis of the available biological data on the 400 compounds of the Malaria Box set, trying to understand whether there is any correlation between favorable parasitological properties and undesirable unspecific or toxicity aspects. Most of the data have been retrieved from Van Voorhis et al., 2016. Some of these data have been produced by GSK and this reviewer is one of the co-authors. Statistical analyses reveal some correlations that would allow prioritization of phenotypic hits based on the most desirable antimalarial compound profiles.
The reviewer finds the conclusions very interesting, especially those related to the potentially misleading information provided by malaria assays carried out at high compound concentration. Principal Component Analysis shows a strong correlation between toxic effects in human cells and antimalarial activities produced at the highest compound concentration, suggesting that these antimalarial effects are not specific to parasites. Identification of a single principal component capturing the most desirable properties of Malaria Box compounds is an important result and could be the basis to rank compounds in larger datasets.
There are some additional points that could help to improve the current version or be considered for potential follow-ups.
The author describes the most desirable properties for novel antimalarial compounds but comments that how these properties are linked to each other is not clear. However, most of them (speed of action, activity against different stages of Plasmodium, propensity to select for resistance, etc.) are strongly influenced by the antimalarial target. So, mode of action (MoA) should be a clear link among the different properties, and compounds displaying a similar MoA should show similar properties. The reviewer agrees that pharmacokinetic aspects of antimalarial compounds (e.g. long half-life) are structure based and should be much less related to antimalarial MoA.
Although there is still a lack of understanding of the antimalarial targets of phenotypic hits, there is already some information in this regard. I suggest that the author include in the analysis current knowledge on mode of action to identify those targets providing the most desirable antimalarial properties. This would be especially interesting if the analysis is expanded to larger datasets.
Conclusions based on resistance evolution should be put in the context of the small number of compounds with available results. Larger sets of data would be needed to confirm this trend. Nevertheless, the reviewer agrees with poly-pharmacology and general toxicity as two of the main properties that negatively influence selection of resistance.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Competing Interests: The reviewer is one of the co-authors of Van Voorhis et al., 2016. That paper contains most of the biological data used in the current analysis. However, the reviewer does not feel this to have an influence on the output of this review.
The initial correlations show that there are some links between inhibition of the different life cycle stages of malaria, and certainly these are strong correlations compared to, say, the correlation between killing Plasmodium and killing distantly related worm pathogens.
The most useful insight from the paper is in Figure 4, where it is clear that compounds where no resistance could be generated have higher PC1 values.
Compounds where resistance could not be generated are more significantly represented by fibroblast inhibition at <10 µM. Given that compounds where no resistance can be made are deemed by some to be part of higher value scaffolds (we all want drugs which don't generate resistance), this observation warrants more exploration.
Some comments for the author, which would add value to this study (or could form the basis of the next study):
The PCA based on cytotoxicity is built from the human fibroblast data. This is a good start, but the real value would be to use all the NCI59 cell data from the US National Cancer Institute and to see whether there is anything else that emerges.
As a set of compounds, the Malaria Box was the fruit of a very small investment, and so the compounds were often selected based on availability and cost. It would be good to run the same PCA on the Pathogen Box structures. For the Pathogen Box there was flexibility to make any compound that was required for the collection, and so the quality of the structures chosen was arguably higher.
Two other types of compound data could be analysed. First, it would be interesting to see the same analysis used on the TCAMS set: does that help in some ways to reprioritize these structures?
Second, what happens with PC1 if you look at say the 8000 compounds taken in development by

Figure 1. Correlations between inhibition values across different assays. Spearman rank correlations are shown between assays whose values were rank transformed, such that higher values indicate higher inhibition. Gray boxes indicate p values > 0.05. The assays performed at higher concentrations in Plasmodium show higher positive correlations across different assays, including activity against human cells, suggesting that assays conducted at high compound concentration show general toxicity.

Figure 2. Hierarchical clustering of the assay data against different Plasmodium stages. Three major clusters are evident that correspond to activity against P. falciparum at high concentrations (Cluster 1, leftmost), activity against P. falciparum at low concentrations (Cluster 2, middle), and possibly activity against P. berghei (Cluster 3, rightmost). Rank correlation values were used to create the distance matrix for the clustering. The color key shows the inhibitory activity of the compounds, with a higher number representing a higher activity.

Figure 4. Higher general toxicity of compounds against which in vitro resistance development was not successful. In vitro resistance evolution was attempted for 30 Malaria Box compounds and was successful for 13 compounds [11]. Compounds for which resistance evolution was not successful showed (A) higher PC1 values (Wilcoxon p = 0.004), (B) lower EC50 against human fibroblast cells (Wilcoxon p = 0.130), and (C) a higher proportion of probe-like compounds, as classified by Medicines for Malaria Venture [9] (Fisher p = 0.100).
Tres Cantos Medicines Development Campus Unit, Malaria Unit, GlaxoSmithKline, Tres Cantos, Spain
Medicines for Malaria Venture (MMV), Geneva, Switzerland
This is an analysis of the Malaria Box set, and an attempt to understand whether there are any correlations between activity and safety signals or resistance development using Principal Component Analyses. The author defines a single principal component which can be used to capture most of the desirable compounds of the Malaria Box. The data used are from the Malaria Box and the recent summary in Van Voorhis et al., 2016 (of which this reviewer is one of the 150 co-authors).