Multi-parametric characterization of drug effects on cells

We present here a novel multi-parametric approach for the characterization of multiple cellular features, using images acquired by high-throughput and high-definition light microscopy. We specifically used this approach for deep and unbiased analysis of the effects of a drug library on five cultured cell lines. The presented method enables the acquisition and analysis of millions of images, of treated and control cells, followed by an automated identification of drugs inducing strong responses, evaluating the median effect concentrations and those cellular properties that are most highly affected by the drug. The tools described here provide standardized quantification of multiple attributes for systems level dissection of complex functions in normal and diseased cells, using multiple perturbations. Such analysis of cells, derived from pathological samples, may help in the diagnosis and follow-up of treatment in patients.


Introduction
Advanced precision medicine enables the use of genetic and proteomic information for the characterization of disease states, based on the correlation with detailed medical records and specific pathological manifestations [1][2][3][4][5][6][7][8][9][10][11] . Indeed, multi-component "omics" profiling can report a large number of components of the genome, transcriptome, proteome, interactome and metabolome (to name a few), and detect even a small fraction of them, indicating significant changes from normal 12-16 , yet this fraction may not fully overlap with the currently used repertoire of medical manifestations in complex diseases such as heterogeneous cancers [17][18][19][20][21][22][23][24][25][26] . We address here the possibility of using multi-parametric characterizations of cellular features for identifying novel signatures of functional cell states, and offer quantitative diagnostic, as well as mechanistic measures, of disease progression or suppression following therapy.
It is widely recognized that meaningful understanding of disease states needs to be achieved in the context of the whole body physiology and tissue functional morphology, yet celllevel functional abnormalities lie at the basis of many pathologies, and may thus be identified by recordable cellular attributes. Developments in quantitative light microscopy, either in twodimensional or in three-dimensional model settings, has been combined with live cell microscopy and recent adaptation of microfluidics technologies to screening and personalized treatment optimization with primary cells, spheroids, organoids and tissue biopsies [27][28][29][30][31][32][33] . The difficulty in quantitative characterization of multiple cellular features of biological specimens with diverse morphological and molecular properties creates an urgent need for methodologies that can be standardized, yield strong statistical scores and can be automated for effective application in systems-level biomedicine.
The value of multi-parametric analysis is well recognized in flow cytometry 34 , and is mandatory when the strategies aiming at functional screening depart from cell reporters for specific drug-target interactions. The application of multi-parametric analysis to cell screening has become a common place. Perlman et al. 35,36 have designed a titration-invariant similarity score (TISS) based on multi-parametric doze response matrix for each drug, allowing comparison of drug effects to reflect similarity of mechanisms of action. Classification of patterns displayed by tagged proteins in cells 37-43 is a powerful way to sort sub-cellular localizations, but is bound to cellular responses reflected by the labeled epitopes (e.g. nucleus vs. cytoplasm localization). Tanaka et al. 44 based their analysis on multi-parameter evaluations, using Principal Components Analysis (PCA) to reduce the number of independent parameters so that the multi-dimensional state vectors of cells treated with drugs could be displayed in 3D plots to differentiate or correlate effects and infer similar mechanisms. Melnick et al. 45 obtained their multi-parametric data vectors from 35 tyrosine-kinase-activated cellular assays responding to 1400 kinase inhibitors, and used clustering in Euclidian space to sort perturbations similar in action to known drugs. Screens have also been developed to dissect cellular mechanisms 42,46 and identify proteins involved in cellular processes using interference RNA perturbations 43,47 . Rather than prior definition of the type of effect (e.g. cell death) or a marker to a specific function, the analysis developed here quantifies drug effects on cell phenotypic and molecular attributes in high dimensionality space.
In this study we have determined the effects of the NCI COMBO drug collection [48, Plate number 3948] on five cell lines; the well-established cervical carcinoma-derived HeLa cells and four cell lines derived from bladder cancers 49 , and used light microscopy-based screening to record multiple cellular responses to the tested drugs. Parameters such as total protein levels, cytoplasmic-nuclear distributions and cell death were analyzed at low magnifications (X10/0.25 objectives) 50 , while high-definition imaging, providing detailed intracellular data about fine protein localization (e.g. cytoskeletal fibers, subcellular organelle morphology, etc.) used higher power objectives (60x/.95), providing multi-dimensional information needed to link drug responses with molecular mechanisms and cellular functions.
In order to be able to read properties of cells and measure as many aspects of system-level response to the treatments, we developed an analysis platform for multi-parametric characterization of cellular features and unbiased scores, based on Mahalanobis distances, for identifying the potency of drugs producing a wide variety of effects. The multi-parametric score is a single value representing the difference from control cells, that contain, however, the complete information on the individual contributions of each of the measured parameters, enabling us to identify those that are mostly affected by each perturbation. Furthermore, the score allows quantification of time-dependencies, examination of cell-line differential responses, and prediction of expected synergy of drug combinations.

Cells
HeLa, and four cell lines derived from bladder cancers (UMUC3, TCCSUP, HT1376 and RT4 1 ) were obtained from the ATCC. Cell lines were maintained in DMEM supplemented with 10% FBS (Sigma-Aldrich, Rehovot, Israel) and penicillin/ streptomycin antibiotics. Cells were grown in polystyrene petri dishes.

Drugs
Drugs were obtained from the National Cancer Institute (NCI). The COMBO plate number 3948 48 includes 77 compounds, 23 of which are FDA-approved anti-cancer drugs. Mechanisms of action include anti-metabolite activity, tubulin binding, DNA 1 RT4 cells were received to the lab in November 2006 from the ATCC. RT4 cells are much smaller than HeLa and have different morphology. We could clearly see that there was no HeLa contamination in our RT4 cells.

Amendments from Version 1
Replaced additional data DOI that includes figure legends (missing in previous DOI) Any further responses from the reviewers can be found at the end of the article damage, Hsp90 binding, as well as inhibition of topoisomerases, kinases, the proteasome, angiogenesis, ion channels, palmitoylation, phosphodiesterase, cyclooxygenase, and aromatase.

Sample preparation
Cells were suspended and transferred to 384 well plates (thin plastic bottom, Cat #781091 Greiner-Bio One, D-72636 Frickenhausen) using BioMek FX liquid handling robot (Beckman Coulter, Fullerton, CA 92834) at a density of 1000 cells/well and cultured for one day. Drugs were then added at the ten three-fold dilutions, concentration range: (10-5x10 -4 )*GI 50 , (GI 50 is the concentration of the least sensitive line in the NCI60, see Extended data: Table S1 51 ). Following the specified incubation time with the drugs, cells were fixed, labeled with DAPI for nuclei (Life, Molecular Probes D1306), FITC-Phalloidin for F-Actin (Life, Molecular Probes F432) and indirectly immunolabeled for tubulin (Primary Antitubulin antibodies (SIGMA T6199)) and Cy3-labled secondary antibodies (Alexa Fluor546 -Life, Molecular Probes A11030), and washed by the robot. For drug effect measurement, ten 3-fold dilutions in duplicates were arranged in plate rows, starting from 50 times the median effect concentration specified for the COMBO collection. The transfer order and concentrations from the 96-well COMBO drugs plate into six 384-well plates for each of the 5 cell lines are listed in Extended data: Table S1 51 .
A total of 16 binary combinations were selected from 20 drugs that showed cellular effects. Matrices with two-fold dilutions in duplicate with drug concentrations up and down from the median effect concentrations, as listed in Table 1, were prepared for two cell lines (TCCSUP & UMUC3). Four such matrices were arranged in one 384-well plate, and rearranged for presentation, including duplicate averaging, as "virtual plates".  Figure 1 using the equation in the Methods. The table lists D m the column number of half effect for the drug rows marked by arrows in Figure 1. The higher the D m number is, the more effective the drug is at higher dilutions. The corresponding concentration for a specific drug is 10*GI 50 *3^(1-D m /2) (3-fold dilutions in duplicates from initial 50 fold dilution of the COMBO supplied concentration, set as 500 times the drug GI 50 ).

Imaging
Plate scanning and multi-color image acquisition was performed by WiScan Argus (Idea-biomedical, Rehovot 76705, Israel; https://idea-bio.com) a fast, high-resolution screening microscope system [52][53][54] . Images were stored locally during the screen, and transferred to 60 TeraBytes storage [NetApp, Sunnyvale, CA 94089] accessible to the analysis workstations via fast local Ethernet for visualization of raw images, interactive optimization of the analysis strategy and tuning of usercontrolled parameters (see below). The automated analysis pipeline was then run, accumulating analyzed results to a data base, allowing results display and interpretation.

Analysis pipeline
In order to process the images, we used a computer cluster (Sun Microsystems), consisting of AMD dual-core computer nodes running Linux RedHat. The software used was WiSoft Minerva (Idea-biomedical).
The analysis scripts pipeline (Extended data 55 ; scripts can be used with any software) combines throughput with modularity. Analysis of different specimens and diverse assays requires a wide range of alternative algorithms and interactive capability to select between them and optimize user-defined parameters before processing the whole data. To achieve such flexibility, we broke the analysis into modular steps categorized by their functionality (Pre-processing, Segmentation and Quantification see Extended data: Figure S1 51 ). Sequences of the analysis modules can be integrated like in a jigsaw puzzle and looped in cycle on all images acquired in an experiment. While such modular structure typically compromises performance, we implemented fast communication of images and data between modules via Shared Memory, achieving processing time as fast as would have been the processing time of an integrated optimized program. Every image is annotated during the acquisition (time, fluorescent color or transmitted light, position inside well, well in plate, plate in whole experiment), and the analyzed data carries these annotations, keeping track of the experimental organization (metadata). Calculation and display of statistics of the analyzed results is therefore directly linked with the experimental design structure. The "Plate GUI" is used to display plate-wide scores, and to interactively show (by a mouse click on a well) the original images and the corresponding numerical and statistical analyzed data. The access to all levels of the data is necessary for rational mining of the TeraBytes of digital information at all levels: original images, montages of image tiles, segmented objects, quantified parameters and statistical profiles. The scoring algorithm used here is based on Mahalanobis distances in multi-parametric space (see Extended data: Figure S2 51 ). The attributes calculated for each cell are listed in Extended data: Table S2 51 .

Drug combination effect analysis
Combination matrix scores were fitted using Loewe-additivity and Bliss-independence models 56-58 using the following steps: II. Using the parameters s 1,2 for each of the two drugs to "span" the combination effect matrix, f comb , as a function of the two concentrations, D 1,2 , and solving f comb from Loewe additivity equation: III. Using the medial effect parameters D m1,2 ,s 1,2 for each of the two drugs to "span" the combination effect matrix, f comb , assuming Bliss independence: IV. Fitting the four parameters D m1,2 ,s 1,2 using all the values in the combination matrix simultaneously, based on Loewe or Bliss models.

Results
Five established cell lines; four derived from urinary-bladder cancers, and one from cervical carcinoma, were plated in 384-well plates at a density of 1000 cells per well. Following incubation for 24h, the cells were treated with serial dilutions of the COMBO plate drugs 48 (ten, three-fold dilutions, concentration range: (10-5×10 -4 )*GI 50 , where GI 50 is the concentration of the least sensitive line in the NCI60, see Extended data: Table S1 51 ) and further incubated for 18h (see Methods). The cells were then fixed and labeled (see Methods). Images were acquired in these three fluorescent channels (see top panels, Figure 1), and analyzed as described in the Methods and in Extended data: Figure S1 51 . The analysis yielded, for each well, a list of segmented cells, each characterized by a "vector" of quantified features ('attributes').
Whole cell segmentation was commonly achieved by diffuse cytoplasmic staining (e.g. "cell mask", Invitrogen), with enhanced nuclear concentration. Cytoplasmic fluorescence (due to the monomer fraction of tubulin and actin and even cell autofluorescence) plus the definition of the nucleus by DAPI (often excluding cytoskeletal proteins) provided highly reliable  Table 1 for drug order, each plate row is a dilution series in duplicates). Cells were imaged in three fluorescent colors (Red: Acin, Green:Tubulin & Blue:Nucleus, See image examples in top pannels). Individual cells were segmented, and for each cell an array of attributes was quantified. Attributes include cells and nuclei morphological and fluorescence intensities attributes and microtubules and actin fiber attributes (see Methods and Extended data: Table S2). Color coded Mahalanobis Scores (see Extended data: Figure S2) are presented here for each well in six plates for the five cell lines. A number of drugs with strong cellular effects are displayed (red rows), and indicate some cell-line dependence. Robot error effects were minimized in the analysis by rejecting outliers lying outside the robust PCA ellipsoids.
segmentation of individual cells, even in densely-packed cell islands. This approach enabled us to 'free' color channels for labeling cells for additional cellular markers of interest. In addition, the segmentation process allowed us to also calculate the total covered cell area which can report on cell spreading/ proliferation/death during the time of treatment of each of the tested cell lines. Notably, although all lines are of epithelial origin, the RT4 cells (and to some extent HT1376) grow in islands even at low plating density, while TCCSUP and UMUC3 cells grow, primarily as isolated cells, and may be more susceptible to shrinking or contraction. For this reason, the analysis is based primarily on "per-cell" attributes.
The profiling of drug responses based on a single attribute may be misleading, as demonstrated in Figure 2. In order to define an unbiased score that will accumulate druginduced changes from multi-parametric characterization of cell properties, we have to consider three factors: (1) different attributes have different dimensions and scales; (2) the cell-by-cell variability of each attribute can be quite different; (3) it is not easy to select independent attributes. For example, cell content of a specific detected protein (either by expression of fluorescent derivative or by specific immune-labeling) is highly correlated with cell size. While average protein concentration accounts for this dependency, there are less obvious correlations, reflecting regulated cellular properties, which may become relaxed or tightened in response to drug treatments.
PCA is a well-established method for the characterization of multi-dimensional variability and correlations. We have applied PCA to the quantified attributes in the control wells to best fit a multi-dimensional hyper-ellipsoid, where elongated axes indicate correlated attributes. Using this hyper-ellipsoid, we defined a score based on Mahalanobis distances between each treated cell to the control cells (see Methods and Extended data:  Figure S2 51 ). This score balances the scales and variability of the measured attributes and accounts for correlations between them. In addition, the hyper-ellipsoid for each of the treated wells provides a multi-parametric scale for identifying outliers, created by technical artifacts such as bubbles, dead cells, cell clumps etc. Figure 1 shows the multi-parametric Mahalanobis scores for all the COMBO plate drugs applied to the five cell lines tested. The figure compiles 360 Gigabytes of image data (12GB/plate) imaged from 30 multi-well plates. The scores coded in spectral colors are displayed in the plate format that is also used as a "Graphic User Interface" (Plate GUI) showing, upon a mouseclick on a well, montages of the acquired images, outlined cell segments, as well as the full resolution image color components, and a wealth of statistical presentations and cell-by-cell numerical values, facilitating data mining within the large volume of information (see Extended data: Figure S1 51 ).
The scores display differential responses to some of the drugs for some of the cell lines. Table 1 lists the AC 50 values obtained for these drugs, as described in the Methods. It should be noted that defined as a "distance", Mahalanobis scores are always positive. The multi-parametric score is a faithful reporter of changes in any of the measured attributes for all drugs that showed effects. Moreover, once an effect is detected by scoring large Mahalanobis distance to the control ellipsoid, the largest contributions of each of the attributes to the score identify the most significant attribute changes (see examples in Table 2). The plate scores for individual high contributors were qualitatively similar (though not identical) to the multi-parametric Mahalanobis scores, yet the latter present in one picture what can only be exhaustively reviewed by many single attribute scores. Altogether, this offers a fast and systematic method with internal standardization for navigating in multi-dimensional attribute space towards focusing on the cellular phenotype responding to perturbations. Figure 3 depicts another important feature of cellular responses to drug treatments, namely, time dependence. The effect of drugs with specific molecular target is typically documented using reporter cells displaying target activation. However, the medical value of many drugs often stems from indirect effects. Multi-parametric cell-based measurements can probe both direct and indirect effects. Here the fast disruption of microtubules compared to the slower cell cycle arrest (depicted by DAPIlabeled DNA content) can be seen. Fingerprints of drug effects, based on an array of features or time-dependent of a single feature, suffer from different scales and variabilities of the measured features. The Mahalanobis-scaled fingerprints can include many features at several times, and present more balanced measures for analyses such as multiparametric similarity of drug effects.
In conclusion, multi-parametric analysis of cellular responses to drugs, visualized by high-definition light microscopy screening, allows departing from target-guided drug screening and build essays more sensitive to cell-level functional effects. In addition, since systematic screen of all drug combinations is not practical, selection based on multi-parametric analysis may increase the chance of identifying interactions between different cellular mechanisms favorably affected by drug combinations.
In Figure 4 we show matrices of the responses of one combination (Radicinin and Brasilin) out of 16 binary combinations tested at five 2-fold dilutions around the "AC 50 " concentration of the single drug treatments. Differences between the two treated cell      (Table 1). 16 two-drug combinations matrices were applied to two cell lines (TCCSUP & UMUC3). 5x5 matrices (10x5 wells including duplicates) + single-drug rows and columns + controls were prepared. 4 dual combinations were set in each plate, 8 plates total (4 for each cell line). Example of 7 single features response for the combination of Radicinin (rows) and Brasilin (columns) in concentrations of 4-0.25µM are displayed for the two cell lines (TOP). The data require preparation of larger matrices with more dilutions and repeats. Nevertheless we attempted to fit the Mahalanobis scores (BOTTOM) to Loewe or Bliss models (see Methods) but found no significant synergism.
lines are seen for nuclear-related features. However, Loewe or Bliss model fitting to the Mahalanobis scores does not indicate synergism.

Discussion
Images recording cell, nuclear and cytoskeleton structures were analyzed for five cell lines treated with the COMBO drug library and revealed differential responses. Cell morphology and tagged protein intensity and distribution attributes depicted cellular responses as reported by various features. The transition with time from direct and specific effects on the drug target to indirect and distributed responses manifest the advantages of high-resolution imaging for characterization of cell-level responses by multi-parametric scores. Identification of the features mostly affected by each treatment help guide to relevant cellular functions mediating these responses.
Both basic biological research and biomedical challenges need high-throughput technologies to address the complexity of cellular mechanisms. The information-content in high-throughput experiments, initially confined to biochemical assays reporting a specific target protein, is extending to detailed quantitative sub-cellular characterization, classically the basis for understanding cellular functions. This work developed analysis for high-resolution (recording cytoskeletal fibers) multi-color cellular images from large-scale screens, and display of the results in a digestible form with cell-informatics mining capability. Multiparametric quantification of cellular effects offers a method to identify cellular responses to drugs without selecting in advance a specific drug targets as a reporter. Effective AC 50 obtained from multi-parametric analysis integrate many cell-level responses and is therefore less essay-biased then measurements based on single-attribute analysis. Scores based on Mahalanobis distance offer standardized measures of responses in comparison to controls.
Perturbations to living systems, even when the treatment is directed towards a highly specific target with well-defined shortterm effects, cause at longer times distributed cellular responses. Characterization of the time-dependence of the direct and indirect responses to drugs using multi-parametric analysis contain causal information to help dissect mechanisms, and may provide rational basis for optimization of temporal schedules for drug administration, already recognized valuable in chemotherapy 51 .
Multi-component drug combinations are commonly used in chemotherapy, AIDS and also in antibiotic treatments to fight development of resistant bacteria clones. The effectiveness of such cocktails is often a result of interactions at the physiological level, and was optimized through slow clinical trials. Redundancies and multi-functionalities in molecular networks mediating cellular mechanisms [59][60][61][62] and the multi-genetic nature of diseases such as cancer, with abnormal protein interactions and modified cellular mechanisms, suggest a potential for the discovery and optimization of multiple-perturbation approaches using cell preparations. Many classical drug targets are shared by the disease phenotypes and the normal ones, causing undesired side effects at the potent dozes. Multiple-drug treatments offer degrees of freedom to optimize synergistic interference with a specific mechanism, preferably more effective in the diseased version and in a given cell lineage, with reduced side effects due to lower doses of each individual drugs in the mixture and minimize compensation by alternative pathways. The benefits of compounds with lower affinity and multiple targets is therefore re-evaluated due to their potential in combinatorial therapies [63][64][65][66][67] .
The advancement in our understanding of the molecular mechanisms of cellular functions serves as a basis for rational design of drugs. Combinatorial drug effect profiling is in fact a tool to probe normal network structures [60][61][62] . However, first principles quantitative modeling of complex cellular networks depends on many and not readily measurable parameters. Cell-based highcontent drug response profiling is therefore a practical method to study multiple perturbations, and to optimize mixtures with advantageous responses. Cell-based assays also open possibilities for patient-specific profiling of abnormal pathways and personal optimization of treatments.

Underlying data
The raw data includes 10 Terabytes of images and are therefore too large to share. The images are kept on storage disks as OME-TIFF files with .xml headers metadata for each image. Individuals wishing to access this data can apply to the corresponding author in order to obtain the files. In this situation, these files will be uploaded to and available from the Weizmann Institute of Science website. Example images for one plate are provided in the Extended data 67 . This project contains the following extended data: - Table S1. The order of transfer from the COMBO 96-well plates and the COMBO plate handout.

Extended data
- Table S2. The list of attributes calculated for each cell.

Pegoraro G, Misteli T: High-Throughput Imaging for the Discovery of
Open Peer Review subcellular resolution, this article captures an analytical strategy that is relevant to drug discovery, but also to fundamental analysis of cellular processes, their underlying mechanisms, and potentially, diagnostic assessment/identification of defective processes in disease states.
Using the NCI COMBO drug library as a basis for analysis, the authors detail the experimental, imaging, and image analysis process, but reserve particular focus for the methods of drug effect analysis, primarily relying on the Mahalanobis distance. This measure captures a compact representation of drug effect strengths relative to control variation, whilst also embedding information about the (relative weights) of specific cellular features that capture/comprise the effects of any given perturbation (drug). Mahalanobis distance thus emerges as a powerful tool to assess, compare, sort, and visualise drug effects, especially when compared (as the authors show) with single feature responses, which are far less systematic in their indications.
The authors also highlight the strengths of their approach for identifying complex temporal dynamics of drug response, i.e. how changes in different phenotypic features emerge at different times after treatment. Reflecting the way that even highly specific initial perturbation impacts are propagated outward over time within cellular information processing and responses networks, this is an important additional dimension that is precisely addressed using the authors' analytical strategy.
Importantly, due to the very large scale of the datasets described, the authors indicate (reasonably so) that it is not feasible to share the entire dataset online. However, they indicate that, on request, components, or the entirety of the dataset can be accessed based on a request to the corresponding author. I interpret this as constituting data availability in practice.

Is the description of the method technically sound? Yes
Are sufficient details provided to allow replication of the method development and its use by others? Yes If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Systems Microscopy of cancer cell biology, including signalling, cytoskeletal regulation, adhesion and migration biology. Phenotypic drug screening. Development of image analysis, statistical analysis, and data visualisation tools for quantitative single cell analysis based on imaging data.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
The benefits of publishing with F1000Research: Your article is published within days, with no editorial bias • You can publish traditional articles, null/negative results, case reports, data notes and more • The peer review process is transparent and collaborative • Your article is indexed in PubMed after passing peer review • Dedicated customer support at every stage • For pre-submission enquiries, contact research@f1000.com