Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium

Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome for different survival times in METABRIC patients receiving hormone therapy (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity vs. resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures of other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance; however, the other agents also predicted survival with acceptable accuracies. A support vector machine (SVM) model of paclitaxel response containing the ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B genes was 78.6% accurate in 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. Minimum redundancy maximum relevance feature selection of a paclitaxel-based SVM classifier based on expression of ABCB11, ABCC1, BAD, BBC3, and BCL2L1 was 79% accurate in 53 CT patients. A random forest (RF) classifier produced a gene signature (ABCB11, ABCC1, BAD, BCL2, CYP2C8, CYP3A4, MAP4, MAPT, NR1I2, TUBB1, GBP1, and OPRK1) that predicted >3 year survival with 82.4% accuracy in 420 HT patients. A similar RF gene signature showed 79.6% accuracy in 504 patients treated with CT and/or HT. These results suggest that tumor gene expression signatures refined by machine learning can predict outcomes of hormone and chemotherapy in breast cancer patients.


Introduction
Current pharmacogenetic analysis of chemotherapy makes qualitative decisions about drug efficacy in patients (determination of good, intermediate or poor metabolizer phenotypes) based on variants present in genes involved in the transport, biotransformation, or disposition of a drug. We have applied a supervised ML approach to derive accurate gene signatures, based on the biochemically-guided response to chemotherapies with breast cancer cell lines 1 , which show variable responses to growth inhibition by paclitaxel and gemcitabine therapies 2,3 . We analyzed stable 4 and linked unstable genes in pathways that determine their disposition. This involved investigating the correspondence between 50% growth inhibitory concentrations (GI50) of paclitaxel and gemcitabine and gene copy number, mutation, and expression, first in breast cancer cell lines and then in patients 1 . Genes encoding direct targets of these drugs, metabolizing enzymes, transporters, and those previously associated with chemo-resistance to paclitaxel (n=31 genes) were then pruned by multiple factor analysis (MFA), which indicated that expression of ABCC10, BCL2, BCL2L1, BIRC5, BMF, FGF2, FN1, MAP4, MAPT, NFKB2, SLCO1B3, TLR6, TMEM243, TWIST1, and CSAG2 could predict sensitivity in breast cancer cell lines with 84% accuracy. The cell line-based paclitaxel gene signature predicted sensitivity in 84% of patients with no or minimal residual disease (n=56; data from reference 5). The present study derives related gene signatures with ML approaches that predict outcome of hormone- and chemotherapies in the large METABRIC breast cancer cohort 6 .

Methods
SVM learning: Previously, paclitaxel-related response genes were identified from peer-reviewed literature, and their expression and copy number in breast cancer cell lines were analyzed by multiple factor analysis of GI50 values of these lines 2 (Figure 1). Genes with expression levels related to GI50 were used to derive SVMs by backwards feature selection for paclitaxel, tamoxifen, methotrexate, 5-fluorouracil, epirubicin, and doxorubicin (trained using the function fitcsvm in MATLAB R2014a 7 and tested with either leave-one-out or 9-fold cross-validation). These SVMs were then assessed for their ability to predict patient outcomes based on available metadata (see Figure 1 and reference 1). Interactive prediction using normalized expression values as input is available at http://chemotherapy.cytognomix.com. The initial set of genes is carefully selected through an understanding of the drug and the pathways associated with it. A multiple factor analysis of the GI50 values of a training set of breast cancer cell lines and the corresponding expression levels of each gene in the initial set reduces the list of genes. Given the expression levels of each gene in the reduced set for each cell line, the method finds the optimal gene subset and the SVM that minimizes the misclassification rate by cross-validation. The SVM is evaluated on patients by classifying those with shorter survival time as resistant and those with longer survival as sensitive to hormone and/or chemotherapy. The Gaussian kernel SVM requires manual selection of two different parameters, C and sigma; these parameters determine how strictly the SVM learns the training set, so if not selected properly they can lead to overfitting. A grid search evaluates a wide range of combinations of these values by parallelization. The algorithm selects the C and sigma combination that leads to the lowest cross-validation misclassification rate.
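The grid search over C and sigma described above can be sketched in Python with scikit-learn (an illustrative analogue of the MATLAB fitcsvm workflow; note that scikit-learn parameterizes the Gaussian kernel with gamma rather than sigma, where gamma = 1/(2*sigma^2)). The data, gene count, and parameter grid below are synthetic placeholders, not the study's values:

```python
# Sketch: Gaussian-kernel SVM with C/gamma chosen by grid search over
# leave-one-out cross-validated accuracy (synthetic stand-in data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                 # expression of 5 genes in 40 cell lines
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # 0 = sensitive, 1 = resistant (synthetic)

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=LeaveOneOut(), scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The combination with the lowest cross-validated misclassification rate (highest accuracy) is retained, mirroring the parallelized grid search described in the text.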
A backwards feature selection (greedy) algorithm is used, in which one gene of the set is left out in a reduced gene set and the classification is then assessed; genes whose removal maintains or lowers the misclassification rate are excluded from the signature. The procedure is repeated until the subset with the lowest misclassification rate is selected as the optimal subset of genes.
RF learning: RF was trained using the WEKA 3.7 8 data mining tool. This classifier uses multiple random trees for classification, which are combined via a voting scheme to make a decision on the given input gene set. Figure 2 depicts the therapy outcome prediction process of a given patient using a RF consisting of a series of decision trees derived from different subsets of paclitaxel-related genes.
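The greedy backwards feature selection can be sketched as follows (a minimal Python/scikit-learn stand-in for the lab's MATLAB script; synthetic data, 5-fold cross-validation instead of leave-one-out for brevity):

```python
# Sketch: greedy backward elimination. Repeatedly drop the first gene whose
# removal does not increase the cross-validated misclassification rate,
# keeping the best subset found (synthetic stand-in data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

def misclassification(cols):
    return 1.0 - cross_val_score(SVC(kernel="rbf"), X[:, cols], y, cv=5).mean()

best = list(range(X.shape[1]))
best_err = misclassification(best)
improved = True
while improved and len(best) > 1:
    improved = False
    for g in list(best):
        trial = [c for c in best if c != g]
        err = misclassification(trial)
        if err <= best_err:          # removal maintained or lowered the error
            best, best_err, improved = trial, err, True
            break
print(best, round(best_err, 3))
```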
Augmented Gene Selection: The most relevant genes (features) for therapy outcome prediction were found using the minimum redundancy maximum relevance (mRMR) approach 9 . mRMR is a wrapper that incrementally selects genes by maximizing the average mutual information between gene expression features and classes, while minimizing their redundancies: given the set S of already-selected genes, the next gene f_j is chosen to maximize I(f_j; c) - (1/|S|) * sum over f_i in S of I(f_j; f_i), where I denotes mutual information and c the response class (Table 3). mRMR and SVM were combined to obtain a subset of genes that can accurately predict patient survival outcome; here, we considered 3, 4 and 5 years as survival thresholds for breast cancer patients (Table 3). The predicted and expected response to treatment for each individual METABRIC patient, for each analysis listed in Table 1, Table 2 and Table 3, are indexed. Patients sensitive to treatment are labeled with '0' while resistant patients are labeled '1'.
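The incremental mRMR criterion can be sketched in Python (an illustrative approximation on synthetic data: mutual information is estimated with scikit-learn's nearest-neighbor estimators, and feature-feature redundancy is estimated with mutual_info_regression; the study used the approach of reference 9):

```python
# Sketch: incremental mRMR. At each step select the feature maximizing
# relevance I(f; y) minus mean redundancy with already-selected features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

relevance = mutual_info_classif(X, y, random_state=0)
selected = [int(np.argmax(relevance))]          # seed with most relevant feature
while len(selected) < 3:
    scores = []
    for j in range(X.shape[1]):
        if j in selected:
            scores.append(-np.inf)
            continue
        redundancy = np.mean([
            mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
            for s in selected
        ])
        scores.append(relevance[j] - redundancy)
    selected.append(int(np.argmax(scores)))
print(selected)
```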

Results and discussion
The performance of several ML techniques that distinguish paclitaxel sensitivity and resistance in METABRIC patients has been compared using its tumour gene expression datasets. SVMs have generated gene signatures, indicating which genes are important for treatment response in METABRIC patients. These models are more accurate for prediction of outcomes in patients receiving HT and/or CT compared to other patient groups.
SVMs and RF were trained using expression of genes associated with paclitaxel response and mechanism of action, and of stable genes in the biological pathways of these targets (Figure 3). SVM models for drugs used to treat these patients were derived by backwards feature selection on patient subsets stratified by treatment or outcome (Table 1). The highest SVM accuracy was found for patients treated with both HT and CT (Table 1). The RF classifier was used to predict paclitaxel therapy outcome for patients that underwent CT and/or HT (Table 2). The best performance achieved with RF showed 82.4% overall accuracy using a 3-year survival threshold for distinguishing therapeutic resistance vs. sensitivity.
Figure 2. RF decision tree diagram depicts the therapy outcome prediction process of a given patient, using a RF consisting of k decision trees. Several DTs are built using different subsets of paclitaxel-related genes. The process starts from the root of each tree; if the expression of the gene corresponding to that node is greater than a specific value, the process continues through the right branch, otherwise through the left branch, until it reaches a leaf node. That leaf represents the prediction of the tree for that specific input. The decisions of all trees are considered, and the outcome with the largest number of votes is selected as the patient outcome.
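The per-tree voting described in the Figure 2 caption can be sketched in Python (synthetic data; note this is a scikit-learn illustration, whose own predict aggregates tree probabilities rather than counting strict votes, so the explicit majority vote is computed by hand here; the study used WEKA):

```python
# Sketch: collect each tree's prediction for one patient and take the majority.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))                # expression of 8 genes, 100 patients
y = (X[:, 0] > 0).astype(int)                # synthetic outcome labels

rf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
patient = X[:1]                              # one patient's expression profile
votes = np.array([tree.predict(patient)[0] for tree in rf.estimators_])
majority = int(round(votes.mean()))          # class with the most votes (binary)
print(majority)
```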
The best overall accuracy and AUC (sensitivity and specificity) for CT/HT patients, using mRMR feature selection with an SVM predicting outcome of paclitaxel therapy, was obtained for CT patients with 4-year survival. Outcomes for HT patients with 3-year survival were predicted with 84% accuracy; however, the specificity was lower in this group. SVM combined with mRMR improved the accuracy of feature selection and of prediction of response to hormone and/or chemotherapy based on survival time more than either SVM or RF alone.
While not a replication study sensu stricto, the initial paclitaxel gene set used for feature selection was the same as in our previous study 1 . Predictions for the METABRIC patient cohort, which was independent of the previous validation set 5 , using either the same (SVM) or different ML methods (RF and SVM with mRMR), exhibited comparable or better accuracies than our previous gene signature 1 . These techniques are powerful tools which can be used to identify genes that may be involved in drug resistance, as well as predict patient survival after treatment. Future efforts to expand these models to other drugs may assist in suggesting preferred treatments in specific patients, with the potential impact of improving efficacy and reducing duration of therapy.

School of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
This study proposed prediction methods using SVM and RF classifiers with mRMR-selected feature sets from cell line data and demonstrated their predictive ability for outcomes in the METABRIC patient cohort. The classifiers' good prediction performance shows the usefulness of combining domain knowledge with feature selection techniques. However, some details essential for reproducibility and interpretation are missing. The required information is listed in the following.
What are the values of parameters for SVM and RF classifiers and the methods for parameter selection (by default or other selection methods)?
The development and evaluation of models for patient data are not clear. Were the models trained using partial data from METABRIC, or was only leave-one-out cross-validation applied? If cross-validation was used, which model is offered at the online server, since more than one model will have been created? And was cross-validation involved in the feature selection process, which often leads to an overestimation of performance? If models were trained on partial data, both training and test performance are essential information for evaluating the robustness of the models.
Since some of the datasets are highly imbalanced, the numbers of positives and negatives, as well as sensitivity and specificity, are more important than accuracy for interpreting the results, as a high accuracy with a low AUC could be the result of all-positive or all-negative predictions on an imbalanced dataset. Listing all this information along with the accuracy and AUC will help the interpretation of prediction performance.
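The reviewer's concern can be illustrated with a minimal sketch (hypothetical numbers, not the study's data): a trivial all-negative predictor on a 9:1 imbalanced cohort reaches 90% accuracy while detecting no resistant patients, and its uninformative constant scores give a chance-level AUC of 0.5:

```python
# Sketch: high accuracy with chance-level AUC on an imbalanced dataset.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0] * 90 + [1] * 10)   # 90 sensitive, 10 resistant (hypothetical)
y_pred = np.zeros(100)                   # predict "sensitive" for everyone
accuracy = (y_pred == y_true).mean()     # 0.90 despite a useless model
sensitivity = 0.0                        # no resistant patient is detected
specificity = 1.0
auc = roc_auc_score(y_true, np.full(100, 0.5))  # constant scores -> AUC 0.5
print(accuracy, auc)
```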
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard; however, I have significant reservations, as outlined above.
Author Response 13 Jan 2017: Peter Rogan, University of Western Ontario, London, Canada

Comment 1: What are the values of parameters for SVM and RF classifiers and the methods for parameter selection (by default or other selection methods)?
Response: The parameter values for these classifiers have been added to the Tables 1-5.
In regards to parameter selection, the first paragraph of the methods now describes C and Sigma selection as a grid search to find the values with the lowest cross-validation misclassification rate. Similarly for RF, a grid search was used to optimize the maximum number of randomly selected genes for each tree (second paragraph of Methods section).
Comment 2: The development and evaluation of models for patient data are not clear. Whether the models were trained using partial data from METABRIC or only leave-one-out cross-validation was applied? If cross-validation is the case, then what is the model offered at the online server because there will be more than one models created, and whether the cross-validation is involved in the feature selection process that often leads to an overestimation of the performance. For the case of training on partial data, both training and test performance are essential information for evaluating the robustness of models.
Response: We obtained new results for both RF and mRMR+SVM models using the Discovery set for training and the Validation set for testing; the performance of the models was poor. After further investigation, we found a large variation in the expression of the 26 targeted genes between the Discovery and Validation sets (please see Supplementary Dataset 2). Hence, building any classifier using the Discovery and Validation sets as training and test sets in their current forms will result in poor performance, since the training and test sets are vastly different.
However, we did carry out another experiment on the Discovery set alone, using 70% of the data for training and the remaining 30% to test the performance of the model. The results have been added to the manuscript (Tables 4 and 5).
Comment 3: Since some of the datasets are highly imbalanced, the numbers of positives and negatives, as well as sensitivity and specificity are more important than accuracy for interpreting the results as a high accuracy with a low AUC could be the result of all positive/negative predictions on an imbalanced dataset. Listing all the information along with the accuracy and AUC will help the interpretation of prediction performances.

Covariates such as ER/PR or PAM50 subtypes must be included in a table describing the sample cohorts. Accuracy must be computed separately for these covariates, or they must also be included as covariates in the machine-learning model. Ideally, accuracy would be compared to existing breast cancer classifiers (e.g., using code from Marchionni et al., 2013) and/or survival curves reported in the literature.

Conclusions
Must be discussed in the context of existing genomics classifiers for breast cancer (e.g., OncotypeDx and/or Mammaprint). Results must be put in context with other predictions on METABRIC data, e.g., outcomes from the DREAM contest.

Acceptable
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard; however, I have significant reservations, as outlined above.
Author Response 13 Jan 2017: Peter Rogan, University of Western Ontario, London, Canada

Comment 1: The methods require further clarification to distinguish differences between this study and the previous study, as well as the parameters of the machine learning algorithms.
Response: The first paragraph of the Methods describes Support Vector Machine learning, which has been greatly expanded upon. Differences in SVM methodology between the two studies are indicated there (i.e. a Gaussian kernel was used instead of a linear kernel). All other feature selection methods described in the manuscript (Random Forest, mRMR) were not used in Dorman et al., 2016.
The parameters for machine learning algorithms have been incorporated in the manuscript, and can be found in the footnote section of each data table.

Comment 2: Accuracy in the results must better distinguish results on independent test and training sets.
Response: The Validation dataset showed a distinct overall expression profile from the Discovery set, possibly due to batch effects, which are well known. We added another experiment to the manuscript by splitting the Discovery set into training and test sets. The model was trained using 70% of the data and then tested using the remaining 30% as the test set. We repeated this procedure 100 times and took the median as the final performance result. The results are presented in Tables 4 and 5 of the manuscript.
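The repeated-split evaluation described in this response can be sketched as follows (synthetic stand-in data; the split ratio and repetition count match the response, the model and data do not):

```python
# Sketch: 100 random 70/30 train/test splits, reporting the median test accuracy.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

accs = []
for seed in range(100):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.30, random_state=seed)
    accs.append(SVC(kernel="rbf").fit(Xtr, ytr).score(Xte, yte))
print(round(float(np.median(accs)), 3))
```

Reporting the median over many random splits reduces the dependence of the quoted performance on any single lucky or unlucky partition.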

Comment 3: Classifiers must be put in the context of other existing genomics classifiers used in breast cancer and/or previously published in Mammaprint data.
Response: We have added two sentences to the second paragraph of the "Results and Discussion" section describing the comparison of our gene signature to those from MammaPrint and Oncotype Dx. Pair-wise comparison of these three signatures shows that they are nearly independent of one another.

Comment 4: Abbreviations SVM and RF must be spelled out as Support Vector Machine and Random Forest on first use in Methods.
Response: We thank the reviewer for this suggestion. It has been addressed in the Methods section of the manuscript.

Response: This is now clarified within the first paragraph of the Methods section of the manuscript. The SVM classifier was adopted from a previous Molecular Oncology publication, while the feature selection method was developed as part of this publication.
Comment 6: Details about the SVM learning algorithm are included in the caption to Figure 1, but must also be included and completely described in text for the corresponding section of the methods.
Response: Thanks for the reviewer's suggestion. This description of the SVM learning algorithm has been moved from the Figure 1 legend and integrated into the first paragraph of the methods section.

Comment 7: No equations are provided to describe the role of the parameters C and sigma. It is also unclear whether this greedy search is implemented by the MATLAB function fitcsvm or uses custom code developed by the authors.
Response: A brief description of the role of each parameter has been added to the first paragraph of the methods section of the manuscript. Readers are also now directed to a reference (Ben-Hur and Weston, 2010) if more detail is desired.
The greedy search, also called sequential backward feature selection, was implemented as a script by our lab in MATLAB; it is not a built-in MATLAB function. This is clarified by changing a few words in the first paragraph of the methods section: "A backwards feature selection (greedy) algorithm was designed and implemented in MATLAB in which…" Moreover, as described above, the SVM classifier was adopted from a previous Molecular Oncology publication (Dorman et al., 2016), while the feature selection method was developed as part of this publication.

Results
Comment 8: Need to specify whether reported accuracies are computed with leave-one-out cross-validation or 9-fold cross-validation (described in Methods).
Response: All SVM models described in the manuscript used leave-one-out cross-validation except one; this is clearly indicated in Table 1 and is now commented on in the Methods. A 9-fold cross-validation was used to build a model using 735 patients who were treated with chemotherapy and/or hormone therapy, as leave-one-out cross-validation of this many patients took an unreasonably long time to complete (it exceeded 3 weeks on a dedicated Intel i7 processor).
Comment 9: Ideally, given the size of METABRIC data they would be calculated on independent training (first 1000 patient samples) and test (last 1000 patient samples) datasets.
Response: We obtained new results for both RF and mRMR+SVM models using the Discovery patient set for training and the Validation set for testing; however, the performance of the models was poor. After further investigation, we found that there were large differences between the gene expression levels of the 26 model signature genes in the Discovery versus Validation sets (we used the Wilcoxon rank-sum test, Kruskal-Wallis test, and t-test to evaluate the results; see the plotted distributions of gene expression in Supplementary Dataset 2), regardless of patient status (alive or dead). Hence, building any classifier using the Discovery and Validation sets as training and test sets in their current forms will result in poor performance due to this source of heterogeneity.
To address this issue, we carried out another experiment based on data from the Discovery patient dataset alone; using 70% of the data for training and the remaining 30% for testing, the performance of the model was significantly better. We speculate that the discrepancy between the expression distributions in the Discovery and Validation sets was the result of batch effects. The results have been added to the manuscript (Tables 4 and 5).
Comment 10: AUC must be computed separately for discovery and validation sets.
Response: … Table 6 of the manuscript.

Conclusions
Comment 14: Must be discussed in the context of existing genomics classifiers for breast cancer (e.g., OncotypeDx and/or Mammaprint).
Response: We have added text to both the second paragraph of the "Results and Discussion" paragraph and to the conclusion of the paper.
Comment 15: Results must be put in context with other predictions on METABRIC data, e.g., outcomes from the DREAM contest.
Response: An important distinction to note in regards to our methodology is that the predictions are based on genes known to be associated with the response to specific drugs used to treat breast cancer. In the DREAM contest, the method with the highest METABRIC score (as described in Cheng et al., 2013) was phenotype-based, finding signatures for molecular processes that are dysregulated in METABRIC, rather than responses to the cancer therapies themselves. While this is an interesting prediction method, the results cannot be compared to our approach. The gene signatures that we have derived contain components of many different pathways. Reference: Cheng WY, Ou Yang TH, Anastassiou D. Biomolecular events in cancer revealed by attractor metagenes. PLoS Comput Biol. 2013;9(2):e1002920.
Competing Interests: PKR cofounded Cytognomix. A patent application related to biologically inspired gene signatures is pending. The other authors declare that they have no competing interests.