<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.9417.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Note</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Cancer Therapeutics</subject>
                    </subj-group>
                    <subj-group>
                        <subject>Genomics</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Predicting Outcomes of Hormone and Chemotherapy in the Molecular&#x00a0;Taxonomy of&#x00a0;Breast Cancer&#x00a0;International&#x00a0;Consortium (METABRIC) Study by Biochemically-inspired Machine Learning</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 1 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Rezaeian</surname>
                        <given-names>Iman</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Mucaki</surname>
                        <given-names>Eliseos J.</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Baranova</surname>
                        <given-names>Katherina</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Pham</surname>
                        <given-names>Huy Q.</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Angelov</surname>
                        <given-names>Dimo</given-names>
                    </name>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ngom</surname>
                        <given-names>Alioune</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Rueda</surname>
                        <given-names>Luis</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Rogan</surname>
                        <given-names>Peter K.</given-names>
                    </name>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2070-5254</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>School of Computer Science, University of Windsor, Windsor, Canada</aff>
                <aff id="a2">
                    <label>2</label>Department of Biochemistry, University of Western Ontario, London, Canada</aff>
                <aff id="a3">
                    <label>3</label>Department of Computer Science, University of Western Ontario, London, Canada</aff>
                <aff id="a4">
                    <label>4</label>CytoGnomix Inc, London, Canada</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:progan@uwo.ca">progan@uwo.ca</email>
                </corresp>
                <fn fn-type="con">
                    <p>PKR, AN and LR designed the methodology and oversaw the project. SVM feature selection with MATLAB was automated by DA. EJM and KB selected the initial gene signatures, and performed processing of the METABRIC data using SVM methods. IR performed the preprocessing of the METABRIC dataset using RF; IR and HQP designed feature selection and classification modules using WEKA. PKR, IR, EJM, AN, and LR wrote the manuscript.</p>
                </fn>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>PKR cofounded CytoGnomix. A patent application related to biologically inspired gene signatures is pending. The other authors declare that they have no competing interests.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>1</month>
                <year>2017</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2016</year>
            </pub-date>
            <volume>5</volume>
            <elocation-id>2124</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>24</day>
                    <month>1</month>
                    <year>2017</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Rezaeian I et al.</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/5-2124/pdf"/>
            <abstract>
                <p>Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome for different survival times in METABRIC patients receiving hormone (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity vs. resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures of other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance; however, signatures for the other agents also predicted survival with acceptable accuracies. A support vector machine (SVM) model of paclitaxel response containing genes 
                    <italic toggle="yes">ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A,</italic> and 
                    <italic toggle="yes">TUBB4B</italic> was 78.6% accurate in predicting survival of 84 patients treated with both HT and CT (median survival &#x2265; 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. A paclitaxel-based SVM classifier obtained by minimum redundancy maximum relevance feature selection, using expression of genes 
                    <italic toggle="yes">BCL2L1</italic>, 
                    <italic toggle="yes">BBC3</italic>, 
                    <italic toggle="yes">FGF2</italic>, 
                    <italic toggle="yes">FN1</italic>, and 
                    <italic toggle="yes">TWIST1</italic> was 81.1% accurate in 53 CT patients. In addition, a random forest (RF) classifier using a gene signature (
                    <italic toggle="yes">ABCB1</italic>, 
                    <italic toggle="yes">ABCB11</italic>, 
                    <italic toggle="yes">ABCC1</italic>, 
                    <italic toggle="yes">ABCC10</italic>, 
                    <italic toggle="yes">BAD</italic>, 
                    <italic toggle="yes">BBC3</italic>, 
                    <italic toggle="yes">BCL2</italic>, 
                    <italic toggle="yes">BCL2L1</italic>, 
                    <italic toggle="yes">BMF</italic>, 
                    <italic toggle="yes">CYP2C8</italic>, 
                    <italic toggle="yes">CYP3A4</italic>, 
                    <italic toggle="yes">MAP2</italic>, 
                    <italic toggle="yes">MAP4</italic>, 
                    <italic toggle="yes">MAPT</italic>, 
                    <italic toggle="yes">NR1I2</italic>, 
                    <italic toggle="yes">SLCO1B3</italic>, 
                    <italic toggle="yes">TUBB1</italic>, 
                    <italic toggle="yes">TUBB4A</italic>, and 
                    <italic toggle="yes">TUBB4B</italic>) predicted &gt;3-year survival with 85.5% accuracy in 420 HT patients. A similar RF gene signature showed 82.7% accuracy in 504 patients treated with CT and/or HT. These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Gene expression signatures</kwd>
                <kwd>breast cancer</kwd>
                <kwd>chemotherapy resistance</kwd>
                <kwd>hormone therapy</kwd>
                <kwd>machine learning</kwd>
                <kwd>support vector machine</kwd>
                <kwd>random forest</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>AN and LR are funded by NSERC grants RGPIN-2016-05017 and RGPIN-2014-05084 and by the Windsor Essex County Cancer Centre Foundation under a Seeds4Hope grant. PKR has been supported by NSERC [Discovery Grant RGPIN-2015-06290], Canada Foundation for Innovation, Canada Research Chairs and CytoGnomix Inc.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>Changes to the manuscript have been incorporated in response to the valuable comments provided by Drs. Fertig and Tung. These include three additional tables, which report the results for cross-validation using the METABRIC Discovery set only. We have included Supplementary Material demonstrating heterogeneous expression of paclitaxel signature genes in the Discovery vs. Validation datasets. Methods have been updated with additional detail as requested. Results now include additional performance metrics. We have also corrected some predictions in Dataset 1; however, these changes do not affect the conclusions of the paper.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Current pharmacogenetic analysis of chemotherapy makes qualitative decisions about drug efficacy in patients (determination of good, intermediate or poor metabolizer phenotypes) based on variants present in genes involved in the transport, biotransformation, or disposition of a drug. We have applied a supervised machine learning (ML) approach to derive accurate, biochemically guided gene signatures of chemotherapy response from breast cancer cell lines
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>, which show variable responses to growth inhibition by paclitaxel and gemcitabine therapies
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>,
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. We analyzed stable
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup> and linked unstable genes in pathways that determine the disposition of these drugs. This involved investigating the correspondence between 50% growth inhibitory concentrations (GI
                <sub>50</sub>) of paclitaxel and gemcitabine and gene copy number, mutation, and expression first in breast cancer cell lines and then in patients
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. Genes encoding direct targets of these drugs, metabolizing enzymes, transporters, and those previously associated with chemo-resistance to paclitaxel (n=31 genes) were then pruned by multiple factor analysis (MFA), which indicated that expression levels of genes 
                <italic toggle="yes">ABCC10</italic>, 
                <italic toggle="yes">BCL2</italic>, 
                <italic toggle="yes">BCL2L1</italic>, 
                <italic toggle="yes">BIRC5</italic>, 
                <italic toggle="yes">BMF</italic>, 
                <italic toggle="yes">FGF2</italic>, 
                <italic toggle="yes">FN1</italic>, 
                <italic toggle="yes">MAP4</italic>, 
                <italic toggle="yes">MAPT</italic>, 
                <italic toggle="yes">NFKB2</italic>, 
                <italic toggle="yes">SLCO1B3</italic>, 
                <italic toggle="yes">TLR6</italic>, 
                <italic toggle="yes">TMEM243</italic>, 
                <italic toggle="yes">TWIST1</italic>, and 
                <italic toggle="yes">CSAG2</italic> could predict sensitivity in breast cancer cell lines with 84% accuracy. The cell line-based paclitaxel-gene signature predicted sensitivity in 84% of patients with no or minimal residual disease (n=56; data from 
                <xref ref-type="bibr" rid="ref-5">5</xref>). The present study derives related gene signatures with ML approaches that predict outcome of hormone- and chemotherapies in the large METABRIC breast cancer cohort
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>
                <italic toggle="yes">SVM learning</italic>: Previously, paclitaxel-related response genes were identified from peer-reviewed literature, and their expression and copy number in breast cancer cell lines were analyzed by multiple factor analysis of GI
                <sub>50</sub> values of these lines
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup> (
                <xref ref-type="fig" rid="f1">Figure 1</xref>). Given the expression levels of each gene, an SVM was evaluated on patients by classifying those with shorter survival time as resistant and those with longer survival as sensitive to hormone and/or chemotherapy using paclitaxel, tamoxifen, methotrexate, 5-fluorouracil, epirubicin, and doxorubicin. The SVM was trained using the function 
                <italic toggle="yes">fitcsvm</italic> in MATLAB R2014a
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup> and tested with either leave-one-out or 9-fold cross-validation (indicated in 
                <xref ref-type="table" rid="T1">Table 1</xref>). The Gaussian kernel was used for this study, unlike Dorman 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>, who used the linear kernel. The SVM requires selection of two parameters, C (misclassification cost) and sigma (which controls the flexibility and smoothness of Gaussians)
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>; these parameters determine how strictly the SVM learns the training set and hence, if not selected properly, can lead to overfitting. A grid search, run in parallel, evaluates a wide range of combinations of these values for the Gaussian kernel and selects the C and sigma combination that leads to the lowest cross-validation misclassification rate. A backwards feature selection (greedy) algorithm was designed and implemented in MATLAB in which one gene of the set is left out in a reduced gene set and the classification is then assessed; genes that maintain or lower the misclassification rate are kept in the signature. The procedure is repeated until the subset with the lowest misclassification rate is selected as the optimal subset of genes. These SVMs were then assessed for their ability to predict patient outcomes based on available metadata (see 
                <xref ref-type="fig" rid="f1">Figure 1</xref> and reference 
                <xref ref-type="bibr" rid="ref-1">1</xref>). Interactive prediction using normalized expression values as input is available at 
                <ext-link ext-link-type="uri" xlink:href="http://chemotherapy.cytognomix.com">http://chemotherapy.cytognomix.com</ext-link>.</p>
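            <p>The grid search and greedy backward feature selection described above can be outlined in a few lines. The following Python sketch is an illustration only and is not the MATLAB implementation used in this study: the names <monospace>grid_search</monospace>, <monospace>backward_select</monospace>, <monospace>toy_error</monospace> and <monospace>toy_cv_error</monospace> are hypothetical, the two error functions are toy stand-ins for a cross-validated SVM misclassification rate, and the elimination rule (drop a gene when omitting it maintains or lowers the error) is one possible reading of the procedure.</p>
            <code language="python" xml:space="preserve"><![CDATA[
```python
from itertools import product


def grid_search(c_values, sigma_values, cv_error):
    """Return (error, C, sigma) with the lowest cross-validation error."""
    best = None
    for c, sigma in product(c_values, sigma_values):
        err = cv_error(c, sigma)
        # Strict comparison keeps the first combination on ties.
        if best is None or err < best[0]:
            best = (err, c, sigma)
    return best


def backward_select(genes, error_of):
    """Greedy backward elimination over a gene list.

    In each round every remaining gene is left out in turn. The gene whose
    omission gives the lowest error is dropped, provided the error does not
    rise above the current value; otherwise the search stops.
    """
    current = list(genes)
    current_err = error_of(current)
    while len(current) > 1:
        trials = []
        for g in current:
            reduced = [x for x in current if x != g]
            trials.append((error_of(reduced), reduced))
        best_err, best_subset = min(trials, key=lambda t: t[0])
        if best_err > current_err:
            break  # every removal makes classification worse
        current, current_err = best_subset, best_err
    return current, current_err


# Toy stand-in (assumption): pretend only these genes carry signal.
INFORMATIVE = {"BCL2", "FN1", "MAPT"}


def toy_error(subset):
    """Hypothetical misclassification rate for a gene subset."""
    missing = len(INFORMATIVE - set(subset))
    noise = len(set(subset) - INFORMATIVE)
    return 0.10 + 0.15 * missing + 0.02 * noise


def toy_cv_error(c, sigma):
    """Hypothetical error surface with its minimum at C=100, sigma=10."""
    return abs(c - 100) / 1e5 + abs(sigma - 10) / 1e3


if __name__ == "__main__":
    start = ["ABCB1", "BCL2", "FN1", "MAPT", "TLR6", "TWIST1"]
    signature, err = backward_select(start, toy_error)
    print(sorted(signature), round(err, 2))
    print(grid_search([1, 10, 100, 1000], [1, 10, 100], toy_cv_error))
```
]]></code>
            <p>In a real analysis, <monospace>error_of</monospace> and <monospace>cv_error</monospace> would each wrap a cross-validated classifier fit on patient expression data; passing them in as functions keeps the search logic independent of the classifier and of the cross-validation scheme.</p>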
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>Biochemically-inspired SVM gene signature derivation workflow.</title>
                    <p>The initial set of genes is selected based on knowledge of the drug and its associated pathways. A multiple factor analysis of the GI
                        <sub>50</sub> values of a training set of breast cancer cell lines and the corresponding expression levels of each gene in the initial set reduces the list of genes.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/11525/99c35509-b811-4437-9f96-7158fdcdc41a_figure1.gif"/>
            </fig>
            <table-wrap id="T1" orientation="portrait" position="anchor">
                <label>Table 1. </label>
                <caption>
                    <title>SVM gene expression signature performance on METABRIC patients.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="center" colspan="1" rowspan="1">Patient
                                <break/>treatment</th>
                            <th align="center" colspan="1" rowspan="1">Number of patients</th>
                            <th align="center" colspan="1" rowspan="1">Agent:
                                <break/>
                                <italic toggle="yes">final gene</italic>
                                <break/>
                                <italic toggle="yes">signature (C</italic>
                                <break/>
                                <italic toggle="yes">and sigma)</italic>
                            </th>
                            <th align="center" colspan="1" rowspan="1">Accuracy (%)</th>
                            <th align="center" colspan="1" rowspan="1">Precision</th>
                            <th align="center" colspan="1" rowspan="1">F-Measure</th>
                            <th align="center" colspan="1" rowspan="1">MCC
                                <sup>
                                    <xref ref-type="other" rid="fn1">1</xref>
                                </sup>
                            </th>
                            <th align="center" colspan="1" rowspan="1">AUC
                                <sup>
                                    <xref ref-type="other" rid="fn1">2</xref>
                                </sup>
                            </th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="center" colspan="1" rowspan="6">Both CT
                                <break/>and HT
                                <sup>
                                    <xref ref-type="other" rid="fn1">3</xref>
                                </sup>
                            </td>
                            <td align="center" colspan="1" rowspan="6">84</td>
                            <td align="center" colspan="1" rowspan="1">Paclitaxel: 
                                <italic toggle="yes">ABCC1, ABCC10, BAD,</italic>
                                <break/>
                                <italic toggle="yes">BIRC5, FN1, GBP1, MAPT, SLCO1B3,</italic>
                                <break/>
                                <italic toggle="yes">TMEM243, TUBB3, TUBB4B</italic>
                                <break/>
                                <italic toggle="yes">(C=10000, &#x03c3;=10)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">78.6</td>
                            <td align="center" colspan="1" rowspan="1">0.787</td>
                            <td align="center" colspan="1" rowspan="1">0.782</td>
                            <td align="center" colspan="1" rowspan="1">0.559</td>
                            <td align="center" colspan="1" rowspan="1">0.814</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Tamoxifen: 
                                <italic toggle="yes">ABCC2, ALB, CCNA2,</italic>
                                <break/>
                                <italic toggle="yes">E2F7, FLAD1, FMO1, NCOA2, NR1I2,</italic>
                                <break/>
                                <italic toggle="yes">PIAS4, SULT1E1 (C=100000, &#x03c3;=100)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">76.2</td>
                            <td align="center" colspan="1" rowspan="1">0.761</td>
                            <td align="center" colspan="1" rowspan="1">0.760</td>
                            <td align="center" colspan="1" rowspan="1">0.510</td>
                            <td align="center" colspan="1" rowspan="1">0.701</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Methotrexate: 
                                <italic toggle="yes">ABCC2, ABCG2,</italic>
                                <break/>
                                <italic toggle="yes">CDK2, DHFRL1 (C=10, &#x03c3;=1)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">71.4</td>
                            <td align="center" colspan="1" rowspan="1">0.712</td>
                            <td align="center" colspan="1" rowspan="1">0.711</td>
                            <td align="center" colspan="1" rowspan="1">0.410</td>
                            <td align="center" colspan="1" rowspan="1">0.766</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Epirubicin: 
                                <italic toggle="yes">ABCB1, CDA, CYP1B1,</italic>
                                <break/>
                                <italic toggle="yes">ERBB3, ERCC1, MTHFR, PON1,</italic>
                                <break/>
                                <italic toggle="yes">SEMA4D, TFDP2 (C=1000, &#x03c3;=10)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">72.6</td>
                            <td align="center" colspan="1" rowspan="1">0.725</td>
                            <td align="center" colspan="1" rowspan="1">0.723</td>
                            <td align="center" colspan="1" rowspan="1">0.434</td>
                            <td align="center" colspan="1" rowspan="1">0.686</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Doxorubicin: 
                                <italic toggle="yes">ABCC2, ABCD3, CBR1,</italic>
                                <break/>
                                <italic toggle="yes">FTH1, GPX1, NCF4, RAC2, TXNRD1</italic>
                                <break/>
                                <italic toggle="yes">(C=100000, &#x03c3;=100)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">75.0</td>
                            <td align="center" colspan="1" rowspan="1">0.749</td>
                            <td align="center" colspan="1" rowspan="1">0.750</td>
                            <td align="center" colspan="1" rowspan="1">0.488</td>
                            <td align="center" colspan="1" rowspan="1">0.701</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">5-Fluorouracil: 
                                <italic toggle="yes">ABCB1, ABCC3,</italic>
                                <break/>
                                <italic toggle="yes">MTHFR, TP53 (C=10000, &#x03c3;=100)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">71.4</td>
                            <td align="center" colspan="1" rowspan="1">0.714</td>
                            <td align="center" colspan="1" rowspan="1">0.714</td>
                            <td align="center" colspan="1" rowspan="1">0.417</td>
                            <td align="center" colspan="1" rowspan="1">0.718</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">CT and/or
                                <break/>HT
                                <sup>
                                    <xref ref-type="other" rid="fn1">3</xref>,
                                    <xref ref-type="other" rid="fn1">4</xref>,
                                    <xref ref-type="other" rid="fn1">5</xref>,
                                    <xref ref-type="other" rid="fn1">6</xref>
                                </sup>
                            </td>
                            <td align="center" colspan="1" rowspan="1">735</td>
                            <td align="center" colspan="1" rowspan="1">Paclitaxel: 
                                <italic toggle="yes">BAD, BCAP29, BCL2,</italic>
                                <break/>
                                <italic toggle="yes">BMF, CNGA3, CYP2C8, CYP3A4,</italic>
                                <break/>
                                <italic toggle="yes">FGF2, FN1, NFKB2, NR1I2, OPRK1,</italic>
                                <break/>
                                <italic toggle="yes">SLCO1B3, TLR6, TUBB1, TUBB3,</italic>
                                <break/>
                                <italic toggle="yes">TUBB4A, TUBB4B, TWIST1</italic>
                                <break/>
                                <italic toggle="yes">(C=10000, &#x03c3;=100)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">66.1</td>
                            <td align="center" colspan="1" rowspan="1">0.652</td>
                            <td align="center" colspan="1" rowspan="1">0.643</td>
                            <td align="center" colspan="1" rowspan="1">0.287</td>
                            <td align="center" colspan="1" rowspan="1">0.660</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Deceased
                                <break/>only
                                <sup>
                                    <xref ref-type="other" rid="fn1">2</xref>,
                                    <xref ref-type="other" rid="fn1">6</xref>,
                                    <xref ref-type="other" rid="fn1">7</xref>
                                </sup>
                                <break/>(CT and/or
                                <break/>HT)</td>
                            <td align="center" colspan="1" rowspan="1">327</td>
                            <td align="center" colspan="1" rowspan="1">Paclitaxel: 
                                <italic toggle="yes">ABCB11, BAD, BBC3,</italic>
                                <break/>
                                <italic toggle="yes">BCL2, BCL2L1, BIRC5, CYP2C8,</italic>
                                <break/>
                                <italic toggle="yes">FGF2, FN1, GBP1, MAPT, NFKB2,</italic>
                                <break/>
                                <italic toggle="yes">OPRK1, SLCO1B3, TMEM243</italic>
                                <break/>
                                <italic toggle="yes">(C=100, &#x03c3;=10)</italic>
							</td>
                            <td align="center" colspan="1" rowspan="1">75.3</td>
                            <td align="center" colspan="1" rowspan="1">0.752</td>
                            <td align="center" colspan="1" rowspan="1">0.752</td>
                            <td align="center" colspan="1" rowspan="1">0.505</td>
                            <td align="center" colspan="1" rowspan="1">0.763</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">No
                                <break/>treatment
                                <sup>
                                    <xref ref-type="other" rid="fn1">3</xref>
                                </sup>
                            </td>
                            <td align="center" colspan="1" rowspan="1">304</td>
                            <td align="center" colspan="1" rowspan="1">Paclitaxel: 
                                <italic toggle="yes">ABCB1, ABCB11, BBC3,</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1, BMF, CYP3A4, FGF2,</italic>
                                <break/>
                                <italic toggle="yes">GBP1, MAP4, MAPT, NR1I2, OPRK1,</italic>
                                <break/>
                                <italic toggle="yes">SLCO1B3, TUBB4A, TUBB4B,</italic>
                                <break/>
                                <italic toggle="yes">TWIST2 (C=100, &#x03c3;=10)</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">73.4</td>
                            <td align="center" colspan="1" rowspan="1">0.734</td>
                            <td align="center" colspan="1" rowspan="1">0.733</td>
                            <td align="center" colspan="1" rowspan="1">0.467</td>
                            <td align="center" colspan="1" rowspan="1">0.769</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn>
                        <p id="fn1">Initial gene sets preceding feature selection: Paclitaxel - 
                            <italic toggle="yes">ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCAP29, BCL2, BCL2L1, BIRC5, BMF, CNGA3, CYP2C8, CYP3A4, FGF2, FN1, GBP1, MAP2, MAP4, MAPT, NFKB2, NR1I2, OPRK1, SLCO1B3, TLR6, TUBB1, TWIST1.</italic> Tamoxifen - 
                            <italic toggle="yes">ABCB1, ABCC2, ALB, C10ORF11, CCNA2, CYP3A4, E2F7, F5, FLAD1, FMO1, IGF1, IGFBP3, IRS2, NCOA2, NR1H4, NR1I2, PIAS4, PPARA, PROC, RXRA, SMARCD3, SULT1B1, SULT1E1, SULT2A1.</italic> Methotrexate - 
                            <italic toggle="yes">ABCB1, ABCC2, ABCG2, CDK18, CDK2, CDK6, CDK8, CENPA, DHFRL1</italic>. Epirubicin - 
                            <italic toggle="yes">ABCB1, CDA, CYP1B1, ERBB3, ERCC1, GSTP1, MTHFR, NOS3, ODC1, PON1, RAD50, SEMA4D, TFDP2</italic>. Doxorubicin - 
                            <italic toggle="yes">ABCB1, ABCC2, ABCD3, AKR1B1, AKR1C1, CBR1, CYBA, FTH1, FTL, GPX1, MT2A, NCF4, RAC2, SLC22A16, TXNRD1.</italic> 5-Fluorouracil - 
                            <italic toggle="yes">ABCB1, ABCC3, CFLAR, IL6, MTHFR, TP53, UCK2.</italic> 
                            <sup>1</sup>MCC: Matthews Correlation Coefficient. 
                            <sup>2</sup>AUC: Area under the receiver operating characteristic curve. 
                            <sup>3</sup> Surviving patients; 
                            <sup>4</sup> Analysis included patients in the METABRIC &#x2018;discovery&#x2019; dataset only; 
                            <sup>5</sup> SVMs tested with 9-fold cross-validation, all others tested with leave-one-out cross-validation; 
                            <sup>6</sup> Includes all patients treated with HT, CT, or combination CT/HT, either with or without combination radiotherapy; 
                            <sup>7</sup> Median time after treatment until death (&gt; 4.4 years) was used to distinguish favorable outcome, i.e., sensitivity to therapy.</p>
                    </fn>
                </table-wrap-foot>
            </table-wrap>
            <p>
                <italic toggle="yes">RF learning</italic>: RF was trained using the WEKA 3.7
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup> data mining tool. This classifier builds multiple random trees and combines their predictions via a voting scheme to reach a decision for the given input gene set. A grid search was used to optimize k, the maximum number of randomly selected genes for each tree in the RF, over values from 1 to 19. 
                <xref ref-type="fig" rid="f2">Figure 2</xref> depicts the therapy outcome prediction process for a given patient using an RF consisting of a series of decision trees derived from different subsets of paclitaxel-related genes.</p>
            <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                <label>Figure 2. </label>
                <caption>
                    <title>RF decision tree diagram depicting the therapy outcome prediction process for a given patient, using an RF consisting of 
                        <italic toggle="yes">k</italic> decision trees.</title>
                    <p>Several decision trees (DTs) are built using different subsets of paclitaxel-related genes. For each tree, the process starts at the root: if the expression of the gene at the current node is greater than a specific value, the process continues through the right branch; otherwise, it continues through the left branch. This is repeated until a leaf node is reached, and that leaf represents the prediction of the tree for that specific input. The decisions of all trees are then tallied, and the outcome with the largest number of votes is selected as the predicted patient outcome.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/11525/99c35509-b811-4437-9f96-7158fdcdc41a_figure2.gif"/>
            </fig>
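            <p>The traversal and voting procedure described in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration and not the WEKA implementation used in this study: the three trees, their split thresholds and the patient expression values are hypothetical, and only the gene names are taken from the paclitaxel panel.</p>
            <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# A node is either a leaf label (str) or a tuple:
# (gene, threshold, left_subtree, right_subtree).
# Gene names come from the paclitaxel panel; thresholds are hypothetical.
TREE_A = ("ABCB1", 0.5, "sensitive", ("BCL2", 1.0, "sensitive", "resistant"))
TREE_B = ("MAPT", 0.2, "resistant", "sensitive")
TREE_C = ("CYP3A4", 0.8, ("BAD", 0.1, "resistant", "sensitive"), "resistant")


def predict_tree(node, expression):
    """Walk one tree from the root down to a leaf and return its label."""
    while not isinstance(node, str):
        gene, threshold, left, right = node
        # Expression above the threshold goes right, otherwise left.
        node = right if expression[gene] > threshold else left
    return node


def predict_forest(trees, expression):
    """Return the label that receives the most votes across all trees."""
    votes = Counter(predict_tree(tree, expression) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical normalized expression values for one patient.
patient = {"ABCB1": 0.9, "BCL2": 0.4, "MAPT": 0.6, "CYP3A4": 0.3, "BAD": 0.0}
outcome = predict_forest([TREE_A, TREE_B, TREE_C], patient)
print(outcome)
```
]]></preformat>
            <p>For this hypothetical patient, two of the three trees vote for a sensitive outcome, so the forest reports that outcome.</p>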
            <p>
                <italic toggle="yes">Augmented Gene Selection</italic>: The most relevant genes (features) for therapy outcome prediction were found using the Minimum Redundancy and Maximum Relevance (mRMR) approach
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>. mRMR is a filter-type feature selection approach that incrementally selects genes by maximizing the average mutual information between gene expression features and classes, while minimizing the redundancy among the selected features:</p>
            <p>
                <disp-formula id="e1">
                    <mml:math display="block" id="math1">
                        <mml:mrow>
                            <mml:mi>m</mml:mi>
                            <mml:mi>R</mml:mi>
                            <mml:mi>M</mml:mi>
                            <mml:mi>R</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:munder>
                                <mml:mrow>
                                    <mml:mtext>max</mml:mtext>
                                </mml:mrow>
                                <mml:mi>S</mml:mi>
                            </mml:munder>
                            <mml:mrow>
                                <mml:mo>[</mml:mo>
                                <mml:mrow>
                                    <mml:mfrac>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mrow>
                                                <mml:mo>|</mml:mo>
                                                <mml:mi>S</mml:mi>
                                                <mml:mo>|</mml:mo>
                                            </mml:mrow>
                                        </mml:mrow>
                                    </mml:mfrac>
                                    <mml:mstyle displaystyle="true">
                                        <mml:munder>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>f</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x2208;</mml:mo>
                                                <mml:mi>S</mml:mi>
                                            </mml:mrow>
                                        </mml:munder>
                                        <mml:mrow>
                                            <mml:mi>I</mml:mi>
                                            <mml:mo stretchy="false">(</mml:mo>
                                            <mml:msub>
                                                <mml:mi>f</mml:mi>
                                                <mml:mi>i</mml:mi>
                                            </mml:msub>
                                            <mml:mo>,</mml:mo>
                                            <mml:mi>C</mml:mi>
                                            <mml:mo stretchy="false">)</mml:mo>
                                            <mml:mo>&#x2212;</mml:mo>
                                            <mml:mfrac>
                                                <mml:mn>1</mml:mn>
                                                <mml:mrow>
                                                    <mml:msup>
                                                        <mml:mrow>
                                                            <mml:mrow>
                                                                <mml:mo>|</mml:mo>
                                                                <mml:mi>S</mml:mi>
                                                                <mml:mo>|</mml:mo>
                                                            </mml:mrow>
                                                        </mml:mrow>
                                                        <mml:mn>2</mml:mn>
                                                    </mml:msup>
                                                </mml:mrow>
                                            </mml:mfrac>
                                        </mml:mrow>
                                    </mml:mstyle>
                                    <mml:mstyle displaystyle="true">
                                        <mml:munder>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>f</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                                <mml:mo>,</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>f</mml:mi>
                                                    <mml:mi>j</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x2208;</mml:mo>
                                                <mml:mi>S</mml:mi>
                                            </mml:mrow>
                                        </mml:munder>
                                        <mml:mrow>
                                            <mml:mi>I</mml:mi>
                                            <mml:mo stretchy="false">(</mml:mo>
                                            <mml:msub>
                                                <mml:mi>f</mml:mi>
                                                <mml:mi>i</mml:mi>
                                            </mml:msub>
                                            <mml:mo>,</mml:mo>
                                            <mml:msub>
                                                <mml:mi>f</mml:mi>
                                                <mml:mi>j</mml:mi>
                                            </mml:msub>
                                            <mml:mo stretchy="false">)</mml:mo>
                                        </mml:mrow>
                                    </mml:mstyle>
                                </mml:mrow>
                                <mml:mo>]</mml:mo>
                            </mml:mrow>
                        </mml:mrow>
                    </mml:math>
                </disp-formula>
            </p>
            <p>where 
                <italic toggle="yes">f
                    <sub>i</sub>
                </italic> corresponds to a feature in gene set 
                <italic toggle="yes">S, I(f
                    <sub>i</sub>,C)</italic> is the mutual information between 
                <italic toggle="yes">f
                    <sub>i</sub>
                </italic> and class 
                <italic toggle="yes">C</italic>, and 
                <italic toggle="yes">I(f
                    <sub>i</sub>,f
                    <sub>j</sub>)</italic> is the mutual information between features 
                <italic toggle="yes">f
                    <sub>i</sub>
                </italic> and 
                <italic toggle="yes">f
                    <sub>j</sub>
                </italic>.</p>
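            <p>To make the criterion concrete, the following Python sketch evaluates the bracketed expression for a candidate gene subset and grows the subset greedily. It rests on several assumptions that are not taken from this study: expression values are already discretized, the gene names and data are hypothetical, and the greedy loop simply re-evaluates the subset score at every step, which may differ from the incremental search used by the cited mRMR implementation.</p>
            <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_score(subset, features, labels):
    """Mean relevance to the class minus mean pairwise redundancy for subset S."""
    size = len(subset)
    relevance = sum(mutual_information(features[g], labels) for g in subset) / size
    redundancy = sum(
        mutual_information(features[a], features[b]) for a in subset for b in subset
    ) / size ** 2
    return relevance - redundancy


def greedy_mrmr(features, labels, n_select):
    """Add, one gene at a time, the gene that maximizes the subset score."""
    selected = []
    remaining = sorted(features)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda g: mrmr_score(selected + [g], features, labels))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discretized expression (0 = low, 1 = high) for eight patients.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "GENE_A": [0, 0, 0, 1, 0, 1, 1, 1],
    "GENE_B": [0, 0, 0, 1, 0, 1, 1, 1],  # exact copy of GENE_A (fully redundant)
    "GENE_C": [0, 0, 1, 0, 1, 0, 1, 1],  # equally relevant, independent of GENE_A
}
selected = greedy_mrmr(features, labels, 2)
print(sorted(selected))
```
]]></preformat>
            <p>In this toy data set all three genes are equally informative about the class, but GENE_B duplicates GENE_A. The redundancy term therefore favors pairing GENE_A with GENE_C, and the duplicate gene is not selected.</p>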
            <p>For this experiment, we used a 26-gene signature (genes 
                <italic toggle="yes">ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, TUBB4B, FGF2, FN1, GBP1, NFKB2, OPRK1, TLR6,</italic> and 
                <italic toggle="yes">TWIST1</italic>) as the base feature set. These genes were selected (in Dorman 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>) based either on their known involvement in paclitaxel metabolism, or evidence that their expression levels and/or copy numbers correlate with paclitaxel GI
                <sub>50</sub> values. mRMR and SVM were combined to obtain a subset of genes that can accurately predict patient survival outcomes; here, we considered 3, 4 and 5 years as survival thresholds for breast cancer patients.</p>
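            <p>The effect of the survival threshold on the class labels can be illustrated with a short Python sketch. The survival times below are hypothetical, the strict comparison is an assumption, and censoring is ignored; the coding of sensitive patients as &#x2018;0&#x2019; and resistant patients as &#x2018;1&#x2019; follows the description of Dataset 1 in this article.</p>
            <preformat preformat-type="code"><![CDATA[
```python
def label_by_threshold(survival_years, threshold_years):
    """Label each patient '0' (sensitive) if survival exceeds the threshold, else '1'."""
    return ["0" if years > threshold_years else "1" for years in survival_years]


# Hypothetical survival times (in years) for four patients.
survival = [2.5, 3.6, 4.4, 6.1]

# One label vector per survival threshold considered in the text.
labels_by_threshold = {t: label_by_threshold(survival, t) for t in (3, 4, 5)}
for threshold, labels in labels_by_threshold.items():
    print(threshold, labels)
```
]]></preformat>
            <p>Raising the threshold moves patients from the sensitive class to the resistant class, so the class balance, and with it the difficulty of the classification task, changes with the threshold.</p>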
            <p>Performance was evaluated with several metrics computed by WEKA: accuracy (ACC), the weighted averages of precision and F-measure, the Matthews Correlation Coefficient (MCC), and the area under the ROC curve (AUC).</p>
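            <p>As a rough guide to how these metrics relate to one another, the sketch below derives accuracy, class-weighted precision and F-measure, and MCC from a two-class confusion matrix. It is an illustration under stated assumptions rather than the WEKA code: the confusion-matrix counts are hypothetical, undefined ratios are set to zero, and AUC is omitted because it requires ranked prediction scores.</p>
            <preformat preformat-type="code"><![CDATA[
```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, class-weighted precision and F-measure, and MCC for two classes."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    def per_class(hit, false_alarm, miss):
        # Precision and F-measure for one class; undefined ratios become 0.0.
        precision = hit / (hit + false_alarm) if hit + false_alarm else 0.0
        recall = hit / (hit + miss) if hit + miss else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, f1

    pos_precision, pos_f1 = per_class(tp, fp, fn)
    neg_precision, neg_f1 = per_class(tn, fn, fp)
    pos_weight = (tp + fn) / total  # share of truly positive patients
    neg_weight = (tn + fp) / total  # share of truly negative patients

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": pos_weight * pos_precision + neg_weight * neg_precision,
        "f_measure": pos_weight * pos_f1 + neg_weight * neg_f1,
        "mcc": mcc,
    }


# Hypothetical case 1: every patient is assigned to the majority class.
majority = binary_metrics(tp=0, fp=0, fn=15, tn=85)
# Hypothetical case 2: a classifier that separates both classes reasonably well.
balanced = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(majority)
print(balanced)
```
]]></preformat>
            <p>In the first hypothetical case the accuracy is 0.85 although the MCC is 0, because the classifier never predicts the minority class. In the second case the accuracy is 0.80 and the MCC is 0.6. This illustrates why MCC is informative alongside accuracy when the classes are imbalanced.</p>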
        </sec>
        <sec sec-type="results | discussion">
            <title>Results and discussion</title>
            <supplementary-material id="DS0" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9417/73529ea6-6904-424b-892b-2411acc16fdf_Dataset_1_Revised.xlsx">
                <label>Predicted treatment response for each individual METABRIC patient</label>
                <caption>
                    <p>
                        <sup>
                            <xref ref-type="bibr" rid="ref-12">12</xref>
                        </sup> The predicted and expected responses to treatment for each individual METABRIC patient for each of the analyses listed in 
                        <xref ref-type="table" rid="T1">Table 1</xref>, 
                        <xref ref-type="table" rid="T2">Table 2</xref> and 
                        <xref ref-type="table" rid="T3">Table 3</xref> are indexed. Patients sensitive to treatment are labeled with &#x2018;0&#x2019; while resistant patients are labeled &#x2018;1&#x2019;.</p>
                </caption>
            </supplementary-material>
            <table-wrap id="T2" orientation="portrait" position="anchor">
                <label>Table 2. </label>
                <caption>
                    <title>Results of applying RF to predict outcome of paclitaxel therapy.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Type of treatment</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Survival years (as
                                <break/>threshold)</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;"># Patients</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">
                                <italic toggle="yes">K</italic> (number of genes
                                <break/>to be used in
                                <break/>random selection)</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Accuracy (True
                                <break/>Positive - TP) (%)</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Precision</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">F-Measure</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">MCC
                                <sup>
                                    <xref ref-type="other" rid="fn2">1</xref>
                                </sup>
                            </th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">AUC
                                <sup>
                                    <xref ref-type="other" rid="fn2">2</xref>
                                </sup>
                            </th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="center" colspan="1" rowspan="3">Chemotherapy
                                <break/>(CT)</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="3">53</td>
                            <td align="center" colspan="1" rowspan="1">7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">56.6</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.510</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.524</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">-0.059</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.441</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">69.8</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.698</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.698</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.396</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.700</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">19</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">66.0</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.645</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.636</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.230</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.653</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="3">Hormone therapy
                                <break/>(HT)</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="3">420</td>
                            <td align="center" colspan="1" rowspan="1">19</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">85.5</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.731</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.788</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.000</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.606</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">78.6</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.715</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.706</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.069</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.559</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">71.0</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.634</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.627</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.059</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.632</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="3">CT and/or HT</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="3">504</td>
                            <td align="center" colspan="1" rowspan="1">9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">82.7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.685</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.749</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.000</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.506</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">19</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">73.6</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.647</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.648</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.039</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.527</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">65.3</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.602</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.593</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.086</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.588</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn>
                        <p id="fn2">
                            <sup>1</sup>MCC: Matthews Correlation Coefficient. 
                            <sup>2</sup>AUC: Area under the receiver operating characteristic curve. Both the Discovery and Validation patient datasets were analyzed. RF predictions were made using a panel of 19 genes (
                            <italic toggle="yes">ABCB1</italic>, 
                            <italic toggle="yes">ABCB11</italic>, 
                            <italic toggle="yes">ABCC1</italic>, 
                            <italic toggle="yes">ABCC10</italic>, 
                            <italic toggle="yes">BAD</italic>, 
                            <italic toggle="yes">BBC3</italic>, 
                            <italic toggle="yes">BCL2</italic>, 
                            <italic toggle="yes">BCL2L1</italic>, 
                            <italic toggle="yes">BMF</italic>, 
                            <italic toggle="yes">CYP2C8</italic>, 
                            <italic toggle="yes">CYP3A4</italic>, 
                            <italic toggle="yes">MAP2</italic>, 
                            <italic toggle="yes">MAP4</italic>, 
                            <italic toggle="yes">MAPT</italic>, 
                            <italic toggle="yes">NR1I2</italic>, 
                            <italic toggle="yes">SLCO1B3</italic>, 
                            <italic toggle="yes">TUBB1</italic>, 
                            <italic toggle="yes">TUBB4A</italic>, 
                            <italic toggle="yes">TUBB4B</italic>).</p>
                    </fn>
                </table-wrap-foot>
            </table-wrap>
            <table-wrap id="T3" orientation="portrait" position="anchor">
                <label>Table 3. </label>
                <caption>
                    <title>Results of mRMR feature selection for an SVM for predicting outcome of paclitaxel therapy.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Data</bold>
                            </th>
                            <th align="center" colspan="3" rowspan="1">CT
                                <sup>
                                    <xref ref-type="other" rid="fn3">1</xref>
                                </sup>
                            </th>
                            <th align="center" colspan="3" rowspan="1">HT</th>
                            <th align="center" colspan="3" rowspan="1">CT+HT</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
								
                                <bold>Survival years</bold>
                                <break/>
                                <bold>(as threshold)</bold>
							</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold># patients
                                    <sup>
                                        <xref ref-type="other" rid="fn3">2</xref>
                                    </sup>
                                </bold>
                            </td>
                            <td align="center" colspan="3" rowspan="1">53</td>
                            <td align="center" colspan="3" rowspan="1">420</td>
                            <td align="center" colspan="3" rowspan="1">504</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Accuracy (TP)</bold>
                                <break/>
                                <bold>(%)</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">81.1</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">81.1</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">84.9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">85.7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">79.5</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">72.9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">83.1</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">74.8</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">67.9</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Precision</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.809</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.813</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.852</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.878</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.765</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.692</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.795</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.703</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.662</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>F-Measure</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.809</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.811</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.845</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.794</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.726</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.663</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.772</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.672</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.666</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
								
                                <bold>MCC</bold>
							</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.582</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.625</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.675</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.119</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.17</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.173</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.161</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.137</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.238</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
								
                                <bold>AUC</bold>
							</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.783</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.812</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.82</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.508</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.533</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.548</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.53</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.531</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.61</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
								
                                <bold>SVM Par.</bold>
                                <break/>
                                <bold>(gamma)</bold>
							</td>
                            <td align="center" colspan="1" rowspan="1">0.0</td>
                            <td align="center" colspan="1" rowspan="1">0.5</td>
                            <td align="center" colspan="1" rowspan="1">1.0</td>
                            <td align="center" colspan="1" rowspan="1">1.0</td>
                            <td align="center" colspan="1" rowspan="1">0.75</td>
                            <td align="center" colspan="1" rowspan="1">1.5</td>
                            <td align="center" colspan="1" rowspan="1">0.75</td>
                            <td align="center" colspan="1" rowspan="1">0.5</td>
                            <td align="center" colspan="1" rowspan="1">1.0</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
								
                                <bold>SVM Par.</bold>
                                <break/>
                                <bold>(cost)</bold>
							</td>
                            <td align="center" colspan="1" rowspan="1">64</td>
                            <td align="center" colspan="1" rowspan="1">128</td>
                            <td align="center" colspan="1" rowspan="1">8</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                            <td align="center" colspan="1" rowspan="1">64</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                            <td align="center" colspan="1" rowspan="1">16</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
								
                                <bold>Selected</bold>
                                <break/>
                                <bold>genes</bold>
							</td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">MAP4</italic>,
                                <break/>
                                <italic toggle="yes">GBP1</italic>,
                                <break/>
                                <italic toggle="yes">FN1</italic>,
                                <break/>
                                <italic toggle="yes">MAPT</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>,
                                <break/>
                                <italic toggle="yes">FGF2</italic>,
                                <break/>
                                <italic toggle="yes">NFKB2</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4B</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">TWIST1</italic>,
                                <break/>
                                <italic toggle="yes">FN1</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>,
                                <break/>
                                <italic toggle="yes">FGF2</italic>,
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">ABCB11</italic>,
                                <break/>
                                <italic toggle="yes">BCL2</italic>,
                                <break/>
                                <italic toggle="yes">GBP1</italic>,
                                <break/>
                                <italic toggle="yes">SLCO1B3</italic>,
                                <break/>
                                <italic toggle="yes">ABCB1</italic>,
                                <break/>
                                <italic toggle="yes">BAD</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4A</italic>,
                                <break/>
                                <italic toggle="yes">MAPT</italic>,
                                <break/>
                                <italic toggle="yes">NFKB2</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4B</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">ABCB11</italic>,
                                <break/>
                                <italic toggle="yes">BCL2</italic>,
                                <break/>
                                <italic toggle="yes">MAP4</italic>,
                                <break/>
                                <italic toggle="yes">TUBB1</italic>,
                                <break/>
                                <italic toggle="yes">GBP1</italic>,
                                <break/>
                                <italic toggle="yes">SLCO1B3</italic>,
                                <break/>
                                <italic toggle="yes">ABCB1</italic>,
                                <break/>
                                <italic toggle="yes">BAD</italic>,
                                <break/>
                                <italic toggle="yes">TWIST1</italic>,
                                <break/>
                                <italic toggle="yes">FN1</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4A</italic>,
                                <break/>
                                <italic toggle="yes">MAPT</italic>,
                                <break/>
                                <italic toggle="yes">OPRK1</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>,
                                <break/>
                                <italic toggle="yes">FGF2</italic>,
                                <break/>
                                <italic toggle="yes">NFKB2</italic>,
                                <break/>
                                <italic toggle="yes">ABCC1</italic>,
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">BAD</italic>,
                                <break/>
                                <italic toggle="yes">GBP1</italic>,
                                <break/>
                                <italic toggle="yes">MAPT</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">ABCB11</italic>,
                                <break/>
                                <italic toggle="yes">MAP4</italic>,
                                <break/>
                                <italic toggle="yes">SLCO1B3</italic>,
                                <break/>
                                <italic toggle="yes">BAD</italic>,
                                <break/>
                                <italic toggle="yes">FN1</italic>,
                                <break/>
                                <italic toggle="yes">OPRK1</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>,
                                <break/>
                                <italic toggle="yes">NFKB2</italic>,
                                <break/>
                                <italic toggle="yes">NR1I2</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4B</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">ABCB11</italic>,
                                <break/>
                                <italic toggle="yes">SLCO1B3</italic>,
                                <break/>
                                <italic toggle="yes">BAD</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4A</italic>,
                                <break/>
                                <italic toggle="yes">MAPT</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>,
                                <break/>
                                <italic toggle="yes">FGF2</italic>,
                                <break/>
                                <italic toggle="yes">NFKB2</italic>,
                                <break/>
                                <italic toggle="yes">ABCC1</italic>,
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">ABCB11</italic>,
                                <break/>
                                <italic toggle="yes">BMF</italic>,
                                <break/>
                                <italic toggle="yes">BCL2</italic>,
                                <break/>
                                <italic toggle="yes">MAP4</italic>,
                                <break/>
                                <italic toggle="yes">TUBB1</italic>,
                                <break/>
                                <italic toggle="yes">GBP1</italic>,
                                <break/>
                                <italic toggle="yes">SLCO1B3</italic>,
                                <break/>
                                <italic toggle="yes">ABCB1</italic>,
                                <break/>
                                <italic toggle="yes">BAD</italic>,
                                <break/>
                                <italic toggle="yes">TWIST1</italic>,
                                <break/>
                                <italic toggle="yes">FN1</italic>,
                                <break/>
                                <italic toggle="yes">MAPT</italic>,
                                <break/>
                                <italic toggle="yes">OPRK1</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>,
                                <break/>
                                <italic toggle="yes">FGF2</italic>,
                                <break/>
                                <italic toggle="yes">NFKB2</italic>,
                                <break/>
                                <italic toggle="yes">ABCC1</italic>,
                                <break/>
                                <italic toggle="yes">NR1I2</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4B</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">MAP4</italic>,
                                <break/>
                                <italic toggle="yes">GBP1</italic>,
                                <break/>
                                <italic toggle="yes">SLCO1B3</italic>,
                                <break/>
                                <italic toggle="yes">BAD</italic>,
                                <break/>
                                <italic toggle="yes">MAPT</italic>,
                                <break/>
                                <italic toggle="yes">OPRK1</italic>,
                                <break/>
                                <italic toggle="yes">BBC3</italic>,
                                <break/>
                                <italic toggle="yes">NFKB2</italic>,
                                <break/>
                                <italic toggle="yes">ABCC1</italic>,
                                <break/>
                                <italic toggle="yes">NR1I2</italic>,
                                <break/>
                                <italic toggle="yes">TUBB4B</italic>
                            </td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn>
                        <p id="fn3">
                            <sup>1</sup> For patients treated with CT with &#x2265;4 Yr survival and with CT+HT for &#x2265;5 Yr, the cost for the mRMR model was set to 64. For those treated with CT for &#x2265;4 Yr, genes were selected using a greedy stepwise forward search; in all other cases, a greedy stepwise backward search was used. Also, gamma = 0 in all cases. 
                            <sup>2</sup> Predicted responses for individual METABRIC patients are provided in 
                            <xref ref-type="other" rid="DS0">Dataset 1</xref>.</p>
                    </fn>
                </table-wrap-foot>
            </table-wrap>
            <p>We compared the performance of several ML techniques in distinguishing paclitaxel sensitivity from resistance in METABRIC patients, using the tumour gene expression datasets from that study. We used mRMR to generate gene signatures and to determine which genes are important for treatment response in METABRIC patients. The paclitaxel models predict outcomes more accurately in patients receiving HT and/or CT than in other patient groups.</p>
            <p>SVMs and RF were trained using expression of genes associated with paclitaxel response, mechanism of action and stable genes in the biological pathways of these targets (
                <xref ref-type="fig" rid="f3">Figure 3</xref>). Pair-wise comparisons of these genes with those from MammaPrint and Oncotype DX (other genomic classifiers for breast cancer) show that these signatures are nearly independent of each other, with only a single gene overlap. The signatures differ because they were derived by different methods, based on different principles and for different purposes (i.e. drug response for a specific reagent). SVM models for drugs used to treat these patients were derived by backwards feature selection on patient subsets stratified by treatment or outcome (
                <xref ref-type="table" rid="T1">Table 1</xref>). The highest SVM accuracy was found for the paclitaxel signature in patients treated with HT and/or adjuvant chemotherapy (78.6%). Since some CT patients were also treated with tamoxifen, methotrexate, epirubicin, doxorubicin and 5-fluorouracil, we also evaluated the performance of models developed for these drugs using the same algorithm. These gene signatures also had acceptable performance (accuracies of 71&#x2013;76%; AUCs of 0.686&#x2013;0.766). Leave-one-out validation (CT and HT, no treatment, and deceased patients) exhibited higher model performance than 9-fold cross-validation (CT and/or HT, including patients treated with radiation).</p>
            <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                <label>Figure 3. </label>
                <caption>
                    <title>Schematic elements of gene expression changes associated with response to paclitaxel.</title>
                    <p>Red boxes indicate genes for which gene expression or copy number is positively correlated with resistance by multiple factor analysis. Blue boxes indicate a negative correlation. Genes outlined in dark grey are those in a previously published paclitaxel SVM model (reproduced from reference 
                        <xref ref-type="bibr" rid="ref-1">1</xref> with permission).</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/11525/99c35509-b811-4437-9f96-7158fdcdc41a_figure3.gif"/>
            </fig>
            <p>The RF classifier was used to predict paclitaxel therapy outcome for patients who underwent CT and/or HT (
                <xref ref-type="table" rid="T2">Table 2</xref>). The best RF performance was an overall accuracy of 85.5%, obtained using a 3-year survival threshold to distinguish therapeutic resistance from sensitivity in patients who underwent HT.</p>
            <p>Among CT/HT patients, the SVM with mRMR feature selection predicted the outcome of paclitaxel therapy with the best overall accuracy and AUC (sensitivity and specificity) for CT patients with 4-year survival (
                <xref ref-type="table" rid="T3">Table 3</xref>). Outcomes for HT patients with 3-year survival were predicted with 85.7% accuracy; however, the specificity was lower in this group. SVM combined with mRMR gave more accurate feature selection and prediction of response to hormone and/or chemotherapy based on survival time than either SVM or RF alone. Predicted treatment responses for individual METABRIC patients using the described ML techniques are provided in 
                <xref ref-type="other" rid="DS0">Dataset 1</xref>.</p>
            <p>We also assessed the separate 
                <italic toggle="yes">Discovery</italic> and 
                <italic toggle="yes">Validation</italic> datasets as training and test sets, respectively, and repeated the previous experiments. In this scenario, the performance of the model was poor (slightly better than random). This occurred because the gene expression distributions of many of the paclitaxel-related genes in our signature were not reproducible between these two sets and were, in fact, quite different (based on Wilcoxon rank sum test, Kruskal-Wallis test and t-tests; 
                <xref ref-type="other" rid="SM1">Supplementary file 1</xref>). This heterogeneity indicates that it is inappropriate to test gene expression signatures derived from one of these datasets using the other dataset. Furthermore, these gene expression differences also affect the performance of these methods (compare 
                <xref ref-type="table" rid="T2">Table 2</xref> and 
                <xref ref-type="table" rid="T4">Table 4</xref> for RF; 
                <xref ref-type="table" rid="T3">Table 3</xref> and 
                <xref ref-type="table" rid="T5">Table 5</xref> for mRMR).</p>
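            <p>As a minimal sketch of the kind of per-gene distribution comparison described above, the following Python function implements a two-sided Wilcoxon rank sum test using a normal approximation with tie correction and no continuity correction. The function name, the two short lists of expression values, and the choice of the normal approximation are assumptions made for illustration only; they are not the code or data used in this study.</p>
            <preformat preformat-type="code">
```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Uses the normal approximation with tie correction and no
    continuity correction. Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i != n:
        j = i
        while j + 1 != n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # Every value is tied, so the samples are indistinguishable.
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p_value


# Hypothetical expression values for one gene in two patient sets.
discovery_expr = [5.1, 5.4, 5.0, 5.6, 5.3, 5.2, 5.5, 5.7]
validation_expr = [6.8, 7.1, 6.9, 7.3, 7.0, 7.2, 6.7, 7.4]
u_stat, p_value = rank_sum_test(discovery_expr, validation_expr)
print(f"U = {u_stat}, p = {p_value:.4g}")
```
            </preformat>
            <p>A small p-value for a given gene suggests that its expression distribution differs between the two sets. In practice, testing many genes in this way would also call for a multiple-testing correction, which this sketch omits.</p>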
            <p>To evaluate the paclitaxel models without relying on the 
                <italic toggle="yes">Validation</italic> dataset, the 
                <italic toggle="yes">Discovery</italic> set was split into two disjoint parts: 70% of the patient samples, randomly selected, for training, and the remaining 30% for testing. This procedure was repeated 100 times using different combinations of training and test samples, and the median performance of these runs is reported (
                <xref ref-type="table" rid="T4">Table 4</xref> and 
                <xref ref-type="table" rid="T5">Table 5</xref>). We also compared the performance of our mRMR+SVM model with the 
                <italic toggle="yes">K-TSP</italic> model (presented in 
                <xref ref-type="bibr" rid="ref-11">11</xref>; 
                <xref ref-type="table" rid="T6">Table 6</xref>). In most cases, our method outperformed K-TSP, based on its accuracy in classifying new patients.</p>
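            <p>The repeated 70%/30% split procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in this study: a simple nearest-centroid classifier stands in for the SVM and RF models, the data are synthetic, and all function names are hypothetical.</p>
            <preformat preformat-type="code">
```python
import random
from statistics import mean, median


def fit_centroids(X, y):
    """Per-class mean vector (a stand-in classifier, not the paper's SVM)."""
    centroids = {}
    for label in set(y):
        members = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def repeated_holdout(X, y, train_frac=0.7, repeats=100, seed=0):
    """Median test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_train = round(train_frac * len(X))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return median(accuracies)


# Synthetic two-feature data with two well-separated outcome classes.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(50)]
y = ["resistant"] * 50 + ["sensitive"] * 50

median_acc = repeated_holdout(X, y)
print(f"Median accuracy over 100 splits: {median_acc:.3f}")
```
            </preformat>
            <p>Reporting the median rather than a single split reduces the influence of any one unusually favourable or unfavourable partition, which matters most for the smaller patient groups.</p>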
            <table-wrap id="T4" orientation="portrait" position="anchor">
                <label>Table 4. </label>
                <caption>
                    <title>Results of applying RF to predict outcome of the paclitaxel signature for the METABRIC 
                        <italic toggle="yes">Discovery</italic> patient set.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Type of
                                <break/>treatment</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Survival
                                <break/>years (as
                                <break/>threshold)</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;"># Patients</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">
                                <italic toggle="yes">K</italic> (number
                                <break/>of genes
                                <break/>to be used
                                <break/>in random
                                <break/>selection)</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Accuracy
                                <break/>(True
                                <break/>Positive -
                                <break/>TP) (%)</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">Precision</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">F-Measure</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">MCC</th>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7;">AUC</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="center" colspan="1" rowspan="3">Chemotherapy
                                <break/>(CT)</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="3">22</td>
                            <td align="center" colspan="1" rowspan="1">7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">61.1</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.617</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.612</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.224</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.444</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">66.7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.643</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.646</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.189</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.715</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">19</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">66.7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.722</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.687</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.189</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.571</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="3">Hormone therapy
                                <break/>(HT)</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="3">185</td>
                            <td align="center" colspan="1" rowspan="1">19</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">77.0</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.780</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.775</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.018</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.524</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">79.1</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.733</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.710</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.084</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.527</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">68.9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.533</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.601</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">-0.133</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.594</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="3">CT and/or HT</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="3">221</td>
                            <td align="center" colspan="1" rowspan="1">9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">80.2</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.677</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.734</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">-0.07</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.389</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">19</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">54.8</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.554</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.551</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">-0.143</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.395</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">60.5</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.567</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.579</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.016</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.479</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn>
                        <p>The paclitaxel gene panel consisted of 19 genes (
                            <italic toggle="yes">ABCB1</italic>, 
                            <italic toggle="yes">ABCB11</italic>, 
                            <italic toggle="yes">ABCC1</italic>, 
                            <italic toggle="yes">ABCC10</italic>, 
                            <italic toggle="yes">BAD</italic>, 
                            <italic toggle="yes">BBC3</italic>, 
                            <italic toggle="yes">BCL2</italic>, 
                            <italic toggle="yes">BCL2L1</italic>, 
                            <italic toggle="yes">BMF</italic>, 
                            <italic toggle="yes">CYP2C8</italic>, 
                            <italic toggle="yes">CYP3A4</italic>, 
                            <italic toggle="yes">MAP2</italic>, 
                            <italic toggle="yes">MAP4</italic>, 
                            <italic toggle="yes">MAPT</italic>, 
                            <italic toggle="yes">NR1I2</italic>, 
                            <italic toggle="yes">SLCO1B3</italic>, 
                            <italic toggle="yes">TUBB1</italic>, 
                            <italic toggle="yes">TUBB4A</italic>, 
                            <italic toggle="yes">TUBB4B</italic>).</p>
                    </fn>
                </table-wrap-foot>
            </table-wrap>
            <table-wrap id="T5" orientation="portrait" position="anchor">
                <label>Table 5. </label>
                <caption>
                    <title>Results of mRMR feature selection for an SVM predicting outcome from the paclitaxel signature in the METABRIC 
                        <italic toggle="yes">Discovery</italic> patient set.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">Treatment</th>
                            <th align="center" colspan="3" rowspan="1">CT
                                <sup>
                                    <xref ref-type="other" rid="fn4">1</xref>
                                </sup>
                            </th>
                            <th align="center" colspan="3" rowspan="1">HT</th>
                            <th align="center" colspan="3" rowspan="1">CT+HT</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Survival</bold>
                                <break/>
                                <bold>years (as</bold>
                                <break/>
                                <bold>threshold)</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold># patients</bold>
                            </td>
                            <td align="center" colspan="3" rowspan="1">22</td>
                            <td align="center" colspan="3" rowspan="1">185</td>
                            <td align="center" colspan="3" rowspan="1">221</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Accuracy</bold>
                                <break/>
                                <bold>(TP) (%)</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">57.14</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">57.14</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">85.7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">81.8</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">70.9</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">63.6</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">71.2</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">69.7</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">71.2</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Precision</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.595</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.686</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.735</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.726</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.670</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.532</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.647</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.629</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.693</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>F-Measure</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.571</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.623</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.791</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.769</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.686</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.562</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.668</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.628</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.666</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>MCC</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.167</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">-0.258</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.000</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">-0.080</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.032</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">-0.075</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.035</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.071</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.245</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>AUC</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.583</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.333</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.500</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.479</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.514</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.477</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.513</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.521</td>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#E5DFEC">0.586</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>SVM Par.</bold>
                                <break/>
                                <bold>(gamma)</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1">0.0</td>
                            <td align="center" colspan="1" rowspan="1">0.5</td>
                            <td align="center" colspan="1" rowspan="1">1.0</td>
                            <td align="center" colspan="1" rowspan="1">1.0</td>
                            <td align="center" colspan="1" rowspan="1">0.75</td>
                            <td align="center" colspan="1" rowspan="1">1.5</td>
                            <td align="center" colspan="1" rowspan="1">0.75</td>
                            <td align="center" colspan="1" rowspan="1">0.5</td>
                            <td align="center" colspan="1" rowspan="1">1.0</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>SVM Par.</bold>
                                <break/>
                                <bold>(cost)</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1">64</td>
                            <td align="center" colspan="1" rowspan="1">128</td>
                            <td align="center" colspan="1" rowspan="1">8</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                            <td align="center" colspan="1" rowspan="1">64</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                            <td align="center" colspan="1" rowspan="1">16</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                            <td align="center" colspan="1" rowspan="1">2</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Selected</bold>
                                <break/>
                                <bold>genes</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">TWIST1</italic>
                                <break/>
                                <italic toggle="yes">BMF</italic>
                                <break/>
                                <italic toggle="yes">CYP2C8</italic>
                                <break/>
                                <italic toggle="yes">CYP3A4</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">BAD</italic>
                                <break/>
                                <italic toggle="yes">MAP2</italic>
                                <break/>
                                <italic toggle="yes">MAPT</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                                <break/>
                                <italic toggle="yes">FN1</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">BCL2</italic>
                                <break/>
                                <italic toggle="yes">BMF</italic>
                                <break/>
                                <italic toggle="yes">CYP2C8</italic>
                                <break/>
                                <italic toggle="yes">CYP3A4</italic>
                                <break/>
                                <italic toggle="yes">BAD</italic>
                                <break/>
                                <italic toggle="yes">ABCC10</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">MAP2</italic>
                                <break/>
                                <italic toggle="yes">BCL2</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">MAPT</italic>
                                <break/>
                                <italic toggle="yes">GBP1</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">TWIST1</italic>
                                <break/>
                                <italic toggle="yes">BCL2</italic>
                                <break/>
                                <italic toggle="yes">BMF</italic>
                                <break/>
                                <italic toggle="yes">CYP2C8</italic>
                                <break/>
                                <italic toggle="yes">CYP3A4</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">TLR6</italic>
                                <break/>
                                <italic toggle="yes">BAD</italic>
                                <break/>
                                <italic toggle="yes">ABCB11</italic>
                                <break/>
                                <italic toggle="yes">ABCC1</italic>
                                <break/>
                                <italic toggle="yes">ABCC10</italic>
                                <break/>
                                <italic toggle="yes">MAP4</italic>
                                <break/>
                                <italic toggle="yes">MAPT</italic>
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                                <break/>
                                <italic toggle="yes">GBP1</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                                <break/>
                                <italic toggle="yes">OPRK1</italic>
                                <break/>
                                <italic toggle="yes">FN1</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">TWIST1</italic>
                                <break/>
                                <italic toggle="yes">CYP2C8</italic>
                                <break/>
                                <italic toggle="yes">CYP3A4</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">TLR6</italic>
                                <break/>
                                <italic toggle="yes">ABCB11</italic>
                                <break/>
                                <italic toggle="yes">ABCC1</italic>
                                <break/>
                                <italic toggle="yes">ABCC10</italic>
                                <break/>
                                <italic toggle="yes">MAP2</italic>
                                <break/>
                                <italic toggle="yes">MAPT</italic>
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                                <break/>
                                <italic toggle="yes">GBP1</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                                <break/>
                                <italic toggle="yes">FN1</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">TWIST1</italic>
                                <break/>
                                <italic toggle="yes">BMF</italic>
                                <break/>
                                <italic toggle="yes">CYP2C8</italic>
                                <break/>
                                <italic toggle="yes">CYP3A4</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">ABCB11</italic>
                                <break/>
                                <italic toggle="yes">ABCC1</italic>
                                <break/>
                                <italic toggle="yes">ABCC10</italic>
                                <break/>
                                <italic toggle="yes">MAP2</italic>
                                <break/>
                                <italic toggle="yes">MAP4</italic>
                                <break/>
                                <italic toggle="yes">MAPT</italic>
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                                <break/>
                                <italic toggle="yes">GBP1</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                                <break/>
                                <italic toggle="yes">OPRK1</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">BMF</italic>
                                <break/>
                                <italic toggle="yes">CYP2C8</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">BAD</italic>
                                <break/>
                                <italic toggle="yes">ABCC1</italic>
                                <break/>
                                <italic toggle="yes">ABCC10</italic>
                                <break/>
                                <italic toggle="yes">MAP4</italic>
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                                <break/>
                                <italic toggle="yes">GBP1</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                                <break/>
                                <italic toggle="yes">OPRK1</italic>
                                <break/>
                                <italic toggle="yes">FN1</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">TWIST1</italic>
                                <break/>
                                <italic toggle="yes">BMF</italic>
                                <break/>
                                <italic toggle="yes">CYP2C8</italic>
                                <break/>
                                <italic toggle="yes">CYP3A4</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">TLR6</italic>
                                <break/>
                                <italic toggle="yes">ABCB11</italic>
                                <break/>
                                <italic toggle="yes">ABCC1</italic>
                                <break/>
                                <italic toggle="yes">ABCC10</italic>
                                <break/>
                                <italic toggle="yes">MAP2</italic>
                                <break/>
                                <italic toggle="yes">MAP4</italic>
                                <break/>
                                <italic toggle="yes">MAPT</italic>
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                                <break/>
                                <italic toggle="yes">GBP1</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                                <break/>
                                <italic toggle="yes">OPRK1</italic>
                                <break/>
                                <italic toggle="yes">FN1</italic>
                            </td>
                            <td align="center" colspan="1" rowspan="1">
                                <italic toggle="yes">TWIST1</italic>
                                <break/>
                                <italic toggle="yes">BMF</italic>
                                <break/>
                                <italic toggle="yes">CYP3A4</italic>
                                <break/>
                                <italic toggle="yes">BCL2L1</italic>
                                <break/>
                                <italic toggle="yes">BBC3</italic>
                                <break/>
                                <italic toggle="yes">TLR6</italic>
                                <break/>
                                <italic toggle="yes">BAD</italic>
                                <break/>
                                <italic toggle="yes">ABCB11</italic>
                                <break/>
                                <italic toggle="yes">ABCC1</italic>
                                <break/>
                                <italic toggle="yes">MAP2</italic>
                                <break/>
                                <italic toggle="yes">MAP4</italic>
                                <break/>
                                <italic toggle="yes">MAPT</italic>
                                <break/>
                                <italic toggle="yes">NR1I2</italic>
                                <break/>
                                <italic toggle="yes">GBP1</italic>
                                <break/>
                                <italic toggle="yes">NFKB2</italic>
                                <break/>
                                <italic toggle="yes">OPRK1</italic>
                                <break/>
                                <italic toggle="yes">FN1</italic>
                            </td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn>
                        <p id="fn4">
                            <sup>1</sup>For patients treated with CT with &#x2265;4 yr survival and with CT+HT with &#x2265;5 yr survival, the cost for the mRMR model was set to 64. For patients treated with CT with &#x2265;4 yr survival, genes were selected using a greedy stepwise forward search; in all other cases, a greedy stepwise backward search was used. Also, gamma = 0 in all cases.</p>
                    </fn>
                </table-wrap-foot>
            </table-wrap>
            <table-wrap id="T6" orientation="portrait" position="anchor">
                <label>Table 6. </label>
                <caption>
                    <title>Comparison between our mRMR+SVM method and the K-TSP method on the 
                        <italic toggle="yes">Discovery</italic> patient set of the METABRIC data.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">Data</th>
                            <th align="center" colspan="3" rowspan="1">CT</th>
                            <th align="center" colspan="3" rowspan="1">HT</th>
                            <th align="center" colspan="3" rowspan="1">CT+HT</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>Survival years</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                            <td align="center" colspan="1" rowspan="1">3</td>
                            <td align="center" colspan="1" rowspan="1">4</td>
                            <td align="center" colspan="1" rowspan="1">5</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold># patients</bold>
                            </td>
                            <td align="center" colspan="3" rowspan="1">22</td>
                            <td align="center" colspan="3" rowspan="1">185</td>
                            <td align="center" colspan="3" rowspan="1">221</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>mRMR+SVM Accuracy (%)</bold>
                            </td>
                            <td align="center" colspan="1" rowspan="1">57.14</td>
                            <td align="center" colspan="1" rowspan="1">57.14</td>
                            <td align="center" colspan="1" rowspan="1">85.7</td>
                            <td align="center" colspan="1" rowspan="1">81.8</td>
                            <td align="center" colspan="1" rowspan="1">70.9</td>
                            <td align="center" colspan="1" rowspan="1">63.6</td>
                            <td align="center" colspan="1" rowspan="1">71.21</td>
                            <td align="center" colspan="1" rowspan="1">69.70</td>
                            <td align="center" colspan="1" rowspan="1">71.21</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1" style="background-color:#DDEBF7">
                                <bold>K-TSP</bold>
                                <sup>
                                    <xref ref-type="bibr" rid="ref-11">11</xref>
                                </sup> 
                                <bold>Accuracy (%)</bold>
							</td>
                            <td align="center" colspan="1" rowspan="1">57.14</td>
                            <td align="center" colspan="1" rowspan="1">28.57</td>
                            <td align="center" colspan="1" rowspan="1">28.57</td>
                            <td align="center" colspan="1" rowspan="1">80.91</td>
                            <td align="center" colspan="1" rowspan="1">68.18</td>
                            <td align="center" colspan="1" rowspan="1">69.19</td>
                            <td align="center" colspan="1" rowspan="1">71.21</td>
                            <td align="center" colspan="1" rowspan="1">54.55</td>
                            <td align="center" colspan="1" rowspan="1">53.03</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn>
                        <p>The performance of several ML techniques in distinguishing paclitaxel sensitivity from resistance in METABRIC patients was compared using the tumour gene expression datasets. We used mRMR to generate gene signatures and determine which genes are important for treatment response in METABRIC patients. The paclitaxel models predict outcomes more accurately in patients receiving HT and/or CT than in other patient groups.</p>
                    </fn>
                </table-wrap-foot>
            </table-wrap>
            <p>While not a replication study 
                <italic toggle="yes">sensu stricto</italic>, the initial paclitaxel gene set used for feature selection was the same as in our previous study
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. Predictions for the METABRIC patient cohort, which was independent of the previous validation set
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup> used in Dorman 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>, made with either the same (SVM) or different ML methods (RF and SVM with mRMR), exhibited accuracies comparable to or better than those of our previous gene signature
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>.</p>
            <p>These techniques are powerful tools which can be used to identify genes that may be involved in drug resistance, as well as predict patient survival after treatment. Future efforts to expand these models to other drugs may assist in suggesting preferred treatments in specific patients, with the potential impact of improving efficacy and reducing duration of therapy.</p>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusion</title>
            <p>In this study, we used the METABRIC dataset to predict outcome for different survival times in patients receiving hormone therapy (HT) and, in some cases, chemotherapy (CT) agents. We used different machine learning methods to identify the best subset of genes that can accurately predict therapeutic response in patients who have undergone chemotherapy, hormone therapy, or a combination of both treatments. Unlike the MammaPrint and Oncotype DX tests, this model focuses on predicting survival based on gene expression in the tumor, presumably before or during drug therapy. This approach may be useful for selecting, for individual patients, specific therapies that would be expected to produce a favorable response.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>The data referenced by this article are under copyright with the following copyright statement: Copyright: &#x00a9; 2017 Rezaeian I et al.</p>
            <p>Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
                <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/"/>
            </p>
            <p>
                <italic toggle="yes">Patient data</italic>: The METABRIC datasets are accessible from the European Genome-Phenome Archive (EGA) using the accession number EGAS00000000083 (
                <ext-link ext-link-type="uri" xlink:href="https://www.ebi.ac.uk/ega/studies/EGAS00000000083">https://www.ebi.ac.uk/ega/studies/EGAS00000000083</ext-link>). Normalized patient expression data for the 
                <italic toggle="yes">Discovery</italic> (EGAD00010000210) and 
                <italic toggle="yes">Validation</italic> sets (EGAD00010000211) were retrieved with permission from EGA. Corresponding clinical data were obtained from the literature
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>. Although treatments were not individually curated, HT patients were treated with tamoxifen and/or aromatase inhibitors, and CT patients were most commonly treated with cyclophosphamide-methotrexate-fluorouracil (CMF), epirubicin-CMF, or doxorubicin-cyclophosphamide.</p>
            <p>F1000Research: Dataset 1. Predicted treatment response for each individual METABRIC patient, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9417.d149864">10.5256/f1000research.9417.d149864</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>
            </p>
        </sec>
    </body>
    <back>
        <sec id="SM1" sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p>
                <bold>Variation of Gene Expression Distribution between Discovery and Validation Datasets.</bold>
            </p>
            <p>Whisker plots showing the distribution of expression in the 
                <italic toggle="yes">Discovery</italic> and 
                <italic toggle="yes">Validation</italic> METABRIC datasets for 26 genes used in the paclitaxel gene signature.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/9417/706c37bd-db76-4207-a91a-595c8952f7a7.xlsx">Click here to access the data</ext-link>.</p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dorman</surname>
                            <given-names>SN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Baranova</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Knoll</surname>
                            <given-names>JH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning.</article-title>
                    <source>

                        <italic toggle="yes">Mol Oncol.</italic>
</source>
                    <year>2016</year>;<volume>10</volume>(<issue>1</issue>):<fpage>85</fpage>&#x2013;<lpage>100</lpage>.
                    <pub-id pub-id-type="pmid">26372358</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molonc.2015.07.006</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Daemen</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Griffith</surname>
                            <given-names>OL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Heiser</surname>
                            <given-names>LM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Modeling precision treatment of breast cancer.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2013</year>;<volume>14</volume>(<issue>10</issue>):<fpage>R110</fpage>.
                    <pub-id pub-id-type="pmid">24176112</pub-id>
                    <pub-id pub-id-type="doi">10.1186/gb-2013-14-10-r110</pub-id>
                    <pub-id pub-id-type="pmcid">3937590</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shoemaker</surname>
                            <given-names>RH</given-names>
                        </name>
</person-group>:
                    <article-title>The NCI60 human tumour cell line anticancer drug screen.</article-title>
                    <source>

                        <italic toggle="yes">Nat Rev Cancer.</italic>
</source>
                    <year>2006</year>;<volume>6</volume>(<issue>10</issue>):<fpage>813</fpage>&#x2013;<lpage>823</lpage>.
                    <pub-id pub-id-type="pmid">16990858</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nrc1951</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Park</surname>
                            <given-names>NI</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rogan</surname>
                            <given-names>PK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tarnowski</surname>
                            <given-names>HE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Structural and genic characterization of stable genomic regions in breast cancer: Relevance to chemotherapy.</article-title>
                    <source>

                        <italic toggle="yes">Mol Oncol.</italic>
</source>
                    <year>2012</year>;<volume>6</volume>(<issue>3</issue>):<fpage>347</fpage>&#x2013;<lpage>359</lpage>.
                    <pub-id pub-id-type="pmid">22342187</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molonc.2012.01.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hatzis</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pusztai</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Valero</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer.</article-title>
                    <source>

                        <italic toggle="yes">JAMA.</italic>
</source>
                    <year>2011</year>;<volume>305</volume>(<issue>18</issue>):<fpage>1873</fpage>&#x2013;<lpage>1881</lpage>.
                    <pub-id pub-id-type="pmid">21558518</pub-id>
                    <pub-id pub-id-type="doi">10.1001/jama.2011.593</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Curtis</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shah</surname>
                            <given-names>SP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chin</surname>
                            <given-names>SF</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2012</year>;<volume>486</volume>(<issue>7403</issue>):<fpage>346</fpage>&#x2013;<lpage>352</lpage>.
                    <pub-id pub-id-type="pmid">22522925</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature10983</pub-id>
                    <pub-id pub-id-type="pmcid">3440846</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="book">
                    <article-title>MATLAB and Statistics Toolbox Release 2014a</article-title>. The MathWorks Inc., Natick, Massachusetts, United States.</mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ben-Hur</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Weston</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>A user&#x2019;s guide to support vector machines.</article-title>
                    <source>

                        <italic toggle="yes">Methods Mol Biol.</italic>
</source>
                    <year>2010</year>;<volume>609</volume>:<fpage>223</fpage>&#x2013;<lpage>239</lpage>.
                    <pub-id pub-id-type="pmid">20221922</pub-id>
                    <pub-id pub-id-type="doi">10.1007/978-1-60327-241-4_13</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hall</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Frank</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Holmes</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The WEKA data mining software: an update.</article-title>
                    <source>

                        <italic toggle="yes">ACM SIGKDD Explorations Newsletter.</italic>
</source>
                    <year>2009</year>;<volume>11</volume>(<issue>1</issue>):<fpage>10</fpage>&#x2013;<lpage>18</lpage>.
                    <pub-id pub-id-type="doi">10.1145/1656274.1656278</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ding</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Peng</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Minimum redundancy feature selection from microarray gene expression data.</article-title>
                    <source>

                        <italic toggle="yes">J Bioinform Comput Biol.</italic>
</source>
                    <year>2005</year>;<volume>3</volume>(<issue>2</issue>):<fpage>185</fpage>&#x2013;<lpage>205</lpage>.
                    <pub-id pub-id-type="pmid">15852500</pub-id>
                    <pub-id pub-id-type="doi">10.1142/S0219720005001004</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marchionni</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Afsari</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Geman</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A simple and reproducible breast cancer prognostic test.</article-title>
                    <source>

                        <italic toggle="yes">BMC Genomics.</italic>
</source>
                    <year>2013</year>;<volume>14</volume>:<fpage>336</fpage>.
                    <pub-id pub-id-type="pmid">23682826</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2164-14-336</pub-id>
                    <pub-id pub-id-type="pmcid">3662649</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rezaeian</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mucaki</surname>
                            <given-names>EJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Baranova</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Dataset 1 in: Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) Study by Machine Learning.</article-title>
                    <source>

                        <italic toggle="yes">F1000Research.</italic>
</source>
                    <year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9417.d149864">Data Source</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report19726">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.11525.r19726</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Tung</surname>
                        <given-names>Chun-Wei</given-names>
                    </name>
                    <xref ref-type="aff" rid="r19726a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3011-8440</uri>
                </contrib>
                <aff id="r19726a1">
                    <label>1</label>School of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>21</day>
                <month>2</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Tung CW</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport19726" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9417.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have addressed all the concerns raised in the previous review. A minor comment on batch effects follows. As batch effects are expected for heterogeneous datasets, directly applying a prediction model built on the discovery dataset to the validation dataset would be incorrect and would usually result in poor performance. To show the usefulness of the gene signatures while minimizing batch effects, the authors might consider running cross-validation on the validation dataset alone using the gene signatures obtained from the discovery dataset.</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <sub-article article-type="response" id="comment2663-19726">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Rogan</surname>
                            <given-names>Peter</given-names>
                        </name>
                        <aff>University of Western Ontario, Canada</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>PKR cofounded Cytognomix. A patent application related to biologically inspired gene signatures is pending. The other authors declare that they have no competing interests.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>21</day>
                    <month>4</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you for your suggestion. As recommended, we have repeated the 70/30% cross-validation analysis performed in the manuscript (Tables 4 and 5) with the same genes obtained from the 
                    <italic>Discovery</italic> dataset, but using the 
                    <italic>Validation</italic> dataset alone for training and testing. This analysis performed similarly to the analysis reported in the main manuscript, with some exceptions. The mRMR+SVM gene signature developed using &#x201c;CT-only&#x201d; patients at a 5-year threshold was much less accurate using the 
                    <italic>Validation</italic> data. However, the &#x201c;CT-only&#x201d; subset of the 
                    <italic>Validation</italic> dataset is small (N=31), and thus variability is not unexpected. Overall, this analysis suggests that the cross-validation issue was indeed mostly due to batch effects.</p>
                <p>The following sentence describing this result was added to the main text:</p>
                <p>&#x201c;Starting with the same set of 
                    <italic>Discovery</italic> genes, we also trained a separate model using the 
                    <italic>Validation</italic> data, and tested this data by 70/30% cross-validation (accuracy for RF: 56-67% [CT], 67-83% [HT], 56-81% [CT-HT]; accuracy for mRMR: 33-56% [CT], 70-84% [HT], 64-82% [CT-HT]).&#x201d;</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report19727">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.11525.r19727</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Fertig</surname>
                        <given-names>Elana J.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r19727a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r19727a1">
                    <label>1</label>Division of Oncology Biostatistics and Bioinformatics, School of Medicine, Johns Hopkins University, Baltimore, MD, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>3</day>
                <month>2</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Fertig EJ</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport19727" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9417.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors were very responsive to the previous round of reviews, including more robust cross-validation and cross-study validation and comparison with other classifiers. A particular concern remains: the authors&#x2019; conclusion that it is inappropriate to perform cross-study validation due to batch effects is incorrect, particularly since this challenging task is essential to assess overfitting and for clinical translation of classifiers. In addition, the conclusion was insufficiently revised to place their classifier in the context of the broader literature in this field.</p>
            <p> 
                <bold>Methods</bold> 
                <list list-type="order">
                    <list-item>
                        <p>Abbreviations SVM and RF must be spelled out as Support Vector Machine and Random Forest on first use. This was not addressed in the revised methods section.</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Results</bold> 
                <list list-type="order">
                    <list-item>
                        <p>The authors did perform a robust cross-study validation, as requested in the previous review. We agree this is challenging, due in part to batch effects as reported in this manuscript. However, such cross-study validation is essential to assess the accuracy of classifiers. It is also essential for translating genomic signatures into the clinic, where even different assays may be used. To address these concerns the authors must do the following: (a) Remove the sentence &#x201c;This heterogeneity indicates that it is inappropriate to test our gene expression signatures derived by one of these datasets using the other dataset.&#x201d; (b) Discuss the importance of cross-study validation, the challenges in this application, and the potential for overfitting suggested by these results.</p>
                    </list-item>
                    <list-item>
                        <p>The authors&#x2019; response that specific therapies were not provided in METABRIC is incorrect. According to Curtis 
                            <italic>et al.</italic>, (2012) &#x201c;Nearly all oestrogen receptor (ER)-positive and/or lymph node (LN)-negative patients did not receive chemotherapy, whereas ER-negative and LN-positive patients did. Additionally, none of the HER2
                            <sup>+</sup> patients received trastuzumab. As such, the treatments were homogeneous with respect to clinically relevant groupings.&#x201d; Therefore, the previous criticism #12 remains. Covariates such as ER/HER2/LN or PAM50 subtypes must be included in a table describing the sample cohorts. In addition, accuracy must be computed separately for these covariates, or they must be included in the machine learning model.</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Conclusion</bold> 
                <list list-type="order">
                    <list-item>
                        <p>The discussion is insufficient. It still lacks adequate context of existing genomics classifiers in the literature. The discrepancy between their algorithm and clinical assays is confusing in the revised sentence &#x201c;Unlike Mammaprint and Oncotype Dx tests, this model focuses on predicting survival prediction based on gene expression in the tumor, presumably before or during drug therapy.&#x201d; As written, it appears to disregard the long history of predicting clinical outcome from gene expression, in which classifiers developed from gene expression data (e.g., van't Veer 
                            <italic>et al.,</italic> 2002) were turned into clinical assays based upon expression of smaller numbers of genes.</p>
                    </list-item>
                    <list-item>
                        <p>Based on the previous review, the authors include context with other predictions of the METABRIC data in the response to the reviewers. This must also be included in the Conclusion to assess the relevance of their findings in the literature.</p>
                    </list-item>
                </list>
            </p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-19727-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups.</article-title>
                        <source>
                            <italic>Nature</italic>
                        </source>.<year>2012</year>;<volume>486</volume>(<issue>7403</issue>) :
                        <elocation-id>10.1038/nature10983</elocation-id>
                        <fpage>346</fpage>-<lpage>52</lpage>
                        <pub-id pub-id-type="pmid">22522925</pub-id>
                        <pub-id pub-id-type="doi">10.1038/nature10983</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-19727-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Gene expression profiling predicts clinical outcome of breast cancer.</article-title>
                        <source>
                            <italic>Nature</italic>
                        </source>.<year>2002</year>;<volume>415</volume>(<issue>6871</issue>) :
                        <elocation-id>10.1038/415530a</elocation-id>
                        <fpage>530</fpage>-<lpage>6</lpage>
                        <pub-id pub-id-type="pmid">11823860</pub-id>
                        <pub-id pub-id-type="doi">10.1038/415530a</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment2664-19727">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Rogan</surname>
                            <given-names>Peter</given-names>
                        </name>
                        <aff>University of Western Ontario, Canada</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>PKR cofounded Cytognomix. A patent application related to biologically inspired gene signatures is pending. The other authors declare that they have no competing interests.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>21</day>
                    <month>4</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>Methods</bold>
                </p>
                <p>
                    <italic>Comment 1. Abbreviations SVM and RF must be spelled out as Support Vector Machine and Random Forest on first use. This was not addressed in the revised methods section.</italic>
                </p>
                <p>Response: These abbreviations are now spelled out upon their first use in the main text (Methods section).</p>
                <p>
                    <bold>Results</bold>
                </p>
                <p>
                    <italic>Comment 2. The authors did perform a robust cross-study validation, as requested in the previous review. We agree this is challenging, due in part to batch effects as reported in this manuscript. However, such cross-study validation is essential to assess the accuracy of classifiers. It is also essential for translating genomic signatures into the clinic, where even different assays may be used. To address these concerns the authors must do the following: (a) Remove the sentence &#x201c;This heterogeneity indicates that it is inappropriate to test our gene expression signatures derived by one of these datasets using the other dataset.&#x201d; (b) Discuss the importance of cross-study validation, the challenges in this application, and the potential for overfitting suggested by these results.</italic>
                </p>
                <p>Response: Regarding this point:</p>
                <p>(a) This sentence has been removed, as requested.</p>
                <p>(b) To address concerns regarding potential overfitting of our models, we validated the derived models on a non-METABRIC data set (from an independent study). In the Sage Bionetworks / DREAM Breast Cancer Prognosis Challenge, cross-study validation was performed using the &#x201c;OsloVal&#x201d; data set, which consists of gene expression and copy number data from 184 breast cancer patients (Margolin 
                    <italic>et al.</italic>, 2013). However, this dataset is not publicly available and requires Ethics Board / IRB review, which we did not believe to be worth the effort. Instead, we performed cross-study validation on the gene expression data of 310 breast cancer patients made publicly available by Hatzis 
                    <italic>et al.</italic> (2011).</p>
                <p>Analysis of this dataset was successful for the mRMR + SVM models developed using chemotherapy-treated patients (&#x201c;CT&#x201d; models), where the threshold for resistance was set to 3 years and 4 years. The &#x201c;CT 3-year&#x201d; model performed well predicting responsive patients (74.2% accuracy), while the &#x201c;CT 4-year&#x201d; model performed better predicting non-responsive patients (75.1% accuracy). The &#x201c;CT 4-year&#x201d; model outperformed the &#x201c;CT 5-year&#x201d; model for both sensitive and resistant patient data sets.</p>
                <p>Random Forest and mRMR+SVM models which used hormone-treated patients (&#x201c;HT&#x201d; and &#x201c;CT+HT&#x201d;) were much less accurate than the &#x201c;CT-only&#x201d; models, and predicted a large percentage of patients from the Hatzis data as sensitive.</p>
                <p>In the main manuscript, we have replaced the sentence removed in (a) with the following:</p>
                <p>&#x201c;Cross-study validation allows for the comparison of classification accuracy between the generated gene signatures. The observed heterogeneity in gene expression highlights one of the many challenges of cross-validation of gene signatures between these&#x00a0; data from the same study exhibit drastic differences (for example, 
                    <italic>BCL2L1</italic>; Supplementary file 2). Furthermore, these gene expression differences also affect the performance of these methods when these datasets were combined (compare Table 2 and Table 4 for RF; Table 3 and Table 5 for mRMR). We considered the possibility that the Discovery model might be subject to overfitting. We therefore performed cross-study validation of the Discovery set-signature with an independently-derived dataset (319 invasive breast cancer patients treated with paclitaxel and anthracycline chemotherapy
                    <sup>5</sup>). The mRMR+SVM CT-models performed well (4-year threshold model had an overall accuracy of 68.7%; 3-year threshold model exhibited lower overall accuracy [52%], but was significantly better at predicting patients in remission [74.2%]).&#x201d;</p>
                <p>
                    <bold>References</bold>
                </p>
                <p>Margolin AA, 
                    <italic>et al.</italic> Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci Transl Med. 2013 Apr 17;5(181):181re1. doi: 10.1126/scitranslmed.3006112.</p>
                <p>Hatzis C, 
                    <italic>et al.</italic> A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer. JAMA. 2011;305:1873&#x2013;1881.</p>
                <p>
                    <italic>Comment 3. The authors&#x2019; response that specific therapies were not provided in METABRIC is incorrect. According to Curtis et al., (2012) &#x201c;Nearly all oestrogen receptor (ER)-positive and/or lymph node (LN)-negative patients did not receive chemotherapy, whereas ER-negative and LN-positive patients did. Additionally, none of the HER2
                        <sup>+</sup> patients received trastuzumab. As such, the treatments were homogeneous with respect to clinically relevant groupings.&#x201d; Therefore, the previous criticism #12 remains. Covariates such as ER/HER2/LN or PAM50 subtypes must be included in a table describing the sample cohorts. In addition, accuracy must be computed separately for these covariates, or they must be included in the machine learning model.</italic>
                </p>
                <p>Response: Thank you for the clarification regarding patient treatment. In response, we have added a supplementary table which breaks down the accuracy of our models by subtype (ER, HER2, PR, LN and PAM50; Dataset 2). In the main text, we note that the accuracy of most models is consistent between subtypes (+/- 10% deviation in accuracy). Subtypes with fewer than twenty individuals were ignored due to their small sample size. The following deviations in accuracy were noted: 
                    <list list-type="order">
                        <list-item>
                            <p>Random Forest and mRMR models are consistently more accurate in predicting outcomes for ER+ and HER2- patients treated with hormone therapy (both &#x201c;HT&#x201d; and &#x201c;CT and/or HT&#x201d; categories) than for ER- and HER2+ patients. Accuracy for the PAM50 basal subtype is consistently low when testing patients treated with hormone therapy. This is most likely partially explained by the tendency of the RF and mRMR &#x2018;HT&#x2019; models to more often predict patients as sensitive, combined with the fact that ER+ and HER2- patients were more likely to respond to therapy. It is important to note that the accuracy of predictions by RF and mRMR for patients treated only with chemotherapy was fairly consistent across all available subtypes (+/- 10% accuracy).</p>
                        </list-item>
                        <list-item>
                            <p>SVM paclitaxel models performed significantly better with HER2+ patients (26 correct, 3 misclassified; 90% accurate) than with HER2- patients (40 correct, 15 misclassified; 73% accurate) when tested on patients treated with both hormone and chemotherapy. In Dorman <italic>et al.</italic> (2016), it was stated that
                                <italic>MAPT</italic> expression (which is present in the paclitaxel model) segregated with PAM50 luminal and basal subtypes. For this model, the accuracies of these subtypes are nearly identical to the accuracy of the entire subset.</p>
                        </list-item>
                    </list> Text describing these results can be found in the third paragraph of the results.</p>
                <p>
                    <bold>Conclusion</bold>
                </p>
                <p>
                    <italic>Comment 4. The discussion is insufficient. It still lacks adequate context of existing genomics classifiers in the literature. The discrepancy between their algorithm and clinical assays is confusing in the revised sentence:</italic>
                </p>
                <p>
                    <italic>&#x201c;Unlike Mammaprint and Oncotype Dx tests, this model focuses on predicting survival prediction based on gene expression in the tumor, presumably before or during drug therapy.&#x201d; </italic>
                </p>
                <p>
                    <italic>As written, it appears to disregard the long history of predicting clinical outcome from gene expression, in which classifiers developed from gene expression data (e.g., van't Veer et al., 2002) were turned into clinical assays based upon expression of smaller numbers of genes.</italic>
                </p>
                <p>Response:&#x00a0;We have removed the indicated sentence, which we agree did not adequately address the comment from the previous iteration of this article: &#x201c;Must be discussed in the context of existing genomics classifiers for breast cancer (e.g., OncotypeDx and/or Mammaprint)&#x201d;.</p>
                <p>We in no way meant to ignore the long history of predicting clinical outcome from gene expression (as well as other genomic factors). A discussion of this topic was not included in earlier submissions because a word limit was imposed at first submission. We did, however, reference other articles which do discuss this topic. In Dorman 
                    <italic>et al.</italic> (2016), which described some of the methodology for initial gene selection that this study was based on, these contributions are well-referenced, including the history of the prediction of clinical outcome from genomic status:</p>
                <p>&#x201c;Previous studies have derived associations between the genomic status of one or more genes and tumor response to certain therapies (Duan 
                    <italic>et&#x00a0;al.</italic>, 2003; Glinsky 
                    <italic>et&#x00a0;al.</italic>, 2005; Hatzis 
                    <italic>et&#x00a0;al.</italic>, 2011; Ma 
                    <italic>et&#x00a0;al.</italic>, 2004; Rajput 
                    <italic>et&#x00a0;al.</italic>, 2013; van't Veer 
                    <italic>et&#x00a0;al.</italic>, 2002).</p>
                <p>Correlations between single gene expression and tumor resistance (Duan 
                    <italic>et&#x00a0;al.</italic>, 2003, 1999) do not take into account multiple mechanisms of resistance or assess interactions between multiple genes. ABC transporter overexpression has long been shown to confer resistance, but enzymatic or functional inhibition has not substantially improved patient response to chemotherapy (Samuels 
                    <italic>et&#x00a0;al.</italic>, 1997).</p>
                <p>Multi-gene analytical approaches have previously been successful in deriving prognostic gene signatures for metastatic risk stratification (Oncotype DX&#x2122;, MammaPrint
                    <sup>&#x00ae;</sup>), subtypes (PAM50), and efforts have been made to predict chemotherapy resistance (Hess 
                    <italic>et&#x00a0;al.</italic>, 2006; Hatzis 
                    <italic>et&#x00a0;al.</italic>, 2011).&#x201d;</p>
                <p>In response to Dr. Fertig&#x2019;s comments, we have added a short discussion with citations of previously published approaches (including MammaPrint and Oncotype DX):</p>
                <p>&#x201c;Genomic information has been shown to correlate with tumor therapy response in previous studies
                    <sup>5,12-16</sup>. From these studies, analytical methods have been used to develop gene signatures for chemotherapy resistance prediction
                    <sup>5</sup>, subtypes (PAM50), and metastatic risk stratification (Oncotype DX&#x2122;, MammaPrint
                    <sup>&#x00ae;</sup>).&#x201d;</p>
                <p>
                    <italic>Comment 5. Based on the previous review, the authors include context with other predictions of the METABRIC data in the response to the reviewers. This must also be included in the Conclusion to assess the relevance of their findings in the literature.</italic>
                </p>
                <p>Response: We have added the indicated text from the previous &#x2018;response to the reviewers&#x2019; (modified) to the Conclusions:</p>
                <p>&#x201c;We also examined the method exhibiting the best performance in the Sage Bionetworks / DREAM Breast Cancer Prognosis Challenge
                    <sup>17</sup>, which was also phenotype-based, however it produces outcome signatures based on molecular processes, rather than the cancer drugs themselves. While interesting and informative, the results cannot be directly compared.&#x201d;</p>
                <p>Please note that the majority of entries in the DREAM project were not fully curated and only exist as source code. Analyzing these files to determine what methodology was attempted by these groups is beyond the scope of our study. A description of the second-place entry in the METABRIC phase of the DREAM challenge is provided in the link below. The linked post describes how models were trained on the METABRIC data using a bipartite graph as input for linear models, boost models, and RankSVM. While they state that RankSVM was the least successful of the three methods, it does not appear that this particular study has been published in the literature. As a result, we cannot fully review their results, and thus cannot compare them to our methodology in the main manuscript.</p>
                <p>https://sagesynapse.wordpress.com/2012/11/01/breast-cancer-challenge-team-pitttransmed-places-second-for-metabric-phase-of-the-challenge/</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report16345">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10141.r16345</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Tung</surname>
                        <given-names>Chun-Wei</given-names>
                    </name>
                    <xref ref-type="aff" rid="r16345a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3011-8440</uri>
                </contrib>
                <aff id="r16345a1">
                    <label>1</label>School of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>3</day>
                <month>10</month>
                <year>2016</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2016 Tung CW</copyright-statement>
                <copyright-year>2016</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport16345" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9417.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This study proposes prediction methods using SVM and RF classifiers with mRMR-selected feature sets from cell line data and demonstrates their ability to predict outcomes in the METABRIC patient cohort. The classifiers with good prediction performance show the usefulness of combining domain knowledge with feature selection techniques. However, some details essential for reproducibility and interpretation are missing.</p>
            <p> The required information is listed below. 
                <list list-type="order">
                    <list-item>
                        <p>What are the values of parameters for SVM and RF classifiers and the methods for parameter selection (by default or other selection methods)?</p>
                    </list-item>
                    <list-item>
                        <p>The development and evaluation of models for patient data are not clear. Were the models trained using partial data from METABRIC, or was only leave-one-out cross-validation applied? If cross-validation was used, which model is offered on the online server, given that more than one model will have been created, and was cross-validation involved in the feature selection process, an issue that often leads to an overestimation of performance? For the case of training on partial data, both training and test performance are essential information for evaluating the robustness of models.</p>
                    </list-item>
                    <list-item>
                        <p>Since some of the datasets are highly imbalanced, the numbers of positives and negatives, as well as sensitivity and specificity, are more important than accuracy for interpreting the results, as a high accuracy with a low AUC could be the result of all positive/negative predictions on an imbalanced dataset. Listing all this information along with the accuracy and AUC will help the interpretation of prediction performance.</p>
                    </list-item>
                </list>
            </p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment2432-16345">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Rogan</surname>
                            <given-names>Peter</given-names>
                        </name>
                        <aff>University of Western Ontario, Canada</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>PKR cofounded Cytognomix. A patent application related to biologically inspired gene signatures is pending. The other authors declare that they have no competing interests. </p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>13</day>
                    <month>1</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <italic>Comment 1: What are the values of parameters for SVM and RF classifiers and the methods for parameter selection (by default or other selection methods)?</italic>
                </p>
                <p> Response: The parameter values for these classifiers have been added to Tables 1-5.</p>
                <p> Regarding parameter selection, the first paragraph of the Methods now describes C and Sigma selection as a grid search to find the values with the lowest cross-validation misclassification rate. Similarly for RF, a grid search was used to optimize the maximum number of randomly selected genes for each tree (second paragraph of Methods section).</p>
                <p> 
                    <italic>Comment 2: The development and evaluation of models for patient data are not clear. Were the models trained using partial data from METABRIC, or was only leave-one-out cross-validation applied? If cross-validation was used, which model is offered on the online server, given that more than one model will have been created, and was cross-validation involved in the feature selection process, an issue that often leads to an overestimation of performance? For the case of training on partial data, both training and test performance are essential information for evaluating the robustness of models.</italic>
                </p>
                <p> Response: We obtained new results for both RF and mRMR+SVM models. When we used the discovery set as the training set and the validation set as the test set, the performance of the models was poor. Upon further investigation, we found a large variation in the expression of the 26 targeted genes between the discovery and validation sets (please see Supplementary Dataset 2). Hence, building any classifier using the discovery and validation sets as training and test sets in their current forms will result in poor performance, since the training and test sets are vastly different.</p>
                <p> However, we carried out another experiment on the discovery set alone, using 70% of the data for training and the remaining 30% to test the performance of the model. The results have been added to the manuscript (Tables 4 and 5).</p>
                <p> 
                    <italic>Comment 3: Since some of the datasets are highly imbalanced, the numbers of positives and negatives, as well as sensitivity and specificity, are more important than accuracy for interpreting the results, as a high accuracy with a low AUC could be the result of all positive/negative predictions on an imbalanced dataset. Listing all this information along with the accuracy and AUC will help the interpretation of prediction performance.</italic>
                </p>
                <p> Response: As previously mentioned, we have added more performance measures, including MCC and AUC. They have been added to Tables 1-5 of the manuscript.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report16733">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10141.r16733</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Fertig</surname>
                        <given-names>Elana J.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r16733a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r16733a1">
                    <label>1</label>Division of Oncology Biostatistics and Bioinformatics, School of Medicine, Johns Hopkins University, Baltimore, MD, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>30</day>
                <month>9</month>
                <year>2016</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2016 Fertig EJ</copyright-statement>
                <copyright-year>2016</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport16733" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9417.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This study develops SVM and RF algorithms built upon previously learned gene signatures of therapeutic response to breast cancer. The algorithms are applied and compared to predict patient survival under different treatment conditions in METABRIC data. The analyses and comparisons are robust and this study provides a useful assessment of biologically-driven classifiers. The three major areas that require improvement before the article is indexed are as follows, and described in further detail below. 
                <list list-type="order">
                    <list-item>
                        <p>The methods require further clarification to distinguish differences between this study and the previous study as well as the parameters of the machine learning algorithms.</p>
                    </list-item>
                    <list-item>
                        <p>Accuracy in the results must better distinguish results on independent test and training sets.</p>
                    </list-item>
                    <list-item>
                        <p>Classifiers must be put in the context of other existing genomics classifiers used in breast cancer and/or previously published in Mammaprint data.</p>
                    </list-item>
                </list> &#x00a0;</p>
            <p> 
                <bold>Title and Abstract&#x00a0;</bold>
            </p>
            <p> Acceptable</p>
            <p> 
                <bold>Article content</bold>
            </p>
            <p> 
                <bold>
                    <italic>Methods</italic>
                </bold> 
                <list list-type="order">
                    <list-item>
                        <p>Abbreviations SVM and RF must be spelled out as Support Vector Machine and Random Forest on first use in Methods.</p>
                    </list-item>
                    <list-item>
                        <p>Writing in 
                            <italic>SVM learning </italic>subsection of 
                            <italic>Methods</italic> requires clarification to distinguish which of these methods were developed in the previous 
                            <italic>Molecular Oncology</italic> publication and which were developed as part of this publication.</p>
                    </list-item>
                    <list-item>
                        <p>Details about the SVM learning algorithm are included in the caption to Figure 1, but must also be included and completely described in text for the corresponding section of the methods.</p>
                    </list-item>
                    <list-item>
                        <p>No equations are provided to describe the role of the parameters C and sigma. It is also unclear whether this greedy search is implemented by the MATLAB function 
                            <italic>fitcsvm</italic> or uses custom code developed by the authors.</p>
                    </list-item>
                </list> &#x00a0;</p>
            <p> 
                <bold>
                    <italic>Results</italic>
                </bold> 
                <list list-type="order">
                    <list-item>
                        <p>Need to specify whether reported accuracies are computed with leave-one-out cross validation or 9-fold cross validation (described in Methods).</p>
                    </list-item>
                    <list-item>
                        <p>Ideally, given the size of METABRIC data they would be calculated on independent training (first 1000 patient samples) and test (last 1000 patient samples) datasets.</p>
                    </list-item>
                    <list-item>
                        <p>AUC must be computed separately for discovery and validation sets (Table 2).</p>
                    </list-item>
                    <list-item>
                        <p>It is unclear whether the previous validation set described in the sentence &#x201c;Predictions for the METABRIC patient cohort, which was independent of the previous validation set&#x201d; refers to a validation set used in this publication or the previous publication.</p>
                    </list-item>
                    <list-item>
                        <p>Covariates such as ER/PR or PAM50 subtypes must be included in a table describing the sample cohorts. Accuracy must be computed separately for these co-variates or they must also be included as co-variates in the machine-learning model.</p>
                    </list-item>
                    <list-item>
                        <p>Ideally accuracy would be compared to existing breast cancer classifiers (e.g., using code from Marchionni 
                            <italic>et al.,</italic>&#x00a0;
                            <italic>BMC Genomics</italic>, 2013) and/or survival curves reported in the literature.</p>
                    </list-item>
                </list> &#x00a0;</p>
            <p> 
                <bold>Conclusions</bold> 
                <list list-type="order">
                    <list-item>
                        <p>Must be discussed in the context of existing genomics classifiers for breast cancer (e.g., OncotypeDx and/or Mammaprint).</p>
                    </list-item>
                    <list-item>
                        <p>Results must be put in context with other predictions on METABRIC data, e.g., outcomes from the DREAM contest.</p>
                    </list-item>
                </list> &#x00a0;</p>
            <p> 
                <bold>
                    <italic>Data</italic>
                </bold>
            </p>
            <p> Acceptable</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment2433-16733">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Rogan</surname>
                            <given-names>Peter</given-names>
                        </name>
                        <aff>University of Western Ontario, Canada</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>PKR cofounded Cytognomix. A patent application related to biologically inspired gene signatures is pending. The other authors declare that they have no competing interests.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>13</day>
                    <month>1</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <italic>Comment 1: The methods require further clarification to distinguish differences between this study and the previous study as well as the parameters of the machine learning algorithms.</italic>
                </p>
                <p> Response: The first paragraph of the Methods describes Support Vector Machine learning, which has been greatly expanded upon. Differences in SVM methodology between the two studies are indicated there (i.e. a Gaussian kernel was used instead of a linear kernel). All other feature selection methods described in the manuscript (Random Forest, mRMR) were not used in Dorman 
                    <italic>et al.</italic>, 2016.</p>
                <p> The parameters for machine learning algorithms have been incorporated in the manuscript, and can be found in the footnote section of each data table.</p>
                <p> 
                    <italic>Comment 2: Accuracy in the results must better distinguish results on independent test and training sets.</italic>
                </p>
                <p> Response: The Validation dataset showed an overall expression profile distinct from that of the Discovery set, possibly due to batch effects, which are well known. We added another experiment to the manuscript by splitting the Discovery set into Training and Test sets. The model was trained using 70% of the data and then tested using the remaining 30% as the test set. We repeated this procedure 100 times and took the median as the final performance result. The results are presented in Tables 4 and 5 of the manuscript.</p>
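                <p> The repeated 70%/30% hold-out procedure described in this response can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code used for the manuscript: the nearest-centroid classifier, the function names, the synthetic data and the fixed random seeds are all assumptions introduced only to keep the example self-contained, and any classifier could be substituted.</p>
                <preformat>
```python
import random
import statistics


def fit_centroids(rows, labels):
    """Placeholder classifier (an assumption, not the manuscript's SVM)."""
    centroids = {}
    for label in set(labels):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the label whose centroid is closest to `row`."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=squared_distance)


def repeated_holdout(rows, labels, n_repeats=100, train_fraction=0.7, seed=0):
    """Median test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_train = round(train_fraction * len(rows))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return statistics.median(accuracies)


# Synthetic two-class data, for illustration only.
data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
rows += [[data_rng.gauss(6, 1), data_rng.gauss(6, 1)] for _ in range(50)]
labels = [0] * 50 + [1] * 50
median_accuracy = repeated_holdout(rows, labels)
```
                </preformat>
                <p> Reporting the median over the 100 repeats, rather than the result of a single split, reduces the dependence of the reported performance on any one random partition of the data.</p>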
                <p> 
                    <italic>Comment 3: Classifiers must be put in the context of other existing genomics classifiers used in breast cancer and/or previously published in Mammaprint data.</italic>
                </p>
                <p> Response: We have added two sentences to the second paragraph of the &#x201c;Results and Discussion&#x201d; section that compare our gene signature to those from MammaPrint and Oncotype Dx. Pair-wise comparison of these three signatures shows that they are nearly independent of one another.</p>
                <p> 
                    <bold>Methods</bold>
                </p>
                <p> 
                    <italic>Comment 4: Abbreviations SVM and RF must be spelled out as Support Vector Machine and Random Forest on first use in Methods.</italic>
                </p>
                <p> Response: We thank the reviewer for this suggestion. It has been addressed in the Methods section of the manuscript.</p>
                <p>
                    <italic>Comment 5: Writing in SVM learning subsection of Methods requires clarification to distinguish which of these methods were developed in the previous Molecular Oncology publication and which were developed as part of this publication.</italic>
                </p>
                <p> Response: This is now clarified within the first paragraph of the Methods section in the manuscript. The SVM classifier was adopted from the previous Molecular Oncology publication, while the feature selection method was developed as part of this publication.</p>
                <p> 
                    <italic>Comment 6: Details about the SVM learning algorithm are included in the caption to Figure 1, but must also be included and completely described in text for the corresponding section of the methods.</italic>
                </p>
                <p> Response: We thank the reviewer for this suggestion. The description of the SVM learning algorithm has been moved from the Figure 1 legend and integrated into the first paragraph of the Methods section.</p>
                <p> 
                    <italic>Comment 7: No equations are provided to describe the role of the parameters C and sigma. It is also unclear whether this greedy search is implemented by the Matlab function fitcsvm or uses custom code developed by the authors.</italic>
                </p>
                <p> Response: A brief description of the role of each parameter has been added to the first paragraph of the methods section of the manuscript. Readers are also now directed to a reference (Ben-Hur and Weston, 2010) if more detail is desired.</p>
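                <p> For orientation, the standard textbook forms in which these two parameters appear are given here as an illustration; the exact parameterization in the MATLAB code used for the manuscript may differ (for example, in how the kernel width is scaled). In the soft-margin SVM objective, C weights the penalty on the slack variables, so a larger C penalizes training errors more heavily relative to margin width. In the Gaussian kernel, sigma sets the kernel width, so a smaller sigma gives a more local decision boundary.</p>
                <disp-formula>
                    <tex-math><![CDATA[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
]]></tex-math>
                </disp-formula>
                <disp-formula>
                    <tex-math><![CDATA[
K(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}} \right)
]]></tex-math>
                </disp-formula>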
                <p> The greedy search, also called sequential backward feature selection, was implemented as a script by our lab in MATLAB. It is not a MATLAB function. This is clarified by changing a few words in the first paragraph of the methods section:&#x00a0;&#x201c;A backwards feature selection (greedy) algorithm was designed and implemented in MATLAB in which&#x2026;&#x201d;</p>
                <p> Moreover, as described above, the SVM classifier was adopted from the previous Molecular Oncology publication (Dorman 
                    <italic>et al.</italic> 2016), while the feature selection method has been developed as part of this publication.</p>
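                <p> The general shape of a greedy sequential backward selection can be sketched as follows. This is a generic textbook form written in Python, not the MATLAB script developed for the manuscript: the stopping rule (stop when no single removal improves the score), the scoring function and the feature names are hypothetical and chosen only so that the example runs on its own.</p>
                <preformat>
```python
def backward_selection(features, score):
    """Greedy sequential backward selection.

    `score` maps a tuple of feature names to a number (higher is better).
    Stopping when no single removal improves the score is an assumption.
    """
    selected = list(features)
    best = score(tuple(selected))
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        trials = [
            (score(tuple(f for f in selected if f != dropped)), dropped)
            for dropped in selected
        ]
        trial_score, dropped = max(trials)
        if best >= trial_score:
            break  # no removal helps, so keep the current subset
        best = trial_score
        selected.remove(dropped)
    return selected, best


# Toy score with hypothetical feature names: informative features add 10,
# uninformative ones subtract 1.
INFORMATIVE = {"geneA", "geneB"}


def toy_score(subset):
    return sum(10 if name in INFORMATIVE else -1 for name in subset)


kept, kept_score = backward_selection(
    ["geneA", "geneB", "noise1", "noise2"], toy_score
)
```
                </preformat>
                <p> In practice the score would be a cross-validated performance measure of the classifier trained on the candidate subset, which is why each round of the search requires one model evaluation per remaining feature.</p>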
                <p> 
                    <bold>Results</bold>
                </p>
                <p> 
                    <italic>Comment 8: Need to specify whether reported accuracies are computed with leave-one-out cross validation or 9-fold cross validation (described in Methods).</italic>
                </p>
                <p> Response: All SVM models described in the manuscript used leave-one-out cross validation except one; this is indicated in Table 1 and is now noted in the Methods. A 9-fold cross validation was used to build the model for the 735 patients who were treated with chemotherapy and/or hormone therapy, because leave-one-out cross validation of this many patients took an unreasonably long time to complete (it exceeded 3 weeks on a dedicated Intel i7 processor).</p>
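                <p> The difference in cost between the two schemes follows from the number of model fits each requires. The short Python sketch below counts them for the 735-patient cohort; the helper name and the even, unshuffled partition are assumptions made for illustration.</p>
                <preformat>
```python
def fold_sizes(n_samples, n_folds):
    """Sizes of the held-out folds when n_samples are split as evenly as possible."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if extra > fold else base for fold in range(n_folds)]


N_PATIENTS = 735  # cohort size quoted in the response

leave_one_out = fold_sizes(N_PATIENTS, N_PATIENTS)  # one patient held out per fit
nine_fold = fold_sizes(N_PATIENTS, 9)

print(len(leave_one_out), "model fits for leave-one-out")
print(len(nine_fold), "model fits for 9-fold, held-out sizes:", nine_fold)
```
                </preformat>
                <p> Leave-one-out therefore trains 735 models on 734 patients each, whereas 9-fold cross validation trains 9 models on roughly 653 patients each.</p>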
                <p> 
                    <italic>Comment 9: Ideally, given the size of METABRIC data they would be calculated on independent training (first 1000 patient samples) and test (last 1000 patient samples) datasets.</italic>
                </p>
                <p> Response: We obtained new results for both RF and mRMR+SVM models using the Discovery patient set for training and the Validation set for testing; however, the performance of the model was poor. After further investigation, we found that there were large differences between gene expression levels of the 26 model signature genes in the Discovery versus Validation sets (we used the Wilcoxon rank sum test, Kruskal-Wallis test and t-test to evaluate the results &#x2013; shown in the plotted distributions of gene expression in Supplemental Dataset 2) regardless of patient status (alive or dead). Hence, building any classifier using the Discovery and Validation sets in their current forms as training and test sets will result in poor performance due to this source of heterogeneity.</p>
                <p> To address this issue, we carried out another experiment based on data from the Discovery patient dataset alone; using 70% of the data for training and the remaining 30% for testing, the performance of the model was significantly better. We speculate that the discrepancy between the expression distributions in the Discovery and Validation sets was the result of batch effects. The results have been added to the manuscript (Tables 4 and 5).</p>
                <p> 
                    <italic>Comment 10: AUC must be computed separately for discovery and validation sets (Table 2).</italic>
                </p>
                <p> Response: We have added further performance measures, including Area Under Curve (AUC), to Tables 1-5.</p>
                <p> 
                    <italic>Comment 11: It is unclear whether the previous validation set described in the sentence &#x201c;Predictions for the METABRIC patient cohort, which was independent of the previous validation set&#x201d; refers to a validation set used in this publication or the previous publication.</italic>
                </p>
                <p> Response: This sentence is referring to breast cancer patient data from Hatzis 
                    <italic>et al.</italic> (2013), which was used as a validation set in Dorman 
                    <italic>et al.&#x00a0;</italic>(2016), not this publication. We have modified this sentence to clarify the issue.</p>
                <p> 
                    <italic>Comment 12: Covariates such as ER/PR or PAM50 subtypes must be included in a table describing the sample cohorts. Accuracy must be computed separately for these co-variates or they must also be included as co-variates in the machine-learning model.</italic>
                </p>
                <p> Response: Even with subtype as a covariate, it is not possible to perform the analysis the reviewer requested. Certain therapies are definitely more effective in particular subtypes (e.g., etoposide, docetaxel, and cisplatin are preferentially active in basal or claudin-low cell lines, as observed clinically; Heiser 
                    <italic>et al.,&#x00a0;</italic>2012). The public METABRIC dataset (or the corresponding publication) does not provide the specific therapies used to treat individual patients. Had they done so, it would have made sense to look at these covariates.</p>
                <p> Reference: Heiser LM, Sadanandam A, Kuo WL, Benz SC, Goldstein TC, Ng S, Gibb WJ, Wang NJ, Ziyad S, Tong F, 
                    <italic>et al.</italic> (2012). Subtype and pathway specific responses to anticancer compounds in breast cancer.&#x00a0;
                    <italic>Proc Natl Acad Sci U S A</italic>
                    <bold>109</bold>:2724-2729.</p>
                <p> 
                    <italic>Comment 13: Ideally accuracy would be compared to existing breast cancer classifiers (e.g., using code from Marchionni et al., BMC Genomics, 2013) and/or survival curves reported in the literature.</italic>
                </p>
                <p> Response: The proposed method has been compared against K-TSP (Marchionni 
                    <italic>et al.</italic>, BMC Genomics, 2013) as per the reviewer&#x2019;s suggestion, and the results are presented in Table 6 of the manuscript.</p>
                <p> 
                    <bold>Conclusions</bold>
                </p>
                <p> 
                    <italic>Comment 14: Must be discussed in the context of existing genomics classifiers for breast cancer (e.g., OncotypeDx and/or Mammaprint).</italic>
                </p>
                <p> Response: We have added text to both the second paragraph of the &#x201c;Results and Discussion&#x201d; section and the conclusion of the paper.</p>
                <p> 
                    <italic>Comment 15: Results must be put in context with other predictions on METABRIC data, e.g., outcomes from the DREAM contest.</italic>
                </p>
                <p> Response: An important distinction to note with regard to our methodology is that the predictions are based on genes known to be associated with the response to specific drugs used to treat breast cancer. In the DREAM contest, the method with the highest METABRIC score (as described in Cheng 
                    <italic>et al.</italic>, 2013) was phenotype-based, finding signatures for molecular processes that are dysregulated in METABRIC, rather than responses to the cancer therapies themselves. While this is an interesting prediction method, its results cannot be compared to our approach. The gene signatures that we have derived contain components of many different pathways.</p>
                <p> Reference: Cheng WY, Ou Yang TH, Anastassiou D. Biomolecular events in cancer revealed by attractor metagenes. PLoS Comput Biol. 2013;9(2):e1002920.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
