<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.9893.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Bioinformatics</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>exprso: an R-package for the rapid implementation of machine learning algorithms</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Quinn</surname>
                        <given-names>Thomas</given-names>
                    </name>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0286-6329</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Tylee</surname>
                        <given-names>Daniel</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Glatt</surname>
                        <given-names>Stephen</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Bioinformatics Core Research Facility, Deakin University, Victoria, Australia</aff>
                <aff id="a2">
                    <label>2</label>PsychGENe Lab, SUNY Upstate Medical University, Syracuse, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:contacttomquinn@gmail.com">contacttomquinn@gmail.com</email>
                </corresp>
                <fn fn-type="con">
                    <p>TQ designed and implemented the tool, applied the tool to the use case, and drafted the article. DT and SG helped design the tool and drafted the article. DT contributed code and performed extensive beta testing.</p>
                </fn>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>10</month>
                <year>2016</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2016</year>
            </pub-date>
            <volume>5</volume>
            <elocation-id>2588</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>26</day>
                    <month>10</month>
                    <year>2016</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2016 Quinn T et al.</copyright-statement>
                <copyright-year>2016</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/5-2588/pdf"/>
            <abstract>
                <p>Machine learning plays a major role in many scientific investigations. However, non-expert programmers may struggle to implement the elaborate pipelines necessary to build highly accurate and generalizable models. We introduce here a new R package, exprso, as an intuitive machine learning suite designed specifically for non-expert programmers. Built primarily for the classification of high-dimensional data, exprso uses an object-oriented framework to encapsulate a number of common analytical methods into a series of interchangeable modules. This includes modules for feature selection, classification, high-throughput parameter grid-searching, elaborate cross-validation schemes (e.g., Monte Carlo and nested cross-validation), ensemble classification, and prediction. In addition, exprso provides native support for multi-class classification through the 1-vs-all generalization of binary classifiers. In contrast to other machine learning suites, we have prioritized simplicity of use over expansiveness when designing exprso.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>R</kwd>
                <kwd>package</kwd>
                <kwd>machine learning</kwd>
                <kwd>classification</kwd>
                <kwd>cross-validation</kwd>
                <kwd>supervised</kwd>
                <kwd>unsupervised</kwd>
                <kwd>genomics</kwd>
                <kwd>prediction</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Supervised machine learning has an increasingly important role in biological studies. However, the sheer complexity of classification pipelines poses a significant barrier to expert biologists unfamiliar with the intricacies of machine learning. Moreover, many biologists lack the time or technical skills necessary to establish their own classification pipelines. Here we discuss the exprso package, a framework for the rapid implementation of high-throughput classification, tailored specifically for use with high-dimensional data. As such, this package aims to empower investigators to execute state-of-the-art binary and multi-class classification, including deep learning, with minimal programming experience necessary.</p>
            <p>Although R offers a tremendous number of high-quality classification packages, there exist only a handful of fully integrated machine learning suites for R. Of these, we recognize here the caret package, which offers an expansive toolkit for both classification and regression analyses
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. We also acknowledge the RWeka package, which provides an API to the popular Weka machine learning suite, originally written in Java
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>. While these packages have a vast repertoire of functionality, we believe the exprso package has three key advantages. First, this package employs an object-oriented design that makes the software intuitive to lay programmers. In place of a few elaborate functions that offer power at the expense of convenience, this package provides a larger number of simpler functions: each constituent task has its own method, and users can chain these methods together to create their own custom analytical pipeline.</p>
            <p>Second, this package exposes carefully crafted modules which simplify several high-throughput classification pipelines. Single functions, coupled with special argument handlers, manage sophisticated pipelines such as high-throughput parameter grid-searching, Monte Carlo cross-validation
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>, and nested cross-validation
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. Moreover, users can embed these high-throughput modules (e.g., parameter grid-searching) within other modules (e.g., Monte Carlo cross-validation), allowing for many possible pipeline configurations. In addition, this package provides an automated way to build ensemble classifiers from the results of these high-throughput modules.</p>
            <p>Third, this package prioritizes multi-class classification by generalizing binary classification methods to a multi-class context. Specifically, this package automatically executes 1-vs-all classification and prediction whenever working with a dataset that contains multiple class labels. In addition, this package provides a specialized high-throughput module for 1-vs-all classification with individual 1-vs-all feature selection, an alternative to conventional multi-class classification which has been reported to improve results, at least in the setting of 1-vs-1 multi-class support vector machines
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>.</p>
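            <p>As a minimal sketch of the 1-vs-all idea, the base R code below recasts a hypothetical three-class label vector as one binary problem per class. The label values and object names are illustrative assumptions and do not reflect how exprso implements this step internally.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Hypothetical class labels for six subjects (illustration only)
labels &lt;- c("A", "B", "C", "A", "B", "C")

# For each class, flag subjects in that class (TRUE) versus all others (FALSE)
classes &lt;- unique(labels)
oneVsAll &lt;- lapply(classes, function(k) labels == k)
names(oneVsAll) &lt;- classes

# Each element defines one binary classification problem
oneVsAll[["A"]]</preformat>
            </p>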
            <p>While we acknowledge that the premier machine learning suites, like caret, may surpass our package in the breadth of their functionality, we do not intend to replace these tools. Rather, we developed exprso as an adjunct, or alternative, tailored specifically to those with limited programming experience, especially biologists working with high-dimensional data. That said, we hope that even some expert programmers may find value in this software tool.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Implementation</title>
                <p>This package uses an object-oriented framework for classification. In this paradigm, every unique task, such as data splitting (i.e., creating the training and validation sets), feature selection, and classifier construction, has its own associated function, called a method. These methods typically work as wrappers for other R packages, structured so that the objects returned by one method will feed seamlessly into the next method.</p>
                <p>In other words, each method represents one of a number of analytical modules that provides the user with stackable and interchangeable data processing tools. Examples of these methods include wrappers for popular feature selection methods (e.g., analysis of variance (ANOVA), recursive feature elimination
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>,
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>, empirical Bayes statistics
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>, minimum redundancy maximum relevancy (mRMR)
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>, and more) as well as numerous classification methods (e.g., support vector machines (SVM)
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>, neural networks
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>, deep neural networks
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup>, random forests
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>, and more).</p>
                <p>We have adopted a nomenclature to help organize the methods available in this package. In this scheme, most functions have a few letters in the beginning of their name to designate their general utility. Below, we include a brief description of these function prefixes along with a flow diagram of the available methods.</p>
                <list list-type="bullet">
                    <list-item>
                        <p>

                            <bold>array:</bold> Modules that import data stored as a data.frame, ExpressionSet object, or local text file.</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>mod:</bold> Modules that modify the imported data prior to classification.</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>split:</bold> Modules that split these data into training and validation (or test) sets.</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>fs:</bold> Modules that perform feature selection.</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>build:</bold> Modules that build classifiers and classifier ensembles.</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>predict:</bold> Modules that deploy classifiers and classifier ensembles.</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>calc:</bold> Modules that calculate classifier performance, including area under the receiver operating characteristic (ROC) curve (AUC).</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>pl:</bold> Modules that manage elaborate classification pipelines, including high-throughput parameter grid-searches, Monte Carlo cross-validation, and nested cross-validation.</p>
                    </list-item>
                    <list-item>
                        <p>

                            <bold>pipe:</bold> Modules that filter the classification pipeline results.</p>
                    </list-item>
                </list>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>A directed graph of all modules included in the exprso package and how they might relate to one another in a complete pipeline.</title>
                        <p>Elements colored grey exist outside of this package and instead refer to natively compatible components from the GEOquery
                            <sup>
                                <xref ref-type="bibr" rid="ref-14">14</xref>
                            </sup> and Biobase
                            <sup>
                                <xref ref-type="bibr" rid="ref-15">15</xref>
                            </sup> packages.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/10664/af82f138-3516-429f-8a51-82f6e59b0628_figure1.gif"/>
                </fig>
                <p>We refer the reader to the package vignette, &#x201c;An Introduction to the exprso Package,&#x201d; hosted with the package on the Comprehensive R Archive Network (CRAN), for a detailed description of the object-oriented framework and methods used in this package
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>.</p>
            </sec>
            <sec>
                <title>Operation</title>
                <p>Hardware requirements depend on the dimensions of the dataset under study, the methods deployed on that dataset, and the extent of any high-throughput analyses used. In general, however, a standard laptop computer with the latest version of R installed will handle most applications of the exprso package.</p>
            </sec>
        </sec>
        <sec>
            <title>Use cases</title>
            <p>To showcase this package, we make use of the publicly available hallmark Golub 1999 dataset to differentiate acute lymphocytic leukemia (ALL) from acute myelogenous leukemia (AML) based on gene expression as measured by microarray technology
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup>. We begin by importing this dataset as an ExpressionSet object from the package golubEsets (version 1.16.0)
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>. Then, using the arrayExprs function, we load the ExpressionSet object into exprso. Next, using the modFilter, modTransform, and modNormalize methods, we threshold filter, log2 transform, and standardize the data, respectively, reproducing the pre-processing steps taken by the original investigators
                <sup>
                    <xref ref-type="bibr" rid="ref-19">19</xref>
                </sup>.</p>
            <p>To keep the code clear and concise, we make use of the %&gt;% function notation from the magrittr package
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>. In short, this function passes the result from the previous function call to the first argument of the next function, an action colloquially known as piping.</p>
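            <p>As a minimal sketch of piping, using only base R functions and toy values that are not part of the exprso workflow, the nested and piped calls below should give the same result:</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">library(magrittr)

# Toy input (an assumption for illustration only)
x &lt;- c(1, 4, 9)

# Nested call: read from the inside out
nested &lt;- round(sqrt(x), 1)

# Piped call: read from left to right
piped &lt;- x %&gt;% sqrt %&gt;% round(1)

# Check that both forms agree
identical(nested, piped)</preformat>
            </p>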
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"><bold>library</bold>(exprso)
<bold>library</bold>(golubEsets)
<bold>library</bold>(magrittr)

# Load the Golub 1999 data as an ExpressionSet object
<bold>data</bold>(Golub_Merge)

# Import ALL and AML samples, then filter, log2 transform, and standardize
<bold>array</bold> &lt;-
  arrayExprs(Golub_Merge,
             colBy = "ALL.AML",
             include = <bold>list</bold>("ALL", "AML")) %&gt;%
  modFilter(20, 16000, 500, 5) %&gt;%
  modTransform %&gt;%
  modNormalize</preformat>
            </p>
            <p>Then, using the splitSample method, one of the split methods shown in the above diagram, we partition the data into a training and a test set through random sampling without replacement. Next, we perform a series of feature selection methods on the extracted training set. Through the fs modules fsStats and fsPrcomp, we pass the top 50 features as selected by the Student&#x2019;s t-test through dimension reduction by principal components analysis (PCA).</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Partition the data into training and test sets by random sampling
splitSets &lt;- splitSample(<bold>array</bold>, percent.include = 67)

# Select features by t-test, then pass the top 50 through PCA
<bold>array</bold>.fs &lt;-
  trainingSet(splitSets) %&gt;%
  fsStats(how = "t.test") %&gt;%
  fsPrcomp(top = 50)
</preformat>
            </p>
            <p>With feature selection complete, we can construct a classifier. For this example, we use the buildSVM method to train a linear kernel support vector machine (SVM) (with default parameters) using the top 5 principal components. Then, we deploy the trained machine on the test set from above. Note that through the object-oriented framework, each feature selection event, including the rules for dimension reduction by PCA, gets passed along automatically until classifier prediction. This ensures that the test set always undergoes the same feature selection history as the training set. The calcStats function allows us to calculate classifier performance as sensitivity, specificity, accuracy, or area under the curve (AUC)
                <sup>
                    <xref ref-type="bibr" rid="ref-21">21</xref>,
                    <xref ref-type="bibr" rid="ref-22">22</xref>
                </sup>.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Train a linear SVM on the top 5 principal components, then predict the test set
pred &lt;-
  <bold>array</bold>.fs %&gt;%
  buildSVM(top = 5, kernel = "linear") %&gt;%
  <bold>predict</bold>(testSet(splitSets))

# Calculate classifier performance
calcStats(pred)</preformat>
            </p>
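            <p>For readers unfamiliar with these performance measures, the base R sketch below computes sensitivity, specificity, and accuracy by hand. The labels are hypothetical and are not results from the Golub dataset, the choice of ALL as the positive class is arbitrary, and the code is not intended to reproduce the internals of calcStats.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Hypothetical true and predicted labels (illustration only)
actual    &lt;- c("ALL", "ALL", "ALL", "AML", "AML", "AML")
predicted &lt;- c("ALL", "ALL", "AML", "AML", "AML", "ALL")

# Treat "ALL" as the positive class (an arbitrary choice for this sketch)
tp &lt;- sum(actual == "ALL" &amp; predicted == "ALL")  # true positives
tn &lt;- sum(actual == "AML" &amp; predicted == "AML")  # true negatives
fp &lt;- sum(actual == "AML" &amp; predicted == "ALL")  # false positives
fn &lt;- sum(actual == "ALL" &amp; predicted == "AML")  # false negatives

sensitivity &lt;- tp / (tp + fn)
specificity &lt;- tn / (tn + fp)
accuracy    &lt;- (tp + tn) / length(actual)</preformat>
            </p>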
            <p>When constructing a classifier using a build module, we can only specify one set of parameters at a time. However, investigators often want to test models across a vast range of parameters. For this reason, we provide methods like plGrid to automate high-throughput parameter grid-searches. These methods not only wrap classifier construction, but classifier deployment as well. In addition, they accept a fold argument to toggle leave-one-out or v-fold cross-validation.</p>
            <p>Below, we show a simple example of parameter grid-searching, whereby the top 3, 5, and 10 principal components, as established above, get used as a substrate for the construction of linear and radial kernel SVMs with costs of 1, 101, and 1001. In addition, we calculate a biased 10-fold cross-validation accuracy to help guide our choice of the final model parameters. Take note that we call this accuracy biased because we are performing cross-validation on a dataset that has already undergone feature selection. Although this approach gives a poor assessment of absolute classifier performance
                <sup>
                    <xref ref-type="bibr" rid="ref-23">23</xref>
                </sup>, it may still have value in helping to guide parameter selection in a statistically valid manner. As an alternative to this biased cross-validation accuracy, users can instead call the plNested method in which feature selection is performed anew with each data split that occurs during the leave-one-out or v-fold cross-validation.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Grid-search over principal components, kernels, and costs with 10-fold cross-validation
pl &lt;-
  plGrid(<bold>array</bold>.fs, testSet(splitSets),
         top = <bold>c</bold>(3, 5, 10),
         how = "buildSVM",
         kernel = <bold>c</bold>("linear", "radial"),
         cost = <bold>c</bold>(1, 101, 1001),
         fold = 10)</preformat>
            </p>
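            <p>To give a sense of the size of this search space, the base R sketch below enumerates every combination of the three argument vectors supplied above. It assumes that each combination corresponds to one candidate model and is not meant to describe how plGrid is implemented internally.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Enumerate all combinations of the grid-search arguments
grid &lt;- expand.grid(top = c(3, 5, 10),
                    kernel = c("linear", "radial"),
                    cost = c(1, 101, 1001),
                    stringsAsFactors = FALSE)

# 3 x 2 x 3 = 18 candidate parameter sets
nrow(grid)</preformat>
            </p>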
            <p>Finally, we show an example for the plMonteCarlo method, an implementation of Monte Carlo cross-validation. Compared to the plGrid method which iteratively builds and deploys classifiers on a validation (or test) set, plMonteCarlo wraps multiple iterations of data splitting, feature selection, and parameter grid-searching. The final result therefore contains the classifier performances as measured on a number of bootstraps carved out from the initial dataset. Argument handler functions help organize the arguments supplied to the split, feature selection, and high-throughput methods managed during the plMonteCarlo method call.</p>
            <p>Take note that when using the Monte Carlo cross-validation method (or any of the other pl modules), the user may iterate over any classification method provided by exprso, not only buildSVM. This includes the buildDNN method for deep neural networks as implemented via h2o
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>. Also note that the user can embed other cross-validation methods, such as another Monte Carlo or nested method, within the cross-validation method call, allowing for many possible combinations of pipelines.</p>
            <p>In the first section of the code below, we define the argument handler functions for the plMonteCarlo call. As suggested by their names, the ctrlSplitSet, ctrlFeatureSelect, and ctrlGridSearch handlers manage arguments to data splitting, feature selection, and high-throughput grid-searching, respectively. In this example, we set up arguments to split the unaltered training set through random sampling with replacement, perform the two-step feature selection process from above, and run a high-throughput parameter grid-search with biased cross-validation. The unaltered dataset is processed in this way 10 times, as directed by argument B.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Arguments for data splitting: random sampling with replacement
ss &lt;-
  ctrlSplitSet(func = "splitSample",
               percent.include = 67,
               <bold>replace</bold> = TRUE)

# Arguments for the two-step feature selection: t-test, then PCA
fs &lt;-
  <bold>list</bold>(ctrlFeatureSelect(func = "fsStats",
                         how = "t.test"),
       ctrlFeatureSelect(func = "fsPrcomp",
                         top = 50))

# Arguments for the parameter grid-search with 10-fold cross-validation
gs &lt;-
  ctrlGridSearch(func = "plGrid",
                 top = <bold>c</bold>(3, 5, 10),
                 how = "buildSVM",
                 kernel = <bold>c</bold>("linear", "radial"),
                 cost = <bold>c</bold>(1, 101, 1001),
                 fold = 10)

# Repeat splitting, feature selection, and grid-searching 10 times
boot &lt;-
  plMonteCarlo(trainingSet(splitSets),
               B = 10,
               ctrlSS = ss,
               ctrlFS = fs,
               ctrlGS = gs)</preformat>
            </p>
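            <p>The repeated splitting at the heart of Monte Carlo cross-validation can be sketched in base R as follows. This sketch covers only the data splitting step, uses a hypothetical number of subjects, and assumes that subjects not drawn for training serve as the validation set; it is not the implementation used by plMonteCarlo.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">set.seed(1)  # arbitrary seed, for reproducibility of the sketch
n &lt;- 48      # hypothetical number of subjects in the training set
B &lt;- 10      # number of bootstraps

# For each bootstrap, draw 67% of subject indices with replacement for
# training; subjects never drawn form the validation set (an assumption)
splits &lt;- lapply(seq_len(B), function(b) {
  train &lt;- sample(n, size = round(0.67 * n), replace = TRUE)
  list(train = train, valid = setdiff(seq_len(n), train))
})

# One training/validation split per bootstrap
length(splits)</preformat>
            </p>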
            <p>As an adjunct to this bootstrapping pipeline, the user can apply these results to build a classifier ensemble using the best classifier from each bootstrap, then deploy that ensemble on the withheld test set. Analogous to how random forests will deploy an ensemble of decision trees
                <sup>
                    <xref ref-type="bibr" rid="ref-24">24</xref>
                </sup>, this method, which we dub &#x201c;random plains&#x201d;, will deploy an ensemble of SVMs.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Build an ensemble from the top classifier of each bootstrap
ens &lt;- buildEnsemble(boot, colBy = "valid.acc", top = 1)

# Deploy the ensemble on the withheld test set
pred &lt;- <bold>predict</bold>(ens, testSet(splitSets))</preformat>
            </p>
            <p>Beyond those mentioned here, this package also includes methods for integrating unsupervised machine learning (i.e., clustering) into classification pipelines. In addition, exprso contains high-throughput methods specialized for multi-class classification. We refer the reader to the package vignettes, &#x201c;An Introduction to the exprso Package&#x201d; and &#x201c;Advanced Topics for the exprso Package&#x201d;, both hosted with the package on the Comprehensive R Archive Network (CRAN), for a detailed description of all methods included in this package
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>.</p>
        </sec>
        <sec>
            <title>Summary</title>
            <p>Here we introduce the R package exprso, a machine learning suite tailored specifically to working with high-dimensional data. Unlike other machine learning suites, exprso prioritizes simplicity of use over expansiveness. By developing this package in an object-oriented framework, we provide a fully interchangeable and modular programming interface that allows for the rapid implementation of binary and multi-class classification pipelines. We have included in this framework functions for executing some of the most popular feature selection methods and classification algorithms. In addition, exprso contains a number of modules that facilitate classification with high-throughput parameter grid-searching in conjunction with sophisticated cross-validation schemes. Owing to its ease of use and extensive documentation, we hope exprso will serve as an indispensable resource, especially to scientific investigators with limited prior programming experience.</p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>Software available from: 
                <ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/packages/exprso/">http://cran.r-project.org/web/packages/exprso/</ext-link>
            </p>
            <p>Latest source code: 
                <ext-link ext-link-type="uri" xlink:href="http://github.com/tpq/exprso">http://github.com/tpq/exprso</ext-link>
            </p>
            <p>Archived source code as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.162063">http://doi.org/10.5281/zenodo.162063</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-25">25</xref>
                </sup>
            </p>
            <p>Software license: GNU General Public License, version 2.</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kuhn</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Building predictive models in R using the caret package.</article-title>
                    <source>

                        <italic toggle="yes">J Stat Softw.</italic>
</source>ISSN 1548-7660.<year>2008</year>;<volume>28</volume>(<issue>5</issue>):<fpage>1</fpage>&#x2013;<lpage>26</lpage>.
                    <pub-id pub-id-type="doi">10.18637/jss.v028.i05</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hornik</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buchta</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zeileis</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Open-source machine learning: R meets Weka.</article-title>
                    <source>

                        <italic toggle="yes">Computation Stat.</italic>
</source>ISSN 0943-4062, 1613-9658.<year>2008</year>;<volume>24</volume>(<issue>2</issue>):<fpage>225</fpage>&#x2013;<lpage>232</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s00180-008-0119-7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Picard</surname>
                            <given-names>RR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cook</surname>
                            <given-names>RD</given-names>
                        </name>
</person-group>:
                    <article-title>Cross-Validation of Regression Models.</article-title>
                    <source>

                        <italic toggle="yes">J Am Stat Assoc.</italic>
</source>ISSN 0162-1459, 1537-274X.<year>1984</year>;<volume>79</volume>(<issue>387</issue>):<fpage>575</fpage>&#x2013;<lpage>583</lpage>.
                    <pub-id pub-id-type="doi">10.1080/01621459.1984.10478083</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Varma</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simon</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>Bias in error estimation when using cross-validation for model selection.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>ISSN 1471-2105.<year>2006</year>;<volume>7</volume>:<fpage>91</fpage>.
                    <pub-id pub-id-type="pmid">16504092</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-7-91</pub-id>
                    <pub-id pub-id-type="pmcid">1397873</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>PX</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fisher</surname>
                            <given-names>RB</given-names>
                        </name>
</person-group>:
                    <article-title>Individual feature selection in each One-versus-One classifier improves multi-class SVM performance</article-title>.
                    <ext-link ext-link-type="uri" xlink:href="http://homepages.inf.ed.ac.uk/s1064211/thesis/icpr14.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Guyon</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Weston</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Barnhill</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Gene Selection for Cancer Classification using Support Vector Machines.</article-title>
                    <source>

                        <italic toggle="yes">Mach Learn.</italic>
</source>ISSN 0885-6125, 1573-0565.<year>2002</year>;<volume>46</volume>(<issue>1&#x2013;3</issue>):<fpage>389</fpage>&#x2013;<lpage>422</lpage>.
                    <pub-id pub-id-type="doi">10.1023/A:1012487302797</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Johannes</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>pathClass: Classification using biological pathways as prior knowledge</article-title>. R package version 0.9.4.<year>2013</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=pathClass">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ritchie</surname>
                            <given-names>ME</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Phipson</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>
                        <italic toggle="yes">limma</italic> powers differential expression analyses for RNA-sequencing and microarray studies.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2015</year>;<volume>43</volume>(<issue>7</issue>):<fpage>e47</fpage>.
                    <pub-id pub-id-type="pmid">25605792</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkv007</pub-id>
                    <pub-id pub-id-type="pmcid">4402510</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>De Jay</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Papillon-Cavanagh</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Olsen</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>mRMRe: an R package for parallelized mRMR ensemble feature selection.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>ISSN 1367-4803, 1460-2059.<year>2013</year>;<volume>29</volume>(<issue>18</issue>):<fpage>2365</fpage>&#x2013;<lpage>2368</lpage>.
                    <pub-id pub-id-type="pmid">23825369</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btt383</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Meyer</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dimitriadou</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hornik</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071). TU Wien</article-title>. R package version 1.6-7.<year>2015</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=e1071">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Venables</surname>
                            <given-names>WN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ripley</surname>
                            <given-names>BD</given-names>
                        </name>
</person-group>:
                    <article-title>Modern Applied Statistics with S</article-title>. Springer, New York, fourth edition. ISBN 0-387-95457-0.<year>2002</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.stats.ox.ac.uk/pub/MASS4">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Aiello</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kraljevic</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Maj</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>h2o: R Interface for H2O</article-title>. R package version 3.10.0.6.<year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=h2o">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liaw</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wiener</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Classification and regression by randomForest.</article-title>
                    <source>

                        <italic toggle="yes">R News.</italic>
</source>
                    <year>2002</year>;<volume>2</volume>(<issue>3</issue>):<fpage>18</fpage>&#x2013;<lpage>22</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://CRAN.R-project.org/doc/Rnews/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Meltzer</surname>
                            <given-names>PS</given-names>
                        </name>
</person-group>:
                    <article-title>GEOquery: a bridge between the Gene Expression Omnibus (GEO) and Bioconductor.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2007</year>;<volume>23</volume>(<issue>14</issue>):<fpage>1846</fpage>&#x2013;<lpage>1847</lpage>.
                    <pub-id pub-id-type="pmid">17496320</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btm254</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>VJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gentleman</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Orchestrating high-throughput genomic analysis with Bioconductor.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2015</year>;<volume>12</volume>(<issue>2</issue>):<fpage>115</fpage>&#x2013;<lpage>121</lpage>.
                    <pub-id pub-id-type="pmid">25633503</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.3252</pub-id>
                    <pub-id pub-id-type="pmcid">4509590</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Quinn</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>exprso: Rapid Implementation of Machine Learning Algorithms for Genomic Data</article-title>. R package version 0.1.7.<year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=exprso">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Golub</surname>
                            <given-names>TR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Slonim</surname>
                            <given-names>DK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tamayo</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>ISSN 0036-8075.<year>1999</year>;<volume>286</volume>(<issue>5439</issue>):<fpage>531</fpage>&#x2013;<lpage>537</lpage>.
                    <pub-id pub-id-type="pmid">10521349</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.286.5439.531</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Golub</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>golubEsets: exprSets for golub leukemia data.</article-title> R package version 1.12.0.
                    <ext-link ext-link-type="uri" xlink:href="https://bioc.ism.ac.jp/packages/3.2/data/experiment/manuals/golubEsets/man/golubEsets.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Deb</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Raji Reddy</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Reliable classification of two-class cancer data using evolutionary algorithms.</article-title>
                    <source>

                        <italic toggle="yes">BioSystems.</italic>
</source>ISSN 0303-2647.<year>2003</year>;<volume>72</volume>(<issue>1&#x2013;2</issue>):<fpage>111</fpage>&#x2013;<lpage>129</lpage>.
                    <pub-id pub-id-type="pmid">14642662</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0303-2647(03)00138-2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bache</surname>
                            <given-names>SM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wickham</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>magrittr: A Forward-Pipe Operator for R</article-title>.<year>2014</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=magrittr">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bradley</surname>
                            <given-names>AP</given-names>
                        </name>
</person-group>:
                    <article-title>The use of the area under the ROC curve in the evaluation of machine learning algorithms.</article-title>
                    <source>

                        <italic toggle="yes">Pattern Recognition.</italic>
</source>ISSN 0031-3203.<year>1997</year>;<volume>30</volume>(<issue>7</issue>):<fpage>1145</fpage>&#x2013;<lpage>1159</lpage>.
                    <pub-id pub-id-type="doi">10.1016/S0031-3203(96)00142-2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sing</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sander</surname>
                            <given-names>O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Beerenwinkel</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ROCR: visualizing classifier performance in R.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2005</year>;<volume>21</volume>(<issue>20</issue>):<fpage>3940</fpage>&#x2013;<lpage>3941</lpage>.
                    <pub-id pub-id-type="pmid">16096348</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bti623</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Simon</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Radmacher</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dobbin</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification.</article-title>
                    <source>

                        <italic toggle="yes">J Natl Cancer Inst.</italic>
</source>ISSN 0027-8874, 1460-2105.<year>2003</year>;<volume>95</volume>(<issue>1</issue>):<fpage>14</fpage>&#x2013;<lpage>18</lpage>.
                    <pub-id pub-id-type="pmid">12509396</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jnci/95.1.14</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Breiman</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Random Forests.</article-title>
                    <source>

                        <italic toggle="yes">Mach Learn.</italic>
</source>ISSN 0885-6125, 1573-0565.<year>2001</year>;<volume>45</volume>(<issue>1</issue>):<fpage>5</fpage>&#x2013;<lpage>32</lpage>.
                    <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Quinn</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>tpq/exprso: exprso-0.1.7.</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2016</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.162063</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report20319">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10664.r20319</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>L&#x00f6;ffler-Wirth</surname>
                        <given-names>Henry</given-names>
                    </name>
                    <xref ref-type="aff" rid="r20319a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r20319a1">
                    <label>1</label>Leipzig University, Interdisciplinary Centre for Bioinformatics, Leipzig, 04107, Germany</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>17</day>
                <month>2</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 L&#x00f6;ffler-Wirth H</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport20319" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9893.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors present a new tool integrating established algorithms into one R package. This tool is intended to provide access to state-of-the-art statistical analysis to users with limited programming experience. As the manuscript is more a package vignette than an independent article, I focus my review mainly on the software and its usability.</p>
            <p>In general, the intention of providing comprehensive analysis options for non-experts is desirable, but somewhat overambitious. In particular, although I have several years of programming experience in R, I was not able to apply the presented methods to another (multi-class) data set within an acceptable amount of time. Non-expert programmers, such as biologists or doctors, will likewise be unable to use the software correctly in its present form.</p>
            <p> I therefore recommend extensive improvement of usability, user guidance and error feedback. I also acknowledge how challenging implementation of such tools is and encourage the authors to continue development of &#x201c;exprso&#x201d;.</p>
            <p> Particular comments:</p>
            <p>
                <list list-type="bullet">
                    <list-item>
                        <p>The 
                            <italic>arrayExps</italic> function should also accept a standard matrix of expression values; the groups may then be given as a character vector.</p>
                    </list-item>
                    <list-item>
                        <p>Most functions would be easier to use if they provided default parameter values, e.g.:</p>
                    </list-item>
                </list> 
                <list list-type="order">
                    <list-item>
                        <p>in 
                            <italic>arrayExps</italic> (may simply include all groups)</p>
                    </list-item>
                    <list-item>
                        <p>
                            <italic>percent.include</italic> in
                            <italic>splitSample</italic> (66%)</p>
                    </list-item>
                    <list-item>
                        <p>and so on</p>
                    </list-item>
                </list> 
                <list list-type="bullet">
                    <list-item>
                        <p>Help the users to find errors in their function calls by giving appropriate warning/error messages. E.g. &#x201c;fsStats cannot be applied in multi-class studies&#x201d;. No non-programmer will know what an &#x201c;inherited method&#x201d; is.</p>
                    </list-item>
                    <list-item>
                        <p>On the same point: 
                            <italic>fsANOVA</italic> returned an error that was not clear to me (my data include 4 groups):</p>
                    </list-item>
                </list> "contrasts can be applied only to factors with 2 or more levels" 
                <list list-type="bullet">
                    <list-item>
                        <p>I found no information about this in the help file of the fs-methods. In general, documentation should be split into one individual file for each function.</p>
                    </list-item>
                    <list-item>
                        <p>In the manuscript and the package vignette, the authors provide an overview figure of all functions implemented in the package. I suggest revising this figure: a more systematic flowchart layout of the function graph would help readers find the starting point (GEO, local file, R matrix) and follow the analysis workflow. The top-left part of the figure is not clear to me.</p>
                    </list-item>
                </list>
            </p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment3249-20319">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Quinn</surname>
                            <given-names>Thomas</given-names>
                        </name>
                        <aff>Deakin University, Australia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>7</day>
                    <month>12</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Dear Henry,</p>
                <p>Thank you so much for taking the time to perform a detailed and critical review of the software. I regret that professional obligations have delayed me from addressing your feedback in a timely manner. However, I am pleased to say that I have heavily revised the exprso package, incorporating most, if not all, of your suggestions.</p>
                <p>Key changes include:</p>
                <p>* Easier data import with the `exprso` function which imports data in x, y format</p>
                <p>* Software now supports continuous outcome prediction (when importing data via `exprso`)</p>
                <p>* More default values (e.g., split modules) to simplify user experience</p>
                <p>* Created custom errors for every split, feature selection, and build function. Most other functions now have custom errors as well</p>
                <p>* Documentation is split up by unique function. ?mod, ?split, ?fs, ?build, etc. all open a table of contents that overviews the unique functions available</p>
                <p>* Figure 1 simplified with data sources clearly labeled in black</p>
                <p>* Manuscript adjusted to reflect other changes</p>
                <p>Also, you should not encounter any error with multi-class classification when importing data using the new `exprso` function.</p>
                <p>(PS: I am aware of a trivial warning in this version that is triggered by predict. It is already fixed in the developmental branch on GitHub).</p>
                <p>Cheers,</p>
                <p>Thom</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report18380">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10664.r18380</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Plewczynski</surname>
                        <given-names>Dariusz</given-names>
                    </name>
                    <xref ref-type="aff" rid="r18380a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zubek</surname>
                        <given-names>Julian</given-names>
                    </name>
                    <xref ref-type="aff" rid="r18380a2">2</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r18380a1">
                    <label>1</label>Centre of New Technologies, University of Warsaw, Warsaw, Poland</aff>
                <aff id="r18380a2">
                    <label>2</label>Center of New Technologies, University of Warsaw, Warsaw, Poland</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>9</day>
                <month>12</month>
                <year>2016</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2016 Plewczynski D and Zubek J</copyright-statement>
                <copyright-year>2016</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport18380" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9893.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This is a short software tool article presenting a new R package for implementing machine learning pipelines. According to the authors, it is targeted specifically at non-expert programmers analyzing high-dimensional biological data. The article briefly describes the design goals and implementation details of the package, and then provides an example of building a full machine learning pipeline for a well-known microarray data set.</p>
            <p>The main contribution of this work lies in the prepared software and not in the accompanying manuscript. Because of this, the article presents no clear hypotheses or conclusions. The software package does not provide any novel algorithms or functionalities. It is conceived as a wrapper that should make existing methods more accessible. This goal is of a practical rather than scientific nature. Ultimately, the usefulness of the package can only be confirmed by its wider adoption by users. This situation makes it hard to write a conclusive review.</p>
            <p>I agree with the motivation behind this software. R package infrastructure can indeed be confusing, and it is notoriously hard for a beginner programmer to navigate. I find the interface adopted by the exprso package relatively clean and unambiguous. The way the evaluation methods are implemented is consistent with best machine learning practices. The eventual success of this package depends on how well it will be integrated with other popular packages. I hope that the authors will have the resources for further development of exprso.</p>
            <p>Below I present more detailed comments for the authors:</p>
            <p>
                <list list-type="bullet">
                    <list-item>
                        <p>caret is mentioned as an R package with goals similar to those of exprso. I would find a more detailed comparison between these two packages useful, both in terms of general design and specific use cases.</p>
                    </list-item>
                    <list-item>
                        <p>It is commendable when newly created packages integrate with existing infrastructure. As I understand it, exprso has some limited integration with the GEOquery and Biobase packages, allowing easy data import. However, after the data is loaded, all operations use special ExprsArray objects, distinct from native R types such as data.frame or matrix. I understand this design choice, but I have to note that it limits interoperability. Should additional processing in the middle of an exprso pipeline be required, the data needs to be converted between formats.</p>
                    </list-item>
                    <list-item>
                        <p>I find controversial the design in which transformations applied to the training set are automatically applied to the testing set. It obscures the pipeline and may not be intuitive for beginner programmers. Moreover, the transformations of the testing set sometimes differ slightly from those of the training set.</p>
                    </list-item>
                    <list-item>
                        <p>It is not obvious to me why a simple train-test split is implemented as a "split" module and cross-validation as a "pl" module. They are not very different. The way cross-validation is currently implemented does not allow detailed control over individual folds, which is sometimes useful.</p>
                    </list-item>
                </list>
            </p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
