<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.9794.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Bioinformatics</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>The ISMARA client</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no" equal-contrib="yes">
                    <name>
                        <surname>Artimo</surname>
                        <given-names>Panu</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no" equal-contrib="yes">
                    <name>
                        <surname>Duvaud</surname>
                        <given-names>S&#x00e9;verine</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no" equal-contrib="yes">
                    <name>
                        <surname>Pachkov</surname>
                        <given-names>Mikhail</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ioannidis</surname>
                        <given-names>Vassilios</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>van Nimwegen</surname>
                        <given-names>Erik</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Stockinger</surname>
                        <given-names>Heinz</given-names>
                    </name>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland</aff>
                <aff id="a2">
                    <label>2</label>Biozentrum, University of Basel &amp; SIB Swiss Institute of Bioinformatics, Basel, Switzerland</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:Heinz.Stockinger@sib.swiss">Heinz.Stockinger@sib.swiss</email>
                </corresp>
                <fn id="fn1">
                    <p>*Equal contributors</p>
                </fn>
                <fn fn-type="con">
                    <p>PA and SD developed the client application in Qt5/QML, integrating server-side scripts (in Python and R) that were developed and provided by MP. MP provided guidance on ISMARA's functionality, code and data. VI and HS did testing and project management. EvN provided the initial idea and overall supervision for the project/application. All authors contributed to writing this article.</p>
                </fn>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>15</day>
                <month>12</month>
                <year>2016</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2016</year>
            </pub-date>
            <volume>5</volume>
            <elocation-id>ELIXIR-2851</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>8</day>
                    <month>12</month>
                    <year>2016</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2016 Artimo P et al.</copyright-statement>
                <copyright-year>2016</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/5-2851/pdf"/>
            <abstract>
                <p>ISMARA (
                    <ext-link ext-link-type="uri" xlink:href="https://ismara.unibas.ch/fcgi/mara">ismara.unibas.ch</ext-link>) automatically infers the key regulators and regulatory interactions from high-throughput gene expression or chromatin state data. However, given the large size of current next-generation sequencing (NGS) datasets, upload times are a major bottleneck. Additionally, for proprietary data, users may be uncomfortable with uploading entire raw datasets to an external server. Both problems could be alleviated by letting users pre-process their raw data locally and transfer only a small summary file to the ISMARA server. We developed a stand-alone client application that pre-processes large input files (RNA-seq or ChIP-seq data) on the user's computer in a completely automated manner and uploads the resulting small summary files to the ISMARA server for analysis. This reduces file sizes by up to a factor of 1000, and upload times from many hours to mere seconds. The client application is available from 
                    <ext-link ext-link-type="uri" xlink:href="https://ismara.unibas.ch/ISMARA/client/">ismara.unibas.ch/ISMARA/client</ext-link>.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>bioinformatics</kwd>
                <kwd>data analysis</kwd>
                <kwd>motif activity response analysis</kwd>
                <kwd>genome</kwd>
                <kwd>command line tool</kwd>
                <kwd>Graphical User Interface</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100003497">
                    <funding-source>Swiss State Secretariat for Education, Research and Innovation</funding-source>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100008375">
                    <funding-source>Universit&#x00e4;t Basel</funding-source>
                </award-group>
                <award-group id="fund-3" xlink:href="http://dx.doi.org/10.13039/501100001711">
                    <funding-source>Schweizerischer Nationalfonds zur F&#x00f6;rderung der Wissenschaftlichen Forschung</funding-source>
                </award-group>
                <funding-statement>Swiss State Secretariat for Education, Research and Innovation (SERI), in part. Development work on ISMARA in the van Nimwegen group is supported by the University of Basel, and by the CellPlasticity and BrainstemX grant of the Swiss National Science Foundation in the context of the SystemsX.ch initiative.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Motif activity response analysis (MARA) is a general method that models genome-wide expression or chromatin state data in terms of computationally predicted regulatory sites for transcription factors (TFs) and microRNAs to infer the key regulators, their targets, and regulatory interactions between regulators, that are operating in a given system (
                <xref ref-type="bibr" rid="ref-1">Arnold 
                    <italic toggle="yes">et al</italic>., 2013</xref>; 
                <xref ref-type="bibr" rid="ref-3">Balwierz 
                    <italic toggle="yes">et al</italic>., 2014</xref>; 
                <xref ref-type="bibr" rid="ref-4">Suzuki 
                    <italic toggle="yes">et al</italic>., 2009</xref>). MARA has been successfully used to reconstruct core regulatory networks across a wide range of mammalian systems (e.g. see 
                <xref ref-type="bibr" rid="ref-3">Balwierz 
                    <italic toggle="yes">et al</italic>., 2014</xref> and citations therein) and has recently been implemented as a completely automated online system called ISMARA (Integrated System for Motif Activity Response Analysis; 
                <ext-link ext-link-type="uri" xlink:href="https://ismara.unibas.ch/fcgi/mara">ismara.unibas.ch</ext-link>; 
                <xref ref-type="bibr" rid="ref-3">Balwierz 
                    <italic toggle="yes">et al</italic>., 2014</xref>). ISMARA is also one of many resources that are part of Switzerland&#x2019;s Service Delivery Plan in ELIXIR (
                <ext-link ext-link-type="uri" xlink:href="http://www.elixir-europe.org">http://www.elixir-europe.org</ext-link>). To run ISMARA, a user only needs to upload her/his raw data to the server; these data can be either gene expression data (microarray or RNA-seq data) or chromatin state data (ChIP-seq data) from a set of biological samples. Although ISMARA is a highly popular tool, raw next-generation sequencing datasets are now so large (up to hundreds of GB) that uploading them to the web server can take many hours, which has become a major bottleneck for many users.</p>
            <p>To address this problem, we have developed a stand-alone client application (called the ISMARA client) that fully automates the pre-processing of the user's raw data on her/his own computer and transmits the much smaller resulting processed files to the ISMARA server for analysis. Since the processed files are up to three orders of magnitude smaller than the original raw files, the upload is short, even over slow Internet connections.</p>
            <p>The resulting processed file (typically several MB in size) is a simple tab-delimited file, which is sent to the ISMARA web server, where it is analyzed in exactly the same way as when raw data is uploaded. The pre-processing that the ISMARA client performs is also identical to the pre-processing that would otherwise take place on the ISMARA server. Overall, because transfer load and therefore upload times are reduced, the ISMARA server is less busy with file transfers and can respond more quickly to client requests, and the end-user experience is generally improved by shorter waiting times.</p>
            <p>Another important feature of the ISMARA client is that it allows users to communicate only highly summarized data to the ISMARA server. In many cases users may be uncomfortable with uploading entire raw datasets of potentially highly competitive data to an external server. With the ISMARA client, the raw data stays on the users' premises, and only small summary information is sent to the ISMARA server for further processing.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods and implementation</title>
            <p>In developing the ISMARA client application, our main objectives were to reduce data transfer times and to provide a software application that is easy to install and use on several platforms, i.e., operating systems. We selected the framework Qt5 (
                <ext-link ext-link-type="uri" xlink:href="http://www.qt.io">www.qt.io</ext-link>) using QML (
                <ext-link ext-link-type="uri" xlink:href="http://doc.qt.io/qt-5/qtqml-index.html">http://doc.qt.io/qt-5/qtqml-index.html</ext-link>) for the user interface and C++ for the platform-independent part. Several of the pre-processing steps that are currently performed on raw data by the ISMARA web server have been implemented on the client side, i.e., within the ISMARA client, and packaged as a native application for Mac OS X and Linux.</p>
            <p>The ISMARA client can process microarray data (CEL files) as well as RNA-seq and ChIP-seq data (BAM/BED files). The processing procedure depends on the data type. For microarray data, the ISMARA client first performs background correction on the probe intensities, followed by correction and adjustment for non-specific binding, and then filters out consistently non-expressed probes. After this, it quantile-normalizes the intensities across the samples and log-transforms them. A list of microarray chips that are currently supported is available on the ISMARA website (cf. &#x201c;Usage&#x201d; at 
                <ext-link ext-link-type="uri" xlink:href="https://ismara.unibas.ch/fcgi/mara">ismara.unibas.ch/fcgi/mara</ext-link>). For RNA-seq data, the client first sorts and indexes the input files, maps the reads to ISMARA's transcript set for the corresponding organism, uses ISMARA's associations between promoters and transcripts and the annotated transcript lengths to calculate normalized expression levels per promoter, and finally log-transforms the expression levels. For ChIP-seq data, the files are sorted and indexed, reads that map to promoter regions (2 kb regions centered on each promoter) are counted, and the counts are normalized and log-transformed. Detailed descriptions of all processing steps can be found in the original ISMARA paper (
                <xref ref-type="bibr" rid="ref-3">Balwierz 
                    <italic toggle="yes">et al</italic>., 2014</xref>).</p>
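            <p>The last two microarray steps described above (quantile normalization across samples followed by a log transform) can be illustrated with a minimal Python sketch. This is not the ISMARA code: the function names, the tie-breaking rule, the use of base-2 logarithms and the toy intensities are assumptions made purely for illustration.</p>
            <preformat preformat-type="code"><![CDATA[
```python
import math


def quantile_normalize(samples):
    """Quantile-normalize equal-length intensity lists, one list per sample.

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by probe order (a simplification).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must contain the same number of probes")

    # Mean of the k-th smallest value across samples, for every rank k.
    sorted_cols = [sorted(s) for s in samples]
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(samples)
        for k in range(n_probes)
    ]

    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def log2_transform(samples):
    """Log2-transform strictly positive intensities (base 2 is an assumption)."""
    return [[math.log2(v) for v in s] for s in samples]


# Hypothetical toy data: 2 samples x 4 probes.
raw = [[2.0, 8.0, 4.0, 16.0], [4.0, 16.0, 8.0, 64.0]]
normalized = log2_transform(quantile_normalize(raw))
```
]]></preformat>
            <p>With these toy values both samples end up with the same distribution of intensities, which is the defining property of quantile normalization.</p>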
            <p>The actual software application uses several external tools, including samtools (
                <xref ref-type="bibr" rid="ref-5">Li 
                    <italic toggle="yes">et al</italic>., 2009</xref>), htslib and bedops (
                <xref ref-type="bibr" rid="ref-6">Neph 
                    <italic toggle="yes">et al</italic>., 2012</xref>), as well as scripts and modules in R and Python. Additionally, a new internal interface has been developed on the ISMARA server that is used by the ISMARA client to automatically upload locally pre-processed data.</p>
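            <p>As an illustration of how such external tools can be driven from a script, the following Python sketch builds the samtools command lines for the sort and index steps. It is a sketch under stated assumptions rather than the client's actual code: the helper name, the output file naming and the thread count are hypothetical, and the 
                <monospace>samtools sort -o</monospace> syntax assumed here is that of recent samtools releases.</p>
            <preformat preformat-type="code"><![CDATA[
```python
from pathlib import Path


def samtools_commands(bam_path, threads=4):
    """Return argument lists that sort and then index one BAM file.

    Hypothetical helper: the ".sorted.bam" naming is an assumption, and the
    `sort -o` form requires a recent samtools release.
    """
    bam = Path(bam_path)
    sorted_bam = bam.with_name(bam.stem + ".sorted.bam")
    return [
        ["samtools", "sort", "-@", str(threads), "-o", str(sorted_bam), str(bam)],
        ["samtools", "index", str(sorted_bam)],
    ]


commands = samtools_commands("sample1.bam", threads=2)
# Each list could be passed to subprocess.run(cmd, check=True) on a machine
# where samtools is installed; nothing is executed here.
```
]]></preformat>
            <p>Building argument lists instead of shell strings avoids quoting problems with file names that contain spaces.</p>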
            <p>From a user's point of view, the ISMARA client is a convenient tool that takes large raw data files as input, processes them locally (using several CPU cores in parallel) and then submits the results of the pre-processing as a tab-delimited text file to the ISMARA server. The server then performs MARA on this pre-processed data and displays the final results in a web page, i.e. exactly as when raw data are uploaded to the web server. The user experience of the client and the existing web application are very similar, i.e., the client follows the web site's look and feel. The user starts by selecting the data type (microarray, RNA-seq or ChIP-seq): for RNA-seq and ChIP-seq, the user is also requested to select a genome assembly [human genome versions hg18 or hg19 or mouse genome version 9 (mm9)].</p>
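            <p>The workflow described above (process several input files concurrently, then collect the results in one tab-delimited file) can be sketched in Python as follows. Everything in this sketch is an assumption made for illustration: the per-sample function is a stand-in for the real pre-processing, and the column layout and promoter identifiers are hypothetical and do not describe the file format that ISMARA actually expects. Threads are used for simplicity; they suffice when each worker only launches an external tool, whereas CPU-bound Python code would need separate processes.</p>
            <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def summarize_sample(sample):
    """Hypothetical stand-in for the per-file pre-processing step.

    Takes (sample_name, {promoter_id: value}) and returns it unchanged; a
    real worker would launch the external tools for one input file.
    """
    name, values = sample
    return name, values


def build_summary_table(samples, max_workers=4):
    """Process samples concurrently and return a tab-delimited table."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so columns follow the order of `samples`.
        results = list(pool.map(summarize_sample, samples))

    promoters = sorted({p for _, values in results for p in values})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["promoter"] + [name for name, _ in results])
    for promoter in promoters:
        writer.writerow(
            [promoter] + [values.get(promoter, "NA") for _, values in results]
        )
    return buffer.getvalue()


# Hypothetical promoter identifiers and log-expression values.
table = build_summary_table([
    ("sampleA", {"prom1": 5.2, "prom2": 0.7}),
    ("sampleB", {"prom1": 4.9, "prom2": 1.1}),
])
```
]]></preformat>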
            <p>Once the options are selected, a user can add files in CEL, BAM or BED formats. Next, the pre-processing is started by clicking on the &#x201c;Process data&#x201d; button. Note that, if present, the &#x201c;Email&#x201d; and &#x201c;Project name&#x201d; fields can be used by the ISMARA server to send a notification when processing of a specific job has finished.</p>
            <p>Additionally, the ISMARA client implements functionality that is currently not available on the web server: several jobs, i.e., processing/submission requests, can be managed with the client application. In particular, the client stores all ongoing and finished jobs of the user, including their download URLs, so that it is easy to manage multiple sets of experiments. Detailed log information is also available and can be copy-pasted for further communication with the ISMARA team in case of problems or questions.</p>
            <sec>
                <title>Supported platforms and requirements</title>
                <p>To enable and test platform independence, the application was developed on several Linux flavours (Linux Mint, CentOS and Ubuntu), as well as on Mac OS X, using the 
                    <monospace>bash</monospace> UNIX shell as the main glue between scripts and external applications. We originally also planned to support MS Windows natively (which Qt5 allows), but external dependencies on scripting and bioinformatics software, such as Python, samtools, R, and Bash, for which support is limited on MS Windows, could not be resolved without considerable re-engineering effort. Therefore, we decided to use VirtualBox (
                    <ext-link ext-link-type="uri" xlink:href="http://www.virtualbox.org">http://www.virtualbox.org</ext-link>) to create disk images that can also be run on Windows machines. Specifically, an Ubuntu image containing the ISMARA client can be run in VirtualBox installed on MS Windows, allowing Windows users to make use of the ISMARA client.</p>
                <p>In summary, easily installable binary applications of the ISMARA client are currently provided on-line for Ubuntu 15.04 and Mac OS X (10.10 and 10.11). Additionally, other Linux flavours and/or virtual machine images via VirtualBox can be provided on demand. The ISMARA client has modest requirements: 4 GB of RAM and preinstalled, fairly recent versions of R (3.2.0 and 3.1.2 for Mac and Linux, respectively) and Python (2.7.6 and 2.7.9 for Mac and Linux, respectively). Notably, because experimental files can be several tens of GB in size, the client allows machines with limited amounts of disk space to make use of external hard drives. Importantly, using an external hard drive has no significant impact on pre-processing performance and can be easily set up in the ISMARA client&#x2019;s preferences.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>To assess the performance of the client relative to direct use of the ISMARA web server, we compared two scenarios, denoted S1 and S2 (cf. 
                <xref ref-type="table" rid="T1">Table 1</xref>): S1 uses the ISMARA client to pre-process data (P1), uploads small summary files to the server (Upload), and then performs the final analysis on the server (P2). Scenario S2 uploads all data (i.e., large files) to the ISMARA server directly, without using the ISMARA client, and lets the server perform both the pre-processing and final analysis (P1+P2). We tested both scenarios on networks with different speeds and used two different datasets: a set of RNA-seq files (GEO accession, GSE30611) with a total size of 30.2 GB, and a set of ChIP-seq files (GEO accession, GSE26386) with a total size of 3.6 GB.</p>
            <table-wrap id="T1" orientation="portrait" position="anchor">
                <label>Table 1. </label>
                <caption>
                    <title>Performance results using the ISMARA client with three different input datasets.</title>
                    <p>The analysis used a client with 4 cores and a server with 12 cores, on both fast and slow networks. Tests were done in July 2016.</p>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1">RNA-seq</th>
                            <th align="left" colspan="1" rowspan="1">Network</th>
                            <th align="left" colspan="1" rowspan="1">Upload (raw data)</th>
                            <th align="left" colspan="1" rowspan="1">P1</th>
                            <th align="left" colspan="1" rowspan="1">Upload (summary file)</th>
                            <th align="left" colspan="1" rowspan="1">P2</th>
                            <th align="left" colspan="1" rowspan="1">Total</th>
                        </tr>
                        <tr>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1">
                                <italic toggle="yes">30.2 GB</italic>
                            </th>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1">
                                <italic toggle="yes">17.4 MB</italic>
                            </th>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1"/>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td colspan="1" rowspan="2">S1 client+ 
                                <break/>server</td>
                            <td colspan="1" rowspan="1">1 Gbit/s</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">95 min</td>
                            <td colspan="1" rowspan="1">3 s</td>
                            <td colspan="1" rowspan="1">70 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>165 min</bold> (2h45)</td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="1">10 Mbit/s</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">95 min</td>
                            <td colspan="1" rowspan="1">15 s</td>
                            <td colspan="1" rowspan="1">70 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>165 min</bold> (2h45)</td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="2">S2 server 
                                <break/>only</td>
                            <td colspan="1" rowspan="1">1 Gbit/s</td>
                            <td colspan="1" rowspan="1">30&#x2013;60 min</td>
                            <td colspan="1" rowspan="1">35 min</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">70 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>135&#x2013;165 min</bold> </td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="1">10 Mbit/s</td>
                            <td colspan="1" rowspan="1">360 min</td>
                            <td colspan="1" rowspan="1">35 min</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">70 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>465 min</bold> (7h45)</td>
                        </tr>
                        <tr>
                            <th align="left" colspan="1" rowspan="1">ChIP-seq</th>
                            <th align="left" colspan="1" rowspan="1">Network</th>
                            <th align="left" colspan="1" rowspan="1">Upload (raw data)</th>
                            <th align="left" colspan="1" rowspan="1">P1</th>
                            <th align="left" colspan="1" rowspan="1">Upload (summary file)</th>
                            <th align="left" colspan="1" rowspan="1">P2</th>
                            <th align="left" colspan="1" rowspan="1">Total</th>
                        </tr>
                        <tr>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1">
                                <italic toggle="yes">3.6 GB</italic>
                            </th>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1">
                                <italic toggle="yes">10.4 MB</italic>
                            </th>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1"/>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="2">S1 client+ 
                                <break/>server</td>
                            <td colspan="1" rowspan="1">1 Gbit/s</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">8 min</td>
                            <td colspan="1" rowspan="1">3 s</td>
                            <td colspan="1" rowspan="1">15 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>23 min</bold>
                            </td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="1">10 Mbit/s</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">8 min</td>
                            <td colspan="1" rowspan="1">13 s</td>
                            <td colspan="1" rowspan="1">15 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>23 min</bold> </td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="2">S2 server 
                                <break/>only</td>
                            <td colspan="1" rowspan="1">1 Gbit/s</td>
                            <td colspan="1" rowspan="1">3&#x2013;8 min</td>
                            <td colspan="1" rowspan="1">7&#x2013;18 min</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">15 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>25&#x2013;41 min</bold> </td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="1">10 Mbit/s</td>
                            <td colspan="1" rowspan="1">43 min</td>
                            <td colspan="1" rowspan="1">7&#x2013;18 min</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">15 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>65&#x2013;76 min</bold>
                            </td>
                        </tr>
                        <tr>
                            <th align="left" colspan="1" rowspan="1">Microarray</th>
                            <th align="left" colspan="1" rowspan="1">Network</th>
                            <th align="left" colspan="1" rowspan="1">Upload (raw data)</th>
                            <th align="left" colspan="1" rowspan="1">P1</th>
                            <th align="left" colspan="1" rowspan="1">Upload (summary file)</th>
                            <th align="left" colspan="1" rowspan="1">P2</th>
                            <th align="left" colspan="1" rowspan="1">Total</th>
                        </tr>
                        <tr>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1">
                                <italic toggle="yes">39.6 MB</italic>
                            </th>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1">
                                <italic toggle="yes">64 MB</italic>
                            </th>
                            <th colspan="1" rowspan="1"/>
                            <th colspan="1" rowspan="1"/>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="2">S1 client+ 
                                <break/>server</td>
                            <td colspan="1" rowspan="1">1 Gbit/s</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">7 min</td>
                            <td colspan="1" rowspan="1">5 s</td>
                            <td colspan="1" rowspan="1">22 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>29 min</bold>
                            </td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="1">10 Mbit/s</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">7 min</td>
                            <td colspan="1" rowspan="1">24 s</td>
                            <td colspan="1" rowspan="1">22 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>29 min</bold>
                            </td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="2">S2 server 
                                <break/>only</td>
                            <td colspan="1" rowspan="1">1 Gbit/s</td>
                            <td colspan="1" rowspan="1">5 s</td>
                            <td colspan="1" rowspan="1">19 min</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">40 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>59 min</bold> </td>
                        </tr>
                        <tr>
                            <td colspan="1" rowspan="1">10 Mbit/s</td>
                            <td colspan="1" rowspan="1">15 s</td>
                            <td colspan="1" rowspan="1">19 min</td>
                            <td colspan="1" rowspan="1">N/A</td>
                            <td colspan="1" rowspan="1">40 min</td>
                            <td colspan="1" rowspan="1">
                                <bold>59 min</bold> </td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <sec>
                <title>Data transfer size and speed</title>
                <p>To quantify how much the ISMARA client reduces the amount of data to transfer, we compared the sizes of the original input files with the sizes of the data files produced by client-side pre-processing (P1). We analysed expression and ChIP-seq data on mid-range desktop machines (Intel Core i7 quad-core processors) running Linux Mint or Mac OS X, using the example data available on the ISMARA server in the &#x2018;sample data&#x2019; section. Pre-processing ChIP-seq and RNA-seq data on the client led to file size reductions by a factor of about 300 to more than 1000 (10.4 MB and 17.4 MB compared to the original file sizes of 3.6 GB and 30.2 GB, respectively). A smaller file size reduces network transfer times significantly (
                    <xref ref-type="bibr" rid="ref-7">Stockinger 
                        <italic toggle="yes">et al</italic>., 2002</xref>), particularly on long, high-latency wide-area network connections. For the RNA-seq example in 
                    <xref ref-type="table" rid="T1">Table 1</xref>, uploading the original 30.2 GB files took from 30&#x2013;60 min on fast networks (1 Gbit/s network speed) to 5&#x2013;6 hours on &#x201c;normal&#x201d; links (mid-size/home networks with 10 Mbit/s speed). In contrast, uploading the pre-processed data file of 17.4 MB took only several seconds on both fast and slow links.</p>
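                <p>As a quick check, the reduction factors quoted above can be written out explicitly. This is only a sketch: it assumes decimal units (1 GB = 1000 MB), which the text does not state, and the symbol 
                    <italic toggle="yes">r</italic> is introduced here purely for illustration. With binary units (1 GB = 1024 MB), both factors would be about 2.4% larger.
                    <disp-formula>
                        <tex-math><![CDATA[
\begin{array}{rcl}
r_{\mathrm{ChIP}} & = & \frac{3.6\ \mathrm{GB}}{10.4\ \mathrm{MB}} = \frac{3600\ \mathrm{MB}}{10.4\ \mathrm{MB}} \approx 346 \\[1ex]
r_{\mathrm{RNA}} & = & \frac{30.2\ \mathrm{GB}}{17.4\ \mathrm{MB}} = \frac{30200\ \mathrm{MB}}{17.4\ \mathrm{MB}} \approx 1736
\end{array}
]]></tex-math>
                    </disp-formula>
                    Both values are consistent with the stated range of about 300 (ChIP-seq) to more than 1000 (RNA-seq).</p>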
            </sec>
            <sec>
                <title>Total execution speed</title>
                <p>Next, we compared end-to-end processing times of scenarios S1 and S2 (cf. column &#x2018;Total&#x2019; in 
                    <xref ref-type="table" rid="T1">Table 1</xref>). For the S1 scenario, using 4 cores for the ISMARA client, we observed a total processing time of 2h45 for 
                    <bold>RNA-seq</bold>, including client-side processing, upload and server-side processing. Upload time was negligible due to the small size of the pre-processed data file. For the S2 scenario, in which 30.2 GB of data is first uploaded to the server before all processing is done on the 12-core ISMARA server, we observed total processing times of 2h15&#x2013;2h45 on a 1 Gbit/s network and 7h45 on a 10 Mbit/s network. In summary, using the client (S1) on 10 Mbit/s (&#x201c;slower&#x201d;) networks was always faster than using the server only (S2). Even on fast networks, the observed total processing times were similar for S1 and S2.</p>
                <p>For the 
                    <bold>ChIP-seq</bold> data (
                    <xref ref-type="table" rid="T1">Table 1</xref>), overall execution times of scenarios S1 and S2 were similar. Finally, we did not observe any file size reductions for 
                    <bold>microarray</bold> experiments (GEO accession GSE26386), because the input files for microarray data were already much smaller (e.g. 36.9 MB) than those for RNA-seq and ChIP-seq data. Notably, the pre-processed data files uploaded by the client remained relatively small for microarray data as well. Overall, the total processing times for scenarios S1 and S2 with microarray data showed no significant differences.</p>
            </sec>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusion</title>
            <p>The ISMARA client works very well for medium to large datasets: it reduces data transfer times and, in many cases, the overall execution times as well.</p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>ISMARA client available from: 
                <ext-link ext-link-type="uri" xlink:href="https://ismara.unibas.ch/ISMARA/client/">https://ismara.unibas.ch/ISMARA/client/</ext-link>
            </p>
            <p>ISMARA client source code: 
                <ext-link ext-link-type="uri" xlink:href="https://gitlab.isb-sib.ch/ST/ismara-client">https://gitlab.isb-sib.ch/ST/ismara-client</ext-link>
            </p>
            <p>ISMARA client archived source code at time of publication: DOI, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.192284">10.5281/zenodo.192284</ext-link> (
                <xref ref-type="bibr" rid="ref-2">Artimo 
                    <italic toggle="yes">et al</italic>., 2016</xref>)</p>
            <p>(
                <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/192284#.WEbJSNWLTcs">https://zenodo.org/record/192284#.WEbJSNWLTcs</ext-link>)</p>
            <p>Licence: GPL v2</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <ref id="ref-1">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Arnold</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sch&#x00f6;ler</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pachkov</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Modeling of epigenome dynamics identifies transcription factors that mediate Polycomb targeting.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2013</year>;<volume>23</volume>(<issue>1</issue>):<fpage>60</fpage>&#x2013;<lpage>73</lpage>.
                    <pub-id pub-id-type="pmid">22964890</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.142661.112</pub-id>
                    <pub-id pub-id-type="pmcid">3530684</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Artimo</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Davaud</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pachkov</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>ISMARA Client [Data set].</article-title>
                    <source>
						
                        <italic toggle="yes">Zenodo.</italic>
					</source>
                    <year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.192284">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Balwierz</surname>
                            <given-names>PJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pachkov</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Arnold</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>ISMARA: automated modeling of genomic signals as a democracy of regulatory motifs.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2014</year>;<volume>24</volume>(<issue>5</issue>):<fpage>869</fpage>&#x2013;<lpage>884</lpage>.
                    <pub-id pub-id-type="pmid">24515121</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.169508.113</pub-id>
                    <pub-id pub-id-type="pmcid">4009616</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
					
                        <collab>FANTOM Consortium, </collab>
						
                        <name name-style="western">
                            <surname>Suzuki</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Forrest</surname>
                            <given-names>AR</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Genet.</italic>
					</source>
                    <year>2009</year>;<volume>41</volume>(<issue>5</issue>):<fpage>553</fpage>&#x2013;<lpage>562</lpage>.
                    <pub-id pub-id-type="pmid">19377474</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.375</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Handsaker</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Wysoker</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The Sequence Alignment/Map format and SAMtools.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2009</year>;<volume>25</volume>(<issue>16</issue>):<fpage>2078</fpage>&#x2013;<lpage>2079</lpage>.
                    <pub-id pub-id-type="pmid">19505943</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id>
                    <pub-id pub-id-type="pmcid">2723002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Neph</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kuehn</surname>
                            <given-names>MS</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Reynolds</surname>
                            <given-names>AP</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>BEDOPS: high-performance genomic feature operations.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2012</year>;<volume>28</volume>(<issue>14</issue>):<fpage>1919</fpage>&#x2013;<lpage>1920</lpage>.
                    <pub-id pub-id-type="pmid">22576172</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts277</pub-id>
                    <pub-id pub-id-type="pmcid">3389768</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Stockinger</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Samar</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Holtman</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>File and Object Replication in Data Grids.</article-title>
                    <source>
						
                        <italic toggle="yes">Cluster Comput.</italic>
					</source>
                    <year>2002</year>;<volume>5</volume>(<issue>3</issue>):<fpage>305</fpage>&#x2013;<lpage>314</lpage>.
                    <pub-id pub-id-type="doi">10.1023/A:1015681406220</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report19973">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10560.r19973</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Gelpi</surname>
                        <given-names>Josep Llu&#x00ed;s</given-names>
                    </name>
                    <xref ref-type="aff" rid="r19973a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0566-7723</uri>
                </contrib>
                <aff id="r19973a1">
                    <label>1</label>Joint BSC&#x00a0;- CRG -&#x00a0;IRB Programme in Computational Biology, Barcelona Supercomputing Center, University of Barcelona, Barcelona, Spain</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>2</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Gelpi JL</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport19973" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9794.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The paper reports client software for the ISMARA server at SIB. The rationale of the application is to pre-process data on the user&#x2019;s premises, reducing the amount of time required to upload raw data to the server. This is indeed very reasonable and, given the usual size of input data, it represents a significant time saving. The client installs without problems and the software requirements are reasonable. The interface is also friendly and easy to follow. Some comments/suggestions follow:</p>
            <p> 1. Instructions to install in a virtual machine for Windows are confusing. Links go to the installation packages of VirtualBox and Ubuntu desktop. I understand that users are expected to install the software in an empty Ubuntu VM once VirtualBox is available. This requires some system administration skills. An easier way would be to download a VirtualBox VM with the software already installed. Consider providing such a ready-to-run VM, or alternatively a container (Docker or other).</p>
            <p> 2. Data is uploaded automatically after pre-processing. Does the server calculation also start automatically after upload? The results page should auto-reload when the calculation is completed.</p>
            <p> 3. Consider making the upload optional (although it can be the default). Users may be interested in checking the intermediate files before running the ISMARA calculation, and in manually uploading the relevant ones. Users may also want to store the intermediate files or re-use them for other analyses.</p>
            <p> 4. Although the client is linked to a GUI, presumably, the pre-processing work can be done also from a command line. If this is the case, help on the command line instruction and parameters would be useful. In this way, experienced users could prepare a batch pre-processing job, or perhaps chain this in a larger workflow. Details of the procedure for uploading should be indicated.</p>
            <p> 5. Source is made open, but no indication about the policy of contribution is available.</p>
            <p> 6. In the openSUSE KDE desktop, the interface shows some visual problems: the Data type menu is cut off (Use miRNA does not appear), and links in the output are not clickable. The FAQ and Technical Support links are also missing.</p>
            <p> 7. The URL of the ISMARA results page does not appear in the log file, although the text indicates that it does.</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report18518">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10560.r18518</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Daub</surname>
                        <given-names>Carsten O.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r18518a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5599-5565</uri>
                </contrib>
                <aff id="r18518a1">
                    <label>1</label>Department of Biosciences and Nutrition, Science For Life Laboratory (SciLifeLab), Karolinska Institutet (KI), Stockholm, Sweden</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>18</day>
                <month>1</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Daub CO</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport18518" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9794.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Artimo et al. present a software tool to pre-process microarray, RNA-Seq and ChIP-Seq data for server-based ISMARA motif activity response analysis. With the novel client tool, the data transfer from the user to the ISMARA server is dramatically reduced, saving time and allowing the primary data to be kept confidential.</p>
            <p> The developed client tool is a very useful complement to the ISMARA server. It makes the ISMARA server much more user-friendly. The manuscript is well written, with a sufficient level of detail.</p>
            <p> I have two minor suggestions: 
                <list list-type="order">
                    <list-item>
                        <p>The client logfile is replaced after each start of the client. It might be helpful to be able to access the logfile of each job individually, including after restarting the client.</p>
                    </list-item>
                    <list-item>
                        <p>It was unclear to me which genome version the sample data was mapped to. It might also help to state the species for the sample data in case a user does not read the GEO entries.</p>
                    </list-item>
                </list>
            </p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
