<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.18674.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>CGAT-core: a python framework for building scalable, reproducible computational biology workflows</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Cribbs</surname>
                        <given-names>Adam P.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5288-3077</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Luna-Valero</surname>
                        <given-names>Sebastian</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>George</surname>
                        <given-names>Charlotte</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sudbery</surname>
                        <given-names>Ian M.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5038-0190</uri>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Berlanga-Taylor</surname>
                        <given-names>Antonio J.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sansom</surname>
                        <given-names>Stephen N.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Smith</surname>
                        <given-names>Tom</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a6">6</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ilott</surname>
                        <given-names>Nicholas E.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Johnson</surname>
                        <given-names>Jethro</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a7">7</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Scaber</surname>
                        <given-names>Jakub</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0146-1821</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Brown</surname>
                        <given-names>Katherine</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a8">8</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Sims</surname>
                        <given-names>David</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Heger</surname>
                        <given-names>Andreas</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c3">c</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>The Botnar Research Center, University of Oxford, Oxford, OX3 7LD, UK</aff>
                <aff id="a2">
                    <label>2</label>MRC WIMM Centre for Computational Biology, University of Oxford, Oxford, OX3 9DS, UK</aff>
                <aff id="a3">
                    <label>3</label>Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, S10 2TN, UK</aff>
                <aff id="a4">
                    <label>4</label>MRC-PHE Centre for Environment and Health, Department of Epidemiology &amp; Biostatistics, Imperial College London, London, W2 1PG, UK</aff>
                <aff id="a5">
                    <label>5</label>Kennedy Institute of Rheumatology, University of Oxford, Oxford, OX3 7FY, UK</aff>
                <aff id="a6">
                    <label>6</label>Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QR, UK</aff>
                <aff id="a7">
                    <label>7</label>The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA</aff>
                <aff id="a8">
                    <label>8</label>Division of Virology, Department of Pathology, University of Cambridge, Cambridge, CB2 1QP, UK</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:adam.cribbs@imm.ox.ac.uk">adam.cribbs@imm.ox.ac.uk</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:david.sims@imm.ox.ac.uk">david.sims@imm.ox.ac.uk</email>
                </corresp>
                <corresp id="c3">
                    <label>c</label>
                    <email xlink:href="mailto:andreas.heger@gmail.com">andreas.heger@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>4</day>
                <month>4</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>8</volume>
            <elocation-id>377</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>1</day>
                    <month>4</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Cribbs AP et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/8-377/pdf"/>
            <abstract>
                <p>In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high-performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNAseq data using pseudo-alignment.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>workflow</kwd>
                <kwd>pipeline</kwd>
                <kwd>python</kwd>
                <kwd>genomics</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100000265">
                    <funding-source>Medical Research Council</funding-source>
                    <award-id>G1000902</award-id>
                </award-group>
                <funding-statement>This work was funded by the Medical Research Council (UK) Computational Genomics Analysis and Training programme (G1000902).</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Genomic technologies have given researchers the ability to produce large amounts of data at relatively low cost. Bioinformatic analyses typically involve passing data through a series of manipulations and transformations, called a pipeline or workflow. The need for tools to manage workflows is well established, with a wide range of options available from graphical user interfaces such as Galaxy
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup> and Taverna
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>, aimed at non-programmers, to Snakemake, Nextflow, Toil, CGAT-Ruffus and others
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup> developed with computational biologists in mind. These tools differ in their portability, scalability, parameter handling, extensibility, and ease of use. In a recent survey
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>, the tool rated highest for ease of pipeline development was CGAT-Ruffus
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>, a Python package that wraps pipeline steps in discrete Python functions, called &#x2018;tasks&#x2019;. It uses Python decorators to track the dependencies between tasks, ensuring that dependent tasks are completed in the correct order and that independent tasks can be run in parallel. If a pipeline is interrupted before completion, or new input files are added, only data sets that are missing or out-of-date are re-run. CGAT-Ruffus implements a wide range of decorators that allow complex operations on input files, including conversion of a single input file to a single output file, splitting of a file into multiple files (and vice versa), and conditional merging of multiple input files into a smaller number of outputs. More advanced options include generating combinations or permutations of input files and conditional execution based on input parameters. Use of decorators means that CGAT-Ruffus pipelines are native Python scripts, rather than the domain-specific languages (DSLs) used in other workflow tools. A key advantage of this, in addition to Python being an already widely understood language in computational biology, is that individual steps can use arbitrary Python code, both in how they are linked together and in the actual processing task.</p>
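            <p>The decorator-based dependency tracking and the re-running of stale outputs described above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the CGAT-Ruffus implementation: the decorator <italic toggle="yes">depends_on</italic>, the registry <italic toggle="yes">TASKS</italic>, the helper functions and the task names are all hypothetical and exist only for illustration.</p>
            <code language="python" xml:space="preserve"><![CDATA[
```python
import os

# Hypothetical registry: task name -> (function, names of upstream tasks).
TASKS = {}


def depends_on(*upstream):
    """Toy decorator that records which tasks must finish before this one."""
    def register(func):
        TASKS[func.__name__] = (func, [task.__name__ for task in upstream])
        return func
    return register


def is_out_of_date(infile, outfile):
    """An output is stale if it is missing or older than its input."""
    if not os.path.exists(outfile):
        return True
    return os.path.getmtime(outfile) < os.path.getmtime(infile)


def run_order(target, seen=None):
    """Return task names in an order that respects recorded dependencies."""
    seen = [] if seen is None else seen
    for upstream in TASKS[target][1]:
        run_order(upstream, seen)
    if target not in seen:
        seen.append(target)
    return seen


@depends_on()
def trim_reads():
    return "trimmed"


@depends_on(trim_reads)
def align_reads():
    return "aligned"


@depends_on(align_reads)
def count_reads():
    return "counted"


# Upstream tasks are listed before the tasks that depend on them.
order = run_order("count_reads")
```
]]></code>
            <p>In this sketch a scheduler would call each task in <italic toggle="yes">order</italic> and skip any step for which <italic toggle="yes">is_out_of_date</italic> returns false, which is the behaviour that allows an interrupted pipeline to resume without repeating completed work.</p>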
            <p>Here, we introduce Computational Genomics Analysis Toolkit (CGAT)-core
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>, an open-source Python library that extends the functionality of CGAT-Ruffus by adding cluster interaction, parameterisation, logging, database interaction and
                <ext-link ext-link-type="uri" xlink:href="https://conda.io/">Conda</ext-link> environment switching.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>CGAT-core
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup> extends the functionality of CGAT-Ruffus by providing a common interface to control distributed resource management systems using 
                <ext-link ext-link-type="uri" xlink:href="http://www.drmaa.org/">Distributed Resource Management Application API (DRMAA)</ext-link>. Currently, we support interaction with Sun Grid Engine, Slurm and PBS-pro/Torque. The execution engine enables tasks to be run locally or on a high-performance computing cluster and supports cluster distribution of both command line scripts (
                <italic toggle="yes">cgatcore.run</italic>) and Python functions (
                <italic toggle="yes">cgatcore.cluster</italic>). System resources (number of cores to use, amount of RAM to allocate) can be set on a per-pipeline, per-task, or per-task-instance basis, and allocations can even be based on variables such as input file size.</p>
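            <p>As an illustration of basing a resource request on input file size, the following sketch derives a memory request from the size of an input file. It is a standard-library example under stated assumptions, not the CGAT-core API: the function names, the scaling factor of 4 GB of RAM per GB of input, the 1 GB minimum, and the option keys <italic toggle="yes">threads</italic> and <italic toggle="yes">memory</italic> are all hypothetical.</p>
            <code language="python" xml:space="preserve"><![CDATA[
```python
import math
import os


def memory_for_size(size_bytes, gb_per_input_gb=4.0, minimum_gb=1):
    """Return a memory request such as '12G', scaled by input size.

    The scaling factor, the minimum and the '<n>G' format are
    illustrative assumptions, not values taken from CGAT-core.
    """
    size_gb = size_bytes / 1024 ** 3
    needed_gb = max(minimum_gb, math.ceil(size_gb * gb_per_input_gb))
    return f"{needed_gb}G"


def job_options(path, threads=1):
    """Build per-task-instance options from the actual input file."""
    return {
        "threads": threads,
        "memory": memory_for_size(os.path.getsize(path)),
    }
```
]]></code>
            <p>A task could call <italic toggle="yes">job_options</italic> once per input file so that small inputs request the minimum allocation while large inputs request proportionally more.</p>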
            <sec>
                <title>Operation</title>
                <p>The parameter management component encourages the separation of workflow/tool configuration from implementation to build re-usable workflows. Algorithm parameters are collected in a single human-readable YAML configuration file. Thus, parameters can be set specifically for each dataset, without the need to modify the code. For example, sequencing data can be aligned to a different reference genome simply by changing the path to the genome index in the YAML file. Both pipeline-wide and job-local parameters are automatically substituted into command line statements at execution time.</p>
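                <p>The substitution of pipeline-wide and job-local parameters into a command line statement can be sketched as follows. This is a minimal illustration and not the CGAT-core implementation: it assumes %(name)s-style placeholders, uses a plain dictionary to stand in for values parsed from the YAML file, and the command name <italic toggle="yes">aligner</italic>, its flags, the parameter names and the paths are hypothetical.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
# Stand-in for values parsed from the YAML configuration file (hypothetical).
pipeline_params = {
    "genome_index": "/path/to/index",
    "threads": 4,
}


def build_statement(template, job_params=None):
    """Fill %(name)s placeholders in a command line template.

    Job-local values take precedence over pipeline-wide values.
    """
    values = {**pipeline_params, **(job_params or {})}
    return template % values


# 'aligner' and its flags are placeholders for a real third-party tool.
template = (
    "aligner --index %(genome_index)s --threads %(threads)s "
    "%(infile)s > %(outfile)s"
)

statement = build_statement(
    template,
    {"infile": "a.fastq", "outfile": "a.bam", "threads": 8},
)
```
]]></code>
                <p>Changing <italic toggle="yes">genome_index</italic> in the configuration alters every statement built from the template, while the job-local <italic toggle="yes">threads</italic> value overrides the pipeline-wide default for this job only.</p>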
                <p>To assist with reproducibility, record keeping and error handling, CGAT-core provides multi-level logging during workflow execution, recording full details of runtime parameters and environment configuration and tracking job submissions. Additionally, CGAT-core provides a simple, lightweight interface for interacting with relational databases such as SQLite (
                    <italic toggle="yes">cgatcore.database</italic>), facilitating loading of analysis results at any step of the workflow, including combining output from parallel steps into a single wide- or long-format table.</p>
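                <p>Combining the output of parallel steps into a single long-format table can be sketched with the standard-library <italic toggle="yes">sqlite3</italic> module. This example does not use <italic toggle="yes">cgatcore.database</italic>; the function name, table name, column names and counts are hypothetical and chosen only for illustration.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import sqlite3


def load_long_format(connection, table, per_sample_counts):
    """Stack per-sample results into one long-format table.

    Each row is (sample, gene, n_reads). The table name is interpolated
    directly, so it must come from trusted code, not user input.
    """
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(sample TEXT, gene TEXT, n_reads INTEGER)"
    )
    rows = [
        (sample, gene, n_reads)
        for sample, counts in per_sample_counts.items()
        for gene, n_reads in counts.items()
    ]
    connection.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Made-up outputs of two parallel quantification jobs.
results = {
    "sampleA": {"geneX": 10, "geneY": 0},
    "sampleB": {"geneX": 7, "geneY": 3},
}

connection = sqlite3.connect(":memory:")
n_loaded = load_long_format(connection, "gene_counts", results)

# Downstream steps can then query across all samples at once.
totals = dict(
    connection.execute(
        "SELECT gene, SUM(n_reads) FROM gene_counts GROUP BY gene ORDER BY gene"
    )
)
```
]]></code>
                <p>Keeping one row per sample and gene means that adding a sample only appends rows, whereas a wide-format table would need a new column for each sample.</p>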
                <p>CGAT-core can load a different Conda environment for each step of the analysis, enabling the use of tools with conflicting software requirements. Furthermore, providing Conda environment files alongside pipeline scripts ensures that analyses can be fully reproduced.</p>
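                <p>One way to picture per-step environments is a mapping from workflow steps to Conda environment names, with each command wrapped so that it runs inside its own environment. The sketch below builds such commands using the <italic toggle="yes">conda run -n</italic> command of recent Conda releases; this is an assumption made for illustration and may differ from how CGAT-core activates environments internally. The step names, environment names and helper function are hypothetical.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import shlex

# Hypothetical mapping of workflow steps to Conda environment names.
step_environments = {
    "quantify": "quant-env",
    "differential_expression": "deseq-env",
}


def in_conda_env(env_name, command):
    """Prefix a shell command so it runs inside the named environment."""
    return f"conda run -n {shlex.quote(env_name)} {command}"


# Each step is wrapped with its own environment, so tools with
# conflicting dependencies never share an installation.
commands = {
    "quantify": in_conda_env(step_environments["quantify"], "kallisto version"),
    "differential_expression": in_conda_env(
        step_environments["differential_expression"], "Rscript --version"
    ),
}
```
]]></code>
                <p>The commands are only constructed here, not executed; in a workflow they would be passed to whatever mechanism runs shell statements locally or on the cluster.</p>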
                <p>CGAT-core workflows are Python scripts, and as such are stand-alone command line utilities that do not require the installation of a dedicated service. In order to reproducibly execute our workflows, we provide utility functions for argument parsing, logging and record keeping within scripts (
                    <italic toggle="yes">cgatcore.experiment</italic>). Workflows are started, inspected and configured through the command line interface. Therefore, workflows become just another tool and can be re-used within other workflows. Furthermore, workflows can leverage the full power of Python, making them completely extensible and flexible.</p>
            </sec>
        </sec>
        <sec>
            <title>Implementation</title>
            <p>CGAT-core is implemented in Python 3 and installable via Conda and 
                <ext-link ext-link-type="uri" xlink:href="https://pypi.org/">PyPI</ext-link> with minimal dependencies. We have successfully deployed and tested the code on OSX, Red Hat and Ubuntu. We have made CGAT-core and associated repositories open-source under the MIT licence, allowing full and free use for both commercial and non-commercial purposes. Our software is fully documented (
                <ext-link ext-link-type="uri" xlink:href="https://pypi.org/">https://pypi.org</ext-link>), version controlled and has extensive testing using continuous integration (
                <ext-link ext-link-type="uri" xlink:href="https://travis-ci.org/cgat-developers">https://travis-ci.org/cgat-developers</ext-link>). We welcome community participation in code development and issue reporting through GitHub.</p>
        </sec>
        <sec>
            <title>Use case</title>
            <p>To illustrate a simple use case of CGAT-core, we have built an example RNAseq analysis pipeline, which performs read counting using Kallisto
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup> and differential expression using DESeq2
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>. This workflow and Conda environment are contained within our CGAT-showcase repository (
                <ext-link ext-link-type="uri" xlink:href="https://github.com/cgat-developers/cgat-showcase">https://github.com/cgat-developers/cgat-showcase</ext-link>). The workflow highlights how simple pipelines can be constructed using CGAT-core, demonstrating how the pipeline can be configured using a YAML file, how third-party tools can be executed efficiently across a cluster or on a local machine, and how data can be easily loaded into a database. Furthermore, we and others have used CGAT-core extensively to build pipelines for computational genomics (
                <ext-link ext-link-type="uri" xlink:href="https://github.com/cgat-developers/cgat-flow">https://github.com/cgat-developers/cgat-flow</ext-link>).</p>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>CGAT-core
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup> extends the popular Python workflow engine CGAT-Ruffus by adding desirable features from a variety of other workflow systems to form a simple, flexible and scalable package. CGAT-core provides seamless high-performance computing cluster interaction and adds Conda environment integration for the first time. In addition, our framework focuses on simplifying the pipeline development and testing process by providing convenience functions for parameterisation, database interaction, logging and pipeline interaction.</p>
            <p>The ease of pipeline development enables CGAT-core to bridge the gap between exploratory data analysis and building production workflows. A guiding principle is that it should be as easy (or easier) to complete a series of tasks using a simple pipeline compared to using an interactive prompt, especially once cluster submission is considered. CGAT-core enables the production of analysis pipelines that can easily be run in multiple environments to facilitate sharing of code as part of the publication process. Thus, CGAT-core encourages a best-practice reproducible research approach by making it the path of least resistance. For example, exploratory analysis in Jupyter Notebooks can be converted to a Python script or used directly in the pipeline. Similarly, exploratory data analysis in R, or any other language, can easily be converted to a script that can be run by the pipeline. This lightweight wrapping of quickly prototyped analysis forms a lab book, enabling rapid reproduction of analyses and reuse of code for different data sets.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>All data underlying the results are available as part of the article and no additional source data are required.</p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>
                <bold>Source code available from:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/cgat-developers/cgat-core">https://github.com/cgat-developers/cgat-core</ext-link>.</p>
            <p>
                <bold>Archived source code at time of publication:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.2598115">https://doi.org/10.5281/zenodo.2598115</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>.</p>
            <p>
                <bold>Licence:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/cgat-developers/cgat-core/blob/master/LICENSE">MIT License</ext-link>.</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Afgan</surname>
                            <given-names>E</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Baker</surname>
                            <given-names>D</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>van den Beek</surname>
                            <given-names>M</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update.</article-title>
                    <source>
				
                        <italic toggle="yes">Nucleic Acids Res.</italic>
			</source>
                    <year>2016</year>;<volume>44</volume>(<issue>W1</issue>):<fpage>W3</fpage>&#x2013;<lpage>W10</lpage>.
                    <pub-id pub-id-type="pmid">27137889</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkw343</pub-id>
                    <pub-id pub-id-type="pmcid">4987906</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Wolstencroft</surname>
                            <given-names>K</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Haines</surname>
                            <given-names>R</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Fellows</surname>
                            <given-names>D</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.</article-title>
                    <source>
				
                        <italic toggle="yes">Nucleic Acids Res.</italic>
			</source>
                    <year>2013</year>;<volume>41</volume>(<issue>Web Server issue</issue>):<fpage>W557</fpage>&#x2013;<lpage>61</lpage>.
                    <pub-id pub-id-type="pmid">23640334</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkt328</pub-id>
                    <pub-id pub-id-type="pmcid">3692062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Okonechnikov</surname>
                            <given-names>K</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Golosova</surname>
                            <given-names>O</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Fursov</surname>
                            <given-names>M</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Unipro UGENE: a unified bioinformatics toolkit.</article-title>
                    <source>
				
                        <italic toggle="yes">Bioinformatics.</italic>
			</source>
                    <year>2012</year>;<volume>28</volume>(<issue>8</issue>):<fpage>1166</fpage>&#x2013;<lpage>7</lpage>.
                    <pub-id pub-id-type="pmid">22368248</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts091</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Golosova</surname>
                            <given-names>O</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Henderson</surname>
                            <given-names>R</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Vaskin</surname>
                            <given-names>Y</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses.</article-title>
                    <source>
				
                        <italic toggle="yes">PeerJ.</italic>
			</source>
                    <year>2014</year>;<volume>2</volume>:<fpage>e644</fpage>.
                    <pub-id pub-id-type="pmid">25392756</pub-id>
                    <pub-id pub-id-type="doi">10.7717/peerj.644</pub-id>
                    <pub-id pub-id-type="pmcid">4226638</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Nocq</surname>
                            <given-names>J</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Celton</surname>
                            <given-names>M</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Gendron</surname>
                            <given-names>P</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Harnessing virtual machines to simplify next-generation DNA sequencing analysis.</article-title>
                    <source>
				
                        <italic toggle="yes">Bioinformatics.</italic>
			</source>
                    <year>2013</year>;<volume>29</volume>(<issue>17</issue>):<fpage>2075</fpage>&#x2013;<lpage>83</lpage>.
                    <pub-id pub-id-type="pmid">23786767</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btt352</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Gafni</surname>
                            <given-names>E</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Luquette</surname>
                            <given-names>LJ</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Lancaster</surname>
                            <given-names>AK</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>COSMOS: Python library for massively parallel workflows.</article-title>
                    <source>
				
                        <italic toggle="yes">Bioinformatics.</italic>
			</source>
                    <year>2014</year>;<volume>30</volume>(<issue>20</issue>):<fpage>2956</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="pmid">24982428</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btu385</pub-id>
                    <pub-id pub-id-type="pmcid">4184253</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Vivian</surname>
                            <given-names>J</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Rao</surname>
                            <given-names>AA</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Nothaft</surname>
                            <given-names>FA</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Toil enables reproducible, open source, big biomedical data analyses.</article-title>
                    <source>
				
                        <italic toggle="yes">Nat Biotechnol.</italic>
			</source>
                    <year>2017</year>;<volume>35</volume>(<issue>4</issue>):<fpage>314</fpage>&#x2013;<lpage>316</lpage>.
                    <pub-id pub-id-type="pmid">28398314</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3772</pub-id>
                    <pub-id pub-id-type="pmcid">5546205</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Fisch</surname>
                            <given-names>KM</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Mei&#x00df;ner</surname>
                            <given-names>T</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Gioia</surname>
                            <given-names>L</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Omics Pipe: a community-based framework for reproducible multi-omics data analysis.</article-title>
                    <source>
				
                        <italic toggle="yes">Bioinformatics.</italic>
			</source>
                    <year>2015</year>;<volume>31</volume>(<issue>11</issue>):<fpage>1724</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="pmid">25637560</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv061</pub-id>
                    <pub-id pub-id-type="pmcid">4443682</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>K&#x00f6;ster</surname>
                            <given-names>J</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Rahmann</surname>
                            <given-names>S</given-names>
                        </name>
			</person-group>:
                    <article-title>Snakemake--a scalable bioinformatics workflow engine.</article-title>
                    <source>
				
                        <italic toggle="yes">Bioinformatics.</italic>
			</source>
                    <year>2012</year>;<volume>28</volume>(<issue>19</issue>):<fpage>2520</fpage>&#x2013;<lpage>2</lpage>.
                    <pub-id pub-id-type="pmid">22908215</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts480</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Di Tommaso</surname>
                            <given-names>P</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Chatzou</surname>
                            <given-names>M</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Floden</surname>
                            <given-names>EW</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Nextflow enables reproducible computational workflows.</article-title>
                    <source>
				
                        <italic toggle="yes">Nat Biotechnol.</italic>
			</source>
                    <year>2017</year>;<volume>35</volume>(<issue>4</issue>):<fpage>316</fpage>&#x2013;<lpage>319</lpage>.
                    <pub-id pub-id-type="pmid">28398311</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3820</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Leipzig</surname>
                            <given-names>J</given-names>
                        </name>
			</person-group>:
                    <article-title>A review of bioinformatic pipeline frameworks.</article-title>
                    <source>
				
                        <italic toggle="yes">Brief Bioinform.</italic>
			</source>
                    <year>2017</year>;<volume>18</volume>(<issue>3</issue>):<fpage>530</fpage>&#x2013;<lpage>536</lpage>.
                    <pub-id pub-id-type="pmid">27013646</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bib/bbw020</pub-id>
                    <pub-id pub-id-type="pmcid">5429012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Goodstadt</surname>
                            <given-names>L</given-names>
                        </name>
			</person-group>:
                    <article-title>Ruffus: a lightweight Python library for computational pipelines.</article-title>
                    <source>
				
                        <italic toggle="yes">Bioinformatics.</italic>
			</source>
                    <year>2010</year>;<volume>26</volume>(<issue>21</issue>):<fpage>2778</fpage>&#x2013;<lpage>9</lpage>.
                    <pub-id pub-id-type="pmid">20847218</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq524</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Heger</surname>
                            <given-names>A</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Cribbs</surname>
                            <given-names>A</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Luna-Valero</surname>
                            <given-names>S</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>cgat-developers/cgat-core: First public release of code (Version v0.5.10).</article-title>
                    <source>
				
                        <italic toggle="yes">Zenodo.</italic>
			</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2598115">http://www.doi.org/10.5281/zenodo.2598115</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title>
                    <source>
				
                        <italic toggle="yes">Nat Biotechnol.</italic>
			</source>
                    <year>2016</year>;<volume>34</volume>(<issue>5</issue>):<fpage>525</fpage>&#x2013;<lpage>7</lpage>.
                    <pub-id pub-id-type="pmid">27043002</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Love</surname>
                            <given-names>MI</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Anders</surname>
                            <given-names>S</given-names>
                        </name>
			</person-group>:
                    <article-title>Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.</article-title>
                    <source>
				
                        <italic toggle="yes">Genome Biol.</italic>
			</source>
                    <year>2014</year>;<volume>15</volume>(<issue>12</issue>):<fpage>550</fpage>.
                    <pub-id pub-id-type="pmid">25516281</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id>
                    <pub-id pub-id-type="pmcid">4302049</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report47003">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.20448.r47003</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Peltzer</surname>
                        <given-names>Alexander</given-names>
                    </name>
                    <xref ref-type="aff" rid="r47003a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6503-2180</uri>
                </contrib>
                <aff id="r47003a1">
                    <label>1</label>Quantitative Biology Center (QBIC), University of T&#x00fc;bingen, T&#x00fc;bingen, Germany</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>4</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Peltzer A</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport47003" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.18674.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Cribbs 
                <italic>et al </italic>describe CGAT-Core, a python framework for building scalable, reproducible computational biology workflows in their proposed software tool article.</p>
            <p> The rationale behind the requirements for developing the software tool is explained properly, although there are many competitor tools and alternatives already "on the market" that can do similar or more things in general. The authors briefly summarize the large variance in portability, scalability, parameter handling and extensibility of the various tools in a sufficient form. One statement I cannot confirm in the last part of the introduction "... in addition to python being an already widely understood language in computational biology, is that individual steps can use arbitrary python code, both in how they are linked together and in the actual processing tasks". This is by all means not a drawback of a DSL, as some of these (e.g. nextflow, but to my knowledge also other competitors) allow for the direct integration of Python code in their respective tasks as well: 
                <ext-link ext-link-type="uri" xlink:href="https://www.nextflow.io/docs/latest/process.html#native-execution">https://www.nextflow.io/docs/latest/process.html#native-execution</ext-link>.</p>
            <p> The statement in the methods section: "Thus, parameters can be set specifically for each datasets, without the need to modify the code" is a statement true for all workflow languages I know about (at least Snakemake, Nextflow and Toil separate configuration from actual pipeline code and can use e.g. parameter input files similar to the one that CGAT-Core uses, though with slightly different notation of course). As such, this statement should be probably changed or removed as it incorrectly makes readers assume that this is a unique feature of CGAT-Core. One could for example state that this is a best-practice pattern across various workflow languages and CGAT-Core also enables users to separate pipeline code from e.g. infrastructure or parameter descriptions.</p>
            <p> One larger lack I see is that CGAT-Core unlike e.g. Snakemake, CWL, Nextflow and others does not support container technologies such as Docker or Singularity (to only name the two biggest solutions per market share out there). There have been papers critically investigating the effects of non-containerized conda environments "which are ideal for packaging" but not for long and mid-term reproducibility of analysis questions due to changing environments on a client side (see Gr&#x00fc;ning et al
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-47003-1">1</xref>
                </sup> for details).&#x00a0; These issues are only properly addressed by using containerization approaches, which CGAT-Core at the moment seems not to address. I think the authors should mention that the containerization of pipeline dependencies is a crucial part of reproducible data analysis today, and maybe whether they intend to add this in an upcoming release of CGAT-Core.</p>
            <p> To not only mention critical parts: I do think the authors did a really good job in documentation, proper GitHub organization set up with README and all required information as well as setting up a community, which I'd like to congratulate them for.</p>
            <p> I was also able to run the mentioned example pipeline with Kallisto on my local infrastructure, although I had to adapt certain parts in order to be able to do so.</p>
            <p> I'm missing two points in the discussion: benefits for users to use CGAT-Core in general over other competitor tools, and also a more critical discussion on where the framework could be extended to support other environments (e.g. no mention of any cloud service/provider?) in general.</p>
            <p> Overall the paper reads well and I think it only requires minor changes, especially highlighting differences to other available workflow tools and critically assessing pros/cons with respect to these.</p>
            <p> Typos: 
                <list list-type="bullet">
                    <list-item>
                        <p>Discussion: "... by adding desirable features from a variety other" (missing "of")</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Evolutionary Biology, Bioinformatics, Workflow/Pipeline Development, Systems Integration, Human Genetics, Population Genetics.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-47003-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author">
                            <name name-style="western">
                                <surname>Gr&#x00fc;ning</surname>
                            </name>
                            <etal/>
                        </person-group>:
                        <article-title>Practical Computational Reproducibility in the Life Sciences.</article-title>
                        <source>
                            <italic>Cell Syst.</italic>
                        </source>
                        <year>2018</year>;<volume>6</volume>(<issue>6</issue>):<fpage>631</fpage>&#x2013;<lpage>635</lpage>.
                        <pub-id pub-id-type="pmid">29953862</pub-id>
                        <pub-id pub-id-type="doi">10.1016/j.cels.2018.03.014</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4724-47003">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Cribbs</surname>
                            <given-names>Adam</given-names>
                        </name>
                        <aff>The University of Oxford, UK</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>5</day>
                    <month>7</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We would like to thank you for taking the time to review the manuscript. We are grateful for your perceptive suggestions and we have updated the manuscript, code and documentation in response as outlined below:</p>
                <p> 
                    <italic>1. Cribbs&#x00a0;et al&#x00a0;describe CGAT-Core, a python framework for building scalable, reproducible computational biology workflows in their proposed software tool article.</italic>
                </p>
                <p>
                    <italic> The rationale behind the requirements for developing the software tool is explained properly, although there are many competitor tools and alternatives already "on the market" that can do similar or more things in general. The authors briefly summarize the large variance in portability, scalability, parameter handling and extensibility of the various tools in a sufficient form.&#x00a0;</italic>
                </p>
                <p>
                    <italic> One statement I cannot confirm in the last part of the introduction "... in addition to python being an already widely understood language in computational biology, is that individual steps can use arbitrary python code, both in how they are linked together and in the actual processing tasks". This is by all means not a drawback of a DSL, as some of these (e.g. nextflow, but to my knowledge also other competitors) allow for the direct integration of Python code in their respective tasks as well:&#x00a0;
                        <ext-link ext-link-type="uri" xlink:href="https://www.nextflow.io/docs/latest/process.html#native-execution">https://www.nextflow.io/docs/latest/process.html#native-execution</ext-link>.</italic>
                </p>
                <p> We did not mean to suggest that incorporating Python code is a feature unique to CGAT-core, but rather that the ability to use Python to link tasks is a strength. Users do not have to learn a separate language to build a workflow, so there is a simple progression from writing a first linear Python script, to putting actions into functions, to combining these functions into a workflow through Ruffus decorators. We have reworded this statement to make our intention clearer: &#x201c;A key advantage of this is that Python code can be used to link individual steps, as well as in processing tasks.&#x201d;</p>
                <p> 
                    <italic>2. The statement in the methods section: "Thus, parameters can be set specifically for each datasets, without the need to modify the code" is a statement true for all workflow languages I know about (at least Snakemake, Nextflow and Toil separate configuration from actual pipeline code and can use e.g. parameter input files similar to the one that CGAT-Core uses, though with slightly different notation of course). As such, this statement should be probably changed or removed as it incorrectly makes readers assume that this is a unique feature of CGAT-Core. One could for example state that this is a best-practice pattern across various workflow languages and CGAT-Core&#x00a0;&#x00a0;also enables users to separate pipeline code from e.g. infrastructure or parameter descriptions.</italic>
                </p>
                <p> This statement was meant to compare CGAT-core with existing Ruffus functionality, not with other workflow tools. We have changed the text to make this clear: &#x201c;Thus, parameters can be set specifically for each dataset, without the need to modify the code, a feature seen in many other workflow management systems.&#x201d;</p>
                <p> 
                    <italic>3. One larger lack I see is that CGAT-Core unlike e.g. Snakemake, CWL, Nextflow and others does not support container technologies such as Docker or Singularity (to only name the two biggest solutions per market share out there). There have been papers critically investigating the effects of non-containerized conda environments "which are ideal for packaging" but not for long and mid-term reproducibility of analysis questions due to changing environments on a client side (see Gr&#x00fc;ning et al
                        <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/8-377/v1#rep-ref-47003-1">1</ext-link>&#x00a0;for details).&#x00a0; These issues are only properly addressed by using containerization approaches, which CGAT-Core at the moment seems not to address. I think the authors should mention that the containerization of pipeline dependencies is a crucial part of reproducible data analysis today, and maybe whether they intend to add this in an upcoming release of CGAT-Core.</italic>
                </p>
                <p> This is indeed a very important issue. Containerisation is in our development plan, and we have recently added prototype support using Kubernetes and/or Singularity. We mention this ongoing development in the discussion section: &#x201c;CGAT-core is under active development by the CGAT-Developers GitHub community. Support for cloud storage interaction, containerisation and LSF cluster interaction are currently being developed.&#x201d;</p>
                <p> 
                    <italic>4. To not only mention critical parts: I do think the authors did a really good job in documentation, proper GitHub organization set up with README and all required information as well as setting up a community, which I'd like to congratulate them for.</italic>
                </p>
                <p>
                    <italic> I was also able to run the mentioned example pipeline with Kallisto on my local infrastructure, although I had to adapt certain parts in order to be able to do so.</italic>
                </p>
                <p> Many thanks. We believe it is important to follow best practices in software development and maintenance.</p>
                <p> 
                    <italic>5. I'm missing two points in the discussion: benefits for users to use CGAT-Core in general over other competitor tools, and also a more critical discussion on where the framework could be extended to support other environments (e.g. no mention of any cloud service/provider?) in general.&#x00a0;</italic>
                </p>
                <p>
                    <italic> Overall the paper reads well and I think it only requires minor changes, especially highlighting differences to other available workflow tools and critically assessing pros/cons with respect to these.</italic>
                </p>
                <p> In this manuscript we cite a review (Leipzig, 2017) that compares many leading workflow tools, including Ruffus, rather than performing a detailed comparison of CGAT-core with other tools ourselves. The focus of this paper is to describe the improvements that CGAT-core adds to Ruffus, building on the strengths highlighted by Leipzig and adding useful features available in other workflow tools.</p>
                <p> However, we do agree that the framework could be extended to support other environments. Indeed, we have added support for cloud data storage using the Boto3 SDK for AWS S3 storage interaction and moto for Google Cloud Storage. We are planning to add support for LSF to enable CGAT-core to be used in the Genomics England computing environment. We now mention these new developments in our discussion section. &#x201c;Support for cloud storage interaction, containerisation and LSF cluster interaction are currently being developed.&#x201d;</p>
                <p> 
                    <italic>6. Typos:</italic>
                </p>
                <p>
                    <italic> Discussion: "... by adding desirable features from a variety other" (missing "of")</italic>
                </p>
                <p> Many thanks, this typo has now been corrected.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report46758">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.20448.r46758</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Ryan</surname>
                        <given-names>Devon P.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r46758a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8549-0971</uri>
                </contrib>
                <aff id="r46758a1">
                    <label>1</label>Max Planck Institute of Immunobiology and Epigenetics (MPI-IE), Freiburg, Germany</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>16</day>
                <month>4</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Ryan DP</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport46758" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.18674.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Writing and using analysis pipelines has become a bioinformatician's (or more generally a data analyst's) bread and butter. There are a number of frameworks to perform such analyses, of which CGAT-Ruffus is preferred by many. CGAT-core brings some welcome functionality to Ruffus. In my mind, the lack of parameterisation and conda environment switching were the two biggest stumbling blocks to using Ruffus previously, and CGAT-core seems to nicely address both of these.</p>
            <p> After going through the documentation (and a fair bit of trial and error due to not being particularly used to using Ruffus syntax) I was able to create and run a very simple workflow from scratch that included paired-end read trimming (cutadapt) and alignment (STAR) and used the parameterisation added by CGAT-core.</p>
            <p> While not critical to the manuscript, it'd be nice if the authors could address the following in the documentation (apologies to the authors if I missed some of these in the documentation): 
                <list list-type="bullet">
                    <list-item>
                        <p>The conda install instructions should be `conda install -c conda-forge -c bioconda cgatcore`</p>
                    </list-item>
                    <list-item>
                        <p>Can examples of processing paired-end data be included in the showcase or elsewhere in the documentation? A particular step that I found difficult to implement the first time was performing read trimming such that the resulting files had the same name but were placed in a different directory. It turns out that this can be done with `formatter()`, but an example like the following in the documentation would probably be useful to new users like me:</p>
                    </list-item>
                </list> &#x00a0;&#x00a0;&#x00a0; # pairs is a list of (read1, read2) tuples</p>
            <p> &#x00a0;&#x00a0;&#x00a0; @transform(pairs, formatter(".+/(?P&lt;SAMPLE&gt;.*)_R[12].fastq.gz$"), ("trimmed/{SAMPLE[0]}_R1.fastq.gz", "trimmed/{SAMPLE[0]}_R2.fastq.gz")) 
                <list list-type="bullet">
                    <list-item>
                        <p>Is there any way to use standard commands to run jobs on a cluster rather than drmaa? Novice users are likely to have an easier time modifying `qsub` and `srun` commands than finding the drmaa shared libraries.</p>
                    </list-item>
                    <list-item>
                        <p>&#x00a0;At least in my testing it wasn't possible to use something like the following in a command:</p>
                    </list-item>
                </list> &#x00a0;&#x00a0;&#x00a0; cmd = """module load cutadapt</p>
            <p> &#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0; cutadapt ..."""</p>
            <p> One can combine the two commands with `&amp;&amp;` (or use a conda env), but I didn't notice this in the documentation. That's only a mild annoyance, but it should be mentioned to new users (especially those familiar with Snakemake, where commands are written to a shell script that's then submitted to the cluster).</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, epigenetics, immunobiology and neuroscience</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <sub-article article-type="response" id="comment4723-46758">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Cribbs</surname>
                            <given-names>Adam</given-names>
                        </name>
                        <aff>The University of Oxford, UK</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>5</day>
                    <month>7</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We would like to thank you for taking the time to review the manuscript, code and documentation so thoroughly. We are grateful for your helpful suggestions and we have updated our documentation in response as outlined below:</p>
                <p> 
                    <italic>1. Writing and using analysis pipelines has become a bioinformatician's (or more generally a data analyst's) bread and butter. There are a number of frameworks to perform such analyses, of which CGAT-Ruffus is preferred by many. CGAT-core brings some welcome functionality to Ruffus. In my mind, the lack of parameterisation and conda environment switching were the two biggest stumbling blocks to using Ruffus previously, and CGAT-core seems to nicely address both of these.</italic>
                </p>
                <p>
                    <italic> After going through the documentation (and a fair bit of trial and error due to not being particularly used to using Ruffus syntax) I was able to create and run a very simple workflow from scratch that included paired-end read trimming (cutadapt) and alignment (STAR) and used the parameterisation added by CGAT-core.</italic>
                </p>
                <p>
                    <italic> While not critical to the manuscript, it'd be nice if the authors could address the following in the documentation (apologies to the authors if I missed some of these in the documentation):</italic>
                </p>
                <p>
                    <italic> The conda install instructions should be `conda install -c conda-forge -c bioconda cgatcore`</italic>
                </p>
                <p> 
                    <bold>Thank you, we have modified this in the documentation (https://cgat-core.readthedocs.io/en/latest/getting_started/Installation.html).</bold>
                </p>
                <p> 
                    <italic>2. Can examples of processing paired-end data be included in the showcase or elsewhere in the documentation? A particular step that I found difficult to implement the first time was performing read trimming such that the resulting files had the same name but were placed in a different directory. It turns out that this can be done with `formatter()`, but an example like the following in the documentation would probably be useful to new users like me:</italic>
                </p>
                <p>
                    <italic> &#x00a0;&#x00a0;&#x00a0; # pairs is a list of (read1, read2) tuples</italic>
                </p>
                <p>
                    <italic> &#x00a0;&#x00a0;&#x00a0; @transform(pairs, formatter(".+/(?P&lt;SAMPLE&gt;.*)_R[12].fastq.gz$"), ("trimmed/{SAMPLE[0]}_R1.fastq.gz", "trimmed/{SAMPLE[0]}_R2.fastq.gz"))</italic>
                </p>
                <p> Thank you, we have included this in the documentation on how to write pipelines (https://cgat-core.readthedocs.io/en/latest/defining_workflow/Writing_workflow.html#useful-information-regarding-decorators).</p>
                <p> 
                    <italic>3. Is there any way to use standard commands to run jobs on a cluster rather than drmaa? Novice users are likely to have an easier time modifying `qsub` and `srun` commands than finding the drmaa shared libraries.</italic>
                </p>
                <p> It is possible to run standard job submission commands such as qsub as a command-line statement and run the pipelines in --no-cluster mode. We have added this to the documentation (https://cgat-core.readthedocs.io/en/latest/getting_started/Examples.html#using-qsub-commands). Furthermore, we have added instructions to the installation page of the documentation on how to find the drmaa libraries and set an environment variable that stores their location (https://cgat-core.readthedocs.io/en/latest/getting_started/Installation.html).</p>
                <p> 
                    <italic>4. At least in my testing it wasn't possible to use something like the following in a command:</italic>
                </p>
                <p>
                    <italic> &#x00a0;&#x00a0;&#x00a0; cmd = """module load cutadapt</italic>
                </p>
                <p>
                    <italic> &#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0; cutadapt ..."""</italic>
                </p>
                <p>
                    <italic> One can combine the two commands with `&amp;&amp;` (or use a conda env), but I didn't notice this in the documentation. That's only a mild annoyance, but it should be mentioned to new users (especially those familiar with Snakemake, where commands are written to a shell script that's then submitted to the cluster).</italic>
                </p>
                <p> Apologies, the documentation did not make this clear. The commands are written to shell scripts, but line breaks are not preserved. Hence:</p>
                <p> cmd = """module load cutadapt &amp;&amp; cutadapt &#x2026; """</p>
                <p> will work, as will:</p>
                <p> cmd = """module load cutadapt;</p>
                <p> cutadapt ...</p>
                <p> """</p>
                <p> We have updated the documentation accordingly (https://cgat-core.readthedocs.io/en/latest/defining_workflow/Writing_workflow.html?highlight=module%20cutadapt#combining-commands-together).</p>
            </body>
        </sub-article>
    </sub-article>
</article>
