<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.20233.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>A Sequence Distance Graph framework for genome assembly and analysis</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Yanes</surname>
                        <given-names>Luis</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-1382-0166</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Garcia Accinelli</surname>
                        <given-names>Gonzalo</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Wright</surname>
                        <given-names>Jonathan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6471-8749</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ward</surname>
                        <given-names>Ben J.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Clavijo</surname>
                        <given-names>Bernardo J.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Earlham Institute, Norwich, Norfolk, NR4 7UZ, UK</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:bernardo.clavijo@earlham.ac.uk">bernardo.clavijo@earlham.ac.uk</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>23</day>
                <month>8</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>8</volume>
            <elocation-id>1490</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>12</day>
                    <month>8</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Yanes L et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/8-1490/pdf"/>
            <abstract>
                <p>The Sequence Distance Graph (SDG) framework works with genome assembly graphs and raw data from paired, linked and long reads. It includes a simple de Bruijn graph module, and can import graphs using the graphical fragment assembly (GFA) format. It also maps raw reads onto graphs, and provides a Python application programming interface (API) to navigate the graph, access the mapped and raw data and perform interactive or scripted analyses. Its complete workspace can be dumped to and loaded from disk, decoupling mapping from analysis and supporting multi-stage pipelines. We present the design and implementation of the framework, and example analyses scaffolding a short read graph with long reads, and navigating paths in a heterozygous graph for a simulated parent-offspring trio dataset.</p>
                <p>SDG is freely available under the MIT license at 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/bioinfologics/sdg">https://github.com/bioinfologics/sdg</ext-link>.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Genome graph</kwd>
                <kwd>genome assembly</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100000268">
                    <funding-source>Biotechnology and Biological Sciences Research Council</funding-source>
                    <award-id>BBS/E/T/000PR9818</award-id>
                    <award-id>BB/N009819/1</award-id>
                </award-group>
                <funding-statement>This work was strategically funded by the BBSRC Core Strategic Programme Grant [BBS/E/T/000PR9818]. Work by GGA and BJC was also partially funded by the BBSRC grant "OctoSeq: Sequencing the octoploid strawberry" [BB/N009819/1].</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Sequence graphs are the core representation of genome assemblers
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. Their use has increased recently thanks to the graphical fragment assembly (GFA) format for graph exchange
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>, tools to work with genome variation graphs
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>, and sequence-to-graph mappers
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>. However, a lack of interoperability between graph-based tools, and the limited tooling for downstream graph-based analysis, contribute to a perceived complexity that keeps linear sequences as the typical unit of exchange. Multi-step pipelines that iteratively combine different types of sequencing flatten their graph representations at each step, producing ever-longer linear genome sequences while losing information. As a result, genome assembly projects are prone to error propagation and are difficult to reproduce and control. These problems can be addressed by developing graph-based frameworks that integrate the analysis of hybrid datasets.</p>
            <p>The Sequence Distance Graph (SDG) framework implements a 
                <bold>SequenceDistanceGraph</bold> representation that defines sequences in nodes and their adjacency in links, and an associated 
                <bold>Workspace</bold> containing raw data and mappings. This provides an integrated working environment to use multiple sources of information to navigate and analyse genome graphs. 
                <bold>Datastores</bold> allow random access to short, linked, and long read sequences on disk. A mapper on each datastore contains methods to map the reads to the graph and access the mapping data. 
                <bold>KmerCounters</bold> provide functions to compute 
                <italic toggle="yes">k-mer</italic> coverage over the graph from sequencing data, enabling coverage analyses. Additional 
                <bold>DistanceGraphs</bold>, typically representing longer-range information and different linkage levels, define alternative topologies over the 
                <bold>SequenceDistanceGraph</bold> nodes. Finally, a 
                <bold>NodeView</bold> abstraction provides a proxy to a node, with methods to navigate the graph and access its mapped data. This comprehensive framework can be used to explore genome graphs interactively or to create processing methods for assembly or downstream analysis.</p>
            <p>Here we describe the SDG implementation and basic tools, and provide example use cases that highlight its analytic flexibility. First, we show how to create a hybrid assembly by integrating long-read linkage into a short-read graph. Then we analyse a simulated parent-child trio and show how the coverage of the parent datasets can be used to navigate the graph topology. These are only two of the many ways in which integrating data and genome graphs enables simple but powerful analyses.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Implementation</title>
                <p>The C++ core library implements SDG&#x2019;s data structures and methods for 
                    <bold>WorkSpaces</bold>, graphs, datastores and mappers. Its main goal is to provide a straightforward interface to project information from raw datasets onto graphs, and enable easy access and analysis of the graph-data combination. It uses OpenMP for parallel processing, and SWIG 4.0 to export a Python API to enable interactive data analysis.</p>
                <p>The 
                    <bold>SequenceDistanceGraph</bold> class contains a vector of nodes defining DNA sequences, and a vector of links. Every node has a positive and a negative end, and links are defined between these node ends. Links with positive distances represent gaps between linked sequences and negative distances represent overlaps. This representation, shown in 
                    <xref ref-type="fig" rid="f1">Figure 1</xref>, is similar to those presented in 
                    <xref ref-type="bibr" rid="ref-2">2</xref>,
                    <xref ref-type="bibr" rid="ref-11">11</xref>, but unifies the concepts of overlap and gap. Paths can be defined as lists of nodes, each signed by the end through which the walk enters it. Graphs can be read from and written to GFA and GFA2 files.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>A simple Sequence Distance Graph with 5 nodes, including links with d&lt;0, representing overlaps, and a link representing a gap of 10bp.</title>
                        <p>Sequences appear in only one direction and their reverse complement can be obtained by traversing the node in the opposite direction, from - to +. The two longest possible paths are [1, 2, 4, 5] and [1, -3, 4, 5], and their reverse complements are [-5, -4, -2, -1] and [-5, -4, 3, -1], respectively.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22229/53efbf1c-a43e-4b47-bf99-4fba89c329a9_figure1.gif"/>
                </fig>
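                <p>The signed-end convention and the unified treatment of overlaps and gaps can be illustrated with a minimal pure-Python sketch. This is not SDG code: the <monospace>ToyGraph</monospace> class and its <monospace>path_sequence</monospace> method are hypothetical names introduced here for illustration, gaps are assumed to be padded with N characters, and only the three node sequences and the two links are taken from the example Python session in the Operation section.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"><![CDATA[
```python
# Toy model of a signed-end sequence graph (illustrative only, not the SDG API).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


class ToyGraph:
    def __init__(self):
        self.nodes = {}  # node id -> sequence, ids start at 1
        self.links = {}  # (end, end) -> distance, stored in both directions

    def add_node(self, seq):
        node_id = len(self.nodes) + 1
        self.nodes[node_id] = seq
        return node_id

    def add_link(self, source, dest, dist):
        # +n is the end where node n's sequence starts, -n the end where it finishes.
        self.links[(source, dest)] = dist
        self.links[(dest, source)] = dist

    def oriented_sequence(self, node):
        seq = self.nodes[abs(node)]
        return seq if node > 0 else reverse_complement(seq)

    def path_sequence(self, path):
        """Join the nodes of a signed path, applying each link distance."""
        seq = self.oriented_sequence(path[0])
        for prev, nxt in zip(path, path[1:]):
            dist = self.links[(-prev, nxt)]  # leave prev by its exit end, enter nxt
            nxt_seq = self.oriented_sequence(nxt)
            if dist < 0:
                seq += nxt_seq[-dist:]  # overlap: skip the shared bases
            else:
                seq += "N" * dist + nxt_seq  # gap: pad with unknown bases
        return seq


g = ToyGraph()
for s in ("CTACGGA", "GACCTTA", "AATACGGTCC"):
    g.add_node(s)
g.add_link(-1, 2, -2)   # 2 bp overlap between node 1 and node 2
g.add_link(-1, -3, -3)  # 3 bp overlap between node 1 and reversed node 3

print(g.path_sequence([1, 2]))   # CTACGGACCTTA
print(g.path_sequence([1, -3]))  # CTACGGACCGTATT
```
]]></preformat>
                </p>
                <p>Because links are stored between node ends rather than between oriented nodes, the same toy graph also yields the reverse complement of a path when the path is reversed and its signs are flipped, for example [-2, -1] for [1, 2].</p>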
                <p>The 
                    <bold>DistanceGraph</bold> class contains a set of links over the nodes of a 
                    <bold>SequenceDistanceGraph</bold> object. It is used to represent alternative sources of linkage information, such as the longer-range linkage produced by mapping reads for scaffolding.</p>
                <p>The 
                    <bold>WorkSpace</bold> contains a single 
                    <bold>SequenceDistanceGraph</bold>, multiple 
                    <bold>DistanceGraphs</bold>, datastores and mappers, and its structure in memory represents the status of the SDG framework. It can be dumped to and loaded from disk, providing persistence and checkpoints between the steps of SDG-based pipelines. Raw reads and 
                    <italic toggle="yes">k-mer</italic> counts are kept in separate files, referenced from the 
                    <bold>WorkSpace</bold>, to avoid duplication when using multiple 
                    <bold>WorkSpaces</bold> around the same dataset.</p>
                <p>The 
                    <bold>DataStores</bold> and 
                    <bold>Mappers</bold> provide access to, and management of, raw data and its mapping onto the graph. 
                    <bold>Datastores</bold> do not load read data into memory, but rather provide random access to the on-disk data. The 
                    <bold>PairedReadMapper</bold> and 
                    <bold>LinkedReadMapper</bold> classes use a unique 
                    <italic toggle="yes">k-mer</italic> index to map each read to a single node; reads that map to multiple nodes are left unmapped
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>,
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup>. The 
                    <bold>LongReadMapper</bold> class generates multiple mappings from each read to nodes, using a short non-unique 
                    <italic toggle="yes">k-mer</italic> index (k=15 by default)
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>,
                        <xref ref-type="bibr" rid="ref-14">14</xref>
                    </sup>. Filtering of long read mappings is left to later stages of the processing.</p>
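                <p>The idea of mapping reads through a unique k-mer index can be sketched in a few lines of pure Python. The sketch below is an illustration under stated assumptions, not the mappers' implementation: the function names, the use of canonical k-mers, the toy sequences and k=5 are all hypothetical choices made here. A read is assigned to a node only when all of its indexed k-mers point to that same node.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq, k):
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_unique_index(nodes, k):
    """Map each canonical k-mer that occurs exactly once in the graph to its node."""
    seen = defaultdict(list)
    for node_id, seq in nodes.items():
        for kmer in kmers(seq, k):
            seen[kmer].append(node_id)
    return {kmer: ids[0] for kmer, ids in seen.items() if len(ids) == 1}


def map_read(read, index, k):
    """Return the single node hit by the read's indexed k-mers, or None."""
    hits = {index[kmer] for kmer in kmers(read, k) if kmer in index}
    return hits.pop() if len(hits) == 1 else None


# Toy graph: node 3 shares the k-mers of "TACCTGA" with node 1.
nodes = {1: "ACGTACCTGA", 2: "GGATCTTCAG", 3: "TTACCTGAT"}
index = build_unique_index(nodes, k=5)

print(map_read("CGTACC", index, 5))      # 1: both k-mers are unique to node 1
print(map_read("AAGATCC", index, 5))     # 2: reverse-strand read from node 2
print(map_read("TACCTGA", index, 5))     # None: only shared k-mers, no hits
print(map_read("ACGTACTGAT", index, 5))  # None: hits nodes 1 and 3
```
]]></preformat>
                </p>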
                <p>The 
                    <bold>KmerCounter</bold> class creates an index of all the 
                    <italic toggle="yes">k-mers</italic> in the graph for a given k (up to k=31) and counts their occurrences in the graph. Occurrences of these indexed 
                    <italic toggle="yes">k-mers</italic> can then be counted in datastores or FASTQ files. These counts, persisted in the 
                    <bold>KmerCounter</bold> under a name, can then be accessed to perform 
                    <italic toggle="yes">k-mer</italic> coverage analyses. Projections of raw 
                    <italic toggle="yes">k-mer</italic> coverage in the reads and the assembly over a particular sequence for a node or path, similar to those produced by the "sect" tool of the K-mer Analysis Toolkit (KAT)
                    <sup>
                        <xref ref-type="bibr" rid="ref-15">15</xref>
                    </sup>, are valuable for content analysis. Spectra analysis of these frequencies can provide further insight into genome composition and its representation in the assembly.</p>
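                <p>A coverage projection of this kind can be sketched as follows. This is a minimal pure-Python illustration, assuming canonical k-mers and small in-memory counts; the function names and toy data are hypothetical and do not correspond to the <bold>KmerCounter</bold> interface. The graph k-mers are indexed first, read k-mers are counted only if they are in that index, and the counts are then projected along a node sequence.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"><![CDATA[
```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq, k):
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def index_graph(nodes, k):
    """Count every canonical k-mer of the graph; the keys form the index."""
    return Counter(kmer for seq in nodes.values() for kmer in kmers(seq, k))


def count_reads(graph_index, reads, k):
    """Count read k-mers, keeping only those present in the graph index."""
    return Counter(kmer for read in reads
                   for kmer in kmers(read, k) if kmer in graph_index)


def project(seq, counts, k):
    """Return the count of each consecutive k-mer along a sequence."""
    return [counts[kmer] for kmer in kmers(seq, k)]


nodes = {1: "ACGTACCTGA"}
reads = ["ACGTAC", "CGTACC", "ACGTACC", "TTTTTTT"]

graph_index = index_graph(nodes, k=5)
read_counts = count_reads(graph_index, reads, k=5)

print(project(nodes[1], graph_index, 5))  # [1, 1, 1, 1, 1, 1]
print(project(nodes[1], read_counts, 5))  # [2, 3, 2, 0, 0, 0]

# Spectrum: how many graph k-mers are seen 0, 1, 2, ... times in the reads.
spectrum = Counter(read_counts[kmer] for kmer in graph_index)
print(sorted(spectrum.items()))           # [(0, 3), (2, 2), (3, 1)]
```
]]></preformat>
                </p>
                <p>In this toy case the projection shows that only the first part of the node is supported by reads, which is the kind of signal a coverage analysis over a node or path is meant to expose.</p>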
                <p>Two processing classes, 
                    <bold>LinkageUntangler</bold> and 
                    <bold>LinkageMaker</bold>, work with alternative linkage configurations. The 
                    <bold>LinkageMaker</bold> condenses evidence from the 
                    <bold>WorkSpace</bold> into links in a 
                    <bold>DistanceGraph</bold> through one of its 
                    <monospace>make_linkage*</monospace> methods. The 
                    <bold>LinkageUntangler</bold> class works on a 
                    <bold>DistanceGraph</bold> to simplify, condense and/or linearise its linkage. The hybrid assembly use case below shows how a combination of 
                    <bold>LinkageMaker</bold> and 
                    <bold>LinkageUntangler</bold> can be used for scaffolding with long reads.</p>
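                <p>To make the notion of condensing evidence into links concrete, the following pure-Python sketch aggregates long read mappings into distance links. It is only an illustration under stated assumptions and does not reproduce any of the <monospace>make_linkage*</monospace> methods: the mapping tuples (signed node, read start, read end), the <monospace>make_links</monospace> function, the minimum support threshold and the use of the median distance are all hypothetical.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict
from statistics import median


def make_links(read_mappings, min_support=2):
    """Turn per-read node mappings into links between node ends.

    Each read is a list of (signed_node, read_start, read_end) tuples.
    Consecutive mappings on a read vote for a link between the exit end of
    the first node and the entry end of the second, with the distance
    observed on the read (negative when the mappings overlap).
    """
    votes = defaultdict(list)
    for mappings in read_mappings:
        ordered = sorted(mappings, key=lambda m: m[1])
        for (n1, _, end1), (n2, start2, _) in zip(ordered, ordered[1:]):
            # Sorting the two ends makes both read orientations vote together.
            key = tuple(sorted((-n1, n2)))
            votes[key].append(start2 - end1)
    return {key: int(median(dists))
            for key, dists in votes.items() if len(dists) >= min_support}


reads = [
    [(1, 0, 100), (2, 110, 200)],    # node 1 then node 2, 10 bp apart
    [(1, 5, 95), (2, 107, 180)],     # same pair, 12 bp apart
    [(-2, 0, 90), (-1, 101, 190)],   # reverse-strand read over the same pair
    [(2, 0, 50), (3, 40, 120)],      # single read: not enough support
]

print(make_links(reads))  # {(-1, 2): 11}
```
]]></preformat>
                </p>
                <p>The resulting dictionary plays the role of the links of a toy distance graph; simplifying or linearising such links would be a separate step and is not attempted here.</p>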
                <p>Finally, the 
                    <bold>NodeView</bold> class, and its associated 
                    <bold>LinkViews</bold>, provide a single entry point for node-centric analyses. A 
                    <bold>NodeView</bold> from either a 
                    <bold>DistanceGraph</bold> or 
                    <bold>SequenceDistanceGraph</bold> is a wrapper containing a pointer to the graph and a node id, and provides access to the node&#x2019;s previous and next linked nodes, mapped reads, and 
                    <italic toggle="yes">k-mer</italic> coverage. A user with a good understanding of the 
                    <bold>NodeView</bold> class should be able to access most information in the 
                    <bold>WorkSpace</bold> through it, making it the default choice for analysing the graph.</p>
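                <p>The proxy pattern described here, a lightweight object holding a graph reference and a signed node id, can be sketched in pure Python as below. The <monospace>ToyGraph</monospace> and <monospace>ToyNodeView</monospace> classes are hypothetical stand-ins written for this illustration and should not be read as the SDG classes; the node sequences and links are taken from the example Python session in the Operation section.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict


class ToyGraph:
    def __init__(self):
        self.sequences = {}              # node id -> sequence
        self.links = defaultdict(list)   # node end -> [(linked end, distance)]

    def add_node(self, seq):
        node_id = len(self.sequences) + 1
        self.sequences[node_id] = seq
        return node_id

    def add_link(self, source, dest, dist):
        self.links[source].append((dest, dist))
        self.links[dest].append((source, dist))

    def get_nodeview(self, node_id):
        return ToyNodeView(self, node_id)


class ToyNodeView:
    """Proxy to one oriented node: stores only a graph reference and an id."""

    def __init__(self, graph, node_id):
        self.graph = graph
        self.node_id = node_id

    def next(self):
        """(distance, view) pairs reached by leaving through the exit end."""
        return [(dist, ToyNodeView(self.graph, end))
                for end, dist in self.graph.links.get(-self.node_id, [])]

    def prev(self):
        """(distance, view) pairs that lead into the entry end."""
        return [(dist, ToyNodeView(self.graph, -end))
                for end, dist in self.graph.links.get(self.node_id, [])]


g = ToyGraph()
for s in ("CTACGGA", "GACCTTA", "AATACGGTCC"):
    g.add_node(s)
g.add_link(-1, 2, -2)
g.add_link(-1, -3, -3)

nv = g.get_nodeview(1)
print([(d, v.node_id) for d, v in nv.next()])                   # [(-2, 2), (-3, -3)]
print([(d, v.node_id) for d, v in g.get_nodeview(2).prev()])    # [(-2, 1)]
print([(d, v.node_id) for d, v in g.get_nodeview(3).next()])    # [(-3, -1)]
```
]]></preformat>
                </p>
                <p>Because the view holds no data of its own, creating one is cheap, and further accessors (for mapped reads or coverage, for instance) could be added to the proxy without changing the graph structure.</p>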
            </sec>
            <sec>
                <title>Operation</title>
                <p>
                    <bold>
                        <italic toggle="yes">Requirements and installation.</italic>
                    </bold> SDG runs on Linux and MacOS, and requires enough RAM to hold the WorkSpace completely in memory; the amount needed depends on the dataset. Disk space to hold the uncompressed sequences in the datastores is also required.</p>
                <p>SDG can be installed via pre-compiled binaries from 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/bioinfologics/sdg/releases">https://github.com/bioinfologics/sdg/releases</ext-link>. The Linux binaries were built using Python3 and GCC version 6 from the Ubuntu package manager. The dependencies of the MacOS binaries (Python3, GCC-6 and SWIG) were obtained using Homebrew. SDG can also be compiled from source using CMake, Python3, SWIG version 4 and GCC version 6 onwards. Detailed instructions can be found at 
                    <ext-link ext-link-type="uri" xlink:href="https://bioinfologics.github.io/sdg/sdg/README.html#installation">https://bioinfologics.github.io/sdg/sdg/README.html#installation</ext-link>.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Typical workflow.</italic>
                    </bold> Working with SDG typically involves two different stages: creating a 
                    <bold>WorkSpace</bold> with the data and mappings, and analysing this 
                    <bold>WorkSpace</bold>. SDG includes command line tools to create 
                    <bold>DataStores</bold>, 
                    <bold>KmerCounters</bold>, and 
                    <bold>WorkSpaces</bold>, and to map reads within a 
                    <bold>WorkSpace</bold>.</p>
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>sdg-datastore</bold>: creates a 
                            <bold>Datastore</bold> from raw reads and can process paired, 10x or long reads. An output prefix is specified as a parameter and a &lt;prefix&gt;.prseq, &lt;prefix&gt;.lrseq or &lt;prefix&gt;.loseq file is generated.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>sdg-kmercounter</bold>: creates a 
                            <bold>KmerCounter</bold> indexing a graph from a 
                            <bold>WorkSpace</bold> or GFA, or works with an already generated one. A count can be added directly from raw reads or from a datastore. The 
                            <bold>KmerCounter</bold> is persisted to a file with the extension &#x2018;sdgkc&#x2019;.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>sdg-workspace</bold>: creates a 
                            <bold>WorkSpace</bold> from a base graph or works with an already generated one. 
                            <bold>Datastores</bold> and 
                            <bold>KmerCounters</bold> can be added. The 
                            <bold>WorkSpace</bold> is persisted to a file with the extension &#x2018;sdgws&#x2019;.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>sdg-dbg</bold>: creates a 
                            <bold>WorkSpace</bold> from a 
                            <bold>PairedReadDatastore</bold> by building a 
                            <italic toggle="yes">de Bruijn graph</italic> and using this as the base graph. Counts for the 
                            <italic toggle="yes">k-mers</italic> from the graph and raw reads are added too.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>sdg-mapper</bold>: maps reads within a 
                            <bold>WorkSpace</bold>. An updated 
                            <bold>WorkSpace</bold> is produced and dumped to the specified prefix.</p>
                    </list-item>
                </list>
                <p>
                    <bold>WorkSpaces</bold> can also be instantiated with an empty graph, and the graph populated through the 
                    <monospace>add_node</monospace> and 
                    <monospace>add_link</monospace> methods. The following example from a Python session shows how the simple graph from 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> can be created from scratch, navigated through a 
                    <bold>NodeView</bold> instance, and how the sequences of its paths can be extracted.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#008000"> import</styled-content>
                        <styled-content style="font-size:9px;color:#0000FF"> pysdg</styled-content>
                        <styled-content style="font-size:9px;color:#008000"> as</styled-content>
                        <styled-content style="font-size:9px;color:#0000FF"> SDG</styled-content>

                        <styled-content style="font-size:9px;color:#000000">version </styled-content>
                        <styled-content style="font-size:9px;color:#666666">0.1</styled-content>

                        <styled-content style="font-size:9px;color:#000000">master b4d3f02</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000">SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">WorkSpace()</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_node(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"CTACGGA"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">1</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_node(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"GACCTTA"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">2</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_node(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"AATACGGTCC"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">3</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_node(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"TTACGAA"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">4</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_node(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"CTGATATGA"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">5</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_link</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">-1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> 2</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> -2</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_link(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">-1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_link(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">-2</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> 4</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> -3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_link(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> 4</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> -2</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_link(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">-4</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> 5</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> 10</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000">ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">get_nodeview(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&lt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000">NodeView: Node </styled-content>
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#AB21FF"> in</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">&gt;</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">next()</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&lt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000">Vector: </styled-content>
                        <styled-content style="font-size:9px;color:#666666">2</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> LinkViews</styled-content>
                        <styled-content style="font-size:9px;color:#666666">&gt;</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#008000"> print</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">next())</styled-content>

                        <styled-content style="font-size:9px;color:#000000">[</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&lt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000">LinkView: </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">bp to Node </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-3&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&lt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000">LinkView: </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-2</styled-content>
                        <styled-content style="font-size:9px;color:#000000">bp to Node </styled-content>
                        <styled-content style="font-size:9px;color:#666666">2&gt;</styled-content>

                        <styled-content style="font-size:9px;color:#000000">]</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv </styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">next()[</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">]</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">node()</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&lt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000">NodeView: Node </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-3</styled-content>
                        <styled-content style="font-size:9px;color:#AB21FF"> in</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">&gt;</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#008000"> print</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">prev())</styled-content>

                        <styled-content style="font-size:9px;color:#000000">[</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&lt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000">LinkView: </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">bp to Node </styled-content>
                        <styled-content style="font-size:9px;color:#666666">1&gt;</styled-content>

                        <styled-content style="font-size:9px;color:#000000">]</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sequence()</styled-content>

                        <styled-content style="font-size:9px;color:#BA2121">'GGACCGTATT'</styled-content>

                        <styled-content style="font-size:9px;color:#666666">&gt;&gt;&gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">SequenceDistanceGraphPath(ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg, [</styled-content>
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, </styled-content>
                        <styled-content style="font-size:9px;color:#666666">-3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, </styled-content>
                        <styled-content style="font-size:9px;color:#666666">4</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, </styled-content>
                        <styled-content style="font-size:9px;color:#666666">5</styled-content>
                        <styled-content style="font-size:9px;color:#000000">])</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sequence()</styled-content>

                        <styled-content style="font-size:9px;color:#BA2121">'CTACGGACCGTATTACGAANNNNNNNNNNCTGATATGA'</styled-content>
                    </preformat>
                </p>
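                <p>The path sequence returned above can be reproduced by hand, which may help clarify how link distances are interpreted. The following is a minimal pure-Python sketch, not SDG's implementation: the helper names <monospace>reverse_complement</monospace> and <monospace>join_path</monospace> are hypothetical, and the forward sequences used for nodes 1 and 3 are assumptions inferred from the outputs shown in the session rather than quoted from it. A negative distance is treated as an overlap and a positive distance as a gap filled with N.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def join_path(oriented_seqs, distances):
    """Concatenate oriented node sequences using link distances.

    A negative distance is an overlap (that many leading bases of the
    next node are dropped); a positive distance is a gap filled with N.
    """
    result = oriented_seqs[0]
    for seq, dist in zip(oriented_seqs[1:], distances):
        gap = max(dist, 0)
        overlap = max(-dist, 0)
        result += "N" * gap + seq[overlap:]
    return result


# Assumption: node 1 is inferred from the path output, not quoted.
node_1 = "CTACGGA"
# Assumption: node 3 forward is the reverse complement of the
# sequence printed for node -3 ('GGACCGTATT').
node_minus_3 = reverse_complement("AATACGGTCC")
node_4 = "TTACGAA"
node_5 = "CTGATATGA"

# Distances follow the links traversed by the path [1, -3, 4, 5].
path_sequence = join_path(
    [node_1, node_minus_3, node_4, node_5],
    [-3, -2, 10],
)
print(path_sequence)
```
                    </preformat>
                </p>
                <p>Under these assumptions, the sketch yields the same string as the <monospace>SequenceDistanceGraphPath</monospace> call: the 3bp and 2bp overlaps are merged and the 10bp gap appears as ten N characters.</p>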
                <p>Typically, as shown in 
                    <xref ref-type="fig" rid="f2">Figure 2</xref>, the API is used to explore a larger 
                    <bold>WorkSpace</bold>, with methods that access both in-memory and on-disk data and modify the state of the 
                    <bold>WorkSpace</bold>.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Structure of a WorkSpace and access via a Python interactive session.</title>
                        <p>The WorkSpace holds the information for a project and contains the graphs, the mappers and 
                            <italic toggle="yes">k-mer</italic> counts. From Python, a previously saved WorkSpace is loaded from disk (1). The NodeView object is centred on a specific node and can be used to access node characteristics (i.e. size and sequence) and graph topology from the perspective of that node (i.e. neighbours in both directions (2)), and can also retrieve information projected onto the selected node (i.e. mappings (3) and 
                            <italic toggle="yes">k-mer</italic> coverage (4)). Operations such as adding a KmerCounter to the WorkSpace and adding a count (5) can be performed, and the WorkSpace can be saved back to disk (6). Once loaded, the bulk of the WorkSpace is held in memory for fast access, while the raw read data from the DataStores remains on disk and is retrieved through random access.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22229/53efbf1c-a43e-4b47-bf99-4fba89c329a9_figure2.gif"/>
                </fig>
            </sec>
        </sec>
        <sec>
            <title>Example use cases</title>
            <p>To illustrate the use of SDG, we have reproduced a short version of two examples from 
                <ext-link ext-link-type="uri" xlink:href="https://bioinfologics.github.io/sdg_examples/">https://bioinfologics.github.io/sdg_examples</ext-link>.</p>
            <p>All paired-end datasets are available on 
                <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3363871#.XUwyVy2ZN24">https://zenodo.org/record/3363871#.XUwyVy2ZN24</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>, and the PacBio reads are from NCBI accession PRJNA194437
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup>. For simplicity, we have also made the datasets available on 
                <ext-link ext-link-type="uri" xlink:href="https://opendata.earlham.ac.uk/opendata/data/sdg_datasets/">https://opendata.earlham.ac.uk/opendata/data/sdg_datasets/</ext-link> as ready-to-use &#x2019;fastq.gz&#x2019; files.</p>
            <sec>
                <title>Hybrid assembly of short and long reads</title>
                <p>This example is based on an 
                    <italic toggle="yes">E. coli</italic> dataset combining PacBio reads from 
                    <xref ref-type="bibr" rid="ref-17">17</xref> and Illumina MiSeq 2x300bp reads subsampled from a test run. It uses the long reads to scaffold a short-read-based graph produced by 
                    <italic toggle="yes">sdg-dbg</italic>. Graphs are dumped to GFA files at different stages, and visualised using 
                    <ext-link ext-link-type="uri" xlink:href="http://rrwick.github.io/Bandage/">Bandage</ext-link> v0.8.1
                    <sup>
                        <xref ref-type="bibr" rid="ref-18">18</xref>
                    </sup>.
                </p>
                <p>First, we use the command line tools to create datastores for both long and short reads and an initial 
                    <bold>WorkSpace</bold> containing a DBG assembly:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#000000">sdg-datastore make -t paired -o ecoli_pe ../ecoli_pe_r1.fastq.gz -2 ../ecoli_pe_r2.fastq.gz</styled-content> 

                        <styled-content style="font-size:9px;color:#000000">sdg-datastore make -t long -o ecoli_pb -L ../ecoli_pb_all.fastq.gz</styled-content>

                        <styled-content style="font-size:9px;color:#000000">sdg-dbg -p ecoli_pe.prseq -o ecoli_assm</styled-content>
                    </preformat>
                </p>
                <p>From this point on, we use the Python SDG library. First, we load the workspace, add a long read datastore and map its reads using a k=11 index.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#008000">import</styled-content> 
                        <styled-content style="font-size:9px;color:#0000FF">pysdg</styled-content> 
                        <styled-content style="font-size:9px;color:#008000">as</styled-content> 
                        <styled-content style="font-size:9px;color:#0000FF">SDG</styled-content>

                        <styled-content style="font-size:9px;color:#408080"># Load sdg-dbg's workspace from disk, add the pacbio datastore</styled-content>

                        <styled-content style="font-size:9px;color:#000000">ws</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">WorkSpace(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">'ecoli_assm.sdgws'</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#000000">lords</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">add_long_reads_datastore(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">'ecoli_pb.loseq'</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#408080"># Map long reads</styled-content>

                        <styled-content style="font-size:9px;color:#000000">lords</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">mapper</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">k</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">= 11</styled-content>

                        <styled-content style="font-size:9px;color:#000000">lords</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">mapper</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">map_reads()</styled-content>


                        <styled-content style="font-size:9px;color:#000000">ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">write_to_gfa1(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">'initial_graph.gfa'</styled-content>)</preformat>
                </p>
                <p>The graph, as shown in 
                    <xref ref-type="fig" rid="f3">Figure 3A</xref>, contains multiple unresolved repeats.</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <p>Linkage at different stages of the long read scaffolding example, visualised using Bandage: 
                            <bold>A</bold>) SequenceDistanceGraph generated by sdg-dbg from short reads, 
                            <bold>B</bold>) DistanceGraph generated after using make_nextselected_linkage on the long read data, linking all nodes of 1100bp and more, 
                            <bold>C</bold>) DistanceGraph after eliminating all nodes with multiple connections (repeats).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22229/53efbf1c-a43e-4b47-bf99-4fba89c329a9_figure3.gif"/>
                </fig>
                <p>We can use the LinkageMaker to create linkage using the long reads datastore. We do this by selecting the nodes between which to analyse possible linkage, in this case all nodes of 1100bp or more, and then calling the 
                    <monospace>make_longreads_multilinkage</monospace> method, with alignment filtering parameters of 1000bp and 10% identity.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#000000">lm</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">LinkageMaker(ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg)</styled-content>

                        <styled-content style="font-size:9px;color:#000000">lm</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">select_by_size(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">1100</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#000000">mldg</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">lm</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">make_longreads_multilinkage(ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">long_reads_datastores[</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">]</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">mapper, </styled-content>
                        <styled-content style="font-size:9px;color:#666666">1000</styled-content>, 
                        <styled-content style="font-size:9px;color:#666666">10</styled-content>)</preformat>
                </p>
                <p>This multi-linkage can be collapsed using the LinkageUntangler. The 
                    <monospace>make_nextselected_linkage</monospace> method links every selected node to its closest selected neighbours in each direction, aggregating the distances via a simple median calculation:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#000000">lu</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">LinkageUntangler(mldg)</styled-content>

                        <styled-content style="font-size:9px;color:#000000">lu</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">select_by_size(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">1100</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#000000">ns_dg</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">lu</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">make_nextselected_linkage()</styled-content>

                        <styled-content style="font-size:9px;color:#000000">ns_dg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">write_to_gfa1(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">'ns_collapsed.gfa'</styled-content>)</preformat>
                </p>
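                <p>The median aggregation step can be illustrated with a minimal pure-Python sketch. This is not SDG's implementation: the function name <monospace>collapse_by_median</monospace>, the tuple-based link representation and the distance values are all hypothetical, chosen only to show how several long-read links between the same pair of node ends could be reduced to a single link. It does not model the search for the closest selected neighbour.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
from collections import defaultdict
from statistics import median


def collapse_by_median(links):
    """Collapse parallel links into one median distance per node-end pair.

    links is an iterable of (source, dest, distance) tuples, using the
    signed node-end convention of the session shown earlier.
    """
    grouped = defaultdict(list)
    for source, dest, dist in links:
        grouped[(source, dest)].append(dist)
    return {pair: median(dists) for pair, dists in grouped.items()}


# Illustrative values only: three reads support the link from the end
# of node 1 to node 2, and one read supports the link from 2 to 5.
multi_links = [(-1, 2, 950), (-1, 2, 1010), (-1, 2, 1200), (-2, 5, 300)]
collapsed = collapse_by_median(multi_links)
print(collapsed)
```
                    </preformat>
                </p>
                <p>Using the median rather than the mean keeps a single outlying distance estimate from one read from shifting the collapsed link.</p>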
                <p>The new graph we dumped, as shown in 
                    <xref ref-type="fig" rid="f3">Figure 3B</xref>, has disconnected the repeats and introduced long-read linkage that skips over them, but it is still not fully resolved. We can improve this further by removing the repetitive nodes, which are connected to multiple neighbours because each of them belongs in more than one place. We do that by turning off these nodes&#x2019; selection in the 
                    <bold>LinkageUntangler</bold>, which will then skip them in the solution.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#008000">for</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">nv</styled-content> 
                        <styled-content style="font-size:9px;color:#AB21FF">in</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">ns_dg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">get_all_nodeviews():</styled-content>
    
                        <styled-content style="font-size:9px;color:#008000">if len</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">prev())</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">&gt; 1</styled-content> 
                        <styled-content style="font-size:9px;color:#AB21FF">or</styled-content> 
                        <styled-content style="font-size:9px;color:#008000">len</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">next())</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">&gt; 1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">:</styled-content>
        
                        <styled-content style="font-size:9px;color:#000000">lu</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">selected_nodes[nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">node_id()]</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#008000">False</styled-content>

                        <styled-content style="font-size:9px;color:#000000">ns_nr_dg</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">lu</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">make_nextselected_linkage()</styled-content>


                        <styled-content style="font-size:9px;color:#000000">ns_nr_dg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">write_to_gfa1(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">'ns_nr_final.gfa'</styled-content>)</preformat>
                </p>
                <p>The last graph is now a circle, with all the repeats disconnected from any linkage.</p>
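                <p>The selection rule used above can be illustrated on a toy graph. The following is a minimal sketch that uses plain Python dictionaries in place of the linkage graph; it is not the pysdg API, and the names toy_next, build_prev and repeat_nodes are hypothetical.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
# Toy model only: plain dictionaries stand in for the linkage graph (not the pysdg API).
from collections import defaultdict


def build_prev(next_links):
    """Invert a forward adjacency map into a backward one."""
    prev_links = defaultdict(set)
    for node, successors in next_links.items():
        for succ in successors:
            prev_links[succ].add(node)
    return prev_links


def repeat_nodes(next_links):
    """Return the nodes that have more than one neighbour forward or backward."""
    prev_links = build_prev(next_links)
    nodes = set(next_links) | set(prev_links)
    return {n for n in nodes
            if len(next_links.get(n, ())) > 1 or len(prev_links.get(n, ())) > 1}


# Hypothetical linkage: unique nodes A, B, C, D and a repeat R that sits in two places.
toy_next = {"A": {"R"}, "R": {"B", "D"}, "B": {"C"}, "C": {"R"}, "D": {"A"}}

# Start with every node selected, then switch the repeats off,
# mirroring lu.selected_nodes[...] = False in the listing above.
selected = {n: True for n in toy_next}
for n in repeat_nodes(toy_next):
    selected[n] = False
```
                    </preformat>
                </p>
                <p>In this toy graph only R is deselected, so a solution built from the remaining nodes links A, B, C and D without passing through the repeat.</p>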
            </sec>
            <sec>
                <title>Analysing a simulation of heterozygous parent-child trio with short reads</title>
                <p>We created a simulation of a trio dataset for this example using the synthetic genome creation and sequencing package 
                    <ext-link ext-link-type="uri" xlink:href="https://bioinfologics.github.io/Pseudoseq.jl/dev/">Pseudoseq.jl</ext-link> v0.1.0
                    <sup>
                        <xref ref-type="bibr" rid="ref-19">19</xref>
                    </sup>. Chromosomes 4 and 5 of the reference genome of the yeast strain S288C were used as templates to create a diploid genome for each parent with 1% heterozygous sites. Each homologous pair of chromosomes was crossed over and recombined, and the child inherited one chromosome, chosen at random, from each parent. Simulated paired-end reads were generated for each genome using an average fragment length of 700 bp, a read length of 250 bp, an expected coverage of 70x and an error rate of 0.1%.</p>
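                <p>As a rough check on these simulation parameters, the sketch below works through the arithmetic that links coverage, read length and the number of read pairs. It assumes that coverage is counted against the total template length and uses a hypothetical round-number length of 2,000,000 bp rather than the real chromosome lengths; the function names are illustrative and are not part of Pseudoseq.jl.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
import math


def read_pairs_for_coverage(genome_length, coverage, read_length):
    """Read pairs needed so that sequenced bases = coverage * genome_length."""
    bases_needed = coverage * genome_length
    return math.ceil(bases_needed / (2 * read_length))


def expected_het_sites(genome_length, het_rate):
    """Expected number of heterozygous sites at a given per-base rate."""
    return round(genome_length * het_rate)


# Hypothetical template length (an assumption, not the real yeast chromosomes).
template_length = 2_000_000

pairs = read_pairs_for_coverage(template_length, coverage=70, read_length=250)
het_sites = expected_het_sites(template_length, het_rate=0.01)
errors_per_read = 250 * 0.001  # read length times per-base error rate

print(pairs, het_sites, errors_per_read)
```
                    </preformat>
                </p>
                <p>Under these assumptions each read carries about a quarter of an error on average, so most k-mers derived from the reads are error-free.</p>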
                <p>First, we used the command-line tools to create a graph from the child reads with sdg-dbg and to add 
                    <italic toggle="yes">k-mer</italic> counts for both parents to the datastore.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#000000">sdg-datastore make -t paired -1 child/child-pe-reads_R1.fastq.gz -2 child/child-pe-reads_R2.fastq.gz -o child_pe
sdg-dbg -p child_pe.prseq -o sdg_child
sdg-kmercounter add -c main.sdgkc -n p1 -f p1/p1-pe-reads_R1.fastq.gz -f p1/p1-pe-reads_R2.fastq.gz -o main
sdg-kmercounter add -c main.sdgkc -n p2 -f p2/p2-pe-reads_R1.fastq.gz -f p2/p2-pe-reads_R2.fastq.gz -o main</styled-content>
                    </preformat>
                </p>
                <p>We now open the 
                    <bold>WorkSpace</bold> and use the 
                    <bold>NodeView::parallels</bold> method to look for the largest bubble structure in the graph, which should be formed by two parallel nodes with haplotypes coming from each parent.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#008000">import</styled-content>
                        <styled-content style="font-size:9px;color:#0000FF"> pysdg</styled-content> 
                        <styled-content style="font-size:9px;color:#008000">as</styled-content>
                        <styled-content style="font-size:9px;color:#0000FF"> SDG</styled-content>

                        <styled-content style="font-size:9px;color:#000000">ws </styled-content>
                        <styled-content style="font-size:9px;color:#666666">= </styled-content>
                        <styled-content style="font-size:9px;color:#000000">SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">WorkSpace(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">'sdg_child.sdgws'</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#408080">#Largest node with one parallel node, and its parallel</styled-content>

                        <styled-content style="font-size:9px;color:#000000">maxbubble</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> = 0</styled-content>

                        <styled-content style="font-size:9px;color:#008000">for</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv </styled-content>
                        <styled-content style="font-size:9px;color:#AB21FF">in</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">get_all_nodeviews():</styled-content>
    
                        <styled-content style="font-size:9px;color:#008000">if</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">size()</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> &gt;</styled-content>
                        <styled-content style="font-size:9px;color:#000000"> maxbubble </styled-content>
                        <styled-content style="font-size:9px;color:#AB21FF">and</styled-content> 
                        <styled-content style="font-size:9px;color:#008000">len</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">parallels())</styled-content>
                        <styled-content style="font-size:9px;color:#666666"> == 1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">:</styled-content>
        
                        <styled-content style="font-size:9px;color:#000000">maxbubble</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000">nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">size()</styled-content>
        
                        <styled-content style="font-size:9px;color:#000000">bubble_nvs</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv,nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">parallels()[</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">])</styled-content>
                    </preformat>
                </p>
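                <p>The bubble search can be mimicked on a toy graph to make the selection criterion explicit. This sketch is not the pysdg API: it assumes that a parallel node is one that shares exactly the same predecessors and successors, and the names toy_nodes, parallels and largest_bubble are hypothetical.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
# Toy model only (not the pysdg API). Each node maps to (size, predecessors, successors).
toy_nodes = {
    1: (500, frozenset(), frozenset({2, 3})),
    2: (120, frozenset({1}), frozenset({4})),
    3: (118, frozenset({1}), frozenset({4})),
    4: (900, frozenset({2, 3}), frozenset({5, 6})),
    5: (40, frozenset({4}), frozenset({7})),
    6: (42, frozenset({4}), frozenset({7})),
    7: (300, frozenset({5, 6}), frozenset()),
}


def parallels(node_id):
    """Other nodes with the same predecessors and successors (assumed definition)."""
    _, prevs, nexts = toy_nodes[node_id]
    if not prevs or not nexts:
        return []
    return [other for other, (_, p, n) in toy_nodes.items()
            if other != node_id and p == prevs and n == nexts]


def largest_bubble():
    """Return (node, parallel) for the largest node with exactly one parallel, or None."""
    best = None
    for node_id, (size, _, _) in toy_nodes.items():
        sides = parallels(node_id)
        if len(sides) == 1 and (best is None or size > toy_nodes[best[0]][0]):
            best = (node_id, sides[0])
    return best


print(largest_bubble())
```
                    </preformat>
                </p>
                <p>Returning None when no bubble exists is a choice made for this sketch; it makes the empty case explicit instead of leaving the result variable undefined.</p>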
                <p>Since each side should be a haplotype from a different parent, we should see a loss of 
                    <italic toggle="yes">k-mer</italic> coverage on the parent that didn&#x2019;t contribute that haplotype. To check this, we create a plotting function to plot the output from the 
                    <bold>NodeView::kmer_coverage</bold> method.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#008000">def</styled-content> 
                        <styled-content style="font-size:9px;color:#0000FF">plot_kcov</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv):</styled-content>
    
                        <styled-content style="font-size:9px;color:#BA2121">'''Plot kmer coverage across the three read sets. Requires pylab.'''</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">figure();suptitle(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"Coverage for "</styled-content>
                        <styled-content style="font-size:9px;color:#666666">+</styled-content>
                        <styled-content style="font-size:9px;color:#008000">str</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(nv));</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">subplot(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">);ylim((</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#666666">120</styled-content>
                        <styled-content style="font-size:9px;color:#000000">))</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">plot(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">kmer_coverage(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"main"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"PE"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, label</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"child"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">); legend(loc</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">);</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">subplot(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">2</styled-content>
                        <styled-content style="font-size:9px;color:#000000">);ylim((</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">120</styled-content>
                        <styled-content style="font-size:9px;color:#000000">))</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">plot(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">kmer_coverage(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"main"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"p1"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">),</styled-content> 
                        <styled-content style="font-size:9px;color:#BA2121">"red"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, label</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"parent 1"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">); legend(loc</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">);</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">subplot(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">3</styled-content>
                        <styled-content style="font-size:9px;color:#000000">);ylim((</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">120</styled-content>
                        <styled-content style="font-size:9px;color:#000000">))</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">plot(nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">kmer_coverage(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"main"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">,</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"p2"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">),</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"blue"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">, label</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"parent 2"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">); legend(loc</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">);</styled-content>


                        <styled-content style="font-size:9px;color:#000000">plot_kcov(bubble_nvs[</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">])</styled-content>

                        <styled-content style="font-size:9px;color:#000000">plot_kcov(bubble_nvs[</styled-content>
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">])</styled-content>
                    </preformat>
                </p>
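                <p>To make the expected signal concrete, the following toy example computes a per-position k-mer coverage vector for a short sequence against two small read sets. It is a simplified stand-in for what a k-mer coverage vector represents, not the SDG implementation: reverse complements are ignored, k is set to 3, and all sequences and function names are hypothetical.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
from collections import Counter


def kmers(seq, k):
    """All k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def kmer_coverage(node_seq, counts, k):
    """Per-position count of each node k-mer in a read set (0 if never seen)."""
    return [counts[km] for km in kmers(node_seq, k)]


k = 3
node_seq = "ACGTAC"               # hypothetical node sequence
p1_reads = ["ACGTAC", "CGTACG"]   # read set carrying the same haplotype
p2_reads = ["ACGAAC", "CGAACG"]   # read set differing at one base

cov_p1 = kmer_coverage(node_seq, count_kmers(p1_reads, k), k)
cov_p2 = kmer_coverage(node_seq, count_kmers(p2_reads, k), k)
print(cov_p1, cov_p2)
```
                    </preformat>
                </p>
                <p>Every k-mer that overlaps the variant base has zero coverage in the second read set, which is the kind of drop the plotting function is meant to reveal.</p>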
                <p>The plots, shown in 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>, show that node 4775 contains content inherited from parent 2, while its parallel node 11414 contains content inherited from parent 1. We can write a function that extends these parent-specific regions by walking forward and backward for as long as exactly one link leads to a node that is fully covered by the content of the parent we are following.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Trio analysis:
                            <italic toggle="yes">k-mer</italic> coverage for each side of the largest bubble structure in the child&#x2019;s assembly by each of the three read sets.</title>
                        <p>Coverage drops to 0 on the opposite parent for 
                            <italic toggle="yes">k-mers</italic> that are unique to a parent.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22229/53efbf1c-a43e-4b47-bf99-4fba89c329a9_figure4.gif"/>
                </fig>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:9px;color:#008000">def</styled-content> 
                        <styled-content style="font-size:9px;color:#0000FF">extend_parent_covered_path</styled-content>
                        <styled-content style="font-size:9px;color:#000000">(starting_node,target_parent):</styled-content>
    
                        <styled-content style="font-size:9px;color:#008000">if</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">get_nodeview(starting_node)</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">kmer_coverage(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"main"</styled-content>, 
                        <styled-content style="font-size:9px;color:#000000">target_parent)</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">count(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">!= 0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">:</styled-content>
        
                        <styled-content style="font-size:9px;color:#008000">return</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">SequenceDistanceGraphPath(ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg,[])</styled-content>
    
                        <styled-content style="font-size:9px;color:#000000">p</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">SDG</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">SequenceDistanceGraphPath(ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg,[starting_node])</styled-content>
    
                        <styled-content style="font-size:9px;color:#008000">for</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">x</styled-content> 
                        <styled-content style="font-size:9px;color:#AB21FF">in</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">[</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>,
                        <styled-content style="font-size:9px;color:#666666">1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">]:</styled-content>
        
                        <styled-content style="font-size:9px;color:#000000">nv</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">get_nodeview(p</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">nodes[</styled-content>
                        <styled-content style="font-size:9px;color:#666666">-1</styled-content>
                        <styled-content style="font-size:9px;color:#000000">])</styled-content>
        
                        <styled-content style="font-size:9px;color:#008000">while</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">next():</styled-content>
            
                        <styled-content style="font-size:9px;color:#000000">next_node</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">= 0</styled-content>
            
                        <styled-content style="font-size:9px;color:#008000">for</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">nl</styled-content> 
                        <styled-content style="font-size:9px;color:#AB21FF">in</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">next():</styled-content>
                
                        <styled-content style="font-size:9px;color:#008000">if</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">nl</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">node()</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">kmer_coverage(</styled-content>
                        <styled-content style="font-size:9px;color:#BA2121">"main"</styled-content>, 
                        <styled-content style="font-size:9px;color:#000000">target_parent)</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">count(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">== 0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">:</styled-content>
                    
                        <styled-content style="font-size:9px;color:#008000">if</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">next_node</styled-content> 
                        <styled-content style="font-size:9px;color:#AB21FF">or</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">nl</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">node()</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">node_id()</styled-content> 
                        <styled-content style="font-size:9px;color:#AB21FF">in</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">p</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">nodes:</styled-content>
                        
                        <styled-content style="font-size:9px;color:#000000">next_node</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">= 0</styled-content>
                        
                        <styled-content style="font-size:9px;color:#008000">break</styled-content>
                    
                        <styled-content style="font-size:9px;color:#008000">else</styled-content>
                        <styled-content style="font-size:9px;color:#000000">:</styled-content>
                        
                        <styled-content style="font-size:9px;color:#000000">next_node</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">=</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">nl</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">node()</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">node_id()</styled-content>
            
                        <styled-content style="font-size:9px;color:#008000">if</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">next_node</styled-content> 
                        <styled-content style="font-size:9px;color:#666666">== 0</styled-content>
                        <styled-content style="font-size:9px;color:#000000">:</styled-content> 
                        <styled-content style="font-size:9px;color:#008000">break</styled-content>
            
                        <styled-content style="font-size:9px;color:#000000">p</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">nodes</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">append(next_node)</styled-content>
            
                        <styled-content style="font-size:9px;color:#000000">nv</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000">ws</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">sdg</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">get_nodeview(next_node)</styled-content>
        
                        <styled-content style="font-size:9px;color:#000000">p</styled-content>
                        <styled-content style="font-size:9px;color:#666666">.</styled-content>
                        <styled-content style="font-size:9px;color:#000000">reverse()</styled-content>
    
                        <styled-content style="font-size:9px;color:#008000">return</styled-content> 
                        <styled-content style="font-size:9px;color:#000000">p</styled-content>
  

                        <styled-content style="font-size:9px;color:#000000">path1</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000">extend_parent_covered_path(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">11414</styled-content>, 
                        <styled-content style="font-size:9px;color:#BA2121">"p1"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>

                        <styled-content style="font-size:9px;color:#000000">path2</styled-content>
                        <styled-content style="font-size:9px;color:#666666">=</styled-content>
                        <styled-content style="font-size:9px;color:#000000">extend_parent_covered_path(</styled-content>
                        <styled-content style="font-size:9px;color:#666666">4775</styled-content>, 
                        <styled-content style="font-size:9px;color:#BA2121">"p2"</styled-content>
                        <styled-content style="font-size:9px;color:#000000">)</styled-content>
                    </preformat>
                </p>
                <p>After running this function, path1 contains 49 nodes yielding 8672 bp of sequence inherited from parent 1, and path2 contains 139 nodes yielding 26351 bp of sequence inherited from parent 2. The difference in node count and sequence length arises because the extension is haplotype-specific, so its results depend on the topology of each haplotype's graph.</p>
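                <p>The relation between the node counts and sequence lengths reported above can be illustrated with a minimal, self-contained sketch. It is not part of the SDG API: the function name, the toy node sizes and the toy link distances are all hypothetical, and it assumes that the length of a path is the sum of its node sizes plus the distances of the links joining consecutive nodes, with negative distances representing overlaps.</p>
                <p>
                    <preformat>
```python
def path_length_bp(node_sizes, link_distances):
    """Total bases spanned by a path (hypothetical helper, not an SDG call).

    node_sizes: sequence length of each node, in path order.
    link_distances: one distance per consecutive node pair; a negative
    value is assumed to mean the two nodes overlap by that many bases.
    """
    if len(link_distances) != max(len(node_sizes) - 1, 0):
        raise ValueError("need exactly one distance per consecutive node pair")
    return sum(node_sizes) + sum(link_distances)


# Hypothetical toy path: three nodes joined by 62 bp overlaps.
toy_sizes = [100, 80, 120]
toy_distances = [-62, -62]
toy_total = path_length_bp(toy_sizes, toy_distances)
print(len(toy_sizes), "nodes,", toy_total, "bp")
```
                    </preformat>
                </p>
                <p>Under these assumptions, a path through many short, heavily overlapping nodes can span fewer bases than its node count alone would suggest, which is why node count and sequence length are reported separately for each path.</p>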
            </sec>
        </sec>
        <sec>
            <title>Summary</title>
            <p>The Sequence Distance Graph framework provides a unified workspace for data from different sequencing technologies, using the genome graph as the basis of integration. It enables analyses across the graph topology, the raw data and the projections of those data onto the graph. We have shown how the NodeView class can be used through the Python API to produce interactive analyses that are both powerful and easy to follow. We expect this to be a useful codebase for users at all levels, not only for constructing graph-based analyses but also for teaching and disseminating them.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Source data</title>
                <p>The PacBio 
                    <italic toggle="yes">E. coli</italic> reads are deposited at NCBI under accession PRJNA194437, from Koren 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-17">17</xref>
                    </sup>
                </p>
                <p>

                    <italic toggle="yes">E. coli</italic> K12 Re-sequencing with PacBio RS and 454: Accession number PRJNA194437, 
                    <ext-link ext-link-type="uri" xlink:href="https://identifiers.org/ncbi/bioproject:PRJNA194437">https://identifiers.org/ncbi/bioproject:PRJNA194437</ext-link>
                </p>
            </sec>
            <sec>
                <title>Underlying data</title>
                <p>The datasets used in the examples are available from: 
                    <ext-link ext-link-type="uri" xlink:href="https://opendata.earlham.ac.uk/opendata/data/sdg_datasets/">https://opendata.earlham.ac.uk/opendata/data/sdg_datasets/</ext-link> and archived in Zenodo: SDG Paper Datasets. 
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.3363871">http://doi.org/10.5281/zenodo.3363871</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/publicdomain/zero/1.0/legalcode">Creative Commons Zero "No rights reserved" data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
            </sec>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>Software documentation: 
                <ext-link ext-link-type="uri" xlink:href="https://bioinfologics.github.io/sdg/">https://bioinfologics.github.io/sdg</ext-link>
            </p>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/bioinfologics/sdg">http://github.com/bioinfologics/sdg</ext-link>
            </p>
            <p>Archived source code at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3363165#.XUw1yy2ZN25">https://zenodo.org/record/3363165#.XUw1yy2ZN25</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>
            </p>
            <p>License: MIT License</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to thank Richard Harrison for helpful discussions about SDG&#x2019;s results and continued support through the OctoSeq project. We thank James Cuff for input about design principles and continuous encouragement. We thank Kat Hodgkinson for her feedback and patience as an early user of the rough alpha version of SDG. We thank Camilla Ryan for enduring and joining never-ending discussions about graph representations and the design of the framework.</p>
        </ack>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pevzner</surname>
                            <given-names>PA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tang</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Waterman</surname>
                            <given-names>MS</given-names>
                        </name>
</person-group>:
                    <article-title>An Eulerian path approach to DNA fragment assembly.</article-title>
                    <source>

                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
</source>
                    <year>2001</year>;<volume>98</volume>(<issue>17</issue>):<fpage>9748</fpage>&#x2013;<lpage>9753</lpage>.
                    <pub-id pub-id-type="pmid">11504945</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.171285098</pub-id>
                    <pub-id pub-id-type="pmcid">55524</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Medvedev</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brudno</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Maximum likelihood genome assembly.</article-title>
                    <source>

                        <italic toggle="yes">J Comput Biol.</italic>
</source>
                    <year>2009</year>;<volume>16</volume>(<issue>8</issue>):<fpage>1101</fpage>&#x2013;<lpage>1116</lpage>.
                    <pub-id pub-id-type="pmid">19645596</pub-id>
                    <pub-id pub-id-type="doi">10.1089/cmb.2009.0047</pub-id>
                    <pub-id pub-id-type="pmcid">3154397</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Butler</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>MacCallum</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kleber</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ALLPATHS: de novo assembly of whole-genome shotgun microreads.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2008</year>;<volume>18</volume>(<issue>5</issue>):<fpage>810</fpage>&#x2013;<lpage>820</lpage>.
                    <pub-id pub-id-type="pmid">18340039</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.7337908</pub-id>
                    <pub-id pub-id-type="pmcid">2336810</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jackman</surname>
                            <given-names>SD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Myers</surname>
                            <given-names>EW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gonella</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>The GFA Specification</article-title>.
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/GFA-spec/GFA-spec">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Garrison</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sir&#x00e9;n</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Novak</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Variation graph toolkit improves read mapping by representing genetic variation in the reference.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2018</year>;<volume>36</volume>(<issue>9</issue>):<fpage>875</fpage>&#x2013;<lpage>879</lpage>.
                    <pub-id pub-id-type="pmid">30125266</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.4227</pub-id>
                    <pub-id pub-id-type="pmcid">6126949</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rautiainen</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>M&#x00e4;kinen</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marschall</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>Bit-parallel sequence-to-graph alignment.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2018</year>;<fpage>323063</fpage>.
                    <pub-id pub-id-type="doi">10.1101/323063</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sir&#x00e9;n</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Garrison</surname>
                            <given-names>JE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Novak</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Haplotype-aware graph indexes.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2019</year>.
                    <pub-id pub-id-type="doi">10.1101/559583</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Novak</surname>
                            <given-names>AM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Garrison</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paten</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>A graph extension of the positional Burrows-Wheeler transform and its applications.</article-title>
                    <source>

                        <italic toggle="yes">Algorithms Mol Biol.</italic>
</source>
                    <year>2017</year>;<volume>12</volume>(<issue>1</issue>):<fpage>18</fpage>.
                    <pub-id pub-id-type="pmid">28702075</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13015-017-0109-9</pub-id>
                    <pub-id pub-id-type="pmcid">5505026</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jain</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dilthey</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Misra</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Accelerating Sequence Alignment to Graphs.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2019</year>.
                    <pub-id pub-id-type="doi">10.1101/651638</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Limasset</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Flot</surname>
                            <given-names>JF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Peterlongo</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2019</year>; pii: btz102.
                    <pub-id pub-id-type="pmid">30785192</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btz102</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Paten</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zerbino</surname>
                            <given-names>DR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hickey</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A unifying model of genome evolution under parsimony.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>1</issue>):<fpage>206</fpage>.
                    <pub-id pub-id-type="pmid">24946830</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-15-206</pub-id>
                    <pub-id pub-id-type="pmcid">4082375</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Batzoglou</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jaffe</surname>
                            <given-names>DB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stanley</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ARACHNE: a whole-genome shotgun assembler.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2002</year>;<volume>12</volume>(<issue>1</issue>):<fpage>177</fpage>&#x2013;<lpage>189</lpage>.
                    <pub-id pub-id-type="pmid">11779843</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.208902</pub-id>
                    <pub-id pub-id-type="pmcid">155255</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sovi&#x0107;</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>&#x0160;iki&#x0107;</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wilm</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Fast and sensitive mapping of nanopore sequencing reads with GraphMap.</article-title>
                    <source>

                        <italic toggle="yes">Nat Commun.</italic>
</source>
                    <year>2016</year>;<volume>7</volume>:<fpage>11307</fpage>.
                    <pub-id pub-id-type="pmid">27079541</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms11307</pub-id>
                    <pub-id pub-id-type="pmcid">4835549</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Paveti&#x0107;</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Katani&#x0107;</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Matula</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Fast and simple algorithms for computing both 
                        <italic toggle="yes">LCS
                            <sub>k</sub>
                        </italic> and 
                        <italic toggle="yes">LCS
                            <sub>k+</sub>
                        </italic>
                    </article-title>. arXiv: 1705.07279 [cs],<year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/pdf/1705.07279.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mapleson</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Garcia Accinelli</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kettleborough</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2017</year>;<volume>33</volume>(<issue>4</issue>):<fpage>574</fpage>&#x2013;<lpage>576</lpage>.
                    <pub-id pub-id-type="pmid">27797770</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw663</pub-id>
                    <pub-id pub-id-type="pmcid">5408915</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yanes</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Garcia Accinelli</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ward</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Sdg paper datasets</article-title>.<year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3363871">https://zenodo.org/record/3363871</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Koren</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Harhay</surname>
                            <given-names>GP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Smith</surname>
                            <given-names>TP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Reducing assembly complexity of microbial genomes with single-molecule sequencing.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2013</year>;<volume>14</volume>(<issue>9</issue>):<fpage>R101</fpage>.
                    <pub-id pub-id-type="pmid">24034426</pub-id>
                    <pub-id pub-id-type="doi">10.1186/gb-2013-14-9-r101</pub-id>
                    <pub-id pub-id-type="pmcid">4053942</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wick</surname>
                            <given-names>RR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schultz</surname>
                            <given-names>MB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zobel</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Bandage: interactive visualization of 
                        <italic toggle="yes">de novo</italic> genome assemblies.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2015</year>;<volume>31</volume>(<issue>20</issue>):<fpage>3350</fpage>&#x2013;<lpage>3352</lpage>.
                    <pub-id pub-id-type="pmid">26099265</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv383</pub-id>
                    <pub-id pub-id-type="pmcid">4595904</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ward</surname>
                            <given-names>BJ</given-names>
                        </name>
</person-group>:
                    <article-title>bioinfologics/pseudoseq.jl: First release</article-title>.<year>2019</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.2656743</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yanes</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Garcia Accinelli</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ward</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>bioinfologics/sdg: Release candidate</article-title>.<year>2019</year>;<fpage>7</fpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3363165">https://zenodo.org/record/3363165</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report55755">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22229.r55755</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Alekseyev</surname>
                        <given-names>Max</given-names>
                    </name>
                    <xref ref-type="aff" rid="r55755a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5140-8095</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Avdeyev</surname>
                        <given-names>Pavel</given-names>
                    </name>
                    <xref ref-type="aff" rid="r55755a1">1</xref>
                    <role>Co-referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7953-6259</uri>
                </contrib>
                <aff id="r55755a1">
                    <label>1</label>George Washington University, Washington, DC, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>1</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Alekseyev M and Avdeyev P</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport55755" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20233.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors describe a software package for the construction, storage, and manipulation of sequence graphs. The software is publicly available on GitHub. While the core functionality is developed in C++, the package also provides a Python wrapper library and a command-line tool. The authors outline the software design and provide pipelines and code examples for two different types of data.</p>
            <p>Overall, the paper is well-written and describes potentially useful software. At the same time, the paper lacks: 
                <list list-type="order">
                    <list-item>
                        <p>Comparison of (features of) the developed software with existing software such as the VG toolkit, which is based on variation graphs (reference 5
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-55755-1">1</xref>
                            </sup>);</p>
                    </list-item>
                    <list-item>
                        <p>Discussion of the software's applicability (e.g., will it work on large or repeat-rich genomes?), or an estimate of its running time and space complexity.</p>
                    </list-item>
                </list> Some minor comments: 
                <list list-type="order">
                    <list-item>
                        <p>The use of bold font is not explained. For example, Datastores first appears in the sentence &#x201c;
                            <bold>Datastores</bold> allow random access...&#x201d; describing their features, but what are Datastores?</p>
                    </list-item>
                    <list-item>
                        <p>The orthogonal edge routing in Fig. 1 is somewhat confusing; why not make the edges curved?</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>genome assembly, comparative genomics</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-55755-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>Variation graph toolkit improves read mapping by representing genetic variation in the reference</article-title>.
                        <source>
                            <italic>Nature Biotechnology</italic>
                        </source>.<year>2018</year>;<volume>36</volume>(<issue>9</issue>):
                        <fpage>875</fpage>-<lpage>879</lpage>
                        <pub-id pub-id-type="doi">10.1038/nbt.4227</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report55323">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22229.r55323</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Dawson</surname>
                        <given-names>Eric T.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r55323a1">1</xref>
                    <xref ref-type="aff" rid="r55323a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5448-1653</uri>
                </contrib>
                <aff id="r55323a1">
                    <label>1</label>Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA</aff>
                <aff id="r55323a2">
                    <label>2</label>Department of Genetics, University of Cambridge, Cambridge, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>4</day>
                <month>11</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Dawson ET</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport55323" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20233.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors describe a framework for constructing sequence graphs, aligning reads, manipulating graph structures, and extracting them into standard formats (e.g., GFA). This framework is available as both a set of command-line tools and a Python library that wraps much of the underlying functionality. Their implementation unifies the representation of gaps and overlaps as a single linkage type within the graph. This is the primary theoretical advance of the work. This work is scientifically sound, but its description as written could benefit from some minor additions.</p>
            <p>The software is freely available on GitHub, and binary releases of the command-line tools are provided. These are functional on a modern Linux laptop, and clear examples with data are provided. The paper includes the outputs of these examples as figures.</p>
            <p>The Python libraries rely on SWIG and are not included in the binaries. While not requisite for publication, providing the Python libraries through pip, conda, or another package manager would increase the reach of the framework. This would match the authors' conclusion that the sdg package provides "a useful codebase for all levels of users."</p>
            <p>The examples provided are clear and scientifically relevant. The graph mapping and manipulation example using 
                <italic>E. coli</italic> data (Figure 3) and the description of genotyping a simulated yeast trio (Figure 4) are both realistic.</p>
            <p>However, the authors should provide run times and machine details for these examples. Both are relatively fast as the datasets are small. There is no need for extensive benchmarking; a footnote for each example would address this adequately.</p>
            <p>A brief discussion, one or two sentences long, of a larger-scale example that the authors have attempted should also be included.</p>
            <p>In addition, the phrasing "simulated parent-offspring trio" in the abstract should be modified to make it clear that the data are from yeast. As written, the phrasing implies that the framework may work on human- or animal-scale data, though no evidence of this has been provided in this version of the paper.</p>
            <p>Lastly, a brief description of the similarities and differences between the sequence (distance) graph, the variation graph, and the de Bruijn graph from an assembler such as ABySS should be included in the introduction or provided by a reference. This description need not be longer than two to four sentences. It should highlight the similar representations of the graphs (e.g., sequences stored in nodes and linkages/paths described by edges) and the different amounts of information content within the graph types. This would strengthen the case for the critical need for the software, which is partially highlighted by the example in Figure 3.</p>
            <p>As it stands, the paper is deserving of indexing. These additions would further strengthen what is already an excellent tool description, I hope without adding too much additional work for the authors.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>computational biology; graph genomes; structural variant calling; bioinformatics; cancer biology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report52959">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22229.r52959</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Garrison</surname>
                        <given-names>Erik</given-names>
                    </name>
                    <xref ref-type="aff" rid="r52959a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r52959a1">
                    <label>1</label>University of California, Santa Cruz, Santa Cruz, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>9</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Garrison E</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport52959" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20233.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors demonstrate a new toolchain and data model for working with sequence graphs. This method allows the user to dynamically interact with sequence graphs made in the process of assembly. They provide a number of examples of the use of the method as well as code snippets to demonstrate its functionality. The library is written in C++, but wrapped in Python with SWIG, which should make it useful to many researchers for whom C++ is difficult to use.</p>
            <p>I find only one thing strange about the work. In the beginning, the authors indicate that there are no interoperable methods for working with sequence graphs and alignments to them, but they have in effect created another competing standard. Are there particular limitations with existing data models that they hope to address with the Sequence Distance Graph framework? How is their model different from the variation graph model, in which distances are provided by a collection of paths (or equivalently alignments) embedded within the sequence graph?</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
