<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.149577.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and other pathogenic viruses for constructing a user-friendly bioinformatic pipeline</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Wattanasombat</surname>
                        <given-names>Sara</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0009-0008-6110-6614</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Tongjai</surname>
                        <given-names>Siripong</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6451-675X</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:siripong.tongjai@cmu.ac.th">siripong.tongjai@cmu.ac.th</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>31</day>
                <month>5</month>
                <year>2024</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2024</year>
            </pub-date>
            <volume>13</volume>
            <elocation-id>556</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>14</day>
                    <month>5</month>
                    <year>2024</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Wattanasombat S and Tongjai S</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/13-556/pdf"/>
            <abstract>
                <sec>
                    <title/>
                    <sec>
                        <title>Background</title>
                        <p>Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.</p>
                    </sec>
                    <sec>
                        <title>Methods</title>
                        <p>We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers&#x2014;Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo&#x2014;for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler&#x2019;s performance, utilizing QUAST and BLASTN for quality assessment.</p>
                    </sec>
                    <sec>
                        <title>Results</title>
                        <p>Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among 
                            <italic toggle="yes">de novo</italic> assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime.</p>
                    </sec>
                    <sec>
                        <title>Conclusions</title>
                        <p>The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.</p>
                    </sec>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>HIV</kwd>
                <kwd>Virus</kwd>
                <kwd>Infectious Diseases</kwd>
                <kwd>NGS</kwd>
                <kwd>Single-molecule sequencing</kwd>
                <kwd>Haplotype reconstruction</kwd>
                <kwd>Genome assembly</kwd>
                <kwd>Genomic surveillance</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>The Health Systems Research Institute, Thailand</funding-source>
                    <award-id>GrantNo.64-148</award-id>
                </award-group>
                <award-group id="fund-2">
                    <funding-source>The Faculty of Medicine Research Fund, Chiang Mai University</funding-source>
                    <award-id>GrantNo.099-2563</award-id>
                </award-group>
                <funding-statement>This research was funded by the Health Systems Research Institute, Thailand, Grant No. 64-148, and the Faculty of Medicine Research Fund, Chiang Mai University, Grant No. 099-2563.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec5" sec-type="intro">
            <title>Introduction</title>
            <p>In 2020, UNAIDS re-established the 95-95-95 targets to end the HIV/AIDS epidemic, aiming for 95% of people living with HIV to know their status, 95% of those diagnosed to receive treatment, and 95% of those treated to achieve viral suppression by 2025.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup> While efforts to combat HIV-1 focus on treatment and prevention, HIV-1 genomic surveillance plays a crucial role.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> It offers essential data for evidence-based strategies in HIV prevention, testing, treatment, and care.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> The genomic surveillance protocol should entail simple and swift sample preparations and sequencing, facilitated by Oxford Nanopore Sequencing Technology.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> Additionally, it should feature straightforward bioinformatic analysis pipelines and provide comprehensive reports enriched with reliable HIV-1 information. This adaptable protocol can utilize various computational resources, contributing significantly to global efforts against HIV/AIDS.</p>
            <p>The lack of a proofreading mechanism in HIV-1 reverse transcriptase leads to a high mutation rate, potentially resulting in immune-evading variants or drug-resistant strains that pose challenges for patient care and the control of virus transmission.
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup> Various patterns of intra-host multi-strain HIV infection, including dual infection, co-infection, superinfection, and triple infection, have been observed.
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> Multiple HIV infections contribute significantly to the emergence of novel and more infectious HIV-1 recombinants, affecting viral fitness and increasing the prevalence of inter-subtype recombinants.
                <sup>
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref19">19</xref>
                </sup> Some novel HIV-1 recombinants may be undetectable, especially in medically suppressed individuals.
                <sup>
                    <xref ref-type="bibr" rid="ref20">20</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup> Therefore, HIV-1 quasispecies profiling holds value for understanding viral population dynamics, particularly newly emerging recombinants.
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup> Developing an effective monitoring protocol focusing on the viral genomic level is crucial for understanding HIV-1 dynamics within the host and devising better treatment and prevention strategies.
                <sup>
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref25">25</xref>
                </sup>
            </p>
            <p>Recently, next-generation sequencing (NGS) technologies have become crucial in viral genome analysis. The sequencing-by-synthesis (SBS) approach enables the detection of single nucleotide variants (SNVs) and some other alterations in viral genomes.
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup> However, SBS technology faces limitations in accurately reconstructing viral haplotypes and detecting low-abundance haplotypes for quasispecies analysis. It is also inadequate for examining complex viral sequences such as HIV-1&#x2019;s long terminal repeats (LTRs) or large deletions or insertions in the HIV envelope glycoprotein gene. Furthermore, SBS technology is not well-suited for analyzing phased mutations in the viral genome.
                <sup>
                    <xref ref-type="bibr" rid="ref20">20</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup>
            </p>
            <p>Single-molecule sequencing (SMS) technology, including Oxford Nanopore Technologies (ONT) and PacBio SMRT sequencing, enhances viral quasispecies analysis by generating longer genomic reads,
                <sup>
                    <xref ref-type="bibr" rid="ref27">27</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref29">29</xref>
                </sup> enabling near-complete viral genome reconstruction with high accuracy (97&#x2013;99% identity).
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> ONT&#x2019;s recent advancements allow direct RNA sequencing, reducing bias from PCR or cDNA synthesis.
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup> However, SMS technology has a high raw-read error rate, which can exceed 10% but is mitigated by various long-read error correction methods.
                <sup>
                    <xref ref-type="bibr" rid="ref30">30</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref34">34</xref>
                </sup> ONT R10.3 chemistry sequencing of the HIV-1 genome yields raw reads with a 5 to 12% error rate, while ONT&#x2019;s unique molecular identifier (UMI) method achieves a single molecule consensus accuracy of up to 99.9995%.
                <sup>
                    <xref ref-type="bibr" rid="ref35">35</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref36">36</xref>
                </sup> PacBio SMRT has an error rate of 13 to 15%, but its circular consensus sequencing (CCS) approach provides consensus reads with approximately 99.999% accuracy.
                <sup>
                    <xref ref-type="bibr" rid="ref35">35</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref36">36</xref>
                </sup> Data quality from SMS technology can be improved by implementing quality control measures, optimizing coverage thresholds, and validating variant calls to ensure the reliable detection of minor mutations despite residual errors.
                <sup>
                    <xref ref-type="bibr" rid="ref34">34</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref37">37</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref38">38</xref>
                </sup>
            </p>
            <p>HIV-1 haplotype detection methods for SMS data fall into two categories: reference-based and 
                <italic toggle="yes">de novo</italic> approaches.
                <sup>
                    <xref ref-type="bibr" rid="ref39">39</xref>
                </sup> Reference-based methods, though generally accurate, may introduce bias when no suitable reference sequence is available.
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup> Using a single reference genome can miscluster reads from different viral haplotypes. For example, RVHaplo&#x2019;s hierarchical clustering with a reference genome may inaccurately group reads from different haplotypes,
                <sup>
                    <xref ref-type="bibr" rid="ref40">40</xref>
                </sup> affecting characterization of novel or rare haplotypes. Canu and GoldRush, considered general-purpose 
                <italic toggle="yes">de novo</italic> assemblers, may overlook genomic reads with lower coverage.
                <sup>
                    <xref ref-type="bibr" rid="ref41">41</xref>
                </sup> Conversely, MetaFlye and Strainline are strain-aware 
                <italic toggle="yes">de novo</italic> assemblers, recognizing strain differences within a sample.
                <sup>
                    <xref ref-type="bibr" rid="ref28">28</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref42">42</xref>
                </sup> HaploDMF, iGDA, and RVHaplo are reference-based assemblers designed for multi-strain mixtures.
                <sup>
                    <xref ref-type="bibr" rid="ref43">43</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref45">45</xref>
                </sup> iGDA and RVHaplo employ similar clustering approaches, while HaploDMF uses deep matrix factorization for contig extension.
                <sup>
                    <xref ref-type="bibr" rid="ref43">43</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref46">46</xref>
                </sup> Despite numerous assembly software options tailored for SMS data,
                <sup>
                    <xref ref-type="bibr" rid="ref28">28</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref42">42</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref45">45</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref47">47</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref48">48</xref>
                </sup> selecting the best tool for haplotype reconstruction and HIV-1 quasispecies analysis is challenging due to a lack of systematic studies. Furthermore, the absence of standardized benchmarking across different software and computing environments complicates the selection process, especially with limited computational resources.
                <sup>
                    <xref ref-type="bibr" rid="ref49">49</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref50">50</xref>
                </sup>
            </p>
            <p>In this study, we demonstrated that Strainline and MetaFlye excelled at haplotype reconstruction, although Strainline required more memory. Conversely, Canu performed poorly when distinguishing sequences in multi-strain mixtures, while GoldRush only yielded consensus assemblies. iGDA exhibited high error rates, whereas RVHaplo showed superior runtime and accuracy. HaploDMF offered improved accuracy despite a longer runtime. Additionally, a containerized pipeline, named HIV-64148, was developed to provide a publicly accessible and user-friendly tool for genomic surveillance of HIV-1.</p>
        </sec>
        <sec id="sec6" sec-type="methods">
            <title>Methods</title>
            <sec id="sec7">
                <title>Benchmarking pipeline</title>
                <p>The benchmarking pipeline (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>) starts with an assessment of the quality of all long-read FASTQ inputs using NanoPlot
                    <sup>
                        <xref ref-type="bibr" rid="ref51">51</xref>
                    </sup> to ensure data conformity, particularly regarding the median read length. Seven assemblers are then run on three different computational systems to evaluate their performance and accuracy. Subsequently, the quality and accuracy of each assembler&#x2019;s output are assessed using QUAST, and the sequence similarity and HIV-1 subtype of all assembled contigs are determined. The benchmarking pipeline is containerized, enabling its execution on any cloud or non-cloud environment
                    <sup>
                        <xref ref-type="bibr" rid="ref52">52</xref>
                    </sup> supporting either Docker or Singularity.</p>
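                <p>As an illustration of the read-length conformity check described above, the following minimal sketch computes the median read length of a FASTQ stream. It is a simplified stand-in, not NanoPlot itself and not the pipeline&#x2019;s code: it assumes uncompressed, four-line-per-record FASTQ, and the function names and the toy reads are hypothetical.</p>
                <preformat>
```python
import io
import statistics


def read_lengths(fastq_handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for line_number, line in enumerate(fastq_handle):
        # Assumption: standard 4-line FASTQ, so the sequence is line 2 of each record.
        if line_number % 4 == 1:
            yield len(line.strip())


def median_read_length(fastq_handle):
    """Return the median read length as a quick conformity check."""
    return statistics.median(read_lengths(fastq_handle))


# Toy in-memory FASTQ with three reads of length 4, 8, and 6 (hypothetical data).
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGTACGT\n+\nIIIIIIII\n"
    "@read3\nACGTAC\n+\nIIIIII\n"
)
print(median_read_length(example))  # prints 6
```
                </preformat>
                <p>In practice the same function could be given an open FASTQ file handle; multi-line or compressed FASTQ would need a proper parser.</p>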
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>The HIV-64148 benchmarking pipeline.</title>
                        <p>The pipeline begins with a read quality control analysis of long-read FASTQ files, followed by an assembler, which can be either 
                            <italic toggle="yes">de novo</italic> or reference-based. Finally, the pipeline includes HIV-1 subtyping to classify the HIV-1 strains present in the data.</p>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure1.gif"/>
                </fig>
            </sec>
            <sec id="sec8">
                <title>Long-read assemblers</title>
                <p>This study aimed to assess the accuracy, performance, and computational efficiency of seven candidate long-read assemblers, categorized into two approaches for haplotype reconstruction: (1) 
                    <italic toggle="yes">de novo</italic> long-read assemblers, which include Canu,
                    <sup>
                        <xref ref-type="bibr" rid="ref47">47</xref>
                    </sup> GoldRush,
                    <sup>
                        <xref ref-type="bibr" rid="ref48">48</xref>
                    </sup> MetaFlye,
                    <sup>
                        <xref ref-type="bibr" rid="ref42">42</xref>
                    </sup> and Strainline,
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> and (2) reference-based long-read assemblers, which comprise HaploDMF,
                    <sup>
                        <xref ref-type="bibr" rid="ref43">43</xref>
                    </sup> iGDA,
                    <sup>
                        <xref ref-type="bibr" rid="ref44">44</xref>
                    </sup> and RVHaplo.
                    <sup>
                        <xref ref-type="bibr" rid="ref45">45</xref>
                    </sup>
                </p>
                <p>Canu, MetaFlye, and iGDA were installed using the Micromamba package manager, with recipes from the conda-forge (
                    <ext-link ext-link-type="uri" xlink:href="https://anaconda.org/conda-forge">https://anaconda.org/conda-forge</ext-link>) and bioconda channels (
                    <ext-link ext-link-type="uri" xlink:href="https://anaconda.org/bioconda">https://anaconda.org/bioconda</ext-link>). An additional channel, zhixingfeng (
                    <ext-link ext-link-type="uri" xlink:href="https://anaconda.org/zhixingfeng">https://anaconda.org/zhixingfeng</ext-link>), was used specifically for iGDA as per the installation guidelines. Each tool was set up in a separate environment to satisfy the dependency versions specified by its developers.</p>
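                <p>The one-environment-per-tool setup described above can be sketched as follows. This is a hedged illustration rather than the authors&#x2019; installation script: the environment names are hypothetical, the package names (canu, flye, igda) and channel order are assumptions, and no versions are pinned because the text does not state them. The sketch only builds and prints the Micromamba commands; it does not run them.</p>
                <preformat>
```python
import shlex

# Hypothetical layout: one isolated environment per assembler.
# Package names and channel order are assumptions, not taken from the article.
ENVIRONMENTS = {
    "canu": {"channels": ["conda-forge", "bioconda"], "packages": ["canu"]},
    # MetaFlye is assumed to be the metagenome mode of the "flye" package.
    "metaflye": {"channels": ["conda-forge", "bioconda"], "packages": ["flye"]},
    # iGDA is assumed to be published as "igda" on the zhixingfeng channel.
    "igda": {"channels": ["zhixingfeng", "conda-forge", "bioconda"], "packages": ["igda"]},
}


def micromamba_create_command(env_name, spec):
    """Build the argument list for creating one named environment."""
    command = ["micromamba", "create", "--yes", "--name", env_name]
    for channel in spec["channels"]:
        command += ["--channel", channel]
    return command + spec["packages"]


for name, spec in ENVIRONMENTS.items():
    # Print shell-ready commands so they can be reviewed before being executed.
    print(shlex.join(micromamba_create_command(name, spec)))
```
                </preformat>
                <p>Keeping each assembler in its own environment avoids conflicts between the dependency versions that different tools require.</p>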
                <p>For HaploDMF, RVHaplo, and Strainline, no package recipes were available through Micromamba, necessitating manual installation. The source code for HaploDMF (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/dhcai21/HaploDMF">https://github.com/dhcai21/HaploDMF</ext-link>) and RVHaplo (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/dhcai21/RVHaplo">https://github.com/dhcai21/RVHaplo</ext-link>) was downloaded from the respective GitHub repositories, with dependencies installed via Micromamba using environment specification files provided by the developers. For Strainline, the source code was obtained from its GitHub repository (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/HaploKit/Strainline">https://github.com/HaploKit/Strainline</ext-link>). Dependencies were managed and installed with Micromamba, except for Daccord (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/gt1/daccord">https://github.com/gt1/daccord</ext-link>) and Metabat2 (
                    <ext-link ext-link-type="uri" xlink:href="https://bitbucket.org/berkeleylab/metabat/src/master/">https://bitbucket.org/berkeleylab/metabat/src/master/</ext-link>), which were obtained by downloading the latest releases from the developers&#x2019; websites. Licensing for these tools varies: Strainline, GoldRush, HaploDMF, and RVHaplo are licensed under GPL-3.0, Canu and iGDA under GPL-2.0, and MetaFlye under the BSD-3-Clause license. 
                    <xref ref-type="table" rid="T1">Table 1</xref> provides an overview of all candidate long-read assemblers.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>An overview of long-read assemblers.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Assembler</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Class</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Type</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Error handling</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Haplotype reconstruction</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Frequency estimation</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Programming language</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Canu</bold>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref47">
                                            <bold>47</bold>
                                        </xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Graph-based (general assembler)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <italic toggle="yes">de novo</italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Two filtering steps: a global filter to find correction evidence and a local filter to decide the correction.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C++</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>GoldRush</bold>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref48">
                                            <bold>48</bold>
                                        </xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Graph-based (multi-purpose)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <italic toggle="yes">de novo</italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Custom polisher GoldPolish to correct low-quality bases; Tigmint-long to correct misassemblies.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C++, Python3</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>MetaFlye</bold>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref42">
                                            <bold>42</bold>
                                        </xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Graph-based (strain-aware)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <italic toggle="yes">de novo</italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Thresholding of low-frequency k-mers with Poisson error distribution model</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Iteratively condense read graph</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C++, Python</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Strainline</bold>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref28">
                                            <bold>28</bold>
                                        </xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Graph-based (strain-aware)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <italic toggle="yes">de novo</italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Local de Bruijn graph-based strategy</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Read clustering, iterative extension by OLC algorithm</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Calculate depth of coverage of each haplotype based on alignment against input reads</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Python</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>HaploDMF</bold>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref43">
                                            <bold>43</bold>
                                        </xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Probabilistic</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Reference-based</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Post-processing of assembled sequences with the Medaka polisher.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Hierarchical clustering algorithm and deep matrix factorization</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Based on the number of reads within each cluster</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Python3</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>iGDA</bold>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref44">
                                            <bold>44</bold>
                                        </xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Probabilistic</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Reference-based</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Estimates the number of clusters with the ANN algorithm and extends contigs using overlapping SNVs.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Based on depth of coverage of each SNV site</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C++</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>RVHaplo</bold>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref45">
                                            <bold>45</bold>
                                        </xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Probabilistic</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Reference-based</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Post-processing of assembled sequences with the Medaka polisher.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Hierarchical clustering, with a consensus generated for each cluster</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Based on the number of reads within each cluster</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Python3</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <p>Basic information on the assemblers is shown. N/A indicates that no information was provided by the developer.</p>
                    </table-wrap-foot>
                </table-wrap>
            </sec>
            <sec id="sec9">
                <title>Computational systems</title>
                <p>Benchmarking analyses were conducted on three different computational systems: a server, a workstation PC, and a standard home PC. The specifications of each system are listed in 
                    <xref ref-type="table" rid="T2">Table 2</xref>. The performance parameters of each assembler were measured as follows: 1) total CPU utilization, indicating the overall workload on the CPU, 2) memory usage, and 3) total runtime, defined as the elapsed wall-clock time from the start of assembly to the generation of output contigs. A dedicated Python script was developed to track the resource usage of each process of the long-read assemblers, capturing CPU and memory utilization every 100 milliseconds at each stage of the assembly process. As the pipeline operates within a containerized environment facilitated by Docker, unrelated processes such as operating system operations were filtered out during the measurement of resource utilization.</p>
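                <p>The sampling approach described above can be illustrated with a minimal sketch. It is not the script used in this study: the function names <monospace>read_rss_kib</monospace> and <monospace>monitor</monospace> are hypothetical, the memory reading assumes a Linux <monospace>/proc</monospace> filesystem, CPU utilization is omitted, and only the launched process (not its child processes) is sampled.</p>
                <code language="python"><![CDATA[
```python
import subprocess
import sys
import time


def read_rss_kib(pid):
    """Return resident memory (KiB) from /proc on Linux, or None elsewhere."""
    try:
        with open(f"/proc/{pid}/status") as handle:
            for line in handle:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        # The process already exited, or /proc is not available.
        pass
    return None


def monitor(command, interval=0.1, sampler=read_rss_kib):
    """Run command and sample it every `interval` seconds until it exits.

    Returns the elapsed wall-clock time in seconds and a list of
    (seconds since start, sampler reading) tuples.
    """
    start = time.perf_counter()
    process = subprocess.Popen(command)
    samples = []
    while process.poll() is None:
        samples.append((time.perf_counter() - start, sampler(process.pid)))
        time.sleep(interval)
    wall_clock = time.perf_counter() - start
    return wall_clock, samples


if __name__ == "__main__":
    # Stand-in workload; a real run would launch an assembler instead.
    workload = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    elapsed, samples = monitor(workload)
    readings = [rss for _, rss in samples if rss is not None]
    print(f"wall-clock: {elapsed:.2f} s, samples: {len(samples)}")
    if readings:
        print(f"peak RSS: {max(readings)} KiB")
```
]]></code>
                <p>In this sketch the default interval of 0.1 seconds mirrors the 100-millisecond sampling rate, and the wall-clock time is measured from just before the process is launched until it exits.</p>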
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>Table 2. </label>
                    <caption>
                        <title>Computational systems and specifications.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">System</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">CPU</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Memory</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">OS</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Server</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Intel(R) Xeon(R) Gold 6248 CPU @ 2.50 GHz</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">189 GB</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ubuntu 20.04.6 LTS</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Workstation PC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Intel(R) Core i7-8700 CPU @ 3.20 GHz</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">DIMM DDR4 Synchronous 2133 MHz x4 (total 64 GB)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Debian 11</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Generic Home PC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">AMD Ryzen 5 2600 @ 3.40 GHz</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8 GB DDR4</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Windows 10, running Docker with the Linux subsystem</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec id="sec10">
                <title>HIV-1 genome mixtures</title>
                <p>Long-read FASTQ files were simulated from four HIV-1 genome mixtures: (1) 2 group M HIV-1 subtypes (2M), (2) 2 CRFs (2C), (3) 1 group M HIV-1 subtype and 1 CRF (1M1C), and (4) 2 group M HIV-1 subtypes and 1 CRF (2M1C); see 
                    <xref ref-type="table" rid="T3">Table 3</xref>. Each genome mixture comprised 100 sets of FASTA files containing corresponding HIV-1 genomes randomly selected from the Los Alamos HIV sequence databases
                    <sup>
                        <xref ref-type="bibr" rid="ref53">53</xref>
                    </sup> (
                    <ext-link ext-link-type="uri" xlink:href="https://www.hiv.lanl.gov/">https://www.hiv.lanl.gov/</ext-link>). NanoSim V3.1.0 was used to simulate one long-read FASTQ file from each of the 400 HIV-1 FASTA sets. A list of the mixtures within the 400 FASTA sets is available in Extended data, Table S6.
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Additionally, 20 long-read FASTQ files, comprising 5 samples from each mixture (
                    <xref ref-type="table" rid="T3">Table 3</xref>), were generated with varying amplicon sizes as described in the assembly quality assessment.</p>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>Table 3. </label>
                    <caption>
                        <title>The HIV-1 genome mixtures and the simulated FASTQ files.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Experiment</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Data</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Detail</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Number of FASTQ</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Average read length (nt)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Number of reads per FASTQ</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="4" valign="top">
                                    <bold>Minimum read length</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>1,000-nt</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5 FASTQ files for each of the 4 HIV-1 mixtures (2M, 2C, 1M1C, and 2M1C).</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">20</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1,120.715</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>2,000-nt</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5 FASTQ files for each of the 4 HIV-1 mixtures (2M, 2C, 1M1C, and 2M1C).</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">20</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,015.835</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>3,000-nt</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5 FASTQ files for each of the 4 HIV-1 mixtures (2M, 2C, 1M1C, and 2M1C).</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">20</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3,022.300</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>4,000-nt</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5 FASTQ files for each of the 4 HIV-1 mixtures (2M, 2C, 1M1C, and 2M1C).</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">20</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3,917.050</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="4" valign="top">
                                    <bold>HIV-1 mixture</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>2M</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2 Group M HIV-1 subtypes</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8,316.558</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>2C</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2 HIV-1 CRFs</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8,704.532</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>1M1C</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1 Group M HIV-1 subtype and 1 CRF</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8,366.96</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>2M1C</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2 Group M HIV-1 subtypes and 1 CRF</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8,298.88</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2,000</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <p>The characteristics of the simulated FASTQ files for the minimum read length dataset can be found in Extended data, Table S7.
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup> The information about the HIV-1 genome mixtures is provided in Extended data, Table S8.
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup>
                        </p>
                    </table-wrap-foot>
                </table-wrap>
            </sec>
            <sec id="sec11">
                <title>Data simulation and experimental data</title>
                <p>The data simulations were conducted using NanoSim V3.1.0 (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/bcgsc/NanoSim/releases/tag/v3.1.0">https://github.com/bcgsc/NanoSim/releases/tag/v3.1.0</ext-link>) in metagenomic mode,
                    <sup>
                        <xref ref-type="bibr" rid="ref55">55</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref56">56</xref>
                    </sup> employing the pretrained read profile model human_NA12878_DNA_FAB49712_guppy. The simulation parameters included a maximum read length of 12,000 nt, a minimum read length of 7,500 nt, a median read length of 9,000 nt, and a standard deviation of read length set to 0.75. The &#x201c;--perfect&#x201d; option was enabled to simulate error-free reads. A depth of coverage of 2,000 was selected as sufficient for variant calling and quasispecies detection.
                    <sup>
                        <xref ref-type="bibr" rid="ref57">57</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref58">58</xref>
                    </sup> The resulting FASTQ files (
                    <xref ref-type="table" rid="T3">Table 3</xref>) were similar to those generated with Oxford Nanopore Technologies (ONT) R9.4 chemistry and the Guppy 3.1.5 basecaller. Furthermore, the assemblers were benchmarked using experimental data sourced from published studies (Extended data, Table S1).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> All FASTA templates for data simulation,
                    <sup>
                        <xref ref-type="bibr" rid="ref59">59</xref>
                    </sup> simulated FASTQ files,
                    <sup>
                        <xref ref-type="bibr" rid="ref60">60</xref>
                    </sup> and QC results of the simulated FASTQ files
                    <sup>
                        <xref ref-type="bibr" rid="ref61">61</xref>
                    </sup> are available as Underlying data.</p>
            </sec>
            <sec id="sec12">
                <title>Assembly quality evaluation</title>
                <p>Sequence similarity was assessed using local NCBI BLASTN
                    <sup>
                        <xref ref-type="bibr" rid="ref62">62</xref>
                    </sup> against a customized database of 12,000 HIV-1 genomes from the Los Alamos HIV-1 sequence database,
                    <sup>
                        <xref ref-type="bibr" rid="ref53">53</xref>
                    </sup> and only contigs showing &gt;95% sequence similarity were considered. QUAST
                    <sup>
                        <xref ref-type="bibr" rid="ref63">63</xref>
                    </sup> was used to compare contigs from the different assemblers, with standard QUAST for single viral isolate datasets and MetaQUAST
                    <sup>
                        <xref ref-type="bibr" rid="ref63">63</xref>
                    </sup> for multiple-isolate inputs. The primary QUAST metrics assessed were the number of contigs, contig sizes, N50, genome fraction (%), and total aligned bases, along with assembly correctness measured as the average numbers of mismatches and indels per 100,000 aligned bases. Additionally, assembly quality was evaluated by the average completeness of the major HIV-1 ORFs (e.g., 
                    <italic toggle="yes">gag</italic>, 
                    <italic toggle="yes">gag-pol/pol</italic>, and 
                    <italic toggle="yes">env</italic>). The results generated from the pipeline are available as Underlying data.
                    <sup>
                        <xref ref-type="bibr" rid="ref64">64</xref>
                    </sup>
                </p>
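                <p>Two of the metrics listed above, N50 and error counts per 100,000 aligned bases, can be illustrated with a short sketch. It follows the usual definitions of these metrics and is not taken from the QUAST source code; the function names <monospace>n50</monospace> and <monospace>per_100_kbp</monospace> and the example contig lengths are hypothetical.</p>
                <code language="python"><![CDATA[
```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_total:
            return length
    raise ValueError("at least one contig is required")


def per_100_kbp(event_count, aligned_bases):
    """Scale a count of mismatches or indels to 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return event_count * 100_000 / aligned_bases


if __name__ == "__main__":
    # Hypothetical contig lengths (nt) for illustration only.
    lengths = [5000, 4000, 3000, 2000, 1000]
    print("N50:", n50(lengths))
    print("mismatches per 100 kbp:", per_100_kbp(9, 9000))
```
]]></code>
                <p>Because N50 depends only on the contig length distribution, it is read alongside genome fraction and the error rates rather than on its own.</p>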
            </sec>
        </sec>
        <sec id="sec13" sec-type="results">
            <title>Results</title>
            <sec id="sec14">
                <title>Evaluating the computational performance of the assemblers</title>
                <p>All seven assemblers were executed on three computational systems to measure wall-clock time usage, memory usage, and CPU utilization. Each assembler processed 100 simulated 2M FASTQ files. 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> illustrates the time usage of all FASTQ files processed by different assembly pipelines across the three computational systems. As shown in Extended data, Table S2,
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> the server completed all 700 assemblies in 3 days, 9 hours, 50 minutes, and 28.31 seconds, while the workstation PC required 3 days, 23 hours, 24 minutes, and 2.64 seconds. Because the Strainline assembly failed on the generic home PC, the remaining 600 assemblies on that system took 2 days, 14 hours, 49 minutes, and 16.79 seconds. Notably, GoldRush was the fastest assembler, taking approximately 30 seconds or less per assembly on all three computational systems, while MetaFlye was the slowest, taking approximately 1,370 seconds or more per assembly.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>A depiction of time usage of all 2,100 individual assemblies.</title>
                        <p>The assemblies (100 simulated 2M FASTQ files per assembler per system) are grouped by computational systems on the x-axis and by assemblers on the y-axis. Additionally, an accompanying statistical summary is provided in Extended data, Table S2.
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup>
                        </p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure2.gif"/>
                </fig>
                <p>For memory usage, Strainline could not be tested on the generic home PC because of a segmentation fault caused by insufficient memory. On the 189-GB server, the order of maximum memory usage, from highest to lowest, was Strainline, MetaFlye, GoldRush, HaploDMF, iGDA, Canu, and RVHaplo (
                    <xref ref-type="fig" rid="f3">Figure 3A</xref> and 
                    <xref ref-type="fig" rid="f3">B</xref>). Similarly, on the 64-GB workstation PC, the order of maximum memory usage was Strainline, Canu, MetaFlye, HaploDMF, iGDA, RVHaplo, and GoldRush (
                    <xref ref-type="fig" rid="f3">Figure 3C</xref> and 
                    <xref ref-type="fig" rid="f3">D</xref>). On the 8-GB generic home PC, the order of maximum memory usage was MetaFlye, iGDA, Canu, GoldRush, RVHaplo, and HaploDMF (
                    <xref ref-type="fig" rid="f3">Figure 3E</xref> and 
                    <xref ref-type="fig" rid="f3">F</xref>). Extended data, Table S3
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> presents the maximum memory usage of the assemblers.</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Memory usage over time and maximum memory usage.</title>
                        <p>Memory usage of all assemblers on the three computational systems during the assembly phase of the pipeline: (A) and (B) the server, (C) and (D) the workstation PC, and (E) and (F) the generic home PC.</p>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure3.gif"/>
                </fig>
                <p>
                    <xref ref-type="fig" rid="f4">Figure 4</xref> shows the CPU utilization of all assemblers, with the maximum CPU usage percentages listed in Extended data, Table S3.
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> On the workstation server (
                    <xref ref-type="fig" rid="f4">Figure 4A</xref>), the assemblers ranked from highest to lowest average CPU usage were HaploDMF, Strainline, GoldRush, iGDA, RVHaplo, Canu, and MetaFlye. Neither GoldRush (3,625.9%) nor Canu (647.4%) overutilized the CPU; both remained within the acceptable limit for 36 cores. On the workstation PC (
                    <xref ref-type="fig" rid="f4">Figure 4B</xref>), the order of CPU usage was Strainline, HaploDMF, iGDA, Canu, RVHaplo, GoldRush, and MetaFlye. Notably, MetaFlye overutilized the CPU at 1,437.4%, whereas Canu peaked at only 637.5%. On the generic home PC (
                    <xref ref-type="fig" rid="f4">Figure 4C</xref>), the CPU usage order was GoldRush, iGDA, RVHaplo, MetaFlye, Canu, and HaploDMF. Neither HaploDMF (726.3%) nor Canu (628.9%) overutilized the CPU. Finally, correlations between computational factors (e.g., CPUs, maximum memory, and assemblers) and runtime were analyzed in Jamovi statistical software using a non-parametric one-way ANOVA (Kruskal-Wallis test).
                    <sup>
                        <xref ref-type="bibr" rid="ref65">65</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref66">66</xref>
                    </sup> As demonstrated in 
                    <xref ref-type="table" rid="T4">Table 4</xref>, assembler selection significantly impacted runtime with a large effect size (&#x03b5;
                    <sup>2</sup> = 0.903***), while no statistically significant differences in runtime were observed across different levels of CPUs and maximum memory.</p>
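                <p>As an illustrative check, the effect sizes in Table 4 are consistent with the epsilon-squared estimate commonly reported with the Kruskal-Wallis test, &#x03b5;<sup>2</sup> = H/(n &#x2212; 1), where H is the &#x03c7;<sup>2</sup> statistic. The sketch below assumes n = 2,000 assemblies (700 + 700 + 600, as reported above); neither this sample size nor the formula is stated in the table, and the function name <monospace>epsilon_squared</monospace> is hypothetical.</p>
                <code language="python"><![CDATA[
```python
def epsilon_squared(h_statistic, n_observations):
    """Epsilon-squared effect size for a Kruskal-Wallis H statistic."""
    if n_observations < 2:
        raise ValueError("need at least two observations")
    return h_statistic / (n_observations - 1)


# Chi-square (H) values copied from Table 4. N_ASSEMBLIES is an assumption
# (700 + 700 + 600 completed assemblies), not a value stated in the table.
N_ASSEMBLIES = 2000
TABLE_4_H = {"CPUs": 0.217, "Memory": 0.921, "Assembler": 1805.0}

for factor, h_value in TABLE_4_H.items():
    print(f"{factor}: {epsilon_squared(h_value, N_ASSEMBLIES):.3g}")
```
]]></code>
                <p>Under this assumption, the sketch gives values that match the &#x03b5;<sup>2</sup> column of Table 4 to three significant figures.</p>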
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>CPU usages of all seven assemblers under three different computational systems.</title>
                        <p>In this study, the server could accommodate 36 cores (3,600%), while both the workstation PC and the generic home PC could accommodate up to 12 cores (or 1,200%). CPU usages were collected from (A) a server, (B) a workstation PC, and (C) a generic home PC. The y-axis represents percent CPU usage (100% per core). Multi-threaded processing can be observed when CPU utilization is above 100%.</p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure4.gif"/>
                </fig>
                <table-wrap id="T4" orientation="portrait" position="float">
                    <label>Table 4. </label>
                    <caption>
                        <title>A statistical correlation analysis between computational factors and runtime.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Correlation</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">&#x03c7;
                                    <sup>2</sup>
                                </th>
                                <th align="left" colspan="1" rowspan="1" valign="top">df</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">p</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">&#x03b5;
                                    <sup>2</sup>
                                </th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">CPUs &#x2013; Runtime</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.217</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.641</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.09e-4</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Memory &#x2013; Runtime</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.921</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.631</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4.61e-4</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Assembler &#x2013; Runtime</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1,805</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&lt;&#x2009;0.001***</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.903</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <p>Kruskal-Wallis (***p-value &lt;0.001).</p>
                    </table-wrap-foot>
                </table-wrap>
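                <p>As an illustration of the effect size reported alongside a Kruskal-Wallis test, the calculation can be sketched as follows. This is a minimal sketch and not the authors' analysis code: it assumes the common definition of epsilon-squared as H divided by (n minus 1), where H is the Kruskal-Wallis statistic and n is the total number of observations, and it computes H without a tie correction. The function names and the toy runtimes are hypothetical.</p>
                <p>
                    <code language="python" xml:space="preserve">
```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for tie-free data (no tie correction is applied)."""
    pooled = sorted(value for group in groups for value in group)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes all observations are distinct")
    # Rank 1 goes to the smallest observation across all groups.
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    n = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    weighted = sum(sum(rank[v] for v in group) ** 2 / len(group) for group in groups)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


def epsilon_squared(h, n):
    """Effect size: H / (n - 1), which equals H * (n + 1) / (n**2 - 1)."""
    return h / (n - 1)


# Hypothetical runtimes (minutes) for three assemblers, three runs each.
runtimes = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(runtimes)
n = sum(len(group) for group in runtimes)
print(round(h, 3), round(epsilon_squared(h, n), 3))
```
                    </code>
                </p>
                <p>Under this definition, an effect size close to 1 means that group membership (here, the assembler) accounts for most of the variation in ranked runtime, whereas values near 0 indicate a negligible effect.</p>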
            </sec>
            <sec id="sec15">
                <title>Determining the minimum read lengths required for the assemblers</title>
                <p>The minimum read length was determined by testing the assemblers with FASTQ files of four different median read lengths: 1,000-nt, 2,000-nt, 3,000-nt, and 4,000-nt (
                    <xref ref-type="table" rid="T3">Table 3</xref>). This assessment aimed to ensure that the assembled contigs exhibited high contiguity and genome completeness and recovered intact major HIV-1 open reading frames (ORFs) such as gag, gag-pol/pol, and env. The evaluation was conducted using the server system.</p>
                <p>The first evaluation of the minimum read lengths was based on the assembled contig size. Increasing median read lengths resulted in longer contigs assembled by Canu (
                    <xref ref-type="fig" rid="f5">Figure 5A</xref>). MetaFlye (
                    <xref ref-type="fig" rid="f5">Figure 5B</xref>) and Strainline (
                    <xref ref-type="fig" rid="f5">Figure 5C</xref>) could not process the 1,000-nt inputs, but Strainline assembled significantly longer contigs from longer reads. However, GoldRush failed the assessment. The remaining 
                    <italic toggle="yes">de novo</italic> assemblers, Canu, MetaFlye, and Strainline, processed the 4,000-nt median read length inputs and yielded contig sizes relatively close to a 9-kb HIV-1 genome. All reference-based assemblers performed well with all four inputs, generating contig sizes close to that of an HIV-1 genome (
                    <xref ref-type="fig" rid="f5">Figure 5D-F</xref>). Complete statistical evaluation details can be found in Extended data, Table S9.
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>
                </p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Contig size distribution from the assemblers processing four median read length FASTQ inputs.</title>
                        <p>The pipelines were (A) Canu, (B) MetaFlye, (C) Strainline, (D) HaploDMF, (E) iGDA, and (F) RVHaplo. The x-axis represents four median read lengths (1,000&#x2013;4,000 nt), and the y-axis represents contig size. Each data point denotes an individual contig. The mean contig size of each median read length is indicated by a black square with error bars representing 95% confidence intervals. Absolute mean differences are shown in brackets for those comparisons with statistical significance (p&lt;0.001).</p>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure5.gif"/>
                </fig>
                <p>Sequence similarity and genome fraction of assembled contigs were evaluated across different median read lengths. For the 1,000-nt reads (Extended data, Figure S1A),
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Canu yielded &gt;5,000-nt contigs with &gt;99.5% similarity, while iGDA and RVHaplo generated &gt;8,000-nt contigs with 97% and 99% similarity, respectively. RVHaplo showed superior recovery of major HIV-1 ORFs (Extended data, Figure S2A).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> For the 2,000-nt reads (Extended data, Figure S1B),
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Canu and MetaFlye produced &gt;7,000-nt contigs with &gt;99% similarity, whereas Strainline&#x2019;s contigs were ~3,000-nt with &gt;98% similarity. All reference-based assemblers yielded longer contigs with greater similarity and improved ORF recovery (Extended data, Figure S2B).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> For the 3,000-nt reads (Extended data, Figure S1C),
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Strainline produced only 4,000-nt contigs with &gt;99.50% similarity, while the other assemblers generated longer contigs with higher similarity and improved ORF recovery (Extended data, Figure S2C).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Finally, for the 4,000-nt reads (Extended data, Figure S1D),
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> all assemblers produced longer contigs with higher similarity. All except Canu and HaploDMF demonstrated satisfactory recovery of the major HIV-1 ORFs (Extended data, Figure S2D).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> RVHaplo yielded the highest averaged genome fraction (Extended data, Figure S1E).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>
                </p>
                <p>A further analysis (
                    <xref ref-type="table" rid="T5">Table 5</xref>) revealed a statistically significant positive correlation between the median read length of the inputs and the length of the assembled contigs (&#x03c1;=0.185***), as well as between the median read length and sequence similarity (&#x03c1;=0.163***). Additionally, assembler selection correlated positively with the size of assembled contigs (&#x03c1;=0.131***), but not with sequence similarity (&#x03c1;=-0.220). Thus, a key finding of this study suggests that a minimum read length of 2,000 nt is necessary for all assemblers, with optimal outcomes achieved at 4,000 nt. In summary, longer median read lengths are associated with higher-quality contigs, characterized by greater contig length and greater sequence similarity.</p>
                <table-wrap id="T5" orientation="portrait" position="float">
                    <label>Table 5. </label>
                    <caption>
                        <title>A statistical correlation analysis of input read lengths, output contig lengths, and sequence similarity.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top"/>
                                <th align="left" colspan="1" rowspan="1" valign="top"/>
                                <th align="left" colspan="1" rowspan="1" valign="top">Median read length</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Sequence similarity (%identity)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Assembler selection</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="3" valign="top">Sequence similarity (%identity)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Spearman's rho</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.163
                                    <xref ref-type="table-fn" rid="tfn3">***</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">df</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1,243</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">p-value</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&lt;&#x2009;0.001</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="3" valign="top">Assembler Selection</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Spearman's rho</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-0.220</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">df</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1,243</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">p-value</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.000</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2014;</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="3" valign="top">Output contig length</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Spearman's rho</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.185
                                    <xref ref-type="table-fn" rid="tfn3">***</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-0.290</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.131
                                    <xref ref-type="table-fn" rid="tfn3">***</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">df</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1,243</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1,243</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1,243</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">p-value</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&lt;&#x2009;0.001</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.000</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&lt;&#x2009;0.001</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <p>H
                            <sub>a</sub> is positive correlation</p>
                        <fn-group content-type="footnotes">
                            <fn id="tfn1">
                                <label>*</label>
                                <p>p&lt;0.05,</p>
                            </fn>
                            <fn id="tfn2">
                                <label>**</label>
                                <p>p&lt;0.01,</p>
                            </fn>
                            <fn id="tfn3">
                                <label>***</label>
                                <p>p&lt;0.001, one-tailed.</p>
                            </fn>
                        </fn-group>
                    </table-wrap-foot>
                </table-wrap>
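                <p>For readers unfamiliar with the rank correlation used in Table 5, Spearman's rho can be sketched as the Pearson correlation of the ranks of two variables. The following is a minimal pure-Python sketch under that standard definition, not the authors' analysis code. The function names and the toy read and contig lengths are hypothetical and are not taken from the study data.</p>
                <p>
                    <code language="python" xml:space="preserve">
```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical median read lengths and contig lengths (nt).
read_lengths = [1000, 2000, 3000, 4000]
contig_lengths = [5200, 7100, 8300, 8900]
print(spearman_rho(read_lengths, contig_lengths))
```
                    </code>
                </p>
                <p>Because the statistic depends only on ranks, it captures any monotonic association between read length and contig length without assuming a linear relationship.</p>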
            </sec>
            <sec id="sec16">
                <title>Evaluating the performance of assemblers using the four HIV-1 genome mixtures</title>
                <p>This experiment evaluated the performance of assemblers in generating contigs from simulated FASTQ inputs containing heterogeneous HIV-1 sequences. These sequences were simulated from four HIV-1 genome mixtures: 2M (2 group M subtypes), 2C (2 circulating recombinant forms, CRFs), 1M1C (1 group M subtype and 1 CRF), and 2M1C (2 group M subtypes and 1 CRF), as shown in 
                    <xref ref-type="table" rid="T3">Table 3</xref>. The evaluation criteria included contiguity, genome completeness, the recovery of major HIV-1 open reading frames (ORFs), and the error rate. Initially, contig sizes were assessed, revealing that most fell within a range of 8,500&#x2013;9,200 nt, close to the size of the HIV-1 genome (Extended data, Figure S3).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Notably, MetaFlye produced shorter contigs, ranging from 7,800 to 8,800 nt. Detailed statistical analysis of this assessment is available in Extended data, Table S10.
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>
                </p>
                <p>Considering both contig length and averaged genome fraction, all assembler pipelines processed the four datasets to yield contigs exceeding 8,000 nt, close to the size of the HIV-1 genome (
                    <xref ref-type="fig" rid="f6">Figure 6A&#x2013;D</xref>). However, the MetaFlye pipeline generated contigs with a relatively low (&lt;90%) averaged genome fraction (
                    <xref ref-type="fig" rid="f6">Figure 6E</xref>). Except for MetaFlye, all assembler pipelines demonstrated satisfactory recovery of major HIV-1 ORFs (
                    <xref ref-type="fig" rid="f7">Figure 7A&#x2013;D</xref>). 
                    <italic toggle="yes">De novo</italic> and reference-based assemblers showed a clear distinction in 2M, 2C, and 2M1C HIV-1 mixtures (
                    <xref ref-type="fig" rid="f6">Figure 6A</xref>, 
                    <xref ref-type="fig" rid="f6">B</xref>, and 
                    <xref ref-type="fig" rid="f6">D</xref>), with 
                    <italic toggle="yes">de novo</italic> assemblers producing contigs with &gt;95.5% sequence similarity. Reference-based assemblers HaploDMF and RVHaplo yielded contigs with relatively high sequence similarity (98&#x2013;99.5%), while iGDA produced 95&#x2013;97% sequence similarity. However, iGDA generated fewer contigs from the 2M mixture inputs (
                    <xref ref-type="fig" rid="f6">Figure 6A</xref>) due to its reliance on overlapping SNVs for contig extension, which may fail with highly similar sequences. The presence of CRFs allowed iGDA to recover more contigs (
                    <xref ref-type="fig" rid="f6">Figure 6B&#x2013;D</xref>). Individual assembly evaluation statistics for each sample are available in Extended data, Table S10.
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>
                </p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>MetaQUAST assessment of contigs from the assemblers processing four sets of simulated FASTQ files.</title>
                        <p>The FASTQ file sets were (A) 2M HIV-1 mixture, (B) 2C HIV-1 mixture, (C) 1M1C HIV-1 mixture, and (D) 2M1C HIV-1 mixture (
                            <xref ref-type="table" rid="T3">Table 3</xref>). The x-axis displays the percentage sequence similarity obtained from a BLAST alignment with the corresponding reference genomes, while the y-axis represents the aligned length, indicating the longest continuous alignment between each contig and its reference genome. Dot sizes indicate the genome fraction (%Ref Aligned), calculated as the ratio of continuous aligned bases to the total reference genome length. Subplots above and beside each panel display histograms of the contig counts. Additionally, (E) shows the averaged genome fraction (%) of contigs from different long-read assembler pipelines.</p>
                    </caption>
                    <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure6.gif"/>
                </fig>
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>Figure 7. </label>
                    <caption>
                        <title>Averaged completeness of HIV-1 open reading frames (ORFs) of the contigs generated from the assemblers.</title>
                        <p>The contigs were generated by the assemblers from simulated FASTQ inputs of the four HIV-1 mixtures: (A) 2M, (B) 2C, (C) 1M1C, and (D) 2M1C.</p>
                    </caption>
                    <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure7.gif"/>
                </fig>
                <p>Since all input data consisted of error-free reads, the assembly correctness (
                    <xref ref-type="fig" rid="f8">Figure 8</xref>) was assessed by comparing the contigs with their respective HIV-1 genomes. The correctness of each assembler was evaluated based on the average number of mismatches per 100,000 aligned bases (
                    <xref ref-type="fig" rid="f8">Figure 8A</xref>) and the average number of indels per 100,000 aligned bases (
                    <xref ref-type="fig" rid="f8">Figure 8B</xref>). Among the four 
                    <italic toggle="yes">de novo</italic> assemblers, all introduced a few errors into the contigs, with Strainline being more accurate than the rest. MetaFlye, however, showed the lowest assembly correctness. Regarding the reference-based assemblers, HaploDMF and RVHaplo demonstrated similar degrees of assembly correctness, while iGDA exhibited the lowest. Interestingly, indels were predominantly associated with the reference-based assemblers in this study.</p>
                <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                    <label>Figure 8. </label>
                    <caption>
                        <title>The errors observed in the contigs from the simulated FASTQs of the 4 HIV-1 mixtures.</title>
                        <p>The errors were (A) the average number of mismatches (e.g., true SNPs and sequencing errors) per 100,000 aligned bases and (B) the average number of indels per 100,000 aligned bases.</p>
                    </caption>
                    <graphic id="gr8" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure8.gif"/>
                </fig>
                <p>The final assessment investigated the HIV-1 subtype recall rates of the assemblers processing the 4 HIV-1 mixture datasets. In the 2M dataset (
                    <xref ref-type="fig" rid="f9">Figure 9A</xref>), the reference-based assemblers (HaploDMF, RVHaplo, and iGDA) correctly recalled most of the subtypes, while the 
                    <italic toggle="yes">de novo</italic> assemblers exhibited lower recall rates. Among these, Strainline performed best, with &gt;80% recall rates on all common subtypes, except for an overestimation of subtype B (118.52%). Canu, GoldRush, and MetaFlye exhibited low recall rates because their collapsed assembly approach handles sequences of relatively similar HIV-1 subtypes poorly. Results from the 2C (
                    <xref ref-type="fig" rid="f9">Figure 9B</xref>) and the 1M1C data (
                    <xref ref-type="fig" rid="f9">Figure 9C</xref>) were similar to those observed in the 2M data. Interestingly, results from the 2M1C data (
                    <xref ref-type="fig" rid="f9">Figure 9D</xref>) showed much lower recall rates (&lt;60%) by all assemblers on all HIV-1 group M subtypes due to either a reduction in coverage or an error correction process. For the CRFs, the recall rates were similar to those shown in the 2C and 1M1C datasets.</p>
                <fig fig-type="figure" id="f9" orientation="portrait" position="float">
                    <label>Figure 9. </label>
                    <caption>
                        <title>HIV-1 subtypes and CRFs recall rate by each assembler.</title>
                        <p>Subtype recall rate of each subtype or CRF in (A) 2M HIV-1 mixture, (B) 2C HIV-1 mixture, (C) 1M1C HIV-1 mixture, and (D) 2M1C HIV-1 mixture. Each assembler's recall rate was calculated as the number of correctly identified instances of an HIV-1 subtype (or CRF) divided by the total number of occurrences of the corresponding subtype (or CRF) in each HIV-1 mixture. Values over 100% arose from extra contigs of that subtype produced when closely related subtypes were mixed together. See Extended data, Table S11
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup> for more details.</p>
                    </caption>
                    <graphic id="gr9" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure9.gif"/>
                </fig>
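                <p>The recall-rate calculation described in the caption of Figure 9 can be sketched as follows. This is a minimal sketch under one reading of that definition, in which each assembled contig receives a single subtype label and recall is the number of contigs assigned to a subtype divided by the number of times that subtype occurs in the mixture. The function name and the toy labels are hypothetical and are not taken from the study data.</p>
                <p>
                    <code language="python" xml:space="preserve">
```python
from collections import Counter


def recall_rates(expected_subtypes, assigned_subtypes):
    """Percent recall per subtype: assigned contig count / expected count * 100."""
    expected = Counter(expected_subtypes)
    assigned = Counter(assigned_subtypes)  # missing subtypes count as zero
    return {
        subtype: round(100.0 * assigned[subtype] / count, 2)
        for subtype, count in expected.items()
    }


# Hypothetical mixture: subtype B is expected twice, subtype C once.
expected = ["B", "B", "C"]
# Hypothetical subtyping of assembled contigs: one extra B contig, no C contig.
assigned = ["B", "B", "B"]
print(recall_rates(expected, assigned))
```
                    </code>
                </p>
                <p>In this toy case the extra subtype B contig pushes the recall for B above 100%, which mirrors how an overestimate can arise when a contig from a closely related subtype is assigned to the wrong label.</p>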
                <p>Based on the results from the previous assessments, Strainline, MetaFlye, and HaploDMF were selected for further investigation using publicly available experimental sequencing data of HIV-1 and other viruses. Strainline, despite its high memory usage, offered rapid 
                    <italic toggle="yes">de novo</italic> viral haplotype reconstruction. MetaFlye, chosen for its compatibility with lower-end devices and acceptable memory usage and runtime, served as a 
                    <italic toggle="yes">de novo</italic> assembler. Despite its long runtime, HaploDMF was included as a representative reference-based assembler with memory efficiency comparable to MetaFlye.</p>
            </sec>
            <sec id="sec17">
                <title>Assessing the performance of three selected long-read assembler pipelines using experimental data from long-read sequencing of HIV-1 and other viruses</title>
                <p>This experiment assessed the performance of selected assemblers, including Strainline, MetaFlye, and HaploDMF, using experimental data of HIV-1 and other pathogenic viruses (Extended data, Table S1).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> The quality of assembled contigs was evaluated through QUAST, which provided metrics such as the number of contigs, sizes, N50, % genome fraction, and total aligned bases. The experiment was conducted using a workstation server.</p>
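                <p>Two of the metrics named above, N50 and genome fraction, can be sketched as follows. This is a minimal sketch, not QUAST's implementation. It assumes the usual definition of N50 as the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length, and it treats genome fraction as aligned bases divided by reference length. The function names and the toy values are hypothetical.</p>
                <p>
                    <code language="python" xml:space="preserve">
```python
def n50(contig_lengths):
    """Contig length at which the running total of sorted lengths reaches half the assembly."""
    ordered = sorted(contig_lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def genome_fraction(aligned_bases, reference_length):
    """Percentage of the reference genome covered by aligned bases."""
    return 100.0 * aligned_bases / reference_length


# Hypothetical contig lengths (nt) and alignment totals.
print(n50([2, 3, 4, 5, 6]))
print(genome_fraction(4500, 9000))
```
                    </code>
                </p>
                <p>A high N50 alone does not imply a complete assembly, which is why it is reported together with the genome fraction and the total aligned bases.</p>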
                <p>
                    <italic toggle="yes">HIV-1 recombinant forms and NL4-3</italic>
                </p>
                <p>The dataset comprised plasma HIV-1 genomes (TRN01, TRN08, TRN09) and the NL4-3 laboratory isolate, sequenced on the MinION sequencer (BioProject: PRJDB13369).
                    <sup>
                        <xref ref-type="bibr" rid="ref67">67</xref>
                    </sup> All assemblers generated NL4-3 contigs &gt;9,000 nt (
                    <xref ref-type="fig" rid="f10">Figure 10A</xref>). MetaFlye and HaploDMF gave contigs with 95.27% and 91.22% sequence similarity, respectively. Strainline produced nine contigs, the best at 94.34% similarity (Extended data, Table S4).
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> For TRN01, all assemblers produced contigs identified by REGA (version 3.46)
                    <sup>
                        <xref ref-type="bibr" rid="ref68">68</xref>
                    </sup> as subtype B and F recombinants, similar to a previous finding.
                    <sup>
                        <xref ref-type="bibr" rid="ref67">67</xref>
                    </sup> Similarly, TRN08 yielded subtype B contigs. However, no recombinant forms with CRF01_AE were reconstructed. For TRN09, Strainline and HaploDMF replicated the original findings, while HaploDMF also identified an A-C recombinant. In contrast, MetaFlye generated subtype A and CRF07_BC gag-pol contigs, and a C-CRF01_AE recombinant. REGA subtyping results are in Extended data, Table S12.
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> For the ten 5&#x2019; half genomes of NL4-3 (BioProject: PRJNA938445),
                    <sup>
                        <xref ref-type="bibr" rid="ref69">69</xref>
                    </sup> all assemblers produced contigs meeting both length and alignment criteria. MetaFlye achieved the highest performance with an 89.26% average genome fraction, followed by Strainline at 85.93% and HaploDMF at 84.64%. Detailed alignment statistics are provided in Extended data, Table S13.
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>
                </p>
                <fig fig-type="figure" id="f10" orientation="portrait" position="float">
                    <label>Figure 10. </label>
                    <caption>
                        <title>Alignment statistics of assembled contigs generated from HIV-1, SARS-CoV-2, and Mpox sequencing data.</title>
                        <p>The assembled contigs were generated by the selected assemblers from (A) the full-genome laboratory isolate NL4-3, (B) Mpox propagated in a CV-1 cell line, (C) Mpox from a pustular lesion, and (D) SARS-CoV-2 sequencing data. The details of samples and reference genomes are shown in Extended data, Table S1.
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup> Visualization modified from Icarus.
                            <sup>
                                <xref ref-type="bibr" rid="ref63">63</xref>
                            </sup> Individual alignment statistics for each haplotype can be found in Extended data, Table S14.
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup>
                        </p>
                    </caption>
                    <graphic id="gr10" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure10.gif"/>
                </fig>
                <p>
                    <italic toggle="yes">Monkeypox (Mpox)</italic>
                </p>
                <p>The first dataset, derived from MinION sequencing of an Mpox virus propagated in a CV-1 cell line (SRA: ERR10963128),
                    <sup>
                        <xref ref-type="bibr" rid="ref70">70</xref>
                    </sup> contained 2,588 reads post-filtering, with an average length of 4,633 nt and a maximum length of 9,070 nt. HaploDMF generated 3 contigs, the longest being 188,872 nt, with 99.35% genome fraction. MetaFlye produced 7 contigs, the longest at 129,209 nt, with 95.98% genome fraction. Strainline yielded 12 contigs, the longest being 33,081 nt, with a genome fraction of 78.07% (
                    <xref ref-type="fig" rid="f10">Figure 10B</xref> and Extended data, Table S4
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>). The second dataset, derived from MinION sequencing of an Mpox virus from a pustular lesion (SRA: SRR15830920),
                    <sup>
                        <xref ref-type="bibr" rid="ref71">71</xref>
                    </sup> comprised 30,990 reads after filtering, with an average length of 9,048 nt and a maximum length of 59,093 nt. MetaFlye and HaploDMF exhibited similar performance: MetaFlye assembled 122 contigs with an N50 of 13,335 nt, the longest having a 98.58% genome fraction, while HaploDMF produced 5 contigs with an N50 of 173,134 nt, the longest having a genome fraction of 98.62%. In contrast, Strainline generated 7 contigs with an N50 of 30,255 nt and a longest contig of 101,047 nt. Of those 7 contigs, only the 65,261-nt contig showed a 51.31% genome fraction (
                    <xref ref-type="fig" rid="f10">Figure 10C</xref> and Extended data, Table S4
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>).</p>
                <p>
                    <italic toggle="yes">Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)</italic>
                </p>
                <p>The dataset comprises GridION sequencing data of a SARS-CoV-2 clinical sample retrieved from the SRA database (SRP250446).
                    <sup>
                        <xref ref-type="bibr" rid="ref72">72</xref>
                    </sup> Strainline generated 3 contigs: 1 full-length and 2 duplicated half-genome contigs, with a 99.93% genome fraction. MetaFlye and HaploDMF each produced a single contig, with genome fractions of 96.77% and 93.11%, respectively. However, MetaFlye&#x2019;s contig lacked the first 500 nucleotides of the genome, while HaploDMF&#x2019;s contig had a 2,000-nt gap when aligned to the reference genome (
                    <xref ref-type="fig" rid="f10">Figure 10D</xref> and Extended data, Table S4
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup>).</p>
                <p>
                    <italic toggle="yes">Poliovirus mixture</italic>
                </p>
                <p>This simulated ONT dataset consisted of 6 poliovirus type 2 sequences.
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> In 
                    <xref ref-type="fig" rid="f11">Figure 11A</xref> and Extended data, Table S5,
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Strainline and HaploDMF generated contigs with N50 values of 7,453 nt and 7,436 nt, respectively, closely matching the 7,400-nt poliovirus genome. Strainline produced 6 contigs with an average genome fraction of 99.34%, all sharing &gt;95% sequence similarity with the 6 viral haplotypes. HaploDMF yielded 3 contigs with &gt;95% sequence similarity and an average genome fraction of 99.80%. MetaFlye, however, produced 3 shorter contigs with an N50 of 4,830 nt. These contigs shared &gt;95% sequence similarity with 5 haplotypes but had an average genome fraction of only 70.13%.</p>
                <fig fig-type="figure" id="f11" orientation="portrait" position="float">
                    <label>Figure 11. </label>
                    <caption>
                        <title>Alignment statistics of assembled contigs generated from simulated mixed-strain viral sequencing data.</title>
                        <p>The contigs were assembled by the selected assemblers from simulated mixed-strain viral sequencing data of (A) poliovirus, (B) hepatitis C virus (HCV), and (C) Zika virus (ZIKV). The details of samples and reference genomes are shown in Extended data, Table S1.
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup> Visualization modified from Icarus.
                            <sup>
                                <xref ref-type="bibr" rid="ref63">63</xref>
                            </sup> Individual alignment statistics for each haplotype can be found in Extended data, Table S14.
                            <sup>
                                <xref ref-type="bibr" rid="ref54">54</xref>
                            </sup>
                        </p>
                    </caption>
                    <graphic id="gr11" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure11.gif"/>
                </fig>
                <p>
                    <italic toggle="yes">Hepatitis C virus (HCV) mixture</italic>
                </p>
                <p>This simulated ONT dataset consisted of 10 strains of hepatitis C virus (HCV), subtype 1a.
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> In 
                    <xref ref-type="fig" rid="f11">Figure 11B</xref> and Extended data, Table S5,
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Strainline and HaploDMF yielded contigs with N50 values of 9,286 nt and 9,309 nt, respectively. Strainline produced 12 contigs with an average genome fraction of 99.85%; these shared &gt;95% sequence similarity with all 10 haplotypes and included 2 duplicated contigs. HaploDMF generated 3 contigs with a 99.99% average genome fraction, all exhibiting &gt;95% sequence similarity with HCV haplotypes. MetaFlye produced 5 shorter contigs with an N50 of 5,670 nt, and the longest contig (8,006 nt) shared &gt;95% sequence similarity with 5 HCV haplotypes. However, MetaFlye contigs had an average genome fraction of 77.40%.</p>
                <p>
                    <italic toggle="yes">Zika virus (ZIKV) mixture</italic>
                </p>
                <p>This simulated ONT dataset comprised 15 strains of Zika virus (ZIKV).
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> In 
                    <xref ref-type="fig" rid="f11">Figure 11C</xref> and Extended data, Table S5,
                    <sup>
                        <xref ref-type="bibr" rid="ref54">54</xref>
                    </sup> Strainline and HaploDMF each generated contigs with an N50 of 10,264 nt, closely matching the size of the ZIKV genome (11,000 nt), whereas MetaFlye generated contigs with an N50 of 5,842 nt. Strainline produced 13 contigs that recovered 14 ZIKV haplotypes, with an average genome fraction of 99.84%. MetaFlye yielded 15 contigs with a lower average genome fraction of 78.56%, recovering 10 ZIKV haplotypes. Interestingly, HaploDMF produced 10 contigs with an average genome fraction of 99.90% and recovered 11 ZIKV haplotypes.</p>
                <p>In summary, the selected assemblers showed strong performance across different experimental ONT datasets. MetaFlye exhibited superior performance with experimental ONT data of full-length or half-length HIV-1 genomes. Strainline demonstrated excellent haplotype recovery with ONT sequencing data of various viral mixtures, including SARS-CoV-2, poliovirus, HCV, and ZIKV. Additionally, MetaFlye showed better performance on the metagenomic experimental ONT data of the 200-kb Mpox genome.</p>
            </sec>
            <sec id="sec18">
                <title>A containerized HIV-64148 pipeline for genomic surveillance</title>
                <p>The HIV-64148 benchmarking pipeline is designed for portability, ensuring it functions across different computational environments with minimal setup. To that end, the containerized HIV-64148 pipeline can be executed on either Docker or Singularity platforms. The pipeline offers the flexibility to select from a range of assemblers&#x2014;Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo&#x2014;to best match the computational system or meet specific study requirements. It generates a comprehensive report in HTML format, detailing contig/haplotype identities and abundances. Additionally, the report includes protein sequence alignments of drug-targeted HIV-1 genes, alongside profiles of drug resistance and susceptibility associated with these genes. This is achieved by querying the assembled sequence against the Stanford University HIV Drug Resistance Database
                    <sup>
                        <xref ref-type="bibr" rid="ref73">73</xref>
                    </sup>
                    <italic toggle="yes">
                        <sup>&#x2013;</sup>
                    </italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref78">78</xref>
                    </sup> via the provided Sierra web service 2 API, as illustrated in 
                    <xref ref-type="fig" rid="f12">Figure 12</xref>. The downstream analysis pipeline could also serve as a model for implementation with other viruses. The source code for building the containerized pipeline is available in a public code repository: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/STTLab/HIV-64148">https://github.com/STTLab/HIV-64148</ext-link>.</p>
                <fig fig-type="figure" id="f12" orientation="portrait" position="float">
                    <label>Figure 12. </label>
                    <caption>
                        <title>An implementation of the HIV-64148 pipeline for genomic surveillance.</title>
                        <p>The pipeline comprises three main stages: a read quality control analysis of long-read FASTQ files, followed by assembly using either 
                            <italic toggle="yes">de novo</italic> or reference-based assemblers, and concluding with the identification of HIV-1 subtype and drug resistance analysis.</p>
                    </caption>
                    <graphic id="gr12" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/164057/5f8971bf-6564-4cdf-aa91-e1e23a1e686d_figure12.gif"/>
                </fig>
            </sec>
        </sec>
        <sec id="sec19" sec-type="discussion">
            <title>Discussion</title>
            <p>In this study, we benchmarked seven selected long-read assemblers using both simulated and experimental long-read sequencing data of HIV-1 and other viruses on three computational systems to assess their performance. We demonstrated that only assembler selection exerts a statistically significant impact on assembly time, with neither CPU nor memory affecting the process.
                <sup>
                    <xref ref-type="bibr" rid="ref79">79</xref>
                </sup> Assembler selection also influences the size of assembled contigs. Additionally, the minimum read length for assembly is 2,000 nt, and a read length of 4,000 nt results in higher-quality assembled contigs. Further investigations revealed that Strainline, MetaFlye, and HaploDMF deliver successful genome assemblies of long-read sequencing data exhibiting sequence heterogeneity, from both simulated data (i.e., multiple HIV-1 subtypes or CRFs) and available experimental sequencing data of HIV-1 and other viruses.</p>
            <p>Among the 
                <italic toggle="yes">de novo</italic> assemblers, Strainline provides satisfactory assemblies within a reasonable timeframe but requires a large amount of available memory, typically 64 GB of RAM or more, exceeding the capacity of a standard computer. MetaFlye, though slower, is suitable for metagenomic sequencing data. GoldRush, the fastest assembler, lacks the capability to assemble alternative contigs and is thus unsuitable for metagenomic sequencing data. Canu allocates appropriate CPU and memory to each process, resulting in efficient CPU utilization and low memory consumption. Among the reference-based assemblers, iGDA and RVHaplo exhibit similar computational resource usage, with iGDA&#x2019;s runtime comparable to those of Canu and GoldRush, and RVHaplo&#x2019;s runtime comparable to that of Strainline. HaploDMF demonstrates a considerably longer runtime due to heavy CPU usage during the assembly process, particularly in deep matrix factorization. However, utilizing a compatible GPU could reduce both the runtime and CPU workload of HaploDMF.
                <sup>
                    <xref ref-type="bibr" rid="ref43">43</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref46">46</xref>
                </sup> Overall, the variations in computational performance among the assemblers emphasize the critical importance of careful assembler selection for the success of analyses tailored to specific scientific projects.</p>
            <p>To improve HIV-1 quasispecies profiling and genomic surveillance, it is essential to discuss the advantages and disadvantages of the selected assemblers for handling heterogeneous viral or microbial metagenome sequencing data. Canu and GoldRush are non-strain-aware 
                <italic toggle="yes">de novo</italic> assemblers, lacking the ability to distinguish strains within a sample. Using a reference genome may overcome this limitation, but it may not be suitable for genomic surveillance of emerging infectious diseases.
                <sup>
                    <xref ref-type="bibr" rid="ref80">80</xref>
                </sup> MetaFlye is suitable for metagenomic assemblies, employing local k-mer distributions to establish a threshold for global k-mer counting, thus retaining less abundant sequencing reads.
                <sup>
                    <xref ref-type="bibr" rid="ref42">42</xref>
                </sup> Strainline is designed for haplotype reconstruction of diverse viral genomes, treating clusters of long reads as unique scaffolds of different haplotypes and extending scaffolds through multiple iterations of the Overlap-Layout-Consensus (OLC) algorithm.
                <sup>
                    <xref ref-type="bibr" rid="ref28">28</xref>
                </sup> However, this approach may be ineffective with equally long and highly similar reads, resulting in inaccurate estimates of haplotype numbers. For instance, Strainline was observed to produce numerous duplicate contigs of HIV-1 group M subtype B from the 2M and 1M1C HIV-1 mixtures.</p>
            <p>High-quality reference sequences are crucial for reference-based assembly. The assemblers use a probabilistic approach to align reads to the reference sequence and identify variants like single nucleotide variants (SNVs), insertions, and deletions for haplotype phasing. Both RVHaplo and HaploDMF employ an SNV frequency matrix for clustering reads of different haplotypes. However, RVHaplo relies on overlapping SNV sites, which may be inadequate for assembling closely related genomes. In contrast, HaploDMF uses a deep matrix factorization model with an adapted loss function to learn and correct latent features from each iteration of SNV detection,
                <sup>
                    <xref ref-type="bibr" rid="ref43">43</xref>
                </sup> enhancing robustness against noise during SNV detection and enabling assembly of closely related strains. iGDA utilizes Adaptive-Nearest Neighbor clustering (ANN) to estimate the number of clusters and, like RVHaplo, uses overlapping SNVs and coverage for contig extension.
                <sup>
                    <xref ref-type="bibr" rid="ref44">44</xref>
                </sup> In general, reference-based assemblers seem suitable only for quasispecies profiling.</p>
            <p>Since the choice of assembler significantly impacts genome assemblies, it is crucial to meet specific hardware system requirements tailored to match the quality of input data, genome sizes of the organisms of interest, and the algorithms used for genome assembly. However, the suitable system requirements have not been explicitly provided. Based on this benchmarking study, the following optimal computational requirements are suggested:
                <list list-type="bullet">
                    <list-item>
                        <label>&#x2022;</label>
                        <p>All assemblers should be executed in a Linux-based operating system.</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>A quad-core (4 cores) or higher CPU with x86 architecture is recommended. CPUs with ARM-based architecture (e.g., Apple Silicon) may require specific compiling or hand-tuned code with ARM instructions. Alternatively, properly optimized code can achieve performance comparable to native x86 systems through dynamic binary translation (e.g., Rosetta).
                            <sup>
                                <xref ref-type="bibr" rid="ref81">81</xref>
                            </sup>
                        </p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>8 GB of memory is sufficient for either viral or bacterial genome assembly, while 16 GB of RAM or more is recommended for larger genomes, such as those of humans or other mammals. It is worth noting that this study utilized 96 GB of memory for all assemblers.</p>
                    </list-item>
                </list>
            </p>
            <p>An additional consideration is the balance between CPU and memory, as demonstrated by comparing a workstation with a generic home PC. An interesting finding is that despite both systems having the same number of CPU cores, the home PC took slightly longer, and its CPUs were more overutilized than those on the workstation. This disparity is attributed to the home PC needing to perform memory management more frequently, thereby adding load to the CPU and slowing down the process.
                <sup>
                    <xref ref-type="bibr" rid="ref82">82</xref>
                </sup>
            </p>
            <p>In addition, the impact of virtualization and containerization layering was not accounted for during benchmarking. This may explain why the wall-clock time measured on the server was longer than that measured on the workstation desktop: the software may spend more time traversing layers of virtualization than it would in a bare-metal environment (i.e., with the software installed directly on the machine without containerization).
                <sup>
                    <xref ref-type="bibr" rid="ref83">83</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref85">85</xref>
                </sup> From a computer science perspective, an evaluation of the efficiency of the algorithms used by each assembler would ideally apply the concept of computational complexity, which provides a better estimate of computational resource requirements and runtime on more diverse data (e.g., predicting the computing time for datasets with different depths of coverage, read lengths, and assembly complexity).
                <sup>
                    <xref ref-type="bibr" rid="ref79">79</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref86">86</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref87">87</xref>
                </sup> While acknowledging this consideration, it is important to note that conducting a benchmark on real machines provides practical insights and actionable data on a specific use case, despite potential confounders introduced by the machine itself or from software virtualization.</p>
            <p>Insights into the computational requirements of current long-read assemblers offer guidance for implementing edge computing in molecular surveillance and epidemiology in bioinformatics and virology. Benchmarking with long-read sequencing data of HIV-1 indicates that the containerized HIV-64148 pipeline can function on sub-optimal computational infrastructure such as workstation PCs or generic home PCs. This pipeline has the potential to support and expand HIV-1 surveillance networks, benefiting scientists, epidemiologists, healthcare providers, and people living with HIV.</p>
            <sec id="sec20">
                <title>Ethical considerations</title>
                <p>Not applicable.</p>
            </sec>
        </sec>
    </body>
    <back>
        <sec id="sec25" sec-type="data-availability">
            <title>Data availability</title>
            <sec id="sec26">
                <title>Underlying data</title>
                <p>
Figshare: HIV64148 TABLE_S6_HIV_Combinations_DataSimulation 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25436023">https://doi.org/10.6084/m9.figshare.25436023</ext-link>

                    <sup>

                        <xref ref-type="bibr" rid="ref54">54</xref>
</sup>. All accession numbers are provided in Table S6.</p>
                <p>Figshare: HIV64148 Template FASTA. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25435774">https://doi.org/10.6084/m9.figshare.25435774</ext-link>

                    <sup>

                        <xref ref-type="bibr" rid="ref59">59</xref>
</sup>
                </p>
                <p>This archive contains reference sequences in FASTA format used as templates to simulate Oxford Nanopore reads. Each template is located within a subfolder corresponding to its simulation number.</p>
                <p>Figshare: HIV64148 Simulated FASTQ. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25435822">https://doi.org/10.6084/m9.figshare.25435822</ext-link>

                    <sup>

                        <xref ref-type="bibr" rid="ref60">60</xref>
</sup>
                </p>
                <p>This archive contains reads simulated with NanoSim from the template FASTA files. Each FASTQ file is located within a subfolder corresponding to its simulation number.</p>
                <p>Figshare: HIV64148 QC Simulated FASTQ. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25435912">https://doi.org/10.6084/m9.figshare.25435912</ext-link>

                    <sup>

                        <xref ref-type="bibr" rid="ref61">61</xref>
</sup>
                </p>
                <p>This archive contains output from NanoPlot describing the characteristics of the simulated FASTQ files. Each HTML file is located within a subfolder corresponding to its simulation number.</p>
                <p>Figshare: HIV64148 Results. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25435948">https://doi.org/10.6084/m9.figshare.25435948</ext-link>

                    <sup>

                        <xref ref-type="bibr" rid="ref64">64</xref>
</sup>
                </p>
                <p>This archive contains genome assemblies in FASTA format, BLAST results in CSV format, and HIV64148 reports in HTML format from the 2M, 2C, 1M1C, and 2M1C HIV-1 mixtures, as well as resource usage over time (CPU, memory) measured on 3 computational platforms for 7 tools, in CSV format.</p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
            <sec id="sec27">
                <title>Extended data</title>
                <p>Figshare: HIV64148 Supplementary data. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25436023">https://doi.org/10.6084/m9.figshare.25436023</ext-link>

                    <sup>

                        <xref ref-type="bibr" rid="ref54">54</xref>
</sup>
                </p>
                <p>This project contains the following extended data:</p>
                <p>

                    <bold>
Figure S1.</bold> The MetaQUAST assessment evaluated contigs from the assemblers processing four median read-length FASTQ inputs: (
                    <bold>A</bold>) 1,000-nt, (
                    <bold>B</bold>) 2,000-nt, (
                    <bold>C</bold>) 3,000-nt, and (
                    <bold>D</bold>) 4,000-nt (
                    <xref ref-type="table" rid="T3">
Table 3</xref>). The x-axis displays the percentage sequence similarity obtained from a BLAST alignment with the corresponding reference genomes, while the y-axis represents the aligned length, indicating the longest continuous alignment between each contig and its reference genome. Dot sizes indicate the genome fraction (%Ref Aligned), calculated as the ratio of continuous aligned bases to the total reference genome length. Subplots above and beside each figure display histograms of the contig counts. Additionally, (
                    <bold>E</bold>) shows the averaged genome fraction (%) of contigs from different long-read assembler pipelines. The complete data are available in Extended data, Table S9.
                    <sup>

                        <xref ref-type="bibr" rid="ref54">54</xref>
</sup>
                </p>
                <p>

                    <bold>
Figure S2.</bold> The average completeness of major HIV-1 open reading frames (ORFs) of the contigs generated from the assemblers analyzing FASTQ inputs of 4 median read lengths: (
                    <bold>A</bold>) 1,000-nt, (
                    <bold>B</bold>) 2,000-nt, (
                    <bold>C</bold>) 3,000-nt, or (
                    <bold>D</bold>) 4,000-nt.</p>
                <p>

                    <bold>
Figure S3.</bold> Contig size distribution of all assemblers processing the simulated FASTQ inputs of the 4 HIV-1 mixtures.</p>
                <p>

                    <bold>
Table S1.</bold> A list of experimental data and additional information. N/A: not applicable.</p>
                <p>

                    <bold>
Table S2</bold>. A statistical summary of runtime measurements taken from an assembly phase of the pipelines.</p>
                <p>

                    <bold>
Table S3</bold>. The memory usage and maximum CPU utilization of each assembler on three computational systems. The benchmark utilized 100 samples of the two-HIV-1 group M (2M) mixture with 2,000x coverage and an 8,000-nt median read length.</p>
                <p>

                    <bold>
Table S4</bold>. The genome statistics of contigs generated from (
                    <bold>A</bold>) Full genome laboratory isolate NL4-3, (
                    <bold>B</bold>) Mpox propagated in a CV-1 cell line, (
                    <bold>C</bold>) Mpox from a pustular lesion, and (
                    <bold>D</bold>) SARS-CoV-2 data by the selected assemblers.</p>
                <p>

                    <bold>
Table S5</bold>. The genome statistics of contigs generated from (
                    <bold>A</bold>) poliovirus, (
                    <bold>B</bold>) hepatitis C virus (HCV), and (
                    <bold>C</bold>) Zika virus (ZIKV) data by the selected assemblers.</p>
                <p>

                    <bold>
Table S6.</bold> HIV Combinations for Data Simulation (Excel)</p>
                <p>

                    <bold>
Table S7.</bold> Characteristics of Simulated Samples Readlength Experiment (Excel)</p>
                <p>

                    <bold>
Table S8.</bold> Characteristics of Simulated Samples HIV mixture Experiment (Excel)</p>
                <p>

                    <bold>
Table S9.</bold> An Assembly Evaluation of HIV-1 Read Length Experiment (Excel)</p>
                <p>

                    <bold>
Table S10.</bold> An Assembly Evaluation of HIV-1 Subtype Mixtures Experiment (Excel)</p>
                <p>

                    <bold>
Table S11.</bold> Subtypes Recall Rates (Excel)</p>
                <p>

                    <bold>
Table S12.</bold> REGA Subtyping (Excel)</p>
                <p>

                    <bold>
Table S13.</bold> 5&#x2019; Half Genomes of NL4-3 Alignment Statistics (Excel)</p>
                <p>

                    <bold>
Table S14.</bold> Other Viruses Alignment Statistics (Excel)</p>
                <p>All high-resolution figures are available in 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25436125.v1">https://doi.org/10.6084/m9.figshare.25436125.v1</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref88">88</xref>
</sup>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
        </sec>
        <sec id="sec21">
            <title>Software and code</title>
            <p>Software and code for this workflow can be found on GitHub: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/STTLab/HIV-64148">https://github.com/STTLab/HIV-64148</ext-link> (License: GPL-3.0, an OSI-approved open-source license).</p>
            <p>Archived code as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.25436062">https://doi.org/10.6084/m9.figshare.25436062</ext-link>

                <sup>

                    <xref ref-type="bibr" rid="ref89">89</xref>
</sup>
            </p>
            <p>Code is available under the terms of the 
                <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
        </sec>
        <sec id="sec22">
            <title>Reporting guidelines</title>
            <p>Not applicable.</p>
        </sec>
        <ack>
            <title>Acknowledgements</title>
            <p>The authors would like to thank the ERAWAN HPC Service, Chiang Mai University, Thailand, and the Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Thailand, for their support with computational resources and servers. We would like to thank the Support the Children Foundation, Chiang Mai, Thailand, for financial support. We would also like to thank Yuphin Chromwinya and Somporn Sankonkit of the Immunology Lab, Department of Microbiology, Faculty of Medicine, Chiang Mai University, Thailand, for their administrative support.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Frescura</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Achieving the 95 95 95 targets for all: A pathway to ending AIDS.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2022</year>;<volume>17</volume>:<fpage>e0272405</fpage>.
                    <pub-id pub-id-type="pmid">35925943</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0272405</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9352102</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="book">
                    <collab>UNAIDS</collab>:
                    <source>

                        <italic toggle="yes">Understanding Fast-Track: accelerating action to end the AIDS epidemic by 2030.</italic>
</source>
                    <publisher-loc>Geneva, Switzerland</publisher-loc>:
                    <publisher-name>Joint United Nations Programme on HIV/AIDS (UNAIDS)</publisher-name>;<year>2015</year>; vol.<volume>10</volume>.</mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="book">
                    <collab>UNAIDS</collab>:
                    <source>

                        <italic toggle="yes">World AIDS Day Report: Prevailing against pandemics by putting people at the centre.</italic>
</source>
                    <publisher-loc>Geneva, Switzerland</publisher-loc>:
                    <publisher-name>Joint United Nations Programme on HIV/AIDS (UNAIDS)</publisher-name>;<year>2020</year>; vol.<volume>87</volume>.</mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hill</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Toward a global virus genomic surveillance network.</article-title>
                    <source>

                        <italic toggle="yes">Cell Host Microbe.</italic>
</source>
                    <year>2023</year>;<volume>31</volume>:<fpage>861</fpage>&#x2013;<lpage>873</lpage>.
                    <pub-id pub-id-type="pmid">36921604</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.chom.2023.03.003</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9986120</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="other">
                    <collab>Epidemic and Pandemic Preparedness and Prevention (EPP), Global Genomic Surveillance, Research for Health (RFH)</collab>:
                    <article-title>Global genomic surveillance strategy for pathogens with pandemic and epidemic potential 2022&#x2013;2032: progress report on the first year of implementation.</article-title>
                    <year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Metzner</surname>
                            <given-names>KJ</given-names>
                        </name>
</person-group>:
                    <article-title>HIV Whole-Genome Sequencing Now: Answering Still-Open Questions.</article-title>
                    <source>

                        <italic toggle="yes">J. Clin. Microbiol.</italic>
</source>
                    <year>2016</year>;<volume>54</volume>:<fpage>834</fpage>&#x2013;<lpage>835</lpage>.
                    <pub-id pub-id-type="pmid">26791367</pub-id>
                    <pub-id pub-id-type="doi">10.1128/JCM.03265-15</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4809914</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wittwer</surname>
                            <given-names>CT</given-names>
                        </name>
</person-group>:
                    <article-title>Portable Nanopore Sequencing for Viral Surveillance.</article-title>
                    <source>

                        <italic toggle="yes">Clin. Chem.</italic>
</source>
                    <year>2016</year>;<volume>62</volume>:<fpage>1427</fpage>&#x2013;<lpage>1429</lpage>.
                    <pub-id pub-id-type="doi">10.1373/clinchem.2016.256693</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Foster-Nyarko</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Nanopore-only assemblies for genomic surveillance of the global priority drug-resistant pathogen, Klebsiella pneumoniae.</article-title>
                    <source>

                        <italic toggle="yes">Microb Genom.</italic>
</source>
                    <year>2023</year>;<volume>9</volume>.
                    <pub-id pub-id-type="pmid">36752781</pub-id>
                    <pub-id pub-id-type="doi">10.1099/mgen.0.000936</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9997738</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wheeler</surname>
                            <given-names>NE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Innovations in genomic antimicrobial resistance surveillance.</article-title>
                    <source>

                        <italic toggle="yes">Lancet Microbe.</italic>
</source>
                    <year>2023</year>;<volume>4</volume>:<fpage>e1063</fpage>&#x2013;<lpage>e1070</lpage>.
                    <pub-id pub-id-type="doi">10.1016/S2666-5247(23)00285-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Holzschuh</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Using a mobile nanopore sequencing lab for end-to-end genomic surveillance of Plasmodium falciparum: A feasibility study.</article-title>
                    <source>

                        <italic toggle="yes">PLOS Glob. Public Health.</italic>
</source>
                    <year>2024</year>;<volume>4</volume>:<fpage>e0002743</fpage>.
                    <pub-id pub-id-type="pmid">38300956</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pgph.0002743</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10833559</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Baltimore</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Viral RNA-dependent DNA Polymerase: RNA-dependent DNA Polymerase in Virions of RNA Tumour Viruses.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>1970</year>;<volume>226</volume>:<fpage>1209</fpage>&#x2013;<lpage>1211</lpage>.
                    <pub-id pub-id-type="doi">10.1038/2261209a0</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Roberts</surname>
                            <given-names>JD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bebenek</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kunkel</surname>
                            <given-names>TA</given-names>
                        </name>
</person-group>:
                    <article-title>The Accuracy of Reverse Transcriptase from HIV-1.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>1988</year>;<volume>242</volume>:<fpage>1171</fpage>&#x2013;<lpage>1173</lpage>.
                    <pub-id pub-id-type="doi">10.1126/science.2460925</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Preston</surname>
                            <given-names>BD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Poiesz</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Loeb</surname>
                            <given-names>LA</given-names>
                        </name>
</person-group>:
                    <article-title>Fidelity of HIV-1 Reverse Transcriptase.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>1988</year>;<volume>242</volume>:<fpage>1168</fpage>&#x2013;<lpage>1171</lpage>.
                    <pub-id pub-id-type="doi">10.1126/science.2460924</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Das</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Arnold</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>HIV-1 reverse transcriptase and antiviral drug resistance. Part 1.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Virol.</italic>
</source>
                    <year>2013</year>;<volume>3</volume>:<fpage>111</fpage>&#x2013;<lpage>118</lpage>.
                    <pub-id pub-id-type="pmid">23602471</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.coviro.2013.03.012</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4097814</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Johnson</surname>
                            <given-names>WE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Desrosiers</surname>
                            <given-names>RC</given-names>
                        </name>
</person-group>:
                    <article-title>Viral Persistence: HIV&#x2019;s Strategies of Immune System Evasion.</article-title>
                    <source>

                        <italic toggle="yes">Annu. Rev. Med.</italic>
</source>
                    <year>2002</year>;<volume>53</volume>:<fpage>499</fpage>&#x2013;<lpage>518</lpage>.
                    <pub-id pub-id-type="pmid">11818487</pub-id>
                    <pub-id pub-id-type="doi">10.1146/annurev.med.53.082901.104053</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kirchhoff</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Immune Evasion and Counteraction of Restriction Factors by HIV-1 and Other Primate Lentiviruses.</article-title>
                    <source>

                        <italic toggle="yes">Cell Host Microbe.</italic>
</source>
                    <year>2010</year>;<volume>8</volume>:<fpage>55</fpage>&#x2013;<lpage>67</lpage>.
                    <pub-id pub-id-type="pmid">20638642</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.chom.2010.06.004</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Greninger</surname>
                            <given-names>AL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis.</article-title>
                    <source>

                        <italic toggle="yes">Genome Med.</italic>
</source>
                    <year>2015</year>;<volume>7</volume>:<fpage>99</fpage>.
                    <pub-id pub-id-type="pmid">26416663</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13073-015-0220-9</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4587849</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rambaut</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Posada</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Crandall</surname>
                            <given-names>KA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The causes and consequences of HIV evolution.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Rev. Genet.</italic>
</source>
                    <year>2004</year>;<volume>5</volume>:<fpage>52</fpage>&#x2013;<lpage>61</lpage>.
                    <pub-id pub-id-type="doi">10.1038/nrg1246</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Blackard</surname>
                            <given-names>JT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cohen</surname>
                            <given-names>DE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mayer</surname>
                            <given-names>KH</given-names>
                        </name>
</person-group>:
                    <article-title>Human Immunodeficiency Virus Superinfection and Recombination: Current State of Knowledge and Potential Clinical Consequences.</article-title>
                    <source>

                        <italic toggle="yes">Clin. Infect. Dis.</italic>
</source>
                    <year>2002</year>;<volume>34</volume>:<fpage>1108</fpage>&#x2013;<lpage>1114</lpage>.
                    <pub-id pub-id-type="pmid">11915000</pub-id>
                    <pub-id pub-id-type="doi">10.1086/339547</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pandit</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Boer</surname>
                            <given-names>RJ</given-names>
                        </name>
</person-group>:
                    <article-title>Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants.</article-title>
                    <source>

                        <italic toggle="yes">Retrovirology.</italic>
</source>
                    <year>2014</year>;<volume>11</volume>:<fpage>56</fpage>.
                    <pub-id pub-id-type="pmid">24996694</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1742-4690-11-56</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4227095</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Deeks</surname>
                            <given-names>SG</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Research priorities for an HIV cure: International AIDS Society Global Scientific Strategy 2021.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Med.</italic>
</source>
                    <year>2021</year>;<volume>27</volume>:<fpage>2085</fpage>&#x2013;<lpage>2098</lpage>.
                    <pub-id pub-id-type="pmid">34848888</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41591-021-01590-5</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Monaco</surname>
                            <given-names>DC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zapata</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hunter</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Resistance profile of HIV-1 quasispecies in patients under treatment failure using single molecule, real-time sequencing.</article-title>
                    <source>

                        <italic toggle="yes">AIDS.</italic>
</source>
                    <year>2020</year>;<volume>34</volume>:<fpage>2201</fpage>&#x2013;<lpage>2210</lpage>.
                    <pub-id pub-id-type="pmid">33196493</pub-id>
                    <pub-id pub-id-type="doi">10.1097/QAD.0000000000002697</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gaudin</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Desnues</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Hybrid Capture-Based Next Generation Sequencing and Its Application to Human Infectious Diseases.</article-title>
                    <source>

                        <italic toggle="yes">Front. Microbiol.</italic>
</source>
                    <year>2018</year>;<volume>9</volume>:<fpage>2924</fpage>.
                    <pub-id pub-id-type="pmid">30542340</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fmicb.2018.02924</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6277869</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lauring</surname>
                            <given-names>AS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andino</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>Quasispecies Theory and the Behavior of RNA Viruses.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Pathog.</italic>
</source>
                    <year>2010</year>;<volume>6</volume>:<fpage>e1001005</fpage>.
                    <pub-id pub-id-type="pmid">20661479</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.ppat.1001005</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2908548</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bonsall</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A comprehensive genomics solution for HIV surveillance and clinical monitoring in a global health setting.</article-title>
                    <source>

                        <italic toggle="yes">Genomics.</italic>
</source>
                    <year>2018</year>.</mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Frishman</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marz</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>, editors:
                    <source>

                        <italic toggle="yes">Virus Bioinformatics.</italic>
</source>
                    <edition>1st ed.</edition>
                    <publisher-name>Chapman and Hall/CRC</publisher-name>;<year>2021</year>.
                    <pub-id pub-id-type="doi">10.1201/9781003097679</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Udaondo</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Comparative Analysis of PacBio and Oxford Nanopore Sequencing Technologies for Transcriptomic Landscape Identification of Penaeus monodon.</article-title>
                    <source>

                        <italic toggle="yes">Life.</italic>
</source>
                    <year>2021</year>;<volume>11</volume>:<fpage>862</fpage>.
                    <pub-id pub-id-type="pmid">34440606</pub-id>
                    <pub-id pub-id-type="doi">10.3390/life11080862</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8399832</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luo</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sch&#x00f6;nhuth</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Strainline: full-length de novo viral haplotype reconstruction from noisy long reads.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2022</year>;<volume>23</volume>:<fpage>29</fpage>.
                    <pub-id pub-id-type="pmid">35057847</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-021-02587-6</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8771625</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yamashita</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Single-molecular real-time deep sequencing reveals the dynamics of multi-drug resistant haplotypes and structural variations in the hepatitis C virus genome.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Rep.</italic>
</source>
                    <year>2020</year>;<volume>10</volume>:<fpage>2651</fpage>.
                    <pub-id pub-id-type="pmid">32060395</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41598-020-59397-2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7021670</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sahlin</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Medvedev</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2021</year>;<volume>12</volume>:<fpage>2</fpage>.
                    <pub-id pub-id-type="pmid">33397972</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-020-20340-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7782715</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jain</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aluru</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>A comprehensive evaluation of long read error correction methods.</article-title>
                    <source>

                        <italic toggle="yes">BMC Genomics.</italic>
</source>
                    <year>2020</year>;<volume>21</volume>:<fpage>889</fpage>.
                    <pub-id pub-id-type="pmid">33349243</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12864-020-07227-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7751105</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Link</surname>
                            <given-names>RW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>HIV-Quasipore: A Suite of HIV-1-Specific Nanopore Basecallers Designed to Enhance Viral Quasispecies Detection.</article-title>
                    <source>

                        <italic toggle="yes">Front. Virol.</italic>
</source>
                    <year>2022</year>;<volume>2</volume>:<fpage>858375</fpage>.
                    <pub-id pub-id-type="doi">10.3389/fviro.2022.858375</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nguyen Quang</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Dynamic nanopore long-read sequencing analysis of HIV-1 splicing events during the early steps of infection.</article-title>
                    <source>

                        <italic toggle="yes">Retrovirology.</italic>
</source>
                    <year>2020</year>;<volume>17</volume>:<fpage>25</fpage>.
                    <pub-id pub-id-type="pmid">32807178</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12977-020-00533-1</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7433067</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Efficient assembly of nanopore reads via highly accurate and intact error correction.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2021</year>;<volume>12</volume>:<fpage>60</fpage>.
                    <pub-id pub-id-type="pmid">33397900</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-020-20236-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7782737</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Karst</surname>
                            <given-names>SM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Methods.</italic>
</source>
                    <year>2021</year>;<volume>18</volume>:<fpage>165</fpage>&#x2013;<lpage>169</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41592-020-01041-y</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ni</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simeneh</surname>
                            <given-names>ZM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Benchmarking of Nanopore R10.4 and R9.4.1 flow cells in single-cell whole-genome amplification and whole-genome shotgun sequencing.</article-title>
                    <source>

                        <italic toggle="yes">Comput. Struct. Biotechnol. J.</italic>
</source>
                    <year>2023</year>;<volume>21</volume>:<fpage>2352</fpage>&#x2013;<lpage>2364</lpage>.
                    <pub-id pub-id-type="pmid">37025654</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.csbj.2023.03.038</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10070092</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luo</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sch&#x00f6;nhuth</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>VeChat: correcting errors in long reads using variation graphs.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2022</year>;<volume>13</volume>:<fpage>6657</fpage>.
                    <pub-id pub-id-type="pmid">36333324</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-022-34381-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9636371</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>Q</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Recent advances in sequence assembly: principles and applications.</article-title>
                    <source>

                        <italic toggle="yes">Brief. Funct. Genomics.</italic>
</source>
                    <year>2017</year>;<volume>16</volume>:<fpage>361</fpage>&#x2013;<lpage>378</lpage>.
                    <pub-id pub-id-type="pmid">28453648</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bfgp/elx006</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xiao</surname>
                            <given-names>CL</given-names>
                        </name>
</person-group>:
                    <article-title>A survey on 
                        <italic toggle="yes">de novo</italic> assembly methods for single-molecular sequencing.</article-title>
                    <source>

                        <italic toggle="yes">Quant. Biol.</italic>
</source>
                    <year>2020</year>;<volume>8</volume>:<fpage>203</fpage>&#x2013;<lpage>215</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s40484-020-0214-5</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Eliseev</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluation of haplotype callers for next-generation sequencing of viruses.</article-title>
                    <source>

                        <italic toggle="yes">Infect. Genet. Evol.</italic>
</source>
                    <year>2020</year>;<volume>82</volume>:<fpage>104277</fpage>.
                    <pub-id pub-id-type="pmid">32151775</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.meegid.2020.104277</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7293574</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luo</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sch&#x00f6;nhuth</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Enhancing Long-Read-Based Strain-Aware Metagenome Assembly.</article-title>
                    <source>

                        <italic toggle="yes">Front. Genet.</italic>
</source>
                    <year>2022</year>;<volume>13</volume>:<fpage>868280</fpage>.
                    <pub-id pub-id-type="pmid">35646097</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fgene.2022.868280</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9136235</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kolmogorov</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>metaFlye: scalable long-read metagenome assembly using repeat graphs.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Methods.</italic>
</source>
                    <year>2020</year>;<volume>17</volume>:<fpage>1103</fpage>&#x2013;<lpage>1110</lpage>.
                    <pub-id pub-id-type="pmid">33020656</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-020-00971-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10699202</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cai</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>Y</given-names>
                        </name>
</person-group>:
                    <article-title>HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2022</year>;<volume>38</volume>:<fpage>5360</fpage>&#x2013;<lpage>5367</lpage>.
                    <pub-id pub-id-type="pmid">36308467</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btac708</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9750122</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Feng</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Clemente</surname>
                            <given-names>JC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Detecting and phasing minor single-nucleotide variants from long-read sequencing data.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2021</year>;<volume>12</volume>:<fpage>3032</fpage>.
                    <pub-id pub-id-type="pmid">34031367</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-021-23289-4</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8144375</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <label>45</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cai</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>Y</given-names>
                        </name>
</person-group>:
                    <article-title>Reconstructing viral haplotypes using long reads.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2022</year>;<volume>38</volume>:<fpage>2127</fpage>&#x2013;<lpage>2134</lpage>.
                    <pub-id pub-id-type="pmid">35157018</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btac089</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chan</surname>
                            <given-names>DM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rao</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>GPU accelerated t-distributed stochastic neighbor embedding.</article-title>
                    <source>

                        <italic toggle="yes">J. Parallel Distrib. Comput.</italic>
</source>
                    <year>2019</year>;<volume>131</volume>:<fpage>1</fpage>&#x2013;<lpage>13</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.jpdc.2019.04.008</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref47">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Koren</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Canu: scalable and accurate long-read assembly via adaptive 
                        <italic toggle="yes">k</italic>-mer weighting and repeat separation.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2017</year>;<volume>27</volume>:<fpage>722</fpage>&#x2013;<lpage>736</lpage>.
                    <pub-id pub-id-type="pmid">28298431</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.215087.116</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5411767</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref48">
                <label>48</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Linear time complexity <italic toggle="yes">de novo</italic> long read genome assembly with GoldRush.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2023</year>;<volume>14</volume>:<fpage>2906</fpage>.
                    <pub-id pub-id-type="pmid">37217507</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-023-38716-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10202940</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref49">
                <label>49</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Van Der Walt</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Assembling metagenomes, one community at a time.</article-title>
                    <source>

                        <italic toggle="yes">BMC Genomics.</italic>
</source>
                    <year>2017</year>;<volume>18</volume>:<fpage>521</fpage>.
                    <pub-id pub-id-type="pmid">28693474</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12864-017-3918-9</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5502489</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref50">
                <label>50</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Challenges in Bioinformatics Workflows for Processing Microbiome Omics Data at Scale.</article-title>
                    <source>

                        <italic toggle="yes">Front. Bioinform.</italic>
</source>
                    <year>2022</year>;<volume>1</volume>:<fpage>826370</fpage>.
                    <pub-id pub-id-type="pmid">36303775</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fbinf.2021.826370</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9580927</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref51">
                <label>51</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>De Coster</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>D&#x2019;Hert</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schultz</surname>
                            <given-names>DT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>NanoPack: visualizing and processing long-read sequencing data.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2018</year>;<volume>34</volume>:<fpage>2666</fpage>&#x2013;<lpage>2669</lpage>.
                    <pub-id pub-id-type="pmid">29547981</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bty149</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6061794</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref52">
                <label>52</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Verma</surname>
                            <given-names>SB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pandey</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kumar Gupta</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Containerization and its Architectures: A Study.</article-title>
                    <source>

                        <italic toggle="yes">ADCAIJ.</italic>
</source>
                    <year>2023</year>;<volume>11</volume>:<fpage>395</fpage>&#x2013;<lpage>409</lpage>.
                    <pub-id pub-id-type="doi">10.14201/adcaij.28351</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref53">
                <label>53</label>
                <mixed-citation publication-type="book">
                    <source>

                        <italic toggle="yes">HIV Sequence Compendium 2021.</italic>
</source>
                    <publisher-name>Los Alamos National Laboratory, Theoretical Biology and Biophysics</publisher-name>;<year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref54">
                <label>54</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tongjai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wattanasombat</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>figshare.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref55">
                <label>55</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim.</article-title>
                    <source>

                        <italic toggle="yes">GigaScience.</italic>
</source>
                    <year>2023</year>;<volume>12</volume>:<fpage>giad013</fpage>.
                    <pub-id pub-id-type="pmid">36939007</pub-id>
                    <pub-id pub-id-type="doi">10.1093/gigascience/giad013</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10025935</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref56">
                <label>56</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Warren</surname>
                            <given-names>RL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>NanoSim: nanopore sequence read simulator based on statistical characterization.</article-title>
                    <source>

                        <italic toggle="yes">GigaScience.</italic>
</source>
                    <year>2017</year>;<volume>6</volume>:<fpage>1</fpage>&#x2013;<lpage>6</lpage>.
                    <pub-id pub-id-type="pmid">28327957</pub-id>
                    <pub-id pub-id-type="doi">10.1093/gigascience/gix010</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5530317</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref57">
                <label>57</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wright</surname>
                            <given-names>IA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>NanoHIV: A Bioinformatics Pipeline for Producing Accurate, Near Full-Length HIV Proviral Genomes Sequenced Using the Oxford Nanopore Technology.</article-title>
                    <source>

                        <italic toggle="yes">Cells.</italic>
</source>
                    <year>2021</year>;<volume>10</volume>:<fpage>2577</fpage>.
                    <pub-id pub-id-type="pmid">34685559</pub-id>
                    <pub-id pub-id-type="doi">10.3390/cells10102577</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8534097</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref58">
                <label>58</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ng</surname>
                            <given-names>TT-L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Long-Read Sequencing with Hierarchical Clustering for Antiretroviral Resistance Profiling of Mixed Human Immunodeficiency Virus Quasispecies.</article-title>
                    <source>

                        <italic toggle="yes">Clin. Chem.</italic>
</source>
                    <year>2023</year>;<volume>69</volume>:<fpage>1174</fpage>&#x2013;<lpage>1185</lpage>.
                    <pub-id pub-id-type="pmid">37537871</pub-id>
                    <pub-id pub-id-type="doi">10.1093/clinchem/hvad108</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref59">
                <label>59</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tongjai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wattanasombat</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>figshare.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref60">
                <label>60</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tongjai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wattanasombat</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>figshare.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref61">
                <label>61</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tongjai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wattanasombat</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>figshare.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref62">
                <label>62</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Camacho</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>BLAST+: architecture and applications.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2009</year>;<volume>10</volume>:<fpage>421</fpage>.
                    <pub-id pub-id-type="pmid">20003500</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-10-421</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2803857</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref63">
                <label>63</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mikheenko</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Saveliev</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gurevich</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>MetaQUAST: evaluation of metagenome assemblies.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2016</year>;<volume>32</volume>:<fpage>1088</fpage>&#x2013;<lpage>1090</lpage>.
                    <pub-id pub-id-type="pmid">26614127</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv697</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref64">
                <label>64</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tongjai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wattanasombat</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>figshare.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref65">
                <label>65</label>
                <mixed-citation publication-type="other">
                    <collab>jamovi</collab>:<year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref66">
                <label>66</label>
                <mixed-citation publication-type="book">
                    <source>

                        <italic toggle="yes">R: A language and environment for statistical computing.</italic>
</source>
                    <publisher-name>R Foundation for Statistical Computing</publisher-name>;<year>2022</year>.</mixed-citation>
            </ref>
            <ref id="ref67">
                <label>67</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mori</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Nanopore Sequencing for Characterization of HIV-1 Recombinant Forms.</article-title>
                    <source>

                        <italic toggle="yes">Microbiol Spectr.</italic>
</source>
                    <year>2022</year>;<volume>10</volume>:<fpage>e0150722</fpage>.
                    <pub-id pub-id-type="pmid">35894615</pub-id>
                    <pub-id pub-id-type="doi">10.1128/spectrum.01507-22</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9431566</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref68">
                <label>68</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pineda-Pe&#x00f1;a</surname>
                            <given-names>A-C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: Performance evaluation of the new REGA version 3 and seven other tools.</article-title>
                    <source>

                        <italic toggle="yes">Infect. Genet. Evol.</italic>
</source>
                    <year>2013</year>;<volume>19</volume>:<fpage>337</fpage>&#x2013;<lpage>348</lpage>.
                    <pub-id pub-id-type="pmid">23660484</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.meegid.2013.04.032</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref69">
                <label>69</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bohn</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gribling-Burrer</surname>
                            <given-names>A-S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ambi</surname>
                            <given-names>UB</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Nano-DMS-MaP allows isoform-specific RNA structure determination.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Methods.</italic>
</source>
                    <year>2023</year>;<volume>20</volume>:<fpage>849</fpage>&#x2013;<lpage>859</lpage>.
                    <pub-id pub-id-type="pmid">37106231</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-023-01862-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10250195</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref70">
                <label>70</label>
                <mixed-citation publication-type="other">
                    <collab>University of, S</collab>:
                    <article-title>BioProject, PRJEB56841.</article-title>
                    <year>2020</year>.</mixed-citation>
            </ref>
            <ref id="ref71">
                <label>71</label>
                <mixed-citation publication-type="other">
                    <collab>Institut, P</collab>:
                    <article-title>BioProject, PRJNA762014.</article-title>
                    <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref72">
                <label>72</label>
                <mixed-citation publication-type="other">
                    <collab>University of, M</collab>:
                    <article-title>BioProject, PRJNA608224.</article-title>
                    <year>2020</year>.</mixed-citation>
            </ref>
            <ref id="ref73">
                <label>73</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rhee</surname>
                            <given-names>SY</given-names>
                        </name>
</person-group>:
                    <article-title>Human immunodeficiency virus reverse transcriptase and protease sequence database.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2003</year>;<volume>31</volume>:<fpage>298</fpage>&#x2013;<lpage>303</lpage>.
                    <pub-id pub-id-type="pmid">12520007</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkg100</pub-id>
                    <pub-id pub-id-type="pmcid">PMC165547</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref74">
                <label>74</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>TF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shafer</surname>
                            <given-names>RW</given-names>
                        </name>
</person-group>:
                    <article-title>Web Resources for HIV Type 1 Genotypic-Resistance Test Interpretation.</article-title>
                    <source>

                        <italic toggle="yes">Clin. Infect. Dis.</italic>
</source>
                    <year>2006</year>;<volume>42</volume>:<fpage>1608</fpage>&#x2013;<lpage>1618</lpage>.
                    <pub-id pub-id-type="pmid">16652319</pub-id>
                    <pub-id pub-id-type="doi">10.1086/503914</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2547473</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref75">
                <label>75</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shafer</surname>
                            <given-names>RW</given-names>
                        </name>
</person-group>:
                    <article-title>Rationale and Uses of a Public HIV Drug-Resistance Database.</article-title>
                    <source>

                        <italic toggle="yes">J. Infect. Dis.</italic>
</source>
                    <year>2006</year>;<volume>194</volume>:<fpage>S51</fpage>&#x2013;<lpage>S58</lpage>.
                    <pub-id pub-id-type="pmid">16921473</pub-id>
                    <pub-id pub-id-type="doi">10.1086/505356</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2614864</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref76">
                <label>76</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rhee</surname>
                            <given-names>S-Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>HIV-1 pol mutation frequency by subtype and treatment experience: extension of the HIVseq program to seven non-B subtypes.</article-title>
                    <source>

                        <italic toggle="yes">AIDS.</italic>
</source>
                    <year>2006</year>;<volume>20</volume>:<fpage>643</fpage>&#x2013;<lpage>651</lpage>.
                    <pub-id pub-id-type="pmid">16514293</pub-id>
                    <pub-id pub-id-type="doi">10.1097/01.aids.0000216363.36786.2b</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2551321</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref77">
                <label>77</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shafer</surname>
                            <given-names>RW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jung</surname>
                            <given-names>DR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Betts</surname>
                            <given-names>BJ</given-names>
                        </name>
</person-group>:
                    <article-title>Human immunodeficiency virus type 1 reverse transcriptase and protease mutation search engine for queries.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Med.</italic>
</source>
                    <year>2000</year>;<volume>6</volume>:<fpage>1290</fpage>&#x2013;<lpage>1292</lpage>.
                    <pub-id pub-id-type="pmid">11062545</pub-id>
                    <pub-id pub-id-type="doi">10.1038/81407</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2582445</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref78">
                <label>78</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gifford</surname>
                            <given-names>RJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The calibrated population resistance tool: standardized genotypic estimation of transmitted HIV-1 drug resistance.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2009</year>;<volume>25</volume>:<fpage>1197</fpage>&#x2013;<lpage>1198</lpage>.
                    <pub-id pub-id-type="pmid">19304876</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp134</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2672634</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref79">
                <label>79</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hanussek</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bartusch</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kr&#x00fc;ger</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Comput. Biol.</italic>
</source>
                    <year>2021</year>;<volume>17</volume>:<fpage>e1009244</fpage>.
                    <pub-id pub-id-type="pmid">34283824</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1009244</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8323933</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref80">
                <label>80</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Anyansi</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Straub</surname>
                            <given-names>TJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Manson</surname>
                            <given-names>AL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data.</article-title>
                    <source>

                        <italic toggle="yes">Front. Microbiol.</italic>
</source>
                    <year>2020</year>;<volume>11</volume>:<fpage>1925</fpage>.
                    <pub-id pub-id-type="doi">10.3389/fmicb.2020.01925</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref81">
                <label>81</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Poje</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brcic</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Knezovic</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>First Steps towards Efficient Genome Assembly on ARM-Based HPC.</article-title>
                    <source>

                        <italic toggle="yes">Electronics.</italic>
</source>
                    <year>2023</year>;<volume>13</volume>:<fpage>39</fpage>.
                    <pub-id pub-id-type="doi">10.3390/electronics13010039</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref82">
                <label>82</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dika</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Prevalla</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Electronics Engineers in Israel (IEEEI 2010).</italic>
</source>
                    <publisher-name>IEEE</publisher-name>; pp.<fpage>000015</fpage>&#x2013;<lpage>000018</lpage>.</mixed-citation>
            </ref>
            <ref id="ref83">
                <label>83</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Popiolek</surname>
                            <given-names>PF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mendizabal</surname>
                            <given-names>OM</given-names>
                        </name>
</person-group>:
                    <article-title>Monitoring and analysis of performance impact in virtualized environments.</article-title>
                    <source>

                        <italic toggle="yes">J. Appl. Comput. Res.</italic>
</source>
                    <year>2013</year>;<volume>2</volume>:<fpage>75</fpage>&#x2013;<lpage>82</lpage>.
                    <pub-id pub-id-type="doi">10.4013/jacr.2012.22.03</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref84">
                <label>84</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Al-hamouri</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Al-Jarrah</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Al-Sharif</surname>
                            <given-names>ZA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">2020 Seventh International Conference on Software Defined Systems (SDS).</italic>
</source>
                    <publisher-name>IEEE</publisher-name>;<year>2020</year>; pp.<fpage>131</fpage>&#x2013;<lpage>138</lpage>.</mixed-citation>
            </ref>
            <ref id="ref85">
                <label>85</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chung</surname>
                            <given-names>MT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Quang-Hung</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nguyen</surname>
                            <given-names>M-T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">2016 IEEE Sixth International Conference on Communications and Electronics (ICCE).</italic>
</source>
                    <publisher-name>IEEE</publisher-name>;<year>2016</year>; pp.<fpage>52</fpage>&#x2013;<lpage>57</lpage>.</mixed-citation>
            </ref>
            <ref id="ref86">
                <label>86</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bielecki</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>&#x015a;mia&#x0142;ek</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Estimation of execution time for computing tasks.</article-title>
                    <source>

                        <italic toggle="yes">Clust. Comput.</italic>
</source>
                    <year>2023</year>;<volume>26</volume>:<fpage>3943</fpage>&#x2013;<lpage>3956</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s10586-022-03774-1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref87">
                <label>87</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sikka</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Learning based Methods for Code Runtime Complexity Prediction.</article-title>
                    <year>2019</year>.
                    <pub-id pub-id-type="doi">10.48550/ARXIV.1911.01155</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref88">
                <label>88</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tongjai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wattanasombat</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>figshare.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref89">
                <label>89</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tongjai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wattanasombat</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>figshare.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report297759">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.164057.r297759</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Sriswasdi</surname>
                        <given-names>Sira</given-names>
                    </name>
                    <xref ref-type="aff" rid="r297759a1">1</xref>
                    <xref ref-type="aff" rid="r297759a2">2</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r297759a1">
                    <label>1</label>Computational Molecular Biology Group, Faculty of Medicine, Chulalongkorn University, Bangkok, Bangkok, Thailand</aff>
                <aff id="r297759a2">
                    <label>2</label>Center for Artificial Intelligence in Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Bangkok, Thailand</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>16</day>
                <month>7</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Sriswasdi S</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport297759" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.149577.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This work presents a comprehensive evaluation of de novo and reference-based assembly tools for Nanopore sequencing of genome mixtures of HIV and other viruses. Focus was placed on both the ability of the tools to capture the true viral subtypes and their computational resource requirements. The impact of basic quality filters, such as the minimum read length, was also explored. The best-performing tools were selected based on simulated datasets of known HIV mixtures, and their performance was evaluated on various actual and simulated datasets of multiple viruses. This work also provides a container resource that would be useful for non-experts to analyze their own data. However, there are some missing details and metrics that should be improved.</p>
            <p> Major issues: 
                <list list-type="order">
                    <list-item>
                        <p>Since a key limitation of Nanopore sequencing is its relatively high sequencing error rate (compared to other sequencing techniques), I think it is unrealistic to simulate the data with the --perfect mode, which produces error-free reads. Furthermore, using error-free simulated data would underestimate the number of mismatches and overestimate the % similarity and correctness of the resulting contigs.</p>
                    </list-item>
                    <list-item>
                        <p>On the same issue of sequencing error, I wonder if more attention should be placed on optimizing the polishing step (such as 
                            <ext-link ext-link-type="uri" xlink:href="https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-024-10582-x">Luan T,</ext-link>&#x00a0;et al., (2024) [Ref-1]). In the current work, only the default error handling algorithm associated with each tool was used.</p>
                    </list-item>
                    <list-item>
                        <p>When evaluating the assembly characteristics of a mixture of subtypes, such as in Figures 6-8 and Figure S3, I think it is important to show the characteristics separately for each subtype so that it is clear whether all subtypes were assembled equally well. Or at least, please show the ranges of the values, not just the averages.</p>
                    </list-item>
                    <list-item>
                        <p>In Table 5, I&#x2019;m not sure how a Spearman correlation could be calculated with assembler selection, which is not an ordinal variable.</p>
                    </list-item>
                    <list-item>
                        <p>When evaluating the assemblers using simulated mixture data, it is unclear what proportion of reads from each subtype was used. I think the impact of fractional abundance on the quality of the assembled genomes should be investigated. For the simulated HIV-1 mixture datasets in Table 3, please consider exploring different mixture ratios. For the pre-built simulated polio, zika, and HCV datasets, please indicate the mixture ratios to aid interpretation.</p>
                    </list-item>
                    <list-item>
                        <p>There is no apparent tuning of the parameters for the assemblers. Since this is a benchmark paper that releases a container that will be used by the community, please consider more optimization of the pipeline.</p>
                    </list-item>
                </list> Minor issues: 
                <list list-type="order">
                    <list-item>
                        <p>The comparison between de novo assemblers and reference-based assemblers needs to be carefully explained because reference-based approaches, with the reference genome as prior knowledge, would in general be able to produce longer contigs (as seen in Figure 5D-F versus Figure 5A-C, with Figure 5F as the extreme case where every contig from RVHaplo is almost the entire genome). It would be helpful to explain this to the readers to prevent misinterpretation.</p>
                    </list-item>
                    <list-item>
                        <p>Figures 10 and 11 could be improved by annotating each contig with the identity of each viral subtype in the mixture. This would allow the readers to understand (i) whether the correct contigs correspond to individual subtypes, (ii) which subtypes could not be recovered, and (iii) whether different methods misassemble the same subtypes.</p>
                    </list-item>
                    <list-item>
                        <p>The pipeline seems to use a fixed HIV-1 reference NC_001802.1. Should the impact of the reference genome be explored? Should the reference genome be changeable in the released pipeline?</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>molecular evolution of microorganisms, cancer transcriptomics, bioinformatics pipelines, machine learning and deep learning applications in biomedicine</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-297759-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates.</article-title>
                        <source>
                            <italic>BMC Genomics</italic>
                        </source>.<year>2024</year>;<volume>25</volume>(<issue>1</issue>) :
                        <elocation-id>10.1186/s12864-024-10582-x</elocation-id>
                        <fpage>679</fpage>
                        <pub-id pub-id-type="pmid">38978005</pub-id>
                        <pub-id pub-id-type="doi">10.1186/s12864-024-10582-x</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report297758">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.164057.r297758</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Chaysiri</surname>
                        <given-names>Rujira</given-names>
                    </name>
                    <xref ref-type="aff" rid="r297758a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r297758a1">
                    <label>1</label>School of Management Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>8</day>
                <month>7</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Chaysiri R</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport297758" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.149577.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The study conducted a comprehensive evaluation of seven genome assemblers, focusing on their performance across different computational platforms, including a server, a workstation PC, and a standard PC. It found that while the choice of platform did not significantly affect processing time, memory capacity was a critical factor: some assemblers, such as Strainline, required up to 64 GB of memory, rendering them unsuitable for standard PCs. Reference-based assemblers were generally faster and more consistent in their processing times than de novo assemblers, whose performance was variable. In the assessment of the minimum read length necessary for effective assembly, de novo assemblers needed a read length N50 of at least 2000 nt to function optimally, while reference-based assemblers were less affected by variations in read length and consistently produced assemblies close to the expected genome size of 8000 nt for HIV.</p>
            <p> In evaluating the performance on mixed HIV-1 subtypes, de novo assemblers exhibited lower recovery rates for subtypes, indicating challenges in reconstructing all subtypes present in the input. Conversely, reference-based assemblers demonstrated better recovery of subtypes but tended to over-represent subtype B, used as the reference, and had higher rates of indels, pointing to a need for improved error correction. Further testing on real sequencing data from various viruses, including HIV-1, Mpox, SARS-CoV-2, Polio, Hepatitis C, and ZIKV, showcased the versatility and high-quality results of the selected assemblers across different viral genomes.</p>
            <p> Additionally, the study detailed the development of a Docker container for the HIV-64148 pipeline, which incorporated the tested assemblers along with features for querying online databases to generate comprehensive reports on subtype composition, mutations, and drug resistance profiles. This containerized pipeline proved to be practical and accessible, capable of operating on both large servers and standard computers, thus enhancing its utility for genomic surveillance tasks in diverse settings.</p>
            <p> The study provides a robust evaluation of genome assemblers, delivering valuable insights into their performance across different computational platforms and data sets. Its comprehensive nature, covering multiple aspects of assembler performance such as processing time, memory usage, and accuracy, is a significant strength. The inclusion of both simulated and real sequencing data from a variety of viruses highlights the practical applicability of the findings. Additionally, the development of a Docker container for the HIV-64148 pipeline enhances accessibility and usability, facilitating standardized genomic surveillance.</p>
            <p> However, the study also reveals some limitations. The high memory requirements for some assemblers, particularly Strainline, limit their usability on standard PCs, which could restrict their application in resource-limited environments. The tendency of reference-based assemblers to over-represent certain subtypes, such as subtype B in HIV, introduces bias that could affect the reliability of genomic studies. Furthermore, the higher indel rates observed in reference-based assemblies indicate a need for further refinement and error correction to improve the accuracy of the results.</p>
            <p> To address these issues, future research should focus on optimizing memory usage for assemblers, making them more accessible on standard hardware. Developing methods to reduce reference bias in reference-based assemblers would enhance the reliability of their results. Additionally, implementing robust error correction algorithms could mitigate the higher indel rates observed in reference-based assemblies. Overall, while the study makes significant contributions to the field of genomics, addressing these limitations would further improve the effectiveness and applicability of genome assemblers.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Management Technology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report297757">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.164057.r297757</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Sae-Ueng</surname>
                        <given-names>Udom</given-names>
                    </name>
                    <xref ref-type="aff" rid="r297757a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5578-0161</uri>
                </contrib>
                <aff id="r297757a1">
                    <label>1</label>National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>5</day>
                <month>7</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Sae-Ueng U</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport297757" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.149577.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>1. The title of the paper should focus on the performance of genome assemblers, as the selection of the assembler is the main factor in the pipeline.</p>
            <p>2. A significant number of mismatch errors occur in ONT sequencing data. The paper should address how to ensure that these errors will assemble into distinct viral haplotypes. It should also explain the dissimilarities between viral haplotypes in genomes and provide evidence of low-abundance haplotypes with read ratios in real-world situations for quasispecies analysis.</p>
            <p>3. Although the ratio of reads used in the HIV mixture is the same (as shown in Table 3), different genome assemblers (as shown in Figure 9) yield different subtype recalls. This could be considered a bias of the genome assemblers.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Genome sequencing; Biotechnology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
