<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.51494.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Method Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>The National Ecological Observatory Network&#x2019;s soil metagenomes: assembly and basic analysis</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Werbin</surname>
                        <given-names>Zoey R.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2927-2838</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Hackos</surname>
                        <given-names>Briana</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Dietze</surname>
                        <given-names>Michael C.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Bhatnagar</surname>
                        <given-names>Jennifer M.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Biology, Boston University, Boston, MA, 02215, USA</aff>
                <aff id="a2">
                    <label>2</label>Department of Mathematics, University of Colorado, Boulder, Boulder, CO, 80309, USA</aff>
                <aff id="a3">
                    <label>3</label>Department of Earth &amp; Environment, Boston University, Boston, MA, 02215, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:zrwerbin@bu.edu">zrwerbin@bu.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>19</day>
                <month>4</month>
                <year>2021</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>299</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>16</day>
                    <month>3</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Werbin ZR et al.</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-299/pdf"/>
            <abstract>
                <p>The National Ecological Observatory Network (NEON) annually performs shotgun metagenomic sequencing to sample genes within soils at 47 sites across the United States. NEON serves as a valuable educational resource, thanks to its open data policies and programming tutorials, but there is currently no introductory tutorial for performing analyses with the soil shotgun metagenomic dataset. Here, we describe a workflow for processing raw soil metagenome sequencing reads using the Sunbeam bioinformatics pipeline. The workflow includes cleaning and processing raw reads, taxonomic classification, assembly into contigs, annotation of predicted genes using custom protein databases, and exporting assemblies to the KBase platform for downstream analysis. This workflow is designed to be robust to annual data releases from NEON, and the underlying Snakemake framework can manage complex software dependencies. The workflow presented here aims to increase the accessibility of NEON&#x2019;s shotgun metagenome data, which can provide important clues about soil microbial communities and their ecological roles.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>metagenomics</kwd>
                <kwd>microbial ecology</kwd>
                <kwd>soil microbiome</kwd>
                <kwd>tutorial</kwd>
                <kwd>workflow</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/100000001">
                    <funding-source>National Science Foundation</funding-source>
                    <award-id>1638577</award-id>
                    <award-id>1949968</award-id>
                    <award-id>1840990</award-id>
                </award-group>
                <funding-statement>ZRW is funded by the National Science Foundation (NSF) Graduate Research Fellowship Program (Award #1840990). ZRW, MCD, and JMB are funded by the NSF Macrosystems Biology Program (Award #1638577). BH is funded by the BU Bioinformatics Research and Interdisciplinary Training Experience (BRITE) NSF-REU program (Award #1949968).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1">
            <title>Introduction</title>
            <p>The soil microbiome is responsible for key ecological processes, such as decomposition and nitrogen cycling (
                <xref ref-type="bibr" rid="ref2">Allison 
                    <italic toggle="yes">et al.</italic> 2013</xref>). One powerful tool for studying the soil microbiome is shotgun metagenomic sequencing, in which all of the genetic material within the DNA extract of a soil sample is sequenced at once, without targeting specific organisms (
                <xref ref-type="bibr" rid="ref38">Quince 
                    <italic toggle="yes">et al.</italic> 2017</xref>; P&#x00e9;rez-Cobas 
                <italic toggle="yes">et al.</italic> 2020). The largest publicly available sequencing dataset of this type is updated annually by the National Ecological Observatory Network (NEON), which monitors ecological conditions at 47 terrestrial sites spanning 20 ecoclimatic domains across the US and its territories (
                <xref ref-type="bibr" rid="ref24">Keller 
                    <italic toggle="yes">et al.</italic> 2018</xref>). NEON is funded by the National Science Foundation (NSF); it collects soil samples and releases the resulting shotgun metagenomic data annually.</p>
            <p>To date, the NEON soil metagenomics data can only be accessed in two formats: as completely raw reads released by NEON, or as processed files through the default protocols of the MG-RAST storage server. Neither format is suitable for most metagenomic analyses, which generally answer scientific questions using custom data processing pipelines that use specific algorithms and targeted reference databases (
                <xref ref-type="bibr" rid="ref25">Ladoukakis 
                    <italic toggle="yes">et al.</italic> 2014</xref>; 
                <xref ref-type="bibr" rid="ref38">Quince 
                    <italic toggle="yes">et al.</italic> 2017</xref>). To facilitate future scientific analysis, we present a workflow for taking raw sequences and generating a processed dataset that can be linked to other NEON data products, such as soil biogeochemistry, root measurements, and aboveground plant communities.</p>
            <p>NEON data is a valuable resource for ecology and bioinformatics, thanks to its open access software, robust documentation, and educational resources (
                <xref ref-type="bibr" rid="ref20">Jones 2020</xref>). The pipeline that we present here is designed to complement existing NEON educational resources, such that users without prior bioinformatics experience may use this dataset to learn about microbial communities within the soil. We present code and explanations for each analysis step, including basic quality control (QC), assembling reads into larger genome fragments (&#x201c;contig&#x201d; assembly), predicting genes, quantifying gene counts for specific ecological or biogeochemical functions, and exporting to the KBase platform (
                <xref ref-type="bibr" rid="ref6">Arkin 
                    <italic toggle="yes">et al.</italic> 2018</xref>). We recommend the review by 
                <xref ref-type="bibr" rid="ref35">P&#x00e9;rez-Cobas 
                    <italic toggle="yes">et al.</italic> (2020)</xref> for an overview of software alternatives for each step of this shotgun metagenomics analysis.</p>
        </sec>
        <sec id="sec2" sec-type="methods">
            <title>Methods</title>
            <sec id="sec3">
                <title>Dataset description</title>
                <p>Soil samples are collected annually from 47 NEON sites during peak greenness. Three samples are collected within a NEON plot at each sampling time point. Soil samples are collected up to 30 cm below the soil surface; the organic (O) and mineral (M) horizons (when present) are separated, and subsamples from each horizon are homogenized into one composite sample per horizon. Sample file names include the 4-letter site identifier, horizon (O or M), and sampling date. Samples are frozen on dry ice until DNA extraction and preparation using the KAPA Hyper Plus kit (Kapa Biosystems). Samples from multiple sites are pooled into sets of 40 or 60 for sequencing, which is conducted on an Illumina NextSeq at the Battelle Memorial Institute (NEON Metagenomics Standard Operating Procedure, v.3). Since there is currently no versioned release of NEON&#x2019;s metagenomic data, the pipeline described here is designed to handle new data as NEON releases it, approximately annually (TOS Science Design for Terrestrial Microbial Diversity, NEON.DOC.000908).</p>
            </sec>
            <sec id="sec4">
                <title>Operation</title>
                <p>We assume a Linux operating system and command-line interface. Storage and RAM requirements will depend on the specific analyses performed and the number of samples analyzed. If using shared computing clusters, refer to the Sunbeam manual for 
                    <ext-link ext-link-type="uri" xlink:href="https://sunbeam.readthedocs.io/en/latest/usage.html#cluster-options">cluster-specific options</ext-link>, which are necessary to take full advantage of multi-core processing.</p>
            </sec>
            <sec id="sec5">
                <title>Implementation</title>
                <p>Once sequences are downloaded, we use the software 
                    <ext-link ext-link-type="uri" xlink:href="https://sunbeam.readthedocs.io/en/latest/">Sunbeam</ext-link> (
                    <xref ref-type="bibr" rid="ref11">Clarke 
                        <italic toggle="yes">et al.</italic> 2019</xref>) to create a bioinformatic pipeline. Sunbeam links a variety of popular bioinformatics tools (e.g. BLAST, MegaHIT, Kraken2, Prodigal), and users can develop and share customized extensions for various purposes. Sunbeam, and its underlying Snakemake framework (
                    <xref ref-type="bibr" rid="ref52">K&#x00f6;ster 
                        <italic toggle="yes">et al.</italic> 2012</xref>), are designed to address common problems with software versioning and updating, as well as efficient data re-analysis (i.e. running the minimal tasks necessary to generate updated output files). In addition to Sunbeam&#x2019;s default steps for cleaning and processing the raw reads, the pipeline below performs taxonomic classification and protein annotation for predicted genes using custom databases.</p>
                <p>
                    <bold>1. Setup</bold>
                </p>
                <p>
                    <bold>
                        <italic toggle="yes">1.1 Get raw sequence files</italic>
                    </bold>
                </p>
                <p>
                    <italic toggle="yes">1.1a Test sample set [recommended option]:</italic> We recommend an initial interactive test of the pipeline with two microbial samples. This will ensure that all necessary software is installed and that file paths are correct. A sample set can be downloaded using the command below:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">mkdir raw_sequences # create directory for raw sequences
cd raw_sequences # enter directory
wget https://neon-microbial-raw-seq-files.s3.data.neonscience.org/2017/WOOD_002-M-20140925-comp_R1.fastq.gz
wget https://neon-microbial-raw-seq-files.s3.data.neonscience.org/2017/WOOD_002-M-20140925-comp_R2.fastq.gz
wget https://neon-microbial-raw-seq-files.s3.data.neonscience.org/2017/SCBI_012-M-20140915-comp_R1.fastq.gz
wget https://neon-microbial-raw-seq-files.s3.data.neonscience.org/2017/SCBI_012-M-20140915-comp_R2.fastq.gz
cd .. # return to enclosing directory</preformat>
                </p>
                <p>
                    <italic toggle="yes">1.1b Download custom dataset:</italic> Use NEON&#x2019;s interactive 
                    <ext-link ext-link-type="uri" xlink:href="https://data.neonscience.org/data-products/explore">Data Portal</ext-link> to download a specific set of samples that meets your interests. Download links are included in NEON&#x2019;s &#x201c;Expanded&#x201d; data packages. For example, you could compare samples from Alaska with those from Puerto Rico, or you could download sites that have accompanying multi-decadal data from the 
                    <ext-link ext-link-type="uri" xlink:href="https://lternet.edu/site/">Long-Term Ecological Research</ext-link> (LTER) program. Samples must have both forward and reverse reads, and the files must be compressed (in .fastq.gz format). Even when compressed, each file may still require multiple GB of storage.</p>
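                <p>Because every sample needs both a forward and a reverse read file, it can help to check a custom download before continuing. The following is a minimal sketch, not part of Sunbeam: the helper name check_pairs is hypothetical, and it assumes that files follow the naming pattern of the test samples in Step 1.1a (ending in _R1.fastq.gz and _R2.fastq.gz) and are stored in a &#x201c;raw_sequences&#x201d; folder.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Sketch: confirm every forward-read file has a reverse-read partner.
# Assumes file names end in _R1.fastq.gz / _R2.fastq.gz, as in Step 1.1a.
check_pairs() {
  dir="$1"
  missing=0
  for fwd in "$dir"/*_R1.fastq.gz; do
    [ -e "$fwd" ] || continue              # skip if the pattern matched no files
    rev="${fwd%_R1.fastq.gz}_R2.fastq.gz"  # expected reverse-read file name
    if [ ! -f "$rev" ]; then
      echo "Missing reverse reads for: $fwd"
      missing=$((missing + 1))
    fi
  done
  echo "Unpaired forward files: $missing"
}

check_pairs raw_sequences # change this path if your files are stored elsewhere</preformat>
                </p>
                <p>If the final line reports anything other than zero unpaired files, the listed samples should be re-downloaded or excluded before running the pipeline.</p>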
                <p>
                    <bold>
                        <italic toggle="yes">1.2 Install Sunbeam</italic>
                    </bold>
                </p>
                <p>Full details on Sunbeam installation can be found in the Sunbeam 
                    <ext-link ext-link-type="uri" xlink:href="https://sunbeam.readthedocs.io/en/latest/usage.html">user guide</ext-link>. In short, run the following commands to create a new &#x201c;analysis&#x201d; directory and download Sunbeam into that directory:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">mkdir metagenome_analysis # create directory for analysis
cd metagenome_analysis # enter directory
git clone -b dev https://github.com/sunbeam-labs/sunbeam sunbeam # download development branch
cd sunbeam # enter directory
bash install.sh # run installation script</preformat>
                </p>
                <p>Confirm success of installation (may take 10-15 minutes):</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">bash tests/run_tests.bash</preformat>
                </p>
                <p>If all went well, your screen will say &#x201c;TESTS SUCCEEDED.&#x201d; A new conda environment should now exist. You can check available environments using:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">conda env list</preformat>
                </p>
                <p>Activate the Sunbeam environment. This must be run for any Sunbeam commands to work.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">conda activate sunbeam</preformat>
                </p>
                <p>Next, we tell Sunbeam where the raw sequences are located by creating a &#x201c;samples.csv&#x201d; file that links the forward read files and the reverse read files. If you have not downloaded files to a &#x201c;raw_sequences&#x201d; folder (Step 1.1a), change the file path to point to the sequence folder on your own system:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">cd .. # go to enclosing (analysis) directory
sunbeam list_samples ../raw_sequences &gt;&gt; samples.csv # change this path if your own raw files are not in "raw_sequences"</preformat>
                </p>
                <p>The last part of setup requires creating a configuration file called &#x201c;sunbeam_config.yml.&#x201d; To use the custom configuration that accompanies this workflow, run the following command from your analysis directory:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">wget https://raw.githubusercontent.com/zoey-rw/metagenomes_NEON/main/sunbeam_config.yml # download configuration file</preformat>
                </p>
                <p>This configuration file is used to set parameters for every part of the analysis (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>).
                    <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                        <label>Figure 1. </label>
                        <caption>
                            <title>Sunbeam configuration file provided for NEON shotgun metagenomics bioinformatic pipeline.</title>
                            <p>Many parameters remain the default values provided in Sunbeam&#x2019;s basic configuration file, while others have been customized for this dataset (e.g. file paths, as well as fwd_adapter, rev_adapter, min_length).</p>
                        </caption>
                        <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/54670/5bf2a711-101f-4387-a1b4-a8d89564b1e4_figure1.gif"/>
                    </fig>
                </p>
                <p>
                    <bold>
                        <italic toggle="yes">1.3 Setup troubleshooting and tips</italic>
                    </bold>
                </p>
                <p>On shared computing clusters, some software must be loaded as &#x201c;modules&#x201d; before it can be used. For instance, to use Miniconda (necessary for every step of this pipeline), this command may work:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">module load miniconda # may need to specify version</preformat>
                </p>
                <p>Most analyses will run faster if multiple threads are available. The custom configuration file, sunbeam_config.yml, assumes you have 8 threads available. The command below reports the number of available threads, though you may not want to use all of them if you share computing resources:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">echo "CPU threads: $(grep -c ^processor /proc/cpuinfo)"</preformat>
                </p>
                <p>
                    <bold>2. Quality control</bold>
                </p>
                <p>In this step, raw sequences are cleaned using the default tools in the Sunbeam pipeline. To remove poor-quality data and components left over from sequencing, such as adapter sequences, we use Cutadapt (Martin 2015) and Trimmomatic (
                    <xref ref-type="bibr" rid="ref8">Bolger 
                        <italic toggle="yes">et al.</italic> 2014</xref>). Problematic low-complexity sequences are removed using the program Komplexity (
                    <xref ref-type="bibr" rid="ref11">Clarke 
                        <italic toggle="yes">et al.</italic> 2019</xref>). Overall quality of reads is then reported by FastQC (BabrahamBioinformatics, 2018).</p>
                <p>Optionally, users may wish to search for and remove sequences that match the PhiX genome (Step 2.1b), which is a common contaminant of Illumina metagenomic data due to its use as a control during sequencing (
                    <xref ref-type="bibr" rid="ref30">Mukherjee 
                        <italic toggle="yes">et al.</italic> 2015</xref>). This contamination was not found in our test samples (
                    <xref ref-type="fig" rid="f2">Figure 2c</xref>), so we proceed without this in Step 2.1a.
                    <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                        <label>Figure 2. </label>
                        <caption>
                            <title>Quality-control reports produced by the 
                                <italic toggle="yes">sbx_report</italic> extension, described in Step 3.2A.</title>
                            <p>a) Average quality scores along read positions. b) Counts of read pairs for a subset of samples. c) Proportion of reads retained (blue), discarded as low-quality (light grey), or discarded as PhiX (&#x201c;Host&#x201d;) contamination (dark grey). No PhiX contamination was observed in the metagenomes from these 2 NEON soil samples.</p>
                        </caption>
                        <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/54670/5bf2a711-101f-4387-a1b4-a8d89564b1e4_figure2.gif"/>
                    </fig>
                </p>
                <p>
                    <italic toggle="yes">2.1a Run quality control without PhiX decontamination [recommended]:</italic> To run the quality control step without decontaminating the files, use the following command:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile ./sunbeam_config.yml clean_qc</preformat>
                </p>
                <p>
                    <italic toggle="yes">Note</italic>: the command below performs the same steps as the one above, but keeps the intermediate outputs of each tool (Cutadapt, Trimmomatic, and FastQC). This takes up additional storage space, but allows you to inspect each output, which is useful for debugging (for example, if you suspect that one of these steps is removing more reads than it should).</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile sunbeam_config.yml all_qc</preformat>
                </p>
                <p>
                    <italic toggle="yes">2.1b Run quality control with PhiX decontamination:</italic> To download the PhiX genome, run the following command, which will retrieve the genome from the 
                    <ext-link ext-link-type="uri" xlink:href="https://support.illumina.com/sequencing/sequencing_software/igenome.html">Illumina iGenomes</ext-link> website, decompress the file, and rename it as a FASTA file within your current directory:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">wget http://igenomes.illumina.com.s3-website-us-east-1.amazonaws.com/PhiX/Illumina/RTA/PhiX_Illumina_RTA.tar.gz -O - | tar -xz PhiX/Illumina/RTA/Sequence/WholeGenomeFasta/genome.fa # download archive and extract only the genome file
mv PhiX/Illumina/RTA/Sequence/WholeGenomeFasta/genome.fa PhiX/PhiX.fasta # rename as a FASTA file in the PhiX folder</preformat>
                </p>
                <p>In your configuration file, the &#x201c;host_fp&#x201d; parameter must point to the folder enclosing the downloaded PhiX genome. The command below will make this change:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sed -i "s|host_fp:.*|host_fp: 'PhiX'|" sunbeam_config.yml # replace the current host_fp value with the PhiX folder</preformat>
                </p>
                <p>Next, run the Sunbeam decontamination step, which automatically includes quality control:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile sunbeam_config.yml all_decontam</preformat>
                </p>
                <p>
                    <bold>
                        <italic toggle="yes">2.2 Evaluate quality control</italic>
                    </bold>
                </p>
                <p>Output folders contain log files for each program run within the quality control step. Each sample also has an HTML file produced by FastQC (BabrahamBioinformatics, 2018), which includes visualizations of base quality, sequence lengths, and other checks. More information on interpreting these reports is available on the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/">FastQC website</ext-link>. By default, the samples with reads that pass quality control will be located in the following directory: sunbeam_output/qc/decontam/.</p>
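                <p>To see how many reads each sample retained, the files in this directory can be inspected directly. The sketch below is illustrative rather than part of Sunbeam: the helper name count_reads is hypothetical, and it assumes standard four-line FASTQ records and that output files end in .fastq.gz.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Sketch: count reads in a gzip-compressed FASTQ file.
# Assumes each read occupies exactly four lines (standard for Illumina output).
count_reads() {
  lines=$(gzip -dc "$1" | wc -l) # decompress to the screen stream and count lines
  echo $((lines / 4))
}

for fq in sunbeam_output/qc/decontam/*.fastq.gz; do
  [ -e "$fq" ] || continue # nothing to report if no files are present
  echo "$fq: $(count_reads "$fq") reads"
done</preformat>
                </p>
                <p>Comparing these counts with the same calculation on the raw files gives a quick estimate of how many reads were discarded during quality control.</p>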
                <p>Within our example dataset, average quality scores were high (above 30) throughout sequence reads (
                    <xref ref-type="fig" rid="f2">Figure 2a</xref>). Quality scores represent error rates of base calls (
                    <xref ref-type="bibr" rid="ref18">Illumina, 2014</xref>). On average, the first few positions of each read tended to be of lowest quality; beyond these, quality decreased along the length of the read. The number of sequences can vary dramatically between samples, with read pair counts ranging from 2 million to 15 million (
                    <xref ref-type="fig" rid="f2">Figure 2b</xref>). This does not necessarily reflect variation in the abundance of microbes in the soil; rather, variation can result from biases in DNA extraction or sequencing (Pereira 
                    <italic toggle="yes">et al.</italic> 2018; Jonsson 
                    <italic toggle="yes">et al.</italic> 2016).</p>
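                <p>As a point of reference, and assuming the standard Phred convention (stated here for convenience rather than taken from the sources cited above), a quality score 
                    <italic toggle="yes">Q</italic> relates to the estimated probability 
                    <italic toggle="yes">P</italic> that a base call is incorrect as follows:
                    <disp-formula><tex-math><![CDATA[Q = -10 \log_{10} P \quad \Longleftrightarrow \quad P = 10^{-Q/10}]]></tex-math></disp-formula>
                    Under this convention, a score of 30 corresponds to an expected error probability of 0.001, or roughly one incorrect base call per 1,000 bases, and a score of 20 corresponds to one per 100.</p>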
                <p>
                    <bold>3. Taxonomic classification</bold>
                </p>
                <p>The taxonomic identity of reads in a metagenome sample can be assigned by comparing predicted proteins or nucleotides to reference databases. This can be performed with short reads (pre-assembly) or with assembled contigs. Both avenues produce similar results for fungal and bacterial sequences (
                    <xref ref-type="bibr" rid="ref54">Pearman 
                        <italic toggle="yes">et al.</italic> 2020</xref>), so we use short reads for compatibility with Sunbeam&#x2019;s default classifier, Kraken2 (
                    <xref ref-type="bibr" rid="ref48">Wood 
                        <italic toggle="yes">et al.</italic> 2019</xref>). Compared to other classification tools, Kraken2 has been shown to perform favorably on soil datasets (
                    <xref ref-type="bibr" rid="ref22">Kalantar 
                        <italic toggle="yes">et al.</italic> 2020</xref>). However, Sunbeam extensions have also been developed for other classifiers, such as 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/sunbeam-labs/sbx_kaiju">Kaiju</ext-link> or 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/sunbeam-labs/sbx_metaphlan">MetaPhlAn</ext-link>.</p>
                <p>
                    <bold>
                        <italic toggle="yes">3.1 Classify reads using Kraken2</italic>
                    </bold>
                </p>
                <p>First, we must download a Kraken2 reference database. You could build your own with specific combinations of organisms, but pre-built databases are updated regularly and shared by the 
                    <ext-link ext-link-type="uri" xlink:href="https://benlangmead.github.io/aws-indexes/k2">Kraken2 developers</ext-link>. Databases range in size from 100 MB to 90 GB, depending on the genomes included. Most databases are constructed via RefSeq (
                    <xref ref-type="bibr" rid="ref34">O&#x2019;Leary 
                        <italic toggle="yes">et al.</italic> 2016</xref>), but marker gene databases such as Silva (Quast 
                    <italic toggle="yes">et al.</italic> 2012) and RDP (
                    <xref ref-type="bibr" rid="ref12">Cole 
                        <italic toggle="yes">et al.</italic> 2014</xref>) may also be used with Kraken2.</p>
                <p>Below, we use the &#x201c;PlusPF&#x201d; database, which includes archaeal, bacterial, viral, plasmid, human, UniVec_Core, protozoan, and fungal sequences. The full database is 48 GB, but a version capped at 8 GB can be downloaded using these commands:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">wget -c https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_8gb_20210127.tar.gz -P kraken_pluspf/ # download database
tar -zxvf kraken_pluspf/k2_pluspf_8gb_20210127.tar.gz -C kraken_pluspf/ # decompress database
rm kraken_pluspf/k2_pluspf_8gb_20210127.tar.gz # remove compressed file</preformat>
                </p>
                <p>In your configuration file, the &#x201c;kraken_db_fp&#x201d; parameter should point to the folder enclosing the database (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
                <p>To run the taxonomic classification step:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile sunbeam_config.yml all_classify</preformat>
                </p>
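                <p>Classification results can also be summarized outside of Sunbeam. The following is a minimal sketch, assuming a standard six-column, tab-separated Kraken2 report (percentage of reads, clade reads, directly assigned reads, rank code, NCBI taxonomy ID, and indented name). The function name and the example rows are hypothetical, and where such a report is stored in the Sunbeam output is not specified here.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
def phylum_abundances(report_lines):
    """Return {phylum name: percent of reads} from Kraken2 report lines."""
    abundances = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank lines or reports with a different layout
        percent, _clade_reads, _direct_reads, rank, _taxid, name = fields
        if rank == "P":  # rank code "P" marks phylum-level rows
            abundances[name.strip()] = float(percent)
    return abundances


# Made-up report rows, for illustration only
example_report = [
    "20.00\t200\t200\tU\t0\tunclassified",
    "80.00\t800\t5\tR\t1\troot",
    "70.00\t700\t10\tD\t2\t    Bacteria",
    "40.00\t400\t50\tP\t1224\t      Proteobacteria",
    "25.00\t250\t30\tP\t201174\t      Actinobacteria",
]

if __name__ == "__main__":
    for phylum, percent in phylum_abundances(example_report).items():
        print(phylum, percent)
```
</preformat>
                </p>
                <p>To apply the function to a real report, pass an open file handle in place of the example list.</p>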
                <p>
                    <italic toggle="yes">3.2a Evaluate taxonomic classification using Sunbeam extension:</italic> We can use a Sunbeam extension, 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/sunbeam-labs/sbx_report">
                        <italic toggle="yes">sbx_report</italic>
                    </ext-link>, to inspect results from the classification step. This will provide visual summaries of sequence quality along read position, read decontamination, and relative abundances of taxa from the phylum to the genus level. To download this extension, run:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam extend https://github.com/sunbeam-labs/sbx_report</preformat>
                </p>
                <p>Then run the following to generate HTML reports of read quality and taxonomic classification (
                    <xref ref-type="fig" rid="f2">Figures 2</xref> and 
                    <xref ref-type="fig" rid="f3">3</xref>):</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile sunbeam_config.yml --use-conda final_report</preformat>
                    <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                        <label>Figure 3. </label>
                        <caption>
                            <title>Taxonomic abundance reports produced by the 
                                <italic toggle="yes">sbx_report</italic> extension, described in Step 3.2a.</title>
                            <p>Heatmap shows phylum-level read abundances for 2 NEON shotgun metagenomics samples.</p>
                        </caption>
                        <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/54670/5bf2a711-101f-4387-a1b4-a8d89564b1e4_figure3.gif"/>
                    </fig>
                </p>
                <p>
                    <bold>4. Contig assembly</bold>
                </p>
                <p>This step takes the cleaned reads and assembles them into longer genome regions called contigs. We assemble reads into contigs to increase sensitivity and accuracy when predicting and annotating genes. Contig assembly has been shown to provide substantial improvements in conjunction with NCycDB in particular (
                    <xref ref-type="bibr" rid="ref5">Anwar 
                        <italic toggle="yes">et al.</italic> 2019</xref>), which we use in Step 5. Contig assembly generally requires more computational power and time than any other step within metagenomic analysis (
                    <xref ref-type="bibr" rid="ref38">Quince 
                        <italic toggle="yes">et al.</italic> 2017</xref>). Using multiple threads (e.g. 16) is recommended, which may require adding the &#x201c;--cores 16&#x201d; argument to the Sunbeam command.</p>
                <p>Below, we use the software Megahit (
                    <xref ref-type="bibr" rid="ref26">Li 
                        <italic toggle="yes">et al</italic>. 2016</xref>), which is one of the fastest tools for metagenome assembly. For some samples, this speed may come at the expense of sensitivity, so users are welcome to substitute other software here. One option for this step is 
                    <italic toggle="yes">co-assembly</italic> of reads, in which reads from multiple samples are pooled and assembled together, increasing sensitivity to low-abundance sequences (
                    <xref ref-type="bibr" rid="ref40">Sczyrba 
                        <italic toggle="yes">et al.</italic> 2017</xref>). However, co-assembly can greatly increase assembly time and memory usage, and may take days or weeks to complete.</p>
                <p>
                    <italic toggle="yes">4.1a Assemble contigs independently [recommended option]:</italic> In our configuration file (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>), we have set the minimum length of contigs to 1000 bp using the &#x2018;min_len&#x2019; parameter. This value approximates the average gene length for prokaryotes (
                    <xref ref-type="bibr" rid="ref50">Xu 
                        <italic toggle="yes">et al.</italic> 2006</xref>).</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile sunbeam_config.yml all_assembly</preformat>
                </p>
                <p>
                    <italic toggle="yes">4.1b Co-assemble contigs:</italic> To take this route, you can use the extension shared 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/sunbeam-labs/sbx_coassembly">by Sunbeam Labs</ext-link>, which carries out co-assembly using Megahit (
                    <xref ref-type="bibr" rid="ref26">Li 
                        <italic toggle="yes">et al</italic>. 2016</xref>).</p>
                <p>
                    <bold>
                        <italic toggle="yes">4.2 Evaluate assembly output</italic>
                    </bold>
                </p>
                <p>For each sample, basic summaries of the contig assembly are stored in the following directory by default: sunbeam_output/assembly/megahit/. Longer contigs generally represent higher confidence in longer regions of the genome, although misassemblies can occur and lead to long contigs (
                    <xref ref-type="bibr" rid="ref40">Sczyrba 
                        <italic toggle="yes">et al.</italic> 2017</xref>). In the log files, you will find the minimum, maximum, and average contig length, as well as the number of contigs of at least 50 bp.</p>
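                <p>Similar summaries can be recomputed directly from a contig FASTA file. The sketch below is an illustration only and is not part of Sunbeam or Megahit; the function names are hypothetical. It reports the number of contigs, the minimum, maximum, and mean length, and the N50 (the length of the shortest contig in the set of longest contigs that together hold at least half of the assembled bases).</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
def read_contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def contig_summary(lengths):
    """Summarize contig lengths: count, min, max, mean, and N50."""
    if not lengths:
        raise ValueError("no contigs found")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # at least half of all bases reached
            n50 = length
            break
    return {
        "count": len(ordered),
        "min": ordered[-1],
        "max": ordered[0],
        "mean": total / len(ordered),
        "n50": n50,
    }


# Toy records, for illustration only
example_fasta = [">c1", "ACGTAC", ">c2", "ACGT", ">c3", "ACG", ">c4", "AC", ">c5", "A"]

if __name__ == "__main__":
    print(contig_summary(read_contig_lengths(example_fasta)))
```
</preformat>
                </p>
                <p>For a real assembly, pass an open file handle for one of the contig FASTA files to read_contig_lengths instead of the toy list.</p>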
                <p>
                    <italic toggle="yes">4.2a Optional: evaluate assembly output using metaQUAST:</italic> We recommend the tool 
                    <ext-link ext-link-type="uri" xlink:href="http://quast.sourceforge.net/metaquast">metaQUAST</ext-link> to perform a more in-depth evaluation of the assembly, such as summaries of contig length distributions (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>), detection of misassemblies and errors, or comparison with reference databases to estimate the abundance of unknown species (
                    <xref ref-type="bibr" rid="ref53">Mikheenko 
                        <italic toggle="yes">et al.</italic> 2016</xref>). To download the metaQUAST program (as part of QUAST), run the following lines:
                    <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                        <label>Figure 4. </label>
                        <caption>
                            <title>Output statistics from metaQUAST, summarizing contig lengths per sample.</title>
                            <p>To produce similar statistics without downloading reference genomes, run metaQUAST with the &#x201c;--max-ref-num&#x201d; parameter set to 0.</p>
                        </caption>
                        <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/54670/5bf2a711-101f-4387-a1b4-a8d89564b1e4_figure4.gif"/>
                    </fig>
                </p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">wget https://sourceforge.net/projects/quast/files/latest/download # download newest version
tar -xzf download # decompress file</preformat>
                </p>
                <p>To run the metaQUAST program on a sample or set of samples, specify the directory of input samples and output location like this (note: version number of QUAST may differ):</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">python ./quast-5.0.2/metaquast.py -o metaquast_output/ sunbeam_output/assembly/contigs/*.fa --max-ref-num 0</preformat>
                </p>
                <p>Section 2.4 of the metaQUAST manual discusses which reference genomes or databases are downloaded by default.</p>
                <p>
                    <bold>5. Annotation</bold>
                </p>
                <p>The annotation step of the pipeline carries out BLAST searches on assembled contigs. Sunbeam will automatically use BLASTn for nucleotide databases, while BLASTx and BLASTp will be used for protein databases. Before protein databases are searched, the locations of Open Reading Frames (ORFs) are predicted using the software Prodigal (
                    <xref ref-type="bibr" rid="ref17">Hyatt 
                        <italic toggle="yes">et al.</italic>, 2010</xref>).</p>
                <p>Gene presence does not necessarily mean that the genes are transcribed or active; however, due to the metabolically expensive nature of maintaining genomic pathways (
                    <xref ref-type="bibr" rid="ref28">Lynch, 2006</xref>), there is potentially meaningful correspondence between gene presence and functional potential (
                    <xref ref-type="bibr" rid="ref35">P&#x00e9;rez-Cobas 
                        <italic toggle="yes">et al.</italic> 2020</xref>). Below, we demonstrate preparation of two BLAST protein databases that may be scientifically relevant for soil metagenomics.</p>
                <p>
                    <italic toggle="yes">Downloading the Comprehensive Antibiotic Resistance Database (CARD):</italic> CARD (
                    <xref ref-type="bibr" rid="ref1">Alcock 
                        <italic toggle="yes">et al.</italic> 2020</xref>) is a curated reference database of DNA sequences and proteins, designed to identify mutations and mechanisms of resistance to antibiotics, which can develop as a result of poor human stewardship (
                    <xref ref-type="bibr" rid="ref10">Brown &amp; Wright 2016</xref>). However, antibiotic resistance can also be an ecological signifier of fungal-bacterial competition for nutrients (
                    <xref ref-type="bibr" rid="ref51">Bahram 
                        <italic toggle="yes">et al.</italic> 2018</xref>). We use the protein homolog model sequences to construct our reference database.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">wget https://card.mcmaster.ca/download/0/broadstreet-v3.1.0.tar.bz2 -P db/card/ # download into new directory
cd db/card/ # enter download directory
tar -xf broadstreet-v3.1.0.tar.bz2 ./protein_fasta_protein_homolog_model.fasta # extract file
cd ../../ # return to analysis directory</preformat>
                </p>
                <p>Next, we convert the file to a BLASTp database for use within our pipeline:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">makeblastdb -in db/card/protein_fasta_protein_homolog_model.fasta -title card_protein -dbtype 'prot' -hash_index # convert to BLASTp database</preformat>
                </p>
                <p>
                    <italic toggle="yes">Downloading NCycDB:</italic> NCycDB categorizes genes into pathways that represent transformations such as nitrification, denitrification, and anammox. NCycDB was compiled from other sources, including COG, eggNOG, KEGG and the SEED (
                    <xref ref-type="bibr" rid="ref42">Tu 
                        <italic toggle="yes">et al.</italic> 2019</xref>). The NCycDB must be downloaded from GitHub and converted into a BLAST protein database. From the analysis directory, run the following commands to download the database and decompress the file:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">svn export https://github.com/qichao1984/NCyc/trunk/data db/NCyc &amp;&amp; gunzip db/NCyc/NCyc_100.faa.gz</preformat>
                </p>
                <p>This database has duplicate sequences that can introduce problems later on. We can change the file extension and remove duplicates using the following commands, which use the program cd-hit:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">mv db/NCyc/NCyc_100.faa db/NCyc/NCyc_100.fasta # change file extension
cd-hit -i db/NCyc/NCyc_100.fasta -o db/NCyc/NCyc_unique.fasta -c 1 -t 1 # remove duplicate sequences</preformat>
                </p>
                <p>Next, we convert the file to a BLASTp database for use within our pipeline:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">makeblastdb -in db/NCyc/NCyc_unique.fasta -parse_seqids -title NCyc_unique -dbtype prot -hash_index # convert to BLASTp database</preformat>
                </p>
                <p>In your configuration file, the &#x201c;root_fp&#x201d; and &#x201c;protein&#x201d; parameters should point to the BLAST database directory and file names (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>). See the Sunbeam documentation for examples of configuration files that include nucleotide databases.</p>
                <p>
                    <bold>
                        <italic toggle="yes">5.1 Run annotation</italic>
                    </bold>
                </p>
                <p>To run the annotation step:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile sunbeam_config.yml all_annotate</preformat>
                </p>
                <p>
                    <bold>6. Annotation post-processing</bold>
                </p>
                <p>A suite of tools has been published for working with the BLASTxml outputs from Step 5. Python scripts can be used to convert BLASTxml to a CSV format; for examples, see the 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/zoey-rw/metagenomes_NEON">GitHub repository</ext-link> associated with this manuscript.</p>
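                <p>As an illustration of such a conversion, the sketch below flattens a BLAST XML file (the format written by BLAST+ with -outfmt 5) into one CSV row per high-scoring segment pair. It is not the script from the repository; the function names and the choice of columns are assumptions made for this example.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
import csv
import xml.etree.ElementTree as ET

FIELDS = ["query", "hit_id", "hit_def", "evalue", "bit_score", "identity", "align_len"]


def blast_xml_to_rows(xml_source):
    """Yield one dict per HSP from a BLAST XML file path or binary file object."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue  # wait until a whole query result has been read
        query = elem.findtext("Iteration_query-def")
        for hit in elem.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                yield {
                    "query": query,
                    "hit_id": hit.findtext("Hit_id"),
                    "hit_def": hit.findtext("Hit_def"),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    "bit_score": float(hsp.findtext("Hsp_bit-score")),
                    "identity": int(hsp.findtext("Hsp_identity")),
                    "align_len": int(hsp.findtext("Hsp_align-len")),
                }
        elem.clear()  # release memory once a query has been processed


def write_csv(rows, out_handle):
    """Write row dicts to an open text file handle as CSV with a header."""
    writer = csv.DictWriter(out_handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```
</preformat>
                </p>
                <p>A typical call would open an output file with newline="" and pass it to write_csv together with blast_xml_to_rows applied to the path of one BLASTxml file. Parsing one query at a time keeps memory use low for large result files.</p>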
                <p>Once we have the read counts of genes associated with specific functions, we can compare results across samples. Gene counts should first be normalized to account for variation in sequencing depths (Pereira 
                    <italic toggle="yes">et al.</italic> 2018). One widely used method is relative log expression (RLE), which calculates a scaling factor for each sample as the median ratio of its gene counts to the geometric mean of each gene across samples. RLE can be implemented using the DESeq2 R package (
                    <xref ref-type="bibr" rid="ref27">Love 
                        <italic toggle="yes">et al.</italic> 2014</xref>), and can be used to identify genes that are differentially abundant between groups (such as sites, or soil horizons).</p>
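                <p>To make the normalization concrete, the following sketch computes RLE-style (median-of-ratios) scaling factors in plain Python. It is a simplified illustration under stated assumptions rather than the DESeq2 implementation: counts are non-negative, genes are listed in the same order for every sample, genes with a zero count in any sample are skipped, and at least one gene is non-zero in all samples. The function names and example counts are hypothetical.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
import math
import statistics


def rle_size_factors(counts):
    """Return a median-of-ratios scaling factor for each sample.

    counts maps a sample name to a list of gene counts, with genes in
    the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if min(values) == 0:
            continue  # geometric mean would be zero, so skip this gene
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][g]) - log_geo_mean)
    return {s: math.exp(statistics.median(log_ratios[s])) for s in samples}


def normalize(counts):
    """Divide each sample's counts by its scaling factor."""
    factors = rle_size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Made-up counts: sample B was sequenced about twice as deeply as sample A
example_counts = {"A": [10, 20, 40, 0], "B": [20, 40, 80, 5]}

if __name__ == "__main__":
    print(rle_size_factors(example_counts))
    print(normalize(example_counts))
```
</preformat>
                </p>
                <p>In this toy example, the first three genes have identical normalized counts in both samples, because the twofold difference in depth is absorbed by the scaling factors.</p>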
                <p>For our two test samples, we can plot the outputs from each BLASTp search (
                    <xref ref-type="fig" rid="f5">Figure 5</xref>). Among antibiotic resistance genes, we can look at trends for specific types of antibiotics. Tetracycline resistance, for example, has become widespread in soil bacteria, possibly linked to intensive farming (
                    <xref ref-type="bibr" rid="ref39">Schmitt 
                        <italic toggle="yes">et al.</italic> 2006</xref>). For a subset of tetracycline-resistance genes, normalized abundances appear higher in the sample from NEON&#x2019;s WOOD site (
                    <xref ref-type="fig" rid="f5">Figure 5A</xref>). For our nitrogen-cycling genes, we can subset to those associated with organic synthesis and degradation. For these genes, we see a similar pattern, with higher normalized abundances in the sample from the WOOD site (
                    <xref ref-type="fig" rid="f5">Figure 5B</xref>). However, the SCBI sample had a lower sequencing depth overall (
                    <xref ref-type="fig" rid="f2">Figure 2B</xref>), which can prevent the detection of low-abundance genes.
                    <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                        <label>Figure 5. </label>
                        <caption>
                            <title>Log
                                <sub>10</sub> normalized counts from a BLASTp search of Open Reading Frames (ORFs) within contigs from two shotgun metagenomic samples.</title>
                            <p>Contigs were assembled using Megahit (
                                <xref ref-type="bibr" rid="ref26">Li 
                                    <italic toggle="yes">et al</italic>. 2016</xref>), and ORFs were predicted using Prodigal (
                                <xref ref-type="bibr" rid="ref17">Hyatt 
                                    <italic toggle="yes">et al.</italic> 2010</xref>). These samples are a subset of the full NEON shotgun metagenomics dataset (NEON DP1.10107.001). A) BLASTp hits for a search against the Comprehensive Antibiotic Resistance Database (CARD) (
                                <xref ref-type="bibr" rid="ref1">Alcock 
                                    <italic toggle="yes">et al.</italic> 2020</xref>). Tetracycline resistance genes are defined as CARD entries with the word &#x201c;tetracycline&#x201d; in their description and &#x201c;tet&#x201d; in their name. B) BLASTp hits for a search against NCycDB (
                                <xref ref-type="bibr" rid="ref42">Tu 
                                    <italic toggle="yes">et al.</italic> 2019</xref>). Genes are subset to those belonging to &#x201c;Organic degradation and synthesis&#x201d; pathways.</p>
                        </caption>
                        <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/54670/5bf2a711-101f-4387-a1b4-a8d89564b1e4_figure5.gif"/>
                    </fig>
                </p>
                <p>
                    <bold>7. Exporting to KBase for binning</bold>
                </p>
                <p>The outputs from this pipeline can be further analyzed using the KBase platform, developed by the U.S. Department of Energy for microbiome analysis (
                    <xref ref-type="bibr" rid="ref6">Arkin 
                        <italic toggle="yes">et al.</italic> 2018</xref>). KBase links hundreds of different software tools using an online interface, which allows users to create &#x201c;Narratives&#x201d; for specific data analysis projects. Individual files can be uploaded to KBase directly, or they can be transferred in batches using Globus Online (
                    <xref ref-type="bibr" rid="ref16">Foster 2011</xref>).</p>
                <p>For example, a KBase Narrative (
                    <xref ref-type="fig" rid="f6">Figure 6</xref>) could be used to create Metagenome-Assembled Genomes (MAGs). Because MAGs are created directly from contigs, rather than from microbes grown in an experimental setting, they often have no cultured relatives, representing a hidden source of genetic diversity in the microbiome (
                    <xref ref-type="bibr" rid="ref33">Nayfach 
                        <italic toggle="yes">et al.</italic> 2020</xref>). KBase includes a variety of tools for creating MAGs, each using different algorithms, and outputs from multiple tools can be synthesized using a program called DAS Tool (
                    <xref ref-type="bibr" rid="ref41">Sieber 
                        <italic toggle="yes">et al.</italic> 2018</xref>). For each putative genome, or &#x201c;bin,&#x201d; summary statistics are produced that estimate the completeness and possible contamination of the genome, using a set of genes that are expected to be &#x201c;single-copy&#x201d; within a genome (
                    <xref ref-type="bibr" rid="ref41">Sieber 
                        <italic toggle="yes">et al.</italic> 2018</xref>).
                    <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                        <label>Figure 6. </label>
                        <caption>
                            <title>Creating and evaluating Metagenome-Assembled Genomes (MAGs) using the KBase Narrative interface (
                                <xref ref-type="bibr" rid="ref6">Arkin 
                                    <italic toggle="yes">et al.</italic> 2018</xref>).</title>
                            <p>First, quality-controlled sequencing reads and assembled contigs are imported using upload modules. Then, contigs are binned into putative genomes (or &#x201c;bins&#x201d;) using MaxBin2 (
                                <xref ref-type="bibr" rid="ref49">Wu 
                                    <italic toggle="yes">et al.</italic> 2016</xref>), MetaBAT2 (
                                <xref ref-type="bibr" rid="ref23">Kang 
                                    <italic toggle="yes">et al.</italic> 2019</xref>), and CONCOCT (
                                <xref ref-type="bibr" rid="ref3">Alneberg 
                                    <italic toggle="yes">et al.</italic> 2014</xref>). Finally, DAS Tool (
                                <xref ref-type="bibr" rid="ref41">Sieber 
                                    <italic toggle="yes">et al.</italic> 2018</xref>) is used to choose the highest-quality bins.</p>
                        </caption>
                        <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/54670/5bf2a711-101f-4387-a1b4-a8d89564b1e4_figure6.gif"/>
                    </fig>
                </p>
                <p>In our example Narrative, we combine the output from three tools, MaxBin2 (
                    <xref ref-type="bibr" rid="ref49">Wu 
                        <italic toggle="yes">et al.</italic> 2016</xref>), MetaBAT2 (
                    <xref ref-type="bibr" rid="ref23">Kang 
                        <italic toggle="yes">et al.</italic> 2019</xref>), and CONCOCT (
                    <xref ref-type="bibr" rid="ref3">Alneberg 
                        <italic toggle="yes">et al.</italic> 2014</xref>). As inputs, we use the contigs assembled in Step 4 of this pipeline, as well as the quality-controlled sequencing reads from Step 2, for the sample WOOD_002-M-20140925-COMP. For this sample, DAS Tool produces one genome, 
                    <italic toggle="yes">bin.001,</italic> which is less than 27% complete. Bins can be further refined manually, and genomes that are more than 90% complete with less than 5% contamination may be good candidates for submission to public databases (Bowers 
                    <italic toggle="yes">et al.</italic> 2017). High-quality MAGs can uncover entirely new lineages in the microbial tree of life (
                    <xref ref-type="bibr" rid="ref33">Nayfach 
                        <italic toggle="yes">et al.</italic> 2020</xref>).</p>
                <p>
                    <bold>Troubleshooting, tips and tricks</bold>
                </p>
                <p>For any rule, if not all files are processed, the step can be repeated by first unlocking the working directory with --unlock and then re-running with --rerun-incomplete, e.g.:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">sunbeam run -- --configfile sunbeam_config.yml clean_qc --rerun-incomplete --unlock
sunbeam run -- --configfile sunbeam_config.yml clean_qc --rerun-incomplete</preformat>
                </p>
                <p>To customize or expand on the workflow above, it is helpful to know the basic logic of Snakemake, which is the underlying framework for the Sunbeam pipeline. Snakemake relies on a series of rules, which specify input files, output files, and any necessary commands. When a rule is called, Snakemake works backwards from the output files to decide if any input files are missing or outdated, and tries to re-run rules as needed. If you want to add an extension to Sunbeam, a full guide is available in the 
                    <ext-link ext-link-type="uri" xlink:href="https://sunbeam.readthedocs.io/en/latest/extensions.html">Sunbeam documentation</ext-link>.</p>
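                <p>The backward-working logic described above can be illustrated with a toy model. The sketch below is not Snakemake code and ignores most of what Snakemake actually checks; it assumes that each output is built from a fixed list of inputs, that raw input files exist, and that file modification times alone decide whether an output is outdated. All names are hypothetical.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```python
import os


def needs_rerun(output_path, input_paths):
    """Return True if the output is missing or older than any input."""
    if not os.path.exists(output_path):
        return True
    output_time = os.path.getmtime(output_path)
    return any(os.path.getmtime(p) > output_time for p in input_paths)


def plan(target, rules):
    """List the outputs to rebuild, upstream first, to bring target up to date.

    rules maps each output path to the list of input paths it is built
    from; paths that are not keys are treated as raw inputs.
    """
    order = []

    def visit(path):
        inputs = rules.get(path)
        if inputs is None:
            return False  # raw input: nothing to build
        upstream_rebuilt = [visit(p) for p in inputs]
        if any(upstream_rebuilt) or needs_rerun(path, inputs):
            if path not in order:
                order.append(path)
            return True
        return False

    visit(target)
    return order
```
</preformat>
                </p>
                <p>For example, with rules of the form {"contigs.fa": ["clean.fastq"], "clean.fastq": ["raw.fastq"]}, requesting "contigs.fa" when only "raw.fastq" exists would return both downstream files in build order, which mirrors how requesting a late Sunbeam target triggers the earlier steps it depends on.</p>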
                <p>To scale up to a larger dataset, a significant amount of computational power will be necessary, ideally with 8 or more cores for parallel computation. For those without access to institutional high-performance clusters, the scientific computing platform CyVerse (Merchant 
                    <italic toggle="yes">et al.</italic> 2016) offers free computational and storage resources. Note that intermediate files are generated for multiple steps, which can multiply the amount of storage needed for each metagenomic sample. Deleting these intermediate files when a step has completed will reduce the storage requirements.</p>
            </sec>
        </sec>
        <sec id="sec6">
            <title>Data availability</title>
            <p>Raw metagenomics sequencing data is published as DP1.10107.001 from the National Ecological Observatory Network (
                <ext-link ext-link-type="uri" xlink:href="https://data.neonscience.org/data-products/explore">https://data.neonscience.org/data-products/explore</ext-link>). All other data is previously published and cited throughout the paper.</p>
        </sec>
        <sec id="sec7">
            <title>Software availability</title>
            <p>Bioconductor packages available at 
                <ext-link ext-link-type="uri" xlink:href="https://www.bioconductor.org/">https://www.bioconductor.org/</ext-link>. CRAN packages available at 
                <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/">https://cran.r-project.org/</ext-link>. Sunbeam software available at 
                <ext-link ext-link-type="uri" xlink:href="https://sunbeam.readthedocs.io">https://sunbeam.readthedocs.io</ext-link>.</p>
            <p>Scripts to download NEON raw data, as well as process final BLASTxml files, are hosted at 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/zoey-rw/metagenomes_NEON">https://github.com/zoey-rw/metagenomes_NEON</ext-link>.</p>
            <p>Archived scripts as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.4589528">http://doi.org/10.5281/zenodo.4589528</ext-link> (
                <xref ref-type="bibr" rid="ref47">Werbin 2021</xref>).</p>
            <p>License: Creative Commons Zero v1.0 Universal.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>This material is based in part upon work supported by the National Science Foundation through the National Ecological Observatory Network, which is operated under cooperative agreement by Battelle Memorial Institute. We also thank Michael Silverstein at Boston University for assistance with Python scripting.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alcock</surname>
                            <given-names>BP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Raphenya</surname>
                            <given-names>AR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lau</surname>
                            <given-names>TTY</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CARD 2020: Antibiotic resistome surveillance with the comprehensive antibiotic resistance database.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Research.</italic>
</source>
                    <year>2020</year>.
                    <pub-id pub-id-type="pmid">31665441</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkz935</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7145624</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Allison</surname>
                            <given-names>SD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Weihe</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Microbial abundance and composition influence litter decomposition response to environmental change.</article-title>
                    <source>

                        <italic toggle="yes">Ecology.</italic>
</source>
                    <year>2013</year>;<volume>94</volume>(<issue>3</issue>):<fpage>714</fpage>&#x2013;<lpage>725</lpage>.
                    <pub-id pub-id-type="pmid">23687897</pub-id>
                    <pub-id pub-id-type="doi">10.1890/12-1243.1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alneberg</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bjarnason</surname>
                            <given-names>BS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Bruijn</surname>
                            <given-names>I</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Binning metagenomic contigs by coverage and composition.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2014</year>;<volume>11</volume>(<issue>11</issue>):<fpage>1144</fpage>&#x2013;<lpage>1146</lpage>.
                    <pub-id pub-id-type="pmid">25218180</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.3103</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Andrews</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Krueger</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Segonds-Pichon</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>FastQC. A quality control tool for high throughput sequence data. Babraham Bioinformatics.</article-title>
                    <publisher-loc>Babraham Institute</publisher-loc>;<year>2015</year>.</mixed-citation>
            </ref>
            <ref id="ref5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Anwar</surname>
                            <given-names>MZ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lanzen</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bang-Andreasen</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>To assemble or not to resemble-A validated Comparative Metatranscriptomics Workflow (CoMW).</article-title>
                    <source>

                        <italic toggle="yes">GigaScience.</italic>
</source>
                    <year>2019</year>;<volume>8</volume>(<issue>8</issue>):<fpage>1</fpage>&#x2013;<lpage>10</lpage>.
                    <pub-id pub-id-type="pmid">31363751</pub-id>
                    <pub-id pub-id-type="doi">10.1093/gigascience/giz096</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6667343</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Arkin</surname>
                            <given-names>AP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cottingham</surname>
                            <given-names>RW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Henry</surname>
                            <given-names>CS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>KBase: The United States Department of Energy Systems Biology Knowledgebase.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2018</year>;<volume>36</volume>(<issue>7</issue>):<fpage>566</fpage>&#x2013;<lpage>569</lpage>.
                    <pub-id pub-id-type="pmid">29979655</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.4163</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6870991</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref51">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bahram</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hildebrand</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Forslund</surname>
                            <given-names>SK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anderson</surname>
                            <given-names>JL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soudzilovskaia</surname>
                            <given-names>NA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bodegom</surname>
                            <given-names>PM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Structure and function of the global topsoil microbiome.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2018</year>;<volume>560</volume>(<issue>7717</issue>):<fpage>233</fpage>&#x2013;<lpage>237</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41586-018-0386-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Banerji</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jahne</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Herrmann</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Bringing Community Ecology to Bear on the Issue of Antimicrobial Resistance.</article-title>
                    <source>

                        <italic toggle="yes">Front Microbiol.</italic>
</source>
                    <publisher-name>Frontiers Media S.A.</publisher-name>;<year>2019</year>;<volume>10</volume>: p.<fpage>15</fpage>.
                    <pub-id pub-id-type="pmid">31803161</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fmicb.2019.02626</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6872637</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bolger</surname>
                            <given-names>AM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lohse</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Usadel</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Trimmomatic: A flexible trimmer for Illumina sequence data.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2014</year>.
                    <pub-id pub-id-type="pmid">24695404</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btu170</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4103590</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Breitwieser</surname>
                            <given-names>FP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Salzberg</surname>
                            <given-names>SL</given-names>
                        </name>
</person-group>:
                    <article-title>Pavian: Interactive analysis of metagenomics data for microbiome studies and pathogen identification.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2020</year>;<volume>36</volume>(<issue>4</issue>):<fpage>1303</fpage>&#x2013;<lpage>1304</lpage>.
                    <pub-id pub-id-type="pmid">31553437</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btz715</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>ED</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wright</surname>
                            <given-names>GD</given-names>
                        </name>
</person-group>:
                    <article-title>Antibacterial drug discovery in the resistance era.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2016</year>;<volume>529</volume>(<issue>7586</issue>):<fpage>336</fpage>&#x2013;<lpage>343</lpage>.
                    <pub-id pub-id-type="pmid">26791724</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature17042</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Clarke</surname>
                            <given-names>EL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Taylor</surname>
                            <given-names>LJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Sunbeam: An extensible pipeline for analyzing metagenomic sequencing experiments.</article-title>
                    <source>

                        <italic toggle="yes">Microbiome.</italic>
</source>
                    <year>2019</year>;<volume>7</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>13</lpage>.
                    <pub-id pub-id-type="pmid">30902113</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s40168-019-0658-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6429786</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cole</surname>
                            <given-names>JR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fish</surname>
                            <given-names>JA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Ribosomal Database Project: Data and tools for high throughput rRNA analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2014</year>.
                    <pub-id pub-id-type="pmid">24288368</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkt1244</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3965039</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Donovan</surname>
                            <given-names>PD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gonzalez</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Higgins</surname>
                            <given-names>DG</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Identification of fungi in shotgun metagenomics datasets.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2018</year>;<volume>13</volume>(<issue>2</issue>):<fpage>1</fpage>&#x2013;<lpage>16</lpage>.
                    <pub-id pub-id-type="pmid">29444186</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0192898</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5812651</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Edwards</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Edwards</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>Fastq-pair: efficient synchronization of paired-end fastq files.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2019</year>;<fpage>552885</fpage>.
                    <pub-id pub-id-type="doi">10.1101/552885</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>M&#x00f6;lder</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jablonski</surname>
                            <given-names>KP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Letcher</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Sustainable data analysis with Snakemake.</article-title>
                    <year>2020</year>;<fpage>1</fpage>&#x2013;<lpage>16</lpage>.
                    <pub-id pub-id-type="doi">10.12688/f1000research.29032.1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Foster</surname>
                            <given-names>I</given-names>
                        </name>
</person-group>:
                    <article-title>Globus online: Accelerating and democratizing science through cloud-based services.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Internet Computing.</italic>
</source>
                    <year>2011</year>;<volume>15</volume>(<issue>3</issue>):<fpage>70</fpage>&#x2013;<lpage>73</lpage>.
                    <pub-id pub-id-type="doi">10.1109/MIC.2011.64</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hyatt</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>GL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>LoCascio</surname>
                            <given-names>PF</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Prodigal: Prokaryotic gene recognition and translation initiation site identification.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2010</year>.
                    <pub-id pub-id-type="pmid">20211023</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-11-119</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2848648</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <mixed-citation publication-type="book">
                    <collab>Illumina</collab>:
                    <article-title>Quality Scores.</article-title>
                    <source>

                        <italic toggle="yes">Technical Note: Informatics.</italic>
</source>
                    <year>2014</year>:<fpage>1</fpage>&#x2013;<lpage>2</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.illumina.com/documents/products/technotes/technote_understanding_quality_scores.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <mixed-citation publication-type="other">
                    <collab>Illumina</collab>:
                    <article-title>iGenomes.</article-title>
                    <year>n.d.</year> Retrieved October 12, 2020.
                    <ext-link ext-link-type="uri" xlink:href="https://support.illumina.com/sequencing/sequencing_software/igenome.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jones</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>NEON Educational Resources for Online Teaching.</article-title>
                    <source>

                        <italic toggle="yes">NEON Observatory Blog.</italic>
</source>
                    <year>2020</year>.</mixed-citation>
            </ref>
            <ref id="ref21">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jonsson</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>&#x00d6;sterlund</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nerman</surname>
                            <given-names>O</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Variability in Metagenomic Count Data and Its Influence on the Identification of Differentially Abundant Genes.</article-title>
                    <source>

                        <italic toggle="yes">J Comput Biol.</italic>
</source>
                    <year>2017</year>;<volume>24</volume>(<issue>4</issue>):<fpage>311</fpage>&#x2013;<lpage>326</lpage>.
                    <pub-id pub-id-type="pmid">27892712</pub-id>
                    <pub-id pub-id-type="doi">10.1089/cmb.2016.0180</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kalantar</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Carvalho</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>de Bourcy</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>IDseq &#x2013; An Open Source Cloud-based Pipeline and Analysis Service for Metagenomic Pathogen Detection and Monitoring.</article-title>
                    <year>2020</year>.
                    <pub-id pub-id-type="doi">10.1101/2020.04.07.030551</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>DD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kirton</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MetaBAT 2: An adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies.</article-title>
                    <source>

                        <italic toggle="yes">PeerJ.</italic>
</source>
                    <year>2019</year>;<volume>2019</volume>(<issue>7</issue>):<fpage>1</fpage>&#x2013;<lpage>13</lpage>.
                    <pub-id pub-id-type="pmid">31388474</pub-id>
                    <pub-id pub-id-type="doi">10.7717/peerj.7359</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6662567</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Keller</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schimel</surname>
                            <given-names>DS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hargrove</surname>
                            <given-names>WW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A continental strategy for the National Ecological Observatory Network.</article-title>
                    <source>

                        <italic toggle="yes">Front Ecol Environ.</italic>
</source>
                    <year>2008</year>;<volume>6</volume>(<issue>5</issue>):<fpage>282</fpage>&#x2013;<lpage>284</lpage>.
                    <pub-id pub-id-type="doi">10.1890/1540-9295(2008)6[282:ACSFTN]2.0.CO;2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref52">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>K&#x00f6;ster</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rahmann</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Snakemake-a scalable bioinformatics workflow engine.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2012</year>;<volume>28</volume>(<issue>19</issue>):<fpage>2520</fpage>&#x2013;<lpage>2522</lpage>.</mixed-citation>
            </ref>
            <ref id="ref25">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ladoukakis</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kolisis</surname>
                            <given-names>FN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chatziioannou</surname>
                            <given-names>AA</given-names>
                        </name>
</person-group>:
                    <article-title>Integrative workflows for metagenomic analysis.</article-title>
                    <source>

                        <italic toggle="yes">Front Cell Dev Biol.</italic>
</source>
                    <year>2014</year>;<volume>2</volume>(<issue>NOV</issue>):<fpage>1</fpage>&#x2013;<lpage>11</lpage>.
                    <pub-id pub-id-type="pmid">25478562</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fcell.2014.00070</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4237130</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Luo</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>CM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices.</article-title>
                    <source>

                        <italic toggle="yes">Methods.</italic>
</source>
                    <year>2016</year>.
                    <pub-id pub-id-type="doi">10.1016/j.ymeth.2016.02.020</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Love</surname>
                            <given-names>MI</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anders</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>12</issue>):<fpage>1</fpage>&#x2013;<lpage>21</lpage>.
                    <pub-id pub-id-type="pmid">25516281</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4302049</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lynch</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Streamlining and simplification of microbial genome architecture.</article-title>
                    <source>

                        <italic toggle="yes">Annu Rev Microbiol.</italic>
</source>
                    <year>2006</year>;<volume>60</volume>:<fpage>327</fpage>&#x2013;<lpage>349</lpage>.
                    <pub-id pub-id-type="pmid">16824010</pub-id>
                    <pub-id pub-id-type="doi">10.1146/annurev.micro.60.080805.142300</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Martin</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Cutadapt removes adapter sequences from high-throughput sequencing reads.</article-title>
                    <source>

                        <italic toggle="yes">EMBnet.journal.</italic>
</source>
                    <year>2010</year>.
                    <pub-id pub-id-type="doi">10.14806/ej.17.1.200</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref53">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mikheenko</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Saveliev</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gurevich</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>MetaQUAST: Evaluation of metagenome assemblies.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2016</year>;<volume>32</volume>(<issue>7</issue>):<fpage>1088</fpage>&#x2013;<lpage>1090</lpage>.</mixed-citation>
            </ref>
            <ref id="ref30">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mukherjee</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huntemann</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ivanova</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Large-scale contamination of microbial isolate genomes by Illumina PhiX control.</article-title>
                    <source>

                        <italic toggle="yes">Stand Genomic Sci.</italic>
</source>
                    <year>2015</year>;<volume>10</volume>(<issue>APRIL2015</issue>):<fpage>1</fpage>&#x2013;<lpage>4</lpage>.
                    <pub-id pub-id-type="pmid">26203331</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1944-3277-10-18</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4511556</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nasko</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Koren</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Phillippy</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2018</year>;<volume>19</volume>(<issue>1</issue>):<fpage>165</fpage>.
                    <pub-id pub-id-type="pmid">30373669</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-018-1554-6</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6206640</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <mixed-citation publication-type="other">
                    <collab>National Ecological Observatory Network</collab>:
                    <article-title>Soil shotgun metagenomes (DP1.10107.001) RELEASE-2021.</article-title>
                    <year>Feb 8, 2021</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://data.neonscience.org">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nayfach</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Roux</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Seshadri</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A genomic catalog of Earth&#x2019;s microbiomes.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2020</year>.
                    <pub-id pub-id-type="pmid">33169036</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41587-020-0718-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>O&#x2019;Leary</surname>
                            <given-names>NA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wright</surname>
                            <given-names>MW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brister</surname>
                            <given-names>JR</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2016</year>.
                    <pub-id pub-id-type="pmid">26553804</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkv1189</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4702849</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref54">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pearman</surname>
                            <given-names>WS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Freed</surname>
                            <given-names>NE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Silander</surname>
                            <given-names>OK</given-names>
                        </name>
</person-group>:
                    <article-title>Testing the advantages and disadvantages of short- and long-read eukaryotic metagenomics using simulated reads.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2020</year>;<volume>21</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>15</lpage>.</mixed-citation>
            </ref>
            <ref id="ref35">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>P&#x00e9;rez-Cobas</surname>
                            <given-names>AE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gomez-Valero</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buchrieser</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Metagenomic approaches in microbial ecology: an update on whole-genome and marker gene sequencing analyses.</article-title>
                    <source>

                        <italic toggle="yes">Microb Genom.</italic>
</source>
                    <year>2020</year>;<volume>6</volume>(<issue>8</issue>).
                    <pub-id pub-id-type="pmid">32706331</pub-id>
                    <pub-id pub-id-type="doi">10.1099/mgen.0.000409</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7641418</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>P&#x00e9;rez-Cobas</surname>
                            <given-names>AE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gomez-Valero</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buchrieser</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Metagenomic approaches in microbial ecology: an update on whole-genome and marker gene sequencing analyses.</article-title>
                    <source>

                        <italic toggle="yes">Microb Genom.</italic>
</source>
                    <year>2020</year>;<volume>6</volume>(<issue>8</issue>).
                    <pub-id pub-id-type="pmid">32706331</pub-id>
                    <pub-id pub-id-type="doi">10.1099/mgen.0.000409</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7641418</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Quast</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pruesse</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yilmaz</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2013</year>;<volume>41</volume>(<issue>D1</issue>):<fpage>590</fpage>&#x2013;<lpage>596</lpage>.
                    <pub-id pub-id-type="pmid">23193283</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gks1219</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3531112</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Quince</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Walker</surname>
                            <given-names>AW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simpson</surname>
                            <given-names>JT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Shotgun metagenomics, from sampling to analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2017</year>;<volume>35</volume>(<issue>9</issue>):<fpage>833</fpage>&#x2013;<lpage>844</lpage>.
                    <pub-id pub-id-type="pmid">28898207</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3935</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmitt</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stoob</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hamscher</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Tetracyclines and tetracycline resistance in agricultural soils: Microcosm and field studies.</article-title>
                    <source>

                        <italic toggle="yes">Microb Ecol.</italic>
</source>
                    <year>2006</year>;<volume>51</volume>(<issue>3</issue>):<fpage>267</fpage>&#x2013;<lpage>276</lpage>.
                    <pub-id pub-id-type="pmid">16598633</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s00248-006-9035-y</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref40">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sczyrba</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hofmann</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Belmann</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Critical Assessment of Metagenome Interpretation - A benchmark of metagenomics software.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>11</issue>):<fpage>1063</fpage>&#x2013;<lpage>1071</lpage>.
                    <pub-id pub-id-type="pmid">28967888</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4458</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5903868</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref41">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sieber</surname>
                            <given-names>CMK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Probst</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sharrar</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy.</article-title>
                    <source>

                        <italic toggle="yes">Nat Microbiol.</italic>
</source>
                    <year>2018</year>;<volume>3</volume>(<issue>7</issue>):<fpage>836</fpage>&#x2013;<lpage>843</lpage>.
                    <pub-id pub-id-type="pmid">29807988</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41564-018-0171-1</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6786971</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tu</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>NCycDB: A curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2019</year>;<volume>35</volume>(<issue>6</issue>):<fpage>1040</fpage>&#x2013;<lpage>1048</lpage>.
                    <pub-id pub-id-type="pmid">30165481</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bty741</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <mixed-citation publication-type="other">
                    <collab>US Long Term Ecological Research Network</collab>:
                    <article-title>LTER Sites.</article-title>
                    <year>n.d.</year> Retrieved October 13, 2020.
                    <ext-link ext-link-type="uri" xlink:href="https://lternet.edu/site/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Doak</surname>
                            <given-names>TG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ye</surname>
                            <given-names>Y</given-names>
                        </name>
</person-group>:
                    <article-title>Subtractive assembly for comparative metagenomics, and its application to type 2 diabetes metagenomes.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2015</year>;<volume>16</volume>(<issue>1</issue>).
                    <pub-id pub-id-type="pmid">26527161</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-015-0804-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4630832</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Waring</surname>
                            <given-names>BG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Averill</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hawkes</surname>
                            <given-names>CV</given-names>
                        </name>
</person-group>:
                    <article-title>Differences in fungal and bacterial physiology alter soil carbon and nitrogen cycling: Insights from meta-analysis and theoretical models.</article-title>
                    <source>

                        <italic toggle="yes">Ecol Lett.</italic>
</source>
                    <year>2013</year>;<volume>16</volume>(<issue>7</issue>):<fpage>887</fpage>&#x2013;<lpage>894</lpage>.
                    <pub-id pub-id-type="pmid">23692657</pub-id>
                    <pub-id pub-id-type="doi">10.1111/ele.12125</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Weder</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jensen</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>c.</article-title>
                    <source>

                        <italic toggle="yes">J Am Acad Child Adol Psych.</italic>
</source>
                    <year>2014</year>;<volume>53</volume>(<issue>4</issue>):<fpage>163</fpage>&#x2013;<lpage>178</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.jaac.2013.12.025</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref47">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Werbin</surname>
                            <given-names>Z</given-names>
                        </name>
</person-group>:
                    <article-title>zoey-rw/metagenomes_NEON: Adding license (Version v1.0.1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2021, March 8</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.4589528</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref48">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wood</surname>
                            <given-names>DE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Langmead</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Improved metagenomic analysis with Kraken 2.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2019</year>.
                    <pub-id pub-id-type="pmid">31779668</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-019-1891-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6883579</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref49">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>YW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simmons</surname>
                            <given-names>BA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Singer</surname>
                            <given-names>SW</given-names>
                        </name>
</person-group>:
                    <article-title>MaxBin 2.0: An automated binning algorithm to recover genomes from multiple metagenomic datasets.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2016</year>;<volume>32</volume>(<issue>4</issue>):<fpage>605</fpage>&#x2013;<lpage>607</lpage>.
                    <pub-id pub-id-type="pmid">26515820</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv638</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref50">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Average gene length is highly conserved in prokaryotes and eukaryotes and diverges only between the two kingdoms.</article-title>
                    <source>

                        <italic toggle="yes">Mol Biol Evol.</italic>
</source>
                    <year>2006</year>;<volume>23</volume>(<issue>6</issue>):<fpage>1107</fpage>&#x2013;<lpage>1108</lpage>.
                    <pub-id pub-id-type="pmid">16611645</pub-id>
                    <pub-id pub-id-type="doi">10.1093/molbev/msk019</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report84561">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.54670.r84561</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Nelson</surname>
                        <given-names>William</given-names>
                    </name>
                    <xref ref-type="aff" rid="r84561a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-1873-3929</uri>
                </contrib>
                <aff id="r84561a1">
                    <label>1</label>Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>19</day>
                <month>7</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Nelson W</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport84561" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.51494.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>Rationale:</bold>
            </p>
            <p> My main question is who is the audience for this pipeline? Is this intended to be used by students to learn some metagenomic analysis and how the NEON data set can be interrogated? Or is this intended to be used by researchers, in which case I think the downstream annotation and analysis components are somewhat thin. Is this officially recognized by NEON as a standard pipeline that will enable comparison between analyses? I don't wish to sound dismissive, but this reads like a Yet-Another-Metagenomics-Pipeline paper, which on one hand is fine - there's nothing technically or scientifically wrong with it - but this would be a more impactful report if the purpose behind it was more strongly presented.</p>
            <p> </p>
            <p> 
                <bold>Description:</bold>
            </p>
            <p> There is nothing wrong with the description of the various steps, but the descriptions are superficial. There is little discussion of why the methods were chosen and what their strengths and weaknesses are.</p>
            <p> </p>
            <p> 
                <bold>Replication:</bold>
            </p>
            <p> The code blocks are great, but the formatting rendered incorrectly in my browser (Firefox) - newlines were not present, making it hard to interpret what the actual commands are. Also, I tried to follow along with those commands on our institutional computing cluster and got stuck on the installation of sunbeam. I was able to install sunbeam on my desktop server, but the test of the install failed. I went ahead and tried to follow the analysis anyway, but ran into multiple problems. Just a caveat that providing the commands doesn't ensure replicability.&#x00a0;</p>
            <p> </p>
            <p> 
                <bold>A few other comments:</bold>
            </p>
            <p> End of Dataset description: "TOS Science Design for Terrestrial Microbial Diversity, NEON.DOC.000908" - What is this?</p>
            <p> </p>
            <p> The comment about miniconda, "this command may work", is likely to be confusing. Might be best just to say that anaconda is required and to talk to local IT about its availability and how to use it.</p>
            <p> </p>
            <p> The transition between section 1.2 and 2 should make it clearer that section 1.2 was describing constructing the configuration file and sections 2 through 5 are describing the individual steps that make up the sunbeam pipeline. As it reads now, it could be interpreted that the QC step is subsequent to the sunbeam run.</p>
            <p> </p>
            <p> Is section 4.1b missing a code block?</p>
            <p> </p>
            <p> I did not understand what you meant by "We use the homolog protein genes to construct our reference database." in section 5.</p>
            <p> </p>
            <p> The Bowers 2017 reference appears to be missing from the bibliography.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the method technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>I have 20 years of experience performing microbial genomic and metagenomic analysis, including assembly, binning and annotation.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment7484-84561">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Werbin</surname>
                            <given-names>Zoey</given-names>
                        </name>
                        <aff>Boston University, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>22</day>
                    <month>11</month>
                    <year>2021</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>My main question is who is the audience for this pipeline? Is this intended to be used by students to learn some metagenomic analysis and how the NEON data set can be interrogated? Or is this intended to be used by researchers, in which case I think the downstream annotation and analysis components are somewhat thin. Is this officially recognized by NEON as a standard pipeline that will enable comparison between analyses? I don't wish to sound dismissive, but this reads like a Yet-Another-Metagenomics-Pipeline paper, which on one hand is fine - there's nothing technically or scientifically wrong with it - but this would be a more impactful report if the purpose behind it was more strongly presented.</italic>
                            </p>
                        </list-item>
                    </list> Thank you for identifying these deficiencies within the manuscript. Our intended audience is both students and researchers working with NEON soil metagenomes. We have stated this explicitly in the last paragraph of the Introduction to the article, and strengthened each section of the paper to increase its value to these groups. Specifically, we have added subsections titled "Background and Rationale" and "Considerations for NEON data" to each analysis section. We plan to submit this revised manuscript for inclusion as a NEON community resource.&#x00a0; 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>There is nothing wrong with the description of the various steps, but the descriptions are superficial. There is little discussion of why the methods were chosen and what their strengths and weaknesses are.</italic>
                            </p>
                        </list-item>
                    </list> Each step has now been supplemented with descriptions of our preferred methods as well as the strengths and weaknesses of alternative methods (in "Background and Rationale"). We describe which methods have or have not been benchmarked or optimized specifically for soil metagenomes, as well as their usefulness for the NEON dataset given the properties of the data (in "Considerations for NEON data"). 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>The code blocks are great, but the formatting rendered incorrectly in my browser (Firefox) - newlines were not present, making it hard to interpret what the actual commands are. Also, I tried to follow along with those commands on our institutional computing cluster and got stuck on the installation of sunbeam. I was able to install sunbeam on my desktop server, but the test of the install failed. I went ahead and tried to follow the analysis anyway, but ran into multiple problems. Just a caveat that providing the commands doesn't ensure replicability.&#x00a0;</italic>
                            </p>
                        </list-item>
                    </list> Great points. In response to this and to the comments of Reviewer #1, we have adjusted our specific bioinformatic methods to address Sunbeam installation issues. We now recommend the stable branch of the metaGEM pipeline, which has run successfully in multiple Linux environments. The code blocks have all been shortened to improve readability and cross-browser formatting. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>End of Dataset description: "TOS Science Design for Terrestrial Microbial Diversity, NEON.DOC.000908" - What is this?</italic>
                            </p>
                        </list-item>
                    </list> The citation for this sampling protocol document has been changed to "Stanish &amp; Parnell, 2018", with the full protocol version information within the Works Cited. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>The comment about miniconda, "this command may work", is likely to be confusing. Might be best just to say that anaconda is required and to talk to local IT about its availability and how to use it.</italic>
                            </p>
                        </list-item>
                    </list> The sentence on miniconda requirements has been revised to point readers to their system administrators. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>The transition between section 1.2 and 2 should make it clearer that section 1.2 was describing constructing the configuration file and sections 2 through 5 are describing the individual steps that make up the sunbeam pipeline. As it reads now, it could be interpreted that the QC step is subsequent to the sunbeam run.</italic>
                            </p>
                        </list-item>
                    </list> This recommendation is no longer relevant, given our shift in methods and manuscript organization. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>Is section 4.1b missing a code block?</italic>
                            </p>
                        </list-item>
                    </list> This section is no longer present, given our shift in methods and manuscript organization. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>I did not understand what you meant by "We use the homolog protein genes to construct our reference database." in section 5.</italic>
                            </p>
                        </list-item>
                    </list> This section is no longer present, given our shift in methods and manuscript organization. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>The Bowers 2017 reference appears to be missing from the bibliography.</italic>
                            </p>
                        </list-item>
                    </list> This reference has been added to the bibliography.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report83581">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.54670.r83581</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zimmerman</surname>
                        <given-names>Naupaka</given-names>
                    </name>
                    <xref ref-type="aff" rid="r83581a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2168-6390</uri>
                </contrib>
                <aff id="r83581a1">
                    <label>1</label>Department of Biology, University of San Francisco, San Francisco, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>28</day>
                <month>6</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Zimmerman N</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport83581" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.51494.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This is a timely and valuable contribution that has the potential to aid in the use of NEON data by a wider audience. The core approach (using Sunbeam, a Snakemake pipeline, to analyze NEON metagenomics data) seems like a good one, and will offer advantages to users who are not yet comfortable enough to develop their own such pipeline from scratch.</p>
            <p> </p>
            <p> While in general the approach is a good one and the need for the tool is real and well-articulated by the authors, there are a number of aspects that could be improved to maximize the value of this contribution. I will outline a few here, but I was unable to complete the full pipeline in my testing using the example data specified in the manuscript, and so I am not able to comment on all aspects of the pipeline at this time. I would be happy to do another review and assessment after hearing from the authors.</p>
            <p> </p>
            <p> I outline some suggestions below:</p>
            <p> </p>
            <p> In the last paragraph of the introduction, I would encourage the authors to revise this sentence: "The pipeline that we present here is designed to complement existing NEON educational resources, such that users without prior bioinformatics experience may use this dataset to learn about microbial communities within the soil." The background skills that are necessary to successfully understand and implement the approach outlined here are not trivial and I don't think it's exactly best suited for someone "without prior bioinformatics experience". I think such a user would more likely need a graphical interface that did not presume comfort with the *nix command line etc. I think the approach outlined here is a valuable contribution because it targets users who may have some comfort with programmatic and command-line approaches, but do not yet have the skill to develop a flexible pipeline themselves.</p>
            <p> </p>
            <p> In the methods section, first paragraph, I think I would revise to be more careful with tenses. In some cases the collection protocols will remain mostly unchanged (e.g. I don't think NEON is planning to add any core sites), but other things may change (the kits that they use, the sequencing depth or sequencer used, etc.). Since NEON is a 30-year project, it might help the manuscript's longevity if this paragraph were worded to reflect possible future methodological changes.</p>
            <p> </p>
            <p> I might encourage a mention or a suggestion that users use tmux or screen to run pipelines like this if they are connected to a remote server over something like ssh. If the connection drops during a many-hours-long pipeline, it can be quite frustrating.</p>
            <p> </p>
            <p> In step 1.2, why do you suggest the use of the develop branch of Sunbeam? Isn't that more likely to include breaking changes that will be overly challenging for the target audience? Perhaps this could be adjusted to use a stable branch or version, and the text could highlight the develop branch alternative for those willing to trade troubleshooting time in exchange for quicker access to more advanced features.</p>
            <p> </p>
            <p> For downloading the config file, it might be better to pull from an archival version of the file instead of the github version, or at the least include a version at a specific commit and not just the main branch, so that it remains stable. Otherwise either the code could break, or the authors would need to continually update the configuration to track with software changes.</p>
            <p> </p>
            <p> In my testing of the approach in the manuscript, I am unable to get past the tests that occur after the installation of Sunbeam (`bash tests/run_tests.bash`). The tests repeatedly fail with segmentation faults during either the megahit or kraken steps. This is on an Ubuntu 20.04 machine with lots of RAM/disk space/cores. I am not sure where the issue is, and I would consider myself reasonably able to troubleshoot such problems, so I am concerned that similar problems might arise and be too challenging for the target audience/user. I would be happy to work with the authors in more detail to resolve this problem (share log files, etc). I shall share them via a comment when I am able to.</p>
            <p> </p>
            <p> Overall, I think this is a valuable contribution that fills a need in the community and uses a good approach to do so. However, in its current form, I cannot successfully run the example code, even on the recommended sample files, and so I have concerns with the brittleness of the approach outlined. I'd encourage the authors to do some additional testing on other machines and settings, and/or build some more resilience into the installation walkthrough so that the average target user is able to make use of this contribution.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Environmental microbial ecology, including specific experience in bioinformatics and pipelines, and several years of experience working with large NEON sequencing datasets.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment7485-83581">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Werbin</surname>
                            <given-names>Zoey</given-names>
                        </name>
                        <aff>Boston University, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>22</day>
                    <month>11</month>
                    <year>2021</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Original reviewer comments are italicized.&#x00a0; 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>This is a timely and valuable contribution that has the potential to aid in the use of NEON data by a wider audience. The core approach (using Sunbeam, a Snakemake pipeline, to analyze NEON metagenomics data) seems like a good one, and will offer advantages to users who are not yet comfortable enough to develop their own such pipeline from scratch. While in general the approach is a good one and the need for the tool is real and well-articulated by the authors, there are a number of aspects that could be improved to maximize the value of this contribution. I will outline a few here, but I was unable to complete the full pipeline in my testing using the example data specified in the manuscript, and so I am not able to comment on all aspects of the pipeline at this time. I would be happy to do another review and assessment after hearing from the authors.</italic>
                            </p>
                            <p> </p>
                            <p> Thank you for highlighting the issues with the reproducibility of the pipeline we outlined. Due to the referenced issues with installing software, we have switched to a similar Snakemake pipeline (metaGEM) that has been tested on various computing systems. We describe this new pipeline in the "Implementation" section of the revised manuscript.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <italic>In the last paragraph of the introduction, I would encourage the authors to revise this sentence: "The pipeline that we present here is designed to complement existing NEON educational resources, such that users without prior bioinformatics experience may use this dataset to learn about microbial communities within the soil." The background skills that are necessary to successfully understand and implement the approach outlined here are not trivial and I don't think it's exactly best suited for someone "without prior bioinformatics experience". I think such a user would more likely need a graphical interface that did not presume comfort with the *nix command line etc. I think the approach outlined here is a valuable contribution because it targets users who may have some comfort with programmatic and command-line approaches, but do not yet have the skill to develop a flexible pipeline themselves.</italic>
                            </p>
                            <p> </p>
                            <p> This sentence has been revised to reflect that our audience is those with basic bioinformatics experience. Further, each section of the manuscript has been expanded to include a thorough description of the rationale for various decisions in the subsections "Background and Rationale" and "Considerations for NEON data", so that this can be a more useful introductory guide to soil metagenomics.&#x00a0;</p>
                        </list-item>
                        <list-item>
                            <p>
                                <italic>In the methods section, first paragraph, I think I would revise to be more careful with tenses. In some cases the collection protocols will remain mostly unchanged (e.g. I don't think NEON is planning to add any core sites), but other things may change (the kits that they use, the sequencing depth or sequencer used, etc.). Since NEON is a 30-year project, it might help the manuscript's longevity if this paragraph were worded to reflect possible future methodological changes.</italic>
                            </p>
                            <p> </p>
                            <p> Tenses in the "Dataset description" section have been modified to reflect that the reported sampling and sequencing protocols are accurate as of 2021. We state that this bioinformatics protocol is intended for short-read data specifically, and that NEON protocols may shift in the future.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <italic>I might encourage a mention or a suggestion that users use tmux or screen to run pipelines like this if they are connected to a remote server over something like ssh. If the connection drops during a many-hours-long pipeline, it can be quite frustrating.</italic>
                            </p>
                            <p> </p>
                            <p> We now reference tmux and screen in Implementation, within the sub-section "Local vs cluster analysis".</p>
                        </list-item>
                        <list-item>
                            <p>
                                <italic>In step 1.2, why do you suggest the use of the develop branch of Sunbeam? Isn't that more likely to include breaking changes that will be overly challenging for the target audience? Perhaps this could be adjusted to use a stable branch or version, and the text could highlight the develop branch alternative for those willing to trade troubleshooting time in exchange for quicker access to more advanced features.</italic>
                            </p>
                            <p> </p>
                            <p> Due to our shift in methods, we no longer use either the develop or stable branch of Sunbeam. At the time of writing, however, the develop branch had implemented a potential fix for the segmentation fault errors, although the fix did not resolve the errors on all operating systems. We hope the local and cluster options for running the metaGEM pipeline will also help reduce troubleshooting time.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <italic>For downloading the config file, it might be better to pull from an archival version of the file instead of the github version, or at the least include a version at a specific commit and not just the main branch, so that it remains stable. Otherwise either the code could break, or the authors would need to continually update the configuration to track with software changes.</italic>
                            </p>
                            <p> </p>
                            <p> With our shift from Sunbeam to metaGEM, we decided to remove the example configuration file. The configuration file that comes installed with metaGEM primarily needs file paths to be modified by the user, whereas most parameters can be left as-is. Throughout the text, we've bolded sentences that instruct the user to modify the configuration file paths.</p>
                        </list-item>
                    </list> 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <italic>In my testing of the approach in the manuscript, I am unable to get past the tests that occur after the installation of Sunbeam (`bash tests/run_tests.bash`). The tests repeatedly fail with segmentation faults during either the megahit or kraken steps. This is on an Ubuntu 20.04 machine with lots of RAM/disk space/cores. I am not sure where the issue is, and I would consider myself reasonably able to troubleshoot such problems, so I am concerned that similar problems might arise and be too challenging for the target audience/user. I would be happy to work with the authors in more detail to resolve this problem (share log files, etc). I shall share them via a comment when I am able to.</italic>
                            </p>
                        </list-item>
                    </list> 
                    <italic>Overall, I think this is a valuable contribution that fills a need in the community and uses a good approach to do so. However, in its current form, I cannot successfully run the example code, even on the recommended sample files, and so I have concerns with the brittleness of the approach outlined. I'd encourage the authors to do some additional testing on other machines and settings, and/or build some more resilience into the installation walkthrough so that the average target user is able to make use of this contribution.</italic>
                </p>
                <p> </p>
                <p> These are excellent points, and they led to a dramatic shift in the focus and implementation of this analysis pipeline. The main text of the manuscript now focuses on the various options available to users for each step of soil metagenomic analysis, and describes issues specific to soil ecology and to the NEON dataset. The code at the end of each section is now an example of how these decisions may be implemented via specific tools. For this revision, we have communicated with the developers of the tools mentioned (metaGEM and Toolchest) and are confident that these tools will maintain resilience in the coming years. We hope this sufficiently addresses problems of brittleness.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
