<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.126463.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>A2TEA: Identifying trait-specific evolutionary adaptations</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>St&#x00f6;cker</surname>
                        <given-names>Tyll</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7184-9472</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Uebermuth-Feldhaus</surname>
                        <given-names>Carolin</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3071-2458</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Boecker</surname>
                        <given-names>Florian</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0732-6914</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Schoof</surname>
                        <given-names>Heiko</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-1527-3752</uri>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Crop Bioinformatics, University of Bonn, Bonn, NRW, 53115, Germany</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:tyll.stoecker@gmail.com">tyll.stoecker@gmail.com</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:schoof@uni-bonn.de">schoof@uni-bonn.de</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>5</day>
                <month>10</month>
                <year>2022</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2022</year>
            </pub-date>
            <volume>11</volume>
            <elocation-id>1137</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>29</day>
                    <month>9</month>
                    <year>2022</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 St&#x00f6;cker T et al.</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/11-1137/pdf"/>
            <abstract>
                <p>
                    <bold>Background:</bold> Plants differ in their ability to cope with external stresses such as drought. Genome duplications are an important mechanism enabling plant adaptation, and they leave characteristic footprints in the genome, such as protein family expansions. We explore genetic diversity and uncover evolutionary adaptation to stresses by exploiting genome comparisons between stress-tolerant and stress-sensitive species and RNA-Seq data sets from stress experiments. Expanded gene families that are stress-responsive based on differential expression analysis could hint at species- or clade-specific adaptation, making these gene families exciting candidates for follow-up tolerance studies and crop improvement.</p>
                <p>
                    <bold>Software:</bold> Integration of such cross-species omics data is a challenging task, requiring various steps of transformation and filtering. Ultimately, visualization is crucial for quality control and interpretation. To address this, we developed A2TEA: 
                    <bold>A</bold>utomated 
                    <bold>A</bold>ssessment of 
                    <bold>T</bold>rait-specific 
                    <bold>E</bold>volutionary 
                    <bold>A</bold>daptations, a Snakemake workflow for detecting adaptation footprints 
                    <italic toggle="yes">in silico.</italic> It functions as a one-stop processing pipeline, integrating protein family, phylogeny, expression, and protein function analysis. The pipeline is accompanied by an R Shiny web application that allows exploring, highlighting, and exporting the results interactively. This allows the user to formulate hypotheses regarding the genomic adaptations of one or a subset of the investigated species to a given stress.</p>
                <p>
                    <bold>Conclusions:</bold> While our research focus is on crops, the pipeline is entirely independent of the underlying species and can be used with any set of species. We demonstrate pipeline efficiency on real-world datasets and discuss the implementation and limits of our analysis workflow as well as planned extensions to its current state. The A2TEA workflow and web application are publicly available at: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/tgstoecker/A2TEA.Workflow">https://github.com/tgstoecker/A2TEA.Workflow</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/tgstoecker/A2TEA.WebApp">https://github.com/tgstoecker/A2TEA.WebApp</ext-link>, respectively.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>plants</kwd>
                <kwd>crops</kwd>
                <kwd>adaptation</kwd>
                <kwd>evolution</kwd>
                <kwd>stress</kwd>
                <kwd>workflow</kwd>
                <kwd>software</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>Genomic resources for crop species are expanding, with more and more high-quality reference genome sequences and transcriptome datasets becoming available. However, the lack of integrated trait and functional information limits the ability to interpret genome-scale datasets and discover genotype-phenotype associations. Many efforts have focussed on individual (model) species, but the availability of omics data for many more or less closely related genomes opens opportunities to explore genetic diversity through multi-genome comparisons. While differential expression analysis has led to the discovery of many candidate genes involved in tolerance to stresses, we aim to prioritize genes that were targets of evolutionary adaptation, thus facilitating application in crop improvement. To identify genomic footprints of adaptation, we use comparative genomics to identify protein family expansions. Gene duplication is a major driver of molecular evolution,
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> and in plants, whole-genome duplication events are frequent (reviewed in Ref. 
                <xref ref-type="bibr" rid="ref3">3</xref>), but tandem and transposon-mediated duplications also play a role (reviewed in Ref. 
                <xref ref-type="bibr" rid="ref4">4</xref>). Most of the duplicates are lost or silenced,
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> but the retention of particular duplicates hints at some evolutionary advantage. Such retained duplicates may be targets of adaptation.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> Especially for the adaptation of regulatory networks, duplication allows for neo- or subfunctionalization.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> Our integration of phylogeny with data on differential expression under stress allows such evolutionary scenarios to be observed. However, we also consider differential expression under a given stress in any species as a functional link to that stress, even if there is not sufficient data to confirm neo- or subfunctionalization. This allows us to filter for gene family expansions functionally linked to the given stress. Such filtering is needed because not all adaptations, and thus not all retained duplicates in a genome, relate to tolerance to the given stress: other traits not under analysis will also show adaptation and thus protein family expansions (
                <xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>Identification of interesting gene families for crop improvement by integration of differential gene expression with gene family expansion.</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/138877/19751029-baa6-436b-b0cb-c65cf18a7a00_figure1.gif"/>
            </fig>
            <p>The challenge for this multi-genome approach is the cross-species integration of multiple types of omics data, which requires several software tools and various custom steps of transformation and filtering. To promote the exploration of genomics and transcriptomics data and the association of genotype with phenotype data in order to address adaptation, we developed 
                <bold>A2TEA</bold> (
                <bold>A</bold>utomated 
                <bold>A</bold>ssessment of 
                <bold>T</bold>rait-specific 
                <bold>E</bold>volutionary 
                <bold>A</bold>daptations). Our software aims at identifying candidate genes for stress adaptation in plant species and enables GUI-based exploration of the results, but it is suitable for gene family expansion analysis integrated with differential expression data in any set of genomes. It is composed of a Snakemake workflow and an R (Shiny) package that work in tandem to automate and ease all bioinformatics and analysis tasks involved.</p>
            <p>The A2TEA.Workflow functions as a one-stop processing pipeline, integrating the prediction of gene families in the form of orthologous groups (OGs) with the analysis of their phylogeny, protein function, and expression, using RNA-seq data from all species. It allows the user to formulate adaptation hypotheses as specific scenarios of gene family expansion in one or several of the genomes, for example, based on a classification of species as stress-tolerant or stress-sensitive, or to identify clade-specific adaptations. As input, the workflow requires for each species a protein FASTA file for orthologous group prediction and RNA-seq reads suitable for differential expression analysis (control vs. treatment), together with either a genomic FASTA file with appropriate gene annotation or a transcriptomic/cDNA FASTA file. Functional information for each species can be provided by the user or can optionally be inferred by our tool AHRD (
                <ext-link ext-link-type="uri" xlink:href="https://github.com/groupschoof/AHRD">https://github.com/groupschoof/AHRD</ext-link>) during runtime. The single compressed output is ready for analysis with the R programming language
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> as we took care to create well-structured objects and easy-to-parse outputs. In addition, in order to facilitate immediate and easy exploration and visualization of the results, we created the A2TEA.WebApp written in R Shiny,
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> which allows exploring, highlighting, and exporting the results interactively.</p>
            <p>The A2TEA.Workflow combines state-of-the-art bioinformatics software with custom integration steps to combine inferred gene family expansion events with expression results and functional associations.</p>
            <p>The workflow is designed as a complete solution that starts with raw data and performs upstream quality controls and data transformations automatically. We also took care to allow for a high degree of customizability: for example, RNA-seq analysis can be performed using either alignment or pseudoalignment, and tool-specific parameters can be adjusted in one central config file. Importantly, the workflow is designed to answer biological questions and as such requires the definition of hypotheses in the form of combinations of the species of interest. For each hypothesis, the user needs to adjust parameters related to the definition and cutoffs of expansion events. This allows computation of results for several combinations in parallel and facilitates the downstream investigation of many hypotheses, e.g., expansion in all tolerant species, in only a specific species, or in all species of a clade.</p>
            <p>The A2TEA.WebApp provides an interactive web interface to explore, filter, and visualize the previously generated results via a straightforward tab-structured dashboard design. We took care to create a user-friendly mouse-controlled experience in order to extend the usability from bioinformaticians to experimentalists. The user first uploads the output file of the workflow and chooses the specific &#x201c;hypothesis&#x201d; to investigate. This generates a general information tab providing an overview of phylogeny, expression, and set sizes of orthologous groups (OGs) passing the thresholds. The user is then able to switch to dedicated analysis tabs relating to 1) filtering and analyzing OGs with associated data, 2) set size comparisons and tests, and 3) gene ontology (GO) term enrichment analyses. Reactively rendered tables and visualizations are dynamically populated with links to databases such as Ensembl
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> and AmiGO
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> to allow for an immediate follow-up exploration of interesting genes. Tables and graphs can be exported in a variety of formats. The web application also provides a bookmarking system that facilitates the collection and export of the most interesting genes and OGs.</p>
            <p>To extend the usability of the workflow by allowing for further species-specific exploration of gene and gene set functional enrichments, we integrated the creation of GeneTonic input data files into the A2TEA.Workflow. GeneTonic is a web application that serves as a comprehensive toolkit for streamlining the interpretation of functional enrichment analyses from RNA-seq data.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup> As our workflow is built on Snakemake,
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup> further analyses or outputs can be added in a modular fashion. We also intend to add further analyses and features to the A2TEA.WebApp.</p>
            <p>A2TEA combines best practices in both the choice of tools and reproducibility, and offers a one-stop solution for the integration of genome comparisons with expression and functional data to unravel candidate genes for natural adaptation, e.g., in stress-tolerant plant species. The web application empowers users to explore stress-specific gene family expansions combined with transcriptomic data from their own or published stress experiments by providing interactive visualizations, statistical tests, and dynamically generated database queries.</p>
            <p>Both the A2TEA.Workflow and A2TEA.WebApp are freely available at 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/tgstoecker/A2TEA.Workflow">https://github.com/tgstoecker/A2TEA.Workflow</ext-link> and 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/tgstoecker/A2TEA.WebApp">https://github.com/tgstoecker/A2TEA.WebApp</ext-link>, respectively, and archived in Zenodo.
                <sup>
                    <xref ref-type="bibr" rid="ref51">51</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref52">52</xref>
                </sup> For demonstration purposes, we also made a public instance of the web application available at 
                <ext-link ext-link-type="uri" xlink:href="https://tgstoecker.shinyapps.io/A2TEA-WebApp">https://tgstoecker.shinyapps.io/A2TEA-WebApp</ext-link>.</p>
        </sec>
        <sec id="sec2" sec-type="methods">
            <title>Methods</title>
            <sec id="sec3">
                <title>Implementation</title>
                <p>The A2TEA.Workflow is written in Python and makes use of the Snakemake workflow framework. It leverages the bioconda project channel
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> of the conda package manager to handle software installation and dependency management. Another tool from our lab, 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/groupschoof/AHRD">AHRD</ext-link>, is integrated as a Git submodule and can be optionally used to infer protein function annotation for any of the species under investigation.</p>
                <p>The typical use case for running the workflow consists of cloning the GitHub repository, configuring it to specific needs, and then starting the analyses, with software and dependencies either installed during runtime or provided by a Docker/Singularity container (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). The workflow is customized by editing dedicated configuration files that control samples, species, hypotheses, and tool-specific options. By &#x201c;hypotheses&#x201d; we refer to the definition of &#x201c;gene family expansion&#x201d; in the set of species under investigation. This multi-hypothesis structure permits the investigation of several defined biological questions, for instance, gene family expansion in stress-tolerant compared to stress-sensitive species. Each hypothesis requires the definition of a set of one or more species that should be checked for expansion and a second set of one or more species that should not show expansion. For each hypothesis, the user can also set several options, such as the ratio or the minimal number of genes in a species required for an OG to qualify as expanded. The hypotheses.tsv file is structured column-wise, with both an index number and a &#x201c;name&#x201d; variable used to identify the choices throughout the workflow. Generally, the connection between files and workflow rules is established via the species names (e.g., &#x201c;Arabidopsis_thaliana&#x201d;). Many hypotheses can be computed in parallel in a single workflow run, with a single final output object that contains all results. This facilitates easy comparisons in the downstream web application, which is especially useful for checking the parameter choices for the definition of gene family expansion or when the trait classification of some species is unclear.</p>
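                <p>As a rough illustration of how such an expansion criterion might be evaluated, the following Python sketch flags an OG as expanded when every species in the expansion set has at least a minimum number of genes and at least a given ratio of genes relative to every comparison species. The function name, the parameter names (min_genes, ratio), the exact decision rule, and the gene counts are assumptions made for illustration; they do not reproduce the actual options or column names of hypotheses.tsv. Only the species-name convention (e.g., Arabidopsis_thaliana) is taken from the text above.</p>
                <code language="python"><![CDATA[
```python
from typing import Dict, Iterable


def is_expanded(
    gene_counts: Dict[str, int],
    expanded_in: Iterable[str],
    compared_to: Iterable[str],
    min_genes: int = 2,
    ratio: float = 2.0,
) -> bool:
    """Return True if an OG counts as expanded under this hypothetical rule.

    gene_counts maps a species name to its number of genes in the OG.
    Species missing from gene_counts are treated as having zero genes.
    """
    compared = list(compared_to)
    for species in expanded_in:
        n = gene_counts.get(species, 0)
        # Assumed criterion 1: a minimum absolute number of genes.
        if n < min_genes:
            return False
        # Assumed criterion 2: at least `ratio` times as many genes
        # as each species that should not show expansion.
        for other in compared:
            if n < ratio * gene_counts.get(other, 0):
                return False
    return True


# Hypothetical gene counts for a single OG.
og_counts = {"Arabidopsis_thaliana": 2, "Zea_mays": 6, "Hordeum_vulgare": 3}

print(is_expanded(og_counts, ["Zea_mays"],
                  ["Arabidopsis_thaliana", "Hordeum_vulgare"]))
```
]]></code>
                <p>Evaluating the same counts under several such definitions mirrors the multi-hypothesis idea: each hypothesis is simply a different choice of species sets and cutoffs applied to identical OG data.</p>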
                <p>The final output generated by the workflow is a single .RData file that can be loaded into an active R session with the load() function. This provides several separate objects containing all results in compact form:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>HYPOTHESES.a2tea - List object with one S4 object per hypothesis. Each S4 object contains several layers of nested information. For example, HYPOTHESES.a2tea$hypothesis_2@expanded_OGs$N0.HOG0001225 refers to the S4 data object of a specific expanded OG, which contains:</p>
                            <list list-type="bullet">
                                <list-item>
                                    <label>&#x2013;</label>
                                    <p>blast_table (complete BLAST/DIAMOND results for OG genes &amp; extended hits)</p>
                                </list-item>
                                <list-item>
                                    <label>&#x2013;</label>
                                    <p>add_OG_analysis (includes multiple sequence alignment (MSA), phylogenetic tree, and gene info for expanded OG and additional OGs based on best BLAST/DIAMOND hits)</p>
                                </list-item>
                            </list>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>HOG_level_list - List object with one tibble per hypothesis. Information includes OG, number of genes per species, boolean expansion info, number of significant differentially expressed genes (DEGs), and more. The last (Nth) list element is a non-redundant superset of all species analyzed over all formulated hypotheses. This makes it easy to create a comparison set, e.g., conserved OGs of all species, to which the hypothesis subset can then be compared.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>HOG_DE.a2tea - Tibble of DESeq2 results for all genes + additional columns.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>A2TEA.fa.seqs - Non-redundant list object containing corresponding amino acid FASTA sequences of all genes/transcripts in the final analysis (this includes those of expanded OGs + those in additional BLAST hits &amp; additional OGs based on user-chosen parameters).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>SFA/SFA_OG_level - Gene/transcript level tables that contain functional predictions (human readable descriptions &amp; GO terms inferred by AHRD).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>hypotheses - A copy of the user-defined hypotheses definitions for the underlying workflow run.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>all_speciesTree - Phylogenetic tree of all species in the workflow run (a non-redundant superset of hypotheses) as inferred by OrthoFinder/STAG/STRIDE.</p>
                        </list-item>
                    </list>
                </p>
                <p>The .RData output can be investigated inside an R session or via the A2TEA.WebApp, which was specifically designed to allow for interactive inspection, visualization, filtering, and export of the results and subsets. We feature a tutorial for its usage and details on how to work with the results of an A2TEA.Workflow analysis run in the Use Case section and in the project&#x2019;s pkgdown site.</p>
                <p>The A2TEA.WebApp is written in the R programming language
                    <sup>
                        <xref ref-type="bibr" rid="ref8">8</xref>
                    </sup> and uses the Shiny
                    <sup>
                        <xref ref-type="bibr" rid="ref9">9</xref>
                    </sup> framework to facilitate interactivity with the data. It expects the user to upload an .RData file created by the A2TEA.Workflow. The web application comes with a test dataset that can be loaded with a single click so that interested users can try out its functionality before having to finish an A2TEA.Workflow run.</p>
                <p>We developed the web application following community standards and have set up a continuous integration system with GitHub Actions that performs build checks of both the package itself and the associated pkgdown site
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> hosted via GitHub Pages.</p>
                <p>The interface is structured in tabs with shinydashboard
                    <sup>
                        <xref ref-type="bibr" rid="ref16">16</xref>
                    </sup> and shinydashboardPlus,
                    <sup>
                        <xref ref-type="bibr" rid="ref17">17</xref>
                    </sup> providing the layout infrastructure (shown in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>). The main functionality includes a selector to choose which hypothesis to display (
                    <xref ref-type="fig" rid="f3">Figure 3B</xref>), a sidebar menu that enables the user to switch between different analysis types (
                    <xref ref-type="fig" rid="f3">Figure 3C</xref>), and tool-specific options for parameters, visuals, and export (
                    <xref ref-type="fig" rid="f3">Figure 3F</xref>).</p>
                <p>We designed the interface so that the user can focus on an individual analysis or plot to gain insight from the data. Plots and tables are contained in collapsible boxes, leaving it up to the user to decide how much information is displayed at once. Additionally, we tried to separate important parameters from purely aesthetic choices in the plot options: the main options are always visible at the side, whereas aesthetic choices are displayed reactively after the user switches a box toggle.</p>
                <p>Since the exploration of data can be a lengthy process with many iteration cycles, we looked for ways of aiding the user in storing the observations made. Following the example set by the GeneTonic web application,
                    <sup>
                        <xref ref-type="bibr" rid="ref12">12</xref>
                    </sup> we integrated a bookmarking system that temporarily stores interesting genes/transcripts and OGs. For this, the user needs to mark the respective ID in one of the tables and then click the dedicated bookmarking button displayed at the top of the interface (
                    <xref ref-type="fig" rid="f3">Figure 3D</xref>). All bookmarks are rendered in two reference tables, both in a dedicated tab and in a pop-up window that can be displayed on every analysis tab. This quick reference is convenient when performing filtering operations in the tables or choosing an interesting OG to display. While these tables can, of course, be downloaded, we also implemented a subsetting feature on the bookmarks tab that creates a smaller .RData file containing only the bookmarks and their associated data. These smaller subsets are fully functional inputs and can be loaded into the application again at a later time, for instance for re-plotting. With this feature, it is straightforward to extract, store, and share all information about interesting genes, transcripts, or OGs.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Overview of the A2TEA.Workflow.</title>
                        <p>Workflow diagram of the A2TEA.Workflow displayed as a Snakemake rulegraph. After computation of expanded orthologous groups (OGs) (rule expansion - marked with A), the directed acyclic graph (DAG) is re-evaluated since the results are not known beforehand. This Snakemake checkpoint then uses the reciprocal best hits computed by Orthofinder to find the N most similar additional OGs per OG, where N is a variable set by the user. For each OG and each extended set containing 1 to N additional OGs, multiple sequence alignments and phylogenetic trees are built and used in the downstream steps.</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/138877/19751029-baa6-436b-b0cb-c65cf18a7a00_figure2.gif"/>
                </fig>
            </sec>
            <sec id="sec4">
                <title>Operation</title>
                <p>The A2TEA.Workflow has been primarily designed for use within Linux and requires a standard bash environment with working installations of Snakemake
                    <sup>
                        <xref ref-type="bibr" rid="ref13">13</xref>
                    </sup> and conda/mamba (Ref. 
                    <xref ref-type="bibr" rid="ref14">14</xref>, 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/mamba-org/mamba">https://github.com/mamba-org/mamba</ext-link>). Snakemake facilitates compatibility with common cluster setups such as SLURM or LSF. Instructions for a minimal setup are provided in the project&#x2019;s README.</p>
                <p>At the moment, for each species, the A2TEA.Workflow requires as input RNA-Seq reads (paired-end or single-end) suitable for differential expression analysis (control vs. treatment), either a genomic or transcriptomic FASTA file, annotations, and a peptide FASTA file. The user can provide functional information per species, or it can optionally be inferred by our tool 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/groupschoof/AHRD">AHRD</ext-link> during the workflow. Control of the workflow is handled by several configuration files, which the user needs to adapt to their specific inputs and scientific questions.</p>
                <p>The samples.tsv table needs to list all RNA-Seq FASTQ files, with the columns providing additional information from which the workflow infers associations such as species and replicate as well as the correct steps to perform. For instance, if the column for the second read file of a pair is left out, the workflow automatically infers that single-end options have to be used (trimming, mapping, etc.). Gzipped files are likewise recognized and handled appropriately without user intervention. The species.tsv table functions similarly and needs to provide, per species, information on the FASTA and annotation files, the ploidy of the species, and the location of a file providing the functional information, in the form of GO terms, per protein. If no functional information can be provided, the user can enter &#x201c;AHRD&#x201d; instead of a file path, which triggers a sub-workflow during computation that creates an appropriate file via our functional annotation tool 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/groupschoof/AHRD">AHRD</ext-link>. Based on whether the user provides a genomic or cDNA FASTA file for a species, the workflow will perform either traditional alignment with STAR
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>
                    </sup> or pseudoalignment with kallisto.
                    <sup>
                        <xref ref-type="bibr" rid="ref19">19</xref>
                    </sup>
                </p>
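                <p>The inference rules described above can be illustrated with a minimal Python sketch. It is not taken from the A2TEA.Workflow code base: the column names (fq1, fq2), the FASTA type labels, and all function names are hypothetical and only mirror the behavior described in the text (a missing second read file implies single-end processing, a .gz suffix implies compressed input, and the FASTA type selects STAR or kallisto).</p>
                <preformat preformat-type="code">
```python
from typing import Optional


def infer_layout(fq1: str, fq2: Optional[str]) -> str:
    """Return 'paired' if a second read file is given, else 'single'."""
    return "paired" if fq2 else "single"


def is_gzipped(path: str) -> bool:
    """Guess compression from the file name (assumption: a .gz suffix)."""
    return path.endswith(".gz")


def choose_quantifier(fasta_type: str) -> str:
    """Genomic FASTA selects STAR; cDNA FASTA selects kallisto."""
    choices = {"genomic": "STAR", "cdna": "kallisto"}
    if fasta_type not in choices:
        raise ValueError(f"unknown FASTA type: {fasta_type}")
    return choices[fasta_type]


# Hypothetical row of a samples table with no second read file.
sample = {"species": "Hordeum_vulgare", "fq1": "hv_ctrl_1.fastq.gz", "fq2": None}
print(infer_layout(sample["fq1"], sample["fq2"]))  # single
print(is_gzipped(sample["fq1"]))                   # True
print(choose_quantifier("genomic"))                # STAR
```
                </preformat>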
                <p>The config.yaml controls parameters such as thread usage for individual steps, tool-specific parameters, and parameters relating specifically to the A2TEA.Workflow. Two further important choices are whether automatic filtering for the longest representative isoforms of the peptide FASTA files should be performed and whether gene- or transcript-level quantification is wanted. Choosing automatic isoform filtering creates a subset peptide FASTA file with only the longest isoform per gene, and each header is shortened to just the gene identifier. This option must be used in conjunction with gene-level quantification, since otherwise the two types of data cannot be matched.</p>
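                <p>As a hedged illustration of what longest-isoform filtering amounts to, the following Python sketch keeps one peptide per gene and keys the result by the gene identifier alone. The header convention (GENE.ISOFORM, split at the last dot) and the function name are assumptions made for this example; real annotation headers differ between sources, and the workflow may derive gene identifiers differently.</p>
                <preformat preformat-type="code">
```python
from typing import Dict


def longest_isoform_per_gene(records: Dict[str, str], sep: str = ".") -> Dict[str, str]:
    """Keep the longest peptide per gene and key the result by gene ID only.

    Assumption: a header looks like 'GENE.ISOFORM', so everything before
    the last separator is treated as the gene identifier.
    """
    best: Dict[str, str] = {}
    for header, seq in records.items():
        gene = header.rsplit(sep, 1)[0]
        # Replace the stored peptide only if this isoform is strictly longer.
        if gene not in best or len(seq) > len(best[gene]):
            best[gene] = seq
    return best


peptides = {"geneA.1": "MKT", "geneA.2": "MKTLLV", "geneB.1": "MA"}
print(longest_isoform_per_gene(peptides))  # {'geneA': 'MKTLLV', 'geneB': 'MA'}
```
                </preformat>
                <p>Because the resulting headers carry only gene identifiers, they can be matched to gene-level but not to transcript-level quantification, which is why the two options are coupled.</p>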
                <p>The notion behind the hypotheses.tsv table is outlined in the Implementation section due to its central importance to the expansion calculation. Here, we briefly present some of the choices available to the user. Besides defining sets of species that should be analyzed for expansion compared to other sets of species, the user is able to specify the required numerical differences between the two and which OGs to disregard immediately. For instance, &#x201c;Nmin_expanded_in&#x201d; takes an integer value that defines the minimum number of investigated species that need to fulfill the expansion criteria for the gene family to be called &#x201c;expanded&#x201d;. These criteria include &#x201c;min_expansion_factor&#x201d; and &#x201c;min_expansion_difference&#x201d;, which define either the minimum factor of expansion or the minimum number of additional genes compared to the non-expanded set of species. To complement these broad cutoffs, the workflow also integrates a hypothesis-specific CAFE analysis, with which changes in gene family size are analyzed in a way that accounts for phylogenetic history and provides a statistical foundation for evolutionary inferences.
                    <sup>
                        <xref ref-type="bibr" rid="ref20">20</xref>
                    </sup>
                </p>
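                <p>To make the interplay of these cutoffs concrete, the following Python sketch evaluates a single OG against one hypothesis. The parameter names follow the hypotheses.tsv options named above, but the function itself and two of its design decisions are assumptions of this sketch rather than the documented behavior of the workflow: each investigated species is compared with the largest gene count among the comparison species, and fulfilling either configured criterion is treated as sufficient.</p>
                <preformat preformat-type="code">
```python
from typing import Dict, List, Optional


def is_expanded(
    counts: Dict[str, int],
    expanded_in: List[str],
    compared_to: List[str],
    n_min_expanded_in: int,
    min_expansion_factor: Optional[float] = None,
    min_expansion_difference: Optional[int] = None,
) -> bool:
    """Call an OG 'expanded' if enough investigated species pass a criterion."""
    # Assumption: the reference is the largest count among comparison species.
    reference = max(counts.get(s, 0) for s in compared_to)
    passing = 0
    for species in expanded_in:
        n = counts.get(species, 0)
        by_factor = (
            min_expansion_factor is not None
            and n >= 1
            and n >= min_expansion_factor * reference
        )
        by_difference = (
            min_expansion_difference is not None
            and n - reference >= min_expansion_difference
        )
        if by_factor or by_difference:
            passing += 1
    return passing >= n_min_expanded_in


# Gene counts of one hypothetical OG.
og = {"barley": 6, "rice": 2, "maize": 3}
print(is_expanded(og, ["barley"], ["rice", "maize"], 1, min_expansion_factor=2))  # True
```
                </preformat>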
                <p>After all choices have been made, the workflow can be started with a single Snakemake command. A2TEA.Workflow will then perform all previously listed steps and merge results into the final output file described in the Implementation section (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). The user is then able to investigate the integrated and condensed results.</p>
                <p>We offer several ways of starting an A2TEA.WebApp instance for downstream investigation of the data: 1) installation with R devtools from our GitHub repository, 2) a Docker container with the latest release installed, and 3) a demo instance hosted on shinyapps.io (
                    <ext-link ext-link-type="uri" xlink:href="https://tgstoecker.shinyapps.io/A2TEA-WebApp/">https://tgstoecker.shinyapps.io/A2TEA-WebApp/</ext-link>). As the A2TEA.WebApp is an interactive tool with an explorative focus and no strict work order, we illustrate its core operative features in the dedicated Use cases section of this manuscript.</p>
            </sec>
        </sec>
        <sec id="sec5">
            <title>Use cases</title>
            <p>In this section, we will illustrate the functionality of the A2TEA.WebApp, using the A2TEA.Workflow results of a three-species analysis of 
                <italic toggle="yes">Hordeum vulgare</italic> (barley), 
                <italic toggle="yes">Zea mays</italic> (maize), and 
                <italic toggle="yes">Oryza sativa japonica</italic> (rice) that investigates adaptive processes in barley to drought stress. The files used, together with their respective publications and SRA accession numbers, are listed in both GitHub repositories and the Source data section.
                <sup>
                    <xref ref-type="bibr" rid="ref48">48</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref49">49</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref50">50</xref>
                </sup>
            </p>
            <p>We integrated this dataset into the workflow and the web application to illustrate the software&#x2019;s setup and to allow for a quick exploration of the tools&#x2019; functionalities. After cloning the A2TEA.Workflow repository, an additional script can be run (get_test_data.sh) that quickly sets up the experiment by downloading the required input files. Peptide FASTA files are reduced to 2000 proteins; the transcriptomic data is subsampled to 2M reads to allow for a quicker runtime. The functional annotations are precomputed by 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/groupschoof/AHRD">AHRD</ext-link>. The differential expression analysis is set to be performed on the gene level, and two comparisons are performed as defined in the hypotheses.tsv table. These are &#x201c;Expanded in barley compared to rice and maize&#x201d; and &#x201c;Expanded in barley compared to maize&#x201d;. For both, expansion is defined as &#x201c;number of genes of species A &#x2265; 2 &#x00d7; number of genes of species B&#x201d;.</p>
            <p>The final output produced by the workflow is also integrated into the current release of the A2TEA.WebApp and can be loaded by clicking &#x201c;Try a demo A2TEA.RData file&#x201d; at the top of the interface.</p>
            <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                <label>Figure 3. </label>
                <caption>
                    <title>Screenshot of the trait evolutionary analysis tab in the A2TEA.WebApp.</title>
                    <p>The user can either decide to load the included test dataset or upload a .RData result object (A). Other options in the sidebar menu are a selector of the hypothesis (meaning: species comparison) to display (B) and the option to change to another analysis tab (C). Genes, transcripts, or orthologous groups (OGs) can be marked in tables or boxes ticked and then bookmarked with a dedicated button (D). Bookmarks have their dedicated tab but can also be displayed as a sidebar window anywhere for quick reference purposes (E). Analysis- or plot-specific parameters (F) are displayed to the left of the visualization, and a box-specific sidebar window for aesthetic parameters can be opened by clicking the gears icon (G). The underlying dataset investigates drought tolerance among four Brassicaceae species. Displayed here are the maximum likelihood phylogenies of a gene family showing potential subfunctionalization of Eutrema salsugineum homologs (top two genes in the tree). Blue bars show log2 (fold change) of gene expression between drought and control conditions, stars mark adj. p &#x2264; 0.1 significance cutoff, and the multiple sequence alignment of amino acid sequences is displayed to the right. We provide this particular OG as a bookmarked subset .RData file (see Underlying Data).</p>
                </caption>
                <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/138877/19751029-baa6-436b-b0cb-c65cf18a7a00_figure3.gif"/>
            </fig>
            <sec id="sec6">
                <title>Initial inspection of integrated data</title>
                <p>The general analysis tab is the default view inside the A2TEA.WebApp. Once input is loaded, reactive information boxes display the number of species, the number of expanded OGs, and the number of DEGs for the currently selected hypothesis. Changing the hypothesis (
                    <xref ref-type="fig" rid="f3">Figure 3B</xref>), e.g., to the second hypothesis in our test set (&#x201c;Expanded in barley compared to maize&#x201d;), changes the statistics and all other sets and plots to reflect only the species considered in the hypothesis. Two tables display gene-level differential expression results and functional annotation information (human-readable descriptions and GO terms), which allow, for example, the exploration of genes related to a particular function. The tab also shows an inferred phylogenetic tree of the species in the hypothesis subset and an intersection plot (Venn/UpSet) with the number of conserved (OG with &#x2265;1 gene from every species), overlapping, or species-unique OGs and singleton genes. Importantly, a further table describes the details of the currently selected hypothesis. All of this facilitates a broad overview of the data and allows the user to spot errors such as faulty hypothesis definitions or cutoffs that are too strict.</p>
            </sec>
            <sec id="sec7">
                <title>Exploring expansion events with annotated phylogenetic trees</title>
                <p>The main feature of the TEA (trait-specific evolutionary adaptation) tab is a comprehensive toolkit for the visualization of maximum likelihood phylogenies of expanded OGs and associated information such as the log2(fold change) of the displayed genes and an MSA of the respective protein sequences (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>). The MSA can be added as a geometric layer to the tree plots or displayed separately with additional options such as a conservation bar (
                    <xref ref-type="fig" rid="f4">Figure 4A</xref>). To make an informed decision about which OGs are most worthwhile to investigate more closely, a table showing the total and significantly differentially expressed genes per OG is also provided. This allows the user to apply several filters, for example, to select all expanded OGs that possess at least 1 DEG and more than 4 genes from 
                    <italic toggle="yes">Hordeum vulgare</italic>. The last table on the tab provides insight into the reciprocal BLAST/DIAMOND hits for the currently chosen OG and the additional most similar OGs. Notably, this table also provides the identifiers given to the proteins by Orthofinder,
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup> making it easy to relate insight gained in the web application back to other outputs created by Orthofinder in the A2TEA.Workflow, such as the list of putative xenologs.</p>
            </sec>
            <sec id="sec8">
                <title>Comparing sets of orthologous groups</title>
                <p>To describe adaptive processes at a larger scale, we also integrated functionality to visualize distributions of user-defined OG sets and test for their over-representation; e.g., &#x201c;What is the frequency of OGs that show expansion and at least 1 DEG in Hordeum vulgare in all conserved OGs?&#x201d; and &#x201c;Is this set over-represented within the background distribution of conserved OGs with at least 1 DEG from any species?&#x201d;. We took care to make answering such questions accessible by providing the user with text-based choices of which sets to plot or compare. Currently integrated are an enrichment analysis suite offering Fisher&#x2019;s exact tests and a corresponding circular set plot (
                    <xref ref-type="fig" rid="f4">Figure 4B</xref>) that visualizes the chosen sets. Also provided is a tool for comparing the size distributions of the OGs (
                    <xref ref-type="fig" rid="f4">Figure 4C</xref>) with which group size effects can be checked; e.g., &#x201c;Do we see differences in the number of DEGs in OGs of a certain size range between the set of interest and the background set?&#x201d;.</p>
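                <p>The over-representation question posed above corresponds to an upper-tail probability in the urn model: given a background of N OGs containing K successes, how likely is it to observe at least k successes in a sample of n OGs? The following Python sketch computes this one-sided probability with the standard library only. The symbols N, K, n, and k and the function name are chosen for this illustration and are not taken from the A2TEA.WebApp code, which may rely on R routines instead.</p>
                <preformat preformat-type="code">
```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when drawing n items without replacement from N items,
    of which K are successes (one-sided over-representation test)."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers: 10 background OGs, 4 of them with a DEG; 3 OGs of interest,
# 2 of them with a DEG.
print(round(hypergeom_upper_tail(10, 4, 3, 2), 4))  # 0.3333
```
                </preformat>
                <p>In the terms used for the circular set plot, N is the complete background, K the successes in the background, n the sample size, and k the successes in the sample.</p>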
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Overview of several analysis plots featured in the A2TEA.WebApp.</title>
                        <p>(A) Multiple sequence alignment of an expanded orthologous group (OG) + additional most similar singletons or OGs. Bars at the bottom represent the degree of conservation. (B) Visual representation of a hypergeometric test for over-representation - colors/layers represent sets in the urn model. The outer ring shows the background set (light blue; &#x201c;complete background&#x201d;) and the subset in the background set (light orange; &#x201c;success in background&#x201d;). The inner ring displays the set of interest (blue; &#x201c;sample size&#x201d;) and the subset in the set of interest (orange; &#x201c;success in sample&#x201d;). (C) Barplot of the total number of OGs per group size (number of genes) between all OGs (blue) and only those OGs that are conserved among the analyzed species (orange). (D) Dotplot of the top over-represented biological processes in the subset of OGs of interest compared to the background; dot size indicates the number of OGs with the respective GO term; color displays negative log [base 10] of the p-values from the enrichment test.</p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/138877/19751029-baa6-436b-b0cb-c65cf18a7a00_figure4.gif"/>
                </fig>
            </sec>
            <sec id="sec9">
                <title>Performing functional enrichment tests</title>
                <p>The last analysis tab provides options for performing GO term over-representation analysis based on the topGO R package.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> Functions that occur more often than expected can be identified by setting several parameters that specify the set of OGs the user wants to analyze. With our test data, the user could, for instance, be interested in enriched molecular functions of OGs that are expanded in Hordeum vulgare and also possess at least 2 DEGs of Hordeum vulgare. Once computed, a table is displayed that shows the top significantly enriched GO terms and also contains dynamically created links for these to AmiGO2.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">11</xref>
                    </sup> A second table contains information on the corresponding OGs and genes so that the user can follow up on a particular enriched GO term and inspect the underlying data. We also provide two visualizations that summarize the results. The first is a GO enrichment dotplot (
                    <xref ref-type="fig" rid="f4">Figure 4D</xref>) straightforwardly showcasing the overall results, and the second is a GO subgraph of selected top N enriched GO terms. With the latter, we provide the user with an insightful way of investigating how the significant GO terms are distributed over the GO graph.</p>
            </sec>
            <sec id="sec10">
                <title>Export options, bookmarking &amp; ending a session</title>
                <p>Tables can be downloaded as .tsv files, and plots are exportable into various formats, such as .pdf, .png, or .svg, allowing the user to easily save and share the observations and results. However, even a relatively small set of species, like the three Poaceae species in our test data, leads to several OGs that are worthwhile to investigate, substantiating the need for the bookmarking system outlined in the Implementation section. It quickly becomes very valuable to bookmark, e.g., all OGs annotated with the top 5 enriched BP GO terms in the OGs expanded in 
                    <italic toggle="yes">Hordeum vulgare</italic> if the intention is to return to the analysis later or to quickly generate a list for use with another tool. Relating this to the previous sub-sections, we want to emphasize that bookmarking is integral to using the A2TEA.WebApp and is fully featured on all analysis tabs except &#x201c;Set analyses&#x201d;, since individual genes or OGs are not the focus there. To further aid users in the bookmarking process, we also added informative pop-up messages that indicate, for instance, that all selected genes/OGs have already been saved. Since the bookmarks can also be used to export a completely functional .RData subset file, only the most relevant results of the integrative effort are kept, which also increases processing speed. If, for instance, it turned out during the analysis that hypothesis 2 in our example data (&#x201c;Expanded in barley compared to maize&#x201d;) is no longer of interest, subsetting the .RData file to interesting OGs of hypothesis 1 completely removes the unneeded &#x201c;bloat&#x201d; of hypothesis 2. Similarly, the user could create two .RData files (one for each hypothesis) and run a custom script on each separately, efficiently producing hypothesis-specific results.</p>
            </sec>
        </sec>
        <sec id="sec11" sec-type="discussion">
            <title>Discussion</title>
            <p>Classic transcriptomic studies produce large lists of gene regulatory information for which, traditionally, pathway or GO term analyses are used to discover the overall molecular trends caused by the experimental treatments.
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> We propose that novel genes relevant for stress adaptation can be identified by comparing same-stress experiments across several plant species with different levels of stress adaptation, in combination with evolutionary footprints in the form of protein family expansion. As illustrated in the Methods &amp; Use Cases sections, our novel software tool A2TEA facilitates the identification of genes associated with the evolution of a trait in a species or a group of related species. Because A2TEA rediscovers genes known to be related to the trait, we believe that the novel genes it identifies are also related to the trait; experimental verification is in progress (see below). As an example, 
                <xref ref-type="fig" rid="f3">Figure 3</xref> presents a possible subfunctionalization of gene duplicates in 
                <italic toggle="yes">Eutrema salsugineum</italic>, discovered from data on drought tolerance among four Brassicaceae species (for details, see Underlying Data). The 
                <italic toggle="yes">A. thaliana</italic> homolog is involved in drought stress response.
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup>
            </p>
            <p>Several approaches have been employed to identify potential candidate genes that could provide a genetic basis for more resilient crops. This includes forward genetics approaches such as identifying causative genes for advantageous mutant phenotypes,
                <sup>
                    <xref ref-type="bibr" rid="ref25">25</xref>
                </sup> finding common regulators for several stresses via traditional transcriptomics,
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup> usage of quantitative trait locus (QTL) mapping and genome-wide association studies (GWAS), including potential integration with expression data,
                <sup>
                    <xref ref-type="bibr" rid="ref27">27</xref>
                </sup> the combination of expression data with functional information and clustering methods
                <sup>
                    <xref ref-type="bibr" rid="ref28">28</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref29">29</xref>
                </sup> and also machine-learning-based approaches that employ transcriptomic or phenomic data as the basis of their candidate gene predictions.
                <sup>
                    <xref ref-type="bibr" rid="ref30">30</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref31">31</xref>
                </sup>
            </p>
            <p>The underlying methods are manifold and include approaches such as bulked segregant analysis,
                <sup>
                    <xref ref-type="bibr" rid="ref32">32</xref>
                </sup> k-means clustering,
                <sup>
                    <xref ref-type="bibr" rid="ref33">33</xref>
                </sup> WGCNA,
                <sup>
                    <xref ref-type="bibr" rid="ref34">34</xref>
                </sup> co-expression networks
                <sup>
                    <xref ref-type="bibr" rid="ref35">35</xref>
                </sup> and set analyses of DEGs often in combination with pathway or GO term enrichment analyses.
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> While most studies share the approach of reducing a list of regulated genes via secondary criteria, to our knowledge, A2TEA is the first openly available tool that specifically combines stress-specific expression data from several species with gene family expansion to unravel candidate genes for stress adaptation in stress-tolerant species.</p>
            <p>With A2TEA, we present software that simplifies the complex bioinformatics workflow for the user and provides an interactive web interface for analysis of the results. By using Snakemake as a bioinformatic workflow manager, we remove the need for step-by-step handling of raw data (including software setup and dependencies necessary for computations) and ensure FAIR (findable, accessible, interoperable, and reusable) computational analysis standards.
                <sup>
                    <xref ref-type="bibr" rid="ref36">36</xref>
                </sup> The downstream analysis and visualization framework makes the navigation of the resulting large sets of tabular data faster, more intuitive, and more practicable for scientists without programming skills.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> It offers a variety of summary statistics on the levels of gene family expansion, differential expression, and functional enrichment to ensure quality control. The Shiny framework provides interactivity regarding the visualization and the analysis of the results, which greatly facilitates the exploration of scientific questions.</p>
            <p>Based on user experiences with the web application, we have included analyses and visualizations that help detect problems in the bioinformatic predictions, e.g., of orthologous groups (OGs). In order to spot potential misassignments by Orthofinder, close homologs of members of an OG are detected by similarity search and displayed with phylogenetic trees and multiple sequence alignments. A typical case is the non-inclusion of a singleton gene of a species because a significant portion of its protein sequence is missing in the annotation, caused, e.g., by gaps or sequencing errors in the genome sequence or by errors in gene prediction. Similarly, false expansions based on a putative paralog that has only very limited alignment overlap with other members of the OG can be detected. These could be genuine duplicates that have degenerated through pseudogenization or that arose from partial duplication, e.g., through the action of transposable elements.</p>
            <p>We designed A2TEA with extendability in mind. Both the Snakemake-based A2TEA.Workflow and the A2TEA Shiny App can be easily expanded in a modular fashion to integrate novel features. We are currently testing several additional visualization and testing options. These include positive selection tests for a particular OG (e.g., by calculating the ratio of non-synonymous to synonymous substitution rates, dN/dS), distribution comparisons between random and actual DEG-containing OGs, and visualizations for the analysis of general gene/transcript regulation trends. The GO term enrichment functionality is aimed at discovering general trends in the adaptation to the particular stress under investigation. At the moment, the implemented enrichment tests provide options for single over-representation analysis as implemented in the R topGO package.
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup> It will be interesting to evaluate and potentially implement further options for functional enrichment analysis in A2TEA, such as modern ensemble approaches
                <sup>
                    <xref ref-type="bibr" rid="ref37">37</xref>
                </sup> or simplification strategies that aid in summarization.
                <sup>
                    <xref ref-type="bibr" rid="ref38">38</xref>
                </sup> Lastly, we intend to implement the option to download a comprehensive RMarkdown/Quarto report summarizing plots and statistics for all bookmarked genes and OGs. This has been demonstrated to be a significant step forward in guaranteeing the portability of results once an interactive session is concluded.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
            </p>
            <p>While our research focus is on crops, from a software perspective, A2TEA is entirely independent of the underlying species and can be used with any set of species. This opens the question of how feasible applying the A2TEA methodology to species from other kingdoms might be. Our motivation for developing A2TEA is primarily rooted in the notion that genome duplication played a major role in the evolutionary past of plants.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> Plant comparative genomics research has shown that gene families are mostly conserved across great evolutionary timescales, comprising even the diversification of all angiosperms and nonflowering plants.
                <sup>
                    <xref ref-type="bibr" rid="ref39">39</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref40">40</xref>
                </sup> Fascinatingly, this conservation of gene families is combined with lineage-specific fluctuations in gene family size, which are frequent among taxa.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref39">39</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref42">42</xref>
                </sup> This suggests that since comparatively few novel gene families arose, much of the great diversity and phenotypic variation seen in land plants may have arisen primarily due to duplication and adaptive specialization of already existing genes.
                <sup>
                    <xref ref-type="bibr" rid="ref40">40</xref>
                </sup>
            </p>
            <p>Although whole-genome duplication events are expected and reported less frequently in the animal kingdom, and gene duplication as a driver of protein family expansion therefore plays a less prominent role in animals than in plants,
                <sup>
                    <xref ref-type="bibr" rid="ref43">43</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref44">44</xref>
                </sup> protein family expansion is still an important driver of adaptation.
                <sup>
                    <xref ref-type="bibr" rid="ref45">45</xref>
                </sup> We expect that A2TEA will be useful in non-plant species, even if protein family expansion only represents a small portion of adaptive changes, with other sources of variation, such as alternative splicing, potentially playing a more important role.
                <sup>
                    <xref ref-type="bibr" rid="ref46">46</xref>
                </sup>
            </p>
            <p>Currently, we are investigating several publicly available genomic and transcriptomic datasets from various groups of plant species with A2TEA. While we expect to detect candidate genes relevant to adaptation to the stress being investigated, this assumption is based on the rediscovery of known genes. One important follow-up step is thus to experimentally verify the impact of selected candidate genes in vivo. To this end, we perform stress experiments in plants bearing knockout mutations in candidate genes predicted by A2TEA, using sequence-indexed mutant collections such as BonnMu.
                <sup>
                    <xref ref-type="bibr" rid="ref47">47</xref>
                </sup> This will allow us to assess the phenotypic impact of these mutations and, thus, the role of these genes in tolerance to the stress. While testing all candidates will not be feasible, the proportion of tested candidates that prove relevant to the trait under investigation will provide an estimate of the prediction performance.</p>
        </sec>
        <sec id="sec12" sec-type="conclusions">
            <title>Conclusions</title>
            <p>With the availability of multiple genome sequences and RNA-seq data sets, it is now possible to combine comparative evolutionary analyses, in our case protein family expansion, with differential expression to predict genes involved in adaptive traits. However, running the required bioinformatics analyses and data integration tasks, as well as summarizing and visualizing the results, remains challenging. A2TEA only requires standard data files as input, follows best-practice software standards for both reproducibility and portability, and provides a user-friendly web application for interactive exploration and selection of the most promising candidate genes. We show that genes known to be involved in stress tolerance can be detected in datasets of stress-tolerant and stress-sensitive plants, but we expect A2TEA to be useful in a broader scope when analyzing protein families and their expression in multiple genomes, as the parameters for selecting interesting families are very flexible. A2TEA follows a positive trend in modern research software development that provides easy installation and execution through the use of container and workflow technologies as well as interactive visualization and exploration tools for the generated results. Combined, this facilitates better reproducibility, communication, and shareability of comprehensive analyses.</p>
        </sec>
        <sec id="sec13">
            <title>Data availability</title>
            <sec id="sec14">
                <title>Source data</title>
                <p>
                    <bold>Poaceae test data</bold>:</p>
                <p>Transcriptomics:</p>
                <p>
                    <italic toggle="yes">Hordeum vulgare</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr6782243">SRR6782243</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr6782247">SRR6782247</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr6782257">SRR6782257</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr6782249">SRR6782249</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr6782250">SRR6782250</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr6782254">SRR6782254</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Zea mays</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr2043219">SRR2043219</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr2043217">SRR2043217</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr2043190">SRR2043190</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr2043220">SRR2043220</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr2043226">SRR2043226</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr2043227">SRR2043227</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Oryza sativa japonica</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr5134063">SRR5134063</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr5134064">SRR5134064</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr5134065">SRR5134065</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr5134066">SRR5134066</ext-link>
                </p>
                <p>These correspond to the following studies on drought stress:</p>
                <p>
                    <italic toggle="yes">Hordeum vulgare</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s12864-019-5634-0">https://doi.org/10.1186/s12864-019-5634-0</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Zea mays</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1104/pp.16.01045">https://doi.org/10.1104/pp.16.01045</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Oryza sativa japonica</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fpls.2017.00580">https://doi.org/10.3389/fpls.2017.00580</ext-link>
                </p>
                <p>Assemblies &amp; annotations hosted on 
                    <ext-link ext-link-type="uri" xlink:href="https://plants.ensembl.org/index.html">EnsemblPlants</ext-link>:</p>
                <p>
                    <italic toggle="yes">Hordeum vulgare</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-54/fasta/hordeum_vulgare/cdna/Hordeum_vulgare.MorexV3_pseudomolecules_assembly.cdna.all.fa.gz">cDNA FASTA</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-54/gtf/hordeum_vulgare/Hordeum_vulgare.MorexV3_pseudomolecules_assembly.54.gtf.gz">GTF</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-54/fasta/hordeum_vulgare/pep/Hordeum_vulgare.MorexV3_pseudomolecules_assembly.pep.all.fa.gz">Peptide FASTA</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Zea mays</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-49/fasta/zea_mays/dna/Zea_mays.B73_RefGen_v4.dna.toplevel.fa.gz">Genome FASTA</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-49/gtf/zea_mays/Zea_mays.B73_RefGen_v4.49.gtf.gz">GTF</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-49/fasta/zea_mays/pep/Zea_mays.B73_RefGen_v4.pep.all.fa.gz">Peptide FASTA</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Oryza sativa japonica</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-54/fasta/oryza_sativa/cdna/Oryza_sativa.IRGSP-1.0.cdna.all.fa.gz">cDNA FASTA</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-54/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.54.gtf.gz">GTF</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-54/fasta/oryza_sativa/pep/Oryza_sativa.IRGSP-1.0.pep.all.fa.gz">Peptide FASTA</ext-link>
                </p>
                <p>An archived version of the complete grasses test data (reduced as used in the examples) is deposited here:</p>
                <p>
                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/7089022">https://zenodo.org/record/7089022</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref48">48</xref>
                    </sup>
                </p>
                <p>
                    <bold>Data used in the Brassicaceae example:</bold>
                </p>
                <p>Transcriptomics:</p>
                <p>
                    <italic toggle="yes">Eutrema salsugineum</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624684">SRR7624684</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624685">SRR7624685</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624692">SRR7624692</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624687">SRR7624687</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624721">SRR7624721</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624722">SRR7624722</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Arabidopsis lyrata</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624680">SRR7624680</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624702">SRR7624702</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624703">SRR7624703</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624732">SRR7624732</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624733">SRR7624733</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624742">SRR7624742</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Arabidopsis thaliana</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624694">SRR7624694</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624696">SRR7624696</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624697">SRR7624697</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624710">SRR7624710</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624714">SRR7624714</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr7624723">SRR7624723</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Brassica napus</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr12429701">SRR12429701</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr12429702">SRR12429702</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr12429703">SRR12429703</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr12429698">SRR12429698</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr12429699">SRR12429699</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=srr12429700">SRR12429700</ext-link>
                </p>
                <p>These correspond to the following studies on drought stress response: 
                    <italic toggle="yes">Eutrema salsugineum, Arabidopsis lyrata, Arabidopsis thaliana</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/nph.15841">https://doi.org/10.1111/nph.15841</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Brassica napus</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/bioproject/656507">PRJNA656507</ext-link>
                </p>
                <p>Assemblies &amp; annotations hosted on 
                    <ext-link ext-link-type="uri" xlink:href="https://plants.ensembl.org/index.html">EnsemblPlants</ext-link>:</p>
                <p>
                    <italic toggle="yes">Eutrema salsugineum</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/eutrema_salsugineum/dna/Eutrema_salsugineum.Eutsalg1_0.dna.toplevel.fa.gz">Genome FASTA</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/gtf/eutrema_salsugineum/Eutrema_salsugineum.Eutsalg1_0.52.gtf.gz">GTF</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/eutrema_salsugineum/pep/Eutrema_salsugineum.Eutsalg1_0.pep.all.fa.gz">Peptide FASTA</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Arabidopsis lyrata</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/arabidopsis_lyrata/dna/Arabidopsis_lyrata.v.1.0.dna.toplevel.fa.gz">Genome FASTA</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/gtf/arabidopsis_lyrata/Arabidopsis_lyrata.v.1.0.52.gtf.gz">GTF</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-53/fasta/arabidopsis_lyrata/pep/Arabidopsis_lyrata.v.1.0.pep.all.fa.gz">Peptide FASTA</ext-link> </p>
                <p>
                    <italic toggle="yes">Arabidopsis thaliana</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz">Genome FASTA</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.52.gtf.gz">GTF</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/arabidopsis_thaliana/pep/Arabidopsis_thaliana.TAIR10.pep.all.fa.gz">Peptide FASTA</ext-link>
                </p>
                <p>
                    <italic toggle="yes">Brassica napus</italic>: 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/brassica_napus/dna/Brassica_napus.AST_PRJEB5043_v1.dna.toplevel.fa.gz">Genome FASTA</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/gtf/brassica_napus/Brassica_napus.AST_PRJEB5043_v1.52.gtf.gz">GTF</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/brassica_napus/pep/Brassica_napus.AST_PRJEB5043_v1.pep.all.fa.gz">Peptide FASTA</ext-link>
                </p>
            </sec>
            <sec id="sec15">
                <title>Underlying data</title>
                <p>The results generated by the A2TEA.Workflow, which are also used to demonstrate the A2TEA.WebApp&#x2019;s functionality presented in this work, are available at 
                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/7089608">https://zenodo.org/record/7089608</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref49">49</xref>
                    </sup> and 
                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/7089606">https://zenodo.org/record/7089606</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref50">50</xref>
                    </sup>
                </p>
                <p>Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).</p>
            </sec>
        </sec>
        <sec id="sec16">
            <title>Software availability</title>
            <p>Both the A2TEA.Workflow and the A2TEA.WebApp are available as MIT-licensed open-source software.
                <list list-type="bullet">
                    <list-item>
                        <label>&#x2022;</label>
                        <p>Software available from: 
                            <ext-link ext-link-type="uri" xlink:href="https://tgstoecker.github.io/A2TEA.WebApp">https://tgstoecker.github.io/A2TEA.WebApp</ext-link>
                        </p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>Source code available from: 
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/tgstoecker/A2TEA.Workflow">https://github.com/tgstoecker/A2TEA.Workflow</ext-link>, 
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/tgstoecker/A2TEA.WebApp">https://github.com/tgstoecker/A2TEA.WebApp</ext-link>
                        </p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>Archived source code at time of publication: 
                            <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/7086290">https://zenodo.org/record/7086290</ext-link>,
                            <sup>
                                <xref ref-type="bibr" rid="ref51">51</xref>
                            </sup> 
                            <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/7086282">https://zenodo.org/record/7086282</ext-link>
                            <sup>
                                <xref ref-type="bibr" rid="ref52">52</xref>
                            </sup>
                        </p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>License: 
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/tgstoecker/A2TEA.Workflow/blob/master/LICENSE">MIT</ext-link>
                        </p>
                    </list-item>
                </list>
            </p>
        </sec>
    </body>
    <back>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ohno</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Evolution by gene duplication.</italic>
</source>
                    <publisher-name>Springer Science &amp; Business Media</publisher-name>;<year>2013</year>.</mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Taylor</surname>
                            <given-names>JS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Raes</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Duplication and divergence: The evolution of new genes and old ideas.</article-title>
                    <source>

                        <italic toggle="yes">Annu. Rev. Genet.</italic>
</source>
                    <year>2004</year>;<volume>38</volume>:<fpage>615</fpage>&#x2013;<lpage>643</lpage>.
                    <pub-id pub-id-type="pmid">15568988</pub-id>
                    <pub-id pub-id-type="doi">10.1146/annurev.genet.38.072902.092831</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Qiao</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paterson</surname>
                            <given-names>AH</given-names>
                        </name>
</person-group>:
                    <article-title>Pervasive genome duplications across the plant tree of life and their links to major evolutionary innovations and transitions.</article-title>
                    <source>

                        <italic toggle="yes">Comput. Struct. Biotechnol. J.</italic>
</source>
                    <year>2022</year>;<volume>20</volume>:<fpage>3248</fpage>&#x2013;<lpage>3256</lpage>.
                    <pub-id pub-id-type="pmid">35782740</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.csbj.2022.06.026</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Panchy</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lehti-Shiu</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shiu</surname>
                            <given-names>S-H</given-names>
                        </name>
</person-group>:
                    <article-title>Evolution of gene duplication in plants.</article-title>
                    <source>

                        <italic toggle="yes">Plant Physiol.</italic>
</source>
                    <year>2016</year>;<volume>171</volume>(<issue>4</issue>):<fpage>2294</fpage>&#x2013;<lpage>2316</lpage>.
                    <pub-id pub-id-type="pmid">27288366</pub-id>
                    <pub-id pub-id-type="doi">10.1104/pp.16.00523</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Adams</surname>
                            <given-names>KL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wendel</surname>
                            <given-names>JF</given-names>
                        </name>
</person-group>:
                    <article-title>Polyploidy and genome evolution in plants.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Plant Biol.</italic>
</source>
                    <year>2005</year>;<volume>8</volume>(<issue>2</issue>):<fpage>135</fpage>&#x2013;<lpage>141</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.pbi.2005.01.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Maere</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Bodt</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Raes</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Modeling gene and genome duplications in eukaryotes.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci.</italic>
</source>
                    <year>2005</year>;<volume>102</volume>(<issue>15</issue>):<fpage>5454</fpage>&#x2013;<lpage>5459</lpage>.
                    <pub-id pub-id-type="pmid">15800040</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0501102102</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Voordeckers</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pougach</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Verstrepen</surname>
                            <given-names>KJ</given-names>
                        </name>
</person-group>:
                    <article-title>How do regulatory networks evolve and expand throughout evolution?</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Biotechnol.</italic>
</source>
                    <year>2015</year>;<volume>34</volume>:<fpage>180</fpage>&#x2013;<lpage>188</lpage>.
                    <pub-id pub-id-type="pmid">25723843</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.copbio.2015.02.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="other">
                    <collab>R Core Team</collab>:
                    <source>

                        <italic toggle="yes">R: A Language and Environment for Statistical Computing.</italic>
</source>
                    <year>2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chang</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Allaire</surname>
                            <given-names>JJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">shiny: Web Application Framework for R.</italic>
</source>
                    <year>2022</year>. R package version 1.7.2.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=shiny">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cunningham</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Allen</surname>
                            <given-names>JE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Allen</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Ensembl 2022.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2022</year>;<volume>50</volume>(<issue>D1</issue>):<fpage>D988</fpage>&#x2013;<lpage>D995</lpage>.</mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carbon</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ireland</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mungall</surname>
                            <given-names>CJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>AmiGO: online access to ontology and annotation data.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2009</year>;<volume>25</volume>(<issue>2</issue>):<fpage>288</fpage>&#x2013;<lpage>289</lpage>.
                    <pub-id pub-id-type="pmid">19033274</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btn615</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marini</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ludt</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Linke</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>GeneTonic: an R/Bioconductor package for streamlining the interpretation of RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinform.</italic>
</source>
                    <year>2021</year>;<volume>22</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>19</lpage>.</mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>M&#x00f6;lder</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jablonski</surname>
                            <given-names>KP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Letcher</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Sustainable data analysis with Snakemake.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2021</year>;<volume>10</volume>:<fpage>33</fpage>.
                    <pub-id pub-id-type="pmid">34035898</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.29032.2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gr&#x00fc;ning</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dale</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sj&#x00f6;din</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Bioconda: sustainable and comprehensive software distribution for the life sciences.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Methods.</italic>
</source>
                    <year>2018</year>;<volume>15</volume>(<issue>7</issue>):<fpage>475</fpage>&#x2013;<lpage>476</lpage>.
                    <pub-id pub-id-type="pmid">29967506</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-018-0046-7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wickham</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hesselberth</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Salmon</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">pkgdown: Make Static HTML Documentation for a Package.</italic>
</source>
                    <year>2022</year>. R package version 2.0.6.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=pkgdown">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chang</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ribeiro</surname>
                            <given-names>BB</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">shinydashboard: Create Dashboards with &#x2018;Shiny&#x2019;.</italic>
</source>
                    <year>2021</year>. R package version 0.7.2.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=shinydashboard">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Granjon</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">shinydashboardPlus: Add More &#x2018;AdminLTE2&#x2019; Components to &#x2018;shinydashboard&#x2019;.</italic>
</source>
                    <year>2021</year>. R package version 2.0.3.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=shinydashboardPlus">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dobin</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schlesinger</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>STAR: ultrafast universal RNA-seq aligner.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2013</year>;<volume>29</volume>(<issue>1</issue>):<fpage>15</fpage>&#x2013;<lpage>21</lpage>.
                    <pub-id pub-id-type="pmid">23104886</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts635</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Biotechnol.</italic>
</source>
                    <year>2016</year>;<volume>34</volume>(<issue>5</issue>):<fpage>525</fpage>&#x2013;<lpage>527</lpage>.
                    <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mendes</surname>
                            <given-names>FK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vanderpool</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fulton</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CAFE 5 models variation in evolutionary rates among gene families.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2021</year>;<volume>36</volume>(<issue>22-23</issue>):<fpage>5516</fpage>&#x2013;<lpage>5518</lpage>.</mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Emms</surname>
                            <given-names>DM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kelly</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>OrthoFinder: phylogenetic orthology inference for comparative genomics.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>14</lpage>.</mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alexa</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rahnenf&#x00fc;hrer</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Gene set enrichment analysis with topGO.</article-title>
                    <source>

                        <italic toggle="yes">Bioconductor Improv.</italic>
</source>
                    <year>2009</year>;<volume>27</volume>:<fpage>1</fpage>&#x2013;<lpage>26</lpage>.</mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Supek</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>&#x0160;kunca</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <chapter-title>Visualizing GO annotations.</chapter-title>
                    <source>

                        <italic toggle="yes">The Gene Ontology Handbook.</italic>
</source>
                    <publisher-loc>New York, NY</publisher-loc>:
                    <publisher-name>Humana Press</publisher-name>;<year>2017</year>; pages <fpage>207</fpage>&#x2013;<lpage>220</lpage>.</mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ling</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jarvis</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Regulation of chloroplast protein import by the ubiquitin E3 ligase SP1 is important for stress tolerance in plants.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Biol.</italic>
</source>
                    <year>2015</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/science/article/pii/S0960982215009574">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kirschner</surname>
                            <given-names>GK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rosignoli</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ENHANCED GRAVITROPISM 2 encodes a STERILE ALPHA MOTIF&#x2013;containing protein that controls root growth angle in barley and wheat.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci.</italic>
</source>
                    <year>2021</year>;<volume>118</volume>(<issue>35</issue>):<fpage>e2101526118</fpage>.
                    <pub-id pub-id-type="pmid">34446550</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.2101526118</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sham</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Moustafa</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Al-Ameri</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Identification of Arabidopsis candidate genes in response to biotic and abiotic stresses using comparative microarrays.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2015</year>;<volume>10</volume>(<issue>5</issue>):<fpage>e0125666</fpage>.
                    <pub-id pub-id-type="pmid">25933420</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0125666</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrating GWAS, QTL, mapping and RNA-seq to identify candidate genes for seed vigor in rice (Oryza sativa L.).</article-title>
                    <source>

                        <italic toggle="yes">Mol. Breed.</italic>
</source>
                    <year>2019</year>;<volume>39</volume>(<issue>6</issue>):<fpage>1</fpage>&#x2013;<lpage>16</lpage>.</mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sewelam</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brilhaus</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Br&#x00e4;utigam</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Molecular plant responses to combined abiotic stresses put a spotlight on unknown and abundant genes.</article-title>
                    <source>

                        <italic toggle="yes">J. Exp. Bot.</italic>
</source>
                    <year>2020</year>;<volume>71</volume>(<issue>16</issue>):<fpage>5098</fpage>&#x2013;<lpage>5112</lpage>.
                    <pub-id pub-id-type="pmid">32442250</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jxb/eraa250</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kar</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mai</surname>
                            <given-names>H-J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Khalouf</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Comparative transcriptomics of lowland rice varieties uncovers novel candidate genes for adaptive iron excess tolerance.</article-title>
                    <source>

                        <italic toggle="yes">Plant Cell Physiol.</italic>
</source>
                    <year>2021</year>;<volume>62</volume>(<issue>4</issue>):<fpage>624</fpage>&#x2013;<lpage>640</lpage>.
                    <pub-id pub-id-type="pmid">33561287</pub-id>
                    <pub-id pub-id-type="doi">10.1093/pcp/pcab018</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shaik</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ramakrishna</surname>
                            <given-names>W</given-names>
                        </name>
</person-group>:
                    <article-title>Machine learning approaches distinguish multiple stress conditions using stress-responsive genes and identify candidate genes for broad resistance in rice.</article-title>
                    <source>

                        <italic toggle="yes">Plant Physiol.</italic>
</source>
                    <year>2014</year>;<volume>164</volume>(<issue>1</issue>):<fpage>481</fpage>&#x2013;<lpage>495</lpage>.
                    <pub-id pub-id-type="pmid">24235132</pub-id>
                    <pub-id pub-id-type="doi">10.1104/pp.113.225862</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Braun</surname>
                            <given-names>IR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yanarella</surname>
                            <given-names>CF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lawrence-Dill</surname>
                            <given-names>CJ</given-names>
                        </name>
</person-group>:
                    <article-title>Computing on phenotypic descriptions for candidate gene discovery and crop improvement.</article-title>
                    <source>

                        <italic toggle="yes">Plant Phenomics.</italic>
</source>
                    <year>2020</year>.
                    <pub-id pub-id-type="doi">10.34133/2020/1963251</pub-id>
                    <ext-link ext-link-type="uri" xlink:href="https://downloads.spj.sciencemag.org/plantphenomics/2020/1963251.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Michelmore</surname>
                            <given-names>RW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paran</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kesseli</surname>
                            <given-names>RV</given-names>
                        </name>
</person-group>:
                    <article-title>Identification of markers linked to disease-resistance genes by bulked segregant analysis: A rapid method to detect markers in specific genomic regions by using segregating populations.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci. U. S. A.</italic>
</source>
                    <year>1991</year>;<volume>88</volume>:<fpage>9828</fpage>&#x2013;<lpage>9832</lpage>.
                    <issn>00278424</issn>.
                    <pub-id pub-id-type="doi">10.1073/PNAS.88.21.9828</pub-id>
                    <ext-link ext-link-type="uri" xlink:href="https://www.pnas.org">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Likas</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vlassis</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Verbeek</surname>
                            <given-names>JJ</given-names>
                        </name>
</person-group>:
                    <article-title>The global k-means clustering algorithm.</article-title>
                    <source>

                        <italic toggle="yes">Pattern Recogn.</italic>
</source>
                    <year>2003</year>;<volume>36</volume>(<issue>2</issue>):<fpage>451</fpage>&#x2013;<lpage>461</lpage>.
                    <pub-id pub-id-type="doi">10.1016/S0031-3203(02)00060-2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Langfelder</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Horvath</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>WGCNA: an R package for weighted correlation network analysis.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinform.</italic>
</source>
                    <year>2008</year>;<volume>9</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>13</lpage>.</mixed-citation>
            </ref>
            <ref id="ref35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Aoki</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ogata</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shibata</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Approaches for extracting practical information from gene co-expression networks in plant biology.</article-title>
                    <source>

                        <italic toggle="yes">Plant Cell Physiol.</italic>
</source>
                    <year>2007</year>;<volume>48</volume>(<issue>3</issue>):<fpage>381</fpage>&#x2013;<lpage>390</lpage>.
                    <pub-id pub-id-type="pmid">17251202</pub-id>
                    <pub-id pub-id-type="doi">10.1093/pcp/pcm013</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wratten</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wilm</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>G&#x00f6;ke</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Methods.</italic>
</source>
                    <year>2021</year>;<volume>18</volume>(<issue>10</issue>):<fpage>1161</fpage>&#x2013;<lpage>1168</lpage>.
                    <pub-id pub-id-type="pmid">34556866</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-021-01254-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alhamdoosh</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ng</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wilson</surname>
                            <given-names>NJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Combining multiple tools outperforms individual methods in gene set enrichment analyses.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2017</year>;<volume>33</volume>(<issue>3</issue>):<fpage>414</fpage>&#x2013;<lpage>424</lpage>.
                    <pub-id pub-id-type="pmid">27694195</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw623</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gu</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>H&#x00fc;bschmann</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>simplifyEnrichment: A Bioconductor package for clustering and visualizing functional enrichment results.</article-title>
                    <source>

                        <italic toggle="yes">Genom. Proteom. Bioinf.</italic>
</source>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.1016/j.gpb.2022.04.008</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rensing</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lang</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zimmer</surname>
                            <given-names>AD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2008</year>;<volume>319</volume>(<issue>5859</issue>):<fpage>64</fpage>&#x2013;<lpage>69</lpage>.
                    <pub-id pub-id-type="pmid">18079367</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.1150646</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Flagel</surname>
                            <given-names>LE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wendel</surname>
                            <given-names>JF</given-names>
                        </name>
</person-group>:
                    <article-title>Gene duplication and evolutionary novelty in plants.</article-title>
                    <source>

                        <italic toggle="yes">New Phytol.</italic>
</source>
                    <year>2009</year>;<volume>183</volume>(<issue>3</issue>):<fpage>557</fpage>&#x2013;<lpage>564</lpage>.
                    <pub-id pub-id-type="doi">10.1111/j.1469-8137.2009.02923.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Velasco</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zharkikh</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Troggio</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A high quality draft consensus sequence of the genome of a heterozygous grapevine variety.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2007</year>;<volume>2</volume>(<issue>12</issue>):<fpage>e1326</fpage>.
                    <pub-id pub-id-type="pmid">18094749</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0001326</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ming</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hou</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Feng</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus).</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2008</year>;<volume>452</volume>(<issue>7190</issue>):<fpage>991</fpage>&#x2013;<lpage>996</lpage>.
                    <pub-id pub-id-type="pmid">18432245</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature06856</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mable</surname>
                            <given-names>BK</given-names>
                        </name>
</person-group>:
                    <article-title>&#x2018;Why polyploidy is rarer in animals than in plants&#x2019;: myths and mechanisms.</article-title>
                    <source>

                        <italic toggle="yes">Biol. J. Linn. Soc.</italic>
</source>
                    <year>2004</year>;<volume>82</volume>(<issue>4</issue>):<fpage>453</fpage>&#x2013;<lpage>466</lpage>.
                    <pub-id pub-id-type="doi">10.1111/j.1095-8312.2004.00332.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Murat</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Van de Peer</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Salse</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Decoding plant and animal genome plasticity from differential paleo-evolutionary patterns and processes.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol. Evol.</italic>
</source>
                    <year>2012</year>;<volume>4</volume>(<issue>9</issue>):<fpage>917</fpage>&#x2013;<lpage>928</lpage>.
                    <pub-id pub-id-type="pmid">22833223</pub-id>
                    <pub-id pub-id-type="doi">10.1093/gbe/evs066</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <label>45</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Demuth</surname>
                            <given-names>JP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Bie</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stajich</surname>
                            <given-names>JE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The evolution of mammalian gene families.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2006</year>;<volume>1</volume>(<issue>1</issue>):<fpage>e85</fpage>.
                    <pub-id pub-id-type="pmid">17183716</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0000085</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Magen</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ast</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Different levels of alternative splicing among eukaryotes.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2007</year>;<volume>35</volume>(<issue>1</issue>):<fpage>125</fpage>&#x2013;<lpage>131</lpage>.
                    <pub-id pub-id-type="pmid">17158149</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkl924</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref47">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marcon</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Altrogge</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Win</surname>
                            <given-names>YN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>BonnMu: a sequence-indexed resource of transposon-induced maize mutations for functional genomics studies.</article-title>
                    <source>

                        <italic toggle="yes">Plant Physiol.</italic>
</source>
                    <year>2020</year>;<volume>184</volume>(<issue>2</issue>):<fpage>620</fpage>&#x2013;<lpage>631</lpage>.
                    <pub-id pub-id-type="pmid">32769162</pub-id>
                    <pub-id pub-id-type="doi">10.1104/pp.20.00478</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref48">
                <label>48</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>St&#x00f6;cker</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>A2TEA.Workflow test data (v1.0.0) [Data set]. Zenodo.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7089022</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref49">
                <label>49</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>St&#x00f6;cker</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>A2TEA.Workflow Poaceae reduced example data (v1.0.0) [Data set]. Zenodo.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7089608</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref50">
                <label>50</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>St&#x00f6;cker</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>A2TEA Brassicaceae example data (v.1.0.0) [Data set]. Zenodo.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7089606</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref51">
                <label>51</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>St&#x00f6;cker</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>tgstoecker/A2TEA.Workflow: First release (v1.0.0). Zenodo.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7086290</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref52">
                <label>52</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>St&#x00f6;cker</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>tgstoecker/A2TEA.WebApp: v1.0.0 (v1.0.0). Zenodo.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7086282</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report156575">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.138877.r156575</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Siddharthan</surname>
                        <given-names>Rahul</given-names>
                    </name>
                    <xref ref-type="aff" rid="r156575a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-2233-0954</uri>
                </contrib>
                <aff id="r156575a1">
                    <label>1</label>Computational Biology, The Institute of Mathematical Sciences, Chennai, Tamil Nadu, India</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>1</day>
                <month>2</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Siddharthan R</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport156575" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.126463.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The software tool A2TEA is well described and motivated. The idea of identifying genes involved in adaptive traits, such as stress tolerance, using inter-species comparisons is sound. A2TEA integrates the bioinformatics (RNA-seq, orthologous group computation, functional information inference, etc.).</p>
            <p> </p>
            <p> I do not work in this field so have not tested out the tool. The webapp looks well laid out and user-friendly but I did not try it on actual data.</p>
            <p> </p>
            <p> Several boxes in the flowchart in Figure 2 have garbled text (text replaced with boxes). I also suggest highlighting a couple of possible paths through this figure in an actual workflow (e.g., a single workflow would not use both kallisto and STAR?).</p>
            <p> </p>
            <p> Minor comment: one block of text, starting with "The A2TEA.WebApp is written in the R programming language..." (two paragraphs) is repeated (as a single paragraph immediately after).</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Computational biology: regulatory genomics, chromatin, algorithms</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment9506-156575">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>St&#x00f6;cker</surname>
                            <given-names>Tyll</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>24</day>
                    <month>3</month>
                    <year>2023</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We thank all reviewers for providing valuable input to the article that helped us to improve it further. We have tried to address their comments and made efforts to incorporate their insightful criticisms and suggestions. Similar suggestions and criticisms have been grouped together in order to address them concisely. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <bold>Several boxes in the flowchart in figure 2 have garbled text (text replaced with boxes). I also suggest highlighting a couple of possible paths through this figure in an actual workflow (eg, a single workflow would not use both kallisto and STAR?).</bold>
                            </p>
                            <p> </p>
                            <p> We have made sure that the figure is now without the noted garbled text. Regarding the paths through the diagram: The reviewer is correct that in a normal differential expression analysis one would not choose to process some of the samples with one tool and the rest with another. However, in our field of plant/crop research, we have encountered many cases of early assembly and annotation versions, with many cases of only a transcriptome or a genome assembly being available. Due to this, the workflow does support running one species through a pseudoalignment and another through classic alignment-based quantification to provide more flexibility when combining data from different sources. In our Workflow README, we point out that for runtime and resource purposes, we recommend kallisto/pseudoalignment, and it should always be preferred if possible.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Minor comment: one block of text, starting with "The A2TEA.WebApp is written in the R programming language..." (two paragraphs) is repeated (as a single paragraph immediately after).</bold>
                            </p>
                            <p> </p>
                            <p> We thank the reviewer very much for pointing out this mistake that occurred during our editing process.</p>
                        </list-item>
                    </list>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report154237">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.138877.r154237</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Aranda</surname>
                        <given-names>Manuel</given-names>
                    </name>
                    <xref ref-type="aff" rid="r154237a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6673-016X</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Salazar Moya</surname>
                        <given-names>Octavio</given-names>
                    </name>
                    <xref ref-type="aff" rid="r154237a1">1</xref>
                    <role>Co-referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6340-6524</uri>
                </contrib>
                <aff id="r154237a1">
                    <label>1</label>Biological and Environmental Sciences and Engineering Division (BESE), Red Sea Research Center (RSRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>11</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Aranda M and Salazar Moya O</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport154237" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.126463.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <underline>General comments:</underline>
            </p>
            <p> </p>
            <p> The manuscript submitted by St&#x00f6;cker
                <italic>et al</italic>. describes an integrated pipeline for the identification of genes potentially involved in adaptation by analyzing gene duplications and gene family expansions. The pipeline combines RNA-Seq expression analyses via DESeq2, identification and phylogenetic analysis of orthologous groups with OrthoFinder, statistical analyses of gene family changes taking into account phylogeny inference with CAFE5, and gene ontology enrichment analyses with topGO. The results of the pipeline can be visualized using the R Shiny package, which allows for easy exploration of the results even for experimentalists with little bioinformatics background. A2TEA is a simple yet useful tool for the visualization and integration of multiple analyses and datasets. A few things need to be revised in the manuscript, such as a corrupted Figure 2, repeated paragraphs, and the addition of missing references. The implementation of a &#x201c;Download All Results&#x201d; option in the general tab of the Shiny app would be helpful. Allowing the pipeline to run without RNA-Seq input would also be useful, as it would enable analyses of species without available RNA-Seq data from the same condition. Additionally, it would be appropriate to put an emphasis on the broader use of the app instead of focusing on plant stress, as it would make the app more appealing to a broader audience.</p>
            <p> </p>
            <p> 
                <underline>Specific comments</underline> 
                <list list-type="bullet">
                    <list-item>
                        <p>Abstract: &#x201c;It functions as a one-stop processing pipeline, integrating protein family, phylogeny, expression, and protein function analysis&#x201d;. Change analysis to analyses.</p>
                    </list-item>
                    <list-item>
                        <p>Abstract: &#x201c;The pipeline is accompanied by an R Shiny web application that&#x201d;. There is a stray line break here; please fix the formatting.</p>
                    </list-item>
                    <list-item>
                        <p>Introduction: &#x201c;Most of the duplicates are lost or silenced&#x2026;.&#x201d;. Maybe change it to something like: Most of the gene duplicates are lost or silenced, but retained duplicates may hint at some evolutionary advantage, which may be targets of adaptation.</p>
                    </list-item>
                    <list-item>
                        <p>Introduction: The first paragraph of the introduction needs some work. It is unclear if the authors are explaining the different approaches that can be used to identify genes related to adaptation or if they are referring to their pipeline, as they use &#x201c;we&#x201d; and &#x201c;us&#x201d;. It would be appropriate to first describe the different approaches and their benefits and then the need to combine them. The second and third paragraphs already properly address what the pipeline is doing.</p>
                    </list-item>
                    <list-item>
                        <p>Introduction: Replace &#x201c;&amp;&#x201d; with &#x201c;and&#x201d; in: &#x201c;working in tandem to automate and ease all bioinformatics &amp; analysis tasks involved.&#x201d;</p>
                    </list-item>
                    <list-item>
                        <p>Introduction: Change &#x201c;gene families in form of&#x201d; to gene families in the form of&#x2026;</p>
                    </list-item>
                    <list-item>
                        <p>Figure 1 (and the whole article). While the tool was designed for the identification of genes for crop improvement, it is not limited to that application. As the name of the application indicates, it is designed for trait-specific evolutionary adaptations. It might be a good idea to generalize the possible applications of this tool rather than focus only on plant stress-specific responses. The authors can specify that an example of its use is in crop improvement while making sure to maintain a general tone for the use of A2TEA for a broader audience.</p>
                    </list-item>
                    <list-item>
                        <p>Cite DIAMOND, DESeq2, BLAST.</p>
                    </list-item>
                    <list-item>
                        <p>Cite Cafe in the main text (only found in the reference list).</p>
                    </list-item>
                    <list-item>
                        <p>This paragraph is repeated: &#x201c;The A2TEA.WebApp is written in the R programming language and uses the Shiny framework to facilitate interactivity with the data. It expects the user to upload a .RData file created by the A2TEA.Workflow. The web application comes with a test dataset that can be loaded with a single click so interested users can try out its functionality before having to finish an A2TEA.Workflow run.&#x201d;</p>
                    </list-item>
                    <list-item>
                        <p>Figure 2 is full of typos like orthoander instead of orthofinder and many of the boxes don&#x2019;t have text but empty squares, probably an error while formatting.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 3. Italicize Eutrema salsugineum.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 4. Increase the resolution or the font size, as words in panels B, C, and D are blurry.&#x00a0;</p>
                    </list-item>
                    <list-item>
                        <p>Comments on the A2TEA Shiny App. 
                            <list list-type="bullet">
                                <list-item>
                                    <p>General tab: The Differential expression and Functional annotation tables need a &#x201c;Download Full Results&#x201d; button, not only an option to download the current page.</p>
                                </list-item>
                                <list-item>
                                    <p>GO term analyses tab: At GO term analyses, changing the algorithm does not change the column name at GO term set choices. It always says classicFisher. Same in enrichment plots. Also, in the enrichment plots, it might be better to plot the cut-off lines after plotting the terms, as sometimes the terms are small due to a low number of annotated genes belonging to them and might be covered by the line. Changing the x-axis label for the bar plot to the number of significant orthogroups might be better.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Discussion: &#x201c;We propose that we can identify novel genes relevant for stress adaptation by comparing same-stress experiments of several plant species with different levels of stress adaptation in combination with evolutionary footprints in the form of protein family expansion&#x201d;. Again, it would be better to keep the usability of this tool as general as possible, not only focusing on plant stress, as it could be used for any species and conditions. Also, using the word condition or treatment instead of stress would be better.</p>
                    </list-item>
                    <list-item>
                        <p>Additional comments. It could be beneficial if the App could also be run without RNA-Seq data for the development of a general hypothesis when comparing species for which RNA-Seq data under the same conditions is not available.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Genomics</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment9505-154237">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>St&#x00f6;cker</surname>
                            <given-names>Tyll</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>24</day>
                    <month>3</month>
                    <year>2023</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We thank all reviewers for providing valuable input to the article that helped us to improve it further. We have tried to address their comments and made efforts to incorporate their insightful criticisms and suggestions. Similar suggestions and criticisms have been grouped together in order to address them concisely. 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <bold>Abstract: &#x201c;It functions as a one-stop processing pipeline, integrating protein family, phylogeny, expression, and protein function analysis&#x201d;. Change analysis to analyses.</bold>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Abstract: &#x201c;The pipeline is accompanied by an R Shiny web application that&#x201d;. There is an unintended line break here; please fix the formatting.</bold>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Introduction: &#x201c;Most of the duplicates are lost or silenced&#x2026;.&#x201d;. Maybe change it to something like: Most of the gene duplicates are lost or silenced, but retained duplicates may hint at some evolutionary advantage, which may be targets of adaptation.</bold>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Introduction: Replace &#x201c;&amp;&#x201d; with &#x201c;and&#x201d; in: &#x201c;working in tandem to automate and ease all bioinformatics &amp; analysis tasks involved.&#x201d;</bold>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Introduction: Change &#x201c;gene families in form of&#x201d; to gene families in the form of&#x2026;</bold>
                            </p>
                            <p> We thank the reviewer very much for taking the time to point out several minor formatting and phrasing errors, all of which we have corrected.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Introduction: The first paragraph of the introduction needs some work. It is unclear if the authors are explaining the different approaches that can be used to identify genes related to adaptation or if they are referring to their pipeline, as they use &#x201c;we&#x201d; and &#x201c;us&#x201d;. It would be appropriate to first describe the different approaches and their benefits and then the need to combine them. The second and third paragraphs already properly address what the pipeline is doing.</bold>
                            </p>
                            <p> On re-reading the first paragraph of the introduction with the reviewer's comments in mind, we agree that a more logical separation was necessary. We have restructured and rewritten parts of it so that it first describes the prior approaches and then, in a second step, explains the need to combine them as we do with our software.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Cite DIAMOND, DESeq2, BLAST.&#x00a0;</bold>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Cite Cafe in the main text (only found in the reference list).</bold>
                            </p>
                            <p> </p>
                            <p> We thank the reviewer for pointing out important missing citations in our text and have added these as well as additional ones. We have moved our citation of CAFE5 so that it immediately follows the software&#x2019;s name in the text (previously it was at the end of the sentence).</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>This paragraph is repeated: &#x201c;The A2TEA.WebApp is written in the R programming language and uses the Shiny framework to facilitate interactivity with the data. It expects the user to upload a .RData file created by the A2TEA.Workflow. The web application comes with a test dataset that can be loaded with a single click so interested users can try out its functionality before having to finish an A2TEA.Workflow run.&#x201d;</bold>
                            </p>
                            <p> </p>
                            <p> We thank the reviewer very much for pointing out this mistake that occurred during our editing process.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Figure 2 is full of typos like orthoander instead of orthofinder and many of the boxes don&#x2019;t have text but empty squares, probably an error while formatting.</bold>
                            </p>
                            <p> </p>
                            <p> We are unsure of the cause, but we have uploaded the figure in a different format, which seems to have resolved the garbled text.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Figure 3. Italicize Eutrema salsugineum.</bold>
                            </p>
                            <p> </p>
                            <p> We italicized this species in the figure text and took care to check the manuscript for similar formatting flaws.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Figure 4. Increase the resolution or the font size, as words in panels B, C, and D are blurry.</bold>
                            </p>
                            <p> </p>
                            <p> We re-exported the image in higher resolution to increase the overall legibility.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Comments on the A2TEA Shiny App.</bold>
                            </p>
                            <p> </p>
                            <p> 
                                <bold>General tab: The Differential expression and Functional annotation tables need a &#x201c;Download Full Results&#x201d; button, not only an option to download the current page.</bold>
                            </p>
                            <p> </p>
                            <p> The reviewer was correct in pointing out the need for this feature. For all tables, we have now implemented the suggested buttons, which download all results and not just the displayed page. If the user has not applied any filter, the complete table is downloaded; if the user has applied a filter (e.g. |log2FC| &gt; 1), the complete filtered table is downloaded.</p>
                            <p> </p>
                            <p> 
                                <bold>GO term analyses tab: At GO term analyses, changing the algorithm does not change the column name at GO term set choices. It always says classicFisher. Same in enrichment plots. Also, in the enrichment plots, it might be better to plot the cut-off lines after plotting the terms, as sometimes the terms are small due to a low number of annotated genes belonging to them and might be covered by the line. Changing the x-axis label for the bar plot to the number of significant orthogroups might be better.</bold>
                            </p>
                            <p> </p>
                            <p> The choice of the topGO algorithm concerns how the GO graph is analyzed (independence vs. consideration of parent terms, etc.) and is a separate option from the downstream test statistic (
                                <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf">https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf</ext-link>). In the current release, we only allow classicFisher as the test statistic. As such, the column name does not need to change in our opinion. However, we agree that information about the algorithm used should be retained. The name of the algorithm is now included in the output tables that the user can download.</p>
                            <p> </p>
                            <p> We have changed the plotting order in the enrichment plot: the cut-off lines are now drawn behind the terms, which ensures clearer visualizations.</p>
                            <p> </p>
                            <p> We have also adopted the reviewer's suggestion for a better x-axis label in the enrichment bar plot.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Figure 1 (and the whole article). While the tool was designed for the identification of genes for crop improvement, it is not limited to that application. As the name of the application indicates, it is designed for trait-specific evolutionary adaptations. It might be a good idea to generalize the possible applications of this tool rather than focus only on plant stress-specific responses. The authors can specify that an example of its use is in crop improvement while making sure to maintain a general tone for the use of A2TEA for a broader audience.</bold>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Discussion: &#x201c;We propose that we can identify novel genes relevant for stress adaptation by comparing same-stress experiments of several plant species with different levels of stress adaptation in combination with evolutionary footprints in the form of protein family expansion&#x201d;. Again, it would be better to keep the usability of this tool as general as possible, not only focusing on plant stress, as it could be used for any species and conditions. Also, using the word condition or treatment instead of stress would be better.</bold>
                            </p>
                            <p> </p>
                            <p> We agree with the reviewer that the tone and focus of our manuscript can be somewhat loosened to emphasize the general technical feasibility of applying our approach to research outside of the plant kingdom. We have (1) altered parts of the manuscript to use the words treatment/condition instead of the word stress (to underline the broader applicability) and (2) added a sentence to the end of the first paragraph emphasizing the general technical applicability to organisms other than plants.</p>
                            <p> </p>
                            <p> However, as we point out in two paragraphs near the end of the discussion, our particular approach is motivated by previous research indicating that genome duplication was a decisive factor in the evolutionary history of plants and that much of the phenotypic variation in land plants may have arisen primarily through duplication and adaptive specialization of already existing genes. Since our method specifically focuses on the analysis of protein family expansion events, it might be much less applicable to other groups of species in which other evolutionary processes, such as alternative splicing or pathway/network rewiring, could play a proportionally much more important role. These considerations led us to specifically target the "Plant Computational and Quantitative Genomics" collection, because the feasibility of our software for research in other kingdoms is unclear.</p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>It could be beneficial if the App could also be run without RNA-Seq data for the development of a general hypothesis when comparing species for which RNA-Seq data under the same conditions is not available.</bold>
                            </p>
                            <p> </p>
                            <p> We agree with the reviewer's suggestion not to require RNA-seq data for all species in a workflow run. The latest releases of the A2TEA.Workflow and WebApp now allow for this. The README of the Workflow explains that if the user does not possess transcriptomic data for a species, they can leave the genomic/cDNA FASTA and annotation positions in the species.tsv file empty. We updated the WebApp to handle such cases of missing data. As a result, our whole pipeline has become considerably more flexible.</p>
                        </list-item>
                    </list>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
