<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.73018.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>AQUA: an Advanced QUery Architecture for the SPARC Portal</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved with reservations, 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Shahidi</surname>
                        <given-names>Niloofar</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5532-1651</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Lin</surname>
                        <given-names>Xuanzhi</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-3290-4122</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Munarko</surname>
                        <given-names>Yuda</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-9656-3945</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Rasmy</surname>
                        <given-names>Laila</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ngo</surname>
                        <given-names>Tram</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Auckland Bioengineering Institute, The University of Auckland, Auckland, 1010, New Zealand</aff>
                <aff id="a2">
                    <label>2</label>Case Western Reserve University, Cleveland, OH, 44106, USA</aff>
                <aff id="a3">
                    <label>3</label>School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA</aff>
                <aff id="a4">
                    <label>4</label>California Medical Innovations Institute Inc, San Diego, CA, 92121, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:nsha457@aucklanduni.ac.nz">nsha457@aucklanduni.ac.nz</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>16</day>
                <month>9</month>
                <year>2021</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>930</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>2</day>
                    <month>9</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Shahidi N et al.</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-930/pdf"/>
            <abstract>
                <p>The Stimulating Peripheral Activity to Relieve Conditions (SPARC) program integrates biological and neural information to create anatomical and functional maps of the peripheral nervous system. The SPARC Portal provides dynamic storage for datasets, models, and resources to help researchers find and produce data. Currently, the SPARC Portal provides a basic search tool that lacks several features that would improve the search experience. To retrieve the required information from the stored datasets and resources in a targeted way, we have developed an Advanced QUery Architecture (AQUA) for the SPARC Portal. The major features of AQUA are near-real-time auto-completion of queries, close-match suggestions, and multiple filters to narrow or sort the results, with the goal of enhancing the usability of the SPARC search engine. AQUA is available from: https://github.com/SPARC-FAIR-Codeathon/aqua</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>AQUA</kwd>
                <kwd>SPARC</kwd>
                <kwd>biological query</kwd>
                <kwd>natural language processing</kwd>
                <kwd>NIFS Ontology</kwd>
                <kwd>text mining</kwd>
                <kwd>Codeathon</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1">
            <title>F1000 Research Statement of Endorsement</title>
            <p>David Nickerson confirms that the author has an appropriate level of expertise to conduct this research, and confirms that the submission is of an acceptable scientific standard. David Nickerson declares he is NF&#x2019;s primary supervisor and one of the organisers of the 2021 SPARC FAIR Codeathon. Affiliation: Auckland Bioengineering Institute, University of Auckland.</p>
        </sec>
        <sec id="sec2" sec-type="intro">
            <title>Introduction</title>
            <p>The Stimulating Peripheral Activity to Relieve Conditions (SPARC) program is a platform to assist neuroscientists in developing new medical devices.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> It aims to leverage our understanding of nerve-organ interactions in biological entities and advance existing medical tools. It hosts over a hundred datasets, projects, and resources, and this number is growing, so a robust tool will be needed to explore the expanding content. Targeted data retrieval from the SPARC Portal can improve how researchers interact with the portal and help users find the data they seek. However, the search features of the SPARC Portal are limited.</p>
            <p>Currently, the search engine of the SPARC Portal does not account for close matches or misspelt words. The basic display of the returned results does not emphasise the matched text and does not allow users to filter or sort the results. This makes it hard for users to find the resources they need and, once results are returned, to narrow or sort them. Moreover, the description shown for each returned result does not necessarily contain the matched keywords, which can lead to confusion. We have developed an application that we believe will enhance the SPARC Portal search by addressing these issues, moving towards a FAIR (Findable, Accessible, Interoperable and Reusable) repository that benefits researchers globally.</p>
            <p>Advanced QUery Architecture (AQUA) is an application that aims to improve the search capabilities of the SPARC Portal. In particular, it makes the search engine more tolerant of incomplete or misspelt queries. It also enhances the result display of the SPARC Portal by making it more user-friendly and by providing users with more sophisticated result filtering and sorting options. Our end goal is to substantially improve the visibility of the SPARC datasets. This, in turn, will benefit the SPARC community as a whole, since their datasets will be more discoverable for reuse and subsequent collaborations.</p>
            <p>AQUA was initiated and completed during the two-week 2021 SPARC FAIR Codeathon held in July. In AQUA, we have incorporated Artificial Intelligence tools to process and refine the queries on the SPARC Portal and implemented predictive typing to offer plausible suggestions. AQUA then auto-corrects the queries to match the existing data on the SPARC Portal and the Neuroscience Information Framework Standard (
                <ext-link ext-link-type="uri" xlink:href="https://github.com/SciCrunch/NIF-Ontology">NIFS</ext-link>) Ontology. This returns the datasets most likely to match the search keywords, together with a list of related keywords. To enhance the current results display, we have added features that, first, filter and sort the results more precisely; second, emphasise the matched text for easier skimming; and third, when no matching results are available, allow users to enter their email addresses and be notified when their requested dataset is published.</p>
            <p>In this paper, we first review the implementation of AQUA and how its main components interact with the user and the SPARC Portal. Next, we provide more details on the sub-components and the tools and packages they use. We then describe the features added to the AQUA User Interface (UI) and discuss how it differs from the existing SPARC Portal. Finally, we describe how AQUA can change the search tool on the SPARC Portal and outline possible future developments.</p>
        </sec>
        <sec id="sec3" sec-type="methods">
            <title>Methods</title>
            <sec id="sec4">
                <title>Implementation</title>
                <p>This section discusses the improvement of the search tool on the SPARC Portal. 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> shows how the AQUA UI (also referred to as the frontend) and the AQUA server-side data-access layer (also referred to as the backend) bridge the user and the SPARC Knowledge Base. The AQUA UI receives the user&#x2019;s queries, formulates them in JSON, and transfers them to the AQUA backend module. The AQUA backend searches the SPARC Knowledge Base for the formulated queries. Once the matching datasets/resources are found, the AQUA backend returns the ranked results to the AQUA UI, which displays them according to the user&#x2019;s ranking/filtering preferences. The AQUA UI is implemented using HTML, CSS, and JavaScript, and the main tools used for the AQUA backend are Python, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.docker.com/">Docker</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.sqlite.org/index.html">SQLite</ext-link>,
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup> and 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/SciGraph/SciGraph">SciGraph</ext-link>.</p>
                <fig fig-type="fig" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>AQUA workflow.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76636/18549b6f-6da2-4a66-8307-f71ce3e84bff_figure1.gif"/>
                </fig>
                <p>
                    <xref ref-type="fig" rid="f2">Figure 2</xref> depicts the pipeline of AQUA in three major sections: 
                    <list list-type="simple">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>Query refinement:</bold>
                                <list list-type="order">
                                    <list-item>
                                        <label>1.</label>
                                        <p>Auto-completion: As the user types, our tool automatically completes the query if the typed term partially or completely matches any keyword. It then sends the selected keyword to the AQUA backend.</p>
                                    </list-item>
                                    <list-item>
                                        <label>2.</label>
                                        <p>Suggestions: If no exact matches are found, the tool finds close matches and suggests them to the user by displaying the phrase: 
                                            <italic toggle="yes">&#x201c;Showing results for ...&#x201d;.</italic> If the user chooses to search for the initial query instead, AQUA sends the raw, uncorrected query to the AQUA backend.</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>Results filtering:</bold>
                                <list list-type="order">
                                    <list-item>
                                        <label>1.</label>
                                        <p>Sort by: When the results for the query are displayed, the user can sort them by 
                                            <italic toggle="yes">Relevance</italic>, 
                                            <italic toggle="yes">Date published</italic>, and 
                                            <italic toggle="yes">Alphabetical order.</italic>
                                        </p>
                                    </list-item>
                                    <list-item>
                                        <label>2.</label>
                                        <p>Filter by: The results can also be filtered based on 
                                            <italic toggle="yes">Keyword</italic>, 
                                            <italic toggle="yes">Author</italic>, 
                                            <italic toggle="yes">Category</italic>, and 
                                            <italic toggle="yes">Publication date.</italic>
                                        </p>
                                    </list-item>
                                    <list-item>
                                        <label>3.</label>
                                        <p>Matched text emphasised: The searched keywords will be emphasised in the returned results.</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>&#x201c;Notify me&#x201d;:</bold> Finally, if the AQUA backend returns no results, our tool asks whether the user wants to be notified when a related resource is published. The tool checks the validity of the email address provided and then stores it using SQLite. Thereafter, it checks the SPARC Portal for any updated or newly uploaded related resource every day at 2 AM EDT. When the requested resource becomes available, it sends a notification email to the registered user.</p>
                        </list-item>
                    </list>
                </p>
                <fig fig-type="fig" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>An overview of the AQUA pipeline.</title>
                        <p>The grey and yellow boxes correspond to the &#x201c;Query refinement&#x201d; and &#x201c;Notify me&#x201d; modules of the AQUA backend, respectively. The green box corresponds to the &#x201c;Results filtering&#x201d; function of the AQUA frontend on displaying the results. The purple boxes illustrate the filters and sorting options.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76636/18549b6f-6da2-4a66-8307-f71ce3e84bff_figure2.gif"/>
                </fig>
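                <p>The &#x201c;Sort by&#x201d; and &#x201c;Filter by&#x201d; options listed above can be illustrated with a minimal Python sketch. It is not taken from the AQUA code base: the <monospace>Result</monospace> record, its field names, and the assumption that a relevance score arrives with each result are all hypothetical and serve only to show the logic.</p>
                <code language="python"><![CDATA[
```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    # Hypothetical record shape; the real AQUA backend response may differ.
    title: str
    authors: list
    category: str
    published: date
    score: float  # relevance score, assumed to be supplied by the backend


def filter_results(results, author=None, category=None, published_after=None):
    """Keep only the results that satisfy every filter that was supplied."""
    kept = []
    for r in results:
        if author is not None and author not in r.authors:
            continue
        if category is not None and r.category != category:
            continue
        if published_after is not None and r.published < published_after:
            continue
        kept.append(r)
    return kept


def sort_results(results, by="relevance"):
    """Order results by relevance, publication date (newest first), or title."""
    if by == "relevance":
        return sorted(results, key=lambda r: r.score, reverse=True)
    if by == "date":
        return sorted(results, key=lambda r: r.published, reverse=True)
    if by == "alphabetical":
        return sorted(results, key=lambda r: r.title.lower())
    raise ValueError(f"unknown sort option: {by}")


# Made-up example records, for illustration only.
results = [
    Result("Vagus nerve recordings", ["A. Author"], "dataset", date(2021, 3, 1), 0.4),
    Result("Brainstem neuron model", ["B. Author"], "model", date(2020, 7, 15), 0.9),
    Result("Colon motility maps", ["A. Author"], "dataset", date(2021, 6, 20), 0.7),
]
recent_by_author = filter_results(
    results, author="A. Author", published_after=date(2021, 4, 1)
)
```
]]></code>
                <p>Filtering before sorting keeps the two steps independent, so any combination of filters can be paired with any of the three orderings.</p>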
                <p>The AQUA platform integrates Python libraries, data mining tools, a SQL database engine, and the Document Object Model (DOM) API to provide an environment similar to the SPARC Portal with improved search functionality.</p>
            </sec>
            <sec id="sec5">
                <title>AQUA backend</title>
                <p>The AQUA backend queries the SPARC Knowledge Base for information, delivers data to the frontend, and processes any logic that the AQUA UI requires. The SPARC Knowledge Base comprises two references: 
                    <bold>SPARC dataset metadata</bold> and 
                    <bold>NIFS ontology</bold>. Metadata is &#x201c;data about data&#x201d;, 
                    <italic toggle="yes">i.e.</italic>, additional information provided about datasets. The SPARC dataset metadata includes information such as title, description, and techniques, as well as the number of files, formats, licenses, etc. (
                    <ext-link ext-link-type="uri" xlink:href="https://staging.sparc.science/help/3vcLloyvrvmnK3Nopddrka#metadata">SPARC dataset metadata</ext-link>), and the NIFS ontology is a set of community ontologies used by SPARC to annotate data and models.</p>
                <p>The AQUA backend focuses on two main features: 
                    <bold>Query refinement</bold> and 
                    <bold>Email notification</bold>. Below, we give a brief introduction to these added features.</p>
                <p>
                    <list list-type="simple">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>Query refinement:</bold>
                            </p>
                            <p>When the initial query term is entered, it follows two paths: auto-completion (yellow box in 
                                <xref ref-type="fig" rid="f3">Figure 3</xref>) and suggestions (purple box in 
                                <xref ref-type="fig" rid="f3">Figure 3</xref>).</p>
                            <p>
                                <list list-type="order">
                                    <list-item>
                                        <label>1.</label>
                                        <p>
                                            <bold>Auto-completion:</bold>
                                        </p>
                                        <p>The AQUA query refinement module auto-completes queries while the user is typing, starting after the third letter entered. The aim of auto-completion is to prevent typos and to give a better user experience in the SPARC Portal. We have created an n-gram model for auto-completion and used the Python library 
                                            <ext-link ext-link-type="uri" xlink:href="https://pypi.org/project/fast-autocomplete/">fast-autocomplete</ext-link>. In spelling correction, an n-gram is a contiguous sequence of n letters from a given sample of text. An n-gram model compares strings and computes the similarity between two words by counting the number of n-grams they share. This technique is language independent. The more n-grams two words share, the more similar they are.
                                            <sup>
                                                <xref ref-type="bibr" rid="ref3">3</xref>
                                            </sup>
                                        </p>
                                        <p>Elasticsearch&#x2019;s auto-complete suggester was not fast enough for our needs and lacked functionality that we required. Consequently, we used the fast-autocomplete Python library, which is much faster (reducing the average latency from 900 ms to 30 ms). In addition, Elasticsearch&#x2019;s auto-complete suggester does not handle combinations of the words in query terms. For example, fast-autocomplete can handle 
                                            <italic toggle="yes">&#x201c;brainstem neuron in rat&#x201d;</italic> when the words 
                                            <italic toggle="yes">&#x201c;brainstem&#x201d;</italic>, 
                                            <italic toggle="yes">&#x201c;neuron&#x201d;</italic>, 
                                            <italic toggle="yes">&#x201c;in&#x201d;</italic>, 
                                            <italic toggle="yes">&#x201c;rat&#x201d;</italic> are fed into it separately, whereas Elasticsearch&#x2019;s auto-complete needs the whole phrase to be fed to it before it can appear in the auto-complete results.</p>
                                    </list-item>
                                    <list-item>
                                        <label>2.</label>
                                        <p>
                                            <bold>Suggestions:</bold>
                                        </p>
                                        <p>Simultaneously, AQUA utilises SciGraph for auto-correction and suggestion. SciGraph represents ontologies and ontology-encoded knowledge in a Neo4j graph. However, we found that solely using SciGraph is not sufficient because SciGraph returns alternative queries/suggestions without correcting the initial query. For example, if there is a typo or removed space between the words of a query (
                                            <italic toggle="yes">scriptio continua</italic>), SciGraph returns either no results or irrelevant results from Elasticsearch. Therefore, we have added a new auto-correction feature that segments queries with missing spaces and fixes spelling errors, using a pipeline built on 
                                            <ext-link ext-link-type="uri" xlink:href="https://pypi.org/project/symspellpy/">SymSpellPy</ext-link>. SymSpellPy is a Python port of 
                                            <ext-link ext-link-type="uri" xlink:href="https://github.com/wolfgarbe/SymSpell">SymSpell</ext-link> for spelling correction, fuzzy search and approximate string matching. This correction step improves search performance because it runs before the request is sent to Elasticsearch. The auto-correction result is combined with the suggestion results and then executed as the final query search terms. This is demonstrated within the purple box in 
                                            <xref ref-type="fig" rid="f3">Figure 3</xref>.</p>
                                        <p>
                                            <list list-type="alpha-lower">
                                                <list-item>
                                                    <p>
                                                        <bold>Word segmentation:</bold>
                                                    </p>
                                                    <p>Word segmentation divides a string into words by inserting missing spaces at the appropriate positions.</p>
                                                </list-item>
                                                <list-item>
                                                    <p>
                                                        <bold>Spelling correction:</bold>
                                                    </p>
                                                    <p>This step supports spelling correction (word splitting/merging) of multi-word input strings in three cases
                                                        <sup>
                                                            <xref ref-type="bibr" rid="ref4">4</xref>
                                                        </sup>:</p>
                                                    <p>1) Extra space inserted into a correct word which leads to two incorrect terms; 2) Removed space between two correct words which leads to one incorrect term; 3) Multiple independent input terms with/without spelling errors.</p>
                                                </list-item>
                                            </list>
                                        </p>
                                        <p>To read more on AQUA query refinement visit: 
                                            <ext-link ext-link-type="uri" xlink:href="https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/Documentation/QueryRefinement.md">https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/Documentation/QueryRefinement.md</ext-link>.</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>Email notification</bold>
                            </p>
                            <p>The primary purpose of this module is to notify users whenever a new dataset is published matching their search terms. However, users can still use the same function to receive a summary table including basic information and links to all datasets currently matching their keywords. Additionally, as the &#x201c;Notify me&#x201d; module saves the requests in a database, this information can be further accessed and analysed to improve the content (
                                <xref ref-type="fig" rid="f4">Figure 4</xref>).</p>
                            <p>We can summarise the &#x201c;Notify me&#x201d; actions as follows:</p>
                            <p>
                                <list list-type="order">
                                    <list-item>
                                        <label>1.</label>
                                        <p>Adds email requests with keywords;</p>
                                    </list-item>
                                    <list-item>
                                        <label>2.</label>
                                        <p>Scans for existing search hits and sends email;</p>
                                    </list-item>
                                    <list-item>
                                        <label>3.</label>
                                        <p>Moves the pending requests to a waiting list that is scanned daily;</p>
                                    </list-item>
                                    <list-item>
                                        <label>4.</label>
                                        <p>Moves the fulfilled requests to an archive;</p>
                                    </list-item>
                                    <list-item>
                                        <label>5.</label>
                                        <p>Any failed requests (that already have matching hits) will remain on the waiting list for one month, during which the &#x201c;Notify me&#x201d; module will try to send the email daily. Afterwards, if the email still fails, it will be moved to the archive with a &#x201c;failed&#x201d; status.</p>
                                    </list-item>
                                </list>
                            </p>
                            <p>To read more, visit: 
                                <ext-link ext-link-type="uri" xlink:href="https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/Documentation/NotifyMe.md">https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/Documentation/NotifyMe.md</ext-link>.</p>
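                            <p>The actions listed above can be sketched as follows. This is a minimal illustration rather than the AQUA implementation: the names <monospace>NotifyMe</monospace>, <monospace>Request</monospace>, <monospace>search</monospace> and <monospace>send_email</monospace> are hypothetical, the waiting list and archive are held in memory instead of a database, &#x201c;one month&#x201d; is assumed to be 30 days counted from the first failed send, and a request is assumed to be fulfilled as soon as one email has been sent successfully.</p>
                            <p>
                                <code language="python"><![CDATA[
```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, List, Optional

RETRY_WINDOW = timedelta(days=30)  # assumption: "one month" taken as 30 days


@dataclass
class Request:
    email: str
    keywords: str
    status: str = "pending"  # becomes "fulfilled" or "failed" once archived
    first_failure: Optional[date] = None  # date of the first failed send


@dataclass
class NotifyMe:
    search: Callable[[str], List[str]]  # keywords -> titles of matching datasets
    send_email: Callable[[str, List[str]], bool]  # True if the email was sent
    waiting: List[Request] = field(default_factory=list)
    archive: List[Request] = field(default_factory=list)

    def add_request(self, email: str, keywords: str, today: date) -> Request:
        """Steps 1-3: store the request, try it once, queue it if still pending."""
        request = Request(email, keywords)
        self._process(request, today)
        return request

    def daily_scan(self, today: date) -> None:
        """Re-check every request on the waiting list (run once per day)."""
        pending, self.waiting = self.waiting, []
        for request in pending:
            self._process(request, today)

    def _process(self, request: Request, today: date) -> None:
        hits = self.search(request.keywords)
        if not hits:
            self.waiting.append(request)  # step 3: nothing to report yet
        elif self.send_email(request.email, hits):
            self._archive(request, "fulfilled")  # step 4
        else:
            request.first_failure = request.first_failure or today
            if today - request.first_failure >= RETRY_WINDOW:
                self._archive(request, "failed")  # step 5: give up
            else:
                self.waiting.append(request)  # step 5: retry tomorrow

    def _archive(self, request: Request, status: str) -> None:
        request.status = status
        self.archive.append(request)
```
]]></code>
                            </p>
                            <p>In this sketch, <monospace>add_request</monospace> is called when a user submits an email address and keywords, and <monospace>daily_scan</monospace> stands in for the daily scan of the waiting list.</p>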
                        </list-item>
                    </list>
                </p>
                <fig fig-type="fig" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Query refinement by Auto-completion/Suggestions.</title>
                        <p>The purple box corresponds to the path into returned suggestions and the yellow box corresponds to the auto-completion path. The procedure is demonstrated by an example of inserting a misspelt initial query (
                            <italic toggle="yes">braistem</italic>) into the module.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76636/18549b6f-6da2-4a66-8307-f71ce3e84bff_figure3.gif"/>
                </fig>
                <fig fig-type="fig" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>The pipeline of the AQUA &#x201c;Notify me&#x201d; module.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76636/18549b6f-6da2-4a66-8307-f71ce3e84bff_figure4.gif"/>
                </fig>
            </sec>
            <sec id="sec6">
                <title>AQUA UI</title>
                <p>The AQUA UI receives the user&#x2019;s queries, formulates them, and transfers them to the AQUA backend module. When the response from the AQUA backend is received, the AQUA UI interprets it and displays the content on the screen. Like the SPARC Portal web application, the AQUA UI is implemented using 
                    <ext-link ext-link-type="uri" xlink:href="https://vuejs.org/">VueJS</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="https://nuxtjs.org/">NuxtJS</ext-link>. Nuxt is a higher-level framework built on top of Vue.js for creating web applications.
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup> The AQUA UI displays a customised list of results in which the searched keywords are emphasised.</p>
            </sec>
            <sec id="sec7">
                <title>Operation</title>
                <p>To start the application, follow the steps in 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/Documentation/Documentation.md#hammer_and_wrench-installation">Installation</ext-link>.</p>
                <p>
                    <bold>How to use the features added by AQUA to the SPARC Portal search engine?</bold>
                </p>
                <p>The application works like other similar search engines with a user interface mimicking the SPARC Portal environment.</p>
                <p>
                    <list list-type="order">
                        <list-item>
                            <label>1.</label>
                            <p>
                                <bold>Predictive search typing:</bold>
                            </p>
                            <p>AQUA provides auto-completion for users&#x2019; queries as they type. This feature is powered by SciGraph and training data from the SPARC Knowledge Base. AQUA shows completions only after the user has typed at least three letters, because shorter inputs return too many results and slow down the application.</p>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>
                                <bold>Advanced search options:</bold>
                            </p>
                            <p>By expanding the &#x201c;Advanced search&#x201d; tab under the search box, users can select whether AQUA searches for an 
                                <bold>Exact match</bold> of their query or for 
                                <bold>Any of the words</bold>. The default is 
                                <bold>Any of the words</bold>.</p>
                        </list-item>
                        <list-item>
                            <label>3.</label>
                            <p>
                                <bold>Advanced sorting:</bold>
                            </p>
                            <p>The existing SPARC Portal allows sorting by dataset title (alphabetically) and by published date. AQUA adds a &#x201c;Relevance&#x201d; sorting criterion that orders results by how relevant they are to the user&#x2019;s search query. This is set as the default sorting option.</p>
                        </list-item>
                        <list-item>
                            <label>4.</label>
                            <p>
                                <bold>Advanced filtering:</bold>
                            </p>
                            <p>The existing SPARC Portal only allows for filtering based on &#x201c;Dataset status&#x201d;, which is either Published or Embargoed. AQUA adds more sophisticated filtering options. Users can filter datasets by one or several keywords, authors, and categories. Press &#x201c;Enter&#x201d; after typing each &#x201c;Keyword&#x201d;, &#x201c;Author&#x201d;, or &#x201c;Category&#x201d; in its respective box to register it. After the entries are registered, click &#x201c;Apply&#x201d; to filter the dataset results.</p>
                        </list-item>
                        <list-item>
                            <label>5.</label>
                            <p>
                                <bold>Email notifications for new matched datasets:</bold>
                            </p>
                            <p>Users can opt in to receive emails about new datasets that match their search query, which helps them stay up to date with SPARC datasets relevant to their search. To subscribe, click &#x201c;Create alerts&#x201d; under the search box and enter an email address. AQUA sends an email when new datasets matching the search query are published by SPARC. This is a one-time-only email subscription.</p>
                        </list-item>
                        <list-item>
                            <label>6.</label>
                            <p>
                                <bold>Emphasised matched text in the result display:</bold>
                            </p>
                            <p>When a dataset is returned, any matched text in the dataset title and description will be emphasised for easy and convenient lookup.</p>
                        </list-item>
                    </list>
                </p>
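                <p>Three of the behaviours in the list above (the three-letter threshold for auto-completion, the two matching modes, and the emphasis of matched text) can be illustrated with a short sketch. It is not the AQUA code: the function names are hypothetical, a plain prefix filter over an in-memory vocabulary stands in for SciGraph, matching is simplified to case-insensitive substring tests, and HTML <monospace>b</monospace> tags are assumed as the emphasis markup.</p>
                <p>
                    <code language="python"><![CDATA[
```python
import re
from typing import List

MIN_PREFIX_LENGTH = 3  # completions are offered only from three letters onwards


def autocomplete(prefix: str, vocabulary: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` vocabulary terms that start with `prefix`."""
    prefix = prefix.strip().lower()
    if len(prefix) < MIN_PREFIX_LENGTH:
        return []  # too short: skip the lookup entirely
    return [term for term in vocabulary if term.lower().startswith(prefix)][:limit]


def matches(query: str, text: str, mode: str = "any") -> bool:
    """mode="exact": the whole phrase must occur; mode="any": one word suffices."""
    text_lower = text.lower()
    if mode == "exact":
        return query.lower() in text_lower
    return any(word in text_lower for word in query.lower().split())


def emphasise(query: str, text: str) -> str:
    """Wrap each occurrence of a query word in <b>...</b>, ignoring case."""
    # Longest words first, so "brainstem" is preferred over "brain".
    words = sorted((re.escape(word) for word in query.split()), key=len, reverse=True)
    if not words:
        return text
    pattern = re.compile("|".join(words), flags=re.IGNORECASE)
    return pattern.sub(lambda match: f"<b>{match.group(0)}</b>", text)
```
]]></code>
                </p>
                <p>Here <monospace>mode="any"</monospace> mirrors the default &#x201c;Any of the words&#x201d; option, and <monospace>mode="exact"</monospace> mirrors &#x201c;Exact match&#x201d;.</p>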
            </sec>
        </sec>
        <sec id="sec8">
            <title>Use case</title>
            <p>We conducted experiments to compare the performance of the AQUA query refinement module when deploying either SciGraph or fast-autocomplete. We analysed the auto-completion of queries in terms of performance and execution time, and compared these two criteria in two scenarios: correct queries and queries with one typo. Our experiments revealed that fast-autocomplete returns more completions than SciGraph in both scenarios. In addition, fast-autocomplete returned results 24 times faster for correct queries and 11 times faster for queries with typos.</p>
            <p>We also tested the performance of the AQUA spelling correction module and compared the results with SPARC&#x2019;s Elasticsearch. To do this, we randomly selected 22 sets of queries from the SPARC dataset, each containing fifty keywords or phrases. The queries were then modified to include different types of typos (deletion, insertion, replacement). We calculated the Mean Average Precision (MAP) of AQUA and of SPARC&#x2019;s Elasticsearch in spelling correction. The results showed that, as the number of terms in a query increases, the performance of AQUA noticeably surpasses that of SPARC&#x2019;s Elasticsearch (
                <xref ref-type="table" rid="T1">Table 1</xref>). The same steps were applied to queries consisting of author names, using 9 test collections. 
                <xref ref-type="table" rid="T2">Table 2</xref> shows that AQUA performs better at correcting misspellings that appear in a two-term &#x201c;author&#x201d; query. The largest difference appears when the space between the terms of an &#x201c;author&#x201d; query is lost: AQUA&#x2019;s MAP is 0.93, whereas the MAP of SPARC&#x2019;s Elasticsearch is only 0.12.</p>
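            <p>For readers unfamiliar with the metric, the sketch below shows the standard definition of MAP together with simple helpers for the typo types named above. It is an illustration under stated assumptions, not the experiment code: the function names are hypothetical, each ranked list is assumed to contain no duplicates, and how relevant results were defined for each query in the experiments is not specified here.</p>
            <p>
                <code language="python"><![CDATA[
```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the average precision over (ranked results, relevant set) pairs."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


def delete_char(query: str, position: int) -> str:
    """Deletion typo: drop the character at `position`."""
    return query[:position] + query[position + 1:]


def insert_char(query: str, position: int, char: str) -> str:
    """Insertion typo: add `char` before `position`."""
    return query[:position] + char + query[position:]


def replace_char(query: str, position: int, char: str) -> str:
    """Replacement typo: overwrite the character at `position` with `char`."""
    return query[:position] + char + query[position + 1:]


def remove_spaces(query: str) -> str:
    """'No space' case: join the terms of a multi-term query."""
    return query.replace(" ", "")
```
]]></code>
            </p>
            <p>For example, <monospace>delete_char("brainstem", 4)</monospace> yields the misspelt query <italic toggle="yes">braistem</italic>.</p>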
            <table-wrap id="T1" orientation="portrait" position="float">
                <label>Table 1. </label>
                <caption>
                    <title>Mean Average Precision (MAP) of AQUA and SPARC&#x2019;s Elasticsearch (ES) over 22 test collections consisting of biological keywords as queries.</title>
                </caption>
                <table frame="hsides" rules="groups">
                    <thead>
                        <tr valign="bottom">
                            <th align="left" colspan="1" rowspan="2">Typo</th>
                            <th align="left" colspan="2" rowspan="1">1 term</th>
                            <th align="left" colspan="2" rowspan="1">2 terms</th>
                            <th align="left" colspan="2" rowspan="1">3 terms</th>
                        </tr>
                        <tr valign="bottom">
                            <th align="left" colspan="1" rowspan="1">AQUA</th>
                            <th align="left" colspan="1" rowspan="1">ES</th>
                            <th align="left" colspan="1" rowspan="1">AQUA</th>
                            <th align="left" colspan="1" rowspan="1">ES</th>
                            <th align="left" colspan="1" rowspan="1">AQUA</th>
                            <th align="left" colspan="1" rowspan="1">ES</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">0 typo</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.714785</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.711452</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.569673</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.569673</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.680431</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.677097</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">1 del</td>
                            <td align="left" colspan="1" rowspan="1">0.635935</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.677184</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.555371</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.505849</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.668609</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.653644</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">1 insert</td>
                            <td align="left" colspan="1" rowspan="1">0.704785</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.742356</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.56559</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.572663</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.680431</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.661312</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">1 replace</td>
                            <td align="left" colspan="1" rowspan="1">0.644126</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.772202</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.548968</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.568364</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.680431</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.646185</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">no space</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">0.568006</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.987667</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.667097</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.816667</bold>
                            </td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">no space 1 typo</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">0.559696</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.995918</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.670508</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.056122</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">no space 2 typo</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.484005</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.056667</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.644305</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.010204</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">no space 3 typo</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.446296</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.184211</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.589903</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.003472</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">3 typo</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.540761</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.481212</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.646919</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.621238</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <table-wrap id="T2" orientation="portrait" position="float">
                <label>Table 2. </label>
                <caption>
                    <title>Mean Average Precision (MAP) of AQUA and SPARC&#x2019;s Elasticsearch (ES) over 9 test collections consisting of authors as queries.</title>
                </caption>
                <table frame="hsides" rules="groups">
                    <thead>
                        <tr valign="bottom">
                            <th align="left" colspan="1" rowspan="2">Typo</th>
                            <th align="left" colspan="2" rowspan="1">1 term</th>
                            <th align="left" colspan="2" rowspan="1">2 terms</th>
                        </tr>
                        <tr valign="bottom">
                            <th align="left" colspan="1" rowspan="1">AQUA</th>
                            <th align="left" colspan="1" rowspan="1">ES</th>
                            <th align="left" colspan="1" rowspan="1">AQUA</th>
                            <th align="left" colspan="1" rowspan="1">ES</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">0 typo</td>
                            <td align="left" colspan="1" rowspan="1">0.863212</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.897673</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.926911</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.952778</bold>
                            </td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">1 del</td>
                            <td align="left" colspan="1" rowspan="1">0.613025</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.675974</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.818579</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.797889</td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">1 insert</td>
                            <td align="left" colspan="1" rowspan="1">0.843871</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.914193</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.926944</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.96</bold>
                            </td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">1 replace</td>
                            <td align="left" colspan="1" rowspan="1">0.822374</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.867786</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.913039</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.913265</bold>
                            </td>
                        </tr>
                        <tr valign="top">
                            <td align="left" colspan="1" rowspan="1">no space</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">NaN</td>
                            <td align="left" colspan="1" rowspan="1">
                                <bold>0.926911</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1">0.1245</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <p>The experiment results and description are available 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/Documentation/Documentation.md#mag_right-testing">here</ext-link>. The code for running the experiments and the data are also available at: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/SPARC-FAIR-Codeathon/aqua/tree/main/experiment">https://github.com/SPARC-FAIR-Codeathon/aqua/tree/main/experiment</ext-link>.</p>
        </sec>
        <sec id="sec9" sec-type="conclusions">
            <title>Conclusions and next steps</title>
            <p>This paper demonstrated how the SPARC Portal could be more FAIR by improving its search feature through AQUA. Since the first contact between researchers and a repository of datasets/models/resources is through the website&#x2019;s search engine, we enhanced the search system&#x2019;s functionality and the user interface. In AQUA, we deployed multiple tools and packages to make querying the data more precise, convenient, and effective.</p>
            <p>We propose to add a 
                <italic toggle="yes">view type</italic> to the existing SPARC Portal to enhance the users&#x2019; experience with the website. The SPARC Portal&#x2019;s existing view type is &#x201c;List&#x201d;; we propose adding a &#x201c;Gallery&#x201d; view option in the future. We also plan to add a new discovery feature to the SPARC Portal for finding resources by querying snapshots of simulations. This can be done by segmenting the simulation results into smaller time intervals or other chunks of data. Currently, the AQUA &#x201c;Notify me&#x201d; feature is a one-time-only email notification; options to be alerted more than once can be added in the future. AQUA can also enhance the SPARC search engine further by improving the user&#x2019;s next query. This will be done by developing a session-based search that draws on the user&#x2019;s search or clickthrough history on the Portal. The feature will create a personalised experience for users and thus enhance their overall experience with the SPARC Portal.</p>
        </sec>
        <sec id="sec10">
            <title>Software availability</title>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/LICENSE">https://github.com/SPARC-FAIR-Codeathon/aqua/blob/main/LICENSE</ext-link>
            </p>
            <p>Archived source code as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.5352470">https://doi.org/10.5281/zenodo.5352470</ext-link>.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup>
            </p>
            <p>License: MIT</p>
            <p>The AQUA application can be installed and run by cloning the 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/SPARC-FAIR-Codeathon/aqua">main GitHub repository</ext-link> and following the command line instructions. Instructions on how to clone a GitHub repository can be found 
                <ext-link ext-link-type="uri" xlink:href="https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository-from-github/cloning-a-repository">here</ext-link>.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to extend our special thanks to the NIH Common Fund&#x2019;s SPARC Program and to the organisers of the 2021 SPARC FAIR Codeathon for their support during the planning and development of this project.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="web">The SPARC Data and Resource Center: <year>2021</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://sparc.science/help/the-sparc-data-and-resource-center">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bhosale</surname>
                            <given-names>ST</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Patil</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Patil</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>SQLite: Light Database System.</article-title>
                    <source>

                        <italic toggle="yes">Int J Computer Sci Mobile Computing.</italic>
</source>
                    <year>April 2015</year>;<volume>4</volume>(<issue>4</issue>):<fpage>882</fpage>&#x2013;<lpage>885</lpage>.</mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ahmed</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Luca</surname>
                            <given-names>EW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>N&#x00fc;rnberger</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Revised n-gram based automatic spelling correction tool to improve retrieval effectiveness.</article-title>
                    <source>

                        <italic toggle="yes">Polibits.</italic>
</source>
                    <year>December 2009</year>;<volume>40</volume>(<issue>40</issue>):<fpage>39</fpage>&#x2013;<lpage>48</lpage>.</mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="web">
                    <collab>symspellpy api</collab>.
                    <ext-link ext-link-type="uri" xlink:href="https://symspellpy.readthedocs.io/en/latest/api/symspellpy.html#symspell">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="web">
                    <collab>Nuxt.js and Vue.js</collab>:
                    <article-title>Reasons why they differ and when do they combine.</article-title>
                    <year>2021</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://cubettech.com/resources/blog/nuxt-js-and-vue-js-reasons-why-they-differ-and-when-do-they-combine/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shahidi</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ngo</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>lrasmy</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Niloofar-Sh/aqua: First release of AQUA (v1.0.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2021</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.5352470</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report141807">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76636.r141807</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Martone</surname>
                        <given-names>Maryann E.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r141807a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8406-3871</uri>
                </contrib>
                <aff id="r141807a1">
                    <label>1</label>Department of Neurosciences, Center for Research in Biological Systems, University of California, San Diego, San Diego, California, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>I am one of the PIs of the SPARC Data and Resource Center. This work concerns the SPARC project.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>7</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Martone ME</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport141807" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73018.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors describe a query tool they developed for the SPARC Portal during a code-a-thon held in July 2021. The authors correctly identified several shortcomings of the portal search at that time, and created a service that would address them. However, there are a few issues that limit the utility of this article: 
                <list list-type="order">
                    <list-item>
                        <p>The main issue is that, as often happens, the SPARC search interface has evolved since that time, so many of the contentions are no longer true. SPARC has moved to a faceted search interface that goes well beyond the filters employed at that time and here, e.g., anatomical structure, technique, species, sex. The new search functionality uses an Algolia index, which could support many of the features that you have developed (although I don't think it is completely open, so that is a minus). So I think that, for this to be useful beyond describing the technology used, you would have to compare AQUA to the current SPARC interface/services rather than the state in 2021. While I know that it may be impractical to redo all the tests, etc., with the updated SPARC, I would like the authors to at least acknowledge the update and address these issues in the discussion.</p>
                    </list-item>
                    <list-item>
                        <p>The authors lay out several features, e.g., autocomplete, spell checking and e-mail notification. These clearly would be useful, but I don't think that the authors provided evidence in the form of user testing that they 
                            <italic>are</italic> useful, that is, that they give better search results for an average SPARC user. They refer readers to the results in GitHub, but I'd like to see some concrete examples and user feedback. I know there is a Docker image available, but that is not practical for those with domain expertise to test. What are the authors' plans to make a version of their interface available through SPARC, e.g., in the Tool and Resource section? I didn't see it listed there.</p>
                    </list-item>
                    <list-item>
                        <p>The authors don't discuss the generalizability of their approach. Would their code have use beyond the SPARC portal?</p>
                    </list-item>
                </list> Minor issues: 
                <list list-type="order">
                    <list-item>
                        <p>When the authors refer to "keywords", are they specifically referring to the metadata field marked "keywords"? What about the other standard metadata that SPARC acquires and tags in the JSON metadata file?</p>
                    </list-item>
                    <list-item>
                        <p>Adding a reference to the SPARC SDS specification would provide readers with a better idea of the metadata available: Bandrowski et al., 2021
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-141807-1">1</xref>
                            </sup>.</p>
                    </list-item>
                    <list-item>
                        <p>NIFS should be NIFSTD. A reference for the NIFSTD ontology is Bug et al., 2008
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-141807-2">2</xref>
                            </sup>.</p>
                    </list-item>
                    <list-item>
                        <p>The table and figure legends are not adequate. Terms are introduced, e.g., SciGraph in Fig. 1, that are not explained in either the legend or the text. Why are some values bolded in the tables? A reference for SciGraph is Surles-Zeigler et al., 2022
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-141807-3">3</xref>
                            </sup> (currently accepted for publication in Frontiers in Neuroinformatics).</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Neuroinformatics. I am also a PI in the SPARC Data and Resource Center, so I know the SPARC project very well.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-141807-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>SPARC Data Structure: Rationale and Design of a FAIR Standard for Biomedical Research Data</article-title>.
                        <source>
                            <italic>bioRxiv</italic>
                        </source>.<year>2021</year>;
                        <elocation-id>10.1101/2021.02.10.430563</elocation-id>
                        <pub-id pub-id-type="doi">10.1101/2021.02.10.430563</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-141807-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>The NIFSTD and BIRNLex vocabularies: building comprehensive ontologies for neuroscience.</article-title>
                        <source>
                            <italic>Neuroinformatics</italic>
                        </source>.<year>2008</year>;<volume>6</volume>(<issue>3</issue>):
                        <elocation-id>10.1007/s12021-008-9032-z</elocation-id>
                        <fpage>175</fpage>-<lpage>94</lpage>
                        <pub-id pub-id-type="pmid">18975148</pub-id>
                        <pub-id pub-id-type="doi">10.1007/s12021-008-9032-z</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-141807-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Extending and using anatomical vocabularies in the Stimulating Peripheral Activity to Relieve Conditions (SPARC) program</article-title>.
                        <source>
                            <italic>bioRxiv</italic>
                        </source>.<year>2021</year>;
                        <elocation-id>10.1101/2021.11.15.467961</elocation-id>
                        <pub-id pub-id-type="doi">10.1101/2021.11.15.467961</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report94519">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76636.r94519</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Rajagopal</surname>
                        <given-names>Vijay</given-names>
                    </name>
                    <xref ref-type="aff" rid="r94519a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5509-402X</uri>
                </contrib>
                <aff id="r94519a1">
                    <label>1</label>Department of Biomedical Engineering, University of Melbourne, Melbourne, VIC, Australia</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>1</day>
                <month>11</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Rajagopal V</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport94519" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73018.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors present a search and retrieve tool for the SPARC knowledge database. Overall, the contribution is important and in line with making research data FAIR. The article is also written well but is missing some key components that would make testing and adoption of this tool in SPARC easy to do. 
                <list list-type="bullet">
                    <list-item>
                        <p>The installation instructions assume that yarn is already installed, and I was unable to install it easily. Please include details of popular alternative installation methods (such as Docker) in the instruction manual.</p>
                    </list-item>
                    <list-item>
                        <p>Related to the above, what are the minimum required libraries that one needs to build and install in order to run the software and reproduce the results?</p>
                    </list-item>
                    <list-item>
                        <p>Tables 1 and 2 need to be more informative. What is the formula for mean average precision? Ideally, the authors should provide other metrics as well, perhaps even the distribution of precision for the test collections.</p>
                    </list-item>
                    <list-item>
                        <p>The authors suggest that AQUA surpasses Elasticsearch as the number of queries increases in Table 1. Looking at the column "3 terms", however, I see that ES has a MAP of ~0.8 vs AQUA's ~0.6 when there is no space. Therefore, the claim that AQUA is superior to Elasticsearch is not clear to me.</p>
                    </list-item>
                    <list-item>
                        <p>In Table 1, what do the NaNs mean in the column "1 term"? I can see that "no space" does not apply in this case. If this is a typo, it should be resolved. In these cases, is a NaN an error produced by both ES (Elasticsearch) and AQUA?</p>
                    </list-item>
                    <list-item>
                        <p>In Table 2, the "2 term" column shows that the MAP is not really that different between ES and AQUA, except for "no space". The table does not reflect significant improvements from using AQUA. I suggest including more comparison metrics to make the case for the performance of AQUA.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>No</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>computational physiology, mechanobiology, systems biology, image analysis, bioengineering, heart, breast</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
</article>
