<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.73825.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Opinion Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Recommendations for connecting molecular sequence and biodiversity research infrastructures through ELIXIR</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Waterhouse</surname>
                        <given-names>Robert M.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Adam-Blondon</surname>
                        <given-names>Anne-Fran&#x00e7;oise</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Agosti</surname>
                        <given-names>Donat</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Baldrian</surname>
                        <given-names>Petr</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Balech</surname>
                        <given-names>Bachir</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4419-0729</uri>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Corre</surname>
                        <given-names>Erwan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6354-2278</uri>
                    <xref ref-type="aff" rid="a6">6</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Davey</surname>
                        <given-names>Robert P.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5589-7754</uri>
                    <xref ref-type="aff" rid="a7">7</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Lantz</surname>
                        <given-names>Henrik</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2419-0075</uri>
                    <xref ref-type="aff" rid="a8">8</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Pesole</surname>
                        <given-names>Graziano</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3663-0859</uri>
                    <xref ref-type="aff" rid="a5">5</xref>
                    <xref ref-type="aff" rid="a9">9</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Quast</surname>
                        <given-names>Christian</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a10">10</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Gl&#x00f6;ckner</surname>
                        <given-names>Frank Oliver</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8528-9023</uri>
                    <xref ref-type="aff" rid="a11">11</xref>
                    <xref ref-type="aff" rid="a12">12</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Raes</surname>
                        <given-names>Niels</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4329-4892</uri>
                    <xref ref-type="aff" rid="a13">13</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sandionigi</surname>
                        <given-names>Anna</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5257-0027</uri>
                    <xref ref-type="aff" rid="a14">14</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Santamaria</surname>
                        <given-names>Monica</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5257-0027</uri>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Addink</surname>
                        <given-names>Wouter</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a15">15</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Vohradsky</surname>
                        <given-names>Jiri</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a16">16</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Nunes-Jorge</surname>
                        <given-names>Amandine</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0007-0653</uri>
                    <xref ref-type="aff" rid="a10">10</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Willassen</surname>
                        <given-names>Nils Peder</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a17">17</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Lanfear</surname>
                        <given-names>Jerry</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8007-5568</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a18">18</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Ecology and Evolution and Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Vaud, 1015, Switzerland</aff>
                <aff id="a2">
                    <label>2</label>Universit&#x00e9; Paris Saclay, Versailles, 78026, France</aff>
                <aff id="a3">
                    <label>3</label>Plazi, Bern, 3007, Switzerland</aff>
                <aff id="a4">
                    <label>4</label>Institute of Microbiology of the Czech Academy of Sciences, Praha, 142 20, Czech Republic</aff>
                <aff id="a5">
                    <label>5</label>Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, 70126, Italy</aff>
                <aff id="a6">
                    <label>6</label>CNRS/Sorbonne Universit&#x00e9;, Station Biologique de Roscoff, Roscoff, 29680, France</aff>
                <aff id="a7">
                    <label>7</label>Earlham Institute, Norwich, NR4 7UZ, UK</aff>
                <aff id="a8">
                    <label>8</label>Department of Medical Biochemistry and Microbiology/NBIS, Uppsala University, Uppsala, Sweden</aff>
                <aff id="a9">
                    <label>9</label>Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari &#x201c;A. Moro&#x201d;, Bari, 70126, Italy</aff>
                <aff id="a10">
                    <label>10</label>Life Sciences &amp; Chemistry, Jacobs University Bremen gGmbH, Bremen, Germany</aff>
                <aff id="a11">
                    <label>11</label>MARUM - Center for Marine Environmental Sciences, University of Bremen, Bremerhaven, 27570, Germany</aff>
                <aff id="a12">
                    <label>12</label>Alfred Wegener Institute, Helmholtz Center for Polar- and Marine Research, Bremerhaven, 27570, Germany</aff>
                <aff id="a13">
                    <label>13</label>NLBIF - Netherlands Biodiversity Information Facility, Naturalis Biodiversity Center, Leiden, 2300 RA, The Netherlands</aff>
                <aff id="a14">
                    <label>14</label>University of Milan Bicocca, Milan, 20127, Italy</aff>
                <aff id="a15">
                    <label>15</label>DiSSCo - Distributed System of Scientific Collections, Naturalis Biodiversity Center, Leiden, 2300 RA, The Netherlands</aff>
                <aff id="a16">
                    <label>16</label>Laboratory of Bioinformatics, Institute of Microbiology, Prague, 142 20, Czech Republic</aff>
                <aff id="a17">
                    <label>17</label>Dept. of Chemistry, UiT The Arctic University of Norway, Troms&#x00f8;, Norway</aff>
                <aff id="a18">
                    <label>18</label>ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:jerry.lanfear@elixir-europe.org">jerry.lanfear@elixir-europe.org</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>3</day>
                <month>12</month>
                <year>2021</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>ELIXIR-1238</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>12</day>
                    <month>10</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Waterhouse RM et al.</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-1238/pdf"/>
            <abstract>
                <p>Threats to global biodiversity are increasingly recognised by scientists and the public as a critical challenge. Molecular sequencing technologies offer means to catalogue, explore, and monitor the richness and biogeography of life on Earth. However, exploiting their full potential requires tools that connect biodiversity infrastructures and resources. As a research infrastructure developing services and technical solutions that help integrate and coordinate life science resources across Europe, ELIXIR is a key player. To identify opportunities, highlight priorities, and aid strategic thinking, here we survey approaches by which molecular technologies help inform understanding of biodiversity. We detail example use cases to highlight how DNA sequencing is: resolving taxonomic issues; increasing knowledge of marine biodiversity; helping understand how agriculture and biodiversity are critically linked; and playing an essential role in ecological studies. Together with examples of national biodiversity programmes, the use cases show where progress is being made but also highlight common challenges and opportunities for future enhancement of underlying technologies and services that connect molecular and wider biodiversity domains. Based on emerging themes, we propose key recommendations to guide future funding for biodiversity research: biodiversity and bioinformatic infrastructures need to collaborate closely and strategically; taxonomic efforts need to be aligned and harmonised across domains; metadata needs to be standardised and common data management approaches widely adopted; current approaches need to be scaled up dramatically to address the anticipated explosion of molecular data; bioinformatics support for biodiversity research needs to be enabled and sustained; training for end users of biodiversity research infrastructures needs to be prioritised; and community initiatives need to be proactive and focused on enabling solutions. 
For sequencing data to deliver their full potential they must be connected to knowledge: together, molecular sequence data collection initiatives and biodiversity research infrastructures can advance global efforts to prevent further decline of Earth&#x2019;s biodiversity.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Bioinformatics</kwd>
                <kwd>Genomics</kwd>
                <kwd>Sequencing</kwd>
                <kwd>Data Management</kwd>
                <kwd>Data Standards</kwd>
                <kwd>Genetic Resources</kwd>
                <kwd>Taxonomy</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Ministry of Education, Youth and Sports of the Czech Republic</funding-source>
                    <award-id>LM2015047</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/501100005416">
                    <funding-source>Norges Forskningsr&#x00e5;d</funding-source>
                    <award-id>270068</award-id>
                </award-group>
                <award-group id="fund-3" xlink:href="http://dx.doi.org/10.13039/501100000268">
                    <funding-source>Biotechnology and Biological Sciences Research Council</funding-source>
                    <award-id>BB/CSP1720/1</award-id>
                    <award-id>BBS/E/T/000PR9817</award-id>
                    <award-id>BB/P016855/1</award-id>
                    <award-id>BBS/E/T/000PR9783</award-id>
                    <award-id>BB/CCG1720/1</award-id>
                    <award-id>BBS/E/T/000PR9814</award-id>
                </award-group>
                <award-group id="fund-4" xlink:href="http://dx.doi.org/10.13039/501100007601">
                    <funding-source>Horizon 2020</funding-source>
                    <award-id>817580</award-id>
                </award-group>
                <award-group id="fund-5" xlink:href="http://dx.doi.org/10.13039/501100001824">
                    <funding-source>Grantov&#x00e1; Agentura &#x010c;esk&#x00e9; Republiky</funding-source>
                    <award-id>21-17749S</award-id>
                </award-group>
                <award-group id="fund-6" xlink:href="http://dx.doi.org/10.13039/100012088">
                    <funding-source>Arcadia Fund</funding-source>
                </award-group>
                <award-group id="fund-7">
                    <funding-source>Schweizerischer Nationalfonds zur F&#x00f6;rderung der Wissenschaftlichen Forschung</funding-source>
                    <award-id>PP00P3_170664</award-id>
                    <award-id>PP00P3_202669</award-id>
                </award-group>
                <funding-statement>DA was supported by Arcadia &#x2013; a charitable fund of Lisbet Rausing and Peter Baldwin. RMW was supported by Swiss National Science Foundation PP00P3_170664. RPD was supported by the Biotechnology and Biological Sciences Research Council (BBSRC), part of UK Research and Innovation (UKRI), through the Core Strategic Programme Grant BB/CSP1720/1, BBS/E/T/000PR9817, Designing Future Wheat grant BB/P016855/1, BBS/E/T/000PR9783 and Core Capability Grant BB/CCG1720/1, BBS/E/T/000PR9814 at the Earlham Institute. PB was supported by the Czech Science Foundation (21-17749S) and by the Ministry of Education, Youth and Sports of the Czech Republic (LM2015047). NPW was supported by the Research Council of Norway (270068). A-F A-B was supported by the GenRes Bridge project that received funding from the European Union&#x2019;s Horizon 2020 research and innovation programme under grant agreement No 817580.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <sec id="sec2">
                <title>Sequence data collection initiatives offer opportunities to connect with and feed into biodiversity research infrastructures</title>
                <p>Biological diversity represents the full spectrum of the variety of organisms on Earth, at genetic, species, and ecosystem levels, created over millions of years of evolution. Biodiversity is also essential for life itself, for the sustainability of varied communities of interdependent and interacting species at all scales. Anthropocentrically, biodiversity forms the foundation of ecosystem services that are indispensable for human well-being and a healthy planet. Whilst biodiversity is naturally constantly changing, increasingly unsustainable pressures resulting from human activities mean that this variety is currently being lost at an unprecedented rate. Recognising the threat to humanity that this decline poses, governments and international organisations have responded with strategies to protect and restore biodiversity, such as the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (
                    <xref ref-type="bibr" rid="ref56">IPBES 2021</xref>). These and other initiatives also recognise the important roles that genetic and genomic data can play in biodiversity assessment, monitoring, conservation, and restoration, to ensure the long-term health of ecosystem services (
                    <xref ref-type="bibr" rid="ref51">Hoban et al. 2020</xref>). This requires infrastructures that make it easier for scientists to exchange knowledge and agree on best practices, as well as to find, share, and connect increasingly large and diverse datasets. As an intergovernmental organisation that develops services and technical solutions to integrate and coordinate life science resources from across Europe, ELIXIR recognises that connecting molecular sequence data with biodiversity research infrastructures will be critical to support global efforts to prevent further declines of biodiversity.</p>
                <p>
                    <italic toggle="yes">Biodiversity sequencing and research infrastructure initiatives</italic>
                </p>
                <p>One of the aims of biodiversity research infrastructures is to compile and maintain comprehensive lists of all known species of organisms including their spatio-temporal distributions on Earth, normally within a taxonomic framework and usually with additional associated metadata. Prominent examples that bring together information from multiple sources include the Catalogue of Life (CoL) (
                    <xref ref-type="bibr" rid="ref93">Roskov et al. 2020</xref>), the Global Biodiversity Information Facility (
                    <xref ref-type="bibr" rid="ref43">GBIF 2021</xref>), the Environmental Research Infrastructures Community (
                    <xref ref-type="bibr" rid="ref35">ENVRI 2021</xref>), the Ocean Biodiversity Information System (
                    <xref ref-type="bibr" rid="ref84">OBIS 2021</xref>), the Encyclopedia of Life (
                    <xref ref-type="bibr" rid="ref85">Parr et al. 2014</xref>), and the Distributed System of Scientific Collections (
                    <xref ref-type="bibr" rid="ref27">DiSSCo 2021</xref>). For example, GBIF aims to map diversity in space and time based on natural science collection records, sequence data, biodiversity surveys, human and machine observations, and species lists. The taxonomic frameworks are built from sources of published records such as the Biodiversity Heritage Library and the Biodiversity Literature Repository (BLR), with ongoing efforts to standardise data and make them machine readable and citable (
                    <xref ref-type="bibr" rid="ref1">Agosti &amp; Egloff 2009</xref>; 
                    <xref ref-type="bibr" rid="ref86">Penev et al. 2012</xref>; 
                    <xref ref-type="bibr" rid="ref12">B&#x00e9;nichou et al. 2019</xref>). Biodiversity research infrastructures also encompass biobanks (genebanks or seed banks) for conserving genetic resources of major crops and their wild relatives, e.g. collated by Genesys (
                    <xref ref-type="bibr" rid="ref44">Genesys 2021</xref>), of livestock breeds managed by the Domestic Animal Diversity Information System (
                    <xref ref-type="bibr" rid="ref21">DAD-IS 2021</xref>), or of microbes in the context of food and agriculture or health e.g. managed by the Microbial Resource Research Infrastructure (
                    <xref ref-type="bibr" rid="ref75">MIRRI 2021</xref>).</p>
                <p>Molecular data collection initiatives are equally varied, aiming to collate growing amounts of DNA and RNA sequence data, often also employing a taxonomic framework and collecting sample metadata. Notable examples include the principally archival International Nucleotide Sequence Database Collaboration (INSDC) (
                    <xref ref-type="bibr" rid="ref8">Arita et al. 2021</xref>) comprising the DNA DataBank of Japan (DDBJ), the European Nucleotide Archive (ENA), and the United States National Center for Biotechnology Information (NCBI) GenBank, as well as the China National GeneBank DataBase (
                    <xref ref-type="bibr" rid="ref118">Wang et al. 2019</xref>). More specialised initiatives focus on e.g. ribosomal RNA collections (
                    <xref ref-type="bibr" rid="ref48">Gl&#x00f6;ckner et al. 2017</xref>; 
                    <xref ref-type="bibr" rid="ref95">Santamaria et al. 2018</xref>; 
                    <xref ref-type="bibr" rid="ref82">Nilsson et al. 2019</xref>), microbiome resources (
                    <xref ref-type="bibr" rid="ref76">Mitchell et al. 2020</xref>), or metagenomics sequence data (
                    <xref ref-type="bibr" rid="ref73">Meyer et al. 2019</xref>).</p>
                <p>These examples help to formulate more formal definitions: (i) molecular sequence data collection initiatives are producing and collating reference catalogues of genetic and genomic biodiversity on Earth; and (ii) biodiversity research infrastructures are capturing knowledge from scientific collections, observations, and the literature, and building resources of biodiversity information for all Earth&#x2019;s organisms. Here we identify opportunities to connect these biodiversity sequence collection initiatives and research infrastructures in a standardised and scalable manner that will greatly enhance the utility of both by facilitating data-to-knowledge research.</p>
                <p>
                    <italic toggle="yes">Expanding collections of molecular sequence data</italic>
                </p>
                <p>New technologies and falling sequencing costs are greatly improving the diversity of species sampling through the acquisition of increasing amounts of molecular data. This has led to a growing number of large-scale sequencing data generation initiatives with increasingly ambitious sampling aims covering eukaryotes, prokaryotes, and viruses (
                    <xref ref-type="table" rid="T1">Table 1</xref>). For example, the Earth BioGenome Project (
                    <xref ref-type="bibr" rid="ref31">EBP 2021</xref>) aims to coordinate the sequencing and characterisation of the genomes of all eukaryotic life, with a vision of creating a new foundation for biology that will deliver solutions for understanding ecosystems, protecting biodiversity, and benefiting human welfare (
                    <xref ref-type="bibr" rid="ref67">Lewin et al. 2018</xref>). This involves developing and agreeing on standards for all protocols from specimen collection and identification through to sequencing, assembly, annotation, and analysis. Initiatives are typically geographically or taxonomically focused, such as the Darwin Tree of Life (
                    <xref ref-type="bibr" rid="ref29">DToL 2021</xref>) project in Britain and Ireland, the European Reference Genome Atlas initiative (
                    <xref ref-type="bibr" rid="ref36">ERGA 2021</xref>), the Vertebrate Genomes Project (VGP) (
                    <xref ref-type="bibr" rid="ref126">Rhie et al. 2020</xref>), the i5k Arthropod Genomes Initiative (
                    <xref ref-type="bibr" rid="ref54">i5K Consortium 2013</xref>), the 10KP Plant Genomes Project (
                    <xref ref-type="bibr" rid="ref19">Cheng et al. 2018</xref>), and others (
                    <xref ref-type="table" rid="T1">Table 1</xref>).</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Examples of major molecular sequence data generation and coordination initiatives.</title>
                        <p>A non-exhaustive list of active international projects and umbrella initiatives covering many species and producing (meta)genomes, (meta)transcriptomes, and/or DNA barcodes, with public data deposition.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Initiative name/acronym</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Main focus of the initiative</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">URL/website for further information</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1KITE 1&#x2019;000 Insect Transcriptome Evolution</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Transcriptomes, insects</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.1kite.org/">https://www.1kite.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1KP 1&#x2019;000 Plants</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Transcriptomes, plants</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://sites.google.com/a/ualberta.ca/onekp/">https://sites.google.com/a/ualberta.ca/onekp/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">10KP 10&#x2019;000 Plants</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, plants</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://db.cngb.org/10kp/">https://db.cngb.org/10kp/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">ACE Antarctic Circumnavigation Expedition</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">(meta)genomes, (meta)transcriptomes, marine microbes</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://spi-ace-expedition.ch/">https://spi-ace-expedition.ch/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Bat1K 1&#x2019;000 Bat Genomes</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, all bats</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://bat1k.ucd.ie/about/">https://bat1k.ucd.ie/about/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Bird10K 10&#x2019;000 Bird Genomes</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, all birds</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://b10k.genomics.cn/">https://b10k.genomics.cn/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">DToL Darwin Tree of Life</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, Britain and Ireland eukaryotes</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.darwintreeoflife.org/">https://www.darwintreeoflife.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">EBP Earth BioGenome Project</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, all eukaryotes, umbrella for many initiatives worldwide</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.earthbiogenome.org/">https://www.earthbiogenome.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">ERGA European Reference Genome Atlas</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, all eukaryotes in Europe</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.erga-biodiversity.eu/">https://www.erga-biodiversity.eu/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">G10K 10&#x2019;000 Genomes</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, umbrella for Bat1K, Bird10K, VGP, etc.</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://genome10k.soe.ucsc.edu/about/">https://genome10k.soe.ucsc.edu/about/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">GAGA Global Ant Genomics Alliance</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, ants</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="http://antgenomics.dk/">http://antgenomics.dk/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">GEBA Genomic Encyclopedia of Bacteria and Archaea</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, bacteria and archaea</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://phylogenomics.me/major-current-projects/geba/">https://phylogenomics.me/major-current-projects/geba/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">GIGA Global Invertebrate Genomics Alliance</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, transcriptomes, non-insect non-nematode invertebrates</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="http://giga-cos.org/">http://giga-cos.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">GlobalFungi</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Fungi, ITS sequences</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://globalfungi.com/">https://globalfungi.com/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Global Virome Project</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">(meta)genomes, viruses</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.globalviromeproject.org/">http://www.globalviromeproject.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">i5k 5&#x2019;000 Arthropod Genomes Initiative</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, arthropods</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="http://i5k.github.io/">http://i5k.github.io/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">iBOL International Barcode of Life &amp; BIOSCAN</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">DNA barcodes, plants, animals, fungi</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://ibol.org/">https://ibol.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Kew Tree of Life Project</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Flowering plants, target sequence capture</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://treeoflife.kew.org/">https://treeoflife.kew.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">MOSAiC Arctic Ocean Expedition</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">(meta)genomes, (meta)transcriptomes, marine microbes</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://mosaic-expedition.org/">https://mosaic-expedition.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Tara Oceans</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">(meta)genomes, (meta)transcriptomes, plankton</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://oceans.taraexpeditions.org/">https://oceans.taraexpeditions.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">The Earth Microbiome Project</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Microbial communities</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://earthmicrobiome.org/">https://earthmicrobiome.org/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">UNITE</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Fungi, ITS sequences</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://unite.ut.ee/">https://unite.ut.ee/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">VGP Vertebrate Genomes Project</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Genomes, 70&#x2019;000 vertebrates</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <ext-link ext-link-type="uri" xlink:href="https://vertebrategenomesproject.org/">https://vertebrategenomesproject.org/</ext-link>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Microbe-focused sequencing initiatives benefit from much smaller genomes, but this is countered by orders of magnitude greater species diversity, most of which remains uncatalogued. Pioneering efforts such as the Genomic Encyclopedia of Bacteria and Archaea (GEBA) aim to systematically fill gaps in the phylogeny and to sequence type strains (
                    <xref ref-type="bibr" rid="ref120">Whitman et al. 2015</xref>; 
                    <xref ref-type="bibr" rid="ref80">Mukherjee et al. 2017</xref>). Others apply metagenomics approaches and are driven more by ecosystem ecology than phylogeny, including the Earth Microbiome Project (EMP) (
                    <xref ref-type="bibr" rid="ref46">Gilbert et al. 2014</xref>), Tara Oceans (
                    <xref ref-type="bibr" rid="ref104">Sunagawa et al. 2020</xref>) and other marine surveying projects. Many are driven by the impacts of microbes on human health, e.g. the Global Microbial Identifier (GMI) consortium (
                    <xref ref-type="bibr" rid="ref2">Aarestrup et al. 2012</xref>) collates genomic information on microorganisms linked to epidemiological data for bacteria, viruses, parasites, and fungi, and the Human Microbiome Project focuses on host-microbiome interactions (
                    <xref ref-type="bibr" rid="ref55">iHMP Research Network Consortium 2019</xref>). Similarly, the Global Virome Project aims to improve understanding of the diversity and ecology of viral threats (
                    <xref ref-type="bibr" rid="ref17">Carroll et al. 2018</xref>).</p>
                <p>In addition to reference genomes, collections of sequence data are growing through DNA barcoding initiatives that define standardised molecular marker(s) for species identification, e.g. cytochrome c oxidase I (COX1) for metazoans, internal transcribed spacer (ITS) for fungi, 16S rRNA for bacteria, and the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL), maturase K (matK), and ITS for plants. The main reference libraries include the Barcode of Life Data (BOLD) System (
                    <xref ref-type="bibr" rid="ref91">Ratnasingham &amp; Hebert 2007</xref>) and the International Barcode of Life (iBOL) (
                    <xref ref-type="bibr" rid="ref4">Adamowicz et al. 2017</xref>). Ongoing barcoding efforts, such as the iBOL consortium&#x2019;s BIOSCAN programme (
                    <xref ref-type="bibr" rid="ref52">Hobern 2020</xref>), continue to expand molecular sequence data collection to speed up species discovery, explore species interactions, and track their dynamics. Together, these sequence data generation initiatives aim to produce molecular catalogues, with associated metadata, of the entirety of Earth&#x2019;s biodiversity.</p>
                <p>
                    <italic toggle="yes">Metadata standards requirements for use in biodiversity research</italic>
                </p>
                <p>Many sequencing initiatives have produced, and will continue to produce, molecular data in the form of reference-quality genomes, complete transcriptomes, and lineage-tailored DNA barcode libraries. In terms of tangible outcomes for biodiversity knowledge, these data represent a rapidly growing and increasingly comprehensive molecular &#x2018;lookup table&#x2019; for species identification. To ensure accuracy, species must be correctly identified and recorded during sample collection and referenced to a taxonomic backbone (e.g. NCBI or GBIF), followed by management of reference or voucher information and publication with the respective voucher and taxon identifiers. To this end, sample vouchering experience from museums such as the Smithsonian has been vital in driving standards development through collaborative initiatives such as the Global Genome Biodiversity Network (GGBN) (
                    <xref ref-type="bibr" rid="ref127">Droege et al. 2016</xref>). These efforts helped to extend data models for classical specimens, e.g. Darwin Core (
                    <xref ref-type="bibr" rid="ref121">Wieczorek et al. 2012</xref>) and Access to Biological Collections Data (ABCD) (
                    <xref ref-type="bibr" rid="ref128">Holetschek et al. 2012</xref>), in order to build a new data model for molecular sequence data. One of the key roles of initiatives like the Earth BioGenome Project and others (
                    <xref ref-type="table" rid="T1">Table 1</xref>) is to coordinate the development of protocols and standards for sample collection and metadata capture in line with such data models, building on established reporting standards that aim to make genomic data discoverable, e.g. those developed by the Genomic Standards Consortium (
                    <xref ref-type="bibr" rid="ref41">Field et al. 2014</xref>).</p>
                <p>In the context of infraspecific diversity conserved in plant, forest, and animal genetic resources, several projects are developing common recommendations and metadata standards to improve the conservation and sustainable use of these resources, e.g. GenRes Bridge (
                    <xref ref-type="bibr" rid="ref45">GenResBridge 2021</xref>), DivSeek (
                    <xref ref-type="bibr" rid="ref28">DivSeek 2021</xref>), and FAANG (
                    <xref ref-type="bibr" rid="ref38">FAANG 2021</xref>). Metagenomics projects also recognise the importance of developing data standards for describing essential steps, including sampling, sequencing, data analysis, archiving, and dissemination (
                    <xref ref-type="bibr" rid="ref53">ten Hoopen et al. 2017</xref>). Across the board, tools that make metadata management easier, such as the COPO platform for brokering collaborative open omics data (
                    <xref ref-type="bibr" rid="ref99">Shaw et al. 2020</xref>), are helping to ensure that data are increasingly Findable, Accessible, Interoperable, and Reusable (FAIR) (
                    <xref ref-type="bibr" rid="ref122">Wilkinson et al. 2016</xref>). These examples highlight the challenges involved as well as the importance of developing and applying community standards to comprehensively describe the sources of molecular sequence data collections. Good metadata management is critical to enable biorepositories to collect and preserve Earth&#x2019;s genetic and genomic biodiversity in molecular sequence collections, while making it both available to and usable by researchers worldwide.</p>
                <p>
                    <italic toggle="yes">Benefits of connecting sequencing data to biodiversity research infrastructures</italic>
                </p>
                <p>Data management frameworks aim to connect data generation initiatives to biodiversity research infrastructures in order to accelerate and expand the capabilities of existing species quantification and monitoring efforts. To achieve a unified global record of species populations in space and time, two principal Essential Biodiversity Variables (EBVs), species abundance and distribution, are required (
                    <xref ref-type="bibr" rid="ref59">Jetz et al. 2019</xref>). To detect critical and potentially long-lasting biodiversity change, additional EBVs need to be prioritised, such as allelic diversity, survival rates, ecosystem heterogeneity, phenology, range dynamics, size at first reproduction, and body mass index (
                    <xref ref-type="bibr" rid="ref97">Schmeller et al. 2018</xref>; 
                    <xref ref-type="bibr" rid="ref64">Kissling et al. 2018</xref>). Taxonomically annotated molecular catalogues of Earth&#x2019;s biodiversity provide the means to scale up data collection of species and community EBVs that can be extrapolated from sequencing georeferenced samples. DNA barcoding has proven to be a cost-effective way of providing a model for integrating genomic data resources and biodiversity catalogues. For example, connecting GBIF with the UNITE database, a fungi-focused DNA barcoding resource (
                    <xref ref-type="bibr" rid="ref82">Nilsson et al. 2019</xref>), enables spatial and temporal surveying even for &#x2018;dark taxa&#x2019; without any physical specimens or resolved taxonomic names (
                    <xref ref-type="bibr" rid="ref94">Ryberg &amp; Nilsson 2018</xref>). Another example is the DNA barcode reference library of Canadian invertebrate fauna, which is supported by voucher specimens, digital images, and DNA extracts, with sequences deposited at GenBank and BOLD, and specimen data contributed to GBIF as Darwin Core records (
                    <xref ref-type="bibr" rid="ref26">deWaard et al. 2019</xref>).</p>
                <p>Beyond barcodes, employing the MGnify resource (
                    <xref ref-type="bibr" rid="ref76">Mitchell et al. 2020</xref>) to perform taxonomic assignments of microbiome sequencing data, the European Molecular Biology Laboratory&#x2019;s European Bioinformatics Institute (EMBL-EBI) and GBIF teamed up to facilitate the generation of sequence-based occurrence records from georeferenced European Nucleotide Archive (ENA) samples as standardised Darwin Core sampling-event datasets (
                    <xref ref-type="bibr" rid="ref96">Schigel et al. 2019</xref>). Facilitating these processes is important to ensure that DNA-derived data are made discoverable through biodiversity platforms and thus increase the value of sequences with associated coordinates and timestamps (
                    <xref ref-type="bibr" rid="ref7">Andersson et al. 2020</xref>). These examples show that making such connections can (i) extend traditional sampling methods of observing, capturing, or extracting, to massively scaled-up sampling using metagenomics or environmental DNA (eDNA) techniques; and (ii) transform traditional expert identification approaches into rapid molecular species identification using progressively more comprehensive reference sequence databases. To realise these benefits, future work will increasingly need to combine new sequencing technologies and bioinformatics data models for molecular sequence data management with field ecology, in order to match metagenomics or eDNA data to reference genomic species libraries.</p>
                <p>
                    <italic toggle="yes">Mutually beneficial outcomes for sequence collections and biodiversity infrastructures</italic>
                </p>
                <p>For biodiversity research to exploit the full value of data from molecular sequence collection initiatives, robust and reproducible approaches to data integration are required. Ongoing efforts to coordinate traditional biodiversity infrastructures exemplify how developing common standards and practices enhances interoperability and value. For example, the DiSSCo research infrastructure works towards the digital unification of all European natural science collections (
                    <xref ref-type="bibr" rid="ref27">DiSSCo 2021</xref>), and the Consortium of European Taxonomic Facilities (
                    <xref ref-type="bibr" rid="ref18">CETAF 2021</xref>) brings together collections from museums, botanic gardens, and others, with a research focus on taxonomy and systematic biology. Such digitalisation and standardisation greatly facilitate the task of connecting sequence collections and biodiversity research infrastructures, exemplified by recent GBIF-UNITE and GBIF-EBI collaborations (
                    <xref ref-type="bibr" rid="ref7">Andersson et al. 2020</xref>).</p>
                <p>As well as accelerating and expanding the capabilities of existing biodiversity quantification and monitoring efforts, molecular data can support biodiversity research more widely, for example by helping to extend, refine, and update catalogues of known species. This applies particularly to microbes and fungi, but also to other groups such as insects, where possibly 80% of species remain to be discovered (
                    <xref ref-type="bibr" rid="ref102">Stork 2018</xref>), the so-called &#x2018;dark&#x2019; biodiversity. Reciprocally, traditional biodiversity data and resources can help inform detailed annotations of sequence collections, linking data to knowledge about species biology and ecosystem compositions. One way this corpus of data, drawn from an estimated 500 million scholarly publications including all known species and their taxonomy, can be made FAIR-compliant is through the BLR (
                    <xref ref-type="bibr" rid="ref5">Agosti et al. 2019</xref>) and its reuse by GBIF. Thus, by making these connections, decades of accumulated learning can be transformed into new and refined knowledge supported by molecular data, greatly advancing data-to-knowledge research.</p>
                <p>Here we outline current technical capabilities with respect to the tools and other resources that support the molecular components of biodiversity informatics, and present four use case examples focused on (i) sequence-informed taxonomies; (ii) ocean metagenomics; (iii) agricultural food security genetics; and (iv) global fungal diversity. These illustrate current efforts and resources to link sequence collections with biodiversity infrastructures. They inform strategies for developing national biodiversity programmes, while also highlighting key gaps that need to be addressed. Together with other examples, they help to formulate recommendations for closer integrations through ELIXIR and other infrastructures that will shape the future of biodiversity research.</p>
            </sec>
            <sec id="sec3">
                <title>ELIXIR as an infrastructure to support integration of molecular and other biodiversity-related data</title>
                <p>ELIXIR is an intergovernmental European organisation that brings together life science resources including databases, software tools, training materials, cloud storage and supercomputers, to connect and unite infrastructures vital for scientific research (
                    <xref ref-type="bibr" rid="ref33">ELIXIR 2021a</xref>). It coordinates, integrates, and sustains bioinformatics resources across its member states, enabling users in academia and industry to access services that support scientists in exchanging expertise and developing best practices, as well as in finding and sharing the accumulating volumes of data being generated by publicly funded research. ELIXIR services (i.e. resources for users), platforms (i.e. technical domains for implementation), and communities (i.e. use cases) aim to develop and provide solutions to manage life sciences data of increasing quantity and complexity, with robust bioinformatics infrastructures and the best tools and training to drive innovation. These principles also apply to the growing field of biodiversity informatics, and it is thus timely to begin to identify the key life sciences resources, from both within the established ELIXIR infrastructures and beyond, that are required to effectively support biodiversity research. This includes the acquisition, analysis, and archival of molecular sequence data, and their integration with other biodiversity-related data and resources.</p>
                <p>As this is a rapidly moving field, rather than listing these resources in a static table herein, we provide a contextualised list on the ELIXIR services website: 
                    <ext-link ext-link-type="uri" xlink:href="https://elixir-europe.org/services/biodiversity">https://elixir-europe.org/services/biodiversity</ext-link>. Over time, this portfolio of biodiversity informatics resources and services will be reviewed and extended to reflect the 
                    <italic toggle="yes">status quo</italic>, bringing visibility to existing infrastructures as well as stimulating initiatives to address key gaps and improve integration. Many demonstrate how ELIXIR already acts as an infrastructure to support the integration of molecular and other biodiversity-related data, as elaborated in the four different use cases detailed below. The current range of identified resources includes those that enable deposition and archival of molecular data as well as facilitating access to and retrieval of biodiversity-relevant data. This extends to software, workflows, and computing resources for data analysis, for improving data interoperability, and for using molecular data to address key questions in biodiversity. It incorporates access to training for researchers coming from diverse backgrounds, and advocates FAIR data principles of findability, accessibility, interoperability, and reusability as a cornerstone of any infrastructure that supports the integration of molecular and other biodiversity-related data.</p>
                <p>In addition, wider assistance and guidance on life science data management can also be found in the ELIXIR Research Data Management Kit (
                    <xref ref-type="bibr" rid="ref92">RDMkit 2021</xref>), an online guide containing good data management practices applicable to research projects from the beginning to the end.</p>
            </sec>
        </sec>
        <sec id="sec4">
            <title>Use cases: Integrating sequence collections and biodiversity infrastructures</title>
            <p>Here we describe four use cases that demonstrate how biodiversity-relevant bioinformatics resources are being used to connect and integrate sequencing data with biodiversity-related research infrastructures to enhance interoperability and value. The use cases cover a broad spectrum, with the common theme of showing how these tools and other resources are used to process, analyse, and archive molecular sequencing data within the broader context of biodiversity-related data generation and research. Use case 1 examines taxonomies, the key roles they play in biodiversity research, and the interdependence of molecular data and taxonomic references. Use case 2 turns to metagenomics data and the exploration of the hidden diversity of the world&#x2019;s oceans. Use case 3 highlights genetics and genomics resources and initiatives for food security and agriculture. Finally, use case 4 details efforts to describe and understand the patterns of global distributions and diversity of fungi using molecular data. Although by no means exhaustive, these use cases provide clear examples of key life science tools and resources supporting biodiversity informatics through the integration of molecular and other biodiversity-related data to facilitate global efforts to protect and restore biodiversity.</p>
            <sec id="sec5">
                <title>Use case 1: The interdependence of molecular and biodiversity resources via taxonomic names</title>
                <p>A comprehensive taxonomy linked with unique taxonomic identifiers (taxIDs) primarily serves interoperability in molecular biodiversity studies such as DNA barcoding and metabarcoding, phylogeny inference, genomics, and data retrieval. Occurrence and taxonomic data such as those present in the GBIF taxonomic backbone (
                    <xref ref-type="fig" rid="f1">Figure 1a</xref>) provide the opportunity to summarise the geographical distribution of included taxa and, more recently, of the described taxa supplied by BLR (
                    <xref ref-type="bibr" rid="ref5">Agosti et al. 2019</xref>). However, such data are not necessarily linked to unique taxIDs across repositories and might include several synonyms that also remain unlinked. The same issue can be encountered in the CoL (
                    <xref ref-type="bibr" rid="ref93">Roskov et al. 2020</xref>) resource where the most recent recognised taxonomy, when covered, is reported (
                    <xref ref-type="fig" rid="f1">Figure 1b</xref>). It is important to note that not all of these taxonomic entries have associated molecular sequence data, as many of them originate from classical biodiversity studies. In this context, when designing new experiments for the molecular characterisation of specific organisms, the absence of unique identifiers (i.e. taxIDs) is a major obstacle to collecting the most comprehensive information on the organisms of interest. This may be due to several reasons, including the presence of synonymous names, taxa with the same scientific names but different taxonomic classifications, the splitting of well-established species leading to the naming of new and different taxa, e.g. the European grass snake (
                    <xref ref-type="bibr" rid="ref63">Kindler et al. 2017</xref>), or evidence-based renaming of species, such as the fungus that causes ash dieback (
                    <xref ref-type="bibr" rid="ref10">Baral et al. 2014</xref>), all of which require additional legacy information to track recent changes in taxonomic classifications and link them efficiently to a reference taxonomy.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Interconnections of taxonomy and molecular data resources.</title>
                        <p>A schematic representation of the primary molecular and taxonomy data resources illustrating how they are interconnected to support the development of comprehensive taxonomies linked with unique taxonomic identifiers (taxID). Specifically, each NCBI taxID is associated with a molecular sequence in (a) the NCBI and ENA primary databases, which feed (b) the GBIF taxonomy backbone. (c) CoL informs both the NCBI taxonomy and GBIF with new or updated taxon names, taking information from third-party specialised resources. Finally, (d) literature data are used to extract taxonomic names and treatments to enhance and update NCBI taxonomy, GBIF and CoL through the Biodiversity Literature Repository and TreatmentBank. &#x2018;XX&#x2019; indicates taxon or other specialised resources. Lines with arrows indicate data sharing efforts. Dashed lines with red arrows indicate that only part of the data is shared with the destination resources. Blue boxes highlight machine annotation and yellow boxes indicate human curation. NCBI, United States National Center for Biotechnology Information; ENA, European Nucleotide Archive; GBIF, Global Biodiversity Information Facility; CoL, Catalogue of Life.</p>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/77505/544777d5-89be-41ab-af2e-f08bf30e94c9_figure1.gif"/>
                </fig>
                <p>The NCBI taxonomy database (
                    <xref ref-type="bibr" rid="ref40">Federhen 2012</xref>; 
                    <xref ref-type="bibr" rid="ref98">Schoch et al. 2020</xref>) offers well-structured taxonomic classification reports in which &#x2018;synonyms&#x2019; and &#x2018;equivalent names&#x2019; are linked to the unique taxID of the main taxon scientific name (
                    <xref ref-type="fig" rid="f1">Figure 1a</xref>). In particular, the release of February 10th 2021 contains 179,314 declared synonyms and 1,180 scientific names with more than one taxonomic path or rank. For instance, 
                    <italic toggle="yes">Diplura</italic> is the scientific name of both an order and a genus, 
                    <italic toggle="yes">Centipeda</italic> is a genus name belonging to plants and firmicutes, and 
                    <italic toggle="yes">Taenidia</italic> is a Coleoptera subgenus and a genus name belonging to plants. As noted above, the lack of molecular sequence data for many established taxa means that they currently have no corresponding NCBI taxIDs. This represents the gap between the NCBI taxonomy and other repositories or backbones such as CoL and GBIF (
                    <xref ref-type="fig" rid="f1">Figure 1b and c</xref>). In addition, other molecular sequence collections, such as BOLD, contain entries with related taxonomic information sometimes not yet incorporated into the NCBI taxonomy and consequently lacking unique NCBI taxIDs.</p>
                <p>An important way to make taxonomic information more complete is to merge and harmonise information from different sources. A good example in this context is the EukMap platform developed within the UniEuk project (
                    <xref ref-type="bibr" rid="ref13">Berney et al. 2017</xref>). It is open-source software currently oriented towards protist taxonomy management, but other communities can deploy it and adapt it to their needs. The platform adopts an online open-collaboration concept for expert-driven curation that links state-of-the-art phylogeny-based taxonomy with genetic information. As such, taxonomists are encouraged to propose updates or corrections to the taxonomy using the platform. Proposals are then validated by community experts to feed into the official release of the UniEuk taxonomic framework, with the goal of pushing these changes to common taxonomy resources such as the NCBI (
                    <xref ref-type="bibr" rid="ref98">Schoch et al. 2020</xref>) and SILVA (
                    <xref ref-type="bibr" rid="ref90">Quast et al. 2012</xref>).</p>
                <p>More generally, a solution for taxonomic name integration lies in the published literature (
                    <xref ref-type="fig" rid="f1">Figure 1d</xref>). All these names, including their history, are documented in the huge, daily growing corpus of highly structured taxonomic literature, comprising well over 100 million pages of printed or, more recently, electronically published literature, dating back to the origin of modern taxonomy in 1753 and 1758 for plants and animals, respectively (
                    <xref ref-type="bibr" rid="ref69 ref70">Linn&#x00e9; &amp; Salvius 1753, 1758</xref>). Each taxonomic name is accompanied by a taxonomic treatment with a description and/or diagnosis, notes on behaviour, distribution, vernacular names, and citations of previous treatments or synonyms. The latter not only function like bibliographic citations for articles, for which a Digital Object Identifier (DOI) can be mined, but can also be typed, for example as creating a synonym (see e.g. the original description of the honey bee 
                    <italic toggle="yes">Apis mellifera</italic> by Linnaeus (
                    <xref ref-type="bibr" rid="ref68">Linnaeus 1758</xref>)). Here, text mining techniques can play an important role in collecting the relevant information from the scientific literature to update the knowledge needed to resolve such ambiguity. For example, Plazi (
                    <xref ref-type="bibr" rid="ref89">Plazi 2021</xref>) has extracted over 370,000 taxonomic treatments and the data therein, including taxonomic treatment citations (
                    <xref ref-type="bibr" rid="ref74">Miller et al. 2015</xref>). These data are FAIR, are reused by GBIF, and are accessible through Plazi&#x2019;s application Synospecies, which provides access to the taxonomic names and synonyms as linked open data. They are also submitted to NCBI once a day, although only for organisms already present in the database, so morphology-based species without molecular sequence depositions are discarded.</p>
                <p>An additional source of information on taxon names used in scientific publications falls outside taxonomic treatments, such as linked supplementary data tables (e.g. listing all sequenced specimens with their corresponding taxonomic names and accession numbers), or a list of species or molecular taxonomic units identified from a metabarcoding survey (
                    <xref ref-type="fig" rid="f1">Figure 1d</xref>). The advantages of having access to the taxonomic treatments and to the structured data tables embedded in scientific papers are numerous: such access reveals the reasoning behind the creation of a new species name or synonym. It also provides access to cited specimens, permits the discovery of advanced species/taxa interactions such as viral hosts or plant pollinators, and promotes the development of a harmonised and complete list of taxonomic names tagged by unique taxIDs.</p>
            </sec>
            <sec id="sec6">
                <title>Use case 2: Metagenomics exploration of the hidden diversity of the world&#x2019;s oceans</title>
                <p>Biodiversity data derived from marine metagenomics datasets have grown substantially in recent years and serve as an excellent example of how molecular sequence data have expanded our insight into and understanding of microbial biodiversity in the marine environment. Before the establishment of ELIXIR (
                    <xref ref-type="bibr" rid="ref50">Harrow et al. 2021</xref>) and the Marine Metagenomics Community (
                    <xref ref-type="bibr" rid="ref77">MMC 2021</xref>), there was a lack of standards on how to process the data and deposit metagenomic and metagenomic-derived data into appropriate databases. As one of the first steps to address these gaps, the MMC published best practices (
                    <xref ref-type="bibr" rid="ref53">ten Hoopen et al. 2017</xref>) that served as a foundation for a community standard to enable reproducibility and better sharing of metagenomic data along with comprehensive sampling metadata. As part of this work, the community built and benchmarked analysis pipelines, established domain-specific reference databases, and improved procedures for the deposition of metagenomic data.</p>
                <p>An example project that has been very successful in using molecular sequence data to inform and enrich our understanding of biodiversity is the work undertaken by the Tara Ocean Foundation (
                    <xref ref-type="bibr" rid="ref106">TARA 2021</xref>). Within this project, several major studies have been undertaken since 2009 using molecular sequencing techniques to characterise the life of the world&#x2019;s oceans. Tara Oceans has advanced our knowledge of all microbial kingdoms of life present in the ocean, from bacteria to small eukaryotes, as well as viruses (e.g. 150,000 eukaryotic taxa in the epipelagic ocean, 90% of them unknown, and nearly 200,000 new double-stranded DNA viruses). The approach uses metabarcoding, metagenomic, and metatranscriptomic sequencing (
                    <xref ref-type="bibr" rid="ref104">Sunagawa et al. 2020</xref>) to generate large numbers of raw sequence reads derived from organisms present in water samples. The Ocean Gene Atlas (
                    <xref ref-type="bibr" rid="ref116">Villar et al. 2018</xref>) is a web service to explore the biogeography of marine genes (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>) based on sequence similarities and consists of the Tara Ocean Microbiome - Reference Gene Catalog database (OM-RGC) and the Marine Atlas of Tara Ocean Unigenes (MATOU) (
                    <xref ref-type="bibr" rid="ref103">Sunagawa et al. 2015</xref>; 
                    <xref ref-type="bibr" rid="ref107">Tara Oceans Coordinators et al. 2018</xref>). The OM-RGC contains 46 million bacterial/archaeal genes, generated from metagenome raw data, while MATOU contains 117 million eukaryotic genes, generated from the metatranscriptome raw data. The raw data from Tara Oceans have also been submitted to MGnify, a free-to-use resource for the analysis, visualisation, and discovery of metagenomic, metatranscriptomic, amplicon, and assembly datasets (
                    <xref ref-type="bibr" rid="ref76">Mitchell et al. 2020</xref>). Approximately 1,300 samples in eight studies have so far been analysed in MGnify, including 370 metatranscriptome and metagenome samples. Of these, 1,189 amplicon events have been registered in GBIF, giving rise to more than 750,000 biogeography occurrences (
                    <xref ref-type="bibr" rid="ref42">GBIF 2018</xref>). The sequence datasets analysed in MGnify are stored in the European Nucleotide Archive (ENA) and re-used in other marine reference databases such as METdb, a genomic reference database dedicated to micro-eukaryotic species, containing 348 organisms and 463 strains of micro-eukaryotic species derived from transcriptome sequence data (
                    <xref ref-type="bibr" rid="ref81">Niang et al. 2020</xref>).</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Overview of the processing of marine environmental sequence information.</title>
                        <p>A simplified flowchart of the processing steps of information from the Tara Oceans datasets (metagenomics, metatranscriptomics, and amplicons) to integrate data with primary and secondary resources and other biodiversity platforms such as GBIF and WoRMS. Processing of sequence data from oceanic water samples using informatics tools and services connects them with taxonomic information and links them to knowledge about species biology and ecosystem variables. PR
                            <sup>2</sup>, pr2-primers: an 18S rRNA primer database for protists; SILVA, an on-line resource for quality checked and aligned ribosomal RNA sequence data; ITSoneDB, a collection of eukaryotic ribosomal RNA Internal Transcribed Spacer 1 (ITS1) sequences; UniProt, the universal protein resource of sequence and functional information; ENA, European Nucleotide Archive; MGnify, the microbiome analysis resource; OGA, Ocean Gene Atlas; OBA, Ocean Barcode Atlas; GBIF, Global Biodiversity Information Facility; WoRMS, the World Register of Marine Species; METdb, a genomic reference database for marine species; MAR databases, a collection of richly annotated and manually curated contextual and sequence resources for marine species.</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/77505/544777d5-89be-41ab-af2e-f08bf30e94c9_figure2.gif"/>
                </fig>
                <p>The metagenome-assembled genomes (MAGs) generated from analyses of shotgun sequenced samples in MGnify have been included in the MAR databases (
                    <xref ref-type="bibr" rid="ref65">Klemetsen et al. 2018</xref>), a collection of richly annotated and manually curated contextual (metadata) and sequence databases for marine prokaryote species. Context is captured by ensuring compliance with the Genomic Standards Consortium (
                    <xref ref-type="bibr" rid="ref41">Field et al. 2014</xref>) recommendations for Minimum Information about any (x) Sequence (MIxS) standards, an overarching framework of sequence metadata (
                    <xref ref-type="bibr" rid="ref124">Yilmaz et al. 2011</xref>). These resources are accessible through the Marine Metagenomics Portal (
                    <xref ref-type="bibr" rid="ref78">MMP 2021</xref>), with MarRef containing nearly 1,000 complete microbial genomes and MarDB hosting more than 13,000 incomplete genomes. The MAR database entries are cross-referenced with ENA and the World Register of Marine Species (WoRMS) (
                    <xref ref-type="bibr" rid="ref110">Vandepitte et al. 2018</xref>) records to ease access to additional, curated metadata. The data from the Tara Oceans project also provide links to several other databases such as UniProt (
                    <xref ref-type="bibr" rid="ref109">The UniProt Consortium 2019</xref>), a high-quality curated database of protein sequences and functional information, SILVA (
                    <xref ref-type="bibr" rid="ref90">Quast et al. 2012</xref>), a database for ribosomal RNA (rRNA) genes used for phylogenetic reconstruction, PR
                    <sup>2</sup> (
                    <xref ref-type="bibr" rid="ref111">Vaulot et al. 2021</xref>), a reference database of 18S rRNA protist sequences carefully curated by experts from each taxonomic group in the context of the EukRef project, and ITSoneDB (
                    <xref ref-type="bibr" rid="ref95">Santamaria et al. 2018</xref>), a specialised collection of ribosomal RNA Internal Transcribed Spacer 1 (ITS1) sequences aimed at the taxonomic identification of eukaryotes. The Ocean Barcode Atlas (OBA) is a web service designed to explore the biodiversity and biogeography of marine organisms at planetary scale for Tara Oceans and other marine metabarcode datasets (
                    <xref ref-type="bibr" rid="ref113">Vernette et al. 2021</xref>).</p>
                <p>
                    <xref ref-type="fig" rid="f2">Figure 2</xref> illustrates how raw environmental sequence data derived from oceanic water samples are processed, annotated, and re-used, applying informatics tools and services to connect them with taxonomic information that helps link the data to knowledge about species biology and ecosystem variables. These sequence datasets therefore serve as a measure of diversity and abundance in a specific habitat, provide a means to quantify biodiversity decline and the effects of climate change, and allow for efficient comparisons of datasets, e.g. time-series experiments, in environmental or species monitoring programmes.</p>
                <p>Other large-scale projects to analyse ocean biodiversity have also been undertaken in recent years, including the Malaspina expedition (
                    <xref ref-type="bibr" rid="ref30">Duarte 2015</xref>), Ocean Sampling Day initiatives (
                    <xref ref-type="bibr" rid="ref66">Kopf et al. 2015</xref>), the Antarctic Circumnavigation Expedition (
                    <xref ref-type="bibr" rid="ref3">ACE 2021</xref>), and the Multidisciplinary drifting Observatory for the Study of Arctic Climate expedition (
                    <xref ref-type="bibr" rid="ref79">MOSAiC 2021</xref>). On the one hand, the large and growing variety of observations taken during oceanic sampling (
                    <xref ref-type="bibr" rid="ref49">Gorsky et al. 2019</xref>) has posed many data management challenges. On the other hand, in facing these challenges the field of marine metagenomics has paved the way towards better capture, processing, and management of samples and their metadata. In parallel with these studies addressing diversity at the global ocean scale, studies at smaller spatial scales addressing temporal questions (including classical diversity data, genomic data, and imaging data) have emerged at enhanced marine genomic observatories (
                    <xref ref-type="bibr" rid="ref14">Bourlat et al. 2013</xref>; 
                    <xref ref-type="bibr" rid="ref24">Davies et al. 2014</xref>). More generally, increasingly integrative approaches to diversity analysis are now favoured by the marine research community (
                    <xref ref-type="bibr" rid="ref16">Canonico et al. 2019</xref>; 
                    <xref ref-type="bibr" rid="ref20">Collins et al. 2020</xref>). Although marine metagenomics is relatively mature as a field, there are still many issues that need attention. There is a need to implement standardised procedures for processing and analysing datasets, including best practices for assembly, binning and annotation. Furthermore, the quality of reference databases, integration of new omics data, specific data warehouses, and long-term data management services are issues that warrant careful attention, e.g. in the context of moving from biodiversity snapshots to large-scale monitoring and discovery.</p>
            </sec>
            <sec id="sec7">
                <title>Use case 3: Biodiversity genetics and genomics for food and agriculture</title>
                <p>Adaptation of agriculture has been based on fitting crop varieties and breeds to their production system, which includes farming systems and their natural environments. After initial domestication, this led to the development of a large diversity of varieties and breeds adapted not only to local farming conditions but also to diverse uses and consumer demands. With the specialisation and industrialisation of production systems after the Second World War, this high intraspecific diversity started to decline all over the world and is now threatened in many cases (
                    <xref ref-type="bibr" rid="ref88">Pilling, B&#x00e9;langer &amp; Hoffmann 2020b</xref>). Important initiatives to catalogue and conserve this diversity in large 
                    <italic toggle="yes">ex situ</italic> collections or with participatory 
                    <italic toggle="yes">in situ</italic> approaches have grown in parallel with global governance under the auspices of the Food and Agriculture Organization of the United Nations (FAO) (
                    <xref ref-type="bibr" rid="ref87">Pilling, B&#x00e9;langer, Diulgheroff, et al. 2020a</xref>). The global objectives of these initiatives are to secure this biodiversity as the indispensable foundation of sustainable food production systems (
                    <xref ref-type="bibr" rid="ref101">Smale &amp; Jamora 2020</xref>), highlighted in the EU Biodiversity Strategy for 2030, the European Green Deal, and the UN Sustainable Development Goal 2.5 (Zero hunger: maintain the genetic diversity of seeds, cultivated plants and farmed and domesticated animals and their related wild species). The global collections of genetic resources, comprising ~5.4 million plant accessions from over 50,000 species and over 7,800 local breeds (
                    <xref ref-type="bibr" rid="ref87">Pilling, B&#x00e9;langer, Diulgheroff, et al. 2020a</xref>), are managed by a large number of stakeholders. Plant genetic resources are conserved in more than 700 genebanks from 103 countries and 17 regional or international research centres (
                    <xref ref-type="bibr" rid="ref87">Pilling, B&#x00e9;langer, Diulgheroff, et al. 2020a</xref>), which contribute to international catalogues of genetic resources such as the European Search Catalogue for Plant Genetic Resources (EURISCO) (
                    <xref ref-type="bibr" rid="ref119">Weise et al. 2017</xref>) and the European Farm Animal Biodiversity Information System (EFABIS) at the European level and the Domestic Animal Diversity Information System (
                    <xref ref-type="bibr" rid="ref21">DAD-IS 2021</xref>) and GENESYS (
                    <xref ref-type="bibr" rid="ref44">Genesys 2021</xref>) at the international level (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>; 
                    <xref ref-type="bibr" rid="ref39">FAO 2010</xref>). Another possibility for archiving data on collections of genetic resources is to use the GBIF portal, which is often done in parallel as an alternative that does not require any clearance by governmental agencies (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>; e.g. datasets at GBIF from The Netherlands Centre for Genetic Resources, 
                    <xref ref-type="bibr" rid="ref72">Menting 2020</xref>).</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Overview of the main information systems used for archiving data on genetic resources for food and agriculture.</title>
                        <p>The green box at the bottom shows the information systems used to manage and curate the collected data. Some of these information systems are maintained by ELIXIR nodes in their national infrastructures. The data can then be regularly submitted to and updated in international archives. Lists of genetic resource accessions are archived in the European Search Catalogue for Plant Genetic Resources (EURISCO) and the European Farm Animal Biodiversity Information System (EFABIS) after clearance by National Focal Points appointed by country governments, and are then collected by the global information systems Genesys and the Domestic Animal Diversity Information System (DAD-IS). Datasets can also be archived at GBIF without any clearance. Genotyping and genomic data are archived in ELIXIR deposition databases and Core Data Resources (EMBL-EBI ENA, EVA and BioSamples). Brokering platforms such as COPO can be used to facilitate data submission to international archives. ELIXIR has also contributed to a global standard for a RESTful application programming interface (API) focused on plant data, BrAPI, which is progressively being implemented in the main plant information systems to allow automatic access to standardised data. GBIF, Global Biodiversity Information Facility; ENA, European Nucleotide Archive; EVA, European Variation Archive.</p>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/77505/544777d5-89be-41ab-af2e-f08bf30e94c9_figure3.gif"/>
                </fig>
                <p>Since the 1980s, collections of accessions have been genotyped with a set of fast-evolving techniques, mainly to understand crop and breed evolution since domestication and to identify adaptive traits, e.g. (
                    <xref ref-type="bibr" rid="ref123">Wilkinson et al. 2013</xref>; 
                    <xref ref-type="bibr" rid="ref15">Brozynska et al. 2016</xref>; 
                    <xref ref-type="bibr" rid="ref71">Mascher et al. 2019</xref>). Sequence variation has also proved useful and is increasingly used for monitoring currently maintained genetic diversity, e.g. detection of redundancy in collections, or assessment of threat levels facing small populations of breeds, forest trees or crop wild relatives (
                    <xref ref-type="bibr" rid="ref11">B&#x00e9;langer et al. 2019</xref>). International archives have been developed to store sequence variation data, such as dbSNP (
                    <xref ref-type="bibr" rid="ref100">Sherry et al. 2001</xref>), which focuses on single nucleotide polymorphisms (SNPs), or the European Variation Archive (
                    <xref ref-type="bibr" rid="ref37">EVA 2021</xref>) launched more recently to store any type of sequence variant that can be expressed in Variant Call Format (VCF) (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>). Companion archives can be used to track the accession identifiers and collection provenance (BioSamples and BioStudies at EMBL-EBI or BioProjects at NCBI) while the reference genomes used for the detection of the variations must be stored in the INSDC archives prior to the submission of variation data (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>).</p>
                <p>A key challenge in the context of biodiversity genetics and genomics for food and agriculture is the identification of accessions of genetic resources and their consistency across catalogues and molecular archives. Breed and crop variety names, which are critical for linking data obtained on genetic resources to previous knowledge, also pose a challenge for interoperability owing to misspellings, homonyms, and synonyms that arise over time and across regions and borders. Given that reference genome sequences, sequence variation data, and catalogues of accessions of genetic resources are usually managed by separate groups, they often end up in different silos with poor or no interoperability. This also affects the interoperability of the data once submitted to international archives, which is still not routine practice. It is therefore currently not possible to automatically obtain the genetic variation data associated with a given panel of accessions selected in a catalogue of genetic resources or, reciprocally, to retrieve all known information on the origin of the accessions (country of origin, type of material, etc.) associated with a variant found in EVA or dbSNP. For crops, the FAO recently recommended adding a DOI to the three fields of the MultiCrop Passport Data standards that have ensured the unique identification of accessions to date (species name, holding institution name, and the accession identifier provided by the holding institute) and developed a dedicated service (GLIS: the Global Information System on Plant Genetic Resources for Food and Agriculture, 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>) to support the adoption of this new practice by genebanks. The communities working on crops, farm animals and forest trees are also actively working with EMBL-EBI to develop dedicated specifications for the metadata associated with the data archived in ENA and EVA and in particular to ensure that they track identifiers associated with accessions of genetic resources. In this context it is important to take into account the possible different scales at which the genetic resources are collected, i.e. an individual for most crops, and populations for breeds and most forest trees. Reciprocally, mechanisms for capturing and updating in genebank catalogues the identifiers associated with the samples that were genotyped or sequenced are needed (see e.g. in relation with domestic animals biological resources: 
                    <xref ref-type="bibr" rid="ref57">IMAGE 2021</xref>). These challenges are not necessarily unique to biodiversity genetics and genomics for food and agriculture, but they particularly highlight efforts required to informatically process and connect sequence data with sample metadata.</p>
            </sec>
            <sec id="sec8">
                <title>Use case 4: Understanding biogeographical diversity through molecular mapping of global fungal distributions</title>
                <p>The GlobalFungi Database (
                    <xref ref-type="bibr" rid="ref47">GlobalFungi 2021</xref>) exemplifies efforts to connect sequencing data to biodiversity research infrastructures and advance data-driven research. Fungi play key roles in all terrestrial ecosystems, primarily as decomposers of organic matter but also as pathogens or symbionts. Long-standing scientific interests in describing and understanding the patterns of global distributions and diversity of fungi mean that sequencing initiatives have led to an accumulating wealth of fungal molecular data from various geographical regions, ecosystems, and habitats. Large-scale studies focusing on soil fungi have used metabarcoding analysis to examine the ecological drivers and biogeographic patterns of fungal community composition and diversity (
                    <xref ref-type="bibr" rid="ref108">Tedersoo et al. 2014</xref>; 
                    <xref ref-type="bibr" rid="ref32">Egidi et al. 2019</xref>). However, coordinated global sampling at sufficient spatial and taxonomic resolution remains largely unfeasible for individual research studies. Instead, a meta-approach is needed to collect, collate, categorise, and centralise existing data using infrastructures that can continue to gather and include new and future genetic and genomic datasets. The GlobalFungi Database was established as a platform to address these needs by providing public access to published data on fungal community composition obtained by next-generation-sequencing approaches through a web-based interface that promotes FAIR principles and allows various queries and visualisations of the results (
                    <xref ref-type="bibr" rid="ref115">V&#x011b;trovsk&#x00fd; et al. 2020</xref>). Release version 3.0 contains over 1100 million observations of fungi from 367 manually curated studies with over 36,000 samples and 213 million ITS sequence variants (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>). GlobalFungi allows searching for specific sequence variants, fungal genera, species, and molecular species (called &#x2018;species hypotheses&#x2019;) by performing BLASTn sequence searches and querying the local MySQL database. Annotation of taxa is based on UNITE, the database of fungal molecular taxa compiled using direct sequencing of known fungal species and environmental sequencing of targeted barcodes (
                    <xref ref-type="bibr" rid="ref82">Nilsson et al. 2019</xref>). GlobalFungi contains data from high-throughput sequencing efforts including local abundance of fungi and complete sampling metadata, and allows querying of samples by location searches on maps or through the studies where they were published. These are complemented with extensive climatic data for sample locations retrieved from the CHELSA (Climatologies at High resolution for the Earth&#x2019;s Land Surface Areas) database (
                    <xref ref-type="bibr" rid="ref60">Karger et al. 2017</xref>). The GlobalFungi Database aims to continue to grow by adding more records and by motivating the community to submit new datasets to help build the resource for research on the systematics, biogeography, and ecology of fungi.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Data and annotation sources connected via the GlobalFungi Database.</title>
                        <p>GlobalFungi enables searches for specific sequence variants, fungal genera, species, and molecular species (called &#x2018;species hypotheses&#x2019;) by performing BLASTn sequence searches and querying a local MySQL database. Annotation of taxa is based on UNITE, the database of fungal molecular taxa. Samples can be queried by searching on maps or through the studies where they were published. Climatic data for sample locations are retrieved from the CHELSA (Climatologies at High resolution for the Earth&#x2019;s Land Surface Areas) database. ITS, Internal Transcribed Spacer; GPS, Global Positioning System.</p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/77505/544777d5-89be-41ab-af2e-f08bf30e94c9_figure4.gif"/>
                </fig>
                <p>The utility of such a centralised resource connecting sequencing data to biodiversity research infrastructures is demonstrated generally through the characterisation of global patterns of fungal biodiversity (
                    <xref ref-type="bibr" rid="ref114">V&#x011b;trovsk&#x00fd; et al. 2019</xref>) or predicting the global biodiversity of fungi (
                    <xref ref-type="bibr" rid="ref9">Baldrian et al. 2021</xref>) and specifically through the ability to identify fungi that are carried across continents along with introduced plants (
                    <xref ref-type="bibr" rid="ref117">Vlk et al. 2020</xref>). Moreover, the metadata-rich resource helped to show that symbiotic fungi are more vulnerable to climate change than pathogens and that climate change thus represents a considerable threat for forestry production, agriculture, and food security (
                    <xref ref-type="bibr" rid="ref114">V&#x011b;trovsk&#x00fd; et al. 2019</xref>). Beyond community diversity and biogeography patterns, the distributions of individual fungal species are particularly important, e.g. for phytopathogenic fungi that may severely affect yields of agricultural crops such as the 
                    <italic toggle="yes">Fusarium</italic> pathogen of bananas (
                    <xref ref-type="bibr" rid="ref22">Dale et al. 2017</xref>). Exploiting the GlobalFungi Database, mycologists, ecologists, or global climate change scientists are able to link fungal occurrence and diversity data with the panel of collected metadata, allowing for the characterisation of key environmental factors that are driving fungal diversity. Such studies can be performed at different geographic levels, from country scales to biomes of the entire world, and for all identifiable fungal communities or for selected ecosystem compartments. Collating these data involves manual curation of information from published studies (367 studies in the latest release), but metadata heterogeneity means that attributes extracted from the publications that are common across the database are limited to just Longitude, Latitude, Continent, Sample type, Biome, Sampling year, Primers used, and pH, while additional metadata only exist for some of the studies. Nevertheless, these resources bring together different data types to enable assessments of fungal diversity across the globe and tracking of individual species or genera, leading to the development of a more comprehensive understanding of the biogeography of fungal diversity. Importantly, this also facilitates assessments of potential threats faced by fungal communities and the ecosystems of which they form such a vital part.</p>
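                <p>As a minimal illustration of the metadata heterogeneity described above, the following Python sketch reports which of the eight common attributes are absent from a curated sample record. The field names, the <monospace>missing_common_attributes</monospace> helper, and the example record are hypothetical; they are assumptions made for illustration and are not taken from the GlobalFungi schema.</p>
                <code language="python">
```python
# Hypothetical field names for the eight attributes listed in the text;
# the actual GlobalFungi column names may differ.
COMMON_ATTRIBUTES = (
    "longitude",
    "latitude",
    "continent",
    "sample_type",
    "biome",
    "sampling_year",
    "primers",
    "ph",
)


def missing_common_attributes(record):
    """Return the common attributes that are absent or empty in a record."""
    return [
        name
        for name in COMMON_ATTRIBUTES
        if record.get(name) in (None, "")
    ]


# Invented example record: primers left empty, pH not recorded at all.
example_record = {
    "longitude": 14.42,
    "latitude": 50.09,
    "continent": "Europe",
    "sample_type": "soil",
    "biome": "forest",
    "sampling_year": 2018,
    "primers": "",
}

print(missing_common_attributes(example_record))
```
                </code>
                <p>A check of this kind could in principle be run when a study is curated, so that gaps are flagged while the original publication is still at hand rather than being discovered at analysis time.</p>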
            </sec>
        </sec>
        <sec id="sec9">
            <title>Informing strategies for large-scale national biodiversity programmes</title>
            <p>Now that large-scale regional, national, and global biodiversity genomics projects are a reality, it is vital to capitalise on the lessons learned and best practices developed through initiatives such as the example use cases presented above. The greatest impact that genetic and genomic data can have on biodiversity assessments, monitoring, conservation, and restoration will only be realised with the support of infrastructures that facilitate the finding, sharing, and connecting of increasingly large and diverse datasets. This requires efforts at all levels to be put into practice from the start, informing strategies for biodiversity programmes to ensure that the data they generate are findable and interoperable. A huge amount of data is being produced globally, and whilst the situation is improving with respect to open access for sequencing data at least, much of this data is still not made available to the research community with adherence to the FAIR principles. By developing strategies and supporting infrastructures that make this easier and scalable, usability and impact will be greatly extended: a major goal of ELIXIR. National biodiversity sequencing efforts can be a useful opportunity to demonstrate how project-wide strategies for harmonisation and standardisation of FAIR data can be put to good effect.</p>
            <p>The primary products of these programmes, the assembled genomes and their corresponding annotations, are the fundamental building blocks that modern computational comparative approaches exploit to learn about the biology and evolution of the species (
                <xref ref-type="bibr" rid="ref125">Zoonomia Consortium 2020</xref>). These benefit from and build on accumulated knowledge from field and wet lab research compiled by biologists working on their organisms of interest and documenting experiment details and sample information. This species, experiment, and sample metadata is vital to contextualise the production of a genome and its annotation, and even more so when subsequently exploiting these resources, e.g. through gene expression analysis and interpretation using transcriptomic and other techniques.</p>
            <p>Standardisation is essential for the successful scaling up of these initiatives. Whilst the superset of metadata used to describe biological entities and processes might be ever-expanding, metadata about the provenance of samples can be reduced to a subset of &#x2018;core&#x2019; terms that reflect descriptions that are fundamental to the downstream contextualisation of a given sequence. For example, the Darwin Tree of Life project (
                <xref ref-type="bibr" rid="ref29">DToL 2021</xref>) is a large programme that aims to understand the biodiversity of the British Isles, by sequencing the DNA of all the animals, plants, fungi, and protists, comprising approximately 60,000 species. As a partner of the Earth BioGenome Project (
                <xref ref-type="bibr" rid="ref67">Lewin et al. 2018</xref>), DToL has worked with sample collectors who are, or collaborate with, taxonomic experts to develop a core standard for sample metadata collection alongside Standard Operating Procedures for physical preservation of samples and subsequent sequencing. The breadth of the genomes that will be produced from the wide array of habitats, collection methods, and variety of recorded traits across taxonomic groups is a key challenge in terms of ensuring compliance with these standards.</p>
            <p>DToL is also undertaking widespread DNA barcoding of specimens. DNA barcoding contributes to rapid identification of biological material and, in terms of cost-benefit, deciding when to barcode and/or genome-sequence a specimen is a balancing act when considering how to efficiently make assessments of biodiversity. As noted in Use Case 1, barcoding provides a fast and cost-effective technical process to ascertain a provenance trail for a given organism with respect to its taxonomic lineage, an essential part of biodiversity studies. This becomes increasingly important where taxonomic identification is still uncertain due to conflicting or missing information, i.e. where an expert identification results in naming differences, lack of defined lineages of less well-studied organisms within the taxonomy databases, and discrepancies with taxonomic identifier allocation services such as the NCBI. As part of DToL, specific metadata schemas are being prepared to assist with the collection of standardised barcoding data alongside methodologies to automate taxonomic identification based on amplicon sequences. Data management tools incorporate and link the deposited sample metadata and the subsequent genomes in the EMBL-EBI BioSamples and ENA databases, respectively, and will also submit data to BOLD.</p>
            <p>Other national projects focus on within-species diversity rather than between-species diversity. The national Swedish conifer programme to sequence the Norway spruce and Scots pine genomes serves as an example of what can be done with a well-assembled and annotated genome (
                <xref ref-type="bibr" rid="ref83">Nystedt et al. 2013</xref>). Around 75% of Sweden&#x2019;s area is covered with forest, and much of this is conifer. To improve production and to inform a sustainable forestry practice, the genomes will serve as a basis for a massive resequencing effort where thousands of individuals are sequenced using short read technologies. This will in turn be used to study population structure, and tens of thousands of individuals with known phenotypes will be genotyped. These phenotypes can then be coupled to genotypes and used to improve productivity and to create varieties more adapted to climate change. This also opens up the possibility of pangenomics, an area that is growing in popularity and usefulness, especially in the context of food security highlighted by use case 3, and particularly for crop and livestock improvement (
                <xref ref-type="bibr" rid="ref105">Tao et al. 2019</xref>; 
                <xref ref-type="bibr" rid="ref62">Khan et al. 2020</xref>; 
                <xref ref-type="bibr" rid="ref23">Danilevicz et al. 2020</xref>; 
                <xref ref-type="bibr" rid="ref25">Della Coletta et al. 2021</xref>). To fully exploit the massive amounts of sequence data produced, they will need to be deposited with carefully annotated metadata, and stable identifiers that are coupled with phenotypic information.</p>
            <p>Whilst the DToL and the Swedish conifer projects are at different ends of the spectrum in terms of breadth and depth, they highlight direct commonalities. They both comprise important first steps for future biodiversity studies, i.e. they develop fundamental genomic baselines on which to build future comparisons amongst organisms and populations through resequencing efforts. However, differences in sampling, naming conventions, sequencing dataset quality and coverage, and annotation quality can all lead to barriers to uptake within the FAIR data ecosystem. By using standardised methods and tools for metadata and data capture and processing, one of the key gaps in biodiversity data management is filled and directly coupled to efforts to produce sequencing and barcoding data based on consistent, rich metadata about the biological material from which the data are derived. Technical tools, including COPO (
                <xref ref-type="bibr" rid="ref99">Shaw et al. 2020</xref>), an ELIXIR roadmapped data brokering resource, are being employed to aid consistent deposition of project-compliant data and metadata in DToL and other upcoming national and international programmes. The aim is to provide a comprehensive overview of the history of the sample, evidence for its characterisation, and its genome which is ready to be used for annotation and further study.</p>
            <p>The DToL and the Swedish conifer projects here serve as examples, in many respects paving the way for emerging initiatives such as the European Reference Genome Atlas (
                <xref ref-type="bibr" rid="ref36">ERGA 2021</xref>). Being able to link the sampled biological material to the metadata about the collection process, the identification strategy, the sequence data, and subsequent metrics for assembly, and finally the annotation, will fill crucial gaps in FAIR data delivery in these projects. In this way, the coordination of infrastructure alongside coordination of sampling and characterisation processes based on metadata specifications is a powerful way of linking FAIR data to the methodologies that communities use to undertake biodiversity research and discovery.</p>
        </sec>
        <sec id="sec10">
            <title>Common challenges faced when connecting molecular sequence and biodiversity research infrastructures</title>
            <p>The four use cases and examples of large-scale national biodiversity programmes outlined above present different aspects of how infrastructures can be involved in and support biodiversity studies. They represent data and knowledge ecosystems of connected and complementary information systems. The technical solutions to overcoming data integration challenges are often somewhat domain-specific. Nevertheless, analogies can be drawn amongst the different steps taken to address specific challenges, revealing common gaps in tools and infrastructures focused on taxonomy, metadata, and community services. Cross-domain recognition of these gaps is important to ensure coordinated efforts to address priority issues that will facilitate continued commitments to open science and increased usability of biodiversity related data in support of increased research efficiency.</p>
            <sec id="sec11">
                <title>Missing taxIDs, conflicting taxonomies, and information locked in publications</title>
                <p>The informatics processes designed to connect information from biodiversity research infrastructures with molecular sequence data collections are often hindered by the inconsistent use of taxIDs across collaborating partners. For molecular sequence data, taxIDs are issued by NCBI but only for taxa where sequences have been deposited, whereas biodiversity infrastructures often employ their own distinct sets of taxIDs. Missing and non-matched taxIDs give an incomplete and inconsistent view of currently documented taxa, which greatly decreases the power of computational analyses and severely limits cross-infrastructure interoperability. Conflicting and/or not regularly updated taxonomies employed by the different infrastructures further hinder interoperability, promoting the building of data silos by distinct research communities. Furthermore, different names are currently accepted (able to be processed) by the different infrastructures, and synonym lists are not complete or not compatible. A similar situation exists for agricultural catalogues of genetic resources, where accessions, lines, and samples may be assigned conflicting identifiers by different laboratories. Moreover, the names of breeds and varieties to which they belong are not standardised, meaning that when data are shared or archived their future reuse can be limited by the uncertainty of their origins. The information necessary to address these issues exists, but is difficult to obtain as it essentially implies determining the provenance of a name. It is trapped in the collective wisdom of experts and their publications, and thus must first be extracted, e.g. using text-mining and expert curation, and then fed into reference taxonomic infrastructures with stable backbones and fully traceable identifiers. 
However, this does not extend well to metagenomics-focused research where &#x2018;dark taxa&#x2019; vastly outnumber described diversity, and thus pose additional challenges in the context of defining and employing interoperable identifiers. Communities recognise that taxonomies are not static because our ever-improving understanding of life on Earth necessitates constant revisions. They also recognise that gaps created by failing to develop and support harmonisation initiatives are holding back advances in biodiversity research.</p>
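                <p>The kind of name-to-identifier reconciliation discussed above can be sketched in Python as follows. This is a minimal illustration that assumes a small hand-built synonym table; the taxon names, the placeholder identifiers, and the <monospace>resolve_name</monospace> helper are all hypothetical and do not reproduce the NCBI taxonomy or that of any biodiversity infrastructure.</p>
                <code language="python">
```python
# Hypothetical accepted names mapped to placeholder identifiers from two
# infrastructures; None marks a taxon with no identifier issued yet.
ACCEPTED = {
    "Examplea alba": {"molecular_id": "MOL:1", "biodiversity_id": "BIO:10"},
    "Examplea nigra": {"molecular_id": None, "biodiversity_id": "BIO:11"},
}

# Hypothetical synonym list pointing outdated names at accepted names.
SYNONYMS = {
    "Oldgenus alba": "Examplea alba",
}


def resolve_name(name):
    """Map a submitted name to its accepted name and known identifiers."""
    cleaned = " ".join(name.split())  # normalise stray whitespace
    accepted = SYNONYMS.get(cleaned, cleaned)
    identifiers = ACCEPTED.get(accepted)
    if identifiers is None:
        # Unknown name: flag for expert curation rather than guessing.
        return {"accepted_name": None, "status": "unresolved"}
    status = "matched" if all(identifiers.values()) else "missing_identifier"
    return {"accepted_name": accepted, "status": status, **identifiers}


print(resolve_name("Oldgenus  alba"))
print(resolve_name("Examplea nigra"))
print(resolve_name("Unknownus species"))
```
                </code>
                <p>The three outcomes in this sketch correspond to the situations described in the text: a synonym that resolves to a taxon known to both domains, a taxon lacking an identifier in one domain, and a name whose provenance must first be established by experts.</p>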
            </sec>
            <sec id="sec12">
                <title>Inconsistent metadata standards: adoption of best practices</title>
                <p>Comprehensive and accurate metadata records are critical for data reuse and interoperability, but they require considerable extra effort and cannot be rigidly enforced. They not only enable the tracing of the origins of samples or sample-derived molecular data, but they also provide the necessary context to be able to link these to other relevant data. The scope of such other relevant data could cover taxonomy, ecology and life history, climatology, biogeography, essential and extended sets of biodiversity variables, and much more, but only if the data can be correctly linked. The use cases outlined above highlight just how heterogeneous metadata can be across different research domains, but also how important it is to be able to maintain correct links in order to achieve meaningful research outputs. Metadata is particularly important in the context of connecting molecular sequence data to biodiversity research infrastructures, especially with expanding collections of molecular sequence data and efforts to build reference genomic species libraries. Although a suite of relevant metadata standards exists, e.g. Darwin Core (
                    <xref ref-type="bibr" rid="ref121">Wieczorek et al. 2012</xref>) for species observations, specimens, samples, and related information, and MIxS for Minimum Information about any (x) Sequence (
                    <xref ref-type="bibr" rid="ref124">Yilmaz et al. 2011</xref>), they are not used consistently and different standards are adopted by different infrastructures. This is a common problem, as research communities and projects differ with respect to how they set the balance between achieving (i) maximal data accessibility - encouraging data submissions by requiring minimal metadata standards, and (ii) maximal data findability, interoperability, and reusability - by requiring much more comprehensive cataloguing of metadata at the risk of discouraging data submissions. A common challenge is the lack of well-defined comprehensive checklists before embarking on sample collections. Efforts to develop these would mean that the appropriate metadata can be captured during the experiment, rather than retrospectively having to determine the key attributes and recover their values from heterogeneous sources. The examples presented above highlight how consistently capturing at least sample provenance can facilitate some retrospective metadata harvesting, but the challenges of doing so remain considerable. Despite general commonalities amongst standards for the whole data lifecycle: data collection, data processing, analysis, annotation, curation, and data deposition, communities recognise that metadata standards are not &#x2018;one-size-fits-all&#x2019; because the great variety of research projects means that some degree of flexibility is required. They also recognise the important added value of investing in comprehensive metadata collection. Practically however, the heterogeneity of current solutions limits communities&#x2019; abilities to fully exploit the accumulating data to advance biodiversity research.</p>
            </sec>
            <sec id="sec13">
                <title>Lack of brokering services tailored to communities</title>
                <p>Another common challenge across research communities is the lack of comprehensive and dedicated support to help scientists work towards better compliance with FAIR principles. Researchers who are designing and carrying out the sampling and experiments are not necessarily trained with the technical know-how to ensure good data management. In larger consortia there is often more scope for such support, but this has been historically largely responsive rather than being fully integrated from the early planning stages. Funding agencies are increasingly requiring detailed planning on standards for metadata collection, collation, aggregation, dissemination, and archiving, but implementing such plans remains challenging. For individual researchers, this process often constitutes a barrier that prevents their data being made available in the most useful way to the rest of the scientific community. The use cases above highlight some examples of communities that are building brokering services to meet their own needs, but these probably reflect the exception rather than the rule. Brokering services support researchers by maintaining a technical infrastructure for aiding and automating data submission. For example, the Integrated Publishing Toolkit (
                    <xref ref-type="bibr" rid="ref58">IPT 2021</xref>) is a free open-source software used to publish and share biodiversity datasets through the GBIF network. Even when such data brokering tools exist for specific communities, users still need support to ensure that they are using the systems correctly: selecting the right standards; employing the right formats; obtaining useful feedback when metadata is not collected properly or missing; and most importantly, a human support mechanism that fits with their domain. The next step, integrating data across different research domains, is often where the greatest metadata loss occurs. If a host resource cannot accommodate certain data types or structures, these remain with the submitter and risk being lost altogether even if provenance is recorded. Proactive communities have recognised many of these challenges and developed brokering tools to meet specific needs, and more generally it is clear that without such supporting services the end-value of the data for biodiversity research is greatly diminished. With the scaling up of production of high-quality sequence data collections and biodiversity research datasets, communities also recognise that ensuring high standards achieved normally through manually curating metadata will not be possible without efficient brokering. This remains practically challenging in many cases as developing and maintaining such dedicated support services to assist researchers with FAIR data brokering is rarely prioritised.</p>
            </sec>
        </sec>
        <sec id="sec14">
            <title>Recommendations for closer integrations that will shape the future of biodiversity research</title>
            <p>Our survey of approaches by which molecular technologies help inform understanding of biodiversity aimed to identify opportunities and priorities to aid strategic thinking. It highlights the emerging critical importance of making use of molecular data to advance understanding of biodiversity in its broadest terms. The four use cases clearly demonstrate that molecular data are now increasingly and routinely used to inform diverse questions on taxonomies, diversity and abundance of microorganisms, the interface with the human food chain, and to increase our understanding of organisms in a wider ecological sense. Also evident is the rapid change in scale, both in terms of foundational whole genomes and derived data, which is creating related challenges across the use cases and more widely in the field. To that end, we make the following recommendations, which we believe are essential for the wider field of biodiversity research to benefit from the vast quantity of molecular data that will be generated in the coming years:</p>
            <sec id="sec15">
                <title>Biodiversity-related and molecular-focused infrastructures need to collaborate</title>
                <p>First and foremost, the key infrastructures in the molecular domain, such as ELIXIR, should seek to form strong collaborations with those that span the biodiversity domain, such as (but not limited to) GBIF, DiSSCo, CETAF, ENVRI, CoL, BLR and OBIS. This will be required to meet the challenges associated with the steep scaling up of molecular approaches for the study of biodiversity. Infrastructures should build Communities of Practice that create standards and alignment across the two domains of science. This will support research aimed at discovering, monitoring, characterising, and understanding biodiversity, but also many other areas of research and innovation in the life sciences using genetic diversity as a basis. Infrastructures can benefit from the experience of ELIXIR to independently build solutions to meet specific community needs but maintain interoperability with existing resources. A significant step in this direction will be taken via a Horizon 2020-funded project to build the Biodiversity Community Integrated Knowledge Library (
                    <xref ref-type="bibr" rid="ref145">BiCIKL</xref>). This will bring together a cross-disciplinary set of infrastructures, spanning molecular, taxonomic, literature, museum, and others into a single community focused on addressing biodiversity-related data challenges.</p>
            </sec>
            <sec id="sec16">
                <title>Taxonomies need to be aligned and harmonised across domains</title>
                <p>To address shortcomings in the way taxonomies are handled between the biodiversity and molecular domains we should adopt a common linked data resource. Building on existing resources, taxonomy methodologies need to bridge the gap between identifiers in the molecular domain (e.g. taxID) and taxonomic names in the biodiversity domain, in a manner that is harmonised across repositories. This would provide tools to deal with synonyms and updates, it would enable better understanding of the meaning of a taxonomic name through access to taxonomic treatments, and it would facilitate annotations and links with external data. Harmonisation would also provide researchers with access to a comprehensive and consistent overview of known or accepted taxa names as a proxy of the current state of existing biodiversity to be characterised. This needs to cover all branches of life and be able to accommodate emerging potential species currently only known from sequencing-based studies.</p>
            </sec>
            <sec id="sec17">
                <title>Metadata needs to be better standardised and universally adopted</title>
                <p>To facilitate links across the biodiversity and molecular domains we should develop a consistent set of interoperable metadata standards that are fit-for-purpose and fully integrated into the research lifecycle. This will allow for the connected tracking of accessions, vouchers, and samples with a rich wealth of information captured about their origins (localisation, biome, etc.), and with publications synthesising the emerging knowledge. This has to be associated with a set of technical standards and tools to facilitate data and metadata collection, formatting, and curation, with brokering services to guide the process to completion. Responding to such needs, the ELIXIR Research Data Management Kit (
                    <xref ref-type="bibr" rid="ref92">RDMkit 2021</xref>) offers guidance on life sciences data management practices applicable to metadata in biodiversity-related research. Finally, recognising that standards as described here are only useful if they are widely used, we recommend a rigorous drive towards their universal adoption via data brokering and deposition platforms and via publication of results in the scientific literature.</p>
            </sec>
            <sec id="sec18">
                <title>Approaches for managing molecular data need to be scaled up</title>
                <p>As the rate of acquisition grows, and molecular data are increasingly recognised as a common resource with multiple downstream applications, data management solutions need to scale accordingly. It is clear that sequencing hundreds of thousands of species in the near future, from barcodes to reference genomes, will generate the foundational data for most biodiversity molecular studies for decades to come. Efficient data management will require national and international investments to build and sustain the required infrastructures. Scaling up standardised and common methods for metadata capture, sequence analysis and annotation, as well as curation and archival, is critical if the data are to be re-used as widely as possible, at a large scale and across domains. In addition, when operating at this scale and across many geographies, it is essential that the core resources be designed now to be sustained in the long term.</p>
            </sec>
            <sec id="sec19">
                <title>Bioinformatics tools and services for biodiversity research need to be prioritised</title>
                <p>Continued community-driven development of the analysis tools and services required to take full advantage of the accumulating data should be actively supported. Methods for the analysis of molecular data integrated with biodiversity-related data will continue to evolve and improve, so adopting a fixed approach to data analysis is not a realistic option. Instead, development should proceed in an environment that encourages innovation while building on and connecting to existing tools and services. To achieve this in an efficient manner that benefits the entire community, bioinformatics methods development needs to follow the recommendations on FAIR software (
                    <xref ref-type="bibr" rid="ref61">Katz et al. 2021</xref>). To encourage this, we should prioritise the establishment of dedicated recommendations and guidelines for best practices in developing bioinformatics tools and services for biodiversity research. For example, workflows used to analyse biodiversity-related data should be containerised and made easily accessible through BioContainers (
                    <xref ref-type="bibr" rid="ref112">da Veiga Leprevost et al. 2017</xref>) or within cloud computing infrastructures. These efforts can benefit from and should build on the ELIXIR tools ecosystem (
                    <xref ref-type="bibr" rid="ref34">ELIXIR 2021b</xref>) that aims to help communities find, register and benchmark software tools, while maintaining information standards for these tools, and producing, adopting and promoting best practices for their development.</p>
            </sec>
            <sec id="sec20">
                <title>Training needs to be widely available to the community and sustained</title>
                <p>To encourage and enable the adoption of these recommendations by the end-user communities, we should build common training, capacity building, and outreach activities. These activities need to cover all stages of the processes involved, from sampling to data processing and analysis. Training ensures dissemination of the developed tools, resources, and standards to the scientific community, and engagement feeds back into refinements and new initiatives to better serve community needs. Sustained support for training connects infrastructure developers, such as data engineers, service providers, and software developers, with infrastructure users producing and analysing biodiversity-related data. The rewards from prioritising training are evident from the experiences of the ELIXIR Training Platform, through which researchers are empowered with the skills and confidence to use the relevant tools and services and contribute to their continued development.</p>
            </sec>
            <sec id="sec21">
                <title>The biodiversity community needs to proactively seek common solutions that enable molecular technologies to advance biodiversity research</title>
                <p>This survey represents a step in the direction of identifying common challenges and opportunities with respect to how molecular technologies can help inform understanding of biodiversity. The use cases described above show how different research communities are developing initiatives to connect molecular sequence data collections with biodiversity research infrastructures. They represent just a small fraction of ongoing initiatives spanning a wide range of biodiversity studies, some more and others less aware of each other&#x2019;s activities. Research communities should be proactive in communicating their needs and the solutions to meet them, thereby encouraging cross-community development of tools and resources that multiply benefits and avoid redundancies. The ELIXIR contextualised portfolio of biodiversity informatics resources and services provides a starting point for bringing visibility to existing infrastructures as well as stimulating improved integration. In order to better understand the challenges concerning emerging technologies, scaling up workflows, and ensuring that standards evolve in a coherent manner, we recommend that the community develop a curated, shared, and public understanding of the different types of emerging data, dataflows, repositories, and portals that are necessary to steward up-to-date, comprehensive, complete, and interoperable reference datasets on biodiversity. Such a catalogue of use cases would be a natural output of the Communities of Practice described above in our recommendation on improved collaborations. Through such community-driven initiatives, core sets of standards, approaches, and techniques should be defined that provide all researchers with the means to address critical biodiversity questions by taking advantage of well-connected molecular sequence and biodiversity research infrastructures.</p>
            </sec>
        </sec>
        <sec id="sec22">
            <title>Data availability</title>
            <p>No data are associated with this article.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>The authors would like to acknowledge the important contribution made by Dr Corinne Martin, ELIXIR, who provided critical independent expert review of the manuscript during its preparation.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Agosti</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Egloff</surname>
                            <given-names>W</given-names>
                        </name>
</person-group>:
                    <article-title>Taxonomic information exchange and copyright: the Plazi approach.</article-title>
                    <source>

                        <italic toggle="yes">BMC. Res. Notes.</italic>
</source>
                    <year>2009</year>;<volume>2</volume>:<fpage>53</fpage>.
                    <pub-id pub-id-type="doi">10.1186/1756-0500-2-53</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Aarestrup</surname>
                            <given-names>FM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrating Genome-based Informatics to Modernize Global Disease Monitoring, Information Sharing, and Response.</article-title>
                    <source>

                        <italic toggle="yes">Emerg. Infect. Dis.</italic>
</source>
                    <year>2012</year>;<volume>18</volume>:<fpage>e1</fpage>&#x2013;<lpage>e1</lpage>.
                    <pub-id pub-id-type="pmid">23092707</pub-id>
                    <pub-id pub-id-type="doi">10.3201/eid1811.120453</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3559169</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <mixed-citation publication-type="other">
                    <collab>ACE</collab>:
                    <article-title>ACE Expedition &#x2013; A better understanding of Antarctica.</article-title>
                    <year>2021</year>.
(Accessed February 24, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://spi-ace-expedition.ch/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Adamowicz</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hollingsworth</surname>
                            <given-names>PM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ratnasingham</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
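                    <article-title>International Barcode of Life: Focus on big biodiversity in South Africa.</article-title>
                    <person-group person-group-type="editor">
                        <name name-style="western">
                            <surname>Cristescu</surname>
                            <given-names>ME</given-names>
                        </name>
</person-group>, editor.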
                    <source>

                        <italic toggle="yes">Genome.</italic>
</source>
                    <year>2017</year>;<volume>60</volume>:<fpage>875</fpage>&#x2013;<lpage>879</lpage>.
                    <pub-id pub-id-type="pmid">29130757</pub-id>
                    <pub-id pub-id-type="doi">10.1139/gen-2017-0210</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Agosti</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Biodiversity Literature Repository (BLR), a repository for FAIR data and publications.</article-title>
                    <source>

                        <italic toggle="yes">Biodivers. Inf. Sci. Stand.</italic>
</source>
                    <year>2019</year>;<volume>3</volume>:<fpage>e37197</fpage>.
                    <pub-id pub-id-type="doi">10.3897/biss.3.37197</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Agosti</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Egloff</surname>
                            <given-names>W</given-names>
                        </name>
</person-group>:
                    <article-title>Taxonomic information exchange and copyright: the Plazi approach.</article-title>
                    <source>

                        <italic toggle="yes">BMC. Res. Notes.</italic>
</source>
                    <year>2009</year>;<volume>2</volume>:<fpage>53</fpage>.
                    <pub-id pub-id-type="pmid">19331688</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1756-0500-2-53</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2673227</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Andersson</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Publishing sequence-derived data through biodiversity data platforms.</article-title>
                    <year>2020</year>.
                    <pub-id pub-id-type="doi">10.35035/DOC-VF1A-NR22</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Arita</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Karsch-Mizrachi</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cochrane</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>The international nucleotide sequence database collaboration.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2021</year>;<volume>49</volume>:<fpage>D121</fpage>&#x2013;<lpage>D124</lpage>.
                    <pub-id pub-id-type="pmid">33166387</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkaa967</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7778961</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Baldrian</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>V&#x011b;trovsk&#x00fd;</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lepinay</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>High-throughput sequencing view on the magnitude of global fungal diversity.</article-title>
                    <source>

                        <italic toggle="yes">Fungal Divers.</italic>
</source>
                    <year>2021</year>.
in press.
                    <pub-id pub-id-type="doi">10.1007/s13225-021-00472-y</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Baral</surname>
                            <given-names>H-O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Queloz</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hosoya</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>
                        <italic toggle="yes">Hymenoscyphus fraxineus</italic>, the correct scientific name for the fungus causing ash dieback in Europe.</article-title>
                    <source>

                        <italic toggle="yes">IMA Fungus.</italic>
</source>
                    <year>2014</year>;<volume>5</volume>:<fpage>79</fpage>&#x2013;<lpage>80</lpage>.
                    <pub-id pub-id-type="pmid">25083409</pub-id>
                    <pub-id pub-id-type="doi">10.5598/imafungus.2014.05.01.09</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4107900</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>B&#x00e9;langer</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pilling</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <collab>FAO, Commission on Genetic Resources for Food and Agriculture</collab>:
                    <source>

                        <italic toggle="yes">The state of the world&#x2019;s biodiversity for food and agriculture.</italic>
</source>
                    <year>2019</year>.
(Accessed February 22, 2021).
                    <ext-link ext-link-type="uri" xlink:href="http://www.fao.org/3/CA3129EN/CA3129EN.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>B&#x00e9;nichou</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gerard</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chester</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The European Journal of Taxonomy: Enhancing taxonomic publications for dynamic data exchange and navigation.</article-title>
                    <source>

                        <italic toggle="yes">Biodivers. Inf. Sci. Stand.</italic>
</source>
                    <year>2019</year>;<volume>3</volume>:<fpage>e37199</fpage>.
                    <pub-id pub-id-type="doi">10.3897/biss.3.37199</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Berney</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>UniEuk: Time to Speak a Common Language in Protistology!</article-title>
                    <source>

                        <italic toggle="yes">J. Eukaryot. Microbiol.</italic>
</source>
                    <year>2017</year>;<volume>64</volume>:<fpage>407</fpage>&#x2013;<lpage>411</lpage>.
                    <pub-id pub-id-type="pmid">28337822</pub-id>
                    <pub-id pub-id-type="doi">10.1111/jeu.12414</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5435949</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref145">
                <mixed-citation publication-type="journal">
                    <collab>BiCIKL</collab>.
                    <ext-link ext-link-type="uri" xlink:href="https://bicikl-project.eu">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bourlat</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genomics in marine monitoring: New opportunities for assessing marine health status.</article-title>
                    <source>

                        <italic toggle="yes">Mar. Pollut. Bull.</italic>
</source>
                    <year>2013</year>;<volume>74</volume>:<fpage>19</fpage>&#x2013;<lpage>31</lpage>.
                    <pub-id pub-id-type="pmid">23806673</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.marpolbul.2013.05.042</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brozynska</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Furtado</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Henry</surname>
                            <given-names>RJ</given-names>
                        </name>
</person-group>:
                    <article-title>Genomics of crop wild relatives: expanding the gene pool for crop improvement.</article-title>
                    <source>

                        <italic toggle="yes">Plant Biotechnol. J.</italic>
</source>
                    <year>2016</year>;<volume>14</volume>:<fpage>1070</fpage>&#x2013;<lpage>1085</lpage>.
                    <pub-id pub-id-type="pmid">26311018</pub-id>
                    <pub-id pub-id-type="doi">10.1111/pbi.12454</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Canonico</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global Observational Needs and Resources for Marine Biodiversity.</article-title>
                    <source>

                        <italic toggle="yes">Front. Mar. Sci.</italic>
</source>
                    <year>2019</year>;<volume>6</volume>:<fpage>367</fpage>.
                    <pub-id pub-id-type="doi">10.3389/fmars.2019.00367</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carroll</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Global Virome Project.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2018</year>;<volume>359</volume>:<fpage>872</fpage>&#x2013;<lpage>874</lpage>.
                    <pub-id pub-id-type="pmid">29472471</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.aap7463</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <mixed-citation publication-type="other">
                    <collab>CETAF</collab>:
                    <article-title>CETAF &#x2013; Consortium of European Taxonomic Facilities: Exploring and documenting diversity in nature.</article-title>
                    <year>2021</year>.
(Accessed November 1, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://cetaf.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>10KP: A phylodiverse genome sequencing plan.</article-title>
                    <source>

                        <italic toggle="yes">GigaScience.</italic>
</source>
                    <year>2018</year>;<volume>7</volume>:<fpage>1</fpage>&#x2013;<lpage>9</lpage>.
                    <pub-id pub-id-type="pmid">29618049</pub-id>
                    <pub-id pub-id-type="doi">10.1093/gigascience/giy013</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5869286</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Collins</surname>
                            <given-names>JE</given-names>
                        </name>

                        <etal/>
</person-group>:
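                    <article-title>Strengthening the global network for sharing of marine biological collections: recommendations for a new agreement for biodiversity beyond national jurisdiction.</article-title>
                    <person-group person-group-type="editor">
                        <name name-style="western">
                            <surname>Blasiak</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>, editor.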
                    <source>

                        <italic toggle="yes">ICES J. Mar. Sci.</italic>
</source>
                    <year>2020</year>;<volume>78</volume>:<fpage>305</fpage>&#x2013;<lpage>314</lpage>.
                    <pub-id pub-id-type="pmid">33814897</pub-id>
                    <pub-id pub-id-type="doi">10.1093/icesjms/fsaa227</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7988798</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <mixed-citation publication-type="journal">
                    <collab>DAD-IS</collab>:
                    <article-title>Domestic Animal Diversity Information System.</article-title>
                    <source>

                        <italic toggle="yes">Domest. Anim. Divers. Inf. Syst. DAD-IS.</italic>
</source>
                    <year>2021</year>.
(Accessed October 26, 2020).
                    <ext-link ext-link-type="uri" xlink:href="http://www.fao.org/dad-is">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dale</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Transgenic Cavendish bananas with resistance to Fusarium wilt tropical race 4.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2017</year>;<volume>8</volume>:<fpage>1496</fpage>.
                    <pub-id pub-id-type="pmid">29133817</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-017-01670-6</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5684404</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Danilevicz</surname>
                            <given-names>MF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tay Fernandez</surname>
                            <given-names>CG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marsh</surname>
                            <given-names>JI</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Plant pangenomics: approaches, applications and advancements.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Plant Biol.</italic>
</source>
                    <year>2020</year>;<volume>54</volume>:<fpage>18</fpage>&#x2013;<lpage>25</lpage>.
                    <pub-id pub-id-type="pmid">31982844</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.pbi.2019.12.005</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Davies</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The founding charter of the Genomic Observatories Network.</article-title>
                    <source>

                        <italic toggle="yes">GigaScience.</italic>
</source>
                    <year>2014</year>;<volume>3</volume>:<fpage>2</fpage>.
                    <pub-id pub-id-type="pmid">24606731</pub-id>
                    <pub-id pub-id-type="doi">10.1186/2047-217X-3-2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3995929</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Della Coletta</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Qiu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ou</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>How the pan-genome is changing crop genomics and improvement.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2021</year>;<volume>22</volume>:<fpage>3</fpage>.
                    <pub-id pub-id-type="pmid">33397434</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-020-02224-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7780660</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>deWaard</surname>
                            <given-names>JR</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A reference library for Canadian invertebrates with 1.5 million barcodes, voucher specimens, and DNA samples.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Data.</italic>
</source>
                    <year>2019</year>;<volume>6</volume>:<fpage>308</fpage>.
                    <pub-id pub-id-type="pmid">31811161</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41597-019-0320-2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6897906</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <mixed-citation publication-type="book">
                    <collab>DiSSCo</collab>:
                    <source>

                        <italic toggle="yes">The Distributed System of Scientific Collections.</italic>
</source>
                    <publisher-name>DiSSCo</publisher-name>;<year>2021</year>.
(Accessed October 26, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://www.dissco.eu/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <mixed-citation publication-type="book">
                    <collab>DivSeek</collab>:
                    <source>

                        <italic toggle="yes">DivSeek International Network - A Global Community Driven Not-for-Profit Organization.</italic>
</source>
                    <publisher-name>DivSeek Intl.</publisher-name>;<year>2021</year>.
(Accessed October 26, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://divseekintl.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref127">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Droege</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Global Genome Biodiversity Network (GGBN) Data Standard specification.</article-title>
                    <source>

                        <italic toggle="yes">Database (Oxford).</italic>
</source>
                    <year>2016</year>;<volume>2016</volume>:<fpage>baw125</fpage>.
                    <pub-id pub-id-type="pmid">27694206</pub-id>
                    <pub-id pub-id-type="doi">10.1093/database/baw125</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5045859</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <mixed-citation publication-type="other">
                    <collab>DToL</collab>:
                    <article-title>Darwin Tree Of Life.</article-title>
                    <year>2021</year>.
(Accessed June 19, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://www.darwintreeoflife.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Duarte</surname>
                            <given-names>CM</given-names>
                        </name>
</person-group>:
                    <article-title>Seafaring in the 21st Century: The Malaspina 2010 Circumnavigation Expedition.</article-title>
                    <source>

                        <italic toggle="yes">Limnol. Oceanogr. Bull.</italic>
</source>
                    <year>2015</year>;<volume>24</volume>:<fpage>11</fpage>&#x2013;<lpage>14</lpage>.
                    <pub-id pub-id-type="doi">10.1002/lob.10008</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <mixed-citation publication-type="book">
                    <collab>EBP</collab>:
                    <source>

                        <italic toggle="yes">Earth BioGenome Project.</italic>
</source>
                    <publisher-name>Earth Biog. Proj.</publisher-name>;<year>2021</year>.
(Accessed April 7, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://www.earthbiogenome.org">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Egidi</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A few Ascomycota taxa dominate soil fungal communities worldwide.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2019</year>;<volume>10</volume>:<fpage>2369</fpage>.
                    <pub-id pub-id-type="pmid">31147554</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-019-10373-z</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6542806</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <mixed-citation publication-type="other">
                    <collab>ELIXIR</collab>:
                    <article-title>ELIXIR | A distributed infrastructure for life-science information.</article-title>
                    <year>2021a</year>.
(Accessed December 14, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://elixir-europe.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <mixed-citation publication-type="book">
                    <collab>ELIXIR</collab>:
                    <source>

                        <italic toggle="yes">ELIXIR Tools Platform.</italic>
</source>
                    <publisher-name>ELIXIR</publisher-name>;<year>2021b</year>.
(Accessed May 10, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://elixir-europe.org/platforms/tools">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <mixed-citation publication-type="book">
                    <collab>ENVRI</collab>:
                    <source>

                        <italic toggle="yes">ENVRI: Environmental Research Infrastructures Community.</italic>
</source>
                    <publisher-name>ENVRI Community</publisher-name>;<year>2021</year>.
(Accessed May 10, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://envri.eu/">https://envri.eu/</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <mixed-citation publication-type="journal">
                    <collab>ERGA</collab>:
                    <article-title>The European Reference Genome Atlas (ERGA) initiative.</article-title>
                    <source>

                        <italic toggle="yes">Genome Atlas Eur. Biodivers.</italic>
</source>
                    <year>2021</year>.
(Accessed May 10, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://www.erga-biodiversity.eu">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <mixed-citation publication-type="other">
                    <collab>EVA</collab>:
                    <article-title>The European Variation Archive.</article-title>
                    <year>2021</year>.
(Accessed December 14, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://www.ebi.ac.uk/eva/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <mixed-citation publication-type="journal">
                    <collab>FAANG</collab>:
                    <article-title>A Global Network - Functional Annotation of Animal Genomes (FAANG).</article-title>
                    <source>

                        <italic toggle="yes">Funct. Annot. Anim. Genomes FAANG Proj. - Coord. Int. Action Accel. Genome Phenome.</italic>
</source>
                    <year>2021</year>.
(Accessed October 26, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://www.animalgenome.org/community/FAANG/index">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <mixed-citation publication-type="book">
                    <collab>FAO</collab>:
                    <source>

                        <italic toggle="yes">The second report on the state of the world&#x2019;s plant genetic resources for food and agriculture.</italic>
</source>
                    <publisher-loc>Rome</publisher-loc>:
                    <publisher-name>Commission on Genetic Resources for Food and Agriculture, Food and Agriculture Organization of the United Nations</publisher-name>;<year>2010</year>.</mixed-citation>
            </ref>
            <ref id="ref40">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Federhen</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>The NCBI Taxonomy database.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2012</year>;<volume>40</volume>:<fpage>D136</fpage>&#x2013;<lpage>D143</lpage>.
                    <pub-id pub-id-type="pmid">22139910</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkr1178</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3245000</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref41">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Field</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genomic Standards Consortium Projects.</article-title>
                    <source>

                        <italic toggle="yes">Stand. Genomic Sci.</italic>
</source>
                    <year>2014</year>;<volume>9</volume>:<fpage>599</fpage>&#x2013;<lpage>601</lpage>.
                    <pub-id pub-id-type="pmid">25197446</pub-id>
                    <pub-id pub-id-type="doi">10.4056/sigs.5559680</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4148985</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <mixed-citation publication-type="other">
                    <collab>GBIF</collab>:
                    <article-title>Amplicon sequencing of Tara Oceans DNA samples corresponding to size fractions for protists.</article-title>
                    <year>2018</year>.
                    <pub-id pub-id-type="doi">10.15468/2HV1BE</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <mixed-citation publication-type="book">
                    <collab>GBIF</collab>:
                    <source>

                        <italic toggle="yes">GBIF: The Global Biodiversity Information Facility (2020) What is GBIF?</italic>
</source>
                    <publisher-name>GBIF</publisher-name>;<year>2021</year>.
(Accessed June 17, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://www.gbif.org/what-is-gbif">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <mixed-citation publication-type="other">
                    <collab>Genesys</collab>:
                    <article-title>Genesys.</article-title>
                    <year>2021</year>.
(Accessed October 26, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://www.genesys-pgr.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <mixed-citation publication-type="other">
                    <collab>GenResBridge</collab>:
                    <article-title>GenRes Bridge.</article-title>
                    <year>2021</year>.
(Accessed October 26, 2020).
                    <ext-link ext-link-type="uri" xlink:href="http://www.genresbridge.eu/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gilbert</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jansson</surname>
                            <given-names>JK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Knight</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>The Earth Microbiome project: successes and aspirations.</article-title>
                    <source>

                        <italic toggle="yes">BMC Biol.</italic>
</source>
                    <year>2014</year>;<volume>12</volume>:<fpage>69</fpage>.
                    <pub-id pub-id-type="pmid">25184604</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12915-014-0069-1</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4141107</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref47">
                <mixed-citation publication-type="other">
                    <collab>GlobalFungi</collab>:
                    <article-title>GlobalFungi.</article-title>
                    <year>2021</year>.
(Accessed December 23, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://globalfungi.com/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref48">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gl&#x00f6;ckner</surname>
                            <given-names>FO</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>25 years of serving the community with ribosomal RNA gene reference databases and tools.</article-title>
                    <source>

                        <italic toggle="yes">J. Biotechnol.</italic>
</source>
                    <year>2017</year>;<volume>261</volume>:<fpage>169</fpage>&#x2013;<lpage>176</lpage>.
                    <pub-id pub-id-type="pmid">28648396</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbiotec.2017.06.1198</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref49">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gorsky</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Expanding Tara Oceans Protocols for Underway, Ecosystemic Sampling of the Ocean-Atmosphere Interface During Tara Pacific Expedition (2016&#x2013;2018).</article-title>
                    <source>

                        <italic toggle="yes">Front. Mar. Sci.</italic>
</source>
                    <year>2019</year>;<volume>6</volume>:<fpage>750</fpage>.
                    <pub-id pub-id-type="doi">10.3389/fmars.2019.00750</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref50">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Harrow</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ELIXIR-EXCELERATE: establishing Europe&#x2019;s data infrastructure for the life science research of the future.</article-title>
                    <source>

                        <italic toggle="yes">EMBO J.</italic>
</source>
                    <year>2021</year>;<volume>40</volume>:<fpage>e107409</fpage>.
                    <pub-id pub-id-type="pmid">33565128</pub-id>
                    <pub-id pub-id-type="doi">10.15252/embj.2020107409</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7957415</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref51">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hoban</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genetic diversity targets and indicators in the CBD post-2020 Global Biodiversity Framework must be improved.</article-title>
                    <source>

                        <italic toggle="yes">Biol. Conserv.</italic>
</source>
                    <year>2020</year>;<volume>248</volume>:<fpage>108654</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.biocon.2020.108654</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref52">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hobern</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>BIOSCAN: DNA barcoding to accelerate taxonomy and biogeography for conservation and sustainability.</article-title>
                    <source>

                        <italic toggle="yes">Genome.</italic>
</source>
                    <year>2021</year>;<volume>64</volume>:<fpage>161</fpage>&#x2013;<lpage>164</lpage>.
                    <pub-id pub-id-type="pmid">32268069</pub-id>
                    <pub-id pub-id-type="doi">10.1139/gen-2020-0009</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref128">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Holetschek</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The ABCD of primary biodiversity data access.</article-title>
                    <source>

                        <italic toggle="yes">Plant Biosystems - An International Journal Dealing with all Aspects of Plant Biology</italic>
</source>
                    <year>2012</year>;<volume>146</volume>(<issue>4</issue>):<fpage>771</fpage>&#x2013;<lpage>779</lpage>.
                    <pub-id pub-id-type="doi">10.1080/11263504.2012.740085</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref53">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hoopen</surname>
                            <given-names>P</given-names>
                            <prefix>ten</prefix>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The metagenomic data life-cycle: standards and best practices.</article-title>
                    <source>

                        <italic toggle="yes">GigaScience.</italic>
</source>
                    <year>2017</year>;<volume>6</volume>:<fpage>1</fpage>&#x2013;<lpage>11</lpage>.
                    <pub-id pub-id-type="pmid">28637310</pub-id>
                    <pub-id pub-id-type="doi">10.1093/gigascience/gix047</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5737865</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref54">
                <mixed-citation publication-type="journal">
                    <collab>i5K Consortium</collab>:
                    <article-title>The i5K initiative: advancing arthropod genomics for knowledge, human health, agriculture, and the environment.</article-title>
                    <source>

                        <italic toggle="yes">J. Hered.</italic>
</source>
                    <year>2013</year>;<volume>104</volume>:<fpage>595</fpage>&#x2013;<lpage>600</lpage>.
                    <pub-id pub-id-type="pmid">23940263</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jhered/est050</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4046820</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref55">
                <mixed-citation publication-type="journal">
                    <collab>iHMP Research Network Consortium</collab>:
                    <article-title>The Integrative Human Microbiome Project.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2019</year>;<volume>569</volume>:<fpage>641</fpage>&#x2013;<lpage>648</lpage>.
                    <pub-id pub-id-type="pmid">31142853</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41586-019-1238-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6784865</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref56">
                <mixed-citation publication-type="book">
                    <collab>IPBES</collab>:
                    <source>

                        <italic toggle="yes">The Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES).</italic>
</source>
                    <publisher-name>IPBES</publisher-name>;<year>2021</year>.
(Accessed May 6, 2021).
                    <ext-link ext-link-type="uri" xlink:href="http://www.ipbes.net/node/36759">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref57">
                <mixed-citation publication-type="other">
                    <collab>IMAGE</collab>:
                    <article-title>Innovative Management of Animal Genetic Resources.</article-title>
                    <year>2021</year>.
(Accessed July 9, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://www.image2020genebank.eu/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref58">
                <mixed-citation publication-type="other">
                    <collab>IPT</collab>:
                    <article-title>The Integrated Publishing Toolkit.</article-title>
                    <year>2021</year>.
(Accessed March 22, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://www.gbif.org/ipt">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref59">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jetz</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Essential biodiversity variables for mapping and monitoring species populations.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Ecol. Evol.</italic>
</source>
                    <year>2019</year>;<volume>3</volume>:<fpage>539</fpage>&#x2013;<lpage>551</lpage>.
                    <pub-id pub-id-type="pmid">30858594</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41559-019-0826-1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref60">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Karger</surname>
                            <given-names>DN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Climatologies at high resolution for the earth&#x2019;s land surface areas.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Data.</italic>
</source>
                    <year>2017</year>;<volume>4</volume>:<fpage>170122</fpage>.
                    <pub-id pub-id-type="pmid">28872642</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sdata.2017.122</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5584396</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref61">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Katz</surname>
                            <given-names>DS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gruenpeter</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Honeyman</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>Taking a fresh look at FAIR for research software.</article-title>
                    <source>

                        <italic toggle="yes">Patterns.</italic>
</source>
                    <year>2021</year>;<volume>2</volume>:<fpage>100222</fpage>.
                    <pub-id pub-id-type="pmid">33748799</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.patter.2021.100222</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7961177</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref62">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Khan</surname>
                            <given-names>AW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Super-Pangenome by Integrating the Wild Side of a Species for Accelerated Crop Improvement.</article-title>
                    <source>

                        <italic toggle="yes">Trends Plant Sci.</italic>
</source>
                    <year>2020</year>;<volume>25</volume>:<fpage>148</fpage>&#x2013;<lpage>158</lpage>.
                    <pub-id pub-id-type="pmid">31787539</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.tplants.2019.10.012</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6988109</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref63">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kindler</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Hybridization patterns in two contact zones of grass snakes reveal a new Central European snake species.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Rep.</italic>
</source>
                    <year>2017</year>;<volume>7</volume>:<fpage>7378</fpage>.
                    <pub-id pub-id-type="pmid">28785033</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41598-017-07847-9</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5547120</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref64">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kissling</surname>
                            <given-names>WD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Towards global data products of Essential Biodiversity Variables on species traits.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Ecol. Evol.</italic>
</source>
                    <year>2018</year>;<volume>2</volume>:<fpage>1531</fpage>&#x2013;<lpage>1540</lpage>.
                    <pub-id pub-id-type="pmid">30224814</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41559-018-0667-3</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref65">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Klemetsen</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The MAR databases: development and implementation of databases specific for marine metagenomics.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>:<fpage>D692</fpage>&#x2013;<lpage>D699</lpage>.
                    <pub-id pub-id-type="pmid">29106641</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkx1036</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5753341</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref66">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kopf</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The ocean sampling day consortium.</article-title>
                    <source>

                        <italic toggle="yes">GigaScience.</italic>
</source>
                    <year>2015</year>;<volume>4</volume>:<fpage>27</fpage>.
                    <pub-id pub-id-type="pmid">26097697</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13742-015-0066-5</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4473829</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref67">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lewin</surname>
                            <given-names>HA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Earth BioGenome Project: Sequencing life for the future of life.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci.</italic>
</source>
                    <year>2018</year>;<volume>115</volume>:<fpage>4325</fpage>&#x2013;<lpage>4333</lpage>.
                    <pub-id pub-id-type="pmid">29686065</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.1720115115</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5924910</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref68">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Linnaeus</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Apis mellifera Linnaeus, 1758, spec. nov.</article-title>
                    <year>1758</year>.
                    <pub-id pub-id-type="doi">10.5281/ZENODO.3922706</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref69">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Linn&#x00e9;</surname>
                            <given-names>C</given-names>
                            <prefix>von</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Salvius</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Caroli Linnaei &#x2026; Species plantarum: exhibentes plantas rite cognitas, ad genera relatas, cum differentiis specificis, nominibus trivialibus, synonymis selectis, locis natalibus, secundum systema sexuale digestas&#x2026;. Holmiae: Impensis Laurentii Salvii.</italic>
</source>
                    <year>1753</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.biodiversitylibrary.org/item/13829">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref70">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Linn&#x00e9;</surname>
                            <given-names>C</given-names>
                            <prefix>von</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Salvius</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Caroli Linnaei &#x2026; Systema naturae per regna tria naturae: secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis. Holmiae: Impensis Direct. Laurentii Salvii.</italic>
</source>
                    <year>1758</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.biodiversitylibrary.org/item/10277">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref71">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mascher</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>2019</year>;<volume>51</volume>:<fpage>1076</fpage>&#x2013;<lpage>1081</lpage>.
                    <pub-id pub-id-type="pmid">31253974</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41588-019-0443-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref72">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Menting</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Centre for Genetic Resources, the Netherlands.</article-title>
                    <source>

                        <italic toggle="yes">PGR passport data.</italic>
</source>
                    <year>2020</year>.
                    <pub-id pub-id-type="doi">10.15468/MUGSLO</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref73">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Meyer</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MG-RAST version 4&#x2014;lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis.</article-title>
                    <source>

                        <italic toggle="yes">Brief. Bioinform.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>:<fpage>1151</fpage>&#x2013;<lpage>1159</lpage>.
                    <pub-id pub-id-type="pmid">29028869</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bib/bbx105</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6781595</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref74">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Miller</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrating and visualizing primary data from prospective and legacy taxonomic literature.</article-title>
                    <source>

                        <italic toggle="yes">Biodivers. Data J.</italic>
</source>
                    <year>2015</year>;<volume>3</volume>:<fpage>e5063</fpage>.
                    <pub-id pub-id-type="doi">10.3897/BDJ.3.e5063</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref75">
                <mixed-citation publication-type="other">
                    <collab>MIRRI</collab>:
                    <article-title>The pan-European Microbial Resource Research Infrastructure.</article-title>
                    <year>2021</year>.
(Accessed October 26, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://www.mirri.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref76">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mitchell</surname>
                            <given-names>AL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MGnify: the microbiome analysis resource in 2020.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2020</year>;<fpage>gkz1035</fpage>.
                    <pub-id pub-id-type="doi">10.1093/nar/gkz1035</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref77">
                <mixed-citation publication-type="book">
                    <collab>MMC</collab>:
                    <source>

                        <italic toggle="yes">Marine Metagenomics Community.</italic>
</source>
                    <publisher-name>ELIXIR</publisher-name>;<year>2021</year>.
(Accessed February 24, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://elixir-europe.org/communities/marine-metagenomics">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref78">
                <mixed-citation publication-type="other">
                    <collab>MMP</collab>:
                    <article-title>Marine Metagenomics Portal.</article-title>
                    <year>2021</year>.
(Accessed March 24, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://mmp.sfb.uit.no/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref79">
                <mixed-citation publication-type="book">
                    <collab>MOSAiC</collab>:
                    <source>

                        <italic toggle="yes">MOSAiC Expedition.</italic>
</source>
                    <publisher-name>MOSAiC Exped</publisher-name>;<year>2021</year>.
(Accessed February 24, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://mosaic-expedition.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref80">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mukherjee</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Biotechnol.</italic>
</source>
                    <year>2017</year>;<volume>35</volume>:<fpage>676</fpage>&#x2013;<lpage>683</lpage>.
                    <pub-id pub-id-type="pmid">28604660</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3886</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref81">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Niang</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>METdb: A genomic reference database for marine species.</article-title>
                    <year>2020</year>.
                    <pub-id pub-id-type="doi">10.7490/F1000RESEARCH.1118000.1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref82">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nilsson</surname>
                            <given-names>RH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>47</volume>:<fpage>D259</fpage>&#x2013;<lpage>D264</lpage>.
                    <pub-id pub-id-type="pmid">30371820</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky1022</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6324048</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref83">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nystedt</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Norway spruce genome sequence and conifer genome evolution.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2013</year>;<volume>497</volume>:<fpage>579</fpage>&#x2013;<lpage>584</lpage>.
                    <pub-id pub-id-type="pmid">23698360</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature12211</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref84">
                <mixed-citation publication-type="other">
                    <collab>OBIS</collab>:
                    <article-title>Ocean Biodiversity Information System.</article-title>
                    <year>2021</year>.
(Accessed May 10, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://obis.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref85">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Parr</surname>
                            <given-names>CS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Encyclopedia of Life v2: Providing Global Access to Knowledge About Life on Earth.</article-title>
                    <source>

                        <italic toggle="yes">Biodivers. Data J.</italic>
</source>
                    <year>2014</year>;<volume>2</volume>:<fpage>e1079</fpage>.
                    <pub-id pub-id-type="doi">10.3897/BDJ.2.e1079</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref86">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Penev</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Implementation of TaxPub, an NLM DTD extension for domain-specific markup in taxonomy, from the experience of a biodiversity publisher.</article-title>
                    <year>2012</year>.
                    <pub-id pub-id-type="doi">10.5281/ZENODO.804247</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref87">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pilling</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>B&#x00e9;langer</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Diulgheroff</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global status of genetic resources for food and agriculture: challenges and research needs.</article-title>
                    <source>

                        <italic toggle="yes">Genet. Resour.</italic>
</source>
                    <year>2020a</year>;<volume>1</volume>:<fpage>4</fpage>&#x2013;<lpage>16</lpage>.
                    <pub-id pub-id-type="doi">10.46265/genresj.2020.1.4-16</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref88">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pilling</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>B&#x00e9;langer</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hoffmann</surname>
                            <given-names>I</given-names>
                        </name>
</person-group>:
                    <article-title>Declining biodiversity for food and agriculture needs urgent global action.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Food.</italic>
</source>
                    <year>2020b</year>;<volume>1</volume>:<fpage>144</fpage>&#x2013;<lpage>147</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s43016-020-0040-y</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref89">
                <mixed-citation publication-type="other">
                    <collab>Plazi</collab>:
                    <article-title>Plazi: an association supporting and promoting the development of persistent and openly accessible digital taxonomic literature.</article-title>
                    <year>2021</year>.
(Accessed February 24, 2021).
                    <ext-link ext-link-type="uri" xlink:href="http://plazi.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref90">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Quast</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2012</year>;<volume>41</volume>:<fpage>D590</fpage>&#x2013;<lpage>D596</lpage>.
                    <pub-id pub-id-type="pmid">23193283</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gks1219</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3531112</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref91">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ratnasingham</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hebert</surname>
                            <given-names>PDN</given-names>
                        </name>
</person-group>:
                    <article-title>BOLD: The Barcode of Life Data System.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Ecol. Notes.</italic>
</source>
                    <year>2007</year>;<volume>7</volume>:<fpage>355</fpage>&#x2013;<lpage>364</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.barcodinglife.org">Reference Source</ext-link>
                    <pub-id pub-id-type="pmid">18784790</pub-id>
                    <pub-id pub-id-type="doi">10.1111/j.1471-8286.2007.01678.x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC1890991</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref92">
                <mixed-citation publication-type="other">
                    <collab>RDMkit</collab>:
                    <article-title>RDMkit: The ELIXIR Research Data Management Kit.</article-title>
                    <year>2021</year>.
(Accessed April 5, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://rdmkit.elixir-europe.org/index.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref126">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rhie</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Towards complete and error-free genome assemblies of all vertebrate species.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2021</year>;<volume>592</volume>(<issue>7856</issue>):<fpage>737</fpage>&#x2013;<lpage>746</lpage>.
                    <pub-id pub-id-type="pmid">33911273</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41586-021-03451-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8081667</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref93">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Roskov</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Catalogue of Life - 2019 Annual Checklist.</article-title>
                    <source>

                        <italic toggle="yes">Cat. Life 2019 Annu. Checkl.</italic>
</source>
                    <year>2020</year>.
(Accessed May 13, 2020).
                    <ext-link ext-link-type="uri" xlink:href="http://www.catalogueoflife.org/annual-checklist/2019/info/ac">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref94">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ryberg</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nilsson</surname>
                            <given-names>RH</given-names>
                        </name>
</person-group>:
                    <article-title>New light on names and naming of dark taxa.</article-title>
                    <source>

                        <italic toggle="yes">MycoKeys.</italic>
</source>
                    <year>2018</year>;<volume>30</volume>:<fpage>31</fpage>&#x2013;<lpage>39</lpage>.
                    <pub-id pub-id-type="pmid">29681731</pub-id>
                    <pub-id pub-id-type="doi">10.3897/mycokeys.30.24376</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5904500</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref95">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Santamaria</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ITSoneDB: a comprehensive collection of eukaryotic ribosomal RNA Internal Transcribed Spacer 1 (ITS1) sequences.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>:<fpage>D127</fpage>&#x2013;<lpage>D132</lpage>.
                    <pub-id pub-id-type="pmid">29036529</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkx855</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5753230</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref96">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schigel</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Going Molecular: Sequence-based spatiotemporal biodiversity evidence in GBIF.</article-title>
                    <source>

                        <italic toggle="yes">Biodivers. Inf. Sci. Stand.</italic>
</source>
                    <year>2019</year>;<volume>3</volume>:<fpage>e37036</fpage>.
                    <pub-id pub-id-type="doi">10.3897/biss.3.37036</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref97">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmeller</surname>
                            <given-names>DS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A suite of essential biodiversity variables for detecting critical biodiversity change.</article-title>
                    <source>

                        <italic toggle="yes">Biol. Rev.</italic>
</source>
                    <year>2018</year>;<volume>93</volume>:<fpage>55</fpage>&#x2013;<lpage>71</lpage>.
                    <pub-id pub-id-type="pmid">28447398</pub-id>
                    <pub-id pub-id-type="doi">10.1111/brv.12332</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref98">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schoch</surname>
                            <given-names>CL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>NCBI Taxonomy: a comprehensive update on curation, resources and tools.</article-title>
                    <source>

                        <italic toggle="yes">Database.</italic>
</source>
                    <year>2020</year>;<volume>2020</volume>:<fpage>baaa062</fpage>.
                    <pub-id pub-id-type="doi">10.1093/database/baaa062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref99">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shaw</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>COPO: a metadata platform for brokering FAIR data in the life sciences.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2020</year>;<volume>9</volume>:<fpage>495</fpage>.
                    <pub-id pub-id-type="doi">10.12688/f1000research.23889.1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref100">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sherry</surname>
                            <given-names>ST</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>dbSNP: the NCBI database of genetic variation.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2001</year>;<volume>29</volume>:<fpage>308</fpage>&#x2013;<lpage>311</lpage>.
                    <pub-id pub-id-type="pmid">11125122</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/29.1.308</pub-id>
                    <pub-id pub-id-type="pmcid">PMC29783</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref101">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Smale</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jamora</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>Valuing genebanks.</article-title>
                    <source>

                        <italic toggle="yes">Food Secur.</italic>
</source>
                    <year>2020</year>;<volume>12</volume>:<fpage>905</fpage>&#x2013;<lpage>918</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s12571-020-01034-x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref102">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Stork</surname>
                            <given-names>NE</given-names>
                        </name>
</person-group>:
                    <article-title>How Many Species of Insects and Other Terrestrial Arthropods Are There on Earth?</article-title>
                    <source>

                        <italic toggle="yes">Annu. Rev. Entomol.</italic>
</source>
                    <year>2018</year>;<volume>63</volume>:<fpage>31</fpage>&#x2013;<lpage>45</lpage>.
                    <pub-id pub-id-type="doi">10.1146/annurev-ento-020117-043348</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref103">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sunagawa</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Structure and function of the global ocean microbiome.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2015</year>;<volume>348</volume>:<fpage>1261359</fpage>.
                    <pub-id pub-id-type="pmid">25999513</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.1261359</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref104">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sunagawa</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Tara Oceans: towards global ocean ecosystems biology.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Rev. Microbiol.</italic>
</source>
                    <year>2020</year>;<volume>18</volume>:<fpage>428</fpage>&#x2013;<lpage>445</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41579-020-0364-5</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref105">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tao</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mace</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Exploring and Exploiting Pan-genomics for Crop Improvement.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Plant.</italic>
</source>
                    <year>2019</year>;<volume>12</volume>:<fpage>156</fpage>&#x2013;<lpage>169</lpage>.
                    <pub-id pub-id-type="pmid">30594655</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molp.2018.12.016</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref106">
                <mixed-citation publication-type="other">
                    <collab>TARA</collab>:
                    <article-title>The Tara Ocean Foundation. Fond. Tara Oc&#x00e9;an.</article-title>
                    <year>2021</year>.
(Accessed December 7, 2020).
                    <ext-link ext-link-type="uri" xlink:href="https://oceans.taraexpeditions.org/en/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref107">
                <mixed-citation publication-type="journal">
                    <collab>Tara Oceans Coordinators</collab>:
                    <etal/>
                    <article-title>A global ocean atlas of eukaryotic genes.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2018</year>;<volume>9</volume>:<fpage>373</fpage>.
                    <pub-id pub-id-type="pmid">29371626</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-017-02342-1</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5785536</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref108">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tedersoo</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global diversity and geography of soil fungi.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2014</year>;<volume>346</volume>:<fpage>1256688</fpage>.
                    <pub-id pub-id-type="pmid">25430773</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.1256688</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref109">
                <mixed-citation publication-type="journal">
                    <collab>The UniProt Consortium</collab>:
                    <article-title>UniProt: a worldwide hub of protein knowledge.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>47</volume>:<fpage>D506</fpage>&#x2013;<lpage>D515</lpage>.
                    <pub-id pub-id-type="pmid">30395287</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky1049</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6323992</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref110">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vandepitte</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A decade of the World Register of Marine Species &#x2013; General insights and experiences from the Data Management Team: Where are we, what have we learned and how can we continue?</article-title>
                    <source>

                        <italic toggle="yes">PLOS ONE.</italic>
</source>
                    <year>2018</year>;<volume>13</volume>:<fpage>e0194599</fpage>.
                    <pub-id pub-id-type="pmid">29624577</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0194599</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5889062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref111">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vaulot</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Geisen</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mah&#x00e9;</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bass</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>pr2-primers: an 18S rRNA primer database for protists.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2021</year>.
                    <pub-id pub-id-type="doi">10.1101/2021.01.04.425170</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref112">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Veiga Leprevost</surname>
                            <given-names>F</given-names>
                            <prefix>da</prefix>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>BioContainers: an open-source and community-driven framework for software standardization.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2017</year>;<volume>33</volume>:<fpage>2580</fpage>&#x2013;<lpage>2582</lpage>.
                    <pub-id pub-id-type="pmid">28379341</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btx192</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5870671</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref113">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vernette</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Ocean barcode atlas: A web service to explore the biodiversity and biogeography of marine organisms.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Ecol. Resour.</italic>
</source>
                    <year>2021</year>;<volume>21</volume>:<fpage>1347</fpage>&#x2013;<lpage>1358</lpage>.
                    <pub-id pub-id-type="doi">10.1111/1755-0998.13322</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref114">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>V&#x011b;trovsk&#x00fd;</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A meta-analysis of global fungal distribution reveals climate-driven patterns.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2019</year>;<volume>10</volume>:<fpage>5142</fpage>.
                    <pub-id pub-id-type="pmid">31723140</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-019-13164-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6853883</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref115">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>V&#x011b;trovsk&#x00fd;</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>GlobalFungi, a global database of fungal occurrences from high-throughput-sequencing metabarcoding studies.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Data.</italic>
</source>
                    <year>2020</year>;<volume>7</volume>:<fpage>228</fpage>.
                    <pub-id pub-id-type="pmid">32661237</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41597-020-0567-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7359306</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref116">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Villar</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Ocean Gene Atlas: exploring the biogeography of plankton genes online.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>:<fpage>W289</fpage>&#x2013;<lpage>W295</lpage>.
                    <pub-id pub-id-type="pmid">29788376</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky376</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6030836</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref117">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vlk</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Alien ectomycorrhizal plants differ in their ability to interact with co-introduced and native ectomycorrhizal fungi in novel sites.</article-title>
                    <source>

                        <italic toggle="yes">ISME J.</italic>
</source>
                    <year>2020</year>;<volume>14</volume>:<fpage>2336</fpage>&#x2013;<lpage>2346</lpage>.
                    <pub-id pub-id-type="pmid">32499492</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41396-020-0692-5</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7608243</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref118">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The China National GeneBank &#x2013; owned by all, completed by all and shared by all.</article-title>
                    <source>

                        <italic toggle="yes">Yi Chuan Hered.</italic>
</source>
                    <year>2019</year>;<volume>41</volume>:<fpage>761</fpage>&#x2013;<lpage>772</lpage>.
                    <pub-id pub-id-type="doi">10.16288/j.yczz.19-148</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref119">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Weise</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oppermann</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Maggioni</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>EURISCO: The European search catalogue for plant genetic resources.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2017</year>;<volume>45</volume>:<fpage>D1003</fpage>&#x2013;<lpage>D1008</lpage>.
                    <pub-id pub-id-type="pmid">27580718</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkw755</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5210606</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref120">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Whitman</surname>
                            <given-names>WB</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genomic Encyclopedia of Bacterial and Archaeal Type Strains, Phase III: the genomes of soil and plant-associated and newly described type strains.</article-title>
                    <source>

                        <italic toggle="yes">Stand. Genomic Sci.</italic>
</source>
                    <year>2015</year>;<volume>10</volume>:<fpage>26</fpage>.
                    <pub-id pub-id-type="pmid">26203337</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s40793-015-0017-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4511459</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref121">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wieczorek</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Darwin Core: An Evolving Community-Developed Biodiversity Data Standard.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2012</year>;<volume>7</volume>:<fpage>e29715</fpage>.
                    <pub-id pub-id-type="pmid">22238640</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0029715</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3253084</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref122">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wilkinson</surname>
                            <given-names>MD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The FAIR Guiding Principles for scientific data management and stewardship.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Data.</italic>
</source>
                    <year>2016</year>;<volume>3</volume>:<fpage>160018</fpage>.
                    <pub-id pub-id-type="pmid">26978244</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sdata.2016.18</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4792175</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref123">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wilkinson</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Signatures of Diversifying Selection in European Pig Breeds.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Genet.</italic>
</source>
                    <year>2013</year>;<volume>9</volume>:<fpage>e1003453</fpage>.
                    <pub-id pub-id-type="pmid">23637623</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pgen.1003453</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3636142</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref124">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yilmaz</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Biotechnol.</italic>
</source>
                    <year>2011</year>;<volume>29</volume>:<fpage>415</fpage>&#x2013;<lpage>420</lpage>.
                    <pub-id pub-id-type="pmid">21552244</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.1823</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3367316</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref125">
                <mixed-citation publication-type="journal">
                    <collab>Zoonomia Consortium</collab>:
                    <article-title>A comparative genomics multitool for scientific discovery and conservation.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2020</year>;<volume>587</volume>:<fpage>240</fpage>&#x2013;<lpage>245</lpage>.
                    <pub-id pub-id-type="pmid">33177664</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41586-020-2876-6</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7759459</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report119683">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.77505.r119683</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Andersson</surname>
                        <given-names>Anders</given-names>
                    </name>
                    <xref ref-type="aff" rid="r119683a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-3627-6899</uri>
                </contrib>
                <aff id="r119683a1">
                    <label>1</label>Department of Gene Technology, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>31</day>
                <month>3</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Andersson A</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport119683" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73825.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The opinion article by Waterhouse et al. "Recommendations for connecting molecular sequence and biodiversity research infrastructures through ELIXIR" goes through a number of use cases on how biodiversity information from DNA sequencing data can be integrated into biodiversity platforms and also highlights challenges related to this. Towards the end, the authors, as the title promises, provide some recommendations on what type of infrastructure initiatives should be funded, to help the biodiversity field overcome the challenges.</p>
            <p> </p>
            <p>This paper makes many good points: the authors pinpoint several caveats of integrating molecular data with biodiversity platforms and offer concrete advice on projects and initiatives to overcome the obstacles. However, the paper is extremely long (28 pages) and sometimes difficult to read. For example, it took me five attempts to grasp the meaning of this sentence: &#x201c;Our survey of approaches by which molecular technologies help inform understanding of biodiversity aimed to identify opportunities and priorities to aid strategic thinking&#x201d;. The following sentence does not help much: &#x201c;This highlights the emerging critical importance of making use of molecular data to advance understanding of biodiversity in its broadest terms.&#x201d; I would therefore recommend that the authors simplify the text. Likewise, I think the paper would benefit from being shortened; otherwise, there is a risk that many readers will never reach the recommendations at the end (which, given the title, are probably the main point of this opinion paper).</p>
            <p> </p>
            <p>Here are some specific suggestions (page numbers refer to the PDF version of the article):</p>
            <p> </p>
            <p> Abstract: "To identify opportunities, highlight priorities, and aid strategic thinking, here we survey approaches by which molecular technologies help inform understanding of biodiversity." -&gt; (I suggest) "Here we survey approaches by which molecular technologies help inform understanding of biodiversity, in order to identify opportunities, highlight priorities, and aid strategic thinking."</p>
            <p> </p>
            <p> Abstract: &#x201c;Increasing knowledge of marine biodiversity&#x201d; -&gt; &#x201c;increasing knowledge of marine biodiversity&#x201d;</p>
            <p> </p>
            <p> p. 3: &#x201c;at genetic, species, and ecosystem&#x201d; -&gt; &#x201c;at population, community and ecosystem levels&#x201d; (all those levels involve genetics)</p>
            <p> </p>
            <p> p. 3: &#x201c;millions of years of evolution&#x201d; -&gt; &#x201c;billions of years of evolution&#x201d; (life on Earth arose 3-4 billion years ago)</p>
            <p> </p>
            <p> p. 4: "These examples help to formulate more formal definitions: (i) molecular sequence data collection initiatives are producing and collating reference catalogues of genetic and genomic biodiversity on Earth; and (ii) biodiversity research infrastructures are capturing knowledge from scientific collections, observations, and the literature, and building resources of biodiversity information for all Earth&#x2019;s organisms. Here we identify opportunities to connect these."</p>
            <p> </p>
            <p> - I find the definition (i) incomplete: in addition to producing reference catalogues (e.g. genomes or marker genes) they, importantly, also contain sequencing datasets from the field that hold information on species occurrences (and sometimes intra-specific diversity) in samples (i.e. metagenomic and metabarcoding datasets). The description of MGnify on page 7 illustrates this. Maybe this is what is meant by "collating" but that was not clear to me.</p>
            <p> </p>
            <p> Table 1: Add GTDB (https://gtdb.ecogenomic.org/)</p>
            <p> </p>
            <p> p. 6: Add a brief description of GTDB, for example to the first paragraph of page 6. GTDB is rapidly establishing itself as the standard for cataloguing prokaryotic diversity and a good example of how (meta)genomics can aid in improving taxonomies.</p>
            <p> </p>
            <p> p. 8: "In this context, while seeking a new experimental design for molecular characterisation of specific organisms, the absence of unique identifiers (i.e. taxIDs) represents an important issue in collecting the most comprehensive information related to the organisms of interest."</p>
            <p> </p>
            <p> - The meaning of this sentence is unclear to me: what is meant by "new experimental design" here?</p>
            <p> </p>
            <p> p. 18: &#x201c;These efforts can benefit from and should build on the ELIXIR tools ecosystem (ELIXIR 2021b) that aims to help communities find, register and benchmark software tools, while maintaining information standards for these tools, and producing, adopting and promoting best practices for their development.&#x201d;.</p>
            <p> </p>
            <p> - I'm not really in favour of using the term "should" here. There are other examples of software tool collaborations that fulfil the FAIR requirements such as the nf-core collaboration (https://nf-co.re/) with pipelines/tools used by thousands of researchers and many sequencing facilities.</p>
            <p>Is the topic of the opinion article discussed accurately in the context of the current literature?</p>
            <p>Yes</p>
            <p>Are arguments sufficiently supported by evidence from the published literature?</p>
            <p>Yes</p>
            <p>Are all factual statements correct and adequately supported by citations?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn balanced and justified on the basis of the presented arguments?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Microbial ecology and evolution. Bioinformatics.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment8535-119683">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Lanfear</surname>
                            <given-names>Jerry</given-names>
                        </name>
                        <aff>ELIXIR, UK</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>18</day>
                    <month>7</month>
                    <year>2022</year>
                </pub-date>
            </front-stub>
            <body>
                <p>The opinion article by Waterhouse et al. "Recommendations for connecting molecular sequence and biodiversity research infrastructures through ELIXIR" goes through a number of use cases on how biodiversity information from DNA sequencing data can be integrated into biodiversity platforms and also highlights challenges related to this. Towards the end, the authors, as the title promises, provide some recommendations on what type of infrastructure initiatives should be funded, to help the biodiversity field overcome the challenges.</p>
                <p> </p>
                <p> This paper has many good points and the authors have pinpointed several of the caveats of integrating molecular data with biodiversity platforms. And it also provides some concrete advice on projects/initiatives to overcome the obstacles. But the paper is extremely long - 28 pages - and sometimes difficult to read. For example, it took me five attempts to grasp the meaning of this sentence: &#x201c;Our survey of approaches by which molecular technologies help inform understanding of biodiversity aimed to identify opportunities and priorities to aid strategic thinking&#x201d;. And the following sentence doesn't provide much aid: "This highlights the emerging critical importance of making use of molecular data to advance understanding of biodiversity in its broadest terms.&#x201d;. I would thus recommend the authors simplify the text a bit. Likewise, I think the paper would benefit from being shortened, otherwise, there is a risk many readers will never reach the recommendations at the end (which, given the title, is probably the main point of this opinion paper).</p>
                <p> </p>
                <p> 
                    <bold>Response: We thank the reviewer for noting the positive points and for the constructive criticisms with respect to addressing readability issues, also noted by reviewer 1. We have made simplifications and rephrased complex statements to more clearly convey the main messages. Regarding the length, we recognise that describing the four use cases in the manuscript adds substantially to the overall content, but we believe these details are necessary as they provide the basis from which to develop meaningful recommendations. We agree that the recommendations are the main point of the paper, and we believe that many readers will be inclined to focus on this section along with one or two of the use cases most closely aligned with their own research fields. Thus, while attempting to be more concise throughout the manuscript we would prefer not to dramatically shorten or discard any particular section. We have also rephrased the two sentences highlighted here to improve readability.&#x00a0;</bold>
                </p>
                <p> </p>
                <p> </p>
                <p> Here are some specific suggestions (page numbers refer to the pdf version of the article):</p>
                <p> </p>
                <p> Abstract: "To identify opportunities, highlight priorities, and aid strategic thinking, here we survey approaches by which molecular technologies help inform understanding of biodiversity." -&gt; (I suggest) "Here we survey approaches by which molecular technologies help inform understanding of biodiversity, in order to identify opportunities, highlight priorities, and aid strategic thinking."</p>
                <p> </p>
                <p> 
                    <bold>Response: Agree, updated.</bold>
                </p>
                <p> </p>
                <p> Abstract: &#x201c;Increasing knowledge of marine biodiversity&#x201d; -&gt; &#x201c;increasing knowledge of marine biodiversity&#x201d;</p>
                <p> </p>
                <p> 
                    <bold>Response: Fixed.</bold>
                </p>
                <p> </p>
                <p> p. 3: &#x201c;at genetic, species, and ecosystem&#x201d; -&gt; &#x201c;at population, community and ecosystem levels&#x201d; (all those levels involve genetics)</p>
                <p> </p>
                <p> 
                    <bold>Response: Agree that the proposed formulation is better, updated.</bold>
                </p>
                <p> </p>
                <p> p. 3: &#x201c;millions of years of evolution&#x201d; -&gt; &#x201c;billions of years of evolution&#x201d; (life on Earth arose 3-4 billion years ago)</p>
                <p> </p>
                <p> 
                    <bold>Response: Agree, updated.</bold>
                </p>
                <p> </p>
                <p> p. 4: "These examples help to formulate more formal definitions: (i) molecular sequence data collection initiatives are producing and collating reference catalogues of genetic and genomic biodiversity on Earth; and (ii) biodiversity research infrastructures are capturing knowledge from scientific collections, observations, and the literature, and building resources of biodiversity information for all Earth&#x2019;s organisms. Here we identify opportunities to connect these."</p>
                <p> - I find the definition (i) incomplete: in addition to producing reference catalogues (e.g. genomes or marker genes) they, importantly, also contain sequencing datasets from the field that hold information on species occurrences (and sometimes intra-specific diversity) in samples (i.e. metagenomic and metabarcoding datasets). The description of MGnify on page 7 illustrates this. Maybe this is what is meant by "collating" but that was not clear to me.</p>
                <p> </p>
                <p> 
                    <bold>Response: We agree that the definition should be more clearly broadened to encompass other sequencing datasets and have updated the text accordingly.</bold>
                </p>
                <p> </p>
                <p> Table 1: Add GTDB (https://gtdb.ecogenomic.org/)</p>
                <p> </p>
                <p> 
                    <bold>Response: The examples provided in Table 1 focus on international projects and umbrella initiatives producing (meta)genomes, (meta)transcriptomes, and/or DNA barcodes. GTDB seems to better fit the profile of a consumer of such data, so we do not think it is an example of the type of project we wish to highlight here.</bold>
                </p>
                <p> </p>
                <p> p. 6: Add a brief description of GTDB, for example to the first paragraph of page 6. GTDB is rapidly establishing itself as the standard for cataloguing prokaryotic diversity and a good example of how (meta)genomics can aid in improving taxonomies.</p>
                <p> </p>
                <p> 
                    <bold>Response: We agree that this is a good example and we have added GTDB in the discussion of microbe-focused sequencing initiatives.</bold>
                </p>
                <p> </p>
                <p> p. 8: "In this context, while seeking a new experimental design for molecular characterisation of specific organisms, the absence of unique identifiers (i.e. taxIDs) represents an important issue in collecting the most comprehensive information related to the organisms of interest."</p>
                <p> - This sentence has unclear meaning to me, what is meant by "new experimental design" here?</p>
                <p> </p>
                <p> 
                    <bold>Response: Indeed, &#x2018;new experimental design&#x2019; could be misleading; we have rephrased the sentence to improve clarity.</bold>
                </p>
                <p> </p>
                <p> p. 18: &#x201c;These efforts can benefit from and should build on the ELIXIR tools ecosystem (ELIXIR 2021b) that aims to help communities find, register and benchmark software tools, while maintaining information standards for these tools, and producing, adopting and promoting best practices for their development.&#x201d;.</p>
                <p> - I'm not really in favour of using the term "should" here. There are other examples of software tool collaborations that fulfil the FAIR requirements such as the nf-core collaboration (https://nf-co.re/) with pipelines/tools used by thousands of researchers and many sequencing facilities.</p>
                <p> </p>
                <p> 
                    <bold>Response: Indeed, the use of &#x201c;should&#x201d; here was meant to echo the sentiment above relating to &#x201c;building on and connecting to existing tools and services&#x201d;, but as it could be misinterpreted we have reworded the sentence to remove the ambiguity.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report115285">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.77505.r115285</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Hobern</surname>
                        <given-names>Donald</given-names>
                    </name>
                    <xref ref-type="aff" rid="r115285a1">1</xref>
                    <xref ref-type="aff" rid="r115285a2">2</xref>
                    <xref ref-type="aff" rid="r115285a3">3</xref>
                    <xref ref-type="aff" rid="r115285a4">4</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6492-4016</uri>
                </contrib>
                <aff id="r115285a1">
                    <label>1</label>International Barcode of Life, Guelph, Canada</aff>
                <aff id="r115285a2">
                    <label>2</label>Australian Plant Phenomics Facility, Adelaide, Australia</aff>
                <aff id="r115285a3">
                    <label>3</label>Atlas of Living Australia, Canberra, Australia</aff>
                <aff id="r115285a4">
                    <label>4</label>Species 2000, Leiden, The Netherlands</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>5</day>
                <month>1</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Hobern D</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport115285" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73825.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This paper addresses probably the most significant opportunity for data-driven innovation and transformation in taxonomy, biogeography, ecology, conservation and biosecurity, with major implications for sustainability and food security.</p>
            <p> </p>
            <p> The paper is well structured and clearly demonstrates the potential and challenges. I've divided my comments as follows: 1) clarifications of detail (minor) in regard to some of the referenced initiatives, 2) a few major initiatives that are highly relevant but not referenced, 3) suggestions to make the text clearer and more readable.</p>
            <p> </p>
            <p> I use page numbers from the PDF version downloadable on 4 January 2022.</p>
            <p> </p>
            <p> 
                <bold>Clarifications of detail</bold>
            </p>
            <p> </p>
            <p> Page 4: The preferred abbreviation for Catalogue of Life is now (since 2021) fully capitalised (COL).</p>
            <p> </p>
            <p> Page 4: "The taxonomic frameworks are built ..." - this seems to refer specifically to GBIF's taxonomic framework, so perhaps replace "The" with "Its". More importantly, by far the largest contribution to GBIF's framework is COL (see https://data-blog.gbif.org/post/gbif-backbone-taxonomy/) - this provides the major structure and other published resources (including BLR) augment it.</p>
            <p> </p>
            <p> Page 6 - "The main reference libraries include ..." - BOLD is the main reference library that supports iBOL. iBOL is not a separate library. I recommend rewriting "and the International Barcode of Life" as "maintained by the International Barcode of Life".</p>
            <p> </p>
            <p> Page 6 - "(e.g. NCBI or GBIF)" - as noted above, COL is the core of the GBIF backbone and is used in many other contexts. It may be better to reference "(NCBI, COL or GBIF)".</p>
            <p> </p>
            <p> Page 7 - "For example, connecting GBIF with the UNITE database" - Note that GBIF has integrated both BOLD and UNITE in this way - both now contribute molecular OTUs (in BOLD's case BINs) that appear as part of the GBIF taxonomic framework.</p>
            <p> </p>
            <p> Page 7 - "Reciprocally, traditional biodiversity data and resources can help inform" - while this is true, it is not clear how the authors expect the benefits to be developed in this direction. Navigation from molecular data to a corpus of knowledge about associated taxa is *relatively* simple to achieve, but the published literature is insufficiently structured or parsed to support meaningful inference to support machine-driven analysis of genetic and genomic data.</p>
            <p> </p>
            <p> Page 8 - "The NCBI taxonomy database" - The listed number of synonyms is important, but should be put in context of the current version of COL including 1.95 million accepted species names and another 2 million synonyms.</p>
            <p> </p>
            <p> Page 10 - "An additional source of information on taxon names" - It is important to highlight the scale and diversity of information resources and datasets relating to biodiversity, but these are so heterogeneous that it is unhelpful to treat them monolithically. Lumping them together leaves taxonomic identifiers as the only possible connection point. In practice, the field of biodiversity informatics needs to digest these resources into digital objects that fall into more precise classes (specimen, ecosystem, species, gene, sampling event, trait, etc.).</p>
            <p> </p>
            <p> Page 13 - Figure 3 caption - "They can also archive their datasets at GBIF without any clearance." - Unless I am missing something, this would be better expressed as "publish their datasets to GBIF" since GBIF does not currently assume responsibility for archival of data published to the network (although such archival often de facto occurs).</p>
            <p> </p>
            <p> Page 16 - "Missing and non-matched taxIDs give an incomplete and inconsistent view" - All that is written here is true, but an associated and often neglected issue is the uncertainty associated with taxonomic identifications. A name may be correctly interpreted according to a perfect taxonomic framework, while all the time being based on misidentification. This aspect overlays everything written here and needs to be acknowledged. Of course, this is also a key area in which the fusion of genetic/genomic and other data can bring big benefits. Ideally taxonomic type specimens will end up serving as anchor points not only for morphological descriptions but also as DNA vouchers that can be used to label the corresponding molecular OTUs and validate field-collected data.</p>
            <p> </p>
            <p> Page 17 - "Efforts to develop these would mean that the appropriate metadata can be captured during the experiment" - True, but it is important that we distinguish clearly within the metadata between elements co-collected with the sample of interest and elements added subsequently via interpolation, look-up, etc.</p>
            <p> </p>
            <p> Page 17 - "Even when such data brokering tools exist for specific communities" - Another source of difficulty is inconsistent rigor in defining or interpreting even widely adopted standards. Mapping data from different studies will involve compromises and ambiguities that may not be apparent either to those sharing the data or to consumers of the data.</p>
            <p> </p>
            <p> Page 18 - "Metadata needs to be better standardised and universally adopted" - It may be worthwhile to clarify the scope of what is intended by "metadata" - FAIR data standards should include consideration of data structures and packaging models to ensure that users can correctly find and interpret all elements. This is more complex than adopting vocabularies, etc.</p>
            <p> </p>
            <p> Page 18 - "Bioinformatics tools and services for biodiversity research need to be prioritised" - It may be worthwhile to acknowledge that we do not need monolithic solutions here. We need minimum information standards, stable identifiers and provenance information, good generalisable packaging mechanisms and a software ecosystem that assists with point-to-point or data-class to data-class transformations. Satellite imagery may be a good analogy. Downstream consumers need well-referenced products such as NDVI - these become the components of interest for other more targeted applications. In the same way, we may be best off focusing on a modular approach - develop robust taxonomic frameworks and associated tools, map molecular hypotheses against these frameworks, ensure that data from samples can be consumed as Darwin Core Occurrences and Events, etc.</p>
            <p> </p>
            <p> 
                <bold>Other initiatives</bold>
            </p>
            <p> </p>
            <p> Page 7 - "Ongoing efforts to coordinate traditional biodiversity infrastructures" - As well as the initiatives referenced in this paragraph, GBIF and partners have organised two global conferences, each leading to a publication focused on building such coordination (Hobern 
                <italic>et al.</italic> 2012, Hobern 
                <italic>et al.</italic> 2019). The later event led to the call for an alliance for biodiversity knowledge (
                <ext-link ext-link-type="uri" xlink:href="https://www.biodiversityinformatics.org/">https://www.biodiversityinformatics.org/</ext-link>) which is highly relevant to this paper as an umbrella for cross-infrastructure collaborations. The alliance is also applicable to page 18 "Biodiversity-related and molecular-focused infrastructures need to collaborate".</p>
            <p> </p>
            <p> Page 8 - Use case 1, paragraph 1 - A topical collection of six papers has recently been published in Organisms Diversity and Evolution from work carried out under the auspices of IUBS. This collection specifically explores the need for a shared taxonomic framework and makes proposals for the required collaboration (Towards a global list of accepted species, see&#x00a0;
                <ext-link ext-link-type="uri" xlink:href="https://link.springer.com/journal/13127/topicalCollection/AC_82162ae498991fe393589cc98ba425d4">here</ext-link>). Citations for all six papers are provided. This is especially applicable to page 18 "Taxonomies need to be aligned and harmonised across domains".</p>
            <p> </p>
            <p> 
                <bold>Readability</bold>
            </p>
            <p> </p>
            <p> It may be a relatively minor issue, but many sentences throughout the document are unnecessarily hard to read because the central ideas are delayed to the end of the sentence and/or a passive voice is unnecessarily used. For example, in the abstract, consider rewriting "As a research infrastructure developing services and technical solutions that help integrate and coordinate life science resources across Europe, ELIXIR is a key player" as "ELIXIR plays a key role as a research infrastructure that develops services and technical solutions that help integrate and coordinate life science resources across Europe". I found I needed to re-read several passages a few times to get their sense. In almost all cases, the concepts were correct and important, but obscured by word order.</p>
            <p> </p>
            <p> Page 8 - use case 1 seems in particular need of a rewrite to improve clarity. The first sentence ("Creating a comprehensive taxonomy linked ...") does not make sense and certainly needs to be rewritten.</p>
            <p> </p>
            <p> Page 11 - "The MAR database entries are cross-referenced with ENA and the World Register of Marine Species (WoRMS) (Vandepitte et al. 2018) records" - no need for the word "records".</p>
            <p> </p>
            <p> Page 11 - "On the one hand, the large and growing variety of observations taken during oceanic sampling (Gorsky et al. 2019) have posed many data management challenges." - "has posed".</p>
            <p> </p>
            <p> Page 13 - "Long-standing scientific interests" - "interest".</p>
            <p> </p>
            <p> Page 15 - "This species, experiment and sample metadata" - may be clearer as "This metadata on species, experiments and samples".</p>
            <p> </p>
            <p> Page 18 - "It is clear that from barcodes to reference genomes, sequencing hundreds of thousands of species in the near future will generate the foundational data for most biodiversity molecular studies for decades to come." - this sentence is awkward and could be rewritten.</p>
            <p>Is the topic of the opinion article discussed accurately in the context of the current literature?</p>
            <p>Yes</p>
            <p>Are arguments sufficiently supported by evidence from the published literature?</p>
            <p>Yes</p>
            <p>Are all factual statements correct and adequately supported by citations?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn balanced and justified on the basis of the presented arguments?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Biodiversity informatics including management of taxonomic and DNA barcode data, use of data in taxonomy, ecology and agriculture.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-115285-1">
                    <label>1</label>
                    <mixed-citation>
                        <person-group person-group-type="author"/>:
                        <article-title>Global Biodiversity Informatics Outlook: Delivering biodiversity knowledge in the information age</article-title>.
                        <source>
                            <italic>Global Biodiversity Information Facility</italic>
                        </source>.<year>2012</year>;
                        <elocation-id>10.15468/6JXA-YB44</elocation-id>
                        <pub-id pub-id-type="doi">10.15468/6JXA-YB44</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-115285-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Connecting data and expertise: a new alliance for biodiversity knowledge</article-title>.
                        <source>
                            <italic>Biodivers Data J</italic>
                        </source>.<year>2019</year>;<volume>7</volume>:
                        <elocation-id>10.3897/BDJ.7.e33679</elocation-id>
                        <fpage>e33679</fpage>
                        <pub-id pub-id-type="pmid">30886531</pub-id>
                        <pub-id pub-id-type="doi">10.3897/BDJ.7.e33679</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-115285-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Towards a global list of accepted species I. Why taxonomists sometimes disagree, and why this matters</article-title>.
                        <source>
                            <italic>Organisms Diversity &amp; Evolution</italic>
                        </source>.<year>2021</year>;<volume>21</volume>(<issue>4</issue>) :
                        <elocation-id>10.1007/s13127-021-00495-y</elocation-id>
                        <fpage>615</fpage>-<lpage>622</lpage>
                        <pub-id pub-id-type="doi">10.1007/s13127-021-00495-y</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-115285-4">
                    <label>4</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Towards a global list of accepted species II. Consequences of inadequate taxonomic list governance</article-title>.
                        <source>
                            <italic>Organisms Diversity &amp; Evolution</italic>
                        </source>.<year>2021</year>;<volume>21</volume>(<issue>4</issue>) :
                        <elocation-id>10.1007/s13127-021-00518-8</elocation-id>
                        <fpage>623</fpage>-<lpage>630</lpage>
                        <pub-id pub-id-type="doi">10.1007/s13127-021-00518-8</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-115285-5">
                    <label>5</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Towards a global list of accepted species III. Independence and stakeholder inclusion</article-title>.
                        <source>
                            <italic>Organisms Diversity &amp; Evolution</italic>
                        </source>.<year>2021</year>;<volume>21</volume>(<issue>4</issue>) :
                        <elocation-id>10.1007/s13127-021-00496-x</elocation-id>
                        <fpage>631</fpage>-<lpage>643</lpage>
                        <pub-id pub-id-type="doi">10.1007/s13127-021-00496-x</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-115285-6">
                    <label>6</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Towards a global list of accepted species IV: Overcoming fragmentation in the governance of taxonomic lists</article-title>.
                        <source>
                            <italic>Organisms Diversity &amp; Evolution</italic>
                        </source>.<year>2021</year>;<volume>21</volume>(<issue>4</issue>) :
                        <elocation-id>10.1007/s13127-021-00499-8</elocation-id>
                        <fpage>645</fpage>-<lpage>655</lpage>
                        <pub-id pub-id-type="doi">10.1007/s13127-021-00499-8</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-115285-7">
                    <label>7</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Towards a global list of accepted species V. The devil is in the detail</article-title>.
                        <source>
                            <italic>Organisms Diversity &amp; Evolution</italic>
                        </source>.<year>2021</year>;<volume>21</volume>(<issue>4</issue>) :
                        <elocation-id>10.1007/s13127-021-00504-0</elocation-id>
                        <fpage>657</fpage>-<lpage>675</lpage>
                        <pub-id pub-id-type="doi">10.1007/s13127-021-00504-0</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-115285-8">
                    <label>8</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Towards a global list of accepted species VI: The Catalogue of Life checklist</article-title>.
                        <source>
                            <italic>Organisms Diversity &amp; Evolution</italic>
                        </source>.<year>2021</year>;<volume>21</volume>(<issue>4</issue>) :
                        <elocation-id>10.1007/s13127-021-00516-w</elocation-id>
                        <fpage>677</fpage>-<lpage>690</lpage>
                        <pub-id pub-id-type="doi">10.1007/s13127-021-00516-w</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment8534-115285">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Lanfear</surname>
                            <given-names>Jerry</given-names>
                        </name>
                        <aff>ELIXIR, UK</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>18</day>
                    <month>7</month>
                    <year>2022</year>
                </pub-date>
            </front-stub>
            <body>
                <p>This paper addresses probably the most significant opportunity for data-driven innovation and transformation in taxonomy, biogeography, ecology, conservation and biosecurity, with major implications for sustainability and food security.</p>
                <p> The paper is well structured and clearly demonstrates the potential and challenges. I've divided my comments as follows: 1) clarifications of detail (minor) in regard to some of the referenced initiatives, 2) a few major initiatives that are highly relevant but not referenced, 3) suggestions to make the text clearer and more readable.</p>
                <p> I use page numbers from the PDF version downloadable on 4 January 2022.</p>
                <p> </p>
                <p> 
                    <bold>Response: We thank the reviewer for recognising the importance of this Opinion Piece and especially for the detailed constructive feedback that has undoubtedly helped to improve the manuscript substantially.</bold>
                </p>
                <p> </p>
                <p> Clarifications of detail</p>
                <p> </p>
                <p> Page 4: The preferred abbreviation for Catalogue of Life is now (since 2021) fully capitalised (COL).</p>
                <p> </p>
                <p> 
                    <bold>Response: Updated to COL throughout, including in Figure 1.</bold>
                </p>
                <p> </p>
                <p> Page 4: "The taxonomic frameworks are built ..." - this seems to refer specifically to GBIF's taxonomic framework, so perhaps replace "The" with "Its". More importantly, by far the largest contribution to GBIF's framework is COL (see https://data-blog.gbif.org/post/gbif-backbone-taxonomy/) - this provides the major structure and other published resources (including BLR) augment it.</p>
                <p> </p>
                <p> 
                    <bold>Response: Updated accordingly.</bold>
                </p>
                <p> </p>
                <p> Page 6 - "The main reference libraries include ..." - BOLD is the main reference library that supports iBOL. iBOL is not a separate library. I recommend rewriting "and the International Barcode of Life" as "maintained by the International Barcode of Life".</p>
                <p> </p>
                <p> 
                    <bold>Response: Updated accordingly.</bold>
                </p>
                <p> </p>
                <p> Page 6 - "(e.g. NCBI or GBIF)" - as noted above, COL is the core of the GBIF backbone and is used in many other contexts. It may be better to reference "(NCBI, COL or GBIF)".</p>
                <p> </p>
                <p> 
                    <bold>Response: Updated accordingly.</bold>
                </p>
                <p> </p>
                <p> Page 7 - "For example, connecting GBIF with the UNITE database" - Note that GBIF has integrated both BOLD and UNITE in this way - both now contribute molecular OTUs (in BOLD's case BINs) that appear as part of the GBIF taxonomic framework.</p>
                <p> </p>
                <p> 
                    <bold>Response: Our second example in this section (Canadian invertebrate fauna) presents a specific case of how BOLD integrates with GBIF, so although somewhat implicit this integration is already noted and we prefer to leave it as it is rather than expand this section.</bold>
                </p>
                <p> </p>
                <p> Page 7 - "Reciprocally, traditional biodiversity data and resources can help inform" - while this is true, it is not clear how the authors expect the benefits to be developed in this direction. Navigation from molecular data to a corpus of knowledge about associated taxa is *relatively* simple to achieve, but the published literature is insufficiently structured or parsed to support the meaningful inference needed for machine-driven analysis of genetic and genomic data.</p>
                <p> </p>
                <p> 
                    <bold>Response: We agree that currently there are many challenges that still need to be overcome to achieve this and have edited the text to reflect this as a future goal rather than a current reality.</bold>
                </p>
                <p> </p>
                <p> Page 8 - "The NCBI taxonomy database" - The listed number of synonyms is important, but should be put in context of the current version of COL including 1.95 million accepted species names and another 2 million synonyms.</p>
                <p> </p>
                <p> 
                    <bold>Response: For context we have added this after the reference to Figure 1.</bold>
                </p>
                <p> </p>
                <p> Page 10 - "An additional source of information on taxon names" - It is important to highlight the scale and diversity of information resources and datasets relating to biodiversity, but these are so heterogeneous that it is unhelpful to treat them monolithically. Lumping them together leaves taxonomic identifiers as the only possible connection point. In practice, the field of biodiversity informatics needs to digest these resources into digital objects that fall into more precise classes (specimen, ecosystem, species, gene, sampling event, trait, etc.).</p>
                <p> </p>
                <p> 
                    <bold>Response: Although this case study is focused on taxonomy we agree that highlighting the role of digital objects is important and have updated the text accordingly.</bold>
                </p>
                <p> </p>
                <p> Page 13 - Figure 3 caption - "They can also archive their datasets at GBIF without any clearance." - Unless I am missing something, this would be better expressed as "publish their datasets to GBIF" since GBIF does not currently assume responsibility for archival of data published to the network (although such archival often de facto occurs).</p>
                <p> </p>
                <p> 
                    <bold>Response: Updated accordingly.</bold>
                </p>
                <p> </p>
                <p> Page 16 - "Missing and non-matched taxIDs give an incomplete and inconsistent view" - All that is written here is true, but an associated and often neglected issue is the uncertainty associated with taxonomic identifications. A name may be correctly interpreted according to a perfect taxonomic framework, while all the time being based on misidentification. This aspect overlays everything written here and needs to be acknowledged. Of course, this is also a key area in which the fusion of genetic/genomic and other data can bring big benefits. Ideally taxonomic type specimens will end up serving as anchor points not only for morphological descriptions but also as DNA vouchers that can be used to label the corresponding molecular OTUs and validate field-collected data.</p>
                <p> </p>
                <p> 
                    <bold>Response: This is indeed a very important point that we did not address specifically. We have added &#x201c;Taxon misidentifications&#x201d; to this section to highlight it without going into detail, to keep this section on common challenges concise, and have added a sentence to the use case 1 section earlier to elaborate on this point.</bold>
                </p>
                <p> </p>
                <p> Page 17 - "Efforts to develop these would mean that the appropriate metadata can be captured during the experiment" - True, but it is important that we distinguish clearly within the metadata between elements co-collected with the sample of interest and elements added subsequently via interpolation, look-up, etc.</p>
                <p> </p>
                <p> 
                    <bold>Response: We agree with this distinction and have updated the text to include this important point.</bold>
                </p>
                <p> </p>
                <p> Page 17 - "Even when such data brokering tools exist for specific communities" - Another source of difficulty is inconsistent rigor in defining or interpreting even widely adopted standards. Mapping data from different studies will involve compromises and ambiguities that may not be apparent either to those sharing the data or to consumers of the data.</p>
                <p> </p>
                <p> 
                    <bold>Response: We agree that even within domains (between studies) there are ambiguities and have updated the text to specifically mention this before considering cross-domain issues.</bold>
                </p>
                <p> </p>
                <p> Page 18 - "Metadata needs to be better standardised and universally adopted" - It may be worthwhile to clarify the scope of what is intended by "metadata" - FAIR data standards should include consideration of data structures and packaging models to ensure that users can correctly find and interpret all elements. This is more complex than adopting vocabularies, etc.</p>
                <p> </p>
                <p> 
                    <bold>Response: We agree that data structures and packaging models are also important for FAIR. However, here we focus on metadata solely as an information collection, standardisation and curation mechanism, as this is a vital first step in how knowledge is represented within a biodiversity project, i.e. sample collection metadata as defined in the Darwin Tree of Life. Data structures and packaging models are primarily concerned with the consumption of biodiversity data, and if the metadata provided at the outset is high quality and standardised, downstream tools and APIs can be varied yet still remain FAIR. We modified the text to clarify the focus here on the first steps in the process of metadata collection.</bold>
                </p>
                <p> </p>
                <p> Page 18 - "Bioinformatics tools and services for biodiversity research need to be prioritised" - It may be worthwhile to acknowledge that we do not need monolithic solutions here. We need minimum information standards, stable identifiers and provenance information, good generalisable packaging mechanisms and a software ecosystem that assists with point-to-point or data-class to data-class transformations. Satellite imagery may be a good analogy. Downstream consumers need well-referenced products such as NDVI - these become the components of interest for other more targeted applications. In the same way, we may be best off focusing on a modular approach - develop robust taxonomic frameworks and associated tools, map molecular hypotheses against these frameworks, ensure that data from samples can be consumed as Darwin Core Occurrences and Events, etc.</p>
                <p> </p>
                <p> 
                    <bold>Response: We had hoped to convey this with phrases such as &#x201c;adopting a fixed approach to data analysis is not a realistic option&#x201d; and &#x201c;development should proceed in an environment that encourages innovation while building on and connecting to existing tools and services&#x201d;. Describing these concepts with the suggested term &#x201c;modular approach&#x201d; works well to reinforce these ideas, so we have taken this on board and updated the paragraph to more clearly reflect this message.</bold>
                </p>
                <p> </p>
                <p> </p>
                <p> Other initiatives</p>
                <p> </p>
                <p> Page 7 - "Ongoing efforts to coordinate traditional biodiversity infrastructures" - As well as the initiatives referenced in this paragraph, GBIF and partners have organised two global conferences, each leading to a publication focused on building such coordination (Hobern et al. 2012, Hobern et al. 2019). The later event led to the call for an alliance for biodiversity knowledge (https://www.biodiversityinformatics.org/) which is highly relevant to this paper as an umbrella for cross-infrastructure collaborations. The alliance is also applicable to page 18 "Biodiversity-related and molecular-focused infrastructures need to collaborate".</p>
                <p> </p>
                <p> 
                    <bold>Response: These are indeed important syntheses of efforts to build such coordination. We have updated both paragraphs to highlight the relevance of the alliance for cross-infrastructure collaborations.</bold>
                </p>
                <p> </p>
                <p> Page 8 - Use case 1, paragraph 1 - A topical collection of six papers has recently been published in Organisms Diversity and Evolution from work carried out under the auspices of IUBS. This collection specifically explores the need for a shared taxonomic framework and makes proposals for the required collaboration (Towards a global list of accepted species). Citations for all six papers are provided. This is especially applicable to page 18 "Taxonomies need to be aligned and harmonised across domains".</p>
                <p> </p>
                <p> 
                    <bold>Response: We agree that this topical collection exemplifies many of the issues faced in this domain and have now specifically mentioned this in the text.</bold>
                </p>
                <p> </p>
                <p> </p>
                <p> Readability</p>
                <p> </p>
                <p> It may be a relatively minor issue, but many sentences throughout the document are unnecessarily hard to read because the central ideas are delayed to the end of the sentence and/or a passive voice is unnecessarily used. For example, in the abstract, consider rewriting "As a research infrastructure developing services and technical solutions that help integrate and coordinate life science resources across Europe, ELIXIR is a key player" as "ELIXIR plays a key role as a research infrastructure that develops services and technical solutions that help integrate and coordinate life science resources across Europe". I found I needed to re-read several passages a few times to get their sense. In almost all cases, the concepts were correct and important, but obscured by word order.</p>
                <p> </p>
                <p> 
                    <bold>Response: We have been through the text and specifically identified sentences that would benefit from rearrangements as suggested to improve readability.</bold>
                </p>
                <p> </p>
                <p> Page 8 - use case 1 seems in particular need of a rewrite to improve clarity. The first sentence, ("Creating a comprehensive taxonomy linked ...") does not make sense and certainly needs to be rewritten.</p>
                <p> </p>
                <p> 
                    <bold>Response: There are two aspects to a single authoritative list. Going forward, a single list seems the obvious way to go, as proposed by Garnett and colleagues. However, to accommodate all the legacy data, such a list must include all the synonyms, misidentifications and spelling variants and, in order to decide between them, access to the respective taxonomic treatments. To clarify, we use the language of Garnett et al., which introduces the set of six IUBS-commissioned papers mentioned above.</bold>
                </p>
                <p> </p>
                <p> Page 11 - "The MAR database entries are cross-referenced with ENA and the World Register of Marine Species (WoRMS) (Vandepitte et al. 2018) records" - no need for the word "records".</p>
                <p> </p>
                <p> 
                    <bold>Response: Fixed.</bold>
                </p>
                <p> </p>
                <p> Page 11 - "On the one hand, the large and growing variety of observations taken during oceanic sampling (Gorsky et al. 2019) have posed many data management challenges." - "has posed".</p>
                <p> </p>
                <p> 
                    <bold>Response: Fixed.</bold>
                </p>
                <p> </p>
                <p> Page 13 - "Long-standing scientific interests" - "interest".</p>
                <p> </p>
                <p> 
                    <bold>Response: Fixed.</bold>
                </p>
                <p> </p>
                <p> Page 15 - "This species, experiment and sample metadata" - may be clearer as "This metadata on species, experiments and samples".</p>
                <p> </p>
                <p> 
                    <bold>Response: Agree, updated.</bold>
                </p>
                <p> </p>
                <p> Page 18 - "It is clear that from barcodes to reference genomes, sequencing hundreds of thousands of species in the near future will generate the foundational data for most biodiversity molecular studies for decades to come." - this sentence is awkward and could be rewritten.</p>
                <p> </p>
                <p> 
                    <bold>Response: Agreed, we have re-worked this sentence for clarity.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
