<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.23468.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Assessment of a demonstrator repository for individual clinical trial data built upon DSpace</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Tilki</surname>
                        <given-names>Birol</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Schulenberg</surname>
                        <given-names>Thomas</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Canham</surname>
                        <given-names>Steve</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Banzi</surname>
                        <given-names>Rita</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kuchinke</surname>
                        <given-names>Wolfgang</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Ohmann</surname>
                        <given-names>Christian</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5919-1003</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Coordination Centre for Clinical Trials, Heinrich-Heine-University, D&#x00fc;sseldorf, North Rhine-Westphalia, 40225, Germany</aff>
                <aff id="a2">
                    <label>2</label>European Clinical Research Infrastructure Network, ECRIN, Redhill, Surrey, RH1 6QH, UK</aff>
                <aff id="a3">
                    <label>3</label>Istituto di Ricerche Farmacologiche Mario Negri, IRCCS, Milan, 20156, Italy</aff>
                <aff id="a4">
                    <label>4</label>European Clinical Research Infrastructure Network, ECRIN, D&#x00fc;sseldorf, North Rhine-Westphalia, 40477, Germany</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:christian.ohmann@med.uni-duesseldorf.de">christian.ohmann@med.uni-duesseldorf.de</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>6</month>
                <year>2020</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2020</year>
            </pub-date>
            <volume>9</volume>
            <elocation-id>311</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>23</day>
                    <month>6</month>
                    <year>2020</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Tilki B et al.</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/9-311/pdf"/>
            <abstract>
                <p>
                    <bold>Background:</bold> Given the increasing number and heterogeneity of data repositories, an improvement and harmonisation of practice within repositories for clinical trial data is urgently needed. The objective of the study was to develop and evaluate a demonstrator repository, using a widely used repository system (DSpace), and then explore its suitability for providing access to individual participant data (IPD) from clinical research.</p>
                <p>
                    <bold>Methods:</bold> After a study of the available options, DSpace (version 6.3) was selected as the software for developing a demonstrator implementation of a repository for clinical trial data. In total, 19 quality criteria were defined, using previous work assessing clinical data repositories as a guide, and the demonstrator implementation was then assessed with respect to those criteria.</p>
                <p>
                    <bold>Results:</bold> Generally, the performance of the DSpace demonstrator repository in supporting sensitive personal data such as that from clinical trials was strong, with 14 requirements demonstrated (74%), including the necessary support for metadata and identifiers. Two requirements could not be demonstrated (incorporation of de-identification tools in the submission workflow, and provision of a self-attestation system) and three requirements were only partially demonstrated (ability to provide links to de-identification tools and requirements, incorporation of a data transfer agreement in system workflow, and capability to offer managed access through application on a case-by-case basis).</p>
                <p>
                    <bold>Conclusions:</bold> Technically, the system was able to support most of the pre-defined requirements, though there are areas where support could be improved. Of course, in a productive repository, appropriate policies and procedures would be needed to direct the use of the available technical features. A technical evaluation should therefore be seen as indicating a system&#x2019;s potential, rather than being a definitive assessment of its suitability. DSpace clearly has considerable potential in this context and appears a suitable base for further exploration of the issues around storing sensitive data.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Repository</kwd>
                <kwd>clinical trial</kwd>
                <kwd>individual participant data</kwd>
                <kwd>data sharing</kwd>
                <kwd>DSpace</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/100010661">
                    <funding-source>Horizon 2020 Framework Programme</funding-source>
                    <award-id>654248</award-id>
                </award-group>
                <funding-statement>The study has received funding from the European Union&#x2019;s Horizon 2020 research and innovation programme (project CORBEL, under grant agreement No 654248).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>The process of selecting DSpace as the software for developing a demonstrator repository has been described more clearly. The selection of the quality criteria for assessment of the repository and the reason for missing security features and encryption have been better explained. The confusion over the metadata has been clarified. In the section &#x00ab;&#x00a0;De-identification practices&#x00a0;&#x00bb;, a line was added in response to a reviewer&#x2019;s comment. In &#x00ab;&#x00a0;Formal contract regarding upload and storage&#x00a0;&#x00bb; an explanation reflecting the comment of a reviewer was given. In the section &#x00ab;&#x00a0;Flexibility of access&#x00a0;&#x00bb; the meaning of the term self-attestation has been clarified. The section about &#x00ab;&#x00a0;Long term preservation and sustainability&#x00a0;&#x00bb; has been renamed and rewritten. The reasoning for using only public data has been better explained. In the discussion, the two overarching principles FAIR and TRUST have been introduced. Three references have been added. In addition, some typos/mis-spellings were corrected and minor changes were made to improve the English.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>The sharing of clinical trial data still occurs mainly within a closed professional environment through direct and personal sharing, rather than via accessible data repositories. A multi-stakeholder taskforce addressing this problem recommended that data and documents from clinical trials available for sharing should be transferred to a suitable data repository to help ensure that the data objects are properly prepared, are available in the longer term, are stored securely and are subject to rigorous governance
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. A recent study has shown that an increasing number of such repositories are available for sharing of individual participant data (IPD) from clinical studies
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>. There are many different types of repositories, however, such as generic repositories for all kinds of life-science data, repositories exclusively for clinical research data and specialised repositories with a specific focus, e.g. a single disease area, and major heterogeneity exists with respect to data-upload, data-handling, and data-access processes. This heterogeneity of repository types and features reflects both the different purposes and perspectives of repository founders, and the relative immaturity of repository data-sharing services. Given the lack of a consensus about the services required from a data repository, each organisation has implemented its own policies and systems to meet its own priorities. Greater harmonisation of practices within repositories, coupled with the implementation of quality criteria for repositories, may diminish the reluctance of many researchers to share the data from their studies, thus promoting data-sharing, discoverability, and re-use
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>,
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>.</p>
            <p>In a consensus building exercise, the necessity for compliance of repositories for clinical trial data and related data objects with quality criteria was emphasised
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. The services any repository provides should conform to specified quality standards, to give its users confidence that their data and documents will be stored securely and in accordance with the specific data transfer agreements they have agreed. During the consensus exercise, the importance of getting consent for data archiving, sharing and re-use from research participants was stressed and formulated as one of the essential data sharing principles.</p>
            <p>This paper explores the suitability of a widely used data repository system, DSpace, for supporting the long-term management of IPD generated from clinical research while conforming to defined quality criteria. Though DSpace is a repository system used for open data, it is also increasingly used for restricted data access because it provides several built-in features that make it adaptable for restricted data sharing. The work was carried out as part of a broader set of activities aimed at developing mechanisms for the sharing of IPD from clinical research (
                <ext-link ext-link-type="uri" xlink:href="https://www.corbel-project.eu/home.htm">https://www.corbel-project.eu/home.htm</ext-link>). It builds on previous published papers describing principles and practical recommendations for IPD sharing
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>, offering a detailed analysis of the processes involved in depositing, managing and sharing IPD
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>, and evaluating existing repositories for their suitability for the deposition of IPD, specifically for researchers in the non-commercial sector
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>. In the latter analysis, repositories were assessed against a set of quality criteria, referring to the processes of data upload, storage, de-identification, and quality controls, metadata, identifiers, flexibility of access and long-term preservation. The aim of this paper is to describe the development of a demonstrator repository based on the DSpace system and assess it using a pre-defined set of quality criteria and requirements.</p>
            <p>The reason for developing this repository was to explore further various technical and workflow issues around the long-term management of IPD, in practical terms, using a well-known repository system applied to IPD from clinical research. The demonstrator is intended as an illustrative example only and this paper deals only with technical aspects of the repository system, i.e. its evaluation as a suitable infrastructure. It is clear that many aspects of a repository&#x2019;s suitability for IPD are linked to the procedures and processes implemented by the institution hosting the repository. In other words, a strong technical infrastructure is a necessary but not sufficient indicator of quality.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Selection of DSpace as software for developing a demonstrator repository</title>
                <p>Writing a bespoke repository system from scratch was seen as unrealistic, given resource constraints, and in any case less useful than using an existing system &#x2013; one that would also be available to potential repository managers. A variety of systems were considered as the possible base system for the demonstrator repository (e.g. Figshare, DSpace). These and other systems were characterised with respect to the following standardised criteria
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>:</p>
                <list list-type="bullet">
                    <list-item>
                        <p>Name of the system</p>
                    </list-item>
                    <list-item>
                        <p>Contact</p>
                    </list-item>
                    <list-item>
                        <p>Webpage of the system</p>
                    </list-item>
                    <list-item>
                        <p>Level of usage (country)</p>
                    </list-item>
                    <list-item>
                        <p>Short description of the system</p>
                    </list-item>
                    <list-item>
                        <p>Type of activity the system is supporting</p>
                    </list-item>
                    <list-item>
                        <p>Modules/architecture/components included</p>
                    </list-item>
                    <list-item>
                        <p>What data are stored with the system</p>
                    </list-item>
                    <list-item>
                        <p>Research use cases/projects/studies the system is used for</p>
                    </list-item>
                </list>
                <p>A formal comparison between the systems was not made
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>, but DSpace was rated as the system with the greatest potential for a demonstrator repository, particularly in an academic context.</p>
                <p>DSpace was selected partly because it appears to be by far the most popular of the various repository systems, with 2884 users, 2204 of them listed as &#x2018;academic&#x2019; (including the University of Cambridge, Yale, Duke University and the University of Edinburgh amongst many around the world; 
                    <ext-link ext-link-type="uri" xlink:href="https://duraspace.org/registry/">https://duraspace.org/registry/</ext-link>). Three of 25 repositories for IPD from clinical trial data, evaluated in a recent review, are built upon DSpace (Dryad, Drum, Edinburgh DataShare)
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>.</p>
                <p>In addition, DSpace is an open source system and can be modified and extended by users. It claims about 100 contributors to the code base, with the Dryad repository, which runs on DSpace, being an example of how the system can be extended. It is possible to download and run a pre-configured &#x2018;out of the box&#x2019; solution, but DSpace also claims to be fully modifiable, even though many of the modifications listed are relatively superficial (e.g. themes, screen configurations, search parameters). The system appears compliant with most of the relevant standards (e.g. the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), developed for harvesting metadata descriptions from records), runs on a variety of operating systems and can use either Oracle or PostgreSQL as the back-end database store (
                    <ext-link ext-link-type="uri" xlink:href="https://duraspace.org/dspace/">https://duraspace.org/dspace/</ext-link>). There also appeared to be an active user group and comprehensive documentation, including a Wiki (
                    <ext-link ext-link-type="uri" xlink:href="https://wiki.duraspace.org/display/DSPACE/">https://wiki.duraspace.org/display/DSPACE/</ext-link>). An alternative to DSpace would have been Invenio (
                    <ext-link ext-link-type="uri" xlink:href="https://invenio-software.org/">https://invenio-software.org/</ext-link>), which delivers the repository units for Zenodo, OpenAIRE and CERN Open Data. Invenio appeared very focused on open data, however, while DSpace seemed to offer more possibilities for supporting more managed access. Further details of the candidate systems considered are given in 
                    <xref ref-type="bibr" rid="ref-6">6</xref>.</p>
            </sec>
            <sec>
                <title>Technical infrastructure for the demonstrator repository</title>
                <p>A data repository was established between October 2018 and June 2019 within the Coordination Centre for Clinical Trials at the University of D&#x00fc;sseldorf, by BT (first author) using version 6.3 of DSpace. Additional software was installed to supplement DSpace functioning and manage servers and common server functionality.</p>
                <p>
                    <italic toggle="yes">Full list of the software and hardware used for the repository installations and details of the technical implementation of the demonstrator repository:</italic>
                </p>
                <p>DSpace is a framework of a considerable number of different software tools that must work together to achieve an efficient DSpace installation. Prerequisite software tools must be downloaded, installed, tested, configured and integrated with each other. In addition to DSpace itself, the following were installed: 
                    <list list-type="bullet">
                        <list-item>
                            <p>Ubuntu 16 and Ubuntu 18 (Linux operating system)</p>
                        </list-item>
                        <list-item>
                            <p>Java 8 (Java Development Kit)</p>
                        </list-item>
                        <list-item>
                            <p>Apache Maven 3.3.9 (Java build tool)</p>
                        </list-item>
                        <list-item>
                            <p>Apache Ant 1.9.13 (Java build tool)</p>
                        </list-item>
                        <list-item>
                            <p>PostgreSQL 9.5 (with pgcrypto installed) as the relational database back end</p>
                        </list-item>
                        <list-item>
                            <p>Apache Tomcat 9.0.11 (Java Servlet, Server Pages, and Web Socket Engine)</p>
                        </list-item>
                    </list>
                </p>
                <p>DSpace can be installed at different scales, allowing different amounts of data to be handled. In our usage scenario we assumed the storage of several hundred trials with a size of 10&#x2013;100 MB per trial, uploaded over several years. We therefore decided to install a mid-range version of DSpace, able to accommodate a large number of clinical trial datasets. The virtual server was established with: 
                    <list list-type="bullet">
                        <list-item>
                            <p>6 GB RAM in total: approximately 2 GB for Ubuntu 16/18, 2 GB for PostgreSQL, 2 GB for Tomcat.</p>
                        </list-item>
                        <list-item>
                            <p>200 GB system storage. Deducting 40 GB for system and application use, this provides enough storage for 1600 datasets (at 100 MB per dataset).</p>
                        </list-item>
                    </list>
                </p>
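                <p>As a rough check of the storage figure above, the capacity can be worked through explicitly. This is only a sketch: it assumes decimal units (1 GB = 1000 MB), which is the convention that reproduces the stated figure, and it takes the 40 GB system allowance and the 100 MB upper bound per dataset from the list above. The symbol N is introduced here purely for illustration.
                    <disp-formula>
                        <tex-math><![CDATA[\[ N_{\max} = \frac{(200 - 40)\,\mathrm{GB} \times 1000\,\mathrm{MB/GB}}{100\,\mathrm{MB\ per\ dataset}} = \frac{160\,000\,\mathrm{MB}}{100\,\mathrm{MB}} = 1600 \]]]></tex-math>
                    </disp-formula>
                    On the same assumptions, datasets at the lower end of the assumed range (10 MB per trial) would allow roughly 16,000 datasets in the same 160 GB, so the scenario of several hundred trials leaves considerable headroom.</p>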
                <p>This mid-range system is capable of supporting an application with either a large number of items (roughly 50,000 files and associated metadata) or a large volume of activity (searches, accesses, downloads, etc.).</p>
                <p>For testing, publicly available data and documents from clinical trials were uploaded to the demonstrator repository. The data used are displayed on the welcome page of the DSpace demonstrator repository (
                    <ext-link ext-link-type="uri" xlink:href="http://90.147.75.211:8080/xmlui/">http://90.147.75.211:8080/xmlui/</ext-link>).</p>
            </sec>
            <sec>
                <title>Quality criteria applied to the reference implementation</title>
                <p>The quality criteria used for assessment were developed from an original collection of 34 attributes, themselves derived from previous work and discussion within CORBEL and the IMPACT Observatory project
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>. These criteria were meant to provide a broad characterisation of a repository and included aspects assessing both a repository&#x2019;s relative maturity and its suitability for clinical research data. From these criteria 8 features were selected as being especially important for clinical researchers wishing to deposit individual participant data (IPD). They were used in a general evaluation of repositories
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup> and were also applied to the DSpace implementation.</p>
                <p>These 8 criteria identified as being key to successful management of IPD are listed below
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>.

                    <list list-type="bullet">
                        <list-item>
                            <label>1.</label>
                            <p>Guidelines for data upload and storage</p>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>Support for data de-identification</p>
                        </list-item>
                        <list-item>
                            <label>3.</label>
                            <p>Data quality controls</p>
                        </list-item>
                        <list-item>
                            <label>4.</label>
                            <p>Contracts for upload and storage</p>
                        </list-item>
                        <list-item>
                            <label>5.</label>
                            <p>Available provenance and accessibility metadata</p>
                        </list-item>
                        <list-item>
                            <label>6.</label>
                            <p>Application of identifiers</p>
                        </list-item>
                        <list-item>
                            <label>7.</label>
                            <p>Flexibility of access</p>
                        </list-item>
                        <list-item>
                            <label>8.</label>
                            <p>Repository long term preservation</p>
                        </list-item>
                    </list>
                </p>
                <p>Other standards and criteria for trustworthy digital repositories have been developed and are being applied (e.g. the Data Seal of Approval and the International Council for Science World Data System
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>&#x2013;
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>). These criteria usually examine more generic repository features, for example the nature of the security measures in place, the use of encryption, the technical infrastructure, staff competence, etc. Because in this exercise we were not evaluating a repository, but focusing instead on a specific tool, one that would sit 
                    <italic toggle="yes">within</italic> a repository, we did not look at these more general criteria in detail. Of course, activities such as monitoring, reviewing and implementing security measures are very important, but we would see them mainly as the concern of the repository managing DSpace rather than DSpace itself. The relationship between the eight criteria used here and other standards and criteria available for repositories is explored further in the Discussion section (see also 
                    <xref ref-type="table" rid="T3">Table 3</xref>).</p>
                <p>Managing metadata (data about data) is a key requirement of any repository system, though there are two distinct forms of metadata to consider. To promote interoperability and retain meaning within interpretation and analysis, shared data should be, as far as possible, structured, described and formatted using widely recognised data and metadata standards (e.g. Clinical Data Interchange Standards Consortium (CDISC), Core Outcome Measures in Effectiveness Trials (COMET), Medical Dictionary for Regulatory Activities (MedDRA))
                    <sup>
                        <xref ref-type="bibr" rid="ref-1">1</xref>
                    </sup>. The metadata in this context is 
                    <italic toggle="yes">descriptive</italic>, detailing the contents of the data. A repository should be able to check that such metadata is available, ideally in one of a range of specified formats, and support its inclusion with the data (see the details for criterion 1), but the responsibility for providing it rests with the data generators. There is also a need for 
                    <italic toggle="yes">provenance and accessibility</italic> metadata, which is used to make up a repository's catalogue of content, and which describes, for example, the nature and source of the data, its date(s), the authors, and &#x2013; especially important with sensitive data that is likely to be under managed access &#x2013; how the data can be accessed, including the details of any application procedure. Providing such metadata is the responsibility of the repository itself, although ideally it is done in close collaboration with the data generators. This type of metadata is the subject of criterion 5.</p>
                <p>To make the assessment of the criteria more operational and to distinguish features of the system (technical features) from measures around the system (e.g. policies and procedures), the criteria were split into specific requirements. This split was performed by the group of authors. 
                    <xref ref-type="table" rid="T1">Table 1</xref> provides a detailed breakdown of the eight criteria in terms of their associated &#x2018;requirements&#x2019; &#x2013; i.e. the features one would normally expect to see implemented. &#x2018;System&#x2019; features (i.e. the repository system and its technical features) are distinguished from &#x2018;Procedures&#x2019; (i.e. functions of the repository&#x2019;s policies and procedures).</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Quality criteria and linked requirements.</title>
                        <p>
                            <italic toggle="yes">System:</italic> To be demonstrated by the repository system&#x2019;s technical features. 
                            <italic toggle="yes">Procedures:</italic> Function of the repository&#x2019;s governance, policies, procedures.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Requirement</th>
                                <th align="left" colspan="1" rowspan="1" valign="top"/>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">1.&#x00a0;&#x00a0;&#x00a0;Guidelines for data upload and storage (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;1a. support a range of file types and metadata schema</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;1b. provide mechanisms for the upload of files, including instructions</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;1c. provide rules and guidelines for data upload and storage</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">2.&#x00a0;&#x00a0;&#x00a0;De-identification practices before upload (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;2a. be able to provide links to de-identification tools and requirements</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;2b. implement de-identification tools</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;2c. provide requirements and / or guidelines for de-identification</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;2d. provide a consultancy service on de-identification</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">3.&#x00a0;&#x00a0;&#x00a0;Control of quality of data (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;3a. support quality control in its workflow</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;3b. enforce procedures that promote and monitor data and metadata quality</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">4.&#x00a0;&#x00a0;&#x00a0;Formal contract regarding upload and storage (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;4a. incorporate a data transfer agreement in system workflow</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;4b. make a comprehensive data transfer agreement mandatory</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">5.&#x00a0;&#x00a0;&#x00a0;Application of a metadata schema to describe contents (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;5a. use a consistent metadata schema to describe its content</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;5b. allow a customised metadata schema to be applied</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;5c. provide tools to help data generators to complete metadata fields</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;5d. make metadata openly (public) available</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;5e. have policies in place that enforce the application of appropriate metadata</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">6.&#x00a0;&#x00a0;&#x00a0;Application of an identifier (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;6a. be able to apply a primary persistent identifier system</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;6b. be able to use other persistent identifiers as appropriate</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;6c. have policies and processes that ensure identifiers are applied correctly</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">7.&#x00a0;&#x00a0;&#x00a0;Flexibility of access (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;7a. allow open access to material, with an optional embargo period</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;7b. allow open access after web-based self-attestation of the user</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;7c. offer managed access through group membership</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;7d. offer managed access through application on a case by case basis</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;7e. support granular access to different parts of datasets collections</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;7f. have policies that ensure access is specified and monitored</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;7g. provide guidance to users on the access options and their implications</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">8.&#x00a0;&#x00a0;&#x00a0;Repository long-term preservation (The repository should &#x2026;)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;8a. support long term preservation of data and metadata</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;8b. make use of sustainable software systems</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">System</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x00a0;&#x00a0;&#x00a0;8c. implement policies for preservation of data</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Procedures</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>For example, to support &#x2018;Guidelines for data upload and storage&#x2019;, the requirements for the repository could include: 
                    <list list-type="bullet">
                        <list-item>
                            <label>a)</label>
                            <p>being able to support a wide variety of file and metadata types,</p>
                        </list-item>
                        <list-item>
                            <label>b)</label>
                            <p>providing easy to use mechanisms for the upload of files, including technical instructions,</p>
                        </list-item>
                        <list-item>
                            <label>c)</label>
                            <p>providing rules and guidelines for data upload and storage (e.g. which formats or metadata schema to use and when).</p>
                        </list-item>
                    </list>
                </p>
                <p>Items a) and b) are mainly aspects of the repository system and its technical features, whilst c) is more a function of the repository&#x2019;s policies and procedures.</p>
                <p>In the context of this study it is important to stress that only the requirements labelled as &#x2018;system&#x2019; attributes in 
                    <xref ref-type="table" rid="T1">Table 1</xref> were evaluated (19 of 29, or 66%). Each of these system features was assessed and its level of fulfilment within DSpace classified as: 
                    <list list-type="bullet">
                        <list-item>
                            <p>demonstrated</p>
                        </list-item>
                        <list-item>
                            <p>partially demonstrated</p>
                        </list-item>
                        <list-item>
                            <p>not demonstrated</p>
                        </list-item>
                    </list>
                </p>
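                <p>As an illustrative cross-check of the proportion quoted above (not part of the original assessment), the System and Procedures labels in Table 1 can be tallied in a few lines of Python. The dictionary name, the single-letter codes and the variable names are assumptions introduced here for illustration; the labels themselves are transcribed from Table 1.</p>
                <preformat preformat-type="code">
```python
from collections import Counter

# Requirement labels transcribed from Table 1.
# "S" = System, "P" = Procedures (codes chosen here for illustration only).
TABLE_1 = {
    "1a": "S", "1b": "S", "1c": "P",
    "2a": "S", "2b": "S", "2c": "P", "2d": "P",
    "3a": "S", "3b": "P",
    "4a": "S", "4b": "P",
    "5a": "S", "5b": "S", "5c": "S", "5d": "S", "5e": "P",
    "6a": "S", "6b": "S", "6c": "P",
    "7a": "S", "7b": "S", "7c": "S", "7d": "S", "7e": "S", "7f": "P", "7g": "P",
    "8a": "S", "8b": "S", "8c": "P",
}

# Count how many requirements fall under each label.
counts = Counter(TABLE_1.values())
total = len(TABLE_1)

# Share of requirements that are system features, rounded to a whole percent.
system_share = round(100 * counts["S"] / total)

print(f"{counts['S']} of {total} requirements are system features ({system_share}%)")
```
                </preformat>
                <p>Under these assumptions the tally reproduces the 19 of 29 (66%) system requirements that were evaluated; the remaining 10 are procedural and fall outside the scope of the assessment.</p>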
                <p>The assessment of the requirements was performed by BT and based on publicly available information about DSpace (web pages, user manuals, Q&amp;A pages, reports, etc.). DSpace was not contacted directly, but there was contact with the DSpace community: the Coordination Centre for Clinical Trials in D&#x00fc;sseldorf participated in a meeting of the German user community.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>The results are summarised in this section and in 
                <xref ref-type="table" rid="T2">Table 2</xref>.</p>
            <table-wrap id="T2" orientation="portrait" position="float">
                <label>Table 2. </label>
                <caption>
                    <title>Summary of the assessment of quality criteria and the requirements.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Requirement</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Result</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Comment</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">1a. The repository should support a range of file
                                <break/>types and metadata schema</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Flexible approach to file storage</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">1b. The repository should provide mechanisms
                                <break/>for the upload of files, including instructions</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Variety of tools available, along with detailed technical
                                <break/>guidance</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">2a. The repository should be able to provide
                                <break/>links to de-identification tools and requirements</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Partially demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Links can be established but have to be set up by system
                                <break/>administrators</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">2b. The repository should implement de-
                                <break/>identification tools</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Not demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not currently possible</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">3a. The repository should support quality
                                <break/>control in its workflow</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Partially demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Some quality features available, e.g. a submission review
                                <break/>workflow, but not a full quality control system</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">4a. The repository should incorporate a data
                                <break/>transfer agreement in system workflow</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Partially demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Confirmation of a signed data transfer protocol can
                                <break/>be required from the user, but there is no support for
                                <break/>constructing or editing such a document</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">5a. The repository should use a consistent
                                <break/>metadata schema to describe its content</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Impressive range of metadata schemes</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">5b. The repository should allow a customised
                                <break/>metadata schema to be applied</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">A specific metadata schema for clinical research could be
                                <break/>implemented</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">5c. The repository should provide tools to help
                                <break/>data generators to complete metadata fields</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Range of tools available</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">5d. The repository should make metadata
                                <break/>openly (publicly) available</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Metadata are public</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">6a. The repository should be able to apply a
                                <break/>primary persistent identifier system</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Use of CNRI Handle System</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">6b. The repository should be able to use other
                                <break/>persistent identifiers as appropriate</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Use of other identifiers allowed (e.g. DOI)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">7a. The repository should allow open access to
                                <break/>material, with an optional embargo period</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Sophisticated embargo management as well as full open
                                <break/>access</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">7b. The repository should allow open access
                                <break/>after web-based self-attestation of the user</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Not demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not currently possible</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">7c. The repository should offer managed
                                <break/>access through group membership</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Access through group membership is supported
                                <break/>(privileged users)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">7d. The repository should offer managed
                                <break/>access through application on a case by case
                                <break/>basis</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Possible with the &#x2018;request a copy&#x2019; functionality, but could be
                                <break/>extended further</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">7e. The repository should support granular
                                <break/>access to different parts of datasets collections</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Permissions can be assigned to a privileged user at the
                                <break/>item, community and collection level</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">8a. The repository should support long term
                                <break/>preservation of data and metadata</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Demonstrated as far as it is a technical issue</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">8b. The repository should make use of
                                <break/>sustainable software systems</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Demonstrated</bold>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Long-term availability and maintenance of system expected</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <table-wrap id="T3" orientation="portrait" position="anchor">
                <label>Table 3. </label>
                <caption>
                    <title>Comparison between the Banzi quality criteria
                        <sup>
                            <xref ref-type="bibr" rid="ref-2">2</xref>
                        </sup> and the other approaches.</title>
                    <p>Grey: not considered by the Banzi criteria
                        <sup>
                            <xref ref-type="bibr" rid="ref-2">2</xref>
                        </sup>.</p>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Criterion
                                <break/>(Banzi 
                                <italic toggle="yes">et al.,</italic> 2019)
                                <sup>
                                    <xref ref-type="bibr" rid="ref-2">2</xref>
                                </sup>
                            </th>
                            <th align="left" colspan="1" rowspan="1" valign="top">ICSU World Data System (2016)
                                <sup>
                                    <xref ref-type="bibr" rid="ref-9">9</xref>
                                </sup>
                            </th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Burton 
                                <italic toggle="yes">et al.</italic> (2015)
                                <sup>
                                    <xref ref-type="bibr" rid="ref-8">8</xref>
                                </sup>
                            </th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Science Europe
                                <break/>(2018)
                                <sup>
                                    <xref ref-type="bibr" rid="ref-10">10</xref>
                                </sup>
                            </th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Hrynaszkiewicz 
                                <italic toggle="yes">et al.</italic>
                                <break/>(2016)
                                <sup>
                                    <xref ref-type="bibr" rid="ref-7">7</xref>
                                </sup>
                            </th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Guidelines for data
                                <break/>upload and storage</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R9: The repository applies
                                <break/>documented processes and
                                <break/>procedures in managing archival
                                <break/>storage of the data
                                <break/>R12: Archiving takes place
                                <break/>according to defined workflows
                                <break/>from ingest to dissemination</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">De-identification
                                <break/>practices before
                                <break/>upload</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R6: The repository adopts
                                <break/>mechanisms to secure ongoing
                                <break/>expert guidance and feedback
                                <break/>(either inhouse, or external,
                                <break/>including scientific guidance, if
                                <break/>relevant), which could also cover
                                <break/>requirements or guidelines related
                                <break/>to de-identification of uploaded
                                <break/>data.</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Control of quality of
                                <break/>data</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R11: The repository has
                                <break/>appropriate expertise to address
                                <break/>data and metadata quality and
                                <break/>ensures that sufficient information
                                <break/>is available for end users to make
                                <break/>quality-related evaluations</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">C6: Quality assurance
                                <break/>and control
                                <break/>C7: Curation and
                                <break/>archiving</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Formal contract
                                <break/>regarding upload and
                                <break/>storage</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R2: The repository maintains all
                                <break/>applicable licences covering data
                                <break/>access and use and monitors
                                <break/>compliance (including conditions
                                <break/>of use)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">C4: Transparent and
                                <break/>accountable; all policies
                                <break/>and written agreements
                                <break/>underpinning a
                                <break/>repository&#x2019;s processes
                                <break/>for data management
                                <break/>(including any legal
                                <break/>contracts) should be
                                <break/>properly documented</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">3. Data access
                                <break/>and usage
                                <break/>licenses; provide
                                <break/>information about
                                <break/>licensing and
                                <break/>permissions</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Implement data use
                                <break/>agreements (DUAs).</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Application of a
                                <break/>metadata schema to
                                <break/>describe contents</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R8: The repository accepts data
                                <break/>and metadata based on defined
                                <break/>criteria to ensure relevance and
                                <break/>understandability for data users</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">C5: Data and metadata
                                <break/>fidelity</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">2. Metadata;
                                <break/>use metadata
                                <break/>standards that are
                                <break/>broadly accepted
                                <break/>(by the scientific
                                <break/>community)</td>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Application of an
                                <break/>identifier</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td align="left" colspan="1" rowspan="1" valign="top">1. Provision of
                                <break/>persistent and
                                <break/>unique identifiers
                                <break/>(PIDs)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Provide stable identifiers
                                <break/>for metadata about non-
                                <break/>public dataset(s)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Flexibility of access</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Repository long-term
                                <break/>preservation</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R3: The repository has a continuity
                                <break/>plan to ensure ongoing access and
                                <break/>preservation of its holdings
                                <break/>R10: The repository assumes
                                <break/>responsibility for long-term
                                <break/>preservation and manages
                                <break/>this function in a planned and
                                <break/>documented way</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">C8: Reliable availability
                                <break/>including backup
                                <break/>C10: Preserve
                                <break/>confidentiality, integrity
                                <break/>and availability of
                                <break/>the repository</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">4. Preservation;
                                <break/>ensure persistence
                                <break/>of metadata and
                                <break/>data</td>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Transparency and
                                <break/>accountability</td>
                            <td colspan="1" rowspan="1"/>
                            <td align="left" colspan="1" rowspan="1" valign="top">C4: Transparent and
                                <break/>accountable</td>
                            <td colspan="1" rowspan="1"/>
                            <td align="left" colspan="1" rowspan="1" valign="top">Implement a transparent
                                <break/>system for requesting
                                <break/>access to data and
                                <break/>reviewing requests to
                                <break/>access data</td>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Timely management</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td align="left" colspan="1" rowspan="1" valign="top">Allowing access to data
                                <break/>in a timely manner and
                                <break/>including a proportionate
                                <break/>review of the scientific
                                <break/>rationale, without
                                <break/>introducing unnecessary
                                <break/>barriers</td>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Metadata repository</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td align="left" colspan="1" rowspan="1" valign="top">2. Enabling
                                <break/>referencing to
                                <break/>related relevant
                                <break/>information, such
                                <break/>as other data and
                                <break/>publications</td>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Data versioning</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td align="left" colspan="1" rowspan="1" valign="top">1. Support of data
                                <break/>versioning</td>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Audit of repositories</td>
                            <td colspan="1" rowspan="1"/>
                            <td align="left" colspan="1" rowspan="1" valign="top">C9: Effective audits</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Adequate funding and
                                <break/>staff</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R5: The repository has adequate
                                <break/>funding and sufficient members of
                                <break/>qualified staff managed through
                                <break/>a clear system of governance to
                                <break/>effectively carry out the mission</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Discoverability</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R13: The repository enables users
                                <break/>to discover the data and refer to
                                <break/>them in a persistent way through
                                <break/>proper citation</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Technical
                                <break/>infrastructure</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R15: The repository functions on
                                <break/>well-supported operating systems
                                <break/>and other core infrastructural
                                <break/>software and is using hardware and
                                <break/>software technologies appropriate
                                <break/>to the services it provides to its
                                <break/>designated community.
                                <break/>R16: The technical infrastructure
                                <break/>of the repository should provide
                                <break/>for protection of the facility and its
                                <break/>data, product, services, and users</td>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                        <tr style="background-color:#BFBFBF">
                            <td align="left" colspan="1" rowspan="1" valign="top">Authenticity and
                                <break/>integrity of the data</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">R7: The repository guarantees the
                                <break/>integrity and authenticity of the
                                <break/>data</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">C10: Preserve
                                <break/>confidentiality, integrity
                                <break/>and availability of the
                                <break/>repository</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Enable data
                                <break/>authenticity and
                                <break/>integrity</td>
                            <td colspan="1" rowspan="1"/>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <sec>
                <title>Guidelines for data upload and storage</title>
                <p>DSpace exhibits a flexible approach to file storage by supporting a range of file types and metadata schemas (1a demonstrated). With a variety of tools available, along with detailed technical guidance, it also provides mechanisms for upload of files, including instructions (1b demonstrated).</p>
            </sec>
            <sec>
                <title>De-identification practices before upload</title>
                <p>The DSpace system has no published requirements or guidelines relating to the de-identification of uploaded data. It is the submitter&#x2019;s responsibility to ensure that documents are consistent with current standards, guidelines and policies from official bodies and scientific organisations. The submitter is, however, able to use links to requirements, guidelines and/or tools, if these are established by the system&#x2019;s administrator (2a partially demonstrated). As far as we could tell, neither the DSpace repository system nor the user community has implemented de-identification tools or programs able to perform and document de-identification on an existing dataset (2b not demonstrated). That said, it is worth noting that, should such support tools be created, DSpace does provide a task management system (known as the 'Curation System') in which such tools can be integrated and configured.</p>
            </sec>
            <sec>
                <title>Control of quality of data</title>
                <p>The control of the quality of data is more a question of procedures and workflow around a repository than of the technical features available in a particular system. Nevertheless, there are some technical features that could facilitate a quality control workflow. Some of these features are available within DSpace, usually as optional and configurable additions to the data upload process, but they are limited to a predefined review workflow. This covers a single-reviewer workflow, a collection&#x2019;s workflow steps and a score review workflow. This is certainly an important feature but does not correspond to a full quality-controlled process, which would need additional features such as monitoring and tracking of uploads, rejections and edits, and reports about reviews in progress and completed (3a partially demonstrated).</p>
            </sec>
            <sec>
                <title>Formal contract regarding upload and storage</title>
                <p>A formal data transfer contract signed by the data generator and the repository administrator should be a prerequisite for transferring clinical trial data to a repository, not least to clarify potential legal responsibilities under data protection legislation. At the end of the manual submission process in DSpace, the submitter (data generator) is asked to grant the repository service an appropriate distribution licence (different licences can be made available to different user communities). The distribution licence can be edited or customised; however, the platform does not provide a user interface to do this easily. Agreeing a distribution licence is not the same, however, as enforcing a data transfer agreement. Confirmation of the existence of a signed data transfer protocol can be required from the user, i.e. integrated within the distribution licence, if implemented. The demonstrator repository is not, however, able to provide support for constructing and editing such a document (4a partially demonstrated).</p>
            </sec>
            <sec>
                <title>Application of a metadata schema to describe contents</title>
                <p>DSpace can support multiple extended metadata schemas for describing an item. A qualified Dublin Core metadata schema is provided by default. Multiple schemas can be configured, and metadata fields selected from a mix of configured schemas to describe items (5a demonstrated). In addition, a new metadata schema can be created. In the demonstrator repository, the ECRIN Clinical Trial Metadata Schema was created
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup> (5b demonstrated). DSpace has several tools to help data generators export content and metadata, ingest content and metadata, and batch edit metadata (5c demonstrated). DSpace offers OAI-PMH, a standard protocol for metadata harvesting. Metadata are public in DSpace. Communities, Collections and Items are discoverable in the browse and search systems regardless of read authorisation. Therefore, everyone can access metadata of items openly (5d demonstrated).</p>
            </sec>
            <sec>
                <title>Application of an identifier</title>
                <p>DSpace uses the CNRI Handle System primarily to create a persistent identifier for every object (item, collection and community) stored in the system (6a demonstrated). DSpace also allows other persistent identifiers, such as a digital object identifier (DOI), to be applied to data sets to improve discoverability and to allow correct citation in DSpace. This is in parallel to the Handle System (6b demonstrated).</p>
            </sec>
            <sec>
                <title>Flexibility of access</title>
                <p>DSpace has sophisticated embargo management as well as full open access. Embargo settings allow submitters to define embargoes linked to specific dates, which by default are applied to all anonymous (non-administrator) access requests. Advanced embargo settings can be used to apply (or exclude) embargo policies for particular user groups (7a demonstrated). The DSpace system supports several common authentication systems, but web-based self-attestation is not supported (7b not demonstrated). In this context the term 'self-attestation' refers to a registration-like process where the user first has to provide information about themselves, including their contact details, and give details of the purpose for which they intend to use the data, together with any other information required by the data managers. Email details would then normally be verified (by clicking on a validation link sent to the address provided) before access would be granted.</p>
                <p>Resources can be made available only to certain "privileged" users, and this functionality allows access through group membership to be implemented (7c demonstrated). The &#x2018;request a copy&#x2019; functionality exists in DSpace to facilitate access in cases when uploaded content is not openly shared. With this feature, the data submitter or owner interacts directly with the requester on a case-by-case basis. More complex request evaluation processes, for example involving a data access committee, are not directly supported in DSpace, though they could in theory be integrated into any dialogue between the requester and the data submitter (7d demonstrated). The DSpace administrator can assign permissions to a privileged user at the item, community and collection level, allowing granular access to different parts of dataset collections (7e demonstrated).</p>
            </sec>
            <sec>
                <title>Long-term preservation and sustainability</title>
                <p>These are two related issues, one dealing with the preservation of the data in the long term, the second with the sustainability of the repository itself. A repository&#x2019;s longevity will mainly be dependent on resourcing and institutional commitment and, given the inevitable uncertainties around both of these, a clear policy about what should happen to data if a repository is closed would be a requirement for most potential users. At a technical level, however, DSpace provides some support for long-term preservation mechanisms, e.g. checksums can be applied and verified on all items. It can also be integrated with the open source archiving system Archivematica, allowing the generation of system-independent Archival Information Packages (AIPs)
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup> (8a, in so far as it is a technical issue, demonstrated). DSpace also claims to have implemented a strategic plan for sustainability. Because it uses open technology and is widely disseminated and used, with a large user community and many diverse applications, the long-term availability and maintenance of the system is expected, if not guaranteed (8b demonstrated).</p>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <sec>
                <title>Assessment of the demonstrator repository</title>
                <p>The performance in supporting sensitive personal data such as that from clinical trials was strong, with 14 requirements demonstrated (74%). This included strong support for different file types and metadata systems, a range of access control systems, including embargoes and granular access management, an integrated persistent identifier scheme plus support for other identifiers like DOIs, and good support for data management in the long-term.</p>
                <p>Of the two areas that were not demonstrated at all, the first &#x2013; the inability to incorporate de-identification tools in the submission workflow &#x2013; is arguably an over-ambitious requirement. Although general techniques certainly exist for de-identification, this should normally be an exercise that is planned, documented and tested on a study-by-study basis, rather than an automatic process. Having links available to de-identification resources is probably a more realistic requirement.</p>
                <p>The second missing requirement, the lack of a self-attestation system, is a feature that some data generators might want to use, as it requires much less administrative overhead than setting up access rights for groups and individuals. It would require an administrator to define the fields required for self-attestation and, like the current user registration process, it could be backed up by a system requiring confirmation of the email address given. Given the range of other access options available in DSpace it may not be a serious omission, but it is a missing feature that would be &#x2018;nice to have&#x2019;.</p>
                <p>Of the three areas that were partially demonstrated, the need for repository managers to establish links to de-identification and other tools, rather than have them built into the system, may represent an additional task but it is one that should be relatively easy to do. It can also be argued that this approach is more flexible, and easier to keep up to date, than a set of links integrated into the system.</p>
                <p>The second partially demonstrated area related to quality control. The submission workflow allows for up to three review stages, which is good, but few other elements of quality control and monitoring seemed to be built into the system. For repository managers handling sensitive data, it would be useful to have reports on upload and access or access request activity, and the ability to integrate checklists of required features or information (such as de-identification status, metadata completeness, access types allowed or identifiers applied), as might be applied during the review process, to tag on the data itself (i.e. within internal system metadata). This would allow the status of the data in the repository to be better monitored and potential issues with data quality and/or legal issues to be more quickly identified.</p>
                <p>The third partially demonstrated issue related to data transfer agreements, governing the terms of data upload and storage. Sensitive data requires more than a simple upload to a repository because, unless the data is fully anonymised, there are likely to be legal issues that need to be clarified, for instance exactly which institution is acting as the Data Controller, as that term is defined in the General Data Protection Regulation (GDPR). (At the very least, the legal status of the data needs to be clear, i.e. does it fall under data protection legislation, and if so which, or is it exempt from such consideration because of the way it has been prepared.) In addition, there may be questions about who is responsible for versioning data if it is changed and for paying any associated costs, about the access management required, and about who needs to review access requests if access is managed (etc.). These considerations go well beyond any general agreement whereby data generators simply grant the repository the right to make their data available under a selected licence &#x2013; and for sensitive data they may need to be considered on a study-by-study basis.</p>
                <p>It would therefore be very useful if &#x2013; as a configured option &#x2013; the system could enforce a clear check that such a data transfer agreement was in place, preferably with the date of its application. (At the moment that seems possible, but a rather complex workaround is required.) It would be even better if the system could also indicate where the data transfer agreement was stored and link to it, or even display its provisions within the system. Ideally, a mature system would even allow the agreement to be drafted and agreed within the system, as part of a private interchange between the data uploader and the repository.</p>
            </sec>
            <sec>
                <title>Weaknesses of the study</title>
                <p>A limitation of the study is that it focuses only on the eight repository features defined in Banzi 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>. Other quality features not considered here may also be very important, for example good data security. This study should therefore be seen as a starting point, which will need further extension, perhaps using alternate approaches and systems (see next section).</p>
                <p>We focused on attributes that we thought were particularly important for clinical trial and similar data. Aspects of quality for data repositories that have been cited by other authors, but which have not been explicitly considered in our approach include: 
                    <list list-type="bullet">
                        <list-item>
                            <p>Transparency and accountability</p>
                        </list-item>
                        <list-item>
                            <p>Timely management</p>
                        </list-item>
                        <list-item>
                            <p>Metadata repository</p>
                        </list-item>
                        <list-item>
                            <p>Data versioning</p>
                        </list-item>
                        <list-item>
                            <p>Auditing of repositories</p>
                        </list-item>
                        <list-item>
                            <p>Adequate funding and staff</p>
                        </list-item>
                        <list-item>
                            <p>Discoverability</p>
                        </list-item>
                        <list-item>
                            <p>Technical infrastructure</p>
                        </list-item>
                    </list>
                </p>
                <p>Transparency and accountability have been referenced by Hrynaszkiewicz 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup> and by Burton 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>. Hrynaszkiewicz 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup> also call for access to data in a timely manner, with a proportionate review of the scientific rationale and without unnecessary barriers. Science Europe supports the idea of a metadata repository that enables referencing of related relevant information, such as other data and publications, and asks for support of data versioning
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. Effective audits are proposed by Burton 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>. The ICSU World Data System requires that the repository has adequate funding and sufficient numbers of qualified staff, managed through a clear system of governance, to effectively carry out its mission, and that the repository enables users to discover the data and refer to them in a persistent way through proper citation
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>. The ICSU World Data System
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup> requires that the repository functions on well-supported operating systems and other core infrastructural software and uses hardware and software technologies appropriate to the services it provides to its designated community. In addition, the technical infrastructure of the repository should provide for protection of the facility and its data, products, services, and users
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>. The need to try and integrate these different approaches to assessing data repositories is discussed in the next section.</p>
                <p>Another weakness of the study is that the assessment of the quality criteria is (necessarily) subjective &#x2013; the criteria are not quantitative. In our approach, a rather simple scale based upon &#x201c;demonstrated&#x201d;, &#x201c;partially demonstrated&#x201d; and &#x201c;not demonstrated&#x201d; was used. The definition of the different categories may not have been precise enough to give an accurate representation of the repository&#x2019;s functioning.</p>
                <p>Finally, there may be an issue related to the sources and completeness of the information used. We only took publicly available information about DSpace into consideration (web pages, user manuals, Q&amp;A pages, reports, etc.). We did not contact DSpace directly and were not in contact with their developers. We did, however, participate in a meeting of the German user community and had discussions with a DSpace user. It should be noted that transparency has been formulated as one of the main principles for trusted repositories: &#x201c;In order to select the most appropriate repository for a particular use case, all potential users benefit from being able to easily find and access information on the scope, target user community, policies, and capabilities of the data repository.&#x201d;
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. As a consequence, publicly available information should be sufficient for a basic assessment of a repository.</p>
            </sec>
            <sec>
                <title>Approaches and systems for assessing the quality of repositories</title>
                <p>There are overarching general principles that address data management and data repositories at a very high level. The FAIR principles state that data should be Findable, Accessible, Interoperable and Reusable
                    <sup>
                        <xref ref-type="bibr" rid="ref-14">14</xref>
                    </sup>. The TRUST principles formulate guidance for digital repositories of research data with a focus on Transparency, Responsibility, User focus, Sustainability and Technology
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. Concrete guidelines, recommendations and best practice for data sharing and for trusted repositories should follow these principles and provide practical help for implementing them.</p>
                <p>Different approaches have been used to assess the quality of repositories dedicated to data sharing, both of sensitive data and more generally, with different emphases laid upon different features. For instance, Hrynaszkiewicz 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup> proposed additional features for data repositories to better accommodate non-public clinical datasets, including Data Use Agreements, whilst Burton 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup> introduced the term &#x201c;Data Safe Haven&#x201d;, for sensitive data, and provided 12 criteria that characterised such a haven.</p>
                <p>The Core Trustworthy Data Repositories Requirements
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup> are intended to reflect the characteristics of trustworthy repositories (for all types of data). All requirements are mandatory and are equally weighted, standalone items. Although some overlap is unavoidable, duplication of the evidence sought among requirements has been kept to a minimum where possible. The choices contained in the supplied checklists (e.g., repository type and curation level) are not considered to be comprehensive, and additional space is provided in all cases for the applicant to add &#x2018;other&#x2019; (more idiosyncratic) information. This and any comments given may then be used to refine such lists in the future. The CoreTrustSeal Board offers all interested data repositories a core-level certification based on the DSA&#x2013;WDS Core Trustworthy Data Repositories Requirements catalogue and procedures
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>.</p>
                <p>One initiative of Science Europe
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup> was to develop a set of core requirements for data management plans (DMPs), as well as a list of criteria for the selection of trustworthy repositories where researchers can store their data for sharing. The different approaches are compared in 
                    <xref ref-type="table" rid="T3">Table 3</xref>. In light of the development of the European Open Science Cloud (EOSC) and the increasing pressure for data sharing, these requirements and criteria should help to harmonise rules on data management throughout Europe. This will aid researchers in complying with research data management requirements even when working with different research funders and research organisations.</p>
                <p>In general, it may be necessary to better distinguish between criteria that are properties of the underlying infrastructure (e.g. staff preparation, physical security, logical security, appropriate technology) and those which are more tightly coupled to a specific repository system. In fact, we would suggest that there are three (overlapping) &#x2018;layers&#x2019; of attributes that need to be considered &#x2013; those associated with the underlying organisational infrastructure, those linked to the repository&#x2019;s technical systems, and those derived from procedures and workflows. Future attempts to assess the quality of repositories should perhaps consider these layers more explicitly. In this study we focused on the &#x2018;system&#x2019; attributes, but a broader description and assessment of a demonstrator repository should examine all three aspects, perhaps across each of the three main functional areas of a data repository, i.e. data upload, data storage and data access.</p>
                <p>None of the approaches described above is sufficient to classify the quality of repositories for clinical trial data, as pointed out by Banzi 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>. It may be that we need to differentiate criteria that should apply to all or most data repositories from those that only apply, or become more significant, in the context of particular types of data, like IPD. A general assessment, and especially a general &#x2018;score&#x2019;, of repositories may therefore be less meaningful than an assessment for particular types of data or data usage. Despite these difficulties we believe that it would be useful to try and achieve a consensus about what &#x2018;quality&#x2019; means in terms of data repositories, in different contexts, both to support repository managers and to help guide and promote their use by researchers.</p>
            </sec>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusion</title>
            <p>We assessed the suitability of DSpace to support a repository of sensitive data, such as that from clinical trials, using quality criteria that we had previously identified as being critical to managing such data. Technically, the system was able to support most of the features required, including the necessary support for metadata and identifiers, though there are areas &#x2013; for instance explicit support of data transfer agreements &#x2013; where support could be improved. Of course, in a production repository, appropriate policies and procedures would be needed to direct the use of the available technical features. A technical evaluation should therefore be seen as indicating a system&#x2019;s potential, rather than being a definitive assessment of its suitability. DSpace clearly has considerable potential in this context and appears to be a suitable base for further exploration of the issues around storing sensitive data.</p>
            <p>This work should stimulate the discussion about quality assessment and certification of repositories. The discussion is of particular importance for repository managers as well as standardising organisations in the field (e.g. Data Seal of Approval). Another target group is researchers wishing to deposit data in a repository, who have an interest in the repository fulfilling defined quality criteria.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>All data underlying the results are available as part of the article and no additional source data are required.</p>
            <p>The ECRIN demonstrator repository for clinical trial data: 
                <ext-link ext-link-type="uri" xlink:href="http://90.147.75.211:8080/xmlui/">http://90.147.75.211:8080/xmlui/</ext-link>
            </p>
            <p>Additional information on the CORBEL project is available on the CORBEL website (
                <ext-link ext-link-type="uri" xlink:href="https://www.corbel-project.eu/home.html">https://www.corbel-project.eu/home.html</ext-link>).</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>The authors wish to thank Serena Battaglia (ECRIN) for technical and organisational support of the study, Stefan Klein (Biomedical Imaging Group Rotterdam, Erasmus University, Netherlands) for support in handling images with XNAT, and the Istituto Nazionale di Fisica Nucleare (INFN) (Italy) for server and services support.</p>
        </ack>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ohmann</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Banzi</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Canham</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Sharing and reuse of individual participant data from clinical trials: principles and recommendations.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>2017</year>;<volume>7</volume>(<issue>12</issue>):<fpage>e018647</fpage>.
                    <pub-id pub-id-type="pmid">29247106</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2017-018647</pub-id>
                    <pub-id pub-id-type="pmcid">5736032</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Banzi</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Canham</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kuchinke</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluation of repositories for sharing individual-participant data from clinical studies.</article-title>
                    <source>

                        <italic toggle="yes">Trials.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>(<issue>1</issue>):<fpage>169</fpage>.
                    <pub-id pub-id-type="pmid">30876434</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13063-019-3253-3</pub-id>
                    <pub-id pub-id-type="pmcid">6420770</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Austin</surname>
                            <given-names>CC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fong</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Research data repositories: Review of current features, gap analysis, and recommendations for minimum requirements.</article-title>
                    <source>

                        <italic toggle="yes">IASSIST Quarterly.</italic>
</source> Winter <year>2015</year>;<volume>39</volume>(<issue>4</issue>):<fpage>24</fpage>.
                    <pub-id pub-id-type="doi">10.29173/iq904</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <collab>OECD Global Science Forum</collab>:
                    <article-title>Business models for sustainable research data repositories</article-title>. OECD Science Technology and Industry Policy Paper No. 47,<year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=DSTI/STP/GSF(2017)1/FINAL&amp;docLanguage=En">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ohmann</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Canham</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Banzi</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Classification of processes involved in sharing individual participant data from clinical trials [version 2; peer review: 3 approved].</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2018</year>;<volume>7</volume>:<fpage>138</fpage>.
                    <pub-id pub-id-type="pmid">29623192</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.13789.2</pub-id>
                    <pub-id pub-id-type="pmcid">5861517</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Banzi</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Canham</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ohmann</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Report about a workshop on sensitive data: Repositories for sharing individual participant data from clinical trials and existing tools/services for storing clinical trial data (Version 1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2018</year>; Accessed 15 April 2020.
                    <pub-id pub-id-type="doi">10.5281/zenodo.1438261</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hrynaszkiewicz</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Khodiyar</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andrew</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Publishing descriptions of non-public clinical datasets: proposed guidance for researchers, repositories, editors and funding organisations.</article-title>
                    <source>

                        <italic toggle="yes">Res Integr Peer Rev.</italic>
</source>
                    <year>2016</year>;<volume>1</volume>:<fpage>6</fpage>.
                    <pub-id pub-id-type="doi">10.1186/s41073-016-0015-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Burton</surname>
                            <given-names>PR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Murtagh</surname>
                            <given-names>MJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Boyd</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Data Safe Havens in Health Research and Healthcare.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2015</year>;<volume>31</volume>(<issue>20</issue>):<fpage>3241</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="pmid">26112289</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv279</pub-id>
                    <pub-id pub-id-type="pmcid">4595892</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <collab>ICSU, World Data System</collab>:
                    <article-title>Core Trustworthy Data Repositories Requirements</article-title>.<year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.coretrustseal.org/wp-content/uploads/2017/01/Core_Trustworthy_Data_Repositories_Requirements_01_00.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <collab>Science Europe</collab>:
                    <article-title>Practical guide for the international alignment of research data management</article-title>.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.scienceeurope.org/our-resources/practical-guide-to-the-international-alignment-of-research-data-management/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Canham</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ohmann</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>ECRIN Clinical Research Metadata Schema Version 2 (April 2018).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2018</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.1312539</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <article-title>Archivematica - DSpace exports</article-title>. Accessed 17 June 2020.
                    <ext-link ext-link-type="uri" xlink:href="https://www.archivematica.org/en/docs/archivematica-1.11/user-manual/transfer/dspace/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Crabtree</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dillo</surname>
                            <given-names>I</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The TRUST Principles for digital repositories.</article-title>
                    <source>

                        <italic toggle="yes">Sci Data.</italic>
</source>
                    <year>2020</year>;<volume>7</volume>(<issue>1</issue>):<fpage>144</fpage>.
                    <pub-id pub-id-type="pmid">32409645</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41597-020-0486-7</pub-id>
                    <pub-id pub-id-type="pmcid">7224370</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wilkinson</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dumontier</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aalbersberg</surname>
                            <given-names>IJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The FAIR Guiding Principles for scientific data management and stewardship.</article-title>
                    <source>

                        <italic toggle="yes">Sci Data.</italic>
</source>
                    <year>2016</year>;<volume>3</volume>:<fpage>160018</fpage>.
                    <pub-id pub-id-type="pmid">26978244</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sdata.2016.18</pub-id>
                    <pub-id pub-id-type="pmcid">4792175</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report65528">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.27604.r65528</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Martinez-Garcia</surname>
                        <given-names>Agustina</given-names>
                    </name>
                    <xref ref-type="aff" rid="r65528a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-1440-5829</uri>
                </contrib>
                <aff id="r65528a1">
                    <label>1</label>University of Cambridge, Cambridge, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>6</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Martinez-Garcia A</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport65528" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.23468.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>No further comments: the authors have addressed suggestions and amendments in this version.</p>
            <p> There is only a minor typo: the sentence "A formal comparison between the systems" in the Methods section appears twice.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Digital archiving, digital repository platforms, research data management</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report63998">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.25899.r63998</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Baxter</surname>
                        <given-names>Rob</given-names>
                    </name>
                    <xref ref-type="aff" rid="r63998a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-3693-8725</uri>
                </contrib>
                <aff id="r63998a1">
                    <label>1</label>EPCC, University of Edinburgh, Edinburgh, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>2</day>
                <month>6</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Baxter R</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport63998" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.23468.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The article describes a useful experiment in trialling DSpace as a candidate repository for potentially sensitive clinical trial data. The assessment criteria used focussed on the &#x201c;system&#x201d; level, keeping the scope manageable, and map well onto more formal existing frameworks. The conclusions that DSpace is not a bad place to start &#x2014; necessary but insufficient &#x2014; are sound and offer a useful guide to people faced with similar challenges in enabling the sharing of sensitive data. I have a few specific observations around methods and analysis, noted below.</p>
            <p> </p>
            <p> While geared more towards open data, the FAIR principles (
                <ext-link ext-link-type="uri" xlink:href="https://www.force11.org/fairprinciples">https://www.force11.org/fairprinciples</ext-link>) are an increasingly important set of criteria for research data repos and complement some of the approaches in Table 3. Perhaps they could be added to the mapping?</p>
            <p> </p>
            <p> There is no mention of encryption in the 8 assessment criteria, but encryption is hinted at in the software config (&#x201c;PostgreSQL 9.5 (with pgcrypto installed) as the relational database back end&#x201d;). For a repo system handling sensitive data, I&#x2019;d like to see encryption at rest and encryption in flight as two additional criteria. Perhaps this is implicit in the experiment (the pgcrypto extension offers a tantalising hint!), and if so it&#x2019;s worth making it explicit. If encryption 
                <italic>wasn&#x2019;t</italic> considered as a criterion, it&#x2019;s worth adding an explanation: certainly encrypting archive data is controversial &#x2014; what if you lose the keys? &#x2014;&#x00a0;but an Internet-accessible database of sensitive data is a worrying thing to have exposed unencrypted.</p>
            <p> </p>
            <p> General, automatic de-identification of data is hard, as I&#x2019;m sure the authors are fully aware! While they do cover de-identification support (or rather, the lack of it) in DSpace, I wonder if they would like to comment on whether they would regard some form of basic personally-identifiable data quality checking as a &#x201c;must&#x201d; for repository systems dealing with sensitive data? (Looking for names, addresses, email addresses, etc. in submissions.) How easy would it be for an absent-minded researcher to upload PII into DSpace and make it publicly readable by default? Should the assessment criteria be tighter here? Perhaps this is food for future work.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Development &amp; provision of large-scale data services for both open and sensitive data (the Edinburgh International Data Facility, the Scottish National Safe Haven).</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <sub-article article-type="response" id="comment5643-63998">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ohmann</surname>
                            <given-names>Christian</given-names>
                        </name>
                        <aff>ECRIN, Germany</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>22</day>
                    <month>6</month>
                    <year>2020</year>
                </pub-date>
            </front-stub>
            <body>
                <p>A reference to the FAIR and TRUST principles was included in the paper.</p>
                <p>It was explained why security features and encryption were not considered in detail in the paper.</p>
                <p>The recommendation about exploring basic personally-identifiable data quality checking as a "must" for repository systems will be followed up in future work.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report62923">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.25899.r62923</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Martinez-Garcia</surname>
                        <given-names>Agustina</given-names>
                    </name>
                    <xref ref-type="aff" rid="r62923a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-1440-5829</uri>
                </contrib>
                <aff id="r62923a1">
                    <label>1</label>University of Cambridge, Cambridge, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>5</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Martinez-Garcia A</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport62923" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.23468.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>Summary</bold>
            </p>
            <p> This is the review of the research paper &#x201c;Assessment of a demonstrator repository for individual clinical trial data built upon the DSpace open source platform&#x201d;. The paper describes the assessment and implementation of a repository demonstrator, for the storage and dissemination of clinical trial data, with a particular focus on Individual Participant Data (IPD). The developed demonstrator is built upon the open source and community developed DSpace repository platform (
                <ext-link ext-link-type="uri" xlink:href="https://duraspace.org/dspace/">https://duraspace.org/dspace/</ext-link>). This repository platform is data agnostic and can be used to both serve fully open content and content that requires some form of managed access. This paper will be very useful to those looking at evaluating repository platforms for archiving and disseminating research data more generally.</p>
            <p> The paper focuses on describing the technical criteria used for assessing the suitability of this platform for the storage and dissemination of clinical data, although a good overview of other operational aspects such as the development of guidelines, data deposition rules and quality review in the context of repository submission workflows, is also described. It also includes a summary of the technical requirements (software dependencies and deployment infrastructure) which can be useful to others evaluating the use of this repository platform for the storage and dissemination of research data.</p>
            <p> 
                <bold>Research methodology</bold>
            </p>
            <p> Overall, the paper includes sufficient details about the methods and analysis undertaken. The authors have explored recent studies in the area, i.e. the suitability assessment presented builds upon a previous study looking at a range of existing repository platforms for sharing clinical trial data, and sensitive data more broadly. The results from this study are the basis for selecting the DSpace platform. In this respect, and although the authors include references to materials where the rationale for selecting this platform is presented, it would have been useful to include a summary table outlining key criteria and some details of the other platforms evaluated. The paper only mentions other platforms (e.g. Figshare, or Zenodo) in passing.</p>
            <p> One strength of the paper is that the authors reflect on and present the perceived weaknesses of their study. However, and given the sensitive nature of the data underpinning clinical trials, I found it quite surprising that data security features were not included as part of the key criteria defined for this initial assessment, given that not meeting these criteria could impact the suitability of this platform for the archival of clinical data. The authors acknowledge this weakness of their study and state that criteria relating to data security should be considered in future extensions of the study. As part of future assessment, the authors should consider looking at robust security testing of the platform, such as performing penetration testing.</p>
            <p> Another weakness of the study, even though the authors acknowledge it in the paper, is that they have only evaluated openly available documentation for the DSpace platform. Such documentation can often be incomplete in community-based projects, owing to potential lack of resources. More detail about why they took this approach would have been useful. Moreover, and given that DSpace is a very popular platform within the academic community as acknowledged in the paper, the authors could have informally contacted other institutions currently using the platform to find out more about their experiences of the platform when put to similar uses, and their opinion on the platform&#x2019;s strengths and weaknesses.</p>
            <p> 
                <bold>Content review</bold>
            </p>
            <p> The paper reads very well, and the content structure is appropriate. The &#x201c;Introduction&#x201d; section sets the scene nicely and provides sufficient background information, with relevant and current literature references. One minor observation is that, when authors introduce the work of a dedicated taskforce addressing the problem of current forms of sharing clinical data, and propose to use data repositories, there is no mention of the importance of gaining consent for data archiving, sharing and re-use from research participants. This is a key barrier to data sharing, and one that we encounter as providers of Research Data Management Services, when researchers wish to deposit their data with our Institutional Repository.</p>
            <p> The &#x201c;Methods&#x201d; section is&#x00a0;well developed: the &#x201c;Technical infrastructure for the demonstrator repository&#x201d; section provides useful details for those seeking to use similar platforms; and sufficient information is provided so that a similar assessment can be performed on other platforms, or for study replication (even though the analysis is partially qualitative). As mentioned earlier, it would have been useful to include a summary table outlining key criteria and some details of the other platforms evaluated for completeness.</p>
            <p> In the &#x201c;Assessment of quality criteria for the reference implementation&#x201d;, the paragraph beginning with &#x201c;To promote interoperability &#x2026;&#x201d; is a bit unclear and contradictory. It mentions the importance of using metadata standards for describing, structuring and formatting content, which I agree is very important; but they have excluded them as part of the assessment criteria. In particular, the sentence &#x201c;Here we focus on standards for metadata&#x201d; is very confusing as the examples given earlier all refer to metadata standards. Is the sentence intended to mean that the study is only concerned with metadata standards and does not consider data format standards?</p>
            <p> The &#x201c;Results&#x201d; section reads very well and is clear. The summary table together with the different criteria-based subsections include relevant, high-level information about the technical assessment that has been performed. With respect to requirement 2a around de-identification tools, perhaps it is worth mentioning that, although not specifically implemented by the community, the DSpace platform does have a mechanism / framework in place (i.e. curation system) that allows for easy integration of such tools within DSpace&#x2019;s standard submission workflows (see&#x00a0;
                <ext-link ext-link-type="uri" xlink:href="https://wiki.lyrasis.org/display/DSDOC6x/Curation+System">https://wiki.lyrasis.org/display/DSDOC6x/Curation+System</ext-link>).</p>
            <p> It is mentioned in the section &#x201c;Formal contract regarding upload and storage&#x201d; that the implemented demonstrator does not provide support for constructing and editing the distribution licence. However, the distribution licence text can be edited or customised, as we have done in our Institutional DSpace repository instance. Perhaps what the authors mean instead is that the platform does not provide a user interface to do this easily.</p>
            <p> The section about repository long-term preservation could have incorporated more detailed information about the DSpace platform&#x2019;s capabilities around content preservation, with references and links to relevant literature. For example, open source integrations of the DSpace platform with preservation systems exist, such as the integration with Archivematica (
                <ext-link ext-link-type="uri" xlink:href="https://figshare.com/articles/Automating_OAIS_compliant_digital_preservation_using_Archivematica_and_DSpace/11274143/1">https://figshare.com/articles/Automating_OAIS_compliant_digital_preservation_using_Archivematica_and_DSpace/11274143/1</ext-link>).&#x00a0;The authors seem to mix the platform&#x2019;s long-term availability based on a number of aspects such as technology sustainability plans, or wide use, with the platform&#x2019;s capabilities for preservation of the repository content itself. The former is not directly related to preservation but to the long-term sustainability of the platform.</p>
            <p> Lastly, a number of sections in the paper talk about self-attestation functions in the context of access to repository content (requirement 7b &#x2013; web-based self-attestation of the user). I am not familiar with this term, and the general reader would benefit from a clearer definition of the term and of such functions. I can only guess, based on context and my knowledge of repository platforms, that the authors mean the repository&#x2019;s ability for user self-registration to be able to access repository content, or functions for only giving access to content once certain information about the user has been collected and verified. For example, the repository allows a form to be incorporated that asks content requesters to supply information about what uses they will make of the data, the purpose of their research, contact information and/or an email address to be verified, etc. If this is the case, this should be made much more explicit in the paper.</p>
            <p> 
                <bold>Minor edits and structure comments</bold>
            </p>
            <p> In the &#x201c;Results&#x201d; subsection of the abstract, the sentence &#x201c;Two requirements could not be demonstrated (inability to incorporate de-identification tools in the submission workflow, lack of a self-attestation system) &#x2026;&#x201d; is not clear. It needs to be rephrased, e.g. &#x201c;ability to incorporate &#x2026;&#x201d; and &#x201c;support for self-attestation &#x2026;&#x201d;. Otherwise it reads as though the things in parentheses are actually the requirements.</p>
            <p> In the &#x201c;Conclusions&#x201d; subsection of the abstract, &#x201c;productive repository&#x201d; should read &#x201c;production ready repository&#x201d; or similar.</p>
            <p> In the &#x201c;Introduction&#x201d; section, first sentence, &#x201c;evironment&#x201d; should read &#x201c;environment&#x201d;.</p>
            <p> Table 3, third row &#x201c;Control of quality of data&#x201d;, C6 should read &#x201c;Quality assurance&#x201d; instead of &#x201c;insurance&#x201d;. Also, Table 3 appears much earlier (page 7) than its reference within the paper (page 11). I found this quite confusing when reading the paper as it appeared straight after Table 2, and completely out of context. It would be much clearer if the table was moved closer to its reference in the text, towards the end of the paper.</p>
            <p> </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Digital archiving, digital repository platforms, research data management</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <sub-article article-type="response" id="comment5642-62923">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ohmann</surname>
                            <given-names>Christian</given-names>
                        </name>
                        <aff>ECRIN, Germany</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>22</day>
                    <month>6</month>
                    <year>2020</year>
                </pub-date>
            </front-stub>
            <body>
                <p>The process of selecting DSpace as the platform for the demonstrator repository has been better explained. A table outlining key criteria of other platforms was not included, but the information can be extracted from the reference given.</p>
                <p> </p>
                <p> The reasons why missing security features and encryption were not considered in the paper were explained.</p>
                <p> </p>
                <p> A statement was included to explain the use of mainly public information.</p>
                <p> </p>
                <p> The importance of consent was stressed in the introduction.</p>
                <p> </p>
                <p> The confusion over metadata was clarified.</p>
                <p> </p>
                <p> In the section &#x00ab;&#x00a0;De-identification practices&#x00a0;&#x00bb;, a line was added in response to the reviewer&#x2019;s comment.</p>
                <p> </p>
                <p> In &#x00ab;&#x00a0;Formal contract regarding upload and storage&#x00a0;&#x00bb; an explanation reflecting the comment of the reviewer was given.</p>
                <p> </p>
                <p> The section about &#x00ab;&#x00a0;Long term preservation and sustainability&#x00a0;&#x00bb; has been renamed and rewritten.</p>
                <p> </p>
                <p> In the section &#x00ab;&#x00a0;Flexibility of access&#x00a0;&#x00bb; the meaning of the term self-attestation has been clarified.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
