Research Software vs. Research Data II: Protocols for Research Data dissemination and evaluation in the Open Science context

Background: Open Science seeks to render research outputs visible, accessible and reusable. In this context, the sharing and dissemination of Research Data and Research Software pose real challenges to the scientific community, as a consequence of recent progress in political, legal and funding requirements. Methods: We build on the approach developed in a previous publication, in which we highlighted the similarities between the Research Data and Research Software definitions. Results: The similarities between the Research Data and Research Software definitions can be extended to propose protocols for Research Data dissemination and evaluation derived from those already proposed for Research Software dissemination and evaluation. We also analyze the FAIR principles for these outputs. Conclusions: Our proposals provide concrete instructions for Research Data and Research Software producers to make their outputs more findable and accessible, as well as arguments to choose suitable dissemination platforms to complete the FAIR framework. Future work could analyze the potential extension of this parallelism to other kinds of research outputs that are disseminated under similar conditions to those of Research Data and Research Software, that is, without widely accepted publication procedures involving editors or other external actors, and where dissemination usually remains in the hands of the production team.


Introduction
Researchers produce many different outputs in their work in order to obtain the results that will be published in scientific journals, in articles that are still the main mechanism for exchanging information in the scientific conversation. Among others, researchers produce Research Data (RD) and Research Software (RS), but neither output currently has a publication procedure as widely accepted as the one existing for articles, which constitutes one of the main drawbacks for their acceptance as first class citizens in the scientific ecosystem. This is one of the goals of the FAIR guiding principles 1: … is for scholarly digital objects of all kinds to become 'first class citizens' in the scientific publication ecosystem, where the quality of the publication, and more importantly the impact of the publication, is a function of its ability to be accurately and appropriately found, reused, and cited over time, by all stakeholders, both human and mechanical.
[…] we do not pay our valuable digital objects the careful attention they deserve when we create and preserve them.
On the other hand, the following definition sets up Open Science goals related to research outputs 2: Open Science is the political and legal framework where research outputs are shared and disseminated in order to be rendered visible, accessible and reusable.
In this context, as reported in 3, the skills necessary to reach these goals are complex: The skills needed for Open Science cover a broad span from data management to legal aspects, and include also more technical skills, such as data stewardship, data protection, scholarly communication and dissemination (including creating metadata) … and they still need to be engineered 4 (see also section 5 of 5): An acceptable workflow needs to be created. However, most researchers, while experts in their own fields, have little awareness of metadata standards for data publication and information science in general, leading to cognitive and skill barriers that prevent them from undertaking routine best-practice data management.
Another drawback of this missing publication procedure for RD and RS is the possible loss of the expert knowledge that has been acquired along the research process 6: If not traditional papers and volumes, what, then, should researchers be publishing? Whilst the digital exchange of data is straightforward, the digital exchange and transfer of scientific knowledge in collaborative environments has proven to be a non-trivial task, requiring tacit, and rapidly changing expert knowledge, much of which is lost in traditional methods of publication and information exchange. We believe that there is a need for mechanisms that support the production of self-contained units of knowledge and that facilitate the publication, sharing and reuse of such entities.
Examples of this lost knowledge include the report of failure cases, which are rarely published; or the description of the modifications that have been included in the final implemented algorithms, and that are the result of a long trial and error process to improve the initially conceived algorithm or to avoid computational errors.

REVISED Amendments from Version 1
This version considers the comments of the reviewers to better explain and illustrate some of the concepts presented in the article.
In particular we have improved Sections 2.4 and 3.4 to better explain the relations between the FAIR Principles and the work we propose here. A new table is added to Section 3.4 to show the connections between the RD CDUR evaluation protocol of Section 3.3 and the FAIR principles of Reference 1.
See our answers to the referee reports to complete the differences with the version 1 of this article.
Any further responses from the reviewers can be found at the end of the article.

Although the current trend in the scientific publication ecosystem is to place RS and RD in a better position, many researchers are still at a loss when facing RS and RD dissemination, and do not possess the needed skills, support or assistance to disclose them in the right conditions. Moreover, they consider that much work and effort would be necessary to accomplish this goal, while having little or no positive effect on their curriculum 4: Put crudely, the large amount of effort involved in preparing data for publication release, coupled with the negligible current incentives and rewards, prevents many researchers from doing so.
Notice that some simple rules to be considered in RD dissemination practices have already been proposed, for example, in 7. On the other hand, research funders, like the European Commission, are currently laying out Open Science policies in their calls, which require open access to the RD generated by the funded projects (although there may be exceptions), and recommend providing open access to research outputs in all generality, beyond publications and data, e.g. software tools 8. Notice that in the dissemination of these research outputs it is necessary to provide significant information in order to facilitate their visibility, accessibility and reuse 9: Detailed provenance includes facets such as how the resource was generated, why it was generated, by whom, under what conditions, using what starting-data or source-resource, using what funding/resources, who owns the data, who should be given credit, and any filters or cleansing processes that have been applied post-generation.
Bearing in mind the above described landscape, the goal of our work here is to contribute to the improvement of the scientific endeavor with protocols that could help researchers, and the community at large, in the dissemination of their produced RD and RS, while contributing to the accomplishment of Open Science goals.
We concentrate here on practical matters, that is, on the how to: how to disseminate RD and RS to make them first class citizens so that they become visible, accessible and reusable. But dissemination procedures are not enough. With the aim of motivating researchers to take on dissemination tasks, often considered by the members of the scientific community as an additional, useless burden, we should also take into consideration pathways that yield improved research evaluation practices, so relevant for researchers. That is, pathways that contribute to evaluating the disseminated outputs correctly, with protocols that help both the researchers (to know what will be evaluated and how) and the evaluators (in setting up the evaluation process).
Our proposal is grounded on our knowledge and experience concerning RS 10-14. This translation of knowledge from RS to RD has already been successfully applied 15 to propose a RD definition and to tackle the Borgman's conundrum challenges 16. In the present paper we attempt to extend this approach to the case of RD dissemination and evaluation practices. Indeed, there are some obvious differences between software and data. But by setting these aspects aside, and focusing on the similarities, we can learn a lot from the common features that appear in the production context of RD and RS. As remarked above, this is a general procedure that can be adapted to several situations. Even when the differences are too important, and the proposed dissemination and evaluation procedures are perhaps not directly applicable as such in both settings, they can still suggest hints to address the diverse issues appearing in each environment. In summary, the present work follows and expands the approach adopted in 15. Both articles can be read separately, but they constitute a whole.
The plan of this work is as follows. The next section is devoted to revisiting the corresponding points related to RS: definition, dissemination, evaluation and consideration of the role of the FAIR principles in this context. Section 3 then focuses on RD topics, reviewing the proposed RD definition 15 and presenting the main contribution: some comprehensive RD dissemination and evaluation procedures. Conclusions end this work.

Research Software
This section has three main components: the RS definition coming from 13, 14, the RS dissemination procedure coming from 10, and the CDUR RS evaluation protocol from 13. Some comments on the FAIR principles for RS complete this section.

Research Software definition, reference and citation
In this work we consider the following definition of RS 13, 14: Research software is a well identified set of code that has been written by a (again, well identified) research team. It is software that has been built and used to produce a result published or disseminated in some article or scientific contribution. Each research software encloses a set (of files) that contains the source code and the compiled code. It can also include other elements as the documentation, specifications, use cases, a test suite, examples of input data and corresponding output data, and even preparatory material.
We observe, following the above definition, that RS has three main characteristics:
• the goal of the RS development is to do research,
• it has been written by a research team,
• the RS is involved in obtaining the results presented in scientific articles (as the most important means of scientific exchange are still articles published in scientific journals), or by any other kind of recognized scientific means.
Note that documentation, licenses, examples, data, tests, software management plans and other related information and materials can also be part of the set of files that constitutes a specific RS.
Moreover, a RS development team may not just use software produced by other teams, but also include external software as a component inside the ongoing development, something which can be facilitated by Free/Libre Open Source Software (FLOSS) 1 licenses. This potential external component qualifies here as RS if it complies with the three characteristics given in the above definition 15. Moreover, the team responsible for the resulting work should clearly identify the included external components and their licenses, as well as highlight, by means of recommended citation practices 13, 17, 18, the external components that qualify as RS.
General aspects of FLOSS issues can be consulted, for example, in 19. Let us remark that good practices for software development management call for regularly updating the RS-related information, like, for example, the project's funding, publications or involved teams and contributors. A Software Management Plan (SMP) can be a powerful tool to help handle this information, see for example 12 and the references therein.
Let us recall that RS reference and citation recommendations have been considered in section 2.5 of 13, where we propose easy-to-adopt methods to improve RS citation practices; other citation-related works can be found in 17, 18, 20.
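One easy-to-adopt practice along these lines is to ship a machine-readable citation file with the RS. The following is a minimal sketch in the Citation File Format (CFF); every name, version and identifier below is a hypothetical placeholder.

```yaml
# CITATION.cff — machine-readable citation metadata (hypothetical example)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ExampleRS"
version: "2.1.0"
date-released: "2024-03-15"
license: GPL-3.0-or-later
doi: 10.5281/zenodo.0000000   # placeholder PID for this release
authors:
  - family-names: Doe
    given-names: Jane
    orcid: "https://orcid.org/0000-0000-0000-0000"
    affiliation: "Example University"
```

Such a file, placed at the root of the RS repository, is recognized by platforms like GitHub and Zenodo, which can turn it into a ready-made citation or metadata record.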

A Research Software dissemination procedure
Let us begin by recalling that, as stated in 8: Dissemination means the public disclosure of the results by appropriate means (other than resulting from protecting or exploiting the results), including by scientific publications in any medium.
The following RS dissemination procedure has been proposed in 10 and was first published 2 in the PLUME project 3 (2006-2013) 13, 21. The French initial version includes a close analysis of legal issues (French author rights, licensing) in order to produce FLOSS RS. It is slightly updated and completed in the following. More information on the legal issues can be found in 11, or in section 2.1 of Reference 15.
As a general recommendation, it is best practice to consider licensing issues and to keep the related information in a SMP from the very first stages of the RS development. The RS license establishes its sharing conditions: it can give rights for access, copy, modification, redistribution of the RS, and it can establish reciprocity clauses that should be respected by the potential RS users. Licenses should be put well into place before releasing the RS.
Here we present the proposed RS dissemination procedure. Steps marked with (*) are to be revisited regularly for each version release. (The PLUME project platform was closed in 2022; some of the documents published on this platform are now available from https://zenodo.org/communities/plume-patrimoine-logiciel-laboratoire/.)
• Choose a name or title to identify the RS; avoid trademarks and other proprietary names; you can associate a date, version number, and target platform. Consider best practices in file names 4 .
• (*) Establish the list of authors and affiliations (this is the so-called research team step). An associated percentage of participation, complemented with the minor contributors, can be useful. If the list is too long, keep updated information in a web page or another document, like a SMP, for example, where you can mention the different contributor roles. This is the step in which the producers' intellectual property rights are established. Producers include the RS authors and rightholders. This is then the step in which the RS legal issues related to copyright information are dealt with.
• (*) Establish the list of included software and data components, indicate their licenses (or other documents like the component's documentation) giving the rights to access, copying, modification and redistribution for each component. In the case of software and data that fall in the category of RS or RD, please take into consideration best citation practices 13, 17, 18.
• Choose a software license, with the agreement of all the rightholders and authors, and establish a signed agreement if possible. The licenses of the software components that have been included and/or modified to produce the RS can have an impact on your license decision, see for example 10, 19, 22. Software licenses and licensing information can be found at the Free Software Foundation (FSF) 5 , the Open Source Initiative (OSI) 6 , and the Software Package Data Exchange (SPDX) 7 . Consider using FLOSS licenses to give the rights of use, copy, modification, and/or redistribution. This is then the step in which legal issues related to the RS sharing conditions are to be taken into consideration. Indicate the license in the RS files, its documentation, and the project web pages. Give licenses, like GNU FDL 8 , Creative Commons (CC) 9 , LAL 10 , to documentation and to web sites.
• Choose a web site, forge, or deposit to distribute your product; licensing and/or conditions of use, copy, modification, and/or redistribution should be clearly stated, as well as the best way to cite your work. Good metadata and respect of open standards are always important when giving away new components to a large community: it helps others to reuse your work and increases its longevity. Use Persistent Identifiers (PIDs) 11 if possible.
• (*) This step deals with the utility of the RS and how it has been used for your research (this is the research work step). Establish the list of main functionalities, and archive a tar.gz or similar for the main RS versions in a safe place. Keep a list of the associated research work, including published articles. Update your documentation, SMP, web site, etc. with the new information for each main RS version.
• Inform your laboratories and head institutions about this RS dissemination (if this has not been done in the license step).
• Create and indicate clearly an address of contact.
• Release the RS.
• Inform the community (e.g. via mailing lists), and consider the publication of a software paper; see for example the list of journals where you can publish articles focusing on software 12 .
4 See for example https://libguides.princeton.edu/c.php?g=102546&p=930626, https://doranum.fr/stockage-archivage/comment-nommerfichiers/
This proposed procedure is flexible and can be adapted to many different situations. It has been taken into consideration in the HAL research software deposit 23 13 .
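Several of the steps above (identification, authors, license, included components, contact address, associated publications) can be recorded together in a metadata file deposited with the RS. A minimal sketch using the CodeMeta vocabulary follows; all values are hypothetical placeholders.

```json
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "ExampleRS",
  "version": "2.1.0",
  "dateModified": "2024-03-15",
  "license": "https://spdx.org/licenses/GPL-3.0-or-later",
  "author": [
    {
      "@type": "Person",
      "givenName": "Jane",
      "familyName": "Doe",
      "@id": "https://orcid.org/0000-0000-0000-0000",
      "affiliation": {"@type": "Organization", "name": "Example University"}
    }
  ],
  "codeRepository": "https://example.org/team/examplers",
  "referencePublication": "https://doi.org/10.0000/example.article",
  "maintainer": {"@type": "Person", "email": "examplers-contact@example.org"}
}
```

Keeping such a record up to date at each (*) step makes the deposit and citation steps largely automatic.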

The CDUR procedure to evaluate Research Software
We include in this section a summarized version of the CDUR protocol that can be found in 13 (section 4.1). This reference gives a detailed description and analysis of the protocol, as well as a complete list of references related to this work. This procedure for RS evaluation contains four steps to be applied in the following chronological order: Citation, Dissemination, Use and Research. For example, as we have seen in Section 2.2, the first steps in the RS dissemination procedure correspond to the correct RS identification, and, in order to be correctly cited, the RS reference should be clearly indicated. Let us introduce a condensed version of these four steps.
(C) Citation. This step measures to what extent the evaluated RS is well identified as a research output. It is also the step where RS authors are correctly identified as well as their affiliations.
Section 2.5 of 13 proposes three different ways to establish a RS reference in order to facilitate its citation; these formulas can include the use of persistent identifiers. Moreover, a more evolved RS identification level can be provided in the form of a metadata set. Reference and metadata include, among other information, the list of the RS authors and their affiliations (13, section 2.2). See also 17, 18, 20.
(D) Dissemination. This step measures the quality of the RS dissemination plan, involving actions such as:
• Choosing a license, with the agreement of all the rightholders and authors. Consider, preferably, using FLOSS licenses.
• Choosing a web site, forge, or deposit to distribute the product; stating clearly licensing and conditions of use, copy, modification, and/or redistribution.
• Creating and indicating a contact address.
This step deals with legal issues involving the authors and rightholders (as established in the Citation step) deciding and installing the license(s) for the RS dissemination. This is also the step concerning Open Science, as the RS license expresses its sharing conditions; and where policy makers should establish the Open Science policies that will be applied in the evaluation process.
Finally, let us recall that the inclusion of the list of related publications, data sets and other related works in the dissemination procedure helps to prepare the reproducible science issues that are to be taken into account in the Use step.
(U) Use. This step is devoted to the evaluation of the technical software aspects. In particular, this step measures the quality of the RS usage, considering that a performing RS is one that is both correct and usable by the target scientific community.
The RS usability does not only refer to the quality of the scientific output but can also involve other matters, such as the provided documentation, tutorials and examples (including both inputs and outputs), an easy and intuitive manipulation, testing and version management, etc. This is the reproducible science step, where one measures how well the published results obtained with the RS can be replicated and reproduced.
(R) Research. This step measures the impact of the scientific research that has required in an essential way the RS under consideration.
The evaluation of this item should follow the standards for scientific research quality of the concerned community. This is the step where the RS related publications (as described in the RS definition in Section 2.1) come into play, and where the evaluation should consider the difficulty of the addressed scientific problems, the quality of the obtained results, the efficiency of the proposed algorithms and data structures, etc. The RS impact can also be assessed through the research impact of the related publications, and through its inclusion (or use) as a software component in other RS.
Each of these four steps can reach different levels of qualification, and the corresponding scale is to be set up by the policy makers considering a particular evaluation event. Thus, the CDUR protocol can be easily adapted to different circumstances: career evolution, recruitment, funding, RS peer review or other procedures to be applied by universities and other research performing institutions, research funders, or scientific journals, and it can also be adapted to different evaluation situations arising in different scientific areas.
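To illustrate this adaptability, the sketch below shows one hypothetical way an evaluation committee could instantiate CDUR with its own scale for each step; neither the function names nor the weights come from the protocol itself, which deliberately leaves the scale to the policy makers.

```python
# Hypothetical instantiation of the CDUR protocol: the committee chooses
# the maximum score (scale) of each step; CDUR only fixes the four steps
# and their chronological order.
CDUR_STEPS = ("Citation", "Dissemination", "Use", "Research")

def evaluate(scores: dict[str, int], scale: dict[str, int]) -> dict[str, str]:
    """Report each step's score against the committee-chosen maximum."""
    return {s: f"{scores.get(s, 0)}/{scale[s]}" for s in CDUR_STEPS}

# e.g. a recruitment committee weighting the Research step more heavily:
scale = {"Citation": 5, "Dissemination": 5, "Use": 10, "Research": 20}
report = evaluate({"Citation": 4, "Use": 7, "Research": 15}, scale)
print(report)  # → {'Citation': '4/5', 'Dissemination': '0/5', 'Use': '7/10', 'Research': '15/20'}
```

A funding agency or a journal would simply plug in a different scale, without changing the four steps or their order.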

FAIR Research Software
Although the FAIR principles were first designed for data, they apply as well to other digital objects 1: … it is our intent that the principles apply not only to 'data' in the conventional sense, but also to the algorithms, tools, and workflows that led to that data. All scholarly digital research objects, from data to analytical pipelines, benefit from application of these principles, since all components of the research process must be available to ensure transparency, reproducibility, and reusability.
In Reference 24 we can find further explanation on the FAIR Guiding Principles 14 : FAIR refers to a set of principles, focused on ensuring that research objects are reusable, and actually will be reused, and so become as valuable as is possible. They deliberately do not specify technical requirements, but are a set of guiding principles that provide for a continuum of increasing reusability, via many different implementations. They describe characteristics and aspirations for systems and services to support the creation of valuable research outputs that could then be rigorously evaluated and extensively reused, with appropriate credit, to the benefit of both creator and user.
This explanation is followed by a list of items outlining what FAIR is not. We complete here this list of what FAIR is not with the following: FAIR does not claim to be, strictly speaking, a dissemination procedure, nor an evaluation protocol like the ones proposed in the present article. Yet these FAIR Guiding Principles give instructions that can be considered in dissemination procedures and evaluation protocols.
In the case of RS, FAIR principles have been considered in several conferences and publications, although some adaptations seem to be necessary 20, 25, 26. See also the documents in the FAIR Research Software (FAIR4RS) Zenodo Community 15 of the RDA FAIR4RS WG 16 27.
In this section we highlight two points regarding these principles that appear in our RS dissemination procedure (see Section 2.2) and the CDUR evaluation protocol (see Section 2.3), namely those referring to Persistent Identifiers (PIDs) and metadata, as remarked in 5: Central to the realization of FAIR are FAIR Digital Objects. These objects could represent data, software, protocols or other research resources. They need to be accompanied by Persistent Identifiers (PIDs) and metadata rich enough to enable them to be reliably found, used and cited.
Note that these two points are included in the basic "minimum standard" of 5 (p. 13). In particular we would like to make the following points regarding PIDs:
• we recommend using PIDs associated to authors, like ORCID 17 ,
• we recommend associating PIDs to the disseminated RS; as a RS can have several versions, do consider a different PID for each main release,
• as PIDs can be provided by the chosen deposit, PID provision should be one of the arguments favoring the selection of a deposit like, for example, Zenodo 18 ,
• articles associated to the RS should have their own PID, furnishing in this way the RS with other possible citation forms 13 (section 2.5), i.e. with complementary means of reliably finding the RS, thus facilitating its use and citation by other researchers,
• if the included data and software components, or other external components that are necessary to run the disseminated RS, have their own associated PIDs, it is convenient to refer to them in order to contribute to their own access and visibility.
Note that several possibilities for software identification are discussed in 27.
On the other hand, concerning the role of metadata sets in our RS dissemination and evaluation proposals, let us observe that metadata is a very flexible concept, going from a simple reference or citation form, or the use of citation file formats 17, to a very complete and precise RS description. In any case, our protocols consider metadata an important tool to set attribution to the RS and to facilitate credit. One possibility we would like to suggest is the metadata format proposed in the PRESOFT SMP template 12, which has a manageable size and the advantage of being based on the RS index card elaborated in the PLUME project (2006-2013). A different, more complex metadata set can be generated, for example, with CodeMeta 19 . We remark that it is the role of the RS producer team to set the RS metadata complying with the FAIR principles 1, 27 that appear in the Findable, Interoperable and Reusable guidelines, and to ensure that the RS deposit guarantees that the metadata remain accessible (principle A2). These aspects related to metadata and citation forms are to be considered in the C and U steps of our CDUR protocol, following the requirements established by the evaluation committees.
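As a small illustration of the "minimum standard" idea, the sketch below checks a metadata record for a few fields commonly expected by the Findable and Reusable principles; the field names are our own illustrative choice, not a normative FAIR schema.

```python
# Sketch of a "minimum standard" check on a RS metadata record.
# The field names below are illustrative choices, not a normative schema.
RECOMMENDED_FIELDS = {
    "name": "F: a title by which the RS can be found",
    "identifier": "F: a PID (e.g. a DOI) for this release",
    "version": "F/R: the release the metadata describes",
    "author": "R: credit and attribution",
    "license": "R: the sharing conditions",
}

def missing_fields(metadata: dict) -> list[str]:
    """Return the recommended fields absent (or empty) in a metadata record."""
    return [f for f in RECOMMENDED_FIELDS if not metadata.get(f)]

record = {"name": "ExampleRS", "version": "2.1.0", "author": ["Jane Doe"]}
print(missing_fields(record))  # → ['identifier', 'license']
```

Such a check could be run by the producer team before release, or used by an evaluation committee in the C step of CDUR.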
Finally, we consider the implementation and adoption of the FAIR principles 1,5,9,27,29 and other standards as arguments favoring the choice of a deposit for the RS; see the principle F4 and the four Accessible ones (A1, A1.1, A1.2, A2) of 1 and 27, which are to be considered in the D step of our CDUR protocol. A tool to help take such a decision could be the FAIRsharing platform 20 , which provides a large number of community-developed standards, as well as indicators (among others) necessary to monitor their adoption, and to follow the data policies established by funders, publishers and other organizations. See, for example, the information associated to the FAIR Principles that appears in the FAIRsharing platform 21 .
A table establishing relationships between the CDUR evaluation protocol steps and the FAIR principles has been included in Section 3.4. We refer the reader to this table and the subsequent comments for further comparison issues.

Research Data
This section translates to RD the previously addressed RS issues: definition, dissemination and evaluation, ending with some RD FAIR considerations.

A Research Data definition
In coherence with the declared parallelism between RS and RD, we consider here the RD definition proposed in 15.
Research Data is a well identified set of data that has been produced (collected, processed, analyzed, shared and disseminated) by a (again, well identified) research team. The data has been collected, processed and analyzed to produce a result published or disseminated in some article or scientific contribution. Each research data encloses a set (of files) that contains the dataset, maybe organized as a database, and it can also include other elements such as the documentation, specifications, use cases, and any other useful material such as provenance information, instrument information, etc. It can include the research software that has been developed to manipulate the dataset (from short scripts to research software of larger size) or give the references to the software that is necessary to manipulate the data (developed or not in an academic context).
Thus, as carefully argued and commented in 15, RD has three main characteristics:
• the goal of the RD collection and analysis is to do research, that is, to answer a scientific question,
• it has been produced by a research team,
• the RD is involved in obtaining the results presented in scientific articles (as the most important means of scientific exchange are still articles published in scientific journals), or by any other kind of recognized scientific means.
The reader is referred to 15 for a thorough explanation of the different issues involved in the description of these characteristics. Yet, we recall here that the identified set of data constitutes a database when the data are arranged in a systematic or methodical way and are individually accessible by electronic or other means 30-34. The sui generis database right primarily protects the producer of the database and may prohibit the extraction and/or reuse of all or a substantial part of its content, see for example 30.
Remark that it is becoming general practice for research funders to ask for a Data Management Plan (DMP) concerning the data generated in a funded project 22 8, 37-38. See for example the DMPonline platform of the Digital Curation Centre (DCC) as a helpful tool to create, review, and share DMPs that meet institutional and funder requirements 23 . In particular, French research projects can benefit from DMP OPIDoR 24 .
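Part of the information gathered in a DMP can also travel with the dataset itself. For example, a minimal `datapackage.json` descriptor in the style of the Frictionless Data specifications, sketched below with purely hypothetical values, identifies the dataset, its producers, its license and its resources in machine-readable form.

```json
{
  "name": "example-rd",
  "title": "Example Research Dataset",
  "version": "1.0.0",
  "licenses": [
    {"name": "CC-BY-4.0", "title": "Creative Commons Attribution 4.0"}
  ],
  "contributors": [
    {"title": "Jane Doe", "role": "author", "organization": "Example University"}
  ],
  "resources": [
    {"name": "measurements", "path": "data/measurements.csv", "format": "csv"}
  ]
}
```

Whatever the chosen vocabulary, the point is that producer, license and provenance information is kept machine-readable next to the data, complementing the DMP.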

A procedure for Research Data dissemination
The following procedure has been adapted to RD from the RS dissemination procedure proposed in Section 2.2. Only one new item has been added here for RD (the third item) to highlight the potential difficulties concerning legal (and ethical) issues. As before, steps marked with (*) are to be revisited regularly for each version release, if necessary.
Again, as a general recommendation, it is best practice to consider licensing issues 34 and to keep a DMP from the very first stages of the RD development. The RD license establishes the sharing conditions: it can give rights for access, copy, modification, redistribution of the RD, and it can establish reciprocity clauses that should be respected by the potential RD users. It should be put well into place before releasing the RD.
• Choose a name or title to identify the RD, avoid trademarks and other proprietary names, you can associate date, version number … Consider best practices in file names 25 .
• (*) Establish the list of the persons who have participated in the RD production, that is, the persons who have collected, processed, analyzed, shared and disseminated the RD, as well as their affiliations (this is the so-called research team step). If the list is too long, keep updated information in a web page or another document, like a DMP, for example, where you can mention the different contributor roles. This is the step in which the producers' rights are established, if any. Producers include the RD authors (in the case there are intellectual property rights associated to the RD) and the corresponding rightholders. This is then the step in which legal issues related to copyright and ownership information are dealt with 32, 33, 37, 38.
• Data can have other associated legal (or ethical) contexts 15, 34, 38, 39, which can be intimately related to the ongoing research work; consider them with the help of legal experts if necessary.
• (*) Establish the list of included software and data components and indicate their licenses (or other documents, such as the component's documentation) giving rights of access, copy, modification and redistribution for the component. In the case of software and data that fall into the categories of RS or RD, please take best citation practices into consideration 40-43, 13, 18, 20.
• Choose a data license, with the agreement of all the producers and rights holders, and establish a signed agreement if possible. The licenses of data components that have been included and/or modified to produce the RD can have an impact on your license decision 34. Consider using licenses such as the Creative Commons licenses (V4.0) 26 or the Open Data Commons licenses 27, for example. Other data licenses can be found at SPDX 28. This is then the step in which legal issues related to the RD sharing conditions are to be taken into consideration. Indicate the license in the RD files, its documentation, the project web pages, etc. Give licenses, such as GNU FDL 29, Creative Commons (CC) 30 or LAL 31, to documentation and to web sites.
• Choose a web site, forge, data repository or any other deposit to distribute your product; licensing and conditions of use, copy, modification and/or redistribution should be clearly stated, as well as the best way to cite your work. Good metadata and respect of open standards are always important when giving away new components to a large community: they help others to reuse your work and increase its longevity. Use Persistent Identifiers (PIDs) 32 if possible. This point corresponds to the Where? question of Borgman's conundrum challenges, as discussed in the Conclusions section of 15.
• (*) This step deals with the utility of the RD and how it has been used for your research (it is then the research work step). Establish the list of the main RD research issues that appear in your work and that can facilitate its reuse. Archive a tar.gz or similar for the main RD versions in a safe place. Keep a list of the associated research work, including published articles. Update your documentation, DMP, web site, etc. with the new information at each main release.
• Inform your laboratories and head institutions about this RD dissemination (if this has not been done in the license step).
• Create and clearly indicate a contact address.
• Release the RD.
• Inform the community (e.g. via mailing lists), consider the publication of a data paper.
This proposed procedure is also flexible and can be adapted to many different situations.
Note that if you follow this dissemination procedure you will obtain a 5-star RD 33.
A much more complete and complex vision of data sharing can be found, for example, in 5.
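The items collected along the procedure above map naturally onto a machine-actionable metadata record. The following sketch illustrates this idea; the field names are our own illustration (loosely modelled on DataCite-style metadata) and not a prescribed schema:

```python
# Sketch: assemble and check a minimal machine-actionable metadata record
# covering the main steps of the RD dissemination procedure above.
# Field names are illustrative (loosely DataCite-like), not a fixed schema.

REQUIRED_FIELDS = [
    "title",       # name/title step
    "creators",    # research team step
    "license",     # license step
    "identifier",  # PID (e.g. a DOI), repository step
    "contact",     # contact address step
]

def missing_fields(record):
    """Return the required fields that are missing or empty in the record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

record = {
    "title": "Example dataset v1.0",       # hypothetical
    "creators": [{"name": "A. Researcher", "affiliation": "Example Lab"}],
    "license": "CC-BY-4.0",
    "identifier": "doi:10.0000/example",   # hypothetical PID
    "contact": "data-team@example.org",
    "related_works": ["doi:10.0000/article"],  # research work step
}

print("missing:", missing_fields(record))
```

A repository's own metadata schema would of course take precedence; the point is that each step of the procedure corresponds to a checkable metadata field, which also prepares the ground for the FAIR discussion below.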

The CDUR procedure to evaluate Research Data
Similarly to the RS CDUR evaluation protocol proposed in Section 2.3, the CDUR protocol for RD evaluation that we propose here contains four steps, to be carried out in the following chronological order: Citation, Dissemination, Use and Research. The RS CDUR evaluation protocol translates to the RD evaluation context in a straightforward way:
(C) Citation. This step measures to what extent the evaluated RD is well identified as a research output. It is also the step where RD producers are correctly identified.
As seen in the dissemination procedure (Section 3.2), a reference to cite the work should be well established. If required in an evaluation process, a complete set of RD metadata (including PIDs) should be provided. This is also the step concerning Open Science, as the RD license expresses its sharing conditions, and the step where policy makers should establish the Open Science policies that will be applied in the evaluation process.
Finally, let us recall that the inclusion of the list of related publications, software, data sets and other works, mentioned in the dissemination procedure, helps to prepare the reproducible science issues that are to be taken into account in the Use step.
(U) Use. This step is devoted to the evaluation of the technical data aspects. In particular, this step measures the quality of the RD. The RD usability does not only refer to the quality of the scientific output but can also deal with other matters, such as the provided documentation, tutorials and examples of use for easy and intuitive manipulation, etc.
The relevance of the scientific production context, and of the RD generation process, to maximize reuse and reproducibility has already been emphasized in Reference 13; but indeed, the replicability and reuse steps are highly challenging for RD, even if the RD is correctly disseminated. Evaluation committees should take into consideration the difficulties appearing in this matter.
This is the reproducible science step, which measures the extent to which the published results obtained with the RD can be replicated and reproduced.
(R) Research. This step measures the impact of the scientific research that has required, in an essential way, the RD under consideration.
The evaluation of this item should follow the standards for scientific research quality of the concerned community.
This is the step where the RD related publications (as described in Section 3.1) come into play, and where the evaluation should consider the difficulty of the addressed scientific problems, the quality of the obtained results, the efficiency of the proposed algorithms and data structures, etc. The RD impact can also be assessed through the research impact of the related publications, and through its inclusion (or use) as a data component in other RD.
To end this section, let us remark that similar considerations concerning the flexibility of the application of the CDUR RS evaluation protocol also apply to RD. See 13 for a more detailed analysis of the RS CDUR evaluation protocol.
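As an illustration of how an evaluation committee could adapt the protocol and set priorities, the four CDUR steps can be sketched as a simple per-step checklist. The criteria below are our own paraphrase of the steps described in the text, and the scoring is purely illustrative; a committee would define its own criteria and weights:

```python
# Illustrative sketch of the CDUR steps as a per-step checklist.
# Criteria are a paraphrase of the text; a committee would define its own.

CDUR_STEPS = {
    "Citation": ["output identified", "producers identified",
                 "metadata and PIDs provided"],
    "Dissemination": ["license stated", "open standards respected",
                      "suitable repository chosen"],
    "Use": ["documentation provided", "results reproducible"],
    "Research": ["related publications listed", "scientific impact assessed"],
}

def cdur_scores(answers):
    """Fraction of satisfied criteria per step, given {criterion: bool}."""
    return {step: sum(answers.get(c, False) for c in crit) / len(crit)
            for step, crit in CDUR_STEPS.items()}

# Hypothetical evaluation of an RD that is well cited but poorly documented.
scores = cdur_scores({
    "output identified": True,
    "producers identified": True,
    "metadata and PIDs provided": True,
    "license stated": True,
})
```

Separating the steps in this way also makes explicit which parts of the evaluation require technical expertise (Use) and which require domain expertise (Research).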

FAIR Research Data
Remark that, as stated in Section 2.4, the FAIR principles were initially designed for data, so our reflections in that section concerning the connection of the FAIR principles with the CDUR protocol especially apply here. Indeed, there is a lot of recent work on FAIR data issues; see for example 1,5,9,29,43 and the references mentioned there. We would like to mention some FAIR assessment tools currently under development, such as the automatic FAIR evaluator (DIGITAL.CSIC) of the EOSC-Synergy project 34 or the data sharing evaluation project 35.
Let us remark that we have not found a FAIR principle equivalent to the Research step of CDUR, as shown in Table 1 above.
Of course, our perception of the connections enumerated in this table does not mean that we consider both approaches to be equivalent or redundant. Indeed, as remarked in 15, our perspective remains at a conceptual level that, on the one hand, does not enter into some details that could be addressed through the FAIR principles. On the other hand, we think that our foundational, simplified perspective allows one to better grasp some of the basic problems involved, helping in this way to imagine a journey towards their solution. Yet we consider, thanks to the Referees' suggestions, that getting deeper into the relationships between the FAIR and CDUR principles could be an interesting and challenging subject for future work.
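One possible reading of such a CDUR-to-FAIR correspondence can be sketched as follows. The pairing shown here is our own illustration, not a reproduction of Table 1, with the Research step left unmatched as noted above:

```python
# One illustrative reading of a CDUR-to-FAIR correspondence
# (not a reproduction of Table 1). The Research step has no FAIR
# counterpart, as remarked in the text.

CDUR_TO_FAIR = {
    "Citation": ["Findable"],
    "Dissemination": ["Accessible"],
    "Use": ["Interoperable", "Reusable"],
    "Research": [],  # no equivalent FAIR principle found
}

def unmatched_steps(mapping):
    """Return the CDUR steps with no FAIR counterpart."""
    return [step for step, fair in mapping.items() if not fair]

print(unmatched_steps(CDUR_TO_FAIR))
```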

Conclusion
Designing and following best practices for research output dissemination are important steps toward accomplishing the Open Science goals of rendering research visible, accessible and reusable 2. We also consider that the current evolution in research evaluation practices will enable the adoption of Open Science methods 13, 44, and will facilitate their integration in everyday research activities.
As we have already detailed in our work, RS and RD present many similarities concerning dissemination and evaluation issues. For example, we have included in Section 3.1 an RD definition that has been proposed in 13 and that is clearly based on an RS definition (see 13, 14 and Section 2.1). Following the same scheme, in Section 3 we have proposed and argued in detail RD dissemination and evaluation procedures grounded in the proposed RS dissemination (Section 2.2 and 10) and evaluation (Section 2.3 and 13) procedures.
It remains future work to analyse the potential extension of this parallelism to other kinds of research outputs that are disseminated under similar conditions to those of RD and RS, that is, without widely accepted publication procedures involving editors or other external actors, and where the dissemination is usually restricted to the hands of the production team (eventually including the selection of platforms or repositories). Sections 2.4 and 3.4 on FAIR RS and RD develop some reflections on the connections between these principles and the proposed dissemination and evaluation protocols. We consider that our dissemination and evaluation (CDUR) proposals, if followed correctly, may clearly contribute towards a sounder implementation of the FAIR principles for RS and RD, as they provide robust instructions for their producers to make them more findable and accessible, as well as arguments to choose suitable dissemination platforms to complete the FAIR framework. Moreover, interoperability and reusability could also be fostered with best documentation practices, as proposed in our dissemination procedure; practices that can be evaluated with our CDUR protocol.
As stated in Section 3.4, we think it will be very important to devote some future work to further study the similarities and differences, and the mutual benefits, between the FAIR and CDUR approaches.
On another note, we observe that one of the advantages of the CDUR protocols for RS and RD described here is that they separate the evaluation of research aspects from that of much more technical issues concerning software or data, as these different contexts may involve evaluators with disparate levels of expertise in the corresponding areas. Evaluators can then set priorities and adapt the protocol to the evaluation setting.
Furthermore, we consider that our dissemination and evaluation proposals contribute towards the development of Open Science 2: on the one hand, by enhancing open access outputs, as we highlight precisely the steps that deal with licensing issues; on the other hand, because we emphasize the role of best dissemination practices in the first two steps of the CDUR protocols, as remarked in Sections 2.3 and 3.3.
As a general reflection related to our study of RD dissemination and evaluation issues, let us remark that the CDUR protocol states that a research output (such as RD or RS) that is to be disseminated should be identified correctly to increase its visibility, as well as the visibility of its producer team and their research work, in order to make it accessible and reusable. We have already highlighted in 13 that one of the roles of the evaluation stages is to improve best dissemination practices, such as best credit, attribution and citation practices, practices that are still to be widely adopted: … we consider that it is in the interest of the research communities and institutions to adopt clear and transparent procedures for the evaluation of research software. Procedures like the proposed CDUR protocol facilitate RS evaluation and will, as a consequence, improve RS sharing and dissemination, RS citation practices and, thus, RS impact assessment.
After the study presented in this article, it seems clear to us that the same statement also applies to RD.
As a final conclusion of our work, we would like to emphasize the underlying RD dissemination/evaluation loop (see 45): first, the CDUR protocol points out to the research community the need to correctly disseminate outputs, as only well disseminated outputs are potential subjects of evaluation; secondly, the CDUR protocol also implies that outputs are to be disseminated following the adopted evaluation policies.
In this imbricated context, it is the intention of this work to contribute towards improving dissemination and evaluation procedures and, thus, to enhance everyday Open Science best practices.

Data availability
Underlying data
Data underlying the arguments presented in this article can be found in the references and footnotes.

Mark Leggott
Digital Research Alliance of Canada, Ottawa, ON, Canada

In general I found the intent of the article (to propose a common rubric for making data and SW adhere to Open Science and FAIR Principles) to be a reasonable goal, but I'm not sure the article clearly achieves that goal. The authors state that "Bearing in mind the above described landscape, the goal of our work here is to contribute to the improvement of the scientific endeavor with protocols that could help researchers, and the community at large, in the dissemination of their produced RD and RS, while contributing to the accomplishment of Open Science goals." I don't feel that the proposed protocols (CDUR) provide sufficiently detailed recommendations to support this goal.
Part of the challenge I have in saying this is that work of initiatives like FAIR4RS would be very informative in this article, but they are not mentioned. Also, the FAIR4RS Principles have direct intersections with some of the CDUR approaches, which seems overly simplified in view of this FAIR4RS rubric. I find it unusual that the paper does not even mention the work of the FAIR4RS Working Group, which has articulated a number of the concepts and approaches highlighted in this article. The work of the RDA-FORCE11 Software Source Code Identification Working Group is also of direct relevance to the PID discussion in 2.4, so should be highlighted.
A table comparing the CDUR recommendations against data and SW might be useful, as it would extract the key elements of the proposed approach and make it easier for the reader to make the connections.
The authors could mention the value of a Software Bill of Materials (SBOM) in Section 2.1, para 4 (Moreover...). Mentioning how an SMP can be integrated with a DMP, reinforcing the idea that data and SW can be considered in a common rubric, would also be beneficial. There has been some work by the Software Sustainability Institute to develop an SMP meant to be integrated with standard DMP tools.
I find the idea of drawing connections between the practices needed to support similar Open Science/FAIR concepts with data and SW is very desirable, and the authors do provide one of the few attempts to articulate this. If they were able to achieve a better integration of additional and specific resources and best practices with their CDUR approach, it would benefit the article substantially.
Is the work clearly and accurately presented and does it cite the current literature? Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable

-[Software Bill of Materials (SBOMs)] Many thanks for your suggestion, but we consider that software security issues, the main topic of, for example, the report https://linuxfoundation.org/wpcontent/uploads/LFResearch_SBOM_Report_020422.pdf, are not part of our conceptual approach of Section 2.1. As already mentioned to other Referees, we do not enter into several concrete and technical issues. The point of software security issues, as well as many others, could be considered by the evaluation committees in the Use step of the CDUR protocol, which can easily be adapted by the committees to take such technical points into account. Indeed, there are many relevant issues that are not detailed in our definition and proposed protocols, as we aim to address the subject from a more conceptual perspective. This point of view has been clarified in the new version of [Reference 13]. The decision to take such a broad point of view has required us to include in the new version of [Reference 13] comments reacting to some Referee questions that asked us, like you do, to consider different specific issues that they felt we had forgotten.
-[how a SMP can be integrated with a DMP] Thanks again for your suggestion. This could be the subject of a future extension of our work, but it is out of the scope of the present article.
-[I find the idea of drawing connections between the practices needed to support similar Open Science/FAIR concepts with data and SW is very desirable, and the authors do provide one of the few attempts to articulate this.] Thank you so much for this very positive comment, which in some sense supports our perception that this article already provides enough contributions to merit its publication as it is, although there is always room for improvement in future work.
-[achieve a better integration of additional and specific resources and best practices] Following your suggestion, we provide a new version of Sections 2.4 and 3.4 with additional reflections on the relationships between FAIR and CDUR issues. For more detailed information on the CDUR protocol the appropriate reference is [Reference 11], while in the present paper we have included only a short description to explain that CDUR can be applied in a similar way for RS and RD.
The comparison of RS and RD in the context of dissemination (and its evaluation) is interesting and highlights some similarities in the bottlenecks and challenges that need to be addressed.
However, our main general concern is the high heterogeneity of RD compared to RS (maybe a naïve vision), which makes these general procedures and protocols not always applicable, or too vague to be practically useful, as they don't provide practical solutions.
However, they have the merit of drawing attention to the importance of following good practices (e.g. protocols, procedures) to disseminate RD and RS, and they provide general guidance which can help to identify some potential practical solutions. We recommend this article to be indexed as a good basis for further discussions.
Below we provide some detailed comments and suggestions on specific points discussed in the manuscript that could be included in a future version of the article.

Proposed definitions:
The proposed definitions used in this article are based on another article currently awaiting peer review 1. The authors should update these definitions after comments from the reviewers and approval of the other manuscript.

○ We generally agree with the proposed definitions. However, regarding the definition of RD we would add that following protocols and standards established in the field is important. For example, "the data has been collected, processed and analysed following protocols, procedures and standards established in the field to produce results…"
○ Dissemination and evaluation procedures: The dissemination procedure and the CDUR protocol for RS have previously been reported and reviewed; we don't have further comments on these.
○ One important aspect and challenge for RD dissemination that should be mentioned is to make data "machine-readable" to facilitate collection and enable re-use, and thus contribute to new data-driven discoveries (e.g. using machine-based tools). In some fields, RD are mainly shared in a "human-readable" format only, usually in a monolithic supporting information document along with a scientific publication (e.g. PDF format). RS, in contrast, has well developed repositories which to a certain extent are machine actionable, as well as being closely integrated into the publication processes (such as GitHub via https://github.com/openjournals and https://joss.theoj.org). Overall, we think this article should have a paragraph on the state of play of machine-readable or even machine-actionable RD and RS, perhaps comparing how this is evolving for both of them.

○ In general, it would be good to further emphasise that many aspects discussed in these procedures and protocols can easily be addressed with relevant metadata, which are key to addressing the FAIR principles. Metadata play a key role in ensuring findability/discovery of the object, using the metadata registry MDS (metadata store). It would be good for this article to illustrate the role of registered metadata in finding both RD and RS, and perhaps to explore the granularity of the metadata for both. Is it good enough to use metadata purely to discover the functionality of an RS code, or might it be necessary to explore in more detail the functions and libraries in RS?
○ It is maybe a naïve vision, but research data feature a much broader heterogeneity than RS; so many different types of data can be generated that it makes these procedures difficult to apply or generalise to RD in general (as mentioned in the report of reviewer 1 for RS 2).

○ In general, more specific examples are provided for RS than for RD. More examples or references for RD would be useful for the readers.

Dissemination steps:
Contrary to RS, versioning is rare or can be difficult to achieve with RD. Complementary data can be generated to support new conclusions but do not necessarily constitute new versions of the previous ones; thus "revisiting version release" as mentioned doesn't necessarily make sense, whereas "each time new data are generated" would be more appropriate.

○ A single name or title for RD is not necessarily as straightforward as it is with RS, as RD are usually made available along with scientific articles (as mentioned in the three main characteristics); the name of the dataset can, though, be related to the title of the narrative it accompanies.
○ "Choose a web site, forge…": data repositories should be explicitly mentioned. In addition, selecting a data repository which generates relevant metadata for the discipline (if available) should be considered (rather than a generic data repository).
○ The address of contact can easily be addressed via a PID associated with the authors (i.e. ORCiD).

Evaluation steps:
We would encourage the authors to discuss how the different C, D, U, R steps of the protocol help to meet the FAIR criteria. To some extent, the Citation step helps to address Findability, the Use step deals with Interoperability, etc.
○ Data citation: this can easily be evaluated via the attribution of DOIs, and of PIDs for the researchers (ORCiD). Data citation can follow guidelines previously defined by various organisations (see the joint declaration for data citation - FORCE11), which should be given as examples.

○ The choice of file format (e.g. non-proprietary, open format, machine-readable) to enable re-use is important and something that should be evaluated in the protocol.
Reviewer Expertise: FAIR data, workflows for data publication, chemistry

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.
Author Response 08 Sep 2022
Teresa Gomez-Diaz, CNRS, Paris-Est, France

Dear Charles Romain and Henry S. Rzepa,
Many thanks for all your comments, which have helped us a lot to improve our work. Here are some answers while we are preparing a new version.
-[high heterogeneity of RD compared to RS] We will include some explanations. Indeed, heterogeneity obviously appears between software and data, yes. But if we keep these aspects aside and focus on the similarities, we can learn a lot from the common features that appear in the production contexts of RD and RS. This is a general procedure that can be adapted to several situations. Even when the differences are too important, and maybe the proposed dissemination and evaluation procedures are not directly applicable as such in both settings, they could help to suggest hints to address the diverse issues appearing in each environment.
-[to draw the attention to the importance of following good practices] Many thanks!
-[Proposed definitions, article currently awaiting peer review] We are preparing Version 2 of [Reference 13] bearing in mind the Referees' comments.
-[Proposed definitions, would add following protocols and standards] We agree with you, but we do not think that it is necessary to change the formulation of our definition. Yet, in the new version of [Reference 13], we have detailed and emphasized the relevance, in order to characterize the RD concept, of the research team's decisions in the scientific production framework, which include the protocols, procedures and standards to be followed in order to consider some given data as true RD. See, for example, our reference to the STRENDA standards developed for investigations on enzyme activities. Some of your comments on the present paper have helped us to improve the new version of [Reference 13], as they are related to the conceptual framework. For this reason, the new version of the present paper does not reflect them here.
-[Dissemination and evaluation procedures, machine-readable...] Yes, there are many relevant issues that are not detailed in our definition and proposed protocols, as we aim to address the subject from a more conceptual perspective. This point of view has been clarified in the new version of [Reference 13]. The decision to take such a broad point of view has required us to include in the new version of [Reference 13] comments reacting to some Referee questions that asked us, like you do, to consider different specific issues that they felt we had forgotten.
first step to identify it as a research output.
-[Dissemination steps, Data repository should be explicitly mentioned] Yes, you are right. This point has been studied in the Conclusions of [Reference 13]; that is, the Where? question of the RD Borgman's conundrum challenges. We will modify the RD dissemination procedure in the new version of the present article to include your suggestion.
-[Dissemination steps, The address of contact can easily be addressed via PID associated with the authors (i.e. ORCiD).] Well, we do not fully agree with this optimistic view, as an ORCID usually refers to one researcher, while a mail address can refer to an (evolving) team, which is a much more generic solution. In our opinion, the identification issue can still be quite challenging, mainly for software or data that have been developed collectively for many years. Thus, we prefer to keep the current formulation in Sections 2.2 and 3.2 of the present article.
-[Evaluation steps, how the different C, D, U, R steps of the protocol help to meet the FAIR criteria] Yes, you are right, and many thanks for this interesting suggestion, which we have addressed in Sections 2.4 and 3.4 of the new version.
-[Data citation, can easily be evaluated via the attribution of DOI..] Citation examples are already available in the cited works, see for example References 4, 34, 35, 37, and we will add this one: Altman M, Crosas M. The evolution of data citation: From principles to implementation. IASSIST Quarterly. 2013;37. See also our comments above concerning ORCID issues.
-[choice of file format] Yes, but we think that this issue is already addressed (perhaps not at the level of detail you mention, as this level does not correspond to the more conceptual approach of our work), as indicated in the dissemination protocols in Sections 2.2 and 2.3: Good metadata and respect of open standards are always important when giving away new components to a large community: it helps others to reuse your work and increases its longevity. And this is to be evaluated in the CDUR protocol in the Dissemination step, as the use of open formats corresponds to Open Science issues. Other technical issues can be evaluated in the Use step; see the description of this step in Section 4.2 of [Reference 11].
-[The replicability and reuse steps are highly challenging for RD, even if RD are "well-disseminated". We would emphasize the importance of the context and how the RD have been generated, to maximize reuse and reproducibility.] Yes, see [Reference 13], where the importance of the context, which is intimately related to the data and RD concepts, is examined thoroughly in Section 3 (OECD Glossary of Statistical Terms and Reference 18 indicated in [Reference 13]). Following your suggestion, this point has now been highlighted in the new version in preparation; see the CDUR protocol proposed for RD (Section 3.3).