Sharing health research data – the role of funders in improving the impact

Recent public health emergencies with outbreaks of influenza, Ebola and Zika revealed that the mechanisms for sharing research data are neither being used nor adequate for the purpose, particularly where data needs to be shared rapidly. A review of research papers, including completed clinical trials related to priority pathogens, found only 31% (98 out of 319 published papers, excluding case studies) provided access to all the data underlying the paper; 65% of these papers give no information on how to find or access the data. Only two clinical trials out of 58 on interventions for WHO priority pathogens provided any link in their registry entry to the background data. Interviews with researchers revealed that the reasons for a reluctance to share data included a lack of confidence in the utility of the data; an absence of academic incentives for rapid dissemination that prevents subsequent publication; and a disconnect between those who are collecting the data and those who wish to use it quickly. The role of the funders of research needs to change to address this. First, funders need to engage early with the researchers and related stakeholders to understand their concerns and work harder to define more explicitly the benefits to all stakeholders. Secondly, there needs to be a direct benefit to sharing data that is directly relevant to those people that collect and curate the data. Thirdly, more work needs to be done to realise the intent of making data sharing resources more equitable, ethical and efficient. Finally, a checklist of the issues that need to be addressed when designing new or revising existing data sharing resources should be created. This checklist would highlight the technical, cultural and ethical issues that need to be considered and point to examples of emerging good practice that can be used to address them.

Introduction
The benefits of sharing health research data to improve public health have been promoted by international research funders for over a decade, but the reality is that the quality and volume of health research data shared, even in emergency situations, remains low 1,2. This lack of progress seems to reflect a cultural reluctance among researchers to 'give up their data' without any clear benefits returning to them. This concern is heightened among researchers in low-resource settings, who feel that the requirements to share data, from funders and journals, risk turning them into data exporters unless greater efforts are made to ensure a fairer distribution of benefits. In this paper we draw on our experience of supporting data sharing initiatives and some commissioned research to highlight the barriers to sharing research data and the role research funders might play to improve this situation.

A decade of progress?
In January 2011, a group of research funding organizations published a joint statement on sharing health research data with the aim to promote the efficient use of those data to accelerate improvements in public health. The funders recognized that for data sharing to be most effective, a combination of technical and cultural issues need to be addressed. They framed this approach around three principles which required any data sharing mechanism they supported to be equitable, ethical and efficient (See Wellcome Trust page on sharing research data).

Box 1. Sharing research data to improve public health: the principles in the full joint statement by funders of health research (2010)
Equitable: any approach to the sharing of data should recognise and balance the needs of researchers who generate and use data, other analysts who might want to reuse those data and the communities and funders who expect health benefits to arise from research.
Ethical: all data sharing should protect the privacy of individuals and the dignity of communities, while simultaneously respecting the imperative to improve public health through the most productive use of data.
Efficient: any approach to data sharing should improve the quality and value of research and increase its contribution to improving public health. Approaches should be proportionate and build on existing practice and reduce unnecessary duplication and competition.
Progress on encouraging the sharing of research data has been made over the subsequent decade and it is now common for research grants and journals to require the data underlying a paper or clinical trial to be shared (see PLOS editorial and publishing policies, AllTrials, and NIH data sharing policy). However, recent public health emergencies with outbreaks of influenza, Ebola and Zika have brought into sharp focus the realization that the mechanisms for sharing data are neither being used nor adequate for the purpose, particularly where data needs to be shared rapidly 3-5.
In addition, researchers working in low- and middle-income countries highlight an inequity: as they see it, blanket requirements to share their data put them at a disadvantage. Their concern is that sharing their data too soon, or without any restrictions, will lead to their data being analysed by others with greater capacity, and no benefit will return to the researchers themselves or the populations they work with. In effect they become data exporters rather than partners. So while there is a lot of emphasis placed on data being Findable, Accessible, Interoperable and Reusable, known as the FAIR approach, many researchers in developing countries fear the reality for them will be far from fair 6-8.

The findings of two surveys and a workshop
To explore this further, we commissioned two surveys to review the governance arrangements and standards within existing data sharing resources. The findings of those studies informed a workshop held in October 2017 with a set of stakeholders representing researchers and funding organizations. All the reports and supporting files are published as open access under a Creative Commons licence and in free-to-access repositories.
Readers are strongly encouraged to read that material as the primary source of reference 1,2,9,10 .
The first survey, Data Sharing in Public Health Emergencies, focussed on data sharing in public health emergencies concerned with the pathogens named by the World Health Organization as being of priority concern because of their epidemic or pandemic potential (see WHO list of Blueprint priority diseases). A review of academic papers published since 2003 relating to these diseases was undertaken, and attempts were then made to access the data underlying those publications via the web and through a direct survey of the corresponding authors. Interviews were undertaken with a range of people either conducting or supporting research in these areas, and this was supplemented with a review of institutional policies, discussion documents and academic commentaries about standards and norms in data sharing 1,2.
The second survey, Development of International Standards for Online Repositories, was designed to identify which 'standards' were being used in data sharing relating to the neglected diseases. Standards were identified following a review of publicly accessible information (via the web or publication) relating to three main areas, each with a set of elements describing the standards under those areas 9.
A third report combined the findings of these two surveys and was used to shape thinking at a workshop held in Antwerp, Belgium in October 2017 10 .
The workshop brought together 26 experts representing agencies that included those that provide data sharing resources for diseases prevalent in low- and middle-income countries.

What does this tell us?
Sharing health research data currently remains the exception rather than the norm. The review of research papers, including completed clinical trials related to priority pathogens, found only 31% (98 out of 319 published papers, excluding case studies) provided access to all the data underlying the paper. While a few authors will provide the data on request, 65% of these papers give no information on how to find or access the data. And the review of clinical trial registries, for trials on interventions for priority pathogens, reported an even worse picture. Only two trials out of 58 provided any link in their registry entry to the background data 1,2 .
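The proportions quoted above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in the text; the variable and function names are illustrative and are not taken from the underlying survey reports.

```python
# Counts as reported in the text; no other figures are assumed.
papers_total = 319          # published papers reviewed, excluding case studies
papers_with_all_data = 98   # papers giving access to all underlying data
trials_total = 58           # trials on interventions for priority pathogens
trials_with_link = 2        # registry entries linking to background data


def percentage(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


paper_rate = percentage(papers_with_all_data, papers_total)
trial_rate = percentage(trials_with_link, trials_total)

print(f"Papers sharing all underlying data: {paper_rate:.1f}%")
print(f"Trials linking to background data: {trial_rate:.1f}%")
```

Rounded to whole numbers, the first figure matches the 31% quoted for papers, and the second shows that only about 3% of registry entries linked to the background data.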
Interviews with researchers revealed that the reasons for a reluctance to share data included a lack of confidence in the utility of the data, and therefore unwillingness to invest resources to prepare it to be shared; an absence of academic incentives for rapid dissemination that prevents subsequent publication (as opposed to the public health need); and a disconnect between those who are collecting the data and those who wish to use it quickly.
A similar scepticism about how data might be used or misused, the potential harms to patients and the risks to the researcher of sharing data that might reveal errors in their work has been reported elsewhere 8,11.

What are the next steps: the role for funders in support of data sharing
[…] there is very low compliance with these policies. In part this might reflect the limited guidance offered by the same funders in supporting their researchers to understand and undertake data sharing, and to implement and monitor these policies in practice.
It appears the main barrier to sharing is not technical but cultural, with researchers remaining sceptical about the benefits to them of sharing data. For researchers in low-resource settings, data sharing can even be seen as a threat that their data will be exported and exploited by others with little benefit returning to them.
Therefore, research funders should take stock and revise data sharing policies to provide incentive structures for researchers. One clear first step would be to engage early with the researchers and related stakeholders to understand their concerns and work harder to define the benefit of sharing beyond a general sense that sharing data is in the public interest. The overall purpose of sharing the data needs to be clear and ideally developed with input from data suppliers, secondary data users, potential end-users and beneficiaries, and if possible with input from the participants that are the source of those data. Concerns regarding privacy versus the secondary use of the data need to be explored and mechanisms put in place to balance the public benefit against potential risks to privacy and confidentiality.
Secondly, there needs to be a direct benefit to sharing data that is directly relevant to those people that collect and curate the data. For example, academics require citation of their work, including data sets. The generation of data and its subsequent citation for reuse needs to be integrated into research assessment, an idea captured in the Declaration on Research Assessment (see San Francisco Declaration on Research Assessment). If the purpose of the data sharing mechanism is clear, all stakeholders buy into that purpose, and they feel their inputs will be recognised in research assessment, together this will create a strong incentive to share. This was certainly our experience when working with schistosomiasis researchers 8.
Thirdly, whilst there are a myriad of data standards to work with to meet the general principles of making data FAIR, more work needs to be done to realise the intent of making data sharing resources more equitable, ethical and efficient. As is evident in the surveys summarized here, good practice is starting to emerge, so what is needed are better ways to share that practice. Funders need to work with the researchers and their networks to support the technical work required to develop standards that enable interoperability.
For example, one contributory role for funders would be to collect more systematically the data management plans that they have requested as part of funding grants and make them publicly accessible. In line with good practice, these should be standardized where possible and ideally have clear, machine-readable metadata. An online resource that brings together the reference material and policies that are exemplars in each of the categories that cover governance, data curation, security and longevity would provide the basis for a framework to guide the future development of new sharing resources.
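To illustrate what clear, machine-readable metadata for a data management plan might look like, the sketch below serializes one plan record to JSON. It is a hypothetical example only: the field names, the identifier "EXAMPLE-0001" and all values are assumptions made for illustration and do not come from any funder's schema or from the surveys described here. The four category labels are the ones named in the paragraph above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataManagementPlanRecord:
    """Hypothetical metadata describing one data management plan."""
    grant_id: str            # illustrative identifier, not a real grant
    funder: str
    title: str
    repository: str          # where the data are intended to be deposited
    licence: str
    access_conditions: str
    categories: List[str] = field(default_factory=list)


record = DataManagementPlanRecord(
    grant_id="EXAMPLE-0001",
    funder="Example Funder",
    title="Example study of a priority pathogen",
    repository="Example open repository",
    licence="CC BY 4.0",
    access_conditions="Open after de-identification",
    categories=["governance", "data curation", "security", "longevity"],
)

# Sorted keys keep the output stable, which helps when records are compared.
as_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(as_json)
```

A shared structure of this kind would let plans from different funders be collected and searched together, although the actual fields would need to be agreed with researchers and repositories.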
Finally, a checklist of the issues that need to be addressed when designing new or revising existing data sharing resources should be created. In addition to defining the purpose of data sharing this would highlight the technical, cultural and ethical issues that need to be considered and point to examples of emerging good practice that can be used to address them. The authors are working on this next stage and hope that with this type of planning and support in place the data sharing long desired by research funders will start to become the norm.

Data availability
No data are associated with this article.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Open Peer Review
They also reported 65% of these papers gave no information on how to find or access the data (does the 65% refer to the 319 papers or the 98?).
They provide several recommendations, but outside of the checklist the recommendations seem more aspirational and more about why one would want to or should share data. As framed, they are not easily actionable.
First - Funders need to engage early with the researchers and related stakeholders to understand their concerns and work harder to define more explicitly the benefits to all stakeholders. Second - there needs to be a direct benefit to sharing data that is directly relevant to those people that collect and curate the data. Third - more work needs to be done to realize the intent of making data sharing resources more equitable, ethical and efficient.
Finally - a checklist of the issues that need to be addressed when designing new or revising existing data sharing resources should be created. This checklist would highlight the technical, cultural and ethical issues that need to be considered and point to examples of emerging good practice that can be used to address them.
The discussion needs to capture more of the following: There are legitimate concerns about early or premature data sharing that are not discussed, especially concerning low-resource research entities. Power dynamics can turn the early release of data to the public into an opportunity for resource-intensive researchers to leverage the data not only for "public good" but for their own gain, with no need for attribution to, or sharing of downstream resources with, the low-resource entities that conducted the work. While many health research funders have a generic policy requiring research data to be shared in a manner that maximizes health and societal benefit, these are often unfunded mandates, especially for research conducted in low-resource countries or by low-resource institutions.
Releasing data in the absence of the context of much of the data can lead to significant misinterpretation of the data. Also, for certain data there needs to be time to clean and collapse the data to ensure participant privacy before it can be made publicly available. This can be a laborious process to do correctly and in many instances there may not be funding for this provided by the funder and low resource institutions may not have resources to conduct this unfunded work in an expeditious manner.
For WHO pathogen work it seems funders might consider an a priori explicit plan of a cooperative agreement/partnership for early shared data for studies led by low resource "partners" doing the hands-on work and a more resource intensive partner (or funder) with capacity to do intense data analysis, with shared credit and equity in leveraging the data to get more funding for both partners. This can help to balance the power dynamics, and support early widespread data dissemination with the resource intensive partner (or funder) co-leading rapid production of publications and the creation of publicly available cleaned data for public benefit.
Introduction - improve what situation? I think this needs teasing out more in the first paragraph. Improving what - the quality of data to be shared, data sharing policies? What group of funders - who was involved? They framed this approach... Who framed?
Highlighting that data collected specifically to answer a specific question in a trial, for example, may not be sufficient to answer a different question, is important. Understanding why and how the data were collected, analysed and interpreted is important when reviewing secondary data. If this is misinterpreted, there are risks associated with the use of that data.
Findings of the two surveys and workshop - Who was involved in the survey, how many people were involved and what was the response rate? Were funders involved in this? (For both of the surveys) Why was the review from 2003, given that the data sharing paragraph refers to 2011? As expected, the results of the review are low, so are these numbers from recent research? What is the time period of those that made access to data? Did you see a change over time, for example are those identified from 2015 onwards?
Who participated in the workshop? Agencies from where? I think this would be important given the focus on low to middle income countries.
How many people interviewed? And who were they? Was this semi-structured/structured interviews, and how did you interpret the transcripts? Providing some context will help determine the value of the comments and confidence about the interpretation of the interviews.
Concluding section - Ethical issues may also need to be reviewed: are participants made aware at time of consent that the data collected for the specific study can later be used […]

Here are my comments:

Findings of two surveys and a workshop
First paragraph, first sentence - Who is "we"? The three authors of this paper?
Second paragraph - The authors talked about a review of academic literature. Is this part of the survey or workshop? Perhaps this should be in the Introduction.
Third paragraph - Suggest briefly stating the three main areas found in the survey.
Fourth and fifth paragraph - Please tell us a little bit more about the third report and the workshop.

What does this tell us?
Does this refer to the findings of the two surveys and workshop?
Paragraphs 2 and 3 - It is not clear what the objective of these paragraphs is. Do they summarise the barriers to data sharing? The reason I mentioned barriers is that in the Introduction, the authors said that this paper does two things: (1) to highlight the barriers to sharing research data and (2) the role research funders might play to improve this situation.
If indeed they are barriers, are these barriers based on the two surveys and workshop or do they include research conducted by other groups? If the latter, I believe there are more barriers such as costs and ownership issues.
Paragraph 2, first sentence - "Interviews with researchers". Please provide reference(s). Did the authors conduct these interviews?

What are the next steps: the role for funders in support of data sharing
Paragraph 1 - "…there is very low compliance with these policies". How did the authors get this information? Please provide references if appropriate.
Paragraph 2 - "It appears the main barrier to sharing is not technical but cultural,…" How did the authors get this information? Please provide references if appropriate.
Paragraph 4 - "There needs to be direct benefit…." This is a general statement. What is the funders' specific role?
Suggest including a "Conclusions" section.