A review of data sharing statements in observational studies published in the BMJ: A cross-sectional study

To understand the current state of data sharing in observational research studies, we reviewed the data sharing statements of observational studies published in a general medical journal, the British Medical Journal. We found that the majority (63%) of observational studies published between 2015 and 2017 included a statement that implied that data used in the study could not be shared. If the findings of our exploratory study are confirmed, there is room for improvement in the sharing of real-world or observational research data.


Introduction
In recent years, a number of articles and movements have called for the sharing of clinical trial data 1-3. However, access to and sharing of real-world/observational data has received little, and arguably insufficient, attention. In this study we sought to assess the current state of data sharing in published observational studies in a general medical journal, namely the British Medical Journal (BMJ). The BMJ was chosen because the journal does not enforce a commitment to data sharing for observational studies 4, but requires all research articles to contain a data sharing statement 5.

Methods
All observational research articles published in the BMJ between 1st January 2015 and 31st August 2017 were investigated. These dates were chosen because they provide over 100 articles for analysis and recent data post-dating some of the articles regarding clinical trial data sharing 2. Assuming that the proportion of articles not sharing data is between 20% and 40%, 100 articles would give a precision of 7.8% to 9.6%. Observational research articles were defined as cohort, case-control and cross-sectional studies, as well as case series. Meta-analyses, systematic reviews, randomised controlled trials and genetic/Mendelian randomisation studies were excluded. The data sharing statements of these studies were reviewed. If the statement was "no additional data available" 5 or stated that data was not publicly available, the study data was classed as not shared. Where statements alluded to data being available from the corresponding author, or referred to the access policies of the data source, the study data was classed as shared.
Where statements alluded to code or a technical appendix being available, but made no reference to data specifically being available, the study data was classed as not shared.
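The precision figures quoted in the Methods can be reproduced with a short calculation. The sketch below assumes that "precision" means the half-width of a 95% confidence interval for a proportion under the normal (Wald) approximation; the text does not state the formula used, so this is an illustrative reading rather than the authors' documented method. The function name `wald_half_width` is hypothetical.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation (Wald) confidence interval.

    p: assumed proportion (here, articles not sharing data)
    n: number of articles
    z: standard normal quantile (1.96 for a 95% interval; an assumption,
       as the confidence level is not stated in the text)
    """
    return z * math.sqrt(p * (1 - p) / n)


# Assumed range of proportions from the Methods, with 100 articles.
for p in (0.20, 0.40):
    half_width = wald_half_width(p, 100)
    print(f"p = {p:.2f}, n = 100: +/- {100 * half_width:.1f} percentage points")
```

Under these assumptions the half-widths round to 7.8 and 9.6 percentage points, which agrees with the range given in the Methods.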

Results
Two hundred and thirty-seven observational studies were included. A review of the data sharing statements of these studies revealed that 149 (63%) studies did not share data.
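As a check on the arithmetic, the reported percentage can be recomputed from the counts. The sketch below also derives an approximate 95% interval using the normal (Wald) approximation; that interval is not reported in the study and is shown only as an illustration under that assumption. The function name `proportion_with_wald_interval` is hypothetical.

```python
import math


def proportion_with_wald_interval(events, total, z=1.96):
    """Return (proportion, lower, upper) using the Wald approximation.

    The interval is an illustrative addition; the study reports only
    the count and the percentage.
    """
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Counts taken from the Results: 149 of 237 studies did not share data.
p, lower, upper = proportion_with_wald_interval(149, 237)
print(f"Not shared: {100 * p:.1f}% "
      f"(approximate 95% interval {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

The proportion rounds to the 63% reported above. The interval is sensitive to the choice of method, and a Wilson or exact interval would differ slightly.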

Conclusions
In our review of the data sharing statements of observational studies published in the BMJ, we identified that the majority of studies did not share data. Whilst there are likely many reasons for this, including patient confidentiality concerns and the access possibilities of the data source, the lack of data sharing in observational research is a potential cause for concern. The key limitation of our study was the scope of the data. We only reviewed the data sharing statements of one medical journal and therefore the generalisability of our results is unclear. The data sharing policy of the BMJ is relatively general (sharing is encouraged, but specific requirements are not communicated). Follow-on work should assess how the robustness of a journal's data sharing policy could influence the rate of data sharing. Our analysis should be seen as exploratory rather than definitive. Further, more in-depth studies are needed to confirm or refute our findings. If consistent findings are seen, the lack of sharing of observational research data is an area that warrants further attention.

Data availability
Dataset 1: Raw data showing the studies identified from the BMJ and whether the data sharing statements indicated that data was or was not available. doi: 10.5256/f1000research.12673.d177871 6

Competing interests
LM and SR are employees of Bristol-Myers Squibb Company. AS, AS, SG and RW are employees of Evidera Inc.

Grant information
The author(s) declared that no grants were involved in supporting this work.

This exploratory analysis contributes to the literature on biomedical data sharing practices, and demonstrates the importance of the further understanding of data sharing policies, practices, and barriers for domain- and type-specific data sets, such as those associated with observational studies.
While it does not impact the soundness of this short report, I would recommend that the authors expand on their reasoning for choosing BMJ for their analysis. Specifically, while BMJ requires data sharing statements, its data sharing policy is relatively general (sharing is encouraged, but specific requirements are not communicated). It would be helpful for the authors to discuss (potentially as a follow-up analysis) how the robustness of a journal's data sharing policy could influence the rate of data sharing.
Regarding the sampling methodology, the authors write, "All observational research articles published in the BMJ between 1st January 2015 and 31st August 2017 were investigated. These dates were chosen as it provides a reasonable sample size for analysis and recent data post-dating some of the articles regarding clinical trial data sharing." In my view, this statement is insufficient. The authors should provide more detailed and quantitative information about the strength and the nature of the sample. Finally, the authors write "A review of the data sharing statements of these studies revealed that 149 (63%) studies had a statement implying that the data underlying the study could not be share". However, the methodology does not indicate if the data "could" be shared, only if it was shared. I would urge the authors to revise this statement to be more precise.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Sreeram Ramagopalan
We thank the reviewer for their comments. We focussed on the BMJ because the journal is a key backer of the AllTrials initiative, but we agree with the reviewer's point and have updated the discussion. We have updated the methods to include a precision calculation. We corrected the results sentence.

This is an interesting study that draws attention to an important area: open access to the data underpinning research papers, allowing assessment of reproducibility. It is of course limited by its focus on one journal, as data-sharing policies vary between journals. In addition, statements requested by journals often become formulaic, as the tendency is to copy what other papers have said previously. Some journals have tried new approaches in an effort to encourage data-sharing, such as the NEJM 'Sprint Challenge'.
However, the major limitation is a lack of further data on why data was not shared. Some providers, such as the UK primary care data repository CPRD, do not allow data-sharing, even if the authors of articles are keen proponents of open data access. More in-depth work will be required to understand the true barriers and the way ahead for data-sharing.

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.