Keywords
Open Data, Publications, National Institutes of Health, Bibliometrics
This article is included in the Research on Research, Policy & Culture gateway.
Recent years have seen increasing calls for data sharing in clinical studies, especially for research funded by international and governmental agencies1. The call originally aimed to maximize the transparency of clinical trial results1, but the benefits of data sharing have extended beyond this original aim. Open access data are frequently cited as a boon for researchers, because researchers can re-analyze already collected data to answer new research questions2,3. To organize open access data and maximize their scientific use, researchers and funders store them in open access data repositories4. The Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) of the National Heart, Lung, and Blood Institute (NHLBI) is one such repository; it was initiated in 2000 with the aim of sharing data from observational and interventional studies supported by the institute5. The impact of open access data, in terms of publications generated and citations received, remains unknown. In this study, we aimed to analyze the number of publications that used BioLINCC open access data and the impact of these publications as measured by the citations they received.
A total of 205 studies are listed in the BioLINCC data repository. Of these, four studies have their data stored in other repositories, and seven studies have only specimens, available on request from BioLINCC, with no associated datasets. We included only datasets stored in the BioLINCC repository and accessible through its portal, which comprised 194 datasets.
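The number of included datasets follows from subtracting the two excluded groups from the listed studies. A minimal Python check using only the counts reported above (the constant and function names are illustrative, not part of any BioLINCC tool):

```python
# Counts reported in the text for the BioLINCC listing at the time of the study.
TOTAL_STUDIES = 205
STORED_ELSEWHERE = 4  # data hosted in other repositories
SPECIMENS_ONLY = 7    # specimens only, no associated datasets


def included_datasets(total: int, elsewhere: int, specimens_only: int) -> int:
    """Number of listed studies whose datasets are held and accessible in BioLINCC."""
    excluded = elsewhere + specimens_only
    if excluded > total:
        raise ValueError("exclusions exceed the number of listed studies")
    return total - excluded


if __name__ == "__main__":
    print(included_datasets(TOTAL_STUDIES, STORED_ELSEWHERE, SPECIMENS_ONLY))  # 194
```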
We also contacted BioLINCC support to obtain an up-to-date list of published articles that used BioLINCC datasets and received a list of all publications up to 24th July 2019. Researchers accessing BioLINCC datasets are requested to disclose any publication resulting from their use. BioLINCC also lists published articles that used its datasets on its website (https://biolincc.nhlbi.nih.gov/publications/). To confirm that the list was complete and up to date, we also searched PubMed manually, using the basic search with the title of each dataset as the search term. Any study that reported use of the searched dataset as part of its results was included in our analysis. The included articles used either data stored in the BioLINCC repository alone or these datasets together with datasets from other repositories.
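Combining the three sources (the list from BioLINCC support, the BioLINCC website, and the PubMed search) requires removing articles found by more than one source. The authors describe a manual process; the sketch below is only one possible automated equivalent, assuming each record is a dictionary with hypothetical fields `doi`, `pmid`, `title` and `source`. It matches records on DOI (compared case-insensitively) or PubMed ID, falling back to a normalized title only when neither identifier is present.

```python
from __future__ import annotations

from typing import Iterable, Optional

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    if not doi:
        return None
    d = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if d.startswith(prefix):
            d = d[len(prefix):]
            break
    return d or None


def record_keys(rec: dict) -> list[str]:
    """Matching identifiers: DOI and PMID, else a whitespace-normalized title."""
    keys = []
    doi = normalize_doi(rec.get("doi"))
    if doi:
        keys.append("doi:" + doi)
    if rec.get("pmid"):
        keys.append("pmid:" + str(rec["pmid"]).strip())
    if not keys and rec.get("title"):
        keys.append("title:" + " ".join(rec["title"].lower().split()))
    return keys


def merge_sources(*sources: Iterable[dict]) -> list[dict]:
    """Merge publication records from several lists, noting which lists found each.

    A record joins the first existing entry sharing any identifier with it.
    Two entries that only become linkable later are not re-merged (a known limitation).
    """
    merged: list[dict] = []
    index: dict[str, int] = {}  # identifier -> position in `merged`
    for records in sources:
        for rec in records:
            keys = record_keys(rec)
            pos = next((index[k] for k in keys if k in index), None)
            if pos is None:
                pos = len(merged)
                merged.append({"doi": None, "pmid": None,
                               "title": rec.get("title"), "sources": set()})
            entry = merged[pos]
            entry["sources"].add(rec.get("source"))
            entry["doi"] = entry["doi"] or normalize_doi(rec.get("doi"))
            entry["pmid"] = entry["pmid"] or rec.get("pmid")
            for k in keys + record_keys(entry):
                index.setdefault(k, pos)
    return merged


if __name__ == "__main__":
    support_list = [{"source": "BioLINCC list", "doi": "10.1000/ABC"}]
    website = [{"source": "BioLINCC website",
                "doi": "https://doi.org/10.1000/abc", "pmid": "123"}]
    pubmed = [{"source": "PubMed", "pmid": "123"}, {"source": "PubMed", "pmid": "456"}]
    for entry in merge_sources(support_list, website, pubmed):
        print(entry["doi"], entry["pmid"], sorted(entry["sources"]))
```

The example records and DOIs are invented for illustration. Keeping a `sources` set per article also shows how many publications each source would have missed on its own.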
We used the Web of Science (WoS) database to analyze the characteristics of the included publications. We prepared a list of digital object identifiers (DOIs) for the included articles and entered it into the WoS advanced search field; only the included articles indexed in WoS were analyzed further. WoS has built-in analysis tools that report the number of publications using the included datasets per year (yearly publications), the topic of publication, the affiliations of authors, and the number of citations received6.
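Entering several hundred DOIs by hand is error-prone, so the advanced search string can be generated instead. The sketch below assumes the WoS Core Collection DOI field tag `DO=` with quoted terms joined by Boolean `OR`; the function name and the default batch size are hypothetical, and the batch size is only a precaution because WoS may reject very long queries, so its actual limit should be checked in the current WoS documentation.

```python
from __future__ import annotations


def wos_doi_queries(dois: list[str], batch_size: int = 500) -> list[str]:
    """Build WoS advanced-search strings of the form DO=("doi1" OR "doi2" ...).

    Assumes the `DO=` field tag; DOIs are de-duplicated case-insensitively
    (DOIs are case-insensitive) while preserving their original order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    seen: set[str] = set()
    unique: list[str] = []
    for doi in dois:
        d = doi.strip()
        if d and d.lower() not in seen:
            seen.add(d.lower())
            unique.append(d)
    queries = []
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        terms = " OR ".join(f'"{d}"' for d in batch)  # quote each DOI as a phrase
        queries.append(f"DO=({terms})")
    return queries


if __name__ == "__main__":
    for q in wos_doi_queries(["10.1000/a1", "10.1000/B2", "10.1000/b2"]):
        print(q)  # DO=("10.1000/a1" OR "10.1000/B2")
```

Each resulting query can be run separately and the result sets combined in WoS with its search-history `OR` operation before running the built-in analyses.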
In total, 1,086 published articles used data from the BioLINCC repository, of which 987 (90.88%) were indexed in WoS. All articles were published in English (see underlying data7). The first publication using BioLINCC open data appeared in 2002. Since then, the number of publications has steadily increased, peaking in 2018 with 138 publications (Figure 1).
The 987 open data publications received a total of 34,181 citations from 27,904 published articles up to 1st October 2019, an average of 34.63 citations per item. The total number of citations received per year by publications using BioLINCC data increased from only 2 in 2002 to a peak of 2361 in 2018 (Figure 2).
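The coverage and citation figures above are simple ratios: WoS coverage is 987 of the 1,086 identified articles, and the average citations per item is the total number of citations divided by the number of WoS-indexed publications (34,181/987). The number of citing articles (27,904) is smaller than the number of citations because a single citing article can cite several BioLINCC-based publications. A minimal Python check of the reported values (function names are illustrative):

```python
def percentage(part: int, whole: int, digits: int = 2) -> float:
    """Share of `whole` represented by `part`, in percent, rounded."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)


def citations_per_item(total_citations: int, n_items: int, digits: int = 2) -> float:
    """Average citations per publication: total citations / publications."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return round(total_citations / n_items, digits)


if __name__ == "__main__":
    print(percentage(987, 1086))           # 90.88 (share indexed in WoS)
    print(citations_per_item(34181, 987))  # 34.63
```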
A total of 352 (35.66%) of the published articles related to cardiac and cardiovascular systems, 106 (10.74%) to general internal medicine, and 92 (9.32%) to public and occupational health. Figure 3 shows the 10 most common fields in which publications using BioLINCC data were published. The American Journal of Cardiology published the highest number of publications using BioLINCC data (60; 6.08%), followed by the International Journal of Cardiology (47; 4.76%) and the American Journal of Medicine (25; 2.53%). Table 1 shows the top 10 journals in which publications using BioLINCC data were published. Authors from the USA participated in 842 (85.31%) of the publications using BioLINCC data, followed by authors from Canada (121; 12.26%) and England (81; 8.21%) (Figure 4); because an article can have authors from more than one country, these percentages can sum to more than 100%. The top three affiliations in terms of publications using BioLINCC data were the University of Alabama system, the University of Alabama at Birmingham, and the University of California system (Table 2).
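All percentages in this paragraph use the 987 WoS-indexed publications as the denominator. Because an article can have authors from several countries, and WoS can assign a journal to more than one subject category, the shares within a grouping need not sum to 100%. A minimal Python sketch reproducing the reported shares from the counts given in the text (dictionary and function names are illustrative):

```python
from __future__ import annotations

WOS_INDEXED = 987  # denominator used for all shares in the text

# Counts reported in the text, with labels as worded in the text.
FIELDS = {"Cardiac and cardiovascular systems": 352,
          "General internal medicine": 106,
          "Public and occupational health": 92}
JOURNALS = {"American Journal of Cardiology": 60,
            "International Journal of Cardiology": 47,
            "American Journal of Medicine": 25}
COUNTRIES = {"USA": 842, "Canada": 121, "England": 81}


def shares(counts: dict[str, int], denominator: int = WOS_INDEXED) -> dict[str, float]:
    """Percentage of the denominator for each category, rounded to two decimals."""
    return {name: round(100 * n / denominator, 2) for name, n in counts.items()}


if __name__ == "__main__":
    for table in (FIELDS, JOURNALS, COUNTRIES):
        print(shares(table))
    # Country shares overlap because of multi-country authorship.
    print(round(sum(shares(COUNTRIES).values()), 2))  # 105.78
```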
Since its establishment, BioLINCC has made tremendous efforts to prepare datasets for use as open data, and hundreds of studies have been published using BioLINCC open data6. The impact of these publications can be measured by the citations they receive, which have increased sharply over time, reaching 2361 citations in 2018 alone. Cardiology is the main field: more than a third of the publications are cardiology-related, and the top two journals publishing articles using BioLINCC data are also cardiology journals.
In a 2017 analysis of the administrative records of investigator requests for BioLINCC data, Coady and colleagues found that 35% of clinical trial datasets were associated with at least one publication within five years of their public release8. Although we previously highlighted the importance of open access data for underfunded researchers2, our results show that the top three countries using BioLINCC open data are the USA, Canada, and England, all high-income countries. Researchers new to open data might be skeptical about the prospects of publishing studies based on open data. In our analysis, the top 10 journals publishing open data studies, which together accounted for around 27% of the studied publications, all had an impact factor above two. Regarding clinical impact, one example is the post-hoc analysis of the open data from the Digitalis Investigation Group trial9, which showed that digoxin therapy is associated with an increased risk of death from any cause among women but not men, a finding not reported by the original trial. This analysis illustrates how cardiology researchers are using open data, supported by cardiology initiatives that encourage data sharing and reuse10. Clinical trial data sharing in cardiology has also been used to test the reproducibility of published results11. In our study, cardiology-related publications outnumbered those of any other specialty among publications using open access data.
Since 2003, the National Institutes of Health (NIH) has required that data collected by studies receiving more than $500,000 be stored in a publicly available repository, with BioLINCC being the main repository for research funded by the NHLBI12. This might explain the high impact of studies that used BioLINCC data. In contrast, data shared through platforms other than BioLINCC may lack sufficient description, which can hamper their use by other researchers13. Moreover, repositories should focus on facilitating access to their data and raising awareness of it, so that more researchers can use them10,11. Our results are based on the BioLINCC repository, where data from well-funded research projects undergo extensive processing before being publicly shared, resulting in well-curated, high-quality data. Further studies are needed to validate our results by evaluating data repositories that lack such pre-sharing processing.
Harvard Dataverse: Publications that used Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) datasets7. https://doi.org/10.7910/DVN/1TXA3C
This project contains the following underlying data:
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
No
Competing Interests: Drs. Vorland and Brown have received research funds from the Center for Open Science.
Reviewer Expertise: Meta-research
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: data science, data sharing and reuse
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
References
1. Ross JS, Ritchie JD, Finn E, Desai NR, et al.: Data sharing through an NIH central database repository: a cross-sectional survey of BioLINCC users. BMJ Open. 2016; 6(9): e012769.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: clinical research, medical informatics
Article versions
Version | Date |
---|---|
Version 4 (revision) | 18 Aug 21 |
Version 3 (revision) | 21 Apr 21 |
Version 2 (revision) | 28 Sep 20 |
Version 1 | 20 Jan 20 |