Research Article

The impact of the National Heart, Lung, and Blood Institute data: analyzing published articles that used BioLINCC open access data

[version 1; peer review: 1 approved with reservations, 2 not approved]
PUBLISHED 20 Jan 2020

This article is included in the Research on Research, Policy & Culture gateway.

Abstract

Background: Data sharing is now a mandatory prerequisite for several major funders and journals, where researchers are obligated to deposit the data resulting from their studies in an openly accessible repository. Biomedical open data are now widely available in almost all disciplines, where researchers can freely access and reuse these data in new studies. We aim to assess the impact of open data in terms of publications generated using open data and citations received by these publications, where we will analyze publications that used the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) as an example.
Methods: As of July 2019, a total of 194 datasets were stored in the BioLINCC repository and accessible through its portal. We requested from BioLINCC the full list of publications that used these datasets, and we also performed a supplementary PubMed search for other publications. We used Web of Science (WoS) to analyze the characteristics of the publications and the citations they received.
Results: A total of 1,086 published articles used data from the BioLINCC repository, of which 987 (90.88%) were WoS indexed. The number of publications has steadily increased since 2002 and peaked in 2018, with 138 publications in that year. The 987 open data publications received a total of 34,181 citations up to 1st October 2019. The average number of citations per item for the open data publications was 34.63. The total number of citations received by open data publications per year increased from only 2 citations in 2002 to a peak of 2,361 citations in 2018.
Conclusion: The vast majority of studies that used BioLINCC open data were published in WoS indexed journals and are receiving an increasing number of citations.

Keywords

Open Data, Publications, National Institutes of Health, Bibliometrics

Introduction

Recent years have seen an increased call for data sharing in clinical studies, especially for research funded by international and governmental agencies1. The call originally aimed to maximize transparency for clinical trial results1, but the benefits of data sharing extend beyond this original aim. Open access data are frequently cited as a boon for researchers, who can re-analyze already collected data to answer a new research question2,3. To organize and maximize the scientific use of open access data, researchers and funders store their data in open access data repositories4. The Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) of the National Heart, Lung, and Blood Institute is one such data repository, initiated in 2000 with the aim of sharing data from observational and interventional studies supported by the institute5. The impact of open access data, in terms of publications generated and citations received, is still unknown. In this study, we aim to analyze the number of publications that used BioLINCC open access data, and the impact of these publications through the citations they received.

Methods

Data collection

A total of 205 studies are listed on the BioLINCC data repository. Of these, four have their data stored in other repositories, and seven have only specimens, available from BioLINCC upon request, with no associated datasets. We included only datasets that are stored in the BioLINCC repository and can be accessed through its portal, which left 194 datasets.

We also contacted BioLINCC support to obtain an up-to-date list of published articles that used BioLINCC datasets, and we received a list of all publications up to 24th July 2019. Researchers accessing the BioLINCC datasets are requested to disclose any publication resulting from their use. BioLINCC also lists published articles that used its datasets on its website (https://biolincc.nhlbi.nih.gov/publications/). A manual search of PubMed was also carried out to confirm an updated full list of publications. We used the basic search of PubMed, entering the title of each dataset in the search field. Any study that reported the use of the searched dataset as part of its results was included in our analysis. The included articles either used data stored in the BioLINCC repository alone, or used these datasets along with datasets from other repositories.
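
The title-based PubMed lookup described above could also be scripted. The following is a minimal sketch, not the authors' procedure (their search was manual): it only builds a query URL for the NCBI E-utilities ESearch endpoint and does not send any request. The function name build_pubmed_search_url and the retmax value are illustrative assumptions, and the Digitalis Investigation Group title is used only because that trial is mentioned elsewhere in this article.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (PubMed is selected with db=pubmed).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(dataset_title: str, retmax: int = 200) -> str:
    """Return an ESearch URL that looks up a dataset title in PubMed.

    Hypothetical helper: the name and the default retmax are assumptions
    made for illustration, not part of the published method.
    """
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": dataset_title,  # dataset title, as typed into the basic search
        "retmode": "json",      # ask for a JSON response
        "retmax": retmax,       # maximum number of PubMed IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


example_url = build_pubmed_search_url("Digitalis Investigation Group")
print(example_url)
```

Any hits returned by such a query would still need manual screening, because only studies that reported using the dataset as part of their results were included.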

Bibliometric analysis

We used the Web of Science (WoS) database to analyze the characteristics of the included publications. We prepared a list of digital object identifiers (DOIs) for the included articles and entered it into the WoS advanced search field; only the WoS indexed publications among the included articles were analyzed further. The WoS database has a built-in analysis tool that reports the number of publications using the included datasets per year (yearly publications), the topic of publication, the affiliation of authors, and the number of citations received6.
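
As an illustration of the DOI-list step, the sketch below turns a list of DOIs into advanced-search strings. It assumes that the WoS advanced search accepts the DOI field tag DO= with terms joined by OR; the exact syntax the authors used is not stated. The function name, the chunk size of 50, and the 10.1000/example DOIs are hypothetical.

```python
def build_wos_doi_queries(dois, chunk_size=50):
    """Build advanced-search strings of the form DO=("a" OR "b" ...).

    Assumption: DO= is the DOI field tag. The chunk size is arbitrary and
    only keeps each query string reasonably short.
    """
    cleaned = []
    seen = set()
    for doi in dois:
        d = doi.strip().lower()  # DOIs are case-insensitive
        # Remove common resolver prefixes so that only the bare DOI remains.
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if d.startswith(prefix):
                d = d[len(prefix):]
        # Skip blank entries and duplicates.
        if d and d not in seen:
            seen.add(d)
            cleaned.append(d)

    queries = []
    for start in range(0, len(cleaned), chunk_size):
        chunk = cleaned[start:start + chunk_size]
        joined = " OR ".join(f'"{d}"' for d in chunk)
        queries.append(f"DO=({joined})")
    return queries


example_queries = build_wos_doi_queries(
    [
        "https://doi.org/10.12688/f1000research.21884.1",  # this article's DOI
        "10.1000/example.2",  # hypothetical DOI
        "10.1000/EXAMPLE.2",  # duplicate of the previous entry, different case
    ]
)
print(example_queries)
```

Articles that return no record for their DOI would be the ones treated as not WoS indexed.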

Results

A total of 1,086 published articles used data from the BioLINCC repository, of which 987 (90.88%) were WoS indexed. All articles were published in English (see underlying data7). The first publication using BioLINCC open data appeared in 2002. Since then, the number of publications has steadily increased, as shown in Figure 1, and peaked in 2018 with a total of 138 publications.


Figure 1. Number of publications that used Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) open data since 2002.

The 987 open data publications received a total of 34,181 citations from 27,904 published articles up to 1st October 2019. The average number of citations per item for the publications using BioLINCC data was 34.63. The total number of citations received per year by publications using BioLINCC data increased from only 2 citations in 2002 to a peak of 2,361 citations in 2018 (Figure 2).
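
The two headline ratios follow directly from the counts reported above. The short sketch below recomputes them; the variable names are illustrative, and the input numbers are the ones stated in the text.

```python
# Counts reported in the Results.
total_articles = 1086    # articles that used BioLINCC data
wos_indexed = 987        # of those, articles indexed in WoS
total_citations = 34181  # citations received by the WoS indexed articles

# Share of articles that were WoS indexed, in percent.
indexed_share = round(100 * wos_indexed / total_articles, 2)

# Average number of citations per WoS indexed article.
citations_per_item = round(total_citations / wos_indexed, 2)

print(indexed_share, citations_per_item)
```

Both values agree with the figures quoted in the text (90.88% and 34.63).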


Figure 2. The total number of citations received by open data publications per year.

A total of 352 (35.66%) of the published articles were related to cardiac and cardiovascular systems, 106 (10.74%) to general internal medicine, and 92 (9.32%) to public and occupational health. Figure 3 shows the 10 most common fields in which the publications using BioLINCC data were published. The American Journal of Cardiology had the highest number of publications using BioLINCC data (60; 6.08%), followed by the International Journal of Cardiology (47; 4.76%) and the American Journal of Medicine (25; 2.53%). Table 1 shows the top 10 journals in which publications using BioLINCC data were published. US authors participated in 842 (85.31%) of the publications using BioLINCC data, followed by authors from Canada and England, with 121 (12.26%) and 81 (8.21%), respectively (Figure 4). The top three affiliations in terms of publications using BioLINCC data were the University of Alabama system, the University of Alabama at Birmingham, and the University of California system, as shown in Table 2.
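
The country percentages are shares of the 987 WoS indexed articles, and they are not mutually exclusive because an article can have authors from several countries. The sketch below, using only the counts reported above and illustrative variable names, recomputes the shares and shows that the three country counts add up to more than 987.

```python
wos_indexed = 987

# Articles with at least one author from each country, as reported above.
country_counts = {"USA": 842, "Canada": 121, "England": 81}

# Percentage of WoS indexed articles per country.
shares = {
    country: round(100 * count / wos_indexed, 2)
    for country, count in country_counts.items()
}

# Country attributions in excess of one per article; a positive value
# means some articles are counted under more than one country.
surplus_attributions = sum(country_counts.values()) - wos_indexed

print(shares)
print(surplus_attributions)
```

A positive surplus confirms that the percentages in this paragraph should not be summed as if they were parts of a whole.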


Figure 3. The 10 most common fields in which the studied open data articles were published.

Table 1. Top 10 journals publishing articles that used Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) open data, with their respective impact factors according to the 2018 Journal Citation Reports.

Journal                                          Impact factor   Articles (%)
AMERICAN JOURNAL OF CARDIOLOGY                   2.843           60 (6.08%)
INTERNATIONAL JOURNAL OF CARDIOLOGY              3.471           47 (4.76%)
AMERICAN JOURNAL OF MEDICINE                     4.760           25 (2.53%)
EUROPEAN JOURNAL OF HEART FAILURE                12.129          22 (2.23%)
HYPERTENSION                                     7.017           22 (2.23%)
PLOS ONE                                         2.776           21 (2.13%)
CIRCULATION                                      23.054          18 (1.82%)
JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY    18.639          18 (1.82%)
JOURNAL OF CARDIAC FAILURE                       3.967           16 (1.62%)
EUROPEAN HEART JOURNAL                           24.889          15 (1.52%)

Figure 4. The top countries of authors who published using Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) open data.

Table 2. The top affiliations in terms of publications using Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) open data.

Organization                                                  Articles   Percentage
UNIVERSITY OF ALABAMA BIRMINGHAM                              120        12.158%
UNIVERSITY OF ALABAMA SYSTEM                                  120        12.158%
UNIVERSITY OF CALIFORNIA SYSTEM                               109        11.044%
HARVARD UNIVERSITY                                            105        10.638%
UNIVERSITY OF CALIFORNIA SAN FRANCISCO                        57         5.775%
CASE WESTERN RESERVE UNIVERSITY                               55         5.572%
VETERANS HEALTH ADMINISTRATION VHA                            54         5.471%
UNIVERSITY OF CALIFORNIA LOS ANGELES                          53         5.370%
UNIVERSITY OF TEXAS SYSTEM                                    52         5.268%
PENNSYLVANIA COMMONWEALTH SYSTEM OF HIGHER EDUCATION PCSHE    51         5.167%

Discussion

BioLINCC has made a tremendous effort since its establishment to prepare datasets for use as open data, and hundreds of studies have been published using BioLINCC open data6. The impact of these publications can be measured in terms of citations received: yearly citations of publications using BioLINCC data rose from 2 in 2002 to 2,361 in 2018. Cardiology is the main field, with more than a third of publications being cardiology related, and the top two journals publishing articles using BioLINCC data are also cardiology journals.

In an analysis done in 2017, Coady and colleagues examined the administrative records of investigator requests for BioLINCC data and found that 35% of clinical trial data were associated with at least one publication within five years of public release8. Although we previously pointed to the importance of open access data for underfunded researchers2, our results showed that the top three countries using open access data were the USA, Canada, and England. Researchers new to open data might be skeptical about the opportunities to publish studies performed using open data. In our analysis, the top 10 journals publishing open data studies, which together accounted for around 27% of the studied publications, each had an impact factor of more than two. Regarding the clinical impact of publications using open data, one example is the post-hoc analysis of the Digitalis Investigation Group trial, performed using the open data of the original trial9, which showed that digoxin therapy was associated with an increased risk of death from any cause among women, but not men, a finding that the original study had not identified. The Digitalis trial is an example of how cardiology researchers are using open data, supported by cardiology initiatives that encourage data sharing and reuse10. Clinical trial data sharing in cardiology has also been used to validate the reproducibility of published results11. In our study, we found a higher number of cardiology-related publications using open access data compared with other specialties.
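
The share of roughly 27% and the impact factor threshold mentioned above can be checked against Table 1. The sketch below assumes that each fused cell in the table is read as a three-decimal impact factor followed by the article count (for example, 2.843 and 60 for the American Journal of Cardiology); the variable names are illustrative.

```python
wos_indexed = 987

# (journal, 2018 impact factor, number of articles), as read from Table 1.
table1 = [
    ("American Journal of Cardiology", 2.843, 60),
    ("International Journal of Cardiology", 3.471, 47),
    ("American Journal of Medicine", 4.760, 25),
    ("European Journal of Heart Failure", 12.129, 22),
    ("Hypertension", 7.017, 22),
    ("PLOS ONE", 2.776, 21),
    ("Circulation", 23.054, 18),
    ("Journal of the American College of Cardiology", 18.639, 18),
    ("Journal of Cardiac Failure", 3.967, 16),
    ("European Heart Journal", 24.889, 15),
]

# Articles published in the ten journals combined.
top10_articles = sum(count for _, _, count in table1)

# Their share of all WoS indexed articles, in percent.
top10_share = round(100 * top10_articles / wos_indexed, 2)

# Lowest impact factor among the ten journals.
lowest_impact_factor = min(factor for _, factor, _ in table1)

print(top10_articles, top10_share, lowest_impact_factor)
```

Under that reading of the table, the ten journals account for 264 of the 987 articles, and the lowest impact factor among them is above two.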

Since 2003, the National Institutes of Health has mandated that data collected by studies receiving more than $500,000 be stored in a publicly available repository, with BioLINCC being the main repository for NHLBI-funded research12. This might explain the high impact of studies resulting from the BioLINCC stored data. On the other hand, data shared through platforms other than BioLINCC may lack a sufficient description of the shared data, which hampers their use by other researchers13. Moreover, repositories should focus on facilitating access to data and raising awareness of them, so that more researchers can use the data from these repositories10,11. Our results are based on the BioLINCC repository, where data from well-funded research projects undergo extensive processing before being publicly shared, resulting in well-curated, high-quality data. Further studies that evaluate data repositories without such pre-sharing processing are needed to validate our results.

Data availability

Underlying data

Harvard Dataverse: Publications that used Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) datasets. https://doi.org/10.7910/DVN/1TXA3C (reference 7)

This project contains the following underlying data:

  • BioLINCC Dataset.tab (Spreadsheet containing details of publications using BioLINCC datasets)

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

How to cite this article
AlRyalat SA, El Khatib O, Al-qawasmi O et al. The impact of the National Heart, Lung, and Blood Institute data: analyzing published articles that used BioLINCC open access data [version 1; peer review: 1 approved with reservations, 2 not approved]. F1000Research 2020, 9:30 (https://doi.org/10.12688/f1000research.21884.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 25 Sep 2020
Colby Vorland, Department of Applied Health Science, Indiana University School of Public Health-Bloomington, Bloomington, IN, USA 
Andrew Brown, Department of Applied Health Science, Indiana University School of Public Health-Bloomington, Bloomington, IN, USA 
Not Approved
Summary:
The authors ask an interesting question as to what the impact of BioLINCC has been on the use of open data. However, the assessments of impact do not seem to appropriately contextualize the use of BioLINCC datasets as ...
HOW TO CITE THIS REPORT
Vorland C and Brown A. Reviewer Report For: The impact of the National Heart, Lung, and Blood Institute data: analyzing published articles that used BioLINCC open access data [version 1; peer review: 1 approved with reservations, 2 not approved]. F1000Research 2020, 9:30 (https://doi.org/10.5256/f1000research.24126.r70340)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 21 Apr 2021
    Saif Aldeen AlRyalat, Department of Ophthalmology, University of Jordan Hospital, The University of Jordan, Amman, 11942, Jordan
    21 Apr 2021
    Author Response
    I went through the manuscript and amended and responded to all comments. Here are the responses.

    Reviewer Colby Vorland and Andrew Brown



    It is an honor ...
Reviewer Report 08 Sep 2020
Lisa Federer, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA 
Not Approved
The authors have addressed an interesting question - what are the impacts of open data, specifically considering citations received by publications using open data sets. This question is very timely given the increasing number of funder and journal requirements that ...
HOW TO CITE THIS REPORT
Federer L. Reviewer Report For: The impact of the National Heart, Lung, and Blood Institute data: analyzing published articles that used BioLINCC open access data [version 1; peer review: 1 approved with reservations, 2 not approved]. F1000Research 2020, 9:30 (https://doi.org/10.5256/f1000research.24126.r70339)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 28 Sep 2020
    Saif Aldeen AlRyalat, Department of Ophthalmology, University of Jordan Hospital, The University of Jordan, Amman, 11942, Jordan
    28 Sep 2020
    Author Response
    We would like to thank Dr. Federer, who is an expert in the field of data science, for the insight and thoughts she shared through her revision. While we agree with her ...
Reviewer Report 10 Aug 2020
Christian Ohmann, European Clinical Research Infrastructure Network, ECRIN, Düsseldorf, Nordrhine-Westfalia, Germany 
Approved with Reservations
This is an interesting paper about the impact of data sharing for a non-commercial repository (BioLINCC). From the viewpoint of the reviewer, the manuscript should be improved:

In the section “data collection” the authors describe different data ...
HOW TO CITE THIS REPORT
Ohmann C. Reviewer Report For: The impact of the National Heart, Lung, and Blood Institute data: analyzing published articles that used BioLINCC open access data [version 1; peer review: 1 approved with reservations, 2 not approved]. F1000Research 2020, 9:30 (https://doi.org/10.5256/f1000research.24126.r68865)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 28 Sep 2020
    Saif Aldeen AlRyalat, Department of Ophthalmology, University of Jordan Hospital, The University of Jordan, Amman, 11942, Jordan
    28 Sep 2020
    Author Response
    It is an honor to receive feedback from Professor Ohmann. We performed almost all the changes suggested, and we hope the current version satisfies the quality required. Here are the ...
  • Author Response 21 Apr 2021
    Saif Aldeen AlRyalat, Department of Ophthalmology, University of Jordan Hospital, The University of Jordan, Amman, 11942, Jordan
    21 Apr 2021
    Author Response
    Dear Professor Ohmann,

    We hope our responses satisfy your comments, if so, we hope to receive your feedback. 

    Thank you for your time.

    Sincerely,
    Saif Aldeen AlRyalat, MD.
    ...
