Keywords
Altmetrics, bibliometrics, publication impact
This article is included in the Research on Research, Policy & Culture gateway.
This article is included in the Interactive Figures collection.
This revision addresses the reviewer comments received. It includes an expanded introduction, including a new figure to provide background to the EMPIRE Index. We provide greater clarity of the objectives and purpose of this analysis, including why we have used CiteScore and Google Trends data, and also expanded the discussion to provide more context for the interpretation of the results. We have added information on data extraction from Altmetric Explorer and PlumX. We have also added information on the definition of the publication types (these are defined by PubMed) and have changed “altmetrics” to the more general term “article-level metrics” throughout.
See the authors' detailed response to the review by Shir Aviv-Reuven
See the authors' detailed response to the review by José Luis Ortega
See the authors' detailed response to the review by Arman Yurisaldi Saleh
Article-level measures of publication impact (ALMs, which include alternative metrics or altmetrics) can help to characterise the impact of a publication among different audiences and in different contexts. Although the journal impact factor (JIF) and related scores such as CiteScore may help to identify journals with a high readership, they are widely recognised as being a poor indicator of the quality or impact of individual research articles1,2. We have previously described a novel approach to summarising ALMs, the EMPIRE (EMpirical Publication Impact and Reach Evaluation) Index, which uses article-level metrics to assess the impact of medical publications in terms relevant to different stakeholders3. The EMPIRE Index provides component scores for scholarly, social and societal impact, as well as a total impact score and predictive reach metrics (Figure 1). The scholarly component correlates weakly with CiteScore, while the social score correlates closely with the Altmetric Attention Score, an altmetric-based score that is mostly related to citations in news and social media. The societal component, comprising citations in guidelines, policy documents and patents, represents a distinct score.
HCP, healthcare provider; NEJM, New England Journal of Medicine.
The EMPIRE Index uses randomized controlled clinical trials published in the NEJM as a benchmark; these are studies selected by the journal editors as likely to be of the highest impact for the practice of medicine. However, this standard cannot be uniformly applied to all types of studies across different disease areas and stages of development of a therapy. It is widely recognised that publication metrics vary by discipline, and to facilitate the comparison of publication impact across different fields, field-normalised citation impacts are frequently calculated4. Although evidence is limited, some research suggests that ALMs can also vary by publication type. For example, review articles in pharmacology journals can receive twice as many citations as original articles5.
Therefore, the nuances of each field and publication type should be considered in order to understand what constitutes a typical score. This additional context is an important consideration as it allows users to compare scores with a relevant frame of reference and so to accurately interpret and utilise publication metrics and to derive meaningful insights.
It is likely that EMPIRE Index scores vary by disease area and by publication type. However, the scale and nature of these variations are unknown, which complicates efforts to compare scores of individual publications. This in turn limits the utility of the scale to identify ‘high impact’ publications, since what counts as an atypically high score will vary across therapy areas and publication types.
In this brief report we present an exploratory investigation of how disease indication and publication type relate to typical EMPIRE Index scores. We sought to explore the variations that may exist in typical (i.e. mean and median) EMPIRE Index scores across different therapy areas and publication types, and to explore the magnitude of these variations. This will allow the metrics of individual publications to be placed in a richer context of potentially comparable publications, facilitating interpretation of individual publication scores.
We also sought to provide additional context for the observed typical scores in two ways. First, we examined whether similar variations exist in journal CiteScore, to assess whether the observed variations could be expected from differences in this journal-level indicator. Second, we examined whether similar variations exist in public interest in the therapy areas explored, with particular interest in comparing this with the EMPIRE Index social score. For this we examined Google search trends, which have previously been used as a population-level measure of interest in health topics6.
This exploratory study investigated 12 disease indications, purposefully chosen to reflect a variety of common and rare diseases with a variety of aetiologies. Six of these were rare diseases, selected as a convenience sample of disease indications with which the authors were most familiar. No formal statistical power analysis was undertaken. However, we aimed for disease samples of approximately 1000 publications, which would enable publication type sub-analyses. The six rare disease samples were, therefore, pooled.
Relevant publications were identified for each disease by the appearance of the disease name in the publication title. We limited the search period to items with publication dates between 1 May 2017 and 1 May 2018, to give sufficient time for metrics to accumulate while also minimising the time-dependent variation in metrics.
The searches were conducted on PubMed between 22 June 2020 and 3 July 2020, using the following search string:
For each disease, we conducted secondary searches for each publication type using PubMed tags for those of interest (i.e. the search string above and either “review”, "systematic review", "clinical trial, phase iii", “clinical trial” or "observational study"). PubMed publication types are metadata supplied by PubMed and derive originally from publisher submissions7. PubMed IDs were entered into the Altmetric Explorer and PlumX dashboards and ALMs downloaded over the period 23 June 2020 to 11 July 2020. ALMs were assumed to be zero for any publication for which Altmetric Explorer did not return a result. We also obtained the journal CiteScore for all publications8.
EMPIRE Index scores were calculated for all publications as described previously3. Briefly, selected ALMs that compose the EMPIRE Index were weighted and aggregated to form three component scores (social impact, scholarly impact and societal impact), which were then summed to form a total impact score.
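The weighting-and-aggregation step just described can be sketched as follows. The metric names and weights below are hypothetical placeholders, not the published EMPIRE Index weights (those are given in reference 3); the sketch only illustrates the shape of the calculation.

```python
# Illustrative sketch of weighting ALMs into component scores and a total.
# All metric names and weights are hypothetical placeholders; the published
# EMPIRE Index weights are defined in reference 3.
WEIGHTS = {
    "social": {"news_mentions": 1.0, "tweets": 0.1},
    "scholarly": {"citations": 0.5, "mendeley_readers": 0.05},
    "societal": {"guideline_citations": 5.0, "policy_citations": 2.0},
}

def empire_scores(metrics):
    """Weight and sum ALMs into three component scores, then total them.

    Metrics absent from a publication's record are treated as zero, as in
    the authors' handling of publications missing from Altmetric Explorer.
    """
    scores = {
        component: sum(w * metrics.get(name, 0) for name, w in weights.items())
        for component, weights in WEIGHTS.items()
    }
    scores["total"] = sum(scores.values())
    return scores

# Example: a publication with 10 citations, 20 tweets and 2 news mentions
print(empire_scores({"citations": 10, "tweets": 20, "news_mentions": 2}))
```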
Each disease area comprised a different mixture of publication types, which we expected could confound the analysis; multivariate analysis on such a heterogeneous, non-normal and zero-inflated data set is problematic. Therefore, we opted to create standardised samples through random sampling.
A sample was created for each disease area with a standardised mix of publication types chosen to maximise the total number of publications retained (the standardised publication types [SPT] set). First, the two least common publication types (phase 3 clinical trials and systematic reviews) were excluded because of the high variation between disease areas and because they are largely subsets of other publication types (clinical trials and reviews, respectively). Although the observational studies publication type was only slightly more common than systematic reviews, it was retained as it was considered to be functionally very different from clinical trials and reviews. The proportions of each of the remaining three publication types were calculated for each disease set, as well as for the overall set. Publications were then trimmed from each disease set by random sampling, as needed, to match the proportions in the overall set. The trimmed publication sets formed the SPT set.
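The proportional trimming described above can be sketched as below: each disease set is randomly downsampled so that its mix of publication types matches the overall proportions, while retaining as many publications as possible. The data shapes and numbers are illustrative, not the authors' actual pipeline.

```python
import random

def trim_to_proportions(pubs_by_type, target_props, seed=0):
    """Randomly downsample so publication-type proportions match targets.

    pubs_by_type: {pub_type: [pub_ids]}; target_props: {pub_type: fraction}.
    Illustrative sketch only -- not the authors' actual code.
    """
    rng = random.Random(seed)
    # Largest total sample achievable without needing more publications of
    # any single type than are actually available
    n_total = min(len(pubs) / target_props[t] for t, pubs in pubs_by_type.items())
    return {
        t: rng.sample(pubs, int(n_total * target_props[t]))
        for t, pubs in pubs_by_type.items()
    }

# Hypothetical disease set and overall publication-type mix
disease_set = {
    "clinical trial": list(range(60)),
    "review": list(range(30)),
    "observational study": list(range(10)),
}
overall_mix = {"clinical trial": 0.5, "review": 0.4, "observational study": 0.1}
trimmed = trim_to_proportions(disease_set, overall_mix)
```

Here the review category is the binding constraint (30 publications at a target share of 0.4 caps the total at 75), so the other categories are trimmed to match.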
Similarly, each publication type comprised a different mix of diseases. A standardised disease areas (SDA) set was created by random sampling using a similar approach that ensured each publication type included the same mix of diseases, while maximising the total number of publications retained.
We downloaded weekly Google Trends data on relative interest over time for these diseases for the period of interest (1 May 2017 to 1 May 2018). A score of 100 indicates the maximum interest in any week over the search period and across any of the search terms of interest. The year averages presented here are expressed relative to that maximum score.
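The scaling described above can be sketched as follows: weekly values for all search terms share a single 0–100 scale (100 = the busiest week for any term), and each term's year average is reported on that same scale. The input data below are invented for illustration.

```python
# Sketch of the shared-scale year averaging described above; the weekly
# series here are invented for illustration.
def relative_interest(weekly_by_term):
    """Return each term's year-average interest relative to the peak week."""
    peak = max(v for series in weekly_by_term.values() for v in series)
    return {
        term: 100 * sum(series) / (peak * len(series))
        for term, series in weekly_by_term.items()
    }

print(relative_interest({"migraine": [40, 40], "NSCLC": [10, 20]}))
```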
As these analyses were exploratory, we primarily provide descriptive statistics, and only minimal statistical analysis was undertaken. Intra-group differences were assessed using the Kruskal–Wallis one-way analysis of variance, a non-parametric test for equality of population medians (a significant result indicates that the population median of at least one group differs from that of at least one other group).
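For reference, the Kruskal–Wallis H statistic can be computed from pooled ranks as in the following pure-Python sketch (average ranks for ties, no tie-correction factor). This is illustrative only; the software the authors used for the test is not specified here.

```python
# Illustrative Kruskal-Wallis H statistic: rank all values jointly, then
# compare per-group rank sums. Each group would be, e.g., the EMPIRE Index
# scores for one publication type.
def kruskal_h(*groups):
    pooled = sorted((value, gi) for gi, group in enumerate(groups)
                    for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # extend over a run of tied values
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    # Standard H formula: 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)
    return 12 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
```

A large H, judged against a chi-squared distribution with (number of groups − 1) degrees of freedom, indicates that at least one group's distribution is shifted relative to another.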
In total, 20 577 publications were identified across the 12 disease areas9, of which 5825 (28%) were tagged with one of the publication types of interest (Table 1). Table 1 also shows the Google search interest for each of these diseases.
Google search interest is the average weekly interest across the search period, and is a relative score 0–100 where 100 is the maximum score for any disease in any individual week.
The numbers of publications retained in the SDA set used for publication type comparisons (i.e. with the same disease indication composition for each publication type) are shown in Table 2.
Median EMPIRE Index scores and CiteScores for each publication type in the SDA set are shown in Figure 2 and Table 3. Mean EMPIRE Index scores, shown in Figure 3, broadly reflect the median scores. Statistical analysis indicated significant variation in the medians of each component score, the total impact score and journal CiteScore. In general, the ranking of publication types is relatively consistent across different types of impact. Notably, phase 3 clinical trials had the highest median and mean scores, while observational studies had the lowest. Systematic reviews had higher impact than reviews. Most articles across all publication types had no societal impact, and significant differences in societal impact were driven by outliers. Of note, eight of the ten publications with the highest societal impact were clinical trials, and six of those were in non-small cell lung cancer (NSCLC).
CI, confidence interval.
The interactive version (online only, accessible here: https://s3.eu-west-2.amazonaws.com/ox.em/webflow/p29ieu21/chart1.html) also shows mean EMPIRE Index scores for each disease by publication type (full set).
The numbers of publications retained in the SPT set used for disease comparisons (i.e. with the same publication type composition for each disease indication) are shown in Table 4.
Median EMPIRE Index scores and journal CiteScores for each disease in the SPT set are shown in Figure 4 and Table 5. Kruskal–Wallis testing indicated at least one significant pairwise difference in the total scores, each component score and journal CiteScore. Migraine and multiple sclerosis (MS) had the highest impact across social and scholarly component scores as well as the total impact score, while NSCLC and psoriasis had the lowest. Most articles across all diseases had no societal impact, with significant differences in societal impact driven by outliers. The eight publications with the highest societal impact were all important clinical outcomes trials (three in type 2 diabetes, three in NSCLC and one each in migraine and asthma).
MS, multiple sclerosis; NSCLC, non-small cell lung cancer; T2D, type 2 diabetes.
Mean EMPIRE Index scores for each disease in the SPT set are shown in Figure 5. The interactive version of Figure 5 (online publication only) also shows the mean EMPIRE Index scores by disease for each publication type (full data set). Mean scores do not show clear trends for differences between disease indications, although societal impact appears to be lower for asthma and MS, and higher for migraine than other diseases. The high societal impact for migraine was driven by review articles; 16 of the 23 migraine articles with societal impact scores above zero were review articles. The scholarly impact for rare diseases appears to be higher than for other disease areas, albeit with low confidence owing to small numbers of publications included.
The interactive version (online only, accessible here: https://s3.eu-west-2.amazonaws.com/ox.em/webflow/p29ieu21/chart2.html) also shows mean EMPIRE Index scores for each disease by publication type (full set). MS, multiple sclerosis; NSCLC, non-small cell lung cancer; T2D, type 2 diabetes.
This analysis found that typical EMPIRE Index scores vary across both disease indications and publication types. These results provide valuable contextual information for interpreting the EMPIRE Index scores of individual publications, and publication metrics more generally. For example, these findings can help to determine whether a particular publication has notably high (or low) metrics.
We found considerable differences between disease areas, which broadly reflected public interest in the disease as assessed through Google search interest. Google search interest reflects the volume of searches conducted on Google by the general public, and can be taken as an indication of the number of people actively seeking information on different topics. We found that the three diseases with the highest median EMPIRE Index scores, especially social impact, were migraine, MS and asthma; these also had the highest public interest. These differences were not observed in journal CiteScores, meaning that the disease areas with higher EMPIRE Index impact were not necessarily published in ‘high impact’ journals. NSCLC had low public interest (‘lung cancer’ as a general term was higher, but still lower than any of the other five major disease areas examined). Publications in NSCLC also had low median total impact scores, particularly social impact scores, despite being published in journals with higher median CiteScores. Overall, this suggests that publications in some disease areas attract higher social impact scores (driven by citations in news articles and social media) because those disease areas are of greater interest to the general public.
Although this suggests distinct differences between diseases in terms of publication impact, it should be noted that the period of interest was only a single year. The findings could therefore have been influenced by the completion of important clinical studies, which can vary from year to year across disease areas.
A clear picture emerged for publication types, with phase 3 trials demonstrating much higher metrics than other types. Phase 3 clinical trials are the last stage of clinical research of new drug treatments, and are typically large scale and provide evidence intended to guide treatment practice10. The higher impact observed for this publication type likely relates to higher public interest, higher scholarly interest, and a greater likelihood of citations in guidelines and policy documents. Systematic reviews had higher impact than general reviews; interestingly, this was despite being published in journals with similar median CiteScores. This likely reflects that the methodological rigour of systematic reviews in synthesising the literature makes them more impactful. Observational studies had the lowest impact, suggesting that observational analyses are still generally regarded as being of lower interest.
In general, across both publication types and disease indications, median scores were higher for scholarly impact than for social or societal impact, while mean and maximal scores were broadly similar (or lower). This suggests that score distribution is more skewed for social and societal impact, with many papers generating little interest despite some scholarly impact.
A key strength of this study is the use of an automated approach to identify a large pool of publications for analysis. However, the automated process depends on the reliability of the underlying data. For example, disease areas were identified through a PubMed search on article titles, which may have excluded some relevant articles or included irrelevant ones. The PubMed search engine uses automatic term mapping, which usually makes the search more inclusive but can introduce inconsistencies11. Publication types were identified by metadata tags, but these are sometimes inconsistently applied or missing. Tags can also overlap; for example, some phase 3 clinical trial publications in our sample were also classified as clinical trials.
In conclusion, the EMPIRE Index successfully identified differences in impact by disease indication and publication type. This supports the notion that there is no universal gold standard metric for publications, and instead the impact of each publication needs to be evaluated in the context of the type of publication, disease area and potentially other factors. These findings should be considered when using the EMPIRE Index to assess publication impact.
Figshare: EMPIRE Index disease and publication type analysis. https://doi.org/10.6084/m9.figshare.17072435.v19
This project contains the following underlying data:
SMA metrics unlinked 11Jul20.xlsx
Psoriasis metrics unlinked 11Jul20.xlsx
NSCLC metrics unlinked 5Jul20.xlsx
NET metrics unlinked 11Jul20.xlsx
NASH metrics unlinked 11Jul20.xlsx
MS metrics unlinked 5Jul20.xlsx
Migraine metrics unlinked 5Jul20.xlsx
Google search interest (30Jul21).xlsx
DLBCL metrics unlinked 11Jul20.xlsx
Asthma metrics unlinked 5Jul20.xlsx
TSC metrics unlinked 11Jul20.xlsx
TNBC metrics unlinked 11Jul20.xlsx
T2DM metrics unlinked 5Jul20.xlsx
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bibliometrics, Scientometrics, academic search, journals impact
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bibliometrics, altmetrics, academic search engines, scholarly social networks
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bibliometrics, altmetrics, academic search engines, scholarly social networks
Version history:
Version 5 (revision): 30 Oct 24
Version 4 (revision): 16 Sep 24
Version 3 (revision): 10 Mar 23
Version 2 (revision): 12 Apr 22
Version 1: 27 Jan 22