Does the impact of medical publications vary by disease indication and publication type? An exploration using a novel, value-based, publication metric framework: the EMPIRE Index [version 1; peer review: 1 not approved]

Background: The EMPIRE (EMpirical Publication Impact and Reach Evaluation) Index is a value-based, multi-component metric framework to assess the impact of medical publications in terms of relevance to different stakeholders. It comprises three component scores (social, scholarly and societal impact), each incorporating related altmetrics that indicate a different aspect of engagement with the publication. Here, we present an exploratory investigation of whether publication types or disease indications influence EMPIRE Index scores.

Methods: Article-level metrics were extracted and EMPIRE Index scores were calculated for 5825 journal articles published from 1 May 2017 to 1 May 2018, representing 12 disease indications (chosen to reflect a wide variety of common and rare diseases with a variety of aetiologies) and five publication types.

Results: There were significant differences in scores between article types and disease indications. Median (95% CI) social and scholarly impact scores were 1.2 (0.3–1.6) and 4.8 (3.1–6.6), respectively, for phase 3 clinical trials, and 0.3 (0.3–0.4) and 2.3 (1.9–2.6), respectively, for observational studies. Social and scholarly impact scores were highest for multiple sclerosis publications and lowest for non-small cell lung cancer publications. Systematic reviews achieved greater impact than regular reviews. Median trends in the social impact of different disease areas matched the level of public interest as assessed through Google search interest. Although most articles did not register societal impact, mean societal impact scores were highest for migraine publications.

Conclusions: The EMPIRE Index successfully identified differences in impact by disease area and publication type, which supports the notion that the impact of each publication needs to be evaluated in the context of its publication type and disease area.


Introduction
Article-level measures of publication impact, including alternative metrics (altmetrics), can help to characterise the impact of a publication among different audiences and in different contexts. Although the journal impact factor (JIF) may help to identify journals with a high readership, it is widely recognised as being a poor indicator of the quality or impact of individual research articles 1,2. We have previously described a novel approach to summarising altmetrics, the EMPIRE (EMpirical Publication Impact and Reach Evaluation) Index, which uses article-level metrics to assess the impact of medical publications in terms relevant to different stakeholders 3. The EMPIRE Index provides component scores for scholarly, social and societal impact, as well as a total impact score and predictive reach metrics. It provides richer information than other commonly used metrics such as the Altmetric Attention Score or JIF, with societal impact being the most distinct component score.
It is widely recognised that publication metrics vary by discipline; to facilitate the comparison of publication impact across different disciplines, field-normalised citation impacts are frequently calculated 4. Metrics also vary by publication type; for example, one study found that review articles in pharmacology journals received twice as many citations as original articles 5. Here, we present an exploratory investigation of whether disease indications and publication types influence the average EMPIRE Index scores.

Methods
This exploratory study investigated 12 disease indications, chosen to reflect a variety of common and rare diseases with a variety of aetiologies. Six of these were rare diseases, selected as a convenience sample of disease indications with which the authors were most familiar. No formal statistical power analysis was undertaken; however, we aimed for disease samples of approximately 1000 publications, which would enable publication type sub-analyses. Because the individual rare disease samples were much smaller than this target, the six rare disease samples were pooled.
Relevant publications were identified for each disease by the appearance of the disease name in the publication title. We limited the search period to items with publication dates between 1 May 2017 and 1 May 2018, to give sufficient time for metrics to accumulate while also minimising the time-dependent variation in metrics.
The searches were conducted on PubMed between 22 June 2020 and 3 July 2020, using the following search string:

For each disease, we conducted secondary searches for each publication type using PubMed tags for those of interest (i.e. the search string above and either "review", "systematic review", "clinical trial, phase iii", "clinical trial" or "observational study").

Altmetrics were obtained for all publications from Altmetric Explorer and PlumX over the period 23 June 2020 to 11 July 2020. Altmetrics were assumed to be zero for any publication for which Altmetric Explorer did not return a result. We also obtained the journal CiteScore for all publications 6. EMPIRE Index scores were calculated for all publications as described previously 3. Briefly, selected altmetrics that compose the EMPIRE Index were weighted and aggregated to form three component scores (social impact, scholarly impact and societal impact), which were then summed to form a total impact score.
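The weighting-and-aggregation step can be sketched as follows. This is an illustrative reading of the Methods, not the authors' implementation: the metric names and weights are hypothetical placeholders (the actual weights are defined in the original EMPIRE Index publication 3).

```python
# Illustrative sketch only: the metric names and weights below are
# hypothetical placeholders, not the published EMPIRE Index weights.
WEIGHTS = {
    "social": {"tweets": 0.1, "facebook_posts": 0.3, "news_mentions": 2.0},
    "scholarly": {"citations": 0.5, "mendeley_readers": 0.05},
    "societal": {"policy_mentions": 5.0, "guideline_citations": 10.0},
}

def empire_scores(altmetrics):
    """Weight and aggregate altmetrics into three component scores,
    then sum the components to form a total impact score. Metrics
    with no data are treated as zero, as in the Methods."""
    scores = {
        component: sum(weight * altmetrics.get(metric, 0)
                       for metric, weight in metric_weights.items())
        for component, metric_weights in WEIGHTS.items()
    }
    scores["total"] = sum(scores.values())
    return scores

print(empire_scores({"tweets": 40, "citations": 12, "mendeley_readers": 80}))
```

Note that a publication absent from Altmetric Explorer simply scores zero on every component under this scheme, which matches the zero-imputation assumption stated above.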
Each disease area comprised a different mixture of publication types, which we expected could confound the analysis; multivariate analysis on such a heterogeneous, non-normal and zero-inflated data set is problematic. Therefore, we opted to create standardised samples through random sampling.
A sample was created for each disease area with a standardised mix of publication types chosen to maximise the total number of publications retained (the standardised publication types [SPT] set). First, the two least common publication types (phase 3 clinical trials and systematic reviews) were excluded because of the high variation between disease areas and because they are largely subsets of other publication types (clinical trials and reviews, respectively). Although the observational studies publication type was only slightly more common than systematic reviews, it was retained as it was considered to be functionally very different from clinical trials and reviews. The proportions of each of the remaining three publication types were calculated for each disease set, as well as for the overall set. Publications were then trimmed from each disease set by random sampling, as needed, to match the proportions in the overall set. The trimmed publication sets formed the SPT set.
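The trimming procedure can be sketched as follows, assuming simple random down-sampling per publication type to match the overall target proportions; this is our interpretation of the Methods, and the labels and helper function are illustrative rather than the authors' code.

```python
import math
import random
from collections import Counter

def trim_to_proportions(pubs, target_props, seed=0):
    """Randomly down-sample a disease set so that its publication-type
    mix matches the overall target proportions, retaining as many
    publications as possible."""
    rng = random.Random(seed)
    counts = Counter(p["type"] for p in pubs)
    # Largest total sample achievable without exceeding any type's
    # available count (the epsilon guards against float rounding).
    n_total = min(math.floor(counts[t] / prop + 1e-9)
                  for t, prop in target_props.items() if prop > 0)
    trimmed = []
    for t, prop in target_props.items():
        of_type = [p for p in pubs if p["type"] == t]
        k = min(len(of_type), round(n_total * prop))
        trimmed.extend(rng.sample(of_type, k))
    return trimmed

# Hypothetical disease set: 60 reviews, 30 clinical trials, 10 observational.
pubs = ([{"type": "review"}] * 60
        + [{"type": "clinical trial"}] * 30
        + [{"type": "observational study"}] * 10)
target = {"review": 0.4, "clinical trial": 0.4, "observational study": 0.2}
sample = trim_to_proportions(pubs, target)
print(Counter(p["type"] for p in sample))
```

Here the observational studies are the limiting type, so the trimmed set keeps all 10 of them and down-samples the other two types to match the target mix.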
Similarly, each publication type comprised a different mix of diseases. A standardised disease areas (SDA) set was created by random sampling using a similar approach that ensured each publication type included the same mix of diseases, while maximising the total number of publications retained.
To provide an indication of public interest in each of these diseases, we downloaded weekly Google Trends data on relative interest in these diseases over the period of interest (1 May 2017 to 1 May 2018). A score of 100 indicates the maximum interest in any week over the search period and across any of the search terms of interest. The year averages presented here are expressed relative to that maximum score.
As these analyses were exploratory, we primarily provide descriptive statistics, and only minimal statistical analysis was undertaken. Intra-group differences were assessed using the Kruskal-Wallis one-way analysis of variance, a non-parametric test for equality of population medians (a significant result indicates that the population median of at least one group differs from that of at least one other group).
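For illustration, the test can be run as follows using SciPy on made-up score distributions for three hypothetical groups (the numbers are invented, not study data):

```python
from scipy.stats import kruskal

# Invented EMPIRE-like total impact scores for three hypothetical groups.
observational = [0.3, 0.5, 1.2, 0.8, 0.4]
phase3_trials = [2.1, 3.4, 4.8, 2.9, 3.3]
reviews = [1.0, 1.5, 0.9, 1.8, 1.1]

# Kruskal-Wallis H test: a non-parametric one-way ANOVA on ranks.
stat, p = kruskal(observational, phase3_trials, reviews)
print(f"H = {stat:.2f}, p = {p:.4f}")  # a small p means at least one group differs
```

Because the test operates on ranks rather than raw values, it is well suited to the skewed, zero-inflated metric distributions described above.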

Sample characteristics
In total, 20 577 publications were identified across the 12 disease areas 7, of which 5825 (28%) were tagged with one of the publication types of interest (Table 1). Table 1 also shows the Google search interest for each of these diseases.

Analysis by publication type
The numbers of publications retained in the SDA set used for publication type comparisons (i.e. with the same disease indication composition for each publication type) are shown in Table 2.
Table 1. Numbers of publications identified in the search. Google search interest is the average weekly interest across the search period, and is a relative score of 0–100, where 100 is the maximum score for any disease in any individual week.

Median EMPIRE Index scores and CiteScores for each publication type in the SDA set are shown in Figure 1 and Table 3. Mean EMPIRE Index scores, shown in Figure 2, broadly reflect the median scores. Statistical analysis indicated significant variation in the medians of each component score, as well as in the total impact score and journal CiteScore. In general, the ranking of publication types was relatively consistent across the different types of impact. Notably, phase 3 clinical trials had the highest median and mean scores, while observational studies had the lowest. Systematic reviews had higher impact than reviews. Most articles across all publication types had no societal impact, and significant differences in societal impact were driven by outliers. Of note, eight of the ten publications with the highest societal impact were clinical trials, and six of those were in non-small cell lung cancer (NSCLC).

Analysis by disease indication
The numbers of publications retained in the SPT set used for disease comparisons (i.e. with the same publication type composition for each disease indication) are shown in Table 4.
Median EMPIRE Index scores and journal CiteScores for each disease in the SPT set are shown in Figure 3 and Table 5.
Kruskal-Wallis testing indicated at least one significant pairwise difference in the total score, each component score and journal CiteScore. Migraine and multiple sclerosis (MS) had the highest impact across the social and scholarly component scores as well as the total impact score, while NSCLC and psoriasis had the lowest. Most articles across all diseases had no societal impact, with significant differences in societal impact driven by outliers. The eight publications with the highest societal impact were all important clinical outcomes trials (three in type 2 diabetes, three in NSCLC and one each in migraine and asthma).
Mean EMPIRE Index scores for each disease in the SPT set are shown in Figure 4. The interactive version of Figure 4 (online publication only) also shows the mean EMPIRE Index scores by disease for each publication type (full data set). Mean scores do not show clear trends for differences between disease indications, although societal impact appears to be lower for asthma and MS, and higher for migraine than other diseases. The high societal impact for migraine was driven by review articles; 16 of the 23 migraine articles with societal impact scores above zero were review articles. The scholarly impact for rare diseases appears to be higher than for other disease areas, albeit with low confidence owing to small numbers of publications included.

Discussion
This analysis found that typical EMPIRE Index scores vary across both disease indications and publication types. These results provide valuable contextual information for interpreting EMPIRE Index scores, and publication metric findings in general, for individual publications. For example, these findings can be used to help to understand whether a particular publication has notably high (or low) metrics.
We found considerable differences between disease areas, which broadly reflected public interest in the disease (as assessed through Google search interest). For example, the three diseases with the highest median EMPIRE Index scores, especially social impact, were migraine, MS and asthma; these also had the highest public interest. These differences were not observed in journal CiteScores, meaning that the disease areas with higher EMPIRE Index impact were not necessarily published in 'high impact' journals. NSCLC had low public interest ('lung cancer' as a general term was higher, but still lower than any of the other five major disease areas examined). Publications in NSCLC also had low median total impact scores, particularly in terms of social impact, despite being published in journals with higher median CiteScores.
Although this suggests distinct differences between diseases in terms of publication impact, it should be noted that the period of interest was only a single year. The findings could therefore have been influenced by the completion of important clinical studies, which can vary from year to year across disease areas.
A clear picture is seen for publication types, with phase 3 trials demonstrating much higher metrics than other types. The high impact of phase 3 clinical trials is to be expected, given that they are intended to provide practice-changing information. Systematic reviews had higher impact than general reviews; interestingly, this was despite their being published in journals with similar median CiteScores, and likely reflects the rigorous methodological approach used to synthesise evidence in systematic literature reviews. Observational studies had the lowest impact, suggesting observational analyses are still generally regarded as having lower interest.
In general, across both publication types and disease indications, median scores were higher for scholarly impact than for social or societal impact, while mean and maximal scores were broadly similar (or lower). This suggests that score distribution is more skewed for social and societal impact, with many papers generating little interest despite some scholarly impact.
A key strength of this study is the use of an automated approach to identify a large pool of publications for analysis. However, the automated process depends on the reliability of the underlying data. For example, disease areas were identified through a PubMed search on article titles, which may have excluded some relevant articles or included irrelevant ones. The PubMed search engine uses automatic term mapping, which usually makes the search more inclusive but can introduce inconsistencies 8. Publication types were identified by metadata tags, but these can be inconsistently applied or missing. Tagging can also result in duplication; for example, some phase 3 clinical trial publications in our sample were also classified as clinical trials.
In conclusion, the EMPIRE Index successfully identified differences in impact by disease indication and publication type. This supports the notion that there is no universal gold standard metric for publications, and instead the impact of each publication needs to be evaluated in the context of the type of publication, disease area and potentially other factors. These findings should be considered when using the EMPIRE Index to assess publication impact.

Open Peer Review

Introduction
The introduction is poor because it does not introduce the research problem properly. An introduction should explain why the research is necessary and what problem the paper aims to address. Why is it important to know that impact varies by disease or by publication type? Why is an indicator such as the EMPIRE Index necessary?
The authors conflate article-level metrics with altmetrics. Many altmetric measures are article-level metrics, but there are other article-level metrics (e.g. citations) that are not altmetrics. The authors should revise the text and correct these statements.
I do not see a key section of a scientific article: a literature review. The authors should include a section in which they contextualise the research, explaining previous studies that have dealt with this problem: studies about new indicators, differences in impact by subject or document type, different types of impact, etc. Overall, the authors should present other studies in the introduction that show why it is important to know that a disease has more social or scholarly impact, or why some document types attract more citations or tweets. In short, more conceptual background is needed. The paper includes only eight references, which illustrates the poor contextualisation of this research.

Objective
I think this section is the most important in a paper, and it is missing from this manuscript. If there are no objectives, there are no research goals, and the paper lacks direction. The final line of the Introduction says: "Here, we present an exploratory investigation of whether disease indications and publication types influence the average EMPIRE Index scores." Why do you want to know this? Why is it important to know this? The authors should introduce a section for objectives in which they explain the main and secondary objectives. It could be interesting to include some research questions that specify the aim of the paper.

Methods
The authors should include more detail about the data extraction process in Altmetric and PlumX. The authors claim that they used Altmetric Explorer to retrieve 20k publications. How exactly did you retrieve the data from this site? Did you use the API endpoint? How did you obtain the data from PlumX?
A key element of the paper is testing differences according to diseases and document types, but there is no explanation of these typologies. For example, regarding document types, what is the difference between "review" and "systematic review"? This part is key because the definition of groups and types influences the results, so it is very important to define the diseases and the document typology.
All the analysis relies on a new indicator, the EMPIRE Index. Despite the importance of this indicator to the paper, there is no information about how it is calculated and conceptualised. We have to read a non-peer-reviewed preprint deposited in a repository to learn what it is. I do not consider this a correct way to do science. The authors have to explain in this paper how the indicator is calculated; how the three component scores are defined; what the difference is between "social impact" and "societal impact", mainly from a conceptual point of view; likewise, why there are three components and not two or four; and which metrics take part in each component, and why.
CiteScore is an indicator for journals, not for articles. How is it used, and why? The same applies to Google Trends: how is it used in the study, and why?

Results
Tables 3 and 5 use the median CiteScore to value articles by disease. I reiterate that it is not correct to use journal indicators for research articles, and taking the median or mean of a ratio (CiteScore is the ratio of citations to publications) is a mathematical artifact, so the results are spurious.

Discussion
This paper lacks any substantial discussion of the results. Sometimes the interpretation is obvious and does not contribute valuable information. For example, the first paragraph ends: "For example, these findings can be used to help to understand whether a particular publication has notably high (or low) metrics." This is obvious, and it is not necessary to do this study to reach that conclusion; every publication has high or low metrics. Another example: "Observational studies had the lowest impact, suggesting observational analyses are still generally regarded as having lower interest."