Keywords
Peer Review, Linguistic Characteristics, MSCA program
All views expressed in this study are strictly those of the authors and may in no circumstances be regarded as an official position of the European Research Executive Agency or the European Commission.
The process of evaluating research grant proposals has attracted considerable attention in the past decade. With the increasing amount of funding for research, there is a constant need to improve the procedures for allocating funding to the most promising project proposals. Recent scoping reviews on peer review for research funding recommend, among other propositions, identifying interventions that consistently resolve peer review issues in proposal evaluation (Recio-Saucedo et al., 2022; Shepherd et al., 2018). Studies on grant peer review have mostly focused on the criteria used by expert reviewers when assessing proposals (Abdoul et al., 2012; van Arensbergen and van den Besselaar, 2012; Hug and Aeschbach, 2020). Other studies have investigated the linguistic content of review reports (van den Besselaar et al., 2018; Hren et al., 2022). However, to the best of our knowledge, there is no evidence on how the requirement to score a grant proposal numerically, i.e. to attribute a numerical/quantitative score to a proposal, affects the way a reviewer comments on or expresses opinions about that proposal.
The evaluation of grant proposals submitted to EU research programs, the so-called Framework Programmes for research and innovation, usually consists of two consecutive steps: each proposal goes through (1) an individual evaluation by (typically three) different expert reviewers, and (2) a consensus phase, in which those reviewers agree on the final evaluation of the proposal. In both steps, the evaluation normally focuses on three criteria: a) research Excellence, b) Impact, and c) Implementation, for which comments must be given separately. Each criterion is given a score, and these scores determine the total score of the proposal. The result of the consensus stage is an evaluation summary report (ESR), consisting of the consolidated opinions of the group of expert reviewers. Previous studies have established this approach as a stable procedure in the evaluation of research grant proposals (Pina et al., 2021; Buljan et al., 2021).
In the previous Framework Programme, Horizon 2020 (H2020), some of the grant schemes underwent changes in their scoring process. This was the case for the Marie Skłodowska-Curie Actions (MSCA), the flagship funding program dedicated to promoting researchers’ mobility and career development at all stages of their careers. In the past, expert reviewers were asked to provide comments and numerical scores for each of the three evaluation criteria, both in their individual evaluations (the so-called Individual Evaluation Report, IER) and then at the consensus level, resulting in the final score of the evaluation summary report (ESR). For some MSCA funding schemes, this approach was discontinued, and numerical scores were no longer attributed at the level of individual evaluations (IER). Only textual comments were required at the IER stage, and numerical scores were used only at the consensus stage for the ESR.
A recent study indicated that proposal weaknesses have a greater effect on the ranking than proposal strengths (Hren et al., 2022). Based on this finding, the ranking of proposals would depend greatly on the reviewers’ ability to identify and describe the weaknesses. When there is a large number of proposals, qualitative methods of analysis can be inefficient. Quantitative analyses of the text, i.e. tools that assess quantitative characteristics of the text, can therefore be a solution, as they have been shown to be relevant for proposal evaluation (Luo et al., 2022).
The objective of this study was to compare the linguistic characteristics of the comments related to the Excellence, Impact, and Implementation criteria in the evaluation reports of MSCA Innovative Training Networks (ITN) proposals submitted in 2019 and 2020 under H2020, in order to assess whether the removal of numerical scoring affected the structure of IER textual comments and whether this change was associated with the evaluation outcome at the consensus stage, i.e. the ESR. We chose the ITN granting scheme because, with around 1,500 annual submissions and a success rate below 10%, it is among the most oversubscribed and competitive schemes of the whole framework programme.
This study was preregistered on Open Science Framework: https://osf.io/t84ba.
We worked on anonymized datasets, without insight into the actual content of the proposals or the names of the applicants or expert evaluators, so that the regulations on personal data protection were not applicable.
The data analyzed consisted of the IERs and the ESRs of all ITN proposals evaluated in the 2019 and 2020 calls. Each report includes textual comments referring to the different evaluation criteria. IER scores were only available for 2019. The anonymized quantitative data used for the analysis in this article are available on the Open Science Framework: https://osf.io/6bpvu/?view_only=.
Linguistic characteristics of experts’ comments were assessed using the Linguistic Inquiry and Word Count (LIWC) software (Pennebaker et al., 2015a, 2015b), a program that counts words related to different psychological states and phenomena and gives, for each category, a score representing the proportion of that category in the entire text.
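Although LIWC itself is proprietary, the dictionary-based scoring it relies on can be illustrated with a short R sketch; the word list and function below are toy examples for illustration, not LIWC's actual dictionaries or code.

```r
# Toy illustration of dictionary-based scoring: a category score is the
# percentage of words in a text that belong to the category's word list.
score_category <- function(text, dictionary) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nchar(words) > 0]
  100 * sum(words %in% dictionary) / length(words)
}

# Hypothetical mini-dictionary for a "positive emotion" category
pos_emotion <- c("excellent", "strong", "innovative", "convincing", "clear")

comment <- "The training programme is excellent and the consortium is strong."
score_category(comment, pos_emotion)  # share of positive-emotion words (%)
```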
We collected data on the proposal status after evaluation (“Main list”, “Reserve list” or “Rejected”), the call in which the proposal was submitted (2019 or 2020), the research area, the total evaluation scores, and the numerical scores for the Excellence, Implementation and Impact criteria, together with the corresponding comments, which separately described proposal strengths and weaknesses. IERs and ESRs were analysed separately.
For evaluation purposes, proposals are categorized into eight panels: Economics (ECO), Social sciences (SOC), Mathematics (MAT), Physics (PHY), Chemistry (CHE), Engineering (ENG), Environmental sciences (ENV) and Life sciences (LIF). For this study, we clustered MAT, PHY, CHE, ENG and ENV into a single research domain, PHYENG, and ECO and SOC into ECOSOC, so that the three research domains in this study were PHYENG, ECOSOC and LIF.
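As a minimal sketch of this clustering step (assuming a data frame with one row per report and an illustrative `panel` column; these names are not taken from the actual dataset), the recoding could be done in R as follows:

```r
library(dplyr)

# Hypothetical data frame with one row per evaluation report; the column
# name `panel` is illustrative, not from the actual dataset.
reports <- tibble(panel = c("PHY", "ENG", "SOC", "LIF", "CHE"))

# Collapse the eight evaluation panels into the three research domains
# used in this study: PHYENG, ECOSOC and LIF.
reports <- reports %>%
  mutate(domain = case_when(
    panel %in% c("MAT", "PHY", "CHE", "ENG", "ENV") ~ "PHYENG",
    panel %in% c("ECO", "SOC")                      ~ "ECOSOC",
    panel == "LIF"                                  ~ "LIF"
  ))
```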
LIWC variables were calculated separately for strengths and weaknesses for each of the criteria assessed. They included the word count and the text tone of the evaluation report (Kaatz et al., 2015; Kacewicz et al., 2014; Pennebaker et al., 2015a):
a) Analytical tone: higher scores indicate a logical and hierarchical style of writing;
b) Clout tone: higher scores indicate confidence or leadership, while lower scores indicate insecure writing;
c) Authenticity: higher scores indicate honest and humble writing, with views expressed as personal opinions;
d) Emotional tone: higher scores indicate a higher proportion of words related to a positive emotional tone.
To eliminate potential sampling bias, we collected data for the whole cohort of submitted MSCA ITN proposals in 2019 and 2020.
Descriptive data are presented as frequencies and percentages for proposal status and research domain. Text characteristics are presented as means and standard deviations, or as means and 95% confidence intervals in the figures. We first compared the two calls on all variables using t-tests or chi-squared tests, depending on the nature of the variables. A P value below 0.001 was considered significant for the t-tests, and variables that were not significant were excluded from further analysis. We then used logistic regression to compare the two call years, with the proposal variables (proposal status, word count for research excellence weaknesses, word count for implementation strengths, and negative affect levels for implementation strengths) as predictors and the year of the call as the criterion. The level of significance was set to 0.05. The analysis was done using the R statistical program (R Core Team, 2021) and the jamovi statistical software (The jamovi project, 2023).
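The analysis described above could be sketched in R roughly as follows; the data frame `dat` and its column names are hypothetical stand-ins rather than the study's actual variables, and synthetic data are generated only to make the sketch runnable.

```r
set.seed(1)

# Synthetic stand-in data: one row per evaluation report (illustrative only).
dat <- data.frame(
  year   = factor(sample(c(2019, 2020), 200, replace = TRUE)),
  status = sample(c("Rejected", "Main list", "Reserve list"), 200,
                  replace = TRUE, prob = c(0.88, 0.09, 0.03)),
  wc_excellence_weak        = rpois(200, 150),  # word count, excellence weaknesses
  wc_implementation_str     = rpois(200, 120),  # word count, implementation strengths
  negemo_implementation_str = runif(200, 0, 3)  # negative affect, implementation strengths
)

# Step 1: screen variables with t-tests (continuous) and chi-squared tests
# (categorical), retaining only those significant at the chosen threshold.
t.test(wc_excellence_weak ~ year, data = dat)
chisq.test(table(dat$status, dat$year))

# Step 2: logistic regression with the retained report characteristics as
# predictors of the call year; McFadden's R2 summarizes explained variance.
fit  <- glm(year ~ status + wc_excellence_weak + wc_implementation_str +
              negemo_implementation_str, family = binomial, data = dat)
null <- glm(year ~ 1, family = binomial, data = dat)
mcfadden_r2 <- 1 - as.numeric(logLik(fit) / logLik(null))
mcfadden_r2
```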
The number of evaluated proposals was similar in 2019 (n=1554) and 2020 (n=1503). There were 1367 (87.9%) rejected proposals in 2019 and 1333 (88.6%) in 2020, 128 (8.2%) and 148 (9.8%) funded proposals, and 59 (3.8%) and 22 (1.5%) proposals on the reserve list, respectively. The majority of the proposals were from physical sciences and engineering (n=1004, 64.6% for 2019 and n=947, 63.0% for 2020), followed by life sciences (n=391, 25.1% for 2019 and n=387, 25.8% for 2020) and economics and social sciences (n=159, 10.2% for 2019 and n=169, 11.2% for 2020).
Overall, review comments were written predominantly in analytic and objective language, indicated by the high level of Analytical tone and low levels of Authenticity; this suggests that only a small proportion of reviewers formulated their arguments as personal opinions rather than as objective comments (Figures 1 and 2).
Clout and emotional tone were more present in the descriptions of a proposal’s strengths than of its weaknesses; in particular, the comments describing strengths contained more words related to a positive emotional tone (Figures 1 and 2).
The acceptance of a proposal was predicted by the linguistic characteristics of the comments related to its weaknesses: specifically, a lower analytical tone across the weaknesses of all three criteria, a higher negative emotional tone for research excellence and impact weaknesses, and higher clout for research excellence weaknesses (Table 1). In total, these predictors explained around 30% of the variance of the criterion (McFadden R2=0.30). On the other hand, the differences between 2019 and 2020 were negligible, explaining around 3% of the variance (McFadden R2=0.03) (Table 2). These predictors included the number of words in excellence strengths (both in individual reviews and consensus reports), a lower analytical tone for excellence and impact in individual reports, and a higher emotional tone in consensus reports.
With regard to the differences in textual characteristics between consensus reports and individual evaluations in 2019 and 2020, the observed pattern, with the greatest difference between consensus and individual evaluations being in emotional tone, was stable across proposal statuses (Table 3) and research domains (Table 4). Emotional tone was overall higher in consensus reports.
In this study, which included all ITN proposals from the 2019 and 2020 calls, we aimed to assess whether the changes in the evaluation procedure were related to differences in the characteristics of review reports. We found that the differences in linguistic characteristics between reports from the two calls (2019 and 2020) were small and negligible from a practical point of view, indicating that the removal of numerical scores did not result in meaningful changes in the reports’ comments, as assessed by quantitative text analysis. For both calls, the comments were written objectively, with weaknesses described with less emotion and more analytically than strengths. On the other hand, we found that the final status of a proposal (i.e. main-listed or rejected) can be predicted from the linguistic characteristics of the reviewers’ comments, especially the tone related to the identified weaknesses, indicating that weaknesses may be crucial in proposal evaluation.
The comments were written mostly in formal language, indicated by high levels of analytical tone for both strengths and weaknesses. The same feature was observed in a previous study of journal peer review reports (Buljan et al., 2021). Our results also support general advice to applicants to focus on the objective structure of their proposal (Baumert et al., 2022). When emphasizing proposal strengths, reviewers rarely used personal pronouns such as “I” or “we” (reflected in the low levels of authenticity), probably to present the strengths as factual information rather than as personal opinion. This finding contrasts with the study of Thelwall et al. (2023), which reported that greater use of first-person pronouns in reviews is related to higher proposal quality. In the description of weaknesses, on the other hand, reviewers more often presented the information as their personal opinion. The presentation of strengths as factual information is further supported by the high levels of clout in the description of strengths: clout, the tone that indicates writing from a position of power, was much higher in the description of a project’s strengths than of its weaknesses, from which we can conclude that reviewers were more certain in those parts of their evaluation. The emotional tone was more positive in the description of strengths, probably because of the use of words related to the project’s probable success. The greater presence of clout and emotional tone in the description of strengths indicates that reviewers wrote with less confidence when discussing the potential flaws of a proposal than when mentioning its strengths. In that respect, it should be noted that the EC services instruct reviewers that evaluation reports should not express opinions, but rather evaluate factual elements of the proposals.
The principal difference between consensus and individual evaluations was in the emotional tone. Across the different categories, emotional tone was higher in consensus reports than in individual evaluations, indicating a more positive tone in the outcome of the consensus process. This may be because only the consensus report is sent out to applicants; the IER remains an internal (intermediate) report and is not sent to applicants. In a previous study, we found that the agreement between reviewers was very high (Pina et al., 2021). At the time of the individual evaluation, reviewers do not know whether the other reviewers will agree with them. It is possible that, when reviewers write the consensus evaluation, they are no longer constrained to purely objective language, since it is established that the other reviewers agree with their opinion, so the tone is more relaxed and positive.
Our previous study of the predictive value of comments on proposals’ strengths and weaknesses in the ITN evaluation process used both qualitative and quantitative (machine learning) approaches (Hren et al., 2022). Our present results partially confirm the results of that study, which found that a proposal’s weaknesses are more predictive of its evaluation outcome (Hren et al., 2022). However, we found that only some elements of the weaknesses are predictive of the proposal status. Specifically, a higher analytical tone and fewer negative evaluation words in comments related to a proposal’s weaknesses were associated with a more favorable funding outcome. It should be noted that the themes that served as predictors in the regression model of that study were identified qualitatively and were better predictors (explaining around 55% of the variance) than our quantitative text analysis (around 30%). However, given the large number of proposals, linguistic characteristics of reviewers’ comments may serve as an additional tool in proposal evaluation, as advised by others (Luo et al., 2022).
The finding that we did not observe meaningful differences in the tone of reviewers’ comments needs to be interpreted in the light of several limitations. Our entire quantitative text analysis was based on dictionary-based text analysis algorithms, which may deviate slightly from manual analysis while still being predictive of proposal funding outcomes (Luo et al., 2022), a finding partially reproduced in our study. One aspect of quantitative text analysis, sentiment analysis (analysis of the text tone), could serve as a useful tool to determine whether there were any differences in the evaluations performed after the removal of individual numerical scores, as reviewers were a common part of both procedures. We focused only on the linguistic characteristics of the comments related to the positive and negative sides of the proposals. A qualitative analysis of the proposals would give insight into potential differences between the two calls, but, given the number of proposals, the practical value of such an approach is questionable. We also do not have information about who the reviewers were, which may be relevant, since individual characteristics such as experience in research or reviewing may influence the review process (Seeber et al., 2021). Based on our evaluation, we found no evidence that the removal of numerical scoring produced any differences in the evaluation output.
This study assessed whether the removal of numerical scores had a significant effect on the evaluation procedure. The findings indicate that the removal of numerical scores did not lead to meaningful differences in the evaluation procedure of H2020 ITN proposals or in its outcome. These results support the finding that the procedure used for the evaluation of MSCA grant proposals is very robust and stable.
Ivan Buljan, Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing; David G Pina, Conceptualization, Data Curation, Investigation, Methodology, Project Administration, Resources, Writing – Original Draft Preparation, Writing – Review & Editing; Antonija Mijatović, Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Writing – Original Draft Preparation, Writing – Review & Editing and Ana Marušić, Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Supervision, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing.
Underlying data: Open Science Framework: Are numerical scores important for grant evaluation?, https://osf.io/6bpvu/?view_only= (Buljan et al., 2021).
Extended data: Open Science Framework: STROBE checklist for “Are numerical scores important for grant evaluation? A cross sectional study”, https://osf.io/6bpvu/?view_only= (Buljan et al., 2021).
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).