Keywords
COVID-19, preprints, checklist, science education, science communication
During the COVID-19 pandemic, there has been both a proliferation of scientific data and a major shift in how results were disseminated, with many researchers opting to post their work as preprints ahead of, or instead of, publication in scientific journals.1–4 Preprints are scientific manuscripts that are posted on freely accessible preprint servers (such as medRxiv, bioRxiv, PsyArXiv, MetaArXiv or arXiv) and that have not gone through formal peer review. Preprints take an extremely short time to become ‘live’ – between 24 and 48 hours after basic checks by server administrators, for example that the content of the manuscript is scientific text within the scope of the server, and not spam or plagiarised text.5 This is clearly advantageous in a rapidly-evolving pandemic.6,7 However, unlike journal submissions, the dissemination of preprints is not predicated on any quality control procedure.8 Even before the pandemic, concerns had been raised about the potential of such unvetted results and interpretations of findings leading to widespread misinformation.9 Indeed, in the past three years, prominent non-peer-reviewed COVID-19-related claims10,11 that were promptly uncovered as misleading or seriously flawed by the scientific community12,13 have nonetheless infiltrated the public consciousness14 and even public policy.15 In this high-stakes context, two things have become clear: 1) that preprints have become a tool for disseminating disease outbreak research,16 and 2) that evaluating preprint quality will remain key for ensuring positive public health outcomes.
Since the start of the pandemic, the number of preprints on COVID-19 has been steadily rising, with over 55,000 preprints to date (as of June 27, 2022; see also Figure 1). In the early stages of the pandemic, studies showed that COVID-19 preprints were typically less well-written in terms of readability and spelling,17 and that most did not meet standards for reproducibility and research integrity.18–20 A number of preprints also contained extremely serious issues, such as ethical and privacy concerns, data manipulation, and flawed designs.21 These data support the notion of a proliferation of poor-quality work over the course of the pandemic (‘research waste’22,23), ultimately leading to a spread of misinformation.24
(A) Preprints appearing per day on a selection of preprint servers indicated in the figure’s legend, from January 2020 to June 2022. To account for daily variation in the upload of preprints, 30-day moving averages are represented, i.e. the average of the last 30 days. (B) The cumulative number of preprints posted to the same set of preprint servers, indexed in Europe PMC since the first WHO statement regarding a novel infectious disease outbreak on January 9, 2020.68
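The moving average in panel A is a simple windowed mean. As a purely illustrative sketch (this is not the authors' code from the OSF repository; the data frame below uses simulated placeholder counts and hypothetical column names), the computation could look like this in R:

```r
# Illustrative sketch only: a 30-day moving average of daily preprint counts
# and the cumulative total, as described in the Figure 1 caption.
library(zoo)  # provides rollmean()

dates <- seq(as.Date("2020-01-09"), as.Date("2022-06-27"), by = "day")
set.seed(1)
daily <- data.frame(
  date        = dates,
  n_preprints = rpois(length(dates), lambda = 60)  # placeholder daily counts
)

# Panel A: mean of the last 30 days (right-aligned 30-day window)
daily$ma_30 <- rollmean(daily$n_preprints, k = 30, fill = NA, align = "right")

# Panel B: cumulative number of preprints since January 9, 2020
daily$cumulative <- cumsum(daily$n_preprints)

head(daily[!is.na(daily$ma_30), ])
```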
Such initial findings are countered by a much more nuanced story of COVID-19 preprint quality. While it is true that many preprints never convert into publications (70-80% as of April 2021, see e.g., Refs. 25, 26) and that preprints tend to be less cited than the resulting peer-reviewed publication,27,28 this is not necessarily due to poor quality. For one, the link between a preprint and its publication may be lost when it is the preprint that gets cited.29 Alternatively, the authors’ decisions could be the cause, as some may avoid the publication process altogether.30 Others may intentionally use preprints to release replications and null results that are difficult to publish,31 or works in progress which may be less well-written and inadequate at sharing data/code.17–20 Many preprints actually report their results in a balanced way so as not to ‘oversell’ their findings,30 and there is growing evidence of high concordance between findings published in preprints and in peer-reviewed journals.25,32–39 Nonetheless, a large portion of COVID-19 preprints show substantial changes in methods and results after peer review (nearly half of the preprints analysed by Oikonomidi (2020) and Nicolalde et al. (2020)25,31), suggesting flaws in the most essential elements of many COVID-19 preprints. At least two potential solutions for distinguishing high- from low-quality research in preprints are possible: 1) introducing quality control measures, and 2) educating the readership of preprints to make quality evaluations themselves. Of the extant efforts to improve preprint quality, most have focused on introducing quality control via quick peer review, e.g., Prereview (https://www.prereview.org/), Review Commons (https://www.reviewcommons.org/), PreLights (https://prelights.biologists.com/)[1]. Though peer review is often considered the gold standard for scientific quality control, it has limitations: it can be time-consuming,3 at times inefficient at weeding out fraud, and often contaminated with reviewer bias, negligence, and self-interest.39–45 Automated problem detectors are a promising way forward;46 however, such tools still require continued refinement and human verification of results.47 When research is under a stress test, such as during a worldwide pandemic, alternative forms of quality control have to be considered.
Aside from improving the contents of preprints directly, more could be done to educate the readership of preprints on the judicious interpretation of their contents. Preprints receive considerable attention on social and traditional media,48 but so far, their contents have not always been reported adequately, as many news reports do not explain the publication process, how preprint platforms work, or the implications of non-peer-reviewed research.44,45 This is made all the more disappointing by recent findings showing that a simple, one-paragraph explanation of the nature of preprints and the scientific publishing process can meaningfully change laypeople’s perceived credibility of scientific results.46
Public education initiatives on understanding preprints (on COVID-19 or more generally) have been next to non-existent. Apart from the study by Wingen et al.,49 only one effort was identified,50 which offered a set of guidelines for giving feedback on preprints, whether for reviewers or members of the broader community. In the absence of guidelines on interpreting and evaluating the information in preprints, we created the PRECHECK project (www.precheck.site). As the first project of its kind, the aim was to develop a simple, user-friendly tool to guide non-scientists in their evaluation of preprint quality. Though we were inspired by the proliferation of preprints and misinformation during the COVID-19 pandemic, we created a tool that can be applied to preprints or publications on other topics, in the hope that it can help empower non-scientists and non-specialists to make their own judgements.
The entire project “PRECHECK: A checklist to evaluate COVID-19 preprints” was conducted under the ethical policies of the University of Zurich. As stipulated in sections 5 and 6 of these policies (https://www.rud.uzh.ch/dam/jcr:c42f07d3-2e89-485c-8a8b-3a3c3f46a3f5/UZH%20Policy%20on%20the%20Ethical%20Review%20of%20Research%20Projects%20Involving%20Human%20Subjects%20(UZH%20Ethics%20Policy).pdf), our project falls outside the scope of the Swiss Human Research Act and, per section 8.1 of these policies, can be considered a study that “generally cannot harmfully affect study participants”. Therefore, our project did not require explicit prior ethical approval from the institution, and we did not seek it. A post-hoc verification with the University of Zurich ethical commission’s ethics review checklist (available in the OSF repository for this project68) confirmed that the current project did not require ethical approval per the regulations of the University of Zurich. Written consent for participating in the student workshops was not required because the workshops were administered as part of the students’ regular university courses (details in the manuscript), to which students do not need to consent separately.
The aim of the study was to develop simple and clear guidance, in the form of a checklist, to help assess the quality of a preprint. Our target audience for the checklist was scientifically literate non-specialists, such as students of medicine and psychology, and science journalists. To develop a checklist that would be both user-appropriate and discriminative of preprints of different levels of quality, we applied a multi-step approach inspired by the qualitative Delphi method.51,52 As such, our study can be considered a qualitative study, and we have thus followed the Standards for Reporting Qualitative Research (SRQR) (Ref. 53; see Ref. 69 for a completed SRQR checklist for the current project). The Delphi method uses successive voting by expert groups to iteratively narrow down a pool of options until consensus is reached, and is effective in situations with little to no objective evidence.54 Our procedure involved four main stages. In the first stage, the first draft of the checklist was reviewed internally by the senior members of our team (who were not involved in creating the draft) and subjected to a sensitivity test to verify whether the checklist could discriminate high- from low-quality preprints[2]. In the second stage, a panel of external experts rated the relevance of each element of the checklist and provided feedback, after which the checklist was updated. In the third stage, we conducted a final round of internal review producing the third draft of the checklist, which was also subjected to a final sensitivity test. At the end of these three stages, we verified whether members of our target audience could use the checklist successfully and whether they appeared to find it useful, via workshops with university students and journalists. We called this stage the implementation of the checklist. In the workshop for journalists, some participants offered unsolicited yet helpful feedback, which was incorporated into the finalised version of the checklist (see also Figure 2).
There are four successive stages (Stage 1 to Stage 3 and the implementation stage, in blue, green, yellow, and purple fields, respectively), each made up of a set of successive steps (in white rectangular fields).
The research team conducting the analysis was composed of three junior researchers (postdoctoral level) and three senior researchers (two professors and one senior scientific collaborator). All members of the team have experience in meta-research, but are trained in other disciplines (statistics and psychology).
In this step, the three senior members of our team gave written feedback on a draft of the checklist in the form of comments. After the written feedback was incorporated by two junior members of our team, final feedback in the form of verbal comments from the senior members was obtained in an online meeting round. A copy of each version of the checklist after feedback and other validation procedures below is available in the OSF repository for this project.68
After each of the two rounds of internal review (Stage 1 and Stage 3), we conducted a sensitivity test to verify whether the checklist, at its given stage of development, could successfully discriminate between preprints of different levels of quality. We did not perform any statistical analyses as part of this assessment. Since objective quality criteria that would apply to all preprints across disciplines were difficult to envision, we used a list of milestone research works in the COVID-19 pandemic55 as a proxy for high quality, and identified the preprints for our high-quality set from this list. The preprints that made up our low-quality set, on the other hand, were chosen from a list of retracted COVID-19 preprints,56 with retraction serving as a proxy for low quality. There were no a-priori criteria for selecting preprints. We selected three high-quality preprints57–59 and three low-quality preprints9,60,61 to test across multiple stages. Since preprint selection occurred in August 2021, we only included preprints that were available online from the start of the pandemic up until that point. In the Stage 1 round, one junior team member tested only the high-quality preprints and another junior team member tested only the low-quality preprints. In the Stage 3 round, both team members tested all preprints independently. The same high- and low-quality preprints were used at both stages. To document the test results, a spreadsheet was generated with one field per checklist element, where the response to the checklist element for each preprint was entered. Fields were also coloured green, red, and yellow to visually represent ‘yes’, ‘no’, and ‘maybe’ responses to specific checklist elements, respectively. The results of each round of sensitivity tests, with links to the respective preprints, are available in the Open Science Framework (OSF) repository for this project.68
Panel selection
We invited a total of 54 experts across four topic groups: research methodology (14 invitations), preprints (15 invitations), journal editors (15 invitations), and science journalists (10 invitations), as these fields of expertise were judged to be of the greatest relevance for evaluating our checklist. Experts were identified through personal connections, by identifying editors of relevant journals in the fields of psychology and medicine (for the editor group), by identifying science journalists from the list of speakers at the World Conference of Science Journalists, Lausanne 2019 (https://www.wcsj2019.eu/speakers) (for the journalist group), and by identifying individuals whose work in the topic areas of research methodology and preprints is noteworthy and well-known. There were no a-priori criteria for selecting experts, other than their perceived belonging to one of the topic groups. In actual Delphi designs, 5-10 experts may be considered sufficient;62 however, there is no clear consensus on how large expert groups should be.63 Of the total number of experts, 29 were personal contacts. Panel members were contacted by email between October 18, 2021 and November 16, 2021 with a standardised explanation of the project, their role in the refinement of the checklist, how their data would be used (that their name and email would not be shared, that their responses would be fully anonymous, and that the aggregated results would be published and shared in our OSF repository for the project), and a link to a Google Forms survey where they could enter their responses anonymously. By clicking on the link to the survey, the experts thus gave their implicit informed consent to participate. The survey form sent to the experts is available in the OSF repository for this project.68 Experts who replied to our email declining to take part in the survey were not contacted further. Because the survey responses were fully anonymous and we had no way of linking responses to individual experts, all experts who did not explicitly decline to take part were reminded twice. A total of 26 experts (48%) responded to the invitation by completing our survey (14 of them were personal contacts). The experts (self-)reported that they belonged to the following expert groups: four experts in science journalism, four journal editors, seven meta-researchers/experts on preprints, and 11 methodologists.
Response collection and analysis
Experts rated the relevance of each element of the checklist on a five-point Likert scale, with the following response options: extremely irrelevant, mostly irrelevant, neither relevant nor irrelevant, mostly relevant, and extremely relevant. Response data were analysed by computing mean responses per element, using R (version 4.0.4). Our decisions on which elements to keep followed the procedure used to establish the CONSORT guidelines for abstracts.64 That is, all elements with a mean score of four and above were kept (in the CONSORT for abstracts, this threshold was eight, as a 10-point Likert scale was used), elements with a mean score between three and four (four not included; between six and seven in the CONSORT for abstracts criteria) were marked for possible inclusion, and elements with a mean score below three were rejected.
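To make the decision rule concrete, here is a minimal, purely illustrative sketch in R (not the authors' analysis code; the element names and scores below are hypothetical) of computing mean relevance scores per checklist element and applying the keep / possible inclusion / reject thresholds described above:

```r
# Hypothetical expert ratings: one row per expert response, scores on the
# five-point scale (1 = extremely irrelevant ... 5 = extremely relevant).
ratings <- data.frame(
  element = rep(c("research_question", "study_type", "transparency"), each = 4),
  score   = c(5, 4, 5, 4,  4, 3, 4, 3,  3, 2, 3, 3)
)

# Mean score per checklist element
element_means <- aggregate(score ~ element, data = ratings, FUN = mean)

# Decision rule: mean >= 4 -> keep; 3 <= mean < 4 -> possible inclusion;
# mean < 3 -> reject.
element_means$decision <- cut(element_means$score,
                              breaks = c(-Inf, 3, 4, Inf),
                              labels = c("reject", "possible inclusion", "keep"),
                              right  = FALSE)
element_means
```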
Experts also had the option to provide free-text comments: general additional comments on the checklist elements, suggestions for potentially relevant items that were missing, and suggestions on the structure (a PDF of the survey, including the Likert scales and free-text options, is available in the OSF repository for this project68). These comments were collected into a single document and responded to point by point, akin to a response-to-reviewers document (also in the OSF repository for this project68), in which we stated our agreement or disagreement with each expert comment and how it was addressed in the subsequent draft of the checklist.
First draft of the checklist
The first draft of the checklist was created from April 16 until May 17, 2021, and contained six categories of items: research question, study type, transparency, limitations, study reporting, and research integrity. The research question item asked whether the study mentioned the research question/aim, this being the most basic component of a research study. The study type question asked whether the study type was mentioned and, if it was not, asked users to try to infer the study type, with guidance. In transparency, users were asked to check the existence, availability, and accessibility of a protocol, and of data, code, and materials sharing. The limitations question asked whether any limitations of the study were mentioned, and asked users to try to evaluate any potentially unmentioned limitations (biases, specifically), with guidance. In study reporting, we asked users to check whether reporting guidelines were followed explicitly or implicitly. Finally, the research integrity category asked users to check whether ethical approval, conflicts of interest, and contributor roles were reported.
Internal review results
In the first round of internal review, the senior members of our team provided feedback on the contents of the first draft of the checklist. This round revealed that explanations were needed as to the importance of considering each specific item in our checklist, and that both a more ‘superficial level’ and a ‘deeper level’ were necessary to account for all user needs. For these reasons, we expanded the initial checklist, such that each item consisted of a main question, an explanatory section entitled ‘Why is this important?’, and a section entitled ‘Let’s dig deeper’. The main questions formed the basis of the checklist, as they were all closed questions that could be answered via the ‘yes’ box next to them. Users could tick the box in full to indicate that the preprint being read passes the question, in part to indicate a ‘maybe’ response, or not at all to indicate that the preprint does not pass the question. This level was also called the ‘superficial level’ of assessment, as the questions could mostly be answered after a quick read of a preprint, and by searching for keywords that appear in the questions. The ‘Why is this important?’ section was added in order to increase the pedagogical value of the checklist, which was designed as a teaching tool, such that users could learn about the purpose behind the main questions and their importance for evaluating research. The ‘Let’s dig deeper’ sections were added as an optional part for users who wanted to go beyond the main questions in their evaluation of a preprint, or who wanted to learn more about how to structure their thoughts when evaluating research work. This section does not have a tick-box, as it usually contains open questions and suggestions. It cannot stand alone and should be consulted after the main questions and the ‘Why is this important?’ section, which is why we called this the ‘deep level’ of assessment. A full version of the checklist at this stage can be found in the OSF repository for this project.68 This was the version of the checklist that we submitted to a sensitivity test on high- and low-quality preprints.
Sensitivity test results
While searching for high- and low-quality preprint examples to apply the checklist to, we discovered that the checklist works best when applied to research with human subjects using primary data (or systematic reviews, meta-analyses, and re-analyses of primary data). That is, the questions turned out to be ill-posed (unanswerable) for research that did not fall into the above description, such as simulation studies. We decided to indicate this in the introduction section of the checklist.
At the superficial level, the high-quality preprints that we used57–59 received mostly ‘yes’ and ‘maybe’ responses to the main questions. Only one58 had a ‘no’, for mentioning/sharing its data/code. The low-quality preprints9,60,61 also received several ‘yes’ and ‘maybe’ responses, though fewer than the high-quality preprints. Interestingly, the Transparency and Study reporting items all received ‘no’ responses for the low-quality preprints, whereas the high-quality preprints mostly received ‘maybe’ and ‘yes’ responses. This demonstrates that high- and low-quality preprints differ in how they meet transparency and reporting standards, but also highlights that even high-quality preprints do not necessarily meet all of these standards.
At the deep level, high-quality preprints mostly received ‘yes’ responses to the points in the ‘Let’s dig deeper’ section, and a few ‘no’ and ‘maybe’ responses. Meanwhile, the low-quality preprints showed no clear pattern, with a mix of ‘no’, ‘yes’, and ‘maybe’ responses; however, they had more ‘no’ responses than the high-quality preprints. At this level of assessment, one could delve into issues with study design, specifically in Limitations, where low-quality preprints performed demonstrably worse than high-quality preprints. Thus, it may be important to consider both levels of assessment when evaluating preprints using the checklist. The final version of the checklist at this stage was completed on October 18, 2021.
External expert review summary results
Experts were first asked how relevant they thought each of the six item categories was for assessing preprint quality on a five-point Likert scale, from extremely irrelevant (score of 1) to extremely relevant (score of 5). Here, all categories scored above four, except study reporting, which received a 3.96. All elements with a mean score of four and above were kept, elements with a mean score between three and four were marked for possible inclusion, and those with a mean score below three were rejected. Next, experts rated the relevance of each of the elements per category, including the main questions and each of the points in the ‘Let’s dig deeper’ sections. The results are summarised in Table 1 below. None of the elements had a score lower than three, which meant that none of them warranted immediate exclusion. However, several elements, including the entire study reporting category, had scores between three and four that placed them in the ‘possible inclusion’ category.
Summary of external experts’ comments
The full list of free-text comments submitted by the experts and our responses to them is available in the OSF repository for this project.68 Overall, the comments revealed enthusiasm about the checklist, but also important criticisms. First, there were concerns that the target audience as it was previously defined (non-scientists) would not be able to use a checklist of that complexity adequately. Paradoxically, however, there were also many suggestions for additional elements that would further increase complexity, such as whether the study fills a gap in knowledge, whether it can be fully reproduced, and whether the statistical methodology is justified. Another prominent criticism was that the checklist inadvertently prioritised randomised controlled trials and could potentially discredit observational studies. Our sensitivity test did not find observational studies to be particularly disadvantaged in comparison with randomised controlled trials. However, we did acknowledge that some categories, specifically limitations, appeared more oriented towards randomised controlled trials. One specific comment raised the issue that our checklist did not include COVID-19-specific questions, despite the checklist’s motivation. We thus sought to address and elaborate on these points in the next versions of the checklist.
Some experts had additional ideas, some of which were outside the scope of this project, but others of which were easy to implement. In particular, one suggestion was to ask users to check whether the manuscript had converted into a publication, and another was to include spin and overinterpretation of results as further examples of bias in the Limitations section. These suggestions were incorporated into the subsequent versions of the checklist.
Second draft of the checklist
The second draft of the checklist68 consisted of the following elements: research question (main question only), study type, transparency (main question on mentioning the protocol, and two ‘Let’s dig deeper’ questions on accessing the protocol and accessing the data), limitations, and research integrity (two main questions on mentioning an ethics approval statement and conflicts of interest). After expert feedback, we decided to omit the study reporting category and instead incorporate the reporting guidelines mentioned there into the newly added descriptions of study types in the study type item, to reduce overall complexity and increase clarity in this section specifically. After comments on the checklist’s complexity, we combined the transparency and research integrity categories and their remaining elements into a new category called ‘transparency and integrity’. Here, in response to a specific comment, we altered the main questions such that they probe whether a study protocol, data sharing, materials sharing, ethical approval, and conflicts of interest are mentioned, whereas the ‘Let’s dig deeper’ section asks whether the above are accessible (in that section, we define what we mean by ‘accessible’). In addition to these changes, we realised the utility of looking both at the preprint and at the preprint server when using the checklist, as sometimes the preprint does not mention data/materials sharing, but these resources are shared on the server.19 Thus, we added a recommendation into the introduction to consider both sources when using the checklist. In response to the comment on inadvertently disadvantaging observational research, we added disclaimers to the limitations section stating when certain sources of bias do not apply. Specifically, for the control group/condition, randomisation, and blinding elements, we added a clause to the end of each paragraph that states ‘(if your preprint is on an observational study, this item does not apply)’. In response to the lack of a COVID-19 element, we elaborated that, although the state of COVID-19 preprints and their effect on society was our inspiration for the project, our expertise (preprint quality in medicine and psychology) made it difficult to create an adequate COVID-19-specific checklist. With this, we modified the title of the checklist and expanded the introduction, to avoid further confusion. We also structured our workshops with students and journalists around issues with preprints and peer review more generally. The final version of the checklist at this stage was completed on January 17, 2022, and is available in the OSF repository for this project.69
To provide more details on the workshops, we envisaged two for students: one for Bachelors’ students of Psychology at the University of Geneva, and one for Bachelors’ students of Medicine at the University of Zurich. Both workshops were given as invited lectures in standard classes taught to students as part of their respective Bachelors’ courses (as part of the “Scientific skills and competencies in psychology” class at the University of Geneva [in the original French: “Compétences et connaissances scientifiques en psychologie”], and the “Biostatistics for medical professionals” class at the University of Zurich [in the original German: “Biostatistik für Mediziner”]). The workshop content was made clear to the students in advance. Only those students who were enrolled in the above classes as part of their Bachelors’ courses could attend the workshops. Thus, no special consent was required to participate in the workshops. One junior member led the workshop at the University of Geneva, while at the University of Zurich, one junior and one senior member led the workshop. There was no relationship of dependence between either of these members and the students they administered the workshops to (i.e., neither member was involved in the grading of the students’ course assignments or exams). Both workshops consisted of a theoretical presentation of the scientific publishing process, what preprints are, and current issues with preprints and peer review (based on the conclusions of the literature review in the present manuscript). Next, the purpose and component parts of the checklist were presented, together with instructions on how to use it for the subsequent in-class exercise. In the exercise, students were split into two groups according to their last names, and each group was given 15-20 minutes to re-read a preprint (they were informed about the exercise and given the preprints beforehand; one group read one preprint60 and the other group read the other65) and use the checklist to evaluate it. The answers were collected anonymously, in an aggregated (non-individual) fashion, via an in-house platform. At the University of Geneva, we used surveys on Votamatic (https://votamatic.unige.ch/) to gather the percentages of students that voted ‘yes’, ‘no’, or ‘partly’ for each superficial-level item, and Padlet (https://unige.padlet.org) to gather optional free-text responses to the deep-level items. At the University of Zurich, Klicker (https://www.klicker.uzh.ch/home) was used to gather the percentages of students that voted ‘yes’, ‘no’, or ‘maybe’ for each superficial-level item, alongside optional free-text comments. Once recorded, these percentages were discussed by the whole class and compared against the researchers’ own evaluation of said preprint. The responses were only analysed inasmuch as percentages of ‘yes’, ‘no’, and ‘partly/maybe’ responses were automatically generated by the respective platforms, for the purpose of in-class discussion. There were no formal statistical analyses performed on these responses, and the responses did not influence the content of the checklist in any way.
The above workshop format was also adapted for science journalists at the Universities of Geneva and Zurich. In practice, only the workshop at the University of Geneva was held (led by the junior member of the team based at the University of Geneva), as the relevant personnel at the University of Zurich did not respond to our requests to hold a workshop. The format was largely the same as that of the workshop for students, except that the initial presentation focused less on the scientific publishing process and more on issues with preprints and peer review, and included more time for a mutual discussion of said issues. We intended for these to be small workshops open only to a select number of people, on a purely voluntary basis, invited by the main contact person at each University’s communications section. All of the invited parties were informed of the content of the workshop beforehand. Only those individuals who were interested in attending did so, and they provided verbal assent to participate in the workshop. The same in-house online platform as before (i.e., Votamatic) was used to collect data on the percentages of ‘yes’, ‘no’, or ‘partly’ responses for each superficial-level item of the checklist in an anonymous, aggregated (non-individual) fashion, and this automatic generation of response percentages was the only analysis performed on these data. Deep-level item answers were informally discussed. As before, there were no formal statistical analyses performed on these data. Some informal verbal feedback (not previously solicited) that we received from the participants did influence the content of the checklist, as we detail in the Implementation section below.
Finally, the content of the workshop for university science journalists was adapted for a fact-checking seminar organised by the Swiss Association of Science Journalists (https://www.science-journalism.ch/event/spring-seminar-2022). Two junior members gave this workshop at the Swiss National Science Foundation headquarters in Bern. Participation was voluntary and open only to individuals that registered to attend the conference at which we were invited to give our workshop as part of the advertised programme. Thus, no special consent was required for participation. We used the same procedure as above, with the exception that a ‘nonsense’ preprint was read and evaluated (Ref. 66; i.e., a manuscript describing a made-up study to prove the point that predatory journals do not screen for quality) and no response data were collected (responses were probed and discussed verbally during the session).
Final draft of the checklist
In the second internal review round, we kept the four-item structure, most of the text, and the visual separation between the superficial and deep levels from the second draft of the checklist. In addition, the wording was clarified and additional explanations were provided where necessary. In transparency and integrity, we added clauses to explain what to do in situations where data and materials are stated to be shared upon request, or where reasons against sharing are mentioned. Specifically, since acquiring data/materials from authors who only share upon request succeeds only in a minority of cases,65 we advised that mentioning only that data will be shared ‘upon request’ does not count as a ‘yes’. However, if authors mention that they cannot share the data or materials for a specific reason, we advised that mentioning any reasons against sharing does count as a ‘yes’. In the Limitations section, we added examples of spin and overinterpretation as other sorts of biases to look out for. In the introduction, we added the recommendation to check whether the preprint has converted into a publication.
Sensitivity test results
At the superficial level, both high- and low-quality preprints had many positive responses, for both team members who tested the checklist. The elements that differed the most were those in transparency and integrity, with low-quality preprints having slightly more ‘no’ and ‘maybe’ responses than high-quality preprints. Additionally, both raters agreed that two of the three low-quality preprints did not discuss limitations. This is a slight departure from the prior sensitivity test results, in that the difference between high- and low-quality preprints appears to be smaller.
At the deep level, however, the differences were once again well-pronounced, as high-quality preprints had predominantly positive responses, while low-quality preprints had predominantly negative and ‘maybe’ responses. Thus, the deep level seems to be necessary to fully discern between high- and low-quality preprints. Similarly to the prior results, the responses in the transparency and integrity section showed that even high-quality preprints did not always meet the standards of transparency and integrity. The final version of the checklist at this stage was completed on March 1, 2022.
With an operational checklist, we set out to teach members of our target audience how to use it and to verify whether they could use it as intended. To this end, we gave workshops, as part of existing courses, to Bachelor students in Medicine at the University of Zurich and in Psychology at the University of Geneva. The workshop at the University of Zurich took place on March 11, 2022, while the workshop at the University of Geneva took place on May 9, 2022. For our journalist cohort, we provided a lecture and practical for members of the University of Geneva communications office and scientific information division. This workshop took place on May 19, 2022. We later also presented the checklist at a fact-checking seminar organised by the Swiss Association of Science Journalists on May 31, 2022.
Across all of these classes, except for the fact-checking seminar, we used the same high- and low-quality preprints as practice material, and explained both how to use the checklist, as well as the rationale behind its elements, as stated in our point-by-point response to experts. A few suggestions for improvement nonetheless emerged from the workshop with the journalists at the University of Geneva, as some participants wished to offer feedback. Some participants of this workshop pointed out that the necessity and function of both the superficial and deep level of assessment were not clear. Others had trouble understanding what mention of data and materials sharing counts as a ‘yes’ response and what counts as a ‘no’ response. Finally, one participant suggested that even before using the checklist, one should generally apply more caution when assessing controversial research or findings that sound ‘too good to be true’. We incorporated all of this feedback into the final version of the checklist, which was completed on May 20, 2022 (see also Table 2).68
Table 2. The final version of the PRECHECK checklist.

Category | # | Item | Yes
---|---|---|---
Research question | 1 | Is the research question/aim stated? | □
Why is this important? | | A study cannot be done without a research question/aim. A clear and precise research question/aim is necessary for all later decisions on the design of the study. The research question/aim should ideally be part of the abstract and explained in more detail at the end of the introduction. |
Study type | 2 | Is the study type mentioned in the title, abstract, introduction, or methods? | □
Why is this important? | | For a study to be done well and to provide credible results, it has to be planned properly from the start, which includes deciding on the type of study that is best suited to address the research question/aim. There are various types of study (e.g., observational studies, randomised experiments, case studies, etc.), and knowing what type a study was can help to evaluate whether the study was good or not. What is the study type? Some common examples include: |
Let’s dig deeper | | If the study type is not explicitly stated, check whether you can identify the study type after reading the paper. Use the question below for guidance: |
Transparency and Integrity | 3 | (a) Is a protocol, study plan, or registration of the study at hand mentioned? (b) Is data sharing mentioned? Mentioning any reasons against sharing also counts as a ‘yes’. Mentioning only that data will be shared “upon request” counts as a ‘no’. (c) Is materials sharing mentioned? Mentioning any reasons against sharing also counts as a ‘yes’. Mentioning only that materials will be shared “upon request” counts as a ‘no’. (d) Does the article contain an ethics approval statement (e.g., approval granted by institution, or no approval required)? (e) Have conflicts of interest been declared? Declaring that there were none also counts as a ‘yes’. | □ □ □ □ □
Why is this important? | | Study protocols, plans, and registrations serve to define a study’s research question, sample, and data collection method. They are usually written before the study is conducted, thus preventing researchers from changing their hypotheses based on their results, which adds credibility. Some study types, like RCTs, must be registered. Sharing data and materials is good scientific practice which allows people to review what was done in the study, and to try to reproduce the results. Materials refer to the tools used to conduct the study, such as code, chemicals, tests, surveys, statistical software, etc. Sometimes, authors may state that data will be “available upon request”, or during review, but that does not guarantee that they will actually share the data when asked, or after the preprint is published. Before studies are conducted, they must get approval from an ethical review board, which ensures that no harm will come to the study participants and that their rights will not be infringed. Studies that use previously collected data do not normally need ethical approval. Ethical approval statements are normally found in the methods section. Researchers have to declare any conflicts of interest that may have biased the way they conducted their study. For example, the research was perhaps funded by a company that produces the treatment of interest, or the researcher has received payments from that company for consultancy work. If a conflict of interest has not been declared, or if a lack of conflict of interest was declared but a researcher’s affiliation matches an intervention used in the study (e.g., the company that produces the drug that is found to be the most effective), that could indicate a potential conflict of interest, and a possible bias in the results. A careful check of the affiliations of the researchers can help identify potential conflicts of interest or other inconsistencies. Conflicts of interest should be declared in a dedicated section along with the contributions of each author to the paper. |
Let’s dig deeper | | (a) Can you access the protocol/study plan (e.g., via number or hyperlink)? (b) Can you access at least part of the data (e.g., via hyperlink, or on the preprint server)? Not applicable in case of a valid reason for not sharing. (c) Can you access at least part of the materials (e.g., via hyperlink, or on the preprint server)? Not applicable in case of a valid reason for not sharing. (d) Can the ethical approval be verified (e.g., by number)? Not applicable if it is clear that no approval was needed. By ‘access’, we mean whether you can look up and see the actual protocol, data, materials, and ethical approval. If you can, you can also look into whether they match what is reported in the preprint. |
Limitations | 4 | Are the limitations of the study addressed in the discussion/conclusion section? | □
Why is this important? | | No research study is perfect, and it is important that researchers are transparent about the limitations of their own work. For example, many study designs cannot provide causal evidence, and some inadvertent biases in the design can skew results. Other studies are based on more or less plausible assumptions. Such issues should be discussed either in the Discussion, or even in a dedicated Limitations section. |
Let’s dig deeper | | Check for potential biases yourself. Here are some examples of potential sources of bias: |
Over four successive stages of refining the PRECHECK checklist, we arrived at a four-item checklist, approved by internal and external review, that can help users critically evaluate preprints. After a set of workshops with Bachelor students of Psychology and Medicine, and with science journalists, we concluded that the checklist was ready to be used by scientifically literate non-specialists. Here we recapitulate the main findings and discuss their implications.
The results of the sensitivity tests can be divided into three key findings. First, across both tests, preprints deemed to be of high quality had consistently more positive responses on the checklist than preprints deemed to be of low quality. This indicates that the checklist is effective at discriminating between preprints of high and low quality, since positive responses mean that the preprint in question ‘passes’ a given criterion for good quality. Notably, this holds even when the checklist is reduced to only four items, which helps maintain its user-friendliness.
That said, and we consider this the second key finding, the deep level seemed to be especially important for discriminating preprint quality. It was especially evident in the second sensitivity test that combining the deep and superficial levels is optimal for distinguishing high- from low-quality preprints. This is likely due to the nature of the questions at these two levels of evaluation, as the superficial level probes surface-level questions that can be considered a ‘bare minimum’ of quality. For example, it is standard scientific practice that the research question or aim of a study must be mentioned in its resulting manuscript, and indeed, most preprints, even ones that eventually end up being retracted, do offer this information. The issues with low-quality preprints rather seem to lie in their designs, ethical considerations, handling of the data and interpretation of the results,21 and how transparently they report their results.18,19 In our checklist, the first set of issues is detected by the deep level of the Limitations item, as this level asks users to engage with the content of the research reported in the preprint and to check for potential issues with the design. The second set of issues is addressed by the transparency and integrity question, at both the deep and superficial levels. Both levels have been successful at detecting problems in low-quality preprints, but also in high-quality ones, which brings us to the final key finding.
Third, for both high- and low-quality preprints, the checklist highlighted issues with transparency and research integrity. This mirrors the state of the COVID-19 literature: although the research reported in preprints can be credible,37–39 data and materials sharing is nonetheless often lacking.18,19 This could be taken as a weakness of the checklist, as this item is not particularly discriminative of preprint quality. However, we believe it is a strength, albeit an unintended one. On one hand, this feature highlights where there is room for improvement even in works that are otherwise of good quality. Thus, there is potential for integrating the checklist into peer-review procedures, as one of our external experts suggested. On the other hand, it informs non-specialist audiences of the importance of transparency and integrity when considering the quality of a manuscript. As another external expert pointed out, certain practices such as open data sharing are considered to be at the forefront of scientific endeavours, and not all researchers adhere to these standards, even if their research is sound. However, we believe that part of the reason for this non-adherence is that not everyone considers such Open Science practices to be sufficiently important, despite a clear need to improve reproducibility in many areas of science.65,67 By dedicating an entire item to issues of transparent reporting and research integrity, we hope to encourage non-specialists as well as scientists to pay attention to, and think critically about, issues of transparency and integrity, in addition to the soundness of the research being reported.
Apart from the external expert comments already mentioned, the online-survey review process revealed several important insights. It was illuminating that many experts agreed that the initial draft was too complex for non-scientist audiences, while there were also suggestions for additional items that required, in our view, expertise that non-scientists would not reasonably have. This situation made it clear that we had to define our target audience more precisely, and strike the right balance between simplicity and functionality to make the checklist both usable and accurate. The implementation round confirmed that the checklist was indeed usable, as both our student and journalist cohorts across multiple sites appeared to understand how the checklist was supposed to be used. Further, in our workshops, there appeared to be considerable overlap between the students’ and journalists’ application of the checklist to the high- and low-quality preprints and our own evaluations. We do acknowledge that many important aspects of checking the quality of a scientific work that the experts mentioned were omitted, such as verifying the references, checking whether the work fills a gap in knowledge, and checking whether the statistical analyses are justified. Though this can be construed as a limitation, we believe it is justified by the feasibility constraint of the checklist being a teaching tool for audiences that will very likely not have the expertise to verify preprints in such ways.
Another prominent theme that emerged in the external experts’ comments was an implicit prioritisation of randomised controlled trials, which could in turn disadvantage observational studies. We did not find the checklist to be so gravely biased, as all three of our high-quality preprints57–59 reported observational studies and ‘passed’ the items on the checklist very well. We nonetheless appreciated the importance of this point, as much of the COVID-19 research reported in preprints was indeed observational.16 In response, we made two substantial changes to the checklist. For one, we expanded the explanation of what observational studies are in the ‘Why is this important?’ section of Study type, by including concrete examples. This could help prevent non-experts from falsely identifying an observational study as having not indicated a study type, as, unlike for randomised controlled trials, manuscripts on observational studies often do not explicitly mention that the research was observational. Moreover, we made it clear that the potential sources of bias mentioned in the ‘Let’s dig deeper’ section of Limitations were only examples; we added disclaimers for those sources of bias that may not apply to observational studies (the control group/control condition, randomisation, and blinding, specifically); and we included two more examples of biases that could apply to observational studies (spin and overinterpretation). We believe that these changes made the checklist more balanced towards all study types.
One important aspect of the checklist to note is its intended use as a teaching tool to guide non-specialist audiences to evaluate preprints more critically than they otherwise might. The checklist is not meant to be a litmus test of whether a preprint is trustworthy, but rather a set of guidelines on which users can base their own judgements, and a pedagogical tool to improve people’s competence and confidence in evaluating the quality of research. Though the superficial level alone could, in theory, be automated, as we have seen, the best results are obtained when the superficial and deep levels are combined, and it is the deep level that allows the user to delve into issues of study design and potential biases. Nonetheless, it is useful to have a division into a superficial and a deep level of assessment, as this gives users greater flexibility to apply the checklist according to their needs.
Over multiple iterative steps of internal and external review, sensitivity tests, and final polishing after the implementation phase, we created the PRECHECK checklist: a simple, user-friendly tool for helping scientifically literate non-experts critically evaluate preprint quality. We were inspired by the urgency of improving the public’s understanding of COVID-19 preprints, an area in which efforts to increase public awareness of preprint quality and competence at estimating it have been next to non-existent. Despite our COVID-19-related motivation, the PRECHECK checklist should be more broadly applicable to scientific works. With this, and the fact that our target audience could use the checklist, we believe that it has great potential to help guide non-scientists’ understanding of scientific content, especially in preprint form.
Conceptualization: LH, EV, EF; Data curation: NT, SS, RH; Formal Analysis: NT, SS, RH; Funding acquisition: LH, EV; Investigation: NT, SS, RH; Methodology: NT, SS, RH; Project administration: NT, SS, RH; Resources: LH, EV; Software: NT, SS, RH; Supervision: LH, EV, EF; Validation: NT, RH, SS, EF, EV, LH; Visualization: NT, SS, RH; Writing – original draft: NT, RH; Writing – review & editing: NT, RH, SS, EF, EV, LH.
OSF: PRECHECK. https://doi.org/10.17605/OSF.IO/NK4TA. 68
This project contains the following underlying data:
• Code for Figure 1 folder. [R code in an RMD document to reproduce Figure 1 with the data that is also uploaded in this folder].
• Sensitivity Test Preprints folder. [high-quality subfolder containing the high-quality preprints chosen for the sensitivity test (Bi et al., - 2020 - Epidemiology and Transmission of COVID-19 in Shenz.pdf, Lavezzo et al., - 2020 - Suppression of COVID-19 outbreak in the municipali.pdf, Wyllie et al., - 2020 - Saliva is more sensitive for SARS-CoV-2 detection.pdf ), low-quality subfolder containing the low-quality preprints chosen for the sensitivity test (Davido et al., - 2020 - Hydroxychloroquine plus azithromycin a potential.pdf, Elgazzar et al., - 2020 - Efficacy and Safety of Ivermectin for Treatment an.pdf, Pradhan et al., 2020 - Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag.pdf ), and workshop_nonsense subfolder containing the preprint used for the fact-checking seminar (Oodendijk - 2020 - SARS-CoV-2 was Unexpectedly Deadlier than.pdf )]
• Stage 1 folder. [Checklist Version after Stage 1 (20211018_PRECHECKchecklist.pdf ) and the sensitivity test performed using that version of the checklist (Stage1_SensTest.xlsx)].
• Stage 2 folder [Checklist Version after Stage 2 (20220117_PRECHECKchecklist.pdf ), the form that was used to collect expert responses (ExpertSurveyForm.pdf ), and the replies to expert free text comments (Point-by-pointExpertReplies.pdf )]
• Stage 3 folder [Checklist Version after Stage 3 (20220301_PRECHECKchecklist_afterComments.pdf ), the results of the sensitivity analyses done by the junior authors, NT (Stage3_SensTest_NT.xlsx) and RH (Stage3_SensTest_RH.xlsx)].
OSF: PRECHECK. https://doi.org/10.17605/OSF.IO/NK4TA. 68
This project contains the following extended data:
• 20220520_FINAL_PRECHECKchecklist_afterComments_afterWorkshops.pdf. (Final version of the PRECHECK checklist).
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).
Repository: SRQR checklist for ‘Using an expert survey and user feedback to construct PRECHECK: A checklist to evaluate preprints on COVID-19 and beyond’. https://doi.org/10.17605/OSF.IO/JVHBW. 69
We would like to thank all of the experts that so graciously volunteered their time to refine this checklist, as well as the students and journalists that took part in our workshops.
1 For experience-based suggestions for teams of researchers interested in rapid peer review, see Clyne and colleagues.40
2 As we will explain below, high and low quality was decided based on proxies, and not on objective criteria, as we could not determine a set of such criteria that would apply to all preprints across all the disciplines that the checklist could possibly be used on. With this, when we refer to ‘high-quality’ and ‘low-quality’ preprints, we do so for brevity, and not because the preprints in question belong to predefined, universally agreed-upon categories of ‘good’ and ‘bad’. Further, the checklist is not a litmus test of whether a preprint is ‘good’ or ‘bad’. Rather, it is intended to be used as a set of guidelines and a tool to get scientifically literate non-specialists to think critically about how they read preprints.