Study Protocol

Development of a tool to assess the risk of bias in statistical simulation studies: study protocol

[version 1; peer review: awaiting peer review]
PUBLISHED 30 Jun 2026

Abstract

Background

Statistical simulation studies are the principal methodology for examining the performance of statistical methods. Findings from statistical simulation studies guide researchers in their statistical decision making, and inform evidence syntheses of statistical simulation studies, so should provide a fair representation of how the methods are expected to perform.

Objectives

To develop a tool to assess whether statistical simulation studies provide a fair representation of how statistical methods are expected to perform.

Methods

We will undertake a multi-step process to develop a domain-based tool with signalling questions. The project team will consist of a core working group and an international advisory group. We will hold virtual meetings with the advisory group to agree on the scope and content of the tool. We will conduct systematic reviews and cited reference searches (forward and backward citations) to develop an evidence base to inform domains and signalling questions. An initial set of domains will be agreed on with the advisory group. We will then undertake a survey of methodologists and statisticians with expertise in statistical simulation design and intended users of the tool, to seek their views on which domains are most important. We will propose signalling questions for each domain and revise the domains with feedback from the advisory group until domains are agreed. We will pilot the tool with intended users such as consulting statisticians and systematic reviewers of findings from simulation studies. The developed tool and guidance documentation will be published in an open-access journal and disseminated via conferences and workshops. This protocol was registered on the Open Science Framework (OSF) on June 2, 2026 (Registration DOI: https://doi.org/10.17605/OSF.IO/DW4SM).1

Keywords

statistical simulation, risk of bias, bias assessment tool, protocol, systematic review

Background

Statistical simulation studies are the principal methodology for examining how well a set of statistical methods perform against a known truth. This is achieved by generating hypothetical data sets based on known characteristics, applying the statistical method(s) to the data sets and comparing the results with the known characteristics.2,3 Statistical simulation can be used to evaluate whether a statistical method performs as intended, to examine the robustness of a statistical method or to compare multiple statistical methods (either as a comparison of previously published methods or to compare the performance of a newly developed method) (see Table 1). Statistical simulation is also used for other purposes, such as to construct empirical sampling distributions (e.g., bootstrapped confidence intervals) or to determine the power or sample size when designing a study, although such uses are beyond the scope of the current project.
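The generate, apply and compare cycle described above can be sketched in a few lines. The following is a minimal illustration only, not a design taken from this protocol: the choice of a normal data-generating mechanism, the two methods compared (sample mean and sample median), and the names `run_simulation`, `n_sim`, `n_obs`, `true_mean` and `sd` are all assumptions made for the example.

```python
import random
import statistics


def run_simulation(n_sim=2000, n_obs=30, true_mean=5.0, sd=2.0, seed=2026):
    """Compare two estimators of a known mean across repeated simulated data sets."""
    rng = random.Random(seed)  # one generator, seeded once for the whole study
    estimates = {"mean": [], "median": []}
    for _ in range(n_sim):
        # Step 1: generate a data set from a known distribution (the "truth").
        data = [rng.gauss(true_mean, sd) for _ in range(n_obs)]
        # Step 2: apply each statistical method to the same data set.
        estimates["mean"].append(statistics.fmean(data))
        estimates["median"].append(statistics.median(data))
    # Step 3: compare the estimates with the known truth via performance metrics.
    results = {}
    for method, values in estimates.items():
        results[method] = {
            "bias": statistics.fmean(values) - true_mean,
            "empirical_se": statistics.stdev(values),
        }
    return results


results = run_simulation()
for method, metrics in results.items():
    print(method, metrics)
```

Even in a sketch this small, the number of repetitions, the distribution and the metrics are researcher choices, which is the kind of design decision the proposed tool is intended to examine.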

Table 1. Example uses of statistical simulation in studies.

| Reference | Purpose of simulation study | Description of simulation study |
| --- | --- | --- |
| Li, 2024 44 | Evaluate whether a statistical method performs as intended. | Develop a method for small samples that extends standard longitudinal models to accommodate informative observations in clinical studies; simulation was used to show that the proposed estimators performed as expected from theory. |
| Abbas-Aghababazadeh, 2023 45 | Examine the robustness of a statistical method to violations in the assumptions underpinning the method. | Compare meta-analysis methods for gene-drug associations or biomarker discovery using preclinical pharmacogenomics data; simulation was used to evaluate the performance of the standard meta-analysis methods, which assume independence between included studies, when this assumption was violated. |
| Jiang, 2023 46 | Compare a newly developed method with multiple previously published statistical methods. | Develop a mediation modelling approach that addresses zero-inflated mediators containing both true zeros and false zeros, and compare this approach to existing standard causal mediation analysis approaches; simulation was used to compare the performance of the approaches across a range of scenarios. |
| Cho, 2024 47 | Compare multiple previously published statistical methods (sometimes referred to as a neutral comparison study). | Compare existing reliability estimators for single-administration test scores; simulation was used to evaluate the accuracy of each estimator under a range of scenarios. |

One of the primary purposes of statistical simulation studies is to guide researchers in statistical decision making, for example, in selecting statistical methodology for a particular scenario. For instance, simulation studies might be used to select which small sample correction to make in a cluster randomised trial with few clusters.4 Researchers may refer to individual simulation studies, or evidence syntheses that combine the results from multiple simulation studies (e.g., a systematic review of simulation studies that evaluated the properties of small sample corrections in cluster randomised trials5). However, both individual statistical simulation studies and evidence syntheses of them may unfairly represent the true performance of the statistical methods, if the simulation results are biased or unrepresentative of the situation to which they are applied.

Bias is defined as a “systematic error, or deviation from the truth, in the results”.6 In the context of statistical simulation studies, the ‘results’ are performance metrics used to quantify how well the statistical methods under evaluation are behaving, and these may be evaluated under multiple scenarios (see Table 2 for definitions and examples of terms in bold). Note that ‘bias’ is a key performance metric commonly used in simulation studies to quantify systematic error in a statistical method’s estimator relative to the true parameter value (see footnote [1]). To distinguish this ‘bias’ from the bias we are referring to, we use the terminology ‘bias in the simulation study results’.
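For readers less familiar with the performance metric sense of ‘bias’, it can be written out as follows. The notation is an assumption made for illustration and is not taken from this protocol: θ denotes the true value of the estimand, θ̂_i the estimate from the i-th simulated data set, and n_sim the number of simulated data sets.

```latex
\[
  \operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta ,
  \qquad
  \widehat{\operatorname{Bias}}
    = \frac{1}{n_{\mathrm{sim}}} \sum_{i=1}^{n_{\mathrm{sim}}} \hat{\theta}_i \; - \; \theta .
\]
```

This quantity describes the estimator under study. ‘Bias in the simulation study results’, by contrast, refers to systematic error in reported quantities such as the estimated bias itself, arising from how the study was designed, conducted or reported.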

Table 2. Glossary of terms used in this article to describe elements of a statistical simulation study, with an example from Turner et al. 2021.48

The structure of this glossary follows the ADEMP system from Morris et al.2 The terms in brackets refer to alternate usage for the term.

| Term | Definition | Example from Turner et al. 2021 |
| --- | --- | --- |
| Statistical simulation study (study) | A statistical simulation study is a computer experiment designed to evaluate a specific aim, using data created from pseudo-random sampling of known probability distributions. | “In this study, we therefore aimed to examine the performance of a range of statistical methods for analysing interrupted time series studies with a continuous outcome using segmented linear models.” |
| Data-generating mechanism | The data-generating mechanism is the process of using random numbers to generate (simulate) one or more data sets.2 | “We simulated continuous data from ITS studies by randomly sampling from a parametric model [a segmented linear regression model], with a single interruption at the midpoint, and first order autoregressive errors. We multiplied the first error term by $\sqrt{1/(1-\rho^2)}$ [where $\rho$ is the lag-1 autocorrelation of the errors] so that the variance of the error term was constant at all time points.” |
| Statistical model (model) | A statistical model describes the assumed mathematical relationship between the data points. | “We use a segmented linear regression model with a single interruption, which can be written using the parameterisation [defined below] proposed by Huitema and McKean as: $Y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 [t - T_I] D_t + \varepsilon_t$” |
| Variable | The variables in the statistical model are the quantities that can vary across data points. | “$Y_t$ represents the continuous outcome variable at time point $t$ of $N$ time points [$t$ is a variable]. $D_t$ is an indicator variable that represents the post-interruption interval (i.e. $D_t = 1(t \geq T_I)$) where $T_I$ represents the time of the interruption [$T_I$ is a parameter, defined below].” |
| Parameter | The parameters of the statistical model are the fixed quantities that define the data-generating process. | “The model parameters, $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ represent the intercept (e.g., baseline rate), slope in the pre-interruption interval, the change in level and the change in slope, respectively. The error term, $\varepsilon_t$, represents deviations from the fitted model.” |
| Parameter value (Factor) | Parameter values (factors) are the values given to the parameters underlying the data, and to other experimental design choices, that the researcher specifies in the data-generating mechanism (e.g., the true value of a data characteristic such as the mean). | “We created a range of simulation scenarios including different values of the model parameters and different numbers of data points per series. … All combinations of these parameter values (factors) were simulated, leading to 800 different simulation scenarios.” |
| Simulation scenario (scenario) | The factors used to specify a data-generating mechanism define a single simulation scenario. There are typically multiple scenarios considered in each statistical simulation study. | “We created a range of simulation scenarios including different values of the model parameters and different numbers of data points per series. … All combinations of these factors were simulated, leading to 800 different simulation scenarios.” |
| Data set | A data set contains a set of data points. Each simulation scenario uses a unique data-generating mechanism to generate multiple data sets. | “Design parameter values (factors) were combined using a fully factorial design with 10,000 data sets generated per combination.” |
| Data point | A data point is a single observation, case or record within the data set. | An example data point is $t = 20$ months, $D_t = 1$, $Y_{20} = 0.50$ C. difficile infections per 1,000 patient-days. |
| Estimand | The estimand is a population quantity, or true characteristic of the data, that is estimated by the statistical methods in the statistical simulation study. | “The primary estimands of the simulation study are the parameters of the model, $\beta_2$ (level change) and $\beta_3$ (slope change).” |
| Statistical method (method) | Statistical method typically refers to a model used for data analysis but can also refer to the procedure used to choose an analysis.2 A statistical simulation study may evaluate the performance of a single method or compare the performance of multiple methods. | “We focus on statistical methods that have been more commonly used (Ordinary Least Square (OLS), Generalised Least Squares (GLS), Newey-West (NW), Autoregressive Integrated Moving Average (ARIMA)). In addition, we have included Restricted Maximum Likelihood (REML) (with and without the Satterthwaite adjustment), which although is not a method in common use, is included because of its potential for reduced bias in the estimation of the autocorrelation parameter, as has been discussed for general (non-interrupted) time series.” |
| Performance metrics | The performance metrics generate the numerical quantities, i.e., the results, used to assess the performance of the method(s) under evaluation (e.g., bias, confidence interval coverage, mean square error). | “The performance of the methods was evaluated by examining bias*, empirical standard error, model-based standard error, 95% confidence interval coverage and power.” |

* Bias in this context refers to the difference between the expected value of the estimator for the estimand and the true value of the estimand.

† Some of the text of the original paper has been modified to align with the terminology of the present paper.
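To make the glossary entries for the data-generating mechanism and the statistical model concrete, the sketch below simulates one data set from a segmented linear regression with first-order autoregressive errors, following the description quoted in Table 2. It is a hedged illustration rather than the code used by Turner et al.: the function name `simulate_its`, the way the midpoint interruption is indexed, and the example parameter values are assumptions.

```python
import math
import random


def simulate_its(n_points, beta, rho, sigma, rng):
    """Simulate one interrupted time series data set with AR(1) errors.

    beta  : (b0, b1, b2, b3) = intercept, pre-interruption slope,
            level change, slope change
    rho   : lag-1 autocorrelation of the errors (|rho| < 1)
    sigma : standard deviation of the white-noise innovations
    """
    b0, b1, b2, b3 = beta
    t_int = n_points // 2 + 1  # assumed indexing of the midpoint interruption
    series = []
    prev_error = None
    for t in range(1, n_points + 1):
        innovation = rng.gauss(0.0, sigma)
        if prev_error is None:
            # Scale the first error so every error has variance sigma^2 / (1 - rho^2).
            error = innovation * math.sqrt(1.0 / (1.0 - rho ** 2))
        else:
            error = rho * prev_error + innovation
        d_t = 1 if t >= t_int else 0  # post-interruption indicator
        y_t = b0 + b1 * t + b2 * d_t + b3 * (t - t_int) * d_t + error
        series.append((t, d_t, y_t))
        prev_error = error
    return series


# Illustrative parameter values only.
example = simulate_its(n_points=24, beta=(1.0, 0.5, 2.0, 0.25), rho=0.4,
                       sigma=1.0, rng=random.Random(48))
for row in example[:3]:
    print(row)
```

In the glossary's terms, the tuple `beta`, `rho`, `sigma` and `n_points` are the factors that define one scenario, and each call returns one data set made up of data points `(t, D_t, Y_t)`.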

Bias may occur in the simulation results when, for example, researchers alter the study design after seeing the initial results to favour a preferred method.7–10 Additionally, the composition of the research team can introduce bias if the researchers have varying expertise, experience or preferences regarding the methods being compared.7,11–14 This could result, for example, in study design decisions that favour particular methods,7–11,13,14 or identification of implementation errors more readily for some methods over others.7,11,12

Many choices are made when designing a simulation study, and while some of these choices may not introduce bias into the simulation results, they can still lead to unfair representation of the performance of the methods under evaluation. For example, researchers must choose the statistical methods to be compared, the data-generating mechanism and performance metrics, and the approach used to evaluate the performance of the methods when the simulation results are missing (as occurs, for example, when methods fail to converge for some of the data sets in a particular scenario).2,3,8,11,15,16 Different design choices can lead to different findings for the performance of a statistical method, which in turn can lead users of statistical simulation studies to make a different decision when selecting statistical methodology. Such researcher choices can also lead to optimism bias, which refers to the tendency of a newly introduced method to perform better in the original publication than in subsequent comparison studies.7,14,17–19

Non-reporting bias is another type of bias that is of concern for simulation studies. Simulation studies may remain entirely unpublished due to their findings (known as publication bias),11,20–22 or there may be selective non-reporting of results within individual studies (known as selective non-reporting bias).7,11,22 Unlike other forms of research (e.g., randomised trials), there is not a culture or requirement to register simulation studies or publish protocols for them, and because simulation studies (mostly) do not include patient data, they are exempt from ethical review. These factors hinder the ability to identify a sample of simulation studies before their results are known and hence to assess the risk of bias due to non-reporting.

There exist several risk of bias tools for a range of study designs such as RoB 2 for randomised trials,23 ROBINS-I for non-randomised studies of interventions,24 ROBIS for systematic reviews,25 and PROBAST for prediction modelling studies.26 However, we are unaware of such a tool for assessing the risk of bias in the results and unfair representation of the performance of the methods in statistical simulation studies. While reporting guidelines and recommendations for the developers of statistical simulation studies exist,2,3,13,15,27–30 these publications are not aimed at the users of simulation studies. Assessing potential for bias in findings from simulation studies is important for researchers using the findings to guide their statistical decision making, and for those undertaking evidence syntheses of statistical simulation studies. The aim of this research is therefore to develop a tool to evaluate whether statistical simulation studies provide a fair representation of how the statistical methods under investigation are expected to perform.

Methods

We aim to develop a tool to evaluate whether statistical simulation studies provide a fair representation of how statistical methods are expected to perform. Development will be based on guidance by Whiting et al.31 and Moher et al.,32 and informed by methods used in the development of other related tools.26,33–36 We describe the activities planned at each proposed stage. The development process for the tool is shown in Figure 1.


Figure 1. Development process for the risk of bias tool.

1. Assemble the team

The project team will consist of a core working group and an international advisory group. The core group includes: Sarah Arnup (co-lead), Joanne McKenzie (co-lead), Simon Turner, and Matthew Page, based at the School of Public Health and Preventive Medicine at Monash University, and Julian Higgins (University of Bristol). The core group consists of researchers with expertise in the design, conduct and analysis of statistical simulation studies; experience in using statistical simulation studies to inform their statistical practice; knowledge of bias in different study designs; and experience developing risk of bias tools (e.g., RoB 2,23 ROB-MEN,22 ROB-ME37). The core group will be responsible for leading the tool development and undertaking the research (e.g., conducting the systematic review to inform the content of the tool, conducting and analysing the survey used to generate potential items for the tool, and organising consensus meetings).

An international advisory group will be established to provide advice to the core working group throughout the development process. The advisory group will consist of international interest holders including methodologists and statisticians with expertise in numerical simulations, and likely users of the tool (e.g., consulting statisticians, systematic reviewers of findings from simulation studies).

2. Define the scope and outline the conceptual decisions for the development of the tool

During the first virtual meeting with members of the core and advisory groups, we will seek agreement on the conceptual decisions outlined in Table 3.

Table 3. Key definitions and conceptual decisions to make for defining the scope of a risk of bias tool (adapted from Whiting, 2017).31

Conceptual decisions to make | Considerations and examples
What is the definition of a statistical simulation study? | We will seek agreement on the definition of a statistical simulation study.
The terminology used to refer to a statistical simulation study varies with discipline, and can include, for example, simulation study, Monte Carlo simulation, stochastic simulation, computer simulation, numerical simulation study, computational research and benchmark/ing study.
Examples of published definitions for a statistical simulation study include:

  • “Simulation studies are computer experiments that involve creating data by pseudo-random sampling from known probability distributions.”2

  • “Simulation studies use computer intensive procedures to test particular hypotheses and assess the appropriateness and accuracy of a variety of statistical methods in relation to the known truth.”3

  • Monte Carlo simulation: “Simulation that uses repeated random sampling to obtain results; the random sampling may be pseudorandom, implemented via a computer.”49

Which applications of statistical simulation studies will be targeted by the risk of bias tool? | We propose to restrict the tool to studies evaluating statistical methods but will seek agreement on the exclusion of other uses of statistical simulation.
Statistical simulation studies are used for different purposes. Common aims of a statistical simulation study, including studies that use different terminology but meet the definition of a statistical simulation study, include both evaluating statistical methods and applying simulation methods (see also Table 1):
Evaluation
Example aims of studies that evaluate whether a statistical method performs as intended:

  • To check the algebra (and code) when a new method has been derived algebraically.2

  • To assess the relevance of large sample theory in finite samples.2,50

  • To assess whether a method performs as expected in data where the underlying parameters are consistent with the parameters the method was designed for.2,50

Example aims of studies that examine the robustness of a statistical method:

  • To understand where a well-used method breaks down, and how robust the method is to violation of the underlying parameter assumptions.2,50

Example aims of studies that compare multiple statistical methods:

  • To compare the relative performance of multiple methods in the application of interest, either to demonstrate the performance improvements or other advantages of new methods (superiority), or to systematically compare existing methods (neutral comparison).30

  • To compare the relative performance of multiple methods in broad, practically relevant settings, either to demonstrate the performance improvements or other advantages of new methods (superiority); or to systematically compare existing methods (neutral comparison).2,13,19,50

Application
Example aims of studies that apply simulation methods:

  • To calculate the sample size or power provided by a study design.2

  • To construct empirical estimations of sampling distributions, e.g., bootstrapped confidence intervals.

  • To provide instructional tools to help with understanding statistical concepts.3

How is risk of bias defined? | We will seek agreement on the definition of the risk of bias.
There are at least two options of what could be assessed for risk of bias in a simulation study. Firstly, risk of bias of the individual results of performance metrics; that is, deviation in the numerical quantities used to estimate the performance of the methods under evaluation, from the results that would have been reached in a study with no flaws in the design, conduct or analysis. Secondly, risk of bias of the overall conclusions that the simulation study authors draw about the performance of the methods.
Will the tool consider only risk of bias of the results (internal validity) or will it also be concerned with assessing applicability (external validity) and possibly reporting quality and missing results? | We will seek agreement on whether the tool should consider only risk of bias in the simulation study results, or in addition, assess broader issues of unfair representation of the performance of the methods.
Consideration of unfair representation of the methods will include an assessment of the potential for design choices to misrepresent methods, in addition to bias in the simulation results (i.e., internal validity).
Examples of practices that may introduce bias into the simulation results or misrepresent the methods (presented in italics) are provided below:

  • 1. Neutrality of the authors – When the expertise, experience and preference of the authors vary across the methods under consideration, bias may arise because one (or more) methods may be configured, coded and evaluated differently compared with the other methods under consideration, and not perform as it would in a fair comparison.

  • 2. Blinding of the authors – When the same researchers configure, code, debug and evaluate the methods under consideration, the authors may be more motivated to investigate unexpected (or undesirable) performance in a preferred method, leading to bias in the estimates of the performance.

  • 3. Selection of the statistical methods - When the methods under consideration in a study are not representative of the methods that are used in practice, for a given application and research question, the results may not be generalisable (or have external validity). The conclusions reached about the superiority of a new method relative to existing methods may not be replicated in subsequent studies.

  • 4. Data generating mechanism - When simulation scenarios are not representative of the intended application, or chosen to favour a preferred method, the results may not be generalisable (or have external validity).

  • 5. Seed setting and random number generation – When the random number generation and seed setting procedure do not ensure appropriate independence between data sets, bias may arise in the metrics used to quantify performance of the methods under consideration.

  • 6. Parameter tuning and software version – When the statistical methods under consideration require input parameters from the researcher, and these parameters are selected to fit the simulation scenarios in only the preferred method, or when different software versions are used for each method, bias may arise in the metrics used to quantify performance of the methods under consideration.

  • 7. Handling of missing values – The treatment of data sets that fail to produce an outcome when analysed by the methods under consideration, e.g., due to non-convergence of the methods, may lead to bias in the metrics used to quantify performance of the methods.

  • 8. Performance metrics – When the performance metrics are chosen to favour a preferred method, or a limited number of performance metrics are used, the results may not be generalisable (or have external validity).

  • 9. Selective reporting - When there is selective reporting of, e.g., simulation scenarios, comparison statistical methods or performance metrics that favour a preferred method, then bias may arise in the metrics used to quantify the performance of the methods under consideration.

  • 10. Alteration of the study design after seeing the results – When aspects of the study design, e.g., data-generating mechanism, comparison statistical methods, performance metrics, are changed after seeing the results of the performance metrics, bias may arise in the metrics used to quantify performance of the methods under consideration.

Who is the target audience? | We have decided to develop a tool for researchers who use the findings of statistical simulation studies to guide their statistical decision making (e.g., consulting statisticians), and researchers undertaking evidence syntheses of statistical simulation studies.
What type of tool structure will be adopted, e.g., simple checklist design or a domain-based approach? | We have decided to develop a domain-based tool with signalling questions, as used in, for example, RoB 223 and ROBIS.25
How will quality items be rated within the tool? | We will seek agreement on the response options for the signalling questions and the domains.
As a starting point, we will consider using the response options used for other risk of bias tools; that is high/low/some concerns for the domain, and yes/no/unclear or yes/probably yes/probably no/no/no information for the signalling questions.
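Practice 5 in Table 3 (seed setting and random number generation) can be illustrated with a short sketch. It contrasts one possible pitfall, re-seeding the generator inside the repetition loop, with seeding a single generator once. The function names and values are hypothetical, and this is only one of several ways in which independence between data sets could be compromised.

```python
import random


def generate_reseeding(n_sets, n_obs, seed):
    """Pitfall: re-seeding inside the loop reproduces the same data set each time."""
    data_sets = []
    for _ in range(n_sets):
        rng = random.Random(seed)  # same seed on every repetition
        data_sets.append([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
    return data_sets


def generate_single_stream(n_sets, n_obs, seed):
    """Seed once, then draw every data set from the same continuing stream."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_sets)]


bad = generate_reseeding(n_sets=3, n_obs=5, seed=1)
good = generate_single_stream(n_sets=3, n_obs=5, seed=1)

print("re-seeded data sets identical:", bad[0] == bad[1] == bad[2])
print("single-stream data sets identical:", good[0] == good[1])
```

With the re-seeded version, every repetition analyses the same data, so the Monte Carlo variability of any performance metric is understated even though the code runs without error. A signalling question on this practice might therefore ask whether the seeding procedure is reported in enough detail to rule this out.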

3. Develop the evidence base to inform domains and signalling questions

We will identify items to inform the content (e.g. domains and signalling questions) for the proposed tool by undertaking a systematic review. We will obtain evidence from four article types: statistical simulation studies (type 1); protocols for statistical simulation studies (type 2); articles that provide guidance, tutorial, commentary or evidence for unfair representation of the methods in statistical simulations (type 3); and systematic reviews that include statistical simulation studies (type 4). Details of the eligibility criteria for each article type are available in Appendix Table A1.1

3.1. Search strategy

The Ovid MEDLINE search strategies have been iteratively developed with the assistance of an experienced information specialist (SM). We have designed a base search strategy that is highly sensitive for statistical simulation studies (see Appendix Table A2).1 This base search strategy was developed and tested using a set of 32 articles; 8 articles were obtained from a recent study examining the replicability of highly cited statistical simulation studies,38 6 articles from statistical simulation studies published in Biometrical Journal as part of the collection “Neutral comparison studies in methodological research”,39 and 20 articles from a convenience sample of articles identified by early iterations of the search strategy. Because the search strategy is expected to return an infeasible number of articles to screen, we will combine the base strategy with more focussed search terms for article types 1, 2 and 4. For article type 3, we will identify articles by conducting a cited reference search (forward and backward citations) of a set of articles known to the authors. Descriptions and rationale for the search strategies, and the search syntax (where applicable), are available in Appendix Table A2.1

3.2. Selection of studies

Citations identified from the search will be imported to Microsoft Excel (Microsoft Office LTSC Professional Plus 2021). One author (SA) will screen all abstracts against the eligibility criteria and classify them as eligible, ineligible or potentially eligible. Full-text articles will be retrieved for all abstracts classified as eligible or potentially eligible. Where eligibility is unclear, the article will be reviewed by the core group. For type 1 (statistical simulation) articles, and for reasons of feasibility, we will randomly select and screen abstracts and full-text articles until we identify 50 eligible studies. Eligible articles will be imported to EndNote X8 (Clarivate Analytics, Philadelphia) to remove duplicates.
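One way to make the random selection of type 1 articles reproducible is to fix a seed, shuffle the retrieved records once, and screen them in that order until the target is reached. The sketch below assumes this approach; the protocol does not specify the procedure, and the function name `screen_until_target`, the record structure and the stand-in eligibility check are all hypothetical.

```python
import random


def screen_until_target(records, is_eligible, target=50, seed=2026):
    """Screen records in a seeded random order until `target` are eligible.

    Returns the eligible records and the number of records screened.
    """
    order = list(records)
    random.Random(seed).shuffle(order)  # fixed seed keeps the order reproducible
    eligible = []
    screened = 0
    for record in order:
        screened += 1
        if is_eligible(record):
            eligible.append(record)
            if len(eligible) == target:
                break
    return eligible, screened


# Hypothetical records; in practice eligibility is a human screening decision.
records = [{"id": i, "eligible": i % 4 == 0} for i in range(1000)]
eligible, screened = screen_until_target(records, lambda r: r["eligible"])
print(len(eligible), "eligible after screening", screened, "records")
```

Recording the seed and the number of records screened would allow the proportion of eligible studies among those screened to be reported alongside the sample.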

3.3. Data extraction and management

Data will be extracted from eligible articles using a data collection form developed in Research Electronic Data Capture (REDCap) online designer.40,41 The core working group will pilot the data extraction form for article types 3 and 4. For article type 3, the core working group will independently extract data from the same set of articles because we anticipate subjectivity in identifying and categorising data for extraction. For article type 4, two authors, SA and another (EK, SLT, MJP or JEM), will independently extract data from a set of articles. SA will identify discrepancies from the piloting and present these at meetings for discussion. The data extraction form and guidance will be refined through this process. No piloting will be undertaken for article types 1 and 2 because the guidance developed for article types 3 and 4 will also apply to them. Data from the remaining articles will be extracted by one author (SA), with any uncertainties discussed with the core working group.

We will extract data on study characteristics (Appendix Table A3).1 For all article types, we will identify whether the authors: discuss a potential source of bias in the simulation study results; state or discuss potential flaws in the design of the study; state or discuss a design, conduct or reporting practice that signals, mitigates or allows the assessment of flaws in the design of the study; or provide empirical evidence for bias in statistical simulation studies. For each instance, we will extract the quote, write a sentence summarising the concept (which we call an item), categorise the item according to the reason for inclusion, and select the practices that could lead to the type of bias identified in the item. For example, we extract the quote: “In case of simulated data, organize a fair comparison in terms of the relation between the methods under study and the data-generating mechanisms of the simulations, with fair meaning that one should not exclusively rely on mechanisms that unilaterally favour methods which explicitly or implicitly assume that these mechanisms are in place”; summarise the concept: “The data-generating mechanism should produce data that allows a fair assessment of the performance of the methods under investigation”; categorise the quote: “Discuss a potential source of bias in the results of the simulation study”; and select the practice leading to this bias as: “Data-generating mechanism”. This latter selection is important for informing potential signalling questions.

3.4. Code and categorise items into domains and signalling questions for potential inclusion in the tool

One author (SA) will: (1) group the items by the types of practices that could lead to bias in the simulation study results (items may be grouped under multiple practices); (2) synthesise and reword similar items under each practice to create a unique set of items; (3) categorise the practices into broad domains of bias (e.g., selective outcome reporting), generalisability (e.g., choice of simulation scenarios), and an unclear category for further discussion. We will provide a definition for each domain, provide a rationale for including the domain, and give example signalling questions. This initial draft set of domains will then be distributed to the advisory group for review. Analyses of the extracted data will be undertaken in Stata (version 18.0)42 and Microsoft Excel (Microsoft Office LTSC Professional Plus 2021).

4. Hold meeting(s) to identify domains for inclusion in the tool

Following distribution of the initial set of domains, we will hold virtual consensus meeting(s) with the advisory group to discuss which domains should be retained, which require modification, and whether there are any missing domains. We will also seek feedback on the definitions of the domains. The domains and their definitions will be revised in response to this feedback and used as the basis for the subsequent survey of interest holders.

5. Conduct a survey to elicit views on proposed domains

We will undertake a survey of a broader group of interest holders to seek wider input on the domains to include in the tool. The interest holders will be selected to ensure representation from researchers with statistical simulation experience, methodologists and statisticians with expertise in statistical simulation design, and users of the research. Potential participants will be identified by members of the core and advisory groups. In addition, we will advertise the survey via relevant mailing lists (e.g., Statistical Society of Australia).

The core group will draft the survey and pilot test it with members of the advisory group. The survey will be created and distributed using Qualtrics online survey software (Qualtrics, Provo, Utah, USA. https://www.qualtrics.com). Participants will be presented with the domains and their definitions, and asked to rate the importance of each domain. They will be given the opportunity to comment on the domains and definitions, and to indicate whether any domains are missing. We will also collect brief demographic information and ask whether participants would be willing to pilot test the tool.

We will calculate summary statistics to quantify participants’ views of the importance of the domains. Responses to the open-ended question will be summarised, keeping all unique ideas (regardless of the frequency with which they were made).
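A minimal sketch of how such summary statistics might be computed is given below. The protocol does not specify the rating scale or the statistics to be reported, so the 1 to 5 scale, the threshold of 4 for "important", and the function name `summarise_ratings` are all assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List


def summarise_ratings(
    ratings_by_domain: Dict[str, List[int]],
    important_threshold: int = 4,  # assumed cut-off on an assumed 1-5 scale
) -> Dict[str, Dict[str, float]]:
    """Return the number of ratings, the median, and the percentage at or above the threshold."""
    summary: Dict[str, Dict[str, float]] = {}
    for domain, ratings in ratings_by_domain.items():
        if not ratings:
            continue  # skip domains that received no ratings
        n = len(ratings)
        n_important = sum(1 for r in ratings if r >= important_threshold)
        summary[domain] = {
            "n": n,
            "median": median(ratings),
            "pct_important": 100 * n_important / n,
        }
    return summary


if __name__ == "__main__":
    # Made-up ratings for a hypothetical domain label; not survey data.
    print(summarise_ratings({"Hypothetical domain": [5, 4, 3, 2]}))
```

Reporting the median alongside the percentage above a threshold is one common choice for ordinal ratings; other summaries (e.g., full frequency distributions) would fit the same structure.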

Ethics approval for the survey will be sought from the Monash University Human Research Ethics Committee.

6. Conduct consensus meetings with advisory group to agree domains

We will hold virtual consensus meeting(s) to discuss the survey results with the advisory group. Each domain will be introduced by a core group member, together with the summary statistics of the importance rating and any major comments. We will focus our discussion on domains that the survey participants have rated as not important, and on any additional domains suggested by the participants. The core group will revise the domains in response to these meetings.

7. Draft the tool and guidance

The core group will prepare a list of potential signalling questions, elaborations and response options for each domain. The wording for the signalling questions will be informed by the unique items identified within each practice from the systematic review (step 3). The core group will seek feedback from the advisory group, revise the domains, signalling questions, and other content, and continue this process until major concerns are addressed.

8. Pilot and refine the tool

Two reviewers will pilot the draft tool on a random sample of the statistical simulation studies (article type 1) identified in the systematic review. The reviewers will be identified from among the survey respondents who indicated a willingness to pilot the tool. We will ask the reviewers to record any issues with interpreting or applying each signalling question, or with the accompanying elaborations. The core group will discuss the identified issues and refine the affected domains and signalling questions. In addition, we will invite members of the advisory group to provide feedback. The core group will discuss and further refine any problematic signalling questions before finalising the tool.
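One way to make the random sample reproducible is to draw it with a fixed seed, as in the sketch below. The protocol does not state the sample size, the seed, or the software used for sampling; the function name `draw_pilot_sample`, the identifiers, the sample size of 5 and the seed value are hypothetical.

```python
import random
from typing import Iterable, List


def draw_pilot_sample(study_ids: Iterable[str], sample_size: int, seed: int) -> List[str]:
    """Draw a reproducible simple random sample of study identifiers without replacement."""
    pool = sorted(set(study_ids))   # fixed order so the seed fully determines the draw
    if sample_size > len(pool):
        raise ValueError("sample_size exceeds the number of available studies")
    rng = random.Random(seed)       # local generator; leaves global random state untouched
    return sorted(rng.sample(pool, sample_size))


if __name__ == "__main__":
    # Hypothetical identifiers standing in for the included simulation studies.
    ids = [f"study_{i:02d}" for i in range(1, 45)]
    print(draw_pilot_sample(ids, sample_size=5, seed=2026))
```

Recording the seed alongside the list of sampled studies would allow the draw to be repeated exactly.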

9. Disseminate the developed tool and guidance documentation

A paper describing the tool will be published in an open-access format. The tool and guidance document will be made available on a website housing risk of bias tools (https://www.riskofbias.info/). The tool will be disseminated through presentations and workshops at relevant conferences, via social media, and in a series of international webinars.

Study status

As of 3 June 2026, we have run the searches and screened, identified and extracted data from 44 statistical simulation studies; 47 protocols for statistical simulation studies; 56 articles that provide guidance, tutorial, commentary or evidence for unfair representation of the methods in statistical simulations; and 42 systematic reviews that include statistical simulation studies. We have commenced grouping the extracted items by practices that could lead to bias and synthesizing similar items under each practice.

Discussion

We plan to develop a tool to assess the potential for bias in the results of statistical simulation studies and unfair representation of the performance of statistical methods in statistical simulation studies. While there are reporting and conduct guidelines available for statistical simulation studies [2,3,13,15,27–30], to our knowledge, this will be the first tool for systematically evaluating whether a statistical simulation study presents a fair representation of how the statistical methods under investigation are expected to perform. The developed tool is intended to assist researchers in determining which statistical methods to use in their statistical decision making, and to inform evidence syntheses of statistical simulation studies.

To ensure the tool is relevant and useful to the end-users of statistical simulation studies, the tool will be developed through co-design with end-users, who are part of the core working group and the international advisory group, and whose views will be sought through the survey and piloting processes. To ensure that a comprehensive list of possible domains is considered when developing the tool, we will first undertake a systematic review of the statistical simulation literature to identify domains. Recognising the challenge of identifying studies from this literature, we have involved an information specialist to develop the search strategies and have used cited reference searches (forward and backward citations) of key methodological articles. In addition, the identified list of domains will be supplemented by input from the international advisory group and survey respondents.

how to cite this article
Arnup SJ, Turner SL, Page MJ et al. Development of a tool to assess the risk of bias in statistical simulation studies: study protocol [version 1; peer review: awaiting peer review]. F1000Research 2026, 15:1038 (https://doi.org/10.12688/f1000research.184100.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.