Effect of intraoperative PEEP with recruitment maneuvers on the occurrence of postoperative pulmonary complications during general anesthesia: protocol for Bayesian analysis of three randomized clinical trials of intraoperative ventilation

Background: Using the frequentist approach, a recent meta-analysis of three randomized clinical trials in patients undergoing intraoperative ventilation during general anesthesia for major surgery failed to show a benefit of ventilation with high positive end-expiratory pressure and recruitment maneuvers compared with ventilation with low positive end-expiratory pressure without recruitment maneuvers. Methods: We designed a protocol for a Bayesian analysis using the pooled dataset. The multilevel Bayesian logistic model will use the individual patient data. Prior distributions will be prespecified to represent varying levels of skepticism about the effect estimate. The primary endpoint will be a composite of postoperative pulmonary complications (PPC) within the first seven postoperative days, which reflects the primary endpoint of the original studies. We preset a range of practical equivalence, defined as an odds ratio (OR) between 0.9 and 1.1, to assess the futility of the intervention, and will assess how much of the 95% highest density interval (HDI) falls within this range. Ethics and dissemination: The data derive from approved studies that were published in recent years. The findings of this analysis will be reported in a new manuscript, drafted by the writing committee on behalf of the three research groups. All investigators listed in the original trials will serve as collaborative authors.

Introduction
Mechanical ventilation under general anesthesia and neuromuscular blockade reduces lung volume, which can affect respiratory mechanics and gas exchange, 1 especially in specific clinical scenarios such as laparoscopic surgery with pneumoperitoneum insufflation. 2 This reduction in volume can lead to cyclic opening and closing of alveoli during mechanical ventilation, ultimately inducing tissue injury known as atelectrauma. 3 Minimizing atelectrauma by applying 'lung protective ventilation', which reopens the closed alveoli with a recruitment maneuver (RM) while sustaining their permeability with higher positive end-expiratory pressure (PEEP), is likely associated with a reduction in postoperative pulmonary complications. 4 The optimization of intraoperative ventilation and its potential beneficial effects on clinically relevant postoperative outcome measures are of particular importance due to the large number of surgical operations performed worldwide each year. 5 Using the frequentist statistical approach, a recent meta-analysis of three randomized clinical trials (RCTs) in patients undergoing intraoperative ventilation during general anesthesia for major surgery failed to show a benefit of ventilation with high PEEP and RM compared with ventilation with low PEEP without RM. 6 To enhance comprehension of study data, Bayesian methodology is beginning to be used in anesthesiology and critical care studies with uncertain frequentist outcomes. 7-10 In addition, applying a Bayesian framework in meta-analysis allows heterogeneity to be modeled directly and pooled effects to be estimated more precisely, especially when the number of studies included in the analysis is small. 11 Furthermore, Bayesian analysis can produce a full posterior distribution for both the effect estimate and heterogeneity and makes it possible to test tailored hypotheses, for instance, whether the estimate is smaller or larger than a specified threshold of interest. 12,13 We hypothesize that the posterior probability distribution of the intervention effect will lie outside a predefined region of practical equivalence. To test this hypothesis, we will reanalyze the pooled individual patient dataset of the three largest RCTs of intraoperative ventilation comparing ventilation with high PEEP with RM with ventilation with low PEEP without RM using a Bayesian framework according to previously published recommendations. 7,14 The protocol for the Bayesian statistical approach is presented in this paper. The posterior probability for the effect of the intervention on postoperative pulmonary complications will be assessed to better understand the potential benefit or harm of the tested intervention. This could provide a more interpretable probabilistic framework and allows preexisting knowledge to be included in the analysis.

Study type
We will perform a Bayesian analysis of the previously combined dataset named 'Re-evaluation of the  17 and the 'Effect of intraoperative high positive end-expiratory pressure with recruitment maneuvers vs low PEEP on postoperative pulmonary complications in obese patients' (PROBESE) study (registered with ClinicalTrials.gov: NCT02148692). 18 The study protocols of the original studies were approved by the respective institutional review boards and were published before the start of patient enrolment. 19-21 Written informed consent was obtained from all participating individuals before enrolment, and the rules of good clinical practices were followed. All analyses were performed with R version 4.0.1 (R Core Team).
Interventions from the original trials
PROVHILO was an international multicenter study comparing intraoperative ventilation with 12 cm H2O PEEP with RM to intraoperative ventilation with 0-2 cm H2O PEEP without RM in non-obese patients scheduled for major abdominal surgery. 16 iPROVE was a national multicenter study comparing two intraoperative ventilation strategies and two postoperative ventilatory support strategies in non-obese patients scheduled for major abdominal surgery. 17

REVISED Amendments from Version 1
This new version provides a clearer explanation of the strength of the prior beliefs and of the suggested heterogeneity distribution (half-Cauchy) for the sensitivity analysis.
Any further responses from the reviewers can be found at the end of the article.

Data management
The REPEAT database is harmonized, protected, and does not contain any patient-identifying information. Data are stored at Hospital Israelita Albert Einstein, São Paulo, Brazil. The data harmonization process is described in full detail elsewhere. 6,15 Briefly, the low PEEP group was defined as PEEP ≤5 cm H2O, and data from iPROVE were used according to the intraoperative ventilation strategy because no significant interaction with the postoperative intervention was found. 15

Outcomes
The single primary outcome of this current analysis is a collapsed composite of postoperative pulmonary complications (PPCs) developed during the first seven postoperative days as collected in the REPEAT database. The definitions used in the study are reported in Table 1.

Power calculation
For this unplanned analysis, we will use all the available data without any a priori power calculations.

Analysis plan
We will carry out a one-stage approach meta-analysis by fitting a multilevel Bayesian logistic model including the effect of the PEEP and RM strategy as population (fixed) effect and the study and site as a varying (random) effect modeling heterogeneity of effect across different studies.
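
As an illustration, one possible way to write such a model is sketched below. The notation (indices i, j, k and the symbols α, β, a_j, b_j, c_k, Σ, σ_a, σ_site, ρ) is introduced here for illustration only. In particular, the assumption that both the intercept and the treatment effect vary by study with a correlated prior is inferred from the planned correlation-matrix prior and is not stated explicitly in the protocol.

```latex
\begin{align*}
y_{ijk} &\sim \mathrm{Bernoulli}(p_{ijk}) \\
\operatorname{logit}(p_{ijk}) &= \alpha + a_j + c_k + (\beta + b_j)\, x_{ijk} \\
\begin{pmatrix} a_j \\ b_j \end{pmatrix} &\sim \mathcal{N}\!\left(\mathbf{0}, \Sigma\right), \qquad
\Sigma = \begin{pmatrix} \sigma_a^2 & \rho\,\sigma_a \tau \\ \rho\,\sigma_a \tau & \tau^2 \end{pmatrix} \\
c_k &\sim \mathcal{N}\!\left(0, \sigma_{\mathrm{site}}^2\right)
\end{align*}
```

Here y_ijk indicates whether patient i in study j at site k developed a PPC, x_ijk equals 1 for high PEEP with RM and 0 for low PEEP without RM, exp(β) is the pooled OR, and τ is the between-study standard deviation of the treatment effect on the log-OR scale.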

Prior definition
For defining priors, we will follow previously published recommendations on Bayesian reanalysis 7 and recommendations from studies focusing on Bayesian modelling and meta-analysis. 22,23 These guidelines recommend considering the full range of possible beliefs. Thus, we will use one skeptical, one pessimistic, and one optimistic prior for the effect of high PEEP with RM compared to low PEEP, i.e., a PEEP of 5 cm H2O or less, without RM on PPCs. In other words, we define three distributions to decide where to assign most of the probability mass. The pessimistic and optimistic prior distributions will be centred on the averaged estimate from the original studies, whereas the skeptical prior will be centred around the absence of effect, i.e., 0. The variance of the prior distributions is defined according to the clinical belief that a benefit of the intervention cannot be ruled out, although a large effect can likely be ruled out, and that harm cannot be excluded; we set the probability masses accordingly. Therefore, we will consider a moderate belief strength for both the optimistic and the neutral (i.e., skeptical) prior and a weak pessimistic prior. Following previous recommendations, 7 we will define priors as follows:
• The prior for the heterogeneity of the intervention across studies is defined assuming that at least some between-study variability is present; thus, it should always be more than 0. In the context of log-ORs there are established thresholds of heterogeneity according to the parameter τ that are defined as 'reasonable' (0.1 < τ < 0.5), 'fairly high' (0.5 < τ < 1.0), and 'fairly extreme' (τ > 1.0). 24 As recommended in a previously published paper, 23 we assume a half-Normal prior distribution for heterogeneity with a mean of 0 and an SD of 0.5; this yields prior probabilities of 52% in the reasonable, 27% in the fairly high, and 5% in the fairly extreme category, respectively (Figure 2);
• The prior for the correlation matrix of the varying effects will be based on the Lewandowski-Kurowicka-Joe (LKJ) distribution with an η parameter of 2. 22,25 (Figure 3).
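
The prior probabilities quoted for the half-Normal heterogeneity prior can be checked with a short calculation. The sketch below is only an illustration in Python (the protocol itself uses R); the function names are hypothetical, and it relies on the fact that the cumulative distribution of a half-Normal with location 0 and standard deviation s is erf(τ / (s√2)).

```python
import math


def half_normal_cdf(tau, sd):
    """CDF of a half-Normal distribution with location 0 and scale `sd`."""
    return math.erf(tau / (sd * math.sqrt(2.0)))


def heterogeneity_masses(sd=0.5):
    """Prior mass in each heterogeneity category for tau on the log-OR scale."""
    return {
        "reasonable": half_normal_cdf(0.5, sd) - half_normal_cdf(0.1, sd),
        "fairly_high": half_normal_cdf(1.0, sd) - half_normal_cdf(0.5, sd),
        "fairly_extreme": 1.0 - half_normal_cdf(1.0, sd),
    }


masses = heterogeneity_masses(sd=0.5)
for category, mass in masses.items():
    print(f"{category}: {mass:.0%}")
```

With an SD of 0.5 this reproduces the 52%, 27% and 5% stated above; the remaining mass (about 16%) lies below τ = 0.1.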

Setting the Range of Practical Equivalence (ROPE)
The ROPE is used to measure how much of the posterior probability distribution falls within a specific interval of equivalent effect. By assessing how much of the 95% highest density interval (HDI) falls within the ROPE, we can quantify the probability of the studied intervention having a benefit or harm. 26 We define the ROPE as the interval 0.9 < OR < 1.1. In addition, we define a threshold for severe harm at OR = 1.25. We will draw samples from the posterior distribution after fitting the models with each of the previously defined priors, determine how much of the probability mass lies in the ROPE or exceeds the threshold for severe harm, and determine the expected predicted posterior probabilities of treatment effect using the emmeans package. Furthermore, we will compare the ROPE with the 95% HDI, as previously recommended, to see whether the HDI probability mass falls outside the ROPE. 26 To illustrate this principle, we simulate data for a binary outcome and a binary independent variable with four different effect estimates and report the posterior distributions with 95% HDIs and ROPE, along with probability masses, after fitting logistic models with skeptical priors as previously stated and following a previously published approach 7 (Figure 4).
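
The quantities described above can be computed directly from posterior draws. The following is a minimal sketch in Python, not the planned R implementation: the function names are hypothetical, the posterior draws are simulated from a made-up normal distribution on the log-OR scale, and the HDI is computed on the log-OR scale (a choice made here, since the HDI is not invariant to transformation).

```python
import math
import random


def hdi(samples, mass=0.95):
    """Shortest interval that contains a fraction `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def summarize(log_or_draws, rope=(0.9, 1.1), severe_harm=1.25):
    """Posterior summaries for draws of the treatment effect (log-OR scale)."""
    n = len(log_or_draws)
    rope_lo, rope_hi = math.log(rope[0]), math.log(rope[1])
    lo, hi = hdi(log_or_draws, 0.95)
    inside_hdi = [d for d in log_or_draws if lo <= d <= hi]
    return {
        "p_benefit": sum(d < 0 for d in log_or_draws) / n,
        "p_harm": sum(d > 0 for d in log_or_draws) / n,
        "p_severe_harm": sum(d > math.log(severe_harm) for d in log_or_draws) / n,
        "p_rope": sum(rope_lo < d < rope_hi for d in log_or_draws) / n,
        "hdi95_or": (math.exp(lo), math.exp(hi)),
        # share of the 95% HDI draws that fall inside the ROPE
        "hdi_in_rope": sum(rope_lo < d < rope_hi for d in inside_hdi) / len(inside_hdi),
    }


# Made-up posterior for illustration only: centred on OR = 0.85, SD 0.08 (log scale)
random.seed(1)
draws = [random.gauss(math.log(0.85), 0.08) for _ in range(20000)]
summary = summarize(draws)
for name, value in summary.items():
    print(name, value)
```

In practice the draws would come from the fitted model rather than from a simulated normal distribution.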

Subgroup analysis
To determine if the relationship between treatment and the primary outcome differs between predetermined, clinically significant subgroups, we will fit the varying effect logistic regression model by adding an interaction term between treatment and subgroup and report the conditional effect of the interaction.
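
A sketch of what such an interaction model could look like for a binary subgroup is given below. The symbols (x_i for treatment, z_i for subgroup membership, γ, δ, and u_{s(i)} for the varying effect of the study or site of patient i) are illustrative assumptions and are not taken from the protocol.

```latex
\begin{align*}
\operatorname{logit}(p_i) &= \alpha + \beta\, x_i + \gamma\, z_i + \delta\, x_i z_i + u_{s(i)} \\
\mathrm{OR}_{z=0} &= \exp(\beta), \qquad \mathrm{OR}_{z=1} = \exp(\beta + \delta)
\end{align*}
```

Under this parameterization, the posterior distribution of δ describes how much the treatment effect differs between subgroups, and the two conditional ORs can be obtained from the same posterior draws.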
We will assess the following subgroups:
• Type of surgery, i.e., laparoscopic vs. open surgery;
• Risk for PPCs, i.e., Assess Respiratory Risk in Surgical Patients in Catalonia (ARISCAT) score < 45 vs. ≥ 45;
• Body mass index (BMI), used as a continuous variable;
• Type of PEEP selection, i.e., fixed vs. titrated; and
• Use of a postoperative element as part of the tested intervention.

Figure 4. Fitting logistic models to different simulated effects leads to different interpretations. For instance, there is a considerable difference between panel A, where the probability of harm is 63%, the probability that the estimate falls in the ROPE is 68%, and the 95% HDI crosses the 0 threshold, and panel D, where the probability of benefit is 100% and the 95% HDI and ROPE do not overlap.

Sensitivity analysis
We plan to perform the following sensitivity analyses:
• We will fit the same prespecified model but use only severe PPCs as the primary outcome, i.e., excluding mild respiratory failure;
• We will perform one analysis including only patients with lung collapse before the intervention; and
• We will refit the model after changing the heterogeneity prior for τ to a half-Cauchy distribution with location 0 and scale 1. 27
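
To give a sense of how the half-Cauchy prior differs from the half-Normal prior of the main analysis, the sketch below computes the prior mass that a half-Cauchy distribution with location 0 and scale 1 assigns to the same heterogeneity categories. It is an illustrative Python calculation with hypothetical function names, using the closed-form cumulative distribution (2/π)·arctan(τ / scale).

```python
import math


def half_cauchy_cdf(tau, scale=1.0):
    """CDF of a half-Cauchy distribution with location 0 and the given scale."""
    return (2.0 / math.pi) * math.atan(tau / scale)


def half_cauchy_masses(scale=1.0):
    """Prior mass in each heterogeneity category for tau on the log-OR scale."""
    return {
        "reasonable": half_cauchy_cdf(0.5, scale) - half_cauchy_cdf(0.1, scale),
        "fairly_high": half_cauchy_cdf(1.0, scale) - half_cauchy_cdf(0.5, scale),
        "fairly_extreme": 1.0 - half_cauchy_cdf(1.0, scale),
    }


cauchy_masses = half_cauchy_masses(scale=1.0)
for category, mass in cauchy_masses.items():
    print(f"{category}: {mass:.0%}")
```

Under these assumptions the half-Cauchy prior places about 23% of its mass in the reasonable, 20% in the fairly high, and 50% in the fairly extreme category, i.e., it is considerably more permissive of large heterogeneity than the half-Normal prior (52%, 27%, and 5%).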

Ethics and dissemination
The study will be performed according to national and international guidelines. All data derive from clinical trials approved by a competent institutional review board in each participating center according to the applicable legislation. The study Steering Committee will publish the study findings. The writing committee will submit the main manuscript on behalf of the research group. All investigators listed in the original trials will be listed as collaborators in an Appendix in alphabetical order, according to the centre's name. All efforts will be made to link all collaborators to the final publication in indexed databases.

Study status
We will carry out the analysis on the already locked database after publishing the protocol.

Discussion
Here we describe the protocol and statistical analysis plan for a Bayesian reanalysis of an individual patient data meta-analysis that aims to compare the effect of intraoperative high PEEP after RM vs. low PEEP without RM on the incidence of postoperative pulmonary complications in patients undergoing surgery with general anesthesia and low tidal volume mechanical ventilation. The optimization of mechanical ventilation during surgery is important since it can potentially improve clinically relevant postoperative outcomes, with beneficial effects on patients, families, and healthcare systems.
The proposed analysis has several strengths. Firstly, we can use the merged dataset of three well-performed multicenter RCTs that tested the effect of fairly comparable intraoperative ventilation strategies on a similar endpoint. Secondly, the Bayesian framework will provide probabilities of harm or benefit associated with the studied intervention, adding further insights to the frequentist interpretations used in the previous analyses of these three RCTs. 15 We scrupulously followed the published recommendations, especially concerning prior selection. 7 Thirdly, we prespecify subgroup analyses to assess the intervention effect in subpopulations of particular interest, such as patients who underwent laparoscopic surgery, and sensitivity analyses to test the robustness of our methodology, which is crucial in a Bayesian framework.
The current standard statistical paradigm for analyzing RCTs and performing meta-analyses is based on null-hypothesis testing and P-values and is referred to as the frequentist approach. P-values indicate how incompatible a data set is with a specified statistical model, but their correct interpretation can be counterintuitive and at times even problematic. For instance, a P-value is defined as the probability of obtaining a result equal to or more extreme than the one observed, assuming that the null hypothesis is true. This definition can hamper the correct interpretation of the results of different studies, particularly if P > 0.05. When a test returns P > 0.05, a study is often interpreted as negative, meaning that the intervention had no effect on the outcome of interest, while the rigorous interpretation should be that the available data were insufficient to reject the null hypothesis. 28,29 The results from the REPEAT analysis fall precisely in this category. The effect of high PEEP after RM compared to low PEEP without RM on postoperative pulmonary complications was not statistically significant, with a P-value of 0.06, thus yielding an indeterminate result. We plan to leverage the advantages of a Bayesian approach to move beyond the simplistic all-or-nothing interpretation of the study derived from null-hypothesis testing. Further, the analysis will attach probabilities to specific estimates, thus providing a more interpretable and intuitive explanation of the results.
Bayesian analysis in this context has proven helpful in gaining additional insights and has been increasingly used in recent years. For instance, Bayesian reanalyses of RCTs in the intensive care setting that tested interventions aimed at improving mortality and had indeterminate frequentist results 30,31 found that the posterior probability of mortality benefit, i.e., relative risk (RR) < 1 or OR < 1, ranged between 88% and 99% across a range of prespecified priors. 8,9 Another report used the same approach 7 to further investigate an indeterminate result in the opposite direction, i.e., possible harm, in an RCT from the same clinical setting, 32 and found that the probability of harm of the intervention was > 90%. Bayesian analysis has also been used to elucidate the effect of interventions in specific subgroups, 10 to evaluate the effect of selection bias, 33 and in meta-analysis. 34

This analysis has several limitations that need to be addressed. First, some differences between studies concerning how PEEP and RM were used and titrated cannot be unraveled, although we will use a previously harmonized individual patient database. Secondly, we analyze the effect of a broad category, i.e., high PEEP and RM vs. low PEEP and no RM. Defining the optimal PEEP and RM strategy is beyond the scope of the current analysis and must be elucidated in further investigations. Thirdly, the original RCTs did not exclude patients without lung collapse; therefore, a selection bias towards less benefit of the intervention cannot be excluded. Fourthly, although previously published recommendations 7 were rigorously followed, there is no unequivocal way to choose a universally correct prior probability distribution. Moreover, although we included all data available from three of the largest RCTs assessing the effect of an open lung strategy in the perioperative period, a certain degree of precision bias cannot be excluded should data from other studies be incorporated, although a change in the overall conclusions is unlikely.
In conclusion, we will use a Bayesian methodology to better interpret data from three large RCTs investigating the potential beneficial role of high PEEP after RM compared to low PEEP without RM during intraoperative low tidal volume mechanical ventilation to prevent postoperative pulmonary complications and improve clinically relevant outcome measures. Bayesian analysis can be a helpful tool to augment the interpretation of anesthesiology and critical care trials.

Open Peer Review
○ complication stages in the analysis or expanding their sensitivity analysis in this regard (they were already planning on excluding "minor" complications for sensitivity analysis). On the other hand, subgrouping may limit statistical power further and lead to model convergence issues.
Thank you for the interesting remark. Indeed, we used a database in which the primary outcome definitions were previously harmonised to join the data from the different RCTs. New aspiration pneumonitis, pulmonary infiltrates and cardiogenic pulmonary oedema were not included in the definition of PPC because they were not recorded in the iPROVE trial. Also, moderate respiratory failure, as defined only in the PROBESE trial, was merged into the definition of severe respiratory failure to achieve concordance with the other trials. Still, there is some slight mismatch in some of the definitions, as the reviewer points out and as we already acknowledge in the limitations section of the protocol and in the previous studies. We indeed planned a sensitivity analysis excluding mild respiratory failure from the outcome to test the robustness of our findings. We purposely tried to design a simple protocol and not overburden it with too many secondary analyses because we feel that this would drive the focus away from the main question and could possibly complicate the model estimation, as the reviewer correctly pointed out.

2. As for the prior definition, this important topic is explained at length and is planned according to a comprehensive protocol (Zampieri et al, 2021) and technical description (Röver et al, 2021). The reviewer appreciates the nuanced approach. However, I do believe that some more information / explanation could be conveyed regarding the choice of relative prior weighting.
We agree that this is a key concept. We, therefore, reworked the paragraph on the prior to be clearer and more explicit. The updated version of the manuscript now reads as follows: "For defining priors, we will follow previously published recommendations on Bayesian reanalysis 7 and recommendations from studies focusing on Bayesian modelling and meta-analysis. 22,23 The guidelines recommend to consider the full range of possible beliefs. Thus, we will use one skeptical, one pessimistic, and one optimistic prior for the effect of PEEP and RM compared to low PEEP, i.e., a PEEP of 5 cmH2O or less, without RM on PPCs. In other words, we define three distributions to decide where to assign most of the probability mass. The pessimistic and optimistic prior distributions will be centred, on the averaged estimate from the original, whereas the skeptical prior will be centred around the absence of effect, i.e. 0. The variance of the prior distributions is defined according to the clinical belief that we cannot rule out a benefit for the intervention, although we can likely rule out a large effect and we cannot exclude a probability of harm and we set the probability masses accordingly. Therefore, we will consider a moderate belief strength for both the optimistic and neutral prior and a weak pessimistic prior. Following previous recommendations, 7 we will define priors as follows: …"
The mentioned study is already referenced in the manuscript; it is reference #15. If the reviewer is referencing another study, please let us know, and we will add it to the manuscript.
4. The authors will use a half-normal distribution for the heterogeneity prior. Could they briefly explain the rationale for performing a sensitivity analysis with an I squared distribution (instead of, e.g., half-Cauchy)?
We agree. The I² distribution is used mainly for modelling heterogeneity in conventional Bayesian meta-analysis models with aggregate data, as explained at length in Röver et al. Res. Synth. Methods. 2021 Jul 1;12(4): 448-474. Following the reviewer's suggestion, we changed the distribution of the heterogeneity prior for the sensitivity analysis to a half-Cauchy prior. We changed the manuscript accordingly, which now reads: "We will fit the model by varying the heterogeneity prior τ to a half-Cauchy distribution with location 0 and scale 1."
5. "Previous studies suggested a benefit of PEEP and RM, but many studies were neutral or indeterminate when using a frequentist approach. Therefore, we will consider a moderate belief strength for both the optimistic and neutral prior and a weak pessimistic prior. Following previous recommendations,7 we will define priors as follows…." What exactly does the word "therefore" signify here? It seems sensible to align the means ESs to the preexisting data, the weighting choice for the optimistic vs pessimistic priors could be explained some more (especially with regard to possible pessimistic assumptions, i.e., higher complication incidence). I suggest the authors elaborate a bit more here.
Please see our reply to comment #2. We hope to have made the logic behind the prior definitions clearer.
© 2023 Gasteiger L. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Lukas Gasteiger
Department of Anesthesiology and Intensive Care, Universitat Innsbruck, Innsbruck, Tyrol, Austria
This is a very well designed protocol for a Bayesian analysis of three recent RCTs to better understand the effect of intraoperative PEEP on PPCs. To date, no RCT has been able to clearly show a benefit of a higher PEEP with RM compared to lower PEEP levels without RM. Therefore, the question of whether such an intervention is beneficial to our patients is of high clinical relevance. The choice of a Bayesian protocol seems appropriate and has the added advantage that no new trial and no further inclusion of patients are needed.
Furthermore, assessing the effect of this intervention is of central interest, as it is known that a higher level of PEEP may also interact with right ventricular function and may therefore lead to circulatory side effects such as hypotension. An increased need for vasopressor therapy may also be associated.
Is it planned to also assess circulatory side effects as a secondary outcome?
Is the rationale for, and objectives of, the study clearly described? Yes

Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes