Predictive physiological anticipatory activity preceding seemingly unpredictable stimuli: An update of Mossbridge et al’s meta-analysis

Background: This is an update of Mossbridge et al’s meta-analysis of the physiological anticipation preceding seemingly unpredictable stimuli, whose overall effect size was 0.21; 95% Confidence Intervals: 0.13-0.29. Methods: Nineteen new peer and non-peer reviewed studies completed from January 2008 to June 2018 were retrieved, describing a total of 27 experiments and 36 associated effect sizes. Results: The overall weighted effect size, estimated with a frequentist multilevel random model, was: 0.28; 95% Confidence Intervals: 0.18-0.38; the overall weighted effect size, estimated with a multilevel Bayesian model, was: 0.28; 95% Credible Intervals: 0.18-0.38. The weighted mean estimate of the effect size of peer reviewed studies was higher than that of non-peer reviewed studies, but with overlapping confidence intervals: Peer reviewed: 0.36; 95% Confidence Intervals: 0.26-0.47; Non-Peer reviewed: 0.22; 95% Confidence Intervals: 0.05-0.39. Similarly, the weighted mean estimate of the effect size of Preregistered studies was higher than that of Non-Preregistered studies: Preregistered: 0.31; 95% Confidence Intervals: 0.18-0.45; Non-Preregistered: 0.24; 95% Confidence Intervals: 0.08-0.41. The statistical estimation of publication bias using the Copas selection model suggests that the main findings are not contaminated by publication bias. Conclusions: In summary, this update confirms the main findings reported in Mossbridge et al’s meta-analysis.


Introduction
The human ability to predict future events has been crucial to our evolutionary development and proliferation over epochs of time, both at the level of the species and at the level of the individual. Our day-to-day survival is predicated on a successful marriage of experience (e.g., memory) and sensory processing (e.g., perceptual cues); for example, on a very humid, heavily overcast night, our perceptions and memories inform us that a thunderstorm is possible and that it might be intelligent to find shelter. Such behaviour is highly adaptive as it fosters survival-based strategies and is perfectly explicable in terms of current theories of biological causality. Now imagine if such prognosticating ability were possible without any sensory or other inferential cues (see Mossbridge & Radin, 2018 for a review). Such a seemingly inexplicable ability would definitely hold a survival advantage, if it existed. For millennia people have been reporting strange feelings of foreboding that later transpired to have significance. Over the last 36 years these phenomena have been scrutinized in the laboratory, where a subject's physiology is monitored before a randomly presented stimulus that is designed to evoke a significant post-stimulus response. Disturbingly, physiological changes are observed moments before the stimulus is presented. This effect is termed presentiment or, more recently, Predictive Anticipatory Activity (Mossbridge et al., 2014). By 2012 a good number of these studies had been completed and it was deemed worthwhile to conduct a meta-analysis of the literature extant at the time. Mossbridge, Tressoldi and Utts located 42 studies published from 1978 to 2010 testing the presentiment hypothesis, of which 26 enabled a true comparison between pre- and post-stimulus epochs (Mossbridge et al., 2012); that is, the pre-stimulus physiological responses mirrored, even if to a lesser degree, the post-stimulus responses.
Here two paradigms were used: either a randomly ordered presentation of arousing vs. neutral stimuli, or guessing tasks in which the stimulus is the feedback about the participant's guess (correct vs. incorrect). In both of these approaches it is difficult to envision mundane strategies that might explain the anomalous pre-stimulus effects observed, and indeed Mossbridge et al. went to significant lengths to refute the leading candidate, expectancy effects, both in the 2012 meta-analysis and in post-review exchanges with sceptical psychologists and physiologists. Regardless of the paradigm, a broad range of physiological measures were employed, including skin conductance, heart rate, blood volume, respiration, electroencephalographic (EEG) activity, pupil dilation, blink rate, and/or blood oxygenation level dependent (BOLD) responses. These are recorded throughout the session, with a pre-determined anticipatory period of between 4 and 10 seconds in which any pre-stimulus effect is captured. The presentiment hypothesis calls for a difference between the pre-stimulus responses of the two stimulus categories, and this is calculated across sessions. Mossbridge et al. found substantive evidence in favour of a presentiment effect, with a combined result of over 6 sigma, i.e. extreme statistical significance. Additionally, they found evidence of presentiment effects in mainstream research programs (Bierman, 2000), something that is becoming increasingly important as these effects become more widely known.
Because of the high-profile nature of Mossbridge et al. (over 93,000 views as of January 2018), there have been a good number of replications in the few years since publication. We located an additional 26 studies describing 34 effect sizes from a dozen laboratories. The most striking aspect of this fresh database is the sheer variation in experimental approaches, as researchers seek to tackle more process-oriented questions rather than continuing the proof-oriented work found in the earlier meta-analysis. Because expectancy effects have been proposed as a potential mechanism to explain at least some of the presentiment effect, it is noteworthy that several experiments in this fresh cohort of studies tackle this head on by analysing only the first trial of a run. These single-trial presentiment studies are expectancy free and are becoming more dominant in this research domain. Another interesting question probed in these new studies is the idea of utilizing pre-stimulus physiological activity to predict future events. This provides another objective measure of the validity of the presentiment effect. Several studies utilize this approach and they are discussed later on. Also of note, we found several PhD theses describing presentiment research and a greater geographical spread than in 2012, both evidence of the increasing attention such research is garnering. Lastly, we found increasing dialogue between presentiment researchers and physicists interested in retrocausality, the idea that effects can precede their cause. This is witnessed in the recent AAAS retrocausality symposium, in which several researchers participated and from which some papers made their way into this meta-analysis (Sheehan, 2017).

Methods
The whole procedure followed the APA Meta-Analysis Reporting Standards (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008), the Preferred Reporting Items for Systematic reviews and Meta-Analyses for Protocols (PRISMA) 2015 (Moher et al., 2015) and the reporting standards for literature searches and report inclusion (Atkinson et al., 2015). A completed PRISMA checklist can be found in Supplementary File 1.

Study eligibility criteria
Study inclusion criteria were the analysis of psychophysiological or neurophysiological signals before the random presentation of any type of stimulus, e.g. pictures, sounds

Updates from Version 1
• We updated the database covering the period January 2008 to June 2018
• We added a comparison between preregistered and non-preregistered studies and have added a new Table 4
• We updated Figure 1 and Figure 2 to take into account the new study included in the database

Study selection
Study selection is illustrated in the flow-diagram presented in Figure 1. Excluded records were studies where the psychophysiological variables were analysed only after and not before the stimuli presentations (Jin et al., 2013), and a study with an unusual procedure (Tressoldi et al., 2015), i.e. using heart rate feedback to inform a voluntary decision to predict random positive or negative events.
Records excluded after the screening were studies whose authors did not agree to share their data for different reasons (Baumgart et al., 2017; Modestino et al., 2011). In most cases the excluded studies revealed either statistically significant or trending evidence in support of the anticipation effect, thus reducing concerns about biased removal.
The references of the included studies are reported in Supplementary File 2.

Coding procedure
The two co-authors agreed on the following coding variables and independently extracted them from the eligible studies: Authors; year of publication; participant selection: yes = selected according to specific criteria; no = selected without specific criteria; number of participants; number of trials; stimuli type; type of randomisation: pseudo or true random; psychophysiological signals, e.g. EEG, Heart Rate, etc.; anticipatory period; type of statistics; value of statistics. After comparing their codings, they discussed how to resolve the inter-coder differences.
In the database we have added a note for each effect size, describing where in the original papers we extracted the corresponding statistics. The database along with all 19 papers is available from Tressoldi (2017). A summary of the selected studies along with their corresponding effect sizes, variance and standard error is reported in Table S1 in Supplementary File 3.

Moderator variables
Apart from the overall effect, we chose to examine one moderator variable, peer review (PeerRev, yes vs no), as a control of study quality. Given the low number of studies, no further moderator analyses were carried out.

Statistical methods
The standardized effect size d of each dependent variable was estimated from the descriptive statistics (means, standard deviations and number of participants) when available. In all other cases, it was estimated from the available summary statistics, e.g. paired t-test, Stouffer's Z, etc., by using Lakens' software (Lakens, 2013) and the function escalc() of the R package metafor (Viechtbauer, 2017).
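As a rough illustration of the conversions mentioned above, the following minimal sketch uses textbook formulas for within-subject designs. The function names are hypothetical, and the formulas are assumed stand-ins: they are not necessarily what Lakens' software or escalc() applied to each individual study.

```python
import math


def d_from_descriptives(mean_diff: float, sd_diff: float) -> float:
    """Standardized mean difference from the mean and SD of paired differences."""
    return mean_diff / sd_diff


def d_from_paired_t(t: float, n: int) -> float:
    """Within-subject effect size from a paired t statistic: d_z = t / sqrt(n)."""
    return t / math.sqrt(n)


def d_from_z(z: float, n: int) -> float:
    """Effect size from a z score (e.g. a Stouffer's Z) over n participants."""
    return z / math.sqrt(n)


# Hypothetical example: a paired t of 2.5 observed on 25 participants.
example_d = d_from_paired_t(2.5, 25)
```

Which conversion is appropriate depends on the design of each study; the choice above assumes a single group measured under two conditions.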
All effect sizes were then converted into Hedges' g and the corresponding variance by using the formulae suggested by … In order to control the reliability of the results, a second analysis was carried out by using a multilevel approach as suggested by Assink & Wibbelink (2016).
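The conversion to Hedges' g can be sketched as follows. Because the source of the formulae is not recoverable from the text, this sketch assumes the widely used small-sample correction factor J = 1 - 3 / (4 df - 1) and a common large-sample approximation for the variance of a within-subject d; both are assumptions, not a statement of what the authors computed.

```python
def hedges_g(d: float, n: int) -> tuple[float, float]:
    """Return (g, var_g) for a within-subject effect size d based on n participants.

    Assumptions (not taken from the paper):
      - correction factor J = 1 - 3 / (4 * df - 1) with df = n - 1
      - approximate variance of d: 1 / n + d**2 / (2 * n)
    """
    if n < 3:
        raise ValueError("need at least 3 participants")
    df = n - 1
    j = 1.0 - 3.0 / (4.0 * df - 1.0)       # small-sample correction factor
    var_d = 1.0 / n + d ** 2 / (2.0 * n)   # approximate sampling variance of d
    return j * d, j ** 2 * var_d


# Hypothetical example: d = 0.5 observed on 25 participants.
g, var_g = hedges_g(0.5, 25)
```

The correction always shrinks the estimate slightly, and the shrinkage vanishes as the sample size grows.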

Frequentist multilevel random model
The forest plot is presented in Figure 2. The summary of the frequentist multilevel random model analysis is presented in Table 1 compared with the results obtained by Mossbridge et al., whereas the summary of the Bayesian multilevel random model meta-analysis is presented in Table 2.
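For readers unfamiliar with random-effects pooling, the sketch below shows the classic DerSimonian-Laird estimator together with Cochran's Q and I². It is a deliberately simplified stand-in: the analysis reported here used a multilevel model, which additionally accounts for several effect sizes coming from the same experiment, whereas this sketch treats every effect size as independent. The function name and the returned keys are hypothetical.

```python
import math


def random_effects_pool(effects, variances, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling of independent effect sizes."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "estimate": pooled,
        "ci": (pooled - z_crit * se, pooled + z_crit * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical effect sizes and variances, for illustration only.
summary = random_effects_pool([0.10, 0.35, 0.50], [0.02, 0.01, 0.03])
```

When the observed spread of effect sizes is no larger than sampling error alone would produce, tau2 and I² are both zero and the result coincides with the fixed-effect estimate.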
Sensitivity analysis of the overall effect size did not reveal any change from Rho 0 to Rho 1, suggesting that the degree of correlation among the dependent effect sizes does not affect its magnitude.
Another sensitivity analysis was carried out excluding the new Mossbridge and Tressoldi studies, in order to check whether different authors could obtain similar results. The main results of this analysis, using the same frequentist multilevel random model, are reported in Table 3.
Both the frequentist and the Bayesian analyses support the evidence of an overall main effect of approximately .28, and a small difference between the peer and non-peer reviewed studies. These findings will be discussed further in the comparison with Mossbridge et al.
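To illustrate why the assumed correlation among dependent effect sizes is worth varying, the sketch below computes the variance of the simple average of several effect sizes from one experiment under a common assumed correlation rho. This is a standard textbook formula used only as an illustration; it is an assumption that it resembles the sensitivity procedure actually run, and the function name is hypothetical.

```python
import math


def composite_variance(variances, rho):
    """Variance of the unweighted mean of m correlated effect sizes.

    Assumes a single common correlation rho between every pair of effects.
    """
    m = len(variances)
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += rho * math.sqrt(variances[i] * variances[j])
    return total / m ** 2


# Hypothetical variances for two effect sizes taken from the same experiment.
for rho in (0.0, 0.5, 1.0):
    print(rho, composite_variance([0.04, 0.04], rho))
```

Higher assumed correlations give the averaged effect a larger variance and hence less weight, so repeating the analysis across the range of rho shows whether the pooled estimate depends on that assumption.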

Preregistered vs Non-preregistered studies
This distinction is relevant for assessing the impact of the so-called Questionable Research Practices and in particular p-hacking (Head et al., 2015; John et al., 2012). Preregistered studies must describe all details of how the data will be analyzed before their collection, thus reducing the degrees of freedom available during and after data collection. Preregistration makes a range of analytically spurious practices far less likely, such as changing the type of data to be analysed, swapping secondary and primary hypotheses, creating new hypotheses post hoc, and other practices aimed at artificially inflating the "true" effect size.
From our database it was possible to compare the estimate of the effect size obtained from the preregistered studies with that obtained from the non-preregistered ones. The results are presented in Table 4.
The point estimate of the effect size of the preregistered studies is larger than that of the non-preregistered studies; however, their precision estimates (see the 95% CI) show a considerable overlap, and consequently the two estimates cannot be considered statistically different.
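The overlap argument can be made slightly more explicit with an approximate z-test on the difference between the two subgroup estimates, using the values reported in the abstract (preregistered: 0.31, 95% CI 0.18-0.45; non-preregistered: 0.24, 95% CI 0.08-0.41). This sketch assumes that the two estimates are independent and that the intervals are symmetric normal-theory intervals; neither assumption is guaranteed by the multilevel model, so the result is indicative only. The function names are hypothetical.

```python
import math


def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate standard error recovered from a symmetric 95% interval."""
    return (upper - lower) / (2.0 * z_crit)


def difference_test(est1, ci1, est2, ci2):
    """Approximate z-test for the difference between two independent estimates."""
    se_diff = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    diff = est1 - est2
    z = diff / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se_diff, z, p_two_sided


diff, se_diff, z, p = difference_test(0.31, (0.18, 0.45), 0.24, (0.08, 0.41))
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions the z value falls well short of 1.96, which is consistent with the conclusion that the two subgroup estimates are not statistically distinguishable.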

Publication bias
Our very comprehensive literature search is likely to have reduced the probability of publication bias. Nevertheless, we added a statistical estimation of publication bias.
Unfortunately, there is no consensus about which tests are statistically more valid (Carter et al., 2017).

This update confirms the main findings of the Mossbridge et al. (2012) original meta-analysis and gives further support to the hypothesis of predictive physiological anticipatory activity of future random events. This phenomenon may hence be considered among the more reliable within those covered under the umbrella term "psi" (see Cardeña, 2018 for an exhaustive review of the evidence and the theoretical hypotheses of all these phenomena).
The limitations of the present meta-analysis are similar to most meta-analyses which include non-preregistered studies. The solution is that of prospective meta-analyses (Watt & Kennedy, 2017), based on all preregistered studies where the methods and data analyses have been declared and made public beforehand.
As to the future of this line of research, we think the time is now ripe for testing potential practical applications, as suggested for example by Mossbridge et al. (2014), Franklin et al. (2014) and Khoshnoud et al. (2015).
In order to arrive at such an ambitious goal, it is necessary to achieve a high degree of correct classifications based on prestimulus activity at the level of each trial so that the number of false positives and false negatives is reduced to a bare minimum. The experiments of Mossbridge (2017)

Data availability
Underlying data for this meta-analysis is available from FigShare

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.
p-uniform* and the Vevea and Hedges' weight-function model (Vevea & Woods, 2005) do not seem to be recommended for multilevel random meta-analyses with high heterogeneity like the present one. We therefore applied the Copas selection model, which is recommended by Jin et al. (2015). The Copas selection model was implemented using the metasens package (Schwarzer et al., 2016). The results are presented in Table 5.
According to this statistic, there is no apparent publication bias. Both the frequentist and the Bayesian analyses converged on similar results, making our findings quite robust. The overall effect size, 0.28, 95% CI = 0.18-0.38, overlaps with that reported in the original paper, 0.21, 95% CI = 0.13-0.29, even if the heterogeneity is substantially higher: I² = 81.9 vs 27.4.

Discussion
The high level of heterogeneity is expected considering the variety of experimental protocols and the diversity of dependent variables, from heart rate to pupil dilation.
Furthermore, as in the original paper, we did not find substantial differences between peer and non-peer reviewed papers, as the confidence intervals of their mean effect sizes overlap considerably.

Supplementary material
Supplementary File 1 -Completed PRISMA checklist.
Supplementary File 2 -List of references used in this analysis.
Supplementary File 3 -contains Table S1: Summary of the selected studies along with their corresponding effect sizes, variance and standard error.

Stephen Baumgart
Department of Psychology and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
I think the addition of the "Preregistered versus No-pregistered" section, analysis, and table adequately satisfies the serious concerns that I have about p-hacking. I switch my status to "Approved". Even though I'm still concerned about including non-preregistered studies at all, I realize that such a restriction does not have strong precedent. But I would still encourage authors to focus on meta-analysis of preregistered studies in the future.
One minor comment which I have on the writing is that "No-pregistered" should be "Non-preregistered".
Competing Interests: No competing interests were disclosed.

Stephen Baumgart
Department of Psychology and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA

Addressing Major Criticisms
This is a controversial topic and careful consideration of objections is needed in a meta-analysis. Presentiment or Predictive Physiological Anticipation Studies (PAA) are typically criticized on these grounds (see, for example, Wagenmakers, Wetzels, Borsboom, Kievit, van der Maas, 2015):
1. Physical impossibility
2. File-drawer effect
3. Biases due to multiple comparisons or p-hacking
As for the first criticism, discussion of physical plausibility is beyond the scope of this meta-analysis and is left to the discretion of the authors. Nevertheless, apparent violations of our intuitions of time are found at the quantum level, such as the Wheeler Delayed-Choice experiment. It is not impossible that such effects may scale up to a macroscopic level in a not-yet-understood emergent process. I think the final two sentences of the introduction satisfy considerations of the physical impossibility objection and no changes are needed.
Though file-drawer effects are frequently cited as a serious concern, the results section adequately discusses this issue. However, expert review is needed for this area (my response to "Is the statistical analysis and its interpretation appropriate? " should really be a combination of "Partly" and "A qualified statistician is needed".) I agree with the first sentence of the "Publication Bias" subsection that publication bias is not that serious of a concern because of the limited number of researchers and available funding.
By far the most serious concern is the third, that of multiple comparisons or p-hacking, which I do not believe is adequately addressed by either the discussion or conclusion sections. Two sentences in the conclusion are not sufficient to address this serious concern. I have included recommendations later in this review. I am aware the authors already know the following but by doing multiple analyses and only reporting a sub-sample of them a believer or supporter of a hypothesis could bias effect sizes up while a skeptic or opponent could bias effect sizes down (and none of these biases are necessarily intentional or even conscious).
In the context of PAA, serious sources of p-hacking concern are establishing baselines for electrophysiological data, deciding time regions for analysis, and methodologies for rejecting bad data and artifacts. For some physiological measurements, the problem is even worse. In Electroencephalography (EEG) studies, for example, a researcher could either study event-related potentials (ERPs), the spectral power densities of various oscillations, or the phases of such oscillations, or a host of other possible analyses. Considering oscillations, the frequency range of an analysis can also be freely selected. Additionally, a researcher could select different bandpass filters to use or even which section of the head is included in the analysis. This is in addition to the concerns with artifact rejection, time region, and baselining already discussed. With so many free parameters, a non-preplanned study is practically useless as hard evidence for an effect unless the statistical significance of the effect is high enough that it becomes implausible that the effect in question can be generated by tweaking free parameters. Even if the statistical significance is high, the effect size is still untrustworthy because an analyst could be tweaking parameters in an effort to improve the analysis or fix problems but is only homing in on statistical fluctuations. These concerns are one reason why I refused to include exploratory EEG research from my own lab in this meta-analysis.
The solution to the multiple analysis problem is to separate research into exploratory studies where adjustments can be made in analysis and pre-planned confirmatory studies. Some of the studies included in the meta-analysis are pre-planned confirmatory studies, which should be considered the only truly reliable results for estimates of effect size due to the concerns laid out in this review (even for confirmatory studies, mistakes by researchers could distort effect sizes but these mistakes may average out in the long run).
My recommended solutions for this paper are:
1. More discussion of the risks of p-hacking in biasing results in the discussion section
2. Separated analyses of pre-registered confirmatory studies and exploratory studies and discussion comparing the two
3. For exploratory studies in the study tables, include the experimenter expectation of whether the hypothesis will be verified (such as in Galak, LeBoeuf, Nelson, & Simmons, 2012)
4. Show whether multiple comparison corrections were made for exploratory studies
Exploratory studies are necessary for advancing the field. But a meta-analysis should not include them without major caveats due to potential distortions of the effect size.
I am aware the extra attention given to p-hacking risks in this research is not precedented by other fields but the small effect sizes and the major implications to our understanding of physics, psychology, and neuroscience PAA research engenders may justify additional caution be used. My colleagues and I discuss this further in Schooler, Baumgart, & Franklin, 2018.

Other Comments
"The presentiment hypothesis calls for a difference between arousing and neural pre-stimulus response and this is calculated across sessions" is not always true. For example, the hypothesis could also cover the difference between two different types of arousing stimulus (for example, auditory versus visual stimulus or two different types of visual stimulus).

Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes
Are sufficient details of the methods and analysis provided to allow replication by others? Yes

Are the conclusions drawn adequately supported by the results presented in the review? Partly
Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Patrizio Tressoldi
Thank you for your detailed and constructive comments.
Our replies to your main comments follow.
Though file-drawer effects are frequently cited as a serious concern, the results section adequately discusses this issue. However, expert review is needed for this area (my response to "Is the statistical analysis and its interpretation appropriate? " should really be a combination of "Partly" and "A qualified statistician is needed".) I agree with the first sentence of the "Publication Bias" subsection that publication bias is not that serious of a concern because of the limited number of researchers and available funding.
Reply: we think we have sufficient expertise in dealing with this problem. Furthermore, we consulted R.C.M. van Aert, who is an expert on this topic.

My recommended solutions for this paper are:
More discussion of the risks of p-hacking in biasing results in the discussion section
Separated analyses of pre-registered confirmatory studies and exploratory studies and discussion comparing the two
Reply: we have added a direct comparison between preregistered and no-preregistered studies, see Table 4 and the paragraph "Preregistered vs No-preregistered studies"
For exploratory studies in the study tables, include the experimenter expectation of whether the hypothesis will be verified (such as in Galak, LeBoeuf, Nelson, & Simmons, 2012)
Reply: Unfortunately no study checked this moderating variable, but our sensitivity analysis reported in Table 3 suggests that the experimenter expectation did not considerably affect the overall results.
Show whether multiple comparison corrections were made for exploratory studies
Reply: our choice to use multivariate analyses partly reduces the impact of this procedure.
"The presentiment hypothesis calls for a difference between arousing and neural pre-stimulus response and this is calculated across sessions" is not always true. For example, the hypothesis could also cover the difference between two different types of arousing stimulus (for example, auditory versus visual stimulus or two different types of visual stimulus).
Reply: revised as "The presentiment hypothesis calls for a difference between the pre-stimulus responses of the two stimulus categories.."
Further discussion should be included for the observations mentioned of the second-to-last paragraph of the discussion; otherwise, it may be unclear why these studies are interesting as the paper asserts.
Reply: we expanded our conclusion as suggested.

Competing Interests: No competing interests were disclosed.
Introduction
P.2, Line 21: Not sure I would agree with 'body predicting moments ahead of time' as this suggests understanding -try 'reacting ahead of time' or simply 'physiological changes ahead…'
P.2: Para 2: the authors note that two paradigms were used, presentation of arousing/neutral stimuli or guessing tasks. Were any clear differences in PAA effects reported between these tasks? Also, given the 'broad range of physiological measures' used to assess such changes were there any key differences here?
P.2, Para 2, final sentence: the 'evidence from mainstream research' -what specifically does this refer to? Behavioural effects? Ie changes in accuracy and/or response times and if so could do with a clear reference.
P. 2, Para 3, line 9: 'forwarded' doesn't make sense. Do you mean 'proposed as a potential framework/theory'?
P. 2, Para 3: Not sure I'd agree that using physiological markers to 'predict' future events is a 'second objective' measure. It is simply another way to view the same procedure.
P. 2, Para 3: the vague references to 'presentiment piggybacking onto mainstream research' needs clarifying and supporting with references.

Methods
P.2, Para 1: need to identify the acronym 'PRISMA' after it is outlined.
P. 2, Para 3, line 3: change 'were' to 'where'
P. 2, Para 3, line 4: change 'Differently' to 'In addition,'
Also, what is the rationale for utilising a distinct eligibility criterion? It seems that prior research focused on testing for a pre-stim signal that would match the post-stim presentation. By not using this method you open yourself up to the criticism of widening the scope and also of looking for 'any physiological change' as opposed to one that would be specifically linked to the presentation of the target. The authors claim this is 'more comprehensive' but it could just as easily be seen as less conservative.
P.3, Para 4: line 1: change 'were studies were' to 'were studies where'
P.3 -is it possible to say a bit more about why some authors did not agree to share their data -looks distinctly odd.
P.4, Para 6: sentence referring to 'Assink' doesn't make sense -unless you move the ref out of parenthesis and into the sentence.
P.4: Change 'The Bayesian' to 'A Bayesian'. And pull the sentence with syntax to the same paragraph.
P. 4: Change: 'Even if with our search activity we are quite….' To 'The robust search is likely to have reduced the probability of a publication bias occurring. Nevertheless, to test this a statistical estimation was conducted using the Copas selection model, as recommended by Jin et al'

Results
Keep tense to past ie peer reviewed not review.
It doesn't make sense to compare data from the current review to Mossbridge et al 'if' both sets of data contain the same studies -as this would lead to obvious similarities etc. To an extent this seems to be addressed by the data in Table 3 but not made clearly -ie why not simply state that when X studies were excluded due to Y reasons the overall effect was still significant? I don't see the moderation results for PeerRev reported here?
The reported 'small difference between the peer reviewed and non-peer reviewed' is vague and unhelpful. State clearly what was found -ie, are they 'significantly different' if not then they are not 'different' in any meaningful way.
Under 'Publication bias' I think para 2, 3 and 4 (which appears on P.6) should be joined as one single paragraph.

Discussion
This is rather poor and reads like a list of points. There needs to be some discussion here not simply a repetition of the data.

Also, given the 'broad range of physiological measures' used to assess such changes were there any key differences here?
Reply: No
P.2, Para 2, final sentence: the 'evidence from mainstream research' -what specifically does this refer to? Behavioural effects? Ie changes in accuracy and/or response times and if so could do with a clear reference.
Reply: Added reference
P. 2, Para 3, line 9: 'forwarded' doesn't make sense. Do you mean 'proposed as a potential framework/theory'?
Reply: replaced with "proposed as a potential mechanism".
P. 2, Para 3: Not sure I'd agree that using physiological markers to 'predict' future events is a 'second objective' measure. It is simply another way to view the same procedure.
Reply: changed as "another way.."
P. 2, Para 3: the vague references to 'presentiment piggybacking onto mainstream research' needs clarifying and supporting with references.
Reply: changed accordingly
Also, what is the rationale for utilising a distinct eligibility criterion? It seems that prior research focused on testing for a pre-stim signal that would match the post-stim presentation. By not using this method you open yourself up to the criticism of widening the scope and also of looking for 'any physiological change' as opposed to one that would be specifically linked to the presentation of the target. The authors claim this is 'more comprehensive' but it could just as easily be seen as less conservative.
Reply: We prefer the term more comprehensive because some experimental designs, e.g. hit guessing, don't allow a post-stimulus physiological measure. However, all experimental designs tied the differential anticipatory physiological activity to two different outcomes, e.g. hits or misses.
P.3 -is it possible to say a bit more about why some authors did not agree to share their data -looks distinctly odd.
Reply: the reasons for such decisions are confidential.
P.4, Para 6: sentence referring to 'Assink' doesn't make sense -unless you move the ref out of parenthesis and into the sentence.
Reply: fixed
P.4: Change 'The Bayesian' to 'A Bayesian'. And pull the sentence with syntax to the same paragraph.
Reply: fixed
P. 4: Change: 'Even if with our search activity we are quite….' To 'The robust search is likely to have reduced the probability of a publication bias occurring. Nevertheless, to test this a statistical estimation was conducted using the Copas selection model, as recommended by Jin et al'

Results
Keep tense to past ie peer reviewed not review.

Reply: fixed
It doesn't make sense to compare data from the current review to Mossbridge et al 'if' both sets of data contain the same studies -as this would lead to obvious similarities etc. To an extent this seems to be addressed by the data in Table 3 but not made clearly -ie why not simply state that when X studies were excluded due to Y reasons the overall effect was still significant?
Reply: we clarified that the Mossbridge and Tressoldi's studies were those included in this update.
I don't see the moderation results for PeerRev reported here?
Reply: we think this analysis redundant given the data reported on Tables 1 and 2
The reported 'small difference between the peer reviewed and non-peer reviewed' is vague and unhelpful. State clearly what was found -ie, are they 'significantly different' if not then they are not 'different' in any meaningful way.
Reply: we clarified that the means are different, but their precision estimate, i.e. confidence intervals, overlap.
Under 'Publication bias' I think para 2, 3 and 4 (which appears on P.6) should be joined as one single paragraph.

Discussion
This is rather poor and reads like a list of points. There needs to be some discussion here not simply a repetition of the data. Ie -given this effect size how would the authors attempt to account for it? what are the implications of such a finding? Is there any scope for teasing out of the data any factors that may/may not influence the outcome -e.g., a possible relationship between the PAA and the various DV measures used?
Reply: we changed the discussion and the conclusion to include our evaluation of the state of the art and the future of this phenomenon.
What is the 'conventional research program' of Kittenis?
Reply: omitted
How, exactly, does the single trial work of Mossbridge counter QRP?
Reply: we wrote "pre-registered single-trial work". Preregistration of data analyses constrains the use of post-hoc data analysis flexibility.
Competing Interests: No competing interests were disclosed.