Pupil dilation prediction of random events

We report the results of a conceptual replication of a study reporting that pupil dilation can predict potentially threatening random events above chance level. In this study, participants' pupil dilation was used to predict the appearance of a threatening or a neutral stimulus, presented randomly in two sequences of ten trials with replacement, i.e. each drawn stimulus remained available for all subsequent draws. In the first experiment, with a sample of 100 participants, the average correct prediction was 55.9%, with a small difference between the two stimuli. This effect was further tested in an exact pre-registered replication, in which the average correct prediction was 58.7%. The reliability of these findings was checked using both frequentist and Bayesian parameter-estimation approaches. These findings collectively support the hypothesis that pupil dilation can be used to anticipate random, and therefore theoretically "unpredictable", events in an implicit, unconscious way, that is, without conscious awareness, and that this ability is another characteristic of the powerful anticipatory and adaptive capacities of our psychophysiological system.

Corresponding author: Patrizio E Tressoldi (Patrizio.tressoldi@unipd.it)

How to cite this article: Tressoldi PE, Martinelli M and Semenzato L. Pupil dilation prediction of random events [v2; ref status: approved with reservations 2, http://f1000r.es/3dw] F1000Research 2014, 2:262 (doi: 10.12688/f1000research.2-262.v2)

Copyright: © 2014 Tressoldi PE et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Grant information: The author(s) declared that no grants were involved in supporting this work.

Competing interests: No competing interests were disclosed.

First published: 02 Dec 2013, 2:262 (doi: 10.12688/f1000research.2-262.v1)

Referees: v1 published 02 Dec 2013; v2 published 09 May 2014


Introduction
The anticipation, or prediction, of future events is a fundamental activity of our conscious and implicit (i.e. unconscious) cognitive abilities. Being able to predict, even approximately, future movements, perceptions and events reduces the risks and harms of a future threatening situation and, more importantly, optimizes the use of our limited cognitive and energetic resources (e.g. Friston 1).
It is therefore not surprising that the study of anticipation has become an interdisciplinary line of research encompassing psychophysiology and neurophysiology (e.g. Van Boxtel and Böcker 2), cognitive processes (Barcelo, Bestmann and Yu 3) and artificial intelligence (Butz, Sigaud and Baldassarre 4).
At the core of anticipatory activity is an innate sensitivity to the structure and statistics of the environment, which allows the organism to build accurate representations of events and to prepare to perceive and act in a way that minimizes errors and reduces unnecessary adjustments (e.g. Clark 5). But this raises a question: how can the organism prepare for future events if they are unpredictable or equally probable?
Our interest is focused on this more extreme form of anticipation, that of random events. Even if predicting unpredictable events seems a paradox, there is cumulative evidence, from almost 15 years of investigation, that humans, and perhaps every other living organism, can predict some classes of events above the level expected by chance (see the meta-analysis by Mossbridge, Tressoldi, and Utts 6). If confirmed, this phenomenon would further enhance our knowledge of innate predictive survival abilities.
In this study, we demonstrate that pupil dilation (PD) reactions differ before the random presentation of a neutral or a potentially threatening stimulus.
Similar to other psychophysiological variables (e.g. heart rate, skin conductance), PD reacts to different emotional states that are correlated with the arousal of the autonomic nervous system (e.g. Bradley et al. 7). Because the pupil dilates with sympathetic activity and constricts with parasympathetic activity (Beatty and Lucero-Wagoner 8; Steinhauer et al. 9), it is an accurate and unobtrusive measure of the cognitive resources invested in a task, for example in arithmetic tasks.
An interesting application of PD is in tasks where participants are not consciously aware of some information but their PD reveals that the information is unconsciously available and is being processed by the cognitive system. For example, Bijleveld, Custers and Aarts 10 used PD to reveal the strategic recruitment of resources upon presentation of subliminal reward cues.
However, PD not only gives information about unconscious cognitive activity but can even anticipate future perceptual or cognitive activities. Einhäuser, Stout, Koch, and Carter 11 used PD to reveal perceptual selection and to predict its subsequent stability in perceptual rivalry, and Einhäuser, Koch, and Carter 12 employed PD to reveal a decision before a person voluntarily reported it.
Starting from these studies and from the cumulative evidence that the human psychophysiological and electrophysiological systems react differently before the random presentation of two categories of emotional stimuli, such as pictures and sounds (Mossbridge, Tressoldi, and Utts 6), we aimed to investigate whether PD can actually predict random events at the level of each single trial of each participant (see Procedure). An increase of approximately 20% above the mean chance expected (MCE) in the prediction of alerting sounds has already been observed by Tressoldi et al. 13, who predicted randomly presented auditory alerting and neutral sounds at the level of single trials (a summary of the results obtained in that study is presented in Table S1 in the Supplementary Material of this paper).
In this paper, we present the results of a conceptual replication using visual information, contrasting a potentially "threatening" stimulus with a neutral one, with an experimental procedure that simulates an ecological condition in which a dangerous or a neutral event could happen randomly.

Participants
Participants were recruited through advertisements, mainly among students of Padova University. Exclusion criteria included uncorrected vision problems and the use of drugs that could affect pupil size and pupil dilation. These characteristics were ascertained by asking each participant.

Sample definition
Assuming an effect size of approximately 0.30, a statistical power above 0.80 and α = 0.05 (Faul et al. 14), an opportunity sample of 100 students and staff of Padova University was recruited by a research assistant to participate in an experiment on a gambling task. The final sample comprised 32 males and 68 females with a mean age of 29.3 years (SD = 3.8).
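As a rough check of the sample-size reasoning above (effect size 0.30, power 0.80, α = 0.05), the sketch below uses the normal approximation for a one-sample test of a mean. It is an illustration under stated assumptions, not the G*Power computation cited: the approximation ignores the t-distribution correction, so it returns slightly smaller samples than an exact noncentral-t calculation, and because the text does not say whether a one- or two-tailed test was assumed, both are supported. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, SD 1


def _z_crit(alpha: float, two_tailed: bool) -> float:
    """Critical z for the chosen alpha and tail convention."""
    tail = alpha / 2 if two_tailed else alpha
    return _STD_NORMAL.inv_cdf(1 - tail)


def approx_sample_size(d: float, power: float = 0.80, alpha: float = 0.05,
                       two_tailed: bool = True) -> int:
    """Normal-approximation n for a one-sample test: ((z_crit + z_power) / d)^2."""
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(((_z_crit(alpha, two_tailed) + z_power) / d) ** 2)


def approx_power(n: int, d: float, alpha: float = 0.05,
                 two_tailed: bool = True) -> float:
    """Approximate power for sample size n (the far rejection tail is ignored)."""
    return 1 - _STD_NORMAL.cdf(_z_crit(alpha, two_tailed) - d * n ** 0.5)


if __name__ == "__main__":
    for tails in (True, False):
        n = approx_sample_size(0.30, two_tailed=tails)
        print("two-tailed" if tails else "one-tailed", n,
              round(approx_power(100, 0.30, two_tailed=tails), 3))
```

Under this approximation, a sample of 100 exceeds the requirement for either tail convention.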

Ethics statement
Participant recruitment followed ethical guidelines in accordance with the Declaration of Helsinki, and the study was approved by the Ethics Committee of the Dipartimento di Psicologia Generale, the hosting institution. Before taking part in the experiment, each participant provided written consent after reading a brief description of the experiment.

Procedure
The experiment consisted of two phases, a preliminary one and a formal one. The preliminary phase aimed at familiarizing each participant with the procedure and at testing whether they reacted differently to the two stimuli.

Preliminary phase
Each participant was seated in front of a 19-inch monitor in a sound- and light-attenuated laboratory (approximately 120 cd/m², measured with a Minolta light meter).
Before the formal sessions, each participant was told: "Before the formal experiment, we must record your personal pupil dilation reactivity to the two types of stimuli you will see behind the door.
You must simply watch what happens on the screen without doing anything. When the door opens, you will see either a gun shooting at you, accompanied by the sound of a shot, or a smile. You will see the shooting gun and the smile ten times each, in random order."
We chose to limit the test to 10 trials per stimulus to avoid boredom and to reduce the possibility of using controlled strategies to predict the target.
If no further clarification was needed, the task started with the calibration of eye position for the eye tracker. Participants followed a dot moving slowly to different positions on the monitor, looking naturally and without the need to fix the head position.
Once the calibration was completed, the stimulus presentation started. The sequence of events is presented in Figure 1. The inter-item interval was randomly chosen between 2 and 4 s.
The two target stimuli (the gun and the smile) and the door were calibrated for luminance (300 × 471 pixels; 72 dpi horizontal and vertical). The door was black, similar to the screen background, to avoid PD changes caused by differences in luminosity. Their luminance, measured in cd/m² with a Minolta® photometer, was 15 (center) and 90 (periphery) for the gun, 73 (center) and 4 (periphery) for the smile, and 48 (center) and 8 (periphery) for the door. After the preliminary phase, the formal phase started.

Formal experimental phase
The research assistant's instruction to each participant was: "Now your task is to let your pupil dilation predict what you will see behind a closed door that will be shown in the center of the monitor. Behind the door you can see a gun shooting at you or a smile. The computer will monitor your pupil dilation and will predict for you what you will see. Remember that the choice between the shooting gun and the smile is completely random, and hence it is not possible to find an underlying rule to predict their sequence. The task consists of two sessions of 10 trials each. For each correct hit you will earn 0.5 euros."
The sequence of events is presented in Figure 2. In this case pupil dilation was measured for 5 seconds during the fixation of the door.

Software and apparatus
Eye-tracker apparatus: The eye tracker, a Tobii T120® (Tobii, Stockholm), has the following technical characteristics: data rate, 120 Hz; accuracy, 0.5 degrees; freedom of head movement, 30 × 22 × 30 cm; monitor, 17 inch, 1280 × 1024 pixels; automatic optimization of bright/dark pupil tracking. PD is measured automatically in millimeters by the apparatus using its built-in near-infrared detectors and software. These data were passed to custom software for storage. This program, created with E-Prime™ v2.0 by two of the authors (MM and LS) and interfaced with the eye tracker, controlled event presentation and the automatic recording of pupil size. The source code is available at: dx.doi.org/10.6084/m9.figshare.848604.
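The text does not state how the 120 Hz samples recorded during the 5-s door fixation were reduced to one PD value per trial. One plausible reduction, assumed here purely for illustration, is the mean of the valid samples in that window (5 s × 120 Hz = 600 samples). The input format (a list of diameters in mm, with None for lost samples) and the function name are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

SAMPLING_RATE_HZ = 120  # Tobii T120 data rate given in the text
WINDOW_S = 5.0          # anticipation window: 5 s of door fixation


def anticipatory_pd(samples: Sequence[Optional[float]],
                    rate_hz: int = SAMPLING_RATE_HZ,
                    window_s: float = WINDOW_S) -> Optional[float]:
    """Mean pupil diameter (mm) over the first `window_s` seconds of a trial.

    ASSUMPTION: averaging the valid samples is an illustrative choice; the
    text does not specify how the window was summarized. `None` marks samples
    where the tracker lost the pupil; they are skipped. Returns None when no
    valid sample remains, so the trial can be treated as missing.
    """
    n_samples = int(rate_hz * window_s)  # 120 Hz x 5 s = 600 samples
    valid = [s for s in samples[:n_samples] if s is not None]
    return mean(valid) if valid else None
```

Samples beyond the window are ignored, so a recording that runs slightly longer than 5 s does not change the trial value.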
The sampling with replacement of the two stimuli in the two series of ten trials was randomized using the E-Prime™ v2.0 randomization statement and Random function, which was reset after every trial. This procedure guards against the possibility of guessing the upcoming stimulus by learning implicit or explicit rules.
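To illustrate what sampling with replacement implies for the stimulus sequences, the Python sketch below (a stand-in for, not a copy of, the E-Prime script) draws every trial independently with probability 0.5. A series of ten therefore need not contain five guns and five smiles, and runs of identical stimuli occur by chance; these run lengths are what an expectation-bias check has to examine. The function names and the seeding scheme are assumptions.

```python
import random
from itertools import groupby
from typing import List, Optional, Sequence

STIMULI = ("gun", "smile")


def draw_session(n_series: int = 2, trials_per_series: int = 10,
                 seed: Optional[int] = None) -> List[List[str]]:
    """Sampling with replacement: every trial is an independent 50/50 draw."""
    rng = random.Random(seed)  # HYPOTHETICAL seeding; E-Prime's own RNG differs
    return [[rng.choice(STIMULI) for _ in range(trials_per_series)]
            for _ in range(n_series)]


def run_lengths(series: Sequence[str]) -> List[int]:
    """Lengths of runs of identical consecutive stimuli."""
    return [len(list(group)) for _, group in groupby(series)]


if __name__ == "__main__":
    for series in draw_session(seed=2013):
        print(series.count("gun"), "guns;", "runs:", run_lengths(series))
```

The printed counts depend on the seed; repeating the draw many times shows unbalanced series and occasional long runs, as expected from independent draws.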
The light in the laboratory was constantly dim, approximately 30 cd/m², to avoid undesired or unrelated changes in the participants' pupils.
The time needed to complete the calibration (2 min on average) and to give the instructions was long enough for the participants' pupils to adapt to the ambient light before the experiment started.

Statistical methods
We used both a frequentist parameter estimation approach and a Bayesian model comparison approach, following the statistical recommendations of the American Psychological Association (APA) 15, Kruschke 16 and Wagenmakers, Wetzels, Borsboom and van der Maas 17. This approach is recommended to limit the shortcomings of classical Null Hypothesis Significance Testing (e.g. Tressoldi et al. 18). In short, each parameter of interest (mean, correlation, etc.) is estimated together with its precision, expressed by its confidence interval, and with its effect size or Bayes Factor. For readers interested in classical statistical significance, it is sufficient to check whether the confidence intervals include (not significant) or exclude (significant) zero.
Frequentist inferential estimates were applied to both the sum and the average of correct guesses (hits), using a binomial test for the sum of hits and a one-sample t-test for the hit percentages, respectively. Confidence intervals were estimated using a bootstrap procedure based on 5000 samples.
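The frequentist steps just described can be sketched with the Python standard library alone. Several details are assumptions, since the text does not specify them: the binomial test is one-sided (hits above 50%), the t-test effect size is taken to be Cohen's d against the 50% MCE, and the bootstrap is a percentile bootstrap of the mean. The numbers in the usage example are invented for illustration and are not data from this study.

```python
import random
from math import comb, sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def binomial_p_greater(hits: int, trials: int, p0: float = 0.5) -> float:
    """Exact one-sided p-value P(X >= hits) for X ~ Binomial(trials, p0)."""
    return sum(comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
               for k in range(hits, trials + 1))


def one_sample_t(values: Sequence[float],
                 mu0: float = 50.0) -> Tuple[float, int, float]:
    """t statistic, degrees of freedom and Cohen's d of the mean against mu0."""
    m, s = mean(values), stdev(values)
    t = (m - mu0) / (s / sqrt(len(values)))
    return t, len(values) - 1, (m - mu0) / s


def bootstrap_ci(values: Sequence[float], n_boot: int = 5000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean (resampling with replacement)."""
    rng = random.Random(seed)
    boots = sorted(mean(rng.choices(values, k=len(values)))
                   for _ in range(n_boot))
    lo = round((1 - level) / 2 * n_boot)
    hi = round((1 + level) / 2 * n_boot) - 1
    return boots[lo], boots[hi]


if __name__ == "__main__":
    # Invented hit percentages for illustration only (not study data).
    hit_pct = [60, 55, 50, 65, 45, 60, 55, 70]
    print(binomial_p_greater(60, 100))
    print(one_sample_t(hit_pct))
    print(bootstrap_ci(hit_pct))
```

The percentile bootstrap is only one of several bootstrap interval methods; a bias-corrected variant would be a reasonable alternative.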

Bayesian statistics
We adopted a model comparison approach, contrasting the alternative hypothesis (H1) of a positive difference with respect to the MCE with the null hypothesis (H0) of no difference with respect to the MCE. We calculated the Bayes Factor (BF H1/H0) for the one-tailed one-sample t-test using the software implemented by Morey and Rouder 19, applying the Jeffreys, Zellner, Siow (JZS) prior (see Jeffreys 20) with a scale (effect size) of 0.3, as suggested by Rouder et al. 21.
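For readers who want to see what the JZS Bayes factor involves, the sketch below computes a two-sided JZS BF H1/H0 for a one-sample t statistic by numerical integration over the prior variance g, following the JZS formulation of Rouder and colleagues. It is not the Morey and Rouder software used for the analyses, and it will not reproduce the reported values: it is two-sided whereas the reported BFs are one-tailed, and it assumes that the "effect size of 0.3" corresponds to the Cauchy prior scale r. The integration limits and grid size are arbitrary choices.

```python
from math import exp, log, pi


def jzs_bf10(t: float, n: int, r: float = 0.3, steps: int = 4000) -> float:
    """Two-sided JZS Bayes factor BF(H1/H0) for a one-sample t statistic.

    Prior under H1: effect size delta ~ Cauchy(0, r), expressed as
    delta | g ~ Normal(0, g) with g ~ Inverse-Gamma(1/2, r**2 / 2).
    The integral over g is evaluated with the midpoint rule on a log(g) grid.
    ASSUMPTIONS: r = 0.3 mirrors the "effect size of 0.3" in the text; the
    grid limits and step count are arbitrary.
    """
    nu = n - 1
    log_m0 = -(nu + 1) / 2 * log(1 + t * t / nu)  # H0 likelihood term
    lo, hi = -15.0, 15.0  # integrate x = log(g) over [lo, hi]
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        g = exp(x)
        a = 1 + n * g
        log_m1 = -0.5 * log(a) - (nu + 1) / 2 * log(1 + t * t / (a * nu))
        # log inverse-gamma(1/2, r^2/2) density at g, plus log Jacobian (dg = g dx)
        log_prior = log(r) - 0.5 * log(2 * pi) - 1.5 * x - r * r / (2 * g) + x
        total += exp(log_m1 - log_m0 + log_prior) * dx
    return total


if __name__ == "__main__":
    for t_stat in (0.0, 2.0, 4.0):
        print(t_stat, jzs_bf10(t_stat, n=100))
```

A one-tailed version would restrict the prior to positive effects; the sketch does not attempt that.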

Preliminary data analysis
Before proceeding with the statistical analyses, the data for each participant were screened for artifacts. All artifacts, i.e. missing or anomalous PD recordings (values close to or below 1, or above 10), which were easily detected by inspecting the raw scores saved in the individual files, were eliminated. If artifacts exceeded the threshold of 60%, that is 12 out of 20 trials, the participant was excluded and replaced to keep the total sample at 100. The overall percentage of artifacts was 4%. The full raw data, and the data corrected for anomalous values, are available at http://dx.doi.org/10.6084/m9.figshare.818978 (Tressoldi 22).
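The screening rule above can be expressed as a short filter. This is a sketch of one reading of the text, with the assumptions flagged: "close to/below 1" is implemented as a cut-off at 1 mm, a per-trial PD value is assumed to be already available, and "exceeded the threshold of 60%" is read as strictly more than 12 of 20 trials. The names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

LOW_MM = 1.0    # ASSUMED cut-off for "close to/below 1"
HIGH_MM = 10.0  # "above 10"


def is_artifact(pd: Optional[float]) -> bool:
    """Missing or implausible per-trial pupil diameter."""
    return pd is None or pd <= LOW_MM or pd > HIGH_MM


def screen_participant(trial_pds: Sequence[Optional[float]]
                       ) -> Tuple[List[Tuple[int, float]], bool]:
    """Return (kept (trial index, PD) pairs, exclude participant?).

    Exclusion when artifacts exceed 60% of trials, i.e. more than 12 of 20;
    integer arithmetic (10 * artifacts > 6 * trials) avoids float edge cases.
    """
    kept = [(i, pd) for i, pd in enumerate(trial_pds) if not is_artifact(pd)]
    n_artifacts = len(trial_pds) - len(kept)
    return kept, 10 * n_artifacts > 6 * len(trial_pds)
```

Keeping the trial index with each retained value lets the remaining trials be matched to the stimulus actually shown.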

Individual prediction accuracy
In order to take individual differences into account, we converted the PD values of the 20 trials measured in the anticipation phase to z scores for each participant. Next, the mean z scores associated with each of the two stimuli chosen by the software were calculated. In this way, one mean was always above zero and the other below zero, except in the case of identical means for the two stimuli. The prediction for each trial was obtained simply by determining whether the standardized PD value, above or below zero, corresponded to the stimulus that was randomly chosen. For example, if the standardized PD means associated with the smile and the gun were 0.25 and -0.15 respectively, each PD value above zero predicted a smile and, vice versa, each value below zero predicted a gun. At the end of the trials, the sum and the percentage of hits (correct predictions) were calculated for each participant.

Prediction accuracy
In Table 1 we report the descriptive statistics, in Figure 3 the hit percentages with their 95% confidence intervals (CIs), and in Table 2 the effect size estimates and the BF H1/H0 for the two stimuli and overall with respect to the MCE, with the corresponding 95% CIs.
The mean estimates for both stimuli and overall clearly show that prediction accuracy is above the mean chance expected of 50%.
The estimates of all effect sizes, for both the binomial test and the one-sample t-test, are above zero and in the range of medium effects. Furthermore, the BF values range from 6 for the smile to 284394 for the overall accuracy.

Predictors of hit percentages
It is plausible to expect a correlation between the difference in anticipatory PD associated with the two stimuli and prediction accuracy. The variance explained by this correlation is R² = 0.348, 95% CI: 0.20 to 0.49. This moderate correlation suggests that anticipatory PD differences between the two stimuli explain only part of the hits, or correct predictions. This finding will be discussed further after the results of the exact replication.

Expectation bias control
Expectation bias arises when a random sequence including multiple repetitions of the same stimulus type (e.g., five non-arousing stimuli) produces an expectation in the participant that the next stimulus should be of the other type (e.g., an arousing stimulus), or vice versa (the Gambler's Fallacy). In Figure S1 in the Supplementary Material we report the trend observed for the two types of targets. Ninety-eight percent of all runs of identical stimuli were between 1 and 5 trials long. Visual inspection, supported by the estimated linear trend (-0.0071 for the smile and -0.0128 for the gun), excludes an expectation bias for both stimuli.

General comment
The overall prediction accuracy turned out to be above 50%, the level expected by chance. Although small in absolute terms, approximately 5%, the parameter estimates suggest that this is quite substantial in terms of effect size and BF. The difference between the anticipatory PD associated with the two stimuli predicts approximately one third of the variance in overall accuracy. At present we have no hypotheses about which other predictors could account for the remaining variance.

Before commenting further on these findings, we wanted to test their reliability in an exact replication of the experiment.

Experiment registration
Following the suggestions of Wagenmakers et al. 23 and of the Open Science Collaboration 24, the experiment was registered on the site http://www.openscienceframework.org before data collection.

Participants
We planned in advance to recruit the same number of participants as in the original study, assuming a similar effect size and setting the statistical power to 0.80. The final sample, recruited as in the first experiment, comprised 26 males and 74 females with a mean age of 23.02 years (SD = 2.7).
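For reference, the single-trial rule described under Individual prediction accuracy, which the replication applies unchanged, can be sketched as follows. This is one reading of that description, not the authors' analysis script: it assumes artifact trials have already been removed, uses the sample SD for the z scores, and assigns a z score of exactly zero to the below-zero stimulus, a case the text does not address. As in the description, the stimulus means and the scored trials come from the same set of trials. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def zscores(values: Sequence[float]) -> list:
    """Standardize one participant's anticipatory PD values (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def single_trial_hits(pd_values: Sequence[float],
                      stimuli: Sequence[str]) -> Tuple[int, float]:
    """Apply the sign rule described in the text; return (hits, hit percentage).

    pd_values: one anticipatory PD value per retained trial.
    stimuli:   stimulus actually shown on each trial, "gun" or "smile";
               both must occur at least once.
    """
    z = zscores(pd_values)
    mean_z = {s: mean(zi for zi, shown in zip(z, stimuli) if shown == s)
              for s in ("gun", "smile")}
    if mean_z["gun"] == mean_z["smile"]:
        raise ValueError("identical stimulus means: the rule is undefined")
    above = max(mean_z, key=mean_z.get)  # stimulus with mean z above zero
    below = min(mean_z, key=mean_z.get)  # stimulus with mean z below zero
    # ASSUMPTION: a z score of exactly 0 is assigned to the below-zero stimulus.
    predictions = [above if zi > 0 else below for zi in z]
    hits = sum(p == shown for p, shown in zip(predictions, stimuli))
    return hits, 100 * hits / len(stimuli)
```

Passing one participant's retained anticipatory PD values and the matching stimulus labels yields that participant's hits and hit percentage, the quantities summarized in Tables 1 and 3.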

Procedure
Identical to the original study. The overall percentage of artifacts was 7.3%.

Overall prediction accuracy
In Table 3 we report the descriptive statistics, in Figure 4 the hit percentages with their 95% CIs, and in Table 4 the effect size estimates and the BF H1/H0 for the two stimuli and overall with respect to the MCE, with the corresponding 95% CIs.
The mean estimates for both stimuli and overall clearly show that prediction accuracy is above the mean chance expected of 50%.
The estimates of all effect sizes, for both the binomial test and the one-sample t-test, are above zero and in the range of medium to large effects. Furthermore, the BF values range from 2317 for the smile to 1.5 × 10¹³ for the overall accuracy.

Correlation between PD and hit percentages
The correlation between the difference in anticipatory PD associated with the two stimuli and overall prediction accuracy yielded R² = 0.42, 95% CI: 0.27 to 0.56, overlapping with that observed in the original experiment.

Expectation bias
The same analysis used in the original experiment yielded similar results (see Figure S2 in the Supplementary Material), showing no sign of expectation bias.

Comparison with the original study
In this replication, the hit percentages for the two stimuli and overall are slightly larger, as are the effect size and BF estimates, confirming the results of the original study.

General discussion
The results of the two experiments support the idea that PD can predict future random stimuli, adding further evidence to the findings reported by Tressoldi et al. 13 with auditory stimuli.
Although, as in all experiments, these results could be due to unforeseen methodological or statistical artifacts, we can rule out that in our case they are due to an improper randomization algorithm, to the characteristics of our participants or to faulty detection of PD by our apparatus.
The estimated prediction accuracy is between 5 and 10% above the level expected by chance. It remains to be explored whether this 5 to 10% above chance represents an upper limit of this prediction or whether it can be enhanced.
It seems, then, that PD can be implicitly (unconsciously) modulated to predict random and hence statistically "unpredictable" events.
Although small in magnitude, this predictive ability could have important adaptive consequences, for example in cases of serious threats to life, suggesting that this characteristic is another expression of the powerful adaptive functions of our psychophysiological system, which can anticipate future events, for example by promoting advantageous decision-making (e.g. Denburg et al. 25), anticipating a reward (e.g. Hackley et al. 26) and anticipating the pain of others (e.g. Caes et al. 27).
It seems that PD, like possibly all other functions regulated by the psychophysiological system (e.g. heart rate, skin conductance level), has innate characteristics specifically dedicated to the anticipation of future events, no matter how predictable they are, extending the potential of survival mechanisms.
The mechanisms underlying this capacity remain an open question. If random events cannot be predicted using previous experience and information, we can argue that some "guessing" mechanisms based on probabilistic estimations are adopted. For example, we are currently testing whether our results can be modeled using the Bayesian hierarchical generative model proposed by Mathys, Daunizeau, Friston and Stephan 28 to investigate individual learning under uncertainty.
Another hypothesis is that our psychophysiological system can manifest a sort of temporal quantum-like entanglement, restoring brief time-symmetric situations that apparently violate the past-to-future flow of time and allow a connection between present and immediately future events, like those observed in quantum physics (e.g. Ma et al. 29) and recently studied in the perception of ambiguous images by Atmanspacher and Filk 30.
The fact that it is possible to study this characteristic at the level of a single trial, taking individual differences into account, opens up the possibility of devising proof-of-concept experiments for potential real-life applications. For example, it would not be very complicated to devise technical devices, to be added to glasses or a smartphone for instance, that amplify the subtle variations of PD to the level of conscious overt detection, so that they can be used intentionally by anyone, or that use these variations to automatically activate an alarm that could enhance personal safety when driving or walking.
However, only independent exact or conceptual replications, for example changing the type of stimuli and/or the randomization algorithm, can support our findings.

Author contributions
Authorship: P. Tressoldi developed the study concept. M. Martinelli

Open Peer Review
This paper presents evidence that the average person can predict random events better than chance in the context of the experimental setup described by the authors. This is a remarkable result if true. Sociologist Marcello Truzzi coined the aphorism "extraordinary claims require extraordinary proof", and any review of such data must proceed from that standpoint. It appears that the sample sizes, study design and statistical methods used are generally appropriate throughout. However, I can think of a number of potential artifacts of data collection and processing in this experiment, not addressed in this paper, that may produce positive results inappropriately. These potential artifacts could be tested statistically within the existing dataset (i.e. without further experimentation), and the paper would be strengthened if each of these potential sources of false positive results could be eliminated. This paper is supported by a nice dataset, and it would be good to see the paper refined through exposure in F1000 to include analyses excluding other sources of bias as others read and comment on the paper. Ultimately it would be great to see a well-powered dataset (such as this one), pointing to an uncomfortable conclusion, that has addressed all the criticisms of the community and becomes an outstanding challenge for refutation.

Expectation bias control:
The authors report that 98% of random sequences have runs of identical elements of 5 or less, and show a constant hits percentage over lag in Figure S1 to support this argument. This constant rate is an important assumption in the paper, because assuming a linear trend removes the requirement to correct for the gambler's fallacy, as pointed out by the authors.
We would expect wider confidence intervals around larger lag values in Figure S1, because these events are less frequent. It would be useful if the authors could include confidence intervals around these points; otherwise it could be argued that the 'linear' trend is merely point values within ever-widening confidence intervals that could equally support a non-constant hits percentage. An example of the consequence of a non-constant hits percentage is that, because of the smaller numbers of such events at higher lags, chance, especially if combined with a participant's inner bias to select a gun or a smile more often, may have a small effect on the overall hits accuracy. Admittedly, this would at most be only a few percentage points, but for a result a few percent above chance, this could be important.
As a caveat, it would also be nice to see second-order data, say looking at a participant's likelihood of selecting the same class 2 or 3 steps later. This is analogous to the gambler's fallacy in its potential effect but is not discussed by the authors and no data are shown.


Removed artifacts:
The authors report 4% of trials removed due to artifacts. It would be helpful if the authors would report the number of artifacts removed for each corresponding random event. A significant imbalance would immediately be indicative of bias, and false conclusions may be reached under these conditions. For example, if the smile produced a PD which was more likely to be removed as an artifact, leaving an excess of guns for analysis, and the participants in general had a bias towards expecting a gun over a smile, as is supported by evidence for prospect theory (Kahneman), then the result would be a higher 'chance' hits percentage merely because participants are naturally biased towards selecting the more prevalent class. Again, this is unlikely to be more than a few percent but may still be important in this context. One easy way to show that this bias is absent from this data would be to remove the artifacts, randomly remove trials so that the number of guns and smiles is the same, and then re-run the numbers.

Minor comments (typos):
"In this way a mean was always above zero and the second one below zero unless of an identical mean between the two stimuli." Unless? Perhaps this should be 'except in the case of an identical mean…'. "This finding will be commented further after the results of the exact replication": 'commented on further'?

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.

Patrizio Tressoldi
First of all, many thanks for your interest and useful suggestions to our paper.
As suggested, in our revised article we have added confidence intervals to the expectation bias measures. As you predicted, the decreasing amount of data as the lag increases widens the confidence intervals of the average prediction estimates.
As for whether the artefacts could be unbalanced between the two stimuli, they are completely random and due to anomalous or missing values, and not to participants' bias toward one specific stimulus.

Chris Baker
Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD, USA
At first glance, this appears to be a thorough demonstration that pupil diameter can be predictive of upcoming events. The study is a conceptual replication of an earlier study by the same group (Tressoldi et al., 2011) and the manuscript contains two separate experiments, each with 100 participants. Further, the second experiment was registered at http://www.openscienceframework.org before data collection (although at the time of writing this, I was unable to access any public record of the registration on the web site). Finally, the authors analyzed their results using both frequentist and Bayesian approaches, producing similar results.
However, the claim being made, that somehow pupil responses can predict upcoming random events, would be incredibly remarkable (if true). Given such a bold, and some would say impossible, claim, it is critical to carefully examine the experimental design and the veracity of the results. To their credit, the authors have made much of the raw data available as well as the source code for the experiments.
Here, I'm going to just focus on evaluating the specific methods and analyses in this particular study, ignoring any evidence from prior work. Briefly, each experiment was composed of two phases:
Preliminary Phase: The experimenters first measured individual participants' pupil responses to two visual stimuli, a smiley face and a pointed gun (accompanied by the sound of a shot), which appeared following the presentation of a picture of a door. There were 10 trials of each condition presented in a random order (without replacement) and pupil diameter was measured during the presentation of the second stimulus (smiley or gun).
Experimental Phase: In two blocks of ten trials, participants were again presented with a picture of a door followed by a smiley or a gun, but the trials were fully randomized (no replacement) and pupil diameter was now measured during the period when the door was on the screen.
The authors' claim is that in the Experimental Phase, the pupil diameter measured during the presentation of the door is predictive of the upcoming (random) stimulus.
Unfortunately, the method for generating and testing predictions is not entirely clear.My understanding is that for each phase of the experiment, they separately z-scored the data across all (both smiley and gun) trials.Then for the Preliminary Phase data, they determined whether the average z-score for smileys and guns was above or below zero.This was necessary because it appears there were individual differences in the relative pupil size for the two stimuli.The prediction for each Experimental Phase trial was then generated by determining whether the z-score for that trial was above or below zero, and assigning the corresponding stimulus that had a positive or negative mean in the Preliminary Phase as the predicted stimulus.This method does not generate trial-by-trial predictions since it simply assesses whether the pupil diameter for a given stimulus tended to be in the same half of the data in both the Preliminary and Experimental Phase.
Using this method the authors then generated percentage hits, comparing the predicted and actual stimuli, and determined that they were significantly above chance (for both stimuli considered separately and combined) using both frequentist and Bayesian approaches.
… participants had to indicate whether two sequential stimuli were the same or different. He found that some subjects were incredibly fast and accurate at the task and couldn't work out how it was possible until one of his participants told him that he was listening to the sound of the computer disk spinning: on different trials the computer would have to spin and load up a second image!
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
No competing interests were disclosed.

Author Response 24 Apr 2014
Patrizio Tressoldi, Dipartimento di Psicologia Generale, Università di Padova, Italy
Many thanks for your interest and useful comments on our paper.
We completely agree that more "convincing" support for our findings can come only from independent replications. However, we note that the cited meta-analysis by Mossbridge et al. (2012) includes independent replications, even though its basic experimental protocol is limited to averaging the anticipatory physiological variables.
As to the differences in PD between the Preliminary and the Experimental phases, as explained on pages 2 and 3, in the Preliminary phase PD was measured during the presentation of the two stimuli, whereas in the Experimental phase PD was measured before their presentation, with participants fixating the door image. Furthermore, on page 2 we clarified that the Preliminary phase was used to familiarise the participants with the task, and not to obtain data for the prediction. In our revised article we have added "Anticipation phase" in Figure 2, and the sentence "In this case pupil dilation was measured for 5 seconds during the fixation of the door and used to calculate the prediction accuracy" to clarify that only these data were used to predict the future stimuli. Individual differences with exactly the same stimuli and the same luminance properties are expected, given the individual differences in sympathetic and parasympathetic reactivity.
Competing Interests: I am a corresponding author of the paper.

Figure 1. Sequence of events related to the preliminary phase.

Figure 2. Sequence of events of the prediction phase.
The BF values in this replication are superior to those observed in the original study and in the range of extreme evidence according to Jeffreys 20 criteria.

Figure S2. Expectation bias in the exact replication of Experiment 1.

Competing Interests: I am the corresponding author of the paper.

Table 2. Parameter estimates. Effect sizes with 95% CIs and BF H1/H0 values of hit percentages for the two stimuli and overall, with respect to the MCE = 50%.

Table 4. Inferential statistics. Effect sizes with 95% CIs and BF H1/H0 values of hit percentages for the two stimuli, with respect to the mean chance expected, 50%.