Previously titled: The Noetic Experience and Belief Scale: A validation and reliability study

Background: Belief in the paranormal is widespread worldwide. Recent surveys suggest that subjective experiences of the paranormal are common. A concise instrument that adequately evaluates beliefs as distinct from experiences does not currently exist. To address this gap, we created the Noetic Experience and Belief Scale (NEBS), which evaluates belief and experience as separate constructs. Methods: The NEBS is a 20-item survey with 10 belief and 10 experience items rated on a visual analog scale from 0-100. In an observational study, the survey was administered to 361 general population adults in the United States and to a subsample of 96 one month later. Validity, reliability and internal consistency were evaluated. A confirmatory factor analysis was conducted to confirm the latent variables of belief and experience. The survey was then administered to a sample of 646 IONS Discovery Lab participants to evaluate divergent validity and to confirm belief and experience as latent variables of the model in a different population. Results: The NEBS demonstrated convergent validity, internal consistency (Cronbach's alpha: Belief 0.90; Experience 0.93), and test-retest reliability (Belief: r = 0.83; Experience: r = 0.77). A confirmatory factor analysis model with belief and experience as latent variables demonstrated a good fit. The factor model was confirmed as having a good fit and divergent validity was established in the sample of 646 IONS Discovery Lab participants. Conclusions: The NEBS is a short, valid, and reliable instrument for evaluating paranormal beliefs and experiences.


Introduction
"Paranormal beliefs pertain to phenomena that have not been empirically attested to the satisfaction of the scientific establishment" 1 . Paranormal beliefs encompass a broad range of concepts, such as ghosts or spirits, extrasensory perception (ESP), extraterrestrial beings, and mind-to-mind communication, or telepathy. Belief in the paranormal is widespread around the world [1][2][3][4][5][6][7][8][9][10][11][12][13] . For example, in a Gallup poll of 1,002 United States adults conducted in 2005, 55% respondents believed in psychic or spiritual healing or the power of the human mind to heal the body, 41% believed in extrasensory perception, and 31% believed in telepathy or mind-to-mind communication 14 .
However, having a belief in the paranormal does not necessarily mean having experienced the paranormal. A paranormal experience refers to an individual's memory of an experience that one judges to be genuine. The memory of a paranormal experience relies on a different mental substrate than a belief based on environment, education and reasoning, and the neural structures underlying the memory of an experience and those underlying a belief are likely different 15,16 . Paranormal belief and experience are often correlated when measured simultaneously, although this is rarely done 17,18 . For example, one study found a positive correlation (r = 0.61) between paranormal experience and belief scores 12 . Another study found that exposure to television programs that regularly depict paranormal phenomena was positively correlated with belief, but only for respondents who had personal experiences 19 .
Prevalence of reported paranormal experiences evaluated over the last 40 years in a variety of populations has ranged from a low of 10% in Scottish citizens 20 to a high of 97% in paranormal enthusiasts in the United States 12 . Two very large prevalence studies have been conducted. One surveyed adults in 13 European countries and the United States (n = 18,607). European respondents reported experiencing telepathy (34%), clairvoyance (21%), and contact with the dead (25%). Percentages for the U.S. adults were considerably higher: 54%, 25%, and 30% respectively 21 . Another large study of British adults (n = 4,096) found that 37% of respondents reported at least one paranormal experience, defined as precognition, extra-sensory perception, mystical experience, telepathy, or after-death communication 22 . Other smaller prevalence studies have been conducted around the world. Haraldsson et al. conducted two surveys of prevalence in Iceland, one in 1974 with 902 participants 5 and one in 2006 with 991 participants 3 . They found that reports of psychic phenomena increased from 59% of men and 71% of women in 1974 to 70% of men and 81% of women in 2006. In Scotland, 10-16% of the general population sample (n = 241) had experienced second sight, with the exception of the Grampian area, where prevalence was more than doubled at 33% 20 . Chinese, Japanese, African-American and Caucasian-American college students (n = 1922) were surveyed and 31-47% reported having at least one experience 23 . Of 502 adults in Winnipeg, Canada, 65% 24 reported having at least one experience, as did 38% of 622 Charlottesville, Virginia students and townspeople 25 . In the United States, 67% of the 1460 participants in one survey reported having had an ESP experience, 31% a clairvoyant experience, and 42% contact with the dead 26 . More recently in the United States, 89.3% of the general population, 89.5% of scientists and engineers, and 97.8% of paranormal enthusiasts reported at least one paranormal experience 12 .
Specificity of the work in this field is limited by the lack of questionnaires that adequately separate paranormal belief from experience and do so concisely 1,27 . Using ambiguous measures can confound the two constructs of belief and experience and blur results 1,6 . Instruments that do separate these constructs are long and not conducive to the time constraints of many studies (see the Exceptional Experiences Questionnaire 28 and the Anomalous Experience Inventory 29 ). To address these limitations and as part of a larger research program on extended human capacities, we created the Noetic Experience and Belief Scale (NEBS), a 20-item survey that evaluates paranormal beliefs and experiences separately. The present studies investigate the psychometric properties of the Noetic Experience and Belief Scale in two populations. By studying these phenomena, we aim to gain a deeper understanding of the nature of consciousness and the reach of human potential.
The objectives of the following two observational studies were to evaluate the validity and reliability of the Noetic Experience and Belief Scale (NEBS) and to confirm the two latent variables of belief and experience in a confirmatory factor analysis. In study 1, the survey was administered to 350 participants for the validity and confirmatory factor analyses and again to a subsample of 96 of these participants for a test-retest analysis. In study 2, the survey was administered to a different population where divergent validity was evaluated and the factor model reevaluated. We hypothesized that NEBS would be valid, reliable, and demonstrate good fit for a model with belief and experience as latent variables in both populations.

Methods
Initial development of the NEBS
The NEBS was developed through consensus by the authors and two expert consultants who actively work in the field. This group was informed by our own previous studies and by reviewing other studies and previously used instruments that evaluated paranormal beliefs and/or experiences. One previous study 30 evaluated the prevalence of 27 paranormal experiences listed here in decreasing order of prevalence: Claircognizance, Clairempathy, Precognition, Lucid Dreaming, Emotional Healing, Clairvoyance, Clairsentience, Animal Communication, Telepathy, Aura Reading, Astral Projection, Clairaudience, Clairalience, Mediumship, Channeling, Physical Healing, Geomancy, Retrocognition, Psychometry, Remote Viewing, Automatic Writing, Clairgustance, Psychokinesis, Pyrokinesis, Levitation, and Psychic Surgery (please see extended data for definitions of each of these terms 31 ). Based on feedback from participants and a review of these items, we removed emotional healing (very similar to physical healing), psychic surgery (very rare), and clairsentience (very similar to claircognizance), renamed channeling to psychophony and mediumship to contact with the dead, and added the item Information from Dreams. We then conducted another prevalence study in a different population. Notably, we did not use the jargon term for each paranormal belief/experience, but instead used language that was as neutral as possible to describe the experience itself. For example, rather than asking if the participant had ever experienced "pyrokinesis: the ability to create and/or manipulate fire", the item asked "Have you ever created fire using only your concentration or will?" These neutral language items were then administered to participants consisting of three groups: a general population sample, scientists and engineers, and paranormal enthusiasts 12 .
In both studies, we found that some items were highly correlated and represented overlapping constructs. They could also be viewed as specific nuanced experiences within a larger extended human capacities category. For example, psychic physical healing, or the purported ability to feel other people's physical symptoms in your own body and heal, transform, or transmute them, would fall under the umbrella category of psychokinesis, or the purported ability to influence a physical system without any physical interaction or with mental effort alone. Thus, in an effort to reduce participant burden and allow for quick assessment of experiences and beliefs, we collapsed any overlapping constructs into individual items for each of the following categories: 1. Non-local consciousness (e.g. Astral Projection, Lucid Dreaming); 2. Extraterrestrials; 3. Precognition/Retrocausation; 4. Survival of Consciousness (after bodily death); 5. Contact with the dead (Mediumship); 6. Clairvoyance (Claircognizance, Clairempathy, Clairvoyance, Clairsentience, Aura Reading, Clairalience, Clairaudience, Geomancy, Clairgustance, Remote Viewing, Psychometry, Animal Communication); 7. Psychokinesis (Physical Healing, Psychokinesis, Psychic Surgery, Pyrokinesis, Levitation); 8. Telepathy; 9. Automatism (Channeling, Automatic Writing).
We also reviewed a number of existing questionnaires that measured paranormal experience and/or belief 8,26,28,29,32-45 . A summary of this review is presented as Supplemental data A (please see extended data 31 ). Each questionnaire was evaluated for the number of items, whether it assesses belief, experience or both, whether it evaluates belief and experience as separate constructs, and subscales if applicable. From this review, an additional item on intuition, representing perhaps the most common paranormal experience, was added to the new scale for a total of 10 items.
The instrument was called the Noetic Experience and Belief Scale using noetic from the Greek noēsis/noētikos, meaning inner wisdom, direct knowing, or subjective understanding; and unlike a vague impression, a noetic experience carries a deep sense of authority and certainty. We included "noetic" in the title rather than "paranormal" in part because of the stigma associated with the term paranormal, which could introduce bias that might be mitigated by using an alternate term. Similarly, the paranormal categories were not stated in the scale but only descriptions of the constructs included (please see extended data) 31 .

STUDY 1: General population sample
Procedures
The first study administered the NEBS to a randomly selected general population group in the United States to establish validity and test-retest reliability and to confirm the two latent variables of belief and experience. We contracted with Lucid, LLC (New Orleans, Louisiana) to obtain completed surveys from an unbiased census-distributed sample of 350 participants representative of the general population in the United States. The sample was unbiased in that it was not associated with the Institute of Noetic Sciences or any other paranormal or noetic-related group. Lucid, LLC is a marketplace that connects hundreds of sample suppliers with individual primary research studies to facilitate online surveys. Lucid uses screening questions to qualify respondents for a particular study and then, through programmatic technology, aligns the best suppliers for that individual audience. Once a respondent qualifies through the screener, the appropriate suppliers are notified through an API and an email is triggered from the supplier directly to the survey taker. Each of the suppliers on the marketplace has approximately 200 pre-profiled mapped qualifications. These include age, gender, household income, job role, hobbies, etc. Lucid uses these qualifications as well as the screening questions to ensure efficiency and high quality when matching survey takers with individual projects. All potential volunteers are screened, checked for validity, and emailed a link to the survey. Participants were English-speaking adults in the United States. Inclusion criteria were: adults aged 18 to 89 who could read and understand English and were willing to complete questionnaires. Exclusion criteria were: children (<18 years old) or elders >89 years old. Elders 90 years old and older were excluded because the survey was designed to be anonymous and recording ages greater than 90 is considered private health information 46 .
Targets for distribution were based on United States census values and were as follows: Gender: 50% males and 50% females; Age: 18-24, 13%; 25-44, 41%; 45-64, 30%; 65+, 16%; Ethnicity: Hispanic, 11%; Black, 12%; White (non-Hispanic), 59%; Other, 18%.
The study was approved by the Institute of Noetic Sciences Institutional Review Board # WAHH_2018_06. Participants were given a link to a Health Insurance Portability and Accountability Act compliant survey on SurveyMonkey. The first page of the survey was a consent form (please see extended data 31 ). Participants were asked to read the form and check a box acknowledging that they had been informed of the procedures, and risks and benefits of participating in the study. They then completed the survey, which took approximately 15-20 minutes. Data were collected from November 9, 2018 through December 14, 2018. All data were collected anonymously, with no identifiers or IP addresses. Participants were compensated $3 for completing the survey once and $7 ($3 + $4) if they also participated in the retest administration.
In total, 444 people began the survey; 26 did not agree to the consent form and 57 agreed to the consent form but did not complete the survey. The remaining 361 participants completed the survey (underlying data 47 ). Surveys were collected between November 9, 2018 and December 13, 2018. Participants were on average 44 ± 16.8 years old and had 14.5 ± 5.3 years of education. Of these, 52% were female and 56.8% were in a relationship. Participants were mostly Caucasian (67% Caucasian, 13% Black or African American, 8% Hispanic or Latino, 6% Asian or Pacific Islander, 5% American Indian or Alaskan Native, and 2% preferred not to answer). In terms of salary, 67% of participants had earned between $0 and $75,000 (30% under $30K, 37% $30K to under $75K, 11% $75K to under $100K, 11% $100K to under $150K, 7% $150K to under $250K, 3% $250K or greater, 2% declined to answer), with an average household size of 2.6 ± 1.4.
Of the original 361 participants who completed the survey, 96 completed the same survey again approximately one month after the first administration (mean 35.3 days ± 3.7) between December 14, 2018 and January 2, 2019. Participants who completed the retest had demographics similar to those of the original sample (age 44 ± 16.9 years, education 14.3 ± 2.1 years, 54% male, 64% Caucasian, 53% in a relationship, 74% with income under $75,000, and average household size of 2.4 ± 1.4).

Measures
In addition to demographic information (e.g. age, gender, marital status, socioeconomic status), the main instrument in the survey was the Noetic Experience and Belief Scale (NEBS). The scale contains ten statements about beliefs in intuition, non-local consciousness, extraterrestrials, precognition, survival of consciousness, contact with the dead, clairvoyance, psychokinesis, telepathy, and automatism that all begin with the stem "I believe…" followed by a description of the concept. The participant rates each belief statement on a slider anchored by Disagree Strongly (0) and Agree Strongly (100). For each of the ten items, participants also answered "I have personally had this experience." on a slider scale anchored by Never (0) and Always (100). Two experience items were worded differently to accommodate the nature of the concept. The life after death experience item was worded "I have personally had an experience that I interpreted as a proof that consciousness survives the physical body." and the contact with the dead item was worded "I have personally had the experience of contact with the dead." Six of the 10 items were from the Australian Sheep-Goat Scale, three of which were exactly the same (#'s 9, 10, 11), and three were modified (#'s 4, 5, 14) 45 . The scale yields overall scores for paranormal belief and experience by averaging the ten items for each subscale. Item scores can also be used individually as scores for each specific category. Internal consistency of the NEBS scale was calculated with a Cronbach α coefficient, as described in subsequent sections (the full scale is available in extended data 31 ).
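The scoring rule described above (each subscale score is the mean of its ten 0-100 slider items) can be sketched as follows. This is a minimal illustration, not the authors' scoring script; the function name `score_nebs` and the example ratings are hypothetical.

```python
from statistics import mean


def score_nebs(belief_items, experience_items):
    """Return (belief, experience) subscale scores as the mean of ten items each."""
    for items in (belief_items, experience_items):
        if len(items) != 10:
            raise ValueError("each NEBS subscale has exactly 10 items")
        if any(not 0 <= rating <= 100 for rating in items):
            raise ValueError("slider ratings must lie between 0 and 100")
    return mean(belief_items), mean(experience_items)


# Made-up ratings for a single participant (not study data).
belief = [80, 60, 40, 70, 90, 50, 30, 20, 60, 10]
experience = [70, 30, 0, 40, 50, 20, 10, 0, 30, 0]
belief_score, experience_score = score_nebs(belief, experience)
print(belief_score, experience_score)  # 51 25
```

Individual item ratings can still be inspected directly when a score for a single category is needed.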
Convergent construct validity was measured by administering pre-existing survey instruments that evaluate similar concepts: the Australian Sheep-Goat Scale 45,48 , the Revised Paranormal Belief Scale 43 , and the Anomalous Experiences Inventory (AEI) 29 . The Australian Sheep-Goat Scale is an 18-item questionnaire on various beliefs and experiences. Respondents endorse True (2 points), Uncertain (1 point), or False (0 points) for each item. Values are then summed to form a score ranging from 0-36. The Revised Paranormal Belief Scale is a 26-item scale that measures the degree of belief in the paranormal in each of seven dimensions: Traditional Religious Belief, Psi, Witchcraft, Superstition, Spiritualism, Extraordinary Life Forms, and Precognition. Respondents endorse how strongly they believe in each item on a 7-point Likert scale. Subscales and a total score are obtained by calculating means of specific items. The AEI is a 70-item questionnaire that evaluates multiple subscales: anomalous/paranormal experience, anomalous/paranormal beliefs, anomalous/paranormal ability, fear of the anomalous/paranormal, and drug use. Respondents answer True (1) or False (0) for each item and values are summed for each scale. The scales selected have already been assessed as valid and reliable and have been used in numerous peer-reviewed publications. Correlation matrices of the scores were evaluated for expected patterns of associations between measures of the same construct.
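As an illustration of the summed scoring described for the Australian Sheep-Goat Scale (18 items, each worth 2, 1, or 0 points, for a total of 0-36), a small sketch is given below. The function name and the example answers are hypothetical, and any item-specific scoring details in the published scale are not modeled here.

```python
# Points per response option, as described in the text.
ASGS_POINTS = {"True": 2, "Uncertain": 1, "False": 0}


def score_asgs(responses):
    """Sum the points for 18 True/Uncertain/False responses (range 0-36)."""
    if len(responses) != 18:
        raise ValueError("the Australian Sheep-Goat Scale has 18 items")
    return sum(ASGS_POINTS[answer] for answer in responses)


# Made-up answers: six of each response option.
answers = ["True"] * 6 + ["Uncertain"] * 6 + ["False"] * 6
total = score_asgs(answers)
print(total)  # 18
```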

Statistical methods
Test-Retest. Some participants repeated the survey approximately one month later so that test-retest reliability could be assessed with a Pearson correlation coefficient.
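A Pearson correlation between first and second administrations can be computed as in the sketch below. It uses only the standard library and made-up subscale scores for five hypothetical participants; it is not the Stata code used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical Belief subscale scores at test and retest (not study data).
time1 = [51.0, 20.5, 77.0, 35.0, 64.5]
time2 = [48.0, 25.0, 80.0, 30.0, 60.0]
r = pearson_r(time1, time2)
print(round(r, 2))
```

Values of r close to 1 indicate that participants kept roughly the same rank and spacing across the two administrations.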
Sample size. Some sources suggest at least 10 people per item for psychometric validation although a recent review suggested that sample size is rarely justified a priori 49 . We aimed for a sample size of 350 for the 20-item scale. For confirmatory factor analysis, there is also no agreement on the number of participants needed although sources 50 recommend approximately 10 participants for each estimated parameter (10 × 20 parameters = 200). We had 361 participants resulting in a ratio of 18.05 participants to each parameter estimated.
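The sample size arithmetic in this paragraph can be checked in a few lines. The variable names are illustrative; the numbers come from the text (10 participants per estimated parameter, 20 parameters, 361 completed surveys).

```python
participants_per_parameter = 10  # rule of thumb cited in the text
n_parameters = 20                # one per NEBS item
n_participants = 361             # completed surveys in study 1

minimum_n = participants_per_parameter * n_parameters
ratio = n_participants / n_parameters
print(minimum_n, ratio)  # 200 18.05
```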

Confirmatory factor analysis.
A confirmatory factor analysis was used (rather than an exploratory factor analysis) because a theoretical framework was already established for evaluating belief and experience as separate, albeit highly correlated, constructs 1,12 . The latent variables for the model were Belief and Experience. Observed variables were the 20 NEBS items. Univariate variables were tested for normality with the Shapiro-Wilk test and any outliers were assessed with scatter and box plots. Normality of residuals was evaluated with kernel density estimates and standardized normal probability plots. Outliers were evaluated with residuals, leverages, influence and Cook's distance. Multicollinearity was evaluated with the variance inflation factor (VIF), which is the quotient of the variance in a model with multiple terms by the variance of a model with one term alone and quantifies the severity of multicollinearity. An unstructured covariance matrix was used so as to not impose any constraints on the variance and covariance values. All 20 items were highly correlated; thus, covariances between unique factors for all items were included in the model and then removed if they did not reach significance. All statistical analyses were conducted with Stata 15.0 (StataCorp, LLC, College Station, TX).
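The VIF has an equivalent and commonly used formulation: for variable j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the others, and this equals the j-th diagonal element of the inverse of the variables' correlation matrix. The sketch below uses that formulation in pure Python. It is not Stata's routine, and the function names and the three-variable correlation matrix are made up for illustration.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlations(corr):
    """VIF of each variable = diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(corr))]


# Hypothetical correlations: variables 1 and 2 correlate at 0.6,
# variable 3 is uncorrelated with both.
corr = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
vifs = vif_from_correlations(corr)
print([round(v, 4) for v in vifs])  # [1.5625, 1.5625, 1.0]
```

A VIF of 1 means a variable shares no variance with the others; values above roughly 10 are the conventional warning sign referred to later in the results.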

Construct validity
The means and standard deviations for the paranormal belief and experience questionnaires are shown in Table 1. All correlation pairs in Table 1 were positive and significant at the p = 0.05 level or lower (all but three at p < 0.00005).

Reliability
Internal consistency. Cronbach's alpha was calculated for the NEBS Belief subscale items and Experience subscale items to measure the extent to which the items within the subscales correlated with each other and measured a similar construct 51 . The ten belief items had a Cronbach's alpha of 0.90 and average inter-item covariance of 429.9. The ten experience items had a Cronbach's alpha of 0.93 and average inter-item covariance of 610.7.
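Cronbach's alpha for k items is k/(k-1) × (1 − Σ item variances / variance of the total score). The sketch below applies that standard formula to made-up ratings for three items and four respondents; the function name is hypothetical and the data are not from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of items, each a list of ratings
    with one rating per respondent (same respondent order in every item)."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across all items.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    sum_item_variances = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Made-up 0-100 ratings: 3 items x 4 respondents.
items = [
    [10, 40, 60, 90],
    [20, 30, 70, 80],
    [0, 50, 50, 100],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # 0.96
```

Alpha approaches 1 as the items move together across respondents, which is the sense in which the subscale items "measured a similar construct".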
Belief: On average, intuition, survival of consciousness, and non-local consciousness were the highest rated Beliefs (see means and standard deviations for each item in Table 2). All Belief construct pair correlations were significant (p < 0.00005). Telepathy Belief and clairvoyance Belief were highly correlated.
Experience: On average, intuition and non-local consciousness were the most common Experiences. All Experience pairs were significantly correlated (p < 0.00005). Seven Experience pairs were highly correlated (r = 0.70-0.89). Many Experiences were moderately correlated (r = 0.50-0.69).
Belief and Experience: Most Belief and Experience pairs were significantly correlated at the p<0.000005 level except for belief in intuition and the experience of extraterrestrials (p = 0.0004), contact with the dead (p = 0.0001), psychokinesis (p = 0.002), telepathy (p = 0.002), and automatism (p = 0.0016). Belief in telepathy was highly correlated with the Experience of telepathy (r = 0.79). Belief and Experience pairs of the same construct were all moderately correlated except for Survival of Consciousness which had a significant but low correlation. Many beliefs were moderately correlated with Experiences.
Belief and Experience as separate constructs
Confirmatory factor analysis was performed based on data from 361 respondents; there were no missing data. The retest data of the 96 participants were not included in the confirmatory factor analysis modeling. A correlation table of observed values with means and standard deviations is shown in Table 2. The a priori theoretical model of Belief and Experience items as described in the statistics section is presented in Figure 1.
We hypothesized a two-factor model to be confirmed in the measurement portion of the model where Belief and Experience were the latent variables. We evaluated the assumptions of univariate and multivariate normality and linearity. Univariate variables were not normally distributed individually. The asymptotic distribution free (ADF) estimation method was used because it makes no assumption of joint normality or even symmetry for observed or latent variables (StataCorp, 2013). Residuals were normally distributed. There were no observations with a Cook's distance greater than 1. No variable had a VIF less than 0.1 or greater than 10 (average VIF for all variables 3.29), indicating acceptable multicollinearity. The model chi-square (159) was 283.1 (p<0.00005), the root mean square error of approximation (RMSEA) was 0.060 (90% confidence interval 0.051-0.069), the comparative fit index (CFI) was 0.94, the standardized root mean squared residual (SRMR) was 0.13, and the Tucker-Lewis fit index (TLI) was 0.90. These values represent a good fit of the model to the dataset as indicated by commonly reported fit statistics (RMSEA < 0.08, CFI ≥ 0.90, SRMR < 0.08, TLI ≥ 0.95) 52 .

Test-retest reliability:
The NEBS had high test-retest reliability for both the Belief (r = 0.83, p < 0.00005) and Experience (r = 0.77, p < 0.00005) subscales. The Wilcoxon signed-rank test was used to evaluate individual item and subscale differences because variables were not normally distributed. No item or subscale scores differed significantly between the two time-points except for the telepathy Experience item, which decreased at the second administration (Table 3). Individuals' responses to the subscales remained relatively consistent across the repeated administration and above the standardly accepted value for reliability of r = 0.70 53 .

STUDY 2: IONS Discovery Lab sample
Methods
Procedures. The NEBS was then administered to participants attending workshops at the IONS Discovery Lab and also online. Participants were recruited through workshop leaders hosting events at the IONS EarthRise Retreat Center in Petaluma, California and through workshop leaders hosting events at their own sites who contacted IONS to participate in the study. This study was approved by the IONS Institutional Review Board. Participants had to be adults (aged 18 years and above) who were able to understand the consent form, were willing and able to complete the measures, and did not have an acute or chronic illness that precluded completion of the survey. Participants enrolled in the IONS Discovery Lab completed a number of surveys including the NEBS prior to their workshops. First, participants read and agreed to the consent form. Then data were collected anonymously through SurveyMonkey.

Measures.
Relevant measures used to establish NEBS divergent validity, which tests whether concepts or measurements that are not supposed to be related are actually unrelated, were: the Arizona Integrative Outcomes Scale 54 , the Positive and Negative Affective Well-being Scale 55 , single-item general health 56 , the acute sleep quality scale 57 , the Numeric Pain Rating Scale 58 , the Big Five Inventory-10 scale 59 , and the compassion subscale of the Dispositional Positive Emotion Scale 60 .
The Arizona Integrative Outcomes Scale (AIOS) is a one-item, visual analogue self-rating scale (VAS) with two alternate forms (one for daily ratings, AIOS-24h; and one for monthly ratings, AIOS-1m). The daily rating version was used for this study. The instructions are: "Please reflect on your sense of wellbeing, taking into account your physical, mental, emotional, social, and spiritual condition over the past 24 hours. Mark the line below with an X at the point that summarizes your overall sense of well-being for the past 24 hours." The horizontally-displayed VAS is 100 mm in length, with the low anchor being, "Worst you have ever been" and the high anchor being, "Best you have ever been." The AIOS has demonstrated the ability to discriminate between healthy and unhealthy populations and has adequate convergent and divergent validity 54 .
Positive and negative affective well-being is measured with a variety of dichotomous indicators asking subjects whether they had experienced an emotional state for much of the day yesterday. For positive affect, the emotional states are happiness, enjoyment and smiling/laughter, which, aggregated together, have a reliability of α = 0.72. For negative affect, the emotional states are stress, worry and sadness, with a reliability of α = 0.65 55 .
Overall health is a single item question "In general, how would you rate your overall health?" which is answered by choosing one of five options: Poor; Fair; Good; Very good; Excellent 56 .
Acute sleep scale is a single item scale asking participants to rate their quality of sleep over the past 24 hours on an 11-point numeric rating scale ranging from 0 denoting "best possible sleep" to 10 denoting "worst possible sleep" 57 .
The Numeric Pain Rating Scale (NPRS) is a segmented numeric version of the visual analog scale in which a respondent selects a whole number (0-10 integers) that best reflects the intensity of his/her pain. The NPRS is anchored by terms describing pain severity extremes. Participants are asked to report pain intensity "in the last 24 hours" or an average pain intensity with 0 = "No pain" to 10 = "Worst possible pain" 58 .
Big Five Inventory-10 (BFI) scale is a 10-item measure of the Big Five (or Five-Factor Model) dimensions: Neuroticism, Extraversion, Openness to Experience, Agreeableness, Conscientiousness. The BFI-10 was developed to provide a personality inventory for research settings with time constraints. It allows assessing the Big Five with only two items per dimension. Previous research has shown that the BFI-10 possesses psychometric properties that are comparable in size and structure to longer five factor inventories such as the NEO-PI-R which has 240 items. The score for each dimension is obtained by summing standard items and reverse scored items for each scale 59 .
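Summing one standard and one reverse-scored item per dimension can be sketched as below. The item-to-dimension key and the 1-5 response range are assumptions for illustration only and should be checked against the published BFI-10 scoring instructions 59 before use; the function name and the example responses are hypothetical.

```python
# ASSUMED key: dimension -> (standard item number, reverse-scored item number).
# Verify against the published BFI-10 scoring key before relying on it.
BFI10_KEY = {
    "Extraversion": (6, 1),
    "Agreeableness": (2, 7),
    "Conscientiousness": (8, 3),
    "Neuroticism": (9, 4),
    "Openness": (10, 5),
}
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point response format


def score_bfi10(responses):
    """responses maps item number (1-10) to a rating; returns dimension sums."""
    scores = {}
    for dimension, (standard, reverse) in BFI10_KEY.items():
        # Reverse scoring flips the rating around the scale midpoint.
        reversed_rating = SCALE_MIN + SCALE_MAX - responses[reverse]
        scores[dimension] = responses[standard] + reversed_rating
    return scores


# Made-up responses for one participant.
responses = {1: 2, 2: 4, 3: 1, 4: 5, 5: 3, 6: 4, 7: 2, 8: 5, 9: 1, 10: 3}
scores = score_bfi10(responses)
print(scores)
```

With two items per dimension on a 1-5 format, each dimension score ranges from 2 to 10.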
Compassion scale is 5 items from the Dispositional Positive Emotion Scale compassion subscale. It measures dispositional tendencies to feel positive emotions toward others in their daily lives. Items are rated from strongly disagree to strongly agree and scored from 1 to 7. Items are averaged for a total score and higher scores indicate greater levels of positive emotion 60 .
Statistical Analysis. Demographic information was qualitatively described for categorical variables. Means and standard deviations were calculated for all continuous variables. Pearson correlations were conducted for relationships between measures. Cronbach's alpha was calculated for the Belief and Experience subscales. All analyses were conducted with Stata 15.0 (StataCorp, LLC, College Station, TX). The confirmatory factor analysis was conducted in the same way as in study 1.

Results
The NEBS Belief items had a Cronbach's alpha of 0.93 and an average inter-item covariance of 304.4. The NEBS Experience items had a Cronbach's alpha of 0.91 and average inter-item covariance of 476.4. The experience scale was moderately correlated with the belief scale in this sample (Table 4).

Discussion
The overall results of the two studies provide psychometric support for the validity and reliability of the NEBS as a brief assessment of self-reported paranormal beliefs and experiences. When measured separately, Belief and Experience are highly correlated. We found this in both of our samples (study 1: r = 0.77; study 2: r = 0.64). The AEI Paranormal Belief and Paranormal Experience subscales were highly correlated in our study 1 sample as well (r = 0.77). Interestingly, the original study of this scale found a much lower (r = 0.57) although significant correlation between the two subscales 29 . We also found belief and experience to be highly correlated (r = 0.61) for another mixed population of scientists and engineers, the general population, and paranormal enthusiasts 12 . Other studies that have evaluated belief and experience in general have also found positive correlations 17,18 . A study examining the correlation between specific religious and classic paranormal beliefs, such as belief in heaven and hell or psychic healing, in relation to the paranormal experiences of illness cured by prayer and the use of the mind to heal the body, found mixed results. For example, belief in the devil and belief in illness cured by prayer had a low significant correlation (r = 0.38), but the relationship between illness cured by prayer and the belief in psychic healing (r = -0.04) was not significant 40 .
Paranormal belief and experience are highly correlated in most studies that assess them, and yet they are distinctly different constructs that should be evaluated separately. What we do not yet understand is the causal or temporal nature of the relationship between belief and experience. Does paranormal belief precede experience or vice versa? Does someone's belief in the paranormal prime them to experiencing it or does a subjective experience of the paranormal instill belief in the phenomena? Future longitudinal studies evaluating a baseline level of people's beliefs and collecting data on how those beliefs change over time in relation to any experiences they have would be helpful in answering this question.
There are a number of limitations that should be kept in mind when reviewing the results of this study. The individual constructs included in the NEBS are highly correlated. Conceptually, the individual concepts are unique but could also be viewed as overlapping. For example, the items on non-local consciousness (B2. I believe that my consciousness is not limited by my physical brain or body. E2. I have personally had this experience.) and survival of consciousness (B5. I believe in life after death. E5. I have personally had an experience that I interpreted as a proof that consciousness survives the physical body.) could be considered as the same construct worded in a different way. The experience items are administered directly after the belief item of the same construct. The instrument was purposefully designed in this way to keep it concise. However, asking the belief question directly before the experience question could bias responses to the experience question in some way. We also acknowledge that the limited objective format of the survey (answered with a slider from 0-100) with constrained definitions is limiting. A more in-depth phenomenological approach would surely provide greater nuance and depth of understanding of belief and experience. However, the nature of such an instrument in terms of administration and scoring would not solve the problem of needing a simple and concise instrument. Others have suggested that paranormal beliefs stem from abnormal brain function or psychopathology such as Dissociative Identity Disorder or Schizophrenia. The NEBS focuses on the phenomenology of paranormal beliefs and experiences regardless of any pathology that may have generated them. Any NEBS results should be interpreted with these limitations in mind.
In summary, the NEBS is a 20-item survey rated on a sliding scale from 0-100, with 10 Belief and 10 Experience items. Both subscales demonstrated convergent validity, internal consistency, and test-retest reliability. A confirmatory factor analysis model demonstrated a good fit for Belief and Experience as separate latent variables. This model was confirmed in another sample where divergent validity was also established. The NEBS is a concise, valid, and reliable instrument for evaluating individual differences in paranormal beliefs and experiences. This measure provides a new tool for rigorously investigating these beliefs and experiences, and their relationship (as predictors, outcomes, or covariates) with other variables of interest such as psychological well-being, physical health, effects of interventions, coping with death and dying, grief and trauma resilience, and extended human capacities, just to name a few.

Open Peer Review
Overall comments: This measure is a valuable contribution to the existing literature that tends to examine beliefs and experiences separately or conflate experiences with beliefs in the measurement of both. However, there are some limitations. It is not currently clear which order the scales were presented to participants to ascertain the convergent construct validity in study 1. The inclusion of these other measures may have influenced responding to the NEBS if presented prior to the NEBS. This should be clarified herein. In terms of the measure itself, I think it would have been an improvement to randomly present items for the belief and experience versions of the same set of phenomena, and include more neutral language for the experience terms, which, I think, still implicitly address beliefs and certainly conflate experience with appraisal, when this could have been further teased apart. Although this cannot be done on the current dataset, I think the authors should consider these issues in their discussion and note that the way that the items are worded and the measure was presented may have inflated the correlations between the 2 factors/subscales. I would also be interested to see the emergent results of EFA to see if the traditional distinctions between ESP, mind-matter interactions and survival play out within the dataset. The authors should also discuss the emergence of 2 factors for their CFA, as factors are statistical and not psychological. The belief questions and experience questions are framed as two parts to one overall type of experience, which immediately sets them apart as two things. I would like to see this addressed in their discussion with regard to how the measure will be developed moving forward. More detailed comments are included below.

Introduction:
I am not convinced that experience is necessarily a memory of an experience that one judges to be genuine. I think that there is evidence for folk to report an anomalous experience that could feasibly fit the category of paranormal without it necessarily being labeled/appraised as such and that this is demonstrated by research using the Irwin et al measure of anomalous experiences and appraisals. One might argue that all experience is being actively constructed as it is happening in addition to the memory of that experience. I think that the authors might think about adding another subscale (3rd part to each question) to future versions of this measure that reflects the experience without a paranormal attribution. This might be addressed in the discussion. I think that the relatively recent measure by Irwin et al. should be more thoroughly discussed in your literature review and more rationale given here for the development of the current measure as a viable alternative addition. The Irwin measure includes the possibility for several different appraisals of an ostensibly paranormal experience that is described phenomenologically rather than including a paranormal interpretation in the item description. It is commendable that the authors attempted to describe these phenomena without using the term paranormal, which may prime or bias responses to questions like this. The term noetic usually refers to experiential and direct knowledge, which I think encompasses experiences rather than beliefs which are usually understood to be more cognitive, but I think the terminology for the scale is fine as it is.

Method:
A lot of good design and methodological considerations have been included herein. This type of design is not usually referred to as an observational study in psychology. A more commonly used description would be a correlational design. Who were the consultants who helped the authors develop the measure? It seems that some of the items were derived from the existing literature which is commendable. With that said, it does seem as though you are including some items that are not usually included as traditional parapsychological experiences, i.e., extraterrestrials, alongside others that are. This might be addressed later in terms of how the items behave as part of the questionnaire and which relate more to the overall measure/subscales. What was the order of presentation of the variables for study 1 and study 2? Ideally, NEBS should be first, so that responding was not influenced by scoring on the other paranormally relevant measures in study 1. Although I think this is a valuable addition to the literature, I am not convinced that this measure completely addresses the very real problem of the separate measurement of beliefs and experiences of the same phenomena (although it is a step in the right direction). The items are presented together, with the belief item first. I think it would have been preferable to have presented the items in a random order overall and with the deployment of yet more neutral language and item construction. Many of the experience items are unfortunately conflated with belief because the paranormal belief always comes first (as the authors acknowledge in terms of possible response bias). In addition, the language of the experience item actually incorporates a paranormal interpretation (belief in something paranormal is followed by items asking if the respondent has experienced the paranormal phenomenon as denoted in the preceding belief item).
This is problematic because the two subscales are more dependent from the outset than they might otherwise be. It would also be difficult for a skeptical person to respond to the experience items affirmatively because the experience cannot be interpreted neutrally or mundanely; in other words, the measure may be set up for believers to respond affirmatively to experiences. I think it would be preferable to have matched all the paranormal phenomena in terms of belief and equivalent experiences but used separate and neutral language for the experience items. Another way forward might be to add a 3rd part to the questions that would tap experiences (described neutrally, i.e., without any appraisal as part of the question). For example, although it is neatly presented with 2 parts to each paranormal phenomenon, it might work better to have the items presented in a more neutral way. E.g., some people have reported ____ which many interpret as ____ and then having a belief item and an experience item below it?
The authors should acknowledge this in their discussion and consider ways that they might adjust the measure in the future.

Results:
Were the data for the first sample normally distributed? Do you have any data for gender breakdown, etc. to provide norms for these scales? Your Cronbach's alpha is generally very high which is commendable; it seems to be tapping something common and consistent, namely paranormal ideation. I think it would be valuable to explore individual items within each subscale as well as overall subscales for Cronbach's alpha to explore how well each item performs. As this paper seems to be focusing on the development of a new measure, I think it would be useful to spend some time on an item analysis and considering whether all of the items you have included in the measure should remain as part of the scale. An item analysis would usually look to see how the removal of a given item might influence the overall Cronbach's alpha. Looking at the correlations between items, it does seem that some of your items perform less well than others, in particular the item describing intuition and possibly also the item referring to extraterrestrials. I suggest that this should be discussed in your results and discussion sections with some rationale given for why all of these items were retained. It may be that this was addressed in the prior studies conducted by and referred to by the authors, but it does seem that additional items were added prior to the current study (intuition). I think it makes sense why a CFA has been run, but I would also like to see the results of an EFA, perhaps as a footnote, to explore emergent factors regarding how the different items work together (e.g., do we see belief and experience emerging here too, and do we see traditional categories of subjective paranormal phenomena emerging? Are they different for belief and experience?). The authors should further discuss the emergence of 2 factors for their CFA, as factors are statistical and not (necessarily) psychological.
The belief questions and experience questions are framed as two parts to one overall type of experience, which immediately sets them apart as two things. I would like to see this addressed in their discussion with regard to how the measure will be developed moving forward. Are there plans to explore how a longer period of time between the first and second mailings impact the test-retest data? One month is a fairly short period of time and I wonder how the measure would perform with a longer gap. The second mailing included only 26% of those in the first mailing, which is on the low side, although it is noted that the demographics are equivalent. It is commendable that the authors attempted to explore divergent reliability in the second sample. However, the second sample is very biased in terms of gender, age, affluence, etc. This should be noted in the discussion. The variables that you explored here do not relate to NEBS for this particular sample but may relate to other more representative samples. Which order were the scales presented in? Was NEBS first or after these other measures? The second sample seems to be very different to the first as you note that experience and beliefs are both much higher in this second sample.
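The item analysis suggested above, in which Cronbach's alpha is recomputed with each item removed in turn, can be sketched as follows. This is an illustrative Python example, not an analysis from the paper. The function names and the four-item data set are hypothetical, with the fourth item deliberately constructed to relate only weakly to the others.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of k lists, one score per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance(scores) for scores in items]
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))

def alpha_if_item_deleted(items):
    """Return a list whose entry i is alpha recomputed without item i."""
    if len(items) < 3:
        raise ValueError("need at least three items so two remain after deletion")
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical responses: four items, five respondents, 0-100 slider scale.
# The last item is made up to track the others poorly.
items = [
    [10, 30, 50, 70, 90],
    [20, 40, 40, 80, 100],
    [0, 20, 60, 60, 80],
    [50, 10, 90, 20, 40],
]
full_alpha = cronbach_alpha(items)
dropped = alpha_if_item_deleted(items)
for index, value in enumerate(dropped, start=1):
    direction = "higher" if value > full_alpha else "lower"
    print(f"without item {index}: alpha = {value:.3f} ({direction} than full scale)")
```

An item whose removal raises alpha noticeably is a candidate for closer inspection, although the decision to retain it may also rest on conceptual grounds.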

Discussion:
The questionnaire has potential but has some limitations. The authors should talk more about how the measure might be developed/tested in the future. The authors include a very brief discussion on the relationship between experiences and beliefs in their discussion. There is a lot of research on this topic, particularly in terms of recent research on appraising ostensibly paranormal events as being at the heart of the generation of beliefs. I think that Pekala's (2000) work on the levels of anomalous experiences would be a valuable addition here or in the literature review, as they talk about anomalous sensations, anomalous experiences, and anomalous beliefs and how they relate to one another. They note that it is very possible to have anomalous sensations and experiences in the absence of holding paranormal beliefs and vice versa and this should be included in the paper.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
1 Heinrich-Heine-University Düsseldorf, LVR-Klinikum Düsseldorf, Düsseldorf, Germany
2 Florey Neuroscience Institutes, Melbourne, Australia
The authors responded comprehensively to the comments and made corresponding changes in the manuscript.
As the authors agree, however, that E-statements are essentially B-statements and that this is intrinsic in all phenomenological surveys that ask participants to reflect on their experiences, it remains unclear why it is stated in the manuscript that the aim of the study was to differentiate beliefs from experiences rather than to focus on the validity and reliability of the scale.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Malcolm Schofield
Department of Psychology, University of Derby, Derby, UK
I want to thank the authors for the changes they made to the paper and their thoughtful comments regarding the lack of an EFA on the first sample. The suggestion of an EFA was simply a methodological consideration to further strengthen the paper; I am aware that an EFA is not always considered to be the best way of confirming a model when theory strongly suggests a particular factor structure. However, it can introduce unexpected results that can contribute to the current theory. I also apologise for my unfortunate use of the phrase 'this would probably not give the desired outcome'. I also take the authors' point ('If a CFA fits well and satisfies all assumptions but an EFA indicates that the underlying structure of the data can be represented by different factors, is this sufficient evidence to discard the original CFA') that this is not sufficient evidence to discard the CFA. I am happy to accept the paper based on the current changes, but I still feel that the way the experience items are worded and presented is a weakness with this paper and that an EFA could provide some interesting results in terms of theory behind how the relationship between belief and experience is conceptualised. However, this paper does provide a much-needed way of measuring both belief and experience.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
© 2020 Seitz R. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Rudiger Seitz
1 Heinrich-Heine-University Düsseldorf, LVR-Klinikum Düsseldorf, Düsseldorf, Germany
2 Florey Neuroscience Institutes, Melbourne, Australia
This is the careful revision of a thorough study on a 20-item questionnaire for assessing paranormal beliefs using a visual analogue scale with 10 items on belief and 10 items on experience. The test was applied to 361 subjects in an observational study and to a second group of 646 control subjects recruited in their lab. Statistical tests were used to determine validity, reliability, internal consistency as well as test-retest reliability. The results show that the questionnaire matches these quality measures. The study is novel and timely. But there still is room for improvement.
The authors call their scale noetic. Noetic refers to the philosophical concepts of mind, understanding or intellect and was elaborated as a notion of high-level scientific thinking in the nineteenth century. The authors quote the ancient etymology on page 4. Accordingly, the title of the study in its present form suggests a broad belief concept comprising both natural and potentially paranormal beliefs. Since the authors treat noetic, however, as synonymous with paranormal and since paranormal beliefs are the content of the paper, noetic is misleading for the title of this paper and ought to be replaced by e.g. paranormal.
Likewise, the authors fail to provide a greater picture of what a belief is. In fact, natural or normal beliefs are essential products of brain function (Seitz et al. in this journal; cited in the paper). In contrast the paranormal beliefs of the kind the authors address in this study are considered to result from abnormal brain function. As paranormal beliefs are the objective of the study, the questionnaire is not applicable as a general or comprehensive belief scale. This should be stated in the discussion.
Right from the first sentence of the introduction the authors point out that they focus on "paranormal beliefs". Also, the authors suggest that belief pertains to the abnormal or paranormal and provide a good survey about similar studies. But it remains unclear what abnormal beliefs are in comparison to delusion-like beliefs as investigated in the normal population by Pechey and Halligan (2011).
The scale items (presented in Appendix I) consist of statements beginning with "I believe …" (B-statement) and throughout with one statement "I have personally had this experience" (E-statement). In contrast to what the authors note the high correlation of the ten B- and E-statements is not surprising, as both statements involve the same types of neural information processing. The B-statements reflect a personal inference explaining a previous perception or experience of high personal relevance, which is a belief as described earlier by Seitz and Angel in this journal (cited in the paper). Therefore, it is justified that the authors call these phrases B-statement. For comparison, the E-statement focuses semantically onto an experience of a paranormal perception. Asking the subjects to approve or deny this statement, however, requires that the subjects recall this very perception or experience. This recall from memory has a
RS: The authors call their scale noetic. Noetic refers to the philosophical concepts of mind, understanding or intellect and was elaborated as a notion of high-level scientific thinking in the nineteenth century. The authors quote the ancient etymology on page 4. Accordingly, the title of the study in its present form suggests a broad belief concept comprising both natural and potentially paranormal beliefs. Since the authors treat noetic, however, as synonymous with paranormal and since paranormal beliefs are the content of the paper, noetic is misleading for the title of this paper and ought to be replaced by e.g. paranormal.
-We have changed the title of the paper to "Measuring extraordinary experiences and beliefs: A validation and reliability study."
RS: Likewise, the authors fail to provide a greater picture of what a belief is. In fact, natural or normal beliefs are essential products of brain function (Seitz et al. in this journal; cited in the paper). In contrast the paranormal beliefs of the kind the authors address in this study are considered to result from abnormal brain function. As paranormal beliefs are the objective of the study, the questionnaire is not applicable as a general or comprehensive belief scale. This should be stated in the discussion.
-We have added these sentences to the discussion "Others have suggested that paranormal beliefs stem from abnormal brain function or psychopathology such as Dissociative Identity Disorder or Schizophrenia. The NEBS focuses on the phenomenology of paranormal beliefs and experiences regardless of any pathology that may have generated them."
RS: Right from the first sentence of the introduction the authors point out that they focus on "paranormal beliefs". Also, the authors suggest that belief pertains to the abnormal or paranormal and provide a good survey about similar studies. But it remains unclear what abnormal beliefs are in comparison to delusion-like beliefs as investigated in the normal population by Pechey and Halligan (2011).
-Pechey and Halligan comment in their abstract that "Delusions are defined as false beliefs different from those that almost everyone else believes." Their study shows that many of these beliefs are commonly believed by the general population and thus, should not be considered delusions. In fact, their item "The soul or spirit survives death" corresponds with the NEBS item #2 and "Some people communicate with the dead" corresponds to NEBS item #6 and both these items were commonly endorsed (64% and 55%, respectively) in Pechey and Halligan's study. We also never use the word "abnormal" in the paper. We are not saying that paranormal beliefs are abnormal anywhere in the manuscript. As Pechey and Halligan's paper supports, some of these beliefs are "normal" in that many in the population believe them. Also, the first DSM-5 schizophrenia diagnosis criterion revolves around the symptoms that may include delusions but do not have to. There must be two or more symptoms of delusions, hallucinations, disorganized speech, grossly disorganized or catatonic behavior, or negative symptoms for at least one month (with at least one of those symptoms being delusions, hallucinations, or disorganized speech). Most important in this conversation is that the symptoms must impair the person's work, interpersonal relations, or self-care for a significant amount of time since the symptoms began. The symptoms must also be ongoing for at least 6 months and other mental illnesses and effects from substances or medical conditions must be ruled out. Other symptoms that contribute to a schizophrenia diagnosis beyond these key criteria include: inappropriate emotions, disturbed sleep, negative mood, anxiety and phobias, detachment or feeling of disconnection from self, a feeling that the surroundings aren't real, impairments in language, cognitive processing and memory, social deficits, and hostility and aggression.
The goal of our study was not to highlight pathology or abnormal function but to provide a tool to quickly evaluate the phenomenology of experience and beliefs of this nature.

RS:
The scale items (presented in Appendix I) consist of statements beginning with "I believe …" (B-statement) and throughout with one statement "I have personally had this experience" (E-statement). In contrast to what the authors note the high correlation of the ten B- and E-statements is not surprising, as both statements involve the same types of neural information processing. The B-statements reflect a personal inference explaining a previous perception or experience of high personal relevance, which is a belief as described earlier by Seitz and Angel in this journal (cited in the paper). Therefore, it is justified that the authors call these phrases B-statement. For comparison, the E-statement focuses semantically onto an experience of a paranormal perception. Asking the subjects to approve or deny this statement, however, requires that the subjects recall this very perception or experience. This recall from memory has a probability of being true for the subject which also reflects a belief and would be expressed correspondingly by "I believe that I have personally had this experience". Consequently, this renders the E-statements to be essentially B-statements as well.
-Yes, this is a good point. We believe this is intrinsic in all phenomenological surveys that ask participants to reflect on their experiences.

RS:
The authors present nicely the development of their scale (pages 3 through 4). This should go under a headline of its own. For comparison, study 1 refers to the procedures described on pages 4 through 5. Therefore, the headline of study 1 should be moved accordingly.
-Thank you, we have moved the Study 1 heading to below the development of the scale.
RS: It is unclear who the 899 participants are (page 4 left column). If this pertains to both study populations, why is this number mentioned here? Please, clarify. Likewise, it is unclear, in which subgroups these participants were analyzed.
-Thank you, we see how this can be confusing. We were listing the number of participants in the referenced study. We've removed this from the manuscript as the number of participants in this referenced study is not important here. Readers can see the referenced paper if they would like to see more details of participant numbers in each group.
Page 12, left column, line 1: I guess it should be than instead of that.
-Thank you, this has been fixed.
Competing Interests: No competing interests were disclosed.

Version 1
Reviewer Report 17 January 2020 https://doi.org/10.5256/f1000research.22429.r56127 © 2020 Schofield M. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Malcolm Schofield
Department of Psychology, University of Derby, Derby, UK
This paper acknowledges the importance of having a scale that distinguishes paranormal belief from paranormal experience and develops a concise measure of twenty items. However, to bring it up to an acceptable standard, the following points should be addressed:
○ The introduction is severely lacking any review of previous scales and how different factors have been conceptualised in the past.
○ The development of the NEBS should be in the methods section and not the introduction.
○ The methods sections would benefit from traditional subheadings.
○ The results section contains demographic data which should be in the participant's section in the methods.
○ The items relating to experience are all the same.
○ The participant is asked if they have experienced something directly after they have been asked if they believe it; this could present a confound.

Overall, while this paper offers an interesting premise, there are several flaws. The scale itself does not have dedicated experience items that refer to specific phenomena, which is problematic. The methodology itself is also flawed. I suspect an Exploratory Factor Analysis (EFA) would reveal that the factors would be around the phenomena rather than the dichotomy of belief and experience. I.e., a person, who believes in particular phenomena is more likely to experience them; therefore, the factors will demonstrate this. I would recommend that this be more of an exploratory study and that an EFA be run on the first sample at the very least. However, this would probably not give the desired outcome. The items that relate to experience should ideally be able to be answered in isolation and not be dependent on the belief items. This could act as a prime, with people who state they believe in something when asked if they have experienced it directly after being more likely to agree.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Partly

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Each questionnaire was evaluated for the number of items, whether it assesses belief, experience or both, whether it evaluates belief and experience as separate constructs, and subscales if applicable."
The development of the NEBS should be in the methods section and not the introduction.
-The development of the NEBS has now been moved to the methods section.
The methods sections would benefit from traditional subheadings.
-The methods section now has subheadings.
The results section contains demographic data which should be in the participant's section in the methods.
-The participant demographic information was moved to the participant's section in the methods.
An exploratory factor analysis would have been useful on the first sample.
-Please see last item for full response to this comment.
The items relating to experience are all the same.
-The wording of the experience items is the same but reflects the belief construct expressed in the item directly before.
The participant is asked if they have experienced something directly after they have been asked if they believe it; this could present a confound.
-Please see response below about EFA and dependencies between belief and experience items.
Overall, while this paper offers an interesting premise, there are several flaws. The scale itself does not have dedicated experience items that refer to specific phenomena, which is problematic. The methodology itself is also flawed. I suspect an Exploratory Factor Analysis (EFA) would reveal that the factors would be around the phenomena rather than the dichotomy of belief and experience. I.e., a person, who believes in particular phenomena is more likely to experience them; therefore, the factors will demonstrate this. I would recommend that this be more of an exploratory study and that an EFA be run on the first sample at the very least. However, this would probably not give the desired outcome. The items that relate to experience should ideally be able to be answered in isolation and not be dependent on the belief items. This could act as a prime, with people who state they believe in something when asked if they have experienced it directly after being more likely to agree.
-Exploratory factor analysis (EFA) is a statistical technique used to find the underlying structure of a set of observed variables (Gorsuch, 2015), whereas confirmatory factor analysis (CFA) is used when researchers have formulated a hypothesis regarding the relationship between observed variables and the underlying latent factors (Gorsuch, 2015). There has been a debate regarding the circumstances in which these two analyses should be used in research (Hurley et al., 1997) and whether/when they should be used in tandem (Gerbing and Hamilton, 1996). EFA can be used prior to cross-validation with CFA for the purpose of model specification (Gerbing and Hamilton, 1996). EFA can also be used after CFA to explore poor fits in CFA models, explore factor structures when the original hypotheses are weak, and confirm factor structures when the original hypotheses were strong, but certain assumptions are not reasonable (Schmitt, 2011).
The reviewer recommends an EFA on the first sample. It is suggested, but not made clear, that an EFA on the first sample would serve the purpose of exploration regarding other possible factor structures. It is also suggested that EFA could possibly identify elevated correlation in the already correlated belief and experience factors that is due to survey structure alone.
In this study, hypotheses are based on theory and practice, and are therefore strong. It is unclear whether EFA is suggested to confirm assumptions that may not hold in this study. If so, what are these specific assumptions and why are they unreasonable or not upheld? If EFA is suggested as a precursor to CFA, then EFA should be conducted followed by cross-validation with CFA on an independent data set, as suggested by Gerbing and Hamilton (1996). If this procedure is followed, either (1) the EFA will support the researchers' hypotheses or (2) the EFA will not support the researchers' hypotheses. If (2) occurs, it is unclear what impact this should have on the current study. The reviewer notes that if (2) occurs, "this would probably not give the desired outcome", and it is highly pertinent to point out that the researchers did not undertake this study to achieve a "desired outcome" but rather to test hypotheses. If (2) occurs, it is not clear whether this invalidates NEBS as a functioning survey tool. CFA does not always confirm a factor structure obtained via EFA (van Prooijen and van der Kloot, 2001; Borkenau and Ostendorf, 1990). If a CFA fits well and satisfies all assumptions but an EFA indicates that the underlying structure of the data can be represented by different factors, is this sufficient evidence to discard the original CFA? If so, can the reviewer provide support for this claim? In Gorsuch's classic text on factor analysis, he states "Confirmatory factor analysis tests hypotheses that a specified subset of variables legitimately define a prespecified factor" (Gorsuch, 2015). If the researchers have found that a subset of variables legitimately defined their prespecified factors and this was the central goal of their paper, what then is the purpose and role of added exploratory analysis in the context of this paper?
If the reviewer's hypothesis that an EFA "would reveal that the factors would be around the phenomena rather than the dichotomy of belief and experience" is supported by an EFA on the first sample, then it is not clear whether presenting items in isolation would remedy this given the high inherent correlation between belief and experience, and therefore between the items themselves. If this statement is more than opinion or a hypothesis and there is scientific support for it, how should this isolation be achieved and how much isolation is enough isolation to guarantee that this effect is removed from the analysis? If the diagnosis of this specific issue is the only reason for the suggested EFA, then if the researchers were to reorganize and readminister the surveys according to the reviewer's specifications, would an EFA still be necessary? It is also worth noting that the question of whether or not the survey structure introduced added dependence between items can be easily tested experimentally by providing the original and modified surveys (with some reasonable span of time in between) to a group of (new) participants and quantifying the difference in responses. If the difference is not statistically significant, then any added dependence should be negligible. Would this kind of adjustment in the methodology remedy the need for an EFA, according to the reviewer?
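To make the proposed experimental check concrete, the following is a minimal sketch of how the difference between responses to the original and modified surveys could be quantified. It is an illustration only and not an analysis performed in this study: the function name `paired_sign_flip_test`, the sample size of 50, and the simulated 0-100 scores are all assumptions, and a paired permutation (sign-flip) test is used here as one possible choice of paired comparison.

```python
import random
import statistics


def paired_sign_flip_test(original, modified, n_permutations=10000, seed=0):
    """Two-sided paired permutation (sign-flip) test on the mean difference.

    Hypothetical helper for illustration; not part of the NEBS analysis.
    Returns the observed mean difference (modified - original) and a p-value.
    """
    if len(original) != len(modified):
        raise ValueError("paired samples must have equal length")
    diffs = [m - o for o, m in zip(original, modified)]
    observed = statistics.fmean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Simulated (not real) scores on a 0-100 scale for 50 hypothetical participants.
data_rng = random.Random(1)
original_scores = [data_rng.uniform(0, 100) for _ in range(50)]
modified_scores = [min(100.0, max(0.0, s + data_rng.gauss(0, 5))) for s in original_scores]

mean_diff, p = paired_sign_flip_test(original_scores, modified_scores)
print(f"mean difference = {mean_diff:.2f}, p = {p:.3f}")
```

In practice the same comparison would be run per item or per subscale, and the size of the mean difference would be reported alongside the p-value.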
In summary, it is unclear whether the reviewer calls for an EFA to (1) examine other factor structures in the data or (2) test additional dependencies that may have been introduced by the survey structure. If (1), it is unclear what role this EFA would play in the current paper. If (2), it may be more straightforward to address this experimentally.