The Noetic Experience and Belief Scale: A validation and reliability study [version 1; peer review: 1 approved with reservations]

Background: Belief in the paranormal is widespread worldwide. Recent surveys suggest that subjective experiences of the paranormal are common. A concise instrument that adequately evaluates beliefs as distinct from experiences does not currently exist. To address this gap, we created the Noetic Experience and Belief Scale (NEBS), which evaluates belief and experience as separate constructs. Methods: The NEBS is a 20-item survey with 10 belief and 10 experience items rated on a visual analog scale from 0-100. In an observational study, the survey was administered to 361 general population adults in the United States and to a subsample of 96 one month later. Validity, reliability and internal consistency were evaluated. A confirmatory factor analysis was conducted to confirm the latent variables of belief and experience. The survey was then administered to a sample of 646 IONS Discovery Lab participants to evaluate divergent validity and confirm belief and experience as latent variables of the model in a different population. Results: The NEBS demonstrated convergent validity, reliability and internal consistency (Cronbach’s alpha Belief 0.90; Experience 0.93).


Introduction
"Paranormal beliefs pertain to phenomena that have not been empirically attested to the satisfaction of the scientific establishment" 1 . Paranormal beliefs encompass a broad range of concepts, such as ghosts or spirits, extrasensory perception (ESP), extraterrestrial beings, and mind-to-mind communication, or telepathy. Belief in the paranormal is widespread around the world [1][2][3][4][5][6][7][8][9][10][11][12][13] . For example, in a Gallup poll of 1,002 United States adults conducted in 2005, 55% respondents believed in psychic or spiritual healing or the power of the human mind to heal the body, 41% believed in extrasensory perception, and 31% believed in telepathy or mind-to-mind communication 14 .
However, having a belief in the paranormal does not necessarily mean having experienced the paranormal. A paranormal experience refers to an individual's memory of an experience that the individual judges to be genuine. The memory of a paranormal experience relies on a different mental substrate than a belief based on environment, education and reasoning, and the neural structures underlying memory of an experience and those underlying belief are likely different 15,16. Paranormal belief and experience are often correlated when measured simultaneously, although this is rarely done 17,18. For example, one study found a positive correlation (r = 0.61) between paranormal experience and belief scores 12. Another study found that exposure to television programs that regularly depict paranormal phenomena was positively correlated with belief, but only for respondents who had personal experiences 19.
Prevalence of reported paranormal experiences evaluated over the last 40 years in a variety of populations has ranged from a low of 10% in Scottish citizens 20 to a high of 97% in paranormal enthusiasts in the United States 12. Two very large prevalence studies have been conducted. One surveyed adults in 13 European countries and the United States (N = 18,607). European respondents reported experiencing telepathy (34%), clairvoyance (21%), and contact with the dead (25%). Percentages for the U.S. adults were considerably higher: 54%, 25%, and 30% respectively 21. Another large study of British adults (n = 4,096) found that 37% of respondents reported at least one paranormal experience, defined as precognitions, extra-sensory perception, mystical experiences, telepathy, and after-death communication 22. Other smaller prevalence studies have been conducted around the world. Haraldsson et al. conducted two surveys of prevalence in Iceland, one in 1974 with 902 participants 5 and one in 2006 with 991 participants 3. They found that reports of psychic phenomena increased from 59% of men and 71% of women in 1974 to 70% of men and 81% of women in 2006. In Scotland, 10-16% of the general population sample (n = 241) had experienced second sight, with the exception of the Grampian area, where prevalence was more than doubled at 33% 20. Chinese, Japanese, African-American and Caucasian-American college students (n = 1922) were surveyed and 31-47% reported having at least one experience 23. Of 502 adults in Winnipeg, Canada, 65% reported having at least one experience 24, as did 38% of 622 students and townspeople in Charlottesville, Virginia 25. In the United States, 67% of 1460 participants reported having had an ESP experience, 31% a clairvoyant experience, and 42% contact with the dead 26.
More recently in the United States, 89.3% of the general population, 89.5% of scientists and engineers, and 97.8% of paranormal enthusiasts reported at least one paranormal experience (n = 899) 12.
Specificity of the work in this field is limited by the lack of questionnaires that adequately separate paranormal belief from experience and do so concisely 1,27. Using ambiguous measures can confound the two constructs of belief and experience and blur results 1,6. Instruments that do separate these constructs are long and not conducive to the time constraints of many studies (see the Exceptional Experiences Questionnaire 28 and Anomalous Experience Inventory 29). To address these limitations and as part of a larger research program on extended human capacities, we created the Noetic Experience and Belief Scale (NEBS), a 20-item survey that evaluates paranormal beliefs and experiences separately. The present studies investigate the psychometric properties of the Noetic Experience and Belief Scale in two populations. By studying these phenomena, we aim to gain a deeper understanding of the nature of consciousness and the reach of human potential.
Initial development of the NEBS
The NEBS was developed through consensus by the authors and two expert consultants who actively work in the field. This group was informed by our own previous studies and by reviewing other studies and previously used instruments that evaluated paranormal beliefs and/or experiences. One previous study 30 evaluated the prevalence of 27 paranormal experiences listed here in decreasing order of prevalence: Claircognizance, Clairempathy, Precognition, Lucid Dreaming, Emotional Healing, Clairvoyance, Clairsentience, Animal Communication, Telepathy, Aura Reading, Astral Projection, Clairaudience, Clairalience, Mediumship, Channeling, Physical Healing, Geomancy, Retrocognition, Psychometry, Remote Viewing, Automatic Writing, Clairgustance, Psychokinesis, Pyrokinesis, Levitation, and Psychic Surgery (please see extended data for definitions of each of these terms 31). Based on feedback from participants and a review of these items, we removed emotional healing (very similar to physical healing), psychic surgery (very rare), and clairsentience (very similar to claircognizance), renamed channeling to psychophony and mediumship to contact with the dead, and added the item Information from Dreams. We then conducted another prevalence study in a different population. Notably, we did not use the jargon term for each paranormal belief/experience, but instead used language that was as neutral as possible to describe the experience itself. For example, rather than asking if the participant had ever experienced "pyrokinesis: the ability to create and/or manipulate fire", the item asked "Have you ever created fire using only your concentration or will?" These neutral language items were then administered to 899 participants consisting of three samples: a general population sample, scientists and engineers, and paranormal enthusiasts 12.
In both studies, we found that some items were highly correlated and represented overlapping constructs. They could also be viewed as specific nuanced experiences within a larger extended human capacities category. For example, psychic physical healing, or the purported ability to feel other people's physical symptoms in your own body and heal, transform, or transmute them, would fall under the umbrella category of psychokinesis, or the purported ability to influence a physical system without any physical interaction or with mental effort alone. Thus, in an effort to reduce participant burden and allow for quick assessment of experiences and beliefs, we collapsed any overlapping constructs into individual items for each of the following categories: 1. Non-local consciousness (e.g. Astral Projection, Lucid Dreaming); 2. Extraterrestrials; 3. Precognition/Retrocausation; 4. Survival of Consciousness (after bodily death); 5. Contact with the dead (Mediumship); 6. Clairvoyance (Claircognizance, Clairempathy, Clairvoyance, Clairsentience, Aura Reading, Clairalience, Clairaudience, Geomancy, Clairgustance, Remote Viewing, Psychometry, Animal Communication); 7. Psychokinesis (Physical Healing, Psychokinesis, Psychic Surgery, Pyrokinesis, Levitation); 8. Telepathy; 9. Automatism (Channeling, Automatic Writing). We also reviewed a number of existing questionnaires that measured paranormal experience and/or belief 8,26,28,29,32-45. Each scale was evaluated for number of items, belief and experience as separate constructs, and subscales (please see extended data 31). From this review, an additional item on intuition, representing perhaps the most common paranormal experience, was added to the new scale for a total of 10 items.
The instrument was called the Noetic Experience and Belief Scale, using noetic from the Greek noēsis/noētikos, meaning inner wisdom, direct knowing, or subjective understanding; unlike a vague impression, a noetic experience carries a deep sense of authority and certainty. We included "noetic" in the title rather than "paranormal" in part because of the stigma associated with the term paranormal, which could introduce bias that might be mitigated by using an alternate term. Similarly, the paranormal categories were not named in the scale; only descriptions of the constructs were included (please see extended data 31).
The objectives of the following two observational studies were to evaluate the validity and reliability of the NEBS and to confirm the two latent variables of belief and experience in a confirmatory factor analysis. In study 1, the survey was administered to 350 participants for the validity and confirmatory factor analyses and again to a subsample of 96 of these participants for a test-retest analysis. In study 2, the survey was administered to a different population where divergent validity was evaluated and the factor model reevaluated. We hypothesized that NEBS would be valid, reliable, and demonstrate good fit for a model with belief and experience as latent variables in both populations.

STUDY 1: General population sample
Methods
Procedures
The first study administered the NEBS to a randomly selected general population group in the United States to establish validity, test-retest reliability, and confirm the two latent variables of belief and experience. We contracted with Lucid, LLC (New Orleans, Louisiana) to obtain completed surveys from an unbiased census-distributed sample of 350 participants representative of the general population in the United States. The sample was unbiased in that it was not associated with the Institute of Noetic Sciences or any other paranormal or noetic-related group. Lucid, LLC is a marketplace that connects hundreds of sample suppliers with individual primary research studies to facilitate online surveys. Lucid uses screening questions to qualify respondents for a particular study and then, through programmatic technology, aligns the best suppliers for that individual audience. Once a respondent qualifies through the screener, the appropriate suppliers are notified through an API and an email is triggered from the supplier directly to the survey taker. Each of the suppliers on the marketplace has approximately 200 pre-profiled mapped qualifications. These include age, gender, household income, job role, hobbies, etc. Lucid uses these qualifications as well as the screening questions to ensure efficiency and high quality when matching survey takers with individual projects. All potential volunteers are screened, checked for validity, and emailed a link to the survey. Participants were English-speaking adults in the United States. Inclusion criteria were: adults aged 18 to 89 who could read and understand English and were willing to complete questionnaires. Exclusion criteria were: children (<18 years old) or elders >89 years old. Elders 90 years old and older were excluded because the survey was designed to be anonymous and recording ages greater than 90 is considered private health information 46.
Targets for distribution were based on United States census values and were as follows: Gender: 50% males and 50% females; Age: 18-24, 13%; 25-44, 41%; 45-64, 30%; 65+, 16%; Ethnicity: Hispanic, 11%; Black, 12%; White (non-Hispanic), 59%; Other, 18%. The study was approved by the Institute of Noetic Sciences Institutional Review Board # WAHH_2018_06. Participants were given a link to a Health Insurance Portability and Accountability Act compliant survey on SurveyMonkey. The first page of the survey was a consent form (please see extended data 31). Participants were asked to read the form and check a box acknowledging that they had been informed of the procedures, and risks and benefits of participating in the study. They then completed the survey, which took approximately 15-20 minutes. Data were collected from November 9, 2018 through December 14, 2018. All data were collected anonymously, with no identifiers or IP addresses. Participants were compensated $3 for completing the survey once and $7 ($3 + $4) if they also participated in the retest administration.
Measures
In addition to demographic information (e.g. age, gender, marital status, socioeconomic status), the main instrument in the survey was the Noetic Experience and Belief Scale (NEBS). The scale contains ten statements about beliefs in intuition, non-local consciousness, extraterrestrials, precognition, survival of consciousness, contact with the dead, clairvoyance, psychokinesis, telepathy, and automatism that all begin with the stem "I believe…" followed by a description of the concept. The participant rates each belief statement on a slider anchored by Disagree Strongly (0) and Agree Strongly (100). For each of the ten items, participants also answered "I have personally had this experience." on a slider scale anchored by Never (0) and Always (100). Two experience items were worded differently to accommodate the nature of the concept. The life after death experience item was worded "I have personally had an experience that I interpreted as a proof that consciousness survives the physical body." and the contact with the dead item was worded "I have personally had the experience of contact with the dead." Six of the 10 items were from the Australian Sheep-Goat Scale, three of which were exactly the same (#'s 9, 10, 11), and three were modified (#'s 4, 5, 14) 45. The scale yields overall scores for paranormal belief and experience by averaging the ten items for each subscale. Item scores can also be used individually for scores on each specific category. Internal consistency of the NEBS was calculated with a Cronbach α coefficient, as described in subsequent sections (the full scale is available in extended data 31).
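The subscale scoring described above (the mean of the ten 0-100 slider ratings for each subscale) can be illustrated with a minimal Python sketch. The function name `score_nebs` and the example ratings are hypothetical and are not taken from the study data or from the authors' scoring materials.

```python
from statistics import mean


def score_nebs(belief_items, experience_items):
    """Return (belief_score, experience_score): the mean of each 10-item subscale."""
    for name, items in (("belief", belief_items), ("experience", experience_items)):
        if len(items) != 10:
            raise ValueError(f"expected 10 {name} items, got {len(items)}")
        if any(not 0 <= rating <= 100 for rating in items):
            raise ValueError(f"{name} ratings must lie between 0 and 100")
    return mean(belief_items), mean(experience_items)


# Hypothetical respondent (made-up slider ratings, not study data)
belief = [80, 60, 40, 70, 90, 50, 30, 20, 60, 10]
experience = [70, 40, 0, 30, 50, 20, 10, 0, 30, 0]
belief_score, experience_score = score_nebs(belief, experience)
```

Individual item ratings remain available for category-level analyses, as the text notes; the sketch only covers the two overall scores.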
Convergent construct validity was measured by administering pre-existing survey instruments that evaluate similar concepts: Australian Sheep-Goat Scale 45,47, Revised Paranormal Belief Scale 43, and Anomalous Experiences Inventory (AEI) 29. The Australian Sheep-Goat Scale is an 18-item questionnaire on various beliefs and experiences. Respondents endorse True (2 points), Uncertain (1 point), or False (0 points) for each item. Values are then summed to form a score ranging from 0-36. The Revised Paranormal Belief Scale is a 26-item scale that measures the degree of belief in the paranormal in each of seven dimensions: Traditional Religious Belief, Psi, Witchcraft, Superstition, Spiritualism, Extraordinary Life Forms, and Precognition. Respondents rate how strongly they believe in each item on a 7-point Likert scale. Subscales and a total score are obtained by calculating means of specific items. The AEI is a 70-item questionnaire that evaluates multiple subscales: anomalous/paranormal experience, anomalous/paranormal beliefs, anomalous/paranormal ability, fear of the anomalous/paranormal, and drug use. Respondents answer True (1) or False (0) for each item and values are summed for each scale. The scales selected have already been assessed as valid and reliable and used in numerous peer-reviewed publications. Correlation matrices of the scores were evaluated for expected patterns of associations between measures of the same construct.
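As a small illustration of the summed scoring described for the Australian Sheep-Goat Scale (18 items, True = 2, Uncertain = 1, False = 0, total 0-36), the following Python sketch may help. The name `score_asgs` and the example responses are hypothetical; the sketch follows only the scoring rule stated in the text.

```python
POINTS = {"True": 2, "Uncertain": 1, "False": 0}


def score_asgs(responses):
    """Sum the 18 item responses into a total score between 0 and 36."""
    if len(responses) != 18:
        raise ValueError(f"expected 18 responses, got {len(responses)}")
    return sum(POINTS[answer] for answer in responses)


# Hypothetical respondent (made-up answers, not study data)
responses = ["True"] * 5 + ["Uncertain"] * 4 + ["False"] * 9
total = score_asgs(responses)
```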
Test-Retest: Some participants repeated the survey approximately one month later so that test-retest reliability could be assessed with a Pearson correlation coefficient.
Sample size: Some sources suggest at least 10 people per item for psychometric validation although a recent review suggested that sample size is rarely justified a priori 48 . We aimed for a sample size of 350 for the 20-item scale. For confirmatory factor analysis, there is also no agreement on the number of participants needed although sources 49 recommend approximately 10 participants for each estimated parameter (10 × 20 parameters = 200). We had 361 participants resulting in a ratio of 18.05 participants to each parameter estimated.
Confirmatory factor analysis: A confirmatory factor analysis was used (rather than an exploratory factor analysis) because a theoretical framework was already established for evaluating belief and experience as separate constructs, albeit highly correlated 1,12. The latent variables for the model were Belief and Experience. Observed variables were the 20 NEBS items. Univariate variables were tested for normality with the Shapiro-Wilk test and any outliers were assessed with scatter and box plots. Normality of residuals was evaluated with kernel density estimates and standardized normal probability plots. Outliers were evaluated with residuals, leverages, influence and Cook's distance. Multicollinearity was evaluated with the variance inflation factor (VIF), which is the quotient of the variance in a model with multiple terms by the variance of a model with one term alone and quantifies the severity of multicollinearity. An unstructured covariance matrix was used so as to not impose any constraints on the variance and covariance values. All 20 items were highly correlated and thus covariances between unique factors for all items were included in the model and then removed if they did not reach significance. All statistical analyses were conducted with Stata 15.0 (StataCorp, LLC, College Station, TX).
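To make the multicollinearity check concrete, the sketch below computes a VIF for each variable in plain Python, using the standard identity that the VIF of a variable equals the corresponding diagonal element of the inverse of the variables' correlation matrix (equivalently 1/(1 - R²) from regressing that variable on the others). This is an illustration only: the analyses in the paper were run in Stata, and the helper names and the three small data columns are made up for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable: the diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(k)]


# Made-up item columns (five hypothetical respondents each), not study data
item_a = [1, 2, 3, 4, 5]
item_b = [2, 1, 4, 3, 5]
item_c = [2, 5, 3, 1, 4]
vifs = vif([item_a, item_b, item_c])
flagged = [v > 10 for v in vifs]  # conventional rule of thumb: VIF above 10
```

A VIF of 1 means a variable is uncorrelated with the others; values grow as the variable becomes more predictable from the rest.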

Construct validity
In total, 444 people began the survey; 26 did not agree to the consent form and 57 agreed to the consent form but did not complete the survey. The remaining 361 participants completed the survey (underlying data 50). Surveys were collected between November 9, 2018 and December 13, 2018. Participants were on average 44 ± 16.8 years old and had 14.5 ± 5.3 years of education. Of these, 52% were female and 56.8% were in a relationship. Participants were mostly Caucasian (67% Caucasian, 13% Black or African American, 8% Hispanic or Latino, 6% Asian or Pacific Islander, 5% American Indian or Alaskan Native, and 2% preferred not to answer). In terms of salary, 67% of participants earned between $0 and $75,000 (30% Under $30K, 37% $30K to under $75K, 11% $75K to under $100K, 11% $100K to under $150K, 7% $150K to under $250K, 3% $250K or greater, 2% Decline to answer), with an average household size of 2.6 ± 1.4.
The means and standard deviations for the paranormal belief and experience questionnaires are shown in Table 1. All correlation pairs were positive and significant at the p = 0.05 level or lower (all but three at p < 0.00005).

Reliability
Internal consistency: Cronbach's alpha was calculated for the NEBS Belief subscale items and Experience subscale items to measure the extent to which the items within the subscales correlated with each other and measured a similar construct 51. The ten belief items had a Cronbach's alpha of 0.90 and an average inter-item covariance of 429.9. The ten experience items had a Cronbach's alpha of 0.93 and an average inter-item covariance of 610.7.
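For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The Python sketch below implements that textbook formula; it is not the authors' Stata code, and the function name and the three made-up items are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up ratings for three items from four hypothetical respondents
items = [
    [10, 20, 30, 40],
    [12, 22, 28, 38],
    [8, 25, 33, 41],
]
alpha = cronbach_alpha(items)
```

Alpha approaches 1 when items rise and fall together across respondents, which is the sense in which the subscale items "measure a similar construct".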
Belief: On average, intuition, survival of consciousness, and non-local consciousness were the highest rated Beliefs (see means and standard deviations for each item in Table 2). All Belief construct pair correlations were significant (p < 0.00005). Telepathy Belief and clairvoyance Belief were highly correlated.
Experience: On average, intuition and non-local consciousness were the most common Experiences. All Experience pairs were significantly correlated (p < 0.00005). Seven Experience pairs were highly correlated (r = 0.70-0.89). Many Experiences were moderately correlated (r = 0.50-0.69).
Belief and Experience: Most Belief and Experience pairs were significantly correlated at the p<0.000005 level except for belief in intuition and the experience of extraterrestrials (p = 0.0004), contact with the dead (p = 0.0001), psychokinesis (p = 0.002), telepathy (p = 0.002), and automatism (p = 0.0016). Belief in telepathy was highly correlated with the Experience of telepathy (r = 0.79). Belief and Experience pairs of the same construct were all moderately correlated except for Survival of Consciousness which had a significant but low correlation. Many beliefs were moderately correlated with Experiences.
Test-retest reliability: Of the original 361 participants who completed the survey, 96 completed the same survey again approximately one month after the first administration (mean 35.3 ± 3.7 days) between December 14, 2018 and January 2, 2019. Participants who completed the retest had similar demographics to the original sample (age 44 ± 16.9 years, education 14.3 ± 2.1 years, 54% male, 64% Caucasian, 53% in a relationship, 74% with income under $75,000, and average household size of 2.4 ± 1.4). The NEBS had high test-retest reliability for both the Belief (r = 0.83, p < 0.00005) and Experience (r = 0.77, p < 0.00005) subscales. The Wilcoxon signed-rank test was used to evaluate individual item and subscale differences because variables were not normally distributed. No item or subscale scores differed significantly between the two timepoints except for the telepathy Experience item, which decreased at the second administration (Table 3). Individuals' responses to the subscales remained relatively consistent across the repeated administration and above the commonly accepted value for reliability of r = 0.70 52.
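The test-retest statistic reported here is a Pearson correlation between each participant's subscale score at the first and second administration, compared against the r = 0.70 benchmark. A minimal Python sketch of that calculation follows; the function name and the six pairs of scores are hypothetical and are not drawn from the study data.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second):
        raise ValueError("each participant needs a score at both timepoints")
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    cross = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cross / sqrt(ss1 * ss2)


# Made-up Belief subscale scores for six hypothetical participants
time_1 = [55, 70, 40, 85, 60, 30]
time_2 = [50, 75, 45, 80, 65, 35]
r = pearson_r(time_1, time_2)
meets_benchmark = r >= 0.70  # benchmark value cited in the text
```

Note that a high correlation shows stable rank ordering of participants, not the absence of a mean shift, which is why the text pairs it with a signed-rank test of differences.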
Belief and Experience as separate constructs
Confirmatory factor analysis was performed based on data from 361 respondents; there were no missing data. The retest data of the 96 participants were not included in the confirmatory factor analysis modeling. A correlation table of observed values with means and standard deviations is shown in Table 2. The a priori theoretical model of Belief and Experience items as described in the statistics section is presented in Figure 1.
We hypothesized a two-factor model to be confirmed in the measurement portion of the model where Belief and Experience were the latent variables. We evaluated the assumptions of univariate and multivariate normality and linearity. Individual variables were not normally distributed. The asymptotic distribution free (ADF) estimation method was used because it makes no assumption of joint normality or even symmetry for observed or latent variables (StataCorp, 2013). Residuals were normally distributed. There were no observations with a Cook's distance greater than 1. No variable had a VIF greater than 10 or a tolerance (1/VIF) less than 0.1 (average VIF for all variables 3.29), indicating acceptable multicollinearity. The model chi-square, χ²(159), was 283.1, the root mean square error of approximation (RMSEA) was 0.060 (90% confidence interval 0.051-0.069), the comparative fit index (CFI) was 0.94, the standardized root mean squared residual (SRMR) was 0.13, and the Tucker-Lewis fit index (TLI) was 0.90. Against commonly reported cutoffs (RMSEA < 0.08, CFI ≥ 0.90, SRMR < 0.08, TLI ≥ 0.95) 53, the RMSEA and CFI indicate acceptable fit of the model to the dataset, whereas the SRMR and TLI values fall outside the listed thresholds.
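As a reading aid, the short Python sketch below compares the fit statistics reported in this paragraph with the cutoffs listed alongside them. It only restates numbers already given in the text; the variable names are illustrative and the comparison is not part of the authors' analysis.

```python
# Fit statistics as reported in the text for the study 1 model
reported = {"RMSEA": 0.060, "CFI": 0.94, "SRMR": 0.13, "TLI": 0.90}

# Cutoffs as listed in the text (RMSEA < 0.08, CFI >= 0.90, SRMR < 0.08, TLI >= 0.95)
cutoffs = {
    "RMSEA": lambda value: value < 0.08,
    "CFI": lambda value: value >= 0.90,
    "SRMR": lambda value: value < 0.08,
    "TLI": lambda value: value >= 0.95,
}

meets = {name: cutoffs[name](value) for name, value in reported.items()}
for name, ok in meets.items():
    print(f"{name} = {reported[name]}: {'meets' if ok else 'outside'} the listed cutoff")
```

With the values as printed in the text, the RMSEA and CFI satisfy their cutoffs while the SRMR and TLI do not, so the fit is best read index by index.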

STUDY 2: IONS Discovery Lab sample
Methods
The NEBS was then administered to participants attending workshops at the IONS Discovery Lab and also online.
Arizona Integrative Outcomes Scale (AIOS) is a one-item, visual analogue self-rating scale (VAS) with two alternate forms (one for daily ratings, AIOS-24h; and one for monthly ratings, AIOS-1m). The daily rating version was used for this study. The instructions are: "Please reflect on your sense of wellbeing, taking into account your physical, mental, emotional, social, and spiritual condition over the past 24 hours. Mark the line below with an X at the point that summarizes your overall sense of wellbeing for the past 24 hours." The horizontally-displayed VAS is 100 mm in length, with the low anchor being, "Worst you have ever been" and the high anchor being, "Best you have ever been." The AIOS has demonstrated the ability to discriminate between healthy and unhealthy populations and has adequate convergent and divergent validity 54 .
Positive and negative affective well-being is measured with a variety of dichotomous indicators asking subjects whether they had experienced an emotional state for much of the day yesterday. For positive affect, the emotional states are happiness, enjoyment and smiling/laughter, which, aggregated together, have a reliability of α = 0.72. For negative affect, the emotional states are stress, worry and sadness, with a reliability of α = 0.65 55 .
Overall health is a single item question "In general, how would you rate your overall health?" which is answered by choosing one of five options: Poor; Fair; Good; Very good; Excellent 56 .
Acute sleep scale is a single item scale asking participants to rate their quality of sleep over the past 24 hours on an 11-point numeric rating scale ranging from 0 denoting "best possible sleep" to 10 denoting "worst possible sleep" 57 .
The Numeric Pain Rating Scale (NPRS) is a segmented numeric version of the visual analog scale in which a respondent selects a whole number (0-10 integers) that best reflects the intensity of his/her pain. The NPRS is anchored by terms describing pain severity extremes. Participants are asked to report pain intensity "in the last 24 hours" or an average pain intensity with 0 = "No pain" to 10 = "Worst possible pain" 58 .
The Big Five Inventory-10 (BFI-10) is a 10-item measure of the Big Five (or Five-Factor Model) dimensions: Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness. The BFI-10 was developed to provide a personality inventory for research settings with time constraints. It assesses the Big Five with only two items per dimension. Previous research has shown that the BFI-10 possesses psychometric properties that are comparable in size and structure to longer five-factor inventories such as the NEO-PI-R, which has 240 items. The score for each dimension is obtained by summing standard items and reverse-scored items for each scale 59.
The Compassion scale comprises 5 items from the compassion subscale of the Dispositional Positive Emotion Scale. It measures respondents' dispositional tendencies to feel positive emotions toward others in their daily lives. Items are rated from strongly disagree to strongly agree and scored from 1 to 7. Items are averaged for a total score, and higher scores indicate greater levels of positive emotion 60.
Statistical Analysis: Demographic information was qualitatively described for categorical variables. Means and standard deviations were calculated for all continuous variables. Pearson correlations were conducted for relationships between measures. Cronbach's alpha was calculated for the Belief and Experience subscales. All analyses were conducted with Stata 15.0 (StataCorp, LLC, College Station, TX). The confirmatory factor analysis was conducted in the same way as in study 1.

Results
In

Discussion
The overall results of the two studies provide psychometric support for the validity and reliability of the NEBS as a brief assessment of self-reported paranormal beliefs and experiences. The participant demographics of study 1 reflected the general population of the United States as designated by the recruitment criteria. Construct validity of the NEBS Belief subscale was strong, as it was strongly correlated with multiple other scales measuring paranormal belief, including the Australian Sheep-Goat Scale, the Psi, Spiritual and Precognition subscales of the Paranormal Belief Scale, and the AEI Paranormal Belief subscale. Construct validity of the NEBS Experience subscale was also strong, demonstrating higher correlations with experience measures such as AEI-Paranormal Experience and AEI-Paranormal Ability than with other measures such as Traditional Religious Beliefs. The NEBS did not measure paranormal fear or drug use, as reflected in the low correlations with those AEI subscales. For divergent validity, there were only negligible correlations (r's between 0 and 0.30) with all other measures, providing more evidence that the NEBS is not measuring other constructs. Interestingly, other studies evaluating personality traits and paranormal beliefs have produced mixed results 62,63, with some studies observing positive correlations with neuroticism 64,65 (unlike our study, which found a negligible negative correlation) and others not finding any correlations [65][66][67]. The reliability and internal consistency of the NEBS were also demonstrated through high Cronbach's alphas for both subscales in two different samples. Our confirmatory factor analysis for two latent constructs of Belief and Experience in the general population dataset revealed good model fit (RMSEA = 0.06), controlling for covariances between specific individual items, which was then confirmed with the IONS Discovery Lab sample (RMSEA = 0.06).
RMSEA calculates the size of the standardized residual correlations and theoretically ranges from 0 (perfect fit) to 1 (poor fit). A model is considered satisfactory when RMSEA < 0.08 68,69 . Our conceptual model of Belief and Experience as separate constructs and as evaluated through the NEBS was confirmed.
When measured separately, Belief and Experience are highly correlated. We found this in both of our samples (study 1: r = 0.77; study 2: r = 0.64). Interestingly, the correlation was stronger in our general population sample than in our IONS Discovery Lab sample. The mean NEBS Belief scores for the IONS Discovery Lab group were 21.3 points higher than the general population group (59.7 ± 21.9 general population vs. 81.0 ± 18.1 IONS Discovery Lab). The mean NEBS Experience scores were also greater in the IONS Discovery Lab group but only by 15.1 points (44.3 ± 25.6 general population vs. 59.4 ± 22.9 IONS Discovery Lab). The AEI-Paranormal Belief and Paranormal Experience subscales were highly correlated in our study 1 sample as well (r = 0.77). Interestingly, the original study of this scale found a much lower, although significant, correlation (r = 0.57) between the two subscales 29 . We also found belief and experience to be highly correlated (r = 0.61) for another mixed population of scientists and engineers, the general population, and paranormal enthusiasts 12 . Other studies that have evaluated belief and experience in general have also found positive correlations 17,18 . A study examining the correlation between specific religious and classic paranormal beliefs, such as belief in heaven and hell or psychic healing, in relation to the paranormal experiences of illness cured by prayer and the use of the mind to heal the body, found mixed results. For example, belief in the devil and belief in illness cured by prayer had a low but significant correlation (r = 0.38), but the relationship between illness cured by prayer and the belief in psychic healing (r = -0.04) was not significant 40 .
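The difference between the Belief-Experience correlations in the two samples could be examined formally with a Fisher r-to-z comparison of independent correlations. The Python sketch below is illustrative only and was not part of the reported analysis. It assumes the two samples are independent and uses the enrolled sample sizes (361 and 646) as stand-ins; the number of participants actually contributing to each correlation may differ. The function name is hypothetical.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and a two-sided p-value based on the
    standard normal distribution.
    """
    z1 = math.atanh(r1)  # Fisher transformation of r1
    z2 = math.atanh(r2)  # Fisher transformation of r2
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Reported correlations; sample sizes are assumed (enrolled totals).
z, p = compare_independent_correlations(0.77, 361, 0.64, 646)
print(f"z = {z:.2f}, two-sided p = {p:.5f}")
```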
Paranormal belief and experience are highly correlated in most studies that assess them, and yet they are distinctly different constructs that should be evaluated separately. What we do not yet understand is the causal or temporal nature of the relationship between belief and experience. Does paranormal belief precede experience or vice versa? Does someone's belief in the paranormal prime them to experience it, or does a subjective experience of the paranormal instill belief in the phenomena? Future longitudinal studies evaluating a baseline level of people's beliefs and collecting data on how those beliefs change over time in relation to any experiences they have would be helpful in answering this question.
There are a number of limitations that should be kept in mind when reviewing the results of this study. The individual constructs included in the NEBS are highly correlated. Conceptually, the individual concepts are unique but could also be viewed as overlapping. For example, the items on non-local consciousness (B2. I believe that my consciousness is not limited by my physical brain or body. E2. I have personally had this experience.) and survival of consciousness (B5. I believe in life after death. E5. I have personally had an experience that I interpreted as a proof that consciousness survives the physical body.) could be considered as the same construct worded in a different way. Each experience item is administered directly after the belief item of the same construct. The instrument was purposefully designed in this way to keep it concise. However, asking the belief question directly before the experience question could bias responses to the experience question in some way. We also acknowledge that the objective format of the survey (answered with a slider from 0-100), with its constrained definitions, is limiting. A more in-depth phenomenological approach would surely provide greater nuance and depth of understanding of belief and experience. However, such an instrument would be more demanding to administer and score, and would not meet the need for a simple and concise instrument. Any NEBS results should be interpreted with these limitations in mind.
In summary, the NEBS is a 20-item survey rated on a sliding scale from 0-100, with 10 Belief and 10 Experience items. Both subscales demonstrated convergent validity, internal consistency, and test-retest reliability. A confirmatory factor analysis model demonstrated a good fit for Belief and Experience as separate latent variables. This model was confirmed in another sample where divergent validity was also established. The NEBS is a concise, valid, and reliable instrument for evaluating individual differences in paranormal beliefs and experiences. This measure provides a new tool for rigorously investigating these beliefs and experiences, and their relationship (as predictors, outcomes, or covariates) with other variables of interest such as psychological well-being, physical health, effects of interventions, coping with death and dying, grief and trauma resilience, and extended human capacities, just to name a few.
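For readers who wish to check internal consistency on their own NEBS data, the following minimal Python sketch computes Cronbach's alpha for one subscale. It assumes responses are stored as one list per item, aligned by respondent; the function name and the example scores are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    items: list of per-item score lists, all the same length,
    where position i in every list belongs to respondent i.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Sum of the sample variances of the individual items.
    sum_item_variances = sum(variance(scores) for scores in items)
    # Sample variance of each respondent's total score.
    totals = [sum(respondent) for respondent in zip(*items)]
    total_variance = variance(totals)
    return (k / (k - 1)) * (1.0 - sum_item_variances / total_variance)


# Hypothetical 0-100 slider responses: 3 items, 5 respondents.
belief_items = [
    [80, 20, 55, 90, 35],
    [75, 30, 60, 85, 40],
    [70, 25, 50, 95, 30],
]
print(f"alpha = {cronbach_alpha(belief_items):.2f}")
```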

Open Peer Review
does not have dedicated experience items that refer to specific phenomena, which is problematic. The methodology itself is also flawed. I suspect an Exploratory Factor Analysis (EFA) would reveal that the factors would be around the phenomena rather than the dichotomy of belief and experience. I.e., a person, who believes in particular phenomena is more likely to experience them; therefore, the factors will demonstrate this. I would recommend that this be more of an exploratory study and that an EFA be run on the first sample at the very least. However, this would probably not give the desired outcome. The items that relate to experience should ideally be able to be answered in isolation and not be dependent on the belief items. This could act as a prime, with people who state they believe in something when asked if they have experienced it directly after being more likely to agree.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
-The original manuscript included Supplementary Data that summarized a review of 18 previous scales that informed the design of our scale. We have revised the text in the Introduction to clarify and draw attention to this. The revised text reads: "We also reviewed a number of existing questionnaires that measured paranormal experience and/or belief (8, 26, 28, 29, 32-45). A summary of this review is presented as Supplemental data A (please see extended data (31)). Each questionnaire was evaluated for the number of items, whether it assesses belief, experience or both, whether it evaluates belief and experience as separate constructs, and subscales if applicable."
The development of the NEBS should be in the methods section and not the introduction.
-The development of the NEBS has now been moved to the methods section.
The methods section would benefit from traditional subheadings.
-The methods section now has subheadings.
The results section contains demographic data which should be in the participants section in the methods.
-The participant demographic information was moved to the participants section in the methods.
An exploratory factor analysis would have been useful on the first sample.
-Please see the last item for the full response to this comment.
59 , and the compassion subscale of the Dispositional Positive Emotion Scale." The manuscript then goes on to describe each of those measures in more detail.
The items relating to experience are all the same.
-The wording of the experience items is the same, but each reflects the belief construct expressed in the item directly before it.
The participant is asked if they have experienced something directly after they have been asked if they believe it; this could present a confound.
-Please see response below about EFA and dependencies between belief and experience items.
Overall, while this paper offers an interesting premise, there are several flaws. The scale itself does not have dedicated experience items that refer to specific phenomena, which is problematic. The methodology itself is also flawed. I suspect an Exploratory Factor Analysis (EFA) would reveal that the factors would be around the phenomena rather than the dichotomy of belief and experience. I.e., a person, who believes in particular phenomena is more likely to experience them; therefore, the factors will demonstrate this. I would recommend that this be more of an exploratory study and that an EFA be run on the first sample at the very least. However, this would probably not give the desired outcome. The items that relate to experience should ideally be able to be answered in isolation and not be dependent on the belief items. This could act as a prime, with people who state they believe in something when asked if they have experienced it directly after being more likely to agree.
-Exploratory factor analysis (EFA) is a statistical technique used to find the underlying structure of a set of observed variables (Gorsuch, 2015), whereas confirmatory factor analysis (CFA) is used when researchers have formulated a hypothesis regarding the relationship between observed variables and the underlying latent factors (Gorsuch, 2015). There has been a debate regarding the circumstances in which these two analyses should be used in research (Hurley et al., 1997) and whether/when they should be used in tandem (Gerbing and Hamilton, 1996). EFA can be used prior to cross-validation with CFA for the purpose of model specification (Gerbing and Hamilton, 1996). EFA can also be used after CFA to explore poor fits in CFA models, explore factor structures when the original hypotheses are weak, and confirm factor structures when the original hypotheses were strong, but certain assumptions are not reasonable (Schmitt, 2011).
The reviewer recommends an EFA on the first sample. It is suggested, but not made clear, that an EFA on the first sample would serve the purpose of exploration regarding other possible factor structures. It is also suggested that EFA could possibly identify elevated correlation in the already correlated belief and experience factors that is due to survey structure alone.
In this study, hypotheses are based on theory and practice, and are therefore strong. It is unclear whether EFA is suggested to confirm assumptions that may not hold in this study. If so, what are these specific assumptions and why are they unreasonable or not upheld? If EFA is suggested as a precursor to CFA, then EFA should be conducted followed by cross-validation with CFA on an independent data set, as suggested by Gerbing and Hamilton (1996). If this procedure is followed, either (1) the EFA will support the researchers' hypotheses or (2) the EFA will not support the researchers' hypotheses. If (2) occurs, it is unclear what impact this should have on the current study. The reviewer notes that if (2) occurs, "this would probably not give the desired outcome", and it is highly pertinent to point out that the researchers did not undertake this study to achieve a "desired outcome" but rather to test hypotheses. If (2) occurs, it is not clear whether this invalidates NEBS as a functioning survey tool. CFA does not always confirm a factor structure obtained via EFA (van Prooijen and van der Kloot, 2001; Borkenau and Ostendorf, 1990). If a CFA fits well and satisfies all assumptions but an EFA indicates that the underlying structure of the data can be represented by different factors, is this sufficient evidence to discard the original CFA? If so, can the reviewer provide support for this claim? In Gorsuch's classic text on factor analysis, he states "Confirmatory factor analysis tests hypotheses that a specified subset of variables legitimately define a prespecified factor" (Gorsuch, 2015). If the researchers have found that a subset of variables legitimately defined their prespecified factors and this was the central goal of their paper, what then is the purpose and role of added exploratory analysis in the context of this paper?
If the reviewer's hypothesis that an EFA "would reveal that the factors would be around the phenomena rather than the dichotomy of belief and experience" is supported by an EFA on the first sample, then it is not clear whether presenting items in isolation would remedy this given the high inherent correlation between belief and experience, and therefore between the items themselves. If this statement is more than opinion or a hypothesis and there is scientific support for it, how should this isolation be achieved and how much isolation is enough isolation to guarantee that this effect is removed from the analysis? If the diagnosis of this specific issue is the only reason for the suggested EFA, then if the researchers were to reorganize and readminister the surveys according to the reviewer's specifications, would an EFA still be necessary? It is also worth noting that the question of whether or not the survey structure introduced added dependence between items can be easily tested experimentally by providing the original and modified surveys (with some reasonable span of time in between) to a group of (new) participants and quantifying the difference in responses. If the difference is not statistically significant, then any added dependence should be negligible. Would this kind of adjustment in the methodology remedy the need for an EFA, according to the reviewer?
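As an illustration of how such an experimental comparison might be quantified, the following Python sketch runs an exact paired sign-flip permutation test on scores from the same participants under the original and modified item orders. This is one possible analysis under stated assumptions, not a procedure that was carried out; the function name and the scores are hypothetical, and exact enumeration is only practical for small groups.

```python
from itertools import product


def paired_sign_flip_test(original, modified):
    """Exact two-sided sign-flip permutation test for paired scores.

    Returns the mean paired difference (modified - original) and the
    proportion of sign assignments whose absolute mean difference is
    at least as large as the observed one.
    """
    if len(original) != len(modified):
        raise ValueError("both score lists must have the same length")
    diffs = [m - o for o, m in zip(original, modified)]
    n = len(diffs)
    observed = sum(diffs) / n
    extreme = 0
    total = 0
    # Enumerate every way of flipping the sign of each paired difference.
    for signs in product((1, -1), repeat=n):
        flipped_mean = sum(s * d for s, d in zip(signs, diffs)) / n
        if abs(flipped_mean) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Hypothetical Experience scores for 8 participants under each item order.
original_order = [40, 55, 62, 30, 75, 48, 66, 52]
modified_order = [38, 57, 60, 31, 73, 50, 65, 51]
mean_diff, p_value = paired_sign_flip_test(original_order, modified_order)
print(f"mean difference = {mean_diff:.3f}, p = {p_value:.4f}")
```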
In summary, it is unclear whether the reviewer calls for an EFA to 1) examine other factor structures in the data or 2) test additional dependencies that may have been introduced by the survey structure. If (1), it is unclear what role this EFA would play in the current paper. If (2), it may be more straightforward to address this experimentally.