Does fMRI neurofeedback in the context of stress influence mood and arousal? A randomised controlled trial with parallel group design

Background: Stress-related mental and physical health issues burden modern societies. New treatment opportunities could help to lessen the long-term detrimental consequences of stress. Objective: To investigate whether real-time functional magnetic resonance imaging neurofeedback (rtfMRInf), aimed at modulating brain activity associated with a stressor, affects subjective mood and arousal. Methods: In total, 30 males participated in a randomised controlled trial with parallel-group design. rtfMRInf was the intervention, sham-neurofeedback the control condition, and the Stroop task the stressor. We instructed participants to modulate their stress response to the Stroop task via feedback from their anterior cingulate cortex and their insular cortex, concomitantly applying mental strategies. We assessed mood with the Multidimensional Mood State Questionnaire (dimensions: good/bad, GB; awake/tired, AT; and calm/nervous, CN), and subjective arousal with Self-Assessment Manikins (SAM). Results: We found significantly higher subjective arousal after neurofeedback phases in the experimental condition than in the control condition [t(26.6) = −2.216, 95%CI [−2.188, −0.083], p = 0.035; t(27.9) = −3.252, 95%CI [−2.685, −0.609], p = 0.003], but no significant differences between the conditions regarding mood [GB: b = 0.4, 95%CI [−0.67, 1.47], p = 0.467; AT: b = 0.769, 95%CI [−0.319, 1.857], p = 0.177; CN: b = 0.5, 95%CI [−0.53, 1.53], p = 0.352]. In both conditions, participants reported significantly worse mood and felt more tired after the fMRI session than before [GB: b = −0.77, 95% CI [−1.31, −0.23], p = 0.009; AT: b = …


Introduction
Stress is ubiquitous in life. Getting rid of it is neither realistic nor desirable, as Hans Selye pointed out: "complete freedom from stress is death!" 1, p. 137 . However, accepting stress as part of our lives does not mean that we are all heading towards the long-term detrimental consequences of this inevitability. We can influence how a stressor affects us in the long run. To see how, let us begin with how we react to a stressor: the stress response manifests in physiological and psychological changes and leads to specific behavior. Physiologically, stress increases, for example, blood pressure, heart rate, and specific brain activity, and triggers a cascade of endocrine activity that ends in glucocorticoid release 2,3 . Psychologically, stress leads to focused attention and increased arousal and alertness, which can be captured with self-report measures such as questionnaires 2 .
This response ensures survival in the presence of a life-threatening stressor, or at least increases the chance of survival. In everyday life in a modern society, however, the response might overshoot, given the nature of the stressors we face. Moreover, accumulated or chronic stress can lead to serious mental and physical health issues, for example stress-related mental disorders such as depression and anxiety, which are globally a main source of adult disability 4 , or hypertension 5 , which is the top modifiable risk factor for mortality 6 .
The effects of stress and related disorders increasingly burden industrialized countries. Associated costs have been estimated at 20 billion euros per year in the European Union alone 7 . Stress also affects many of us at some point: 49% of respondents to a survey in the United States had experienced a major stressful event within a one-year window 8 . When asked in a 2017 Gallup poll in the US how often they experience stress in their daily life, 44% of respondents said "frequently" and 35% "sometimes" 9 .
Finding ways to better deal with stressors in the short term might spare us these long-term consequences and lessen the burden on the individual and society. In the definition of psychological stress by Lazarus and Folkman, our own appraisal of a situation and our coping abilities play a major role: "a relationship with the environment that the person appraises as significant for his or her well-being and in which the demands tax or exceed available coping resources" 10, p. 63 .
Cognitive and behavioral techniques used in occupation-specific stress management programs are also familiar from psychotherapeutic practice. Intervention programs for stress management often include education about and practice of time-management and coping skills, psychoeducation, relaxation techniques (e.g. Jacobson's progressive muscle relaxation, controlled breathing, hypnosis), mindfulness-based stress reduction, exercise, leveraging social support, or training in specific job-related skills to prevent or prepare for common stressors [11][12][13][14] . Yoga and meditation-based therapies have also both been associated with mood changes in people with depression and anxiety, two stress-related disorders 15 . These changes might be mediated by the biological stress system 16 .
However, while there is support for classical stress management interventions 17 , as with any treatment it is likely that a proportion of affected people do not respond to a given intervention. In such cases, innovative, neuroscientifically informed interventions might help. Such interventions build on neuroscientific knowledge about the stress response and can be coupled with a person's own neural activity in reaction to a stressor. They may therefore help participants to work deliberately on their individual stress response.
Gaining deliberate control over specific and individual brain activity (and thus indirectly over mental processes) is a key aim of neurofeedback, an approach that has been applied over the last decades to modulate a wide set of mental processes and to improve symptoms of specific mental disorders 18 . In neurofeedback, a signal reflecting a person's own brain activity is fed back to that person, who can use the information contained in the signal to better modulate their brain activity 18 . An advantage of fMRI is that it allows one to work with a spatially circumscribed region of the brain. Real-time functional magnetic resonance imaging neurofeedback (rtfMRInf) can thus target various brain areas associated with different mental processes. rtfMRInf has been applied to modulate diverse mental processes, including pain 19 , anxiety 20 , mood 21 , and many others 22 . To the best of our knowledge, whether the procedure could also be used to modulate the central and peripheral stress response has not yet been investigated.
Within a larger rtfMRInf study, we assessed self-reported mood and arousal and tested whether these differed between participants who received real feedback of their own brain activity and those who received sham feedback (the recorded brain activity of another participant). Participants used one of four mental strategies (body attention, emotional imagery, facial expression, and contemplative repetition) to help them reduce their stress response. Details about these strategies are available in our earlier publication 23 . These strategies have been shown to improve mood when trained once a day for 13 consecutive days using smartphone-based instructions 23 . Here, we aimed to elucidate whether rtfMRInf aimed at modulating the central and peripheral stress response is related to changes in mood and subjective arousal. While we address the primary and secondary outcomes of the study, namely physiological components of stress (brain activity and blood pressure) and adverse events, in another manuscript (Belardi, Lee, Kim, Stalujanis, Jung, Oh, Yoo, Pruessner, Tegethoff, Meinlschmidt; unpublished study), here we focus on additional outcomes, namely self-reported psychological measures of mood and arousal.
We assumed that effects of rtfMRInf on mood and arousal might go in one of two directions: neurofeedback may lead to i) improved mood and lower arousal (in line with its aim to reduce the stress response); or ii) worse mood and higher arousal (in line with increased mental workload due to the multitasking situation that accompanies rtfMRInf). The second direction could arise because our experiment required participants to multitask: simultaneously monitoring the feedback signal, applying a specific mental strategy, and performing the Stroop task. Research on multitasking generally reports lower task performance when a task is performed in a multitasking setting than when it is done as a single task 24 , and higher arousal in complex and multitasking situations 25,26 .

Participants
We recruited 31 participants and analyzed data from 30 of them (mood and arousal data were missing for one participant, who did not show up for the main experiment). All were male students at Korea University, Seoul, Republic of Korea. We allocated 17 of them to the experimental condition and 13 to the control condition, based on a predefined block-randomized scheme (blocks of 8, 10, and 12 in random order). Mean age was 24.6 years (SD = 2.1) in the experimental and 24.5 years (SD = 2.4) in the control condition, and mean education was 14.9 years (SD = 1.4) and 15.3 years (SD = 1.3), respectively. There were no significant differences between the conditions in these baseline characteristics [age: t(28) = -0.0585, p = 0.954; years of education: t(28) = 0.718, p = 0.479].
We based the sample size on previous studies which had shown large effect sizes for rtfMRInf 19,27 . Using power analysis, we estimated that with 14 subjects in each group we could detect effects of d = 1.0 with sufficient power (1 − β > .80; given α = 0.05, one-sided). Recruitment was stopped after the intended 30 subjects had participated in the experiment.
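The source does not name the software used for the power analysis. A minimal sketch of the calculation, assuming a one-sided two-sample Student t-test with equal group sizes, uses SciPy's noncentral t distribution; the function names are illustrative.

```python
from scipy import stats


def power_two_sample_one_sided(n_per_group: int, d: float, alpha: float = 0.05) -> float:
    """Power of a one-sided two-sample Student t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    # Noncentrality: d * sqrt(n1 * n2 / (n1 + n2)), which is d * sqrt(n / 2) for equal groups
    ncp = d * (n_per_group / 2) ** 0.5
    t_crit = stats.t.ppf(1 - alpha, df)
    # Probability that the noncentral t statistic exceeds the critical value
    return float(stats.nct.sf(t_crit, df, ncp))


def min_n_per_group(d: float, target_power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest per-group n whose power exceeds target_power."""
    n = 2
    while power_two_sample_one_sided(n, d, alpha) <= target_power:
        n += 1
    return n


if __name__ == "__main__":
    print(round(power_two_sample_one_sided(14, 1.0), 3))
    print(min_n_per_group(1.0))
```

Under these assumptions, 14 participants per group give a power slightly above .80 for d = 1.0, consistent with the planned sample size.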
A researcher in Switzerland who was not directly involved in conducting the experiment and who had no contact with the participants generated the randomized allocation sequence. We used MATLAB for the randomization; its random number generator uses the Mersenne Twister algorithm by default 28 . This researcher ensured that the allocation sequence was concealed from those who recruited and assigned the participants. To support concealment, participants were assigned to a condition according to the allocation sequence in the order in which they were included in the study, and only after a final decision about inclusion had been made. Researchers at Korea University enrolled the participants and then assigned them to the conditions.
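For illustration, the block scheme can be sketched in Python, whose `random` module also uses the Mersenne Twister. This is not the authors' MATLAB code. It assumes, which the source does not state, that each block is split 1:1 between the arms. Under that assumption a full set of blocks yields 15 participants per arm, so the sketch does not reproduce the reported 17/13 split; the source does not describe how allocations were distributed within blocks.

```python
import random
from typing import Iterable, List, Optional, Sequence


def block_randomize(
    block_sizes: Iterable[int],
    seed: Optional[int] = None,
    arms: Sequence[str] = ("experimental", "control"),
) -> List[str]:
    """Permuted-block allocation sequence with randomly ordered block sizes.

    Assumption (not from the source): every block is split evenly between arms.
    """
    rng = random.Random(seed)  # Mersenne Twister, as in MATLAB's default generator
    sizes = list(block_sizes)
    rng.shuffle(sizes)  # random order of the block sizes (e.g. 8, 10, 12)
    sequence: List[str] = []
    for size in sizes:
        if size % len(arms) != 0:
            raise ValueError("block size must be divisible by the number of arms")
        block = [arm for arm in arms for _ in range(size // len(arms))]
        rng.shuffle(block)  # permute allocations within the block
        sequence.extend(block)
    return sequence


if __name__ == "__main__":
    print(block_randomize([8, 10, 12], seed=2013))
```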

Sampling procedure
We recruited participants via advertisements on the university's website and a bulletin board on campus. We screened all interested students against the following inclusion and exclusion criteria and decided on their eligibility for the study. Inclusion criteria were: i) being male, ii) being between 18 and 65 years of age, iii) being right-handed, iv) being familiar with using a smartphone, to take part in the ambulatory training, v) having sufficient English language skills to follow the written instructions in the experiment, vi) having no indication of color-blindness, vii) having no history of cardiovascular or neurological diseases, and viii) having no history of a severe mental disorder. After participants had completed the whole study procedure, we paid each of them 60,000 KRW to compensate for the time and effort related to study participation.

Materials
We used established tools to assess mood and arousal, applied a well-known cognitive task as a stressor in the fMRI experiment, and instructed our participants in four mental strategies, aimed at reducing their stress response during the experiment.
To assess mood, we used the English version of the established Multidimensional Mood State Questionnaire (MDMQ; German original: "Mehrdimensionaler Befindlichkeitsfragebogen (MDBF)"), which has good psychometric properties 29,30 . The questionnaire measures current mood on three dimensions: good to bad, awake to tired, and calm to nervous. Individual values on each dimension range from 4 to 24, and higher values represent more positive affect, feeling more awake, and feeling calmer, respectively.
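A score range of 4 to 24 per dimension is consistent with four items rated from 1 to 6, some of them reverse-scored. The sketch below shows that kind of scoring under exactly this assumption; the item identifiers and their keying are hypothetical, because the source does not list the MDMQ items.

```python
from typing import Iterable, Mapping

# Assumed response scale (1..6 per item), consistent with a 4-24 range for four items
SCALE_MIN, SCALE_MAX = 1, 6


def score_dimension(
    responses: Mapping[str, int],
    items: Iterable[str],
    reversed_items: Iterable[str] = (),
) -> int:
    """Sum item responses for one mood dimension, reverse-scoring negatively keyed items."""
    reversed_set = set(reversed_items)
    total = 0
    for item in items:
        value = responses[item]
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{item}: response {value} outside {SCALE_MIN}..{SCALE_MAX}")
        # Reverse scoring maps 1 -> 6, 2 -> 5, ..., 6 -> 1
        total += (SCALE_MIN + SCALE_MAX - value) if item in reversed_set else value
    return total


# Hypothetical item identifiers for the good/bad dimension
GB_ITEMS = ("gb_pos_1", "gb_neg_1", "gb_pos_2", "gb_neg_2")
GB_REVERSED = ("gb_neg_1", "gb_neg_2")

if __name__ == "__main__":
    example = {"gb_pos_1": 5, "gb_neg_1": 2, "gb_pos_2": 4, "gb_neg_2": 1}
    print(score_dimension(example, GB_ITEMS, GB_REVERSED))
```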
We assessed subjective arousal with the Self-Assessment Manikin (SAM) 31 , a non-verbal pictorial rating scale for valence, arousal, and dominance, rated on a 9-point Likert scale. In addition to the original pictures, the arousal rating was labeled "At the moment, I'm feeling..." and ranged from "very calm" to "very aroused". The SAM is an established tool that is used extensively in research. We were only interested in the arousal dimension, because it is clearly linked to the psychological stress response; we therefore did not analyze the other SAM dimensions.
During the rtfMRInf experiment, we induced acute stress using a cognitive task that had previously been used for this purpose and shown to elicit a cardiovascular and neural stress response: the Stroop color-word interference task 32 , adapted for use in fMRI experiments and made more challenging by adaptive time constraints 33,34 . We instructed the participants in four mental strategies: body attention, contemplative repetition (mantra), emotional imagery, and facial expression (making different emotional faces). More details about these strategies and the exact instructions (text and video clips) have been published elsewhere 23 .

Overall study procedure
We laid out the whole study as a randomised parallel-group study with rtfMRI neurofeedback as the experimental condition and sham-feedback as the control condition. Participants were blinded to their allocation, whereas the experimenters and those analyzing the data were not. We registered the study before starting recruitment (ClinicalTrials.gov, identifier NCT01921088). Over the course of the study, participants visited the laboratory three times and completed 13 days of smartphone-based ambulatory mental training between the two main experimental visits. We conducted the study at Korea University, Seoul, Republic of Korea (RRID:SCR_004095) between August and October 2013.
Initially, we screened all interested students in a short telephone interview to check for their history of diseases and mental disorders described in the exclusion criteria above. On a first visit to the laboratory, we then further checked their eligibility based on all additional inclusion and exclusion criteria with a set of questionnaires. There, we also instructed participants about the whole study and the four mental strategies.
After deciding upon the final inclusion in the study, participants then visited the laboratory two more times for the main rtfMRInf experiments. These two visits were 14 days apart and during the 13 days in-between, participants took part in a smartphone-based ambulatory mental training, where they applied the mental strategies they had already used during the experiment in short daily sessions. They were guided through the training with video clips and questionnaires on their smartphones.
We here report data from the first laboratory visit (screening day) and the first experiment day and will, thus, refer to the latter day simply as "experiment day". More details on the procedures, especially regarding the ambulatory training, have been reported elsewhere 23 .

Experimental procedure
For the first experiment day, the experimental procedure comprised the following phases: structural scans (1 min), in which the previously defined broad regions of interest for the neurofeedback training were localized; a functional localizer phase (13 min), in which participants performed the Stroop task so that the individual regions of interest could be pinpointed, ensuring that participants received a feedback signal from areas active during the task; a resting phase (6 min); a neurofeedback-only phase (i.e., without Stroop; 9 min), in which participants first applied the four learned mental strategies in turn and then continued with the strategy that worked best for them, while receiving neurofeedback; a resting phase (6 min); a phase with additional structural scans (8 min); neurofeedback with Stroop (13 min), in which participants used both the mental strategies and the neurofeedback signal to actively modulate their brain activity associated with the stressor; a resting phase (6 min); a Stroop-only phase (13 min), in which participants only used the mental strategies to reduce their stress response; and a final resting phase (2 min).
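The listed phases can be laid out as a timeline to see onsets and the total scanning time. This short sketch assumes that phases ran back to back, with no gaps and no separate time for the SAM ratings, so the total is a lower bound on time in the scanner.

```python
from typing import List, Sequence, Tuple

# Phase names and durations (minutes) as listed for the first experiment day
PHASES: List[Tuple[str, int]] = [
    ("Structural scans", 1),
    ("Functional localizer (Stroop)", 13),
    ("Rest", 6),
    ("Neurofeedback only", 9),
    ("Rest", 6),
    ("Additional structural scans", 8),
    ("Neurofeedback with Stroop", 13),
    ("Rest", 6),
    ("Stroop only", 13),
    ("Rest", 2),
]


def build_schedule(phases: Sequence[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (name, onset_min, offset_min), assuming phases run back to back."""
    schedule = []
    onset = 0
    for name, minutes in phases:
        schedule.append((name, onset, onset + minutes))
        onset += minutes
    return schedule


if __name__ == "__main__":
    for name, start, end in build_schedule(PHASES):
        print(f"{start:3d}-{end:3d} min  {name}")
```

Summing the durations gives 77 minutes of scanning.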
The Stroop task runs were made up of 8 blocks each; congruent and incongruent trials were alternated. During the neurofeedback phases, we presented the feedback signal continuously on one side of the screen and (if applicable) the Stroop task on the other side. We assessed current mood (MDMQ questionnaire) once before and once after the whole fMRI experiment and arousal (with the SAMs) after each individual phase of the experiment.
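One Stroop run can be sketched as follows: eight alternating blocks, combined with an adaptive response deadline of the kind the Materials section mentions ("adaptive time constraints"). Both the starting condition and the staircase rule are assumptions. The hypothetical 1-up/1-down rule shortens the deadline after a correct, timely response and lengthens it otherwise. All parameter values are illustrative and not taken from the study.

```python
from dataclasses import dataclass
from typing import List


def block_order(n_blocks: int = 8, first: str = "congruent") -> List[str]:
    """Alternate congruent and incongruent blocks; the starting condition is an assumption."""
    other = "incongruent" if first == "congruent" else "congruent"
    return [first if i % 2 == 0 else other for i in range(n_blocks)]


@dataclass
class AdaptiveDeadline:
    """Hypothetical 1-up/1-down response deadline (illustrative values, not from the study)."""

    deadline_ms: float = 1500.0
    step_ms: float = 100.0
    min_ms: float = 500.0
    max_ms: float = 3000.0

    def update(self, correct_in_time: bool) -> float:
        # Shorten the deadline after a correct, timely response; lengthen it otherwise
        if correct_in_time:
            self.deadline_ms = max(self.min_ms, self.deadline_ms - self.step_ms)
        else:
            self.deadline_ms = min(self.max_ms, self.deadline_ms + self.step_ms)
        return self.deadline_ms


if __name__ == "__main__":
    print(block_order())
    deadline = AdaptiveDeadline()
    print([deadline.update(ok) for ok in (True, True, False, True)])
```

A staircase like this keeps accuracy near a fixed level, which matches the stated aim that participants should not perform too well.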

Neurofeedback
We defined a set of regions of interest (ROIs) encompassing the left and right anterior cingulate cortex (ACC) and insular cortex (IC), based on previously reported brain activity associated with the Stroop task 33,35 . The anatomical ROIs were defined from the automated anatomical labeling (AAL) and Brodmann area (BA) atlases available in MRIcron (https://www.nitrc.org/projects/mricron). The intersection of AAL 29/30 and BA 48 was defined as the anatomical ROI for the IC, and the intersection of AAL 31/32 and BA 24/32/33 as the anatomical ROI for the ACC. We used both atlases because their intersection has been shown to provide a functionally more distinct area than an area defined from either atlas alone 36 .
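The atlas intersection reduces to Boolean operations on label volumes. This NumPy sketch uses the label numbers given in the text and assumes that both atlases are integer arrays already resampled to the same voxel grid. Loading the atlas files is outside the sketch.

```python
import numpy as np

# Label numbers named in the text
IC_AAL, IC_BA = (29, 30), (48,)
ACC_AAL, ACC_BA = (31, 32), (24, 32, 33)


def anatomical_roi(aal: np.ndarray, ba: np.ndarray, aal_labels, ba_labels) -> np.ndarray:
    """Voxels carrying one of aal_labels in the AAL atlas AND one of ba_labels in the BA atlas."""
    if aal.shape != ba.shape:
        raise ValueError("atlases must share the same voxel grid")
    return np.isin(aal, aal_labels) & np.isin(ba, ba_labels)


def stress_roi_set(aal: np.ndarray, ba: np.ndarray) -> np.ndarray:
    """Union of the insular and anterior cingulate anatomical ROIs."""
    ic = anatomical_roi(aal, ba, IC_AAL, IC_BA)
    acc = anatomical_roi(aal, ba, ACC_AAL, ACC_BA)
    return ic | acc


if __name__ == "__main__":
    aal = np.array([29, 29, 31, 31, 1])
    ba = np.array([48, 24, 24, 48, 48])
    print(stress_roi_set(aal, ba))
```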
Within this set of ROIs, a more precise individual ROI was localized for each participant during the functional localizer phase. The recorded and processed brain activity of the individual ROIs was fed back to participants in near real time. Participants saw the feedback signal as an abstract white, moving thermometer-like bar on a black background, which went up and down with the signal strength, indicating the deviation from the baseline activity level. We instructed them to reduce the activity of the ROIs by applying the mental strategies they had learned, guided by the feedback signal. Sham feedback in the control condition consisted of the recorded feedback signal of another participant.
We acquired the MRI data with a 3T Siemens Tim Trio scanner (Erlangen, Germany) with a 12-channel head coil. To measure the BOLD signal, we applied a standard gradient-echo EPI pulse sequence 37 with the following parameters for rtfMRInf: repetition time (TR) = 1500 ms, echo time = 25 ms, field of view = 240 × 240 mm², matrix size = 64 × 64, voxel size = 3.75 × 3.75 × 5 mm³, flip angle = 90°, and 30 interleaved slices of 5 mm thickness without a gap, at approximately 30° oblique to the AC-PC line 38,39 .
We calculated individual ROIs for each participant during the functional localizer phase as follows: EPI preprocessing (head motion correction with six parameters; spatial smoothing with an 8 mm full-width at half maximum Gaussian kernel); estimation of beta-value maps for each incongruent and congruent Stroop trial via a general linear model (GLM) implemented in SPM, yielding a contrast map for "incongruent > congruent Stroop trials"; and calculation of the neurofeedback signal from the intersection of the GLM-derived ROIs and the predefined set of ROIs. Once a t-contrast map had been obtained from the functional localizer run, a default statistical threshold of p < 0.01 was used to select voxels significantly more active in the incongruent than in the congruent condition; these voxels constituted the functional ROI. The intersection of the functional ROI and the anatomical ROI was used as the ROI for the neurofeedback runs. Before the neurofeedback runs, the default threshold (p < 0.01) was adjusted to ensure that a reasonable number of voxels was included in the ROI. The average number of voxels (± SD) in the ROIs was 250.7 ± 83.9 for the experimental group and 289.2 ± 80.5 for the control group. The number of voxels did not differ significantly between groups (two-sample t-test: t = -1.27; uncorrected p = 0.22; 95% CI = [-100.7, 23.8]).
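The first function below sketches the voxel selection: threshold a one-sided t-map at p < 0.01, then intersect with the anatomical mask. The source does not say how the threshold was adjusted. The relaxation rule here, which doubles p until a minimum voxel count is reached, is a hypothetical stand-in. The second function recomputes the group comparison of voxel counts from the reported summary statistics. It assumes a pooled-variance Student t-test with the 17/13 group sizes, and under that assumption it reproduces the reported t, p, and CI to rounding.

```python
import numpy as np
from scipy import stats


def functional_roi(t_map, anat_mask, df, p_thresh=0.01, min_voxels=None, max_p=0.5):
    """Voxels with one-sided p < p_thresh ('incongruent > congruent') inside anat_mask.

    Hypothetical adjustment: if fewer than min_voxels survive, p_thresh is doubled
    until enough voxels survive or max_p is reached.
    """
    while True:
        t_crit = stats.t.isf(p_thresh, df)  # one-sided critical t value
        roi = (np.asarray(t_map) > t_crit) & np.asarray(anat_mask, dtype=bool)
        if min_voxels is None or int(roi.sum()) >= min_voxels or p_thresh >= max_p:
            return roi, p_thresh
        p_thresh = min(max_p, p_thresh * 2)


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2, conf=0.95):
    """Student t-test and CI of the mean difference from summary statistics."""
    res = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=True)
    df = n1 + n2 - 2
    sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled SD
    se = sp * np.sqrt(1 / n1 + 1 / n2)
    half = stats.t.ppf(0.5 + conf / 2, df) * se
    diff = m1 - m2
    return float(res.statistic), float(res.pvalue), (diff - half, diff + half)


if __name__ == "__main__":
    # Voxel counts reported above: experimental 250.7 +/- 83.9 (n = 17), control 289.2 +/- 80.5 (n = 13)
    print(pooled_t_from_summary(250.7, 83.9, 17, 289.2, 80.5, 13))
```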
To calculate the neurofeedback signal, we first removed possible artifacts from the raw BOLD signal of the individual ROIs by applying a band-pass filter (0.008 to 0.1 Hz), implemented as a third-order elliptic digital filter, to suppress low-frequency linear drift 37 . Next, we linearly detrended the median BOLD signals within each ROI and within the whole brain. We then averaged the values between the 10th and 30th percentiles during the cross-fixation period and used this average as the baseline BOLD intensity (for the ROI and for the whole brain). The percentage signal change (PSC) of the ROI relative to the whole brain was then estimated voxel-wise by subtracting the whole-brain PSC from the ROI PSC. This PSC difference was used as the neurofeedback signal. Finally, we averaged the signal over the last three TRs to reduce potential high-frequency fluctuations due to cardiac- and respiratory-related activity.
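The signal chain can be sketched with SciPy and NumPy, working on the median time series of the ROI and of the whole brain. This is not the authors' implementation, and several choices are assumptions not given in the source: the passband ripple and stopband attenuation of the elliptic filter, causal (one-pass) filtering, restoring the series mean after filtering so that a percentage can be formed, and a trailing 3-TR window for the final averaging.

```python
import numpy as np
from scipy.signal import detrend, ellip, sosfilt

TR_S = 1.5  # repetition time from the acquisition parameters
FS_HZ = 1.0 / TR_S  # sampling rate of the BOLD time series


def bandpass(x, low_hz=0.008, high_hz=0.1, order=3, rp_db=0.5, rs_db=40.0):
    """Third-order elliptic band-pass; rp_db and rs_db are assumed values."""
    sos = ellip(order, rp_db, rs_db, [low_hz, high_hz], btype="bandpass", output="sos", fs=FS_HZ)
    return sosfilt(sos, x)  # causal filtering, as a real-time pipeline would require


def clean(x):
    """Band-pass, linearly detrend, then restore the mean (assumption) so PSC can be formed."""
    x = np.asarray(x, dtype=float)
    return detrend(bandpass(x), type="linear") + x.mean()


def robust_baseline(fixation_values):
    """Mean of the values lying between the 10th and 30th percentiles."""
    lo, hi = np.percentile(fixation_values, [10, 30])
    selected = fixation_values[(fixation_values >= lo) & (fixation_values <= hi)]
    # Fallback (assumption) for very short fixation periods with no value in the band
    return selected.mean() if selected.size else 0.5 * (lo + hi)


def percent_signal_change(x, fixation_idx):
    baseline = robust_baseline(x[fixation_idx])
    return 100.0 * (x - baseline) / baseline


def trailing_mean(x, k=3):
    """Average over the current and the previous k - 1 TRs."""
    x = np.asarray(x, dtype=float)
    return np.array([x[max(0, i - k + 1): i + 1].mean() for i in range(x.size)])


def feedback_signal(roi_median_ts, wholebrain_median_ts, fixation_idx, k=3):
    """ROI PSC minus whole-brain PSC, smoothed over the last k TRs."""
    roi = clean(roi_median_ts)
    wb = clean(wholebrain_median_ts)
    diff = percent_signal_change(roi, fixation_idx) - percent_signal_change(wb, fixation_idx)
    return trailing_mean(diff, k)
```

Subtracting the whole-brain PSC removes global signal changes that affect the ROI and the rest of the brain alike.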
To take into account the longitudinal nature of the mood data (two measurements, before and after the fMRI session), we used linear mixed-effects models. The models included the fixed effects Time (prescan, postscan), Condition (experimental, control), and the interaction Time × Condition, as well as a random intercept for each participant. We estimated three models, one for each mood dimension as the dependent variable. Together with beta values, we report 95% confidence intervals; all tests were two-sided with an alpha level of 0.05.
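This sketch shows such a random-intercept model in Python with statsmodels. The source does not name the software, and REML estimation is an assumption. Sum (effect) coding is requested with patsy's `C(..., Sum)`. The data are simulated for illustration and are not study data. The ICC is computed from the random-intercept variance and the residual variance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_mood_data(n_exp=17, n_ctl=13, time_effect=-1.0, seed=0):
    """Synthetic long-format data (NOT study data): two time points per participant."""
    rng = np.random.default_rng(seed)
    rows = []
    for pid in range(n_exp + n_ctl):
        condition = "experimental" if pid < n_exp else "control"
        intercept = rng.normal(0.0, 2.0)  # participant-specific random intercept
        for time in ("pre", "post"):
            shift = time_effect if time == "post" else 0.0
            score = 16.0 + intercept + shift + rng.normal(0.0, 0.5)
            rows.append({"participant": pid, "condition": condition, "time": time, "gb": score})
    return pd.DataFrame(rows)


def fit_mood_model(data: pd.DataFrame, outcome: str):
    """Random-intercept model with effect (sum) coded Time, Condition and their interaction."""
    data = data.dropna(subset=[outcome])  # drop missing mood values before fitting
    model = smf.mixedlm(
        f"{outcome} ~ C(time, Sum) * C(condition, Sum)",
        data=data,
        groups="participant",
    )
    result = model.fit(reml=True)
    tau00 = float(np.asarray(result.cov_re)[0, 0])  # between-participant variance
    sigma2 = float(result.scale)  # residual (within-participant) variance
    return result, tau00 / (tau00 + sigma2)


if __name__ == "__main__":
    result, icc = fit_mood_model(simulate_mood_data(), "gb")
    print(result.summary())
    print(f"ICC = {icc:.2f}")
```

With sum coding, each fixed-effect estimate compares one factor level with the grand mean, not with a reference level.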

Participant flow
The flow of participants, from enrollment to allocation and analysis, is given in Figure 1 in a flow diagram consistent with the Consolidated Standards of Reporting Trials (CONSORT).

Mood
Mood scores were assessed with the MDMQ once before and once after the fMRI session. Results of the individual mixed models for the three mood dimensions are presented in Table 1, and descriptive statistics can be found in the interaction plots in Figure 2. In the good/bad and awake/tired dimensions, values were lower after the fMRI session than before in both conditions, resulting in a significant main effect of time. In the calm/nervous dimension, a slight drop in values was present only in the experimental condition, but neither the time effect nor its interaction with condition was significant in this model. None of the models showed a significant main effect of condition or a significant interaction between time and condition.
Note. σ² = within-group (residual) variance, τ00 = between-group (random-intercept) variance, ICC = intraclass correlation coefficient, CI = confidence interval. Factor predictors were effect (deviation) coded to increase the interpretability of the fixed effects; for each factor, the estimate compares the level named in parentheses with the grand mean. In the good/bad and calm/nervous models, there was one missing value each.
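For reference, the random-intercept model and the ICC in the table note can be written as follows. In this notation, which is ours rather than the authors', i indexes the measurement occasion and j the participant, and x_T and x_C are the effect-coded (±1) Time and Condition predictors.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{T,ij} + \beta_2 x_{C,j} + \beta_3\, x_{T,ij}\, x_{C,j} + u_{0j} + \varepsilon_{ij},\\
u_{0j} &\sim \mathcal{N}(0, \tau_{00}), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),\\
\mathrm{ICC} &= \frac{\tau_{00}}{\tau_{00} + \sigma^2}
\end{aligned}
```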

Arousal
We calculated Welch two-sample t-tests to assess differences in SAM arousal values between the experimental and control conditions. To account for heteroscedasticity, these tests allow different variances in the two conditions. Descriptive values for subjective arousal can be found in Figure 3. In the "Neurofeedback-only" phase, we found SAM arousal to be significantly higher for

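A Welch test with a Welch-Satterthwaite confidence interval can be written in a few lines. This sketch computes the non-integer degrees of freedom (as in t(26.6) and t(27.9) in the Abstract) and the 95% CI of the mean difference. The sign of t depends on which group is passed first. The arrays in the usage example are made up, not study data.

```python
import numpy as np
from scipy import stats


def welch_test(a, b, conf=0.95):
    """Welch t-test: returns t, Welch-Satterthwaite df, two-sided p, and CI of mean(a) - mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    se = np.sqrt(va + vb)
    diff = a.mean() - b.mean()
    t = diff / se
    # Welch-Satterthwaite degrees of freedom (generally non-integer)
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), df)
    half = stats.t.ppf(0.5 + conf / 2, df) * se
    return t, df, p, (diff - half, diff + half)


if __name__ == "__main__":
    control = [3, 4, 5, 6, 7, 5, 4]
    experimental = [5, 6, 7, 8, 9, 8, 7, 6]
    print(welch_test(control, experimental))
```

The t statistic and p value match `scipy.stats.ttest_ind(a, b, equal_var=False)`; the explicit version also exposes the degrees of freedom and the confidence interval.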
Discussion
Subjective arousal was higher after neurofeedback training than after sham-feedback training, both with and without the stressor task. In our mood data, we could not observe changes specifically related to real neurofeedback, but participants in both conditions reported worse mood and feeling more tired after the fMRI session than before.
These findings raise several questions. First, why did subjective arousal rise, contrary to our goal of reducing stress with our intervention? Arousal rose for participants in the experimental condition but not for those in the control condition, even when neurofeedback was practiced without a stressor. This finding contradicts the assumption that with neurofeedback, participants reduce their stress response, accompanied by reduced subjective arousal. It is, however, in line with the assumption that the cognitive demand on participants in the experimental condition was higher than on participants in the control condition. Let us recapitulate what participants did in this phase of the experiment: they applied previously learned mental strategies and used the feedback signal from their ACC and IC, trying to reduce the activity in these brain regions. This was the same for participants in the experimental and control conditions. The only difference was the kind of feedback signal they saw (real or sham).
One possible explanation supporting the idea of increased cognitive demand in the experimental condition is the multitasking situation in the experiment: participants had to do several tasks simultaneously. This might have led to increased mental load in participants who received real feedback compared to those who received sham feedback, because the latter might have (consciously or unconsciously) realized that the displayed signal was not contingent on their brain activity. They might then have paid less attention to this signal and to the neurofeedback training and could focus better on the other task(s) (applying mental strategies and solving the Stroop task). Arousal levels might thus indicate multitasking and increased mental load rather than Stroop-induced stress. In this sense, the multitasking aspect of the experiment may itself have become a stressor, because it increased the participants' cognitive demand.
The second set of questions concerning our results is: Why could we not observe any statistically significant mood changes related to the intervention and what could explain the increased tiredness and worse mood after the fMRI session?
We observed no statistically significant mood changes associated with neurofeedback, even though we had expected better mood after the training, in accordance with the study aim of reducing the physiological and psychological stress response with the help of neurofeedback. This could be linked to a limitation of the experimental design, namely that we measured mood only before and after the complete fMRI session. The observed mood effects can thus primarily be interpreted in relation to participants' experience over the whole session, and unfortunately they cannot be matched to putative mood changes related to single experiment phases. Interestingly, subjective arousal, which was measured directly after each experiment phase during the fMRI session, did differ between the conditions. The sampling rate of mood might thus not have been fine-grained enough to pick up differences between the conditions and may only have captured the overall effects of the experiment on all participants.
Regarding tiredness, fatigue due to the experiment and its cognitive demands is to be expected. An fMRI session lasting more than an hour is tiring, and such overall effects might overshadow the minuscule differences between conditions due to the manipulation (real vs. sham feedback), especially since the neurofeedback manipulation was present in only some phases of the experiment.
The lower awake/tired mood values after the experiment are, thus, not surprising. Participants may become tired after a demanding experiment in an fMRI scanner where they have to repeatedly solve a monotonous cognitive task. Furthermore, we expected our participants to relax and calm down. Thus, their indication of being more tired can be interpreted in line with what they actually did.
To explain the decrease in the good/bad mood dimension, we can look at the individual questionnaire items that make up this dimension: participants rated to what degree they felt uncomfortable, content, discontent, good, bad, happy, unhappy, great, superb, and wonderful. For example, participants were unlikely to feel more comfortable and content after the experiment, given that lying in an fMRI scanner can be somewhat uncomfortable and that they were challenged with a cognitive task of adaptive difficulty, which ensured that they did not perform too well. Even if participants could modulate their immediate stress response, the overall change from before to after the experiment towards worse mood might thus be explainable.
Moreover, our intervention did not specifically target mood. In contrast to another rtfMRI study that did exactly that 21 , we used different target brain regions: whereas those researchers targeted the brain regions whose activity differed most between positive and neutral images, we focused on regions associated with our stressor task. Our modulated regions were thus less likely to be directly involved in supporting positive mood, and we could only expect a potential side effect on mood due to the down-regulation of the stress response.
Future studies should aim to overcome the above-mentioned limitations, including the limited time resolution of the mood assessment. They could profit from a shorter mood assessment instrument that can be applied more frequently. One example is a visual analog scale (VAS) to rate perceived stress, which could be implemented during an fMRI experiment and allows rapid, repeated sampling, albeit with lower precision than the MDMQ. Longer questionnaires like the MDMQ interrupt the experiment for longer and are thus not ideal for frequent application. Another potential limitation is that we used questionnaires in English rather than in Korean, our participants' native language. However, we ensured that all participants had sufficient English language skills, and the questionnaires used rather simple English, so that language issues can largely be ruled out.
Broader implications for future research include the notion that the multitasking situation during an experiment can itself influence the measured values. While it is important to make the best use of experimental time for economic reasons and to avoid prolonging an experiment unnecessarily, overloading an experiment might result in unexpected and intertwined effects. In our case, the multitasking during the experiment might have led to our finding of increased arousal connected to the neurofeedback intervention. Even though it is challenging, future rtfMRInf studies on stress should try to avoid multitasking situations as far as possible. Studies could also look at this multitasking aspect more explicitly and elucidate its role in rtfMRInf research. Further, it would be interesting to explore whether subjective and neural reactivity to stress are associated across participants, by investigating the relationship between self-reported psychological factors and brain activity changes during neurofeedback. Analyses of data from the second experiment day, conducted two weeks later, may also add information on potential delayed effects of rtfMRInf training.
We set out to ask whether rtfMRInf aimed at modulating the stress response would influence participants' subjective mood and arousal. Because mood was sampled only before and after the fMRI session, the mood effects reflected the overall experimental experience, probably general fatigue due to the cognitively demanding experiment rather than specific neurofeedback effects. To the best of our knowledge, we are the first to report a phenomenon of neurofeedback-related arousal: our arousal findings are in line with the assumption that the multitasking nature of conducting neurofeedback during a stress task may have increased the acute stress perceived by our participants, rather than producing the expected short-term reduction in subjective indicators of stress. Future studies should take multitasking situations into account in the experimental design and further elucidate the neurofeedback-related arousal phenomenon, especially in the context of stress.

Data availability
Underlying data
Full underlying (non-aggregated) data cannot be made publicly available, since the ethics approval of this study does not cover openly publishing non-aggregated data.
To access these data, a request must be made to the corresponding author. Data requestors will have to provide: i) a written description and legally binding confirmation that their data use is within the scope of the study; ii) a detailed written description and legally binding confirmation of the actions to be taken to protect the data (e.g., with regard to transfer, storage, back-up, destruction, misuse, and use by other parties), as legally required and according to current national and international standards (data protection concept); and iii) a legally binding written confirmation and description that their use of the data is in line with all applicable national and international laws (e.g., the General Data Protection Regulation of the EU).

[…] 6. The reviewer encourages the authors to discuss their hypothesis in the context of these studies.

Reporting guidelines
Although the link between the stress response and arousal is already clear in the first paragraph of the Introduction, the same cannot be said of the relation between the stress response and mood.
Information regarding the regions of interest to be modulated (right anterior cingulate cortex and insular cortex) should be added to the Introduction, explaining the choice of neurofeedback target and its anatomical and functional relation to the central and peripheral stress response.
Finally, a clear a priori scientific hypothesis, substantiated by the literature, should be defined. If the results contradict the hypothesis (for instance, regarding multitasking and its relation to arousal and the stress response), this should be addressed in the Discussion section.

Methods
The block-randomized recruitment scheme is well explained, although questions arise, especially regarding the imbalance between the number of subjects allocated to the experimental condition (N=17) and to the control condition (N=13).
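The reviewer's question can be made concrete. Under a permuted-block scheme, every complete block contains each arm equally often. With two arms and block size b, the group sizes can therefore differ by at most b/2 at any point in enrolment. An imbalance such as 17 vs 13 would then typically point to larger blocks, stratification, or post-randomization exclusions. The sketch below assumes two arms and a block size of 4. The block size actually used in the study is not stated here, and the function name `permuted_block_allocation` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def permuted_block_allocation(
    n: int,
    block_size: int = 4,
    arms: Sequence[str] = ("experimental", "control"),
    seed: Optional[int] = None,
) -> List[str]:
    """Allocate n participants using permuted blocks (hypothetical block size).

    Each complete block contains every arm equally often, so with two arms
    the group sizes never differ by more than block_size / 2.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation: List[str] = []
    while len(allocation) < n:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n]  # last block may be incomplete


alloc = permuted_block_allocation(30, block_size=4, seed=1)
print(alloc.count("experimental"), alloc.count("control"))
```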
The authors state that they based the sample size on previous studies and estimated that with 14 subjects in each group they could detect effects of d = 1 with a power of 1−β > 0.8 (α = 0.05, one-sided). Which outcome measure does this take into account: the primary outcome of this real-time fMRI neurofeedback project ("physiological components of stress (brain activity and blood pressure)") or the mood and arousal assessment?
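The stated figure can be reproduced from its inputs alone (d = 1, α = 0.05 one-sided, power 0.8). That is why the reviewer's question matters: the calculation itself says nothing about which outcome d refers to. The sketch below uses the normal approximation for a two-sample t-test plus the commonly used small-sample correction term z_α²/4. This is an assumption, since the authors' software and method are not stated here. Dedicated tools such as G*Power use the exact noncentral t distribution instead. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80,
                one_sided: bool = True) -> int:
    """Approximate n per group for a two-sample t-test with effect size d.

    Normal approximation 2 * (z_alpha + z_beta)**2 / d**2, plus the
    correction z_alpha**2 / 4 that approximates the t-distribution penalty.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)  # round up to whole participants


print(n_per_group(1.0))  # 14 under these assumptions
```

Under the same approximation, a two-sided test would require 17 participants per group. This makes explicit how much the one-sided choice contributes to the planned sample size.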
The authors refer to reporting data from the first laboratory visit and the first experiment day. The reviewer is curious why the authors did not use data from the third laboratory visit, after the intervention.
The experimental procedure needs further explanation. The reviewer would advise adding a figure that visually depicts the overall neurofeedback intervention. Although the reviewer assumes that the intervention is explained elsewhere (in another manuscript), the work presented should be as self-explanatory as possible. Several questions arise, for example regarding the experimental design: is it a block design? If so, how many blocks are there per "Neurofeedback only" and "Neurofeedback with Stroop" run, and what is the duration of each block? Finally, were the participants advised to use one of the four strategies (body attention, emotional imagery, facial expression, and contemplative repetition) per block, or all strategies within one block?
The explanation of the region-of-interest definition is interrupted by a full paragraph of MR acquisition sequence information. A reorganization of this section is advised, using, for example, the following order: MR acquisition, ROI definition, neurofeedback calculation and presentation. Additionally, note that in the functional localizer description, since no feedback is provided, the ROI is not defined by "calculating the neurofeedback signal" but rather from the activation (BOLD signal); please reformulate.

Results
The reviewer is pleased with the explanation and the application of the proposed statistical tests and the reporting of the results, using 95%CI.

Discussion
The authors' assumption that there is a higher cognitive load on subjects in the experimental group than in the control group (sham feedback) should be substantiated by the literature.
Conversely, the work by Sorger et al. (2019) 7 states that when feedback signals from a participant in the experimental group are presented to the control group, there should be no differences in motivation and perceived success, provided that subjects in the control group do not detect the non-contingency between their efforts and the resulting feedback changes. The authors of that work also stress the importance of monitoring frustration and whether the participants believe they were in the experimental group, in order to confirm non-contingency; they do not mention differences in effort between groups.
Additionally, the authors' claim that subjects in the control group realized the non-contingency between effort and feedback signal should be substantiated by data from a debriefing or a questionnaire after the NF intervention. Non-contingency is a common element of neurofeedback studies regardless of group assignment (Sorger et al. 2019) 7.
Before arousal levels can be considered an indicator of multitasking, it should be noted that in the results in Figure 3, SAM arousal was already significantly higher for participants in the experimental condition than for those in the control condition during the "Neurofeedback-only" runs [t(26.6)=-2.216, 95%CI[-2.188, -0.083], p=0.035 (two-tailed test)]. Assume that multitasking refers to modulating brain activation while performing a Stroop task. Even though higher arousal can be verified, a comparison between the "Neurofeedback with Stroop" and "Neurofeedback-only" runs of the experimental condition is still needed before any conclusion can be drawn about the relation between arousal and multitasking. This information would add valuable insights to the literature on the balance between the cognitive demand of neurofeedback tasks and the efficacy of the intervention and the participant's stress response.

[…] 10. There is recent literature on these factors, as well as on other psychological factors that influence neurofeedback learning outcomes (Kadosh and Staunton 2019) 11. Additionally, the discussion of the impact of multitasking on arousal and the stress response should be complemented with the information in the last two sentences of the Introduction, which are clearly arguments for the Discussion section of this work.
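The fractional degrees of freedom in t(26.6) are characteristic of Welch's unequal-variance t-test. That the authors used Welch's test is an inference from the reported df, not something stated here. The test compares the group means using each group's own variance and approximates the df with the Welch-Satterthwaite formula. The minimal sketch below implements both with the standard library. The function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


print(welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10]))
```

When group sizes and variances are unequal, as with 17 vs 13 participants, df falls below n₁ + n₂ − 2 = 28, which is consistent with the reported 26.6 and 27.9. In SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.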
The authors report consistent differences in SAM arousal scores between the experimental and control groups, both during the "Neurofeedback-only" and the "Neurofeedback with Stroop" runs, in an experiment aimed at down-regulating the activation of brain regions related to the stress response. This is an intriguing result, and further explanation of this phenomenon should be sought. Insights into the relation between the perceived control of the feedback and SAM arousal (using, e.g., debriefing questionnaires), and into the relation between the BOLD variation in the experimental ROI (neurofeedback success) during both neurofeedback runs and the SAM arousal scores, could inform the choice of control group in the design of neurofeedback experiments.
Moreover, the reviewer agrees with adding information regarding the second experiment day, which would allow post-intervention scores to be compared and thus remove any bias in the mood and arousal self-assessment.

(page 5): "With regard to the first experiment day," The first experiment day was previously defined only as "experiment day"; therefore, the word "first" should be removed, or the phrasing should be made uniform.

3.
(page 5): "and also do neurofeedback" The authors should rephrase this: neurofeedback is the technique; the action performed is the modulation of one's own brain activity.

4.
"rtfMRInf" and "rtfMRI-NF" are both used throughout the work; the authors should choose only one abbreviation.

5.
(page 9): "cognitive task with adaptive difficulty" It is not clear that "adaptive difficulty" is the right term to describe the difference in difficulty between the "Neurofeedback-only" and "Neurofeedback with Stroop" runs, since the difficulty does not adapt to performance; the experimental design simply has two levels of difficulty.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Partly

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Partly

[…] commended for its rigorous sampling and other procedures for the randomized controlled trial. The reviewer is particularly impressed by the rigorous application of statistical tests; all the information necessary for assessing the effect of neurofeedback was clearly presented, including effect sizes and 95%CI. Overall, the study provides significant information for researchers who are interested in applying fMRI neurofeedback for various purposes.
The reviewer would like to raise some concerns in the hope of improving the clarity of the study even further, as follows: although other parts of the study design are well written, I feel that the section on the neurofeedback procedures is less satisfactory. In particular, the reviewer would like to raise the following points: The authors seem to have identified the ROI in each individual brain using a contrast map of incongruent > congruent. How was the statistical threshold determined? How much did the identified ROIs differ between individuals? The mean and SD of the cluster sizes should probably be reported. How were the ACC and IC combined? 1.
Subjects were instructed to apply one of the 4 mental strategies that were given to them before the study. What exactly were they? The authors state that each subject selected the one that worked best for him/herself (page 5, left column); how was the "best" one determined? The reviewer would like to know whether and how the 4 mental strategies are expected to change ACC and IC activation.

2.
The reviewer could not locate the description of the length of each fMRI run, including functional localizer, NF only, resting-state, NF+Stroop, and Stroop-only.

3.
On page 5, it says that "We instructed them to modulate the activity of the ROIs ...". Does "modulate" include both increasing and decreasing activation, or only decreasing it? On page 9 (left column), it says "trying to reduce the activity in these brain regions".

4.
Sham condition. On page 5, it says "sham feedback for the control condition was the recording of the feedback signal from another participant". How was "another participant" selected? Was it selected from the control group, from the experimental group, or from both?

5.
Were subjects in the experimental group able to increase scores significantly by the end of day 1?

6.
Was there any relationship between changes in NF scores and psychological scores between individuals?

7.
Probably related to (5) and (6), were the degrees of score changes controlled between the experiment and control groups? Can the authors exclude the possibility that the psychological scores related to stress responses are associated with the NF scores? 8.

Minor points
The reviewer understands that the participants were Korean university students. However, the authors used the English version of the MDMQ. Administering the psychological test in a foreign language might affect the subjects' responses, depending on their English proficiency.

1.
The reviewer feels that the study would become more significant by analyzing the data of the second fMRI experiment after the mental training rather than focusing on the first fMRI experiment. Can we expect a report on the second fMRI experiment in the future?

2.
Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Yes
*****************

Major points:
The reviewer would like to raise some concerns in the hope of improving the clarity of the study even further, as follows: although other parts of the study design are well written, I feel that the section on the neurofeedback procedures is less satisfactory.

1:
The authors seem to have identified the ROI in each individual brain using a contrast map of incongruent > congruent. How was the statistical threshold determined? How much did the identified ROIs differ between individuals? The mean and SD of the cluster sizes should probably be reported. How were the ACC and IC combined?
*** We thank the reviewer for this insightful feedback, which we used to add relevant details on the neurofeedback procedures to the Methods section: once a t-contrast map (i.e., incongruent > congruent) was obtained from the functional localizer run, a default statistical threshold of p < 0.01 was used to select the voxels significantly more active in the incongruent than in the congruent condition; these voxels constituted the functional region of interest (ROI).
The intersection of the functional ROI and the anatomical ROI was used as the ROI for the rtfMRI-NF runs. Before the rtfMRI-NF run, the default statistical threshold (i.e., p < 0.01) was adjusted to ensure that a reasonable number of voxels was included in the ROI for the NF runs.
Regarding the combination of the pre-determined ROIs, an anatomical ROI including the ACC and IC was defined from the automated anatomical labeling (AAL) and Brodmann area (BA) atlases available in MRIcron (https://www.nitrc.org/projects/mricron). The intersection of AAL 29/30 and BA 48 was defined as the anatomical ROI for the IC, and the intersection of AAL 31/32 and BA 24/32/33 as the anatomical ROI for the ACC. Both atlases were used because their intersection provides an area that is functionally distinct from the area defined by either atlas alone [1]. We added this information to the revised version of the manuscript.
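The ROI logic described in this response can be sketched in a few lines. Several parts of the sketch are assumptions. Voxel maps are represented as plain dictionaries keyed by voxel coordinates; in practice they would be 3-D image arrays. The IC and ACC masks are combined by union. The threshold is adjusted by relaxing α through a fixed schedule until a minimum voxel count is reached, and both the schedule and `min_voxels` are hypothetical placeholders, because the response does not state how "reasonable" was judged or in which direction the threshold was moved. All function names are illustrative. The atlas label numbers are taken from the text.

```python
from typing import Dict, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]

# Atlas labels as given in the response.
IC_AAL, IC_BA = {29, 30}, {48}            # insular cortex
ACC_AAL, ACC_BA = {31, 32}, {24, 32, 33}  # anterior cingulate cortex


def anatomical_roi(aal: Dict[Voxel, int], ba: Dict[Voxel, int]) -> Set[Voxel]:
    """IC and ACC masks (each an AAL/BA intersection), combined by union."""
    ic = {v for v, lab in aal.items() if lab in IC_AAL and ba.get(v) in IC_BA}
    acc = {v for v, lab in aal.items() if lab in ACC_AAL and ba.get(v) in ACC_BA}
    return ic | acc


def functional_roi(p_map: Dict[Voxel, float], alpha: float) -> Set[Voxel]:
    """Voxels whose incongruent > congruent p-value is below alpha."""
    return {v for v, p in p_map.items() if p < alpha}


def nf_roi(p_map: Dict[Voxel, float], aal: Dict[Voxel, int],
           ba: Dict[Voxel, int], alphas: Sequence[float] = (0.01, 0.05),
           min_voxels: int = 50) -> Tuple[Set[Voxel], float]:
    """Functional ROI intersected with the anatomical ROI.

    Starts at the default alpha (0.01) and relaxes it (hypothetical
    schedule) until at least min_voxels voxels survive.
    """
    anat = anatomical_roi(aal, ba)
    roi: Set[Voxel] = set()
    for alpha in alphas:
        roi = functional_roi(p_map, alpha) & anat
        if len(roi) >= min_voxels:
            return roi, alpha
    return roi, alphas[-1]
```

Reporting the final α and `len(roi)` per participant would also provide the mean and SD of cluster sizes that the reviewer asked for.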

2:
Subjects were instructed to apply one of the 4 mental strategies that were given to them before the study. What exactly were they? The authors state that each subject selected the one that worked best for him/herself (page 5, left column); how was the "best" one determined? The reviewer would like to know whether and how the 4 mental strategies are expected to change ACC and IC activation.
*** We thank the reviewer for this comment, which allows us to clarify this relevant aspect of the methods: the four mental strategies are described in detail in an earlier (open-access) publication by the authors, in which the written instructions for each strategy are available [2]. We now highlight more clearly where this information on the instructions can be found. The participants were free to choose one of the strategies. There was no instruction on how they had to determine which strategy worked "best" for them. It was a subjective choice based on the experience the subjects gained while trying out all four strategies in the scanner during the "neurofeedback without Stroop" phase. We assumed that a subjective evaluation of the strategies would a) best indicate which strategy indeed worked best for the individual subject and b) lead to the best adherence of the subject to this strategy later on. Based on previous studies on the strategies' potential for stress reduction, we expected that they could reduce the stress-related activity of these ROIs, since both ROIs had been found to be involved in the stress response elicited by the Stroop task [3,4].

3:
The reviewer could not locate the description of the length of each fMRI run, including functional localizer, NF only, resting-state, NF+Stroop, and Stroop-only.
*** As suggested by the reviewer, we added the duration of each phase of the fMRI experiment to the "Methods / Experimental procedure" section in the revised manuscript.

4:
On page 5, it says that "We instructed them to modulate the activity of the ROIs ...". Does "modulate" include both increasing and decreasing activation, or only decreasing it? On page 9 (left column), it says "trying to reduce the activity in these brain regions".
*** Indeed, the instruction was to decrease the activity. We adjusted the first cited sentence to specify this.

5:
Sham condition. On page 5, it says "sham feedback for the control condition was the recording of the feedback signal from another participant". How was "another participant" selected? Was it selected from the control group, from the experimental group, or from both?
*** Sham feedback was always the recording from a participant of the experimental group. Each activity recording used for sham feedback was taken from the preceding participant of the experimental group.
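The yoking rule in this reply can be stated precisely: each control participant receives the recording of the most recent experimental-group participant enrolled before them. The sketch below encodes that rule. Participant IDs and the function name are hypothetical. The response does not say what would happen if a control participant were enrolled before any experimental participant, so the sketch assumes this is treated as an error.

```python
from typing import Dict, List, Optional, Tuple


def assign_sham_sources(sequence: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each control participant to the preceding experimental participant.

    `sequence` lists (participant_id, condition) in enrolment order,
    with condition either "experimental" or "control".
    """
    last_experimental: Optional[str] = None
    sources: Dict[str, str] = {}
    for pid, condition in sequence:
        if condition == "experimental":
            last_experimental = pid  # their recording becomes the sham source
        elif condition == "control":
            if last_experimental is None:
                # Assumption: not covered by the described procedure.
                raise ValueError(f"no experimental recording before {pid}")
            sources[pid] = last_experimental
        else:
            raise ValueError(f"unknown condition: {condition}")
    return sources
```

One consequence of this rule is that a single experimental recording may be replayed to several consecutive control participants when randomization produces a run of control allocations.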

6:
Were subjects in the experimental group able to increase scores significantly by the end of day 1?
*** If we understand correctly, the reviewer here refers to the modulation of brain activity. These data are not reported in the current publication and will instead be included in another publication reporting the results from the main outcomes of the experiment (fMRI and blood pressure data).

7:
Was there any relationship between changes in NF scores and psychological scores between individuals?
*** See our reply to point 6 above. We thank the reviewer for pointing this out and have added a sentence to the Discussion section highlighting that this would be an interesting additional question to address.

8:
Probably related to (5) and (6), were the degrees of score changes controlled between the experiment and control groups? Can the authors exclude the possibility that the psychological scores related to stress responses are associated with the NF scores?
*** We did not control or adjust for the degree of changes in psychological scores between conditions, but took the factor condition into account by modeling it in our analyses. With regard to any relations with brain responses, see our replies to questions 6 and 7.
Minor points:

1:
The reviewer understands that the participants were Korean university students. However, the authors used the English version of the MDMQ. Administering the psychological test in a foreign language might affect the subjects' responses, depending on their English proficiency.
*** We agree that this might generally be an issue. To minimize this problem, we ensured that only students with sufficient English language skills were included in the study. We added a corresponding sentence to the Discussion section of the manuscript, pointing out this limitation of the study.

2:
The reviewer feels that the study would become more significant by analyzing the data of the second fMRI experiment after the mental training rather than focusing on the first fMRI experiment. Can we expect a report on the second fMRI experiment in the future?
*** We agree with the reviewer that the data from the second part of the fMRI experiments would add valuable information. However, habituation of the stress response (i.e., stress reactivity decreasing over time) makes the signals increasingly difficult to interpret. To focus on the initial question and to keep the publication readable, we therefore chose to focus on the first experiment, to see the effects on the first responses to the Stroop task and the mental strategies. Further, given that we did not see significant short-term differences between the conditions regarding mood, we do not expect them to develop later, although we agree that this may theoretically happen and would be an interesting topic for further study. We added a corresponding paragraph to the Discussion section.