Development and validation of the English version of the Moral Growth Mindset measure.

Abstract
Background: Moral Growth Mindset (MGM) is a belief about whether one can become a morally better person through efforts. Prior research showed that MGM is positively associated with promotion of moral motivation among adolescents and young adults. We developed and tested the English version of the MGM measure in this study with data collected from college student participants.
Methods: In Study 1, we tested the reliability and validity of the MGM measure with two-wave data (N = 212, Age mean = 24.18 years, SD = 7.82 years). In Study 2, we retested the construct validity of the MGM measure once again and its association with other moral and positive psychological indicators to test its convergent and discriminant validity (N = 275, Age mean = 22.02 years, SD = 6.34 years).
Results: We found that the MGM measure was reliable and valid from Study 1. In Study 2, the results indicated that the MGM was well correlated with other moral and positive psychological indicators as expected.
Conclusions: We developed and validated the English version of the MGM measure in the present study. The results from studies 1 and 2 supported the reliability and validity of the MGM measure. Given this, we found that the English version of the MGM measure can measure one's MGM as we intended.

Introduction
In the present study, we aimed to create and validate the English version of the Moral Growth Mindset (MGM) measure, which was originally developed in Korean. Growth mindset refers to the belief that it is possible to improve one's abilities and qualities, such as intelligence or personality 1. Individuals with a growth mindset believe that this can be done through effort and learning, which helps foster motivation. Their motivation is encouraged by attitudes such as viewing hardships as a chance to work harder rather than as an indication of failure, and striving for success out of a genuine desire to learn rather than out of concern for how others view them 2. One study found that an intervention that taught students how to endorse a growth mindset reduced levels of aggression as well as depressive symptoms that resulted from being a victim of bullying 3. This study suggested that growth mindset might be beneficial for promoting a sense of resilience when faced with social challenges or other difficulties.
MGM refers to growth mindset in the domain of morality. This mindset is related to one's belief that it is possible to become a morally better person and improve one's morals through efforts.
A previous study showed that MGM was positively associated with increases in voluntary service engagement among adolescents and young adults 4 . The results suggested that among younger populations, MGM might increase participants' prosocial behavior due to the belief that it will make them morally better. Given this, MGM would be considered as a factor that contributes to moral development. In order to adequately examine how MGM contributes to moral development, however, it is necessary to have an appropriate measure. Additionally, if moral growth mindset motivates people to learn how to become more moral, as previous research suggests, then it is important for moral educators to have a tool to assess the malleability beliefs students have related to their morals. For example, if moral educators are able to identify that some students have a fixed mindset related to their morals, then an appropriate starting point may be to provide them with evidence that it is possible to improve moral character throughout one's life.
MGM was previously included as a three-item subscale in a general measure of growth mindset called the Theory Measures 5,6 . However, because it is important to include four or more items per factor to perform psychometric tests 7 , the psychometrical qualities of the MGM subscale could not be sufficiently tested. In a previous study 4 , we developed and tested a Korean version of the MGM measure and evaluated the internal consistency and structure of the measure. However, the test-retest consistency and discriminant validity of the measure were not examined. Hence, in the present study, we created an English version of the MGM measure and tested its psychometric properties. In Study 1, we tested the internal and test-retest consistency and validity of the MGM measure and modified the measure to improve the model fit. In Study 2, we examined correlations between the MGM and other moral and positive psychological indicators associated with positive youth development to test the convergent and discriminant validity of the measure.

Study 1
In Study 1, we translated the MGM measure to English and tested its reliability and validity with two-wave data. We also modified the items to improve the model fit.

Translation of the MGM measure to English.
Based on the Korean version of the MGM measure 4 and the Implicit Theory measure 1,8 , we developed the English version of the MGM measure. Although the English version was created based on the Korean version, we did not do direct translation because of cultural differences in concepts and terms related to morals and characters (e.g., 9). Instead, the inventors (HH, KJD, and YJC) of the Korean MGM measure created its English version based on the structure of the Korean version and the wording in the Implicit Theory measure. In addition, the Implicit Theory measure was used due to the fact that it had six items and was based on Dweck's original measure of growth mindset for intelligence. As a result, the tested measure included six items as well (e.g., "No matter who you are, you can significantly improve your morals and character") and answers were anchored to a six-point Likert scale (see Extended data for the full measure 10 ).
Although Chiu, Hong, and Dweck 11 originally used more nuanced keywords such as "responsible and sincere" as well as "conscientiousness, uprightness, and honesty," we decided to use the more general terms, "morals and character." This was due to the concern that such nuanced terms in the original measure may be associated with specific moral foundations and biased towards certain groups of people. For example, conservatives have been found to score higher on measures of conscientiousness 12 whereas liberals have been found to rely primarily on the value of fairness, which is closely related to honesty, when dealing with moral issues (see research on Moral Foundation Theory; e.g., 13). Thus, we used "morals and characters" in order for participants to be able to define the terms based on their own experiences and understanding. Finally, since Chiu et al. (1994) 11 used terms related to specific morals and characteristics in their original three-item subscale (e.g., "A person's moral character," "whether a person is responsible and sincere," "a person's moral traits"), we decided to use "morals and character" in order to stay consistent with the construct they were measuring. That is, rather than measuring participants' malleability beliefs about the overarching system of values they have, we wanted to measure malleability beliefs regarding individual morals, as did the original measure. Doing so may increase the chance for interventions since if people want to become a better person (improve their morality) they may need to believe that their values (morals) can be improved.

Amendments from Version 1
In the revised manuscript, we addressed both reviewers' comments and suggestions. First, we restructured our manuscript so that more information regarding the theoretical frameworks and measurements is available. Second, we reported additional CFA results, namely the factor loadings from the six-item and five-item models, in the supplementary table (please refer to the updated Extended data section). Third, we revised the Discussion section to better interpret findings from both Studies 1 and 2. In addition to these major points, we also made several minor revisions to improve the quality of our work based on both reviewers' reports.
Any further responses from the reviewers can be found at the end of the article.

Our measure is in line with the original measure, the Implicit Theory measure, consisting of six items 1 . In fact, although all of the items were meant to measure whether or not participants endorse a growth mindset and are similar to each other, the wordings varied slightly to include core concepts of growth mindset such as being able to improve regardless of who you are (i.e., "no matter who you are"), the point in time (i.e., "always"), or the degree (i.e., "considerably"). In addition, because we were interested in whether MGM can be differentiated from general growth mindset measured by the original growth mindset measure, we decided to use the same terms and format that were adopted in the original measure (e.g., "No matter who you are, you can change your intelligence a lot").
Participants. Study 1 was conducted during the 2018 fall semester. Participants were recruited from students enrolled in undergraduate educational psychology classes and they were provided with a course credit. Only students who were at least 18 years of age were eligible to complete the survey. The participants visited the subject pool system, checked the list of active research projects, and selected and signed up for our study. We decided to recruit at least 200 participants since N = 200 has been regarded as the recommended minimum sample size for confirmatory factor analysis (CFA) 14 .
A total of 212 college students (89.15% females; Age mean = 24.18 years, SD = 7.82 years; 177 Caucasian, 34 African American, 1 Native American, 1 Asian, 1 Pacific Islander, 3 Latinx, 2 multi-ethnic) from the southern USA completed the English MGM measure online via Qualtrics. They were re-invited to complete the same survey again one week later (N = 207 for Wave 2; 89.37% females; Age mean = 24.28 years, SD = 7.88 years).
Procedures. Participants who voluntarily signed up for study 1 received a link to the Qualtrics survey where they completed the MGM measure, followed by a demographics survey. When the participants signed up for the study, the subject pool manager provided us with their email addresses, and we sent the participants the survey link via email. We created our Qualtrics survey in a way that only the participants who answered all survey questions were able to complete the survey and receive a credit for their class. Thus, there was no missing data in the present study.
A consent form was sent out to the students alongside the MGM measure. This form was reviewed by the Institutional Review Board at the University of Alabama (IRB approval number: 18-04-1156), along with the approved studies, and was presented at the beginning of the Qualtrics form. Only students who read the form and agreed to participate in this study were presented with the survey forms.
Analysis. When examining test-retest reliability, we excluded participants who failed to complete the second survey within two weeks to control for the time gap between the two surveys, which left 168 cases for examining test-retest reliability (Mean time gap between Waves 1 and 2 = 7.78 days, SD = 1.66 days).
First, we examined consistency indices, i.e., Cronbach's α and test-retest consistency. Second, we performed CFA to examine the internal structure of the measure. We used robust weighted least squares (WLSMV) because it is more suitable for testing Likert-type items in a small sample 15 . During this process, we checked whether any item should be excluded from the measure to achieve a good model fit. If the measure was modified, we calculated all reliability and validity indices again. We used R (3.6.1) for statistical analyses. All data files and source codes are available as Underlying data 10 .
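The two consistency indices named above can be illustrated with a minimal sketch. This is not the authors' R code; the response data are hypothetical six-point Likert answers from five respondents to four items, and the function names are introduced here for illustration only. Test-retest consistency is approximated as the Pearson correlation between each respondent's mean score at the two waves.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(responses[0])                          # number of items
    items = list(zip(*responses))                  # transpose to item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


# Hypothetical data (NOT from the study): 5 respondents x 4 items, two waves.
wave1 = [[4, 5, 4, 5], [2, 3, 2, 2], [5, 6, 5, 6], [3, 3, 4, 3], [1, 2, 1, 2]]
wave2 = [[4, 4, 5, 5], [3, 3, 2, 2], [5, 6, 6, 6], [3, 4, 4, 3], [2, 2, 1, 2]]

alpha_w1 = cronbach_alpha(wave1)
retest_r = pearson_r([mean(r) for r in wave1], [mean(r) for r in wave2])

print(f"Cronbach's alpha (wave 1): {alpha_w1:.3f}")
print(f"Test-retest correlation:   {retest_r:.3f}")
```

Both values would then be compared against the conventional .7 threshold for acceptable reliability.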

Results
First, the measure demonstrated at least acceptable reliability (> .7; see Table 1) according to both Cronbach's alpha values and test-retest reliability. Second, we performed CFA; the original model with all six items did not show good model fit (see Table 1). Thus, referring to Han et al. (2018), we excluded items 1 and 2 because they showed relatively lower factor loadings in the six-item and five-item models (see Table 2 for the best model 16). As shown in Table 1, when we recalculated the reliability indices (Cronbach's alpha and test-retest reliability) after exclusion of the items, they all remained greater than .7.
In addition to the low factor loadings, we also decided to remove items 1 and 2 due to the fact that they may have been too vague. For example, item 1 stated "you can't really do much" and item 2 stated "you can't improve very much" whereas the other items used words such as "significantly improve," "always substantially improve," and "improve…considerably" that conveyed more specific magnitude. Using the less extreme terms in items 1 and 2 may have put the items at risk of inconsistency 17 since it would be easier for participants' opinions to shift regarding whether or not you can change "much." In addition, as another possibility, items 1 and 2 are more likely about entity beliefs, not malleability beliefs that constitute the basis of growth mindset. These items contain some words perhaps related to entity beliefs (e.g., "certain morals and characters...," "something about you…"), so they might not directly measure the core of the growth mindset construct and showed lower factor loadings compared to the other items.

Study 2
In Study 2, we tested the correlation between MGM and other moral and positive psychological indicators associated with positive youth development. In addition, we performed CFA for model confirmation. We aimed to test three aspects of the measure's validity: construct, convergent, and divergent validity. In general, according to the previous studies that examined the relationship between growth mindset, positive psychological indicators, and antisocial tendency (e.g., 26-28), we hypothesized that the sizes of correlation coefficients between MGM and other indicators, except the general growth mindset, would be between .10 (small) and .30 (medium). We discuss further details regarding the hypothesized effect size of each measure in the following sections.

Participants.
As per Study 1, participants were recruited from the educational psychology and psychology subject pools during the 2019 spring semester, with age and class enrollment restrictions similar to those employed in Study 1. Participants in educational psychology classes visited the subject pool system, checked the list of active research projects, and selected and signed up for our study. Participants in psychology classes who intended to sign up for our study visited the SONA system, reviewed the list of active studies, and then selected and signed up for the present study.
Procedures. When participants signed up for the present study, the procedure for educational psychology students was identical to that of Study 1. In the case of psychology students, they were automatically provided with a link to a Qualtrics survey via the SONA system. Participants were presented with the MGM measure and other moral and positive psychological measures, all of which were presented in a randomized order, followed by a demographics survey. Similar to Study 1, only the participants who answered all questions were able to complete the survey and receive a credit, so there was no missing data in the present study. For sample size estimation, similar to Study 1, we followed the guidelines for CFA 14, so we determined that at least 200 participants were required.
Measures. MGM measure. We used the four-item MGM measure used in Study 1.
Implicit Theory Measure. The Implicit Theory Measure was designed to measure one's mindset regarding whether it is possible to change and improve one's intelligence and abilities in general 1 . The measure consists of six items and responses are anchored to a six-point Likert scale. The structure of this measure has been tested in previous studies (e.g., 1,8).
Given that the Implicit Theory Measure measures one's general growth mindset, we expected that it would be positively correlated with MGM. However, because the construct measured by the Implicit Theory Measure is not domain specific, we also expected that the MGM would not completely overlap with this construct (discriminant validity). Given these, the effect size of the correlation coefficient would be medium to large (r = +.3 to +.5).
Behavioral Defining Issues Test (bDIT). The bDIT was developed to assess development of one's moral judgment 19,20 . Choi et al. (2019) 19 tested its measurement structure and psychometrical qualities and found that it did not favor any gender and it showed acceptable reliability as well as concurrent validity with the DIT-1 measure. In general, the bDIT assesses whether one can make moral judgments based on the post-conventional schema instead of focusing on social norms or one's personal interests. It consists of three moral dilemmas and 24 questions that ask what the most important moral philosophical criterion is when solving the moral dilemmas. We used a percentile score that quantified the likelihood of utilizing the post-conventional schema. Because the bDIT measures one's moral judgment development, we expected that MGM would be positively associated with the bDIT score.
Unlike other self-report measures, the bDIT is a behavioral measure evaluating one's developmental level of moral judgment with behavioral responses. Previous research has shown that participants could not increase their score even if they were asked to fake higher moral judgment with the DIT 29 . Thus, the bDIT is less susceptible to social desirability bias and can measure one's actual moral functioning instead of self-reported qualities. Given that this is a psychological test to assess one's moral functioning and not a self-report measure, we expected that the bDIT score would be weakly correlated with MGM (r ~ +.1).
Interpersonal Reactivity Index (IRI). The IRI was used to measure empathic traits, i.e., empathic concern (EC), personal distress (PD), perspective taking (PT), and fantasy scale (FS) (Davis, 1983) with 28 items. The internal structure of the measure based on the four-factor model was validated in previous studies with factor analysis (see Chrysikou & Thompson, 2016). According to Decety and Cowell's (2014) discussion regarding the relationship between different subcomponents in the IRI and moral functioning, we hypothesized that only EC and PT, not PD and FS, would be positively correlated with MGM 30. Given the IRI is a self-report measure, we expected a relatively larger (small to medium) effect size of correlation, r = +.1 to +.3, compared with the bDIT.
Claremont Purpose Scale (CPS). This 12-item scale quantitatively measures purpose among adolescents using three subscales: meaningfulness, goal orientation, and beyond-the-self dimension 24. CPS scores were positively associated with various moral and positive psychological indicators (e.g., purpose in life, satisfaction with life, empathic concern, wisdom) in prior research 24. We used both the total CPS and subscale scores given that Bronk et al. (2018) 24 validated it with hierarchical CFA. Given previous studies that examine the association between morality, meaning 32, and purpose 31,33, similar to the cases of the IRI and MIS, we hypothesized a small to medium effect size of the correlation between MGM and CPS (r = +.1 to +.3).

Moral Identity Scale (MIS).
Analysis. First, we performed CFA with the MGM data again to test the internal structure of the MGM measure (construct validity). Second, we conducted correlation analyses to examine how MGM was associated with other moral and positive psychological indicators (convergent validity). Third, we tested whether or not the MGM measure examines a construct independent from general growth mindset (discriminant validity) using the Fornell-Larcker criterion 34 .
We also used R in Study 2. All data files and source codes are available as Underlying data 10 .

Results
The results of the reliability check showed that the MGM measure as well as all other measures possessed at least acceptable reliability (> .7; see Table 3). Moreover, CFA supported good internal structure of the MGM measure (see Table 1 and Table 2). However, it should be acknowledged that Item 4 showed a slightly lower factor loading in Study 2 compared with Study 1, although the overall model fit indices were excellent. This point might need to be tested in future studies with more samples.
Correlation analysis demonstrated a positive association between MGM, general growth mindset, and other moral psychological indicators such as empathic concern, perspective taking, moral internalization, and purpose. Indicators relatively less relevant to morality, such as personal distress, symbolization, and meaningfulness, did not show a significant correlation (see Table 3). The effect size of the correlation coefficient between MGM and bDIT was small as predicted, but the correlation was non-significant (p = .08). MGM was not significantly correlated with PD and CPS meaning. The correlation between MGM and moral disengagement was significantly negative. We found that the correlation coefficient between MGM and general growth mindset (r = .37) was smaller than the square root of the average variance extracted (AVE=.84), which indicates MGM showed discriminant validity from general growth mindset.
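The discriminant validity check reported above follows the logic of the Fornell-Larcker criterion: the square root of each construct's average variance extracted (AVE) should exceed the correlation between the two constructs. The sketch below illustrates that comparison only. The standardized factor loadings are hypothetical placeholders, not the loadings in Table 2; only the correlation of .37 is taken from the text.

```python
from math import sqrt


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized factor loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def fornell_larcker_ok(loadings_a, loadings_b, r_ab):
    """True if sqrt(AVE) of both constructs exceeds their correlation."""
    root_a = sqrt(average_variance_extracted(loadings_a))
    root_b = sqrt(average_variance_extracted(loadings_b))
    return min(root_a, root_b) > abs(r_ab)


# Hypothetical standardized loadings (assumed for illustration only).
mgm_loadings = [0.8, 0.7, 0.9, 0.6]
general_loadings = [0.7, 0.8, 0.7, 0.8, 0.7, 0.8]

r_mgm_general = 0.37  # correlation reported in the Results above

ave_mgm = average_variance_extracted(mgm_loadings)
print(f"AVE (MGM, hypothetical loadings): {ave_mgm:.3f}")
print(f"sqrt(AVE): {sqrt(ave_mgm):.3f}")
print("Criterion met:", fornell_larcker_ok(mgm_loadings, general_loadings, r_mgm_general))
```

When the inter-construct correlation is larger than either square root, the function returns False, which would indicate that the two measures are not empirically distinguishable.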

Discussion
We developed and tested the English version of the MGM measure in this study with data collected from emerging adult participants. In Study 1, we found that the four-item MGM measure possessed good consistency and internal structure. In fact, the previous studies that developed and tested measurements for diverse types of domain-specific growth mindset have shown that the measurements possessed good reliability and  validity as well (e.g., 35,36). Consistent with these previous studies, we were able to show that MGM can also be appropriately measured by a self-report measure, the English version of the MGM measure, as we intended.
In Study 2, we found that MGM was positively associated with moral and positive psychological indicators as hypothesized. Two exceptions were the significant association between MGM and FS and the non-significant association between MGM and CPS meaning. First, FS is intended to quantify one's tendency to expand their empathy toward imaginary beings, so the significant association with MGM indicates a tendency to broaden one's empathy. Second, CPS meaning is about personal meaning, which is not necessarily moral 37, so it makes sense that it would not be significantly associated with MGM. This result would suggest that the MGM measures a construct that is specifically about moral development in addition to positive youth development in general.
In the case of the bDIT, the effect size was within the hypothesized range, but the correlation was non-significant (p = .08) perhaps due to the small sample size. As previously mentioned, this could also be due to the fact that the bDIT is a behavioral measure rather than a self-report measure like the MGM measure. Since the bDIT is less susceptible to social desirability bias, it may be necessary to further explore the possibility of bias in participants' responses for the MGM measure in future studies.
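The sample size argument can be made concrete with a rough calculation. The sketch below uses the Fisher z approximation, which is an assumption made here for illustration and not necessarily the test used in the study, to estimate the smallest correlation that reaches two-tailed significance at a given sample size. N = 275 is the Study 2 sample size; the function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha (two-tailed, Fisher z)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


def two_tailed_p(r, n):
    """Approximate two-tailed p value for a correlation r with sample size n."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


n_study2 = 275
print(f"Critical |r| at N = {n_study2}: {critical_r(n_study2):.3f}")
print(f"Approximate p for r = .10: {two_tailed_p(0.10, n_study2):.3f}")
```

Under this approximation, a correlation at the lower bound of the hypothesized range (r = .10) falls just short of significance at N = 275, which is consistent with the interpretation that a larger sample would be needed to detect an effect of that size reliably.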
In addition, moral disengagement was negatively correlated with MGM. Since moral disengagement allows people to dismiss negative feelings they may have about behaving immorally, using the eight mechanisms previously mentioned, it increases the likelihood of continuing to behave immorally. In this way, moral disengagement and MGM have somewhat reverse trajectories. As hypothesized, this suggests that MGM may promote engaging in moral behavior. In addition, since moral internalization, which has been shown to inhibit moral disengagement 38, was also positively correlated with MGM, it makes sense that our measure was negatively correlated with moral disengagement. If somebody has a strong sense of their morals and these values are internalized, this may help them to stay engaged with their standards and, furthermore, be motivated to continue to become morally better.
Finally, we found good discriminant validity between the MGM measure and the general growth mindset measure. This indicates that although the general growth mindset measure and the MGM measure both assess growth mindset, they capture distinct constructs related to malleability beliefs in different domains (i.e., intelligence and morals, respectively). Given this, our MGM measure contributes to growth mindset research by introducing a reliable and valid measure for growth mindset related to morals.
The results from our correlation analysis are consistent with findings in previous studies that have examined the positive relationship between growth mindset and successful social adjustment and positive youth development in general 2,26,39 .
This English version of the MGM measure has the potential to significantly contribute to research in moral development and education. For instance, researchers and educators who are interested in how MGM is associated with moral development may use the MGM measure in their studies. In addition, given that we created the English version of the MGM measure, scholars who are using languages other than Korean or English will be able to translate the measure into their languages. By doing so, it would be possible to accumulate large-scale datasets for testing the measure in diverse backgrounds and contexts, and to examine the roles of MGM in moral development in the long term.
However, there are limitations in this study that warrant future studies. First, we collected data only from undergraduate students, and male students were underrepresented in both studies; such issues may limit the generalizability of our findings. Second, although we used straightforward terms (e.g., morals and characters), we could not test whether the measure was actually unbiased according to one's political orientation or endorsed moral foundations. To address this issue, a measurement invariance test would be a way to examine whether the MGM measure, which allows participants to interpret "morals" and "characters" by themselves, measures the same construct across different groups who may use different underlying folk conceptions of morals and characters. Third, although participants spent about 33.98 minutes (median) to complete Study 2, we did not include any attention check items. Fourth, we did not employ Chiu et al.'s (1997) 5 original measure, which could be informative while conducting the convergent validity check, although our measure was based on Dweck's (2000) 1 updated six-item general growth mindset measure. Fifth, the items used in the MGM measure could be revised, particularly when being administered among younger populations. We decided to use the current wordings to maintain consistency with the Korean version of the MGM measure and the Implicit Theory Measure, which constituted the basis of our measure. However, to make the measure more applicable to younger populations, some complex words (e.g., "substantially," "considerably") could be replaced with simpler words (e.g., "a lot"). Finally, since several items in the measure might seem to be similar, the wording could be revised in future studies, particularly those focusing on children or young adolescents.

Data availability
Open

Di You
Psychology and Counseling Department, Alvernia University, Reading, PA, USA The authors carefully designed two studies to test the reliability and validity of the English version of MGM, moral growth mindset. The literature review was relevant and coherent. The method section was well articulated. The results were clearly presented. The discussion was well-organized. One recommendation for future study: given for both studies, majorities of the participants are Caucasian. A future study may want to investigate whether the MGM is consistent across various ethnicities.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: moral development; ethics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
What is the relationship between morality and "morals and character"? What does morals mean here? What does character mean here? Why are these concepts combined?
The basic rule of items is that they should measure only one issue per one item.
In the limitations, author should reflect how to solve the issues mentioned above in the future studies.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: school pedagogy, moral education, gifted education, talent development
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

I do remain skeptical about the word choice "improve your morals" found in the scale's items. My concern is that the phrase introduces unnecessary ambiguity. Are participants meant to interpret it as "improve yourself on specific moral traits" (based on your response I believe this is the intention), or are they meant to interpret it as "improve your moral values"? To me, the former is directly relevant to becoming a better person, whereas the latter has very little to do with it; we know, for example, that moral values and behaviour often show only small-to-moderate associations (e.g., Bardi & Schwartz, 2003). Becoming a better person entails not only holding moral values but the cognitive, affective, motivational, embodying and behavioral expressions of one's moral values. That said, it's entirely possible that I'm overthinking this and that typical survey respondents will intuitively grasp the meaning you intend. I'll leave it to you to decide whether this issue warrants further discussion.
Beyond that, I would ask you to consider the following additional minor suggestions:
Consider providing the MGM scale anchors either in the "Translation of the MGM measure to English" section (p. 3) or in a note beneath Table 2. I think requiring people to find the scale anchors in the "Supplementary Materials.docx" file may prove inconvenient; some future researchers may take the items from Table 2 and make up their own anchors.
In discussing Item 4's smaller factor loading in Study 2, I don't think it's fair to say it had a "slightly lower factor loading in Study 2 compared with Study 1." A difference of -.73 vs. -.39 in standardized loadings is quite substantial. Additionally, I wonder if readers will know which item is #4, since you're referring to the second item in Table 2. This issue also pertains to other items mentioned by number throughout the document.
"... Table 1, when we recalculated indices after exclusion of the items, they all remained greater than .7." I assume you're referring to alpha, but the context suggests you're saying the model fit indices were greater than .7, which is confusing. (It would also be confusing to say each item had an alpha of .7, since alpha pertains to the collection of items.)
Best wishes,
Michael T. Warren
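As a hedged illustration of the last point above, that Cronbach's alpha describes a set of items rather than any single item, the following minimal sketch computes alpha for a whole item set and then recomputes it with each item left out in turn. The function names and the response data are hypothetical and are not taken from the manuscript; the formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    `items` is a list of per-item score lists; every inner list holds one
    score per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    # Total score per respondent across all items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed once per item, each time with that item removed."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical six-point Likert responses (4 items, 6 respondents).
responses = [
    [5, 4, 6, 2, 3, 5],
    [5, 5, 6, 1, 3, 4],
    [4, 4, 5, 2, 2, 5],
    [6, 4, 5, 2, 3, 4],
]
print("alpha for the full item set:", round(cronbach_alpha(responses), 3))
print("alpha with each item removed:",
      [round(a, 3) for a in alpha_if_item_deleted(responses)])
```

Each value returned by `alpha_if_item_deleted` is still an alpha for a collection (the remaining items), which is why a statement such as "they all remained greater than .7" is clearest when it names alpha explicitly.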

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

... contribution to the growing literature on growth mindset. The initial validity and reliability of this scale appears sound, and the inclusion of moral growth mindset (MGM) to the literature base seems evident. Despite the promise of this scale and the concept it captures, the article, as written, has some room for improvement. Most notably, the items themselves may benefit from further revision, more information is needed, especially on the various types of validity discussed in the article (e.g. convergent, divergent), and the writing could be clearer and more concise throughout.

Detailed responses:
Lit review
One piece missing here is an argument for the significance of this scale. Why is moral growth mindset an important concept to measure? How does this measure add to existing literature (and existing measures on growth mindset + morality?) Were items for this scale taken from related scales (growth mindset?)

Study 1
Elaborate on the "Implicit theory measure." Be sure to say 1-2 sentences about this to help us understand its significance to the current study.

Participants
How were participants compensated? Class credit? Gift card? No compensation? (Ok, later in the next paragraph you mention class credit; I would mention in the first paragraph that participants were offered class credit to participate.) To help clarify each portion of the study and make it easy for readers to find the information they wish, I would add a heading of "Procedure" that begins with the "Participants received a link to the Qualtrics survey…"

Analysis
What is "underlying data" and where is it located? From the final sentence of your analysis section.

Study 2
"In study 2, we tested THE correlation between…" (Add "the" to sentence.) You mention the SONA system and how participants selected studies here; make sure to also include this recruitment information in study 1. Additionally, here you include information about the order of survey scales and demographics, which is also not in study 1. In general, there is different information here than in study 1; I would align these participant sections to include the same relevant information. I would also, again, separate this into a clear "participants" section and a clear "procedure" section.
Additionally: How long did it take participants to complete surveys? Were any attention check items included?
There is a lot of information about previous studies coming up here that should also be in the introduction/literature review section. This would be great to include so we know what you are expecting before the study is run. Then, here in the discussion, you can tell us if your predictions came out as expected or not.
Paragraph 1 and 2 here are a little confusing. Be sure to start each paragraph with a general sentence that indicates what you found. Then, discuss each of your results that provides evidence for your statement, and conclude with a sentence that indicates to us the importance of these results. It feels like there are too many concepts included in each paragraph, and the writing is a bit confusing throughout.
Last paragraph: "However, there are limitationS" (missing s).

Items
These items all feel a bit too similar: each asks if morality can or cannot be "improved." Did you consider other items with different word choices? For example, even "You can always become a more moral person with better character." Or "It is possible to grow in your character and morality." These items are so similar that it feels hard to argue any differences between them. Additionally, if possible it's always best to include simpler words over more complicated ones. For instance "substantially" could easily be "a lot" and "considerably" could also be "a lot." More complex words tax the participants a bit more, and may make the sentences more difficult to digest, especially for those with lower reading levels or from less educated backgrounds.

General notes
There are a few places where the writing could be condensed or the use of alternate terms would improve ease of reading. For instance, an example of condensing would be: in participants, "Participants were recruited from an undergraduate subject pool. The pool consisted of students who were enrolled in educational psychology classes" could be condensed to "Participants were recruited from students enrolled in educational psychology classes." An instance where alternate terms would be beneficial is in Results: "First, all consistency indicators indicated…" Instead of using "indicator/indicated" here I would recommend "First, all consistency indicators revealed…" Or, in the spirit of condensing/being more specific, I might suggest: "First, the measure demonstrated at least acceptable reliability according to both Cronbach's alpha values and test-retest reliability." Look for these types of instances throughout the paper with an eye towards condensing repetitive language and becoming more specific.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
No competing interests were disclosed.

Competing Interests:
Reviewer Expertise: Developmental psychology, positive psychology, emerging adulthood, positive psychology interventions

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 06 May 2020
Hyemin Han, University of Alabama, Tuscaloosa, USA
Dear Dr. Mangan,
We sincerely appreciate your comments and suggestions on our manuscript. We found them very constructive and informative. While revising our manuscript, we have done our best to address the concerns that you mentioned in your review report. Please find our responses to your comments below. Thank you very much for your time and consideration.

Responses
1. One piece missing here is an argument for the significance of this scale. Why is moral growth mindset an important concept to measure? How does this measure add to existing literature (and existing measures on growth mindset + morality?) Were items for this scale taken from related scales (growth mindset?)
Response: We appreciate your comment regarding the explanation of why the measure is important. Also, we described how the items were developed. In the revised manuscript, we elaborated further details.

"The results suggested that among younger populations, MGM might increase participants' prosocial behavior due to the belief that it will make them morally better. Given this, MGM would be considered as a factor that contributes to moral development. In order to adequately examine how MGM contributes to moral development, however, it is necessary to have an appropriate measure. Additionally, if moral growth mindset motivates people to learn how to become more moral, as previous research suggests, then it is important for moral educators to have a tool to assess the malleability beliefs students have related to their morals. For example, if moral educators are able to identify that some students have a fixed mindset related to their morals, then an appropriate starting point may be to provide them with evidence that it is possible to improve moral character throughout one's life."
"Instead, the inventors (HH, KJD, and YJC) of the Korean MGM measure created its English version based on the structure of the Korean version and the wording in the Implicit Theory measure. In addition, the Implicit Theory measure was used due to the fact that it had six items and was based on Dweck's original measure of growth mindset for intelligence. As a result, the tested measure included six items as well (e.g., "No matter who you are, you can significantly improve your morals and character") and answers were anchored to a six-point Likert scale (see Extended data for the full measure 10)."
2. Elaborate on the "Implicit theory measure." Be sure to say 1-2 sentences about this to help us understand its significance to the current study.
Response: Thank you very much for your suggestion regarding the elaboration of the construct. We elaborated on this point in the revised manuscript: "Growth mindset refers to the belief that it is possible to improve one's abilities and qualities, such as intelligence or personality 1 . These individuals believe that this can be done through effort and learning, which helps fosters motivation. Higher motivation for those with a growth mindset is encouraged through having attitudes such as viewing hardships as a chance to work harder rather than an indication of failure, and striving for success due to genuinely wanting to learn instead of being concerned with how others view them 2"
3. How were participants compensated? Class credit? Gift card? No compensation? (Ok, later in the next paragraph you mention class credit; I would mention in the first paragraph that participants were offered class credit to participate.)
Response: Thanks a lot for your request for the clarification of the compensation. In the revised manuscript, this point is stated more clearly: "Participants were recruited from students enrolled in undergraduate educational psychology classes and they were provided with a course credit."
4. To help clarify each portion of the study and make it easy for readers to find the information they wish, I would add a heading of "Procedure" that begins with the "Participants received a link to the Qualtrics survey…"
Response: We sincerely appreciate your suggestion regarding the use of the "procedure" subsection. In the revised manuscript, we created the new subsection for a better structure.
5. What is "underlying data" and where is it located? From the final sentence of your analysis section.
Response: Thank you for your comment regarding "underlying data." "Underlying data" is a way to include supplementary materials in F1000Research. Readers can download the supplementary materials with the URL provided at the end of the main text. More specifically, a link to an open science repository is provided in the "data statement" section as per the journal guidelines.
6. "In study 2, we tested THE correlation between…" (Add "the" to sentence.) You mention the SONA system and how participants selected studies here; make sure to also include this recruitment information in study 1. Additionally, here you include information about the order of survey scales and demographics, which is also not in study 1. In general, there is different information here than in study 1; I would align these participant sections to include the same relevant information. I would also, again, separate this into a clear "participants" section and a clear "procedure" section.
Response: We appreciate your comments regarding the typo and the use of the independent subsection, "procedure." We addressed these issues in the revised manuscript.
7. Additionally: How long did it take participants to complete surveys? Were any attention check items included?
Response: Thanks a lot for your kind comment regarding the survey duration. Unfortunately, we could not include any attention check items in the survey form. We explained further details in the limitation section:

"Third, although participants spent about 33.98 minutes (median) to complete Study 2 we did not include any attention check items."
8. While these measures look great, we need to understand their inclusion. In the literature review portion of this article there should be discussion of each construct and why/how you expect it to relate to moral growth mindset. Why are these good choices for convergent validity?
Response: We sincerely appreciate your comment regarding the rationale for the inclusion of the additional measures. In the introduction section in Study 2, we added explanations regarding the point. In addition, while describing each additional measure, we explained the rationale for the inclusion as well as the hypothesized correlation with MGM.
9. You mention that further details are available in "extended data" but I don't see a way to access this. Is all the relevant information explaining the connection of these scales to moral growth mindset provided there?
Response: Thank you for your comment regarding the "extended data." As with the "underlying data," "extended data" can also be downloaded with the link provided at the end of the main text. In the revised manuscript, as we mentioned in our response to your comment above, we explained further details regarding each additional measure in the methods section in Study 2.
10. Additionally, did you test for divergent validity? Content validity? Construct validity? If not, why? Ok, some indication of a test for discriminant validity is presented in analysis, but this should be listed with the other measures above. Similarly, more information is needed here about why you expect this to be divergent from moral growth mindset.
Response: We appreciate your comment regarding the clarification of the tests that we conducted. In the revised manuscript, we explained further details. "Given that the Implicit Theory Measure measures one's general growth mindset, we expected that it would be positively correlated with MGM. However, because the construct measured by the Implicit Theory Measure is not domain specific, we also expected that the MGM would not completely overlap with this construct (discriminant validity). Given these, the effect size of the correlation coefficient would be medium to large (r = +.3 - +.5)."
11. 1st sentence: "We developed and tested….from youth participants." These participants were emerging adults, correct? Not youth?
Response: Thanks for your point regarding the correct use of the term. Yes, that is correct. So, we used "emerging adults" instead of "youth…" in the revised manuscript:
... 1 -.3)." In addition, in the discussion section, we elaborated the meaning of the negative correlation found from our analysis: "In addition, moral disengagement was negatively correlated with MGM. Since moral disengagement allows people to dismiss negative feelings, they may have about behaving immorally using the eight mechanisms previously mentioned, this increases the likelihood of continuing to behave immorally. In this way, moral disengagement and MGM have somewhat reverse trajectories. As hypothesized, this suggests that MGM may promote engaging in moral behavior. In addition, since moral internalization, which has been shown to inhibit moral disengagement 39, was also positively correlated with MGM, it makes sense that our measure was negatively correlated with moral disengagement. If somebody has a strong sense of their morals and these values are internalized, this may help them to stay engaged with their standards and furthermore, be motivated to continue to be morally better."
13. I'm not sure what you mean here in sentence two "In fact, the previous studies that developed and tested measurements for the mindset with diverse domains…" What is "the mindset"? Do you mean, that tested other types of domain-specific growth mindset?
Response: Thank you for your request for the clarification. Yes, that is correct. In the revised manuscript, we specified the nature of the mindset: "In fact, the previous studies that developed and tested measurements for diverse types of domain-specific growth mindset have shown that the measurements possessed good reliability and validity as well (e.g., 29, 30)."
14. There is a lot of information about previous studies coming up here that should also be in the introduction/literature review section. This would be great to include so we know what you are expecting before the study is run. Then, here in the discussion, you can tell us if your predictions came out as expected or not.
Response: We sincerely appreciate your suggestion regarding rewriting the introduction. We agree with you that some theoretical contents that were presented in the discussion section in the original manuscript could be moved on to the introduction section for a better structure. Following your suggestion, in the revised manuscript, we presented such contents in the general introduction or introduction of each study.
15. Paragraph 1 and 2 here are a little confusing. Be sure to start each paragraph with a general sentence that indicates what you found. Then, discuss each of your results that provides evidence for your statement, and conclude with a sentence that indicates to us the importance of these results. It feels like there are too many concepts included in each paragraph, and the writing is a bit confusing throughout.
Response: We appreciate your comment regarding the discussion section. As you suggested, in the revised manuscript, we slightly restructured the paragraphs in the discussion section. We revised each of the following paragraphs so that it discusses one specific point each time.
16. Last paragraph: "However, there are limitationS" (missing s).
Response: Thank you very much for your comment on the typo. We corrected the point in the revised manuscript.
17. These items all feel a bit too similar: each asks if morality can or cannot be "improved." Did you consider other items with different word choices? For example, even "You can always become a more moral person with better character." Or "It is possible to grow in your character and morality." These items are so similar that it feels hard to argue any differences between them. Additionally, if possible it's always best to include simpler words over more complicated ones. For instance "substantially" could easily be "a lot" and "considerably" could also be "a lot." More complex words tax the participants a bit more, and may make the sentences more difficult to digest, especially for those with lower reading levels or from less educated backgrounds.
Response: We appreciate your comments regarding the items used in our measure. Yes, we agree with you that some words used in the items are somewhat too complex to be easily understood by younger participants. So, we also think that such items may need to be modified if the measure is to be administered among younger populations. We also acknowledge that some words (e.g., "improve") were repeatedly used in multiple items. Since we intended to keep the consistency with the original measures that we referred to (e.g., Dweck's general growth mindset measure, the Korean version of the MGM measure), we ended up using such terms in our measure. We explained these points in the limitation section: "Fifth, the items used in the MGM measure could be revised particularly when being administered among younger populations. We decided to use the current wordings to maintain consistency with the Korean version of the MGM measure and the Implicit Theory Measure, which constituted the basis of our measure. However, to make the measure more applicable to younger populations, some complex words (e.g., "substantially," "considerably") could be replaced with simpler words (e.g., "a lot"). Finally, since several items in the measure might seem to be similar, the words could be revised in future studies, particularly those focusing on children or young adolescents."
18. There are a few places where the writing could be condensed or the use of alternate terms would improve ease of reading. For instance, an example of condensing would be: in participants, "Participants were recruited from an undergraduate subject pool. The pool consisted of students who were enrolled in educational psychology classes" could be condensed to "Participants were recruited from students enrolled in educational psychology classes." An instance where alternate terms would be beneficial is in Results: "First, all consistency indicators indicated…" Instead of using "indicator/indicated" here I would recommend "First, all consistency indicators revealed…" Or, in the spirit of condensing/being more specific, I might suggest: "First, the measure demonstrated at least acceptable reliability according to both Cronbach's alpha values and test-retest reliability." Look for these types of instances throughout the paper with an eye towards condensing repetitive language and becoming more specific.
Response: We sincerely appreciate your suggestions regarding the brevity of the manuscript. We agree with you that increasing the brevity is essential to enable potential readers to better understand the overall theme of our manuscript while saving their time. Thus, we edited the whole manuscript during the current revision process. In addition, we revised the manuscript to minimize the repeated use of the same words in multiple places.
Competing Interests: Not available.

I see three major issues with the paper in its current form, and I would encourage the authors to revise their manuscript in light of my suggestions.
First, my biggest concern is the use of phrases such as "improve one's morals" and "improve your morals." The former occurs in the paper's conceptual framing (p. 3) and the latter appears in each of the MGM scale's four items (p. 4). The issue with these phrases is that improving one's morals seems to deviate conceptually from improving one's morality. One might improve their morals by setting new (or higher) moral standards for themselves, yet they may fail miserably in living up to their moral values. By contrast, improving one's morality involves actually becoming a better person, and this, I believe, is the construct the authors intended to measure. Since in my view the items miss the target to some extent, I have indicated that the work is only "partly" technically sound. Unfortunately, I don't think much can be done about this issue at this point, but at a minimum I would recommend that the authors either provide an argument for the use of "moral" rather than "morality" in their scale, or identify this as a limitation of their scale. In addition, they might choose to argue that this concern is assuaged by the scale's strong evidence of convergent validity with other measures of morality.
Second, I think the CFA results need to be communicated more fully, and that is why I have indicated that the statistical analyses and their interpretations are only "partly" appropriate. On p. 4, I certainly understand consulting previous studies (e.g., Han et al., 2018), but data from the current (i.e., English) study should be given primary importance in refining the English scale. I recommend reporting the factor loadings from the original CFA (i.e., before Items 1 and 2 were removed), so readers can evaluate whether removing these items was justified on empirical grounds. On a related note, I think it would be appropriate to acknowledge the small factor loading (-.39) for the reverse-scored item in Study 2.
Third, I think the introduction to Study 2 (p. 5) should be expanded considerably. It would be very helpful to provide a brief rationale for why the selected constructs were chosen for convergent and discriminant validity testing. In addition, it would be helpful to specify hypotheses concerning the strength and direction of the associations between MGM and other constructs (i.e., with which constructs does MGM have strongest and weakest theoretical ties?), and why. The discussion currently states that the observed associations were "as hypothesized," but no hypotheses were specified in the lead-up to Study 2. I also found myself wondering why Chiu et al. (1997)'s original 3-item English MGM measure was not included for convergent and incremental validity testing.
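To make the size of the -.39 loading mentioned in the second point concrete, the sketch below converts the two standardized loadings quoted elsewhere in these reports (-.73 in Study 1 and -.39 in Study 2) into the share of item variance accounted for by the factor. It assumes a standardized single-factor solution with no cross-loadings, in which that share equals the squared loading; the function name is hypothetical.

```python
def variance_explained(std_loading):
    """Share of an item's variance explained by the factor.

    Assumes a standardized single-factor model without cross-loadings,
    where this share is the squared standardized loading.
    """
    return std_loading ** 2


# Loadings for the reverse-scored item as quoted in the review reports.
for study, loading in (("Study 1", -0.73), ("Study 2", -0.39)):
    share = variance_explained(loading)
    print(f"{study}: loading {loading:+.2f}, variance explained {share:.2f}")
```

Under that assumption the factor accounts for roughly half of the item's variance in Study 1 but well under a fifth in Study 2, which is why the loading merits explicit acknowledgement.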

Minor comments:
Page 3: My understanding is that growth mindset generally concerns one's beliefs about the malleability of one's own (and others') qualities. Thus, it seems a little bit too generic to define growth mindset as believing "it is possible to improve aspects of one's life." There are aspects of a person's life (e.g., what kind of work they do; where they live, etc.) that are not qualities of their personhood. I suggest the authors consider revising their opening definition of growth mindset.
Page 3: It's not clear to me how allowing participants to define "moral" and "character" necessarily allows them to do so "without bias." Instead, I think it would be more accurate to say that the approach taken leaves it up to participants to interpret "moral" and "character" according to their own subjective understandings of those terms. (Note that this approach makes no claim that participants' understandings are "without bias.")
Pages 3 and 5: I suggest the authors change the "Participants" heading to "Participants and procedures."
Page 4: I suggest confirming that three IRB approvals were needed for just two studies.
Pages 4-5: I suggest referring to model fit and reliability indices as either "indices" or "indexes," rather than "indicators." Given that CFA was involved, readers may assume "indicators" refers to measured variables loading onto latent factors.
Page 5 (last paragraph of Study 1): I would like to suggest an alternative explanation as to why Items 1 and 2 (presumably) had lower factor loadings. These two were the only items to convey morality/character as dispositional (e.g., "You have a certain morality and character..."; "Your morality and character are something about you..."). By contrast, all items measured malleability beliefs, including the retained reverse-scored item ("To be honest, you can't really improve your morals and character."). My understanding is that a growth mindset is anchored in malleability beliefs, and having a growth mindset does not preclude the belief in moral dispositions (e.g., with effort I can become a more consistently/dispositionally honest person). In other words, perhaps the reason why Items 1 and 2 presumably had lower factor loadings was because they strayed somewhat from the core of the growth mindset construct (i.e., malleability beliefs), rather than because they used the vague qualifier, "much." Just some food for thought.
Page 5 (Participants section): Much of the first two paragraphs in this section is redundant with the procedures described in Study 1. The authors may wish to simply state that the same recruitment procedures were used as described in Study 1.
Page 5: I would strongly urge the authors to omit the term, "marginally correlated" in relation to MGM's association with the bDIT. Once a threshold for statistical significance has been set (e.g., .05), a finding is either statistically significant or non-significant. Correlations with p-values between .05 and .10 are non-significant.
Page 6: I think more explanation is needed as to why testing measurement invariance would be helpful. For example, the authors might say that examining measurement invariance across diverse groups of people (e.g., political conservatives vs. liberals; young adults vs. older adults) would help evaluate whether the scale, which leaves it up to participants to interpret morality and character in the item stems, in fact measures the same thing for groups who may use different underlying folk conceptions of morality.
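To illustrate the earlier comment on "marginally correlated," the following minimal sketch applies a fixed significance threshold to a correlation. The correlation value is hypothetical and is not the MGM-bDIT result; the sample size matches Study 2 (N = 275). The p-value uses the Fisher z normal approximation rather than the exact t-test, so it is an approximation chosen to keep the example dependency-free, and the function names are assumptions.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_value(r, n):
    """Two-sided p-value for a Pearson correlation (Fisher z approximation)."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def verdict(p, alpha=0.05):
    """Binary decision once the threshold alpha has been fixed in advance."""
    return "significant" if p < alpha else "non-significant"


# Hypothetical correlation, chosen so that p falls between .05 and .10.
p = approx_p_value(r=0.11, n=275)
print(f"p = {p:.3f}: {verdict(p)}")
```

With the threshold fixed at .05, a p-value between .05 and .10 is reported simply as non-significant; there is no third "marginal" category in the decision rule.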

the chance for interventions since if people want to become a better person (improve their morality) they may need to believe that their values (morals) can be improved."
2. Second, I think the CFA results need to be communicated more fully, and that is why I have indicated that the statistical analyses and their interpretations are only "partly" appropriate. On p. 4, I certainly understand consulting previous studies (e.g., Han et al., 2018), but data from the current (i.e., English) study should be given primary importance in refining the English scale. I recommend reporting the factor loadings from the original CFA (i.e., before Items 1 and 2 were removed), so readers can evaluate whether removing these items was justified on empirical grounds. On a related note, I think it would be appropriate to acknowledge the small factor loading (-.39) for the reverse-scored item in Study 2.
Response: We appreciate your comment regarding how to report results from CFA. Following your suggestion, we added a supplementary table that demonstrates the factor loadings in 6-item and 5-item models. As you can see, Item 1 and Item 2 showed the lowest standardized factor loadings in the 6-item and 5-item models, respectively. In the revised manuscript, we mentioned the point that they were excluded from the measure due to their lowest standardized factor loadings. Moreover, we acknowledge the slightly low factor loading in Study 2: "However, it should be acknowledged that Item 4 showed a slightly lower factor loading in Study 2 compared with Study 1, although the overall model fit indices were excellent. This point might need to be tested in future studies with more samples."
3. Third, I think the introduction to Study 2 (p. 5) should be expanded considerably. It would be very helpful to provide a brief rationale for why the selected constructs were chosen for convergent and discriminant validity testing. In addition, it would be helpful to specify hypotheses concerning the strength and direction of the associations between MGM and other constructs (i.e., with which constructs does MGM have strongest and weakest theoretical ties?), and why. The discussion currently states that the observed associations were "as hypothesized," but no hypotheses were specified in the lead-up to Study 2. I also found myself wondering why Chiu et al. (1997)'s original 3-item English MGM measure was not included for convergent and incremental validity testing.
Response: Thanks a lot for your suggestion regarding the expansion of the Study 2 introduction. Following your suggestion, in the revised manuscript, we elaborated, with citations, the rationale for how the additional constructs used in our study were selected. Furthermore, in the methods section, we explained the direction and effect size of the hypothesized correlation for each additional measure. We agree with you that employing Chiu et al.'s original measure in the present study would be beneficial. However, we did not consider doing so because our measure was originally based on Dweck's updated six-item measure for general growth mindset. In the limitation section, we acknowledged your point for readers' information.
"Fourth, we did not employ Chiu et al.'s (1997)  Page 3: My understanding is that growth mindset generally concerns one's beliefs about the 4. malleability of one's own (and others') qualities. Thus, it seems a little bit too generic to define growth mindset as believing "it is possible to improve aspects of one's life." There are aspects of a person's life (e.g., what kind of work they do; where they live, etc.) that are not qualities of their personhood. I suggest the authors consider revising their opening definition of growth mindset.
Response: We appreciate your suggestion regarding the introduction. We revised the introduction for a better definition of the growth mindset: "Growth mindset refers to the belief that it is possible to improve one's abilities and qualities, such as intelligence or personality 1. These individuals believe that this can be done through effort and learning, which helps foster motivation. Higher motivation for those with a growth mindset is encouraged through having attitudes such as viewing hardships as a chance to work harder rather than an indication of failure, and striving for success due to genuinely wanting to learn instead of being concerned with how others view them 2."
5. Page 3: It's not clear to me how allowing participants to define "moral" and "character" necessarily allows them to do so "without bias." Instead, I think it would be more accurate to say that the approach taken leaves it up to participants to interpret "moral" and "character" according to their own subjective understandings of those terms. (Note that this approach makes no claim that participants' understandings are "without bias.")
Response: We appreciate your comment regarding the use of the terms in our study. In the revised manuscript, we updated our explanation regarding the terms as per your comment: "Thus, we used "morals and characters" in order for participants to be able to define the terms based on their own experiences and understanding."
6. Pages 3 and 5: I suggest the authors change the "Participants" heading to "Participants and procedures."
Response: Thanks a lot for your suggestion regarding the subsection. In the revised manuscript, following your and Dr. Mangan's suggestions, we moved contents regarding the study procedures to a new subsection, "procedures."
7. Page 4: I suggest confirming that three IRB approvals were needed for just two studies.
Response: Thank you for your comment regarding the IRB numbers. In the revised manuscript, we clearly stated which IRB protocols are relevant to which specific study.
8. Pages 4-5: I suggest referring to model fit and reliability indices as either "indices" or "indexes," rather than "indicators." Given that CFA was involved, readers may assume "indicators" refers to measured variables loading onto latent factors.
Response: We appreciate your comment regarding the use of the term. In the revised manuscript, as per your comment, we used "indices" in lieu of "indicators" while addressing CFA.
9. Page 5 (last paragraph of Study 1): I would like to suggest an alternative explanation as to why Items 1 and 2 (presumably) had lower factor loadings. These two were the only items to convey morality/character as dispositional (e.g., "You have a certain morality and character..."; "Your morality and character are something about you..."). By contrast, all items measured malleability beliefs, including the retained reverse-scored item ("To be honest, you can't really improve your morals and character."). My understanding is that a growth mindset is anchored in malleability beliefs, and having a growth mindset does not preclude the belief in moral dispositions (e.g., with effort I can become a more consistently/dispositionally honest person). In other words, perhaps the reason why Items 1 and 2 presumably had lower factor loadings was because they strayed somewhat from the core of the growth mindset construct (i.e., malleability beliefs), rather than because they used the vague qualifier, "much." Just some food for thought.
Response: Thanks a lot for the alternative explanation of the lower factor loadings of items 1 and 2. We added such an alternative explanation in the revised manuscript for readers' information: "In addition, as another possibility, items 1 and 2 are more likely about entity beliefs, not malleability beliefs that constitute the basis of growth mindset. These items contain some words perhaps related to entity beliefs (e.g., "certain morals and characters...," "something about you…"), so they might not directly measure the core of the growth mindset construct and showed lower factor loadings compared to the other items."
10. Page 5 (Participants section): Much of the first two paragraphs in this section is redundant with the procedures described in Study 1. The authors may wish to simply state that the same recruitment procedures were used as described in Study 1.
Response: Thank you very much for your suggestion for the brevity of our manuscript. We shortened the redundant part in Study 2 as per your suggestion.
11. Page 5: I would strongly urge the authors to omit the term, "marginally correlated" in relation to MGM's association with the bDIT. Once a threshold for statistical significance has been set (e.g., .05), a finding is either statistically significant or non-significant. Correlations with p-values between .05 and .10 are non-significant.
Response: Thanks a lot for your comment about the use of the term, "marginally correlated." We agree with you that the use of the term is somewhat inappropriate, so in the revised manuscript, we changed the part about interpreting the finding from correlation analysis: "The effect size of the correlation coefficient between MGM and bDIT was small as predicted, but the correlation was non-significant (p = .08)."
12. Page 6 (Table 3): I suggest indicating where Cronbach alphas are reported (i.e., on the diagonal).
Response: We appreciate your suggestion. We added a brief description about where alphas were reported: "Cronbach αs are also reported (on the diagonal)."
13. Page 6: More information on the potential utility of the MGM measure for understanding moral development would be a nice selling point for the scale. For example, this scale makes it possible to test whether MGM moderates the efficacy of moral education and social emotional learning interventions. The scale would also be an important outcome measure in examining how to nurture MGM (e.g., through process praise, teaching about neuroplasticity, etc.).
Response: Thanks a lot for your suggestion about the elaboration of the potential utility of the measure. In the introduction, we briefly mentioned how the measure could be used in moral education: "Additionally, if moral growth mindset motivates people to learn how to become more moral, as previous research suggests, then it is important for moral educators to have a tool to assess the malleability beliefs students have related to their morals. For example, if moral educators are able to identify that some students have a fixed mindset related to their morals, then an appropriate starting point may be to provide them with evidence that it is possible to improve moral character throughout one's life."
We appreciate your comment regarding the participation rate. In the revised manuscript, we clearly described that all participation was voluntary and that all participants who appropriately signed up for our study completed the survey.
Only the participants who voluntarily signed up for Study 1 were provided with the link. We created our Qualtrics survey so that only the participants who answered all survey questions were able to complete the survey and receive credit. Thus, there was no missing data in the present study. Afterwards, we sent them invitations to participate in the survey again one week later.
2. Missing data. Even when missing data was not an issue, it would be a good idea to state the % of missing values in the text. I assume that the online survey did not force the participants to respond to each item. If yes, that is okay, but even in that case you could state the (0)% of missing values explicitly.
Thank you very much for your comment regarding the missing data. As we responded to your prior comment, in the revised manuscript, we explicitly mentioned that there was no missing data.
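As a minimal sketch of how the percentage of missing values requested above could be checked, assuming the survey export is a participants-by-items table in which unanswered cells are coded as `None`. The data below and the function name `percent_missing` are hypothetical illustrations, not the authors' analysis code.

```python
from typing import Optional, Sequence


def percent_missing(responses: Sequence[Sequence[Optional[int]]]) -> float:
    """Return the percentage of missing (None) cells in a participants-by-items table."""
    total = sum(len(row) for row in responses)
    if total == 0:
        raise ValueError("no responses supplied")
    missing = sum(1 for row in responses for value in row if value is None)
    return 100.0 * missing / total


# Hypothetical data: 3 participants x 5 items, every item answered,
# as would be expected when the survey requires a response to each item.
complete = [[4, 5, 3, 4, 2], [6, 5, 5, 4, 1], [3, 3, 4, 2, 5]]
print(f"{percent_missing(complete):.1f}% missing")
```

With a forced-response survey such as the one described, this check would be expected to report 0%, which is the figure the reviewer asks to have stated explicitly.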
3. Limitations. Was the course/the pool related to moral development, social psychology or related issues? If yes, the comment about the limited generalizability should be elaborated a bit in this context as the sample might over-represent people interested in character development, human relations etc.
We appreciate your comment regarding the nature of the pools. All participants were taking general psychology and educational psychology classes. Although some class contents were related to human development in general, the classes did not focus on moral and social development. In the revised manuscript, we explained the nature of the pools briefly.
Participants were recruited from an undergraduate subject pool. The pool consisted of students who were enrolled in introductory psychology and educational psychology classes.
4. Discussion. I still find pieces of the discussion unelaborated. Specifically, as a reader I would like to see there the novel findings of the study interpreted in the context of the existing knowledge with a few citations of the existing literature.
Thanks a lot for your suggestion regarding the elaboration of the discussion section. In the revised manuscript, we elaborated the section based on prior studies about the development and validation of growth mindset measures and those about the relationship between growth mindset and positive youth development.
Our results from both studies suggest that the English version of the MGM measure can well measure one's MGM as we intended. In fact, the previous studies that developed and tested measurements for the mindset with diverse domains have shown that the measurements possessed good reliability and validity. Hence, our study that tested and validated the MGM measure demonstrated that first, MGM can be well measured by the MGM measure as growth mindset in general was measured by reliable and valid tools in previous studies; and second, MGM is associated with moral and positive youth development as shown in previous growth mindset studies in other domains.
Competing Interests: I am the corresponding author of this paper.