The Norwegian public’s ability to assess treatment claims: results of a cross-sectional study of critical health literacy

Background: Few studies have evaluated the ability of the general public to assess the trustworthiness of claims about the effects of healthcare. For the most part, those studies have used self-reported measures of critical health literacy. Methods: We mailed 4500 invitations to Norwegian adults. Respondents were randomly assigned to one of four online questionnaires that included multiple-choice questions that test understanding of Key Concepts people need to understand to assess healthcare claims. They also included questions about intended behaviours and self-efficacy. One of the four questionnaires was identical to one previously used in two randomised trials of educational interventions in Uganda, facilitating comparisons to Ugandan children, parents, and teachers. We adjusted the results using demographic data to reflect the population. Results: A total of 771 people responded. The adjusted proportion of Norwegian adults who answered correctly was > 50% for 17 of the 30 Key Concepts. On the other hand, less than half answered correctly for 13 concepts. The results for Norwegian adults were better than the results for Ugandan children in the intervention arm of the trial and for Ugandan parents, and similar to those of Ugandan teachers in the intervention arm of the trial. Based on self-report, most Norwegians are likely to find out the basis of treatment claims, but few consider it easy to assess whether claims are based on research and to assess the trustworthiness of research. Conclusions: Norwegian adults do not understand many concepts that are essential for assessing healthcare claims and making informed choices. Future interventions should be tailored to address Key Concepts for which there appears to be a lack of understanding.


Introduction
Enabling people to make informed decisions about healthcare by improving their ability to think critically about healthcare claims is an important public health initiative 1. Health literacy has been defined in many ways. Commonly this includes functional, interactional, and critical health literacy skills 2. Health literacy research efforts in Norway and elsewhere have mainly been directed at providing reliable health information and education in functional literacy, focusing on improving understanding of medical terminology and self-management of health conditions 3. Very little research has targeted critical health literacy, focused on enabling members of the public to make informed healthcare choices [4][5][6].
Training in critical thinking about treatment decisions includes courses in evidence-based practice and education in basic research methodology. Such training is usually directed at health professionals. Despite the importance of enabling patients and the public to make informed decisions about treatment choices, such training has rarely been offered to them [4][5][6][7][8].
Over the past decade, there has been increasing interest in enabling patients and the public to think critically about healthcare 4,5,[9][10][11][12][13][14]. For example, in Norway three websites have been developed that aim to empower patients and the public to assess the trustworthiness of health claims [15][16][17]. Critical thinking has also been the focus of popular television shows that target common claims about treatment effects and illustrate how such claims can be tested using rigorous study designs. In collaboration with the Norwegian national television network (NRK), we carried out randomised trials with the aim of educating the public about the need for fair comparison of treatments [18][19][20][21][22]. More recently, we have sought to improve critical thinking in school children through the Informed Health Choices (IHC) project, an international collaboration of researchers in Uganda, Rwanda, Kenya, the UK, and Norway. We developed an educational intervention to teach primary school children to assess healthcare claims and an educational podcast for their parents. Both interventions were shown to be effective in randomized trials in Uganda 23,24.
Few studies have evaluated the ability of the general public to assess the trustworthiness of claims about the effects of healthcare and to make informed health choices 6. Most such studies have relied on self-assessment of critical health literacy skills. One such project is the HLS-EU survey, which mapped health literacy skills in several European countries 25.

The Claim Evaluation Tools item bank
The Claim Evaluation Tools item bank was originally developed in English for use in the two randomised trials described above 23,24. Since the item bank was first created, items have been translated and validated in several settings and languages, including Uganda (in Luganda and English), Mexico, and China [26][27][28][29][30].
Our starting point for developing the educational interventions and the Claim Evaluation Tools item bank was a list of Key Concepts that people need to understand and apply to assess claims about treatment effects and make informed health choices (hereafter referred to as Key Concepts) 31 (Table 1). We define "treatment" broadly to include any intervention (action) intended to improve health, including preventive, therapeutic and rehabilitative interventions, and public health or health system interventions. The list of concepts provides a framework for researchers, teachers and others to develop interventions and measure people's ability to assess treatment claims and make informed choices. We review and amend the list yearly 9.
The item bank is revised according to the Key Concepts list yearly. When we conducted this study, it included four multiple-choice questions (MCQs) for 32 of the Key Concepts shown in Table 1. The item bank also includes questions that assess intended behaviours and self-efficacy. The item bank is an open-access resource. Teachers and researchers can select items for questionnaires tailored to their specific needs.
To our knowledge only one survey has attempted to measure the ability to understand and apply any of the IHC Key Concepts in a representative sample of Norwegian adults 32. That study only addressed four of the Key Concepts. The purpose of this study was to map the ability of Norwegian adults to assess treatment claims and make informed health choices, using MCQs from the Claim Evaluation Tools item bank. The findings can be used to inform the development of learning resources and communication of information to patients and the public, and for international comparisons. We also wanted to take the opportunity to compare the findings of this study with the results of our previous Ugandan study. Despite any differences that may exist between these two contexts, it is our experience from previous projects that the challenges associated with assessing claims in everyday life in Uganda and Norway are very similar. However, knowledge of specific Key Concepts and overall understanding may differ. This study will consequently provide us with comparative information on understanding of the concepts in Uganda and Norway, and will also explore any differences in understanding for specific Key Concepts. Furthermore, the study in Uganda included children and thus offered us the opportunity to compare our results from the adult population in Norway with those of children in Uganda.

Objective
To map the ability of Norwegian adults to assess treatment claims and make informed choices, and to compare these results to the findings of two studies in Uganda 23,24 .

Amendments from Version 1
We have revised the manuscript and considered all suggested feedback carefully.
The methods section has been revised to improve the description of the content and development of the questionnaires. Furthermore, the wording of the abstract and the results section has been revised to improve clarity. Typos were corrected where noted.
Any further responses from the reviewers can be found at the end of the article.

Table 1. The Informed Health Choices Key Concepts for assessing claims about treatment effects and making well informed treatment choices.

Claims
Claims about effects that are not supported by evidence from fair comparisons are not necessarily wrong, but there is an insufficient basis for believing them.

Comparisons
Studies should make fair comparisons, designed to minimize the risk of systematic errors (biases) and random errors (the play of chance).

Choices
What to do depends on judgements about a problem, the relevance of the evidence available, and the balance of expected benefits, harms, and costs.

g) The results of one study considered in isolation can be misleading.
h) Widely used treatments or those that have been used for decades are not necessarily beneficial or safe.
i) Treatments that are new or technologically impressive may not be better than available alternatives.
j) Increasing the amount of a treatment does not necessarily increase its benefits and may cause harm.
k) Earlier detection of 'disease' is not necessarily better.
l) It is rarely possible to know in advance who will benefit, who will not, and who will be harmed by using a treatment.

f) Outcomes should be assessed using methods that have been shown to be reliable.
g) It is important to assess outcomes in all (or nearly all) the people or subjects in a study.
h) People's outcomes should be counted in the group to which they were allocated.

f) Confidence intervals should be reported for estimates of effects.
g) Deeming results to be "statistically significant" or "nonsignificant" can be misleading.
h) Lack of evidence of a difference is not the same as evidence of "no difference".

Evidence should be relevant.
a) Attention should focus on all important effects of treatments, and not surrogate outcomes.
b) There should not be important differences between the people in studies and those to whom the evidence will be applied.
c) The treatments compared should be similar to those of interest.
d) There should not be important differences between the circumstances in which the treatments were compared and those of interest.

Ethical statement
This study was considered for ethical approval by the Norwegian Institute of Public Health. The project was considered not to require a full ethical review (reference 18/11854), because no sensitive data were included. All information was collected through Nettskjema (a web-based survey system), ensuring a high level of data security and safety.
The study population was drawn from the National Registry, which provided us with a CD-ROM with respondents' addresses. This CD was destroyed once data collection was terminated.
All respondents were given information about the purpose of the study and how the results would be managed and presented. To gain access to the questionnaire, each participant had to provide written consent electronically. The questionnaire was anonymous and once submitted, the information could not be traced back to the respondent.

Development of the questionnaires
Translation and adaptation of the item bank to Norwegian.
The Claim Evaluation Tools item bank was originally developed in English for use in Uganda. For this study, a language specialist (KFO) translated all items to Norwegian. Three researchers trained in evidence-based medicine and epidemiology reviewed this translation (AD, ADO, and Atle Fretheim). The questions were not modified other than changing some of the names of people in the scenarios to names that are more familiar in Norwegian. For example, "Dr. Acheng" was changed to "Dr. Anker". The original English version was developed using plain language and this was also an important concern in the Norwegian translation.

Description of questions included in the item bank.
All MCQs in the Claim Evaluation Tools item bank include a scenario with a treatment claim, and response options, with one answer being the "best" (correct) and the remaining options considered "worse" (incorrect). An example of a multiple-choice question is shown in Box 1.

Box 1. Example of a multiple-choice question
Judith wants smoother skin. The younger girls in her school have smoother skin than the older girls. Judith thinks this is because the younger girls use cream on their skin to make the skin smoother.
Question: Based on this link between using cream and smooth skin, is Judith correct?

A) It is not possible to say. It depends on how many younger and older girls there are
B) It is not possible to say. There might be other differences between the younger and older girls
C) Yes, because the younger girls use cream on their skin and they have smoother skin
D) No, Judith should try using the cream herself to see if it works for her

In addition to knowledge questions, the item bank also includes two sets of questions scored on Likert scales evaluating intended behaviour and self-efficacy associated with assessing claims.

Design of the questionnaires.
For this study we decided to include two MCQs for all of the 32 Key Concepts. We also included the question sets for intended behaviour and self-efficacy. Considering the large number of questions, we decided to split the questions across four questionnaires (see Additional file 1 and Additional file 5, Extended data) 33,34.
We considered four of the 32 Key Concepts to be amongst the most important Key Concepts that can be understood and used by children, as well as adults. A set of 8 MCQs addressing these four Key Concepts was therefore included in all four questionnaires:
• An outcome may be associated with a treatment but not caused by it.
• The results of one study considered in isolation can be misleading.
• Widely used treatments or those that have been used for decades are not necessarily beneficial or safe.
• Comparison groups should be as similar as possible.
In order to compare the findings of the Norwegian study with the Uganda study, the first questionnaire was designed so that it was identical to the questionnaire used in Uganda and included 27 questions 23,24. The second questionnaire included the remaining items that address concepts in the first (Claims) category as well as items from the second (Comparisons) category (24 questions) (Table 1). The third questionnaire included items from the third (Choices) category. This questionnaire also included three items evaluating intended behaviours regarding assessing treatment claims and agreeing to participate in a study evaluating treatments, and four items regarding self-efficacy (21 questions in total). The fourth questionnaire included MCQs addressing Key Concepts that we judged to be more difficult, mostly belonging to the Comparisons category (25 questions).
All four questionnaires also included the same set of questions regarding the participant's sex, level of education, health professional background, training in research methods, and prior participation in research.

Recruitment of participants and administration of the questionnaires
Based on evidence from two systematic reviews that evaluated strategies to improve participation in research 35,36, we developed attractive postcards, personally addressed to each potential participant, inviting people to take part in the study. The postcard is shown in Figure 1. The postcard included a short description of the purpose of the study as well as a URL ("link") to one of the four online questionnaires. Participants received full information about the study when they accessed the website and were asked to provide informed consent. Evidence also suggests greater participation rates can be expected if people are informed how the research may benefit them. This was stated on the postcard and the website provided information about how the study results can be accessed.
In January and February 2019, we mailed postcards to a representative sample of 4500 adults (≥18 years) living in Norway. The sample was provided to us by the Norwegian National Registry, considering level of education, sex and geographical spread 37. The questionnaires were administered electronically using Nettskjema, a service provided by the University of Oslo 38. In addition to the first postcard, one reminder postcard was sent out to each person.

Rasch analysis
The questionnaires were evaluated for their psychometric properties as part of this study. Although some MCQs were tested in Norway as part of a previous pilot, this is the first full-scale psychometric evaluation of the complete available battery of the Claim Evaluation Tools item bank 26,27.
Rasch analysis is a dynamic and practical approach to address important measurement issues required for validating an outcome [39][40][41]. In this study we followed the fundamental steps of Rasch analysis including testing for internal construct validity (multidimensionality), invariance of the items (item-person interaction), and item bias (differential item functioning), as well as testing for reliability 40,41. Rasch analysis can be used for both dichotomous and polytomous data 40,42,43. Raw data were exported from the electronic data collection service (Nettskjema) as Excel files and entered into RUMM2030 for Rasch analysis. Results can be replicated using the open-access software WINSTEPS 44. Each questionnaire set was evaluated separately. Questionnaire 3 also included polytomous questions assessing intended behaviour and self-efficacy. Therefore, for Questionnaire 3 specifically, additional analysis was done for each item block separately (the MCQ item block and the intended behaviour and self-efficacy item block, respectively).
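For readers unfamiliar with the model underlying these checks, the dichotomous Rasch model expresses the probability of a correct answer as a logistic function of the difference between a person's ability and an item's difficulty. The following is a minimal illustrative sketch in Python, not the RUMM2030 analysis used in this study; the function and parameter names are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are on the same logit scale. The names are
    illustrative and do not come from RUMM2030 or WINSTEPS.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability equals item difficulty, the probability of success is one half.
p_equal = rasch_probability(0.0, 0.0)
# A more able person has a higher probability on the same item.
p_higher = rasch_probability(1.0, 0.0)
```

Fit statistics, tests of differential item functioning, and threshold ordering all compare observed responses with the probabilities this model implies.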

Survey analysis
Each MCQ was scored as correct or incorrect. The Key Concepts for which there were two MCQs were scored as "understood" if a participant answered both MCQs correctly. For questions about intended behaviours and self-efficacy, we dichotomised the responses in the analysis, for example as likely (very likely or likely) vs not likely (very unlikely, unlikely, or don't know). As previously mentioned, the Key Concepts are reviewed and revised annually. Over the duration of this study, two Key Concepts were revised into the new Key Concept 1.1a. Because the MCQs no longer appropriately covered this concept, they were taken out of the analysis.
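The scoring rules above can be sketched as follows. This is an illustrative Python sketch only (the study analysis was done in R); the function names and the exact response labels are assumptions.

```python
from typing import Sequence

# Response labels counted as "likely"; the exact wording is an assumption.
LIKELY_RESPONSES = {"very likely", "likely"}


def concept_understood(mcq_correct: Sequence[bool]) -> bool:
    """A Key Concept is scored as understood only if every MCQ for it is correct."""
    return len(mcq_correct) > 0 and all(mcq_correct)


def is_likely(response: str) -> bool:
    """Dichotomise an intended-behaviour response: likely vs not likely.

    "very unlikely", "unlikely" and "don't know" all count as not likely.
    """
    return response.strip().lower() in LIKELY_RESPONSES
```

For example, a participant who answers one of two MCQs for a concept correctly is scored as not understanding that concept, which makes the measure stricter than the proportion of single items answered correctly.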
Based on a previous survey we conducted 32, we anticipated that women and respondents with higher education levels would be more likely to respond. To address such non-random non-response, we used iterative post-stratification to match marginal distributions of sex and educational attainment level of the sample to the Norwegian population 45. We also adjusted for region of residence. Due to an error, we did not collect data on participants' age and could not use it for post-stratification, as planned.
We obtained Eurostat data on the marginal distributions of sex and region of residence for all Norwegians, and educational attainment level for Norwegians aged 15 to 64 years 46,47 .
Participants reported their county of residence. We mapped counties to the corresponding Nomenclature of Territorial Units for Statistics (NUTS 2) regions of Norway used by Eurostat 48.
Participants reported the highest level of education they attained. We mapped these to the corresponding International Standard Classification of Education (ISCED) 2011 categories used by Eurostat (levels 0-2, 3-4, and 5-8). We used multiple imputation with chained equations to account for the uncertainty introduced by missing values of the post-stratification variables sex, region of residence, and educational level 49, and iteratively post-stratified each imputed data set. Based on these adjustments, we estimated the percentage of the Norwegian population that understands each Key Concept, and that responded positively for each question about intended behaviours and self-efficacy. We used the R packages tidyverse, mice, mitools, and survey.
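The idea behind iterative post-stratification (often called raking) can be illustrated with a short sketch. The Python code below is not the analysis code used in this study, which relied on the R packages named above; it omits multiple imputation and variance estimation, and the sample and population shares in it are hypothetical.

```python
from collections import defaultdict


def rake(respondents, targets, n_iter=100):
    """Rescale weights until weighted margins match the population shares.

    respondents: list of dicts, e.g. {"sex": "F", "education": "high"}
    targets: {variable: {category: population share}}; shares sum to 1
    Returns one weight per respondent; the weights sum to 1.
    """
    weights = [1.0] * len(respondents)
    for _ in range(n_iter):
        for variable, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for w, person in zip(weights, respondents):
                totals[person[variable]] += w
            # Scale weights so category totals equal the target shares.
            weights = [
                w * shares[person[variable]] / totals[person[variable]]
                for w, person in zip(weights, respondents)
            ]
    return weights


def weighted_share(respondents, weights, variable, category):
    """Weighted proportion of respondents in one category."""
    hit = sum(w for w, p in zip(weights, respondents) if p[variable] == category)
    return hit / sum(weights)


# Hypothetical sample in which women and highly educated people are over-represented.
sample = (
    [{"sex": "F", "education": "high"}] * 6
    + [{"sex": "F", "education": "low"}] * 2
    + [{"sex": "M", "education": "high"}] * 3
    + [{"sex": "M", "education": "low"}] * 1
)
# Hypothetical population margins, not Eurostat figures.
targets = {"sex": {"F": 0.5, "M": 0.5}, "education": {"high": 0.4, "low": 0.6}}
weights = rake(sample, targets)
```

Only the marginal distributions are matched, so the joint distribution of sex and education in the population does not need to be known.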
We present summaries of the results for participants and the post-stratified population estimates for understanding of the 30 Key Concepts, and the intended behaviours and self-efficacy questions. We quantified the uncertainty of our estimates using 95% confidence intervals and protected the family-wise coverage probability of the confidence intervals for the four Key Concepts included in all questionnaires via Bonferroni correction (i.e., we report a 98.75% CI for each of those concepts). For each Key Concept, we calculated the probability of answering both questions (or the one question for two Key Concepts) correctly if participants randomly guessed the answer. These probabilities vary between 6% and 25%, depending on the number of MCQs and the number of response options (between two and four) for each MCQ.
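The arithmetic behind the 98.75% level and the guessing probabilities can be made explicit. This is a small illustrative Python sketch; the function names are hypothetical.

```python
from math import prod


def bonferroni_ci_level(family_level: float, n_intervals: int) -> float:
    """Per-interval confidence level that protects the family-wise level."""
    alpha = 1.0 - family_level
    return 1.0 - alpha / n_intervals


def guess_probability(options_per_mcq):
    """Probability of answering every MCQ for a concept correctly by guessing.

    options_per_mcq lists the number of response options of each MCQ.
    """
    return prod(1.0 / k for k in options_per_mcq)


# Four intervals at a 95% family-wise level: each is reported at about 98.75%.
level = bonferroni_ci_level(0.95, 4)
# Two MCQs with four options each: 1/4 * 1/4 = 6.25%.
two_items = guess_probability([4, 4])
# A single MCQ with four options: 25%.
one_item = guess_probability([4])
```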
We compared Norwegian and Ugandan adults' understanding of the Key Concepts to that of Ugandan children in the intervention arm of our randomized trial. We compared mean test scores, and probabilities of achieving passing and mastery scores, using estimates from one year after the IHC primary school intervention for the Ugandan children and their teachers 50,51 and estimates from the control group for parents in the IHC podcast trial, one year after parents in the intervention group listened to the podcast 51,52. We used predetermined thresholds of at least 13 of 24 questions answered correctly for a passing score and at least 20 of 24 for mastery.
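The predetermined thresholds translate directly into a classification rule, sketched here in Python for illustration; the constant and function names are hypothetical.

```python
N_QUESTIONS = 24
PASSING_THRESHOLD = 13  # at least 13 of 24 correct
MASTERY_THRESHOLD = 20  # at least 20 of 24 correct


def classify_score(n_correct: int) -> dict:
    """Return the mean score and whether the passing and mastery thresholds are met."""
    if not 0 <= n_correct <= N_QUESTIONS:
        raise ValueError("n_correct must be between 0 and 24")
    return {
        "mean_score": n_correct / N_QUESTIONS,
        "passing": n_correct >= PASSING_THRESHOLD,
        "mastery": n_correct >= MASTERY_THRESHOLD,
    }
```

A participant with 13 correct answers passes without reaching mastery, whereas anyone reaching mastery necessarily also passes.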
We estimated mean scores and odds ratios using generalized linear mixed-effects models (GLMMs; normal errors and identity link for mean scores, binomial errors and logit link for passing and mastery) using the lme4 R package. In the trials, children and teachers were randomized in clusters (schools), while parents were individually randomized. We modelled this clustering structure as a random intercept for each randomized unit. We did not adjust for covariates such as those used for stratified sampling in the Ugandan trials because those variables are not defined for all samples (e.g., school ownership was used in the trial on Ugandan children, but there is no analogous concept for Norwegian adults). No data were missing. It is not possible to use the lme4 package to apply post-stratification weights for the Norwegians.

Exploratory analyses
We conducted exploratory analyses to investigate associations between understanding each Key Concept and the demographic covariates sex, research training, research participation, education level, and health professional background. Based on findings from our previous survey 32, we hypothesised that better understanding of the Key Concepts would be associated with having a research background or a higher education level; that there would be little difference between health professionals and others; and that there would be little difference between women and men. We used generalized linear models (GLMs; quasibinomial errors and logit link) as before, modelled the covariates as categorical variables, and used multiple imputation and post-stratification as in the main analysis.
We used data from the first questionnaire to perform exploratory analyses to investigate how Norwegians' mean scores and achievement of passing and mastery scores are associated with the demographic covariates sex, research training, research participation, education, and health professional background. We used GLMs (normal errors and identity link for mean score; quasibinomial errors and logit link for passing and mastery) to model the outcomes in terms of the covariates, which were modelled as before. Multiple imputation and post-stratification were used as in the main analysis. The variables included in imputation were the post-stratification variables (sex, region of residence, and educational attainment); demographic variables that coded for whether participants had research training or a health professional background, and whether they had been a research participant; and mean score. We did not include passing and mastery in imputation, as they can be calculated from the mean score.

Sample size calculation
We performed three power analyses. The first two analysed power to estimate differences, of at least 5% from random guessing, in the proportion of the Norwegian population that understands the concepts. The first analysis focused on the four Key Concepts that would be included on all four questionnaires while protecting the 95% coverage of the four confidence intervals as a family, and the second analysis focused on a single prototypical Key Concept that would be included on only one questionnaire. We assumed that each concept would be probed by two MCQs, each having four options, such that we would expect that 6.25% of Norwegians would appear to "understand" a concept if all participants guessed at random. We further calculated confidence interval widths, aiming to estimate proportions with precision no worse than ±0.05 (i.e., ±5%) for those four Key Concepts and no worse than ±0.08 (i.e., ±8%) for the other Key Concepts.
The third analysis estimated power to detect a difference of at least 10% in the mean score (proportion of correct answers) between Norwegians and Ugandan children, parents, and teachers. Finally, drawing on experience with our previous survey, we explored a range of response rate scenarios.
These analyses suggested that sending a total of 4500 postcards would be sufficient to achieve a margin of error no greater than ±5% and ±8% for the first two analyses, and approximately 97% power to detect differences greater than 10% in the mean score compared to results from the Ugandan trials.
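To give a sense of how a target margin of error relates to the number of respondents, the sketch below uses the normal-approximation (Wald) interval for a proportion with the conservative assumption p = 0.5. This is an assumption made for illustration and may differ from the method used in the power analyses reported here; it also ignores post-stratification weights and non-response. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def half_width(p: float, n: int, level: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return z * sqrt(p * (1.0 - p) / n)


def respondents_needed(margin: float, level: float = 0.95, p: float = 0.5) -> int:
    """Smallest n for which the half-width does not exceed the margin."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


# Concepts on all four questionnaires: ±0.05 at the Bonferroni-adjusted 98.75% level.
n_common = respondents_needed(0.05, level=0.9875)
# Concepts on a single questionnaire: ±0.08 at the 95% level.
n_single = respondents_needed(0.08, level=0.95)
```

Under these simplifying assumptions, the number of postcards to send follows from dividing the required number of respondents by the anticipated response rate.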

Participant characteristics
Table 2 shows participant characteristics across the four questionnaires 53. Of the 771 respondents, 21 (2.7%) did not provide data on at least one of the covariates sex, research training, research participation, education, health professional background, and county of residence.

The validity and reliability of the questionnaires
The Rasch analysis is described in more detail in Additional file 2 (Extended data) 54. Overall, the four questionnaires showed acceptable fit to the Rasch model. Three out of four questionnaires were found to be unidimensional. For the questionnaire that included intended behaviour and self-efficacy items (Questionnaire 3), the analysis suggested that this sub-scale may include more than one dimension. Separate analyses of the two item sets in Questionnaire 3 (Key Concept MCQs and intended behaviour and self-efficacy items) suggest that the MCQ sub-test works very well, with no apparent validity issues. Overall, the analysis of the sub-test consisting of intended behaviour and self-efficacy items also shows satisfactory fit. However, two of the intended behaviour items show disordered thresholds. We did not observe any important local dependency in Questionnaires 2, 3 and 4. However, in Questionnaire 1, twelve correlations were observed. Two questionnaires were underpowered for this analysis (Questionnaires 1 and 4). However, based on the available data, most MCQs in all questionnaires showed acceptable fit to the ICC-curve, and no important differential item functioning by gender was identified in any of the questionnaires. Five MCQs showed evidence of under-discrimination, with one ability class deviating: two MCQs in Questionnaire 1 (q11 and q19.1), two MCQs in Questionnaire 2 (q3 and q18.3), and one MCQ in Questionnaire 4. In the two questionnaires that were adequately powered, Cronbach's alpha was 0.6 and 0.7, respectively.

Norwegians' understanding of the Key Concepts
The mean width of the confidence intervals for the four concepts common to all questionnaires was ±4.6% (range ±3.6% to ±5.9%). One concept (An outcome may be associated with a treatment but not caused by it) was not estimated to the desired precision. The mean width of the other confidence intervals was ±7.92% (range ±3.37% to ±11.9%; 15 of the 30 concepts had confidence intervals wider than ±8%).
Figure 2 shows estimates of the percentage of Norwegian adults who understand each Key Concept. The adjusted proportion of Norwegian adults who answered correctly was > 80% for five of the Key Concepts and > 50% for 17 concepts. On the other hand, less than half answered correctly for 13 concepts, and for seven of those concepts, the proportions of correct answers were no better than if people had randomly guessed.

Intended behaviours and self-efficacy
Intended behaviours and self-efficacy are presented in Figure 3.

Comparison of Norwegian and Ugandan adults to Ugandan children
The results for Norwegian and Ugandan adults are compared to results for Ugandan children in Table 3 to Table 5. The mean score for Norwegian adults who participated in the survey (Questionnaire 1) was 17% higher (95% CI 14-20%) than the mean score for Ugandan children in the intervention arm of the randomised trial of the IHC primary school resources one year after the intervention and similar to the mean score for Ugandan teachers in the intervention arm of the trial (Table 3) 23.
The scores for Ugandan teachers in the control group were similar to those of the children in the intervention group, and the scores of Ugandan parents in the control group of the IHC podcast trial 24 were 16% lower (95% CI -19 to -13%).
Nearly all the Norwegian adults (209 out of 210) who participated in the survey and Ugandan teachers in the intervention group (77 out of 78) had a passing score (Table 4). This was 15% more than the Ugandan children in the intervention group. However, because only one Norwegian and one Ugandan teacher in the intervention group did not have a passing score, the estimated differences may be unreliable for those comparisons. The proportion of Ugandan teachers in the control group with a passing score (88%) was similar to that of the children (84%), and 47% fewer Ugandan parents in the control group had a passing score (95% CI -56% to -37%).
Two thirds (66%; 95% CI 56-74%) of the Norwegian adults who participated in the survey and 70% (95% CI 59-80%) of the Ugandan teachers in the intervention group had a mastery score (Table 5). Compared to Ugandan children in the intervention group, the odds ratio for achieving a mastery score was 5.3 (95% CI 3.5-7.9) for the Norwegian participants, whereas it was 0.24 (95% CI 0.15-0.4) for the Ugandan parents.

Associations between participant characteristics and their responses
We did not find strong associations between gender, health professional background, or having participated in research and how well participants did (Table 6). Having 1-2 years of tertiary school education (ISCED levels 5-8) was associated with a mean score that was 11% higher (95% CI 4.0-19%), and a higher likelihood of having a mastery score (OR 3.5; 95% CI 1 to 12). Norwegian men were more likely than women to state that they are likely to find out the basis of treatment claims (OR 3.0; 95% CI 1.1 to 8.2), and more likely to find it easy to assess the trustworthiness of the results of studies that compare treatments (OR 3.8; 95% CI 1 to 14). People who have participated in research may be more likely to find it easy to find research based on studies that compare treatments (OR 5.7; 95% CI 1.9 to 17). People with secondary school education (ISCED levels 3-4) may be less likely than people with no more than primary school education (ISCED levels 0-2) to find it easy to assess the trustworthiness of results of studies that compare treatments (OR 0.057; 95% CI 0.0049 to 0.68). Besides these associations, we did not find evidence suggesting that gender, education level, health professional background, or prior participation in research were associated with intended behaviours or self-efficacy. Associations between participant characteristics and their understanding of specific Key Concepts, intended behaviour and self-efficacy are reported in Additional file 3 (Extended data) 55.

Discussion
According to Statistics Norway, 24.7% of Norwegian adults have a primary school education, 41.7% secondary school education, 22.4% tertiary school education, 7.3% have a master's degree, and 0.7% of the population have a PhD. As we anticipated, participants in our study had a somewhat higher educational level than the general population 56. In Norway, approximately 18% of the population (aged 15-74) have an educational background in health or welfare 57. This is comparable to participants in this study.
Participants in our survey had a good understanding of the Key Concepts that were addressed in our randomised trial of the IHC primary school intervention 23. Their understanding was comparable to that of Ugandan teachers in the intervention arm of our randomised trial of a primary school intervention, and better than that of Ugandan children in the intervention arm of that trial and of teachers in the control group. It was also better than that of parents in the control group of our randomised trial of an educational podcast for parents of primary school children in Uganda 24.
We estimate that more than 80% of Norwegian adults understand these five concepts:
• Increasing the amount of a treatment does not necessarily increase its benefits and may cause harm.
• Competing interests may result in misleading claims.
• Personal experiences or anecdotes alone are an unreliable basis for most claims.
• The people being compared should be cared for similarly apart from the treatments being studied.
• Weigh the benefits and savings against the harms and costs of acting or not.
On the other hand, Norwegian adults appear to do no better than if they were to randomly guess the answers to questions about these seven Key Concepts:
• Beliefs alone about how treatments work are not reliable predictors of the presence or size of effects.
• Widely used treatments or those that have been used for decades are not necessarily beneficial or safe.
• Comparison groups should be as similar as possible.
• People's outcomes should be counted in the group to which they were allocated.
• Results for a selected group of people within a study can be misleading.
• Deeming results to be "statistically significant" or "nonsignificant" can be misleading.
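To illustrate what "no better than random guessing" means for a multiple-choice question, the sketch below checks whether the chance level lies inside a simple normal-approximation 95% confidence interval for the proportion answering correctly. The number of answer options and the respondent counts are hypothetical, the function names are illustrative, and the calculation ignores the survey weighting used in the study, so it is not the authors' analysis.

```python
import math


def chance_level(n_options):
    """Expected proportion correct if every respondent guesses at random."""
    return 1.0 / n_options


def consistent_with_guessing(n_correct, n_respondents, n_options, z=1.96):
    """True if the chance level falls inside an unweighted 95% CI.

    Uses the normal approximation p +/- z * sqrt(p * (1 - p) / n).
    """
    p_hat = n_correct / n_respondents
    se = math.sqrt(p_hat * (1 - p_hat) / n_respondents)
    return p_hat - z * se <= chance_level(n_options) <= p_hat + z * se


if __name__ == "__main__":
    # Hypothetical counts for a question with four answer options
    print(consistent_with_guessing(50, 190, 4))   # about 26% correct
    print(consistent_with_guessing(160, 190, 4))  # about 84% correct
```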
Considering that people who responded to our survey had a somewhat higher educational level than the general population, the ability of our respondents may be higher than in the general population 56 .
Based on self-report, most Norwegians are likely to find out what the basis of a treatment claim is and to find out if a treatment claim is based on research. However, they do not consider it easy to assess the relevance of a claim, whether it is based on research, or its trustworthiness. They also do not consider it easy to find relevant studies.

Comparison with other studies
In the previous Norwegian study, four Key Concepts were evaluated 32. In both studies the majority responded correctly to the question "Weighing the benefits and harms". However, in the present study, the respondents were less likely to respond correctly to "Relative effects of treatments can be misleading" (29% vs 64%) and "The use of p-values may be misleading" (19% vs 51.5%). In contrast, more responded correctly to "An outcome may be associated with a treatment but not caused by it" (64% vs 30%). These differences are difficult to interpret, other than to note that the two evaluations used different question sets. One explanation may therefore be a difference in the difficulty of the questions. The gender distribution in the two studies was identical; however, a higher percentage in the present study had at least one year of education beyond secondary school (68% vs 52%). We did not find gender or having a health professional background to be a good predictor of participants' understanding of the Key Concepts in either study. This is consistent with other studies and findings that both health professionals and patients feel challenged finding, appraising, and applying relevant evidence for use in health decisions [58][59][60][61][62][63]. There are few other studies to which we can compare these results 6. The European health literacy survey included all domains of health literacy, responses were given as self-assessments, and results are reported as overall scores 25. Consequently, comparison with our study is difficult. The authors of that survey concluded that a little more than one tenth (12%) of those surveyed had insufficient health literacy and almost one half (47%) had limited (insufficient or problematic) health literacy. Both our survey and the European health literacy survey examined associations with education and, unsurprisingly, found that people with higher education have higher health literacy skills. However, the European health literacy survey reported that this result varies by country.

Strengths and limitations
The strength of this study is that we provide new evidence on people's ability to assess treatment claims using multiple-choice questions. To our knowledge, few such objective surveys have been conducted in Europe. Our work complements studies in which participants assessed their own abilities.
Many conceptualisations of critical thinking exist. Not all are based on explicit criteria, and unlike the Key Concepts used in this survey, few if any are subject to annual revision 64. The questionnaires we administered were based on a framework that has been developed from methodological literature and input from multiple disciplines and people with methodological expertise 65.
We validated the questionnaires using robust methods. Although the results of the Rasch analysis are promising (Additional file 2, Extended data) 54, they also suggest potential for improvement. Across all four questionnaires, we found that only five MCQs warranted improvement. However, Rasch analysis of two of the questionnaires was underpowered, so the validity and reliability of these should be assessed again in future studies.
Our analysis of the questionnaire including intended behaviours and self-efficacy suggests that intended behaviour and self-efficacy items measure two different dimensions. This might be because intended behaviour is more complex and dependent on self-efficacy, values, knowledge, and other factors. Except for Questionnaire 1, we did not identify any important dependencies between items.
We did not collect information on age for this study, and thus the association between age and ability to assess treatment claims should be explored in future studies. In our previous study conducted in Norway, younger respondents had a higher proportion of correct responses and higher total scores. This may suggest that the ability of both health professionals and patients to assess treatment claims has improved over time 32.
Another possible limitation of our study is the low response rate, which we anticipated. Our sample was similar to the general population in terms of the percentage of health professionals, but people with a higher education were over-represented among respondents. We addressed this issue using prespecified post-stratification in the survey analyses but did not address it in the comparisons to Ugandans. The results of those comparisons therefore cannot be assumed to apply to the general population of Norwegian adults.
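The idea behind post-stratification can be sketched as follows. Each stratum's result is weighted by its share of the population rather than its share of the sample, so over-represented groups count for less. This is a minimal illustration on a single variable with made-up numbers. The strata, shares, counts, and function names are all hypothetical, and the study's own analysis used several post-stratification variables together with survey software.

```python
def poststratification_weights(population_share, sample_counts):
    """Weight per respondent in each stratum: population share / sample share."""
    n_total = sum(sample_counts.values())
    return {
        stratum: population_share[stratum] / (count / n_total)
        for stratum, count in sample_counts.items()
    }


def poststratified_proportion(correct_counts, sample_counts, population_share):
    """Population-weighted proportion of correct answers across strata."""
    return sum(
        population_share[s] * correct_counts[s] / sample_counts[s]
        for s in sample_counts
    )


if __name__ == "__main__":
    # Hypothetical figures: higher education is over-represented in the sample
    population_share = {"primary": 0.25, "secondary": 0.40, "tertiary": 0.35}
    sample_counts = {"primary": 20, "secondary": 60, "tertiary": 120}
    correct_counts = {"primary": 8, "secondary": 30, "tertiary": 84}

    unadjusted = sum(correct_counts.values()) / sum(sample_counts.values())
    adjusted = poststratified_proportion(
        correct_counts, sample_counts, population_share
    )
    print(poststratification_weights(population_share, sample_counts))
    print(f"unadjusted {unadjusted:.3f}, adjusted {adjusted:.3f}")
```

In this made-up example the adjusted proportion is lower than the unadjusted one, because the better-performing tertiary-educated group is weighted down to its population share.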

Implications and future research
Health professionals and others who communicate health information should be aware that patients may not be able to think critically about treatment claims and may therefore struggle to process information necessary for making informed decisions.
Our Rasch analysis suggests that the MCQs can be used in Norway. Most MCQs performed well, and this evaluation is the first step in developing a calibrated item bank that can be used for Computer Assisted Testing. The results of our Rasch analysis can be used to improve the multiple-choice questions that did not perform well.
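For readers unfamiliar with Rasch analysis, the sketch below shows the standard dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between a person's ability and an item's difficulty, both on a logit scale. The formula is not given in this section and the function name and example values are hypothetical. The sketch does not reproduce the authors' fitting or item-fit procedures.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model.

    P = exp(ability - difficulty) / (1 + exp(ability - difficulty))
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # Hypothetical item difficulties (logits) for one person of ability 0.5
    for difficulty in (-1.0, 0.5, 2.0):
        p = rasch_probability(0.5, difficulty)
        print(f"difficulty {difficulty:+.1f}: P(correct) = {p:.2f}")
```

A calibrated item bank stores an estimated difficulty for each question, which is what allows computer assisted testing to select questions matched to a respondent's estimated ability.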

Conclusion
Norwegian adults' understanding of Key Concepts for assessing treatment claims and making choices varies from over 80% who understand five Key Concepts to 20% or less who understand seven Key Concepts. We did not find strong evidence that gender, being a health professional, or having participated in research are associated with the ability to assess treatment claims, intended behaviours, or self-efficacy. We did find evidence that people with higher education have a better understanding of Key Concepts.
Understanding the need for systematic reviews of fair comparisons as the basis for trustworthy treatment claims has the potential to reduce waste and harm from trusting and acting on misleading claims, and from not trusting and acting on reliable claims. Future interventions should be tailored to address these essential Key Concepts as well as other Key Concepts for which there appears to be a lack of understanding.

This project contains the following underlying data:

Title:
The title of your paper could be amended to more accurately reflect your study. The use of the word 'claims' in a health care setting can very easily be confused with health insurance claims! This will then need to follow through your entire manuscript. Perhaps consider using the word 'statement' rather than 'claims' so your title might be something like the following: The Norwegian public's ability to assess statements about health treatments: …

Abstract:
For the above reason the Background section of the Abstract needs to be rephrased.

Research Design:
Regarding the research design, the rationale for comparing Uganda with Norway is not clearly explained to the reader. Why are Uganda and Norway comparable? A justification is necessary.

Methods and Analysis:
While a rationale can be understood for a comparison of adults in one country with parents and teachers in another country (all are adults), the rationale for comparing adults in one country with primary school children in another country remains a puzzle. This is so especially because the quizzes collect data on education level attained and so can be controlled for. Again, this needs to be explained more clearly to the reader.

○ The structure of the Methods section is difficult for the reader to follow. For example, the Methods section begins by explaining through Table 1 the three 'Health Choices Key Concepts' of 'Claims, Comparisons and Choices' and then moves on to the four 'concepts' immediately following Box 1. The use of the identical term 'concept' to describe both of the above is very confusing to the reader. It is not clear how the three 'Health Choices Key Concepts' link or map to these four 'concepts'. Furthermore, the label of 'concepts' does not seem to be accurate for the four 'concepts'. Are these concepts? Perhaps 'contentions' is a more accurate label. This entire section needs to be redrafted.
It is not clear from the manuscript if the quizzes went through a plain English audit process. This should be stated.

○ For an English journal publication, it might be best to move the English version of the quizzes to be Additional File 1, rather than the Norwegian version.

Conclusions:
The conclusion stated in the Abstract 'This can result in poorly informed decisions, underuse of effective interventions, and overuse of ineffective or harmful interventions' does not align with the conclusions at the end of the manuscript. Indeed it is difficult to see how these conclusions set out in the Abstract can be made from this particular study.

○ Two typing errors were noted:
1. Ethical Statement, second sentence
2. Page 13 should read 'including a statement where people were guaranteed…'
Thank you for a very interesting piece of research and I wish you every success enhancing your paper.

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Health Literacy, Value measurement in health care
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
included at least four multiple-choice questions for 32 Key Concepts. From this item pool four questionnaires were designed and randomly assigned to the participants. 771 out of 4500 invited Norwegian adults responded. Rasch analyses were performed to evaluate the psychometric properties, the percentage of the Norwegian population that understands each Key Concept was calculated, the results were compared to the results of a previous study in Uganda, and analyses were conducted to explore associations between understanding of the Key Concepts and demographic characteristics. The results are reported in detail. The authors discussed the strengths and limitations of their work and provided implications for practice and future research.
I have only a few comments to optimize the manuscript:
○ The description of the results in the abstract and on page 7 last paragraph is confusing. Maybe there is a better way of providing the results than using "…more than half understood x..." and "…less than half…"?
○ Page 6 in the paragraph "Rasch analysis": There is a repetition of the sentence "Rasch analysis can be used for both dichotomous and polytomous data."
○ The authors used the STROBE checklist. Looking at reporting guidelines for online surveys (e.g. CHERRIES, Eysenbach 2004), some aspects are not entirely addressed, such as the number of items in each questionnaire, number of screens / pages, completion rate (besides covariates) or the handling of incomplete questionnaires.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: The reviewer is involved in the translation and validation of the IHC Claim evaluation tool for its use in German speaking countries.
Reviewer Expertise: evidence-based health information, informed decision-making, trainings in evidence-based medicine
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Author Response 14 Jul 2021

Astrid Dahlgren
Thank you for considering our paper and for this useful feedback.
We have revised the manuscript and considered all suggested revisions carefully.
Our revisions and comments are provided below:

The description of the results in the abstract and on page 7 last paragraph is confusing. Maybe there is a better way of providing the results than using "…more than half understood x..." and "…less than half…"?
Our response: these sections have now been revised accordingly to improve clarity.

Page 6 in the paragraph "Rasch analysis": There is a repetition of the sentence "Rasch analysis can be used for both dichotomous and polytomous data."
Our response: typo is corrected.

The authors used the STROBE checklist. Looking at reporting guidelines for online surveys (e.g. CHERRIES, Eysenbach 2004), some aspects are not entirely addressed, such as the number of items in each questionnaire, number of screens / pages, completion rate (besides covariates) or the handling of incomplete questionnaires.
Our response: a description of the number of items per questionnaire has now been added to the description of the questionnaires in the methods section. Responses were mandatory in the software we used and thus all questionnaires were complete (thus it was not necessary to handle any incomplete questionnaires). For missing values on specific questions, we used multiple imputation with chained equations to account for the uncertainty introduced by missing values of the post-stratification variables sex, region of residence, and educational level. This is stated under the subheading "Survey analysis".

1.1 It should not be assumed that treatments are safe or effective, or that they are not.
a) Treatments can cause harms as well as benefits.
b) Large, dramatic effects are rare.
c) It is rarely possible to be certain about the effects of treatments.

1.2 Seemingly logical assumptions are not a sufficient basis for claims.
a) Treatment may not be needed.
b) Beliefs alone about how treatments work are not reliable predictors of the presence or size of effects.
c) Assumptions that fair comparisons are not relevant can be misleading.
d) An outcome may be associated with a treatment but not caused by it.
e) More data is not necessarily better data.
f) Identifying effects of treatments depends on making comparisons.

1.3 Trust in a source alone is not a sufficient basis for believing a claim.
a) Your existing beliefs may be wrong.
b) Competing interests may result in misleading claims.
c) Personal experiences or anecdotes alone are an unreliable basis for most claims.
d) Opinions alone are not a reliable basis for claims.
e) Peer review and publication by a journal do not guarantee that comparisons have been fair.

2.1 Comparisons of treatments should be fair.
a) Comparison groups should be as similar as possible.
b) Indirect comparisons of treatments across different studies can be misleading.
c) The people being compared should be cared for similarly apart from the treatments being studied.
d) If possible, people should not know which of the treatments being compared they are receiving.
e) Outcomes should be assessed in the same way in all the groups being compared.

[…] need to be reliable.
a) Reviews of studies comparing treatments should use systematic methods.
b) Failure to consider unpublished results of fair comparisons may result in estimates of effects that are misleading.
c) Treatment claims based on models may be sensitive to underlying assumptions.

2.3 Descriptions should clearly reflect the size of effects and the risk of being misled by the play of chance.
a) Verbal descriptions of the size of effects alone can be misleading.
b) Relative effects of treatments alone can be misleading.
c) Average differences between treatments can be misleading.
d) Small studies may be misleading.
e) Results for a selected group of people within a study can be misleading.

3.1 Problems and options should be clear.
a) Be clear about what the problem or goal is and what the options are.

3.3 Expected advantages should outweigh expected disadvantages.
a) Weigh the benefits and savings against the harms and costs of acting or not.
b) Consider the baseline risk or the severity of the symptoms when estimating the size of expected effects.
c) Consider how important each advantage and disadvantage is when weighing the pros and cons.
d) Consider how certain you can be about each advantage and disadvantage.
e) Important uncertainties about the effects of treatments should be reduced by further fair comparisons.

Figure 1. The postcard used for recruitment.

Figure 2. Estimates of the percentage of Norwegian adults who understand each Key Concept.

Figure 3. The Norwegian population's intended behaviours and self-efficacy.

Table 6. Associations between demographic covariates and Norwegians' mean scores and achievement of passing and mastery scores.
The results of this study can inform the development and evaluation of educational interventions that address Key Concepts that Norwegians appear to poorly understand. Up to now, few such interventions have been evaluated. There is a need to evaluate interventions for health professionals as well as for the general public to help ensure that they can think critically about treatment claims and choices. The results can also inform the development and evaluation of strategies for improving communication of information about the effects of treatments by researchers, health professionals, and others. Studies like this one in other countries would help to map similarities and differences in people's abilities across different countries and settings. Such information could help to determine the extent to which interventions should be tailored to address different Key Concepts for different populations.

Is the work clearly and accurately presented and does it cite the current literature? No
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

© 2021 Doyle G. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
https://doi.org/10.5256/f1000research.24149.r84745
College of Business and Geary Institute for Public Policy, University College Dublin, Dublin, Ireland
Thank you for a very innovative research study around critical health literacy competencies and a comparative study across two countries. The educational intervention at primary school level and the podcast for parents are novel. Below are some suggestions that may help to improve your manuscript: