<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.132052.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Critical thinking about treatment effects in Eastern Africa: development and Rasch analysis of an assessment tool</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Dahlgren</surname>
                        <given-names>Astrid</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6377-3321</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Semakula</surname>
                        <given-names>Daniel</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0806-213X</uri>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Chesire</surname>
                        <given-names>Faith</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Mugisha</surname>
                        <given-names>Michael</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Nakyejwe</surname>
                        <given-names>Esther</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Nsangi</surname>
                        <given-names>Allen</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8702-9217</uri>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Nyirazinyoye</surname>
                        <given-names>Laetitia</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ochieng</surname>
                        <given-names>Marlyn A.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0455-8517</uri>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Oxman</surname>
                        <given-names>Andrew David</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ssenyonga</surname>
                        <given-names>Ronald</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Simbi</surname>
                        <given-names>Clarisse Marie Claudine</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Norwegian Institute of Public Health, Oslo, Norway</aff>
                <aff id="a2">
                    <label>2</label>Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway</aff>
                <aff id="a3">
                    <label>3</label>Department of Medicine, College of Health Sciences, Makerere University, Kampala, Uganda</aff>
                <aff id="a4">
                    <label>4</label>Tropical Institute of Community Health and Development, Kisumu, Kenya</aff>
                <aff id="a5">
                    <label>5</label>School of Public Health, College of Medicine and Health Sciences, University of Rwanda, Kigali, Rwanda</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:astridad@gmail.com">astridad@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>7</month>
                <year>2023</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2023</year>
            </pub-date>
            <volume>12</volume>
            <elocation-id>887</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>23</day>
                    <month>3</month>
                    <year>2023</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Dahlgren A et al.</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/12-887/pdf"/>
            <abstract>
                <p>
                    <bold>Background:</bold> Every day we are faced with different treatment claims, in the news, in social media, and by our family and friends. Some of these claims are true, but many are unsubstantiated. Acting on claims that are not supported by reliable evidence can lead to waste and harmful health choices. The Informed Health Choices (IHC) Network facilitates development of interventions for teaching children and adults to assess treatment claims (informedhealthchoices.org). Our objective was to develop and evaluate a new assessment tool, built from the Claim Evaluation Tools item bank, for use in an upcoming trial of lower secondary school resources in Uganda, Kenya, and Rwanda.</p>
                <p>
                    <bold>Methods:</bold> We conducted a cross-sectional study evaluating a questionnaire comprising two item-sets. The first evaluated ability using multiple-choice questions (scored dichotomously) and the second evaluated intended behaviour and self-efficacy (measured using Likert scales). The study was conducted in Uganda, Kenya, and Rwanda in 2021. We recruited children (over 12 years old) and adults through schools and our networks. We entered 1,671 responses into our analysis. Summary and individual fit to the Rasch model (including Cronbach&#x2019;s alpha) were assessed using the RUMM2030 software.</p>
                <p>
                    <bold>Results</bold>: Both item-sets were found to have good fit to the Rasch model and were acceptable to our target audience. The reliability was good (Cronbach&#x2019;s alpha &gt;0.7). Observations of the individual item and person fit provided us with guidance on how we could improve the design, scoring, and administration of the two item-sets. There was no local dependency in either of the item-sets, and both item-sets were found to have acceptable unidimensionality.</p>
                <p>
                    <bold>Conclusion</bold>: To our knowledge, this is the first instrument validated for measuring the ability to assess treatment claims in Uganda, Kenya, and Rwanda. Overall, the two item-sets were found to have satisfactory measurement properties.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>health literacy</kwd>
                <kwd>Rasch analysis</kwd>
                <kwd>critical thinking</kwd>
                <kwd>informed choice</kwd>
                <kwd>evidence-based practice</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>The Research Council of Norway</funding-source>
                    <award-id>Projectnumber284683</award-id>
                    <award-id>grantnumber69006</award-id>
                </award-group>
                <funding-statement>This study was funded by the Research Council of Norway (Project number 284683, grant number 69006).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>Every day we are faced with different treatment claims, in the news, in social media, and by our family and friends. Some of these claims are true, but many are unsubstantiated.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> Acting on claims that are not supported by reliable evidence can lead to waste and harmful health choices.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> Thus, improving people&#x2019;s ability to assess whether treatment claims are based on reliable evidence may lead to better health outcomes. The spread of misinformation during the COVID-19 pandemic has further emphasized the importance of promoting critical thinking and science literacy as a public health initiative.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup>
            </p>
            <p>The Informed Health Choices (IHC) Network facilitates development of interventions for teaching children and adults to assess treatment claims (
                <ext-link ext-link-type="uri" xlink:href="http://informedhealthchoices.org">informedhealthchoices.org</ext-link>). We have developed a list of Key Concepts that people need to know to be able to assess claims about treatment effects.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> By &#x2018;treatment&#x2019; we refer to any intervention (action) intended to improve health, including preventive, therapeutic, and rehabilitative interventions, and public health or health system interventions. In two recent randomized trials in Uganda, we found that primary school children and their parents could be taught to apply these concepts.
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> Currently we are preparing for a new trial in Kenya, Rwanda, and Uganda to evaluate a set of educational resources for lower secondary schools (the IHC secondary school resources).</p>
            <p>The Claim Evaluation Tools item bank was first developed for use in the above-mentioned trials in Uganda, to evaluate learning outcomes in primary school children and their parents.
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> We also developed the item bank so that it could be used as a flexible resource for teachers and researchers, enabling them to design their own instrument for their own purposes.
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> The item bank can be used for creating tests in schools (including higher education) and for research purposes in, for example, surveys and randomized trials.</p>
            <p>Since it was first developed, the item bank has been periodically revised to reflect changes we have made to the Key Concepts list. Since our first trials in Uganda, researchers have developed instruments using items from the item bank in other contexts, including China, Mexico, and Norway.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup> Other studies are underway in Croatia and the USA. Currently, the item bank includes more than 200 items, with three to four multiple-choice questions (MCQs) available for assessing knowledge and the ability to apply each concept in the list. The item bank also includes a sample of literacy questions for use in contexts where reading ability may be a barrier to responding to the MCQs. It also includes items for assessing people&#x2019;s intended behaviours and self-efficacy (scored on 5-point Likert scales). All items are written in plain language and are suitable for both children and adults.</p>
            <p>In the present study, our objective was to develop and evaluate the psychometric properties of a new assessment tool developed from the item bank for use in Uganda, Kenya, and Rwanda. This outcome measure will be used in randomised trials of the IHC lower secondary school resources.</p>
        </sec>
        <sec id="sec2" sec-type="methods">
            <title>Methods</title>
            <p>Below we describe how we designed the questionnaire, how it was administered, and how we analysed and report the data. The protocol and underlying data for this study have been published.
                <sup>
                    <xref ref-type="bibr" rid="ref34">34</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref35">35</xref>
                </sup>
            </p>
            <sec id="sec3">
                <title>Designing the questionnaire</title>
                <p>For this study we included both ability items and the items measuring intended behaviour and self-efficacy.</p>
                <p>We planned to remove MCQs with sub-optimal measurement properties based on the results of this study. Therefore, we included more MCQs in the questionnaire than the two per Key Concept that we plan to use in the trial. The educational intervention we will evaluate in the randomised trials addresses nine Key Concepts (
                    <xref ref-type="table" rid="T1">Table 1</xref>). For each of those concepts, we included three MCQs in the questionnaire, giving a total of 27 MCQs assessing ability. All MCQs had three response options.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Key Concepts included as learning goals in the Informed Health Choices (IHC) lower secondary school learning resources.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Higher level concepts</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Included Key Concepts</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">
                                    <bold>Claims</bold>
                                    <break/>Claims about effects that are not supported by evidence from fair comparisons are not necessarily wrong, but there is an insufficient basis for believing them.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="3" valign="top">Assumptions that treatments are safe or effective can be misleading.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>1.</label>
                                                <p>Do not assume that treatments are safe.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>2.</label>
                                                <p>Do not assume that treatments have large, dramatic effects.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>3.</label>
                                                <p>Do not assume that comparisons are not needed.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Trust based on the source of a claim alone can be misleading.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>4.</label>
                                                <p>Do not assume that personal experiences alone are sufficient.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">Seemingly logical assumptions about treatments can be misleading.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>5.</label>
                                                <p>Do not assume that a treatment is better based on how new or technologically impressive it is.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>6.</label>
                                                <p>Do not assume that a treatment is helpful or safe based on how widely used it is or has been.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">
                                    <bold>Comparisons</bold>
                                    <break/>To identify treatment effects, studies should make fair comparisons, designed to minimize the risk of systematic errors (biases) and random errors (the play of chance).</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Comparisons of treatments should be fair.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>7.</label>
                                                <p>Consider whether the people being compared were similar.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Descriptions of effects should reflect the risk of being misled by the play of chance.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>8.</label>
                                                <p>Be cautious of small studies.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="2" rowspan="1" valign="top">
                                    <bold>Choices</bold>
                                    <break/>What to do depends on judgements about a problem, the relevance of the available evidence, and the balance of expected benefits, harms, and costs.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Expected advantages should outweigh expected disadvantages.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <p>
                                        <list list-type="order">
                                            <list-item>
                                                <label>9.</label>
                                                <p>Weigh the benefits and savings against the harms and costs of acting or not.</p>
                                            </list-item>
                                        </list>
                                    </p>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>We included three items that assess intended behaviour and four items that assess self-efficacy. The Likert scales include four response options ranging from very likely to very unlikely (intended behaviour) or very difficult to very easy (self-efficacy), and a fifth option: &#x2018;I don&#x2019;t know&#x2019;.</p>
                <p>In addition, we included demographic questions asking about gender, age, educational level, country of residence, training in research methods, and experience with participation in randomised trials. Gender, age, and country of residence were important for the psychometric analysis (testing for differential item functioning). The other background factors were used to ascertain that we were able to recruit people with a spread in ability level (ability to assess treatment claims). Level of education and familiarity with research methods have been shown to be associated with more correct answers.
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup>
                </p>
                <p>In preparation for this study, we conducted cognitive interviews and piloted the questionnaire with individuals from our potential target groups in Uganda, Kenya, and Rwanda.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">11</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> The objective was to get feedback from members of our target groups in the three contexts on the acceptability and relevance of the terminology and formats used in the questionnaire. Even though the items included in the Claim Evaluation Tools item bank have previously gone through an extensive development process in Uganda, we considered it important to get feedback from people in our target groups in Rwanda and Kenya, where the items had not been tested before.</p>
                <p>We recruited schools from May to August 2021 through the project&#x2019;s teacher networks. In the interviews, the students were encouraged to think aloud about how they understood the scenarios and response options, and to identify any issues they had regarding comprehension of terminology or format. The researcher noted all identified issues. All feedback was summarised by the lead investigators, and the findings were discussed in the project group, which included the research teams in all three contexts.</p>
                <p>Piloting took place in a classroom setting. The purpose of the test and its instructions were introduced to the students by a member of the research team in collaboration with the teacher. Observations were made regarding the time taken to complete the questionnaire and comprehension of the format (incorrectly filled-in response options).</p>
                <p>Findings coming out of the interviews and pilots led to only minor changes, such as changing some of the names and other terminology used in the MCQs to improve familiarity in the two new contexts. We also changed the format of the intended behaviour and self-efficacy items from a traditional Likert-scale to resemble a multiple-choice format, keeping the same response options (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Example of an intended behaviour item.</title>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/144949/f1fb1f29-8ab6-4c91-a74e-11e3d01c028d_figure1.gif"/>
                </fig>
                <p>We made that change because the Likert-scale format was unfamiliar to some of the students in the three contexts, and the MCQ format was more familiar and acceptable to the students. The pilot studies also provided us with information about the time needed to complete the questionnaire (between 30 and 60 minutes) and what we could expect in terms of missing responses in the upcoming trial.</p>
                <p>Several tests have previously been developed from the Claim Evaluation Tools item bank. The test developed for this study was named the Critical Thinking about Health test. A copy of the test evaluated in this study is available as extended data.
                    <sup>
                        <xref ref-type="bibr" rid="ref36">36</xref>
                    </sup>
                </p>
            </sec>
            <sec id="sec4">
                <title>Inclusion criteria</title>
                <p>There is no gold standard for the number of respondents needed for Rasch analysis. This is a pragmatic judgement considering the number of items evaluated and the statistical power needed to identify item bias resulting from background variables.
                    <sup>
                        <xref ref-type="bibr" rid="ref16">16</xref>
                    </sup>
                    <sup>&#x2013;</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>
                    </sup> Rasch analysis does not require a representative sample. However, the sample should include enough people to allow for evaluating differential functioning and a spread in ability. Studies have found that a sample of 200-250 people per group is suitable for detecting differential item functioning (DIF).
                    <sup>
                        <xref ref-type="bibr" rid="ref19">19</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref20">20</xref>
                    </sup> We expected both item-sets to work in the same way for children and adults and to have no differential functioning by gender.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">11</xref>
                    </sup> For this evaluation, we also needed a sample of people with different ability to assess treatment claims. There are few background variables that may predict ability to assess treatment claims, but higher education involving training in statistics or research methods may be a factor.
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> Consequently, we estimated that recruiting approximately 500 people in each country, with an equal distribution of men and women, and lower secondary school students and adults would be adequate (
                    <xref ref-type="table" rid="T2">Table 2</xref>). We also made sure to recruit people from higher education contexts, through the university networks in each context, as well as people from our local communities and social media, and students from schools participating in piloting of the educational intervention. Data collection commenced in July 2021 and was completed in December of the same year.</p>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>Table 2. </label>
                    <caption>
                        <title>Overview of sample to be recruited.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Participants</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Kenya</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Rwanda</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Uganda</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Secondary school children</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&gt;250</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&gt;250</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&gt;250</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Adults</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&gt;250</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&gt;250</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&gt;250</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec id="sec5">
                <title>Recruitment</title>
                <p>All recruitment and data collection took place during lockdown due to COVID-19, which led us to use varied strategies for recruiting our respondents.</p>
                <p>In Uganda, we recruited participants through our networks there, including teachers, students, and National advisory panel networks. For students, we used three strategies: visiting students at their homes, reaching out through the student network, and asking teachers who were conducting online revision classes to introduce us to their students via the class platforms, so that we could present the project and, after obtaining consent, share the questionnaire link via WhatsApp or Telegram (both messaging apps). For adults, we recruited people with higher education qualifications through university platforms, i.e., the University faculty platforms, a PhD forum with over 40 PhD fellows, WhatsApp groups for medical students, and a teachers&#x2019; network WhatsApp group. To reach the local communities, we visited food and clothes markets and asked people there to complete the questionnaires. All data collection was done in the central region (Kampala and Wakiso) and the northern region (Gulu district) of Uganda.</p>
                <p>In Kenya, we recruited students from three schools that participated in piloting the IHC secondary school resources. In those schools, we purposively included all the participants from one stream except those who had been selected for the pilot. Each school had about three to four classes, and each class had about 40 students. For adults, we included students from an institution of tertiary education and members of the community with low education levels (secondary and below) who could read and owned a smartphone. For the tertiary students, we purposively included students from two faculties (Health and Arts and Sciences). Through the Dean of students, we invited them to a meeting where we introduced the project and the outcome measure and sought their verbal consent. We then shared the link to the test and asked them to log in and participate. For community members, we used our database to recruit members who were actively involved in the institute&#x2019;s previous and ongoing community-based projects in rural settings in Butere sub-County. Although we reached out to many members, only a few responded, so we recruited more participants from the student fraternity (those pursuing diploma and certificate courses). We used a recruitment and consent process similar to that described for the students above.</p>
                <p>In Rwanda, for adults, we used WhatsApp and recruited using the snowballing method through our networks, including the project&#x2019;s teachers&#x2019; network and students&#x2019; network in Rwanda. The teachers&#x2019; network included lower secondary school teachers from different schools, who varied in work experience, age, subject area, and the schools where they teach. Similarly, the students&#x2019; network included students from schools similar to those of the members of the teachers&#x2019; network. They also varied in age, sex, and history of school performance (high- or low-performing students). We also used email to reach out to adults who work or previously worked with the school of public health researchers in Rwanda. We also engaged a teachers&#x2019; network whose members responded to the test. We recruited students through schools that participated in the development and pilot of the intervention in Kigali city and surrounding neighborhoods.</p>
            </sec>
            <sec id="sec6">
                <title>Data collection</title>
                <p>Most of the data collection was done online, using a service hosted by the University of Oslo (Nettskjema). One small sample (students in Kenya) completed paper questionnaires in a classroom setting, administered as an exam as part of pilot testing of the IHC secondary school resources. The test was administered by a teacher under the instructions of the research team. The paper questionnaires were scanned and added to the data collected online.</p>
            </sec>
            <sec id="sec7">
                <title>Ethical statement</title>
                <p>Ethical approval was obtained from the relevant authorities in each country: Masinde Muliro University of Science and Technology, Institutional Ethics Review Committee (MMUST/IERC/75/19, License No: NACOSTI/P/21/8103); the Rwanda National Ethics Committee (916/RNEC/2019); and the School of Medicine Research Ethics Committee (REC REF 2020-139)/Uganda National Council of Science and Technology (HS916ES).</p>
                <p>All participants were given written information explaining the purpose of the study, that participation was voluntary, and how the findings would be used to improve the validity and reliability of the Critical Thinking about Health test. Children participating through their schools were also given oral information. We obtained written consent from all adult participants and the minors&#x2019; guardians, and written assent from the minors.</p>
                <p>Since this was a knowledge test, similar to a regular school exam, this study did not collect any personal or other sensitive information that could lead to identification of the respondents. None of the members of this project group had access to information that could identify individual participants during or after data collection.</p>
            </sec>
            <sec id="sec8">
                <title>Rasch analysis</title>
                <p>Rasch analysis is a dynamic way of developing measurement tools with construct validity.
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> The approach is used to address important measurement issues required for validating an outcome measure, including internal construct validity (by testing for unidimensionality), invariance of the items (item-person interaction), and item bias (differential item functioning).
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup>
                </p>
                <p>We imported the data from Excel (version 2208) into RUMM2030 (
                    <ext-link ext-link-type="uri" xlink:href="https://www.rummlab.com.au/">https://www.rummlab.com.au/</ext-link>) and followed the basic steps of Rasch analysis as recommended in the literature.
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup> R is a freely accessible software environment for statistical computing and graphics including Rasch analysis that can be used to run a similar analysis (
                    <ext-link ext-link-type="uri" xlink:href="https://www.r-project.org/">https://www.r-project.org/</ext-link>). We analysed the two item-sets separately based on the assumption that these measure different underlying traits. The MCQs were scored dichotomously as correct or incorrect. We applied the polytomous model to the intended behaviour and self-efficacy items.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> When entered into RUMM2030, missing data was coded as &#x201c;0&#x201d;.</p>
                <p>The first step in the analysis involved exploring the class interval structure (number and size of ability groups) and the summary statistics (person-item distribution). In Rasch analysis, the ratio between any two items should be constant across different &#x2018;ability&#x2019; groups. The response patterns for an item-set are tested against what is expected by the model, which is a probabilistic form of Guttman scaling.
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup> In other words, the easier the item is, the more likely it will be &#x2018;passed&#x2019;, and the more able the person is the more likely he or she will pass.
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup> We explored this relationship using the summary statistics function in RUMM2030.
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup> In RUMM2030, the item-person interaction is presented on a logit scale, where the mean item location is &#x2018;0&#x2019;. If the instrument is a well-targeted measure (not too easy or too difficult), the mean location for individuals would be around the value of zero.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> If the mean person location is higher than zero, this indicates that the test is easy; if it is lower than zero, this indicates that the test is difficult. The item and person fit residual statistics assess the degree of divergence (or residual) between the expected and observed data for each person and item, summed over all items and all individuals, respectively, for each test set.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> In RUMM2030 this is reported as an approximate z-score, representing a standardized normal distribution.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> Ideally, item and person fit should have a mean of zero and a standard deviation of one.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup>
                </p>
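                <p>As an illustration of the expectation described above, the dichotomous Rasch model can be sketched as follows. The symbols are introduced here for illustration only and are not taken from the RUMM2030 output:
                    <inline-formula><tex-math><![CDATA[\theta_n]]></tex-math></inline-formula> denotes the location (ability) of person <italic>n</italic> and
                    <inline-formula><tex-math><![CDATA[\delta_i]]></tex-math></inline-formula> the location (difficulty) of item <italic>i</italic>, both in logits.
                    <disp-formula>
                        <tex-math><![CDATA[
P_{ni} = P(X_{ni}=1 \mid \theta_n,\delta_i)
       = \frac{\exp(\theta_n-\delta_i)}{1+\exp(\theta_n-\delta_i)},
\qquad
\ln\frac{P_{ni}}{1-P_{ni}} = \theta_n-\delta_i
]]></tex-math>
                    </disp-formula>
                    For example, a person located exactly at an item&#x2019;s difficulty has a probability of 0.5 of answering it correctly, and a person located one logit above it has a probability of exp(1)/(1 + exp(1)), approximately 0.73. The constant ratio between any two items follows from the same expression, because the person location cancels when the log-odds for two items <italic>i</italic> and <italic>j</italic> are compared:
                    <disp-formula>
                        <tex-math><![CDATA[
\ln\frac{P_{ni}}{1-P_{ni}} - \ln\frac{P_{nj}}{1-P_{nj}}
  = (\theta_n-\delta_i) - (\theta_n-\delta_j)
  = \delta_j-\delta_i
]]></tex-math>
                    </disp-formula>
                    Under the same notation, the standardized residual for a single dichotomous response
                    <inline-formula><tex-math><![CDATA[x_{ni}\in\{0,1\}]]></tex-math></inline-formula> can be written as
                    <disp-formula>
                        <tex-math><![CDATA[
z_{ni} = \frac{x_{ni}-P_{ni}}{\sqrt{P_{ni}\,(1-P_{ni})}}
]]></tex-math>
                    </disp-formula>
                    How these residuals are aggregated and transformed into the reported item and person fit residuals is specific to the software and is not reproduced here.</p>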
                <p>We calculated Cronbach&#x2019;s alpha to assess the reliability of both item-sets after removing missing data. A Cronbach&#x2019;s alpha above 0.7 was considered acceptable.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup>
                </p>
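                <p>For reference, the standard definition of Cronbach&#x2019;s alpha for an item-set with <italic>k</italic> items is shown below, where
                    <inline-formula><tex-math><![CDATA[\sigma^2_i]]></tex-math></inline-formula> is the variance of the scores on item <italic>i</italic> and
                    <inline-formula><tex-math><![CDATA[\sigma^2_X]]></tex-math></inline-formula> is the variance of the total scores. The notation is introduced here for illustration.
                    <disp-formula>
                        <tex-math><![CDATA[
\alpha = \frac{k}{k-1}\left(1-\frac{\sum_{i=1}^{k}\sigma^2_i}{\sigma^2_X}\right)
]]></tex-math>
                    </disp-formula>
                    As a purely hypothetical example (these numbers are not from this study), four items with item variances summing to 2.0 and a total-score variance of 5.0 would give
                    <inline-formula><tex-math><![CDATA[\alpha = \tfrac{4}{3}\,(1-0.4) = 0.8]]></tex-math></inline-formula>, which would exceed the 0.7 threshold.</p>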
                <p>The principal component analysis/t-test protocol is used to test the hypothesis of unidimensionality. This is done by identifying the two most divergent item subsets (using the residual principal component function in RUMM2030), and then calculating t-tests.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> If &#x2264;5% of tests are significant, strict unidimensionality can be inferred.
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup> However, the concept of &#x2018;unidimensionality&#x2019; is not &#x2018;definite&#x2019; but relative and should be supplemented with quantitative or qualitative interpretation of the explicit variable definition and considering the context and purpose of the measurement.
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref25">25</xref>
                    </sup>
                </p>
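                <p>The person-level comparison in this protocol is commonly formulated as below. This is a sketch under the assumption that each person receives two independent location estimates, one from each item subset; the exact computation performed by RUMM2030 is not described in this article. Here
                    <inline-formula><tex-math><![CDATA[\hat\theta_{n1}]]></tex-math></inline-formula> and
                    <inline-formula><tex-math><![CDATA[\hat\theta_{n2}]]></tex-math></inline-formula> are the estimates for person <italic>n</italic> from the two subsets, and
                    <inline-formula><tex-math><![CDATA[SE_{n1}]]></tex-math></inline-formula> and
                    <inline-formula><tex-math><![CDATA[SE_{n2}]]></tex-math></inline-formula> are their standard errors.
                    <disp-formula>
                        <tex-math><![CDATA[
t_n = \frac{\hat\theta_{n1}-\hat\theta_{n2}}{\sqrt{SE_{n1}^2+SE_{n2}^2}}
]]></tex-math>
                    </disp-formula>
                    A test is counted as significant at the 5% level when
                    <inline-formula><tex-math><![CDATA[|t_n| > 1.96]]></tex-math></inline-formula>, and the proportion of persons with significant tests is then compared with the 5% criterion.</p>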
                <p>We tested for local dependency by using the residual correlations function in RUMM2030. Data from this output were copied into Excel (version 2208), and any residual correlation more than 0.2 above the average was considered to indicate potentially problematic dependency.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup>
                </p>
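                <p>Expressed as a rule, with notation introduced here for illustration, an item pair (<italic>i</italic>, <italic>j</italic>) is flagged when its residual correlation
                    <inline-formula><tex-math><![CDATA[r_{ij}]]></tex-math></inline-formula> exceeds the average over all pairs of the <italic>k</italic> items by more than 0.2:
                    <disp-formula>
                        <tex-math><![CDATA[
r_{ij} > \bar{r} + 0.2,
\qquad
\bar{r} = \frac{2}{k(k-1)}\sum_{i<j} r_{ij}
]]></tex-math>
                    </disp-formula>
                    For instance, with a hypothetical average residual correlation of -0.04 (not a value from this study), the cut-off would be 0.16.</p>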
                <p>We identified individuals and items with &#x2018;misfit&#x2019; to the Rasch model using chi-square statistics and by exploring the fit residuals. Items with chi-square probabilities that are statistically significant at the 0.01 level do not fit the model, and items with fit residuals outside the &#x00b1;2.5 range are considered to be potentially problematic.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> Similarly, individuals with a fit residual outside the &#x00b1;2.5 range were considered as not fitting the model. Such extreme values can be an indication of, for example, guessing or copying, or that the item-set is not appropriate.</p>
                <p>We examined differential item functioning (DIF) by age, gender, and country of residence. It was our objective to include only items that could be applied fairly across these demographic variables. Ideally, all items in the Claim Evaluation Tools item bank are expected to work in the same way for men and women, and across age groups. There are two types of DIF. Uniform DIF is when the difference between groups for an item is systematic, for example adults having systematically higher ability compared to lower secondary school students. This is less problematic (when it is known) than non-uniform DIF, where the difference between groups on an item is inconsistent across ability groups.
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup> For this study, we considered non-uniform DIF as unacceptable. We predicted that we would find uniform DIF by country, as we know from other studies that there are differences in ability-by-concept across countries.
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> Uniform DIF by gender and age was unwanted but would be considered in relation to the other findings from the Rasch analysis. The reason for this was that the questionnaire will be used for measuring differences between an intervention and a comparison group, and systematic DIF would therefore not be a problem in our study.</p>
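                <p>The distinction between the two types of DIF can be made concrete with a two-way decomposition of the standardized residuals for a single item. This is one common formalisation and is shown as an assumption; the specific test applied is not stated in this article. The residual
                    <inline-formula><tex-math><![CDATA[z_{ngc}]]></tex-math></inline-formula> of person <italic>n</italic> in group <italic>g</italic> (for example, gender) and class interval <italic>c</italic> (ability group) is written as
                    <disp-formula>
                        <tex-math><![CDATA[
z_{ngc} = \mu + \alpha_g + \beta_c + (\alpha\beta)_{gc} + \varepsilon_{ngc}
]]></tex-math>
                    </disp-formula>
                    In this sketch, uniform DIF corresponds to a non-zero group effect
                    <inline-formula><tex-math><![CDATA[\alpha_g]]></tex-math></inline-formula> with no interaction, so that one group scores consistently higher or lower than expected across all ability levels. Non-uniform DIF corresponds to a non-zero interaction term
                    <inline-formula><tex-math><![CDATA[(\alpha\beta)_{gc}]]></tex-math></inline-formula>, so that the size or direction of the group difference changes with ability.</p>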
                <p>In the item characteristic curve plot the expected scores and the observed scores for the class intervals of the different ability levels are displayed. We observed the item characteristic curve for each item and made note of items that showed under-discrimination, over-discrimination, or had several deviating ability groups.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> We considered items with under-discrimination and classic over-discrimination for removal. Marginal over-discrimination was not considered to be a problem for our purposes.</p>
                <p>For the polytomous items we explored the threshold ordering (fit to the expected logical order of the response options) to check for disordered thresholds. Disordered thresholds suggest that the scoring categories are not progressing as expected, and that the item is not working properly.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup>
                </p>
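                <p>To illustrate what ordered thresholds mean, a polytomous Rasch model can be written in the partial credit form below. The parameterisation is an assumption made for this sketch, since the text does not specify whether a rating scale or partial credit parameterisation was applied. Here
                    <inline-formula><tex-math><![CDATA[\theta_n]]></tex-math></inline-formula> is the person location,
                    <inline-formula><tex-math><![CDATA[\tau_{ik}]]></tex-math></inline-formula> is the <italic>k</italic>-th threshold of item <italic>i</italic>, and
                    <inline-formula><tex-math><![CDATA[m_i]]></tex-math></inline-formula> is the maximum score on the item.
                    <disp-formula>
                        <tex-math><![CDATA[
P(X_{ni}=x) =
\frac{\exp\left(\sum_{k=1}^{x}(\theta_n-\tau_{ik})\right)}
     {\sum_{j=0}^{m_i}\exp\left(\sum_{k=1}^{j}(\theta_n-\tau_{ik})\right)},
\qquad x = 0,1,\dots,m_i,
]]></tex-math>
                    </disp-formula>
                    where the empty sum (for a score of zero) is defined as zero. Thresholds are ordered when
                    <inline-formula><tex-math><![CDATA[\tau_{i1} < \tau_{i2} < \dots < \tau_{im_i}]]></tex-math></inline-formula>. An item scored in four ordered categories (0 to 3, an assumed scoring used only for this example) has three thresholds, and a disordered item is one where, for instance, the second threshold is estimated below the first.</p>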
                <p>This study follows the STROBE-reporting standards.
                    <sup>
                        <xref ref-type="bibr" rid="ref38">38</xref>
                    </sup>
                </p>
            </sec>
        </sec>
        <sec id="sec9" sec-type="results">
            <title>Results</title>
            <sec id="sec10">
                <title>Summary statistics</title>
                <p>A total of 1,671 responses were entered into the analysis and distributed across 10 ability groups identified by the RUMM2030 software. Of the respondents, 49% were women and 40% were young people (under 18); 35% were from Kenya, 34% from Uganda, and 31% from Rwanda. Missing data was minimal (only 0.004%) and thus had no impact on the analysis.</p>
                <p>The person-item distribution shows that both item-sets are well targeted (mean person location was -0.218 for the ability item-set and 0.084 for the Likert item-set).</p>
                <p>For the ability items, the person fit residual was -0.204 (SD 0.741) and thus showed satisfactory fit to the model. The items&#x2019; fit residual was 0.712 (SD 2.235) and warranted further investigation in subsequent analyses.</p>
                <p>For the Likert items, the item fit residual was 0.543 (SD 0.938), indicating reasonable fit. However, the high standard deviation for the person fit residual (-0.546, SD 1.783) suggested some misfit to the model.</p>
                <p>Both item-sets were found to be reliable, with a Cronbach&#x2019;s alpha of 0.72 and 0.79 for the ability and Likert item-sets respectively.</p>
            </sec>
            <sec id="sec11">
                <title>Individual person and item fit</title>
                <p>In the analysis of the ability item-set, we identified one person with a highly negative fit residual (adult, female, Rwanda) and two with highly positive fit residuals (male, young person, Rwanda and adult, female, Rwanda). Of the 27 MCQs, three items had extreme negative values, and four items had extreme positive values.</p>
                <p>There were no items with extreme values in the Likert item-set. However, a number of misfitting persons were identified: 296 individuals with high negative residuals and two individuals with high positive residuals.</p>
            </sec>
            <sec id="sec12">
                <title>DIF and item characteristic curve-analyses</title>
                <p>The majority of the ability items had a good fit to the item characteristic curve (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). Four items showed evidence of classic over-discrimination, two of which also had very high negative fit residuals (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>). Four items showed signs of classic under-discrimination and were considered candidates for removal (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>). Most Likert items showed a good fit; two items were slightly over-discriminating, which was considered acceptable.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Example of ability-item with satisfactory fit.</title>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/144949/f1fb1f29-8ab6-4c91-a74e-11e3d01c028d_figure2.gif"/>
                </fig>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Example of ability-item with classic over-discrimination.</title>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/144949/f1fb1f29-8ab6-4c91-a74e-11e3d01c028d_figure3.gif"/>
                </fig>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Example of ability-item with classic under-discrimination.</title>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/144949/f1fb1f29-8ab6-4c91-a74e-11e3d01c028d_figure4.gif"/>
                </fig>
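                <p>To illustrate how fit to an item characteristic curve can be judged, the following minimal Python sketch compares the expected probability of a correct answer under the dichotomous Rasch model with the observed proportion of correct answers in class intervals of ability. It is not the software or procedure used in this study: the function names, the number of class intervals, the item difficulty, and the data are hypothetical and chosen for illustration only.</p>
                <code language="python" xml:space="preserve">
```python
import math


def rasch_probability(theta, difficulty):
    """Expected probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - theta))


def observed_proportions(abilities, answers, n_groups):
    """Return (mean ability, proportion correct) for each class interval.

    Persons are sorted by ability and split into n_groups groups of
    (nearly) equal size; the last group takes any remainder.
    """
    ordered = sorted(zip(abilities, answers))
    size = len(ordered) // n_groups
    groups = []
    for g in range(n_groups):
        start = g * size
        stop = len(ordered) if g == n_groups - 1 else start + size
        chunk = ordered[start:stop]
        mean_theta = sum(theta for theta, _ in chunk) / len(chunk)
        proportion = sum(score for _, score in chunk) / len(chunk)
        groups.append((mean_theta, proportion))
    return groups


# Hypothetical person abilities (logits) and scored answers (1 = correct).
abilities = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
answers = [0, 0, 0, 1, 0, 1, 1, 1, 1]

for mean_theta, proportion in observed_proportions(abilities, answers, 3):
    expected = rasch_probability(mean_theta, difficulty=0.0)
    print(f"ability {mean_theta:+.1f}: observed {proportion:.2f}, expected {expected:.2f}")
```
                </code>
                <p>In this made-up example, the observed proportions (0.00, 0.67, 1.00) rise more steeply across the class intervals than the expected probabilities (about 0.18, 0.50, 0.82), which is the pattern usually described as over-discrimination. Observed proportions that rise less steeply than expected would suggest under-discrimination.</p>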
                <p>In the DIF analysis of the 27 ability items, two items showed uniform DIF by gender (one item where males did systematically better and one where females had higher ability). Three items showed DIF by age, of which two were uniform (one item where young people performed better and one item where adults had higher ability). One item had non-uniform DIF by age. Uniform DIF by country was found for 10 items, the ranking of the three countries differing across these items.</p>
                <p>There was no DIF by gender, age, or country in the analysis of the Likert item-set.</p>
                <p>In the Likert item-set, two items were found to be slightly over-discriminating, which was considered acceptable. The remaining items showed very good fit.</p>
                <p>When exploring the ordering of the thresholds, we found that the three Likert items evaluating intended behaviour had disordered thresholds. A reanalysis suggested that these items could be improved by dichotomising the response options. The four items evaluating self-efficacy showed a good fit.</p>
            </sec>
            <sec id="sec13">
                <title>Test of unidimensionality</title>
                <p>In the analysis of the ability item-set, 8% of the T-tests were significant.</p>
                <p>The magnitude of multidimensionality in the Likert item-set was found to be satisfactory at 5%, and the item-set was considered unidimensional.</p>
            </sec>
            <sec id="sec14">
                <title>Local dependency</title>
                <p>No item-pair correlations exceeded the average value by more than 0.2 in either item-set, suggesting no important redundancy.</p>
            </sec>
            <sec id="sec15">
                <title>Revision of the questionnaire</title>
                <p>The outcome measure to be used in the final trial was reduced to include only two MCQs for each Key Concept to be assessed. We removed the ability-items with suboptimal fit. Since the Likert-items were all found to have good fit, these remained unchanged.</p>
                <p>The revised outcome measure has been published as extended data.
                    <sup>
                        <xref ref-type="bibr" rid="ref37">37</xref>
                    </sup>
                </p>
            </sec>
        </sec>
        <sec id="sec16" sec-type="discussion">
            <title>Discussion</title>
            <p>Overall, both item-sets were found to have good fit to the Rasch model and to be suitable for our target audience. The reliability of both item-sets was also good. Observations of the individual item and person fit provided us with guidance on how to improve the design and administration of the two item-sets.</p>
            <p>When observing each individual item&#x2019;s fit to the Rasch model in the ability item-set, we identified some items that could be removed to improve the questionnaire. Of 27 ability items, three had differential item functioning by age or gender, of which only one was highly problematic (non-uniform). As expected, some items also showed differential item functioning by country. Possible explanations are differences in cultural beliefs or in the curricula taught in schools. Considering that the differential item functioning by country was uniform and that we are planning to use the outcome measure in randomised trials comparing effects between comparison groups in each specific context, this was not considered to be a concern for our purposes. We also identified some items with poor measurement properties by observing the item characteristic curves. Together with the item showing non-uniform DIF, these were considered for removal from the final outcome measure to be used in our upcoming trial.</p>
            <p>In the analysis of the Likert item-set, we identified two issues that needed to be addressed. Three items measuring intended behaviour showed disordered response categories, and a high number of people had extreme values. This may indicate that some of the respondents had difficulty answering these questions. As noted in the methods, we observed that some people in the studied contexts were unfamiliar with intended behaviour and self-efficacy questions. The results suggest that we need to plan carefully how this item-set is administered and ensure that people are adequately instructed about the format and purpose of these questions. The results also suggest that we should either redesign the attitude items so that the response options are dichotomised (with three response options instead of five) or dichotomise the answers by collapsing the response options in the analysis following the trial. We did the latter in the trial of the IHC primary school resources by combining likely (or difficult) and very likely, and combining unlikely, very unlikely, and &#x2018;don&#x2019;t know&#x2019;.
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup>
            </p>
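            <p>The collapsing of response options at the analysis stage, as described above, can be illustrated with a minimal Python sketch. It is not the analysis code used in the trial: the response labels, the function name <monospace>dichotomise</monospace>, and the example answers are assumptions made for illustration, and the sketch covers only the intended-behaviour wording (likely and unlikely), not the self-efficacy wording.</p>
            <code language="python" xml:space="preserve">
```python
# Hypothetical response labels; the wording in the actual questionnaire may differ.
POSITIVE = {"very likely", "likely"}
OTHER = {"unlikely", "very unlikely", "don't know"}


def dichotomise(response):
    """Collapse a five-option response into 1 (likely or very likely) or 0 (all others)."""
    label = response.strip().lower()
    if label in POSITIVE:
        return 1
    if label in OTHER:
        return 0
    # Fail loudly on labels outside the five expected options.
    raise ValueError("unexpected response: " + repr(response))


# Made-up answers from five respondents to one intended-behaviour item.
responses = ["Very likely", "Unlikely", "Don't know", "Likely", "Very unlikely"]
scores = [dichotomise(r) for r in responses]
print(scores)  # [1, 0, 0, 1, 0]
```
            </code>
            <p>Raising an error for unexpected labels, rather than silently scoring them as zero, is a design choice in this sketch that keeps data-entry problems visible before the collapsed scores are analysed.</p>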
            <p>We found no important redundancy in the item-sets (dependency between item pairs), and both item-sets appear to measure only one underlying trait (unidimensionality). In the ability item-set, the percentage of significant t-tests was somewhat above the threshold of 5%.
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup> Considering that this is the first time we have observed this in any of the many Rasch analyses we have done on instruments developed from the Claim Evaluation Tools item bank, we considered the magnitude of multidimensionality observed in the ability item-set acceptable.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup>
            </p>
            <p>The overabundance of unreliable treatment claims that accompanied the COVID-19 pandemic has highlighted the need for facilitating critical thinking as an important public health initiative.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> This is essential to protect people against unreliable treatment claims and enable them to make informed treatment choices.</p>
            <p>Health literacy is defined in many ways, but typically includes the ability to think critically (sometimes referred to as critical health literacy).
                <sup>
                    <xref ref-type="bibr" rid="ref27">27</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref28">28</xref>
                </sup> A conceptual framework is helpful when developing assessment tools.
                <sup>
                    <xref ref-type="bibr" rid="ref29">29</xref>
                </sup> Health literacy is often measured using self-report.
                <sup>
                    <xref ref-type="bibr" rid="ref30">30</xref>
                </sup> Furthermore, many of the health literacy instruments available aim to capture other domains of health literacy such as functional and social literacy.
                <sup>
                    <xref ref-type="bibr" rid="ref30">30</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref31">31</xref>
                </sup> In addition to measuring perceptions of one&#x2019;s own abilities (self-report or self-efficacy), it is important to measure abilities objectively (performance). The association between self-report and performance is not straightforward.
                <sup>
                    <xref ref-type="bibr" rid="ref32">32</xref>
                </sup> The Health Literacy Tool Shed, a database of health literacy measures, has indexed 16 instruments that evaluate an aspect of health literacy, are intended for adolescents, and use an objective measurement of performance, of which eight are available in English.
                <sup>
                    <xref ref-type="bibr" rid="ref30">30</xref>
                </sup> The Claim Evaluation Tools have a narrower scope than most of these and focus on one critical skill: the ability to assess treatment claims and make informed treatment choices. Although the other instruments can provide information about people&#x2019;s 
                <italic toggle="yes">general</italic> health literacy skills, applying a more specific assessment tool in, for example, mapping studies, makes it easier to design interventions targeting the specific gaps identified.</p>
            <sec id="sec17">
                <title>Strengths and limitations</title>
                <p>One limitation of this study is that the adult population included more people with higher education than the general population in each of these three settings. Thus, the test might be more difficult for people with less education. However, although participants with higher education are somewhat more likely to answer the ability questions correctly, there does not seem to be a strong association.
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref33">33</xref>
                    </sup> Another limitation is that the findings of this study are specific to the three Eastern African countries, and the validity and reliability in other contexts are uncertain. The item-sets validated in this study should therefore undergo further psychometric testing if used elsewhere.</p>
                <p>The strategy of using pilot testing and a Rasch analysis has been found to be a robust method for developing measurement tools in several contexts.
                    <sup>
                        <xref ref-type="bibr" rid="ref10">10</xref>
                    </sup>
                    <sup>&#x2013;</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref13">13</xref>
                    </sup> An important strength of this study is that we used explicit and transparent methods, following the principal steps recommended for Rasch analysis.
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup>
                    <sup>&#x2013;</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup> Another strength is that we were able to recruit enough people despite the fact that all three countries were burdened by the pandemic during the data collection. The results of this study, and the subsequent design of the questionnaire based on these results, help ensure that both the ability and Likert item-sets are valid and reliable outcome measures for the randomised trials of the IHC lower secondary school intervention in all three countries.</p>
            </sec>
        </sec>
        <sec id="sec18" sec-type="conclusion">
            <title>Conclusion</title>
            <p>To our knowledge, this is the first measurement tool developed for measuring ability, intended behaviours, and self-efficacy for critical thinking about treatments in Kenya and Rwanda, as well as in Uganda. The two item-sets we evaluated in this study were found to be reliable and to have satisfactory measurement properties.</p>
            <p>The findings from our analysis were used to redesign and improve the ability item-set. The results also informed guidance for how the Likert item-set should be administered and analysed.</p>
        </sec>
    </body>
    <back>
        <sec id="sec21" sec-type="data-availability">
            <title>Data availability</title>
            <sec id="sec22">
                <title>Underlying data</title>
                <p>Zenodo: Critical thinking about treatment effects in Eastern Africa. Data set uncoded. [Data set]. Zenodo. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7680780">https://doi.org/10.5281/zenodo.7680780</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref34">34</xref>
</sup>
                </p>
                <p>The project contains the following underlying data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>data-209546-2021-12-22-1147-utf_final_eclaim_rasch_2021.xlsx. (Raw data electronically collected - adults and students).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>data-237440-2021-12-22-1155-utf_pilot Rwanda_Rasch_2021.xlsx. (Raw data collected from paper-based questionnaires used in the pilot survey - students).
</p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec23">
                <title>Extended data</title>
                <p>Zenodo: Study protocol: Assessment of validity and reliability of a questionnaire based on the Claim Evaluation Tools Item bank in Uganda, Kenya and Rwanda. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7680616">https://doi.org/10.5281/zenodo.7680616</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref35">35</xref>
</sup>
                </p>
                <p>The project contains the following extended data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
Protocol_Claim_Choice 2021 23 03.docx.(2)pdf. (Study protocol)
</p>
                        </list-item>
                    </list>
                </p>
                <p>Zenodo: Critical thinking about treatment effects in Eastern Africa. The Critical Thinking about Health test (before Rasch analysis). Zenodo. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7756037">https://doi.org/10.5281/zenodo.7756037</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref36">36</xref>
</sup>
                </p>
                <p>The project contains the following extended data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Critical thinking about treatments test &#x2013; Vis - Nettskjema.pdf. (Original test validated as part of this study).
</p>
                        </list-item>
                    </list>
                </p>
                <p>Zenodo: Critical thinking about treatment effects in Eastern Africa. The Critical Thinking about Health test. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7680606">https://doi.org/10.5281/zenodo.7680606</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref37">37</xref>
</sup>
                </p>
                <p>The project contains the following extended data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
Test_CHOICE_final_with literacy and userexperience_march_2022_FORMATD.pdf. (Final revised test).</p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec24">
                <title>Reporting guidelines</title>
                <p>Zenodo: STROBE checklist for &#x2018;Critical thinking about treatment effects in Eastern Africa: development and Rasch analysis of an assessment tool&#x2019;. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7680586">https://doi.org/10.5281/zenodo.7680586</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref38">38</xref>
</sup>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
        </sec>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to thank Sarah Rosenbaum for providing her expertise in designing the questionnaire. Furthermore, we would like to thank the rest of the Informed Health Choices team for their valuable feedback and discussion in planning and conducting this study. We are also very grateful to all the secondary school students and adults who took time to contribute to this study and to the ministry of education and school administration for allowing students&#x2019; participation.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mian</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Khan</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Coronavirus: the spread of misinformation.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med.</italic>
</source>
                    <year>2020</year>;<volume>18</volume>(<issue>1</issue>):<fpage>89</fpage>.
                    <pub-id pub-id-type="pmid">32188445</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12916-020-01556-3</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7081539</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Oxman</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Larun</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>P&#x00e9;rez Gaxiola</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Quality of information in news media reports about the effects of health interventions: Systematic review and meta-analyses.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2022</year>;<volume>10</volume>(<issue>433</issue>):<fpage>433</fpage>.
                    <pub-id pub-id-type="pmid">35083033</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.52894.2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8756300</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brownlee</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chalkidou</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Doust</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evidence for overuse of medical services around the world.</article-title>
                    <source>

                        <italic toggle="yes">Lancet.</italic>
</source>
                    <year>2017</year>;<volume>390</volume>(<issue>10090</issue>):<fpage>156</fpage>&#x2013;<lpage>168</lpage>.
                    <pub-id pub-id-type="pmid">28077234</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0140-6736(16)32585-5</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5708862</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Glasziou</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Straus</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brownlee</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evidence for underuse of effective medical services around the world.</article-title>
                    <source>

                        <italic toggle="yes">Lancet.</italic>
</source>
                    <year>2017</year>;<volume>390</volume>(<issue>10090</issue>):<fpage>169</fpage>&#x2013;<lpage>177</lpage>.
                    <pub-id pub-id-type="pmid">28077232</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0140-6736(16)30946-1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <collab>The Lancet Infectious Diseases</collab>:
                    <article-title>The COVID-19 infodemic.</article-title>
                    <source>

                        <italic toggle="yes">Lancet Infect. Dis.</italic>
</source>
                    <year>2020</year>;<volume>20</volume>(<issue>8</issue>):<fpage>875</fpage>.
                    <pub-id pub-id-type="pmid">32687807</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S1473-3099(20)30565-X</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7367666</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <collab>The Lancet Rheumatology</collab>:
                    <article-title>Going viral: misinformation in the time of COVID-19.</article-title>
                    <source>

                        <italic toggle="yes">Lancet Rheumatol.</italic>
</source>
                    <year>2021</year>;<volume>3</volume>(<issue>6</issue>):<fpage>e393</fpage>.
                    <pub-id pub-id-type="pmid">34075359</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S2665-9913(21)00154-5</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8159185</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Oxman</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chalmers</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Austvoll-Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Key Concepts for assessing claims about treatment effects and making well-informed treatment choices.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2019</year>;<volume>7</volume>(<issue>1784</issue>):<fpage>1784</fpage>.
                    <pub-id pub-id-type="pmid">30631443</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.16771.2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6290969</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nsangi</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Semakula</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oxman</surname>
                            <given-names>AD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Effects of the Informed Health Choices primary school intervention on the ability of children in Uganda to assess the reliability of claims about treatment effects, 1-year follow-up: a cluster-randomised trial.</article-title>
                    <source>

                        <italic toggle="yes">Trials.</italic>
</source>
                    <year>2020</year>;<volume>21</volume>(<issue>1</issue>):<fpage>27</fpage>.
                    <pub-id pub-id-type="pmid">31907013</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13063-019-3960-9</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6945419</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Semakula</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nsangi</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oxman</surname>
                            <given-names>AD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Effects of the Informed Health Choices podcast on the ability of parents of primary school children in Uganda to assess the trustworthiness of claims about treatment effects: one-year follow up of a randomised trial.</article-title>
                    <source>

                        <italic toggle="yes">Trials.</italic>
</source>
                    <year>2020</year>;<volume>21</volume>(<issue>1</issue>):<fpage>187</fpage>.
                    <pub-id pub-id-type="pmid">32059694</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13063-020-4093-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7023790</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Austvoll-Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guttersrud</surname>
                            <given-names>O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nsangi</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the &#x2018;Claim Evaluation Tools&#x2019; database using Rasch modelling.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>2017</year>;<volume>7</volume>(<issue>5</issue>):<fpage>e013185</fpage>.
                    <pub-id pub-id-type="pmid">28550019</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2016-013185</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5777469</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Austvoll-Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Semakula</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nsangi</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Measuring ability to assess claims about treatment effects: the development of the &#x2018;Claim Evaluation Tools&#x2019;.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>2017</year>;<volume>7</volume>(<issue>5</issue>):<fpage>e013184</fpage>.
                    <pub-id pub-id-type="pmid">28515181</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2016-013184</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5777467</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Perez-Gaxiola</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Austvoll-Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Validaci&#x00f3;n de un cuestionario para medir la habilidad de la poblaci&#x00f3;n general para evaluar afirmaciones acerca de tratamientos m&#x00e9;dicos.</article-title>
                    <source>

                        <italic toggle="yes">Gac. Med. Mex.</italic>
</source>
                    <year>2018</year>;<volume>154</volume>(<issue>4</issue>):<fpage>480</fpage>&#x2013;<lpage>495</lpage>.
                    <pub-id pub-id-type="pmid">30250337</pub-id>
                    <pub-id pub-id-type="doi">10.24875/GMM.17003340</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Austvoll-Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluating people&#x2019;s ability to assess treatment claims: Validating a test in Mandarin from Claim Evaluation Tools database.</article-title>
                    <source>

                        <italic toggle="yes">J. Evid. Based Med.</italic>
</source>
                    <year>2019</year>;<volume>12</volume>(<issue>2</issue>):<fpage>140</fpage>&#x2013;<lpage>146</lpage>.
                    <pub-id pub-id-type="pmid">31144466</pub-id>
                    <pub-id pub-id-type="doi">10.1111/jebm.12343</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Furuseth-Olsen</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rose</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Norwegian public&#x2019;s ability to assess treatment claims: results of a cross-sectional study of critical health literacy [version 2; peer review: 1 approved, 1 approved with reservations].</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2021</year>;<volume>9</volume>(<issue>179</issue>).
                    <pub-id pub-id-type="doi">10.12688/f1000research.21902.2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bloem</surname>
                            <given-names>EF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zuuren</surname>
                            <given-names>FJ</given-names>
                            <prefix>van</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Koeneman</surname>
                            <given-names>MA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Clarifying quality of life assessment: do theoretical models capture the underlying cognitive processes?</article-title>
                    <source>

                        <italic toggle="yes">Qual. Life Res.</italic>
</source>
                    <year>2008</year>;<volume>17</volume>(<issue>8</issue>):<fpage>1093</fpage>&#x2013;<lpage>1102</lpage>.
                    <pub-id pub-id-type="pmid">18704756</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s11136-008-9380-z</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Choi</surname>
                            <given-names>SW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cook</surname>
                            <given-names>KF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dodd</surname>
                            <given-names>BG</given-names>
                        </name>
</person-group>:
                    <article-title>Parameter recovery for the partial credit model using MULTILOG.</article-title>
                    <source>

                        <italic toggle="yes">J. Outcome Meas.</italic>
</source>
                    <year>1997</year>;<volume>1</volume>(<issue>2</issue>):<fpage>114</fpage>&#x2013;<lpage>142</lpage>.
                    <pub-id pub-id-type="pmid">9661717</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Linacre</surname>
                            <given-names>JM</given-names>
                        </name>
</person-group>:
                    <article-title>Sample size and item calibration stability.</article-title>
                    <source>

                        <italic toggle="yes">Rasch Measurement Transactions.</italic>
</source>
                    <year>1994</year>;<volume>7</volume>(<issue>4</issue>):<fpage>328</fpage>.</mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Clauser</surname>
                            <given-names>BE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mazor</surname>
                            <given-names>KM</given-names>
                        </name>
</person-group>:
                    <article-title>Using Statistical Procedures to Identify Differentially Functioning Test Items.</article-title>
                    <source>

                        <italic toggle="yes">Educ. Meas. Issues Pract.</italic>
</source>
                    <year>1998</year>;<volume>17</volume>(<issue>1</issue>):<fpage>31</fpage>&#x2013;<lpage>44</lpage>.</mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rogers</surname>
                            <given-names>HJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Swaminathan</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning.</article-title>
                    <source>

                        <italic toggle="yes">Appl. Psychol. Meas.</italic>
</source>
                    <year>1993</year>;<volume>17</volume>(<issue>2</issue>):<fpage>105</fpage>&#x2013;<lpage>116</lpage>.
                    <pub-id pub-id-type="doi">10.1177/014662169301700201</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Narayanan</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Swaminathan</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Performance of the Mantel-Haenszel and Simultaneous Item Bias Procedures for Detecting Differential Item Functioning.</article-title>
                    <source>

                        <italic toggle="yes">Appl. Psychol. Meas.</italic>
</source>
                    <year>1994</year>;<volume>18</volume>(<issue>4</issue>):<fpage>315</fpage>&#x2013;<lpage>328</lpage>.
                    <pub-id pub-id-type="doi">10.1177/014662169401800403</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tennant</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Conaghan</surname>
                            <given-names>PG</given-names>
                        </name>
</person-group>:
                    <article-title>The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper?</article-title>
                    <source>

                        <italic toggle="yes">Arthritis Rheum.</italic>
</source>
                    <year>2007</year>;<volume>57</volume>(<issue>8</issue>):<fpage>1358</fpage>&#x2013;<lpage>1362</lpage>.
                    <pub-id pub-id-type="pmid">18050173</pub-id>
                    <pub-id pub-id-type="doi">10.1002/art.23108</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="book">
                    <collab>Psylab Group</collab>:
                    <source>

                        <italic toggle="yes">Introductory Rasch Analysis Using RUMM2030.</italic>
</source>
                    <publisher-name>Psychometric Laboratory for Health Sciences: The Section of Rehabilitation Medicine University of Leeds</publisher-name>;<year>2016</year>.</mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="book">
                    <collab>Rumm Laboratory Pty Ltd</collab>:
                    <source>

                        <italic toggle="yes">Displaying the RUMM2030 analysis.</italic>
</source>
                    <publisher-name>Rasch unidimensional measurement model</publisher-name>;<year>2015</year>.</mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hagell</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Testing Rating Scale Unidimensionality Using the Principal Component Analysis (PCA)/t-Test Protocol with the Rasch Model: The Primacy of Theory over Statistics.</article-title>
                    <source>

                        <italic toggle="yes">Open J. Stat.</italic>
</source>
                    <year>2014</year>;<volume>04</volume>:<fpage>456</fpage>&#x2013;<lpage>465</lpage>.
                    <pub-id pub-id-type="doi">10.4236/ojs.2014.46044</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Andrich</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Rasch Models for Measurement.</italic>
</source>
                    <publisher-loc>Beverly Hills</publisher-loc>:
                    <publisher-name>Sage Publications I</publisher-name>;<year>1988</year>.</mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nsangi</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Semakula</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oxman</surname>
                            <given-names>AD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Effects of the Informed Health Choices primary school intervention on the ability of children in Uganda to assess the reliability of claims about treatment effects: a cluster-randomised controlled trial.</article-title>
                    <source>

                        <italic toggle="yes">Lancet.</italic>
</source>
                    <year>2017</year>;<volume>390</volume>(<issue>10092</issue>):<fpage>374</fpage>&#x2013;<lpage>388</lpage>.
                    <pub-id pub-id-type="pmid">28539194</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0140-6736(17)31226-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chinn</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Critical health literacy: A review and critical analysis.</article-title>
                    <source>

                        <italic toggle="yes">Soc. Sci. Med.</italic>
</source>
                    <year>2011</year>;<volume>73</volume>(<issue>1</issue>):<fpage>60</fpage>&#x2013;<lpage>67</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.socscimed.2011.04.004</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Armstrong</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Waters</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Quality of health literacy instruments used in children and adolescents: a systematic review.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>2018</year>;<volume>8</volume>(<issue>6</issue>):<fpage>e020080</fpage>.
                    <pub-id pub-id-type="pmid">29903787</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2017-020080</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6009458</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="other">
                    <article-title>COSMIN Taxonomy of Measurement Properties.</article-title>
                    <ext-link ext-link-type="uri" xlink:href="https://www.cosmin.nl/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="other">
                    <article-title>Health Literacy Tool Shed: A database of health literacy measures.</article-title>
                    <ext-link ext-link-type="uri" xlink:href="http://healthliteracy.bu.edu/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nguyen</surname>
                            <given-names>TH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paasche-Orlow</surname>
                            <given-names>MK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McCormack</surname>
                            <given-names>LA</given-names>
                        </name>
</person-group>:
                    <article-title>The State of the Science of Health Literacy Measurement.</article-title>
                    <source>

                        <italic toggle="yes">Stud. Health Technol. Inform.</italic>
</source>
                    <year>2017</year>;<volume>240</volume>:<fpage>17</fpage>&#x2013;<lpage>33</lpage>.
                    <pub-id pub-id-type="pmid">28972507</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kiechle</surname>
                            <given-names>ES</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bailey</surname>
                            <given-names>SC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hedlund</surname>
                            <given-names>LA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Different Measures, Different Outcomes? A Systematic Review of Performance-Based versus Self-Reported Measures of Health Literacy and Numeracy.</article-title>
                    <source>

                        <italic toggle="yes">J. Gen. Intern. Med.</italic>
</source>
                    <year>2015</year>;<volume>30</volume>(<issue>10</issue>):<fpage>1538</fpage>&#x2013;<lpage>1546</lpage>.
                    <pub-id pub-id-type="pmid">25917656</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s11606-015-3288-4</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4579206</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>S&#x00f8;rensen</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pelikan</surname>
                            <given-names>JM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>R&#x00f6;thlin</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Health literacy in Europe: comparative results of the European health literacy survey (HLS-EU).</article-title>
                    <source>

                        <italic toggle="yes">Eur. J. Pub. Health.</italic>
</source>
                    <year>2015</year>;<volume>25</volume>(<issue>6</issue>):<fpage>1053</fpage>&#x2013;<lpage>1058</lpage>.
                    <pub-id pub-id-type="pmid">25843827</pub-id>
                    <pub-id pub-id-type="doi">10.1093/eurpub/ckv043</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4668324</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <data-title>Critical thinking about treatment effects in Eastern Africa. Data set uncoded.</data-title>Data set.
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7680780</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>35</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Semakula</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oxman</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <data-title>Study protocol: Assessment of validity and reliability of a questionnaire based on the Claim Evaluation Tools Item bank in Uganda, Kenya and Rwanda.</data-title>Dataset.
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7680616</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>36</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Critical thinking about treatment effects in Eastern Africa. The Critical Thinking about Health test (before Rasch analysis).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7756037</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>37</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Critical thinking about treatment effects in Eastern Africa. The Critical Thinking about Health test.</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7680606</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <label>38</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dahlgren</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Critical thinking about treatment effects in Eastern Africa: development and Rasch analysis of an assessment tool. STROBE checklist.</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7680586</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report244064">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.144949.r244064</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Ezinne</surname>
                        <given-names>Ngozika Esther</given-names>
                    </name>
                    <xref ref-type="aff" rid="r244064a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3138-0213</uri>
                </contrib>
                <aff id="r244064a1">
                    <label>1</label>University of the West Indies, Saint Augustine, Trinidad and Tobago</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>3</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Ezinne NE</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport244064" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.132052.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Dear Editor,</p>
            <p>Thank you for the opportunity to review this study. It was interesting to read and contains relevant information that can add to the literature. My comments are below.</p>
            <p>Introduction</p>
            <p>There was no clear justification for the need for the study.</p>
            <p>The aim of the study was not clearly stated.</p>
            <p> Method&#x00a0;</p>
            <p> The study failed to use method that is reproducible.</p>
            <p> Variations in the method used in recruiting participants and the population studied could create a bias that will make the findings from the study unreliable.&#x00a0;&#x00a0;</p>
            <p> Method used in checking for fitness of the tool assessed was not well specified.&#x00a0;</p>
            <p> Measures taken to ensure reliability of the tool were not clear.&#x00a0;</p>
            <p> The study failed to provide information on how response dependency was checked and controlled.&#x00a0;</p>
            <p> How dimensionality was measured was not clear.</p>
            <p> Scale of difficulty ranking was not provided.&#x00a0;</p>
            <p> It is not clear if the findings from the study can be generalized since only three Eastern African countries were included in the study.</p>
            <p> </p>
            <p> Overall, the study findings are relevant and can add to the existing literature. However, it is not suitable in its current form. I therefore recommend major revision.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>I cannot comment. A qualified statistician is required.</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>No</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Eye, Ocular surface diseases, optometry, ophthalmology and eye health</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment15208-244064">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Dahlgren</surname>
                            <given-names>Astrid</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>I have no competing interests to declare</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>6</day>
                    <month>1</month>
                    <year>2026</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you for this opportunity to revise our manuscript. We have done our best to address the reviewer&#x2019;s concerns and have made revisions where necessary.</p>
                <p> With best wishes,</p>
                <p> Astrid Dahlgren</p>
                <p> &#x00a0; 
                    <list list-type="order">
                        <list-item>
                            <p>
                                <bold>Introduction</bold>
                            </p>
                        </list-item>
                    </list> There was no clear justification for the need for the study. The aim of the study was not clearly stated.</p>
                <p> 
                    <underline>Authors&#x2019; feedback</underline>: We believe we described the justification and aim in the introduction of the original manuscript, where we state that the instrument was needed for assessing outcomes of three randomized controlled trials in Uganda, Kenya and Rwanda, and that the objective of the study was &#x201c;to develop and evaluate the psychometric properties of this new assessment tool&#x201d;. However, to improve clarity we have added the following paragraph:</p>
                <p> 
                    <italic>Planning for the three randomised controlled trials evaluating educational interventions in Uganda, Kenya and Rwanda, there was a need for developing an instrument with acceptable psychometric properties assessing the learning objectives for these trials. This was done by selecting items from the Claim Evaluation Tools item bank. In this paper, we describe this process where the objective was to develop and evaluate the psychometric properties of the new assessment tool using Rasch analysis for use in Uganda, Kenya, and Rwanda. &#x00a0;</italic>
                </p>
                <p> &#x00a0; 
                    <list list-type="order">
                        <list-item>
                            <p>
                                <bold>Method </bold>
                            </p>
                        </list-item>
                    </list> 
                    <list list-type="order">
                        <list-item>
                            <p>The study failed to use a method that is reproducible. Variations in the method used in recruiting participants and the population studied could create a bias that will make the findings from the study unreliable.&#x00a0;</p>
                        </list-item>
                    </list> </p>
                <p> 
                    <underline>Authors response:</underline> In this study we followed the fundamental steps of Rasch analysis, including testing for internal construct validity (multidimensionality), invariance of the items (Item-Person Interaction), and item bias (differential item functioning), as well as testing for reliability. We have published a protocol and provided all underlying data through a referenced data repository, so we are unfortunately unclear about what the reviewer finds unsatisfactory. Regarding the recruitment of participants, a representative sample is not necessary when evaluating the psychometric properties of an instrument using Rasch analysis. Instead, as we state in the original manuscript under &#x201c;inclusion criteria&#x201d;:
                    &#x201c;<italic>However, the sample should include enough people to allow for evaluating differential functioning and a spread in ability</italic>&#x201d;. Our approach to this, including the consideration of sample size and the choice of factors to include in the DIF analysis, is described in detail in the same paragraph and is based on the methodological literature referenced there (see references 16-18 and 19-20). Signs of error in data collection that would affect the analysis or our conclusions would be, for example, a large proportion of missing responses or extreme person values; however, none of these were present.</p>
                <p> &#x00a0; 
                    <list list-type="order">
                        <list-item>
                            <p>Method used in checking for fitness of the tool assessed was not well specified.</p>
                        </list-item>
                    </list> 
                    <underline>Authors response:</underline> We are a bit unsure what the reviewer means by &#x201c;fitness&#x201d; of the tool in terms of Rasch methodology; however, this expression is generally used when considering 
                    <italic>suitability</italic>. If this interpretation is correct, suitability can be considered in terms of both the validity and the reliability of the instrument, as we would not regard a tool as suitable if it had poor psychometric properties. As stated in the discussion and conclusion, we conclude that the two item sets had &#x201c;satisfactory measurement properties&#x201d;.</p>
                <p> Regarding suitability for the population we planned to include in upcoming randomized controlled trials, our analysis of targeting, as well as of item and person fit, provides useful information. As we report in the &#x201c;summary statistics&#x201d; section under &#x201c;results&#x201d;, both item sets were well targeted. Analysis of person fit can indicate whether a test is 
                    <italic>acceptable</italic> to the target audience, which may affect suitability, for example by revealing signs of respondents guessing. Out of 1671 responses, only 3 people had extreme values. This provides convincing evidence of the suitability of the instrument for the target group.</p>
                <p> To improve precision and make the suitability of the two item-sets clearer, we have added the following sentence to the abstract and conclusion: Overall, the two item-sets were found to have satisfactory measurement properties. Based on our analysis and subsequent refinement, we consider these instruments to be suitable for our target audiences in Uganda, Kenya and Rwanda.</p>
                <p> &#x00a0; 
                    <list list-type="order">
                        <list-item>
                            <p>Measures taken to ensure reliability of the tool were not clear.</p>
                        </list-item>
                    </list> <underline>Author feedback:</underline> Under &#x201c;Rasch analysis&#x201d; in the methods section we describe that we used RUMM2030 for our analysis (reference 22), followed the principal steps of Rasch analysis (reference 21), and largely based our approach on the protocol by the Psylab Group (reference 22). We state that reliability was assessed by calculating Cronbach&#x2019;s alpha: <italic>We calculated Cronbach&#x2019;s alpha to assess the reliability of both item-sets by removing missing data. A Cronbach&#x2019;s alpha above 0.7 was considered acceptable. 22</italic> 
                    <list list-type="order">
                        <list-item>
                            <p>The study failed to provide information on how response dependency was checked and controlled. How dimensionality was measured was not clear.</p>
                        </list-item>
                    </list> 
                    <underline>Author feedback:</underline> Under &#x201c;Rasch analysis&#x201d; in the methods section we describe that we used RUMM2030 for our analysis (reference 22), followed the principal steps of Rasch analysis (reference 21), and largely based our approach on the protocol by the Psylab Group (reference 22). This methodological literature, which we have referenced, describes how to explore unidimensionality and local dependency, and our approach has already been described in the original manuscript:</p>
                <p> 
                    <italic>The principal component analysis/t-test protocol is used to test the hypothesis of unidimensionality. This is done by identifying the two most divergent item subsets (using the residual principal component function in RUMM2030), and then calculating t-tests. 22 If &#x2264;5% of tests are significant, strict unidimensionality can be inferred. 24 However, the concept of &#x2018;unidimensionality&#x2019; is not &#x2018;definite&#x2019; but relative and should be supplemented with quantitative or qualitative interpretation of the explicit variable definition and considering the context and purpose of the measurement. 24, 25</italic>
                </p>
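                <p>As an illustration only, the 5% decision rule quoted above can be sketched in Python. The sketch assumes that each person has two ability estimates with standard errors, one from each of the two item subsets, and that each comparison is a t-value computed as the difference between the estimates divided by the square root of the summed squared standard errors, judged against a critical value of 1.96. The function name, the input layout and the example values are hypothetical; this is neither RUMM2030 output nor the analysis code used in the study.</p>
                <code language="python" xml:space="preserve">
```python
from math import sqrt


def share_of_significant_t_tests(estimates_a, estimates_b, critical_value=1.96):
    """Share of persons whose two subset-based estimates differ significantly.

    Each argument is a list of (estimate, standard_error) tuples, one per
    person, with persons in the same order in both lists.
    """
    significant = 0
    for (theta_a, se_a), (theta_b, se_b) in zip(estimates_a, estimates_b):
        # Assumed form of the per-person test: difference over combined error.
        t_value = (theta_a - theta_b) / sqrt(se_a ** 2 + se_b ** 2)
        if abs(t_value) > critical_value:
            significant += 1
    return significant / len(estimates_a)


# Hypothetical estimates for four persons (not study data).
subset_a = [(0.5, 0.4), (1.2, 0.5), (-0.3, 0.4), (2.0, 0.5)]
subset_b = [(0.6, 0.4), (1.0, 0.5), (-0.2, 0.4), (0.2, 0.5)]

share = share_of_significant_t_tests(subset_a, subset_b)
# Strict unidimensionality is inferred only if at most 5% of tests are significant.
strictly_unidimensional = 0.05 >= share
print(share, strictly_unidimensional)
```
</code>
                <p>With real data the two lists would hold one entry per respondent, so the share is far less sensitive to a single person than in this four-person toy example.</p>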
                <p> 
                    <italic>We tested local dependency by using the residual correlations function in RUMM2030. Data from this output was copied into Excel (version 2208) and any residual correlation greater than 0.2 above the average was considered a potentially problematic dependency. 22</italic>
                </p>
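                <p>The screening rule for local dependency described above can be illustrated with a short Python sketch. It assumes the residual correlations are available as a square symmetric matrix, averages the off-diagonal values, and flags item pairs whose correlation exceeds that average by more than 0.2. The function name and the example matrix are hypothetical and do not come from the study data.</p>
                <code language="python" xml:space="preserve">
```python
from itertools import combinations
from statistics import mean


def flag_dependent_pairs(residual_corr, margin=0.2):
    """Return (cutoff, flagged pairs) for a symmetric residual correlation matrix.

    residual_corr: square list of lists of item residual correlations.
    margin: distance above the average off-diagonal correlation that
        triggers a flag.
    """
    n_items = len(residual_corr)
    pairs = list(combinations(range(n_items), 2))
    # Average only the off-diagonal entries; the diagonal is always 1.
    average = mean(residual_corr[i][j] for i, j in pairs)
    cutoff = average + margin
    flagged = [
        (i, j, residual_corr[i][j])
        for i, j in pairs
        if residual_corr[i][j] > cutoff
    ]
    return cutoff, flagged


# Hypothetical residual correlations for four items (not study data).
example = [
    [1.00, -0.10, 0.35, -0.05],
    [-0.10, 1.00, -0.15, -0.20],
    [0.35, -0.15, 1.00, -0.09],
    [-0.05, -0.20, -0.09, 1.00],
]

cutoff, flagged = flag_dependent_pairs(example)
print(round(cutoff, 3), flagged)
```
</code>
                <p>In this toy matrix only the pair of the first and third items is flagged, which mirrors the kind of manual check carried out on the exported residual correlations.</p>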
                <p> </p>
                <p> c)&#x00a0; Scale of difficulty ranking was not provided.</p>
                <p> 
                    <underline>Author feedback:</underline> We have provided person-item distribution maps in a supplemental file in Zenodo.
                </p>
                <p> The person-item threshold map visually compares where people&#x2019;s abilities lie with where the items&#x2019; difficulties (thresholds) are located on the same continuum. It thus provides information about the targeting of the scale as well as the difficulty of the items. 
                    <list list-type="order">
                        <list-item>
                            <p>It is not clear if the findings from the study can be generalized since only three Eastern African countries were included in the study.</p>
                        </list-item>
                    </list> 
                    <underline>Author feedback:</underline> The item-sets should not be used in other country settings without checking for DIF by context, and preferably redoing the Rasch analysis, as the psychometric properties we describe here are specific to the country contexts in which we tested them. That being said, we have successfully translated and tested instruments based on the Claim Evaluation Tools in a range of countries with positive results, and the analysis is easily redone as part of a cross-sectional study if one should seek to apply the instrument resulting from this analysis in other contexts.</p>
                <p> In the abstract and conclusion, we underline the intended use of the item-sets and their suitability for the populations in our target groups in Uganda, Kenya and Rwanda.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
