<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.173732.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>A Decade of the Turkish Pediatric Endocrinology Subspecialty Board Examination: Structure, Outcomes, and Candidate Perspectives</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>&#x00c7;ali&#x015f;kan</surname>
                        <given-names>S. Ayhan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-9714-6249</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Demir</surname>
                        <given-names>Korcan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Darcan</surname>
                        <given-names>&#x015e;&#x00fc;kran</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>ANIK</surname>
                        <given-names>Ahmet</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Darendeliler</surname>
                        <given-names>Feyza</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <collab>Turkish Board of Pediatric Endocrinology Exam Committee
                        <contrib-group>
                            <contrib>
                                <name>
                                    <surname>B&#x00d6;BER</surname>
                                    <given-names>Ece</given-names>
                                </name>
                            </contrib>
                            <contrib>
                                <name>
                                    <surname>D&#x00d6;NERAY</surname>
                                    <given-names>Hakan</given-names>
                                </name>
                            </contrib>
                            <contrib>
                                <name>
                                    <surname>SAVA&#x015e; ERDEVE</surname>
                                    <given-names>&#x015e;enay</given-names>
                                </name>
                            </contrib>
                            <contrib>
                                <name>
                                    <surname>G&#x00d6;K&#x015e;EN</surname>
                                    <given-names>Damla</given-names>
                                </name>
                            </contrib>
                            <contrib>
                                <name>
                                    <surname>T&#x00dc;T&#x00dc;NC&#x00dc;LER K&#x00d6;KENL&#x0130;</surname>
                                    <given-names>Filiz</given-names>
                                </name>
                            </contrib>
                            <contrib>
                                <name>
                                    <surname>&#x00d6;ZEN</surname>
                                    <given-names>Samim</given-names>
                                </name>
                            </contrib>
                            <contrib>
                                <name>
                                    <surname>&#x00d6;Z&#x00d6;N</surname>
                                    <given-names>Alev</given-names>
                                </name>
                            </contrib>
                            <contrib>
                                <name>
                                    <surname>T&#x00dc;RKKAHRAMAN</surname>
                                    <given-names>Do&#x011f;a</given-names>
                                </name>
                            </contrib>
                        </contrib-group>
                    </collab>
                </contrib>
                <aff id="a1">
                    <label>1</label>Medical Education, United Arab Emirates University College of Medicine and Health Sciences, Al Ain, Abu Dhabi, United Arab Emirates</aff>
                <aff id="a2">
                    <label>2</label>Pediatric Endocrinology, Dokuz Eyl&#x00fc;l University Faculty of Medicine, &#x0130;zmir, Turkey</aff>
                <aff id="a3">
                    <label>3</label>Pediatric Endocrinology, Ege University Faculty of Medicine, &#x0130;zmir, Turkey</aff>
                <aff id="a4">
                    <label>4</label>Pediatric Endocrinology, Adnan Menderes University Faculty of Medicine, Ayd&#x0131;n, Turkey</aff>
                <aff id="a5">
                    <label>5</label>Pediatric Endocrinology, Istanbul University Istanbul Faculty of Medicine, &#x0130;stanbul, Turkey</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:ayhanca@gmail.com">ayhanca@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>8</day>
                <month>12</month>
                <year>2025</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2025</year>
            </pub-date>
            <volume>14</volume>
            <elocation-id>1371</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>27</day>
                    <month>11</month>
                    <year>2025</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 &#x00c7;ali&#x015f;kan SA et al.</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/14-1371/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>This study presents a decade-long evaluation of the Turkish Pediatric Endocrinology Subspecialty Board Examination (TPEBE), focusing on its structural evolution, psychometric performance, and candidates&#x2019; perceptions.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>This cross-sectional study analyzed examination data from 2015&#x2013;2025, encompassing 263 examination sittings (261 included in the analysis) and post-exam survey responses from 217 participants. Examination metrics included mean scores, pass rates, and reliability coefficients (Cronbach&#x2019;s &#x03b1;) for the multiple-choice question (MCQ) and key feature problem (KFP) components. Survey items assessed perceptions of exam difficulty, fairness, relevance, and organization using a 9-point Likert scale. Quantitative data were analyzed using descriptive and inferential statistics.</p>
                </sec>
                <sec>
                    <title>Results</title>
                    <p>Mean total scores declined following the 2019 inclusion of KFPs (x&#x0304;=52.9&#x00b1;9.05 in 2025), while reliability improved progressively (MCQ &#x03b1;=0.53&#x2013;0.90; KFP &#x03b1;=0.31&#x2013;0.85). Pass rates varied from 22.6% to 85.0%. Male candidates scored higher on MCQs and total scores (p&lt;0.05), but gender differences in KFP performance and overall pass rates were not statistically significant. Candidates rated the examination highly for organization (x&#x0304;=7.75&#x00b1;1.54) and clinical relevance of KFPs (x&#x0304;=7.35&#x00b1;1.59), though exam duration received the lowest satisfaction (x&#x0304;=5.14&#x00b1;2.97). Qualitative feedback emphasized the educational value of KFPs and recommended extended testing time.</p>
                </sec>
                <sec>
                    <title>Conclusions</title>
                    <p>Over ten years, the TPEBE has evolved into a psychometrically robust and educationally valuable certification process. The balanced integration of MCQs and KFPs has strengthened construct validity and candidate engagement. These examinations are expected to gain broader recognition by institutions and regulators as a benchmark for educational and professional achievement.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Graduate Medical Education</kwd>
                <kwd>Pediatric Endocrinology</kwd>
                <kwd>Subspecialty Training</kwd>
                <kwd>Board Certification</kwd>
                <kwd>Board Examination</kwd>
                <kwd>Medical Education</kwd>
                <kwd>Assessment</kwd>
                <kwd>Key Feature Problems</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec5" sec-type="intro">
            <title>Introduction</title>
            <p>Pediatric endocrinology was formally recognized as a subspecialty in T&#x00fc;rkiye in 1973, alongside pediatric metabolic diseases. It subsequently attained the status of an independent subspecialty in 2002.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>,
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> The establishment of the Turkish Society for Pediatric Endocrinology and Diabetes (TSPED) in 1994 marked a critical milestone in the institutional development of the field. Since its inception, TSPED has played a central role in advancing pediatric endocrinology and diabetes care through the promotion of professional collaboration, standard-setting initiatives, and a broad array of educational activities&#x2014;including conferences, workshops, and training programs&#x2014;designed to enhance the competencies of healthcare professionals.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
            </p>
            <p>In alignment with international practices for professional certification, the Turkish Pediatric Endocrinology Subspecialty Board Examination (TPEBE) was introduced in 2015. Administered by the Turkish Board of Pediatric Endocrinology Exam Committee (TBPEC) under TSPED, the TPEBE serves as a formal mechanism to evaluate the clinical knowledge and competencies of pediatric endocrinologists in T&#x00fc;rkiye. Board certification examinations are globally recognized as essential instruments for safeguarding the quality of clinical practice and maintaining high professional standards across medical specialties.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> Pediatric subspecialty board examinations are likewise key instruments of international competency assurance. The establishment of the TPEBE therefore represents a significant step in harmonizing Turkish pediatric endocrinology training and credentialing processes with international norms in medical education and assessment.</p>
            <p>During the first four years (2015&#x2013;2018), the examination consisted exclusively of multiple-choice questions (MCQs). MCQs are recognized as a highly effective tool for evaluating knowledge across broad content areas, as they facilitate extensive content coverage and contribute to strong content validity.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> This approach supports making valid inferences about the entire content domain. Furthermore, MCQs are extensively utilized because they offer high reliability and are easy to score, ensuring precision, uniformity, and efficiency.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> However, poorly designed MCQs may inadvertently target superficial content and fail to measure higher-order cognitive processes.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>,
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup>
            </p>
            <p>Consequently, the selection of item formats for any given assessment should be guided by a clear understanding of their respective strengths and limitations. A robust assessment strategy integrates diverse methods, each tailored to meet specific evaluative objectives.
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> While MCQs ensured coverage and reliability, concerns about assessing higher-order reasoning prompted the integration of new formats.</p>
            <p>In response to these considerations, the TPEBE incorporated key feature problems (KFPs) into its format starting in 2019, with the objective of enhancing the assessment of candidates&#x2019; clinical decision-making skills.
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> KFPs are designed to simulate real-life clinical scenarios that require the integration of complex data to make clinically meaningful decisions.
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>,
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> The format focuses on pivotal points in case management&#x2014;referred to as &#x201c;key features&#x201d;&#x2014;which represent the most essential and error-prone aspects of clinical problems.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>,
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup> Originally introduced at the Cambridge Conference in 1984, the key features format was adopted by the Medical Council of Canada in 1992 as part of the MCC Qualifying Examination (MCCQE) Part I. This innovation aimed to replace the older Patient Management Problems (PMPs) format and reduce the overreliance on MCQs in licensure examinations.
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>,
                    <xref ref-type="bibr" rid="ref12">12</xref>,
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup> The adoption of KFPs by the TPEBE reflects a parallel intent: to augment the assessment of clinical competence beyond the limits of traditional MCQs.</p>
            <p>This manuscript presents a comprehensive 10-year review of the Turkish Pediatric Endocrinology Subspecialty Board Examination, focusing on its structural evolution, aggregated examination outcomes, and candidates&#x2019; perspectives on the examination process.</p>
        </sec>
        <sec id="sec6" sec-type="methods">
            <title>Methods</title>
            <sec id="sec7">
                <title>Study design and setting</title>
                <p>This cross-sectional study analyzed data from the TPEBE conducted between 2015 and 2025. Examination records were combined with candidate survey responses collected immediately after each examination, except in the first year (2015), when no survey was administered. A total of ten examination sessions were included in the analysis; the 2020 session was cancelled due to the COVID-19 pandemic.</p>
            </sec>
            <sec id="sec8">
                <title>Participants</title>
                <p>Eligible candidates were pediatric endocrinologists and subspecialty residents who met the requirements defined by the TBPEC. Eligibility was verified through documentation review, and only approved applicants were permitted to sit for the exam. Across the study period, there were 263 examination sittings by 186 unique candidates, 58 of whom attempted the examination more than once. Of these sittings, 193 (73.4%) were by female candidates and 70 (26.6%) by male candidates.</p>
            </sec>
            <sec id="sec9">
                <title>Exam sets</title>
                <p>Exam questions were developed by faculty members from academic institutions across T&#x00fc;rkiye, each submitting items within their subspecialty areas to an online item bank organized by predefined subject categories. Depending on the year, seven or eight TBPEC members reviewed these items during face-to-face structured meetings, revised them as needed, and selected the final questions for each examination. All examinations were administered in paper&#x2013;pencil format, at a single venue, under TBPEC members&#x2019; supervision.</p>
                <p>Each exam set consisted of two sections:
                    <list list-type="order">
                        <list-item>
                            <label>1.</label>
                            <p>

                                <bold>MCQ section:</bold> In the first four years (2015&#x2013;2018), the exams consisted exclusively of MCQs: 75 items in 2015 and 100 items in 2016&#x2013;2018. The MCQ section was then reduced to 85 items in 2019 and to 80 items in 2021&#x2013;2025. All MCQs had five options with one correct answer.</p>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>

                                <bold>KFP section:</bold> Beginning in 2019, the exam included a second section of KFPs. Each exam contained five KFP cases, with 2&#x2013;4 items per case (13&#x2013;14 items in total). Most required short written responses, while only five were multiple-response items allowing more than one correct option.</p>
                        </list-item>
                    </list>
                </p>
                <p>The maximum achievable score for each examination was 100 points. To determine the cut scores, two standard-setting methods were employed: the Nedelsky method for MCQs (applied to all exams except 2015) and the Angoff method for KFPs, each selected for its suitability to the respective item format. For exams that included both MCQ and KFP sections, the final cut score represented the sum of the two section-specific cut scores. The evolution of the examination structure across the study period is summarized in 
                    <xref ref-type="table" rid="T1">
Table 1</xref>.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>
Table 1. </label>
                    <caption>
                        <title>Structure of the Turkish pediatric endocrinology subspecialty board examination by year, 2015&#x2013;2025.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Year(s)
                                    <xref ref-type="table-fn" rid="tfn1">*</xref>
                                </th>
                                <th align="left" colspan="1" rowspan="1" valign="top">MCQ items n</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">KFP cases (items) n</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Exam format</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2015</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">75</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="2" valign="top">MCQ</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2016&#x2013;2018</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2019</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">85</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5 cases (13&#x2013;14)</td>
                                <td align="left" colspan="1" rowspan="2" valign="top">MCQ + KFP</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2021&#x2013;2025</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">80</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5 cases (13&#x2013;14)</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn-group content-type="footnotes">
                            <fn id="tfn1">
                                <label>*</label>
                                <p>2020 exam was cancelled due to COVID-19.</p>
                            </fn>
                        </fn-group>
                    </table-wrap-foot>
                </table-wrap>
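                <p>As an illustrative sketch only, the textbook forms of the two standard-setting methods described above can be written in a few lines of Python. The judge ratings, the function names <monospace>nedelsky_cut</monospace> and <monospace>angoff_cut</monospace>, and the raw-point scale are hypothetical assumptions; the committee&#x2019;s actual ratings and the weighting used to reach the 100-point scale are not reported here.</p>
                <code language="python">
```python
from statistics import mean


def nedelsky_cut(remaining_options_by_judge):
    # remaining_options_by_judge[j][i] is the number of options (1 to 5 for a
    # five-option MCQ) that judge j thinks a borderline candidate could NOT
    # rule out on item i. The expected item score is the reciprocal.
    per_judge = [sum(1 / r for r in judge) for judge in remaining_options_by_judge]
    return mean(per_judge)


def angoff_cut(probabilities_by_judge):
    # probabilities_by_judge[j][i] is judge j's estimate of the probability
    # that a borderline candidate answers item i correctly.
    per_judge = [sum(judge) for judge in probabilities_by_judge]
    return mean(per_judge)


# Hypothetical ratings: two judges, four MCQ items, two KFP items.
mcq_ratings = [[2, 4, 1, 5], [2, 5, 2, 4]]
kfp_ratings = [[0.5, 0.75], [0.5, 0.25]]

mcq_cut = nedelsky_cut(mcq_ratings)
kfp_cut = angoff_cut(kfp_ratings)

# The final cut score is the sum of the two section-specific cut scores.
final_cut = mcq_cut + kfp_cut
print(round(mcq_cut, 2), round(kfp_cut, 2), round(final_cut, 2))
```
                </code>
                <p>In this sketch each judge&#x2019;s item-level expectations are summed and the judges&#x2019; totals are then averaged; averaging per item before summing gives the same result when every judge rates every item.</p>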
                <p>

                    <bold>Post-exam review:</bold> After each examination, board members reviewed the questions with the candidates and accepted appeals. After evaluating these appeals, the committee removed some MCQs from the exam sets: 4 items in 2016; 2 items each in 2017, 2018, and 2021; and 1 item each in 2024 and 2025. Omitted questions were scored as correct for all examinees.</p>
            </sec>
            <sec id="sec10">
                <title>Survey instrument</title>
                <p>The feedback survey was a paper-based questionnaire administered at the examination venue immediately after candidates completed the examination, in every year except 2015. The instrument comprised demographic items; 11 structured items assessing perceptions of exam difficulty, relevance, fairness, and organization, each rated on a 9-point Likert scale (1: Strongly disagree/Very poor, 9: Strongly agree/Very good); and open-ended questions on the most useful aspects, the least useful aspects, suggestions for improvement, and general comments. The survey was completed anonymously and voluntarily. Verbal informed consent was obtained from all participants prior to data collection; verbal rather than written consent was used because the survey was anonymous, posed minimal risk, and did not involve the collection of any identifying or sensitive personal information.</p>
            </sec>
            <sec id="sec11">
                <title>Data analysis</title>
                <p>Data were analyzed using IBM SPSS Statistics version 29.0 (IBM Corp., Armonk, NY, USA). The normality of continuous variables was examined using the Kolmogorov-Smirnov test, and continuous variables were presented as mean and standard deviation (x&#x0304; &#x00b1; SD). Comparisons between two groups were conducted using the independent-samples t-test or the Mann-Whitney U test, as appropriate. Categorical variables were presented as numbers and percentages, and relationships between them were examined using Pearson&#x2019;s chi-square test or Fisher&#x2019;s exact test. Cronbach&#x2019;s alpha coefficient was calculated to estimate the reliability of each test component. Statistical significance was set at p &lt; 0.05 (95% confidence level).</p>
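                <p>For readers less familiar with the reliability coefficient, the following minimal Python sketch shows the standard formula for Cronbach&#x2019;s alpha applied to a small, hypothetical score matrix. It is not the SPSS procedure used in this study; the function name <monospace>cronbach_alpha</monospace> and the demonstration data are assumptions made for illustration.</p>
                <code language="python">
```python
from statistics import variance


def cronbach_alpha(scores):
    # scores: one row per candidate, one column per item (hypothetical layout).
    n_items = len(scores[0])
    item_columns = list(zip(*scores))
    # Sample variance of each item across candidates.
    item_variances = [variance(column) for column in item_columns]
    # Sample variance of the candidates' total scores.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four candidates, three dichotomously scored items.
demo_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(demo_scores)
print(round(alpha, 2))
```
                </code>
                <p>The coefficient rises when the items covary strongly relative to their individual variances, which is why it is commonly read as an index of internal consistency.</p>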
            </sec>
        </sec>
        <sec id="sec12" sec-type="results">
            <title>Results</title>
            <sec id="sec13">
                <title>Exam performance</title>
                <p>Between 2015 and 2025, a total of 263 examination sittings (193 by women, 70 by men) were recorded. Two sittings in which the candidate did not respond to any exam items were excluded, leaving 261 sittings for statistical analysis.</p>
                <p>Mean total exam scores, cut scores, pass rates, and reliability coefficients (Cronbach&#x2019;s &#x03b1;) for both the MCQ and KFP components are summarized in 
                    <xref ref-type="table" rid="T2">
Table 2</xref>. Overall, mean total scores showed a gradual decline after 2018, with the lowest average observed in 2023 (x&#x0304; = 52.33 &#x00b1; 12.18). Pass rates fluctuated across years, ranging from 22.6% (2023) to 85.0% (2016). Reliability coefficients for the MCQ component remained moderate to high (&#x03b1; = 0.53&#x2013;0.90), while those for the KFP component improved over time, from 0.31 in 2021 to 0.85 in 2024.</p>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>
Table 2. </label>
                    <caption>
                        <title>Examination performance metrics by year (2015&#x2013;2025).</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Year
                                    <xref ref-type="table-fn" rid="tfn2">
                                        <sup>a</sup>
                                    </xref>
                                </th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Candidate n</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Exam total score Mean (SD)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Cut score (%)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Pass rate (%)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
MCQ
                                    <xref ref-type="table-fn" rid="tfn3">
                                        <sup>b</sup>
                                    </xref> (&#x03b1;)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
KFP
                                    <xref ref-type="table-fn" rid="tfn3">
                                        <sup>b</sup>
                                    </xref> (&#x03b1;)</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2015</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">26</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">70.45 (5.97)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">70.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">73.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.529</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2016</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">20</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">74.00 (7.06)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">65.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">85.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.686</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2017</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">26</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">64.04 (10.33)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">60.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">69.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.831</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2018</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">18</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">61.83 (13.65)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">60.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">61.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.899</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2019</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">22</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">57.64 (10.40)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">50.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">72.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.813</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.425</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2021</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">23</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">56.69 (8.43)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">60.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">39.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.677</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.312</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2022</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">29</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">57.98 (10.77)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">60.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">41.4</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.765</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.648</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2023</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">31</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">52.33 (12.18)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">60.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">22.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.854</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.711</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2024</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">29</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">53.99 (11.36)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">58.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">37.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.807</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.854</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2025</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">37</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">52.90 (9.05)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">58.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">29.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.675</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.662</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn-group content-type="footnotes">
                            <fn id="tfn2">
                                <label>
                                    <sup>a</sup>
                                </label>
                                <p>The 2020 exam was cancelled due to the COVID-19 pandemic.</p>
                            </fn>
                            <fn id="tfn3">
                                <label>
                                    <sup>b</sup>
                                </label>
                                <p>Cronbach&#x2019;s &#x03b1;.</p>
                            </fn>
                        </fn-group>
                    </table-wrap-foot>
                </table-wrap>
                <p>Male candidates achieved significantly higher mean MCQ and total exam scores than female candidates, whereas no statistically significant gender difference was observed for KFP scores (<xref ref-type="table" rid="T3">Table 3</xref>).</p>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>
Table 3. </label>
                    <caption>
                        <title>Comparison of examination scores by gender.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Score type</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Female (mean &#x00b1; SD)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Male (mean &#x00b1; SD)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">t (df)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
p
                                    <xref ref-type="table-fn" rid="tfn4">
                                        <sup>a</sup>
                                    </xref>
                                </th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">MCQ</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">52.28 &#x00b1; 12.49</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">57.90 &#x00b1; 16.08</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;2.65 (&#x2248;101)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.009</bold>
                                    <xref ref-type="table-fn" rid="tfn5">
                                        <sup>

                                            <bold>b</bold>
                                        </sup>
                                    </xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">KFP</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8.50 &#x00b1; 3.45</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8.89 &#x00b1; 3.23</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;0.60 (167)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.459
                                    <xref ref-type="table-fn" rid="tfn6">
                                        <sup>

                                            <bold>c</bold>
                                        </sup>
                                    </xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Exam Total</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">58.24 &#x00b1; 11.49</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">62.35 &#x00b1; 13.28</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;2.45 (259)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.015</bold>
                                    <xref ref-type="table-fn" rid="tfn5">
                                        <sup>

                                            <bold>b</bold>
                                        </sup>
                                    </xref>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn-group content-type="footnotes">
                            <fn id="tfn4">
                                <label>
                                    <sup>a</sup>
                                </label>
                                <p>Independent t-test.</p>
                            </fn>
                            <fn id="tfn5">
                                <label>
                                    <sup>b</sup>
                                </label>
                                <p>Statistically significant (p &lt; 0.05).</p>
                            </fn>
                            <fn id="tfn6">
                                <label>
                                    <sup>c</sup>
                                </label>
                                <p>Mann-Whitney U test.</p>
                            </fn>
                        </fn-group>
                    </table-wrap-foot>
                </table-wrap>
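                <p>As an illustrative check rather than a description of the authors' analysis, the t statistics in Table 3 can be approximated from the reported means and standard deviations. The sketch below assumes group sizes of 191 women and 70 men, which are not stated in the text (the two excluded candidates are assumed here to be women). It also assumes that the MCQ comparison used an unequal-variance (Welch) test, as the approximate df suggests, and that the total-score comparison used a pooled-variance test. The function names are hypothetical.</p>
                <code language="python">
```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    v1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's equal-variance t statistic and its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# ASSUMPTION: group sizes are not reported in the article text.
N_FEMALE, N_MALE = 191, 70

# Means and SDs as reported in Table 3 (female first, male second).
mcq_t, mcq_df = welch_t(52.28, 12.49, N_FEMALE, 57.90, 16.08, N_MALE)
total_t, total_df = pooled_t(58.24, 11.49, N_FEMALE, 62.35, 13.28, N_MALE)

print(f"MCQ:   t = {mcq_t:.2f}, df = {mcq_df:.0f}")
print(f"Total: t = {total_t:.2f}, df = {total_df}")
```
                </code>
                <p>Under these assumptions, the computed values agree with the tabulated t (df) entries for the MCQ and total scores. The KFP row is not reproduced because it is footnoted as a Mann-Whitney U test, which requires the individual scores.</p>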
                <p>Throughout the study period, male examinees achieved an overall pass rate of 60.0%, compared with 46.6% among female examinees. This difference was not statistically significant (&#x03c7;
                    <sup>2</sup> = 3.68, p = 0.055).</p>
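                <p>The pass-rate comparison can be sketched as a Pearson chi-square test on a 2 &#x00d7; 2 table. The counts below are assumptions reconstructed from the reported percentages (42 of 70 men and 89 of 191 women passing), not figures given in the text, and the function name is hypothetical. No continuity correction is applied.</p>
                <code language="python">
```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# ASSUMPTION: counts inferred from the reported pass rates (60.0% and 46.6%).
male_pass, male_fail = 42, 28
female_pass, female_fail = 89, 102

chi2, p = pearson_chi2_2x2(male_pass, male_fail, female_pass, female_fail)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```
                </code>
                <p>With these assumed counts, the statistic and p-value round to the reported values, which sit just above the conventional 0.05 threshold.</p>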
            </sec>
            <sec id="sec14">
                <title>Candidate feedback</title>
                <p>Of the 235 examinees across the 2016&#x2013;2025 examination years, 217 (92.3%) completed the post-examination feedback questionnaire. Mean (SD) ratings for each structured statement, by examination year and overall, are presented in 
                    <xref ref-type="table" rid="T4">
Table 4</xref>.</p>
                <table-wrap id="T4" orientation="portrait" position="float">
                    <label>
Table 4. </label>
                    <caption>
                        <title>Candidate evaluation of examination components by year: Mean (SD) Scores.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top"/>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2016 (n = 19)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2017 (n = 25)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2018 (n = 18)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2019 (n = 19)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2021 (n = 22)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2022 (n = 27)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2023 (n = 30)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2024 (n = 23)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
2025 (n = 34)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Overall (n = 217)</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">

                                    <bold>Item</bold>
</td>
                                <td align="left" colspan="10" rowspan="1" valign="top">
                                    <bold>Mean (SD)</bold>
</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">1. The MCQs were difficult.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.22 (1.93)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.76 (1.83)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.11 (1.49)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.00 (1.87)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.71 (1.71)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.33 (1.62)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.79 (1.54)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.59 (1.89)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.24 (1.33)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.36 (1.77)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2. The KFP questions were difficult.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4.59 (1.58)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.38 (1.60)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.27 (1.78)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.86 (2.03)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.67 (1.85)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.21 (1.57)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.01 (1.90)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">3. The KFP questions were consistent with my current clinical practice.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.41 (1.33)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.48 (1.29)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.96 (2.01)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.34 (1.65)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.72 (1.13)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.35 (1.72)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.35 (1.59)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">4. I believe the KFP questions measured my clinical problem-solving skills.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.47 (1.33)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.29 (1.93)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.92 (1.98)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.24 (1.81)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.42 (1.22)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.00 (1.79)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.18 (1.72)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">5. I appreciated the inclusion of KFP questions in the exam.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.33 (2.06)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.36 (2.55)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.06 (2.80)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.29 (1.83)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.48 (2.38)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.46 (1.63)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.52 (2.37)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.35 (2.16)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.88 (2.11)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.31 (2.09)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">6. The exam duration was sufficient.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.22 (1.96)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.24 (1.74)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.67 (1.50)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.71 (2.85)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.14 (2.85)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.11 (2.81)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4.89 (3.26)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3.82 (2.61)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3.91 (3.18)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.14 (2.97)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">7. The exam was well-organized.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.44 (1.85)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.12 (1.64)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.28 (2.05)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.82 (2.24)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8.24 (0.94)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8.11 (1.15)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8.07 (1.19)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">8.23 (1.11)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.76 (1.60)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.75 (1.54)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">8. The exam was designed to effectively differentiate between knowledgeable and less knowledgeable candidates.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.67 (1.64)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.56 (1.76)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.33 (1.68)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.00 (1.70)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.33 (2.01)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.37 (1.69)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.24 (1.30)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.91 (1.11)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.26 (1.76)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.24 (1.68)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">9. The distribution of questions across topics was balanced throughout the exam.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.59 (1.54)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.40 (1.58)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.78 (1.90)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.88 (1.54)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.95 (1.53)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.41 (1.95)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.17 (1.71)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.55 (1.01)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.26 (1.78)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.11 (1.66)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">10. The exam was suitable for pediatric endocrinology subspecialty qualification assessment.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.88 (1.69)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5.84 (2.29)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.44 (1.62)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.00 (1.70)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.52 (2.09)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.11 (1.87)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.31 (1.39)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.41 (1.30)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.47 (1.56)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.00 (1.67)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">11. The exam content was consistent with the scope of my pediatric endocrinology subspecialty training.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2013;</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.12 (2.03)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6.81 (1.99)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.30 (1.84)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.52 (1.50)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.64 (1.09)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.56 (1.44)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7.15 (1.79)</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Overall, participants rated the MCQ section as slightly more difficult (x&#x0304; = 6.36 &#x00b1; 1.77) than the KFP section (x&#x0304; = 6.01 &#x00b1; 1.90). Most respondents agreed that the KFP questions reflected real clinical practice and effectively assessed problem-solving ability. The inclusion of KFPs received high satisfaction scores, indicating broad approval of this format.</p>
                <p>Among all items, exam organization achieved the highest overall average rating (x&#x0304; = 7.75 &#x00b1; 1.54), suggesting that participants were highly satisfied with the administration and logistical arrangements. Conversely, the adequacy of exam duration received the lowest rating (x&#x0304; = 5.14 &#x00b1; 2.97), representing a level only slightly above neutrality on the satisfaction scale and highlighting persistent time-related concerns.</p>
                <p>Open-ended feedback from the examinations revealed diverse yet coherent themes reflecting participants&#x2019; evaluation of exam quality, content relevance, and organizational logistics. Respondents appreciated the exam&#x2019;s comprehensive coverage across all topics and its alignment with clinical practice, particularly valuing the case-based (KFP) section for fostering problem-solving and reflective learning. Several participants noted that the test effectively highlighted their knowledge gaps and motivated further study.</p>
                <p>Conversely, time constraints emerged as the most prominent source of dissatisfaction. Participants described the allotted duration as insufficient, citing long and complex questions that limited their ability to complete the exam. Suggestions included extending the time or dividing the test into two parts.</p>
                <p>Regarding content, opinions were mixed. While most found the questions well prepared, a few perceived the MCQs as overly difficult or focused on unnecessary details&#x2014;particularly in genetic and metabolic topics. Participants recommended increasing the proportion of clinical and case-based questions and providing post-exam answer booklets to enhance learning. Despite the stress associated with the process, many expressed gratitude to the organizing committee and acknowledged the exam&#x2019;s educational value in guiding professional self-assessment and development.</p>
            </sec>
        </sec>
        <sec id="sec15" sec-type="discussion">
            <title>Discussion</title>
            <p>This 10-year evaluation of the TPEBE provides the first comprehensive overview of its evolution, psychometric performance, and candidates&#x2019; perceptions since its inception in 2015. The findings reveal steady structural refinement and improving reliability, particularly following the integration of KFPs in 2019. The transition from a purely MCQ format to a mixed MCQ&#x2013;KFP design has progressively strengthened the exam&#x2019;s ability to assess higher-order clinical reasoning while maintaining fairness and organizational quality. Pass rates fluctuated over time, likely reflecting the combined influence of item difficulty calibration, evolving training quality, and candidate preparedness. Importantly, reliability coefficients for both sections improved steadily, indicating the maturation of item-writing processes and enhanced internal consistency.</p>
            <p>Throughout the decade, mean total scores and pass rates varied between exam years, with notably higher performance during the early years and lower outcomes after 2021. Several factors may explain this trend. The early sessions (2015&#x2013;2018) consisted exclusively of MCQs, a format known for high reliability and broad content coverage but limited depth in assessing clinical judgment.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>,
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> As the examination evolved to include KFPs, which emphasize reasoning and decision-making, overall scores declined&#x2014;an expected phenomenon also observed in other board examinations that introduced performance-oriented formats.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>,
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> The temporary disruption of training schedules and reduced clinical exposure during the COVID-19 pandemic may have further contributed to decreased performance in 2021&#x2013;2023, a concern also reported in other studies of postgraduate assessment.
                <sup>
                    <xref ref-type="bibr" rid="ref15">15</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup>
            </p>
            <p>The introduction of KFPs in 2019 represents a major pedagogical advancement in the TPEBE. Designed to target &#x201c;key decision points&#x201d; in clinical management, KFPs enable a more valid assessment of problem-solving and integrative reasoning than MCQs alone.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>,
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup> The progressive increase in KFP reliability&#x2014;from &#x03b1; = 0.312 in 2021 to &#x03b1; = 0.854 in 2024&#x2014;indicates improved case design, examiner calibration, and standardization procedures. This pattern suggests growing psychometric robustness and aligns with findings from international studies reporting that mixed-format examinations achieve better construct validity.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>,
                    <xref ref-type="bibr" rid="ref8">8</xref>,
                    <xref ref-type="bibr" rid="ref19">19</xref>,
                    <xref ref-type="bibr" rid="ref20">20</xref>
                </sup> The balanced use of two complementary item types&#x2014;MCQs for breadth and KFPs for depth&#x2014;reflects an evidence-informed approach to assessment design consistent with contemporary medical education principles.
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>,
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup>
            </p>
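            <p>For reference, and assuming the reported &#x03b1; values are Cronbach&#x2019;s alpha (an assumption here, since this section does not restate the estimator), the internal consistency of a section with <italic>k</italic> items can be written as
                <disp-formula>
                    <tex-math><![CDATA[\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{i}^{2}}{\sigma_{X}^{2}}\right)]]></tex-math>
                </disp-formula>
                where &#x03c3;<sub>i</sub><sup>2</sup> denotes the variance of scores on item <italic>i</italic> and &#x03c3;<sub>X</sub><sup>2</sup> the variance of candidates&#x2019; total section scores. Under this formulation, &#x03b1; increases as items covary more strongly relative to their individual variances, which is compatible with the interpretation that more consistent case design and scoring contributed to the higher KFP coefficients.
            </p>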
            <p>In this study, although male candidates achieved slightly higher MCQ and total scores, gender differences in pass rates were not statistically significant. Comparable findings have been reported in other nationwide postgraduate assessments, with several studies showing no overall gender differences in examination outcomes, and in some cases, female candidates performing significantly better in certain domains or clinical components.
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>,
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> The observed score gap may reflect differential exposure to standardized testing environments or self-perceived exam confidence. Importantly, the absence of significant disparities in KFP performance suggests that case-based, reasoning-oriented formats may reduce gender-related score variance, supporting their inclusion as a more equitable assessment component.</p>
            <p>Candidates&#x2019; feedback provides valuable qualitative insight into the examination&#x2019;s educational value and perceived validity. The majority of participants rated the exam highly for its organization, content balance, and reflection of real clinical practice. The strong endorsement of the KFP format underscores its perceived authenticity and relevance to day-to-day decision-making in pediatric endocrinology. Participants consistently reported that the exam highlighted their knowledge gaps and motivated targeted self-study&#x2014;confirming the dual function of certification assessments as both evaluative and formative instruments. These findings echo international reports emphasizing the educational impact of well-designed board examinations in promoting reflective practice and continuing professional development.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>,
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup>
            </p>
            <p>After the introduction of KFPs, the most consistent area of dissatisfaction concerned the perceived inadequacy of examination duration (x&#x0304; = 5.14 &#x00b1; 2.97). The relatively large standard deviation indicates considerable variability in respondents&#x2019; perceptions, suggesting diverse experiences or expectations regarding the sufficiency of the allotted time. This concern likely arises from the inherent cognitive load of the KFP section, which requires interpretive reasoning and the formulation of written responses. Similar challenges have been reported in previous studies, noting that constructing answers under open-ended conditions is cognitively demanding and time-consuming, as it involves articulating reasoning and justifying decisions rather than merely recognizing correct options.
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>,
                    <xref ref-type="bibr" rid="ref25">25</xref>,
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup> Addressing time constraints&#x2014;either through adaptive scheduling or improved question pacing&#x2014;could enhance candidate experience without compromising assessment validity.</p>
            <p>The TBPEC&#x2019;s iterative review of item performance and appeals, coupled with scoring adjustments for omitted questions, reflects a transparent and learner-centered quality assurance framework. The continuous monitoring of reliability and cut-score consistency over time has strengthened the exam&#x2019;s credibility and accountability. Looking ahead, the transition to digital or hybrid exam delivery may offer opportunities for enhanced item analysis, automated scoring, and secure remote administration, aligning the TPEBE with global advancements in high-stakes assessment.</p>
            <sec id="sec16">
                <title>Limitations and future directions</title>
                <p>This analysis has some limitations. The cross-sectional design precludes longitudinal tracking of individual progress or causal inference regarding changes in performance. Survey data were self-reported and may be influenced by response bias. Furthermore, the absence of pre-pandemic comparator data limits the interpretation of COVID-19&#x2013;related effects. Future research could explore predictive validity (e.g., relationship between board scores and subsequent clinical performance), longitudinal reliability, and the psychometric behavior of KFP items across specialties.</p>
            </sec>
        </sec>
        <sec id="sec17" sec-type="conclusion">
            <title>Conclusion</title>
            <p>Over the past decade, the TPEBE has evolved from a traditional knowledge-based test into a multidimensional assessment aligned with international best practices in medical education. The integration of KFPs, progressive enhancement of reliability indices, and strong candidate endorsement collectively indicate a maturing and credible certification process. Continued investment in psychometric monitoring, examiner training, and technological innovation will further enhance the examination&#x2019;s role in ensuring clinical competence and fostering excellence in pediatric endocrinology practice in T&#x00fc;rkiye. It is anticipated that these examinations will be increasingly recognized and utilized by educational institutions and regulatory authorities as a benchmark for achievement in both education and professional practice.</p>
        </sec>
        <sec id="sec18">
            <title>Ethical considerations</title>
            <p>The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ege University Faculty of Medicine Medical Research Ethics Board (reference no. 23-5.1T/46; date: 25.05.2023). Verbal informed consent was obtained from all participants prior to data collection. Verbal rather than written consent was used because the survey was anonymous, posed minimal risk, and did not involve the collection of any identifying or sensitive personal information.</p>
        </sec>
    </body>
    <back>
        <sec id="sec22" sec-type="data-availability">
            <title>Data availability statement</title>
            <p>figshare: A Decade of the Turkish Pediatric Endocrinology Subspecialty Board Examination: Structure, Outcomes, and Candidate Perspectives.</p>
            <p>Dataset: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.30628319">https://doi.org/10.6084/m9.figshare.30628319</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref27">27</xref>
                </sup>
            </p>
            <p>This project contains the following data:
                <list list-type="bullet">
                    <list-item>
                        <label>&#x2022;</label>
                        <p>Data File: TPEBE-Data-2015-2025.xlsx</p>
                    </list-item>
                </list>
            </p>
            <p>Data are available under the terms of the 
                <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/deed.en">CC BY 4.0</ext-link> license.
            </p>
        </sec>
        <ack>
            <title>Acknowledgements</title>
            <p>We express our sincere appreciation to the TPEBEC members&#x2014;Ece B&#x00f6;ber, Hakan D&#x00f6;neray, &#x015e;enay Sava&#x015f; Erdeve, Damla G&#x00f6;k&#x015f;en, Filiz T&#x00fc;t&#x00fc;nc&#x00fc;ler K&#x00f6;kenli, Samim &#x00d6;zen, Alev &#x00d6;z&#x00f6;n, and Do&#x011f;a T&#x00fc;rkkahraman&#x2014;for their invaluable efforts and contributions throughout the entire TPEBE process.</p>
            <p>We are also grateful to our academic colleagues who developed and submitted the MCQ and KFP items for the examination, whose contributions were essential to the success of this work. In addition, we extend our heartfelt thanks to all TPEBE participants for their engagement in the examination and for their time, commitment, and willingness to contribute to this study.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="other">
                    <article-title>T&#x00fc;rkiye Cumhuriyeti T&#x0131;pta Uzmanl&#x0131;k T&#x00fc;z&#x00fc;&#x011f;&#x00fc; - Republic of T&#x00fc;rkiye Medical Specialization Regulation. </article-title>
                    <year>2002</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://dosyamerkez.saglik.gov.tr/Eklenti/13286/0/24790pdf.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="other">
                    <article-title>T&#x00fc;rkiye Cumhuriyeti Tababet Uzmanl&#x0131;k T&#x00fc;z&#x00fc;&#x011f;&#x00fc; - Republic of T&#x00fc;rkiye Medical Specialization Regulation.</article-title>
                    <year>1973</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://dosyamerkez.saglik.gov.tr/Eklenti/14666/0/1973pdf.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="other">
                    <article-title>Tarih&#x00e7;e &#x2013; &#x00c7;ocuk Endokrinolojisi ve Diyabet Derne&#x011f;i. </article-title>Accessed July 24, 2025.
                    <ext-link ext-link-type="uri" xlink:href="https://cocukendokrindiyabet.org/dernegimiz/tarihce/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Staudenmann</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Waldner</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>L&#x00f6;rwald</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Medical specialty certification exams studied according to the Ottawa Quality Criteria: a systematic review.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med. Educ.</italic>
</source>
                    <year>2023</year>;<volume>23</volume>(<issue>1</issue>):<fpage>619</fpage>&#x2013;<lpage>620</lpage>.
                    <pub-id pub-id-type="pmid">37649019</pub-id>
                    <pub-id pub-id-type="doi">10.1186/S12909-023-04600-X</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10466740</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shumway</surname>
                            <given-names>JM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Harden</surname>
                            <given-names>RM</given-names>
                        </name>
</person-group>:
                    <article-title>Medical Teacher AMEE Guide No. 25: The assessment of learning outcomes for the competent and reflective physician.</article-title>
                    <year>2009</year>.
                    <pub-id pub-id-type="doi">10.1080/0142159032000151907</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yudkowsky</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Park</surname>
                            <given-names>YS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Downing</surname>
                            <given-names>SM</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Assessment in Health Professions Education.</italic>
</source>
                    <publisher-name>Routledge</publisher-name>;<year>2020</year>. Accessed December 20, 2024.
                    <ext-link ext-link-type="uri" xlink:href="https://www.routledge.com/Assessment-in-Health-Professions-Education/Yudkowsky-Park-Downing/p/book/9781315166902">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Renes</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vleuten</surname>
                            <given-names>CPM</given-names>
                            <prefix>van der</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Collares</surname>
                            <given-names>CF</given-names>
                        </name>
</person-group>:
                    <article-title>Utility of a multimodal computer-based assessment format for assessment with a higher degree of reliability and validity.</article-title>
                    <source>

                        <italic toggle="yes">Med. Teach.</italic>
</source>
                    <year>2023</year>;<volume>45</volume>(<issue>4</issue>):<fpage>433</fpage>&#x2013;<lpage>441</lpage>.
                    <pub-id pub-id-type="pmid">36306368</pub-id>
                    <pub-id pub-id-type="doi">10.1080/0142159X.2022.2137011</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wijk</surname>
                            <given-names>EV</given-names>
                            <prefix>van</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Janse</surname>
                            <given-names>RJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ruijter</surname>
                            <given-names>BN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Use of very short answer questions compared to multiple choice questions in undergraduate medical students: An external validation study.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2023</year>;<volume>18</volume>(<issue>7</issue>):<fpage>e0288558</fpage>.
                    <pub-id pub-id-type="pmid">37450485</pub-id>
                    <pub-id pub-id-type="doi">10.1371/JOURNAL.PONE.0288558</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10348524</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schuwirth</surname>
                            <given-names>LWT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Van Der Vleuten</surname>
                            <given-names>CPM</given-names>
                        </name>
</person-group>:
                    <article-title>Different written assessment methods: what can be said about their strengths and weaknesses?</article-title>
                    <source>

                        <italic toggle="yes">Med. Educ.</italic>
</source>
                    <year>2004</year>;<volume>38</volume>(<issue>9</issue>):<fpage>974</fpage>&#x2013;<lpage>979</lpage>.
                    <pub-id pub-id-type="pmid">15327679</pub-id>
                    <pub-id pub-id-type="doi">10.1111/J.1365-2929.2004.01916.X</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Y&#x0131;lmaz</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>&#x00c7;al&#x0131;&#x015f;kan</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Darcan</surname>
                            <given-names>&#x015e;</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Flipped learning in faculty development programs: opportunities for greater faculty engagement, self-learning, collaboration and discussion.</article-title>
                    <source>

                        <italic toggle="yes">Turk. J. Biochem.</italic>
</source>
                    <year>2022</year>;<volume>47</volume>(<issue>1</issue>):<fpage>127</fpage>&#x2013;<lpage>135</lpage>.
                    <pub-id pub-id-type="doi">10.1515/TJB-2021-0071</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Page</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bordage</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Allen</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>Developing key-feature problems and examinations to assess clinical decision-making skills.</article-title>
                    <source>

                        <italic toggle="yes">Acad. Med.</italic>
</source>
                    <year>1995</year>;<volume>70</volume>(<issue>3</issue>):<fpage>194</fpage>&#x2013;<lpage>201</lpage>. Accessed January 11, 2025.
                    <ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed/7873006">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bordage</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Page</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <chapter-title>An Alternative to PMPs: The &#x201c;Key Features&#x201d; Concept.</chapter-title>
                    <source>

                        <italic toggle="yes">Further Developments in Assessing Clinical Competence, 2nd Ottawa Conference.</italic>
</source>
                    <year>1987</year>;<fpage>59</fpage>&#x2013;<lpage>75</lpage>.</mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="other">
                    <collab>Medical Council of Canada</collab>:
                    <article-title>Guidelines for the Development of Key Feature Problems &amp; Test Cases.</article-title>
                    <year>2012</year>. Accessed December 21, 2024.
                    <ext-link ext-link-type="uri" xlink:href="https://mcc.ca/wp-content/uploads/CDM-Guidelines.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Page</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bordage</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>The Medical Council of Canada&#x2019;s key features project: a more valid written examination of clinical decision-making skills.</article-title>
                    <source>

                        <italic toggle="yes">Acad. Med.</italic>
</source>
                    <year>1995</year>;<volume>70</volume>(<issue>2</issue>):<fpage>104</fpage>&#x2013;<lpage>110</lpage>. Accessed March 8, 2019.
                    <pub-id pub-id-type="doi">10.1097/00001888-199502000-00012</pub-id>
                    <ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed/7865034">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ryan</surname>
                            <given-names>MS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Holmboe</surname>
                            <given-names>ES</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chandra</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Competency-Based Medical Education: Considering Its Past, Present, and a Post-COVID-19 Era.</article-title>
                    <source>

                        <italic toggle="yes">Acad. Med.</italic>
</source>
                    <year>2022</year>;<volume>97</volume>:<fpage>S90</fpage>&#x2013;<lpage>S97</lpage>.
                    <pub-id pub-id-type="pmid">34817404</pub-id>
                    <pub-id pub-id-type="doi">10.1097/ACM.0000000000004535</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8855766</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sneyd</surname>
                            <given-names>JR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mathoulin</surname>
                            <given-names>SE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>O&#x2019;Sullivan</surname>
                            <given-names>EP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Impact of the COVID-19 pandemic on anaesthesia trainees and their training.</article-title>
                    <source>

                        <italic toggle="yes">Br. J. Anaesth.</italic>
</source>
                    <year>2020</year>;<volume>125</volume>(<issue>4</issue>):<fpage>450</fpage>&#x2013;<lpage>455</lpage>.
                    <pub-id pub-id-type="pmid">32773215</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.bja.2020.07.011</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7377727</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Patil</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ranjan</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kumar</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Impact of COVID-19 Pandemic on Post-Graduate Medical Education and Training in India: Lessons Learned and Opportunities Offered.</article-title>
                    <source>

                        <italic toggle="yes">Adv. Med. Educ. Pract.</italic>
</source>
                    <year>2021</year>;<volume>12</volume>:<fpage>809</fpage>&#x2013;<lpage>816</lpage>.
                    <pub-id pub-id-type="pmid">34345196</pub-id>
                    <pub-id pub-id-type="doi">10.2147/AMEP.S320524</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8325012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="other">
                    <article-title>Exam Pass Rates | The American Board of Pediatrics.</article-title> Accessed October 13, 2025.
                    <ext-link ext-link-type="uri" xlink:href="https://www.abp.org/content/exam-pass-rates">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Farmer</surname>
                            <given-names>EA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Page</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>A practical guide to assessing clinical decision-making skills using the key features approach.</article-title>
                    <source>

                        <italic toggle="yes">Med. Educ.</italic>
</source>
                    <year>2005</year>;<volume>39</volume>(<issue>12</issue>):<fpage>1188</fpage>&#x2013;<lpage>1194</lpage>.
                    <pub-id pub-id-type="pmid">16313577</pub-id>
                    <pub-id pub-id-type="doi">10.1111/J.1365-2929.2005.02339.X</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McNamara</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Scott</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Boyd</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Constructing validity evidence from a pilot key-features assessment of clinical decision-making in cerebral palsy diagnosis: application of Kane&#x2019;s validity framework to implementation evaluations.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med. Educ.</italic>
</source>
                    <year>2023</year>;<volume>23</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>19</lpage>.
                    <pub-id pub-id-type="doi">10.1186/S12909-023-04631-4</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bird</surname>
                            <given-names>JB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Olvet</surname>
                            <given-names>DM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Willey</surname>
                            <given-names>JM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Patients don&#x2019;t come with multiple choice options: essay-based assessment in UME.</article-title>
                    <source>

                        <italic toggle="yes">Med. Educ. Online.</italic>
</source>
                    <year>2019</year>;<volume>24</volume>(<issue>1</issue>).
                    <pub-id pub-id-type="pmid">31438809</pub-id>
                    <pub-id pub-id-type="doi">10.1080/10872981.2019.1649959</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6720218</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sulistio</surname>
                            <given-names>MS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Khera</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Squiers</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Effects of gender in resident evaluations and certifying examination pass rates.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med. Educ.</italic>
</source>
                    <year>2019</year>;<volume>19</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>7</lpage>.
                    <pub-id pub-id-type="doi">10.1186/S12909-018-1440-7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ellis</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Knapton</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cannon</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A multivariate analysis examining the relationship between sociodemographic differences and UK graduates&#x2019; performance on postgraduate medical exams.</article-title>
                    <source>

                        <italic toggle="yes">Med. Teach.</italic>
</source>
                    <year>2025</year>;<fpage>1</fpage>&#x2013;<lpage>15</lpage>.
                    <pub-id pub-id-type="pmid">40512226</pub-id>
                    <pub-id pub-id-type="doi">10.1080/0142159X.2025.2513426</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bhanji</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Naik</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Skoll</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Competence by Design: The Role of High-Stakes Examinations in a Competence Based Medical Education System.</article-title>
                    <source>

                        <italic toggle="yes">Perspect. Med. Educ.</italic>
</source>
                    <year>2024</year>;<volume>13</volume>(<issue>1</issue>):<fpage>68</fpage>&#x2013;<lpage>74</lpage>.
                    <pub-id pub-id-type="pmid">38343558</pub-id>
                    <pub-id pub-id-type="doi">10.5334/PME.965</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10854425</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huwendiek</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Reichert</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Duncker</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Electronic assessment of clinical reasoning in clerkships: A mixed-methods comparison of long-menu key-feature problems with context-rich single best answer questions.</article-title>
                    <source>

                        <italic toggle="yes">Med. Teach.</italic>
</source>
                    <year>2017</year>;<volume>39</volume>(<issue>5</issue>):<fpage>476</fpage>&#x2013;<lpage>485</lpage>.
                    <pub-id pub-id-type="pmid">28281369</pub-id>
                    <pub-id pub-id-type="doi">10.1080/0142159X.2017.1297525</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>&#x00c7;al&#x0131;&#x015f;kan</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ta&#x015f;delen Teker</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mavio&#x011f;lu</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Digital transformation of the Turkish national neurology board examination: Implementation and candidates&#x2019; feedback.</article-title>
                    <source>

                        <italic toggle="yes">Turkish Journal of Neurology.</italic>
</source>
                    <year>2025</year>;<volume>31</volume>(<issue>3</issue>):<fpage>270</fpage>&#x2013;<lpage>277</lpage>.
                    <pub-id pub-id-type="doi">10.55697/TND.2025.383</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>&#x00c7;al&#x0131;&#x015f;kan</surname>
                            <given-names>SA</given-names>
                        </name>
</person-group>:
                    <article-title>Turkish Pediatric Endocrinology Subspecialty Board Examination 2015-2025.</article-title> Research Data.
                    <pub-id pub-id-type="doi">10.6084/m9.figshare.30628319</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report448667">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.191571.r448667</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Mansoor</surname>
                        <given-names>Masab</given-names>
                    </name>
                    <xref ref-type="aff" rid="r448667a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0009-0007-4501-7016</uri>
                </contrib>
                <aff id="r448667a1">
                    <label>1</label>Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>10</day>
                <month>1</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Mansoor M</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport448667" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.173732.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Peer Review Report</p>
            <p> Summary</p>
            <p> This cross-sectional study presents a comprehensive 10-year evaluation (2015-2025) of the Turkish Pediatric Endocrinology Subspecialty Board Examination (TPEBE), analyzing examination data from 263 sittings (261 eligible candidates) and post-examination survey responses from 217 participants. The manuscript documents the structural evolution from exclusively multiple-choice questions (MCQs) to a mixed format incorporating key feature problems (KFPs) beginning in 2019, and evaluates psychometric performance metrics and candidate perspectives. The authors demonstrate progressive improvements in reliability coefficients, variable pass rates (22.6%-85.0%), and generally positive candidate feedback regarding exam organization and clinical relevance.</p>
            <p> Detailed Assessment</p>
            <p> 
                <bold>Is the work clearly and accurately presented and does it cite the current literature?</bold>
            </p>
            <p> 
                <bold>Answer: Yes</bold>
            </p>
            <p> The manuscript is well-structured, logically organized, and clearly written with appropriate scientific terminology. The literature review adequately contextualizes the study within the broader framework of medical education assessment, citing relevant international standards and contemporary assessment theory. The references span classical assessment literature (Bordage, Page, Downing) and contemporary validation frameworks, appropriately supporting the methodological choices and interpretive context.</p>
            <p> 
                <bold>Minor recommendations:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Consider citing more recent literature on post-pandemic effects on medical education outcomes (the manuscript mentions COVID-19 impacts but could strengthen this with additional 2023-2024 references)</p>
                    </list-item>
                    <list-item>
                        <p>The discussion of gender differences in examination performance could benefit from more recent systematic reviews or meta-analyses on this topic in medical education</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Is the study design appropriate and is the work technically sound?</bold>
            </p>
            <p> 
                <bold>Answer: Yes</bold>
            </p>
            <p> The cross-sectional design is appropriate for the research objectives. The 10-year timeframe provides sufficient data for trend analysis and evaluation of structural changes. The combination of quantitative examination metrics with qualitative candidate feedback creates a robust mixed-methods approach that enhances the validity of conclusions.</p>
            <p> 
                <bold>Strengths:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Comprehensive dataset spanning a full decade</p>
                    </list-item>
                    <list-item>
                        <p>High survey response rate (92.3%)</p>
                    </list-item>
                    <list-item>
                        <p>Appropriate psychometric analyses (Cronbach's &#x03b1;, pass rates, mean scores)</p>
                    </list-item>
                    <list-item>
                        <p>Transparent reporting of examination structure evolution</p>
                    </list-item>
                </list> 
                <bold>Minor considerations:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>The exclusion of two candidates who did not respond to any items is reasonable and well-documented</p>
                    </list-item>
                    <list-item>
                        <p>The cancellation of the 2020 examination due to COVID-19 creates a minor gap but is unavoidable and well-explained</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Are sufficient details of methods and analysis provided to allow replication by others?</bold>
            </p>
            <p> 
                <bold>Answer: Partly</bold>
            </p>
            <p> The manuscript provides substantial methodological detail, but several areas require clarification or expansion:</p>
            <p> 
                <bold>Strengths:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Clear description of examination structure evolution (Table 1)</p>
                    </list-item>
                    <list-item>
                        <p>Specification of standard-setting methods (Nedelsky for MCQs, Angoff for KFPs)</p>
                    </list-item>
                    <list-item>
                        <p>Description of item development and review processes</p>
                    </list-item>
                    <list-item>
                        <p>Statistical methods appropriately described</p>
                    </list-item>
                </list> 
                <bold>Areas requiring clarification:</bold> 
                <list list-type="order">
                    <list-item>
                        <p>
                            <bold>KFP Scoring Details:</bold>&#x00a0;The manuscript states that KFPs "mostly required short written responses" with five being multiple-response items, but the specific scoring rubrics, rater training procedures, and inter-rater reliability assessments are not described. This is critical for replication and interpretation of the &#x03b1; coefficients for KFPs.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Standard-Setting Procedures:</bold>&#x00a0;While the methods (Nedelsky, Angoff) are named, the specific implementation details are lacking: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>How many judges participated?</p>
                                </list-item>
                                <list-item>
                                    <p>What was the judge calibration process?</p>
                                </list-item>
                                <list-item>
                                    <p>How were discrepancies resolved?</p>
                                </list-item>
                                <list-item>
                                    <p>Were modified or traditional versions of these methods used?</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Item Bank Management:</bold>&#x00a0;The process for organizing, selecting, and ensuring content validity of questions from the item bank needs more detail: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>What were the predefined subject categories?</p>
                                </list-item>
                                <list-item>
                                    <p>How was content balance ensured across examination domains?</p>
                                </list-item>
                                <list-item>
                                    <p>What quality control procedures were applied during item selection?</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Survey Instrument:</bold>&#x00a0;The 9-point Likert scale items are presented, but the development and validation of the survey instrument itself are not discussed. Was it pilot-tested? Was content validity established?</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Missing Data:</bold>&#x00a0;The manuscript should clarify whether there were any missing survey responses and how these were handled in analysis.</p>
                    </list-item>
                </list> 
                <bold>Recommendations:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Add a supplementary methods section or appendix detailing KFP scoring procedures</p>
                    </list-item>
                    <list-item>
                        <p>Provide more operational detail on standard-setting implementation</p>
                    </list-item>
                    <list-item>
                        <p>Clarify the content blueprint or specification table used for examination construction</p>
                    </list-item>
                </list> </p>
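            <p>To illustrate the kind of operational detail requested for standard setting, the traditional forms of the two named methods can be sketched as follows. This is a minimal sketch only: the function names, the number of judges, and all ratings are hypothetical, and it does not describe the procedure actually used for the TPEBE. In the traditional Angoff method, each judge estimates the probability that a borderline candidate answers each item correctly; in the Nedelsky method, each judge counts the options a borderline candidate could not eliminate, and the item value is the reciprocal of that count. In both cases the cut score is the mean, across judges, of the summed item values.</p>
            <code language="python">
```python
from statistics import mean


def angoff_cut_score(ratings):
    """Traditional Angoff cut score.

    ratings[j][i] is judge j's estimated probability that a borderline
    candidate answers item i correctly.
    """
    per_judge_totals = [sum(judge) for judge in ratings]
    return mean(per_judge_totals)


def nedelsky_cut_score(remaining_options):
    """Traditional Nedelsky cut score for multiple-choice items.

    remaining_options[j][i] is the number of options (correct answer
    included) that judge j believes a borderline candidate cannot
    eliminate for item i. The item value is 1 / that number.
    """
    per_judge_totals = [sum(1 / k for k in judge) for judge in remaining_options]
    return mean(per_judge_totals)


# Hypothetical ratings: two judges, three items (illustration only).
angoff_ratings = [
    [0.5, 0.75, 0.25],
    [0.5, 0.5, 0.5],
]
nedelsky_remaining = [
    [2, 4, 1],
    [2, 2, 1],
]

angoff_cut = angoff_cut_score(angoff_ratings)
nedelsky_cut = nedelsky_cut_score(nedelsky_remaining)
print("Angoff cut score (raw points):", angoff_cut)
print("Nedelsky cut score (raw points):", nedelsky_cut)
```
            </code>
            <p>Reporting the number of judges, the spread of their per-judge totals, and any modification to these traditional forms would allow readers to reproduce the cut scores.</p>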
            <p> 
                <bold>Is the statistical analysis and its interpretation appropriate?</bold>
            </p>
            <p> 
                <bold>Answer: Yes</bold>
            </p>
            <p> The statistical methods are appropriate for the research questions and data types:</p>
            <p> 
                <bold>Appropriate analyses:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Descriptive statistics (means, standard deviations, frequencies, percentages)</p>
                    </list-item>
                    <list-item>
                        <p>Independent samples t-tests and Mann-Whitney U tests for group comparisons</p>
                    </list-item>
                    <list-item>
                        <p>Chi-square and Fisher's exact tests for categorical associations</p>
                    </list-item>
                    <list-item>
                        <p>Cronbach's &#x03b1; for internal consistency reliability</p>
                    </list-item>
                    <list-item>
                        <p>Appropriate significance threshold (p &lt; 0.05)</p>
                    </list-item>
                </list> 
                <bold>Strengths:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Proper selection of parametric vs. non-parametric tests based on distribution assessment (Kolmogorov-Smirnov)</p>
                    </list-item>
                    <list-item>
                        <p>Clear presentation of results with appropriate measures of central tendency and dispersion</p>
                    </list-item>
                    <list-item>
                        <p>Transparent reporting of statistical significance</p>
                    </list-item>
                </list> 
                <bold>Minor considerations:</bold> 
                <list list-type="order">
                    <list-item>
                        <p>
                            <bold>Effect Sizes:</bold>&#x00a0;While statistical significance is reported, effect sizes (e.g., Cohen's d for t-tests) would enhance interpretation of the practical significance of gender differences.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Multiple Comparisons:</bold>&#x00a0;With multiple statistical tests conducted, consideration of family-wise error rate correction (e.g., Bonferroni adjustment) might be warranted, though given the exploratory nature of some analyses, the current approach is defensible.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Trend Analysis:</bold>&#x00a0;Given the longitudinal nature of the data, formal trend analysis (e.g., linear regression of scores over time, joinpoint regression to identify structural breaks) could strengthen the conclusions about score trajectories and the impact of format changes.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Reliability Confidence Intervals:</bold>&#x00a0;Presenting confidence intervals for Cronbach's &#x03b1; values would enhance interpretation of reliability estimates, particularly for smaller sample sizes in individual years.</p>
                    </list-item>
                </list> 
                <bold>Recommendations:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Include effect sizes for key comparisons</p>
                    </list-item>
                    <list-item>
                        <p>Consider adding trend analysis to formally test temporal patterns</p>
                    </list-item>
                    <list-item>
                        <p>Provide confidence intervals for reliability coefficients</p>
                    </list-item>
                </list> </p>
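            <p>The additional statistics suggested above are simple to compute. The following is a minimal sketch using only the Python standard library, assuming Cohen's d with a pooled standard deviation, a plain Bonferroni adjustment, and an ordinary least-squares slope as the trend measure. All function names and data values are hypothetical and are not taken from the manuscript or its dataset.</p>
            <code language="python">
```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


def linear_trend_slope(years, scores):
    """Ordinary least-squares slope of scores regressed on year."""
    year_bar, score_bar = mean(years), mean(scores)
    numerator = sum((y - year_bar) * (s - score_bar) for y, s in zip(years, scores))
    denominator = sum((y - year_bar) ** 2 for y in years)
    return numerator / denominator


# Hypothetical scores for two candidate groups (illustration only).
scores_group_a = [70.0, 72.0, 74.0]
scores_group_b = [66.0, 68.0, 70.0]
effect_size = cohens_d(scores_group_a, scores_group_b)

# Hypothetical yearly mean scores (illustration only).
exam_years = [2015, 2016, 2017, 2018]
yearly_means = [60.0, 62.0, 64.0, 66.0]
slope_per_year = linear_trend_slope(exam_years, yearly_means)

# Per-test threshold if five comparisons were run at an overall 0.05 level.
per_test_alpha = bonferroni_threshold(0.05, 5)

print("Cohen's d:", effect_size)
print("Trend slope (points per year):", slope_per_year)
print("Bonferroni per-test threshold:", per_test_alpha)
```
            </code>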
            <p> 
                <bold>Are all source data underlying the results available to ensure full reproducibility?</bold>
            </p>
            <p> 
                <bold>Answer: Yes</bold>
            </p>
            <p> The authors have made their dataset publicly available through figshare (DOI: 10.6084/m9.figshare.30628319) under a CC BY 4.0 license, which is commendable and facilitates transparency and potential replication or secondary analysis.</p>
            <p> 
                <bold>Recommendation:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Consider also depositing the survey instrument itself as supplementary material to enable full methodological transparency</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Are the conclusions drawn adequately supported by the results?</bold>
            </p>
            <p> 
                <bold>Answer: Yes</bold>
            </p>
            <p> The conclusions are generally well-supported by the presented data and appropriately qualified:</p>
            <p> 
                <bold>Well-supported conclusions:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Progressive improvement in psychometric robustness, particularly reliability</p>
                    </list-item>
                    <list-item>
                        <p>Successful integration of KFPs enhanced assessment of clinical reasoning</p>
                    </list-item>
                    <list-item>
                        <p>High candidate satisfaction with examination organization and relevance</p>
                    </list-item>
                    <list-item>
                        <p>Time constraints as a persistent challenge requiring attention</p>
                    </list-item>
                </list> 
                <bold>Appropriately qualified interpretations:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>The authors appropriately acknowledge multiple potential explanations for score declines (format change, COVID-19 disruption, item difficulty)</p>
                    </list-item>
                    <list-item>
                        <p>Gender differences are interpreted cautiously given non-significant pass rate disparities</p>
                    </list-item>
                    <list-item>
                        <p>Limitations are transparently discussed</p>
                    </list-item>
                </list> 
                <bold>Minor considerations:</bold> 
                <list list-type="order">
                    <list-item>
                        <p>
                            <bold>Causality:</bold>&#x00a0;While the manuscript appropriately avoids strong causal claims, the discussion of score declines following KFP introduction could more explicitly acknowledge confounding (e.g., changes in candidate cohort characteristics, training program evolution, concurrent curricular changes).</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Generalizability:</bold>&#x00a0;The conclusions could be more explicit about the context-specific nature of findings (Turkish medical education system, pediatric endocrinology subspecialty) and what aspects might generalize to other contexts.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Predictive Validity:</bold>&#x00a0;The manuscript acknowledges the lack of data on downstream clinical performance as a limitation, but this could be emphasized more strongly because it affects how the "validity" claims should be interpreted.</p>
                    </list-item>
                </list> 
                <bold>Recommendations:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Add a brief statement acknowledging that observed relationships are associative rather than causal</p>
                    </list-item>
                    <list-item>
                        <p>Explicitly discuss which findings are likely context-specific vs. generalizable</p>
                    </list-item>
                    <list-item>
                        <p>Consider limiting "validity" claims to content and construct validity, since broader claims are not supportable without predictive validity data</p>
                    </list-item>
                </list> </p>
            <p> Specific Technical Issues</p>
            <p> Table 2: Examination Performance Metrics 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Issue:</bold>&#x00a0;The pass rate calculation denominator is unclear. Are candidates who were administratively ineligible or withdrew counted in the denominator?</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Recommendation:</bold>&#x00a0;Clarify in the table notes whether pass rates are calculated as passes/eligible examinees or passes/actual examinees</p>
                    </list-item>
                </list> </p>
            <p> Table 3: Gender Comparison 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Issue:</bold>&#x00a0;The degrees of freedom for the MCQ comparison (df &#x2248; 101) suggest that unequal variances were assumed, but this is not explicitly stated</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Recommendation:</bold>&#x00a0;Specify whether equal or unequal variances were assumed and report Levene's test results if applicable</p>
                    </list-item>
                </list> </p>
            <p> Table 4: Longitudinal Survey Data 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Strength:</bold>&#x00a0;Excellent comprehensive presentation</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Minor issue:</bold>&#x00a0;The progressive decline in time adequacy ratings is dramatic (7.67 in 2018 to 3.91 in 2025) but not explicitly highlighted or statistically tested</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Recommendation:</bold>&#x00a0;Consider a formal trend analysis or ANOVA across years for this critical item</p>
                    </list-item>
                </list> </p>
            <p> Figure Absence 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Observation:</bold>&#x00a0;The manuscript contains no figures</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Recommendation:</bold>&#x00a0;Visual representations would enhance accessibility: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>Figure 1: Line graph showing mean scores, pass rates, and reliability coefficients over time</p>
                                </list-item>
                                <list-item>
                                    <p>Figure 2: Box plots comparing MCQ vs. KFP scores, potentially stratified by gender</p>
                                </list-item>
                                <list-item>
                                    <p>Figure 3: Bar chart of candidate satisfaction ratings across survey domains</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                </list> </p>
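            <p> As a minimal sketch of the Table 3 point on degrees of freedom: Welch's unequal-variance t-test uses the Welch-Satterthwaite approximation, which typically gives a non-integer df that is at most n1 + n2 - 2, whereas the equal-variance test always reports exactly n1 + n2 - 2. The function names, group sizes, and standard deviations below are hypothetical and are not taken from the manuscript.</p>
            <preformat>
```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom for two independent groups."""
    v1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def pooled_df(n1, n2):
    """Degrees of freedom for the equal-variance (Student) t-test."""
    return n1 + n2 - 2


# Hypothetical groups, for illustration only (not the manuscript's data).
print(pooled_df(40, 100))
print(round(welch_df(10.0, 40, 5.0, 100), 1))
```
            </preformat>
            <p> Comparing the reported df with both values, computed from the actual group sizes and standard deviations, would indicate which test was run.</p>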
            <p> Substantive Content Issues</p>
            <p> 1.&#x00a0;
                <bold>KFP Implementation Fidelity</bold>
            </p>
            <p> The manuscript states KFPs target "key decision points" but doesn't provide evidence that the implemented items actually focus on high-stakes, discriminating clinical decisions rather than routine information gathering. Given the initial low reliability (&#x03b1; = 0.31 in 2021), were blueprinting procedures adequate?</p>
            <p> 
                <bold>Recommendation:</bold>&#x00a0;Add a brief description of the KFP development process, including how "key features" were identified and how items were validated to target these features.</p>
            <p> 2.&#x00a0;
                <bold>Standard-Setting Defensibility</bold>
            </p>
            <p> Using different standard-setting methods for different item types (Nedelsky for MCQ, Angoff for KFP) is reasonable, but the manuscript doesn't explain why these specific methods were chosen or whether combined standard-setting procedures were used when integrating the two components.</p>
            <p> 
                <bold>Recommendation:</bold>&#x00a0;Add a brief justification for method selection and explain how component cut scores were combined into the overall pass standard.</p>
            <p> 3.&#x00a0;
                <bold>Gender Analysis Interpretation</bold>
            </p>
            <p> The finding that male candidates scored higher on MCQs and total scores (p&lt;0.05) but not on KFPs or overall pass rates requires more nuanced interpretation. The manuscript briefly mentions "differential exposure" but doesn't explore potential mechanisms or implications.</p>
            <p> 
                <bold>Recommendation:</bold>&#x00a0;Expand the discussion of gender differences, considering: 
                <list list-type="bullet">
                    <list-item>
                        <p>Whether MCQ vs. KFP performance patterns suggest format-related bias</p>
                    </list-item>
                    <list-item>
                        <p>Implications for assessment equity</p>
                    </list-item>
                    <list-item>
                        <p>Comparison to international literature on gender and assessment format</p>
                    </list-item>
                </list> </p>
            <p> 4.&#x00a0;
                <bold>COVID-19 Impact</bold>
            </p>
            <p> The manuscript attributes post-2021 score declines partly to pandemic disruptions but provides limited evidence. The 2021 exam was the first after cancellation, so reduced clinical exposure is plausible, but scores continued declining through 2025, which a one-time disruption alone would not readily explain.</p>
            <p> 
                <bold>Recommendation:</bold>&#x00a0;Either strengthen this interpretation with additional evidence (e.g., candidate survey data on clinical exposure, comparisons with other subspecialty exams in Turkey) or moderate the claim.</p>
            <p> 5.&#x00a0;
                <bold>Reliability Progression</bold>
            </p>
            <p> The dramatic improvement in KFP reliability from &#x03b1; = 0.31 (2021) to &#x03b1; = 0.85 (2024) suggests major changes in item quality, scoring procedures, or both. This deserves more explicit discussion.</p>
            <p> 
                <bold>Recommendation:</bold>&#x00a0;Discuss what specific quality improvement initiatives led to enhanced KFP reliability. Were rater training procedures enhanced? Were poorly performing items systematically revised?</p>
            <p> Minor Editorial Issues</p>
            <p> Language and Clarity</p>
            <p> The manuscript is generally well-written, but minor issues exist: 
                <list list-type="order">
                    <list-item>
                        <p>
                            <bold>Page 3:</bold>&#x00a0;"TSPED has played a central role in advancing pediatric endocrinology and diabetes care through the promotion of professional collaboration, standard-setting initiatives, and a broad array of educational activities&#x2014;including conferences, workshops, and training programs&#x2014;designed to enhance the competencies of healthcare professionals." 
                            <list list-type="bullet">
                                <list-item>
                                    <p>
                                        <bold>Issue:</bold>&#x00a0;Slightly verbose</p>
                                </list-item>
                                <list-item>
                                    <p>
                                        <bold>Suggestion:</bold>&#x00a0;"TSPED has advanced pediatric endocrinology and diabetes care through professional collaboration, standard-setting, and educational programs including conferences, workshops, and training."</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Page 4:</bold>&#x00a0;"Omitted questions were scored as correct for all examinees." 
                            <list list-type="bullet">
                                <list-item>
                                    <p>
                                        <bold>Issue:</bold>&#x00a0;Could be clearer about when this occurred</p>
                                </list-item>
                                <list-item>
                                    <p>
                                        <bold>Suggestion:</bold>&#x00a0;"Questions subsequently identified as flawed through candidate appeals were retroactively scored as correct for all examinees (2016: 4 items; 2017-2018, 2021: 2 items each; 2024-2025: 1 item each)."</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Abbreviations:</bold>&#x00a0;First use should be spelled out: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>"PMPs" appears without definition on page 3 (Patient Management Problems)</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                </list> </p>
            <p> Internal Consistency</p>
            <p> 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Page 3:</bold>&#x00a0;The manuscript states eligibility was "verified through documentation review" but doesn't specify what documentation (completion of residency? specific training requirements?)</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Page 4:</bold>&#x00a0;"seven or eight TBPEC members" &#x2013; clarify why this varied</p>
                    </list-item>
                </list> </p>
            <p> Ethical Considerations</p>
            <p> The ethical approval and consent procedures are appropriate for this type of educational research. The justification for verbal rather than written consent is reasonable given the anonymous, minimal-risk nature of the survey. The use of existing examination data for quality improvement and research is appropriately covered by institutional ethics approval.</p>
            <p> 
                <bold>Recommendation:</bold>&#x00a0;Consider explicitly stating whether candidates were informed during registration that de-identified examination data might be used for research purposes.</p>
            <p> Overall Assessment and Recommendations for Revision</p>
            <p> This manuscript presents valuable longitudinal data on subspecialty board examination evolution and provides useful insights for medical education stakeholders internationally. The core findings are sound and the conclusions are appropriately supported. However, several methodological details require clarification, and the statistical analysis could be strengthened with effect sizes and formal trend analysis.</p>
            <p> 
                <bold>Required Revisions (Essential for Scientific Soundness):</bold> 
                <list list-type="order">
                    <list-item>
                        <p>
                            <bold>Expand Methods &#x2013; KFP Scoring:</bold>&#x00a0;Provide detailed description of scoring procedures, rater training, and reliability assessment for written responses</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Expand Methods &#x2013; Standard Setting:</bold>&#x00a0;Provide operational details of standard-setting procedures including judge selection, number, calibration, and decision rules</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Statistical Enhancement:</bold>&#x00a0;Add effect sizes for key comparisons and consider formal trend analysis</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Clarify Data Analysis:</bold>&#x00a0;Specify handling of missing data and assumption checking for statistical tests</p>
                    </list-item>
                </list> 
                <bold>Recommended Revisions (Would Substantially Strengthen Manuscript):</bold> 
                <list list-type="order">
                    <list-item>
                        <p>
                            <bold>Add Figures:</bold>&#x00a0;Visual representation of trends, score distributions, and candidate feedback</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Expand Discussion:</bold>&#x00a0;More thorough interpretation of reliability improvements, gender differences, and COVID-19 impacts</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Content Validity Evidence:</bold>&#x00a0;Describe the examination blueprint and content coverage procedures</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Supplementary Materials:</bold>&#x00a0;Include survey instrument and additional methodological details</p>
                    </list-item>
                </list> 
                <bold>Minor Revisions (Would Improve Clarity):</bold> 
                <list list-type="order">
                    <list-item>
                        <p>Editorial refinements as noted above</p>
                    </list-item>
                    <list-item>
                        <p>Explicit discussion of generalizability and context-specific findings</p>
                    </list-item>
                    <list-item>
                        <p>More conservative framing of validity claims pending predictive validity studies</p>
                    </list-item>
                </list> </p>
            <p> Conclusion</p>
            <p> 
                <bold>Recommendation: Revise (Minor Revision)</bold>
            </p>
            <p> This manuscript makes a solid contribution to the medical education literature by providing transparent, longitudinal data on subspecialty board examination evolution. The integration of KFPs represents an important pedagogical advancement, and the psychometric data demonstrate continuous quality improvement. With clarification of scoring procedures, enhanced methodological detail, and strengthened statistical analysis, this work will serve as a valuable reference for other medical education systems implementing or refining board certification processes.</p>
            <p> The study is fundamentally sound, the data are robust, and the conclusions are appropriate. The required revisions are primarily matters of methodological transparency rather than fundamental scientific concerns. I recommend acceptance following minor revision to address the methodological clarifications outlined above.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Pediatric board exam performance research</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
</article>
