Keywords
Graduate Medical Education, Pediatric Endocrinology, Subspecialty Training, Board Certification, Board Examination, Medical Education, Assessment, Key Feature Problems
To present a decade-long evaluation of the Turkish Pediatric Endocrinology Subspecialty Board Examination (TPEBE), focusing on its structural evolution, psychometric performance, and candidates’ perceptions.
This cross-sectional study analyzed examination data from 2015–2025, encompassing 263 examination sittings (261 included in the analysis) and post-exam survey responses from 217 participants. Examination metrics included mean scores, pass rates, and reliability coefficients (Cronbach’s α) for the multiple-choice question (MCQ) and key feature problem (KFP) components. Survey items assessed perceptions of exam difficulty, fairness, relevance, and organization on a 9-point Likert scale. Quantitative data were analyzed using descriptive and inferential statistics.
Mean total scores declined following the 2019 inclusion of KFPs (x̄=52.9±9.05 in 2025), while reliability improved progressively (MCQ α=0.53–0.90; KFP α=0.31–0.85). Pass rates varied from 22.6% to 85.0%. Male candidates scored higher on MCQs and total scores (p<0.05), but gender differences in KFP performance and overall pass rates were not statistically significant. Candidates rated the examination highly for organization (x̄=7.75±1.54) and clinical relevance of KFPs (x̄=7.35±1.59), though exam duration received the lowest satisfaction (x̄=5.14±2.97). Qualitative feedback emphasized the educational value of KFPs and recommended extended testing time.
Over ten years, the TPEBE has evolved into a psychometrically robust and educationally valuable certification process. The balanced integration of MCQs and KFPs has strengthened construct validity and candidate engagement. These examinations are expected to gain broader recognition by institutions and regulators as a benchmark for educational and professional achievement.
Pediatric endocrinology was formally recognized as a subspecialty in Türkiye in 1973, alongside pediatric metabolic diseases. It subsequently attained the status of an independent subspecialty in 2002.1,2 The establishment of the Turkish Society for Pediatric Endocrinology and Diabetes (TSPED) in 1994 marked a critical milestone in the institutional development of the field. Since its inception, TSPED has played a central role in advancing pediatric endocrinology and diabetes care through the promotion of professional collaboration, standard-setting initiatives, and a broad array of educational activities—including conferences, workshops, and training programs—designed to enhance the competencies of healthcare professionals.3
In alignment with international practices for professional certification, the Turkish Pediatric Endocrinology Subspecialty Board Examination (TPEBE) was introduced in 2015. Administered by the Turkish Board of Pediatric Endocrinology Exam Committee (TBPEC) under TSPED, the TPEBE serves as a formal mechanism to evaluate the clinical knowledge and competencies of pediatric endocrinologists in Türkiye. Board certification examinations are globally recognized as essential instruments for safeguarding the quality of clinical practice and maintaining high professional standards across medical specialties,4 and pediatric subspecialty board examinations in particular serve as key instruments of competency assurance. The establishment of the TPEBE thus represents a significant step in harmonizing Turkish pediatric endocrinology training and credentialing with international norms in medical education and assessment.
During the first four years, the examination exclusively employed multiple-choice questions (MCQs). MCQs are recognized as a highly effective tool for evaluating knowledge across broad content areas, as they facilitate extensive content coverage and contribute to strong content validity.5 This breadth supports valid inferences about the entire content domain. Furthermore, MCQs are extensively utilized because they offer high reliability and are easy to score, ensuring precision, uniformity, and efficiency.6 However, poorly designed MCQs may inadvertently target superficial content and fail to measure higher-order cognitive processes.7,8
Consequently, the selection of item formats for any given assessment should be guided by a clear understanding of their respective strengths and limitations. A robust assessment strategy integrates diverse methods, each tailored to meet specific evaluative objectives.9 While MCQs ensured coverage and reliability, concerns about assessing higher-order reasoning prompted the integration of new formats.
In response to these considerations, the TPEBE incorporated key feature problems (KFPs) into its format starting in 2019, with the objective of enhancing the assessment of candidates’ clinical decision-making skills.10 KFPs are designed to simulate real-life clinical scenarios that require the integration of complex data to make clinically meaningful decisions.9,11 The format focuses on pivotal points in case management—referred to as “key features”—which represent the most essential and error-prone aspects of clinical problems.12,13 Originally introduced at the Cambridge Conference in 1984, the key features format was adopted by the Medical Council of Canada in 1992 as part of the MCC Qualifying Examination (MCCQE) Part I. This innovation aimed to replace the older Patient Management Problems (PMPs) format and reduce the overreliance on MCQs in licensure examinations.11,12,14 The adoption of KFPs by the TPEBE reflects a parallel intent: to augment the assessment of clinical competence beyond the limits of traditional MCQs.
This manuscript presents a comprehensive 10-year review of the Turkish Pediatric Endocrinology Subspecialty Board Examination, focusing on its structural evolution, aggregated examination outcomes, and candidates’ perspectives on the examination process.
This cross-sectional study analyzed data from the TPEBE conducted between 2015 and 2025. Examination records were combined with candidate survey responses collected immediately after each examination, except in the first year (2015), when no survey was administered. A total of ten examination sessions were included in the analysis; the 2020 session was cancelled due to the COVID-19 pandemic.
Eligible candidates were pediatric endocrinologists and subspecialty residents who met the requirements defined by the TBPEC. Eligibility was verified through documentation review, and only approved applicants were permitted to sit the exam. Across the study period, there were 263 examination sittings by 186 unique candidates, of whom 58 attempted the examination more than once. Of the 263 sittings, 193 (73.4%) were by female and 70 (26.6%) by male candidates.
Exam questions were developed by faculty members from academic institutions across Türkiye, each submitting items within their subspecialty areas to an online item bank organized by predefined subject categories. Depending on the year, seven or eight TBPEC members reviewed these items during face-to-face structured meetings, revised them as needed, and selected the final questions for each examination. All examinations were administered in paper–pencil format, at a single venue, under TBPEC members’ supervision.
Each exam set consisted of two sections:
1. MCQ section: In the first four years (2015–2018), the exams consisted exclusively of MCQs, with 75 items in 2015 and 100 items per year in 2016–2018; after the introduction of KFPs, the MCQ section comprised 85 items in 2019 and 80 items per year in 2021–2025. All MCQs had five options with one correct answer.
2. KFP section: Beginning in 2019, the exam included a second section of KFPs. Each exam contained five KFP cases, with 2–4 items per case (13–14 items in total). Most required short written responses, while only five were multiple-response items allowing more than one correct option.
The maximum achievable score for each examination was 100 points. To determine the cut scores, two standard-setting methods were employed: the Nedelsky method for MCQs (applied to all exams except 2015) and the Angoff method for KFPs, each selected for its suitability to the respective item format. For exams that included both MCQ and KFP sections, the final cut score represented the sum of the two section-specific cut scores. The evolution of the examination structure across the study period is summarized in Table 1.
| Year(s) | MCQ items (n) | KFP cases (items) (n) | Exam format |
|---|---|---|---|
| 2015 | 75 | – | MCQ |
| 2016–2018 | 100 | – | MCQ |
| 2019 | 85 | 5 cases (13–14) | MCQ + KFP |
| 2021–2025 | 80 | 5 cases (13–14) | MCQ + KFP |
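For illustration, the sketch below shows how section-level cut scores from the two standard-setting procedures could be combined into a single exam-level cut score. It is not the TBPEC’s actual implementation: the judge ratings, the numbers of eliminated options, and the section point weights are all hypothetical assumptions.

```python
"""Illustrative cut-score calculation; all panel data and weights are hypothetical."""
import numpy as np

def nedelsky_cut(options_eliminated, n_options=5):
    """Nedelsky: for each MCQ, a borderline candidate guesses among the options that
    judges could not eliminate, so the item-level probability of success is
    1 / (options remaining); the section cut score is the sum over items."""
    remaining = n_options - np.asarray(options_eliminated)
    return float(np.sum(1.0 / remaining))

def angoff_cut(judge_ratings):
    """Angoff: each judge estimates, per KFP item, the probability that a borderline
    candidate answers correctly; the section cut score is the mean rating summed over items."""
    ratings = np.asarray(judge_ratings)                 # shape: (judges, items)
    return float(ratings.mean(axis=0).sum())

rng = np.random.default_rng(0)
mcq_eliminated = rng.integers(0, 4, size=80)            # hypothetical: implausible options removed per MCQ
kfp_ratings = rng.uniform(0.4, 0.8, size=(7, 14))       # hypothetical: 7 judges x 14 KFP items

mcq_cut = nedelsky_cut(mcq_eliminated)                   # expected raw score on the 80-item MCQ section
kfp_cut = angoff_cut(kfp_ratings)                        # expected raw score on the 14-item KFP section

# The final cut score is the sum of the two section cut scores; the 80/20 point split
# used to map raw section scores onto the 100-point exam total is an assumption.
total_cut = mcq_cut * (80 / 80) + kfp_cut * (20 / 14)
print(f"MCQ cut: {mcq_cut:.1f}/80, KFP cut: {kfp_cut:.1f}/14, total cut: {total_cut:.1f}/100")
```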
Post-exam review: After each examination, board members reviewed the questions with the candidates and considered any appeals. Following the evaluation of these appeals, some MCQs were removed from the exam sets over the years: 4 items in 2016; 2 items each in 2017, 2018, and 2021; and 1 item each in 2024 and 2025. Removed questions were scored as correct for all examinees.
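As a simple illustration of this rescoring rule, the sketch below credits every examinee with the removed questions before recomputing percentage scores; the response matrix and the removed item indices are hypothetical, not actual TPEBE data.

```python
"""Illustrative rescoring of removed items (hypothetical data)."""
import numpy as np

def rescore_with_removed_items(responses, removed_items):
    """Mark removed items as correct for every candidate, then return percentage scores.
    `responses` is a 0/1 matrix (rows = candidates, columns = items)."""
    adjusted = np.asarray(responses, dtype=float).copy()
    adjusted[:, removed_items] = 1.0                     # removed items count as correct for everyone
    return adjusted.sum(axis=1) / adjusted.shape[1] * 100

# Example: a 100-item MCQ set with items 17 and 54 removed after the appeal review (indices are hypothetical)
rng = np.random.default_rng(7)
raw_responses = rng.integers(0, 2, size=(20, 100))
print(rescore_with_removed_items(raw_responses, [17, 54])[:5].round(1))
```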
Candidate feedback was collected using a paper-based questionnaire administered anonymously and voluntarily at the examination venue immediately after each exam, except in 2015. The instrument consisted of items on demographic characteristics; 11 structured items assessing perceptions of exam difficulty, relevance, fairness, and organization, each rated on a 9-point Likert scale (1: Strongly disagree/Very poor, 9: Strongly agree/Very good); and open-ended questions exploring the most useful aspects, the least useful aspects, suggestions for improvement, and additional comments. Informed consent was obtained verbally from all participants prior to data collection; verbal consent was considered sufficient because the survey was anonymous, posed minimal risk, and did not involve the collection of any identifying or sensitive personal information.
Data were analyzed using IBM SPSS Statistics version 29.0 (IBM Corp., Armonk, NY, USA). Continuous variables were examined for normality with the Kolmogorov-Smirnov test and are presented as mean and standard deviation (x̄ ± SD). Comparisons between two groups were conducted using the independent-samples t-test or the Mann-Whitney U test, as appropriate. Categorical variables are presented as numbers and percentages, and associations between them were examined using Pearson’s chi-square test or Fisher’s exact test. Cronbach’s alpha coefficient was calculated to estimate test reliability. A 95% confidence interval was adopted, and statistical significance was set at p < 0.05.
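The analyses above were run in SPSS; as a minimal sketch, the code below reproduces the same calculations with NumPy and SciPy on an assumed item-level response matrix and hypothetical score vectors. It is illustrative only: the data, group sizes, and the plain one-sample Kolmogorov-Smirnov test with estimated parameters are assumptions, not the study’s actual data or software.

```python
"""Minimal sketch of the reliability and comparison analyses (hypothetical data, SciPy instead of SPSS)."""
import numpy as np
from scipy import stats

def cronbach_alpha(item_scores):
    """Cronbach's alpha from an item-score matrix (rows = candidates, columns = items)."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1).sum()
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

rng = np.random.default_rng(42)
mcq_items = rng.integers(0, 2, size=(30, 80))            # hypothetical dichotomous MCQ responses
print(f"MCQ Cronbach's alpha: {cronbach_alpha(mcq_items):.2f}")

# Normality check, then the appropriate two-group comparison (e.g., total score by gender)
female_scores = rng.normal(58, 11, size=40)              # hypothetical score vectors
male_scores = rng.normal(62, 13, size=15)
ks = stats.kstest(female_scores, "norm", args=(female_scores.mean(), female_scores.std(ddof=1)))
if ks.pvalue > 0.05:
    stat, p = stats.ttest_ind(female_scores, male_scores, equal_var=False)   # Welch's t-test
else:
    stat, p = stats.mannwhitneyu(female_scores, male_scores)
print(f"Two-group comparison: statistic = {stat:.2f}, p = {p:.3f}")

# Pass/fail by gender compared with Pearson's chi-square on a 2x2 table (counts are hypothetical)
table = np.array([[89, 102],   # female: pass, fail
                  [42, 28]])   # male:   pass, fail
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(f"Chi-square = {chi2:.2f}, p = {p:.3f}")
```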
Between 2015 and 2025, there were 263 examination sittings (193 by female and 70 by male candidates). Two sittings in which no exam items were answered were excluded, leaving 261 sittings for statistical analysis.
Mean total exam scores, cut scores, pass rates, and reliability coefficients (Cronbach’s α) for both the MCQ and KFP components are summarized in Table 2. Overall, mean total scores showed a gradual decline after 2018, with the lowest average observed in 2023 (x̄ = 52.33 ± 12.18). Pass rates fluctuated across years, ranging from 22.6% (2023) to 85.0% (2016). Reliability coefficients for the MCQ component remained moderate to high (α = 0.53–0.90), while those for the KFP component improved over time, from 0.31 in 2021 to 0.85 in 2024.
| Year | Candidates (n) | Exam total score, mean (SD) | Cut score (%) | Pass rate (%) | MCQ (α) | KFP (α) |
|---|---|---|---|---|---|---|
| 2015 | 26 | 70.45 (5.97) | 70.0 | 73.1 | 0.529 | - |
| 2016 | 20 | 74.00 (7.06) | 65.0 | 85.0 | 0.686 | - |
| 2017 | 26 | 64.04 (10.33) | 60.0 | 69.2 | 0.831 | - |
| 2018 | 18 | 61.83 (13.65) | 60.0 | 61.1 | 0.899 | - |
| 2019 | 22 | 57.64 (10.40) | 50.0 | 72.7 | 0.813 | 0.425 |
| 2021 | 23 | 56.69 (8.43) | 60.0 | 39.1 | 0.677 | 0.312 |
| 2022 | 29 | 57.98 (10.77) | 60.0 | 41.4 | 0.765 | 0.648 |
| 2023 | 31 | 52.33 (12.18) | 60.0 | 22.6 | 0.854 | 0.711 |
| 2024 | 29 | 53.99 (11.36) | 58.0 | 37.9 | 0.807 | 0.854 |
| 2025 | 37 | 52.90 (9.05) | 58.0 | 29.7 | 0.675 | 0.662 |
Male candidates achieved significantly higher mean MCQ and exam total scores than female candidates, while no statistically significant gender difference was observed for KFP scores (Table 3).
| Score type | Female (mean ± SD) | Male (mean ± SD) | t (df) | p |
|---|---|---|---|---|
| MCQ | 52.28 ± 12.49 | 57.90 ± 16.08 | –2.65 (≈101) | 0.009 |
| KFP | 8.50 ± 3.45 | 8.89 ± 3.23 | –0.60 (167) | 0.459 |
| Exam total | 58.24 ± 11.49 | 62.35 ± 13.28 | –2.45 (259) | 0.015 |
Throughout the study period, male examinees achieved an overall pass rate of 60.0%, compared with 46.6% among female examinees. This difference was not statistically significant (χ2 = 3.68, p = 0.055).
Of the 235 examinees across the 2016–2025 examination years, 217 (92.3%) completed the post-examination feedback questionnaire. The mean and standard deviation scores for each structured statement, aggregated across examination years, are presented in Table 4.
Overall, participants rated the MCQ section as slightly more difficult (x̄ = 6.36 ± 1.77) than the KFP section (x̄ = 6.01 ± 1.90). Most respondents agreed that the KFP questions reflected real clinical practice and effectively assessed problem-solving ability. The inclusion of KFPs received high satisfaction scores, indicating broad approval of this format.
Among all items, exam organization achieved the highest overall average rating (x̄ = 7.75 ± 1.54), suggesting that participants were highly satisfied with the administration and logistical arrangements. Conversely, the adequacy of exam duration received the lowest rating (x̄ = 5.14 ± 2.97), representing a level only slightly above neutrality on the satisfaction scale and highlighting persistent time-related concerns.
Open-ended feedback from the examinations revealed diverse yet coherent themes reflecting participants’ evaluation of exam quality, content relevance, and organizational logistics. Respondents appreciated the exam’s comprehensive coverage across all topics and its alignment with clinical practice, particularly valuing the case-based (KFP) section for fostering problem-solving and reflective learning. Several participants noted that the test effectively highlighted their knowledge gaps and motivated further study.
Conversely, time constraints emerged as the most prominent source of dissatisfaction. Participants described the allotted duration as insufficient, citing long and complex questions that limited their ability to complete the exam. Suggestions included extending the time or dividing the test into two parts.
Regarding content, opinions were mixed. While most found the questions well prepared, a few perceived the MCQs as overly difficult or focused on unnecessary details—particularly in genetic and metabolic topics. Participants recommended increasing the proportion of clinical and case-based questions and providing post-exam answer booklets to enhance learning. Despite the stress associated with the process, many expressed gratitude to the organizing committee and acknowledged the exam’s educational value in guiding professional self-assessment and development.
This 10-year evaluation of the TPEBE provides the first comprehensive overview of its evolution, psychometric performance, and candidates’ perceptions since its inception in 2015. The findings reveal a trajectory of increasing structural improvement and reliability, particularly following the integration of KFPs in 2019. The transition from a purely MCQ format to a mixed MCQ–KFP design has progressively strengthened the exam’s ability to assess higher-order clinical reasoning while maintaining fairness and organizational quality. Pass rates fluctuated over time, likely reflecting the combined influence of item difficulty calibration, evolving training quality, and candidate preparedness. Importantly, reliability coefficients for both sections improved steadily, indicating the maturation of item-writing processes and enhanced internal consistency.
Throughout the decade, mean total scores and pass rates varied between exam years, with notably higher performance during the early years and lower outcomes after 2021. Several factors may explain this trend. The early sessions (2015–2018) consisted exclusively of MCQs, a format known for high reliability and broad content coverage but limited depth in assessing clinical judgment.5,6 As the examination evolved to include KFPs, which emphasize reasoning and decision-making, overall scores declined—an expected phenomenon also observed in other board examinations that introduced performance-oriented formats.4,11 The temporary disruption of training schedules and reduced clinical exposure during the COVID-19 pandemic may have further contributed to decreased performance in 2021–2023, a concern also reported in other studies of postgraduate assessments.15–18
The introduction of KFPs in 2019 represents a major pedagogical advancement in the TPEBE. Designed to target “key decision points” in clinical management, KFPs enable a more valid assessment of problem-solving and integrative reasoning than MCQs alone.12,13 The progressive increase in KFP reliability—from α = 0.312 in 2021 to α = 0.854 in 2024—indicates improved case design, examiner calibration, and standardization procedures. This pattern suggests growing psychometric robustness and aligns with findings from international studies reporting that mixed-format examinations achieve better construct validity.7,8,19,20 The balanced use of two complementary item types—MCQs for breadth and KFPs for depth—reflects an evidence-informed approach to assessment design consistent with contemporary medical education principles.9,21
In this study, although male candidates achieved slightly higher MCQ and total scores, gender differences in pass rates were not statistically significant. Comparable findings have been reported in other nationwide postgraduate assessments, with several studies showing no overall gender differences in examination outcomes, and in some cases, female candidates performing significantly better in certain domains or clinical components.22,23 The observed score gap may reflect differential exposure to standardized testing environments or self-perceived exam confidence. Importantly, the absence of significant disparities in KFP performance suggests that case-based, reasoning-oriented formats may reduce gender-related score variance, supporting their inclusion as a more equitable assessment component.
Candidates’ feedback provides valuable qualitative insight into the examination’s educational value and perceived validity. The majority of participants rated the exam highly for its organization, content balance, and reflection of real clinical practice. The strong endorsement of the KFP format underscores its perceived authenticity and relevance to day-to-day decision-making in pediatric endocrinology. Participants consistently reported that the exam highlighted their knowledge gaps and motivated targeted self-study—confirming the dual function of certification assessments as both evaluative and formative instruments. These findings echo international reports emphasizing the educational impact of well-designed board examinations in promoting reflective practice and continuing professional development.4,24
After the introduction of KFPs, the most consistent area of dissatisfaction concerned the perceived inadequacy of examination duration (x̄ = 5.14 ± 2.97). The relatively large standard deviation indicates considerable variability in respondents’ perceptions, suggesting diverse experiences or expectations regarding the sufficiency of the allotted time. This concern likely arises from the inherent cognitive load of the KFP section, which requires interpretive reasoning and the formulation of written responses. Similar challenges have been reported in previous studies, noting that constructing answers under open-ended conditions is cognitively demanding and time-consuming, as it involves articulating reasoning and justifying decisions rather than merely recognizing correct options.11,25,26 Addressing time constraints—either through adaptive scheduling or improved question pacing—could enhance candidate experience without compromising assessment validity.
The TBPEC’s iterative review of item performance and appeals, coupled with scoring adjustments for omitted questions, reflects a transparent and learner-centered quality assurance framework. The continuous monitoring of reliability and cut-score consistency over time has strengthened the exam’s credibility and accountability. Looking ahead, the transition to digital or hybrid exam delivery may offer opportunities for enhanced item analysis, automated scoring, and secure remote administration, aligning the TPEBE with global advancements in high-stakes assessment.
This analysis has some limitations. The cross-sectional design precludes longitudinal tracking of individual progress or causal inference regarding changes in performance. Survey data were self-reported and may be influenced by response bias. Furthermore, the absence of pre-pandemic comparator data limits the interpretation of COVID-19–related effects. Future research could explore predictive validity (e.g., relationship between board scores and subsequent clinical performance), longitudinal reliability, and the psychometric behavior of KFP items across specialties.
Over the past decade, the TPEBE has evolved from a traditional knowledge-based test into a multidimensional assessment aligned with international best practices in medical education. The integration of KFPs, progressive enhancement of reliability indices, and strong candidate endorsement collectively indicate a maturing and credible certification process. Continued investment in psychometric monitoring, examiner training, and technological innovation will further enhance the examination’s role in ensuring clinical competence and fostering excellence in pediatric endocrinology practice in Türkiye. It is anticipated that these examinations will be increasingly recognized and utilized by educational institutions and regulatory authorities as a benchmark for achievement in both education and professional practice.
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ege University Faculty of Medicine Medical Research Ethics Board (date: 25.05.2023, ref. no. 23-5.1T/46). Informed consent was obtained verbally from all participants prior to data collection. Verbal consent was obtained because the survey was anonymous, posed minimal risk, and did not involve the collection of any identifying or sensitive personal information.
figshare: A Decade of the Turkish Pediatric Endocrinology Subspecialty Board Examination: Structure, Outcomes, and Candidate Perspectives.
Dataset: https://doi.org/10.6084/m9.figshare.3062831927
This project contains the following data:
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC BY 4.0).
We express our sincere appreciation to the TBPEC members—Ece Böber, Hakan Döneray, Şenay Savaş Erdeve, Damla Gökşen, Filiz Tütüncüler Kökenli, Samim Özen, Alev Özön, and Doğa Türkkahraman—for their invaluable efforts and contributions throughout the entire TPEBE process.
We are also grateful to our academic colleagues who developed and submitted the MCQ and KFP items for the examination, whose contributions were essential to the success of this work. In addition, we extend our heartfelt thanks to all TPEBE participants for their engagement in the examination and for their time, commitment, and willingness to contribute to this study.