Background

F1000Research

2046-1402

F1000 Research Limited

London, UK

10.12688/f1000research.173421.1

Method Article

Articles

Ten Tips for AI‑Assisted Key Feature Problems: A Validity‑Informed Guide for Medical Education

[version 1; peer review: awaiting peer review]

Zafar, Imran (1): Conceptualization, Methodology, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Farooq, Munawar (1): Writing – Review & Editing; https://orcid.org/0009-0009-2537-7115

Caliskan, S. Ayhan (1): Writing – Review & Editing

Magzoub, Mohi Eldin (1, a): Conceptualization, Methodology, Project Administration, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing; https://orcid.org/0000-0002-6721-4500

1 Department of Medical Education, United Arab Emirates University College of Medicine and Health Sciences, Al Ain, Abu Dhabi, 20004, United Arab Emirates

a mmagzoub@uaeu.ac.ae

No competing interests were disclosed.

24 12 2025

2025

1446

8 12 2025

2025

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Generative artificial intelligence (AI) can augment educators’ capacity to design high-quality Key Feature Problems (KFPs) for valid assessment of clinical reasoning and decision-making. This practice-oriented guide presents ten evidence-informed tips for using AI to develop KFPs that are aligned with learning outcomes, cognitively demanding, and contextually authentic. Drawing on the KFP literature and contemporary validity frameworks (content, cognitive and response processes, internal structure, and consequences), we synthesize practical strategies for translating outcomes into key features, constructing realistic vignettes, creating parallel case variants, targeting higher-order thinking, ensuring curricular alignment and learner-level appropriateness, diversifying complementary item formats, validating AI-assisted items through a stepwise workflow, delivering decision-specific feedback, iterating from learner performance data, and safeguarding equity, ethics, and governance. We illustrate these recommendations with concise examples and an adapted validation workflow that supports both formative and summative applications. Although AI can accelerate scenario construction and feedback drafting, human expertise remains essential to verify clinical accuracy, prevent bias and hallucinations, calibrate difficulty, and preserve assessment security. With transparent processes and expert review, AI can serve as a collaborative assistant rather than a replacement, helping medical educators build rigorous KFPs that enhance the assessment of clinical decision-making.

Medical Education Generative Artificial Intelligence Key Feature Problems

The author(s) declared that no grants were involved in supporting this work.

Background

Key Feature Problems (KFPs) are an established assessment tool in medical education, designed to evaluate clinical decision-making skills among medical students and practitioners. KFPs focus on the “key features” of a clinical case: the critical steps or decisions most essential to managing the scenario effectively (Page et al., 1995). By concentrating on these pivotal elements, KFPs offer a focused and efficient means of assessing learners’ clinical decisions in context, thereby bridging the gap between theoretical knowledge and practical application (Farmer & Page, 2005).

Incorporating KFPs into medical education supports the integration of foundational scientific knowledge with clinical practice. The application of basic science principles within clinical reasoning is fundamental to competent medical decision-making. KFPs facilitate this integration by requiring learners to apply their understanding of underlying scientific mechanisms when evaluating clinical scenarios ( Farmer & Page, 2005; Nayer et al., 2018). This alignment ensures that students are not only acquiring factual knowledge but are also developing the capacity to apply that knowledge in nuanced, real-world clinical situations.

Moreover, the growing emphasis on clinical reasoning and self-directed learning in contemporary medical curricula underscores the relevance of KFPs. As assessment tools, KFPs are well-suited to evaluating higher-order thinking skills and have demonstrated reliability and validity in this domain ( Farmer & Page, 2005). By simulating authentic clinical decisions, KFPs support the development of critical thinking, promote problem-solving, and prepare learners to handle clinical complexity with confidence ( Farmer & Page, 2005; Nayer et al., 2018).

Challenges and principles in designing Key Feature Problems (KFPs)

Developing high-quality KFPs presents several challenges, particularly in ensuring clinical accuracy, curricular alignment, and educational relevance. Unlike traditional multiple-choice questions, KFPs aim to assess clinical decision-making skills through context-rich scenarios that mirror real-life practice, making their construction inherently complex (Nayer et al., 2018).

Clinical accuracy is essential for maintaining the quality and integrity of assessments. KFPs often span multiple disciplines and include nuanced decision points; therefore, any factual inaccuracy can compromise validity and undermine the assessment of decision-making skills. Because medical knowledge evolves rapidly, KFP content must be updated regularly to reflect current guidelines and best practices (Farmer & Page, 2005; Nayer et al., 2018).

Effective KFP design requires alignment with clearly defined learning outcomes. Each scenario should target specific competencies expected of learners, thereby reinforcing curricular goals and ensuring that assessment remains educationally relevant (Nayer et al., 2018). This is especially critical within the framework of competency-based medical education (CBME), where the emphasis is on demonstrable, practice-ready skills rather than rote memorization (Connor et al., 2020). When strategically embedded across the curriculum, KFPs offer longitudinal reinforcement of essential clinical competencies, supporting both horizontal and vertical integration of knowledge.

KFPs also offer opportunities to promote and evaluate ethical reasoning and professionalism. By incorporating patient-centered dilemmas or moral conflicts, KFPs can assess not only technical knowledge but also character formation and decision-making in ethically complex situations ( Andrade et al., 2024).

Authenticity is a defining characteristic of effective KFPs. Scenarios should reflect real-world clinical contexts, be appropriately pitched to the learner’s stage of training, and avoid cognitive overload. Appropriately scaffolded cases enhance engagement, reduce anxiety, and improve confidence ( Nayer et al., 2018). Emphasizing decision points related to diagnosis, management, and follow-up reinforces the transfer of knowledge to clinical settings ( Hrynchak et al., 2014).

Finally, eliminating extraneous information is critical. Irrelevant details can distract learners from key issues, increase cognitive load, and hinder performance. Streamlined scenarios sharpen focus on the essential decisions, promoting efficient and accurate reasoning skills vital in high-stakes clinical environments. Well-crafted KFPs thus strike a balance between realism, challenge, and educational purpose, serving as a robust tool for developing and evaluating clinical reasoning throughout medical training ( Nayer et al., 2018).

Why AI for KFP now?

Artificial intelligence is reshaping medical education through adaptive, data-informed tools that can strengthen both learning and assessment. Generative models, including large language models and simulation platforms, can rapidly produce realistic Key Feature Problems that align with explicit learning outcomes, match intended cognitive levels, and reflect authentic clinical contexts. This capacity accelerates the creation of item banks while supporting coherence with curricular blueprints and competency frameworks ( Indran et al., 2024; Qiu & Liu, 2025).

Beyond item drafting, AI enables innovations that are directly relevant to KFP design and use. Systems can generate virtual patients and interactive clinical vignettes that situate key decisions within believable settings, which promotes transfer of reasoning across case variants and settings (Potter & Jefferies, 2024; Sardesai et al., 2024). AI-supported analytics can provide real-time or near-real-time feedback, surface common reasoning errors, and personalize practice based on learner performance patterns, thereby improving formative value and supporting programmatic assessment (Mishra et al., 2024). Exposure to these tools also advances AI literacy, a competency that future clinicians increasingly require (Subaveerapandiyan et al., 2024).

Once deployed, AI can assist with continuous quality improvement of KFPs. Models can analyze response data to detect weak distractors, ambiguous wording, and miscalibrated difficulty, then propose targeted revisions for expert review. Where governance and privacy protections are in place, linkage to de-identified clinical data or the use of synthetic datasets can further enhance authenticity by anchoring scenarios in realistic patterns of presentation and management. However, such integrations require careful oversight by institutions and remain context-dependent ( Blau et al., 2024).

These opportunities come with risks that must be actively managed. Generative systems can hallucinate facts, propagate outdated guidelines, and encode or amplify social and clinical biases. Responsible adoption, therefore, requires transparent processes, faculty development, and explicit ethical and data governance frameworks. Human subject matter expertise remains essential for verifying clinical accuracy, ensuring fairness, calibrating cognitive demand, and protecting item security ( Franco D’Souza et al., 2024; Tolsgaard et al., 2023).

In sum, strategic use of AI offers a scalable and evidence-informed approach to designing, validating, and iteratively improving KFPs. The following ten tips translate these opportunities and cautions into concrete steps that educators can apply to create high-quality, learner-appropriate, and ethically sound KFPs.

How we developed these tips

We developed the ten tips through a staged process that combined theory, existing assessment standards, and iterative subject-matter expert (SME) review. First, we mapped recurrent problems in AI-generated KFPs (construct drift, shallow recall, unsafe feedback, weak item documentation) against established assessment sources in medical education (the key-feature literature, blueprinting and OSCE validation guidance, and validity argumentation in the style of Messick, 1995). From this mapping, we kept only frameworks that could be implemented in low- and medium-stakes contexts and that preserved the key-feature construct. Second, we used AI to draft multiple versions of each tip (purpose, action, example), then circulated these drafts to SMEs in assessment and clinical practice to remove clinically unsafe suggestions, localize the content to practice in Gulf Cooperation Council (GCC) countries, and align it with curriculum learning outcomes. Third, we trialed the tips on real AI outputs to see which ones actually improved item quality; tips that did not change SME ratings were merged or dropped. The final ten tips therefore represent the set that was (a) evidence-attuned, (b) feasible for routine faculty use, and (c) auditable through the adapted 5-step validation workflow.

Ten tips for writing key feature problems using generative AI

This section provides ten practical and evidence-informed tips to help medical educators integrate generative AI into the design of KFPs. Each tip is aimed at ensuring that AI-generated questions are pedagogically sound, clinically relevant, and aligned with curricular goals. By applying these strategies, educators can enhance the quality of assessment tools used to evaluate clinical decision-making, while also improving the efficiency of content development.

Tip 1: Define learning outcomes and key features

Before using generative AI to develop Key Feature Problems (KFPs), educators should first define clear, measurable learning outcomes and derive the corresponding key features. Key features represent the critical decisions or actions that determine effective clinical management ( Farmer & Page, 2005; Nayer et al., 2018). Establishing these foundations ensures that AI-generated content is grounded in explicit educational intent and that each scenario targets competencies essential to clinical reasoning. Developing learning outcomes and key features in advance prevents the creation of unfocused or misaligned cases and supports validity by ensuring that each question assesses a decision point directly related to the intended outcome.

Once the initial key features are identified, AI can assist in refining and expanding them. By analyzing large datasets or educational case repositories, AI can identify additional high-yield decision points that may not be apparent through manual analysis. Drawing upon diverse clinical information allows educators to uncover patterns and associations that enhance authenticity and completeness. This process strengthens alignment between curricular objectives and the reasoning steps that differentiate expert from novice performance ( Farmer & Page, 2005; Nayer et al., 2018).

Example 1:

Learning Objective:

Demonstrate the ability to diagnose and manage acute asthma in adult patients.

Identified Key Features:

1. Assess severity of the asthma exacerbation.
2. Initiate immediate treatment.
3. Decide on patient disposition (admission or discharge).

AI-Generated Case (Short Clinical Vignette):

A 30-year-old patient presents to the emergency department with shortness of breath and audible wheezing for the past two hours. The patient has a known history of asthma and seasonal allergies.

Key Feature Questions:

1. (Write-in) What two clinical assessments are most important for determining the severity of this exacerbation?
2. (Short-menu) Select the three most appropriate immediate treatments:
   • Inhaled β₂-agonist
   • Systemic corticosteroid
   • Oxygen therapy
   • Antibiotic therapy
   • Antihistamine
3. (Short-menu) Which criteria would guide your decision to discharge the patient? (Select all that apply.)

This sequence (learning outcome → key features → case → questions) illustrates the structured logic of KFP design and specifies the item format and number of responses required, consistent with established methodology ( Farmer & Page, 2005; Nayer et al., 2018).

Example 2:

Educators who have identified preliminary key features for managing acute chest pain can use AI tools to refine and extend them. Such preliminary features might include:

1. Obtaining an appropriate history and identifying red-flag symptoms.
2. Initiating essential diagnostic investigations.
3. Deciding on immediate management priorities.

Large language models may reveal additional decision points, including:

• Differentiating cardiac from non-cardiac causes (for example, pulmonary embolism or aortic dissection).
• Recognizing atypical presentations in diabetic or female patients.
• Applying risk stratification tools in clinical decision-making.

These refinements help ensure that the resulting KFPs capture a broader spectrum of clinical complexity and reflect authentic decision-making challenges encountered in practice ( Farmer & Page, 2005; Nayer et al., 2018).

Note: KFPs may be presented as write-in or short-menu (SM) items. In SM formats, response options and the number of required selections must always be explicitly stated. At this stage, AI assists in improving the quality and breadth of key features, but the educator retains responsibility for selecting which AI-suggested features to include when constructing the final clinical vignette and corresponding questions.
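The chain from learning outcome to key features, vignette, and questions can also be recorded as a small data structure, which makes the format rules in the note above checkable before expert review. The following is a minimal Python sketch, not part of the published method: the class names, field names, and the `check_kfp` helper are hypothetical, and the third key feature is deliberately left without a question to show what the check reports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeyFeatureQuestion:
    key_feature: str                  # decision point this question targets
    prompt: str
    fmt: str                          # "write-in" or "short-menu"
    n_required: Optional[int] = None  # number of responses the learner must give
    options: List[str] = field(default_factory=list)


@dataclass
class KeyFeatureProblem:
    learning_outcome: str
    key_features: List[str]
    vignette: str
    questions: List[KeyFeatureQuestion]


def check_kfp(kfp: KeyFeatureProblem) -> List[str]:
    """Return structural problems; an empty list means none were found."""
    problems = []
    covered = {q.key_feature for q in kfp.questions}
    for kf in kfp.key_features:
        if kf not in covered:
            problems.append(f"No question targets key feature: {kf}")
    for q in kfp.questions:
        if q.key_feature not in kfp.key_features:
            problems.append(f"Question targets an undeclared key feature: {q.key_feature}")
        if q.fmt == "short-menu":
            if not q.options:
                problems.append(f"Short-menu question lacks options: {q.prompt}")
            if q.n_required is None:
                problems.append(f"Short-menu question does not state the number of selections: {q.prompt}")
    return problems


# Illustrative encoding of Example 1 (questions 1 and 2 only).
asthma = KeyFeatureProblem(
    learning_outcome="Diagnose and manage acute asthma in adult patients.",
    key_features=[
        "Assess severity of the asthma exacerbation",
        "Initiate immediate treatment",
        "Decide on patient disposition",
    ],
    vignette="A 30-year-old patient presents to the emergency department "
             "with shortness of breath and audible wheezing for the past two hours.",
    questions=[
        KeyFeatureQuestion(
            key_feature="Assess severity of the asthma exacerbation",
            prompt="What two clinical assessments are most important for determining severity?",
            fmt="write-in",
            n_required=2,
        ),
        KeyFeatureQuestion(
            key_feature="Initiate immediate treatment",
            prompt="Select the three most appropriate immediate treatments.",
            fmt="short-menu",
            n_required=3,
            options=["Inhaled beta-2 agonist", "Systemic corticosteroid",
                     "Oxygen therapy", "Antibiotic therapy", "Antihistamine"],
        ),
    ],
)

problems = check_kfp(asthma)
for p in problems:
    print(p)
```

A check like this only catches structural gaps such as an uncovered key feature or a short-menu item without a stated number of selections. It says nothing about clinical accuracy, which remains the educator's responsibility.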

Tip 2: Build authentic and context-rich clinical scenarios

Creating realistic and contextually grounded clinical scenarios is essential to the educational value of KFPs. Once key features have been identified and refined, generative AI can be used to construct authentic vignettes that situate these decisions within believable clinical contexts ( Berbenyuk et al., 2024; Qiu & Liu, 2025). By incorporating relevant demographic, environmental, and psychosocial details, AI helps simulate the complexity of real-world medical encounters ( Potter & Jefferies, 2024; Sardesai et al., 2024).

AI tools can also vary contextual parameters, such as disease stage, comorbidities, or resource limitations, to produce multiple versions of the same case. This contextual diversity strengthens students’ ability to transfer reasoning across scenarios and enhances case authenticity without adding to faculty workload ( Berbenyuk et al., 2024; Indran et al., 2024).

Example:

AI-Enhanced Realistic KFP Scenario (Short Vignette)

Mr. Ali K., a 58-year-old taxi driver with long-standing hypertension and type 2 diabetes, arrives at a community clinic complaining of mild chest discomfort radiating to his jaw. He reports the pain began after climbing stairs 30 minutes ago and has gradually subsided. He takes metformin and amlodipine irregularly. Vital signs: BP 160/95 mmHg, HR 88 bpm, SpO₂ 97%, BMI 31 kg/m². The nearest hospital is 25 km away.

Key Feature Questions:

1. (Write-in) What initial clinical assessments are essential before deciding whether this patient can safely remain in the clinic? List up to two.
2. (Short-menu) Which two diagnostic tests are most critical for confirming your leading diagnosis? (Select two.)
   • 12-lead ECG
   • Cardiac troponin I
   • Chest X-ray
   • D-dimer
3. (Short-menu) Which management action should be taken immediately? (Select one.)
   • Administer oral aspirin and arrange urgent transfer
   • Begin oral antihypertensive therapy and review next week
   • Provide reassurance and schedule stress test

By prompting AI to integrate demographic, psychosocial, and logistic details, educators can generate scenarios that are not only clinically coherent but also contextually realistic ( Potter & Jefferies, 2024; Qiu & Liu, 2025). Such authenticity strengthens cognitive fidelity, meaning that decisions made in the scenario closely mirror real clinical reasoning, thereby enhancing learners’ engagement and readiness for practice ( Preiksaitis & Rose, 2023; Sardesai et al., 2024).

Note: While AI can enhance realism, each generated scenario must undergo expert review to verify clinical accuracy and appropriateness for the target learner level ( Farmer & Page, 2005; Nayer et al., 2018).

Tip 3: Generate scenario diversity and parallel case variants

Generative AI can be strategically used to create multiple, pedagogically distinct versions of clinical scenarios centered on the same medical condition. This approach promotes both educational richness and psychometric robustness by exposing learners to varied but conceptually equivalent challenges ( Berbenyuk et al., 2024; Indran et al., 2024).

By varying contextual elements such as patient demographics, comorbidities, access to resources, and disease stage, AI helps educators design cases that assess the transfer of learning rather than rote recall ( Hrynchak et al., 2014). For instance, a single learning outcome on “acute coronary syndrome management” can be represented through different case variants: a young woman with atypical chest pain, an elderly diabetic with silent ischemia, or a middle-aged smoker with classic symptoms. Each variant targets the same underlying key features but tests adaptive reasoning in distinct contexts ( Farmer & Page, 2005; Nayer et al., 2018).

AI can also support psychometric balance by generating parallel cases matched on cognitive level and difficulty, aiding blueprinting and longitudinal assessment across cohorts ( Indran et al., 2024). Through controlled prompting, educators can maintain item equivalence while ensuring content freshness and reduced cueing effects. This capacity is especially useful for formative assessments, progress tests, and multi-institutional benchmarking.

Example:

Learning Objective: Manage patients presenting with myocardial infarction.

Common Key Features:

1. Identify ischemic symptoms and risk factors.
2. Interpret ECG and cardiac biomarkers.
3. Initiate evidence-based acute management.

By generating structured variants like these, AI helps educators evaluate consistency in reasoning across different contexts while maintaining construct validity. Moreover, such diversity supports inclusivity, ensuring exposure to a range of patient profiles and system-level challenges ( Mishra et al., 2024; Teferi et al., 2023).

When educators vary contextual parameters such as disease stage, comorbidities, or resource limitations, AI can produce multiple case versions to strengthen transfer of reasoning ( Table 1).

Table 1. AI-generated scenario variants.

Scenario	Contextual variation	Key decision focus
Case A: 65-year-old male with classic ST-elevation MI in tertiary hospital	Resource-rich environment	Timely reperfusion decision
Case B: 48-year-old female with atypical symptoms and normal ECG in rural clinic	Limited diagnostics available	Decision to transfer or observe
Case C: 72-year-old diabetic with dyspnea but no chest pain	Comorbid and silent presentation	Recognition of atypical MI

Note: Each AI-generated variant should be reviewed for alignment with curricular outcomes and calibrated for difficulty using item analysis or expert consensus ( Farmer & Page, 2005; Nayer et al., 2018).
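One way to keep variants parallel is to hold the learning outcome and key features fixed in the prompt and vary only the contextual parameters. The Python sketch below assembles such prompts as plain text; it calls no AI service, and the function name, prompt wording, and the pairing of patient profiles with settings are illustrative assumptions rather than a prescribed template. Some of the resulting combinations do not appear in Table 1 and would need expert screening for plausibility.

```python
from itertools import product
from typing import Dict, List


def build_variant_prompts(outcome: str, key_features: List[str],
                          contexts: Dict[str, List[str]]) -> List[str]:
    """Build one prompt per combination of contextual parameters."""
    kf_block = "\n".join(f"{i}. {kf}" for i, kf in enumerate(key_features, 1))
    names = list(contexts)
    prompts = []
    for combo in product(*(contexts[name] for name in names)):
        ctx_block = "\n".join(f"- {name}: {value}" for name, value in zip(names, combo))
        prompts.append(
            f"Learning outcome: {outcome}\n"
            f"Key features (keep identical in every variant):\n{kf_block}\n"
            f"Context for this variant:\n{ctx_block}\n"
            "Write a short clinical vignette and one question per key feature. "
            "For short-menu questions, list the options and state how many to select."
        )
    return prompts


OUTCOME = "Manage patients presenting with myocardial infarction."
KEY_FEATURES = [
    "Identify ischemic symptoms and risk factors.",
    "Interpret ECG and cardiac biomarkers.",
    "Initiate evidence-based acute management.",
]
# Hypothetical parameter grid loosely based on Table 1.
CONTEXTS = {
    "Patient": [
        "65-year-old male with classic ST-elevation MI",
        "48-year-old female with atypical symptoms and normal ECG",
        "72-year-old diabetic with dyspnea but no chest pain",
    ],
    "Setting": ["tertiary hospital", "rural clinic with limited diagnostics"],
}

prompts = build_variant_prompts(OUTCOME, KEY_FEATURES, CONTEXTS)
print(len(prompts))
print(prompts[0])
```

Because the key features are repeated verbatim in every prompt, reviewers can check each generated variant against the same list when judging equivalence.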

Tip 4: Scaffold higher-order clinical reasoning

Effective KFPs go beyond factual recall and assess a learner’s ability to analyze, synthesize, and evaluate complex clinical information at the upper levels of Bloom’s taxonomy ( Zaidi et al., 2018). Generative AI can assist educators in scaffolding these higher-order cognitive processes by helping design questions that explicitly demand interpretation, prioritization, and reasoning rather than mere recognition ( Berbenyuk et al., 2024; Indran et al., 2024).

By adjusting prompts and parameters, educators can use AI to generate versions of KFPs that target specific cognitive levels, for example, distinguishing between tasks that ask students to identify key findings (lower order) and those that require them to justify management decisions or evaluate competing interventions (higher order) (Farmer & Page, 2005; Nayer et al., 2018). This calibrated complexity enhances both formative and summative assessment design within competency-based curricula (Jantausch et al., 2023).

AI can also suggest reasoning scaffolds such as stepwise justification prompts, conditional branching, or “what-if” variations that help learners articulate the logic behind their choices. When combined with faculty validation, these features turn KFPs into active reasoning exercises that closely resemble real-world diagnostic and management decision-making ( Araújo et al., 2024).

Example:

Learning Objective: Apply critical reasoning to prioritize diagnostic steps in a patient with acute shortness of breath.

Key Features:

1. Interpret initial presentation and vital signs.
2. Identify the most urgent diagnostic investigation.
3. Evaluate management priorities based on evolving information.

AI-Generated Higher-Order Question Sequence:

1. (Write-in) Based on this patient’s presentation, what are your leading differential diagnoses? List up to two.
2. (Short-menu) Select the two investigations that will most efficiently confirm your diagnosis.
3. (Write-in) The chest X-ray reveals a right-sided pneumothorax. Outline the next two management steps and justify their sequence.

This structure progresses from analysis to evaluation, showing how AI can scaffold increasing levels of cognitive complexity within a single clinical context.

By refining prompts to elicit reasoning, educators ensure that AI-generated KFPs assess how students think, not just what they know (Araújo et al., 2024; Jantausch et al., 2023).

Note: While AI can help generate cognitively rich content, final validation by subject-matter experts is essential to confirm that each question targets the intended cognitive level and aligns with learning outcomes ( Farmer & Page, 2005; Nayer et al., 2018).

Tip 5: Align item complexity and format with learner level and curriculum

Generative AI can accelerate the development of draft Key Feature Problems (KFPs), but educator oversight remains essential to ensure that items are constructively aligned with curricular outcomes, competency frameworks, and learner progression ( Farmer & Page, 2005; Harden et al., 1999). AI-generated content should also be contextualized to the institution’s clinical setting, patient population, and healthcare realities, enhancing authenticity and local relevance ( Berbenyuk et al., 2024; McLaughlin et al., 2019).

Item complexity must match the learner’s cognitive and experiential readiness. Early-phase students benefit from single-decision questions emphasizing recognition, while advanced learners should tackle multi-step cases demanding integration and prioritization ( Farmer & Page, 2005; Nayer et al., 2018). AI can scaffold difficulty by varying diagnostic ambiguity, patient stability, or data availability, supporting progressive learning across preclinical and clinical phases ( Berbenyuk et al., 2024; Indran et al., 2024; Tolsgaard et al., 2023).

An illustrative example of progressive complexity across learner levels focused on managing diabetic ketoacidosis (DKA) is provided in Table 2. This example demonstrates how item difficulty can be structured according to learner competence, from early recognition to advanced management and prioritization.

Table 2. Example learning outcome: Manage diabetic ketoacidosis (DKA) across varying levels of competence.

Learner level	AI-Generated focus	Example question type
Early learners	Identify key diagnostic findings	Write-in: List two laboratory findings confirming DKA.
Intermediate learners	Interpret severity and initiate management	Short-menu: Select three immediate management steps.
Advanced learners	Prioritize interventions in unstable patient	Write-in: Describe the sequence of management if the patient’s blood pressure drops to 80/50 mmHg despite fluid resuscitation.

AI can further diversify assessment by reformatting a single clinical concept into multiple item types, such as short-answer, extended-matching, or multiple-response questions, while maintaining the same cognitive intent ( Indran et al., 2024; Javaeed, 2018). This enhances reliability and fairness by sampling reasoning across modalities and supports triangulation in programmatic assessment frameworks ( Connor et al., 2020; Fatima et al., 2024; Tolsgaard et al., 2023).

Example:

A 28-year-old man presents with sudden onset of severe shortness of breath after a long-haul flight. He is tachycardic and mildly hypoxic.

Original KFP:

What is the most likely diagnosis, and what is the next immediate investigation? (Write-in)

An illustrative transformation of a single vignette into multiple formats is provided in Table 3.

Table 3. Illustrative example of how a single clinical vignette can be reformatted by AI into multiple item types while preserving cognitive intent and targeting different assessment foci.

Format	AI-Generated example	Assessment focus
Short menu	Select the two most likely diagnoses: pulmonary embolism, pneumothorax, pneumonia, acute asthma.	Diagnostic reasoning
Extended-Matching	Select the next immediate investigation from a list applicable across short vignettes.	Decision-making under time constraint
Short-Answer	Explain the pathophysiological mechanism leading to this presentation.	Integration of basic and clinical sciences
Multiple-Response	Which of the following management steps should be taken immediately? (Select all that apply.)	Prioritization and safety judgment

By aligning complexity, format, and learner stage, AI enables coherent and longitudinal assessment design that reinforces stage-appropriate competencies while maintaining curricular coherence ( Berbenyuk et al., 2024; Harden et al., 1999; Indran et al., 2024).

Note: While AI can automate scaffolding and reformatting, faculty judgment remains indispensable to verify that each item accurately represents the intended cognitive process and meets clinical and psychometric standards ( Farmer & Page, 2005; Nayer et al., 2018).

Tip 6: Validate items using the 5-step workflow

The rapid generation of KFPs by generative AI demands a structured and defensible validation process to ensure that the resulting items meet accepted standards of quality, fairness, and educational relevance. To address this need, we adapted an evidence-based framework for validating AI-generated assessment content, drawing upon widely recognized validity models from Messick (1995), Kane (2013), Downing (2002), and Cook et al. (2015).

This adapted process integrates principles of content validity, cognitive process verification, response process accuracy, internal structure coherence, and consequential validity, contextualized for AI-assisted item generation ( Farmer & Page, 2005; Nayer et al., 2018; Tolsgaard et al., 2023). It provides educators with a transparent and replicable structure for reviewing and approving AI-generated questions prior to implementation ( Table 4).

Table 4. Adapted process for validating AI-generated questions.

Stage	Purpose	Validation evidence/Method	Source framework
1. Content Validation	Ensure alignment with curriculum outcomes and intended learning objectives.	SME review for relevance, accuracy, and blueprint mapping.	( Downing, 2002; Messick, 1995)
2. Cognitive Process Validation	Confirm that questions elicit the intended reasoning steps (analysis, synthesis, evaluation).	Think-aloud or expert cognitive walkthrough of each question’s reasoning pathway.	( Cook et al., 2015)
3. Response Process Validation	Verify that the expected student response corresponds to the key decision or action.	Pilot testing with small student sample; collect verbal feedback.	( Cook et al., 2015; Kane, 2013)
4. Internal Structure Validation	Examine psychometric properties (difficulty, discrimination, reliability).	Post-administration item analysis (CTT or IRT).	( Cook et al., 2015; Downing, 2004)
5. Consequential Validation	Evaluate educational impact and fairness.	Review of learner performance data, feedback, and potential bias in AI outputs.	( Messick, 1995)

This structured approach does not replace psychometric analysis but provides a pragmatic validity chain that educators can apply before large-scale deployment. Each step contributes evidence toward construct validity, ensuring that AI-generated KFPs assess genuine clinical reasoning rather than superficial pattern recognition ( Farmer & Page, 2005; Nayer et al., 2018; Wade et al., 2012).

The overall process is visualized in Figure 1, which outlines the adapted five-step validation workflow for AI-generated assessment items.

Figure 1. The 5-step validation process for AI-generated assessment items.

Example Application:

Suppose AI generates a KFP on managing community-acquired pneumonia:

• Stage 1: SMEs confirm the key features (diagnosis, antibiotic choice, admission criteria) match curricular outcomes.
• Stage 2: Cognitive walkthrough reveals the item requires decision-making rather than recall.
• Stage 3: A pilot group of students completes the item; feedback confirms clarity of question intent.
• Stage 4: Item analysis after pilot shows appropriate difficulty (p = 0.65) and discrimination (r = 0.32).
• Stage 5: Post-assessment debrief confirms students perceived the question as realistic and fair.

Note: The five-step validation process is an adaptation of established assessment validity frameworks (Cook et al., 2015; Downing, 2002; Kane, 2013; Messick, 1995), contextualized for the use of generative AI in question development. It aims to provide a practical quality-assurance model for educators rather than propose a novel psychometric paradigm.
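
As an illustration only, the five-stage chain could be tracked per item with a small record such as the sketch below. The stage labels paraphrase the workflow, and `ValidationLog`, `record`, `missing_stages`, and `ready_for_summative` are hypothetical names introduced here, not part of any published tool. Holding an item back until every stage has documented evidence is one possible reading of applying the chain before large-scale deployment.

```python
from dataclasses import dataclass, field

# Stage labels paraphrase the five-step workflow; the wording is an assumption.
STAGES = (
    "content",
    "cognitive_process",
    "response_process",
    "internal_structure",
    "consequences",
)


@dataclass
class ValidationLog:
    """Evidence collected for one AI-generated KFP (hypothetical structure)."""

    item_id: str
    evidence: dict = field(default_factory=dict)  # stage -> free-text note

    def record(self, stage: str, note: str) -> None:
        """Attach a short evidence note to one of the five stages."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.evidence[stage] = note

    def missing_stages(self) -> list:
        """Stages with no documented evidence yet, in workflow order."""
        return [s for s in STAGES if s not in self.evidence]

    def ready_for_summative(self) -> bool:
        """True only when every stage has documented evidence."""
        return not self.missing_stages()


log = ValidationLog("CAP-001")  # item identifier is a placeholder
log.record("content", "SMEs confirmed key features match curricular outcomes")
log.record("cognitive_process", "Walkthrough: item requires decision-making")
print(log.missing_stages())
print(log.ready_for_summative())
```

A record like this does not judge the quality of the evidence; it only makes gaps in documentation visible to the reviewing team.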

Tip 7: Provide decision-specific, actionable feedback

Effective feedback in KFPs must be decision-specific, concise, and actionable, focusing on each key feature rather than the case as a whole (Farmer & Page, 2005; Hrynchak et al., 2014; Nayer et al., 2018). Well-designed feedback helps learners understand why a particular decision is correct and why alternatives are less appropriate. Generative AI can assist in drafting such targeted feedback rapidly, but its output must always undergo SME review to verify clinical accuracy, tone, and contextual sensitivity (Farmer & Page, 2005; Nayer et al., 2018; Zhang et al., 2025).

Generative AI can be prompted to produce feedback at different levels of granularity, as summarized in Figure 2, which illustrates how prompts can generate decision-specific feedback messages tailored to each key feature.

Figure 2. Prompting AI to generate decision-specific feedback at multiple levels.

• Per-key-feature rationales explaining both correct and incorrect choices, particularly valuable for short-menu (SM) items where learners must select a specified number of responses (Farmer & Page, 2005; Nayer et al., 2018).
• Tiered feedback messages for correct, partially correct, and incorrect responses that identify common reasoning errors and suggest appropriate next steps in decision-making (Burner et al., 2025; Lee & Moore, 2024).
• Counterfactual prompts, such as “What if the patient were hypotensive?”, which encourage reflective reasoning without revealing the answer (Burner et al., 2025; Lee & Moore, 2024).
• Clarity refinements using plain-language summaries or controlled length limits to improve accessibility for diverse learners (Burner et al., 2025; Lee & Moore, 2024).

Timing also matters. For formative KFPs, immediate feedback at the key-feature level enhances learning efficiency and self-regulation (Burner et al., 2025; Lee & Moore, 2024). For summative KFPs, delayed or aggregate feedback preserves item security while still supporting post-exam reflection (Farmer & Page, 2005; Nayer et al., 2018).

Despite its efficiency, AI-generated feedback may lack nuance and contextual sensitivity in complex or atypical cases, which highlights the need for human oversight, particularly in edge scenarios (Burner et al., 2025). SMEs should verify that AI feedback accurately targets the intended reasoning process and does not introduce misleading or unsafe guidance.

Illustrative Example (Write-in + Short-Menu with Feedback)

Scenario (abridged): A 28-year-old presents with fever, headache, and neck stiffness.

KF-Q1 (write-in): What is the most likely diagnosis?

• Correct feedback: “Bacterial meningitis is most consistent with fever and neck stiffness; treat urgently with empiric antibiotics.”
• Partially correct (‘viral meningitis’): “Consider illness severity and urgency of treatment: what findings suggest bacterial rather than viral?”

KF-Q2 (SM; select 2): Which initial diagnostic investigations are required?

• Lumbar puncture
• Blood culture
• CT head
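
One way to store tiered, decision-specific messages is a lookup keyed by question and response tier. The sketch below reuses the KF-Q1 feedback text from the example above; the `ITEMS` structure, the function names, the exact-match classification rule, and the placeholder for incorrect responses are assumptions made for illustration. Real write-in marking would need SME-approved synonym lists and review.

```python
# Feedback for "correct" and "partial" is taken from the KF-Q1 example.
# The "incorrect" entry is a placeholder to be written by SMEs.
ITEMS = {
    "KF-Q1": {
        "key": "bacterial meningitis",
        "partial": {"viral meningitis"},
        "feedback": {
            "correct": (
                "Bacterial meningitis is most consistent with fever and neck "
                "stiffness; treat urgently with empiric antibiotics."
            ),
            "partial": (
                "Consider illness severity and urgency of treatment: what "
                "findings suggest bacterial rather than viral?"
            ),
            "incorrect": "[SME-authored message for incorrect responses]",
        },
    }
}


def tier_for(item: dict, answer: str) -> str:
    """Classify a write-in answer by case-insensitive exact match."""
    text = answer.strip().lower()
    if text == item["key"]:
        return "correct"
    if text in item["partial"]:
        return "partial"
    return "incorrect"


def feedback_for(question_id: str, answer: str) -> str:
    """Return the feedback message matching the learner's response tier."""
    item = ITEMS[question_id]
    return item["feedback"][tier_for(item, answer)]


print(feedback_for("KF-Q1", "Viral meningitis"))
```

Keeping the messages in data rather than in code lets SMEs edit wording for safety and tone without touching the delivery logic.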

Tip 8: Refine items using performance and psychometric data

Continuous improvement of AI-generated KFPs depends on systematic analysis of response data and psychometric evidence. Educators should employ both quantitative and qualitative data to identify items that require revision, strengthening validity, reliability, and alignment with learning outcomes (Farmer & Page, 2005; Kim et al., 2022; Nayer et al., 2018; Tolsgaard et al., 2023).

Data sources include item statistics from pilot tests (difficulty, discrimination, non-functioning options) and learner feedback on clarity and realism. When analyzed together, these indicators reveal whether each KFP effectively assesses the intended decision point (Almansour & Alfhaid, 2024; Tolsgaard et al., 2023). For example, very low discrimination may indicate that the question does not differentiate between competent and struggling learners, while an unexpectedly high success rate may suggest over-cueing or insufficient cognitive demand (Kim et al., 2022).
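
For readers who want to see how these two indices can be obtained from pilot data, the following is a minimal sketch using classical test theory. It assumes dichotomous (0/1) item scores, uses the corrected item-total correlation as the discrimination index, and applies illustrative flagging thresholds (`min_disc`, `max_p`) that are not values recommended in this guide. The data are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def item_analysis(scores):
    """Return (difficulty, discrimination) for each item.

    scores: one list of 0/1 item scores per learner.
    difficulty: proportion of learners answering the item correctly.
    discrimination: correlation between the item and the total score on the
    remaining items (corrected item-total correlation).
    """
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        results.append((sum(item) / len(item), pearson(item, rest)))
    return results


def flag(difficulty, discrimination, min_disc=0.2, max_p=0.9):
    """Mark items for human review; thresholds are illustrative assumptions."""
    flags = []
    if discrimination < min_disc:
        flags.append("low discrimination")
    if difficulty > max_p:
        flags.append("very high success rate")
    return flags


# Invented pilot data: five learners, three items.
pilot = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]
for number, (p, r) in enumerate(item_analysis(pilot), start=1):
    print(number, round(p, 2), round(r, 2), flag(p, r))
```

With samples this small the indices are unstable, so flags of this kind are best treated as prompts for SME discussion rather than decisions.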

AI can support this process by generating revised item versions based on educator feedback or psychometric findings. Prompted appropriately, the model can reword stems for clarity, modify distractors for plausibility, or adjust contextual parameters to correct misalignment (Berbenyuk et al., 2024; Indran et al., 2024). These revisions must then be revalidated by SMEs before reuse.

Illustrative Example (KFP Improvement via Data Review)

Original AI-Generated KFP (Pre-Revision)

Scenario: A 35-year-old patient presents with pleuritic chest pain and mild dyspnea.

Question (Write-in): What is the most likely diagnosis?

Issue: Student response data showed poor discrimination (r = 0.05); many students answered pneumonia or pneumothorax.

Data Insight: Qualitative feedback revealed insufficient contextual clues to differentiate pulmonary embolism from other causes of chest pain.

Revised KFP (Post-Review)

Scenario: A 35-year-old female on oral contraceptives presents with sudden pleuritic chest pain and mild dyspnea after a 10-hour flight.

Question (Write-in): What is the most likely diagnosis?

Rationale: Added risk factor and temporal trigger clarified the intended decision focus (PE) without making the question easier. SME review confirmed improved alignment and realism.

This example demonstrates how data-driven iteration enhances clarity, construct validity, and clinical authenticity (Farmer & Page, 2005; Nayer et al., 2018). The implementation steps for this iterative process are illustrated in Figure 3, which presents the data-driven KFP improvement workflow.

Figure 3. Implementation steps for data-driven KFP improvement.

Implementation Steps for Data-Driven KFP Improvement

1. Collect data from pilot or formative use (difficulty index, discrimination, and student feedback).
2. Analyze patterns to identify questions that fail to differentiate or that mislead due to ambiguous wording.
3. Prompt AI with explicit instructions for targeted revision (“simplify stem language,” “add one contextual risk factor,” etc.).
4. Revalidate revised items using the adapted validation framework (Tip 6).
5. Re-analyze post-revision metrics before including items in summative pools (Farmer & Page, 2005; Nayer et al., 2018; Tolsgaard et al., 2023).
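
The prompting step can be made repeatable by assembling the revision request from the item text, the available pilot statistics, and a short list of explicit instructions. The sketch below only builds a text prompt and calls no model; the function name `build_revision_prompt`, its parameters, and the prompt wording are assumptions for illustration, and the example values come from the pulmonary embolism item above.

```python
def build_revision_prompt(scenario, question, instructions,
                          difficulty=None, discrimination=None):
    """Assemble a plain-text revision request for a generative model.

    Statistics are included only when they are available, so the prompt
    never reports a value that was not actually measured.
    """
    lines = [
        "Revise the following Key Feature Problem.",
        f"Scenario: {scenario}",
        f"Question: {question}",
    ]
    if difficulty is not None:
        lines.append(f"Pilot difficulty index: {difficulty:.2f}")
    if discrimination is not None:
        lines.append(f"Pilot discrimination: {discrimination:.2f}")
    lines.append("Make only the following changes:")
    lines.extend(f"- {step}" for step in instructions)
    lines.append("Keep the key feature being assessed unchanged.")
    return "\n".join(lines)


prompt = build_revision_prompt(
    scenario=("A 35-year-old patient presents with pleuritic chest pain "
              "and mild dyspnea."),
    question="What is the most likely diagnosis?",
    instructions=["add one contextual risk factor"],
    discrimination=0.05,
)
print(prompt)
```

Storing the generated prompt alongside the item also supports the documentation practices described under Tip 10.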

Note: This process focuses solely on psychometric and content improvement. Considerations of inclusivity and bias mitigation are addressed separately (see Tip 9).

Tip 9: Safeguard equity, diversity, and inclusion in item content

Equity, diversity, and inclusion (EDI) are essential principles in assessment design. In the context of Key Feature Problems (KFPs), EDI ensures that all learners engage with clinically authentic yet culturally fair scenarios that reflect the diversity of real-world patient populations (Kim et al., 2024; Tolsgaard et al., 2023). When generative AI is used to create KFPs, additional vigilance is required to prevent the unintentional introduction or amplification of bias in case content, patient descriptors, or reasoning expectations (Kim et al., 2024; Rodman et al., 2024).

Identify and Mitigate Potential Bias in AI Outputs

AI models can inadvertently reproduce societal or dataset biases, leading to stereotypical patient profiles, imbalanced demographic representation, or culturally narrow assumptions (Kim et al., 2024).

To prevent this, educators should:

• Audit AI-generated cases for demographic balance across age, gender, ethnicity, and socioeconomic background.
• Remove stereotypical associations (e.g., linking certain diseases disproportionately to specific ethnic groups without epidemiological justification).
• Diversify contextual variables, such as healthcare setting, geographic region, and access to resources, to mirror real-world practice diversity (Tolsgaard et al., 2023).
• Involve diverse faculty reviewers and learners in item validation to surface biases that might be invisible to homogeneous panels (Rodman et al., 2024).
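
A first-pass audit of this kind can be partly automated when each generated case is tagged with its descriptors. The sketch below is a hedged illustration: the metadata fields, the placeholder group labels, the function names `tally` and `dominant_pairings`, and the `min_cases` and `threshold` defaults are all assumptions. A flagged pairing is a prompt for human review, not evidence of bias, because some associations have epidemiological justification.

```python
from collections import Counter


def tally(cases, attribute):
    """Count how often each value of one descriptor appears across cases."""
    return Counter(case.get(attribute, "unspecified") for case in cases)


def dominant_pairings(cases, attr_a, attr_b, min_cases=3, threshold=0.8):
    """Find values of attr_a that almost always co-occur with one attr_b value.

    Returns tuples (a_value, b_value, count, total) for human review.
    """
    by_a = {}
    for case in cases:
        a_value = case.get(attr_a, "unspecified")
        b_value = case.get(attr_b, "unspecified")
        by_a.setdefault(a_value, Counter())[b_value] += 1

    flagged = []
    for a_value, counts in by_a.items():
        total = sum(counts.values())
        b_value, count = counts.most_common(1)[0]
        if total >= min_cases and count / total >= threshold:
            flagged.append((a_value, b_value, count, total))
    return flagged


# Invented metadata for six generated cases; group labels are placeholders.
cases = [
    {"ethnicity": "group_a", "comorbidity": "diabetes", "gender": "male"},
    {"ethnicity": "group_a", "comorbidity": "diabetes", "gender": "male"},
    {"ethnicity": "group_a", "comorbidity": "diabetes", "gender": "female"},
    {"ethnicity": "group_b", "comorbidity": "hypertension", "gender": "male"},
    {"ethnicity": "group_b", "comorbidity": "none", "gender": "female"},
    {"ethnicity": "group_b", "comorbidity": "diabetes", "gender": "male"},
]
print(tally(cases, "gender"))
print(dominant_pairings(cases, "ethnicity", "comorbidity"))
```

Counts like these complement, and do not replace, review by a diverse panel of faculty and learners.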

Promote Inclusive Case Representation

EDI-aligned KFPs should expose learners to the breadth of human variation and social determinants that influence diagnosis and management. AI can assist by generating case variants that represent different demographic or psychosocial contexts while maintaining equivalent cognitive challenge (Berbenyuk et al., 2024; Kim et al., 2024).

For example, a case on myocardial infarction can be rendered across:

• A younger female with atypical presentation,
• An older diabetic male with silent ischemia, and
• A rural patient with delayed access to emergency care.

Such diversity fosters equitable preparedness and reduces bias in clinical decision-making (Kim et al., 2024; Rodman et al., 2024; Tolsgaard et al., 2023).

To operationalize inclusivity in AI-assisted KFP design, educators should follow a structured EDI review sequence illustrated in Figure 4, which outlines the bias-mitigation checkpoints during AI generation and validation.

Figure 4. Bias mitigation checkpoints during AI-assisted item generation.

Integrate EDI Checks Into the KFP Workflow

To operationalize inclusivity in AI-assisted KFP design:

1. Set EDI parameters before prompting AI, specifying desired demographic distribution and case diversity.
2. Review all generated content with an EDI checklist (representation balance, language neutrality, accessibility).
3. Pilot-test questions across mixed learner groups to identify differential performance that could signal construct-irrelevant bias (Rodman et al., 2024; Tolsgaard et al., 2023).
4. Document revisions and maintain transparency about the EDI review process as part of assessment governance.

Original AI Output:

A 45-year-old South Asian man with poorly controlled diabetes presents with chest pain after eating a heavy meal.

Issue: The AI model consistently associated “South Asian” with “diabetes,” reinforcing a stereotype without instructional purpose.

Revised Prompt and Case:

Generate a case of a 45-year-old adult presenting with chest pain unrelated to ethnicity. Include relevant lifestyle and risk factors.

Result: The AI produced a balanced scenario highlighting modifiable risks (sedentary lifestyle, hypertension) rather than cultural identity, aligning better with fairness and learning objectives.

Note: EDI alignment is not a single review step but a continuous design principle that parallels psychometric validation. Each AI-generated KFP should undergo both content and equity review before use to ensure fairness, representation, and clinical authenticity (Kim et al., 2024; Rodman et al., 2024).

Tip 10: Use AI ethically and document it transparently

Use existing pre-trained models rather than developing new ones. Concentrate faculty effort on prompt design, SME review, and validity checks so AI output meets curricular and clinical standards (Berbenyuk et al., 2024; Kovari, 2024; Tolsgaard et al., 2023).

Document for auditability. For each item, record the tool/model and version used, prompt template (and key settings), SME comments, and validation outcomes (see Tip 6). This enables reproducibility and external review by faculty and accreditors (Kovari, 2024; Rodman et al., 2024; Tolsgaard et al., 2023).

Protect boundaries. Never upload identifiable learner or patient data to external tools; clarify authorship when AI contributes text or drafts; require human sign-off on all exam materials (Kovari, 2024; Tolsgaard et al., 2023).

Build capacity. Provide ongoing faculty development in responsible prompting, data stewardship, and bias awareness so that AI augments educational expertise rather than replacing it (Berbenyuk et al., 2024; Indran et al., 2024; Kovari, 2024; Tolsgaard et al., 2023).

Be pragmatic about clinical data. Until secure educational data environments mature, prefer synthetic or de-identified sources and simulated EHR interfaces; full interoperability with real systems is generally not feasible yet (Blau et al., 2024; Razmi, 2024; Tolsgaard et al., 2023).

Quick checklist (for your item bank record):

• Tool/model + version
• Prompt template/context
• SME reviewers + decisions
• Validation evidence (per Tip 6)
• Data handling and disclosure notes

This keeps AI use ethical, transparent, and sustainable while preserving assessment integrity.
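
As a sketch of how the checklist might be stored in an item bank, the record below maps each checklist entry to a field and reports which entries are still empty. The class name `ItemRecord`, the field names, and all example values (including the tool name and version) are placeholders introduced for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ItemRecord:
    """Audit record for one AI-assisted item (hypothetical schema)."""

    item_id: str
    tool: str = ""
    tool_version: str = ""
    prompt_template: str = ""
    sme_reviewers: list = field(default_factory=list)
    sme_decision: str = ""
    validation_evidence: dict = field(default_factory=dict)
    data_handling_notes: str = ""

    def missing_fields(self):
        """Names of fields left empty (empty string, list, or dict)."""
        return [name for name, value in asdict(self).items() if not value]


record = ItemRecord(
    item_id="KFP-0001",              # placeholder identifier
    tool="example-llm",              # placeholder, not a real product name
    tool_version="2025-01",          # placeholder version label
    prompt_template="Generate a KFP on {topic} for {learner_level}.",
    sme_reviewers=["Reviewer A", "Reviewer B"],
    sme_decision="accepted with edits",
)
print(record.missing_fields())
print(json.dumps(asdict(record), indent=2))
```

Serializing the record to JSON keeps it readable by faculty and portable across item-banking systems; no learner or patient data belongs in it.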

A consolidated overview of all ten tips summarizing their purposes, recommended educator actions, and common pitfalls is presented in Table 5.

Table 5. Ten tips for AI-assisted KFP design: purpose, actions, and pitfalls.

Tip	Purpose	Concrete actions	Pitfall to avoid
1. Define learning outcomes and key features	Anchor AI output to decisions that matter	Write the LO. List 3–5 key features. Prompt AI to refine only those features. Keep SME-approved features.	Letting AI invent new outcomes or drift from the blueprint
2. Build authentic, context-rich vignettes	Increase cognitive fidelity and transfer	Prompt for age/sex, comorbidities, setting, constraints; localize names, drugs, and guidelines.	Generic, placeless cases misaligned with local practice
3. Generate scenario diversity and parallel variants	Support progress testing and reduce cueing	Create 3–4 variants that keep key features but change demographics, severity, and setting; tag each variant.	Changing the construct or difficulty too much across variants
4. Scaffold higher-order reasoning	Move beyond recall to clinical reasoning	Sequence prompts: identify → interpret → prioritize → justify; add “what-if” branches.	Single-step items solvable by pattern recognition
5. Align complexity & format with learner level and curriculum	Keep items fair and teachable for the target group	State learner level/course; tune data load, ambiguity, and steps; select format (write-in/SM/EMQ) to match intent.	Reusing high-complexity items for early learners
6. Validate items using the 5-step workflow	Make items defensible before high-stakes use	Document content SME check, cognitive walkthrough, small-group response check, item analysis, consequences review.	Treating AI output as final or skipping documentation
7. Provide decision-specific, actionable feedback	Turn KFPs into formative tools	Draft per-key-feature feedback for correct/partial/incorrect; SMEs edit for safety and tone.	Global case summaries that ignore the exact decision error
8. Refine using performance & psychometrics	Close the loop with real data	Review difficulty index (p), discrimination, distractor use, and comments; prompt AI for targeted rewrites; re-validate.	Keeping weak items in the bank without revision
9. Safeguard equity, diversity & inclusion	Prevent construct-irrelevant bias	Set EDI parameters in prompts; audit representation and language; pilot across mixed groups; record EDI review.	Stereotypes, single-setting/single-demographic fixation
10. Use AI ethically & document transparently	Protect security, trust, and auditability	Prefer pre-trained models; record tool/version, prompts, SME decisions, validation evidence; avoid identifiable data; provide faculty PD; use synthetic/de-identified clinical data.	Uploading identifiable data or omitting disclosure/governance

Key takeaways

• Start with outcomes and key features; keep AI inside those boundaries.
• Build realism and parallel variants to test transfer, not recall.
• Calibrate complexity and format to learner level, then validate with a simple 5-step chain.
• Feedback must be decision-specific; use post-delivery data to iterate.
• Bake in EDI checks to avoid bias and construct-irrelevant variance.
• Treat AI as an assistant: document tools, prompts, SME decisions, and data-handling; never upload identifiable data.

Limitations and scope

This paper is intended as a practice-oriented guide rather than an empirical or psychometric validation study. Its focus is on the educational design and responsible use of generative artificial intelligence (AI) to assist in developing Key Feature Problems (KFPs) within undergraduate medical education (UME) and postgraduate medical education (PGME) contexts. The recommendations emphasize conceptual alignment, item quality, and governance rather than quantitative analysis of reliability, validity coefficients, or statistical performance metrics.

The scope of guidance also excludes blueprinting logistics, standard setting, and scoring procedures, which vary across institutions and are beyond the current discussion. While examples provided illustrate typical clinical reasoning domains, they are intended to demonstrate design principles rather than to serve as validated assessment items.

Implementation feasibility may differ depending on institutional infrastructure, data governance maturity, and faculty readiness. The principles described should therefore be adapted to local curricular frameworks, regulatory requirements, and available AI tools. Educators should interpret these tips as a foundation for responsible innovation and not as a prescriptive or exhaustive model for KFP development.

Conclusion

This article offers a practical pathway for integrating generative AI into Key Feature Problem design while preserving educational rigor, fairness, and clinical authenticity. The ten tips anchor AI use to clearly defined outcomes and key features; they promote authentic, context-rich vignettes and parallel variants; they scaffold higher-order reasoning rather than simple recall; and they require systematic validation, targeted feedback, and continuous psychometric refinement. Applied together, these practices turn AI from a novelty into a reliable assistant that strengthens the defensibility and learning value of KFPs within programmatic assessment.

Effective implementation depends on disciplined process rather than advanced modeling. Institutions should prioritize transparent documentation of tools, prompts, SME decisions, and validation evidence; embed equity checks to reduce construct-irrelevant variance; and provide ongoing faculty development in responsible prompting, data stewardship, and bias awareness. Until secure educational data environments mature (e.g., institutionally hosted sandboxes), realism can be achieved through synthetic or de-identified data and simulated EHR interfaces. These guardrails protect privacy and trust while allowing innovation to advance in manageable, auditable steps.

Adopting the ten tips can improve both reliability and educational impact. Items become better aligned to curricular intent and learner level, feedback becomes decision-specific and actionable, and post-administration data drive iterative improvement rather than one-off item use. In this way, AI-supported KFPs contribute to a more coherent and equitable assessment ecosystem that helps learners practice clinical reasoning and transfer it to new settings.

Future work should test these recommendations at scale. Priorities include prospective studies on learning outcomes, stability of psychometric indices across cohorts and subgroups, the effectiveness of bias and equity audits, and the operational value of documentation checklists for accreditation. Cross-institution collaborations and shared repositories of prompts, validation artifacts, and item revision histories will accelerate cumulative knowledge. With careful governance and continuous evaluation, AI can augment rather than replace educational expertise and help institutions deliver assessments that are authentic, defensible, and oriented toward better patient care.

Data availability

No datasets were generated or analyzed during the preparation of this article. Therefore, data sharing is not applicable.

References

Almansour, Alfhaid: Generative artificial intelligence and the personalization of health professional education: A narrative review. Medicine. 2024;103(31):e38955. PMID: 39093806; DOI: 10.1097/MD.0000000000038955; PMCID: PMC11296413

Andrade G d SA, Alves, Melo: Ethical reasoning in medical decisions: the physician-patient dilemma. Revista Bioética. 2024;32. DOI: 10.1590/1983-803420243658EN

Araújo, Gomes, Ribeiro: Critical thinking pedagogical practices in medical education: a systematic review. Front. Med. 2024;11. PMID: 38947238; DOI: 10.3389/FMED.2024.1358444; PMCID: PMC11211358

Berbenyuk, Powell, Zary: Feasibility and Educational Value of Clinical Cases Generated Using Large Language Models. Stud. Health Technol. Inform. 2024;316:1524–1528. PMID: 39176494; DOI: 10.3233/shti240705

Blau, Cerf, Enriquez: Protecting scientific integrity in an age of generative AI. Proc. Natl. Acad. Sci. USA. 2024;121:e2407886121. PMID: 38771193; DOI: 10.1073/pnas.2407886121; PMCID: PMC11145223

Burner, Lindvig, Wærness: “We Should Not Be Like a Dinosaur”—Using AI Technologies to Provide Formative Feedback to Students. Educ. Sci. 2025;15(1):58–58. DOI: 10.3390/educsci15010058

Connor, Durning, Rencic: Clinical Reasoning as a Core Competency. Acad. Med. 2020;95(8):1166–1171. DOI: 10.1097/acm.0000000000003027

Cook, Brydges, Ginsburg: A contemporary approach to validity arguments: a practical guide to Kane’s framework. Med. Educ. 2015;49(6):560–575. PMID: 25989405; DOI: 10.1111/medu.12678

Downing: Threats to the validity of locally developed multiple-choice tests in medical education: construct-irrelevant variance and construct underrepresentation. Adv. Health Sci. Educ. Theory Pract. 2002;7(3):235–241. PMID: 12510145; DOI: 10.1023/A:1021112514626

Downing: Reliability: on the reproducibility of assessment data. Med. Educ. 2004;38(9):1006–1012. DOI: 10.1111/j.1365-2929.2004.01932.x

Farmer, Page: A practical guide to assessing clinical decision-making skills using the key features approach. Med. Educ. 2005;39(12):1188–1194. PMID: 16313577; DOI: 10.1111/j.1365-2929.2005.02339.x

Fatima, Sheikh, Osama: Authentic assessment in medical education: exploring AI integration and student-as-partners collaboration. Postgrad. Med. J. 2024;100(1190):959–967. PMID: 39041454; DOI: 10.1093/postmj/qgae088

Franco D’Souza, Mathew, Mishra: Twelve tips for addressing ethical concerns in the implementation of artificial intelligence in medical education. Med. Educ. Online. 2024;29(1). PMID: 38566608; DOI: 10.1080/10872981.2024.2330250; PMCID: PMC10993743

Harden, Crosby, Davis: AMEE Guide No. 14: Outcome-based education: Part 1 - An introduction to outcome-based education. Med. Teach. 1999;21(1):7–14. DOI: 10.1080/01421599979969

Hrynchak, Glover Takahashi, Nayer: Key-feature questions for assessment of clinical reasoning: a literature review. Med. Educ. 2014;48(9):870–883. PMID: 25113114; DOI: 10.1111/medu.12509

Indran, Paranthaman, Gupta: Twelve tips to leverage AI for efficient and effective medical question generation: A guide for educators using Chat GPT. Med. Teach. 2024;46(8):1021–1026. PMID: 38146711; DOI: 10.1080/0142159X.2023.2294703

Jantausch, Bost, Bhansali: Assessing trainee critical thinking skills using a novel interactive online learning tool. Med. Educ. Online. 2023;28(1). PMID: 36871259; DOI: 10.1080/10872981.2023.2178871; PMCID: PMC9987719

Javaeed: Assessment of Higher Ordered Thinking in Medical Education: Multiple Choice Questions and Modified Essay Questions. MedEdPublish. 2018;7:128. PMID: 38074575; DOI: 10.15694/mep.2018.0000128.1; PMCID: PMC10699377

Kane: Validating the Interpretations and Uses of Test Scores. J. Educ. Meas. 2013;50(1):1–73. DOI: 10.1111/JEDM.12000

Kim, Ham, Lee S-S: Differences in student-AI interaction process on a drawing task: Focusing on students’ attitude towards AI and the level of drawing skills. Australas. J. Educ. Technol. 2024. DOI: 10.14742/ajet.8859

Kim, Lee, Cho: Learning design to support student-AI collaboration: perspectives of leading teachers for AI in education. Educ. Inf. Technol. 2022;27(5):6069–6104. DOI: 10.1007/s10639-021-10831-6

Kovari: Ethical use of ChatGPT in education—Best practices to combat AI-induced plagiarism. Frontiers in Education. 2024;9. DOI: 10.3389/feduc.2024.1465703

Lee, Moore: Harnessing Generative AI (GenAI) for Automated Feedback in Higher Education: A Systematic Review. Online Learning. 2024;28(3):82–104. DOI: 10.24059/olj.v28I3.4593

McLaughlin, Wolcott, Hubbard: A qualitative review of the design thinking framework in health professions education. BMC Med. Educ. 2019;19(1):98. PMID: 30947748; DOI: 10.1186/s12909-019-1528-8; PMCID: PMC6449899

Messick: Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Am. Psychol. 1995;50(9):741–749. DOI: 10.1037/0003-066X.50.9.741

Mishra, Farooqui, Shimna: The Role of Artificial Intelligence in Improving Medical Education: A Comprehensive Review. Advancement and New Understanding in Medical Science. 2024;7:81–101. DOI: 10.9734/bpi/anums/v7/7333b

Nayer, Glover Takahashi, Hrynchak: Twelve tips for developing key-feature questions (KFQ) for effective assessment of clinical reasoning. Med. Teach. 2018;40(11):1116–1122. PMID: 30001652; DOI: 10.1080/0142159X.2018.1481281

Page, Bordage, Allen: Developing key-feature problems and examinations to assess clinical decision-making skills. Acad. Med. 1995;70(3):194–201. PMID: 7873006; DOI: 10.1097/00001888-199503000-00009

Potter, Jefferies: Enhancing communication and clinical reasoning in medical education: Building virtual patients with generative AI. Future Healthcare Journal. 2024;11:100043. DOI: 10.1016/j.fhj.2024.100043

Preiksaitis, Rose: Opportunities, Challenges, and Future Directions of Generative Artificial Intelligence in Medical Education: Scoping Review. JMIR Medical Education. 2023;9:e48785. PMID: 37862079; DOI: 10.2196/48785; PMCID: PMC10625095

Qiu, Liu: Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment. Global Medical Education. 2025. DOI: 10.1515/gme-2024-0021

Razmi: Building Robust Medical Algorithms. AI Doctor. 2024;27–65. DOI: 10.1002/9781394240197.ch2

Rodman, Mark, Artino: Using Generative Artificial Intelligence in Medical Education. Acad. Med. 2024;100(2):250–250. DOI: 10.1097/acm.0000000000005937

Sardesai, Russo, Martin: Utilizing generative conversational artificial intelligence to create simulated patient encounters: a pilot study for anaesthesia training. Postgrad. Med. J. 2024;100(1182):237–241. PMID: 38240054; DOI: 10.1093/postmj/qgad137

Subaveerapandiyan, Mvula, Ahmad: Assessing AI literacy and attitudes among medical students: implications for integration into healthcare practice. J. Health Organ. Manag. 2024. DOI: 10.1108/jhom-04-2024-0154

Teferi, Omar, Jeyakumar: Accelerating the Appropriate Adoption of Artificial Intelligence in Health Care: Prioritizing IDEA to Champion a Collaborative Educational Approach in a Stressed System. Educ. Sci. 2023;14(1):39. DOI: 10.3390/educsci14010039

Tolsgaard, Pusic, Sebok-Syer: The fundamentals of Artificial Intelligence in medical education research: AMEE Guide No. 156. Med. Teach. 2023;45(6):565–573. PMID: 36862064; DOI: 10.1080/0142159X.2023.2180340

Wade, Harrison, Hollands: Student perceptions of the progress test in two settings and the implications for test deployment. Adv. Health Sci. Educ. 2012;17(4):573–583. PMID: 22041871; DOI: 10.1007/S10459-011-9334-z

Zaidi NLB, Grob, Monrad: Pushing Critical Thinking Skills with Multiple-Choice Questions: Does Bloom’s Taxonomy Work? Acad. Med. 2018;93(6):856–859. DOI: 10.1097/acm.0000000000002087

Zhang, Gao, Suraworachet: Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students. 2025. Reference Source