Data Note

RAD-CaseBookLLM-08: An open-access dataset of structured large language model–generated radiology differential diagnosis teachings

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 02 Mar 2026

This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Data: Use and Reuse collection.

This article is included in the AI in Medicine and Healthcare collection.

Abstract

Background

Large language models are increasingly explored in medical education, particularly for generating structured explanatory content. However, openly accessible datasets capturing full-length model outputs in a standardized and reusable format remain limited. In radiology education, differential diagnosis teaching is typically organized around key imaging findings integrated with clinical reasoning. We developed RAD-CaseBookLLM-08, an open dataset of large language model–generated radiology differential diagnosis teachings derived from lesion-based thematic topics.

Methods

The dataset comprises 225 cases across nine radiology subspecialties. Thematic key imaging findings were derived from an established case-based radiology textbook and used as structured prompts. All cases were generated using ChatGPT-4o (OpenAI) in March 2025 via a web-based interface with conversation memory disabled. Each topic was processed in an independent session using an identical prompt template in which only the subspecialty and imaging finding were modified. Outputs were copied verbatim without editing, correction, or validation, and formatting elements were preserved. The dataset is provided in Microsoft Word and Portable Document Format files and is organized by subspecialty with sequential case labeling. No patient data were included.

Conclusions

RAD-CaseBookLLM-08 provides a structured, time-stamped collection of large language model–generated radiology teaching texts. The dataset may support reproducibility studies, benchmarking of model outputs, prompt engineering evaluation, and analysis of educational structure in machine-generated differential diagnoses. It is openly available under a Creative Commons Zero license via Zenodo.

Keywords

Radiology education; Large language models; ChatGPT; Medical artificial intelligence; Differential diagnosis; Open dataset; Medical education research

Introduction

Large language models (LLMs) have recently emerged as powerful tools capable of generating coherent, structured, and context-aware natural language outputs.1–3 Their rapid integration into medical domains has prompted increasing interest in their potential roles in clinical reasoning support, decision-making assistance, and medical education.4–6 In particular, generative models now have the potential to produce structured explanatory content that resembles textbook-style teaching material.7,8

Radiology education relies heavily on structured diagnostic reasoning. A central pedagogical component is the formulation of differential diagnoses based on key imaging findings integrated with clinical context.9,10 This lesion-based or pattern-based approach is widely used in radiology casebooks and board examination preparation materials. Trainees are typically exposed to thematic imaging findings (e.g., a cavitary pulmonary mass or distal interphalangeal arthropathy) and are expected to develop a prioritized differential diagnosis, recognize distinguishing imaging characteristics, and understand the reasoning leading to the final diagnosis.

While LLMs have demonstrated the ability to generate medical explanations and answer clinical questions, the reproducibility, structure, and educational consistency of LLM-generated differential diagnosis teachings remain insufficiently documented in openly accessible datasets.11 Existing studies often report performance metrics or qualitative assessments, but the underlying generated texts are rarely made publicly available in a structured and reusable format. This limits transparency, benchmarking across model versions, evaluation of prompt sensitivity, and methodological reproducibility.12,13

Open datasets documenting LLM-generated medical content are particularly important for several reasons. First, LLM outputs are inherently time-sensitive: model updates and parameter adjustments can alter responses over time.14 Capturing outputs at a defined timepoint enables longitudinal comparison and benchmarking. Second, prompt design significantly influences output structure and reasoning pathways.15 Publicly sharing prompt iterations enhances reproducibility and allows independent investigation of prompt engineering strategies. Third, openly available datasets support FAIR principles (Findable, Accessible, Interoperable, Reusable) and facilitate secondary analyses, including linguistic evaluation, hallucination detection research, educational structure assessment, and computational benchmarking.16

To contribute to ongoing efforts toward transparency and reproducibility in medical LLM research, we created RAD-CaseBookLLM-08, a structured dataset of LLM-generated radiology differential diagnosis teachings derived from thematic key imaging findings. The dataset was generated using a standardized prompting protocol applied systematically across multiple radiology subspecialties.

While RAD-CaseBookLLM-08 is not intended as a primary teaching resource, it provides a structured dataset suitable for research applications. The dataset can be used to study the characteristics of LLM-generated educational text, compare such outputs with conventional radiology teaching materials, investigate prompt engineering strategies, and analyze the organization, clarity, and pedagogical value of machine-generated differential diagnoses. By making the dataset openly available, we aim to support reproducibility, benchmarking, and further exploration of AI-assisted medical education.

Methods

Source of thematic topics

Thematic radiological key imaging findings were derived from the case-based structure of the radiology textbook Top 3 Differentials in Radiology: A Case Review (O’Brien, 2010).17 The source textbook presents radiological cases organized around a central imaging finding, followed by a structured differential diagnosis discussion and final diagnosis. For the purpose of this dataset, only the lesion-based thematic topics, referred to in the book as “Key Imaging Findings” (e.g., “Pharyngeal mucosal mass”), were used as input for the LLM. No textbook images, figure reproductions, or verbatim text excerpts were included in the dataset, nor were they used as input for the LLM.

The following subspecialties were included, each comprising 25 cases: cardiothoracic imaging (chest and cardiac imaging combined in a single section), gastrointestinal imaging, genitourinary imaging, musculoskeletal imaging, head and neck imaging, brain and spine imaging, pediatric imaging, breast imaging, and vascular and interventional radiology. This resulted in a total of nine subspecialty sections and 225 cases overall. The complete dataset is compiled into a single PDF document comprising 360 pages, 66,874 words, and 502,964 characters.
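As a quick consistency check on the figures above, the following sketch recomputes the case total and derives rough per-case averages. The variable names are illustrative, and the averages are only approximate because the page and word counts cover the whole PDF, including case headings.

```python
# Figures quoted in the text; the constant names are illustrative only.
SECTIONS = 9
CASES_PER_SECTION = 25
PDF_PAGES = 360
PDF_WORDS = 66_874

# Total number of cases across all subspecialty sections.
total_cases = SECTIONS * CASES_PER_SECTION

# Rough averages per case (headings and case labels are included in the counts).
words_per_case = PDF_WORDS / total_cases
pages_per_case = PDF_PAGES / total_cases

print(total_cases)               # 225
print(round(words_per_case, 1))  # 297.2
print(round(pages_per_case, 1))  # 1.6
```

On these numbers, a typical generated teaching is about 300 words and spans fewer than two pages.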

The sections dedicated to nuclear medicine, fetal imaging, ultrasound imaging, and historical “Roentgen Classics” were excluded. These exclusions were made to maintain consistency with lesion-based cross-sectional radiological differential diagnosis teaching and to focus on subspecialties most commonly represented in structured diagnostic reasoning frameworks.

LLM environment

Dataset generation was performed using the following environment:

  • Model: ChatGPT-4o

  • Provider: OpenAI

  • Interface: Web-based interface

  • Model access date: March 2025

  • Conversation memory: Disabled

Each thematic topic was processed in an independent chat session. No conversation history was reused across topics.

To reduce potential personalization or adaptation effects related to prior interactions, a newly created user account was used exclusively for dataset generation. This measure was implemented to minimize contextual carryover and to improve output independence across cases.

No external plugins, browsing tools, or additional system instructions were activated during generation.

Prompt development

Prompt engineering was conducted iteratively through internal testing prior to final dataset generation. The objective was to obtain outputs that were structurally consistent, educational in tone, organized by differential diagnosis categories, explicit in diagnostic reasoning, and reproducible across thematic topics.

Multiple candidate prompts were tested and refined. Because complex prompts resulted in variable outputs, the following simple yet precise final prompt, which provided the best results, was retained:

“I am a radiology resident preparing for my final radiology exam. Please provide a concise radiological summary, from an exam-oriented perspective, of the following:

Specialty: [[subspecialty name (e.g., Musculoskeletal)]]

Topic: [[Key Imaging Finding (e.g., Sequestrum)]]”

In this final prompt, only the subspecialty name and Key Imaging Finding were manually updated to correspond to each processed case; the rest of the prompt was left unchanged. All prompts were written in English. After the final prompt was chosen, the answers were collected in a single pass, with one attempt per prompt; we did not retry any prompt.
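For readers who wish to regenerate the prompts programmatically, the template can be sketched as follows. This is an illustration only: the authors entered the prompts manually in the web interface, the names PROMPT_TEMPLATE and build_prompt are hypothetical, and the exact line breaks of the original prompt are assumed.

```python
# Fixed wording of the final prompt quoted above; only the two
# placeholders change between cases. Line breaks are an assumption.
PROMPT_TEMPLATE = (
    "I am a radiology resident preparing for my final radiology exam. "
    "Please provide a concise radiological summary, from an exam-oriented "
    "perspective, of the following:\n\n"
    "Specialty: {specialty}\n\n"
    "Topic: {topic}"
)


def build_prompt(specialty: str, topic: str) -> str:
    """Fill the template for one case, leaving the fixed wording untouched."""
    return PROMPT_TEMPLATE.format(specialty=specialty.strip(), topic=topic.strip())


# Example using the values given in the prompt description.
example_prompt = build_prompt("Musculoskeletal", "Sequestrum")
print(example_prompt)
```

Keeping the fixed wording in a single constant makes it easy to confirm that two prompts differ only in the specialty and topic fields.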

Dataset generation protocol

For each thematic key imaging finding, the following standardized procedure was applied:

  1. A new chat session was initiated in the web interface.

  2. The finalized structured prompt was entered, specifying the subspecialty and thematic topic.

  3. The complete model output was copied verbatim into a Word document.

  4. The case number was manually added at the top of the output.

  5. Original formatting (including headings, bold text, bullet points, and spacing) was preserved.

  6. No editorial modification, correction, summarization, or medical validation was performed.

Interactive or conversational concluding phrases generated by the model (e.g., “Would you like more details on …”) were intentionally retained to preserve authenticity of the output and maintain fidelity to the original generation context.

The dataset therefore represents unaltered LLM-generated content captured at a defined timepoint.
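The procedure above amounts to storing each output unchanged under a case label. A minimal sketch of such a record is shown below; the class CaseRecord, the function render_case, and the "Case N" heading wording are hypothetical and are not taken from the dataset itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """One generated teaching; field names are illustrative assumptions."""
    specialty: str
    case_number: int
    topic: str
    raw_output: str  # model output, stored verbatim (no edits or trimming)


def render_case(record: CaseRecord) -> str:
    """Put the case number on top and append the untouched output."""
    heading = f"Case {record.case_number}"  # heading wording is an assumption
    return f"{heading}\n\n{record.raw_output}"


# Placeholder text stands in for a real model output; the closing
# conversational phrase is kept, as described in the protocol.
record = CaseRecord(
    specialty="Musculoskeletal",
    case_number=2,
    topic="Sequestrum",
    raw_output="(model output placeholder)\n\nWould you like more details on ...",
)
print(render_case(record))
```

Because the record is frozen and the renderer only prepends a heading, the stored text cannot be altered by accident after capture.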

Dataset structure

The RAD-CaseBookLLM-08 dataset is organized by radiology subspecialty.

For each subspecialty:

  • One master document contains the complete list of LLM-generated teachings (n = 25 cases per specialty) corresponding to all thematic key findings within that section.

  • Cases are structured sequentially and labeled according to the case numbering system of the source textbook to enable future comparative or benchmarking studies.

  • Each case heading in the Word (.docx) version is formatted using the “Title 1” style to allow structured navigation via document navigation panels.

Two file formats are provided:

  • Microsoft Word (.docx) format

  • Converted PDF format
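Because each case heading in the Word version carries a dedicated heading style, the master documents can in principle be split into individual cases. The sketch below assumes the paragraphs have already been extracted as (style name, text) pairs, for example with a .docx parsing library; the function split_cases is hypothetical, and the style name seen by such a library may differ from the "Title 1" label shown in the Word interface.

```python
from typing import Iterable


def split_cases(
    paragraphs: Iterable[tuple[str, str]],
    heading_style: str = "Title 1",  # assumed style name; adjust to the file
) -> dict[str, list[str]]:
    """Group body paragraphs under the most recent case heading."""
    cases: dict[str, list[str]] = {}
    current = None
    for style, text in paragraphs:
        if style == heading_style:
            # A heading starts a new case; duplicate headings would overwrite.
            current = text.strip()
            cases[current] = []
        elif current is not None and text.strip():
            # Non-empty body text is kept unchanged under the current case.
            cases[current].append(text)
    return cases


# Toy input standing in for paragraphs extracted from a master document.
sample = [
    ("Title 1", "Case 1"),
    ("Normal", "First paragraph of case 1."),
    ("Normal", ""),
    ("Title 1", "Case 2"),
    ("Normal", "First paragraph of case 2."),
]
print(split_cases(sample))
```

Text that appears before the first heading is ignored in this sketch, which may or may not be appropriate for a given master document.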

A summary dataset overview with a list of key imaging findings per specialty is provided in Tables 1–3.

Table 1. List of cardiothoracic, gastrointestinal, and genitourinary key imaging findings.

CARDIOTHORAX | GASTROINTESTINAL | UROGENITAL
Solitary Pulmonary Nodule (SPN) | Hyperdense Liver | Solid Renal Mass
Multiple Pulmonary Nodules | Nodular Liver Contour | Multiple Bilateral Renal Lesions/Masses
Cavitary Pulmonary Mass | Esophageal Diverticulum | Cystic Renal Mass
Miliary Pulmonary Nodules | Solitary Hypodense, Hypovascular Liver Mass | Retroperitoneal Mass
Centrilobular Pulmonary Nodules | Multiple Hypodense Liver Masses | Cortical Nephrocalcinosis
Cystic Lung Disease | Cystic Mass at Porta Hepatis | Medullary Nephrocalcinosis
Lower Lobe Interstitial Lung Disease (ILD) | Esophageal Submucosal Mass | Striated Nephrogram
Upper Lobe Interstitial Lung Disease (ILD) | Esophageal Dilatation | Papillary Necrosis
Hyperlucent Lung | Esophageal Outpouchings | Staghorn Calculus
Anterior Mediastinal Mass | Esophageal Ulcers | Renal Cortical Defect
Middle Mediastinal Mass | Solid Pancreatic Mass | Renal Pelvis Mass
Posterior Mediastinal Mass | Linitis Plastica | Medial Deviation of the Ureters
Chronic Air-Space Disease | Gastric Ulcer | Ureteral Filling Defects
Peripheral Air-Space Disease | Gastric Fold Thickening | Renal Migration Anomaly
Ground-Glass Opacification (GGO) | Cecal Mass | Bladder Filling Defect
Mediastinal/Hilar Lymphadenopathy | Mesenteric Mass | Bilateral Cystic Renal Disease
Calcified Pleural Disease | Terminal Ileal Wall Thickening | Perinephric Fluid Collection
Bronchiectasis | Colonic Wall Thickening | Pear-Shaped Bladder
Perilymphatic Pulmonary Nodules | Small Bowel Wall Thickening | Prostate Enlargement
Pleural-Based Mass | Esophageal Stricture | Bladder Rupture
Parenchymal Disease in a Patient with HIV | Small Bowel Dilatation | Bladder Wall Calcifications
Abnormal Left Ventricular Contour | Cystic Pancreatic Mass | Adrenal Mass
Cardiac Mass | Hypervascular Liver Mass | Fatty Retroperitoneal Mass
Delayed Myocardial Enhancement (DME) | Multiple Splenic Nodules | Dilated Ureter
Cardiac Wall Fat | Intrahepatic Biliary Ductal Strictures | Urethral Stricture

Table 2. List of musculoskeletal, head and neck, and neuro key imaging findings.

MSK | HEAD AND NECK | NEURO
FOG MACHINE (Mnemonic for Multifocal Lytic Lesions) | Enhancing Orbital Mass | Confluent White Matter Lesions
Sequestrum | Orbital Rim Fracture | Confluent White Matter Lesions in a Child
Periosteal Reaction in an Infant | Cavernous Sinus Mass/Enhancement | Ring-Enhancing Lesions in Brain & Spine
Rugger Jersey Spine | Aggressive Sinus Disease with Bony Destruction | Pineal Region Mass
Sacroiliitis | Unilateral Parotid Mass | Sellar/Suprasellar Mass in a Child
Proximal Arthropathy (MCP Joints) | Bilateral Parotid Enlargement | Posterior Fossa Mass in a Child
Distal Arthropathy (IP Joints) | Orbital Muscle Enlargement | Posterior Fossa Mass in an Adult
Erosive Arthropathy of the Foot | Mucosal Space Mass | Posterior Fossa Cyst
Chondrocalcinosis | Masticator Space Mass | Cerebellopontine Angle (CPA) Mass
Vertebra Plana in a Child | Carotid Space Mass | Cerebellar Tonsillar Herniation
Wormian Bones | Retropharyngeal Mass | Cerebrospinal Fluid (CSF)-Lined Cortical Cleft
Madelung Deformity | Clival Mass | Enhancing Intramedullary Spinal Mass
Lucent Metaphyseal Bands | Vascular Injury to the Neck | Intradural Extramedullary (IDEM) Spinal Mass
Medullary/Chondroid Lesion | Globe Lesion in a Child | Diffuse Temporal Lobe Mass
Acro-Osteolysis | Optic Nerve Enlargement and Enhancement | Increased T2 Signal Intensity in Basal Ganglia/Thalami in a Child
Dense Joint Effusion | Pachymeningeal (Dural) Enhancement | Intraparenchymal Hemorrhage (IPH)
Loose Bodies with Erosions | Middle Ear Mass | Corpus Callosal Lesion
Expansile Rib Lesion in a Child | Temporal Bone Trauma with Mastoid Fluid | Subependymal Nodules
Posterior Element Lytic Lesion | Inner Ear Congenital Malformations | Massive Supratentorial CSF Collection in a Newborn
Carpal Dislocation | Floor of the Mouth Mass | Intraventricular Mass
Periarticular Soft Tissue Calcifications | Aggressive Nasal Mass in an Adolescent | Cerebellar Atrophy
Benign Expansile Lytic Lesion | Cystic Neck Mass | Spinal Cord Signal Abnormalities
Multiple Sclerotic Foci in the Pelvis | Jugular Foramen Mass | Cortically Based Enhancing Neoplasm
Vertebral Body Wedge Fracture | Petrous Apex Lesion | Epidural Spinal Mass
Epiphyseal Equivalent Lucent Lesions | Leptomeningeal Enhancement | Prominent Periventricular/Basal Ganglia Cystic Lesions

Table 3. List of pediatrics, vascular and interventional, and breast key imaging findings.

PEDIATRICS | VASCULAR & INTERVENTIONAL | BREAST
Neonatal Lung Disease with Low Lung Volumes | Post-Intervention Vascular Complication | Breast Implant Defect
Neonatal Lung Disease with Increased Lung Volumes | Carotid Artery Stenosis | Suspicious Enhancement on Breast MRI
Cyanotic Infant with Decreased Pulmonary Blood Flow | Renal Transplant Vascular Complications | Complex Cystic Mass in a Lactating Woman
Cyanotic Infant with Increased Pulmonary Blood Flow | Digital Artery Occlusion/Ischemia | Coarse Calcifications in a Partially Circumscribed Breast Mass
Shunt Vascularity | Subclavian Vein Occlusion | Benign-Appearing Calcifications in the Breast
Solid Pulmonary Mass in Pediatrics | Great Vessel Stenosis | Malignant-Appearing Calcifications (Linear, Branching Forms)
Liver Mass in an Infant | Renal Artery Stenosis | Fatty Breast Lesion
Suprarenal Mass in a Child | Intraparenchymal Renal Artery Aneurysms | Well-Circumscribed Breast Mass in a Young Woman
Renal Mass in a Child | Hypervascular Pulmonary Mass | Unilateral Skin Thickening in the Breast
Cystic Renal Lesion (Pediatrics) | Infrarenal Aortic Occlusion | Axillary Lymphadenopathy
Subglottic Narrowing | Popliteal Artery Occlusion | Mass with Central Lucency
Neonatal Distal Bowel Obstruction | Extratesticular Mass | Well-Circumscribed Solid Breast Mass
Enterocolitis in an Immunocompromised Child | Inferior Vena Cava (IVC) Vascular Anomaly/Abnormality | Well-Circumscribed Cystic Breast Mass
Skeletal Dysplasia | Hypervascular Renal Mass | Ductal Mass
“Double Bubble” Sign | Prominent Paraspinal Flow Voids | Postoperative Changes
Posterior Vertebral Body Scalloping | Suprasellar Mass in an Adult | Bilateral Skin Thickening
Presacral Mass | Hypervascular Intracranial Mass | Breast Lesion in a Man
Long Bone Aggressive Lesion | Aortic Dissection | Well-Circumscribed Breast Cancer
Endobronchial Lesion in a Child | Lower Gastrointestinal (GI) Bleeding | Developing Asymmetry
Generalized Increased Bone Density | Vascular Ring/Sling | Infiltrative Breast Mass
Lytic Skull Lesion in a Child | Urinary Obstruction | Breast Lesion with Nipple Discharge
Avascular Necrosis (AVN) of the Femoral Head in Children | TIPS Dysfunction | Unilateral Nipple/Skin Changes
Vascular Anomaly with Esophageal and Tracheal Compression | Biliary Duct Obstruction | Superficial Breast Lesion
Neonatal Cystic Lung Lesion | Traumatic Aortic Injury (TAI) | Large Breast Mass
Esophageal Obstruction in a Neonate | Celiac Axis Stenosis/Occlusion | Complex Cystic and Solid Breast Mass

how to cite this article
Saliba T and Fahrni G. RAD-CaseBookLLM-08: An open-access dataset of structured large language model–generated radiology differential diagnosis teachings [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2026, 15:333 (https://doi.org/10.12688/f1000research.178297.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 23 Mar 2026
Shawn Lyo, Hospital of the University of Pennsylvania, Philadelphia, USA 
Approved with Reservations
Summary:
This data note describes the creation of RAD-CaseBookLLM-08, an open-access dataset of large language model–generated radiology differential diagnosis teaching texts. The authors generated 225 cases spanning nine radiology subspecialties using a standardized prompt applied to lesion-based “key imaging ...
Lyo S. Reviewer Report For: RAD-CaseBookLLM-08: An open-access dataset of structured large language model–generated radiology differential diagnosis teachings [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2026, 15:333 (https://doi.org/10.5256/f1000research.196667.r465016)
  • Author Response, 10 Apr 2026
    Guillaume Fahrni, Department of Diagnostic and Interventional Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
    We sincerely thank the Reviewer for the thorough, constructive, and insightful evaluation of our manuscript. The comments have helped us meaningfully improve the manuscript. We address each point below.

    ...
Reviewer Report 17 Mar 2026
Craig S Webster, The University of Auckland, Auckland, Auckland, New Zealand 
Approved
Title: The title is a noun cluster which makes it hard to understand. Better to try to use some small words to break up the nouns, e.g. “An open-access dataset of radiology differential diagnosis teachings generated with a large-language model” ...
Webster CS. Reviewer Report For: RAD-CaseBookLLM-08: An open-access dataset of structured large language model–generated radiology differential diagnosis teachings [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2026, 15:333 (https://doi.org/10.5256/f1000research.196667.r465013)