Keywords
thoracic CT, artificial intelligence, open-source, workflow orchestration, multi-task analysis
Thoracic computed tomography frequently reveals incidental findings that are underreported in routine workflows due to time pressure and task-specific interpretation focus. While open-source artificial intelligence tools exist for secondary CT analysis tasks, they typically operate in isolation and require heterogeneous technical environments, limiting their practical integration. We present OpenOrchestratorCT-Pilot, an open-source Python framework designed to coordinate multiple independent AI modules for automated thoracic CT analysis within a unified pipeline.
The framework follows a runner–orchestrator architecture in which a central pipeline dispatches a NIfTI CT input to multiple analysis modules, each running in an isolated Conda environment. Three modules were integrated: multi-organ segmentation with volumetric abnormality detection (TotalSegmentator), emphysema quantification based on low-attenuation area thresholding, and deep learning–based pulmonary nodule detection. The framework was evaluated on two datasets: an institutional open dataset of 87 thoracic CT scans across five pathological categories (ChestPathCT5-S100) and a publicly available COVID-19 CT dataset of 20 cases from two acquisition sources.
All 87 cases from ChestPathCT5-S100 completed with full pipeline success across all three modules, with a mean processing time of 3.1 minutes per case. In the COVID-19 dataset, 90% of cases completed successfully; the two partial failures both originated from a subset of non-isotropic volumes reconstructed from stacked axial images, highlighting the framework’s dependency on input data quality. Module outputs demonstrated expected patterns but also revealed limitations inherent to the integrated tools, including false-positive nodule detections and artifactual emphysema flags in pneumothorax cases.
OpenOrchestratorCT-Pilot demonstrates the feasibility of orchestrating heterogeneous open-source AI modules for multi-task thoracic CT analysis. It represents a potential complementary layer for secondary finding detection, while prospective clinical validation of the individual modules remains necessary before any clinical use.
Thoracic computed tomography (CT) is among the most widely used imaging modalities in clinical practice.1 Beyond its primary diagnostic indication, thoracic CT frequently reveals incidental findings such as cardiac chamber enlargement, emphysema, or pulmonary nodules, which may carry important clinical implications but are often underreported in routine workflows.2,3 One key contributing factor is time pressure, particularly when image interpretation is focused on a specific clinical question, such as coronary artery evaluation in cardiac CT, leading to reduced attention to secondary findings.4,5
Artificial intelligence (AI) tools have emerged as promising solutions for structured and automated CT analysis.6 However, most available tools operate in isolation, each requiring dedicated environments, dependencies, and technical expertise for deployment.7 In addition, these tools are typically designed for single, task-specific applications, contributing to fragmentation in the medical AI ecosystem and potentially leading to cumulative costs and lost time when multiple tools are needed to cover different diagnostic tasks.8
In particular, in the context of thoracic CT, numerous secondary tasks can be envisioned, some of which are already addressed by existing open-source tools (AI-based or not). These include, for example, pulmonary nodule detection and characterization, emphysema quantification, multi-organ segmentation, volumetric abnormality detection, and body composition analysis (e.g., bone and muscle assessment). Additional tasks may also extend beyond the lungs, such as the detection of upper abdominal pathologies, aortic aneurysms, or other extra-thoracic findings captured within the scan range.9–11
Open-source algorithms offer a potential pathway to mitigate these limitations by providing accessible and cost-effective alternatives. While a growing number of such tools are available, they often remain disconnected from clinical workflows and require further validation and integration efforts to be practically usable.12,13
In this context, open-source tools could play a complementary “second-reader” role, assisting with secondary tasks such as incidental finding detection or multi-organ assessment, while specialized, state-of-the-art proprietary models remain focused on primary diagnostic objectives.14,15
A major challenge in leveraging multiple AI tools simultaneously lies in the heterogeneity of input formats, preprocessing requirements, and execution pipelines, reflecting the absence of standardized integration frameworks.16,17 Consequently, there is a need for intermediate solutions capable of taking a single imaging input (e.g., a thoracic CT scan), orchestrating its transformation and distribution across multiple AI modules, and aggregating outputs into a unified and interpretable report.
In this work, we present OpenOrchestratorCT-Pilot, an open-source Python framework designed to address this need. The tool coordinates multiple independent, third-party AI modules for automated thoracic CT analysis. Specifically, it (1) executes several validated AI modules within isolated environments, (2) manages data conversion and inter-module input/output compatibility, and (3) generates a structured report highlighting clinically relevant findings. The source code is openly available on GitHub (https://github.com/gfahrni/OpenOrchestratorCT-Pilot) under an MIT license. A versioned release of the repository has been archived on Zenodo18 to ensure reproducibility and long-term preservation. Its applicability is demonstrated through two use cases: an institutional open dataset of thoracic CT scans and a publicly available COVID-19 CT dataset.
OpenOrchestratorCT-Pilot was designed following a runner–orchestrator architecture, in which a central pipeline coordinates multiple independent processing modules ("runners"), each encapsulating a distinct external analysis tool (Figure 1). This design enables modular integration of heterogeneous algorithms within a unified workflow.

Figure 1. Schematic of the OpenOrchestratorCT-Pilot pipeline. A NIfTI CT file serves as input to the orchestrator, which dispatches it to three analysis modules, each running in its own isolated Conda environment via dedicated runner scripts that handle format conversion and execution. Each module produces a structured result, and the orchestrator aggregates these into a unified summary report.
Each module implements a standardized BaseRunner interface, consisting of two core methods: check_installation(), which verifies environment and dependency availability, and run(), which executes the analysis task. This abstraction ensures interoperability and facilitates the addition of new modules without modifying the core pipeline.
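To make the runner abstraction concrete, the following is a minimal Python sketch of what such an interface could look like. Only the names `BaseRunner`, `check_installation()`, and `run()` come from the text; `RunnerResult`, `run_all()`, and the method signatures are hypothetical and may differ from the repository's actual classes.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class RunnerResult:
    """Hypothetical container for one module's structured output."""
    module: str
    success: bool
    outputs: dict = field(default_factory=dict)
    error: Optional[str] = None


class BaseRunner(ABC):
    """Interface each analysis module implements; method names follow the text."""

    name = "base"  # hypothetical identifier, also used as the module's subfolder name

    @abstractmethod
    def check_installation(self) -> bool:
        """Return True if the module's environment and dependencies are available."""

    @abstractmethod
    def run(self, nifti_path: Path, work_dir: Path) -> RunnerResult:
        """Analyse one NIfTI CT volume, writing intermediate files to work_dir."""


def run_all(runners: list[BaseRunner], nifti_path: Path, work_dir: Path) -> list[RunnerResult]:
    """Minimal orchestrator loop: one subfolder per module, failures kept per module."""
    results = []
    for runner in runners:
        if not runner.check_installation():
            results.append(RunnerResult(runner.name, False, error="not installed"))
            continue
        module_dir = work_dir / runner.name
        module_dir.mkdir(parents=True, exist_ok=True)
        try:
            results.append(runner.run(nifti_path, module_dir))
        except Exception as exc:  # one failing module must not abort the others
            results.append(RunnerResult(runner.name, False, error=str(exc)))
    return results
```

In this sketch, a module that raises is recorded as a failed result instead of stopping the loop, so a case can end as a partial rather than a total failure, and new modules only need to subclass `BaseRunner`.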
To prevent dependency conflicts between modules, each runner is executed within its own isolated Conda environment using conda run. A dedicated temporary working directory is created for each pipeline execution; within this directory, each module operates in a separate subfolder, ensuring clean input/output separation and enabling systematic cleanup after execution.
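A minimal sketch of the isolation mechanism follows, assuming each module exposes a Python entry script. `conda run -n ENV` is a standard Conda subcommand; the environment names, script paths, function names, and the one-hour timeout are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def conda_command(env_name: str, script: Path, *args: str) -> list:
    """Build a `conda run` call executing a module script in its own environment."""
    return ["conda", "run", "-n", env_name, "python", str(script), *args]


def run_in_env(env_name: str, script: Path, *args: str, timeout_s: int = 3600):
    """Execute the module script; return (exit code, stdout, stderr)."""
    proc = subprocess.run(conda_command(env_name, script, *args),
                          capture_output=True, text=True, timeout=timeout_s)
    return proc.returncode, proc.stdout, proc.stderr


def create_workspace(module_names: list) -> tuple:
    """One temporary directory per pipeline execution, one subfolder per module."""
    root = Path(tempfile.mkdtemp(prefix="orchestrator_"))
    subdirs = {}
    for name in module_names:
        subdirs[name] = root / name
        subdirs[name].mkdir()
    return root, subdirs


def cleanup_workspace(root: Path) -> None:
    """Remove the whole per-execution workspace once results have been collected."""
    shutil.rmtree(root, ignore_errors=True)
```

Because each module only sees its own subfolder, intermediate files cannot collide, and a single `rmtree` call is enough to clean up after the run.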
In this framework, OpenOrchestratorCT-Pilot does not embed or redistribute the external analysis modules it coordinates. Instead, users are responsible for independently installing and maintaining each module within its dedicated Conda environment. This design choice preserves modularity, respects upstream licensing constraints, and allows each component to evolve independently.
For this pilot implementation, three representative tasks were selected to demonstrate the technical feasibility of the framework: organ segmentation with volumetric abnormality detection, emphysema quantification, and pulmonary nodule detection. Importantly, this work is intended to illustrate the orchestration of heterogeneous tools rather than to assess the clinical performance or relevance of the individual algorithms.
The first module integrates TotalSegmentator for automated multi-organ segmentation from thoracic CT images in NIfTI format. TotalSegmentator is a deep learning–based framework that provides comprehensive anatomical segmentation and quantitative measurements.19
In this study, four structures were specifically analyzed: the heart, thyroid gland, and bilateral adrenal glands. These organs were selected based on their relevance for incidental findings in thoracic imaging. Segmentation was performed using the --fast configuration (3 mm model) along with the --statistics option, generating a structured statistics.json output. Extracted organ volumes (converted from mm³ to mL) were compared against normative reference values derived from recent literature.20 Volumes exceeding predefined thresholds were flagged as potentially abnormal.
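The volume check can be sketched as below. We assume statistics.json maps each structure name to a dictionary whose "volume" entry is in mm³, and that the four structures use TotalSegmentator v2 class names; the reference limits are passed in by the caller and are not the normative values of reference 20.

```python
import json

# Assumed TotalSegmentator (v2) class names for the four analysed structures
TARGET_ORGANS = ("heart", "thyroid_gland", "adrenal_gland_left", "adrenal_gland_right")
MM3_PER_ML = 1000.0  # 1 mL = 1 cm^3 = 1000 mm^3


def load_statistics(path):
    """Read the statistics.json written by TotalSegmentator's --statistics option."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def assess_organ_volumes(stats: dict, upper_limits_ml: dict) -> dict:
    """Return {organ: (volume in mL, status)} using caller-supplied upper limits."""
    assessment = {}
    for organ in TARGET_ORGANS:
        volume_ml = stats.get(organ, {}).get("volume", 0.0) / MM3_PER_ML
        if volume_ml <= 0:
            status = "not detected"  # e.g. organ outside the scan range
        elif volume_ml > upper_limits_ml[organ]:
            status = "flagged"
        else:
            status = "within reference"
        assessment[organ] = (round(volume_ml, 2), status)
    return assessment
```

Treating a zero volume as "not detected" rather than as abnormal keeps organs outside the field of view (such as the thyroid in short craniocaudal scans) from being flagged.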
The second module performs emphysema quantification using an open-source tool based on density thresholding. Specifically, it computes the percentage of low-attenuation areas below −950 Hounsfield Units (LAA-950), a widely used quantitative biomarker of emphysema severity.21
Because the underlying implementation requires DICOM input, an intermediate conversion from NIfTI to DICOM is performed within the pipeline using the nibabel and pydicom libraries. Special care is taken to preserve spatial orientation and voxel geometry, including axis transpositions and flips to match clinical imaging conventions. To maintain strict module isolation, the pipeline dynamically generates and executes a temporary Python script within the module’s environment via conda run, avoiding any modification of the original repository.
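The array-reordering part of such a conversion can be sketched with NumPy alone. This assumes the volume was first reoriented to RAS+ (for example with `nibabel.as_closest_canonical`) and its spacing read from `img.header.get_zooms()`; writing the actual DICOM datasets with pydicom is omitted, and the pipeline's own transpositions and flips may differ.

```python
import numpy as np


def ras_to_dicom_axial(volume_ras: np.ndarray, zooms) -> tuple:
    """Reorder a RAS+ (i, j, k) volume into axial slices in DICOM (LPS) display order.

    Returns (slices[k, row, col], [row_spacing, col_spacing], slice_spacing).
    """
    if volume_ras.ndim != 3:
        raise ValueError("expected a 3D volume")
    dx, dy, dz = (float(z) for z in zooms[:3])
    # (i, j, k) -> (k, j, i): one 2D slice per axial position, rows along j, columns along i
    slices = np.transpose(volume_ras, (2, 1, 0))
    # RAS -> LPS display: rows run anterior -> posterior, columns run patient right -> left
    slices = slices[:, ::-1, ::-1]
    # DICOM PixelSpacing is [row spacing, column spacing], i.e. [dy, dx] here
    return np.ascontiguousarray(slices), [dy, dx], dz
```

Keeping the reorientation separate from DICOM header writing makes the geometry easy to unit-test.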
The module outputs the LAA-950 percentage, and a finding is flagged as significant emphysema when this value reaches or exceeds 10% (LAA-950 ≥ 10%).
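The LAA-950 computation and the 10% flag can be sketched as follows, assuming a CT array in Hounsfield Units and a binary lung mask of the same shape. How the emphysema tool derives its lung mask is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

LAA_THRESHOLD_HU = -950.0       # attenuation cut-off from the text
SIGNIFICANT_LAA_PERCENT = 10.0  # LAA-950 >= 10% -> significant emphysema


def laa950_percent(ct_hu: np.ndarray, lung_mask: np.ndarray) -> float:
    """Percentage of lung voxels with attenuation below -950 HU."""
    lung = ct_hu[lung_mask.astype(bool)]
    if lung.size == 0:
        raise ValueError("lung mask is empty")
    return 100.0 * np.count_nonzero(lung < LAA_THRESHOLD_HU) / lung.size


def is_significant_emphysema(laa_percent: float) -> bool:
    return laa_percent >= SIGNIFICANT_LAA_PERCENT
```

Because the metric only counts voxels inside the mask, any intrapleural air (around −1000 HU) included in the lung mask is counted as low-attenuation area, which is one plausible mechanism for false flags in pneumothorax.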
The third module integrates a deep learning–based pulmonary nodule detection framework relying on a 3D Vision Transformer architecture (VitDet3D) trained on the LUNA16 dataset22 (available at: https://github.com/rlsn/LungNoduleDetection). The model processes volumetric CT data to identify candidate nodules using a sliding-window strategy (crop size 40 × 128 × 128, stride ratio 0.75) and outputs bounding box predictions with associated logit scores.
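The window placement can be illustrated with a short sketch. We assume the crop size is ordered (z, y, x), that the stride equals the crop size times the stride ratio, and that a final window is shifted to sit flush with the volume edge; the repository's exact edge handling (shifting versus padding) is an assumption.

```python
def window_starts(size: int, crop: int, stride_ratio: float = 0.75) -> list:
    """Start indices along one axis so that windows of length `crop` cover `size`."""
    if size <= crop:
        return [0]  # volume shorter than the crop: a single (padded) window
    stride = max(1, int(crop * stride_ratio))
    starts = list(range(0, size - crop + 1, stride))
    if starts[-1] != size - crop:
        starts.append(size - crop)  # final window flush with the volume edge
    return starts


def sliding_windows(shape, crop=(40, 128, 128), stride_ratio=0.75) -> list:
    """All (z, y, x) window origins for a volume of the given shape."""
    per_axis = [window_starts(s, c, stride_ratio) for s, c in zip(shape, crop)]
    return [(z, y, x) for z in per_axis[0] for y in per_axis[1] for x in per_axis[2]]
```

With a stride ratio of 0.75, neighbouring windows overlap by a quarter of the crop, so a nodule near a window border is still seen whole by an adjacent window. For a 100 × 512 × 512 volume this yields 3 × 5 × 5 = 75 windows.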
For each detected candidate, the module outputs world coordinates (in millimeters), estimated diameter, and a confidence score derived from the model's logit output; candidates are retained according to a logit threshold (default 0.0, corresponding to a predicted probability of 50%). Nodules with a diameter ≥ 6 mm are flagged as clinically significant according to the Fleischner Society guidelines (2017). All detections are compiled into a structured JSON output, which the orchestrator integrates into the final summary report.
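The relation between the logit threshold and the probability, and the 6 mm size flag, can be sketched as follows. The `Candidate` fields and output keys are hypothetical, and retaining candidates whose logit equals the threshold (inclusive comparison) is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Candidate:
    center_mm: Tuple[float, float, float]  # world coordinates in millimetres
    diameter_mm: float
    logit: float


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: sigmoid(0) == 0.5."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def summarize_candidates(candidates, logit_threshold=0.0, min_significant_mm=6.0):
    """Keep candidates at or above the logit threshold and flag those >= 6 mm."""
    kept = []
    for c in candidates:
        if c.logit < logit_threshold:  # logit >= 0.0 <=> probability >= 0.5
            continue
        kept.append({
            "center_mm": list(c.center_mm),
            "diameter_mm": c.diameter_mm,
            "probability": sigmoid(c.logit),
            "significant": c.diameter_mm >= min_significant_mm,
        })
    return kept
```

The returned list contains only JSON-serializable types, so it can be written directly with `json.dump` for the orchestrator to pick up.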
Following execution of all enabled modules, results are aggregated into a structured plain-text report generated by a dedicated reporting script (Figure 2). The report summarizes module execution status, quantitative outputs (e.g., organ volumes with reference ranges), and all flagged findings based on predefined thresholds.

Figure 2. The report includes: (1) organ segmentation with measured volumes for heart, thyroid, and adrenal glands, (2) emphysema quantification expressed as the percentage of low-attenuation areas below −950 Hounsfield Units (LAA-950), and (3) lung nodule detection with coordinates and size of detected nodules. The structured output illustrates the type of summary provided by the automated pipeline.
Each report is automatically named using the input scan identifier and a timestamp to ensure traceability and reproducibility. A standardized disclaimer is included to emphasize the research-only nature of the tool.
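Report naming and assembly can be sketched as below. The file-name pattern, the per-module result schema, and the disclaimer wording are hypothetical; only the ingredients (scan identifier, timestamp, execution status, flagged findings, research-only disclaimer) come from the text.

```python
from datetime import datetime
from typing import Optional

# Hypothetical wording; the tool's actual disclaimer text is not reproduced here
DISCLAIMER = "RESEARCH USE ONLY - automated outputs are not validated for clinical decisions."


def report_filename(scan_id: str, when: Optional[datetime] = None) -> str:
    """Scan identifier plus timestamp, e.g. case001_20250102_030405_report.txt."""
    when = when or datetime.now()
    return f"{scan_id}_{when:%Y%m%d_%H%M%S}_report.txt"


def render_report(scan_id: str, results: dict, when: Optional[datetime] = None) -> str:
    """results maps module name -> {"success": bool, "lines": [...], "flags": [...]}."""
    when = when or datetime.now()
    lines = [f"Scan: {scan_id}", f"Generated: {when:%Y-%m-%d %H:%M:%S}", ""]
    flagged = []
    for module, res in results.items():
        lines.append(f"[{module}] {'OK' if res['success'] else 'FAILED'}")
        lines.extend(f"  {text}" for text in res.get("lines", []))
        flagged.extend(f"{module}: {flag}" for flag in res.get("flags", []))
    lines += ["", "Flagged findings:"]
    lines += [f"  - {item}" for item in flagged] or ["  none"]
    lines += ["", DISCLAIMER]
    return "\n".join(lines)
```

Passing the same `when` value to both functions keeps the timestamp in the file name and in the report body identical.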
The pipeline was developed for Python 3.11 and requires Conda for environment management. It supports execution on CUDA-enabled NVIDIA GPUs and was tested in a Linux HPC environment.
Installation involves cloning the main repository, creating a dedicated orchestrator environment, and setting up individual Conda environments for each module. External module repositories are cloned into a designated modules/ directory.
Pipeline configuration is centralized in a single config.yaml file, allowing users to enable or disable modules and define device allocation and file paths.
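How such a configuration could drive module selection is sketched below. The dictionary stands in for what `yaml.safe_load` would return for a config.yaml; all key names, environment names, and devices are illustrative and not the repository's actual schema.

```python
# Python equivalent of a hypothetical config.yaml (as yaml.safe_load would return it);
# key names are illustrative, not the repository's actual schema.
EXAMPLE_CONFIG = {
    "device": "cuda:0",
    "paths": {"modules_dir": "modules/", "output_dir": "reports/"},
    "modules": {
        "totalsegmentator": {"enabled": True, "conda_env": "totalseg"},
        "emphysema": {"enabled": True, "conda_env": "emphysema", "device": "cpu"},
        "nodules": {"enabled": False, "conda_env": "nodules"},
    },
}


def enabled_modules(config: dict) -> list:
    """Return (module, conda environment, device) for every enabled module, in config order."""
    default_device = config.get("device", "cpu")
    selected = []
    for name, settings in config.get("modules", {}).items():
        if settings.get("enabled", False):
            selected.append((name, settings["conda_env"], settings.get("device", default_device)))
    return selected
```

A per-module `device` entry overrides the global default, so a CPU-only module can share a configuration with GPU-based ones.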
ChestPathCT5-S100 is an open, curated, anonymized institutional dataset23 comprising 87 thoracic CT scans representative of five common pathological categories: rib fracture (RF, N = 14), pleural fluid (N = 16), pulmonary mass (N = 17), pulmonary embolism (PE, N = 20), and pneumothorax (N = 20). The dataset includes both contrast-enhanced and non-contrast acquisitions, reflecting heterogeneous clinical acquisition conditions. All scans were processed in batch mode on the institutional HPC cluster using a SLURM batch script (1 GPU, 16 cores, and 128 GB RAM per job; maximum wall-clock time of 1 hour per job).
All 87 cases (100%) completed with full success across all three modules, with no partial or total failures observed in any pathology group (Table 1). Total batch processing time was 268.0 minutes, corresponding to a mean of approximately 3.1 minutes per case.
Table 1. Pipeline execution outcomes, ChestPathCT5-S100. For each group, the number of cases (N) and the proportion of cases achieving complete success, partial failure, or total failure (%) across all three analysis modules are reported, along with total batch processing runtime. RF: rib fracture; PE: pulmonary embolism. Complete success: all three modules produced a valid output. Partial failure: at least one module failed while others completed. Total failure: no module produced a valid output.
| Group | N | Complete success (%) | Partial failure (%) | Total failure (%) |
|---|---|---|---|---|
| Overall | 87 | 100 | 0 | 0 |
| RF | 14 | 100 | 0 | 0 |
| Fluid | 16 | 100 | 0 | 0 |
| Mass | 17 | 100 | 0 | 0 |
| PE | 20 | 100 | 0 | 0 |
| Pneumothorax | 20 | 100 | 0 | 0 |
| Total runtime | 268.0 min (~3.1 min/case) | | | |
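The outcome definitions in the table caption translate directly into a small classification function. The sketch below (function names hypothetical) also reproduces the per-case runtime arithmetic, 268.0 min / 87 cases ≈ 3.1 min per case.

```python
from collections import Counter

OUTCOMES = ("complete success", "partial failure", "total failure")


def classify_case(module_ok: dict) -> str:
    """module_ok maps module name -> True if it produced a valid output."""
    n_ok = sum(bool(v) for v in module_ok.values())
    if n_ok == len(module_ok):
        return "complete success"
    if n_ok == 0:
        return "total failure"
    return "partial failure"


def summarize_batch(cases: list, total_runtime_min: float) -> tuple:
    """Percentage of cases per outcome and mean runtime per case (minutes)."""
    if not cases:
        raise ValueError("no cases")
    counts = Counter(classify_case(case) for case in cases)
    percentages = {o: round(100.0 * counts[o] / len(cases), 1) for o in OUTCOMES}
    return percentages, round(total_runtime_min / len(cases), 1)
```

For the COVID-19 dataset, 18 complete and 2 partial cases give 90% and 10%, matching the overall row of the corresponding table.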
Across the 87 processed cases, the three modules collectively produced the following outputs (Table 2). TotalSegmentator flagged at least one pathological organ volume in 12.6% of cases overall, with the highest rate in the RF group (21.4%); mean heart volume was 519.33 ± 123.72 mL across the cohort. A high rate of undetected peripheral organs was observed (87.4% of cases with ≥1 undetected organ, mean 1.61 ± 0.94 per case), predominantly affecting the thyroid gland (63.2%) and adrenal glands, which is expected given the limited field of view of standard thoracic protocols. Emphysema quantification yielded a mean LAA-950 of 2.57 ± 6.08%, with 9.2% of cases exceeding the ≥10% significance threshold; the pneumothorax group showed the highest values (mean 7.94 ± 10.24%, 30.0% pathological rate), likely reflecting artifactual inclusion of intrapleural air by the density-thresholding algorithm, while the pleural fluid and pulmonary mass groups showed near-zero values with no flagged cases. Lung nodule detection identified at least one candidate in 78.2% of cases, totalling 292 candidates across the cohort (mean 3.36 ± 9.46 per case, mean diameter 7.68 ± 3.76 mm, 59.6% solid); the pneumothorax group showed the highest detection rate (100%) and mean count (8.15 ± 18.23 per case), with a standard deviation more than twice the mean, indicating that a few cases accounted for most candidates and suggesting an elevated false-positive rate, potentially driven by the heterogeneous density interfaces created by the partially collapsed lung parenchyma in pneumothorax cases.
Table 2. Module outputs, ChestPathCT5-S100. Results from the three analysis modules are reported per group. Section 1: TotalSegmentator organ volumes and pathological flags based on published normative thresholds. Section 2: emphysema quantification expressed as LAA-950; cases ≥10% are considered pathological. Section 3: lung nodule detection prevalence, count, diameter, and morphological classification. RF: rib fracture; PE: pulmonary embolism; LAA-950: low-attenuation area below −950 HU; SD: standard deviation.
The COVID-19 CT dataset is a publicly available collection of 20 thoracic CT scans from patients diagnosed with COVID-19,24 assembled from two distinct sources: 10 scans acquired from coronacases.org and 10 scans sourced from radiopaedia.org. All scans are provided in NIfTI format and include expert segmentations of lung parenchyma and COVID-19 infection regions. Importantly, the two subsets differ markedly in their acquisition characteristics: the coronacases scans are isotropic volumetric acquisitions, whereas the radiopaedia scans were originally crawled as stacked axial JPEG/PNG images and subsequently converted to NIfTI, resulting in non-isotropic volumes with degraded inter-slice consistency. This heterogeneity, which reflects real-world variability in publicly available imaging data, provided an opportunity to assess pipeline robustness across input quality conditions.
Pipeline robustness and runtime
Of the 20 cases processed, 18 (90%) completed with full success across all three modules (Table 3). The 2 partial failures (10%) were both from the Radiopaedia subset, in which the emphysema and nodule detection modules failed while TotalSegmentator completed successfully. Total batch processing time was 37.9 minutes, corresponding to a mean of approximately 1.9 minutes per case.
Table 3. Pipeline execution outcomes, COVID-19 CT dataset. For each subset, the number of cases (N) and the proportion of cases achieving complete success, partial failure, or total failure (%) across all three analysis modules are reported, along with total batch processing runtime. Complete success: all three modules produced a valid output. Partial failure: at least one module failed while others completed. Total failure: no module produced a valid output.
| Subset | N | Complete success (%) | Partial failure (%) | Total failure (%) |
|---|---|---|---|---|
| Overall | 20 | 90 | 10 | 0 |
| Coronacases | 10 | 100 | 0 | 0 |
| Radiopaedia | 10 | 80 | 20 | 0 |
| Total runtime | 37.9 min (~1.9 min/case) | | | |
Across the 20 processed cases, results differed markedly between the two source subsets (Table 4). TotalSegmentator flagged at least one pathological organ volume in 5.0% of cases overall (10.0% in the Coronacases subset, 0.0% in the Radiopaedia subset); all 20 cases had at least one undetected peripheral organ (mean 3.0 ± 1.05 per case), with the thyroid gland absent in 100% of cases across both subsets, consistent with the limited craniocaudal coverage of thoracic CT acquisitions. Heart volume could only be meaningfully computed in cases with valid segmentation output: mean was 478.86 ± 97.47 mL for Coronacases, while the Radiopaedia subset yielded largely invalid segmentations (mean 96.65 ± 86.75 mL), reflecting the non-isotropic volume geometry of these scans. Emphysema quantification was available only for the Coronacases subset (N = 10), yielding a mean LAA-950 of 0.43 ± 0.39% with no cases exceeding the pathological threshold; the module failed entirely for all Radiopaedia cases, again attributable to their non-standard acquisition geometry. Lung nodule detection similarly succeeded only in the Coronacases subset, with nodules detected in 90.0% of cases (mean 5.9 ± 4.87 per case, mean diameter 12.83 ± 5.52 mm, 94.9% solid, total 59 candidates); no nodules were detected in any Radiopaedia case, consistent with the module failure in this subset.
Table 4. Module outputs, COVID-19 CT dataset. Results from the three analysis modules are reported per subset. Section 1: TotalSegmentator organ volumes and pathological flags based on published normative thresholds; heart volume is computed over valid segmentations only (volume > 0 mL). Section 2: emphysema quantification expressed as LAA-950; cases ≥10% are considered pathological; results unavailable for the Radiopaedia subset due to module failure. Section 3: lung nodule detection prevalence, count, diameter, and morphological classification; no detections were obtained for the Radiopaedia subset. LAA-950: low-attenuation area below −950 HU; SD: standard deviation.
In this work, we presented OpenOrchestratorCT-Pilot, an open-source Python framework designed to coordinate multiple independent AI modules for automated thoracic CT analysis, and demonstrated its applicability across two heterogeneous datasets totalling 107 cases.
A key strength of OpenOrchestratorCT-Pilot lies in its modular architecture. While three modules were integrated in this pilot implementation, the framework is designed to accommodate additional modules with minimal development effort. Adding a new module requires only implementing the standardized runner interface and registering the module in the central configuration file. This low integration barrier makes the framework readily extensible to other CT analysis tasks, such as aortic diameter measurement, bone density assessment, or other user-defined tasks.
A specific advantage is the use of isolated Conda environments for each module. This design directly addresses one of the most common practical obstacles in deploying multiple AI tools simultaneously: dependency conflicts. By ensuring that each module operates in its own environment, the framework allows tools with incompatible requirements to coexist without interference.
Regarding technical limitations, the results from the COVID-19 dataset use case highlighted a fundamental dependency of the framework on input data quality. While the cases with standard isotropic volumetric acquisitions achieved 100% pipeline success, 20% of Radiopaedia cases experienced partial failure, with two modules unable to process non-isotropic volumes reconstructed from stacked axial images. This illustrates an important caveat: users must ensure that the input NIfTI files meet the format and geometry requirements of each integrated module.
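One way to act on this caveat is a pre-flight geometry check before dispatching a volume to the modules. The sketch below assumes the shape and voxel spacing are read from the NIfTI header (for example via nibabel's `img.shape` and `img.header.get_zooms()`) after reorientation so that axis 2 is axial; the anisotropy and slice-count thresholds are hypothetical and would need tuning per module.

```python
def geometry_warnings(shape, zooms, max_anisotropy=3.0, min_slices=64) -> list:
    """Return human-readable warnings for volumes likely to break downstream modules."""
    if len(shape) != 3 or len(zooms) < 3:
        return ["expected a 3D volume"]
    spacing = [float(z) for z in zooms[:3]]
    if min(spacing) <= 0:
        return ["non-positive voxel spacing in header"]
    warnings = []
    ratio = max(spacing) / min(spacing)
    if ratio > max_anisotropy:
        warnings.append(f"anisotropic voxels (spacing ratio {ratio:.1f})")
    if shape[2] < min_slices:  # assumes axis 2 is the axial (craniocaudal) axis
        warnings.append(f"only {shape[2]} axial slices")
    return warnings
```

Such a check would not repair the input, but it would let the orchestrator report a predictable input problem instead of an opaque module failure.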
A second practical limitation concerns computational requirements. The framework itself is lightweight, but its utility depends entirely on the modules it coordinates, several of which rely on deep learning inference and require NVIDIA GPU resources. In this study, processing was performed on an HPC cluster with dedicated GPU allocation. Deployment in settings without such infrastructure may therefore be constrained, and this dependency should be factored into any integration planning.
Regarding clinical limitations, an important distinction must be drawn between the orchestration performance of the framework and the clinical performance of the modules it integrates. As demonstrated in this study, pipeline robustness is entirely independent of the clinical validity of the outputs produced. The framework can successfully execute all modules and generate a complete report on cases where individual module outputs are unreliable or clinically misleading. This was evident in several findings from our use cases. In the pneumothorax group, the emphysema quantification module flagged 30% of cases as pathological based on LAA-950 thresholding, likely due to artifactual inclusion of intrapleural air rather than true parenchymal destruction. Similarly, the nodule detection module yielded a substantially higher mean nodule count in the pneumothorax group (8.15 ± 18.23 per case) compared to other groups, with a high standard deviation suggesting an elevated false-positive rate driven by the heterogeneous density interfaces of partially collapsed lung. These observations reflect the intrinsic performance limits of the underlying tools, not failures of the orchestration layer. This distinction underscores the need for rigorous, prospective clinical validation of each module prior to integration into any workflow intended for clinical use.
Regarding its potential role in the medical AI ecosystem, OpenOrchestratorCT-Pilot is not intended to replace specialized, high-performance proprietary tools focused on primary diagnostic tasks. Rather, it is positioned as a complementary layer, a “radiology copilot”, capable of systematically addressing secondary tasks that are frequently overlooked in time-pressured clinical workflows. In this paradigm, state-of-the-art proprietary models remain focused on the primary clinical question, while the orchestrator mobilizes open-source tools to cover the analytical periphery.
From an economic standpoint, the reliance on open-source modules represents a meaningful advantage in contexts where commercial AI solutions are cost-prohibitive or unavailable. The framework offers a pathway toward structured, multi-task CT analysis without recurring licensing costs, provided that adequate computational infrastructure is available.
Nonetheless, the path from technical feasibility to clinical deployment remains long. The results presented here should be interpreted as a proof of concept for the orchestration approach, not as a clinical validation of the integrated modules. Future work should focus on prospective evaluation of module performance against reference standards, integration with clinical PACS workflows, and expansion of the module library to cover a broader range of thoracic and extrathoracic findings.
We present OpenOrchestratorCT-Pilot, an open-source Python framework capable of coordinating multiple independent AI modules for automated secondary-task thoracic CT analysis within a unified and reproducible pipeline. Across two heterogeneous datasets, the framework achieved robust execution while exposing the expected sensitivity to input data quality and the independence of orchestration performance from module-level clinical validity.
• Source code available on GitHub: https://github.com/gfahrni/OpenOrchestratorCT-Pilot
• Archived software available on Zenodo: https://doi.org/10.5281/zenodo.19278111
• License: MIT License
Zenodo: ChestPathCT5-S100. https://doi.org/10.5281/zenodo.18256796 (ref. 23).
This dataset is available under the terms of the Creative Commons Zero v1.0 Universal (CC0 1.0) license.
Zenodo: COVID-19 CT Lung and Infection Segmentation Dataset. This is a third-party dataset. The data are openly accessible and can be downloaded by any reader or reviewer via Zenodo at https://doi.org/10.5281/zenodo.3757475.
Zenodo: OpenOrchestratorCT-Pilot - Raw processing results and individual reports. https://doi.org/10.5281/zenodo.19758808 (ref. 25).
This project contains the following extended data:
• hpc-reports-textfiles.zip: Collection of the 107 individual plain-text reports generated by the pipeline, containing all extracted anatomical and pathological metrics.
• data_table1.xlsx to data_table4.xlsx: Raw data sheets containing the individual values, execution statuses, and variables used to calculate the summary measures reported in the manuscript’s tables.
Data are available under the terms of the Creative Commons Attribution 4.0 International (CC-BY 4.0) license.
The authors acknowledge the use of artificial intelligence–based language tools to assist with phrasing and editing of the manuscript. All scientific content, study design, and conclusions are solely the responsibility of the authors.