Software Tool Article

BISCUIT: An Open-Source Platform for Visual Comparison of Segmentation Models in Bioimage Analysis

[version 1; peer review: awaiting peer review]
PUBLISHED 18 Nov 2025

This article is included in the NEUBIAS - the Bioimage Analysts Network gateway.

Abstract

Background

Segmentation in microscopy images is a critical task in bioimage analysis, with many deep learning models available (e.g., Cellpose, Omnipose, StarDist, SAM-based models). However, researchers often face challenges in choosing the most suitable model for their data, as quantitative metrics do not always reflect the biological relevance of segmentation results.

Methods

We developed BISCUIT (BioImage Segmentation Comparison Utility and Interactive Tool), an open-source platform that enables users to run multiple state-of-the-art segmentation algorithms on the same images and visually compare their outputs side-by-side. BISCUIT is implemented as an interactive Jupyter Notebook pipeline, leveraging existing segmentation libraries, and can be executed either via a zero-installation cloud environment (Google Colab) or on local high-performance computing resources.

Results

Using BISCUIT, we demonstrate how visual inspection of segmentation outputs can reveal qualitative differences between algorithms that may be overlooked by abstract performance metrics. For example, in a fluorescence microscopy image dataset, BISCUIT allowed direct comparison of segmentations from Cellpose, Omnipose, and StarDist, highlighting differences in how each algorithm delineated cell boundaries. This visual approach helped identify the model that produced the most biologically plausible segmentation for the dataset.

Conclusions

BISCUIT provides an intuitive platform for bioimage analysts and life scientists to evaluate and “see what really works” on their data. The platform is openly available and extensible, lowering the barrier for researchers to perform rapid, interactive benchmarking of segmentation models on their own microscopy data.

Keywords

Bioimage segmentation, Deep learning, Model comparison, Microscopy, Cellpose, StarDist, Visual assessment, Open source tool

Introduction

Accurate segmentation in microscopy images is a foundational step in many biological studies,1,2 enabling quantitative analysis of morphology, distribution, and behavior. In recent years, deep learning methods have achieved state-of-the-art performance in cell segmentation.3–11 Notably, generalist frameworks like Cellpose12,13 and its variants (e.g., Omnipose14,15 for complex bacterial shapes) can segment a wide range of cell types without retraining, and specialized methods like StarDist excel at nuclei segmentation by representing objects as star-convex polygons.16–19 With such a diverse toolkit of algorithms available, a new challenge arises20,21: how to determine which segmentation method works best for a given dataset or experimental context, in a fast, user-friendly, and easily repeatable manner. Traditionally, researchers compare algorithms using quantitative metrics, such as Intersection-over-Union (IoU) or Dice scores, against a ground-truth segmentation.3,22–28 However, this approach requires ground truth, which is often difficult and time-consuming to obtain in practice.20 Different tools may also expect ground truth in different formats, adding further complexity. As a result, despite the lack of formal rigor, many researchers rely on visual inspection of segmentation results to assess quality. In everyday practice, this “looking at the images” approach becomes the decisive step, as it directly reflects biological plausibility. BISCUIT29 is designed precisely around this idea: instead of requiring ground truth, it enables side-by-side visual comparison of outputs from multiple segmentation methods.

Currently, there is a lack of user-friendly tools for directly comparing multiple segmentation algorithms on the same images in a visual, interactive manner. Researchers often need to run each algorithm separately and manually overlay or juxtapose results, which is time-consuming and requires technical scripting skills. To address this gap, we present BISCUIT, the BioImage Segmentation Comparison Utility and Interactive Tool.29 BISCUIT is an open-source platform for visually comparing state-of-the-art segmentation models on microscopy images. It was designed with bioimage analysts and life scientists in mind, providing an intuitive way to evaluate segmentation quality across different algorithms without deep expertise in each algorithm’s code or parameters. By facilitating side-by-side visualization of segmentation outputs, BISCUIT enables users to leverage their domain knowledge and visual intuition when selecting a model, rather than relying solely on summary statistics.

By enabling rapid, visual benchmarking, BISCUIT complements existing evaluation methods and lowers the barrier for scientists to adopt the most suitable segmentation tools.

Methods

Implementation

BISCUIT is implemented as an interactive Jupyter Notebook pipeline written in Python. The core functionality of BISCUIT centers on running multiple segmentation algorithms on the same input images and aggregating their outputs for side-by-side visualization. To achieve this, BISCUIT interfaces with open-source segmentation libraries and pretrained models. In the current version, three state-of-the-art families of cell segmentation models are integrated by default: Cellpose, Omnipose, and StarDist. These models were chosen because they represent widely used and complementary approaches to segmentation: Cellpose for general-purpose cell and nucleus segmentation,12 Omnipose as an extension14 of Cellpose tailored to handle challenging morphologies like elongated or branched cells,15 and StarDist for precise nuclear segmentation using shape priors.16 In addition, several models from the BioImage Model Zoo, a community repository of pretrained models for bioimage analysis,30 have been included, further broadening the range of available approaches. BISCUIT’s modular architecture enables the addition of new models, so the platform can evolve as new methods become available.

Under the hood, BISCUIT applies each selected model to the input dataset and collects the results. Users provide microscopy images (e.g., TIFF, PNG, or any other common format supported by the Pillow Python library), which are processed through each model’s Python API with the pre-trained weights. For example, Cellpose and StarDist are invoked via their respective Python libraries to segment images. Most of the computations can leverage GPU acceleration, keeping prediction times short even when multiple models are applied to multiple images.
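To make this concrete, the following is a minimal sketch, not taken from the BISCUIT code base, of how two of the integrated models can be invoked on a single-channel image through their public Python APIs; the file name is hypothetical, and the exact class names, pretrained model identifiers, and arguments vary between library versions.

```python
# Minimal sketch of applying two pretrained models to one image and collecting
# the label masks in a dictionary, similar in spirit to how BISCUIT processes
# each selected model. Exact class names and arguments differ between
# Cellpose/StarDist versions.
from skimage.io import imread
from csbdeep.utils import normalize
from cellpose import models as cp_models
from stardist.models import StarDist2D

img = imread("example_image.tif")  # hypothetical input file

# Cellpose: generalist cytoplasm model with pretrained weights
cp = cp_models.Cellpose(gpu=True, model_type="cyto")
cp_masks, flows, styles, diams = cp.eval(img, diameter=None, channels=[0, 0])

# StarDist: pretrained 2D fluorescence nuclei model
sd = StarDist2D.from_pretrained("2D_versatile_fluo")
sd_labels, _ = sd.predict_instances(normalize(img))

# Collect outputs per model for downstream visualization and comparison
results = {"cellpose_cyto": cp_masks, "stardist_2D_versatile_fluo": sd_labels}
```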

After segmentation, BISCUIT focuses on the visualization and comparison of results. Segmentation outputs from each model are typically instance labels, binary masks, or probability maps. BISCUIT renders these outputs in an interactive manner, for example by overlaying colored segmentation masks on the original images, or by showing side-by-side panels (original image next to the segmentation result from each model). The interactive notebook allows users to scroll through image sets. This design was informed by the principle of “visual first” evaluation, prioritizing clear visualization of segmentation boundaries, differences in object counts, and other qualitative features across models.
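As a simple illustration of this side-by-side rendering, the sketch below assumes a raw image and a dictionary mapping model names to instance-label masks (such as the `results` dictionary from the previous snippet) and overlays each model's labels on the original image with matplotlib and scikit-image; BISCUIT's interactive widgets for scrolling through image sets are not reproduced here.

```python
# Minimal sketch of a side-by-side comparison panel: raw image next to a
# colored label overlay (with object outlines) for each model.
import matplotlib.pyplot as plt
from skimage.color import label2rgb
from skimage.segmentation import find_boundaries

def show_comparison(img, results):
    """Display the raw image next to a colored overlay for each model."""
    n = len(results)
    fig, axes = plt.subplots(1, n + 1, figsize=(4 * (n + 1), 4))
    axes[0].imshow(img, cmap="gray")
    axes[0].set_title("raw image")
    for ax, (name, labels) in zip(axes[1:], results.items()):
        ax.imshow(label2rgb(labels, image=img, bg_label=0, alpha=0.3))
        ax.contour(find_boundaries(labels), colors="yellow", linewidths=0.5)
        ax.set_title(name)
    for ax in axes:
        ax.axis("off")
    plt.tight_layout()
    plt.show()

# Example usage, assuming `img` and `results` from the previous sketch:
# show_comparison(img, results)
```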

From a technical standpoint, BISCUIT requires a Python 3 environment with several key libraries installed. These include the deep learning frameworks and model-specific dependencies (e.g., TensorFlow or PyTorch for StarDist and Cellpose, respectively, as well as the Cellpose/Omnipose and StarDist packages themselves), and common image processing libraries such as NumPy, OpenCV, and scikit-image. The Jupyter notebook environment also uses matplotlib for plotting and image display. We have provided an environment configuration (e.g., a requirements.txt and Conda environment file in the repository) to ensure that users can install all necessary packages. Because deep learning models are computationally intensive, a machine with a modern GPU and sufficient memory is recommended for local execution of BISCUIT, especially on large collections of images.
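As an illustration, the snippet below (not part of the BISCUIT notebooks themselves) shows one way to sanity-check that the key dependencies listed above are importable and that a GPU is visible to PyTorch before launching an analysis; package names follow their usual import names, and the exact pinned versions in the repository's environment files may differ.

```python
# Hedged sketch: verify that the main dependencies described above are
# installed and that GPU acceleration is available. The package list is based
# on the dependencies named in the text, not on BISCUIT's requirements.txt.
import importlib

required = ["numpy", "cv2", "skimage", "matplotlib",
            "torch", "tensorflow", "cellpose", "stardist"]

for name in required:
    try:
        mod = importlib.import_module(name)
        version = getattr(mod, "__version__", "unknown version")
        print(f"{name:12s} OK ({version})")
    except ImportError:
        print(f"{name:12s} MISSING - install it before running the notebook")

# GPU check (recommended for large image collections)
try:
    import torch
    print("CUDA available for PyTorch:", torch.cuda.is_available())
except ImportError:
    pass
```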

Operation

The operation of BISCUIT is designed to be straightforward for end-users, requiring minimal software installation or configuration. We offer two primary modes of use:

  • 1. Zero-Installation via Browser

Users can run BISCUIT directly in their web browser using Google Colab. A one-click link (Run BISCUIT Now!) is provided on the project website,29 which opens the BISCUIT Google Colab Notebook. In this mode, all necessary dependencies and model weights are automatically fetched within the Colab environment. No local installation is needed, and the user is only required to have a Google account and an internet connection. Once the notebook is open, the BISCUIT interface guides the user through each step. The workflow in the notebook is as follows:

  • Setup: The notebook will first install required libraries (such as the segmentation model packages) in the Colab session.

  • Input Data: Example microscopy images are provided within the notebook, allowing users to get started immediately. Users can upload their own images (Colab provides an upload widget). In subsequent steps, users specify the channel to be analyzed and define the region of interest within the images.

  • Model Selection: The notebook interface allows users to select segmentation models from a searchable table (see Figure 1) that provides details such as model family, architecture, version, target, modality, dimensionality, training data, strengths/limitations, expected channels, and documentation links. Users can currently choose from 11 available models.

  • Running Segmentation: After selection, the user executes the notebook cell to run the segmentation. BISCUIT will process the images with each selected model sequentially and store the results.

  • Visualization of Results: Once processing is complete, BISCUIT provides an interactive interface to inspect and compare model outputs (see Figure 2). For each selected image, the notebook displays a panel that includes the raw image, an overlap map highlighting agreements and disagreements between models, individual instance masks, and outline overlays on the raw data. Models are compared in pairs, and a bar plot summarizes per-model disagreement scores (mean ± SD), offering a quantitative complement to the visual inspection. The mean disagreement score for a given model is the average, over all analysed images and all pairings with the other models, of the pixel-based differences between the two segmentations (a minimal code sketch of this computation is given at the end of this section). Assuming that model prediction inaccuracies are uncorrelated between models, the model with the lowest score yields predictions closest to the ground truth.31–35 Users can switch between models and images using dropdown menus and sliders, enabling fast, side-by-side evaluation of segmentation performance.

  • Segmentation output: Based on the comparison plots and visual inspection, the user selects the best-performing model from an interactive list and then applies it to segment the entire image stack. Users may also upload additional files for processing, and the resulting segmented images are saved for downstream analysis or storage.

Figure 1. Model selection interface in BISCUIT.

The searchable table enables users to filter models by various parameters (e.g., target, modality, or model family) and find the segmentation tools best suited to their data. Once selected, models can be applied directly within the notebook environment.

Figure 2. Example of pairwise comparison of segmentation models in BISCUIT.

For a selected image, the interface displays the raw input, an overlap map indicating agreements and disagreements between two models, individual instance segmentations, and outline overlays on the raw data. A summary bar plot further shows mean semantic disagreement (±SD) across all models, enabling both qualitative and quantitative comparison. This figure compares Model 1 (nuclei, left; lowest mean semantic difference) with Model 2 (worm_omni, right; highest mean semantic difference).

  • 2. Local or HPC Installation

For users requiring more control or aiming to run large-scale analyses, BISCUIT can be installed on local machines or HPC clusters. The software is open-source and available in a GitHub repository, which includes documentation for installation. Installing BISCUIT involves setting up the Python environment with the required libraries and downloading the pre-trained weights for the segmentation models (the repository provides instructions to fetch these assets). The minimal system requirements for running BISCUIT locally include a Python 3.8+ environment, approximately 4–8 GB of RAM (depending on image sizes), and a CUDA-compatible GPU with at least 16 GB of GPU memory to accelerate model inference. The package dependencies include the main deep learning frameworks (TensorFlow 2.x for StarDist and PyTorch for Cellpose/Omnipose).

When running locally, users can either launch the Jupyter Notebook interface or integrate BISCUIT’s components into their own pipelines. For instance, an imaging core facility might deploy BISCUIT on a server and run the notebook for various user projects, possibly connecting to a web interface for uploading images.

The workflow overview remains similar in the local scenario: load images, run the selected models, and then review the outputs. On an HPC cluster, one might use JupyterLab to provide the same notebook experience to users. A key feature of BISCUIT is its scalability, following a ‘prototype then scale’ approach. Users can rapidly test models on a few images in the browser and then move to an HPC deployment to process hundreds or thousands. Because the same model versions are used across environments, results remain consistent, enabling researchers to progress seamlessly from exploration to full dataset analysis without switching tools.
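The per-model disagreement score introduced in the Visualization of Results step can be summarized with a short sketch; the implementation below follows the verbal description in this article (pixel-wise semantic differences averaged over all model pairings and analysed images) rather than BISCUIT's actual code.

```python
# Sketch of the per-model mean disagreement score (mean ± SD), following the
# description above: for each model, the pixel-based (semantic) difference to
# every other model is averaged over all analysed images.
import numpy as np

def semantic_difference(labels_a, labels_b):
    """Fraction of pixels where the binarized foreground masks disagree."""
    return np.mean((labels_a > 0) != (labels_b > 0))

def disagreement_scores(segmentations):
    """segmentations: dict mapping model name -> list of label images,
    with all lists ordered by the same analysed images."""
    models = list(segmentations)
    scores = {}
    for m in models:
        diffs = [
            semantic_difference(a, b)
            for other in models if other != m
            for a, b in zip(segmentations[m], segmentations[other])
        ]
        scores[m] = (np.mean(diffs), np.std(diffs))  # mean ± SD per model
    return scores
```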

Conclusions/Discussion

We have introduced BISCUIT, an open-source interactive platform to compare and evaluate bioimage segmentation models visually, and illustrated how it can assist researchers in selecting the most appropriate segmentation model for their bioimage analysis needs. In doing so, we address a critical gap in the bioimage analysis workflow: the ability to benchmark segmentation algorithms based on qualitative output characteristics and biological plausibility, not only numerical performance metrics.

Side-by-side visualization of segmentation results can reveal strengths and weaknesses of algorithms that aggregate metrics might hide. BISCUIT puts the expert “in the loop” by enabling direct visual inspection, thus empowering users to apply their biological knowledge when evaluating models. This approach aligns with the way many image scientists inherently validate results - by looking at overlays and pictures - and BISCUIT formalizes and streamlines that process.

Benefits and Unique Features: The advantages of BISCUIT can be summarized in three main points, echoing the design principles outlined on the project’s website: Zero Setup, Scalable by Design, and Visual-First. Zero Setup refers to the ease of use via a web browser with no installation, which lowers the entry barrier for non-technical users. Scalable by Design means that BISCUIT can be run on modest datasets in the cloud or scaled to large datasets on HPC, providing a continuum from quick testing to large-scale application. Visual-First emphasizes the focus on qualitative, image-level assessment, which is the core of what BISCUIT offers. To our knowledge, BISCUIT is one of the first tools specifically catering to interactive model output comparison in the context of bioimage segmentation. While some existing software, such as the image analysis platforms Ilastik,36 Napari,37 and Fiji,38 allows running multiple algorithms or plugins on images, these tools often do not provide an integrated side-by-side comparison workflow or require substantial user setup. BISCUIT’s contribution is in unifying multiple segmentation approaches under one roof.

Limitations: Despite its utility, BISCUIT has some limitations that we aim to address in future work. First, the platform currently supports a defined set of models (11 in total). If users need to compare other algorithms (for example, Ilastik classical segmentation, or proprietary software outputs), they may need to invest some effort to integrate those into the BISCUIT framework. Second, complementary scores could be implemented on top of the mean model disagreement: for instance, reporting standard metric scores for each model when ground truth is available or, when it is not, asking the user to flag the preferred segmentation in a subset of images and tallying a “preference count”.
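Such a ground-truth-based complementary score could be as simple as a semantic Intersection-over-Union per model; the snippet below is a minimal sketch of this idea, not an existing BISCUIT feature.

```python
# Sketch of a complementary score that could be reported when ground truth is
# available: semantic IoU between a model's mask and the reference mask.
# Illustrative only; not part of the current BISCUIT code base.
import numpy as np

def semantic_iou(pred_labels, gt_labels):
    """Intersection-over-Union of the binarized foreground masks."""
    pred = pred_labels > 0
    gt = gt_labels > 0
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union
```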

The development of BISCUIT also opens up community-driven possibilities. For example, the platform can be extended to compare other classes of image-processing models, such as object detection, denoising, or image classification models. Numerous models are being developed in each of these classes, and visual inspection can bring benefits similar to those demonstrated here for segmentation.

Another exciting direction is using BISCUIT in educational settings: for teaching microscopy image analysis, instructors could use BISCUIT to demonstrate how different algorithms behave on the same data, helping students visually grasp concepts like under- vs over-segmentation, false positives vs false negatives, etc.

In conclusion, BISCUIT addresses an important need in the era of diverse AI-driven image analysis methods: it helps bridge the gap between algorithm developers and end-users by providing a simple yet powerful means for straightforward comparison of segmentation approaches. We believe this approach will contribute to more reliable and reproducible image analyses, as the human expert remains engaged in the validation loop rather than deferring entirely to automated metrics. As bioimage informatics advances, tools like BISCUIT will be essential for helping researchers leverage computational methods to extract accurate biological insights.

Software availability

Declaration of AI-assisted writing

During the preparation of this manuscript, the authors used ChatGPT (GPT-5, OpenAI, 2025) to assist in improving phrasing, grammar, and clarity, as well as for help with summarizing/shortening and rewording text sections. All scientific content, interpretations, and conclusions were written, reviewed, and approved by the authors, who take full responsibility for the final manuscript.
