DTAShiny: An Interactive R Shiny Application for Diagnostic Test Accuracy Analysis and Visualization

Syed Faizaan Shah Quadri; Ahmad Mahmood; Joanne Lac

doi:10.12688/f1000research.171468.1


Software Tool Article

[version 1; peer review: 3 approved with reservations]

Syed Faizaan Shah Quadri¹, Ahmad Mahmood¹, Joanne Lac ²

PUBLISHED 30 Oct 2025

¹ Royal Free Hospital, London, England, UK
² University College London Medical School, London, England, UK

Syed Faizaan Shah Quadri
Roles: Writing – Original Draft Preparation, Writing – Review & Editing

Ahmad Mahmood
Roles: Conceptualization, Project Administration, Resources, Supervision, Writing – Original Draft Preparation

Joanne Lac
Roles: Project Administration, Writing – Review & Editing

This article is included in the RPackage gateway.

This article is included in the Bioinformatics gateway.

Abstract

Diagnostic Test Accuracy (DTA) analyses are essential in clinical research and medical decision-making. Despite the availability of R packages such as pROC and ROCR, their use often requires programming expertise, limiting accessibility for many clinical researchers. An interactive, code-free method is therefore needed to enhance usability and understanding.

We developed DTAShiny, an R Shiny-based web application that allows users to upload diagnostic datasets in CSV or Stata format, calculate DTA metrics, and generate dynamic visualizations. The application is built using shiny, bs4Dash, pROC, ggplot2, and other packages. It includes heuristic-based automatic detection of reference and test variables and offers real-time threshold adjustment via an interactive slider.

DTAShiny computes standard metrics (sensitivity, specificity, PPV, NPV, and AUC) and advanced metrics (F1 score and balanced accuracy). Approximate confidence intervals accompany sensitivity, specificity, PPV, and NPV. The tool generates ROC and PR curves, distribution plots, and a calibration-style plot. Real-time interactivity enables users to observe trade-offs as thresholds change.

The Zenodo deposit contains the DTAShiny source code, an example anonymised dataset, and documentation for running the app locally.

Keywords

Diagnostic Test Accuracy, DTA, R Shiny, ROC Analysis, Sensitivity, Specificity, Predictive Values, Interactive Visualization, Threshold Selection, pROC.

Corresponding author: Joanne Lac

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2025 Quadri SFS et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Quadri SFS, Mahmood A and Lac J. DTAShiny: An Interactive R Shiny Application for Diagnostic Test Accuracy Analysis and Visualization [version 1; peer review: 3 approved with reservations]. F1000Research 2025, 14:1185 (https://doi.org/10.12688/f1000research.171468.1) First published: 30 Oct 2025, 14:1185 (https://doi.org/10.12688/f1000research.171468.1) Latest published: 30 Oct 2025, 14:1185 (https://doi.org/10.12688/f1000research.171468.1)

Introduction

Diagnostic Test Accuracy (DTA) studies play a pivotal role in evidence-based medicine by showing how well a test distinguishes between individuals with and without a specific condition. Key performance indicators such as sensitivity, specificity, predictive values, and the area under the receiver operating characteristic (ROC) curve (AUC) are used to quantify test performance. When the test yields continuous results, choosing an appropriate threshold becomes a crucial step, as it significantly influences the calculated metrics and, ultimately, clinical decisions.

While statistical software packages support DTA analysis, few offer interactive tools that let users explore the effect of threshold selection dynamically and visualize its impact in real time. To bridge this gap, we developed DTAShiny, a web-based application built using R and the Shiny framework. DTAShiny lets users upload data, automatically suggests relevant variables, supports interactive threshold adjustment, and generates a wide array of metrics and visualizations.

Methods

Implementation

DTAShiny is developed in the R programming language (version 4.0+ recommended) using the Shiny web application framework. The user interface is built with the bs4Dash package to create a clean, modern dashboard layout that is both intuitive and responsive.

The core functionalities are supported by several R packages:

• shiny, bs4Dash: For the web application structure and user interface elements.
• tidyverse: Primarily ggplot2 for creating plots (boxplots, histograms, density plots, precision-recall curve, calibration-like plot) and dplyr for data manipulation (e.g., in the calibration-like plot).
• DT (DataTables): For displaying interactive tables of DTA metrics.
• pROC: Used for generating the ROC curve and calculating the AUC.
• haven: For reading Stata (.dta) files.

Data input and variable detection

Users can upload data in CSV or Stata format. DTAShiny uses simple heuristic rules to automatically suggest which columns represent the reference standard and the index test:

1. The reference standard column: It preferentially selects a column named “status” if it contains binary (0/1) numeric data. Failing that, it searches for other numeric columns containing only 0s and 1s. As a fallback, it selects the first column.
2. The index test column: It preferentially selects a column named “test_value” if present. Otherwise, it chooses the first numeric column that was not selected as the reference standard.

These automatic selections are meant to streamline setup, though users are encouraged to verify their accuracy before proceeding.
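
The two rules above can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the app's actual code: the function names `is_binary01` and `suggest_columns` are hypothetical, and details the article does not specify (for example, how missing values or all-missing columns are treated) are guesses.

```r
# Hypothetical helper: TRUE for a numeric column whose non-missing values are all 0 or 1.
is_binary01 <- function(x) {
  values <- x[!is.na(x)]
  is.numeric(x) && length(values) > 0 && all(values %in% c(0, 1))
}

# Hypothetical re-implementation of the column-suggestion heuristics.
suggest_columns <- function(df) {
  cols <- names(df)

  # Reference standard: "status" if it is binary numeric, else the first
  # binary numeric column, else the first column as a fallback.
  binary_cols <- cols[vapply(df, is_binary01, logical(1))]
  reference <- if ("status" %in% binary_cols) {
    "status"
  } else if (length(binary_cols) > 0) {
    binary_cols[1]
  } else {
    cols[1]
  }

  # Index test: "test_value" if present, else the first numeric column
  # that was not chosen as the reference (NA if there is none).
  numeric_cols <- setdiff(cols[vapply(df, is.numeric, logical(1))], reference)
  index_test <- if ("test_value" %in% cols) {
    "test_value"
  } else if (length(numeric_cols) > 0) {
    numeric_cols[1]
  } else {
    NA_character_
  }

  list(reference = reference, index_test = index_test)
}

# Toy data with unconventional names: falls through to the second rule of each step.
toy <- data.frame(id = c("a", "b", "c", "d"),
                  disease = c(0, 1, 1, 0),
                  marker = c(1.2, 3.4, 2.9, 0.8))
suggest_columns(toy)
```

With the toy data, the sketch suggests `disease` as the reference standard and `marker` as the index test. A dataset with several binary columns shows why the suggestion still needs checking: the first one found wins, whether or not it is the true reference standard.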

Interactive threshold adjustment

If the identified index test variable is numeric, a slider is dynamically generated. This interactive control allows users to explore different cutoff points for classification. The range of the slider is based on minimum and maximum values of the test variable and initially defaults to the median.
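
As a rough illustration, the slider settings described above could be derived as follows. The helper name `slider_settings` is hypothetical, and dropping missing values before computing the range is an assumption rather than documented behaviour.

```r
# Hypothetical helper: range and default value for a threshold slider.
slider_settings <- function(x) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]  # assumption: missing values are ignored
  list(min = min(x), max = max(x), value = stats::median(x))
}

s <- slider_settings(c(0.8, 1.2, 2.9, 3.4))
s

# In a Shiny UI these values could be passed on, for example:
# shiny::sliderInput("threshold", "Threshold",
#                    min = s$min, max = s$max, value = s$value)
```

The input id `"threshold"` and the label in the commented call are placeholders.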

DTA metrics calculations

Based on the chosen threshold, DTAShiny constructs a 2×2 confusion matrix of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). From this it computes the following metrics:

• Sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
• Positive Predictive Value (PPV) = TP / (TP + FP)
• Negative Predictive Value (NPV) = TN / (TN + FN)
• Area Under the ROC Curve (AUC) = calculated via pROC::auc()
• F1 Score = 2 × (PPV × Sensitivity) / (PPV + Sensitivity)
• Balanced Accuracy = (Sensitivity + Specificity) / 2
• Prevalence = (TP + FN) / (Total observations)
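
The formulas above (apart from AUC) can be checked by hand with a short base-R sketch. The function name `dta_metrics` and the toy data are hypothetical. The sketch assumes that values at or above the threshold are classified as positive; the article does not state which direction the app uses.

```r
# Hypothetical helper: confusion-matrix counts and derived metrics at one threshold.
dta_metrics <- function(reference, test_value, threshold) {
  # Assumption: test values >= threshold count as test-positive.
  predicted <- as.integer(test_value >= threshold)

  tp <- sum(predicted == 1 & reference == 1)
  fp <- sum(predicted == 1 & reference == 0)
  tn <- sum(predicted == 0 & reference == 0)
  fn <- sum(predicted == 0 & reference == 1)

  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  ppv  <- tp / (tp + fp)
  npv  <- tn / (tn + fn)

  c(TP = tp, FP = fp, TN = tn, FN = fn,
    sensitivity = sens, specificity = spec, PPV = ppv, NPV = npv,
    F1 = 2 * ppv * sens / (ppv + sens),
    balanced_accuracy = (sens + spec) / 2,
    prevalence = (tp + fn) / (tp + fp + tn + fn))
}

# Toy data: 4 diseased (1) and 6 non-diseased (0) participants.
reference  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
test_value <- c(3.1, 2.8, 2.7, 1.9, 2.9, 1.5, 1.2, 1.1, 0.9, 0.7)
round(dta_metrics(reference, test_value, threshold = 2.5), 3)
```

At a threshold of 2.5 the toy data give TP = 3, FP = 1, TN = 5, and FN = 1, so sensitivity is 0.75, specificity is 5/6, and prevalence is 0.4.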

Approximate 95% confidence intervals for sensitivity, specificity, PPV, and NPV are calculated using stats::prop.test.
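
A minimal sketch of such an interval for a single proportion is shown below. The wrapper name `proportion_ci` and the counts are hypothetical. It calls `stats::prop.test` with its default arguments; whether the app changes any defaults (such as the continuity correction) is not stated in the article.

```r
# Hypothetical wrapper: point estimate and confidence interval for one proportion.
proportion_ci <- function(successes, total, conf_level = 0.95) {
  ci <- stats::prop.test(successes, total, conf.level = conf_level)$conf.int
  c(estimate = successes / total, lower = ci[1], upper = ci[2])
}

# Hypothetical counts: 45 true positives among 50 diseased participants,
# i.e. an observed sensitivity of 0.90.
proportion_ci(45, 50)
```

The same call applies to specificity, PPV, and NPV by passing the matching numerator and denominator from the confusion matrix.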

Visualizations

DTAShiny produces a variety of plots to support interpretation:

• ROC Curve: Plotted using pROC::plot.roc().
• Precision-Recall (PR) Curve: Calculated by evaluating precision and recall over a sequence of thresholds and plotted using ggplot2.
• Distribution Plots: Boxplots, histograms, and density plots of the index test values (overall and stratified by reference status) are generated using ggplot2.
• Calibration-like Plot: Index test values are binned into deciles using dplyr::ntile(). The mean predicted test value within each bin is plotted against the observed proportion of positive cases in that bin. This plot provides a visual, albeit illustrative, sense of calibration.
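
The data behind the PR curve and the calibration-like plot can be sketched in base R as follows. Both function names are hypothetical, the number of thresholds is an arbitrary choice, and the rank-based binning only approximates dplyr::ntile(), which may place boundary rows differently. The example data are simulated, not the article's dataset.

```r
# Hypothetical helper: precision and recall over an evenly spaced threshold grid.
pr_points <- function(reference, test_value, n_thresholds = 50) {
  thresholds <- seq(min(test_value), max(test_value), length.out = n_thresholds)
  rows <- lapply(thresholds, function(t) {
    predicted <- test_value >= t  # assumption: higher values indicate disease
    tp <- sum(predicted & reference == 1)
    data.frame(threshold = t,
               recall = tp / sum(reference == 1),
               precision = if (sum(predicted) > 0) tp / sum(predicted) else NA_real_)
  })
  do.call(rbind, rows)
}

# Hypothetical helper: per-bin mean test value and observed proportion of positives.
decile_summary <- function(reference, test_value, n_bins = 10) {
  # Rank-based bins of (nearly) equal size.
  bin <- ceiling(rank(test_value, ties.method = "first") * n_bins / length(test_value))
  data.frame(bin = sort(unique(bin)),
             mean_test_value = as.numeric(tapply(test_value, bin, mean)),
             observed_proportion = as.numeric(tapply(reference, bin, mean)))
}

# Simulated example: 50 non-diseased and 50 diseased participants.
set.seed(1)
reference  <- rep(c(0, 1), each = 50)
test_value <- c(rnorm(50, mean = 1), rnorm(50, mean = 2))

head(pr_points(reference, test_value))
decile_summary(reference, test_value)
```

Either data frame can then be passed to a plotting function. Because the index test is not necessarily a probability, the second table describes how the positive fraction changes across the range of test values rather than calibration in the formal sense.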

Operations

DTAShiny can be accessed as a web-based application, and its source code can be run locally.

System requirements

• R version 4.0 or higher
• Internet browser for the hosted version
• Required R packages: shiny, bs4Dash, ggplot2, dplyr, pROC, DT, haven

Running the application

1. Web version: Available at https://786miii.shinyapps.io/786MIIDTA/.
2. Local version:
   • Open-source and hosted on GitHub at https://github.com/mahmood789/DTA/tree/main under the MIT License.
   • Open app.R in RStudio.
   • Run the app using shiny::runApp().
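
Before launching a local copy, it can help to confirm that the packages listed under system requirements are installed. The following is a minimal sketch: the helper `missing_packages` is hypothetical, and the path in the final comment is a placeholder for wherever the repository was downloaded.

```r
# Packages listed under "System requirements".
required <- c("shiny", "bs4Dash", "ggplot2", "dplyr", "pROC", "DT", "haven")

# Hypothetical helper: names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are available.")
  # shiny::runApp("path/to/DTA")  # placeholder path to the folder containing app.R
}
```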

Interface walkthrough

• Data and Threshold: Handles file upload (CSV/Stata format). The application automatically identifies the reference and index test variables and creates a threshold slider if the test variable is numeric. The “Run DTA Analysis” button initiates the calculations.
• Overview: Displays general information about the uploaded dataset and confirms the variables selected for analysis.
• Standard Metrics: Displays a table of key metrics (Sensitivity, Specificity, PPV, NPV, AUC) and the confusion matrix. The ROC curve is shown here.
• Advanced Metrics: Presents a table of advanced metrics, including F1 Score, Balanced Accuracy, and Prevalence, and a table of confidence intervals. The Precision-Recall curve is shown here.
• Plots: Offers visual tools for exploring the data: boxplots of test values by reference group, a histogram of test values, density plots of test values in each outcome group, and a calibration-like plot for assessing the relationship between predicted and observed outcomes.
• Extra Text Output: Includes static text providing general guidance and interpretation notes.

Use cases and illustrative examples

DTAShiny can be used in several scenarios:

Initial DTA Exploration: A clinical researcher uploads data from a pilot study of a new biomarker into DTAShiny. They can immediately visualize test performance, explore how different threshold values affect sensitivity and specificity, and examine key diagnostic metrics, all without writing any code.

Understanding Threshold Impact: By moving the threshold slider, users can directly observe how sensitivity and specificity trade off and how predictive values change. This helps researchers tailor diagnostic criteria to their specific clinical contexts.

Visualising Data Characteristics: The boxplots and density plots help in understanding the separation (or overlap) in test values between diseased and non-diseased individuals. The histogram provides an overview of the test value distribution.

Assessing Performance in Imbalanced Datasets: When working with conditions that have low prevalence, the ROC curve can sometimes be misleading. DTAShiny includes the Precision-Recall curve, which is often more informative in such situations and helps researchers better interpret performance.

Educational Tool: Students and teachers can use DTAShiny with a simple dataset to learn DTA concepts, understand how metrics are calculated, and see the effect of threshold changes.

Example 1: Biomarker evaluation in screening

A researcher evaluating a new biomarker for early detection of Disease X uploads their pilot study dataset. Using the threshold slider, they explore tradeoffs: 95% sensitivity is achieved at a threshold of 2.6 but specificity drops to 68%. This informs their recommendation to prioritize sensitivity in screening contexts.
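
The kind of trade-off described in this example can be reproduced outside the app with a simple threshold sweep. The sketch below uses hypothetical toy data, so it does not reproduce the figures quoted above. It also assumes that higher test values indicate disease.

```r
# Hypothetical helper: sensitivity and specificity at each candidate threshold.
threshold_sweep <- function(reference, test_value, thresholds) {
  rows <- lapply(thresholds, function(t) {
    predicted <- test_value >= t  # assumption: higher values indicate disease
    data.frame(threshold = t,
               sensitivity = mean(predicted[reference == 1]),
               specificity = mean(!predicted[reference == 0]))
  })
  do.call(rbind, rows)
}

# Toy data: 4 diseased (1) and 6 non-diseased (0) participants.
reference  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
test_value <- c(3.1, 2.8, 2.7, 1.9, 2.9, 1.5, 1.2, 1.1, 0.9, 0.7)
threshold_sweep(reference, test_value, thresholds = c(1.0, 2.0, 3.0))
```

In the toy data, raising the threshold from 1.0 to 3.0 lowers sensitivity from 1.00 to 0.25 while specificity rises from about 0.33 to 1.00, which is the inverse movement the slider makes visible.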

Example 2: Educational application

A public health instructor uses DTAShiny to demonstrate DTA principles. Students adjust thresholds on sample datasets and see how sensitivity and specificity move inversely, visualizing key diagnostic trade-offs without writing any code.

Example 3: Low prevalence condition

An epidemiologist studying a rare condition (2% prevalence) notes that the ROC curve appears excellent (AUC = 0.9). However, using the Precision-Recall curve, they observe that PPV remains low because of the low prevalence, underscoring the importance of examining multiple metrics.
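
The dependence of PPV on prevalence follows from Bayes' theorem and can be worked through in a few lines. In this sketch the sensitivity and specificity of 90% are hypothetical values chosen for illustration; they are not taken from the article and are not implied by an AUC of 0.9. The function name `predictive_values` is also hypothetical.

```r
# Hypothetical helper: predictive values from sensitivity, specificity, and prevalence.
predictive_values <- function(sensitivity, specificity, prevalence) {
  ppv <- sensitivity * prevalence /
    (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
  npv <- specificity * (1 - prevalence) /
    (specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
  c(PPV = ppv, NPV = npv)
}

# Hypothetical test with 90% sensitivity and specificity at 2% prevalence.
round(predictive_values(sensitivity = 0.90, specificity = 0.90, prevalence = 0.02), 3)
# PPV is about 0.155 and NPV about 0.998
```

With these inputs, roughly 5 in 6 positive results are false positives even though the test discriminates well. The same function accepts an externally supplied prevalence, which is relevant when the sample prevalence does not reflect the target population.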

Discussion

DTAShiny provides a user-friendly and interactive environment for conducting a comprehensive range of DTA analyses. Its main strengths are its intuitive interface, automatic variable detection, and real-time threshold adjustment for continuous tests. By combining standard and advanced performance metrics with visual outputs, including ROC and PR curves, various distribution plots, and an illustrative calibration-like plot, it gives users a well-rounded understanding of test performance.

Strengths

• Interactive Threshold Selection: Allows dynamic exploration of test performance across different cutoffs.
• Comprehensive Output: Provides standard and advanced DTA metrics, confidence intervals, and multiple relevant plots.
• User Friendly: The tool is built around a graphical user interface that minimizes the need for coding and automatically detects reference and test variables.
• Supports Common Data Formats: Accepts both CSV and Stata formats.

Limitations

• Heuristic Variable Detection: Although convenient, the automatic detection of reference and test variables may not always select the appropriate columns, especially in datasets with unconventional naming or multiple candidate columns. Users need to verify these selections before proceeding with the analysis.
• Approximate Confidence Intervals: The CIs for sensitivity, specificity, and predictive values are based on prop.test, which relies on a large-sample approximation (in R, a Wilson score interval, continuity-corrected by default). Alternative approaches, such as exact Clopper-Pearson intervals or bootstrap-based methods, could yield slightly different results. AUC confidence intervals are not displayed in the current version.
• Single Test Analysis: The current version evaluates a single index test against a single reference standard. It does not directly support comparisons of multiple tests or analyses of paired DTA data.
• No Automated Threshold Optimisation: Although users can explore various thresholds, the app does not calculate or suggest an “optimal” threshold based on criteria such as Youden’s J index or proximity to the top-left corner of the ROC space.
• Illustrative Calibration Plot: The calibration-like plot is intended as a visual aid rather than a formal calibration assessment. It does not replace statistical tests such as the Hosmer-Lemeshow test.
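
Although the app does not compute it, the Youden criterion mentioned above (J = sensitivity + specificity − 1) can be evaluated outside the app. The following is a minimal sketch and not part of DTAShiny: the function name `youden_threshold` and the toy data are hypothetical, and values at or above the threshold are assumed to be test-positive.

```r
# Hypothetical helper: threshold that maximises Youden's J among observed test values.
youden_threshold <- function(reference, test_value) {
  candidates <- sort(unique(test_value))
  j <- vapply(candidates, function(t) {
    predicted <- test_value >= t  # assumption: higher values indicate disease
    sens <- mean(predicted[reference == 1])
    spec <- mean(!predicted[reference == 0])
    sens + spec - 1
  }, numeric(1))
  best <- which.max(j)  # the lowest threshold wins if several tie
  c(threshold = candidates[best], youden_j = j[best])
}

# Toy data: 4 diseased (1) and 6 non-diseased (0) participants.
reference  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
test_value <- c(3.1, 2.8, 2.7, 1.9, 2.9, 1.5, 1.2, 1.1, 0.9, 0.7)
youden_threshold(reference, test_value)
```

For the toy data, J is largest at a threshold of 1.9, where sensitivity is 1 and specificity is 5/6. A data-driven threshold of this kind is optimistic when judged on the same sample, so it is best treated as a starting point rather than a validated cutoff.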

Future developments

• Enabling manual selection of reference and test variables, in case auto-detection is inaccurate.
• Implementing more robust methods for calculating confidence intervals, including for AUC.
• Adding functionality to compute and highlight optimal thresholds using formal criteria.
• Extending support to comparisons involving multiple diagnostic tests or test combinations.
• Providing options for handling missing data more explicitly.
• Introducing formal tools for assessing calibration.

Conclusion

DTAShiny is an accessible and interactive R Shiny application designed to empower researchers, clinicians and students performing diagnostic test accuracy analyses without requiring advanced coding skills. By integrating data upload, real-time threshold adjustment, detailed performance metrics, and a wide range of visualizations, DTAShiny streamlines the DTA workflow and enhances the interpretability of diagnostic test results.

Its user-friendly interface, support for common data formats, and comprehensive outputs make it a valuable tool for both research and education. While the current version is focused on single test evaluations, planned enhancements will further expand its capabilities and analytical depth. Overall, DTAShiny contributes meaningfully to the growing toolkit of evidence-based diagnostic methods.

Data availability statement

• Software Code and Example Data: The source code and example dataset for the DTAShiny application are available from the following permanent deposit: Zenodo (DOI: https://doi.org/10.5281/zenodo.17224122). The source code is licensed under the MIT License.⁶

Software availability

• Hosted version: DTAShiny is publicly available at: https://786miii.shinyapps.io/786MIIDTA/
• Source code: The R source code is available on GitHub at: https://github.com/mahmood789/DTA/tree/main
• Archived software available from: https://doi.org/10.5281/zenodo.17224122⁶
• License: The source code is licensed under the MIT License.

Acknowledgements

We acknowledge the developers of R⁴ and the R packages shiny,⁵ bs4Dash, tidyverse (including ggplot2² and dplyr³), DT, pROC,¹ and haven,⁷ which are instrumental to the functionality of this application. We also acknowledge the use of the large language model ChatGPT (OpenAI) for minor assistance with grammar, clarity, and language editing of the manuscript.

References

1. Robin X, Turck N, Hainard A, et al.: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011; 12: 77.
2. Wickham H, et al.: ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016.
3. Wickham H, François R, Henry L, et al.: dplyr: A Grammar of Data Manipulation. R package version X.Y.Z.
4. R Core Team: R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 202X.
5. Chang W, et al.: shiny: Web Application Framework for R. R package version X.Y.Z.
6. Quadri SFS, Ahmed M, Lac J: DTAShiny: An Interactive R Shiny Application for Diagnostic Test Accuracy Analysis and Visualization. Zenodo. 2025.
7. StataCorp: Stata Statistical Software: Release XX. College Station, TX: StataCorp LLC.; 202X. (If citing Stata itself, or haven for reading its files).

Open Peer Review

Reviewer Report 02 Jan 2026

Pedro Castaneda, Universidad Nacional Toribio Rodriguez de Mendoza, Amazonas, Peru

Approved with Reservations

https://doi.org/10.5256/f1000research.189078.r442973

Is the description of the software tool technically sound?
Partly. The main architecture, packages used and functionalities are described, but key design choices (e.g. handling of prevalence, case–control designs, single‑test focus, no optimal threshold estimation) and their statistical implications are not fully developed.

Are sufficient details of the code, methods and analysis provided to allow replication of the software development and its use by others?
Partly. The code is openly available on GitHub and Zenodo and the main workflow is described, but some implementation details (e.g. variable selection heuristics, CI computation options, testing across environments) are only briefly outlined and would benefit from more explicit, step‑by‑step information.

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly. Standard outputs (ROC, PR curves, confusion matrix, sensitivity, specificity, predictive values, F1, balanced accuracy) are explained, but critical caveats—especially around interpretation of PPV/NPV and PR in case–control or low‑prevalence contexts—are not yet clearly flagged for users.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly. The article justifies that DTAShiny is user‑friendly and useful for exploratory/educational DTA, but the claims about its analytical breadth and practical utility would be stronger with a more explicit discussion of limitations and of scenarios where the outputs may be misleading.

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Data Science

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 30 Dec 2025

Budi Sunaryo, Universitas Bung Hatta (UBH), Padang, West Sumatra, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.189078.r442970

The manuscript presents DTAShiny, a web-based R Shiny application that enables users to perform Diagnostic Test Accuracy (DTA) analyses through an intuitive graphical interface. The tool allows users to upload data, automatically identify reference and test variables, adjust diagnostic thresholds interactively, and generate standard DTA metrics (sensitivity, specificity, PPV, NPV, AUC), along with visualizations such as ROC curves, precision-recall curves, and distribution plots. The authors correctly identify that existing R packages for DTA require programming skills that create barriers for many clinical researchers. The development of DTAShiny addresses an essential need in making diagnostic test evaluation more accessible.

Detailed Assessment
1. Is the description of the software tool technically sound?
Partly. The description of the architecture and workflow is generally accurate. However, it omits a critical technical limitation: the tool does not account for different study designs, so the predictive values (PPV/NPV) it calculates are valid only for cohort studies, not for standard case-control designs. Additionally, an overstatement of the "calibration-like plot" as a formal diagnostic tool weakens the technical soundness.
2. Are sufficient details provided to allow replication?
Partly. Providing the GitHub repository with source code is a significant strength. However, for accurate replication, the exact computational environment is missing. The article should specify the precise versions of R and all dependent packages used, ideally by including a file such as renv.lock or the output of sessionInfo() in the repository. More details on how thresholds are selected for generating curves would also aid replication.
3. Is sufficient information provided to interpret the results?
Partly. While standard metrics are clearly labeled, the tool lacks essential guidance and warnings for proper interpretation. Crucially, it does not warn users that PPV and NPV are invalid for case-control data, a significant risk for misinterpretation. Guidance on selecting clinically meaningful thresholds and brief explanations of when to use advanced metrics, such as the F1-score, would also improve interpretability for non-expert users.
4. Are the conclusions adequately supported?
Partly. Its described functionality supports the conclusions about the tool's interactivity and user-friendly design. However, broader claims about its performance and impact, such as that it "empowers" users or provides a "well-rounded understanding," are not supported by evidence, such as user testing or validation against established software. The conclusions would be stronger if they were more carefully aligned with the tool's demonstrated features and explicitly acknowledged its current limitations.

Specific Recommendations for Revision
- Address Study Design Limitations: The manuscript and application must explicitly warn users about the limitations of PPV and NPV calculations in case-control studies. Consider implementing an option for users to input an external prevalence value when analyzing case-control data.
- Enhance Reproducibility: Add detailed version information for all software components to the GitHub repository. A renv.lock file would be ideal for ensuring computational reproducibility.
- Improve Interpretation Guidance: Incorporate clear warnings and educational notes within the application interface about the contextual interpretation of metrics, particularly predictive values.
- Moderate Conclusions: Revise the discussion and conclusion sections to more accurately reflect what has been demonstrated versus what is claimed. Explicitly acknowledge the tool's current limitations alongside its strengths.
- Clarify the Calibration Plot: Re-label or re-describe the "calibration-like plot" to avoid implying it is a formal statistical calibration tool, which it is not.

Conclusion
DTAShiny is a promising and valuable tool that makes DTA analysis more accessible through its interactive features. However, it currently has significant limitations, particularly concerning the valid interpretation of predictive values across study designs. With substantial revisions to address its epidemiological foundations, reproducibility, and the balanced reporting of its capabilities, it could become a responsible and valuable contribution to the research community. I recommend that the authors address these concerns and submit a revised version for further consideration.

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Information Technology, Computer Networks, Machine Learning, Internet of Things, Data Analytics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 20 Nov 2025

Christos Nakas, University of Thessaly, Volos, Greece

Approved with Reservations

https://doi.org/10.5256/f1000research.189078.r428907

This might eventually evolve into a useful R Shiny app for ROC curve analysis; however, it currently feels more like a pre-alpha version, given that no limitations are described for its use in case-control studies (where prevalence cannot be estimated from the data). As a consequence, results can be highly misleading, since all PPV, NPV, and precision-recall estimates can be wrong. It would be useful if the user could provide a prevalence estimate or use the available data for its estimation.
Furthermore, the app currently cannot handle more than one biomarker and cannot compare biomarkers (uploading a single data file). The app seems to select columns from the data automatically, without any prompts or selection possibility. This limits the app's functionality.
Some output (the text part) feels like leftovers from the LLM helper; the app seems to need improvement in such details.
Expanding the literature search (reference list) could be useful for such a tool.

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Biostatistics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

CITE

Report a concern

Respond or Comment

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 30 Oct 2025

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2	3
Version 1 30 Oct 25	read	read	read

Christos Nakas, University of Thessaly, Volos, Greece
Budi Sunaryo, Universitas Bung Hatta (UBH), Padang, Indonesia
Pedro Castaneda, Universidad Nacional Toribio Rodriguez de Mendoza, Amazonas, Peru

Reviewer Report

02 Jan 2026 | for Version 1

Pedro Castaneda, Universidad Nacional Toribio Rodriguez de Mendoza, Amazonas, Peru

Approved With Reservations

Is the description of the software tool technically sound?
Partly. The main architecture, packages used and functionalities are described, but key design choices (e.g. handling of prevalence, case–control designs, single‑test focus, no optimal threshold estimation) and their statistical implications are not fully developed.

Are sufficient details of the code, methods and analysis provided to allow replication of the software development and its use by others?
Partly. The code is openly available on GitHub and Zenodo and the main workflow is described, but some implementation details (e.g. variable selection heuristics, CI computation options, testing across environments) are only briefly outlined and would benefit from more explicit, step‑by‑step information.

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly. Standard outputs (ROC, PR curves, confusion matrix, sensitivity, specificity, predictive values, F1, balanced accuracy) are explained, but critical caveats—especially around interpretation of PPV/NPV and PR in case–control or low‑prevalence contexts—are not yet clearly flagged for users.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly. The article justifies that DTAShiny is user‑friendly and useful for exploratory/educational DTA, but the claims about its analytical breadth and practical utility would be stronger with a more explicit discussion of limitations and of scenarios where the outputs may be misleading.
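
On the confidence-interval point raised above, the choice of method can matter for small samples or proportions near 0 or 1. Which method DTAShiny uses is not stated in this report, so the following R sketch only compares three common intervals for a sensitivity estimate; the counts are hypothetical.

```r
# Hypothetical counts: 45 true positives among 50 diseased subjects.
tp <- 45
n_diseased <- 50
sens <- tp / n_diseased

# Wald (normal approximation) interval; can behave poorly for small n
# or for proportions close to 0 or 1.
se <- sqrt(sens * (1 - sens) / n_diseased)
wald <- sens + c(-1, 1) * qnorm(0.975) * se

# Wilson score interval (prop.test without continuity correction).
wilson <- as.numeric(prop.test(tp, n_diseased, correct = FALSE)$conf.int)

# Clopper-Pearson "exact" interval.
exact <- as.numeric(binom.test(tp, n_diseased)$conf.int)

print(round(rbind(wald, wilson, exact), 3))
```

Reporting which of these (or another) interval a tool computes lets users reproduce the bounds independently.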

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Data Science

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report

30 Dec 2025 | for Version 1

Budi Sunaryo, Universitas Bung Hatta (UBH), Padang, West Sumatra, Indonesia

Approved With Reservations

The manuscript presents DTAShiny, a web-based R Shiny application that enables users to perform Diagnostic Test Accuracy (DTA) analyses through an intuitive graphical interface. The tool allows users to upload data, automatically identify reference and test variables, adjust diagnostic thresholds interactively, and generate standard DTA metrics (sensitivity, specificity, PPV, NPV, AUC), along with visualizations such as ROC curves, precision-recall curves, and distribution plots. The authors correctly identify that existing R packages for DTA require programming skills that create barriers for many clinical researchers. The development of DTAShiny addresses an essential need in making diagnostic test evaluation more accessible.

Detailed Assessment
1. Is the description of the software tool technically sound?
Partly. The description of the architecture and workflow is generally accurate. However, it omits a critical technical limitation: the tool does not account for different study designs, so the predictive values (PPV/NPV) it calculates are valid only for cohort studies, not for standard case-control designs. Additionally, an overstatement of the "calibration-like plot" as a formal diagnostic tool weakens the technical soundness.
2. Are sufficient details provided to allow replication?
Partly. Providing the GitHub repository with source code is a significant strength. However, for accurate replication, the exact computational environment is missing. The article should specify the precise versions of R and all dependent packages used, ideally by including a file such as renv.lock or the output of sessionInfo() in the repository. More details on how thresholds are selected for generating curves would also aid replication.
3. Is sufficient information provided to interpret the results?
Partly. While standard metrics are clearly labeled, the tool lacks essential guidance and warnings for proper interpretation. Crucially, it does not warn users that PPV and NPV are invalid for case-control data, a significant risk for misinterpretation. Guidance on selecting clinically meaningful thresholds and brief explanations of when to use advanced metrics, such as the F1-score, would also improve interpretability for non-expert users.
4. Are the conclusions adequately supported?
Partly. The described functionality supports the conclusions about the tool's interactivity and user-friendly design. However, broader claims about its performance and impact, such as that it "empowers" users or provides a "well-rounded understanding," are not supported by evidence such as user testing or validation against established software. The conclusions would be stronger if they were more carefully aligned with the tool's demonstrated features and explicitly acknowledged its current limitations.
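
As an illustration of the threshold-selection guidance requested in point 3, one common statistical criterion is Youden's J (sensitivity + specificity - 1). The base R sketch below uses hypothetical data and is not taken from DTAShiny; a statistically optimal cut-off is not necessarily the clinically meaningful one, which depends on the relative costs of false positives and false negatives.

```r
# Hypothetical data: biomarker values and true disease status (1 = diseased).
marker  <- c(0.2, 0.4, 0.5, 0.7, 1.0, 0.8, 0.9, 1.1, 1.3, 1.5)
disease <- c(0,   0,   0,   0,   0,   1,   1,   1,   1,   1)

# Candidate thresholds: every observed value; "positive" means marker >= thr.
thresholds <- sort(unique(marker))
perf <- t(sapply(thresholds, function(thr) {
  pred <- marker >= thr
  c(threshold = thr,
    sens = sum(pred & disease == 1) / sum(disease == 1),
    spec = sum(!pred & disease == 0) / sum(disease == 0))
}))

# Youden's J = sensitivity + specificity - 1; take the maximizing threshold.
youden <- perf[, "sens"] + perf[, "spec"] - 1
best <- perf[which.max(youden), ]
print(best)
```

For this toy data set the maximum is reached at a threshold of 0.8, where sensitivity is 1 and specificity is 0.8.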

Specific Recommendations for Revision
- Address Study Design Limitations: The manuscript and application must explicitly warn users about the limitations of PPV and NPV calculations in case-control studies. Consider implementing an option for users to input an external prevalence value when analyzing case-control data.
- Enhance Reproducibility: Add detailed version information for all software components to the GitHub repository. A renv.lock file would be ideal for ensuring computational reproducibility.
- Improve Interpretation Guidance: Incorporate clear warnings and educational notes within the application interface about the contextual interpretation of metrics, particularly predictive values.
- Moderate Conclusions: Revise the discussion and conclusion sections to more accurately reflect what has been demonstrated versus what is claimed. Explicitly acknowledge the tool's current limitations alongside its strengths.
- Clarify the Calibration Plot: Re-label or re-describe the "calibration-like plot" to avoid implying it is a formal statistical calibration tool, which it is not.
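
The reproducibility recommendation above can be sketched in a few lines of R. This is one possible approach, not the authors' workflow; the output file name is a hypothetical choice, and the `renv` calls are left commented out because they require the `renv` package and modify the project directory.

```r
# Record the R version and attached/loaded package versions in a text file
# that could be committed alongside the app (file name is hypothetical).
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# Optional, only if the renv package is installed and the project uses it:
# renv::init()      # set up a project-local library
# renv::snapshot()  # write renv.lock with the exact package versions
# renv::restore()   # recreate that library on another machine

# Show the first few recorded lines as a quick check.
cat(readLines(info_file, n = 3), sep = "\n")
```

A plain `sessionInfo()` dump documents the environment, whereas `renv.lock` additionally allows it to be restored.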

Conclusion
DTAShiny is a promising and valuable tool that makes DTA analysis more accessible through its interactive features. However, it currently has significant limitations, particularly concerning the valid interpretation of predictive values across study designs. With substantial revisions to address its epidemiological foundations, reproducibility, and the balanced reporting of its capabilities, it could become a responsible and valuable contribution to the research community. I recommend that the authors address these concerns and submit a revised version for further consideration.

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Information Technology, Computer Networks, Machine Learning, Internet of Things, Data Analytics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
