Software Tool Article

MetaLLMReporter: An R Shiny App Integrating Meta-Analysis Execution with LLM-Assisted Reporting

[version 1; peer review: awaiting peer review]
PUBLISHED 23 Jul 2025


Abstract

Background

Conducting a meta-analysis goes beyond the simple calculation of a pooled estimate and involves numerous statistical tests. Combining these diverse results (heterogeneity, bias, sensitivity, subgroups, etc.) into a coherent whole suitable for different audiences and formats (e.g., academic papers, plain language summaries) is a major challenge.

Methods

We created MetaLLMReporter, an interactive web tool built with R (v4.4.0) and the Shiny framework (v1.8.1.1), with a bs4Dash (v2.3.2) interface. It accepts user-supplied CSV data for continuous outcomes (mean, SD, n) and carries out a series of standard meta-analysis procedures using functions from the meta, metafor, and dmetar packages. Crucially, it integrates Google’s Gemini large language model (LLM) via API calls (httr, jsonlite) to automatically generate structured written reports that consolidate the analyses in various formats (Cochrane, NEJM, Lancet, Plain Language).

Results/Functionality

MetaLLMReporter carries out a standard meta-analysis (meta::metacont) and performs additional analyses, including heterogeneity assessment, leave-one-out sensitivity analysis, publication bias tests (meta::metabias), meta-regression (metafor::rma), subgroup analysis (meta::metacont), cumulative meta-analysis (meta::metacum), Bayesian meta-analysis (bayesmeta::bayesmeta), trim-and-fill (meta::trimfill), outlier detection (dmetar::find.outliers), and p-curve analysis (dmetar::pcurve). A text summary of each analysis is displayed. Users can then trigger the LLM to generate detailed reports formatted for specific journal styles or in plain language.

Conclusions

MetaLLMReporter makes it easier to generate textual summaries of the various aspects of a meta-analysis and uses LLM technology to help write reports for different audiences. It is intended to help researchers interpret and report complex meta-analysis results more effectively.

Keywords

Meta-analysis, Reporting, Interpretation, Large Language Model (LLM), Shiny, R, Automation, Evidence Synthesis, meta, metafor

Introduction

Meta-analysis is a valuable method for quantitatively synthesising data across studies. For continuous outcome measures, it usually entails calculating and combining either Standardized Mean Differences (SMD) or Mean Differences (MD) (Borenstein et al., 2009). However, calculating a pooled average is only the starting point. A complete meta-analysis also considers how much the results of the studies differ from each other (heterogeneity) (Higgins & Thompson, 2002), whether studies are likely to be missing (most commonly due to publication bias) (Egger et al., 1997), and to what degree the results are sensitive to including or excluding particular studies (sensitivity analysis). Researchers also want to understand why study results differ; this can be explored using subgroup analyses or meta-regression (Thompson & Higgins, 2002; Deeks et al., 2008). In other cases, different approaches such as cumulative meta-analysis, Bayesian models, or trim-and-fill adjustments may be helpful. Although excellent R packages such as meta (Balduzzi et al., 2019; Schwarzer et al., 2023, 2024), metafor (Viechtbauer, 2010), and dmetar (Harrer et al., 2021) provide all of these functions, pulling everything together into a clear, well-presented report remains hard work. Writing up and summarising the results of heterogeneity tests, bias analyses, sensitivity analyses, and subgroup findings can be tedious, especially when writing for particular journals or for non-technical audiences. To address this challenge, we created MetaLLMReporter.

This R Shiny application (Chang et al., 2024; R Core Team, 2024) automates a wide range of common meta-analyses of continuous outcome data from a single CSV file upload.

Its most distinctive feature is its integration with Google’s Gemini large language model (LLM). After completing the analyses, the app gathers all the results and sends them to the LLM to generate draft reports in various formats, such as plain language summaries. This allows researchers to produce written descriptions of their results quickly, saving time and improving readability.

Methods

Implementation

The app is built using R (version 4.4.0) and the Shiny web framework (version 1.8.1.1). It uses the bs4Dash package (version 2.3.2; Granjon, 2023) to create the dashboard interface.
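For readers who wish to reproduce the environment, a minimal setup sketch is given below, listing the packages named in this article. The installation commands (including the GitHub install for dmetar) are our suggestion rather than an official installation script, and the versions noted in comments are those reported in the text.

    # Packages used by the app, as named in this article
    # (text reports R 4.4.0, shiny 1.8.1.1, bs4Dash 2.3.2).
    cran_pkgs <- c("shiny", "bs4Dash", "meta", "metafor", "bayesmeta",
                   "httr", "jsonlite")
    missing <- setdiff(cran_pkgs, rownames(installed.packages()))
    if (length(missing) > 0) install.packages(missing)

    # dmetar is distributed via GitHub rather than CRAN.
    if (!requireNamespace("dmetar", quietly = TRUE)) {
      if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
      remotes::install_github("MathiasHarrer/dmetar")
    }

    invisible(lapply(c(cran_pkgs, "dmetar"), library, character.only = TRUE))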

Core meta-analysis engine

A full suite of analyses is run using functions from established R packages (an illustrative code sketch follows this list):

  • Main Pooling: The function meta::metacont is used to pool MD or SMD values, depending on user selection. The user also selects the effect measure, pooling method, heterogeneity estimator, and whether to use a fixed or random effects model.

  • Heterogeneity: Key heterogeneity statistics (Q, I², and τ (Borenstein et al., 2009)) are taken from the metacont object.

  • Sensitivity: A leave-one-out analysis is done using meta::metainf, and outliers are detected using dmetar::find.outliers.

  • Publication Bias: Egger’s regression test is done with meta::metabias, and trim-and-fill analysis with meta::trimfill. P-curve analysis uses dmetar::pcurve.

  • Meta-Regression: If columns Reg, Reg2, or Reg3 are available, meta-regression is done using metafor::rma.

  • Subgroup Analysis: If a subgroup column is provided, subgroup analysis is performed using meta::metacont.

  • Cumulative Analysis: This is handled using meta::metacum.

  • Bayesian Analysis: The Bayesian meta-analysis uses bayesmeta::bayesmeta (Röver, 2020), with effect sizes first calculated using metafor::escalc (Lüdecke, 2018).
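The sketch below illustrates how these functions fit together, assuming a data frame df with the column names described in the Data input section. The specific settings (SMD, REML) are illustrative defaults rather than the app's actual choices, and some argument names (e.g. subgroup) may differ slightly across meta versions.

    # Illustrative pipeline using the packages listed above (df holds the uploaded data).
    library(meta); library(metafor); library(dmetar); library(bayesmeta)

    # Main pooling (MD or SMD via the sm argument)
    m <- metacont(n.e = totalintervention, mean.e = meanintervention, sd.e = sdintervention,
                  n.c = totalcontrol,      mean.c = meancontrol,      sd.c = sdcontrol,
                  studlab = author, data = df, sm = "SMD", method.tau = "REML")

    # Heterogeneity statistics taken from the metacont object
    c(Q = m$Q, I2 = m$I2, tau2 = m$tau2)

    # Sensitivity: leave-one-out and outlier detection
    loo      <- metainf(m)
    outliers <- find.outliers(m)

    # Publication bias: Egger's test (needs roughly 10+ studies), trim-and-fill, p-curve
    egger <- metabias(m, method.bias = "linreg")
    tf    <- trimfill(m)
    pc    <- pcurve(m)

    # Cumulative meta-analysis
    cum <- metacum(m)

    # Subgroup analysis (if a subgroup column is present)
    msub <- update(m, subgroup = subgroup)

    # Meta-regression on an optional moderator column (Reg)
    es  <- escalc(measure = "SMD",
                  m1i = meanintervention, sd1i = sdintervention, n1i = totalintervention,
                  m2i = meancontrol,      sd2i = sdcontrol,      n2i = totalcontrol,
                  data = df)
    reg <- rma(yi, vi, mods = ~ Reg, data = es)

    # Bayesian random-effects model on the escalc effect sizes
    bm <- bayesmeta(y = es$yi, sigma = sqrt(es$vi), labels = df$author)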

Data input

Users upload a CSV file via shiny::fileInput. The data must contain columns for meanintervention, sdintervention, totalintervention, meancontrol, sdcontrol, totalcontrol, and author. Optional columns include Reg, Reg2, Reg3, and subgroup. If needed, a default value can be set for totalintervention (Wickham et al., 2023, 2024).
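A minimal server-side sketch of this upload and validation step is shown below; the input ID (datafile) and reactive name are illustrative and not necessarily those used in the app.

    # UI side (illustrative ID):
    # fileInput("datafile", "Upload CSV", accept = ".csv")

    # Server side: read the uploaded file and check the required columns.
    dataInput <- reactive({
      req(input$datafile)
      df <- read.csv(input$datafile$datapath)
      required <- c("meanintervention", "sdintervention", "totalintervention",
                    "meancontrol", "sdcontrol", "totalcontrol", "author")
      missing_cols <- setdiff(required, names(df))
      validate(need(length(missing_cols) == 0,
                    paste("Missing required columns:",
                          paste(missing_cols, collapse = ", "))))
      df
    })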

LLM integration

API call

The helper function generate_content sends a request via httr::POST (Ooms, 2024), using jsonlite::toJSON (Ooms, 2014) to format the request and fromJSON to process the response. The Gemini API (specifically the gemini-2.0-flash model) is called using a randomly selected API key from a hardcoded list. Users running the app from GitHub need to insert their own API key.
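A minimal sketch of such a helper is shown below. The endpoint path and request body follow Google's publicly documented generateContent REST format; the response-parsing line may need adjustment, and reading the key from an environment variable is our suggestion rather than the app's hardcoded-key approach.

    library(httr)
    library(jsonlite)

    # Illustrative generate_content helper: one POST to the Gemini REST endpoint.
    generate_content <- function(prompt,
                                 api_key = Sys.getenv("GEMINI_API_KEY"),
                                 model = "gemini-2.0-flash") {
      url <- sprintf(
        "https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent?key=%s",
        model, api_key)
      body <- list(contents = list(list(parts = list(list(text = prompt)))))
      resp <- POST(url,
                   body = toJSON(body, auto_unbox = TRUE),
                   content_type_json())
      parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"),
                         simplifyVector = FALSE)
      # Generated text sits under candidates -> content -> parts -> text.
      parsed$candidates[[1]]$content$parts[[1]]$text
    }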

Prompting

After the statistical analyses are completed reactively, their textual outputs are collected using capture.output and combined into a single string (allText()). This string is used as the main input for the LLM. Prompts include specific instructions to rewrite the content in different styles (Cochrane, NEJM, Lancet, or Plain Language), along with guidelines for structure and minimum length.
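The sketch below shows one way this assembly could look; metaResult() and biasResult() are hypothetical reactives standing in for the app's analysis objects, and the prompt wording is illustrative only.

    # Collect printed analysis output into one string (hypothetical reactives).
    allText <- reactive({
      paste(c("== Main analysis & heterogeneity ==",
              capture.output(print(metaResult())),
              "== Publication bias ==",
              capture.output(print(biasResult()))),
            collapse = "\n")
    })

    # Build a style-specific prompt around the combined output.
    cochranePrompt <- reactive({
      paste0("Rewrite the following meta-analysis output as a Cochrane-style ",
             "results section of at least 400 words, with subheadings for ",
             "heterogeneity, sensitivity analysis and publication bias:\n\n",
             allText())
    })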

Output processing

The LLM’s raw response is cleaned of basic markdown formatting using the clean_markdown function (based on gsub), and the cleaned output is rendered using shiny::htmlOutput so that formatting like line breaks is preserved (Xie et al., 2024).
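A sketch of this cleaning-and-rendering step follows; the gsub patterns are representative only, the app's clean_markdown function may strip a different set of markers, and llmResponse() is a hypothetical reactive holding the raw LLM text.

    # Strip common markdown markers, then keep line breaks as HTML.
    clean_markdown <- function(x) {
      x <- gsub("\\*\\*([^*]+)\\*\\*", "\\1", x)   # bold **text**
      x <- gsub("\\*([^*]+)\\*", "\\1", x)         # italics *text*
      x <- gsub("(^|\n)#+ *", "\\1", x)            # heading markers
      gsub("\n", "<br/>", x, fixed = TRUE)         # preserve line breaks
    }

    # Server side: render the cleaned LLM response as HTML.
    output$llm_cochrane <- renderUI(HTML(clean_markdown(llmResponse())))
    # UI side: htmlOutput("llm_cochrane")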

User interface

The dashboard interface includes two main tabs (a minimal layout sketch follows the list):

  • The “Inputs” tab handles data upload and lets users choose key meta-analysis settings.

  • The “Meta-Analysis Text” tab displays the outputs for each analysis type (Heterogeneity, Leave-One-Out, Publication Bias, etc.), along with four tabs that show the LLM-generated text in Cochrane, NEJM, Lancet, and Plain Language formats.
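The skeleton below mirrors this two-tab layout using bs4Dash; the menu labels, tab names, and output IDs are illustrative rather than the app's actual identifiers.

    library(shiny)
    library(bs4Dash)

    # Skeleton of the two-tab dashboard layout described above (illustrative IDs).
    ui <- dashboardPage(
      header  = dashboardHeader(title = "MetaLLMReporter"),
      sidebar = dashboardSidebar(
        sidebarMenu(
          menuItem("Inputs",             tabName = "inputs",  icon = icon("upload")),
          menuItem("Meta-Analysis Text", tabName = "results", icon = icon("file"))
        )
      ),
      body = dashboardBody(
        tabItems(
          tabItem("inputs",
                  fileInput("datafile", "Upload CSV", accept = ".csv")),
          tabItem("results",
                  tabsetPanel(
                    tabPanel("Heterogeneity",    verbatimTextOutput("het_text")),
                    tabPanel("Publication Bias", verbatimTextOutput("bias_text")),
                    tabPanel("LLM Cochrane",     htmlOutput("llm_cochrane")),
                    tabPanel("LLM Plain",        htmlOutput("llm_plain"))
                  ))
        )
      )
    )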

Operation

You can run MetaLLMReporter locally with R (version 4.0.0 or higher) and the required packages, or use a hosted version in your browser. To access the LLM features, you need an internet connection and a valid Google Gemini API key, which the user must provide.
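One way to launch the app locally is sketched below. It assumes the Shiny app files sit at the top level of the repository (adjust the subdir argument otherwise), and reading the Gemini key from an environment variable is a suggested pattern rather than the app's current hardcoded-key behaviour.

    # Supply your own Gemini API key before launching, for example:
    Sys.setenv(GEMINI_API_KEY = "your-key-here")

    # Launch directly from the GitHub repository (assumes app files at the repo root).
    shiny::runGitHub("LLMSMD", "mahmood789")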

Data upload & settings

In the “Inputs” tab, upload a CSV with mean, SD, and sample size for both groups, plus the author name. Optional columns (Reg, Reg2, Reg3, subgroup) enable meta-regression and subgroup analysis. Select your effect measure (MD/SMD) and analysis options. A sample CSV is available for reference.
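An illustrative CSV layout is shown below. The column names are those required or accepted by the app; the study names and numbers are fabricated purely to show the expected structure.

    author,meanintervention,sdintervention,totalintervention,meancontrol,sdcontrol,totalcontrol,subgroup,Reg
    Smith 2018,12.4,3.1,45,14.9,3.4,44,Europe,56.2
    Patel 2020,10.8,2.7,60,12.1,3.0,61,Asia,61.5
    Lopez 2021,11.6,3.3,38,13.5,3.6,40,Europe,58.9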

View analysis outputs

Go to the “Meta-Analysis Text” tab to see results like heterogeneity, publication bias, and regression. The app runs all analyses automatically.

View LLM interpretations

Check the final four tabs (“LLM Cochrane”, “LLM NEJM”, “LLM Lancet”, “LLM Plain”) for AI-generated summaries tailored to different reporting styles.

Use cases

MetaLLMReporter is designed to assist researchers who have completed a meta-analysis, or possess the requisite data, in consolidating diverse statistical outputs into comprehensive written summaries or reports.

Scenario 1: Drafting a results section: A researcher uploads their meta-analysis dataset. After confirming the analyses run correctly by checking the individual output tabs (e.g., Heterogeneity, Publication Bias), they navigate to the “LLM Cochrane” or “LLM NEJM” tab. They copy the generated text as a starting point for drafting the results section of their systematic review manuscript, carefully reviewing and editing the LLM output for accuracy, completeness, and appropriate nuance.

Scenario 2: Plain language summary generation: For patient education materials or conference abstracts, the researcher uses the output of the “LLM Plain” tab. This concise summary of the meta-analysis findings is then carefully reviewed for accuracy against the statistical results.

These scenarios demonstrate the tool’s capacity to automate the generation of initial text drafts from complex statistical results, potentially expediting the reporting process.

Analysis

Although a vast number of tools exist for performing meta-analysis, condensing the results of heterogeneity tests, bias calculations, sensitivity analyses, and subgroup/meta-regression analyses into an intelligible written description remains a substantial undertaking.

MetaLLMReporter addresses this problem not only by performing a wide range of standard meta-analysis procedures using established R packages (meta, metafor, dmetar, etc.) but also by integrating a large language model (LLM) to automatically produce reports in different styles.

The main innovation is the use of Google’s Gemini LLM to turn statistical output into organised text tailored to a diverse audience, ranging from recognised Cochrane-style reports to plain language summaries. This is intended to substantially reduce the writing effort required of researchers by providing them with drafts that incorporate results from the various analytical components. The application runs a wide range of relevant analyses automatically on data upload, ensuring that these components are readily available for the LLM prompt.

Nonetheless, reliance on LLM-generated text requires serious consideration. While it can save time, the output must be treated as a draft requiring thorough review and proofreading by experts. LLMs may misread context, misunderstand statistical nuances, or “hallucinate” information. The quality of the output depends heavily on the coherence and completeness of the prompt and on the capabilities of the LLM used. The present implementation also embeds API keys in the code, which poses a potential security risk and makes it unsuitable for large-scale sharing without modification. Moreover, the app focuses on text production and lacks the interactive plotting features found in other meta-analysis tools.

Future development could allow users to choose which analyses to include in the LLM prompt, fine-tune the prompts for greater precision and style adherence, incorporate visuals alongside the text, support additional data types, offer a choice of LLMs, and implement more secure API key management.

Ethics approval and consent to participate

This study did not involve human participants, human data, or human tissue. Therefore, ethical approval and consent to participate were not required.

Software availability

Source code available from: https://github.com/mahmood789/LLMSMD

Archived software available from: https://doi.org/10.5281/zenodo.15790137

License: Apache 2.0 (OSI-approved open license)

This software is freely available and can be used without restriction. Use of the LLM features (e.g., Gemini API) requires a valid API key provided by the user. The demonstration dataset and all analysis functionality are included in the GitHub repository for full replication of results.
