Online algorithm for assignment of specimens to pooled or individual testing using risk models provides a practical way to increase testing capacity

Alexander V. Alekseyenko

doi:10.12688/f1000research.126285.1

Brief Report

[version 1; peer review: 3 approved with reservations]

PUBLISHED 23 Jan 2023

Biomedical informatics Center, Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, 29403, USA

Alexander V. Alekseyenko
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Software, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background: To improve throughput in diagnostic and screening testing for infectious diseases, I developed a straightforward algorithm that uses individual risk to optimize the decision about pooled or individual testing.
Methods: The online greedy algorithm provides a recommendation for filling the pooled-testing queue for optimal testing in pools of variable size. Observational data from Medical University of South Carolina COVID-19 diagnostic testing were used to estimate capacity gains under this algorithm versus optimal fixed pooling based on population prevalence.
Results: The online pooling recommendations based on this algorithm resulted in statistically significantly greater capacity gains than optimal pools of fixed size (P-value 0.003 and 0.002, for pools of 5 or 6, respectively). This is especially notable since the underlying individual-level risk prediction model attained only moderate predictive accuracy.
Conclusions: This result suggests that, when combined with better risk prediction and integrated in an appropriate informatics ecosystem, this approach can offer an opportunity for resilient pooled testing strategies for pathogens while incorporating relevant operational constraints of pathology laboratories.

Keywords

pooled testing, individual-level risk, machine learning, COVID-19

Corresponding author: Alexander V. Alekseyenko

Competing interests: AVA is a scientific advisory board member for Second Genome Inc., which has not contributed to this research.

Grant information: AVA is supported by the National Institutes of Health National Library of Medicine (NIH/NLM) [R01 LM012517], National Institutes of Health National Center for Advancing Translational Sciences (NIH/NCATS) [R21 TR002513, UL1 TR001450], and National Institutes of Health National Cancer Institute (NIH/NCI) [U54 CA210962].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2023 Alekseyenko AV. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Alekseyenko AV. Online algorithm for assignment of specimens to pooled or individual testing using risk models provides a practical way to increase testing capacity [version 1; peer review: 3 approved with reservations]. F1000Research 2023, 12:85 (https://doi.org/10.12688/f1000research.126285.1) First published: 23 Jan 2023, 12:85 (https://doi.org/10.12688/f1000research.126285.1) Latest published: 23 Jan 2023, 12:85 (https://doi.org/10.12688/f1000research.126285.1)

Introduction

The coronavirus disease 2019 (COVID-19) pandemic has created a demand for massive, rapid, and accurate diagnostic testing. A number of reports recommend pooled testing to help increase testing capacity. For example, a recent report provides specific estimates of cost savings in a pooled testing setting.¹ Biochemically, multiple reports demonstrate that RT-qPCR (reverse transcription quantitative real-time polymerase chain reaction) tests are amenable to pooled testing strategies.²⁻⁴

Pooled testing has a long legacy of quantitative methodologies and some practical implementation successes. A review of pooled testing can be found in Wiley StatsRef: Statistics Reference Online.⁵ Within pooled testing methodologies, the hierarchical two-step approach is the oldest and the simplest. The approach involves splitting the subjects to be tested into groups (pools) of equal size and testing each pool first. If a group test result is negative, every member of the group is recorded as negative. If the group is positive, each individual in the group is tested individually. Many variations of this approach have been proposed over the years. The most relevant set of techniques uses individual-level risks in conjunction with pooled testing to determine appropriate pool sizes for more efficient testing. These methods are typically concerned with optimization for a given collection of specimens (e.g. Ref. 6). Unfortunately, such algorithms do not always fit the workflow of pathology and laboratory medicine. It is desirable to be able to make online (at the time of encounter) decisions to assign a specimen to a pool at the point when the specimen is first received in the laboratory. Many labs have limited ability to manipulate and rearrange the specimens multiple times to establish optimal pools. Therefore, an online algorithm that provides an immediate recommendation about better pooling strategies for the specimens is of practical importance for successful implementation of pooled testing strategies. For this reason, fixed-size pooling, which cannot account for individual-level risk even when available, is by far the most popular approach in practice.

Informatics and artificial intelligence tools have been mobilized to help with the pandemic by allowing for infection risk prediction at the individual level. These risk predictions can be leveraged in workflows to prioritize valuable clinical resources.⁷ In this report, I demonstrate that a greedy online algorithm for specimen assignment based on individual risk predictions can increase COVID-19 testing capacity in a way suitable for providing pooling recommendations for specimens as they come in for testing. Figure 1 presents an overview of this approach.

Figure 1. Online greedy algorithm for pool assignment.

A. The algorithm relies on the risk, informed by a predictive model, that a given specimen to be tested is positive, p_i. Population prevalence rates and arbitrary predictive models can be used based on the available predictors, such as basic demographics, risk factors, symptoms, natural language processing derived features from clinical notes, etc. The probabilities are used to group a stream of specimens to be tested into pools that will be tested together. Should a pool test negative, all of the specimens in the pool are recorded as negative, resulting in increased capacity for testing. For a positive pool, additional ascertainment of each individual specimen will be required for a final result. B. The online algorithm makes a decision for any new specimen to either add it to the pool that is being formed or to end forming that pool and start a new one with this specimen. The decision is made based on the expected capacity gain from pooled testing, calculated from the p_i of the specimen. EHR: Electronic health record.

Methods

Pooled testing with individual risk estimates

Suppose the positivity rate is $p$ among the $k$ individuals to be tested. The probability that the pool of these individuals tests positive is $P_{k} = 1 - (1 - p)^{k}$, which is one minus the probability that all subjects are negative. Two-step pooled testing requires individual retesting of everyone in a positive pool (Figure 1A). Thus, the expected number of tests is $E_{k} = 1 + k P_{k}$. The capacity gain is the ratio of the number of subjects tested to the number of physical tests performed, $G_{k} = k / E_{k}$. Capacity gains with pooled testing are achieved when a pool of specimens tests negative (Figure 1A).
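As a rough illustration of these formulas, the following Python sketch computes $P_{k}$, $E_{k}$, and $G_{k}$ for a uniform positivity rate and scans pool sizes for the largest expected gain. The function names and the `max_k` search bound are illustrative assumptions and are not taken from the published software.

```python
def pool_positive_prob(p, k):
    """P_k = 1 - (1 - p)^k: probability that a pool of k specimens tests positive."""
    return 1.0 - (1.0 - p) ** k


def expected_tests(p, k):
    """E_k = 1 + k * P_k: one pooled test plus k retests if the pool is positive."""
    return 1.0 + k * pool_positive_prob(p, k)


def capacity_gain(p, k):
    """G_k = k / E_k: subjects tested per physical test performed."""
    return k / expected_tests(p, k)


def best_pool_size(p, max_k=30):
    """Pool size in 2..max_k with the largest expected gain (max_k is an assumed bound)."""
    return max(range(2, max_k + 1), key=lambda k: capacity_gain(p, k))


if __name__ == "__main__":
    p = 0.0455  # positivity rate in the testing data (Table 1)
    for k in (4, 5, 6):
        print(k, round(capacity_gain(p, k), 3))
    print("best pool size:", best_pool_size(p))
```

At the 4.55% positivity of the testing data, the expected gain in this sketch peaks at a pool size of 5, with 6 a close second.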

Suppose a priori estimates of the probability of being positive are available for each individual. Given these estimates, $p = \{p_{i}\}_{1 \leq i \leq k}$, the probability that a pool tests positive is $P_{k}(p) = 1 - \prod_{i = 1}^{k} (1 - p_{i}) = P_{k - 1}(p) (1 - p_{k}) + p_{k}$, and the expected capacity gain is $G_{k}(p) = \frac{k}{1 + k P_{k}(p)}$. The key to deriving a greedy online algorithm is that both of these quantities, $P_{k}(p)$ and $G_{k}(p)$, can be expressed as recurrence relations that depend only on the quantities already computed for a pool of smaller size. This allows one to maintain an online estimate of the capacity gain of a collection of already processed specimens when making a decision about a new specimen.
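The recurrence can be illustrated with a small Python sketch that updates $P_{k}(p)$ one specimen at a time and checks the result against the direct product formula. The class name `RunningPool` and the example risks are hypothetical and chosen only for illustration.

```python
import math


class RunningPool:
    """Tracks P_k(p) and G_k(p) for a pool as specimens are added one at a time."""

    def __init__(self):
        self.size = 0             # k, the number of specimens in the pool
        self.positive_prob = 0.0  # P_k(p); an empty pool cannot test positive

    def add(self, p_new):
        """Apply P_k = P_{k-1} * (1 - p_k) + p_k and increment k."""
        self.positive_prob = self.positive_prob * (1.0 - p_new) + p_new
        self.size += 1

    def gain(self):
        """G_k(p) = k / (1 + k * P_k(p))."""
        return self.size / (1.0 + self.size * self.positive_prob)


if __name__ == "__main__":
    risks = [0.01, 0.05, 0.20]  # made-up individual risks
    pool = RunningPool()
    for p in risks:
        pool.add(p)
    direct = 1.0 - math.prod(1.0 - p for p in risks)  # product form of P_k(p)
    print(pool.positive_prob, direct, pool.gain())
```

Only the pool size and the current $P_{k}(p)$ need to be stored, so each update takes constant time regardless of how many specimens are already in the pool.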

Demonstration predictive model for individual risk estimation

The estimates of a priori individual risk were obtained using logistic regression. The evaluation data were collected incidental to the Living μBiome Bank™ study⁸,⁹ of the microbiomes associated with infectious disease testing, which was approved by the Medical University of South Carolina (MUSC) IRB (Pro00079660). The data consisted of the COVID-19 test result (response) for adult (age ≥18) subjects with a conclusive test result (“Positive” or “Negative”) who underwent testing at the MUSC Molecular Pathology Lab between March 12 and June 6, 2020 (32,851 cases in total). The design of this study was based on convenience sampling in a relatively short time interval. Cases obtained between May 21 and June 6, 2020 were not used for model fitting and constituted the testing data. Predictors included subject age; indicator variables for whether (i) the test is a follow-up, (ii) the immediately preceding test was positive, (iii) the patient is hospitalized, and (iv) the test was ordered from a hospital location; and an interaction term between age and (ii). A logistic regression model followed by stepwise backward feature elimination based on the Akaike Information Criterion (AIC) was used for model selection in the training data only. The performance of the model was evaluated in the training and the testing data separately. The analyses were conducted using the R statistical programming environment, version 3.6.1.

Evaluation of adaptive pooling strategy

The number of physical tests needed and the capacity gains of the online algorithm were compared with those of optimal uniform fixed pool sizes based on population prevalence rates.¹ For the observations on each day in the testing data, averages over 1,000 permutations of the order of the specimens provided order-independent estimates of the number of tests. A one-sided Wilcoxon signed-rank test was used to evaluate the hypothesis that online recommendations resulted in fewer tests.

Results

The online pooling algorithm

The online pooling algorithm (Figure 1B) has been specifically designed to make a pooling decision about each specimen as it arrives for testing. The individual risk information and the estimates of the capacity gains from already processed specimens allow the algorithm to decide whether to add the specimen to the pool that is currently being filled or to close that pool and start a new one with the current specimen.
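One way such a decision rule could look is sketched below in Python. The stopping rule is an assumption of this sketch and is not confirmed by the text: a specimen joins the current pool unless doing so would lower the pool's expected capacity gain, a pool of one specimen is treated as an individual test with gain 1, and the optional `max_size` cap is a hypothetical parameter standing in for a laboratory limit on pool size.

```python
def assign_pools(risks, max_size=None):
    """Greedy online assignment of a stream of individual risks to pools.

    Returns a list of pools, each a list of specimen indices in arrival order.
    Assumed rule: keep adding while the expected gain G_k(p) does not decrease.
    """
    pools = []
    current = []     # indices of specimens in the pool being formed
    pool_prob = 0.0  # P_k(p) of the pool being formed
    gain = 0.0       # G_k(p) of the pool being formed
    for i, p in enumerate(risks):
        k = len(current) + 1
        new_prob = pool_prob * (1.0 - p) + p  # recurrence for P_k(p)
        # Assumed: a single specimen is simply tested individually (gain 1).
        new_gain = 1.0 if k == 1 else k / (1.0 + k * new_prob)
        full = max_size is not None and len(current) >= max_size
        if current and (full or new_gain < gain):
            pools.append(current)             # close the pool being formed
            current, pool_prob, gain = [i], p, 1.0
        else:
            current.append(i)
            pool_prob, gain = new_prob, new_gain
    if current:
        pools.append(current)
    return pools


if __name__ == "__main__":
    # A high-risk specimen (index 2) ends up tested on its own.
    print(assign_pools([0.01, 0.01, 0.9, 0.01, 0.01]))
    # With a constant risk equal to the prevalence, pools have a fixed size.
    print([len(pool) for pool in assign_pools([0.0455] * 20)])
```

Under this rule, feeding a constant risk of 0.0455 yields pools of 5 throughout, while a specimen with a risk of 0.9 is isolated into a pool of its own.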

A basic logistic regression model provides moderate predictive accuracy of individual risk

The logistic regression model was meant to provide simplistic estimates of individual risk to demonstrate the feasibility and utility of the approach. The variables included in the data showed statistically significant differences between the training and testing data (Table 1), indicating the potential for suboptimal predictive performance.¹¹ The predictive model achieves moderate predictive accuracy, with an estimated area under the receiver operating characteristic curve of 0.62 in the testing data.
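For readers less familiar with the metric, the area under the receiver operating characteristic curve is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. A minimal Python sketch of that definition follows; the function name and the toy scores are made up for illustration and are unrelated to the study data.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive case ranked above negative case
            elif sp == sn:
                wins += 0.5   # tie
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_scores = [0.1, 0.4, 0.35, 0.8]  # made-up predicted risks
    toy_labels = [0, 0, 1, 1]           # made-up test results
    print(auc(toy_scores, toy_labels))
```

A value of 0.5 corresponds to uninformative risk estimates and 1.0 to perfect ranking, which places the reported 0.62 in the moderate range.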

Table 1. Distribution of patient characteristics included in the predictive model.

Variable (%)	Overall (n = 32,851)	Training (n = 25,714)	Testing (n = 7,137)	Difference in training vs. testing data, χ² test, P value (degrees of freedom)	Included in the final predictive model
Positive	4.68	4.71	4.55	0.59 (1)	Response
Repeat visit test	6.07	5.09	9.58	<10^-16 (1)	Yes
Previously tested positive	0.88	0.85	0.99	0.27 (1)	Yes
Hospitalized	13	13	12	0.066 (1)	Yes
Tests ordered from hospital^a	14	15	13	<10^-5 (1)	No^b
Age group (years)					Yes
19–40	28	29	25	<10^-12 (2)	
40–70	53	52	54		
>70	19	19	21		
Age group (years) within subjects previously tested positive					Yes^c
Total (n)	289	218	71		
19–40	27	23	38	0.0025 (2)	
40–70	43	48	25		
>70	30	28	37		

a The remaining tests have been ordered from ambulatory or community locations.

b Dropped from the model based on the Akaike Information Criterion (AIC) in the backward elimination step.

c Included as interaction term of age and indicator of a previous positive test.

Parameter estimates and other details of the model fit are shown in Figure 2. The model demonstrates that age, hospitalization status, and whether the individual has been previously tested and/or tested positive are all good high-level predictors of the risk of a positive test. The model is hampered by the imbalance between low- and high-risk patients, which reflects the relatively low population-level risk. Nonetheless, the predicted and empirical risks correlate well, albeit with large variability in the high-risk group.

Figure 2. A simple logistic regression model provides sensible estimates of positive rates.

A. R generalized linear regression function call and output following backwards stepwise elimination is illustrated. The features included in the final model were an indicator of whether this was a follow up test (Follow_up), an indicator of whether the individual had tested positive at any point previously (Previous_positive), age group (18–40, 40–70, >70), and indicator of whether the individual is hospitalized. An interaction of age and previous positivity is likewise retained in the model. B. Model performance evaluation included comparison of the empirical and predicted probabilities for groups of individuals with matching predictor values (follow up testing indicator, previous positivity, age, and hospitalization status). The model shows good concordance between the empirical and predicted risk in the lower risk range (inset), and large variability within the higher risk groups.

The online algorithm results in superior capacity gains

As is already known from the recent literature, two-step pooling can provide capacity gains over testing everyone individually. This is also demonstrated in the testing data using fixed pools of 5 or 6 specimens (Table 2). These fixed pool sizes were chosen for comparison because of their optimality given the prevalence rates. The evaluation of the online algorithm shows that the implementation of this approach may result in a doubling of the testing capacity over testing individually (Table 2). Moreover, on 12 out of 17 days in the testing data the online approach resulted in fewer tests than either fixed pool size. These differences were statistically significant for both fixed pool sizes (P value 0.003 and 0.002, respectively).

Table 2. Evaluation of expected capacity increase from pooled testing using uniform and adaptively selected pools.

Total individual tests^a	Number of positive subjects	Expected number of tests^b, pools of 5	Expected number of tests^b, pools of 6	Expected number of tests^b, online	Expected capacity increase^c, pools of 5	Expected capacity increase^c, pools of 6	Expected capacity increase^c, online
534	15	178.3	173.4	167.0	2.99	3.08	3.20
1,125	49	449.4	452.5	433.0	2.50	2.49	2.60
250	7	83.4	81.7	73.5	3.00	3.06	3.40
38	7	33.3	35.3	31.4	1.14	1.08	1.21
68	0	14.0	12.0	15.0	4.86	5.67	4.55
554	17	191.3	188.0	175.4	2.90	2.95	3.16
826	28	296.9	292.8	297.2	2.78	2.82	2.78
389	20	169.0	171.7	157.8	2.30	2.27	2.46
836	47	378.5	386.2	374.9	2.21	2.16	2.23
223	6	73.7	72.1	68.9	3.03	3.09	3.24
17	1	9.0	9.0	5.7	1.89	1.89	2.98
81	5	39.6	40.5	41.5	2.04	2.00	1.95
398	21	174.8	178.3	170.6	2.28	2.23	2.33
718	36	307.7	311.1	302.1	2.33	2.31	2.38
392	24	185.4	190.4	191.2	2.11	2.06	2.05
619	42	307.7	318.3	304.8	2.01	1.94	2.03
69	0	14.0	12.0	12.8	4.93	5.75	5.41

a Each row represents an individual day in the testing data.

b Results based on 1,000 random permutations of the specimen input order.

c Highest capacity increase strategy for each day is shown in bold.
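The day-level comparison can be re-traced from the tabulated values with a short Python sketch. It counts the days on which the online approach needed fewer tests than both fixed pool sizes and applies a one-sided Wilcoxon signed-rank test using a normal approximation. The approximation (no continuity or tie correction) and the use of rounded table entries are assumptions of this sketch, so its P values are only roughly comparable to those reported from the original R analysis.

```python
import math

# Expected tests per day (pools of 5, pools of 6, online), copied from Table 2.
TABLE2 = [
    (178.3, 173.4, 167.0), (449.4, 452.5, 433.0), (83.4, 81.7, 73.5),
    (33.3, 35.3, 31.4), (14.0, 12.0, 15.0), (191.3, 188.0, 175.4),
    (296.9, 292.8, 297.2), (169.0, 171.7, 157.8), (378.5, 386.2, 374.9),
    (73.7, 72.1, 68.9), (9.0, 9.0, 5.7), (39.6, 40.5, 41.5),
    (174.8, 178.3, 170.6), (307.7, 311.1, 302.1), (185.4, 190.4, 191.2),
    (307.7, 318.3, 304.8), (14.0, 12.0, 12.8),
]


def days_online_fewest(rows):
    """Days on which the online approach needs fewer tests than both fixed sizes."""
    return sum(1 for t5, t6, online in rows if online < t5 and online < t6)


def signed_rank_p_less(x, y):
    """One-sided Wilcoxon signed-rank P value for x < y (normal approximation)."""
    # Work in integer tenths so that equal differences are recognized as ties.
    diffs = [round((a - b) * 10) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0  # average rank for ties
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    online = [row[2] for row in TABLE2]
    print("days online is best:", days_online_fewest(TABLE2))
    print("P vs pools of 5:", signed_rank_p_less(online, [row[0] for row in TABLE2]))
    print("P vs pools of 6:", signed_rank_p_less(online, [row[1] for row in TABLE2]))
```

The day count reproduces the 12 of 17 days stated in the text, and both approximate P values fall below 0.01.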

Discussion

The online nature of the presented algorithm allows for its easy implementation in many existing laboratory medicine workflows. For example, it may be used to provide pooling recommendations as the specimens are scanned upon receipt in the lab for testing. This feature is unique and important for a feasible and practical solution that fits the existing laboratory medicine workflow. Alternative approaches that optimize the pools globally for a collection of specimens (e.g. Ref. 6) may offer better performance in terms of capacity gains, but require additional manipulation of the specimens to form the pools, which may be feasible in some, but not all, workflows. Implementations of these global optimizing approaches may be feasible when pool assignments can happen off-line, for example during transport of a batch of specimens from a collection site to a testing facility. In that respect, the online approach trades potential global suboptimality for simplicity and appeal to laboratory management.

The exact implementation of online pooling approach may need to meet specific operational constraints to be practical. For example, some laboratories may only be capable of testing in pools of fixed maximum size. These constraints can be naturally incorporated into modified online pooling algorithms. More sophisticated versions of the algorithms are easy to imagine as well. For example, a parallel fulfilment of multiple pools simultaneously can be accommodated in a straightforward extension.

Institutional implementation of any pooled testing approach that utilizes individual-level risks requires solutions to many informatics ecosystem problems. First, the models providing individual-level risk predictions need to be updated frequently to account for changes in prevalence by risk factors, and other contributors to model drift. In practice, a nightly model fit update may be feasible and necessary. Second, the model predictions need to be triggered at an appropriate time between specimen collection and the time the specimen arrives at a laboratory for testing. Third, pooling recommendations have to be either pre-computed or involve only lightweight computations. In either case, these recommendations have to be easily available in the laboratory information system. Combined, these challenges point to a likely requirement for close intra-institutional cooperation between laboratory medicine, analytics, data science, and informatics operations.

The testing capacity increases from pooled testing rely on the quality of the predictive models. When predictors are not available, population prevalence rates can be input into the online algorithm, and the resulting two-step groupings will be equivalent to optimal pooling into pools of fixed size. In this paper, the evaluation of the algorithm involved a clearly suboptimal set of predictors and risk prediction approach (multivariable logistic regression). Nonetheless, the online algorithm provides an improvement over simple two-step pooling. Better predictive models will result in even larger capacity gains. Improved model predictivity could result from employing data that are readily available or computable at the time of diagnostic specimen collection. For example, many of the data elements from the Health and Human Services guidance on laboratory reporting¹⁰ could be included. Other structured data, such as questionnaires collecting evidence and degree of exposure, and telehealth-derived variables can prove useful as well. Likewise, unstructured text data processed by natural language processing techniques can be useful.⁷ Further, more sophisticated machine learning and artificial intelligence approaches may be used to combine all of the available data sources for superior risk estimates in online pooling recommendations.

Conclusions

The results reported herein are immediately translatable to laboratory medicine operations. Even without sophisticated predictive models, given the current state of the pandemic the online pooling algorithm can double the COVID-19 testing capacity. Better risk prediction models may result in even better capacity improvements. In the longer term, similar strategies can be used for implementation of massive scale testing for other diseases.

Data availability

Access to the full dataset cannot be made available publicly since it contains elements of personal health information (PHI); however, access will be granted to readers and reviewers upon signing an MUSC IRB-approved data use agreement. Please contact the author to initiate the process (alekseye@musc.edu).

Summary data and software in support of this work are available at https://github.com/alekseyenko/AIIAT/ and https://doi.org/10.5281/zenodo.7541444.¹¹ The file predict_positive_r2.pdf contains summary data.

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

Acknowledgments

The author would like to thank Katie Kirchoff, Bashir Hamidi, Jihad Obeid, Matthew Turner, Stephane Meystre, and Leslie A. Lenert for discussing the merits of the ideas presented in this brief report.

References

1. Cherif A, Grobe N, Wang X, et al.: Simulation of Pool Testing to Identify Patients With Coronavirus Disease 2019 Under Conditions of Limited Test Availability. JAMA Netw. Open. 2020; 3(6): e2013075.
2. Lohse S, Pfuhl T, Berko-Gottel B, et al.: Pooling of samples for testing for SARS-CoV-2 in asymptomatic people. Lancet Infect. Dis. 2020; 20: 1231–1232.
3. Yelin I, Aharony N, Shaer Tamar E, et al.: Evaluation of COVID-19 RT-qPCR test in multi-sample pools. Clin. Infect. Dis. 2020; 71: 2073–2078.
4. Abdalhamid B, Bilder CR, McCutchen EL, et al.: Assessment of Specimen Pooling to Conserve SARS CoV-2 Testing Resources. Am. J. Clin. Pathol. 2020; 153(6): 715–718.
5. Bilder CR: Group Testing for Identification. Wiley StatsRef: Statistics Reference Online. 2019; pp. 1–11.
6. Xiong W, Lu H, Ding J: Determination of Varying Group Sizes for Pooling Procedure. Comput. Math. Methods Med. 2019; 2019: 4381084.
7. Obeid JS, Davis M, Turner M, et al.: An AI approach to COVID-19 infection risk assessment in virtual visits: a case report. J. Am. Med. Inform. Assoc. 2020; 27: 1321–1325.
8. Living BioBank. 2020.
9. Alekseyenko AV, Hamidi B, Faith TD, et al.: Each patient is a research biorepository: informatics-enabled research on surplus clinical specimens via the living BioBank. J. Am. Med. Inform. Assoc. 2021; 28(1): 138–143.
10. COVID-19 Pandemic Response, Laboratory Data Reporting: CARES Act Section 18115. Department of Health and Human Services. 2020.
11. Alekseyenko A: alekseyenko/AIIAT: For F1000 publication (v1.0). [Dataset]. Zenodo. 2023.

Open Peer Review

Reviewer Report 11 Sep 2024

F. M. Javed Mehedi Shamrat, Computer System and Technology, University of Malaya, Kuala Lumpur, Malaysia

Approved with Reservations

https://doi.org/10.5256/f1000research.138681.r176925

The paper is effectively written and presents the novelty in a proficient manner.

The clarity of the methodological approach requires further elaboration.
It is suggested that the author incorporate more statistical analysis into the Results section to evaluate the findings.
The visual representations appear to be lacking in clarity, and it is recommended that the author provide high-resolution images.
The clarity of the work in the conclusion section could be enhanced.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Data Science, Machine Learning, Artificial Intelligence, Bioinformatics, Image Processing

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 12 Sep 2024

Alexander Alekseyenko, Biomedical informatics Center, Department of Public Health Sciences, Medical University of South Carolina, Charleston, 29403, USA

Thank you for taking the time to submit these points. I am happy to respond to any specific concerns. As they stand your comments are too general to be actionable. I kindly request that you either provide more detail about the changes you request, or change your evaluation to APPROVED.
Competing Interests: No competing interests were disclosed.

Reviewer Report 02 Nov 2023

Md S. Warasi, Radford University, Radford, Virginia, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.138681.r206487

The author introduces a pooled testing algorithm, referred to as the 'online algorithm,' with the goal of enhancing disease screening efficiency and increasing testing capacity. The specimen pooling strategy and determination of the optimal pool size are clearly described, and the results are concisely presented. The study focuses on screening individuals for SARS-CoV-2. While the methodological contribution may not be ground breaking, the approach's simplicity and flexibility make it appealing. Overall, this is an interesting, well-written article with the potential for practical applications. Here, I present some comments and concerns.

Methods:

The article relies on the assumption of perfect test outcomes, which may limit its applicability to situations where highly sensitive and specific assays are available. However, it's crucial to acknowledge that pooled testing has been used in scenarios with less-than-perfect assay sensitivity and specificity. Therefore, addressing the handling of testing errors (false negatives and false positives) would broaden the article's scope and practicality.
The article should explicitly state the assumptions it makes, such as perfect sensitivity and specificity, to provide clarity to readers and highlight potential limitations.
It's essential to include information about the sensitivity and specificity of the assay used for SARS-CoV-2 testing. If these values are not perfect, as is often the case in practice, discussing how the algorithm accounts for this imperfection is important.
The article's flexibility in dynamically forming pools as specimens arrive is a significant strength. However, this flexibility relies on reliable estimates of disease probabilities using individual covariates and predictive models based on historical data. The author should emphasize the importance of accurate probability estimates to ensure the algorithm's effectiveness, possibly in the concluding sections.
The inclusion of a flowchart to summarize the pooling strategy is a positive aspect of the article, enhancing its clarity.
A related method in McMahan, Tebbs, and Bilder (2012, Biometrics, 68(1): 287–296¹) is not cited in the article. Including this citation would provide readers with a broader perspective on the topic.
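
To make the point about imperfect assays concrete, the sketch below computes the expected number of tests per specimen under two-stage (Dorfman) pooling when sensitivity and specificity are below 1. It is an illustration only and is not taken from the reviewed article: the function names are hypothetical, and it assumes that pooled and individual tests share the same sensitivity and specificity (no dilution effect), that test errors are independent, and that all specimens have the same prevalence.

```python
def pool_positive_probability(p, k, se=1.0, sp=1.0):
    """Probability that a pool of k specimens tests positive.

    p  : prevalence (assumed equal for every specimen)
    se : assay sensitivity (assumed identical for pools and individuals)
    sp : assay specificity
    """
    all_negative = (1.0 - p) ** k
    # True-positive pools detected with probability se,
    # all-negative pools flagged falsely with probability (1 - sp).
    return se * (1.0 - all_negative) + (1.0 - sp) * all_negative


def expected_tests_per_specimen(p, k, se=1.0, sp=1.0):
    """Expected tests per specimen for two-stage pooling with pool size k."""
    if k < 2:
        return 1.0  # individual testing: one test per specimen
    # One pooled test shared by k specimens, plus a retest of every
    # specimen whenever the pooled test comes back positive.
    return 1.0 / k + pool_positive_probability(p, k, se, sp)


def pooled_sensitivity(se):
    """Chance an infected specimen is detected: pool and retest both positive."""
    return se * se
```

Under these assumptions, a prevalence of 0.05, pools of 5, sensitivity 0.9 and specificity 0.99 give about 0.41 tests per specimen, slightly fewer than the roughly 0.43 expected with a perfect assay, but only because some positive pools are missed; the effective sensitivity for an infected individual drops to about 0.81.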

Results:

The presented results appear to be reasonable, and the use of logistic regression for building a predictive model is a sound approach.
The comparison of the expected number of expended tests with Dorfman's testing is noted. However, it might be more informative and appropriate to compare the method with a more closely related approach, such as the one proposed by McMahan, Tebbs, and Bilder (2012)¹. This would provide a more relevant benchmark for readers to assess the method's performance.
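
For readers unfamiliar with the Dorfman benchmark mentioned above, the following minimal sketch shows how the expected number of tests and the resulting capacity gain can be computed for fixed pool sizes at a common prevalence, assuming a perfect assay. The function names and the `max_k` search limit are assumptions made for this illustration, and the code does not reproduce the article's analysis.

```python
def dorfman_tests_per_specimen(p, k):
    """Expected tests per specimen for fixed pools of size k at prevalence p.

    Assumes a perfect assay and independent infection status.
    """
    if k < 2:
        return 1.0  # individual testing
    # One pooled test per k specimens, plus k retests if the pool is positive.
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def capacity_gain(p, k):
    """Specimens processed per test, relative to individual testing."""
    return 1.0 / dorfman_tests_per_specimen(p, k)


def best_fixed_pool_size(p, max_k=20):
    """Pool size in 1..max_k that minimizes expected tests per specimen."""
    return min(range(1, max_k + 1),
               key=lambda k: dorfman_tests_per_specimen(p, k))
```

At a prevalence of 0.05, for example, `best_fixed_pool_size(0.05)` selects pools of 5. A risk-informed method would be compared against this kind of fixed-size baseline.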

Minor Comments: There are several typographical errors or areas needing clarification. Here are some from the first page:

"an recommendation" should be corrected to "a recommendation."
The phrase "this approach cab offers" appears unclear. Please clarify its meaning or correct the typographical error.
The sentence mentioning "resilient pooled testing" requires clarification to ensure readers understand the concept.

Other Comments:

The title of the article is a complete sentence, which is uncommon for a research article. Additionally, it is fairly long. The author may consider shortening and rephrasing the title.
A well-documented, user-friendly R function should be provided with the article for clinicians.

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Yes

Are all the source data underlying the results available to ensure full reproducibility?
No

Are the conclusions drawn adequately supported by the results?
Yes

References

1. McMahan CS, Tebbs JM, Bilder CR: Informative Dorfman screening. Biometrics. 2012; 68(1): 287–296.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Statistics, biostatistics

Reviewer Report 01 Nov 2023

Barathidasan R., ICMR-(NARFBR) National Animal Resource Facility for Biomedical Research, Hyderabad, India

Approved with Reservations

https://doi.org/10.5256/f1000research.138681.r206489

Summary: This article describes an online algorithm developed by the author for assigning samples to different pools based on the risk data available from patient health records. The algorithm is back-tested on patient samples collected for detecting COVID-19, and an improvement in testing capacity over conventional pooling strategies is reported. The back-test used samples received by a lab over a period of 17 days, during which the test positivity rate ranged from 0-7% (a single outlier of 18% not included). The testing capacity enhancement for conventional pooling on the same data was 2.66 (5-sample pool) and 2.75 (6-sample pool), whereas the online algorithm-based pooling strategy increased the testing capacity only marginally further, to 2.82.

Methods & Source data: The health record parameters (i.e. age, symptoms, exposure history, other patient demographics) based on which the sample is assigned to a pool could have been described in detail. Classifications of pools i.e. high-risk pool, medium-risk pool, low-risk pool, and the basis for classification could have been included.

Though the capacity enhancement is very marginal for the given data set, I agree with the author that improvements to the algorithm could increase the capacity gain.

Such algorithms implemented at the sample collection site can definitely reduce the workload on the laboratory in deciding the pool, assigning pool numbers to samples and testing.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Diagnostic virology, toxicologic pathology

Author Response 06 Sep 2024

Alexander Alekseyenko, Biomedical informatics Center, Department of Public Health Sciences, Medical University of South Carolina, Charleston, 29403, USA

Thank you for taking the time to review the manuscript. I am encouraged to see that you recognize the potential of improving pooled testing via our method. As we state in our manuscript, we use an (almost) strawman predictive model of individual risk to inform our pooling strategy. I have published with colleagues on better predictive models that can be incorporated into the same framework (doi: 10.1093/jamia/ocab186). Ultimately, I hope that this and other similar work will encourage laboratory instrumentation manufacturers to improve automation to allow for implementation of variable pool size testing.
Competing Interests: None
Version 1, published 23 Jan 2023

Invited reviewers:

Barathidasan R., ICMR-(NARFBR) National Animal Resource Facility for Biomedical Research, Hyderabad, India
Md S. Warasi, Radford University, Radford, Virginia, USA
F. M. Javed Mehedi Shamrat, University of Malaya, Kuala Lumpur, Malaysia

Reviewer Report

11 Sep 2024 | for Version 1

F. M. Javed Mehedi Shamrat, Computer System and Technology, University of Malaya, Kuala Lumpur, Malaysia

Approved With Reservations

The paper is effectively written and presents the novelty in a proficient manner.

The clarity of the methodological approach requires further elaboration.
It is suggested that the author incorporate more statistical analysis into the Results section to evaluate the findings.
The visual representations appear to be lacking in clarity, and it is recommended that the author provide high-resolution images.
The clarity of the work in the conclusion section could be enhanced.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Data Science, Machine Learning, Artificial Intelligence, Bioinformatics, Image Processing

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, or sometimes more significant revisions, are required to address specific details and improve the paper's academic merit

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

1. Cherif A, Grobe N, Wang X, et al.: Simulation of Pool Testing to Identify Patients With Coronavirus Disease 2019 Under Conditions of Limited Test Availability. JAMA Netw. Open. 2020; 3(6): e2013075.

2. Lohse S, Pfuhl T, Berko-Gottel B, et al.: Pooling of samples for testing for SARS-CoV-2 in asymptomatic people. Lancet Infect. Dis. 2020; 20: 1231–1232.

3. Yelin I, Aharony N, Shaer Tamar E, et al.: Evaluation of COVID-19 RT-qPCR test in multi-sample pools. Clin. Infect. Dis. 2020; 71: 2073–2078.

4. Abdalhamid B, Bilder CR, McCutchen EL, et al.: Assessment of Specimen Pooling to Conserve SARS CoV-2 Testing Resources. Am. J. Clin. Pathol. 2020; 153(6): 715–718.

5. Bilder CR: Group Testing for Identification. Wiley StatsRef: Statistics Reference Online. 2019; pp. 1–11.

6. Xiong W, Lu H, Ding J: Determination of Varying Group Sizes for Pooling Procedure. Comput. Math. Methods Med. 2019; 2019: 4381084.

7. Obeid JS, Davis M, Turner M, et al.: An AI approach to COVID-19 infection risk assessment in virtual visits: a case report. J. Am. Med. Inform. Assoc. 2020; 27: 1321–1325.

8. Living BioBank. 2020.

9. Alekseyenko AV, Hamidi B, Faith TD, et al.: Each patient is a research biorepository: informatics-enabled research on surplus clinical specimens via the living BioBank. J. Am. Med. Inform. Assoc. 2021; 28(1): 138–143.

10. COVID-19 Pandemic Response, Laboratory Data Reporting: CARES Act Section 18115: Department of Health and Human Services. 2020.

11. Alekseyenko A: alekseyenko/AIIAT: For F1000 publication (v1.0). [Dataset]. Zenodo. 2023.
