Brief Report

Online algorithm for assignment of specimens to pooled or individual testing using risk models provides a practical way to increase testing capacity

[version 1; peer review: 3 approved with reservations]
PUBLISHED 23 Jan 2023

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background: To improve throughput in diagnostic and screening testing for infectious diseases, I developed a straightforward algorithm that uses individual risk to optimize the decision about pooled or individual testing.
Methods: The online greedy algorithm provides a recommendation for filling a pooled testing queue for optimal testing in pools of variable size. Observational data from Medical University of South Carolina COVID-19 diagnostic testing were used to estimate capacity gains under this algorithm versus optimal fixed pooling based on population prevalence.
Results: The online pooling recommendations based on this algorithm resulted in statistically significantly better capacity gains than optimal pools of fixed size (P-value 0.003 and 0.002, for pools of 5 or 6, respectively). This is especially notable since the underlying individual-level risk prediction model attained only moderate predictive accuracy.
Conclusions: This result suggests that, when combined with better risk prediction and integrated in an appropriate informatics ecosystem, this approach can offer an opportunity for resilient pooled testing strategies for pathogens while incorporating relevant operational constraints of pathology laboratories.

Keywords

pooled testing, individual-level risk, machine learning, COVID-19

Introduction

The coronavirus disease 2019 (COVID-19) pandemic has created a demand for massive, rapid, and accurate diagnostic testing. A number of reports recommend pooled testing to help increase testing capacity. For example, a recent report provides specific estimates of cost savings in a pooled testing setting.1 Biochemically, multiple reports demonstrate that RT-qPCR (reverse transcription quantitative real-time polymerase chain reaction) tests are amenable to pooled testing strategies.2–4

Pooled testing has a long legacy of quantitative methodologies and some practical implementation successes. A review of pooled testing can be found at Wiley StatsRef: Statistics Reference Online.5 Within pooled testing methodologies, the hierarchical two-step approach is the oldest and the simplest. The approach involves splitting the subjects to be tested into equal-size groups (pools) and testing each pool first. If a group test result is negative, every member of the group is recorded as negative. If the group is positive, each individual in the group is tested individually. Many variations of this approach have been proposed over the years. The most relevant set of techniques uses individual-level risks in conjunction with pooled testing to determine appropriate pool sizes for more efficient testing. These methods are typically concerned with optimization for a given collection of specimens (e.g. Ref. 6). Unfortunately, such algorithms do not always fit the workflow of pathology and laboratory medicine. It is desirable to be able to make online (at the time of encounter) decisions that assign a specimen to a pool at the point the specimen is first received in the laboratory. Many labs have limited ability to manipulate and rearrange the specimens multiple times to establish optimal pools. Therefore, an online algorithm that provides an immediate recommendation about better pooling strategies for the specimens is of practical importance for successful implementation of pooled testing strategies. For this reason, fixed-size pooling, which cannot account for individual-level risk even when available, is by far the most popular approach in practice.

Informatics and artificial intelligence tools have been mobilized to help with the pandemic by allowing for infection risk prediction at the individual level. These risk predictions can be leveraged in workflows to prioritize valuable clinical resources.7 In this report I demonstrate that a greedy online algorithm for specimen assignment based on individual risk predictions can increase COVID-19 testing capacity in a way suitable for providing pooling recommendations for specimens as they come in for testing. Figure 1 presents an overview of this approach.


Figure 1. Online greedy algorithm for pool assignment.

A. The algorithm relies on the predictive-model-informed risk that a given specimen to be tested is positive, $p_i$. Population prevalence rates and arbitrary predictive models can be used based on the available predictors, such as basic demographics, risk factors, symptoms, natural language processing derived features from clinical notes, etc. The probabilities are used to group a stream of specimens to be tested into pools that will be tested together. Should a pool test negative, all of the specimens in the pool are recorded as negative, resulting in increased capacity for testing. For a positive pool, additional ascertainment of each individual specimen will be required for a final result. B. The online algorithm makes a decision for any new specimen to either add it to a pool that is being formed or to end forming that pool and start a new one with this specimen. The decision is made based on the expected capacity gain from pooled testing, calculated from the $p_i$ of the specimen. EHR: Electronic health record.

Methods

Pooled testing with individual risk estimates

Suppose the positivity rate is $p$ among the $k$ individuals to be tested. The probability that the pool of these individuals tests positive is $P_k = 1 - (1 - p)^k$, which is one minus the probability that all subjects are negative. Two-step pooled testing requires individual retesting of everyone in a positive pool (Figure 1A). Thus, the expected number of tests is $E_k = 1 + k P_k$. The capacity gain is the ratio of the number of subjects tested to the number of physical units of test performed, $G_k = k / E_k$. Capacity gains with pooled testing are achieved when a pool of specimens tests negative (Figure 1A).
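The quantities above can be evaluated directly. The following is a minimal Python sketch, not the author's implementation (the analyses in the paper were run in R); the function names and the search limit `max_k` are illustrative assumptions.

```python
def pool_positive_prob(p: float, k: int) -> float:
    """P_k = 1 - (1 - p)^k: probability that a pool of k specimens tests positive."""
    return 1.0 - (1.0 - p) ** k


def expected_tests(p: float, k: int) -> float:
    """E_k = 1 + k * P_k: one pool test plus k retests when the pool is positive."""
    return 1.0 + k * pool_positive_prob(p, k)


def capacity_gain(p: float, k: int) -> float:
    """G_k = k / E_k: subjects tested per physical test performed."""
    return k / expected_tests(p, k)


def best_fixed_pool_size(p: float, max_k: int = 30) -> int:
    """Pool size in 2..max_k with the largest capacity gain (max_k is an assumed cap).

    If even the best gain is below 1, individual testing is preferable.
    """
    return max(range(2, max_k + 1), key=lambda k: capacity_gain(p, k))


if __name__ == "__main__":
    prevalence = 0.0455  # positivity rate in the testing data (Table 1)
    k = best_fixed_pool_size(prevalence)
    print(k, round(capacity_gain(prevalence, k), 2))
```

At the 4.55% positivity rate of the testing data, this sketch selects a pool size of 5 with an expected gain of roughly 2.45, and a pool size of 6 is a close second.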

Suppose individual (a priori) estimates of the probability of being positive are available for each individual. Given these estimates, $\mathbf{p} = (p_i)_{1 \le i \le k}$, the probability that a pool tests positive is $P_k(\mathbf{p}) = 1 - \prod_{i=1}^{k} (1 - p_i) = P_{k-1}(\mathbf{p})(1 - p_k) + p_k$, and the expected capacity gain is $G_k(\mathbf{p}) = k / (1 + k P_k(\mathbf{p}))$. The key to deriving a greedy online algorithm is that both of these quantities, $P_k(\mathbf{p})$ and $G_k(\mathbf{p})$, can be expressed as recurrence relationships, which depend only on the quantities already computed for a pool of smaller size. This allows one to maintain an online estimate of the capacity gain of a collection of already processed specimens when making a decision about a new specimen.
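The recurrence lends itself to a one-pass procedure. The sketch below is one plausible reading of the greedy rule, written in Python for illustration. Two assumptions are not stated in the text: a specimen is added whenever doing so does not lower the running capacity gain of the pool being formed, and a pool of one specimen is treated as a single individual test (gain of 1). All names are hypothetical.

```python
from typing import List


def update_pool_prob(prev_prob: float, p_new: float) -> float:
    """Recurrence P_k = P_{k-1} * (1 - p_k) + p_k."""
    return prev_prob * (1.0 - p_new) + p_new


def gain(size: int, pool_prob: float) -> float:
    """Capacity gain k / (1 + k * P_k); a lone specimen is one individual test (assumption)."""
    if size == 1:
        return 1.0
    return size / (1.0 + size * pool_prob)


def greedy_online_pools(risks: List[float]) -> List[List[int]]:
    """Assign specimens, in arrival order, to pools; returns lists of specimen indices."""
    pools: List[List[int]] = []
    current: List[int] = []
    prob = 0.0  # probability that the pool being formed tests positive
    for i, p in enumerate(risks):
        if current:
            new_prob = update_pool_prob(prob, p)
            # Assumed decision rule: keep filling while the gain does not drop.
            if gain(len(current) + 1, new_prob) >= gain(len(current), prob):
                current.append(i)
                prob = new_prob
                continue
            pools.append(current)  # close the pool being formed
        current, prob = [i], p  # start a new pool with this specimen
    if current:
        pools.append(current)
    return pools


if __name__ == "__main__":
    print(greedy_online_pools([0.01, 0.01, 0.6, 0.01]))
    print(greedy_online_pools([0.05] * 10))
```

With identical risks of 0.05 for every specimen, this rule closes each pool at five specimens, which coincides with the best fixed pool size for that prevalence. A high-risk specimen (0.6 in the first example) ends up alone, which amounts to individual testing.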

Demonstration predictive model for individual risk estimation

The estimates of a priori individual risk were obtained using logistic regression. The evaluation data were collected incidental to a Medical University of South Carolina (MUSC) IRB-approved (Pro00079660) study, the Living μBiome Bank™ study,8,9 of the microbiomes associated with infectious disease testing. The data consisted of the COVID-19 test result (response) for adult (age ≥18) subjects with a conclusive test result (“Positive” or “Negative”) who underwent testing at the MUSC Molecular Pathology Lab between March 12 and June 6, 2020 (32,851 cases in total). The design of this study was based on convenience sampling in a relatively short time interval. Cases obtained between May 21st and June 6th, 2020 were not used for model fitting, and constituted the testing data. Predictors included subject age and indicator variables for whether (i) the test is a follow-up; (ii) the immediately preceding test was positive; (iii) the patient is hospitalized; and (iv) the test was ordered from a hospital location; as well as an interaction term between age and (ii). A logistic regression model followed by stepwise backward feature elimination based on the Akaike Information Criterion (AIC) was used for model selection in the training data only. The performance of the model was evaluated in the training and the testing data separately. The analyses were conducted using the R statistical programming environment, version 3.6.1.

Evaluation of adaptive pooling strategy

The number of physical tests needed and the capacity gains of the online algorithm have been compared with those of optimal uniform fixed pool sizes based on population prevalence rates.1 For each day in the testing data, the number of tests was averaged over 1,000 random permutations of the specimen order to obtain order-independent estimates. A one-sided Wilcoxon signed-rank test was used to evaluate the hypothesis that the online recommendations resulted in fewer tests.

Results

The online pooling algorithm

The online pooling algorithm (Figure 1B) has been specifically designed to make pooling decisions about each specimen as it arrives for testing. The individual risk information and the estimates of the capacity gains from already processed specimens allow the algorithm to decide whether to add the specimen to the pool that is currently being filled or to close that pool and start a new one with the current specimen.

A basic logistic regression model provides moderate predictive accuracy of individual risk

The logistic regression model was meant to provide simple estimates of individual risk to demonstrate the feasibility and utility of the approach. Several variables included in the data showed statistically significant differences between the training and testing data (Table 1), indicating the potential for suboptimal predictive performance.11 The predictive model attains moderate predictive accuracy, with an estimated area under the receiver operating characteristic curve of 0.62 in the testing data.
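For readers less familiar with the metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. It is an illustration with hypothetical names and toy inputs, not the evaluation code behind the reported 0.62.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy predicted risks and outcomes (hypothetical values).
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; rank-based formulations are preferable for large data sets.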

Table 1. Distribution of patient characteristics included in the predictive model.

| Variable (%) | Overall (n = 32,851) | Training (n = 25,714) | Testing (n = 7,137) | Difference in training vs. testing data, χ2 test, P value (degrees of freedom) | Included in the final predictive model |
|---|---|---|---|---|---|
| Positive | 4.68 | 4.71 | 4.55 | 0.59 (1) | Response |
| Repeat visit test | 6.07 | 5.09 | 9.58 | <10^-16 (1) | Yes |
| Previously tested positive | 0.88 | 0.85 | 0.99 | 0.27 (1) | Yes |
| Hospitalized | 13 | 13 | 12 | 0.066 (1) | Yes |
| Tests ordered from hospital^a | 14 | 15 | 13 | <10^-5 (1) | No^b |
| Age group (years) | | | | <10^-12 (2) | Yes |
| 19–40 | 28 | 29 | 25 | | |
| 40–70 | 53 | 52 | 54 | | |
| >70 | 19 | 19 | 21 | | |
| Age group (years) within subjects previously tested positive | | | | 0.0025 (2) | Yes^c |
| Total (n) | 289 | 218 | 71 | | |
| 19–40 | 27 | 23 | 38 | | |
| 40–70 | 43 | 48 | 25 | | |
| >70 | 30 | 28 | 37 | | |

a The remaining tests have been ordered from ambulatory or community locations.

b Dropped from the model based on Akaike Information Criteria (AIC) in backward elimination step.

c Included as interaction term of age and indicator of a previous positive test.

Parameter estimates and other details of the model fit are shown in Figure 2. The model demonstrates that age, hospitalization status, and whether the individual has been previously tested and/or tested positive are all useful high-level predictors of the risk of a positive test. The model is hampered by the imbalance between low- and high-risk patients, which reflects the relatively low population-level risk. Nonetheless, the predicted and empirical risks appear to correlate well, albeit with large variability in the high-risk group.


Figure 2. A simple logistic regression model provides sensible estimates of positive rates.

A. R generalized linear regression function call and output following backwards stepwise elimination is illustrated. The features included in the final model were an indicator of whether this was a follow up test (Follow_up), an indicator of whether the individual had tested positive at any point previously (Previous_positive), age group (18–40, 40–70, >70), and indicator of whether the individual is hospitalized. An interaction of age and previous positivity is likewise retained in the model. B. Model performance evaluation included comparison of the empirical and predicted probabilities for groups of individuals with matching predictor values (follow up testing indicator, previous positivity, age, and hospitalization status). The model shows good concordance between the empirical and predicted risk in the lower risk range (inset), and large variability within the higher risk groups.

The online algorithm results in superior capacity gains

As is already known from the recent literature, two-step pooling can provide capacity gains over testing everyone individually. This is also demonstrated in the testing data using fixed pools of 5 or 6 specimens (Table 2). These fixed pool sizes have been chosen for comparison because of their optimality given the prevalence rates. The evaluation of the online algorithm shows that the implementation of this approach may result in a doubling of the testing capacity over testing individually (Table 2). Moreover, on 12 out of 17 days in the testing data the online approach resulted in fewer tests than the fixed pool sizes. These differences were statistically significant for both fixed pool sizes (P value 0.003 and 0.002, respectively).

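Table 2. Evaluation of expected capacity increase from pooled testing using uniform and adaptively selected pools.

| Total individual tests^a | Number of positive subjects | Expected number of tests^b: pools of 5 | Expected number of tests^b: pools of 6 | Expected number of tests^b: online | Expected capacity increase^c: pools of 5 | Expected capacity increase^c: pools of 6 | Expected capacity increase^c: online |
|---|---|---|---|---|---|---|---|
| 534 | 15 | 178.3 | 173.4 | 167.0 | 2.99 | 3.08 | **3.20** |
| 1,125 | 49 | 449.4 | 452.5 | 433.0 | 2.50 | 2.49 | **2.60** |
| 250 | 7 | 83.4 | 81.7 | 73.5 | 3.00 | 3.06 | **3.40** |
| 38 | 7 | 33.3 | 35.3 | 31.4 | 1.14 | 1.08 | **1.21** |
| 68 | 0 | 14.0 | 12.0 | 15.0 | 4.86 | **5.67** | 4.55 |
| 554 | 17 | 191.3 | 188.0 | 175.4 | 2.90 | 2.95 | **3.16** |
| 826 | 28 | 296.9 | 292.8 | 297.2 | 2.78 | **2.82** | 2.78 |
| 389 | 20 | 169.0 | 171.7 | 157.8 | 2.30 | 2.27 | **2.46** |
| 836 | 47 | 378.5 | 386.2 | 374.9 | 2.21 | 2.16 | **2.23** |
| 223 | 6 | 73.7 | 72.1 | 68.9 | 3.03 | 3.09 | **3.24** |
| 17 | 1 | 9.0 | 9.0 | 5.7 | 1.89 | 1.89 | **2.98** |
| 81 | 5 | 39.6 | 40.5 | 41.5 | **2.04** | 2.00 | 1.95 |
| 398 | 21 | 174.8 | 178.3 | 170.6 | 2.28 | 2.23 | **2.33** |
| 718 | 36 | 307.7 | 311.1 | 302.1 | 2.33 | 2.31 | **2.38** |
| 392 | 24 | 185.4 | 190.4 | 191.2 | **2.11** | 2.06 | 2.05 |
| 619 | 42 | 307.7 | 318.3 | 304.8 | 2.01 | 1.94 | **2.03** |
| 69 | 0 | 14.0 | 12.0 | 12.8 | 4.93 | **5.75** | 5.41 |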

a Each row represents an individual day in the testing data.

b Results based on 1,000 random permutations of the specimen input order.

c Highest capacity increase strategy for each day is shown in bold.
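The day-level comparison in Table 2 can be tallied with a few lines of Python. The sketch below only re-reads the published table values (specimens per day and expected tests under each strategy); the variable names are hypothetical, and the aggregate gain it prints is a descriptive summary rather than a result reported in the article.

```python
# Each tuple: (specimens that day, tests with pools of 5, tests with pools of 6,
#              tests with online pooling), transcribed from Table 2.
TABLE2 = [
    (534, 178.3, 173.4, 167.0), (1125, 449.4, 452.5, 433.0),
    (250, 83.4, 81.7, 73.5), (38, 33.3, 35.3, 31.4),
    (68, 14.0, 12.0, 15.0), (554, 191.3, 188.0, 175.4),
    (826, 296.9, 292.8, 297.2), (389, 169.0, 171.7, 157.8),
    (836, 378.5, 386.2, 374.9), (223, 73.7, 72.1, 68.9),
    (17, 9.0, 9.0, 5.7), (81, 39.6, 40.5, 41.5),
    (398, 174.8, 178.3, 170.6), (718, 307.7, 311.1, 302.1),
    (392, 185.4, 190.4, 191.2), (619, 307.7, 318.3, 304.8),
    (69, 14.0, 12.0, 12.8),
]

total_specimens = sum(n for n, _, _, _ in TABLE2)
online_tests = sum(online for _, _, _, online in TABLE2)

# Days on which online pooling needed fewer tests than both fixed pool sizes.
days_online_best = sum(1 for _, t5, t6, online in TABLE2 if online < t5 and online < t6)

# Pooled-over-days capacity gain of the online strategy.
overall_gain = total_specimens / online_tests

if __name__ == "__main__":
    print(len(TABLE2), total_specimens, days_online_best, round(overall_gain, 2))
```

The tally gives 12 of 17 days on which the online strategy used fewer tests than both fixed pool sizes, and the 7,137 specimens match the size of the testing data in Table 1. The aggregate gain of about 2.5 is in line with the reported doubling of capacity.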

Discussion

The online nature of the presented algorithm allows for its easy implementation in many existing laboratory medicine workflows. For example, it may be used to provide pooling recommendations as the specimens are scanned upon receipt in the lab for testing. This feature is important for a feasible and practical solution that fits the existing laboratory medicine workflow. Alternative approaches that optimize the pools globally for a collection of specimens (e.g. Ref. 6) may offer better performance in terms of capacity gains, but require additional manipulation of the specimens to form the pools, which may be feasible in some, but not all, workflows. Implementations of these globally optimizing approaches may be feasible when pool assignments can happen off-line, for example during transport of a batch of specimens from a collection site to a testing facility. In that respect, the online approach trades potential global suboptimality for simplicity and appeal to laboratory management.

The exact implementation of the online pooling approach may need to meet specific operational constraints to be practical. For example, some laboratories may only be capable of testing in pools up to a fixed maximum size. Such constraints can be naturally incorporated into modified online pooling algorithms. More sophisticated versions of the algorithm are easy to imagine as well. For example, filling multiple pools in parallel can be accommodated in a straightforward extension.
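As an illustration of such a constraint, the Python sketch below caps the pool size in a greedy one-pass assignment. It is a hypothetical variant, not the author's modified algorithm. It assumes the decision rule "add the specimen while the running capacity gain does not drop", treats a pool of one as a single individual test, and uses the illustrative parameter name `max_size`.

```python
from typing import List


def _gain(size: int, pool_prob: float) -> float:
    """Capacity gain k / (1 + k * P_k); a pool of one counts as one individual test."""
    return 1.0 if size == 1 else size / (1.0 + size * pool_prob)


def capped_online_pools(risks: List[float], max_size: int) -> List[List[int]]:
    """Greedy one-pass pooling that never lets a pool exceed max_size specimens."""
    pools: List[List[int]] = []
    current: List[int] = []
    prob = 0.0  # probability that the pool being formed tests positive
    for i, p in enumerate(risks):
        if current and len(current) < max_size:
            new_prob = prob * (1.0 - p) + p  # recurrence for the pool positivity
            if _gain(len(current) + 1, new_prob) >= _gain(len(current), prob):
                current.append(i)
                prob = new_prob
                continue
        if current:
            pools.append(current)  # pool is full or adding would lower the gain
        current, prob = [i], p
    if current:
        pools.append(current)
    return pools


if __name__ == "__main__":
    # Twelve low-risk specimens on an instrument limited to pools of four.
    print([len(pool) for pool in capped_online_pools([0.01] * 12, max_size=4)])
```

With a uniform risk of 0.01 the unconstrained rule would keep filling well beyond four specimens, so the cap is the binding constraint here and every pool is closed at four.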

Institutional implementation of any pooled testing approach that utilizes individual-level risks requires solutions to many informatics ecosystem problems. First, the models providing individual-level risk predictions need to be updated frequently to account for changes in prevalence by risk factors, and other contributors to model drift. In practice, a nightly model fit update may be feasible and necessary. Second, the model predictions need to be triggered at an appropriate time between specimen collection and the arrival of the specimen in the laboratory for testing. Third, pooling recommendations have to be either pre-computed or involve only lightweight computations. In either case, these recommendations have to be easily available in the laboratory information system. Combined, these challenges point to a likely requirement of close intra-institutional cooperation between laboratory medicine, analytics, data science, and informatics operations.

The testing capacity increases from pooled testing rely on the quality of the predictive models. When predictors are not available, population prevalence rates can be input into the online algorithm, and the resulting two-step groupings will be equivalent to optimal pooling into pools of fixed size. In this paper, the evaluation of the algorithm involved a clearly suboptimal set of predictors and risk prediction approach (multivariable logistic regression). Nonetheless, the online algorithm provides an improvement over simple two-step pooling. Better predictive models are expected to result in even larger capacity gains. Improved model predictivity could result from employing data that are readily available or computable at the time of diagnostic specimen collection. For example, many of the data elements from the Health and Human Services guidance on laboratory reporting10 could be included. Other structured data, such as questionnaires collecting evidence and degree of exposure, and telehealth-derived variables, can prove useful as well. Likewise, unstructured text data processed by natural language processing techniques can be useful.7 Further, more sophisticated machine learning and artificial intelligence approaches may be used to combine all of the available data sources for superior risk estimates in online pooling recommendations.

Conclusions

The results reported herein are immediately translatable to laboratory medicine operations. Even without sophisticated predictive models, given the current state of the pandemic, the online pooling algorithm can double the COVID-19 testing capacity. Better risk prediction models may result in even better capacity improvements. In the longer term, similar strategies can be used for implementation of massive-scale testing for other diseases.

how to cite this article
Alekseyenko AV. Online algorithm for assignment of specimens to pooled or individual testing using risk models provides a practical way to increase testing capacity [version 1; peer review: 3 approved with reservations]. F1000Research 2023, 12:85 (https://doi.org/10.12688/f1000research.126285.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Reviewer Report 11 Sep 2024
F. M. Javed Mehedi Shamrat, Computer System and Technology, University of Malaya, Kuala Lumpur, Malaysia 
Approved with Reservations
The paper is effectively written and presents the novelty in a proficient manner.
  1. The clarity of the methodological approach requires further elaboration.
     
  2. It is suggested that the author incorporate more statistical
... Continue reading
HOW TO CITE THIS REPORT
Mehedi Shamrat FMJ. Reviewer Report For: Online algorithm for assignment of specimens to pooled or individual testing using risk models provides a practical way to increase testing capacity [version 1; peer review: 3 approved with reservations]. F1000Research 2023, 12:85 (https://doi.org/10.5256/f1000research.138681.r176925)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 12 Sep 2024
    Alexander Alekseyenko, Biomedical informatics Center, Department of Public Health Sciences, Medical University of South Carolina, Charleston, 29403, USA
    12 Sep 2024
    Author Response
    Thank you for taking the time to submit these points. I am happy to respond to any specific concerns. As they stand your comments are too general to be actionable. I ... Continue reading
Reviewer Report 02 Nov 2023
Md S. Warasi, Radford University, Radford, Virginia, USA 
Approved with Reservations
The author introduces a pooled testing algorithm, referred to as the 'online algorithm,' with the goal of enhancing disease screening efficiency and increasing testing capacity. The specimen pooling strategy and determination of the optimal pool size are clearly described, and ... Continue reading
HOW TO CITE THIS REPORT
Warasi MS. Reviewer Report For: Online algorithm for assignment of specimens to pooled or individual testing using risk models provides a practical way to increase testing capacity [version 1; peer review: 3 approved with reservations]. F1000Research 2023, 12:85 (https://doi.org/10.5256/f1000research.138681.r206487)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 01 Nov 2023
Barathidasan R., ICMR-(NARFBR) National Animal Resource Facility for Biomedical Research, Hyderabad, India 
Approved with Reservations
Summary: This article describes an online algorithm developed by the author for assigning samples to different pools based on the risk data available from patient health records. The algorithm is back-tested on patient samples collected for detecting COVID-19. Improvement in ... Continue reading
HOW TO CITE THIS REPORT
R. B. Reviewer Report For: Online algorithm for assignment of specimens to pooled or individual testing using risk models provides a practical way to increase testing capacity [version 1; peer review: 3 approved with reservations]. F1000Research 2023, 12:85 (https://doi.org/10.5256/f1000research.138681.r206489)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 06 Sep 2024
    Alexander Alekseyenko, Biomedical informatics Center, Department of Public Health Sciences, Medical University of South Carolina, Charleston, 29403, USA
    06 Sep 2024
    Author Response
    Thank you for taking the time to review the manuscript. I am encouraged to see that you recognize the potential of improving pooled testing via our method. As we state ... Continue reading