Evaluation of predicted Medfly (Ceratitis capitata) quarantine length in the United States utilizing degree-day and agent-based models [version 2; peer review: 3 approved]

Invasions by pest insects pose a significant threat to agriculture worldwide. In the case of Ceratitis capitata incursions on the US mainland, where it is not officially established, repeated detections are followed by quarantines and treatments to eliminate the invading population. However, it is difficult to accurately set quarantine duration because non-detection may not mean the pest is eliminated. Most programs extend quarantine lengths past the last fly detection by calculating the amount of time required for 3 generations to elapse under a thermal unit accumulation development model (“degree day”). A newer approach is to use an Agent-Based Simulation (ABS) to explicitly simulate population demographics and elimination. Here, predicted quarantine lengths for 11 sites in the continental United States are evaluated using both approaches. Results indicate a strong seasonality in quarantine length, with longer predictions in the second half of the year compared with the first; this pattern is more extreme in degree day predictions compared with ABS. Geographically, quarantine lengths increased with latitude, though this was less pronounced under the ABS. Variation in quarantine lengths for particular times and places was dramatically larger for degree day than ABS, generally spiking in the middle of the year for degree day and peaking in the second half of the year for ABS. Analysis of 34 C. capitata quarantines from 1975 to 2017 in California shows that, for all but two, quarantines were started in the second half of the year, when degree day quarantine lengths are longest and have the highest uncertainty.

I thoroughly support the intent of this paper: to bring additional accuracy and rigour to biosecurity decision making. Specifically, the question of “when is eradication achieved” is an interesting and important one. It is not easy to address, however, but this paper makes an important contribution.

This paper describes a well-designed comparison between an existing “rule of thumb” approach and a more biologically savvy modelling approach. The strengths and weaknesses of each method are alluded to and the reasons for misalignment explored. The authors suggest a balanced approach to utilise the strengths and minimise the weaknesses of each. On the whole, I support the study and its findings, with two caveats. agent-based Critical information on the temperature relationships simulated in computer I am loathe search source find it. Without these details I cannot make a confident

The article explains the significance of using two different simulation models to calculate quarantine length after repeated pest invasions, particularly by Ceratitis capitata. Quarantine length is a key issue: non-detection doesn’t mean the pest invasion has been eliminated. Therefore, the authors selected eleven sites in the USA to compare quarantine lengths estimated with the usual simulation model, called the degree-day model, and a newer one, called Agent-Based Simulation (ABS). The first approach calculates quarantine lengths from the time needed for 3 pest generations to elapse under a thermal accumulation model. The second approach calculates the lengths by simulating population demographics and elimination. I think it’s a very interesting paper showing how important the selected approach is in determining quarantine lengths. The comparison of both approaches gives some useful information about which approach could be better suited depending on season, latitude, longitude, etc. When both methods are combined, managers can select the best aspects of each one to optimize quarantine lengths. As the authors remark at the end of the Discussion, this combination provides vital planning information to managers and affected parties.


Introduction
Invasions by insects, pathogens and pests are increasingly a defining challenge of the 21st century, facilitated by global connectivity, climatic shifts, and other factors 1,2 , with a particularly severe impact on agriculture 3 . Invasions by insects that do not become established have a lower public profile than those that are "successful" from the point of view of the insect. However, there is a greater chance that cases of invasion followed by elimination will be detected and studied when the invading species is of environmental, human health, or economic concern 4 . Eradicating local populations of such insects can be desirable and feasible 5,6 depending on several factors.
One factor determining the feasibility of elimination is whether the new environment is only marginally or seasonally suitable to the invading insect, facilitating its eradication. Another is whether the high cost of allowing establishment leads to extensive efforts for eradication. The invasion of the malaria mosquito species Anopheles gambiae into Northeastern Brazil in the 1930s 7 is one example of an invasive insect that was successfully eradicated primarily due to the second of these factors 8,9 . In the case of An. gambiae there have been no reports of reinvasion, but there are examples of insects that recurrently invade areas outside their native range and are recurrently eliminated within relatively few generations. The Gypsy moth Lymantria dispar in Canada 10 is one such species. Arguably, another example is the screwworm Cochliomyia hominivorax along the current northernmost edge of its range in Panama 11 and more recently in Florida 12 .
One of the most important instances of repeated invasion and elimination by an economically important insect pest is that of the Mediterranean fruit fly Ceratitis capitata (Wiedemann) (Medfly) in California. The last four decades have seen a repeated pattern of invasion, detection, and response interspersed by periods of no detections 13,14 . While it has been suggested that this pattern is the result of cryptic establishment 15 , the majority view is that Medfly in California is an example of a "metainvasion", consisting of multiple sequential or overlapping introductions 16 and repeated eradication 17 . Still other researchers point to the possibility of different situations in different regions of the state 18,19 . Medfly is occasionally found in other parts of the mainland US such as Florida 20 , and in other countries or areas that are considered free of the pest including Eastern Australia, Mexico and Chile 21 .
The response plan to Medfly in California and the other "free" regions mentioned above is extensive and costly, including a quarantine when detections exceed an established standard (more than a male or unmated female fly is detected) 22 . Perfect pest surveillance efforts could determine exactly when eradication has been achieved. However, actual surveillance has a density threshold below which it is increasingly probable that a population is undetected. A practical and important problem is how long to maintain the countermeasures and quarantine after flies are no longer detected. Predicting the likely duration of this 'post last detection' quarantine period (hereafter just called quarantine length) would help with management decision-making and planning, and could allow potential cost savings by having sufficient but not excessive resources available.
Currently, most programs extend quarantine periods past when the last fly is found, by calculating the amount of time required for a given number of generations (usually but not always three) to elapse under a thermal unit accumulation ("degree day") physiological development model. Degree day based quarantine lengths have been codified in some legal regulations, including United States Federal code 23 , California 24 , and Florida. However, the procedure prescribed only defines when the end of a quarantine period has been reached after the fact. Additionally, the efficacy of pest surveillance efforts should factor into quarantine length, but that is beyond the scope of this paper.
For planning and resource allocation, policy makers and managers typically attempt to predict the quarantine lengths by using normal temperatures for forward projection. Although it frequently works fairly well, this approach is mathematically flawed and also provides no indication as to the variance or uncertainty of those predictions. Even a more rigorous treatment of degree day based values from historical temperature data can still produce highly variable results depending on relatively small changes in temperatures or details of the model formulation 25 , in addition to neglecting important aspects of the biology.
Recently, another approach to determining effective quarantine durations against Medfly via Agent-Based Simulations (ABS) 26 was introduced. The MED-FOES system simulates a population of individual Medflies under inundative sterile insect technique (SIT) and other controls, explicitly modeling elimination as opposed to the degree day approach, which only determines the time for a specific number of generations to elapse to estimate quarantine duration. MED-FOES also allows for the sampling of parameter space (temperature dependent mortality for each stage, fecundity, etc.), producing a distribution of possible outcomes. While an ABS can be arbitrarily complex, MED-FOES is parameterized in such a way that it can model a 'typical' or hypothetical outbreak from only hourly temperature data, and is therefore similar to degree day methods in its input data requirements. It is also possible to vary the initial population to model a specific outbreak.

Amendments from Version 1
In response to referee suggestions, we have:
• Corrected an important typo in the degree-day model base temperature parameter.
• Added a new supplementary table (Table S1) reporting the parameters used by the MED-FOES simulation, including developmental parameter ranges. The old Table S1 has been renamed Table S2.
• Expanded and clarified the description of MED-FOES parameters as well as the meaning of the threshold used to define ABS PQL.
• Updated the Introduction to mention the importance of surveillance, but clarified that it is beyond the scope of this paper.
• Expanded the explanation of why the sites used here were selected.
• Added a few new relevant references.
• Clarified some of the prose in the Discussion.
• Fixed the typos the reviewers helpfully pointed out.

In this paper, predicted quarantine length (PQL) values for 11 sites in the continental United States were analyzed (Figure 1 and Table 1) based on both the standard thermal accumulation degree day method 27 and the MED-FOES ABS 28 . Seasonal variation dominates quarantine duration, so we aggregated the PQL values for each day of the year (Jan. 1, Jan. 2, etc.) across a large number of years (65 for most locations) to produce normals. This approach enables comparison of the standard degree day method to the ABS, but more importantly provides insight into seasonal and spatial variations, prediction uncertainties, and model reliability.

Methods
Sites and temperature data
Hourly air temperature data for 11 sites was downloaded from NOAA's publicly available Integrated Surface Database (ISD) dataset 29,30 .
The airport sites shown in Figure 1 were chosen for their biological relevance and availability of high quality hourly data over a long time frame. Models indicate that these sites are in regions suitable for Medfly 31,32 . Many of the sites experienced outbreaks in their vicinity in the past and are of current concern. Additionally, they cover a range of latitudes, and the California sites vary from coastal to more arid inland locations.
Sites are referred to here by the last three letters of the callsign shown in Table 1. Data was fetched and parsed using the Fetching and parsing ISH.ipynb * program. Records for the same station callsign were merged, since identification, format, and precise location of stations has changed over time. The data was then cleaned using the Cleaning temperatures.ipynb * program by removing outliers, identifying large gaps (> 3 hours), resampling to every hour on the hour using linear interpolation, and filling the large gaps using day-over-day linear interpolation (interpolating using values for the same hour of day from previous and following days). The resulting temperature datasets are available * .
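The gap-filling step can be illustrated with a minimal Python sketch. It is not the code from Cleaning temperatures.ipynb; it assumes the series has already been resampled to one value per hour with missing hours marked as None, and the function names are hypothetical. Gaps of up to 3 hours are filled by linear interpolation in time, while longer gaps are filled from the same hour of day on the nearest previous and following days.

```python
def _day_over_day(temps, k):
    """Interpolate hour k from the same hour of day on neighbouring days."""
    n = len(temps)
    before = k - 24
    while before >= 0 and temps[before] is None:
        before -= 24
    after = k + 24
    while after < n and temps[after] is None:
        after += 24
    has_before, has_after = before >= 0, after < n
    if has_before and has_after:
        frac = (k - before) / (after - before)
        return temps[before] + frac * (temps[after] - temps[before])
    if has_before:
        return temps[before]   # no later day available: carry forward
    if has_after:
        return temps[after]    # no earlier day available: carry backward
    return None                # nothing to interpolate from


def fill_gaps(temps, max_small_gap=3):
    """Fill None entries in an hourly series (assumption: one value per hour)."""
    filled = list(temps)
    n = len(temps)
    i = 0
    while i < n:
        if temps[i] is not None:
            i += 1
            continue
        j = i
        while j < n and temps[j] is None:
            j += 1                      # the gap covers indices i .. j-1
        if (j - i) <= max_small_gap and i > 0 and j < n:
            left, right = temps[i - 1], temps[j]
            for k in range(i, j):       # small gap: straight line in time
                frac = (k - (i - 1)) / (j - (i - 1))
                filled[k] = left + frac * (right - left)
        else:
            for k in range(i, j):       # large gap: same hour, adjacent days
                filled[k] = _day_over_day(temps, k)
        i = j
    return filled
```

Filling large gaps from the same hour of adjacent days preserves the daily temperature cycle, which a straight line across a multi-hour gap would flatten.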

Degree-day calculation
Degree-days were computed by the single-sine method 27 , using a base development temperature of 12.39°C (54.3°F) and 345.56 degree-days Celsius (DDc; 622 DDf) per generation 24,33 . Since hourly temperature data are available, we also calculated degree-days by simple summation for comparison 25 . For each date, the number of days required for 3 generations of degree-day based life cycles was computed. These calculations are implemented in Temperature functions.ipynb * .
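As an illustration of the simple-summation variant only (not the single-sine method, and not the code in Temperature functions.ipynb), the following Python sketch accumulates degree-days from hourly temperatures until three generations have elapsed. The base temperature and degree-days per generation are the values stated above; the function name, the absence of an upper temperature cutoff, and the hourly weighting of 1/24 day are assumptions.

```python
BASE_TEMP_C = 12.39          # base development temperature (from the text)
DD_PER_GENERATION = 345.56   # degree-days Celsius per generation (from the text)


def days_for_generations(hourly_temps_c, start_hour=0, generations=3):
    """Days from start_hour until the degree-day target is reached.

    Simple summation: each hour above the base contributes
    (temperature - base) / 24 degree-days. Returns None if the series
    ends before the target is reached. No upper threshold is applied
    (an assumption of this sketch).
    """
    target = generations * DD_PER_GENERATION
    accumulated = 0.0
    for offset, temp in enumerate(hourly_temps_c[start_hour:], start=1):
        accumulated += max(temp - BASE_TEMP_C, 0.0) / 24.0
        if accumulated >= target:
            return offset / 24.0
    return None
```

At a constant 22.39°C, for example, 10 degree-days accumulate per day, so three generations (1036.68 DDc) take just under 104 days.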

Agent-based simulations: MED-FOES
MED-FOES 26,28 is an agent-based simulation explicitly modeling the eradication of a population of Medflies under inundative sterile male releases (sterile insect technique or SIT) and other interventions, such as increased trapping and foliar sprays. A MED-FOES simulation models a single nonspatial population, starting from a given population size and age distribution, tracking the number of individuals through time until the last fly (Agent) dies and the population is eliminated. In addition to hourly temperatures, simulation parameters include: the initial population, additional mortality induced by control efforts, the effectiveness of SIT, and a large number of biological parameters for which ranges are known from the literature including temperature-dependent development and mortality. The simulations were performed using the same hourly time series of temperature values used for degree-day calculations.
Because many of the parameters are only known to within a range, 2500 individual MED-FOES simulations were run for each start date at each site, evenly sampling different regions of parameter-space via the Latin Hypercube Sampling (LHS) 34 procedure. This set of simulations, encompassing a range of possible elimination outcomes, is referred to as a 'run'. For example, each run includes simulations with the initial number of adult females in the population ranging from 33 to 100, but the initial population age distribution was the same for all simulations. Initial population numbers were chosen as a "standard outbreak" based on seven real outbreaks modeled previously 26 . LHS ranges for the probability of loss of reproduction due to inundative SIT releases (0.5 to 1 chance per day) and additional human induced mortality from control efforts (0.05 to 0.15 per day) were chosen based on estimates of a typical California intervention 26 . The full list of parameters used and their values is provided in Supplementary Table S1. The number of days from the start date required for 95% of the simulations in a run to be eliminated is taken as a conservative prediction of needed quarantine length and referred to as ABS PQL.
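For readers unfamiliar with Latin Hypercube Sampling, a minimal Python sketch follows. It is a generic illustration, not the sampler used by MED-FOES: each parameter range is cut into as many equal-width strata as there are simulations, one value is drawn from each stratum, and the strata are shuffled independently per parameter. The three ranges are those quoted above; the parameter names, the seed, and treating the initial female count as continuous are assumptions made for the example.

```python
import random


def latin_hypercube(ranges, n_samples, seed=0):
    """Return n_samples parameter dicts covering each range evenly.

    ranges: dict mapping parameter name -> (low, high).
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = high - low
        # one uniform draw inside each of n_samples equal-width strata
        points = [low + width * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        rng.shuffle(points)  # decouple the strata order between parameters
        columns[name] = points
    return [{name: columns[name][i] for name in ranges}
            for i in range(n_samples)]


# Hypothetical parameter names; ranges as quoted in the text.
example_ranges = {
    "initial_adult_females": (33, 100),
    "sit_reproduction_loss_per_day": (0.5, 1.0),
    "control_mortality_per_day": (0.05, 0.15),
}
samples = latin_hypercube(example_ranges, n_samples=2500)
```

Compared with plain random sampling, this guarantees that every slice of every parameter range is represented exactly once among the 2500 simulations of a run.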
It is important to note that the 95% threshold for ABS PQL does not mean that there is a 95% chance a given outbreak will be eliminated. Instead, it refers to 95% of the LHS sampled points in parameter space reaching eradication by a given time. Although most of those parameters are only known to within a range, extreme values are almost certainly less probable than mid-range values, and combinations of extreme values that lead to long eradication times (for example, low mortalities and high fecundity) are even less likely to occur as frequently as the uniform sampling of the LHS procedure produces them. Therefore, the 95% threshold used here is expected to be quite conservative.
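The threshold definition can be made concrete with a short Python sketch. It assumes each simulation in a run reports the day (counted from the start date) on which its population was eliminated; the function name and the integer-percent argument are hypothetical choices for the example.

```python
def abs_pql(elimination_days, percent=95):
    """Smallest day by which at least `percent` % of simulations are eliminated.

    elimination_days: one elimination day per simulation in the run.
    Integer arithmetic is used for the count to avoid floating-point
    rounding at the threshold.
    """
    ordered = sorted(elimination_days)
    needed = (percent * len(ordered) + 99) // 100   # ceil(percent * n / 100)
    return ordered[needed - 1]
```

With 2500 simulations per run, this returns the elimination day of the 2375th simulation when ordered from fastest to slowest. It is a quantile over sampled parameter points, not a probability statement about any particular outbreak.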
Varying the start date for different simulations was achieved by simply starting at different points in the input temperature file; for this study a run was started every 7 days over the range of dates available for each site. Each set of runs for a single site over a range of starting dates is referred to as a 'runset'. All runsets were conducted with the same input parameters aside from temperature. The 7 day interval ABS PQL values were upsampled to daily values using linear interpolation to allow day-of-year aggregations across years and comparisons with daily degree day based PQLs.
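The upsampling of 7-day-interval values to daily values described above amounts to linear interpolation between consecutive runs. A minimal Python sketch, with a hypothetical function name, is:

```python
def upsample_to_daily(weekly_values, step_days=7):
    """Linearly interpolate values spaced step_days apart to daily values."""
    if not weekly_values:
        return []
    daily = []
    for left, right in zip(weekly_values, weekly_values[1:]):
        for d in range(step_days):
            daily.append(left + (right - left) * d / step_days)
    daily.append(weekly_values[-1])   # keep the final sampled value
    return daily
```

The daily series passes exactly through the original weekly values, so no simulated PQL is altered; only the days in between are filled in.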
MED-FOES version 0.6.2 was run under Open Grid Scheduler/Grid Engine 2011.11 on a CentOS 6.6 HPC cluster. The MED-FOES code, configuration files, helper scripts, and raw results are available * . Overall, we created 11 runsets (one for each site). Each runset contained runs starting every 7 days over the input temperature data range for that site, and each run contained 2500 individual simulations sampling different regions of biologically plausible parameter space. This sums to a total of approximately 86×10^6 simulations.

Statistical analysis
The main results reported here are 'normals' in a meteorological sense of the term, but without the typical running mean smoothing which would complicate interpretation. For a variable of interest (e.g. temperature or PQL), all values for the same calendar day irrespective of year (e.g. 20-July) are aggregated, and summary statistics such as mean, minimum, maximum, and standard deviation are computed for each aggregation. The results reported here are the normals of PQL, computed using the full temperature time series as opposed to computing PQL from the normal of the temperature time series. While the latter is fairly common practice, it is not mathematically proper since, as with means, the normal of a function of X is not generally equal to the function applied to the normal of X. Additionally, by computing the normals of the predicted quarantine durations, we can investigate properties of the distribution of values as shown in Figure 3.

Figure 2 shows the mean of the normal PQL based on 3 generation degree day accumulation and MED-FOES 95% elimination along with the minimum and maximum of the normals for temperatures. Figure 3 and Figure 4 show the standard deviations (σ) of the normals for the degree day and ABS based PQL.

There is significant variation in PQL across both time and location. The temporal variation in PQL is dominated by a yearly cycle, characterized by the normal values shown in Figure 2. Table 2 shows the percentage of variance in quarantine length predictions captured by the mean of the normal yearly cycle (R 2 ) for each site. At all but one site, greater than 75% of the variance in both degree day and ABS based PQLs is accounted for by the mean normal, and the majority exceed 90%. SFO is an exception to this common trend, with the mean normal accounting for only 9.1% of the variation in degree day based PQL and 28.0% of the ABS based PQL. This is also reflected in supplementary figure S2 and supplementary figure S3.
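The day-of-year aggregation used for the normals, and the reason the order of operations matters, can be sketched in a few lines of Python. The function names and the two example temperatures are hypothetical, and the population standard deviation is an arbitrary choice for the illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

BASE_TEMP_C = 12.39  # degree-day base temperature used in this paper


def normals(records):
    """Aggregate ((month, day), value) pairs across years by calendar day."""
    groups = defaultdict(list)
    for calendar_day, value in records:
        groups[calendar_day].append(value)
    return {
        day: {"mean": mean(v), "min": min(v), "max": max(v), "std": pstdev(v)}
        for day, v in groups.items()
    }


def degree_days(temp_c):
    """Degree-days contributed by one day at a constant temperature."""
    return max(temp_c - BASE_TEMP_C, 0.0)


# Hypothetical temperatures for the same calendar day in two different years.
temps = [8.0, 18.0]
function_of_normal = degree_days(mean(temps))              # about 0.61
normal_of_function = mean(degree_days(t) for t in temps)   # about 2.805
```

Because development stops below the base temperature, averaging the temperatures first hides the warm year's contribution, so the two quantities differ. The same nonlinearity is why PQL is computed from the full time series before the normals are taken.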

Results
Seasonal dependence
Seasonal variation, evidenced by the general shape of the curves shown in Figure 2, is doubtless familiar to anyone engaged in Medfly pest management. Outbreaks starting in the late summer, autumn, or early winter will extend through relatively cold periods, when thermal dependent development will be slow and therefore extend the duration of quarantine required for 3 generations of degree days to accumulate (referred to as DD PQL hereafter). Similarly, outbreaks starting in the spring or early summer often lead to short quarantines due to the relatively high temperatures.
This familiar pattern is also seen in the ABS PQLs despite it being quite different in nature from simple degree day accumulation. However, the ABS predictions show a smaller seasonal swing. The ABS generally produces a smaller overall range of PQLs, with longer quarantines than DD PQL for spring and early summer outbreaks, and shorter quarantines for late summer through early winter in almost all cases.
A particular feature of interest, shown most dramatically at FAT in Figure 2, is that ABS PQL often flattens out or even dips for quarantines starting in the late autumn or early winter. This can be due to relatively rare and brief cold-snaps, normally lasting only a few hours, which increase mortality. Since DD PQL does not account for mortality, it misses the effect of cold-snaps entirely. This effect is most clearly seen at more northern and inland sites where cold-snaps are more likely: particularly FAT and RIV, but also BUR, LAX, JAX, and IAH.

Geographic dependence
PQL generally shows a positive correlation with latitude, and sites are ordered by latitude in the figures and tables here. As seen in Figure 2, higher latitude sites tend to have longer PQLs as well as larger seasonal swings for both degree day and ABS based predictions. Figure 5 shows the relationship between PQL and latitude. An ordinary least squares fit to the median PQL at each site shows a significant slope for both DD PQL (F =14.08, p=0.005) and ABS PQL (F =10.55, p=0.010), but the degree day based predictions are more sensitive to latitude than the ABS (coefficients of 17.39 and 4.78 respectively). Additionally, the ABS predictions are more stable for SFO, and to a lesser extent FAT, where the degree day model for Medfly produced PQLs that appear either unrealistically long (SFO) or are subject to rapid and extreme seasonal variation in the mid year (FAT).
In addition to the variation associated with latitude, large differences in PQLs computed for the same start date can exist between even relatively nearby sites. For example, the differences in both degree day and ABS PQLs for the three sites in the Los Angeles region (LAX, BUR, RIV) (shown in supplementary figure S4) display a strong seasonal component with a spike in July and/or August. The difference in DD PQL between LAX and BUR is normally about a month (overall median=35 days; overall 25% & 75% quantiles are 28 & 45 days), but the median difference of the normal exceeds 75 days in August with some PQL differences up to 142 days. Differences in ABS PQLs are more seasonally stable, with the LAX minus BUR difference not exceeding 42 days for any start date in the 43 years analyzed here.

Variance and uncertainty
Figure 3 and Figure 4 report the standard deviation (σ) of the normal for DD PQL and the MED-FOES ABS PQL respectively. These indicate the year to year variability of the PQL for outbreaks starting at a given time of the year and can be used to gauge the uncertainty of predictions based on past PQLs relative to the actual quarantine length which will be required. Similar information is represented by the interquartile ranges shown in Figure 5.
Excluding SFO, the mean normal is a good predictor of DD PQL with σ values below 20 days except for the late summer and early autumn, where variance increases due to quarantines extending through the cold season. FAT and, to a lesser extent, RIV show this increase more dramatically, presumably due to their more arid/inland climates where both daily and seasonal temperature ranges are larger (also see Figure 2). The standard deviation generally decreases with decreasing latitude, together with reduced means. The standard deviation in DD PQL for SFO shows an inversion of the seasonal trend other sites exhibit. This is due to the colder temperatures leading to extremely long DD PQLs, frequently extending across two winter seasons.
The standard deviations of the ABS PQL normals shown in Figure 4 are generally about half as large as for DD PQL. This indicates that the ABS PQL not only shows less dramatic seasonal swings, but also produces more consistent predictions across years. Values again generally decrease with decreasing latitude, but less consistently than the DD PQL σ of normals. Also, unlike with the DD PQL, the results for SFO appear consistent with other sites.
A notable feature is that BUR, LAX, and SAN all show an increase in the year to year variation in ABS PQLs starting in July and extending through November, while that increase for all other sites starts in July or August but extends to January or February. Additionally, results for FAT show a sharp increase in uncertainty starting in September, fitting with the more arid/inland climate. RIV shows a significant but more gradual increase.

Historical quarantines
Thirty-four Medfly quarantines in CA dating from 1975 to early 2017 were analyzed (supplementary table S2). The start of all but two of these quarantines was in the latter half of the year (July through December), when DD PQLs are typically relatively long, with 68% (23/34) occurring in September through October, when DD PQLs are longest. August, the month where uncertainty in DD PQL often spikes (see Figure 3), accounts for 30% (7/34) of historic quarantines.
For each historic quarantine start date, the DD PQL and ABS PQL for the closest of the 11 sites analyzed above (see Figure 1 and Table 1) to the actual outbreak location was determined (see supplementary table S2). For this set of hypothetical quarantines, the ABS produced significantly shorter quarantines (mean=169.7 days, σ=21.8 days) than simple 3 generation degree day accumulation (mean=234.2 days, σ=79.2 days) (df =33, t=6.01, p<10^−5). Additionally, the variance in the difference between quarantine lengths using a specific date and the mean of the normal PQL for that day of year was smaller for the ABS (σ=8.2 days) than with degree day (σ=25.9 days) (df =33, F =9.92, p<10^−8).
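The form of these two comparisons can be sketched in Python. The reported df = 33 for 34 quarantines is consistent with a paired t-test on per-quarantine differences, but that is an assumption of this sketch rather than something stated in the text; the function names are hypothetical and no values from supplementary table S2 are reproduced.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """t statistic and degrees of freedom for paired samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def variance_ratio(sigma_large, sigma_small):
    """F statistic as the ratio of two variances (squared standard deviations)."""
    return (sigma_large / sigma_small) ** 2
```

Using the rounded standard deviations quoted above, variance_ratio(25.9, 8.2) gives roughly 9.98, in line with the reported F = 9.92 once rounding of the σ values is taken into account.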

Discussion
The principal contributions of this work can be broken down into three categories:
1) Comparison of PQLs as determined by the degree day and ABS methods;
2) Variation in average PQLs across time of year and space; and
3) Variation in PQLs within a time of year and location.
Consideration of all three of these by program managers, planners and other decision makers is likely to improve management of Medfly incursions by informing resource allocation ahead of outbreaks, reducing quarantine costs in some cases, and reducing risk from premature quarantine suspension in others. The results presented cover most of the latitudinal range of Medfly suitability within the United States, as well as many sites of probable introduction, and will hopefully find use as a general guide. Eradication models are extremely difficult to test for accuracy given the impracticality of experimental introductions and the sparse and idiosyncratic nature of historic outbreaks. However, analyzing the timing and locations of historic outbreaks suggests that quarantine lengths would generally be more consistent and shorter on average in California if estimated by ABS compared with degree day.
Requiring a fixed number of generations (typically 3) of degree days to pass is a "tried and true" method, but not explicitly an extirpation model. It may overestimate required quarantine length through cold weather 26 and may underestimate length when growth conditions are very favorable, which somewhat paradoxically leads to shorter degree day based quarantine periods after the last fly detection since generation times are shorter. However, the simplicity of the degree day calculation is a point in its favor, together with its record of generally avoiding subsequent detections after eradication measures and quarantine establishment 21 .
ABS results may be used to inform and modulate responses and treatments such as delimitation trapping, fruit sampling, and eradication measures which are under some discretion of managers. In situations where DD PQLs greatly exceed those from the ABS, it is likely that degree day is missing important effects, such as cold snaps, which may justify shortening quarantine periods. On the other hand, in cases where the ABS predicts longer times to elimination the degree day indicated quarantine may be unusually short, so treatments and SIT releases should be conducted more aggressively than normal to ensure eradication is achieved within the prescribed degree day based quarantine.
A few specific results arising from overall comparisons of different locations are worth highlighting. In general, DD PQLs for Medfly generated from San Francisco International Airport temperature data are almost certainly too long for the entire year. The ABS PQLs are flatter and seem more realistic at around 200 days for San Francisco compared with the 400-550 days of DD PQLs. For several other California locations (typified by Fresno and Riverside) DD PQLs are in close alignment with those from the ABS for the first half of the year but go significantly longer in the cooler months. For three of the four Florida locations analyzed, DD PQLs are significantly shorter than the ABS results (Miami, Tampa, and Orlando). The extent of the difference in those Florida locations is smaller in the later months of the year, but the generality of this pattern suggests that the margin of safety for quarantines as calculated by degree day in those locations may be smaller than expected.
There is significant variation in PQL depending on the location of the outbreak, with the extremes in our study sites represented by Miami and San Francisco. These geographic results could be compared to previous efforts to model climatic suitability of different parts of the US. One of the early studies on the subject focused on Medfly found higher climatic suitability in Florida locations (Fort Pierce and Orlando) compared with California sites 36 . Within California, however, those authors found a higher number of suitable months in coastal areas such as Oceanside compared with Riverside and Fresno, roughly paralleling our findings (compare Los Angeles or San Diego with Fresno or Riverside). A more recent analysis of climatic suitability likewise concludes that coastal S. California is the most favorable area of the state for Medfly, but favorability drops inland in the south due to desert conditions. Suitability in central and northern California is limited by cold temperatures and freezes 31 .
An important aspect of ABS PQLs is variation within particular times of year and locations. Rare events like cold snaps can increase mortality in the ABS, and thereby lead to shorter PQLs than expected based on historical averages, or DD PQLs. The specificity of the ABS is helpful for determining when quarantines might be safely suspended due to such a rare event not being captured by the degree day model. For cold temperatures especially there can be a significant difference in PQLs: the degree day model includes only development, which is halted at low temperatures, extending quarantine lengths. The ABS, however, also includes mortality for generating PQLs, which means that low temperatures can significantly reduce estimates. Historically in California, quarantines have most frequently occurred at times of year when degree day based quarantines are drawn out by cold weather and the MED-FOES ABS model predicts significantly shorter durations. Furthermore, 30% of those historic quarantines happened in August where there is a great deal of uncertainty in forward predictions of degree day quarantine durations based on normal values. If we assume those historic CA quarantines are a guide, the ABS model would very likely produce more predictable and shorter quarantine durations for future outbreaks.
Combination of the two methods analyzed here could leverage the best aspects of both methods for determining optimal quarantine length. The initial quarantine length estimate could be quickly produced via degree-day calculation or the ABS based on the distribution of PQL values generated using historical temperatures. This would generate not just a single "typical" value as the current method of projecting using historical average/normal temperatures does, but a range of outcomes. The median "most likely" value may be used for official estimates, while the variance and extremes would provide managers and affected parties additional information vital for planning.
Once the three-generation period has started after the last fly find, weekly ABS simulations could indicate the likelihood that the pest has been successfully eliminated. If 95% of simulations show elimination, the decision to end quarantine early could be made; if the ABS has not reached the 95% threshold by the end of the DD PQL, additional measures could be considered to reduce the risk of re-detection.
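The weekly review described above can be sketched as a simple decision rule. This is a hedged illustration only: the input format (one list of per-run elimination days for each review day, with None for runs in which the population persisted) and the function names are assumptions made for this example and do not reflect the actual output of MED-FOES.

```python
def fraction_eliminated(extinction_days, day):
    """Fraction of simulation runs whose population was eliminated by `day`.

    `extinction_days` holds one entry per run: the day of elimination,
    or None if the simulated population persisted.
    """
    if not extinction_days:
        raise ValueError("ensemble is empty")
    eliminated = sum(1 for d in extinction_days if d is not None and d <= day)
    return eliminated / len(extinction_days)


def weekly_review(ensembles_by_day, dd_pql_days, threshold=0.95):
    """Return a (decision, day) pair from weekly ABS ensembles.

    `ensembles_by_day` maps each review day (counted from the last fly
    find) to the ensemble rerun on that day with temperatures observed
    so far. Reviews after the degree-day PQL are ignored.
    """
    for day in sorted(ensembles_by_day):
        if day > dd_pql_days:
            break
        if fraction_eliminated(ensembles_by_day[day], day) >= threshold:
            return ("end_quarantine_early", day)
    return ("consider_additional_measures", dd_pql_days)


# Hypothetical ensembles of 20 runs each, for illustration only.
ensembles = {
    7: [5] * 10 + [30] * 10,   # half of the runs eliminated by day 7
    14: [10] * 19 + [None],    # 19 of 20 runs eliminated by day 14
}
decision = weekly_review(ensembles, dd_pql_days=90)
print(decision)
```

The threshold is a parameter so that a regulator preferring greater or lesser certainty than 95% can adjust it without changing the rule itself.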

Data and software availability
All data, non-standard programs, and scripts used are available in the GitHub repository: https://github.com/travc/paper-Predicted-MF-Quarantine-Length-Data-and-Code, archived at https://doi.org/10.5281/zenodo.1006698. Files are documented in the repository's README, and the analysis scripts (.ipynb files) are viewable online at GitHub. Efforts were made to make the code understandable. It is our intent that someone with a reasonable level of programming knowledge will be able to not only replicate our analysis, but also use portions of the provided code as a basis for their own analysis.

Competing interests
No competing interests were disclosed.

Grant information
This research was funded by USDA-ARS, project number 2040-22430-025-00D, and by the Headquarters Research Associate program (TCC).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Supplementary material
Figure S1: Daily normal of hourly temperatures. Hourly temperature data aggregated by day of year.

John M. Kean
Forage Systems, AgResearch, Christchurch, New Zealand
I thank the authors for addressing my feedback on the first version of the paper. Most of my reservations were adequately addressed. A few grammatical or typographical errors remain (e.g. "metain-vasion", "data" treated as a singular) but these are very minor concerns. There is, however, one point that I think warrants a follow-up comment from the authors.
Fruit fly responses typically involve heightened trapping for monitoring eradication. The 3-generation guideline operates from the time that the last insect is detected. Similarly, the agent-based simulation (ABS) implicitly assumes that no further fruit flies are detected; otherwise this would influence decision-making around the appropriate quarantine period. However, there might be some simulations in which the ABS population becomes large, and these would most likely be those that persist for the longest and have the greatest chance of generating further detections. This might introduce a slight bias in the results, such that the ABS potentially overestimates the quarantine time required since the last detection. This does not at all alter the conclusions from the study, but it does highlight the importance of surveillance trapping in informing eradication decision making.

John M. Kean
Forage Systems, AgResearch, Christchurch, New Zealand
I thoroughly support the intent of this paper: to bring additional accuracy and rigour to biosecurity decision making. Specifically, the question of "when is eradication achieved" is an interesting and important one. It is not easy to address, however, but this paper makes an important contribution.
Biosecurity-aware economies have recognised the need for science-based reform of international practices around quarantines 1. Current practice has largely withstood the test of time, but is based on simplistic assumptions. As this paper points out, better decisions might be made with the better biological data and population dynamic tools now available.
This paper describes a well-designed comparison between an existing "rule of thumb" approach and a more biologically savvy modelling approach. The strengths and weaknesses of each method are alluded to and the reasons for misalignment explored. The authors suggest a balanced approach to utilise the strengths and minimise the weaknesses of each. On the whole, I support the study and its findings, with two caveats.
First, some additional details are needed to fully understand what the agent-based simulations (ABS) are doing. Critical information is lacking on the temperature relationships and simulated management in the model. While this information would be available from the computer code provided, I am loath to search through 423 MB of source code to find it. Without these details I cannot make a confident assessment of the study.
Second, the study highlights the importance of low temperatures in fruit fly phenology, but glosses over some of the biologically relevant complications. More detailed comments on the paper sections are given below.

Introduction
The examples given for eradications are all historical, including the cited review paper 2 , which is not very optimistic about eradication as an effective tool for managing invasions. The science of eradication has advanced considerably in recent years and better understanding of invasive ecology, improved surveillance and control tools, and important advances in understanding and managing social expectations around eradication programmes mean that biosecurity agencies can now conduct such operations with much greater certainty, efficacy and efficiency. I suggest replacing the cited review paper 2 with Liebhold et al. (2016) 3 which gives a more up-to-date review of eradication science.
I note that the quarantine guideline of three generations is not universally used. For example, Australia uses one generation plus thirty days.
I am intrigued by the statement that predicting generation times into the future using normal temperatures is "mathematically flawed" and would like further clarification about what the authors mean.

Methods
It would be very useful to specify the developmental parameters used by the ABS, especially if they differ from those used for the day-degree approach. I note that the day degree parameters used in California differ considerably from other published values, which find a base development temperature of 8 to 10°C 4 .
No details are given about the assumed starting populations and age structures for ABS runs. This seems a critical detail. Also, what were the assumed management conditions? Without management the simulated populations would presumably increase (on average) and the fact that eradication was achieved in the simulations suggests that some sort of management must have been in place. Details of this are critical for understanding what the simulation results mean and how applicable they may be to different cases.
In addition, I feel an important aspect of quarantine has been completely ignored: that of surveillance efficacy. The relevant international guidelines (ISPM 6, 9 and 26) specify that surveillance and monitoring are a key part of any programme aiming to prove freedom from a pest. For fruit flies, surveillance trapping is used to monitor eradications, with the quarantine period applying from the time of the last confirmed detection. Therefore surveillance effort is a critical factor in determining an appropriate quarantine length. If surveillance were perfect then no quarantine period would be needed because we would know that nothing is there. Conversely, if surveillance is poor then a large population might conceivably persist undetected for a very long time, necessitating a very long quarantine period indeed. I assume that the ABS simulated the surveillance practices used in California, but some details are needed. I believe that fruit fly surveillance practices differ between different parts of California; could this explain part of the differences in results between sites (Figure 2)?

Results
The ABS results were tallied as the simulated times after which 95% of simulated populations were eradicated. Depending on the situation, regulators might prefer greater (e.g. 99%) or lesser certainty. It would be useful to see an example of a population survival curve to understand how quarantine duration affects the risk of non-eradication. I suspect that such information would be very useful for biosecurity authorities in their decision-making.
Figure 2 has a spelling mistake: "extripation".

Conclusions
A key point in the paper is the hypothesis that cold snaps may help to knock out fruit fly populations more quickly than expected by the simple day-degree method. The implication is that the ABS method may enable significant cost savings in such situations by allowing a substantially shorter quarantine period. This is a valuable insight, but should be tempered by the fact that fruit flies may have special tricks to enable them to survive cold snaps. The torpor and cold survival thresholds for Queensland fruit fly, for example, are conditioned by previous exposure to low temperatures 5,6, an effect that I suspect is not captured in the ABS. Therefore, I would urge some caution in applying the ABS recommendations in such situations.
Overall, Medfly development and survival at relatively low temperatures seems to be a key factor in setting quarantine periods that start in late summer/autumn, and in understanding the different predictions from the day-degree and ABS methods. However, linear day degree models, as used in both approaches, are known to have questionable validity near the predicted developmental threshold. Also (as noted by the authors) small changes in the threshold temperature parameter might cause relatively large changes in the predicted quarantine times. This is not a particular issue for this study, but is a common problem in the simulation of insect phenology, especially in temperate climates. Given the economic importance of fruit flies, better data and models for behavioural and physiological thresholds could be very useful for making better biosecurity decisions.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes