Likely community transmission of COVID-19 infections between neighboring, persistent hotspots in Ontario, Canada

Introduction: This study aimed to produce community-level geo-spatial mapping of confirmed COVID-19 cases in Ontario, Canada in near real-time to support decision-making. This was accomplished by area-to-area geostatistical analysis, space-time integration, and spatial interpolation of COVID-19-positive individuals. Methods: COVID-19 cases and locations were curated for geostatistical analyses from March 2020 through June 2021, corresponding to the first, second, and third waves of infections. Daily cases were aggregated by designated forward sortation areas (FSAs) and postal codes (PCs) in the municipal regions of Hamilton, Kitchener/Waterloo, London, Ottawa, Toronto, and Windsor/Essex county. Hotspots were identified with area-to-area tests including Getis-Ord Gi*, Global Moran’s I spatial autocorrelation, and Local Moran’s I asymmetric clustering and outlier analyses. Case counts were also interpolated across geographic regions by Empirical Bayesian Kriging, which localizes high concentrations of COVID-19-positive tests independent of FSA or PC boundaries. The Geostatistical Disease Epidemiology Toolbox, freely available software, automates the identification of these regions and produces digital maps for public health professionals to assist in pandemic management of contact tracing and distribution of other resources. Results: This study provided real-time indicators of likely community-level disease transmission through innovative geospatial analyses of COVID-19 incidence data. Municipal and provincial results were validated by comparisons with known outbreaks at long-term care and other high-density residences and on farms. PC-level analyses revealed hotspots at higher geospatial resolution than public reports of FSAs, and often sooner. Results of different tests and kriging were compared to determine consistency among hotspot assignments.
Concurrent or consecutive hotspots in close proximity suggested potential community transmission of COVID-19, based on cluster and outlier analysis of neighboring PCs and on kriging. Results were also stratified by population-based categories (sex, age, and presence/absence of comorbidities). Conclusions: Earlier recognition of hotspots could reduce public health burdens of COVID-19 and expedite contact tracing.


Introduction
Many critical elements of disease epidemiology were unknown during the early stages of the COVID-19 pandemic. These included the case fatality rate, the point during the natural history of the disease at which it is most transmissible (before, during, or after the onset of symptoms), the frequency of asymptomatic presentation among positive cases, and the duration of infectivity. 1 Quarantining, social distancing, and isolation of infected populations were quickly recognized as effective public health measures. Without implementation of these measures in Ontario, COVID-19 patients presenting with severe disease would have exceeded available hospital capacity. 2 Delayed testing and the sheer numbers of infected individuals often precluded comprehensive contact tracing. 3 Indeed, COVID-19 testing was limited in many regions within Canada early in the pandemic, as was the availability of necessary resources (e.g., intensive care units, nursing and other clinical personnel, personal protective equipment). 4,5 Limited resources must be adequately allocated to those locations where the rates of transmission of COVID-19 are high and where the highest numbers of infected individuals reside. This highlights the need for an effective, systematic way of tracking concentrations of this disease after community-level transmission. We address the persistence of geographic disease hotspots as a potential sentinel for efficient distribution of sparse resources.
The SARS-CoV-2 virus spreads by person-to-person contact, especially through respiration. 6 Geographic epidemiology can be a key factor in the prevention of disease spread, as it can be used to monitor disease progression (or regression) and manage allocation of medical resources. Geostatistical methods have been utilized to map communicable diseases such as influenza-like diseases, 7 dengue fever, 8 cholera, 9 legionellosis, and shigellosis, among others. 10 Various geostatistical analyses have been used in the surveillance of COVID-19 spread across the globe, [11][12][13][14][15][16][17][18] often with the geo-spatial software ArcGIS (ESRI). 19 Other available tools for geostatistical epidemiology include SaTScan TM , 20 and GeoDa; 21 however, these were not used in the present study because they require disease background levels as input, which was not relevant for COVID-19. Available case data have typically been aggregated at the municipal 11,16 or county level; 13,14,22 higher-resolution spatial analyses have been limited, as finer granularity for the distribution of COVID-19 cases has often been lacking. 16,19 We performed geostatistical area-to-area analyses as well as interpolation of COVID-19 cases within six municipalities of the Canadian Province of Ontario (Hamilton, Kitchener/Waterloo, London, Ottawa, Toronto, and the Windsor/Essex region) distributed at the level of individual forward sortation areas (FSAs) and postal codes (PCs; a smaller geographic segment within an FSA). Area-to-area analyses were performed on COVID-19 case data at the provincial level by FSA, a geographical unit similar to zip codes in the United States which can encompass hundreds of PCs. Each PC comprises an average of 19 households (Statistics Canada, 2006). Therefore, by evaluating COVID-19 case data at the PC level, we can evaluate disease distributions at a high level of granularity.
The success of short-term COVID-19 geostatistical studies at the county level early in the pandemic 22 suggested that hotspots and localized concentrations of infected individuals could be delineated more precisely with epidemiological data collected with higher granularity. The present analysis evaluates the three major waves of COVID-19 infection to date in Ontario, allowing for the observation of trends and commonalities between waves. We utilized ArcGIS to carry out multiple area-to-area geostatistical tests, including Getis-Ord Gi* hotspot analysis (Figure 1), 23,24 Cluster and Outlier analysis (Figures 2A and 2B; Anselin Local Moran's I 25 ), and Spatial Autocorrelation (Figures 2C and 2D; Global Moran's I 26 ). Hotspots were also inferred from spatially-interpolated counts by kriging and represented as topographic contour maps 27,28 utilizing the ArcGIS geographic information system. Area-to-Area Gi* analysis finds regions with a high disease burden (i.e., model for community-driven spread), but does not inform how these cases are distributed throughout the region. PC-level point-to-point analyses (e.g., kriging) can be employed to provide context of how cases cluster and identify localized hotspots, where cases are concentrated in a single PC and independent of the disease burden of the FSA it is within ( Figure 3). The Getis-Ord Gi* local statistic identifies regions with high disease burden with high, spatially clustered COVID-19 case counts (a local sum) relative to the rest of the Province, 23,24 and is formulated as:
G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\left[ n \sum_{j=1}^{n} w_{i,j}^2 - \left( \sum_{j=1}^{n} w_{i,j} \right)^2 \right] / (n - 1)}} \quad (1)

where i and j are two separate features, w_{i,j} is the spatial weight between these features (described in Figure 1), x_j is the value of feature j (the feature being evaluated in this instance), and n equals the total number of features being considered, while \bar{X} and S equal:

\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n} \quad (2)

S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2} \quad (3)

Space-time Gi* analysis also uses this statistic to find hotspots that are both spatially and temporally clustered with other neighboring features (Figure 1D). Cluster and Outlier analysis identifies high-case regions that are geospatially clustered with other neighboring regions with a similar rate of cases (high-case clusters; Figure 2A), or those with neighboring regions with significantly lower COVID-19 case counts (high-case outliers; Figure 2B). The Global Moran's I index determines if the distribution of cases is clustered, dispersed, or random within a region. 26 The Moran's I index is positive for spatially clustered cases between postal codes, and is compared to the expected value (i.e., if the spatial relationship were random) to compute a z-score and significance level. Kriging is a geostatistical interpolation method which predicts values over a non-sampled region by analyzing a subset of locations with known values. 28 As a combination of geostatistical methods was applied to daily COVID-19 case data at the provincial (FSA) and municipal (PC) levels for >450 days, efficient scripts were necessary to automate these analyses.

REVISED Amendments from Version 1
We thank the reviewers for their insightful suggestions and have incorporated them where possible. The descriptions of the "Preparation of the Ontario COVID-19 case data" and "FSA and PC boundary files" now appear at the beginning of the Methods section. Selection of the Thin Plate Spline semivariogram for interpolation of case counts for Toronto is now justified in the Methods section (Empirical Bayesian Kriging analysis), including Chi-Square statistical testing comparing Power vs. TPS with ground truth hotspot counts. Table 1 contains thumbnails of Postal Code maps for each municipality analyzed and of their locations in Ontario. The Zenodo archive now combines extended, underlying data and software. Table 2 has been moved to this archive (Extended data, Section 1). Results (Wave 1) and Figure S3 describe two additional examples of detecting COVID-19 hotspots before they were reported in the media (line plots based on true and interpolated case counts, with Gi* significance).
Any further responses from the reviewers can be found at the end of the article.

Figure 1. Modeling spatial relationships by area-to-area geostatistical analyses. Case counts aggregated at the centroids of FSAs or PCs over daily or multi-day windows are compared to cases in neighboring FSAs or PCs and evaluated with geostatistical tests. These spatial relationships are defined by modeling spatial interactions between the FSA or PC being evaluated and the neighbors being tested. The panels describing these interactions indicate: (A) The fixed distance band weighs all neighbors within a specified distance equally (weights [w_{i,j}] are equal for features within the circle; for all other features, w_{i,j} = 0), (B) The inverse distance band is based on distance decay, where neighbors further from the target feature are weighted less relative to closer neighbors (w_{i,j} = d_{i,j}^{-β}, where d_{i,j} is the Euclidean distance between two points and β is the inverse distance power variable; β = 1 in this study), (C) K-Nearest Neighbors constructs a spatial relationship which assesses a feature with a specified number of its most proximate neighbors based on centroid distances (here, K = 3, where each K neighbor is weighted equally), (D) Space-Time integration defines feature relationships in terms of both a space (fixed distance) and a time window. Features that are proximate to one another in space and time are analyzed together (equal weights), as relationships are assessed relative to the location and time stamp of the target feature. Area-to-area geostatistical tests used include Gi* analysis, Cluster and Outlier analysis, and Spatial Autocorrelation.

Figure 2. Local Moran's I (also known as Cluster and Outlier analysis) computes a Moran's I value, or spatial autocorrelation, between counts within sets of FSAs or PCs. Global Moran's I evaluates the overall case clustering in a region. Panels shown indicate these relationships in Leamington FSA N8H: (A) A positive local Moran's I index indicates that the regions neighboring N8H 3V5 have similar attributes (a high-case cluster; pink dot). (B) A negative local Moran's I index demonstrates that cases in the region surrounding the target are dissimilar (a high-case outlier in N8H 3V8; red dot). (C) The Global Moran's I (Spatial Autocorrelation) index demonstrates whether cases are clustered, dispersed, or randomly distributed. In this example, cases were significantly clustered from Dec. 21 to 23, 2020 (p = 0.03; the p-value signifies whether the null hypothesis [a random distribution] can be rejected). (D) The case distribution was not significantly clustered on Dec. 2 to 4, 2020 (p = 0.7).

Figure 3. (A) … 18 to 20, 2020) and interpolate different case levels within the region; (B) EBK uses empirical semivariograms which describe the correlations between the distance separating a pair of locations with COVID-19-positive individuals, 'h', and their corresponding semivariance 'γ'. Larger values of γ indicate lower spatial autocorrelation between numbers of cases. A semivariogram model fits a curve through γ vs. distance that predicts γ at unsampled locations between known data points. The 'Power' semivariogram (B, top) is a parabolic mathematical model based on distance, while the Thin Plate Spline semivariogram (B, bottom) is a non-rigid model that passes exactly through bins of paired locations (grouped by distance and direction; blue crosses) while minimizing surface curvature. Red lines represent the median of the distribution of semivariogram models.
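A minimal NumPy sketch of the Gi* computation described above may help make the statistic concrete. The fixed-distance-band weight matrix and case counts below are illustrative toys, not study data; note that Gi* treats each feature as its own neighbor, so the diagonal of the weight matrix is non-zero.

```python
import numpy as np

def getis_ord_gi_star(x, w):
    """Getis-Ord Gi* z-scores for case counts x (n,) and spatial
    weights w (n, n). Gi* includes each feature as its own neighbor,
    so the diagonal of w should carry a self-weight."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    wx = w @ x                       # local weighted sums
    w_sum = w.sum(axis=1)            # sum of weights per feature
    w_sq = (w ** 2).sum(axis=1)      # sum of squared weights
    denom = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return (wx - x_bar * w_sum) / denom

# Toy example: 5 locations on a line, fixed-distance-band weights
# (neighbors within 1 unit, self-weight 1), a spike of cases at index 2.
coords = np.arange(5.0)
d = np.abs(coords[:, None] - coords[None, :])
w = (d <= 1.0).astype(float)         # includes the diagonal (self)
cases = np.array([1.0, 2.0, 30.0, 2.0, 1.0])
z = getis_ord_gi_star(cases, w)
```

The spike location receives the highest (positive) z-score, while locations far from it score negatively, which is exactly how hotspots and cold regions are distinguished in the FSA-level maps.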
These geostatistical tools were used to identify and krige COVID-19 hotspots, clusters, and outliers within Ontario at a high level of granularity. With the results of these analyses, we can ask questions regarding the distribution of hotspots relative to others in both time and space. For example, this study evaluates significant levels of COVID-19 over a series of consecutive days, which we term 'streaks'. This allowed us to determine the true size and persistence of hotspots and to compare and contrast geostatistical analyses across COVID-19 waves, highlighting recurrent hotspots and clusters of PCs with high case counts. We also investigated the order of progression of infection across a region in a statistically robust way. Empirical Bayesian Kriging (EBK) identified concentrations of COVID-19 that transcended the boundaries of FSAs and PCs. Ultimately, consistency between the results obtained by these different approaches, implemented as independent geostatistical tests, reinforces the conclusions that can be drawn from them.

Methods
Preparation of Ontario COVID-19 case data from ICES
Access to anonymized Ontario COVID-19 test results through the ICES Research and Analytic Environment (RAE) was obtained through Applied Health Research Questions (AHRQ) project #P0950.096 approved by the Ontario Ministry of Health and Long-Term Care (MOHLTC). ICES collects health care data for analysis and evaluation of the health system. Ontario-wide COVID-19 data tables maintained in ICES were pre-processed to ensure that only the chronologically first laboratory-confirmed viral RNA polymerase chain reaction (PCR) SARS-CoV-2 test of a specific individual was counted (while repeat infections occur, these are assumed to be significantly less frequent). PCs associated with each positive case were cross-referenced to their respective FSA (derived from the 2016 Canadian Census boundary file from Statistics Canada) to identify and filter out invalid entries. The PCs provided were sourced from the OLIS (Ontario Laboratories Information System) and RPDB (Registered Persons Database) databases. The OLIS postal code was more likely to relate to a current address and was therefore prioritized; however, it is also the most error prone. Reasons for rejection include blank entries, invalid FSAs (e.g., out-of-province FSAs), or intentional masking. The RPDB PC was used if the OLIS PC was rejected. Entries where both PCs were invalid were rare (<0.1% of all unique positive cases). Non-residents and those ineligible for the Ontario Health Insurance Plan (OHIP) are also not included. Cases were separately aggregated by FSAs and PCs, then converted into tables structured as rows (FSA or PC count) by columns (date) as input to ArcGIS Desktop. Cases were based on the date of sampling, rather than when test results were reported, to more accurately reflect the inception of infections.
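The pre-processing rules above (first positive test per individual, OLIS postal code preferred with RPDB fallback, FSA validation, FSA-by-date aggregation) could be sketched with pandas as follows. The column names (`person_id`, `olis_pc`, `rpdb_pc`, etc.) are hypothetical stand-ins, not the actual ICES schema.

```python
import pandas as pd

def valid_pc(pc, known_fsas):
    """A PC is accepted only if present and its first three characters
    (its FSA) appear in the census FSA boundary list."""
    return isinstance(pc, str) and len(pc) == 6 and pc[:3] in known_fsas

def preprocess(tests, known_fsas):
    # Keep only positives, then the chronologically first positive per person.
    pos = tests[tests["result"] == "positive"].sort_values("sample_date")
    first = pos.drop_duplicates(subset="person_id", keep="first").copy()
    # Prefer the OLIS postal code; fall back to RPDB when OLIS is invalid.
    first["pc"] = [
        olis if valid_pc(olis, known_fsas)
        else (rpdb if valid_pc(rpdb, known_fsas) else None)
        for olis, rpdb in zip(first["olis_pc"], first["rpdb_pc"])
    ]
    first = first.dropna(subset=["pc"])
    first["fsa"] = first["pc"].str[:3]
    # Aggregate to an FSA (rows) by sample date (columns) count table.
    return first.pivot_table(index="fsa", columns="sample_date",
                             values="person_id", aggfunc="count", fill_value=0)
```

The resulting table has the rows-by-columns structure described above and can be exported for the ArcGIS Desktop analyses.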
As of June 2021, three separate and distinct waves of COVID-19 cases have been recognized in Ontario (and elsewhere). We delineated the inception and termination dates of each wave from the rates of change in infection rates (cases/day 2 ) using the second derivative of daily case counts. Stochasticity in day-to-day variation due to testing and case reporting caused fluctuations in changes of daily counts, even when averaging cases over multiple consecutive days. Consecutive dates without fluctuations in counts (between -10 and +10 cases/day 2 ) were rare. Therefore, dates were selected where the second derivative remained between -10 and +10 cases/day 2 immediately preceding the rapid increase in cases that defined the inception of a wave, or after infections ceased decreasing in counts, defining the termination of a wave. The first wave (1)

Of the geostatistical methods tested, PC-level counts of infected individuals were the most geospatially granular COVID-19 case data. Many PCs comprise small geographic areas with low population densities. This made them more prone to the effects of day-to-day stochasticity in patient ascertainment and laboratory reporting, especially when case counts were low. To address this, PC-level case data were also aggregated over multi-day sliding windows, effectively smoothing out this source of variation. Although 3- and 7-day windows were initially analyzed, longer windows were not as useful for determining the inception of hotspots, a goal of this analysis, and were eventually discounted. 3-day windows have been used in other geostatistical studies of COVID-19. 13,14

Access to, pre-processing, and geostatistical analysis of COVID-19 daily cases and associated metadata required the RAE computer system at ICES. To ensure patient privacy, ICES stipulated that case counts could not be exported from the ICES system unless FSAs and/or PCs exhibiting between 1 and 5 daily COVID-19 cases were masked.
We modified counts of FSAs meeting these criteria and assessed their impact on the geostatistical analysis. Gi* analysis of ground truth case data was compared with the analysis of two modified FSA-level case datasets (where all instances of between 1 and 5 case counts were either converted to a single count or randomized between 1 and 5). More than one third of the locations that were deemed significant hotspots by Gi* from actual case counts (from March 1 st to September 24 th , 2020) were not flagged when locations were masked and assigned a count of 1; greater than one half were missed when case counts were randomized (Extended data, 29 Section 1 - Table S1). Discordant hotspots identified from actual vs. masked case counts were reduced in FSAs with high-case counts (≥10) but were nevertheless unacceptably frequent (14% or 28% discordant, respectively, by masking low case counts as 1 or by randomization). We concluded that errors introduced by masking count values precluded analysis of masked data and required all work to be carried out within the RAE computing environment. All geostatistical analyses presented in this study were performed against unmasked, true case counts.
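The wave-delineation rule described above (second derivative of daily counts held between -10 and +10 cases/day 2 ) and the multi-day sliding-window aggregation can be sketched as follows; the threshold and window width come from the text, while the input counts are synthetic.

```python
import numpy as np

def second_derivative(daily_counts):
    """Discrete second derivative of daily case counts (cases/day^2)."""
    return np.diff(np.asarray(daily_counts, dtype=float), n=2)

def stable_days(daily_counts, tol=10):
    """Day indices where the second derivative stays within +/- tol:
    candidate inception/termination dates for a wave."""
    d2 = second_derivative(daily_counts)
    # d2[k] is centered on day k + 1, hence the offset.
    return np.where(np.abs(d2) <= tol)[0] + 1

def sliding_window(daily_counts, width=3):
    """Aggregate counts over overlapping multi-day windows to smooth
    day-to-day reporting stochasticity (3-day windows in this study)."""
    c = np.asarray(daily_counts, dtype=float)
    return np.convolve(c, np.ones(width), mode="valid")
```

For example, a flat stretch followed by accelerating growth yields a stable day just before the surge, matching the wave-inception rule.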

FSA and PC boundary files
Shapefiles, which store geospatial data (points, lines, or polygons) with related attribute information, depicting the 2016 census geographic boundaries of FSAs across Canada were obtained from Statistics Canada. PC boundary shapefiles created by DMTI ("DMTI_2019_CMPCS_LocalDeliveryUnitsRegion.shp") were accessed through the Scholar's Geoportal at the University of Western Ontario. These files were used to compute latitude/longitude coordinates of the centroid for each FSA and PC (with the ArcGIS "Calculate Geometry" function), to validate PCs provided for each individual in the COVID-19 test dataset, and to convert spatial interpolation maps (such as those generated by kriging) into machine-readable text files (using the ArcGIS "Intersection" function between the spatial map and each boundary file). The ArcGIS "Split by Attribute" function was used to limit PC boundary files to the province of Ontario. This was necessary, both because data from other provinces were not available, and because of system limitations affecting processing and memory requirements for large geographic regions.
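As an illustration of the centroid computation performed here with ArcGIS's "Calculate Geometry" function, a pure-Python shoelace-formula sketch (not the ArcGIS implementation) for a boundary polygon given as a vertex list:

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as a list of
    (x, y) vertices, via the shoelace formula. This is the quantity
    reported for an FSA or PC boundary polygon."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0   # signed twice-area contribution
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)
```

Centroids computed this way (in projected coordinates) feed the distance-based weighting schemes used by the area-to-area tests.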
Minor discrepancies between the FSA and PC boundary files were identified which had little or no impact on our geostatistical analysis. Nearly 9% of all PCs in Ontario overlapped multiple FSAs. Most of these PCs were present in multiple locations and occurred in rural areas. Approximately 2.7% of PCs intersected with an incorrect FSA, likely the result of asynchronous creation of the corresponding shapefiles. These locations rarely contained many COVID-19 cases, as only 11 PCs had >5 cases over any 3-day span in 2020, including: N9H 0E3 (Windsor, ON), which overlaps with the FSA N9G; N8N 0B3 (Tecumseh, ON), which overlaps the bordering FSA N8R; and M3H 5V9 (Toronto, ON), which intersected with neighboring FSA M3J. These discrepancies might impact the interpolation of cases in FSAs based on kriging analysis. Since the discrepant PCs were typically found on the border of FSAs, these conflicts had minimal effect on kriging accuracy.

In Cluster and Outlier analysis, a positive score indicates that a feature is clustered with other similar high-case count regions, while a negative index indicates that a feature is an outlier. The z-score and p-value indicate the statistical significance of the index value. A false discovery rate (FDR) correction was applied to this and Gi* analysis, which reduces the threshold of significance from p ≤ 0.05 to a value within the 95% confidence interval [C.I.], thereby accounting for multiple testing and spatial dependency. Z-scores are also determined in Gi* analysis, with higher positive scores indicating greater clustering of cases in a region. Space-time Gi* analysis was based on a single-day interval but incorporated temporal trends based on the days preceding and subsequent to each date. We set a minimum threshold of 10 confirmed cases for an FSA to be considered significant by these three approaches, as this threshold was found to provide reliable and conservative detection of known hotspots using the Gi* test.
Over two thirds of FSAs with ≥10 cases between March 1 st and Sept. 24 th , 2020 were found to be significant by Gi* [99% C.I.] using both a fixed and inverse distance metric (Extended data, 29 Section 1 - Figure S1). Spatial relationships were evaluated to determine which distance metric best reflected the relationship between the test location and its surrounding FSAs by fixed distance (all FSAs within a given threshold distance [10km] are equally weighted with the test FSA), inverse distance (FSAs closer to the test FSA are weighted more highly), and by K Nearest Neighbors (KNN; comparing counts in the test FSA and its closest K neighbors).

Geostatistical analysis
PC-level case data (binned over 3-day sliding windows) for each municipality were evaluated by Spatial Autocorrelation (Global Moran's I), Cluster and Outlier analysis, and EBK. Spatial Autocorrelation used FDR correction to account for multiple testing; it was not applied for PC-level Cluster and Outlier analysis due to a limitation of the method in high-case count regions where all locations were deemed non-significant. The KNN distance metric was used to define the spatial relationship between features (with K = 3) for all PCs, including those with zero cases. PCs with ≥6 cases were reported if deemed significant by Cluster and Outlier analysis. This is consistent with case minimums used in other COVID-19 geostatistical studies. 13,14 Kriging generates a topographic contour map of the number of COVID-19 cases interpolated over the evaluated region, as indicated below.
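A minimal NumPy implementation of the Global Moran's I index used for PC-level Spatial Autocorrelation; the binary neighbor weights and case vectors below are illustrative, not study data.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for case counts x (n,) with spatial weights
    w (n, n); the diagonal of w must be zero. Values above the
    expectation E[I] = -1/(n-1) suggest clustering; values below it,
    dispersion."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                 # deviations from the mean count
    n = x.size
    W = w.sum()                      # total weight
    return (n / W) * (z @ w @ z) / (z @ z)

# Adjacent-neighbor weights on a 1-D strip of six postal codes.
w = np.zeros((6, 6))
for i in range(6):
    for j in (i - 1, i + 1):
        if 0 <= j < 6:
            w[i, j] = 1.0

clustered = np.array([10, 9, 8, 1, 0, 1])    # high counts grouped together
dispersed = np.array([10, 0, 10, 0, 10, 0])  # alternating counts
```

Grouped high counts give a positive index (clustering), while the alternating pattern gives a negative one (dispersion), mirroring the clustered/dispersed/random interpretation described above.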
COVID-19 testing data compiled by ICES contained metadata on tested individuals, such as gender, age, and relevant predicate health conditions. To identify geospatially significant cases in these subgroups, case data were stratified by gender (Male: 241,393; Female: 247,152; those without gender information were excluded), by age (≥65 years old: 63,746; <65 years: 424,854), and by presence or absence of at least one chronic health condition (187,242 with at least one condition, including chronic asthma, congestive heart failure, chronic obstructive pulmonary disease, dementia, hypertension, and/or diabetes; the remaining 301,358 patients had none of these conditions). Individuals with these conditions are grouped as an "at-risk population", consistent with provincial terminology. These case subsets were then evaluated with Gi* (FSA-level), Spatial Autocorrelation, Cluster and Outlier analysis, and kriging (PC-level).
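The stratification step might look like the following in pandas; the column names and condition flags are hypothetical stand-ins, not the ICES metadata schema.

```python
import pandas as pd

def stratify(cases, at_risk_conditions):
    """Split case records into the population strata used in the study:
    sex, age (>= 65 vs. < 65), and at-risk vs. not-at-risk, where
    at-risk means having at least one listed chronic condition."""
    strata = {
        "male": cases[cases["sex"] == "M"],
        "female": cases[cases["sex"] == "F"],
        "age_65_plus": cases[cases["age"] >= 65],
        "age_under_65": cases[cases["age"] < 65],
    }
    has_condition = cases[at_risk_conditions].any(axis=1)
    strata["at_risk"] = cases[has_condition]
    strata["not_at_risk"] = cases[~has_condition]
    return strata
```

Each stratum can then be aggregated by FSA or PC and fed through the same Gi*, Moran's I, and kriging pipeline as the full dataset.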
Results from PC-level Cluster and Outlier analysis were further analyzed with a program script to identify significantly, highly clustered PCs over consecutive periods (i.e., termed high-case cluster "streaks"; Local_Morans_Analysis.Recurrent_Clustered_PC_Identifier.pl; see Data Availability section 29 ). We also identified which of these PC streaks occurred in close proximity (Local_Morans_Analysis.Clustered_Streak_Pairing_Program.pl) based on the neighbors of each postal code from PC centroid data (neighbors identified by Ontario_City_Closest_Postal_Code_Identification.pl; see FSA and PC boundary files subsection). Directional network graphs which visualize the interconnectivity of these paired high-case cluster streaks were created in MATLAB using built-in digraph and plot functions, where each edge (the connection between two PCs) represents a single pair of PCs with nearly concomitant or overlapping streaks.
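The streak identification and pairing logic implemented by the Perl scripts named above can be sketched as follows (reimplemented in Python purely for illustration).

```python
def find_streaks(days, min_len=2):
    """Group sorted day indices on which a PC was a significant
    high-case cluster into runs of consecutive days ('streaks'),
    returned as (start_day, end_day) tuples."""
    streaks, run = [], [days[0]]
    for d in days[1:]:
        if d == run[-1] + 1:
            run.append(d)
        else:
            if len(run) >= min_len:
                streaks.append((run[0], run[-1]))
            run = [d]
    if len(run) >= min_len:
        streaks.append((run[0], run[-1]))
    return streaks

def streaks_overlap(a, b, gap=0):
    """True when two streaks are concomitant or within `gap` days:
    the condition for pairing streaks of neighboring postal codes."""
    return a[0] <= b[1] + gap and b[0] <= a[1] + gap
```

Paired streaks of neighboring PCs become the edges of the directional network graphs described above.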
Software automation was required to computationally evaluate daily and windowed case data over 16 months at the geographic resolution of FSAs and PCs across multiple cities with different geostatistical tests. Using the ArcPy package (ESRI), which enables Python programs to access ArcGIS's geoprocessing tools and extensions, program scripts were developed to perform geostatistical analyses across Ontario. These scripts set conditional variables which define the spatial relationship used (fixed, inverse or KNN), the threshold distance (or for KNN, the number of neighbors of the target feature), the distance method (how distances were calculated; set to 'Euclidean'), row standardization (set to 'none'), and whether FDR correction was used (applied to FSA-level Gi* and Cluster and Outlier analysis, and PC-level Spatial Autocorrelation analysis). ArcPy scripts read in case data (in tabular format), iterated sequentially through each date skipping those without positive cases, applied the associated geostatistical tool, and converted the output into text files. Tabular output was subsequently filtered by removing PCs without cases to avoid producing outputs of prohibitive size.
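The batch-processing loop just described can be sketched generically. `analyze` below is a hypothetical stand-in for the ArcPy geoprocessing call (e.g., a hotspot tool invocation), since the actual scripts require an ArcGIS installation; the skip-empty-dates and zero-case filtering behavior follows the text.

```python
def run_daily_analyses(case_table, analyze, min_cases=1):
    """Drive a geostatistical tool over a table of locations-by-dates
    (mirroring the ArcPy batch scripts). `analyze` maps a
    {location: count} dict to {location: score}. Dates without any
    positive cases are skipped, and zero-case locations are filtered
    from the output to keep result files to a manageable size."""
    results = {}
    for date, counts in case_table.items():
        if sum(counts.values()) == 0:
            continue                 # skip dates without positive cases
        scores = analyze(counts)
        results[date] = {loc: s for loc, s in scores.items()
                         if counts[loc] >= min_cases}
    return results
```

In the real pipeline, `analyze` would wrap the ArcPy tool call with the spatial-relationship, distance, standardization, and FDR parameters listed above.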
These processes were then integrated and streamlined in a software package named the Geostatistical Epidemiology Toolbox (see Data Availability section 29 ), which is accessed through the ArcMap graphical user interface. This resource provides the same geostatistical operations and map imaging developed for this project in a simplified, user-friendly environment. This toolbox bundles manual pre-processing, analysis, and post-analytical steps into a set of linked, uninterrupted workflows. While the toolbox was designed to evaluate COVID-19 case data in Ontario, the software can be easily modified to accommodate other Canadian provinces and, with additional effort, other countries (see Geostatistical Epidemiology Toolbox user manual 29 ). To enable kriging functionality, the Advanced Geostatistical Analyst (ESRI) extension and the PC and FSA boundary shapefiles are required. These shapefiles are also required for PC-level Local Moran's I and Global Moran's I analyses. This package is not bundled with any of the program scripts developed to evaluate geostatistical output (besides data imaging), which are also provided in the accompanying Zenodo archive. 29

Empirical Bayesian Kriging analysis
Kriging estimates case counts in unsampled regions by interpolation between counts at known locations. These values are spatially correlated with the degree of separation from centroids of PCs or FSAs with known cases. A topographic contour map is generated which illustrates the interpolated case levels within the analyzed region. The spatial distribution of the disease outbreak was determined for Hamilton, Kitchener/Waterloo, London, Ottawa, Toronto, and Windsor/Essex county by kriging analysis of PC-level COVID-19 case data over consecutive, overlapping 3-day windows. Kriging was also performed for other municipalities bordering Toronto (Ajax, Brampton, Markham, Mississauga, Pickering, Richmond Hill and Vaughan; see results in Extended data, 29 Section 3).
Only PCs with ≥1 case were included in this analysis, as locations without confirmed cases greatly outnumbered PCs with cases (≥95% of PCs on any given date evaluated) leading to the severe depression of kriging signals. Several types of kriging (Ordinary, Universal, Simple and EBK) were evaluated for accurate detection of high case clusters in municipalities with high and moderate population densities (Toronto and London, respectively). Contour maps generated by EBK best matched regions of known cases used for analysis. Ordinary and Universal kriging yielded similar results for locations with high case counts (≥50) but lacked the sensitivity of EBK in regions with moderate cases. Simple kriging failed to define most contours correctly, regardless of case count. Therefore, EBK was chosen for all kriging analyses presented in this study.
Semivariograms describe how the distances separating COVID-19-positive individuals and the semivariance of counts at these locations are correlated. The best model that fits this relationship using empirically defined distance data was used to predict COVID-19 count levels at unsampled locations between the centroids of FSAs or PCs. Semivariogram models assessed included Power, Linear, Thin Plate Spline (TPS), Whittle, Exponential, and K-Bessel. Kriging parameters for these models were considered, such as sector type, search radius, neighborhood type, and the number of neighbors considered for each location. Through empirical testing, the Power semivariogram model (based on a parabolic relationship to distance between locations) was chosen based on interpolated values that best represented actual case counts in low/moderately dense populations (Extended data, 29 Section 1 - Figure S2A). Kriging accuracy improved by inferring interpolated values from 3 neighboring locations with known counts. However, the Power semivariogram failed to detect some hotspots in Toronto, which exhibited the highest case counts and overall population density. The TPS semivariogram model, a non-rigid representation that passes exactly through points while minimizing surface curvature, performed best in this city (Extended data, 29 Section 1 - Figure S2B). We noticed that the separation between Power-based kriging contours was generally larger, in contrast with TPS-based contours, which were tightly packed, more bifurcated, and more frequent. TPS, however, tended to generate elevated estimated counts that did not correlate with known hotspots. Using 5 neighboring locations for interpolation minimized these artifacts without affecting the contour map.
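An empirical semivariogram of the kind these models are fitted to can be computed directly from point data: pairs of locations are binned by separation distance h, and γ(h) is half the mean squared difference of their values within each bin. A NumPy sketch with synthetic coordinates:

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Empirical semivariance gamma(h) for point case counts.
    coords: (n, 2) locations; values: (n,) counts; bin_edges: distance
    bin boundaries. Returns gamma per bin (NaN for empty bins)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)   # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (d >= lo) & (d < hi)
        gamma.append(0.5 * sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)
```

Rising γ with distance indicates decaying spatial autocorrelation of case counts, the relationship the Power and TPS models approximate.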
However, with 10 neighbors, interpolation accuracy was affected according to a chi-square test (e.g., p < 0.05 for the 103 days in Wave 2).

We have previously demonstrated that circumscribing the boundaries of analyzed regions by specifying these locations with zero counts improves kriging accuracy, especially at locations proximate to boundaries. 30 This approach was used in the present study of interpolated COVID-19 cases for each municipality in Ontario. Municipal borders were derived by combining all PCs associated with the region of interest. The optimal densities and distances of the zero-count buffer circumscribing each municipality were determined by programmatically generating boundaries. Distances from the boundary to the city border were varied (0.1, 0.5, 2, and 5 km), as were densities between locations (0.05, 0.25, 0.5, and 1 km). Kriging analysis was iterated for each of these defined zero-count borders, and the kriging layers generated were compared to the original kriging results. The location of each kriging contour properly coincided with regions of high case counts, regardless of the shape and density of the zero-count boundary locations. Raising the boundary density consistently increased the interpolated COVID-19 case value of a high-case region while simultaneously reducing the total area of the kriging contours around the region, regardless of the semivariogram model used. This effect was less prominent in regions with a higher density of cases (i.e., Toronto), and in regions further from the city boundaries (as the kriging contours exhibited consistent shape and magnitude >6 km away from the city boundary). Considering these observations, we chose a moderate value for each parameter (0.5 km density and 1 km distance from the city boundary). The kriging tool in the Geostatistical Epidemiology Toolbox, however, automatically generates these boundaries 1 km from the selected border with 250 m of distance between each location.
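A simplified sketch of the zero-count boundary generation: here a rectangular bounding box stands in for the true municipal outline (an assumption made for brevity), with points placed at a chosen spacing along a perimeter offset outward by a chosen distance, echoing the toolbox defaults of a 1 km offset and 250 m spacing.

```python
import math

def zero_count_boundary(xmin, ymin, xmax, ymax, distance=1.0, spacing=0.25):
    """Generate zero-count locations on a rectangle offset `distance` km
    outward from a municipality's bounding box, one point every
    `spacing` km along the perimeter."""
    x0, y0 = xmin - distance, ymin - distance
    x1, y1 = xmax + distance, ymax + distance
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    points = []
    for (ax, ay), (bx, by) in zip(corners, corners[1:] + corners[:1]):
        seg = math.hypot(bx - ax, by - ay)
        n = max(1, int(seg // spacing))
        for k in range(n):            # omit each edge's endpoint,
            t = k / n                 # since it starts the next edge
            points.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return points
```

These synthetic zero-count points are appended to the PC-level case points before kriging, anchoring the interpolation at the city's edge.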
Kriging analysis generates a series of shapefiles containing contour maps that define the interpolated level of COVID-19 cases in a region. An ArcPy script intersected map contours derived from kriging with either FSA or PC boundary files, which associated each kriging-derived case count with these locations. The results of these intersections were displayed as distinct contour distributions (or kriging 'symbologies'): one for instances with a wider range of inferred case count levels (0-1, 1-5, 5-10, 10-15 … 35-40, 40-50, >50 cases), and another for low densities of inferred cases, which was discriminated by single counts (0-1, 1-2 … 9-10, >10 cases). In compliance with ICES's reporting criteria, we defined any interpolated value >5 cases (either the ≥5-6 or ≥5-10 contour, depending on the symbology type used) as a region of interest, and displayed PC or FSA designations for only these locations. The intersected area (m²) between contours and the corresponding FSA or PC was determined to compute the size of localized hotspots (e.g., the total area of the FSA or PC considered a region of interest by kriging).
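The area bookkeeping can be illustrated with axis-aligned rectangles standing in for the polygon overlay performed by ArcPy's intersect tool (a deliberate simplification; function names are ours):

```python
def rect_intersection_area(a, b):
    """Area of overlap between two axis-aligned rectangles given as
    (xmin, ymin, xmax, ymax); a stand-in for the polygon overlay
    performed on kriging contours and FSA/PC boundary files."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def hotspot_fraction(fsa, contour):
    """Fraction of an FSA's area covered by a kriging contour,
    i.e., the relative size of the localized hotspot."""
    fsa_area = (fsa[2] - fsa[0]) * (fsa[3] - fsa[1])
    return rect_intersection_area(fsa, contour) / fsa_area
```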

Results
Changes in the distribution of COVID-19 infections over time were evaluated through geostatistical analysis of true (unmasked) case counts within both FSAs and PCs. PC-level analysis can map COVID-19 spread at a finer resolution, which more precisely identifies hotspots concentrated within individual postal codes (e.g., a localized hotspot). Area-to-area geostatistical analysis of cases in FSAs is better suited when COVID-19 cases are distributed across the entire FSA (e.g., an area with high disease burden from community-driven spread; Figure 4), rather than when cases are concentrated in a single location. This method does not, however, provide any information regarding the overall distribution of cases within the FSA. Spatial Autocorrelation analysis (Global Moran's I) was therefore performed to identify FSAs where COVID-19 cases are clustered at the PC level (potential evidence of disease transmission). These geostatistical methods combine analyses of COVID-19 cases at multiple levels of resolution, facilitating interpretation of the case distribution throughout the province.
Significant FSA hotspots were assessed by Gi* through comparison of cases within K-nearest neighbors (KNNs), at a fixed distance, or by inverse distance weighting. During the first wave of COVID-19, when testing was not fully established across the province, FSAs with the minimum number of cases considered for significant hotspots (≥10) were less common, and fewer FSAs met these criteria (78 to 94%; Extended data, 29 Section 1 - Table S2). Later, fixed distance comparisons identified more FSAs with significantly higher disease burden than the other metrics (53% of all FSAs in wave 3, versus 24% for KNN3 and 13% with inverse distance weighting; Extended data, 29 Section 1 - Table S2). We prioritized the standard Gi* analysis using KNN (where K = 3) in this study, as it is moderately stringent (compared to results using the fixed and inverse distance metrics) and is unaffected by the shape of FSAs (which can be quite variable when comparing rural and urban areas).
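A minimal pure-Python sketch of the Gi* statistic with binary KNN weights (K = 3), following the standard Getis-Ord formulation; this is illustrative only, not the ArcGIS implementation used in the study, and it assumes the values are not all identical:

```python
import math

def gi_star(points, values, k=3):
    """Getis-Ord Gi* z-scores using binary k-nearest-neighbor
    weights; each location's own value is included (the '*' form).
    Large positive z-scores indicate hotspots."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar ** 2)
    z = []
    for i in range(n):
        # self plus k nearest neighbors, all with weight 1
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        hood = order[: k + 1]          # includes i itself (distance 0)
        sw = len(hood)                 # sum of weights
        sw2 = sw                       # sum of squared binary weights
        num = sum(values[j] for j in hood) - xbar * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z.append(num / den)
    return z
```

On a toy transect with a central cluster of high counts, the z-score peaks at the cluster and is negative at the low-count ends.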
Space-time Gi* analysis consistently called FSAs with ≥10 cases as significant in waves 2 and 3 (>91% and 95%, respectively; Extended data, 29 Section 1 - Table S2). This approach identified nearly twice as many FSAs as significant compared to standard Gi* comparing neighbors up to the same fixed distance (Extended data, 29 Section 1 - Table S2). FSAs deemed significant hotspots by space-time Gi* were tested against many more neighbors (<10 km) relative to FSAs that were not hotspots (averaging 59 and 17 neighbors, respectively). This revealed a bias in significant FSA hotspots towards smaller, densely organized FSAs (which were typically urban), while those that did not achieve significance were often rural, covering larger areas and exhibiting lower COVID-19 counts. Additionally, 98% of the FSAs flagged were also significant hotspots on the days before and after the central date in each window. This was uncommon in FSAs that were not determined to be significant hotspots (6%). As the vast majority of regions were flagged by this approach (>90%), we did not gain any insight from this space-time analysis. Instead, we evaluated temporal trends in the distribution of COVID-19 cases and identified potential transmission using Cluster and Outlier analysis of PCs that persisted across a consecutive series of dates (referred to as high-case cluster 'streaks'; described later in the Results section).
Comparison of kriging and Cluster and Outlier analysis of PC-level case data revealed that the same statistically significant PCs were often identified by both approaches on the same dates (March 26th to December 28th, 2020; ≥6 cases). A PC was considered significant by Cluster and Outlier analysis if the location was deemed a high-case cluster or outlier with an FDR correction at 95% confidence, or a region of interest by kriging if ≥5 cases were interpolated within it. This comparison found that both approaches were consistent in municipalities with lower overall case counts, such as Hamilton (65.2% concordance; N = 66 PCs found significant by Cluster and Outlier analysis), Kitchener/Waterloo (85.1% concordance; N = 47), London (69.0% concordance; N = 29), and Windsor/Essex (85.0% concordance; N = 160) (Extended data, 29 Section 1 - Table S3). The two methods showed <50% concordance with each other when evaluating cases in Ottawa (45.0% concordance; N = 211) and Toronto (49.7% concordance; N = 529). Nearly all discordant PCs were interpolated to have between 1-5 cases (i.e., marginally discordant). Kriging analysis identified high-case outliers more often than high-case clusters. In Hamilton, 70% of high-case outliers but only 29% of high-case clusters were also found to be regions of interest by kriging (N = 59 and 7, respectively). Similar patterns were observed for all regions evaluated except Windsor/Essex, where the overlap between kriging and high-case clusters and outliers was comparable (83% and 96% [N = 138 and 22], respectively; Extended data, 29 Section 1 - Table S3).
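The concordance metric reduces to set overlap between the PCs flagged by each method on a given date; a trivial sketch (names are ours):

```python
def concordance(cluster_outlier_pcs, kriging_pcs):
    """Percent of PCs flagged by Cluster and Outlier analysis that
    were also regions of interest by kriging on the same dates."""
    flagged = set(cluster_outlier_pcs)
    overlap = flagged & set(kriging_pcs)
    return 100.0 * len(overlap) / len(flagged)
```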
As incidence rates of COVID-19 have fluctuated since its initial appearance in Canada, the following geostatistical analyses have been stratified by periods with significant infectious burden, which provides a common frame of reference. We organize the analyses according to the three periods (or waves) with maximal occurrence of COVID-19 infections, where wave 1 extended from March 26 th to June 16 th , 2020, wave 2 from September 25 th , 2020 to February 14 th , 2021, and wave 3 from March 17 th to June 23 rd , 2021. The intervening periods between waves 1 and 2 and waves 2 and 3 are designated as inter-waves 1-2 and 2-3, respectively.
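For reference, the wave boundaries above can be encoded as a simple date lookup (the interwave start and end dates are inferred from the adjacent wave dates; this encoding is ours):

```python
from datetime import date

# Wave boundaries as defined in this study (inclusive).
WAVES = [
    ("wave 1",        date(2020, 3, 26), date(2020, 6, 16)),
    ("interwave 1-2", date(2020, 6, 17), date(2020, 9, 24)),
    ("wave 2",        date(2020, 9, 25), date(2021, 2, 14)),
    ("interwave 2-3", date(2021, 2, 15), date(2021, 3, 16)),
    ("wave 3",        date(2021, 3, 17), date(2021, 6, 23)),
]

def wave_label(d):
    """Assign a reporting date to its pandemic wave or interwave."""
    for name, start, end in WAVES:
        if start <= d <= end:
            return name
    return None
```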

Wave 1
Kriging analysis of COVID-19 cases in this wave identified localized hotspots in Hamilton, Kitchener/Waterloo, London, Ottawa, Toronto, and Windsor/Essex (all localized hotspots are identified in Extended data, 29 Section 2). FSAs with high disease burden (those flagged by Gi* analysis) were often found to include localized hotspots (>50% of flagged FSAs in the cities evaluated; Extended data, 29 Section 1 - Table S4). Localized hotspots (regions within contours exceeding an interpolated count of >5 cases) were most frequently identified within high disease burden FSAs in Ottawa during the first COVID-19 wave (84%; Extended data, 29 Section 1 - Table S4). However, kriging also identified 34 additional localized hotspots within FSAs on dates within this period, suggesting that COVID-19 events of interest occur that are missed by Gi* analysis. Due to the city's high population density, kriging analysis of cases in Toronto using the Power semivariogram (used in the analysis of all other cities) was found to have poor overlap with Gi* analysis (only 31.6% of Gi*-significant FSAs contained a localized hotspot). The overlap between kriging and Gi* test results improved using the TPS semivariogram (to 60.9%), which was therefore used for all subsequent kriging analyses of Toronto. PC clustering was uncommon during this wave (<15% of FSAs flagged by Gi* and/or kriging were clustered). The exception was London, where 27% of localized hotspots occurred in clustered FSAs [N = 11]; in the other cities analyzed, the majority of cases were often localized to a single PC (Extended data, 29 Section 1 - Table S4).
COVID-19 outbreaks frequently affected residents of long-term care residences during this wave. Geostatistical analyses confirmed the ability to identify these outbreaks. For example, an outbreak in a nursing home in Toronto in early April 2020 (M8V; Lakeshore Lodge) was identified by kriging, while the FSA in which it occurred was also significant by Gi* analysis (Figure 5A). A different outbreak in Toronto in April 2020 (M2K; Extendicare Bayview nursing home) was also detected by kriging analysis, but the FSA in which it was found was not a significant hotspot by Gi* (Figure 5B; it was significant only using a fixed distance metric of ≥20 km). Similarly, the FSA K1J (Ottawa, ON) was only significant by Gi* using an inverse distance weighted test, despite a mid-April 2020 outbreak in the Rothwell Heights assisted living community which was identified by kriging (Figure 5C). Localized hotspots within the FSA M9P were observed as early as March 31st, 2020; however, the FSA was not flagged by Gi* analysis until April 6th, 2020 (Extended data, 29 Section 1 - Figure S3A). The localized hotspot covers the Village of Humber Heights Retirement Home in Toronto, which was reported to have a COVID-19 outbreak on April 12th, 13 days after the localized hotspot was first detected by kriging. 31 In each of these instances, the interpolation of cases by kriging was similar to actual case counts in these FSAs. This was consistent with the concentration of cases within a single PC (a frequent occurrence in the first COVID-19 wave; Extended data, 29 Section 1 - Table S4).

Figure 5. Geostatistical analysis of COVID-19 outbreaks during wave 1. FSAs exhibiting high disease burden by Gi* analysis were identified with localized hotspots by kriging or as having significant case clusters by Spatial Autocorrelation. Panels indicate cases in all PCs within FSAs (A) M8V, (B) M2K, and (C) K1J, which were amalgamated over 3-day sliding windows and masked (red line indicates actual case counts; privacy considerations require dates with fewer than 5 cases to be masked as zero). Kriging interpolated the number of cases over these FSAs (blue line), which closely approximates true case counts when counts are high. Green asterisks indicate dates where at least one day within a 3-day window was significant by Gi*; purple crosses indicate significantly clustered cases within PCs.

Hotspots where the case total exceeds the interpolated value are consistent with cases distributed among multiple PCs across an FSA. For example, localized hotspots in FSA M9V were identified sporadically, and the interpolated value does not always match total cases (Extended data, 29 Section 1 - Figure S3B). The localized hotspots cover the PC M9V 5B5 starting from April 4th. Outbreaks in the long-term care home within this PC (Humber Valley Terrace long-term care home in Toronto) were reported on April 16th, 2020. 32 This FSA was not flagged by Gi* analysis until April 19th (Extended data, 29 Section 1 - Figure S3B).
Localized hotspots identified by kriging comprise a narrow range of PCs in each FSA. The area of interpolated contours that overlapped FSAs with significant disease burden (by Gi* analysis) was compared to the total area of all corresponding FSAs. COVID-19 cases were highly localized, as the highest value contour (where kriging interpolates the greatest number of cases) constituted only a small portion of the total FSA area (Extended data, 29 Section 1 - Figure S4). While hotspots found by kriging often coincide temporally with Gi* results (Extended data, 29 Section 1 - Table S4), the interpolated hotspots contain fewer cases relative to the entire FSA, since the maxima of kriging contours cover only a portion of the overlapping FSA. The overall area indicated as a region of interest by kriging overlaps a greater fraction of the FSA area, but still covers only a small fraction of it (<10% in the majority of cases). Compared to the other COVID-19 waves, however, kriging analysis of cases during this wave tended to span a larger area of each FSA (Extended data, 29 Section 1 - Figure S4A). This appears to be consistent with higher transmissibility (which, we speculate, may be related to lower levels of immunity during this period).
COVID-19 case counts were stratified into three categories: biological sex (males vs. females), age (exceeding vs. below 65 years old), and overall health (presence or absence of at-risk, pre-existing comorbidities). Stratified PC-level case counts were evaluated by Cluster and Outlier analysis across all waves (Table 1). In the first wave, the frequency of high-case outliers (a high-case PC with few cases amongst its immediate neighbors) was biased towards older, at-risk populations (Table 1A). Across all municipalities, PCs were flagged nearly 5-fold more often in the older population (305 vs. 62 PCs identified as high-case outliers on different dates in this wave). A similar pattern was observed in at-risk vs. healthy populations (321 vs. 31 high-case outlier PCs), which is not surprising due to the considerable overlap between the older group and those with at-risk comorbid diagnoses. High-case clusters were identified less often, but also showed this pattern, which is likely related to the high frequency of outbreaks in long-term care homes during this period (Table 1B). Windsor/Essex was an exception to this trend, since the frequencies of PCs designated as high-case outliers were comparable among both age and comorbidity groups (20 vs. 17 for older vs. younger populations, and 19 vs. 17 for at-risk vs. healthy populations; Table 1A). The frequency of high-case clusters/outliers identified by sex stratification varied by the region analyzed (e.g., more high-case outliers were identified in the female population in Toronto, in contrast with the male population in Windsor/Essex). This stratified analysis does not itself provide insight into the mode of COVID-19 transmission; that is, the development of a hotspot consisting of a particular subgroup does not indicate that viral transmission occurred exclusively between members of that subgroup.
Interwave 1-2
The first COVID-19 wave ended approximately two months after the imposition of public health prohibitions on mobility, limits on assembly, and social distancing (i.e., a lockdown) in Ontario. June 16th, 2020 defined the start of interwave 1-2, a period of much lower COVID-19 incidence than the preceding 2½ months. Fewer FSAs exhibited high disease burden by Gi* analysis (39.7% of those called in wave 1, despite interwave 1-2 being longer in duration; Extended data, 29 Section 1 - Table S2). Despite the overall decrease in cases province-wide, the frequency of FSAs in the Windsor/Essex region that were designated hotspots by both Gi* and kriging analysis increased relative to the preceding wave (Extended data, 29 Section 1 - Table S4). This was due to multiple outbreaks in the Essex County FSAs in Leamington [N8H] and Kingsville [N9Y] (Extended data, 29 Section 1 - Figure S5). These outbreaks were publicly documented in guest farm workers, which is reflected by the Cluster and Outlier analysis of stratified case data (a strong bias of high-case clusters and outliers in healthy males <65 years in Windsor/Essex; Extended data, 29 Section 1 - Table S4). These same locations were correlated with interpolated kriging contour maxima and often consisted of isolated single PCs (Table 1). However, there were instances where the interpolated count values exceeded the actual case count (e.g., N8H, June 22-26th, 2020; Extended data, 29 Section 1 - Figure S5B) due to concurrent high case counts from a single PC in a neighboring FSA (e.g., N0P 2J0; Extended data, 29 Section 1 - Figure S5C) bleeding into these contours.

Wave 2
FSAs with high disease burden were more frequent within the second wave (Extended data, 29 Section 1 - Table S2). Even when accounting for its longer duration, there was a >10-fold increase in FSAs with high disease burden based on the Gi* test. The distribution of cases was more widespread across each FSA, as COVID-19 testing became more reliable and pervasive. Localized hotspots were detected less frequently within FSAs with high disease burden, i.e., significant FSAs by Gi* analysis overlapped less often with kriging contours of interest (>5 interpolated cases) in all municipalities except Windsor/Essex (Extended data, 29 Section 1 - Table S4). Additionally, interpolated case counts by kriging tended to be lower than actual FSA case counts, due to dispersion of cases across more PCs. Most municipalities contained more FSAs with localized hotspots that were not significant by Gi* analysis. Increased case counts across the province during this wave raised the significance threshold for Gi* analysis. This decreased the likelihood that FSAs in municipalities with moderate population densities were designated significant hotspots. For example, FSAs within Kitchener/Waterloo and London were not significant COVID-19 hotspots during wave 2 (Extended data, 29 Section 1 - Table S4). Hotspots in these regions were coincident with maximum contour levels interpolated by kriging. PCs had a greater proportion of high-case clusters and/or outliers in younger and healthy populations, in contrast with wave 1 (Table 1). The reduction of COVID-19 outbreaks (i.e., hotspots) in long-term care homes during this period was supported by the reduced statistical significance of counts at these locations relative to increased background case counts across the province.
Localized hotspots detected in PCs by kriging were more frequent in Hamilton, London, Ottawa, Toronto, and Windsor/Essex during this wave than in the preceding one (Extended data, 29 Section 1 - Table S4). For example, an interpolated hotspot in the N5Y FSA (London) with >5 interpolated cases over consecutive 3-day intervals (December 19-21 and December 20-22, 2020; Figure 6) represents a 'streak', since kriging analysis interpolated cases over this FSA throughout the previous week (although below the kriging threshold for significance). This hotspot encompasses a two-building apartment complex that was publicized on December 29 by the local health authorities as a COVID-19 hotspot, two weeks after it was first detected by kriging. 33 While the interpolated case count is lower than the total count in this FSA on this date (N = 33; other cases are present at other locations), this hotspot encompasses only 4% of the FSA by area. Other localized hotspots during this wave are depicted (e.g., in Leamington, ON; Extended data, 29 Section 1 - Figure S6) and catalogued in Extended data, 29 Section 2.
Significant clustering of cases in PCs in close proximity based on Spatial Autocorrelation analysis may be distinct from other tests that identify localized hotspots. Spatially clustered localized COVID-19 hotspots may not be confined within the same region of an FSA. For example, FSA M1P (Toronto) from Nov. 1-3 to Nov. 3-5, 2020 exhibited significant case clustering based on Gi* analysis. However, a localized hotspot identified within this FSA during this time did not overlap this cluster (Extended data, 29 Section 1 - Figure S7A). A neighboring FSA, M1C, showed significant spatially correlated cases within the same timeframe (Nov. 6-8, 2020); however, localized hotspots were not detected in this FSA (Extended data, 29 Section 1 - Figure S7B). No PC within this FSA had ≥6 cases during this period. This illustrates how combining Spatial Autocorrelation, Gi* and kriging analyses can improve the interpretation of asymmetric COVID-19 case distributions.

[Table 1 notes: 2 Estimated COVID-19 cases in FSAs associated with the region (days where cases within an FSA were masked were assumed to be 1 case). 3 PCs identified as a high-case outlier on a particular 3-day interval with ≥6 cases when analyzing stratified case data.]

Interwave 2-3
The interval between the end of wave 2 and the inception of wave 3 was short (~31 days); however, by comparison with interwave 1-2, the case counts were considerably higher (averaging 1134 and 154 per day, respectively). Thus, a far greater number of high disease burden FSAs were identified during this period (1.9- and 4.7-fold more than in wave 1 and interwave 1-2, respectively; Extended data, 29 Section 1 - Table S2). However, most of these occurred only in the Toronto and Windsor/Essex regions (Extended data, 29 Section 1 - Table S4).

Wave 3
The frequencies of high disease burden FSAs in waves 2 and 3 were comparable upon adjusting for the duration of each wave (Extended data, 29 Section 1 - Table S2). However, the regions detected by geostatistical analyses differed. High disease burden FSAs implicated in Hamilton and Kitchener/Waterloo by Gi* analysis were significantly more frequent compared to the preceding wave, while FSAs within Toronto were less often significant (Extended data, 29 Section 1 - Table S4). By contrast, localized hotspots within FSAs were generally less frequent than in wave 2, most notably in the Windsor/Essex region, which contained 389 localized hotspots in wave 2 but only 80 in wave 3 (Extended data, 29 Section 1 - Table S4).
High-case clusters and outliers were seldom identified in older (≥65) populations in wave 3 and interwave 2-3 (Table 1). Similarly, there was a considerable decrease in the number of case clusters/outliers in at-risk populations relative to previous waves. For example, analysis of at-risk populations in Windsor/Essex identified 116 PCs as high-case outliers on particular dates during wave 2, but only 2 in wave 3 (Table 1A). These observations may be related to the effectiveness of vaccination, based on the initial prioritization of older and at-risk cohorts for immunization.
The distribution of and relationships between clustered PCs with persistent COVID-19 cases
COVID-19 distributions were evaluated by Cluster and Outlier analysis to identify neighboring PCs with high concentrations of cases. PCs deemed part of a high-case cluster that were clustered over contiguous date ranges were then identified. These PCs were recognized as clustered over the duration of a time "streak", which by definition allows for a single day in which the PC was either not clustered or did not meet the minimum case count threshold of ≥6 cases within the 3-day sliding window. PCs that were part of a high-case cluster but did not meet the case threshold were not reported. This analysis identified 20 unique PCs with one or more high-case cluster streaks in Hamilton, 5 in Kitchener/Waterloo, 10 in London, 24 in Ottawa, 216 in Toronto, and 19 in Windsor/Essex (Extended data, 29 Section 1 - Table S5). Streaks were uncommon in Kitchener/Waterloo and London, as PCs in these cities rarely fulfilled the minimum case threshold. Most streaks were shorter than 4 sliding window periods, indicating that the most frequent streaks were brief due to the large number of single-day discontinuities arising from case reporting patterns and ascertainment bias. For these reasons, ≤3-day bins were not considered evidence of a streak. However, 37 streaks longer than 3 days were interpreted to indicate persistent infections at the same locations. Streaks with more than 3 consecutive high-case clusters were uncommon in wave 1, with M6M 2J5 (West Park Long Term Care Center) as the only example of a streak exceeding 3 days. The lack of other locations with persistent infections may be related to inconsistent case reporting, which was hampered by limited testing during this period. 4, 34 We sought evidence of COVID-19 transmission by identifying pairs of PCs with high case counts that were adjacent to one another in both distance and time.
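The streak definition above (contiguous high-case cluster days, tolerating a single missed day, and spanning more than 3 days) can be sketched as follows (day indices and all names are ours):

```python
def find_streaks(cluster_days, min_length=4, max_gap=1):
    """Group the days (integer indices) on which a PC was a
    significant high-case cluster into streaks, tolerating up to
    `max_gap` missed days, and keep streaks spanning more than
    3 days (i.e., at least `min_length` days start to end)."""
    days = sorted(set(cluster_days))
    streaks, start, prev = [], None, None
    for d in days:
        if start is None:
            start = prev = d
        elif d - prev <= max_gap + 1:   # consecutive, or a one-day gap
            prev = d
        else:                           # gap too long: close the streak
            if prev - start + 1 >= min_length:
                streaks.append((start, prev))
            start = prev = d
    if start is not None and prev - start + 1 >= min_length:
        streaks.append((start, prev))
    return streaks
```

For example, cluster days {1, 2, 3, 4, 6, 7} merge into a single streak spanning days 1-7 (the missing day 5 is tolerated), whereas an isolated two-day run is discarded.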
In Figure 7A, we analyzed PCs and their 10 closest neighbors, where streaks were either concurrent, consecutive, or separated by short gaps within the same wave. Concurrent streaks of neighboring PC pairs spanned at least 3 and as many as 29 days in Hamilton, Ottawa, Toronto, and Windsor/Essex (Extended data, 29 Section 1 - Table S6). There were a greater number of discontinuous streaks across all municipalities, with gaps separating streaks in neighboring PCs varying from short to long intervals (Extended data, 29 Section 1 - Table S7). During the second wave, many pairs of PCs comprised neighboring apartment buildings, for example, within Toronto FSA M4H. Other long concurrent paired streaks (exceeding 1 week in duration) occurred in Toronto PCs M1R 1S9 and M1R 1T1, M1L 1K9 and M1L 1L1 (both PC pairs in North York, Toronto), and L0R 1C0 with its neighbors in Hamilton (Extended data, 29 Section 1 - Table S6). Pairs of concurrent streaks in close proximity did not occur during wave 1, since streaks were uncommon during this time. However, many neighboring PCs were found to be clustered both spatially and temporally during interwave 1-2 and waves 2 and 3 (Extended data, 29 Section 1 - Table S7). While not concurrent, neighboring PC pairs in rural Essex County were clustered within 10 days of one another during interwave 1-2 (N0P 2L0, N0P 2P0 and N0R 1B0). This is corroborated by publicly reported outbreaks in agricultural facilities of this region during this time. During wave 2, 56 unique pairs of streaks were identified in neighboring PCs in Toronto (18 clustered concurrently, another 14 within a week of one another), 4 pairs in Hamilton (2 concurrent), 1 in London (concurrent), 4 in Ottawa (1 concurrent), and 14 in Windsor/Essex (10 concurrent; all involving PCs in Essex County). There were fewer incidences of streaks during the third wave, which may be related to the effectiveness of vaccination efforts in limiting transmission (Figure 7B).
During wave 3, we identified an additional 30 pairs of PCs with streaks in Toronto (7 clustered concurrently, and an additional 4 pairs offset by 1 week), and 6 in Hamilton (3 concurrent, and another pair offset by 2 days). Interestingly, 9 of the same PC pairs in Toronto clustered during the third wave were also flagged during the second wave, which could indicate an undetected reservoir of COVID-19 that persisted in this population during interwave 2-3. We examined evidence for likely community spread of COVID-19 between infected individuals in different pairs of neighboring PCs. This involved determining the distances separating them, the duration between streaks in the respective PCs, and the total case counts among PC pairs during these streaks, and then assessing the significance of these characteristics. During waves 2 and 3 in Toronto, paired concurrent or consecutive streaks in such PCs were common (Figure 7B). Frequencies of these PC pairs were negatively correlated with the duration of the gap between the end of a streak in one member of the pair and the beginning of the other (r = -0.57 and -0.50 for waves 2 and 3, respectively). To assess the significance of these results, a multinomial logistic regression model was derived in R ("Streak_Analysis_using_Multinomial_Logistic_Regression_Models.Rmd", which uses the multinom function of the nnet package; available in the Zenodo archive 29 ), evaluating (1) the total number of cases in each pair of PCs over the course of both streaks (classified as either <25, between 25 and 49, or ≥50 cases; the response variable of the regression), (2) the duration or gap (in days) separating streaks in a pair of PCs, and (3) the distance between the centroids of both PCs. The duration between streaks was significantly shorter for PC pairs with ≥50 cases relative to those with <25 cases (p = 0.0005 and 0.0141 for waves 2 and 3, respectively, using the Wald two-tailed z-test on the coefficients of the model).
The same relationship was also significant when comparing streaks in pairs of PCs with ≥50 cases and those with 25-49 cases (p = 0.0015 and 0.0277 for waves 2 and 3, respectively). Total case counts and time between streaks in a pair of PCs were also inversely correlated (r = -0.41 in both waves 2 and 3; using the CORREL function in Excel), which is consistent with disease transmission between adjacent PCs being more common when cases were more abundant. These parameters were also inversely correlated in Ottawa and Windsor/Essex (r = -0.13 and -0.17, respectively), but under less stringent criteria that allowed ≥6 cases for only one of the PCs in a pair on a single day. The relationship between the distance between PCs and the total number of cases was not significant by the regression model in either of these waves. Increasing the number of PC neighbors considered over a larger range of distances did not alter this result (Extended data, 29 Section 1 - Figure S8). However, upon dividing groups of neighboring PCs thresholded according to average quartile values (distance [182 m] and average gap between streaks [17.5 days]), the distance between neighbors was significant when combined with the degree to which they coincided temporally. PC pairs with below-average duration between streaks and below-average separating distance were 2.8-fold more likely to exhibit ≥50 cases than all other PC pairs (p = 0.002 by Mantel-Haenszel chi-square test). Transmission of higher levels of COVID-19 between pairs of PCs is thus supported by abbreviated intervals between streaks and their proximity to one another.
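The correlation computed with Excel's CORREL function is the Pearson coefficient, which can be reproduced as follows (a sketch; the function name is ours):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (equivalent to Excel's
    CORREL), e.g., between streak gaps and total case counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```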
Multiple paired PC streaks in Toronto were tightly grouped within a single neighborhood over a short period of time (Extended data, 29 Section 1 - Table S7). Directional network diagrams were created to illustrate the connectivity and temporal relationships between these and other paired high-case cluster streaks in close proximity (Extended data, 29 Section 1 - Figures S8 and S9 for waves 2 and 3, respectively). The preponderance of paired streaks were bimodal, in which only two neighbors interact (9 such paired streaks in wave 2 [Extended data, 29 Section 1 - Figure S9D] and 17 in wave 3 [Extended data, 29 Section 1 - Figure S10A]), and in close succession (≤7 days between streaks). Multi-node networks involving ≥3 postal codes occurred in 3 distinct neighborhoods in wave 2, and another three neighborhoods within wave 3. These neighborhoods were often comprised of multiple adjacent apartment buildings, and rarely occurred in low-density residential areas. The PCs where streaks tended to coincide were often proximate to facilities where communities may have congregated (grocery stores, schools, parks, or religious sites). We investigated whether the participation of many PCs in a multi-node network was a result of higher case counts during a streak (which would suggest an increased probability of transmission between these locations). However, bimodal networks of two PCs with recurrent high case counts were not rare (e.g., >100 cases within a 9-day streak in M1R 1T1), so other factors are necessary to explain why some PCs form multi-node networks. Furthermore, the average numbers of cases within streaks in bimodal networks and multi-node networks were comparable (18.8 and 17.0 cases per streak, respectively).
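A sketch of how such directional networks can be assembled from paired streaks, with edges pointing from the earlier streak toward the later one, and connected components touching ≥3 PCs labeled "multi-node" (the data structure and names are ours, not the software behind Figures S8-S10):

```python
def build_streak_network(paired_streaks):
    """Build directed edges between neighboring PCs ordered by
    streak start day, then label each connected component as
    'multi-node' (>= 3 PCs) or 'bimodal' (2 PCs).
    paired_streaks: iterable of (pc_a, start_a, pc_b, start_b)."""
    edges = set()
    for pc_a, start_a, pc_b, start_b in paired_streaks:
        # direct the edge from the earlier streak toward the later one
        if start_a <= start_b:
            edges.add((pc_a, pc_b))
        else:
            edges.add((pc_b, pc_a))
    # connected components via union-find over the undirected graph
    parent = {}
    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    for a, b in edges:
        parent[find(a)] = find(b)
    comps = {}
    for node in parent:
        comps.setdefault(find(node), set()).add(node)
    labels = ["multi-node" if len(c) >= 3 else "bimodal" for c in comps.values()]
    return edges, labels
```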
In some instances, PCs began to accrue cases prior to their inclusion in significant clusters of PCs or reaching the minimum number of cases to be designated a COVID-19 hotspot (M4H 1K1, M4H 1K2, M4H 1K4, M4H 1L1, M4H 1L6, and M4H 1L7). These PCs were nevertheless elements of the M4H FSA streak. Similarly, M4H 1L3 and M4H 1L4 were elements of a cluster containing M4H 1J4 on Nov. 8-10, 2020, at least 2 weeks prior to the occurrence of the corresponding multi-day streaks. Consecutive streaks within a set of clustered PCs in Etobicoke (M9V 3S6, M9V 3Z8, M9V 4A4, M9V 4A9, M9V 4M1, M9V 4P1, and M9V 5G9; centroids all within 0.63 km of one another) occurred between mid-October and late December 2020 (Figure 8B). M9V 5G9 is separated from the other PCs by a wooded area, but all are close to a major thoroughfare, Kipling Avenue. A striking directional acyclic network of COVID-19 cases involved 7 adjacent PCs whose streaks occurred concurrently or nearly consecutively along Bathurst Street, Toronto (M2R 1Z2, M2R 1Z7, M2R 1Z8, M2R 1Z9, M2R 2A1, M2R 1Z6, M2R 2Y2; within 0.57 km of one another; Figure 8C). Curiously, the order of occurrence of COVID-19 infections among these PCs consistently followed a southerly direction along Bathurst Street.
The multi-node networks during wave 3 consisted of only three postal codes (M3C, M9V, M9W; Extended data, 29 Section 1 - Figure S10A). Increasing the number of neighbors (by relaxing distance constraints, from 10 to 30 PCs) for these PC networks resulted in multi-node networks that were previously identified as bimodal (M4H, M4C, and M4X; Extended data, 29 Section 1 - Figure S10B). This procedure also resulted in four additional bimodal pairs of PCs in wave 2 (Extended data, 29 Section 1 - Figure S9D); however, the total number of PCs comprising the multi-node networks in wave 2 was unchanged (Extended data, 29 Section 1 - Figures S8A-C).
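The neighbor-relaxation step above (widening the candidate set from the 10 to the 30 nearest PCs) amounts to re-ranking centroids by distance and pairing streaks within a larger candidate set. A minimal sketch, using hypothetical, evenly spaced centroids rather than real PC coordinates:

```python
import math

def k_nearest(centroids, target, k):
    """Return the k postal codes closest to `target`, ranked by Euclidean
    distance between centroids (`centroids` maps pc -> (x, y) in km)."""
    others = [pc for pc in centroids if pc != target]
    others.sort(key=lambda pc: math.dist(centroids[pc], centroids[target]))
    return others[:k]

# Hypothetical centroids laid out 0.1 km apart along a street.
centroids = {f"PC{i:02d}": (i * 0.1, 0.0) for i in range(40)}
near10 = k_nearest(centroids, "PC00", 10)
near30 = k_nearest(centroids, "PC00", 30)
# Relaxing the neighbor count admits additional candidate partners,
# which can merge previously bimodal pairs into multi-node networks.
```

The wider candidate set strictly contains the narrower one, which is why relaxation can only merge networks, never split them.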

Discussion
Geostatistical analysis revealed the locations and relationships between COVID-19 hotspots, utilizing daily postal code-level case data in Ontario, Canada from March 26, 2020 through June 25, 2021 (including Gi*, Spatial Autocorrelation, Cluster and Outlier analyses, and interpolation by kriging). Previous efforts to integrate geostatistical and epidemiological data have not investigated COVID-19 at this level of geographic and temporal resolution. This approach more precisely localized COVID-19 hotspots, and enabled early detection of disease clusters, which could be exploited to prevent or limit further spread at or close to these hotspots (provided complete and up-to-date COVID-19 case data is made available in a timely manner). Concurrent or consecutive neighboring hotspots classified as persistent high-case clusters were flagged as possible evidence of disease transmission.
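For readers who wish to reproduce the hotspot test named above outside ArcGIS, the Getis-Ord Gi* statistic can be computed directly from its standard formulation. A minimal NumPy sketch using a binary fixed-distance-band weight matrix and synthetic coordinates and counts (not the study data, and not the toolbox's implementation):

```python
import numpy as np

def getis_ord_gi_star(x, coords, band):
    """Getis-Ord Gi* z-scores for values x at planar coords, using binary
    fixed-distance-band weights (w_ij = 1 if d_ij <= band, incl. self)."""
    x = np.asarray(x, float)
    n = len(x)
    # Pairwise Euclidean distances and the binary weight matrix.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= band).astype(float)              # w_ii = 1 (self-inclusive)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)   # population std. deviation
    wsum = w.sum(axis=1)
    num = w @ x - xbar * wsum
    den = s * np.sqrt((n * (w ** 2).sum(axis=1) - wsum ** 2) / (n - 1))
    return num / den

# Synthetic 1-D layout: a run of high counts amid low background.
coords = np.array([[i, 0.0] for i in range(10)])
x = [1, 1, 1, 20, 25, 22, 1, 1, 1, 1]
z = getis_ord_gi_star(x, coords, band=1.0)
# The center of the high-count run scores strongly positive (z ≈ 3),
# consistent with a statistically significant hotspot.
```

Features are treated as hotspot candidates when the z-score exceeds the significance threshold (e.g., z > 1.96 for p < 0.05, before any multiple-testing correction).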
The COVID-19 epidemic in Ontario has been classified into three distinct waves of infections. 35 The recurrence of successive waves shortly after easing of public health measures strongly motivated development and implementation of novel strategies that anticipate the locations and timing of future hotspots. Network analysis demonstrated that these locations may form nuclei from which subsequent local outbreaks arise. COVID-19 clusters were considerably diminished in the older at-risk population during the third wave, most likely because of province-wide vaccination compliance, which resulted in lower mortality in long-term care residences during this period.
The techniques implemented in this study could prove useful to assist with contact tracing of infectious diseases, which may be challenging to carry out in a timely fashion once community spread has already occurred. Contact tracing has been vital to identify potential sources of viral transmission from infected cases. Successful contact tracing is impacted by incomplete collection of information, specifically by interviewees who provide imprecise information, omit critical interactions, or are uncooperative. 36 The directional acyclic network graphs presented are consistent with transmission between individuals residing in multiple PCs in close proximity to each other. These tools could assist tracing investigations by focusing resources on the most likely locations where interactions between infected and uninfected persons occur. It may be possible to exploit observations of COVID-19 infections at consecutive, neighboring PCs to anticipate the occurrence and locations of subsequent infections. This was exemplified by the progression of cases along PCs on Bathurst Street, Toronto in wave 2 (Figure 8C). We suggest that these and similar patterns may be useful as features along with cellular phone mobility data from dates preceding the appearance of future hotspots for predictive machine learning models. Accuracy in predicting the timing of future hotspots, however, would be influenced by the lag between date of infection and presentation of symptoms and by compliance of symptomatic individuals in seeking testing. Application of geostatistical analyses and other technologies 37 may overcome the challenges posed by noncompliance or erroneous COVID-19 information from infected persons.
We also identified persistent, clustered postal codes that recurred between waves. Neighboring PCs comprising high-case clusters over a series of consecutive or overlapping dates were proposed as evidence of COVID-19 transmission. Nine PC pairs across 4 separate regions in Toronto were flagged in both waves 2 and 3, which were suggestive of recurrence. Without viral sequencing data, it is not possible to unequivocally establish whether these local outbreaks were independent events or due to re-emergence of the same infection from a low-level reservoir of asymptomatic or persistent mild infections during the interval between waves 2 and 3. The same type of analysis did not yield evidence of inter-wave COVID-19 transmission in the other municipalities evaluated. Because of their comparatively lower population densities and smaller geographic size, it is less likely that the case thresholds required to define a COVID-19 hotspot within a 3-day window can be met in other Ontario municipalities. Increasing the aggregation of cases from 3- to 7-day sliding windows might improve detection of likely COVID-19 transmission in these areas.
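The proposed widening of the aggregation window can be illustrated with a sliding-window sum. The daily counts and the 10-case hotspot threshold below are hypothetical, chosen to show how a 7-day window can cross a threshold that a 3-day window never reaches in a low-density municipality:

```python
def window_counts(daily, width):
    """Sum daily case counts over every sliding window of `width` days.
    `daily` is a list of counts indexed by day; returns one sum per
    window start position."""
    return [sum(daily[i:i + width]) for i in range(len(daily) - width + 1)]

# Hypothetical low-density municipality: a hotspot threshold of 10 cases
# is never reached in any 3-day window but is reached in every 7-day one.
daily = [2, 1, 3, 2, 1, 2, 3, 1, 0, 2]
threshold = 10
hits_3 = [c for c in window_counts(daily, 3) if c >= threshold]
hits_7 = [c for c in window_counts(daily, 7) if c >= threshold]
```

The trade-off is temporal resolution: a wider window gains sensitivity in sparse regions at the cost of localizing the event less precisely in time.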
Unlike traditional geostatistical epidemiology, we did not determine the likelihoods of hotspots based on simulations of random distributions of the same number of cases in the analyzed region, 14,38 since background levels of COVID-19 were undetectable prior to the pandemic. Therefore, performing geostatistical tests on positive COVID-19 cases is valid statistically. FSA-level analysis on population-normalized case counts (cases per 100,000; where FSAs with <100 individuals were not analyzed) was also performed (March to November 2020). 29 Analyses could not be normalized by population densities (cases per 100,000) since these data are not publicly available for individual PCs. However, the smallest geographic units defined by Statistics Canada, termed Dissemination Areas, encompass an average of 15 postal codes (consisting of 400-700 persons). 39,40 A seroprevalence study of COVID-19 antibodies in Canada has suggested that infection rates were up to 3-fold higher than detected by RT-PCR testing during the first wave. 34 Similar studies have also indicated underreporting in the U.S. 41 and Europe. 42,43 This included asymptomatic and untested symptomatic infected individuals 42 or who received negative results. 44 Our study assumes that the geographic and temporal distribution of unrecognized infections trends similarly with known cases. However, nonuniformity in the distribution of asymptomatic and undetected cases across Ontario could have altered the detection or localization of hotspots. Significant hotspots detected by geostatistical methods that also fulfill minimum case count thresholds seem less likely to present as false positives or to be incorrectly localized.
Some geostatistical functions (e.g., Gi* and Cluster and Outlier analysis) could be computed rapidly (within seconds for a single day analysis), while others (e.g., intersection of kriging contours with PC boundaries) required up to 10 minutes per date analyzed on the ICES research computing system. To scale these computations over the duration of this project, it was critical to automate analyses for provincial and multiple municipal jurisdictions for timely generation and interpretation of these results. In April 2021, the sponsor of our AHRQ project requested an analysis of recent COVID-19 count and location data (January-March 2021). We automated kriging of localized hotspots in Toronto, Windsor/Essex, Ottawa and London within 5 days from the receipt of the request. This entailed performing the software analysis, imaging the results as maps, and interpreting these results. The rapid turnaround of geostatistical analysis of ongoing infectious diseases could have practical value for time-sensitive, critical public health management decisions. These programs have been made publicly available as a package, reformatted as a user-friendly toolbox in ArcMap (the Geostatistical Epidemiology Toolbox). 29 FSA-level geospatial tests, such as Gi*, did not reveal the distribution of COVID-19 infections within FSAs, a limitation of this approach. We attempted to mitigate this by integrating the results of multiple geostatistical methods at different levels of resolution (e.g., Spatial Autocorrelation determined if cases were clustered at the PC-level). The locations of individuals in the same FSAs and PCs were aggregated at the centroids of their respective boundaries. However, boundary shapes and areas covered by different FSAs or PCs can vary, even though the total populations within each are more comparable. Rural FSAs can be larger than urban counterparts and are occasionally non-contiguous (e.g., the FSA N0P).
While close proximity of COVID-19 positive individuals in high-density municipalities is more consistent with transmission, it was nevertheless feasible to detect hotspots in rural regions. However, high concentrations of cases are required to find statistically significant hotspots in these regions. Since we use PC centroids to assign cases, this also affects the contours defined by kriging analysis of rural regions. These are limitations of the imposed boundaries of FSAs and PCs and not of the geostatistical tests used. Due to privacy considerations imposed by data providers, we could not perform kriging analysis on exact locations of COVID-infected individuals. Kriging interpolation can overlap multiple FSAs (Extended data, 29 Section 1 - Figure S11), which can be advantageous over area-to-area geostatistical tests, as hotspots in close proximity that cross an FSA boundary could be missed with these tests. However, FSAs with few or no cases that are adjacent to a local hotspot could mistakenly be implicated. We account for this by ignoring FSAs without case counts; however, this effect can still impact interpolation accuracy. For example, the interpolated case count for the FSA N8H in late June 2020 was much higher than the ground truth (Extended data, 29 Section 1 - Figure S5B). Over the same period, 109 cases were reported over a 3-day period within N0P 2J0, which is immediately north of N8H. The high-case kriging contour detecting this event intersects the boundary of N8H, thus causing this discrepancy (Extended data, 29 Section 1 - Figure S5C).
In late 2020, Ontario announced prioritization of COVID-19 vaccinations beginning with those with the highest risk of contracting severe illness. The subsequent group included individuals residing in hotspot communities. 45 These hotspots were identified in FSAs with the highest counts per 100,000 population. However, cases are not always uniformly distributed within FSAs with high disease burden (Figure 4). This study illustrates the importance of identifying hotspots at a higher resolution and describes how this level of precision provides targeted information about transmission that is often not feasible with FSA-level analyses. To minimize high rates of community transmission in the future, we suggest that targeted vaccination campaigns focus on highly constrained, localized hotspots with high case counts at the earliest possible juncture (rather than broadly across a region or FSA). For example, a hotspot in London was detected by kriging over several weeks at levels below our pre-selected interpolated case thresholds (Figure 6). The results of this analysis surpassed these thresholds approximately one week before it was publicly revealed. Thus, targeting can achieve the necessary precision and at an earlier stage with these methods. Smaller targeted locations may also facilitate communication and promote compliance with and implementation of individual public health measures.

Data availability
Zenodo: Data and Software Archive for 'Likely community transmission of COVID-19 infections between neighboring, persistent hotspots in Ontario, Canada', http://doi.org/10.5281/zenodo.5585811. 29 This project contains the following underlying and extended data, organized across 4 sections. Sections 1 and 2 contain extended data, and Sections 3 and 4 respectively consist of data files containing, and map image files displaying, the underlying statistics from the various geostatistical analyses performed in this study. This project contains the following extended data:

Extended data
• Section 1 - An Excel document containing 7 additional tables described in this study, which provide summary statistics of various geostatistical tests ("Section 1 - Tables S1-S4, S6") and list all identified single and paired high-case cluster streaks ("Section 1 - Tables S5, S7"). This section also contains 11 additional figures referred to in the manuscript ("Section 1 - Figures S1-S11"), with a Word document that describes them.
○ Figure S5. Interpolated case counts were similar to actual case counts (red line). COVID-19 outbreaks within these FSAs were often identified by Gi* analysis (green asterisks); however, the distribution of cases within these regions was infrequently found to be spatially clustered (purple crosses). Panel (C) indicates that power-based kriging contours can be large in regions with low population densities due to the distribution of cases. Interpolation of cases within N8H from June 22nd to June 26th, 2020 exceeded actual counts due to the contribution of a high-case postal code in a neighboring FSA (N0P), with the resultant kriging contour intersecting both FSAs.
○ Figure S6. A Localized Hotspot in Leamington, Ontario. A) PC-level COVID-19 case data of the FSA N8H (Leamington, ON) on consecutive dates in December 2020, evaluated by kriging interpolation (blue line) and tested for significance by Spatial Autocorrelation (purple crosses) and Gi* analysis (green asterisk). B) Map contours and distribution of cases. Small circles in this region represent PCs with one or more cases from December 21st to the 23rd, 2020. The maximal kriging contour was coincident with the hotspot in PC N8H 3V5 during this window.
○ Figure S7. Spatial Clustering and the Identification of Localized Hotspots. Spatial autocorrelation analysis identifies clusters of postal codes with high case counts within FSAs. Panel (A) displays significantly clustered cases in Toronto FSA M1P from Nov. 3-5, 2020. This region was also significant by Gi* and contained a localized hotspot consisting of 8 cases (M1P 4V6). Each dot represents a postal code with at least 1 but fewer than 6 cases. (B) Cases in M1C were also significantly clustered from Nov. 6-8, 2020, but no hotspots were identified. Topographic contours indicate kriging-interpolated case counts.
○ High-case postal code clusters were identified by Cluster and Outlier analysis (Local Moran's I). High-case cluster streaks (PCs significant over multiple consecutive days, allowing for a single-day gap) were identified and paired to other streaks within close proximity (considering the 50 closest neighbors based on distance between PC centroids). Each point represents a paired PC streak, with colors representing the total number of cases in both; the X axis indicates the duration between PC pairs (with negative values representing the degree of overlap between the end of the first streak and the beginning of the other), and the Y axis indicates the spatial distance between the centroids of each PC forming a pair. All paired streaks are illustrated, regardless of wave.
○ Figure S11. Overlap of Localized Hotspots with Multiple FSAs. Kriging contours can overlap multiple FSAs. While the kriging semivariogram model selected can affect their frequency, contours which intersect ≥2 FSAs can occur with either method. Panel A includes a TPS-based kriging contour which intersected 9 separate FSAs in Toronto, and panel B shows a power-based kriging contour which interpolated ≥10 cases over 6 unique FSAs in London.
• Section 2 -All localized hotspots (identified through kriging analysis) catalogued for each municipality evaluated are provided in this section. Each data file lists the FSA containing the localized hotspot, the size of this hotspot (% by area), the 3-day window in which it was observed, the interpolated and true case values on this date, and whether the FSA was also significant by Gi* analysis (on any of the 3 days analyzed).

Underlying data
Zenodo: Data and Software Archive for 'Likely community transmission of COVID-19 infections between neighboring, persistent hotspots in Ontario, Canada': http://doi.org/10.5281/zenodo.5585811. 29 This project contains the following underlying data:
• Section 3 - All underlying data from the geostatistical tests performed in this study are provided in this section. This includes the output from Ontario-wide FSA-level Gi* and Cluster and Outlier analyses, and PC-level Cluster and Outlier, Spatial Autocorrelation, and kriging analyses of 6 municipal regions. Kriging analysis output files are also provided for 7 municipalities immediately surrounding Toronto. This section also contains data files from our analyses of stratified case data (by age, gender, and at-risk condition). All coordinates presented in these data files are given in "PCS_Lambert_Conformal_Conic" format. Case values between 1-5 were masked (appear as "NA").

Software availability
The Zenodo repository (http://doi.org/10.5281/zenodo.5585811 29 ) also provides all program files pertaining to the Geostatistical Epidemiology Toolbox (the geostatistical analysis software package to be used in ArcGIS), as well as all other scripts described in this manuscript. This code is available under the terms of the GNU General Public License v3.0.

Open Peer Review
1.
In the introduction, the authors, arguing against some geostatistical software, stated that "… however, these were not used in the present study because of their requirement to include disease background levels, which was not relevant for COVID-19." I wonder what they really meant by "disease background levels" and how it restricts the use of the mentioned software, including R, for making geospatial analysis of COVID-19 data. As a fact, there are several of such analyses out there that used different methods and software, particularly R. A Poisson geoadditive model, for instance, can be fitted to case count data on COVID-19 over time, which will generate some beautiful spatial maps without the need for any background disease as claimed.
The relationship between FSA and PC is not clear in the manuscript. This may be confusing for some readers, especially those not familiar with Canada. More detail is required in this aspect by clearly mentioning the benefits of why both FSA and PC were chosen rather than either one of them. An illustrative graph showing the FSA and PC regions would be helpful.

2.
It is difficult to understand the illustration in Figure 1. A mathematical notation/formula should be used to support the illustration, clearly defining all the parameters involved in the formula.

3.
The authors mention that "the software can be easily modified to accommodate other Canadian provinces and, with additional effort, other countries." However, there is no guide or link to a description of how the developed tool could be used and probably extended to other provinces or countries.

4.
More details are required for the multinomial logistic model considered. Explicit details such as the response variable and the corresponding covariates should be clearly stated. If possible, the regression model should include stratification variables as covariates, such as categorized age and "at-risk group", for example. The conclusion drawn from such analysis can be discussed in the Discussion section. It is not clear whether the developed toolbox was used for the estimation of the multinomial regression model or the Directional Acyclic Graph Network.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
F1000Research 2022, 10:1312 Last updated: 08 AUG 2022

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 21 Jul 2022
Peter Rogan, University of Western Ontario, London, Canada

Comment #1. In the introduction, the authors, arguing against some geostatistical software, stated that "… however, these were not used in the present study because of their requirement to include disease background levels, which was not relevant for COVID-19." I wonder what they really meant by "disease background levels" and how it restricts the use of the mentioned software including R for making geospatial analysis of COVID-19 data. As a fact, there are several of such analyses out there that used different methods and software particularly R. A Poisson geoadditive model, for instance, can be fitted to case count data on COVID-19 over time, which will generate some beautiful spatial maps without the need for any background disease as claimed.

Response:
We appreciate this comment, in particular about the Poisson geoadditive model. Depending on which R script is used, analysis with R may or may not require background levels to be input. We have removed the citation to R, as suggested by another reviewer.
Comment #2. The relationship between FSA and PC is not clear in the manuscript. This may be confusing for some readers, especially those not familiar with Canada. More detail is required in this aspect by clearly mentioning the benefits of why both FSA and PC were chosen rather than either one of them. An illustrative graph showing the FSA and PC regions would be helpful.

Response:
To address the lack of familiarity with Canadian Forward Sortation Areas and Postal Codes, the sentence which introduces FSAs and PCs in the Introduction has been expanded. We also relate FSAs conceptually to U.S. zip codes: "distributed at the level of individual forward sortation areas (FSA) and postal codes (PCs; a smaller geographic segment within an FSA). Area-to-area analyses were performed on COVID-19 case data at the provincial level by FSA, a geographical unit similar to zip codes in the United States which can encompass hundreds of PCs." We have also added text noting that, since the distribution of infections was often non-uniform in many FSAs (as we describe), this level of granularity was insufficient to fully elucidate its local distributions, and analysis of PCs was necessary.
This possibility is mentioned in the Introduction (second paragraph): "Available case data analyzed has been typically aggregated at the municipal or county level, however higher resolution spatial analyses has been limited as finer granularity for the distribution of COVID-19 cases has often been lacking." Subsequently, we generally describe FSAs and PCs. However, it was not obvious that PC-level COVID-19 case data produced a high level of geographic resolution that had been previously unavailable. We have therefore added the following at the end of this paragraph: "Therefore, by evaluating COVID-19 case data at the PC level, we will be able to evaluate disease distributions at a high level of granularity."

Comment #3. It is difficult to understand the illustration in Figure 1. A mathematical notation/formula should be used to support the illustration, clearly defining all the parameters involved in the formula.

Response:
Formulation of the Getis-Ord local statistic is now presented in the third paragraph of the Introduction, with descriptions of each parameter within each equation, as requested. We have also expanded the Figure 1 legend to describe how the weight function of these formulas differs depending on the spatial relationship being used. Weights of two features are equal in Fixed and K Nearest Neighbors, given that the two points are within a given distance metric (e.g., within 1 km for Fixed, or one is among the K closest neighbors to the feature being evaluated for KNN; otherwise, w_i,j = 0). The inverse distance spatial relationship, however, computes the weight as the inverse Euclidean distance between the two features being evaluated. This is calculated as w_i,j = d_i,j^-β, where d_i,j is the Euclidean distance between two points and β is the inverse distance power variable (β = 1 in this study). This is now provided in the legend of Figure 1.
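The inverse-distance weight described in this response reduces to a one-line function; a sketch (β = 1 matches the study's stated default, while the 0.5 km and 2 km distances are purely illustrative):

```python
def inverse_distance_weight(d, beta=1.0):
    """Spatial weight between two features separated by Euclidean
    distance d, per the inverse-distance relationship w = d ** -beta
    (beta = 1 in this study). Closer features weigh more heavily."""
    if d <= 0:
        raise ValueError("distance must be positive")
    return d ** -beta

# A feature 0.5 km away weighs four times as much as one 2 km away.
w_near = inverse_distance_weight(0.5)
w_far = inverse_distance_weight(2.0)
```

Unlike the Fixed and K Nearest Neighbors schemes, which assign equal weights within the band, this weighting decays continuously with distance.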
Comment #4. The authors mention that "the software can be easily modified to accommodate other Canadian provinces and, with additional effort, other countries." However, there is no guide or link to a description of how the developed tool could be used and probably extended to other provinces or countries.

Response:
We have added a section to the Geostatistical Epidemiology Toolbox user manual (in the Zenodo archive) which details the steps required for evaluating other Canadian provinces or other countries. We have modified the original statement in the main manuscript to direct readers to the Geostatistical Epidemiology Toolbox user manual (which now contains a section on modifying the toolbox for other countries and Canadian provinces): "While the toolbox was designed to evaluate COVID-19 case data in Ontario, the software can be easily modified to accommodate other Canadian provinces and, with additional effort, other countries (as described in the Geostatistical Epidemiology Toolbox user manual 29 )." Using the toolbox to evaluate disease geostatistics in other provinces is relatively straightforward, requiring only alteration of a few lines of the main Python toolbox code ("GeostatisticalEpidemiologyToolbox_CytoGnomix.pyt"; see the Toolbox guide sub-section "Evaluating Other Canadian Provinces" within the Zenodo archive file "Software.GeostatisticalEpidemiologyToolbox.zip").
Modifying the Toolbox to evaluate hotspots and interpolation in other countries would be more involved (see the "Evaluating Other Countries" section of the manual). The structure of the shape files that define the regions equivalent to an FSA or a postal code in other countries will likely follow a different naming convention (for example, county subdivisions and zip codes in the United States). We now indicate in the Toolbox user manual which column names used by the Toolbox are most likely to require modification ('PROV' and 'PRNAME').
Since it is not possible to generalize the structure of every shape file defining geographic elements of other countries, potential users should consider contacting us for assistance.

Comment #5.
More details are required for the multinomial logistic model considered.
Explicit details such as the response variable and the corresponding covariates should be clearly stated. If possible, the regression model should include stratification variables as covariates, such as categorized age and "at-risk group", for example. The conclusion drawn from such analysis can be discussed in the Discussion section. It is not clear whether the developed toolbox was used for the estimation of the multinomial regression model or the Directional Acyclic Graph Network.

Response:
We identified postal codes classified as high-case clusters (by Cluster and Outlier analysis) over a period of >3 consecutive days (i.e., PC "streaks"), and established which of these streaks occurred in two neighboring PCs in close succession. We noticed that the number of cases reported in each paired streak, the interval of time separating each streak, and the physical distance between the PCs might be related. To establish this, we derived multinomial logistic regression models based on these factors, in which the total number of cases within the paired streaks was defined as the response variable. The relationship between the number of cases between streaks of pairs of PCs in close proximity and the time separating the occurrence of those two streaks was significant.
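The streak-identification step described here can be sketched as grouping a PC's significant dates into runs. This is an illustrative sketch, not the study's code; it assumes a minimum streak span of 3 days and a permitted single-day gap, per the descriptions in the text, and the dates are hypothetical:

```python
from datetime import date

def find_streaks(days, min_len=3, max_gap=1):
    """Group the dates on which a PC was a significant high-case cluster
    into streaks, allowing gaps of up to `max_gap` days, and keep only
    streaks spanning at least `min_len` calendar days."""
    if not days:
        return []
    days = sorted(days)
    streaks, current = [], [days[0]]
    for d in days[1:]:
        # A gap of max_gap days means consecutive significant dates may
        # be up to max_gap + 1 calendar days apart.
        if (d - current[-1]).days <= max_gap + 1:
            current.append(d)
        else:
            streaks.append(current)
            current = [d]
    streaks.append(current)
    return [s for s in streaks if (s[-1] - s[0]).days + 1 >= min_len]

# Hypothetical significance dates: a 5-day span with a one-day gap,
# then an isolated day two weeks later (too short to be a streak).
days = [date(2020, 11, 3), date(2020, 11, 4), date(2020, 11, 6),
        date(2020, 11, 7), date(2020, 11, 20)]
streaks = find_streaks(days)
```

Streaks found this way for neighboring PCs can then be paired by centroid distance and inter-streak interval, as described above.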
We now provide the R code used to derive these multinomial logistic regression models ("Streak_Analysis_using_Multinomial_Logistic_Regression_Models.Rmd") as a new section within the Zenodo archive ("Software.Additional_Programs_for_Cluster_Outlier_Streak_Identification_and_Pairing.zip"). The paragraph describing the multinomial logistic regression model (third paragraph of the Results subsection "The distribution of and relationships between clustered PCs with persistent COVID-19 cases") has been updated to reflect this change. This section has been revised to describe the software used to derive the model (including the name of the script and a citation of the Zenodo archive), and explicitly states which of the three variables was used as the response variable. The input data are also included in the archive.
The description now states: "To assess the significance of these results, a multinomial logistic regression model was derived in R ("Streak_Analysis_using_Multinomial_Logistic_Regression_Models.Rmd", which uses the multinom function of the nnet package; available in Zenodo archive 29 ), evaluating (1) the total number of cases in each pair of PCs over the course of both streaks (classified as either <25, between 25 and 49, or ≥50 cases; the response variable of the regression), (2) the duration … " This analysis was not performed for stratified case data because stratification considerably decreases the number of high-case clusters detected per class (Table 1). This would result in far fewer and shorter PC "streaks", and the lower statistical power would make the analysis less meaningful.
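The three response categories of the regression can be encoded directly. A sketch of the binning and of assembling illustrative model rows; the case totals, gaps, and distances below are hypothetical, and the fitting itself was done in R with nnet::multinom (not shown):

```python
def case_bin(total_cases):
    """Classify the total case count of a paired streak into the three
    response categories used by the multinomial regression:
    '<25', '25-49', or '>=50'."""
    if total_cases < 25:
        return "<25"
    if total_cases < 50:
        return "25-49"
    return ">=50"

# Illustrative model rows: (response category, duration between streaks
# in days [negative = overlap], distance between PC centroids in km).
observations = [(18, 4, 0.3), (37, -2, 0.6), (102, 1, 0.2)]
rows = [(case_bin(cases), gap, dist) for cases, gap, dist in observations]
```

Each row corresponds to one paired streak, with the binned case total as the response and the temporal gap and centroid distance as covariates.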
The Geostatistical Epidemiology Toolbox was developed to perform and visualize geostatistical analyses through the ArcMap graphical user interface using ArcPy (which enables Python programs to access ArcGIS's geoprocessing tools and extensions). The directional acyclic graph networks and the regression were performed outside of this environment (using MATLAB and R, respectively). Furthermore, these techniques examined a subset of our results. Thus, the toolbox does not, and cannot, perform either task.
these FSAs without masking, just within a separate computing environment? It was not clear to me whether they were masked or not in the final analysis.
In the fourth paragraph of the results, why were comparisons between kriging and the cluster/outlier methods restricted to March 26 to December 28, 2020?

4.
In Figure 3B, what do the blue crosses represent? I thought they represented the observed data, in which case I would have expected the same blue crosses in the top and bottom panel. Please clarify.

5.
Please provide additional information on the "relative density" metric in Figure 4. Is there a reason this particular area and time were highlighted in this figure? It may be interesting to show a hotspot that occurred during a wave rather than in a time window with low disease burden.

6.
If possible, please combine all resources into one repository.

7.
I did not see underlying data in any of the linked repositories. If the underlying data cannot be publicly shared, please provide a statement describing how the data may be accessed.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly

Response:
We agree that including regional statistics and maps of the areas being studied would be beneficial to the manuscript. We have therefore chosen to append this information to Table 1A of the manuscript, as the structure of the table was well suited for it. The first column of Table 1 includes a map of lower Ontario with the location of each region of interest, as well as maps of each region indicating their boundaries. We also include the population, the area (km²), and the number of FSAs and PCs in each region analyzed.
We have also added a new column which gives an estimate of the number of cases present in each region (the sum of cases from individual FSAs associated with the region of interest), divided by waves. These case numbers are estimated, since our contract with ICES has ended, and they have removed our access to actual case data. Thus, any dates on which case data was masked (i.e., the FSA had between 1 and 5 cases on a particular date) were counted as a single case by this calculation (which is indicated in the footer of Table 1).
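The lower-bound estimate described here (masked "NA" entries counted as one case) can be sketched as a small aggregation; the example counts are hypothetical:

```python
def estimated_total(daily_counts):
    """Lower-bound regional case total: masked entries ('NA', used when
    an FSA reported 1-5 cases on a date) are counted as a single case,
    as indicated in the footer of Table 1."""
    return sum(1 if c == "NA" else c for c in daily_counts)

# An FSA reporting [12, 'NA', 0, 7, 'NA'] contributes 12 + 1 + 0 + 7 + 1.
total = estimated_total([12, "NA", 0, 7, "NA"])
```

Since each masked entry may represent up to 5 cases, this estimate can undercount a region's true total by up to 4 cases per masked date.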

Major Comment #2:
The motivation behind the subgroup analyses (sex, age, health conditions) was not clear to me. It seems like clusters within each subgroup would tend to appear simply where the highest number of individuals in each subgroup reside, and I'm not sure we would expect the transmission to occur along subgroup lines (for example, we might expect younger family members to transmit to older ones rather than transmission occurring between older individuals). Please explain the reasoning and implications of this analysis further; it may be more suitable in the Extended Data as it seems less relevant to the overall conclusions of the manuscript.

Response:
The identification of COVID-19 clusters according to these subgroups was specifically requested by the Ontario Ministry of Health. There was interest in identifying differences in how COVID-19 cases were distributed across our regions of interest. This has merit, since clustering of a particular subgroup was often correlated with known events occurring in those locations (e.g., high-case clusters of COVID-19 in predominantly young males in rural regions of Windsor/Essex reflecting outbreaks in migrant farm worker communities at this time). We have not moved this analysis (including Table 1) to the extended data.
The analysis we describe does not imply that transmission is occurring between or within subgroups. Nonetheless, the distribution of cases by subgroup was of particular interest to the Ontario Ministry of Health, and it did not simply reflect the geographic distribution of subgroup members' residences. We have therefore added the following to the end of the fourth paragraph of Results subsection "Wave 1" (page 17, paragraph 1 without markup): "This stratified analysis itself does not provide insight into the mode of COVID-19 transmission, that is, the development of a hot spot consisting of a particular subgroup does not indicate that viral transmission occurred exclusively between members of the particular subgroup."

Major Comment #3:
I found it difficult to distinguish between the space-time Gi* and the "streak" analyses. How do they relate to one another? Is there any additional information gained from the spacetime Gi* analysis?

Response:
Space-time Gi* analysis and Cluster and Outlier 'streaks' are indeed two separate analyses. Results using space-time Gi* analysis (built into ArcGIS) are provided at the start of the Results. We found that space-time Gi* analysis was generally uninformative, as it called nearly all regions with high numbers of cases (>10 at the FSA level) statistically significant (>90% in most waves). Therefore, we did not pursue it further. However, this was not explicitly stated, so we have added the following statement to the aforementioned Results paragraph: "As the vast majority of regions were flagged by this approach (>90%), we did not gain any insight from this space-time analysis." We then investigated temporal trends using Cluster and Outlier analyses of "streaks" of consecutive days on which particular locations were deemed hotspots. This is a form of "space-time" analysis, but it is completely independent of any functional tests built into ArcGIS. We are not aware of any prior description of this method. We have modified the text to clarify that this was completely separate from space-time Gi* analysis (page 14, paragraph 2): "Instead, we evaluated temporal trends in the distribution of COVID-19 cases and identified potential transmission using Cluster and Outlier analysis of PCs that persisted across a consecutive series of dates (referred to as high-case cluster 'streaks'; described later in the Results section)."
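The streak concept can be sketched in a few lines. This is a minimal illustration of the idea only, with an assumed input shape (a list of dates on which one PC was flagged as a high-case cluster); it is not the published implementation.

```python
# Sketch of "streak" detection: given the dates on which a single PC was
# called a high-case cluster by Cluster and Outlier analysis, return the
# runs of consecutive days (streaks) of at least a minimum length.
from datetime import date, timedelta

def find_streaks(hotspot_dates, min_length=2):
    """hotspot_dates: iterable of datetime.date values for one PC.
    Returns a list of (start, end) pairs for consecutive-day runs."""
    days = sorted(set(hotspot_dates))
    streaks, start = [], None
    for i, d in enumerate(days):
        if start is None:
            start = d
        # close the current run at the last date or before a gap > 1 day
        if i == len(days) - 1 or days[i + 1] - d > timedelta(days=1):
            if (d - start).days + 1 >= min_length:
                streaks.append((start, d))
            start = None
    return streaks

flagged = [date(2020, 4, 3), date(2020, 4, 4), date(2020, 4, 5),
           date(2020, 4, 9)]
print(find_streaks(flagged))  # [(date(2020, 4, 3), date(2020, 4, 5))]
```

Overlaps between streaks in neighboring PCs could then be examined as potential indicators of local spread, as described in the Results.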

Major Comment #4:
Please describe in more detail the approach that was used to evaluate the kriging and semivariogram models.

1.
Why do you think the models performed differently in Toronto compared to other cities? 2.
In addition, please clarify -were PCs with 0 cases also excluded as neighbors (i.e., were neighbors in the kriging analysis restricted to neighbors with cases >=1)? 3.

Response:
(A) In the second paragraph of Methods subsection "Empirical Bayesian Kriging analysis", we describe our reasoning for prioritizing the Power and TPS semivariograms for low/medium and high case density regions, respectively. Some detail was excluded due to the overall length of the manuscript. The kriging contours generated were converted to polygons in order to perform an 'intersect' in ArcGIS between these contours and PCs; from this intersection, the values interpolated by kriging were compared to true case counts. The Power semivariogram generally modeled counts better in low- and medium-density municipal regions. Although TPS often performed well, it produced artifact contours in regions without cases, as described in the Methods ("TPS, in some instances, generated kriging artifacts that did not correlate with known hotspots"). However, the Power semivariogram model was sometimes inferior to TPS, with PCs having high case counts being missed. Thus, we used TPS in Toronto, considering only 5 neighbors, which minimized the kriging artifacts without sacrificing accuracy. This observation was supported by a chi-square test, whereby kriging interpolation with the TPS semivariogram utilizing ≥10 neighbors was frequently significantly different from ground truth (e.g., 72% of dates in Wave 2 were significantly different [p<0.05] when considering 10 neighbors). This did not happen when considering 5 neighbors (p>0.05 for all dates in Waves 1 and 2). This analysis justifies our selection of the number-of-neighbors parameter when performing kriging, and it is now briefly described in the manuscript: "Using 5 neighboring locations for interpolation minimized these artifacts without affecting the contour map. However, with 10 neighbors, interpolation accuracy was affected according to a chi-square test (e.g., p < 0.05 for the 103 days in Wave 2)."
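The chi-square comparison described above can be sketched as follows. The per-PC counts here are invented for illustration and are not study data; the function name and the exact test variant used in the study are assumptions.

```python
# Hedged sketch of comparing kriging-interpolated case counts per PC
# against true counts with a chi-square goodness-of-fit statistic.

def chi_square_stat(true_counts, kriged_counts):
    """Chi-square statistic of interpolated values against true counts."""
    return sum((k - t) ** 2 / t for t, k in zip(true_counts, kriged_counts))

CHI2_CRIT_DF4_P05 = 9.488  # critical value for df = 4, alpha = 0.05

true_pc_counts   = [8, 3, 12, 6, 9]            # hypothetical per-PC true cases
kriged_pc_counts = [7.5, 3.4, 11.0, 6.2, 8.8]  # hypothetical kriged values
stat = chi_square_stat(true_pc_counts, kriged_pc_counts)
print(round(stat, 3), stat < CHI2_CRIT_DF4_P05)
# 0.179 True -> interpolation not significantly different from ground truth
```

A date would be flagged as "significantly different" when the statistic exceeds the critical value for the relevant degrees of freedom (number of PCs minus one).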
With similar tests, we found that the default kriging parameters (other than neighbor count) were optimal (or led to equivalent results) regardless of the semivariogram used. Multiple kriging parameters are used to define a search neighborhood, the window which delineates which measured locations will and will not be included to predict the value of a region (Table 1).
Toronto exhibits far more PCs with cases than the other regions tested on most dates. The majority of these PCs have between 1 and 5 cases (<1% of PCs in Toronto had >5 cases across all dates analyzed). This pushes the interpolated case counts downwards (similar to how zero-case PCs depressed kriging values; see next section). Since the "Power" semivariogram is a parabolic mathematical representation of cases in the region, a large number of low-case PCs may 'smooth' any contour peaks in the region. Since TPS semivariograms attempt to pass through binned data points exactly, these peaks are not lost. Thus, the true case counts of high-case areas are better represented within the kriging contour. Kriging analysis was performed on COVID-19 case data that did not include PCs with zero cases. In the first paragraph of Methods subsection "Empirical Bayesian Kriging analysis", we state: "Only PCs with ≥1 case were included in this analysis, as locations without confirmed cases were found to severely depress kriging signals." PCs with zero case counts are far more frequent, even on dates when the number of PCs with cases was at its maximum (e.g., from April 12 to 14, 2021, 2377 PCs in Toronto had ≥1 case, which is only 5% of the total number of PCs in this region [2.4 to 3.8% for the other 5 regions of interest]). These PCs were so pervasive that they depressed the interpolated values of PCs with actual cases. In our testing, the case count interpolated by kriging was never representative of (i.e., was biased against) actual case counts when zero-case PCs were included. We modified the relevant statement (above) to further explain this: "Only PCs with ≥1 case were included in this analysis, as locations without confirmed cases greatly outnumbered PCs with cases (≥95% of PCs on any given date evaluated), leading to the severe depression of kriging signals."
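The depression effect can be illustrated numerically. The sketch below uses a simple inverse-distance-weighted average as a stand-in for kriging (Empirical Bayesian Kriging behaves analogously in this respect); all coordinates and counts are invented for illustration.

```python
# Illustration: including pervasive zero-case PCs pulls an interpolated
# estimate toward zero. IDW stands in here for kriging; it is not the
# method used in the study, only a simpler interpolator with the same
# qualitative sensitivity to zero-valued neighbors.

def idw(point, samples, power=2):
    """samples: list of ((x, y), value). Returns the IDW estimate at point."""
    num = den = 0.0
    for (x, y), v in samples:
        d2 = (x - point[0]) ** 2 + (y - point[1]) ** 2
        w = 1.0 / (d2 ** (power / 2) + 1e-9)  # epsilon avoids divide-by-zero
        num += w * v
        den += w
    return num / den

with_cases = [((0, 1), 6), ((1, 0), 4), ((1, 1), 5)]          # PCs with cases
zeros = [((0, 2), 0), ((2, 0), 0), ((2, 2), 0), ((0, 3), 0)]  # zero-case PCs
print(round(idw((0.5, 0.5), with_cases), 2))          # 5.0  (near true counts)
print(round(idw((0.5, 0.5), with_cases + zeros), 2))  # 4.18 (depressed)
```

With ≥95% of PCs having zero cases on a typical date, this dilution would be far more severe than in this small example, which motivated excluding zero-case PCs.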

Major Comment #5:
Is there a way to incorporate uncertainty in the kriging analysis? I would also be cautious about using the term "significant" to describe areas with interpolated counts >5, as this may be mistaken for statistical significance.

Response:
The ArcGIS software is capable of determining uncertainty when performing kriging (https://desktop.arcgis.com/en/arcmap/10.5/extensions/geostatistical-analyst/kriging-in-geostatistical-analyst.htm) with standard error maps (https://desktop.arcgis.com/en/arcmap/latest/extensions/geostatistical-analyst/using-ordinary-kriging-to-create-a-prediction-standard-error-map.htm). However, we no longer have access to the ICES computer system to generate such plots. Furthermore, the 'layer' files are linked to the input used to generate the kriging layer, which is no longer available to us.
The main text has been edited to remove instances of the term "significant" for regions interpolated by kriging consisting of >5 case counts. This was changed to "region of interest", "localized hotspot", or avoided. The name of Supplemental Figure S3 was also changed.
Major Comment #6:
The directional network graphs in Figure 8 were a bit challenging to interpret. Visually, it may help to arrange the points similarly to how they are geographically located and/or to change the length of the edges in proportion to the duration separating adjacent pairs (the current label).

Response:
The directional network graphs in Figure 8 were created with the MATLAB digraph function. digraph does not provide any option to incorporate the recommended changes; the order of data points and the edge lengths are specified automatically. Our only option was to label each path with distance, number of days separating pairs of neighboring streaks, PC identifiers, etc. The maps accompanying the graphs provide a geographic context for each network. Figure 8 and Extended Data Tables S9 and S10 remain unchanged.

Major Comment #7:
I would like further discussion about how the authors see these analyses being applied, especially in real-time. For example, the first paragraph of the Discussion says that this method "enabled early detection of disease clusters" -please provide additional evidence of this from the analyses. Additionally, based on the authors' experience of applying many different methods, are there specific ones that the authors would recommend for real-time use (or public health practice in general)?

Response:
We provide an example in which the FSA N5Y in London, Ontario was identified as a region of interest (>5 cases interpolated) by kriging one week before it was publicized (Figure 6). To provide further evidence of early detection, additional examples (in new supplementary figures S3A and S3B) of localized hotspots identified prior to public announcement are described: "Localized hotspots within the FSA M9P were observed as early as March 31st, 2020; however, the FSA was not flagged by Gi* analysis until April 6th, 2020 (Extended data, Section 1 - Figure S3A). The localized hotspot covers the Village of Humber Heights Retirement Home in Toronto, which was reported to have a COVID-19 outbreak on April 12th, 13 days after the localized hotspot was first detected by kriging." "For example, localized hotspots in FSA M9V were identified sporadically, and the interpolated value does not always match total cases (Extended data, Section 1 - Figure S3B). The localized hotspots cover the PC M9V 5B5 starting from April 4th. Outbreaks in the long-term care home within this PC (Humber Valley Terrace long-term care home in Toronto) were reported on April 16th, 2020. This FSA was not flagged by Gi* analysis until April 19th (Extended data, Section 1 - Figure S3B)." There were several instances where kriging identified a localized hotspot prior to public declaration within the same region; these were not found by either Gi* analysis or Cluster and Outlier analysis. While kriging analysis requires more time than these other statistics to generate results (minutes vs. seconds per hotspot), software automation on high-powered systems at provincial and municipal scale improved the performance of these processes. In April 2021, the Ontario Ministry of Health requested that we perform a full geostatistical analysis of COVID-19 in four municipal regions (London, Ottawa, Toronto, and Windsor/Essex) from January to March 2021.
We returned the results, including an evaluation of localized hotspots with kriging, within 5 days (see Discussion, paragraph 7). The "streak" analysis method (based on results from the Cluster and Outlier analysis tool) was only used to trace disease spread; this was a retrospective analysis to identify neighboring, persistent, and overlapping COVID-19 clusters. We believe that additional work is required to validate the use of this approach proactively.
Early hot spot detection is predicated on the availability of current and complete case data. Except for holiday periods, COVID-19 case data was preprocessed and updated on a weekly basis at ICES. As previously published, the number of cases on the final 3 days of the prior dataset was updated and usually increased in the subsequent release (additional testing data was added to those dates), especially in rural regions where testing was subject to adjustment. We have therefore added the following to the Discussion: "… and enabled early detection of disease clusters, which could be exploited to prevent or limit further spread at or close to these hotspots (provided complete and up-to-date COVID-19 case data is made available in a timely manner)."

Minor Comment #1:
In the second paragraph of the introduction, the authors include R in a list of tools that were not used because they require background levels, but R is very general software with packages that may be used without disease background levels. I would consider removing it from this list.

Response:
Geostatistical epidemiology has usually required pre-existing background levels of disease, which were not available at the beginning of the COVID-19 pandemic. R and the majority of available epidemiology packages include background level adjustment. While some R packages may permit background levels to be ignored, we have removed R from the list of software.

Minor Comment #2:
In my opinion, some sections of the methods may be moved to the Extended Data (e.g. paragraphs beginning "Additional ArcPy scripts…" and "These processes were then integrated…") for conciseness. Tables 1 and 2 may also be better suited to the Extended Data.
Response:
Table 2 has been moved to the Extended Data, as requested. Table 1 remains, as previously discussed (Reviewer #2, Major Comment #2). The paragraph beginning "These processes were then integrated…" contains our description of the Geostatistical Epidemiology Toolbox, which is important for reproducibility and portability of the methods we describe. We highlight this in the paper, and F1000Research requires that the software be available as a condition of publication. We have reduced its length rather than removing it. The deleted, relevant information can now be found within the User Manual accompanying the toolbox ("UserManual_GeostatisticalEpidemiologyToolbox_CytoGnomix.pdf"). The paragraph beginning with "Additional ArcPy scripts…" has been moved to the Zenodo archive ("Documentation_for_Each_Section_of_Zenodo_Archive.docx").

Minor Comment #3:
Please clarify the paragraph about FSAs with case counts <5 -was it possible to analyze these FSAs without masking, just within a separate computing environment? It was not clear to me whether they were masked or not in the final analysis.

Response:
All analyses described used unmasked, true case count data within the ICES environment, the only exception being a Gi* analysis performed to assess the impact of masking on the geostatistical results (described in the fourth paragraph of the Methods), where we found that masking significantly affected results: "We concluded that errors introduced by masking count values precluded analysis of masked data and required all work to be carried out within the RAE computing environment." We now explicitly state that no other geostatistical analysis described in this study used masked data, with the following statement in the Methods: "All geostatistical analyses presented in this study were performed against unmasked, true case counts." We also modified an earlier statement in the same paragraph regarding the ICES stipulation that we mask FSAs/PCs with 1-5 cases, to clarify that this practice applied only to data export: "To ensure patient privacy, ICES stipulated that case counts could not be exported from the ICES system unless FSAs and/or PCs exhibiting between 1 and 5 daily COVID-19 cases were masked." The following was also added to the Results (first paragraph): "Changes in the distribution of COVID-19 infections over time were evaluated through geostatistical analysis of true (unmasked) case counts within both FSAs and PCs."
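The export-time masking rule can be sketched as a one-line check. The representation below (a string sentinel for masked values) is an assumption for illustration; ICES's actual export format is not described here.

```python
# Sketch of the export-time privacy rule: daily FSA/PC case counts of
# 1-5 are masked before any data leaves the secure environment, while
# zero counts and counts above 5 are exportable as-is.

def mask_for_export(count):
    """Return the exportable value for one FSA/PC daily case count."""
    return "masked" if 1 <= count <= 5 else count

daily_counts = [0, 1, 3, 5, 6, 12]
print([mask_for_export(c) for c in daily_counts])
# [0, 'masked', 'masked', 'masked', 6, 12]
```

This is why the geostatistical analyses themselves had to run inside the RAE environment on unmasked counts: any exported view of low-count locations loses exactly the 1-5 case granularity the analyses depend on.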

Minor Comment #4:
In the fourth paragraph of the results, why were comparisons between kriging and the cluster/outlier methods restricted to March 26 to December 28, 2020?

Response:
The comparison between kriging and Cluster and Outlier analysis was part of a feasibility analysis (performed in January 2021) to compare the two methods. Expanding this analysis was unlikely to provide novel insights. We prioritized our limited access time to the data and computing environment to perform other analyses.

Minor Comment #5:
In Figure 3B, what do the blue crosses represent? I thought they represented the observed data, in which case I would have expected the same blue crosses in the top and bottom panel. Please clarify.

Response:
Blue crosses are bins of paired locations, grouped by distance and direction (as described in the Figure 3 legend). How these bins are grouped by the ArcGIS software differs depending on whether the Power or TPS semivariogram is being used. Additionally, how data points are binned can differ regionally, as the values and distances between points vary from location to location.
However, the two semivariograms in Figure 3B were not associated with each other. The original purpose of Figure 3B was to provide examples of "Power" and "TPS" semivariograms derived from these data. While not explicitly stated in the figure legend, we now see how a relationship between the two semivariograms could be inferred. Figure 3B has therefore been updated with semivariograms that are representative of the kriging map presented in Figure 3A. The binned blue crosses for the "Power" and "TPS" semivariograms are similar, although not identical (for the reasons given earlier).
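The binning of paired locations behind those blue crosses can be sketched as follows. This is a simplified, distance-only illustration (ArcGIS also bins by direction, and its binning rules differ between semivariogram models); the coordinates and values are invented.

```python
# Sketch of an empirical semivariogram: all pairs of locations are grouped
# into distance bins, and each bin's value is half the mean squared
# difference of the paired measurements (plotted as the blue crosses).
from itertools import combinations
from math import hypot

def empirical_semivariogram(points, values, lag=1.0, n_bins=3):
    """points: list of (x, y); values: measurements at those points.
    Returns the semivariance per distance bin (None for empty bins)."""
    bins = [[] for _ in range(n_bins)]
    for i, j in combinations(range(len(points)), 2):
        d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        b = int(d // lag)  # which distance bin this pair falls into
        if b < n_bins:
            bins[b].append(0.5 * (values[i] - values[j]) ** 2)
    return [sum(b) / len(b) if b else None for b in bins]

pts  = [(0, 0), (0.5, 0), (1.2, 0), (2.1, 0)]
vals = [2.0, 2.5, 4.0, 7.0]
print([round(g, 3) for g in empirical_semivariogram(pts, vals)])
# [1.917, 6.062, 12.5] -> semivariance rises with distance, as expected
```

Because the binned pairs depend on the local configuration of points and values, the crosses naturally differ between regions and between semivariogram models, as noted above.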

Minor Comment #6:
Please provide additional information on the "relative density" metric in Figure 4. Is there a reason this particular area and time were highlighted in this figure? It may be interesting to show a hotspot that occurred during a wave rather than in a time window with low disease burden.

Response:
Figure 4 represents a region where COVID-19 cases were highly dispersed over a period of time. There are likely other regions that could have been used; however, the example selected is interesting and worthy of discussion. The time window (September 10th to 29th, 2020) spans the very end of Interwave 1 (September 24th, 2020), leading into Wave 2, a time when the frequency of cases was rapidly increasing. The frequency of dates on which this FSA was called significant by Gi* analysis was similarly increasing. We therefore do not feel that replacing this figure with another example would be beneficial.