Correspondence

‘Not finding causal effect’ is not ‘finding no causal effect’ of school closure on COVID-19

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 25 Apr 2022

This article is included in the Japan Institutional Gateway gateway.

Abstract

In a paper recently published in Nature Medicine, Fukumoto et al. tried to assess the government-led school closure policy during the early phase of the COVID-19 pandemic in Japan. They compared the reported incidence rates between municipalities that had and had not implemented school closure in selected periods from March–May 2020, matching for various potential confounders, and claimed that there was no causal effect on the incidence rates of COVID-19. However, the effective sample size (ESS) of their dataset had been substantially reduced in the process of matching due to imbalanced covariates between the treatment (i.e. with closure) and control (without closure) municipalities, which led to wide uncertainty in the estimates. Despite the study title starting with "No causal effect of school closures", their results are insufficient to exclude the possibility of a strong mitigating effect of school closure on the incidence of COVID-19. In this replication/reanalysis study, we showed that the confidence intervals of the effect estimates from Fukumoto et al. included a 100% relative reduction in COVID-19 incidence. Simulations of a hypothetical 50% or 80% mitigating effect hardly yielded statistical significance with the same study design and sample size. We also showed that matching of variables that had a large influence on propensity scores (e.g. prefecture dummy variables) may have been incomplete.

Keywords

COVID-19, school closure, Japan, causal inference, reanalysis

Introduction

In a paper recently published in Nature Medicine, Fukumoto et al. tried to assess the government-led school closure policy during the early phase of the COVID-19 pandemic in Japan. They compared the reported incidence rates between municipalities that had and had not implemented school closure in selected periods from March–May 2020, matching for various potential confounders, and claimed that they found no causal effect on the incidence rates of COVID-19. School closure as a means to control outbreaks had been studied mostly for influenza prior to the emergence of COVID-19; those studies generally suggested low-to-moderate effects, but the evidence on other respiratory infections, including coronavirus diseases, has been limited (Viner et al., 2020). In the earliest phase of a pandemic, decisions sometimes need to be made without sufficient evidence; nonetheless, such decisions should undergo retrospective policy assessment to provide insights and refinement for future pandemic responses.

One of the challenges in this type of analysis of the early COVID-19 epidemic in Japan is the limited statistical power due to low case counts. During the first wave of the epidemic from February to June 2020, which overlapped with the study period of Fukumoto et al., Japan never observed more than 1,000 COVID-19 cases per day. As a result, out of the total 79,989 municipality-level daily counts from the 847 municipalities included, 99.9% were fewer than 10 cases per day (Figure S2 of original study). Moreover, the matching technique used to minimise confounding has a known side effect of limiting statistical power, especially when there is little overlap in the covariates between arms.

Unfortunately, the analysis in Fukumoto et al. appears to suffer from these issues. The study title says “No causal effect”, which is a rather strong statement given the substantial uncertainty in their estimates. As the saying goes, “absence of evidence is not evidence of absence”: when the uncertainty range covers practically meaningful values, it should not be prematurely concluded that there is “no effect” just because the effect estimate is statistically insignificant. Here I highlight limitations of the analysis and discuss possible factors that may have rendered the study underpowered.

Relative ATC and ATT estimates

The original study measures the effect of school closures as the absolute difference in incidence rates between the treatment and control municipalities. However, the theoretical ground is unclear for assuming a fixed additive effect of school closures to the incidence rate per capita. The effect estimates relative to the baseline incidence would be a more intuitive and interpretable measure for assessment of its practical use. It should also be noted that since incidence rates can only take non-negative values, the absolute mitigating effect of school closure can only be as high as the average incidence rate in the control group.

I rescaled the reported average treatment effects (average treatment effect on the control: ATC and average treatment effect on the treatment: ATT) and their confidence intervals relative to the average outcome (incidence rate per capita) in the control group (Figure 1). The confidence intervals of the relative ATC and ATT cover most of the range from a 100% reduction to a 100% increase, suggesting the underpowered nature of the original study. An effect of 50% reduction (i.e. -50% relative effect), which most experts would agree is of practical significance, or even complete reduction (i.e. -100%), was within the confidence intervals over a substantial part of the period of interest. An ESS of around 40–50 in the matched arms (Figure 1d) was likely insufficient to reach statistical significance because the incidence of infectious diseases typically exhibits higher dispersion than in independent and identically distributed settings due to its self-exciting nature (i.e. an increase in cases induces a further increase via transmission).
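
The rescaling described above can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from the original study: $\bar{Y}_{1}(t)$ and $\bar{Y}_{0}(t)$ denote the average incidence rates per capita in the matched treatment and control groups on day $t$, $L(t)$ and $U(t)$ the reported confidence bounds of the absolute ATT, and the control-group average is treated as a fixed denominator. The same form would apply to the ATC.

```latex
\[
\widehat{\mathrm{ATT}}_{\mathrm{rel}}(t) = \frac{\widehat{\mathrm{ATT}}(t)}{\bar{Y}_{0}(t)},
\qquad
\left[ L_{\mathrm{rel}}(t),\, U_{\mathrm{rel}}(t) \right]
  = \left[ \frac{L(t)}{\bar{Y}_{0}(t)},\, \frac{U(t)}{\bar{Y}_{0}(t)} \right]
\]
\[
\widehat{\mathrm{ATT}}(t) = \bar{Y}_{1}(t) - \bar{Y}_{0}(t) \ge -\bar{Y}_{0}(t)
\;\Longrightarrow\;
\widehat{\mathrm{ATT}}_{\mathrm{rel}}(t) \ge -1
\]
```

The second line restates the bound noted earlier: because incidence rates are non-negative, a point estimate formed as a difference of group means cannot fall below a 100% reduction, so a confidence interval that reaches -100% is as wide on the mitigating side as it can meaningfully be.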

Figure 1. Relative average treatment effect on the control (ATC) and average treatment effect on the treatment (ATT).

The turquoise vertical lines represent the date of treatment (school closure). The black lines and shaded areas represent the mean effect and 95% confidence intervals, respectively. (a) Relative ATC for the closure as of April 6, 2020. (b) Relative ATC for the closure as of April 10, 2020. (c) Relative ATT for the closure as of April 6, 2020. (d) Comparison of sample sizes. The number of all samples included for matching, the number of unique samples matched to at least one other sample and the effective sample size (ESS) of the matched samples are shown.

Statistical power demonstration with assumed causal mitigating effect of 50%/80%

To further examine the statistical power of the study, I artificially modified the dataset such that school closure has a 50% or 80% mitigating effect on the incidence rate per capita. On the treatment reference date (April 6) and onward, the expected incidence rate of each municipality in the treatment group was assumed to be 50%/20% that of the matched control municipality plus Poisson noise (see Extended data: Supplementary document for details). The results suggested that, even with a mitigating effect as large as 50%/80%, the approach in the original study might not have reached statistical significance (Figure 2). The absolute ATT for the 50% mitigating effect (Figure 2b) appears similar to what was referred to as “no effect” in the original study. The ATT for the 80% mitigating effect was also statistically insignificant (Figure 2c and 2d), suggesting that the study was underpowered to find even moderate to high mitigating effects, if any. ATC estimates also yielded similarly insignificant or barely significant patterns (Figure 3).
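
The data modification described above can be illustrated with a minimal sketch. It is not the replication code: the actual procedure is given in the Supplementary document and may differ. The sketch assumes that daily case counts in each treated municipality are drawn from a Poisson distribution whose mean is the matched control's incidence rate, scaled by one minus the assumed effect and by the treated municipality's population. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def simulate_treated_rates(control_rates, treated_pop, effect, rng):
    """Simulate per-capita incidence in treated municipalities.

    control_rates: per-capita incidence of each matched control (assumed input)
    treated_pop:   population of each treated municipality (assumed input)
    effect:        assumed mitigating effect, e.g. 0.5 for a 50% reduction
    """
    control_rates = np.asarray(control_rates, dtype=float)
    treated_pop = np.asarray(treated_pop, dtype=float)
    # Expected case count under the assumed mitigating effect.
    expected_counts = (1.0 - effect) * control_rates * treated_pop
    # Poisson noise around the expected count (an assumption of this sketch).
    counts = rng.poisson(expected_counts)
    return counts / treated_pop

rng = np.random.default_rng(0)
# Hypothetical toy data: 50 matched pairs, low incidence as in spring 2020.
control_rates = np.full(50, 2e-5)
treated_pop = np.full(50, 100_000)
simulated = simulate_treated_rates(control_rates, treated_pop, 0.5, rng)
print(simulated.mean(), control_rates.mean())
```

With expected counts of about one case per municipality per day, the Poisson noise is of the same order as the signal, which gives some intuition for why a halving of incidence can be hard to detect with a few dozen effective samples.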

Figure 2. Simulated average treatment effect on the treatment (ATT) estimates assuming 50%/80% mitigating effects.

(a) The average outcome (incidence per capita) of the matched treatment (black) and control (red) groups for closure as of April 6, 2020. (b) Absolute ATT estimates (black line) and 95% confidence intervals (shaded area) for closure as of April 6. (c) Relative ATT estimates and 95% confidence intervals for closure as of April 6. (d)–(f) Those for closure as of April 10.

Figure 3. Simulated average treatment effect on the control (ATC) estimates assuming 50%/80% mitigating effects.

(a) The average outcome (incidence per capita) of the unmatched treatment (dashed), matched treatment (black) and control (red) groups for closure as of April 6, 2020. (b) Absolute ATC estimates (black line) and 95% confidence intervals (shaded area) for closure as of April 6. (c) Relative ATC estimates and 95% confidence intervals for closure as of April 6. (d)–(f) Those for closure as of April 10.

Separation of propensity scores

I also noticed that the propensity scores computed for one of the included subanalyses, inverse-probability weighting, exhibited substantial or complete “separation” (Heinze & Schemper, 2002), and most samples were essentially lost due to the substantial imbalance in the assigned weights (Figure 4). Although separation of propensity scores can arise from overfitting, in this case it remained (while slightly ameliorated) even after addressing overfitting by Lasso regularisation (Figure 5). This indicates that the treatment assignments may have been nearly deterministic in the dataset, which can compromise the performance of quasi-experimental causal inference via “positivity violation” (Petersen et al., 2020).
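
How imbalanced weights erode the effective sample size can be seen in a short sketch. It assumes the commonly used Kish formula, ESS = (Σw)² / Σw², which may not be the exact definition used in the original study or in this reanalysis; the function name and the example weights are hypothetical.

```python
import numpy as np

def kish_ess(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    w = np.asarray(weights, dtype=float)
    sum_sq = np.sum(w ** 2)
    if sum_sq == 0.0:
        # All weights are zero: the ratio is 0/0, reported here as NaN.
        return float("nan")
    return float(np.sum(w) ** 2 / sum_sq)

# 100 samples with equal weights: nothing is lost.
equal_weights = np.ones(100)
print(kish_ess(equal_weights))  # 100.0

# 100 samples where one sample carries a weight of 1000 (hypothetical values):
# the ESS collapses to little more than a single sample.
skewed_weights = np.array([1000.0] + [1.0] * 99)
print(kish_ess(skewed_weights))
```

Under this definition, a group in which every sample receives zero weight has an undefined ESS, which is consistent with the NaN entries described in the caption of Figure 4.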

Figure 4. Propensity scores and effective sample sizes for the inverse probability weighting analysis in the original study.

(a) Balance of propensity scores before and after matching for school closure as of April 6, 2020. (b) Balance of propensity scores before and after matching for school closure as of April 10, 2020. (c) All and effective sample sizes and the maximum weight among the samples. An effective sample size of NaN indicates that all samples received zero weights.

Figure 5. Inverse probability weighting with Lasso regularisation.

(a) The average outcome (incidence per capita) of the unmatched treatment (dashed), matched treatment (black) and control (red) groups for closure as of April 6, 2020. (b) Absolute ATC estimates (black line) and 95% confidence intervals (shaded area) for closure as of April 6. (c) Result of 10-fold cross validation. The x-axis represents the logarithm of the regularisation coefficient λ for each model; the number of included variables is also displayed above the panel. The left dotted vertical line denotes the selected model with best cross validation performance and the right dotted line the most parsimonious within the 1 standard error range of the performance from the best model (for reference purpose). (d) Balance of propensity scores before and after matching. (e)–(h) Those for closure as of April 10. (i) All and effective sample sizes and the maximum weight among the samples.

Contrary to the general recommendation (Diamond & Sekhon, 2013), the authors did not include propensity scores in the Mahalanobis distance-based genetic matching used for the main analysis. They cite King & Nielsen (2019) as the reason for not using propensity scores; however, King and Nielsen themselves clarify that their criticism does not apply to genetic matching. This means that the covariates that strongly determined the treatment assignment may not have received large weights (and therefore were not prioritised) in the matching process, which could leave unadjusted bias arising from these potential confounders. For example, many regression coefficients for prefecture dummy variables had large values (~5 or larger) in the Lasso-regularised model, whereas 236 out of 483 matched pairs of municipalities in the original analysis for April 6 were from different prefectures. Robustness to these concerns can be assessed by computing the ESS from another genetic matching that includes propensity scores and a calliper (to ensure that the matched pairs have sufficiently similar features), which I report in the next section.

Reanalysis with genetic matching with propensity scores and a calliper

I reanalysed the original dataset with the genetic matching algorithm incorporating propensity scores and a calliper, and estimated ATCs for school closures as of April 6 and 10, 2020. Propensity scores were estimated by a Lasso-regularised linear regression model and included in genetic matching with a calliper of 0.25 (Rosenbaum & Rubin, 1985). The results remained statistically insignificant and the confidence intervals for the relative effects covered most of the region from -100% to 100%, although the direction of the weak trend for closure as of April 6 was reversed from the original study (Figure 6). The ESS of the matched treatment group was only 7 and 3.8 for April 6 and 10, respectively, indicating that the results relied on only a small set of samples that were repeatedly used in matching. Genetic matching is a generalisation of propensity score and Mahalanobis distance matching that searches for optimal covariate balance and thus should achieve no worse balance than matching using only Mahalanobis distance (Diamond & Sekhon, 2013). The substantial loss of ESS in the updated genetic matching with propensity scores suggests that improved matching required more samples to be discarded and that both the original and current results are likely unreliable.
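
The way repeated reuse of a few treated samples shrinks the ESS can be illustrated with a deliberately simplified sketch. It is not genetic matching: it performs one-to-one nearest-neighbour matching with replacement on a single score, interprets the calliper as 0.25 pooled standard deviations of that score (an assumption), and computes the ESS with the Kish formula from the number of times each treated sample is used (also an assumption). All names and toy values are hypothetical.

```python
import numpy as np

def match_with_caliper(score_control, score_treated, caliper_sd=0.25):
    """Match each control unit to its nearest treated unit, with replacement.

    Pairs whose score distance exceeds caliper_sd pooled standard deviations
    are discarded. Returns the indices of the retained control units and of
    the treated units they were matched to.
    """
    sc = np.asarray(score_control, dtype=float)
    st = np.asarray(score_treated, dtype=float)
    caliper = caliper_sd * np.std(np.concatenate([sc, st]))
    dist = np.abs(sc[:, None] - st[None, :])   # control-by-treated distances
    nearest = dist.argmin(axis=1)              # nearest treated unit per control
    keep = dist[np.arange(len(sc)), nearest] <= caliper
    return np.flatnonzero(keep), nearest[keep]

def ess_from_matches(matched_treated, n_treated):
    """Kish ESS of the treated group, weighting each unit by its reuse count."""
    counts = np.bincount(matched_treated, minlength=n_treated).astype(float)
    sum_sq = np.sum(counts ** 2)
    return float(np.sum(counts) ** 2 / sum_sq) if sum_sq > 0 else float("nan")

# Toy scores with poor overlap: only one treated unit resembles the controls.
score_control = [0.10, 0.11, 0.12, 0.13]
score_treated = [0.12, 0.90, 0.95]
kept_control, matched_treated = match_with_caliper(score_control, score_treated)
print(matched_treated)                                          # [0 0 0 0]
print(ess_from_matches(matched_treated, len(score_treated)))    # 1.0
```

In this toy case four matched pairs are formed, yet they all rely on the same treated unit, so the ESS of the treated group is 1. Tightening the calliper can only discard further pairs, which mirrors the trade-off between match quality and retained information discussed above.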

Figure 6. Re-estimated average treatment effect on the control (ATC) using a genetic matching with propensity scores and a calliper of 0.25.

(a) The average outcome (incidence per capita) of the unmatched treatment (dashed black), matched treatment (solid black) and control (red) groups for closure as of April 6, 2020. (b) Absolute ATC estimates (black line) and 95% confidence intervals (shaded area) for closure as of April 6. (c) Relative ATC estimates and 95% confidence intervals for closure as of April 6. (d)–(f) Those for closure as of April 10.

Conclusion

The reanalysis of Fukumoto et al. suggested that the study was inherently underpowered to identify the presence of causal effects of school closure on COVID-19. While I recognise the importance of their attempt to assess the school closure policy given the collateral effects it imposed on students and their families, I argue that their conclusion of “no causal effect” was not well supported by the data due to the limited statistical power. Finding no mitigating effect would not in itself be surprising, as children were not the centre of the outbreak, especially in the earliest phase (Davies et al., 2020); nonetheless, evidence claiming “no effect” would need to show that effects were at least below the level of practical significance.

Altogether, these limitations represent difficulties in post-hoc causal analysis of mass interventions implemented without a built-in evaluation design such as randomisation. The fact that even the reasonably designed approach of Fukumoto et al. suffers from insufficient power emphasises the importance of the "evidence-generating" philosophy in policy planning, as has been promoted for medicine (Embi & Payne, 2013).

Data availability

Underlying data

This study did not generate original data. The underlying dataset is available from the repository associated with the original study:

Harvard Dataverse. Replication Data for: No causal effect of school closures in Japan on the spread of COVID-19 in spring 2020. DOI: https://doi.org/10.7910/DVN/N803UQ (Fukumoto et al. 2021a).

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver.

Extended data

Replication code along with the full analysis report (Extended data: Supplementary document) is available from a GitHub repository: https://github.com/akira-endo/reanalysis_Fukumoto2021.

An archived version of the above repository at the time of publication is available from: Zenodo. akira-endo/reanalysis_Fukumoto2021: ‘Not finding causal effect’ is not ‘finding no causal effect’ of school closure on COVID-19. DOI: https://doi.org/10.5281/zenodo.6457916 (Endo, 2022).

This project contains the following data:

  • main.html/main.ipynb (Extended data: Supplementary document).

  • Replication codes and data from the original study (Fukumoto et al. 2021a), which are partially modified and reused.

  • Replication codes for the analysis conducted in this study.

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

how to cite this article
Endo A. ‘Not finding causal effect’ is not ‘finding no causal effect’ of school closure on COVID-19 [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2022, 11:456 (https://doi.org/10.12688/f1000research.111915.1)

Open Peer Review

Reviewer Report 04 Jul 2022
Takehiko I. Hayashi, Social Systems Division, National Institute for Environmental Studies, Tsukuba, Japan 
Approved
This article reanalyzed Fukumoto et al. (2021), which concluded that school closures had no causal effect on the spread of COVID-19 in Spring 2020 in Japan. The author first examined the robustness of the conclusion of Fukumoto et al. (2021) ...
Hayashi TI. Reviewer Report For: ‘Not finding causal effect’ is not ‘finding no causal effect’ of school closure on COVID-19 [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2022, 11:456 (https://doi.org/10.5256/f1000research.123641.r136221)
  • Author Response 27 Jun 2024
    Akira Endo, School of Tropical Medicine and Global Health, Nagasaki University, Nagasaki, 852-8521, Japan
    > I thank the reviewers for their constructive feedback. While I regret the extended time it took to revise the manuscript in response to their comments, I believe the revised ...
Reviewer Report 29 Apr 2022
Koichiro Shiba, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA 
Approved with Reservations
This article provides a critical re-assessment of the recent paper that concluded that school closures had no causal effect on the spread of COVID-19 in Spring 2020 (Fukumoto et al., 2021). The author of this article argued that the original ...
Shiba K. Reviewer Report For: ‘Not finding causal effect’ is not ‘finding no causal effect’ of school closure on COVID-19 [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2022, 11:456 (https://doi.org/10.5256/f1000research.123641.r136223)