Dependency‑Light ML Forecasts of ASEAN Smoking Prevalence with Uncertainty & Policy Scenarios (1990‑2035)

Awab Ahmad; Rida Aleem; Mahmood Ahmad

doi:10.12688/f1000research.170647.1

Research Article

[version 1; peer review: 1 approved with reservations]

Awab Ahmad ¹, Rida Aleem¹, Mahmood Ahmad²

PUBLISHED 10 Apr 2026

¹ St George's University of London, London, England, UK
² Royal Free Hospital, Royal Free London NHS Foundation Trust, London, England, UK

Awab Ahmad
Roles: Data Curation, Methodology, Project Administration, Resources, Writing – Original Draft Preparation, Writing – Review & Editing

Rida Aleem
Roles: Data Curation, Project Administration, Resources, Writing – Original Draft Preparation, Writing – Review & Editing

Mahmood Ahmad
Roles: Conceptualization, Data Curation, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Validation, Visualization

This article is included in the Global Public Health gateway.

Abstract

Background

Within the ASEAN region, smoking is one of the leading preventable causes of death, contributing to over 10% of all global smoking-related deaths. Accurate policy modelling and effective forecasting are therefore vital to support strategies that aim to control tobacco regionally.

Methods

Our study created an installation-free, open-source machine learning tool to predict smoking prevalence and simulate the potential effect of a 15% tobacco tax rise across the 11 ASEAN countries. Countries were first clustered by historical prevalence using dynamic time warping k-means (DTW-kM). Forecasts from a stacked long short-term memory network were then compared with the traditional autoregressive integrated moving average (ARIMA) approach, with mean absolute error (MAE) as the primary accuracy metric.

Results

Compared with the traditional ARIMA model, the stacked long short-term memory (LSTM) model forecast more accurately (median MAE 0.32 vs 0.46 percentage points, p < 0.01). If no intervention is made, smoking prevalence across the ASEAN region is projected to fall from 26.3% in 2022 to 24.6% by 2035. Implementing a 15% tax rise in 2026 lowers the projected 2035 prevalence to 22.0%, which averts an estimated 1.3 million disability-adjusted life years (DALYs).

Conclusion

This openly accessible tool therefore supports evidence-based tobacco policy and provides a reproducible, data-driven approach to forecast tobacco use trends and evaluate the effect of policy interventions.

Keywords

Smoking prevalence; ASEAN; Machine learning; Forecasting; Tobacco tax; Public health policy

Corresponding author: Awab Ahmad

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2026 Ahmad A et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Ahmad A, Aleem R and Ahmad M. Dependency‑Light ML Forecasts of ASEAN Smoking Prevalence with Uncertainty & Policy Scenarios (1990‑2035) [version 1; peer review: 1 approved with reservations]. F1000Research 2026, 15:500 (https://doi.org/10.12688/f1000research.170647.1) First published: 10 Apr 2026, 15:500 (https://doi.org/10.12688/f1000research.170647.1) Latest published: 10 Apr 2026, 15:500 (https://doi.org/10.12688/f1000research.170647.1)

Introduction

Smoking is one of the major causes of preventable death in the ASEAN region, accounting for over 10% of all global smoking-related deaths (Dai et al., 2025). Despite the World Health Organization’s (WHO) target to reduce adult smoking prevalence by 30% from 2010 to 2030, there has been limited progress across ASEAN countries (World Health Organization, 2023). Historical smoking trends for the ASEAN countries are shown in Figure 1, and the forecasted aggregate smoking prevalence is shown in Figure 2. Few studies have explored the impact of future policies on this goal, and those that have relied on complex methods that are difficult to reproduce.

Figure 1. Smoking prevalence trends in 11 ASEAN countries, 1990–2022.

Age-standardized smoking prevalence rates for both sexes are shown for all 11 ASEAN countries over 1990–2022. Each coloured line represents a different country, as indicated by the key, and illustrates its historical trend in smoking prevalence.

Figure 2. ASEAN aggregate smoking prevalence: observed vs forecast.

The forecasted age-standardized smoking prevalence rates for both sexes were generated using the stacked LSTM model based on previous trends. The dashed line represents the predicted average prevalence across all ASEAN countries from 2023 to 2035. The solid line shows the observed average smoking prevalence for all 11 ASEAN countries from 1990 to 2022.

Long short-term memory (LSTM) networks are a type of machine learning (ML) model that has shown promise in forecasting public health trends and predicting future behaviours (Hochreiter and Schmidhuber, 1997). These models have performed well in predicting infectious disease outbreaks but, to our knowledge, have not yet been applied to tobacco control.

This study aimed to build an ML tool that can be operated without additional software and that can: group ASEAN countries based on past smoking trends, produce reliable forecasts up to 2035, and simulate the health impact of introducing new policies such as an increased tobacco tax.

Method

Data sources

Open-access datasets from the Global Burden of Disease Study 2021 (GBD 2021) (Dai et al., 2025) were used to obtain age-standardized smoking prevalence rates from 1990–2022 for the 11 ASEAN countries (Institute for Health Metrics and Evaluation, 2024b). Additional variables such as GDP per capita, tobacco tax levels and tobacco control scores were included where available. Smoking-attributable disease burden (DALYs) was also taken from GBD 2021 (Institute for Health Metrics and Evaluation, 2024a).

Pre-processing and clustering

From the data collected, we retained only age-standardised, both-sex prevalence rates with no missing values, and each country’s smoking history was represented as a 33-year time series. The 11 countries were then grouped into three clusters using a time-series clustering method, dynamic time warping k-means (DTW-kM). When the DTW method was unavailable, the system defaulted to a simpler method using Euclidean distance.
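The clustering step can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it uses a textbook DTW distance and, as a simplification, represents each cluster by its medoid rather than by an averaged DTW centroid. The function names and the toy series are hypothetical, and passing `euclidean_distance` instead of `dtw_distance` mimics the fallback described above.

```python
import random

def dtw_distance(a, b):
    """Textbook dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def euclidean_distance(a, b):
    """Fallback distance for equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_series(series, k, distance, n_iter=20, seed=0):
    """Medoid-based clustering (assumption: medoids stand in for DTW centroids)."""
    rng = random.Random(seed)
    names = sorted(series)
    medoids = rng.sample(names, k)
    labels = {}
    for _ in range(n_iter):
        # Assignment step: each series joins its nearest medoid.
        labels = {
            name: min(range(k), key=lambda c: distance(series[name], series[medoids[c]]))
            for name in names
        }
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [name for name in names if labels[name] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members,
                key=lambda cand: sum(distance(series[cand], series[o]) for o in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels

# Toy series (invented, not GBD data): two declining and two flat trajectories.
toy = {
    "A": [30.0 - 0.5 * t for t in range(10)],
    "B": [29.0 - 0.5 * t for t in range(10)],
    "C": [28.0 for _ in range(10)],
    "D": [27.5 for _ in range(10)],
}
labels = cluster_series(toy, k=2, distance=dtw_distance)
print(labels)
```

With real data, `series` would map each country to its 33-year prevalence series and `k` would be 3.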

Forecasting

We built a machine learning model (a stacked LSTM network) that predicts future smoking prevalence from the previous six years of data. Its output was compared with two baselines: ARIMA (a traditional statistical model) and a naive model that assumes no change. Dropout sampling was used to estimate forecast uncertainty, and the tool's built-in validation compared forecast accuracy over the 2018–2022 period.
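The six-year input window, the no-change baseline and the MAE metric can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the series is synthetic, and treating the last five points as the 2018–2022 validation years is an assumption about how the validation split was made.

```python
def make_windows(series, window=6):
    """Turn a yearly series into (previous `window` values, next value) pairs."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]

def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def naive_forecast(history, horizon):
    """No-change baseline: repeat the last observed value."""
    return [history[-1]] * horizon

# Synthetic 33-point series standing in for 1990-2022 (invented, not GBD data).
series = [35.0 - 0.25 * t for t in range(33)]

pairs = make_windows(series, window=6)       # training pairs for a sequence model
train, holdout = series[:-5], series[-5:]    # assumed split: last 5 years held out
baseline = naive_forecast(train, horizon=5)
mae = mean_absolute_error(holdout, baseline)
print(len(pairs), mae)
```

Any forecasting model can be scored the same way by replacing `baseline` with its own five-year forecast, which makes the comparison across LSTM, ARIMA and the naive model like-for-like.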

Policy simulation

Policy changes were simulated using a scenario tool built into the model. For example, a 15% tobacco tax rise in 2026 was modelled to estimate its effect on smoking rates and the related disease burden. DALYs were adjusted using a published elasticity (0.7% DALY reduction per 1% drop in prevalence) (Nazar et al., 2021).
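The DALY adjustment is a linear scaling, which a short worked example makes concrete. The sketch below assumes that the "1% drop in prevalence" is a relative drop (the text does not say whether it is relative or in percentage points); the function name and all input numbers are hypothetical.

```python
def dalys_averted(baseline_prev, scenario_prev, baseline_dalys, elasticity=0.7):
    """Linear DALY adjustment: `elasticity` % fewer DALYs per 1% relative
    drop in prevalence (relative interpretation is an assumption)."""
    relative_drop = (baseline_prev - scenario_prev) / baseline_prev
    return baseline_dalys * elasticity * relative_drop

# Invented inputs: prevalence falls from 25.0% to 22.5% (a 10% relative drop)
# against a placeholder burden of 1,000,000 DALYs.
averted = dalys_averted(baseline_prev=25.0, scenario_prev=22.5, baseline_dalys=1_000_000)
print(round(averted))  # about 70,000, i.e. 0.7 x 10% of the baseline burden
```

Because the relationship is linear, doubling the prevalence drop doubles the DALYs averted; any nonlinearity or cross-country variation would need a different functional form.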

Reproducibility

The entire pipeline runs in under 5 seconds and logs the code version and data hashes for each run, which supports reproducibility and transparency.
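One simple way to record a code version and data hashes uses only the Python standard library. The sketch below is illustrative: the record layout, the function names, the version string and the file name are all hypothetical and do not describe the tool's actual log format.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone

def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest; identical input bytes always give the same digest."""
    return hashlib.sha256(data).hexdigest()

def build_run_record(code_version: str, datasets: dict) -> dict:
    """Collect what is needed to check later that a run used the same code and data."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "code_version": code_version,
        "data_sha256": {
            name: sha256_of_bytes(blob) for name, blob in sorted(datasets.items())
        },
    }

# Placeholder inputs: a made-up version tag and an in-memory stand-in for a data file.
record = build_run_record(
    "v0.0.0-example",
    {"prevalence.csv": b"country,year,prevalence\n"},
)
print(json.dumps(record, indent=2))
```

In practice the bytes would come from reading each input file in binary mode, and the record would be written alongside the forecast outputs.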

Results

The 11 ASEAN countries were categorized into three main groups based on their smoking trends:

Early Convergers – Singapore, Brunei: had already reduced smoking prevalence to below 15% by 2022.

Mid Decliners – Malaysia, Philippines, Thailand: have experienced a steady decline of 1.2 percentage points (pp) per year since 2005.

Late Stagnators – Indonesia, Myanmar, Vietnam: still have high smoking rates of around 28%, with little recent progress.

Forecasting accuracy

The LSTM model outperformed the ARIMA model in 9 out of 11 countries. The median MAE over 2018–2022 was lower for LSTM (0.32 pp, 95% CI: 0.26–0.40) than for ARIMA (0.46 pp, 95% CI: 0.39–0.57), a statistically significant difference (p < 0.01).
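The text does not state how the confidence intervals were obtained. One common option for a small set of per-country errors is a percentile bootstrap, sketched below purely as an assumption; the function name and the eleven MAE values are invented placeholders, not the study's results.

```python
import random
import statistics

def bootstrap_median_ci(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median (assumed method, not the authors')."""
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lower = medians[round((alpha / 2) * n_boot)]
    upper = medians[round((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), lower, upper

# Invented per-country MAE values (percentage points), one per country.
hypothetical_mae = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
med, lower, upper = bootstrap_median_ci(hypothetical_mae)
print(med, lower, upper)
```

With only 11 countries the interval is coarse, because each bootstrap median can only take values from the observed set.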

Future projections

If current trends continue, the average smoking prevalence across the ASEAN countries is projected to decrease modestly from 26.3% in 2022 to 24.6% by 2035. Singapore is the only country projected to achieve the WHO target of a 30% reduction in adult smoking prevalence between 2010 and 2030 (World Health Organization, 2023).
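The WHO target described in the Introduction is a 30% reduction in adult smoking prevalence from 2010 to 2030, so checking a country against it means comparing the projected 2030 value with that country's own 2010 baseline. A minimal sketch, with a hypothetical function name and invented prevalence values:

```python
def meets_who_target(prev_2010, prev_2030, required_relative_reduction=0.30):
    """Return (target met?, achieved relative reduction) against the 2010 baseline."""
    achieved = (prev_2010 - prev_2030) / prev_2010
    return achieved >= required_relative_reduction, achieved

# Invented examples: a 35% relative fall meets the target, a 10% fall does not.
print(meets_who_target(20.0, 13.0))
print(meets_who_target(30.0, 27.0))
```

A country with a high baseline can therefore miss the target despite a large fall in percentage points, while a low-prevalence country needs only a small absolute change to meet it.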

Tax simulation

With a 15% tobacco tax rise introduced in 2026, the projected prevalence in 2035 drops to 22.0%, which could prevent an estimated 1.3 million smoking-related DALYs across the region from a single policy.

Sensitivity checks

Adjusting the model settings, for example adding more country clusters or using longer input windows, had minimal effect on the overall results, which supports the stability of the model.

Discussion

To our knowledge, this lightweight machine learning tool is the first developed to forecast smoking prevalence across ASEAN countries. Because it groups countries by past smoking trends, the study creates opportunities for shared learning and policy collaboration. For example, Malaysia’s trajectory closely mirrors that of the Philippines and Thailand, while Indonesia’s trends resemble those of Myanmar and Vietnam.

The results show that the LSTM model was more accurate than the traditional ARIMA method, which is consistent with findings in infectious disease forecasting. Because it requires no external software installation, the tool is practical in low-resource settings such as government health departments or public health NGOs.

Under current policies and without stronger action, most ASEAN countries are unlikely to meet the WHO target of a 30% reduction in smoking prevalence by 2030. Interventions such as a 15% increase in tobacco taxes could reduce smoking rates further and, in our simulation, avert over 1 million DALYs from smoking-related disease.

Limitations

- The smoking trends were not modelled by age group
- The relationship between prevalence reduction and DALYs was assumed to be linear
- Newly emerging trends such as vaping and e-cigarette use were not included or investigated in the study

Future directions

In the future, more detailed data, such as breakdowns by age and income, could be incorporated. The model could also be extended to cover e-cigarette use and vaping, and more advanced machine learning methods such as Temporal Fusion Transformers could be explored to improve the accuracy and applicability of the study.

Conclusion

We developed a simple machine learning tool that does not require installation and can forecast smoking trends and assess policy impacts across ASEAN. Based on our analysis, most countries are projected to miss the WHO target unless stronger tobacco control measures are implemented. In our simulation, even a modest tax increase could avert over a million DALYs. The overall framework is also adaptable to other public health areas without external software installation.

Software availability

Source code available from: https://github.com/awabahmad469/smoking-analysis

Archived source code at: https://doi.org/10.5281/zenodo.17499927

License: MIT.

This software is based on code originally published by Mahmood Ahmad (DOI: https://doi.org/10.5281/zenodo.17095791) and has been reused and extended with permission under the MIT License.

Ethics and consent

Ethical approval and consent were not required for this study as it used publicly available, deidentified datasets and there was no direct involvement from human participants.

The primary data on smoking prevalence and disease burden were obtained from the Institute for Health Metrics and Evaluation (IHME) Global Burden of Disease Study 2021, accessible at https://ghdx.healthdata.org/record/ihme-data/gbd-2021-asean-smoking-prevalence-burden-1990-2021 (accessed 30 July 2025).

Additional variables and processed datasets generated for analysis, including extrapolations to 2022 and policy simulation inputs, are publicly archived on Zenodo at https://doi.org/10.5281/zenodo.17095791, enabling full reproducibility by readers and reviewers.

Data availability

The dataset used in this study is publicly available from Mahmood Ahmad (2025), “Smoking”, Zenodo, https://doi.org/10.5281/zenodo.17095791.

The data can be accessed by any reader without registration, and reuse is permitted under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, with appropriate attribution to the original author.

Acknowledgments

Portions of the Python code archived on Zenodo were drafted with assistance from ChatGPT (o3-mini), which was used to support code development. All outputs were reviewed, tested, and validated by the authors.

References

Ahmad M: Smoking datasets used in “Forecasting smoking prevalence in ASEAN countries using machine learning”. Zenodo. 2025. (Accessed: 11 September 2025).
Dai X, Ng M, Gil GF, et al.: The epidemiology and burden of smoking in countries of the Association of Southeast Asian Nations (ASEAN), 1990–2021: findings from the Global Burden of Disease Study 2021. Lancet Public Health. 2025; 10(6): e442–e455.
Hochreiter S, Schmidhuber J: Long short-term memory. Neural Comput. 1997; 9(8): 1735–1780.
Institute for Health Metrics and Evaluation: Global Burden of Disease 2021: Findings from the GBD 2021 Study. 2024a; 17. (Accessed: 31 July 2025).
Institute for Health Metrics and Evaluation: Global Burden of Disease (GBD 2021) ASEAN Smoking Prevalence & Burden 1990–2021. Global Health Data Exchange. 2024b. (Accessed: 30 July 2025).
Nazar GP, Sharma N, Chugh A, et al.: Impact of tobacco price and taxation on affordability and consumption of tobacco products in the South-East Asia Region: A systematic review. Tob. Induc. Dis. 2021; 19: 1, 17. (Accessed: 31 July 2025).
World Health Organization: Implementation roadmap 2023–2030 for the Global action plan for the prevention and control of noncommunicable diseases. 2023. (Accessed: 31 July 2025).

Open Peer Review

Reviewer Report 06 Jun 2026

Juan Manuel Martín-Álvarez, Universidad Internacional de La Rioja (UNIR), Logroño, Spain

Approved with Reservations

https://doi.org/10.5256/f1000research.188135.r475280

This article addresses an important public health issue by developing a machine learning framework to forecast smoking prevalence in ASEAN countries and simulate the impact of tobacco taxation policies. The study is timely, relevant, and has clear potential for policy application, particularly given its emphasis on accessibility and reproducibility through an open-source, installation-free tool.
The manuscript is generally well structured and clearly written. The objectives are well defined, and the results are presented in a straightforward manner. The comparison between LSTM and ARIMA models is useful, and the inclusion of policy simulation represents a valuable attempt to move from prediction to decision support.
However, several aspects require further development to ensure scientific robustness.
First, the methodological description is insufficiently detailed. Key elements of the LSTM model—such as architecture, hyperparameters, training procedure, and validation strategy—are not fully specified, which limits reproducibility.
Second, the modeling framework is relatively limited. The comparison is restricted to ARIMA and a naïve baseline, whereas the current literature supports the use of more advanced and hybrid approaches. For example, recent work combining econometric and machine learning models in a counterfactual framework has demonstrated improved robustness, interpretability, and policy relevance (ref. 1).
Third, the policy simulation approach relies on simplified assumptions, particularly the linear relationship between prevalence reduction and DALYs. This assumption may not adequately capture real-world nonlinearities or cross-country heterogeneity, and therefore the policy conclusions should be interpreted with caution.
Fourth, the statistical evaluation could be strengthened. The study relies primarily on MAE, while additional metrics (e.g., RMSE, MAPE) and robustness checks would provide a more comprehensive assessment of model performance.
Fifth, although the discussion is generally appropriate, some conclusions are stated too strongly given the limitations of the modeling approach. In particular, the manuscript does not clearly distinguish between predictive simulation and causal inference, which may lead to overinterpretation of policy effects.
Finally, the manuscript would benefit from stronger engagement with recent literature on counterfactual modeling and hybrid forecasting approaches in public health, which provide more robust frameworks for evaluating policy impacts.
In conclusion, this is a promising and relevant contribution with practical value, but it requires major revisions to improve methodological transparency, strengthen the analytical framework, and moderate the interpretation of results.

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Partly

References

1. Martín-Álvarez J, Galiano A, Hoz B, Lyalkov S, et al.: The long-term impact of Spain's 2010 Anti-Smoking Law: A counterfactual and prospective time-series analysis. AIMS Public Health. 2026; 13 (1): 178-203 Publisher Full Text

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Public health analytics, time-series econometrics, machine learning applications in health, and policy evaluation. I am particularly familiar with forecasting methods, counterfactual analysis, and the assessment of public health interventions using statistical and hybrid ML approaches.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
