Research Article

Dependency‑Light ML Forecasts of ASEAN Smoking Prevalence with Uncertainty & Policy Scenarios (1990‑2035)

[version 1; peer review: 1 approved with reservations]
PUBLISHED 10 Apr 2026

This article is included in the Global Public Health gateway.

Abstract

Background

Within the ASEAN region, smoking is one of the leading preventable causes of death, contributing to over 10% of all global smoking-related deaths. Accurate policy modelling and effective forecasting are therefore vital to support strategies that aim to control tobacco regionally.

Methods

We built an installation-free, open-source machine learning tool to predict smoking prevalence and simulate the potential effect of a 15% tobacco tax rise across the 11 ASEAN countries. Countries were first clustered by historical prevalence using dynamic time warping k-means (DTW-kM), and a stacked long short-term memory network produced the forecasts. Model performance was compared with the traditional autoregressive integrated moving average (ARIMA) approach, using mean absolute error (MAE) as the primary accuracy metric.

Results

Compared with traditional ARIMA, the stacked long short-term memory (LSTM) model forecast more accurately (median MAE 0.32 vs 0.46 percentage points, p < 0.01). Without intervention, smoking prevalence across the ASEAN region is projected to fall from 26.3% in 2022 to 24.6% by 2035. A 15% tax rise in 2026 lowers the projected 2035 prevalence to 22.0%, averting 1.3 million disability-adjusted life years (DALYs).

Conclusion

This openly accessible tool therefore supports evidence-based tobacco policy and provides a reproducible, data-driven approach to forecast tobacco use trends and evaluate the effect of policy interventions.

Keywords

Smoking prevalence; ASEAN; Machine learning; Forecasting; Tobacco tax; Public health policy

Introduction

Smoking is one of the major causes of preventable death in the ASEAN region, accounting for over 10% of all global smoking-related deaths (Dai et al., 2025). Despite the World Health Organization’s (WHO) target to reduce adult smoking prevalence by 30% from 2010 to 2030, there has been limited progress across ASEAN countries (World Health Organization, 2023). Historical smoking trends for the ASEAN countries are shown in Figure 1, and the observed and forecast aggregate smoking prevalence in Figure 2. Few studies have explored the impact of future policies on this goal, and those that have relied on complex methods that are difficult to reproduce.

Figure 1. Smoking prevalence trends in 11 ASEAN countries, 1990–2022.

The age-standardized smoking prevalence rates for both sexes are shown for all 11 ASEAN countries over the period 1990–2022. Each coloured line represents a different country, as indicated by the key, and illustrates historical trends in smoking prevalence.

Figure 2. ASEAN aggregate smoking prevalence: observed vs forecast.

The forecast age-standardized smoking prevalence rates for both sexes were generated using the stacked LSTM model based on previous trends. The solid line shows the observed average smoking prevalence for all 11 ASEAN countries from 1990 to 2022. The dashed lines show the predicted average prevalence across all ASEAN countries from 2023 to 2035.

Long short-term memory (LSTM) networks (Hochreiter and Schmidhuber, 1997) are a type of machine learning (ML) model that has shown promise in forecasting public health trends and predicting future behaviours. These models have performed well in predicting infectious disease outbreaks but have not been used in tobacco control.

This study aimed to build an ML tool that runs without additional software and can: (1) group ASEAN countries based on past smoking trends, (2) produce reliable forecasts up to 2035, and (3) simulate the health impact of new policies such as an increased tobacco tax.

Method

Data sources

Open-access datasets from the Global Burden of Disease Study 2021 (GBD 2021) (Dai et al., 2025) provided age-standardized smoking prevalence rates from 1990–2022 for 11 ASEAN countries (Institute for Health Metrics and Evaluation, 2024b). Additional variables such as GDP per capita, tobacco tax levels and tobacco control scores were included where available. Smoking-attributable disease burden (DALYs) was also taken from GBD 2021 (Institute for Health Metrics and Evaluation, 2024a).

Pre-processing and clustering

From the data collected, we retained only age-standardized, both-sex prevalence rates with no missing values, and represented each country’s smoking history as a 33-year time series. The 11 countries were then grouped into three clusters using dynamic time warping k-means (DTW-kM), a time-series clustering method. When DTW was unavailable, the system defaulted to a simpler method using Euclidean distance.
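The clustering step can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names and the toy series are hypothetical, and k-medoids is swapped in for k-means because averaging series under DTW needs an extra barycenter step. Passing `euclidean_distance` as `dist` mimics the Euclidean fallback described above.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference step cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def euclidean_distance(a, b):
    """Fallback distance for equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_by_medoids(series, k, dist=dtw_distance, max_iter=20):
    """Group named series into k clusters around medoids (assumed variant, not DTW-kM itself)."""
    names = sorted(series)
    d = {(p, q): dist(series[p], series[q]) for p in names for q in names}
    # Farthest-first initialisation: deterministic and spreads the medoids out.
    medoids = [names[0]]
    while len(medoids) < k:
        candidates = [n for n in names if n not in medoids]
        medoids.append(max(candidates, key=lambda n: min(d[(n, m)] for m in medoids)))
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for p in names:
            clusters[min(medoids, key=lambda m: d[(p, m)])].append(p)
        # New medoid = member with the smallest total distance to its cluster.
        new_medoids = [min(members, key=lambda c: sum(d[(c, q)] for q in members))
                       for members in clusters.values() if members]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [members for members in clusters.values() if members]

# Hypothetical prevalence series (percent), not data from the study.
toy = {
    "high1": [30, 30, 29, 29], "high2": [31, 30, 30, 29],
    "mid1": [22, 21, 20, 19], "mid2": [23, 22, 21, 20],
    "low1": [12, 11, 10, 9], "low2": [13, 12, 11, 10],
}
groups = cluster_by_medoids(toy, k=3)
fallback_groups = cluster_by_medoids(toy, k=3, dist=euclidean_distance)
```

On well-separated toy series like these, both distances recover the same three groups; on real trajectories that differ mainly in timing, DTW and Euclidean distance can disagree.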

Forecasting

We built a machine learning model (a stacked LSTM network) that predicted future smoking trends from the past six years of data. Its output was compared with two baselines: ARIMA (a traditional statistical model) and a naive model that assumed no change. Dropout sampling was used to estimate uncertainty, and built-in validation compared forecast accuracy over 2018–2022.
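The evaluation setup described here (six-year input windows, a no-change baseline, MAE on the final five years) can be sketched without any ML framework. This is an illustration under assumptions: the helper names and the linear toy series are hypothetical, and the LSTM itself is omitted.

```python
def make_windows(series, lookback=6):
    """Turn a series into (input window, next value) training pairs."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]

def naive_forecast(history, horizon):
    """No-change baseline: repeat the last observed value."""
    return [history[-1]] * horizon

def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the units of the series (here pp)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical 33-point series standing in for 1990-2022 (not study data).
series = [30.0 - 0.2 * t for t in range(33)]
train, test = series[:-5], series[-5:]   # hold out the last five years

pairs = make_windows(train, lookback=6)  # pairs a sequence model would train on
baseline_mae = mean_absolute_error(test, naive_forecast(train, len(test)))
```

Any candidate model can then be scored with `mean_absolute_error` on the same held-out years and compared against `baseline_mae`.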

Policy simulation

Policy changes were simulated with a scenario tool built into the model. For example, a 15% tobacco tax rise in 2026 was modelled to estimate its effect on smoking rates and the related disease burden. DALYs were adjusted using a published elasticity (0.7% DALY reduction per 1% drop in prevalence) (Nazar et al., 2021).
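The DALY adjustment can be written out as a short calculation. This sketch assumes the "1% drop in prevalence" is a relative drop, which the text does not specify. The function names and the baseline DALY total are hypothetical placeholders, so the output is not meant to reproduce the study's figures.

```python
def relative_prevalence_drop(baseline_prev, scenario_prev):
    """Relative fall in prevalence between baseline and policy scenario."""
    return (baseline_prev - scenario_prev) / baseline_prev

def dalys_averted(baseline_dalys, baseline_prev, scenario_prev, elasticity=0.7):
    """Linear adjustment: `elasticity` % fewer DALYs per 1% relative prevalence drop."""
    drop = relative_prevalence_drop(baseline_prev, scenario_prev)
    return baseline_dalys * elasticity * drop

# Prevalence values are the 2035 projections reported in the article;
# the DALY total of 1,000,000 is a made-up placeholder.
drop = relative_prevalence_drop(24.6, 22.0)
averted = dalys_averted(1_000_000, 24.6, 22.0)
```

Under these assumptions, a fall from 24.6% to 22.0% is a relative drop of about 10.6%, which scales a baseline burden down by about 7.4%.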

Reproducibility

The entire pipeline runs in under 5 seconds and logs the code version and data hashes, which supports reproducibility and transparency.
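One way to log a code version alongside data hashes is sketched below using only the Python standard library. The record layout, function names and version string are assumptions for illustration, not the format used by the published tool.

```python
import hashlib
import json
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_run_record(code_version, data_paths):
    """Collect the code version and one SHA-256 digest per input file."""
    return {
        "code_version": code_version,
        "data_sha256": {os.path.basename(p): sha256_of_file(p) for p in sorted(data_paths)},
    }

# Demonstration with a throwaway file; the version string is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as fh:
        fh.write(b"abc")
    record = build_run_record("v0.0.0-example", [demo_path])

record_json = json.dumps(record, indent=2, sort_keys=True)
```

Writing `record_json` next to each set of outputs lets a reader confirm that a later run used the same code and byte-identical inputs.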

Results

The 11 ASEAN countries were categorized into three main groups based on their smoking trends:

Early Convergers – Singapore, Brunei: smoking prevalence had already fallen below 15% by 2022.

Mid Decliners – Malaysia, Philippines, Thailand: prevalence has declined steadily by 1.2 percentage points (pp) per year since 2005.

Late Stagnators – Indonesia, Myanmar, Vietnam: smoking rates remain high at around 28%, with little recent progress.

Forecasting accuracy

The LSTM model outperformed the ARIMA model in 9 out of 11 countries. Over the 2018 to 2022 validation period, the median MAE was lower for LSTM (0.32 pp, 95% CI: 0.26–0.40) than for ARIMA (0.46 pp, 95% CI: 0.39–0.57), a statistically significant difference (p < 0.01).

Future projections

If current trends continue, the average smoking rate in the ASEAN countries will decrease modestly from 26.3% in 2022 to 24.6% by 2035. Singapore is the only country projected to achieve the WHO target of a 30% reduction in prevalence by 2030 (World Health Organization, 2023).
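Checking a projection against the WHO goal reduces to a relative-reduction calculation. The sketch below assumes the target means a 30% relative fall from the 2010 level by 2030, as stated in the Introduction. The function names are hypothetical, and the country-level inputs in the usage lines are invented for illustration.

```python
def relative_reduction(baseline, projected):
    """Fractional fall in prevalence relative to the baseline year."""
    return (baseline - projected) / baseline

def meets_who_target(prev_2010, prev_2030, required_reduction=0.30):
    """True if the projected 2030 prevalence is at least 30% below the 2010 level."""
    return relative_reduction(prev_2010, prev_2030) >= required_reduction

# Regional figures reported in the article (2022 observed, 2035 projected).
regional_change = relative_reduction(26.3, 24.6)

# Invented country-level values, for illustration only.
on_track = meets_who_target(prev_2010=20.0, prev_2030=13.0)
off_track = meets_who_target(prev_2010=20.0, prev_2030=15.0)
```

The regional fall from 26.3% to 24.6% is a relative reduction of about 6.5% over 2022 to 2035. That covers a different window from the 2010 to 2030 target period, so it is shown only for scale.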

Tax simulation

With a 15% tobacco tax rise introduced in 2026, the expected prevalence in 2035 drops to 22.0%, which could prevent 1.3 million smoking-related DALYs across the region from a single policy.

Sensitivity checks

Adjusting the model settings (for example, adding more country clusters or using longer input windows) had minimal effect on the overall results, which supports the stability of the model.

Discussion

To our knowledge, this lightweight machine learning tool is the first developed to forecast smoking prevalence across ASEAN countries. By grouping countries according to past smoking trends, the study creates opportunities for shared learning and policy collaboration. For example, Malaysia’s trajectory closely mirrors that of the Philippines and Thailand, while Indonesia’s trends resemble those of Myanmar and Vietnam.

The results show that LSTM models were more accurate than traditional ARIMA methods, which is consistent with findings in infectious disease forecasting. Because it requires no external software installation, the LSTM model is practical in low-resource settings such as government health departments or public health NGOs.

Without stronger action, most ASEAN countries are projected to miss the WHO target of a 30% reduction in smoking prevalence by 2030. Interventions such as a 15% increase in tobacco taxes could significantly reduce smoking rates and avert over 1 million DALYs from smoking-related disease.

Limitations

  • The smoking trends were not modelled by age group

  • The relationship between prevalence reduction and DALYs was assumed to be linear

  • Newly emerging trends such as vaping and e-cigarette use were not included or investigated in the study

Future directions

In the future, more detailed data such as age and income could be incorporated. The model could also be expanded to cover e-cigarette use and vaping, and advanced machine learning methods such as Temporal Fusion Transformers could be explored to improve the accuracy and applicability of the study.

Conclusion

We developed a simple machine learning tool that requires no installation and can predict smoking trends and assess policy impacts across ASEAN. Based on our analysis, most countries will not achieve the WHO target unless stronger tobacco control measures are implemented. Even a modest tax increase could avert over a million DALYs. The overall framework is also adaptable to other public health areas.

Software availability

Source code available from: https://github.com/awabahmad469/smoking-analysis

Archived source code at: https://doi.org/10.5281/zenodo.17499927

License: MIT.

This software is based on code originally published by Mahmood Ahmad (DOI: https://doi.org/10.5281/zenodo.17095791) and has been reused and extended with permission under the MIT License.

Ethics and consent

Ethical approval and consent were not required for this study as it used publicly available, deidentified datasets and there was no direct involvement from human participants.

The primary data on smoking prevalence and disease burden were obtained from the Institute for Health Metrics and Evaluation (IHME) Global Burden of Disease Study 2021, accessible at https://ghdx.healthdata.org/record/ihme-data/gbd-2021-asean-smoking-prevalence-burden-1990-2021 (accessed 30 July 2025).

Additional variables and processed datasets generated for analysis, including extrapolations to 2022 and policy simulation inputs, are publicly archived on Zenodo at https://doi.org/10.5281/zenodo.17095791, enabling full reproducibility by readers and reviewers.

how to cite this article
Ahmad A, Aleem R and Ahmad M. Dependency‑Light ML Forecasts of ASEAN Smoking Prevalence with Uncertainty & Policy Scenarios (1990‑2035) [version 1; peer review: 1 approved with reservations]. F1000Research 2026, 15:500 (https://doi.org/10.12688/f1000research.170647.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Reviewer Report 06 Jun 2026
Juan Manuel Martín-Álvarez, Universidad Internacional de La Rioja (UNIR), Logroño, Spain 
Approved with Reservations
This article addresses an important public health issue by developing a machine learning framework to forecast smoking prevalence in ASEAN countries and simulate the impact of tobacco taxation policies. The study is timely, relevant, and has clear potential for policy ...
HOW TO CITE THIS REPORT
Martín-Álvarez JM. Reviewer Report For: Dependency‑Light ML Forecasts of ASEAN Smoking Prevalence with Uncertainty & Policy Scenarios (1990‑2035) [version 1; peer review: 1 approved with reservations]. F1000Research 2026, 15:500 (https://doi.org/10.5256/f1000research.188135.r475280)