Research Article
Revised

COVID-19 impact: Customised economic stimulus package recommender system using machine learning techniques

[version 2; peer review: 2 approved, 1 approved with reservations]
PUBLISHED 12 Nov 2021

This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Research Synergy Foundation gateway.

Abstract

Background: The Malaysian government responded to the pandemic’s economic effects with the Prihatin Rakyat Economic Stimulus Package (ESP) to cushion the impact of coronavirus disease 2019 (COVID-19) on households. The ESP consists of cash assistance, utility discounts, a moratorium, Employee Provident Fund (EPF) cash withdrawals, a credit guarantee scheme and wage subsidies. A survey carried out by the Department of Statistics Malaysia (DOSM) shows that households prefer different types of financial assistance. These differing preferences create a need to customise ESPs so that they effectively ease the economic burden on low-income households. In this study, a recommender system for such ESPs was designed by leveraging data analytics and machine learning techniques.

Methods: This study used a dataset from DOSM titled “Effects of COVID-19 on the Economy and Individual - Round 2,” collected from April 10 to April 24, 2020. The Cross-Industry Standard Process for Data Mining (CRISP-DM) was followed to develop machine learning models that classify ESP recipients according to their preferred subsidy types. Four machine learning techniques (Decision Tree, Gradient Boosted Tree, Random Forest and Naïve Bayes) were used to build predictive models for each of three subsidies: moratorium, utility discount, and EPF and Private Remuneration Scheme (PRS) cash withdrawals. The best predictive model was selected based on the F-score.

Results: Among the four machine learning techniques, Gradient Boosted Tree outperformed the rest. This technique predicted the following: moratorium preferences with 93.8% sensitivity, 82.1% precision and 87.6% F-score; utilities discount with 86% sensitivity, 82.1% precision and 84% F-score; and EPF and PRS with 83.6% sensitivity, 81.2% precision and 82.4% F-score. Households that prefer moratorium subsidies did not favour other financial aid except cash assistance.

Conclusion: The findings provide machine learning models that can predict individual households' ESP preferences. These models can be used to design customised ESPs that effectively manage the financial burden of low-income households.

Keywords

COVID-19, low-income households, economic stimulus package, customisation, data analytics, machine learning, Gradient Boosted Tree

Revised Amendments from Version 1

The term "How" was added to the research questions. The methodology section has been enhanced to include different machine learning approaches. A new flowchart (new Figure 2) of the modelling and evaluation process has been introduced; it shows how a supervised machine learning model is constructed and how its parameters are tuned to obtain optimised models. To select the optimal model, k-fold cross-validation was also used, which revealed no differences compared with separating the data into training and testing datasets. The discussion section has been updated to address each research question individually, providing additional detail on the findings and their implications. The trade-off between the prediction accuracy and interpretability of machine learning models is explained. The limitations are explained in detail.

See the authors' detailed response to the review by Setia Pramana
See the authors' detailed response to the review by Sreeja N.K
See the authors' detailed response to the review by Shahrinaz Ismail

Introduction

The coronavirus disease 2019 (COVID-19) pandemic has devastated people’s lives worldwide, both socially and economically (Shah et al., 2020). Governments have therefore adopted various strategies aimed at reducing the pandemic’s impact, particularly the financial strain. The Malaysian government introduced a series of economic stimulus packages to support various segments of its citizens. One such measure is the Prihatin Rakyat Economic Stimulus Package (ESP), which cushions the impact of COVID-19 on low-income households after the first Movement Control Order (MCO) in the country. The ESP consists of cash assistance, utility discounts, a moratorium, Employee Provident Fund and Private Remuneration Scheme (EPF and PRS) cash withdrawals, and a credit guarantee scheme and wage subsidies (Flanders et al., 2020). Following the implementation of the ESP, the Department of Statistics Malaysia (DOSM) carried out a special survey from April 10 to April 24, 2020 to better understand the implications of COVID-19 for the economy and households. The survey included questions on social and economic factors and subsidy preferences.

A typical low-income household often carries considerable debt and has limited savings. When movement control was implemented, households that lost their sources of income struggled to access necessities such as food and housing (Flanders et al., 2020). Although the government offered the ESP to help residents cope financially, citizens’ needs and preferences during a pandemic are not known in advance. For example, some households are reluctant to withdraw from EPF and PRS because doing so reduces their savings for old age. If residents’ requirements and preferences for subsidies such as cash assistance, utility discounts, moratorium or EPF and PRS withdrawals can be predicted, a personalised ESP can be built to reduce their financial burden in this crisis. Using data analytics and machine learning approaches, this study analysed survey data and constructed predictive models for customised economic stimulus packages. The following research questions were put forward.

1. How can households that favour moratorium subsidies be identified?

2. How can households that seek utility discount subsidies be identified?

3. How can households who desire EPF and PRS withdrawal subsidies be identified?

This study contributes to the literature by applying four machine learning techniques to socioeconomic survey data to predict household subsidy preferences. It also compares attribute-selection measures, such as the Gini index and gain ratio, and various partitioning ratios of the training and test datasets. The outcomes of this study can help the government deliver better stimulus packages in the future based on individual preferences.

Methods

For planning and execution, this study used the Cross-Industry Standard Process for Data Mining (CRISP-DM), the industry-independent de facto standard for implementing data mining projects (Schröer et al., 2021). The process has six phases: business understanding, data understanding, data preparation, modelling, evaluation and deployment. Figure 1 depicts the activities carried out in each phase, which are explained below.


Figure 1. Research methods.

Business understanding

This phase frames the problems to be solved from a machine learning perspective. The three research questions were selected as the problems, with the aim of proposing predictive models for the moratorium, utility discount, and EPF and PRS subsidies in the Prihatin Rakyat ESP.

Data understanding

This phase covers gathering, evaluating and characterising the data and assuring its quality. DOSM performed a special survey (Round 2) to investigate the consequences of the COVID-19 pandemic on household economics and status (“Department of Statistics Malaysia Official Portal,” 2020; Malaysia, 2020). The survey comprised 36 questions answered by 41,386 respondents. However, the data obtained from DOSM were incomplete because eight questions were not provided: Q3, Q6, Q19 and Q27–Q31. The respondent records themselves were complete: all 41,386 participants were aged 15 or older, and 96.8% of them had received benefits from the Prihatin Rakyat ESP. The raw data consisted of respondents’ answers, including qualitative personal opinions on the economy, employment, lifestyle and education. The original dataset was in Malay and was translated into English for this study; the questions are listed in Table 1.

Table 1. Survey questions.

1.    Joined and answered Survey Round 1 (Yes, No)
2.    State (KL, Johor, Selangor, Perak, Sarawak, Kedah, Pahang, Putrajaya, Perlis, Kelantan, Melaka, Pulau Pinang, Labuan, Terengganu, Negeri Sembilan, Sabah)
3.    (not included in the dataset)
4.    Gender (Female, Male)
5.    Stay in (City, Rural)
6.    (not included in the dataset)
7.    Age group (15–24, 25–34, 35–44, 45–54, 55–64, 65 and above)
8.    Marital status (Single/Non-married, Married, Single mother, Widow/Widower, Divorced/Separated)
9.    Number of dependents (including respondent) (1–2, 3–4, 5–10, More than 10 people)
10.    Ethnic group (Malay, Indian, Native Sabah/Sarawak, Chinese, Others, Foreigner)
11.    Benefited from the PRIHATIN Economic Stimulus Package (ESP)? (Yes, No)
12.    The most advantageous form of assistance received under the PRIHATIN Economic Stimulus Package
                a.    Cash assistance (National Prihatin assistance, IPT student assistance, E-hailing) (Yes, No)
                b.    Utilities discount (Yes, No)
                c.    Moratorium (Yes, No)
                d.    EPF cash withdrawals & Private remuneration scheme (Yes, No)
                e.    Credit guarantee scheme (Yes, No)
                f.    Wage subsidies and payments under ERP program (Yes, No)
                g.    NA (Yes, No)
13.    Ease of the process/procedure for PRIHATIN assistance (Easy, Difficult, Medium, NA, Others)
14.    Satisfied with the help of PRIHATIN (Yes, No, NA)
15.    Eligibility status as a recipient of PRIHATIN aid (Eligible, New application, Appeal, Not eligible)
16.    Ranking according to your preferences in the current situation [Health & lives] (1, 2, 3)
                a.    Ranking according to your preferences in the current situation [Income (employment/business/enterprise)] (1, 2, 3)
                b.    Ranking according to your preferences in the current situation [Life goes back to normal] (1, 2, 3)
17.    The impact of the PRIHATIN Economic Stimulus Package (Effective, Most effective, Others, NA, No effects)
18.    With the extension of the Movement Control Order (MCO) until 28 April 2020, do you still require assistance/the PRIHATIN Economic Stimulus Package for the next phase? (Yes, No)
19.    (not included in the dataset)
20.    Views on health workers (frontliners) and health facilities provided by the government in dealing with COVID-19 (Good, Bad)
21.    Views on achievements and facilities provided in dealing with COVID-19 in Malaysia compared with other countries (Good, Bad)
22.    COVID-19 outbreak affects lifestyle (Yes, No)
23.    Ready to have a new normal lifestyle (i.e. a new normal life that will differ from normal life before the spread of the COVID-19 outbreak) (Yes, No)
24.    If ready, lifestyle changes to be made
                a.    Number of lifestyle changes to be made (Yes, No)
                b.    Will not eat out (Yes, No)
                c.    Limiting social activities (Yes, No)
                d.    Limiting sports and recreational activities (Yes, No)
                e.    Limited religious and spiritual activity at home (Yes, No)
                f.    Limiting tourism activities (Yes, No)
                g.    Improving hygiene (Yes, No)
                h.    Others (Yes, No)
                i.    NA (Yes, No)
25.    Working during the MCO period (Work from home, Not working, Rotation - partial payment, Full paid leave, Rotation - full payment, Semi-salary leave, Retrenched)
26.    Work as (Government servants, Private sector employees, Not working, Self-employed, GLC employees, MNC employees, Employers, Unpaid family workers)
27–31.    (not included in the dataset)
32.    Main expenses on food products during MCO (Yes, No)
                a.    Number of major food products expenditure during MCO
                b.    Dry food items (e.g. bihun, biscuits, nestum, etc.)
                c.    Cooked food (takeaway/delivery)
                d.    Instant noodles
                e.    Eggs
                f.    Fish/chicken/meat/seafood for cooking
                g.    Vegetables
                h.    Fruits
                i.    Frozen food (e.g. sausages, nuggets, fish balls, french fries, etc.)
                j.    Rice
                k.    Spaghetti
                l.    Bread
                m.    Baby food
                n.    Animal food
                o.    Cooking oil
                p.    Essential items for cooking (e.g. shallots, garlic, sugar, salt, etc.)
                q.    Canned beverages (e.g. milo, condensed milk, powdered milk, etc.)
                r.    Flour (including wheat flour, rice flour, glutinous flour, etc.)
                s.    3-in-1 drinks
                t.    Vitamins/Supplements
                u.    Drinking water
                v.    Others
33.    Main expenses on non-food products during MCO (Yes, No)
                a.    Number of main expenses on non-food during MCO
                b.    Hand wash
                c.    Tissue paper
                d.    Disinfectant
                e.    Face mask
                f.    Baby diaper
                g.    Wet tissue
                h.    Sanitary pad
                i.    Laundry soap
                j.    Toiletries
                k.    Thermometer
                l.    Medications (e.g. fever medicine, flu medicine, cough medicine, etc.)
                m.    First aid kit
                n.    Gloves (various sizes)
                o.    Others
34.    Internet access level for yourself/your child's online learning (NA, Slow, Fast, Average, No Internet access)
35.    Average hours per day of your own/your child's online learning during the MCO period (NA, <1, 1–3, 3–5, 5–8, >8)
36.    Accessibility of faster Internet speed during online learning (NA, Morning, Night, Afternoon)

Data preparation

After the missing questions were eliminated, 28 questions were available for analysis. Questions 32 and 33 were then excluded from the analysis because they concerned the main food and non-food products purchased during the Movement Control Order. Because the dataset contained many missing values and errors, considerable effort was spent on cleaning it before descriptive analytics techniques were applied. Q34, Q35 and Q36 had 2,071, 2,243 and 2,310 missing values, respectively, which were replaced by the most frequent value of each question. Cleaning transformed the raw data into structured data. Lengthy responses were shortened to concise labels without losing information; for example, the dwelling state "Wilayah Persekutuan Kuala Lumpur" was converted to "KL." Items that were answered were labelled "Yes," and items left unanswered were labelled "No." For example, one item asked whether respondents would stop eating out as part of their new-normal lifestyle adjustments: those who agreed selected "will not eat out," while those who did not intend this change left the item unanswered. Such items were therefore converted to a "Yes"/"No" format.
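The cleaning steps described above (mode imputation for Q34 to Q36, shortening long labels, and Yes/No coding of unanswered items) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the KNIME workflow the study used: missing answers are assumed to be `None` or empty strings, and the names `impute_mode`, `SHORT_LABELS`, `shorten` and `encode_option` are hypothetical.

```python
from collections import Counter

# Hypothetical lookup for shortening long answers; only the KL example comes from the text.
SHORT_LABELS = {"Wilayah Persekutuan Kuala Lumpur": "KL"}


def impute_mode(values):
    """Replace missing answers (None) with the most frequent observed answer."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def shorten(answer):
    """Map a lengthy answer to its short label, leaving other answers unchanged."""
    return SHORT_LABELS.get(answer, answer)


def encode_option(answer):
    """Code an optional item: any non-empty answer becomes 'Yes', a blank becomes 'No'."""
    return "Yes" if answer not in (None, "") else "No"


if __name__ == "__main__":
    # Toy Q34-style answers (invented for illustration), with two missing values.
    q34 = ["Fast", None, "Slow", "Fast", None]
    print(impute_mode(q34))  # ['Fast', 'Fast', 'Slow', 'Fast', 'Fast']
    print(shorten("Wilayah Persekutuan Kuala Lumpur"))  # KL
    print(encode_option("Will not eat out"), encode_option(""))  # Yes No
```

Mode imputation keeps each question's value set unchanged, which suits categorical answers such as "Fast" or "Slow" where a mean would be meaningless.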

Modelling

This phase applied a variety of machine learning techniques to achieve the study's goal of developing classification models. Classifiers can be built using different approaches, such as information-based, similarity-based, error-based and probability-based learning (Kelleher et al., 2015). This research emphasised information-based learning (the decision tree family of algorithms) because these models best explain their decision logic. Four machine learning techniques were used to create the prediction models: Decision Tree, Random Forest, Gradient Boosted Tree and Naïve Bayes. These techniques were chosen based on a literature review (Mostafa et al., 2021; Sangavi et al., 2020), and the optimal model for each was determined by tuning its parameters. Attribute-selection measures (Gini index, gain ratio and information gain) and various training/test partitioning ratios were also compared (Trivedi, 2020). Figure 2 depicts the parameter tuning carried out in the modelling phase.
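To make the attribute-selection measures concrete, the sketch below computes the textbook Gini index, information gain and gain ratio for one categorical predictor against a Yes/No target. It follows the standard definitions and is not KNIME's internal implementation; the function names and the toy records are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(values, labels):
    """Group target labels by the predictor value of each record."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def gini_reduction(values, labels):
    """Parent Gini impurity minus the size-weighted Gini impurity of the child groups."""
    n = len(labels)
    children = sum(len(g) / n * gini(g) for g in partition(values, labels))
    return gini(labels) - children


def gain_ratio(values, labels):
    """Information gain divided by split information (penalises many-valued predictors)."""
    n = len(labels)
    groups = partition(values, labels)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Toy records (invented): area of residence vs. moratorium preference.
    area = ["City", "City", "City", "Rural", "Rural", "Rural"]
    moratorium = ["Yes", "Yes", "No", "No", "No", "Yes"]
    print(round(gini_reduction(area, moratorium), 3))  # 0.056
    print(round(gain_ratio(area, moratorium), 3))  # 0.082
```

A decision tree learner evaluates such a score for every candidate predictor at each node and splits on the best one; the choice of measure is one of the parameters varied in Tables 5 and 6.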


Figure 2. Flowchart of the modelling and evaluation process.

Evaluation

In this phase, the best predictive model for each subsidy was selected based on the standard performance evaluation metrics: Sensitivity, Precision, F-Score and Accuracy (Moscato et al., 2021). The formulas used to calculate each of the metrics are given below.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative

Sensitivity, also known as recall or true positive rate (TPR), measures the completeness of a prediction model: the proportion of actual positive cases that the model correctly predicts as positive (Moscato et al., 2021). The formula is given below.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

Precision measures the exactness of a model's positive predictions: the proportion of cases predicted as positive that are actually positive. In other words, precision is the number of true positives divided by the sum of true positives and false positives.

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

F-score, also known as the F1 score, is the harmonic mean of precision and sensitivity, so it balances the two. This study therefore used the F-score to evaluate and select the machine learning models.

$$\mathrm{F\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$$
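The four metrics defined above can be computed from confusion-matrix counts as in the minimal sketch below. The function names are illustrative, and the only study figures used are the precision and sensitivity of the Gradient Boosted Tree moratorium model (60:40 split, "Yes" class), which reproduce its reported F-score of 0.876; the confusion-matrix counts in the example are invented.

```python
def f_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (F1); 0 when both are 0."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall), precision and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": f_score(precision, sensitivity),
    }


if __name__ == "__main__":
    # Reported precision 0.821 and sensitivity 0.938 for the moratorium "Yes" class.
    print(round(f_score(0.821, 0.938), 3))  # 0.876
    # Invented counts, for illustration only.
    print(classification_metrics(tp=8, tn=5, fp=2, fn=1))
```

Because the F-score is a harmonic mean, it stays low unless both precision and sensitivity are high, which is why it is a stricter selection criterion than accuracy when one class dominates.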

Deployment

In the final phase, a deployment strategy for the models was created and documented, and the best predictive model for each subsidy was recommended for deployment. All CRISP-DM phases were carried out using the Konstanz Information Miner (KNIME 4.3.2), a free and open-source data analytics platform.

Ethical approval

This study was carried out from November 23, 2020 to October 6, 2021 and obtained ethics approval (EA1322021) from the Technology Transfer Office, secretariat of the research ethics committee, Multimedia University.

Results

The results are organised into descriptive analytics, model optimisation and findings. Descriptive analytics helps to characterise the respondents and the relationships between variables. Table 2 provides descriptive information on the respondents.

Table 2. Descriptive statistics of respondent demographics.

State: Selangor 27.74%, Johor 12.35%, Sabah 8.41%, KL 7.84%, Perak 6.29%, Kedah 4.96%, Pahang 4.87%, Melaka 4.83%, Sarawak 4.52%, Negeri Sembilan 4.38%, Kelantan 3.30%, Pulau Pinang 3.23%, Terengganu 3.02%, Putrajaya 2.94%, Perlis 0.83%, Labuan 0.50%
Age group: 35–44 38.87%, 25–34 29.19%, 45–54 19.38%, 15–24 6.6%, 55–64 5.46%, 65 and above 0.50%
Marital status: Married 70.95%, Single/Non-married 24.35%, Divorced/Separated 1.98%, Single mother 1.42%, Widow/Widower 1.28%
Location: City 70.71%, Rural 29.28%
Gender: Female 55.7%, Male 44.3%
Number of dependents (including self): 3–4 36.21%, 5–10 35.2%, 1–2 28.1%, More than 10 people 0.47%
Ethnic group: Malay 79.61%, Native Sabah/Sarawak 10.29%, Chinese 7.18%, Indian 1.75%, Others 1.07%, Foreigner 0.1%
Job status: Government servants 39.22%, Private sector employees 27.14%, Not working 21.61%, GLC employees 4.35%, Self-employed 3.05%, Employers 2.18%, MNC employees 2.12%, Unpaid family workers 0.32%

Figure 3 shows the types of subsidies offered in the ESP and the forms of support respondents found most beneficial. Among the 41,386 respondents, 72.2% were eligible to receive subsidies, 21.9% had newly applied, 3.2% were not eligible and 2.7% had appealed. The most popular type of subsidy was cash assistance, followed by moratorium, utility discounts, and EPF and PRS cash withdrawals. The least preferred types were the credit guarantee scheme and wage subsidies.


Figure 3. Most beneficial forms of assistance received under the Prihatin Rakyat ESP.

Following the descriptive analytics, the four machine learning techniques were applied to develop prediction models for each of the moratorium, utility discount, and EPF and PRS withdrawal subsidies. Decision Tree, Gradient Boosted Tree, Random Forest and Naïve Bayes were subjected to parameter tuning to determine the best model and parameter values.

Tables 3 to 6 show how the optimal model was obtained for each machine learning technique, with moratorium as the target variable. The partitioning ratio gives the proportions of data used for training and testing. Gain ratio, Gini index and information gain were used to measure the quality of each predictor in classifying the target variable. Gradient Boosted Tree and Naïve Bayes produced their best models when 60% of the data were used for training and the remaining 40% for testing, whereas Random Forest and Decision Tree produced their best models with 80% training data and 20% test data. F-score was used as the evaluation measure to select the optimal model for each technique, and the best of these four optimal models was then selected; Table 7 shows the results. Gradient Boosted Tree outperformed the other techniques in predicting moratorium preference, with 93.8% sensitivity, 82.1% precision and 87.6% F-score. When the models were instead evaluated using k-fold cross-validation with k = 5 and k = 10, classifier performance differed little and Gradient Boosted Tree still performed best. k-fold cross-validation tends to yield a better model when the dataset is small but makes little difference when the dataset is large, as it is here (41,386 respondents); a recent study supports this conclusion (Marcot & Hanea, 2021).
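The two evaluation schemes mentioned above, a single train/test partition at a given ratio and k-fold cross-validation, differ only in how record indices are split. The sketch below shows both in plain Python, assuming simple random (unstratified) sampling with a fixed seed; KNIME's partitioning nodes may sample differently (for example, with stratification), and the function names are hypothetical.

```python
import random


def holdout_split(n_records, train_fraction, seed=0):
    """Shuffle record indices and cut them into training and test sets (e.g. 0.6 for 60:40)."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = round(n_records * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_records, k, seed=0):
    """Yield (train, test) index lists; each record appears in exactly one test fold."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    train, test = holdout_split(41386, 0.6)  # 60:40 partition of all respondents
    print(len(train), len(test))  # 24832 16554
    for k in (5, 10):
        fold_sizes = [len(t) for _, t in k_fold_splits(41386, k)]
        print(k, fold_sizes)
```

With over 40,000 records, every fold's test set still holds thousands of respondents, which is consistent with the observation that cross-validation changed the results little compared with a single partition.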

Table 3. Gradient Boosted Tree - Parameter tuning and identification of optimal model.

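Technique: Gradient Boosted Tree. Target variable: Moratorium.
Partitioning ratio | Class | Precision | Sensitivity | F-score | Accuracy (overall)
50:50 | No | 0.877 | 0.71 | 0.784 | 0.838
50:50 | Yes | 0.819 | 0.929 | 0.871 | 0.838
60:40 | No | 0.891 | 0.711 | 0.791 | 0.844
60:40 | Yes | 0.821 | 0.938 | 0.876 | 0.844
70:30 | No | 0.883 | 0.717 | 0.791 | 0.843
70:30 | Yes | 0.823 | 0.933 | 0.874 | 0.843
80:20 | No | 0.881 | 0.701 | 0.781 | 0.837
80:20 | Yes | 0.815 | 0.933 | 0.87 | 0.837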

Table 4. Naïve Bayes - Parameter tuning and identification of optimal model.

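Technique: Naïve Bayes. Target variable: Moratorium.
Partitioning ratio | Class | Precision | Sensitivity | F-score | Accuracy (overall)
50:50 | No | 0.627 | 0.449 | 0.524 | 0.661
50:50 | Yes | 0.675 | 0.811 | 0.737 | 0.661
60:40 | No | 0.64 | 0.453 | 0.53 | 0.667
60:40 | Yes | 0.679 | 0.819 | 0.742 | 0.667
70:30 | No | 0.633 | 0.449 | 0.526 | 0.663
70:30 | Yes | 0.676 | 0.815 | 0.739 | 0.663
80:20 | No | 0.624 | 0.443 | 0.518 | 0.658
80:20 | Yes | 0.673 | 0.811 | 0.735 | 0.658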

Table 5. Decision Tree - Parameter tuning and identification of optimal model.

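Technique: Decision Tree. Target variable: Moratorium.
Partitioning ratio (quality measure) | Class | Precision | Sensitivity | F-score | Accuracy (overall)
50:50 (Gini index) | No | 0.776 | 0.705 | 0.739 | 0.793
50:50 (Gini index) | Yes | 0.804 | 0.856 | 0.829 | 0.793
50:50 (Gain ratio) | No | 0.745 | 0.685 | 0.731 | 0.772
50:50 (Gain ratio) | Yes | 0.789 | 0.834 | 0.811 | 0.772
60:40 (Gini index) | No | 0.772 | 0.697 | 0.733 | 0.789
60:40 (Gini index) | Yes | 0.799 | 0.855 | 0.826 | 0.789
60:40 (Gain ratio) | No | 0.746 | 0.669 | 0.705 | 0.768
60:40 (Gain ratio) | Yes | 0.781 | 0.839 | 0.809 | 0.768
70:30 (Gini index) | No | 0.766 | 0.7 | 0.731 | 0.787
70:30 (Gini index) | Yes | 0.8 | 0.848 | 0.823 | 0.787
70:30 (Gain ratio) | No | 0.75 | 0.685 | 0.716 | 0.775
70:30 (Gain ratio) | Yes | 0.79 | 0.838 | 0.813 | 0.775
80:20 (Gini index) | No | 0.779 | 0.727 | 0.752 | 0.802
80:20 (Gini index) | Yes | 0.816 | 0.854 | 0.834 | 0.802
80:20 (Gain ratio) | No | 0.743 | 0.697 | 0.719 | 0.774
80:20 (Gain ratio) | Yes | 0.794 | 0.83 | 0.812 | 0.774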

Table 6. Random Forest - Parameter tuning and identification of optimal model.

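Technique: Random Forest. Target variable: Moratorium.
Partitioning ratio (split criterion) | Class | Precision | Sensitivity | F-score | Accuracy (overall)
50:50 (Information gain) | No | 0.809 | 0.766 | 0.787 | 0.828
50:50 (Information gain) | Yes | 0.84 | 0.872 | 0.856 | 0.828
50:50 (Gain ratio) | No | 0.805 | 0.744 | 0.773 | 0.819
50:50 (Gain ratio) | Yes | 0.828 | 0.872 | 0.849 | 0.819
50:50 (Gini index) | No | 0.804 | 0.763 | 0.783 | 0.825
50:50 (Gini index) | Yes | 0.838 | 0.868 | 0.853 | 0.825
60:40 (Information gain) | No | 0.809 | 0.752 | 0.779 | 0.823
60:40 (Information gain) | Yes | 0.833 | 0.874 | 0.853 | 0.823
60:40 (Gain ratio) | No | 0.804 | 0.739 | 0.77 | 0.817
60:40 (Gain ratio) | Yes | 0.825 | 0.873 | 0.848 | 0.817
60:40 (Gini index) | No | 0.803 | 0.759 | 0.78 | 0.823
60:40 (Gini index) | Yes | 0.836 | 0.868 | 0.852 | 0.823
70:30 (Information gain) | No | 0.808 | 0.774 | 0.791 | 0.83
70:30 (Information gain) | Yes | 0.845 | 0.87 | 0.857 | 0.83
70:30 (Gain ratio) | No | 0.821 | 0.753 | 0.786 | 0.83
70:30 (Gain ratio) | Yes | 0.835 | 0.884 | 0.859 | 0.83
70:30 (Gini index) | No | 0.815 | 0.765 | 0.789 | 0.831
70:30 (Gini index) | Yes | 0.84 | 0.877 | 0.858 | 0.831
80:20 (Information gain) | No | 0.813 | 0.759 | 0.785 | 0.827
80:20 (Information gain) | Yes | 0.837 | 0.876 | 0.856 | 0.827
80:20 (Gain ratio) | No | 0.825 | 0.752 | 0.787 | 0.831
80:20 (Gain ratio) | Yes | 0.835 | 0.887 | 0.86 | 0.831
80:20 (Gini index) | No | 0.825 | 0.769 | 0.796 | 0.837
80:20 (Gini index) | Yes | 0.844 | 0.885 | 0.864 | 0.837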

Table 7. Evaluation of predictive models: moratorium.

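Target variable: Moratorium.
Technique | Partitioning ratio | Class | Precision | Sensitivity | F-score | Accuracy (overall)
Decision Tree | 80:20 (Gini index) | No | 0.779 | 0.727 | 0.752 | 0.802
Decision Tree | 80:20 (Gini index) | Yes | 0.816 | 0.854 | 0.834 | 0.802
Gradient Boosted Tree | 60:40 | No | 0.891 | 0.711 | 0.791 | 0.844
Gradient Boosted Tree | 60:40 | Yes | 0.821 | 0.938 | 0.876 | 0.844
Random Forest | 80:20 (Gini index) | No | 0.825 | 0.769 | 0.796 | 0.837
Random Forest | 80:20 (Gini index) | Yes | 0.844 | 0.885 | 0.864 | 0.837
Naïve Bayes | 60:40 | No | 0.64 | 0.453 | 0.53 | 0.667
Naïve Bayes | 60:40 | Yes | 0.679 | 0.819 | 0.742 | 0.667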

A similar process was carried out to develop machine learning models for utility discounts and EPF and PRS subsidies. The results show that for both subsidies, Gradient Boosted Tree was identified as the best machine learning technique. Table 8 and Table 9 show that this technique can predict utility discount with 86% sensitivity, 82.1% precision and 84% F-score, as well as EPF and PRS with 83.6% sensitivity, 81.2% precision and 82.4% F-score, respectively.
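The final selection step amounts to choosing, for each subsidy, the technique whose optimal model has the highest F-score for the "Yes" class. The sketch below encodes the "Yes"-class F-scores reported in Tables 7, 8 and 9 and applies that rule; the dictionary layout and the function name are illustrative choices, not part of the study's workflow.

```python
# "Yes"-class F-scores of each technique's optimal model, as reported in Tables 7-9.
REPORTED_F_SCORES = {
    "Moratorium": {
        "Decision Tree": 0.834,
        "Gradient Boosted Tree": 0.876,
        "Random Forest": 0.864,
        "Naïve Bayes": 0.742,
    },
    "Utilities discount": {
        "Decision Tree": 0.816,
        "Gradient Boosted Tree": 0.84,
        "Random Forest": 0.827,
        "Naïve Bayes": 0.581,
    },
    "EPF & PRS": {
        "Decision Tree": 0.809,
        "Gradient Boosted Tree": 0.824,
        "Random Forest": 0.815,
        "Naïve Bayes": 0.541,
    },
}


def best_technique(f_scores):
    """Return the technique name with the highest F-score."""
    return max(f_scores, key=f_scores.get)


if __name__ == "__main__":
    for subsidy, scores in REPORTED_F_SCORES.items():
        print(f"{subsidy}: {best_technique(scores)}")
```

For all three subsidies this rule selects Gradient Boosted Tree, matching the conclusion stated in the text.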

Table 8. Evaluation of predictive models: utilities discount.

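Target variable: Utilities discount.
Technique | Partitioning ratio | Class | Precision | Sensitivity | F-score | Accuracy (overall)
Decision Tree | 80:20 (Gini index) | No | 0.849 | 0.835 | 0.842 | 0.83
Decision Tree | 80:20 (Gini index) | Yes | 0.809 | 0.824 | 0.816 | 0.83
Gradient Boosted Tree | 80:20 | No | 0.876 | 0.841 | 0.859 | 0.85
Gradient Boosted Tree | 80:20 | Yes | 0.821 | 0.86 | 0.84 | 0.85
Random Forest | 80:20 (Information gain ratio) | No | 0.848 | 0.864 | 0.856 | 0.843
Random Forest | 80:20 (Information gain ratio) | Yes | 0.836 | 0.817 | 0.827 | 0.843
Naïve Bayes | 60:40 | No | 0.643 | 0.623 | 0.633 | 0.609
Naïve Bayes | 60:40 | Yes | 0.571 | 0.592 | 0.581 | 0.609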

Table 9. Evaluation of predictive models: EPF and PRS withdrawals.

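Target variable: EPF & PRS.
Technique | Partitioning ratio | Class | Precision | Sensitivity | F-score | Accuracy (overall)
Decision Tree | 80:20 (Gini index) | No | 0.882 | 0.854 | 0.868 | 0.844
Decision Tree | 80:20 (Gini index) | Yes | 0.791 | 0.828 | 0.809 | 0.844
Gradient Boosted Tree | 60:40 | No | 0.889 | 0.871 | 0.88 | 0.857
Gradient Boosted Tree | 60:40 | Yes | 0.812 | 0.836 | 0.824 | 0.857
Random Forest | 60:40 (Information gain ratio) | No | 0.87 | 0.892 | 0.881 | 0.855
Random Forest | 60:40 (Information gain ratio) | Yes | 0.831 | 0.799 | 0.815 | 0.855
Naïve Bayes | 60:40 | No | 0.699 | 0.781 | 0.738 | 0.666
Naïve Bayes | 60:40 | Yes | 0.599 | 0.494 | 0.541 | 0.666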

Discussion

1. How can households that favour moratorium subsidies be identified?

To answer this research question, four classification models were built using the Decision Tree, Gradient Boosted Tree, Random Forest and Naïve Bayes techniques. The optimal model for each technique was determined by tuning its parameters, as detailed in the previous section. The four optimal models were then compared, and the best was chosen using the F-score. With a 60:40 training/test split, Gradient Boosted Tree was the best model for predicting households that prefer moratorium subsidies, with an F-score of 0.876 and a sensitivity of 0.938. This model is therefore recommended for the deployment phase.

Although Gradient Boosted Tree identifies households that favour moratorium subsidies more accurately, it is difficult to interpret: its prediction is the sum of many sequentially fitted trees, so the relationship between each predictor and the target cannot be read from a single, simple structure. Machine learning techniques involve a trade-off between prediction accuracy and interpretability; in general, a method's interpretability decreases as its accuracy improves (Hastie et al., 2021). The decision tree model was therefore used to generate a rule set describing the general profile of households that favour a moratorium subsidy. Rule support is the number of respondents to whom a rule's condition applies, and rule confidence is the probability that a respondent covered by the rule prefers the moratorium subsidy. Table 10 shows the rules with a support of 400 or more respondents, which describe the basic characteristics of households that choose moratorium subsidies.
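Rule support and rule confidence, as defined above, can be computed directly for any rule written as a conjunction of "column IN (values)" conditions, the form used in Table 10. The sketch below is a minimal illustration: the column names, the rule and the records are hypothetical, and it counts coverage on whatever records are passed in, whereas the study's own figures were produced by its decision tree workflow.

```python
def rule_support_confidence(records, conditions, target, positive="Yes"):
    """Support = number of records meeting every condition;
    confidence = share of those records whose target equals `positive`."""
    covered = [
        r for r in records
        if all(r[column] in allowed for column, allowed in conditions.items())
    ]
    support = len(covered)
    hits = sum(1 for r in covered if r[target] == positive)
    confidence = hits / support if support else 0.0
    return support, confidence


if __name__ == "__main__":
    # Hypothetical rule in the style of Table 10:
    # Race IN ("Malay", "Native Sabah/Sarawak", "Others") AND Age IN ("25-34", "35-44")
    rule = {
        "Race": {"Malay", "Native Sabah/Sarawak", "Others"},
        "Age": {"25-34", "35-44"},
    }
    records = [  # invented respondents
        {"Race": "Malay", "Age": "35-44", "Moratorium": "Yes"},
        {"Race": "Malay", "Age": "25-34", "Moratorium": "No"},
        {"Race": "Chinese", "Age": "35-44", "Moratorium": "Yes"},
        {"Race": "Others", "Age": "35-44", "Moratorium": "Yes"},
    ]
    support, confidence = rule_support_confidence(records, rule, "Moratorium")
    print(support, round(confidence, 3))  # 3 0.667
```

Filtering rules by a minimum support, as done for Table 10 with a threshold of 400, keeps only profiles that describe a sizeable group of respondents rather than a handful of edge cases.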

Table 10. General profiles for moratorium subsidy preference.

Rule 1 (rule support: 902; rule confidence: 95.68%)
$Q12: Cash Assistance$ IN ("Yes") AND $Q12: Utilities Discount$ IN ("No") AND $Q12: EPF & PRS$ IN ("No") AND $Q12: Wage Sub$ IN ("Yes") AND $Q12: CGS$ IN ("No") AND $Q10: Race$ IN ("Malay", "Native Sabah/Sarawak", "Others") AND $Q7: Age Group$ IN ("35-44 ", "25-34 ", "45-54 ", "55-64 ")

Rule 2 (rule support: 792; rule confidence: 88.51%)
$Q22: Outbreak lifestyle changes?$ IN ("Yes") AND $Q4: Gender$ IN ("Male") AND $Q34: Internet access lvl$ IN ("NA", "Fast") AND $Q7: Age Group$ IN ("35-44 ") AND $Q5: Area$ IN ("City") AND $Q7: Age Group$ IN ("35-44 ", "25-34 ", "45-54 ") AND $Q8: Marital Status$ IN ("Married") AND $Q12: Wage Sub$ IN ("No") AND $Q12: CGS$ IN ("No") AND $Q10: Race$ IN ("Malay", "Native Sabah/Sarawak", "Others") AND $Q7: Age Group$ IN ("35-44 ", "25-34 ", "45-54 ", "55-64 ")

Rule 3 (rule support: 466; rule confidence: 93.13%)
$Q23: Readiness of lifestyle changes$ IN ("Yes") AND $Q12: EPF & PRS$ IN ("Yes") AND $Q12: Utilities Discount$ IN ("No") AND $Q22: Outbreak lifestyle changes?$ IN ("Yes") AND $Q2: State$ IN ("Selangor", "Perak", "Sarawak", "Kedah", "KL", "Johor", "Perlis", "Putrajaya", "Pahang", "Melaka", "Pulau Pinang", "Terengganu", "Negeri Sembilan", "Sabah", "Labuan") AND $Q4: Gender$ IN ("Male") AND $Q18: Future ESP?$ IN ("Yes") AND $Q12: Cash Assistance$ IN ("Yes") AND $Q5: Area$ IN ("Rural") AND $Q7: Age Group$ IN ("35-44 ", "25-34 ", "45-54 ") AND $Q8: Marital Status$ IN ("Married") AND $Q12: Wage Sub$ IN ("No") AND $Q12: CGS$ IN ("No") AND $Q10: Race$ IN ("Malay", "Native Sabah/Sarawak", "Others") AND $Q7: Age Group$ IN ("35-44 ", "25-34 ", "45-54 ", "55-64 ")

Rule 4 (rule support: 447; rule confidence: 90.16%)
$Q12: EPF & PRS$ IN ("Yes") AND $Q12: Utilities Discount$ IN ("No") AND $Q12: Cash Assistance$ IN ("Yes") AND $Q34: Internet access lvl$ IN ("NA", "Slow", "Average", "No Internet Access") AND $Q22: Outbreak lifestyle changes?$ IN ("Yes") AND $Q4: Gender$ IN ("Female") AND $Q2: State$ IN ("Selangor", "Kedah", "KL", "Johor", "Perlis", "Putrajaya", "Pulau Pinang", "Negeri Sembilan", "Sabah") AND $Q7: Age Group$ IN ("25-34 ", "45-54 ", "15-24 ", "55-64 ", "65 and above") AND $Q5: Area$ IN ("City") AND $Q7: Age Group$ IN ("35-44 ", "25-34 ", "45-54 ") AND $Q8: Marital Status$ IN ("Married") AND $Q12: Wage Sub$ IN ("No") AND $Q12: CGS$ IN ("No") AND $Q10: Race$ IN ("Malay", "Native Sabah/Sarawak", "Others") AND $Q7: Age Group$ IN ("35-44 ", "25-34 ", "45-54 ", "55-64 ")

The first rule shows that households that found cash assistance beneficial (but not utility discounts, EPF and PRS withdrawals or the credit guarantee scheme), whose respondent is Malay, Native Sabah/Sarawak or Others, and who are aged between 25 and 64 prefer the moratorium. Table 11 summarises this first rule as the general profile of households that prefer moratorium subsidies.

Table 11. General profile of moratorium subsidy preference - rule support: 902 records, rule confidence: 95.7%.

Subsidies
•    Cash Assistance (Yes)
•    Utilities Discount (No)
•    EPF & PRS (No)
•    Wage Subsidies (No)
•    CGS (No)
Race
•    Malay
•    Native Sabah/Sarawak
•    Others
Age group
•    35-44
•    25-34
•    45-54
•    55-64

2. How can households that seek utility discount subsidies be identified?

The four machine learning techniques used for the moratorium subsidy were applied to build a classification model that identifies households seeking utility discount subsidies, following the same procedure to find the optimal model. Gradient Boosted Tree outperformed the other three techniques, with a data partitioning ratio of 80:20, an F-score of 0.84 and a sensitivity of 0.86. Although this model identifies households seeking utility discount subsidies more accurately, it is difficult to interpret, so decision tree rules were generated to describe the general profile of these households. One such rule is presented in Table 12.

Table 12. General profile of utilities discount subsidy preference - rule support: 1128 records, rule confidence: 90.1%.

Readiness of Lifestyle Changes:
•     Yes
Marital Status:
•     Married
•     Divorced / Separated
States:
•     KL
•     Putrajaya
•     Melaka
•     Pulau Pinang
•     Sarawak
•     Johor
Subsidies:
•     CGS (No)
•     Cash Assistance (Yes)
•     Wage Subsidies (No)
•     Moratorium (Yes)
•     EPF & PRS (No)

3. How can households who desire EPF and PRS withdrawal subsidies be identified?

To identify the households that want EPF and PRS withdrawal subsidies, the decision tree, gradient boosted tree, random forest and naïve Bayes techniques were used to develop machine learning models, as for the moratorium and utilities discount subsidies. With a data partitioning ratio of 60:40, an F-score of 0.824 and a sensitivity of 0.836, the gradient boosted tree was the best model. Decision tree rules were developed to explain the general features of households that prefer EPF and PRS withdrawal subsidies; one of these rules is displayed in Table 13.
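
As the chained conditions in the rule listing suggest, each rule in Tables 11 to 13 can be read as one root-to-leaf path of a decision tree: the tests along the path are joined with AND, and the leaf supplies the predicted class with its record count and confidence. A minimal sketch of this path-to-rule conversion follows; the `Node` class, the attribute names and the small hand-built tree with its counts are hypothetical, not the trained model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Node:
    """Internal nodes test 'attribute IN values'; leaves carry a predicted class."""
    attribute: Optional[str] = None
    values: Set[str] = field(default_factory=set)  # branch 'yes' is taken when the value is in this set
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    label: Optional[str] = None  # predicted class at a leaf
    records: int = 0             # training records reaching the leaf
    correct: int = 0             # of which actually belong to `label`

def extract_rules(node: Node, path: Optional[List[str]] = None) -> List[str]:
    """Walk every root-to-leaf path and emit one IF ... THEN rule per leaf."""
    path = path or []
    if node.label is not None:  # leaf: turn the accumulated path into a rule
        conf = node.correct / node.records if node.records else 0.0
        cond = " AND ".join(path) or "TRUE"
        return [f"IF {cond} THEN {node.label} (support={node.records}, confidence={conf:.1%})"]
    test = f'{node.attribute} IN ({", ".join(sorted(node.values))})'
    rules = extract_rules(node.yes, path + [test])
    rules += extract_rules(node.no, path + [f"NOT {test}"])
    return rules

# Hypothetical tree with invented counts, for illustration only.
tree = Node("cash_assistance", {"Yes"},
            yes=Node("cgs", {"No"},
                     yes=Node(label="EPF&PRS=Yes", records=50, correct=45),
                     no=Node(label="EPF&PRS=No", records=10, correct=7)),
            no=Node(label="EPF&PRS=No", records=40, correct=36))

for rule in extract_rules(tree):
    print(rule)
# IF cash_assistance IN (Yes) AND cgs IN (No) THEN EPF&PRS=Yes (support=50, confidence=90.0%)
# IF cash_assistance IN (Yes) AND NOT cgs IN (No) THEN EPF&PRS=No (support=10, confidence=70.0%)
# IF NOT cash_assistance IN (Yes) THEN EPF&PRS=No (support=40, confidence=90.0%)
```

This is why a single decision tree remains useful alongside the more accurate gradient boosted tree: each of its leaves yields a human-readable household profile.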

Table 13. General profile of EPF & PRS subsidy preference - rule support: 1427 records, rule confidence: 90.5%.

ESP in Future:
•     Yes
Marital Status:
•     Single / Non-Married
•     Married
•     Widow / Widower
•     Divorced / Separated
Priority:
•     Employment
Race:
•     Malay
•     Native Sabah / Sarawak
•     Others
•     Foreigner
States:
•     KL
•     Selangor
•     Johor
•     Labuan
•     Sarawak
•     Negeri Sembilan
Subsidies:
•     CGS (No)
•     Cash Assistance (Yes)
•     Wage Subsidies (No)
•     Moratorium (Yes)
•     Utilities Discount (No)

The results imply that households that prefer moratorium subsidies did not favour other financial aid except cash assistance. By contrast, households that prefer utility discounts or EPF and PRS withdrawals also chose moratorium subsidies and cash assistance. All households preferred cash assistance, which had the highest score among the forms of financial aid, followed by moratorium subsidies. Utility discounts and EPF and PRS withdrawals can be offered according to the preferences of each household income group.
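
One way to turn these findings into a customised package is to combine the per-subsidy classifiers: cash assistance is offered to every household, and each other subsidy is added when its model predicts a preference. The sketch below is an illustrative assumption rather than the authors' implementation: it assumes each trained model is exposed as a function returning a probability, uses a 0.5 threshold, and replaces the gradient boosted tree models with hypothetical stub functions.

```python
from typing import Callable, Dict, List

Household = Dict[str, str]
Model = Callable[[Household], float]  # assumed interface: P(household prefers the subsidy)

def recommend_package(household: Household, models: Dict[str, Model],
                      threshold: float = 0.5) -> List[str]:
    """Cash assistance for every household, plus each subsidy whose model predicts a preference."""
    package = ["Cash Assistance"]  # preferred by all households in the results
    for subsidy, model in models.items():
        if model(household) >= threshold:
            package.append(subsidy)
    return package

# Hypothetical stand-ins for the trained classifiers (not the real models).
stub_models: Dict[str, Model] = {
    "Moratorium": lambda h: 0.9 if h.get("age_group") in {"25-34", "35-44", "45-54", "55-64"} else 0.2,
    "Utilities Discount": lambda h: 0.7 if h.get("state") in {"KL", "Putrajaya", "Johor"} else 0.3,
    "EPF & PRS Withdrawal": lambda h: 0.6 if h.get("priority") == "Employment" else 0.4,
}

print(recommend_package({"age_group": "35-44", "state": "KL", "priority": "Health"}, stub_models))
# ['Cash Assistance', 'Moratorium', 'Utilities Discount']
```

Keeping one binary model per subsidy, as the study does, means the threshold for each subsidy could be tuned separately, for example to favour sensitivity where missing an eligible household is costly.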

The wage subsidy and the credit guarantee scheme were the least preferred forms of financial assistance. First, the Prihatin wage subsidy is only available to eligible Social Security Organisation (SOCSO) subscribers. Hawkers, small businesses and their employees might not subscribe to SOCSO and are thus ineligible to apply for this aid. Second, the credit guarantee scheme was not preferred because of the economic uncertainty caused by COVID-19. Economic uncertainty adversely affects household income, which can leave households unable to repay loan instalments.

Limitations of the study

This study has the following limitations. First, the data are survey responses and cannot be considered to represent the views of all Malaysians. According to DOSM, the data should not be used to analyse the impact of COVID-19 in Malaysia and should not be considered official statistics; they can, however, be used to assist in the reflection process (“Department of Statistics Malaysia Official Portal,” 2020; Malaysia, 2020). Second, the data were partitioned only into training and test sets; dividing the dataset into training, validation and test sets could improve the model. Third, this research used information-based learning (the decision tree algorithm family) as the best model for explaining decision logic; other classification methods could also be tested and might achieve better accuracy.
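
The three-way partition suggested above can be sketched as a seeded shuffle of record indices that is then cut into training, validation and test sets. The 60:20:20 proportions, the seed and the function name are assumptions for illustration; the validation set would be used for tuning and model selection, leaving the test set untouched for the final performance estimate.

```python
import random
from typing import List, Sequence, Tuple

def train_val_test_split(n: int, ratios: Sequence[float] = (0.6, 0.2, 0.2),
                         seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle record indices and cut them into train/validation/test partitions."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("expected three ratios summing to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the partition is reproducible
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    # The test set takes the remainder, so no record is lost to rounding.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = train_val_test_split(1000)
print(len(train), len(val), len(test))  # 600 200 200
```

When the preference classes are imbalanced, a stratified variant that preserves the share of each class in every partition may be preferable to a plain random split.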

Conclusions

This study used data analytics and machine learning approaches to derive insights from the “Effects of COVID-19 on the Economy and Individual - Round 2” survey dataset. The CRISP-DM approach was applied to develop prediction models for households’ preferred subsidies, namely moratoriums, utility discounts and EPF and PRS withdrawals, using four machine learning algorithms: Decision Tree, Random Forest, Naïve Bayes and Gradient Boosted Tree. For all three subsidies, the Gradient Boosted Tree produced the best predictive model. The findings can be used to design customised ESPs that effectively manage the economic burden of low-income households.

Data availability

Data used in this study were obtained from the survey dataset “Effects of COVID-19 on the Economy and Individual - Round 2,” available from the Department of Statistics, Malaysia (DOSM). A report published by DOSM based on the survey can be viewed on the DOSM website. Access to the data requires an application, as stated on the DOSM website. A guide on how to apply for dataset access is available on the Data Request page, or requests for more information can be emailed to data@dosm.gov.my.

How to cite this article
Kannan R, Wang IZW, Ong HB et al. COVID-19 impact: Customised economic stimulus package recommender system using machine learning techniques [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2021, 10:932 (https://doi.org/10.12688/f1000research.72976.2)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 2
Published 12 Nov 2021 (revised)
Reviewer Report 31 May 2024
Setia Pramana, Computational Statistics Department, Politeknik, Statistika STIS, Jakarta, Indonesia;  BPS Statistics Indonesia, Jakarta, Indonesia 
Approved
The authors have addressed all suggested ...
How to cite this report:
Pramana S. Reviewer Report For: COVID-19 impact: Customised economic stimulus package recommender system using machine learning techniques [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2021, 10:932 (https://doi.org/10.5256/f1000research.79204.r100081)
Reviewer Report 03 Dec 2021
Sreeja N.K, Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, Tamil Nadu, India 
Approved
The authors have addressed the issues raised. ...
How to cite this report:
N.K S. Reviewer Report For: COVID-19 impact: Customised economic stimulus package recommender system using machine learning techniques [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2021, 10:932 (https://doi.org/10.5256/f1000research.79204.r100079)
Version 1
Published 16 Sep 2021
Reviewer Report 25 Oct 2021
Sreeja N.K, Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, Tamil Nadu, India 
Approved with Reservations
  • The authors have compared few classifiers on Effects of COVID-19 on the Economy and Individual - Round 2 data set to predict individual household preferences from ESP.
  • I would recommend the authors ...
How to cite this report:
N.K S. Reviewer Report For: COVID-19 impact: Customised economic stimulus package recommender system using machine learning techniques [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2021, 10:932 (https://doi.org/10.5256/f1000research.76592.r97108)
  • Author Response 12 Nov 2021
    Rathimala Kannan, Department of Information Technology, Faculty of Management, Multimedia University, Cyberjaya, 63100, Malaysia
    • I would recommend the authors to use k-fold cross validation method to evaluate the performance of a classifier instead of a random 60-40 or 80-20 split. This would ...
Reviewer Report 15 Oct 2021
Shahrinaz Ismail, Albukhary International University, Alor Setar, Kedah, Malaysia 
Approved with Reservations
Since this is a full research paper, it is recommended that a Literature Review section be included. It is believed that there are some previous research that have performed the 4 ML techniques covered by this research, hence the need ...
How to cite this report:
Ismail S. Reviewer Report For: COVID-19 impact: Customised economic stimulus package recommender system using machine learning techniques [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2021, 10:932 (https://doi.org/10.5256/f1000research.76592.r95431)
  • Author Response 12 Nov 2021
    Rathimala Kannan, Department of Information Technology, Faculty of Management, Multimedia University, Cyberjaya, 63100, Malaysia
    Since this is a full research paper, it is recommended that a Literature Review section be included. It is believed that there are some previous research that have performed the ...
Reviewer Report 06 Oct 2021
Setia Pramana, Computational Statistics Department, Politeknik, Statistika STIS, Jakarta, Indonesia;  BPS Statistics Indonesia, Jakarta, Indonesia 
Approved with Reservations
  1. Research questions and the discussion and conclusion are not inline. The authors mentioned three research questions, but they are not discussed throughout the manuscript. Instead, the authors focus on comparing different methods for predicting household subsidy preferences. ...
How to cite this report:
Pramana S. Reviewer Report For: COVID-19 impact: Customised economic stimulus package recommender system using machine learning techniques [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2021, 10:932 (https://doi.org/10.5256/f1000research.76592.r95430)
  • Author Response 12 Nov 2021
    Rathimala Kannan, Department of Information Technology, Faculty of Management, Multimedia University, Cyberjaya, 63100, Malaysia
    1. Research questions and the discussion and conclusion are not inline. The authors mentioned three research questions, but they are not discussed throughout the manuscript. Instead, the authors focus on ...
