Research Article

Feature optimized hybrid model for prediction of myocardial infarction

[version 1; peer review: awaiting peer review]
PUBLISHED 14 Jan 2025

Abstract

Background

Cardiovascular disease (CVD) is widespread and has become a leading contributor to rising global mortality. According to the World Heart Federation, the death toll due to CVD increased from 12.1 million in 1990 to around 19 million in 2019. Myocardial Infarction (MI) is a condition in which the heart muscle dies due to reduced or inhibited flow of oxygenated blood. It has affected approximately 3 million people worldwide, with more than 1 million deaths in the United States annually. This rise in the global death toll due to CVD can be reduced to a great extent by predicting the risk of CVD at an early stage.

Method

In this paper, several feature selection techniques, including the Variance-based, Mutual Information (MI), Maximum Relevance Minimum Redundancy (MRMR), Boruta, and Recursive Feature Elimination (RFE) algorithms, are used for feature optimization. For class prediction, the Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), and Adaboost algorithms were implemented in their ordinary, One-vs-Rest (OVR, also called One-vs-All or OVA) and One-vs-One (OVO) forms.

Result

The performance of the Adaboost model improved significantly with feature selection: its accuracy rose from 74% (without any feature selection, 5.3 seconds training time) to 85% (with Boruta feature selection, 2.17 seconds training time) and 88% (with MRMR feature selection, 1.6 seconds training time). Similarly, the accuracy of the DT-OVO model improved from 84% (without any feature selection, 1.48 seconds training time) to 86% (with Boruta feature selection, 0.58 seconds training time). For the other models, performance is maintained with reduced training times.

Conclusion

This research paper prioritizes feature selection in developing machine learning models for CVD prediction. This is justified by the significant reduction in training times across the 72 models generated, achieved while maintaining or even improving predictive performance.

Keywords

Cardiovascular Disease, Machine Learning, One-vs-One, One-vs-All, Feature Selection.

1. Introduction

Cardiovascular disease (CVD) refers to any obstruction in the normal functioning of the heart. Myocardial Infarction (MI) is a type of heart disease caused by decreased or complete stoppage of blood flow to a portion of the myocardium. It is a condition in which the heart muscle dies because of reduced or inhibited flow of oxygenated blood caused by partial occlusion of the coronary artery. Factors such as diets rich in fat, alcohol consumption, a sedentary lifestyle, lack of proper sleep, and work stress commonly lead to obstructions or blockages that inhibit the proper flow of blood, resulting in a heart attack. Myocardial infarction may be “silent” and go undetected, or it could lead to a catastrophic event such as sudden death. The primary cause of the rise in MI cases in the US is the prevalence of coronary artery disease. Based on statistics provided by the WHO, around 17.9 million deaths occur annually due to CVD globally.1 In India, the increase in heart failure cases is mostly due to coronary heart disease, diabetes, hypertension, obesity, etc.2 People who are suffering or are likely to suffer from cardiovascular disease show symptoms such as a rise in blood pressure, increased glucose levels, excess weight, etc.3

However, it has now become possible to combat the increasing mortality rates due to MI, and CVD in general. Powerful and optimized machine learning (ML) models are able to predict the disease at an early stage and also recommend ways to cure it.4 Most ML models categorize heart disease patients into two classes, healthy or affected; models that classify patients into multiple classes based on the level of impact of the disease are comparatively rare. This research work focuses on multiclass classification of heart disease patients using a Myocardial Infarction dataset taken from the UCI repository. To reduce the training burden on the model, the number of predictors was reduced using feature selection techniques, namely the Variance-based, Mutual Information (MI) based, Maximum Relevance Minimum Redundancy (MRMR), Boruta, and Recursive Feature Elimination (RFE) based methods. The feature-reduced datasets were partitioned into training and testing data, followed by training of the ML models LR, SVM, DT, and Adaboost. These algorithms were executed in their traditional form, using the One-vs-All (OVA) method, and using the One-vs-One (OVO) method, and all the resulting models were analyzed with respect to accuracy, recall, and precision. Besides these performance metrics, a comparison of the training times of the 72 models (4 algorithms × 3 implementation modes × 6 feature selection scenarios) is also presented.

Section 2 discusses some of the research works that have played a crucial role in providing a foundation for this research work. Section 3 discusses the flow of work and describes the dataset and algorithms applied in this research. Section 4 presents the results obtained and finally the paper is concluded in Section 5.

2. Literature review

Rashmi G. Saboji et al.5 used genetic search to obtain 13 important predictors out of the 76 attributes of the Cleveland heart disease dataset. They also used the Switzerland and Hungary heart disease datasets containing the same 13 predictors. The Random Forest (RF) and Naive Bayes (NB) algorithms were applied to these datasets with varying training set sizes (200, 400, and 600 instances). The two algorithms were compared in terms of accuracy, and the RF model gave better accuracy than NB for all 3 training set sizes, namely 88%, 96%, and 98% for 200, 400, and 600 training instances respectively.

Kirsi Varpa et al.1 have conducted experiments on an Otoneurological Disorder dataset containing a multinomial target attribute (nine classes in the target attribute) by implementing KNN and SVM in ordinary, OVA, and OVO methodologies. SVM was implemented using both linear and RBF kernel functions. All the nine models were compared against a 5-NN baseline model (which gave 89.5% accuracy) and it was observed that 5-NN with OVO yielded the best performance with 95% accuracy.

G. Manikandan et al.6 compared the LR, DT, SVM, RF, and XGBoost models for predicting heart disease. First, the Boruta feature selection technique was applied to the Cleveland heart disease dataset, which resulted in the selection of 6 out of 13 predictors, and the aforementioned ML models were then applied to the reduced dataset. This research concluded that LR combined with Boruta outperformed all the other models, with an accuracy of 88.52%.

Asif Nawaz et al.,7 in their work, suggested a model based on hybridization of data sampling and cost-sensitive learning for handling imbalanced datasets. They used the Myocardial Infarction (MI) dataset, which contains 1700 patient records and is highly imbalanced at a ratio of 1:5.67. They compared multiple class balancing methods, such as SMOTE, ADASYN, Tomek-link, ENN, and weighted XGBoost, with their proposed method, which gave better performance in terms of accuracy, ROC-AUC, and MCC. The combination of data sampling and cost-sensitive learning using XGBoost for classification gave an accuracy of 91.98%.

Abedayo Ogunpola et al.8 compared seven ML and DL algorithms, namely LR, SVM, KNN, RF, Gradient Boosting, XGBoost, and CNN, by applying them to two datasets: the Cardiovascular Heart disease dataset from the Mendeley database and the Cleveland Heart disease dataset from the Kaggle database. The algorithms were compared based on accuracy, precision, recall, and F1-score, and XGBoost was observed to outperform the other models.

3. Methods

This section describes the workflow of this research. First, the MI patient dataset was collected from the UCI repository and preprocessed to handle missing values and remove trivial attributes such as the patient Id. Next, the dataset was split into training and testing sets, and the classes in the training set were balanced using SMOTE. Feature selection techniques were then applied to select a smaller number of relevant predictors, followed by classification algorithms to predict the patient class. Figure 1 shows the workflow of the implementation.

Figure 1. Flow of work.

Synthetic Minority Oversampling Technique (SMOTE) is a class balancing method in which synthetic instances are created for the minority class using simple statistical operations. First, the difference between two neighboring samples (Xi and Xj) of the minority class is computed, and this difference is multiplied by a random value between 0 and 1, referred to as lambda. The resulting vector is then added to Xi to produce a new instance, which lies on the line segment between Xi and Xj.
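
The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `smote_sample` and the toy vectors are hypothetical, and the neighbor search that picks Xj is omitted.

```python
import random

def smote_sample(x_i, x_j, rng=random):
    """Create one synthetic minority-class instance between x_i and x_j."""
    lam = rng.random()  # lambda, drawn uniformly from [0, 1)
    # new = x_i + lambda * (x_j - x_i), computed feature by feature
    return [a + lam * (b - a) for a, b in zip(x_i, x_j)]

# Two hypothetical neighboring minority-class samples with three features
x_i = [1.0, 4.0, 0.5]
x_j = [2.0, 6.0, 0.5]
synthetic = smote_sample(x_i, x_j, random.Random(0))
print(synthetic)
```

Because lambda lies between 0 and 1, every feature of the synthetic instance falls between the corresponding values of the two parent samples.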

3.1 Dataset description

This study considers a Myocardial Infarction dataset containing 1700 patient records from the Krasnoyarsk Inter-district Clinical Hospital, Russia, available in the UCI repository. The dataset has 124 attributes: one patient Id column, one target attribute called Lethal Outcome, and 122 predictor attributes. The predictors include the patient’s demographic details; heart disease history; condition at admission to hospital and after 24, 48, and 72 hours of admission; condition at admission to the ICU and after 24, 48, and 72 hours in the ICU; and the use of several drugs together with the patient’s condition 24, 48, and 72 hours after their use. The target attribute has 8 classes, from 0 to 7, indicating the cause of death of the patient.

This dataset is designed in such a way that maximum importance is given to the patient’s condition in the initial hours after a particular treatment. Figure 2, Figure 3, and Figure 4 show the count of target attribute classes, the count of target attribute classes with respect to the gender of the patient, and the count of target attribute classes after balancing the dataset using SMOTE, respectively. In Figures 2 and 3, the counts are shown both including and excluding class 0 in order to highlight the class imbalance.

Figure 2. Count plot for Lethal Outcome.

Figure 3. Count plot for Lethal Outcome with respect to Gender of patient.

Figure 4. Count plot for Lethal Outcome after data oversampling using SMOTE.

3.2 Feature selection algorithms used

Feature selection is a crucial pre-processing activity before making predictions using ML models. It reduces the burden of training the model by selecting a small set of relevant predictors from all available features of the dataset.9-11 Several algorithms exist for selecting relevant features, and they can be categorized into three groups: Filter methods, Wrapper methods, and Embedded methods.

Filter methods individually check the relationship between each feature and the target attribute. They use a statistical measure such as correlation to compute the dependency of the target attribute on a particular feature, for example whether the target is negatively or positively correlated with the feature. Examples of filter methods include the Chi-square test, Variance-based selection, Mutual Information, Fisher’s score, etc.

Wrapper methods test the performance of the classification model on different feature subsets, that is, features are added and removed dynamically and the model is retrained on each candidate subset. The feature subset that gives the best performance is selected as the optimal set of features. Because of the way the subsets are searched, this approach is also known as the greedy method of feature selection. Examples of wrapper methods include Forward Selection, Backward Selection, Recursive Feature Elimination, Boruta, etc.

Embedded methods combine the advantages of filter methods and wrapper methods. They perform feature selection as part of the iterative model training process while keeping the computation cost low. Examples of embedded methods are Lasso and Ridge Regression.

In this research, five different FS techniques are applied on the Myocardial Infarction dataset, which include the Variance based, Mutual Information based, Maximum Relevance Minimum Redundancy, Boruta and Recursive Feature Elimination based feature selection.

Variance based Feature Selection: This method assumes that the higher the variance of a feature, the more the target attribute depends on it, and the lower the variance, the weaker the dependency. The variance of each feature is computed and all features whose variance is below a certain threshold are eliminated. In our research, the variance threshold was set to 0.2, and 33 of the 124 attributes were retained. Figure 5 depicts the pseudo code for selecting features using this method.

Figure 5. Pseudo code for Variance based FS.
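
The thresholding rule described for variance based selection can be sketched as follows. This is an illustrative sketch only: the helper `variance_filter` and the toy columns are hypothetical and not taken from the MI dataset, and population variance is assumed.

```python
from statistics import pvariance

def variance_filter(columns, threshold=0.2):
    """Return the names of features whose variance exceeds the threshold."""
    return [name for name, values in columns.items()
            if pvariance(values) > threshold]

# Hypothetical toy features (not taken from the MI dataset)
columns = {
    "age_group": [1, 2, 3, 4, 5, 6],  # variance about 2.92, kept
    "rare_flag": [0, 0, 0, 0, 0, 1],  # variance about 0.14, dropped
    "constant":  [1, 1, 1, 1, 1, 1],  # variance 0, dropped
}
selected = variance_filter(columns)
print(selected)
```

Note that the filter never looks at the target attribute, so it is very cheap to compute but can discard a low-variance feature that is nevertheless informative.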

Mutual Information (MI) based FS: MI measures the amount of dependency between two variables. An importance score greater than zero indicates that there exists some dependency between the two variables, and an importance score equal to zero implies that the variables are completely independent of each other. The mutual information between two variables X and Y, denoted by I(X,Y), is computed using the following formula:

$$I(X,Y) = H(X) - H(X|Y)$$

Here H(X) denotes the entropy of variable X and H(X|Y) denotes the conditional entropy of X given Y, that is, the uncertainty remaining in X once Y is known. Entropy refers to the amount of information contained in a random variable. In the MI feature selection technique, the MI between the target attribute and every feature is computed to determine the degree of dependency of the target attribute on that feature. Based on the computed importance scores, the top ‘K’ features are selected for training the model. Figure 6 depicts the pseudocode for selecting features using this method.

Figure 6. Pseudo code for MI based feature selection.
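
As a worked illustration of the MI score and the top-K selection, the sketch below computes the difference between H(X) and H(X|Y) for discrete variables. It is a minimal sketch, assuming categorical features and entropies measured in bits; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X), in bits, of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(X|Y): entropy of x inside each group of equal y, weighted by group size."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / len(x) * entropy(g) for g in groups.values())

def mutual_information(x, y):
    """I(X,Y) = H(X) - H(X|Y)."""
    return entropy(x) - conditional_entropy(x, y)

def top_k_features(features, target, k):
    """Rank features by their MI with the target and keep the k best."""
    scores = {name: mutual_information(target, col) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data
target = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "copy":    [0, 0, 1, 1, 0, 0, 1, 1],  # identical to the target
    "noise":   [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the target
    "partial": [0, 0, 1, 1, 0, 0, 1, 0],  # agrees with the target on 7 of 8 rows
}
print(top_k_features(features, target, 2))
```

In this toy example the independent feature receives a score of zero, which matches the interpretation of the importance score given above.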

Figures 7, 8, 9, and 10 show the ‘Accuracy’ vs ‘Number of features selected using MI’ plots for the LR, DT, SVM, and Adaboost algorithms respectively, with the number of features ranging from 1 to 95.

Figure 7. Accuracy vs Feature count plot for MI and Logistic Regression model.

Figure 8. Accuracy vs Feature count plot for MI and Decision Tree model.

Figure 9. Accuracy vs Feature count plot for MI and Support Vector Machine model.

Figure 10. Accuracy vs Feature count plot for MI and Adaboost model.

Maximum Relevance Minimum Redundancy (MRMR) based Feature Selection: This technique is an improved form of the MI feature selection approach. MI may select all the features that are important for the target attribute, but these may include multiple features that are highly correlated with one another, that is, extremely similar, in which case only one of them is sufficient to train the model. The MRMR approach handles this issue by retaining only one of several similar features that are equally important for the target attribute. The basic principle of the MRMR method lies in scoring each feature by its relevance to the target attribute and its redundancy with respect to the features already selected. At each step, the importance score of each unselected feature is calculated using either the difference (relevance minus redundancy) or the quotient (relevance divided by redundancy) approach, and the feature with the highest score is added to the selected set. Figure 11 depicts the pseudo code for selecting features using this method.

Figure 11. Pseudo code for MRMR based feature selection.
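
The greedy MRMR procedure with the difference criterion can be sketched as below. This is a sketch under stated assumptions rather than the authors' implementation: features are assumed to be discrete, redundancy is taken as the mean MI with the features already selected, and MI is computed through the equivalent identity I(X,Y) = H(X) + H(Y) - H(X,Y). All names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) for discrete samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def mrmr(features, target, k):
    """Greedy MRMR with the difference criterion (relevance minus redundancy)."""
    relevance = {f: mutual_information(col, target) for f, col in features.items()}
    selected, remaining = [], list(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: the 4-class target is determined by a and b together
features = {
    "a":     [0, 0, 0, 0, 1, 1, 1, 1],
    "a_dup": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of a
    "b":     [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 1, 1, 2, 2, 3, 3]
print(mrmr(features, target, 2))
```

All three toy features have the same relevance, so a plain MI ranking could keep both `a` and its duplicate; the redundancy term makes the second pick `b` instead.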

Figures 12, 13, 14, and 15 show the ‘Accuracy’ vs ‘Number of features selected using MRMR’ plots for the LR, DT, SVM, and Adaboost algorithms respectively, with the number of features ranging from 1 to 95.

Figure 12. Accuracy vs Feature count plot for MRMR and Logistic Regression model.

Figure 13. Accuracy vs Feature count plot for MRMR and Decision Tree model.

Figure 14. Accuracy vs Feature count plot for MRMR and Support Vector Machine model.

Figure 15. Accuracy vs Feature count plot for MRMR and Adaboost model.

Boruta: In this technique, a copy of every original feature is created with its values shuffled across rows, and these copies are added to the original dataset. This additional set of features is commonly referred to as shadow features.12 The extended dataset is then provided to a random forest model, which computes the importance of each feature, and the shadow feature with the highest importance is identified. All features of the original dataset that have an importance value higher than that of the identified shadow feature are retained. This process is repeated a certain number of times (minimum 20 times), and the original features that are retained in the majority of the iterations are selected for final model training. In our study, we used 100 iterations to select the optimal features. Figure 16 depicts the pseudo code for selecting features using this method.

Figure 16. Pseudo code for Boruta based FS.
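
The shadow-feature comparison and majority rule described above can be sketched as follows. To stay dependency-free, this sketch swaps in the absolute Pearson correlation with the target as a stand-in importance measure; Boruta itself uses random forest importances, so this is an illustration of the shadow-feature loop only, not of Boruta as used in the study. The function names and toy data are hypothetical.

```python
import random
from statistics import mean, pstdev

def abs_corr(x, y):
    """Stand-in importance (assumption): |Pearson correlation|, 0 for a constant column."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(cov / (sx * sy))

def boruta_sketch(features, target, n_iter=100, seed=0):
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: row-shuffled copies of every original column
        shadow_scores = []
        for col in features.values():
            shadow = list(col)
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, target))
        best_shadow = max(shadow_scores)
        # An original feature scores a hit when it beats the best shadow
        for name, col in features.items():
            if abs_corr(col, target) > best_shadow:
                hits[name] += 1
    # Keep features that beat the best shadow in a majority of iterations
    return [name for name, h in hits.items() if h > n_iter / 2]

# Hypothetical toy data
target = [0, 1] * 20
features = {
    "signal":   list(target),                       # identical to the target
    "noise":    [(i * 7) % 5 for i in range(40)],   # arbitrary pattern
    "constant": [1] * 40,                           # carries no information
}
selected = boruta_sketch(features, target)
print(selected)
```

The shuffling destroys any real relationship between a shadow column and the target, so the best shadow score acts as an estimate of how important a feature can appear by chance alone.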

Recursive Feature Elimination (RFE) based feature selection is another attribute selection method, which attempts to obtain the best feature subset of size ‘K’, where ‘K’ is the number of features required. This objective is achieved by eliminating the less important features and retaining the relevant ones that help improve model performance. In this method, the predictors are ranked based on the feature_importances_ attribute of the predictive model being used, and the ones with the lowest importance are removed. This process is repeated on the reduced feature subset until the desired number of features is obtained. Figure 17 depicts the pseudocode for selecting features using this method.

Figure 17. Pseudo code for RFE based feature selection.
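
The elimination loop described above can be sketched as follows. The function `fit_importances` is a hypothetical stand-in for fitting a model and reading its `feature_importances_`: it scores each feature by the standardized gap between the two class means of a binary target. With a real model the scores would change after each refit; only the loop structure is the point of this sketch.

```python
from statistics import mean, pstdev

def fit_importances(columns, target):
    """Stand-in (assumption) for fitting a model and reading feature_importances_."""
    scores = {}
    for name, col in columns.items():
        pos = [v for v, t in zip(col, target) if t == 1]
        neg = [v for v, t in zip(col, target) if t == 0]
        spread = pstdev(col)
        scores[name] = abs(mean(pos) - mean(neg)) / spread if spread else 0.0
    return scores

def rfe(columns, target, k):
    """Drop the lowest-ranked feature and refit until k features remain."""
    kept = dict(columns)
    while len(kept) > k:
        scores = fit_importances(kept, target)  # refit on the reduced subset
        weakest = min(scores, key=scores.get)   # feature with the lowest importance
        del kept[weakest]
    return list(kept)

# Hypothetical toy data with a binary target
target = [0, 0, 0, 1, 1, 1]
columns = {
    "strong":   [1, 2, 1, 8, 9, 8],
    "weak":     [1, 5, 3, 4, 2, 6],
    "constant": [3, 3, 3, 3, 3, 3],
}
print(rfe(columns, target, 2))
print(rfe(columns, target, 1))
```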

3.3 Class prediction algorithms used

Logistic Regression is a supervised classification algorithm that assumes the data points are independent of each other and that no outliers are present in the dataset. It is primarily designed for datasets with a binomial target attribute, but it can handle multinomial target attributes using the softmax function. The logistic regression algorithm uses a sigmoid function to generate a value that indicates the probability of a tuple belonging to a particular class.13,14
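
The two link functions mentioned above can be written down directly. The sketch below is illustrative only; the input scores are hypothetical linear-model outputs, and no model fitting is shown.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability for the binomial case."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Map one score per class to class probabilities for the multinomial case."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores for a three-class problem
probs = softmax([2.0, 1.0, 0.1])
print(sigmoid(0.0), probs)
```

The softmax outputs sum to one, and the class with the largest score receives the largest probability.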

Support Vector Machine is another machine learning algorithm, used to classify data points into two or more classes by finding an optimal hyperplane that separates the data points of different classes. Out of all the possible hyperplanes, the one that provides the maximum margin, known as the Maximal Margin Hyperplane (MMH), is selected as the optimal one.15-17 SVM has a kernel hyperparameter, a mathematical function used to map the instances to a high-dimensional space so that the MMH can be obtained easily when the data is not linearly separable.18 Some of the commonly used kernel functions are the sigmoid, linear, radial basis, and polynomial functions.

Decision tree is a tree-structured regression and classification model consisting of tests on attributes as internal nodes, values of these attributes as branches to the next level, and class labels as the leaf nodes. At each level, attributes are chosen based on metrics such as Gini impurity, entropy, and information gain. This process continues until a pure node is obtained, that is, a node in which all instances belong to the same class. Entropy refers to the amount of uncertainty in the attribute considered. Information gain refers to the reduction in entropy after splitting the dataset based on a certain attribute.
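
The split metrics named above can be computed by hand on a small example. The sketch below is illustrative; the helper names and the four-row toy dataset are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the class labels at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Reduction in entropy after splitting the node on one attribute."""
    groups = {}
    for lab, val in zip(labels, attribute_values):
        groups.setdefault(val, []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical node with two classes and two candidate split attributes
labels = [1, 1, 0, 0]
perfect = ["a", "a", "b", "b"]  # separates the classes completely
useless = ["a", "b", "a", "b"]  # leaves both children as mixed as the parent
print(gini(labels), information_gain(labels, perfect), information_gain(labels, useless))
```

A tree builder would choose the `perfect` attribute here, since it yields two pure child nodes and therefore the largest information gain.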

Adaptive Boosting, or Adaboost, is an ensemble learning algorithm in which weak learners are trained iteratively and each successive learner gives higher weight to the data points misclassified so far. The final Adaboost model is an ensemble of these weak learners combined according to their model weights, where the highest weight is given to the learner with the highest accuracy and the lowest weight to the learner with the lowest accuracy.
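
One boosting round can be sketched as below. This assumes the classical binary formulation, in which the learner weight is alpha = 0.5 · ln((1 − err) / err); multiclass variants adjust this formula, and the paper does not state which variant was used. The function name and the toy weights are hypothetical.

```python
import math

def adaboost_round(weights, correct):
    """One round of classical binary AdaBoost (assumes 0 < weighted error < 1).

    weights: current sample weights, summing to 1
    correct: True where the weak learner classified the sample correctly
    Returns the learner weight alpha and the re-normalized sample weights.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples are up-weighted, correct ones down-weighted
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Five equally weighted samples, one of them misclassified
weights = [0.2] * 5
correct = [True, True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
print(alpha, new_weights)
```

After the update, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly. A learner with lower error receives a larger alpha, which is the sense in which more accurate learners get higher weight in the ensemble.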

One-vs-All (OVA) is a way of solving a multinomial classification problem using ‘n’ binary classifiers, where ‘n’ is the number of categories in the target attribute. Each classifier Mi is dedicated to a single class Ci, treating class Ci as 1 and all other classes as 0, and predicts whether an instance belongs to class Ci or not. The average of the accuracies of the individual models is taken as the final accuracy of the OVA model.16

One-vs-One (OVO) is another method of executing multiclass classification using multiple binary classifiers, where a binary classifier is built for every pair of target classes Ci and Cj, that is, the number of binary classifiers required is n*(n-1)/2, where ‘n’ is the number of classes in the target attribute. Each data point is then classified based on majority vote applied on results of all models.1
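
For the 8 target classes of this dataset, the two decomposition schemes can be enumerated directly. The sketch below only lists the binary tasks and applies a majority vote; the helper names are hypothetical and no classifiers are trained.

```python
from collections import Counter
from itertools import combinations

classes = list(range(8))  # Lethal Outcome classes 0 to 7

def ova_tasks(classes):
    """One binary task per class: that class (1) versus all the others (0)."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def ovo_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

def ovo_predict(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers."""
    return Counter(pairwise_winners).most_common(1)[0][0]

print(len(ova_tasks(classes)))   # n binary classifiers
print(len(ovo_tasks(classes)))   # n*(n-1)/2 binary classifiers
print(ovo_predict([3, 3, 1, 3, 0]))
```

With n = 8, OVA needs 8 binary classifiers while OVO needs 8*7/2 = 28, although each OVO classifier is trained only on the instances of its two classes.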

4. Results

The models generated by each of the aforementioned algorithms in their ordinary, one-vs-all, and one-vs-one approaches under various feature selection scenarios are compared in terms of the accuracy, precision, recall, and F1-score provided by each of them. Tables 1 to 6 present the evaluation metrics provided by each algorithm under the six scenarios (with no feature selection, variance based feature selection, mutual information based feature selection, maximum relevance minimum redundancy based feature selection, Boruta feature selection and recursive feature elimination) respectively.
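
For reference, the four metrics can be computed for a multiclass problem as sketched below. The paper does not state which averaging scheme was used; this sketch assumes support-weighted averaging over classes, under which recall coincides with accuracy. The function name and the toy labels are hypothetical.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1-score."""
    n = len(y_true)
    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predicted instances per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for c, s in support.items():
        p_c = tp[c] / predicted[c] if predicted[c] else 0.0
        r_c = tp[c] / s
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        precision += s / n * p_c
        recall += s / n * r_c
        f1 += s / n * f_c
    accuracy = sum(tp.values()) / n
    return accuracy, precision, recall, f1

# Hypothetical labels for a three-class problem
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```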

Table 1. Performance metrics of 4 classification models without feature selection.

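| No feature selection | LR Simple | LR OVA | LR OVO | DT Simple | DT OVA | DT OVO | SVM Simple | SVM OVA | SVM OVO | Adaboost Simple | Adaboost OVA | Adaboost OVO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Features used | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 |
| Accuracy (%) | 82 | 81 | 83 | 84 | 73 | 84 | 91 | 91 | 91 | 74 | 90 | 90 |
| Precision (%) | 90 | 90 | 90 | 86 | 88 | 89 | 86 | 89 | 86 | 89 | 89 | 90 |
| Recall (%) | 82 | 81 | 83 | 84 | 73 | 85 | 91 | 91 | 91 | 74 | 90 | 90 |
| F1-score (%) | 86 | 85 | 86 | 85 | 80 | 87 | 88 | 90 | 88 | 80 | 90 | 90 |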

Table 2. Performance metrics of 4 classification models with Variance based (>0.2) feature selection.

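| Variance feature selection | LR Simple | LR OVA | LR OVO | DT Simple | DT OVA | DT OVO | SVM Simple | SVM OVA | SVM OVO | Adaboost Simple | Adaboost OVA | Adaboost OVO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Features used | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
| Accuracy (%) | 51 | 49 | 55 | 68 | 55 | 75 | 84 | 83 | 84 | 72 | 80 | 83 |
| Precision (%) | 86 | 85 | 85 | 79 | 82 | 80 | 79 | 80 | 80 | 80 | 82 | 83 |
| Recall (%) | 51 | 49 | 55 | 68 | 55 | 75 | 84 | 83 | 84 | 72 | 80 | 83 |
| F1-score (%) | 62 | 60 | 66 | 72 | 66 | 77 | 81 | 80 | 80 | 86 | 81 | 83 |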

Table 3. Performance metrics of 4 classification models with Mutual Information based feature selection.

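| MI feature selection | LR Simple | LR OVA | LR OVO | DT Simple | DT OVA | DT OVO | SVM Simple | SVM OVA | SVM OVO | Adaboost Simple | Adaboost OVA | Adaboost OVO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Features used | 85 | 85 | 85 | 43 | 43 | 43 | 50 | 50 | 50 | 60 | 60 | 60 |
| Accuracy (%) | 82 | 80 | 84 | 80 | 73 | 84 | 90 | 90 | 90 | 83 | 90 | 90 |
| Precision (%) | 90 | 90 | 91 | 85 | 88 | 88 | 87 | 88 | 87 | 88 | 89 | 90 |
| Recall (%) | 82 | 80 | 84 | 80 | 73 | 84 | 90 | 90 | 90 | 83 | 90 | 90 |
| F1-score (%) | 86 | 85 | 87 | 82 | 80 | 85 | 88 | 89 | 88 | 85 | 90 | 90 |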

Table 4. Performance metrics of 4 classification models with MRMR based feature selection.

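| MRMR feature selection | LR Simple | LR OVA | LR OVO | DT Simple | DT OVA | DT OVO | SVM Simple | SVM OVA | SVM OVO | Adaboost Simple | Adaboost OVA | Adaboost OVO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Features used | 84 | 84 | 84 | 18 | 18 | 18 | 61 | 61 | 61 | 56 | 56 | 56 |
| Accuracy (%) | 82 | 79 | 82 | 84 | 77 | 82 | 90 | 90 | 90 | 88 | 90 | 91 |
| Precision (%) | 90 | 90 | 90 | 86 | 87 | 86 | 87 | 87 | 87 | 86 | 88 | 90 |
| Recall (%) | 82 | 79 | 82 | 84 | 77 | 82 | 90 | 90 | 90 | 88 | 90 | 91 |
| F1-score (%) | 85 | 84 | 85 | 85 | 81 | 84 | 88 | 88 | 88 | 87 | 89 | 90 |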

Table 5. Performance metrics of 4 classification models with Boruta feature selection.

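| Boruta feature selection | LR Simple | LR OVA | LR OVO | DT Simple | DT OVA | DT OVO | SVM Simple | SVM OVA | SVM OVO | Adaboost Simple | Adaboost OVA | Adaboost OVO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Features used | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 |
| Accuracy (%) | 82 | 80 | 82 | 84 | 75 | 86 | 90 | 90 | 91 | 85 | 90 | 91 |
| Precision (%) | 90 | 90 | 90 | 87 | 88 | 89 | 85 | 86 | 86 | 88 | 89 | 90 |
| Recall (%) | 82 | 80 | 82 | 84 | 75 | 86 | 90 | 90 | 91 | 85 | 90 | 91 |
| F1-score (%) | 86 | 84 | 86 | 85 | 80 | 87 | 88 | 88 | 88 | 86 | 89 | 90 |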

Table 6. Performance metrics of 4 classification models with RFE based feature selection.

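| RFE feature selection | LR Simple | LR OVA | LR OVO | DT Simple | DT OVA | DT OVO | SVM Simple | SVM OVA | SVM OVO | Adaboost Simple | Adaboost OVA | Adaboost OVO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Features used | 70 | 70 | 70 | 40 | 40 | 40 | 58 | 58 | 58 | 16 | 16 | 16 |
| Accuracy (%) | 81 | 79 | 81 | 84 | 70 | 86 | 91 | 91 | 91 | 74 | 82 | 85 |
| Precision (%) | 90 | 90 | 90 | 87 | 88 | 88 | 87 | 88 | 87 | 86 | 86 | 87 |
| Recall (%) | 81 | 79 | 81 | 84 | 70 | 86 | 91 | 91 | 91 | 74 | 82 | 85 |
| F1-score (%) | 85 | 83 | 85 | 85 | 78 | 87 | 89 | 89 | 89 | 79 | 84 | 96 |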

A graphical representation of the aforementioned metrics is shown in the bar graphs below. Figures 18, 19, 20, 21, 22, and 23 show the accuracy, precision, recall, and F1-score of all four algorithms (in ordinary, OVA, and OVO implementations) under the six scenarios, that is, without any feature selection and with the variance based, MI based, MRMR based, Boruta, and RFE based feature selection methods respectively.

Figure 18. Performance metrics of 4 classification models with No feature selection.

Figure 19. Performance metrics of 4 classification models with Variance based feature selection.

Figure 20. Performance metrics of 4 classification models with Mutual Information feature selection.

Figure 21. Performance metrics of 4 classification models with MRMR feature selection.

Figure 22. Performance metrics of 4 classification models with Boruta feature selection.

Figure 23. Performance metrics of 4 classification models with RFE feature selection.

It can be observed from the above tables and graphs that, out of the 72 models implemented, the highest accuracy obtained without any feature selection is 91%, by the SVM-OVO model with 122 features. However, the same accuracy is also achieved by the Adaboost-OVO model with 95 features selected using Boruta and with only 56 features selected using MRMR, and by the SVM models with fewer than half the number of features, i.e., 58 features selected using RFE.

The second highest accuracy of 90% is provided by the Adaboost-OVO model using 122 features, and the same performance is also achieved by the SVM model with only 50 features, 61 features, and 95 features selected using the MI based, MRMR based, and Boruta feature selection methods respectively, as well as by the Adaboost-OVA model with 60 features selected using MI feature selection.

Besides the performance metrics discussed above, the benefit of the various feature selection methods is assessed by comparing the training times of the 60 models trained on reduced feature sets with those of the models trained without feature selection. Training time is an indication of the burden on the model: the lower the training time, the lower the burden. Table 7 provides the model training times in seconds.

Table 7. Model training times for different feature selection techniques (in seconds).

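| Algorithm | Mode of Implementation | No FS | Variance FS | Boruta FS | MI FS | MRMR FS | RFE FS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LR | Ordinary | 0.475 | 0.462 | 0.221 | 0.276 | 0.260 | 0.419 |
| LR | OVA | 1.216 | 0.853 | 0.480 | 0.578 | 0.428 | 0.858 |
| LR | OVO | 0.955 | 1.062 | 0.395 | 0.498 | 0.413 | 0.721 |
| DT | Ordinary | 0.621 | 0.354 | 0.256 | 0.208 | 0.088 | 0.295 |
| DT | OVA | 3.689 | 2.382 | 1.451 | 0.925 | 0.311 | 1.709 |
| DT | OVO | 1.485 | 1.389 | 0.580 | 0.448 | 0.177 | 0.760 |
| SVM | Ordinary | 1.289 | 1.561 | 0.548 | 0.457 | 0.474 | 1.254 |
| SVM | OVA | 4.627 | 4.641 | 1.804 | 1.691 | 1.429 | 4.684 |
| SVM | OVO | 1.554 | 1.793 | 0.637 | 0.655 | 0.691 | 1.648 |
| Adaboost | Ordinary | 5.284 | 3.135 | 2.169 | 2.258 | 1.564 | 1.067 |
| Adaboost | OVA | 34.651 | 24.067 | 15.183 | 14.569 | 10.250 | 6.754 |
| Adaboost | OVO | 27.842 | 23.871 | 11.458 | 11.044 | 7.509 | 7.530 |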
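
To make the size of the reductions easier to read, the training times can be converted into speed-up factors relative to training without feature selection. The short sketch below does this for the Adaboost rows of Table 7; the dictionary layout and the helper name `speedups` are illustrative choices, and the numbers are copied from the table.

```python
# Adaboost training times in seconds, copied from Table 7
adaboost_times = {
    "Ordinary": {"No FS": 5.284, "Variance": 3.135, "Boruta": 2.169,
                 "MI": 2.258, "MRMR": 1.564, "RFE": 1.067},
    "OVA": {"No FS": 34.651, "Variance": 24.067, "Boruta": 15.183,
            "MI": 14.569, "MRMR": 10.250, "RFE": 6.754},
    "OVO": {"No FS": 27.842, "Variance": 23.871, "Boruta": 11.458,
            "MI": 11.044, "MRMR": 7.509, "RFE": 7.530},
}

def speedups(row):
    """Speed-up of each feature selection scenario relative to No FS."""
    base = row["No FS"]
    return {fs: round(base / t, 2) for fs, t in row.items() if fs != "No FS"}

for mode, row in adaboost_times.items():
    print(mode, speedups(row))
```

For example, the Adaboost-OVA model trains roughly five times faster with the RFE-selected features than with all 122 features.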

Figure 24 shows the training times taken by the 4 algorithms in their ordinary implementation for the six feature selection scenarios: No feature selection, Variance based FS, Boruta FS, Mutual Information based FS, Maximum Relevance Minimum Redundancy based FS, and Recursive Feature Elimination based FS. Similarly, Figure 25 and Figure 26 show the training times taken by the 4 algorithms for the 6 feature selection scenarios in their OVA and OVO implementations respectively.

Figure 24. Ordinary model training times for different feature selection (in seconds).

It can be observed from Figures 24, 25, and 26 that the model training times decrease somewhat for the LR and DT models and are significantly reduced for the SVM and Adaboost models. This decrease in training times indicates a reduced burden on the prediction models, as they no longer have to learn from large amounts of data, while model performance is maintained.

Figure 25. OVA model training times for different feature selection (in seconds).

Figure 26. OVO model training times for different feature selection (in seconds).

5. Conclusion

The primary objective of this research work is to strike a balance between the predictive performance of the model and the burden of training the model. As discussed in Section 4, the use of selective predictors extracted by feature selection techniques has provided results similar to those of the models without feature selection. It can also be observed that for some sophisticated models such as Adaboost, performance has improved significantly with feature selection: the accuracy of 74% (without any feature selection, 5.3 seconds training time) increased to 85% (with Boruta feature selection, only 2.17 seconds training time) and 88% (with MRMR feature selection, 1.6 seconds training time). Similarly, the accuracy of the DT-OVO model improved from 84% with 122 features (1.48 seconds training time) to 86% with Boruta feature selection (0.58 seconds training time) and to 86% with only 40 features selected using RFE (0.76 seconds training time). The advantages of these reduced training times become more evident when dealing with a large number of data instances. Overall, this experiment shows that, while maintaining a decent level of predictive performance, it is essential to keep the number of predictors optimal so as to reduce the model training burden. In future, several feature selection techniques can be hybridized and applied to individual machine learning models as well as ensemble models to enhance predictive performance while keeping a check on the number of essential features.

how to cite this article
Mishra S, Pandey M and Routaray SS. Feature optimized hybrid model for prediction of myocardial infarction [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:78 (https://doi.org/10.12688/f1000research.160393.1)