A Differential Evolution-Based Optimized Ensemble for Balanced and Imbalanced Medical Datasets

Surajit Das; Samaleswari P. Nayak; Biswajit Sahoo; Satyananda Champati Rai

doi:10.12688/f1000research.169456.1

Home Browse A Differential Evolution-Based Optimized Ensemble for Balanced and...

ALL Metrics

-

Views

-

Downloads

Get PDF

Get XML

Export

▬

✚

Research Article

A Differential Evolution-Based Optimized Ensemble for Balanced and Imbalanced Medical Datasets

[version 1; peer review: awaiting peer review]

Surajit Das¹, Samaleswari P. Nayak², Biswajit Sahoo ¹, Satyananda Champati Rai¹

PUBLISHED 29 Sep 2025

Author details Author details

¹ School of Computer Engineering, Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, 751024, India
² Department of Computer Science and Engineering, Silicon University, Bhubaneswar, Odisha, 751024, India

Surajit Das
Roles: Data Curation, Formal Analysis, Software, Writing – Original Draft Preparation

Samaleswari P. Nayak
Roles: Conceptualization, Project Administration, Supervision

Biswajit Sahoo
Roles: Funding Acquisition, Investigation, Validation

Satyananda Champati Rai
Roles: Methodology, Resources, Visualization, Writing – Review & Editing

OPEN PEER REVIEW

REVIEWER STATUS AWAITING PEER REVIEW

This article is included in the AIDR: Artificial Intelligence for Data Discovery and Reuse collection.

This article is included in the Data: Use and Reuse collection.

This article is included in the AI in Medicine and Healthcare collection.

Abstract

Background

Class imbalance is a frequent and severe problem in medical datasets, where instances from the minority class are usually high risk or disease positive. Most traditional classifiers suffer from a biasness towards the majority class, resulting in a poor detection rate of the minority class and, therefore, decreased confidence in prediction systems in medical applications.

Methods

In this paper, we present an optimized ensemble by differential evolution (OEDE), a novel ensemble learning framework, to address this problem. OEDE harmonizes three dissimilar base learners (Logistic Regression, Random Forest, and XGBoost) and trains each using class-balancing techniques. Next, the model utilized Differential Evolution (DE) to discover the most appropriate ensemble weights to maximize the area under the ROC curve (AUC) on a validation dataset.

Result

We conducted experiments on four real-world medical datasets, whose imbalance ratios vary from 1.89 to 14.6, using OEDE in the original, SMOTE, and ADASYN balanced conditions. The results indicate that the proposed OEDE consistently outperforms or is competitive with traditional ensemble models in terms of AUC, F1-score, and Recall. ROC curve analysis also approved the OEDE’s superior discriminative capabilities.

Conclusion

The proposed OEDE framework effectively improves minority class detection in imbalance medical datasets. Its robust and flexible design makes it a promising tool for healthcare risk prediction tasks where minority class groups need to be well identified.

Keywords

Ensemble Learning, Differential Evolution, Class Imbalance,AUC Optimization, SMOTE, ADASYN.

Corresponding author: Biswajit Sahoo

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2025 Das S et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Das S, Nayak SP, Sahoo B and Champati Rai S. A Differential Evolution-Based Optimized Ensemble for Balanced and Imbalanced Medical Datasets [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:1003 (https://doi.org/10.12688/f1000research.169456.1) First published: 29 Sep 2025, 14:1003 (https://doi.org/10.12688/f1000research.169456.1) Latest published: 29 Sep 2025, 14:1003 (https://doi.org/10.12688/f1000research.169456.1)

1. Introduction

Machine learning has become increasingly popular in medical and healthcare services in recent years because it can be employed to analyze multidimensional datasets and detect subtle patterns that may not be detectable using traditional standard statistical methods.^1,2 Disease diagnosis, estimation of survival, and risks using machine learning models are becoming more common in assisting clinical decision-making. However, one of the most common and serious issues in medical datasets is class imbalance, in which an individual class label with a smaller number of instances occurs more infrequently than the others.³ As a result, traditional models may be biased in favour of the majority class, which may lead to a loss of sensitivity and misclassification of rare but significant minorities.

Many real-world medical datasets, such as cancer prediction, surgical outcomes, and disease screening, suffer from moderate to severe class imbalance. Models trained on imbalanced data often achieve high overall accuracy because they only predict the majority class but may have a poor generalization to the minority class. This is a problem in healthcare, where a missed positive result can be catastrophic. People face this challenge with traditional resampling methods such as SMOTE⁴ and ADASYN⁵ and ensemble methods⁶ such as AdaBoost (AB), CatBoost (CB), Gradient Boost (GB), XGBoost (XGB), Random Forest (RF), Balanced Random Forest (BRF), LightGBM (LGBM), Easy Ensemble (EE), and Extra Trees (ET). Some of these studies enhanced the performance to some extent, but they were not flexible in assigning weights based on the weights and did not directly optimize the corresponding relevant evaluation measure.

For imbalance classification, there is still a gap in ensemble model design, where most models use static or heuristic weights for the base models. Hence, an adaptive ensemble model is needed that incorporates class balance into training and learns to correctly weight the outputs of the base learners. To address these goals, we proposed a novel ensemble model called Optimized Ensemble by Differential Evolution (OEDE) using an adaptive weighted ensemble, where the optimal weights are determined by Differential Evolution (DE). The main contributions of this study are as follows:

• Integration of Class-balanced base learning.
• Differential Evolution-based weight Optimization.
• Robust evolution on diverse medical datasets.
• Comprehensive comparison with state-of-the-art models and visualization.

2. Related work

Dey and Pratap⁷ studied different oversampling techniques, such as SMOTE, Borderline-SMOTE, and ADASYN, on different statistical models, such as SVM, KNN, GNB, DT, and RF, concluding that RF combined with SMOTE outperforms others. T.-C.T. Chen et al.⁸ used an ensemble approach for the classification of diabetes, where DNN performed the classification, and a modified RF was used to explain the results of the proposed model. MARTÍNEZ-VELASCO⁹ et al. have used both oversampling and under-sampling along with 8 different ML models and concluded that the balanced bagging and balanced RF (BRF) beat every other setup, even without balancing the dataset. To address this class imbalance problem, Agyemang et al.¹⁰ used several oversampling techniques such as Random Oversampling (RO), SMOTE, SMOTE-Tomek, and ADASYN, along with ML models such as K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT), and concluded that RO-SVM gave the best result. Abayomi-Alli et al.¹¹ proposed a 2-phase ensemble model combining DNN with 15 other ML models (ExtraTrees, SVM, RBF, etc.) for COVID-19 classification, showing that the DNN-ExtraTrees ensemble performed better than the other combinations.

Elgendy et al.¹² used a stacking-based ensemble of seven base models for diabetes prediction and concluded that the stacked multilayer perceptron (MLP) provides the highest accuracy. Dutta et al.¹³ applied weighted average ensemble strategies with GNB, BNB, RF, DT, XGB, and LGB and concluded that the DT+RF+XGB+LGB pair achieves 73.5% accuracy, which is the highest among all other pairs. Alzakari et al.¹⁴ proposed a two-stage ensemble combining XGBoost and Bi-LSTM, where XGBoost performs feature selection and early classification and Bi-LSTM performs second stage classification and pattern recognition. Das et al.¹⁵ conducted studies of different ML models in different class-imbalanced datasets and concluded that RF performs remarkably well in both balanced and imbalanced datasets. Senthilvadivu et al.¹⁶ used RF and XGB for decision making in ICU patients and showed that XGB performs better than RF.

For the prediction of heart disease, Abdellatif et al.¹⁷ proposed a weighted random forest ensemble model, used along with an infinite feature selection strategy, and concluded that the proposed model performed better than the SMOTE-RF combination. Yalin et al.¹⁸ proposed the XGBoost-BLR method for the classification of diabetes, where XGBoost is used to transform selected features into higher dimensions, and binary logistic regression (BLR) was used for modelling the higher dimensional data. Abnoosian el al.¹⁹ used a normalized weighted ensemble to aggregate the results of six different ML models for the classification of diabetes, showing that the proposed ensemble model performed better than the individual base models. Liu et al.²⁰ used bagging to overcome the problem of an imbalanced dataset, where LR was used for feature selection, and SVM was used as a weak classifier. For heart disease prediction, Masruriyah et al.²¹ used SMOTE and ADASYN along with four ML models and concluded that the oversampling techniques cause a reduction in the accuracy of the model.

Pablo et al.²² performed an analysis of different ML models and sampling techniques on the COVID-19 dataset and concluded that MLP stood out strongly in all combinations, whereas SVM gave lower performance in all combinations. To classify COVID-19, Chowdhury et al.²³ proposed a two-step ensemble pipeline with four different ML models, where the results of KNN, SMV, and XGB were passed through RF for final prediction, and concluded that the proposed model outperformed the existing models. Prithula et al.²⁴ proposed a stacking ensemble with ET, RF, and CB as base models and GB as a meta-learner, and concluded that CB outperforms the proposed model. A study by Chowdhury et al.²⁵ Performance analysis of different ML models was performed in the original dataset, and after the dataset was balanced using oversampling, under-sampling, and hybrid-sampling techniques, it was shown that the ML models performed better in the original dataset, except that there was a minor improvement in recall after the dataset was balanced. To handle the imbalanced dataset, Mienye and Sun²⁶ proposed four cost-effective ML models by tuning the hyperparameters and concluded that cost-sensitive XGB performs better than the other models.

In summary (as shown in Table 1), various strategies have been explored to address the class imbalance problem. Oversampling methods such as SMOTE and ADASYN, under-sampling techniques such as RUS and ENN, and hybrid sampling techniques such as SMOTE-ENN and SMOTE-Tomek have been widely used to improve the performance of the models. Many studies have shown that ensemble models can improve the performance by assembling multiple base models. In fact, models such as RF and XGBoost are highly effective across different healthcare applications, as shown in different studies. Overall, the literature suggests that there is no universal solution, and that the selection of the technique is typically based on the nature of the dataset.

Table 1. Literature review.

Author	Disease	Models	Is the original dataset imbalanced	Imbalance handle strategy	Observation
Dey and Pratap⁷	Diabetes and Breast Cancer	SVM, KNN, GNB, DT, RF	Yes	SMOTE, Borderline-SMOTE, ADASYN	RF in combination with SMOTE outperforms others.
Chen et al.⁸	Diabetes	DNN, RF	Yes	-	RF is used to explain the result of DNN.
Martínez-velasco et al.⁹	Age-Related Macular Degeneration and Preeclampsia	Balanced Bagging, BRF, RF, GB, KNN, LR, SVM, DT	Yes	SMOTE and Under-sampling	Balanced bagging and BRF outperform others even in imbalanced datasets.
Agyemang et al.¹⁰	Stroke	KNN, SVM, LR, RF, DT	Yes	RO, ADASYN, SMOTE, SMOTE–Tomek	SVM with Random Oversampling performs better.
Abayomi-Alli et al.¹¹	COVID-19	DNN, ExtraTrees, SVM, RBF etc.	Yes	SMOTE	DNN-ExtraTrees ensemble outperforms others.
Elgendy et al.¹²	Diabetes	LR, RF, MLP, AB, GB etc.	Yes	SMOTE	MLP with Staking gives the highest accuracy.
Dutta et al.¹³	Diabetes	GNB, BNB, RF, DT, XGB, LGB	Yes	-	DT, RF, XGB, and LGB combination. performance than others.
Alzakari et al.¹⁴	Heart Disease	XGB, Bi-LSTM	Yes	-	XGB performs feature selection and early classification, and Bi-LSTM performs second-stage classification.
Das et al.¹⁵	Diabetes, Cancer	RF, XGB, AB, CB etc.	Yes	SMOTE	RF performs well in both balanced and imbalance datasets.
Senthilvadivu et al.¹⁶	ICU Condition	RF, XGB	Yes	-	XGB performs better than RF.
Abdellatif et al.¹⁷	Heart Disease	RF	Yes	SMOTE	Proposed weighted RF performs better than SMOTE-RF.
Yalin et al.¹⁸	Diabetes	LR, XGB	Yes	-	The proposed ensemble outperforms other base models.
Abnoosian et al.¹⁹	Diabetes	KNN, SVM, DT, RF, AB, GNB	Yes	-	The proposed ensemble outperforms the base models.
Liu et al.²⁰	Cardiovascular disease	LR, SVM, Bagging	Yes	Undersampling	SVM is used as a weak learner for bagging.
Masruriyah et al.²¹	Heart Disease	C4.5, RF, SVM, LR	Yes	SMOTE, ADASYN	Oversampling decreases the model's performance.
Pablo et al.²²	COVID-19	MLP, XGB, NB, DT, SVM	Yes	POS, RUS, SMOTE, ADASYN	MLP stood out strongly for all experimental setups.
Chowdhury et al.²³	COVID-19	KNN, SVM, RF, XGB	Yes	SMOTE	The proposed pipeline outperforms other pairs and the base models.
Prithula et al.²⁴	Respiratory diseases	MLP, XGB, DT, SVM, AB CB etc	Yes	SMOTE	CB performs better than the proposed model.
Chowdhury et al.²⁵	Diabetes	LR, RF, AB, GB, Voting	Yes	ENN, SOMTE_N, SMOTE-ENN, SMOTE-Tomek	The performance of the models is better in original dataset.
Mienye and Sun²⁶	Diabetes, Cancer, CKD	LR, DT, XGB, RF	Yes	-	Cost-sensitive XGB performs better than others.

3. Proposed methodology

To enhance the prediction accuracy on an imbalanced dataset, we propose a novel ensemble approach called Optimized Ensemble by Differential Evolution (OEDE). An overview of the proposed methodology is shown in Figure 1. Four different medical datasets with different imbalance ratio (1.89 to 14.6) were collected from the UCI Machine Learning Repository to assess the robustness of the model. After data preprocessing, a stratified train-test split was applied to maintain the class distribution in the training, test, and valuation sets. Logistic Regression (LR), Random Forest (RF), and XGBoost (XGB) were used as the base models, and Differential Evolution (DE) was used to combine their predictive strength based on prediction probability on the validation set, with Area Under the ROC curve (AUC) as the optimization objective. Before constructing the final model, the base learners were fine-tuned using GridSearchCV and Stratified-K-Fold cross-validation to ensure a robust model under class imbalance. The performance of OEDE was tested on the original imbalanced dataset, dataset balanced with SMOTE, and ADASYN. A performance comparison was performed against existing ensemble models such as AB, CB, LGBM, ET, EE, and RBF.

Figure 1. Proposed methodology.

3.1 Datasets

To evaluate the robustness and adaptability of the proposed OEDE model, we used four widely used public medical datasets from the UCI Machine Learning Repository, with different class Imbalance Ratios (IR), as shown in Table 2. These datasets were carefully chosen to cover a broad range of real-life situations where minority class data are clinically significant. The Pima Indiana Diabetes Dataset (IR = 1.89) includes the inception of diabetes in female patients based on diagnostic measurements. The Haberman’s Cancer Survival Dataset (IR = 2.78) has patient records for those who had undergone breast cancer surgery, aimed at predicting post-operative survival. The Thoracic Surgery Dataset (IR = 7.27) is based on predicting survival following major lung surgery for patients, considering clinical and surgical factors. Finally, the Cervical Cancer Risk Dataset (IR = 14.6) uses personal health information and screening results to assess the risk of cervical cancer. These datasets provide a range of imbalance ratios where the minority class proportion is less than 40%, offering a robust testbed for validating the effectiveness of the OEDE.

Table 2. Summary of datasets used.

Dataset name	No. of instances	No. of features	Minority class	Minority class proportion (%)	Imbalance Ratio (IR)
Pima Indians Diabetes²⁷	768	9	Positive	34.9%	1.89
Haberman’s Cancer²⁸	306	4	Negative	26.5%	2.78
Thoracic Surgery²⁹	470	17	Negative	12.1%	7.27
Cervical Cancer Risk³⁰	858	36	Positive	6.4%	14.6

3.2 Base models

The proposed ensemble model leverages three different base learners, namely Logistic Regression (LR), Random Forest (RF), and XGBoost (XGB), with the aim of improving predictive performance based on model diversity. The LR is a linear model that predicts the probability of a binary outcome using a sigmoid function.³¹ Its interpretability and probabilistic output make it a valuable baseline, particularly when modelling linear relationships.^32,33 RF is an ensemble of decision trees that introduces non-linearity and robustness by aggregating predictions from a group of trees trained on bootstrapped datasets and random feature subsets, thus removing variance and overfitting.³⁴ It performed well in capturing complex feature interactions.³⁵ XGBoost builds a sequence of trees, where each tree corrects the errors made by its predecessors, optimizing a normalized objective function L = $\sum_{l} l (y_{i}, \hat{y_{i}}) + \sum_{k} Ω (f_{k})$ , where $Ω (f) = γT + \frac{1}{2} λ {‖ w ‖}^{2} .$ ³⁶ Its ability to model non-linearities with good precision,³⁷ regularization, and the ability to handle missing values¹⁸ makes it a strong learner in the ensemble. Together, these models provide distinct perspectives, such as linear separability, variance minimization, and gradient-based optimization, making the ensemble more robust and less prone to overfitting than any single learner.

3.3 Sampling techniques

We used oversampling techniques, the Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN) to address the class imbalance of the original dataset. We assessed the performance of our proposed ensemble model on three different datasets: the original imbalanced dataset, SMOTE-balanced dataset, and ADASYN-balanced dataset. SMOTE creates synthetic instances for the minority class through interpolation between minority samples and their k-closest minority neighbour, thereby expanding the decision boundary and reducing overfitting to specific samples.³⁸ While SMOTE assumes the same significance to all instances of the minority class,³⁹ ADASYN focuses on varying the importance of individual minority instances according to their level of difficulty in learning.⁴⁰ It individually generates a synthetic minority sample, which is harder to classify owing to its lower density in the feature space, thereby promoting the generalization of the challenging part of the data.⁴¹ Through this cross-dataset comparison of model performance, we aim to evaluate not only the generalizability of the proposed ensemble, but also how different balancing strategies influence its performance.

3.4 OEDE

The Optimized Ensemble by Differential Evolution (OEDE) was designed as a novel ensemble model to alleviate the classification difficulties of imbalanced medical datasets. This method incorporates class-balanced base learners and a Differential Evolution (DE)-driven algorithm for AUC maximization. Three fundamentally different classifiers, LR, RF, and XGB, were used as base models. All base models were configured with a class-weighting mechanism during training to address the effects of class imbalance. LE employs class_weight= ‘balanced’, RF uses class_weight=‘balanced_subsample’, while XGB is fine-tuned for logloss with consideration taken for label imbalance. These learners were independently trained on the same dataset and produced probability estimates for the positive class that were combined using a learned ensemble set of weights. OEDE does not combine the outputs of the base learners using fixed or heuristic-based weights; instead, it uses Differential Evolution (DE) for adaptive weight optimization. DE is a stochastic population-based global optimization method that is well-suited for non-differentiable and non-convex functions.^42–45

Let the prediction probabilities from M base classifiers for a given instance x be denoted by [p₁(x), p₂(x), …, p_M(x)]. Assign weight w_i to each base classifier i, where the weights satisfy:

For the proposed ensemble, assign a weight w_i to each base model i such that:

(1)

\sum_{i = 1}^{M} w_{i} = 1 and w_{i} \geq 0 for all i

The ensemble’s predicted probability is given by the weighted average:

(2)

p_{ens} (x) = \sum_{i = 1}^{M} w_{i} p_{i} (x)

The goal was to choose the weights w = [w₁, w₂, …, w_M] to maximize the AUC on a validation set. Recall that the AUC represents the probability that a randomly chosen positive instance has a higher score than a randomly chosen negative instance.

(3)

AUC = \frac{1}{| S^{+} ‖ S^{-} |} \sum_{i \in S^{+}} \sum_{j \in S^{-}} I (p_{ens} (i) > p_{ens} (j))

where S⁺: index of positive instances, S^-: index of negative instances, I: indicator function.

Because many optimization algorithms (such as Differential Evolution) are formulated as minimization problems, the loss function is defined as the negative AUC:

(4)

L (w) = - AUC (y_{val}, p_{ens} (val))

Thus, the optimization problem becomes:

(5)

min_{w} L (w) subject to \sum_{i = 1}^{M} w_{i} = 1, w_{i} \geq 0 \forall i

DE is selected for this task because of its robustness in optimizing non-differentiable and non-convex functions such as AUC. DE evolves a population of weight vectors over generations using mutation, crossover, and selection to converge to a globally optimal solution. The proposed process is described by Algorithm 1.

Algorithm 1. Optimized Ensemble by Differential Evolution (OEDE).

Input:

- Traini Set: X_train, y_train
- Validation Set: X_val, y_val
- Test Set: X_test, y_test
- DE parameters: population size P, generations G

Output:

- Test set predictions y_pred
- Evaluation metrics: Accuracy, Precision, Recall, F1-score, AUC

1. Train LR, RF, XGB on (X_train, y_train)

2. For each model m ∈ {LR, RF, XGB}:

3. p_val_m = m.predict_proba(X_val)[:, 1]

4. P_val = [p_val_LR, p_val_RF, p_val_XGB]

5. function AUC_Loss(weights, P_val, y_val):

6. Normalize weights: w = weights/sum (weights)

7. Ensemble prediction: p_ens = dot(P_val, w)

8. Return -AUC(y_val, p_ens)

9. bounds = [(0, 1), (0, 1), (0, 1)]

10. w_opt = DifferentialEvolution (AUC_Loss, bounds, args=(P_val, y_val))

11. w_opt: w_opt = w_opt/sum(w_opt)

12. For each m ∈ {LR, RF, XGB}:

13. p_test_m = m.predict_proba(X_test)[:, 1]

14. P_test = [p_test_LR, p_test_RF, p_test_XGB]

15. p_test_ens = dot(P_test, w_opt)

16. y_pred = 1 if p_test_ens ≥ 0.5 else 0

17. Return y_pred, evaluation metrics

3.5 Performance matrices

We used five common yet important classification metrics such as Accuracy, Precision, Recall, F1-Score, and Area Under the ROC curve (AUC), to evaluate the performance of the proposed OEDE model. While working with the imbalanced dataset, these measures offer a comprehensive insight into the efficiency of the model. Let P_C, N_C, P_E, and N_E represent the numbers of correctly classified positives, correctly classified negatives, false positives, and false negatives, respectively.

• The model’s overall correctness is gauged by accuracy.

(6)

Accuracy = \frac{P_{c} + N_{E}}{P_{c} + N_{C} + P_{E} + N_{E}}

• The percentage of correct positive predictions among all positive predictions is known as the precision.

(7)

Precision = \frac{P_{c}}{P_{c} + P_{E}}

• Recall shows how well the model recognizes the actual positives.

(8)

Recall = \frac{P_{c}}{P_{c} + N_{E}}

• F1-Score balances the Precision and Recall.

(9)

F 1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}

When it comes to an imbalanced dataset, the AUC is especially significant, which represents the area under the ROC curve by plotting the true positive rate $(\frac{P_{C}}{P_{c} + N_{E}})$ and false positive rate $(\frac{P_{E}}{P_{E} + N_{C}})$ across different classification thresholds. A higher AUC represents better separability. Finally, we visualized the ROC curve of each model, which offered an intuitive view of the trade-off between false alarms and sensitivity. The closer the curve is to the upper-left corner, the better is the classifier.

4. Results and discussion

The experiment involved four different medical datasets with different imbalance ratios (1.89 – 14.6) used for the performance analysis of the proposed model. This section provides a detailed analysis of the proposed model and compares the performance matrices with the existing base models after training them in three different conditions: the original imbalanced dataset, data balanced using SMOTE, and data balanced using ADASYN. The performance matrices were evaluated using a never-observed imbalance test split.

4.1 Pima indiana diabetes dataset

On the original dataset, OEDE achieved high accuracy and AUC score, as shown in Table 3, effectively discriminating between diabetic and non-diabetic samples without artificial rebalancing. Although ensemble models such as RF and XGBoost demonstrated competitive accuracy, their AUC and recall values were low, indicating a bias towards the majority classes. The OEDE tended to have a high F1 score, while maintaining a better balance between precision and recall, as shown in Figure 2. The overall performance of all models improved when the dataset was balanced using SMOTE, as indicated by the increased recall and F1-score, but OEDE still outperformed the others in terms of accuracy, AUC, and F1-score. Notably, the change in the AUC value was marginal, indicating that merely balancing the data may not be sufficient. In the ADASYN-balanced scenario, it tends to echo those of SMOTE but with minor instability for some models because ADASYN introduces noise in the minority samples. OEDE again outperformed other baseline models with a minor drop in precision while maintaining the AUC and showed a balanced performance throughout the datasets.

Table 3. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on Pima Indiana diabetes dataset.

Models	Original imbalanced data		SMOTE-Balanced data		ADASYN-Balanced data
Models	Accuracy	AUC	Accuracy	AUC	Accuracy	AUC
AB	74.46	80.1	77.06	82.4	74.46	80.44
BRF	76.19	83.9	76.19	83.49	76.62	84.17
ET	75.76	82.4	73.59	81.79	74.03	81.5
GB	74.89	82.77	75.32	84.26	77.92	82.94
LGBM	74.03	81.52	76.19	80.98	75.76	81.81
RF	76.19	83.15	75.76	85.35	75.76	84.16
XGB	76.19	81.33	76.19	80.89	77.06	80.87
CB	77.06	83.82	77.06	83.93	77.06	82.45
EE	76.19	79.81	77.06	82.4	75.32	80.05
OEDE	77.49	84.02	77.92	84.31	77.92	84.37

Figure 2. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on the Pima Indiana Diabetes Dataset.

4.2 Haberman’s cancer dataset

OEDE achieved a higher accuracy, AUC, and F1-score than traditional ensemble models such as RF, AB, and CB in an imbalanced dataset, as shown in Table 4. While some models show comparable accuracy, OEDE shows stability in precision and recall, leading to a better F1-score as shown in Figure 3, demonstrating its ability to detect the minority class without compromising the overall correctness. After the dataset is balanced with SMOTE, an improvement is observed in recall for most of the models, which shows that the models benefit from the synthetic sample. The OEDE maintained its AUC lead, highlighting its ability to influence informative patterns more effectively, even in a balanced dataset. Its high F1-score reflects food robustness against overfeeding for synthetic data. When ADASYN is used for data balancing, some models show instability in precision and recall, as ADASYN tends to produce harder-to-learn synthetic samples, while OEDE has a high-performance score without sacrificing stability and sensitivity.

Table 4. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on Haberman’s Cancer Dataset.

Models	Original imbalanced data		SMOTE-Balanced data		ADASYN-Balanced data
Models	Accuracy	AUC	Accuracy	AUC	Accuracy	AUC
AB	70.65	61.04	70.65	66.75	71.74	67.74
BRF	68.48	62.88	67.39	61.68	64.13	61.8
ET	70.65	60.61	68.48	64.22	66.3	64.66
GB	71.74	68.76	68.48	67.45	69.57	67.4
LGBM	65.22	65.33	66.3	61.77	65.22	62.47
RF	69.57	66.03	68.48	66.08	65.22	66.9
XGB	71.74	67.77	66.3	66.61	65.22	66.14
CB	67.39	70.8	71.74	66.96	71.74	67.74
EE	65.22	68.12	70.65	66.75	68.48	66.78
OEDE	72.83	73.25	68.48	70.37	69.57	66.33

Figure 3. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on Haberman’s Cancer Dataset.

4.3 Thoracic surgery data

The thoracic surgery dataset showed a significant imbalance in class distribution, challenging most of the classifiers that favour the majority classes. OEDE outperformed traditional ensemble models such as RF, LGBM, and BRF in terms of AUC and F1-score, as shown in Table 5 and Figure 4, while the original imbalanced dataset was used. Although many baseline models achieve relatively high accuracy, OEDE maintains its balanced performance, achieving higher recall for minority classes and offering better overall performance. With the SMOTE-balanced dataset, most models improve recall values owing to synthetic samples. OEDE continues to display a superior performance matrix, especially the AUC value, which indicates the robustness of the model even when the dataset is balanced with synthetic data without dropping precision or overfitting. Similar to the previous datasets, the performance of the baseline models fluctuated in the ADASYN-balanced dataset, but OEDE maintained a stable performance and showed high F1 and AUC scores. This generalizability and consistent performance are possible owing to the differential evaluation-based weight optimization.

Table 5. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on the Thoracic Surgery dataset.

Models	Original imbalanced data		SMOTE-Balanced data		ADASYN-Balanced data
Models	Accuracy	AUC	Accuracy	AUC	Accuracy	AUC
AB	80.14	47.15	71.63	44.14	73.05	50.11
BRF	81.56	50.48	77.31	47.61	75.89	46.19
ET	80.14	48.11	78.01	50.89	78.01	45.76
GB	82.98	56.29	76.6	54.93	78.01	50.04
LGBM	82.98	50.57	78.01	53.45	78.01	50.18
RF	79.43	50.82	76.6	47.86	79.43	48.15
XGB	82.98	50.82	76.6	44.23	74.47	43.38
CB	58.87	49.61	78.01	52.62	79.43	49.73
EE	57.45	57.59	71.63	44.14	72.34	47.19
OEDE	84.4	70.08	81.56	69.37	79.43	65.4

Figure 4. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on the Thoracic surgery dataset.

4.4 Cervical cancer risk dataset

The Cervical Cancer Risk dataset shows a significant difficulty owing to high-class imbalance, where high-risk cases are less than 7% of the dataset, which tends to affect the performance of the traditional model. On the imbalanced dataset, most baseline models, such as RF, ET, and CB, struggle with the minority class and show a low recall and F1-score, despite considerable accuracy, as shown in Table 6 and Figure 5. In contrast, OEDE is able to distinguish the minority class effectively owing to adaptive weighting, as reflected in the AUC and F1-score. Again, SMOTE improves the recall for all models, including OEDE, while maintaining competitive precision and achieving the highest AUC. For the ADASYN-balanced dataset, OEDE maintained high performance with the highest AUC and F1-score, supporting its resilience and generalizability.

Table 6. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on the cervical cancer risk dataset.

Models	Original imbalanced data		SMOTE-Balanced data		ADASYN-Balanced data
Models	Accuracy	AUC	Accuracy	AUC	Accuracy	AUC
AB	93.57	81.45	94.96	89.22	94.19	92.24
BRF	94.35	94.94	95.35	96.3	94.35	95.47
ET	94.35	92.27	95.35	89.33	94.57	88.31
GB	94.35	94.13	95.35	96.38	94.74	95.02
LGBM	94.74	94.05	94.96	94.29	94.96	93.03
RF	94.74	94.27	94.96	94.6	94.35	94.05
XGB	94.35	93.42	94.74	96.21	94.35	96.08
CB	92.8	95.17	94.35	97.23	94.35	96.89
EE	92.8	95.12	94.96	89.22	94.19	92.24
OEDE	95.35	96.73	95.19	97.19	94.96	97.89

Figure 5. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on the Cervical Cancer Risk Dataset.

The box plot overlaid with swarm points ( Figure 6) demonstrates the diversity of the different datasets. The median and IQR of AUC on the Cervical dataset presented the highest AUC scores (~0.95 median), demonstrating excellent and stable classification performance. Even for the Pima dataset, a moderate spread was observed, indicating reasonable generalization for the model. The Haberman and Thoracic datasets, on the other hand, demonstrated lower and more variable AUCs, around a median of 0.66–0.68, suggesting struggles possibly with class imbalance or limited separability.

Figure 6. AUC distribution across datasets.

The performance of the OEDE across the four benchmark datasets demonstrates its robustness and adaptability across balanced and imbalanced datasets. Despite varying levels of class imbalance, OEDE achieved superior performance, outperforming a range of traditional baseline models. To support the numerical findings, ROC curves were plotted ( Figure 7) for all the datasets to visually represent the discrimination ability of the model. Consistently across all datasets, the ROC curve of OEDE was favourable, aligning with the AUC values achieved.

Figure 7. ROC plots of all the models on different datasets.

(A) Pima Indiana Diabetes Dataset. (B) Haberman’s Cancer Dataset. (C) Thoracic Surgery Dataset. (D) Cervical Cancer Risk Dataset.

5. Conclusion

In this study, a novel ensemble model, OEDE, was proposed to address the challenges of imbalanced medical datasets. The key innovation includes the aggregation of the predictive power of diverse classifiers such as LR, RF, and XGB through a differential evolution-based ensemble, which learns the optimal weight by maximizing the AUC. Unlike traditional models, OEDE adapts the decision boundary to enhance the discrimination capacity, especially in the minority class. The approach also includes class-balanced-based learning, ensuring that base models reduce the imbalance at the source, resulting in a robust ensemble model that generalizes well across datasets with different characteristics and imbalance ratios. The performance of OEDE was assessed on four different medical datasets with class imbalance ratios from 1.89 to 14.6, to verify its ability to handle class imbalance data. The results demonstrated that OEDE almost always performed significantly better than the state-of-the-art ML models in terms of AUC, accuracy, and F1-score, and the model was robust under different data balancing techniques such as SMOTE and ADSYN. Adding the ROC curves of all the models and datasets also confirms the superior separability of OEDE and makes it a useful framework for practical real-world classification problems.

Ethical approval

This study does not require any ethical approval since this study only used publicly accessible, de-identified datasets. There were no experiments with human subjects, and no new data were gathered. The datasets do not contain any personally identifiable information and are available from recognized open repositories.

Data availability

The datasets used in this research are publicly available from recognized data repositories and can be accessed through the following links. The Pima Indians Diabetes Dataset, originally hosted on UCI ML Repository is no longer available there. However, it can be accessed via Mendeley Data.

• UCI Machine Learning Repository. Haberman’s Survival Dataset. DOI: 10.24432/C5XK51
• UCI Machine Learning Repository. Thoracic Surgery Data. DOI: 10.24432/C5Z60N
• UCI Machine Learning Repository. Cervical Cancer Dataset. DOI: 10.24432/C5Z310
• Mendeley Data. Pima Indians Diabetes Dataset. DOI: 10.17632/7zcc8v6hvp.1

All the data is publicly available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0).

References

1. Kolasa K, Admassu B, Hołownia-Voloskova M, et al.: Systematic reviews of machine learning in healthcare: a literature review. Expert Rev. Pharmacoecon. Outcomes Res. Jan. 2024; 24(1): 63–115. Publisher Full Text
2. Zhang H, Gerych W, Ghassemi M: A data-centric perspective to fair machine learning for healthcare. Nature Reviews Methods Primers. Nov. 2024; 4(1): 86. Publisher Full Text
3. Roy D, Roy A, Roy U: Learning from Imbalanced Data in Healthcare: State-of-the-Art and Research Challenges.2024; 19–32. Publisher Full Text
4. Kosolwattana T, Liu C, Hu R, et al.: A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly imbalanced data classification in healthcare. BioData Min. Apr. 2023; 16(1): 15. Publisher Full Text
5. Khan TM, Xu S, Khan ZG, et al.: Implementing Multilabeling, ADASYN, and ReliefF Techniques for Classification of Breast Cancer Diagnostic through Machine Learning: Efficient Computer-Aided Diagnostic System. J. Healthc. Eng. Mar. 2021; 2021: 1–15. Publisher Full Text
6. Mahajan P, Uddin S, Hajati F, et al.: Ensemble Learning for Disease Prediction: A Review. Healthcare. Jun. 2023; 11(12): 1808. Publisher Full Text
7. Dey I, Pratap V: A Comparative Study of SMOTE, Borderline-SMOTE, and ADASYN Oversampling Techniques using Different Classifiers. Proceedings - 2023 3rd International Conference on Smart Data Intelligence, ICSMDI 2023. Institute of Electrical and Electronics Engineers Inc.; 2023; pp. 294–302. Publisher Full Text
8. Chen TCT, Wu HC, Chiu MC: A deep neural network with modified random forest incremental interpretation approach for diagnosing diabetes in smart healthcare. Appl. Soft Comput. Feb. 2024; 152. Publisher Full Text
9. Martinez-Velasco A, Martínez -Villaseñor L, Miralles-Pechuán L: Addressing Class Imbalance in Healthcare Data: Machine Learning Solutions for Age-Related Macular Degeneration and Preeclampsia. IEEE Latin America Transactions. 2024. Reference Source
10. Agyemang EF, et al.: Addressing Class Imbalance Problem in Health Data Classification: Practical Application From an Oversampling Viewpoint. Applied Computational Intelligence and Soft Computing. 2025; 1: 2025. Publisher Full Text
11. Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, et al.: An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples. Sensors. Mar. 2022; 22(6). Publisher Full Text
12. Elgendy IA, Hosny M, Albashrawi MA, et al.: Dual-stage explainable ensemble learning model for diabetes diagnosis. Expert Syst. Appl. May 2025; 274. Publisher Full Text
13. Dutta A, et al.: Early Prediction of Diabetes Using an Ensemble of Machine Learning Models. Int. J. Environ. Res. Public Health. Oct. 2022; 19(19). Publisher Full Text
14. Alzakari SA, et al.: Enhanced heart disease prediction in remote healthcare monitoring using IoT-enabled cloud-based XGBoost and Bi-LSTM. Alex. Eng. J. Oct. 2024; 105: 280–291. Publisher Full Text
15. Das S, Nayak SP, Sahoo B, et al.: Evaluating Ensemble Models on Imbalanced Data Sets: A Comparative Study across Varied Minority Class Ratios. ESIC 2024-4th International Conference on Emerging Systems and Intelligent Computing, Proceedings. Institute of Electrical and Electronics Engineers Inc.; 2024; pp. 774–779. Publisher Full Text
16. Senthilvadivu S, Ramesh PS, Narang S, et al.: Impact of Random Forest and XGBoost Algorithms on Improving Patient Outcomes Compared to Standard Decision-Making Methods in Healthcare Predictive Analytics. 2024 International Conference on Cybernation and Computation, CYBERCOM 2024. Institute of Electrical and Electronics Engineers Inc.; 2024; pp. 694–699. Publisher Full Text
17. Abdellatif A, Abdellatef H, Kanesan J, et al.: Improving the Heart Disease Detection and Patients’ Survival Using Supervised Infinite Feature Selection and Improved Weighted Random Forest. IEEE Access. 2022; 10: 67363–67372. Publisher Full Text
18. Wu Y, et al.: Novel binary logistic regression model based on feature transformation of XGBoost for type 2 Diabetes Mellitus prediction in healthcare systems. Futur. Gener. Comput. Syst. Apr. 2022; 129: 1–12. Publisher Full Text
19. Abnoosian K, Farnoosh R, Behzadi MH: Prediction of diabetes disease using an ensemble of machine learning multi-classifier models. BMC Bioinformatics. Dec. 2023; 24(1). Publisher Full Text
20. Liu L, Wu X, Li S, et al.: Solving the class imbalance problem using ensemble algorithm: application of screening for aortic dissection. BMC Med. Inform. Decis. Mak. Dec. 2022; 22(1). Publisher Full Text
21. Masruriyah AFN, Novita HY, Sukmawati CE, et al.: Thorough Evaluation of the Effectiveness of SMOTE and ADASYN Oversampling Methods in Enhancing Supervised Learning Performance for Imbalanced Heart Disease Datasets. 2023 8th International Conference on Informatics and Computing, ICIC 2023. Institute of Electrical and Electronics Engineers Inc.; 2023. Publisher Full Text
22. Ormeño-Arriagada P, Márquez G, Araya D, et al.: Applying Machine Learning Sampling Techniques to Address Data Imbalance in a Chilean COVID-19 Symptoms and Comorbidities Dataset. Applied Sciences (Switzerland). Feb. 2025; 15(3). Publisher Full Text
23. Chowdhury LH, Tabassum S, Shatabda S, et al.: An optimized data analytics pipeline for improving healthcare diagnosis using ensemble learning. Inform. Med. Unlocked. Jan. 2025; 53. Publisher Full Text
24. Prithula J, et al.: Improved pediatric ICU mortality prediction for respiratory diseases: machine learning and data subdivision insights. Respir. Res. Dec. 2024; 25(1). Publisher Full Text
25. Chowdhury MM, Ayon RS, Hossain MS: An investigation of machine learning algorithms and data augmentation techniques for diabetes diagnosis using class imbalanced BRFSS dataset. Healthcare Analytics. Jun. 2024; 5. Publisher Full Text
26. Mienye ID, Sun Y: Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inform. Med. Unlocked. Jan. 2021; 25. Publisher Full Text
27. UCI MACHINE LEARNING: Pima Indians Diabetes Database. Version 1. Accessed: Feb. 09, 2025. Reference Source
28. Haberman S: Haberman’s Survival.1976. Publisher Full Text
29. Lubicz Marek PKRA, Kolodziej J: Thoracic Surgery Data.2014. Publisher Full Text
30. Fernandes Kelwin CJ, Fernandes J: Cervical Cancer (Risk Factors).2017. Publisher Full Text
31. Song Y, et al.: Comparison of logistic regression and machine learning methods for predicting postoperative delirium in elderly patients: A retrospective study. CNS Neurosci. Ther. Jan. 2023; 29(1): 158–167. Publisher Full Text
32. Rahmatinejad Z, et al.: A comparative study of explainable ensemble learning and logistic regression for predicting in-hospital mortality in the emergency department. Sci. Rep. Feb. 2024; 14(1): 3406. Publisher Full Text
33. Rajendra P, Latifi S: Prediction of diabetes using logistic regression and ensemble techniques. Computer Methods and Programs in Biomedicine Update. 2021; 1: 100032. Publisher Full Text
34. Nagarajan SM, Muthukumaran V, Murugesan R, et al.: Feature selection model for healthcare analysis and classification using classifier ensemble technique. Int. J. Syst. Assur. Eng. Manag. May 2021. Publisher Full Text
35. Nguyen D-K, Lan C-H, Chan C-L: Deep Ensemble Learning Approaches in Healthcare to Enhance the Prediction and Diagnosing Performance: The Workflows, Deployments, and Surveys on the Statistical, Image-Based, and Sequential Datasets. Public Health. 2021; 18: 10811. Publisher Full Text
36. Das S, Nayak SP, Sahoo B, et al.: Machine Learning in Healthcare Analytics: A State-of-the-Art Review. Archives of Computational Methods in Engineering. Apr. 2024. Publisher Full Text
37. Verma I, Prasad SK: Exploring Ensemble Learning Techniques for Infant Mortality Prediction: A Technical Analysis of XGBoost Stacking AdaBoost and Bagging Models. Birth Defects Res. Feb. 2025; 117(2). Publisher Full Text
38. Imani M, Beikmohammadi A, Arabnia HR: Comprehensive Analysis of Random Forest and XGBoost Performance with SMOTE, ADASYN, and GNUS Under Varying Imbalance Levels. Technologies (Basel). Mar. 2025; 13(3). Publisher Full Text
39. Kabir MA, Ahmed MU, Begum S, et al.: Balancing Fairness: Unveiling the Potential of SMOTE-Driven Oversampling in AI Model Enhancement. ACM International Conference Proceeding Series. Association for Computing Machinery; May 2024; pp. 21–29. Publisher Full Text
40. Mohammedqasim H, Ahmed Jasim A, Mohammedqasem A, et al.: ENHANCING PREDICTIVE PERFORMANCE IN COVID-19 HEALTHCARE DATASETS: A CASE STUDY BASED ON HYPER ADASYN OVER-SAMPLING AND GENETIC FEATURE SELECTION.2024.
41. Al-Shehari T, et al.: Comparative evaluation of data imbalance addressing techniques for CNN-based insider threat detection. Sci. Rep. Dec. 2024; 14(1). Publisher Full Text
42. Differential Evolution. Differential Evolution. Boston, MA: Springer US; pp. 1–24. Publisher Full Text
43. Ahmad MF, Isa NAM, Lim WH, et al.: Differential evolution: A recent review based on state-of-the-art works. Alex. Eng. J. May 2022; 61(5): 3831–3872. Publisher Full Text
44. Song Y, et al.: Dynamic hybrid mechanism-based differential evolution algorithm and its application. Expert Syst. Appl. Mar. 2023; 213: 118834. Publisher Full Text
45. Zhang SX, Liu YH, Zheng LM, et al.: Differential evolution with collective ensemble learning. Swarm Evol. Comput. Jun. 2024; 87: 101521. Publisher Full Text

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 29 Sep 2025

Author details Author details

¹ School of Computer Engineering, Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, 751024, India
² Department of Computer Science and Engineering, Silicon University, Bhubaneswar, Odisha, 751024, India

Surajit Das
Roles: Data Curation, Formal Analysis, Software, Writing – Original Draft Preparation

Samaleswari P. Nayak
Roles: Conceptualization, Project Administration, Supervision

Biswajit Sahoo
Roles: Funding Acquisition, Investigation, Validation

Satyananda Champati Rai
Roles: Methodology, Resources, Visualization, Writing – Review & Editing

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Article Versions (1)

version 1

Published: 29 Sep 2025, 14:1003

https://doi.org/10.12688/f1000research.169456.1

Copyright

© 2025 Das S et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download

Export To

metrics

	Views	Downloads
F1000Research	-	-
PubMed Central Data from PMC are received and updated monthly.	-	-

Citations

0

SEE MORE DETAILS

CITE

how to cite this article

Das S, Nayak SP, Sahoo B and Champati Rai S. A Differential Evolution-Based Optimized Ensemble for Balanced and Imbalanced Medical Datasets [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:1003 (https://doi.org/10.12688/f1000research.169456.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

track

receive updates on this article

Track an article to receive email alerts on any updates to this article.

Open Peer Review

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 29 Sep 2025

Open Peer Review

Reviewer Status

AWAITING PEER REVIEW

Comments on this article

All Comments(0)

Add a comment

Sign up for content alerts

Browse by related subjects

[1] 1. Kolasa K, Admassu B, Hołownia-Voloskova M, et al.: Systematic reviews of machine learning in healthcare: a literature review. Expert Rev. Pharmacoecon. Outcomes Res. Jan. 2024; 24(1): 63–115. Publisher Full Text

[2] 2. Zhang H, Gerych W, Ghassemi M: A data-centric perspective to fair machine learning for healthcare. Nature Reviews Methods Primers. Nov. 2024; 4(1): 86. Publisher Full Text

[3] 3. Roy D, Roy A, Roy U: Learning from Imbalanced Data in Healthcare: State-of-the-Art and Research Challenges.2024; 19–32. Publisher Full Text

[4] 4. Kosolwattana T, Liu C, Hu R, et al.: A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly imbalanced data classification in healthcare. BioData Min. Apr. 2023; 16(1): 15. Publisher Full Text

[5] 5. Khan TM, Xu S, Khan ZG, et al.: Implementing Multilabeling, ADASYN, and ReliefF Techniques for Classification of Breast Cancer Diagnostic through Machine Learning: Efficient Computer-Aided Diagnostic System. J. Healthc. Eng. Mar. 2021; 2021: 1–15. Publisher Full Text

[6] 6. Mahajan P, Uddin S, Hajati F, et al.: Ensemble Learning for Disease Prediction: A Review. Healthcare. Jun. 2023; 11(12): 1808. Publisher Full Text

[7] 7. Dey I, Pratap V: A Comparative Study of SMOTE, Borderline-SMOTE, and ADASYN Oversampling Techniques using Different Classifiers. Proceedings - 2023 3rd International Conference on Smart Data Intelligence, ICSMDI 2023. Institute of Electrical and Electronics Engineers Inc.; 2023; pp. 294–302. Publisher Full Text

[8] 8. Chen TCT, Wu HC, Chiu MC: A deep neural network with modified random forest incremental interpretation approach for diagnosing diabetes in smart healthcare. Appl. Soft Comput. Feb. 2024; 152. Publisher Full Text

[9] 9. Martinez-Velasco A, Martínez -Villaseñor L, Miralles-Pechuán L: Addressing Class Imbalance in Healthcare Data: Machine Learning Solutions for Age-Related Macular Degeneration and Preeclampsia. IEEE Latin America Transactions. 2024. Reference Source

[10] 10. Agyemang EF, et al.: Addressing Class Imbalance Problem in Health Data Classification: Practical Application From an Oversampling Viewpoint. Applied Computational Intelligence and Soft Computing. 2025; 1: 2025. Publisher Full Text

[11] 11. Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, et al.: An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples. Sensors. Mar. 2022; 22(6). Publisher Full Text

[12] 12. Elgendy IA, Hosny M, Albashrawi MA, et al.: Dual-stage explainable ensemble learning model for diabetes diagnosis. Expert Syst. Appl. May 2025; 274. Publisher Full Text

[13] 13. Dutta A, et al.: Early Prediction of Diabetes Using an Ensemble of Machine Learning Models. Int. J. Environ. Res. Public Health. Oct. 2022; 19(19). Publisher Full Text

[14] 14. Alzakari SA, et al.: Enhanced heart disease prediction in remote healthcare monitoring using IoT-enabled cloud-based XGBoost and Bi-LSTM. Alex. Eng. J. Oct. 2024; 105: 280–291. Publisher Full Text

[15] 15. Das S, Nayak SP, Sahoo B, et al.: Evaluating Ensemble Models on Imbalanced Data Sets: A Comparative Study across Varied Minority Class Ratios. ESIC 2024-4th International Conference on Emerging Systems and Intelligent Computing, Proceedings. Institute of Electrical and Electronics Engineers Inc.; 2024; pp. 774–779. Publisher Full Text

[16] 16. Senthilvadivu S, Ramesh PS, Narang S, et al.: Impact of Random Forest and XGBoost Algorithms on Improving Patient Outcomes Compared to Standard Decision-Making Methods in Healthcare Predictive Analytics. 2024 International Conference on Cybernation and Computation, CYBERCOM 2024. Institute of Electrical and Electronics Engineers Inc.; 2024; pp. 694–699. Publisher Full Text

[17] 17. Abdellatif A, Abdellatef H, Kanesan J, et al.: Improving the Heart Disease Detection and Patients’ Survival Using Supervised Infinite Feature Selection and Improved Weighted Random Forest. IEEE Access. 2022; 10: 67363–67372. Publisher Full Text

[18] 18. Wu Y, et al.: Novel binary logistic regression model based on feature transformation of XGBoost for type 2 Diabetes Mellitus prediction in healthcare systems. Futur. Gener. Comput. Syst. Apr. 2022; 129: 1–12. Publisher Full Text

[19] 19. Abnoosian K, Farnoosh R, Behzadi MH: Prediction of diabetes disease using an ensemble of machine learning multi-classifier models. BMC Bioinformatics. Dec. 2023; 24(1). Publisher Full Text

[20] 20. Liu L, Wu X, Li S, et al.: Solving the class imbalance problem using ensemble algorithm: application of screening for aortic dissection. BMC Med. Inform. Decis. Mak. Dec. 2022; 22(1). Publisher Full Text

[21] 21. Masruriyah AFN, Novita HY, Sukmawati CE, et al.: Thorough Evaluation of the Effectiveness of SMOTE and ADASYN Oversampling Methods in Enhancing Supervised Learning Performance for Imbalanced Heart Disease Datasets. 2023 8th International Conference on Informatics and Computing, ICIC 2023. Institute of Electrical and Electronics Engineers Inc.; 2023. Publisher Full Text

[22] 22. Ormeño-Arriagada P, Márquez G, Araya D, et al.: Applying Machine Learning Sampling Techniques to Address Data Imbalance in a Chilean COVID-19 Symptoms and Comorbidities Dataset. Applied Sciences (Switzerland). Feb. 2025; 15(3). Publisher Full Text

[23] 23. Chowdhury LH, Tabassum S, Shatabda S, et al.: An optimized data analytics pipeline for improving healthcare diagnosis using ensemble learning. Inform. Med. Unlocked. Jan. 2025; 53. Publisher Full Text

[24] 24. Prithula J, et al.: Improved pediatric ICU mortality prediction for respiratory diseases: machine learning and data subdivision insights. Respir. Res. Dec. 2024; 25(1). Publisher Full Text

[25] 25. Chowdhury MM, Ayon RS, Hossain MS: An investigation of machine learning algorithms and data augmentation techniques for diabetes diagnosis using class imbalanced BRFSS dataset. Healthcare Analytics. Jun. 2024; 5. Publisher Full Text

[26] 26. Mienye ID, Sun Y: Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inform. Med. Unlocked. Jan. 2021; 25. Publisher Full Text

[27] 27. UCI MACHINE LEARNING: Pima Indians Diabetes Database. Version 1. Accessed: Feb. 09, 2025. Reference Source

[28] 28. Haberman S: Haberman’s Survival.1976. Publisher Full Text

[29] 29. Lubicz Marek PKRA, Kolodziej J: Thoracic Surgery Data.2014. Publisher Full Text

[30] 30. Fernandes Kelwin CJ, Fernandes J: Cervical Cancer (Risk Factors).2017. Publisher Full Text

[31] 31. Song Y, et al.: Comparison of logistic regression and machine learning methods for predicting postoperative delirium in elderly patients: A retrospective study. CNS Neurosci. Ther. Jan. 2023; 29(1): 158–167. Publisher Full Text

[32] 32. Rahmatinejad Z, et al.: A comparative study of explainable ensemble learning and logistic regression for predicting in-hospital mortality in the emergency department. Sci. Rep. Feb. 2024; 14(1): 3406. Publisher Full Text

[33] 33. Rajendra P, Latifi S: Prediction of diabetes using logistic regression and ensemble techniques. Computer Methods and Programs in Biomedicine Update. 2021; 1: 100032. Publisher Full Text

[34] 34. Nagarajan SM, Muthukumaran V, Murugesan R, et al.: Feature selection model for healthcare analysis and classification using classifier ensemble technique. Int. J. Syst. Assur. Eng. Manag. May 2021. Publisher Full Text

[35] 35. Nguyen D-K, Lan C-H, Chan C-L: Deep Ensemble Learning Approaches in Healthcare to Enhance the Prediction and Diagnosing Performance: The Workflows, Deployments, and Surveys on the Statistical, Image-Based, and Sequential Datasets. Public Health. 2021; 18: 10811. Publisher Full Text

[36] 36. Das S, Nayak SP, Sahoo B, et al.: Machine Learning in Healthcare Analytics: A State-of-the-Art Review. Archives of Computational Methods in Engineering. Apr. 2024. Publisher Full Text

[37] 37. Verma I, Prasad SK: Exploring Ensemble Learning Techniques for Infant Mortality Prediction: A Technical Analysis of XGBoost Stacking AdaBoost and Bagging Models. Birth Defects Res. Feb. 2025; 117(2). Publisher Full Text

[38] 38. Imani M, Beikmohammadi A, Arabnia HR: Comprehensive Analysis of Random Forest and XGBoost Performance with SMOTE, ADASYN, and GNUS Under Varying Imbalance Levels. Technologies (Basel). Mar. 2025; 13(3). Publisher Full Text

[39] 39. Kabir MA, Ahmed MU, Begum S, et al.: Balancing Fairness: Unveiling the Potential of SMOTE-Driven Oversampling in AI Model Enhancement. ACM International Conference Proceeding Series. Association for Computing Machinery; May 2024; pp. 21–29. Publisher Full Text

[40] 40. Mohammedqasim H, Ahmed Jasim A, Mohammedqasem A, et al.: ENHANCING PREDICTIVE PERFORMANCE IN COVID-19 HEALTHCARE DATASETS: A CASE STUDY BASED ON HYPER ADASYN OVER-SAMPLING AND GENETIC FEATURE SELECTION.2024.

[41] 41. Al-Shehari T, et al.: Comparative evaluation of data imbalance addressing techniques for CNN-based insider threat detection. Sci. Rep. Dec. 2024; 14(1). Publisher Full Text

[42] 42. Differential Evolution. Differential Evolution. Boston, MA: Springer US; pp. 1–24. Publisher Full Text

[43] 43. Ahmad MF, Isa NAM, Lim WH, et al.: Differential evolution: A recent review based on state-of-the-art works. Alex. Eng. J. May 2022; 61(5): 3831–3872. Publisher Full Text

[44] 44. Song Y, et al.: Dynamic hybrid mechanism-based differential evolution algorithm and its application. Expert Syst. Appl. Mar. 2023; 213: 118834. Publisher Full Text

[45] 45. Zhang SX, Liu YH, Zheng LM, et al.: Differential evolution with collective ensemble learning. Swarm Evol. Comput. Jun. 2024; 87: 101521. Publisher Full Text

A Differential Evolution-Based Optimized Ensemble for Balanced and Imbalanced Medical Datasets

Abstract

Background

Methods

Result

Conclusion

Keywords

1. Introduction

2. Related work

Table 1. Literature review.

3. Proposed methodology

Figure 1. Proposed methodology.

3.1 Datasets

Table 2. Summary of datasets used.

3.2 Base models

3.3 Sampling techniques

3.4 OEDE

(1)

(2)

(3)

(4)

(5)

Algorithm 1. Optimized Ensemble by Differential Evolution (OEDE).

3.5 Performance matrices

(6)

(7)

(8)

(9)

4. Results and discussion

4.1 Pima indiana diabetes dataset

Table 3. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on Pima Indiana diabetes dataset.

Figure 2. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on the Pima Indiana Diabetes Dataset.

4.2 Haberman’s cancer dataset

Table 4. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on Haberman’s Cancer Dataset.

Figure 3. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on Haberman’s Cancer Dataset.

4.3 Thoracic surgery data

Table 5. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on the Thoracic Surgery dataset.

Figure 4. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on the Thoracic surgery dataset.

4.4 Cervical cancer risk dataset

Table 6. AUC and Accuracy comparison of OEDE and state-of-the-art ML models on the cervical cancer risk dataset.

Figure 5. Precision, Recall, and F1-score comprise between OEDE and state-of-the-art ML models on the Cervical Cancer Risk Dataset.

Figure 6. AUC distribution across datasets.

Figure 7. ROC plots of all the models on different datasets.

5. Conclusion

Ethical approval

Data availability

References

Comments on this article Comments (0)

Open Peer Review

Comments on this article Comments (0)

Open Peer Review

Reviewer Status

Comments on this article

Browse by related subjects

Competing Interests Policy

Stay Updated