Research Article

Identification of Biomarkers for Severity in COVID-19 Through Comparative Analysis of Five Machine Learning Algorithms

[version 1; peer review: 1 approved with reservations, 1 not approved]
PUBLISHED 25 Jun 2024

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background

COVID-19 is a global public health problem.

Aim

The main objective of this research is to evaluate and compare the performance of the algorithms: Random Forest, Support Vector Machine, Logistic Regression, Decision Tree, and Neural Network, using metrics such as precision, recall, F1-score and accuracy.

Methods

A dataset (n=138) with numerical and categorical variables was used. The algorithms Random Forest, Support Vector Machine, Logistic Regression, Decision Tree, and Neural Network were considered. These were trained using an 80-20 split. The following metrics were evaluated: precision, recall, and F1-score, together with 5-fold stratified cross-validation.

Results

The Random Forest algorithm was superior, achieving a maximum cross-validation score of 0.9727. The correlation analysis identified ferritin (0.8277) and oxygen saturation (-0.6444) as the variables most strongly correlated with severity. The heuristic model was compared with models obtained through metaheuristic search, which maintained the metrics with only 3 variables and a stable weight distribution. A perplexity analysis made it possible to differentiate between the best models; creatinine and ALT stand out as features of the model with the best CV score and the lowest perplexity.

Conclusion

A comparative analysis of different classification models was carried out to predict the severity of COVID-19 cases from biological markers.

Keywords

Biological markers, Cross-validation, Ferritin, Machine learning, Metaheuristics, Oxygen saturation, Random forest.

1. Introduction

The coronavirus disease COVID-19 is a significant global public health problem. Given the ease with which new strains can emerge, it is essential to investigate and understand their pathophysiology using precise techniques.1 Machine learning (ML) emerges as a promising tool, offering the possibility of improving precision and reducing the time of variable analysis to deeply understand the pathophysiology induced by COVID-19,2 and, consequently, improve patient treatment.

When using machine learning in the study of this clinical condition, it is necessary to choose between supervised or unsupervised learning and determine the appropriate algorithm, such as Random Forest, Support Vector Machine, Logistic Regression, Decision Tree, Neural Network (MLP), among others. Various studies3,4,5 have demonstrated the effectiveness of these algorithms in biomedical problems. In relation to COVID-19, there is also evidence6 that precise biological markers can be decisive in the patient's prognosis.

Regarding the prediction of COVID-19 severity from biomarkers, Gharib et al.7 evaluated inflammatory biomarkers and associated risk factors in 150 Egyptian patients. The study found a significant negative correlation between percent oxygen saturation and serum levels of inflammatory markers, including ferritin.

In another study of 50 patients infected with COVID-19, conducted in Peshawar by Khan et al.,8 an increase in the levels of CRP, ferritin, and IL-6 was detected, as well as changes in neutrophil and lymphocyte counts. The authors concluded that elevations in CRP and ferritin are linked to secondary bacterial infections and adverse clinical outcomes. Findings from a meta-analysis suggest that serum ferritin levels correlate with the severity of COVID-19. Specifically, COVID-19 patients showed markedly higher ferritin levels compared to controls, with a standardized mean difference (SMD) of -0.889 and a 95% CI of (-1.201, -0.577).

Furthermore, patients with severe to critical COVID-19 symptoms showed elevated ferritin levels compared to those with mild to moderate symptoms, with an SMD of 0.882 and a 95% CI of (0.738, 1.026). Also significant was the finding that nonsurvivors had a pronounced increase in ferritin levels compared to survivors, with an SMD of 0.992 and a 95% CI of (0.672, 1.172). These observations emphasize the potential usefulness of serum ferritin as a biomarker in the management of COVID-19, although the presence of other comorbidities and confounding factors requires cautious interpretation of the results.9

A study conducted in Alexandria, Egypt by Abdelhalim et al.10 analyzed 210 non-hospitalized patients with confirmed COVID-19, aged between 14 and 75 years (mean age 44.5 ± 30.5). It was observed that 71.4% of these patients had high levels of serum ferritin, identifying ferritin as a significant biomarker for the diagnosis of COVID-19 in this population, with a p-value of 0.014738.

A comprehensive review of databases, including MEDLINE, EMBASE and others, identified relevant studies related to laboratory parameters in COVID-19 cases of different severities. Of 9,620 records, 40 studies with a total of 9,542 patients were included in the final analysis. The results showed that lymphopenia, thrombocytopenia and elevated levels of interleukin-6, ferritin, and other biomarkers were associated with severe and fatal cases of COVID-19. In particular, elevated interleukin-6 and hyperferritinemia were identified as indicators of systemic inflammation and a poor prognosis in patients with COVID-19.11

In a study conducted at the Department of Pathology and Laboratory Medicine, Aga Khan University (AKU), Karachi by Sibtain et al.,12 medical records of patients hospitalized with confirmed COVID-19 from March 1 to August 10, 2020 were reviewed. A total of 157 patients were included in the final analysis, of which 108 were men and 49 were women. The analysis revealed a significant difference in ferritin levels between categories based on COVID-19 severity and mortality. Through binary logistic regression, ferritin was identified as an independent predictor of all-cause mortality in patients with COVID-19, with an AUC of 0.69 in the ROC analysis. The study concludes that serum ferritin concentration is a promising predictor of mortality in cases of COVID-19.

In the work of Samprathi et al.,13 an extensive investigation protocol was designed for patients with COVID-19 that varies depending on the severity of symptoms and the presence of comorbidities.
For those asymptomatic or with mild symptoms without comorbidities, no further investigations were requested. Patients with mild or moderate comorbidities, upon admission, needed tests such as CBC, CRP, serum creatinine, and liver function tests. In the presence of abnormalities, additional investigations were requested.

For severe cases, tests such as PT, APTT, INR and specific biomarkers were added, while for critical cases serial IL-6 and lactate levels were requested. Monitoring for hospitalized patients was performed using CBC and CRP every 48 to 72 hours. However, serum ferritin is not recommended to monitor response to treatment. For children with suspected MIS-C, a stepwise testing strategy is suggested, including cytokine testing and SARS-CoV-2 serology.

To predict the outcome in patients with COVID-19, AI techniques were successfully used, specifically the Random Forest and AdaBoost algorithms. These algorithms, using varied patient data, achieved a precision of 0.94 and an F1 score of 0.86. A correlation between gender and mortality was also highlighted, with the majority of patients between 20 and 70 years old.14 In the work of Ahmed et al.,15 machine learning methods (Random Forest, Support Vector Machine, Logistic Regression, Decision Tree and k Nearest Neighbors) were compared; balancing of the dataset is not mentioned in this study. In the study by Wang et al.,16 the use of Random Forest to predict the severity of COVID-19 is reported, reaching an accuracy of 0.9905. For this purpose, they applied balancing using the SMOTE technique, achieving an AUC of 0.846.

Similarly, Elif et al.,17 used the SMOTE technique to balance their data. The Random Forest model they developed in their research returned a precision of 0.9796 and an AUC of 0.959. It is highlighted that some classes in their study had predictions lower than 90%, while others achieved perfect metrics. Among the biomarkers they identified as significant were LDH, high leukocyte count and C-reactive protein. On the other hand, Patterson et al.,18 also made use of the SMOTE balancing technique and developed a model that presented perfect metrics in all evaluated categories. Although they do not explicitly detail the cross-validation process in their study, they mention having used 10-fold CV (without specifying whether it was stratified) to perform an exhaustive search (grid-search) and determine the best hyperparameters. Additionally, it is noted that this validation was used exclusively for feature selection and not for dimensionality reduction.

In the work of Cui et al.,19 it is highlighted that its Random Forest model was able to predict the severity of the condition with a precision of 84.2% and an AUC of 0.874 for the first group and 0.842 for the second, both with an interval of confidence of 95%. LDH, D-Dimer and Fibrinogen stand out as key biomarkers. The main focus of their study, which included 437 patients, was to analyze the differences in these biomarkers between young (≤ 70 years) and older (≥ 70 years) people.

In the scope of the feature selection problem, metaheuristic algorithms have proven effective, even in their most primitive versions, compared with local and/or heuristic search mechanisms. Kabir et al.20 used genetic algorithms for variable selection; in particular, their work uses a variant of the algorithm that reduces redundancy during the search for variables to improve performance.

The work of Guha et al.21 mentions the effectiveness that simulated annealing has shown in multidimensional variable selection problems. Like the previous work, it proposes an improvement to the algorithm through hybridization with another search mechanism that improves its overall performance. In the work of Chen et al.,22 the crow search algorithm was applied to variable selection problems; the authors faced late-stage diversity challenges, which they addressed effectively with a hierarchical adaptive approach. In a study by Bandyopadhyay et al.23 on the detection of COVID-19 in radiological images, a two-stage pipeline consisting of feature extraction and selection was proposed. A CNN model based on the DenseNet architecture was used for feature extraction. To filter out non-informative and redundant features, the Harris Hawks Optimization (HHO) algorithm with Simulated Annealing (SA) and chaotic initialization was used. When evaluating on the SARS-COV-2 CT-Scan dataset (2,482 CT scans), an accuracy of 98.85% was achieved with the inclusion of chaotic initialization and SA. The study reports a 75% reduction in the number of selected features.

The main objective of this research is to evaluate and compare the performance of the algorithms: Random Forest, Support Vector Machine, Logistic Regression, Decision Tree, and Neural Network, using metrics such as precision, recall, F1-score and accuracy. With the models developed from these algorithms and metrics, we seek to identify which biological markers are affected in the pathophysiology of COVID-19 in order to monitor the clinical evolution and make medical decisions that contribute to the patient's recovery.

The structure of this document is as follows: section 2 details the methodology used in this research and the experimental design, section 3 presents the results. Section 4 focuses on discussing these results. Section 5 concludes the study. Section 6 shows some recommendations for future work.

1.1 Research question

What are the laboratory biomarkers to predict the severity of SARS-CoV-2 infection in patients from southeastern Mexico?

2. Methods

In this study, a comparative analysis of different classification models was carried out to predict the severity of COVID-19 cases.

2.1 Programming language details

The Python programming language was used for all calculations. For the classification models, the scikit-learn library was used in all cases.

Auxiliary libraries:

  • DEAP: Allows the rapid construction and prototyping of genetic optimization algorithms.

  • Simanneal: Tools for building simulated annealing algorithms.

  • Pandas: Data manipulation and management.

  • Seaborn: Data visualization.

  • Matplotlib: Data visualization.

  • Tabulate: Tabulation of data.

  • Scipy: Calculation of statistics. It builds on NumPy.

  • Numpy: Numerical computing. NumPy is a library for the Python programming language, adding support for large, multidimensional arrays and matrices, along with a large collection of high-level mathematical functions for operating on these arrays.

  • Imblearn: Data balancing for the MLP with SMOTE. Imbalanced-learn offers resampling techniques.

2.2 Dataset details

This study includes patients diagnosed with COVID-19 according to the World Health Organization (WHO) guidelines for the clinical management of severe acute respiratory infections due to SARS-CoV-2,24 who were admitted to the Intensive Care Unit (ICU) at the “Dr. Desiderio G. Rosado Carbajal” general hospital in the time period between April 1st and July 31st, 2021 in Comalcalco Tabasco, Mexico. The ethical and legal guidelines considered for sampling are described in De la Cruz-Cano et al.25

The dataset has 138 entries that correspond to patients and 60 columns that represent different features. The features encompass a wide range of data, including demographic information, laboratory test results, vital signs, symptoms, pre-existing conditions, and the severity of the COVID-19 illness divided into three classes that are considered mild, moderate and severe.

2.3 Dataset characterization

Next, the study population contained in the dataset used is characterized. In the dataset, there is a greater number of male patients (84) compared to female patients (54). It is also observed that the average age of the patients is approximately 59.9 years, with an age range that goes from 41 to 82 years, and a median of 58.

The severity of COVID-19 shows a significant proportion of patients (44.93%) with moderate disease (count of 62, proportion 0.4493), while mild and severe cases constitute 20.29% (count of 28, proportion 0.2029) and 34.78% (count of 48, proportion 0.3478) of the total, respectively. Based on these data, the dataset is moderately unbalanced. Regarding comorbidities, presented in Table 1, the most prevalent is heart disease with 97.10%, followed by Chronic Kidney Disease (CKD) with 94.20%. On the other hand, the least prevalent is obesity, present in only 34.06% of patients.

Table 1. Statistics of comorbidities.

Comorbidity   | Mean   | Min | Max
Hypertension  | 0.6087 | 0.0 | 1.0
T2DM          | 0.3913 | 0.0 | 1.0
Dyslipidemia  | 0.5507 | 0.0 | 1.0
CKD           | 0.9420 | 0.0 | 1.0
Heart_disease | 0.9710 | 0.0 | 1.0
COPD          | 0.9130 | 0.0 | 1.0
Obesity       | 0.3406 | 0.0 | 1.0
Malignancy    | 0.9928 | 0.0 | 1.0

Table 2 shows that the most common symptom among patients is Diarrhea, present in 94.93% of cases, while the least common symptom is Fever, reported only in 8.70% of patients.

Table 2. Statistics of symptoms.

Symptom     | Mean   | Min | Max
Fever       | 0.0870 | 0.0 | 1.0
Cough       | 0.1594 | 0.0 | 1.0
Sore_throat | 0.3986 | 0.0 | 1.0
Myalgia     | 0.7101 | 0.0 | 1.0
Headache    | 0.4783 | 0.0 | 1.0
Diarrhea    | 0.9493 | 0.0 | 1.0

2.4.1 Experimental design

The research strategy focused on two primary objectives. The first objective was to identify the most effective algorithm for analyzing the dataset and to conduct a correlation analysis of the variables to provide a suitable context; this process is illustrated in Figure 1. The second objective was to conduct a thorough analysis of various optimization methods to identify the key variables for a simplified model. This model should suffer at most a minimal reduction in its predictive capabilities while providing insight into the importance of each feature. The methodology used to achieve these objectives is explained in detail below:


Figure 1. Experimental process and algorithms used to predict the severity of COVID-19.

2.4.2 Data processing

The dataset was processed to contain a subset of 40 feature columns. The subset used in this work was generated from the first set by removing several features not directly related to the biomarkers. All variables were normalized, putting them on the same scale to facilitate analysis. The exclusion of certain variables is justified due to the analysis's focus on biomarkers and their importance in detecting the severity of COVID-19, which are features that can be objectively measured and that provide an assessment of a health condition.
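The preprocessing step described above can be sketched as follows. The column names and values are illustrative placeholders, not the study's actual schema, and since the paper does not state which scaler was used, min-max normalization is assumed here.

```python
# Sketch of the preprocessing step: keep biomarker columns and normalize them
# to a common scale. Column names and values are illustrative placeholders.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "ferritine": [529.0, 1166.0, 1810.0],
    "sat_O2": [65.0, 80.4, 91.0],
    "COVID19_Severity": [0, 1, 2],   # target: mild / moderate / severe
})

features = df.drop(columns=["COVID19_Severity"])
scaler = MinMaxScaler()              # assumed scaler: puts each variable on [0, 1]
X = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)
```

After this step every retained biomarker lies on the same [0, 1] scale, which is what the correlation and weight analyses later in the paper rely on.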

2.4.3 Evaluation of algorithms and metrics

Five classification algorithms were evaluated and compared: Random Forest, Support Vector Machine, Logistic Regression, Decision Tree and Neural Network (Multi Layer Perceptron or MLP). To compare the performance of the models, evaluation metrics such as precision, recall, F1 score, and accuracy were used. The test used to discern the robustness of the methods was stratified five-fold cross-validation. During the subsequent variable reduction experiments, an average of these metrics was employed as the objective function for the metaheuristic search algorithms.
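A minimal sketch of this comparison protocol, using scikit-learn with synthetic stand-in data (the clinical dataset is not reproduced here) and the same 80-20 split, stratified 5-fold CV, and random_state=42 convention described in the paper:

```python
# Sketch: five classifiers, an 80-20 split, and stratified 5-fold CV.
# The data is synthetic; hyperparameters are scikit-learn defaults.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

X, y = make_classification(n_samples=138, n_features=40, n_informative=8,
                           n_classes=3, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(class_weight="balanced", random_state=42),
    "SVM": SVC(class_weight="balanced", random_state=42),
    "Logistic Regression": LogisticRegression(class_weight="balanced",
                                              max_iter=1000, random_state=42),
    "Decision Tree": DecisionTreeClassifier(class_weight="balanced", random_state=42),
    # MLPClassifier has no class_weight option; the paper balances it with SMOTE instead.
    "Neural Network (MLP)": MLPClassifier(max_iter=1000, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    results[name] = {
        "precision": precision_score(y_te, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_te, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_te, y_pred, average="weighted", zero_division=0),
        "accuracy": accuracy_score(y_te, y_pred),
        "cv_score": cross_val_score(model, X, y, cv=cv).mean(),
    }
```

On the real dataset these per-model dictionaries correspond to the rows of Table 3.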

2.4.4 Model selection

The selection of the optimal model depends on the analysis of variables and the variable reduction analysis, which provide indications of the importance of the biomarkers. Given the imbalance in the dataset, the models selected for these analyses will be those that are balanced by weights (or by SMOTE in the case of the MLP). This ensures that the uneven distribution of the data is taken into account and that more reliable and representative results are obtained.
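The weight-balancing choice can be illustrated with scikit-learn's compute_class_weight, using the class counts reported in Section 2.3 (mild=28, moderate=62, severe=48); the SMOTE alternative for the MLP is only noted in a comment to keep the sketch dependency-light.

```python
# Sketch of weight balancing: "balanced" weights are n_samples / (n_classes * count_c),
# so rarer classes weigh more. The MLP would instead be balanced by resampling
# with SMOTE from imbalanced-learn, not shown here.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Class counts from the dataset characterization: mild=28, moderate=62, severe=48.
y = np.array([0] * 28 + [1] * 62 + [2] * 48)
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1, 2]), y=y)
```

With these counts, the mild class (the rarest) receives the largest weight and the moderate class (the most common) the smallest, which is exactly what class_weight="balanced" does inside each classifier.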

2.4.5 Correlation analysis

Through correlation analysis, the variables (biological markers) associated with the severity of COVID-19 (COVID19_Severity) were identified. The correlated variables or biological markers are represented in a Table and a heat map of correlations.
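A sketch of this analysis, assuming Pearson correlation via scipy.stats.pearsonr; the significance labels use cutoffs inferred from the labels in Table 5 (p < 0.001, < 0.01, < 0.05), which are an assumption, and the data is illustrative.

```python
# Sketch: Pearson r and p-value of each biomarker against the severity label,
# labelled by significance. Values are illustrative, not the study's data.
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    "ferritine": [300, 900, 1500, 400, 1200, 1700],
    "sat_O2": [92, 80, 66, 90, 75, 65],
    "COVID19_Severity": [0, 1, 2, 0, 1, 2],
})

rows = []
for col in df.columns.drop("COVID19_Severity"):
    r, p = pearsonr(df[col], df["COVID19_Severity"])
    label = ("Very significant" if p < 0.001 else
             "Significant" if p < 0.01 else
             "Slightly significant" if p < 0.05 else
             "Not significant")
    rows.append({"feature": col, "r": round(r, 4), "p": p, "significance": label})

# Order features by absolute correlation, as in Table 5.
corr_table = pd.DataFrame(rows).sort_values("r", key=abs, ascending=False)
```

The same table, computed on the full dataset, is what is rendered as Table 5 and as the heat maps in Figures 2 and 3.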

2.4.6 Variable reduction

Heuristic experiments were carried out to progressively reduce the variables, and the model performance was evaluated for each set of selected variables to obtain a model with a reduced number of variables without significantly compromising accuracy. In addition, variable selection experiments were performed using the following algorithms: genetic, simulated annealing, and crow search. These methods introduced variety into the models; their parameters were chosen so that the algorithms ran for comparable durations.
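A minimal, dependency-free sketch of the simulated annealing search over feature subsets. The study's actual objective function averaged the model's metrics; it is replaced here by a hypothetical toy score (rewarding a known "useful" subset and penalizing extra features) so the example stays self-contained and fast.

```python
# Sketch of simulated annealing over binary feature masks. The toy score below
# is a hypothetical stand-in for the averaged cross-validated model metrics.
import math
import random

random.seed(42)
N_FEATURES = 10
TARGET = {0, 3, 7}                       # hypothetical "useful" feature indices

def score(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    # Reward overlap with the target set, penalize extra features (parsimony).
    return len(chosen & TARGET) - 0.2 * len(chosen - TARGET)

state = [random.randint(0, 1) for _ in range(N_FEATURES)]
best, best_score = state[:], score(state)
T = 1.0                                   # initial temperature
for step in range(2000):
    neighbor = state[:]
    neighbor[random.randrange(N_FEATURES)] ^= 1   # flip one random bit
    delta = score(neighbor) - score(state)
    # Always accept improvements; accept worsenings with probability exp(delta/T).
    if delta >= 0 or random.random() < math.exp(delta / T):
        state = neighbor
    if score(state) > best_score:
        best, best_score = state[:], score(state)
    T *= 0.995                            # geometric cooling schedule
```

As the temperature cools, the search shifts from broad exploration to local refinement, which is the property that makes annealing useful for these multidimensional subset searches.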

2.4.7 Overfitting analysis

To evaluate the possibility of overfitting in the generated models, difference tests between the training and validation sets were implemented through learning curves. The curves, produced with the learning_curve() function from the sklearn library, illustrate the performance of the model as the size of the training set varies. A threshold of 0.05 (or 5%) was set for the difference between training and validation scores; if the difference exceeded this threshold, the model was considered overfitted. This methodology facilitated the identification of feature sets that provided an optimal balance between accuracy and generalization.
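The overfitting check above can be sketched with sklearn's learning_curve() on synthetic stand-in data; the 0.05 gap threshold is the one stated in the text.

```python
# Sketch of the overfitting check: compare train vs validation scores from a
# learning curve and flag the model if the final gap exceeds 0.05 (5%).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve, StratifiedKFold

X, y = make_classification(n_samples=138, n_features=40, n_informative=8,
                           n_classes=3, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    X, y, cv=cv, train_sizes=np.linspace(0.2, 1.0, 5))

# Gap between mean train and mean validation score at the largest training size.
gap = train_scores.mean(axis=1)[-1] - val_scores.mean(axis=1)[-1]
overfitted = gap > 0.05   # the paper's 5% train/validation threshold
```

Plotting the two mean-score curves against `sizes` (e.g. with Matplotlib) reproduces the learning-curve figures the methodology describes.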

3. Results

In this section, the results obtained in the study of predicting the severity of COVID-19 cases using different classification algorithms are presented. The algorithms evaluated include Random Forest, Support Vector Machine, Logistic Regression, Decision Tree and Neural Network (MLP). To evaluate the performance of each algorithm, the following metrics were used: precision, recall, F1-score and CV-Score (cross-validation).

3.1 Algorithm performance

The results in Table 3 provide a comparative view of the performance of machine learning algorithms based on various metrics.

Table 3. Basic metrics of algorithms sorted by best cross-validation (best algorithm shown in bold).

Model                | CV Score | ROC AUC | Accuracy | Recall | Precision | F1 Score
Random Forest        | 0.9727   | 1.000   | 1.0000   | 1.0000 | 1.0       | 1.0000
Logistic Regression  | 0.9636   | 0.9834  | 0.9643   | 0.9722 | 1.0       | 0.9649
SVM                  | 0.9545   | 0.9978  | 0.9643   | 0.9722 | 1.0       | 0.9649
Neural Network (MLP) | 0.9467   | 0.9955  | 0.9286   | 0.9389 | 1.0       | 0.9290
Decision Tree        | 0.9364   | 0.9688  | 0.9643   | 0.9667 | 1.0       | 0.9641

This Table shows the basic metrics of the algorithms. It can be seen that all algorithms exhibit solid performance, with the precision, recall, F1-score, and ROC AUC of each algorithm all exceeding 0.9. Random Forest stands out as the best algorithm, reaching the maximum value of 1.0000 in all metrics, with the exception of the cross-validation score, which is 0.9727. It is worth noting that Gradient Boosting (including XGBoost) and k Nearest Neighbors (kNN) do not natively support class weighting; they were therefore excluded from the comparison so that all models could be evaluated under equal conditions. Likewise, all models were initialized with random_state=42.

Table 4 shows the accuracy, recall and F1 scores for each class of each algorithm. Random Forest achieves scores of 1.0000 in all classes, but other algorithms, such as SVM and Logistic Regression, also show high performance, reaching 1.0000 in several metrics. These results confirm Random Forest as the best algorithm, and it is subsequently used for the variable reduction experiments and the other comparisons.

Table 4. Accuracy, Recall and F1 scores for each Class (metrics for the best model are shown in bold).

Model                | Class    | Accuracy score | Recall score | F1 Score
Random Forest        | Mild     | 1.0000         | 1.0000       | 1.0000
Random Forest        | Moderate | 1.0000         | 1.0000       | 1.0000
Random Forest        | Severe   | 1.0000         | 1.0000       | 1.0000
Logistic Regression  | Mild     | 0.8571         | 1.0000       | 0.9231
Logistic Regression  | Moderate | 1.0000         | 0.9167       | 0.9565
Logistic Regression  | Severe   | 1.0000         | 1.0000       | 1.0000
SVM                  | Mild     | 0.8571         | 1.0000       | 0.9231
SVM                  | Moderate | 1.0000         | 0.9167       | 0.9565
SVM                  | Severe   | 1.0000         | 1.0000       | 1.0000
Neural Network (MLP) | Mild     | 0.8571         | 1.0000       | 0.9231
Neural Network (MLP) | Moderate | 0.9167         | 0.9167       | 0.9167
Neural Network (MLP) | Severe   | 1.0000         | 0.9000       | 0.9474
Decision Tree        | Mild     | 1.0000         | 1.0000       | 1.0000
Decision Tree        | Moderate | 0.9231         | 1.0000       | 0.9600
Decision Tree        | Severe   | 1.0000         | 0.9000       | 0.9474

3.2 Correlation analysis

The characterization analysis of the variables contained in the subset of data under study and the correlation analysis are presented below. The features are ordered by correlation. Table 5 contains the averages and ranges of all features, together with the correlation value and the Pearson p-value with an interpretation, to provide context for the dataset.

Table 5. Correlation, p-value and significance of biomarkers with the severity of COVID-19.

Feature           | ALL (n=138)        | Correlation | p-value   | Significance
ferritine         | 1166 (529-1810)    | 0.8277      | 6.148e-36 | Very significant
sat. O2           | 80.4 (65-91)       | -0.6444     | 1.490e-17 | Very significant
Fibrinogen        | 544 (222-891)      | 0.5356      | 1.301e-11 | Very significant
DDimer            | 1020 (166-2498)    | 0.4731      | 4.631e-09 | Very significant
Glucose           | 169 (82-502)       | 0.4632      | 1.063e-08 | Very significant
Respiratory_Rate  | 28.59 (26-33)      | 0.4402      | 6.582e-08 | Very significant
Procalcitonine    | 1.047 (0.04-14.52) | 0.3855      | 3.019e-06 | Very significant
IL6               | 183 (48-295)       | 0.3435      | 3.716e-05 | Very significant
Na                | 137.8 (127-166)    | -0.3020     | 0.0003178 | Very significant
Creatinine        | 0.9928 (0.3-6.4)   | -0.2815     | 0.0008228 | Very significant
Urea              | 39.53 (26.9-130.5) | -0.2720     | 0.001251  | Significant
Cl                | 101.7 (91-128)     | -0.2627     | 0.001853  | Significant
LDH               | 479 (215-948)      | 0.2387      | 0.004817  | Significant
ALT               | 51.1 (11-293)      | -0.2345     | -0.2345   | Significant
CRP               | 267 (69-426)       | 0.2166      | 0.01073   | Slightly significant
AST               | 69.27 (13-365)     | -0.2116     | 0.01274   | Slightly significant
Heart_Rate        | 87.53 (76-110)     | 0.2097      | 0.01359   | Slightly significant
Albumin           | 3.626 (2.4-4.9)    | -0.1807     | 0.03396   | Slightly significant
Hemoglobin        | 13.42 (6.1-16.9)   | 0.1685      | 0.04824   | Slightly significant
INR               | 1.077 (0.92-1.76)  | -0.1563     | 0.06709   | Not significant
PT                | 13.32 (11.5-21.3)  | -0.1549     | 0.06959   | Not significant
APTT              | 36.45 (24.1-67)    | 0.1465      | 0.08634   | Not significant
Uric_acid         | 5.302 (3-6.8)      | 0.1447      | 0.09038   | Not significant
K                 | 4.628 (3.1-7.9)    | 0.1180      | 0.1681    | Not significant
MgSerico          | 2.03 (1.5-3.6)     | 0.1123      | 0.1897    | Not significant
Fosforo           | 3.677 (2.8-6.9)    | 0.1090      | 0.2033    | Not significant
Eosinophils       | 0.02246 (0-0.5)    | -0.1071     | 0.2112    | Not significant
Age               | 59.9 (41-82)       | -0.09790    | 0.2533    | Not significant
Lymphocyte        | 0.992 (0.3-2.2)    | 0.09082     | 0.2894    | Not significant
Total_Cholesterol | 202 (100-269)      | 0.06639     | 0.4391    | Not significant
Triglycerides     | 156 (77-342)       | 0.04880     | 0.5697    | Not significant
Temp °C           | 38.98 (37-40.1)    | 0.04733     | 0.5814    | Not significant
Eosinophils       | 0.442 (0-8)        | -0.04220    | 0.6231    | Not significant
Neutrophils       | 80.95 (40.5-94.8)  | 0.03510     | 0.6827    | Not significant
Lymphocyte        | 11.88 (3-43)       | -0.03271    | 0.7033    | Not significant
CaTotal           | 8.53 (7.1-9.2)     | 0.01871     | 0.8276    | Not significant
WBC               | 10.53 (1.7-24.9)   | 0.01709     | 0.8423    | Not significant
Neutrophils       | 8.852 (0.91-20.04) | -0.01134    | 0.8950    | Not significant

Additionally, a bar graph of these correlations is shown in Figure 2.


Figure 2. Correlation Bar Chart (The first chart presents correlations and the second presents absolute correlations).

In the original heat map, correlation clusters characterized by variables with magnitudes greater than or close to 0.5 can be distinguished. These magnitudes suggest a potential impact on the subsequent feature selection phase. In particular, in the upper left corner of the map, a grouping stands out in which variables such as ferritin and oxygen saturation present an absolute correlation greater than 0.5 with each other; close by, fibrinogen shows a correlation of 0.47. Likewise, in the lower corner of the map, a strong correlation is observed between the variables Na, Creatinine, Urea and Cl. In relation to these, the variables ALT and AST also stand out.

Figure 3 presents a refined version of this heat map, in which significant correlations with the target variable (> 0.45) are evident. It can be deduced from this heat map that these correlated variables are, with high probability, the ones that will be selected in the variable reduction procedures.


Figure 3. Heat map of significant correlations (>0.45) between variables related to the target variable (correlations with magnitude greater than 0.5 are highlighted in red and bold).

3.3 Feature Selection

Four different approaches were used for feature selection: heuristic, genetic algorithm, simulated annealing algorithm and crow search algorithm. The two best models for each approach were identified (ranked based on cross-validation and number of variables). In the case of the heuristic approach, 3 and 4 variables were considered for the models. Once these models were found, it was decided that for metaheuristic approaches, the models would be encouraged to produce 3- and 4-feature solutions. The metaheuristic search algorithms were run ten times each and the two best models in each case were evaluated and compared at the end.

3.4 Heuristic Method

Several progressively generated models were trained by adding the features that were found to have statistical significance during the correlation analysis. Below is a brief description of them for context.

  • Ferritin: Protein that stores iron in the body.

  • Oxygen Saturation (Sat.O2): Percentage of hemoglobin that carries oxygen.

  • Fibrinogen: Protein that helps in blood clotting.

  • D-Dimer (DDimer): Fibrin degradation product in coagulation.

  • Glucose: Blood sugar level.

  • Respiratory Rate: Number of breaths per minute.

  • Procalcitonin: Marker of inflammation and immune response.

  • Interleukin 6 (IL6): Protein involved in the inflammatory response.

  • Sodium (Na): Essential electrolyte for fluid balance.

  • Creatinine: Muscle waste product eliminated by the kidneys.

  • Urea: Waste that is formed when proteins are metabolized.

  • Chloride (Cl): Chloride level in the blood.

  • LDH: Enzyme that indicates tissue damage.

  • ALT: Enzyme indicating liver damage.

  • CRP: Protein that indicates inflammation in the body.

  • AST: Enzyme indicating liver damage.

  • Heart Rate: Number of heartbeats per minute.

  • Albumin: Protein produced by the liver and found in blood plasma.

  • Hemoglobin: Protein in red blood cells that transports oxygen.

As mentioned before, a limited subset of the variables in this list (see the heatmap in Figure 3) is expected to appear during variable selection. The models were trained progressively and the graphical results can be seen in Figure 4.


Figure 4. Vertical lines mark the first appearance of a model with the best F1-Score (green) or the best cross-validation accuracy (magenta).

The values of the models represented in said graph are found in Table 6.

Table 6. Model metric results by adding statistically significant features and complete model (models that present some local improvement are shown in bold).

Added_feature    | Accuracy | Recall | F1     | ROC_AUC | CV_Accuracy
Ferritine        | 0.8929   | 0.8929 | 0.8922 | 0.9833  | 0.9000
Sat.O2           | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9636
Fibrinogen       | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9273
DDimer           | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9273
Glucose          | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9455
Respiratory_Rate | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9545
Procalcitonine   | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9545
IL6              | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9455
Na               | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9364
Creatinine       | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9636
urea             | 0.9643   | 0.9643 | 0.9649 | 1.0000  | 0.9636
Cl               | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9636
LDH              | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9636
ALT              | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9636
CRP              | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9545
AST              | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9727
Heart_Rate       | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9636
Albumin          | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9364
Hemoglobin       | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9727
ALL              | 1.0000   | 1.0000 | 1.0000 | 1.0000  | 0.9727

During the training experiment with progressive addition of variables, 39 models were trained. The intermediate models, 20 to 38, have been omitted from Table 6 since their metrics were identical or inferior to subsequent models and the added variables did not show statistical significance during the correlation analysis. From this Table, procalcitonin and AST are highlighted as variables that increase the performance of the models; models were generated and analyzed with the progressive addition of these features. These models are referred to in this research simply as “Model 1” and “Model 2”.

In Table 7, a redistribution of the weights of the variables between Model 1 and Model 2 is observed. The inclusion of the variable AST in Model 2 entails a redistribution of the weights, most notably a reduction in the weight of procalcitonin from 0.2235 to 0.1140. Despite the lower correlation of AST with severity (-0.2116), Model 2 obtains a cross-validation accuracy of 0.9546, compared to 0.9636 for Model 1.

Table 7. Weight of the variables in the outstanding models and their correlation.

Model                     | Ferritin | Sat.O2  | Procalcitonin | AST
Model 1 - Weight on model | 0.4465   | 0.3299  | 0.2235        | -
Model 2 - Weight on model | 0.4849   | 0.3208  | 0.1140        | 0.0825
Correlation with Severity | 0.8277   | -0.6444 | 0.3855        | -0.2116

It is notable that both models achieve perfect accuracy and perfect F1-score, indicating their effectiveness in classification. The main difference between them lies in their performance in cross-validation, which changes by approximately 0.007 when the variable AST is included in Model 2. This suggests that the inclusion of AST may affect the robustness of the model and its generalization ability.

Despite the marginal improvement, the decrease in cross-validation metrics in Model 2 compared to Model 1 suggests that the inclusion of the AST variable could be adding additional complexity without necessarily improving the generalizability of the model. The apparently unequal distribution of weights in the second model compared to the first could be a sign of overfitting.

A joint overview of Tables 7 and 8, as well as the behavior of the metrics between models in Figure 4, suggests that although Models 1 and 2 show excellent metrics, with a precision, F1-score and AUC of 1.0000, this perfect performance could indicate possible overfitting to the training data. For this reason, subsequent evaluations rely on cross-validation as the indicator of performance and generalization (see Figure 6).

Table 8. Comparison of the metrics of the two outstanding models.

Metric | Model 1 | Model 2
Precision | 1.0000 | 1.0000
F1-score | 1.0000 | 1.0000
AUC | 1.0000 | 1.0000
Cross-validation | 0.9636 | 0.9636

3.5 Metaheuristic method 1 (Genetic algorithm)

The algorithm used as its optimization function the average of all the scores, so that it found the model that best satisfied all the metrics simultaneously. The specifications of the genetic algorithm are as follows:

Individual Representation: Binary string of length equal to the number of features.

Fitness Function: Average of precision, recall, accuracy, and F1 score.

Crossover: Two-point crossover.

Mutation: Bit flip mutation with probability 0.05.

Selection: Tournament selection with size 3.

Population Size: 50.

Number of Generations: 20.

Hall of Fame: Best individual through all generations.
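A minimal, self-contained sketch of a genetic algorithm with these specifications (binary individuals, two-point crossover, bit-flip mutation at 0.05, tournament selection of size 3, population 50, 20 generations, and a hall of fame) is shown below. The fitness function here is a hypothetical stand-in: it scores an individual by its agreement with an arbitrary "ideal" feature mask, in place of the averaged model metrics used in the study.

```python
import random

random.seed(0)  # reproducible toy run

N_FEATURES = 10
TARGET = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # hypothetical 'ideal' feature subset

def fitness(ind):
    """Stand-in for the averaged precision/recall/accuracy/F1 of a model
    trained on the features selected by this binary string."""
    return sum(1 for a, b in zip(ind, TARGET) if a == b) / N_FEATURES

def tournament(pop, k=3):
    """Tournament selection with size 3."""
    return max(random.sample(pop, k), key=fitness)

def two_point_crossover(a, b):
    i, j = sorted(random.sample(range(N_FEATURES), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(ind, p=0.05):
    """Bit-flip mutation: each bit flips independently with probability p."""
    return [bit ^ 1 if random.random() < p else bit for bit in ind]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(50)]
hall_of_fame = max(pop, key=fitness)  # best individual seen so far

for _ in range(20):  # generations
    nxt = []
    while len(nxt) < len(pop):
        c1, c2 = two_point_crossover(tournament(pop), tournament(pop))
        nxt += [mutate(c1), mutate(c2)]
    pop = nxt
    champion = max(pop, key=fitness)
    if fitness(champion) > fitness(hall_of_fame):
        hall_of_fame = champion
```

The hall of fame retains the best subset even if later generations drift away from it, which is why it, rather than the final population, is reported.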

The performance of the best models found with the metaheuristic method is shown below in Table 9.

Table 9. Summary of the results of the models found by the genetic algorithm.

CV Score | Accuracy | Recall | F1 Score | Features
0.9727 | 1.000 | 1.000 | 1.000 | ferritine, urea, Cl
0.9727 | 1.000 | 1.000 | 1.000 | ferritine, sat. O2, creatinine, ALT
0.9727 | 1.000 | 1.000 | 1.000 | ferritine, sat. O2, Glucose, procalcitonine
0.9545 | 1.000 | 1.000 | 1.000 | ferritine, sat. O2, procalcitonine, urea
0.9545 | 1.000 | 1.000 | 1.000 | sat. O2, Respiratory_Rate, creatinine, Cl
0.9273 | 1.000 | 1.000 | 1.000 | ferritine, DDimer, procalcitonine, Cl
0.9273 | 0.9643 | 0.9643 | 0.9644 | Respiratory_Rate, Na, creatinine
0.9182 | 1.000 | 1.000 | 1.000 | ferritine, procalcitonine, CRP
0.8818 | 1.000 | 1.000 | 1.000 | ferritine, IL6, ALT
0.8273 | 1.000 | 1.000 | 1.000 | Respiratory_Rate, IL6, Cl

3.6 Metaheuristic method 2 (Simulated annealing algorithm)

This algorithm used an objective function similar to that of the previous algorithm. The specifications of the simulated annealing algorithm are as follows:

Individual Representation: Binary string of length equal to the number of features.

Energy Function: Negative of the average F1 score obtained through cross-validation.

Movement: Selects and deselects features randomly until the total of selected features equals 4.

State Initialization: Random binary string with exactly 4 selected features.

Initial temperature: 10.0.

Final Temperature: 0.01.

Number of Iterations: 150.
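These specifications can be illustrated with a compact simulated-annealing loop. The sketch below is an assumption-laden toy: `IDEAL` and the energy function stand in for the negative cross-validated F1 of a model on the selected 4-feature subset, and geometric cooling is assumed to take the temperature from 10.0 to 0.01 over 150 iterations.

```python
import math
import random

random.seed(1)  # reproducible toy run

N_FEATURES = 12
IDEAL = {0, 3, 5, 9}  # hypothetical best 4-feature subset

def energy(state):
    """Stand-in energy: negative overlap with IDEAL. In the study the
    energy was the negative mean F1 score from cross-validation."""
    return -len(IDEAL & state) / 4.0

def move(state):
    """Swap one selected feature for an unselected one, keeping |state| == 4."""
    new = set(state)
    new.remove(random.choice(sorted(new)))
    new.add(random.choice([f for f in range(N_FEATURES) if f not in new]))
    return new

state = set(random.sample(range(N_FEATURES), 4))  # random init, exactly 4 features
best = state
temp, temp_final, n_iter = 10.0, 0.01, 150
cooling = (temp_final / temp) ** (1.0 / n_iter)  # geometric schedule

for _ in range(n_iter):
    candidate = move(state)
    delta = energy(candidate) - energy(state)
    # Always accept improvements; accept worse moves with Boltzmann probability.
    if delta < 0 or random.random() < math.exp(-delta / temp):
        state = candidate
    if energy(state) < energy(best):
        best = state
    temp *= cooling
```

At high temperature the loop explores broadly; as the temperature decays it behaves increasingly like greedy local search around the best subsets found.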

The performance of the best models found with metaheuristic method 2 is shown below in Table 10.

Table 10. Summary of results from simulated annealing iterations.

CV Score | Accuracy | Recall | F1 Score | Features
0.9727 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, LDH, ALT
0.9636 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, Fibrinogen, ALT
0.9636 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, Fibrinogen, ALT
0.9636 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, Fibrinogen, ALT
0.9545 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, Fibrinogen, urea
0.9545 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, urea, LDH
0.9545 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, procalcitonine, urea
0.9364 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, creatinine, LDH
0.9364 | 0.9643 | 0.9643 | 0.9649 | ferritine, sat. O2, Fibrinogen, Cl
0.9273 | 0.9643 | 0.9643 | 0.9649 | ferritine, Respiratory_Rate, procalcitonine, LDH

3.7 Metaheuristic method 3 (Crow search algorithm)

This algorithm used an objective function similar to those of the previous algorithms. The specifications of the crow search algorithm are as follows:

Individual Representation: Binary string of length equal to the number of features.

Fitness Function: Average of the cross-validation score and the F1 score.

Crow Movement: If a random draw falls below the awareness probability (0.2), the crow moves towards the target; otherwise it moves randomly.

Mutation: Bit flip with probability 0.2 (the awareness probability).

Penalty: Penalizes solutions with less than 3 features or 5 or more features.

Population Size: 50.

Number of Iterations: 30.

Hall of Fame: Best individual across all iterations.
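A minimal sketch of a crow-search-style loop with these settings follows. As with the other sketches, the fitness is a hypothetical stand-in (overlap with an arbitrary `IDEAL` subset replaces the averaged CV and F1 scores), and the size penalty mirrors the one described above.

```python
import random

random.seed(2)  # reproducible toy run

N_FEATURES = 10
AWARENESS = 0.2
IDEAL = {1, 4, 7}  # hypothetical best subset

def fitness(ind):
    """Stand-in for the average of CV score and F1, penalizing solutions
    with fewer than 3 or 5+ selected features."""
    selected = {i for i, bit in enumerate(ind) if bit}
    score = len(IDEAL & selected) / 3.0
    if len(selected) < 3 or len(selected) >= 5:
        score -= 0.5
    return score

def follow(ind, target):
    """Move towards the target crow by copying a random segment of its bits."""
    i, j = sorted(random.sample(range(N_FEATURES), 2))
    return ind[:i] + target[i:j] + ind[j:]

def random_move(ind, p=AWARENESS):
    """Random move: flip each bit with the awareness probability."""
    return [bit ^ 1 if random.random() < p else bit for bit in ind]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(50)]
hall_of_fame = max(pop, key=fitness)

for _ in range(30):  # iterations
    target = max(pop, key=fitness)
    pop = [follow(crow, target) if random.random() < AWARENESS else random_move(crow)
           for crow in pop]
    champion = max(pop, key=fitness)
    if fitness(champion) > fitness(hall_of_fame):
        hall_of_fame = champion
```

The explicit size penalty is what steers the search towards subsets of 3 to 4 features, matching the reduced models reported in Table 11.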

The performance of the best models found with metaheuristic method 3 is shown below in Table 11.

Table 11. Summary of the results of the models found by the crow search algorithm (the best 2 models are highlighted in bold).

CV Score | Accuracy | Recall | F1 Score | Features
0.9545 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, urea, LDH
0.9455 | 1.0000 | 1.0000 | 1.0000 | ferritine, sat. O2, Na, LDH
0.9364 | 0.9286 | 0.9286 | 0.9274 | ferritine, DDimer, Respiratory_Rate, procalcitonine
0.9364 | 0.9286 | 0.9286 | 0.9304 | ferritine, sat. O2, DDimer, Na
0.9273 | 1.0000 | 1.0000 | 1.0000 | ferritine, DDimer, creatinine, urea
0.9273 | 1.0000 | 1.0000 | 1.0000 | ferritine, DDimer, creatinine, urea
0.9273 | 1.0000 | 1.0000 | 1.0000 | ferritine, DDimer, procalcitonine, Cl
0.9182 | 0.9643 | 0.9643 | 0.9641 | ferritine, DDimer, Glucose, urea
0.9091 | 0.9643 | 0.9643 | 0.9649 | ferritine, DDimer, procalcitonine
0.9091 | 1.0000 | 1.0000 | 1.0000 | ferritine, DDimer, urea

3.8 Comparison of reduced models

Tables 12 and 13 show the effectiveness and weighting of the features of the models under study (AUC and Specificity values are omitted as they are identical in all cases and equal to 1.0000).

Table 12. Feature selection results and model performance.

Features | CV Score | F1 Score | Accuracy | Recall | Selection Method
ferritine, urea, Cl | 0.9727 | 1.0000 | 1.0000 | 1.0000 | Genetic algorithm
ferritine, sat. O2, creatinine, ALT | 0.9727 | 1.0000 | 1.0000 | 1.0000 | Genetic algorithm
ferritine, sat. O2, LDH, ALT | 0.9727 | 1.0000 | 1.0000 | 1.0000 | Simulated Annealing
ferritine, sat. O2, procalcitonine | 0.9636 | 1.0000 | 1.0000 | 1.0000 | Heuristic
ferritine, sat. O2, procalcitonine, AST | 0.9636 | 1.0000 | 1.0000 | 1.0000 | Heuristic
ferritine, sat. O2, Fibrinogen, ALT | 0.9636 | 1.0000 | 1.0000 | 1.0000 | Simulated Annealing
ferritine, sat. O2, urea, LDH | 0.9545 | 1.0000 | 1.0000 | 1.0000 | Crow Search
ferritine, sat. O2, Na, LDH | 0.9455 | 1.0000 | 1.0000 | 1.0000 | Crow Search

Table 13. Performance and feature weights of the models (F1-Score, Accuracy and Recall omitted due to perfect scoring). The best models are highlighted in bold.

Model | Characteristics | Weights | CV Score
M3 | ferritine, urea, Cl | 0.5650, 0.2660, 0.1690 | 0.9727
M4 | ferritine, sat. O2, creatinine, ALT | 0.4826, 0.3323, 0.07377, 0.1113 | 0.9727
M5 | ferritine, sat. O2, LDH, ALT | 0.4920, 0.3334, 0.06770, 0.1069 | 0.9727
M1 | ferritine, sat. O2, procalcitonine | 0.4465, 0.3300, 0.2235 | 0.9636
M2 | ferritine, sat. O2, procalcitonine, AST | 0.4840, 0.3069, 0.0901, 0.1191 | 0.9636
M6 | ferritine, sat. O2, Fibrinogen, ALT | 0.4668, 0.3250, 0.1175, 0.09072 | 0.9636
M7 | ferritine, sat. O2, urea, LDH | 0.4781, 0.3426, 0.1019, 0.07732 | 0.9545
M8 | ferritine, sat. O2, Na, LDH | 0.4780, 0.3598, 0.09156, 0.07058 | 0.9455

Certain clear patterns were observed in feature selection and its impact on model performance.

In Table 13, models M3, M4, and M5 share a cross-validation (CV) score of 0.9727, with ferritin weights of 0.5650 (M3), 0.4826 (M4) and 0.4920 (M5), respectively. This corroborates the importance of the variable in predicting the severity of COVID-19. The ferritin feature appears in every model, and its weight is considerable in those with the best CV scores (M3, M4, and M5), reinforcing its relevance in prediction. The variable sat. O2 is also selected consistently across almost all models, suggesting its predictive value, although its weight varies between models, which could unbalance the performance of each model.

Features such as urea, creatinine, LDH, ALT and procalcitonin are selected in certain models, with variations in their weights. This could suggest that their importance fluctuates depending on the context of the other features in the model. Although all models achieve high accuracy, the CV scores vary, suggesting differences in their ability to generalize to new data. Models M3, M4, and M5 stand out, tied with a CV score of 0.9727, an indicator of better generalization ability than the rest of the models in the list.

On the other hand, models M7 and M8 have the lowest CV scores (0.9545 and 0.9455, respectively), which could indicate lower robustness in generalization or even overfitting (see Figure 5 in the overfitting analysis). These results can be related to the weights assigned to their features and to the selection method used. In model M4, the additional features creatinine and ALT are included with weights of 0.07377 and 0.1113, respectively. In contrast, model M3 uses urea and Cl with weights of 0.2660 and 0.1690. Despite these differences, the CV score remains the same, at 0.9727, for both models.


Figure 5. Cross-validation is performed for each model, along with the methods employed for variable selection.

Any model suspected of overfitting is represented with a grid.

Model M5, which also has a CV score of 0.9727, includes LDH and ALT with weights of 0.06770 and 0.1069, respectively, with LDH taking the place of the creatinine found in model M4. Despite this change, the weights assigned to ferritin and O2 saturation are very similar to those of model M4; only a small weight difference of 0.00607 appears when creatinine is exchanged for LDH. Models M1, M2 and M6 have CV scores of 0.9636. Despite sharing ferritin and O2 saturation with weights comparable to those of models M3, M4 and M5, these models also include procalcitonin, AST and fibrinogen, with weights of 0.2235 (M1), 0.1191 (M2) and 0.1175 (M6), respectively.

Models M7 and M8 have the lowest CV scores (0.9545 and 0.9455) and include urea and Na with weights of 0.1019 and 0.09156, respectively. Although urea appears in the best model (M3), and both urea and Na show statistically significant correlations with the target variable and with other features, the performance of these two models is lower than that of the other models. One possible reason is the interaction with the oxygen saturation variable, which unbalances the weight distribution and could lead to overfitting.

A better qualitative perspective on the comparison and on the importance of the features can be gained from Table 14, which shows the distribution of features across the models. Ferritin is present in all prediction models, owing to its high Pearson correlation with the target variable. After ferritin, oxygen saturation is present in almost all models, except the third one (which, according to Table 13, has a reasonable weight distribution and a high CV score, so it may be possible to predict disease severity without necessarily using oxygen saturation). Both features (ferritin and oxygen saturation) are considered main features because they appear in all or almost all models, an observation that coincides with their calculated correlations with the target feature. The remaining features show a complementary behavior with respect to these most frequent features. It is believed that such "secondary features" may be implicitly selected under diversity and cross-correlation criteria. At first glance, there is no preferential distribution of secondary features, as their frequencies fluctuate between 1 and 3 appearances. The most frequently appearing secondary features are LDH and ALT, whose correlations with the target variable are not necessarily the highest, but whose p-values are deemed significant (see Table 5).

Table 14. Features used by each model and their correlation with severity.

Feature | M1 (Heuristic) | M2 (Heuristic) | M3 (Genetic algorithm) | M4 (Genetic algorithm) | M5 (Simulated Annealing) | M6 (Simulated Annealing) | M7 (Crow Search) | M8 (Crow Search) | Correlation
Ferritin | X | X | X | X | X | X | X | X | 0.8277
Oxygen saturation | X | X |  | X | X | X | X | X | -0.6444
Fibrinogen |  |  |  |  |  | X |  |  | 0.5356
Procalcitonin | X | X |  |  |  |  |  |  | 0.3855
Na |  |  |  |  |  |  |  | X | -0.3020
Creatinine |  |  |  | X |  |  |  |  | -0.2815
Urea |  |  | X |  |  |  | X |  | -0.2720
Chlorine |  |  | X |  |  |  |  |  | -0.2627
LDH |  |  |  |  | X |  | X | X | 0.2387
ALT |  |  |  | X | X | X |  |  | 0.2387
AST |  | X |  |  |  |  |  |  | -0.2116

3.9 Overfitting analysis

An overfitting analysis was performed to evaluate the ability of the Random Forest model to generalize to new data. The overfitting analysis was carried out using two complementary approaches.

Learning curve: Learning curves are generated using the following code:

from sklearn.model_selection import learning_curve

train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)

In this code snippet, the learning_curve() function produces training and validation scores for a range of training set sizes. An overfitted model will show a large discrepancy between training and validation scores, especially for the larger training sets. These curves can be seen in Figure 6 (red represents validation and green represents training).


Figure 6. The validation and training curves appear close and within the threshold in almost all cases except in model 8.

Difference between training and validation scores: The overfitting test is performed using the following code:

if (train_scores_mean[-1] - test_scores_mean[-1]) > 0.05:
    overfitting = "Yes"
else:
    overfitting = "No"

This code snippet calculates the difference between the mean training and validation scores for the largest training set size. If this difference is greater than 0.05, the model is considered to be overfitting.
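The same criterion can be wrapped in a small helper; this is a sketch, with the 0.05 threshold taken from the study and the example scores chosen to mirror rows of Table 15.

```python
def detect_overfitting(train_scores_mean, test_scores_mean, threshold=0.05):
    """Flag overfitting when the gap between mean training and validation
    scores at the largest training-set size exceeds the threshold."""
    gap = train_scores_mean[-1] - test_scores_mean[-1]
    return "Yes" if gap > threshold else "No"

# A model trained to a perfect score but cross-validating at 0.9455
# has a gap of 0.0545 and is flagged; a gap of ~0.018 is not.
flagged = detect_overfitting([0.99, 1.0], [0.97, 0.9455])
clean = detect_overfitting([0.98, 0.990909], [0.96, 0.9727])
```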

The training and cross-validation results that were used to determine if the model is overfitted are presented in Table 15.

Table 15. Results of the training scores and cross-validation of the reduced models.

Characteristics | CV Score | Train Score | Selection Method | Overfitting
ferritine, urea, Cl | 0.9727 | 0.990909 | Genetic algorithm | No
ferritine, sat. O2, creatinine, ALT | 0.9727 | 0.990909 | Genetic algorithm | No
ferritine, sat. O2, LDH, ALT | 0.9727 | 1 | Simulated Annealing | No
ferritine, sat. O2, procalcitonine | 0.9636 | 0.990909 | Heuristic | No
ferritine, sat. O2, Fibrinogen, ALT | 0.9636 | 1 | Simulated Annealing | No
ferritine, sat. O2, urea, LDH | 0.9545 | 1 | Crow Search | No
ferritine, sat. O2, procalcitonine, LDH | 0.9455 | 0.990909 | Heuristic | No
ferritine, sat. O2, Na, LDH | 0.9455 | 1 | Crow Search | Yes

3.10 Complementary analysis

The perplexity of the best models is presented below in Table 16. Models 9, 10 and 11 have been added; they were generated progressively from the biomarkers ferritin, oxygen saturation and fibrinogen, the three with the highest absolute correlation with the target variable. In addition, model 12, which contains all the variables, has been added. In this table, specificity is the average over the three classes.

Table 16. Models ordered from lowest to highest perplexity, with additional metrics.

Model | Characteristics | Accuracy | CV Score | Perplexity | Specificity
M1 | ferritine, sat. O2, procalcitonine | 1 | 0.9636 | 1.0478 | 1
M4 | ferritine, sat. O2, creatinine, ALT | 1 | 0.9727 | 1.0502 | 1
M10 | ferritine, sat. O2 | 0.9643 | 0.9636 | 1.0633 | 0.9849
M3 | ferritine, urea, Cl | 1 | 0.9727 | 1.0635 | 1
M7 | ferritine, sat. O2, urea, LDH | 1 | 0.9546 | 1.0671 | 1
M2 | ferritine, sat. O2, procalcitonine, AST | 1 | 0.9636 | 1.0683 | 1
M5 | ferritine, sat. O2, LDH, ALT | 1 | 0.9727 | 1.0743 | 1
M6 | ferritine, sat. O2, Fibrinogen, ALT | 1 | 0.9636 | 1.0850 | 1
M8 | ferritine, sat. O2, Na, LDH | 1 | 0.9455 | 1.0998 | 1
M11 | ferritine, sat. O2, Fibrinogen | 0.9643 | 0.9273 | 1.126 | 0.9849
M12 | ALL | 1 | 0.9727 | 1.191 | 1
M9 | ferritine | 0.8929 | 0.9 | 1.254 | 0.9444

A minimal difference between models can be noted in this table. One of the models from the heuristic method presents the lowest perplexity of the set, closely followed by one of the best models according to CV score. The model considered best in Table 15 does not have the lowest perplexity; even so, in second place is one of the best models found in said table, which is also the only one among those at the top of the table with a perfect train score. These results could indicate that reducing model variables carries the possibility of a slight degradation in performance, which could go unnoticed by classical accuracy tests.

This particular measure was adopted for a pragmatic reason: the predicted probability distribution could be used experimentally as an indicator of improvement or worsening of the condition in the absence of temporal data, since this distribution reflects the proximity and similarity of a sample across the different categories. It is suggested that perplexity can play a significant role as a metric for differential analysis between the most prominent models.
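The article does not state the exact formula used; a common definition, assumed here, takes perplexity as the exponential of the mean negative log-probability that the classifier assigns to each sample's true class, so a value of 1.0 corresponds to complete confidence and larger values to flatter, more "surprised" distributions.

```python
import math

def perplexity(true_class_probs):
    """Perplexity as exp of the mean negative log-probability assigned to
    the true class of each sample (assumed definition, see text)."""
    n = len(true_class_probs)
    return math.exp(-sum(math.log(p) for p in true_class_probs) / n)

confident = perplexity([0.99, 0.98, 0.97])  # sharp distributions -> near 1.0
hesitant = perplexity([0.70, 0.60, 0.80])   # flatter distributions -> higher
```

Under this reading, the values in Table 16 (all between roughly 1.05 and 1.25) indicate classifiers that are confident on most samples, with the differences between models lying in how sharply they separate the borderline cases.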

The model highlighted in the table in question (determined by its CV score and its low perplexity) stands out for its choice of biomarkers of a varied nature, which could suggest greater robustness of the model itself.

4. Discussion

4.1 Importance of variables

This study supports the conclusions presented by Gharib et al.,7 who emphasize the relevance of negative correlations with oxygen saturation and positive correlations with ferritin when investigating biomarkers of inflammation related to the severity of COVID-19, and who also highlight the importance of biomarkers such as IL-6, LDH, CRP and D-dimer. The correlations with the variables mentioned in the study by Khan et al.8 are also corroborated during the analysis phase of this study, especially regarding ferritin (highly significant), IL-6 (highly significant) and CRP (slightly significant). Similarly, the study conducted in Alexandria by Abdelhalim et al.10 found that 71.4% of non-hospitalized patients had elevated levels of serum ferritin, reaffirming its relevance as a biomarker in the diagnosis of COVID-19.

The results obtained could indicate that, beyond the individual impact each variable may have, the interaction between various biomarkers could be especially significant. Most of the models investigated integrate fundamental features, such as ferritin levels and oxygen saturation, along with other secondary features, such as fibrinogen, sodium (Na), creatinine, urea, chloride (Cl), and the levels of the ALT and AST enzymes. In the differential perplexity analysis, three outstanding models were identified, and the model with the best performance (determined by its highest CV score and its lowest perplexity) uses variables that reveal liver damage (ALT), kidney damage (creatinine), immunoinflammatory alterations (ferritin), and the balance in oxygenation (sat. O2). These observations could suggest that feature diversity plays an important role in decreasing model perplexity and in granularly increasing overall prediction performance. Within the scope of this study, no previous report of this role of feature diversity was found.

A point to highlight is the indication of Samprathi et al.13 on the non-recommendation of serum ferritin to monitor treatment response, despite its prominence in predictions. Additionally, a comprehensive review11 identified lymphopenia, thrombocytopenia, interleukin-6, ferritin, among other biomarkers, as associated with severe and fatal cases of COVID-19. In particular, hyperferritinemia was highlighted as an indicator of systemic inflammation and a poor prognosis.

These findings, together with the identification of ferritin as an independent predictor of COVID-19 mortality in the study by Sibtain et al.,12 highlight the need to continue investigating and evaluating the function and predictive value of these biomarkers in the disease using the interaction between variables as a framework.

4.2 Machine learning

This work is similar in approach to the work of Ahmed et al.15 However, there are fundamental differences in methodology. One of these is the deliberate exclusion of models that do not easily admit balancing, as is the case of kNN. This choice was made even though many studies, including the one just mentioned,15 include it among the compared methods, along with Random Forest, Support Vector Machine, Logistic Regression and Decision Tree, as does the study by Iwendi et al.,14 which used Random Forest and AdaBoost.

The current work prioritized a preselection of biomarkers based on the Pearson p-value, reducing the search space for the metaheuristic algorithms used for subsequent feature selection. This preselection is consistent with the study by Elif et al.,17 which identified significant biomarkers with a similar method and agrees with this study on biomarkers such as LDH, high leukocyte count and C-reactive protein. The models presented in that study achieved an accuracy of 0.9796 and an AUC value of 0.959; it can be suggested that such high prediction values may indeed be related to the selection of statistically significant features, given that this study also reports exceptional metrics and AUC. A relevant aspect is that in the current work, unlike in Xiong et al.,26 the data used during the correlation analysis were reused for the subsequent variable-reduction experiments. Other similar works15,26 did not implement the 5-fold stratified cross-validation considered fundamental in this work for evaluating the performance of the models. In the work of Patterson et al.,18 a 10-fold CV approach is used, which appears to be quite effective and is feasible given the magnitude of their dataset (n=225).

It is suggested that data balancing could yield a significant improvement in model construction. Xiong et al.26 carried out an analysis similar to the approach of this study. Despite working with a more extensive dataset (n=287) and a notable diversity of features, they do not mention a specific balancing process. In their results, the Random Forest model stood out with an AUC of 0.970, while the SVM model recorded the highest precision, at 88.5%. Our study, with a smaller sample (n=138), delves deeper into and expands on the features of the data, in addition to using balancing techniques. For their part, Gok et al.17 worked with a larger dataset (n=863), which was balanced using the SMOTE technique. Of the 8 trained models, Random Forest had the best performance, achieving an accuracy of 0.98, similar to the results of our study. A key difference between the work of Gok et al.17 and ours lies in the nature of the variables used for prediction. While Gok et al.17 did not use laboratory results containing biomarkers such as IL-6, LDH or ferritin, in this study those biomarkers have been essential to the analysis. Their absence can make prediction difficult because the remaining variables are less specific, which highlights the high precision of the models generated in that study. It is also worth highlighting that the study by Gok et al.17 emphasized how data preprocessing and balancing positively influence the accuracy and robustness of the model.

It should be noted that although Xiong et al.26 do not detail a balancing process, other studies, such as that of Wang et al.,27 have adopted the SMOTE technique successfully; specifically, Wang et al.27 achieved an exceptional precision of 0.9905 and an AUC value of 0.846. Given the performance results of the models, this study has recognized Random Forest as one of the best classifiers, aligning with previous observations.15,26 It is relevant to mention that this work obtained an AUC value slightly exceeding that reported by Gok et al.,17 but as the dataset is smaller, more tests must be performed before reaching definitive conclusions. On other metrics, such as accuracy, this work achieved results comparable to studies such as that of Iwendi et al.,14 which reported an accuracy of 0.94 and an F1 score of 0.86, and that of Cui et al.,19 which reported an accuracy of 84.2% and AUC values of 0.874 for the first group (age ≤ 70) and 0.842 for the second (age ≥ 70).

Finally, while studies such as that of Iwendi et al.,14 which highlighted a correlation between gender and mortality, and that of Cui et al.,19 focused on the differences in biomarkers according to age, have provided unique perspectives, the present work has focused on validating and comparing techniques based on clinical analyzes with the aim of identifying the most appropriate approach to predict outcomes in patients with COVID-19. Future work could include the participation of these variables in the models.

4.3 Variable reduction

Feature selection has notably benefited from the use of metaheuristic algorithms, which have shown consistent improvements over traditional heuristic techniques, especially in high-complexity areas such as COVID-19 severity prediction. In this work, we sought to use diverse algorithms inspired by evolution, physical mechanisms and collective intelligence, as described in the work of Agrawa et al.28 It should be noted that the algorithms applied were quite simple, preserving the minimal version of each one to encourage equality of conditions, so they are not directly comparable with those of other works that present cumulative improvements to said algorithms.20-23,29 Relevant differences with those works are discussed below.

The genetic algorithm has proven effective in several contexts, including the work of Kabir et al.,20 which introduced a redundancy-reduction approach. In our study and in that of Hayet et al.,29 genetic algorithms provided remarkable results, although the choice of more appropriate objective functions could have improved performance. Focusing on the detection of COVID-19, the study by Hayet et al.29 highlights the importance of specific variables, such as CRP, respiratory rate, oxygen saturation, and LDH. The consistency in the selection of these variables across different investigations highlights their relevance, although differences have also been observed, such as the elimination of certain variables due to insufficient data quality.

Simulated annealing, analyzed by Guha et al.,21 has proven useful for multidimensional problems. That work describes a hybridization with a recently developed optimization algorithm, the Equilibrium Optimizer, which is able to act during the exploitation phase and obtain better results than simple simulated annealing alone; in that scheme, simulated annealing works as a local pre-searcher. Additional mention is made of the work of Bandyopadhyay et al.,23 where a two-stage pipeline was proposed for the detection of COVID-19 in radiological images. They used a DenseNet-based CNN for feature extraction and combined the Harris Hawks Optimization (HHO) algorithm with Simulated Annealing (SA) and chaotic initialization for selection. The study achieved an accuracy of 98.85% and reduced the number of features by 75%.

Despite efforts to use crow search, highlighted by Chen et al.,22 this algorithm showed limitations in our work, including a potential for overfitting. In the study by Chen et al.,22 the search is improved using a hierarchical approach that reports favorable results. In all the previous works, an improvement in results can be seen when the algorithms are hybridized with pre-optimization during the search phase, which enables a better exploitation phase.

5. Conclusions

In this study, a comparative analysis of different classification models was carried out to predict the severity of COVID-19 cases with biological markers. The results obtained provide relevant information for the understanding and management of the disease. Below are the main conclusions of the study:

  • Model evaluation: Five classification models (Random Forest, Support Vector Machine, Logistic Regression, Decision Tree and Neural Network) were compared using metrics such as precision, recall, F1-score and accuracy. Most models showed similar performance across all metrics evaluated. The Random Forest model showed exceptional performance, with a cross-validation score of 0.9727 and perfect scores on the remaining metrics.

  • Correlation of variables: A correlation analysis was carried out between the variables and the severity of COVID-19. It was observed that the biological markers ferritin and oxygen saturation (sat. O2) showed the highest correlations (0.8277, -0.6444) in absolute value with the target variable (COVID19_Severity). These findings may be useful in understanding factors (biological markers) related to disease severity.

  • Feature selection: Experiments were conducted to evaluate the efficiency of the model when using different sets of predictor variables (biological markers). The experiments were carried out with genetic algorithms, simulated annealing and crow search. It was found that even with only two biological markers, ferritin and oxygen saturation, the model maintained a high level of accuracy (0.9636). This suggests that it is possible to reduce the number of variables without a significant loss in the predictive ability of the model. The best model (fewest variables and highest precision) was the one that included the features ferritine, urea and Cl, without loss in any of the metrics.

  • Overfitting evaluation: Cross-validation was performed and the importance of features in the Random Forest model was examined. The results indicated a stable performance of the model, with a small standard deviation and a prominent importance of the variables ferritin and oxygen saturation.

6. Recommendations

  • Expansion of the dataset: It is considered that the dataset is relatively small and a larger sample would be preferred for better generalization of the model's classification patterns.

  • Validation with external datasets: Although cross-validation allows us to better perceive the generalization of the models, it would be good to be able to validate said model with public datasets. However, many of these datasets do not have the diversity of features necessary to fully use the models in this work. It should be noted that public datasets require a lot of preprocessing and exhaustive research would be necessary for such an experiment. If it is not possible to completely make these data sources compatible, the number of features to be used could be limited in order to homogenize the data.

  • Feature selection: Although the experiments used fairly minimalist versions of the algorithms, some of the variants mentioned in the discussion could be applied. Their implementation would require substantial expertise but could bring significant improvements in the convergence of the selections.

  • Overfitting evaluation: With a homogenized external dataset or more cases, one could better assess the overfitting of the generated models.

Ethical considerations

The ethical and legal guidelines considered for sampling are described in De la Cruz-Cano et al.25

Author contributions

Eduardo de la Cruz-Cano - collected the dataset studied in this research and performed data curation.

Freddy de la Cruz-Ruiz - project administration and experimental study design.

Juan Pablo Olán-Ramón - software, wrote code, conducted experiments and wrote the manuscript draft.

Sarai Aguilar-Barojas - reviewed and approved the manuscript for submission.

Erasmo Zamarron-Licona - reviewed and approved the manuscript for submission.

How to cite this article:
Olán-Ramón JP, De la Cruz-Ruiz F, De la Cruz-Cano E et al. Identification of Biomarkers for Severity in COVID-19 Through Comparative Analysis of Five Machine Learning Algoritms [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2024, 13:688 (https://doi.org/10.12688/f1000research.150128.1)

Open Peer Review

Key to reviewer statuses:

Approved - The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations - A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved - Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report, 09 Oct 2024
Gustavo Sganzerla Martinez, Dalhousie University, Halifax, Nova Scotia, Canada
Status: Not Approved
I read with interest the work from Olan-Ramon et al. In their work, the authors compared different machine learning algorithms to predict severity in COVID-19 patients based on laboratory biomarkers.

Overall, the introduction is extensively long and ...
How to cite this report:
Martinez GS. Reviewer Report For: Identification of Biomarkers for Severity in COVID-19 Through Comparative Analysis of Five Machine Learning Algoritms [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2024, 13:688 (https://doi.org/10.5256/f1000research.164667.r322245)
Reviewer Report, 28 Jun 2024
Maitham Ghaly Yousif, College of Science, University of Al-Qadisiyah, Baghdad, Iraq
Status: Approved with Reservations
Assessment and Recommendations for the Study:
Strengths:
  1. Algorithm Selection and Evaluation: The study effectively utilised a variety of machine learning algorithms, evaluating them comprehensively using multiple accuracy metrics, which enhances the credibility of the results.
...
How to cite this report:
Yousif MG. Reviewer Report For: Identification of Biomarkers for Severity in COVID-19 Through Comparative Analysis of Five Machine Learning Algoritms [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2024, 13:688 (https://doi.org/10.5256/f1000research.164667.r296235)