<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.123776.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Machine learning-based heart attack prediction: A symptomatic heart attack prediction method and exploratory analysis</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Nandal</surname>
                        <given-names>Neha</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2566-5925</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Goel</surname>
                        <given-names>Lipika</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-1609-2475</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Tanwar</surname>
                        <given-names>Rohit</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Computer Science and Engineering, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, Telangana, 500090, India</aff>
                <aff id="a2">
                    <label>2</label>School of Computer Science, University of Petroleum &amp; Energy Studies, Dehradun, Uttarakhand, 248007, India</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:neha1607@grietcollege.com">neha1607@grietcollege.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>29</day>
                <month>9</month>
                <year>2022</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2022</year>
            </pub-date>
            <volume>11</volume>
            <elocation-id>1126</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>12</day>
                    <month>8</month>
                    <year>2022</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Nandal N et al.</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/11-1126/pdf"/>
            <abstract>
                <p>
                    <bold>Background</bold>: Heart attack is one of the most serious causes of morbidity in the world&#x2019;s population. Cardiovascular disease is one of the most important subjects of clinical data analysis and prediction. Data science and machine learning (ML) can be very helpful in predicting heart attacks by taking into account risk factors such as high blood pressure, high cholesterol, abnormal pulse rate, and diabetes. The objective of this study is to optimize the prediction of heart disease using ML.</p>
                <p>
                    <bold>Methods:</bold> In this paper, we present a machine learning-based heart attack prediction (ML-HAP) method in which different risk factors are analyzed and heart attacks are predicted using four ML approaches: Support Vector Machines, Logistic Regression, Na&#x00ef;ve Bayes, and XGBoost. Heart disease symptom data were collected from the UCI ML Repository and analyzed using these ML methods. The focus was on optimizing the prediction on the basis of different parameters.</p>
                <p>
                    <bold>Results:</bold> XGBoost provided the best prediction among the four models. The area under the curve achieved was 0.94 with XGBoost and 0.92 with Logistic Regression. ML models, especially boosting algorithms, were highly effective in identifying heart attack symptoms. The predictions were evaluated in terms of accuracy, precision, recall, and area under the curve. The ML models were trained to perform optimized predictions.</p>
                <p>
                    <bold>Conclusions</bold>: This prediction can help clinically in analyzing the risk factors of the disease and in interpreting the patient scenario. The boosting algorithm provided promising results in predicting symptoms of heart disease. The approach can be optimized further through additional work on the risk factors associated with this condition.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Disease prediction</kwd>
                <kwd>Machine Learning</kwd>
                <kwd>XGBoost</kwd>
                <kwd>Logistic Regression</kwd>
                <kwd>Performance measures</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>A heart attack, also known as acute myocardial infarction (AMI), is one of the most serious conditions among cardiovascular diseases. It occurs when blood circulation to the heart muscle is interrupted, which damages the muscle. Diagnosing heart disease is also a crucial task. It requires attention to the symptoms, a physical examination, and an understanding of the different signs of the disease. Different factors, including cholesterol, genetic heart disease, high blood pressure, low physical activity, obesity, and smoking, can contribute to heart disease. The major cause of heart attacks is the stoppage of blood flow in the coronary arteries. When blood flow is reduced, the red blood cell (RBC) count starts to fall; as a result, the body stops getting the oxygen it needs and the person loses consciousness. Early diagnosis through symptoms and signs can help prevent heart attacks in patients if the prediction is accurate enough. 
                <xref ref-type="fig" rid="f1">Figure 1</xref> shows different symptoms of a heart attack. The work presented takes 13 numeric features (attributes) as input. It has been stated that small lifestyle modifications, including quitting smoking, alcohol, and tobacco, adopting healthy food habits, and exercising routinely, can help prevent heart attacks. A healthy lifestyle combined with early treatment after diagnosis can greatly improve outcomes. However, it is difficult to identify a high risk of heart disease when several risk factors, such as diabetes, high blood pressure, and cholesterol problems, are present together. In these scenarios, ML can help in the early diagnosis of disease.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>Symptoms of a heart attack.</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure1.gif"/>
            </fig>
            <sec id="sec2">
                <title>Heart disease in the context of machine learning</title>
                <p>Previous work has reported that prediction can be improved by applying feature selection and appropriate feature engineering.
                    <sup>
                        <xref ref-type="bibr" rid="ref1">1</xref>
                    </sup> Experiments with different machine learning approaches and models, in which various hyper-parameters were tuned, improved performance and accuracy.
                    <sup>
                        <xref ref-type="bibr" rid="ref1">1</xref>
                    </sup> Neural networks performed well compared with other machine learning classifiers (Na&#x00ef;ve Bayes, J48, CART, Grading, and SVM), reaching nearly 79% accuracy.</p>
                <p>Other researchers worked on the reduction of cardiovascular features and extracted nonlinear features with discriminant analysis.
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup> The Fisher method was used in the experiment to tackle overfitting and to improve training speed. The reported accuracy for the detection of coronary disease was 100%. 
                    <xref ref-type="table" rid="T1">Table 1</xref> summarizes the literature survey carried out for this work.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Summary of the literature survey.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Author</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Findings</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Boshra Brahmi 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref20">20</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Data mining techniques were utilized for the prediction of heart disease and J48 outperformed other models like K-Nearest Neighbor (KNN), Support Vector Machines (SVM), and na&#x00ef;ve Bayes.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Marjia 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref21">21</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Weka-based heart disease prediction was done and SMO gave maximum accuracy of 89% as compared to Bayes net with 87% accuracy and J48 with 86%.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Zhang 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref5">5</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Worked on principal component analysis (PCA) with the AdaBoost algorithm for the prediction of heart disease.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Chala Bayen 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref22">22</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Data mining models were presented that deliver results in a short time to improve the quality of service.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stephen J. Mooney 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref23">23</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Different big data approaches have been utilized for interpretation and identification of threads.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Senthilkumar Mohan 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref24">24</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Worked on different machine learning classifiers for heart disease prediction, in which the maximum accuracy achieved was 88.4%.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Salhi, D.E. 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref25">25</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Three approaches, namely SVM, KNN, and neural network (NN), were applied to datasets of different sizes. NN was found to be the most accurate, with 93% accuracy.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Harshit Jindal 
                                    <italic toggle="yes">et al.</italic>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref26">26</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">A prediction system using logistic regression and KNN was proposed. The proposed model showed improved accuracy.</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Another study addressed the classification of arrhythmias based on variations in heart rate.
                    <sup>
                        <xref ref-type="bibr" rid="ref3">3</xref>
                    </sup> Classification was performed using a multi-layer perceptron neural network. The reported accuracy with Gaussian discriminant analysis (GDA) was 100%. GDA optimization and heart rate variability (HRV) signal feature reduction were done later which then went up to 15 from 13.
                    <sup>
                        <xref ref-type="bibr" rid="ref4">4</xref>
                    </sup>
                </p>
                <p>It has been stated in the work by Zhang 
                    <italic toggle="yes">et al.,</italic> in 2018
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup> that 100% precision was achieved with the support vector machine classifier. Many researchers have used principal component analysis (PCA) to deal with high-dimensional data. In another study, the AdaBoost model was combined with PCA for breast cancer detection.
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup>
                </p>
                <p>This work focuses on optimizing an ML model for the prediction of heart disease and on the overfitting problem. Overfitting can be addressed when working with Logistic Regression; for example, a random sample can be drawn from the complete dataset. The model is trained on samples of data obtained from the UCI Machine Learning Repository. The aim of this study is therefore to improve the prediction of heart disease.</p>
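                <p>The random-sampling idea in the preceding paragraph can be illustrated with a minimal sketch. It is an illustration only and not the authors&#x2019; implementation: it assumes a simple hold-out split, and the function name holdout_split, the 80/20 ratio, and the seed are hypothetical choices. The row count of 294 is the dataset size reported in the Methods section. A model that scores much better on the training rows than on the held-out rows is likely to be overfitting.</p>
                <code language="python">
```python
import random


def holdout_split(n_rows, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), two disjoint sorted lists of row indices."""
    indices = list(range(n_rows))
    # Shuffle with a fixed seed so the random sample is reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


# 294 rows, as reported for the dataset; the 20% test share is an assumption.
train_idx, test_idx = holdout_split(294)
print(len(train_idx), len(test_idx))
```
                </code>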
            </sec>
            <sec id="sec3">
                <title>Machine learning research methods</title>
                <p>This section describes the methods implemented and the techniques used in machine learning research (MLR). The ML approach and its challenges are discussed first, and the selected methods are then described.</p>
                <p>An active learning approach is utilized to implement the model. 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> shows the basic framework of the active learning approach.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>The basic approach to active learning.</title>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure2.gif"/>
                </fig>
                <p>In the digital world, electronic health records have become the standard way to gather health data, which has made data easier to collect, cheaper, and more accessible. However, easily available data are often unstructured and suffer from redundancy, noise, heterogeneity, and diversity in scale.</p>
                <p>Health care data comprise different types of outcomes: binary outcomes (0 or 1), such as &#x2018;death&#x2019; or another event; continuous outcomes, such as length of stay; ordinal outcomes, such as tumor grading and quality of life; and survival outcomes, such as those from clinical trials or survival from cancer.</p>
                <p>ML offers versatility in analyzing these data and can provide more precise results.</p>
            </sec>
            <sec id="sec4">
                <title>Highlights</title>
                <p>
                    <list list-type="bullet">
                        <list-item>
                            <label>-</label>
                            <p>ML is an effective way to optimize the prediction of heart disease and the related effects.</p>
                        </list-item>
                        <list-item>
                            <label>-</label>
                            <p>A good understanding of the required parameters for the diagnosis of the disease can be highly helpful in making precise and accurate predictions.</p>
                        </list-item>
                        <list-item>
                            <label>-</label>
                            <p>Cardiovascular (CV) disease research and treatment coupled with some high-performance tools for analysis can improve the knowledge about the domain.</p>
                        </list-item>
                    </list>
                </p>
            </sec>
        </sec>
        <sec id="sec5">
            <title>Literature survey</title>
            <p>A thorough search was made of previous work in the heart disease domain that used different algorithms. Work from the previous 21 years was considered, and its shortcomings were noted in order to extend our research. A total of 50 papers were collected from Web of Science, ScienceDirect, and Scopus, of which 27 were selected for the final study after removing duplicates and papers covering the same domain.</p>
            <sec id="sec6">
                <title>Search Strategy</title>
                <p>The literature survey was carried out from January 1, 2021 to December 31, 2021 using Scopus, Web of Science, and ScienceDirect, and the collected papers were analyzed thoroughly to understand the challenges in the field of heart disease prediction. The strengths and weaknesses of each work were assessed on the basis of its evaluation parameters, methodology, and use of algorithms.</p>
                <p>The inclusion criteria were that a paper belongs to a related domain, uses recent machine learning algorithms, and addresses a challenging area in the heart disease domain. The search terms used to identify papers were &#x201c;machine learning based health disease prediction&#x201d;, &#x201c;optimization of Health disease prediction&#x201d;, and &#x201c;Challenges in identifying health disease&#x201d;. The exclusion criteria covered duplicate papers, papers presenting inferior work in terms of evaluation parameter values, and obsolete work.</p>
                <p>In one study, an electronic health record (EHR) model based on sequential modeling was designed using a neural network.
                    <sup>
                        <xref ref-type="bibr" rid="ref6">6</xref>
                    </sup> The EHR data were used to conduct experiments and predict heart disease. The researchers used word vectors and one-hot encoding to model diagnostic situations and predict cardiac failure. Alongside this approach, an extended memory network model was used. Based on an analysis of the results, the work stated that it is necessary to account for the sequential character of healthcare. This sequential character includes tracking a person&#x2019;s behavior, such as health-related activities, changes of healthcare provider during sickness, exercise routine, and diet routine.</p>
                <p>The artificial neural network (ANN), random forest, K-Nearest Neighbor (KNN), and support vector machine techniques were used in another work.
                    <sup>
                        <xref ref-type="bibr" rid="ref7">7</xref>
                    </sup> It stated that ANN produced the highest accuracy for heart disease predictions compared to the earlier classification algorithms. The work presented highly efficient results in terms of accuracy and other evaluation measures included in the study.</p>
                <p>Another work stated that PCA, as a dimensionality reduction technique, can be used to deal with data that have high dimensionality and variance. With this approach, more information can be retained in the new components.
                    <sup>
                        <xref ref-type="bibr" rid="ref8">8</xref>
                    </sup> When working with data with high dimensionality, many researchers choose to employ PCA. Five unsupervised (linear and nonlinear) dimensionality reduction techniques were utilized, as well as NN as a classifier, to classify cardiac arrhythmia.
                    <sup>
                        <xref ref-type="bibr" rid="ref9">9</xref>
                    </sup> With a minimum of 10 components, an F1 score of 99.83% was achieved with fast independent component analysis (FastICA) which was used for the ICA for breast cancer diagnosis.</p>
                <p>Another researcher employed the AdaBoost algorithm, based on PCA.
                    <sup>
                        <xref ref-type="bibr" rid="ref10">10</xref>
                    </sup> A combination of uncorrelated discriminant analysis and PCA was applied to select the optimal features for controlling upper limb motions.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">11</xref>
                    </sup>
                </p>
                <p>Using PCA approaches to time-frequency representations, another researcher attempted to minimize heart sounds to improve performance.
                    <sup>
                        <xref ref-type="bibr" rid="ref12">12</xref>
                    </sup> A scale-invariant feature, Principal Component Analysis-K-Nearest Neighbor (PCA-KNN), was used for scaling in medical images to develop a new approach for diverse medical images, which achieved 83.6% accuracy with 200 training images.
                    <sup>
                        <xref ref-type="bibr" rid="ref13">13</xref>
                    </sup> A gray-level threshold of 150 was utilized as a result of PCA and region of interest (ROI), all of which were used to reduce X-ray image characteristics.
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup>
                </p>
                <p>Diabetics are more likely to suffer from cardiovascular (CV) disease. Both fasting glucose levels and glycosylated hemoglobin have been used in CV risk-assessment methods. The evidence supporting the use of these components is inconclusive. According to the cardiovascular heart study,
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> the association between fasting blood glucose and CV risk is relatively weak. Similarly, multiple studies were done by other researchers
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref16">16</xref>
                    </sup> which have shown a correlation between glycosylated hemoglobin and CV risk, as well as postprandial glucose levels.</p>
                <p>Because of our genetic diversity, cultures, dietary habits, and social and behavioral features, the available risk-assessment measures are not universal. In a review of the worldwide burden of CV illness, researchers found that different populations have different disease burdens as well as different main risk factors (RFs) contributing to this burden. The Asia Pacific Cohort studies compared the Asian and Framingham cohorts in terms of risk factors and illness incidence and found that the Framingham group had greater systolic blood pressure, total cholesterol, and CV events, whereas the Asian cohort had higher smoking rates.
                    <sup>
                        <xref ref-type="bibr" rid="ref17">17</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>
                    </sup> There has been no consensus on which risk-assessment tools to employ for risk stratification in Asian populations. As a result, clinicians are perplexed and unable to use risk stratification to prioritize individuals for primary prevention strategies. It has therefore been stated that it would be beneficial to develop a predictive equation from contemporary, representative population-based data. The current mixture of known and unknown RFs based on genetic traits has been considered.
                    <sup>
                        <xref ref-type="bibr" rid="ref19">19</xref>
                    </sup> As a result, we must be aware of the limits of each of these risk-assessment techniques and interpret the results with caution.
                    <sup>
                        <xref ref-type="bibr" rid="ref20">20</xref>
                    </sup>
                </p>
                <p>Another work presented different ML classifiers and later performed a comparative analysis of them.
                    <sup>
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup> This work was performed on data mining approaches like Sequential minimal optimization (SMO), na&#x00ef;ve Bayes, and J48 decision trees.</p>
                <p>The maximum accuracy, 89%, was achieved with SMO. The J48 decision tree provided an accuracy of 86%, and the na&#x00ef;ve Bayes classifier gave an accuracy of 87%.</p>
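                <p>Accuracy figures such as those quoted above, together with the precision and recall measures this article uses for evaluation, can be derived from the counts of a binary confusion matrix. The following sketch is illustrative only: the function name classification_metrics and the example labels are hypothetical and are not data from any cited study.</p>
                <code language="python">
```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when precision or recall has an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up labels for illustration only (not data from any cited study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```
                </code>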
            </sec>
        </sec>
        <sec id="sec7" sec-type="methods">
            <title>Methods</title>
            <sec id="sec8">
                <title>Study design</title>
                <p>Each step of this study is outlined below. Exploratory data analysis (EDA) is used to detect errors, identify appropriate data, and examine the relationships between variables. This work considers the risk factors for heart disease and, ultimately, the prediction of heart attack. The ML classifiers used are logistic regression, support vector machines (SVM), na&#x00ef;ve Bayes, and XGBoost; they were selected on the basis of their performance attributes, following a detailed survey of previous experiments on heart disease prediction. The experiment is carried out on a Cleveland dataset which contains 294 tuples with 14 attributes. A flowchart of the process is presented in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>.
                    <list list-type="order">
                        <list-item>
                            <label>1.</label>
                            <p>The first step, &#x2018;acquisition&#x2019;, is gathering the data. This involved evaluating physical conditions and converting the samples into numeric data that a computer can process.</p>
                            <list list-type="alpha-lower">
                                <list-item>
                                    <label>a.</label>
                                    <p>The data were taken from the UCI ML repository
                                        <sup>
                                            <xref ref-type="bibr" rid="ref28">28</xref>
                                        </sup> as outlined in the data collection section, and comprise multiple attributes for studying the risk factors for heart disease.</p>
                                </list-item>
                                <list-item>
                                    <label>b.</label>
                                    <p>All experiments in this study were performed in Python 3.8.3.</p>
                                </list-item>
                            </list>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>The second step is &#x2018;pre-processing&#x2019;, in which we cleaned the dataset by handling missing values, detecting outliers, and removing redundant records. Predictive analysis was then performed on the resulting uniform data, which also prepares it for EDA.</p>
                            <list list-type="alpha-lower">
                                <list-item>
                                    <label>a.</label>
                                    <p>The collected data were cleaned using pre-processing techniques including missing value replacement, outlier detection, and duplicate removal.</p>
                                </list-item>
                                <list-item>
                                    <label>b.</label>
                                    <p>Missing values (if any) were replaced with mean values.</p>
                                </list-item>
                                <list-item>
                                    <label>c.</label>
                                    <p>Outliers in the data were detected using boxplots, based on the minimum, maximum, and interquartile range of the data.</p>
                                </list-item>
                                <list-item>
                                    <label>d.</label>
                                    <p>Duplicates were removed by using the Python function dict() to build a dictionary, whose keys are unique.</p>
                                </list-item>
                            </list>
                        </list-item>
                        <list-item>
                            <label>3.</label>
                            <p>The third step is &#x2018;integration&#x2019;, in which the required libraries were imported as independent Python modules and the different data subsets were merged to perform the necessary experiments.</p>
                            <list list-type="alpha-lower">
                                <list-item>
                                    <label>a.</label>
                                    <p>The first part of the experiment was to obtain the pre-processed data.</p>
                                </list-item>
                                <list-item>
                                    <label>b.</label>
                                    <p>The cleaned data were then integrated so that the ML algorithms could be applied.</p>
                                </list-item>
                            </list>
                        </list-item>
                        <list-item>
                            <label>4.</label>
                            <p>The fourth step is &#x2018;analysis&#x2019;, in which EDA was performed to understand the relationships between the attributes of the data (
                                <xref ref-type="table" rid="T2">Table 2</xref>).
                                <sup>
                                    <xref ref-type="bibr" rid="ref28">28</xref>
                                </sup>
                            </p>
                            <list list-type="alpha-lower">
                                <list-item>
                                    <label>a.</label>
                                    <p>Analysis is based on learning from data, identifying patterns, and making decisions with minimal human intervention.</p>
                                </list-item>
                                <list-item>
                                    <label>b.</label>
                                    <p>EDA was used to understand the relationships between attributes.</p>
                                </list-item>
                                <list-item>
                                    <label>c.</label>
                                    <p>Variables were compared to assess their correlation, and the same variables were analyzed using boxplots and heatmaps.</p>
                                </list-item>
                            </list>
                        </list-item>
                        <list-item>
                            <label>5.</label>
                            <p>The fifth step was &#x2018;intervention&#x2019;, which concerns decision-making policy: a search strategy was used to review previous experimental studies and determine when models can be applied effectively to real-world problems.</p>
                            <list list-type="alpha-lower">
                                <list-item>
                                    <label>a.</label>
                                    <p>A detailed literature survey was carried out to learn how ML models have been used in this domain and to identify the most promising ones for optimizing our results. The most promising papers were selected on the basis of the performance reported in previous work on heart disease and similar domains.</p>
                                </list-item>
                            </list>
                        </list-item>
                        <list-item>
                            <label>6.</label>
                            <p>The sixth step was &#x2018;application&#x2019; of ML algorithms to make the predictions. In this work, four machine learning models were used: SVM, na&#x00ef;ve Bayes, logistic regression, and XGBoost.</p>
                            <list list-type="alpha-lower">
                                <list-item>
                                    <label>a.</label>
                                    <p>SVM was applied to the data using the svm module of scikit-learn in Python.</p>
                                </list-item>
                                <list-item>
                                    <label>b.</label>
                                    <p>The na&#x00ef;ve Bayes classifier was applied using the naive_bayes module of scikit-learn in Python.</p>
                                </list-item>
                                <list-item>
                                    <label>c.</label>
                                    <p>Logistic regression was applied using the linear_model module of scikit-learn in Python.</p>
                                </list-item>
                                <list-item>
                                    <label>d.</label>
                                    <p>XGBoost is a boosting algorithm that combines weak classifiers to produce an optimized result.</p>
                                </list-item>
                            </list>
                        </list-item>
                    </list>
                </p>
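                <p>The pre-processing operations listed in step 2 can be illustrated with a minimal sketch, written under stated assumptions rather than taken from the study's code. It uses only the Python standard library; the function names (impute_mean, iqr_outliers, drop_duplicates), the 1.5 &#x00d7; IQR whisker rule, and the example values are hypothetical choices made for illustration.</p>
                <preformat>
```python
from statistics import mean, quantiles


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the boxplot whiskers.

    The whiskers are assumed to sit at Q1 - k*IQR and Q3 + k*IQR.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v > high or low > v]


def drop_duplicates(rows):
    """Remove repeated rows (tuples) by using them as dictionary keys."""
    return list(dict.fromkeys(rows))


# Hypothetical example values, not taken from the study's results.
chol = [233, 250, None, 236, 354, 192, 294]
print(impute_mean(chol))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 100]))
print(drop_duplicates([(60, 1), (35, 1), (60, 1)]))
```
                </preformat>
                <p>Here dict.fromkeys plays the role of the dictionary-based duplicate removal described above: dictionary keys are unique and keep their insertion order, so repeated rows collapse to their first occurrence.</p>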
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Methodology flowchart.</title>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure3.gif"/>
                </fig>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>Table 2. </label>
                    <caption>
                        <title>Sample dataset showing 14 attributes essential for heart disease prediction.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Age</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Sex</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Chest pain (cp)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Resting blood pressure (trtbps)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Cholesterol (chol)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Fasting blood sugar (fbs)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Resting electrocardiographic results (restecg)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Maximum heart rate achieved (thalachh)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Exercise induced angina (exng)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Oldpeak</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Slope (slp)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Number of major vessels (caa)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Thallium stress test (thall)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Output</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">60</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">145</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">233</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">150</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">35</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">130</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">250</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">187</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3.5</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">41</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">130</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">204</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">172</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.4</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">55</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">120</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">236</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">178</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">56</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">120</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">354</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">163</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">55</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">140</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">192</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">148</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.4</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">56</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">140</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">294</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">153</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>The work was conducted stepwise, starting with data gathering. The data were then pre-processed to clean them, including duplicate removal, outlier detection, and replacement of missing values with the mean. Finally, four machine learning classifiers (support vector machines, na&#x00ef;ve Bayes, logistic regression, and XGBoost) were applied to classify the outputs.</p>
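                <p>As an illustration of the application step, the following sketch fits the four kinds of classifier with scikit-learn and compares their test accuracy. It is not the study's code and rests on several assumptions: the data are synthetic (generated with make_classification to mimic 13 predictors and a binary output), the 80/20 train-test split and the feature scaling are illustrative choices, and scikit-learn's GradientBoostingClassifier is used as a stand-in for XGBoost so that the example does not depend on the separate xgboost package.</p>
                <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 13 predictors and a binary output (assumption).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One estimator per classifier family named in the text.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Stand-in for XGBoost (assumption): another gradient-boosted tree model.
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```
                </preformat>
                <p>If the xgboost package is installed, its XGBClassifier follows the same fit and score interface and could replace the stand-in. The accuracies printed by this sketch come from synthetic data and say nothing about the results of this study.</p>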
            </sec>
            <sec id="sec9">
                <title>Data collection</title>
                <p>The dataset comprises four sub-databases (Hungary, Switzerland, Cleveland, and Long Beach) with 76 different attributes. This work uses a subset of 14 attributes, because all the published experiments covered in the literature review used these 14 attributes, which help in understanding the major risk factors of heart disease. The dataset is freely available online in the UCI repository for experimental purposes.
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> The last column, the target value, indicates the absence or presence of disease in the patient, encoded as 0 or 1, respectively.</p>
                <p>Prediction was performed on the whole dataset. To illustrate its attributes and behavior, a sample of the dataset is shown in 
                    <xref ref-type="table" rid="T2">Table 2</xref> (the whole dataset is not presented because of its size).
                    <sup>
                        <xref ref-type="bibr" rid="ref27">27</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup>
                </p>
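                <p>The layout described above, 13 predictors followed by a binary target column, can be sketched as follows. This is only an illustration: the three data rows are copied from Table 2, the lower-case column names are assumed from the abbreviations in the Table 2 header, and the helper functions load_rows and split_features_target are hypothetical.</p>
                <preformat>
```python
import csv
import io

# Header assumed from the abbreviations in Table 2; rows copied from Table 2.
SAMPLE_CSV = """age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output
60,1,3,145,233,1,0,150,0,2.3,0,0,1,1
35,1,2,130,250,0,1,187,0,3.5,0,0,2,1
41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def split_features_target(rows, target="output"):
    """Separate the predictors from the binary target column."""
    X = [[v for k, v in row.items() if k != target] for row in rows]
    y = [int(row[target]) for row in rows]
    return X, y


rows = load_rows(SAMPLE_CSV)
X, y = split_features_target(rows)
print(len(rows[0]), len(X[0]), y)
```
                </preformat>
                <p>With the full file, the same two functions could be applied to the contents of heart.csv instead of the embedded sample, provided its header matches the assumed column names.</p>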
            </sec>
            <sec id="sec10">
                <title>Exploration of dataset</title>
                <p>The dataset contains attributes with numeric values, distributed in a file (heart.csv)
                    <sup>
                        <xref ref-type="bibr" rid="ref29">29</xref>
                    </sup> whose link is provided at the end of the paper in the data availability section.
                    <sup>
                        <xref ref-type="bibr" rid="ref27">27</xref>
                    </sup> Attribute and behavioral information for the complete dataset is given in 
                    <xref ref-type="table" rid="T3">Table 3</xref>. The attributes of the dataset utilized (risk factors of heart attack)
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> are discussed below:
                    <list list-type="order">
                        <list-item>
                            <label>1.</label>
                            <p>Age (age): Age is a crucial risk factor for heart attack, because the risk can double as age increases. The fatty streaks indicative of coronary artery disease start to develop in adulthood, and it has been shown that more than 80% of heart attack cases due to coronary heart disease occur in patients aged 65 or above.
                                <sup>
                                    <xref ref-type="bibr" rid="ref16">16</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>Sex (sex): Among people aged 50 or younger, men have been shown to have a higher risk of heart attack than women.
                                <sup>
                                    <xref ref-type="bibr" rid="ref17">17</xref>
                                </sup> Whether the risk of heart attack becomes equal in men and women after menopause is debated. Diabetes increases the risk of a heart attack in women.</p>
                        </list-item>
                        <list-item>
                            <label>3.</label>
                            <p>Chest pain (cp): Chest pain that occurs when the heart muscle does not receive enough oxygen-rich blood is called angina. It is felt as squeezing or pressure in the chest and may be accompanied by discomfort in the shoulder, jaw, back, or neck and by a feeling of indigestion. The pain can also be felt in the hands. Types of angina include stable angina pectoris, unstable angina, Prinzmetal angina, and microvascular angina.</p>
                        </list-item>
                        <list-item>
                            <label>4.</label>
                            <p>Blood pressure (trtbps): High blood pressure can affect the arteries. It can arise from causes such as imbalanced cholesterol, high blood sugar, and obesity, which further increase the risk.</p>
                        </list-item>
                        <list-item>
                            <label>5.</label>
                            <p>Cholesterol (chol): Imbalanced or bad cholesterol also affects the arteries; low-density lipoprotein cholesterol in particular narrows them. High levels of triglycerides, a type of blood fat, together with high cholesterol can further increase the risk of heart attack. It is therefore advisable to maintain good cholesterol levels to lower the risk of a heart attack.</p>
                        </list-item>
                        <list-item>
                            <label>6.</label>
                            <p>Fasting blood sugar (fbs): High blood sugar can contribute to a heart attack. It may result from reduced insulin production by the pancreas or from the body not responding to insulin.</p>
                        </list-item>
                        <list-item>
                            <label>7.</label>
                            <p>Resting electrocardiographic results (restecg): For people at medium to high risk of heart attack, current evidence is insufficient to assess the disadvantages of screening. For those at low risk, the harmful effects of screening with resting or exercise electrocardiography, such as a rash or skin irritation, may balance out its benefits.</p>
                        </list-item>
                        <list-item>
                            <label>8.</label>
                            <p>Heart rate (thalachh): The increase in heart disease risk with increasing heart rate parallels the increase in risk with rising blood pressure.
                                <sup>
                                    <xref ref-type="bibr" rid="ref23">23</xref>
                                </sup> Research has shown
                                <sup>
                                    <xref ref-type="bibr" rid="ref25">25</xref>
                                </sup> that an increase in heart rate of 10 bpm raises the chances of cardiac death by 20%, which is similar to the effect of a 10 mm Hg increase in blood pressure.</p>
                        </list-item>
                        <list-item>
                            <label>9.</label>
                            <p>Exercise-induced angina (exng): The discomfort of exercise-induced angina is felt as a gripping, squeezing, or tight sensation that can vary from mild to severe. The pain is usually felt in the center of the chest and can spread to the shoulders, back, jaw, arms, or neck. Angina plays a crucial role in identifying coronary disease, which makes it worthwhile to treat it as a separate category for analysis.</p>
                        </list-item>
                        <list-item>
                            <label>10.</label>
                            <p>Thallium stress test (thall): The duration of the segment depression is important, because it must be checked whether recovery after peak stress is prolonged, which is consistent with a positive treadmill test. Abnormal values correspond to a down-sloping depression of greater than or equal to 1 mm at 60 to 80 ms. Tests with up-sloping segments during exercise are regarded as equivocal.</p>
                        </list-item>
                    </list>
                </p>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>Table 3. </label>
                    <caption>
                        <title>Dataset exploration for better understanding of the meaning of attributes in data.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Attribute</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Values</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Semantic</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Age</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Integer</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Patient's Age</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Sex</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Male: 1, Female: 0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Patient's Gender</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">exang</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Yes: 1, No: 0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Angina Induction</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">ca</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0 to 3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Number of major vessels</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">cp</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0: Typical Angina,
                                    <break/>1: Atypical Angina,
                                    <break/>2: Non-Anginal Pain,
                                    <break/>3: Asymptomatic</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Type of Chest pain</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">trtbps</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Integer in mm Hg</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Resting blood pressure</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">chol</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Integer in mg/dl</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Cholesterol value</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">fbs</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">True: 1, False: 0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Fasting blood sugar above 120 mg/dl</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rest_ecg</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0: normal, 1: ST-T wave abnormality with inversions and depression, 2: left ventricular hypertrophy (probable or confirmed diagnosis)</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Electro-cardiographic results</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">thalach</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Integer in beats per minute</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Maximum heart rate achieved</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>The remaining four attributes (oldpeak, slope, number of major vessels, and output) are numeric values related to heart disease in the dataset and were not included in the 10 variables of this study.</p>
            </sec>
            <sec id="sec11">
                <title>ML models</title>
                <p>The study used four ML models: XGBoost, support vector machines, na&#x00ef;ve Bayes, and logistic regression.</p>
                <p>1. Logistic regression: Logistic regression is a widely used supervised learning model for categorical prediction, for example &#x2018;true&#x2019; or &#x2018;false&#x2019;. Rather than returning a hard label directly, the model outputs a probability, which is then mapped to a class. It works on both continuous and discrete input values. Its behaviour is described by a simple S-shaped (sigmoid) curve.</p>
                <p>2. Na&#x00ef;ve Bayes: Na&#x00ef;ve Bayes is a supervised learning model based on Bayes&#x2019; theorem that produces fast predictions. It is a probabilistic classifier and can work well on high-dimensional data.</p>
                <p>3. Support vector machines (SVM): SVM is a supervised learning model built on the concept of a decision boundary, or hyperplane. The algorithm aims to maximize the margin around the hyperplane, which helps to minimize misclassification. The extreme points that the model uses to define the decision boundary are called support vectors.</p>
                <p>4. XGBoost: XGBoost is a decision-tree-based classifier implemented on a gradient boosting framework. It works on the principle that weak learners can be combined to produce stronger predictions. The ensemble is built sequentially.</p>
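                <p>As an illustration of the S-shaped curve mentioned for logistic regression, the following is a minimal sketch, not the implementation used in this study. It assumes two already-scaled input features; the names <monospace>weights</monospace> and <monospace>bias</monospace> and all numeric values are hypothetical and chosen only for demonstration.</p>
                <code language="python">
```python
import math


def sigmoid(z: float) -> float:
    """Logistic (S-shaped) function mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for a single sample."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical coefficients for two scaled features (illustrative only,
# not fitted to the heart disease dataset).
weights = [0.8, -1.2]
bias = 0.1

p = predict_proba([1.0, 0.5], weights, bias)
label = predict_label([1.0, 0.5], weights, bias)
print(f"probability={p:.3f}, label={label}")
```
</code>
                <p>The threshold of 0.5 is a common default; a lower threshold would trade precision for recall, which may be preferable in a screening setting.</p>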
            </sec>
        </sec>
        <sec id="sec12" sec-type="results">
            <title>Results</title>
            <p>In this work, performance metrics were evaluated for four machine learning classifiers: SVM, Na&#x00ef;ve Bayes, XGBoost, and logistic regression.</p>
            <p>The XGBoost classifier provided the best training and test scores, 0.91 and 0.89, along with 92% accuracy. The results are discussed below. 
                <xref ref-type="fig" rid="f4">Figures 4</xref> and 
                <xref ref-type="fig" rid="f5">5</xref> show the interface for taking input from users and predicting with machine learning.</p>
            <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                <label>Figure 4. </label>
                <caption>
                    <title>Interface for considering symptoms.</title>
                </caption>
                <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure4.gif"/>
            </fig>
            <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                <label>Figure 5. </label>
                <caption>
                    <title>Interface showing the prediction.</title>
                </caption>
                <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure5.gif"/>
            </fig>
            <p>
                <xref ref-type="fig" rid="f6">Figure 6</xref> shows the distribution of attribute values. 
                <xref ref-type="fig" rid="f7">Figure 7</xref> shows box plots that indicate the median values of the data.</p>
            <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                <label>Figure 6. </label>
                <caption>
                    <title>Distribution of attribute values.</title>
                </caption>
                <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure6.gif"/>
            </fig>
            <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                <label>Figure 7. </label>
                <caption>
                    <title>Box plots showing the second and third quartiles and the median value.</title>
                </caption>
                <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure7.gif"/>
            </fig>
            <p>Training and test scores were evaluated for each machine learning classifier; the results are shown in 
                <xref ref-type="fig" rid="f8">Figure 8</xref>. XGBoost achieved both the highest training score (91%) and the highest test score (89%).</p>
            <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                <label>Figure 8. </label>
                <caption>
                    <title>Training and test scores of machine learning classifiers.</title>
                </caption>
                <graphic id="gr8" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure8.gif"/>
            </fig>
            <p>
                <xref ref-type="fig" rid="f9">Figure 9</xref> shows the results for different evaluation metrics and 
                <xref ref-type="table" rid="T4">Table 4</xref> provides the evaluated values for different machine learning classifiers.</p>
            <fig fig-type="figure" id="f9" orientation="portrait" position="float">
                <label>Figure 9. </label>
                <caption>
                    <title>Evaluation measures for different classifiers.</title>
                </caption>
                <graphic id="gr9" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure9.gif"/>
            </fig>
            <table-wrap id="T4" orientation="portrait" position="float">
                <label>Table 4. </label>
                <caption>
                    <title>Evaluation results for the machine learning classifiers.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Classifier</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Accuracy</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">F1-Score</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Precision</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Recall</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Logistic Regression</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.85</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.83</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.85</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.82</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Na&#x00ef;ve Bayes</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.82</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.87</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.86</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.88</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">SVM</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.64</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.73</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.60</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.95</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">XGBoost</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.91</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
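            <p>For reference, the four measures reported in Table 4 can all be derived from the counts of true and false positives and negatives. The following is a minimal sketch under the assumption of binary labels with 1 as the positive class; the label lists are made up for illustration and are not the predictions obtained in this study.</p>
            <code language="python">
```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only (hypothetical, not taken from heart.csv).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```
</code>
            <p>Because F1 is the harmonic mean of precision and recall, it always lies between those two values, which gives a quick consistency check on any reported row.</p>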
            <p>On the basis of this evaluation, receiver operating characteristic (ROC) curves and the area under the curve were generated, as shown in 
                <xref ref-type="fig" rid="f10">Figure 10</xref> and 
                <xref ref-type="fig" rid="f11">Figure 11</xref>. 
                <xref ref-type="fig" rid="f10">Figure 10</xref> plots the True Positive Rate (TPR) against the False Positive Rate (FPR). 
                <xref ref-type="fig" rid="f11">Figure 11</xref> shows the area under the curve for all machine learning classifiers.</p>
            <fig fig-type="figure" id="f10" orientation="portrait" position="float">
                <label>Figure 10. </label>
                <caption>
                    <title>Receiver operating characteristic (ROC) for different classifiers.</title>
                </caption>
                <graphic id="gr10" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure10.gif"/>
            </fig>
            <fig fig-type="figure" id="f11" orientation="portrait" position="float">
                <label>Figure 11. </label>
                <caption>
                    <title>Area under the curve (AUC) for the performance of the classification model.</title>
                </caption>
                <graphic id="gr11" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/135913/5e535a33-a4f3-4274-9b96-ae5a6255249f_figure11.gif"/>
            </fig>
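            <p>To make the relation between the ROC curve and the area under it concrete, the sketch below sweeps the decision threshold over a set of classifier scores, records the resulting (FPR, TPR) points, and integrates them with the trapezoidal rule. It assumes binary labels, at least one sample of each class, and distinct scores; the labels and scores are hypothetical and do not come from this study.</p>
            <code language="python">
```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold one sample at a time.

    Assumes labels are 0/1, both classes are present, and scores are distinct.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = 0
    fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal integration of TPR over FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative values only (hypothetical classifier scores).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
auc_value = area_under_curve(points)
print(f"AUC = {auc_value:.3f}")
```
</code>
            <p>An AUC of 1.0 corresponds to a perfect ranking of positive samples above negative ones, while 0.5 corresponds to a ranking no better than chance.</p>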
            <p>In this work, the XGBoost algorithm achieved the highest accuracy. Area under the curve, precision, and recall were also evaluated to understand the performance of the algorithms.</p>
        </sec>
        <sec id="sec13" sec-type="discussion">
            <title>Discussion</title>
            <p>Some previous researchers proposed that ML classifiers can be deployed on small datasets, which is supported by this work. Additionally, the computation time was reduced, which matters once the model is deployed. The work also showed the need to normalize the dataset, and overfitting may occur while training the model. Only minimal accuracy was achieved when evaluating data based on the real-world problem. The data can be normalized with a range of methods, and the results can be compared. Further techniques for connecting ML models trained on heart disease data with specific multimedia, for the convenience of patients and clinicians, could be explored. Optimized results were achieved in the presented work, with XGBoost providing the best results: an accuracy of 92% and an area under the curve of 94%. Future work will focus on optimizing the performance of the algorithms with a hybrid approach for the prediction of heart disease.</p>
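            <p>Two common normalization methods that could be compared are min-max scaling and z-score standardization. The sketch below is only an illustration under assumed inputs: the <monospace>chol</monospace> values are hypothetical and are not taken from heart.csv, and the function names are chosen for this example.</p>
            <code language="python">
```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    low = min(values)
    span = max(values) - low
    # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
    return [(v - low) / span if span else 0.0 for v in values]


def z_score_scale(values):
    """Centre values on zero mean and scale to unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


# Hypothetical cholesterol readings in mg/dl (illustrative only).
chol = [200, 240, 180, 300]

chol_min_max = min_max_scale(chol)
chol_z = z_score_scale(chol)
print(chol_min_max)
print(chol_z)
```
</code>
            <p>Min-max scaling preserves the shape of the distribution but is sensitive to outliers, whereas z-score standardization is often preferred for margin-based models such as SVM.</p>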
        </sec>
        <sec id="sec14" sec-type="conclusion">
            <title>Conclusion</title>
            <p>A comparative evaluation of four machine learning algorithms for heart disease prediction was carried out in this study, with promising outcomes. The ML approaches performed well in this investigation. When data pre-processing was used, XGBoost performed best among the ML techniques on the 13 features in the dataset. XGBoost achieved the highest training and test scores, 91% and 89% respectively, along with 92% accuracy and an AUC score of 0.94.</p>
            <p>In the future, this research will be expanded by identifying and integrating new features from the total of 76 features of heart disease. It also intends to employ other classification methods, such as deep learning, to optimize the prediction. The goal is to study and merge more datasets in order to create a more relevant dataset that encompasses a broad range of population types. Feature selection can be used to identify more relevant features and produce more effective results for the prediction of heart disease.</p>
        </sec>
        <sec id="sec15">
            <title>Data availability</title>
            <sec id="sec16">
                <title>Underlying data</title>
                <p>Figshare: heart.csv. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.20236848.v1">https://doi.org/10.6084/m9.figshare.20236848.v1</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref27">27</xref>
                    </sup>
                </p>
                <p>The project contains the following underlying data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>heart.csv (underlying data contains 14 features).</p>
                        </list-item>
                    </list>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
        </sec>
        <sec id="sec17">
            <title>Software availability</title>
            <p>Software available from: 
                <ext-link ext-link-type="uri" xlink:href="https://ipython.org/notebook.html">https://ipython.org/notebook.html</ext-link>
            </p>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/nandalneha/heart_disease">https://github.com/nandalneha/heart_disease</ext-link>
            </p>
            <p>Archived source code at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.6934185">https://doi.org/10.5281/zenodo.6934185</ext-link>.</p>
            <p>License: GNU General Public License 3</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fatima</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pasha</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Survey of machine learning algorithms for disease diagnostic.</article-title>
                    <source>

                        <italic toggle="yes">J. Intell. Learn. Syst. Appl.</italic>
</source>
                    <year>2017</year>;<volume>09</volume>:<fpage>1</fpage>&#x2013;<lpage>16</lpage>.
                    <pub-id pub-id-type="doi">10.4236/jilsa.2017.91001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Singh</surname>
                            <given-names>RS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Saini</surname>
                            <given-names>BS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sunkaria</surname>
                            <given-names>RK</given-names>
                        </name>
</person-group>:
                    <article-title>Detection of coronary artery disease by reduced features and extreme learning machine.</article-title>
                    <source>

                        <italic toggle="yes">Med. Pharm. Rep.</italic>
</source>
                    <year>2018</year>;<volume>91</volume>(<issue>2</issue>):<fpage>166</fpage>&#x2013;<lpage>175</lpage>.
                    <pub-id pub-id-type="pmid">29785154</pub-id>
                    <pub-id pub-id-type="doi">10.15386/cjmed-882</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yaghouby</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ayatollahi</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soleimani</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>Classification of cardiac abnormalities using reduced features of heart rate variability signal.</article-title>
                    <source>

                        <italic toggle="yes">World Appl. Sci. J.</italic>
</source>
                    <year>2009</year>;<volume>6</volume>(<issue>11</issue>):<fpage>1547</fpage>&#x2013;<lpage>1554</lpage>.</mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Asl</surname>
                            <given-names>BM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Setarehdan</surname>
                            <given-names>SK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mohebbi</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal.</article-title>
                    <source>

                        <italic toggle="yes">Artif. Intell. Med.</italic>
</source>
                    <year>2008</year>;<volume>44</volume>(<issue>1</issue>):<fpage>51</fpage>&#x2013;<lpage>64</lpage>.
                    <pub-id pub-id-type="pmid">18585905</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.artmed.2008.04.007</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zou</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrating feature selection and feature extraction methods with deep learning to predict clinical outcome of breast cancer.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Access.</italic>
</source>
                    <year>2018</year>;<volume>6</volume>:<fpage>28936</fpage>&#x2013;<lpage>28944</lpage>.
                    <pub-id pub-id-type="doi">10.1109/access.2018.2837654</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jin</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Che</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Predicting the Risk of Heart Failure With EHR Sequential Data Modeling.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Access.</italic>
</source>
                    <year>2018</year>;<volume>6</volume>:<fpage>9256</fpage>&#x2013;<lpage>9261</lpage>.
                    <pub-id pub-id-type="doi">10.1109/ACCESS.2017.2789324</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alex</surname>
                            <given-names>MP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shaji</surname>
                            <given-names>SP</given-names>
                        </name>
</person-group>:
                    <article-title>Prediction and Diagnosis of Heart Disease Patients using Data Mining Technique.</article-title>
                    <source>

                        <italic toggle="yes">2019 International Conference on Communication and Signal Processing (ICCSP).</italic>
</source>
                    <year>2019</year>; pp.<fpage>0848</fpage>&#x2013;<lpage>0852</lpage>.
                    <pub-id pub-id-type="doi">10.1109/ICCSP.2019.8697977</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Guyon</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gunn</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nikravesh</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">Feature Extraction: Foundations and Applications.</italic>
</source>
                    <publisher-loc>Cham, Switzerland</publisher-loc>:
                    <publisher-name>Springer</publisher-name>;<year>2008</year>.</mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rajagopal</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ranganathan</surname>
                            <given-names>V</given-names>
                        </name>
</person-group>:
                    <article-title>Evaluation of effect of unsupervised dimensionality reduction techniques on automated arrhythmia classification.</article-title>
                    <source>

                        <italic toggle="yes">Biomed. Signal Process. Control.</italic>
</source>
                    <year>2017</year>;<volume>34</volume>:<fpage>1</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.bspc.2016.12.017</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zou</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrating feature selection and feature extraction methods with deep learning to predict clinical outcome of breast cancer.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Access.</italic>
</source>
                    <year>2018</year>;<volume>6</volume>:<fpage>28936</fpage>&#x2013;<lpage>28944</lpage>.
                    <pub-id pub-id-type="doi">10.1109/access.2018.2837654</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Negi</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kumar</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mishra</surname>
                            <given-names>VM</given-names>
                        </name>
</person-group>:
                    <article-title>Feature extraction and classification for EMG signals using linear discriminant analysis.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 2016 2nd International Conference on Advances in Computing, Communication, &amp; Automation (ICACCA) (Fall); September 2016; Bareilly, India. IEEE.</italic>
</source>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Avenda&#x00f1;o-Valencia</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Martinez-Tabares</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Acosta-Medina</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Godino-Llorente</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Castellanos-Dominguez</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>TFR-based feature extraction using PCA approaches for discrimination of heart murmurs.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society; Minneapolis, MN, USA. IEEE</italic>
</source>
                    <year>September 2009</year>; pp.<fpage>5665</fpage>&#x2013;<lpage>5668</lpage>.</mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kamencay</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hudec</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Benco</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zachariasova</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Feature extraction for object recognition using PCA-KNN with application to medical image analysis.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 2013 36th International Conference on Telecommunications and Signal Processing (TSP); Rome, Italy. IEEE</italic>
</source>
                    <year>July 2013</year>; pp.<fpage>830</fpage>&#x2013;<lpage>834</lpage>.</mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ratnasari</surname>
                            <given-names>NR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Susanto</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soesanti</surname>
                            <given-names>I</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Thoracic X-ray features extraction using thresholding-based ROI template and PCA-based features selection for lung TB classification purposes.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 2013 3rd International Conference on Instrumentation, Communications, Information Technology and Biomedical Engineering (ICICI-BME); Bandung, Indonesia. IEEE</italic>
</source>
                    <year>November 2013</year>; pp.<fpage>65</fpage>&#x2013;<lpage>69</lpage>.</mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Conti</surname>
                            <given-names>AA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Minelli</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gensini</surname>
                            <given-names>GF</given-names>
                        </name>
</person-group>:
                    <article-title>Global management of high risk patients: integrated primary cardiovascular prevention in diabetics.</article-title>
                    <source>

                        <italic toggle="yes">Int. Congr. Ser.</italic>
</source>
                    <year>2003</year>;<volume>207</volume>:<fpage>10</fpage>&#x2013;<lpage>20</lpage>.</mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Khaw</surname>
                            <given-names>K-T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wareham</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Luben</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Glycated haemoglobin, diabetes and mortality in men in Norfolk Cohort of European Prospective Investigation of Cancer and Nutrition (EPIC-Norfolk).</article-title>
                    <source>

                        <italic toggle="yes">BMJ.</italic>
</source>
                    <year>2001</year>;<volume>322</volume>:<fpage>15</fpage>&#x2013;<lpage>18</lpage>.
                    <pub-id pub-id-type="pmid">11141143</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmj.322.7277.15</pub-id>
                    <pub-id pub-id-type="pmcid">PMC26599</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yusuf</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Reddy</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ounpuu</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global Burden of Cardiovascular Diseases: Part II: Variations in cardiovascular disease by specific ethnic groups and geographic regions and prevention strategies.</article-title>
                    <source>

                        <italic toggle="yes">Circulation.</italic>
</source>
                    <year>2001</year>;<volume>104</volume>:<fpage>2855</fpage>&#x2013;<lpage>2864</lpage>.
                    <pub-id pub-id-type="doi">10.1161/hc4701.099488</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hong</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ralph</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Predictive value for the Chinese population of the Framingham CHD risk assessment tool compared with the Chinese Multi-provincial Cohort Study.</article-title>
                    <source>

                        <italic toggle="yes">JAMA.</italic>
</source>
                    <year>2004</year>;<volume>291</volume>:<fpage>2591</fpage>&#x2013;<lpage>2599</lpage>.
                    <pub-id pub-id-type="doi">10.1001/jama.291.21.2591</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tonkin</surname>
                            <given-names>AM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lim</surname>
                            <given-names>SS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schirmer</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Cardiovascular risk factors: when should we treat?</article-title>
                    <source>

                        <italic toggle="yes">Med. J. Aust.</italic>
</source>
                    <year>2003</year>;<volume>178</volume>:<fpage>101</fpage>&#x2013;<lpage>102</lpage>.
                    <pub-id pub-id-type="doi">10.5694/j.1326-5377.2003.tb05092.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brahmi</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hosseini Shirvani</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Prediction and Diagnosis of Heart Disease by Data Mining Techniques.</article-title>
                    <source>

                        <italic toggle="yes">J. Multidiscip. Eng. Sci. Technol.</italic>
</source>
                    <year>2015 February</year>;<volume>2</volume>(<issue>2</issue>):<fpage>164</fpage>&#x2013;<lpage>168</lpage>.</mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sultana</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Haider</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Heart Disease Prediction using WEKA tool and 10-Fold cross-validation.</italic>
</source>
                    <publisher-name>The Institute of Electrical and Electronics Engineers</publisher-name>;<year>March 2017</year>.</mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Beyene</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kamat</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Survey on Prediction and Analysis the Occurrence of Heart Disease Using Data Mining Techniques.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Pure Appl. Math.</italic>
</source>
                    <year>2018</year>.</mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mooney</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pejaver</surname>
                            <given-names>V</given-names>
                        </name>
</person-group>:
                    <article-title>Big data in public health: Terminology, Machine Learning, and Privacy.</article-title>
                    <source>

                        <italic toggle="yes">Annu. Rev. Public Health.</italic>
</source>
                    <year>2018</year>;<volume>39</volume>:<fpage>95</fpage>&#x2013;<lpage>112</lpage>.
                    <pub-id pub-id-type="pmid">29261408</pub-id>
                    <pub-id pub-id-type="doi">10.1146/annurev-publhealth-040617-014208</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mohan</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thirumalai</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Srivastava</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Access.</italic>
</source>
                    <year>2019</year>;<volume>7</volume>:<fpage>81542</fpage>&#x2013;<lpage>81554</lpage>.
                    <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2923707</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Salhi</surname>
                            <given-names>DE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tari</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kechadi</surname>
                            <given-names>MT</given-names>
                        </name>
</person-group>:
                    <chapter-title>Using Machine Learning for Heart Disease Prediction.</chapter-title>
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Senouci</surname>
                            <given-names>MR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Boudaren</surname>
                            <given-names>MEY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sebbak</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>, editors.
                    <source>

                        <italic toggle="yes">Advances in Computing Systems and Applications. CSA 2020. Lecture Notes in Networks and Systems.</italic>
</source>
                    <publisher-loc>Cham</publisher-loc>:
                    <publisher-name>Springer</publisher-name>; vol. 199.
                    <pub-id pub-id-type="doi">10.1007/978-3-030-69418-0_7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jindal</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Agrawal</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Khera</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">IOP Conf. Ser.: Mater. Sci. Eng.</italic>
</source>
                    <year>2021</year>;<volume>1022</volume>:<fpage>012072</fpage>.
                    <pub-id pub-id-type="doi">10.1088/1757-899X/1022/1/012072</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nandal</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>heart.csv. Figshare. Dataset.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.6084/m9.figshare.20236848.v1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Janosi</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Steinbrunn</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pfisterer</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Heart Disease. UCI Machine Learning Repository.</article-title>
                    <year>1988</year>.</mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Neha</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>nandalneha/heart_disease: (heart.csv). Zenodo. Software.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.6934185</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report204980">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.135913.r204980</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Yahaya</surname>
                        <given-names>Lamido</given-names>
                    </name>
                    <xref ref-type="aff" rid="r204980a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6234-6953</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Gambo Lawan</surname>
                        <given-names>Farouk</given-names>
                    </name>
                    <xref ref-type="aff" rid="r204980a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r204980a1">
                    <label>1</label>Department of Cyber Security, Federal University Dutse, Dutse, Jigawa, Nigeria</aff>
                <aff id="r204980a2">
                    <label>2</label>Department of Computer Science, Gombe State University, Gombe, Nigeria</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>21</day>
                <month>9</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Gambo Lawan F and Yahaya L</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport204980" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.123776.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>INTRODUCTION</bold>
            </p>
            <p>The authors have introduced the work very well.</p>
            <p> 
                <bold>LITERATURE</bold>
            </p>
            <p>The authors have done well in correlating the literature with their work. However, it would be helpful if they addressed the following:
                <list list-type="order">
                    <list-item>
                        <p>Heart disease is a general term for a range of cardiac-related complications, such as heart attack (the topic of the article), stroke, coronary artery disease, valvular heart disease, and cardiomyopathy. In this regard, I think the title of the article is more specific than the literature reviewed, even though the two should reflect each other.</p>
                    </list-item>
                    <list-item>
                        <p>The authors mention reviewing studies from the past 21 years that used the latest machine learning algorithms. I think explaining how literature up to 21 years old relates to the latest machine learning algorithms would help other researchers understand the work better.</p>
                    </list-item>
                    <list-item>
                        <p>Considering the period covered by the literature survey (21 years), presenting more specific literature would give other researchers a clearer sense of the study's direction.</p>
                    </list-item>
                </list>
            </p>
            <p>
                <bold>METHODOLOGY</bold>
            </p>
            <p>In their effort to optimize the prediction, the authors selected 10 heart disease features on which the XGBoost classifier outperformed the other classifiers. For the benefit of other researchers, the authors should provide details of the optimization technique(s) used in the work.</p>
            <p> 
                <bold>RESULTS</bold>
            </p>
            <p>The authors have presented a very good result, with no overfitting or underfitting, as reflected in the close agreement between the training and testing results. However, other researchers may want to know the ratio of the training set to the testing set used to obtain this result.</p>
            <p> 
                <bold>CONCLUSION</bold>
            </p>
            <p>The authors have given a brief and precise conclusion. However, readers may need clarification on why 13 features are mentioned here, whereas 10 features were used in training and evaluating the selected algorithms.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>I cannot comment. A qualified statistician is required.</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Application of machine learning techniques for disease prediction, especially cardiovascular diseases.</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
