<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.73009.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>An intelligent decision support system for crop yield prediction using hybrid machine learning algorithms</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Anbananthen</surname>
                        <given-names>Kalaiarasi Sonai Muthu</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0540-2872</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Subbiah</surname>
                        <given-names>Sridevi</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6173-8189</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Chelliah</surname>
                        <given-names>Deisy</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6140-1682</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sivakumar</surname>
                        <given-names>Prithika</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Somasundaram</surname>
                        <given-names>Varsha</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Velshankar</surname>
                        <given-names>Kethaarini Harshana</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Khan</surname>
                        <given-names>M.K.A.Ahamed</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Faculty of Information Science Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia</aff>
                <aff id="a2">
                    <label>2</label>Department of Information Technology, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India</aff>
                <aff id="a3">
                    <label>3</label>Faculty of Engineering, UCSI University, Kuala Lumpur, 56000, Malaysia</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:kalaiarasi@mmu.edu.my">kalaiarasi@mmu.edu.my</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>11</day>
                <month>11</month>
                <year>2021</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>1143</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>21</day>
                    <month>10</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Anbananthen KSM et al.</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-1143/pdf"/>
            <abstract>
                <p>
                    <bold>Background</bold>: Digitization is gaining importance in domains such as agriculture, medicine, recommendation platforms, the Internet of Things (IoT), and weather forecasting. In agriculture, crop yield estimation is essential for improving productivity, for decision-making processes such as financial market forecasting, and for addressing food security issues. The main objective of this article is to improve the accuracy of crop yield forecasting using hybrid machine learning (ML) algorithms.</p>
                <p>
                    <bold>Methods:</bold> This article proposes hybrid ML algorithms that use specialized ensembling methods such as stacked generalization, gradient boosting, random forest, and least absolute shrinkage and selection operator (LASSO) regression. Stacked generalization trains a new model that learns how best to combine the predictions from two or more models trained on the dataset. To demonstrate the application of the proposed algorithm, the Aerial Intelligence datasets from the GitHub data science repository are used.</p>
                <p>
                    <bold>Results:</bold> The performance of the individual algorithms and the hybrid ML algorithms was compared using cross-validation to identify the most promising performers for the agricultural dataset. The accuracies of the random forest regressor, gradient boosted tree regression, and stacked generalization ensemble methods are 87.71%, 86.98%, and 88.89%, respectively.</p>
                <p>
                    <bold>Conclusions:</bold> The proposed stacked generalization ML algorithm outperforms the individual models with an accuracy of 88.89%, which demonstrates that the proposed approach is effective for predicting crop yield. The system also gives fast and accurate responses to farmers.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Machine Learning</kwd>
                <kwd>Prediction</kwd>
                <kwd>Crop</kwd>
                <kwd>Stacked Generalization</kwd>
                <kwd>Random Forest</kwd>
                <kwd>Regression</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>The tremendous increase in population and unpredictable climatic changes have posed a great challenge to the agricultural sector in terms of food availability, productivity, and sustainability. Although farmers are skilled in the cultivation of crops, there is a huge gap between scientific and technological knowledge and its availability in rural areas. One of the key challenges for a country's food security is climate change and its effects in the form of extreme weather events. The increase in temperature of 1-2.5 degrees Celsius forecast for 2030 is likely to have serious effects on crop yields (
                <xref ref-type="bibr" rid="ref3">Bhanumathi 
                    <italic toggle="yes">et al.,</italic> 2019</xref>) as it alters photosynthesis, increases the respiration rate of plants, and affects pest populations.</p>
            <p>Two of the goals proposed to be achieved by 2030 are &#x201c;no hunger&#x201d; and &#x201c;promoting sustainable agriculture&#x201d; (
                <xref ref-type="bibr" rid="ref18">Holzapfel and Br&#x00fc;ntrup, 2017</xref>). Sustainable agriculture helps to empower small farmers, end poverty, improve the financial growth of the country, and promote gender equality. The present scenario is alarming. To ensure universal and sustainable access to nutritious food, countries will need to sustain continuous food production and agricultural practices (
                <xref ref-type="bibr" rid="ref16">Ramesh and Vardhan, 2015</xref>).</p>
            <p>Timely and economical agricultural monitoring is essential to attain these goals. In this context, crop yield estimation is crucial for decision-making processes such as crop insurance, financial market forecasting, and addressing food security problems (
                <xref ref-type="bibr" rid="ref5">Patil and Shirdhonkar, 2017</xref>). Given the rapid improvement in technology, the objective of the present study is to use machine learning algorithms (
                <xref ref-type="bibr" rid="ref11">Medar 
                    <italic toggle="yes">et al.,</italic> 2019</xref>) and control systems to improve the process and enhance the productivity (
                <xref ref-type="bibr" rid="ref17">Sriram 
                    <italic toggle="yes">et al.,</italic> 2019</xref>) of crops (
                <xref ref-type="bibr" rid="ref15">Zingade 
                    <italic toggle="yes">et al.,</italic> 2017</xref>).</p>
            <p>Previously, machine learning (ML) algorithms such as linear regression and multiple linear regression were used to make crop yield predictions (
                <xref ref-type="bibr" rid="ref10">Manjula and Djodiltachoumy, 2017</xref>). This article proposes improved ML algorithms that use specialized ensemble methods such as stacked generalization, gradient boosting, random forest, and least absolute shrinkage and selection operator (LASSO) regression. Our goal is to develop a web application that gives farmers and other users an estimate of the crop yield to expect for a given input, and to find the relationship between yield (the dependent variable) and the other, independent variables.</p>
            <p>The remaining sections of the article present the literature survey, proposed method, results, discussion, conclusion, and recommendations for future work.</p>
        </sec>
        <sec id="sec2">
            <title>Literature review</title>
            <p>A convolutional neural network - recurrent neural network (CNN-RNN) framework for crop yield prediction was introduced by 
                <xref ref-type="bibr" rid="ref9">Saeed Khaki 
                    <italic toggle="yes">et al.,</italic> (2020)</xref>. In that article, other models such as random forest (RF), deep fully connected neural networks (DFNN), and LASSO were compared with CNN-RNN in predicting corn and soybean yield. The forecasting was done throughout the Corn Belt in the United States for the years 2016, 2017, and 2018. The results were based on three categories of attributes (soil, weather, and management), and the accuracy for corn and soybean was 87.82% and 87.09%, respectively.</p>
            <p>To predict the crop yield, a random-forest classifier was used by 
                <xref ref-type="bibr" rid="ref8">Hajir Almahdi (2020)</xref> and 
                <xref ref-type="bibr" rid="ref21">Ramesh (2020)</xref>. In their article, a graphical web-based interface was designed so that a farmer can know the yield of crops before cultivation. The dataset contains details about the crop production of Maharashtra, where the study was conducted.</p>
            <p>A backpropagation artificial neural network model was proposed by 
                <xref ref-type="bibr" rid="ref12">Meena and Singh (2013)</xref> for forecasting the crop yield. Unlike the fuzzy models, physical factors were used for the yield forecasts. In the comparison, the average forecasting error rate (AFER) was reduced from 11.40% to 3.82%.</p>
            <p>An empirical analysis for crop yield forecasting was done by 
                <xref ref-type="bibr" rid="ref6">Dharmaraja 
                    <italic toggle="yes">et al.</italic> (2020)</xref>, who focused on forecasting the yield of &#x2018;bajra&#x2019; (pearl millet) by implementing appropriate statistical models such as regression and time-series models. Models such as the auto-regressive integrated moving average (ARIMA) and an ARIMA model with an exogenous variable (ARIMAX) were also used for prediction. The ARIMAX model produced the best outcome for &#x2018;bajra&#x2019; compared to the regression and time-series models.</p>
            <p>A crop yield prediction using ML was proposed by 
                <xref ref-type="bibr" rid="ref14">Nishant 
                    <italic toggle="yes">et al.</italic> (2020)</xref>. They used stacked regression for crop yield prediction, based on an additional factor of soil nutrients. The elastic net (ENet), LASSO, and kernel ridge algorithms had minimal errors of 4%, 2%, and 1%, respectively. A web page was used as an interface to display the predicted result.</p>
            <p>Mobile-based applications such as Uzhavan (
                <ext-link ext-link-type="uri" xlink:href="https://apps.apple.com/in/app/uzhavan/id1405906962">https://apps.apple.com/in/app/uzhavan/id1405906962</ext-link>), Kisan (
                <ext-link ext-link-type="uri" xlink:href="https://apps.apple.com/in/app/kisan/id1297223018">https://apps.apple.com/in/app/kisan/id1297223018</ext-link>), and the agri app (
                <ext-link ext-link-type="uri" xlink:href="https://play.google.com/store/apps/details?id=com.criyagen&amp;hl=en">https://play.google.com/store/apps/details?id=com.criyagen&amp;hl=en</ext-link>) give farmers information about scheme components, subsidy patterns, seeds, and fertilizers. From the above literature, it is observed that the integration of an ML algorithm with a web or mobile application is missing. To address this issue, this article proposes a web page interface through which crop yield can be predicted by applying the stacked generalization and random forest algorithms.</p>
        </sec>
        <sec id="sec3" sec-type="methods">
            <title>Methods</title>
            <p>Selecting appropriate data is an important part of any machine learning or statistical analysis. In the proposed system, Aerialintel datasets from the GitHub data science repository were used to forecast crop yields (
                <xref ref-type="bibr" rid="ref19">Aerial Intelligence, 2017</xref>). Many researchers including 
                <xref ref-type="bibr" rid="ref17">Sriram Rakshith 
                    <italic toggle="yes">et al.</italic> (2019)</xref> and 
                <xref ref-type="bibr" rid="ref20">Jameshan (2017)</xref> have used this dataset and derived useful insights from it. It consists of two years&#x2019; winter wheat data for several counties in the United States of America, for the years 2013 and 2014, holding 26 attributes and over three hundred thousand records in total. The attributes mainly cover crop and climate data, as outlined below.</p>
            <p>The climatic parameters include precipitation, temperature, cloud cover, vapor pressure, and wet day frequency. The data in these files are geolocated to specific latitude-longitude coordinates and counties. The framework of the proposed work for this study using these datasets is shown in 
                <xref ref-type="fig" rid="f1">Figure 1</xref>. The framework contains the following modules: data preprocessing, feature extraction, and a decision support system (DSS). The DSS module includes prediction and performance evaluation.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>The framework for the proposed method.</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure1.gif"/>
            </fig>
            <p>Predictions are made by the stacked regressor, and performance is evaluated by checking the accuracy of the DSS. Performance evaluation is explained in detail in the discussion section.</p>
            <sec id="sec4">
                <title>Data preprocessing and feature extraction</title>
                <p>In this phase, the collected dataset was explored, and data preprocessing techniques such as imputation of missing values using the Haversine distance were applied. The details of the original dataset are shown in 
                    <xref ref-type="table" rid="T1">Table 1</xref>. The attributes precipIntensity, pressure, and visibility contain 1, 254, and 30 missing values, respectively. Since the data were collected from different states in the United States, a global average cannot be used for imputing missing values. Therefore, each null value was replaced with the value recorded on the same day at the closest neighboring location, identified by calculating the Haversine distance between the two points. Basic statistical features such as the mean, variance, and quartile values were computed for all the attributes. From these, it was found that the attribute &#x201c;precipTypeIsOther&#x201d; can be dropped because it holds no predictive power: all of its statistical values are around zero. Identifying pairwise correlations between features helps to remove redundant features from the models, as adding highly correlated features dilutes the model's predictive potential. Correlation coefficients were estimated for all possible pairs of attributes. Based on the correlation matrix, attributes such as apparentTemperatureMin, apparentTemperatureMax, and precipIntensityMax were removed because they are highly correlated with temperatureMax, temperatureMin, and precipAccumulation. The correlation between the attributes is given in the results section. After the removal of highly correlated attributes, the dataset contains the following attributes: latitude, longitude, precipAccumulation, temperatureMax, temperatureMin, NDVI, windSpeed, county, state, and date. The attributes state and date were removed because their inclusion would result in overfitting and a lack of generalization (
                    <xref ref-type="bibr" rid="ref7">Gandhi 
                        <italic toggle="yes">et al.,</italic> 2016</xref>; 
                    <xref ref-type="bibr" rid="ref13">Mythra 
                        <italic toggle="yes">et al.,</italic> 2018</xref>). Features such as length of day and elevation play an important role in crop yield prediction (
                    <xref ref-type="bibr" rid="ref14">Nishant 
                        <italic toggle="yes">et al.,</italic> 2020</xref>). These are derived features that are not available in the original dataset. Hence, length of day and elevation were added in order to account for the amount of sunlight available to the plants at different locations. This was done using the astral package in Python, version 3.8.8 (
                    <ext-link ext-link-type="uri" xlink:href="https://www.python.org/downloads/release/python-388/">https://www.python.org/downloads/release/python-388/</ext-link>). After data preprocessing and feature extraction, the dataset contains 12 features, including the derived attributes: longitude, latitude, elevation, length_of_day, total_precipitation, minitemp, maxitemp, ndvi, windspeed, meantemp, stdtemp, and yield. The results of the data preprocessing and feature extraction are shown in 
                    <xref ref-type="fig" rid="f4">Figures 4</xref>&#x2013;
                    <xref ref-type="fig" rid="f6">6</xref>. The original dataset, which contains two years of winter wheat data for several counties in the United States of America for 2013 and 2014, has been uploaded to GitHub together with the Python code for data preprocessing techniques such as correlation estimation and the scatter matrix (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/HangulAlien/intelligent-decision-support-system">https://github.com/HangulAlien/intelligent-decision-support-system</ext-link>) (
                    <xref ref-type="bibr" rid="ref23">HangulAlien, 2021</xref>).</p>
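                <p>The nearest-neighbour imputation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (dictionaries with date, latitude, longitude, and pressure keys), the helper names, the Earth radius constant, and the sample values are all assumptions made for illustration.</p>
                <code language="python">
```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def impute_from_nearest(records, field):
    """Fill missing values of `field` from the closest same-day record."""
    # Only originally observed values may act as donors.
    observed = [r for r in records if r[field] is not None]
    for rec in records:
        if rec[field] is not None:
            continue
        donors = [r for r in observed if r["date"] == rec["date"]]
        if not donors:
            continue  # no same-day neighbour: leave the gap as it is
        nearest = min(donors, key=lambda r: haversine_km(
            rec["latitude"], rec["longitude"],
            r["latitude"], r["longitude"]))
        rec[field] = nearest[field]
    return records


# Made-up sample records (hypothetical values, not taken from the dataset).
records = [
    {"date": "2014-03-01", "latitude": 46.8, "longitude": -118.7, "pressure": 1015.0},
    {"date": "2014-03-01", "latitude": 35.2, "longitude": -101.8, "pressure": 1008.0},
    {"date": "2014-03-01", "latitude": 46.9, "longitude": -118.5, "pressure": None},
    {"date": "2014-03-02", "latitude": 46.9, "longitude": -118.5, "pressure": 1020.0},
]
impute_from_nearest(records, "pressure")
```
</code>
                <p>In this sketch the third record takes its pressure from the first record, which is the closest location observed on the same day; the record from the following day is never considered as a donor.</p>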
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Original attributes along with the data type.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Attributes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Data type</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Attributes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Data type</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">State</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Char</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">precipAccumulation</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Latitude</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">precipTypeIsRain</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Int</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Longitude</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">precipTypeIsSnow</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Int</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Date</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Char</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">precipTypeIsOther</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Char</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">apparentTemperatureMax</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">pressure</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">apparentTemperatureMin</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">temperatureMax</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">cloudCover</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">temperatureMin</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">dewPoint</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">visibility</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">humidity</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">windBearing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Int</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">precipIntensity</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">windSpeed</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">precipIntensityMax</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">NDVI</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">precipProbability</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">DayInSeason</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Int</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">county name</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Char</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">yield</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Float</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
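                <p>The correlation-based feature removal described in this section can be sketched as follows. This is an illustrative outline only: the threshold of 0.9, the function names, and the sample values are assumptions, since the article does not state the cut-off that was used.</p>
                <code language="python">
```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if its |r| with every kept column stays within threshold."""
    kept = []
    for name in columns:
        if all(threshold >= abs(pearson(columns[name], columns[k])) for k in kept):
            kept.append(name)
    return kept


# Made-up sample columns (hypothetical values, not taken from the dataset).
columns = {
    "temperatureMax": [10.0, 15.0, 20.0, 25.0, 30.0],
    "apparentTemperatureMax": [9.0, 14.5, 19.0, 25.5, 31.0],
    "windSpeed": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(columns)
```
</code>
                <p>The result depends on column order: the first member of a highly correlated pair is kept and the later one is dropped, so here temperatureMax survives and apparentTemperatureMax is removed.</p>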
            </sec>
            <sec id="sec5">
                <title>Data partitioning</title>
                <p>Based on 
                    <xref ref-type="bibr" rid="ref8">Hajir Almahdi (2020)</xref> and 
                    <xref ref-type="bibr" rid="ref6">Dharmaraja 
                        <italic toggle="yes">et al.</italic> (2020)</xref>, the whole data set is divided into two parts: that is, 70% of the data set is used for training the model and 30% of the data is reserved for testing the model. In the 2013 wheat dataset, around 124,000 records were considered for training purpose and 53,000 records (containing the period from March to May 2014) were considered for testing purpose. In the 2014 wheat dataset, around 127,000 records were considered for training purpose and 54,000 records (containing the period from March to May 2015) were considered for testing purpose. While developing the machine learning model, both the datasets i.e., 2013 and 2014 datasets are combined.</p>
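                <p>A 70/30 partition of this kind can be sketched as below. The sketch assumes a seeded random shuffle; the article does not say whether the records were split randomly or by date, and the function name and seed are hypothetical.</p>
                <code language="python">
```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in data: integer ids in place of real wheat records.
records = list(range(1000))
train, test = train_test_split(records)
```
</code>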
                <p>A simple correlation study of the final feature set shows that there is no strong linear correlation between the input features and the target output, although some of the input features are linearly correlated with each other. This led to the conclusion that linear models such as linear regression would not be the best choice for this dataset and problem. Hence, it was decided to evaluate several algorithms: random forest (RF), stacked generalization, gradient boosted tree (GBT) regression, and LASSO regression (
                    <xref ref-type="bibr" rid="ref2">Bhanu Kiran 
                        <italic toggle="yes">et al.,</italic> 2020</xref>). The efficiency of the model is tested using k-fold cross-validation (
                    <xref ref-type="bibr" rid="ref1">Shah 
                        <italic toggle="yes">et al.,</italic> 2018</xref>; 
                    <xref ref-type="bibr" rid="ref4">Champaneri 
                        <italic toggle="yes">et al.,</italic> 2020</xref>).</p>
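                <p>The partitioning and validation scheme described above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' code: the function names, the random shuffle, the seed, and the choice of five folds are assumptions made here. Since the test records in the study cover a specific period (March to May), a chronological split may be closer to what was actually done.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_records, k=5):
    """Yield (train_idx, val_idx) pairs; each index is validated exactly once."""
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_records))
        yield train_idx, val_idx
        start += size


records = list(range(100))  # hypothetical stand-in for the real records
train, test = train_test_split(records)        # 70% / 30%
folds = list(k_fold_indices(len(train), k=5))  # cross-validation on the training part
```
]]></code>
                <p>Cross-validation is applied to the training part only, so the 30% test set stays untouched until the final comparison.</p>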
            </sec>
            <sec id="sec6">
                <title>Algorithms</title>
                <p>In the proposed framework, the preprocessed dataset (containing 12 attributes) and the training and testing periods are the same for all the algorithms.</p>
                <p>
                    <bold>Random forest (RF) regression:</bold> The RF algorithm is a supervised learning model composed of multiple decision trees. It builds several decision trees and combines their outputs; for regression, the prediction is the mean of the predictions of the individual trees. Decision tree algorithms include traditional algorithms such as Iterative Dichotomiser 3 (ID3), C4.5 (the successor of ID3), and classification and regression tree (CART). The performance of the algorithm can be measured by the mean squared error (MSE).
                    <disp-formula id="e1">
                        <mml:math display="block">
                            <mml:mtext>MSE</mml:mtext>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mn>1</mml:mn>
                                <mml:mi>N</mml:mi>
                            </mml:mfrac>
                            <mml:munderover>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mrow>
                                    <mml:mi>i</mml:mi>
                                    <mml:mo>=</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mi>N</mml:mi>
                            </mml:munderover>
                            <mml:msup>
                                <mml:mfenced close=")" open="(">
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>f</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                        <mml:mo>&#x2212;</mml:mo>
                                        <mml:msub>
                                            <mml:mi>y</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                    </mml:mrow>
                                </mml:mfenced>
                                <mml:mn>2</mml:mn>
                            </mml:msup>
                        </mml:math>
                        <label>(1)</label>
                    </disp-formula>
                </p>
                <p>where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>N</mml:mi>
                        </mml:math>
                    </inline-formula> is the number of records, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>f</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is the value returned by the model, and 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>y</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is the actual value for the given data point.</p>
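                <p>As a small worked example of equation (1), the sketch below averages the outputs of two trees and scores the result with the MSE. The function names and the numbers are made up for illustration and do not come from the study.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def forest_predict(tree_predictions):
    """Average the per-record predictions of the individual trees."""
    n_trees = len(tree_predictions)
    return [sum(column) / n_trees for column in zip(*tree_predictions)]


def mean_squared_error(predicted, actual):
    """Equation (1): mean of the squared differences f_i - y_i."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((f - y) ** 2 for f, y in zip(predicted, actual)) / len(actual)


# Hypothetical outputs of two trees for four records.
tree_a = [3.0, 0.0, 2.0, 9.0]
tree_b = [2.0, 0.0, 2.0, 7.0]
actual = [3.0, -0.5, 2.0, 7.0]

predicted = forest_predict([tree_a, tree_b])   # [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error(predicted, actual)    # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
```
]]></code>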
                <p>
                    <bold>LASSO Regression:</bold> LASSO regression is a form of linear regression that uses shrinkage. It performs both variable selection and regularization in order to enhance accuracy. The LASSO model encourages simple, sparse models.</p>
                <p>This form of regression is well suited to models with high degrees of multicollinearity, or to cases where certain parts of model selection, such as variable selection or parameter elimination, need to be automated.
                    <disp-formula id="e2">
                        <mml:math display="block">
                            <mml:msub>
                                <mml:mi>L</mml:mi>
                                <mml:mtext mathvariant="italic">lasso</mml:mtext>
                            </mml:msub>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>&#x03b2;</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:munderover>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mrow>
                                    <mml:mi>i</mml:mi>
                                    <mml:mo>=</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mi>n</mml:mi>
                            </mml:munderover>
                            <mml:msup>
                                <mml:mfenced close=")" open="(">
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>y</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                        <mml:mo>&#x2212;</mml:mo>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>j</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>p</mml:mi>
                                        </mml:munderover>
                                        <mml:msub>
                                            <mml:mi>x</mml:mi>
                                            <mml:mi mathvariant="italic">ij</mml:mi>
                                        </mml:msub>
                                        <mml:msub>
                                            <mml:mi>&#x03b2;</mml:mi>
                                            <mml:mi>j</mml:mi>
                                        </mml:msub>
                                    </mml:mrow>
                                </mml:mfenced>
                                <mml:mn>2</mml:mn>
                            </mml:msup>
                            <mml:mo>+</mml:mo>
                            <mml:mi>&#x03bb;</mml:mi>
                            <mml:munderover>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mrow>
                                    <mml:mi>j</mml:mi>
                                    <mml:mo>=</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mi>p</mml:mi>
                            </mml:munderover>
                            <mml:mfenced close="|" open="|">
                                <mml:msub>
                                    <mml:mi>&#x03b2;</mml:mi>
                                    <mml:mi>j</mml:mi>
                                </mml:msub>
                            </mml:mfenced>
                        </mml:math>
                        <label>(2)</label>
                    </disp-formula>where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>y</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is the outcome, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mi mathvariant="italic">ij</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is the covariate, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>&#x03bb;</mml:mi>
                        </mml:math>
                    </inline-formula> is the amount of shrinkage and 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>&#x03b2;</mml:mi>
                        </mml:math>
                    </inline-formula> is the regression coefficient.</p>
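                <p>To make the effect of the shrinkage term concrete, the sketch below evaluates the objective of equation (2) and, for the special case of a single covariate, computes its exact minimizer by soft-thresholding. The helper names and the data are hypothetical; a real analysis with several covariates would normally rely on a library solver.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def lasso_loss(X, y, beta, lam):
    """Equation (2): residual sum of squares plus lam times the L1 norm of beta."""
    residual_sq = sum(
        (y_i - sum(x_ij * b_j for x_ij, b_j in zip(row, beta))) ** 2
        for row, y_i in zip(X, y)
    )
    return residual_sq + lam * sum(abs(b_j) for b_j in beta)


def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly zero when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_single_feature(x, y, lam):
    """Exact minimizer of equation (2) when there is only one covariate."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, lam / 2.0) / sxx


x = [1.0, 2.0, 3.0]          # made-up covariate
y = [2.0, 4.0, 6.0]          # made-up outcome, exactly 2 * x
beta_ols = lasso_single_feature(x, y, lam=0.0)       # no shrinkage: 28 / 14
beta_shrunk = lasso_single_feature(x, y, lam=14.0)   # (28 - 7) / 14
beta_zero = lasso_single_feature(x, y, lam=100.0)    # coefficient eliminated
```
]]></code>
                <p>With no penalty the coefficient equals the least-squares value; increasing the amount of shrinkage first reduces it and eventually sets it to zero, which is how LASSO performs variable selection.</p>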
                <p>
                    <bold>Gradient boosted tree (GBT) regression</bold>: The GBT regression model is one of the most successful machine learning models for predictive analysis. It builds the predictor in successive stages: at each iteration, a new decision tree is added to correct the errors of the current model, so that the loss decreases step by step. Gradient boosting involves three elements: a loss function to be optimized, a weak learner to make predictions, and an additive model that adds weak learners to minimize the loss function.
                    <disp-formula id="e3">
                        <mml:math display="block">
                            <mml:msub>
                                <mml:mi>F</mml:mi>
                                <mml:mi>m</mml:mi>
                            </mml:msub>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>x</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:msub>
                                <mml:mi>F</mml:mi>
                                <mml:mrow>
                                    <mml:mi>m</mml:mi>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>x</mml:mi>
                            </mml:mfenced>
                            <mml:mo>+</mml:mo>
                            <mml:munderover>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mrow>
                                    <mml:mi>j</mml:mi>
                                    <mml:mo>=</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:msub>
                                    <mml:mi>J</mml:mi>
                                    <mml:mi>m</mml:mi>
                                </mml:msub>
                            </mml:munderover>
                            <mml:msub>
                                <mml:mi>&#x03b3;</mml:mi>
                                <mml:mi mathvariant="italic">jm</mml:mi>
                            </mml:msub>
                            <mml:msub>
                                <mml:mn>1</mml:mn>
                                <mml:msub>
                                    <mml:mi>R</mml:mi>
                                    <mml:mi mathvariant="italic">jm</mml:mi>
                                </mml:msub>
                            </mml:msub>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>x</mml:mi>
                            </mml:mfenced>
                            <mml:mo>,</mml:mo>
                            <mml:msub>
                                <mml:mi>&#x03b3;</mml:mi>
                                <mml:mi mathvariant="italic">jm</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mo mathvariant="italic">arg</mml:mo>
                            <mml:mspace width="0.25em"/>
                            <mml:msub>
                                <mml:mo mathvariant="italic">min</mml:mo>
                                <mml:mi>&#x03b3;</mml:mi>
                            </mml:msub>
                            <mml:munder>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>x</mml:mi>
                                        <mml:mi>i</mml:mi>
                                    </mml:msub>
                                    <mml:mo>&#x2208;</mml:mo>
                                    <mml:msub>
                                        <mml:mi>R</mml:mi>
                                        <mml:mi mathvariant="italic">jm</mml:mi>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:munder>
                            <mml:mi>L</mml:mi>
                            <mml:mfenced close=")" open="(" separators=",">
                                <mml:msub>
                                    <mml:mi>y</mml:mi>
                                    <mml:mi>i</mml:mi>
                                </mml:msub>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>F</mml:mi>
                                        <mml:mrow>
                                            <mml:mi>m</mml:mi>
                                            <mml:mo>&#x2212;</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                    </mml:msub>
                                    <mml:mfenced close=")" open="(">
                                        <mml:msub>
                                            <mml:mi>x</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                    </mml:mfenced>
                                    <mml:mo>+</mml:mo>
                                    <mml:mi>&#x03b3;</mml:mi>
                                </mml:mrow>
                            </mml:mfenced>
                        </mml:math>
                        <label>(3)</label>
                    </disp-formula>
                </p>
                <p>where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>J</mml:mi>
                                <mml:mi>m</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is the number of terminal nodes (leaves) of the tree fitted at stage m, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>R</mml:mi>
                                <mml:mi mathvariant="italic">jm</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is the region of the input space assigned to the j-th terminal node, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>&#x03b3;</mml:mi>
                                <mml:mi mathvariant="italic">jm</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is the optimal output value for that region, and 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>x</mml:mi>
                        </mml:math>
                    </inline-formula> is the input value (a training record when the model is being fitted).</p>
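                <p>The additive update of equation (3) can be illustrated with a deliberately small sketch. It assumes squared-error loss, for which the optimal value in each region is the mean residual, and uses two-leaf trees (stumps) on a single made-up feature as weak learners. All names and data are hypothetical, and refinements common in library implementations, such as a learning rate, are left out.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def fit_stump(x, residuals):
    """Fit a two-leaf regression tree on 1-D inputs.

    Returns (threshold, gamma_left, gamma_right). Each gamma is the mean
    residual in its leaf, which minimizes squared-error loss in that region.
    Assumes `x` contains at least two distinct values.
    """
    best = None
    values = sorted(set(x))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        gamma_left = sum(left) / len(left)
        gamma_right = sum(right) / len(right)
        sse = (sum((r - gamma_left) ** 2 for r in left)
               + sum((r - gamma_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, gamma_left, gamma_right)
    return best[1:]


def gradient_boost(x, y, n_stages=20):
    """Return a predictor F_M built by the additive update of equation (3)."""
    f0 = sum(y) / len(y)  # F_0: the best constant under squared error
    stumps = []
    current = [f0] * len(y)
    for _ in range(n_stages):
        residuals = [yi - fi for yi, fi in zip(y, current)]
        threshold, gamma_left, gamma_right = fit_stump(x, residuals)
        stumps.append((threshold, gamma_left, gamma_right))
        current = [fi + (gamma_left if xi <= threshold else gamma_right)
                   for xi, fi in zip(x, current)]

    def predict(xi):
        total = f0
        for threshold, gamma_left, gamma_right in stumps:
            total += gamma_left if xi <= threshold else gamma_right
        return total

    return predict


def mse(model, x, y):
    """Training mean squared error of a fitted predictor."""
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up feature values
y = [1.0, 1.0, 2.0, 5.0, 6.0, 6.0]   # made-up targets
mse_constant = mse(gradient_boost(x, y, n_stages=0), x, y)
mse_one_stage = mse(gradient_boost(x, y, n_stages=1), x, y)
mse_boosted = mse(gradient_boost(x, y, n_stages=20), x, y)
```
]]></code>
                <p>Each added stump lowers the training error, which is the stage-wise behaviour that equation (3) expresses.</p>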
                <p>
                    <bold>Stacked regression</bold>: Stacked regression is a method of combining multiple regressors to increase accuracy. The workflow of stacked regression is shown in 
                    <xref ref-type="fig" rid="f2">Figure 2</xref>. It uses a meta-algorithm to learn how best to combine the predictions of two or more base algorithms. Here, the stacking coefficients are determined by cross-validation and non-negative least squares. Stacking is found to be effective when compared with traditional ML algorithms and random forest. In the proposed work, random forest, LASSO regression, and GBT were used as the base algorithms in the stacked regression. In 
                    <xref ref-type="fig" rid="f2">Figure 2</xref>, R1, R2, &#x2026;, Rn represent the models generated after training. Based on the trained models and the testing data, the predictions P1, P2, &#x2026;, Pn are generated. The individual regression models are trained on the same training set; the meta-regressor is then fitted on the meta-features, i.e., the predictions of the individual regression models in the ensemble. In this way, the meta-regressor uses regression analysis to combine the outputs of the individual models and provide a better response.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Stacking regressor.</title>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure2.gif"/>
                </fig>
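                <p>The stacking procedure can be sketched as follows. This is a simplified illustration, not the authors' implementation: instead of random forest, LASSO, and GBT, it uses two deliberately simple base learners (a mean predictor and a one-feature least-squares line), and the non-negative least-squares step is written out only for the two-regressor case. All function names and the data are hypothetical.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Base learner 2: ordinary least-squares line on one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def out_of_fold(fit, xs, ys, k):
    """Predict every record with a model trained on the other k-1 folds."""
    predictions = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            predictions[i] = model(xs[i])
    return predictions


def nnls_two(p1, p2, ys):
    """Non-negative least squares for ys ~ w1*p1 + w2*p2 (two regressors only)."""
    a11 = sum(a * a for a in p1)
    a22 = sum(b * b for b in p2)
    a12 = sum(a * b for a, b in zip(p1, p2))
    b1 = sum(a * y for a, y in zip(p1, ys))
    b2 = sum(b * y for b, y in zip(p2, ys))
    candidates = [(0.0, 0.0)]                 # both weights at the boundary
    if a11 > 0:
        candidates.append((max(b1 / a11, 0.0), 0.0))
    if a22 > 0:
        candidates.append((0.0, max(b2 / a22, 0.0)))
    det = a11 * a22 - a12 * a12
    if abs(det) > 1e-12:                      # unconstrained solution, if feasible
        w1 = (b1 * a22 - b2 * a12) / det
        w2 = (a11 * b2 - a12 * b1) / det
        if w1 >= 0 and w2 >= 0:
            candidates.append((w1, w2))

    def sse(w):
        return sum((y - w[0] * a - w[1] * b) ** 2 for a, b, y in zip(p1, p2, ys))

    return min(candidates, key=sse)


def fit_stack(xs, ys, k=3):
    """Learn meta-weights on out-of-fold predictions, then refit base models."""
    weights = nnls_two(out_of_fold(fit_mean, xs, ys, k),
                       out_of_fold(fit_line, xs, ys, k), ys)
    mean_model, line_model = fit_mean(xs, ys), fit_line(xs, ys)

    def predict(x):
        return weights[0] * mean_model(x) + weights[1] * line_model(x)

    return predict, weights


xs = [float(i) for i in range(9)]
ys = [2.0 * x + 1.0 for x in xs]              # made-up, exactly linear data
stacked, (w_mean, w_line) = fit_stack(xs, ys)
```
]]></code>
                <p>Fitting the meta-weights on out-of-fold predictions, rather than on predictions for records the base models have already seen, keeps the meta-regressor from simply rewarding the base model that overfits most.</p>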
                <p>The machine learning solution for agricultural yield prediction was developed in a Python integrated development environment (IDE) using packages such as os, pickle, time, matplotlib, pandas, basemap, sklearn, numpy, and astral. The Python pickle module is used for serializing and de-serializing a Python object structure: the object is first serialized, i.e., converted into a byte stream, before it is written to a file. jsonify() is a helper function provided by Flask to return JSON data properly. render_template is used to produce the output from a template file based on the Jinja2 engine and is typically imported directly from the flask package. The astral package is used to calculate the times of various aspects of the sun and the phases of the moon.</p>
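                <p>A minimal round trip with the pickle module might look like the sketch below. The dictionary is a hypothetical stand-in for a trained model, and the file name is an assumption; any picklable Python object can be stored in the same way.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model object.
model = {"name": "stacked_regressor", "weights": [0.2, 0.3, 0.5]}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")   # assumed file name

    # Serialize: convert the object to a byte stream and write it to disk.
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)

    # De-serialize: read the byte stream back into an equivalent object.
    with open(model_path, "rb") as handle:
        restored = pickle.load(handle)
```
]]></code>
                <p>Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.</p>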
                <p>The web-based model was deployed using Flask. The purpose of the Flask framework here is to provide a graphical user interface for accessing information. In the proposed work, the best performing model, i.e., stacked generalization, is loaded in the Flask framework to cross-verify the performance or accuracy of the algorithm. When inputs are provided in the webpage, the stacked generalization model runs and provides the required output, i.e., the yield prediction. The following input features are entered in the webpage to obtain the yield prediction: longitude, latitude, elevation, length_of_day, total_precipitation, minitemp, maxitemp, ndvi, windspeed, meantemp, and stdtemp. That is, if users enter the location, wind speed, and temperature details, they obtain the yield prediction. Around 100,000 records, drawn from the combined 2013 and 2014 
                    <xref ref-type="bibr" rid="ref19">aerial intelligence (2017)</xref> datasets, are considered for testing. Novice users can access the webpage at any time from any location. The web interface is shown in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>. Creating the interactive page involves the following steps:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x27a2;</label>
                            <p>Install the Flask package for Python version 3.8.8 (
                                <ext-link ext-link-type="uri" xlink:href="https://www.python.org/downloads/release/python-388/">https://www.python.org/downloads/release/python-388/</ext-link>).</p>
                        </list-item>
                        <list-item>
                            <label>&#x27a2;</label>
                            <p>Create an HTML file to display the front-end design of the web page.</p>
                        </list-item>
                        <list-item>
                            <label>&#x27a2;</label>
                            <p>Create a Python file that does the following: define a new route &#x201c;/join&#x201d; that accepts the GET and POST methods; take the input from the web input box through request.form['name'], where name is the name of the input field; perform the computation in the function; and return the value to the web page in JSON format.</p>
                        </list-item>
                        <list-item>
                            <label>&#x27a2;</label>
                            <p>Create a route &#x201c;/&#x201d; whose function returns the HTML file. Then run the Python file and open the link that it prints after starting.</p>
                        </list-item>
                        <list-item>
                            <label>&#x27a2;</label>
                            <p>The webpage passes the input to Flask and displays the results returned by Flask.</p>
                        </list-item>
                    </list>
                </p>
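                <p>The steps above can be sketched as a small Flask application. This is an illustrative outline under stated assumptions, not the deployed code: the template name index.html, the JSON key, the helper names, and the requirement that the model expose a predict method are all hypothetical. Flask is imported inside create_app so that the input-parsing helper can be used and tested without it.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
# Input fields listed in the text, in the order passed to the model (order assumed).
FEATURES = [
    "longitude", "latitude", "elevation", "length_of_day",
    "total_precipitation", "minitemp", "maxitemp", "ndvi",
    "windspeed", "meantemp", "stdtemp",
]


def parse_features(form):
    """Convert the submitted form fields to floats in a fixed order."""
    return [float(form[name]) for name in FEATURES]


def create_app(model):
    """Build the Flask app; `model` is any object with a predict(rows) method."""
    from flask import Flask, jsonify, render_template, request

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Serve the front-end page (hypothetical template name).
        return render_template("index.html")

    @app.route("/join", methods=["GET", "POST"])
    def join():
        # Read the inputs from the web form and return the prediction as JSON.
        features = parse_features(request.form)
        prediction = model.predict([features])[0]
        return jsonify({"yield": float(prediction)})

    return app


# Example start-up, assuming `model` was loaded beforehand (e.g. with pickle):
# create_app(model).run()
```
]]></code>
                <p>Keeping the parsing logic in a plain function separates input validation from the web layer, so it can be checked independently of a running server.</p>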
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Web Interface to predict the crop yield.</title>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure3.gif"/>
                </fig>
            </sec>
        </sec>
        <sec id="sec7" sec-type="results">
            <title>Results</title>
            <p>
                <xref ref-type="fig" rid="f4">Figure 4</xref> shows the geographical distribution of the data for the years 2013 and 2014. The graph represents the crop yield in each region based on the collected dataset. Red denotes the maximum yield, which decreases towards blue. The graph also indicates the year of the yield prediction and the number of records in each region; the record counts are colored from blue to red, representing the lowest to the highest count. Since the dataset is large, both a scatter matrix and a correlation matrix are used to examine the correlation between pairs of variables, as shown in 
                <xref ref-type="fig" rid="f5">Figures 5</xref> and 
                <xref ref-type="fig" rid="f6">6</xref>.</p>
            <p>To illustrate the scatter matrix, four features are considered: temperaturemin, temperaturemax, apparenttemperaturemin, and apparenttemperaturemax. The resulting 4 &#x00d7; 4 scatter matrix is shown in 
                <xref ref-type="fig" rid="f5">Figure 5</xref>. The diagonal panels show the histograms of the four attributes, and the off-diagonal panels show the relationship between pairs of attributes. For example, the first row shows the relationship between apparenttemperaturemax and the remaining attributes (apparenttemperaturemin, temperaturemax, and temperaturemin). From the first row, it is observed that apparenttemperaturemax is positively correlated with the other three attributes, since the y value increases as the x value increases, with very few outliers. The second, third, and fourth rows can be read in the same way. From the matrix, it is observed that all four attributes are correlated with each other.</p>
            <p>The correlation between the 12 attributes is shown in 
                <xref ref-type="fig" rid="f6">Figure 6</xref>. In the correlation matrix, highly correlated features are denoted in red and less correlated features are denoted in blue. The diagonal represents the correlation of each attribute with itself. The first row of the correlation matrix shows how the attribute &#x201c;longitude&#x201d; is correlated with the other 11 attributes. From the first row of the figure, it is inferred that longitude is negatively correlated (blue in 
                <xref ref-type="fig" rid="f6">Figure 6</xref>) with the attributes latitude, ratiomndvi30, and elevation. Longitude has no correlation with the attributes total_precipitation and yield. It is positively correlated with the attributes minnat30, mean_wind_speed, std_temperature_diff, and mean_temperaturediff, and strongly correlated (red in 
                <xref ref-type="fig" rid="f6">Figure 6</xref>) with the attributes LOD and maxmat30.</p>
            <p>Four regression-based algorithms were used to predict the crop yield: random forest regression, gradient boosted tree regression, LASSO regression, and the stacked generalization ensemble method. The relative efficiencies of these four models were compared using cross-validation as outlined in the methods section. Performance was measured under varying hyper-parameter settings. In most cases, stacked generalization performed best, followed by random forest and gradient boosted tree regression. The overall comparison of the algorithms is shown in 
                <xref ref-type="table" rid="T2">Table 2</xref>.</p>
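            <p>For readers who wish to reproduce this kind of correlation analysis, the sketch below computes a Pearson correlation matrix with the standard library only. The column values are invented for illustration and the function names are hypothetical; the study itself presumably relied on the pandas and matplotlib packages listed in the methods.</p>
            <code language="python" xml:space="preserve"><![CDATA[
```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (spread_a * spread_b)


def correlation_matrix(columns):
    """Return a nested dict: matrix[row_name][column_name] -> correlation."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Invented values for three attributes (not taken from the dataset).
columns = {
    "temperaturemin": [1.0, 2.0, 3.0, 4.0],
    "temperaturemax": [3.0, 5.0, 7.0, 9.0],   # rises with temperaturemin
    "elevation": [4.0, 3.0, 2.0, 1.0],        # falls as temperaturemin rises
}
matrix = correlation_matrix(columns)
```
]]></code>
            <p>Values close to 1 correspond to the red cells and values close to -1 or 0 to the blue end of the color scale, depending on how the color map is defined.</p>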
            <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                <label>Figure 4. </label>
                <caption>
                    <title>Geographical distribution of data in United States in 2013 and 2014.</title>
                </caption>
                <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure4.gif"/>
            </fig>
            <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                <label>Figure 5. </label>
                <caption>
                    <title>Scatter matrix for the 12 sample features.</title>
                </caption>
                <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure5.gif"/>
            </fig>
            <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                <label>Figure 6. </label>
                <caption>
                    <title>Correlation matrix in which highly correlated features are denoted in red and less correlated features are denoted in blue.</title>
                </caption>
                <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure6.gif"/>
            </fig>
            <table-wrap id="T2" orientation="portrait" position="float">
                <label>Table 2. </label>
                <caption>
                    <title>Performance comparison of various machine learning algorithms.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Algorithm</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Accuracy</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Random forest regressor</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">87.71%</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Stacked generalization</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <italic toggle="yes">88.89%</italic>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Gradient boosted tree regression</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">86.98%</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">LASSO regression</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">42.00%</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <p>Optimizing the parameters identified the best combinations of hyper-parameters, with which the performance can be increased. The learning curves of the stacked regressor and random forest are shown in 
                <xref ref-type="fig" rid="f7">Figure 7</xref>. The proposed models were trained and tested, and the algorithms were compared on the basis of the results obtained on the testing set.</p>
            <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                <label>Figure 7. </label>
                <caption>
                    <title>Learning curve of stacked regressor vs random forest.</title>
                </caption>
                <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure7.gif"/>
            </fig>
            <p>The proposed methods, namely stacked generalization, gradient boosting, random forest, and LASSO regression, have been implemented using the same training dataset. Among these algorithms, random forest (87.71%) and stacked generalization (88.89%) yield slightly better accuracy than 
                <xref ref-type="bibr" rid="ref22">Kaur 
                    <italic toggle="yes">et al.</italic> (2020)</xref>, who implemented random forest, gradient boosted regression, nearest neighbor regression, and a support vector machine with a polynomial kernel, with accuracies of 87.5%, 80.11%, 78%, and 34%, respectively.</p>
        </sec>
        <sec id="sec8" sec-type="discussion">
            <title>Discussion</title>
            <p>The correlation matrix (
                <xref ref-type="fig" rid="f5">Figure 5</xref>) and the scatter matrix (
                <xref ref-type="fig" rid="f5">Figure 5</xref>) are used to find highly correlated features (
                <xref ref-type="fig" rid="f6">Figure 6</xref>). Attributes such as apparenttemperaturemin, apparenttemperaturemax, and precipintensitymax were removed because they are highly correlated with temperaturemax, temperaturemin, and precipAccumulation. Features such as day length and elevation were added because they play an important role in crop yield prediction (
                <xref ref-type="bibr" rid="ref14">Nishant 
                    <italic toggle="yes">et al.,</italic> 2020</xref>). After data preprocessing (imputation of missing values, attribute elimination, and addition of new attributes), the dataset contains 12 attributes. Random forest (RF), stacked generalization, gradient boosted tree (GBT) regression, and LASSO regression are used to predict crop yield. The performance of these algorithms is shown in 
                <xref ref-type="table" rid="T2">Table 2</xref>. Each model is evaluated separately, and the stacked regressor is then evaluated. Among these algorithms, the stacked regressor yields the best results, with a mean absolute percentage error of approximately 5%. Based on the experimental results outlined in the previous section, the following observations were made. The accuracies of the random forest regressor, gradient boosted tree regression, and stacked generalization ensemble are 87.71%, 86.98%, and 88.89%, respectively. The proposed stacked generalization algorithm therefore achieves the highest accuracy among the evaluated models, which indicates that the proposed approach is effective. In the learning curves (
                <xref ref-type="fig" rid="f7">Figure 7</xref>), the training score lies above the validation score, which indicates a good fit for the random forest and stacked generalization models. The learning curve of the stacked generalization model (
                <xref ref-type="fig" rid="f8">Figure 8</xref>) shows slight over-fitting, but its overall accuracy and variance compare favorably with those of the other models. The final model's R<sup>2</sup> value is approximately 0.85, with a root mean square error (RMSE) of 5.2. The accuracy of the proposed algorithm is also higher than that reported by 
                <xref ref-type="bibr" rid="ref22">Kaur <italic toggle="yes">et al.</italic> (2020)</xref>. In the earlier literature (
                <xref ref-type="bibr" rid="ref14">Nishant 
                    <italic toggle="yes">et al.,</italic> 2020</xref> and 
                <xref ref-type="bibr" rid="ref11">Medar 
                    <italic toggle="yes">et al.,</italic> 2019</xref>), yield prediction was performed by entering input parameters in a terminal rather than through a web interface. Many farmers are not familiar with using a terminal. The proposed framework addresses this issue by providing a web interface. 
                <xref ref-type="bibr" rid="ref22">Kaur 
                    <italic toggle="yes">et al.</italic> (2020)</xref> mainly focused on latitude, longitude, temperature, and humidity and did not consider derived attributes such as elevation and length_of_day. The proposed work includes these attributes and considers a total of 11 features for predicting crop yield: longitude, latitude, elevation, length_of_day, total_precipitation, minitemp, maxitemp, ndvi, windspeed, meantemp, and stdtemp. The testing dataset is used to check the performance of the web interface. The interactive web interface predicts crop yield from inputs supplied by the user, as shown in 
                <xref ref-type="fig" rid="f3">Figure 3</xref>. A limitation of the study is that it uses United States datasets covering crop yield for 2013 and 2014 only; more recent datasets should be considered to better understand and check the accuracy in real time.</p>
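            <p>The correlation-based attribute elimination described above can be illustrated with a minimal Python sketch that drops an attribute when its absolute Pearson correlation with an already retained attribute exceeds a threshold. The function names, the 0.9 threshold, and the sample values are assumptions made for illustration; they are not taken from the study's code or dataset.</p>
            <code language="python"><![CDATA[
```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every column kept so far is at most threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Illustrative values only; not taken from the study's dataset.
sample = {
    "temperaturemax": [30.0, 32.0, 35.0, 28.0, 31.0],
    "apparenttemperaturemax": [31.0, 33.5, 36.5, 28.5, 32.0],
    "precipAccumulation": [2.0, 0.0, 1.0, 3.0, 0.5],
}
selected = drop_correlated(sample)
print(selected)
```
]]></code>
            <p>Because columns are visited in insertion order, the first attribute of a correlated pair is the one retained; a different ordering would retain a different attribute.</p>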
            <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                <label>Figure 8. </label>
                <caption>
                    <title>Performance of stacked generalization regressor.</title>
                </caption>
                <graphic id="gr8" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76627/b1c4c07f-6f30-4918-ae1d-2c47c9589826_figure8.gif"/>
            </fig>
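            <p>The error measures quoted in this section (mean absolute percentage error, RMSE, and R<sup>2</sup>) can be computed as in the following minimal sketch, which uses their standard definitions. The function name and the sample yields are hypothetical, and the accuracy percentage is not reproduced because its definition is not given in this section.</p>
            <code language="python"><![CDATA[
```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAPE (in percent), RMSE, and R^2 for paired observations.

    MAPE divides by the observed values, so they must all be non-zero.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mape": mape, "rmse": rmse, "r2": r2}


# Illustrative yields only; not results from the study.
observed = [40.0, 50.0, 60.0, 50.0]
predicted = [42.0, 48.0, 57.0, 53.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```
]]></code>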
        </sec>
        <sec id="sec9" sec-type="conclusions">
            <title>Conclusions</title>
            <p>This study demonstrated the use of four regression-based algorithms to predict crop yield from climatic input parameters: random forest regression, gradient boosted tree regression, LASSO regression, and the stacked generalization ensemble method. A comparison of these algorithms shows that the stacked ensemble model performed best on the given dataset.</p>
            <p>Because the proposed system is web-based, input variables and modules can be changed easily, and new features can be added as future needs arise. The system also gives farmers fast and accurate responses.</p>
            <p>
                <bold>Suggestion for future studies:</bold> Future work will examine hybrid machine learning approaches that combine algorithms such as random forest, support vector machine, multiple regression, and logistic regression, as well as deep learning algorithms such as the deep convolutional neural network (DCNN) and long short-term memory (LSTM), which might provide a fast and accurate solution to this problem. It will also consider large, recent datasets from different countries for predicting crop yield in advance, for leaf disease prediction, and for predicting fruit quality, and the results will be tested by farmers and agricultural experts.</p>
        </sec>
        <sec id="sec10">
            <title>Data availability</title>
            <sec id="sec11">
                <title>Underlying data</title>
                <p>Zenodo: HangulAlien/intelligent-decision-support-system: Crop Prediction. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.5533487">https://doi.org/10.5281/zenodo.5533487</ext-link> (
                    <xref ref-type="bibr" rid="ref23">HangulAlien, 2021</xref>).</p>
                <p>The project contains the following underlying data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Python file (contains code for random forest, gradient boosted tree regression, LASSO regression, and stacked generalization).</p>
                        </list-item>
                    </list>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">Creative Commons Zero &#x201c;No rights reserved&#x201d; data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
            </sec>
        </sec>
        <sec id="sec12">
            <title>Software availability</title>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/HangulAlien/intelligent-decision-support-system">https://github.com/HangulAlien/intelligent-decision-support-system</ext-link>.</p>
            <p>Archived source code at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.5533487">https://doi.org/10.5281/zenodo.5533487</ext-link>.</p>
            <p>License: 
                <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">Creative Commons Zero &#x201c;No rights reserved&#x201d; data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <title>References</title>
            <ref id="ref19">
                <mixed-citation publication-type="other">
                    <collab>Aerial Intelligence</collab>:
                    <article-title>Data-science-exercise.</article-title>
                    <year>2017</year>. (Accessed on March 01, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/aerialintel/data-science-exercise">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bhanu Kiran</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Priyanka</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Poojitha</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Crop Yield Prediction using Regression.</article-title>
                    <source>

                        <italic toggle="yes">Int. Res. J. Eng. Techno. (IRJET).</italic>
</source>
                    <year>2020</year>;<volume>7</volume>(<issue>5</issue>):<fpage>3896</fpage>&#x2013;<lpage>3899</lpage>.</mixed-citation>
            </ref>
            <ref id="ref3">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bhanumathi</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vineeth</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rohit</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>Crop Yield Prediction and Efficient use of Fertilizers.</article-title>
                    <source>

                        <italic toggle="yes">IEEE International Conference on Communication and Signal Processing (ICCSP).</italic>
</source>
                    <year>2019</year>; pp.<fpage>769</fpage>&#x2013;<lpage>773</lpage>.</mixed-citation>
            </ref>
            <ref id="ref4">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Champaneri</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chachpara</surname>
                            <given-names>DC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chaitanya</surname>
                        </name>
</person-group>:
                    <article-title>Crop yield prediction using machine learning.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Sci. Res. (IJSR).</italic>
</source>
                    <year>2020</year>;<volume>9</volume>(<issue>2</issue>):<fpage>645</fpage>&#x2013;<lpage>648</lpage>.</mixed-citation>
            </ref>
            <ref id="ref6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dharmaraja</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jain</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anjoy</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Empirical Analysis for Crop Yield Forecasting in India.</article-title>
                    <source>

                        <italic toggle="yes">Agric Res.</italic>
</source>
                    <year>2020</year>;<volume>9</volume>:<fpage>132</fpage>&#x2013;<lpage>138</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s40003-019-00413-x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gandhi</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Armstrong</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Rice crop yield forecasting of tropical wet and dry climatic zone of India using data mining techniques.</article-title>
                    <source>

                        <italic toggle="yes">IEEE International Conference on Advances in Computer Applications (ICACA).</italic>
</source>
                    <year>2016</year>; pp.<fpage>357</fpage>&#x2013;<lpage>363</lpage>.</mixed-citation>
            </ref>
            <ref id="ref8">
                <mixed-citation publication-type="other">
                    <collab>Hajir Almahdi</collab>:
                    <article-title>Machine Learning nano-degree capstone project Data-science-exercise.</article-title>
                    <year>2020</year>. (Accessed on March 10, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://towardsdatascience.com/predicting-crops-yield-machine-learning-nanodegree-capstone-project-e6ec9349f69">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>HangulAlien</surname>
                        </name>
</person-group>:
                    <article-title>HangulAlien/intelligent-decision-support-system: Crop Prediction (Version 1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2021</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.5533487</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Holzapfel</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Br&#x00fc;ntrup</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>SDG 2 (Zero Hunger) in the context of the German Sustainable Development Strategy: are we leaving the starving behind? Briefing Paper, No. 13/2017.</article-title>
                    <year>2017</year>.</mixed-citation>
            </ref>
            <ref id="ref20">
                <mixed-citation publication-type="other">
                    <collab>Jameshan</collab>:
                    <article-title>Wheat yield prediction for United States by environmental features.</article-title>
                    <year>2017</year>. (Accessed on March 01, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/itsjameshan/Wheat-Yield-prediction-for-United-States-by-environmental-features">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kaur</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Havish</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dutt</surname>
                            <given-names>TK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Agrocompanion: A Smart Farming Approach Based on Iot and Machine Learning.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Innov. Techn. Explor. Eng. (IJITEE).</italic>
</source>
                    <year>2020</year>;<volume>9</volume>(<issue>12</issue>):<fpage>254</fpage>&#x2013;<lpage>262</lpage>.
                    <pub-id pub-id-type="doi">10.35940/ijitee.L7984.1091220</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Manjula</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Djodiltachoumy</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>A Model for Prediction of Crop Yield.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Compu. Intell. Inform.</italic>
</source>
                    <year>2017</year>;<volume>6</volume>(<issue>4</issue>):<fpage>298</fpage>&#x2013;<lpage>305</lpage>.</mixed-citation>
            </ref>
            <ref id="ref11">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Medar</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rajpurohit</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shweta</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Crop Yield Prediction using Machine Learning Techniques.</article-title>
                    <source>

                        <italic toggle="yes">IEEE 5th International Conference for Convergence in Technology (I2CT).</italic>
</source>
                    <year>2019</year>; pp.<fpage>1</fpage>&#x2013;<lpage>5</lpage>.</mixed-citation>
            </ref>
            <ref id="ref12">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Meena</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Singh</surname>
                            <given-names>PK</given-names>
                        </name>
</person-group>:
                    <article-title>Crop Yield Forecasting Using Neural Networks.</article-title>
                    <source>

                        <italic toggle="yes">Swarm, Evolutionary, and Memetic Computing. SEMCCO 2013. Lecture Notes in Computer Science, 82, Springer, Cham.</italic>
</source>
                    <year>2013</year>; pp.<fpage>319</fpage>&#x2013;<lpage>331</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-3-319-03756-1_29</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mythra</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Velayudham</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shamila</surname>
                            <given-names>ES</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Survey on Crop Yield Prediction using Data Mining.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Comp. Trends and Technol.</italic>
</source>
                    <year>2018</year>;<volume>65</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>7</lpage>.
                    <pub-id pub-id-type="doi">10.14445/22312803/IJCTT-V65P101</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nishant</surname>
                            <given-names>PS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sai Venkat</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Avinash</surname>
                            <given-names>BL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Crop Yield Prediction based on Indian Agriculture using Machine Learning.</article-title>
                    <source>

                        <italic toggle="yes">International Conference for Emerging Technology (INCET), Belgaum, India.</italic>
</source>
                    <year>2020</year>; pp.<fpage>1</fpage>&#x2013;<lpage>4</lpage>.</mixed-citation>
            </ref>
            <ref id="ref5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Patil</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shirdhonkar</surname>
                            <given-names>MS</given-names>
                        </name>
</person-group>:
                    <article-title>Rice Crop Yield Prediction using Data Mining Techniques: An Overview.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Adv. Res. Comp. Sci. Softw. Eng.</italic>
</source>
                    <year>2017</year>;<volume>7</volume>(<issue>5</issue>):<fpage>427</fpage>&#x2013;<lpage>431</lpage>.
                    <pub-id pub-id-type="doi">10.23956/ijarcsse/SV7I5/0135</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Saeed</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lizhi</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Archontoulis Sotirios</surname>
                            <given-names>V</given-names>
                        </name>
</person-group>:
                    <article-title>A CNN-RNN Framework for Crop Yield Prediction.</article-title>
                    <source>

                        <italic toggle="yes">Front. Plant Sci.</italic>
</source>
                    <year>2020</year>;<volume>10</volume>:<fpage>1750</fpage>&#x2013;<lpage>1755</lpage>.</mixed-citation>
            </ref>
            <ref id="ref1">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shah</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dubey</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hemnani</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Smart Farming System: Crop Yield Prediction Using Regression Techniques.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of International Conference on Wireless Communication.</italic>
</source>
                    <year>2018</year>; pp.<fpage>49</fpage>&#x2013;<lpage>56</lpage>. Springer.</mixed-citation>
            </ref>
            <ref id="ref17">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sriram Rakshith</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Deepak</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rajesh</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Survey on Crop Prediction using Machine Learning Approach.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Res. App. Sci. &amp; Eng. Techno. (IJRASET).</italic>
</source>
                    <year>2019</year>;<volume>7</volume>(<issue>4</issue>):<fpage>3231</fpage>&#x2013;<lpage>3234</lpage>.</mixed-citation>
            </ref>
            <ref id="ref16">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ramesh</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vardhan</surname>
                            <given-names>BV</given-names>
                        </name>
</person-group>:
                    <article-title>Analysis of crop yield prediction using data mining techniques.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Res. Eng. Techn.</italic>
</source>
                    <year>2015</year>;<volume>04</volume>:<fpage>470</fpage>&#x2013;<lpage>473</lpage>.
                    <pub-id pub-id-type="doi">10.15623/ijret.2015.0401071</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <mixed-citation publication-type="other">
                    <collab>Ramesh A</collab>:
                    <article-title>Data Analytics.</article-title>
                    <year>2020</year>. (Accessed on April 30, 2021).
                    <ext-link ext-link-type="uri" xlink:href="https://nptel.ac.in/courses/106/107/106107220/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zingade</surname>
                            <given-names>DS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buchade</surname>
                            <given-names>O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mehta</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Crop Prediction System using Machine Learning.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Adv. Engin. Res. Develop. Special Issue on Recent Trends in Data Eng. (IJAERD).</italic>
</source>
                    <year>2017</year>;<volume>4</volume>(<issue>5</issue>):<fpage>01</fpage>&#x2013;<lpage>06</lpage>.</mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report99829">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76627.r99829</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Vincent</surname>
                        <given-names>Durai Raj</given-names>
                    </name>
                    <xref ref-type="aff" rid="r99829a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7598-1363</uri>
                </contrib>
                <aff id="r99829a1">
                    <label>1</label>School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>12</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Vincent DR</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport99829" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73009.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This work on crop yield prediction is nicely presented with much clarity. 
                <list list-type="bullet">
                    <list-item>
                        <p>The role of the crop yield prediction was explained well in the introduction section.</p>
                    </list-item>
                    <list-item>
                        <p>Aerialintel datasets from the GitHub data science repository were used to forecast crop yields. The existing attributes, the reasons for eliminating certain attributes such as "apparenttemperaturemin" and "apparenttemperaturemax", and the reasons for including attributes such as length of day and elevation are explained in the data preprocessing and feature extraction section.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 1 gives an overview of the entire work and it is easy to follow.</p>
                    </list-item>
                    <list-item>
                        <p>The workings of stacked generalization, gradient boosting, random forest, and least absolute shrinkage and selection operator (LASSO) regression for crop yield prediction were explained appropriately, along with the corresponding equations.</p>
                    </list-item>
                    <list-item>
                        <p>The authors also explained the purpose of, and the need for, a web-based model. The web-based model was deployed using Flask.</p>
                    </list-item>
                    <list-item>
                        <p>In the results and discussion section, the performance of the algorithms has been compared using accuracy and the learning curve.&#x00a0;</p>
                    </list-item>
                </list> The following comment can be considered to further strengthen the work. 
                <list list-type="bullet">
                    <list-item>
                        <p>The need for the current work was clearly stated in the Literature Review section through comparison with existing articles and existing mobile applications. However, many references are taken from conference proceedings rather than from high-impact journals. The authors should consider citing high-impact journal publications on crop yield prediction. For example, the following articles by this reviewer: Elavarasan 
                            <italic>et al. </italic>2018
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-99829-1">1</xref>
                            </sup>,&#x00a0;Elavarasan and Vincent 2021a
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-99829-2">2</xref>
                            </sup>
                            ,&#x00a0;Elavarasan and Vincent 2021b
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-99829-3">3</xref>
                            </sup>.</p>
                    </list-item>
                </list> Finally, I conclude that the flow and contents are clear and the language is easy to understand. This article is suitable for indexing.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>AI, ML, Deep Learning</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-99829-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Forecasting yield by integrating agrarian factors and machine learning models: A survey</article-title>.
                        <source>
                            <italic>Computers and Electronics in Agriculture</italic>
                        </source>.<year>2018</year>;<volume>155</volume>:
                        <elocation-id>10.1016/j.compag.2018.10.024</elocation-id>
                        <fpage>257</fpage>-<lpage>282</lpage>
                        <pub-id pub-id-type="doi">10.1016/j.compag.2018.10.024</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-99829-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Fuzzy deep learning-based crop yield prediction model for sustainable agronomical frameworks</article-title>.
                        <source>
                            <italic>Neural Computing and Applications</italic>
                        </source>.<year>2021</year>;<volume>33</volume>(<issue>20</issue>):
                        <elocation-id>10.1007/s00521-021-05950-7</elocation-id>
                        <fpage>13205</fpage>-<lpage>13224</lpage>
                        <pub-id pub-id-type="doi">10.1007/s00521-021-05950-7</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-99829-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>A reinforced random forest model for enhanced crop yield prediction by integrating agrarian parameters</article-title>.
                        <source>
                            <italic>Journal of Ambient Intelligence and Humanized Computing</italic>
                        </source>.<year>2021</year>;<volume>12</volume>(<issue>11</issue>):
                        <elocation-id>10.1007/s12652-020-02752-y</elocation-id>
                        <fpage>10009</fpage>-<lpage>10022</lpage>
                        <pub-id pub-id-type="doi">10.1007/s12652-020-02752-y</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report99828">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76627.r99828</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Alzami</surname>
                        <given-names>Farrikh</given-names>
                    </name>
                    <xref ref-type="aff" rid="r99828a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2669-3864</uri>
                </contrib>
                <aff id="r99828a1">
                    <label>1</label>Faculty of Computer Science, University of Dian Nuswantoro, Semarang, Indonesia</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>12</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Alzami F</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport99828" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73009.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The paper is well organized and easy to read, and its content is easy to understand.</p>
            <p> </p>
            <p>Because the data were collected from different states in the United States, a global average cannot be used to impute missing values. The authors therefore used the haversine distance between two points to replace the null values.</p>
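            <p>As a minimal illustration of distance-based imputation (a sketch only, not the authors' implementation), the example below fills a missing value with the value recorded at the geographically nearest location, using the haversine great-circle distance. The function names <monospace>haversine_km</monospace> and <monospace>impute_from_nearest</monospace>, the record keys <monospace>lat</monospace> and <monospace>lon</monospace>, and the nearest-neighbour rule itself are assumptions made for illustration; the report does not specify how the distance was used.</p>
            <preformat>
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def impute_from_nearest(records, field):
    """Return copies of `records` where a None in `field` is replaced by the
    value from the nearest record (by haversine distance) that has one."""
    known = [r for r in records if r[field] is not None]
    filled = []
    for r in records:
        if r[field] is None and known:
            nearest = min(
                known,
                key=lambda k: haversine_km(r["lat"], r["lon"], k["lat"], k["lon"]),
            )
            r = {**r, field: nearest[field]}  # copy, leaving the input untouched
        filled.append(r)
    return filled
```
            </preformat>
            <p>Each record is assumed to be a dictionary holding coordinates in decimal degrees; records whose value is already present are passed through unchanged.</p>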
            <p> </p>
            <p>The authors stated that there is no strong linear correlation between the input features and the target output in the dataset. Hence, they evaluated several algorithms: random forest (RF), stacked generalization, gradient boosted tree (GBT) regression, and LASSO regression. Finally, they proposed a stacked regression model, which combines multiple regressors, and compared its performance with that of the other models.</p>
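            <p>The idea of combining regressors by stacking can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' model: the two base learners (a least-squares line and a 1-nearest-neighbour lookup) are simple stand-ins for RF, GBT, and LASSO, the meta-learner is an intercept-free least-squares combination, and all function names are hypothetical. The base learners are trained on k-1 folds, their out-of-fold predictions become the meta-learner's inputs, and the base learners are then refitted on all data.</p>
            <preformat>
```python
def fit_line(xs, ys):
    """Base learner: least-squares straight line through (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Base learner: 1-nearest-neighbour lookup."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def fit_stack(xs, ys, k=3):
    """Stack the two base learners with a linear meta-learner."""
    fitters = (fit_line, fit_nearest)
    n = len(xs)
    oof = [[0.0, 0.0] for _ in range(n)]  # out-of-fold base predictions
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for j, fit in enumerate(fitters):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in range(fold, n, k):  # held-out rows of this fold
                oof[i][j] = model(xs[i])
    # Meta-learner: least-squares weights (no intercept) via 2x2 normal equations.
    a11 = sum(p[0] * p[0] for p in oof)
    a12 = sum(p[0] * p[1] for p in oof)
    a22 = sum(p[1] * p[1] for p in oof)
    b1 = sum(p[0] * y for p, y in zip(oof, ys))
    b2 = sum(p[1] * y for p, y in zip(oof, ys))
    det = a11 * a22 - a12 * a12
    if abs(det) > 1e-12:
        w = ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)
    else:  # collinear base predictions: fall back to a plain average
        w = (0.5, 0.5)
    models = [fit(xs, ys) for fit in fitters]  # refit base learners on all data
    return lambda x: sum(wi * m(x) for wi, m in zip(w, models))
```
            </preformat>
            <p>Calling <monospace>fit_stack(xs, ys)</monospace> returns a prediction function of a single feature value. Training the meta-learner on out-of-fold predictions, rather than on in-sample ones, keeps it from simply rewarding the base learner that overfits the training data most.</p>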
            <p> </p>
            <p>The authors examined the data set using a scatter matrix of 12 sample features. They identified highly correlated and weakly correlated features using a heat map visualization.</p>
            <p> </p>
            <p>The authors concluded that the stacked ensemble model, with an accuracy of 88.89%, outperforms random forest (87.71%) and gradient boosted tree (86.98%).</p>
            <p> </p>
            <p>The paper also outlines future work. Thus, I accept the paper without any modifications.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Data mining, machine learning, pattern recognition</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report102293">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76627.r102293</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Pant</surname>
                        <given-names>Millie</given-names>
                    </name>
                    <xref ref-type="aff" rid="r102293a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r102293a1">
                    <label>1</label>Department of Applied Science and Engineering, Roorkee, Uttarakhand, India</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>12</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Pant M</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport102293" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73009.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have submitted an article entitled "An intelligent decision support system for crop yield prediction using hybrid machine learning algorithms".</p>
            <p> </p>
            <p> It is an interesting study and is relevant to the present scenario.</p>
            <p> </p>
            <p>I suggest that the authors extend the literature review somewhat. At present, it discusses only papers from 2020 and one paper from 2013. It would be good if the authors covered a broader range of articles.</p>
            <p> </p>
            <p>In the opening sentence of the introduction, "increases" should be replaced with "increase" and "climatic" should be replaced with "climate". There are other grammatical errors that should likewise be corrected before the final version is submitted.</p>
            <p> </p>
            <p>The authors could also add a table summarizing the characteristics of the algorithms used in the paper.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Numerical optimization, artificial intelligence, data analysis</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
