Estimation of Paddy Crop Water Usage in Indian Conditions Using Ensemble Learning

Sibani Mohanty; Dhanpratap Singh; Ajit Kumar Pasayat

doi:10.12688/f1000research.165706.1



Research Article


[version 1; peer review: 2 approved]

Sibani Mohanty¹, Dhanpratap Singh¹, Ajit Kumar Pasayat ²

PUBLISHED 10 Jul 2025


¹ Lovely Professional University, Phagwara, Punjab, India
² Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, India

Sibani Mohanty
Roles: Data Curation, Funding Acquisition, Methodology, Validation, Writing – Original Draft Preparation

Dhanpratap Singh
Roles: Data Curation, Funding Acquisition, Methodology, Writing – Original Draft Preparation, Writing – Review & Editing

Ajit Kumar Pasayat
Roles: Formal Analysis, Funding Acquisition, Methodology, Software, Supervision, Visualization, Writing – Original Draft Preparation


This article is included in the Agriculture, Food and Nutrition gateway.

Abstract

Background

In the Indian coastal state of Odisha, agriculture, particularly paddy cultivation, remains the primary livelihood. However, traditional farming practices often use resources, especially water, inefficiently. Given the state’s varied climatic zones and soil types, there is a pressing need for sustainable solutions. Precision agriculture, which utilizes advanced information technologies for decision-making, offers a pathway to enhance productivity while minimizing resource wastage.

Methods

This study applied machine learning (ML) and ensemble regression techniques to predict water usage for paddy cultivation in Odisha. The models were trained on a comprehensive dataset integrating remote sensing data, satellite imagery, historical weather records, soil profiles, and field-level observations. Various regression algorithms were used in ensemble combinations to enhance predictive accuracy and model robustness. Soil moisture, climatic conditions, and crop health indicators were continuously monitored using sensor-based and image-derived data.

Results

The ensemble regression models demonstrated high predictive accuracy, with performance metrics exceeding 90% in forecasting optimal water usage. These predictions enabled precise water management tailored to specific agro-climatic zones within Odisha. Furthermore, the models effectively supported crop recommendation strategies based on soil and environmental parameters, ensuring optimal resource allocation.

Conclusions

The integration of ML and ensemble regression in precision agriculture significantly improves water use efficiency and supports data-driven farming in coastal Odisha. By enabling accurate predictions of water needs and crop suitability, these technologies contribute to maximizing yield, conserving natural resources, and fostering long-term sustainability. The findings emphasize the potential for scalable, technology-driven solutions to modernize traditional agricultural practices in resource-constrained environments.

Keywords

Paddy Crop, Machine Learning, Ensemble Regression, AdaBoost, XGBoost

Corresponding author: Ajit Kumar Pasayat

Competing interests: No competing interests were disclosed.

Grant information: The authors would like to express their gratitude to the Kalinga Institute of Industrial Technology, Bhubaneswar, for funding (KIIT-DU/802/25) for publication of this article.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Mohanty S et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Mohanty S, Singh D and Pasayat AK. Estimation of Paddy Crop Water Usage in Indian Conditions Using Ensemble Learning [version 1; peer review: 2 approved]. F1000Research 2025, 14:684 (https://doi.org/10.12688/f1000research.165706.1) First published: 10 Jul 2025, 14:684 (https://doi.org/10.12688/f1000research.165706.1) Latest published: 10 Jul 2025, 14:684 (https://doi.org/10.12688/f1000research.165706.1)

Introduction

Agriculture is the world’s oldest and most common occupation. In Odisha, farming is the primary source of livelihood and therefore the most widespread occupation. Precision agriculture, sometimes referred to as “digital farming,” is a management approach that makes better use of information technology to boost agricultural output and improve decision-making.¹ By collecting and analyzing data from multiple sources, such as weather stations, satellite imagery, and soil sensors, precision agriculture helps farmers monitor crop health, optimize resource use, and increase overall farm output.² Using these data, farmers can decide when, where, and how much to irrigate, fertilize, or apply pesticides. Precision farming also reduces waste and its environmental consequences by using less water, fertilizer, and pesticide, among other inputs. At the same time, the growing global population is increasing the demand for food, which makes maximizing agricultural productivity a major challenge for farmers.

The Indian state of Odisha is located in the north-eastern region of the peninsula, with latitudes ranging from 17°31′N to 20°31′N and longitudes from 81°31′E to 87°30′ E. Its overall size is 15.57 million hectares, or roughly 4.7% of India’s total land area.³

In the 2011 Census, Odisha had a population of 4.20 crore, and approximately 76% of the population works in the agricultural sector. The net cultivated area is 52.66 lakh hectares, and the gross cultivated area is 87.94 lakh hectares; the remaining land consists of pastures, forests, and cultivable waste. Ten agro-climatic zones, defined by landform, terrain, climate, and soil, have been established for the State. Eight major soil groups have been identified in Odisha, including red soil, black soil, mixed red and yellow soil, brown forest soil, laterite soil, deltaic alluvial soil, and coastal saline and alluvial soil.⁴

For optimal growth, each plant type requires specific soil and site conditions. The physicochemical properties and microenvironment of soils regulate the availability of water and plant nutrients, on which plant growth depends. Soil-site studies examine how factors such as soil depth, texture, salinity, pH, drainage, humidity, and nutrient availability affect plant growth and yield.⁵

Odisha has three primary seasons: summer (March to June), monsoon (July to September), and winter (October to February). Its tropical climate is characterized by high humidity, high temperature, and medium to heavy rainfall, with an average annual rainfall of approximately 1451.2 mm. Odisha also receives rain from the retreating monsoon in October and November.

Odisha has a plentiful supply of water at its disposal⁶ and an extensive network of rivers and streams. Agricultural expansion in the state is driven mainly by irrigation, and sprinkler irrigation makes it feasible to grow a number of commercial crops.

It has been estimated that even a 10% improvement in irrigation water-use efficiency over the current level could provide life-saving irrigation to crops across sizable areas. Achieving this requires collecting current climatic and soil characteristics from several districts. In response to these issues, precision agriculture has emerged as a transformative strategy that uses information technology to improve farming techniques.⁷ This approach lets farmers make data-driven decisions that lead to healthier crops, more sustainable practices, and more efficient resource use.

Machine learning in precision agriculture requires accurate and diverse datasets to train models,⁸ drawing on sources such as remote sensing, satellite imagery, weather records, soil measurements, and field observations. The insights generated by machine learning models help farmers make data-driven decisions, optimize resource management, minimize environmental impacts, and maximize crop productivity in a precise and sustainable manner.⁹ Machine learning algorithms have been used to optimize water consumption in the field.¹⁰ Because humidity and field water content are higher during the rainy season, less water is needed overall than during the summer.

This will assist farmers in obtaining accurate information on the soil’s water content and climate, enabling them to make the right agricultural decisions for optimal crop growth and high yield.

Gaussian extreme learning machine (ELM) techniques have been reported to achieve 97% accuracy in crop yield prediction, and when different regression techniques are applied to crop recommendation, each has been reported to exceed 90% accuracy.

Literature review

C. Murugamani et al.¹¹ emphasized the use of machine learning methods in precision farming. Their primary goal was to determine appropriate plant development parameters, since promoting plant growth and high output is key to smart, intelligent farming and to lowering agricultural risk. They used sensors to gather these parameters. To investigate and identify diseases in cotton crops, they applied machine learning techniques, such as regression, naïve Bayes, and support vector machines, on an IoT platform, and found that the Support Vector Machine results were highly accurate. They also focused on the use of pesticides, fertilizers, and water, because overuse of these inputs degrades groundwater quality and damages crops.

Taken together, the work of Prabhavathi et al.¹² and Swami Durai et al.¹³ points to a comprehensive, data-driven approach to enhancing agricultural productivity. Prabhavathi et al. investigated soil variables for forecasting agricultural outcomes using a range of machine learning algorithms, notably achieving 97% accuracy in crop yield prediction with a Gaussian Extreme Learning Machine. Meanwhile, Senthil Kumar Swami Durai et al. focused on farming recommendations based on soil and weather parameters and built a suite of modules in a Django web application that integrated deep learning and machine learning techniques. Several algorithms achieved high accuracy, and a Random Forest classifier tuned with randomized-search cross-validation was identified as the best model for crop recommendation, with an accuracy of 95.45%. Together, these findings illustrate the potential of advanced computational methods to transform agricultural practice and benefit farmers worldwide.

Priyani et al.¹⁴ and Treboux et al.¹⁵ contributed to the advancement of smart agriculture and precision farming using machine-learning techniques. Priyani et al. focused on enhancing agricultural efficiency by leveraging IoT, automation, and hybrid machine learning algorithms to forecast crop and soil moisture levels using data provided by the state of Gujarat. Their approach resulted in more accurate predictions with lower error rates, highlighting the potential of the smart agriculture infrastructure. On the other hand, Treboux et al. concentrated on precision farming in vineyards, employing image recognition on high-precision aerial pictures to analyze color variations and intensities. By utilizing a Decision Tree Ensemble and morphological approaches, they achieved a high accuracy rate of 94.275% for identifying crop features. However, they noted the importance of manually exploring datasets for further accuracy improvement. Together, these studies demonstrate how machine learning can be used to optimize agricultural practices and maximize crop yield, paving the way for more efficient and sustainable farming methods.

Sajjad Ahmad et al.¹⁶ studied soil moisture, a crucial component of hydrology and climate and an essential variable for precision farming, since it strongly affects the distribution of water in the water cycle. They estimated soil moisture from remote-sensing data using machine learning, focusing on the Colorado River Basin and the ongoing drought in the southwestern United States. They developed a state-of-the-art statistical learning methodology that employed SVM, ANN, and MLR models to estimate soil moisture from Tropical Rainfall Measuring Mission Precipitation Radar (TRMM PR) data and the Normalized Difference Vegetation Index (NDVI). The Support Vector Machine model captured the relationship between soil moisture and agriculture.

Bakthavatchalam et al.¹⁷ used machine learning to predict suitable crops. Meteorological and soil parameters such as N, P, K, pH, temperature, humidity, and rainfall were recorded. Their primary objective was to create a model for high-yield precision agriculture, and they used supervised machine learning models implemented in WEKA.

They used three classification techniques: the rule-based JRip, a Decision Table classifier, and a multilayer perceptron (MLP). All three classifiers produced few errors. Using MLP, accuracy increased from 96.23% in the first iteration to 98.22% in the second; using JRip, it changed from 88.59% to 88.5909% in the second iteration; and the Decision Table reached 96.0% in the second iteration. The authors concluded that machine learning can help produce precision-agriculture models.

Farhat Abbas et al.¹⁸ aimed to increase potato yield. Their investigation used datasets of soil parameters collected from six fields in Atlantic Canada. Machine learning techniques, namely K-Nearest Neighbor, Elastic Net, Linear Regression, and Support Vector Regression, were used to forecast potato yield and quality. Yield projections were generated with these models and assessed using a range of statistical measures. SVR outperformed the other models on every dataset, whereas KNN underperformed. Because the models could explain approximately 60% of potato yield in terms of soil attributes and 40% in terms of meteorological and environmental parameters, the authors concluded that they were effective. They also noted that larger datasets might allow the models to produce more accurate results.

Savvas Dimitriadis et al.¹⁹ investigated how precision agriculture uses machine learning to manage natural resources, such as water, more effectively and to extract new knowledge. To this end, crops, soil, and climate must be monitored appropriately. They combined a black-box technique with evolutionary algorithms to optimize the control system, using datasets of pre-classified examples. Their main objective was to create a new dynamic model from the currently available knowledge, making machine learning effective in addressing real-world challenges in precision agriculture. They ran machine learning algorithms on all datasets in WEKA, deriving several useful guidelines, and reported an accuracy of 91.59% with a false-positive rate of 0.07 percent.

A. Sharma et al.²⁰ examined India’s prospects for precision agriculture. They found that the technology used in precision agriculture includes robotics, sensors, drones, the Internet of Things, GPS, and machine-learning algorithms. They are used in data collection, pre-processing, filtering, and decision applications based on dependable information. Their research indicates that these are particularly helpful for gathering soil and climate characteristic datasets. They observed that optimizing natural resources and agricultural inputs was necessary for high yields and higher crop growth. They found that because precision agriculture is an Internet-connected, data-driven farming technology, farmers must receive the necessary training to use these methods for the best possible use of resources and to enable them to make the right decisions that will support the expansion of our economy.

T. Talaviya et al.²¹ concentrated on real-time machine learning applied to weed detection and to classifying agrochemicals applied at different concentrations per acre of cropland. To protect the crop and increase productivity, they developed a real-time computer vision system that distinguishes crops from weeds for the precise application of weedicides or other chemicals. A Random Forest classifier was trained on the dataset before evaluation on an external test set. Accuracy was 95% when all features were used, but only 90% when a subset of features was selected.

Marcelo Chan Fu Wei et al.²² focused on mapping carrot yield. Their main goal was a method for producing carrot yield maps from satellite imagery using the Random Forest Regression algorithm. The entire dataset was split into training and testing sets, and statistical measures were used to estimate model performance. Random Forest Regression proved to be a successful machine learning method for commercial carrot yield prediction.

Kumar et al.²³ used a proposed deep convolutional neural network (DCNN) to evaluate soil moisture and schedule irrigation, helping precision agriculture producers integrate IoT applications, reduce the water needed for cultivation, and enhance output by monitoring water content at different stages of plant growth. The system also adjusts the water level so that future irrigation decisions maintain water stability and crop growth. The Apriori algorithm and the gated recurrent unit (GRU) require the data to be delivered and stored in a grid view. By combining multiple sensors and parameter-modeling approaches, the system forecasts irrigation plans based on need, with temperature, humidity, and soil moisture as the predicted characteristics. The experimental results support smart irrigation for producing crops with high yield and minimal water consumption. In testing, the DCNN achieved 98.5% accuracy; the authors also report a mean squared error (MSE) figure of 99.25%.

Srinivasa Rao Burri et al.²⁴ developed an advanced machine learning model that predicts soil moisture levels and helps optimize water use in agriculture, with potential for building intelligent irrigation systems. The model was trained and assessed on the “Smart Irrigation System Dataset,” an open-source dataset provided by the University of California, Irvine. Performance was measured using the area under the ROC curve (AUC), accuracy, recall, precision, and other classification metrics. They employed a transfer-learned ResNet-50 model. The results showed an AUC of 0.95, meaning that for a randomly chosen pair of samples from different soil moisture classes, the model ranks the pair correctly 95% of the time.

Methods

The prediction of overall water usage was performed in various steps. Figure 1 shows the steps followed.

Figure 1. Flow of the data in the models.

Data preparation and details

Data collection and details

The dataset under consideration for this study is derived from Open-Meteo and encompasses a comprehensive representation of paddy crop cultivation near Bhubaneswar, Odisha, India. This dataset comprises 29 columns and 15284 rows of meticulously collected data, providing a detailed insight into various facets of paddy cultivation. Notably, this dataset specifically focuses on the cultivation of the MR84 variety of paddy,⁵⁰ a widely cultivated strain known for its adaptability and yield potential in the region.

The dataset was meticulously curated to exclude data related to snow precipitation and other irrelevant factors, ensuring a focused analysis of the growth dynamics of paddy crops in the given geographic location. Each column in the dataset offers unique dimensions of information, ranging from meteorological parameters to growth coefficients,²⁵,²⁶ facilitating a comprehensive understanding of the factors influencing paddy cultivation. Additionally, the inclusion of growth coefficient data for the three stages of the MR84 variety further enhanced the granularity of the analysis, enabling researchers to assess the growth progression and performance of the crop across different developmental phases.

Data preprocessing

After the ‘AgriDataset_2.csv’ file was loaded into the Python environment using the Pandas library,²⁷ additional preprocessing steps were performed to make the dataset ready for integration and subsequent analyses. Non-numeric columns were excluded to focus on numerical features, and further preprocessing techniques, such as normalization,²⁸ one-hot encoding,²⁹ and scaling,³⁰ were applied.
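The loading and column-filtering step can be sketched with Pandas as follows. This is a minimal illustration, assuming a tiny in-memory stand-in for ‘AgriDataset_2.csv’; the column names and values are hypothetical and are not taken from the actual 29-column dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'AgriDataset_2.csv' (the real file has 29 columns).
csv_text = """time,temperature_2m,rain,et0_fao_evapotranspiration,soil_type
2023-07-01T00:00,27.1,0.4,0.12,alluvial
2023-07-01T01:00,26.8,1.2,0.10,alluvial
2023-07-01T02:00,26.5,0.0,0.09,laterite
"""

# With the real file this would be pd.read_csv("AgriDataset_2.csv").
df = pd.read_csv(io.StringIO(csv_text))

# Keep only numeric columns; the text columns 'time' and 'soil_type' are dropped.
numeric_df = df.select_dtypes(include="number")
print(list(numeric_df.columns))
```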

Normalization, denoted as $Norm(x)$, rescales numerical features to a standard range, typically between 0 and 1, so that no single feature dominates because of its larger magnitude. This step preserves the relative significance of each feature in the dataset.²⁸ Mathematically, normalization can be expressed as

x_{norm} = \frac{x - \min(x)}{\max(x) - \min(x)}
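As a quick check of the formula, the sketch below applies it column-wise with NumPy and compares the result against scikit-learn's `MinMaxScaler`, used here only as a reference implementation. The toy feature matrix is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical features on very different scales (e.g. temperature vs. rain).
X = np.array([[10.0, 0.1],
              [20.0, 0.3],
              [40.0, 0.2]])

# x_norm = (x - min(x)) / (max(x) - min(x)), applied to each column.
x_min = X.min(axis=0)
x_max = X.max(axis=0)
X_manual = (X - x_min) / (x_max - x_min)

# MinMaxScaler implements the same mapping onto the default range [0, 1].
X_sklearn = MinMaxScaler().fit_transform(X)
print(X_manual)
```

After the transformation, both columns span 0 to 1, so the first feature no longer dominates purely because of its magnitude.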

In addition, one-hot encoding, a binary transformation, was used to incorporate categorical features by creating a separate binary column for each category of a categorical variable. This ensures that categorical variables are represented numerically and can be included in subsequent analyses such as regression or classification models.²⁹
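A minimal sketch of one-hot encoding with `pandas.get_dummies`, assuming a hypothetical categorical column `soil_type`; the paper does not name the categorical variables it encoded.

```python
import pandas as pd

# Hypothetical categorical feature, e.g. a soil-group label per record.
df = pd.DataFrame({"soil_type": ["alluvial", "laterite", "alluvial", "black"]})

# One binary column per category; dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["soil_type"], dtype=int)
print(encoded)
```

Each row has exactly one column set to 1, so the category is preserved without imposing an artificial order on the labels.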

Furthermore, scaling was implemented to standardize the range of the numerical features, ensuring that they exhibited similar scales and variances. This step is necessary for algorithms that react sensitively to feature scaling, such as support vector machines and k-nearest neighbors, to perform optimally and to avoid biased results. Mathematically, scaling can be expressed as

x_{scaled} = \frac{x - \operatorname{mean}(x)}{\operatorname{std}(x)}
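The scaling formula can be checked in the same way. The sketch below standardizes a toy matrix by hand and with scikit-learn's `StandardScaler`, which uses the population standard deviation (ddof = 0); the values are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 600.0]])

# x_scaled = (x - mean(x)) / std(x); NumPy's std defaults to ddof=0,
# matching StandardScaler.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
X_sklearn = StandardScaler().fit_transform(X)
print(X_sklearn)
```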

By incorporating these preprocessing techniques into the data preparation pipeline, the ‘AgriDataset_2.csv’ file was transformed into a robust, standardized dataset ready for in-depth analysis and modeling of the factors influencing paddy crop cultivation near Bhubaneswar, Odisha, India.³⁰

Data addition

To enhance the utility of the dataset and provide a more thorough understanding of water management practices in paddy cultivation³¹ near Bhubaneswar, Odisha, India, a data augmentation process was initiated. This augmentation primarily involved the calculation of the net water usage for different growth stages of paddy crops, incorporating specific crop coefficient (Kc) values tailored to each stage.

For each target column (‘water_usage_kc20’, ‘water_usage_kc40’, ‘water_usage_kc60’, and ‘overall_water_usage’), the net water usage was computed based on a comprehensive formula. This formula encompasses various variables, including precipitation, rain, snowfall, crop coefficient for the respective growth stage, FAO reference evapotranspiration (ET0), and irrigation.³² The formula is as follows:

\text{Net Water Usage} = \text{Precipitation} + \text{Rain} + \text{Snowfall} - (K_{c,\text{day}} \times ET_{0}) + \text{Irrigation} \qquad (1)

Here, $K_{c,\text{day}}$ represents the crop coefficient for a specific growth stage (day 20, 40, or 60), and $ET_{0}$ denotes the FAO reference evapotranspiration. Precipitation refers to all water that falls to the ground as rain, snow, or hail, whereas rain denotes liquid precipitation only. Snowfall indicates the quantity of snow received, and irrigation denotes supplementary water applied to the crop by artificial means.

The ‘overall_water_usage’ column was derived by summing the net water usage across all growth stages and adjusting for the corresponding crop coefficient-modulated evapotranspiration values. This comprehensive approach facilitates a holistic evaluation of the total water requirements throughout the paddy crop lifecycle, providing valuable insights into water management strategies and their implications for sustainable cultivation practices.
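The computation in Eq. (1) and the stage-wise target columns can be sketched with Pandas as follows. This is an illustration under stated assumptions: the input column names and values are hypothetical, the crop coefficients are placeholder values rather than the MR84 coefficients used in the study, and ‘overall_water_usage’ is taken as the plain sum of the three stage columns because the additional evapotranspiration adjustment is not specified in the text.

```python
import pandas as pd

# Hypothetical daily records (mm); column names are assumptions.
df = pd.DataFrame({
    "precipitation": [5.0, 0.0, 12.0],
    "rain": [5.0, 0.0, 12.0],
    "snowfall": [0.0, 0.0, 0.0],
    "et0": [4.0, 5.0, 3.0],         # FAO reference evapotranspiration
    "irrigation": [0.0, 6.0, 0.0],  # supplementary water applied
})

# Placeholder crop coefficients for days 20, 40 and 60 (illustrative only).
kc = {20: 1.05, 40: 1.10, 60: 1.20}

for day, kc_value in kc.items():
    # Eq. (1): Precipitation + Rain + Snowfall - (Kc_day * ET0) + Irrigation
    df[f"water_usage_kc{day}"] = (
        df["precipitation"] + df["rain"] + df["snowfall"]
        - kc_value * df["et0"]
        + df["irrigation"]
    )

# Assumption: overall usage = sum of the stage columns (the text's extra
# ET-based adjustment is unspecified and therefore omitted here).
stage_cols = [f"water_usage_kc{day}" for day in kc]
df["overall_water_usage"] = df[stage_cols].sum(axis=1)
print(df[stage_cols + ["overall_water_usage"]])
```

Eq. (1) is implemented term by term as written; the loop keeps the three stage columns consistent with one another by sharing the same expression.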

After the net water usage was computed, the enriched dataset was saved as a CSV file for further research and model development. This step refined the dataset and enabled a deeper exploration of water-use dynamics in paddy cultivation within the specified geographic context.

Data extraction - Principal Component Analysis (PCA)

Principal Component Analysis (PCA)³³ was used to reduce the dimensionality of the dataset while retaining most of its variance. Before PCA, the data were standardized using the StandardScaler class from the Scikit-learn library.³⁴,³⁵ Standardization rescales each feature to a mean of 0 and a standard deviation of 1, so that all features contribute equally.

PCA was applied to convert the original features into a new set of uncorrelated variables, termed principal components. The amount of variance retained from the original dataset depended on the number of principal components chosen. In this study, two principal components were selected for visualization purposes.³⁶
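The standardize-then-project procedure can be sketched with scikit-learn as below. The random feature matrix is a hypothetical stand-in for the numeric columns of the dataset; one pair of features is deliberately correlated so that the first component captures a visibly larger share of variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical numeric features: 200 samples, 6 features.
X = rng.normal(size=(200, 6))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)  # correlated pair

# Standardize to zero mean and unit variance before PCA.
X_std = StandardScaler().fit_transform(X)

# Keep two principal components for a 2-D scatterplot.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# Share of total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

`explained_variance_ratio_` makes the trade-off mentioned above explicit: two components are convenient for plotting, but their summed ratio shows how much of the original variance the plot actually reflects.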

The resulting principal components were visualized in a scatterplot (see Figure 2⁵¹) to observe any discernible patterns or clusters within the data.

Figure 2. Scatterplot visualization.

Feature selection and correlation analysis

To optimize the performance of the predictive model, the input features were carefully selected. This section describes the feature selection and correlation analysis used to identify relevant variables and understand their interrelationships.³⁷

Initially, non-numeric columns, such as ‘time,’ were removed from the dataset because they did not contribute to the numerical analysis. The target variable ‘overall_water_usage’ was also excluded from the feature set because it is the dependent variable in the prediction model.

A correlation analysis³⁸ was then undertaken to examine the relationships among the remaining numeric features. Pearson correlation coefficients³⁹ were computed to quantify the linear association between each feature and the target variables. In parallel, mutual information gain was used to capture nonlinear and non-monotonic relationships, quantifying how much information one variable provides about another.⁴⁰
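The difference between the two criteria can be illustrated on synthetic data. In the sketch below (hypothetical features, not the study's), the target depends linearly on one feature and quadratically on another; Pearson correlation misses the symmetric quadratic effect, whereas scikit-learn's `mutual_info_regression` detects it.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(42)
n = 500
# Hypothetical features: linear effect, non-monotonic effect, pure noise.
df = pd.DataFrame({
    "linear": rng.uniform(-1, 1, n),
    "quadratic": rng.uniform(-1, 1, n),
    "noise": rng.uniform(-1, 1, n),
})
target = 3 * df["linear"] + 4 * df["quadratic"] ** 2 + rng.normal(scale=0.1, size=n)

# Pearson correlation measures linear association only.
pearson = df.corrwith(target)

# Mutual information also captures the quadratic dependence.
mi = pd.Series(mutual_info_regression(df, target, random_state=0), index=df.columns)

print(pd.DataFrame({"pearson": pearson, "mutual_info": mi}))
```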

For each target variable (‘overall_water_usage,’ ‘coeff20,’ ‘coeff40,’ and ‘coeff60’), feature selection was conducted individually. The features were iteratively pruned based on the specific target variable under consideration, ensuring the inclusion of only relevant features in subsequent analyses.

Furthermore, to visualize the correlations more intuitively, a pair plot⁴¹ was generated to show the relationship between the top ten selected features and the target variables. This plot provides a comprehensive overview of data distribution and discernible patterns or clusters within the dataset.

The results of feature selection, including Pearson correlation coefficients, mutual information gain scores, and pair plots (Figure 3⁵¹), are presented in tabular and graphical form. These analyses guide the selection of features for subsequent modeling and prediction, improving the robustness and accuracy of the predictive model.

Figure 3. Pair plot for top selected features.

Model preparation

After augmenting the dataset with additional columns to enhance the feature space, the feature selection process was revisited to identify the most relevant features for predicting the target variable ‘overall_water_usage.’ Mutual information gain was employed as the criterion for feature selection, with a threshold set at 0.55 to retain features with high predictive power.
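The threshold rule can be expressed in a few lines. The mutual information scores below are made-up illustrative values, not results from this study, and the comparison is assumed to be strict (score > 0.55) because the text does not say how ties at the threshold were handled.

```python
import pandas as pd

# Made-up mutual information scores for hypothetical candidate features.
mi_scores = pd.Series({
    "et0_fao_evapotranspiration": 0.91,
    "rain": 0.74,
    "temperature_2m": 0.58,
    "wind_speed_10m": 0.21,
    "surface_pressure": 0.05,
})

THRESHOLD = 0.55  # cut-off stated in the text
selected = mi_scores[mi_scores > THRESHOLD].index.tolist()
print(selected)
```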

Subsequently, ensemble learning models were trained using the selected features to predict ‘overall_water_usage.’ The ensemble comprises the following algorithms.

Random forest regression: A decision tree-based approach that builds multiple decision trees and averages their outputs to improve accuracy and robustness.⁴² The formula used for Random Forest Regression is:

\hat{y_{i}} = \frac{1}{N} \sum_{j = 1}^{N} h_{j} (x_{i})

Where:

- $\hat{y_{i}}$ denotes the predicted value of the target variable.

- $h_{j} (x_{i})$ is the prediction of the $j$ -th decision tree.

- $N$ is the total number of decision trees.
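The averaging formula can be verified directly against scikit-learn's `RandomForestRegressor`: the forest's prediction equals the mean of its fitted trees' predictions. The synthetic data and hyperparameters below are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# y_hat_i = (1/N) * sum_j h_j(x_i): average the N trees' predictions.
tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])
manual = tree_preds.mean(axis=0)

print(np.allclose(manual, rf.predict(X)))
```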

AdaBoost regression: An ensemble method that combines weak learners into a stronger, more accurate model, improving regression performance.⁴³ The formula for AdaBoost is as follows:

F(x) = \sum_{t = 1}^{T} \alpha_{t} h_{t}(x)

where $F(x)$ is the final prediction, $\alpha_{t}$ is the weight assigned to the $t$-th weak learner, and $h_{t}(x)$ is the prediction of the $t$-th weak learner.
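A usage sketch with scikit-learn's `AdaBoostRegressor` on synthetic data follows; the hyperparameters are assumptions, not the authors' settings. Note that scikit-learn implements AdaBoost.R2, which stores per-learner weights (the $\alpha_{t}$ terms) in `estimator_weights_` but combines the learners by a weighted median of their predictions rather than by the weighted sum shown above.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

# Shallow regression trees serve as the weak learners h_t(x).
ada = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=3), n_estimators=100, random_state=0
).fit(X, y)

# Per-learner weights (alpha_t); predictions use a weighted median (AdaBoost.R2).
print(ada.estimator_weights_[:5])
print(r2_score(y, ada.predict(X)))
```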

XGBoost regression: An optimized implementation of the gradient boosting framework, known for its efficiency and scalability.⁴⁴ The formula for XGBoost is as follows:

\hat{y_{i}} = \sum_{k = 1}^{K} f_{k} (x_{i})

where $\hat{y_{i}}$ is the predicted value of the target variable, $f_{k}(x_{i})$ is the prediction of the $k$-th tree, and $K$ is the total number of trees.
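The additive form $\hat{y_{i}} = \sum_{k} f_{k}(x_{i})$ can be illustrated with a from-scratch gradient boosting loop for squared-error loss, in which each tree is fitted to the current residuals. This is a simplified sketch only: it omits XGBoost's regularized objective and second-order gradients and CatBoost's ordered boosting, and it folds the constant base prediction and learning rate into the sum.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

K = 50               # number of trees (illustrative)
learning_rate = 0.1  # shrinkage applied to each tree
base = y.mean()      # constant starting prediction
trees = []
pred = np.full_like(y, base)

for _ in range(K):
    residual = y - pred  # negative gradient of the squared-error loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    pred += learning_rate * tree.predict(X)

def predict(X_new):
    # y_hat = base + sum_k f_k(x), where f_k = learning_rate * tree_k
    return base + sum(learning_rate * t.predict(X_new) for t in trees)

print(np.mean((y - predict(X)) ** 2), np.var(y))
```

The training error after boosting falls well below the variance of `y`, which is the error of the constant starting prediction.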

CatBoost regression: A gradient boosting library that handles categorical features natively.⁴⁵ The formula for CatBoost is similar to that for XGBoost and involves summing the predictions of individual trees.

\hat{y_{i}} = \sum_{k = 1}^{K} f_{k} (x_{i})

where $\hat{y_{i}}$ denotes the predicted value of the target variable, $f_{k}(x_{i})$ is the prediction of the $k$-th tree, and $K$ denotes the total number of trees.

Each model was trained on the selected features, and its performance was evaluated using several metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), and the R-squared ($R^{2}$) score.

Mean Squared Error (MSE): MSE measures the average squared difference between the predicted and actual values. Because each error is squared, larger errors receive more weight, which makes MSE sensitive to outliers.

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}

Where:

$(n)$ is the Number of samples.

$(y_{i})$ is the actual value of the target variable for the $(i)$ -th sample.

$(\hat{y_{i}})$ is the predicted value of the target variable for the $(i)$ -th sample.

Mean Absolute Error (MAE): The MAE is the average of the absolute differences between the predicted and actual values. It shows how far, on average, the predictions are from the actual values.

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y_{i}} |

Where:

$(n)$ is the Number of samples.

$(y_{i})$ is the actual value of the target variable for the $(i)$ -th sample.

$(\hat{y_{i}})$ is the predicted value of the target variable for the $(i)$ -th sample.

R-squared ($R^{2}$) Score: $R^{2}$ measures the proportion of the variance in the dependent variable that is explained by the model's predictions from the independent variables. A value of 1 indicates a perfect fit, and a value of 0 indicates that the model performs no better than always predicting the mean.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

Where:

$n$ is the number of samples.

$y_{i}$ is the actual value of the target variable for the $i$-th sample.

$\hat{y_{i}}$ is the predicted value of the target variable for the $i$-th sample.

$\bar{y}$ is the mean of the actual values of the target variable.
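A minimal sketch of the $R^{2}$ formula, using made-up values (not data from this study), shows the two reference points: a close fit scores near 1, and a model that always predicts the mean scores 0.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    y_bar = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)              # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


actual = [10.0, 12.0, 14.0, 16.0]      # hypothetical; mean 13, SS_tot = 20
predicted = [11.0, 12.0, 13.0, 16.0]   # SS_res = 2
mean_only = [13.0] * 4                 # always predicts the mean
print(r2_score(actual, predicted))  # 0.9
print(r2_score(actual, mean_only))  # 0.0
```

Because $SS_{res}/n$ is the MSE and $SS_{tot}/n$ is the variance of the actual values, $R^{2} = 1 - \mathrm{MSE}/\mathrm{Var}(y)$ when both are computed on the same samples. On a common test set, a lower MSE therefore always corresponds to a higher $R^{2}$.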

These metrics provide valuable information regarding the performance of regression models by quantifying the accuracy and goodness of fit of the predictions.

Ensemble model evaluation

The ensemble model combines the strengths of the individual models by averaging their predictions. Its predictive performance was evaluated using the same metrics as those used for the individual models.⁴⁶
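The averaging step can be sketched as an unweighted mean of the individual models' predictions for each sample. Equal weights are an assumption consistent with "averaging"; the model names below are only labels, and the numbers are made up rather than taken from this study.

```python
from typing import Dict, List, Sequence


def average_ensemble(predictions: Dict[str, Sequence[float]]) -> List[float]:
    """Return the per-sample unweighted mean of several models' predictions."""
    if not predictions:
        raise ValueError("need at least one model")
    columns = list(predictions.values())
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all models must predict the same number of samples")
    # zip(*columns) groups the k-th prediction of every model together.
    return [sum(per_sample) / len(per_sample) for per_sample in zip(*columns)]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


actual = [10.0, 20.0]  # hypothetical observations
preds = {
    "random_forest": [10.0, 20.0],
    "adaboost": [13.0, 17.0],
    "xgboost": [10.0, 23.0],
}
ensemble = average_ensemble(preds)
print(ensemble)  # [11.0, 20.0]
print(mse(actual, ensemble), [mse(actual, p) for p in preds.values()])  # 0.5 [0.0, 9.0, 4.5]
```

Because squared error is convex, the averaged prediction's MSE can never exceed the mean of the member models' MSEs, but it can exceed the best single model's MSE, as it does with these values.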

Additionally, learning curves were generated to visualize and understand the performance of the model as a function of the training set size, aiding in identifying potential issues such as overfitting or underfitting, as shown in Figure 4.⁵¹
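A learning curve records training and validation error as the training set grows. The dependency-free sketch below assumes a one-feature linear model, synthetic data and a single fixed validation split; it is not the authors' code. Library routines such as scikit-learn's `sklearn.model_selection.learning_curve` add cross-validation and work with any estimator. The function names here are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, y_bar - a * x_bar


def line_mse(points: Sequence[Point], a: float, b: float) -> float:
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


def learning_curve(train: List[Point], val: List[Point],
                   sizes: Sequence[int]) -> List[Tuple[int, float, float]]:
    """For each size m, fit on the first m training points; return (m, train MSE, val MSE)."""
    curve = []
    for m in sizes:
        subset = train[:m]
        a, b = fit_line([x for x, _ in subset], [y for _, y in subset])
        curve.append((m, line_mse(subset, a, b), line_mse(val, a, b)))
    return curve


# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise with standard deviation 1.
rng = random.Random(0)
data: List[Point] = []
for _ in range(100):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)))
train, val = data[:80], data[80:]

curve = learning_curve(train, val, [5, 10, 20, 40, 80])
for m, train_err, val_err in curve:
    print(f"m={m:2d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

A validation error that stays well above the training error as $m$ grows suggests overfitting, while both errors plateauing at a high level suggests underfitting.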

Figure 4. Prediction vs. actual plots.

Finally, prediction vs. actual plots were generated for each model to visually assess the agreement between the predicted and actual values of ‘overall_water_usage’ and to gain insight into the models’ predictive capabilities.

Results

In this section, we present the outcomes of evaluating various machine learning algorithms for predicting ‘overall_water_usage’ using three key metrics: the Mean Squared Error (MSE), Mean Absolute Error (MAE),⁴⁷ and R-squared (R2) score.⁴⁸ Additionally, we identified important features that contribute to the prediction of ‘overall_water_usage’ through Mutual Information Regression.

Algorithm performance metrics

The performance of each algorithm in predicting ‘overall_water_usage’ is summarized in Table 1.

Table 1. Performance metrics of each algorithm.

Machine learning method	Mean Squared Error (MSE)	Mean Absolute Error (MAE)	R2 score
Random Forest	0.1262	0.0687	0.9699
AdaBoost	59.9429	5.8568	0.9632
XGBoost	2.9393	0.3793	0.9767
Ensemble Model	4.2369	1.4970	0.9861

Selected important features

The important features contributing significantly to the prediction of ‘overall_water_usage,’ identified through Mutual Information Regression, are presented in Table 2.

Table 2. Top ten selected features.

Sr No.	Top ten selected features
1.	Sunshine Duration
2.	Precipitation Sum
3.	Rain Sum
4.	Precipitation Hours
5.	Shortwave Radiation Sum
6.	Evapotranspiration
7.	Crop Coefficient (Kc) on day 20
8.	Crop Coefficient (Kc) on day 40
9.	Crop Coefficient (Kc) on day 60
10.	Irrigation

According to the mutual information ranking, these features are the most informative predictors of overall water consumption in paddy cultivation, and they can inform the design of agricultural water management strategies.
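Ranking features by mutual information with the target can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline, whose code is not shown here. Scikit-learn's `mutual_info_regression`, for example, uses a nearest-neighbour estimator, whereas this sketch swaps in a simpler plug-in estimate on equal-width bins. The function names, feature names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each value to one of n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def binned_mutual_information(x: Sequence[float], y: Sequence[float],
                              n_bins: int = 4) -> float:
    """Plug-in estimate of I(X; Y) in nats from a 2-D histogram."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # I(X; Y) = sum over occupied cells of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def rank_features(features: Dict[str, Sequence[float]], target: Sequence[float],
                  n_bins: int = 4) -> List[Tuple[str, float]]:
    """Score every feature against the target and sort from most to least informative."""
    scores = {name: binned_mutual_information(values, target, n_bins)
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


target = [float(i) for i in range(16)]
features = {
    "informative": [2.0 * t for t in target],         # deterministic function of the target
    "unrelated": [float(i % 4) for i in range(16)],   # spread evenly across target bins
}
for name, score in rank_features(features, target):
    print(f"{name}: {score:.3f} nats")
```

Selecting the top-ranked features then amounts to keeping the first entries of the sorted list, or those whose score exceeds a chosen threshold.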

In summary, the ensemble model achieved the highest R-squared (R2) score (0.9861) of all the models evaluated and lower Mean Squared Error (MSE) and Mean Absolute Error (MAE) values than AdaBoost, although Random Forest and XGBoost recorded lower MSE and MAE values (Table 1). The selected features provide insight into the primary determinants of water use in paddy cultivation and can support the design of effective water conservation and irrigation strategies.

Discussion

This study presents a comprehensive analysis for predicting ‘overall_water_usage’ in paddy cultivation using machine learning algorithms. The results highlight the efficacy of ensemble modeling: the ensemble achieved the highest R-squared (R2) score of all the models evaluated, although Random Forest and XGBoost recorded lower Mean Squared Error (MSE) and Mean Absolute Error (MAE) values (Table 1). Additionally, the important features identified through Mutual Information Regression offer valuable insights into the primary determinants of water utilization in paddy cultivation, which can inform the development of effective water management strategies.

The strong performance of the ensemble model relative to the individual algorithms suggests a benefit of leveraging diverse modeling techniques. By combining the strengths of different algorithms, an ensemble can achieve greater robustness and generalizability, which can improve predictive accuracy.

However, it is important to understand the limitations of this study. First, predictive models rely on the assumption that historical data accurately represents future trends. Changes in environmental conditions, agricultural practices, and socioeconomic factors may impact water usage patterns, potentially affecting the models’ predictive capabilities. Additionally, the quality and completeness of the dataset can influence the model performance. Incomplete or noisy data may introduce biases and affect the reliability of the predictions.

Furthermore, the scope of the study is limited to predicting ‘overall_water_usage’ in paddy cultivation and does not address other essential aspects, such as crop yield prediction or soil health assessment. Future research could incorporate additional predictor variables, such as soil moisture content, temperature, or crop growth stage, to develop more comprehensive predictive models. Moreover, incorporating remote sensing data⁴⁹ or satellite imagery can provide valuable spatial information, enabling more precise predictions and insights into agricultural practices.

In terms of methodology, exploring other machine learning techniques, such as deep learning or alternative ensemble strategies, together with more sophisticated feature engineering, could yield further improvements in predictive accuracy. Additionally, sensitivity analyses evaluating the robustness of the models to variations in the input parameters would enhance the reliability of the predictions.

Conclusion

In this study, we employed machine learning algorithms to predict ‘overall_water_usage’ in paddy cultivation. Through rigorous evaluation, we found that the ensemble model achieved the highest R-squared (R2) score of the approaches evaluated, with lower Mean Squared Error (MSE) and Mean Absolute Error (MAE) than AdaBoost, although Random Forest and XGBoost attained lower MSE and MAE.

Mutual Information Regression helped to identify key features influencing water usage in paddy cultivation, providing valuable insights for effective water management strategies. Despite limitations, such as reliance on historical data and the study’s narrow focus, our findings pave the way for future research in refining predictive models and expanding the scope of analysis.

Integrating advanced techniques and additional variables could enhance predictive accuracy and foster more sustainable agricultural practices and environmental conservation efforts.

Ethics statement

No ethical approval is required.

Data availability

figshare. Estimation of Paddy Crop Water Usage in Indian Conditions Using Ensemble Learning. https://doi.org/10.6084/m9.figshare.29263199.v1⁵¹

This project contains the following underlying data:

root/ (Contains JPEG images of the processed dataset and visual outputs used in the study).

OPEN DATA/AgriDataset_2.csv (CSV file containing the dataset used for regression analysis and graph generation).

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

References

1. Cambra Baseca C, Sendra S, Lloret J, et al.: A smart decision system for digital farming. Agronomy. 2019; 9(5): 216.
2. Ahmad SF, Dar AH: Precision Farming for Resource Use Efficiency. Kumar S, Meena RS, Jhariya MK, editors. Resources Use Efficiency in Agriculture. Singapore: Springer; 2020.
3. Mohanty S, Singh D: Optimal Water Utilization in the State of Odisha using Precision Agriculture. 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE). IEEE; 2023, May; pp. 1804–1807.
4. Mishra MC, Senapati S, Rao BH: Odisha. Geotechnical Characteristics of Soils and Rocks of India. CRC Press; 2021; pp. 511–527.
5. Paul E, Frey S: Soil microbiology, ecology and biochemistry. Elsevier; 2023.
6. Mishra MC, Senapati S, Rao BH: Odisha. Geotechnical Characteristics of Soils and Rocks of India. CRC Press; 2021; pp. 511–527.
7. Sharma A, Jain A, Gupta P, et al.: Machine learning applications for precision agriculture: A comprehensive review. IEEE Access. 2020; 9: 4843–4873.
8. Sharma A, Jain A, Gupta P, et al.: Machine learning applications for precision agriculture: A comprehensive review. IEEE Access. 2020; 9: 4843–4873.
9. Koshariya AK, Rameshkumar PM, Balaji P, et al.: Data-Driven Insights for Agricultural Management: Leveraging Industry 4.0 Technologies for Improved Crop Yields and Resource Optimization. Robotics and Automation in Industry 4.0. CRC Press; 2024; pp. 260–274.
10. Dehghanisanij H, Emami H, Emami S, et al.: A hybrid machine learning approach for estimating the water-use efficiency and yield in agriculture. Sci. Rep. 2022; 12(1): 6728.
11. Murugamani C, Shitharth S, Hemalatha S, et al.: Machine learning technique for precision agriculture applications in 5G-based internet of things. Wirel. Commun. Mob. Comput. 2022.
12. Prabavathi R, Chelliah BJ: An Optimized Gaussian Extreme Learning Machine (GELM) for Predicting the Crop Yield using Soil Factors. 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC). IEEE; 2022; pp. 219–222.
13. Durai SKS, Shamili MD: Smart farming using machine learning and deep learning techniques. Decis. Anal. J. 2022; 3: 100041.
14. Abraham G, Raksha R, Nithya M: Smart Agriculture Based on IoT and Machine Learning. 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India. 2021; pp. 414–419.
15. Treboux J, Genoud D: Improved Machine Learning Methodology for High Precision Agriculture. 2018 Global Internet of Things Summit (GIoTS), Bilbao, Spain. 2018; pp. 1–6.
16. Ahmad S, Kalra A, Stephen H: Estimating soil moisture using remote sensing data: A machine learning approach. Adv. Water Resour. 2010; 33(1): 69–80.
17. Bakthavatchalam K, Karthik B, Thiruvengadam V, et al.: IoT framework for measurement and precision agriculture: predicting the crop using machine learning algorithms. Technologies. 2022; 10(1): 13.
18. Abbas F, Afzaal H, Farooque AA, Tang S: Crop yield prediction through proximal sensing and machine learning algorithms. Agronomy. 2020; 10(7): 1046.
19. Dimitriadis S, Goumopoulos C: Applying Machine Learning to Extract New Knowledge in Precision Agriculture Applications. 2008 Panhellenic Conference on Informatics, Samos, Greece. 2008; pp. 100–104.
20. Sharma A, Jain A, Gupta P, et al.: Machine Learning Applications for Precision Agriculture: A Comprehensive Review. IEEE Access. 2021; 9: 4843–4873.
21. Talaviya T, Shah D, Patel N, et al.: Implementation of artificial intelligence in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artif. Intell. Agric. 2020; 4: 58–73.
22. Wei MCF, Maldaner LF, Ottoni PMN, et al.: Carrot yield mapping: A precision agriculture approach based on machine learning. AI. 2020; 1(2): 229–241.
23. Kumar P, Udayakumar A, Anbarasa Kumar A, et al.: Multiparameter optimization system with DCNN in precision agriculture for advanced irrigation planning and scheduling based on soil moisture estimation. Environ. Monit. Assess. 2023; 195: 13.
24. Burri SR, Agarwal DK, Vyas N, et al.: Optimizing Irrigation Efficiency with IoT and Machine Learning: A Transfer Learning Approach for Accurate Soil Moisture Prediction. 2023 World Conference on Communication & Computing (WCONF), Raipur, India. 2023; pp. 1–6.
25. Dorairaj D, Govender NT: Rice and paddy industry in Malaysia: governance and policies, research trends, technology adoption and resilience. Front. Sustain. Food Syst. 2023; 7: 1093605.
26. Montazar A, Rejmanek H, Tindula G, et al.: Crop coefficient curve for paddy rice from residual energy balance calculations. J. Irrig. Drain. Eng. 2017; 143(2): 04016076.
27. Nelli F: The pandas library—an introduction. Python Data Analytics: With Pandas, NumPy, and Matplotlib. Berkeley, CA: Apress; 2023; pp. 73–114.
28. Mallikharjuna Rao K, Saikrishna G, Supriya K: Data preprocessing techniques: emergence and selection towards machine learning models - a practical review using HPA dataset. Multimed. Tools Appl. 2023; 82: 37177–37196.
29. Dahouda MK, Joe I: A Deep-Learned Embedding Technique for Categorical Features Encoding. IEEE Access. 2021; 9: 114381–114391.
30. Mishra P, Biancolillo A, Roger JM, et al.: New data preprocessing trends based on ensemble of multiple preprocessing techniques. TrAC Trends Anal. Chem. 2020; 132: 116045.
31. Datta A, Ullah H, Ferdous Z: Water management in rice. Rice production worldwide. 2017; pp. 255–277.
32. Pereira LS, Paredes P, Hunsaker DJ, et al.: Standard single and basal crop coefficients for field crops. Updates and advances to the FAO56 crop water requirements method. Agric. Water Manag. 2021; 243: 106466.
33. Greenacre M, Groenen PJ, Hastie T, et al.: Principal component analysis. Nat. Rev. Methods Primers. 2022; 2(1): 100.
34. Zollanvari A: Supervised Learning in Practice: the First Application Using Scikit-Learn. Machine Learning with Python: Theory and Implementation. Cham: Springer International Publishing; 2023; pp. 111–131.
35. Bisong E: Introduction to Scikit-learn. Building machine learning and deep learning models on Google Cloud Platform: a comprehensive guide for beginners. 2019; pp. 215–229.
36. Mutlag WK, Ali SK, Aydam ZM, et al.: Feature extraction methods: a review. J. Phys. Conf. Ser. 2020; 1591(1): 012028.
37. Kou G, Yang P, Peng Y, et al.: Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Appl. Soft Comput. 2020; 86: 105836.
38. Zaporozhets AO: Correlation analysis between the components of energy balance and pollutant emissions. Water Air Soil Pollut. 2021; 232: 1–22.
39. Liu Y, Mu Y, Chen K, et al.: Daily activity feature selection in smart homes based on pearson correlation coefficient. Neural Process. Lett. 2020; 51: 1771–1787.
40. Gonzalez-Lopez J, Ventura S, Cano A: Distributed multi-label feature selection using individual mutual information measures. Knowl.-Based Syst. 2020; 188: 105052.
41. Emerson JW, Green WA, Schloerke B, et al.: The generalized pairs plot. J. Comput. Graph. Stat. 2013; 22(1): 79–91.
42. Kumar MS, Sah AK, Ruthvik G, et al.: Advancements in Heart Disease Prediction: A Comprehensive Review of ML and DL Algorithms. 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS). IEEE; 2023, November; pp. 1463–1468.
43. Wang W, Sun D: The improved AdaBoost algorithms for imbalanced data classification. Inf. Sci. 2021; 563: 358–374.
44. Zhang X, Yan C, Gao C, et al.: Predicting missing values in medical data via XGBoost regression. J. Healthc. Inform. Res. 2020; 4: 383–394.
45. Arora M, Sharma A, Katoch S, et al.: A State of the Art Regressor Model’s comparison for Effort Estimation of Agile software. 2021 2nd International Conference on Intelligent Engineering and Management (ICIEM). IEEE; 2021, April; pp. 211–215.
46. Dong X, Yu Z, Cao W, et al.: A survey on ensemble learning. Front. Comp. Sci. 2020; 14: 241–258.
47. Karunasingha DSK: Root mean square error or mean absolute error? Use their ratio as well. Inf. Sci. 2022; 585: 609–629.
48. Kardani N, Zhou A, Nazem M, et al.: Estimation of Bearing Capacity of Piles in Cohesionless Soil Using Optimised Machine Learning Approaches. Geotech. Geol. Eng. 2020; 38: 2271–2291.
49. Zhang L, Zhang L: Artificial Intelligence for Remote Sensing Data Analysis: A review of challenges and opportunities. IEEE Geosci. Remote Sens. Mag. 2022; 10(2): 270–294.
50. Zippenfenig P: Weather API [Computer software]. Zenodo. 2023. Open-Meteo.com.
51. Pasayat A: Estimation of Paddy Crop Water Usage in Indian Conditions Using Ensemble Learning. figshare. 2025.


Competing interests

No competing interests were disclosed.

Grant information

The authors would like to express their gratitude to the Kalinga Institute of Industrial Technology, Bhubaneswar, for funding (KIIT-DU/802/25) for publication of this article.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Published: 10 Jul 2025, 14:684

https://doi.org/10.12688/f1000research.165706.1

Copyright

© 2025 Mohanty S et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to cite this article

Mohanty S, Singh D and Pasayat AK. Estimation of Paddy Crop Water Usage in Indian Conditions Using Ensemble Learning [version 1; peer review: 2 approved]. F1000Research 2025, 14:684 (https://doi.org/10.12688/f1000research.165706.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.


Open Peer Review


Reviewer Report 16 Sep 2025

Saswata Roy, Atal Bihari Vajpayee Indian Institute of Information Technology and Management Gwalior, Madhya Pradesh, India

Approved

https://doi.org/10.5256/f1000research.182437.r399699


Comment 1: The review is comprehensive but lacks synthesis at the end to highlight the specific research gap addressed by this study.
Comment 2: Table 1 shows AdaBoost having a much higher MSE than other models, but no explanation is offered.
Comment 3: The conclusion restates results without emphasizing specific practical recommendations for irrigation planning.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Misinformation Detection/Machine Learning/Deep Learning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


Reviewer Report 03 Sep 2025

Sourajit Behera, Sardar Vallabhbhai National Institute of Technology, Gujarat, India

Approved

https://doi.org/10.5256/f1000research.182437.r399701


This paper addresses an important topic in precision agriculture—accurately predicting water usage for paddy cultivation using ensemble learning models. The authors make good use of multiple machine learning algorithms and provide a detailed description of their methodology. However, some sections would benefit from clarification, deeper analysis of results, and more explicit connections between findings and practical application.

1. Abstract
The phrase “performance metrics exceeding 90%” is vague and could be misinterpreted.

2. Introduction
Accuracy figures from other studies (e.g., 97% from Gaussian ELM) are presented without sufficient experimental context.

3. Literature Review
Several studies cited involve crops (vineyards, carrots, potatoes) with very different cultivation and irrigation patterns.

4. Methods
The rationale for using a 0.55 threshold in Mutual Information Gain is not explained.

5. Results
Prediction vs. actual plots are presented but not discussed in detail.

6. Conclusion
Next steps for real-world implementation are not outlined.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Predictive Maintenance, Applied Deep Learning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


[33] 33. Greenacre M, Groenen PJ, Hastie T, et al.: Principal component analysis. Nat. Rev. Methods Primers. 2022; 2(1): 100. Publisher Full Text

[34] 34. Zollanvari A: Supervised Learning in Practice: the First Application Using Scikit-Learn. Machine Learning with Python: Theory and Implementation. Cham: Springer International Publishing; 2023; pp. 111–131. Publisher Full Text

[35] 35. Bisong E, Bisong E: Introduction to Scikit-learn, Building machine learning and deep learning models on google cloud platform: a comprehensive guide for beginners.2019; 215–229. Publisher Full Text

[36] 36. Mutlag WK, Ali SK, Aydam ZM, et al.: Feature extraction methods: a review. J. Phys. Conf. Ser. 2020, July; Vol. 1591(1): p. 012028). IOP Publishing. Publisher Full Text

[37] 37. Kou G, Yang P, Peng Y, et al.: Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Appl. Soft Comput. 2020; 86: 105836. Publisher Full Text

[38] 38. Zaporozhets AO: Correlation analysis between the components of energy balance and pollutant emissions. Water Air Soil Pollut. 2021; 232: 1–22. Publisher Full Text

[39] 39. Liu Y, Mu Y, Chen K, et al.: Daily activity feature selection in smart homes based on pearson correlation coefficient. Neural. Process. Lett. 2020; 51: 1771–1787. Publisher Full Text

[40] 40. Gonzalez-Lopez J, Ventura S, Cano A: Distributed multi-label feature selection using individual mutual information measures. Knowl.-Based Syst. 2020; 188: 105052. Publisher Full Text

[41] 41. Emerson JW, Green WA, Schloerke B, et al.: The generalized pairs plot. J. Comput. Graph. Stat. 2013; 22(1): 79–91. Publisher Full Text

[42] 42. Kumar MS, Sah AK, Ruthvik G, et al.: Advancements in Heart Disease Prediction: A Comprehensive Review of ML and DL Algorithms. 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS). IEEE; 2023, November; pp. 1463–1468. Publisher Full Text

[43] 43. Wang W, Sun D: The improved AdaBoost algorithms for imbalanced data classification. Inf. Sci. 2021; 563: 358–374. Publisher Full Text

[44] 44. Zhang X, Yan C, Gao C, et al.: Predicting missing values in medical data via XGBoost regression. J. Healthc. Inform. Res. 2020; 4: 383–394. PubMed Abstract | Publisher Full Text | Free Full Text

[45] 45. Arora M, Sharma A, Katoch S, et al.: A State of the Art Regressor Model’s comparison for Effort Estimation of Agile software. 2021 2nd International Conference on Intelligent Engineering and Management (ICIEM). IEEE; 2021, April; pp. 211–215. Publisher Full Text

[46] 46. Dong X, Yu Z, Cao W, et al.: A survey on ensemble learning. Front. Comp. Sci. 2020; 14: 241–258. Publisher Full Text

[47] 47. Karunasingha DSK: Root mean square error or mean absolute error? Use their ratio as well. Inf. Sci. 2022; 585: 609–629. Publisher Full Text

[48] 48. Kardani N, Zhou A, Nazem M, et al.: Estimation of Bearing Capacity of Piles in Cohesionless Soil Using Optimised Machine Learning Approaches. Geotech. Geol. Eng. 2020; 38: 2271–2291. Publisher Full Text

[49] 49. Zhang L, Zhang L: Artificial Intelligence for Remote Sensing Data Analysis: A review of challenges and opportunities. IEEE Geosci. Remote Sens. Mag. 2022; 10(2): 270–294. Publisher Full Text

[50] 50. Zippenfenig P: Weather API [Computersoftware]. Zenodo. 2023. Open-Meteo.com Publisher Full Text

[51] 51. Pasayat A: Estimation of Paddy Crop Water Usage in Indian Conditions Using Ensemble Learning. figshare. 2025. Publisher Full Text

