Mango quality prediction based on near-infrared spectroscopy using multi-predictor local polynomial regression modeling

Millatul Ulya; Nur Chamidah; Toha Saifudin

doi:10.12688/f1000research.130015.2


Research Article

Revised

[version 2; peer review: 3 approved with reservations, 1 not approved]

Millatul Ulya¹,², Nur Chamidah³, Toha Saifudin³

PUBLISHED 18 Mar 2024

Author details

¹ Department of Agroindustrial Technology, Faculty of Agriculture, Universitas Trunojoyo Madura, Bangkalan, 69162, Indonesia
² Doctoral Study Program of Mathematics and Natural Sciences, Faculty of Science and Technology, Airlangga University, Surabaya, 60115, Indonesia
³ Department of Mathematics, Faculty of Science and Technology, Airlangga University, Surabaya, 60115, Indonesia

Millatul Ulya
Roles: Data Curation, Funding Acquisition, Investigation, Resources, Visualization, Writing – Original Draft Preparation

Nur Chamidah
Roles: Conceptualization, Formal Analysis, Methodology, Project Administration, Supervision, Writing – Review & Editing

Toha Saifudin
Roles: Data Curation, Formal Analysis, Investigation, Software, Supervision, Validation

This article is included in the Agriculture, Food and Nutrition gateway.

Abstract

Background

pH and total soluble solids (TSS) are important quality parameters of mangoes; they represent the acidity and sweetness of the fruit, respectively. This study predicts the pH and TSS of intact mangoes based on near-infrared (NIR) spectroscopy using multi-predictor local polynomial regression (MLPR) modeling. Herein, the prediction performance of kernel partial least square regression (KPLSR), support vector machine regression (SVMR), and MLPR is compared.

Methods

For this purpose, 186 intact mango samples at three different maturity stages are used. Prediction models are built using MLPR, KPLSR, and SVMR based on untreated and treated spectra. The best regression model for predicting pH is MLPR based on Gaussian filter smoothing spectra. Moreover, the TSS value is more accurately predicted using MLPR based on Savitzky–Golay smoothing.

Results

The findings reveal that MLPR is highly accurate in estimating the pH and TSS of mangoes, with mean absolute percentage error (MAPE) values less than 10%. In addition, the MLPR model has the best predictive performance, with the lowest mean squared error (MSE) and root mean squared error (RMSE) values and the highest R² value.

Conclusions

The use of NIR spectroscopy in combination with multi-predictor local polynomial regression could provide a quick and non-destructive technique for predicting mango quality. Thus, the results of this study help support sustainable production as a sustainable development goal.

Keywords

NIR spectroscopy, mango, sustainable production, local polynomial regression

Corresponding author: Nur Chamidah

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2024 Ulya M et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Ulya M, Chamidah N and Saifudin T. Mango quality prediction based on near-infrared spectroscopy using multi-predictor local polynomial regression modeling [version 2; peer review: 3 approved with reservations, 1 not approved]. F1000Research 2024, 12:656 (https://doi.org/10.12688/f1000research.130015.2) First published: 13 Jun 2023, 12:656 (https://doi.org/10.12688/f1000research.130015.1) Latest published: 18 Mar 2024, 12:656 (https://doi.org/10.12688/f1000research.130015.2)

Revised Amendments from Version 1

In the new version of our manuscript, we have added discussion citing 13 references published in the last five years. The spectral data acquisition setup is now clearly stated and illustrated in a new figure (Figure 1). Information on the existence of outliers, and on which data include outliers, has been added. We have also added a suggestion to evaluate the robustness of the model in future research using samples from different seasons as testing datasets. We propose a nonparametric regression approach based on a local polynomial estimator for predicting pH and total soluble solids (TSS) because it gives a more flexible regression curve and the prediction results are highly accurate (MAPE less than 10%).

Introduction

Indonesia’s Gadung Klonal 21, commonly known as Avomango, is a popular mango cultivar owing to its thick flesh, low fiber content, and sweet flavor.¹ Avomango can be eaten in the same way as an avocado. It was developed in Pasuruan Regency, East Java. Even now, fruit pickers hand-pick mangoes and must determine whether the mango is sufficiently mature for picking. Maturity indices determine the quality and shelf life of harvested fruits, and provide the necessary flexibility for transport and marketing.² Harvest maturity is the stage of development in climacteric fruits, such as mangoes, when the fruit is harvest-ready and of an acceptable consumer grade. Consumer maturity is achieved when the fruit is ready for consumption or utilized in other ways. In climacteric fruits, consumer maturity is reached after harvest maturity.³ Mature mangoes have a low pH value and high Brix percentage; the pH value increases throughout the maturation period.

Fruit maturity indices can be estimated accurately using destructive methods. These methods are damaging, time-consuming, labor-intensive, and require manual loading.⁴ Furthermore, they are costly, require a long time for sample preparation, and are wasteful.⁵ Thus, rapid, non-destructive, and environment-friendly analytical methods are required. Non-destructive techniques, such as visible imaging, colorimetry, visible and near-infrared (NIR) spectroscopy, computed tomography, hyperspectral imaging, fluorescence imaging, and multispectral imaging, have been developed to evaluate fruit maturity.³ NIR spectroscopy is the most widely used non-destructive technique in post-harvest fruit and vegetable quality determination.⁶ Moreover, NIR spectroscopy techniques have been used to overcome the limitations of destructive methods while maintaining the physicochemical attributes of food and agricultural products. In addition, using NIR spectroscopy technology helps support sustainable production, one of the Sustainable Development Goals (SDGs).

Physicochemical, physical, and biological changes occur in mangoes during ripening. These changes differ depending on the mango variety. The color does not reflect the stage of maturity in Gadung Klonal 21 as the green color of the mature mango is similar to that of the raw mango. The texture of mature mangoes differs significantly; the tip of the fruit has a soft texture. A significant change is observed in the level of sweetness and acidity of mangoes, which can only be detected by destructive analysis. However, numerous studies have been conducted to predict the sweetness and acidity of mangoes using regression modeling based on spectral data from NIR spectroscopy.⁷–¹¹

Studies have been conducted to determine and forecast the maturity and quality of mangoes.⁷–¹² Spectral data must be modeled using several regression methods, either as raw spectra or pre-processed spectral data, to estimate the internal quality of mangoes. Most of the previous studies have modeled spectral data using linear regression and nonlinear regression. Among linear methods, partial least squares regression (PLSR) and principal component regression are the two most popular approaches for NIR calibration.¹³ Linear regression is a parametric technique commonly used to predict the internal quality of fruit. Furthermore, nonlinear regression methods have the potential for such application. Forecasting changes in quality in agricultural products requires predictive modeling that considers the effectiveness of a nonlinear regression model.¹⁴ To predict the fruit quality, Nicolaï, Theron, and Lammertyn²⁰ employed nonlinear regression analysis involving the kernel partial least squares regression (KPLSR) method. Anderson, Walsh, Flynn, and Walsh¹⁵ reported using local PLSR to estimate the dry matter content of mango. The application of nonlinear regression to fruit quality prediction is relatively underexplored.

A nonparametric regression approach can be used to model unpatterned data, including the nonlinear cases. Nonparametric regression for predicting the acidity level of mangoes was studied by Ulya and Chamidah.²²,²³ The prediction of the sweetness level of mangoes has been reported by Ulya, Chamidah, and Saifudin.¹⁶ These studies report that nonparametric regression based on local polynomial estimators, particularly multiple polynomial regression (MPR), results in a better predictive model than the parametric regression approach.

Local polynomial regression can capture nonlinear patterns between the response and the predictor variables.¹⁷ This method looks at neighboring data for the specified bandwidth, matches separate piecewise regressions for each part, and combines them.¹⁸ The local regression fit is complete when all the data points are identified using the regression function values. Function calculation in local polynomial and local linear regression is performed locally at the point to be estimated.¹⁹ This is different from spline regressions.²⁰–²³ This local estimation technique captures nonlinearities that may exist without the influence of dataset outliers at each estimation stage.²⁴ This method is data-based and easy to implement. It provides a flexible structure that can capture the nonlinear characteristics present in the data compared to multiple linear regression (MLR).²⁵

Mango pH value prediction using multi-predictor local polynomial regression (MLPR) and MPR was investigated by Ulya et al.²⁶ The results indicate that the MLPR method provides better predictive performance with a lower mean absolute percentage error (MAPE) value than the MPR method. However, some studies have attempted to overcome the problem of nonlinearity when predicting internal fruit quality based on NIR spectroscopy using KPLSR²⁷ and support vector machine regression (SVMR).²⁸,²⁹ These two approaches perform better compared with MLR and PLSR. However, to date, no study has compared the predictive performance of nonparametric regression approaches, such as MLPR, with KPLSR and SVMR in predicting the internal quality of mangoes.

This study aims at comparing the performance of a mango pH and total soluble solid (TSS) prediction model based on MLPR with KPLSR- and SVMR-based models. MLPR was found to be the best regression model; it exhibited a predictive performance with the lowest mean squared error (MSE), root mean squared error (RMSE) and MAPE values, and the highest R² value. The MLPR algorithm in this study is useful to design instruments to detect the acidity and sweetness of intact mango.

Methods

Sample preparation

A total of 186 mangoes (Mangifera indica L., Gadung Klonal 21) were collected from a garden in Wonokerto Village, Sukorejo District, Pasuruan Regency, Indonesia. Mango samples weighing 250–300 g at varying stages of ripeness, ranging from unripe to ripe, were chosen. The mangoes were cleaned and air-dried before being wrapped in Styrofoam fruit netting, and were subsequently placed in boxes (approximately 12 mangoes per box). Fruit boxes were screened to avoid collisions.

NIR spectra data acquisition

The spectral data for intact mangoes were acquired using an NIR spectrometer (OtO Photonics, Inc.) in the range of 900–1650 nm at 7 nm intervals. The samples were scanned in reflectance mode to record the spectral data. The NIR spectra were measured by illuminating the sample with a halogen lamp positioned at an angle of 45° to the sample, with the detector also positioned at 45° to the sample. Each sample was scanned at three separate locations (the shoulder, cheek, and tip) on two sides of the intact mango, and the scans obtained for each sample were averaged. The setup of spectral data acquisition can be seen in Figure 1. The spectral data were originally recorded as reflectance values (R) and were later converted to absorbance values (log 1/R).
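The averaging and reflectance-to-absorbance steps can be sketched as follows. This is a minimal illustration and not the instrument software's procedure: the base-10 logarithm, the choice to average before converting, the function names, and the reflectance readings are all assumptions made for the example.

```python
import math

def average_scans(scans):
    """Average several scans of one sample, wavelength by wavelength."""
    n_scans = len(scans)
    return [sum(values) / n_scans for values in zip(*scans)]

def reflectance_to_absorbance(reflectance):
    """Convert reflectance R (0 < R <= 1) to absorbance log10(1/R).
    Base 10 is assumed here; the text only states log(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]

# Hypothetical reflectance readings at three wavelengths for three scan locations
scans = [
    [0.50, 0.40, 0.25],  # shoulder
    [0.52, 0.38, 0.25],  # cheek
    [0.48, 0.42, 0.25],  # tip
]
mean_reflectance = average_scans(scans)
absorbance = reflectance_to_absorbance(mean_reflectance)
```

Lower reflectance maps to higher absorbance, so the third wavelength in this toy example has the largest absorbance value.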

Figure 1. NIRS Acquisition.

Spectral data pre-treatment

Before developing the prediction models, pre-treatment methods can be applied to eliminate undesired effects, including random noise, high-frequency noise, light scattering, baseline shifts, and any other external effects caused by environmental or instrumental factors. Smoothing, in particular, effectively reduces high-frequency noise. Among the numerous smoothing approaches in the field, Savitzky–Golay (SG) smoothing is one of the most widely used.³⁰ SG smoothing can retain signal properties, including the relative maximum and minimum values and the width of the peaks, which are lost when using other smoothing techniques. Different pre-processing approaches produce varied outcomes in different models. The application of pre-processing techniques in the NIRS modelling process during several harvest times affects NIRS accuracy.³¹ The present work pre-treated the spectra using SG smoothing with second-degree polynomials, Gaussian filter smoothing (GFS), and multiplicative scatter correction (MSC).
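A minimal sketch of SG smoothing with a second-degree polynomial is given below. The window length used in this study is not stated, so the 5-point window, the function name, and the edge handling (edge points left unchanged) are assumptions for illustration. The coefficients (-3, 12, 17, 12, -3)/35 are the standard 5-point quadratic SG smoothing weights.

```python
# Standard 5-point, second-degree Savitzky-Golay smoothing weights
SG_COEFFS_5PT_QUADRATIC = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def savgol_smooth_5pt(spectrum):
    """Smooth a spectrum with a 5-point quadratic Savitzky-Golay filter.
    The two points at each edge are left unchanged (a simplification)."""
    half = 2
    smoothed = list(spectrum)
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS_5PT_QUADRATIC, window))
    return smoothed

# A quadratic signal passes through unchanged, while an isolated spike is damped
quadratic = [float(x * x) for x in range(7)]
quadratic_smoothed = savgol_smooth_5pt(quadratic)
spike_smoothed = savgol_smooth_5pt([0.0, 0.0, 1.0, 0.0, 0.0])
```

Because the filter fits a local quadratic, it reproduces quadratic shapes exactly, which is why peak heights and widths are better preserved than with a plain moving average.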

Measurement of pH and TSS

Mango samples (10 g per sample) were blended with 40 mL of distilled water in a fruit blender. The pH of the mango juice was measured using a digital pH meter (Lutron pH-208). Triplicate measurements were performed to obtain average values. A small amount of mango juice was dropped onto a pocket digital refractometer (ATAGO PAL-1) to record TSS, expressed in degrees Brix (°Bx). The measurements were conducted at room temperature after spectral acquisition.

Statistical analysis

The pH, TSS, and spectral data were organized into a matrix. The matrix rows represent the 186 samples, and the 114 columns represent the response (Y) and predictor (X) variables. The response variables were the measured pH and TSS values of each sample, stored in the first and second columns, respectively. The predictor variables were the spectral values at 112 NIR wavelengths for each mango sample.

The next step was to perform dimension reduction using principal component analysis (PCA) to reduce the predictor variables to two principal components. When modeling NIR data, outliers are usually eliminated using statistics such as Hotelling’s T² and Q residuals.³² This study used the Hotelling’s T² ellipse method to remove outliers. The data, with outliers excluded, were then analyzed using KPLSR, SVMR, and MLPR. The analysis was performed using calibration and validation models of the pH and TSS values. The Unscrambler X 10.4 software was used to perform spectral pre-processing and model development for pH and TSS values. The open-source software R was used to perform the MLPR method. The reference versus predicted pH values of the calibration and validation models were plotted to investigate the nature of the spectral absorbance distribution.
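The outlier screening step can be illustrated with a small sketch that computes Hotelling’s T² from PCA scores and flags samples above a limit. This is not the Unscrambler implementation: the function names, the toy scores, and the limit value are hypothetical. In practice the limit is typically derived from an F-distribution at a chosen confidence level, which is why it is passed in as a parameter here.

```python
def hotelling_t2(scores):
    """Hotelling's T² per sample from PCA scores (rows = samples, columns = components).
    T²_i = sum over components a of (t_ia - mean_a)² / s_a²,
    where s_a² is the sample variance of component a."""
    n = len(scores)
    n_comp = len(scores[0])
    means = [sum(row[a] for row in scores) / n for a in range(n_comp)]
    variances = [
        sum((row[a] - means[a]) ** 2 for row in scores) / (n - 1)
        for a in range(n_comp)
    ]
    return [
        sum((row[a] - means[a]) ** 2 / variances[a] for a in range(n_comp))
        for row in scores
    ]

def flag_outliers(scores, t2_limit):
    """Return the indices of samples whose T² exceeds the supplied limit."""
    return [i for i, t2 in enumerate(hotelling_t2(scores)) if t2 > t2_limit]

# Hypothetical two-component scores; the last sample lies far from the rest
scores = [(1, 0), (-1, 0), (0, 1), (0, -1), (0.5, 0.5), (-0.5, -0.5), (8, 8)]
t2_values = hotelling_t2(scores)
outliers = flag_outliers(scores, t2_limit=4.0)  # the limit is illustrative only
```

With two components, the T² limit corresponds to an ellipse in the score plot, and samples falling outside it are the candidates for removal.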

Modeling using different calibration methods

One of the major issues with NIR spectroscopy for fresh fruit analysis is that the approach requires a pre-calibration procedure before it can be utilized in practice.³³ This study used several regression methods in the calibration process. The dataset was divided into two parts: 80% as calibration data and the rest as validation data. For multivariate calibration, the data were modeled using parametric regression methods, namely KPLSR and SVMR. Additionally, the data were modeled using the nonparametric regression method MLPR.²⁶ Subsequently, predictions were made on the validation dataset based on the model developed for the calibration dataset. The prediction performance of the three methods was compared in this study.

Multi-predictor local polynomial regression (MLPR)

MLPR for predicting the internal quality of fruits was proposed by Ulya et al.¹⁶ The prediction was obtained using a nonparametric regression approach based on a local polynomial estimator with one response variable and multiple predictors. The MLPR model has a response variable y that depends on the sum of some functions of the predictor variable x and can be written as follows.

(1)

$$\sum_{j=1}^{p} f(x_{ij}) = \sum_{j=1}^{p} \left\{ \beta_{0j}(x_{0j}) + \beta_{1j}(x_{0j})(x_{ij} - x_{0j})^{1} + \beta_{2j}(x_{0j})(x_{ij} - x_{0j})^{2} + \dots + \beta_{d_{j}j}(x_{0j})(x_{ij} - x_{0j})^{d_{j}} \right\},$$

where $x_{ij} \in (x_{0j} - h_{j},\, x_{0j} + h_{j})$.

$\underset{\sim}{\hat{\beta}}$ is the parameter estimator, which is obtained from n sample tuples $(x_{i1}, x_{i2}, \dots, x_{ip}, y_{i})$. The parameters were estimated using the weighted least squares (WLS) method by minimizing the following.

(2)

$$\sum_{i=1}^{n} \left\{ y_{i} - \left[ \sum_{j=1}^{p} \left\{ \beta_{0j}(x_{0j}) + \beta_{1j}(x_{0j})(x_{ij} - x_{0j})^{1} + \beta_{2j}(x_{0j})(x_{ij} - x_{0j})^{2} + \dots + \beta_{d_{j}j}(x_{0j})(x_{ij} - x_{0j})^{d_{j}} \right\} \right] \right\}^{2} \prod_{j=1}^{p} K_{h_{j}}(x_{ij} - x_{0j}), \quad i = 1, 2, \dots, n; \; j = 1, 2, \dots, p,$$

where $\prod_{j=1}^{p} K_{h_{j}}(x_{ij} - x_{0j})$ is the product of kernel functions K(.), which was used as the weighting element in the WLS optimization process. This study used a Gaussian kernel, which is defined as follows.

(3)

$$K(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} x^{2}\right), \quad -\infty < x < \infty.$$
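The weighted least squares estimation with a product Gaussian kernel can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the function names and toy data are hypothetical, the scaled kernel is taken as K_h(u) = K(u/h)/h, the same polynomial degree is used for every predictor, and the per-predictor intercepts β_0j are merged into a single intercept because only their sum is identifiable. The fitted value at the target point x0 is that intercept.

```python
import math

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u²/2) / sqrt(2π)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mlpr_predict(x_train, y_train, x0, bandwidths, degree=1):
    """Additive local polynomial estimate at x0 using product-kernel WLS."""
    p = len(x0)
    rows, weights = [], []
    for xi in x_train:
        row = [1.0]   # merged intercept (sum of the beta_0j terms)
        w = 1.0
        for j in range(p):
            diff = xi[j] - x0[j]
            row.extend(diff ** k for k in range(1, degree + 1))
            w *= gaussian_kernel(diff / bandwidths[j]) / bandwidths[j]
        rows.append(row)
        weights.append(w)
    q = len(rows[0])
    # Weighted normal equations: (X'WX) beta = X'Wy
    xtwx = [[sum(w * r[a] * r[b] for r, w in zip(rows, weights)) for b in range(q)]
            for a in range(q)]
    xtwy = [sum(w * r[a] * y for r, w, y in zip(rows, weights, y_train))
            for a in range(q)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0]

# Toy data with two predictors (standing in for two principal components)
x_train = [[a / 4, b / 4] for a in range(5) for b in range(5)]
y_train = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_train]
estimate = mlpr_predict(x_train, y_train, x0=[0.3, 0.7], bandwidths=[0.5, 0.5])
```

On this exactly linear toy response, a local linear fit recovers the true value 1 + 2(0.3) + 3(0.7) = 3.7 at the target point, which serves as a simple sanity check of the estimator.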

In addition, the optimum bandwidth (h) as a smoothing parameter in the estimation process must be determined using this method. If the bandwidth value decreases, the regression estimation becomes rougher and vice versa. The optimum bandwidth is the bandwidth with the minimum generalized cross-validation (GCV) value³⁴ and is calculated using the following formula.

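(4)

$$GCV(h_{j}) = \frac{n^{-1} \sum_{i=1}^{n} \left(y_{i}^{(j)} - \hat{y}_{i}^{(j)}\right)^{2}}{\left(n^{-1}\, \mathrm{tr}\left[I - A(h_{j})\right]\right)^{2}},$$

where $A(h_{j})$ is the smoother (hat) matrix that maps the observed responses to the fitted values.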
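Bandwidth selection by GCV can be sketched for the simplest case of one predictor and a local linear fit. The sketch uses the standard GCV form, in which the mean squared residual is divided by (n⁻¹ tr[I − A(h)])² = (1 − tr[A(h)]/n)². The function names, the candidate grid, and the toy data are assumptions for illustration, and the multi-predictor case used in the study would need the full hat matrix of the MLPR fit.

```python
import math

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u²/2) / sqrt(2π)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def local_linear_hat_row(x, x0, h):
    """Weights l_j(x0) such that the local linear fit at x0 equals sum_j l_j * y_j."""
    w = [gaussian_kernel((xj - x0) / h) for xj in x]
    s0 = sum(w)
    s1 = sum(wj * (xj - x0) for wj, xj in zip(w, x))
    s2 = sum(wj * (xj - x0) ** 2 for wj, xj in zip(w, x))
    denom = s0 * s2 - s1 * s1
    return [wj * (s2 - (xj - x0) * s1) / denom for wj, xj in zip(w, x)]

def gcv(x, y, h):
    """GCV(h) = mean squared residual / (1 - tr(A)/n)² for a local linear smoother."""
    n = len(x)
    rss, trace = 0.0, 0.0
    for i in range(n):
        row = local_linear_hat_row(x, x[i], h)
        fitted = sum(l * yj for l, yj in zip(row, y))
        rss += (y[i] - fitted) ** 2
        trace += row[i]   # diagonal element a_ii of the hat matrix
    return (rss / n) / (1.0 - trace / n) ** 2

def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest GCV value."""
    return min(candidates, key=lambda h: gcv(x, y, h))

# Toy data: a smooth curve plus a small deterministic perturbation
x = [i / 10 for i in range(21)]
y = [math.sin(2 * xi) + 0.1 * (-1) ** i for i, xi in enumerate(x)]
candidates = [0.05, 0.1, 0.2, 0.4, 0.8]
best_h = select_bandwidth(x, y, candidates)
```

A very small bandwidth drives tr[A(h)] toward n, which inflates the denominator penalty, while a very large bandwidth inflates the residual term; GCV balances the two.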

Model validation

Several methods have been used to assess the algorithm's performance in the prediction results. One such method is k-fold cross-validation, wherein the data are randomly divided into k parts; a model is trained or calibrated on k − 1 parts and tested or validated on the remaining part, and the process is repeated so that each part serves once as the validation set.³⁵ This method can reduce sampling bias because the data are randomly divided into several (k) parts.³⁶ The final accuracy of this process is the average accuracy over the k runs.³⁷ In this study, five-fold cross-validation was used (Figure 2). The 165 samples were split into calibration and validation data, with 80% (132 samples) used for calibration and the remaining 20% (33 samples) used for validation in each fold.
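The five-fold split described above can be sketched as follows. The function name and the random seed are hypothetical, and the shuffling here is only one possible way of assigning samples to folds.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly partition sample indices into k folds.
    Returns a list of (calibration_indices, validation_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        stop = start + fold_size + (1 if f < remainder else 0)
        validation = indices[start:stop]
        calibration = indices[:start] + indices[stop:]
        folds.append((calibration, validation))
        start = stop
    return folds

# 165 samples in five folds: 132 calibration and 33 validation samples per fold
folds = k_fold_indices(165, k=5)
fold_sizes = [(len(cal), len(val)) for cal, val in folds]
```

Each sample appears in exactly one validation set, so the averaged validation metrics use every sample once.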

Figure 2. Five-fold cross-validation.

Generally, in studies on predicting the internal quality of fruits using NIR spectroscopy, the predictive performance and accuracy of the models are evaluated on the validation dataset. Previous studies have used R² and RMSE to evaluate predictive models. In addition to these two criteria, this study evaluated the predictive models using MSE and MAPE values. MAPE is the most frequently used measure of forecasting accuracy.³⁸ MAPE has some significant and desirable characteristics, including reliability, unit-free measurement, interpretability, clarity of presentation, statistical evaluation support, and utilization of all error information.³⁹

The aforementioned four criteria are suitable for comparing the predictive performances of parametric and nonparametric regressions. Generally, a good model must have a high R² value and low RMSE, MSE, and MAPE values. The formulas for RMSE, R²,²⁹ MSE,⁴⁰ and MAPE,³⁹ together with their overall (calibration and validation combined) counterparts, are given in Equations (5)–(12).

(5)

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}{n}},$$

(6)

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}},$$

(7)

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2},$$

(8)

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{|\hat{y}_{i} - y_{i}|}{y_{i}}\right) \times 100\%,$$

(9)

$$\text{Overall } RMSE = \frac{(RMSE_{C} \times n_{C}) + (RMSE_{V} \times n_{V})}{n_{C} + n_{V}},$$

(10)

$$\text{Overall } R^{2} = \frac{(R_{C}^{2} \times n_{C}) + (R_{V}^{2} \times n_{V})}{n_{C} + n_{V}},$$

(11)

$$\text{Overall } MSE = \frac{(MSE_{C} \times n_{C}) + (MSE_{V} \times n_{V})}{n_{C} + n_{V}},$$

(12)

$$\text{Overall } MAPE = \frac{(MAPE_{C} \times n_{C}) + (MAPE_{V} \times n_{V})}{n_{C} + n_{V}},$$

where $\hat{y}_{i}$ is the estimated value of the i-th response variable; $y_{i}$ is the measured value of the response variable; $\bar{y}$ is the average measured value of the response variable; n is the number of observations; and the subscripts C and V denote the calibration data and the validation data, respectively.
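The four criteria and their sample-size-weighted overall versions can be computed with a few lines of Python. The function names and the toy values are hypothetical; R² is computed with the usual total sum of squares about the mean of the measured values.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST, with SST about the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - sse / sst

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes nonzero y_true)."""
    return 100.0 * sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)

def overall(cal_value, val_value, n_cal, n_val):
    """Sample-size-weighted average of a calibration and a validation metric."""
    return (cal_value * n_cal + val_value * n_val) / (n_cal + n_val)

# Hypothetical measured and predicted pH values
y_true = [4.0, 5.0, 6.0]
y_pred = [4.2, 4.9, 5.7]
metrics = {
    "RMSE": rmse(y_true, y_pred),
    "R2": r_squared(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
}
```

For these toy values the squared errors sum to 0.14, giving MSE ≈ 0.047, R² = 0.93, and MAPE = 4%.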

The stages of developing the prediction models and measuring the predictive performance in this study are summarized in Figure 3.

Figure 3. Stages of developing prediction models.

Results

Pre-treatment

The raw absorbance spectra of the 186 mango samples at three different maturity stages, acquired over the wavelength range 900–1650 nm, are shown in Figure 4(a). The spectral data were pre-treated with SG smoothing, a Gaussian filter, and MSC to reduce high-frequency noise, as shown in Figures 4(b)–(d). The MSC technique was used to correct the data by approximating the additive and multiplicative effects of the spectra.⁴¹
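The additive and multiplicative correction performed by MSC can be sketched as follows. This is a generic textbook form and not the Unscrambler routine: using the mean spectrum as the reference, the function name, and the toy spectra are assumptions for illustration. Each spectrum is regressed on the reference, and the fitted intercept (additive effect) and slope (multiplicative effect) are then removed.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.
    Each spectrum s is modeled as s = intercept + slope * reference,
    and corrected as (s - intercept) / slope."""
    n = len(spectra)
    reference = [sum(col) / n for col in zip(*spectra)]
    ref_mean = sum(reference) / len(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spectrum in spectra:
        spec_mean = sum(spectrum) / len(spectrum)
        slope = sum((r - ref_mean) * (s - spec_mean)
                    for r, s in zip(reference, spectrum)) / ref_ss
        intercept = spec_mean - slope * ref_mean
        corrected.append([(s - intercept) / slope for s in spectrum])
    return corrected

# Hypothetical spectra: one base shape with different offsets and scalings
base = [0.2, 0.5, 0.9, 0.4]
spectra = [
    [b for b in base],
    [2.0 * b + 0.10 for b in base],
    [0.5 * b - 0.05 for b in base],
]
corrected = msc(spectra)
```

Because the three toy spectra differ only by an offset and a scale factor, they collapse onto the same curve after correction.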

Figure 4. Untreated and pre-treated spectra data.

(a) Raw spectra, (b) SG smoothing spectra, (c) Gaussian spectra, and (d) MSC spectra.

After pre-processing the spectral data, the dimension of the absorbance spectra data was reduced by PCA using the singular value decomposition algorithm. Two latent variables representing 99.75% of the variance were selected. Spectral outliers were identified using PCA, subject to Hotelling’s T² ellipse. In this study, 21 outliers were identified, most of which were ripe mango samples. The large number of outliers arose because unripe mangoes made up most of the samples, so some ripe mangoes, whose characteristics differed markedly, were flagged as outliers. These outliers were excluded because they could have a negative impact on the model. Removing outliers from a dataset can increase the accuracy of statistical analysis and avoid misleading findings. Outliers are a significant issue in regression analysis and substantially threaten ordinary least squares analysis results, including effect sizes and coefficients.⁴²,⁴³ Sample outliers may provide helpful information, but they can also be non-representative samples that contribute to errors in a model.⁴⁴ The final sample consisted of 165 observations, divided into calibration and validation datasets with five-fold cross-validation (see Figure 2). Each fold consisted of 132 calibration samples and 33 validation samples.

Descriptive statistical values of the pH and TSS of the mangoes

The descriptive statistical values for the measured pH and TSS are presented in Table 1. The robustness of the calibration models was evaluated using five-fold cross-validation. Three different calibration models (KPLSR, SVMR, and MLPR) were developed using the calibration dataset for each pre-treatment method (raw, SG smoothing, Gaussian filter, and MSC) to predict the pH and TSS values of the mangoes.

Table 1. Descriptive statistics of the pH and TSS of mangoes.

Descriptive statistic	pH, Raw	pH, Unripe	pH, Ripe	TSS (°Bx), Raw	TSS (°Bx), Unripe	TSS (°Bx), Ripe
n	18	119	28	18	119	28
Mean	3.64	4.18	5.02	4.11	5.36	7.31
Standard deviation	0.28	0.26	0.38	0.28	0.51	0.58
Min	3.03	3.78	4.29	3.50	4.40	6.60
Max	3.96	4.27	5.66	4.40	6.50	8.40
Range	0.93	0.49	1.37	0.90	2.10	1.80

Predictive performance comparison of pH value

Table 2 presents the calibration and validation results for the pH prediction using NIR. The three regression methods provided more robust models with GFS-treated spectra than with untreated, SG smoothing, and full MSC spectra. Both calibration and validation in the MSC spectral model yielded higher MAPE values than those of the other two spectral treatments (SG and Gaussian smoothing). The raw spectra model was better than the MSC model but not better than the Gaussian and SG models. In other words, the raw spectra model had a worse predictive performance than the smoothed (SG and Gaussian) spectra models.

Table 2. Predictive performance comparison of pH values.

Spectra treatment	Method	Calibration RMSE	Calibration R²	Calibration MSE	Calibration MAPE (%)	Validation RMSE	Validation R²	Validation MSE	Validation MAPE (%)	Overall RMSE	Overall R²	Overall MSE	Overall MAPE (%)
Un-treated (Raw)	KPLSR	0.261	0.690	0.068	4.931	0.264	0.668	0.070	4.993	0.261	0.686	0.069	4.943
	SVMR	0.244	0.729	0.060	4.569	0.261	0.672	0.069	4.916	0.247	0.718	0.062	4.638
	MLPR	0.174	0.862	0.030	3.085	0.258	0.663	0.067	4.798	0.191	0.822	0.038	3.428
SG smoothing	KPLSR	0.261	0.690	0.068	4.931	0.264	0.668	0.070	4.993	0.261	0.686	0.069	4.943
	SVMR	0.244	0.729	0.060	4.569	0.261	0.672	0.069	4.916	0.247	0.718	0.062	4.638
	MLPR	0.174	0.862	0.030	3.092	0.244	0.700	0.060	4.629	0.188	0.829	0.036	3.400
GFS	KPLSR	0.261	0.690	0.068	4.931	0.264	0.668	0.070	4.993	0.261	0.686	0.069	4.943
	SVMR	0.244	0.730	0.060	4.569	0.260	0.673	0.069	4.916	0.247	0.718	0.061	4.638
	MLPR	0.174	0.861	0.030	3.084	0.241	0.716	0.059	4.608	0.187	0.832	0.036	3.389
MSC	KPLSR	0.298	0.597	0.089	5.763	0.302	0.583	0.092	5.936	0.298	0.594	0.089	5.797
	SVMR	0.289	0.622	0.084	5.432	0.316	0.557	0.101	6.248	0.295	0.609	0.087	5.595
	MLPR	0.274	0.659	0.075	5.238	0.294	0.551	0.089	5.887	0.278	0.638	0.078	5.368

Among the three regression methods, MLPR was the best for predicting the pH value of mangoes across all spectral treatments. In the overall results, the MLPR prediction model had the highest R² value and the lowest MSE, RMSE, and MAPE values compared with the KPLSR and SVMR methods.

Predictive performance comparison of TSS value

Predictive performance comparisons of the TSS values are listed in Table 3. Predictive models for TSS values have lower performance than predictive models for pH values. The performance of the pH prediction model was better than that of the TSS prediction model because it had a high R² value with a low MAPE value (3.4–5.8%). The predicted pH value was closer to the observed pH value. Although the R² value was high in the TSS prediction model, the MAPE value was also relatively high (6.4–8.1%). However, the MAPE value in the TSS model is still classified as highly accurate.³⁹

Table 3. Predictive performance comparison of TSS values.

Spectra treatment	Method	Calibration RMSE	Calibration R²	Calibration MSE	Calibration MAPE (%)	Validation RMSE	Validation R²	Validation MSE	Validation MAPE (%)	Overall RMSE	Overall R²	Overall MSE	Overall MAPE (%)
Un-treated (Raw)	KPLSR	0.506	0.750	0.256	7.519	0.512	0.745	0.266	7.547	0.507	0.749	0.258	7.524
	SVMR	0.473	0.781	0.224	7.108	0.518	0.732	0.270	7.825	0.482	0.771	0.233	7.251
	MLPR	0.436	0.814	0.190	4.654	0.490	0.725	0.243	7.375	0.447	0.796	0.201	6.638
SG smoothing	KPLSR	0.506	0.750	0.256	7.519	0.512	0.745	0.266	7.547	0.507	0.749	0.258	7.524
	SVMR	0.473	0.781	0.224	7.109	0.518	0.732	0.270	7.826	0.482	0.771	0.233	7.252
	MLPR	0.422	0.826	0.178	6.220	0.494	0.722	0.248	7.388	0.436	0.805	0.192	6.454
GFS	KPLSR	0.505	0.750	0.256	7.517	0.512	0.742	0.266	7.543	0.507	0.748	0.258	7.522
	SVMR	0.473	0.781	0.224	7.108	0.518	0.732	0.270	7.820	0.482	0.771	0.233	7.250
	MLPR	0.426	0.822	0.182	6.294	0.490	0.725	0.244	7.341	0.439	0.803	0.194	6.503
MSC	KPLSR	0.530	0.725	0.281	8.103	0.534	0.710	0.292	8.267	0.531	0.722	0.283	8.136
	SVMR	0.505	0.753	0.255	7.648	0.552	0.702	0.308	8.750	0.514	0.743	0.266	7.869
	MLPR	0.467	0.785	0.219	7.086	0.516	0.694	0.271	7.949	0.477	0.767	0.229	7.259

Pre-processing spectra using SG smoothing for the TSS value parameter gave the best predictive model results, with the lowest RMSE, MSE, and MAPE values and the highest R² value.

Discussion

pH value

MLPR, SVMR, and KPLSR exhibited excellent predictions of the pH of mangoes. MLPR using GFS spectra provided the best overall model for pH prediction, with R² = 0.832, RMSE = 0.187, MSE = 0.036, and MAPE = 3.389% (Table 2). For all spectral treatments, the best predictive model used MLPR, with the highest R² value and the lowest RMSE, MSE, and MAPE values compared with the other regression methods. This is consistent with the results of the previous studies by Ulya et al.¹⁶,²⁶
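As a worked check of how the overall figures follow from Equations (9), (10), and (12), the overall values for MLPR with GFS spectra can be recomputed from the calibration and validation columns of Table 2, taking n_C = 132 and n_V = 33 from the five-fold split:

```latex
\begin{aligned}
\text{Overall } R^{2} &= \frac{(0.861 \times 132) + (0.716 \times 33)}{132 + 33} = \frac{137.280}{165} \approx 0.832, \\
\text{Overall } RMSE &= \frac{(0.174 \times 132) + (0.241 \times 33)}{132 + 33} = \frac{30.921}{165} \approx 0.187, \\
\text{Overall } MAPE &= \frac{(3.084 \times 132) + (4.608 \times 33)}{132 + 33} = \frac{559.152}{165} \approx 3.389\%.
\end{aligned}
```

The results agree with the overall column reported for this model, so the overall metrics are weighted averages dominated by the larger calibration set.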

From best to worst, the regression methods rank in prediction performance as MLPR, SVMR, and KPLSR. MLPR is a novel method for predicting the internal quality of fruits developed by Ulya et al.¹⁶,²⁶ This study confirms that the MLPR method can produce a robust predictive model for determining the internal quality of mangoes; MLPR (nonparametric regression) has predictive performance with a lower MAPE value than that of the MPR (parametric regression). Table 2 shows that SVMR outperforms KPLSR, with higher R² values and lower MSE, RMSE, and MAPE values. This is similar to Refs. 45, 46, which show that predictions using the SVM method are better than those using PLSR because SVM regression can deal with nonlinearity in spectral data.⁴⁷ PLSR is a linear regression method, which fails to express the nonlinear relationships found in spectral data.

Among the three regression methods, MLPR exhibited the lowest MAPE values in all treatment spectra. With the Gaussian filter spectra calibration and validation data, MLPR provided the highest R² and lowest RMSE, MSE, and MAPE compared with the other methods, as shown in Figure 5. Even with the overall data, the R², RMSE, MSE, and MAPE values were better for MLPR than those of the other methods, which indicates that the MLPR method provides an accurate prediction for all spectral data, with MAPE < 10%³⁹ and low RMSE values. The predictive performance of the MLPR model also had a high R² (0.82–0.9), thus indicating good predictive ability.⁴⁸

Figure 5. MLPR predictive performance on Gaussian filter smoothing (GFS) spectra data for predicting the pH of mangoes (Fold 4).

The KPLSR method performed the worst in predicting the pH of mangoes. Only a few studies have predicted fruit characteristics using KPLSR. Most studies use PLSR because of its simplicity and small calculation volume. Partial least squares (PLS) is a linear method of data analysis.⁴⁹ Based on untreated spectra, Nicolaï et al.²⁷ used KPLSR to predict apple sugar content.

The prediction of the sugar content of Gannan Navel oranges based on several treated spectra was reported by Liu,⁴⁴ where KPLSR, particularly the spline PLS model, was superior to others with an R² of 0.87, RMSE validation of 0.47 °Brix, and standard deviation ratio of 2.34. Kernel PLS is suitable for dealing with nonlinear phenomena; this may be owing to the changes in the chemical interactions of the fruit matrix because unripe and ripe fruits have different structures and varieties.⁵⁰

TSS value

The best regression method for predicting the TSS value of mangoes was MLPR, based on SG smoothing. The calibration, validation, and overall models of the MLPR method based on all spectral treatments have higher R² values and lower RMSE, MSE, and MAPE values than the other methods. The MAPE value is higher than that of the pH prediction model, but the MAPE value is still less than 10%, thus categorizing the method as highly accurate in forecasting.³⁹ Figure 6 shows the MLPR predictive performance based on SG smoothing. Previous research has shown that a small set of reference attributes and spectral behavior changes influenced by cultivar, fruit size, and fruit origin significantly impact model robustness.⁶,⁵¹,⁵² Moreover, the prediction performance was affected by the lack of variability in the calibration model. When validated by samples outside the prediction model range, the prediction model performance in a study investigating the total acid content of Japanese plums decreased.⁵¹ Subedi, Walsh, and Owens⁵³ reported that a TSS prediction model developed from fruits at late stages of ripening failed to predict the TSS of fruits at earlier stages of ripening.

Figure 6. MLPR predictive performance on SG Smoothing spectra data for predicting the TSS of mangoes (Fold 1).

In predicting mango TSS, the regression methods rank from best to worst as MLPR, SVMR, and KPLSR. Partial least squares regression (PLSR) has the benefit of handling irrelevant and noisy variables. However, when the number of samples is substantially smaller than the number of variables, as with spectral data, the prediction ability of PLSR is reduced.⁵⁴ In many established NIR spectroscopic applications, the relationship between the near-infrared spectra and the target components is not strictly linear, and the causes of nonlinearity vary greatly and are hard to identify. Conventional linear regression approaches such as PLSR are therefore not always the best solution.⁵⁵ Other researchers have likewise concluded that SVMR outperforms PLSR in prediction performance.⁴⁵,⁴⁶,⁵⁵,⁵⁶ In this study, the best prediction performance was provided by MLPR, a nonparametric regression method. A nonparametric regression model based on a local polynomial estimator has been reported to outperform mathematical computation techniques, including the support vector machine approach.⁵⁷ This is comparable to other research⁵⁸,⁵⁹ showing that nonparametric regression approaches based on local polynomial estimators outperform parametric regression in prediction models.
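To make the idea of a local polynomial estimator concrete, the following is a minimal sketch of a degree-1 (local linear) estimator with a Gaussian kernel for a single predictor. It is an illustration only and not the MLPR implementation used in this study, which involves several predictors; the function name `local_linear_predict`, the Gaussian kernel, and the bandwidth value are assumptions made for the example.

```python
import math


def local_linear_predict(x_train, y_train, x0, bandwidth):
    """Local linear (degree-1 local polynomial) estimate of y at x0.

    Illustrative single-predictor sketch; kernel and bandwidth are assumptions.
    """
    # Gaussian kernel weights: observations near x0 count more.
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in x_train]

    # Weighted least squares for y ~ b0 + b1 * (x - x0).
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, x_train))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, x_train))
    t0 = sum(wi * y for wi, y in zip(w, y_train))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, x_train, y_train))

    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        # Degenerate case: fall back to a local constant (weighted mean).
        return t0 / s0
    # The fitted intercept b0 is the estimate at x0.
    return (s2 * t0 - s1 * t1) / det


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0 * x + 1.0 for x in xs]  # synthetic, exactly linear data
    print(local_linear_predict(xs, ys, 2.5, bandwidth=1.0))
```

Because the fit is repeated around every query point, the estimator can follow a nonlinear relationship without assuming a global functional form; the bandwidth controls how local each fit is and would normally be chosen by a criterion such as cross-validation.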

The best spectral pre-treatment for predicting the TSS value was SG smoothing. Overall, MLPR with SG-smoothed spectra was the best model, with an R² of 0.805, RMSE of 0.436, MSE of 0.192, and MAPE of 6.454%. This agrees with the finding³⁹ that the SG smoothing spectral model for predicting the tannin content of persimmon fruit is better than MSC: the R² values for SG smoothing and MSC were 0.107 and 0.016, respectively.²⁹ By comparison, a previous study²⁸ that predicted mango TSS values using SVMR based on extended MSC reported a validation R² of 0.86 and an RMSE of 0.66.
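For readers who want to reproduce the four performance measures quoted above, the sketch below computes R², RMSE, MSE, and MAPE from measured and predicted values. The function name `regression_metrics` and the sample numbers are hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption; the study may have used a different definition (for example, the squared correlation coefficient).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MSE and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    rmse = math.sqrt(mse)

    # MAPE requires non-zero measured values (true for pH and TSS).
    mape = 100.0 * sum(abs(e / yt) for e, yt in zip(errors, y_true)) / n

    mean_y = sum(y_true) / n
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"R2": r2, "RMSE": rmse, "MSE": mse, "MAPE": mape}


if __name__ == "__main__":
    measured = [10.0, 12.0, 14.0, 16.0]   # hypothetical TSS values
    predicted = [11.0, 12.0, 13.0, 16.0]  # hypothetical predictions
    print(regression_metrics(measured, predicted))
```

For a single data set, RMSE is the square root of MSE. The values reported above (0.436² ≈ 0.190 against an MSE of 0.192) are close but not identical, which may reflect rounding or averaging over folds; that explanation is an assumption, not something stated in the text.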

In general, for all spectral treatments and regression methods, the calibration model has a higher R² value than the validation model, with a small gap between them; this indicates that the k-fold cross-validation method can balance the prediction results of the calibration and validation datasets. The k-fold cross-validation method can also reduce sampling bias.³⁶ If the test matrix method were used, the R² of the validation model would be lower than that of the calibration model. Louw and Theron⁵¹ reported that the performance of the prediction model for the total acid content of Japanese plums decreased when it was validated with samples outside the prediction model range.
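The comparison of calibration and validation R² under k-fold cross-validation can be sketched as follows. This is a simplified illustration, not the study's pipeline: a one-predictor least-squares line stands in for the actual regression models, and the helper names (`k_fold_indices`, `fit_line`, `r_squared`, `cross_validate`), the shuffling seed, and the synthetic data are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the real model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def cross_validate(x, y, k=5, seed=0):
    """Return (calibration R2, validation R2) for each of the k folds."""
    results = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_line([x[i] for i in train],
                                    [y[i] for i in train])
        r2_cal = r_squared([y[i] for i in train],
                           [slope * x[i] + intercept for i in train])
        r2_val = r_squared([y[i] for i in held_out],
                           [slope * x[i] + intercept for i in held_out])
        results.append((r2_cal, r2_val))
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10.0 for i in range(165)]            # 165 synthetic samples
    ys = [0.5 * v + rng.gauss(0.0, 0.5) for v in xs]
    for fold, (cal, val) in enumerate(cross_validate(xs, ys, k=5), start=1):
        print(f"Fold {fold}: calibration R2 = {cal:.3f}, validation R2 = {val:.3f}")
```

Each sample is held out exactly once, so every observation contributes to both calibration and validation across the folds, which is why the two R² values tend to stay close.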

Conclusions

Prediction of the internal quality of mangoes, including pH and TSS, can be performed rapidly and non-destructively using NIR spectroscopy. Spectral pre-treatment, such as SG smoothing, GFS, and MSC, affects the predictive ability of the KPLSR, SVMR, and MLPR models. The best regression model for pH prediction is MLPR based on GFS spectra. In addition, KPLSR, SVMR, and MLPR based on raw spectra, SG smoothing, and MSC also provided highly accurate prediction performance, with MAPE values of less than 10%, low MSE and RMSE, and high R².

The best regression model for TSS prediction was MLPR based on SG smoothing. In addition, KPLSR, SVMR, and MLPR based on raw spectra, GFS, and MSC also provided highly accurate prediction performance, with MAPE values of less than 10%, low MSE and RMSE, and high R². We believe that NIR spectroscopy can be used to determine the internal quality of mangoes. However, further research is required to improve the performance of the TSS prediction model using MLPR based on a combination of several spectral pre-treatments. In conclusion, NIR spectroscopy combined with the nonparametric regression method MLPR could become a rapid and non-destructive alternative for predicting the internal quality of mangoes. The robustness of the MLPR-based model could be investigated in further studies using test data sets composed of samples from various seasons. Likewise, the quality of sample collection needs to be improved in further studies to reduce the number of outliers.

Data availability

Underlying data

Open Science Framework: dataset avomango, https://doi.org/10.17605/OSF.IO/YMS7F.⁶⁰

This project contains the following underlying data:

- datasets of 186 mangos Gadung Klonal 21.xlsx (the 21 outliers are data 6, 18, 33, 36, 39, 42, 46, 52, 112, 137, 151, 152, 153, 155, 157, 159, 160, 162, 167, 168, and 176).

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Acknowledgments

The authors would like to thank the Directorate General of Indonesian Higher Education, Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia, for funding this research through the BPPDN Scholarship 2019.

References

1. Karsinah, Rebin: Tasliah Varietas Unggul Mangga Gadung 21: Daging Buah Tebal, Berserat Rendah, Rasa Manis. Iptek Hortik. 2017; 13: 39–44.
2. Mir SA, Shah MA, Mir MM: Postharvest Biology and Technology of Temperate Fruits. 2018. 9783319768434.
3. Sohaib ASS, Zeb A, Qureshi WS, et al.: Towards Fruit Maturity Estimation Using NIR Spectroscopy. Infrared Phys. Technol. 2020; 111: 103479.
4. Jie D, Xie L, Rao X, et al.: Using Visible and near Infrared Diffuse Transmittance Technique to Predict Soluble Solids Content of Watermelon in an On-Line Detection System. Postharvest Biol. Technol. 2014; 90: 1–6.
5. Sari HP, Purwanto YA, Budiastra IW: Prediction of Chemical Contents in ‘Gedong Gincu’ Mango Using near Infrared Spectroscopy. J. Agritech. 2016; 36: 294.
6. Nicolaï BM, Beullens K, Bobelyn E, et al.: Nondestructive Measurement of Fruit and Vegetable Quality by Means of NIR Spectroscopy: A Review. Postharvest Biol. Technol. 2007; 46: 99–118.
7. Jha SN, Chopra S, Kingsly ARP: Modeling of Color Values for Nondestructive Evaluation of Maturity of Mango. J. Food Eng. 2007; 78: 22–26.
8. Jha SN, Jaiswal P, Narsaiah K, et al.: Non-Destructive Prediction of Sweetness of Intact Mango Using near Infrared Spectroscopy. Sci. Hortic. (Amsterdam). 2012; 138: 171–175.
9. Jha SN, Narsaiah K, Jaiswal P, et al.: Nondestructive Prediction of Maturity of Mango Using near Infrared Spectroscopy. J. Food Eng. 2014; 124: 152–157.
10. Watanawan C, Wasusri T, Srilaong V, et al.: Near Infrared Spectroscopic Evaluation of Fruit Maturity and Quality of Export Thai Mango (Mangifera Indica L. Var. Namdokmai). Int. Food Res. J. 2014; 21: 1073–1078.
11. Schulze K, Nagle M, Spreer W, et al.: Development and Assessment of Different Modeling Approaches for Size-Mass Estimation of Mango Fruits (Mangifera Indica L., Cv. ‘Nam Dokmai’). Comput. Electron. Agric. 2015; 114: 269–276.
12. Rungpichayapichet P, Mahayothee B, Nagle M, et al.: Robust NIRS Models for Non-Destructive Prediction of Postharvest Fruit Ripeness and Quality in Mango. Postharvest Biol. Technol. 2016; 111: 31–40.
13. Agussabti, Rahmaddiansyah, Satriyo P, et al.: Data Analysis on near Infrared Spectroscopy as a Part of Technology Adoption for Cocoa Farmer in Aceh Province, Indonesia. Data Br. 2020; 29: 105251.
14. Valipour M, Banihabib ME, Behbahani SMR: Monthly Inflow Forecasting Using Autoregressive Artificial Neural Network. J. Appl. Sci. 2012; 12: 2139–2147.
15. Anderson NT, Walsh KB, Flynn JR, et al.: Achieving Robustness across Season, Location and Cultivar for a NIRS Model for Intact Mango Fruit Dry Matter Content. II. Local PLS and Nonlinear Models. Postharvest Biol. Technol. 2021; 171: 111358.
16. Ulya M, Chamidah N, Saifudin T: Predicting the Sweetness Level of Avomango (Gadung Klonal 21) Using Multi-Predictor Local Polynomial Regression. IOP Conf. Ser. Earth Environ. Sci. 2021; 733: 012009.
17. Chamidah N, Lestari B: Estimation of Covariance Matrix Using Multi-Response Local Polynomial Estimator for Designing Children Growth Charts: A Theoretically Discussion. J. Phys. Conf. Ser. 2019; 1397: 012072.
18. Derkacheva A, Mouginot J, Millan R, et al.: Data Reduction Using Statistical and Regression Approaches for Ice Velocity Derived by Landsat-8, Sentinel-1 and Sentinel-2. Remote Sens. 2020; 12: 1–21.
19. Islamiyati A, Chamidah N: Ability of Covariance Matrix in Bi-Response Multi-Predictor Penalized Spline Model Through Longitudinal Data Simulation. 2019; 3: 8–11.
20. Adiwati T, Chamidah N: Modelling of Hypertension Risk Factors Using Penalized Spline to Prevent Hypertension in Indonesia. IOP Conf. Ser. Mater. Sci. Eng. 2019; 546: 052003.
21. Ramadan W, Chamidah N, Zaman B, et al.: Standard Growth Chart of Weight for Height to Determine Wasting Nutritional Status in East Java Based on Semiparametric Least Square Spline Estimator. IOP Conf. Ser. Mater. Sci. Eng. 2019; 546: 052063.
22. Lestari B, Fatmawati, Budiantara IN, et al.: Estimation of Regression Function in Multi-Response Nonparametric Regression Model Using Smoothing Spline and Kernel Estimators. J. Phys. Conf. Ser. 2018; 1097: 012091.
23. Hidayati L, Chamidah N, Nyoman Budiantara I: Spline Truncated Estimator in Multiresponse Semiparametric Regression Model for Computer Based National Exam in West Nusa Tenggara. IOP Conf. Ser. Mater. Sci. Eng. 2019; 546: 052029.
24. George J, Janaki L, Parameswaran Gomathy J: Statistical Downscaling Using Local Polynomial Regression for Rainfall Predictions – A Case Study. Water Resour. Manag. 2015; 30: 183–193.
25. Block P, Goddard L: Statistical and Dynamical Climate Predictions to Guide Water Resources in Ethiopia. J. Water Resour. Plan. Manag. 2012; 138: 287–298.
26. Ulya M, Chamidah N: Multi-Predictor Local Polynomial Regression for Predicting the Acidity Level of Avomango (Gadung Klonal 21). AIP Conf. Proc. 2021; 2329.
27. Nicolaï BM, Theron KI, Lammertyn J: Kernel PLS Regression on Wavelet Transformed NIR Spectra for Prediction of Sugar Content of Apple. Chemom. Intell. Lab. Syst. 2007; 85: 243–252.
28. Al-Sanabani DGA, Solihin MI, Pui LP, et al.: Development of Non-Destructive Mango Assessment Using Handheld Spectroscopy and Machine Learning Regression. J. Phys. Conf. Ser. 2019; 1367: 012030.
29. Cortés V, Rodríguez A, Blasco J, et al.: Prediction of the Level of Astringency in Persimmon Using Visible and Near-Infrared Spectroscopy. J. Food Eng. 2017; 204: 27–37.
30. Luo J, Ying K, Bai J: Savitzky-Golay Smoothing and Differentiation Filter for Even Number Data. Signal Process. 2005; 85: 1429–1434.
31. Tan YP, Chia KS: Effects of Pre-Processing and Principal Components for Artificial Neural Network in Non-Destructive Internal Quality Prediction of Mango across Different Harvest Periods. IEEE 13th Int. Conf. Control Syst. Comput. Eng. 2023; 144–148.
32. Mishra P, Woltering E: Semi-Supervised Robust Models for Predicting Dry Matter in Mango Fruit with near-Infrared Spectroscopy. Postharvest Biol. Technol. 2023; 200: 112335.
33. Saeys W, Nguyen Do Trong N, et al.: Multivariate Calibration of Spectroscopic Sensors for Postharvest Quality Evaluation: A Review. Postharvest Biol. Technol. 2019; 158: 110981.
34. Hastie T, Tibshirani R: Generalized Additive Models. Chapman & Hall; 1990. 9781351445962.
35. Ariyanto RA, Chamidah N: Sentiment Analysis for Zoning System Admission Policy Using Support Vector Machine and Naive Bayes Methods. J. Phys. Conf. Ser. 2021; 1776: 012058.
36. Ardhani BA, Chamidah N, Saifudin T: Sentiment Analysis Towards Kartu Prakerja Using Text Mining with Support Vector Machine and Radial Basis Function Kernel. J. Inf. Syst. Eng. Bus. Intell. 2021; 7: 119.
37. Asrol M, Papilo P, Gunawan FE: Support Vector Machine with K-Fold Validation to Improve the Industry’s Sustainability Performance Classification. Procedia Comput. Sci. 2021; 179: 854–862.
38. Ren L, Glasure Y: Applicability of the Revised Mean Absolute Percentage Errors (MAPE) Approach to Some Popular Normal and Non-Normal Independent Time Series. Int. Adv. Econ. Res. 2009; 15: 409–420.
39. Moreno JJM, Palmer Pol A, Sesé Abad A, et al.: El Índice R-MAPE Como Medida Resistente Del Ajuste En La Previsión. Psicothema. 2013; 25: 500–506.
40. Akhlaghi YG, Ma X, Zhao X, et al.: A Statistical Model for Dew Point Air Cooler Based on the Multiple Polynomial Regression Approach. Energy. 2019; 181: 868–881.
41. Magwaza LS, Opara UL, Nieuwoudt H, et al.: NIR Spectroscopy Applications for Internal and External Quality Analysis of Citrus Fruit-A Review. Food Bioprocess Technol. 2012; 5: 425–444.
42. Mami AM, Jaber AM, Almabrouk OS: Applying Bootstrap Robust Regression Method on Data with Outliers. Int. J. Sci.: Basic Appl. Res. 2020; 49: 143–160.
43. Kutner MH, Nachtsheim C, Neter J: Applied Linear Regression Models. 4th ed. McGraw-Hill/Irwin; 2004.
44. Xie L, Ye X, Liu D, et al.: Prediction of Titratable Acidity, Malic Acid, and Citric Acid in Bayberry Fruit by near-Infrared Spectroscopy. Food Res. Int. 2011; 44: 2198–2204.
45. Kamboj U, Guha P, Mishra S: Comparison of PLSR, MLR, SVM Regression Methods for Determination of Crude Protein and Carbohydrate Content in Stored Wheat Using near Infrared Spectroscopy. Mater. Today Proc. 2022; 48: 576–582.
46. Vasconcelos L, Dias G, Leite A, et al.: SVM Regression to Assess Meat Characteristics of Bísaro Pig Loins Using NIRS Methodology. Foods. 2023; 12: 1–15.
47. Chanda S, Hazarika AK, Choudhury N, et al.: Support Vector Machine Regression on Selected Wavelength Regions for Quantitative Analysis of Caffeine in Tea Leaves by near Infrared Spectroscopy. J. Chemom. 2019; 33: 1–15.
48. Williams P, Antoniszyn J, Manley M: Near Infrared Technology: Getting the Best out of Light. USA: Sun Press; 2019. 9781928480303.
49. Liu Y, Sun X, Zhou J, et al.: Linear and Nonlinear Multivariate Regressions for Determination Sugar Content of Intact Gannan Navel Orange by Vis-NIR Diffuse Reflectance Spectroscopy. Math. Comput. Model. 2010; 51: 1438–1443.
50. Chauchard F, Cogdill R, Roussel S, et al.: Application of LS-SVM to Non-Linear Phenomena in NIR Spectroscopy: Development of a Robust and Portable Sensor for Acidity Prediction in Grapes. Chemom. Intell. Lab. Syst. 2004; 71: 141–150.
51. Louw ED, Theron KI: Robust Prediction Models for Quality Parameters in Japanese Plums (Prunus Salicina L.) Using NIR Spectroscopy. Postharvest Biol. Technol. 2010; 58: 176–184.
52. Peirs A, Tirry J, Verlinden B, et al.: Effect of Biological Variability on the Robustness of NIR Models for Soluble Solids Content of Apples. Postharvest Biol. Technol. 2003; 28: 269–280.
53. Subedi PP, Walsh KB, Owens G: Prediction of Mango Eating Quality at Harvest Using Short-Wave near Infrared Spectrometry. Postharvest Biol. Technol. 2007; 43: 326–334.
54. de Santana FB, Otani SK, de Souza AM, et al.: Comparison of PLS and SVM Models for Soil Organic Matter and Particle Size Using Vis-NIR Spectral Libraries. Geoderma Reg. 2021; 27: e00436.
55. Munawar AA, Zulfahrizal, Meilina H, et al.: Near Infrared Spectroscopy as a Fast and Non-Destructive Technique for Total Acidity Prediction of Intact Mango: Comparison among Regression Approaches. Comput. Electron. Agric. 2022; 193: 106657.
56. Cardoso VGK, Poppi RJ: Non-Invasive Identification of Commercial Green Tea Blends Using NIR Spectroscopy and Support Vector Machine. Microchem. J. 2021; 164: 106052.
57. Chamidah N, Gusti KH, Tjahjono E, et al.: Improving of Classification Accuracy of Cyst and Tumor Using Local Polynomial Estimator. Telkomnika (Telecommunication Comput. Electron. Control). 2019; 17: 1492–1500.
58. Fibriyani V, Chamidah N: Prediction of Inflation in Indonesia Using Nonparametric Regression Approach Based on Local Polynomial Estimator. Adv. Soc. Sci. Educ. Humanit. Res. 2020; 474: 79–86.
59. Chamidah N, Mardianto MFF, Limanta EE, et al.: Modelling of Poverty Percentage Based on Mean Years of Schooling in Indonesia Using Local Linear Estimator. Vol. 474. Atlantis Press; 2020; pp. 87–91.
60. Ulya M: dataset avomango. [dataset]. 2023, February 10.

VERSION 2 PUBLISHED 13 Jun 2023

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Article Versions (2)

version 2 (revised): published 18 Mar 2024, 12:656, https://doi.org/10.12688/f1000research.130015.2

version 1: published 13 Jun 2023, 12:656, https://doi.org/10.12688/f1000research.130015.1

Copyright

© 2024 Ulya M et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Ulya M, Chamidah N and Saifudin T. Mango quality prediction based on near-infrared spectroscopy using multi-predictor local polynomial regression modeling [version 2; peer review: 3 approved with reservations, 1 not approved]. F1000Research 2024, 12:656 (https://doi.org/10.12688/f1000research.130015.2)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Version 2 (revised), published 18 Mar 2024

Reviewer Report 20 May 2024

Agustami Sitorus, National Research and Innovation Agency (BRIN), Jakarta Pusat, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.161939.r260284

This article discusses the application of NIR spectroscopy to predict the quality of mango fruit using a chemometrics algorithm adapted to this case. For the article to be accepted, here are my suggestions to improve its quality.

#Title.
↦ The title of this study is very broad, yet only pH and TSS are examined. That is not a fair description. Please make it more precise.
↦ Regarding the chemometrics used, I suggest making it more general. The techniques used are part multivariate analysis and part machine learning. Please reconsider the title accordingly.

#Abstract.
↦ In my opinion, the abstract is easy to understand because it makes the research problem, objectives, methods, results, and conclusions short.

#Keywords.
↦ In my opinion, the keyword "sustainable production" is not appropriate to the content of this article. Please consider my suggestion.

#Introduction.
↦ I was a little confused by the final paragraph of the Introduction, where the authors state the research objectives. Why have the authors already declared that MLPR is the best regression model? Isn't this still the introduction? Please revise it.
↦ Many studies have been published on using NIR for mango quality with various models, ranging from multivariate to deep learning. I strongly suggest that the authors create a comparison table of previous studies to show how this study differs from them.

#Method.
↦ In NIR spectra data acquisition, the authors state, "The spectral data were originally presented in terms of the reflectance value (R) and were later converted to the absorbance spectra value (log 1/R)." Please explain why log (1/R) is equal to absorbance (A). It should not be equal for opaque and non-absorbing materials.
↦ In the "Spectral data pre-treatment" section, MSC is not used to smooth your data. Please explain why you need MSC preprocessing and do not need SNV, 1st derivatives, 2nd derivatives, or others.
↦ In your spectral data, the noise appears to be baseline shifting, which may be due to non-uniform sample temperature or room temperature during scanning. Please report this temperature.
↦ In my opinion, the metrics commonly used to evaluate calibration models include RMSE, MSE, R2, and RPD. For any others, the authors must explain, one by one, why they are needed.
↦ Have the authors explained what MSC is, and what noise MSC can handle?
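
As background to the two pre-processing points raised above, the following sketch shows the log(1/R) conversion (often called apparent absorbance) and a textbook form of multiplicative scatter correction (MSC), in which each spectrum is regressed on the mean spectrum and then corrected for the fitted offset and slope. It is a generic illustration under stated assumptions, not the authors' code; the function names and the toy spectra are hypothetical.

```python
import math


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance log10(1/R) for reflectance values in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is modeled as s = a + b * reference and replaced by
    (s - a) / b, which removes additive and multiplicative scatter effects.
    """
    n_wl = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_wl)]
    ref_mean = sum(reference) / n_wl
    sxx = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_wl
        slope = sum((r - ref_mean) * (v - s_mean)
                    for r, v in zip(reference, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


if __name__ == "__main__":
    base = [0.2, 0.5, 0.9, 0.4]                   # toy absorbance spectrum
    spectra = [base,
               [2.0 * v + 0.3 for v in base],     # scaled and shifted copy
               [0.5 * v - 0.1 for v in base]]     # another distorted copy
    for row in msc(spectra):
        print([round(v, 4) for v in row])
```

In this toy case the three spectra differ only by an offset and a scale factor, so MSC maps them onto the same curve; it does not smooth high-frequency noise, which is the distinction the comment above draws.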

#Results.
↦ In pre-treatment: where is the 3D visualization of the PCA in which the authors say there are outliers? Please show it.
↦ We know that the best model is always the one with the "highest R2 value and the lowest MSE, RMSE, and MAPE values". What is essential, however, is how this relates to the model you are developing. Please discuss it for both.

Is the work clearly and accurately presented and does it cite the current literature? No
Is the study design appropriate and is the work technically sound? No
Are sufficient details of methods and analysis provided to allow replication by others? No
If applicable, is the statistical analysis and its interpretation appropriate? No
Are all the source data underlying the results available to ensure full reproducibility? No
Are the conclusions drawn adequately supported by the results? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Data science and analytics, ML & DL chemometrics, Agricultural engineering

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


Reviewer Report 23 Apr 2024

Rudiati Evi Masithoh, Universitas Gadjah Mada, Yogyakarta, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.161939.r260280

Researchers have succeeded in writing a manuscript on the use of NIR spectroscopy to predict pH and Brix of mango. However, there are several things that need to be improved.
1. In the Abstract, the method section states the best model. The method should only cover the research stages of sampling, spectral data collection, and analysis, while the best preprocessing belongs in the results and discussion.
2. The Introduction says that the relationship between spectra and parameters can be linear or nonlinear, and the authors state that little research has used nonlinear modeling to predict quality parameters, which motivates this research. However, beyond the scarcity of nonlinear studies, it would be better if the authors explained which extrinsic and intrinsic parameters of the sample cause the nonlinearity and why it needs to be investigated. The article "Non-linear regression methods in NIRS quantitative analysis" by Perez-Marin et al. (2007) can be used as a reference.
3. In the method, spectral data were collected from 3 points on each mango (250-300 g), but the chemical data were collected from only 10 g. It is not explained whether the whole mango was blended and 10 g then taken for chemical analysis, or whether the 10 g came from only one part (perhaps the tip, shoulder, or cheek).
4. In the method, the mango is blended with distilled water and then placed in a refractometer to obtain Brix data, which means the resulting Brix data must be corrected with a formula because dilution is involved. Do the Brix data used for the model account for this? If not, the Brix values must be recalculated.
5. A wavelength range of 900-1650 nm with an interval of 7 nm produces 107-108 variables, but the method states 112 predictor variables. Which one is correct?
6. In Statistical Analysis, it is written "... The predictor variables were the wavelengths of 112 NIR spectra for each mango sample..". It would be more appropriate to say that the predictor variables were the absorbances.
7. In Statistical Analysis, the authors wrote "The following steps were to perform dimension reduction using principal component analysis (PCA) to reduce predictor variables into two principal components". In reality, PCA does not only reduce to 2 dimensions. Please check the related references.
8. In Figure 4, what is the reason for displaying all the spectra? Would it be possible to show only the average spectrum for each level of ripeness (raw, unripe, or ripe, as grouped in Table 1), so that differences in spectra between ripeness levels can also be discussed?
9. In the discussion, the results for R2, RMSE, etc. should be compared with those of other linear methods (PLSR, etc.) or nonlinear ones, for example ANN.
10. In the conclusion, the authors could add comments on the recommended method for predicting the pH and TSS of mango, whether a linear or a nonlinear model.
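
The variable count questioned in point 5 can be checked with a short calculation. The sketch below assumes a regular grid that starts at exactly 900 nm and advances in 7 nm steps without exceeding 1650 nm; the real instrument grid may differ, and the function name `wavelength_grid` is hypothetical.

```python
def wavelength_grid(start_nm, stop_nm, step_nm):
    """List the wavelengths start, start + step, ... that do not exceed stop."""
    grid = []
    wavelength = start_nm
    while wavelength <= stop_nm:
        grid.append(wavelength)
        wavelength += step_nm
    return grid


if __name__ == "__main__":
    grid = wavelength_grid(900, 1650, 7)
    # Number of points, first and last wavelength on the assumed grid.
    print(len(grid), grid[0], grid[-1])  # prints: 108 900 1649
```

Under these assumptions the grid has 108 points, the last at 1649 nm, which is consistent with the 107-108 figure in point 5 and not with 112.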

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Partly
Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: NIR spectroscopy and chemometrics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


Version 1, published 13 Jun 2023

Reviewer Report 09 Oct 2023

Jens Petter Wold, The Norwegian Institute of Food, Fisheries and Aquaculture Research, Osloveien, Norway

Not Approved

https://doi.org/10.5256/f1000research.142742.r210564

This study compares different regression and pre-processing techniques of NIR spectra from mango fruits, where the reference values are sugar (TSS) and pH. Unfortunately, this is not a proper scientific study.

No consideration has been made on how to obtain a best possible match between NIR spectra and the target features TSS and pH.
The authors attempt to model internal quality features of mango with surface measurements on the skin of the mangoes.
The NIR spectra shown are extremely noisy. It is not clear what kind of noise this is. It is not commented on by the authors. Furthermore, the spectra are not in any way explained to the reader, what do we see, what are absorption bands of interest?
There is no interpretation of regression vectors or spectral features whatsoever to explain the results or to make probable that they rely on sound relations. It means that the models could rely on completely other features than TSS and pH, for instance pigments.
Based on a data set that the authors clearly do not understand well, they compare different regression models. It does not make much sense.

This is an article that is more misleading than informative, and it should not be published unless major improvements can be made.

Other comments:

The work is not clearly presented, see comments below. The work cites some relevant literature in the introduction, but almost no discussion of the results is done in relation to previous reported work.
The design of the study is very limited in terms of samples. All samples are from the same garden giving very little relevant bio-variation in the sample set. It is well known that fruit from different gardens can exhibit slightly different NIR signals, so a bigger data set should be made, especially when the aim here is to compare regression methods. A completely independent test set should also be used.
The NIR spectra are presented maybe as absorption spectra turned upside down? The quality of the spectra is very poor, with a lot of noise. The noise seems to increase with increasing absorption. What is wrong here? This very high level of noise would obscure the calibrations, since, e.g., the absorption bands of sugar are very small and the variation very subtle.
It is well known that TSS can be measured with NIR spectroscopy, but not pH. The authors should argue and explain why they can measure pH. Which spectral features does the model rely on?
A figure of the NIR measurement set-up should be included. Also showing where the samples for references were taken from each mango.
The description of the data analysis is confusing. 5-fold cross-validation is used. Still, the authors show predicted vs. measured plots of a calibration set and a validation set. This does not make sense. It must be clarified.
The number of components in the models is not listed; it would give an impression of the complexity of the models.
The differences in performance between the regression methods are small. A significance test should be conducted, to indicate if the RMSECV values are significantly different between the methods.
Many outliers are taken out of the data set based purely on statistics. This is normally not acceptable. There must be a good reason for removing an outlier. E.g. that the measurement was done in a wrong way.
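
One way the requested significance test could be set up is sketched below, assuming the cross-validated residuals of two methods are available for the same samples. The sign-flip permutation test, the function names and the synthetic residuals are illustrative assumptions; they are not taken from the manuscript or from the reviewer.

```python
import numpy as np

def rmse(residuals):
    """Root mean squared error of a vector of residuals."""
    return float(np.sqrt(np.mean(np.square(residuals))))

def paired_permutation_test(res_a, res_b, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test on paired squared CV residuals.

    Returns the observed mean difference in squared error (A minus B)
    and the p-value under the null that the two methods are exchangeable.
    """
    rng = np.random.default_rng(seed)
    d = np.square(res_a) - np.square(res_b)      # per-sample loss difference
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = (signs * d).mean(axis=1)
    # The +1 correction keeps the p-value strictly positive
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p)

# Synthetic cross-validated residuals for 165 samples (illustrative only)
rng = np.random.default_rng(42)
res_method_a = rng.normal(0.0, 0.30, size=165)
res_method_b = rng.normal(0.0, 0.60, size=165)
diff, p_value = paired_permutation_test(res_method_a, res_method_b)
print(rmse(res_method_a), rmse(res_method_b), diff, p_value)
```

Because the test is paired, it compares the two methods sample by sample, which is appropriate when both models are validated on the same folds.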

Is the work clearly and accurately presented and does it cite the current literature?

No
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: My areas of expertise are within NIR spectroscopy, food science and data modeling

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 24 Aug 2023

Kim Seng Chia, Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia

Approved with Reservations

https://doi.org/10.5256/f1000research.142742.r193780

Title: Mango quality prediction based on near-infrared spectroscopy using multi-predictor local polynomial regression modeling

Summary of the study:
This study shows that nonparametric regression (i.e. multi-predictor local polynomial regression (MLPR)) outperformed kernel partial least square regression (KPLSR) and support vector machine regression (SVMR) in predicting the pH and total soluble solids (TSS) of "Gadung Klonal 21" mango fruits, whose maturity stages cannot be visually classified, as stated in paragraph 3 of the Introduction. A total of 165 mangoes (three different maturity stages) were used (excluding 21 outliers), which appears sufficient to build the MLPR, KPLSR, and SVMR models. 5-fold cross-validation was used to evaluate the performance of the regression models.

Comments with respect to the questions (bolded) are as follows.

1. Is the work clearly and accurately presented and does it cite the current literature?

- No references from 2022 or 2023 are cited or discussed. The authors should also discuss more related recent works (2019-2023) in the manuscript. Suggestion: more than 50% of the cited works should have been published in the last five years. The current manuscript only cites literature published in 2021 (6 works), 2020 (3 works), 2019 (9 works), and other years (28 works).

2. Is the study design appropriate and is the work technically sound?

- Figure 3 suggests that only MSC was able to remove unwanted baseline/offset effects. It is questionable why models with smoothing (whose spectra still contained unwanted baseline shift effects) can outperform models with the MSC preprocessing method. (P.S.: please define the full name of MSC at its first use.) Suggestion: the authors may try SG smoothing + MSC, as Figure 3 (d) suggests that the spectra should be smoothed prior to MSC.
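
The suggested order of operations (SG smoothing first, then MSC) can be sketched as follows. This is a minimal illustration on synthetic spectra, assuming one spectrum per row; the window length, polynomial order, function names and synthetic band are assumptions and do not come from the manuscript.

```python
import numpy as np

def savgol_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing of each row (window must be odd)."""
    half = window // 2
    # Least-squares polynomial fit over the window; row 0 of the
    # pseudo-inverse gives the weights of the fitted value at the centre.
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.convolve(row, weights[::-1], mode="valid")
                     for row in padded])

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Model each spectrum as: row = intercept + slope * ref
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

wl = np.arange(900, 1651, 7)                      # 900-1650 nm at 7 nm steps
base = np.exp(-((wl - 1200) / 120.0) ** 2)        # synthetic absorption band
rng = np.random.default_rng(0)
scale = rng.uniform(0.8, 1.2, size=(20, 1))       # multiplicative scatter
offset = rng.uniform(-0.1, 0.1, size=(20, 1))     # additive baseline shift
raw = scale * base + offset + rng.normal(0.0, 0.01, size=(20, wl.size))
processed = msc(savgol_smooth(raw))
print(raw.std(axis=0).mean(), processed.std(axis=0).mean())
```

Smoothing before MSC means the per-spectrum slope and intercept are fitted on less noisy data, which is the rationale behind the suggested ordering.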

3. Are sufficient details of methods and analysis provided to allow replication by others?

Four comments are as follows.

(i) A figure illustrating the actual spectral data acquisition setup should be provided, with details of the distances between the sample, light source and spectrometer.

(ii) From the manuscript, "The following steps were to perform dimension reduction using principal component analysis (PCA), ..." Why was PCA involved in this study? The use of PCA prior to KPLSR does not make sense, as PLS in general does not need an additional dimension reduction method. Nevertheless, in the results section, PCA was used for outlier detection. The authors should revise the description to avoid confusion.

(iii) How were the data split during cross-validation (section Model evaluation)?

(iv) Figures 4 and 5 are confusing. As cross-validation was used, the authors should report the performance of all folds instead of only fold 4 and fold 1 in Figures 4 and 5, respectively.
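
To make points (iii) and (iv) concrete, the sketch below shows one reproducible way to split the data and to report every fold alongside the pooled RMSECV. The seeded shuffle, the ordinary least-squares stand-in model and the synthetic data are assumptions for illustration; the manuscript's actual split and models are not described in enough detail to reproduce here.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices once and cut them into n_folds validation folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def cross_validate(X, y, n_folds=5, seed=0):
    """Return per-fold RMSE and the pooled RMSECV for a least-squares model."""
    folds = kfold_indices(len(y), n_folds, seed)
    fold_rmse, residuals = [], []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), val_idx)
        # Stand-in model: ordinary least squares with an intercept column
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])
        res = y[val_idx] - A_val @ coef
        fold_rmse.append(float(np.sqrt(np.mean(res ** 2))))
        residuals.append(res)
    pooled = np.concatenate(residuals)
    return fold_rmse, float(np.sqrt(np.mean(pooled ** 2)))

# Synthetic data: 165 samples, two predictors (e.g. principal-component scores)
rng = np.random.default_rng(1)
X = rng.normal(size=(165, 2))
y = 3.0 + X @ np.array([0.8, -0.5]) + rng.normal(0.0, 0.2, size=165)
fold_rmse, rmsecv = cross_validate(X, y)
print(fold_rmse, rmsecv)
```

Fixing the seed and publishing the fold assignment lets readers see exactly which samples were held out in each fold.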

4. If applicable, is the statistical analysis and its interpretation appropriate?

- Table 1. Descriptive statistics should include information on the maturity stages: how many mangoes are in each maturity stage, and what are their descriptive statistics?

5. Are all the source data underlying the results available to ensure full reproducibility?

- Reproducibility concern: "In this study, 21 outliers were identified. These outliers were excluded because they could have a negative impact on the model." The authors should explain why more than 10% of the data were outliers and how to avoid or minimize them in the future.

- Additionally, the authors should state clearly which data were the outliers, as the provided open dataset includes the 21 outliers.
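
A minimal sketch of how candidate outliers could be flagged and listed by index is given below, assuming a Hotelling's T² statistic on the first two principal components with a large-sample chi-square limit. The statistic, the 1% level and the synthetic data are assumptions; the manuscript does not state which criterion produced the 21 outliers. Flagged samples are candidates for inspection and reporting, not for automatic removal.

```python
import math
import numpy as np

def hotelling_t2(X, n_components=2):
    """Hotelling's T² of each sample in the space of the leading PCs."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    variances = s[:n_components] ** 2 / (X.shape[0] - 1)
    return np.sum(scores ** 2 / variances, axis=1)

def flag_outliers(X, alpha=0.01):
    """Indices whose T² (two PCs) exceeds the chi-square(2) critical value."""
    t2 = hotelling_t2(X, n_components=2)
    # For 2 degrees of freedom the upper chi-square quantile is -2*ln(alpha)
    limit = -2.0 * math.log(alpha)
    return np.flatnonzero(t2 > limit), t2, limit

# Synthetic example: 60 samples, 10 variables, one planted extreme sample
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 10))
X[7] += 25.0
flagged, t2, limit = flag_outliers(X)
print(flagged, round(limit, 3))
```

Publishing the flagged indices together with the dataset would let readers reproduce the exclusion step.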

6. Are the conclusions drawn adequately supported by the results?

Conclusion content - in future, the authors might consider evaluating the robustness of the models using samples from different seasons as testing datasets.

7. Other comments:

- Authors shall use RMSECV instead of RMSEV as cross-validation was used in this study.

- Figure 1 - please use Validation instead of Testing to align with the text.

Is the work clearly and accurately presented and does it cite the current literature?

No
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Partly
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Machine Learning, Transfer Learning, Near Infrared Spectroscopy, Real-time Embedded System

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Version 2

VERSION 2 PUBLISHED 13 Jun 2023

Open Peer Review

Reviewer Status

Reviewer Reports

Invited Reviewers

                                   1      2      3      4
Version 2 (revision), 18 Mar 24                  read   read
Version 1, 13 Jun 23               read   read

1. Kim Seng Chia, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
2. Jens Petter Wold, The Norwegian Institute of Food, Fisheries and Aquaculture Research, Osloveien, Norway
3. Rudiati Evi Masithoh, Universitas Gadjah Mada, Yogyakarta, Indonesia
4. Agustami Sitorus, National Research and Innovation Agency (BRIN), Jakarta Pusat, Indonesia

Reviewer Report

20 May 2024 | for Version 2

Agustami Sitorus, National Research and Innovation Agency (BRIN), Jakarta Pusat, Indonesia

Approved With Reservations

This article discusses the application of NIR spectroscopy to predict the quality of mango fruit using a chemometrics algorithm adapted to this case. Before it can be accepted, here are my suggestions to improve the quality of this article.

#Title.
↦ The title of this study is very broad, yet only pH and TSS are examined as quality attributes. That is not a fair representation. Please make the title more specific.
↦ Regarding the chemometrics used, I suggest making it more general. The technique used is part multivariate analysis and part machine learning. Please reconsider the title.

#Abstract.
↦ In my opinion, the abstract is easy to understand because it states the research problem, objectives, methods, results, and conclusions concisely.

#Keywords.
↦ In my opinion, the keyword "sustainable production" is not appropriate to the content of this article. Please consider my suggestion.

#Introduction.
↦ I was a little confused by the final paragraph of the Introduction, where the authors state the research objectives. Why have the authors already declared that MLPR is the best regression model? Isn't this still the introduction? Please revise it.
↦ Many studies have been published using NIR for mango quality with various models, ranging from multivariate to deep learning. I strongly suggest that the authors create a comparison table of previous studies to show where this study stands and how it differs from them.

#Method.
↦ In NIR spectra data acquisition, the authors state, "The spectral data were originally presented in terms of the reflectance value (R) and were later converted to the absorbance spectra value (log 1/R)." Please explain why log (1/R) is equal to absorbance (A). It should not be equal for opaque and non-absorbing materials.
↦ In the "Spectral data pre-treatment" section, MSC is not used to smooth your data. Please explain why you need MSC preprocessing and not SNV, 1st derivatives, 2nd derivatives, or others.
↦ In your spectral data, the noise appears as baseline shifting, which may be due to non-uniform sample temperature or room temperature during scanning. Please report this temperature.
↦ In my opinion, the metrics commonly used to evaluate calibration models include RMSE, MSE, R2, and RPD. For any others, the authors should explain, one by one, why each is needed.
↦ Have the authors explained what MSC is, and what kind of noise MSC can handle?
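
For reference, the conversion quoted in the first comment above amounts to the following. In diffuse reflectance work log(1/R) is usually called apparent absorbance, because scattering means it only approximates a Beer-Lambert absorbance. The function name and the input check are illustrative assumptions.

```python
import math

def apparent_absorbance(reflectance):
    """Convert reflectance R (0 < R <= 1) to apparent absorbance log10(1/R)."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must lie in (0, 1]")
    return math.log10(1.0 / reflectance)

# A sample reflecting 10% of the incident light has apparent absorbance 1
print(apparent_absorbance(0.1))
```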

#Results.
↦ In the pre-treatment section, where is the 3D visualization of the PCA in which the authors say there are outliers? Please show it.
↦ We know that the best model is always the one with the "highest R2 value and the lowest MSE, RMSE, and MAPE values". What is essential, however, is how this relates to the model you are developing. Please discuss this for both.
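
The evaluation metrics named in this report can be computed as in the sketch below, assuming RPD is defined as the standard deviation of the reference values divided by the RMSE. The function name and the four example values are made up for illustration and are not results from the manuscript.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE, MAPE (%) and RPD for paired reference/predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    mse = sse / n
    rmse = math.sqrt(mse)
    # MAPE needs non-zero reference values
    mape = 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    sd = math.sqrt(sst / (n - 1))        # sample standard deviation of y_true
    return {"R2": 1.0 - sse / sst, "MSE": mse, "RMSE": rmse,
            "MAPE": mape, "RPD": sd / rmse}

metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```

R2 and RPD both scale the error by the spread of the reference values, whereas MAPE scales each error by the reference value itself, so the three need not rank models identically.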

Is the work clearly and accurately presented and does it cite the current literature?

No
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

No
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Data science and analytics, ML & DL chemometrics,Agricultural engineering

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report

23 Apr 2024 | for Version 2

Rudiati Evi Masithoh, Universitas Gadjah Mada, Yogyakarta, Indonesia

Approved With Reservations

Researchers have succeeded in writing a manuscript on the use of NIR spectroscopy to predict pH and Brix of mango. However, there are several things that need to be improved.
1. In the Abstract, the method section shows the best model. The method should only include the research stages of sampling, spectral data collection and analysis, while the best preprocessing is included in the results and discussion.
2. In the Introduction it is said that the relationship between spectra and parameters can be linear or nonlinear, and the authors state that there has not been much research on nonlinear modeling to predict quality parameters, which is why it was carried out in this research. However, beyond the fact that few studies have used nonlinear models, it would be better if the authors explained which extrinsic and intrinsic parameters of the sample give rise to nonlinearity and therefore make nonlinear models worth investigating. The article entitled "Non-linear regression methods in NIRS quantitative analysis" by Perez-Marin et al (2007) can be used as a reference.
3. In the method, spectral data were collected from 3 points on 1 mango (250-300 g), but the chemical data were collected from only 10 g. It was not explained whether all parts of the mango were blended and then only 10 g was taken for chemical analysis, or whether the 10 g came from only one part (perhaps the tip, shoulder or cheek).
4. In the method, the mango is blended with distilled water, which is then placed in a refractometer to obtain Brix data. This means the resulting Brix data must be calculated using a certain formula because dilution is involved. Do the Brix data used for the model account for this? If not, the Brix value must be recalculated.
5. A wavelength range of 900-1650 nm with an interval of 7 nm yields 107-108 variables, but the method states 112 predictor variables. Which one is correct?
6. In Statistical Analysis, it is written "... The predictor variables were the wavelengths of 112 NIR spectra for each mango sample..". It would be more appropriate to describe the predictor variables as the absorbance values.
7. In Statistical Analysis, the authors wrote 'The following steps were to perform dimension reduction using principal component analysis (PCA) to reduce predictor variables into two principal components'. In reality, PCA does not only reduce to 2 dimensions. Please check related references.
8. In Figure 4, what is the reason for displaying all the spectra? Is it possible to show just the average spectrum for each level of ripeness (raw, unripe, or ripe, as grouped in Table 1), so that differences in the spectra between ripeness levels can also be discussed?
9. In the discussion it is necessary to add the differences in R2, RMSE, etc. when compared with other linear methods (PLSR, etc.), or even nonlinear ones, for example ANN.
10. In the conclusion, the authors could add a comment on the recommended method for predicting the pH and TSS of mango, whether a linear or a nonlinear model.
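
Comments 4 and 5 both reduce to short arithmetic, sketched below. The wavelength count assumes an inclusive grid starting at 900 nm. The dilution correction assumes a simple mass balance in which the added water contributes no soluble solids; the 20 g of water and the measured Brix of 4.0 are hypothetical values, since the manuscript's dilution ratio is not given here.

```python
def wavelength_count(start_nm, stop_nm, step_nm):
    """Number of grid points from start to stop (inclusive) at a fixed step."""
    return len(range(start_nm, stop_nm + 1, step_nm))

def undiluted_brix(measured_brix, sample_mass_g, water_mass_g):
    """Back-calculate the Brix of the undiluted pulp by a mass balance on solids."""
    dilution_factor = (sample_mass_g + water_mass_g) / sample_mass_g
    return measured_brix * dilution_factor

print(wavelength_count(900, 1650, 7))     # 108 grid points (107 intervals)
print(undiluted_brix(4.0, 10.0, 20.0))    # 12.0 for the hypothetical 1:2 dilution
```

The grid runs from 900 nm to 1649 nm, which is why the count is 108 rather than the 112 stated in the method.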

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

NIR spectroscopy and chemometrics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

[1] 1. Karsinah, Rebin: Tasliah Varietas Unggul Mangga Gadung 21: Daging Buah Tebal, Berserat Rendah, Rasa Manis. Iptek Hortik. 2017; 13: 39–44.

[2] 2. Mir SA, Shah MA, Mir MM: Postharvest Biology and Technology of Temperate Fruits.2018. 9783319768434.

[3] 3. Sohaib ASS, Zeb A, Qureshi WS, et al.: Towards Fruit Maturity Estimation Using NIR Spectroscopy. Infrared Phys. Technol. 2020; 111: 103479. Publisher Full Text

[4] 4. Jie D, Xie L, Rao X, et al.: Using Visible and near Infrared Diffuse Transmittance Technique to Predict Soluble Solids Content of Watermelon in an On-Line Detection System. Postharvest Biol. Technol. 2014; 90: 1–6. Publisher Full Text

[5] 5. Sari HP, Purwanto YA, Budiastra IW: Prediction of Chemical Contents in ‘Gedong Gincu’ Mango Using near Infrared Spectroscopy. J. Agritech. 2016; 36: 294. Publisher Full Text

[6] 6. Nicolaï BM, Beullens K, Bobelyn E, et al.: Nondestructive Measurement of Fruit and Vegetable Quality by Means of NIR Spectroscopy: A Review. Postharvest Biol. Technol. 2007; 46: 99–118. Publisher Full Text

[7] 7. Jha SN, Chopra S, Kingsly ARP: Modeling of Color Values for Nondestructive Evaluation of Maturity of Mango. J. Food Eng. 2007; 78: 22–26. Publisher Full Text

[8] 8. Jha SN, Jaiswal P, Narsaiah K, et al.: Non-Destructive Prediction of Sweetness of Intact Mango Using near Infrared Spectroscopy. Sci. Hortic. (Amsterdam). 2012; 138: 171–175. Publisher Full Text

[9] 9. Jha SN, Narsaiah K, Jaiswal P, et al.: Nondestructive Prediction of Maturity of Mango Using near Infrared Spectroscopy. J. Food Eng. 2014; 124: 152–157. Publisher Full Text

[10] 10. Watanawan C, Wasusri T, Srilaong V, et al.: Near Infrared Spectroscopic Evaluation of Fruit Maturity and Quality of Export Thai Mango (Mangifera Indica L. Var. Namdokmai). Int. Food Res. J. 2014; 21: 1073–1078.

[11] 11. Schulze K, Nagle M, Spreer W, et al.: Development and Assessment of Different Modeling Approaches for Size-Mass Estimation of Mango Fruits (Mangifera Indica L., Cv.’Nam Dokmai’). Comput. Electron. Agric. 2015; 114: 269–276. Publisher Full Text

[12] 12. Rungpichayapichet P, Mahayothee B, Nagle M, et al.: Robust NIRS Models for Non-Destructive Prediction of Postharvest Fruit Ripeness and Quality in Mango. Postharvest Biol. Technol. 2016; 111: 31–40. Publisher Full Text

[13] 13. Agussabti, Rahmaddiansyah, Satriyo P, et al.: Data Analysis on near Infrared Spectroscopy as a Part of Technology Adoption for Cocoa Farmer in Aceh Province, Indonesia. Data Br. 2020; 29: 105251. PubMed Abstract | Publisher Full Text | Free Full Text

[14] 14. Valipour M, Banihabib ME, Behbahani SMR: Monthly Inflow Forecasting Using Autoregressive Artificial Neural Network. J. Appl. Sci. 2012; 12: 2139–2147. Publisher Full Text

[15] 15. Anderson NT, Walsh KB, Flynn JR, et al.: Achieving Robustness across Season, Location and Cultivar for a NIRS Model for Intact Mango Fruit Dry Matter Content. II. Local PLS and Nonlinear Models. Postharvest Biol. Technol. 2021; 171: 111358. Publisher Full Text

[16] 16. Ulya M, Chamidah N, Saifudin T: Predicting the Sweetness Level of Avomango (Gadung Klonal 21) Using Multi-Predictor Local Polynomial Regression. IOP Conf. Ser. Earth Environ. Sci. 2021; 733: 012009. Publisher Full Text

[17] 17. Chamidah N, Lestari B: Estimation of Covariance Matrix Using Multi-Response Local Polynomial Estimator for Designing Children Growth Charts: A Theoretically Discussion. J. Phys. Conf. Ser. 2019; 1397: 012072. Publisher Full Text

[18] 18. Derkacheva A, Mouginot J, Millan R, et al.: Data Reduction Using Statistical and Regression Approaches for Ice Velocity Derived by Landsat-8, Sentinel-1 and Sentinel-2. Remote Sens. 2020; 12: 1–21. Publisher Full Text

[19] 19. Islamiyati A, Chamidah N: Ability of Covariance Matrix in Bi-Response Multi-Prredictor Penalized Spline Model Through Longitudinal Data Simulation.2019; 3: 8–11.

[20] 20. Adiwati T, Chamidah N: Modelling of Hypertension Risk Factors Using Penalized Spline to Prevent Hypertension in Indonesia. IOP Conf. Ser. Mater. Sci. Eng. 2019; 546: 052003. Publisher Full Text

[21] 21. Ramadan W, Chamidah N, Zaman B, et al.: Standard Growth Chart of Weight for Height to Determine Wasting Nutritional Status in East Java Based on Semiparametric Least Square Spline Estimator. IOP Conf. Ser. Mater. Sci. Eng. 2019; 546: 052063. Publisher Full Text

[22] 22. Lestari B, Fatmawati, Budiantara IN, et al.: Estimation of Regression Function in Multi-Response Nonparametric Regression Model Using Smoothing Spline and Kernel Estimators. J. Phys. Conf. Ser. 2018; 1097: 012091. Publisher Full Text

[23] 23. Hidayati L, Chamidah N, Nyoman Budiantara I: Spline Truncated Estimator in Multiresponse Semiparametric Regression Model for Computer Based National Exam in West Nusa Tenggara. IOP Conf. Ser. Mater. Sci. Eng. 2019; 546: 052029. Publisher Full Text

[24] 24. George J, Janaki L, Parameswaran Gomathy J: Statistical Downscaling Using Local Polynomial Regression for Rainfall Predictions – A Case Study. Water Resour. Manag. 2015; 30: 183–193. Publisher Full Text

[25] 25. Block P, Goddard L: Statistical and Dynamical Climate Predictions to Guide Water Resources in Ethiopia. J. Water Resour. Plan. Manag. 2012; 138: 287–298. Publisher Full Text

[26] 26. Ulya M, Chamidah N: Multi-Predictor Local Polynomial Regression for Predicting the Acidity Level of Avomango (Gadung Klonal 21). AIP Conf. Proc. 2021; 2329. Published. Publisher Full Text

[27] 27. Nicolaï BM, Theron KI, Lammertyn J: Kernel PLS Regression on Wavelet Transformed NIR Spectra for Prediction of Sugar Content of Apple. Chemom. Intell. Lab. Syst. 2007; 85: 243–252. Publisher Full Text

[28] 28. Al-Sanabani DGA, Solihin MI, Pui LP, et al.: Development of Non-Destructive Mango Assessment Using Handheld Spectroscopy and Machine Learning Regression. J. Phys. Conf. Ser. 2019; 1367: 012030. Publisher Full Text

[29] 29. Cortés V, Rodríguez A, Blasco J, et al.: Prediction of the Level of Astringency in Persimmon Using Visible and Near-Infrared Spectroscopy. J. Food Eng. 2017; 204: 27–37. Publisher Full Text

[30] 30. Luo J, Ying K, Bai J: Savitzky-Golay Smoothing and Differentiation Filter for Even Number Data. Signal Process. 2005; 85: 1429–1434. Publisher Full Text

[31] 31. Tan YP, Chia KS: Effects of Pre-Processing and Principal Components for Artificial Neural Network in Non-Destructive Internal Quality Prediction of Mango across Different Harvest Periods. IEEE 13th Int. Conf. Control Syst. Comput. Eng. 2023; 144–148. Publisher Full Text

[32] 32. Mishra P, Woltering E: Semi-Supervised Robust Models for Predicting Dry Matter in Mango Fruit with near-Infrared Spectroscopy. Postharvest Biol. Technol. 2023; 200: 112335. Publisher Full Text

[33] 33. Saeys W, Nguyen Do Trong N, et al.: Multivariate Calibration of Spectroscopic Sensors for Postharvest Quality Evaluation: A Review. Postharvest Biol. Technol. 2019; 158: 110981. Publisher Full Text

[34] 34. Hastie T, Tibshirani R: Generalized Additive Models. Chapman & Hall; 1990. 9781351445962.

[35] 35. Ariyanto RA, Chamidah N: Sentiment Analysis for Zoning System Admission Policy Using Support Vector Machine and Naive Bayes Methods. J. Phys. Conf. Ser. 2021; 1776: 012058. Publisher Full Text

[36] 36. Ardhani BA, Chamidah N, Saifudin T: Sentiment Analysis Towards Kartu Prakerja Using Text Mining with Support Vector Machine and Radial Basis Function Kernel. J. Inf. Syst. Eng. Bus. Intell. 2021; 7: 119. Publisher Full Text

[37] 37. Asrol M, Papilo P, Gunawan FE: Support Vector Machine with K-Fold Validation to Improve the Industry’s Sustainability Performance Classification. Procedia Comput. Sci. 2021; 179: 854–862. Publisher Full Text

[38] 38. Ren L, Glasure Y: Applicability of the Revised Mean Absolute Percentage Errors (MAPE) Approach to Some Popular Normal and Non-Normal Independent Time Series. Int. Adv. Econ. Res. 2009; 15: 409–420. Publisher Full Text

[39] 39. Moreno JJM, Palmer Pol A, Sesé Abad A, et al.: El Índice R-MAPE Como Medida Resistente Del Ajuste En La Previsión. Psicothema. 2013; 25: 500–506. Publisher Full Text

[40] 40. Akhlaghi YG, Ma X, Zhao X, et al.: A Statistical Model for Dew Point Air Cooler Based on the Multiple Polynomial Regression Approach. Energy. 2019; 181: 868–881. Publisher Full Text

[41] 41. Magwaza LS, Opara UL, Nieuwoudt H, et al.: NIR Spectroscopy Applications for Internal and External Quality Analysis of Citrus Fruit-A Review. Food Bioprocess Technol. 2012; 5: 425–444. Publisher Full Text

[42] 42. Mami AM, Jaber AM, Almabrouk OS: Applying Bootstrap Robust Regression Method on Data with Outliers. Int. J. Sci.: Basic Appl. Res. 2020; 49: 143–160.

[43] 43. Kutner MH, Nachtsheim C, Neter J: Applied Linear Regression Models. 4th ed. McGraw-Hill/Irwin; 2004.

[44] 44. Xie L, Ye X, Liu D, et al.: Prediction of Titratable Acidity, Malic Acid, and Citric Acid in Bayberry Fruit by near-Infrared Spectroscopy. Food Res. Int. 2011; 44: 2198–2204. Publisher Full Text

[45] 45. Kamboj U, Guha P, Mishra S: Comparison of PLSR, MLR, SVM Regression Methods for Determination of Crude Protein and Carbohydrate Content in Stored Wheat Using near Infrared Spectroscopy. Mater. Today Proc. 2022; 48: 576–582. Publisher Full Text

[46] 46. Vasconcelos L, Dias G, Leite A, et al.: SVM Regression to Assess Meat Characteristics of Bísaro Pig Loins Using NIRS Methodology. Foods. 2023; 12: 1–15. PubMed Abstract | Publisher Full Text | Free Full Text

[47] 47. Chanda S, Hazarika AK, Choudhury N, et al.: Support Vector Machine Regression on Selected Wavelength Regions for Quantitative Analysis of Caffeine in Tea Leaves by near Infrared Spectroscopy. J. Chemom. 2019; 33: 1–15. Publisher Full Text

[48] 48. Williams P, Antoniszyn J, Manley M: Near Infrared Technology: Getting the Best out of Light. USA: Sun Press; 2019. 9781928480303.

[49] 49. Liu Y, Sun X, Zhou J, et al.: Linear and Nonlinear Multivariate Regressions for Determination Sugar Content of Intact Gannan Navel Orange by Vis-NIR Diffuse Reflectance Spectroscopy. Math. Comput. Model. 2010; 51: 1438–1443. Publisher Full Text

[50] 50. Chauchard F, Cogdill R, Roussel S, et al.: Application of LS-SVM to Non-Linear Phenomena in NIR Spectroscopy: Development of a Robust and Portable Sensor for Acidity Prediction in Grapes. Chemom. Intell. Lab. Syst. 2004; 71: 141–150. Publisher Full Text

[51] 51. Louw ED, Theron KI: Robust Prediction Models for Quality Parameters in Japanese Plums (Prunus Salicina L.) Using NIR Spectroscopy. Postharvest Biol. Technol. 2010; 58: 176–184. Publisher Full Text

[52] 52. Peirs A, Tirry J, Verlinden B, et al.: Effect of Biological Variability on the Robustness of NIR Models for Soluble Solids Content of Apples. Postharvest Biol. Technol. 2003; 28: 269–280. Publisher Full Text

[53] 53. Subedi PP, Walsh KB, Owens G: Prediction of Mango Eating Quality at Harvest Using Short-Wave near Infrared Spectrometry. Postharvest Biol. Technol. 2007; 43: 326–334. Publisher Full Text

[54] 54. de Santana FB, Otani SK, de Souza AM, et al.: Comparison of PLS and SVM Models for Soil Organic Matter and Particle Size Using Vis-NIR Spectral Libraries. Geoderma Reg. 2021; 27: e00436. Publisher Full Text

[55] 55. Munawar AA, Zulfahrizal, Meilina H, et al.: Near Infrared Spectroscopy as a Fast and Non-Destructive Technique for Total Acidity Prediction of Intact Mango: Comparison among Regression Approaches. Comput. Electron. Agric. 2022; 193: 106657. Publisher Full Text

[56] 56. Cardoso VGK, Poppi RJ: Non-Invasive Identification of Commercial Green Tea Blends Using NIR Spectroscopy and Support Vector Machine. Microchem. J. 2021; 164: 106052. Publisher Full Text

[57] 57. Chamidah N, Gusti KH, Tjahjono E, et al.: Improving of Classification Accuracy of Cyst and Tumor Using Local Polynomial Estimator. Telkomnika (Telecommunication Comput. Electron. Control). 2019; 17: 1492–1500. Publisher Full Text

[58] 58. Fibriyani V, Chamidah N: Prediction of Inflation in Indonesia Using Nonparametric Regression Approach Based on Local Polynomial Estimator. Adv. Soc. Sci. Educ. Humanit. Res. 2020; 474: 79–86. Publisher Full Text

[59] 59. Chamidah N, Mardianto MFF, Limanta EE, et al.: Modelling of Poverty Percentage Based on Mean Years of Schooling in Indonesia Using Local Linear Estimator. Vol. 474. Atlantis Press; 2020; pp. 87–91. Publisher Full Text

[60] 60. Ulya M: dataset avomango. [dataset]. 2023, February 10. Publisher Full Text
