Research Article

Forecasting Pepper and Turmeric Prices in India: A Machine Learning Analysis of Macroeconomic Drivers

[version 1; peer review: awaiting peer review]
PUBLISHED 13 Jan 2026

This article is included in the Manipal Academy of Higher Education gateway.

Abstract

Background

Accurate forecasting of agricultural commodity prices is crucial for maintaining economic stability and informing data-driven policy interventions in emerging economies, such as India. Price volatility in key spices, such as pepper and turmeric, can significantly impact producers, consumers, and market stakeholders, underscoring the need for robust predictive models that capture both short-term fluctuations and long-term dependencies.

Methods

This study develops an integrated forecasting framework that combines Machine Learning (Random Forest), Deep Learning (LSTM), and Econometric (VECM) approaches to analyse the dynamic behaviour of Indian pepper and turmeric prices. The models incorporate major macroeconomic determinants, including GDP, Consumer Price Index (CPI), exchange rate, gold price, interest rate, trade volume, and foreign institutional investments (FII), to capture both non-linear and long-term relationships. Model performance was evaluated using RMSE, MAE, and symmetric MAPE (sMAPE) metrics, alongside SHAP-based feature explainability analysis.

Results

The findings reveal that Random Forest delivers the most robust predictive accuracy overall, especially for Turmeric, while LSTM achieves slightly lower forecast errors for Pepper. Both machine-learning models substantially outperform the VECM in short-term price forecasts. Feature importance and SHAP analyses identified Nifty50, GDP, CPI, exchange rate, interest rate and gold prices as key drivers of spice price movements.

Conclusions

Integrating machine learning, deep learning, and econometric models enhances the robustness and interpretability of commodity price forecasting. The study provides empirical evidence that macroeconomic variables significantly influence spice price dynamics, offering a hybrid framework that can support policymakers, traders, and researchers in mitigating market risks and designing more effective agricultural price stabilisation strategies.

Keywords

Agricultural commodities, Machine Learning, Price forecasting, Macroeconomic variables, Market volatility

1. Introduction

India, often celebrated as the “Land of Spices,” is a dominant player in the global spice trade, contributing substantially to both domestic and international economies.1 Among its most valuable exports, turmeric and black pepper hold pivotal positions not only for their culinary significance but also for their medicinal, industrial, and cultural importance.2 The spice sector sustains millions of livelihoods across rural India, linking agricultural productivity to export earnings and global market integration. However, this sector faces persistent challenges, including price volatility, yield fluctuations, and macroeconomic uncertainties.3

Price instability in agricultural commodities like pepper and turmeric stems from both domestic factors, such as variable production cycles, climatic shocks, and storage inefficiencies, and global forces, including exchange rate fluctuations, international trade agreements, and changing consumer preferences. Macroeconomic conditions further exacerbate these challenges; for example, inflationary pressures can increase production costs, while exchange rate depreciation can alter export competitiveness.4 Precise forecasting of price movements is therefore essential for stabilising farmer incomes, informing export policies, and ensuring long-term sustainability in the spice sector.5

The selection of pepper and turmeric for this study is guided by their economic interlinkages and market structure similarities. Both are perennial crops with high export dependency, price-inelastic domestic demand, and overlapping supply chains involving common intermediaries, logistics systems, and production regions.6,7 These structural parallels suggest the possibility of long-run cointegration where both commodities may share a stable equilibrium relationship influenced by macroeconomic drivers.

Existing studies have largely analysed individual commodities or short-term dynamics, often omitting a comparative and hybrid analytical framework that integrates both macroeconomic influences and modern predictive analytics. This research addresses this gap by developing a hybrid econometric–machine learning framework that examines the macroeconomic determinants of turmeric and pepper prices in India. Specifically, it combines traditional econometric modelling using the Vector Error Correction Model (VECM) with advanced Machine Learning (Random Forest) and Deep Learning (LSTM) models to enhance predictive accuracy and interpretability.

The objectives of this study are threefold: (1) to identify and quantify the macroeconomic drivers (e.g., GDP, CPI, exchange rate, gold prices, interest rates, and FII inflows) influencing turmeric and pepper price fluctuations; (2) to assess the short-term forecasting accuracy of machine learning and deep learning models relative to econometric approaches; and (3) to interpret the long-run equilibrium relationships between the two commodities and the broader macroeconomic environment.

By integrating predictive and interpretive models, this study contributes to agricultural economics by offering actionable insights for policy formulation, export strategy, and price risk management. The results are particularly relevant for policymakers, exporters, and farmers seeking evidence-based approaches to mitigate volatility and enhance the long-term resilience of India’s spice trade.

2. Literature review

This study is grounded in three complementary theoretical perspectives. First, the Law of One Price and price transmission theory explain how macroeconomic shocks such as exchange rate movements, inflation, and monetary policy changes are transmitted to agricultural commodity prices through trade and cost channels.8 Second, market integration theory supports the application of cointegration analysis to identify long-run equilibrium relationships between spice prices and macroeconomic fundamentals.9

Third, behavioural finance and nonlinear market theory provide justification for the use of machine-learning models, as agricultural markets often exhibit asymmetric responses, regime shifts, and nonlinear adjustment patterns that linear econometric models may fail to capture.10 Integrating these theoretical perspectives enables a richer interpretation of both long-run equilibrium dynamics and short-run price fluctuations.

The reviewed literature from Table 1 collectively underscores that agricultural commodity prices are deeply influenced by macroeconomic fundamentals, policy interventions, and market integration. Across the reviewed studies, methods such as VECM, ARIMA-GARCH, and LSTM-based hybrid models consistently reveal that variables like exchange rates, inflation, interest rates, and foreign investments play pivotal roles in determining commodity price volatility. The empirical consensus highlights that while econometric models effectively capture long-run equilibrium and causal linkages, machine learning approaches outperform them in short-run forecasting by accommodating nonlinearity and higher-order interactions. Furthermore, the integration of sustainability and climate-resilient frameworks in recent studies reflects a growing interdisciplinary focus on aligning price forecasting with the Sustainable Development Goals (SDGs), particularly in emerging economies like India, where agriculture remains a key driver of inclusive growth.

Table 1. Summary of the reviewed literature.

| Sl No. | Year | Title | Author(s) | Objectives | Variables | Findings | Methods | Research Gap |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2024 | Return and Volatility Spillover in Equity and Commodity Markets: Some Indian Evidence | Tapas Kumar Sethy, Naliniprava Tripathy | Study returns and volatility spillovers among Indian equity, agro and non-agro commodity markets | Equity indices (BSE SENSEX), commodity indices (MCX Gold/Silver/Crude Oil, NCDEX agri) | Bidirectional and unidirectional causal relations; SENSEX dominates info flow; COVID-19 structural breaks affect volatility [11] | GARCH, EGARCH with dummy variables; VAR Granger causality | Does not examine macroeconomic drivers or spice-specific commodities |
| 2 | 2024 | Fast and Order-Invariant Inference in Bayesian VARs with Nonparametric Shocks | Florian Huber, Gary Koop | Efficient Bayesian VAR method with nonparametric shocks | Macroeconomic variables in the VAR framework | Improves forecast accuracy by modeling time-varying shock transmission [12] | Bayesian VAR, Dirichlet process mixture, MCMC | Methodology not applied to agricultural or spice markets |
| 3 | 2024 | Volatility Spillovers among the Major Commodities: A Bibliometric Review | Konstantinos D. Melas | Review of literature trends on commodity price volatility spillovers | Commodity price volatilities (metals, energy, agriculture) | Significant spillovers mainly among metals and energy categories [13] | Bibliometric review and literature synthesis | No country-specific or commodity-specific modelling |
| 4 | 2023 | A Comparative Multivariate Analysis of VAR and Deep Learning Models for Forecasting Volatile Time Series | Akkem | Compare multivariate forecasting models: VAR vs deep learning | Financial time series (open, high, low, close, volume) | Deep learning models outperform VAR; the Transformer best method [14] | VAR, LSTM, BI-LSTM, TCN, Multi-head Attention | No hybrid econometric–ML frameworks; no macro drivers |
| 5 | 2023 | How Serious is India's Nonperforming Assets Crisis? | Mani | Investigate the impact of macroeconomic shocks on NPAs in Indian banking | NPAs, rainfall, food prices, fiscal and monetary shocks | NPA surges mainly not due to macro shocks; feed and fuel shocks influence NPAs [15] | Structural macro-financial econometric model | Establishes macro-price transmission but not for commodities |
| 6 | 2023 | Price Discovery Mechanism and Volatility Spillover Between the National Agriculture Market and the NCDEX | Garg | Analyse price discovery & volatility spillovers in Indian agri markets | Spot and futures prices and volatilities | NCDEX futures and spot lead E-NAM spot prices; volatility spillover is significant [16] | Johansen cointegration, VEC, Granger causality, GARCH | No macro-variable inclusion; focused on market microstructure |
| 7 | 2023 | Investigating Spillovers between Energy, Food, and Agricultural Commodity Markets | Khalfaoui, Shahzad, Asl, Jabeur | Quantile coherency analysis of energy, food, and agri spillovers, including the COVID effect | Global indices of energy, food, fertiliser, and agriculture | Strong fertiliser energy link; weak oil-agri except extremes; COVID increased co-movement [17] | Quantile coherency, frequency-domain methods | Country-specific analysis absent |
| 8 | 2023 | Different Moments Create Different Spillovers: A Study of Commodity Markets | He & Hamori | Examine higher moment (skewness, kurtosis) spillovers in commodities | Energy, metals, agriculture, livestock | Significantly higher moment spillovers; non-linear risk structure [18] | Copula-based spillover measures | Does not address macroeconomic drivers |
| 9 | 2023 | Dynamic Return and Volatility Connectedness for Dominant Agricultural Commodity Markets During COVID-19 | Umar, Escribano, Jareo | Examine time-varying connectedness among agri commodity indexes under COVID-19 | Grains, soft commodities, livestock indexes | Peak connectedness and media sentiment effects during the pandemic [19] | TVP-VAR, connectedness indices | No India-focused or spice-specific results |
| 10 | 2023 | Price Interconnection of Fuel and Food Markets: Evidence from Biodiesel in the US | Tanaka | Study price comovement in biodiesel, diesel, and food markets | Prices of biodiesel, diesel, soybeans, rapeseed, and canola | Strong cointegration, tight feed-food-fuel integration [20] | Cointegration, ARDL, Causality | Focus on biofuel–food nexus, not spices |
| 11 | 2022 | Quantile Risk Spillovers Between Energy and Agricultural Commodity Markets: Pre and During COVID-19 | Tiwari, Abakah, Adewuyi, Lee | Analyse quantile risk spillovers with a COVID focus | Wheat, corn, sugar, soybean, coffee, cotton, gasoline, crude oil, gas | Stronger spillovers at upper/lower quantiles; agri dominates energy at extremes [21] | Rolling quantile VAR, Realised variance | No forecasting or SHAP interpretability |
| 12 | 2021 | The Impact of COVID-19 on Commodity Markets Volatility | Umar, Gubareva, Teplova | Analyse the COVID-19 effects on commodity price volatility and panic | Energy, agri, livestock, metals, COVID Panic Index | Commodity investment diversification and hedging potential shifts [22] | Wavelet coherence, phase difference | No macro-drivers or India-specific commodities |
| 13 | 2021 | Return and Volatility Transmission Between Oil Price Shocks and Agricultural Commodities | Zaghum Umar | Study the connectedness between oil price shocks and agri commodities | Oil price shocks, agricultural commodity prices | Oil shocks Granger-cause agri price changes; connection rise in crises [23] | Granger causality, connectedness index | Covers global agri, not spice markets |
| 14 | 2018 | Asymmetric Risk Spillovers Between Oil and Agricultural Commodities | Shahzad | Analyse asymmetric tail risk spillovers between oil and agri commodities | Oil, wheat, maize, soybeans, rice prices | Risk spillovers intensify during crises; symmetric tail dependence [24] | ARMA-GARCH, CoVaR, quantile analysis | No comparison between commodities |
| 15 | 2016 | Spillovers Between Output and Stock Prices: A Wavelet Approach | McMillan, Tiwari | Study time-varying spillovers between output and stock prices | GDP, stock prices, consumption, inflation | Bidirectional spillover; output leads stock prices [25] | Wavelet decomposition, Diebold-Yilmaz index | Indicates macro transmission, not tied to agriculture |
| 16 | 2013 | Volatility Spillover Between Oil and Agricultural Commodity Markets | Nazlioglu, Erdem, Soytas | Test volatility transmission from oil prices to agri commodities pre/post 2006 | Oil, wheat, corn, soybeans, sugar | Post-2006, oil volatility affects agri prices [26] | Causality-in-variance, impulse response, GARCH | Does not explore deep learning or ML forecasting |
| 17 | 2005 | Futures Trading Activity and Commodity Cash Price Volatility | Jian Yang, Balyeat, Leatham | Examine the lead-lag between futures volume and cash price volatility | Futures volume and open interest, agri cash prices | Futures volume raises volatility; weak feedback from open interest [27] | GARCH, Granger causality, FEVD | No macroeconomic or ML angle |
| 18 | 2005 | New Introduction to Multiple Time Series Analysis (VAR and Cointegration) | Helmut Lütkepohl | Provide comprehensive coverage on multiple time series analysis | Multivariate time series | VAR and VECM models are standard in econometrics [28] | Econometric theory, VECM models | Useful theoretically; lacks ML & commodity context |

Guided by the reviewed literature, a panel of macroeconomic and commodity variables was selected for empirical modelling. Table 1 summarises the reviewed literature, providing a comparative overview of studies examining commodity price spillovers, volatility transmission, and forecasting methods in agricultural and financial markets. Table 2 provides detailed definitions and sources of all variables employed in the analysis.

Table 2. Variable definitions and sources.

| Variable | Definition | Source |
| --- | --- | --- |
| Nifty50 | The Indian equity benchmark index, comprising 50 leading firms on the NSE, represents the overall market sentiment. [29] | SEBI |
| GDP (Constant Prices) | Quarterly real GDP adjusted for inflation; an indicator of economic growth. [30] | Ministry of Commerce, Govt. of India |
| Indian Foreign Trade (IFT) | Combined quarterly value of exports and imports, measuring trade openness. | Ministry of Commerce, Govt. of India |
| Interest Rate (Repo Rate) | RBI’s policy repo rate, the key instrument for liquidity control and inflation targeting. [30] | Reserve Bank of India (RBI) |
| Foreign Institutional Investment (FII) | Net quarterly inflows from foreign institutional investors are a proxy for global capital sentiment [31] | Reserve Bank of India (RBI) |
| Gold Rate | Domestic gold price per 10g; a traditional inflation hedge and financial market indicator [32] | SEBI |
| Index of Industrial Production (IIP) | Composite indicator of industrial output, representing manufacturing and infrastructure growth. | MOSPI Official Data |
| Consumer Price Index (CPI) | Quarterly CPI: a standard measure of retail price inflation. | Reserve Bank of India (RBI) |
| Exchange Rate (INR/USD) | The average quarterly INR/USD rate indicates external competitiveness. | Reserve Bank of India (RBI) |
| Long-Term External Debt (ED) | Outstanding debt owed to foreign creditor [33] | Reserve Bank of India (RBI) |
| Turmeric Price (NCDEX) | Spot and futures prices of turmeric traded on NCDEX; used for price discovery and volatility analysis [34] | Reserve Bank of India (RBI) |
| Pepper Price (NCDEX) | Spot and futures prices of pepper traded on NCDEX are used for analysing price discovery and risk management [35] | NCDEX |

Despite these advances, there remains a scarcity of empirical studies that combine macroeconomic determinants with machine learning and econometric frameworks to analyse Indian spice markets, particularly for pepper and turmeric. This study contributes a novel, dual-framework approach by integrating the Vector Error Correction Model (VECM) with Random Forest and LSTM architectures to jointly assess long-run equilibrium, short-run dynamics, and nonlinear temporal dependencies in the prices of pepper and turmeric. By incorporating a comprehensive set of macroeconomic determinants, including GDP, CPI, exchange rate, interest rate, foreign institutional investments, and trade volume, this study extends existing evidence to a multi-model, multi-variable framework that captures both structural and data-driven patterns. Moreover, the study fills a key empirical gap by using quarterly data spanning 2010–2025, offering one of the most recent and methodologically integrated analyses of the Indian spice economy. The findings not only advance methodological rigour in agri-price forecasting but also provide policy-relevant insights for managing volatility, export planning, and sustainable agricultural trade under dynamic macroeconomic conditions.

3. Data and Methodology

3.1 Data

This study employs secondary quarterly time-series data spanning from 2010–11 (Q1) to 2025–26 (Q2). The dataset integrates a set of macroeconomic indicators that collectively capture both domestic economic activity and external sector dynamics relevant to spice price movements in India. The dependent variables are the spot and futures prices of turmeric and pepper, both traded on the National Commodity and Derivatives Exchange (NCDEX), which represent benchmark indicators for price discovery and volatility assessment in the Indian spice markets.

Given the quarterly frequency of the data, seasonal dummy variables are incorporated to capture cyclical influences such as harvest seasons and festival-induced demand surges. Additionally, policy dummies are included to represent structural shifts, notably the introduction of the Goods and Services Tax (GST) in 2017 (Q3) and the COVID-19 lockdown periods (2020: Q2–Q3), which significantly disrupted agricultural supply chains.
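As a minimal sketch of how such dummy variables could be coded, the following builds seasonal and policy indicators for 62 quarters. The helper names, the use of calendar quarters, the choice of Q1 as the base season, and the treatment of GST as a step dummy and COVID-19 as a pulse dummy are all assumptions made for illustration; the article does not state how the dummies were constructed.

```python
# Illustrative sketch: quarter labels and dummy definitions are assumptions,
# not the coding used in the article.

def quarter_range(start_year, start_q, n):
    """Return n consecutive (year, quarter) pairs starting at the given quarter."""
    out = []
    year, q = start_year, start_q
    for _ in range(n):
        out.append((year, q))
        q += 1
        if q > 4:
            year, q = year + 1, 1
    return out


def build_dummies(periods):
    """Build seasonal and policy dummy columns for a list of (year, quarter) pairs."""
    rows = []
    for year, q in periods:
        rows.append({
            "period": f"{year}Q{q}",
            # Q1 is the omitted base category, which avoids the dummy-variable trap.
            "s_q2": int(q == 2),
            "s_q3": int(q == 3),
            "s_q4": int(q == 4),
            # Step dummy: 1 from the GST quarter onwards (assumed permanent shift).
            "gst": int((year, q) >= (2017, 3)),
            # Pulse dummy: 1 only in the two lockdown quarters.
            "covid": int((year, q) in {(2020, 2), (2020, 3)}),
        })
    return rows


periods = quarter_range(2010, 1, 62)
dummies = build_dummies(periods)
```

A step dummy models a lasting level change, whereas a pulse dummy isolates a temporary disruption; which form suits each event is a modelling decision that this sketch does not settle.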

3.2 Methodology

The methodological framework integrates econometric and machine learning approaches to capture both long-run cointegration and short-term non-linear dynamics in spice price behaviour.

Agricultural commodity prices are influenced by both long-run economic fundamentals and short-run nonlinear market dynamics. Econometric models grounded in economic theory, such as the Vector Error Correction Model (VECM), are particularly suitable for identifying long-run equilibrium relationships and adjustment mechanisms among macroeconomic variables.36,37 In the context of agricultural markets, cointegration-based models have been widely used to analyse price transmission, market integration, and equilibrium adjustment.8

However, VECM relies on linear assumptions and often exhibits limited forecasting accuracy in the presence of nonlinearities, regime shifts, and structural breaks that frequently characterise agricultural commodity markets.38 Machine learning (ML) and deep learning (DL) models, by contrast, are designed to capture nonlinear relationships, higher-order interactions, and complex temporal dependencies without imposing restrictive functional forms.39

Random Forest (RF) models have demonstrated strong predictive performance in agricultural price forecasting due to their robustness to multicollinearity, noise, and overfitting, particularly in small and medium-sized datasets.40 Long Short-Term Memory (LSTM) networks extend this capability by explicitly modelling temporal dependence and long-memory effects in time series data, making them well-suited for capturing price dynamics during volatile periods.41,42

Accordingly, this study adopts a hybrid framework integrating VECM with RF and LSTM models. The VECM provides theory-consistent long-run economic interpretation, while ML and DL models enhance short-run predictive accuracy and capture nonlinear price behaviour. This combined approach enables a comprehensive assessment of spice price dynamics by striking a balance between interpretability, theoretical grounding, and forecasting performance.

3.2.1 Econometric framework

The econometric analysis begins with unit root testing using the Augmented Dickey–Fuller (ADF) test and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test to determine the order of integration. Variables found to be integrated of order one, I(1), are tested for cointegration using Johansen’s maximum likelihood method, based on both trace and maximum eigenvalue statistics with an unrestricted constant. The cointegration rank (r) is selected according to statistical significance, and normalised β-vectors are reported.

The optimal lag order (p) is selected via Akaike Information Criterion (AIC) under the Vector Autoregression (VAR) framework. The resulting Vector Error Correction Model (VECM) captures both the short-run adjustment dynamics (α) and long-run equilibrium relations (β). Diagnostic checks include: Lagrange Multiplier (LM) Test for serial correlation, ARCH Test for heteroskedasticity, and Doornik–Hansen Test for normality.

Dynamic responses are analysed using Impulse Response Functions (IRF) and Forecast Error Variance Decomposition (FEVD) to quantify the magnitude and persistence of shocks in macroeconomic variables on spice prices.43

Since both dependent series (pepper and turmeric) are found to be I(1), a restricted VECM is applied, defined as:

$$\Delta y_t = \alpha \beta' y_{t-1} + \sum_{i} \Gamma_i \Delta y_{t-i} + C x_t + \varepsilon_t$$

Where,

  • $y_t = [\ln(\text{Pepper}_t), \ln(\text{Turmeric}_t)]'$ represents endogenous price variables,

  • $x_t$ includes exogenous macroeconomic regressors,

  • β represents long-run cointegrating vectors,

  • α indicates speed of adjustment, and

  • $\Gamma_i$ captures short-run dynamic effects.

This formulation enables the model to isolate commodity-specific linkages (β) while accounting for broader macroeconomic shocks $x_t$. It reduces parameter complexity, mitigating risks of overfitting in small samples.
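To make the error-correction mechanism concrete, the sketch below iterates only the αβ' term of the equation, with the short-run, exogenous, and error terms set to zero. The coefficient values and starting point are hypothetical and are not the estimates reported in this article; they are chosen so that the deviation from equilibrium shrinks by a constant factor each period.

```python
# Hypothetical coefficients for illustration only; they are NOT the estimates
# reported in this article.
ALPHA = (-0.2, 0.1)   # speed-of-adjustment coefficients
BETA = (1.0, -1.0)    # cointegrating vector: equilibrium is y1 - y2 = 0


def ect(y, beta=BETA):
    """Error-correction term beta' y: the distance from long-run equilibrium."""
    return sum(b * v for b, v in zip(beta, y))


def vecm_step(y_prev, alpha=ALPHA, beta=BETA):
    """One step of Delta y_t = alpha * (beta' y_{t-1}).

    The Gamma_i, C x_t and epsilon_t terms of the full model are set to zero
    so that only the adjustment mechanism is visible.
    """
    e = ect(y_prev, beta)
    return tuple(v + a * e for v, a in zip(y_prev, alpha))


path = [(10.0, 9.0)]            # start one unit away from equilibrium
for _ in range(10):
    path.append(vecm_step(path[-1]))

# With these values the gap shrinks by the factor 1 + beta'alpha = 0.7 per step.
gaps = [ect(y) for y in path]
```

In this toy setting the first series falls and the second rises until the gap closes, which is the sense in which α measures the speed of adjustment.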

3.2.2 Machine learning and deep learning framework

To complement the econometric model, the study employs Random Forest (RF) and Long Short-Term Memory (LSTM) models for non-linear forecasting and pattern recognition in the same dataset.

Random Forest (RF): A robust ensemble learning technique that constructs multiple decision trees and aggregates results to minimise overfitting. It captures complex feature interactions and provides feature importance metrics, enabling the interpretation of which macroeconomic factors most influence spice prices.
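The ensemble idea can be illustrated with a toy example: many shallow trees, each fitted to a bootstrap sample and a random subset of features, with predictions averaged. The sketch below uses depth-1 regression trees and synthetic data, so it is a simplified stand-in rather than a full Random Forest implementation or the model estimated in this study; all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree, searching only the given features."""
    best = None
    for j in feature_ids:
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # no valid split: predict the sample mean everywhere
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_forest(X, y, n_trees=50, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, p // 2)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(p), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Average the individual tree predictions."""
    return mean(predict_stump(s, row) for s in forest)


# Synthetic data: the target depends only on the first feature.
X = [[i / 10, (i * 7) % 5] for i in range(40)]
y = [1.0 if row[0] > 2.0 else 0.0 for row in X]
forest = fit_forest(X, y)
low = predict_forest(forest, [0.5, 2])
high = predict_forest(forest, [3.5, 2])
```

Averaging over resampled trees is what dampens the variance of any single tree; stumps that happen to split on the uninformative second feature contribute roughly the same value to both predictions.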

Long Short-Term Memory (LSTM): A recurrent neural network architecture designed to capture long-term temporal dependencies in sequential data. The model learns from lagged values of the dependent variables and macroeconomic inputs to predict future prices of spices.
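Learning from lagged values requires reshaping the time series into fixed-length input windows paired with the next-period target. The following sketch shows one common way to do this; the window length of four quarters, the column layout, and the function name are assumptions, since the article does not report the look-back length used.

```python
def make_windows(rows, n_lags, target_col=0):
    """Convert a time-ordered table into (window, target) pairs.

    rows: list of feature vectors, one per quarter, oldest first.
    Each window holds the n_lags quarters before time t; the target is the
    value of column target_col at time t.
    """
    X, y = [], []
    for t in range(n_lags, len(rows)):
        X.append([list(r) for r in rows[t - n_lags:t]])  # n_lags x n_features
        y.append(rows[t][target_col])
    return X, y


# Hypothetical [price, macro indicator] pairs for eight quarters.
rows = [[100 + t, 50 - t] for t in range(8)]
X, y = make_windows(rows, n_lags=4)
```

With eight quarters and four lags this yields four training pairs, which shows how quickly windowing consumes observations in a short quarterly sample.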

Both models are trained on normalised data with an 80:20 train-test split; given 62 quarterly observations, this corresponds to approximately 50 observations for training and 12 for testing. Performance is evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and symmetric Mean Absolute Percentage Error (sMAPE). SHAP (Shapley Additive Explanations) analysis is applied post-training to interpret the influence of features at both global and local levels, thereby bridging the gap between predictive accuracy and transparency.
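The split and the three error metrics can be sketched as follows. Two points are assumptions rather than details given in the article: the split is taken chronologically (the last 20% of quarters form the test set), and sMAPE uses the variant with the sum of absolute values in the denominator, which is only one of several definitions in use.

```python
import math


def chronological_split(values, train_share=0.8):
    """Split a time-ordered list without shuffling, so the test set is the most recent data."""
    cut = round(len(values) * train_share)
    return values[:cut], values[cut:]


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE in percent: mean of 2|a - p| / (|a| + |p|)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        total += 0.0 if denom == 0 else 2 * abs(a - p) / denom
    return 100 * total / len(actual)


quarters = list(range(62))          # stand-in for 62 quarterly observations
train, test = chronological_split(quarters)

# Hypothetical actual and forecast values, used only to exercise the metrics.
actual, forecast = [100.0, 200.0], [110.0, 180.0]
scores = (rmse(actual, forecast), mae(actual, forecast), smape(actual, forecast))
```

Any scaling parameters would normally be estimated on the training portion alone so that no test-period information leaks into model fitting.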

Alternative econometric approaches, such as univariate ARIMA, GARCH, and ARDL models, were not adopted because they either focus on a single dependent variable or are limited in capturing multivariate cointegration and cross-market interactions.44 Given the joint determination of pepper and turmeric prices with macroeconomic variables, a multivariate cointegration framework was necessary.

While more complex deep-learning architectures such as Transformers have shown promise in large-scale financial datasets, they require substantially larger sample sizes to achieve stable estimation and avoid overfitting.45 With a relatively small quarterly sample, simpler yet robust architectures, such as Random Forest and LSTM, are empirically more suitable and widely validated in commodity price forecasting studies.42
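The SHAP analysis used in this framework is built on Shapley values, which attribute a prediction to individual features by averaging each feature's marginal contribution over all feature coalitions. The sketch below computes exact Shapley values by brute-force enumeration for a three-feature toy model; the model, the feature names, and the baseline are hypothetical, and practical SHAP tooling relies on faster approximations rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical price model: additive in CPI, with an ER x IR interaction."""
    cpi, er, ir = z
    return 2.0 * cpi + er * ir


x = [3.0, 2.0, 4.0]
baseline = [1.0, 1.0, 1.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction effect is shared between the two interacting features.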

3.2.3 Conceptual framework

The conceptual workflow of this study (Model 1) illustrates the integrated application of econometric and machine-learning approaches to analyse spice price dynamics. The analysis begins with the collection of macroeconomic indicators and commodity price data for Pepper and Turmeric, followed by data preprocessing procedures that include normalisation, seasonal and policy adjustments, and feature engineering. Econometric modelling using the Vector Error Correction Model (VECM) is then employed to identify long-run cointegration relationships and causal linkages among variables. To complement this, machine learning and deep learning models, specifically Random Forest (RF) and Long Short-Term Memory (LSTM), are applied to capture nonlinear patterns and enhance short-run forecasting performance. Model accuracy is evaluated using RMSE, MAE, sMAPE, and the Diebold–Mariano test for comparative forecast assessment. Finally, explainability techniques such as SHAP and feature importance analysis are used to interpret the contribution of key macroeconomic drivers, enabling the derivation of policy-relevant and market-oriented insights for exporters, policymakers, and farmers.

Model 1: Conceptual framework


(Author’s Design)

This hybrid econometric–AI framework enhances both predictive power and interpretability, providing a balanced approach to understanding and forecasting the dynamics of agricultural commodity prices.
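The Diebold–Mariano comparison mentioned in the workflow can be sketched as below. This is a basic version under stated assumptions: squared-error loss, one-step-ahead forecasts (so no autocorrelation correction of the variance), and a normal approximation for the p-value, which is rough for a test set of about 12 quarters. The forecast errors are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """DM statistic for equal predictive accuracy under squared-error loss.

    A negative statistic means model A has the smaller average loss.
    Assumes one-step-ahead forecasts, so the loss differential is treated
    as serially uncorrelated.
    """
    d = [a * a - b * b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((v - d_bar) ** 2 for v in d) / (n - 1)
    stat = d_bar / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided, standard normal
    return stat, p_value


# Hypothetical forecast errors for two models over 12 test quarters.
errors_a = [0.5, -0.4, 0.6, -0.3, 0.7, -0.5, 0.4, -0.6, 0.5, -0.4, 0.3, -0.7]
errors_b = [1.0, -0.9, 1.2, -0.8, 1.1, -1.0, 0.9, -1.3, 1.0, -0.7, 0.8, -1.2]
stat, p_value = diebold_mariano(errors_a, errors_b)
```

Swapping the two error series flips the sign of the statistic and leaves the p-value unchanged, which is a quick sanity check on any implementation.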

4. Findings and Discussion

4.1 Results

The analysis utilises quarterly time-series data from 2010Q1 to 2025Q2, comprising 62 observations for each variable, including Pepper, Turmeric, Nifty50, GDP, CPI, ER, ED, IFT, IIP, Gold, IR, and FII.

4.2 Descriptive statistics and stationarity

According to Table 3, Pepper exhibits a higher mean price (27,200 versus 8,930) and a larger standard deviation (9,240 versus 3,400) than Turmeric, indicating a greater sensitivity to supply shocks and global market dynamics. CPI and ER show persistent upward trends, consistent with inflation and currency depreciation. Nifty50 and GDP demonstrate broader economic cycles. The wide dispersion of ED, IFT, FII and Gold suggests significant macroeconomic variability over the sample period.

Table 3. Summary statistics (2010 Q1–2025 Q2).

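| Variable | Mean | Median | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| Nifty50 | 10,600 | 9,460 | 4,720 | 4,950 | 19,600 |
| GDP | 2,860,000 | 2,940,000 | 737,000 | 1,160,000 | 3,980,000 |
| CPI | 143 | 140 | 22.9 | 105 | 186 |
| ER | 66.7 | 67 | 11.3 | 44.7 | 82.7 |
| ED | 2,720,000 | 2,670,000 | 954,000 | 996,000 | 4,150,000 |
| IFT | 568,000 | 505,000 | 199,000 | 250,000 | 944,000 |
| IIP | 99.8 | 101 | 3.68 | 86.8 | 105 |
| Gold | 40,000 | 31,000 | 14,500 | 20,500 | 64,400 |
| IR | 6.72 | 6.62 | 1.53 | 4.25 | 9.88 |
| FII | 44,900 | 15,300 | 115,000 | 3,060 | 855,000 |
| Pepper | 27,200 | 22,500 | 9,240 | 16,500 | 43,400 |
| Turmeric | 8,930 | 7,550 | 3,400 | 5,560 | 18,000 |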

Table 4 reports the ADF test, which examines the presence of a unit root, and the KPSS test, which assesses stationarity in the presence of a deterministic trend. In several cases the tests provide conflicting signals: for GDP, ADF rejects a unit root while KPSS rejects stationarity, and for CPI neither null hypothesis is rejected. In line with standard time-series pre-testing practice, variables displaying mixed results were treated as I(1) unless both tests consistently indicated stationarity. Accordingly, all variables except Nifty50 and gold prices were treated as non-stationary in levels and were differenced before inclusion in the VECM. This approach ensures consistency in the presence of borderline or ambiguous unit-root behaviour.

Table 4. Unit root tests (ADF & KPSS summary).

| Variable | ADF p-value | KPSS | Stationarity interpretation |
| --- | --- | --- | --- |
| Nifty50 | 0.052 | 0.462 | I(1) |
| GDP | 0.023 | 0.019 | Mixed → Treated as I(1) |
| CPI | 0.539 | >0.10 | I(1) |
| ER (Exchange Rate) | 0.210 | >0.10 | I(1) |
| ED (External Debt) | 0.461 | 0.048 | Mixed → Treated as I(1) |
| IFT (Trade) | 0.841 | >0.10 | I(1) |
| IIP | 0.0148 | 0.085 | Mixed → Treated as I(1) |
| Gold | 0.753 | >0.10 | I(1) |
| IR (Interest Rate) | 0.0377 | 0.041 | Mixed → Treated as I(1) |
| FII | ≈0 | 0.040 | Mixed → Treated as I(1) |
| Pepper | 0.896 | 0.012 | I(1) |
| Turmeric | 0.547 | >0.10 | I(1) |

The combined use of Augmented Dickey–Fuller (ADF) and KPSS tests strengthens stationarity assessment by exploiting their complementary null hypotheses. While the ADF tests for the presence of a unit root, the KPSS tests the null hypothesis of stationarity. Employing both tests reduces the risk of incorrect integration classification, particularly in small samples where test power may be limited.46
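Because the two tests have opposite null hypotheses, their joint outcome falls into one of a few cases. The sketch below encodes the conservative rule described in this section, treating a series as I(0) only when both tests point to stationarity. The 5% significance level and the function name are assumptions, and the sketch is not claimed to reproduce every classification in Table 4.

```python
def classify(adf_p, kpss_p, level=0.05):
    """Combine ADF (null: unit root) and KPSS (null: stationarity) p-values.

    Returns (integration order, reason). A series is treated as I(0) only
    when ADF rejects its null and KPSS does not reject its null.
    """
    adf_rejects_unit_root = adf_p < level
    kpss_rejects_stationarity = kpss_p < level
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "I(0)", "both tests indicate stationarity"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "I(1)", "both tests indicate a unit root"
    return "I(1)", "mixed evidence, treated as I(1)"


gdp = classify(0.023, 0.019)        # p-values as reported for GDP
pepper = classify(0.896, 0.012)     # p-values as reported for Pepper
stationary = classify(0.01, 0.50)   # hypothetical clearly stationary series
```

Borderline p-values can change category when the significance level changes, so the chosen level should be reported alongside the classification.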

4.3 Cointegration and lag selection

Table 5 shows that both the Trace and the Maximum-Eigenvalue tests reject the null hypothesis at every rank reported (r = 0 to 4, all p < 0.05), beginning with the null of no cointegration. This points to multiple long-run equilibrium relationships among the variables and validates the application of a Vector Error Correction Model (VECM). Johansen’s cointegration approach is employed because it allows for the identification of multiple cointegrating relationships within a multivariate system, unlike single-equation methods such as the Engle–Granger approach. This is particularly relevant for agricultural markets where prices and macroeconomic variables are jointly determined.28,47

Table 5. Johansen cointegration results.

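| Rank | Trace statistic | p-value | Max-Eigen statistic | p-value |
| --- | --- | --- | --- | --- |
| 0 | 413.29 | 0.000 | 112.29 | 0.000 |
| 1 | 301.01 | 0.000 | 83.13 | 0.000 |
| 2 | 217.88 | 0.000 | 60.54 | 0.0003 |
| 3 | 157.34 | 0.000 | 44.07 | 0.013 |
| 4 | 113.27 | 0.000 | 38.81 | 0.009 |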
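The two Johansen statistics are linked: the trace statistic at rank r sums the terms for all remaining eigenvalues, while the maximum-eigenvalue statistic uses only the next one, so the difference between consecutive trace statistics should equal the maximum-eigenvalue statistic. The short check below applies this identity to the values in Table 5; the variable names are illustrative.

```python
# Values as reported in Table 5 (ranks 0 to 4).
trace = [413.29, 301.01, 217.88, 157.34, 113.27]
max_eigen = [112.29, 83.13, 60.54, 44.07, 38.81]

# trace(r) - trace(r + 1) should reproduce max_eigen(r), up to rounding.
diffs = [round(trace[r] - trace[r + 1], 2) for r in range(len(trace) - 1)]
gaps = [abs(d - m) for d, m in zip(diffs, max_eigen)]
consistent = all(g <= 0.02 for g in gaps)
```

The reported figures satisfy the identity to within rounding, which supports reading the flattened table columns in the order shown.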

VAR lag length selection

Table 6 shows that all three information criteria (AIC, BIC, HQC) are lowest at lag 1, which is therefore selected as the optimal lag. Information criteria balance model fit against parsimony, and a short lag length is typical for quarterly macroeconomic data, where over-parameterisation quickly erodes degrees of freedom.28

Table 6. VAR lag length selection.

| Lag | Log L | AIC | BIC | HQC |
| --- | --- | --- | --- | --- |
| 1 | –873.559 | 34.906 | 36.182 | 35.395 |
| 2 | –870.808 | 34.954 | 36.380 | 35.501 |

4.4 Long-run and short-run VECM results

Table 7 shows that Pepper does not adjust significantly to long-run disequilibrium, while Turmeric shows a strong and significant adjustment speed. This indicates that Turmeric is the market that restores long-run equilibrium when deviations occur.

Table 7. VECM cointegration vector & adjustment coefficients.

| Variable | β (Pepper=1) | α (Pepper) | α (Turmeric) |
| --- | --- | --- | --- |
| Pepper | 1.000 | 0.00152 (ns) | |
| Turmeric | –6.941 | | –0.02302 (p < 0.01) |

Table 8 indicates that Pepper’s short-run dynamics are only weakly linked to Nifty50 (significant at the 10% level), implying relative independence from immediate macro shocks. In contrast, Turmeric responds significantly to CPI, ER, ED, IFT, Gold, and FII, and marginally to GDP (p = 0.062), reflecting stronger sensitivity to domestic and trade conditions.

Table 8. Significant short-run VECM coefficients.

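Δ Pepper equation

| Variable | Coeff. | SE | p-value |
|---|---|---|---|
| Nifty50 | –0.778 | 0.435 | 0.080* |

Δ Turmeric equation

| Variable | Coeff. | SE | p-value |
|---|---|---|---|
| GDP | –0.00155 | 0.00081 | 0.062* |
| CPI | –48.11 | 22.32 | 0.036** |
| ER | –223.33 | 96.99 | 0.026** |
| ED | 0.00354 | 0.00157 | 0.029** |
| IFT | 0.00977 | 0.00319 | 0.004*** |
| Gold | 0.0811 | 0.0376 | 0.036** |
| FII | 0.00412 | 0.00116 | 0.001*** |
| ECT | –0.02302 | 0.00525 | 0.0001*** |

* p < 0.10, ** p < 0.05, *** p < 0.01.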
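
The significance markers in Table 8 follow directly from the reported p-values, and each coefficient's t-ratio is its estimate divided by its standard error. A minimal sketch for the Δ Turmeric equation, using only the values in Table 8 (the names `rows`, `stars`, and `t_ratios` are illustrative):

```python
# (coefficient, standard error, reported p-value) from Table 8,
# Δ Turmeric equation.
rows = {
    "GDP": (-0.00155, 0.00081, 0.062),
    "CPI": (-48.11, 22.32, 0.036),
    "ER": (-223.33, 96.99, 0.026),
    "ED": (0.00354, 0.00157, 0.029),
    "IFT": (0.00977, 0.00319, 0.004),
    "Gold": (0.0811, 0.0376, 0.036),
    "FII": (0.00412, 0.00116, 0.001),
    "ECT": (-0.02302, 0.00525, 0.0001),
}


def stars(p_value):
    """Map a p-value to the star convention in the table footnote."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# t-ratio = coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (coef, se, p_value) in rows.items():
    print(f"{name:5s} t = {t_ratios[name]:6.2f}  p = {p_value:<6} {stars(p_value)}")
```

Reading the t-ratios alongside the coefficients is useful here because the coefficients are on very different scales (for example ER against GDP), so their raw magnitudes are not comparable.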

4.5 Variance decomposition

Table 9 shows that, at the 20-quarter horizon, Turmeric’s forecast-error variance is primarily driven by interest-rate shocks (~37%) and Gold (~15%), indicating a strong sensitivity to monetary and inflationary pressures.

Table 9. Turmeric FEVD.

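| Horizon | Nifty50 | ER | ED | IFT | Gold | IR | Pepper | Turmeric |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.57 | 0.49 | 15.98 | 26.47 | 0.49 | 0.06 | 15.59 | 40.35 |
| 10 | 11.29 | 5.07 | 5.59 | 4.80 | 12.62 | 31.23 | 15.68 | 13.72 |
| 20 | 13.18 | 6.52 | 3.51 | 2.46 | 14.71 | 36.72 | 13.87 | 9.02 |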

Table 10 shows that Pepper is highly self-driven (66% initially) but becomes increasingly influenced by External Debt and IR at longer horizons. Turmeric contributes little to Pepper’s variance (~1%).

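Table 10. Pepper FEVD.

| Horizon | Nifty50 | ER | ED | IFT | Gold | IR | Pepper | Turmeric |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.12 | 1.50 | 2.32 | 19.53 | 5.88 | 2.68 | 65.98 | 0.00 |
| 10 | 4.17 | 3.27 | 30.81 | 5.00 | 1.55 | 20.78 | 33.30 | 1.12 |
| 20 | 4.25 | 3.71 | 32.61 | 4.01 | 1.23 | 19.70 | 33.47 | 1.02 |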
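
Each FEVD row splits the forecast-error variance of one commodity across the shocks in the system, so the shares in a row should sum to about 100%. The sketch below checks this for the values in Tables 9 and 10 and lists the two largest contributors other than the commodity's own shocks. The data layout and function name are illustrative.

```python
COLUMNS = ["Nifty50", "ER", "ED", "IFT", "Gold", "IR", "Pepper", "Turmeric"]

# Shares (%) transcribed from Tables 9 and 10, keyed by horizon.
fevd = {
    "Turmeric": {
        1: [0.57, 0.49, 15.98, 26.47, 0.49, 0.06, 15.59, 40.35],
        10: [11.29, 5.07, 5.59, 4.80, 12.62, 31.23, 15.68, 13.72],
        20: [13.18, 6.52, 3.51, 2.46, 14.71, 36.72, 13.87, 9.02],
    },
    "Pepper": {
        1: [2.12, 1.50, 2.32, 19.53, 5.88, 2.68, 65.98, 0.00],
        10: [4.17, 3.27, 30.81, 5.00, 1.55, 20.78, 33.30, 1.12],
        20: [4.25, 3.71, 32.61, 4.01, 1.23, 19.70, 33.47, 1.02],
    },
}


def top_drivers(shares, n=2, exclude=None):
    """Largest n contributors, optionally excluding one column (e.g. own shocks)."""
    pairs = [(col, share) for col, share in zip(COLUMNS, shares) if col != exclude]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


row_sums = {
    (target, horizon): sum(shares)
    for target, by_horizon in fevd.items()
    for horizon, shares in by_horizon.items()
}

for (target, horizon), total in row_sums.items():
    drivers = top_drivers(fevd[target][horizon], exclude=target)
    print(f"{target:8s} h={horizon:2d} sum={total:6.2f} top={drivers}")
```

The cross-commodity entries can be read off the same structure: the Pepper column of the Turmeric table gives Pepper's share in Turmeric's variance, and the Turmeric column of the Pepper table gives the reverse.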

4.6 Impulse response functions

Figure 1 shows that a Pepper shock produces strong, persistent increases in Pepper and modest, delayed increases in Turmeric. A Turmeric shock produces immediate but short-lived effects. ER depreciation increases Pepper strongly. IR hikes depress Turmeric. Gold shocks increase both commodities over medium horizons. Impulse Response Functions (IRFs) and Forecast Error Variance Decomposition (FEVD) are employed to examine the dynamic transmission of macroeconomic shocks and to quantify the relative importance of each variable in explaining price volatility over time, consistent with macro-financial transmission and price spillover theory.48


Figure 1. IRFs for Pepper and Turmeric.

(Author’s computation)

This figure presents the impulse response functions (IRFs) for Turmeric (left panel) and Pepper (right panel) in response to shocks across key macroeconomic variables, including Nifty50, exchange rate, external debt, gold prices, interest rate, IFT, IIP, GDP, and FII. Each response is plotted over a 20-period horizon to illustrate both the short-run reactions and the long-run adjustment dynamics following structural shocks.
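
The mechanics of an impulse response can be illustrated with a deliberately simplified example. The sketch below traces a one-unit shock through a two-variable VAR(1), y_t = A y_{t-1} + e_t, where the response at horizon h is A^h applied to the shock vector. The coefficient matrix is hypothetical and is not estimated from the study's data; the IRFs in Figure 1 come from the full VECM and may use a different shock identification.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def impulse_responses(a, shock, horizons):
    """Responses A^h @ shock for h = 0..horizons in y_t = A y_{t-1} + e_t."""
    path = [list(shock)]
    for _ in range(horizons):
        path.append(mat_vec(a, path[-1]))
    return path


# Hypothetical coefficients (NOT estimates from this study).
# Variable order: [pepper, turmeric].
A = [
    [0.8, 0.0],  # pepper depends on its own lag only
    [0.1, 0.5],  # turmeric depends on lagged pepper and its own lag
]

# One-unit shock to pepper at h = 0, traced over 20 periods.
irf = impulse_responses(A, shock=[1.0, 0.0], horizons=20)

for h in (0, 1, 2, 5, 10, 20):
    pepper, turmeric = irf[h]
    print(f"h={h:2d}  pepper={pepper:.4f}  turmeric={turmeric:.4f}")
```

With coefficients of this shape, the own response decays gradually while the cross response starts at zero and builds with a delay, which is the qualitative pattern described for a Pepper shock.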

4.7 Forecasting performance

Table 11 shows that LSTM achieves the lowest RMSE, MAE, and sMAPE for Pepper, while Random Forest clearly dominates for Turmeric. Overall, Random Forest provides more robust performance across commodities, whereas LSTM is competitive only for Pepper and substantially overfits Turmeric. VECM operates on absolute price levels rather than normalised values, producing large raw RMSE/MAE magnitudes that are not directly comparable with RF and LSTM.

Table 11. Forecasting Accuracy (RF, LSTM, VECM).

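| Commodity | Model | RMSE | MAE | sMAPE |
|---|---|---|---|---|
| Pepper | Random Forest | 2,077.37 | 2,075.98 | 9.26 |
| Pepper | LSTM | 1,948.66 | 1,805.60 | 8.31 |
| Pepper | VECM | 1,499,677 | 961,048 | |
| Turmeric | Random Forest | 734.31 | 615.30 | 4.11 |
| Turmeric | LSTM | 3,368.40 | 3,368.03 | 25.81 |
| Turmeric | VECM | 485,739,600 | 315,535,400 | |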
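
For reference, the three accuracy metrics in Table 11 can be sketched as follows. The sMAPE formula shown is one common definition (mean of 2|a − p| / (|a| + |p|), in percent) and is assumed here, since variants exist; the sample prices are hypothetical and are not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed definition; undefined if a = p = 0)."""
    terms = [2.0 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical prices for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]

print(f"RMSE  = {rmse(actual, predicted):.3f}")
print(f"MAE   = {mae(actual, predicted):.3f}")
print(f"sMAPE = {smape(actual, predicted):.3f}%")
```

RMSE and MAE carry the units of the series being forecast, so values computed on differently scaled series cannot be compared directly, whereas sMAPE is unit-free.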

Figures 2 and 3 show that RF closely tracks the actual series, particularly during the volatile phases that followed 2020. LSTM captures Pepper trends but struggles with Turmeric’s spikes. Overall, Random Forest demonstrates stronger generalizability across commodities, whereas LSTM performs competitively only for Pepper and loses stability in Turmeric’s more volatile price structure.


Figure 2. Actual vs Predicted Prices Pepper (RF & LSTM).

(Author’s computation).

This figure compares actual Pepper prices with forecasts generated by the Random Forest and LSTM models for the test period (2023–2024). The plot illustrates how closely each model tracks observed price movements.


Figure 3. Actual vs Predicted Prices Turmeric (RF & LSTM).

(Author’s computation).

This figure compares observed Turmeric prices with predictions generated by the Random Forest and LSTM models for the test period (2023–2024), showing the relative accuracy of each approach.

4.8 Feature importance and SHAP analysis

Figure 4 shows that Pepper depends primarily on Nifty50, GDP, Gold, ER, and IR.


Figure 4. Random Forest Feature Importance Pepper.

(Author’s computation).

This figure displays the top 10 most influential features contributing to the Random Forest model’s prediction of Pepper prices. Feature importance values represent the relative contribution of each lagged or aggregated macroeconomic predictor.


Figure 5. Random Forest Feature Importance Turmeric.

(Author’s computation).

This figure reports the top 10 features driving Random Forest predictions for Turmeric prices. Importance values reflect each variable’s relative contribution within the model.

Figure 5 shows that Turmeric is driven by CPI, Gold, IFT, ER, and IR.

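Figures 6 and 7 show that the SHAP values are consistent with the economic interpretation: higher CPI is associated with higher predicted Turmeric prices, a higher Nifty50 with higher predicted Pepper prices, exchange-rate appreciation with lower prices, and higher interest rates with lower prices for both Turmeric and Pepper.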


Figure 6. SHAP Summary Plot Pepper.

(Author’s computation).

This SHAP beeswarm plot illustrates how individual features influence the Random Forest model’s predictions of Pepper prices. Warmer colours indicate higher feature values, while the distribution along the x-axis reflects each variable’s impact magnitude.


Figure 7. SHAP Summary Plot Turmeric.

(Author’s computation).

This SHAP beeswarm plot illustrates how individual features influence the Random Forest model’s predictions of Turmeric prices. Warmer colours indicate higher feature values, while the distribution along the x-axis reflects each variable’s impact magnitude.

Across both commodities, exchange rate and interest rate consistently emerge as dominant predictors, with stock market performance (NIFTY50), inflation indicators (CPI), and gold prices contributing substantially. For Pepper, GDP and NIFTY50 exert relatively stronger influence, whereas Turmeric is more sensitive to CPI and input price variations (IFT). These SHAP patterns align with the commodity-specific economic structure discussed earlier.
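
SHAP attributions are based on Shapley values, which divide the difference between a prediction and a baseline prediction among the input features. The sketch below computes exact Shapley values for a small hypothetical model by enumerating feature coalitions; it does not use the `shap` library or the fitted Random Forest, and the model, feature values, and baseline are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value, which is
    one common simplifying way to define the value of a coalition.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


# Hypothetical toy model (NOT the study's model); features are
# [cpi, exchange_rate, interest_rate] in arbitrary units.
def toy_model(z):
    cpi, er, ir = z
    return 2.0 * cpi + 1.0 * er - 3.0 * ir + 0.5 * cpi * er


x = [2.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["cpi", "exchange_rate", "interest_rate"], phi):
    print(f"{name:14s} {contribution:+.2f}")
```

In this toy case the attributions sum to the gap between the prediction and the baseline prediction, and the interaction term is shared equally between the two features involved. The sign of each attribution gives the direction of a feature's effect on that prediction, which is how the directional statements above are read from a SHAP summary plot.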

4.9 Model diagnostics

Table 12 shows no evidence of residual autocorrelation, but the residuals display ARCH effects and non-normality, which are common in macro-financial data and support the use of ML alongside the VECM. VECM stability was assessed using the moduli of the companion-matrix eigenvalues. For the Pepper–Turmeric system, the eigenvalues were 0.979 and 1.021. Because one eigenvalue slightly exceeds unity, the system is marginally unstable, suggesting weak unit-root behaviour or near-nonstationary adjustment dynamics. This does not by itself invalidate the VECM, but it implies that long-run equilibrium adjustment may be slow and sensitive to specification.

Table 12. VECM diagnostics.

| Test | Statistic | p-value | Interpretation |
|---|---|---|---|
| Autocorrelation (Rao F) | 0.84 | 0.502 | No autocorrelation |
| ARCH | 54.54 | 0.025 | ARCH present |
| Normality | 16.64 | 0.002 | Non-normal residuals |
| Residual Corr. | –0.02 | | Very low correlation |
| Eigenvalues | 0.979, 1.021 | | Marginally unstable (one eigenvalue > 1) |
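
The stability reading in Table 12 amounts to comparing each eigenvalue's modulus with one. A minimal sketch of that comparison, applied to the two reported values (the function name and numerical tolerance are assumptions):

```python
def classify_roots(eigenvalues, tol=1e-8):
    """Split eigenvalues by modulus relative to the unit circle.

    abs() returns the modulus for both real and complex eigenvalues.
    """
    inside = [z for z in eigenvalues if abs(z) < 1.0 - tol]
    on_circle = [z for z in eigenvalues if abs(abs(z) - 1.0) <= tol]
    outside = [z for z in eigenvalues if abs(z) > 1.0 + tol]
    return inside, on_circle, outside


reported = [0.979, 1.021]  # eigenvalue moduli reported in Table 12
inside, on_circle, outside = classify_roots(reported)

print("inside unit circle: ", inside)
print("on unit circle:     ", on_circle)
print("outside unit circle:", outside)
```

An eigenvalue with modulus above one means that, in the estimated system, the effect of a shock along that direction grows rather than decays, which is why the table flags the system as marginally unstable.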

Table 13 compares the forecasting accuracy of the three models across both commodities. On the raw scale, the VECM produces very large RMSE and MAE values because it operates on unscaled price levels. The machine-learning models, Random Forest and LSTM, report far smaller errors and provide realistic short-run forecasts. A relative error metric (sMAPE), reported here for Random Forest and LSTM, allows a scale-free comparison. Random Forest achieves the strongest predictive accuracy overall (9.26% for Pepper; 4.11% for Turmeric). LSTM performs slightly better for Pepper (8.31%) but overfits Turmeric (25.81%). This supports Random Forest as the more reliable model across both commodities under typical market conditions.

Table 13. Combined model comparison.

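| Commodity | Model | RMSE (raw) | MAE (raw) | sMAPE (%) |
|---|---|---|---|---|
| Pepper | VECM | 1,499,677 | 961,048 | |
| Pepper | Random Forest | 2,077.37 | 2,075.98 | 9.26 |
| Pepper | LSTM | 1,948.66 | 1,805.60 | 8.31 |
| Turmeric | VECM | 485,739,600 | 315,535,400 | |
| Turmeric | Random Forest | 734.31 | 615.30 | 4.11 |
| Turmeric | LSTM | 3,368.40 | 3,368.03 | 25.81 |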

Figure 8 visually confirms the results in Table 13. Panel A uses a logarithmic scale to illustrate the large raw-scale errors produced by VECM compared with the machine-learning models. Random Forest and LSTM both produce much lower RMSE values for Pepper and Turmeric.


Figure 8. Combined Model Performance Chart.

(Author’s computation).

This figure compares model performance based on RMSE and MAE for VECM, Random Forest, and LSTM. The logarithmic vertical axis highlights large scale differences between econometric and machine learning model errors.

Panel B compares sMAPE, a scale-free measure. Random Forest has the lowest percentage error for Turmeric (4.11%), while LSTM is marginally lower for Pepper (8.31% versus 9.26%) but performs poorly for Turmeric (25.81%). In contrast, VECM lacks short-run predictive power but remains valuable for structural long-run analysis.

Overall, the combined econometric and machine learning evidence indicates that Pepper is a slower-adjusting, export-linked commodity whose long-run dynamics are primarily driven by exchange rate movements, external debt, and interest rate shocks. In contrast, Turmeric behaves as a domestically anchored, inflation-sensitive commodity that adjusts rapidly to short-run macroeconomic disturbances, particularly changes in the CPI, interest rates, gold prices, and trade-related variables. The VECM confirms strong long-run cointegration but limited short-run predictability, whereas the machine-learning models, especially Random Forest, capture nonlinear behaviour and deliver substantially improved forecast performance. Together, these findings highlight structurally different price-formation mechanisms across the two markets and reinforce the value of combining econometric and ML methods for analysing and forecasting agricultural commodity prices.

4.10 Discussions

This study presents a comprehensive investigation of India’s high-value spices, Pepper and Turmeric, through an integrated econometric and machine-learning framework (Johansen cointegration, VECM, FEVD, IRFs, Random Forest, LSTM, SHAP). The hybrid approach enables the identification of long-run equilibria, short-run nonlinear responses, and predictive performance, thus providing a deeper understanding of how macroeconomic variables drive spice price behaviour.

4.10.1 Macroeconomic determinants and structural dynamics

The evidence from cointegration and adjustment dynamics confirms robust long-run relationships between macro-variables and spice prices. For Pepper, economic growth (GDP), export/trade flows (IFT), and interest rates emerge as dominant long-run determinants, underscoring its dependence on broader economic and trade cycles. This aligns with earlier findings,49 which identified exchange rate, GDP, and interest rate as key influences on Indian pepper exports. For Turmeric, our results indicate that gold prices, CPI inflation, and trade-related variables have a stronger long-run influence, reflecting its closer linkage with inflationary dynamics and domestic consumption patterns. This finding is consistent with broader studies of spice markets.50

The analysis also reveals greater volatility in Turmeric after 2020, likely driven by post-pandemic supply-chain disruptions and speculative demand spikes, a pattern similar to that reported previously.7 The results indicate that Turmeric’s long-run dynamics are primarily driven by interest-rate and gold-price shocks, while globally linked trade and financial variables (FII) exhibit stronger short-run roles.

For Pepper, long-run sensitivity to interest-rate policy contrasts with weaker short-run responses and slow adjustment: its error-correction coefficient is small and statistically insignificant. This is in line with earlier findings,51 which documented lagged responses in the Indian pepper market, and reflects Pepper’s export dependence and sensitivity to external macroeconomic shocks. In comparison, Turmeric corrects disequilibria more readily (significant ECT), consistent with its strong linkage to domestic inflation and internal demand-supply cycles.

4.10.2 Machine learning, forecasting accuracy, and interpretability

The inclusion of Random Forest and LSTM models adds a new dimension by capturing nonlinearities that traditional VECM may overlook. Random Forest identified GDP, ER depreciation, gold price and CPI as the most influential features for both commodities. SHAP analysis validated the directionality: rising GDP and weaker INR drive higher spice prices, while interest-rate hikes exert downward pressure. These feature rankings echo the VECM long-run coefficients, supporting the structural validity of the ML models.

Notably, LSTM achieved the best RMSE/MAE for Pepper and matched RF in many segments, particularly during volatile periods (2020–22). However, for Turmeric, LSTM overfitted and yielded a higher sMAPE, whereas Random Forest remained robust with the lowest sMAPE (4.11%). This aligns with earlier evidence,52 which indicates that tree-based models often yield superior stability compared to neural-network architectures in small samples. Taken together, these results indicate that Random Forest is the most reliable forecasting model overall, even though LSTM has a slight edge for Pepper.

Together, the econometric and ML results confirm that macroeconomic drivers remain central to price formation, while ML models enhance short-term forecasting reliability in volatile regimes.

4.10.3 Macroeconomic transmission and market volatility

Our results emphasise that spices operate under multiple transmission channels. Exchange rate fluctuations and gold price movements emerged as key conduits of inflation and liquidity shocks into spice prices. For Pepper, exchange-rate depreciation enhances export competitiveness and lifts domestic price levels, consistent with the economic theory of commodity pass-through. For Turmeric, gold acts as an inflation hedge and speculative asset, thereby influencing price behaviour in inflationary regimes. This multilayered transmission aligns with broader evidence on Indian agricultural commodities.53

The contrasting behaviour of Pepper and Turmeric reveals an asymmetric adjustment mechanism and hierarchical market structure. Pepper acts as the price leader: its shocks explain about 14% of Turmeric’s forecast-error variance at the 20-quarter horizon (Table 9), whereas Turmeric’s contribution to Pepper’s variance remains minimal (~1%; Table 10). These results align with earlier work on hierarchical commodity adjustments in emerging markets.

4.10.4 Theoretical and empirical contributions

This research contributes to the literature and practice in three ways. First, it is among the first to combine VECM-based cointegration modelling with machine learning interpretability (SHAP) in analysing India’s spice markets, bridging traditional econometric rigour with modern AI explainability. Second, it establishes asymmetric price dynamics between pepper and turmeric, offering commodity-specific evidence of macroeconomic transmission mechanisms. Third, it extends the discussion on how macroeconomic volatility, if unbuffered, can propagate through agricultural markets and threaten rural income stability, providing a framework for data-driven policy interventions.

5. Conclusion, policy implications, and limitations

5.1 Policy implications

The findings carry several actionable implications for policymakers, traders, and agricultural stakeholders.

Given the sensitivity of pepper and turmeric prices to interest rates and foreign trade, policymakers should adopt coordinated monetary–trade strategies that stabilise credit costs while promoting export competitiveness.

The presence of long-run equilibrium but short-run volatility suggests the need for hedging instruments, price insurance mechanisms, and warehouse receipt systems that protect farmers from macroeconomic shocks and price swings.

The demonstrated predictive strength of LSTM and Random Forest models supports the creation of AI-based price forecasting dashboards that integrate macroeconomic indicators, thereby enabling real-time monitoring and decision support for farmers, exporters, and regulators.

The results underscore the importance of integrating sustainability and climate resilience into agricultural value chains. Stable macroeconomic environments, combined with adaptive supply chain policies, can reduce volatility and foster long-term income security for spice farmers.

The influence of exchange rates and trade flows suggests that export diversification and bilateral trade agreements can help mitigate domestic markets’ vulnerability to global demand shocks.

Since interest rate volatility significantly impacts price levels, ensuring accessible credit lines and inclusive financial products for spice producers can mitigate liquidity constraints and production disruptions.

5.2 Limitations and scope for future research

While the present study provides comprehensive insights, certain limitations must be acknowledged. The analysis relies on quarterly secondary data, which may smooth short-term shocks and understate high-frequency volatility, especially in post-pandemic periods. Incorporating monthly or weekly data could enhance sensitivity to sudden market movements. Although policy events (GST, COVID-19) were included through dummy variables, potential nonlinear regime changes and unobserved shocks may not be fully captured. Future studies could employ Markov-switching VECM or time-varying parameter models for deeper structural analysis. While macroeconomic indicators were central, climate, rainfall, and yield data, key supply-side drivers, were not explicitly modelled. Integrating agro-climatic and logistics data would refine the understanding of price dynamics. Although SHAP provides useful insights into feature importance, machine learning models remain partially opaque regarding causality. Hybrid econometric–ML causal frameworks could improve both predictive power and interpretability. The findings, while robust for Indian market conditions, may not directly generalise to other spice-producing economies due to differing policy regimes and trade exposures. Cross-country comparative analyses could enrich global understanding.

5.3 Future research directions

Building on these findings, future research can explore: (i) hybrid causal learning models (VECM–LSTM or ARDL–Transformer hybrids) for interpretable time-series forecasting; (ii) climate-adjusted price models linking weather indices to macroeconomic shocks; (iii) regional market integration studies analysing intra-India price transmission between producing and consuming centres; and (iv) sustainability-focused simulations quantifying the trade-offs between price stabilisation, export growth, and environmental resilience. Such extensions will not only deepen empirical precision but also guide India’s ongoing transition toward a digitally intelligent, macro-stable, and sustainable spice economy.

5.4 Conclusion

This study examined the macroeconomic factors and dynamic relationships influencing the prices of pepper and turmeric in India through an integrated econometric and machine learning approach. Using Johansen cointegration, VECM, IRF, and FEVD, together with predictive models (Random Forest and LSTM) interpreted through SHAP analysis, it provides insights into long-term equilibrium and short-term nonlinear adjustments in spice markets.

Econometric results indicate that pepper is more sensitive to monetary policy and external shocks, making it a policy-sensitive export. Turmeric exhibits stronger domestic linkages and responds more quickly to fluctuations in inflation, gold prices, and trade volumes. Among the predictive models, Random Forest emerges as the most reliable overall, delivering the lowest forecast errors for Turmeric and competitive performance for Pepper. In contrast, LSTM achieves slightly better accuracy for Pepper alone but overfits Turmeric, particularly during volatile periods. Random Forest and SHAP results confirm that exchange rate changes, inflation, interest rates, and gold prices are key factors for both commodities. Overall, combining econometrics with AI models provides a solid basis for understanding agricultural prices and enhancing forecast accuracy. These findings suggest that while deep-learning architectures capture nuanced temporal dynamics for commodities with smoother long-term structure, tree-based ensemble methods offer greater robustness and stability for markets exhibiting higher volatility or structural shifts.

Ethics and consent

This study is based exclusively on secondary data obtained from publicly available sources, including government databases and published commodity price series. The analysis does not involve human participants, personal data, or biological samples. Therefore, ethical approval from an institutional review board or ethics committee was not required, in accordance with national and international research guidelines.

How to cite this article
D V, B R S and Desai G. Forecasting Pepper and Turmeric Prices in India: A Machine Learning Analysis of Macroeconomic Drivers [version 1; peer review: awaiting peer review]. F1000Research 2026, 15:49 (https://doi.org/10.12688/f1000research.174434.1)