Keywords
digital finance, accounting formalization, firm upgrading, Saudi Arabia, complementarity, organizational complementarity, wild-cluster bootstrap, Cinelli-Hazlett sensitivity.
Digital payments reached 85 percent of Saudi Arabia’s retail transactions in 2025. However, how this diffusion translates into firm-level upgrading remains poorly understood.
This study analyzes whether the relationship between digital finance and upgrading depends on accounting conditions at the firm level: digital finance generates transaction traceability, while accounting formalization supplies the verification and information discipline that help translate those traces into managerial signals.
We build a Digital Finance Index (DFI) and an Accounting Formalization Index (AFI) and test the complementarity prediction through the DFI × AFI interaction, in the spirit of Athey-Stern, on 1,002 firms. The analysis uses survey-weighted estimation alongside the Romano-Wolf family-wise correction across four pre-specified hypothesis families, Oster and Cinelli-Hazlett sensitivity bounds, and wild-cluster bootstrap inference (B = 1,999) to account for the small number of sector clusters (six).
The supermodularity prediction was not supported. The DFI × AFI interaction on labor productivity has a negative point estimate (α₃ = −0.038), but it loses significance under sector-level resampling (wild p = 0.517 versus cluster-robust p ≈ 0.004). None of the AFI sub-channels showed precisely estimated moderation.
Two outcome-specific patterns were robust. Digital finance is positively associated with labor productivity (β = +0.075, wild p = 0.040, Romano-Wolf p = 0.046) and negatively associated with product innovation (Romano-Wolf p = 0.046); the latter is robust to the Oster bound but sits at the binding Cinelli-Hazlett benchmark.
The pattern is consistent with specialization between transactional and experimental activities rather than with precision-raising complementarity.
The findings are conditional associations rather than causal effects; bundled interventions that combine digital-finance onboarding with audit-readiness support, together with SME-targeted capability vouchers, warrant testing.
Saudi Arabia has built a full-fledged digital payment system. According to the Saudi Central Bank (SAMA), the share of electronic payments in retail transactions rose from 79 percent in 2024 to 85 percent in 2025 (Saudi Central, 2026). At the same time, the Zakat, Tax and Customs Authority (ZATCA) continued its phased rollout of e-invoicing, with taxpayers whose revenue exceeded SAR 375,000 in any of the tax years 2022, 2023, or 2024 required to integrate their e-invoicing solutions with the FATOORA platform by mid-2026 (Zakat & Customs, 2025a, 2025b). Approximately 97.7 percent of Saudi establishments have internet access (General Authority for, 2024). However, firm performance remains structurally uneven across sectors and firm types. Saudi Arabia therefore presents an ideal case for examining where digital traceability succeeds, and where it fails, in producing higher productivity, innovation, and employee training.
This study examines whether the association between digital finance and upgrading differs with firm-level accounting formalization, and what any such difference implies for the sustainability agenda. The direction of the moderation is treated as an open empirical question. On the one hand, complementarity logic predicts that formalization raises the value of digital traceability. On the other hand, substitution of one channel for another, or strong specialization of channels, predicts that formalization lowers it. The competing predictions are treated symmetrically and held to the same inferential standard.
How does digital finance translate into labor productivity, formal worker training, and product and process innovation among Saudi firms? Upgrading is not collapsed into a composite but is operationalized through these four observable outcomes. The evidence on the effects of digital finance is heterogeneous and context-dependent (Konte & Tetteh, 2023). Across 14 sub-Saharan African economies, there is no direct statistical relationship between mobile money and firm productivity. However, that evidence does not explain how mobile money interacts with the formal banking system to produce the sizable gains observed when the two are combined. Among SMEs in Kenya, firms that adopted mobile payment technology took up mobile loans more frequently because information asymmetry fell, yet this had no detectable effect on sales or profits, as detailed by Tetteh (2023). This conditional structure motivates a framework in which the contribution of digital finance depends on organizational capabilities, while leaving the direction of the conditional relationship open.
Digital finance provides transactional traceability: payments made through card rails, instant-settlement systems, and mobile channels generate machine-readable records. Accounting formalization supplies the verification, record discipline, and routinized processing that determine how firms internalize those records. Two interpretations of how the capabilities interact are each grounded in existing theory, but they point in opposite directions. The complementarity prediction builds on the tradition of technology–organization complementarity (Athey & Stern, 1998; Bloom, Sadun, & Van Reenen, 2012; Bresnahan, Brynjolfsson, & Hitt, 2002). If the two capabilities are supermodular, digital traceability is more valuable under higher formalization, because a more precise signal is worth more. The substitution or specialization prediction is consistent with recent evidence on management practices (Cook, Jackson, Fisher, Baker, & Diepeveen, 2022). Formalization could substitute for the information that digital finance would otherwise supply. Alternatively, it could shift effort toward administrative compliance rather than product experimentation (Hausberg, Liere-Netheler, Packmohr, Pakura, & Vogelsang, 2019).
First, the complementarity prediction is not supported. The supermodularity prediction is tested directly through the Digital Finance Index (DFI) × Accounting Formalization Index (AFI) interaction, in the spirit of Athey and Stern (1998). The estimate is then stress-tested with sensitivity bounds (Cinelli & Hazlett, 2020; Oster, 2019), the wild-cluster bootstrap (Cameron, Gelbach, & Miller, 2008; MacKinnon & Webb, 2018), and the Romano-Wolf multiple-testing correction (Clarke, Romano, & Wolf, 2020). Under wild-cluster bootstrap inference, none of the interaction estimates is statistically distinguishable from zero, including the anchor productivity outcome, whose point estimate is negative. This null result is reported openly. The implication is that widespread diffusion of digital payments is not sufficient for accounting formalization to generate precision-raising complementarity.
Second, the most consistent relationship, negative throughout, is between product innovation and digital finance intensity. The association is stable under penalized-likelihood estimation (Firth, 1993) and across the bounded specification grid with feasible sector fixed effects, and it satisfies the Oster proportional-selection conditions. Under the Cinelli–Hazlett benchmark, it sits at the level of the strongest observed covariate. The Romano–Wolf adjusted p-value, which accounts for family-wise multiple testing, is 0.046 with B = 1,999, below the 5 percent threshold. The pattern is consistent with a specialization account that separates transactional from experimental activity. Firms that use electronic payment gateways intensively appear systematically different in product-innovation activity from firms that operate through relational, customized, or less digitalized commercial structures. This aligns with organizational and innovation theories in which standardized transactional efficiency and product exploration depend on distinct routines, incentives, and information environments (Bloom, Eifert, Mahajan, McKenzie, & Roberts, 2013; Holmstrom, 1989).
DFI and AFI are justified as formative composites rather than reflective scales (Aguirre-Urreta, Rönkkö, & Marakas, 2023; Berbekova, Kock, Assaf, & Josiassen, 2025). Their construct validity rests on external formalization indicators (ZATCA and GASTAT) and on the theoretical justification for each component, as shown in Table 3. The confirmatory estimation protocol, including the inferential thresholds and the sensitivity analyses, was recorded on the Open Science Framework before the final estimation. Diagnostic and other exploratory steps (data cleaning, measure construction) are reported transparently. The estimation strategy combines survey-weighted models; three pre-specified standard-error variants suited to the six-cluster setting; formal omitted-variable bounds (Oster’s δ with an R_max multiplier of 1.3, and Cinelli–Hazlett partial-R² benchmarks against the strongest observed covariate); and the Romano-Wolf family-wise multiple-testing correction across four pre-specified hypothesis families.
The Kingdom of Saudi Arabia is a natural test bed for this question. Under the Vision 2030 Financial Sector Development Program, SAMA is modernizing payments around two systems: SARIE, a real-time settlement system, and Mada, a payment card system. The 70 percent cashless target for retail transactions was reached two years ahead of schedule (Saudi Central, 2026). In 2024, non-oil activity remained strong (+4.3 percent per GASTAT (2024b) or +4.5 percent per IMF (2025)) during the implementation period. ZATCA has brought the fiscal interface of hundreds of thousands of businesses into the integration phase (Zakat & Customs, 2025a). However, whether payment modernization, fiscal formalization (such as VAT), and sustainable non-oil diversification are complementary must be established at the firm level, where any complementarity would operate. Three SDG items are relevant to all four upgrading outcomes: SDG 8.2 and 8.3 (survival and improvement of productivity, innovation, and SME formalization) and SDG 9 (industry, innovation, and infrastructure) (Gupta et al., 2025).
The Saudi Arabia Enterprise Survey 2025 (SAES 2025) (Group, 2025) captures differences in management quality only indirectly, through managers’ own survey responses. The sample is heavily export-oriented (mean direct export share of 97.3 percent), and the resulting scope condition is discussed in Section 7. Tests of the mechanism therefore rest on the associations observed.
Section 2 develops the theory. Section 3 presents the hypotheses and research questions. Section 4 covers data and measurement. Section 5 describes the empirical approach. Section 6 reports the results at the two levels of contribution. Section 7 discusses them, and Section 8 concludes.
Prior studies generally find a positive relationship between digital financial services and firm-level outcomes. However, the organizational mechanism behind the association is still not fully specified, and the magnitudes vary. Digital payment use is associated with higher GDP per capita and lower informal-sector employment, particularly in countries where the main channel is credit access, such as Brazil, India, Kenya, and South Africa (Aguilar, Frost, Guerra, Kamin, & Tombini, 2024). According to Jun and Ran (2024), based on data from Chinese A-share listed companies, digital financial services can break through “anti-financing barriers” on both ends of the asset side of the balance sheet and at different life-cycle stages, generating informational “network effects.” By contrast, Dalton, Pamuk, Ramrattan, Uras, and van Soest (2024) find that adopting e-payment technology among SMEs boosted the take-up of mobile loans because of reduced information asymmetry, but that this did not translate into any positive impact on sales or profits. Nor were significant productivity changes found for mobile money on its own; gains emerge only when mobile money is combined with formal bank accounts.
We highlight three characteristics of this literature. First, direct digital-finance effects diminish under controls, which suggests that the channel through which digital finance translates into development is conditional on other firm attributes. Second, the most significant outcomes appear when digital finance is paired with complementary capabilities such as recordkeeping, bank access, and organizational practices. Third, the direction of any moderating relationship is not fixed a priori; there is empirical support for both complementarity and substitution logics. A negative association between digital finance and product innovation is theoretically plausible. High-volume use of digital payments may be concentrated among firms organized around standardized, high-volume, transactional models rather than product experimentation. On this reading, digital finance does not hamper innovation; the association reflects a business orientation in which experimentation and transactional efficiency do not overlap.
The empirical framework is rooted in Athey and Stern (1998), who formalize the conditions under which joint adoption of organizational activities can be interpreted as supermodularity rather than as correlated returns to independent practices. The central methodological point is that pairwise interaction tests on observable practices can conflate true complementarity with positively correlated unobserved returns. Design discipline therefore requires cross-equation restrictions, instrumental variation in joint adoption, or transparent sensitivity analysis before an interaction can be read as evidence of complementarity. This article adopts the third option. Section 5.8 applies the Oster (2019) proportional-selection bounds and the Cinelli and Hazlett (2020) omitted-variable contours to the linear estimates. The bounds quantify whether an observed estimate remains credible under plausible hidden bias; they do not resolve the Athey-Stern identification problem.
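The proportional-selection logic can be illustrated with the commonly used linear approximation to Oster's bias-adjusted coefficient. This is a minimal sketch rather than the paper's estimation code: the function names and all input values are hypothetical, and it assumes that R_max is obtained by multiplying the R² of the controlled regression by 1.3.

```python
def oster_adjusted_beta(beta_short, r2_short, beta_long, r2_long,
                        delta=1.0, rmax_multiplier=1.3):
    """Approximate bias-adjusted coefficient under proportional selection.

    beta_short, r2_short: estimate and R-squared without controls.
    beta_long,  r2_long : estimate and R-squared with controls.
    Assumption of this sketch: R_max = rmax_multiplier * r2_long, capped at 1.
    """
    r_max = min(1.0, rmax_multiplier * r2_long)
    shift = (beta_short - beta_long) * (r_max - r2_long) / (r2_long - r2_short)
    return beta_long - delta * shift


def oster_delta_for_zero(beta_short, r2_short, beta_long, r2_long,
                         rmax_multiplier=1.3):
    """Degree of proportional selection that would drive the approximation to zero."""
    r_max = min(1.0, rmax_multiplier * r2_long)
    return beta_long * (r2_long - r2_short) / (
        (beta_short - beta_long) * (r_max - r2_long))


# Hypothetical inputs, not estimates from the paper.
adjusted = oster_adjusted_beta(0.20, 0.10, 0.15, 0.40)
delta_zero = oster_delta_for_zero(0.20, 0.10, 0.15, 0.40)
print(round(adjusted, 3), round(delta_zero, 2))
```

A delta well above 1 in this approximation means that unobservables would have to be far more important than the observed controls to explain away the estimate; the exact Oster estimator solves a nonlinear equation and can differ from this approximation.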
The parent literature on IT-organization complementarity provides a solid background. According to Bresnahan et al. (2002), IT capital yields higher returns when paired with decentralized decision rights and complementary human capital. Bloom et al. (2012) subsequently extended this to a broader multinational setting, attributing the productivity gap to structured people-management practices that travel with the parent firm. Brynjolfsson, Rock, and Syverson (2021) generalize the argument to general-purpose technologies (GPTs): observable productivity returns are systematically delayed because firms must accumulate unobserved complementary intangibles before the potential of GPTs is realized. This paper treats digital finance as a transaction-layer digital technology with GPT-like features and accounting formalization as a candidate organizational complement (Qiu & Gao, 2026).
Formalization of accounting involves not only recording transactions but also the verification routines, record-keeping practices, and fiscal-interface protocols through which a firm generates information from transactions and uses it as a cue for managerial decisions. This aligns with the organizational-information-quality view, in which accounting practice is treated as an internal-control technology rather than only an external-reporting technology (Busco & Quattrone, 2018). According to Lambert, Leuz, and Verrecchia (2012), firm-level information precision affects the cost of capital and, through financing frictions, real investment behavior. Recent experimental work confirms that information-system precision moves managerial reporting at the margin, even when the quantity of information is fixed (Douthit, Majerczyk, & McLuckie Thain, 2024).
From an accounting perspective, two sub-channels can be distinguished. Conceptually, internal formalization refers to record authentication, record discipline, and reconciliation routines (Hall, 2010). Empirically, the SAES measure observes this channel narrowly, through external-auditor certification. Fiscal-interface formalization refers to three features: VAT registration, engagement with refund applications, and low-friction tax administration (Naritomi, 2019). Internal formalization is the precision-enhancing mechanism within the firm emphasized in the complementarity tradition. Fiscal-interface formalization redirects attention toward administrative standardization and compliance reporting; it may raise managerial precision but may also divert effort from experimentation. Section 5.5 decomposes AFI accordingly, because the predicted direction of moderation can differ across sub-channels.
Consider a firm that must decide whether to undertake an upgrading investment I at cost c, given a latent state θ (comprising demand, process waste, and skill gaps) that is observed only through internal signals. Digital finance adoption x increases the number of independent transaction signals n(x) generated per period, with n′(x) > 0. Accounting formalization y raises the precision τ(y) of each signal through verification and routinization, with τ′(y) > 0. Under Bayesian updating, the posterior precision for θ is:
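A form consistent with these definitions, assuming conditionally independent normal signals with common precision τ(y) (an assumption of this sketch), is:

```latex
\Pi(x, y) \;=\; \Pi_0 + n(x)\,\tau(y)
```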
Where Π0 is the prior precision. The firm invests once the posterior precision crosses an opportunity-specific threshold Π*; the expected value of the upgrading opportunity is V(x,y) = Pr[Π(x,y) ≥ Π*] · E[π|I] − c. The complementarity prediction follows from supermodularity: the cross-partial of the expected value with respect to the two capabilities must be positive:
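In symbols, the condition is that the cross-partial of V in x and y is positive. As a worked sketch under added assumptions that the text does not state (posterior precision of the additive form Π(x,y) = Π0 + n(x)τ(y); a threshold Π* with differentiable distribution function F and density f; and a payoff E[π|I] > 0 that does not depend on x or y), the condition expands as:

```latex
\begin{aligned}
V(x,y) &= F\big(\Pi(x,y)\big)\,\mathbb{E}[\pi \mid I] - c, \\[4pt]
\frac{\partial^2 V}{\partial x\,\partial y}
  &= \mathbb{E}[\pi \mid I]\,\Big[\, f'(\Pi)\,\Pi_x\,\Pi_y + f(\Pi)\,\Pi_{xy} \Big] \\[4pt]
  &= \mathbb{E}[\pi \mid I]\; n'(x)\,\tau'(y)\,\Big[\, f(\Pi) + f'(\Pi)\,n(x)\,\tau(y) \Big] \;>\; 0 .
\end{aligned}
```

Under these assumptions the sign is not automatic. It is positive whenever the bracketed term is positive, for example when the threshold density is non-decreasing at Π, and it can turn negative where the density falls steeply. That ambiguity is in line with treating the direction of moderation as an empirical question.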
Figure 1 outlines the traceability channels and the upgrading dimensions to which they relate. We estimate the first-order association between firms’ digital finance and each upgrading outcome along these channels. In the model, accounting formalization is the capability that raises the precision with which transactional signals become managerial information (Douthit et al., 2024; Lambert et al., 2012). The framework treats the competing predictions symmetrically: moderation may strengthen the digital-finance association, weaken it, or leave it unchanged. Two sources of heterogeneity are examined: the slope of the moderation may differ by firm size and by financial-constraint status; and under foreign ownership, the parent firm’s information systems may take over the moderating role (Bloom et al., 2012). Upgrading is operationalized through four firm-level outcomes (product innovation, process innovation, labor productivity, and formal worker training), which connect to SDG 8.2, SDG 8.3, and SDG 9 (Economic & Affairs, 2024).
The study is based on one baseline hypothesis, one descriptive proposition, and three research questions. No single empirical study can resolve all of the questions raised by the literature. Here, an explicit expectation supported by prior evidence is stated as a hypothesis, while research questions are posed where competing predictions in the prior literature rule out a directional hypothesis.
H1 (Outcome-specific baseline association): Digital finance adoption is expected to show an outcome-specific association with firm upgrading. The association is expected to be positive for labor productivity, while the signs for product innovation, process innovation, and formal worker training are left open. H1 is treated as directionally supported if the productivity association is positive and clears the wild-cluster bootstrap threshold. Strict family-wise support requires all four outcomes to survive the Romano–Wolf adjustment; the two standards are reported separately. The baseline specification is:
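A specification consistent with the controls and fixed effects described in Section 4 can be written as follows. Apart from β, the notation is assumed here for illustration: k indexes the four outcomes, X_i is the control vector, and μ and λ are sector and region fixed effects. For the binary outcomes, the right-hand side would enter as the index of a logit model.

```latex
Y_i^{(k)} \;=\; \beta_0 + \beta\,\mathrm{DFI}_i + X_i'\gamma + \mu_{s(i)} + \lambda_{r(i)} + \varepsilon_i
\qquad (3)
```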
P1 (Descriptive alignment proposition): DFI and AFI are positively aligned at the firm level after conditioning on observables. The proposition describes alignment only; it implies neither mediation nor causal sequencing.
RQ1 (Moderation): Does AFI strengthen, weaken, or leave unchanged the DFI-upgrading association? In the interaction specification:
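An interaction form consistent with the surrounding description is sketched below. Only α3 is named in the text; the remaining symbols are notation assumed here, with the same controls and fixed effects as the baseline.

```latex
Y_i^{(k)} \;=\; \alpha_0 + \alpha_1\,\mathrm{DFI}_i + \alpha_2\,\mathrm{AFI}_i
  + \alpha_3\,\big(\mathrm{DFI}_i \times \mathrm{AFI}_i\big)
  + X_i'\gamma + \mu_{s(i)} + \lambda_{r(i)} + \varepsilon_i
\qquad (4)
```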
α3 is the parameter of interest. An interaction estimate is treated as supported only if the wild-cluster bootstrap p-value is below 0.05 under the documented inferential standard and the conclusion is consistent across the standard-error variants.
RQ2 (Sub-channel decomposition): Do internal and fiscal-interface formalization moderate the DFI-upgrading relationship in the same or different directions, and does the answer vary by upgrading outcome? The sub-channel decomposition replaces the composite in Equation (4) with the two sub-indices.
RQ3 (Heterogeneity): Do the baseline and moderation patterns differ by firm size, financial-obstacle status, and foreign ownership? The SME and finance-obstacle splits are primary; the foreign-ownership cell (n = 95) is reported directionally as a mechanism-consistency check.
Three rules of interpretation apply. (i) The DFI-AFI linkage in P1 is descriptive; accordingly, Sections 6–8 use no causal language and report no indirect effects through AFI. (ii) RQ1, RQ2, and RQ3 are evaluated against the documented inferential standard specified in §5.1. (iii) In §5.8, the Oster and Cinelli-Hazlett sensitivity standards are applied to the linear productivity specifications and to the linear-probability approximation of the product-innovation baseline, not to the nonlinear interaction models. The confirmatory protocol was recorded on the Open Science Framework before final estimation; estimates are interpreted as conditional associations, not causal effects.
The firm-level data come from the SAES 2025 (Group, 2025). The survey measures innovation, employment, training, ownership, export orientation, payment activities, routine financial management, and regulatory practices within a standard establishment-survey framework (see Table 1). Following Buffington, Foster, Jarmin, and Ohlmacher (2017), the survey design is treated as part of the empirical architecture: the sample, missing-data treatment, weights, and robustness checks are each stated explicitly.
The survey includes 1,002 firms selected under the documented inclusion rules and distributed across three size classes (small [5–19 employees], n = 449; medium [20–99], n = 303; large [100+], n = 250), six regions, and six sector groupings (sampling inclusion rules are in the annex). Survey responses can be weighted with three weights: Median, Strict, and Weak. The Median weight is used in the main specification; the Strict and Weak weights are reserved for sensitivity analysis. Appendix B describes the weight construction. The weighted sample is export-oriented (mean direct export share of 97.3 percent), which is discussed as a scope condition in Section 7. The reported associations therefore describe the SAES 2025 estimation sample rather than population-wide effects (Qamruzzaman, 2026b).
The DFI is not a general measure of digitalization; it measures firm engagement with digital financial channels that generate traceable transactions. The baseline DFI is an additive standardized composite of three SAES indicators: the share of annual purchases paid electronically (k38, mean 87.0%, SD 8.2); the share of annual sales paid electronically (k33, mean 82.7%, SD 11.2); and a binary indicator for holding a checking or savings account (k6, 92.5% Yes). The first two capture the intensive margin, and the third captures the extensive margin. Following Avinç and Doğan (2024), the DFI is treated as a formative rather than a reflective scale, because its components are not interchangeable symptoms of a single latent construct. Variance inflation factors (VIFs) do not exceed the usual cut-off (1.92, 1.92, and 1.04). The composite retains the full sample by applying the documented available-component rule to establishments lacking k33. A complete-component DFI is reported as an appendix robustness check. An intensive-margin-only composite based on k33 and k38 is reported in Section 5.9.
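The construction of an additive standardized composite with an available-component rule can be sketched as follows. The exact rule used in the paper is not spelled out, so this sketch assumes that each component is z-scored on its observed values and that a firm's index is the mean of whichever standardized components it has; the data are hypothetical and are not SAES values.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize the observed values of one component; None stays None."""
    observed = [v for v in values if v is not None]
    mu, sd = mean(observed), pstdev(observed)
    return [None if v is None else (v - mu) / sd for v in values]


def additive_composite(columns):
    """Average the standardized components that are available for each firm."""
    z_columns = [zscores(col) for col in columns]
    index = []
    for row in zip(*z_columns):
        available = [z for z in row if z is not None]
        index.append(mean(available) if available else None)
    return index


# Hypothetical data for five firms (illustration only).
k38 = [90.0, 80.0, 95.0, 70.0, 85.0]   # % of purchases paid electronically
k33 = [85.0, None, 90.0, 60.0, 80.0]   # % of sales paid electronically (one missing)
k6 = [1, 1, 1, 0, 1]                   # holds a checking or savings account

dfi = additive_composite([k38, k33, k6])
print([round(v, 3) for v in dfi])
```

The second firm lacks k33 but still receives an index value built from its two observed components, which is how a rule of this kind keeps the full sample.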
The AFI combines verifiable record-keeping, routinized financial discipline, and fiscal engagement. The baseline AFI comprises five SAES items in two sub-indices. The internal-formalization sub-index rests on a single proxy, k21: whether the financial statements were audited by an external auditor (94.3% Yes, so the 57 firms without an external audit supply the variation). This variable does not capture record discipline directly; it serves as an external marker that formal financial records have been validated. The fiscal-interface sub-index combines four items: VAT refund application (j38, 87.4% Yes); VAT refund waiting time in weeks (j39, mean 5.95, range 3–9), reverse-coded; and the tax-administration obstacle (j30c) and tax-rate obstacle (j30f), both reverse-coded so that higher values indicate lighter obstacles. The tax-obstacle items may reflect perceived burden, sectoral exposure, or administrative salience rather than accounting routines, so they are analyzed separately in the AFI decomposition and interpreted as fiscal-interface conditions. AFI is regarded as a formative composite (Aguirre-Urreta et al., 2023); its construct validity rests on the theoretical justification for each component, as shown in Table 3 and Section 4.4. In Section 5.7, AFI is decomposed into internal and fiscal-interface sub-indices. Because the internal-formalization proxy has little variation, channel-specific estimates are interpreted cautiously.
Four outcomes are retained: labor productivity (the logarithm of last fiscal year’s sales per full-time permanent employee, n2a/l1; the anchor outcome), product innovation (h1, 12.1% Yes), process innovation (h2, 3.7% Yes), and formal worker training (l10, 3.5% Yes). They are not collapsed into a composite, because the framework predicts that DFI and AFI may operate differently across outcome domains. The low base rates reduce precision for process innovation and training. Labor productivity is the anchor outcome in Section 5, product innovation is the second main outcome, and process innovation and training serve as lower-powered directional checks. Employment growth is retained only as a supplementary appendix outcome.
All models include six sector fixed effects, six region fixed effects, and the following controls: ISO quality certification (e6), direct export share (d3a), foreign ownership share (b2b), the log of the number of permanent employees (l1), firm age (b5), and the five-level finance-obstacle indicator (k30). Given the high mean direct export share, the export-share control absorbs only the residual variation within the sample. To limit specification drift, all models (baseline, alignment, complementarity, and heterogeneity) share the same architecture, with VIFs of the continuous and normalized covariates below 2.0 (Table 3, Panel B). Alternative control sets enter only through the pre-specified robustness grid described in Section 5.9.
Tables 1–3 present the sample descriptives, the measurement architecture, and the measure diagnostics. AFI is a formative composite, so its validity does not rest on internal consistency. The paper contextualizes AFI externally using ZATCA e-invoicing rollout information and GASTAT indicators on establishment ICT and e-government use (General Authority for, 2024; Zakat & Customs, 2025a). These sources lend credibility to AFI not as a comprehensive indicator of accounting formality but as a measure situated in the fiscal-interface and digital-administration environment that is increasingly prominent in the country.
Missing data are handled with a design-aware workflow (Kalpourtzi, Carpenter, & Touloumi, 2024). The main issue is the VAT refund waiting time (j39), which is defined only for VAT refund applicants: there are 876 applicants and 126 non-applicants, and waiting time is observed for 875 firms. The main specification retains non-applicants through a separate indicator and uses waiting-time information only where it is observed. Robustness checks re-estimate the models on the applicant-only sample (n = 875) and on a median-imputed variant. Listwise deletion is used for variables with less than 1% missing. Variable measurement is detailed in Table 2, and extended information is given in Appendices A1 and A2.
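One way to retain non-applicants through a separate indicator is a missing-indicator coding, sketched below. This is an illustration rather than the paper's coding: the function name is hypothetical, the reverse-coding subtracts the wait from the reported maximum of nine weeks, and placing firms without an observed wait at zero is an assumption of the sketch.

```python
def code_refund_wait(applied, wait_weeks, max_weeks=9):
    """Return (applicant, wait_observed, reversed_wait) for one firm.

    Reverse-coding maps longer waits to lower scores. Firms without an
    observed wait get 0.0, so the indicators carry the level difference
    and the wait variable contributes only where it is observed.
    """
    applicant = 1 if applied else 0
    observed = 1 if (applied and wait_weeks is not None) else 0
    reversed_wait = float(max_weeks - wait_weeks) if observed else 0.0
    return applicant, observed, reversed_wait


# Hypothetical firms: a fast refund, a slow refund, a non-applicant,
# and an applicant whose waiting time was not recorded.
firms = [(True, 3), (True, 9), (False, None), (True, None)]
coded = [code_refund_wait(applied, wait) for applied, wait in firms]
print(coded)
```

In a regression, the three columns would enter together, so that the coefficient on the reversed wait is identified only from firms with an observed value.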
The analysis is confirmatory but observational. The complete estimation methodology (index construction, subgroup definitions, the AFI decomposition, construction of the sensitivity bounds, and the multiple-testing procedures) and the bounded specification grid were posted on the Open Science Framework before the final run. Data cleaning and measure definition were exploratory and are reported transparently (see Table 3). On the outcome scale, labor productivity is estimated by survey-weighted ordinary least squares (OLS); product innovation, process innovation, and worker training are estimated by survey-weighted logit and reported as average marginal effects (AMEs), which are more interpretable (Kiefer, Woud, Blackwell, & Mayer, 2024). For the low base-rate outcomes, Firth penalized-likelihood logit is reported as a separation and sensitivity check but is not used as a primary specification. Weighted-logit AMEs are the primary estimates for the binary outcomes; linear-probability-model analogs are used for the omitted-variable sensitivity diagnostics (Oster, Cinelli-Hazlett), which are defined for linear models. As stated in Section 3.3, estimates are interpreted as conditional associations rather than causal effects (Pesantez-Narvaez & Guillén, 2020, p. 5502).
The sector stratification used in the SAES has six clusters, which is at the low end of the range in which cluster-robust inference is reliable. Wild-cluster bootstrap p-values (B = 1,999 Rademacher resamples) are primary; cluster-robust SEs are reported for comparison only (Cameron et al., 2008; MacKinnon & Webb, 2018). Only two weight variants are used: the SAES median weight as the main specification and the Strict/Weak variant for sensitivity. Any conflict between the two is flagged in section 6.
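The inference procedure can be illustrated with a minimal sketch of a null-imposed wild-cluster bootstrap with Rademacher weights. It is simplified to a single regressor with an intercept and no survey weights, controls, or fixed effects, so it is not the paper's estimator; the function names and the simulated data are hypothetical.

```python
import math
import random

def cluster_t(x, y, clusters):
    """OLS slope of y on x (with intercept) and its cluster-robust t-statistic."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]                 # demeaned regressor
    sxx = sum(v * v for v in xd)
    b = sum(v * yi for v, yi in zip(xd, y)) / sxx
    a = y_bar - b * x_bar
    scores = {}                                   # per-cluster sum of xd_i * e_i
    for v, xi, yi, g in zip(xd, x, y, clusters):
        scores[g] = scores.get(g, 0.0) + v * (yi - a - b * xi)
    var_b = sum(s * s for s in scores.values()) / (sxx * sxx)
    return b, b / math.sqrt(var_b)

def wild_cluster_p(x, y, clusters, B=1999, seed=0):
    """Two-sided wild-cluster bootstrap p-value for H0: slope = 0 (null imposed)."""
    rng = random.Random(seed)
    _, t_obs = cluster_t(x, y, clusters)
    y_bar = sum(y) / len(y)
    resid0 = [yi - y_bar for yi in y]             # residuals of the restricted model
    groups = list(dict.fromkeys(clusters))
    exceed = 0
    for _ in range(B):
        sign = {g: rng.choice((-1.0, 1.0)) for g in groups}   # one draw per cluster
        y_star = [y_bar + sign[g] * r for r, g in zip(resid0, clusters)]
        _, t_star = cluster_t(x, y_star, clusters)
        if abs(t_star) >= abs(t_obs):
            exceed += 1
    return exceed / B

# Simulated data: six clusters of 20 firms with a common sector shock.
demo = random.Random(42)
clusters = [g for g in range(6) for _ in range(20)]
sector_shock = [demo.gauss(0.0, 1.0) for _ in range(6)]
x = [demo.gauss(0.0, 1.0) for _ in clusters]
y = [0.1 * xi + sector_shock[g] + demo.gauss(0.0, 1.0) for xi, g in zip(x, clusters)]
b_hat, t_hat = cluster_t(x, y, clusters)
p_wild = wild_cluster_p(x, y, clusters, B=1999, seed=7)
print(f"slope={b_hat:.3f}  cluster t={t_hat:.2f}  wild p={p_wild:.3f}")
```

Six clusters admit only 2^6 = 64 distinct Rademacher sign patterns, so the bootstrap distribution is coarse regardless of B; this is one reason few-cluster p-values can differ sharply from their cluster-robust counterparts.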
The H1 baseline is Equation (3) with the full control vector and the fixed effects; results are presented in Table 4. Because the unconditional pairwise correlation between DFI and innovation is negative, sector and region effects, and thus sector composition, matter more for the innovation outcomes. H1 is assessed against two criteria: (1) Directional: DFI is positively associated with at least one outcome under the wild bootstrap; and (2) Strict: all four Family 1 Romano-Wolf p-values are below 0.05. These are reported separately in section 6.1.
The alignment model regresses AFI on DFI with the full control vector, the fixed effects, and the median weight:
A positive θ indicates that digitally engaged firms have more formal controls, conditional on observables. P1 describes alignment, not mediation.
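In assumed notation (only θ is named in the text; the other symbols and the sector and region labels for the fixed effects are illustrative), the alignment regression described here can be sketched as:

```latex
\mathrm{AFI}_i = \theta\,\mathrm{DFI}_i + X_i'\gamma + \mu_{s(i)} + \rho_{r(i)} + \varepsilon_i
```

Here X_i stands for the full control vector, μ and ρ for sector and region fixed effects, and estimation uses the SAES median weight.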
RQ1 is estimated with Equation (4); findings are reported in Table 5. A positive α3 is a sign of complementarity, a negative α3 a sign of substitution, and a null α3 a sign of no moderation. As Athey and Stern (1998) show, correlated unobserved returns can also generate such patterns. The interaction is considered supported when the wild-bootstrap p < 0.05 and conclusions are stable across SE variants and the section 5.9 specification grid.
For the interaction estimates, we report conditional marginal effects of DFI at the 10th, 50th, and 90th AFI percentiles (Beiser-McGrath & Beiser-McGrath, 2023):
For the binary outcomes, the analogous quantity at each percentile is the weighted-logit AME.
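Assuming Equation (4) is linear in DFI, AFI, and their product, and writing α1 for the DFI main-effect coefficient (a label assumed here; only α3 is named in this section), the conditional marginal effect for the linear productivity outcome takes the form:

```latex
\left.\frac{\partial\,\mathbb{E}[Y_i \mid \cdot\,]}{\partial\,\mathrm{DFI}_i}\right|_{\mathrm{AFI}=\mathrm{AFI}_{p}}
  = \alpha_1 + \alpha_3\,\mathrm{AFI}_{p},
  \qquad p \in \{10, 50, 90\}
```

Under this form a negative α3 makes the DFI effect decline as AFI rises, without necessarily changing its sign over the observed AFI range.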
RQ2 re-estimates Equation (4) using sub-indices for the internal and fiscal channels, with their interactions estimated simultaneously; see results in Table 6.
Following section 2.4, α4 should carry the supermodularity in the internal channel, whereas α5 may be attenuated in the fiscal channel if compliance redirects effort. The channel results should be interpreted with caution because the internal proxy has limited variation (57 unaudited firms).
Three subsample contrasts are tested: SMEs (n = 752) versus larger firms (n = 250); the foreign-owned subsample (n = 95) versus others; and the subsample facing major finance obstacles (n = 102) versus others. As firms approach the upgrading-investment threshold (section 2.4), the change per unit of precision increases. The subgroup tests are targeted following Bloom et al. (2012), and foreign-owned firms may substitute parent-firm systems for AFI. Degenerate subgroup specifications are excluded from substantive interpretation.
Robustness is bounded ex ante: alternative DFI constructions (baseline vs intensive-margin only), alternative AFI constructions, productivity with and without 1/99 winsorization, and missing-data sensitivity for j39. The specification curve contains 24 specifications: 2 DFI × 3 AFI × 2 control sets × 2 productivity treatments. The interaction is deemed fragile if the median estimate is close to 0, if the t-statistic does not fall below −1.96, or if more than one-third of the specifications reverse sign.
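The bounded grid and the fragility rule can be sketched as follows. The variant labels, the `near_zero` tolerance, and the reading of the t-statistic criterion as "no specification reaches −1.96" are assumptions; the text does not pin them down.

```python
from itertools import product
from statistics import median

DFI_VARIANTS = ("baseline", "intensive_margin_only")
AFI_VARIANTS = ("afi_v1", "afi_v2", "afi_v3")      # hypothetical labels
CONTROL_SETS = ("controls_v1", "controls_v2")      # hypothetical labels
PRODUCTIVITY = ("winsorized_1_99", "unwinsorized")

# 2 x 3 x 2 x 2 = 24 pre-specified specifications
GRID = list(product(DFI_VARIANTS, AFI_VARIANTS, CONTROL_SETS, PRODUCTIVITY))

def is_fragile(estimates, t_stats, reference, near_zero=0.01):
    """Apply the three fragility criteria to a set of interaction estimates.

    reference: the main-specification estimate, used to define sign reversals.
    near_zero: assumed tolerance for "median close to 0" (not from the source).
    """
    median_near_zero = abs(median(estimates)) < near_zero
    none_significant = all(t > -1.96 for t in t_stats)
    reversals = sum(1 for e in estimates if e * reference < 0)
    too_many_reversals = reversals / len(estimates) > 1 / 3
    return median_near_zero or none_significant or too_many_reversals

# Made-up inputs, one entry per grid cell, purely to exercise the rule.
toy_estimates = [-0.03] * 15 + [0.01] * 9
toy_t_stats = [-2.0] + [-0.5] * 23
print(len(GRID), is_fragile(toy_estimates, toy_t_stats, reference=-0.038))
```

Fixing the grid and the decision rule before estimation is what makes the curve "bounded": the number of specifications cannot grow after results are seen.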
Two diagnostics provide the formal sensitivity analysis. Oster (2019) δ at Rmax = 1.3 × :
|δ| > 1 means that selection on unobservables would have to exceed selection on observables to nullify the estimate; the diagnostic is applied to the linear productivity measure and to the LPM analog for product innovation. The robustness value (RV) is computed by the authors following Cinelli and Hazlett (2020):
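Both diagnostics have simple closed forms that can be sketched in a few lines. The robustness value below follows the published Cinelli-Hazlett formula based on the partial f of the treatment; the Oster function is the commonly used linear approximation to the bias-adjusted coefficient rather than the exact estimator, and taking Rmax as 1.3 times the R² of the controlled regression is an assumption, since the source truncates that expression. All inputs are made up.

```python
import math

def robustness_value(t_stat, dof, q=1.0):
    """Cinelli-Hazlett RV_q: the partial R2 a confounder would need with both
    treatment and outcome to reduce the estimate by a share q."""
    f_q = q * abs(t_stat) / math.sqrt(dof)     # scaled partial Cohen's f
    return 0.5 * (math.sqrt(f_q ** 4 + 4.0 * f_q ** 2) - f_q ** 2)

def oster_beta_star_approx(beta_short, r2_short, beta_long, r2_long,
                           delta=1.0, rmax_mult=1.3):
    """Approximate bias-adjusted coefficient (Oster-style).

    beta_short, r2_short: regression without controls.
    beta_long, r2_long:   regression with the full control vector.
    rmax_mult:            assumed multiplier defining Rmax = rmax_mult * r2_long.
    """
    r_max = min(1.0, rmax_mult * r2_long)
    movement = beta_short - beta_long          # coefficient change from controls
    return beta_long - delta * movement * (r_max - r2_long) / (r2_long - r2_short)

# Hypothetical inputs, not the paper's values.
rv = robustness_value(t_stat=2.0, dof=100)
beta_star = oster_beta_star_approx(0.10, 0.10, 0.08, 0.20)
print(f"RV = {rv:.3f}, beta* = {beta_star:.3f}")
```

The RV is then compared with the explanatory strength of an observed benchmark covariate (log employment in this paper); an RV at or below the benchmark means a confounder of similar strength could nullify the estimate.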
Family-wise error is controlled with Romano-Wolf stepdown adjusted p-values (Clarke et al., 2020).
Appendix A checks the productivity baseline with double/debiased machine learning (DDML) (Ahrens, Hansen, Schaffer, & Wiemann, 2025; Chernozhukov et al., 2018); it is a check on the baseline, which flexible learners can learn well at this sample size, not an extension of the Equation (4) interaction. PDS-LASSO is the secondary check (Belloni, Chernozhukov, & Hansen, 2014). Neither can do much to address reverse causality.
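The stepdown logic can be sketched as below, taking as given the observed statistics and a matrix of null-centred bootstrap statistics for one hypothesis family. The function name and toy numbers are hypothetical, and how the bootstrap statistics are generated (here, presumably the wild-cluster bootstrap) is outside the sketch.

```python
def romano_wolf_adjusted(t_obs, t_boot):
    """Romano-Wolf stepdown adjusted p-values.

    t_obs:  list of S observed test statistics.
    t_boot: list of B rows, each with S null-centred bootstrap statistics.
    """
    order = sorted(range(len(t_obs)), key=lambda s: -abs(t_obs[s]))
    n_boot = len(t_boot)
    adjusted = [0.0] * len(t_obs)
    running_max = 0.0
    for step, s in enumerate(order):
        remaining = order[step:]               # hypotheses not yet stepped past
        exceed = sum(
            1 for row in t_boot
            if max(abs(row[r]) for r in remaining) >= abs(t_obs[s])
        )
        p = (exceed + 1) / (n_boot + 1)
        running_max = max(running_max, p)      # enforce monotone p-values
        adjusted[s] = running_max
    return adjusted

# Toy family of three hypotheses with four bootstrap draws.
observed = [3.0, 2.0, 1.0]
bootstrap = [
    [0.5, 2.5, 0.2],
    [3.5, 0.1, 0.3],
    [1.0, 1.5, 1.1],
    [0.2, 0.3, 1.2],
]
print(romano_wolf_adjusted(observed, bootstrap))
```

The monotonicity step is why adjusted p-values within a family can be tied, as with the pairs of identical Romano-Wolf values reported for Family 1.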
The results follow the architecture set out in section 3.2 and section 5: H1 baseline (6.1), P1 alignment (6.2), RQ1 moderation (6.3), RQ2 decompositions (6.4), sensitivity (6.5), RQ3 heterogeneity (6.6), and robustness (6.7). All models use the SAES median weight; cluster-robust standard errors are reported for comparison, and the wild-cluster bootstrap with 1,999 replications is the main inference. Sample sizes are 1,002 for the main models, 752 for the SME models, 102 for the finance-major-obstacle models, and 95 for the foreign-ownership cell.
Panels A and B of Table 4 summarize the baseline DFI associations across the four outcomes under Equation (3) with the fixed effects and full control vector.
On labor productivity, β (DFI) = +0.075 (cluster SE = 0.021, CR 95% CI [+0.034, +0.116], wild p = 0.040, wild 95% CI [+0.005, +0.146]), approximately 7.8 percent higher productivity per one-SD increase in DFI. On product innovation, weighted-logit AME = −4.2 pp (Firth AME comparable); LPM β = −0.076 (cluster SE = 0.011, CR 95% CI [−0.097, −0.055], wild p = 0.046). On process innovation, AME = −0.7 pp (LPM β = −0.030, wild p = 0.056). On worker training, AME = +1.9 pp (LPM β = +0.006, wild p = 0.067).
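The percentage reading of the productivity coefficient follows from the usual log-point conversion, sketched below under the assumption that the outcome is log labor productivity and DFI is standardized; the function name is illustrative.

```python
import math

def pct_change_from_log_points(beta):
    """Convert a coefficient on a log outcome into a percent change."""
    return 100.0 * (math.exp(beta) - 1.0)

point = pct_change_from_log_points(0.075)     # reported beta for DFI
wild_low = pct_change_from_log_points(0.005)  # lower wild-bootstrap bound
wild_high = pct_change_from_log_points(0.146) # upper wild-bootstrap bound
print(f"{point:.1f}% (wild 95% CI {wild_low:.1f}% to {wild_high:.1f}%)")
```

The same conversion applied to the wild-bootstrap interval shows how wide the plausible range of the productivity association is once few-cluster uncertainty is taken into account.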
The Romano-Wolf adjusted p-values across Family 1 are 0.046 for both productivity and product innovation and 0.067 for both process innovation and training. H1 meets the directional criterion, since DFI is positively associated with productivity under the wild bootstrap; productivity and product innovation survive the Romano-Wolf correction, whereas process innovation and training fall just above the 0.05 threshold, so the strict criterion is not met.
Table 4, Panel C, reports the alignment regression: θ = +0.135 (cluster SE = 0.031, cluster t = 4.41, R2 = 0.126). This establishes the empirical condition under which the section 6.3 moderation analysis is interpretable: digitally engaged firms exhibit higher formalization conditional on observables. This describes alignment, not mediation.
Table 5, Panel A, shows the results of the DFI × AFI moderation, and Figure 2 displays the conditional marginal effects of DFI at the 10th, 50th, and 90th percentiles of AFI, which are also reported in Panel B. The prediction was α3 > 0, which the estimates do not confirm.
On labor productivity, α3 = −0.038 (cluster SE = 0.013, cluster t = −2.85, CR 95% CI [−0.064, −0.012], wild p = 0.517 at B = 1,999, wild 95% CI [−0.123, +0.047]). The wild bootstrap is primary. The cluster-robust SE implies p ~ 0.004, but cluster-robust inference is unreliable for interactions with only six sector clusters, and the wild p = 0.517 shows that the apparent precision is not robust to sector-level resampling. The roughly 100-fold discrepancy reflects few-cluster sensitivity rather than a consistent interaction signal, and the interaction is not supported.
The marginal effects of DFI on productivity are +0.087 at p10, +0.069 at p50, and +0.057 at p90 (Figure 2): the effect decreases as AFI increases but remains positive, contrary to the supermodularity prediction. The weighted-logit interaction AME for product innovation is around −2 pp across AFI levels (LPM α3 = −0.043, wild p = 0.226). The interactions for process innovation and training are imprecisely estimated. No result supports the complementarity prediction, which is the empirical content of Contribution 1.
Table 6 reports the AFI sub-channel decomposition under Equation (7). For productivity, neither sub-channel yields a precise moderation: internal-AFI α4 = −0.012 (cluster SE = 0.028, wild p = 0.655), fiscal-AFI α5 = −0.034 (SE = 0.037, p = 0.942). For product innovation, the decomposition is asymmetric: internal-AFI α4 = −0.045 (cluster t = −2.86, wild p = 0.118), fiscal-AFI α5 = +0.008 (p = 0.783). The internal-AFI interaction does not clear the wild-bootstrap threshold, but its cluster t is comparable to that of the composite, suggesting that any moderation of product innovation derives from internal formalization rather than the fiscal interface. Given the narrow internal proxy, this is a mechanical clarification rather than a confirmation, consistent with sections 4.3 and 5.7.
Table 7 reports the sensitivity analysis of Equations (8) and (9). Oster δ at Rmax = 1.3 × is 3.37 for productivity and 6.06 for product innovation, both well above the |δ| > 1 threshold. The bias-adjusted β* at δ = 1 remains positive for productivity (+0.053) and negative for product innovation (−0.064). Both pass the Oster screening.
We treat the Cinelli-Hazlett robustness value (RV, q = 1, α = 0.05) as the binding constraint. Productivity: RV = 8.46% versus 11.34% for the log-employment benchmark, so a confounder weaker than log employment could plausibly nullify the association. Product innovation: RV = 6.76% versus a log-employment benchmark of 6.84%, that is, essentially at the benchmark. Both pass Oster comfortably; under Cinelli-Hazlett, product innovation sits at the strongest observed covariate benchmark, and productivity is more vulnerable than Oster δ alone suggests. Section 1.4 frames Cinelli-Hazlett as the binding benchmark.
Subgroup tests on productivity show that the negative DFI × AFI pattern is concentrated among SMEs (α3 = −0.046, cluster SE = 0.014, t = −3.31) and absent for larger firms (α3 = +0.021, t = 0.36). The cell labeled “finance-major-obstacle” (n = 102) is exploratory, because the basic control specification produces a singular design matrix; the corrected specification yields α3 = −0.268 (cluster t = −3.31). The foreign-ownership cell (n = 95) is directional only (α3 = −1.077, cluster t = −1.33). Complete subgroup numerics are available in Appendix Table A2.
The bounded specification curve (Figure 3) contains 24 specifications. For productivity, α3 ranges from −0.038 to +0.018, with a median t = −0.006. Fifteen specifications are negative, nine are at zero or slightly positive, and only one has a cluster t at or below −1.96. No consistent positive interaction pattern emerges. In the leave-one-cluster-out diagnostics, no single sector drives a sign change, and the wild-bootstrap distribution is not markedly skewed (p = 0.517).

The productivity β (DFI) is stable across weight variants (+0.075 to +0.080). The missing-data sensitivity for j39, the Firth penalized-likelihood logit on product innovation, and the DDML robustness check in Appendix A all preserve point estimates and signs. The complementarity prediction is not supported (Contribution 1). The negative DFI-product innovation association is stable across the Firth logit, sector-conditional fixed effects, the bounded grid, and the Oster bound; it sits at the binding Cinelli-Hazlett benchmark while clearing the 5 percent Romano-Wolf threshold (Contribution 2).
The paper proposed that digital finance creates transaction traceability, while accounting formalization supplies the verification and information discipline that determine whether firms convert traces into managerial signals (Lambert et al., 2012). The supermodularity prediction, that the DFI-upgrading gradient should steepen at higher levels of AFI, is not supported. The interaction is negative in the point estimate, with a cluster-robust p-value of ~0.004; however, the wild-cluster bootstrap p-value (B = 1,999) is 0.517, indicating that the apparent precision is not robust to sector-level resampling. The discrepancy between the two inference procedures is itself a significant methodological finding: the conservative bootstrap discipline is the appropriate one for six clusters, and under it the precision-raising mechanism is not detectable at the firm level. The results also have a theoretical implication beyond this context. The framework of Athey and Stern (1998) predicts supermodularity under monotonic returns to joint adoption. With digital payments diffused to near saturation (85 percent of retail transactions), the marginal informational value of additional formalization may not satisfy the monotonicity assumption, which would diminish the predicted gradient.
The decomposition of AFI into sub-channels yields further informative nulls. Neither the internal-formalization (audit-based) nor the fiscal-interface (VAT engagement) moderation is positive, and neither clears the wild-bootstrap threshold for any outcome. The channel-decomposition results are therefore not consistent with an asymmetric precision-raising scenario in which one channel is complementary and the other is not. The key takeaway is that the upgrading bottleneck lies downstream of formalization rather than at the formalization stage. Audit verification and VAT compliance both supply legible records. However, legibility does not automatically translate into the managerial routines, experimentation, or capability accumulation required for upgrading. This is consistent with the logic of the productivity J-curve (Brynjolfsson et al., 2021).
The one consistently observed non-productivity pattern is the negative association between digital finance and product innovation. The cross-section alone cannot establish whether heavy use of digital payments inhibits innovation or whether the pattern reflects business models in which transactional efficiency and experimentation are less likely to coincide (Holmstrom, 1989). With B = 1,999, the association passes both the unadjusted (p = 0.046) and the Romano-Wolf family-wise (p = 0.046) thresholds, and Oster δ = 6.06 exceeds the |δ| > 1 threshold. The Cinelli-Hazlett RV of 6.76 percent is essentially at the log-employment benchmark of 6.84 percent, so a confounder no stronger than log employment could nullify the estimate. The transactional versus experimental specialization interpretation is therefore statistically supported under the documented family-wise correction but bounded by omitted-variable sensitivity at the strongest observed covariate benchmark. Firms organized around standardized exchange may benefit from digital infrastructure through monitoring and settlement, whereas product innovators may rely on relational or project-based connections that digital-payment intensity does not capture (Bloom et al., 2013). The cross-sectional design admits two alternative readings: intensive users of digital payments may self-select from a commercial population with low innovation orientation, or product innovators may avoid heavy use of electronic payments because their relationships are project-based rather than transactional. The product-process asymmetry (process null, product negative) supports the specialization reading at the margin: operational reorganization is compatible with payment standardization, while market-facing experimentation is less so. The design cannot adjudicate among these mechanisms.
The results are directly relevant to Saudi Arabia’s agenda for SDG 8.2, SDG 8.3, and SDG 9. The study found a positive association between DFI and labor productivity: a one-standard-deviation increase in DFI is associated with approximately 7.8 percent higher labor productivity (β = +0.075; wild p = 0.040; Romano-Wolf p = 0.046). Current policy already devotes significant attention to cashless payments, e-invoicing infrastructure, and digital payments. The remaining gap is that payment modernization does not automatically translate into firm upgrading. The Saudi Central Bank and the Ministries of Commerce and of Economy and Planning will therefore need to move beyond digital-payment diffusion toward digital-productivity programs that help firms use the transaction record for pricing, inventory management, cash-flow management, and supplier management.
The positive association between digital finance and productivity suggests that business-support frameworks built around e-payment-use training could be beneficial. SAMA and the Saudi Chambers can create brief training programmes for SMEs on using electronic payment records to track sales, plan working capital, and monitor costs. This is feasible because firms have already adopted digital-payment channels on a large scale. The policy task is not building from scratch but building on existing infrastructure: facilitating better use of existing digital records in firms’ operational decision making. Investment in training material, advisory capacity, and digital dashboards is required up front but can reduce the long-term costs of suboptimal record use, weak cash-flow management, and low SME productivity.
The second implication relates to accounting formalization. This study does not confirm the predicted DFI × AFI complementarity; the productivity interaction is not precisely estimated under wild-cluster bootstrap inference (wild p = 0.517). Current formalization policy is geared towards audit, VAT engagement, e-invoicing, and fiscal compliance. The data, however, indicate that accounting formalization does not enhance the digital-finance productivity channel. ZATCA and MOC should therefore adjust their firm-support guidelines so that formalization involves managerial interpretation of accounting records rather than mere completion. Essential decision-use tools, such as budget variance templates, receivable tracking, and cost review procedures, should be implemented alongside audit readiness and tax readiness.
From a resource-allocation perspective, the productivity return from existing digital systems could be increased by allocating some SME support funding to capability vouchers rather than general digital onboarding. Services covered by capability vouchers could include accounting advisory sessions, bookkeeping software setup, training on tax records, and productivity diagnostics. This would be efficient because it targets the major constraint uncovered in the study: firms create traceable records, but many do not build these records into their upgrading routines. Support should begin with SMEs, as the subgroup evidence shows that the negative DFI × AFI pattern is concentrated among them. Larger companies may already have their own systems in place, while SMEs require a formal approach to converting digital data into business decisions.
The third implication concerns innovation policy. The study shows a negative association between digital finance and product innovation that survives the Romano-Wolf correction but is not robust at the Cinelli-Hazlett benchmark. Existing policy may rest on the premise that all types of upgrading benefit equally from digital finance. The evidence does not support that premise. Productivity support should therefore be differentiated from innovation support across the Small and Medium Enterprises General Authority, sector agencies, and innovation funds. Productivity instruments are better suited to digital-payment-intensive firms, whereas product innovation needs other instruments, including prototyping grants, market-testing support, technical mentoring, and connections to research institutions. This separation can help avoid misdirected resources.
Implementation should be staged. First, agencies can pilot in sectors with high digital-payment usage but low levels of innovation. Second, the pilots should compare firms that receive digital-finance onboarding only with firms that are onboarded, participate in accounting-use training, and receive innovation vouchers. Third, future assessment should use panel data around the ZATCA e-invoicing rollout to verify whether such policy packages generate more upgrading in the long run. The staged approach respects the study’s limitation that the evidence is conditional association, not causal, while translating the results into testable policy action.
The findings for Saudi firms differ from earlier results. Tengeh and Gahapa Talom (2020) document mobile payments × bank-account complementarity among African SMEs; our analogous payments × formalization test produces a null. Shi (2024) finds that DFI positively affects breakthrough innovation among Chinese firms through a financing-constraint relief mechanism; in our sample bank access is high (92.5 percent), so that relief mechanism is absent, and the DFI-product-innovation association is negative. The findings also differ from Dalton et al. (2024), who find no relationship between e-payments and SME productivity but do find an association between e-payments and access to credit; this paper finds a productivity association but no credit-channel evidence. The cross-country evidence in Aguilar et al. (2024) supports our productivity finding but does not address firm-level moderation.
Five scope conditions limit the interpretation. First, the cross-sectional data cannot resolve reverse causation or unobserved managerial quality; estimates are conditional associations, not causal effects. Second, the sample is export-oriented (with an export share as high as 97.3 percent), and the firms are more formal and internationally oriented than Saudi enterprises overall. Third, the six-sector clustering structure is small even with the wild-cluster bootstrap discipline (Cameron et al., 2008); a finer industrial classification would strengthen inferential resolution. Fourth, DFI captures digital-finance engagement rather than the full digital transformation of the firm. Fifth, AFI’s internal channel rests narrowly on external-auditor certification. Generalization is defensible for reform-intensive settings with active payment modernization and tax-formalization infrastructure.
The bigger picture is that digital financial traceability does not guarantee upgrading; it only opens informational possibilities for it. Whether firms convert traces into productivity, innovation, and training depends on capabilities outside the DFI and AFI measurement architecture (managerial analytics, experimentation routines, human-capital investment), which the observational design cannot evaluate directly.
This paper asked under which organizational conditions digital finance leads to firm upgrading. The framework proposed that accounting formalization tightens the DFI-upgrading association by producing clearer signals (Athey & Stern, 1998; Lambert et al., 2012). In the sample of Saudi firms (N = 1,002), the DFI × AFI interaction has a negative point estimate, and the wild-cluster bootstrap p = 0.517 (B = 1,999) indicates that it is not precisely estimated; none of the AFI sub-channels shows a precise positive moderation either.
Two outcome-specific patterns stand out. Digital finance is positively associated with labor productivity (β = +0.075; wild p = 0.040; Romano-Wolf p = 0.046) and negatively associated with product innovation (Romano-Wolf p = 0.046); the latter is robust to the Oster bound but sits at the binding Cinelli-Hazlett benchmark. The pattern is consistent with transactional/experimental specialization (Aghion & Tirole, 1994; Bloom et al., 2013).
Two methodological contributions follow. First, the roughly 100-fold gap between the cluster-robust and wild-cluster bootstrap p-values for the productivity interaction illustrates why few-cluster inference discipline matters for observational complementarity tests. Second, treating Cinelli-Hazlett as the binding constraint operationalizes a stricter sensitivity standard than Oster alone delivers.
Saudi Arabia’s payment and VAT-formalization infrastructure is aligned with SDGs 8.2, 8.3, and 9 (Economic & Affairs, 2024), but infrastructure alone does not produce the predicted complementarity. Bundled interventions, for example combining digital-finance onboarding with audit-readiness support and targeting capability vouchers at SMEs, should be examined as candidates for testing. More fine-grained accounting measures, panel data, and quasi-experimental designs exploiting the ZATCA e-invoicing rollout would sharpen identification.
The authors confirmed that no generative Artificial Intelligence (AI) tools were used in the conceptualization of this research, the writing, data analysis, or interpretation of this study.
Qamruzzaman, M. (2026). Enterprise survey data - Saudi Arab, figshare. Dataset. https://doi.org/10.6084/m9.figshare.32354577 (Qamruzzaman, 2026a, 2026b). This work contains the following underlying data:
Qamruzzaman, M. (2026). Enterprise survey data - Saudi Arab, figshare. Dataset. https://doi.org/10.6084/m9.figshare.32354577 (Qamruzzaman, 2026a, 2026b). This work contains the following extended data:
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).