Informative prior on structural equation modelling with non-homogenous error structure

Introduction: This study investigates the impact of an informative prior on the Bayesian structural equation model (BSEM) with a heteroscedastic error structure. A major drawback of the homogeneous error structure is that its underlying assumption of equal variance across observations is unrealistic in most studies, hence the need to consider a non-homogenous error structure. Methods: With an appropriate informative prior updated by the data, four different forms of heteroscedastic error structure were considered at sample sizes 50, 100, 200 and 500. Results: Both the posterior predictive probability (PPP) and the log-likelihood are influenced by the sample size and the prior information, and on these criteria the model with the linear form of error structure performs best. Conclusions: The study addresses the problem of heteroscedasticity of known form using four different heteroscedastic conditions. The linear form outperformed the other forms of heteroscedastic error structure and can therefore accommodate data that violate the homogenous variance assumption when an appropriate informative prior is updated. This approach thus provides an alternative to the existing classical method, which depends solely on the sample information.


Introduction
Bayesian structural equation modeling (BSEM) analyses the relationships between observed, unobserved and latent variables within the Bayesian context. 14,16,21,23 These relationships can be visualized with a path diagram. 24 In spite of the rising number of statistical research ideas that have been created and verified using structural equation modelling (SEM), and despite the propensity of multicollinearity, heteroscedasticity and non-normality to skew statistical estimates and inference, Bayesian structural equation model investigations, unlike classical regression, rarely mention the use of statistical approaches for assessing the measurement and structural models for non-normality, multicollinearity, heteroscedasticity, and combinations thereof. We therefore suggest the use of diagnostic tests for the presence of multicollinearity, heteroscedasticity and non-normality.
In Bayesian inference, both the observed data y and the parameters θ are assumed random; treating θ as random expresses the level of uncertainty about its true value. The joint probability of the parameters and the data can be modelled as a function of the conditional distribution of the data given the parameters and the prior distribution of the parameters. More formally,

$$P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)} \qquad (1)$$

where P(θ|y) is the posterior distribution, P(θ) is the prior distribution and P(y|θ) is the likelihood function. The numerator of (1) is the un-normalized posterior distribution; when P(y|θ) is expressed in terms of the unknown parameters θ for fixed values of y, it is the likelihood L(θ|y). Thus, (1) can be rewritten as:

$$P(\theta \mid y) \propto L(\theta \mid y)\,P(\theta) \qquad (2)$$

Studies abound on classical methods and Bayesian methods with a focus on homogeneous variance. 8,19,22,26 This study explores the BSEM using different forms of heteroscedastic error structure.
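As a minimal numerical sketch of the relation described above (posterior proportional to likelihood times prior), the Python snippet below evaluates an un-normalized posterior on a grid and then normalizes it. The normal likelihood with known σ, the Normal(0, 2²) prior, the data values and the function name `grid_posterior` are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def grid_posterior(y, theta_grid, prior_pdf, sigma=1.0):
    """Posterior density on an evenly spaced grid for y_i ~ Normal(theta, sigma^2).

    prior_pdf holds the prior density evaluated at each grid point.
    (Hypothetical helper, for illustration only.)
    """
    y = np.asarray(y, dtype=float)
    # Log-likelihood of every candidate theta, summed over the observations
    log_lik = (-0.5 * ((y[:, None] - theta_grid[None, :]) / sigma) ** 2).sum(axis=0)
    # Un-normalized log-posterior = log-likelihood + log-prior
    log_post = log_lik + np.log(prior_pdf)
    log_post -= log_post.max()          # guard against underflow
    post = np.exp(log_post)
    dx = theta_grid[1] - theta_grid[0]
    return post / (post.sum() * dx)     # normalize so the density integrates to 1

# Assumed toy data and prior
y = np.array([1.2, 0.8, 1.5, 1.1])
theta_grid = np.linspace(-10.0, 10.0, 4001)
prior = np.exp(-0.5 * (theta_grid / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

post = grid_posterior(y, theta_grid, prior, sigma=1.0)
dx = theta_grid[1] - theta_grid[0]
post_mean = np.sum(theta_grid * post) * dx
print("posterior mean on the grid:", post_mean)
```

For this conjugate toy case the grid result can be checked against the closed-form posterior mean, (n·ȳ/σ²) / (1/τ² + n/σ²) with prior standard deviation τ = 2.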

Bayesian estimation of structural equation models (SEM)
This section develops a Gibbs sampler to estimate SEM with reflective measurement indicators. 1,11,12 The Bayesian estimation is illustrated by considering a SEM that is equivalent to the most commonly used model. A SEM is composed of a measurement equation (3) and a structural equation (4): 9 where i ∈ {1, …, n}.

It is assumed that the measurement errors are uncorrelated with ω and δ, that the residuals are uncorrelated with ω, and that the variables are distributed as follows:

Given Y and Ω, Λ and Ψ_ε are independent of Σ_ω. Given draws of Ω, the estimation of Λ and Ψ_ε reduces to a simple regression model. Thus, the posterior distribution of Λ and Ψ_ε can be sampled without reference to Σ_ω:

The same holds for inference with regard to M, Φ and Ψ_δ, which are independent of Y given Ω.
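The remark that, once Ω has been drawn, the loadings can be estimated as in a regression can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the error variances are treated as known, each row of Λ has a normal prior with a shared precision matrix, and the function name `loading_posterior`, the simulated data and all numerical values are hypothetical rather than the study's own implementation.

```python
import numpy as np

def loading_posterior(Y, Omega, psi_eps, prior_mean, prior_prec):
    """Conditional posterior mean and covariance of each row of Lambda.

    Y          : (n, p) observed indicators
    Omega      : (n, q) current draw of the latent variables
    psi_eps    : (p,)   error variances, treated as known here
    prior_mean : (p, q) prior mean of each loading row
    prior_prec : (q, q) prior precision shared by all rows
    """
    cross = Omega.T @ Omega
    means, covs = [], []
    for k in range(Y.shape[1]):
        # Posterior precision = prior precision + data precision
        cov = np.linalg.inv(prior_prec + cross / psi_eps[k])
        mean = cov @ (prior_prec @ prior_mean[k] + Omega.T @ Y[:, k] / psi_eps[k])
        means.append(mean)
        covs.append(cov)
    return np.array(means), np.array(covs)

# Simulated one-factor example with assumed loadings
rng = np.random.default_rng(1)
n = 2000
Lambda_true = np.array([[1.0], [2.0], [3.0]])
Omega = rng.normal(size=(n, 1))
psi_eps = np.array([0.5, 0.5, 0.5])
Y = Omega @ Lambda_true.T + rng.normal(scale=np.sqrt(psi_eps), size=(n, 3))

post_mean, post_cov = loading_posterior(
    Y, Omega, psi_eps, prior_mean=np.zeros((3, 1)), prior_prec=1e-6 * np.eye(1)
)
print(post_mean.ravel())
```

With the nearly flat prior used in this example, the posterior means essentially coincide with the least-squares regression of each indicator on Ω; a tighter prior precision would pull them towards the prior mean.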

Heteroscedastic error structures
The heteroscedastic error structures under consideration, each with a different functional form of the error variance, are the double logarithmic form, the linear form, the linear-inverse form and the linear-absolute form, as expressed in equations 17, 18, 19 and 20, respectively. 5

The posterior distribution
The posterior density is the product of the likelihood and the chosen prior distribution. 2,13 Since the full posterior distribution is intractable, a Markov chain Monte Carlo (MCMC) simulation method, Gibbs sampling, is employed. 26 This involves the use of the marginal posterior distribution.
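To make the four named error-variance forms concrete, the Python sketch below generates heteroscedastic errors under one possible parameterization of each form. The exact expressions of equations 17 to 20 are not reproduced in this text, so the functional forms, the coefficients `a` and `b`, and the function names are assumptions chosen for illustration only; they should not be read as the study's definitions.

```python
import numpy as np

def error_variance(x, form, a=1.0, b=0.5):
    """Error variance as a function of a positive covariate x.

    All four parameterizations below are ASSUMED textbook-style forms.
    """
    x = np.asarray(x, dtype=float)
    if form == "double_log":        # log(sigma^2) = a + b * log(x)
        return np.exp(a + b * np.log(x))
    if form == "linear":            # sigma^2 = a + b * x
        return a + b * x
    if form == "linear_inverse":    # sigma^2 = a + b / x
        return a + b / x
    if form == "linear_absolute":   # sigma = a + b * |x|
        return (a + b * np.abs(x)) ** 2
    raise ValueError(f"unknown form: {form}")

def simulate_errors(x, form, seed=0, **kwargs):
    """Draw one zero-mean normal error per observation with form-specific variance."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(error_variance(x, form, **kwargs)))

x = np.array([1.0, 2.0, 4.0])
for form in ("double_log", "linear", "linear_inverse", "linear_absolute"):
    print(form, error_variance(x, form))
```

Each form lets the error variance change with x in a different way, which is what distinguishes these structures from the homogeneous case where the variance is constant across observations.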
Consider an informative prior created by setting

and letting c → 0 for j = 1, 2.
The posterior distribution of λ* conditional on γ*, h, Ω is given by:
Solving the exponential part of the above equation, we will have:
The additional term not involving λ* is factored out to give:
Factorizing in terms of λ*, the term in the exponential becomes:
So, the posterior density of λ* conditional on the other parameters h, Ω, y* is multivariate normal with mean λ* and variance σ²*. That is,
The posterior distribution of h conditional on λ*, Ω, y* is given by:
The posterior distribution of Ω*, conditional on y*, λ*, h, is given by:

The Gibbs sampler
The Gibbs sampling procedure used in this study involves the generation of a sequence of draws from the conditional posterior distribution of each parameter. 2,22,26

Gibbs sampling procedure
(i) Choose a starting or initial value, ϕ^(0).
(v) Perform the burn-in by dropping the first S_0 of these draws to eliminate the effect of ϕ^(0); the remaining S_1 draws are then averaged to obtain the estimate of the posterior expectation E[g(ϕ)|y].
The right-hand side of (15) is proportional to the density function of an inverse Wishart distribution.
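The Gibbs sampling procedure described above (start from an initial value, draw each parameter in turn from its conditional posterior, drop the first S_0 draws as burn-in and average the remaining S_1 draws) can be sketched in Python on a deliberately simple model. The model here is an assumption made for illustration: a normal sample with unknown mean μ and precision h, a normal prior on μ and a gamma prior on h. It is not the SEM of this study, and the hyperparameter values and the name `gibbs_normal` are hypothetical.

```python
import numpy as np

def gibbs_normal(y, n_draws=5000, burn_in=1000,
                 m0=0.0, p0=0.01, a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler for y_i ~ Normal(mu, 1/h).

    Assumed priors: mu ~ Normal(m0, 1/p0), h ~ Gamma(shape=a0, rate=b0).
    Returns the averaged post-burn-in draws and the retained draws.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, ybar = y.size, y.mean()
    mu, h = ybar, 1.0                        # starting values
    draws = np.empty((n_draws, 2))
    for s in range(n_draws):
        # mu | h, y is normal
        prec = p0 + n * h
        mean = (p0 * m0 + h * n * ybar) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))
        # h | mu, y is gamma (NumPy uses scale = 1 / rate)
        rate = b0 + 0.5 * np.sum((y - mu) ** 2)
        h = rng.gamma(a0 + 0.5 * n, 1.0 / rate)
        draws[s] = mu, h
    kept = draws[burn_in:]                   # drop the first S_0 draws
    return kept.mean(axis=0), kept           # average the remaining S_1 draws

# Toy data with assumed true mean 2 and precision 4
data_rng = np.random.default_rng(123)
y = data_rng.normal(2.0, 0.5, size=200)
estimate, kept = gibbs_normal(y)
print("posterior means of (mu, h):", estimate)
```

The same alternating structure carries over to the SEM setting, where the blocks would be the loadings, the precision and the latent variables, each drawn from its own conditional posterior.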
Design of simulation
• Different functional forms of heteroscedastic error structure 3 were considered, with sample sizes of 50, 100, 200 and 500. The hyperparameters were chosen arbitrarily for the simulation, which used the Gibbs sampler, an MCMC method. 6,22
• The R code can be accessed via the Extended data. 27
• The factor loadings and the error precision followed multivariate normal and inverse gamma distributions, respectively, to assess the prior sensitivity. 21
• The criteria used to assess the performance of the posterior simulation technique are the posterior estimates.
In order to evaluate the Bayesian model fit, we used the posterior predictive probability (PPP) procedure. 4,5,7,25 After convergence is achieved (after j iterations), averaging the subsequent draws gives the Bayesian estimates of the parameters and the latent variables. 10,17,23
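A posterior predictive probability of the kind mentioned above can be sketched in Python as the share of posterior draws for which a discrepancy computed on replicated data is at least as large as the same discrepancy computed on the observed data. The normal model, the chi-square-type discrepancy, the stand-in posterior draws and the name `posterior_predictive_p` are assumptions made for illustration; the study's own discrepancy measure is not specified in this text.

```python
import numpy as np

def posterior_predictive_p(y, mu_draws, h_draws, seed=0):
    """Posterior predictive probability for y_i ~ Normal(mu, 1/h).

    Uses an assumed discrepancy: the sum of squared standardized residuals.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    exceed = 0
    for mu, h in zip(mu_draws, h_draws):
        sd = 1.0 / np.sqrt(h)
        y_rep = rng.normal(mu, sd, size=y.size)      # replicated data set
        d_obs = np.sum(((y - mu) / sd) ** 2)         # discrepancy, observed data
        d_rep = np.sum(((y_rep - mu) / sd) ** 2)     # discrepancy, replicated data
        exceed += int(d_rep >= d_obs)
    return exceed / len(mu_draws)

# Toy data and stand-in posterior draws (in practice these come from the sampler)
rng = np.random.default_rng(42)
y = rng.normal(0.0, 1.0, size=100)
mu_draws = rng.normal(y.mean(), y.std(ddof=1) / np.sqrt(y.size), size=2000)
h_draws = rng.gamma(y.size / 2, 2.0 / np.sum((y - y.mean()) ** 2), size=2000)

ppp = posterior_predictive_p(y, mu_draws, h_draws)
print("PPP:", ppp)
```

Because the discrepancy is recomputed for every retained draw, the PPP reflects parameter uncertainty as well as sampling variability.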

Results and discussion
This section discusses the results: the performance of the estimators across the parameters for the different forms of heteroscedasticity, and the performance of the Bayesian posterior simulation and analytical methods in the presence of heteroscedasticity. Four different forms of heteroscedastic error structure are considered over four sample sizes: 50, 100, 200 and 500.
Performance of the estimators at heteroscedasticity condition
This subsection gives the results for the latent and observed variables at various sample sizes for the four heteroscedastic error conditions considered.

Comparison of latent variable estimates at different sample sizes under the heteroscedasticity condition
The assumed values for the estimates are λ_1 = 2.0, λ_2 = 3.0 and precision = 15.0.
The covariance matrix of ω was derived to be

We examine different forms of heteroscedastic error structure in Bayesian structural equation modeling using informative priors, rather than assuming homogenous variance, which is often a statistical fallacy in many studies. We compare the models' posterior means and standard deviations in Tables 1, 2, 3 and 4. The differences are unlikely to impact substantive conclusions, but two of them are noteworthy.
First, the posterior means of the loadings (λ_1 and λ_2) are somewhat smaller under the different heteroscedastic conditions with the informative priors, as observed in Tables 6 and 7. Second, the factor variance γ* is larger under our model with informative priors, likely because the informative prior placed more density on larger values of the posterior standard deviation. The model fit was evaluated using the PPP values shown in Table 5: the linear form is the best, with the minimum PPP value as the sample size increases. This is also revealed by the downward slope as the sample size increases from 50 to 500 in Figure 1b, when compared with Figures 1a, 2a and 2b.

Figure: Comparison using log-likelihood and PPP under the second heteroscedasticity condition (series: LogLik, PPP).

Figure: Comparison using log-likelihood and PPP under the fourth heteroscedasticity condition (series: LogLik, PPP).

As an improvement on the maximum likelihood method, Bayesian estimation treats the parameters as random, with an informative prior distribution, here taken from the conjugate family of the posterior. Once the data are simulated or collected, they are combined with the prior distribution using Bayes' theorem, and the posterior distribution is then calculated, reflecting both the prior knowledge and the simulated data. 14,15,21 The joint posterior distribution is summarized using MCMC simulation techniques in terms of lower-dimensional summary statistics such as the posterior mean and posterior standard deviation. 5,26 We observe that the structural and measurement equations obtained from this study are adequate, and in general we could accept the proposed model.

Conclusion
In this research, the derived Bayesian estimators of a structural equation model in the presence of different forms of heteroscedastic error structure supported accurate statistical inference. The study has addressed the problem of heteroscedasticity of known form using four different heteroscedastic conditions, for both linear and quadratic forms, and has modified the homogenous error structure to a heteroscedastic error structure in the Bayesian structural equation model. 20 The linear form outperformed the other forms of heteroscedastic error structure and can therefore accommodate data that violate the homogenous variance assumption when an appropriate informative prior is updated. However, these heteroscedastic error structure models can also be tested, as an area of further research, by updating an appropriate noninformative prior. 16,18 This approach thus provides an alternative to the existing classical method, which depends solely on the sample information.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.

Open Peer Review
Research, Cairo University, Giza, Egypt
This paper investigated the impact of an informative prior on the Bayesian structural equation model with a heteroscedastic error structure. Four different forms of heteroscedastic error structure were considered. The suggested Bayesian approach provides an alternative to the existing classical method, which depends solely on the sample information. The results indicate that the suggested Bayesian estimation method is more efficient than the existing classical method.
In my opinion, the paper offers a good contribution. I therefore recommend accepting this paper after the following modifications are made to improve the manuscript:
1. I think the title of the paper needs improvement. I suggest the following title: "Bayesian estimation of structural equation modelling with non-homogenous error structure".
2. In the "abstract" section, the findings or research results should be introduced briefly.
3. In the "introduction" section, there is not enough background information. Also discuss similar work that has been done in this area to give a detailed view of this work. The authors should add more papers related to the Bayesian estimation of structural equation modelling.
4. In the "methods" section, the authors should define each symbol given in each equation.
5. In the "conclusion" section, the limitations and future research directions should be mentioned.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Applied statistics, Econometric models.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.