Keywords
Linear regression model, generalized regression model, Ridge estimator, Liu estimator, KL estimator.
This version differs from the first in that all corrections raised by the three reviewers have been made. The new version includes more equations to simplify the methods discussed earlier, as requested by the reviewers.
A special case of the Generalized Linear Model (GLM) is the Poisson Regression Model (PRM), which is generally applied to model count or frequency data. Other count data models include the Bell regression model, the negative binomial regression model, the zero-inflated Bell regression model and the zero-inflated regression model (Amin et al., 2020, 2021; Sami et al., 2021; Rashad and Algamal, 2019; Majid et al., 2021). The PRM is employed to model the relationship between a response variable and one or more explanatory variables, where the response variable denotes a rare event or count data. The response variable is non-negative, and the model is applicable in fields such as economics, health, and the social and physical sciences. The Maximum Likelihood Estimation (MLE) method is popularly used to estimate the regression coefficients in a PRM. In both the Linear Regression Model (LRM) and the GLM, MLE suffers a setback when the explanatory variables are correlated, which implies multicollinearity. The effects of multicollinearity include large variances and covariances of the regression coefficients, negligible t-ratios and high coefficient of determination (R-squared) values. Alternative estimators to the MLE in the linear regression model include the ridge regression estimator by Hoerl and Kennard (1970), the Liu estimator by Liu (1993), the Liu-type estimator by Liu (2003), the two-parameter estimator by Özkale and Kaciranlar (2007), the r-d class estimator by Kaçiranlar and Sakallioǧlu (2007), the k-d class estimator by Sakallioglu and Kaciranlar (2008), a two-parameter estimator by Yang and Chang (2010), the modified two-parameter estimator by Dorugade (2014), the modified ridge-type estimator by Lukman et al. (2019), the modified Liu estimator by Lukman et al. (2020), the Kibria-Lukman (KL) estimator by Kibria and Lukman (2020), the modified new two-parameter estimator by Ahmad and Aslam (2020), the modified Liu ridge-type estimator by Aslam and Ahmad (2020) and the DK estimator by Dawoud and Kibria (2020), among others. Researchers have extended some of these LRM estimators to the PRM. The ridge estimator was introduced into the PRM by Månsson and Shukur (2011). Månsson et al. (2012) introduced the Liu estimator into the PRM. The modified jackknifed ridge estimator for the PRM was introduced by Türkan and Özel (2016). A new two-parameter estimator for the PRM was developed by Asar and Genç (2017). Recently, the Poisson KL estimator was developed by Lukman et al. (2021) to combat multicollinearity in the PRM.
In this study, we propose the Modified Kibria-Lukman estimator to handle multicollinearity in the PRM. It is a single-parameter estimator, which makes it less computationally intensive than the two-parameter estimators. Also, since the Kibria-Lukman estimator has been found to outperform the Ridge and the Liu estimators, it is expected that the modification in this study will enhance the performance of the Kibria-Lukman estimator. Furthermore, we compare the performance of the estimator with the Poisson Maximum Likelihood Estimator (PMLE), Poisson Ridge Regression Estimator (PRE), Poisson Liu Estimator (PLE) and the Poisson KL estimator (PKLE).
Given that the response variable $y_i$ is in the form of count data, it is assumed to follow a Poisson distribution, $y_i \sim \mathrm{Po}(\mu_i)$, where $\mu_i = \exp(x_i\beta)$ and $\ln \mu_i = x_i\beta$. Here $x_i$ is the ith row of the matrix $X$, which is an $n \times (p+1)$ data matrix with p explanatory variables, and $\beta$ is a $(p+1) \times 1$ vector of coefficients. The log likelihood of the model is given as:
$$l(\beta) = \sum_{i=1}^{n} \left[ y_i x_i\beta - \exp(x_i\beta) - \ln(y_i!) \right]$$
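As a minimal sketch, the Poisson log likelihood with a log link can be evaluated directly from its standard form, the sum over observations of $y_i x_i\beta - \exp(x_i\beta) - \ln(y_i!)$. The function name and the toy data below are hypothetical and serve only as an illustration.

```python
import math

def poisson_log_likelihood(X, y, beta):
    """Sum of y_i * eta_i - exp(eta_i) - ln(y_i!) with eta_i = x_i beta."""
    total = 0.0
    for row, y_i in zip(X, y):
        # Linear predictor x_i beta for this observation.
        eta = sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        # math.lgamma(y + 1) equals ln(y!) for non-negative integer y.
        total += y_i * eta - math.exp(eta) - math.lgamma(y_i + 1)
    return total

# Hypothetical toy data: an intercept column plus one explanatory variable.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1, 2, 4]
ll_zero = poisson_log_likelihood(X, y, [0.0, 0.0])
ll_slope = poisson_log_likelihood(X, y, [0.0, 0.7])
```

A coefficient vector that tracks the increasing counts (`ll_slope`) gives a larger log likelihood than the all-zero vector (`ll_zero`), which is what a maximization routine exploits.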
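The most common method of maximizing the likelihood function is to use the iterated weighted least squares (IWLS) algorithm, which results in:
$$\hat\beta_{PMLE} = (X'\hat W X)^{-1} X'\hat W \hat z$$
where $\hat W = \mathrm{diag}(\hat\mu_i)$ and $\hat z$ is a vector whose ith element equals $\hat z_i = \ln(\hat\mu_i) + (y_i - \hat\mu_i)/\hat\mu_i$.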
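The IWLS iteration can be sketched as follows, assuming the standard log-link working response $z_i = \ln\mu_i + (y_i - \mu_i)/\mu_i$ and weights $W = \mathrm{diag}(\mu_i)$. The function name, starting values, tolerance and simulated data are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def poisson_iwls(X, y, max_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])           # assumed starting value
    for _ in range(max_iter):
        eta = X @ beta                    # linear predictor
        mu = np.exp(eta)                  # fitted means
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # X'W with W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical data generated from a known coefficient vector.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ true_beta))
beta_hat = poisson_iwls(X, y)
score = X.T @ (y - np.exp(X @ beta_hat))  # gradient of the log likelihood
```

At convergence the score vector is numerically zero, which is the first-order condition for the maximum likelihood estimate.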
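The MLE is asymptotically normally distributed with a covariance matrix equal to the inverse of the negative expected second derivative of the log likelihood:
$$Cov(\hat\beta_{PMLE}) = \left[ -E\left( \frac{\partial^2 l}{\partial\beta\,\partial\beta'} \right) \right]^{-1} = (X'\hat W X)^{-1}$$
and the mean squared error is given as:
$$MSE(\hat\beta_{PMLE}) = tr\left[ (X'\hat W X)^{-1} \right] = \sum_{j} \frac{1}{\lambda_j}$$
where $\lambda_j$ is the jth eigenvalue of the matrix $X'\hat W X$.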
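The Ridge estimator was adopted by Månsson and Shukur (2011) to solve the multicollinearity problem in count data. The estimator is defined as follows:
$$\hat\beta_{PRE} = (X'\hat W X + kI)^{-1} X'\hat W X \hat\beta_{PMLE}$$
where $k > 0$ is the biasing parameter and $I$ is the identity matrix.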
βPRE is effective in practice but it is a complicated function of the biasing parameter k (Liu, 1993).
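Månsson et al. (2012) extended the Liu estimator to the Poisson regression model as:
$$\hat\beta_{PLE} = (X'\hat W X + I)^{-1} (X'\hat W X + dI) \hat\beta_{PMLE}$$
The MSE for the Liu estimator is defined as:
$$MSE(\hat\beta_{PLE}) = \sum_{j} \frac{(\lambda_j + d)^2}{\lambda_j (\lambda_j + 1)^2} + (d - 1)^2 \sum_{j} \frac{\alpha_j^2}{(\lambda_j + 1)^2}$$
where $\lambda_j$ is the jth eigenvalue of $X'\hat W X$ and $\alpha_j$ is the jth element of $\alpha$.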
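The KL estimator was proposed by Kibria and Lukman (2020) as a means of mitigating the effect of multicollinearity on parameter estimation. The estimator is defined as
$$\hat\beta_{KL} = (X'X + kI)^{-1} (X'X - kI) \hat\beta$$
where $\hat\beta = (X'X)^{-1}X'y$ is the least squares estimator.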
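By means of extension, the Poisson K-L estimator was proposed by Lukman et al. (2021) as follows:
$$\hat\beta_{PKL} = (X'\hat W X + kI)^{-1} (X'\hat W X - kI) \hat\beta_{PMLE}$$
where $\hat W = \mathrm{diag}(\hat\mu_i)$ and $k > 0$.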
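The three shrinkage estimators reviewed so far are all linear transformations of the Poisson MLE. The sketch below writes them in their standard forms from the cited literature, with `S` standing in for the weighted cross-product matrix $X'\hat W X$. The function names and the numerical values of `S`, `beta_mle`, `k` and `d` are hypothetical.

```python
import numpy as np

def poisson_ridge(S, beta_mle, k):
    """Ridge form: (S + kI)^{-1} S beta_mle."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, S @ beta_mle)

def poisson_liu(S, beta_mle, d):
    """Liu form: (S + I)^{-1} (S + dI) beta_mle."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + I, (S + d * I) @ beta_mle)

def poisson_kl(S, beta_mle, k):
    """KL form: (S + kI)^{-1} (S - kI) beta_mle."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, (S - k * I) @ beta_mle)

# Hypothetical ill-conditioned S and MLE vector, for illustration only.
S = np.array([[10.0, 9.9], [9.9, 10.0]])
beta_mle = np.array([2.0, -1.0])
k, d = 0.5, 0.5
b_ridge = poisson_ridge(S, beta_mle, k)
b_liu = poisson_liu(S, beta_mle, d)
b_kl = poisson_kl(S, beta_mle, k)
```

Setting k = 0 (or d = 1 for the Liu form) returns the MLE unchanged. The KL form also satisfies the identity `b_kl = 2 * b_ridge - beta_mle`, since $(S + kI)^{-1}(S - kI) = 2(S + kI)^{-1}S - I$.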
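The proposed estimator is obtained as follows: the least squares estimator $\hat\beta$ in equation (2.11) is replaced with the ridge estimator $\hat\beta_{R} = (X'X + kI)^{-1}X'y$. Thus, we have:
$$\hat\beta_{MKL} = (X'X + kI)^{-1} (X'X - kI) \hat\beta_{R}$$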
The properties of the new estimator include:
The bias can be written in scalar form as:
can be represented in scalar form as follows:
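A sketch of how these scalar forms can be derived, assuming the modified estimator is the KL transformation applied to the ridge estimator, that is $(X'X + kI)^{-1}(X'X - kI)(X'X + kI)^{-1}X'y$. In canonical form, with $\lambda_j$ the jth eigenvalue of $X'X$, $\alpha = Q'\beta$ and error variance $\sigma^2$ (notation assumed here), each component of the least squares estimator is multiplied by $\lambda_j(\lambda_j - k)/(\lambda_j + k)^2$:

```latex
\begin{aligned}
E(\hat\alpha_{MKL,j}) &= \frac{\lambda_j(\lambda_j - k)}{(\lambda_j + k)^2}\,\alpha_j \\
\mathrm{Bias}(\hat\alpha_{MKL,j}) &= \frac{\lambda_j(\lambda_j - k) - (\lambda_j + k)^2}{(\lambda_j + k)^2}\,\alpha_j
  = -\frac{k(3\lambda_j + k)}{(\lambda_j + k)^2}\,\alpha_j \\
\mathrm{Var}(\hat\alpha_{MKL,j}) &= \frac{\lambda_j^2(\lambda_j - k)^2}{(\lambda_j + k)^4}\cdot\frac{\sigma^2}{\lambda_j}
  = \sigma^2\,\frac{\lambda_j(\lambda_j - k)^2}{(\lambda_j + k)^4} \\
\mathrm{MSE}(\hat\alpha_{MKL}) &= \sigma^2\sum_{j}\frac{\lambda_j(\lambda_j - k)^2}{(\lambda_j + k)^4}
  + k^2\sum_{j}\frac{(3\lambda_j + k)^2\alpha_j^2}{(\lambda_j + k)^4}
\end{aligned}
```

Under this assumed form the bias vanishes at k = 0 and the MSE reduces to $\sigma^2\sum_j 1/\lambda_j$, the MSE of the least squares estimator.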
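The proposed estimator in (2.14) is extended to the PRM. It is referred to as the Poisson modified KL (PMKL) estimator and defined as:
$$\hat\beta_{PMKL} = (X'\hat W X + kI)^{-1} (X'\hat W X - kI) (X'\hat W X + kI)^{-1} X'\hat W X \hat\beta_{PMLE}$$
The mean squared error of the PMKL is defined as:
$$MSE(\hat\beta_{PMKL}) = \sum_{j} \frac{\lambda_j (\lambda_j - k)^2}{(\lambda_j + k)^4} + k^2 \sum_{j} \frac{(3\lambda_j + k)^2 \alpha_j^2}{(\lambda_j + k)^4}$$
Suppose $\alpha = Q'\beta$ and $Q'X'\hat W X Q = \Lambda = \mathrm{diag}(\lambda_j)$, where $\Lambda$ is the matrix of eigenvalues of $X'\hat W X$ and $Q$ is the matrix whose columns are the eigenvectors of $X'\hat W X$.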
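The scalar MSE expressions can be compared numerically once the eigenvalues and the canonical coefficients are fixed. The sketch below uses the standard scalar forms for the MLE, ridge, Liu and KL estimators. The MKL expression assumes the estimator is the KL transformation applied to the ridge estimator. The eigenvalues, coefficients and k are hypothetical values chosen to mimic one near-zero eigenvalue.

```python
def mse_mle(lam):
    return sum(1.0 / l for l in lam)

def mse_ridge(lam, alpha, k):
    return sum((l + k * k * a * a) / (l + k) ** 2 for l, a in zip(lam, alpha))

def mse_liu(lam, alpha, d):
    return sum((l + d) ** 2 / (l * (l + 1) ** 2)
               + (d - 1) ** 2 * a * a / (l + 1) ** 2
               for l, a in zip(lam, alpha))

def mse_kl(lam, alpha, k):
    return sum((l - k) ** 2 / (l * (l + k) ** 2)
               + 4 * k * k * a * a / (l + k) ** 2
               for l, a in zip(lam, alpha))

def mse_mkl(lam, alpha, k):
    # Assumed form: variance term plus squared bias k^2 (3*lam + k)^2 alpha^2.
    return sum((l * (l - k) ** 2 + k * k * (3 * l + k) ** 2 * a * a) / (l + k) ** 4
               for l, a in zip(lam, alpha))

# Hypothetical eigenvalues (one close to zero) and canonical coefficients.
lam = [0.05, 1.0, 20.0]
alpha = [0.5, 0.5, 0.5]
k = 0.1
results = {
    "MLE": mse_mle(lam),
    "Ridge": mse_ridge(lam, alpha, k),
    "KL": mse_kl(lam, alpha, k),
    "MKL": mse_mkl(lam, alpha, k),
}
```

Every expression collapses to the MLE value at k = 0 (or d = 1 for Liu), which is a quick consistency check on the formulas. Which estimator has the smallest MSE for k > 0 depends on the eigenvalues, the coefficients and the chosen k.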
The mean squared error matrix (MSEM) and the following lemmas are adopted for theoretical comparisons among the estimators.
Lemma 2.1 Let A be a positive definite (pd) matrix, that is, A > 0, and a be some vector, then $A - aa' \ge 0$ if and only if (iff) $a'A^{-1}a \le 1$ (Farebrother, 1976).
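Lemma 2.2 $MSEM(\hat\beta_1) - MSEM(\hat\beta_2) = D + b_1b_1' - b_2b_2' > 0$, if and only if $b_2'(D + b_1b_1')^{-1}b_2 < 1$, where $D = Cov(\hat\beta_1) - Cov(\hat\beta_2) > 0$ and $b_i$ is the bias vector of $\hat\beta_i$ (Trenkler and Toutenburg, 1990).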
Theorem 2.1: is preferred to iff, provided k > 0.
It is observed that such that the expression above is non-negative for k > 0
Theorem 2.2: is preferred to iff, provided k > 0.
We can observe that the difference of the variance of the estimator is non-negative since for k > 0.
Theorem 2.3: is preferred to iff, provided k > 0 and 0 < d < 1.
The difference of the variance is non-negative since
for 0 < d < 1 and k > 0.
Theorem 2.4: is preferred to iff, provided k > 0.
The difference of the variance is non-negative since for k > 0.
The biasing parameter k for the estimator is obtained by differentiating the MSE in equation (2.21) with respect to k as follows:
The shrinkage parameters estimated by Månsson and Shukur (2011) and Kibria and Lukman (2020) were also adopted for this study, as listed:
k1 and k2 are the biasing parameters for PMKL1 and PMKL2, while k3 is the biasing parameter for PMKL3.
In this section, a simulation study is carried out to compare the performance of the different estimators. The dependent variable is generated using pseudo-random numbers from $\mathrm{Po}(\mu_i)$, where $\mu_i = \exp(x_i\beta)$, $x_i$ is the ith row of the design matrix and $\beta$ is the coefficient vector. The explanatory variables with different levels of correlation are generated using
$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$$
where $\rho^2$ is the level of multicollinearity between the explanatory variables (Kibria et al. 2015; Kibria and Banik, 2016; Lukman et al., 2019b, Lukman et al. 2020b). $z_{ij}$ are pseudo-random numbers generated using the standard normal distribution such that i ranges from 1 to n and j from 1 to p, and $z_{i,p+1}$ is one further standard normal draw shared by all p columns. As a common restriction used in simulation studies, it is assumed that $\sum_{j=1}^{p}\beta_j^2 = 1$ and $\beta_1 = \beta_2 = \dots = \beta_p$. The effect of the intercept is also investigated, with values taken to be 1, 0 and -1 (Kibria et al. 2014). The different levels of correlation taken are 0.8, 0.9, 0.95, 0.99 and 0.999. The other factors varied in the simulation study are the sample size n and the number of explanatory variables p. We assume n = 50, 100 and 200 observations and p = 4 and 8 explanatory variables.
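One replication of such a design can be sketched as follows. The code assumes the generating scheme $x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$ that is common in the cited simulation studies, and equal slope coefficients normalized so that their squares sum to one. The function name, seed and parameter values are hypothetical.

```python
import numpy as np

def simulate_poisson_data(n, p, rho, intercept, rng):
    """Draw correlated explanatory variables and Poisson responses."""
    z = rng.standard_normal((n, p + 1))
    # Each column mixes its own draw with the shared last column of z.
    X = np.sqrt(1.0 - rho ** 2) * z[:, :p] + rho * z[:, [p]]
    beta = np.full(p, 1.0 / np.sqrt(p))      # equal slopes, beta'beta = 1
    mu = np.exp(intercept + X @ beta)        # log link
    y = rng.poisson(mu)
    return X, y, beta

rng = np.random.default_rng(2021)
X, y, beta = simulate_poisson_data(n=200, p=4, rho=0.99, intercept=0.0, rng=rng)
corr = np.corrcoef(X, rowvar=False)          # sample correlation of the columns
```

Under this scheme the theoretical correlation between any two explanatory variables is $\rho^2$, so the off-diagonal entries of `corr` should be close to 0.98 for `rho = 0.99`.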
The simulation results in Tables 1 to 6 show that, for each of the estimators, the simulated MSE values increase as the multicollinearity level increases, keeping other factors constant. The mean squared error also increases as the sample size increases for all the estimators compared, with other factors kept constant. As the intercept value varies from -1 to +1, the mean squared error decreases for all estimators. The results show that PMKL1 performed best, with the minimum MSE across the sample sizes considered, closely followed by PMKL2. Both are considered more suitable for estimating the parameters of the Poisson regression model than the MLE, which performed worst when multicollinearity was present. In general, the PMKL1 estimator consistently performed more efficiently than the MLE, PRE, PLE and PKL estimators.
Having carried out a simulation study, we further investigate the efficacy of the proposed estimator with a real-life application. The Poisson regression model was first applied to the aircraft damage dataset by Myers et al. (2012) and subsequently by other researchers such as Asar and Genç (2017) and Amin et al. (2020), among others. Using the Pearson chi-square goodness of fit test, Amin et al. (2020) ascertained that the data fit a Poisson regression model. The test confirms the suitability of the Poisson distribution for the response variable, with a test statistic of 6.898122 (p-value 0.07521). The dataset provides some detail on two aircraft: the McDonnell Douglas A-4 Skyhawk and the Grumman A-6 Intruder. The dependent variable denotes the number of locations with damage on the aircraft, and it follows a Poisson distribution (Asar and Genç, 2017; Amin et al., 2020). The dataset has three explanatory variables. X1 is a binary variable indicating the type of aircraft (A-4 is coded as 0 and A-6 is coded as 1), X2 is the bomb load in tons and X3 is the number of months of aircrew experience. Myers et al. (2012) ascertained that the dataset is greatly affected by multicollinearity. The eigenvalues of the matrix X were obtained as 4.3333, 374.8961 and 2085.2251. A condition number of 219.3654 was also obtained, which indicates a multicollinearity problem since it is greater than 30 (Asar and Genç, 2017). The performance of the estimators is judged based on the mean squared error of each estimator.
From Table 7, it is evident that all of the regression coefficients have identical signs. The estimator with the highest mean squared error is the MLE, owing to the presence of multicollinearity. The proposed estimators (PMKL1, PMKL2, PMKL3) have the lowest MSE, which establishes their dominance. We also observed that the performance of the estimator is highly dependent on the biasing parameter k. The expressions for the biasing parameters are defined in equations (2.26)-(2.28).
The parameters in the PRM are commonly estimated using the Maximum Likelihood Estimator. However, the literature has shown that the estimator suffers a setback when the explanatory variables are correlated. This problem led to the development of alternative estimators with a single shrinkage parameter, such as the Poisson Ridge Regression Estimator (PRE), Poisson Liu Estimator (PLE) and the Poisson KL Estimator (PKLE). The KL estimator is generally preferred to the ridge regression and Liu estimators in the linear regression model. According to Lukman et al. (2021), the Poisson KL estimator outperforms PRE and PLE. This study modified the KL estimator to propose a new estimator called the Poisson Modified KL estimator (PMKL). The new estimator falls in the same class as the ridge, Liu and KL estimators since each possesses a single shrinkage parameter. We investigated the performance of the estimators with a simulation study and a real-life application. From the results, we observed that the new estimator consistently performed well in the presence of multicollinearity, with the lowest MSE. We therefore conclude that the new estimator is more suitable for combating multicollinearity in the PRM.
All data underlying the results are available as part of the article and no additional source data are required.
Version 1: 08 Jul 21. Version 2 (revision): 14 Dec 21.