ALL Metrics
-
Views
-
Downloads
Get PDF
Get XML
Cite
Export
Track
Research Article

Nonparametric Survival Analysis estimation and comparison with Algorithm

[version 1; peer review: awaiting peer review]
PUBLISHED 01 Jul 2026
Author details Author details
OPEN PEER REVIEW
REVIEWER STATUS AWAITING PEER REVIEW

This article is included in the Fallujah Multidisciplinary Science and Innovation gateway.

Abstract

Accurate estimation of survival-related probability functions on positive support domains is a fundamental problem in reliability and survival analysis, particularly when data exhibit skewness and boundary effects. This study proposes a flexible nonparametric framework based on asymmetric kernel-family estimation for density, distribution, survival, and hazard functions on (0,∞). Instead of relying on a single kernel, several positive-support kernel families derived from Log-Lindley, Birnbaum–Saunders, and Inverse-Weibull distributions are constructed and compared with benchmark kernels such as Gamma and Inverse-Gaussian kernels. Bandwidth selection is performed using likelihood cross-validation (LCV) and a Silverman-type rule adapted to positive support. The proposed framework is evaluated through simulation studies under multiple distributional scenarios and then applied to real catheterization survival data. Performance is assessed using IMSE, IAE, weighted survival discrepancy measures, and information criteria. The results indicate that asymmetric kernel families substantially reduce boundary bias and provide flexible estimation for skewed survival data. In the real-data application, kernel-based survival estimates closely matched the empirical Kaplan–Meier survival curve, while several parametric competitors exhibited larger discrepancy measures. The findings demonstrate that kernel-family estimation combined with data-driven bandwidth selection offers a robust and practical alternative for nonparametric survival and hazard estimation.

Keywords

nonparametric survival, positive-support KDE asymmetric kernel families; hazard estimation; cross-validation

1. Introduction

Nonparametric estimation has become an important statistical approach for modeling complex data without imposing restrictive parametric assumptions. In survival and reliability analysis, observed data are frequently positively supported, skewed, and bounded below by zero, which makes flexible estimation methods particularly important. Kernel density estimation (KDE) is one of the most widely used smoothing techniques for estimating unknown probability density functions from observed samples. However, classical symmetric kernels may suffer from substantial boundary bias when applied to positive-support data, especially near zero. To overcome these limitations, asymmetric kernel estimation methods have been developed using positively supported distributions such as Gamma, Inverse-Gaussian, and related skewed families.13 These kernels improve estimation accuracy in bounded domains and provide better adaptability for skewed survival and reliability data. Recent developments in asymmetric kernel estimation have demonstrated improved performance in survival applications, hazard estimation, and density reconstruction for nonnegative random variables.46 In survival analysis, flexible nonparametric estimation of the survival function and hazard function is essential for accurately representing lifetime behavior without relying on restrictive parametric assumptions. Kernel-based survival estimation provides a useful alternative to classical approaches by combining smoothing flexibility with data-driven estimation.7,8 In addition, several recent studies have emphasized the importance of transformed survival models and algorithm-based estimation methods in reliability and lifetime analysis.911 This study adopts a kernel-family framework rather than relying on a single asymmetric kernel. Several positive-support kernel families derived from Log-Lindley, Birnbaum–Saunders, and Inverse-Weibull distributions are constructed and evaluated under unified comparison criteria. The proposed framework extends asymmetric kernel estimation from density estimation to survival and hazard estimation while integrating likelihood cross-validation and Silverman-type bandwidth selection methods. Recent related work and applications can be found in Refs. 1220. The main contributions of this study can be summarized as follows:

  • (i) proposing a flexible asymmetric kernel-family framework for positive-support survival estimation;

  • (ii) extending kernel estimation to density, survival, and hazard function estimation;

  • (iii) comparing several asymmetric kernel families under unified evaluation criteria;

  • (iv) integrating data-driven bandwidth selection methods;

  • (v) evaluating the proposed methodology through simulation studies and real survival data applications.

2. Asymmetric Kernel Families on (0, ∞)

Kernel density estimation (KDE) is one of the most widely used nonparametric techniques for estimating unknown probability density functions. Given a random sample x1, x2, …, xn from a positive-support distribution, the KDE provides a smooth estimate of the underlying density by averaging localized kernel functions., which leads to a smooth and flexible estimate of the probability density function by kernel functions.21 Given observations x1, x2, …, xn with x i > 0, the kernel density estimator is defined as:

f̂h(x)=1ni=1nK(t;xi;h)t>0.

where K (t; x i, h) is a nonnegative asymmetric kernel centered around xi and controlled by the bandwidth parameter h. The kernel integrates to one over (0, ∞), ensuring that the estimator remains a valid density function on positive support. In asymmetric KDE, K depends on xi so that the kernel adapts locally to the positive support and reduces boundary bias.

The following asymmetric kernel families are considered in this study. Each kernel is defined on the positive semi-axis and parameterized locally through the observation xi and the bandwidth parameter h.

  • 1. Log Lindley based kernel (via exponential transformation)

    The Log-Lindley kernel is motivated by the flexibility of the Lindley distribution in modeling skewed positive data and by its analytical tractability near the boundary region.

    Start from the Linley pdf as a baseline:

    fL(z;θ)=θ2θ+1(1+z)eθzz>0.

Using the transformation Y = exp (−Z), where Z follows the Lindley distribution, the induced Log-Lindley density on (0,1) is obtained as:

g(y;θ)=θ2θ+1(1logy)yθ1,0<y<1.

Define Ti=xilog(Y)(0,). Then a convenient log-Lindley-based kernel is:

KLL(t;xi,h)=θ2iθi+1(1+txi)exp(θitxi)1xi,t>0,

A practical local bandwidth parameterization is adopted through θ i = 1/h, allowing the kernel shape to adapt according to the smoothing level. Birnbaum-Sauders (fatigue-life) kernel.

Considering the standard normal pdf (.) and for the shape parameter α>0and scale parameterβ>0,

KBS(t;xi,h)=12αit(tβi+βit)(1αitβiβit),t>0,

One can use practically local βi=xi,αi=h.

The Birnbaum–Saunders kernel is suitable for lifetime and fatigue-type data due to its positive support and skewness flexibility.

  • 2. Invers-Weibull kernel

    The Inverse-Weibull kernel is particularly useful for modeling heavy-tailed lifetime behavior and decreasing hazard structures.

For parameters β>0,γ>0 ,

KIW(t;xi,h)=βiγit(γi+1)exp(βitγi),t>0.

These asymmetric kernels provide flexible local smoothing mechanisms while preserving the positive support of survival data. Compared with symmetric kernels, they reduce boundary distortion and improve estimation accuracy near zero.

3. Benchmark Kernel Families

To evaluate the performance of the proposed asymmetric kernel families, several benchmark kernels commonly used in positive-support density estimation are considered for comparison. These include Gamma, Inverse-Gaussian, Lindley-based, and symmetric Epanechnikov-type kernels. Consider the following kernels:

  • 1. Gamma:

    KG(t;ki;θi)=tki1et/θiΓ(ki)θikit>0

    A practical local parameterization is adopted through:

    ki=xih+1,θi=h.

  • 2. Inverse-Gaussian kernel are given by:

    The Inverse-Gaussian kernel is suitable for positively skewed lifetime data and provides adaptive smoothing near the boundary.

    KIG(t;μ,λ)=λ2πt3exp(λ(tμ)22μ2t),t>0

  • 3. Symmetric Epanechnikov kernel:

    For comparison purposes, a symmetric Epanechnikov kernel is adapted to positive support through a logarithmic transformation.

on the log-scale: let u=(logtlogxi)/h and

KE(u)=34(1u2)1(|u|1)

Then the positive support version is

KE+(t;xi,h)=1thKE(u).

The multiplicative factor 1/ t arises from the logarithmic transformation Jacobian and guarantees proper normalization on the positive semi-axis.

These benchmark kernels provide reference models for evaluating the flexibility and estimation performance of the proposed asymmetric kernel-family framework.

4. Bandwidth selection

Estimator performance is significantly impacted by bandwidth selection. We employ two complementary approaches: Silverman-type rule adapted to positive support (pilot scale estimate) and Likelihood cross-validation (LCV): choose h that maximizes the leave-one-out log-likelihood

LCV(h)=i=1nlog(f̂(xi;h)).

5. Nonparametric survival and hazard estimation

Once the kernel density estimator f̂(t) is obtained, the corresponding distribution and survival functions can be computed numerically.

F̂(t)=0tf̂(u)du,Ŝ(t)=1F̂(t).

The corresponding hazard function is estimated as

ĥ(t)=f̂(t)/Ŝ(t).

In the presence of censored observations, the Kaplan–Meier estimator ŜKM(t) is used as a benchmark nonparametric survival estimator and compared with the kernel-based survival estimate. Kernel-based survival estimation provides a smooth alternative to empirical survival estimation and allows flexible representation of lifetime behavior on positive support.

Algorithm of Asymmetric Kernel-Family Survival Estimation

Step 1: preprocess data and define minimum and maximum of data

Step 2: Preprocess data and define grid T on (min(t), max(t)).

Step 3: For each kernel family j:

    (a) Select bandwidth h j via likelihood cross-validation (or Silverman-type rule).

    (b) Compute density f̂j(t) on T.

    (c) Numerically integrate to obtain F̂j(t) and Ŝj(t)  = 1− F̂j(t) .

    (d) Compute hazard ĥj(t) = f̂j(t) /max( Ŝj(t) , ε).

Step 4: Compute Kaplan–Meier survival ŜKM(t) (benchmark, when censoring exists).

Step 5: Fit parametric models M k by MLE under censoring and compute S k(t), h k(t).

Step 6: Evaluate kernels and models using multiple criteria (Section 6) and select the best performer.

Step 7: Report tables and figures for f̂ (t), Ŝ (t), and ĥ (t).

6. Simulation study

We assess performance under two data-generating scenarios to represent different shapes and tail behaviors (e.g., Gamma-like and Lognormal-like). For each scenario, we consider several sample sizes (e.g., n = 25, 50, 100, 200) and repeat the experiment over R replications. For each replication, we compute the kernel estimates and evaluate them using integrated error measures and predictive (CV) scores.

Recommended criteria (replace/extend beyond ISE): Integrated Absolute Error (IAE) Integrated Mean Squared Error (IMSE), Hellinger distance, and likelihood cross-validation score (LCV).

This table reports the Integrated Mean Squared Error (IMSE) of the estimated density under Scenario A using two bandwidth selectors: a Silverman-type rule and likelihood cross-validation (LCV). For each kernel family, the corresponding selected bandwidth values are also reported. Lower IMSE indicates better estimation accuracy. As shown in Table 1, kernel-family performance under, Scenario A varies across bandwidth selection methods, highlighting the impact of the bandwidth choice on estimation accuracy.

Table 1. Performance of kernel families under Scenario A (example structure).

Kernel familyIMSE (Silverman)IMSE (LCV)Bandwidth (Silverman) Bandwidth (LCV)
Log-Lindley 0.005470.0074510.190.11
Birnbaum–Saunders0.057490.17490.250.198
Inverse-Weibull 0.078710.097450.150.1546
Inverse-Gaussian 0.045120.054780.350.1784
Gamma0.03710.01450.210.14
Lindley0.04570.03980.250.1
Epanechnikov (sym.)0.054870.0440.280.18

This table reports the Integrated Absolute Error (IAE) under Scenario B using two bandwidth selectors (Silverman-type and LCV). The selected bandwidth values are included for each kernel family. Lower IAE indicates better estimation accuracy. Table 2 summarizes kernel-family performance under Scenario B, where accuracy is evaluated using IAE under both Silverman-type and LCV bandwidth selection.

Table 2. Performance of kernel families under Scenario B (example structure).

Kernel familyIAE (Silverman)IAE (LCV)Bandwidth (Silverman) Bandwidth (LCV)
Log-Lindley 0.1240.04780.190.11
Birnbaum–Saunders0.1140.01450.250.198
Inverse-Weibull 0.1110.02410.150.1546
Inverse-Gaussian 0.1540.0374610.350.1784
Gamma0.1740.01450.210.14
Lindley0.1460.05470.250.1
Epanechnikov (sym.)0.1170.01780.280.18

7. Application to real survival data

7.1 Numerical results for the real data

Censoring status: all observations correspond to events (δi = 1 for all i). Therefore, the Kaplan–Meier estimator reduces to the empirical survival function ŜKM(t)  = 1 − ECDF(t). A 95% confidence interval is reported using Greenwood’s formula with the log–log transformation.

Estimated median survival time (KM): 0.75.

Kaplan–Meier (KM) survival probabilities Ŝ (t) are reported at selected quantiles of the observed survival times. Since all observations correspond to events (δi = 1 for all i), the KM estimator reduces to the empirical survival 1-ECDF(t). A 95% confidence interval is computed using Greenwood’s formula with the log–log transformation. The estimated median survival time is 0.75. such that Empirical/KM survival estimates at key time points are reported in Table 3.

Table 3. Kaplan–Meier survival estimates at selected time points (all events).

QuantileTime (t)KM S ( t)Lower 95% Upper 95%
0.10.220.8933330.8317980.933249
0.250.42250.7466670.669020.808699
0.50.750.4933330.4111180.570264
0.751.07750.2533330.1868950.324962
0.91.2710.10.05863610.154242

The dataset consists of positive survival times (in the study unit) for patients who underwent catheterization. Since no censoring indicators were provided, the empirical survival is computed as 1 − ECDF, which coincides with the Kaplan–Meier estimator in the absence of censoring.

Summary statistics for the positive survival times (n = 150) are reported, including minimum, quartiles, mean, standard deviation, maximum, interquartile range (IQR), skewness, and coefficient of variation (CV). These statistics provide an overview of the scale and dispersion of the real survival dataset used in the application. Where Descriptive statistics for the real dataset are provided in Table 4.

Table 4. Descriptive statistics of the catheterization survival times.

nminQ1medianmeanstdQ3maxIQRskewness cv
1500.090.42250.750.74960.3831.07751.410.655−0.00050.5115

For each asymmetric kernel family, this table reports the bandwidth selected by a Silverman-type rule and by likelihood cross-validation (LCV). The maximized LCV objective value, LCV( h∗) is also reported to quantify the cross-validated fit. These bandwidths are used to construct the kernel-based density and survival estimates in the real-data application. Where Bandwidths selected for the real dataset are summarized in Table 5.

Table 5. Bandwidth selection for asymmetric kernel families (Silverman vs LCV).

Kernel family h (Silverman) h (LCV) LCV(h*)
Gamma kernel0.1491090.0191596−58.0008
Inverse-Gaussian kernel0.1491090.137498−60.6772
Lognormal kernel0.1491090.137498−60.6902

Maximum likelihood estimates (MLEs) are reported for three non-Weibull parametric survival models (Gamma, Lognormal, Log-logistic), along with the log-likelihood (logL), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). Smaller AIC/BIC indicate a better trade-off between goodness-of-fit and model complexity. Parametric competitors and their information-criterion values are reported in Table 6.

Table 6. Parametric model comparison (non-Weibull) using MLE and information criteria.

Parametric modelMLE/EstimateslogLAIC BIC
Gammak = 2.89134, theta = 0.259257−71.1432146.286152.308
Lognormalmu = −0.471004, sigma = 0.673754−82.9566169.913175.934
Log-logistic alpha = 0.67575, beta = 2.62355−83.5944171.189177.21

This table compares kernel-family survival estimates against the empirical survival function 1 − ECDF(t) (equivalent to KM with no censoring). For each kernel family, the LCV-selected bandwidth h∗, the LCV log-likelihood, and several discrepancy measures between the estimated and empirical survival curves are reported (weighted ISE, ISE, and IAE). The mean hazard (grid average) is included as a descriptive summary of the estimated hazard level over the evaluation grid. Lower error measures indicate closer agreement with the empirical survival. Where Kernel-family survival estimates are quantitatively compared with the empirical survival in Table 7.

Table 7. Kernel-family survival comparison vs empirical survival (1 − ECDF).

Kernel family h* (LCV)LCV log-likelihood Weighted ISE on S ( t)ISE on S ( t) IAE on S ( t) Mean hazard (grid avg)
Inverse-Gaussian kernel0.137498−60.67720.0003325590.000364750.01773972.67995
Lognormal kernel0.137498−60.69020.0003363110.000368520.01785022.6746
Gamma kernel0.0191596−58.00080.0004189330.0004375120.0191862.70542

This table compares fitted parametric survival models (Gamma, Log-logistic, Lognormal) against the empirical survival 1 − ECDF(t). Discrepancy is quantified using weighted ISE, ISE, and IAE computed over the evaluation grid. Lower values indicate improved agreement with the empirical survival curve. Parametric survival models are compared to the empirical survival in Table 8.

Table 8. Parametric survival comparison vs empirical survival (1 − ECDF).

Parametric modelWeighted ISE on S ( t)ISE on S ( t) IAE on S ( t)
Gamma0.005259440.0041530.0680224
Log-logistic 0.006668450.005431730.0760906
Lognormal0.00931880.007402980.0905809

A real survival dataset (survival times) is used to illustrate the proposed methodology. We estimate the density and the survival function using the best-performing asymmetric kernel family and compare it with:

  • Kaplan–Meier estimator (nonparametric survival benchmark).

  • A selected parametric model (e.g., Lognormal or Log-logistic) fitted by MLE (non-Weibull).

Evaluation focuses on survival-level discrepancies and predictive performance rather than relying only on classical goodness-of-fit tests.

7.2 Figures

Figures 14 summarize the real-data application of the proposed positive-support kernel-family framework. We present kernel-based density estimates under different bandwidth selection strategies and compare the resulting fitted curves with the empirical distribution of the data. In addition, we report normalized error/predictive measures to quantify performance across kernels and bandwidth selectors, and we compare survival curves to evaluate how well the nonparametric estimators reproduce the empirical survival pattern. Together, these figures illustrate the impact of bandwidth selection (Silverman vs LCV), the differences between kernel families on positive support, and the resulting consequences for density and survival estimation.

dc3f2926-2f4e-45ad-b678-a6f28ae2bee6_figure1.gif

Figure 1. Displays kernel-family density estimates for the catheterization data using both Silverman’s rule and likelihood cross-validation (LCV) bandwidths; the histogram represents the empirical distribution, while the solid curves correspond to asymmetric kernel estimates.

dc3f2926-2f4e-45ad-b678-a6f28ae2bee6_figure2.gif

Figure 2. Presents kernel-family density estimates for the real data using the LCV-selected bandwidth, highlighting differences among positive-support (asymmetric) kernel families.

dc3f2926-2f4e-45ad-b678-a6f28ae2bee6_figure3.gif

Figure 3. Summarizes normalized error and predictive measures across kernel families and bandwidth selectors, including IMSE and IAE (computed against a lognormal reference fit on a dense grid) and the LCV log-likelihood (higher values indicate better fit).

dc3f2926-2f4e-45ad-b678-a6f28ae2bee6_figure4.gif

Figure 4. Survival function comparison for the real data. The empirical survival (1 − ECDF; equivalent to Kaplan–Meier with no censoring) is contrasted with the best-performing kernel-family survival estimate and the best parametric survival model selected by information criteria.

8. Conclusions

This paper provided a kernel-family system for positive-support nonparametric estimation and applied it to survival evaluation by estimating survival and hazard functions. In contrast to single-kernel methods, the family-based design enables practitioners to choose kernels that correspond to the data’s tail characteristics and boundary behavior. An efficient, data-driven method for choosing bandwidth is likelihood cross-validation. Comparing kernel-based survival with Kaplan-Meier and non-Weibull parametric models in real survival analysis reveals the useful trade-off between interpretability/parsimonious structure (parametric) and flexibility (nonparametric).

All tables have been labeled sequentially ( Tables 18), cited in the text, and provided with complete.

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 01 Jul 2026
Comment
Author details Author details
Competing interests
Grant information
Copyright
Download
 
Export To
metrics
Views Downloads
F1000Research - -
PubMed Central
Data from PMC are received and updated monthly.
- -
Citations
CITE
how to cite this article
J.S .AL-Majidi A, Abdul Hafedh Mohammed E and Faydh Mohammed S. Nonparametric Survival Analysis estimation and comparison with Algorithm [version 1; peer review: awaiting peer review]. F1000Research 2026, 15:1054 (https://doi.org/10.12688/f1000research.177792.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
track
receive updates on this article
Track an article to receive email alerts on any updates to this article.

Open Peer Review

Current Reviewer Status:
AWAITING PEER REVIEW
AWAITING PEER REVIEW
?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 01 Jul 2026
Comment
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Sign In
If you've forgotten your password, please enter your email address below and we'll send you instructions on how to reset your password.

The email address should be the one you originally registered with F1000.

Email address not valid, please try again

You registered with F1000 via Google, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Google account password, please click here.

You registered with F1000 via Facebook, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Facebook account password, please click here.

Code not correct, please try again
Email us for further assistance.
Server error, please try again.