Survival models for right censored breast cancer data: theory, application and comparison [version 1; peer review: awaiting peer review]

Background: Censoring frequently occurs in disease data analysis, which is a key characteristic of time to failure modeling. Typically, time to failure studies are conducted through non-parametric and semiparametric modelling techniques. Parametric models provide more efficient estimates, but are seldomly used, because of some of the limitations and assumptions which need to be fulfilled to apply them. The aim of this study is to illustrate the theoretical and application limitations and performance of different flexible and standard parametric models to evaluate the prognostic value for mortality risk of breast cancer after recurrence among women. Methods: This article describes the theoretical properties of flexible parametric models and compares their performances to standard parametric models, by studying mortality in women diagnosed with breast cancer. We describe how time to failure data may be analyzed with nonlinear flexible models. In this regard, we apply fractional polynomials, spline models, piecewise exponential models, and piecewise exponential additive mixed models. We also illustrate properties of standard parametric models. All analyses have been conducted with multiple covariates to identify significant predictors. Information criteria have been used to evaluate performances of models. Results: Fractional polynomial and spline-based generalized additive models work well in capturing local fluctuations. Parameter estimation with a piecewise exponential additive mixed model (PAMM) as an extension of the piecewise exponential modelling (PEM) approach automatically penalizes model complexity, which is very helpful to avoid over fitting. Conclusions: Flexible parametric time to failure models are more efficient than standard parametric time to failure models. By Open Peer Review Reviewer Status AWAITING PEER REVIEW Any reports and responses or comments on the article can be found at the end of the article. Page 1 of 12 F1000Research 2021, 10:1042 Last updated: 14 OCT 2021


Background
Cancer causes a large disease burden worldwide, among which breast cancer is the most frequent cause of cancer deaths in women. Pakistan, being a lower-middle-income country, has a greater number of breast cancer patients compared to its neighboring countries. It is the country with the highest age-standardized death rate in 2019 globally. Risk of death increases after early breast cancer recurrence in the first three to five years of primary treatment. Time after recurrence to death is analyzed through time to failure techniques, incorporating recorded prognostic factors before recurrence such as age, tumor grade, molecular subtype and treatment. 1 In previous research, age has not been proven to be a significant influence on breast cancer deaths. 2,3 To further explore its role, age at diagnosis and age at recurrence are included in this study with other covariates.
Time to failure data have incomplete information about exact event occurrence time, which is known as censoring. Three common types of censoring are encountered in time to failure studies: right, left and interval censoring. The most common is right censoring, which is classified into three types: fixed type 1, type 11 and random type 1. In fixed type 1, right censoring occurs for all understudy subjects, who do not observe the event of interest during the predefined study time. Type 11 censoring is named for all subjects who do not observe a specific event after a pre-specified number of events have occurred. In random type 1 right censoring, censored subjects have different censoring times, as not all have same entry time into the study. 4 Non-parametric, semi-parametric and parametric modelling techniques are amenable to analyze such types of time to failure disease studies. 5 Kaplan-Meier (KM) is the simplest method, used to estimate survival function by a non-parametric maximum likelihood estimator (NPMLE), which has the limitation of studying only one factor at a time. Therefore, it is not suitable for multivariate studies. 6 The Cox proportional hazard (PH) models, a semi-parametric approach, does not assume the shape of the baseline hazard function, so distributions of regression parameters' outcomes remain unknown. 7 Cox PH models incorporate multivariate predictors by holding the PH assumption, which assumes a fixed proportion of hazard for individuals. In case of right censoring where upper bounds of event occurrences are not specified, regression parameters are estimated through dividing the likelihood function of the PH model into two parts: one comprises of the baseline hazard and unknown parameters, while the other has only unknown parameters to be estimated, which is called partiallikelihood. Breslow 8 and Efron 9 introduced approximations in partial-likelihood to handle ordered ties in uncensored event times, while exact and discrete methods are also available, in which non-ordered tied survival times are applied through a partial likelihood approach. 5 Validity of PH assumption can be checked through a standard global test suggested by Grambsch and Therneau. 10 Furthermore, graphical ways of plotting residuals versus predictors are also discussed in their research. 10 In case of nonproportionality, extended Cox PH models can apply, which account for the effects of time varying predictors on survival times. 11 Spline-based methods are a good choice to estimate effects of unknown nonlinear predictors on continuous response through penalized partial likelihood, they also explore the functional form of non-proportional predictors. 12 Piecewise models are a good choice for long length follow-up studies, where predictors' effects are checked at different finite time intervals to obtain in-depth information about disease progression. 13 Parametric models rely on a fully maximum likelihood approach, parametric estimates are more efficient and precise if conducted through correctly specified forms. In parametric modelling, time to failure is assumed to follow any distribution, such as exponential, Weibull, gamma, generalized gamma, log-normal, log-logistic, Gompertz and Generalized F. 14 By building a linear relationship between the logarithm of failure time and predictors, data can be analyzed through the accelerated failure time (AFT) model. In AFT models one-unit changes in predictors explain a proportional change in survival time, as illustrated by Lee and Go, 15 while in PH parametric form assumes a proportional change in hazard due to a one-unit change in predictors.
The aim of this paper is to review and apply the above stated modelling techniques to time to failure data, and to evaluate their performances through statistical measures, to investigate the best fitted one for right censored data, while fulfilling limitations and assumptions.

Study design
Our data consists of 1,028 women diagnosed with breast cancer in Lahore, Pakistan. All women observed recurrence between February 2011 and February 2018 after initial treatment. They were treated at the same hospital (Institute of Nuclear Medicine & Oncology Lahore, Pakistan). The primary endpoint of this study is death due to breast cancer. Exclusion criteria were: incomplete or missing information, women diagnosed with another disease or another cancer before breast cancer, and bilateral carcinomas. Women who were still alive (survived) at the end of the study, or died due to another reason than breast cancer, are considered right-censored. Age at diagnosis, age at recurrence, estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (Her2), tumor grade, radiotherapy and chemotherapy are the predictors included in this study, all were chosen with the help of clinicians and oncologists.
In the data, age at diagnosis and age at recurrence (in years) are continuous variables, estrogen receptor, progesterone receptor and Her2 are represented by binary variables (0 = Negative, 1 = Positive), tumor grade is represented categorically (I,II, III). In addition, chemotherapy and radiotherapy are indicated by dummy variables (0 = No, 1 = Yes), here 0 indicated the patients who did not receive the treatment, while 1 ¼ Yes meant they received treatment. Survival time was considered from recurrence to death or drop out, and censoring status was coded with 0 for censored, and 1 for death due to breast cancer. Along with the aim of this study, which is the comparison of different parametric models, it is also a major interest to find out how the two treatments (radiotherapy and chemotherapy), in combination with the other predictors, affect the survival of breast cancer patients after recurrence.
Proportional hazards have been checked by scaled Schoenfeld residuals (Extended data). Furthermore, the statistical tests revealed that not all covariates are statistically significant (p < 0.05), but the global test is statistically significant. Therefore, we can assume that the proportional hazard assumption holds.

Regression models
In this work, right censored time-to-event data are considered. Survival time is denoted by T and censoring time by C j , where j ¼ 1, 2,3,…,m are women diagnosed with invasive breast cancer who observed first recurrence after primary treatment. The j th event time is defined by t j ¼ min T j , C j À Á , while time for censored and uncensored events is denoted by The general relationship form, between T and X is given as where X ¼ x 1 ,x 2 ,x 3 , …,x p , is the vector of predictors which may have an impact on time to failure, and f is the unknown functional form, which may be linear or non-linear. Practically, a 100% exact true relationship is not possible. To cover up uncontrolled chances of error, ε is also included in the model. S t ð Þ ¼ P T > t ð Þis the survival function, which represents the probability of a woman survivor up to a time point, and the hazard rate is written as The hazard rate represents a probability of an instantaneous failure per unit of time given that an individual patient has survival after time t. 5 The main objective of time to failure studies is to estimate the hazard function accurately. For this purpose, different modelling approaches are applied.
In the multivariate approach, the semi-parametric Cox PH model is most popular for the analysis of time to failure data. The general form is written as where, m is the number of patients under study, and x j = x j:1 ,x j:2 , x j:3 ,…x j:p À Á T is a row vector of p predictors for subject j, while α is the vector of regression coefficients. 7 The partial likelihood method employed for estimating unknown parameters is suggested by Cox. 16 In time to failure analysis, continuous predictors are often categorized, which has disadvantage of information loss within categories. Useful available statistical methods of handling continuous predictors are fractional polynomials (FPs) and restricted cubic splines (RCS).
FPs provide flexible parameterization of continuous predictors, were first used for modelling families of curves by Royson and Altman, 17 with polynomial of degree m, it can be written as m is a positive integer and α 0 ,α 1 , α 2 , …,α m are regression parameters, while f j x 1 ð Þ is divided into two parts: one consists of x p j 1 when p j 6 ¼ 0, and another of ln x 1 ð Þ for p j ¼ 0. 17 FP models have a wide variety of shapes based on different transformations. The main issue with fractional polynomials is in choosing a suitable power for polynomials, as this has a direct positive relationship with flexibility. To increase flexibility, a greater power can be used, but with the major threat of non-locality. That means that the fitted function at a given point of x 0 depends on data points which are very far from that reference point. 18 Spline regression is an improved technique, used to overcome the non-locality of fractional polynomials. In spline models, the dataset is divided into multiple parts and these parts are joined with knots. In time to failure regression analysis, spline modelling is extensively used to smooth non-linear effects of continuous predictors. Spline f X ð Þ has a smooth function. The major problem occurs in choosing the number of knots, as no hard and fast rule is available to apply the suitable number of knots. 19 Under RCS, the best way of choosing the number of knots is using the quotient of the difference between the largest and average uncensored log survival time and the largest and smallest uncensored log survival time. 20 Royston suggested another method in this respect; according to him, a good way to choose the suitable number of knots is to randomly apply different number of knots every time, and select the best model with measure of information criteria. 21 Flexible parametric models have scaling for proportional hazards or proportional odds, which are usually based on transformation of survival function by a link function S 0 t ð Þ is the baseline survival function, α is the vector of unknown parameters for predictors x: Piecewise exponential models (PEMs) are also a reasonable approach to estimate hazard ratios more accurately. 21 Under the PEM modelling approach, follow-up time is divided into i intervals, by assuming a constant baseline hazard in each interval, so that π 0 t ð Þ ¼ π i simplifies to, For the cut points of follow-up times, minimum to maximum time is divided into finite intervals, 0 ¼ n 0 < n 1 < n 2 < n 3 < … < n i ¼ t max : Here, n 0 to n i are time intervals, and the hazard rate of exponential distribution is constant over time intervals. Censoring and all unique event times can be used as time interval cut points, but no hard and fast rule is available to choose cut points, which is a point to ponder; too small or too large cut points may cause under-or overfitting.
One approach to deal with this cut point problem is an extension of PEM, in which a large number of cut points are used and the hazard is estimated semi-parametrically. This is called the Piecewise exponential additive mixed model (PAMM), in which a hazard is modeled through a smooth nonlinear function. In PAMM, predictors contribute to the hazard additively, imposing a quadratic penalty on the basis coefficients: where, f 0 t j À Á denotes log baseline hazard rate, and f n x j:n , t j À Á represented effects of smooth nonlinear constant predictors, while t j is finite time cut point. PAMM is an extension of PEM, in which modeling is done by using baseline hazard as a spline basis function, hazard is constant across intervals, through penalization over fitting is avoided even with large number of time intervals. 22 In fully parametric PH modelling, baseline hazard function is assumed to follow a specific distribution and coefficients are estimated via maximum likelihood. A number of different parametric PH models are derived by applying distributions, such as exponential, Weibull, and Gompertz.
An alternative to parametric PH is AFT models, the corresponding log-linear form of the AFT model with respect to time is given as where, α j is the vector of coefficients of unknown parameters, W is the scale parameter, and ε is the random error term. 23 AFT models measure the direct effect of predictors on the survival time rather than the hazard. ε j , as a random variable, assumes different distributions for survival time T, such as enponential, Weibull, gamma, generalized gamma, lognormal, log-logistic and generalized F. 24 The likelihood estimates are maximized using the Newton Raphson procedure, 15 which may be time consuming and tricky without computer programming. The freely available R software is used to implement all modelling techniques.

Measures of models fitting
Comparison of fitting models is done via measures of fit, which describe accuracy of fitted models for a given data set, usually called goodness of fit measures. Model fitting accuracy has nothing to do with the predictive ability for external data prediction. The Akaike Information Criterion (AIC) 25 and the Bayesian Information Criterion (BIC) 26 are two of the most common measures which are used to compare models' performances. For the PH model AIC and BIC are based on log-partial likelihood BIC ¼ À2 Log À partial likelihood ð Þ þ n df ð Þ (10) where, df represents degrees of freedom of the fit, and n is the total number of observations in the data. The minimum value of both AIC and BIC is considered a good one. Basically, the AIC criterion is used as a penalized function, as if one adds a variable, sampling variability also increases. While BIC imposes stronger penalty in the inclusion of additional covariates to the model. Hurvich et al. 27 suggested a modified AIC, in which n Â df þ 1 ð Þ= n À df þ 2 ð Þ ð Þis used as a penalty term. A corrected version of BIC proposed by Volinsky and Raftery 28 can also be used, which replaced n with uncensored observations. Corrected AIC ðAICc) and BIC BIC ð c) have written forms as Ethical approval and consent to participate According to the Ethical Guidelines for Epidemiologists (IEF-EGE) and the regulations of the ethics committee located at the Advanced Studies and Review Board, University of the Punjab, Lahore (Pakistan), no ethics approval is needed, because the analysis is based on routine data. The study was critically cleared by the Advanced Studies and Review Board of Punjab University. The letter of support written by the departmental head was submitted to the selected hospital. Prior to data collection, written consent was obtained from the head of oncology department and confidentiality was maintained by coding from data collection to analysis.

Results
In the present study, women's age was collected for: diagnosis and recurrence time. The median age at diagnosis of breast cancer was 47 years (range: 18-59); while the median age at recurrence was 49 years (range: 21-62). Median survival time after recurrence was 3 years, and just half (54.1%) of cancer cases were ER-negative. The majority of patients were PR-positive (64.6%) and had a positive human epidermal growth factor receptor 2 (52.9%). Overall, 207 women (20.1%) had tumor grade 1, whereas 821 (79.9%) had a higher level of malignancy. Chemotherapy (36.4%) and radiotherapy (87.4%) were given as primary treatments (Table 1).
There were 447 deaths among the 1,028 women included in the study. As shown in Table 1, 78.5% of deaths occurred due to breast cancer within three years after recurrence, while 20.8% between 3 to 6 years, and 0.7% of patients died due to breast cancer after 6 years of its recurrence. The molecular markers among women who died due to breast cancer were distributed as follows: 59.3% ER-positive, 66.7% PR-negative, and 65.5% human epidermal growth factor receptor 2-positive. Breast cancer death was positively associated with higher tumor grade (11 and 111; 97.3%) and no chemotherapy (67.8%).  To make results less lengthy and meaningful, we only discuss here AIC and AICc values of first three good fitted accelerated failure time distributional models. From the fully parametric models, Weibull is the best fitted one (AIC = 7269.5, AICc = 7271.9) among others, generalized gamma (AIC = 7269.8, AICc = 7272.1), gamma (AIC = 7270.2, AICc = 7272.5), and Generalized F (AIC = 7271.7, AICc = 7274.0) come next in terms of preferences, respectively. We also presented BIC and BICc in Table 2.
Multivariate fractional polynomial (MFP) is the best fitted model incorporating time dependent covariates (AIC = 5167.5, AICc = 5163.6), while the generalized additive model (GAM) (AIC = 5171.3, AICc = 5173.8) is also a good choice for analyzing non-linear continuous predictors in a multivariate setting, having the advantage of small numbers of parameters in non-integer, which is due to shrinkage during parameter estimation. Royston and Palmar's flexible parametric models Time dependent covariates, age at diagnosis and age at recurrence are modeled using 4 degrees of freedom for splines. PEM and PAMM are applied to the understudy data to get baseline hazard estimates, where finite time intervals are considered as factors to maintain constant hazard for each interval. The age at diagnosis and recurrence are estimated using P-splines with the same 4 degrees of freedom. PEM and PAMM results are compared with Nelson Aalen estimator, graphical displays showed close agreement of good model fitting in Figure 2.

Discussion
In this paper, different parametric models are compared in terms of theoretical aspects and application. Our findings suggested that progesterone receptor negative, human epidermal growth factor receptor 2 positive, higher tumor grade, and no chemotherapy increase the risk of death after recurrence. The most surprising result is regarding radiotherapy treatment, which depicted no reduction in breast cancer time to death. This might be due to a higher level of physical impairment of patients receiving radiotherapy treatment. However, patients treated by radiotherapy at an early stage have a larger survival time. 33 We applied distributional parametric models which are known as standard parametric models, with a full maximum likelihood estimator to estimate unknown parameters. AFT models make practical sense to study the influence of covariates, which may accelerate breast cancer mortality. The AFT model is the best choice for the analysis of time to failure data when hazards are non-proportional, as it provides efficient estimates and an estimate of the median failure time.
The exponential distribution having one rate parameter is often used in experiments to account for the amount of time until an event occurs. The Weibull distribution is a special case of the exponential distribution with shape and scale parameters. It provides a better fit than exponential with one extra degree of freedom. The same is true for gamma distribution, which has two parameters and has close results to Weibull. Generalized gamma has mean, location and scale parameters. Log-normal is a probability distribution, with a normally distributed logarithm. It is widely used in lifetime data analysis. The two parameters of mean and standard deviation have a more stable behavior than log-logistic distribution. Generalized F distribution is a good alternative to generalized gamma with one extra parameter. From the interpretation point of view, the AFT model's results are easy to interpret and help clinicians to make wise decisions related to the patients' conditions.
Flexible parametric models have advantage of using restricted cubic splines, which incorporates time dependent effects of predictors on the log hazard and reduces the bias of non-proportionality. The Royston and Parmar generalized gamma flexible parametric model under hazard scale outperformed the odd scale model. Of course, one should not ignore the threat of overfitting, by including greater number of internal knots. The functional polynomial model has the advantage of only considering significant factors, so it gives better results than other spline-based models. GAM under spline basis function has the potential to provide a better fit of data than generalized linear flexible parametric models. 34,35 The main strength of this study is that we described and applied different time to failure models, to right censored breast cancer data. In a piecewise exponential model the baseline hazard is modeled by step function with different intervals, estimation is done by including dummy variables for each interval. The major disadvantage of this technique is that data becomes too long, and parameter estimation becomes unstable. The piecewise additive mixed model overcomes this drawback. By adding a large number of basis functions and using P-splines between neighboring basis coefficients, parameters are estimated through restricted maximum likelihood (REML). 13

Limitations
There are several limitations to our study. First, the use of a single case study may be viewed as a limitation. However, a simulation study can also be designed to validate results. Second, the model comparisons are based on within sample information measures (AIC, AICc, BIC, BICc), while predictive performances of models can also be checked via different measures. Third, sensitivity analyses of choosing different numbers of knots in spline-based models can be performed to make firm conclusions.

Conclusion
Flexible parametric modelling of the hazard function is more efficient than standard parametric models, incorporating the complex patterns of the observed failure data. Generalized additive models provide more accurate estimates under splinebasis function, with time dependent covariates. For long follow-up studies and multiple time dependent covariates, which may have effects on hazard, penalized models are more suitable.

Data availability
Underlying data Data is available from the corresponding author, Dr. Florian Fischer (florian.fischer1@charite.de), upon reasonable request. Data can be used for research purposes, but cannot be published because it is taken from a hospital.