Method Article

Cox hazard models and structural zeros; grand multiparity, age at first birth and risk of breast cancer

[version 1; peer review: 1 approved with reservations]
PUBLISHED 07 Apr 2026

This article is included in the Oncology gateway.

Abstract

Background

Awareness in reproductive epidemiology of potential structural zeros, defined as impossible combinations of covariate values, has been limited compared with awareness of random or sampling zeros. None of the many studies of parity (number of children) and age at first birth (AFB) has analysed or discussed the strong interdependence of high parity and AFB. Here we introduce statistical methods for estimating the relative risk (RR) of a breast cancer diagnosis in prospective studies with structural zeros, using Cox proportional hazards models.

Methods

Information on parity and age at marriage for 385,816 women was collected in the 1960 Norwegian Census. Women aged 45–90 years were followed until the end of 2005 through linkage to the Cancer Registry of Norway, which identified 16,905 incident breast cancer diagnoses. The new methodology handles structural zeros in the Cox proportional hazards model, making it possible to separate the effects of parity and AFB. The model also estimates the increase or decrease in RR for each additional child separately.

Results

In the full model, each additional child was associated with an RR of 0.89 (95% CI: 0.88–0.90), i.e. an 11% reduction per child over the entire fertility range of 1–15 children. In the same model the effect of AFB was reduced and was present only for women with up to three children. The effect of an early first birth was smaller than that of one additional child.

Conclusion

The new methods for handling structural zeros in Cox hazards analyses shifted the interpretation towards a stronger effect of parity and a weaker effect of AFB.

Keywords

parity, age at first birth, breast cancer, cohort, structural zeros, incidence

Introduction

The concept of structural zeros is known from health research on infectious diseases,1 alcohol consumption2 and ecology.3 Structural zeros are impossible combinations of covariate values, i.e. combinations for which zero observations are known a priori. They should be distinguished from zeros due to random variation or sampling variability.4 Research on reproductive factors faces clear limitations because biological constraints give rise to structural zeros. With increasing parity (number of children), age at first birth (AFB) is systematically restricted, since women with many children must start childbearing earlier. Certain combinations of parity and AFB therefore give structural zeros because they are biologically impossible. Figure 1 illustrates the real or natural combinations of values to the left and the structural zeros to the right. Women with ten children must have an AFB below 30 years assuming 1.5-year birth intervals, and even earlier with two-year intervals. No prospective studies of human reproduction and cancer have incorporated methods for handling structural zeros or discussed them. These analytical problems have so far not been discussed in the context of the Cox proportional hazards model. Here, the aim is to develop statistical methods for estimating the relative risk of a breast cancer diagnosis while controlling for structural zeros in Cox hazard models. Previous papers5,6 show some of the challenges of using hazard ratios in complex studies. We demonstrate how to resolve some of these challenges by increasing the number of hazard ratios.

Figure 1. Natural combinations of parity and age at first birth, showing structural zeros due to biological limitations.

Increasing the birth interval gives more structural zeros.
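The biological bound behind these structural zeros can be written as a simple rule: with n births spaced d years apart and a last birth no later than some age L, the AFB cannot exceed L − (n − 1)d. The Python sketch below encodes that rule. The cut-off L = 43.5 years (MAX_LAST_BIRTH_AGE) is a hypothetical assumption, chosen only because it reproduces the 30-year bound quoted for ten children with 1.5-year intervals; the function names are illustrative.

```python
# Hypothetical latest age at last birth; chosen so that ten children with
# 1.5-year intervals give the 30-year AFB bound quoted in the text.
MAX_LAST_BIRTH_AGE = 43.5


def latest_possible_afb(parity: int, interval: float,
                        max_last_birth_age: float = MAX_LAST_BIRTH_AGE) -> float:
    """Latest AFB that still leaves room for `parity` births `interval` years apart."""
    if parity < 1:
        raise ValueError("AFB is only defined for women with at least one child")
    # n births need (n - 1) intervals between the first and the last birth.
    return max_last_birth_age - (parity - 1) * interval


def is_structural_zero(parity: int, afb: float, interval: float,
                       max_last_birth_age: float = MAX_LAST_BIRTH_AGE) -> bool:
    """True if this (parity, AFB) combination is impossible under the assumptions."""
    return afb > latest_possible_afb(parity, interval, max_last_birth_age)


# A longer birth interval lowers the bound, i.e. creates more structural zeros.
for interval in (1.5, 2.0):
    print(interval, latest_possible_afb(10, interval))
```

With 1.5-year intervals the bound for ten children is 30.0 years; with 2-year intervals it falls to 25.5 years, so every AFB cell between those ages becomes an additional structural zero.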

Material and methods

The 1960 Census in Norway established the unique Norwegian person number with information on date of birth and sex. Every household was visited by civil servants.7 The question on the number of live-born children was posed only to married women, and only children born in the current marriage were counted. This study included all married women born 1870–1915 (aged 45–89 years at the census). The study population consists of 386,114 women, of whom 385,816 had a known number of children. The average number of children was 2.6. There are 44 women in the cohort with 16–21 children, and none of them had breast, ovarian, endometrial or cervical cancer. The number of breast cancer diagnoses is shown in Table 1 together with the number of women at the start of follow-up. In the 1960 Census, women were asked about their age at marriage. Age at marriage is known for 319,454 women with at least one child and for 66,158 women without children. The validity of age at marriage as a proxy for AFB has been confirmed in two reports from Statistics Norway.8,9

Table 1. Number of breast cancer cases (upper panel) and number of women (lower panel) by age group and parity.

For privacy reasons, information for women with 16 or more children is omitted.

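Age group / parity | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
45–59 | 415 | 560 | 660 | 395 | 214 | 83 | 39 | 22 | <5 | <5 | <5 | <5 | <5 | <5 | <5 | <5
60–74 | 1364 | 1502 | 1969 | 1170 | 619 | 300 | 157 | 67 | 45 | 22 | 11 | 7 | 5 | <5 | <5 | <5
75–89 | 1331 | 1461 | 1794 | 1231 | 679 | 333 | 201 | 91 | 72 | 37 | 24 | 7 | 5 | <5 | <5 | <5
Sum | 3110 | 3523 | 4423 | 2796 | 1512 | 716 | 397 | 180 | 119 | 60 | 37 | 14 | 11 | <5 | <5 | <5

Number of women in the three age groups in 1960
Age group / parity | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
45–59 | 42,758 | 43,932 | 57,019 | 40,098 | 25,514 | 15,229 | 9,408 | 6,037 | 3,941 | 2,499 | 1,562 | 826 | 480 | 209 | 94 | 42
60–74 | 20,322 | 20,719 | 26,792 | 19,162 | 12,178 | 7,241 | 4,350 | 2,870 | 1,860 | 1,201 | 776 | 393 | 239 | 104 | 50 | <5
75–89 | 3,078 | 3,008 | 3,968 | 3,011 | 1,857 | 1,081 | 703 | 396 | 291 | 201 | 116 | 64 | 37 | 24 | 10 | <5
Sum | 66,158 | 67,659 | 87,779 | 62,271 | 39,549 | 23,551 | 14,461 | 9,303 | 6,092 | 3,901 | 2,454 | 1,283 | 756 | 337 | 154 | 64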

Follow-up

Follow-up in the Cancer Registry of Norway and the Norwegian Cause of Death Registry was based on linkage using the unique person number. Diagnoses coded according to ICD-7, ICD-8, ICD-9 and ICD-10 have been converted to a common version of ICD-10 by the Cancer Registry of Norway. The analyses included 16,905 breast cancer cases (Table 1). Sarcomas (n = 62), in situ carcinomas (n = 272), cancers in women above 89 years (n = 1,047) and cases with unknown parity (n = 13) were excluded. Causes of death were available through a linkage done within Statistics Norway. Follow-up ended on 31 December 2005. Women not registered in the Cause of Death Registry were either alive in 2005 or had emigrated; the proportion of emigrated women has been estimated at 0.2%. Person-years (PY) were calculated from the age at the 1960 census until the first of the following: age 90, death, or the first diagnosis of breast, ovarian, endometrial or cervical cancer. Incidence rates are the number of diagnoses divided by PY in each category.
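The person-year and incidence-rate rules just described can be sketched in Python as follows. The record layout, the function names follow_up and incidence_rate, and the four toy women are hypothetical; the exclusions listed above (sarcomas, in situ carcinomas, unknown parity) are assumed to have been applied beforehand.

```python
from typing import Optional, Tuple

END_AGE = 90.0  # follow-up stops at age 90


def follow_up(census_age: float,
              death_age: Optional[float] = None,
              first_cancer: Optional[Tuple[float, str]] = None) -> Tuple[float, bool]:
    """Return (person-years, breast cancer event) for one woman.

    `first_cancer` is the (age, site) of her first breast, ovarian, endometrial
    or cervical cancer, if any. Follow-up ends at the earliest of age 90, death
    or that diagnosis; only a breast cancer diagnosis counts as an event.
    """
    exit_age = END_AGE if death_age is None else min(END_AGE, death_age)
    event = False
    if first_cancer is not None:
        cancer_age, site = first_cancer
        if cancer_age <= exit_age:
            exit_age = cancer_age
            event = site == "breast"
    return exit_age - census_age, event


def incidence_rate(women) -> float:
    """Number of breast cancer diagnoses divided by total person-years."""
    total_py, cases = 0.0, 0
    for census_age, death_age, first_cancer in women:
        py, event = follow_up(census_age, death_age, first_cancer)
        total_py += py
        cases += int(event)
    return cases / total_py


# Hypothetical records: (age at census, age at death, first cancer).
toy = [
    (60, 75, None),              # dies at 75: 15 PY, no event
    (70, None, (80, "breast")),  # breast cancer at 80: 10 PY, event
    (85, None, None),            # reaches 90: 5 PY, no event
    (50, 82, (65, "ovary")),     # ovarian cancer ends follow-up at 65: 15 PY
]
print(incidence_rate(toy))  # 1 case / 45 PY
```

Note that a non-breast gynaecological cancer censors the woman rather than counting as an event, which is how the rule above ends PY at the first of the four cancers.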

Ethical clearance and privacy

The project has been evaluated and accepted by the Regional Ethical Committee for South-East Norway (number 475656) and was approved on 12.10.2022. Written informed consent was not obtained, because census participation was a national obligation for all Norwegians in order to give everybody a unique identification number.

The legal restrictions on the data were follow-up only until age 90, no geographical information, and truncation of all published results based on fewer than five women.

The women in the cohort were born in the period 1871–1915. They are followed only until age 90, which was at least 18 years ago. All women are dead except for a few women aged at least 110 years, see Table 1 and Table 2. The data contain so little information, with no dates and no geographical information, that it is not possible to identify individual women. Only one person (LH) has access to the data. In all published results, there are at least five women in each cell to ensure anonymity. Due to privacy regulations in Norway, and in any other country covered by the GDPR, it is not possible to distribute the data set.

Table 2. Estimated parameters and 95% confidence intervals for breast cancer with Model 1.

Parameter | RR | 95% CI
exp(α) (age) | 1.0096 | 1.0086, 1.0105
exp(γ) (child) | 0.904 | 0.897, 0.910

Statistical methods

The new approach extends the use of the Cox proportional hazards model. Different combinations of the two risk factors, parity and AFB, are explored for postmenopausal breast cancer. The relative hazard rate is referred to as RR throughout. First, two models with parity as a covariate are presented, followed by a model with AFB as a covariate and then a standard model with both parity and AFB as covariates. All these models are problematic, since the structural zeros prevent us from separating the effects of parity and AFB. Finally, Model 5 shows how to separate the effects of parity and AFB. Age is also a covariate in the models, but this causes no problems since it is independent of the structural zeros.

In a simple model incidence for breast cancer can be estimated as

Model 1: $\lambda \exp(\alpha a + \gamma n)$
where a is age and n is the number of children. Only changes in RR with age and parity are of interest, so the constant λ need not be estimated. This model assumes that the hazard (incidence) rate increases by the same factor for each additional year of age and decreases by the same factor for each additional child.
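Because λ multiplies every hazard, it cancels from any ratio of hazards. The Python sketch below shows this with the rounded Model 1 estimates from Table 2 (exp(α) = 1.0096, exp(γ) = 0.904); treating λ as a plain constant follows the description above, and the function names are illustrative only. In a fitted Cox model the baseline hazard is likewise left unestimated by the partial likelihood.

```python
import math

# Rounded Model 1 estimates from Table 2, converted to the log scale.
ALPHA = math.log(1.0096)  # per year of age
GAMMA = math.log(0.904)   # per child


def model1_hazard(age: float, children: int, lam: float) -> float:
    """Model 1 hazard: lam * exp(alpha * a + gamma * n)."""
    return lam * math.exp(ALPHA * age + GAMMA * children)


def relative_risk(age1: float, n1: int, age0: float, n0: int, lam: float = 1.0) -> float:
    """Hazard of woman 1 divided by hazard of woman 0; lam cancels out."""
    return model1_hazard(age1, n1, lam) / model1_hazard(age0, n0, lam)


# Two women aged 60, three children versus none, for two arbitrary lambdas.
for lam in (1.0, 1e-4):
    print(round(relative_risk(60, 3, 60, 0, lam), 4))  # 0.7388 both times
```

The result equals 0.904 cubed whatever value λ takes, which is why only α and γ are reported.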

To find out whether the effect is the same for each additional child, a slightly more complex model is introduced:

Model 2: $\lambda \exp\left(\alpha a + \sum_{n} \gamma_n\right)$
where the term $\gamma_n$ enters only if the woman has at least n children, for n = 1, 2, 3, ..., 15. This model allows a different effect for each additional child. To retain statistical power, it is necessary to assume that some of the parameters $\gamma_n$ for higher values of n are identical, i.e. to reduce the 15 covariates to a smaller number. Model 2 reduces to Model 1 if all $\gamma_n = \gamma$.
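One way to build the Model 2 covariates is sketched below in Python. Each γn becomes an indicator that the woman has at least n children, and grouped parameters become counts of the births they cover; the grouping γ6 = γ7 and γ8 = ... = γ21 is the one stated for Table 3 in the Results. The function and covariate names are hypothetical. Table 3 reports effects relative to one child, but child_1 is kept here so that the reduction to Model 1 is exact.

```python
def model2_covariates(parity: int) -> dict:
    """Model 2 covariates for a woman with `parity` children (0 to 21).

    Grouping follows the assumption stated for Table 3:
    gamma_6 = gamma_7 and gamma_8 = ... = gamma_21.
    """
    covariates = {f"child_{k}": int(parity >= k) for k in range(1, 6)}
    # Number of births among the 6th and 7th child (0, 1 or 2): one shared gamma.
    covariates["child_6_7"] = min(max(parity - 5, 0), 2)
    # Number of births from the 8th child onwards: one shared gamma.
    covariates["child_8plus"] = max(parity - 7, 0)
    return covariates


def model2_log_rr(parity: int, gammas: dict) -> float:
    """Parity part of the Model 2 linear predictor (age term omitted)."""
    return sum(gammas[name] * x for name, x in model2_covariates(parity).items())


print(model2_covariates(9))

# If every gamma equals the same value g, Model 2 collapses to Model 1 (n * g).
g = -0.1
equal = {name: g for name in model2_covariates(0)}
print(all(abs(model2_log_rr(n, equal) - n * g) < 1e-12 for n in range(22)))
```

Encoding grouped parameters as counts, rather than as single indicators, keeps each child contributing exactly one γ term, so the covariates always sum to the parity.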

Next, with the focus on AFB, a model that is linear in b, where b is AFB, is introduced:

Model 3: $\lambda \exp(\alpha a + \beta b)$

The next model includes the effects of both parity and AFB.

Model 4: $\lambda \exp(\alpha a + \gamma n + \beta b)$

This is the standard Cox model for the three covariates age, parity and AFB. However, this model is problematic: the strong dependence between parity and AFB gives structural zeros. It therefore becomes necessary to include separate covariates for each parity, using the difference between a woman's AFB and the average AFB of all women with the same number of children:

$d_n = b_n - \bar{b}_n$, i.e. the AFB of a woman with n children minus the average AFB of all women with n children.

$d_n$ is denoted AFBd, for the difference between a woman's actual AFB and the average AFB of all women with parity n. This eliminates the structural-zero problem. The covariate $d_n$ is non-zero only for women with exactly n children. This makes parity, n, and AFBd, $d_n$, uncorrelated, since $d_n$ averages to zero within each parity group. Here too we start with many covariates $d_n$, n = 1, 2, ..., 15, and then reduce their number to obtain statistically significant estimates. The covariates $a, n, d_1, d_2, d_3, \ldots$ are used in Model 5:
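The centring step can be sketched in Python as follows, on synthetic toy data (the values and the function names afbd and covariance are hypothetical). It also checks the property the construction relies on: because d_n averages to zero within every parity group, its covariance with parity is zero, whereas parity and raw AFB covary negatively.

```python
from collections import defaultdict
from statistics import mean


def afbd(women):
    """Return d_n = AFB minus the mean AFB of women with the same parity n.

    `women` is a list of (parity, afb) pairs for parous women. Each woman gets a
    single non-zero covariate, d_n for her own parity n; all other d_k are 0.
    """
    afbs_by_parity = defaultdict(list)
    for parity, afb in women:
        afbs_by_parity[parity].append(afb)
    group_mean = {n: mean(values) for n, values in afbs_by_parity.items()}
    return [afb - group_mean[parity] for parity, afb in women]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))


# Synthetic toy data (parity, AFB): higher parity goes with earlier AFB.
women = [(1, 28.0), (1, 32.0), (1, 24.0), (2, 25.0), (2, 29.0),
         (3, 22.0), (3, 26.0), (5, 20.0), (5, 22.0)]
parities = [n for n, _ in women]
afbs = [b for _, b in women]
d = afbd(women)

print(covariance(parities, afbs))  # negative: parity and AFB are dependent
print(covariance(parities, d))     # zero: parity and AFBd are uncorrelated
```

In the fitted Model 5, each value in `d` would be placed in the column d_n for the woman's own parity, and the columns for n > 3 would be dropped.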

Model 5: $\lambda \exp\left(\alpha a + \gamma n + \sum_{n} d_n \omega_n\right)$

This model has some similarities with the partial conditional Cox model proposed previously.10 The separate effect of each child, $\gamma_n$, from Model 2 could also be included in Model 5, but this reduces statistical power. The effects are therefore analysed separately: $\gamma_n$ in Model 2, and $\gamma$ jointly with AFBd in Model 5.

Results

The study population had 16,905 breast cancer cases. The number of women decreases rapidly with each additional child (Table 1). There are 69 cases among women with 10 or more children. The highest parity among women with breast cancer is 15 children, and the highest number of children in the cohort is 21.

Same multiplicative effect for each year and child

First, the simple Model 1, with the same multiplicative effect for each additional year of age and each additional child, is fitted (Table 2). The first-order effects of age and number of children are strong, with a reduction in RR of 9.6% per child.

Separate effect for each additional child

In Model 2 the effect of each additional child is estimated separately; the results are shown in Table 3. As expected, at high parity the per-child effects must be grouped to obtain significant estimates. Here the 6th and 7th child share one parameter, and the 8th and subsequent children share another. With 15 children the risk is reduced by about 70% compared with 1 child.

Table 3. Estimated parameters and 95% confidence intervals for breast cancer, with a separate effect for each additional child (Model 2).

We assume $\gamma_6 = \gamma_7$ and $\gamma_8 = \gamma_9 = \cdots = \gamma_{21}$.

Parameter | RR, per child | 95% CI | RR, total (vs. 1 child)
exp(α) (age) | 1.0096 | 1.0086, 1.0105 |
1st child | reference | |
exp(γ₂) (2nd child) | 0.94 | 0.91, 0.97 | 0.94
exp(γ₃) (3rd child) | 0.91 | 0.88, 0.94 | 0.86
exp(γ₄) (4th child) | 0.90 | 0.86, 0.94 | 0.76
exp(γ₅) (5th child) | 0.86 | 0.81, 0.92 | 0.65
exp(γ₆) (6th child) | 0.88 | 0.84, 0.93 | 0.57
exp(γ₆) (7th child) | 0.88 | 0.84, 0.93 | 0.51
exp(γ₈) (8th child) | 0.94 | 0.90, 0.98 | 0.47
exp(γ₈) (9th child) | 0.94 | 0.90, 0.98 | 0.44
exp(γ₈) (10th child) | 0.94 | 0.90, 0.98 | 0.41
exp(γ₈) (11th child) | 0.94 | 0.90, 0.98 | 0.38
exp(γ₈) (12th child) | 0.94 | 0.90, 0.98 | 0.35
exp(γ₈) (13th child) | 0.94 | 0.90, 0.98 | 0.33
exp(γ₈) (14th child) | 0.94 | 0.90, 0.98 | 0.31
exp(γ₈) (15th child) | 0.94 | 0.90, 0.98 | 0.29
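The "RR, total" column of Table 3 is the running product of the per-child RRs. The Python sketch below recomputes it from the rounded per-child values; it gives about 0.31 at 15 children rather than the tabulated 0.29, a gap that is presumably due to the per-child estimates being rounded to two decimals. The variable names are illustrative.

```python
from itertools import accumulate
from operator import mul

# Rounded per-child RRs from Table 3 for the 2nd to 15th child
# (1 child is the reference; gamma_6 = gamma_7 and gamma_8 onwards are shared).
per_child_rr = [0.94, 0.91, 0.90, 0.86] + [0.88] * 2 + [0.94] * 8

# Running product: RR for k children relative to 1 child.
cumulative = list(accumulate(per_child_rr, mul))
for children, rr in zip(range(2, 16), cumulative):
    print(f"{children:2d} children: {rr:.2f}")
```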

The effect of AFB

The risk of breast cancer depends on AFB. The result of the linear Model 3 is shown in Table 4. The two covariates are as significant as in Model 1, with very narrow confidence intervals. AFB was not significant when a stepwise effect, like that of parity in Model 2, was tested. The estimated RR for AFB is close to 1, but its CI excludes 1, and the cumulative increase in RR from an AFB of 16 years to 43 years is 1.83. Model 5 (see below) shows that this effect is not a true effect but an artefact of the structural zeros.
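Under Model 3 the RR compounds multiplicatively, so moving from an AFB of 16 to 43 years multiplies the risk by exp(β) raised to the power 27. The Python sketch below applies this to the rounded estimate in Table 4, which gives about 1.80; the per-year RR implied by the reported 1.83 is about 1.0226, inside the CI in Table 4, so the difference is consistent with rounding of the tabulated estimate. Variable names are illustrative.

```python
RR_PER_YEAR = 1.022   # exp(beta) for AFB in Table 4, rounded to three decimals
YEARS = 43 - 16       # AFB range quoted in the text

cumulative = RR_PER_YEAR ** YEARS
print(round(cumulative, 2))

# Per-year RR that would reproduce the reported cumulative RR of 1.83 exactly.
implied = 1.83 ** (1 / YEARS)
print(round(implied, 4))
```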

Table 4. Estimated parameters and 95% confidence intervals for breast cancer with Model 3.

Parameter | RR | 95% CI
exp(α) (age) | 1.029 | 1.027, 1.030
exp(β) (AFB) | 1.022 | 1.019, 1.026

The effect of parity and AFB in the same model

Model 4 is a joint model with parity and AFB (Table 5). All three covariates have very narrow confidence intervals. The RR for AFB is close to 1, but its CI excludes 1. In fact, the effect of one additional child is almost 10 times stronger than that of one additional year of AFB. Since parity and AFB are strongly correlated, the result is difficult to interpret. The model must be rewritten so that the two covariates are decoupled, as in Model 5. The effect of AFBd is estimated for women with up to three children (Table 6); AFBd is not significant at higher parity. In this model, the RR for each additional child, adjusted for AFB, is estimated at 0.89 (95% CI: 0.88–0.90). In Model 5 the effect of AFB is about half as strong as in Model 3 and is significant only for n = 1, 2, 3. Models 4 and 5 give about the same strength of AFB for n = 1, 2, 3. For n > 3, the effect of AFB is reduced in Model 5, in contrast to the results from Model 4.

Table 5. Estimated parameters and 95% confidence intervals for breast cancer with Model 4.

Parameter | RR | 95% CI
exp(α) (age) | 1.030 | 1.028, 1.032
exp(γ) (child) | 0.90 | 0.89, 0.91
exp(β) (AFB) | 1.010 | 1.0071, 1.014

Table 6. Estimated parameters and 95% confidence intervals for breast cancer, including covariates for the woman's age at first birth (AFBd), using Model 5.

Separate AFBd covariates are included for women with 1, 2 and 3 children.

Parameter | RR | 95% CI
exp(α) (age) | 1.030 | 1.029, 1.032
exp(γ) (child) | 0.89 | 0.88, 0.90
AFBd exp(ω₁) (1 child) | 1.0081 | 1.0027, 1.0135
AFBd exp(ω₂) (2 children) | 1.014 | 1.0080, 1.020
AFBd exp(ω₃) (3 children) | 1.0091 | 1.0007, 1.018
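Because d_n is measured in years around the parity-specific mean, the Table 6 estimates can be compared directly with the per-child effect. The Python sketch below contrasts a hypothetical first birth five years earlier than average, for a woman with one child, with one additional child; the five-year shift and the function name are illustrative assumptions. The earlier first birth gives an RR of about 0.96, whereas one more child gives 0.89, in line with the statement in the Abstract that the effect of an early birth was smaller than that of one additional child.

```python
import math

# Rounded Model 5 estimates from Table 6.
RR_PER_CHILD = 0.89                                   # exp(gamma)
RR_PER_AFBD_YEAR = {1: 1.0081, 2: 1.014, 3: 1.0091}   # exp(omega_n) by parity n


def model5_log_rr(parity: int, afbd: float) -> float:
    """Parity and AFBd part of the Model 5 linear predictor (age term omitted).

    Only d_n for the woman's own parity enters; no AFBd term was retained
    for parity above 3, so AFBd is ignored there.
    """
    log_rr = parity * math.log(RR_PER_CHILD)
    if parity in RR_PER_AFBD_YEAR:
        log_rr += afbd * math.log(RR_PER_AFBD_YEAR[parity])
    return log_rr


# Hypothetical comparison: a one-child mother whose first birth came five years
# earlier than the average for one-child mothers ...
early_first_birth = math.exp(model5_log_rr(1, -5.0) - model5_log_rr(1, 0.0))
# ... versus a mother of two at average AFB, compared with a mother of one.
one_more_child = math.exp(model5_log_rr(2, 0.0) - model5_log_rr(1, 0.0))
print(round(early_first_birth, 3), round(one_more_child, 3))
```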

Discussion

To the best of our knowledge, this is the first methodological development considering structural zeros in the Cox proportional hazards model. Some combinations of parity and AFB are biologically impossible, as shown for grand grand multiparity (GGM, 10+ children) and AFB. This differs from zeros due to an insufficiently large study population. The new statistical method for analysing structural zeros in a Cox hazard model demonstrated the need for careful analysis of existing epidemiological information to avoid bias. The traditional mutual adjustment of parity and AFB had to be replaced by separate covariates for each parity group. The study clearly demonstrated the overall importance of pregnancies in reducing breast cancer risk, with AFB being less important.

In Model 1, with age and parity as continuous variables, the risk of breast cancer was reduced by 9.6% for each additional child, the same as in a previous analysis based on logit models,13 in which neither higher-order nor interaction terms were significant. A major difference between the hazard model and the logit regression model is that the hazard model estimates the ratio between the hazard rates for different values of the covariates, i.e. the relative risk, while the logit regression model estimates the incidence for each combination of covariate values.

The present low fertility will increase the incidence of breast cancer. In 1960, 32% of the women with children had at least four children (Table 1). From 1950 to 2021 the global total fertility rate fell from 4.8 to 2.3 children per woman.11 Currently, only 4.8% of live births in Norway are of birth order four or higher.12 This indicates that 95% of the Norwegian female population has a high absolute risk of breast cancer due to low fertility.
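The 32% figure can be checked against the column totals in Table 1. The Python sketch below (variable names are illustrative) adds the 44 women with 16–21 children reported in Material and methods. Women with at least four children make up about 26% of all women but about 32% of women with at least one child, so the percentage appears to refer to the share among mothers.

```python
# "Sum" row of the lower panel of Table 1: number of women by parity 0..15.
women_by_parity = [66158, 67659, 87779, 62271, 39549, 23551, 14461, 9303,
                   6092, 3901, 2454, 1283, 756, 337, 154, 64]
WOMEN_16_PLUS = 44  # women with 16-21 children, reported in the text

total = sum(women_by_parity) + WOMEN_16_PLUS
four_plus = sum(women_by_parity[4:]) + WOMEN_16_PLUS
mothers = total - women_by_parity[0]

print(total)                          # 385816, the cohort size in the text
print(round(four_plus / total, 3))    # share of all women
print(round(four_plus / mothers, 3))  # share of women with at least one child
```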

The linear relationship was confirmed in the analysis with Model 2: the RR was almost the same for each additional child, with no increasing or decreasing trend. Hence, it is reasonable to assume that the RR is the same for each additional child. The estimate for AFB in the linear Model 3 showed an increase of 2.2% per year. When parity and AFB were combined in the same analysis (Model 4), both factors were highly significant. When parity increases from 1 to 15 children, the RR is reduced to 0.23 with an RR of 0.9 (a 10% reduction) per child. Similarly, the RR increases from 1.0 to 1.8 as AFB increases from 16 to 43 years. Women with an AFB of 20 years or less have on average more than five children, while women with an AFB above 40 have on average 1.3 children. Hence, the two RRs cannot be combined as two independent dimensions.

Parity and AFB can be combined as shown in the final Model 5. Each additional birth gives an 11.0% decrease in risk, or an RR of 0.20 over the entire fertility range of 1–15 children. Notably, the effect of AFBd is much smaller than that of one additional child. There are many women with at least six children, so the AFBd covariates are not significant for more than four children because the RR is close to 1, not because there are few women in the sample. Also, with more than four children, AFBd varies quite little, which further reduces its effect.
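Since each model assumes a constant per-child RR, the RR across a range of parities is that RR raised to the number of additional children; going from 1 to 15 children adds 14. The short Python sketch below applies this to the rounded per-child estimates in Tables 2, 5 and 6 and reproduces the 0.23 (Model 4) and 0.20 (Model 5) figures in the text; the dictionary keys are labels of convenience.

```python
# Rounded per-child RRs: Model 1 (Table 2), Model 4 (Table 5), Model 5 (Table 6).
per_child_rr = {"Model 1": 0.904, "Model 4": 0.90, "Model 5": 0.89}

# Moving from 1 to 15 children adds 14 children, each multiplying the risk.
ADDED_CHILDREN = 15 - 1

cumulative = {model: rr ** ADDED_CHILDREN for model, rr in per_child_rr.items()}
for model, rr in cumulative.items():
    print(f"{model}: RR = {rr:.2f} ({1 - per_child_rr[model]:.1%} lower risk per child)")
```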

In all similar data sets, there are few women with high parity. Our data set has more women with high parity than most other data sets, so the estimation problem at high parity arises only at higher parities than in most other data sets. For all data sets, it is tempting to group women into parity intervals, e.g. 4–6, >5 or >10 children. However, this must be handled with care, since it may introduce bias: in a group with parity >m, there is dependence between parity and both AFB and AFBd within the group.
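The grouping problem can be illustrated with a small synthetic example in Python. The data are invented and only mimic the structural-zero bound, with AFB falling as parity rises; the helper names are illustrative. Centring AFB on the mean of an open-ended group such as 10+ leaves a covariate that still covaries with parity, whereas centring within each exact parity removes that covariance.

```python
from statistics import mean

# Synthetic, hypothetical data: two women for each parity 10..15, with AFB
# pushed down by parity as the structural-zero bound implies.
women = [(p, 43.5 - 1.5 * (p - 1) - offset)
         for p in range(10, 16) for offset in (1.0, 3.0)]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))


parities = [p for p, _ in women]
afbs = [b for _, b in women]

# AFBd computed against the mean of the whole open-ended group "10+".
pooled = [b - mean(afbs) for b in afbs]

# AFBd computed against the mean of each exact parity.
exact_mean = {p: mean(b for q, b in women if q == p) for p in set(parities)}
exact = [b - exact_mean[p] for p, b in women]

print(covariance(parities, pooled))  # negative: grouping leaves dependence
print(covariance(parities, exact))   # zero: exact-parity centring removes it
```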

The relative importance of parity and AFB as risk factors for breast cancer has been investigated for over a century, since the first designed case-control study of breast cancer.12 The crude odds ratio for women with 10+ children, grand grand multiparity (GGM), was 0.16. Then, in 1970, a large international study found no effect of parity, while AFB dominated, with an odds ratio of about one-third for women with a first birth before age 18 versus at age 35 or more.14 Since then, many large prospective studies have confirmed the effect of parity, while the effect of AFB has been less consistent.14–24 Most studies of GGM used either the standardized relative risk (SRR) or observed versus expected (O/E) ratios. These methods have no clear reference values, since the expected values depend on the average number of children in the population. An overview of published large cohorts with a reasonable number of grand multiparous women (GM, 5+ children) or grand grand multiparous women (GGM, 10+ children) shows that almost all were register-based linkage studies from the Nordic countries and Israel based on historical data. Grand multiparous women were included by going back in time or by studying populations with high fertility. The maximum number of children in the 1960 Census was 21 children.9 A methodological study of women's reproductive capacity found the same.25 The reduction in risk for each additional child in our study is slightly larger than in other cohort studies.

Strengths and weaknesses. The Norwegian Census information from 1960 has no selection bias and no information bias for parity. Age at first marriage is established as a reasonable proxy for AFB. A large proportion of young women were pregnant at the time of marriage. The data set consists of women aged 45–90 years in 1960, who are assumed to be postmenopausal. There was no use of exogenous sex hormones. There is no information on risk factors related to sex hormones, such as BMI, smoking and alcohol use. These risk factors act partly by increasing hormone levels. However, the level of circulating sex hormones in postmenopausal women is independent of parity.26

Conclusion

The introduction of a method that accounts for structural zeros increased the relative importance of parity compared with AFB. The large cohort made it possible to estimate the risk of breast cancer over the entire fertility range of 1–15 children. In a Cox hazard model accounting for structural zeros, the study demonstrated a strong reduction in breast cancer risk of 11% for each additional pregnancy. The new method separates the effects of the number of children and AFB in a decoupled analysis. AFB is a significant factor only at low parities. Thus, in low-fertility countries such as Norway, most women are at high risk of breast cancer.

Disclaimer

Some of the data in this article are from the Cancer Registry of Norway. The Cancer Registry of Norway is not responsible for the analysis or interpretation of the data presented.

How to cite this article
Holden L and Lund E. Cox hazard models and structural zeros; grand multiparity, age at first birth and risk of breast cancer [version 1; peer review: 1 approved with reservations]. F1000Research 2026, 15:482 (https://doi.org/10.12688/f1000research.177703.1)
Open Peer Review
Reviewer Report 07 May 2026
Ngo Minh Toan, University of Debrecen, Debrecen, Hungary 
Approved with Reservations
This manuscript investigates the long-term risk of breast cancer in a massive cohort from the 1960 Norwegian Census, specifically examining the interplay between grand multiparity and age at first birth. It utilizes a specialized Cox proportional hazard model to separate ...
How to cite this report
Toan NM. Reviewer Report For: Cox hazard models and structural zeros; grand multiparity, age at first birth and risk of breast cancer [version 1; peer review: 1 approved with reservations]. F1000Research 2026, 15:482 (https://doi.org/10.5256/f1000research.195969.r479073)