Keywords
parity, age at first birth, breast cancer, cohort, structural zeros, incidence
This article is included in the Oncology gateway.
Awareness of structural zeros, defined as impossible combinations of covariate values, has been limited in reproductive epidemiology compared with studies of random or sampling zeros. None of the many studies of parity (number of children) and age at first birth (AFB) has analysed or discussed the strong interdependency of high parity and AFB. Here we introduce statistical methods for estimating the relative risk (RR) of a breast cancer diagnosis in prospective studies with structural zeros using Cox proportional hazards models.
Information on parity and age at marriage for 385,816 women was collected in the 1960 Norwegian Census. Women aged 45–90 years were followed to the end of 2005 through linkage to the Cancer Registry of Norway, which identified 16,905 incident breast cancer diagnoses. The new methodology handles structural zeros in the Cox proportional hazards model, making it possible to separate the effects of parity and AFB. The model also estimates the change in RR for each additional child independently.
In a full model each additional child was associated with a relative risk (RR) of 0.89 (95% CI: 0.88–0.90), or an 11% reduction per child, over the entire fertility range of 1–15 children. The effect of AFB was smaller in the same model and was significant only for women with one to three children. The effect of an early first birth was smaller than that of one additional child.
New methods for handling structural zeros in Cox hazards analyses shifted the interpretation towards a stronger effect of parity and a weaker effect of AFB.
The concept of structural zeros is known from health research on infectious diseases,1 alcohol consumption2 and ecology.3 Structural zeros refer to impossible combinations of covariate values, i.e. combinations for which it is known a priori that there are zero observations. They should be kept separate from zeros due to random variation or sampling variability.4 Research on reproductive factors has clear limitations because biological constraints give rise to structural zeros. With increasingly high parity (number of children), AFB is systematically restricted, since the women have to start childbearing earlier. Certain combinations of parity and AFB are therefore biologically impossible and give structural zeros. Figure 1 illustrates real or natural combinations of values to the left and the structural zeros to the right. Women with ten children must have an AFB of less than 30 years assuming 1.5-year birth intervals, and even earlier with two-year intervals. No prospective studies of human reproduction and cancer have incorporated methods for handling structural zeros or discussed them, and these analytical problems have so far not been discussed for the Cox proportional hazards model. Here, the aim is to develop statistical methods for estimating the relative risk of a breast cancer diagnosis while controlling for structural zeros in Cox hazard models. Previous papers5,6 show some of the challenges of using hazard ratios in complex studies. We demonstrate how to resolve some of these challenges by increasing the number of hazard ratios.
The 1960 Census in Norway established the unique Norwegian person number with information on date of birth and sex. Every household was visited by civil servants.7 The question on the number of children alive at birth was posed only to married women, and only children born in the present marriage were counted. This study included all married women born 1870–1915, i.e. aged 45–89 years at the census. The study population consists of 386,114 women, and the number of children was known for 385,816 of them. The average number of children was 2.6. There are 44 women in the cohort with 16–21 children, and none of them had breast, ovarian, endometrial or cervical cancer. The number of breast cancer diagnoses is shown in Table 1 together with the number of women at the start of follow-up. In the 1960 Census women were also asked about their age at marriage. Age at marriage is known for 319,454 women with at least one child and for 66,158 women without children. The validity of age at marriage as a proxy for AFB has been confirmed in two reports from Statistics Norway.8,9
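The ten-child example can be made concrete with a short calculation. The sketch below is illustrative only: it assumes equally spaced births and a hypothetical upper limit of 43.5 years for the age at last birth, chosen so that the arithmetic reproduces the "less than 30 years" figure; the function names `max_afb` and `is_structural_zero` are not from the study.

```python
def max_afb(parity, interval_years, max_age_last_birth=43.5):
    """Latest feasible age at first birth (AFB) for a given parity.

    Assumptions (not from the study): births are equally spaced by
    `interval_years`, and no birth occurs after `max_age_last_birth`.
    """
    if parity < 1:
        raise ValueError("parity must be at least 1")
    # A woman with `parity` children has (parity - 1) intervals after the first birth.
    return max_age_last_birth - (parity - 1) * interval_years


def is_structural_zero(parity, afb, interval_years=1.5):
    """True if the (parity, AFB) combination is impossible under the assumptions."""
    return afb > max_afb(parity, interval_years)


print(max_afb(10, 1.5))            # 30.0
print(max_afb(10, 2.0))            # 25.5
print(is_structural_zero(10, 35))  # True
print(is_structural_zero(2, 35))   # False
```

Under these assumptions, a cell such as (parity 10, AFB 35) can never contain observations regardless of sample size, which is what distinguishes it from a sampling zero.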
Due to privacy, information for women with 16 or more children is omitted.
Follow-up through the Cancer Registry of Norway and the Norwegian Cause of Death Registry was based on linkage using the unique person number. The international disease codes ICD7, ICD8, ICD9 and ICD10 have been transformed to a common version of ICD10 by the Cancer Registry of Norway. Analyses included 16,905 breast cancer cases (Table 1). Excluded were sarcomas (n = 62), in-situ cancers (n = 272), women with cancer above 89 years of age (n = 1,047), and women with unknown parity (n = 13). Causes of death were available through a linkage done within Statistics Norway. Follow-up ended on 31.12.2005. Women not registered in the Cause of Death Registry were either alive at the end of 2005 or had emigrated. The proportion of emigrated women has been estimated to be 0.2%. The number of person-years, PY, is calculated from the age at the 1960 census until the first of the following: age 90, death, or the first diagnosis of breast, ovarian, endometrial or cervical cancer. Incidence rates are the number of diagnoses divided by PY in each category.
The project has been evaluated and accepted by the Regional Ethical Committee for South-East Norway (number 475656) and was approved on 12.10.2022. No written informed consent was obtained, because participation in the census was a national obligation for all Norwegians in order to give everybody a unique identification number.
The legal restrictions on the information were follow-up only until age 90, no geographical information, and truncation of all published results based on fewer than five women.
The women in the cohort were born in the period 1871–1915. These women are followed only until they are 90 years old, which was at least 18 years ago. All women are dead except for a few women with an age of at least 110 years, see Table 1 and Table 2. There is so little information that it is not possible to identify the women, since there are no dates and no geographical information. Only one person (LH) has access to the data. In all published results, there are at least 5 women in each cell to ensure anonymity. Due to privacy regulations in Norway, and in any other country covered by the GDPR, it is not possible to distribute the data set.
The new approach is an extension of the Cox proportional hazards model. Different combinations of the two risk factors, parity and AFB, are explored for postmenopausal breast cancer. The relative hazard rate is referred to here as RR, as is usual. First, two models with parity as a covariate are presented, followed by a model with AFB as a covariate and then a standard model with both parity and AFB as covariates. All these models are problematic, since the effects of parity and AFB cannot be separated because of the structural zeros. Finally, in Model 5 we show how to separate the effects of parity and AFB. Age is also a covariate in the models, but this causes no problems since it is independent of the structural zeros.
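The person-year rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the three example women, and the choice to express the rate per 100,000 person-years are all assumptions.

```python
def person_years(age_at_census, age_at_death=None, age_at_first_cancer=None, max_age=90.0):
    """Person-years from the census until the first of: age 90, death,
    or the first diagnosis of one of the four cancers."""
    stop_ages = [max_age]
    if age_at_death is not None:
        stop_ages.append(age_at_death)
    if age_at_first_cancer is not None:
        stop_ages.append(age_at_first_cancer)
    # Follow-up cannot be negative if an event precedes the census.
    return max(0.0, min(stop_ages) - age_at_census)


def incidence_rate(n_diagnoses, total_person_years, per=100_000):
    """Diagnoses divided by person-years, scaled to `per` person-years (assumed unit)."""
    return per * n_diagnoses / total_person_years


# Three hypothetical women (illustrative values, not study data).
women = [
    {"age_at_census": 50.0, "age_at_death": 78.0, "age_at_first_cancer": 70.0},
    {"age_at_census": 60.0, "age_at_death": 85.0, "age_at_first_cancer": None},
    {"age_at_census": 45.0, "age_at_death": None, "age_at_first_cancer": None},
]
total_py = sum(person_years(**w) for w in women)
print(total_py)                     # 20 + 25 + 45 = 90.0
print(incidence_rate(1, total_py))  # one diagnosis over the pooled person-years
```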
In a simple model incidence for breast cancer can be estimated as
To find out whether the effect is the same for each additional child, a slightly more complex model is introduced,
Then, with the focus on AFB, a model that is linear in AFB is introduced.
The next model includes effects both of parity and AFB.
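Written out under assumed notation, Models 1 to 4 can be sketched as follows. The symbols are illustrative and not necessarily those used by the authors: $h_0(t)$ is the baseline hazard, $a$ the age at the census, $n$ the parity and $f$ the AFB.

```latex
\begin{align}
\text{Model 1:}\quad h(t \mid a, n)    &= h_0(t)\,\exp(\alpha a + \beta n) \\
\text{Model 2:}\quad h(t \mid a, n)    &= h_0(t)\,\exp\Big(\alpha a + \sum_{k=1}^{n} \beta_k\Big) \\
\text{Model 3:}\quad h(t \mid a, f)    &= h_0(t)\,\exp(\alpha a + \gamma f) \\
\text{Model 4:}\quad h(t \mid a, n, f) &= h_0(t)\,\exp(\alpha a + \beta n + \gamma f)
\end{align}
```

In this notation $e^{\beta}$ is the RR per additional child, $e^{\beta_k}$ the RR associated with the $k$-th child, and $e^{\gamma}$ the RR per additional year of AFB; the last line is Model 4.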
This is the standard Cox model for the three covariates age, parity and AFB. However, this model is problematic, because the strong dependency between parity and AFB gives structural zeros. It becomes necessary to include a separate covariate for each parity n, defined as the difference between a woman's AFB and the average AFB of all women with the same number of children:
AFBd = age at first birth for a woman with n children − average age at first birth for women with n children.
Here AFBd denotes the difference between the actual AFB of a woman and the average AFB of all women with parity n. This eliminates the structural zero problem. The covariate for parity n is non-zero only for women with exactly n children. This makes the covariates parity, n, and AFBd, when they are non-zero, independent of each other. As before, we start with many covariates, n = 1, 2, …, 15, and then reduce the number to obtain statistically significant estimates. The covariates are used in Model 5:
This model has some similarities with the partial conditional Cox model proposed in reference 10. It is possible to include the separate effect of each child from Model 2 in Model 5. However, this reduces the statistical strength. The effect of each factor is therefore analysed separately: the per-child effect in Model 2, and parity jointly with AFBd in Model 5.
The study population had 16,905 breast cancer cases. The number of women decreases rapidly with each additional child, Table 1. There are 69 cases among women with 10 or more children. The highest parity with an observed breast cancer case is 15 children, and the highest number of children in the cohort is 21.
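The construction of the parity-specific AFBd covariates can be sketched as below. This is a minimal illustration on hypothetical toy data, not the authors' implementation; the function name `afbd_covariates` is an assumption, and so is the linear predictor in the comment (age and parity terms plus one coefficient per AFBd column), which is only one plausible way to write Model 5.

```python
from collections import defaultdict


def afbd_covariates(parity, afb, max_parity=15):
    """Build one centred AFB covariate per parity level.

    Row i has `max_parity` entries. Entry n-1 holds AFB_i minus the mean
    AFB of all women with parity n if woman i has exactly n children,
    and 0.0 otherwise.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for n, f in zip(parity, afb):
        sums[n] += f
        counts[n] += 1
    mean_afb = {n: sums[n] / counts[n] for n in counts}

    rows = []
    for n, f in zip(parity, afb):
        row = [0.0] * max_parity
        if 1 <= n <= max_parity:
            row[n - 1] = f - mean_afb[n]
        rows.append(row)
    return rows


# Hypothetical toy data (not from the study): parity and AFB for five women.
parity = [1, 1, 2, 2, 2]
afb = [30.0, 34.0, 22.0, 25.0, 28.0]
rows = afbd_covariates(parity, afb, max_parity=3)
for r in rows:
    print(r)

# Assumed linear predictor for Model 5 (illustrative notation):
#   alpha * age + beta * parity + sum over n of delta_n * AFBd_n
```

Because each column is centred within its own parity group, an impossible (parity, AFB) combination never has to be represented: a woman contributes only to the column of her own parity.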
First, the simple Model 1 with the same multiplicative effect for each additional year of age and each child is tested, Table 2. The first-order effects of age and number of children are strong with a reduction in RR of 9.6% per child.
In Model 2 the effect of each additional child is estimated separately. The result is shown in Table 3. As expected, when parity becomes large it is necessary to group the effects of additional children to obtain significant estimates. Here the effects of the 6th and 7th child are grouped together, as are the effects of the 8th and any additional children. With 15 children the risk is reduced by 70% compared with 1 child.
We assume and .
The risk of breast cancer depends on AFB. The result of the linear Model 3 is shown in Table 4. The two covariates are as significant as in Model 1, with very narrow CIs. The covariate AFB was not significant when testing a stepwise effect like that of parity in Model 2. The estimate of the parameter for AFB is close to 1, but the CI excludes 1, and the cumulative increase in RR from 16 years to 43 years is 1.83. Model 5, see below, shows that this effect is not real but is due to the structural zeros.
Model 4 is a joint model with parity and AFB, Table 5. All three covariates have very narrow CIs. The parameter for AFB is close to 1, but the CI excludes 1. In fact, the effect of one additional child is almost 10 times stronger than that of one additional year of AFB. Since parity and AFB are strongly correlated, the result is difficult to interpret. It is necessary to rewrite the model so that the two covariates are independent, as in Model 5. The effect of AFBd is estimated for up to three children, Table 6. AFBd is not significant for higher parity. In this model the effect of each additional child, adjusted for AFB, is estimated as RR = 0.89 (95% CI: 0.88–0.90). In Model 5 the effect of AFB has half the strength found in Model 3 and is significant only for n = 1, 2, 3. Models 4 and 5 give about the same strength of AFB for n = 1, 2, 3. For n > 3, the effect of AFB is reduced in Model 5, in contrast to the results from Model 4.
| Parameters | RR | CI |
|---|---|---|
| (age) | 1.030 | 1.028, 1.032 |
| (child) | 0.90 | 0.89, 0.91 |
| (AFB) | 1.010 | 1.0071, 1.014 |
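The statement that parity is roughly ten times stronger than AFB per unit can be checked from the Model 4 estimates tabulated above by comparing the two effects on the log-hazard scale. The comparison on the log scale is an assumption about how the ratio was obtained; the variable names are illustrative.

```python
import math

rr_child = 0.90      # RR per additional child (Model 4 estimate above)
rr_afb_year = 1.010  # RR per additional year of AFB (Model 4 estimate above)

# Compare absolute log relative risks, i.e. the Cox regression coefficients.
ratio = abs(math.log(rr_child)) / abs(math.log(rr_afb_year))
print(round(ratio, 1))  # about 10.6
```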
Parity grouped as 1, 2, 3 when separating the effect of each child.
| Parameters | RR | CI |
|---|---|---|
| (age) | 1.030 | 1.029, 1.032 |
| (child) | 0.89 | 0.88, 0.90 |
| (1. child) | 1.0081 | 1.0027, 1.0135 |
| (2. child) | 1.014 | 1.0080, 1.020 |
| (3. child) | 1.0091 | 1.0007, 1.018 |
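To put the AFBd estimates above in perspective, the sketch below computes how many years of earlier first birth would be needed to match the RR of one additional child. It assumes the multiplicative per-year effect holds over the whole range considered, which is an extrapolation for illustration; the function names are hypothetical.

```python
import math

RR_CHILD = 0.89                             # RR per additional child in Model 5
RR_AFBD = {1: 1.0081, 2: 1.014, 3: 1.0091}  # RR per year of AFBd for parity 1 to 3


def rr_afbd(parity, afbd_years):
    """RR for a first birth `afbd_years` later than the parity-specific mean.

    For parity above 3 no AFBd effect was estimated, so 1.0 is returned.
    """
    per_year = RR_AFBD.get(parity)
    if per_year is None:
        return 1.0
    return per_year ** afbd_years


def years_equivalent_to_one_child(parity):
    """Years of earlier first birth giving the same RR as one additional child."""
    return -math.log(RR_CHILD) / math.log(RR_AFBD[parity])


print(rr_afbd(2, 5.0))  # first birth 5 years later than average, two children
for n in (1, 2, 3):
    print(n, round(years_equivalent_to_one_child(n), 1))
```

Under these assumptions a first birth would have to be brought forward by several years to offset the absence of a single additional child, in line with the observation that the effect of early birth is smaller than that of one additional child.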
To the best of our knowledge, this is the first methodological development considering structural zeros in the Cox proportional hazards model. There are combinations of values for parity and AFB that are not biologically possible, as shown for grand grand multiparity (GGM, 10+ children) and AFB. This is different from zeros due to insufficient size of the study population. The new statistical method for analysing structural zeros in a Cox hazard model demonstrated the need for careful analysis of existing epidemiological information to avoid bias. The traditional mutual adjustment of parity and AFB had to be replaced by the use of separate covariates for each parity group. The study clearly demonstrated the overall importance of pregnancies for a reduction in breast cancer, with less importance of AFB.
In Model 1, with age and parity as continuous variables, the reduction in risk of breast cancer was 9.6% for each additional child, the same as in a previous analysis based on logit models13 with neither higher order nor mixed terms significant. A major difference between the hazard model and the logit regression model is that the hazard model estimates the ratio between the hazard rates for different values of the covariates, i.e. the relative risk, while the logit regression model estimates the incidence for each combination of values of the covariates.
The present low fertility will increase the incidence of breast cancer. In 1960 the proportion of women with at least 4 children was 32%, Table 1. From 1950 to 2021 the global total fertility rate went down from 4.8 to 2.3 children per woman.11 Currently, in Norway only 4.8% of live births are a woman's fourth or higher-order birth.12 This indicates that 95% of the Norwegian female population has a high absolute risk of breast cancer due to low fertility.
The linear relationship was confirmed in the analysis with Model 2. The RRs were found to be almost the same for each additional child, with no increasing or decreasing trend. Hence, it is natural to assume that the RR for each additional child is the same. The estimate for AFB in a linear model, Model 3, showed an increase of 2.2% for each year. When parity and AFB were combined in the same analysis, Model 4, both factors were highly significant. When parity increases from 1 to 15 children, the RR is reduced to 0.23 using an RR of 0.9 per child, i.e. a 10% reduction in risk for each child. Similarly, the RR increases from 1.0 to 1.8 as AFB increases from 16 to 43 years. Women with an AFB of 20 or less have on average more than 5 children, while women with an AFB above 40 have on average 1.3 children. Hence, it is not possible to combine the two RRs as two independent dimensions.
Parity and AFB can be combined as shown in the final Model 5. Each additional birth gives an 11.0% decrease in risk, corresponding to an RR of 0.20 over the entire fertility range of 1–15 children. It is notable that the estimated effect of AFBd is much smaller than that of one additional child. There are many women with at least 6 children. AFBd is not significant for more than 4 children because its RR is close to 1, not because there are few women in the sample. Also, with more than 4 children, AFBd varies quite little, which further reduces its effect.
In all similar data sets, there are few women with high parity. Our data set has more women with high parity than most other data sets. Therefore, the problem of estimation at high parity arises at a higher parity than in most other data sets. For all data sets, it is tempting to group women in parity intervals, e.g. 4–6 children, >5 or >10 children. However, this must be handled with care since it may introduce bias. In a group with parity >m, there is dependency between parity and both AFB and AFBd within the group.
The relative importance of parity and AFB as risk factors for breast cancer has been investigated for over a century, starting with the first designed case-control study of breast cancer.12 The crude odds ratio for women with 10+ children, grand grand multiparity (GGM), was 0.16. Then, in 1970 a large international study found no effect of parity, while AFB dominated with an odds ratio of about one-third for women with a first birth under 18 years versus 35 years or more.14 Since then, many large prospective studies have confirmed the effect of parity, while the effect of AFB has been less consistent.14–24 Most of the studies of GGM used either the standardized relative risk (SRR) or observed versus expected (O/E). These methods have no clear reference values, since the expected values depend on the average number of children in the population. An overview of published large cohorts with a reasonable number of grand multiparity women (GM) with 5+ children or grand grand multiparity women (GGM) with 10+ children shows that almost all of them were register-based linkage studies in the Nordic countries and Israel based on historical data. Grand multiparity women were included by going back in time or by using populations with high fertility. In the 1960 Census9 the maximum number of children was 21. A methodological study of women’s reproductive capacity found the same.25 The reduction in risk for each additional child is slightly larger than in other cohort studies.
Strengths and weaknesses. The Norwegian Census information from 1960 has no selection bias and no information bias for parity. Age at first marriage is established as a reasonable proxy for AFB. A large proportion of young women were pregnant at the time of marriage. The data set consists of women aged 45–90 years in 1960, who are assumed to be postmenopausal. There was no use of exogenous sex hormones. There is no information on risk factors related to sex hormones, such as BMI, smoking and alcohol use. These risk factors act partly by increasing hormone levels. However, the level of circulating sex hormones in postmenopausal women is independent of parity.26
Introducing a method that accounts for structural zeros increased the relative importance of parity compared with AFB. The large cohort made it possible to estimate the risk of breast cancer over the entire fertility range of 1–15 children. In a Cox hazard model that accounts for structural zeros, the study demonstrated a strong reduction in the risk of breast cancer of 11% for each additional pregnancy. The new method separates the effects of the number of children and AFB in a decoupled analysis. AFB is a significant factor only for low parities. Thus, in low-fertility countries such as Norway most women are at a high risk of breast cancer.
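The cumulative figures quoted for Model 4 follow from compounding the per-unit RR, as the short check below illustrates. It assumes a constant multiplicative effect per child and per year of AFB; the helper name `cumulative_rr` is illustrative.

```python
def cumulative_rr(rr_per_unit, units):
    """RR after `units` steps when each step multiplies the risk by `rr_per_unit`."""
    return rr_per_unit ** units


# From 1 to 15 children is 14 additional children at RR 0.90 each.
print(round(cumulative_rr(0.90, 15 - 1), 2))   # 0.23

# From AFB 16 to AFB 43 is 27 additional years at RR 1.022 each.
print(round(cumulative_rr(1.022, 43 - 16), 2))  # 1.8
```

The two cumulative RRs cannot simply be multiplied for an individual woman, because high parity and late AFB do not occur together.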
Some of the data in this article are from the Cancer Registry of Norway. The Cancer Registry of Norway is not responsible for the analysis or interpretation of the data presented.
Access to the information from the 1960 Norwegian Census and the Cancer Registry of Norway is handled by Statistics Norway/the Central Bureau of Statistics, Oslo, Norway. Distribution of the data set is not permitted for privacy reasons. Contact address: [email protected].
Is the rationale for developing the new method (or application) clearly explained?
Yes
Is the description of the method technically sound?
Yes
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: breast cancer, nuclear medicine