Predicting the chances of live birth for couples undergoing IVF-ICSI: a novel instrument to advise patients and physicians before treatment [version 1; peer review: 2 not approved]

Background: In developed countries, the prevalence of infertility ranges from 3.5% to 16.7%. Therefore, the number of in vitro fertilization (IVF) treatments, and of its subtype intracytoplasmic sperm injection (ICSI), has been increasing significantly across Europe. Several factors affect the success rate of in vitro treatments, and these can be used to calculate the probability of success for each couple. As these treatments are complicated and expensive, with a variable probability of success, the most common question asked by IVF patients is "What are my chances of conceiving?". The main aim of this study is to develop a validated model that estimates a couple's chance of a live birth before they start their IVF non-donor cycle. Methods: A logistic regression model was developed based on a retrospective study of 737 IVF cycles. Each couple was characterized by 14 variables (woman's and man's age, duration of infertility, cause of infertility, woman's and man's body mass index (BMI), anti-Müllerian hormone (AMH), antral follicle count (AFC), woman's and man's ethnicity, woman's and man's smoking status and woman's and man's previous live children) and described by the outcome of the treatment, "Live birth" or "No live birth". Results: Of the 14 variables acquired before starting the IVF procedures, only male factor, man's BMI, man's mixed ethnicity and level of AMH were statistically significant. The interactions between infertility duration and woman's age, infertility duration and man's BMI, AFC and AMH, AFC and woman's age, AFC and woman's BMI, and AFC and disovulation were also statistically significant. The area under the receiver operating characteristic (AUROC) curve, which measures the discriminatory ability of the final prediction model, is 0.700 (95% confidence interval (CI) 0.660–0.741). Conclusions: This model might result in a new validated decision


Introduction
Infertility is a disease of the reproductive system defined by the failure to achieve a clinical pregnancy after 12 months or more of regular unprotected sexual intercourse 1 . Taking this definition into account, it is a challenge to calculate the real prevalence or incidence rates for this condition 2 .
According to the study performed by Boivin et al. 3 , the prevalence of infertility ranged from 3.5% to 16.7% in developed countries. Based on these authors' estimates, 72.4 million women are currently infertile and, of these, 40.5 million are currently seeking infertility medical care 3 . Most recent Portuguese data estimate that 9.8% of couples are infertile 4 .
Infertility is a multifactorial disease 5 . The cause can be a male-related factor, especially when there is a sperm or ejaculatory problem 6,7 , or a female-related factor, namely when there is disovulation or tubal dysfunction 8,9 . Women's age, ovarian reserve, weight, and lifestyle factors (such as alcohol consumption and exercise habits) have also been related to infertility 5,10-16 . Nonetheless, in up to 20% of cases, no cause can be found 3,5 .
Several treatments have been proposed during the last century, the most innovative being in vitro fertilization (IVF), which was developed by Robert Edwards 17 in 1978. After 41 years, the two most important medically assisted reproduction (MAR) techniques are IVF and its subtype intracytoplasmic sperm injection (ICSI) 18 . The number of IVF and ICSI treatments has been increasing across Europe 19 . According to the European Society of Human Reproduction and Embryology, the number of MAR cycles between 1997 and 2014 increased by 13%, reaching 776 556 cycles in Europe in 2015 19 .
Despite the increasing number of MAR treatments, IVF success is not guaranteed 20 . In healthy young couples, the probability of achieving a live birth is between 20% and 25% per month. This probability may increase by up to 60% with MAR techniques 21 . The study by Malizia et al. also corroborates this finding, indicating that between 38% and 49% of couples who start IVF remain childless, even after undergoing up to six IVF cycles 22 . In Portugal, the last MAR report showed that the treatment success rate is around 25-30% 23 .
MAR treatments are expensive, time-consuming, stressful, and may lead to anxiety, depression, or marital problems 24-28 . Also, these treatments have possible complications such as ovarian hyperstimulation syndrome, bleeding, and infection, as well as multiple or premature births 29 .
The success rates fluctuate between studies and depend on several factors 30 . If the chances of live birth are low and the risk or the cost is too high, the couple may consider other options, such as adoption or remaining childless 31 . In other words, based on their specific probability of success, the couple may decide whether or not to proceed with the treatment. Therefore, the most common question asked by IVF patients is "What are my chances of conceiving?" The answer to this tough question usually depends on the woman's age and infertility diagnosis 30 . Nevertheless, many more parameters are known to affect IVF outcomes 32 , such as sperm quality, hormone doses, and physiological factors. Given this, a decision support system would be helpful to ensure that infertile couples are well informed regarding their chances of success with IVF 33 .
There have been various efforts to build prediction models to assist physicians in predicting MAR success 30,34-37 . To our knowledge, the first predictive model built in this context is that of Templeton et al., who in 1996 used a logistic regression model to predict the probability of live birth for an individual woman from the woman's age, the number of previous live births or pregnancies not resulting in a live birth, whether these were a result of previous IVF treatment, female causes of infertility, duration of infertility and the number of previous unsuccessful IVF treatments 30 .
These authors found that the success of IVF decreased with female age and that women between 25 and 30 years were the most likely to have a live birth 30 .
Considering the evolution of technology and the importance of other variables, Nelson and Lawlor developed a new model in 2011, using the same mathematical techniques, but including other factors such as the most prevalent causes of infertility, the source of the egg (donor or patient's own), type of hormonal preparation used (antioestrogen, gonadotrophin, or hormone replacement therapy), whether or not ICSI was used, and the number of previous cycles (1, 2 or 3) 36 .
Taking into account that it is imperative to have validated models 38  Other relevant models of note include the one developed by Marca et al., which predicts live birth in assisted reproduction based on serum anti-Müllerian hormone (AMH) and women's age 35 , and the McLernon et al. study 37 , which estimated the cumulative personal chance of a first live birth over a maximum of six complete cycles of IVF using data from 23 417 women in the UK. The estimates of the latter study 37 were adjusted only for the woman's age, infertility duration, previous pregnancies, infertility causes, and type of treatment.
Nonetheless, when Leijdekkers performed an external validation of McLernon's model 39 with Dutch women, the model was found to overestimate the results, and biomarkers such as anti-Müllerian hormone (AMH), antral follicle count (AFC) and women's body weight were therefore included. Other studies indicate that it is important to include covariables such as body mass index (BMI), ethnicity and ovarian reserve 34 , and corroborate that variables such as BMI, AFC, AMH, ethnicity, and smoking status should be predictors in the model 5,14,40,41 . Furthermore, machine learning approaches have also been proposed, including decision trees, genetic algorithms, and k-nearest neighbors classifiers 21,42-45 .
The main aim of this study was to integrate all the variables of interest reported in the literature to develop a validated model that estimates the chance of live birth for couples before they start their IVF non-donor cycle.

Methods
The present work is a retrospective study of data from IVF/ICSI cycles. The cycles were performed between 2012 and 2016 in the Centro de Infertilidade e Reprodução Medicamente Assistida (CIRMA) at Hospital Garcia de Orta, E.P.E., Almada, Portugal. A total of 739 couples were considered, since they met the criterion of being fresh non-donor cycles or cycles with live birth or without frozen embryos available.
The study was approved by the Hospital's Ethics Committee for Health. Couples signed an informed consent document at the first infertility consultation, before treatment began. Patients were informed that, under complete anonymity, the data may be used in scientific papers for public presentation or publication 46 .
The IVF result for each couple in the study was classified as 1 (if, at least, one baby was born alive and survived for more than 1 month) and 0 (otherwise), which is the model's dependent variable/primary outcome. This decision was based on other studies in this area 30,34-37 .
In terms of the baseline characteristics used to develop this model, and based on evidence from the published literature in this area 30,32 , the following were considered: the woman's and man's age (years), duration of infertility (months), cause of infertility (categorised as tubal factor, endometriosis, disovulation, male factor, both female and male factor (depending on whether the cause of infertility lies with the woman or the man), multiple female factors, unexplained infertility or other), woman's and man's BMI (kg/m²), AMH (ng/mL), AFC, woman's and man's ethnicity (categorised as Asian, Caucasian, Gipsy, Indian, Black or Mixed), woman's and man's smoking status (never, previous, present) and woman's and man's previous live children (yes or no). Antral follicle count was obtained by transvaginal ultrasound, and serum anti-Müllerian hormone levels were measured by blood analysis. No bias is expected in this study; however, possible sources of bias are the couples' responses during the consultation and the entry of the data into the database.
A summary statistical evaluation was performed for each variable. A univariate analysis was performed first. Continuous variables were compared using Student's t-test and categorical variables using the Chi-square test. A p-value <0.05 was considered statistically significant 47 .
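The univariate comparisons described above can be illustrated with a minimal Python sketch. It is not the authors' SPSS procedure: the function names are hypothetical, the chi-square version is restricted to a 2x2 table without continuity correction, and the t-test helper returns only the pooled-variance statistic and its degrees of freedom (a p-value would require a t-distribution table or a statistics package).

```python
import math
import statistics


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def student_t(sample_a, sample_b):
    """Pooled-variance (Student) t statistic and degrees of freedom for two samples."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled_var = (
        (n1 - 1) * statistics.variance(sample_a)
        + (n2 - 1) * statistics.variance(sample_b)
    ) / (n1 + n2 - 2)
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(
        pooled_var * (1.0 / n1 + 1.0 / n2)
    )
    return t, n1 + n2 - 2


# Hypothetical counts (NOT study data): smokers / non-smokers by outcome.
stat, p = chi_square_2x2(10, 20, 30, 40)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

With these made-up counts the p-value is above 0.05, so the difference would not be considered statistically significant under the threshold used in this study.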
A binary logistic regression 48 was developed, using the primary outcome as a binary variable (no=0 and yes=1), to achieve the aim of this study. An automatic backward selection process based on the Wald statistic was used to determine the final and best logistic regression model 49 . The model was based on the data from 737 couples (99.73%), because BMI values were missing for two couples and, as Iezzoni 50 indicated, it is not recommended to impute missing values in health data.
As recommended by Hosmer and Lemeshow 48 , interactions between some variables were included to increase the reliability of the model because, as Fisher 51 explained, the primary outcome can depend not only on one individual baseline characteristic but also on the relationship between baseline characteristics. The variables in interaction were: woman's age and infertility duration 52 , BMI and oligospermia 53 , AFC levels and women's BMI 13 , and AFC and AMH levels 40 .
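To make the role of interaction terms concrete, the following Python sketch shows how a logistic model with main effects and product (interaction) terms turns a couple's characteristics into a predicted probability. It is an illustration under stated assumptions only: the function name, the variable names and every coefficient value are hypothetical placeholders, not the fitted parameters reported in Table 3.

```python
import math


def live_birth_probability(intercept, main_effects, interactions, couple):
    """Predicted probability from a logistic model with interaction terms.

    main_effects: {variable_name: coefficient}
    interactions: {(variable_a, variable_b): coefficient}, applied to the product a * b
    couple:       {variable_name: observed value}
    """
    linear_predictor = intercept
    for name, beta in main_effects.items():
        linear_predictor += beta * couple[name]
    for (name_a, name_b), beta in interactions.items():
        linear_predictor += beta * couple[name_a] * couple[name_b]
    # Inverse logit: maps the linear predictor onto the (0, 1) probability scale.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Placeholder coefficients for illustration only (NOT the values in Table 3).
intercept = -1.0
main_effects = {"amh": 0.1}
interactions = {("afc", "woman_age"): 0.002}

couple = {"amh": 2.0, "afc": 10, "woman_age": 30}
print(round(live_birth_probability(intercept, main_effects, interactions, couple), 3))
```

Because an interaction enters as a product, the effect of one variable (here AFC) on the predicted probability depends on the value of the other (here the woman's age), which is the dependence between baseline characteristics described above.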
The model's performance was measured by its discriminative power, assessed using the area under the curve (AUC) value 54,55 , and the model was internally validated using the bootstrapping technique with 1000 iterations 56 . All statistical procedures were computed using IBM SPSS Statistics 25. An α level of 0.05 was adopted, meaning that a result was not considered statistically significant if the p-value of the test was above 0.05. The STROBE cross-sectional reporting guidelines were adopted in this article 57 .
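As a rough illustration of these two steps, the sketch below computes the AUC as the proportion of (live birth, no live birth) pairs in which the live-birth case received the higher predicted probability, and then derives a percentile bootstrap interval for it. This is an assumption-laden sketch rather than the SPSS procedure used in the study: the function names, the percentile method and the toy data are all hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_interval(labels, scores, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling couples with replacement)."""
    auc(labels, scores)  # fail early if only one outcome class is present
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # AUC is undefined for a one-class resample; draw again
        estimates.append(auc(resampled_labels, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_iter)]
    upper = estimates[max(int((1 - alpha / 2) * n_iter) - 1, 0)]
    return lower, upper


# Toy data (NOT study data): outcome (1 = live birth) and predicted probability.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.50, 0.45]
print(auc(labels, scores), bootstrap_auc_interval(labels, scores))
```

An AUC of 0.5 corresponds to a model that ranks couples no better than chance, and 1.0 to perfect separation of the two outcomes.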

Results
A total of 737 cycles were evaluated. The overall rate of at least one live birth was 31.4%.
Baseline characteristics of couples are presented in Table 1 and Table 2.
The average age was 34.04 years for female participants and 36.14 years for male participants (Table 1). Younger women and men were more likely to achieve a live birth. The same was seen for men and women with lower BMI and shorter infertility durations. However, these results were only statistically significant for women's age, men's age, AFC and AMH (p<0.05). Table 2 shows that most women and men had never smoked. Most men and women were Caucasian, and male factor (that is, male infertility) was the most prevalent cause of infertility in these data. Most men and women had no previous live births (90.1% and 87.4%). The Chi-square test revealed no statistically significant differences in live birth for any of the discrete characteristics (p>0.05). For this reason, interactions between different characteristics were considered.
In Table 3, the binary logistic regression parameters computed with SPSS are presented. These parameters allow the assessment of the chance of live birth for couples before they start their IVF non-donor cycle, based on the variables previously mentioned.
The logistic regression results showed that, of the 14 variables evaluated in the pre-treatment procedures, only male factor, the man's BMI, mixed ethnicity for men, and the level of AMH were statistically significant. Apart from these variables, the interactions between infertility duration and women's age, infertility duration and men's BMI, AFC and AMH, AFC and women's age, AFC and women's BMI, and AFC and disovulation were also statistically significant (p-value less than 0.05).
According to the regression coefficients of the final model, an AMH unit increase raises the probability of IVF-ICSI success by 0.172 times, ceteris paribus. Similarly, the interaction between AFC and the woman's BMI decreases the same probability by 0.003 times, and the interaction between AFC and the woman's age increases it by 0.004 times. Male factor is the only infertility cause that enters individually into the final model, raising the chances of success by 0.420 times.
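For readers who want to relate such values to odds, the sketch below shows the standard conversion for a logistic model: exponentiating a coefficient gives an odds ratio, which can then be applied to a baseline probability. It rests on an assumption that should be checked against Table 3, namely that 0.172 and 0.420 are coefficients on the log-odds scale; the function names are hypothetical, and the baseline of 0.314 is simply the overall live birth rate reported above, used here for illustration only.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient (log-odds scale)."""
    return math.exp(coefficient)


def updated_probability(baseline_probability, coefficient):
    """Probability after a one-unit increase in a predictor, holding everything else fixed."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * math.exp(coefficient)
    return new_odds / (1.0 + new_odds)


# ASSUMPTION: the values quoted in the text are log-odds coefficients.
for name, beta in [("AMH", 0.172), ("male factor", 0.420)]:
    print(f"{name}: odds ratio = {odds_ratio(beta):.2f}")

# Illustration with the overall live birth rate (31.4%) as the baseline.
print(f"{updated_probability(0.314, 0.172):.3f}")
```

Under that assumption, a one-unit increase in AMH would multiply the odds of live birth by about 1.19 and male factor by about 1.52; starting from a 31.4% baseline, one extra AMH unit would correspond to a probability of roughly 35%.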

Discussion
So far, providing an accurate prediction of the chances of achieving a live birth after IVF-ICSI treatments has not been an easy task 32 . In this study, a novel prediction model was built, including almost all clinical factors reported as important in the literature.
To our knowledge, this is the first model for IVF-ICSI live birth prediction that accounts for men's ethnicity. It also includes variables such as BMI and AFC, which were not included in many previous models 34 . This model includes all the characteristics that have been indicated as important in the literature to predict the chances of live birth in MAR 5,30,32 .
The Templeton model 30 considers the existence of previous IVF cycles and has been externally validated by Nelson and Lawlor 36 . Therefore, both these models can be applied before IVF is started and predict the success of IVF/ICSI treatment. During this study, it was not possible to obtain information about the number of previous cycles per couple, and so this factor was not included.
This model supports earlier findings in this scientific area, such as the importance of AMH and AFC in predicting live birth, as reported by other studies 35,40 , and indirectly corroborates the importance of the woman's age in predicting the result, as shown by Templeton et al. 30 . Nonetheless, it is crucial to note that an increase in the woman's age, when in interaction with infertility duration, significantly decreases the probability of a live birth, which is in accordance with Nelson's study 36 .
Concerning the main reason for treatment, male factor and disovulation (with AFC interaction) are the only causes that emerged in the final model. These causes raised the chances of success, which might be explained by the IVF-ICSI technique itself, since the problems of sperm and ovulation anomalies are overcome 18 . In other words, women with ovulation problems and men with sperm anomalies have higher chances of success with the IVF-ICSI technique because sperm and oocytes are medically collected and interact directly, although in an in vitro environment 18 .
Another notable finding is that an increase in the man's BMI (per se) or the woman's BMI (with AFC interaction) decreased the chances of live birth, which is in accordance with the scientific literature. Women who are overweight are known to have ovulatory problems and increased risks of miscarriage 10 , and obesity may adversely affect male reproduction by endocrine, thermal, genetic, and sexual mechanisms 10 . For instance, increased testicular temperature can result from prolonged periods of sitting in men with excessive fat deposited in the lower abdomen 58 . Also, obesity tends to increase estrogen levels and reduce testosterone levels in men 59 .
Concerning the man's ethnicity, we found that if the man is of mixed ethnicity, the odds of success are around 3.8. This finding must be interpreted cautiously, as the sample is composed of 92% Caucasian males, with only 2.6% of males being of mixed ethnicity. Up to now, there does not seem to be any biological plausibility for this finding; further research is required to validate this possible relationship between mixed ethnicity in men and increased success in IVF treatments.
A limitation of our model is that it is restricted to pre-treatment stages. Therefore, when using our model, the physician must explain to patients that their probability of success invariably changes during the cycle process. The probability produced by our model should be considered a baseline. The predictive ability of prediction models in medicine has been assessed by the AUC 54,55 . In general, the AUC for prediction models in reproductive medicine is rather low, ranging between 0.59 and 0.64 55 . Table 4 shows the AUC values of models predicting the chances of live birth for couples undergoing IVF-ICSI. Although McLernon et al. 37 have the highest value of the area under the receiver operating characteristic (AUROC) curve, that value decreased to 0.62 in Leijdekkers' validation 39 . The model developed in this study has an AUC of 0.700, which is the second-highest value, and so has a discriminatory ability comparable with these previous models.
Taking into account that Coppus et al.'s systematic review concluded that prediction models in reproductive medicine would be limited to an AUC value of 0.65 due to the relatively
The present model predicts the specific probability of a live birth based on easily acquirable couple characteristics before starting a treatment. It might help patients to understand the limitations of IVF/ICSI treatment in their particular case, and might help physicians to compare different treatment strategies. Furthermore, it might help institutions to predict the probability that treatment will need to be repeated, based on the specific characteristics of each couple.
In the near future, we intend to perform an external validation of the developed model with a new dataset from Centro de Infertilidade e Reprodução Medicamente Assistida (CIRMA). Following this, we also plan to perform an external geographical validation to check whether there is a geographical influence on the chances of live birth for couples undergoing IVF-ICSI.

Conclusions
Many couples with fertility problems ask themselves whether they should undergo IVF-ICSI treatment. Our novel model provides an estimate of their chances of live birth. This is the first model built with Portuguese data, and it takes into account the important variables described in the literature, such as AFC, AMH, and woman's age. This tool may help physicians to shape couples' expectations, giving them the opportunity to plan their treatments and to prepare for them both emotionally and financially. We are therefore developing a user-friendly interface to help physicians in their clinical practice.

Data availability
Underlying data
The data that support the findings of this study are available on request from the author José Metello (jose.metello@hgo.min-saude.pt). The data cannot be made publicly available, in accordance with paragraph d), number 2, Article 9 of the Regulation no. 2016/679 of the European Parliament and the Council of 27 April 2016, and because the sensitive data contain information that could compromise the privacy of research participants. Informed consent was provided by patients who participated in the study for the use of their data for scientific purposes only and to safeguard their anonymity.