Early embryo mortality in natural human reproduction: What the data say

How many human embryos die between fertilisation and birth under natural conditions? It is widely accepted that natural human embryo mortality is high, particularly during the first weeks after fertilisation, with total prenatal losses of 70% and higher frequently claimed. However, the first external sign of pregnancy occurs two weeks after fertilisation with a missed menstrual period, and establishing the fate of embryos before this is challenging. Calculations are additionally hampered by a lack of data on the efficiency of fertilisation under natural conditions. Four distinct sources are used to justify quantitative claims regarding embryo loss: (i) a hypothesis published by Roberts & Lowe in The Lancet is widely cited but has no practical quantitative value; (ii) life table analyses give consistent assessments of clinical pregnancy loss, but cannot illuminate losses at earlier stages of development; (iii) studies that measure human chorionic gonadotrophin (hCG) reveal losses in the second week of development and beyond, but not before; and (iv) the classic studies of Hertig and Rock offer the only direct insight into the fate of human embryos from fertilisation under natural conditions. Re-examination of Hertig’s data demonstrates that his estimates for fertilisation rate and early embryo loss are highly imprecise and casts doubt on the validity of his numerical analysis. A recent re-analysis of hCG study data concluded that approximately 40-60% of embryos may be lost between fertilisation and birth, although this will vary substantially between individual women. In conclusion, natural human embryo mortality is lower than often claimed and widely accepted. Estimates for total prenatal mortality of 70% or higher are exaggerated and not supported by the available data.


Introduction
Early human embryo mortality is a matter of considerable interest not only to reproductive biologists and fertility doctors, but also to philosophers 1,2 , theologians 3 and lawyers 4 . Most especially, becoming pregnant and having children is of overwhelming and personal importance to many women and their families. As with all biological processes, nothing works perfectly all the time 5 , and failure to conceive and pregnancy loss are common problems. However, among reputable scientific publications, including medical and reproductive biology textbooks, scientific reviews and primary research articles, reported mortality estimates are surprisingly varied and include: 30-70% 6 , >50% 7 and 75% 8 before and during implantation; >50% 9 , 73% 10 and 80% 11 before the 6th week; 75% before the 8th week 12 ; 70% in the first trimester 13 ; 40-50% in the first 20 weeks 14 ; and 46% 7 , 49% 15 , 50% 16-18 , >50% 19,20 , 53% 21 , 54% 22 , 60% 23 , >60% 24 , 63% 25,26 , 70% 27-31 , 50-75% 32 , 76% 10,33 , 78% 34 , 80-85% 35 , >85% 36 , and 90% 37 total loss from fertilisation to term. The variance in these estimates is striking and the scale of some is implausible. An intrauterine mortality of 90% implies a maximal live birth fecundability of 10%, and only then if all other stages of the reproductive process are 100% efficient. Observed human fecundability is low compared to other animals 21 , but at approximately 20-30% 9,38 it is still higher than implied by such a high embryo mortality rate. Such inconsistent estimates of pregnancy loss are not reassuring, nor do they provide a sound basis for either a quantitative understanding of natural human reproductive biology or an unbiased appraisal of artificial reproductive technologies. These divergent and excessive values therefore invite scrutiny of the evidence that supports them. In this article, I identify and re-evaluate published data that contribute to claims regarding natural human embryo mortality. Using the available data, I attempt to answer the question: "How many human embryos die between fertilisation and birth under natural conditions?"

A quantitative framework for embryo mortality
A quantitative framework has recently been proposed to facilitate the calculation and comparison of embryo mortalities from fecundability and pregnancy loss data 39 . Briefly, the model comprises conditional probabilities (π) of the following biological processes: (1) reproductive behaviours resulting in sperm-ovum co-localisation per cycle = π SOC ; (2) successful fertilisation given sperm-ovum co-localisation = π FERT ; (3) implantation of a fertilised ovum as indicated by increased levels of human chorionic gonadotrophin (hCG) = π HCG ; (4) progression of an implanted embryo to a clinically recognised pregnancy = π CLIN ; (5) survival of a clinical pregnancy to live birth = π LB . Fecundability (FEC) is the probability of reproductive success per cycle, but may take different values depending on the definition of success. The following four fecundabilities broadly follow Leridon (1977) 38 :
1. Total (all fertilisations): FEC TOT = π SOC × π FERT
2. Detectable (implantation): FEC HCG = π SOC × π FERT × π HCG
3. Apparent (clinical): FEC CLIN = π SOC × π FERT × π HCG × π CLIN
4. Effective (live birth): FEC LB = π SOC × π FERT × π HCG × π CLIN × π LB
Hence, the probability that a fertilised egg will perish prior to implantation is [1 − π HCG ], and prior to clinical recognition is [1 − (π HCG × π CLIN )]. In theory, embryonic mortality may be estimated at different stages; however, in practice, this depends on available data. Clinical and live birth fecundabilities are most easily quantified and most frequently reported. Total and detectable fecundabilities are less frequently reported, although of direct relevance.
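The framework lends itself to a short numerical sketch. The Python below is illustrative only: the class and function names are invented for this example, and the probability values are hypothetical placeholders chosen to show the arithmetic, not estimates taken from any study.

```python
from dataclasses import dataclass


@dataclass
class StageProbabilities:
    """Conditional probabilities of the five stages (names are illustrative)."""
    soc: float   # sperm-ovum co-localisation per cycle
    fert: float  # fertilisation given co-localisation
    hcg: float   # implantation (hCG rise) given fertilisation
    clin: float  # clinical recognition given implantation
    lb: float    # live birth given clinical pregnancy


def fecundabilities(p: StageProbabilities) -> dict:
    """Return the four fecundabilities as cumulative products of the stages."""
    fec_tot = p.soc * p.fert
    fec_hcg = fec_tot * p.hcg
    fec_clin = fec_hcg * p.clin
    fec_lb = fec_clin * p.lb
    return {"TOT": fec_tot, "HCG": fec_hcg, "CLIN": fec_clin, "LB": fec_lb}


def loss_before_implantation(p: StageProbabilities) -> float:
    """Probability that a fertilised egg perishes before implantation."""
    return 1 - p.hcg


def loss_before_clinical_recognition(p: StageProbabilities) -> float:
    """Probability that a fertilised egg perishes before clinical recognition."""
    return 1 - p.hcg * p.clin


# Hypothetical placeholder values, NOT estimates from the literature.
example = StageProbabilities(soc=0.8, fert=0.9, hcg=0.7, clin=0.75, lb=0.9)
fec = fecundabilities(example)
pre_implantation_loss = loss_before_implantation(example)
pre_clinical_loss = loss_before_clinical_recognition(example)
print(fec, pre_implantation_loss, pre_clinical_loss)
```

Because each fecundability is a product of all the preceding stage probabilities, a fecundability measured at one stage only constrains embryo mortality once values for the earlier stages have been assumed.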

BOX 1. Glossary of Key Reproductive Terms
1. Ovum: A female gamete, also known as an egg or oocyte. Ova (pl) are produced by the ovaries of the woman.
2. Spermatozoon: A male gamete. Sperm (or spermatozoa, pl) are produced in the testes of the man.
3. Ovulation: The release of an ovum from the ovary. In humans, ovulation usually involves the release of a single egg in each menstrual cycle.
4. Fallopian tube: A narrow tubular extension of the uterus, which opens out next to the ovary. It is also called the oviduct. Following ovulation, the ovum passes into the opening of the fallopian tube and travels towards the uterus.
5. Coitus: An act of sexual intercourse between a man and woman, usually resulting in the deposition of sperm within the reproductive tract of the woman.
6. Menstrual cycle: An interval of approximately 28 days, which commences with the onset of menstruation. Ovulation occurs mid-way through a menstrual cycle, approximately 14 days before the onset of the next cycle.
7. Amenorrhoea: The absence of menstruation. A missed menstrual period is often the first observable sign that pregnancy has commenced, although there are many other causes.
8. Fertile period: The time in a woman's menstrual cycle during which coitus may result in pregnancy. This period probably varies considerably between women. Coitus up to 6 days prior to and 1 day after ovulation may result in pregnancy although the most fertile days are the day of ovulation and the 2 days beforehand 40 .

Amendments from Version 1
Version 2 has a new figure and glossary to assist readers and help them to follow the timings of key reproductive events.
Sections of text in the Introduction and Discussion have been re-ordered.
In-text dates have been provided to aid understanding of the chronology of studies.
A brief commentary on a study into implantation by Brosens et al. (2014) has been included.
Methods for the Roberts & Lowe simulation study, the calculation of bootstrap confidence intervals, and the calculation of estimated loss from implantation to birth have been clarified.
Comments have been included to note value in Roberts & Lowe's analysis, to affirm the significance of Leridon's critique, and to better contextualise the subjects from Hertig's study.
Some additional studies have been referenced.

To aid understanding of the reproductive processes described in this article definitions of key terms have been provided in Box 1, and Figure 1 illustrates the timelines for key biological events associated with fecund and non-fecund cycles.

What the data say
Publications containing data relevant to early human embryo mortality were identified primarily by manually tracing citations found in articles, reviews and textbooks. A PubMed search ("early pregnancy loss" [All Fields]) identified some, but not all relevant studies. Certain studies were not conducted to address the specific question, and others are in books or publications that are not adequately indexed. If not entirely complete, nevertheless the data presented form a substantial proportion of relevant, available scientific data on natural early human embryo mortality.
Studies that contribute analysis and data relevant to the quantification of natural human embryo mortality fall into the following four categories and will be considered in turn.

1. A speculative hypothesis published in The Lancet.
2. Life tables of intrauterine mortality.
3. Studies of early pregnancy by biochemical detection of hCG.
4. Anatomical studies of Dr Arthur Hertig and Dr John Rock.

Where have all the conceptions gone?
In 1975, a short hypothesis published in The Lancet entitled "Where Have All The Conceptions Gone?" concluded that 78% of all conceptions were lost before birth 34 . It has been widely cited by both scientists 9,25,27,28,41 and non-scientists 42,43 alike. Conceptions among married women aged 20-29 in England and Wales in 1971 were estimated and compared to infants born in the same period. This analysis (Table 1) combines reliable values, e.g., census data, with simple arithmetical calculations. However, speculative values are also necessary to perform the calculations. Three are biological: (1) the fertilisation rate following unprotected coitus during the fertile period, estimated as 50% and supported by reference to Hertig 44 (although his estimate was 84% 5 ); (2) the length of a menstrual cycle (28 days); and (3) the duration of the fertile period (2 days). These latter values are plausible, but also variable (Figure 1). No justification is provided for three behavioural variables: (1) coital frequency, estimated at twice per week; (2) the proportion of unprotected coital acts, estimated at 25%; and (3) either a random or regular distribution of coital acts during menstrual cycles such that 1/14 of all coital acts fall within a fertile period.
The validity of Roberts & Lowe's conclusion depends largely on the accuracy and precision of these speculative values. The following two simple analyses illustrate the sensitivity of their conclusion to the speculative values.

BOX 1. Glossary of Key Reproductive Terms (continued)
9. Fertilisation: The fusion of a spermatozoon and an ovum, which usually takes place in the fallopian tube up to 24 hours after ovulation.
10. Conception: A biologically imprecise term meaning either 'the coming into existence of a new human being' or 'the beginning of a pregnancy'. It is often used synonymously with fertilisation but may also refer to implantation.
11. Embryo: A newly fertilised ovum until the eighth week of development.
14. Implantation: The biological process that begins when a blastocyst attaches to the lining of the uterus approximately 6-7 days after fertilisation. The embryo subsequently becomes embedded within the uterine lining.
15. Human chorionic gonadotrophin (hCG): A protein produced by the embryo. It signals to the mother that an embryo is present and prevents menstruation and the loss of the embryo. Elevated levels of hCG can be detected in the serum or urine of a woman from around the time of implantation.
16. Fecundability: A measure of reproductive potential. It is the probability of becoming pregnant in a single menstrual cycle. Fecundity is often used to mean the probability of achieving a live birth in a single cycle. A fecund cycle is one in which fertilisation occurs.
17. Pregnancy: The condition of a woman harbouring an embryo, fetus or unborn child. When pregnancy begins is a matter of some confusion 7 (Figure 1). Pregnancy may be considered to commence with fertilisation and lasts approximately 38 weeks. Clinicians often time the onset of pregnancy from day 1 of the last menstrual cycle, 2 weeks before fertilisation, and refer to subsequent time as a period of gestation. On this account, pregnancy or gestation lasts approximately 40 weeks. Some scientists and legal judgements define pregnancy as beginning with implantation, one week after fertilisation. This definition is of particular utility in the context of IVF treatment where evidence of implantation is the earliest sign that a transferred embryo has developed normally and that fertility treatment has, up to that point, been successful. For some women, the start of a pregnancy may be noted with the first missed menstrual period, approximately 2 weeks after fertilisation, or a positive pregnancy test.
18. Miscarriage: The premature termination of a pregnancy leading to loss of a developing embryo or fetus. Embryo loss may occur before a woman knows she is pregnant. Miscarriage late in pregnancy is often called abortion, with a cut-off of approximately 20 weeks gestation used to distinguish between miscarriage and abortion.
19. Early Pregnancy Loss: This usually refers to the loss of an embryo very early in pregnancy, even before a clinical diagnosis is made, when a woman would not be aware of the pregnancy. Such losses are also called occult, because they are hidden, or biochemical, because they can only be identified by detecting hCG. Pregnancy loss shortly after a clinical diagnosis may also be described as early.

1. When four of the speculative values are reduced by 25% (e.g., coital frequency reduced to 1.5/week) and cycle length increased by 10% (from 28 days to 31 days 38 ), the estimate for embryo loss drops to 22%. The opposite operation (e.g., coital frequency increased to 2.5/week) results in an estimate of 92% (Table 1). Embryo loss of 22% is barely sufficient to account for observed clinical losses, and 92% indicates a maximum FEC LB of 8%. Neither scenario is biologically plausible.

2. A non-zero variance was applied to each speculative value reflecting their uncertain nature. Using the random number generator in Microsoft® Excel (Office 2010), simulated values were obtained by random sampling from normal distributions with means equal to Roberts & Lowe's speculative values and coefficients of variation equal to 20%. For simplicity, it was assumed that there was no covariance between the different speculative values. Table 1 shows the expected range within which 95% of these simulated values fall (e.g., coital frequency is 1.2-2.8/week). For each simulated record, a new estimate of embryo loss was calculated, and from 10,000 of these, the mean, median and 2.5th and 97.5th percentiles of embryo loss were determined. This step was repeated 1,000 times: the mean value of the simulated means was 73.3% and of the simulated medians was 76.5%. The mean values of the 2.5th and 97.5th percentile boundaries for embryo loss were 37% and 90% (Table 1). Separately, the same simulation was performed using NONMEM® 7.3.0 (Icon PLC, Dublin, Eire) to generate 100,000 data records which are represented in Figure 2. The code and simulated data values are in Dataset 1.
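Both sensitivity analyses can be approximated with a few lines of Python. This is a rough sketch under stated assumptions, not the Excel or NONMEM code from Dataset 1. The assumptions are: a year of 52 weeks; five varied inputs (coital frequency, unprotected fraction, fertile period, cycle length, fertilisation rate); a births-per-woman ratio back-calculated from the published 78% figure rather than taken from the census data in Table 1; a 25-day cycle in the "opposite" scenario; and re-drawing of any non-positive sample. Outputs should therefore be close to, but not identical with, the reported values.

```python
import random
import statistics

# Speculative baseline values as described in the text.
BASELINE = {
    "coital_freq_per_week": 2.0,
    "unprotected_fraction": 0.25,
    "fertile_days": 2.0,
    "cycle_days": 28.0,
    "fertilisation_rate": 0.5,
}
WEEKS_PER_YEAR = 52  # assumption


def fertilisations_per_woman_year(v):
    """Expected fertilisations per woman per year for a set of input values."""
    acts = v["coital_freq_per_week"] * WEEKS_PER_YEAR
    fertile_acts = acts * v["unprotected_fraction"] * v["fertile_days"] / v["cycle_days"]
    return fertile_acts * v["fertilisation_rate"]


# Assumption: births per woman-year chosen so the baseline reproduces 78% loss.
BIRTHS_PER_WOMAN_YEAR = (1 - 0.78) * fertilisations_per_woman_year(BASELINE)


def embryo_loss(v):
    """Fraction of fertilisations not accounted for by births."""
    return 1 - BIRTHS_PER_WOMAN_YEAR / fertilisations_per_woman_year(v)


def scaled(factor, cycle_days):
    """Scale the four non-cycle inputs by `factor`; set cycle length directly."""
    v = {k: val * factor for k, val in BASELINE.items()}
    v["cycle_days"] = cycle_days
    return v


# Analysis 1: deterministic shifts of the inputs.
low_loss = embryo_loss(scaled(0.75, 31.0))
high_loss = embryo_loss(scaled(1.25, 25.0))  # 25 days is an assumed rounding


def positive_gauss(rng, mu, sd):
    """Draw from a normal distribution, re-drawing non-positive values."""
    x = rng.gauss(mu, sd)
    while x <= 0:
        x = rng.gauss(mu, sd)
    return x


def simulate(n=10_000, cv=0.20, seed=1):
    """Analysis 2: Monte Carlo with independent normal inputs (CV = 20%)."""
    rng = random.Random(seed)
    losses = sorted(
        embryo_loss({k: positive_gauss(rng, mu, cv * mu) for k, mu in BASELINE.items()})
        for _ in range(n)
    )
    return {
        "mean": statistics.fmean(losses),
        "median": statistics.median(losses),
        "p2.5": losses[int(0.025 * n)],
        "p97.5": losses[int(0.975 * n) - 1],
    }


summary = simulate()
print(f"low {low_loss:.2f}, high {high_loss:.2f}", summary)
```

Changing `cv` or the baseline values shows how readily the output tracks the subjectively chosen inputs.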

Figure 1. Schematic representation of timelines and key events in (A) non-fecund and (B) fecund menstrual cycles.
Menstrual cycle lengths vary considerably and most fall within a range of 20 to 40 days 45 . A typical menstrual cycle is usually represented as lasting for 28 days, as shown here. Differences in cycle length are mostly due to variations in the duration of the follicular phase, the time from the onset of menstruation to ovulation. The time from ovulation to the onset of the next cycle, the luteal phase, is more consistently 14 days. Therefore, in the typical 28 day cycle, ovulation occurs midway at around 14 days. The fertile period (shown in light blue) is the time during which coitus may result in a pregnancy. The probability of pregnancy is highest when coitus occurs in the two days leading up to ovulation 40 . In a normal fecund cycle, fertilisation occurs within hours of ovulation in the fallopian tube, after which point an embryo is present and development begins. Embryonic development may fail at any stage from fertilisation through to birth. 6-7 days after fertilisation, the embryo begins to implant in the uterine wall at which stage human chorionic gonadotrophin (hCG) produced by the embryo becomes detectable in urine or serum samples. The onset and duration of pregnancy may be defined in various ways: gestational pregnancy (typically used in clinical practice) is timed from the first day of the last menstrual period; developmental pregnancy begins with fertilisation; in an IVF treatment cycle, although an embryo is present in the uterus immediately following embryo transfer, pregnancy is not considered to be established until there is evidence of implantation, usually provided by elevated hCG levels. The earliest point at which a woman could observe that she is pregnant is approximately 14 days after ovulation/fertilisation with the first missed period. 
The stage at which pregnancies are clinically confirmed depends on study design and clinical practice, and may be at gestational day 28 (i.e., first missed menstrual period, Zinaman (1996) […]

The sole purpose of these simple sensitivity analyses is to illustrate that modest adjustments to Roberts & Lowe's original speculative values can result in any biologically plausible estimate for embryo loss. Whilst their analysis is useful for highlighting factors that influence observed fecundity, the output from the calculation remains substantially dependent on the subjectively selected input. Consequently, their analysis has no practical quantitative value.
Other sources of bias in their model include the failure to account for intentionally terminated pregnancies and the reduced fecundability of already pregnant women and nursing mothers. Despite this, it was described as "persuasive" 50 and it has been claimed that "it is still difficult to better the original calculations of Roberts and Lowe (1975)" 27 . By contrast, others have noted that "their calculations can be criticized" 9 and are "tenuous" 51 . Considering its quantitative limitations, it has been cited surprisingly often 13,28,52,53 .

Life tables of intrauterine mortality
Constructing a life table of intrauterine mortality is challenging since embryonic death may occur even before the presence of an embryo is recognised. Nevertheless, in 1977, the distinguished demographer Henri Leridon published an impressive critique and analysis of pregnancy loss data, and a complete life table of intrauterine mortality 26 . Leridon highlighted the consequences of inappropriate analysis and the quantitative biases produced by […] 1962 and 1970 47,54-58 . These data are summarised in Figure 3 and suggest that 12-24% of embryos alive at 4 weeks' gestation (i.e., approx. 2 weeks post-fertilisation, see Figure 1) will perish before birth.
All recorded pregnancies in the Kauai study were categorised by date of enrolment in four-week intervals, beginning with 4-7 weeks' gestation. This time-staggered approach enabled risk of miscarriage to be associated with stage of gestation. However, despite considerable efforts, only 19% of the 3,197 recorded Kauai pregnancies were enrolled between 4-7 weeks' gestation, thereby reducing the precision of pregnancy loss estimates for this earliest of time intervals. Although pregnancies were grouped in four-week periods, Leridon suggested that early mortality may change week by week, resulting in underestimation of pregnancy loss. He re-allocated the 592 study entries and 32 pregnancy losses for weeks 4-7 (Table 2), generating an overall probability of pregnancy loss during this period of 15.0%, higher than the 10.8% originally reported 47 . Leridon's own description of this interpolation as "risky" can be illustrated by adjusting his re-allocation 26 . Transferring just two of the pregnancy losses out of or into the first week results in estimates of the 4-7 week pregnancy loss of 10.9% and 19.1% respectively (Table 2). The validity of adjusting Leridon's re-allocation may be questioned. However, pregnancy loss in weeks 4-5 of the Kauai Study would manifest as a menstrual period delayed by up to one week. This is far from being a robust pregnancy diagnosis and in a different study 57 , exclusion of pregnancy losses reported within one week of study entry resulted in substantially different loss probabilities (Figure 3), suggesting a confounding correlation between entry and loss 26 . Nevertheless, the re-allocation does reinforce a concern highlighted by Leridon, namely the uncertainty that affects the first probability. Clearly, these estimates of early loss should be treated with caution.
A more fundamental problem is that these data offer no insight into the fate of embryos prior to the earliest possible point of clinical pregnancy detection. Leridon completed his life table with values from Hertig's 1967 analysis 5 . He concluded that among 100 ova exposed to the risk of fertilisation, 16 are not fertilised, 15 die in week one (between fertilisation and implantation), and 27 die in week two (between implantation and the time of the first missed period). After two weeks his life table follows the Kauai probabilities closely ending with 31 live births. Leridon's table therefore indicates an embryo mortality of 50% (42/84) within the first two weeks after fertilisation and a total mortality of 63% (53/84) from fertilisation to birth.
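The arithmetic behind these two percentages can be checked directly from the counts quoted above. The snippet below is a minimal sketch using only those counts; the variable names are illustrative.

```python
# Counts per 100 ova exposed to the risk of fertilisation, as quoted in the text.
ova = 100
not_fertilised = 16
deaths_week_one = 15   # between fertilisation and implantation
deaths_week_two = 27   # between implantation and the first missed period
live_births = 31

fertilised = ova - not_fertilised                        # embryos at risk
early_deaths = deaths_week_one + deaths_week_two         # first two weeks
alive_at_two_weeks = fertilised - early_deaths
total_deaths = fertilised - live_births                  # fertilisation to birth

early_mortality = early_deaths / fertilised
total_mortality = total_deaths / fertilised

print(f"fertilised: {fertilised}")
print(f"mortality in first two weeks: {early_deaths}/{fertilised} = {early_mortality:.0%}")
print(f"total mortality: {total_deaths}/{fertilised} = {total_mortality:.0%}")
```

Note that the denominator is the 84 fertilised ova, not the 100 ova exposed to the risk of fertilisation; using the wrong denominator is one way in which per-cycle and per-embryo figures become confused.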
Leridon's account of intrauterine mortality has been widely cited. However, its accuracy depends entirely on the quality and interpretation of the data from Hertig 5 and French & Bierman 47 . French & Bierman's approach probably resulted in an overestimate of total pregnancy loss and is certainly imprecise in its estimate of embryo loss in the four weeks following the first missed menstrual period. The reliability of Hertig's estimates of embryo loss in the two weeks following fertilisation is considered below.

Biochemical detection of pregnancy using hCG
Quantification of pregnancy loss requires pregnancy diagnosis. The earliest outward sign of pregnancy is a missed menstrual period, approximately 2 weeks after fertilisation, although amenorrhoea in women of reproductive age is not exclusively associated with fertilisation 59,60 . Several potentially diagnostic pregnancy-associated proteins have been identified 61 of which only one, Early Pregnancy Factor (EPF) 62 , has been claimed to be produced by embryos within one day of fertilisation. However, there is doubt about the utility of EPF for diagnosing early pregnancy 63 and little has been published on it in the past five years.
Modern pregnancy tests detect human chorionic gonadotrophin (hCG), a highly glycosylated 37 kDa protein hormone produced by embryonic trophoblast cells 64 . Elevation of hCG around 6-7 days after ovulation is associated with embryo implantation 27,28,65 (Figure 1). Early assays for the detection of hCG were probably confounded by antibody cross-reactivity with luteinizing hormone 66 but modern tests are more specific and a positive result is a reliable indicator of early pregnancy. Highly sensitive assays have revealed low levels of hCG in non-pregnant women and healthy men 67 ; hence, quantitative criteria and appropriate design are required to distinguish between non-pregnant women and those harbouring early embryos 65,68,69 . Table 3 summarises findings from thirteen studies that used hCG to identify so-called early, occult or biochemical pregnancy loss, i.e., pregnancy loss between the initiation of implantation and clinical recognition 46,48,49,70-79 . (Ellish et al. (1996) 69 is not included since the hCG assay was positive for only 72.5% of clinical pregnancies. By contrast, among the thirteen studies in Table 3 only one clinically-recognised pregnancy was reported undetected by hCG testing 70 . Nevertheless, the estimates by Ellish et al. of early pregnancy loss (17.4%) and clinical loss (13.7%) are comparable to those of the other studies.) Each study measured urinary hCG levels except two, which measured hCG in serum 72,73 . Notwithstanding design and subject differences, estimates for clinical pregnancy loss, ranging from 8.3% to 21.2% (Figure 4), are similar to previous estimates (Figure 3). Estimates for early/occult pregnancy loss ranged from 0% to 58.3% in studies 70-74 prior to Wilcox (1988) 49 . This high variance was probably due to reduced specificity and sensitivity of the hCG assays and sub-optimal study design 16,61,80-83 . Studies from Wilcox (1988) 49 onwards have […]

Table 3. Summary data from thirteen studies using hCG detection to diagnose pregnancy and identify early pregnancy loss. Raw FEC HCG is the ratio of hCG pregnancies detected to the number of cycles monitored in each study. Where available, mean (SD) ages of the participating women are taken directly from the published study. In some cases mean and SD (indicated by *) or SD (indicated by †) were estimated based on published demographic characteristics. § These data relate to the whole study cohort (n=124) which included known sub-fertile women, and not just to the 74 apparently fertile women. ‡ Mean value from Wilcox et al. (2001) 84 . ¶ Some studies only provide data up to late pregnancy (e.g., up to 28 weeks) rather than to term. ND = no data. ¤ Wilcox subsequently reported an additional hCG pregnancy which had not been detected and reported in the 1988 paper, making a total of 199 hCG pregnancies and 44 pre-clinical losses in the study group 85 . # Mumford reported data from aspirin- and placebo-treated subjects who had at least one prior miscarriage. Summary data from both treatment groups are included as there was no effect of aspirin 79 .

[…] (1988), pregnancy loss from first detection of hCG through to live birth is approximately one third (Table 3). This is consistent with another recent study which found that 98 out of 301 (32.6%) singleton pregnancies diagnosed by an early positive hCG test and followed up to either birth or miscarriage were lost 89 .
The much cited Wilcox (1988) study 49 is the earliest of several large well-designed studies that made use of a specific and sensitive hCG assay and led to numerous further publications 40,84-87,90-92 . Two other studies (Zinaman (1996) 46 and Wang (2003) 48 ) were similar in purpose, design and execution. These studies provide some of the best available data to calculate pregnancy loss between implantation and birth 39 . In each study, women intending to become pregnant and with no known fertility problems were recruited and hCG levels monitored cycle by cycle in daily urine samples until they became pregnant. Most women were followed through to late pregnancy or birth. Although these studies provide evidence regarding the outcome of both clinical and hCG pregnancies, determining the fate of embryos prior to implantation is more difficult. To relate the study results to pre-implantation embryo loss, it is necessary to determine fecundability. In each study FEC CLIN declined in successive cycles as the proportion of sub-fertile women increased. Why do a proportion of menstrual cycles in women attempting to conceive fail to show any increase in hCG? Since FEC HCG = π SOC × π FERT × π HCG , there can be various causes for this failure including mistimed coitus, anovulation, failure of fertilisation or preimplantation embryo death. Although FEC HCG puts limits on the extent of pre-implantation embryo loss, uncertainty in the estimates of π SOC , π FERT and π HCG translates into uncertainty in estimates of pre-implantation embryo mortality. In the Wang study, for normally fertile women, FEC HCG = 46.2%; hence, the absolute maximum value for pre-implantation embryo loss must be 53.8%, although only if π SOC = π FERT = 1, conditions both extreme and unlikely 39 . Studies of the relationship between coital frequency and conception indicate that fecundability is greater with daily compared to alternate day intercourse 39,93,94 . 
Hence, when coital frequency is less than once per day a proportion of reproductive failure will be due to mistimed coitus, i.e., π SOC < 1. In the Wilcox study, coitus occurred on only 40% of the six pre-ovulatory days 39,40 , and in the Zinaman study participants were advised that alternate day intercourse was optimal 46 . Based on the difference in fecundability between daily and alternate day intercourse as modelled by Schwartz 94 , a value of π SOC = 0.80 was used to calculate pre-implantation embryo mortality 39 . However, this is a speculative estimate, and in reality the value may be higher, or lower.
A further critical missing piece of the equation is knowledge of the efficiencies of fertilisation and implantation under normal, natural, propitious circumstances. Assuming that either of these processes may be up to 90% efficient, and based on data from the three hCG studies 46,48,49 , a plausible range for pre-implantation embryo loss in normally fertile women is 10-40% and for loss from fertilisation to birth, 40-60% 39 . Even with these wide ranges of mathematically possible outcomes, it is clear that estimates for total embryonic loss of 90% 37 , 85% 36 , 83% 2 , 80-85% 11,35 , 78% 34 , 76% 10,33 and 70% 27-31 are excessive.
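The dependence of pre-implantation loss on the assumed values of π SOC and π FERT can be made explicit by inverting the expression for FEC HCG. The sketch below uses the Wang FEC HCG of 46.2%, the speculative π SOC of 0.80, a fertilisation efficiency of 90%, and a survival of roughly two thirds from hCG detection to birth, all as quoted in the text; the function name and the choice to treat these as point values are assumptions made for illustration.

```python
def implied_pi_hcg(fec_hcg, pi_soc, pi_fert):
    """Solve FEC_HCG = pi_SOC * pi_FERT * pi_HCG for pi_HCG."""
    pi_hcg = fec_hcg / (pi_soc * pi_fert)
    if pi_hcg > 1:
        raise ValueError("assumed pi_SOC and pi_FERT are too low for this FEC_HCG")
    return pi_hcg


FEC_HCG_WANG = 0.462             # normally fertile women, Wang study
SURVIVAL_HCG_TO_BIRTH = 2 / 3    # "approximately one third" lost after hCG detection

# Upper bound: every cycle is fertilised, so every failure is an embryo death.
max_pre_implantation_loss = 1 - implied_pi_hcg(FEC_HCG_WANG, pi_soc=1.0, pi_fert=1.0)

# Illustrative scenario with the speculative values discussed in the text.
pi_hcg = implied_pi_hcg(FEC_HCG_WANG, pi_soc=0.80, pi_fert=0.90)
pre_implantation_loss = 1 - pi_hcg
loss_fertilisation_to_birth = 1 - pi_hcg * SURVIVAL_HCG_TO_BIRTH

print(f"maximum pre-implantation loss: {max_pre_implantation_loss:.1%}")
print(f"scenario pre-implantation loss: {pre_implantation_loss:.1%}")
print(f"scenario loss from fertilisation to birth: {loss_fertilisation_to_birth:.1%}")
```

Lower assumed values of π SOC or π FERT attribute more of the failed cycles to mistimed coitus or failed fertilisation and correspondingly less to embryo death, which is why the plausible ranges are wide.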
In 1990, Charles Boklage concluded that "at least 73% of natural single conceptions have no real chance of surviving 6 weeks of gestation" 10,95 . Live birth fecundability was estimated as "not over 15%", substantially lower than Leridon's 31%. Despite this discrepancy, Boklage's conclusions were derived from a review of data including several hCG studies 49,65,70-73 and Leridon's analysis 26 . He derived a model describing the survival probability of human embryos as the sum of two exponential functions of t, the time in days post-fertilisation. This model is the source of the 73% in the conclusion.
There are, however, serious problems with this analysis. Firstly, data presented as embryo survival probabilities at different times post-fertilization 49,65,70,71,73 are fecundabilities, i.e., successes per cycle, not per fertilised embryo. Secondly, for reasons that are unclear, data from Whittaker 72 and Leridon 26 were excluded from the modelling analysis and the data from an earlier Wilcox report 65 were included twice since this preliminary data had been incorporated into the later report 49 . Thirdly, the modelled data were normalised to a survival probability of 0.287 at 21 days post-fertilization. This value was derived from data published by Barrett & Marshall (1969) on the relationship between coital frequency and conception 93 . Barrett & Marshall had concluded that coitus during a single day alone, 2 days before ovulation, resulted in a conception probability of 0.30. Boklage's value of 0.287 is his calculated equivalent. However, conception in this study was "identified by the absence of menstruation, after ovulation" 93 . Hence, 0.30 (and similarly, 0.287) is a clinical fecundability and not a measure of embryo survival. Furthermore, 0.30 is a non-maximal fecundability, since it was an estimate based on coitus on a single day (2 days before ovulation) within the cycle. Barrett & Marshall clearly report that as coital frequency increased so did the fecundability, up to a maximum of 0.68 associated with daily coitus 93 .
Boklage's analysis can only make biological sense if it is assumed that every cycle in the Barrett & Marshall study resulted in fertilisation. Under these circumstances, failure to detect conception in 71.3% (1 − 0.287) of cycles would be due entirely to embryo mortality. However, this is highly implausible and explicitly contradicted by the higher estimate of fecundability reported 93 . Boklage's implicit assumption also contradicts his further conclusion that "only 60-70% of all oocytes are successfully fertilized given optimum timing of natural insemination" 10 . The vertical normalisation of the hCG study data to a value of 0.287 at 21 days is the principal determinant of the parameters that define the two-exponential model. Any change in this value would commensurately alter the balance between the two implied sub-populations of embryos.
Since it is evident that the value of 0.287 is neither an embryo survival rate nor even a maximal fecundability, it follows that quantitative conclusions from this analysis in relation to the survival of naturally conceived human embryos are of doubtful validity.
However, Boklage was right about two things. Firstly, the difficulty of calculating pre-clinical losses: as he put it, "In the place of the necessary numbers for the first few weeks of pregnancy we find editorially acceptable estimates which, while perhaps not far wrong, are difficult to defend with any precision". Secondly, the source of some of the only directly relevant data (even though he excluded it from his modelling analysis), namely, "Hertig's sample is, and will probably remain, unique".

The anatomical studies of Dr Arthur Hertig
At the start of the 1930s, no-one had ever seen a newly fertilised human embryo. It was barely 60 years since Oscar Hertwig had first observed fertilisation in sea urchins 96 , and just 40 years before the birth in 1978 of Louise Brown, the first test tube baby 97,98 . In Boston, Dr Arthur Hertig and Dr John Rock's search for early human embryos generated an irreplaceable collection, and set an influential benchmark for the scale of early human embryo mortality.
The so-called "Boston Egg Hunt" began in 1938 99 . Hertig and Rock recruited 210 married women of proven fertility who presented for gynaecological surgery 44 . (In most of their publications, the number is given as 210 5,100,101 although 211 subjects are mentioned elsewhere 44 .) Of these, 107 were considered optimal for finding an embryo because they apparently: (i) demonstrated ovulation; (ii) had at least one recorded coital date within 24 hours before or after the estimated time of ovulation; (iii) lacked pathologic conditions that would interfere with conception. Hertig examined the excised uteri and fallopian tubes, and over fifteen years found 34 human embryos aged up to 17 days 5,44,100-107 . Of these, 24 were normal and 10 abnormal 5,100 . (There is some confusion over this: in three publications 44,101,107 , 21 embryos are described as normal and 13 as abnormal. It appears that the three alternatively described embryos (C-8299; C-8000; C-8290) were originally defined as abnormal based on their position or depth of implantation 44 .) Table 4 provides information about the 34 embryos found in these 107 women. Although the study was primarily intended to find and describe early human embryos, Hertig subsequently used the data to derive estimates of reproductive efficiency including early embryo wastage 5,100 .
Table 4 notes: The embryos were collected from 107 out of 210 women. *In Hertig's figure, day 28 of the ovulatory cycle is identified with day 1 of the next cycle and is the day of the presumed missed period in cases where pregnancy had commenced. The 36 cases that provide the evidential foundation for his numerical analysis are shown in bold.
Hertig's analysis 5,100 relies heavily on the 15 normal and 6 abnormal implanted embryos found in 36 women from cycle day 25 onwards. He assumed the 6 abnormal embryos would perish around the time of the first period, concluding that fertility (% pregnant) at this stage = 42% (15/36). Of the 8 pre-implantation embryos identified (7 in the uterus and 1 in the fallopian tubes), 4 were abnormal. Hertig assumed the 4 normal embryos would implant successfully but that some of the abnormal ones would not, such that the proportion of normal embryos would increase from 50% (4/8) before implantation to 71% (15/21) after implantation as observed. Hence, among the 36 post-cycle day 25 cases, in addition to the 15 normal embryos, there must have been 15 abnormal pre-implantation embryos of which 60% (9/15) failed to implant and were not observed, and 40% (6/15) did implant and were observed, although these 6 would have perished shortly afterwards. This left 6/36 eggs that must have been unfertilised. The ratio of 'unfertilised': 'fertilised abnormal': 'fertilised normal' was therefore 6:15:15, matching the 16% infertility (no fertilisation), 42% sterility (post-fertilisation death) and 42% fertility (reproductive success) reported in Figure 9 of Hertig's 1967 article, "The Overall Problem in Man" 5 . This is the source of Hertig's 84% fertilisation rate and 50% embryo loss before and during implantation, and is reproduced in Leridon's life table 26 as 84/100 eggs surviving at time zero (ovulation and fertilisation) and 42 surviving to 2 weeks (time of first missed period).
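The arithmetic behind these proportions can be laid out step by step. The following is a sketch only: the symbols N, n, m, a, A and U are labels introduced here for illustration and do not appear in Hertig's publications, and the numbers are those quoted in the paragraph above.

```latex
% N = 36 cases from cycle day 25 onwards; n = 15 normal and m = 6 abnormal implanted embryos;
% a = 4/8, the abnormal fraction among the pre-implantation embryos.
\begin{align*}
A &= n\,\frac{a}{1-a} = 15 \times \frac{4/8}{4/8} = 15
  && \text{abnormal embryos assumed present before implantation} \\
A - m &= 15 - 6 = 9
  && \text{abnormal embryos assumed to fail to implant} \\
U &= N - n - A = 36 - 15 - 15 = 6
  && \text{eggs assumed unfertilised} \\
\frac{n + A}{N} &= \frac{30}{36} \approx 83\%
  && \text{fertilisation rate (reported as } 100\% - 16\% = 84\%) \\
\frac{A - m}{n + A} &= \frac{9}{30} = 30\%
  && \text{loss before implantation} \\
\frac{A}{n + A} &= \frac{15}{30} = 50\%
  && \text{loss by the time of the first missed period}
\end{align*}
```

Every derived quantity depends on the single ratio a/(1 - a), which is estimated from only 8 embryos.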

Hertig provides almost the entire body of evidence used to quantify natural human embryo loss in the first week post-fertilisation. Most claims regarding early human embryo mortality find their source here. Before considering how reliable the figures are, it is worth repeating Hertig's own caveat, namely, the lack of data on the efficiency of natural fertilisation 5 . All estimates of embryo mortality from fertilisation onwards are subject to commensurate inaccuracy in the absence of reliable fertilisation probabilities (i.e., π FERT ), which are "surprisingly difficult to estimate" 21 .
There are several problems with Hertig's analysis. As noted by others, the observations are cross-sectional, but the inferences are longitudinal 108 . Hertig detected 21 embryos from 36 cases (58.3%) from cycle day 25 onwards. If this detection rate were representative, then detection rates before day 25 should, on average, be the same or higher; however, they are all lower, and substantially so (Table 4). Hertig suggested that this was due to the technical difficulty of finding newly fertilised embryos. However, the detection rate for cycle days 18-19 was good (46.7%), whereas for embryos one or two days younger, which would not have been much smaller, it was poor (11.1%). An alternative explanation for this discrepancy might simply be random variation. Furthermore, from cycle day 25 onwards, embryos would probably have produced hCG and therefore FEC HCG would have been at least 58%. This is approximately double the equivalent values observed in more recent and robust hCG studies (Table 3), further suggesting that this subset of the data is not representative.
Despite having proven fertility, these women presented for gynaecological surgery which, according to Hertig, was "medically essential" 99 . This suggests that the women may have had suboptimal reproductive function, although the effect of this on the quantitative outcome of the study is difficult to gauge. Furthermore, Hertig's reproductively 'optimal' coital pattern does not include 2 days pre-ovulation and does include one day post-ovulation, conditions which are known not to maximise fertilisation 39,40,93,94,109 . Hence, detection rates before cycle day 25 may be more representative than those after. Given the numerical discrepancies, they cannot both be.
Hertig does not provide error estimates with his conclusions. In order to estimate the precision of his derived proportions, a bootstrap analysis was performed as follows: Hertig's 107 optimal cases were categorised according to stage of cycle (Category 1 = cycle days 16-19 (n=24); Category 2 = cycle days 20-24 (n=47); Category 3 = cycle days ≥25 (n=36)), and presence and type of embryos (Category 0 = no embryo (n=73); Category 1 = normal embryo (n=24); Category 3 = abnormal embryo (n=10)). Five hundred pseudo-datasets each containing 107 cases were generated using a balanced random re-sampling method in Microsoft Excel® . The original and pseudo datasets are in Dataset 4. The congruence between these confidence intervals and the point estimates provides some reassurance that the bootstrap procedure worked effectively. Estimates of parameters other than the day 25 detection rate (58%) are derived from more complex proportional relationships, and are therefore less precise. Table 5 reproduces a life table in the style of Leridon 26 and includes probabilities for each reproductive step with confidence intervals. These intervals (and some noted above) are impossibly wide, highlighting further problems with Hertig's analysis.
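A resampling analysis of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to produce Dataset 4: it uses simple resampling with replacement rather than the balanced method, the case counts are reconstructed from the figures quoted in the text (with the 8 pre-implantation embryos assumed to fall in cycle days 16-19), and all function names are hypothetical.

```python
import random

# Case counts reconstructed from the text (assumption: the 8 pre-implantation
# embryos are the cycle day 16-19 cases). Keys are (stage, embryo type).
COUNTS = {
    ("d16-19", "none"): 16, ("d16-19", "normal"): 4, ("d16-19", "abnormal"): 4,
    ("d20-24", "none"): 42, ("d20-24", "normal"): 5, ("d20-24", "abnormal"): 0,
    ("d25+", "none"): 15, ("d25+", "normal"): 15, ("d25+", "abnormal"): 6,
}
CASES = [case for case, n in COUNTS.items() for _ in range(n)]  # 107 cases


def hertig_estimates(cases):
    """Return (day 25+ detection rate, pre-implantation loss), or None if undefined."""
    def count(stage, kind):
        return sum(1 for c in cases if c == (stage, kind))

    pre_normal, pre_abnormal = count("d16-19", "normal"), count("d16-19", "abnormal")
    late_normal, late_abnormal = count("d25+", "normal"), count("d25+", "abnormal")
    late_cases = late_normal + late_abnormal + count("d25+", "none")
    if pre_normal == 0 or late_normal == 0:
        return None  # Hertig's ratios cannot be formed for this resample
    # Hertig's assumption: the abnormal:normal ratio seen before implantation
    # also applied to the embryos behind the day 25+ cases.
    abnormal_fertilised = late_normal * pre_abnormal / pre_normal
    fertilised = late_normal + abnormal_fertilised
    loss = (abnormal_fertilised - late_abnormal) / fertilised
    return (late_normal + late_abnormal) / late_cases, loss


def bootstrap(cases, replicates=500, seed=1):
    """Simple resampling with replacement; undefined replicates are skipped."""
    rng = random.Random(seed)
    estimates = (hertig_estimates(rng.choices(cases, k=len(cases)))
                 for _ in range(replicates))
    return [e for e in estimates if e is not None]


def percentile_interval(values, level=0.95):
    """Percentile interval taken from the sorted bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


if __name__ == "__main__":
    detection, loss = hertig_estimates(CASES)
    reps = bootstrap(CASES)
    print(f"detection rate {detection:.3f}", percentile_interval([r[0] for r in reps]))
    print(f"pre-implantation loss {loss:.3f}", percentile_interval([r[1] for r in reps]))
```

Because the loss estimate is a ratio built from very small counts, individual resamples can return negative values or fail to yield an estimate at all, which is one way of seeing why the derived intervals are so wide.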
Hertig's analysis omits 47 cases from cycle days 20-24, comprising 44% of his data. It is clear why he cannot use it, since all five embryos were normal and, given his mathematical and biological assumptions, five normal implanting embryos could not become 29% (6/21) abnormal post-implantation. Others have also noted that these "missing data are sufficient to engender an entirely different result" 16 . Furthermore, the data that define the 50% proportion of abnormal pre-implantation embryos (i.e., 4/8) are so few that any numerical variation will make a substantial difference to derived proportions. If he had observed 3/8 abnormal embryos, his estimate of pre-implantation loss would have been 13% rather than 30%: for 5/8 it would have been 48%, with a fertilisation rate of 111%, which is clearly impossible. It seems therefore, that Hertig designed his analysis based on a post-hoc examination and selective use of the data. His own caveat about the lack of relevant and necessary data should be taken at least as seriously as his conclusions.
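The sensitivity of Hertig's derived proportions to the pre-implantation count can be checked directly. The sketch below re-runs his arithmetic for 3, 4 and 5 abnormal embryos out of 8; the function name and its parameters are hypothetical labels, and the default values are the counts quoted in the text.

```python
from fractions import Fraction


def hertig_model(pre_abnormal, pre_total=8, late_normal=15, late_abnormal=6, late_cases=36):
    """Return (fertilisation rate, pre-implantation loss) under Hertig's assumptions."""
    # Ratio of abnormal to normal embryos before implantation.
    ratio = Fraction(pre_abnormal, pre_total - pre_abnormal)
    abnormal_fertilised = late_normal * ratio
    fertilised = late_normal + abnormal_fertilised
    loss = (abnormal_fertilised - late_abnormal) / fertilised
    return fertilised / late_cases, loss


if __name__ == "__main__":
    for k in (3, 4, 5):
        fert_rate, loss = hertig_model(k)
        print(f"{k}/8 abnormal: fertilisation {float(fert_rate):.1%}, loss {float(loss):.1%}")
```

A change of a single embryo in either direction moves the loss estimate from 30% to roughly 13% or 48%, and the larger count pushes the implied fertilisation rate above 100%.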
Hertig and Rock's contribution to human embryology is undeniable and their quantitative conclusions have profoundly influenced our impression of the extent of early human embryo mortality. Regrettably, their estimates have a cripplingly low precision, which undermines their biological credibility or utility. In conclusion, Hertig's data and flawed analysis cannot be regarded as a reliable quantitative foundation upon which to evaluate and understand natural human reproduction.

Discussion
Answering the question "How many fertilised human embryos die before or during implantation under natural conditions?" is difficult. Relevant, credible data are in short supply. Among regularly cited publications, the Lancet hypothesis 34 is entirely speculative and in the view of the current author should cease to be used as an authoritative source. Clinical pregnancy studies are only useful for quantifying clinical pregnancy loss and contribute nothing to estimates of embryo mortality in the first two weeks post-fertilisation. Even Hertig's unique dataset is inadequate to draw quantitative conclusions and oft-repeated values should be treated with scepticism. The hCG studies from 1988 onwards provide the best data for estimating embryo mortality although a lack of information on fertilisation success rates 5,16,21,23,113 prevents satisfactory completion of the calculations. A recent re-analysis of these data proposed plausible limits for reproductively normal women indicating that approximately 10-40% of embryos perish before implantation and 40-60% do so between fertilisation and birth 39 . However, these ranges are wide, particularly for pre-implantation mortality, reflecting the lack of appropriate data. Is there any possibility of narrowing down the numbers?
In the 1980s, two separate groups collected embryos from women following carefully timed artificial insemination as part of fertility treatment. Insemination around the time of ovulation in women of proven fertility was followed 5 days later by uterine lavage to recover ova 114-117 . These data appear to hold promise for determining fertilisation efficiency and some authors have made quantitative inferences about embryo mortality from them 24,27,28 . However, such inferences are complicated by numerous confounding factors. For example, in one series 116 , from 88 uterine lavages following artificial insemination by donor (AID), 4 unfertilised eggs, 6 fragmented eggs, and 27 embryos from 2 cell to blastocyst stage were retrieved.
In the 51 cycles in which no egg or embryo was retrieved, there was one retained pregnancy suggesting that the lavage and ova retrieval efficiency was reasonably high, albeit not perfect. These data therefore suggest that FEC TOT was low (≈31/88 = 35%) although a proportion of fertilised eggs may have completely degenerated within the first 5 days. Assuming π SOC was high (given the targeted insemination), this suggests that π FERT ≈ 50%. In the context of the recent analysis 39 , this implies that π HCG is high and that levels of embryo mortality are therefore towards the lower end of the 10-40% and 40-60% ranges. However, the clinical pregnancy rate following transfer of the embryos was only 40%. This is equivalent to π HCG × π CLIN . If π CLIN ≈ 75%, as suggested by the hCG studies (Table 3), this would mean that π HCG ≈ 50%. This would imply that π FERT is high, fertilised egg degeneration is high, occurs before day 5 and was therefore unobserved, and hence levels of embryo mortality tend towards the upper end of the 10-40% and 40-60% ranges.
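The two back-calculations in this paragraph can be written out explicitly. The sketch below assumes, as the symbols appear to be used here, that FEC TOT = π SOC × π FERT and that the clinical pregnancy rate per transferred embryo equals π HCG × π CLIN; the function names and the trial values of π SOC are illustrative only and are not taken from the lavage studies.

```python
def implied_pi_fert(fec_tot, pi_soc):
    """Fertilisation probability implied by FEC_TOT = pi_SOC * pi_FERT."""
    return fec_tot / pi_soc


def implied_pi_hcg(clinical_rate_per_transfer, pi_clin):
    """Implantation probability implied by rate = pi_HCG * pi_CLIN."""
    return clinical_rate_per_transfer / pi_clin


if __name__ == "__main__":
    fec_tot = 31 / 88  # approximately 35%, as quoted in the text
    for pi_soc in (0.7, 0.8, 0.9, 1.0):  # hypothetical "high" values of pi_SOC
        print(f"pi_SOC = {pi_soc:.1f} -> pi_FERT = {implied_pi_fert(fec_tot, pi_soc):.2f}")
    # 40% clinical pregnancy rate after transfer, pi_CLIN of about 75%
    print(f"pi_HCG = {implied_pi_hcg(0.40, 0.75):.2f}")
```

The first calculation points towards a modest fertilisation probability and the second towards a modest implantation probability, which is the tension the paragraph describes.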
It is possible that the lavage/transfer procedure reduced implantation and early developmental efficiency thereby reducing π HCG × π CLIN . A comparison of AID pregnancy rates may provide some insight as suggested by the authors 116 . The clinical pregnancy rate in their pharmacologically unstimulated cohort was 12.5% (11/88) which is lower than an equivalent 18.9% observed for fresh semen AID 118 , and also the live birth rate (which also incorporates clinical pregnancy losses) of 14.7% reported by the HFEA for AID in 2012 in unstimulated women aged 18-34 119 . These different success rates suggest that the lavage/transfer procedure did adversely affect implantation and early gestation with clear implications for quantitative extrapolation. Furthermore, the women who were embryo recipients were receiving fertility treatment and their overall fertility may have been lower than expected in a normal healthy cohort. In summary, it seems that there are too many unresolved variables in these data to narrow down estimates of fertilization (π FERT ) or implantation (π HCG ) rates.
With high fecundability, the range of possible embryo mortality rates falls. Red deer hinds have pregnancy rates of >85% following natural mating 120 : establishing numerical limits for embryo mortality under these efficient reproductive circumstances is more straightforward. By contrast, humans lack the instinct to mate predominantly during fertile periods thereby reducing observed reproductive efficiency substantially. In studies of early pregnancy loss, owing to sub-optimal coital frequency and cohorts including sub-fertile couples, natural fecundability was almost certainly not maximised 39 . Combining data on coital frequency and hCG elevation may help to address this. In 1995, applying the Schwartz model 94 to his 1988 hCG data 49 , Wilcox calculated a FEC HCG value of 36% for high coital frequencies (>4 days with intercourse in 6 pre-ovulatory days) 40 . However, the Schwartz model assumed that cycle viability was evenly distributed among couples, a condition which the authors recognised was not true and is contradicted by a subsequent analysis which suggests that approximately a quarter of the Wilcox cohort was sub-fertile 39 . If possible, focussing analytical attention on normally fertile women with the highest coital frequencies may help to narrow the range of plausible embryo mortality.
In this review of natural early embryo mortality no use has been made of data from in vitro fertilisation (IVF) and associated laboratory studies. Sub-optimal conditions for embryo culture mean that it was 121,122 and probably still is 123 doubtful that reliable values can be extrapolated from laboratory in vitro to natural in vivo circumstances 28 . Importantly, the reproductive stages are also altered. In IVF, π SOC = 1 and for transferred embryos π FERT = 1. Furthermore, transferred embryos are selected based on quality criteria, however inexact those may be 123,124 . IVF program manipulations may reduce π HCG compared to natural circumstances 6 and implantation failure remains a substantial issue for IVF 125,126 . Although for IVF cycles, the reported live birth rate per cycle has gone up (from 14% in 1991 to 25.4% in 2012 119 ), comparison of IVF success rates and natural live birth fecundability values involves too many undefined variables to shed numerical light on early natural embryo development and mortality.
In vitro fertilisation per se may provide some insight into values of π FERT , since π SOC = 1, and successful fertilisation can be observed. In seven studies of natural cycle IVF, fertilisation was successful in 70.9% (443/625) of attempts 127-133 . If this represented natural, in vivo fertilisation, based on the recent analysis 39 , it implies that π HCG ≈ 0.75, focusing estimates for pre-implantation embryo loss on 25%, and for total loss on 50%. However, high frequencies of chromosomal aberrations caused by the in vitro handling of human oocytes 134 can render any comparison of natural and assisted reproduction open to criticism 9 .
In calculating summary values of embryo mortality, it is important to note that human fertility is as numerically heterogeneous as it could possibly be. Some couples are infertile and some are highly fertile. Excessive attention to averages and neglect of variances fosters a misleading appreciation of reality. The hCG studies clearly had both fertile and sub-fertile participants: use of overall values underestimated fecundability for the fertile majority 39 . Furthermore, apparently 'optimal' conditions for conception may not maximise human biological fecundability. Other biological factors also contribute to reproductive heterogeneity in humans; however, even after controlling for age-related decline, fecundability remains highly variable 119,135 . For intercourse occurring 2 days prior to ovulation, average fecundabilities resembled those previously published 88 , but for couples at the 5th and 95th percentiles, fecundabilities were 5% and 83%. 83% fecundability implies a very low embryo mortality rate. In conclusion, apparently low fecundability in humans need not be caused by embryo mortality alone: defects of ovulation, mistimed coitus, or fertilisation failure may also be responsible 39 . Where fecundability is low, any or all of these factors may contribute.
Embryo mortality and pregnancy loss are not only a matter of academic scientific interest, and diverse quantitative estimates can also be found in popular media. For example, 70% loss in the first six days is claimed by Michael Mosley in "You made it through the first round" (http://www.bbc.co.uk/timelines/z84tsg8; transcript at http://a.files.bbci.co.uk/bam/live/content/z3b87hv/transcript; accessed on 20th April, 2017). By contrast a 25% pre-implantation loss is reported by the Science Museum's online exhibit, "Who Am I?" (http://www.sciencemuseum.org.uk/WhoAmI/FindOutMore/Yourbody/Wheredidyoucomefrom/Howdoyougrowinthewomb/Whathappensinweek1; accessed on 20th April, 2017). News reports, often associated with ethical controversies, also feature estimates of embryo loss. On 1st February 2016, James Gallagher reported that only 13/100 fertilised eggs develop beyond 3 months (http://www.bbc.co.uk/news/health-35459054; accessed on 20th April, 2017) and on 4th May 2016, Sarah Knapton reported in the online Daily Telegraph that "two thirds of pregnancies fail because the embryo does not implant properly" (http://www.telegraph.co.uk/science/2016/05/04/human-embryos-kept-alive-in-lab-for-unprecedented-13-days-so-sci/; accessed on 20th April, 2017). In an ethical advocacy video, Bill Nye ("The Science Guy") begins by claiming that "Many, many, many, many more hundreds of eggs are fertilized than become humans" (https://www.youtube.com/watch?v=4IPrw0NYkMg; accessed on 20th April, 2017). Additionally, academic philosophical articles 1 and legal judgements 4 have considered the significance of the scale of embryo loss. Given the breadth of societal interest in this facet of human reproductive biology, it is vital that scientists report plausible and defensible estimates for natural embryo mortality. Above all, it is obvious that women wishing to have children deserve reliable and unbiased estimates of reproductive success and pregnancy failure.
Natural pre-implantation embryo loss remains quantitatively undefined. In the absence of knowledge of π SOC and π FERT it is almost impossible to estimate precisely. Hertig's estimate is 30%; however, mathematically and biologically implausible confidence intervals [-28%, 73%] betray the quantitative weaknesses in his data and analysis. The best available data for quantifying early pregnancy loss are from studies monitoring daily hCG levels in women attempting to conceive. A recent re-analysis 39 of data from three studies 46,48,49 concluded that, in normal healthy women, 10-40% is a plausible range for pre-implantation embryo loss and overall pregnancy loss from fertilisation to birth is approximately 40-60%. This latter range is consistent with Kline's estimate of 50% 16 , and similar to, although a little narrower than the 25-70% suggested by Professor Robert Edwards 136 .
In the absence of suitable data to quantify pre-implantation loss, many published articles and reviews merely restate previously published values 11,28,29 . It has been suggested that "claimed research findings may often be simply accurate measures of the prevailing bias" 137 . Widely held views on early embryo mortality may reflect an entrenched and biased view of the biology. For example, the Macklon "Black Box" review 28 has been cited 212 times (Web of Science™ citations on 12th April 2017) with many articles explicitly referencing its 30% survival/70% failure values 13,29,125,138-146 . Macklon's quantitative summary in his "Pregnancy Loss Iceberg" (30% implantation failure; 30% early pregnancy loss; 10% clinical miscarriage; 30% live births) is a direct, unedited reproduction of estimates published over 10 years previously 27 . 30% preimplantation loss fairly represents Hertig's conclusions although, as has been shown, this estimate is highly imprecise. However, Macklon misrepresents the best data which he reviews 46,49 . Wilcox reports early pregnancy loss (i.e., [1 - π CLIN ]) of 21.7% whereas Macklon's iceberg implies that 43% (30/70) of implanting embryos fail before clinical recognition. The iceberg's clinical loss rate of 25% (10/40) is also higher than relevant data indicate (Figure 3 & Figure 4). Total loss of implanting (hCG+) embryos (i.e., [1 - (π CLIN × π LB )]) is 57% (40/70) according to the iceberg. By contrast, Wilcox 49 and Zinaman 46 , both included in Macklon's review, report that only 31% of hCG positive pregnancies fail.
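The conditional loss rates implied by the iceberg follow from simple ratios. The sketch below reproduces them from the four percentages quoted above; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
from fractions import Fraction

# Macklon's "Pregnancy Loss Iceberg", per 100 fertilised eggs, as quoted in the text.
ICEBERG = {
    "implantation_failure": 30,
    "early_pregnancy_loss": 30,
    "clinical_miscarriage": 10,
    "live_birth": 30,
}


def conditional_losses(stages):
    """Convert per-100-egg outcomes into loss rates conditional on reaching each stage."""
    total = sum(stages.values())
    implanted = total - stages["implantation_failure"]  # hCG-positive embryos
    clinical = implanted - stages["early_pregnancy_loss"]  # clinically recognised
    return {
        "early_loss_given_implantation": Fraction(stages["early_pregnancy_loss"], implanted),
        "clinical_loss_given_recognition": Fraction(stages["clinical_miscarriage"], clinical),
        "total_loss_given_implantation": Fraction(implanted - stages["live_birth"], implanted),
    }


if __name__ == "__main__":
    for name, value in conditional_losses(ICEBERG).items():
        print(f"{name}: {value} = {float(value):.0%}")
```

Expressed this way, the iceberg's figures can be set directly against the per-stage values reported by the hCG studies.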
Early pregnancy loss of 10-25% is not trivial. Despite difficulties associated with extrapolating in vitro observations to in vivo circumstances, implantation is clearly a crucial biological milestone in embryonic development and pregnancy. Recent studies suggest that biological mediators from embryos may regulate endometrial receptivity resulting in selective implantation of fitter embryos 18,147,148 . Interestingly, supernatant from developmentally impaired IVF embryos deemed unsuitable for transfer provoked a different response in endometrial cells compared to supernatant from developmentally competent embryos that produced an ongoing pregnancy after transfer 18 . However, it would be of interest to compare these responses to that provoked by supernatant from developmentally competent embryos that did not successfully implant and produce an ongoing pregnancy. Arrested or developmentally impaired embryos may lose cellular integrity and release mediators that disrupt endometrial receptivity; however, the intrinsic developmental competence of such embryos must also determine the success of implantation. Implantation failure is a substantial issue for IVF 125 , and distinguishing between obviously impaired and competent embryos is not the principal challenge, but rather distinguishing from among apparently competent embryos those that will successfully implant and those that will not. Striking evidence suggests that only 9% of IVF embryos have a normal karyotype in all their cells 145,149 raising the possibility that the rate of aneuploidy in IVF embryos is artefactually high, or that a degree of mosaic aneuploidy in human embryos is not necessarily developmentally deleterious, or indeed both. Implantation failure of a normal embryo may well be an uncommon event 51 , but defining what is normal is a considerable scientific and conceptual challenge.
If Macklon's 28 (and Chard's 27 ) 70% estimate for embryo loss is excessive, as the data suggest, this casts doubt on claims 125,143 that the frequency of embryonic abnormalities observed in vitro is representative of the natural in vivo situation. In turn, this implies that many of the chromosomal abnormalities observed in in vitro human embryos may be, to a greater extent than currently recognised 125 , an artefact of the clinical and experimental interventions of assisted reproductive technologies. This is not the first time that attention has been drawn to unsatisfactory estimates of early embryo loss. Faced with some of the same data, others have noted that "a claim of 'no significant difference' might easily be sustained against any interpretation proffered" 16 and that estimates are "difficult to defend with any precision" 10 . Conclusions have been based on "poor estimates of fertilization failure rate and the mortality at 2 weeks after fertilisation" 23 and drawn "from unusual or biased samples" 150 . Nevertheless, although precision may be elusive, exaggeration can be avoided. It is hoped that this critical re-evaluation of the data describing early human embryo mortality will serve as a robust foundation upon which to make informed biological, ethical, legal and personal judgements.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Dr. Jarvis assesses the empirical support for the belief that there is a "great deal" of fetal wastage in humans. His conclusion is that there is less wastage than is often believed and that the percent loss between conception and birth is 40-60%. Resolution of this issue is important, as it has substantial implications for our understanding of early human development.
He states (p. 2) that four types of evidence underlie these claims:
1. A speculative hypothesis published in The Lancet.
2. Life tables of intra-uterine mortality.
3. Studies of early pregnancy by biochemical detection of hCG.
4. Anatomical studies of Dr Arthur Hertig and Dr John Rock.
On the basis of his review of this evidence, Dr. Jarvis concludes (p. 12) that "….10-40% is a plausible range for pre-implantation embryo loss and overall pregnancy loss from fertilization to birth is approximately 40-60%." This means that the best estimate of pre-birth mortality according to Dr. Jarvis is consistent with many previous estimates. In order to understand this consistency, it is useful to examine these types of evidence and what Dr. Jarvis makes of each. I discuss them in turn.
1. The Lancet article is Roberts & Lowe (1975). These authors concluded (p. 498) from their "speculative" analysis of the number of married women aged 20-29 in England and Wales and of the number of live and dead births that 78% of conceptions are lost. In order to generate this estimate, the authors estimated the number of conceptions in any given year (based on the number of sexual encounters, probability of fertilization, etc.). Dr. Jarvis assesses the influence of changing the number of conceptions on the estimate of fetal wastage and shows (p. 3) that a low estimate of the number of conceptions results in an estimate of 22% conceptions lost and that a high estimate of the number of conceptions results in an estimate of 92% loss. He also generates a 95% confidence interval for the loss percentage of 37% - 90% by doing a simulation in which each value contributing to the number of conceptions is normally-distributed with a mean identical to Roberts and Lowe's value and a coefficient of variation of 20%. On this basis, he concludes about Roberts and Lowe's analysis that (p. 1) it "….has no quantitative value." and that (p. 4) it "….has no practical quantitative value".
Dr. Jarvis provides a useful sensitivity analysis of Roberts and Lowe's estimate, which should be taken seriously by those who may believe that their analysis is definitive (their paper has been cited more than 300 times, with many citations that point to the 78% estimate). That said, Dr. Jarvis' conclusion that Roberts and Lowe's analysis is quantitatively useless is itself incoherent. A number is a number and as a starting point, their estimate is useful although limited. If their analysis lacks "practical quantitative value" so too does the analysis of Dr. Jarvis. After all, there is no empirical basis for his assumptions about the statistical independence of the components contributing to his estimate of percentage or that these components are normally-distributed or that they have a coefficient of variation of 20%. It is not as though simply making arbitrary assumptions about the variability of parameters somehow means that an analysis is more quantitatively useful than one without such assumptions. The point is that both analyses have value. It is telling in this regard that their estimate is "close" to Dr. Jarvis' estimate. In fact, one could readily claim that Dr. Jarvis's analysis validates Roberts and Lowe's estimate in as much as their estimate is within the 95% confidence interval he generates.
By way of understanding Roberts and Lowe's self-described "speculative" work, it is important to note it belongs to the voluminous "gray" literature relating to human pregnancy. This is the literature that is published without much review (if any) and without much requirement for rigor and data. To see this, one need go no farther than this passage (p. 498): Animal studies, which allow a more systematic investigation of [pregnancy loss], have shown detectable prenatal losses ranging from 15 to 60% in domestic cattle, sheep, and pigs and in wild forms such as stoats, rats, squirrels, and rabbits.
They cite Austin (1972) for this claim. He merely states (p. 134): The data show that prenatal losses ranging between 15 and 60 per cent occur in cattle, sheep and pigs, as well as in wild forms such as stoats, rats, squirrels and rabbits.
No data are cited! In fact, Austin's gloss on the loss percentage for domesticated species is reasonably accurate (Casida, 1953; First & Eyestone, 1988; Lasley, 1957) although there are fewer data than one might imagine. It is of note that these species have been selected for offspring production and so how relevant these data are is not completely resolved. Perhaps fetal wastage in their wild relatives would be greater. My guess is that the data alluded to as being from "wild forms" are in papers such as those by Brambell (1942, 1948). That said, to my knowledge, it is not clear that such studies reliably account for early gestational losses. More generally, there are few "wild forms" for which there are estimates.
The overall point is that Roberts and Lowe's paper contains a disconnection between data and conclusions that would be sustained even if one read the cited source. Their paper is best viewed as a heuristic exercise. This is not a criticism. It is meant to underscore that Dr. Jarvis' conclusion that their paper is "useless" treats it as something that it isn't. We are ignorant of the training of Drs. Roberts and Lowe but like many authors of the gray literature concerning pregnancy, they may have lacked rigorous training in research practice and data analysis. This is not inherently bad, as long as the nature of such publications is properly understood. As a community of scientists, we can make use of their insight into human pregnancy as long as its potential limitations are understood. We need all the help we can get!
2. The "life tables of intra-uterine mortality" are French & Bierman (1962) and Léridon (1977). The former study is an analysis of pregnancies in Kauai, Hawaii; the authors' conclusion was that approximately 24% of the pregnancies registered with an estimated gestational age of greater than four weeks would die. Léridon married this result with the data of Hertig, Rock, Adams, & Menkin (1959), which provide an estimate of wastage prior to four weeks, to infer that 63% of conceptions die before birth (Table 4.20, p. 81). Dr. Jarvis' cautions about the assumptions that underlie this estimate are reasonable. That said, it is important to note that Léridon's chapter ("Intrauterine Mortality", pp. 48-81) is no casual exercise. It is the longest chapter in the book and an open-minded reader can see that Table 4.20 is based upon reasonable assumptions that Léridon clearly states do not have as much of a solid empirical basis as would be desired. Unfortunately, Dr. Jarvis' sole mentions of Léridon's caveats are a statement (p. 5) in which Léridon describes (p. 56) an interpolation he makes (in his analysis of French and Bierman's data) as "risky" and another in which his (Dr. Jarvis') reanalyses of the French and Bierman data (p. 5) "reinforce a concern highlighted by Léridon". To this extent, a reader of Dr. Jarvis' paper could easily come away with the mistaken belief that Léridon's analysis is superficial at best. As in the case of Roberts and Lowe's estimate, it is important to note that Léridon's estimate of conceptions lost of 63% is close to Dr. Jarvis' estimate of 40-60%.
3. "Studies of early pregnancy by biochemical detection of hCG." The modern pregnancy test is based upon an assay of human chorionic gonadotrophin (hCG), an oligosaccharide glycoprotein hormone produced by embryonic cells. An elevated level of hCG is detectable six to fourteen days post-conception (Nepomnaschy, Weinberg, Wilcox, & Baird, 2008; Wilcox, Baird, & Weinberg, 1999). By this time, most embryos capable of implantation will have done so. Unfortunately, earlier pre-implantation detection of pregnancy based upon assay of the "Early Pregnancy Factor", a heat-shock protein expressed within 48 hours of conception, is not in widespread use (Clarke, 1997; Fan & Zheng, 1997; Morton, Rolfe, & Cavanagh, 1992; Rolfe, 1982; Shahani, Moniz, Chitlange, & Meherji, 1991; Shahani, Moniz, Gokral, & Meherji, 1995; Smart, Fraser, Roberts, Clancy, & Cripps, 1982). Dr. Jarvis correctly describes the pioneering hCG results of Wilcox et al. (1988) and others (as summarized in Table 3), which indicate that the percentage loss of conceptions after hCG detection is between approximately 20 and 60%, with many estimates between 30 and 40%; Dr. Jarvis concludes (p. 6) that this percentage loss is approximately 33%.
Dr. Jarvis goes on to estimate that the "…loss from fertilization to birth [is] 40-60%"; this is based on the combination of three estimates based on hCG assay of percentage loss from conception to birth (35.7%: Wang et al., 2003; 31.3%: Wilcox et al., 1988; 31.3%: Zinaman, Clegg, Brown, O'Connor, & Selevan, 1996) and his estimate (pp. 7-8) that the efficiency of implantation of embryos "…may be up to 90% efficient….". He concludes that higher estimates of loss from fertilization to birth from the literature are "excessive".
Dr. Jarvis' estimate is likely an underestimate. There is strong circumstantial evidence that many more than 10% of embryos do not successfully implant, as discussed below. The implication of this is that Dr. Jarvis' estimate and the previous estimates are consistent. It is also worth noting that Dr. Jarvis uses an arbitrary estimate for implantation rate, even though he judges other analyses to be useless because they contain an arbitrary parameter estimate.
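The sensitivity of the total to the assumed implantation efficiency can be sketched with simple arithmetic. This is a minimal illustration and not a calculation taken from Dr. Jarvis' paper: it assumes that total survival is the product of pre-implantation survival and survival after hCG detection, that the two stages are independent, and that the loss after hCG detection is the approximate 33% quoted above. The function name and the grid of pre-implantation losses are hypothetical.

```python
def total_loss(pre_implantation_loss: float, post_detection_loss: float) -> float:
    """Overall fraction lost, assuming the two stages act independently."""
    for p in (pre_implantation_loss, post_detection_loss):
        if not 0.0 <= p <= 1.0:
            raise ValueError("losses must be proportions between 0 and 1")
    survival = (1.0 - pre_implantation_loss) * (1.0 - post_detection_loss)
    return 1.0 - survival


# Approximate loss after hCG detection quoted in the text (assumed fixed here).
POST_DETECTION_LOSS = 0.33

# Hypothetical grid of pre-implantation losses; 0.10 corresponds to "up to 90% efficient".
for pre in (0.10, 0.25, 0.40, 0.55):
    overall = total_loss(pre, POST_DETECTION_LOSS)
    print(f"pre-implantation loss {pre:.0%} -> total loss {overall:.1%}")
```

Under these assumptions the total depends strongly on the pre-implantation term, which is the quantity for which direct data are weakest.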
Dr. Jarvis goes on to criticize Boklage (1990) who estimated the percentage of unsuccessful conceptions based on an analysis of hCG data (see his Figure 2, p. 84). Dr. Jarvis is right to raise concerns (p. 8) that Boklage's analysis is less definitive than desired. In particular, he states (p. 8) that Boklage's assumption that the 21-day survival rate of conceptions is 28.7% is based upon a misinterpretation of a previous study. That said, Dr. Jarvis makes an unsubstantiated conclusion (p. 8) that "…quantitative conclusions from [Boklage's] analysis in relation to the survival of naturally conceived human embryos are of doubtful validity". This may be true, but this remains to be seen given the lack of any demonstration of the sensitivity of Boklage's quantitative conclusions to changes in the underlying assumptions. Boklage's analysis needs more careful scrutiny than given by Dr. Jarvis. For example, Boklage presents a formula for the percentage loss of conceptions as a function of time (p. 84). Are the coefficients estimated via a standard statistical approach such as maximum likelihood estimation and chosen via a likelihood ratio test or via comparison of AIC values associated with competing models? This is not clear. As such, it is unclear as to what to make of the predictions even putting aside Dr. Jarvis' concerns about the biological validity of some of the underlying data. The equation appears to be based upon the assumption that a cohort of embryos is an admixture of those that are likely to die before six weeks and those that will survive longer. The basis for this assumption is unclear. The lack of transparency of Boklage's equation is underscored by the fact that Dr. Jarvis does not mention that it predicts 75.8 percent fetal wastage between conception and full-term birth (270 days). As above, this estimate is rightly or wrongly consistent with most previous estimates.
4. The "anatomical studies of Dr Arthur Hertig and Dr John Rock" are investigations of conceptions recovered from uteri obtained via gynecologic surgery. Their results are summarized in Hertig et al. (1959), Hertig & Rock (1973), and Hertig (1967). As described by Dr. Jarvis (p. 9), Hertig et al.'s conclusion is that 50% of embryos will die within two weeks after conception.
Dr. Jarvis is correct to point out concerns about their conclusion, although we believe that it has been well recognized that it is "impressionistic" as opposed to something that has a solid quantitative underpinning. Of course, as noted by Dr. Jarvis, their work remains important.
Dr. Jarvis makes some assertions about Hertig et al.'s work that seem mainly intended to accentuate doubts about it as opposed to placing it in proper context. He notes correctly (p. 9) that the sample is cross-sectional and not longitudinal. Given the nature of this study, this was unavoidable. Dr. Jarvis notes there are some unresolved discrepancies among age-specific detection rates for embryos and also between the estimated implantation rate and the rate inferred from other studies. These are worth mentioning but the implications of these discrepancies remain ambiguous in the absence of a quantitative analysis that accounts for sampling variation.
Similarly un-useful is Dr. Jarvis' statement (p. 9) that "Despite having proven fertility, these women presented with gynaecological problems, suggesting suboptimal reproductive function." There is a wide range of "gynaecological problems" and an unanchored assertion that such a broad category might result in "sub-optimal reproductive function" means nothing in the absence of evidence that whatever problems were present had some influence on embryonic viability.
In an effort to "estimate the precision" of the various proportions presented by Hertig et al. (e.g., the survival rate to implantation), Dr. Jarvis generated 500 so called "bootstrap" samples from the original data consisting of 107 cases. These samples arise from sampling with replacement of the original data (e.g., see Efron & Tibshirani, 1986; Efron, 1987). Such an investigation is worthwhile, although a bootstrap analysis is not a "cure" for small sample size. In any case, Dr. Jarvis' analyses of the bootstrap results are incorrect. He describes (p. 10) "95% CIs" for various proportions that are outside of the range of 0-100%. For example, the confidence interval (p. 10) he provides for pre-implantation embryo survival probability is 27-128%. Such an interval cannot be generated by a correct bootstrap analysis. There are various ways to calculate a bootstrap confidence interval (Efron & Tibshirani, 1986). The simplest, known as the "percentile method", generates a 95% bootstrap confidence interval for a proportion directly from the range of proportions associated with the central 95% of the bootstrap estimates. Accordingly, the confidence interval must be between 0 and 100% because each of the bootstrap samples must generate a proportion between 0 and 100%. Dr. Jarvis' mistake appears to be that he estimated an average proportion and its variance from the ensemble of bootstrap estimates and then calculated the confidence interval using standard formulae (p. 10). The purpose of bootstrap estimation is to avoid such calculations, which can generate inaccurate confidence intervals. Although some of the bootstrap confidence intervals provided by Dr. Jarvis do not fall below 0% or surpass 100%, we guess that all of them are incorrectly calculated. Unfortunately, the incorrect confidence intervals are described by Dr. Jarvis (p. 12) as "mathematically and biologically implausible" and taken to "….betray the quantitative weaknesses in [Hertig et al.'s] data and analysis." Indeed, they are "mathematically and biologically implausible" but the reason is that they were not correctly calculated. Whatever bearing a bootstrap analysis has on our understanding of the "precision" of Hertig et al.'s data and analyses remains to be seen.
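The difference between the percentile method and an interval built from the mean and standard deviation of the bootstrap estimates can be illustrated with a minimal sketch. The data below are hypothetical (seven "survivors" among eight cases) and are not Hertig et al.'s; the function names, the seed, and the simple nearest-rank percentile rule are assumptions made for illustration only, and the sketch does not reproduce any particular published calculation.

```python
import random
import statistics


def bootstrap_proportions(outcomes, n_resamples=500, seed=0):
    """Resample the 0/1 outcomes with replacement; return one proportion per resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    return [sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)]


def percentile_interval(estimates, level=0.95):
    """Percentile method: read the interval off the sorted bootstrap estimates."""
    ordered = sorted(estimates)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[round((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


def normal_interval(estimates, z=1.96):
    """Mean +/- z * SD of the bootstrap estimates; nothing keeps this inside 0-1."""
    centre = statistics.mean(estimates)
    spread = statistics.stdev(estimates)
    return centre - z * spread, centre + z * spread


# Hypothetical data: 7 "survivors" among 8 cases (NOT Hertig et al.'s data).
outcomes = [1] * 7 + [0]
estimates = bootstrap_proportions(outcomes)

print("percentile interval:", percentile_interval(estimates))
print("normal-theory interval:", normal_interval(estimates))
```

Because every resampled proportion lies between 0 and 1, the percentile interval does too, whereas the mean-and-SD interval for a small sample with a proportion near 1 can extend past 100%.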
Dr. Jarvis' central argument is that there is more ambiguity associated with estimates of fetal wastage in humans and that this ambiguity is not widely understood. Many of his concerns should be taken seriously. Nonetheless, his analysis is undermined by errors of analysis and overstatement. In the end, his estimate of fetal wastage from conception to birth is consistent with many of the previous estimates.
Dr. Jarvis' analysis is also undermined by an incorrect dismissal of data from embryos created via assisted reproductive technology (ART), which he refers to as in vitro fertilization (IVF). On page 11, he alludes to "…sub-optimal conditions for embryo culture…" and implies that somehow ART embryos are "different" in undefined ways from naturally-conceived embryos that negate their potential use in regard to estimating fetal wastage. This is an exercise in rhetoric, not a scientific argument. It is true that ART embryos are different from natural embryos in ways that could influence an estimate of fetal wastage. However, it is essential to note that they constitute the best available sample for insight into the "black box" of early pregnancy, despite the possible biases they may have that could distort our view into the black box. To this extent, it is best to assess what information they can provide about fetal wastage, rather than provide tenuous or irrelevant reasons as to why they are not useful.
Dr. Jarvis mistakenly assumes (p. 11) that only ART embryos transferred into mothers would provide information about fetal wastage. In fact, as Dr. Jarvis notes, there are a number of reasons why transferred embryos are not representative of all embryos (e.g., conscious or unconscious quality biases, sex selection) and accordingly, this kind of sample could be misleading. That said, studies of such samples suggest that at least some aspects of their biology are identical to that of naturally-conceived embryos. For example, the sex ratio at birth for ART embryos is statistically identical with that of natural conceptions (Orzack et al., 2015).
More importantly, the entire ensemble of ART embryos (untransferred and transferred) provides information about fetal wastage. Almost all ART embryos undergo testing for chromosomal abnormalities, such as aneuploidy. The consequences of aneuploidy are well-known: it results in almost certain death before birth. This is consistent with the fact that many spontaneous abortions are karyotypically abnormal (Boué, Boué, & Lazar, 1967, 1975; Jauniaux & Burton, 2005). To this extent, the frequency of such abnormalities provides strong circumstantial evidence as to the amount of fetal wastage. Orzack et al. (2015) investigated a sample of ART embryos whose karyotypes were assayed via FISH or CGH and reported that 84,881 out of 139,704 embryos contained at least one aneuploid chromosome. The implied percentage of fetal wastage (60.8%) is remarkably consistent with the central tendency of the many reports that Dr. Jarvis dismisses as unreliable, as well as with his own estimate. As noted, we need to be cautious about inferences from this sample but not avoid making them. There is no compelling reason to think that "suboptimal" conditions for embryo culture (if any) cause many chromosomal abnormalities, most of which very likely arise during meiosis (e.g., Hassold & Hunt, 2001; Hunt & Hassold, 2007; Jones, 2008; Nagaoka, Hassold, & Hunt, 2012). What deserves scrutiny is whether the frequency of chromosomal abnormalities is elevated by techniques for collecting eggs and/or because women providing them for use in ART are unrepresentative of all reproductive women. There are limited data that unstimulated and stimulated oocytes have similar frequencies of abnormality (Labarta et al., 2010). Of course, women using ART are often older than many typical mothers. However, a high frequency of karyotypic abnormality is also observed among oocytes from young women (Baart et al., 2006; Munné et al., 2006). These concerns should continue to be investigated but they in no way imply that ART embryos cannot provide useful insights about early human development and fetal wastage, especially given the current lack and very likely continuing lack of a large sample of naturally-conceived human embryos.
We see then a web of circumstantial evidence implying that there is a substantial amount of fetal wastage in humans. This insight arises from imperfect types of knowledge (as documented by Dr. Jarvis) but nonetheless, there is a signal consistent with the claim that approximately half or more of conceptions fail. More needs to be done to improve our understanding.
The study of fetal wastage shares with the study of the human sex ratio during pregnancy the fact that many different kinds of scientists are involved and so, the associated balkanization has reduced the accountability that arises from a shared disciplinary perspective about the standards for the interpretation of data (Orzack, 2016; Orzack et al., 2015). One cause and consequence of this division is the gray literature mentioned above.
What contributes to the continuing "life" of the gray literature? Science abhors a vacuum, and claims about high fetal wastage in humans have been repeated so often that their connection with assumptions and data has become obscured or lost. Some claims date well before there was any means by which early mortality could be assessed (Mall, 1917; Meyer, 1920; Pearson, 1897). Pearson clearly acknowledged the lack of direct evidence, but such caveats get lost, especially in medicine, in which attention to standards of evidence, recognition of the assumptions needed to connect data with conclusions, and awareness of needed statistical techniques have been weaker than in biological research. These deficiencies have diminished as medical training has incorporated more scientific training but have not disappeared. Nonetheless, during medical training the "inhalation" of facts is important. It is one reason why many believe that fetal wastage is high, despite having little or no familiarity with the available data or with the ins and outs of their analysis and interpretation.
(We have replaced number citations with author citations). Several of these claims are in medical textbooks and are akin to newspaper articles, i.e., they are reports on prior research as opposed to being independent estimates. Even then the nature of the evidence can go unmentioned. For example, in their text book Johnson & Everitt (2000) include no evidence or citations in which to find evidence underlying their estimate. Of the claims in the primary literature, we again see a lack of independent evidence in as much as someone else's estimate is reported. For example, Chard (1991); Drife (1983) We now know that for every successful pregnancy that results in a live birth many, perhaps as many as five early embryos will be lost or will "miscarry"…. This is clearly a heuristic estimate! The point is that there is less of a monolithic ensemble of flawed estimates that need to be debunked than one might imagine given Dr. Jarvis' passage. In any case, there is nothing inherently problematic about the citations just described. Indeed, it would be preferable if attributions were better and speculation was better highlighted as such. Nonetheless, such estimates should be used with caution but not discarded, given the substantial difficulties associated with the estimation of fetal wastage in humans.
An ideal future investigation of fetal wastage is easy to imagine: daily assessment of EPF and hCG for a cohort of women attempting to get pregnant. Easier said than done! Consider what such a study would require: a reliable assay for EPF, the enrollment of thousands of women, collection of and accurate assessment of thousands of samples, and more. Perhaps these technical and logistical barriers can be overcome soon. In the meantime, we can recognize that there is strong circumstantial evidence that human fetal wastage is likely between 50 and 75%. At the same time, we can recognize along with Dr. Jarvis that this conclusion lacks definitive proof and that additional investigations and scrutiny are needed.

We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

[37-90%] is not a confidence interval, I do not refer to it as such, and nor can it be, since there are no data. As described in the article, it is the range within which 95% of simulated estimates fall, based on Roberts & Lowe's speculative values and other assumptions.

The reviewers suggest that my analysis lacks "practical quantitative value". I agree. This is the point and I am glad they have recognised it, if not entirely appreciated its significance. My analysis has "no practical quantitative value" for estimating the number of conceptions that are lost. As I explicitly point out, the sole purpose of the sensitivity analyses is to show that modest changes in the speculative estimates used by Roberts & Lowe may result in any biologically plausible value for embryo loss. The Lancet is not 'Gray Literature'. I comment further on this below. Contrary to the reviewers' suggestion, we are not completely "ignorant of the training of Drs. Roberts & Lowe" or unaware of their experience in "research practice and data analysis". Charles Ronald Lowe was the more senior of the two. He was 63 years old and Professor of Social and Occupational Medicine at the University of Wales College of Medicine when The Lancet article was published. He "contributed much to the growth of academic public health and the teaching of epidemiology and statistics." I do not describe their work as "useless"; if intended as a quote, then it is a misquote. I describe it as having "no practical quantitative value". These are carefully chosen words. (I have edited the equivalent phrase in the Abstract to match the full text.) The critique offered by the reviewers and their description of the paper as heuristic support this view. Nevertheless, I have added a statement that, as a model for highlighting factors that influence fecundity, the Roberts & Lowe analysis has some value.
In all fairness, on four separate occasions, I describe the analysis of Roberts & Lowe as a "hypothesis", i.e., the banner under which it was originally published in The Lancet. Indeed, they describe their arithmetic as "speculative"; however, they also describe their estimate as "conservative", implying that the true result may be even higher than 78%. My critique would be less germane had their hypothesis not been cited so widely ("more than 300 times", as helpfully pointed out by the reviewers). I suggest that it is not I, but those who enthusiastically cite it who treat it as "something that it isn't".

Life Tables of Intrauterine Mortality
I do not consider Leridon's chapter a "casual exercise" or "superficial". On the contrary, it is a well-reasoned attempt to answer a challenging biological question. I have included a tribute in my article to Leridon's review. I hope this prevents readers from gaining such false impressions.
I agree with the reviewers that Leridon's 63% is close to my 40-60%. However, Roberts & Lowe's 78% is not, as they imply.

that Hertig's conclusion does not have a "solid quantitative underpinning"; however, it is precisely the quantitative underpinning of Leridon's life table and other claims about early natural embryo mortality. This is a key point of my article. It is not clear what the reviewers mean by 'impressionistic': some authors seem to offer an 'unimpressionistic' account of Hertig. For example, in the widely-cited 'Black Box' review, Macklon et al. write regarding Hertig's study: "…the high rate of early pregnancy loss before the time of the first missed period was thus clearly demonstrated…" Other less widely-cited articles do address the design and analytical shortcomings.
Pointing out shortcomings in studies is what scientists (and reviewers) are meant to do. Thus, I agree with the reviewers that they are "worth mentioning". Furthermore, by pointing out that Hertig's subjects were of proven fertility, had gynaecological problems and may have had suboptimal reproductive function, I am placing Hertig's study "in proper context". This is not "un-useful". Nevertheless, I have edited this section, to accommodate these reviewers' scepticism with the more positive view of others. I hope I have struck an acceptable balance.
Orzack & Zuckerman appear to have concerns with well-established statistical techniques, referring to my "so-called 'bootstrap' samples". I agree that bootstrapping is "not a 'cure' for small sample size", but I do not claim that it is. Bootstrapping can provide estimates of precision when it is not possible to calculate these analytically. As with all analyses, outputs require appropriate interpretation.
The reviewers state that the "analyses of the bootstrap results are incorrect" because some of the confidence intervals lie outside the range 0-100%. I am aware that this is impossible (for a probability) as I explicitly point out. Such outputs do indicate a serious flaw in the analysis, which is as follows: Hertig ignores 47 of his 107 cases. These cases are included in my bootstrap. The reader may consider whether ignoring 44% of the data is reasonable and the extent to which by doing so Hertig has generated biased estimates of the probabilities he calculates. Kline et al. (1989) make a similar point: "The missing data are sufficient to engender an entirely different result". The bootstrap therefore illustrates the extent to which Hertig's estimates are biased by ignoring his own data. There are other reasons to doubt the precision of his conclusions and the representative nature of the subset of data upon which he relies so heavily; these are described in the article.
The bootstrap pseudo-datasets are available for scrutiny (Dataset 4). Thus, if there are any flaws in my reasoning or bootstrap, the reviewers may point these out. I used the percentile method (to which they refer) to calculate the 95% CIs and I have edited the text to clarify this. I do not believe there are any flaws in my bootstrap.

IVF/ART data
There is a wealth of data from IVF/ART studies and I have only mentioned a tiny proportion of this.
Orzack & Zuckerman and a previous reviewer suggest that such data could contribute to a quantitative understanding of the in vivo situation. In the broadest sense, this is of course true. However, there are difficulties in extrapolating from in vitro to in vivo circumstances. I am not alone in pointing this out, and I have illustrated some of these difficulties in the Discussion.
My description of "sub-optimal conditions for embryo culture" is drawn from two papers:
1. Bolton & Braude (1987): "Optimal culture conditions for human embryos have yet to be defined" and "suboptimal culture conditions are undoubtedly responsible for a proportion of this embryonic failure".
2. Bolton et al. (2015): "Embryo culture conditions in vitro are likely to be suboptimal compared to those in vivo."
Is this just rhetoric or a reasonable consideration?
Describing in vitro data as the "best available" is a weak claim in the absence of equivalent natural in vivo data. The extent to which in vitro embryos are representative of in vivo embryos is precisely the point in question. Is there really numerical consistency between natural and IVF/ART embryos? There may be consistency in sex ratios, but does that extend to aneuploidy rates, mosaicism, epigenetic defects, implantation potential, spontaneous abortion rates, etc? These are big questions and this article is not the place to answer them. However, if 70% loss is the natural benchmark by which IVF/ART embryos are judged to be equivalent to natural embryos, but the true rate of natural loss lies in the range 40-60%, this casts doubt on the judgement that IVF/ART and natural embryos are equivalent. Furthermore, the suggestion that IVF/ART and natural embryos may be different is neither radical, novel, nor strong. However, the real reason I do not consider IVF/ART embryo data is that the article is a critique of data from natural circumstances. Comparison of natural and IVF/ART embryos is a project for the future.
The reviewers refer to my "tenuous or irrelevant reasons" why ART embryos are not useful for quantifying early embryo mortality, yet they provide the perfect reason themselves: "it is true that ART embryos are different from natural embryos in ways that could influence an estimate of fetal wastage". Nevertheless, I do discuss circumstances in which different ART interventions (e.g., observation of fertilisation in vitro per se; retrieval of embryos following timed artificial insemination, as well as AID/IVF success rates) may cast light on embryonic/fetal wastage.
Orzack & Zuckerman extrapolate from 84,881 aneuploidies among 139,704 IVF/ART embryos to an "implied percentage of fetal wastage" of 60.8%. They state that this is the "central tendency" of "many reports" that I dismiss as unreliable. Of course, if this were true, then the observation would add little to what was already known. It is not clear which are the "many reports".
Let us consider the hypothesis that in vitro aneuploidy predicts natural total fetal wastage. Firstly, "The only well-established epidemiological facts about EPL {early pregnancy loss} are that about 50-60% of cases are associated with a chromosomal defect of the conceptus", suggesting that euploid embryos may also fail. Secondly, "FISH may overestimate the incidence of aneuploidy", suggesting a proportion of apparently aneuploid embryos may not fail. Furthermore, aneuploidy may not developmentally compromise embryos; estimates of IVF/ART embryo aneuploidy/mosaicism vary considerably; mosaic embryos can self-correct; aneuploidy in trophoblast/placental cells may be less developmentally problematic (who knows, it may even be advantageous!). The point is simple. There are too many undefined variables associated with IVF/ART embryos to shed more than the faintest light on the question of natural embryo survival. I have included a brief discussion of some of these issues and edited the penultimate paragraph to be more circumspect by replacing an "are" with a "may be". I hope this meets with the reviewers' approval.

Gray Literature
On several occasions, the reviewers refer to Gray Literature. They offer a revealing account and speculate on its continuing 'life'.
Gray Literature has been defined as follows: "That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers." The list of references reproduced by the reviewers, starting with Opitz, 2002 and ending with McCoy et al., 2015, are all from academic books, journals, or text books. They are all published by commercial publishers. They were all written (with one exception) by medical practitioners or scientists, many of whom are experts in reproductive biology. The one exception (Harris, 2003) is a moral philosopher; however, the reviewers usefully point out that his estimate comes from a well-known and eminent reproductive biologist.
None of this is Gray Literature.
Human Reproduction Update, Fertility & Sterility & PLOS Genetics are reputable academic journals. Many of these articles will have been peer-reviewed. Even pieces "akin to newspaper articles" (the Drife (1983) BMJ piece could be described as such and was probably not peer-reviewed) are subject to editorial control, and an expectation of academic professionalism is surely reasonable from such experts.
The reviewers state that it would "be preferable if attributions were better and speculation was better highlighted". I agree. Yet they highlight my 'so-called' "errors of analysis" and "overstatement" whilst passing over errors and overstatement in these citations as "nothing inherently problematic".
What Orzack & Zuckerman describe and defend is not Gray Literature, but 'Gray Scholarship'.

Heuristics
A heuristic estimate may be based on simplified quantitative criteria, educated guesswork, rules of thumb, common sense, past experience, etc. Despite their utility, in the absence of evidence heuristic estimates may become biased. Faced with inconsistent estimates, on the one hand, those that are heuristic or based on circumstantial evidence, and on the other, those based on well-defined analysis of relevant data, surely an appropriate scientific response is to favour the latter and re-evaluate the former.
A further problem with heuristic estimates is that the process for deriving them is not always transparent. For example, it is not obvious how Orzack & Zuckerman use the "web of circumstantial evidence" to which they refer to conclude that "human fetal wastage is likely between 50 and 75%". There is something 'gray' about this. My estimates of 10-40% preimplantation loss and 40-60% total loss are partly evidence-based and partly heuristic. They may be imperfect, and no doubt will not be the last word on the matter, but it is at least clear how they were derived.

Thanks for the opportunity to review this high-quality manuscript. Peer review can be a chore, but this was a pleasure to read. I will state that my training is in statistics and research methodology. Although much of my work is in the field of fertility, I have no clinical expertise and no familiarity with the literature discussed in this review. Any comments I make are from the point of view of the statistician and, with respect to the subject-matter, the layperson.
I am unable to comment on whether or not the body of evidence discussed in the review is comprehensive. However, the critical appraisal of these studies is conducted to a high standard, with a strong command of quantitative research methods on display. I can't fault it. The reader is left in no doubt as to the considerable limitations (many of which appear to be fatal) of these studies. All data used in the manuscript have been made available for the purposes of reproducing the analysis.

I was slightly confused by the description of the simulation study as a two-stage procedure in the critique of Roberts & Lowe. If I understand correctly, sets of simulated values for five quantities were drawn from Normal distributions centred around the estimates used by Roberts & Lowe, with standard deviations equal to these values multiplied by 0.2. Each time a new set of these five quantities was drawn, the values were used to calculate (predict) a value for embryo loss. This was done 100,000 times. However, the author speaks of 1,000 simulations, each containing 10,000 separate estimates. It is unclear what exactly varied within and between the 1,000 simulations. If the data generating model was the same for all of these (i.e. this was just done for computational reasons), then it would be helpful if the author could make this clear in the text.
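To make the point concrete, the procedure as understood above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code: the five point estimates, the fixed number of births and the form of `predicted_loss` are hypothetical placeholders, not the figures or formula used by Roberts & Lowe. The sketch only shows that, if the data generating model is the same throughout, 1,000 batches of 10,000 draws amount to a single run of 100,000 draws.

```python
import numpy as np

# Hypothetical point estimates for five input quantities.
# These are placeholders, NOT the values used by Roberts & Lowe or the author.
POINT_ESTIMATES = np.array([1.0e6, 100.0, 0.25, 0.07, 0.5])
BIRTHS = 2.0e5  # hypothetical fixed number of births


def predicted_loss(draws):
    """Assumed form: loss = 1 - births / estimated fertilisations,
    where fertilisations is the product of the five drawn quantities."""
    fertilisations = draws.prod(axis=1)
    return 1.0 - BIRTHS / fertilisations


def draw_losses(n_draws, rng, rel_sd=0.2):
    """Draw n_draws independent sets of the five quantities from Normal
    distributions centred on the point estimates, with SD = rel_sd * estimate
    (untruncated), and return the predicted loss for each set."""
    draws = rng.normal(
        loc=POINT_ESTIMATES,
        scale=rel_sd * POINT_ESTIMATES,
        size=(n_draws, POINT_ESTIMATES.size),
    )
    return predicted_loss(draws)


rng = np.random.default_rng(0)

# One run of 100,000 draws.
single_run = draw_losses(100_000, rng)

# 1,000 batches of 10,000 draws from the same generating model.
# Kept small in memory here by summarising each batch with its median.
batch_medians = np.array(
    [np.median(draw_losses(10_000, rng)) for _ in range(1_000)]
)

print("median loss, single run:", np.median(single_run))
print("mean of batch medians:  ", batch_medians.mean())
print("spread of batch medians:", batch_medians.std())
```

Under these assumptions the batch medians scatter around the single-run median by Monte Carlo noise only, so the batching would be a way of summarising sampling variability rather than a second stage of the model.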
The author assumed that the simulated quantities were independent in the simulation. I confess to having no real intuition as to the implications of this assumption. However, I don't believe this would affect the author's conclusion.
One minor typo: 'this is far from being a robust pregnancy diagnosis and in different study [46]…'

I believe that it would be appropriate to accept this manuscript without revision, although the author may wish to clarify the point about the first simulation described above.
Competing Interests: Conducting peer review may be beneficial to my career. I am funded by a Doctoral Research Fellowship from the National Institute for Health Research (DRF-2014-07-050). The views expressed in this peer review are those of the authors and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health. JW is a statistical editor of the Cochrane Gynaecology and Fertility Group.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

function would be able to understand the arguments he is providing.
We are pleased to see this article being written. We think it is timely, thought-provoking and this is an excellent moment in which to consider in realistic terms the kind of evidence that is constantly requoted in the debate about how fertile the human species is. Currently this topic is dominated by data from studies on women who are sub/infertile receiving medical support to achieve a pregnancy.

Specific points
Who is the audience for this paper and does the introduction set the scene in such a way that the reader will be both interested and motivated to read the remaining part of the paper, which I would like to see them do? I think as written the Introduction may not achieve this objective. For example, the first sentence starts with some glib comments about it being 'widely accepted' that under natural circumstances human embryo mortality is high, and then there is an extensive section quoting a number of popularist articles and websites. Why have this up front? It seemed to undermine the erudite arguments of the rest of the paper.
The second paragraph with some modification would make a sufficient introduction. The aim of the review as stated in the discussion 'How many fertilized human embryos..?' should also be frontloaded at some point here. Clearly embryo mortality is of interest to both reproductive biologists and fertility doctors but why not also mention couples trying to conceive?
Reading the introduction we were struck by the pressing need for a 'key terms' box, the kind of thing you see in Nature papers, where there is a definition of each of the terms used, e.g. fecundability, embryo, HCG, etc. If this paper is going to be read by individuals who are not fertility experts or experts in reproductive biology but people interested in ethics or chance or statistics, I think they will be very confused by the different terms that are used.
What is not clear from the paper is the chronology of the observations/data being discussed. It is common for people (even those familiar with the field but who work on animal models) to be very confused by the timings in women: for example, the day on which fertilisation takes place versus the last menstrual period, or fertilisation versus gestation versus the first day (depending on when you count from) on which you might reasonably expect to detect HCG in the urine. We would argue there needs to be a figure defining when each of these happens in terms of days in a woman's reproductive span. This could also help clarify the points in the process that the probabilities π_FERT, π_CLIN etc. can apply to.
The second piece of information where we think it would be very helpful is under the section called 'What the data say', where terms such as 'old' are added and there are no dates or refs provided. What do they mean by 'old': pre-1960, pre-1950, pre-1940? Because the author has used numbered references, there is also no sense of the relationship of one study to another in terms of dates, i.e. how they chronologically relate to each other. Some minor reworking in which the author says, for instance, "the work of Hertig and Rock in the 1950s" would be helpful.
The author is also slightly confusing when talking about the pregnancy study (ref 42), not giving the names of the authors nor the date on which it was published in the section on page 4, and then in the reference, for instance in Fig 2, they talk about the pregnancy study ref 42 but in the figure it is shown as French and Bierman 1962. This is the kind of thing that makes it difficult to get a sense of the chronology of observations and how people have built on each other's observations in order to support subsequent studies, and this after all is one of the most crucial points of this paper.
On page 6 we finally get to some discussion about modern pregnancy tests. It is not until some pages after that we know whether they are in blood or urine. Mid-cycle elevation of HCG is not defined in terms of days (cf. comments above). For information, the fact that these assays were likely to be urine-based assays is not mentioned until page 7.
We think many aspects of this paper are extremely well argued, particularly the treatment of the data provided. The analysis in Table 3 and in other parts of page 7 is very detailed, and some very good points are made about the over-emphasis on using data from patient groups in which infertility is probably one of the reasons for presentation, which may have produced a less robust data set.
The author makes a valid argument about potential subfertility within the Hertig cohort but this is not balanced. Equally, these women were selected for proven fecundity and this factor affects interpretation of this cohort as much as the other.
On page 10 the discussion starts with a key question: how many fertilised human embryos die? It is slightly frustrating that this was not put up front as the question being addressed in this paper. The author might like to consider setting out the aims more clearly.
Again, in the discussion, many of the arguments being made would have been greatly enhanced by telling us the dates on which some of these studies were conducted. When looking at the reference list I see many of them were in the '80s and early '90s.
We wonder if the first paragraph on page 12 might reasonably be eliminated; it feels repetitive compared to other parts of the paper. I think the discussion of the Macklon review (ref 20) is extremely insightful and useful. However, we draw the author's attention to a more recent study by Macklon and Brosens which we believe puts forward some interesting arguments that might reasonably be discussed in his study, about how the endometrium in which the embryos are set to implant might be acting as a 'sensor' of embryo quality. This is in Biology of Reproduction 2014, vol 91. There is also a complementary paper in Sci Rep, vol 6, Brosens et al. 2014.
The conclusion of the discussion seems more like a continuation of the critique of the final few paragraphs. It would be desirable to provide a concluding paragraph which holistically draws together the content of the review. Again, the heavy quoting of references, as in the introduction, masks the opportunity for the author to provide his own conclusions.
In summary we welcome this review which we think makes many erudite comments on a difficult field.
Competing Interests: No competing interests were disclosed.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.
The question (slightly modified) has been included in the Abstract and Introduction. I hope that this helps to clarify and reinforce the purpose of the article.

Some dates have been incorporated to enhance chronological clarity (see point 6).

The first paragraph on page 12 addresses the importance of biological variance. It does not go into detail but stresses that point estimates of risk do not provide the whole picture when considering either populations or individual cases. As I have put it, a neglect of variance fosters a misleading appreciation of reality. I would prefer to retain this paragraph, in the hope that it will encourage readers to consider the importance and implications of numerical diversity when interpreting data.

The arguments proposed by Macklon and Brosens relating to endometrial receptivity are indeed interesting. However, they propose mechanistic explanations for implantation failure and do not directly address the issue of how frequently such events occur. Nevertheless, their inclusion is contextually valuable, and I have made some comments on their studies.

The final paragraph has been edited. The quotations are useful to make it clear that I am not alone in drawing attention to the limitations of the available data. I have endeavoured to summarise the broad purpose and value of this work.

Competing Interests: None