Early embryo mortality in natural human reproduction: What the data say

It is generally accepted that natural human embryo mortality during pregnancy is high: losses of 70% and higher from fertilisation to birth are frequently claimed. The first external sign of pregnancy, a missed menstrual period, occurs about two weeks after fertilisation. Establishing the fate of embryos before this is challenging, and hampered by a lack of data on the efficiency of fertilisation under natural conditions. Four distinct sources are cited to justify quantitative claims regarding embryo loss: (i) a hypothesis published by Roberts & Lowe in The Lancet is widely cited but has no practical quantitative value; (ii) life table analyses give consistent assessments of clinical pregnancy loss, but cannot illuminate losses at earlier stages of development; (iii) studies that measure human chorionic gonadotrophin (hCG) reveal losses in the second week of development and beyond, but not before; and (iv) the classic studies of Hertig and Rock offer the only direct insight into the fate of human embryos from fertilisation under natural conditions. Re-examination of Hertig's data demonstrates that his estimates for fertilisation rate and early embryo loss are highly imprecise and casts doubt on the validity of his numerical analysis. A recent re-analysis of hCG study data suggests that approximately 40-60% of embryos may be lost between fertilisation and birth, although this will vary substantially between individual women. In conclusion, it is clear that some published estimates of natural embryo mortality are exaggerated. Although available data do not provide a precise estimate, natural human embryo mortality is lower than is often claimed.

Early human embryo mortality is of interest not only to reproductive biologists and fertility doctors, but also to ethicists 31 , theologians 32 and lawyers 2 . Nevertheless, becoming pregnant and having children is of primary and personal importance to many women and their families. As with all biological processes, nothing works perfectly all the time 33 , and failure to conceive and pregnancy loss are common problems. However, inconsistent estimates of early pregnancy loss are not reassuring, nor do they provide a sound basis for either a quantitative understanding of natural human reproductive biology or an unbiased appraisal of artificial reproductive technologies. The divergent and excessive values noted above therefore invite scrutiny of the evidence that supports them. In this article, I identify and re-evaluate published data that contribute to claims regarding natural human embryo mortality.

A quantitative framework for embryo mortality
A quantitative framework has been proposed to facilitate the calculation and comparison of embryo mortalities from fecundability and pregnancy loss data 34 . The model comprises conditional probabilities ($\pi$) of the following biological processes: (1) reproductive behaviours resulting in sperm-ovum co-localisation per cycle, $\pi_{SOC}$; (2) successful fertilisation given sperm-ovum co-localisation, $\pi_{FERT}$; (3) implantation of a fertilised ovum, as indicated by increased levels of human chorionic gonadotrophin (hCG), $\pi_{HCG}$; (4) progression of an implanted embryo to a clinically recognised pregnancy, $\pi_{CLIN}$; (5) survival of a clinical pregnancy to live birth, $\pi_{LB}$.
Fecundability is the probability of reproductive success per cycle, but may take different values depending on the definition of success. The following four fecundabilities broadly follow Leridon 30 :
1. Total (all fertilisations): $FEC_{TOT} = \pi_{SOC} \times \pi_{FERT}$
2. Detectable (implantation): $FEC_{HCG} = \pi_{SOC} \times \pi_{FERT} \times \pi_{HCG}$
3. Apparent (clinical): $FEC_{CLIN} = \pi_{SOC} \times \pi_{FERT} \times \pi_{HCG} \times \pi_{CLIN}$
4. Effective (live birth): $FEC_{LB} = \pi_{SOC} \times \pi_{FERT} \times \pi_{HCG} \times \pi_{CLIN} \times \pi_{LB}$
Hence, the probability that a fertilised egg will perish prior to implantation is $(1 - \pi_{HCG})$, and prior to clinical recognition is $[1 - (\pi_{HCG} \times \pi_{CLIN})]$. In theory, embryonic mortality may be estimated at different stages; in practice, however, this depends on the available data. Clinical and live birth fecundabilities are most easily quantified and most frequently reported. Total and detectable fecundabilities are less frequently reported, although they are of direct relevance.
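The framework translates directly into code. The sketch below is illustrative only: the class name `StageProbabilities` and the example probabilities are hypothetical, chosen to show that each fecundability is a running product of the conditional probabilities and that stage-specific embryo mortality follows from ratios of fecundabilities, e.g. $1 - FEC_{HCG}/FEC_{TOT} = 1 - \pi_{HCG}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageProbabilities:
    """Hypothetical container for the per-cycle conditional probabilities (pi)."""
    soc: float   # sperm-ovum co-localisation per cycle
    fert: float  # fertilisation, given co-localisation
    hcg: float   # implantation (hCG rise), given fertilisation
    clin: float  # clinical recognition, given implantation
    lb: float    # live birth, given clinical pregnancy

    def fecundabilities(self) -> dict:
        """Total, detectable, apparent and effective fecundability per cycle."""
        tot = self.soc * self.fert
        hcg = tot * self.hcg
        clin = hcg * self.clin
        lb = clin * self.lb
        return {"TOT": tot, "HCG": hcg, "CLIN": clin, "LB": lb}

    def embryo_mortality(self) -> dict:
        """Probability that a fertilised egg is lost before each stage."""
        fec = self.fecundabilities()
        return {
            "before_implantation": 1 - fec["HCG"] / fec["TOT"],  # = 1 - pi_HCG
            "before_clinical": 1 - fec["CLIN"] / fec["TOT"],     # = 1 - pi_HCG * pi_CLIN
            "before_birth": 1 - fec["LB"] / fec["TOT"],
        }


# Purely illustrative values, not estimates from any study.
example = StageProbabilities(soc=0.8, fert=0.9, hcg=0.7, clin=0.8, lb=0.85)
print(example.fecundabilities())
print(example.embryo_mortality())
```

Because $\pi_{SOC}$ and $\pi_{FERT}$ cancel in every ratio, mortality from fertilisation onwards can only be recovered from fecundabilities if $FEC_{TOT}$ (and hence $\pi_{SOC} \times \pi_{FERT}$) is known.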

What the data say
Publications containing data relevant to early human embryo mortality were identified primarily by tracing citations found in articles, reviews and textbooks. Systematic online searches did not capture all of these studies: some are particularly old, many were not conducted to address this specific question, and others are in books or publications that are not adequately indexed. Although it may not be entirely complete, the data presented nevertheless form a substantial proportion of the relevant, available scientific information on natural early human embryo mortality.
Studies that contribute analysis and data relevant to the quantification of natural human embryo mortality fall into the following four categories and will be considered in turn.

1. A speculative hypothesis published in The Lancet.
2. Life tables of intra-uterine mortality.
3. Studies of early pregnancy by biochemical detection of hCG.
4. Anatomical studies of Dr Arthur Hertig and Dr John Rock.

1. Where have all the conceptions gone?
In 1975, a short hypothesis published in The Lancet entitled "Where Have All The Conceptions Gone?" concluded that 78% of all conceptions were lost before birth 26 . It has been widely cited by both scientists 4,17,19,20,35 and non-scientists 36,37 alike. Conceptions among married women aged 20-29 in England and Wales in 1971 were estimated and compared with infants born in the same period. This analysis (Table 1) contains reliable values, e.g., census data, and simple arithmetical calculations. However, speculative values were used for: (1) the fertilisation rate following unprotected coitus during the fertile period, estimated as 50% and supported by reference to Hertig 38 (although his estimate was 84% 33 ); (2) the length of a menstrual cycle (28 days); and (3) the duration of the fertile period (2 days). These latter values are plausible, but also variable. No justification is provided for three behavioural variables: (1) coital frequency, estimated at twice per week; (2) the proportion of unprotected coital acts, estimated at 25%; and (3) either a random or regular distribution of coital acts during menstrual cycles such that 1/14 of all coital acts fall within a fertile period.
Notes to Table 1: ‡ The median values of the 2.5th and 97.5th percentile boundaries from 1,000 simulations, each containing 10,000 separate estimates for embryo loss. The derivation of these values is described in the text. Briefly, each separate estimate of embryo loss was calculated using variable speculative values that were obtained by random sampling from a normal distribution with a mean equal to the Roberts & Lowe value and a coefficient of variation of 20%. The median value of the mean percentage loss was 73.3% and of the median was 76.5%. ¥ The most frequent duration of a menstrual cycle is 28 days, but there is substantial variability and the mean length is generally 30-31 days 30 .
The validity of Roberts & Lowe's conclusion depends largely on the accuracy and precision of these speculative values. The following two simple analyses illustrate how sensitive their conclusion is to these values.
1. When four of the speculative values are reduced by 25% (e.g., coital frequency reduced to 1.5/week) and cycle length increased by 10% (from 28 days to 31 days 30 ), the estimate for embryo loss drops to 22%. The opposite operation (e.g., coital frequency increased to 2.5/week) results in an estimate of 92% (Table 1). Embryo loss of 22% is barely sufficient to account for observed clinical losses, and 92% indicates a maximum FEC LB of 8%. Neither scenario is biologically plausible.
2. A non-zero variance was applied to each speculative value reflecting their uncertain nature. Using the random number generator in Microsoft ® Excel (Office 2010) simulated values were obtained by random sampling from normal distributions with means equal to Roberts & Lowe's speculative values with coefficients of variation equal to 20%. For simplicity, it was assumed that there was no covariance between the different speculative values. Table 1 shows the expected range within which 95% of these simulated values fall (e.g., coital frequency is 1.2-2.8/week). For each simulated record, a new estimate of embryo loss was calculated and from 10,000 of these, the mean, median and 2.5 th and 97.5 th percentiles of embryo loss were determined. This was repeated 1,000 times: the mean value of the simulated means was 73.3% and of the simulated medians was 76.5%. The mean values of the 2.5 th and 97.5 th percentile boundaries for embryo loss were 37% and 90% ( Table 1). The same simulation was also performed using NONMEM 7.3.0 ® (Icon PLC, Dublin, Eire) and generated 100,000 data records. The outcome of this is shown in Figure 1. The code and simulated data values are in Dataset 1. See README.docx for a description of the file.
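Both sensitivity analyses can be sketched in a few lines. This is a hypothetical re-implementation, not the Excel or NONMEM code in Dataset 1. It assumes that estimated conceptions scale with the product of coital frequency, proportion of unprotected acts, fertile-period length and fertilisation rate, divided by cycle length (so that 2/28 = 1/14 of acts fall in the fertile period), and that the census-based number of births is fixed, so births/conceptions equals 0.22 at Roberts & Lowe's inputs. Which four values are varied in analysis 1 is also an assumption. Because the sketch starts from the rounded 78% baseline and uses one large run instead of 1,000 repeated runs, its output will not match Table 1 exactly.

```python
import numpy as np

# Roberts & Lowe's speculative inputs, as quoted in the text.
BASELINE = {
    "coital_per_week": 2.0,  # coital frequency
    "unprotected": 0.25,     # proportion of unprotected coital acts
    "fertile_days": 2.0,     # duration of the fertile period
    "fert_rate": 0.5,        # fertilisation rate after unprotected coitus
    "cycle_days": 28.0,      # menstrual cycle length
}
BASELINE_LOSS = 0.78  # their published estimate of embryo loss


def conception_index(v):
    """Quantity assumed proportional to estimated conceptions (works on arrays)."""
    return (v["coital_per_week"] * v["unprotected"] * v["fertile_days"]
            * v["fert_rate"] / v["cycle_days"])


def embryo_loss(v):
    """Loss = 1 - births/conceptions, with births held at the census value."""
    births_per_index = (1 - BASELINE_LOSS) * conception_index(BASELINE)
    return 1 - births_per_index / conception_index(v)


def scaled_inputs(factor, cycle_factor):
    """Scale the four non-cycle inputs by `factor` and cycle length by `cycle_factor`."""
    v = {k: x * factor for k, x in BASELINE.items()}
    v["cycle_days"] = BASELINE["cycle_days"] * cycle_factor
    return v


def simulate(n=100_000, cv=0.20, seed=2016):
    """Draw every input from a normal distribution (mean = baseline, CV = 20%)."""
    rng = np.random.default_rng(seed)
    draws = {k: rng.normal(m, cv * m, size=n) for k, m in BASELINE.items()}
    loss = embryo_loss(draws)
    return {
        "mean": float(np.mean(loss)),
        "median": float(np.median(loss)),
        "p2.5": float(np.percentile(loss, 2.5)),
        "p97.5": float(np.percentile(loss, 97.5)),
    }


low = embryo_loss(scaled_inputs(0.75, 1.10))   # analysis 1, inputs reduced
high = embryo_loss(scaled_inputs(1.25, 0.90))  # analysis 1, inputs increased
print(f"deterministic: {low:.1%} to {high:.1%}")
print(simulate())
```

The printed summaries can be compared with Table 1; the point of the sketch is that the output is driven almost entirely by the chosen inputs, not by any observed data on embryos.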
The sole purpose of these simple sensitivity analyses is to illustrate that modest adjustments to Roberts & Lowe's original speculative values can result in any biologically plausible estimate for embryo loss. The output from the calculation is therefore substantially dependent on the subjectively selected input. Such an analysis has no practical quantitative value.
Other sources of bias in their model include the failure to account for intentionally terminated pregnancies and the reduced fecundability of already pregnant women and nursing mothers. Despite this, it was described as "persuasive" 39 and it has been claimed that "it is still difficult to better the original calculations of Roberts and Lowe (1975)" 19 . By contrast, others have noted that "their calculations can be criticized" 4 and are "tenuous" 40 . Considering its quantitative limitations, it has been cited surprisingly often 8,20,41 .

2. Life tables of intrauterine mortality
Constructing a life table of intrauterine mortality is challenging since embryonic death may occur even before the presence of an embryo is recognised. Nevertheless, in 1977 Henri Leridon published a complete life table of intrauterine mortality 18 . Leridon highlighted the consequences of inappropriate analysis and the quantitative biases produced by alternative numerical methods. Overall, he discussed sixteen studies, and provided detailed commentary on six [42][43][44][45][46][47] . These data are summarised in Figure 2 and suggest that 12-24% of embryos alive at 4 weeks' gestation (i.e., approx. 2 weeks post-fertilisation) will perish before birth.
All recorded pregnancies in the Kauai study were categorised by date of enrolment in four-week intervals, beginning with 4-7 weeks' gestation. This time-staggered approach enabled risk of miscarriage to be associated with stage of gestation. However, despite considerable efforts, only 19% of the 3,197 recorded Kauai pregnancies were enrolled between 4-7 weeks' gestation, thereby reducing the precision of pregnancy loss estimates for this earliest of time intervals. Although pregnancies were grouped in four-week periods, Leridon suggested that early mortality may change week by week, resulting in underestimation of pregnancy loss. He re-allocated the 592 study entries and 32 pregnancy losses for weeks 4-7 (Table 2), generating an overall probability of pregnancy loss during this period of 15.0%, higher than the 10.8% originally reported 42 . Leridon's own description of this interpolation as "risky" can be illustrated by adjusting his re-allocation 18 . Transferring just two of the pregnancy losses out of or into the first week results in estimates of the 4-7 week pregnancy loss of 10.9% and 19.1% respectively (Table 2). The validity of adjusting Leridon's re-allocation may be questioned. However, pregnancy loss in weeks 4-5 of the Kauai Study would manifest as a menstrual period delayed by up to one week. This is far from being a robust pregnancy diagnosis, and in a different study 46 , exclusion of pregnancy losses reported within one week of study entry resulted in substantially different loss probabilities (Figure 2), suggesting a confounding correlation between entry and loss 18 . Nevertheless, the re-allocation does reinforce a concern highlighted by Leridon, namely the uncertainty that affects the first probability. Clearly, these estimates of early loss should be treated with caution.
A more fundamental problem is that these data offer no insight into the fate of embryos prior to the earliest possible point of clinical pregnancy detection. Leridon completed his life table with values from Hertig's analysis 33 . He concluded that among 100 ova exposed to the risk of fertilisation, 16 are not fertilised, 15 die in week one (before implantation), and 27 die in week two (before the menstrual period). After two weeks his life table follows the Kauai probabilities closely, ending with 31 live births. Leridon's table therefore indicates an embryo mortality of 50% (42/84) within the first two weeks after fertilisation and a total mortality of 63% (53/84) from fertilisation to birth.
Leridon's account of intrauterine mortality has been widely cited. However, its accuracy depends entirely on the quality and interpretation of the data from Hertig 33 and French & Bierman 42 . French & Bierman's approach probably resulted in an overestimate of total pregnancy loss and is certainly imprecise in its estimate of embryo loss in the four weeks following the first missed menstrual period. The reliability of Hertig's estimates of embryo loss in the two weeks following fertilisation is considered below.
3. Biochemical detection of pregnancy using hCG
Quantification of pregnancy loss requires pregnancy diagnosis. The earliest outward sign of pregnancy is a missed menstrual period, approximately 2 weeks after fertilisation, although amenorrhoea in women of reproductive age is not exclusively associated with fertilisation 49,50 . Several potentially diagnostic pregnancy-associated proteins have been identified 51 , of which only one, Early Pregnancy Factor (EPF) 52 , has been claimed to be produced by embryos within one day of fertilisation. However, there is doubt about the utility of EPF for diagnosing early pregnancy 53 and little has been published on it in the past five years.
Modern pregnancy tests detect human chorionic gonadotrophin (hCG), a highly glycosylated 37 kDa protein hormone produced by embryonic trophoblast cells 54 . Mid-cycle elevation of hCG is associated with embryo implantation 19,20,55 . Early assays for the detection of hCG were probably confounded by antibody cross-reactivity with luteinising hormone 56 , but modern tests are more specific and a positive result is a reliable indicator of early pregnancy. Highly sensitive assays have revealed low levels of hCG in non-pregnant women and healthy men 57 ; hence, quantitative criteria are required to distinguish between non-pregnant women and those harbouring early embryos 55 . Figure 3 and Table 3 summarise findings from thirteen studies that used hCG to identify so-called early, occult or biochemical pregnancy loss, i.e., pregnancy loss between the initiation of implantation and clinical recognition 58-70 . Notwithstanding design and subject differences, estimates for clinical pregnancy loss, ranging from 8.3% to 21.2% (Figure 3), are similar to previous estimates (Figure 2). Estimates for early/occult loss ranged from 0% to 58.3% in studies 58-62 prior to Wilcox in 1988 63 . This high variance was probably due to reduced specificity and sensitivity of the hCG assays and sub-optimal study design 48,51,71-74 . Studies from 1988 63 onwards have produced more consistent data indicating early/occult loss of approximately 20% (Figure 3).
In the three largest studies 63,66,70 pregnancies were clinically recognised only if they lasted ≥6 weeks after the onset of the last menstrual period 66,75 . Hence, early pregnancy losses in these studies included those lost up to approximately two weeks after a missed menstrual period: this may influence comparison of study results 34,73 . An overview of the thirteen studies suggests that overall pregnancy loss from first detection of hCG through to live birth is approximately one third (Table 3). This is consistent with another recent study which found that 98 out of 301 (32.6%) singleton pregnancies diagnosed by an early positive hCG test and followed-up to either birth or miscarriage were lost 76 . See README.docx for a description of the file.
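The overall figure of about one third follows from compounding the two stage-specific losses, because a pregnancy must survive both the pre-clinical and the clinical phase. A minimal sketch, using the approximate 20% early/occult loss and the 8.3-21.2% range of clinical loss quoted above (the function name is hypothetical, and treating the two losses as sequential conditional probabilities is the assumption being illustrated):

```python
def overall_loss(early: float, clinical: float) -> float:
    """Loss from first hCG detection to birth when both stage losses apply in sequence."""
    return 1 - (1 - early) * (1 - clinical)


for clinical in (0.083, 0.15, 0.212):
    print(f"clinical {clinical:.1%} -> overall {overall_loss(0.20, clinical):.1%}")
```

Across the quoted range of clinical loss this gives roughly 27-37% overall, bracketing both the one-third summary and the 32.6% observed in the more recent study.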
The much-cited Wilcox study 63 is the earliest of several large, well-designed studies that made use of a specific and sensitive hCG assay and led to numerous further publications 75,77-83 . Two other studies (Zinaman 65 and Wang 66 ) were similar in purpose, design and execution. These studies provide some of the best available data to calculate pregnancy loss between implantation and birth 34 . In each study, women intending to become pregnant and with no known fertility problems were recruited and hCG levels monitored.
Table 3. Summary data from thirteen studies using hCG detection to diagnose pregnancy and identify early pregnancy loss. Raw $FEC_{HCG}$ is the ratio of hCG pregnancies detected to the number of cycles monitored in each study. Where available, mean (SD) ages of the participating women are taken directly from the published study. In some cases mean and SD (indicated by *) or SD (indicated by †) were estimated based on published demographic characteristics. § These data relate to the whole study cohort (n=124), which included known sub-fertile women, and not just to the 74 apparently fertile women. ‡ Mean value from Wilcox et al. (2001) 78 . ¶ Some studies only provide data up to late pregnancy (e.g., up to 28 weeks) rather than to term. ND = no data. ¤ Wilcox subsequently reported an additional hCG pregnancy which had not been detected and reported in the 1988 paper, making a total of 199 hCG pregnancies and 44 pre-clinical losses in the study group 75 . # Mumford reported data from aspirin- and placebo-treated subjects who had at least one prior miscarriage. Summary data from both treatment groups are included as there was no effect of aspirin 70 .

Why does a proportion of menstrual cycles in women attempting to conceive fail to show any increase in hCG? Since $FEC_{HCG} = \pi_{SOC} \times \pi_{FERT} \times \pi_{HCG}$, there can be various causes for this failure, including mistimed coitus, anovulation, failure of fertilisation or pre-implantation embryo death. Although $FEC_{HCG}$ puts limits on the extent of pre-implantation embryo loss, uncertainty in the estimates of $\pi_{SOC}$, $\pi_{FERT}$ and $\pi_{HCG}$ translates into uncertainty in estimates of pre-implantation embryo mortality. In the Wang study, for normally fertile women, $FEC_{HCG}$ = 46.2%; hence, the absolute maximum value for pre-implantation embryo loss is 53.8%, and this is reached only if $\pi_{SOC} = \pi_{FERT} = 1$, conditions both extreme and unlikely 34 . Studies of the relationship between coital frequency and conception indicate that fecundability is greater with daily compared to alternate day intercourse 34,84,85 . Hence, when coital frequency is less than once per day, a proportion of reproductive failure will be due to mistimed coitus, i.e., $\pi_{SOC} < 1$. In the Wilcox study, coitus occurred on only 40% of the six pre-ovulatory days 34,79 , and in the Zinaman study participants were advised that alternate day intercourse was optimal 65 . Based on the difference in fecundability between daily and alternate day intercourse as modelled by Schwartz 85 , a value of $\pi_{SOC} = 0.80$ was used to calculate pre-implantation embryo mortality 34 . However, this is a speculative estimate, and in reality the value may be higher or lower.
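The bound on pre-implantation loss can be made explicit: rearranging $FEC_{HCG} = \pi_{SOC} \times \pi_{FERT} \times \pi_{HCG}$ gives $1 - \pi_{HCG} = 1 - FEC_{HCG}/(\pi_{SOC} \times \pi_{FERT})$. The sketch below uses a hypothetical function name, and the $\pi_{FERT}$ values in the loop are arbitrary illustrations rather than estimates. It shows how strongly the result depends on the assumed values, and it rejects combinations that would force $\pi_{HCG}$ above 1.

```python
def pre_implantation_loss(fec_hcg: float, pi_soc: float, pi_fert: float) -> float:
    """1 - pi_HCG, inferred from detectable fecundability and assumed pi_SOC, pi_FERT."""
    reach = pi_soc * pi_fert  # probability that a cycle yields a fertilised egg
    if fec_hcg > reach:
        raise ValueError("assumed pi_SOC * pi_FERT implies pi_HCG > 1")
    return 1 - fec_hcg / reach


FEC_HCG_WANG = 0.462  # normally fertile women in the Wang study
print(pre_implantation_loss(FEC_HCG_WANG, 1.0, 1.0))  # extreme upper bound
for pi_fert in (1.0, 0.9, 0.7):  # illustrative fertilisation efficiencies only
    loss = pre_implantation_loss(FEC_HCG_WANG, 0.80, pi_fert)
    print(f"pi_SOC=0.80, pi_FERT={pi_fert}: {loss:.1%}")
```

With $\pi_{SOC} = 0.80$, moving $\pi_{FERT}$ from 1.0 to 0.7 roughly halves the inferred pre-implantation loss, which is why the missing fertilisation data dominate the uncertainty.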
A further critical missing piece of the equation is knowledge of the efficiencies of fertilisation and implantation under normal, natural, propitious circumstances. Assuming that either of these processes may be up to 90% efficient, and based on data from the three hCG studies 63,65,66 , a plausible range for pre-implantation embryo loss in normally fertile women is 10-40% and for loss from fertilisation to birth, 40-60% 34 . Even with these wide ranges of mathematically possible outcomes, it is clear that estimates for total embryonic loss of 90% 29 , 85% 28 , 83% 31 , 80-85% 6,27 , 78% 26 , 76% 5,25 and 70% 19-23 are excessive.
A previous review by Boklage concluded that "at least 73% of natural single conceptions have no real chance of surviving 6 weeks of gestation" 5,86 . Live birth fecundability was estimated as "not over 15%", substantially lower than Leridon's 31%. Despite this discrepancy, Boklage's conclusions were derived from a review of data including several hCG studies 55,58-61,63 and Leridon's analysis 18 . He derived a model describing the survival probability of human embryos comprising the sum of two exponential functions: ...in which t is the time in days post-fertilisation. This is the source of the 73% in the conclusion.
There are, however, serious problems with this analysis. Firstly, data presented as embryo survival probabilities at different times post-fertilisation 55,58,59,61,63 are fecundabilities, i.e., successes per cycle, not per fertilised embryo. Secondly, for reasons that are unclear, data from Whittaker 60 and Leridon 18 were excluded from the modelling analysis, and the data from an earlier Wilcox report 55 were included twice, since these preliminary data had been incorporated into the later report 63 . Thirdly, the modelled data were normalised to a survival probability of 0.287 at 21 days post-fertilisation. This value was derived from data published by Barrett & Marshall on the relationship between coital frequency and conception 84 . Barrett & Marshall had concluded that coitus on a single day, 2 days before ovulation, resulted in a conception probability of 0.30. Boklage's value of 0.287 is his calculated equivalent. However, conception in this study was "identified by the absence of menstruation, after ovulation" 84 . Hence, 0.30 (and similarly, 0.287) is a clinical fecundability and not a measure of embryo survival. Furthermore, 0.30 is a non-maximal fecundability, since it was an estimate based on coitus on a single day (2 days before ovulation) within the cycle. Barrett & Marshall clearly report that as coital frequency increased so did the fecundability, up to a maximum of 0.68 associated with daily coitus 84 .
Boklage's analysis can only make biological sense if it is assumed that every cycle in the Barrett & Marshall study resulted in fertilisation. Under these circumstances, failure to detect conception in 71.3% (1 - 0.287) of cycles would be due entirely to embryo mortality. However, this is highly implausible and explicitly contradicted by the higher estimate of fecundability reported 84 . Boklage's implicit assumption also contradicts his further conclusion that "only 60-70% of all oocytes are successfully fertilized given optimum timing of natural insemination" 5 . The vertical normalisation of the hCG study data to a value of 0.287 at 21 days is the principal determinant of the parameters that define the two-exponential model. Any change in this value would commensurately alter the balance between the two implied sub-populations of embryos. Since it is evident that the value of 0.287 is neither an embryo survival rate nor even a maximal fecundability, it follows that quantitative conclusions from this analysis in relation to the survival of naturally conceived human embryos are of doubtful validity.
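The contradiction can be stated compactly within the quantitative framework described earlier, assuming (as an illustration) that an embryo's chance of surviving to clinical recognition does not depend on coital frequency. Because $\pi_{SOC} \le 1$ and $\pi_{FERT} \le 1$, any clinical fecundability is a lower bound on that survival probability, so Barrett & Marshall's maximum of 0.68 is incompatible with a survival of 0.287:

```latex
\begin{aligned}
FEC_{CLIN} &= \pi_{SOC}\,\pi_{FERT}\,\pi_{HCG}\,\pi_{CLIN} \le \pi_{HCG}\,\pi_{CLIN} \\
\Rightarrow\quad \pi_{HCG}\,\pi_{CLIN} &\ge \max FEC_{CLIN} = 0.68 > 0.287
\end{aligned}
```

Under that assumption, embryo loss before clinical recognition, $1 - \pi_{HCG}\pi_{CLIN}$, could not exceed 32% in that population, rather than the 71.3% implied by the normalisation.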
However, Boklage is right about two things: firstly, the difficulty of calculating pre-clinical losses, because "In the place of the necessary numbers for the first few weeks of pregnancy we find editorially acceptable estimates which, while perhaps not far wrong, are difficult to defend with any precision", and secondly, that the source of some of the only directly relevant data (even though he excluded it from his modelling analysis), namely, "Hertig's sample is, and will probably remain, unique".

4. The anatomical studies of Dr Arthur Hertig
At the start of the 1930s, no-one had ever seen a newly fertilised human embryo. It was barely 60 years since Oscar Hertwig had first observed fertilisation in sea urchins 87 , and just 40 years before the birth of the first test tube baby 88,89 . In Boston, Dr Arthur Hertig and Dr John Rock's search to find early human embryos generated an irreplaceable collection which has left an indelible mark on our understanding of human embryology.
Hertig and Rock recruited 210 married women of proven fertility who presented for gynaecological surgery 38 . (In most of their publications, the number is given as 210 33,90,91 although 211 subjects are mentioned elsewhere 38 .) Of these, 107 were considered optimal for finding an embryo because they apparently: (i) demonstrated ovulation; (ii) had at least one recorded coital date within 24 hours before or after the estimated time of ovulation; (iii) lacked pathologic conditions that would interfere with conception. Hertig examined the excised uteri and fallopian tubes, and over fifteen years found 34 human embryos aged up to 17 days 33,38,90-97 . Of these, 24 were normal and 10 abnormal 33,90 . (There is some confusion over this: in three publications 38,91,97 , 21 embryos are described as normal and 13 as abnormal. It appears that the three alternatively described embryos (C-8299; C-8000; C-8290) were originally defined as abnormal based on their position or depth of implantation 38 .) Table 4 provides information about the 34 embryos found in these 107 women. Although the study was primarily intended to find and describe early human embryos, Hertig subsequently used the data to derive estimates of reproductive efficiency including early embryo wastage 33,90 .
Hertig's analysis 33,90 relies heavily on the 15 normal and 6 abnormal implanted embryos found in 36 women from cycle day 25 onwards. He assumed the 6 abnormal embryos would perish around the time of the first period concluding that fertility (% pregnant) at this stage = 42% (15/36). Of the 8 preimplantation embryos identified (7 in the uterus and 1 in the fallopian tubes), 4 were abnormal. Hertig assumed the 4 normal embryos would implant successfully but that some of the abnormal ones would not, such that the proportion of normal embryos would increase from 50% (4/8) before implantation to 71% (15/21) after implantation as observed. Hence, among the 36 post-cycle day 25 cases, in addition to the 15 normal embryos, there must have been 15 abnormal pre-implantation embryos of which 60% (9/15) failed to implant and were not observed, and 40% (6/15) did implant and were observed, although these 6 would have perished shortly afterwards. This left 6/36 eggs that must have been unfertilised. The ratio of 'unfertilised' : 'fertilised abnormal' : 'fertilised normal' was therefore 6:15:15, matching the 16% infertility (no fertilisation), 42% sterility (post-fertilisation death) and 42% fertility (reproductive success) reported in Figure 9 of Hertig's article, "The Overall Problem in Man" 33 . This is the source of Hertig's 84% fertilisation rate and 50% embryo loss before and during implantation, and is reproduced in Leridon's life table 18 as 84/100 eggs surviving at time zero (ovulation and fertilisation) and 42 surviving to 2 weeks (time of first missed period).
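Hertig's chain of reasoning can be reproduced with exact fractions. The sketch below is a reconstruction from the description above, not Hertig's own calculation, and the function and parameter names are hypothetical. It assumes, as Hertig did, that every normal embryo implants, so that the abnormal share among pre-implantation embryos (4/8) fixes how many abnormal embryos must have accompanied the 15 normal ones.

```python
from fractions import Fraction


def hertig_partition(cases=36, normal=15, abnormal_implanted=6,
                     pre_abnormal=4, pre_total=8):
    """Reconstruct Hertig's partition of the cases from cycle day 25 onwards."""
    p = Fraction(pre_abnormal, pre_total)  # abnormal share before implantation
    # All normal embryos are assumed to implant, so abnormal:normal = p:(1 - p).
    abnormal_fertilised = normal * p / (1 - p)
    fertilised = normal + abnormal_fertilised
    failed_to_implant = abnormal_fertilised - abnormal_implanted
    return {
        "unfertilised": cases - fertilised,
        "abnormal_fertilised": abnormal_fertilised,
        "fertilisation_rate": fertilised / cases,
        "fertility": Fraction(normal, cases),
        "abnormal_failing_to_implant": failed_to_implant / abnormal_fertilised,
        "pre_implantation_loss": failed_to_implant / fertilised,
    }


base = hertig_partition()
print(base["unfertilised"], base["abnormal_fertilised"])  # 6 and 15 of 36 cases
print(float(base["fertilisation_rate"]), float(base["pre_implantation_loss"]))
```

The defaults give the 6:15:15 partition, a 5/6 fertilisation rate and a 60% (9/15) implantation failure among abnormal embryos. Calling `hertig_partition(pre_abnormal=3)` or `hertig_partition(pre_abnormal=5)` reproduces the sensitivity figures discussed later in this section: a pre-implantation loss of 1/8 (about 13%), or 19/40 (about 48%) with a fertilisation rate of 10/9 (111%) and a negative number of unfertilised eggs.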
Hertig provides almost the entire body of evidence used to quantify natural human embryo loss in the first week post-fertilisation. Most claims regarding early human embryo mortality find their source here. Before considering how reliable the figures are, it is worth repeating Hertig's own caveat, namely, the lack of data on the efficiency of natural fertilisation 33 . All estimates of embryo mortality from fertilisation onwards are subject to commensurate inaccuracy in the absence of reliable fertilisation probabilities (i.e., π FERT ), which are "surprisingly difficult to estimate" 13 .
There are several problems with Hertig's analysis. As noted by others, the observations are cross-sectional, but the inferences are longitudinal 48 . Hertig detected 21 embryos from 36 cases (58.3%) from cycle day 25 onwards. If this detection rate were representative, then on average the detection rate before day 25 should be either the same or higher; however, all of the earlier rates are lower, and substantially so (Table 4). Hertig suggested that this was due to the technical difficulty of finding newly fertilised embryos. However, the detection rate for cycle days 18-19 was good (46.7%), and embryos one or two days younger, for which the detection rate was poor (11.1%), would not have been much smaller. An alternative explanation for this discrepancy might simply be random variation. Furthermore, from cycle day 25 onwards, embryos would probably have produced hCG, and therefore $FEC_{HCG}$ would have been at least 58%. This is approximately double the equivalent values observed in more recent and robust hCG studies (Table 3), further suggesting that this subset of the data is not representative.
Despite having proven fertility, these women presented with gynaecological problems, suggesting sub-optimal reproductive function. Furthermore, Hertig's reproductively 'optimal' coital pattern does not include 2 days pre-ovulation and does include one day post-ovulation, conditions which are known not to maximise fertilisation 34,79,84,85,98 . Hence, detection rates before cycle day 25 may be more representative than those after. Given the numerical discrepancies, they cannot both be representative.
Hertig does not provide error estimates with his conclusions. In order to estimate the precision of his derived proportions, a bootstrap analysis was performed as follows: Hertig's 107 optimal cases were categorised according to stage of cycle (Category 1 = cycle days 16-19 (n=24); Category 2 = cycle days 20-24 (n=47); Category 3 = cycle days ≥25 (n=36)), and presence and type of embryos (Category 0 = no embryo (n=73); Category 1 = normal embryo (n=24); Category 3 = abnormal embryo (n=10)). Five hundred pseudo-datasets each containing 107 cases were generated using a balanced random re-sampling method using Microsoft Excel ® . The original and pseudo datasets are in Dataset 4. See README.docx for a description of the files.
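A simplified version of this bootstrap can be sketched as follows. It is not the Excel procedure used here: it uses ordinary rather than balanced resampling and 2,000 rather than 500 pseudo-datasets, and the cross-tabulation of stage and embryo category is reconstructed from counts given in the text (8 pre-implantation embryos, 4 of them abnormal, assumed to lie in days 16-19; all 5 embryos in days 20-24 normal; 15 normal and 6 abnormal embryos from day 25 onwards). It re-derives only one of Hertig's quantities, the implied fertilisation rate $N_{normal}/((1-p)\,n)$ for the day ≥25 cases, where $p$ is the abnormal share of pre-implantation embryos; resamples in which $p$ is undefined or equal to 1 are discarded.

```python
import numpy as np

# Reconstructed counts: (stage, embryo) -> number of cases, using the coding in
# the text: stage 1 = days 16-19, 2 = days 20-24, 3 = days >= 25;
# embryo 0 = none, 1 = normal, 3 = abnormal.
COUNTS = {
    (1, 0): 16, (1, 1): 4, (1, 3): 4,
    (2, 0): 42, (2, 1): 5,
    (3, 0): 15, (3, 1): 15, (3, 3): 6,
}


def expand(counts):
    """One row per case: column 0 = stage, column 1 = embryo category."""
    return np.array([key for key, n in counts.items() for _ in range(n)])


def implied_fertilisation_rate(cases):
    stage, embryo = cases[:, 0], cases[:, 1]
    late = stage == 3
    pre_embryos = (stage == 1) & (embryo > 0)
    n_late = late.sum()
    pre_total = pre_embryos.sum()
    pre_abnormal = (pre_embryos & (embryo == 3)).sum()
    if n_late == 0 or pre_total == 0 or pre_abnormal == pre_total:
        return np.nan  # Hertig's derivation breaks down for this resample
    p = pre_abnormal / pre_total
    normal_late = (late & (embryo == 1)).sum()
    return normal_late / ((1 - p) * n_late)


def bootstrap(cases, n_boot=2000, seed=1959):
    """Percentiles (2.5, 50, 97.5) of the statistic over resampled datasets."""
    rng = np.random.default_rng(seed)
    n = len(cases)
    stats = np.array([
        implied_fertilisation_rate(cases[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ])
    return np.nanpercentile(stats, [2.5, 50, 97.5])


data = expand(COUNTS)
print(len(data), implied_fertilisation_rate(data))  # 107 cases; 30/36
print(bootstrap(data))
```

In this sketch the upper percentile of the implied fertilisation rate exceeds 1, an impossible value, because the abnormal share $p$ rests on only eight embryos.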

Table 4. Summary of the characteristics of Hertig's 34 embryos (values are taken from Figure 4 in Hertig et al. (1959)).
The embryos were collected from 107 out of 210 women. *In Hertig's figure, day 28 of the ovulatory cycle is identified with day 1 of the next cycle and is the day of the presumed missed period in cases where pregnancy had commenced. The 36 cases that provide the evidential foundation for his numerical analysis are shown in bold.
The congruence between these confidence intervals and the point estimates provides some reassurance that the bootstrap procedure worked effectively. Estimates of parameters other than the day 25 detection rate (58%) are derived from more complex proportional relationships, and are therefore less precise. Table 5 reproduces a life table in the style of Leridon 18 and includes probabilities for each reproductive step with confidence intervals. These intervals (and some noted above) are impossibly wide, highlighting further problems with Hertig's analysis.

Hertig's analysis omits 47 cases from cycle days 20-24, comprising 44% of his data. It is clear why he cannot use it, since all five embryos were normal and, given his mathematical and biological assumptions, five normal implanting embryos could not become 29% (6/21) abnormal post-implantation. Furthermore, the data that define the 50% proportion of abnormal pre-implantation embryos (i.e., 4/8) are so few that any numerical variation will make a substantial difference to derived proportions. If he had observed 3/8 abnormal embryos, his estimate of pre-implantation loss would have been 13% rather than 30%: for 5/8 it would have been 48%, with a fertilisation rate of 111%, which is clearly impossible. It seems therefore, that Hertig designed his analysis based on a post-hoc examination and selective use of the data.
His own caveat about the lack of relevant and necessary data should be taken at least as seriously as his conclusions.
Hertig and Rock's contribution to human embryology is undeniable. However, their quantitative conclusions regarding early embryo mortality are so imprecise that their biological credibility and utility are undermined. Such estimates cannot be regarded as a reliable foundation upon which to evaluate and understand natural human reproduction.

Discussion
Answering the question "How many fertilised human embryos die before or during implantation under natural conditions?" is difficult. Relevant, credible data are in short supply. Among regularly cited publications, the Lancet hypothesis 26 is entirely speculative and, in the view of the current author, should cease to be used as an authoritative source. Clinical pregnancy studies are only useful for quantifying clinical pregnancy loss and contribute nothing to estimates of embryo mortality in the first two weeks post-fertilisation. Even Hertig's unique dataset is inadequate to draw quantitative conclusions, and oft-repeated values should be treated with scepticism. The hCG studies from 1988 onwards provide the best data for estimating embryo mortality, although a lack of information on fertilisation rates 13,15,33,48,101 prevents satisfactory completion of the calculations. A recent re-analysis of these data has proposed plausible limits for reproductively normal women, indicating that approximately 10-40% of embryos perish before implantation and 40-60% do so between fertilisation and birth 34 . However, these ranges are wide, particularly for pre-implantation mortality, reflecting the lack of appropriate data. Is there any possibility of narrowing down the numbers?

Notes to Table 5: probabilities are shown for each stage of the early development process. Medians and 95% confidence intervals derived from a bootstrap analysis of Hertig's data indicate the precision in the estimates for fertilisation and embryo loss in the first two weeks. * Although Leridon's values are based on Hertig, they do not fully match. Leridon reports losses of 15 and 27 in the first and second weeks, respectively. However, Hertig's 60% loss of abnormal pre-implantation embryos implies 25 (0.6 × 42) losses in the first week, leaving 58, and 16 (58 × (6/21)) losses in the second week, leaving 42. ¥ A value of $\pi_{SOC}$ = 0.90 was used to avoid the calculation of probabilities greater than 1.

Two separate groups have previously collected embryos from women following carefully timed artificial insemination as part of fertility treatment. Insemination around the time of ovulation in women of proven fertility was followed 5 days later by uterine lavage to recover ova 102-105 . These data appear to hold promise for determining fertilisation efficiency, and some authors have made quantitative inferences about embryo mortality from them 16,19,20 . However, such inferences are complicated by numerous confounding factors. For example, in one series 104 , 88 uterine lavages following artificial insemination by donor (AID) retrieved 4 unfertilised eggs, 6 fragmented eggs and 27 embryos from the 2-cell to blastocyst stage. In the 51 cycles in which no egg or embryo was retrieved, there was one retained pregnancy, suggesting that lavage and ova retrieval efficiency was reasonably high, albeit not perfect. These data therefore suggest that $FEC_{TOT}$ was low (≈31/88 = 35%), although a proportion of fertilised eggs may have completely degenerated within the first 5 days. Assuming $\pi_{SOC}$ was high (given the targeted insemination), this suggests that $\pi_{FERT}$ ≈ 50%. In the context of the recent analysis 34 , this implies that $\pi_{HCG}$ is high and that levels of embryo mortality are therefore towards the lower end of the 10-40% and 40-60% ranges. However, the clinical pregnancy rate following transfer of the embryos was only 40%. This is equivalent to $\pi_{HCG} \times \pi_{CLIN}$. If $\pi_{CLIN}$ ≈ 75%, as suggested by the hCG studies, this would mean that $\pi_{HCG}$ ≈ 50%. This would imply that $\pi_{FERT}$ is high, that fertilised egg degeneration is high and occurs before day 5 (and was therefore unobserved), and hence that levels of embryo mortality tend towards the upper end of the 10-40% and 40-60% ranges.
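The two lines of reasoning in this paragraph reduce to simple arithmetic, sketched below with Python's exact fractions. The only relation used is the one stated above, clinical pregnancy rate after transfer = $\pi_{HCG} \times \pi_{CLIN}$; the constant and function names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the text for one lavage series (ref. 104) and from the hCG studies.
LAVAGES = 88
RECOVERED_FOR_FEC_TOT = 31                        # numerator used in the text for FEC_TOT
CLINICAL_RATE_AFTER_TRANSFER = Fraction(40, 100)  # clinical pregnancies per transferred embryo
PI_CLIN_FROM_HCG_STUDIES = Fraction(75, 100)      # hCG-positive -> clinical pregnancy


def implied_pi_hcg(clinical_rate, pi_clin):
    """Clinical rate after transfer = pi_HCG * pi_CLIN, so pi_HCG = rate / pi_CLIN."""
    return clinical_rate / pi_clin


fec_tot = Fraction(RECOVERED_FOR_FEC_TOT, LAVAGES)
pi_hcg = implied_pi_hcg(CLINICAL_RATE_AFTER_TRANSFER, PI_CLIN_FROM_HCG_STUDIES)
print(f"FEC_TOT ~ {float(fec_tot):.0%}; implied pi_HCG ~ {float(pi_hcg):.0%}")
```

The second route gives $\pi_{HCG}$ ≈ 0.53, rounded to ≈ 50% in the text, which is what pulls this inference towards the upper end of the mortality ranges.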
It is possible that the lavage/transfer procedure reduced implantation and early developmental efficiency, thereby reducing $\pi_{HCG} \times \pi_{CLIN}$. A comparison of AID pregnancy rates may provide some insight, as suggested by the authors 104 . The clinical pregnancy rate in their pharmacologically unstimulated cohort was 12.5% (11/88), which is lower than an equivalent 18.9% observed for fresh semen AID 106 , and also lower than the live birth rate (which also incorporates clinical pregnancy losses) of 14.7% reported by the HFEA for AID in 2012 in unstimulated women aged 18-34 107 . These different success rates suggest that the lavage/transfer procedure did adversely affect implantation and early gestation, with clear implications for quantitative extrapolation. Furthermore, the women who were embryo recipients were receiving fertility treatment, and their overall fertility may have been lower than expected in a normal healthy cohort. In summary, there seem to be too many unresolved variables in these data to narrow down estimates of fertilisation ($\pi_{FERT}$) or implantation ($\pi_{HCG}$) rates.
With high fecundability, the range of possible embryo mortality rates narrows. Red deer hinds have pregnancy rates of >85% following natural mating 108 : establishing numerical limits for embryo mortality under these efficient reproductive circumstances is more straightforward. By contrast, humans lack the instinct to mate predominantly during fertile periods, which substantially reduces observed reproductive efficiency. In studies of early pregnancy loss, owing to sub-optimal coital frequency and cohorts that included sub-fertile couples, natural fecundability was almost certainly not maximised 34 . Combining data on coital frequency and hCG elevation may help to address this. In a later analysis applying the Schwartz model 85 to hCG data, Wilcox calculated an $FEC_{HCG}$ value of 36% for high coital frequencies (>4 days with intercourse in 6 pre-ovulatory days) 79 . However, the model assumed that cycle viability was evenly distributed among couples, a condition that the authors recognised was not true and that is contradicted by a subsequent analysis suggesting that approximately a quarter of the Wilcox cohort was sub-fertile 34 . If possible, focussing analytical attention on normally fertile women with the highest coital frequencies may help to narrow the range of plausible embryo mortality further.
In this review of natural early embryo mortality, no use has been made of data from in vitro fertilisation (IVF) and associated laboratory studies. Sub-optimal conditions for embryo culture mean that it was 109,110 , and probably still is 111 , doubtful that reliable values can be extrapolated from laboratory in vitro to natural in vivo circumstances 20 . Importantly, the reproductive stages are also altered. In IVF, $\pi_{SOC}$ = 1, and for transferred embryos $\pi_{FERT}$ = 1. Furthermore, transferred embryos are selected based on quality criteria, however inexact those may be 111,112 . IVF programme manipulations may reduce $\pi_{HCG}$ compared with natural circumstances 3 , and implantation failure remains a substantial issue for IVF 113,114 . Although the reported live birth rate per IVF cycle has risen (from 14% in 1991 to 25.4% in 2012 34 ), comparison of IVF success rates with natural live birth fecundability values involves too many undefined variables to shed numerical light on early natural embryo development and mortality.
In vitro fertilisation per se may provide some insight into values of $\pi_{FERT}$, since $\pi_{SOC}$ = 1 and successful fertilisation can be observed. In seven studies of natural cycle IVF, fertilisation was successful in 70.9% (443/625) of attempts 115-121 . If this represented natural, in vivo fertilisation, then, based on the recent analysis 34 , it would imply that $\pi_{HCG}$ ≈ 0.75, focusing estimates for pre-implantation embryo loss on 25% and for total loss on 50%. However, high frequencies of chromosomal aberrations caused by the in vitro handling of human oocytes 122 can render any comparison of natural and assisted reproduction open to criticism 4 .
In calculating summary values of embryo mortality, it is important to note that human fertility is as numerically heterogeneous as it could possibly be. Some couples are infertile and some are highly fertile. Excessive attention to averages and neglect of variances fosters a misleading appreciation of reality. The hCG studies clearly had both fertile and sub-fertile participants: use of overall values underestimated fecundability for the fertile majority 34 . Furthermore, apparently 'optimal' conditions for conception may not maximise human biological fecundability.
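A toy calculation shows how pooling a mixed cohort can understate the fecundability of its fertile members. Only the roughly one-quarter sub-fertile share comes from the analysis cited above; the per-cycle fecundabilities (0.30 and 0.05) are hypothetical values chosen purely for illustration, and the function name is likewise illustrative.

```python
# Illustrative only: the per-cycle fecundabilities below are hypothetical, not data.
FERTILE_SHARE = 0.75  # roughly a quarter sub-fertile, as suggested for the Wilcox cohort
P_FERTILE = 0.30      # hypothetical per-cycle fecundability of fertile couples
P_SUBFERTILE = 0.05   # hypothetical per-cycle fecundability of sub-fertile couples


def pooled_fecundability(cycle, share=FERTILE_SHARE, p_fertile=P_FERTILE, p_sub=P_SUBFERTILE):
    """Per-cycle conception rate among couples still not pregnant at the start of `cycle` (1-based)."""
    fertile_left = share * (1 - p_fertile) ** (cycle - 1)
    sub_left = (1 - share) * (1 - p_sub) ** (cycle - 1)
    return (fertile_left * p_fertile + sub_left * p_sub) / (fertile_left + sub_left)


for cycle in (1, 3, 6):
    print(f"cycle {cycle}: pooled {pooled_fecundability(cycle):.3f} vs fertile couples {P_FERTILE:.3f}")
```

In the first cycle the pooled rate (0.2375) already sits below the fertile couples' 0.30, and it keeps falling in later cycles because fertile couples conceive and leave the pool faster, so sub-fertile couples become over-represented.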
Other biological factors also contribute to reproductive heterogeneity in humans; however, even after controlling for age-related decline, fecundability remains highly variable 107,123 . For intercourse occurring 2 days prior to ovulation, average fecundabilities resembled those previously published 124 , but for couples at the 5th and 95th percentiles, fecundabilities were 5% and 83%, respectively. A fecundability of 83% implies a very low embryo mortality rate.
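The last step can be made explicit under one assumption that is not stated in the text: that fecundability $F$ is the product of the stage probabilities used in this article (for example $\pi_{SOC}$, $\pi_{FERT}$, $\pi_{HCG}$, and any later stages up to the endpoint measured), each lying between 0 and 1. Since every other factor is at most 1, no single factor can be smaller than the product.

```latex
F = \pi_{SOC}\,\pi_{FERT}\,\pi_{HCG}\cdots, \qquad 0 \le \pi_{i} \le 1
\;\Longrightarrow\; F \le \pi_{HCG}
\;\Longrightarrow\; 1 - \pi_{HCG} \le 1 - F = 1 - 0.83 = 0.17
```

For couples at the 95th percentile this caps pre-implantation embryo loss at 17%. The same bound for the 5th percentile ($F$ = 0.05) allows up to 95% and therefore says almost nothing, which is why low-fecundability couples cannot by themselves pin down embryo mortality.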
In conclusion, apparently low fecundability in humans need not be caused by embryo mortality; it may also reflect defects of ovulation, mistimed coitus, or fertilisation failure 34 . Where fecundability is low, any or all of these factors may contribute. Natural pre-implantation embryo loss remains quantitatively undefined. In the absence of knowledge of $\pi_{SOC}$ and $\pi_{FERT}$, it is almost impossible to estimate precisely. Hertig's estimate is 30%; however, its mathematically and biologically implausible confidence interval [-28%, 73%] betrays the quantitative weaknesses in his data and analysis. The best available data are from studies monitoring daily hCG levels in women attempting to conceive 63,65,66 . Based on analyses of these data, in normal healthy women, 10-40% is a plausible range for pre-implantation embryo loss, and overall pregnancy loss from fertilisation to birth is approximately 40-60% 34 . This latter range is similar to, although a little narrower than, the 25-70% suggested by Professor Robert Edwards 125 .
In the absence of suitable data to quantify pre-implantation loss, many published articles and reviews merely restate previously published values 6,20,21 . It has been suggested that "for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias" 126 . Widely held views on early embryo mortality may reflect an entrenched and biased view of the biology. For example, the Macklon "Black Box" review 20 has been cited over 200 times (Web of Knowledge citations on 10th October 2016), with many articles explicitly referencing its 30% survival/70% failure value 8,21,113,127-133 . Macklon's quantitative summary in his "Pregnancy Loss Iceberg" (30% implantation failure; 30% early pregnancy loss; 10% clinical miscarriage; 30% live births) is a direct, unedited reproduction of estimates published over 10 years previously 19 . A 30% pre-implantation loss fairly represents Hertig's conclusions although, as has been shown, this estimate is highly imprecise. However, Macklon misrepresents the best data that he reviews 63,65 . Wilcox reports early pregnancy loss of 21.7%, whereas Macklon's iceberg implies that 43% (30/70) of implanting embryos fail before clinical recognition. The iceberg's clinical loss rate of 25% (10/40) is also higher than relevant data indicate (Figure 2 & Figure 3). Total loss of implanting (hCG+) embryos (i.e., [1 -(π …

In attempting to quantify pre-implantation embryo mortality, it is easy to appreciate why "a claim of 'no significant difference' might easily be sustained against any interpretation proffered" 48 , and why estimates are "difficult to defend with any precision" 5 . In conclusion, "poor estimates of fertilization failure rate and the mortality at 2 weeks after fertilisation" 15 drawn "from unusual or biased samples" 134 indicate that the "black box" of early pregnancy loss 20 is not as wide open as has been thought.
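The proportions quoted for Macklon's "Pregnancy Loss Iceberg" follow directly from its four figures per 100 conceptions; the short Python check below recomputes them with exact fractions. The dictionary keys are illustrative labels, and the total-loss line is plain arithmetic on the iceberg's own figures rather than a value taken from any dataset.

```python
from fractions import Fraction

# Macklon's "Pregnancy Loss Iceberg", per 100 conceptions, as summarised in the text.
ICEBERG = {
    "implantation failure": 30,
    "early pregnancy loss": 30,
    "clinical miscarriage": 10,
    "live birth": 30,
}

implanting = 100 - ICEBERG["implantation failure"]       # 70 hCG+ embryos
clinical = implanting - ICEBERG["early pregnancy loss"]  # 40 clinical pregnancies
early_loss_share = Fraction(ICEBERG["early pregnancy loss"], implanting)    # 30/70
clinical_loss_share = Fraction(ICEBERG["clinical miscarriage"], clinical)   # 10/40
total_loss_implanting = Fraction(implanting - ICEBERG["live birth"], implanting)  # 40/70

print(f"early loss among implanting embryos: {float(early_loss_share):.1%} (Wilcox: 21.7%)")
print(f"clinical loss: {float(clinical_loss_share):.1%}")
print(f"total loss of implanting embryos: {float(total_loss_implanting):.1%}")
```

The iceberg's 42.9% early-loss figure is roughly double the 21.7% reported by Wilcox.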

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Dr. Jarvis assesses the empirical support for the belief that there is a "great deal" of fetal wastage in humans. His conclusion is that there is less wastage than is often believed and that the percent loss between conception and birth is 40-60%. Resolution of this issue is important, as it has substantial implications for our understanding of early human development.
He states (p. 2) that four types of evidence underlie these claims:

1. A speculative hypothesis published in The Lancet.
2. Life tables of intra-uterine mortality.
3. Studies of early pregnancy by biochemical detection of hCG.
4. Anatomical studies of Dr Arthur Hertig and Dr John Rock.

On the basis of his review of this evidence, Dr. Jarvis concludes (p. 12) that "…10-40% is a plausible range for pre-implantation embryo loss and overall pregnancy loss from fertilization to birth is approximately 40-60%." This means that the best estimate of pre-birth mortality according to Dr. Jarvis is consistent with many previous estimates. In order to understand this consistency, it is useful to examine these types of evidence and what Dr. Jarvis makes of each. I discuss them in turn.
1. The Lancet article is Roberts & Lowe (1975). These authors concluded (p. 498) from their "speculative" analysis that 78% of conceptions are lost. Dr. Jarvis shows that a low estimate of the number of conceptions results in an estimate of 22% of conceptions lost and that a high estimate of the number of conceptions results in an estimate of 92% loss. He also generates a 95% confidence interval of 37-90% for the loss percentage using a simulation in which each value contributing to the number of conceptions is normally distributed with a mean identical to Roberts and Lowe's value and a coefficient of variation of 20%. On this basis, he concludes that Roberts and Lowe's analysis "…has no quantitative value" (p. 1) and "…has no practical quantitative value" (p. 4).
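The kind of sensitivity analysis described here can be sketched as a Monte Carlo simulation. Roberts and Lowe's actual chain of components is not reproduced: the sketch assumes, hypothetically, that the number of conceptions equals the value implied by the 78% point estimate multiplied by three independent factors, each normally distributed around 1 with a coefficient of variation of 20%, and it reads a 95% interval for the implied loss off the sorted simulations. The factor count, seed and function name are illustrative, so the interval it prints should not be expected to match the 37-90% reported by Dr. Jarvis.

```python
import random
import statistics

POINT_LOSS = 0.78  # Roberts & Lowe's point estimate of conceptions lost
N_FACTORS = 3      # hypothetical number of independent components of "conceptions"
CV = 0.20          # coefficient of variation assumed for each component


def simulate_losses(n_sims=20000, seed=1):
    """Return sorted simulated loss fractions, with births normalised to 1."""
    rng = random.Random(seed)
    point_conceptions = 1.0 / (1.0 - POINT_LOSS)
    losses = []
    for _ in range(n_sims):
        scale = 1.0
        for _ in range(N_FACTORS):
            scale *= rng.gauss(1.0, CV)  # each component drawn around its point value
        losses.append(1.0 - 1.0 / (point_conceptions * scale))
    losses.sort()
    return losses


losses = simulate_losses()
median = statistics.median(losses)
low = losses[int(0.025 * len(losses))]
high = losses[int(0.975 * len(losses))]
print(f"median loss {median:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

Changing `N_FACTORS` or `CV` shifts the interval, which makes concrete the point, made in the next paragraph, that such intervals depend on assumptions about independence, distributional form and variability.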
Dr. Jarvis provides a useful sensitivity analysis of Roberts and Lowe's estimate, which should be taken seriously by those who may believe that their analysis is definitive (their paper has been cited more than 300 times, with many citations that point to the 78% estimate). That said, Dr. Jarvis' conclusion that Roberts and Lowe's analysis is quantitatively useless is itself incoherent. A number is a number, and as a starting point their estimate is useful, although limited. If their analysis lacks "practical quantitative value", so too does the analysis of Dr. Jarvis. After all, there is no empirical basis for his assumptions that the components contributing to his estimate of the loss percentage are statistically independent, that they are normally distributed, or that they have a coefficient of variation of 20%. Simply making arbitrary assumptions about the variability of parameters does not make an analysis more quantitatively useful than one without such assumptions. The point is that both analyses have value. It is telling in this regard that their estimate is "close" to Dr. Jarvis' estimate. In fact, one could readily claim that Dr. Jarvis' analysis validates Roberts and Lowe's estimate, inasmuch as their estimate is within the 95% confidence interval he generates.
By way of understanding Roberts and Lowe's self-described "speculative" work, it is important to note that it belongs to the voluminous "gray" literature relating to human pregnancy. This is the literature that is published without much review (if any) and without much requirement for rigor and data. To see this, one need go no farther than this passage (p. 498): Animal studies, which allow a more systematic investigation of [pregnancy loss], have shown detectable prenatal losses ranging from 15 to 60% in domestic cattle, sheep, and pigs and in wild forms such as stoats, rats, squirrels, and rabbits.
They cite Austin (1972) for this claim. He merely states (p. 134): The data show that prenatal losses ranging between 15 and 60 per cent occur in cattle, sheep and pigs, as well as in wild forms such as stoats, rats, squirrels and rabbits.
No data are cited! In fact, Austin's gloss on the loss percentage for domesticated species is reasonably accurate (Casida, 1953; First & Eyestone, 1988; Lasley, 1957), although there are fewer data than one might imagine. It is of note that these species have been selected for offspring production, and so the relevance of these data is not completely resolved. Perhaps fetal wastage in their wild relatives would be greater. My guess is that the data alluded to as being from "wild forms" are in papers such as those by Brambell (1942, 1948). That said, to my knowledge, it is not clear that such studies reliably account for early gestational losses. More generally, there are few "wild forms" for which there are estimates.
The overall point is that Roberts and Lowe's paper contains a disconnection between data and conclusions that would be sustained even if one read the cited source. Their paper is best viewed as a heuristic exercise. This is not a criticism. It is meant to underscore that Dr. Jarvis' conclusion that their paper is "useless" treats it as something that it isn't. We are ignorant of the training of Drs. Roberts and Lowe, but like many authors of the gray literature concerning pregnancy, they may have lacked rigorous training in research practice and data analysis. This is not inherently bad, as long as the nature of such publications is properly understood. As a community of scientists, we can make use of their insight into human pregnancy as long as its potential limitations are understood. We need all the help we can get!
2. The "life tables of intra-uterine mortality" are French & Bierman (1962) and Léridon (1977). The former study is an analysis of pregnancies in Kauai, Hawaii; the authors' conclusion was that approximately 24% of the pregnancies registered with an estimated gestational age of greater than four weeks would die. Léridon married this result with the data of Hertig, Rock, Adams, & Menkin (1959), which provide an estimate of wastage prior to four weeks, to infer that 63% of conceptions die before birth (Table 4.20, p. 81). Dr. Jarvis' cautions about the assumptions that underlie this estimate are reasonable. That said, it is important to note that Léridon's chapter ("Intrauterine Mortality", pp. 48-81) is no casual exercise. It is the longest chapter in the book, and an open-minded reader can see that Table 4.20 is based upon reasonable assumptions that Léridon clearly states do not have as solid an empirical basis as would be desired. Unfortunately, Dr. Jarvis' sole mentions of Léridon's caveats are a statement (p. 5) in which Léridon describes (p. 56) an interpolation he makes (in his analysis of French and Bierman's data) as "risky", and another in which Dr. Jarvis' own reanalyses of the French and Bierman data (p. 5) "reinforce a concern highlighted by Léridon". To this extent, a reader of Dr. Jarvis' paper could easily come away with the mistaken belief that Léridon's analysis is superficial at best. As in the case of Roberts and Lowe's estimate, it is important to note that Léridon's estimate of 63% of conceptions lost is close to Dr. Jarvis' estimate of 40-60%.

3. "Studies of early pregnancy by biochemical detection of hCG." The modern pregnancy test is based upon an assay of human chorionic gonadotrophin (hCG), an oligosaccharide glycoprotein hormone produced by embryonic cells. An elevated level of hCG is detectable six to fourteen days post-conception (Nepomnaschy, Weinberg, Wilcox, & Baird, 2008; Wilcox, Baird, & Weinberg, 1999). By this time, most embryos capable of implantation will have done so. Unfortunately, earlier pre-implantation detection of pregnancy based upon assay of the "Early Pregnancy Factor", a heat-shock protein expressed within 48 hours of conception, is not in widespread use (Clarke, 1997; Fan & Zheng, 1997; Morton, Rolfe, & Cavanagh, 1992; Rolfe, 1982; Shahani, Moniz, Chitlange, & Meherji, 1991; Shahani, Moniz, Gokral, & Meherji, 1995; Smart, Fraser, Roberts, Clancy, & Cripps, 1982). Dr. Jarvis correctly describes the pioneering hCG results of Wilcox et al. (1988) and others (as summarized in Table 3), which indicate that the percentage loss of conceptions after hCG detection is between approximately 20 and 60%, with many estimates between 30 and 40%; Dr. Jarvis concludes (p. 6) that this percentage loss is approximately 33%.
Dr. Jarvis goes on to estimate that the "…loss from fertilization to birth [is] 40-60%"; this is based on the combination of three hCG-based estimates of percentage loss from conception to birth (… & Selevan, 1996) with his estimate (pp. 7-8) that implantation of embryos "…may be up to 90% efficient…". He concludes that higher estimates of loss from fertilization to birth in the literature are "excessive".
Dr. Jarvis' estimate is likely an underestimate. There is strong circumstantial evidence that many more than 10% of embryos do not successfully implant, as discussed below. The implication of this is that Dr. Jarvis' estimate and the previous estimates are consistent. It is also worth noting that Dr. Jarvis uses an arbitrary estimate for implantation rate, even though he judges other analyses to be useless because they contain an arbitrary parameter estimate.
Dr. Jarvis goes on to criticize Boklage (1990), who estimated the percentage of unsuccessful conceptions based on an analysis of hCG data (see his Figure 2, p. 84). Dr. Jarvis is right to raise concerns (p. 8) that Boklage's analysis is less definitive than desired. In particular, he states (p. 8) that Boklage's assumption that the 21-day survival rate of conceptions is 28.7% is based upon a misinterpretation of a previous study. That said, Dr. Jarvis draws an unsubstantiated conclusion (p. 8) that "…quantitative conclusions from [Boklage's] analysis in relation to the survival of naturally conceived human embryos are of doubtful validity". This may be true, but it remains to be seen, given the lack of any demonstration of the sensitivity of Boklage's quantitative conclusions to changes in the underlying assumptions. Boklage's analysis needs more careful scrutiny than Dr. Jarvis gives it. For example, Boklage presents a formula for the percentage loss of conceptions as a function of time (p. 84). Were the coefficients estimated via a standard statistical approach such as maximum likelihood estimation, and was the model chosen via a likelihood ratio test or via comparison of AIC values associated with competing models? This is not clear. As such, it is unclear what to make of the predictions, even putting aside Dr. Jarvis' concerns about the biological validity of some of the underlying data. The equation appears to be based upon the assumption that a cohort of embryos is an admixture of those that are likely to die before six weeks and those that will survive longer. The basis for this assumption is unclear. The lack of transparency of Boklage's equation is underscored by the fact that Dr. Jarvis does not mention that it predicts 75.8 percent fetal wastage between conception and full-term birth (270 days).
As above, this estimate is rightly or wrongly consistent with most previous estimates.
4. The "anatomical studies of Dr Arthur Hertig and Dr John Rock" are investigations of conceptions recovered from uteri obtained via gynecologic surgery. Their results are summarized in Hertig et al. (1959), Hertig & Rock (1973) and Hertig (1967). As described by Dr. Jarvis (p. 9), Hertig et al.'s conclusion is that 50% of embryos will die within two weeks after conception.
Dr. Jarvis is correct to point out concerns about their conclusion, although we believe that it has been well recognized that it is "impressionistic" rather than something that has a solid quantitative underpinning. Of course, as noted by Dr. Jarvis, their work remains important.
Dr. Jarvis makes some assertions about Hertig et al.'s work that seem mainly intended to accentuate doubts about it rather than to place it in proper context. He notes correctly (p. 9) that the sample is cross-sectional and not longitudinal. Given the nature of this study, this was unavoidable. Dr. Jarvis notes that there are some unresolved discrepancies among age-specific detection rates for embryos and also between the estimated implantation rate and the rate inferred from other studies. These are worth mentioning, but the implications of these discrepancies remain ambiguous in the absence of a quantitative analysis that accounts for sampling variation.
Similarly unhelpful is Dr. Jarvis' statement (p. 9) that "Despite having proven fertility, these women presented with gynaecological problems, suggesting suboptimal reproductive function." There is a wide range of "gynaecological problems", and an unanchored assertion that such a broad category might result in "sub-optimal reproductive function" means nothing in the absence of evidence that whatever problems were present had some influence on embryonic viability.

In an effort to "estimate the precision" of the various proportions presented by Hertig et al. (e.g., the survival rate to implantation), Dr. Jarvis generated 500 so-called "bootstrap" samples from the original data consisting of 107 cases. These samples arise from sampling with replacement of the original data (e.g., see Efron & Tibshirani, 1986; Efron, 1987). Such an investigation is worthwhile, although a bootstrap analysis is not a "cure" for small sample size. In any case, Dr. Jarvis' analyses of the bootstrap results are incorrect. He describes (p. 10) "95% CIs" for various proportions that are outside of the range of 0-100%. For example, the confidence interval (p. 10) he provides for pre-implantation embryo survival probability is 27-128%. Such an interval cannot be generated by a correct bootstrap analysis. There are various ways to calculate a bootstrap confidence interval (Efron & Tibshirani, 1986). The simplest, known as the "percentile method", generates a 95% bootstrap confidence interval for a proportion directly from the range of proportions associated with the central 95% of the bootstrap estimates. Accordingly, the confidence interval must be between 0 and 100%, because each of the bootstrap samples must generate a proportion between 0 and 100%. Dr. Jarvis' mistake appears to be that he estimated an average proportion and its variance from the ensemble of bootstrap estimates and then calculated the confidence interval using standard formulae (p. 10). The purpose of bootstrap estimation is to avoid such calculations, which can generate inaccurate confidence intervals. Although some of the bootstrap confidence intervals provided by Dr. Jarvis do not fall below 0% or surpass 100%, we guess that all of them are incorrectly calculated. Unfortunately, the incorrect confidence intervals are described by Dr. Jarvis (p. 12) as "mathematically and biologically implausible" and taken to "…betray the quantitative weaknesses in [Hertig et al.'s] data and analysis." Indeed, they are "mathematically and biologically implausible", but the reason is that they were not correctly calculated. Whatever bearing a bootstrap analysis has on our understanding of the "precision" of Hertig et al.'s data and analyses remains to be seen.
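The difference between the two calculations can be illustrated in Python on a hypothetical sample of eight cases with one event (illustrative values, not Hertig's data). The percentile interval is read directly from the sorted bootstrap proportions; the normal-approximation interval uses their mean ± 1.96 standard deviations, which is the kind of calculation described above. The sample, seed and function name are assumptions for the sketch.

```python
import random
import statistics

# Hypothetical small sample: 1 "event" among 8 cases (illustrative, not Hertig's data).
SAMPLE = [1] + [0] * 7


def bootstrap_proportions(data, n_resamples=5000, seed=2):
    """Ordinary bootstrap: resample cases with replacement and record each proportion."""
    rng = random.Random(seed)
    n = len(data)
    return sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_resamples))


props = bootstrap_proportions(SAMPLE)
# Percentile method: bounds read directly from the sorted bootstrap estimates.
pct_low = props[int(0.025 * len(props))]
pct_high = props[int(0.975 * len(props))]
# Normal approximation: mean +/- 1.96 SD of the bootstrap estimates.
mean, sd = statistics.mean(props), statistics.stdev(props)
norm_low, norm_high = mean - 1.96 * sd, mean + 1.96 * sd
print(f"percentile [{pct_low:.3f}, {pct_high:.3f}]  normal approx [{norm_low:.3f}, {norm_high:.3f}]")
```

The percentile bounds stay inside [0, 1] here because every resampled proportion does; that guarantee holds for statistics that are themselves confined to [0, 1] in every resample, whereas the mean ± 1.96 SD interval has no such constraint, and its lower bound falls below zero for this sample.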
Dr. Jarvis' central argument is that there is more ambiguity associated with estimates of fetal wastage in humans and that this ambiguity is not widely understood. Many of his concerns should be taken seriously. Nonetheless, his analysis is undermined by errors of analysis and overstatement. In the end, his estimate of fetal wastage from conception to birth is consistent with many of the previous estimates.
Dr. Jarvis' analysis is also undermined by an incorrect dismissal of data from embryos created via assisted reproductive technology (ART), which he refers to as in vitro fertilization (IVF). On page 11, he alludes to "…sub-optimal conditions for embryo culture…" and implies that ART embryos are somehow "different", in undefined ways, from naturally-conceived embryos, and that this negates their potential use in estimating fetal wastage. This is an exercise in rhetoric, not a scientific argument. It is true that ART embryos are different from natural embryos in ways that could influence an estimate of fetal wastage. However, it is essential to note that they constitute the best available sample for insight into the "black box" of early pregnancy, despite the possible biases they may have that could distort our view into the black box. To this extent, it is best to assess what information they can provide about fetal wastage, rather than provide tenuous or irrelevant reasons as to why they are not useful.
Dr. Jarvis mistakenly assumes (p. 11) that only ART embryos transferred into mothers would provide information about fetal wastage. In fact, as Dr. Jarvis notes, there are a number of reasons why transferred embryos are not representative of all embryos (e.g., conscious or unconscious quality biases, sex selection) and, accordingly, this kind of sample could be misleading. That said, studies of such samples suggest that at least some aspects of their biology are identical to those of naturally-conceived embryos. For example, the sex ratio at birth for ART embryos is statistically identical with that of natural conceptions (Orzack et al., 2015).
More importantly, the entire ensemble of ART embryos (untransferred and transferred) provides information about fetal wastage. Almost all ART embryos undergo testing for chromosomal abnormalities, such as aneuploidy. The consequences of aneuploidy are well known: it results in almost certain death before birth. This is consistent with the fact that many spontaneous abortions are karyotypically abnormal (Boué, Boué, & Lazar, 1967, 1975; Jauniaux & Burton, 2005). To this extent, the frequency of such abnormalities provides strong circumstantial evidence as to the amount of fetal wastage. Orzack et al. (2015) investigated a sample of ART embryos whose karyotypes were assayed via FISH or CGH and reported that 84,881 out of 139,704 embryos contained at least one aneuploid chromosome. The implied percentage of fetal wastage (60.8%) is remarkably consistent with the central tendency of the many reports that Dr. Jarvis dismisses as unreliable, as well as with his own estimate. As noted, we need to be cautious about inferences from this sample, but we should not avoid making them. There is no compelling reason to think that "suboptimal" conditions for embryo culture (if any) cause many chromosomal abnormalities, most of which very likely arise during meiosis (e.g., Hassold & Hunt, 2001; Hunt & Hassold, 2007; Jones, 2008; Nagaoka, Hassold, & Hunt, 2012). What deserves scrutiny is whether the frequency of chromosomal abnormalities is elevated by techniques for collecting eggs and/or because women providing them for use in ART are unrepresentative of all reproductive women. There are limited data indicating that unstimulated and stimulated oocytes have similar frequencies of abnormality (Labarta et al., 2010). Of course, women using ART are often older than many typical mothers. However, a high frequency of karyotypic abnormality is also observed among oocytes from young women (Baart et al., 2006; Munné et al., 2006). These concerns should continue to be investigated, but they in no way imply that ART embryos cannot provide useful insights about early human development and fetal wastage, especially given the current, and very likely continuing, lack of a large sample of naturally conceived human embryos.
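As a quick check of the arithmetic above, the minimal Python sketch below recomputes the aneuploid share from the two counts quoted from Orzack et al. (2015). Reading that share as an "implied percentage of fetal wastage" rests on the reviewers' assumption that every aneuploid embryo is lost and every euploid embryo survives; the function name is ours, for illustration only.

```python
# Counts quoted by the reviewers from Orzack et al. (2015).
ANEUPLOID = 84_881   # embryos with at least one aneuploid chromosome
TOTAL = 139_704      # embryos karyotyped by FISH or CGH


def implied_wastage_percent(aneuploid: int, total: int) -> float:
    """Aneuploid share of karyotyped embryos, as a percentage.

    Reading this as fetal wastage ASSUMES every aneuploid embryo is lost and
    every euploid embryo survives; it is not a direct measurement of loss.
    """
    return 100.0 * aneuploid / total


print(f"{implied_wastage_percent(ANEUPLOID, TOTAL):.1f}%")  # prints 60.8%
```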
We see then a web of circumstantial evidence implying that there is a substantial amount of fetal wastage in humans. This insight arises from imperfect types of knowledge (as documented by Dr. Jarvis) but nonetheless, there is a signal consistent with the claim that approximately half or more of conceptions fail. More needs to be done to improve our understanding.
The study of fetal wastage shares with the study of the human sex ratio during pregnancy the fact that many different kinds of scientists are involved; the associated balkanization has reduced the accountability that arises from a shared disciplinary perspective about the standards for the interpretation of data (Orzack, 2016; Orzack et al., 2015). One cause and consequence of this division is the gray literature mentioned above.
What contributes to the continuing "life" of the gray literature? Science abhors a vacuum, and claims about high fetal wastage in humans have been repeated often, in ways that have obscured or lost their connection with assumptions and data. Some claims date from well before there was any means by which early mortality could be assessed (Mall, 1917; Meyer, 1920; Pearson, 1897). Pearson clearly acknowledged the lack of direct evidence, but such caveats get lost, especially in medicine, in which attention to standards of evidence, recognition of the assumptions needed to connect data with conclusions, and awareness of needed statistical techniques have been weaker than in biological research. These deficiencies have diminished as medical training has incorporated more scientific training, but they have not disappeared. Nonetheless, during medical training the "inhalation" of facts is important. It is one reason why many believe that fetal wastage is high, despite having little or no familiarity with the available data or with the ins and outs of their analysis and interpretation.
(We have replaced number citations with author citations.) Several of these claims are in medical textbooks and are akin to newspaper articles, i.e., they are reports on prior research as opposed to being independent estimates. Even then, the nature of the evidence can go unmentioned. For example, in their textbook Johnson & Everitt (2000) include no evidence, or citations in which to find evidence, underlying their estimate. Of the claims in the primary literature, we again see a lack of independent evidence inasmuch as someone else's estimate is reported. For example, Chard (1991); Drife (1983) … "We now know that for every successful pregnancy that results in a live birth many, perhaps as many as five early embryos will be lost or will 'miscarry'…". This is clearly a heuristic estimate! The point is that there is less of a monolithic ensemble of flawed estimates needing to be debunked than one might imagine given Dr. Jarvis' passage. In any case, there is nothing inherently problematic about the citations just described. That said, it would be preferable if attributions were better and speculation was better highlighted as such. Nonetheless, such estimates should be used with caution but not discarded, given the substantial difficulties associated with the estimation of fetal wastage in humans.
An ideal future investigation of fetal wastage is easy to imagine: daily assessment of early pregnancy factor (EPF) and hCG for a cohort of women attempting to get pregnant. Easier said than done! Consider what such a study would require: a reliable assay for EPF, the enrollment of thousands of women, the collection and accurate assessment of thousands of samples, and more. Perhaps these technical and logistical barriers can be overcome soon. In the meantime, we can recognize that there is strong circumstantial evidence that human fetal wastage is likely between 50 and 75%. At the same time, we can recognize, along with Dr. Jarvis, that this conclusion lacks definitive proof and that additional investigations and scrutiny are needed.

25. Jauniaux E, Burton GJ: Pathophysiology of histological changes in early pregnancy loss.

Introduction
The purpose of my article is to evaluate available data that contribute to our quantitative understanding of natural human embryo mortality. The body of relevant data is small, as noted by the reviewers, although I have attempted to identify all of it. I deliberately avoided IVF/ART data since there is so much, and it is not obvious how such data illuminate natural circumstances (I comment further on this below). My comments on IVF/ART data are therefore confined to the Discussion.
Orzack & Zuckerman repeatedly refer to my estimates of 10-40% preimplantation loss and 40-60% total embryo loss, which are important benchmarks for my article. They are critical of these, although they do not seem to appreciate where they come from. Contrary to what they imply ("On the basis of his review of this evidence…"), they do not arise from analyses described in this article. Rather, they are from an analysis described in a previous article in F1000Research. I have amended the article to clarify this point. Concerns with the validity of these estimates should focus on that analysis, which is not listed among their 53 references.
In their review, the reviewers are ambiguous (one might say 'gray') in their use of quotation marks and appear to ascribe to me things I did not write. For example, I do not use the phrase "great deal". Thus, for the sake of clarity, and to separate literary emphasis from quotation, I will follow the convention employed by GEM Anscombe, who coined a useful phrase, to distinguish between 'scare quotes' and "quotations".
I address points raised in the review, approximately in the order in which they appear.

Roberts & Lowe
Orzack & Zuckerman state that I calculate 95% confidence intervals. This is incorrect. The range [37-90%] is not a confidence interval, I do not refer to it as such, and nor can it be, since there are no data. As described in the article, it is the range within which 95% of simulated estimates fall, based on Roberts & Lowe's speculative values and other assumptions.
The reviewers suggest that my analysis lacks "practical quantitative value". I agree. This is the point, and I am glad they have recognised it, if not entirely appreciated its significance. My analysis has "no practical quantitative value" for estimating the number of conceptions that are lost. As I explicitly point out, the sole purpose of the sensitivity analyses is to show that modest changes in the speculative estimates used by Roberts & Lowe may result in any biologically plausible value for embryo loss. The Lancet is not 'Gray Literature'; I comment further on this below. Contrary to the reviewers' suggestion, we are not completely "ignorant of the training of Drs. Roberts & Lowe" or unaware of their experience in "research practice and data analysis". Charles Ronald Lowe was the more senior of the two. He was 63 years old and Professor of Social and Occupational Medicine at the University of Wales College of Medicine when The Lancet article was published. He "contributed much to the growth of academic public health and the teaching of epidemiology and statistics." I do not describe their work as "useless"; if intended as a quote, then it is a misquote. I describe it as having "no practical quantitative value". These are carefully chosen words. (I have edited the equivalent phrase in the Abstract to match the full text.) The critique offered by the reviewers and their description of the paper as heuristic support this view. Nevertheless, I have added a statement that, as a model for highlighting factors that influence fecundity, the Roberts & Lowe analysis has some value.
In all fairness, on four separate occasions, I describe the analysis of Roberts & Lowe as a "hypothesis", i.e., the banner under which it was originally published in The Lancet. Indeed, they describe their arithmetic as "speculative"; however, they also describe their estimate as "conservative", implying that the true result may be even higher than 78%. My critique would be less germane had their hypothesis not been cited so widely ("more than 300 times", as helpfully pointed out by the reviewers). I suggest that it is not I, but those who enthusiastically cite it, who treat it as "something that it isn't".

Life Tables of Intrauterine Mortality
I do not consider Leridon's chapter a "casual exercise" or "superficial". On the contrary, it is a well-reasoned attempt to answer a challenging biological question. I have included a tribute in my article to Leridon's review. I hope this prevents readers from gaining such false impressions.
I agree with the reviewers that Leridon's 63% is close to my 40-60%. However, contrary to what they imply, Roberts & Lowe's 78% is not.
A critique of Leridon's life-table is not a critique of Leridon at all, but of French & Bierman and Hertig. I discuss briefly why French & Bierman may be an overestimate and, in detail, how Hertig's analysis is flawed. Leridon's account has been widely cited, especially by those describing embryo loss at the earliest stage. I hope readers will find it useful to know how Leridon's values are derived.

hCG studies of early pregnancy loss
The Edmonds (1982) estimate of approximately 60% loss is the highest I report and, for reasons discussed in the article and mentioned by others, is likely to be an over-estimate. Nevertheless, years after the more credible Wilcox (1988) study was published, Edmonds is still widely cited to justify high levels of embryo wastage. For example, Hyde & Schust (2015) cite both Edmonds and Wilcox to support their claim that "Approximately 70% of human conceptions fail to achieve viability, with almost 50% of all pregnancies ending in miscarriage before the clinical recognition of a missed period…" By showing Edmonds' results in context, I hope this kind of overstatement can be avoided.
My conclusion of one third loss is based on the average of the eight listed studies from Wilcox to the present day (unweighted average = 31.9%). I have edited the paper to make this clear. I also discuss why the estimates prior to Wilcox are less reliable and cite several studies that make similar observations. As already noted, my 40-60% estimate is from a previous analysis and is not a combination of the values (31.3; 35.7; 31.3) highlighted by the reviewers. My rationale for using a 90% implantation (and fertilisation) efficiency is found in that analysis.
My conclusion regarding the validity of Boklage's analysis of embryo mortality is not "unsubstantiated". Indeed, the reviewers mention a key point of substance: namely, that Boklage's value of 28.7% misinterprets the biology. Boklage uses this as a measure of embryo mortality, whereas it is a fecundability. If fecundabilities are analysed as embryo mortalities, surely this casts doubt on the validity of conclusions regarding embryo mortality. I cannot comment on Boklage's statistical methodology (i.e., use of MLE, LRTs or AIC values) since he reports no such detail. However, I thank the reviewers for highlighting the lack of clarity in Boklage's analysis.
Contrary to the claim of the reviewers, I refer to Boklage's estimate of 76% loss from conception (fertilisation) to birth on three occasions. This 76% estimate is consistent with Roberts & Lowe's value. It is somewhat higher than Leridon's (whose life table is inexplicably omitted from the Boklage analysis). It is clearly not consistent with my 40-60% estimate.

Hertig's data and analysis
Regarding Hertig's conclusion, Orzack & Zuckerman "believe that it has been well recognized that it is 'impressionistic' as opposed to something that has a solid quantitative underpinning". I agree that Hertig's conclusion does not have a "solid quantitative underpinning"; however, it is precisely the quantitative underpinning of Leridon's life table and other claims about early natural embryo mortality. This is a key point of my article. It is not clear what the reviewers mean by 'impressionistic': some authors seem to offer an 'unimpressionistic' account of Hertig. For example, in the widely cited 'Black Box' review, Macklon et al. write regarding Hertig's study: "…the high rate of early pregnancy loss before the time of the first missed period was thus clearly demonstrated…" Other, less widely cited articles do address the design and analytical shortcomings.
Pointing out shortcomings in studies is what scientists (and reviewers) are meant to do. Thus, I agree with the reviewers that they are "worth mentioning". Furthermore, by pointing out that Hertig's subjects were of proven fertility, had gynaecological problems and may have had suboptimal reproductive function, I am placing Hertig's study "in proper context". This is not "un-useful". Nevertheless, I have edited this section to reconcile these reviewers' scepticism with the more positive view of others. I hope I have struck an acceptable balance.
Orzack & Zuckerman appear to have concerns with well-established statistical techniques, referring to my "so-called 'bootstrap' samples". I agree that bootstrapping is "not a 'cure' for small sample size", but I do not claim that it is. Bootstrapping can provide estimates of precision when it is not possible to calculate these analytically. As with all analyses, outputs require appropriate interpretation.
The reviewers state that the "analyses of the bootstrap results are incorrect" because some of the confidence intervals lie outside the range 0-100%. I am aware that this is impossible (for a probability), as I explicitly point out. Such outputs do indicate a serious flaw in Hertig's analysis, which is as follows: Hertig ignores 47 of his 107 cases. These cases are included in my bootstrap. The reader may consider whether ignoring 44% of the data is reasonable, and the extent to which, by doing so, Hertig has generated biased estimates of the probabilities he calculates. Kline et al. (1989) make a similar point: "The missing data are sufficient to engender an entirely different result". The bootstrap therefore illustrates the extent to which Hertig's estimates are biased by ignoring his own data. There are other reasons to doubt the precision of his conclusions and the representative nature of the subset of data upon which he relies so heavily; these are described in the article.
The bootstrap pseudo-datasets are available for scrutiny (Dataset 4). Thus, if there are any flaws in my reasoning or bootstrap, the reviewers may point these out. I used the percentile method (to which they refer) to calculate the 95% CIs and I have edited the text to clarify this. I do not believe there are any flaws in my bootstrap.
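For readers unfamiliar with the percentile method, the sketch below shows the generic procedure in Python: resample the data with replacement, recompute the statistic on each pseudo-dataset, and take the 2.5th and 97.5th percentiles of the replicates as a 95% interval. It is an illustration under stated assumptions, not the code used for the article: the 60 binary outcomes are hypothetical placeholders, not Hertig's data, and the function names are ours.

```python
import random


def percentile_bootstrap_ci(data, statistic, n_boot=10_000, level=0.95, seed=1):
    """Percentile bootstrap interval for `statistic` computed on `data`.

    Resample with replacement, recompute the statistic on each pseudo-dataset,
    and return the empirical quantiles bounding the central `level` share.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    k = int(round(n_boot * (1.0 - level) / 2.0))  # replicates trimmed from each tail
    return replicates[k], replicates[n_boot - k - 1]


def proportion(outcomes):
    """Share of 1s in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


# HYPOTHETICAL 0/1 outcomes for 60 cases; not Hertig's data.
sample = [1] * 21 + [0] * 39
print(proportion(sample), percentile_bootstrap_ci(sample, proportion))
```

The interval depends on the number of replicates and on the random seed; fixing the seed, as here, makes the pseudo-datasets reproducible.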

IVF/ART data
There is a wealth of data from IVF/ART studies and I have only mentioned a tiny proportion of this.
Orzack & Zuckerman and a previous reviewer suggest that such data could contribute to a quantitative understanding of the in vivo situation. In the broadest sense, this is of course true. However, there are difficulties in extrapolating from in vitro to in vivo circumstances. I am not alone in pointing this out, and I have illustrated some of these difficulties in the Discussion.
My description of "sub-optimal conditions for embryo culture" is drawn from two papers: Bolton & Braude (1987): "Optimal culture conditions for human embryos have yet to be defined" and "suboptimal culture conditions are undoubtedly responsible for a proportion of this embryonic failure"; and Bolton et al. (2015): "Embryo culture conditions are likely to be suboptimal in vitro compared to those in vivo." Is this just rhetoric or a reasonable consideration?
Describing in vitro data as the "best available" is a weak claim in the absence of equivalent natural in vivo data. The extent to which in vitro embryos are representative of in vivo embryos is precisely the point in question. Is there really numerical consistency between natural and IVF/ART embryos? There may be consistency in sex ratios, but does that extend to aneuploidy rates, mosaicism, epigenetic defects, implantation potential, spontaneous abortion rates, etc.? These are big questions and this article is not the place to answer them. However, if 70% loss is the natural benchmark by which IVF/ART embryos are judged to be equivalent to natural embryos, but the true rate of natural loss lies in the range 40-60%, this casts doubt on the judgement that IVF/ART and natural embryos are equivalent. Furthermore, the suggestion that IVF/ART and natural embryos may be different is neither radical, novel, nor strong. However, the real reason I do not consider IVF/ART embryo data is that the article is a critique of data from natural circumstances. Comparison of natural and IVF/ART embryos is a project for the future.
The reviewers refer to my "tenuous or irrelevant reasons" why ART embryos are not useful for quantifying early embryo mortality, yet they provide the perfect reason themselves: "it is true that ART embryos are different from natural embryos in ways that could influence an estimate of fetal wastage". Nevertheless, I do discuss circumstances in which different ART interventions (e.g., observation of fertilisation in vitro per se; retrieval of embryos following timed artificial insemination; as well as AID/IVF success rates) may cast light on embryonic/fetal wastage.
Orzack & Zuckerman extrapolate from 84,881 aneuploidies among 139,704 IVF/ART embryos to an "implied percentage of fetal wastage" of 60.8%. They state that this is the "central tendency" of "many reports" that I dismiss as unreliable. Of course, if this were true, then the observation would add little to what was already known. It is not clear which are the "many reports".
Let us consider the hypothesis that in vitro aneuploidy predicts natural total fetal wastage. Firstly, "The only well-established epidemiological facts about EPL [early pregnancy loss] are that about 50-60% of cases are associated with a chromosomal defect of the conceptus", suggesting that euploid embryos may also fail. Secondly, "FISH may overestimate the incidence of aneuploidy", suggesting that a proportion of apparently aneuploid embryos may not fail. Furthermore, aneuploidy may not developmentally compromise embryos; estimates of IVF/ART embryo aneuploidy/mosaicism vary considerably; mosaic embryos can self-correct; and aneuploidy in trophoblast/placental cells may be less developmentally problematic (who knows, it may even be advantageous!). The point is simple. There are too many undefined variables associated with IVF/ART embryos to shed more than the faintest light on the question of natural embryo survival. I have included a brief discussion of some of these issues and edited the penultimate paragraph to be more circumspect by replacing an "are" with a "may be". I hope this meets with the reviewers' approval.
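The two quoted caveats can be made explicit with the law of total probability. In the notation below, which is ours rather than the article's, A denotes an embryo classified as aneuploid and L denotes loss before birth. The aneuploid fraction P(A) equals total loss P(L) only if P(L | A) = 1 and P(L | not A) = 0.

```latex
\begin{align}
P(L) &= P(L \mid A)\,P(A) + P(L \mid \bar{A})\,\bigl(1 - P(A)\bigr) \\
P(L) - P(A) &= P(L \mid \bar{A})\,\bigl(1 - P(A)\bigr) - \bigl(1 - P(L \mid A)\bigr)\,P(A)
\end{align}
```

Euploid embryos that fail push P(L) above P(A), whereas apparently aneuploid embryos that survive (misclassified or self-corrected) push it below; without estimates of both conditional probabilities, the aneuploid fraction alone does not fix the loss rate.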

Gray Literature
On several occasions, the reviewers refer to Gray Literature. They offer a revealing account and speculate on its continuing 'life'.
Gray Literature has been defined as follows: "That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers." The references reproduced by the reviewers, starting with Opitz, 2002 and ending with McCoy et al., 2015, are all from academic books, journals, or textbooks. They are all published by commercial publishers. They were all written (with one exception) by medical practitioners or scientists, many of whom are experts in reproductive biology. The one exception (Harris, 2003) is a moral philosopher; however, the reviewers usefully point out that his estimate comes from a well-known and eminent reproductive biologist.
None of this is Gray Literature.
Human Reproduction Update, Fertility & Sterility and PLOS Genetics are reputable academic journals. Many of these articles will have been peer-reviewed. Even pieces "akin to newspaper articles" (the Drife (1983) BMJ piece could be described as such and was probably not peer-reviewed) are subject to editorial control, and an expectation of academic professionalism from such experts is surely reasonable.
The reviewers state that it would "be preferable if attributions were better and speculation was better highlighted". I agree. Yet they highlight my 'so-called' "errors of analysis" and "overstatement" whilst passing over errors and overstatement in these citations as "nothing inherently problematic".
What Orzack & Zuckerman describe and defend is not Gray Literature, but 'Gray Scholarship'.

Heuristics
A heuristic estimate may be based on simplified quantitative criteria, educated guesswork, rules of thumb, common sense, past experience, etc. Despite their utility, heuristic estimates may become biased in the absence of evidence. Faced with inconsistent estimates, those that are heuristic or based on circumstantial evidence on the one hand, and those based on well-defined analysis of relevant data on the other, surely an appropriate scientific response is to favour the latter and re-evaluate the former.
A further problem with heuristic estimates is that the process for deriving them is not always transparent. For example, it is not obvious how Orzack & Zuckerman use the "web of circumstantial evidence" to which they refer to conclude that "human fetal wastage is likely between 50 and 75%". There is something 'gray' about this. My estimates of 10-40% preimplantation loss and 40-60% total loss are partly evidence-based and partly heuristic. They may be imperfect, and no doubt will not be the last word on the matter, but it is at least clear how they were derived.

Thanks for the opportunity to review this high-quality manuscript. Peer review can be a chore, but this was a pleasure to read. I will state that my training is in statistics and research methodology. Although much of my work is in the field of fertility, I have no clinical expertise and no familiarity with the literature discussed in this review. Any comments I make are from the point of view of the statistician and, with respect to the subject matter, the layperson.
I am unable to comment on whether or not the body of evidence discussed in the review is comprehensive. However, the critical appraisal of these studies is conducted to a high standard, with a strong command of quantitative research methods on display. I can't fault it. The reader is left in no doubt as to the considerable limitations (many of which appear to be fatal) of these studies. All data used in the manuscript have been made available for the purposes of reproducing the analysis.

I was slightly confused by the description of the simulation study as a two-stage procedure in the critique of Roberts & Lowe. If I understand correctly, sets of simulated values for five quantities were drawn from Normal distributions centred on the estimates used by Roberts & Lowe, with standard deviations equal to these values multiplied by 0.2. Each time a new set of these five quantities was drawn, the values were used to calculate (predict) a value for embryo loss. This was done 100,000 times. However, the author speaks of 1,000 simulations, each containing 10,000 separate estimates. It is unclear what exactly varied within and between the 1,000 simulations. If the data-generating model was the same for all of these (i.e., this was just done for computational reasons), then it would be helpful if the author could make this clear in the text.
The author assumed that the simulated quantities were independent in the simulation; I confess to having no real intuition as to the implications of this assumption. However, I don't believe this would affect the author's conclusion.
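The simulation as the reviewer describes it (independent Normal draws for five inputs, each with standard deviation 0.2 times its point value, the model re-evaluated on every draw, and the central 95% of outputs reported) can be sketched in Python as below. This is a hedged illustration, not the article's code: the five-input toy model and its placeholder values are hypothetical and are not Roberts & Lowe's figures or arithmetic. The inputs are drawn independently, matching the assumption the reviewer mentions; correlated inputs would need a joint distribution instead. The output is a spread of simulated estimates, not a confidence interval.

```python
import random


def sensitivity_range(point_estimates, model, rel_sd=0.2, n_draws=100_000,
                      level=0.95, seed=1):
    """Spread of model outputs when each input is perturbed at random.

    Each input is drawn independently from Normal(mean=estimate,
    sd=rel_sd * estimate); the model is evaluated on every joint draw and the
    central `level` share of the outputs is returned. No observed data are
    involved, so this is a sensitivity range, not a confidence interval.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        draws = [rng.gauss(m, rel_sd * abs(m)) for m in point_estimates]
        outputs.append(model(*draws))
    outputs.sort()
    k = int(round(n_draws * (1.0 - level) / 2.0))  # draws trimmed from each tail
    return outputs[k], outputs[n_draws - k - 1]


def toy_loss(q1, q2, q3, q4, births):
    """HYPOTHETICAL model: loss = 1 - births / (product of four 'conception' factors)."""
    return 1.0 - births / (q1 * q2 * q3 * q4)


# Placeholder point values, NOT Roberts & Lowe's inputs.
point = [1000.0, 0.5, 0.8, 0.9, 100.0]
print(round(toy_loss(*point), 3), sensitivity_range(point, toy_loss))
```

Because every draw comes from the same generating model, splitting the draws into batches and pooling them would not change what is being estimated.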
One minor typo: 'this is far from being a robust pregnancy diagnosis and in different study [46]…'. I believe that it would be appropriate to accept this manuscript without revision, although the author may wish to clarify the point about the first simulation described above.
Competing Interests: Conducting peer review may be beneficial to my career. I am funded by a Doctoral Research Fellowship from the National Institute for Health Research (DRF-2014-07-050). The views expressed in this peer review are those of the authors and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health. JW is a statistical editor of the Cochrane Gynaecology and Fertility Group.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

different stages of early pregnancy: this would mean that individuals who are not well versed in reproductive function would be able to understand the arguments he is providing.
We are pleased to see this article being written. We think it is timely and thought-provoking, and this is an excellent moment in which to consider in realistic terms the kind of evidence that is constantly requoted in the debate about how fertile the human species is. Currently this topic is dominated by data from studies on sub/infertile women receiving medical support to achieve a pregnancy.

Specific points
Who is the audience for this paper, and does the introduction set the scene in such a way that the reader will be both interested and motivated to read the remaining part of the paper, which I would like to see them do? I think that, as written, the Introduction may not achieve this objective. For example, the first sentence starts with some glib comments about it being 'widely accepted' that under natural circumstances human embryo mortality is high, and then there is an extensive section quoting a number of popular articles and websites: why have this up front? It seemed to undermine the erudite arguments of the rest of the paper.
The second paragraph with some modification would make a sufficient introduction. The aim of the review as stated in the discussion 'How many fertilized human embryos..?' should also be frontloaded at some point here. Clearly embryo mortality is of interest to both reproductive biologists and fertility doctors but why not also mention couples trying to conceive?
Reading the introduction, we were struck by the pressing need for a 'key terms' box (the kind of thing you see in Nature papers) defining each of the terms used, e.g., fecundability, embryo, hCG, etc. If this paper is going to be read not by fertility experts or experts in reproductive biology but by people interested in ethics or chance or statistics, I think they will be very confused by the different terms that are used.
What is not clear from the paper is the chronology of the observations/data being discussed. It is common for people (even those familiar with the field but who work on animal models) to be very confused by the timings in women: for example, the day on which fertilisation takes place versus the last menstrual period, i.e., fertilisation versus gestation versus the first day (depending on when you count from) on which you might reasonably expect to detect hCG in the urine. We would argue that there needs to be a figure defining when each of these happens in terms of days in a woman's reproductive span. This could also help clarify the points in the process to which the various probabilities (π) apply.
The second place where we think more information would be very helpful is the section called 'What the data say', where terms such as 'old' are used and no dates or references are provided. What do they mean by 'old': pre-1960, pre-1950, pre-1940? Because the author has used numbered references, there is also no sense of how the studies relate to one another chronologically. Some minor reworking in which the author says, for instance, "the work of Hertig and Rock in the 1950s" would be helpful.
The author is also slightly confusing when talking about the pregnancy study (ref 42), giving neither the names of the authors nor the date on which it was published in the section on page 4; then, in Fig 2, the pregnancy study (ref 42) is shown as French and Bierman 1962. This is the kind of thing that makes it difficult to get a sense of the chronology of observations and of how people have built on each other's observations in order to support subsequent studies, and this, after all, is one of the most crucial points of this paper.
On page 6 we finally get to some discussion about modern pregnancy tests. It is not until some pages later that we learn whether they are in blood or urine. Mid-cycle elevation of hCG is not defined in terms of days (cf. comments above). For information, the fact that these assays were likely to be urine-based is not mentioned until page 7.
We think many aspects of this paper are extremely well argued, particularly the data provided: the very detailed analysis in Table 3 and in other parts of page 7. Some very good points are also made about the over-emphasis on using data from patient groups where infertility is probably one of the reasons for presentation, which may have produced a less robust data set.
The author makes a valid argument about potential subfertility within the Hertig cohort, but this is not balanced. Equally, these women were selected for proven fecundity, and this factor affects interpretation of this cohort as much as the other.
On page 10 the discussion starts with a key question: how many fertilised human embryos die? It is slightly frustrating that this was not put up front as the question being addressed in this paper. The author might like to consider setting out the aims more clearly.
Again, in the discussion, many of the arguments being made would have been greatly enhanced by telling us the dates on which some of these studies were conducted. When looking at the reference list I see many of them were in the '80s and early '90s.
We wonder whether the first paragraph on page 12 might reasonably be eliminated; it feels repetitive compared with other parts of the paper. I think the discussion of the Macklon review (ref 20) is extremely insightful and useful. However, we draw the author's attention to a more recent study by Macklon and Brosens, which we believe puts forward some interesting arguments, worth discussing in this paper, about how the endometrium in which embryos implant might act as a 'sensor' of embryo quality. This is in Biology of Reproduction 2014, vol 91. There is also a complementary paper in Sci Rep, vol 6, Brosens et al. 2014.
The conclusion of the discussion seems more like a continuation of the critique in the final few paragraphs. It would be desirable to provide a concluding paragraph which holistically draws together the content of the review. Again, the heavy use of quotations from references, as in the introduction, masks the opportunity for the author to provide his own conclusions.
In summary we welcome this review which we think makes many erudite comments on a difficult field.
Competing Interests: No competing interests were disclosed.

We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
I agree with the point made. In this section, the first point I make about the 210 recruited women is that they were of "proven fertility". The same point is made again in the commentary on Hertig's data. I have also edited the text in response to Reviewer 3 and hope that the final result is appropriately balanced.

The question (slightly modified) has been included in the Abstract and Introduction. I hope that this helps to clarify and reinforce the purpose of the article.

Some dates have been incorporated to enhance chronological clarity (see point 6).

The first paragraph on page 12 addresses the importance of biological variance. It does not go into detail but stresses that point estimates of risk do not provide the whole picture when considering either populations or individual cases. As I have put it, a neglect of variance fosters a misleading appreciation of reality. I would prefer to retain this paragraph, in the hope that it will encourage readers to consider the importance and implications of numerical diversity when interpreting data.

The arguments proposed by Macklon and Brosens relating to endometrial receptivity are indeed interesting. However, they propose mechanistic explanations for implantation failure and do not directly address how frequently such events occur. Nevertheless, their inclusion is contextually valuable, and I have made some comments on their studies.

The final paragraph has been edited. The quotations make it clear that I am not alone in drawing attention to the limitations of the available data. I have endeavoured to summarise the broad purpose and value of this work.

Competing Interests: