Oral hormone pregnancy tests and the risks of congenital malformations: a systematic review and meta-analysis

Background: Oral hormone pregnancy tests (HPTs), such as Primodos, containing ethinylestradiol and high doses of norethisterone, were given to over a million women from 1958 to 1978, when Primodos was withdrawn from the market because of concerns about possible teratogenicity. We aimed to study the association between maternal exposure to oral HPTs and congenital malformations. Methods: We performed a systematic review and meta-analysis of case-control and cohort studies that included pregnant women exposed to oral HPTs within the estimated first three months of pregnancy and compared them with a relevant control group. We used random-effects meta-analysis and assessed the quality of each study using the Newcastle–Ottawa Scale for non-randomized studies. Results: We found 16 case-control studies and 10 prospective cohort studies, together including 71,330 women, of whom 4,209 were exposed to HPTs. Exposure to oral HPTs was associated with a 40% increased risk of all congenital malformations: pooled odds ratio (OR) = 1.40 (95% CI 1.18 to 1.66; P < 0.0001; I² = 0%). Exposure to HPTs was also associated with increased risks of congenital heart malformations, pooled OR = 1.89 (95% CI 1.32 to 2.72; P = 0.0006; I² = 0%); nervous system malformations, OR = 2.98 (95% CI 1.32 to 6.76; P = 0.0109; I² = 78%); musculoskeletal malformations, OR = 2.24 (95% CI 1.23 to 4.08; P = 0.009; I² = 0%); and the VACTERL syndrome (Vertebral defects, Anal atresia, Cardiovascular anomalies, Tracheoesophageal fistula, Esophageal atresia, Renal anomalies, and Limb defects), OR = 7.47 (95% CI 2.92 to 19.07; P < 0.0001; I² = 0%). The estimate for gastrointestinal malformations was not statistically significant: OR = 4.50 (95% CI 0.63 to 32.20; P = 0.13; I² = 54%). Conclusions: This systematic review and meta-analysis shows that use of oral HPTs in pregnancy is associated with increased risks of congenital malformations.


Introduction
Oral hormone pregnancy tests (HPTs), such as Primodos (known as Duogynon in Germany), were available as injections from 1950 and in tablet form in the UK from 1956 onwards, before modern urine pregnancy tests became available 1 . Oral HPTs contained ethinylestradiol and large doses of norethisterone (synthetic forms of estrogen and progesterone, respectively), the latter in much larger amounts than in current combined oral contraceptives (see Table 1). The test principle was that the hormones would induce bleeding similar to menstruation in women who were not pregnant; the absence of bleeding was taken to indicate pregnancy.
In the UK more than a million women took HPTs 2 . Warnings about HPTs in pregnancy first emerged in 1956 4 , and accumulating concerns over an increased risk of malformations led to their withdrawal in a number of countries at different times. Norway cancelled the indication in pregnancy for HPTs in 1970. In the UK, evidence that HPTs should not be used in pregnant women because of a risk of fetal malformations 3 led the then Committee on Safety of Medicines to conclude in 1975 that a warning should be added to the Data Sheets stating that HPTs should not be taken during pregnancy (Supplementary File 1). When the UK cancelled the indication in pregnancy in 1978, the manufacturer of Primodos, Schering AG (taken over by Bayer AG in 2008), voluntarily stopped marketing the product; in Germany, Duogynon was taken off the market in 1981 1 .
Since Primodos was withdrawn, the discovery of previously confidential documents has led to renewed concerns about its potential to cause harm 5 . In 2014, therefore, the Medicines and Healthcare products Regulatory Agency (MHRA) initiated a review, which was published in 2017 and reported that the evidence was insufficient, mixed, and too heterogeneous to support an association between oral HPTs and congenital malformations 3 .
To date, there has been no systematic review and meta-analysis using all the available data on oral HPTs to assess whether they are associated with congenital malformations. We therefore performed a systematic review to obtain all relevant data on hormone pregnancy tests and congenital malformations, used meta-analysis to obtain pooled estimates of the strength of any association, and assessed the potential biases in these estimates.

Data sources
Full details of our search strategy are provided in Supplementary File 2. We searched Medline, Embase, and Web of Science (which yielded German papers and conference abstracts) from the start of the databases in 1946 to 20 February 2018. We also searched online for regulatory documents, including the UK Government's "Report of the Commission on Human Medicines' Expert Working Group on Hormone Pregnancy Tests", which includes the original Landesarchiv Berlin Files 3 , and screened the reference lists of retrieved studies.
We used the following search terms without date limits or language restrictions: (Primodos OR Duogynon OR "hormone pregnancy test" OR "sex hormones" OR "hormone administration" OR "norethisterone" OR "ethinylestradiol") AND pregnancy AND (congenital OR malformations OR anomalies). Several comparable high-dose HPTs were available at the same time as Primodos, and we performed additional searches for evidence relating to these (see Supplementary File 3 for the list of HPTs included in the evidence search).

Amendments from Version 1
We thank the reviewers for their positive comments about our manuscript; we have responded to the points made and revised version 1 of the paper in light of their comments. The major changes affect the introduction, which has been revised in line with the peer review comments. We have also corrected citation errors in the references; uploaded revised Excel data sheets, as there was an error in the data in the first version; and made one correction in the text of the results, which now reads 'Exposure to oral HPTs was associated with a 40% increased risk of all congenital malformations: pooled odds ratio (OR) = 1.40 (95% CI 1.18 to 1.66; P < 0.0001; I² = 0%)', whereas version 1 incorrectly stated the increased risk as 37%. We have also revised the forest plots (Figures 2 to 8), as the effect estimates were incorrectly labelled.

Study selection
We included observational studies of women who were or became pregnant during the study and were exposed to oral HPTs within the estimated first three months of pregnancy and compared them with a relevant control group. When a study was described in more than one publication, we chose the publication that contained the most comprehensive data as the primary publication. We excluded studies where the intervention was oral hormones taken for other reasons (e.g., oral contraception) and it was not possible to extract data on hormone pregnancy tests. We did not restrict the language of publication. We checked additional relevant data and extracted them from the secondary publications when necessary.
Data extraction and risk of bias assessment
Two reviewers (CH and ES) applied inclusion and quality assessment criteria, compared results, and resolved discrepancies through discussion with the other authors. We used a review template to extract data on study type, numbers of pregnancies exposed and not exposed to oral HPTs, and types and numbers of outcomes. Where available, we extracted data about the women studied, including ascertainment of cases, age, parity, setting, exposure to other medications, and confounding variables.
In case-control studies, if data were reported on more than one control group, we extracted data where possible for nondisease/non-abnormality controls, and combined control groups if necessary.
The primary outcome of interest was all major congenital malformations. We also categorized congenital anomalies detected in the offspring at any time into cardiac, gastrointestinal, musculoskeletal, nervous system, and urogenital defects, and the VACTERL syndrome (Vertebral defects, Anal atresia, Cardiovascular anomalies, Tracheoesophageal fistula, Esophageal atresia, Renal anomalies, and Limb defects).
We assessed quality using the Newcastle-Ottawa Scale (NOS) for non-randomized studies included in systematic reviews 6 . The scale assesses the selection of the study groups (cases and controls, or exposed and unexposed cohorts), their comparability, and the ascertainment of the exposure or outcome. Each criterion met scores 1 point, except comparability, which scores up to 2 points. The maximum NOS score is 9, and we interpreted a score of 1 to 3 points as indicating a high risk of bias 7 .
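The scoring rule above can be expressed as a small function. This is a sketch, assuming the standard NOS domain maxima (4 points for selection, 2 for comparability, 3 for exposure or outcome). The "moderate" band of 4 to 6 points is our inference from the high-risk (1 to 3) and low-risk (7 to 9) thresholds used in this review, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NosAssessment:
    """Points awarded in each NOS domain for one study (hypothetical structure)."""
    selection: int            # 0 to 4 points
    comparability: int        # 0 to 2 points (items 5a and 5b)
    exposure_or_outcome: int  # 0 to 3 points

    def total(self) -> int:
        """Sum the domain points after checking each against its maximum."""
        for value, maximum, name in (
            (self.selection, 4, "selection"),
            (self.comparability, 2, "comparability"),
            (self.exposure_or_outcome, 3, "exposure_or_outcome"),
        ):
            if not 0 <= value <= maximum:
                raise ValueError(f"{name} must be between 0 and {maximum}")
        return self.selection + self.comparability + self.exposure_or_outcome


def risk_of_bias(score: int) -> str:
    """Map a total NOS score to a risk-of-bias label using this review's cut-offs."""
    if not 0 <= score <= 9:
        raise ValueError("NOS score must be between 0 and 9")
    if score <= 3:
        return "high"      # 1 to 3 points in the text; 0 is treated the same here
    if score >= 7:
        return "low"       # 7 to 9 points
    return "moderate"      # 4 to 6 points (inferred band)


study = NosAssessment(selection=3, comparability=2, exposure_or_outcome=2)
print(study.total(), risk_of_bias(study.total()))  # 7 low
```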
To determine whether each study had controlled for the most important factors, we recorded the items reported in the original paper and resolved disagreements by consensus, involving a third author (IO). We examined whether there was a linear relation between methodological quality and study results by plotting the odds ratios against the NOS scores in Excel, and we assessed the correlation between NOS scores and the number of confounding variables collected 8 .
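Assuming the correlation between NOS scores and the number of confounding variables collected is Pearson's coefficient (the text does not name the coefficient), it can be computed as in this minimal sketch. The per-study values are hypothetical, not the review's data.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-study values: NOS score and number of confounders collected.
nos_scores = [3, 5, 6, 7, 9]
confounders_collected = [0, 2, 3, 5, 8]
print(f"r = {pearson_r(nos_scores, confounders_collected):.2f}")
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; the explicit version is shown to make the formula visible.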

Data synthesis and statistical methods
We calculated study-specific odds ratios and their confidence intervals for each outcome. We meta-analysed the data using a random-effects model. We assessed heterogeneity across studies using the I² statistic and publication bias using funnel plots 9 .
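The two steps above (study-specific odds ratios, then random-effects pooling with I²) can be sketched in Python. This is a minimal illustration, not a reproduction of the review's RevMan analysis. It assumes Woolf's (logit) standard error for each study and the DerSimonian and Laird estimator of the between-study variance, a widely used random-effects method; the text does not state which estimator was used. The 2×2 counts are hypothetical.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def log_odds_ratio(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Log odds ratio and its standard error (Woolf's method) from a 2x2 table.

    a / b: exposed with / without malformation
    c / d: unexposed with / without malformation
    All cells must be positive; zero cells need a continuity correction first.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


@dataclass
class PooledResult:
    odds_ratio: float
    ci_low: float
    ci_high: float
    tau2: float  # estimated between-study variance (log-OR scale)
    i2: float    # % of total variability attributed to heterogeneity


def dersimonian_laird(estimates: list[tuple[float, float]]) -> PooledResult:
    """Random-effects pooling of (log OR, SE) pairs with the DerSimonian-Laird tau^2."""
    y = [est for est, _ in estimates]
    w = [1 / se**2 for _, se in estimates]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if df > 0 and q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (se**2 + tau2) for _, se in estimates]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return PooledResult(
        odds_ratio=math.exp(pooled),
        ci_low=math.exp(pooled - Z_95 * se_pooled),
        ci_high=math.exp(pooled + Z_95 * se_pooled),
        tau2=tau2,
        i2=i2,
    )


# Hypothetical 2x2 tables (a, b, c, d); not data from the included studies.
tables = [(12, 188, 40, 960), (8, 92, 30, 870), (20, 380, 55, 1545)]
result = dersimonian_laird([log_odds_ratio(*t) for t in tables])
print(f"OR = {result.odds_ratio:.2f} (95% CI {result.ci_low:.2f} to "
      f"{result.ci_high:.2f}); I2 = {result.i2:.0f}%")
```

When all studies agree exactly, Q does not exceed its degrees of freedom, so both tau² and I² are truncated to zero and the random-effects estimate coincides with the fixed-effect one.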
We performed a sensitivity analysis by removing single studies one at a time to judge the stability of the effect and to explore the effect on heterogeneity 10 , and we described any sources of variation. We also judged robustness by removing studies of low quality from the analysis. To examine whether the observed heterogeneity could be explained by differences in NOS score, we also performed a meta-regression of the log OR on the NOS score as the covariate, using Stata version 14.
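A minimal sketch of the two robustness checks described above, under stated assumptions: leave-one-out re-pooling (dropping each study in turn and re-estimating the random-effects OR, here with the DerSimonian and Laird estimator) and a weighted least-squares meta-regression of log OR on NOS score. Function names and example numbers are hypothetical. In particular, the meta-regression here takes the between-study variance tau² as a fixed input, whereas Stata's `metareg` estimates it from the data, so results will not match Stata exactly.

```python
import math


def pool_random_effects(estimates: list[tuple[float, float]]) -> float:
    """DerSimonian-Laird pooled log OR from (log OR, SE) pairs."""
    y = [est for est, _ in estimates]
    w = [1 / se**2 for _, se in estimates]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    df = len(y) - 1
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1 / (se**2 + tau2) for _, se in estimates]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)


def leave_one_out(studies: list[tuple[str, float, float]]) -> dict[str, float]:
    """Pooled OR after omitting each (label, log OR, SE) study in turn."""
    results = {}
    for i, (label, _, _) in enumerate(studies):
        rest = [(est, se) for j, (_, est, se) in enumerate(studies) if j != i]
        results[label] = math.exp(pool_random_effects(rest))
    return results


def meta_regression_slope(log_ors, ses, covariate, tau2=0.0):
    """Weighted least-squares slope (and its SE) of log OR on one study-level covariate.

    Weights are 1 / (SE^2 + tau2), with tau2 supplied by the caller.
    """
    w = [1 / (se**2 + tau2) for se in ses]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, log_ors))
    return sxy / sxx, math.sqrt(1 / sxx)


# Hypothetical studies: (label, log OR, SE, NOS score); not the review's data.
studies = [("A", 0.30, 0.20, 5), ("B", 0.45, 0.25, 7), ("C", 0.20, 0.30, 4), ("D", 0.35, 0.15, 8)]
for dropped, pooled_or in leave_one_out([s[:3] for s in studies]).items():
    print(f"without {dropped}: OR = {pooled_or:.2f}")
slope, se_slope = meta_regression_slope(
    [s[1] for s in studies], [s[2] for s in studies], [s[3] for s in studies]
)
print(f"change in log OR per NOS point: {slope:.3f} (SE {se_slope:.3f})")
```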
We planned subgroup analyses for the timing of administration of HPTs in relation to pregnancy and organogenesis, and for study design (case-control versus cohort), testing for subgroup differences using Cochran's Q test. We used RevMan v.5.3 for all analyses except meta-regression, for which we used Stata version 14. RevMan and Stata estimate the effects of studies with zero events in one arm by adding a correction factor of 0.5 to each arm (studies with zero events in both arms are omitted). We performed a sensitivity analysis by removing studies with zero events from the analyses.
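The zero-event handling and the subgroup comparison described above might look like this in Python. This sketch assumes the 0.5 correction is added to every cell of a study's 2×2 table whenever any cell is zero (our reading of "each arm"), and it implements Cochran's Q test between two subgroup estimates, such as pooled case-control and cohort log ORs, for which Q has one degree of freedom. The table layout, names, and numbers are illustrative only.

```python
import math
from typing import Optional

Table = tuple[float, float, float, float]


def corrected_table(a: int, b: int, c: int, d: int, correction: float = 0.5) -> Optional[Table]:
    """Apply the zero-cell rule to one 2x2 table.

    Assumed layout: a / b = with / without malformation among the exposed,
    c / d = with / without malformation among the unexposed.
    Returns None if neither group has any events (study omitted),
    the table with `correction` added to every cell if any cell is zero,
    and the unchanged table otherwise.
    """
    if a == 0 and c == 0:
        return None
    if 0 in (a, b, c, d):
        return (a + correction, b + correction, c + correction, d + correction)
    return (a, b, c, d)


def subgroup_q_test(groups: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Cochran's Q test for a difference between two subgroup estimates.

    `groups` maps a subgroup name to its pooled (log OR, SE). With two
    subgroups Q has 1 degree of freedom, so P = erfc(sqrt(Q / 2)).
    """
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two subgroups (1 df)")
    weights = {name: 1 / se**2 for name, (_, se) in groups.items()}
    overall = sum(weights[n] * est for n, (est, _) in groups.items()) / sum(weights.values())
    q = sum(weights[n] * (est - overall) ** 2 for n, (est, _) in groups.items())
    return q, math.erfc(math.sqrt(q / 2))


# Hypothetical tables: zero events in one group, zero events in both, no zeros.
tables = [(3, 97, 0, 100), (0, 50, 0, 60), (10, 90, 5, 95)]
main_analysis = [ct for ct in (corrected_table(*t) for t in tables) if ct is not None]
# Sensitivity analysis: drop every study with zero events in either group.
sensitivity_analysis = [t for t in tables if t[0] > 0 and t[2] > 0]
q, p = subgroup_q_test({"case-control": (0.34, 0.12), "cohort": (0.31, 0.10)})
print(len(main_analysis), len(sensitivity_analysis), f"Q = {q:.2f}, P = {p:.2f}")
```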
We followed the reporting guidelines of the Meta-Analysis of Observational Studies in Epidemiology (MOOSE) 11 . A completed checklist is available as Supplementary File 4.

Patient involvement
Members of the Association for Children Damaged by HPTs were involved in the original discussions of this review and provided input to the outcome choices, the search, the location of study articles, and translations. We plan to present the study findings to relevant patient groups and make available lay interpretations.

Description of included studies
We retrieved 409 items for screening. After title and abstract screening and removal of duplicates (n = 18), we excluded 354 records as not being relevant to the aim of the review. We assessed the full texts of 37 articles and identified 24 articles for inclusion. Figure 1 shows the PRISMA flow diagram for the inclusion of studies.
The 24 included articles reported on 26 studies (16 case-control studies and ten prospective cohort studies); one article [Nora 78] included two case-control studies and one prospective cohort study. We found no randomized controlled trials. Two of the included articles were unpublished reports (see Supplementary File 5 for full references). The studies included 71,330 women: the case-control studies included 28,761 mothers, 594 of whom were exposed to HPTs, and the cohort studies included 42,569 mothers, 3,615 of whom were exposed. The studies were published between 1972 and 2014, and all were performed either in Europe or the USA. They mostly recruited women and their infants at maternity centres or hospital paediatric wards.
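The screening and participant counts in the two paragraphs above can be checked for internal consistency with simple arithmetic. This sketch uses only numbers reported in the text; the variable names are our own.

```python
# All numbers below are taken from the two paragraphs above.
records_retrieved = 409
duplicates_removed = 18
excluded_on_screening = 354
full_texts_assessed = records_retrieved - duplicates_removed - excluded_on_screening

case_control = {"studies": 16, "mothers": 28_761, "exposed": 594}
cohort = {"studies": 10, "mothers": 42_569, "exposed": 3_615}
total = {key: case_control[key] + cohort[key] for key in case_control}

# 24 articles, one of which (Nora 78) contributes three studies instead of one.
studies_from_articles = (24 - 1) + 3

print(full_texts_assessed)    # 37
print(total)                  # {'studies': 26, 'mothers': 71330, 'exposed': 4209}
print(studies_from_articles)  # 26
```

The totals match the 37 full texts assessed, the 26 studies, and the 71,330 women (4,209 exposed) reported in the abstract.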
The choices of controls in the case-control studies varied: at one extreme, they included healthy infants born on a date close to that of the case infants and, at the other, infants with malformations other than those under investigation. Among the prospective cohort studies, the populations tended to be women recruited at antenatal clinics or birth centres (see Table 2, Characteristics of included studies).

Quality assessment of included studies
Of the 26 included studies, three were assigned a NOS score of 3 or below and were therefore judged to be at high risk of bias: one case-control study (Laurence 1971, an abstract published as a letter) and two cohort studies (Fleming 1978 and Haller 1974, both unpublished). The NOS scores ranged from 2 to 9 (median 5). Twelve of the 26 included studies scored 7 to 9 and were judged to be at low risk of bias (see Table 3 of NOS scores in the data files). Item 5 of the NOS addresses the comparability of cases and controls on the basis of design or analysis. Of the 16 case-control studies, 12 controlled for the most important factor (item 5a) and nine controlled for important additional factors (item 5b). Of the ten cohort studies, six controlled for the most important factor (item 5a) and four controlled for important additional factors (item 5b). The mean NOS score was 6.1, indicating an overall moderate risk of bias. Table 2 also shows that seven studies did not report the confounding variables collected (Laurence 1971; Levy 1973; Tummler 2014; Fleming 1978; Haller 1974; Moire 1978; Rousel 1968). NOS scores correlated positively with the number of confounding variables collected (r = 0.83). Supplementary File 6 shows the funnel plots for all congenital malformations and congenital heart disease; because too few studies were included, we did not use more advanced statistical methods to assess publication bias.
Association of exposure to HPT with the risks of malformations
Nine studies, including 61,642 mothers, 3,274 of whom were exposed to HPTs, examined the association between exposure in pregnancy and all congenital malformations. Two were case-control studies (Greenberg 1977; Sainz 1987) and seven were cohort studies (Fleming 1987; Goujard 1979; Haller 1974; Kullander 1976; Michaelis 1983; Rumeau-Rouquette 1978; Torfs 1981) (Figure 2). Exposure to oral HPTs was associated with a 40% increased risk of all


Discussion
We found 24 articles containing 26 studies that reported the association between exposure to oral hormone pregnancy tests in mothers and malformations in their infants: 16 were case-control studies and ten were prospective cohort studies. The overall quality of the evidence, assessed by the Newcastle-Ottawa Scale, was moderate.
We found significant associations for all congenital malformations pooled and, separately, for congenital heart malformations, nervous system malformations, musculoskeletal malformations, and the VACTERL syndrome. Many of these pooled analyses showed no heterogeneity (I² = 0%), and the direction of effect favoured the controls (that is, showed a higher risk with HPT exposure) in 30 of the 32 analyses undertaken (Torfs 1981 provided the only effect estimate favouring HPT exposure). The analyses were also robust to sensitivity analyses, and there was no relation between NOS score and the size of the risk estimates.
Based on the assumptions that a teratogenic effect of HPTs would be mediated by actions on estrogen and progestogen receptors, and that concentrations of ethinylestradiol and norethisterone in the fetus would be too low to have a significant effect on those receptors, it has been suggested that there is no mechanistic argument for teratogenicity 3 . However, other, unknown mechanisms might be at play. For example, Isabel Gal, who first reported concerns about malformations in the children of mothers exposed to HPTs in 1967 12 , pointed out that bleeding often occurred in pregnant women soon after exposure and suggested that this would affect the "equilibrium" of the uterus. Between 5% and 11% of exposed women had bleeding, and the Royal College of General Practitioners (RCGP) survey reported induced abortions in about 10% of women 13 .
The drugs in Primodos were not tested for animal toxicity and teratogenicity at the time, which, although not unusual, meant that there was a gap in mechanistic understanding. A 2018 study showed that the components in Primodos are associated with dose-dependent and time-related damage in zebrafish embryos, and affect nerve outgrowth and blood vessel patterning in zebrafish 12,14 . Although it is difficult to compare drug actions between species, and evidence from animal studies is limited, the drugs accumulated in the zebrafish embryos, persisted for some time, and led to rapid embryonic damage 12,14 . In contrast, other animal studies have shown minimal effects on embryo development 15 . There is also evidence that estradiol and progestogens increase the expression of mRNA for isoforms of vascular endothelial growth factor (VEGF) in Ishikawa cells from human endometrial adenocarcinoma 16 .

Strengths and weaknesses
Establishing causal associations in the absence of randomization can be difficult. However, the lack of randomized trials in our analysis should not be seen as a barrier to interpreting our findings: it would have been unethical to randomize individuals to drugs with known safety concerns, and randomized trials, like systematic reviews, were not the norm at the time. Furthermore, for questions about harms, the Oxford Centre for Evidence-Based Medicine (CEBM) levels of evidence put systematic reviews of case-control studies on a par with systematic reviews of randomized trials 17 .
However, observational methods have limitations 18 . First, interpretation can be affected by confounding factors. Although most of the studies in this review used matched controls, our analysis was based on raw data from the publications and did not adjust for confounders. Secondly, susceptibility bias can occur, as women with threatened abortions might be more likely to present and take the medication. Both of these problems can be mitigated by careful matching; 13 of the 16 studies controlled for the most important factor, item 5a on the NOS. Thirdly, differences in the severity of the malformations studied may have led to differing risk estimates across studies. Fourthly, inappropriate methods of ascertaining malformations and exposures could have introduced bias. Finally, incomplete and uneven reporting, along with publication bias (since it is likely that unreported studies exist), could introduce bias and alter the effect estimates.
The use of scoring systems to assess quality has been criticized. However, the NOS has been used widely to assess the quality of non-randomized studies [19][20][21][22][23][24] . A NOS score between 0 and 9 has previously been used as a potential moderator in meta-regression 25 , and the NOS has been recommended by the Cochrane Collaboration 26 . A weakness of the NOS is the possible low agreement between assessors 27 , particularly when assessors have limited experience of systematic reviews, although training, even of novices, improves agreement 19 .
The effects were also robust to sensitivity analyses, and differences in NOS score did not affect the risk estimates. The absence of subgroup differences between study designs further supports the robustness of the findings. We also tried to reduce publication bias by translating and assessing unpublished data. The sample sizes in the studies for all congenital malformations, congenital heart disease, and nervous system malformations were large enough to suggest that small unpublished studies would have little effect on the estimates unless they were highly heterogeneous. The analyses of gastrointestinal, urogenital, musculoskeletal, and VACTERL malformations were limited by small sample sizes and low numbers of events, so these effects should be interpreted more cautiously. The significant effect observed for VACTERL should also be treated cautiously, as its confidence interval was wide.
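Why the VACTERL estimate warrants caution can be made concrete by back-calculating the standard error of each pooled log OR from its reported 95% CI, assuming the interval is symmetric on the log scale, as a normal-approximation CI is. Using the abstract's figures, the VACTERL interval implies a standard error roughly five times that of the all-malformations estimate, and the geometric mean of each pair of limits recovers the reported OR to within rounding. The function names are our own.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error of log OR implied by a CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_centre(lower: float, upper: float) -> float:
    """Geometric mean of the CI limits: the OR at the centre of a log-scale CI."""
    return math.sqrt(lower * upper)


# Pooled estimates as reported in the abstract: (OR, lower limit, upper limit).
reported = {
    "all congenital malformations": (1.40, 1.18, 1.66),
    "VACTERL": (7.47, 2.92, 19.07),
}
for outcome, (odds_ratio, lower, upper) in reported.items():
    print(f"{outcome}: reported OR {odds_ratio}, CI centre {ci_centre(lower, upper):.2f}, "
          f"SE(log OR) {se_from_ci(lower, upper):.3f}")

ratio = se_from_ci(2.92, 19.07) / se_from_ci(1.18, 1.66)
print(f"VACTERL SE is {ratio:.1f} times the all-malformations SE")
```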
Our study has several strengths. We used standard systematic review methods, and by asking a focused question solely about exposure to HPTs, and excluding exposure to other hormones, we were able to assess the heterogeneity of the effect estimates. However, as with any observational study, an unknown confounder could be the cause of the observed difference. While this possibility cannot be ruled out, the lack of heterogeneity means that such a confounder would have to act in the same direction across studies, despite many different confounders being collected and controlled for. Confounding factors with variable effects on the effect estimates would probably have led to a high degree of heterogeneity, which would have prevented pooling; this was not the case.

Conclusion
Regulators were first made aware of the link between exposure to HPTs and congenital malformations in 1967. After 1975, the Primodos label was changed to state that the medication should not be used in pregnancy because of a risk of malformations (see Figure 9). The evidence of an association has previously been deemed weak, and previous litigation and reviews have been inconclusive. However, we believe that this systematic review shows an association of oral HPTs with congenital malformations.
Our results show the benefit of undertaking systematic reviews, a study type not in routine use when most of these studies were done. For example, only one (Greenberg 1977) of the nine studies of all congenital malformations reported a significant effect, whereas the pooled estimate was significant. Much of the discussion over the associations of HPTs with congenital malformations at the time these studies were published focused on the lack of

Many thanks for these positive comments. This is a timely and much-needed paper that deserves to be widely read and cited. It provides the first systematic review and meta-analysis of old epidemiological data pointing towards a long-acknowledged association between HPTs and birth defects. Most of the paper is devoted to apparently rigorous statistical analysis. We leave constructive criticism of the statistics to other, more appropriately qualified reviewers. Instead, we confine our comments to the historical context and factual details presented in the paper. These, on the whole, are entirely satisfactory. But some minor errors, which do not significantly detract from the overall argument, should be amended: 'Oral hormone pregnancy tests (HPTs), such as Primodos, containing ethinylestradiol and high doses of norethisterone, were given to over a million women from 1958 to 1978' (p. 1).
It is worth clarifying that HPTs were available as injections from 1950 and in tablet form (e.g., Schering's Orasecron, Roussel's Amenorone Forte) in the UK from at least 1956. See, for example, Britton (1956) and https://archive.org/details/b19974760M4180/page/n45?q=amenorone+1956. For an extended discussion, see Olszynko-Gryn (2014). We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Carl Heneghan
Many thanks for these positive comments.
We have amended the introduction with the following text: 'Oral hormone pregnancy tests (HPTs), such as Primodos (known as Duogynon in Germany), were available as injections from 1950 and in tablet form in the UK from 1956 onwards, before the modern forms of urine pregnancy tests became available [1].' The benefits of our systematic review are that it quantifies the magnitude of the association and, by meta-analysis, tests the robustness of this association across multiple studies. We therefore consider that the rationale and the objectives are clear.

Competing Interests: CH, ES, KRM and IO received funding from the NIHR SPCR Evidence Synthesis Working Group (NIHR SPCR ESWG). CH is Director of the NIHR SPCR ESWG, receives funding support from the NIHR Oxford BRC, is an NIHR Senior Investigator, and is Editor in Chief of BMJ Evidence-Based Medicine. JKA has published papers and edited textbooks on adverse drug reactions; he has also acted as an expert witness in cases related to adverse drug reactions.

David Healy
The Primodos components norethisterone acetate and ethinyl estradiol induce developmental abnormalities in zebrafish embryos [Brown S, Fraga LR, Cameron G, Erskine L, Vargesson N. Sci Rep. 2018 Feb 13;8(1):2917. doi: 10.1038]. Brown's data in zebrafish show that the teratogenicity of norethisterone acetate and ethinylestradiol depends on the dose and on the embryonic stage of development, embryos at an early stage being more sensitive than those at a later stage.
The comments raise the question of why the regulator (MHRA) did not find comparable results, but this is not a matter that should be addressed in this article. We agree with this point; no change is required.
We have removed the spacing between numbers to eliminate the odd spacing effect.