Keywords
reproducibility, replicability, repeatability, agreement, validation, truth, methodology, equivalence
This article is included in the Research on Research, Policy & Culture gateway.
Reproducibility is said to be a core principle of scientific progress. Nevertheless, poor reproducibility has recently been shown to haunt preclinical research1,2, translational research3, medicine4 and psychology5. False-positive initial results due to random chance or incorrect study design were among the reasons implicated, as well as data-dredging, publication bias and misconduct. Others called irreproducible results ‘biased’1 and ‘unreliable’5.
Coming from a background of meta-analysis, with its countless examples of unexplained heterogeneity and an ingrained appreciation of sampling variability, we were surprised that the outcries cited above were not accompanied by a formal definition of the concept of reproducibility. Goodman et al. did define three types of reproducibility (methods, results, and inferences) and stated that confusion arises when, inadvertently, people use reproducibility as a synonym for “truth”6. We read their paper as being about truth, although its title suggests otherwise. Our paper is about reproducibility sensu stricto. We revisit some basic definitions of reproducibility, note that these definitions are problematic, and argue that the concept of equivalence in randomized trials may be fruitfully applied to sharpen our understanding of what we mean by reproducibility. We propose that investigators aiming to reproduce others’ findings should pay more attention to predefining a margin of (unacceptable) discordance with existing findings.
Box 1 shows two formal definitions of the concept of reproducibility.
Definition 1:
“The value below which the absolute difference between two single test [or study, our addition] results may be expected to lie with a probability of 95%, when the results are obtained by the same method and equipment from identical test material in the same setting by the same operator within short intervals of time. A test or measurement [or study, our addition] is reproducible if the results are identical or closely similar each time it is conducted (Synonym, repeatability)”7
Definition 2:
“The degree of agreement among a set of observations […] after all known sources of error are accounted for (Synonym, precision)”8
Note the following differences between definitions 1 and 2:
(i) In definition 1, reproducibility is taken to be a binary concept: a result is either reproduced or not. Definition 2 takes reproducibility to be a continuous concept, like a degree of concordance.
(ii) Related to (i), definition 1 implies the subjective choice of a difference, δ, whose value will depend on the measurement problem at hand. Definition 2 avoids a choice of δ.
(iii) Definition 1 fixes the probability level at 95%. Definition 2 avoids the subjective choice of a particular confidence level, such as 95%, 90% or 68%.
(iv) Only definition 2 emphasizes measurement that is free of bias.
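Definition 1 can be made concrete under an added assumption that is not part of the quoted definition: if single results scatter around a common value with normally distributed error of standard deviation `sd`, then the difference between two results has standard deviation `sd·√2`, and the value below which the absolute difference lies with a probability of 95% is roughly 2.77 times `sd`. The sketch below only illustrates this arithmetic; the function name `repeatability_limit` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def repeatability_limit(sd: float, probability: float = 0.95) -> float:
    """Value below which |x1 - x2| is expected to lie with the given probability.

    Assumes (our assumption, not the source's) that x1 and x2 are independent,
    normally distributed results with the same mean and standard deviation `sd`.
    """
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    # Two-sided normal quantile, e.g. about 1.96 for a probability of 0.95.
    z = NormalDist().inv_cdf(0.5 + probability / 2)
    # The difference of two independent results has standard deviation sd * sqrt(2).
    return z * sqrt(2) * sd


if __name__ == "__main__":
    print(round(repeatability_limit(1.0), 2))        # 95% level
    print(round(repeatability_limit(1.0, 0.68), 2))  # 68% level
```

Changing `probability` shows how strongly the limit depends on the chosen level, which is the subjective choice raised in point (iii).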
Reproducibility studies may be seen as a type of equivalence trial (see Figure 1). Briefly, in classic superiority trials, we pose a statistical null hypothesis of no difference, which we then seek to reject to conclude that a difference exists. In equivalence trials, we define a (narrow) zone around a zero difference (between, say, our new drug and an existing one) and we establish equivalence if the entire confidence interval lies inside that zone. In this article, we propose to replace the difference of zero by the (pooled) value of the previous study or studies (vertical line in Figure 1), and to judge the confidence interval of the reproducibility study against a zone around that value. The width of the grey equivalence zone or “zone of reproducibility” is crucial and it seems sensible to define it pragmatically for each research situation separately. Without concrete ideas about the maximal width of this zone, judgments of when a result counts as reproduced can be quite subjective. For example, Begley and Ellis considered positive results as not reproduced if the replicate findings were not sufficiently robust to drive a drug-development program. Ioannidis considered the results of a therapeutic intervention as reproduced if the researcher’s final interpretation of the data in both studies was that the intervention was effective (or ineffective). Figure 1, however, shows that even in situations in which one has strictly defined the width of the zone and a suitable type of confidence interval, undecided outcomes may still occur (scenarios 5–7, Figure 1).
Numbers in brackets refer to the 9 scenarios; horizontal lines are xx% confidence intervals (CI), where xx = 95, 90, 68, etc.; short vertical lines depict point estimates; the grey area signifies the zone of reproducibility; delta (δ) refers to the maximal absolute difference from the existing finding(s) below which reproducibility (concordance with the existing finding(s)) is deemed present. Scenarios 1–4: reproducibility is present, since the new point estimate and its entire xx% CI lie within the grey zone; scenarios 5–6: presence of reproducibility is uncertain, since the point estimate lies inside the grey zone but the xx% CI does not lie entirely inside it; scenario 7: presence of reproducibility is uncertain, since the point estimate lies outside the grey zone but part of its xx% CI lies inside; scenarios 8–9: reproducibility is absent, since the point estimates and the corresponding xx% CIs lie outside the grey zone. Note that two components are subjective: (1) the choice of δ, which should preferably be made with a thorough understanding of the theory or application of the research problem, and (2) the type of confidence interval, since choices other than a 95% CI may be possible and defensible. Note also that, even after δ and the type of confidence interval have been chosen, uncertainty may persist if the confidence limits overlap the boundaries of the zone.
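The decision rule described for Figure 1 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `classify_reproducibility`, its argument names and the returned labels are hypothetical, and two details are our own choices, namely that a confidence limit lying exactly on a zone boundary counts as inside, and that a confidence interval wider than the whole zone is also classed as uncertain.

```python
def classify_reproducibility(
    estimate: float,
    ci_low: float,
    ci_high: float,
    reference: float,
    delta: float,
) -> str:
    """Classify a new result against a zone of reproducibility.

    The zone is [reference - delta, reference + delta], where `reference` is
    the (pooled) previous finding and `delta` the predefined margin.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if not ci_low <= estimate <= ci_high:
        raise ValueError("estimate must lie within its confidence interval")

    zone_low, zone_high = reference - delta, reference + delta

    # Scenarios 1-4: point estimate and entire CI inside the zone.
    if zone_low <= ci_low and ci_high <= zone_high:
        return "reproduced"
    # Scenarios 8-9: the entire CI (and hence the point estimate) outside the zone.
    if ci_high < zone_low or ci_low > zone_high:
        return "not reproduced"
    # Scenarios 5-7: the CI overlaps a zone boundary.
    return "uncertain"


if __name__ == "__main__":
    # Illustrative numbers only: previous finding 0.5, margin 0.2.
    for est, low, high in [(0.55, 0.45, 0.65), (0.60, 0.40, 0.80),
                           (0.75, 0.65, 0.85), (1.00, 0.80, 1.20)]:
        print(est, classify_reproducibility(est, low, high, 0.5, 0.2))
```

Both subjective components named above appear as explicit inputs: `delta` sets the width of the zone, and the chosen confidence level determines `ci_low` and `ci_high` before the function is called.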
Reproducibility studies imply healthy scepticism: “Can we reproduce this finding?” In contrast with the comment cited above, which states that irreproducible results are biased, we emphasize that (ir)reproducibility of results says nothing about the validity of either the previous or the current findings. For that, we need (validity) judgments about the rigor of study design and execution. Meta-analyses of many small, but concordant, studies that were subsequently negated by the result of a single mega-trial (believed by many to represent the truth) illustrate this situation9.
In conclusion, the concept of reproducibility (repeatability, precision) should be distinguished from validity (“truth”). Furthermore, an equivalence trial framework can be fruitfully used to clarify the concept of reproducibility if we replace the (narrow) equivalence zone around a zero difference with a zone of reproducibility around (a) previous finding(s). Care should be exercised when selecting sensible margins (δ) to decide on the reproducibility of results10.
No data is associated with this article.
Is the topic of the opinion article discussed accurately in the context of the current literature?
Yes
Are all factual statements correct and adequately supported by citations?
Partly
Are arguments sufficiently supported by evidence from the published literature?
Yes
Are the conclusions drawn balanced and justified on the basis of the presented arguments?
Yes
References
1. Goodman S, Fanelli D, Ioannidis J: What does research reproducibility mean? Science Translational Medicine. 2016; 8 (341).
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: research integrity, open science, methodology, plagiarism, publishing
Is the topic of the opinion article discussed accurately in the context of the current literature?
Partly
Are all factual statements correct and adequately supported by citations?
Partly
Are arguments sufficiently supported by evidence from the published literature?
Partly
Are the conclusions drawn balanced and justified on the basis of the presented arguments?
Partly
References
1. Begley CG, Ellis LM: Drug development: Raise standards for preclinical cancer research. Nature. 2012; 483 (7391): 531-3.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Particular interest in the area of scientific rigour and research methodology.
Is the topic of the opinion article discussed accurately in the context of the current literature?
Partly
Are all factual statements correct and adequately supported by citations?
Partly
Are arguments sufficiently supported by evidence from the published literature?
Partly
Are the conclusions drawn balanced and justified on the basis of the presented arguments?
Partly
References
1. Goodman S, Fanelli D, Ioannidis J: What does research reproducibility mean? Science Translational Medicine. 2016; 8 (341).
Competing Interests: I was a lead author on the 2016 article which is being discussed here.
Reviewer Expertise: Statistical inference, research reproducibility, epidemiology, clinical research.
Version 1 (09 Jan 19): reports from invited reviewers 1, 2 and 3.