Keywords
Reproducibility, Biomedical Science, Irreproducibility, publishing, funding, standards and practices, institutions
In 2005 Dr. John Ioannidis published a paper entitled “Why most published research findings are false” (Ioannidis, 2005b). In it he claimed that “false findings may be the majority or even the vast majority of published research claims”. The strange thing about this paper was that it wasn’t exaggerating.
In his paper Dr. Ioannidis used a mathematical proof that assumed modest levels of researcher bias. This bias could be human error, bad methodology or any number of other factors. He argued that a sufficiently motivated researcher who wishes to prove a theory correct can do so most of the time, regardless of whether the theory is actually correct. The rates of “wrongness” his model predicted in various fields of medical research corresponded to the observed rates at which findings were later refuted. And these rates of “wrongness” were not insignificant: 80 percent of non-randomized studies, 25 percent of “gold-standard” randomized trials and even as much as 10 percent of the “platinum-standard” large randomized trials turn out to be irreproducible.
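To make the shape of that argument concrete, the short sketch below computes the positive predictive value (PPV) of a claimed finding under the kind of model Ioannidis describes, where R is the pre-study odds that the tested relationship is true, alpha and beta are the type I and type II error rates, and u is the bias term (the fraction of analyses that would not otherwise have been reported as positive findings but are reported anyway). The function name and the illustrative parameter values are our own assumptions for demonstration, not figures taken from the paper.

```python
def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    """Positive predictive value of a claimed research finding.

    R     -- pre-study odds that the tested relationship is true
    alpha -- type I error rate (false positive rate)
    beta  -- type II error rate (1 - statistical power)
    u     -- bias: fraction of would-be negative analyses reported as positive
    """
    true_positives = (1 - beta) * R + u * beta * R   # true relationships reported as findings
    false_positives = alpha + u * (1 - alpha)        # null relationships reported as findings
    return true_positives / (true_positives + false_positives)

# Illustrative (assumed) values: an exploratory study with 1:10 pre-study
# odds and 80% power.
print(round(ppv(R=0.1, u=0.0), 2))   # ~0.62 -- most such findings true with no bias
print(round(ppv(R=0.1, u=0.2), 2))   # ~0.26 -- most such findings false with modest bias
```

Even this toy calculation reproduces the qualitative point: with low pre-study odds and only modest bias, a statistically significant result can easily be more likely false than true.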
A second paper by Dr. Ioannidis was published that same year. In this paper Dr. Ioannidis looked at 49 of the most cited articles in the most cited journals (Ioannidis, 2005a). 45 of these papers claimed to have described effective interventions for various diseases ranging from heart attacks to cancer. Of these 45, seven were contradicted by subsequent studies, seven others had their reported effects diminished by subsequent studies and 11 were largely unchallenged. Only 20/45 (44%) of these field-guiding papers had been replicated successfully. And in a finding that shows how these irreproducible papers are impacting the field, Dr. Ioannidis found that even when a research paper is later soundly refuted its findings can persist, with researchers continuing to cite them as correct for years afterwards.
The counterargument is of course that despite all this the system clearly does work on the whole. Even if mistakes are being made and inefficiency is rampant, if something is wrong it will eventually be found out and corrected. Take for example the now infamous recent controversy regarding stimulus-triggered acquisition of pluripotency cells, or STAP cells. Published in Nature, these two papers were considered a massive breakthrough in stem cell research (Obokata et al., 2014a; Obokata et al., 2014b). However, problems very quickly emerged with the data presented in the papers. An investigation by the host institute found the lead investigator guilty of misconduct and the papers were subsequently retracted (Editorial, 2014). This is one example of how the checks and balances in place within biomedical research can work, and work well. Despite these measures, however, the price of irreproducible research remains a substantial one.
A recent study estimated the cost of irreproducible pre-clinical research at 28 billion dollars in the US alone (Freedman et al., 2015). The study estimated the overall rate of irreproducibility at 53%, but warned that the true rate could be anywhere between 18% and 89%. While the exact figures are certainly debatable, the clear message is that this is a significant issue even if we assume the rate of irreproducibility is at the lower end of their scale. Even big pharma companies have noted the lack of reproducibility coming from academia. They report that their attempts to replicate the conclusions of peer-reviewed papers fail at rates upwards of 75% (Prinz et al., 2011).
Going hand in hand with the financial costs, we cannot forget the time invested in these irreproducible studies. One could argue that this is the more damaging factor, given that it slows the development of potentially lifesaving treatments and interventions that would significantly improve quality of life for large proportions of society. This is of course an even more difficult, if not impossible, metric to measure accurately, but it must logically be affected. For example, consider this: a researcher has a hypothesis and carries out three experiments to test it. The first two experiments are successful and seem to confirm the researcher’s hypothesis, and the findings from these first two experiments are published. This naturally leads to a third experiment, which seems to strongly disprove the hypothesis, and so the researcher abandons this line of work to move on to something else. The researcher is unlikely to publish the findings of the third experiment, but the two earlier papers will remain published. This may lead other research groups to continue the work from the first two papers, perhaps even carrying out the same failed experiment, unaware that it had already been tried and rejected. Again, it is difficult to know how often something like this happens, simply because we don’t know how often researchers are leaving their negative results unpublished. However, even a cursory read through some biomedical journals will reveal that papers with negative results are few and far between.
Yet another group that believes this is a major issue in need of addressing is the Global Biological Standards Institute, or GBSI. The GBSI carried out a study in which they interviewed 60 individuals throughout the life science community, including biopharmaceutical R&D executives, academic and industry researchers, editors of peer-reviewed journals, leaders of scientific professional societies, experts in quality management and standards, and academic research leaders, among others (GBSI, 2013). In these extensive interviews a systemic and pervasive problem with reproducibility was reported: over 80% of the academic research leaders interviewed had some experience with irreproducibility. The reasons given for this irreproducibility included inconsistencies between labs, non-standardised reagents, variability in assays and cell lines, experimental bias, differences in statistical methods, lack of appropriate controls and several others.
There is a perception amongst the general public that scientists are a meticulous, highly organised and extremely intelligent section of society (Castell et al., 2014). This perception is certainly not without a basis in reality, but it fails to appreciate the human aspect of scientists and our work. Mistakes happen, negligence occurs. Politics, money, bureaucracy and rivalries all get in the way of scientific research (GBSI, 2013; Wilmshurst, 2007). This all happens on a pretty regular basis according to the GBSI report, with one researcher quoted as saying “We’ve had to address issues with replicating published work many times. Sometimes it’s a technical issue, sometimes a reagent issue, sometimes it’s that the technique was not being used appropriately” (GBSI, 2013). Within the biomedical research community these obstacles are a disliked but tolerated part of doing science. They are unfortunately sometimes considered part of the job, simply how the system works, as their prevalence demonstrates (Tavare, 2012; Wilmshurst, 2007).
This is a dangerous and irresponsible attitude to allow to continue within our community. It is precisely because of our line of work that we should seek to uphold the highest professional and academic standards. Our work can quite literally be the difference between someone living or dying, or having to suffer a debilitating illness on a daily basis. Just because our impact is delayed by its long journey from bench to bedside does not make it any less crucial to people’s lives.
Yes, there are reasons for things being the way they are. The publishing process is outdated. There’s never enough funding to go around, and what little there is often has strict criteria attached to it (Editors, 2011). When applying for positions in academia, publications are king, with quantity in many cases placed above quality or accuracy, and these publications are dominated by a select few in the field (Ioannidis et al., 2014). It is therefore unfortunately often necessary for scientists to play the game and submit to the demands of the system. Corners are cut, statistics are “reinterpreted” and results exaggerated, all in the hope of getting past a journal’s review panel or having a grant proposal approved (Baker, 2016; GBSI, 2013). None of this is done maliciously, of course, at least not usually, but malicious or not the negative effects are the same. Whatever the reason, it’s still bad science.
So what is the solution? First we must appreciate that the problems leading to this situation are clearly multifaceted and require concerted efforts from groups and individuals at all levels throughout the community. We will discuss 4 of the main areas that could be improved upon.
One change that could have the most wide-reaching effect would be a greater emphasis on the importance of replicability in studies. Just as the number of citations a researcher has is today considered an important metric for the quality of their work, the number of a researcher’s papers that have been reproduced by other groups should be strongly considered too. Standardised metrics such as this would help place greater importance on the quality of a piece of work, rather than just the exciting nature of its claims. This can help us to reduce the pressure to publish in the highest impact factor journals: with standardised metrics the prestige of a journal won’t necessarily matter as long as a paper’s quality metrics are solid.
For this to work, however, journals need to be more accepting of papers whose sole purpose is to reproduce or confirm another group’s work, instead of favouring papers that report new discoveries or interventions. In addition, journals could allow researchers to pre-register their planned experiments with them in exchange for a potential fast track to publication if they then carry out those experiments, even if they give negative results. This would go a long way towards improving transparency and encouraging researchers not to discard unfavourable experiments. It would also help avoid situations like the one discussed above, where researchers continue working on the basis of experiments they were unaware had already been disproven. Some examples of initiatives moving towards these goals include the AllTrials campaign and the registered reports approach (AllTrials, 2016; Chambers, 2016).
Next, there needs to be an expanded development and adoption of standards and best practices. Standards can be physical, such as standardised assays and cell lines, or documentary, such as protocols and practices. To achieve this, guidelines for best practices and standards need to be easily accessible and widely available, which is one of the aims of the EQUATOR network (EQUATOR, 2016). Improvement in standards is arguably the most important factor for increasing reliability, but given its wide-ranging, complicated and technical nature we will leave discussion of it for others to pursue.
Similar to journals, funding bodies could introduce mechanisms to reward reproducibility in a researcher’s work. They should look at an investigator’s past record of producing reproducible work (new journal metrics would complement this) and also examine the current grant application to see whether it is set up to produce reproducible results. The NIH, for example, has introduced new guidelines, beginning in January 2016, to improve reproducibility in grant applications (NIH, 2016). The four main areas the guidelines seek to address are the scientific premise of the proposed research, rigorous experimental design for robust and unbiased results, the consideration of relevant biological variables, and the authentication of key biological and/or chemical resources proposed in a grant application.
Finally, and perhaps ultimately, the responsibility for producing high quality reproducible work lies with the principal investigators, their lab groups and the institutions that host them. These are the people who can have the most immediate impact, through self-correction and adherence to the best standards and practices whenever and wherever possible.
This article is in no way all-encompassing regarding the issue of reproducibility. Several problems and solutions exist which we have not discussed in detail, not least the troubling subject of researcher bias. The references cited do, however, discuss some of these in greater detail. Many will argue that implementing the above proposals will require additional work and possibly some significant upfront costs, but we would counter that the longer term benefit to biomedical research would be immense and one we cannot afford to miss out on.
HJ conceived of the article topic. HJ and RM contributed to researching, writing and referencing the article. HJ drafted the manuscript which RM reviewed and agreed to the final content therein.
Thank you to Matt, Laura and Zein for reading and advising me in the course of writing this article.