New forms of checks and balances are needed to improve research integrity

Recent attempts at replicating highly-cited peer-reviewed studies demonstrate that the “reproducibility crisis” is indeed upon us. However, punitive measures against individuals committing research misconduct are neither sufficient nor useful because this is a systemic issue stemming from a lack of positive incentive. As an alternative approach, here we propose a system of checks and balances for the publishing process that involves 1) technical review of methodology by publishers, and 2) incentivizing direct replication of key experimental results. Together, these actions will help restore the self-correcting nature of scientific discovery.


Introduction
The scientific method provides a systematic framework for formulating, testing and refining hypotheses. By definition, it requires findings to be reliable so that theories can be refined and scientific progress can occur. Recently, it has become clear that the scientific method as it is currently being practiced is failing in self-correction, with multiple studies indicating that more than 70% of surveyed peer-reviewed articles cannot be independently verified 1-4. Unfortunately, instead of focusing on new systems to promote high quality reproducible research, most resources and attention are focused on trying to police the scientific community by investigating allegations of research misconduct. This approach is destined to fail, because the problem is systemic and not caused by a few bad players who can be caught and punished. From 1994 to 2003, 259 cases of misconduct were formally investigated by the Office of Research Integrity 5. In contrast, ~480,000 papers funded by the NIH were published 6. It would be impractical and ineffective to investigate why 70% of published findings are irreproducible, even though ultimately the ability to repeat and build upon prior work is the key component of research integrity that we should care about. Instead, truly addressing the "reproducibility crisis" requires establishing new checks and balances for the publishing process through 1) technical review of methodology by publishers, and 2) incentivizing direct replication of key experimental results. If we, the scientific community, fail to ensure the quality of the research we produce, other parties with their own vested interests will step in to police us instead 7.

Checks: Publishers need to verify quality of research through third-party technical review
Publishers are uniquely placed to significantly improve reproducibility because of their inherent need to garner respect from the scientific community. Nature and EMBO are two stand-out examples leading the way on ensuring the quality of the research published in their journals. Moreover, current efforts to ensure quality using peer review alone to weed out irreproducible research are not effective. One reason is that the breadth of technical knowledge that is now required to review a single study is beyond individual scientists. The number of authors per article has increased over the last decade 8. In contrast, peer review still relies on two or three peers who are unlikely to be qualified to assess every experimental technique in the study. Nature has implemented an impressive new policy to reduce irreproducibility of its published papers 9, and a key aspect of this is employing expert statisticians to review the statistical analysis of papers. Currently, a major limiting factor for implementing technical review is the lack of standardization for methodology design and required controls. Establishing and implementing these standards to ensure the technical quality of the research published in their journals is an effective value-added service that publishers should provide as a separate power in the scientific community. The Resource Identification Initiative (https://www.force11.org/node/4463 date accessed: 2014-04-24) is an example of practical implementation for reporting of materials and methods in a standardized and machine-readable manner. Similar to successful mandates on open access to raw data, journals wield the power to require clear methodology as a prerequisite for publication.
Further, analogous to open data, the nascent implementation of standardized methodologies will likely yield debates, but lively discussions by the scientific community are useful for policy refinement (http://blogs.plos.org/everyone/2014/03/08/plos-new-data-policypublic-access-data/ date accessed: 2014-04-25).

Balances: Direct replication needs to be incentivized for science to be self-correcting
While journals should carry technical review responsibilities, establishing positive incentive structures for reproducible science is necessary to balance the pressure of producing high-profile publications at all costs. Of course, there will always be edge cases where it is not practical to directly replicate findings (for example unpredictable or one-off events like an earthquake), but for the majority of findings it should be possible to directly replicate them. That is, the experiment is repeated as-is, while collecting additional information such as "the reliability of the original results across samples, settings, measures, occasions, or instrumentation" 10. This is separate from conceptual replication, which is "an attempt to validate the interpretation of the original observation by manipulating or measuring the same conceptual variables using different techniques" 10. It is also separate from re-analysis of existing raw data to check for errors in analysis and presentation, where no new data are obtained. Therefore, directly reproducing experiments is not merely redundant effort, because new data are generated and analyzed to demonstrate the robustness of the original results.
Journals such as F1000Research and PLOS ONE (http://f1000research.com/author-guidelines, http://www.plosone.org/static/publication, date accessed: 2014-03-14) now consider direct replication of original studies, but even a place to publish is not sufficient because there needs to be an effective system to incentivize scientists to conduct replication studies in the first place. The simplest way to conduct replication studies is via fee-for-service technical providers because of their pre-existing methodological expertise and neutral academic involvement (i.e. they are motivated by an operational or a monetary incentive, and thus do not fear retribution from their peers or have the need to accumulate high impact 'novel' publications). Similarly, grants specifically designated for research integrity are vital for driving replication (http://www.arnoldfoundation.org/reproducibility-initiative-receives-13m-grant-validate-50-landmark-cancer-studies date accessed: 2014-04-28). These are strategies used by the Reproducibility Initiative (https://www.scienceexchange.com/reproducibility, date accessed: 2014-03-14), and it remains to be proven whether it will be a cost-effective mechanism to conduct direct replications.
The recent ascent of crowd-sourced post-publication peer review has identified manuscripts with problematic content, but such reviews remain most active for articles on new techniques that other researchers are eager to replicate for their own experiments (e.g. http://www.ipscell.com/stap-new-data/ date accessed: 2014-04-28 and http://f1000research.com/articles/3-102/v1 date accessed: 2014-05-20). Therefore, positively incentivizing direct replication is necessary for science to become self-correcting again, because no one would selectively publish only their experiments that worked or manipulate their findings knowing that a replication attempt, whether experimental or analytical, would not find the same significant outcome. Scientists would also be more willing to share their raw data and full methodologies before publishing because they want to make sure that their findings are reproducible. Not identifying robust and reproducible research is very costly and impairs our ability to make effective progress against diseases like cancer in which we have already invested billions of dollars. Establishing new checks and balances with existing members of the scientific community such as publishers and fellow scientists is far preferable to having them imposed by outside authorities. And if science progresses by "standing on the shoulders of giants", it is our duty as scientists to ensure that the "shoulders" are steadfast for our peers.

Author contributions
E.I. and C.C. co-wrote this article.

Competing interests
Elizabeth Iorns is employed at and owns shares of Science Exchange Inc.
Christin Chong has no conflicts of interest to disclose.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Open Peer Review
Iorns and Chong state in the first paragraph of their Opinion Article that "70% of surveyed peer-reviewed articles cannot be independently verified". Iorns, who heads the company, Science Exchange, Inc., reported the same statistic in an interview with Jennifer Welsh in Business Insider, 2012. Now she and Christin Chong present a set of recommendations for alleviating this problem. But the way they support their claim that 70% of research is irreproducible is problematic. They base their value primarily on four references that demand scrutiny. These references include three cases on drug effects that are marginal and a fourth on sex differences. Two of the references include data not peer reviewed and authored by individuals from commercial companies. A third is retrospective and involves the re-evaluation of statistical calculations of the original authors. Only one, testing the effects of drugs on increased longevity of SOD1G93A mice, provides data that can be assessed, and even those data, obtained in an impressive manner, are presented in a review article.
There is merit in questioning the reproducibility of studies on marginal drug effects or sex differences, but it seems irresponsible to present, as Iorns and Chong have, a sweeping statement that 70% of all published peer-reviewed articles are irreproducible, even with the qualification of "surveyed" articles. Do these authors really believe that this 70% value applies to studies on signal transduction pathways, the phenotypes of mutants from viruses to bacteria to mammals, the interactions and roles of cytoskeletal molecules, the molecular evolution of species, the functions of molecules in embryogenesis and a vast variety of other biological fields? If Iorns and Chong had limited their commentary to the efficacy of drugs in model systems with marginal effects, they could have made an important and plausible case. But even then they would have had to do a better job referencing their argument. And to bring up the fact that 259 cases of misconduct were investigated by the Public Health Service, followed by their statement that "in contrast ~480,000 papers funded by the NIH were published", appears to be an attempt to globalize the problem by insinuation rather than hard supporting data.
The suggestion by the authors that publishers should assess the methods and statistics used by third parties is already in place. It is, obviously, the peer review system, and of course it has its problems. But the insinuation is that this process is failing in 70% of cases. Publishers should indeed be more responsible for making sure that reviewers are selected who can really assess whether the methods employed and the statistics applied are valid, especially when marginal effects are claimed. I am sure that all other scientists would wholeheartedly agree with that general suggestion. But a vehicle for immediately replicating data in every published paper is extraordinarily impractical, potentially very expensive and not at all necessary in areas of research in which answers are far more straightforward.

And who would foot the bill? The publishers? They are, in almost all cases, for-profit. For replication, they would charge a small fortune. And would scientists spend half of their research funds replicating other scientists' discoveries? With the radical decrease in funding we are now experiencing, I would not bet on it. Iorns is co-founder of Science Exchange, Inc., a for-profit company that charges scientists to have measurements performed in 900 laboratories worldwide that appear to have been recruited to perform experiments for a fee, with a profit presumably for them and presumably a cut for Science Exchange, Inc. Would Science Exchange, Inc. be the vehicle for such testing?
The authors should realize that big discoveries are immediately reproduced by other scientists, to build on those discoveries. Therefore, most scientists are obsessed with the validity of their results. And reproducibility is a tough chore if scientists do not apply the exact same procedures, under the exact same conditions, with the exact same strains and the exact same reagents. Biological systems, from cell cultures to biofilms to biochemical reactions, have inherent plasticity and variability, and are highly responsive to the smallest changes in genetic background, temperature, composition of the atmosphere, trace elements, source of reagents and extracts, and even the quality of double distilled water. But contradictions in the results published by different laboratories have a way of "shaking themselves out". Most seasoned biologists at the bench know this is the case. Iorns and Chong have made a reasonable case for a limited area of biomedical research that involves searching for small or marginal effects and apparently high noise levels. But they have presented no proof that supports their claim that 70% of all biomedical research is irreproducible, an overstatement which insinuates that a significant number of scientists are at worst actively trying to dupe the rest of the scientific world or at best incompetent. By globalizing the problem to a majority of the entire scientific research community in the first paragraph of their commentary, they have sensationalized the targeted problem.
I have read this submission. I believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
Competing Interests: No competing interests were disclosed.