Reproducibility2020: Progress and priorities

The preclinical research process is a cycle of idea generation, experimentation, and reporting of results. The biomedical research community relies on the reproducibility of published discoveries to create new lines of research and to translate research findings into therapeutic applications. Since 2012, when scientists from Amgen reported that they were able to reproduce only 6 of 53 “landmark” preclinical studies, the biomedical research community has been discussing the scale of the reproducibility problem and developing initiatives to address critical challenges. Global Biological Standards Institute (GBSI) released the “Case for Standards” in 2013, one of the first comprehensive reports to address the rising concern of irreproducible biomedical research. Further attention was drawn to issues that limit scientific self-correction, including reporting and publication bias, underpowered studies, lack of open access to methods and data, and lack of clearly defined standards and guidelines in areas such as reagent validation. To evaluate the progress made towards reproducibility since 2013, GBSI identified and examined initiatives designed to advance quality and reproducibility. Through this process, we identified key roles for funders, journals, researchers and other stakeholders and recommended actions for future progress. This paper describes our findings and conclusions.


Introduction
Introduction and purpose of the report
Preclinical biomedical research is the foundation of health care innovation. The preclinical research process is a cycle of idea generation, experimentation, and reporting of results (Figure 1) 1 . The biomedical research community relies on the reproducibility of published discoveries to create new lines of research and to translate research findings into therapeutic applications. Irreproducibility limits the translatability of basic and applied research to new scientific discoveries and applications.
Although quality control during the research process centers on review of proposals and completed experiments (Figure 1), opportunities to improve reproducibility exist across the entire life-cycle of the research enterprise. In fact, as Figure 1 describes, there are very few steps in the cycle where quality check points are broadly used. By recognizing these opportunities, stakeholders, such as leading scientists, journals, funders, and industry leaders, are taking meaningful steps to address reproducibility throughout the research life-cycle, including commitments to scientific quality, a willingness to examine long-held research policies, and the development of new policies and procedures to improve the process of science.
The magnitude and effects of reproducibility problems are well documented. In 2012, scientists at Amgen reported that they were able to reproduce only 6 of 53 "landmark" preclinical studies 2 .
Global Biological Standards Institute (GBSI) released the "Case for Standards" in 2013 1 , one of the first comprehensive reports to address the rising concern of irreproducible biomedical research. Further attention was drawn to issues that limit scientific self-correction, including reporting and publication bias, underpowered studies, lack of open access to methods and data, and editorial and reviewer bias against publishing reproducibility studies (see Section IV) 3 . Based on these findings, GBSI completed an economic study in 2015 and estimated that the prevalence of irreproducible preclinical research exceeds 50%, with associated annual costs of approximately $28B in the United States alone 4 .
Research community stakeholders have responded to these concerns with innovation and policy. In early 2016, GBSI launched the Reproducibility2020 Initiative to leverage the momentum generated by these stakeholder-led initiatives. Reproducibility2020 is a challenge to all stakeholders in the biomedical research community to improve the quality of preclinical biological research by the year 2020. The Reproducibility2020: Progress and Priorities Report (or Report) is the first to highlight progress and track important publications and actions since the issue began to receive broad attention from the research community and the public in 2013 5,6 . The Report addresses progress in the four major components of the research process: study design and data analysis, reagents and reference materials, laboratory protocols, and reporting and review. Moreover, the Report identifies the following broad strategies as integral to the continued improvement of reproducibility in biomedical research: 1) drive quality and ensure greater accountability through strengthened journal and funder policies; 2) engage the research community in establishing community-accepted standards and guidelines in specific scientific areas; 3) create high-quality online training and proficiency testing and make them widely accessible; 4) enhance open access to data and methodologies.
Note to Reader: Terms such as reproducibility, replicability, and robustness lack consistent definition. The Report draws upon the definitions promulgated by the framework proposed by Goodman et al. 7 : "methods reproducibility" refers to the complete and transparent reporting of information required for another researcher to repeat protocols and analytical methods; "results reproducibility" refers to independent attempts to produce the same result with the same protocols (often called "replication"); and "inferential reproducibility" refers to the ability to draw the same conclusions from experimental data. The Report defines "reproducibility" to include issues affecting any of these three areas.

Irreproducibility: Drivers and impact
This report is organized around key areas in the life-sciences research process where action can significantly drive improved reproducibility 4 (Figure 2):
I. Study design and data analysis
II. Reagents and reference materials
III. Laboratory protocols
IV. Reporting and review
The following sections contain detailed descriptions of each of these areas, including a review of the associated reproducibility problems, solutions, and examples of recent or current activities to promote greater quality and rigor (summarized in Table 1). The Report outlines the potential impact that lack of reproducibility has on the research community and its stakeholders (Table 2).

Methods
To identify key initiatives in reproducibility of biomedical research from 2013 to 2017, we conducted a review of literature, U.S. government policies, and online sources using the following keywords: reproducibility, rigor, transparency, and open access. Through these initial searches, we identified conferences on and funders of various efforts associated with reproducibility, which we used to identify other initiatives that were not identified using the keyword approach. We analyzed the information and developed recommended actions for promotion, and roles for life science stakeholders.
Results and discussion
I. Study design and analysis
Study design is the development of a research framework and analytical methods prior to beginning experiments 8 . A well-designed study has a research question with a rationale, and clearly defined experimental conditions, sample sizes, and analytic methods. In addition, researchers may include practices, such as blinded analysis, to mitigate subconscious bias. Pre-determining the research questions and sample sizes helps avoid problems such as "p-hacking" and selective reporting, where sample sizes and analytic variables are chosen based on their statistical significance rather than through a research framework (e.g., a hypothesis or an exploratory research model). Poor study design and incorrect data analysis can sabotage even a perfectly executed experiment.
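The point about pre-determining sample sizes can be illustrated with a short calculation. The sketch below is not taken from the Report; it assumes a two-sided comparison of two group means and uses the standard normal approximation n = 2((z_(1-alpha/2) + z_power) / d)^2, where d is a standardized effect size (Cohen's d). The function name and default values are illustrative choices, and an exact t-test calculation typically gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate subjects needed per group for a two-sided, two-sample
    comparison of means, using the normal approximation.

    effect_size is the standardized difference between groups (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Smaller expected effects require much larger groups.
    for d in (0.2, 0.5, 0.8):
        n = sample_size_per_group(d)
        print(f"effect size d = {d}: about {n} subjects per group")
```

Fixing the effect size, significance level, and power before data collection gives a sample size that can be written into the study plan, rather than being adjusted after results are seen.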
Researcher surveys suggest that study design flaws are a key source of irreproducibility. Four of the top ten irreproducibility factors identified in a researcher survey relate to poor study design and analytical procedures 10 . These findings can promote a multifaceted approach to improving study design and data analysis. Although researchers ultimately are responsible for ensuring sound study design and analysis, funder policies should encourage rigorous study design before research begins, journal requirements should facilitate better review of completed research, and training and support resources should improve researchers' study design and analysis skills.
With respect to study design and analysis, the NIH Rigor and Transparency policy requires grant applicants to evaluate the rigor of prior studies that form the basis of a research proposal, and to justify their proposed study design. In the first round of reviews with the new guidelines, the NIH Center for Scientific Review noted that panels increasingly discussed the areas of emphasis, but that additional communication is required to get all reviewers and applicants on the same page (http://www.csr.nih.gov/CSRPRP/2016/09/implementing-new-rigor-and-transparency-policies-in-review-lessons-le). Formal evaluations of this ongoing effort will provide valuable lessons for NIH and other funders interested in implementing their own rigor and transparency guidelines. To augment these efforts, NIH has worked with the journal community to develop publication guidelines (see Section IV), and funded the development of researcher training programs in study design (see "Training and Support" below) as part of its rigor and reproducibility efforts.

Box 1. Strengthened funder policies
As the largest and most influential research funder in the world, NIH took a major step in establishing new guidelines and going on record that it will address other areas where it can affect reproducibility 9 . NIH serves as an important model for other government and private research funders looking to establish greater accountability around quality and rigor.

Training and support. Many life-science researchers will require training and support to satisfy the funding and publication policies described above. In the 2016 Proficiency Index Assessment (PIA) (see Box 2), GBSI surveyed over 1,000 researchers of varying experience levels. Participants reported lower confidence in their skills in study design, data management, and analysis compared to their experimental execution skills 13 . Furthermore, research experience did not correlate with higher study design proficiency, suggesting the value of ongoing training and support in this area. New textbooks 8,17 , online minicourses (https://www.nih.gov/research-training/rigor-reproducibility/training) 18 and journal articles 19 can be used for course development or independent study by more senior trainees.

Box 2. Online training and proficiency testing
New approaches to training researchers should be a priority for all steps in the research cycle, including the study design training resources described in the Report. Enhanced training should be available for all levels of researchers: graduate students, post-docs, and experienced PIs. Active learning opportunities are particularly important, considering the informal apprenticeship culture of science, in which trainees learn how to design, perform, and report on their research by working with more senior scientists. However, not all senior researchers have the most current expertise, and some may not be able to spend the requisite time with their trainees. Surveys of researchers support this need: the 2016 Proficiency Index Assessment indicated that even experienced researchers stand to benefit from study design training, and a figshare and Digital Science survey reported that over half of researchers wanted training on open access policies and procedures 13,14 .
Innovative pedagogical approaches are required to ensure that training is effective and engaging for researchers at all stages of their careers. These approaches, including interactive teaching, in-lab practice, and proficiency assessments, are increasingly being explored by many institutions (see "Training and Support" example in Section I). Online training modules are a cost-effective way to provide high-quality, accessible, interactive training for researchers at all levels.
The positive response to study design courses established at Johns Hopkins University 20 and Harvard University (https://nanosandothercourses.hms.harvard.edu/node/96) demonstrates the value of study design training. These courses are becoming more widespread and better tailored to the needs of life scientists, but are not universally available or required. Efforts are underway to increase the experimental design skillset of early-career students, but funding in this area has been relatively modest and, in general, private funders have seen training and education as the responsibility of government funders and graduate programs. In 2014, NIH funded graduate courses on study design. Since 2014, NIH has issued a series of four funding opportunities for grantees interested in providing study design instruction for their graduate students and postdoctoral trainees through administrative supplements to existing grants (https://www.nih.gov/research-training/rigor-reproducibility/funding-opportunities, https://grants.nih.gov/grants/guide/rfa-files/RFA-GM-15-006.html). Several of these grantees have used the funds to develop study design training programs that are tailored to their respective research areas (https://www.nigms.nih.gov/training/instpredoc/Pages/admin-supplements-prev.aspx). For more computationally-focused researchers, a Harvard course on reproducible genomics is available online for free 21 .
In addition to training, researchers now have increased access to expert support during study design and analysis. University statistics departments often provide free consulting services to affiliated researchers (http://statistics.berkeley.edu/consulting, https://catalyst.harvard.edu/services/biostatsconsult/, http://www.stat.purdue.edu/scs/), and the Center for Open Science provides a similar service (https://cos.io/our-services/training-services/). The CHDI Foundation provides protocol and study design assistance, evaluation, and review to researchers studying Huntington's disease (http://chdifoundation.org/independent-statistical-standing-committee/). This model may be of interest to other disease-specific funders as a low-cost investment that can improve research rigor and strengthen the community of practice in their mission area.
Together, these training and support resources improve reproducibility by raising the general standard of rigor for all research. As researchers gain an improved understanding and awareness of study design, they can design their own studies better and more effectively communicate with statistics consultants, conduct peer review, and evaluate published findings that may inform future work.

II. Reagents and reference materials
Reproducibility is difficult if labs are not working with the same research reagents and materials. Supplier-to-supplier variability often is poorly characterized until researchers run into problems with results reproducibility, as demonstrated by the example of synthetic albumin. The structure, stability, and immunogenicity of synthetic albumin vary across suppliers and lots, in ways that are not commonly characterized 22 . In addition, factors such as lot-to-lot material variability, cell line drift, and contamination can cause an individual researcher's assays to change over time. Examples from other sectors suggest that these problems can be addressed with standards.
Materials developed and validated based on standards are well-characterized and demonstrate consistency. Standardized materials that exhibit a predictable behavior can be used reliably in methods reproducibility, and can facilitate development of reference materials for assay validation. Standards for the best-known and most frequently used biological materials typically apply to particular clinical applications, such as virus strains used in influenza vaccine development 1 . Although preclinical researchers often use standardized chemical reagents (e.g., salts and sugars), few standardized biological materials exist. However, surveys suggest that life science researchers increasingly understand the need for standardized materials 1 , and the research community recently has made progress on cell line authentication and antibody validation.

Standards development for biomedical research reagents.
Stakeholders of preclinical research include researchers, reagent manufacturers, funders, journals, standards experts, and nonprofit organizations from countries throughout the world. Recent efforts to establish antibody databases, information-sharing requirements, and international frameworks for antibody validation standards are good examples of the broad, multi-stakeholder approach required to develop consensus standards around a specific reagent (see Box 3).

Box 3. Improved reagent standards: the Antibody Initiative
The research community has acknowledged that antibodies are an area of widespread error and inaccuracy 23 . The Antibody Validation Initiative, involving stakeholders throughout the research community and led by GBSI, is an example that could be replicated in other scientific areas (e.g., stem cells and synthetic biology are both areas where a greater emphasis on development of standards and best practices is needed to ensure quality and advance discovery). Antibodies are key reagents in preclinical research for activities as diverse as protein visualization, protein quantification, and biochemical signal disruption. Antibody performance is variable, with differences in specificity, reliability, and functionality for different types of experiments (e.g., Western blotting and immunofluorescence), manufacturers, and lots, harming reproducibility 24 . Stakeholder solutions include antibody databases, such as the CiteAB database (https://www.citeab.com/), and repositories, such as the proposed universal library of recombinant antibodies for all human gene products 25 . In all cases, validation is a key component of the solution.
NIH specifically highlights antibody authentication in the Rigor and Transparency guidelines (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-16-011.html), providing additional impetus for new standards, policies, and practices. Researchers, manufacturers, pharmaceutical companies, funders, and journals have held dedicated conferences on antibody validation (e.g., http://www.antibodyvalidation.co.uk/). In 2016, the International Working Group on Antibody Validation (IWGAV) qualitatively identified key validation "pillars" that may be suitable for assessing antibody performance 26 . Seeking to build on the IWGAV recommendations, GBSI and The Antibody Society organized a workshop for all stakeholder groups to develop actionable recommendations to improve antibody validation 27 . Stakeholder groups recognized the shared responsibility of antibody validation and effective communication of validation methodology and results. In addition, they highlighted the need for continued, multi-sectoral engagement during the development of standards for validation, which may vary by use case, and information-sharing, which may vary by stakeholder.
Since the workshop, GBSI established seven multi-stakeholder working groups to draft validation guidelines for the major antibody applications. Validation guidelines will include an application-specific point system to quantify antibody specificity, sensitivity, and technical performance. The Antibody Validation Initiative also includes a Producer Consortium to address issues of common concern for producers and a Training and Proficiency Assessment program to ensure the highest quality of validation.
Good cell culture practice. One well-known example of developing standards for laboratory reagents is cell culture validation, which includes assay validation, cell line authentication, and testing for contamination 28 . Many commonly-used cell lines are available from repositories, such as ATCC, as well as other nonprofit, governmental, and for-profit organizations. These organizations regularly test and validate the cells, confirming desired cell function and testing for accidental cross-contamination or infection. Researchers in two different labs can purchase validated cells from these providers and be assured that they are receiving the same product, but cells diverge once they are used in the lab. Use of shared sterile culture hoods, incubators, and reagent storage spaces can cause infection with bacteria, viruses, mold, or yeast, and result in unintentional cross-contamination of purchased cells with other cell cultures used in the lab. Even without contamination, genetic changes occur in cells through repeated culturing and experimentation, a process known as cell line drift. Despite these known problems, periodic cell line authentication and infection testing are not universally practiced in preclinical research even though a human cell authentication standard exists 29,30 .
As with study design, cell culture validation can be enhanced with policies from funders and journals. For example, the Prostate Cancer Foundation has been a leader in validation of cell lines used to study the disease, requiring periodic cell line authentication since 2013. NIH now requires grant applicants to describe their authentication plan as part of the Rigor and Transparency guidelines (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-16-011.html) and many journals now ask researchers to perform cell line authentication (http://www.scoop.it/t/cell-line-contamination/p/4040895974/2015/04/08/which-journals-ask-for-cell-line-authentication).
Many of the validation assays required for cell culture validation can be borrowed directly from other applications. In 2011 and 2012, ATCC organized an international group of scientists from academia, regulatory agencies, major cell repositories, government agencies, and industry to develop a standard that describes optimal cell line authentication practices, ANSI/ATCC ASN-0002-2011. The authentication assay uses Short Tandem Repeat (STR) profiling technology and is an affordable cell line authentication tool. The International Cell Line Authentication Committee's Database of Cross-contaminated or Misidentified Cell Lines provides researchers with a dataset to check during the authentication process 31 . For products of animal origin, U.S. Department of Agriculture regulations specify testing protocols for mycoplasma and select viruses 32 and test kits are commercially available.
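As an illustration of how an STR profile can be checked against a reference during authentication, the sketch below compares allele calls locus by locus and reports a percent match. It is not taken from the standard itself: the scoring formula (twice the shared alleles divided by the total alleles in both profiles, often attributed to Tanabe) and the 80% relatedness threshold are commonly cited conventions that are assumed here, and the profiles are invented for the example rather than drawn from any real cell line.

```python
from typing import Dict, Set

# A profile maps each STR locus name to the set of allele calls at that locus.
Profile = Dict[str, Set[str]]


def percent_match(query: Profile, reference: Profile) -> float:
    """Percent match over loci typed in both profiles.

    Assumed scoring: 100 * 2 * shared alleles / (query alleles + reference alleles).
    """
    common_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in common_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in common_loci)
    if total == 0:
        raise ValueError("profiles have no typed loci in common")
    return 100.0 * 2 * shared / total


def is_related(query: Profile, reference: Profile, threshold: float = 80.0) -> bool:
    """True if the match score meets the (assumed) relatedness threshold."""
    return percent_match(query, reference) >= threshold


# Hypothetical profiles, for illustration only.
REFERENCE: Profile = {
    "TH01": {"6", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}, "vWA": {"16", "18"},
}
DRIFTED: Profile = {  # same line after losing one allele in culture
    "TH01": {"6", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}, "vWA": {"16"},
}
UNRELATED: Profile = {
    "TH01": {"7", "8"}, "D5S818": {"11", "13"}, "TPOX": {"8", "11"}, "vWA": {"17"},
}

if __name__ == "__main__":
    for name, profile in (("drifted", DRIFTED), ("unrelated", UNRELATED)):
        score = percent_match(profile, REFERENCE)
        print(f"{name}: {score:.1f}% match, related = {is_related(profile, REFERENCE)}")
```

A tolerance below 100% reflects the fact that authentic cultures can lose or gain alleles through cell line drift, so a small mismatch alone does not indicate misidentification.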
Improving the reproducibility and translation of biomedical research using cultured cell lines must build on ongoing, multi-stakeholder efforts to raise awareness of the issues of misidentification and the role of authentication 33 . GBSI's #authenticate campaign encourages this kind of stakeholder engagement (www.gbsi.org/authenticate).
Technology and assay development. The development and propagation of standards is an iterative process. For example, recent publications highlight the simultaneous progress in cell line authentication technologies and standards development, including the establishment of reference data standards and cell line authentication policies for the broader research community 28,29 . As technology development progresses, the standards need to be revisited and improved to reflect the current capabilities afforded by new tools 34 . For example, more affordable next generation sequencing is an increasingly useful tool to validate genome editing and characterize changes in cell behavior 35 , and mass spectrometry and lab-on-a-chip assays can help characterize sera and other liquid reagents 36,37 .
Sera validation: an opportunity for standards and technology development. One opportunity to further improve cell culture validation would be to develop standards for sera production and validation. The media used to feed most cells in culture include sera, such as fetal bovine serum, that provide a variety of growth factors and other small molecules. Even authenticated cells may perform very differently in two different sera preparations. Serum is a "black box" ingredient with high variability between manufacturers and lots. Recently developed best practices include characterizing and reporting information on the particular lot(s) of serum/sera used in an experiment, and repeating an experiment with multiple lots of sera to ensure that observed phenotypes are not serum-related artifacts 38 . Serum manufacturers have begun to characterize and validate sera (http://www.bioind.com/support/tech-tips-posters/introduction-to-fetal-bovine-serum-class/), but no industry standard exists for reporting serum characteristics and reliability.
Further technological development could reduce reliance on sera. In serum-free culture, researchers precisely define all components of the cell culture medium rather than using a "black box" serum. Building a system with defined minimum essential components improves reproducibility and enhances scientific understanding of the key signaling molecules involved in biological processes of interest 38 . Researchers are developing and validating robust, serum-free culture systems. Clear material and validation standards are building blocks that facilitate this development.

III. Laboratory protocols
Reproducibility requires thorough, detailed laboratory protocols. Without ready access to the original protocols, researchers may introduce process variability when attempting to reproduce the protocol in their own laboratories. Respondents to GBSI's Proficiency Index Assessment were more confident in their experimental skills than their study design skills 13 . Despite this relative confidence in their laboratory execution skills, researchers frequently are unable to recreate an experiment based on the experimental methods published in journals, which usually do not contain step-by-step laboratory protocols that specify every relevant variable. Further, a particular study may use a modified version of an established protocol, but state the method was "as previously described" without noting the changes. If attempts to contact authors to request the original protocols are not successful, the reader may not be able to reproduce the methods in the published work. In a Nature survey, nearly half of researchers felt that incomplete experimental protocol descriptions in published articles hindered methods reproduction efforts 10 . Although fewer efforts exist in this key area than in the other three areas described in this report, newly developed tools and processes designed to facilitate protocol sharing and version control may improve documentation and reduce barriers to methods reproduction.
Protocol repositories. Protocol repositories are an innovative approach that may facilitate transparency, protocol sharing, and version control. Researchers can upload their protocols to a repository, such as Protocols.io, precisely specifying all step-by-step instructions with links to required reagents. As the original researchers, or others, modify the protocol, they can document these changes in the repository and create their own "forked" version of the protocol. Protocols in the repository can receive a DOI number, making identification of the precise version used in a publication easier. Suppliers also can post recommended protocols for their products on these websites, which facilitates adoption of their products.
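The idea of forking a protocol while keeping a citable record of each version can be sketched with a small data structure. This is a hypothetical illustration only: it does not use or describe the Protocols.io API, and the class, the function, the identifier format, and the example steps are all invented for the purpose.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ProtocolVersion:
    """One immutable, citable version of a protocol."""
    identifier: str              # stand-in for a DOI (hypothetical format)
    title: str
    steps: Tuple[str, ...]
    parent: Optional[str] = None  # identifier of the version this was forked from

    def fork(self, identifier: str, steps: Sequence[str]) -> "ProtocolVersion":
        """Create a modified copy that records where it came from."""
        return ProtocolVersion(identifier, self.title, tuple(steps), parent=self.identifier)


def changed_steps(old: ProtocolVersion,
                  new: ProtocolVersion) -> List[Tuple[str, List[str], List[str]]]:
    """List (operation, old steps, new steps) for every step that differs."""
    matcher = SequenceMatcher(a=old.steps, b=new.steps, autojunk=False)
    return [
        (tag, list(old.steps[i1:i2]), list(new.steps[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example: a lab shortens the lysis step of a shared protocol.
ORIGINAL = ProtocolVersion(
    identifier="doi:10.0000/example.v1",
    title="Cell lysis (illustrative)",
    steps=(
        "Lyse cells in buffer for 30 min on ice",
        "Centrifuge for 15 min at 4 C",
        "Collect supernatant",
    ),
)
FORKED = ORIGINAL.fork(
    identifier="doi:10.0000/example.v2",
    steps=(
        "Lyse cells in buffer for 10 min on ice",
        "Centrifuge for 15 min at 4 C",
        "Collect supernatant",
    ),
)

if __name__ == "__main__":
    print(f"{FORKED.identifier} was forked from {FORKED.parent}")
    for operation, before, after in changed_steps(ORIGINAL, FORKED):
        print(operation, before, "->", after)
```

Because each version is immutable and carries its parent's identifier, a paper can cite the exact version used, and a reader can see precisely which steps differ from the protocol "as previously described".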
Protocol development requires a robust community of practice, so that protocols can be developed and tested by researchers in different laboratories. This practice ensures that the written instructions are understandable and replicable by a third party. Emerging online tools, such as BioSpecimen Commons (The Biodesign Institute at Arizona State University), provide a common location and uniform set of protocols and conditions for clinical sample-related standard operating procedures. Another example is the international Protist Research to Optimize Tools in Genetics group, funded by the Gordon and Betty Moore Foundation, and working on the Protocols.io website (https://www.moore.org/article-detail?newsUrlName=$8m-awarded-to-scientists-from-the-gordon-and-betty-moore-foundation-to-accelerate-development-of-experimental-model-systems-in-marine-microbial-ecology, https://www.protocols.io/groups/protist-research-to-optimize-tools-in-genetics-protg). As of January 2017, this group has 95 members who have contributed 31 protocols to the platform. Although this group does not focus on preclinical research, the practices established by this group are a relevant example that could be reproduced in preclinical research. Preclinical research funders may find added value with version control, protocol forking, and communities of practice in their areas of interest.

Improved protocol reporting in journals. The Principles and Guidelines for Reporting Preclinical Research also call for "no limit or generous limits on the length of methods sections" (https://www.nih.gov/research-training/rigor-reproducibility/principles-guidelines-reporting-preclinical-research). However, most methods sections still do not contain step-by-step protocols. Authors submitting to participating journals can include links to Protocols.io in the methods section, specifying the exact version of a protocol that was used in the study with a DOI number (https://www.protocols.io/partners?publishers). In April 2017, PLOS and Protocols.io announced a partnership where PLOS is encouraging their authors to log their experimental methods in Protocols.io (https://www.moore.org/article-detail?newsUrlName=open-access-to-data-and-the-laboratory-methods).
Although methods journals (i.e., those dedicated to publishing detailed methods) usually provide sufficient information about protocols, most scientific publications do not. Even new techniques are not described in full detail because they build on established techniques, the methods for which are not fully described. However, some journals, such as the Journal of Visualized Experiments, publish original, peer-reviewed manuscripts and videos of both established and new techniques (http://www.jove.com/). The use of videos helps to communicate technique subtleties that may not be captured in written instruction. This type of tacit knowledge often only can be obtained by visiting a laboratory and learning directly from the protocol developers.

IV. Reporting and review
The scientific community requires ready access to publications and the original underlying data to adequately review studies and conduct results reproducibility efforts. Journal reporting guidelines improve methods reproducibility by ensuring that manuscripts contain a minimum standard of required information. Data standards further facilitate this process, as large data sets formatted in an agreed-upon, machine-readable format are easier to find, compare, and integrate across different studies. With better access to data and manuscripts, researchers now can engage in more robust post-publication review. Reducing these barriers can improve reproducibility by identifying potential flaws in published papers, making scientific self-correction and self-checking faster and cheaper.
Enhanced journal reporting guidelines. Journals increasingly recognize the importance of methods reproducibility and are developing more transparent and enhanced reporting guidelines. Co-led by the Nature Publishing Group, the American Association for the Advancement of Science (AAAS; publisher of Science), and the NIH (as part of its Rigor and Reproducibility efforts), the scientific journal community established the Principles and Guidelines for Reporting Preclinical Research in June 2014 (https://www.nih.gov/research-training/rigor-reproducibility/principles-guidelines-reporting-preclinical-research). Per the last update of the NIH website in 2016, 31 journals have signed on to these guidelines (https://www.nih.gov/research-training/rigor-reproducibility/principles-guidelines-reporting-preclinical-research). The guidelines provide a minimum consensus standard for statistical rigor, reporting transparency, data and material availability, and other relevant best practices, but do not specify in detail exactly what these reporting requirements should be.
More specific guidelines from journals have built upon this initial effort. Differences in implementation of reporting guidelines may cause some short-term confusion among authors and reviewers. However, over time, their implementation could provide long-term benefit in identifying successful approaches and best practices. One initiative that seeks to provide broad direction and even instruction to journals is the Transparency and Openness Promotion (TOP) Guidelines, promulgated by the Center for Open Science's Open Science Framework. TOP includes templates for journals interested in implementing their own reproducibility guidelines, and exists in a tiered framework so journals can gradually implement more stringent standards as they improve their own implementation and review capability 39 . Several of the journals highlighted in the examples listed below are signatories to the TOP guidelines.
• Expanded reproducibility guidelines from the Biophysical Journal are an example of what enhanced journal guidelines look like in practice. These guidelines specifically establish reporting standards in four key areas: Rigorous Statistical Analysis, Transparency and Reproducibility, Data and Image Processing, and Materials and Data Availability (http://www.cell.com/pb/assets/raw/journals/society/biophysj/PDFs/reproducibility-guidelines.pdf).
• Authors submitting to the Nature Publishing Group family of journals must complete a reporting checklist to ensure compliance with established guidelines, including a requirement that authors detail if and where they are sharing their data (http://www.nature.com/authors/policies/checklist.pdf).
• STAR Methods guidelines (Structured, Transparent, and Accessible Reporting) are designed to improve reporting across Cell Press journals. These guidelines remove length restrictions on methods, provide standardized sections and reporting standards for methods sections, and ensure that authors include adequate resource and contact information (http://www.cell.com/star-methods).
• Since January 2016, researchers funded by the Howard Hughes Medical Institute have been required to adhere to a set of publication guidelines that cover similar areas as the minimum consensus guidelines described above (http://www.hhmi.org/sites/default/files/About/Policies/sc_300.pdf).
• The Research Resource Identification Initiative establishes unique identifiers for reagents, tools, and materials used in experiments, reducing ambiguity in methods descriptions 40 .
Journals and funders can use two methods to measure and continuously improve implementation of these guidelines: 1) stakeholder feedback studies; and 2) research measuring the frequency of compliance over time. The journal community should reconvene periodically and use data from these evaluations to identify and propagate successful implementations of the Guidelines, and to update and improve the Guidelines.

Data standards.
Policies that ensure open access to the original underlying data and materials can be leveraged more effectively when the data from different studies can be compared easily. Common standards have been incorporated into reporting policies for journals. For example, the Addgene Vector Database provides a repository of published and commercially-available expression vectors (https://www.addgene.org/vector-database/). At least 31 journals recommend or require authors to submit their plasmids to the Addgene repository (https://www.addgene.org/deposit/prepublication/). Addgene performs sequencing to verify submission quality (https://help.addgene.org/hc/en-us/articles/206135535-What-type-of-Quality-Control-does-Addgene-perform-), and requires each contributor to provide the same types of information in a uniform format, making the database easily searchable and comparable.
The Addgene approach works well for plasmids, which are relatively limited in number and size compared with high-throughput, whole genome sequencing data sets. As next-generation techniques become more widespread, data standards will become even more important. These data standards include metadata (i.e., information about the data set), data fields, and file formats. With data standards, large data sets become much easier to download and interpret, because users do not have to spend valuable and expensive computational time modifying existing analysis tools to fit each new data set. Researchers have proposed a series of metadata checklists for high-throughput studies 48 . Similar to the development of reagent standards described above, updated data standards will require multi-stakeholder collaboration within the community of practice, harnessing existing standards where possible and harmonizing divergent practices where appropriate.
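To make the idea of a metadata checklist concrete, the following is a minimal sketch of how a repository or journal might screen a submitted data set description before accepting it. The field names, the list of file formats, and the function `validate_metadata` are all hypothetical and chosen only for illustration; an actual community standard would define its own required fields and vocabularies.

```python
# Hypothetical minimal checklist; a real community standard would define its own fields.
REQUIRED_FIELDS = {
    "organism": str,
    "assay_type": str,
    "platform": str,
    "file_format": str,
    "sample_count": int,
}

# Illustrative subset of file formats only, not an authoritative list.
ALLOWED_FILE_FORMATS = {"fastq", "bam", "vcf"}


def validate_metadata(record):
    """Return a list of problems found in `record`; an empty list means it passes."""
    problems = []
    # Check that every required field is present and has the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    # Check the file format against the agreed vocabulary.
    fmt = record.get("file_format")
    if isinstance(fmt, str) and fmt.lower() not in ALLOWED_FILE_FORMATS:
        problems.append(f"unrecognized file format: {fmt}")
    return problems


complete = {
    "organism": "Homo sapiens",
    "assay_type": "RNA-seq",
    "platform": "example sequencer",
    "file_format": "fastq",
    "sample_count": 12,
}
incomplete = {"organism": "Danio rerio", "file_format": "xls", "sample_count": "12"}

print(validate_metadata(complete))
print(validate_metadata(incomplete))
```

Because the check is mechanical, it could run at submission time, so that missing or nonstandard fields are flagged before a data set is published rather than discovered later by someone trying to reuse it.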
Post-publication review. Scientific review is an ongoing process that continues well after peer-review and publication. The broader scientific community may identify issues that were not highlighted by the peer reviewers, and other researchers may attempt to reproduce a study on their own. As the post-publication review process may require experimentation, it warrants dedicated resources.
Despite the time commitment it demands and the value it adds to science, the research community typically does not reward post-publication review. Historically, funding agencies and tenure boards have not tended to reward results reproducibility studies, and researchers can have trouble convincing journals to review and accept such manuscripts. However, stakeholders from different sectors are now dedicating resources to results reproduction. The Laura and John Arnold Foundation is currently funding a cancer biology results reproducibility study as part of its Reproducibility Project series. The first five attempts to reproduce papers as part of this effort were published in January 2017 in the journal eLife, an open access journal supported by the Howard Hughes Medical Institute, Max Planck Gesellschaft, and the Wellcome Trust 49 . Two of these five studies successfully reproduced the original findings, one study did not, and two attempts were inconclusive. Since the project seeks to reproduce approximately 50 papers, conclusions about the Project's reproducibility rates at this early stage (i.e., after five experiments) would be premature. An earlier project, Reproducibility Project: Psychology, attempted to reproduce 100 original psychology findings, successfully reproducing one-third to one-half of the results 50 . Another open access publication, F1000Research, established the Preclinical Reproducibility and Robustness Channel as a platform dedicated to reproducibility of published papers (https://f1000research.com/channels/PRR).
Researchers who try to raise concerns with editors about irreproducible or incorrectly analyzed results in published articles describe many barriers, including a lack of clarity and transparency from journals in the post-publication review process 51 . Similarly, journals do not always have a clearly defined retraction process that mirrors the submission and peer review processes. Much like the stakeholder discussions on study design, cell line authentication, and open access, the retraction process is an important topic that warrants engagement by the research community. The Committee on Publication Ethics has published best practices in its Retraction Guidelines 52 , which may provide a starting point for this discussion.
Websites such as PubMed Commons and PubPeer provide an informal mechanism to facilitate post-publication review and results reproduction attempts by providing a forum where researchers can openly discuss scientific publications. Discussions on these platforms can occur much faster than the pace of published technical commentaries in journals, and provide opportunities for more scientists to contribute. Last year, researchers undertook a widespread deployment of the automated statcheck algorithm on nearly 700,000 experiments from over 50,000 papers, and automatically generated comments on PubPeer for each paper 53 . This automated tool helps researchers identify papers that deserve further review and discussion about solutions, such as retraction or publication of counter studies. Discussions on open blogs are a double-edged sword. Whereas rapid turnaround and informal discussion can stimulate productive scientific debate, unmoderated discussion can also lead to unwarranted criticism of legitimate studies. In contrast, technical commentary in journals is refereed by an editor who can help organize and moderate the discussion.
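The kind of consistency check that an automated tool can run on reported statistics can be illustrated with a short sketch. What follows is a simplified illustration and not the statcheck implementation: it assumes results are written in a form such as "z = 1.96, p = .05", handles only two-sided z-tests, and ignores rounding in the reported test statistic. The pattern `Z_REPORT` and the function names are hypothetical.

```python
import math
import re

# Hypothetical pattern for reports such as "z = 1.96, p = .05" or "z = 3.50, p < .001".
Z_REPORT = re.compile(r"z\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*([=<>])\s*(\d*\.\d+)")


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reports(text):
    """Recompute each reported p-value and flag whether it matches the report."""
    results = []
    for match in Z_REPORT.finditer(text):
        z = float(match.group(1))
        comparison = match.group(2)
        reported_p = float(match.group(3))
        # Number of decimals in the reported p-value sets the rounding tolerance.
        decimals = len(match.group(3).split(".")[1])
        recomputed_p = two_sided_p_from_z(z)
        if comparison == "=":
            consistent = abs(recomputed_p - reported_p) <= 0.5 * 10 ** -decimals
        elif comparison == "<":
            consistent = recomputed_p < reported_p
        else:
            consistent = recomputed_p > reported_p
        results.append(
            {"z": z, "reported_p": reported_p,
             "recomputed_p": recomputed_p, "consistent": consistent}
        )
    return results


sample = "z = 1.96, p = .05; z = 1.20, p = .03; z = 3.50, p < .001"
for result in check_reports(sample):
    print(result)
```

A flagged inconsistency is only a prompt for a closer look: it may reflect a typo, a one-sided test, or a correction the pattern does not recognize, which is one reason automatically generated comments still call for human follow-up.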
The sheer volume of published research increases the difficulty of identifying and tracking publication errors. Science journalism is another tool that can improve reproducibility. Science reporters, such as the authors of Retraction Watch (www.retractionwatch.com), bring publicity to reproducibility and retraction news, which can galvanize the scientific community to action. For example, replicability of the initial paper describing the NgAgo genome editing technique has been the subject of fierce debate in the community, with researchers describing their difficulties in reproducing the paper's claims on internet and scientific news sites. The technique drew so much attention that over 100 researchers attempted to reproduce it in the first few months after publication, but fewer than 10% were successful 54 . The controversy resulted in three peer-reviewed publications, all of which documented a failure to reproduce the original study, and researchers are now trying to understand the reasons for irreproducibility 55 .
Retraction Watch also partners with the Center for Open Science to generate a database of retractions, as some retracted articles still are cited frequently after retraction 56 . Researchers armed with this database can avoid using retracted work as a (shaky) foundation for new studies, thereby increasing their chance of success. By reading about reproducibility and retraction news, researchers can learn about the common pitfalls that can cause retractions and new resources available to help them improve the reproducibility of their work, such as the initiatives described in this report. However, highly-visible retractions are a potential threat to public confidence and support for science, as the lay public reads more about retractions and irreproducibility. This further highlights the urgent need for the scientific community to act on the initiatives described in this report and make meaningful improvements to reproducibility.

Conclusion: a path forward
Irreproducibility is a serious and costly problem in the life sciences. Measured reproducibility rates are shockingly low, requiring significant effort to solve this problem. Many stakeholders now recognize the importance of reproducibility and are taking steps to develop and implement meaningful policies, practices, and resources to address the underlying issues. The lessons learned from these early efforts will assist all stakeholders seeking to scale up or replicate successful initiatives. The research community is making progress to improve research quality. By prioritizing the strategies outlined in the Report, stakeholders in life science research will continue to make progress in improving reproducibility and in turn have a profound positive impact on the subsequent development of treatments and cures.
However, the authors would be remiss if we ignored a transcending challenge that affects the research community's willingness to voluntarily accept these positive steps in addressing reproducibility: the current rewards system in academia, including constant pressure to obtain grants and publish in "high impact" journals. The research culture, particularly at academic institutions, must seek greater balance between the pressures of career advancement and advancing rigorous research through standards and best practices. We believe that the many initiatives described in this Report add needed momentum to this emerging culture shift in science, but additional leadership and community-wide support will be needed to better align incentives with reproducible science and effect this change.
Continued transparent, international, multi-stakeholder engagement is the way forward to better, more impactful science. GBSI calls on all stakeholders, individuals and organizations alike, to take action to improve reproducibility in the preclinical life sciences by joining an existing effort, replicating successful policies and practices, providing resources to results reproduction efforts, and/or taking on new opportunities. Table 3 contains specific actions that each stakeholder group can take to enhance reproducibility. In its leadership role, GBSI will:
• work with journals and funders to encourage policies that increase rigor, accountability and open access to data and methodologies;
• lead the effort toward improving the validation of reagents, particularly cells and antibodies, and work with the research community to explore other scientific areas (e.g. stem cells and synthetic biology) where a greater emphasis on development of standards and best practices is needed to ensure quality and advance discovery;
• ensure that high-quality, accessible online training modules are available to both emerging and experienced researchers who are eager to improve their proficiencies in new and evolving best practices; and
• continue to track reproducibility efforts through the Reproducibility2020 Initiative.
The preclinical research community is full of talented, motivated people who care deeply about producing high-quality science. We are optimistic about the potential to improve reproducibility, and look forward to contributing to the effort.

Author contributions
LF, GV, and RW conceived of the review study. LF and RW developed the initial outline and GV carried out most of the literature review and completed the first draft. All authors were involved in subsequent revisions of the manuscript and have agreed to the final content.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.

3. ). It extrapolates to all of US Biomedical Funding from a few estimates of irreproducible research in specific fields. I know of no quantitative research that evaluates reproducibility of published basic research in zebrafish or Drosophila communities. If reproducibility problems are greater in cancer, human cell lines, and other research fields, the overall scale of the reproducibility problem across all biomedical research could be smaller.

Open Peer Review
Also, I very much appreciate the authors' note that the "irreproducible" definition is tricky and that they include results, methods, and inferential reproducibility in their analysis. So, the results may be simply "hard to reproduce" due to missing details or reagents, but they would be included in the "irreproducible total". The definition issues further complicate the attempt to estimate in dollar amounts the scale of irreproducible research.
Instead of saying, "the prevalence of irreproducible preclinical research exceeds 50%, with associated annual costs of approximately $28B in the United States alone", I urge the authors to simply refer to their publication with something more general such as, "GBSI's 2015 economic study highlighted the high level of economic costs from poor reproducibility."

[Study design and analysis]
Box 2 recommends online training courses as highly cost-effective. It is true that they are cost effective, but are they effective when it comes to improving study design? Given how busy scientists tend to be, it is unclear that they will actually devote time to watching online training videos. (For example, podcasts for scientists tend to be consumed much more readily than videos of the same length, as people can listen during commute, runs, cooking, etc. In contrast, videos longer than 3-4 minutes are barely watched by anyone to the end.)

[Laboratory protocols]
This section should probably mention the Protocol Exchange from Nature/Springer, which is a protocol repository that was started over a decade ago to improve the reporting of methods.
The authors might also want to include a mention of Bio-protocol, a journal devoted to increasing reproducibility. Though a selective peer-reviewed journal rather than a repository, Bio-protocol is also connecting to journals, and eLife recently included them in their author guidelines to encourage scientists, when appropriate, to submit new method details to Bio-protocol in parallel with their eLife manuscript submission.

[Reporting and review]
In the data reporting section, I recommend adding a brief discussion of data repositories such as Dryad and figshare. Journal policies regarding data sharing are critical, and this overview of the genomics community journal policies from Heather Piwowar and Wendy Chapman is relevant: http://elpub.scix.net/data/works/att/001_elpub2008.content.pdf. Also, the explicit data policy from the Public Library of Science is an important step in improving reproducibility of published work.
Related to the data policies, sharing code and software from computational pipelines used to analyze the data is critical. Perhaps add a mention of policies encouraging proper reporting and
This section does a good job of summarizing open access initiatives and policies from funders, but the link to reproducibility is unclear. As an advocate for open access, I am delighted to see these developments, but the connection between open access publishing and increased reproducibility is not obvious to me.
A paper in a subscription journal can be solid and reproducible, while one in an open access journal is not. The reverse is just as likely. Certainly, this is more a function of chance and editorial and peer review vigilance than the journal's business model.
An argument can be made for how open access enables reproducibility initiatives (ex. CiteAb), but I don't think I saw it in this paper.

[Reporting and review: preprints]
As above for open access, I am a huge fan of preprints but am unsure how they fit into the push for greater reproducibility. Preprints, of course, shorten publication delays, facilitating communication and speeding up research. However, preprints are not peer-reviewed and do not go through conflict-of-interest checks, data/method reporting compliance checks, and so forth. At-scale adoption of preprints in biology is welcome for many reasons, but not exactly due to more rigor and higher reproducibility.
(Possibly, preprints reduce the pressure to publish and create a track record of a paper's initial state, reducing publication biases? Preprints can also help to challenge previously-published work and to report negative results. If these are the arguments for preprints improving reproducibility, please make this case explicitly in the manuscript.)
(Minor note: the use of "preprint" versus "pre-print" is inconsistent in this paper. Please remove the extra dash.)

[Table 3, action plan]
For funders, there is a recommendation to "Enact policies requiring study design pre-registration". I am on the steering committee for COS's pre-registration initiative and support this effort, but I am not sure that "requiring" pre-registration widely is appropriate. This will depend on the funder and specific research grant. For example, in the case of method development and highly explorative grants, pre-registration is unlikely to be productive. How about "encourage where appropriate" instead of "require"?
For journals, there is a recommendation to "Require authors to link to version-controlled protocols". Again, "require" is a strong term. In certain cases, it may be better to share a protocol directly as part of the publication (for example, JOVE). A more general "encourage or require detailed reporting of protocols" may be more appropriate.

[Conclusion]
"Irreproducibility is a serious and costly problem in the life sciences. Measured reproducibility rates are shockingly low, requiring significant effort to solve this problem." I very much agree with the first sentence in that irreproducibility is a serious problem. However, is the reproducibility rate "shockingly" low? What is that rate for biology in general? As discussed above, 50% may be the number for some fields but not for others. More importantly, what rate are we aiming for? 70%? 90%? If all of the action items recommended in this report were followed, what rate would we end up with? Is our current level of reproducibility better or worse than it was 30 years ago? What is the optimal reproducibility rate from society's perspective?
I don't have the answers to the above questions. We need a lot more data to make informed statements about the levels of reproducibility over time. It is terrific that we are discussing this issue and the initiatives to address the problem, but I urge caution in editorializing about whether today's reproducibility levels are a "crisis" or are "shocking". Science is hard and because it is pushing the boundaries of knowledge, we will never be at 100% of published research being reproducible. We can and should do a lot better, hence all of the initiatives, but it will never be 100%.

[General thoughts]
As I mention in #11 above, with the exception of a few efforts from Science Exchange and the Center for Open Science, we have very little data on the reproducibility issue. The authors may want to include in their discussion the need for more quantitative studies about replication and reproducibility over time. We need ways to assess the various initiatives and to measure whether they are in fact improving the overall reproducibility levels of published research.
Also, most of the recommendations and discussion in this Report are focused on the design, execution, and publication steps of the research cycle. However, given the complexity of research and the fact that we will never attain 100% reproducibility, efforts aimed at post-publication opportunities to improve reproducibility may be particularly effective. Perhaps we should pay more attention not just to preventing mistakes, but to ways to correct and improve papers, long after publication.
This Report mentions post-publication review and retractions, but there are other promising efforts in this phase. Versioning, as implemented on F1000Research and bioRxiv, has great potential. There is a need for technologies that automatically connect readers to corrections and discussion on the papers that they have in their libraries. Crossmark from Crossref is a great initiative aimed at making corrections discoverable. Also, an interesting recent proposal argues for rethinking of "retractions/corrections" in favor of "amendments" to increase post-publication evolution and improvement of work.

The paper is interesting, well-written, and well-documented. I appreciated the many web links that take the reader directly to interesting sites.
The authors suggest that the current crisis begins with the Amgen findings (Reference 2). While that was a defining moment, I wonder whether it's also worth mentioning that contemporary discussion about false research findings dates back at least as far back as Ioannidis 2005 (https://doi.org/10.1371/journal.pmed.0020124). Ioannidis there suggests that exploratory research was highly vulnerable because of small sample sizes, overly flexible designs, and biased designs (e.g. with lack of randomization and proper masking).
Table 1: I commend the authors for noting that "the chance of an irreproducible finding is much higher than the commonly noted 5% threshold." This is widely under-appreciated, even by well-trained scientists. The authors might consider spelling out that prospective, properly done sample size calculations are critical to overcoming this problem. The "elephant in the room" is that sample sizes will have to increase substantially, meaning that with constrained funds researchers will be forced to conduct fewer experiments. But as some have noted (Cressey D, Nature, April 15, 2015), that may be good for the enterprise: it would be better to do fewer properly powered experiments than to do too many woefully underpowered experiments.
Table 1 and elsewhere: Should there be a "Consumer Reports" for antibodies, cell lines, and other resources? Or maybe I'm missing it, and you're saying that's happening. Such a "Consumer Reports" would allow for large-scale surveys in which researchers can report problems with purchased materials.
Table 1: Another potential solution to study design and analysis is mandatory sharing of statistical code (e.g. in SAS, R, or Stata). This is already common practice in some fields (e.g. economics).
Table 2: Another consequence for the public is lack of faith in science. They hear scientists promising the moon, and then nothing happens.
Table 2: There is an ethical problem subjecting animals and people to inadequately designed or documented experiments that were doomed to be irreproducible from the beginning.
Table 2 or elsewhere: NAS just released a report on research integrity which notes a continuum between frank misconduct (fabrication, falsification, and plagiarism) and "practices detrimental to research." The authors might want to consider the comments of the report (https://www.nap.edu/catalog/21896/fostering-integrity-in-research).
There have been some recent successes in improved rigor, such as in preclinical stroke research. (For example, see http://circres.ahajournals.org/content/early/2017/04/04/CIRCRESAHA.117.310628). The authors note that "stroke research has uniquely improved."
Page 6: the link didn't take me directly to "Statcheck software," though I did eventually find it.
Protocols: many leading clinical journals require authors to submit full clinical trial protocols along with the manuscripts.
Table 3: Should it be the responsibility of funders to provide statistical consultation to applicants? Should it be the responsibility of funders to pay for open access and transparency tools? Should funders include dedicated reviews on methodological issues for those applications deemed meritorious by content?
Is the topic of the review discussed comprehensively in the context of the current literature? Yes

Is the review written in accessible language? Yes
Are the conclusions drawn appropriate in the context of the current research literature? Yes

Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.