Three more steps toward better science

Science has striven to do better since its inception and has given us good philosophies, methodologies and statistical tools that, in their own way, do reasonably well for purpose. Unfortunately, progress has also been marred by historical clashes among perspectives, typically between frequentists and Bayesians, leading to troubles such as the current reproducibility crises. Here I wish to propose that science could do better with more resilient structures, more useful methodological tutorials, and clearer signaling regarding how much we can trust what it produces.

Science has striven to do better since its inception. For example, empiricism was sought as an alternative mode of learning as early as the XVI Century (Ball, 2012); XIX Century researchers sought a less subjective approach to learning from data via frequentist statistics, which progressively displaced Bayesian inference (Gigerenzer et al., 1989); in the XX Century, seeking a better way of establishing causation, Fisher (e.g., 1954) popularized a consistent framework of experimental design and frequentist inference based on small samples; Neyman & Pearson (e.g., 1928) expanded on Fisher's statistical innovations to bring about more control of research power; Jeffreys (e.g., 1961) countered with a more nuanced approach toward evidential support for hypotheses via his Bayes factor; Cohen (1988) veered the focus away from significance testing and toward practical importance with his seminal work on effect sizes and power analyses; Mayo (e.g., 2018) is nowadays popularizing a framework based on severity testing for better frequentist inference; and computational advancements are giving full Bayesian inference a new opportunity to claw back the territory lost since the XX Century (McGrayne, 2012).
Such historical drive has given us good tools for purpose, including philosophies and methodologies, as well as statistical tools for exploratory data analyses, data testing, hypothesis testing, and replication research. The path has not been easy, with a lot of effort gone into warring among different philosophies, methodologies, and statistical approaches, leading to troubles such as the current reproducibility crises (e.g., Fanelli, 2018).
Still, most approaches have been put forth and defended on the common goal of bettering science and, in their own way, all do so reasonably well. For example, Table 1 summarizes results obtained using different testing approaches, all concluding with similar inferences. Therefore, the real "enemy" is not what makes for better science but what makes for worse science: namely, problems with methodological control, with the misunderstanding and misuse of statistics, and with unsupported conclusions (i.e., with ethical concerns and with the use of scientific methods in a pseudoscientific manner; Perezgonzalez & Frías-Navarro, 2018).
Such an enemy will be difficult to defeat. On first impression, science seems to suffer the fate of the 'tragedy of the commons', the 'free-rider dilemma' being, perhaps, its most specific affliction (Fisher, 2008). A recent book by Taleb (2018) on asymmetry sheds some light on the gaming element of science, namely on its misuse of analytical models, agency problems, asymmetric information sharing, and the rationality of the enterprise. Taleb also proposes three solutions that we could expand upon to provide a synergic path for how to go about bettering science (Perezgonzalez, 2018).

Amendments from Version 1
The new version addresses the main shortcomings pointed out by peer-reviewers. I have expanded Table 1, which now contains a column for t-tests (with degrees of freedom), as well as footnotes, to clarify that effect sizes and Severity statistics are based on observed effects. New entries in the text point to initiatives such as the STRATOS Initiative and overlay journals, both of which are quite consistent with the recommendations made in the manuscript.
My affiliation has been updated to "School of Aviation, Massey Business School, Massey University, Palmerston North, 4442, New Zealand".

Firstly, there is a need to make 'scientific structures' more resilient, for them to deliver the outcomes they were set up for: widespread accessibility and quality control. For example, open access publishing is nowadays countering the paywall limitations of traditional scientific publishing and its bias toward novel research with significant results, thus addressing important academic and social backlashes (Kelly, 2018; Schiltz, 2018).
Unfortunately, it has also motivated the rise of predatory journals catering for the same pool of conscientious researchers. To counter the explosion of these predatory journals, some idiosyncratic blacklists (e.g., the defunct Beall's list) and organizational whitelists (e.g., Directory of Open Access Journals) have been created, albeit with mixed success. Meanwhile, online repositories and preprint servers are challenging the entry costs of open access journals, thus making widespread communication more resilient but with the drawback of lacking good quality control, although overlay journals are taking care of the latter drawback.
Quality control itself has received more attention lately, with some journals becoming more transparent about who peer-reviews, while platforms such as Publons.com provide peer-review services and credit, including access to peer-reviews when allowed. Among quality-control structures, F1000Research is worth mentioning: a publication platform that sits at the fringe between a paid preprint and a fully transparent peer-reviewed open access journal. This seems a more resilient structure worthy of emulation and improvement.
Perhaps more importantly, a new need is becoming imperative: to find an effective solution to the indexing and curation of the ever-expanding universe of research outputs. We do have, for example, Altmetric.com, albeit it is too geared toward scoring research outputs. Instead, what we need is an integrated solution to the indexing of both an output and all related content relevant to it, including post-publication reviews, comments in blogs and preprint servers, retraction notices, and the like. We also need a good solution to curating the entire spectrum of research outputs, moving from a plethora of stand-alone manuscripts toward mega-content organized as, for example, research topics.
Secondly, 'minority movements' do have an impact on science by creating the above new structures (e.g., open access, repositories…), but also by improving on legacy ones (e.g., post-publication review sites such as PubPeer.com). Paramount among such movements have been those calling for Open Science (e.g., Banks et al., 2018) and research ethics (e.g., Committee on Publication Ethics, RetractionWatch.com).
Minority movements also have an impact on other aspects of science, from calls toward a better use of frequentist statistics (Perezgonzalez, 2015) to the outright banning of p-values (Trafimow & Marks, 2015), to the alternative use of Bayesian statistics (Wagenmakers et al., 2018b) or mixed approaches (Perezgonzalez & Frías-Navarro, 2018). Because of the intrinsic social dynamics of minority groups, the polarization of inter-group attitudes and consequential external warring are not only unsurprising but also expected. Yet, as alternative scientific approaches mostly have a different research focus, science has been less productive than it could be because more effort has been put into warring among factions than into clearly explaining what each provides to the advancement of science (Mayo, 2018). This has allowed specific methodological knowledge to be too much textbook-based, thereby more aligned with editorial concerns than with the advancement of science (Gigerenzer, 2004), or to be polarized by the intrinsic dynamics of minority groups. Thus, what we presently need are good tutorials on the purpose of each approach and on how to effectively use them for such purpose; preferably, tutorials which are independently created by unfettered authors rather than centrally abridged by textbook editors, so as to provide a diversity of options able to address the same topic from different perspectives and to cater to different stakeholders (e.g., researchers, reviewers, and readers; novices and experts; technically-focused, philosophically-aware, as well as practitioners; etc.; see also the STRATOS Initiative, already working toward a similar goal, Stratos-initiative.org). Such diversity will also allow for progressively developing optimal tutorials that minimize steep learning curves, capture methodological errors, and avoid philosophical and interpretive misconceptions.
Finally, there is the need to signal how much 'soul is in the game' in each piece of published research. The pre-registration movement is achieving this via badges; most journals require authors to signal adherence to ethical principles via the corresponding disclaimers; some journals actively signal their peer-reviewing by naming peer-reviewers (e.g., Frontiersin.com, F1000Research.com) or by allowing open access to peer-reviews (e.g., via Publons.com). What we are presently lacking is good signaling to address methodological concerns and the avoidance of pseudoscience. That is, for authors to signal that they have followed, for example, Fisher's approach to data testing, or Neyman-Pearson's approach, or Mayo's severity approach, or Jeffreys's approach, or a full Bayesian approach; in brief, for them to signal when their research is compliant with the requisites of any of those approaches.

Data availability
No data is associated with this article.

Grant information
The author(s) declared that no grant(s) were involved in supporting this work.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Some of the statements in the paper are a little vague. Most of the "meat" of the paper comes in the penultimate paragraph. It might be worth separating out and clearly stating the specific recommendations.
Table 1 presented SEV values. However, SEV is always with respect to a specific inference and an observation. For example, for an observation of, e.g., mu = 17, the inference mu1 = 12 and the inference mu1 = 14 would have different SEV values, but Table 1 gives no indication of what inference is being made (I assume the inference is the same as the observed result, but this should at least be indicated somewhere). It also appears that Table 1 presented the results of t-tests. I would be inclined to include a test statistic (or an N, or both: with an N the reader could calculate the test stat, or with a test stat the reader could calculate the N). I think it is almost essential, because the sample size is one of the key determinants of when the various inferential frameworks actually come apart. I think in its current form Table 1 paints a slightly misleading picture.
On Page 3, Para 1: It may be worth discussing "overlay journals". These exist in physics and computing, but recently an overlay journal in neuroscience has also been launched (Neuroscience, Behaviour, Data and Theory).
Thank you very much for your review and useful pointers.
I think this paper could benefit from a minor revision. Some of the statements in the paper are a little vague. Most of the "meat" of the paper comes in the penultimate paragraph. It might be worth separating out and clearly stating the specific recommendations.
I agree that the text is a little vague at times, albeit this is mainly because of the lack of effective control over how the recommendations will eventually be implemented. For example, I can foresee the benefit of tutorials, in general (recommendation 2), but I cannot fathom whether there is a perfect tutorial to satisfy everyone (multiple and diverse tutorials may be needed for a "market-type" selection to kick in, thus distinguishing helpful tutorials from less helpful ones). Actually, the third recommendation (in the penultimate paragraph) is the one I see affected the least by variability in our imagination, as it calls for specific standards similar to standards already existing elsewhere (which is also why I placed it last, so as to finish the commentary with a clearer, less vague, recommendation). It is for those reasons that I find it difficult to make the remaining recommendations more specific, and had to resort to the use of the conventional keywords "firstly", "secondly", and "finally" to somewhat constrain them to particular sections in the manuscript.
Even so, I also aimed to put the recommendations more clearly towards the end of their sections: better indexing and curation as Recommendation 1 (this implies software-based indexing and curation, but I have little idea whether such software will work well); tutorials (but, again, I am not sure how well the idea will work; nonetheless, I added a correction linking to the STRATOS Initiative, at www.stratos-initiative.org, which may be one of the ways forward; another could be a methodology-based overlay journal, as you mentioned).

Table 1 presented SEV values. However, SEV is always with respect to a specific inference and an observation. For example, for an observation of, e.g., mu = 17, the inference mu1 = 12 and the inference mu1 = 14 would have different SEV values, but Table 1 gives no indication of what inference is being made (I assume the inference is the same as the observed result, but this should at least be indicated somewhere). It also appears that Table 1 presented the results of t-tests. I would be inclined to include a test statistic (or an N, or both: with an N the reader could calculate the test stat, or with a test stat the reader could calculate the N). I think it is almost essential, because the sample size is one of the key determinants of when the various inferential frameworks actually come apart. I think in its current form Table 1 paints a slightly misleading picture.
I have added a new column with the t-test statistics and corresponding degrees of freedom. I have also extended the note on Severity to clarify it and also give a quick pointer for assessment, as "SEV: severity tests based on the observed effects (severity is strong if greater than 0.80; e.g., Mayo, 1996)".
On Page 3, Para 1: It may be worth discussing "overlay journals". These exist in physics and computing, but recently an overlay journal in neuroscience has also been launched (Neuroscience, Behaviour, Data and Theory).
Thanks for the pointer. I wasn't aware of overlay journals. But they certainly are a pretty good solution. I have added the following statement to paragraph 1, page 3: "although overlay journals are taking care of the latter drawback". However, I thought it would not be too good an idea to name them, as the recommendation focuses on the idea of generating them; thus pointing to some (which may or may not be useful, in hindsight) seems distracting. I have, however, added a new entry to an initiative that seems to be working on the same idea, the STRATOS Initiative.

Competing Interests: No competing interests.
Further comments: In the abstract, the statement 'warring among different perspective' may need a bit more clarity. The paragraph where the second issue is explained, could start with 'Secondly'. (The first issue is introduced with 'firstly', the third issue with 'finally'.) Regarding scientific structures and peer-review: my issue is that peer review is very important (I am afraid that reviewers are sometimes the only people who check the contents of a study report), but it remains a largely uncredited, almost secret, endeavor. Don't you agree that it still needs to be made more official, e.g. that universities do not only demand their researchers to publish papers, but also to deliver decent peer review reports? For example, when I started my tenure track, the university told me what they expected in terms of publication output, grants, and PhD guidance, but they said nothing about peer review, and they never evaluated me on these terms. A decent peer review should almost count as much as publishing a paper? 'Factions' (bottom of first column on p3): what is that? Would the STRATOS initiative (www.stratos-initiative.org), coordinated by Willi Sauerbrei from Freiburg, be in line with the second issue? It brings together experts on different methodological topics in the context of observational studies, with the aim to provide guidance on how to address these topics. (I am a member of this initiative.)

Are the conclusions drawn balanced and justified on the basis of the presented arguments? Yes
No competing interests were disclosed.
This is an opinion piece, so I reviewed it as such. The paper discusses a few possible directions for improving the scientific process. I largely agree with the expressed opinions. That said, I think that the text remains quite general and vague at times. For example, with respect to the second issue, what do you mean with 'tutorials which are independently created rather than centrally edited'? About the third issue: what is your specific suggestion? That researchers should better frame their work before starting it (as to what methodology and statistical approach they will use), and adhere to that when writing the report? This is unclear.
I agree that the text is "general and vague at times", albeit this is mainly because I don't know in which form the recommendations will eventually be implemented.
Independently created tutorials call for multiple and diverse works to be done by independent authors or groups of authors, as opposed to them being 'centrally edited' by a publisher (e.g., of textbooks on methods, or statistics). The noted sentence follows from an earlier assertion, that "methodological knowledge [is] too much textbook-based, thereby more aligned with editorial concerns than with the advancement of science (Gigerenzer, 2004)". At the time, I also found it difficult to write it better without repeating 'independently / independent authors', and 'centrally edited / textbook editors'. I have attempted it with synonyms here (albeit it may read a bit 'forced'). It now reads "…tutorials which are independently created by unfettered authors rather than centrally abridged by textbook editors…"
The third issue is a bit simpler than framing our work a priori (although it may help with this, as well). It is more like framing our work for publication. E.g., those who have followed a Fisherian approach would say so but also write in a manner that is consistent with such an approach, as the assumptions and inferential process are different to those following a Neyman-Pearson approach, or a Jeffreysian approach, etc. If no approach is clear or if they got mixed up in the process of doing the research, then no standard ought to be indicated (it would be misleading, otherwise).
The recommendation is more or less the following: in the same way we may decide to release a document under a particular creative commons license (or none), a license which we need to specify and which binds the document to it, we could also release a research report under a particular research standard (or none), and we shall specify such standard so that peer-reviewers can assess the manuscript as per compliance with those standards, and readers can understand the results in reference to such standards. This also means the standards need to be negotiated, approved and hosted so as to facilitate quick referencing for the aforementioned peer-reviewers and readers (and, of course, authors).
Further comments: In the abstract, the statement 'warring among different perspective' may need a bit more clarity. The paragraph where the second issue is explained, could start with 'Secondly'. (The first issue is introduced with 'firstly', the third issue with 'finally'.)
I have re-written the abstract, replacing 'warring' with "historical clashes among different perspectives, typically between frequentists and Bayesians".
The second issue actually starts with 'secondly', when I introduced the idea of 'minority movements'.
p-values and 'frequentist decision'; Severity statistics (which is also a frequentist approach); Bayes factors and 'Bayesian inference'.
I also clarified severity testing a bit more.
The paragraph starting with 'such enemy will be difficult to defeat' is quite vague. E.g. what do you mean with terms like 'tragedy of the commons', 'asymmetric information sharing', 'rationality of the enterprise', and others?
All those constructs are found in the book by Taleb (2018). The paragraph is meant to quickly brush over them as a quick introduction, as the really relevant constructs (also Taleb's) follow: 'scientific structures', 'minority movements', and 'soul in the game'.
Pre-print servers (line 2 of p3): do you refer to repositories like arXiv?
Both, actually, as they often have such a dual role: either as a final repository or as a repository of manuscripts prior to them being sent for publication elsewhere (nowadays, many journals accept the latter, as long as they are in repositories / preprint servers). I have nonetheless added the concept 'online repositories' to the text.
Regarding scientific structures and peer-review: my issue is that peer review is very important (I am afraid that reviewers are sometimes the only people who check the contents of a study report), but it remains a largely uncredited, almost secret, endeavor. Don't you agree that it still needs to be made more official, e.g. that universities do not only demand their researchers to publish papers, but also to deliver decent peer review reports? For example, when I started my tenure track, the university told me what they expected in terms of publication output, grants, and PhD guidance, but they said nothing about peer review, and they never evaluated me on these terms. A decent peer review should almost count as much as publishing a paper?
I fully agree. In fact, I think that a decent peer-review should count as much as publishing a paper and academics could, ideally, gain tenure and the like on peer-reviewing alone, not just on teaching or researching, as this allows for good, committed peer-reviewers and increased quality standards (the problem is how to define 'decent'!). That's why I added an entry on 'quality control' under recommendation 1 for more resilient scientific structures. However, it will be very difficult to know who has peer-reviewed what and with what quality. This means that it all comes down to trust: the academic on the tenure track would claim to have done 'x' reviews for 'y' journals…but it will be difficult to prove.
Thus far, I know of platforms such as Publons, PubPeer, FrontiersIn, and F1000Research, which would somewhat "reward" reviewers and/or publish peer-reviews. Of these, Publons actually rewards peer-reviewing and allows for generating some statistics in the form of percentiles and graphs. But reviews are only displayed depending on particular journal permissions. On the other hand, F1000Research does actually publish reviews and gives them a DOI. But there is no statistical summary of any form for reviewers. Thus, independently acknowledging / rewarding peer-review will be difficult unless the action of peer-reviewing has been somewhat independently 'vetted' (Publons has such control…but I am not too sure how well it works) or peer reviews are openly displayed (F1000Research). I don't see universities caring for a career in peer-reviewing any time soon, though.
'Factions' (bottom of first column on p3): what is that?
Factions follow the "war" theme of Mayo's latest book, consistent with similar concepts in the paragraph (enemies, polarization, warring, etc.). I have repeated Mayo's reference in this paragraph.
Would the STRATOS initiative (www.stratos-initiative.org), coordinated by Willi Sauerbrei from Freiburg, be in line with the second issue? It brings together experts on different methodological topics in the context of observational studies, with the aim to provide guidance on how to address these topics. (I am a member of this initiative.)
Yes… I think that you have almost 'pulled the rug from under my feet' here. The STRATOS initiative is pretty much in line with that point. I have added the following reference to the manuscript ("see also the STRATOS Initiative, already working toward a similar goal, www.stratos-initiative.org").

Competing Interests: No competing interests.