An assessment of the autism neuroimaging literature for the prospects of re-executability

Background: The degree of reproducibility of the neuroimaging literature in psychiatric application areas has been called into question, and the issues that relate to this reproducibility are extremely complex. Some of these complexities have to do with the underlying biology of the disorders that we study, and others arise from the technology we apply to the analysis of the data we collect. Ultimately, the observations we make are communicated to the rest of the community through publications in the scientific literature. Methods: We performed a 're-executability survey' to evaluate the recent neuroimaging literature, with an eye toward seeing whether the technical aspects of our publication practices are helping or hindering the overall quest for a more reproducible understanding of brain development and aging. The topic areas examined include the availability of the data, the precision of the imaging method description, the reporting of the statistical analytic approach, and the availability of the complete results. We applied the survey to 50 publications in the autism neuroimaging literature that were published between September 16, 2017 and October 1, 2018. Results: The results of the survey indicate that, for the literature examined, data that is not already part of a public repository is rarely available; software tools are usually named, but their versions and operating system are not; it is expected that reasonably skilled analysts could approximately perform the analyses described; and the complete results of the studies are rarely available. Conclusions: We have identified that there is ample room for improvement in research publication practices. We hope that exposing these issues in the retrospective literature can provide guidance and motivation for improving this aspect of our reporting practices in the future.


Introduction
There is concern about the status of reproducibility in science in general, and in neuroimaging neuroscience in particular (Button et al., 2013; Gorgolewski & Poldrack, 2016). A particularly germane concern was expressed by Kapur and colleagues in lamenting "a profusion of statistically significant, but minimally differentiating, biological findings; 'approximate replications' of these findings in a way that neither confirms nor refutes them" (Kapur et al., 2012). The replication of a specific finding (or the reproducibility of a specific analysis), as reflected in a publication, has many details and nuances to it (Kennedy et al., 2019). Often, we are searching for the 'generalizability' of a finding: does the finding hold true when using 'similar' data and a 'similar' analysis? The similarity of data (or analysis) is a fuzzy concept. One could have a population with the same number of subjects with the same diagnosis, having the same mean age and same gender distribution as a target population; however, if the diagnosis in question is a 'spectrum' diagnosis (for example, autism, schizophrenia, depression, etc.), then despite the 'sameness' of the sample in the aforementioned categories, the detailed characteristics of the sample in the features of the diagnosis itself can still be quite variable. At the level of a biological finding, we typically do not predicate the finding on an exact acquisition protocol or a specific analysis protocol; rather, it is implicit in our finding that it should hold for other valid acquisitions and analyses of the reported types. There is increasing evidence that this implicit assumption of similarity, when it relates to the specific details of acquisition or analysis, does not necessarily hold (Glatard et al., 2015). Some have argued that the starting point for the structured exploration of the generalizability of a specific finding (and thus a cornerstone of the quest for reproducibility) lies in the original finding itself being re-executable (Ghosh et al., 2017; Kennedy, 2019). Starting from the re-execution of a finding allows for the systematic exploration of the generalizability of that finding over changes in data and analysis. To date, when new studies reach findings that differ from prior studies, it is too easy to simply argue that differences in the subject population or in the analysis workflow account for the discrepancy.
In this paper we concentrate on assessing the technical prospects of re-executability of a publication. As introduced above, there are many other factors that will contribute to the actual generalization of the findings, including subject population details, data acquisition details, the nature of the processing and statistics (even if they can be re-executed), the underlying biological effect size, if present, etc. (see Figure 1). Take, for example, the subject population. Too often researchers communicate a finding based on a convenience sample without any statement indicating that the results might not generalize to a sample that more accurately reflects human diversity (e.g. DeJesus et al., 2019; Henrich et al., 2010; Hruschka et al., 2018; Rad et al., 2018). Comprehensive and standardized descriptions of all these additional factors are critical as well, but are beyond the scope of this evaluation. Our groups and others are looking into reporting standards for these areas as well.
The potential impact of reproducibility issues becomes most obvious when trying to make sense of the accumulated literature on a specific topic (Rane et al., 2015). For this reason, we have chosen a particular area, autism, as a way to focus the literature for this survey, so that the conclusions we reach can have specific implications for that topic area. We expect, however, that the autism focus will generate findings with similar implications for other psychiatric and developmental application areas.
In this paper, we: 1) develop a specification for what constitutes an assessment of the technical re-executability of a given publication in each of the domains of data, software, execution environment, statistics and results; 2) codify this assessment in survey form; and 3) apply the survey to a subset of the recently published (~2018) autism neuroimaging literature. From the results of this survey, we can begin to generalize about the state of re-executability of the recent autism neuroimaging literature, in order to identify trends and opportunities for enhancing its re-executability in support of greater overall generalizability (and hence reproducibility) of the literature. The survey template could also be applied as part of the publication review process, in order to prospectively enhance these aspects of reproducibility.

Amendments from Version 1
In this version we made a number of important upgrades in response to the reviewers' comments. First, we clarify the scope of this specific survey, and try to do a better job of setting it up in the Introduction. Specifically, we note that there are at least three domains in a publication where sufficient information for re-execution needs to be considered: the subject selection (can another researcher generate a comparable group); the data acquisition (can another researcher collect the same data); and the analysis (can another researcher perform the same analysis). All of these areas are important, but we address only the 'analysis' aspect in this manuscript. Second, we indicate in Table 1 the 'type' of MRI imaging (structural, functional, etc.) and which reviewers were involved. Third, we make the distinction that our assessment is aimed at evaluating the quality of the reporting (can I do what was reported?), rather than the content (is what was reported the right or best thing to do?). Fourth, we have clarified the text regarding the validation cases (pilot assessment) and the dual raters for consensus evaluation of each publication. Fifth, we have tried to clarify the meaning and the ways in which 'complete results availability' can be satisfied in the Discussion. Finally, the Results, Discussion and Conclusions sections have been updated to better reflect the appropriate content.
Any further responses from the reviewers can be found at the end of the article.

Survey development
Following the concept of a 're-executable publication' (Kennedy, 2019), in order to assess the prospects of re-execution of a given paper, we assess: 1) the availability of the starting data; 2) the perceived completeness of the analysis description (both data processing and statistical assessment); and 3) the availability of the detailed complete results (in order to verify the accuracy of a re-execution). Regarding the 'availability of the starting data', we assess whether the publication indicates how someone (other than the authors themselves) could appropriately access the data. The 'precision of the analysis description' ultimately asks whether a reader who is reasonably skilled in the necessary domains could precisely carry out the prescribed analysis steps. Specifically, are the software versions, operating system and complete parameters somehow made available to the reader? The 'detailed complete results' criterion assesses whether the publication indicates how to obtain the complete results, in order both to verify that a re-execution generates the same result and to overcome the limitations of only a selected summary being presented, which impedes more complete meta-analysis of the literature.
In each of the three assessment areas, the survey distinguished between the theoretical potential for reproduction (such as complete descriptions of the data used, the software and commands executed, and the statistical tests applied) and the practical potential for reproduction (whether the data is in fact accessible, and whether the software is still available and will run). While the survey did not require the raters to actually reproduce the various steps, they were asked to use their professional judgement and past experience to determine the potential reproducibility. In these 'judgement' questions we allowed responses of 'Yes', 'Approximately', 'I'm not sure', and 'No', to permit some degree of confidence in these judgements. For 'results availability', we coded 'Yes' if all of the results were indicated as being available, 'Partially' if some of the results were indicated as being available, and 'No' if none of the results were indicated as available or no indication of results availability was provided. Note that we do not assess whether the analysis or data accessibility is 'optimal', or even 'correct', but rather whether the assessor could redo the approach as described.
Figure 2 provides an overview of the survey design.
The survey was constructed in Google Forms. The logic and wording of the survey forms were piloted (10 articles, three raters) within our own group, and then released for public comment to the BrainHack Slack channel in August 2018. The final complete (serialized) text of the survey is provided in S1 (see Extended data; Hodge et al., 2020c). The literature was selected using a PubMed query that is an expansion of the general query 'autism AND MRI', qualified to select publications between 1/25/2014 and 1/23/2019 and where the MeSH terms include 'human'. This query generated 811 resultant publications at the time it was run (see S2, Underlying data; Hodge et al., 2020a). We note that re-running the query today will generate additional results, due to publications that have been added to PubMed after the search date but with publication dates within the defined range.
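Such a query can also be scripted, which makes the search itself re-executable. The following is a minimal sketch using Biopython's Entrez interface; the search term is our own approximation of the expanded query (the exact serialized form is given in S1), and the e-mail address is a placeholder that NCBI requires to be set.

    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; NCBI requires a contact address

    # Approximation of the expanded 'autism AND MRI' query, restricted to
    # the publication-date window and the 'Humans' MeSH term described above.
    handle = Entrez.esearch(
        db="pubmed",
        term='autism AND MRI AND "humans"[MeSH Terms]',
        datetype="pdat",
        mindate="2014/01/25",
        maxdate="2019/01/23",
        retmax=1000,
    )
    record = Entrez.read(handle)
    handle.close()

    print(record["Count"], "matching publications")
    pmids = record["IdList"]  # list of PubMed IDs as strings

Because PubMed's index changes over time, the count returned today will differ from the 811 publications retrieved at the time of our search.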

Survey application
Starting from the most recent publication and working backwards, we reviewed titles and abstracts to identify publications that were indeed neuroimaging studies (not case reports or reviews), in English, related to autism, and for which we could access the full text of the article. We selected the first 50 publications that met these criteria. Of these 50 publications, 38 were available as free full text on PubMed, three were available as a PDF through a general Google Scholar search (publisher/author provided), two were available in PDF format from ResearchGate, and seven did not appear to be available without institutional access. The survey was applied to each paper by one of three raters (DNK, SMH, CH). Each of the final results was reviewed by a second rater (DNK or SMH), and consensus was reached with the original rater if discrepancies were found.

Literature selected
The final set of publications used in this report is tabulated in Table 1.

Survey results
A high-level summary of the survey results is presented in Figure 3. The complete set of question-by-question results is provided in S3 (see Underlying data; Hodge et al., 2020c).
Article availability: 38 of the 50 publications (76%) appear to have 'free full text' available, according to the PubMed search. Of these, 33 are indexed in PubMed Central. Overall, 43 were freely available through PubMed Central, Google Scholar, or publisher or other websites.
Data availability: 17 of the 50 publications (34%) make reference to the availability of the data used in the publication. However, the publications that indicate availability are mostly reusing data from the large repositories, whereas the publications that do not indicate data availability are principally locally conducted studies; thus, a large fraction of the data being used in publications is not available to the community. Three of these 17 indicate 'available upon request'. For the data that is available, the indicated resources include ABIDE (Di Martino et al., 2012).
Image analysis: The image analysis software used is typically named, for example FreeSurfer (Makris et al., 2003), though with variable indication of the version used, and the specific operating system is rarely reported (1 of 50, 2%). Overall, our raters felt that in 80% of the publications a skilled image analyst could (or might be able to) repeat the analysis.

Statistical analysis: In approximately two thirds of the publications (66%), the statistical software is indicated, again with variable indication of version and no reporting of the operating system upon which the software was run. In summary, our raters felt that in 29 of the 50 papers (58%), a skilled statistical analyst could (or might be able to) repeat the analysis.
Results availability: Availability of the detailed results is fairly rare. All or partial results are available in seven of the 50 publications (14%).
Other observations: Two publications, which were clinical trials, indicated pre-registration (with the EU Clinical Trials Register and ClinicalTrials.gov). None of the non-clinical-trial publications reviewed indicated pre-registration (Nosek et al., 2019).

Discussion
The recent past literature of autism neuroimaging presents a somewhat consistent picture with respect to the prospects of re-executability for the characteristics we examined in this report. Concerns of this sort have been raised in numerous contexts. The Organization for Human Brain Mapping's Committee on Best Practices in Data Analysis and Sharing (COBIDAS), for example, digs very deeply into recommendations for reporting and sharing in the literature.
The work here is complementary as it takes a high-level gestalt view of re-executability.
Data availability is low, as we would expect given the current state of affairs. Figure 3 suggests that there may be a trend towards better data availability (more "Yes" values in the data access column as PubMed ID increases, a good proxy for relative date of publication).
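This impression could be tested directly from the survey table. The following is a hypothetical sketch, assuming S3 has been exported to a CSV with 'pmid' and 'data_access' columns (our names, not those of the actual supplementary file); it asks whether publications reporting data access have systematically more recent PubMed IDs.

    import pandas as pd
    from scipy.stats import mannwhitneyu

    df = pd.read_csv("survey_results.csv")  # assumed export of S3

    # Split PubMed IDs (a rough proxy for publication recency) by whether
    # the publication reported data access.
    yes = df.loc[df["data_access"] == "Yes", "pmid"]
    no = df.loc[df["data_access"] != "Yes", "pmid"]

    # One-sided test: are the 'Yes' publications systematically more recent?
    stat, p = mannwhitneyu(yes, no, alternative="greater")
    print(f"Mann-Whitney U = {stat:.1f}, one-sided p = {p:.3f}")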
While 80% of the publications were deemed to have repeatable image analysis, the low rate of specifying the software version, and the vanishingly small rate of specifying the operating system, are troublesome, since these details can make a difference in the results (Ghosh et al., 2017; Glatard et al., 2015). Even if there are currently only limited software options in some analysis domains, which may implicitly implicate the operating system used, such limitations are not guaranteed to persist through time and should not be assumed for the reader.
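Capturing these details at analysis time costs authors very little. The following is a minimal, hypothetical sketch; the version flags for the named tools are assumptions about typical installations and should be adjusted to the software actually used.

    import platform
    import subprocess
    import sys

    def report_environment(tools):
        """Print the OS, Python, and tool versions for a methods section."""
        print("OS:", platform.platform())
        print("Python:", sys.version.split()[0])
        for name, cmd in tools.items():
            try:
                proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
                lines = (proc.stdout or proc.stderr).strip().splitlines()
                print(f"{name}:", lines[0] if lines else "no version output")
            except (OSError, subprocess.TimeoutExpired):
                print(f"{name}: not found")

    # Hypothetical tool list; the '-version' flags are assumptions.
    report_environment({
        "FSL flirt": ["flirt", "-version"],
        "FreeSurfer": ["recon-all", "-version"],
    })

Emitting this report alongside the analysis, and pasting it into the methods or supplementary material, would satisfy the version and operating system reporting that was almost entirely absent in the surveyed papers.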
A smaller fraction of papers indicates the statistical software than indicates the image analysis software, perhaps reflecting a belief that the statistical techniques are more important than the software used to implement them.
In both cases there is a distinct difference between the theoretical and the practical ability to reproduce the image analysis and the statistical analysis. Rater confidence in the ability to re-execute the image analysis and the statistical analysis is similar, regardless of the fraction of cases in which the software is specified.
The complete results availability criterion was rarely met. Lack of results availability causes a number of problems. Primarily, it is harder to confirm replication (or the degree to which replication was or was not achieved) without the complete set of reported observations, rather than just the summary tables or figures. Resorting to visual interpretations of the 'similarity' of published figures remains fraught with issues that can hamper true understanding of new results in comparison to prior results. Lack of detailed results sharing also compromises subsequent meta-analytic studies that would strive to integrate observations across multiple publications. Finally, lack of complete results exacerbates publication bias (Jennings & Van Horn, 2012) through a focus on the (relatively few) statistically significant observations while not reporting the large set of observations that are not significant. Examples of complete results availability include when the individual statistical maps for an fMRI analysis are available in a resource such as NeuroVault, or the individual segmentation results of a processing workflow are available at NITRC or Zenodo.
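As a hypothetical illustration of such a deposit, the complete region-level statistics, significant and non-significant alike, can be serialized to a single file and archived with a citable identifier; every name and value below is invented for the example.

    import pandas as pd

    # One row per region per contrast, including the non-significant rows
    # that a published summary table would typically omit.
    results = pd.DataFrame({
        "region": ["amygdala_L", "amygdala_R", "fusiform_L"],
        "effect_size": [0.42, 0.11, -0.05],
        "t_stat": [2.91, 0.87, -0.33],
        "p_uncorrected": [0.005, 0.39, 0.74],
        "p_fdr": [0.040, 0.61, 0.82],
    })
    results.to_csv("complete_group_results.csv", index=False)

    # The CSV (plus any statistical maps) can then be deposited, e.g. on
    # Zenodo or NITRC, and the resulting DOI cited in the manuscript.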
Aside from the two clinical trials noted above, none of the reviewed publications indicated pre-registration (Nosek et al., 2019). This is not surprising, as pre-registration is a fairly new phenomenon and its uptake in the literature can be expected to take a while. However, as a 'baseline' observation it is still important to note, so that changes in the prevalence of the pre-registration practice can be monitored.

Limitations
The scope of our survey was rather limited: only 50 publications, in a single selected topic area, autism. However, as a retrospective starting point for evaluation, we believe that it fairly represents the qualitative impressions that investigators have about the nature of neuroimaging publications. We covered numerous neuroimaging subdomains (structural, diffusion, functional), and data and analytic practices in these subdomains can be rather variable. We acknowledge that the details of precise description and dissemination of data and methods may indeed vary by discipline. However, we argue that the 'best practice' principles that we are suggesting here are universal, and domain-specific solutions are currently available. Also, even though 50 publications are included in the survey, a number of them share co-authors or originate from the same research groups. Specifically, 15 authors are listed on two or more publications, and 14 of the publications have authors that are also authors on other publications in this set.
The three raters (DNK, CH, SMH) each had over 15 years of neuroimaging research experience; however, their specialties varied from the more methodological/statistical to the image analytic. This background can influence the interpretation of how successfully other 'reasonably skilled' investigators could re-execute a given analysis. Familiarity with a particular method can both increase perceived confidence in its reuse ("Of course, everyone knows how to execute that common method") and decrease it ("There are so many details that I know could be varied; how do I know what was really done?"). In the absence of explicitly re-executable data and methods in a publication (as in, for example, Ghosh et al., 2017), the interpretation of the precision and completeness of the description with regard to re-executability will be somewhat imprecise and reader-dependent.
Finally, the assessment of each publication was performed on the accessible manuscript as published. Data and results sharing may have occurred after publication, but that fact may not be represented in the materials reviewed. Indeed, a service that facilitates more prospective management of these critical re-execution factors, supporting authors in making additional data and methods available post-publication, would be valuable.

Conclusions
In conclusion, we feel that the survey results presented here reflect a state of neuroimaging publication practices that leaves ample room for improvement. While reuse of existing data is good, the majority of new data being collected for use in publications is not made publicly available. While the listing of software used is good, details that are important for reproducibility, such as the version, the complete parameters and the operating system, are not fully disclosed. Similarly, statistical assessment details are variably reported, making re-execution problematic and approximate. Finally, as very little of the complete results of a publication is disclosed, assessment of the similarity of future replication attempts is severely hampered.

Given the overall state of uncertainty about how reproducible (and representative) specific neuroimaging findings are, it seems prudent to begin to tighten up the variables that we as authors do control, in order to better support the effective accumulation of knowledge about the conditions we study. Promoting best practices in ethical data sharing, complete disclosure of the analytic approach, and complete results reporting is critical for integrating the complex set of observations we have collectively published about the brain and how it develops and ages. The implication of these observations is that authors should redouble their efforts to be comprehensive in their reporting, even after publication, to make the detailed methods and results they report as accessible as possible.

Specifically, authors, reviewers and editors should insist on the complete declaration of: the data source and availability status; all software and versions used for data analysis and statistical assessment; the operating system (and version) used for the data and statistical analysis; and the disposition of the analytic results. Such a 'checklist' would be a valuable asset for the community and will be the subject of future efforts. This future checklist should be developed in conjunction with journal-specific guidelines and other checklists (established in conjunction with the COBIDAS report (Nichols et al., 2017), statistical reporting (Dexter & Shafer, 2017), the Nature Neuroscience Reporting Checklist, etc.). In this way, publishers, editors and reviewers can exert more influence on the manuscripts they encounter, in an effort to increase the transparency and completeness of the published record that they play a role in creating. Together, we hope that we can move the field forward and generate a literature that is more amenable to supporting the understanding of how our collective observations fit together in explaining the brain.
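To make the idea concrete, such a declaration could even be machine-readable. The sketch below is purely illustrative; the field names are our invention rather than a proposed standard, and the identifiers are placeholders.

    import json

    declaration = {
        "data": {
            "source": "ABIDE",                    # or a local collection
            "access": "public",                   # public | on request | restricted
            "doi": "10.5281/zenodo.0000000",      # placeholder identifier
        },
        "image_analysis": {
            "software": [{"name": "FreeSurfer", "version": "6.0.0"}],
            "operating_system": "Ubuntu 18.04",
            "parameters": "scripts/recon_all_flags.txt",  # hypothetical path
        },
        "statistics": {
            "software": [{"name": "R", "version": "3.6.1"}],
        },
        "results": {
            "complete_results": "https://neurovault.org/collections/XXXX",  # placeholder
        },
    }

    print(json.dumps(declaration, indent=2))

A structured declaration of this kind could be validated automatically during manuscript submission, in the same way that data availability statements are now checked by many journals.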

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Neuroscience, Psychology, Computer Science, Neuroimaging
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

...Henrich, Heine, & Norenzayan, 2010). Of course, this paper is about other reasons for reproducibility, but it seems appropriate to mention this, especially in light of the increased attention given to exclusionary social systems in other domains.
Response: As we discuss in response to Reviewer 1 above, the details of the subject pool ascertainment and its generalizability are beyond the scope of this manuscript, but as this is an important point, we have included it in our updated Introduction.
"In this paper we concentrate on assessing the technical prospects of re-executability of a publication.As introduced above, there are many other factors that will contribute to the actual generalization of the findings including subject population details, data acquisition details, the nature of the processing and statistics (even if they can be re-executed), the underlying biological effect size, if present, etc. (see Figure 1).Take for example, the subject population.Too often researchers communicate a finding based on a convenience sample without any statement indicating that the results might not generalize to a sample that more accurately reflects human diversity (e.g.DeJesus, Callanan, Solis & Gelman, 20191;Hruschka, Medin, Rogoff & Henrich, 20182;Rad, Martingano & Ginges, 20183;Henrich, Heine, & Norenzayan, 20104).Comprehensive and standardized description of all these additional factors are critical as well, but are beyond the scope of this evaluation.Our groups and others are looking into reporting standards for these areas as well." We also had some concern with the concepts of the 'precision of analysis' (methods paragraph 1).This issue in particular seems difficult to assess reliably, and so there might be a higher degree of measurement error for this concept in comparison to the other concepts.We appreciate that the authors allude to this difficulty later in the paper, when they state that more expertise could also lead to higher levels of measurement error, but here we feel that a more explicit note of caution that these variables in particular should be viewed with additional skepticism.

Response:
In order to help the reader appreciate the cautionary note regarding these assessments, we have updated the notion of 'precision' to "perceived completeness", to help remind the reader that the precision assessment is in the mind of the assessor. This is reflected in Methods paragraph one and elaborated upon a little more in Limitations paragraph two: "In the absence of inclusion of explicitly re-executable data and methods in a publication (as in, for example, Ghosh, et al.) the interpretation of the precision and completeness of the description with regard to re-executability will be somewhat imprecise and reader-dependent."

The description of how the assessment was applied to each paper was difficult to follow ('Survey application', pg 4: "one of three raters applied the survey to each of these articles. Each of the final results..."). Does this mean that each paper was evaluated by one reviewer? It seems like it would be useful to have more than one person complete the review. This would allow the reader to have a sense of the degree of inter-rater reliability.

Response:
We have attempted to clarify the text regarding the validation cases (pilot assessment) and the dual raters for each publication.
"The survey was applied to each paper by one of three raters (DNK, SMH, CH).Each of the final results were reviewed by a second rater (DNK or SMH) and consensus reached with the original rater if discrepancies were found." [related] an independent assessment by other raters (along with ratings) would be a wonderful addition to the work, if a bit effort-intensive.Adding a column to figure 2 listing which rater assessed which publication would be helpful.This column could be coded for anonymity (Rater 1, Rater 2) if the authors so choose.

Response:
Table 1 now has a column indicating which raters (Rev1, Rev2 or Rev3) reviewed each publication as the 'primary' or 'checking' reviewer.
We additionally found the category of 'results availability' to be a little vague, especially since it seems as though papers never reached this cutoff. What does it take for a paper to have complete results availability?

Response: We agree that 'complete results availability' was a lofty and somewhat variable goal statement. We have tried to clarify its meaning and the ways it can be satisfied in the updated text of paragraph five of the Discussion: "The complete results availability criterion was rarely met. Lack of results availability causes a number of problems. Primarily, it is harder to confirm replication (or the degree to which replication was or was not achieved) without the complete set of reported observations, not just the summary tables or figures. Resorting to visual interpretations of 'similarity' of published figures remains fraught with issues that can hamper true understanding of new results compared to prior results. Lack of detailed results sharing also compromises subsequent meta-analytic studies that would strive to integrate observations across multiple publications. Finally, lack of complete results exacerbates the publication bias (Jennings and Van Horn 2012) through focus on the (relatively few) statistically significant observations while not reporting the large set of observations that are not significant. Examples of complete results availability include when the individual statistical maps for an fMRI analysis are available in a resource such as NeuroVault, the individual segmentation results of a processing workflow are available at NITRC or Zenodo, etc."

Competing Interests: No competing interests were disclosed.
Reviewer Report 22 September 2020
https://doi.org/10.5256/f1000research.27929.r70168
© 2020 Specht K. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Karsten Specht
Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway

Summary
The article by Hodge and co-workers summarises an attempt to assess the possibility of replicating 50 published neuroimaging studies on autism. The results indicate that the majority of the studies provide only partial information that would be required for replication of the study. In particular, information about the operating system is missing, only a few studies share their data or other files, and the different analysis steps are sparsely described.

Assessment:
The article is well written, with a clearly described method and results. The study provides a suitable method that could easily be applied to other research topics as well. However, the conclusions that can be drawn from this study are still limited in my view, since it would have been good to include further information in the survey, which I list below:

1. In my opinion, the authors focus too much on the technical aspects of a study. Although the authors introduce that a "spectrum-diagnosis" might generate further problems, they do not follow this up in the survey. I would like to see at least one additional column that codes whether the diagnostic criteria and sample are replicable, i.e. are the patients well characterised (age, gender, education), are the diagnostic instruments mentioned, cut-off criteria, etc.

2. I suggest including another column (at least) in the supplementary material S3 that also lists the imaging modality, i.e. structural MRI, fMRI, MRS, DTI, since they also partly represent different disciplines and traditions in publishing. Further, some methods have only a very limited number of software tools, like MRS, which are often restricted to only one (type of) OS. So, reporting the software may make it almost obsolete to report the OS. Therefore, doing a survey across different neuroimaging modalities may show some general deficiencies, but the other disciplines may need to improve on different aspects.

3. Similarly, concerning fMRI, it also makes a difference whether studies were analysed as whole-brain studies or as focused region-of-interest analyses, and, in the latter case, whether the regions were derived from anatomical images or, for example, simply spheres. It would also be informative to know whether studies applied corrected p-values, and which ones, and whether effect sizes were reported.

4. Did the authors control how many studies came from the same lab? Some labs might have a kind of "tradition" in reporting results, which could bias the survey.

5. I think the headlines of the article are a bit off, since the "Discussion" mostly reports the results, and the "Conclusion" primarily discusses the results.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes

Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Neuroimaging, fMRI, MRS, reliability

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Response (a similar concern was raised by Reviewer #2): "In this paper we concentrate on assessing the technical prospects of re-executability of a publication. As introduced above, there are many other factors that will contribute to the actual generalization of the findings, including subject population details, data acquisition details, the nature of the processing and statistics (even if they can be re-executed), the underlying biological effect size, if present, etc. (see Figure 1). Take, for example, the subject population. Too often researchers communicate a finding based on a convenience sample without any statement indicating that the results might not generalize to a sample that more accurately reflects human diversity (e.g. DeJesus, Callanan, Solis & Gelman, 2019; Hruschka, Medin, Rogoff & Henrich, 2018; Rad, Martingano & Ginges, 2018; Henrich, Heine, & Norenzayan, 2010). Comprehensive and standardized description of all these additional factors is critical as well, but is beyond the scope of this evaluation. Our groups and others are looking into reporting standards for these areas as well."

I suggest including another column (at least) in the supplementary material S3 that also lists the imaging modality, i.e. structural MRI, fMRI, MRS, DTI, since they also partly represent different disciplines and traditions in publishing.

Response:
We have added a new column to Table 1 that indicates modality. While the details of the data and analysis procedures will vary by these modalities, the need to fully express the complete analysis should be independent of the specific modality.

Some methods have only a very limited number of software tools, like MRS, which are often restricted to only one (type of) OS. So, reporting the software may make it almost obsolete to report the OS.

Response: While this is certainly true in some situations, we suggest that a good best practice for reporting should be universal (and OS versions change and thus should be disclosed). We have added to the Discussion, third paragraph: "Even if there are currently only limited software options in some analysis domains, which may implicitly implicate the operating system used, such limitations are not guaranteed to persist through time and should not be assumed for the reader."

Doing a survey across different neuroimaging modalities may show some general deficiencies, but the other disciplines may need to improve on different aspects.

Response: Again, while this is true, the general best practices and principles we are trying to elucidate here should be universal. What specific disciplines need to do to support these necessary practices may indeed vary by discipline. We try to elaborate on this in the Limitations section, first paragraph: "We acknowledge that the details of precise description and dissemination of data and methods may indeed vary by discipline. However, we argue that the 'best practice' principles that we are suggesting here are universal and domain-specific solutions are currently available."

Figure 1. Essential elements of a publication. Elements of a publication that comprise a starting point for a structured exploration of the generalizability of a specific finding. The outlined areas define the technical prospects of re-executability of a finding that are evaluated in this survey.

Figure 3. Survey results summary. The 50 publications are summarized on the main factors of data availability, software specification, statistical specification and results availability.
References
1. DeJesus JM, Callanan MA, Solis G, Gelman SA: Generic language in scientific communication. Proceedings of the National Academy of Sciences. 2019; 116(37): 18370-18377. Publisher Full Text
2. Hruschka D, Medin D, Rogoff B, Henrich J: Pressing questions in the study of psychological and behavioral diversity. Proceedings of the National Academy of Sciences. 2018; 115(45): 11366-11368. Publisher Full Text
3. Rad M, Martingano A, Ginges J: Toward a psychology of Homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences. 2018; 115(45): 11401-11405. Publisher Full Text
4. Henrich J, Heine SJ, Norenzayan A: The weirdest people in the world?. Behav Brain Sci. 2010; 33(2-3): 61-83; discussion 83. PubMed Abstract | Publisher Full Text

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes