Case Study

A libraries reproducibility hackathon: connecting students to University research and testing the longevity of published code

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 31 Oct 2024

This article is included in the Hackathons collection.

Abstract

Reproducibility is a cornerstone of scientific integrity, yet it remains a significant challenge across disciplines in computational science. This reproducibility crisis is now being met by the Open Science movement, which has risen to prominence within the scientific community and within academic libraries especially. To address the need for reproducible computational research and promote Open Science within the community, members of the Open Science and Data Collaborations Program at Carnegie Mellon University Libraries organized a single-day hackathon centered around reproducibility. Through a partnership with a faculty researcher in English and Digital Humanities, this event gave several students an opportunity to interact with real research outputs, test the reproducibility of data analyses with code, and offer feedback for improvements. Working with Python code and data shared by the researcher in an open repository, students successfully reproduced most of the data visualizations, but rerunning the code required some manual setup and modifications to address deprecated libraries. During the event, we also investigated using ChatGPT to debug and troubleshoot rerunning this code. By interacting with a ChatGPT API in the code, we encountered and addressed the same roadblocks and successfully reproduced the same figures as the participating students. Assessing a second option, we also collaborated with the researcher to publish a compute capsule on Code Ocean. This option presented an alternative to manual setup and modifications, an accessible route for more limited devices like tablets, and a simple way for outside researchers to modify or build on existing research code.

Keywords

Reproducibility, Hackathon, Academic Libraries, Open Science, Metascience, Digital Humanities, Computational Research, Software, Community Engagement

Introduction

Reproducibility in scientific research is a highly regarded concept that reflects both the credibility and soundness of rigorous studies.1,2 In the context of research, the term reproducibility is often loosely defined or used interchangeably with replicability. The National Academies of Sciences, Engineering, and Medicine convened in 2019 to develop a definition3 and differentiate between reproducibility, replicability, and generalizability.2,3 Reproducibility, which is generally consistent with computational reproducibility, is defined as obtaining consistent computational outcomes using identical input data, procedures, methods, code, and analytical conditions. Replicability is defined as consistent outcomes across research projects designed to address the same scientific question, where each project collects its own data. Finally, generalizability is defined as how well the findings of a study apply to contexts or populations beyond the original.

Modern research continues to face multiple challenges surrounding reproducibility. The skills and practices that promote reproducibility may vary between fields,4–6 awareness or guidance is often lacking,4,6,7 and the time and resources devoted to teaching and practicing reproducibility may be deficient.8 Computational research, in particular, faces barriers to reproducibility from software inconsistencies, changing versions and dependencies,9 and insufficient documentation of datasets10 or methods.11 For example, Liu and Salganik10 found that the complexity of the computational methods employed impacts reproducibility. The more common methods in a given field may ease reproducibility, as they are familiar and well documented, but more advanced methods can pose greater challenges, requiring specialized knowledge, environments, and resources that are not as readily available or standardized.

Investing in tools and practices around computational reproducibility enhances the transparency and reliability of scientific findings by enabling independent verification of results and fostering further research on established findings. Such a shift in computational research practice is important because it increases research credibility and efficiency and avoids wasted effort in attempting to build on unreliable results.12,13 The scientific community is increasingly advocating for tools and practices aimed at improving the reproducibility of computational research,11,14 including adopting open-source software,11 using version control (e.g., Git and GitHub),15 encapsulating computing environments (e.g., Docker),11,16,17 and following community-driven standards and frameworks.

More broadly, scientific communities are amplifying this advocacy through the Open Science movement. Open Science promotes transparency of research processes, data, methodologies, and outputs, enhancing their accessibility and usability for the broader scientific community.18,19 Reproducibility is fundamental to Open Science,18,20 ensuring that scientific claims can be verified by the community, which is made possible by openly sharing datasets, methodology, and code and by using open-source software. Overall, alignment with reproducible practices moves the scientific community towards a more collaborative environment.19

At the Carnegie Mellon University Libraries, a dedicated Open Science & Data Collaborations Program (OSDC) was established in 2018, which highlights the Libraries’ resources and tools that foster open, transparent, and reproducible research.21 In this program, librarians and functional specialists serve as advocates, consultants, and collaborators for researchers accessing the existing Open Science infrastructure in the Libraries, including CMU’s institutional repository, KiltHub,22 and research data management services.23 In addition, the Libraries have held multiple iterations of the Open Science Symposium,24 which gathered global researchers and thought leaders in academia, industry, and publishing to discuss the ways that Open Science has transformed research. The OSDC program also created roles for STEM PhDs to transition into Open Science and librarianship through an Open Science Postdoctoral Associate position, with Chasz Griego and Kristen Scotti being the first and second candidates in this role, respectively. Through this postdoctoral role, Griego considered ways to explore reproducibility integration and evaluation among researchers and students at CMU.25

Motivated by the challenges surrounding reproducibility and the growing awareness of Open Science, members of the CMU Libraries and the OSDC Program organized a single-day hackathon centered around reproducibility. Hackathons are time-bound events in which groups collectively solve or explore a technical problem, and libraries often make suitable hosts as neutral, welcoming spaces on campus that harbor learning, discovery, and collaboration.26 The CMU Libraries has effectively hosted multiple hackathons, including Biohackathons and a recent AI Literacy Resource Hackathon.27–29 Pairing libraries with hackathons centered on reproducibility is also not a novel concept. The Center for Digital Scholarship at Leiden University held ReprohackNL in 2019, a single-day event where participants attempted to reproduce published results using data and code provided by the authors.30

At that event, the organizers observed participants facing challenges with insufficient documentation, data behind paywalls, and problems with code and proprietary software. The CMU Libraries Reproducibility Hackathon aimed to offer a similar model, but here we focused on the reproducibility of a study with shared data and code on a subject that could create interdisciplinary collaboration across students and researchers at CMU.

Purpose of the event

The CMU Libraries Reproducibility Hackathon gave any participant the opportunity to reproduce research results produced by a university professor. The purpose of such an event is to increase awareness of reproducibility as a realistic piece of the research life cycle and to demonstrate the outcomes of research scrutiny, which could in principle be carried out by anyone. Most people are familiar with the concept of reproducibility at a surface level, but many may not have a clear grasp of what it looks like, how it happens, or how impactful it can be for a given study. In an extremely idealistic world, a person with any background, experience, or research interest would have access to every material behind a particular study. These materials, again in the ideal case, would include all resources and guidance needed for a person to reproduce each result, confirming the findings first-hand. In reality, most research is far removed from this ideal; however, an event like a reproducibility hackathon can attempt to measure that distance for researchers willing to participate. For this hackathon, we offered one academic researcher the opportunity to put their research outputs up to scrutiny. Regardless of the outcome, a volunteer researcher helps set an example. If the results are highly reproducible, this is a recognizable feat; if the results prove more difficult to reproduce, this presents an opportunity to reflect on ways to improve. Work that is less reproducible does not have to be met with shame or guilt; it can be a humbling opportunity to learn and do better.

For the CMU Libraries Reproducibility Hackathon, we teamed up with Christopher Warren, a Professor of English and Associate Department Head in the Dietrich College of Humanities and Social Sciences at CMU. As the subject for a reproducibility assessment, Warren offered the content of a published data analysis of the Oxford Dictionary of National Biography (ODNB),31,32 a collection of biographies for over 60,000 influential figures in British history. Warren’s work is credited as an “audit” of the ODNB,33 revealing the biases and assumptions hidden beneath the vastness of big data infrastructure. This subject was well aligned with the event: for an analysis such as Warren’s, which scrutinizes the soundness of a massive data collection, the Python code that reveals the findings must also hold up to its own audit, ensuring that changing versions and dependencies do not prevent repeat analyses. With Warren offering his humanities research as an exemplary case, we also contribute to the dialogue around including research areas beyond STEM in the Open Science movement.34

Event execution

We openly distributed a call for participants to anyone interested from CMU and other institutions in the greater Pittsburgh area. The call noted that knowledge of scientific programming with Python and/or R was desired, but not required, for participation. Participants submitted their interest and background through a short form and had the option to attend one of two information sessions a week prior to the event. During the information session, participants received a link to Warren’s manuscript to provide sufficient time to read the details of the work. They were asked not to search for any items related to the work, to avoid the risk of early access to the data and code posted in a repository.

The hackathon was scheduled for a single day, with participants and organizers convening in a library space during typical business hours (9am - 5pm). A project page on the Open Science Framework (OSF)35 was created for the event, where participants could access event information and guidance, a copy of Warren’s manuscript, and documents with links to access the data and code. In addition to the published manuscript, Warren deposited supplementary research outputs to Harvard Dataverse, including a Jupyter Notebook, CSV files, OpenRefine transformation JSON files, and others.32 This collection of outputs served as one option for participants attempting to reproduce the study. This option is arguably open: the research outputs are deposited in a repository that ensures secure storage, public access, and findability with a DOI, and they use non-proprietary file formats such as CSV and JSON along with open-source Python code written in notebooks. Such attributes greatly promote reusability once users download each output and ensure that Python and all necessary libraries are installed.

We also provided a second option that aims to greatly facilitate reproducible computational research. The same data and code found in Harvard Dataverse were organized and configured in a Code Ocean capsule.36 Code Ocean is a platform that allows researchers to archive research data and code, along with preserved language and library versions, dependencies, and a metadata record, in a self-contained capsule. Capsules are discoverable with a DOI, and outside users can perform reproducible runs that compile and execute the code in a virtual environment, producing results that match the original analysis.

Observations and reflections

In the following, we summarize the experiences of participants in the reproducibility hackathon. Key issues are also listed briefly in Table 1. One participant, Elizabeth Terveen, chose the option of downloading the materials from Harvard Dataverse and attempting to run the Jupyter Notebook in a new Python environment created with the Anaconda package manager. To run the code, Terveen had to install four Python libraries (pandas, NLTK, Matplotlib, and Plotly). One of the existing import statements in the notebook, involving Plotly, returned an error due to a module deprecation.
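For readers unfamiliar with this kind of manual setup, the sketch below illustrates the general steps. The environment name is invented, and the Plotly statement shown is a common example of this type of module deprecation rather than the exact line from the notebook.

    # A minimal sketch of the manual setup described above (assumed commands,
    # not the participant's exact steps):
    #
    #   conda create -n odnb-repro python=3.11
    #   conda activate odnb-repro
    #   pip install pandas nltk matplotlib plotly jupyter
    #
    # The article does not show the failing import, but a frequent Plotly
    # deprecation of this kind is the removal of the plotly.plotly module,
    # which moved to the separate chart_studio package in Plotly 4.0:
    #
    # import plotly.plotly as py        # old statement, fails on Plotly >= 4
    import chart_studio.plotly as py    # updated equivalent (pip install chart-studio)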

Table 1. Technical issues experienced by participants attempting to reproduce results by running the code corresponding to Warren’s work.31

Notable issues
Several Python libraries required installation
Some import statements needed to be updated because of module deprecations
New arguments were required to successfully run a function
A function keyword argument name was updated
File paths in the code were not consistent with file organization in the repository

Of the figures produced in the notebook, Terveen successfully reproduced fourteen, with six of these requiring some troubleshooting. This included supplying values for function arguments that were not specified in the original code but that a resulting error indicated were required. The name of a keyword argument for a Matplotlib function had also been deprecated, which required an update to the original code. Eight figures were not successfully reproduced because file paths in the code were not consistent with the directory structure Terveen had after downloading from the repository.
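The article does not name the specific Matplotlib keyword involved, but the sketch below shows a representative deprecation of this kind using invented data; the renamed keyword here is an illustration, not necessarily the one encountered at the event.

    import matplotlib.pyplot as plt

    values = [1, 2, 2, 3, 3, 3]          # invented example data

    # Old keyword name, removed in Matplotlib 3.x; raises an error on current versions:
    # plt.hist(values, normed=True)

    # Current equivalent after the keyword rename:
    plt.hist(values, density=True)
    plt.savefig("example_hist.png")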

A second participant, Joseph Chan, also chose the option of downloading the materials from Harvard Dataverse. Chan immediately noted that if a “requirements.txt” file had been included with the original deposit, any user trying to reproduce the results would be able to install the appropriate packages and versions. Based on Chan’s setup, five Python libraries plus Jupyter needed to be installed, including pandas, NLTK, Pickle, NumPy, and Matplotlib. Additionally, a plotting library had been deprecated, requiring an import statement to be updated. Once again, deprecated keyword arguments had to be updated. Two figures did not display upon running the code, and these issues could not be resolved.
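A pinned requirements file of the kind Chan recommended can be generated directly from the environment in which the notebook last ran. The sketch below is an assumed approach rather than Chan’s actual script, and it presumes the listed packages are installed.

    # Write a requirements.txt that pins the exact versions present in the
    # current environment, so "pip install -r requirements.txt" recreates the
    # same setup later.
    from importlib.metadata import version

    # Packages the participants reported needing for this notebook.
    packages = ["pandas", "nltk", "numpy", "matplotlib", "plotly"]

    with open("requirements.txt", "w") as f:
        for name in packages:
            f.write(f"{name}=={version(name)}\n")   # raises PackageNotFoundError if a package is missing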

Chan also created a modified version of the research outputs as a zip file uploaded to OSF (https://osf.io/hvbzm), which contains a requirements.txt file, all necessary data, and all the code from the notebook in a single Python (.py) file. With this, users can reproduce the work simply by executing a shell script from a terminal. Copies of the successfully reproduced figures were uploaded to OSF (https://osf.io/gqk8e/), with titles altered with either a note or a name to verify that the figures were newly generated copies (see Figure 1 for examples).


Figure 1. Reproduced versions of Figures 1, 21, and 22 from Warren’s work.31

Overall, this process of reproducing results using files downloaded from a repository presented several challenges: packages had to be installed, deprecations in some packages had to be understood, and syntax changes from evolving software versions had to be troubleshot. Though there was trial and error, participants were able to successfully reproduce figures from this study. Additionally, participants provided their own insights to develop an updated collection of research outputs that helps others reproduce this work more easily.

Additional outcomes and lessons learned

Alfredo González-Espinoza, a colleague who also participated in the assessment, offered an interesting anecdote while pursuing reproducibility via both Harvard Dataverse and Code Ocean (https://osf.io/brn6d). To run the code downloaded from the repository, he had to install Python, Jupyter, and all required libraries, and he encountered the same deprecation errors as other participants; however, he reproduced the results via the Code Ocean capsule without issue. His scenario was notable because he worked from the repository on a gaming laptop, which makes little difference when the issue is installing software, but accessed the capsule from an old Chromebook tablet. This experience calls attention to the fact that a platform like Code Ocean may promote accessibility for people who have limited access to full computers and instead rely on devices on which software cannot be downloaded and installed as easily.

The hackathon offered an additional opportunity for exploration. Kristen Scotti investigated ways to troubleshoot reproducing the output of the code using generative AI. Python, like many other programming languages, undergoes frequent updates that deprecate functions or change syntax over time, making author-provided code more difficult to work with or potentially incompatible with newer versions. Here, Scotti ran the author-provided code through ChatGPT to request updates (https://osf.io/r65ev). ChatGPT’s programming capabilities include the ability to interpret, debug, and suggest updates to code.37 ChatGPT updated the code successfully for all recreated plots except one, which also caused issues for other participants. The likely culprit was formatting problems due to syntax changes and the complexity of the provided code block. A more effective approach might be to ask ChatGPT to convert the code in smaller sections, though this often requires someone experienced in the language to fine-tune both the prompts and the outputs.
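As an illustration of this kind of workflow, the sketch below sends one section of author-provided code to the OpenAI chat completions API with a request to update deprecated calls. The model name, prompt wording, and file name are assumptions, not Scotti’s actual setup.

    # A minimal sketch, assuming an OpenAI API key is available in the
    # environment (OPENAI_API_KEY); the model and prompt are illustrative.
    from openai import OpenAI

    client = OpenAI()

    with open("notebook_cell.py") as f:     # hypothetical file holding one code cell
        old_code = f.read()

    response = client.chat.completions.create(
        model="gpt-4o-mini",                # assumed model choice
        messages=[
            {"role": "system",
             "content": "You update Python code to run on current library versions."},
            {"role": "user",
             "content": "Update any deprecated imports or keyword arguments in the "
                        "following code and return only the revised code:\n\n" + old_code},
        ],
    )

    print(response.choices[0].message.content)   # revised code, to be reviewed before use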

Outside the hackathon setting, we revisited the capsule version of Warren’s research to test how easily his work could be adapted within this platform. For instance, Figures 28 and 29 in the manuscript have prompted discussion in other sources.33,38 Figure 29, in particular, displays the historical significance of selected women in the ODNB and the span of living years of these women from 1830-2000 (shown here in Figure 2b). However, horizontal bars representing spans of years could incidentally be read as indicating the number of women at the corresponding level of historical significance. We addressed this ambiguity by creating a new version of the capsule on Code Ocean39 in which additional cells in the Jupyter Notebook create new versions of these figures that explicitly state the number of years each bar spans (Figure 2a).
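The sketch below, using invented data rather than Warren’s, shows the general technique behind this adaptation: annotating each bar with the number of years it spans so that bar length is not misread as a count of people.

    import matplotlib.pyplot as plt

    # Invented example entries; not data from the ODNB analysis.
    people = ["Person A", "Person B", "Person C"]
    birth = [1830, 1870, 1910]
    death = [1900, 1955, 1995]

    fig, ax = plt.subplots()
    for i, (b, d) in enumerate(zip(birth, death)):
        ax.barh(i, d - b, left=b)                          # bar from birth year to death year
        ax.text(d + 2, i, f"{d - b} years", va="center")   # explicit span label

    ax.set_yticks(range(len(people)))
    ax.set_yticklabels(people)
    ax.set_xlabel("Year")
    plt.savefig("annotated_spans.png")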


Figure 2. Adapted version (a) of Figure 29 (b) from Warren’s work.31

Discussion and conclusions

We have outlined the execution and outcomes of a Reproducibility Hackathon, hosted by the CMU Libraries, that focused on the reusability of data and code shared by a faculty member at the university. This event allowed students to interact with real research outputs, learn potentially new skills and fields of study, and simultaneously offer their perspectives. In reviewing the reproducibility of research code from the participating faculty member, Christopher Warren, students showed that manual setup was needed to rerun the code and that deprecations required them to write updates and modifications. One participating student devised a way to encapsulate the code so that everything was packaged and shareable, enabling an outside user to reproduce the run by executing a single script. Lastly, we attempted to rerun the code while using a ChatGPT API to debug. This approach identified and addressed the same errors that student participants encountered and successfully reproduced the same figures.

Code Ocean presented an alternative way to share the code without manual installations or deprecation issues. An alternative version of Warren’s outputs was assembled and published in a reproducible capsule on Code Ocean. This option proved a more accessible way for a user to interact with computational research, as a participating colleague demonstrated the ease of interacting with the capsule on a tablet device. Finally, we adapted Warren’s code in Code Ocean to produce an alternative depiction of data visualizations presented in the original work. Doing so offered an example of how a reproducibility-centered platform can frame research as an ongoing conversation that openly invites community members to offer constructive insights.

A reproducibility hackathon at an academic institution creates an open environment for researchers to put their published work to the test, opening themselves to constructive feedback around reproducibility, and allows students or other researchers to interact with real research artifacts. We presented our experience here to offer a blueprint for other academic libraries to execute similar events, boost awareness of the threats to preserving research code, and highlight how newer platforms can reduce these threats to reproducibility and support adaptation along the research lifecycle.

Ethics and consent

Ethical approval and consent were not required.

Author contributions

Chasz G: Conceptualization, Data Curation, Formal Analysis, Methodology, Project Administration, Resources, Software, Supervision, Validation, Writing – Original Draft Preparation; Kristen S: Conceptualization, Data Curation, Formal Analysis, Resources, Software, Validation, Visualization, Writing – Original Draft Preparation; Elizabeth T: Data Curation, Formal Analysis, Validation, Visualization; Joseph C: Data Curation, Formal Analysis, Software, Validation; Daisy S: Data Curation, Visualization; Alfredo G-E: Data Curation, Formal Analysis, Validation; Christopher W: Conceptualization, Data Curation, Resources, Software, Visualization, Writing – Review & Editing
