Keywords
software, hackathon, documentation, tutorial, vignettes, programming, data science
This article is included in the Hackathons collection.
Data science is an interdisciplinary field that relies heavily on the use of software tools. These tools require advanced domain-specific knowledge, which is often difficult to acquire and keep up to date considering the rate at which new methods become available and best practices evolve. This difficulty primarily stems from incomplete or unclear documentation. In the case of software packages (e.g. in R and Python), minimal documentation consists of describing inputs and outputs for individual functions. The Comprehensive R Archive Network (CRAN) is the de facto package repository for the R programming language and requires that submitted packages include at least this degree of documentation. CRAN also offers a framework for package developers to include additional documentation in the form of vignettes, which typically demonstrate real-world use cases. However, it appears that the majority of CRAN packages do not have vignettes [1]. In contrast, vignettes are required for submission to Bioconductor, a package repository geared towards computational biology. This requirement has made Bioconductor and its numerous packages much more accessible to both new and experienced users. Unfortunately, outside of Bioconductor, the practice of including user-friendly vignettes or tutorials remains uncommon. To address this problem, the authors experimented with tutorial development in a hackathon project at hackseq 2017 [2]. We describe our experience in this article.
The aim of our project was to organize the collective knowledge of a group of computational biologists into modular tutorials that leveraged the same dataset. Tutorial topics were proposed by and then assigned to members of the team. The tutorials were designed to be independent from each other, but they can readily be combined to form workshop lessons that use the same dataset. In this paper, we explore the benefits and challenges associated with hackathon-driven tutorial development, including the trade-offs of remote hackathon participation. Briefly, we believe that hackathons are an excellent venue for tutorial development and are particularly suitable for remote participation.
We found at least four benefits of collaboratively developing tutorials in a hackathon setting. First, interest in a given topic can be assessed based on the voluntary participation of hackathon attendees. The assembly of people interested in a topic can further motivate tutorial development. Second, once a common dataset is selected and processed, team members can efficiently work in parallel. Third, although team members do not have to rely on one another, they may draw on the collective knowledge of the team. The various perspectives and ideas from different research specialities can guide tutorial design, resulting in higher-quality material. In more practical terms, developing tutorials during a hackathon allows problems to be resolved more readily, and team members can review one another's work, leading to more polished tutorials. Fourth, these hackathon projects often bring together community members who have not yet worked together and can thus catalyze new collaborations.
That said, there are some challenges or considerations to keep in mind when developing tutorials in the context of hackathons. First, the hackathon project should feature a theme or topic that is focused in scope so that team members can assist one another. The skill level of the target audience should also be determined beforehand. Second, once a theme is decided, any existing tutorials should be identified beforehand to avoid repetition. As mentioned previously, vignettes exist for many software tools, and time is best invested in developing new material that is not yet available. Alternatively, one could build on the work of others by adapting or improving existing open-source tutorials. Third, we suggest that you identify a dataset that can be used in all proposed tutorials and meets the following criteria: openly accessible; properly formatted (i.e. little to no missing or malformed data); relevant to the target audience; and ideally sized (i.e. large enough to be interesting but small enough to fit on a personal computer). For example, in the case of tutorials geared towards computational biologists, there are several interesting human genomic datasets, but access is often restricted for privacy. Accordingly, it may be more practical to select a dataset first and then determine a theme and set of tutorial topics that can be developed using this dataset.
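Two of the dataset criteria above (formatting and size) lend themselves to a quick automated check before the hackathon. The following is a minimal sketch, assuming the candidate dataset is a CSV file; the function name `screen_dataset`, the thresholds, and the list of missing-value tokens are all hypothetical choices made for illustration, since no numeric cut-offs are given here. Open accessibility and relevance to the audience still need to be judged by hand.

```python
import csv
import os

# Hypothetical thresholds; no numeric cut-offs are specified in the text.
MAX_BYTES = 2 * 1024**3        # "fits on a personal computer" (assumed: 2 GiB)
MAX_MISSING_FRACTION = 0.05    # "little to no missing data" (assumed: 5%)
MISSING_TOKENS = {"", "NA", "NaN", "null"}  # assumed markers of missing cells


def screen_dataset(path, max_bytes=MAX_BYTES, max_missing=MAX_MISSING_FRACTION):
    """Report the file size and the share of missing cells in a CSV file."""
    size = os.path.getsize(path)
    total = missing = 0
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        for row in reader:
            # Rows shorter than the header are padded, so the absent
            # trailing cells are counted as missing.
            padded = row + [""] * (len(header) - len(row))
            total += len(padded)
            missing += sum(cell.strip() in MISSING_TOKENS for cell in padded)
    fraction = missing / total if total else 0.0
    return {
        "bytes": size,
        "missing_fraction": fraction,
        "small_enough": size <= max_bytes,
        "clean_enough": fraction <= max_missing,
    }
```

Calling `screen_dataset("candidate.csv")` (the file name is a placeholder) returns a small dictionary that the team can compare across candidate datasets when choosing one to share among tutorials.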
It is often cost-prohibitive to travel for short conferences, especially when travel awards do not cover non-traditional meetings such as hackathons. Fortunately, remote participation is not only possible for hackathons, but relatively straightforward. Several tools exist to support collaborative projects while eliminating the need for co-location. For instance, GitHub offers decentralized code sharing, Skype enables face-to-face team discussions, and Slack is a popular platform for asynchronous communication, which is essential when team members are in distant time zones. For example, we successfully managed our project from Vancouver, BC, with remote participants in China and France. We argue that remote participation is especially straightforward for tutorial development because material can be developed independently and thus asynchronously. On the other hand, virtual attendance precludes any participation in social or networking events. There is also additional work involved in ensuring that every local and remote team member has assigned tasks at any given time. Overall, though, we believe that allowing remote participation is a net benefit for a hackathon project, especially for tutorial development.
In conclusion, our experience with developing tutorials at a hackathon with remote participants was positive. We believe that we were able to achieve more together than separately, mostly because we gained access to immediate peer review of our tutorials. We also learned lessons that will help ensure success for future hackathon projects geared toward tutorial development. We believe this approach to be generalizable to other fields and a model for assembling passionate data scientists with similar interests and organizing their collective knowledge into modular tutorials. In turn, these tutorials can greatly benefit the field of data science by facilitating the adoption of powerful tools and accelerating the training of future data scientists.
No data are associated with this article.
The authors acknowledge the hackseq 2017 steering committee for organizing the hackathon.
Is the topic of the opinion article discussed accurately in the context of the current literature?
Yes
Are all factual statements correct and adequately supported by citations?
Yes
Are arguments sufficiently supported by evidence from the published literature?
Yes
Are the conclusions drawn balanced and justified on the basis of the presented arguments?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: bioinformatics, community development
Is the topic of the opinion article discussed accurately in the context of the current literature?
Yes
Are all factual statements correct and adequately supported by citations?
Yes
Are arguments sufficiently supported by evidence from the published literature?
Yes
Are the conclusions drawn balanced and justified on the basis of the presented arguments?
Partly
References
1. What nobody tells you about documentation. [accessed 11th Jan 2019].
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: bioinformatics, data science
Version 1 (24 Dec 18): reports from invited reviewers 1 and 2.