Research Article

Design and implementation of semester long project and problem based bioinformatics course

[version 1; peer review: 3 approved with reservations]
PUBLISHED 25 Sep 2018

This article is included in the Bioinformatics gateway.

This article is included in the Bioinformatics Education and Training Collection.

Abstract

Background: Advancements in ‘high-throughput technologies’ have inundated us with data across disciplines. As a result, there is a bottleneck in meeting the demand for data analysis and for training ‘next generation data scientists’.
Methods: In response to this need, the authors designed a single-semester “Bioinformatics” course that introduced a small cohort of students at the University of South Carolina to methods for analyzing data generated through different ‘omic’ platforms using a variety of model systems. The course was divided into seven modules, each ending with a problem.
Results: Towards the end of the course, each student designed a project that allowed them to pursue their individual interests. The completed projects were presented as talks and posters at the ISCB-RSG-SEUSA symposium held at the University of South Carolina.
Conclusions: An important outcome of this course design was that the students acquired the basic skills to critically evaluate how the data of a problem or project were reported and interpreted during the symposium.

Keywords

bioinformatics education, problem-based learning, project-based learning, hands-on course

Introduction

Bioinformatics is a rapidly growing interdisciplinary field, driven by advances in both computer science and the life sciences. Rapid advances in sequencing technologies have led to a deluge of biological data, creating a need for expeditious, efficient and effective analyses. Practitioners of bioinformatics now add techniques from statistics, information science and engineering to develop algorithms and build predictive models to understand the dynamics within a biological system. This paradigm shift in how bioinformatics is perceived has resulted in an evolutionary model of growth across both of its root disciplines1. Bioinformatics as a field also has a dual character, combining “episteme” (scientific knowledge) and “techne” (technical know-how), which leads to the idea of ‘Science informing the tools and the tools enabling science’1. In a 2017 survey of 704 NSF principal investigators, more than 90% of respondents replied that they would soon be working with data sets that required high-performance computing, and they identified bioinformatics data analysis as the most urgent unmet need for the successful completion of their projects2. Increasing students’ exposure to bioinformatics at the undergraduate level will help address the need for specialists in this field and also make the students more attractive candidates for opportunities in industry or graduate school3–5. Through surveys, the Global Organization for Bioinformatics Learning, Education and Training (GOBLET) found that the skills required for ‘basic data stewardship’ are taught in only ~25% of education programs, creating a gulf between theory and practice6–8.

Many courses have been designed and implemented to address these gaps. They are project based, problem based, or a combination of both, and study one or more ‘next-generation’ datasets9–12. Some have been designed as workshops9, others as semester-long courses using analyses from a single next-generation technology10. The authors have not come across a course that incorporates multi-omics data analyses in a single semester. There have been studies that address a single problem using multi-omics approaches11, and there have been pipeline designs that help integrate these data under a single platform12.

In response to this need, we designed a single-semester bioinformatics course in the Department of Biological Sciences at the University of South Carolina. It was targeted towards undergraduate seniors and graduate students, mainly bench scientists working on experiments that generated data across different ‘omic technologies’ in different living systems.

Challenges in design of bioinformatics curriculum

The curriculum task force of the International Society for Computational Biology (ISCB), a scholarly society for bioinformatics and computational biology research scientists across the world, identified a set of 16 core competencies through surveys and an iterative process of input from people associated with the fields of bioinformatics and computational biology13.

However, one of the biggest challenges is the heterogeneity of the course participants’ backgrounds. There is ‘no one size fits all’ when designing a bioinformatics course. In fact, three different types of user groups employ bioinformatics in their research (Table 1), and each of these user groups requires different competencies14,15.

Table 1. Characteristics of user groups.

Bioinformatics Tool Users (BTU): These users access bioinformatics resources, packages and software to perform analyses specific to their research domains, e.g. bench scientists and medical professionals.
Bioinformatics Data Scientists (BDS): These users apply computational methods to analyze data and advance the scientific understanding of living systems.
Bioinformatics Engineers (BE): These users create, develop and manage the novel computational methods needed for new scientific discoveries.

Consistent with this heterogeneity, there was considerable diversity in the backgrounds of the students registered for our course. In response, we chose a ‘learner adaptable’ curriculum design, which allowed us to design the course based on the students’ knowledge of the subject and their expectations of the course.

Methods

Course design

Course conception. The course was designed as a structured bioinformatics course geared towards the needs of students working on different “omics” experiments. Its general premise was to critically examine and analyze, in a hands-on fashion, published or in-preparation datasets across different biological systems. In addition, we wanted to introduce the students to the R programming language.

Course participants. Nine participants registered for the course. Four were undergraduate seniors, four were first- or second-year graduate students, and one was an emergency medical technician (EMT) with a Bachelor of Science degree who was taking additional classes for credit and is now in medical school.

Learning objectives and outcomes of the course. We sent a three-question survey (Table 2) to all participants to understand their reasons for registering for the course.

Table 2. Survey questions sent out to the students.

Q1) Previous programming experience?
Reason for the question: To gauge the students’ level of expertise and identify the level of programming to be introduced in class.
Responses: (i) 4 participants had taken a course on R.* (ii) 5 participants had no previous experience using any bioinformatics software or programming languages.

Q2) Motivation for registering for the course?
Reason for the question: To understand the students’ rationale for participating in the course.
Responses: All participants responded that they were working on some type of benchwork that would generate “omic” data.

Q3) Take-away from the course?
Reason for the question: To ensure that our learning outcomes matched the expectations of the course participants.
Responses: Understand types of sequencing technologies; learn how to analyze data; learn better practices for biological data management.

*Because the pre-class survey answers did not include this information, we asked students about their experience with programming languages in class. We received 7 responses in total to the pre-class survey.

The primary learning objective of the course was to introduce the students to the breadth and depth of the field of Bioinformatics for ‘omics’ data analyses. We also identified the following three course outcomes for the students.

  • I. At the end of the course, students should be able to identify and implement alternate strategies to answer genomics-based research questions.

  • II. Students should be comfortable using open-source genomic software and command-line programming, and be able to use R statistical packages.

  • III. Students should be able to design and troubleshoot analyses of nucleotide sequence data and elicit biological information from the data.
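Outcome III, like the genome comparison strategies listed for the comparative genomics module (Table 3), rests on pulling biological information out of raw nucleotide sequence. As a hedged illustration only, the Python sketch below computes a simplified ‘genomic signature’ in the sense of Karlin and Burge: the dinucleotide relative abundance ρXY = f(XY) / (f(X)·f(Y)) for all 16 dinucleotides, plus δ*, the average absolute difference between two signatures. The function names and toy sequences are hypothetical and not part of the course materials, which used R and dedicated tools; unlike the original formulation, this version counts one strand only rather than symmetrizing with the reverse complement.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_relative_abundance(seq: str) -> dict:
    """Return rho_XY = f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    Simplified sketch (hypothetical helper): counts a single strand only
    (no reverse complement) and skips positions involving non-ACGT
    characters such as N.
    """
    seq = seq.upper()
    mono = Counter(base for base in seq if base in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    if any(mono[base] == 0 for base in BASES) or not di:
        raise ValueError("sequence must contain all four bases and an ACGT pair")
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    rho = {}
    for x, y in product(BASES, repeat=2):
        f_x = mono[x] / n_mono
        f_y = mono[y] / n_mono
        f_xy = di[x + y] / n_di  # observed dinucleotide frequency
        rho[x + y] = f_xy / (f_x * f_y)  # observed / expected if independent
    return rho


def signature_difference(rho_a: dict, rho_b: dict) -> float:
    """delta*: average absolute difference over the 16 dinucleotides."""
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / len(rho_a)


# Toy usage; real genomes would be read from FASTA files.
sig_a = dinucleotide_relative_abundance("ACGTACGTTAGCCGATNNACGT")
sig_b = dinucleotide_relative_abundance("AATTGGCCATGCATGCAATT")
print(round(signature_difference(sig_a, sig_b), 3))
```

Values of ρXY well above or below 1 flag dinucleotides that are over- or under-represented relative to base composition; comparing δ* between genomes is one way to quantify the “genomic signature” comparisons described in Table 3.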

Course structure

The course was divided into seven modules spread across the semester: Genome assembly and annotation, Comparative genomics, Introduction to Statistics, Metagenomics, Transcriptomics, Proteomics and Cancer data analysis. Each module ended with a graded research problem either in a prokaryotic system or a eukaryotic system (Table 3 and Supplementary File 1).
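The Introduction to Statistics module listed above included ANOVA. As a minimal sketch of what a one-way ANOVA computes, assuming equal-variance groups and using hypothetical toy data, the Python function below forms F = [SS_between / (k - 1)] / [SS_within / (N - k)], where k is the number of groups and N the total number of observations. The function name is illustrative only; students in the course worked in R, where `aov()` reports the same F statistic along with a p-value.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = (SS_between / (k - 1)) / (SS_within / (N - k)), where k is the
    number of groups and N the total number of observations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Spread of each observation around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical expression values for one gene under three conditions.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df_b, df_w)  # 27.0 2 6
```

A large F means the group means differ by much more than the scatter within groups would suggest; converting F to a p-value requires the F distribution with (k - 1, N - k) degrees of freedom, which R and statistics libraries provide.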

Table 3. Summaries of course modules.*

Genome assembly and annotation
Topics covered: (i) DNA sequencing and its advances over the years; (ii) assembly of a bacterial genome from nucleotide sequencing data, and submission to NCBI GenBank.
Software: Artemis, a free genome browser and annotation tool that allows visualization of sequence features15.
Project: 1. Students were asked to download the Caulobacter segnis genome and identify potential sequencing errors. 2. Project report on HeLa cells: strategies for identifying the differences between healthy and non-healthy cells, and ways of identifying HPV 18 contamination in HeLa cells.

Comparative genomics
Topics covered: (i) strategies to identify prokaryotic and eukaryotic genes; (ii) strategies for genome comparison: genome size, genomic signature, and gene order analyses through sequence alignment.
Software: Mauve, a multiple genome aligner for comparing genomes to detect evolutionary events and rearrangements16.
Project: Comparative analyses of odorant-binding proteins (OBPs) among strains of Drosophila melanogaster and Apis mellifera. Students performed homology comparisons and constructed phylogenetic trees to observe OBP diversification across the genomes.

Metagenomics
Topics covered: (i) importance of metagenomics across research domains; (ii) types of research questions answered by metagenomics-based studies; (iii) how to set up metagenomic studies, and data extraction, submission and analysis through the MG-RAST pipeline.
Software: MG-RAST, a pipeline that provides automated quality control, annotation, comparative analysis and archiving of metagenomic and amplicon sequences using a combination of bioinformatics tools17. STAMP, a software package for analyzing taxonomic and metabolic profiles with appropriate statistical techniques18.
Project: Comparison and analysis of the Global Ocean Sampling Expedition data available in the MG-RAST data repository. Students were also introduced to statistical hypothesis testing within and between data sets.

Introduction to statistics
Topics covered: (i) descriptive and inferential statistics; (ii) univariate and bivariate analyses; (iii) ANOVA and PCA.
Software: R statistical package: students were introduced to R and given cheat sheets on how to load, access and manipulate biological data.
Project: Students were introduced to these concepts and then worked on their comparative metagenomics data analysis projects.

Transcriptomics
Topics covered: RNA sequencing technologies; analysis of data from an RNAi knock-down experiment of the pasilla splicing factor gene in Drosophila19.
Software: R statistical package20.
Project: Students detected differentially expressed genes using R packages and learned how to take confounding factors into account in differential expression analysis. They were also introduced to different visualization packages in R.

Proteomics
Topics covered: characterization of protein diversity using proteomics. The dataset used for this module was from the Bioconductor conference held at Stanford in July 2016.
Software: R statistical packages.
Project: Students used R/Bioconductor packages to explore, process, visualize and understand mass spectrometry-based proteomics data.

Cancer data analyses
Topics covered: This module was taught by Dr. Phillip Buckhaults (Director of the Cancer Genetics Laboratory at the University of South Carolina).
Software: UCSC Cancer Genomics Browser21, TCGA22, gene set enrichment analysis23.
Project: Students were reintroduced to RNA-seq analysis and its role in generating the cervical cancer data for Dr. Buckhaults’ recent paper24. They were also shown the features of the UCSC Cancer Genomics Browser. Students analyzed the TCGA database for gene expression association analyses in glioblastoma, and further data mining used gene set enrichment analysis of previously identified genes to check for statistical significance.

*All the presentations associated with each module, course assignments and problem assignments are available for access in the supplementary section of the paper. The final projects that were presented as posters and talks are not available for access at this time.
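The cancer data analysis module in Table 3 used gene set enrichment analysis23 to check previously identified genes for statistical significance. GSEA itself ranks all genes and scores a gene set with a running-sum enrichment statistic assessed by permutation, which is too involved for a short example. The Python sketch below instead shows a simpler, related over-representation test, swapped in here as an assumption rather than the method used in the course: it asks whether a gene list shares more genes with a gene set than expected by chance, using the upper tail of the hypergeometric distribution. The function name and counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_background, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    Hypothetical helper, not GSEA:
    n_background: genes measured (the universe)
    n_in_set:     background genes annotated to the gene set
    n_selected:   genes in the list of interest (e.g. differentially expressed)
    n_overlap:    selected genes that are also in the gene set
    """
    if not (0 <= n_in_set <= n_background and 0 <= n_selected <= n_background):
        raise ValueError("set sizes must lie within the background")
    if not 0 <= n_overlap <= min(n_in_set, n_selected):
        raise ValueError("overlap cannot exceed either set size")
    total = comb(n_background, n_selected)
    # Sum the probabilities of drawing n_overlap or more genes from the set.
    tail = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_selected - i)
        for i in range(n_overlap, min(n_in_set, n_selected) + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes, a 150-gene set, 400 selected genes, 12 shared.
print(overrepresentation_pvalue(20000, 150, 400, 12))
```

With these hypothetical counts about 3 shared genes would be expected by chance, so observing 12 gives a small p-value. Because only counts enter the calculation, the same function applies whether the gene list comes from TCGA expression associations or a differential expression analysis; when many gene sets are tested, the p-values would also need multiple-testing correction.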

Results

Based on the students’ responses, we assigned each student a potential user group (as defined in Table 1) at the start of the class and an expected user group, reflecting their expected competency level, at the end of the class. Seven students replied to the pre-course survey and two did not. Six of the seven respondents gave permission for their answers to be published online anonymously. Any identifying information, such as names or project details, has been edited from the responses (Table 4).

Table 4. Student pre-class and expected user groups.

Student: pre-class user group → expected user group
Student 1: Bioinformatics Tool User → Bioinformatics Tool User
Student 2: Bioinformatics Tool User → Bioinformatics Data Scientist
Student 3: Bioinformatics Tool User → Bioinformatics Tool User, Bioinformatics Data Scientist
Student 4: Bioinformatics Tool User → Bioinformatics Data Scientist
Student 5: Bioinformatics Tool User → Bioinformatics Data Scientist
Student 6: Bioinformatics Tool User → Bioinformatics Tool User

Each student’s competency was determined by successful completion of the project assigned at the end of each course module. In lieu of a final exam, each student designed a research project, conducted appropriate analyses, and summarized their results as a poster or a talk at the end of the semester as part of the ISCB-RSG-SEUSA (International Society for Computational Biology Regional Student Group, Southeast USA) conference held on campus on 8–9 December 2017. They also had the opportunity to listen to talks from professors working on bioinformatics projects and to interact with their peers from the University of South Florida and the University of Alabama. In addition, two graduate students wrote papers on their projects with input from their respective research advisors.

Dataset 1. Pre-class survey.
Dataset 2. Post-class survey.

Discussion

This course covered many topics in 13 weeks, and some degree of mastery was required for each topic. In addition, about half of the students (5 of the 9; Table 2) had no familiarity with programming. As a result, many of the students were stretched beyond their comfort zone. However, because this was a small class, we were able to work with the students individually to help them succeed and to tailor projects to the students’ backgrounds and expectations. An important outcome of this course design was that the students acquired the basic skills to critically evaluate how the data of a problem or project were reported and interpreted during the symposium.

Our main goal was to develop a course that was responsive to the needs and background abilities of the participating students. Every course will have students at different levels of learning with different goals. Hence, when designing a course that caters to the students’ needs, a small class size may be advantageous.

In our class, every student had a different learning curve. We determined a student’s competency in each module by successful completion of the problem set and/or the project. The first objective of the course was to expose the students not just to one living system but to many, including bacteria, humans and Drosophila. The other objective was to introduce the students to the R computational platform20. Our initial challenge was to address the problems students faced when using the platform for the first time. We wanted the students to understand the intricacies of R as a programming language, but if we repeat this class, we will provide the code to students as R Markdown documents. We would also add R assignments at the beginning of the course and out-of-class help sessions to help students get comfortable using R.

A major challenge was to map the required competencies to the expectations of the course at both the undergraduate and graduate levels. Because we had a small number of students, we designed and delivered a structured curriculum that integrated both continuously changing and stable technological platforms and that, in every module, used model systems worked on by at least one student.

Because an important goal of the course was to address the needs of the students, we designed the current model of ‘multi-project’ modules of biological data analyses. The small class size allowed us to give personalized attention to every student. In the future, one major change we would make is to separate the projects and problems assigned to graduate and undergraduate students. Undergraduate students generally do not have their own data, while graduate students usually have, or are in the process of obtaining, data that they want to analyze. Therefore, we would either have separate sections for graduate and undergraduate students or a combined lecture with separate recitation sections in which students apply what they have learned in the lecture portion of the class. The graduate students would be encouraged to develop projects relevant to their research, while the undergraduates would work in groups on projects designed by the instructor.

Keypoints

  • This course was designed to address the need of students at the University of South Carolina to analyze ‘omic’ data sets.

  • It was divided into seven modules, with practical tasks at the end of each module.

  • Students designed their own projects and presented them as papers, posters and talks at the ISCB-RSG-SEUSA symposium.

Data availability

Dataset 1: Pre-class surveys, 10.5256/f1000research.16310.d218863 (ref. 25)

Dataset 2: Post-class surveys, 10.5256/f1000research.16310.d218864 (ref. 26)

Ethical considerations

The authors have posted the pre-class survey answers of students who consented to have their responses published anonymously. All identifying information has been edited from the responses. The post-class survey responses were provided to the instructors as anonymous feedback through an online survey conducted by the university.

How to cite this article
Saarunya G and Ely B. Design and implementation of semester long project and problem based bioinformatics course [version 1; peer review: 3 approved with reservations]. F1000Research 2018, 7(ISCB Comm J):1547 (https://doi.org/10.12688/f1000research.16310.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper’s academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 17 Dec 2018
Allegra Via, National Research Council of Italy (CNR), Institute of Molecular Biology and Pathology (IBPM), c/o Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza University of Rome, Rome, Italy 
Approved with Reservations
The paper describes a semester long bioinformatics course targeting graduate seniors and graduate students who were bench scientists in need for learning how to analyse data generated across different ‘omic technologies’.
 
I find it weird that “The ...
How to cite this report
Via A. Reviewer Report For: Design and implementation of semester long project and problem based bioinformatics course [version 1; peer review: 3 approved with reservations]. F1000Research 2018, 7(ISCB Comm J):1547 (https://doi.org/10.5256/f1000research.17818.r40462)
Reviewer Report 10 Dec 2018
Mark A. Pauley, National Science Foundation, Alexandria, VA, USA 
Approved with Reservations
“Design and Implementation of Semester Long Project and Problem based Bioinformatics Course” describes a “multi-omics” bioinformatics course at the University of South Carolina intended for advanced undergraduates and graduate students. The course was implemented in Fall 2017; nine students took ...
How to cite this report
Pauley MA. Reviewer Report For: Design and implementation of semester long project and problem based bioinformatics course [version 1; peer review: 3 approved with reservations]. F1000Research 2018, 7(ISCB Comm J):1547 (https://doi.org/10.5256/f1000research.17818.r40513)
Reviewer Report 02 Nov 2018
Russell Schwartz, Department of Biological Sciences and Computational Biology Department, Carnegie Mellon University , Pittsburgh, PA, USA 
Approved with Reservations
Saarunya and Ely describe a problem-based bioinformatics course designed to meet a need for “next generation data scientists” in the life sciences, a need identified by many current efforts in life sciences education. Case studies of course development efforts like ...
How to cite this report
Schwartz R. Reviewer Report For: Design and implementation of semester long project and problem based bioinformatics course [version 1; peer review: 3 approved with reservations]. F1000Research 2018, 7(ISCB Comm J):1547 (https://doi.org/10.5256/f1000research.17818.r39760)
  • Author Response 26 Nov 2018
    Geetha Saarunya, Biological Sciences, University of South Carolina, Columbia, 29208, USA
    The authors would like to thank Dr. Schwartz for his in-depth and insightful feedback on the paper. Following are the comments from the authors, which will be incorporated into the ...