Keywords
bioinformatics education, problem-based learning, project-based learning, hands-on course
This article is included in the Bioinformatics gateway.
This article is included in the Bioinformatics Education and Training Collection.
Bioinformatics is a rapidly growing interdisciplinary field, driven by advances in both computer science and the life sciences. Rapid advances in sequencing technologies have led to a deluge of biological data, creating a need for expeditious, efficient, and effective analyses. Practitioners of bioinformatics now add techniques from statistics, information science and engineering to develop algorithms and build predictive models to understand the dynamics within a biological system. This paradigm shift in how bioinformatics is perceived has resulted in an evolutionary model of growth across both of its root disciplines1. Bioinformatics as a field also enjoys a degree of duality: “episteme” (scientific knowledge) and “techne” (technical know-how), leading to the idea of ‘Science informing the tools and the tools enabling science’1. In a 2017 survey of 704 NSF principal investigators, more than 90% of respondents reported that they would soon be working with data sets that required high-performance computing, and they identified bioinformatics data analysis as the most urgent unmet need for successful completion of their projects2. Exposing students to bioinformatics at the undergraduate level will help address the need for specialists in this field and also make the students attractive for opportunities in industry or in graduate school3–5. The Global Organization for Bioinformatics Learning, Education and Training (GOBLET) identified through surveys that the skills required for ‘basic data stewardship’ are taught in only ~25% of education programs, creating a gulf between theory and practice6–8.
Many courses have been designed and implemented to address these gaps. They are project based, problem based or a combination of both, and study one or more ‘next-generation’ datasets9–12. The courses have been designed as workshops9 or as semester-long courses using analyses from a single next-generation technology10. We have not come across a course that incorporates multi-omics data analyses in a single semester. There have been studies that address a single problem using multi-omics approaches11 and there have been pipeline designs that help integrate these data under a single platform12.
In response to this need, we designed a single-semester course on bioinformatics in the Department of Biological Sciences at the University of South Carolina. The course targeted undergraduate seniors and graduate students, mainly bench scientists whose experiments generated data across different ‘omic’ technologies in different living systems.
The curriculum task force of the ‘International Society for Computational Biology’, a scholarly society for both bioinformatics and computational biology research scientists across the world, identified a set of 16 core competencies established through surveys and an iterative process of inputs from people associated with the fields of bioinformatics and computational biology13.
However, one of the biggest challenges is the heterogeneity of the backgrounds of the course participants. There is no ‘one size fits all’ approach to designing a bioinformatics course. In fact, there are three different types of user groups that employ bioinformatics in their research (Table 1), and each of these user groups requires different competencies14,15.
Thus, there was considerable diversity in the backgrounds of the students registered for our course. In response, we chose to follow a ‘learner adaptable’ style of design of the curriculum. This approach allowed us to design the course based on the students’ knowledge of the subject and their expectations of the course.
Course conception. This course was designed to provide a structured Bioinformatics course that is geared towards the needs of students working on different “omics” experiments. The general premise of the course was to critically examine and analyze published or in-preparation datasets across different biological systems in a hands-on fashion. In addition, we wanted to introduce the students to the R programming language.
Course Participants. We had nine participants registered for the course. Four of the students were undergraduate seniors, four were first or second year graduate students and one of them was an emergency medical technician (EMT) with a Bachelor of Science degree who was taking additional classes for credit and is now in medical school.
Learning objectives and outcomes of the course. We sent a three-question survey (Table 2) to all the participants to understand their reasoning for registering in the course.
Question premise | Reasons for the question | Responses |
---|---|---|
Q1) Previous Programming experience? | We wanted to gauge the level of expertise of the students and identify the level of programming to be introduced in class. | (i) 4 participants had taken a course on R.* (ii) 5 participants had no previous experience using any bioinformatics software or programming languages. |
Q2) Motivation for registering in the course? | We wanted to understand the rationale of the students participating in the course | Unanimous response of the participants was that they were working on some type of benchwork that would generate “omic” data. |
Q3) Take away from the course? | We wanted to ensure our learning outcomes matched the expectations of the course participants. | (i) Understand types of sequencing technologies. (ii) Learn how to analyze data. (iii) Learn better practices of biological data management. |
The primary learning objective of the course was to introduce the students to the breadth and depth of the field of Bioinformatics for ‘omics’ data analyses. We also identified the following three course outcomes for the students.
I. At the end of the course, students should be able to identify and implement alternate strategies to answer genomics-based research questions.
II. Students should be comfortable with the use of open-source genomic software and command line programming, and be able to use R statistical packages.
III. Students should be able to design and troubleshoot analyses of nucleotide sequence data and elicit biological information from the data.
The course was divided into seven modules spread across the semester: Genome assembly and annotation, Comparative genomics, Introduction to Statistics, Metagenomics, Transcriptomics, Proteomics and Cancer data analysis. Each module ended with a graded research problem either in a prokaryotic system or a eukaryotic system (Table 3 and Supplementary File 1).
Module | Topics covered | Software | Project |
---|---|---|---|
Genome assembly and annotation | (i) DNA sequencing and its advances over the years. (ii) Assembly of a bacterial genome from nucleotide sequencing data, and submission to NCBI GenBank. | Artemis: a free genome browser and annotation tool that allows visualization of sequence features15. | 1. Students were asked to download the Caulobacter segnis genome and identify the potential sequencing errors. 2. Project report on HeLa: strategies for identifying the differences between healthy and non-healthy cells, and ways of identifying HPV 18 contamination in HeLa cells. |
Comparative Genomics | (i) Strategies to identify prokaryotic and eukaryotic genes. (ii) Strategies for genome comparison: genome size, genomic signature, gene order analyses through sequence alignment. | MAUVE: a multiple genome aligner to compare genomes for evolutionary events and rearrangements16. | Comparative analyses of ‘Odorant binding proteins’ (OBPs) among strains of Drosophila melanogaster and Apis mellifera. Students performed homology comparisons and constructed phylogenetic trees to observe OBP diversification across the genomes. |
Metagenomics | (i) Importance of metagenomics across research domains. (ii) Exploring types of research questions answered by metagenomics-based studies. (iii) How to set up metagenomic studies, and data extraction, submission and analyses through the MG-RAST pipeline. | MG-RAST pipeline: provides automated quality control, annotation, comparative analysis and archiving of metagenomic and amplicon sequences using a combination of several bioinformatics tools17. STAMP: a software package for analyzing taxonomic and metabolic profiles by choosing appropriate statistical techniques18. | Comparison and analyses of the Global Ocean Sampling Expedition data available at the MG-RAST data repository. Students were also introduced to statistical hypothesis testing within data sets and between data sets. |
Introduction to statistics | (i) Descriptive and inferential statistics. (ii) Univariate and bivariate analyses. (iii) ANOVA and PCA. | R Statistical package: Students were introduced to the R package and were given cheat sheets on how to load, access, and manipulate biological data. | Students were introduced to these concepts and then allowed to work on their comparative metagenomics data analyses projects. |
Transcriptomics | Students were introduced to RNA sequencing technologies and analyzed data from an RNAi knock-down experiment of the pasilla splicing factor gene in Drosophila19. | R Statistical package20 | Students detected differentially expressed genes using R packages and learned how to take confounding factors into account in differential expression analysis. They were also introduced to different visualization packages in R. |
Proteomics | Students were introduced to protein diversity characterization using proteomics. The dataset used for this module was from the Bioconductor Conference held at Stanford in July 2016. | R Statistical packages | Students used R/Bioconductor packages to explore, process, visualize, and understand mass spectrometry-based proteomics data. |
Cancer data analyses | This module was offered by Dr. Phillip Buckhaults (Director of the Cancer Genetics laboratory at the University of South Carolina). | UCSC Cancer Genomics Browser21, TCGA22, Gene set enrichment analysis23 | Students were reintroduced to RNA-seq analysis and its role in the generation of cervical cancer data for Dr. Buckhaults’ recent paper24. They were also shown the features of the UCSC Cancer Genomics Browser. Students analyzed the TCGA database for gene expression association analyses in glioblastoma. Further data mining was carried out using gene set enrichment analyses on previously identified genes to check for statistical significance. |
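As a rough illustration of the between-group comparisons listed under the statistics module in Table 3, the one-way ANOVA F statistic can be computed by hand. The sketch below is illustrative only and was not part of the course material: the course used R, whereas this example uses plain Python, and the function name `one_way_anova_f` and the abundance values for three sampling sites are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical relative abundances (%) of one taxon at three sampling sites
site_a = [1.0, 2.0, 3.0]
site_b = [2.0, 3.0, 4.0]
site_c = [5.0, 6.0, 7.0]

f_stat = one_way_anova_f([site_a, site_b, site_c])
print(f"F = {f_stat:.2f}")  # prints "F = 13.00"
```

A large F value indicates that the group means differ by more than the within-group scatter would suggest; turning F into a p-value requires the F distribution, which a statistics package such as R provides.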
Based on the responses of the students, we assigned potential user groups as explained in Table 1 at the start of the class with their expected competency levels at the end of the class. Seven students replied and two students did not reply to the pre-course survey. We were able to obtain permission from six of the seven students who replied to the survey to have their answers published online anonymously. Any identifying information in terms of names or project details has been edited from the responses (Table 4).
Successful completion of the project assigned to every student by the end of a course module determined their competency in the course. In lieu of a final exam, each student designed a research project, conducted appropriate analyses, and summarized their results in the form of a poster or a talk at the end of the semester as part of the ISCB-RSG-SE USA (International Society for Computational Biology, Regional Student Group, Southeast USA) conference held on campus on December 8–9, 2017. They also had the opportunity to listen to talks from professors working on bioinformatics projects and interacted with their peers from the University of South Florida and the University of Alabama. In addition, two graduate students wrote papers on their projects with input from their respective research advisors.
This course covered a lot of topics in 13 weeks and some degree of mastery was required for each topic. In addition, half of the students had no familiarity with programming. As a result, many of the students were stretched beyond their comfort zone. However, since this was a small class, we were able to work with the students individually to help them be successful, and also tailor projects to the students’ backgrounds and expectations. An important outcome of this course design was that the students acquired the basic skills to critically evaluate the reporting and interpretation of data of a problem or a project during the symposium.
Our leading goal was to develop a course that was responsive to the needs and background abilities of the participating students. It is important to recognize that every course will have students at different levels of learning with different goals. Hence when designing a course that caters to the needs of the students, it may be a good idea to have a small class.
In our class, every student had a different learning curve. We determined the competency of a student per module by their successful completion of the problem set and/or the project. The first objective of the course was to expose the students to not just one living system but many, including bacterial, human and Drosophila systems. The other objective was to introduce the students to the R computational platform20. Our initial challenge was to address the problems faced by the students in using the platform for the first time. We wanted the students to understand the intricacies of using R as a programming language, but if we repeat this class, we will provide the code to the students as R Markdown documents. We would also have additional R assignments at the beginning of the course and out-of-class help sessions to help students get comfortable using R.
A major challenge was to identify ways to map the competencies required to the expectations of the course at both the undergraduate and graduate levels. Since we had a small number of students, we designed and delivered a structured curriculum that integrated both the continuously changing and stable technological platforms using model systems that were used by at least one student for every module.
As the main goal of the course was to address the needs of the students, we designed the current model of ‘multi-project’ modules of biological data analyses. Due to the small class size, we were able to give personalized attention to every student. In the future, a big change that we would incorporate would be to separate the projects and problems assigned to graduate and undergraduate students. Generally, the undergraduate students do not have their own data while the graduate students usually have or are in the process of obtaining data that they want to analyze. Therefore, we would either have separate sections for the graduate and undergraduate students or we would have a combined lecture but separate recitation sections where the students would apply what they have learned in the lecture portion of the class. The graduate students would be encouraged to develop projects that are relevant to their research while the undergraduates would work in groups on projects designed by the instructor.
• This course was designed to address the students’ need to analyze ‘omic’ data sets at the University of South Carolina.
• It was divided into seven modules with practical tasks at the end of each module.
• Students designed their projects and presented them as papers, posters and talks at the ISCB-RSG-SEUSA symposium.
Dataset 1: Pre-class surveys 10.5256/f1000research.16310.d21886325
Dataset 2: Post-class surveys 10.5256/f1000research.16310.d21886426
The authors have posted the pre-class survey answers of students who have consented to have their responses published anonymously. All identifying information has been edited from the responses. The post-class survey responses are given as feedback to the instructors, also anonymously, through an online survey carried out by the university.
The authors would like to thank Dr. Phillip Buckhaults for the design, conception and delivery of the lectures on “Cancer Genomics”. The authors would also like to thank all the attendees, participants and professors of the Departments of Biological Sciences and Computer Science of the University of South Carolina for participating in the first ‘ISCB-RSG-SEUSA’ symposium held in December 2017 in Columbia, SC.
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Protein structural bioinformatics, protein structure and function prediction and analysis, and protein interactions. Programming and software development. Science of learning, educational psychology, cognitive sciences, and (bioinformatics) curriculum development.
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics education, bioinformatics
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
References
1. Williams J, Drew J, Galindo-Gonzalez S, Robic S, et al.: Barriers to Integration of Bioinformatics into Undergraduate Life Sciences Education. bioRxiv. 2017.
Competing Interests: No competing interests were disclosed.
Version 1: 25 Sep 18