A Developmental Approach to Integrating Bioinformatics with Laboratory Experiments in Several Undergraduate Courses
Jeffrey D. Newman,
Lycoming College, Williamsport, PA 17701
Project Website: http://www.lycoming.edu/~newman/models.html
Abstract
The explosion of freely available biological information, combined with comprehensive analytical tools, has provided unique opportunities for students to explore molecular structure and function. In this project, the use of bioinformatics increases in sophistication as students progress from Introductory Biology to the sophomore-level Microbiology and Genetics courses, and it is tightly integrated with particular experiments. In addition to "wet-lab"-based isolation and analysis of plasmid DNA, Introductory Biology students are provided with the sequence of the plasmid, from which they can create restriction maps to predict fragment sizes, identify ORFs, and translate them into protein. Tutorials explaining 3-D models of β-lactamase and green fluorescent protein are also available. The PCR/cloning experiment in Genetics involves the amplification of a segment of the clotting factor IX gene and α-complementation-assisted cloning. To support this exercise, students retrieve the cDNA and genomic sequences from GenBank, compare them to identify introns and exons, use the primer sequences to identify the amplified sequence, insert this sequence into the vector, and analyze the plasmid as described for the Introductory Biology course. Our Microbiology course's unknown microbe identification experiment includes PCR amplification and sequencing of 16S rDNA to supplement the standard biochemical approaches. Similarity searches are used to identify the organism, and multiple sequence alignments permit the construction of phylogenetic trees.
Project History/Overview
The Lycoming College Molecular Biology and Bioinformatics Project began with the development of a senior-level Molecular Biology course in 1996. Techniques used in the lab component of this course trickled down to the sophomore/junior-level Genetics and Microbiology courses and finally to the freshman Introductory Biology course. Although simple BLAST searches using student-generated sequence data were originally used to identify genes or microbes, this year we have systematically integrated bioinformatics into each of these courses to support the laboratory activities. The key development that triggered the incorporation of bioinformatics into our curriculum was the network installation of DNAstar's Lasergene sequence analysis suite. A variety of analytical tools are available on the internet; however, these are often cumbersome, scattered across many different sites, not particularly well documented, and difficult for students to learn. Commercial sequence analysis software such as Lasergene, GCG, Vector NTI and MacVector provides a consistent interface between applications, file compatibility, long-term stability, web integration and excellent documentation.
The primary goal of this project is to encourage students to think about and to explore information flow pathways in new ways. Because students often have significant difficulty understanding things they cannot see, a major focus of the experimental component of the project is the visualization of molecular-level processes. In Introductory Biology, the cutting of a plasmid into 2 fragments by a restriction enzyme can be "seen" by the presence of 2 bands in a lane on a gel. The concept of gene regulation can be "seen" when students observe that transformant colonies glow green on media with arabinose, but not without it. The use of α-complementation and blue-white screening in the Genetics experiment helps students visualize insertion mutagenesis and how it leads to gene inactivation.
Bioinformatics serves at least two important functions in the learning process. First, the analysis of molecular sequence data to interpret and derive meaning from these strings of letters is an excellent critical thinking activity. Concepts such as splicing, the genetic code and sequence homology can be understood more thoroughly by actively analyzing real sequences than by simply studying material in a textbook. A second advantage is the application of principles and concepts learned in the classroom to answer a question. Students cannot merely regurgitate memorized facts, but instead must understand and be able to integrate different types of information. These exercises in problem solving will enhance skills necessary to succeed in any field.
Mr. Green Genes (Biology 110 Introductory Biology)
Biology 110 is the first course taken by freshmen intending to major in biology or nursing and also includes non-science majors participating in the College's Scholars Program. The course takes a bottom-up approach, beginning with molecules and basic biochemistry, continuing through an introduction to the cell and genetics, and concluding with evolution and the diversity of life. As molecular genetics is discussed during the lecture part of the course, students begin the Mr. Green Genes experiment.
This experiment is designed around the pGLO plasmid that is used in Bio-Rad's Biotechnology Explorer program. The plasmid contains an ampicillin resistance gene, a modified Aequorea victoria green fluorescent protein (GFPuv) cDNA under the control of an arabinose-inducible promoter, and the arabinose repressor gene. During the first week, students isolate plasmid DNA using the boiling/lysozyme/CTAB method, an inexpensive, rapid procedure that works well in student hands and yields high-quality DNA. The following week, students prepare competent cells, transform their DNA into E. coli, and set up restriction digests and an agarose gel. Several incubation periods required by these protocols present an excellent opportunity to demonstrate the use of the Lasergene software and review the students' bioinformatics assignment, which is to be completed before the next week's lab meeting.
Also during week 2, the use of the web browser plug-in Chime to view 3-D protein models can be demonstrated to help students relate the genes and amino acid sequences to the folded structure of the protein (Figure 1). During week 3, students analyze their DNA by agarose gel electrophoresis and collect transformation data.
The Mr. Green Genes bioinformatics assignment includes step-by-step instructions for completing the tasks described below, with a variety of critical thinking questions scattered throughout. Students begin by retrieving the pGLO plasmid sequence from the project website (http://www.lycoming.edu/~newman/models.html). The sequence is then searched to identify open reading frames (ORFs), which are subsequently translated into the amino acid sequences of potential proteins. The protein sequences are used to perform a BLAST search of the NCBI database to determine the identity of each ORF. At this point, students are encouraged to browse through the list of high-scoring sequences and to consider the significance of the similarities and differences among these proteins. The identities are then used to annotate the plasmid sequence.
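The ORF search and translation steps can be illustrated with a short script. The following is a minimal sketch in Python, not the method used by Lasergene: the function names and the `min_codons` threshold are hypothetical, the sequence is treated as linear (so an ORF spanning the plasmid origin would be missed), and only the characters A, C, G and T are assumed to be present.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
STOPS = ("TAA", "TAG", "TGA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons of seq into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_codons=3):
    """Scan all six frames for ATG...stop ORFs.

    Returns (strand, frame, start, protein) tuples; start is a 0-based
    index on the scanned strand. min_codons is an illustrative cutoff.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    protein = translate(s[start:i])
                    if len(protein) >= min_codons:
                        orfs.append((strand, frame, start, protein))
                    start = None                   # look for the next ORF
    return orfs
```

In practice a much larger `min_codons` value (for example, 100) would be needed to filter out the many short, spurious ORFs found in a plasmid-sized sequence.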

Next, the plasmid sequence is scanned for common restriction enzyme recognition sequences, which allows the students to predict the size of fragments that will be obtained by cutting the DNA with several of these enzymes in lab. Finally, all of the information is used to generate a graphic plasmid map similar to Figure 2, which they then include in their lab reports.
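The fragment-size prediction for a circular plasmid can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the function names are hypothetical, recognition sites are matched exactly, and the start of each recognition site is used in place of the true cut position. For a single enzyme this does not change the predicted sizes; for digests with several enzymes the sizes may be off by a few base pairs.

```python
def find_sites(seq, site):
    """Return 0-based start positions of a recognition site in a circular sequence."""
    seq = seq.upper()
    site = site.upper()
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    wrapped = seq + seq[:len(site) - 1]
    return [i for i in range(len(seq)) if wrapped.startswith(site, i)]


def circular_fragments(seq_len, positions):
    """Predict fragment sizes (largest first) from cut positions on a circle."""
    cuts = sorted(set(positions))
    if not cuts:
        return [seq_len]  # uncut plasmid: one circular molecule
    sizes = []
    for k, cut in enumerate(cuts):
        nxt = cuts[(k + 1) % len(cuts)]
        # A single cut wraps onto itself and yields one full-length linear fragment.
        sizes.append((nxt - cut) % seq_len or seq_len)
    return sorted(sizes, reverse=True)


# Toy 150 bp "plasmid" with two EcoRI sites (GAATTC), for illustration only.
toy = "GAATTC" + "A" * 94 + "GAATTC" + "C" * 44
eco_sites = find_sites(toy, "GAATTC")
print(circular_fragments(len(toy), eco_sites))
```

Note that a circular molecule cut at n sites gives n fragments, whereas a linear molecule would give n + 1; this is the reason a single cut produces one band on the gel.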
PCR Amplification and Cloning of a Human Clotting Factor IX Gene Fragment (Biology 222W Genetics)
The Genetics course is required of all biology majors, is a designated writing-intensive course and is usually taken during the sophomore year. The overall goal of this experiment is to amplify a small region of each student's clotting factor IX gene, insert this DNA into a plasmid vector and identify recombinant plasmids. The focus on clotting factor IX encourages students to relate the lab work to their inherent interest in genetic disorders (hemophilia B is due to a clotting factor IX deficiency), medicine, and potential applications of this work such as gene therapy. In addition, the primers were designed to work with the mouse clotting factor IX gene as well. This feature allows students to use mouse DNA as a positive control, which can subsequently be cloned if PCR with their own DNA fails. As an added benefit, we used the same primers for RT-PCR with mouse RNA in the senior-level Molecular Biology course to examine the tissue specificity of factor IX expression (liver-specific) and to clone and sequence the RT-PCR product.
The strategy used in the Genetics experiment is a general one that can be applied to clone any gene or other relatively small (<10 kbp) DNA segment whose sequence is known. The PCR primers incorporate specific restriction sites that facilitate subsequent cloning into the pBluescript vector. This strategy has become tremendously important now that many entire genomes have been sequenced and the human genome project has reached the large-scale rapid sequencing phase.
The overall timetable for this experiment is as follows:
During the restriction digestions and gel electrophoresis in week 2, there is a significant amount of free time available to demonstrate how to complete the bioinformatics component of the experiment. Students begin the assignment by retrieving human clotting factor IX gene and cDNA sequences from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/). After being guided through the annotation associated with the sequences, students construct a map of the gene to depict the minuscule amount of DNA in the gene that actually codes for protein (Figure 3).
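The arithmetic behind such a gene map can be sketched in a few lines of Python. The function name and the coordinates in the usage example are hypothetical placeholders, not the real factor IX annotation; actual exon positions would be read from the GenBank record. Note also that exons include untranslated regions, so the exonic fraction slightly overstates the protein-coding fraction.

```python
def gene_map(exons, gene_length):
    """Summarize a gene from its exon coordinates.

    exons: list of (start, end) pairs, 1-based and inclusive, as in a
    GenBank feature table. Returns (introns, exonic_bp, exonic_fraction).
    """
    exons = sorted(exons)
    # Each intron fills the gap between one exon's end and the next exon's start.
    introns = [(end1 + 1, start2 - 1)
               for (_, end1), (start2, _) in zip(exons, exons[1:])]
    exonic_bp = sum(end - start + 1 for start, end in exons)
    return introns, exonic_bp, exonic_bp / gene_length


# Hypothetical coordinates for illustration only.
introns, exonic_bp, fraction = gene_map([(1, 100), (301, 400), (901, 1000)], 1000)
print(introns, exonic_bp, fraction)
```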
Students then search the gene sequence with the primer sequences to identify the amplified fragment, select this sequence and copy it into the appropriate location within the pBluescript vector sequence. The remainder of the assignment is nearly identical to the Mr. Green Genes project described above and results in the production of a plasmid map suitable for inclusion in the required lab report. Students are also instructed to visit a student-made tutorial on the clotting factor IX protein structure (Figure 4).
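The primer search amounts to a simple "in silico PCR". The sketch below, in Python, uses hypothetical function names and a toy template rather than the actual factor IX primers, and it requires exact matches. Because the primers described above carry added restriction sites at their 5' ends, only the gene-specific 3' portion of each primer would be expected to match the genomic sequence.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template, fwd_primer, rev_primer):
    """Return the region bounded by two exactly matching primers, or None.

    The forward primer is searched on the given strand; the reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer.upper())
    end = template.find(rev_site, start + 1)
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Toy template and primers for illustration only.
toy_template = "TTTT" + "ACGTAC" + "GGGGGG" + "TTCAGA" + "CCCC"
product = amplicon(toy_template, "ACGTAC", "TCTGAA")
print(product, len(product))
```

The length of the returned product is the band size students should expect on the gel for the genomic template.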
Ribosomal RNA Gene Amplification and Sequencing to Identify Unknown Microbes (Biology 321 Microbiology)
Like Genetics, Microbiology is required of all Biology majors and is usually taken during the sophomore or junior year. Most undergraduate microbiology laboratory courses include some form of exercise in which students identify unknown microbes. In this eight-week-long investigative laboratory exercise, students sample microbes from environments of their choice, isolate a single pure culture and conduct a series of biochemical tests for comparison to data in Bergey's Manual of Determinative Bacteriology. This relatively common approach is supplemented with a more molecular strategy. A segment of the 16S rRNA gene is PCR amplified from each student's organism using primers to highly conserved regions of the gene. The PCR products are sequenced using Promega's Silver Sequencing system and compared to sequences accessible through the Ribosomal Database Project website (http://www.cme.msu.edu/RDP/html/index.html) and/or the NCBI (http://www.ncbi.nlm.nih.gov/). The use of rRNA sequencing in this traditional laboratory exercise reflects the recent paradigm shift in our understanding of microbial phylogeny.
At Lycoming College, the Microbiology lab meets twice per week for 2 hours each time. The unknown microbe identification experiment has the following timeline.
The bioinformatics component of this exercise begins with an explanation of the theory behind the design of the "Universal" 16S rRNA primers. Students are shown sequence alignments (Figure 5) and are informed of the general rules for PCR primer design. We then examine how the chosen sequences meet these predefined criteria.
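The kinds of checks involved can be sketched in Python. The specific rules used in the course are not listed here, so the following is an assumption: it locates alignment windows that are identical in every sequence (candidate "universal" primer sites) and computes two commonly cited primer properties, GC content and the Wallace-rule melting temperature estimate, Tm = 2(A+T) + 4(G+C) degrees C, which is intended for short oligonucleotides. All function names are hypothetical.

```python
def conserved_windows(aligned, width):
    """Return 0-based starts of windows identical (and gap-free) in all sequences.

    aligned: list of equal-length aligned sequences ('-' marks a gap).
    """
    length = len(aligned[0])
    identical = [len({s[i] for s in aligned}) == 1 and aligned[0][i] != "-"
                 for i in range(length)]
    return [i for i in range(length - width + 1)
            if all(identical[i:i + width])]


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Wallace-rule Tm estimate in degrees C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


# Toy alignment for illustration only.
toy_alignment = ["ACGTACGT", "ACGTTCGT", "ACGTACGA"]
print(conserved_windows(toy_alignment, 4))
```

With real 16S alignments, `width` would be set to the intended primer length, and the two primers of a pair would normally be chosen with similar Tm values.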

After the students read the sequence data for their unknown organism, they perform a BLAST search of the NCBI GenBank database to identify their organism and obtain a preliminary sequence alignment. They then recheck their gel to determine whether differences between the sequences are real or are simply reading errors. If the differences are real, students are encouraged to collect more data on the organism as an independent study research project. When the differences appear to be reading errors, students compare the biochemical data to that of the identified organism. If these results are consistent, the students can then make a positive identification.
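Listing the positions at which a student's read differs from the database match makes it easier to know where on the gel to look again. The following Python sketch assumes the two sequences have already been aligned (for example, copied from the BLAST output, with '-' marking gaps); the function name is hypothetical.

```python
def differences(query_aln, subject_aln):
    """List (query_position, query_base, subject_base) for each mismatch or gap.

    query_position counts bases in the student's read (1-based, gaps not
    counted); for a gap in the read it is the last read base before the gap.
    """
    diffs = []
    qpos = 0
    for q, s in zip(query_aln.upper(), subject_aln.upper()):
        if q != "-":
            qpos += 1
        if q != s:
            diffs.append((qpos, q, s))
    return diffs


# Toy alignment for illustration only.
toy_diffs = differences("ACGTAC-T", "ACCTACGT")
print(toy_diffs)
```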
To complete the data analysis, students must retrieve sequences from at least 10 closely related organisms, align them with their experimentally generated sequence and construct a phylogenetic tree showing the relationships between the organisms.
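To show how a tree can be derived from an alignment, the sketch below computes pairwise p-distances (the fraction of differing sites) and clusters the sequences with UPGMA. This is a deliberately simple stand-in chosen for illustration, not necessarily the algorithm used by the sequence analysis software in the course; the function names and the toy sequences are hypothetical, and the output gives the tree topology only, without branch lengths.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Cluster aligned sequences by UPGMA.

    aligned: dict mapping name -> aligned sequence (equal lengths).
    Returns a Newick-style string describing the topology.
    """
    clusters = {name: 1 for name in aligned}          # label -> number of leaves
    dist = {frozenset((a, b)): p_distance(aligned[a], aligned[b])
            for a, b in combinations(aligned, 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)                # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        na, nb = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            # Size-weighted average distance to the new cluster.
            dist[frozenset((merged, c))] = (na * dac + nb * dbc) / (na + nb)
        del dist[pair]
        clusters[merged] = na + nb
    return next(iter(clusters)) + ";"


# Toy aligned sequences for illustration only.
toy = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "TTTTAAAA", "D": "TTTTAATT"}
print(upgma(toy))
```

UPGMA assumes a constant rate of change along all lineages, so trees built with other methods from the same alignment may differ.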
The Lycoming College Molecular Biology and Bioinformatics Project is dynamic and rapidly evolving. This document reflects the state of the project as of the Fall 1999 semester. Specific experimental protocols, primer sequences, student guides/instructions and other related information, including an advanced assignment used in upper-level courses, are available at the project website.
http://www.lycoming.edu/~newman/models.html
Acknowledgements: Thanks to DNAstar for providing the Lasergene software, without which this project would not be possible. I would also like to recognize the excellent support provided by Lycoming College to build a well-equipped molecular biology program. Special thanks to the students from my research lab for assisting in the design, testing and implementation of these activities.
This page was created or last modified on 10/24/99 by Jeff Newman.
| Assistant Professor | Web page: http://lyco.lycoming.edu/~newman |
| Department of Biology | Email: newman@lycoming.edu |
| Lycoming College | Phone: 570-321-4386 |
| Williamsport PA 17701 | Fax: 570-321-4073 |
© 1998, 1999 Jeffrey D. Newman