Genome 373: Genomic Informatics

Instructors:
   Elhanan Borenstein, elbo [ a t ]
   Jim Thomas, jht [ a t ]

Teaching Assistant:
   Tanya Grancharova, tgranch [ a t ]

Schedule: MWF, 10:30-11:20, Hitchcock 220.


This course introduces students to the breadth of problems and methods in computational analysis of genomes, arguably the single most important new area in biological research. Specific subjects include large-scale comparative genome structure, sequence alignment and search methods, gene prediction, evolutionary relationships among genes, and next-generation sequencing. The course includes one midterm exam and a final exam. The other graded assignments are problem sets, due most weeks. Some problems will involve computer programming, and most will involve extensive use of web resources for data set mining.


» Midterm: Monday, May 5, 2014. *** Note: you are allowed to bring a two-sided cheat sheet ***
» The class has filled up and we have a wait-list. If you are on the wait-list, we suggest that you come to the first day of class. Although we are not planning to expand the class past the current count, in our past experience several students often drop, so we may be able to accommodate a subset (or, ideally, all) of the wait-list.


Problem Set 1
Problem Set 2
Problem Set 3
Problem Set 4
Problem Set 5
Problem Set 6

Problem sets are posted online each Wednesday and are due the following Wednesday by 5 PM. Homework is a mix of written problems and programming.
Grades: 50% home assignments, 20% midterm, 30% final exam.

Test/Demo Files

The following files may be used in some of the in-class exercises and demos or in the home assignments.


Lectures and Resources:
(Note: Links to resources will become live as the course progresses)

1 (Thomas) Mar. 31, Apr. 2, 4; Intro to bioinformatics; Intro to Python; Python programming: script, print, variables, strings, lists, tuples, files; Lecture 1, Lecture 2, Lecture 3, Quiz section
2 (Borenstein) Apr. 7, 9, 11; Sequence alignment; Dynamic programming; Global alignment; Local alignment; Lecture 1, Lecture 2, Lecture 3, Quiz section
3 (Borenstein) Apr. 14, 16, 18; Score matrices; Trees; Distance trees; UPGMA; NJ; Parsimony (small and large); Lecture 1, Lecture 2, Lecture 3, Quiz section
4 (Borenstein) Apr. 21, 23, 25; Search heuristics; Microarray clustering algorithms; Hierarchical clustering; K-means clustering; Lecture 1, Lecture 2, Lecture 3, Quiz section
5 (Borenstein) Apr. 28, 30, May 2; GO annotation; Enrichment analysis; GSEA; Lecture 1, Lecture 2, Lecture 3, Quiz section
6 (Borenstein) May 5, 7, 9; Midterm; Biological networks; Dijkstra's algorithm; Intro to molecular evolution; Lecture 1, Lecture 2, Quiz section
7 (Thomas) May 12, 14, 16; Neutral evolution and mutation; Purifying selection; Web page supplement (lectures, second part of course), Quiz section
8 (Thomas) May 19, 21, 23; Phylogeny and molecular methods; Deep branches; Lineage sorting and hybridization, coalescent; Quiz section
9 (Thomas) May 26, 28, 30; Positive (Darwinian) selection; Positive selection dN/dS methods; Positive selection population methods; Quiz section
10 (Thomas) Jun. 2, 4, 6; Positive selection examples 1; Positive selection examples 2; Connecting genotype and phenotype - the challenge; Quiz section


Electronic access to journals is generally free from on-campus computers. For off-campus access, follow the "[Offcampus]" links or see the library's "proxy server" instructions.

  1. Noble, WS, "A quick guide to organizing computational biology projects." PLoS Comput. Biol. 5 (2009) e1000424. PMID: 19649301 [Offcampus]
  2. Dudley, JT and Butte, AJ, "A quick guide for developing effective bioinformatics programming skills." PLoS Comput. Biol. 5 (2009) e1000589. PMID: 20041221 [Offcampus]
  3. How dictionaries work (aka hash tables or hash maps)
  4. Subramanian et al., "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles." PNAS 102(43) (2005).

Python Resources:

Regular Expressions
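A minimal sketch of a regular expression in Python, using the standard `re` module. The DNA string and the `find_orfs` helper are made up for illustration and are not course material. The pattern looks for ATG, then zero or more whole codons, then a stop codon (TAA, TAG, or TGA); it ignores subtleties such as overlapping matches and the reverse strand.

```python
import re

# Hypothetical DNA sequence, invented for this example.
sequence = "GGATGAAACCCTAGTTATGCCCTGA"

# ATG, then zero or more whole codons (non-greedy), then a stop codon.
orf_pattern = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(seq):
    """Return (start, matched_text) for each non-overlapping match, left to right."""
    return [(m.start(), m.group()) for m in orf_pattern.finditer(seq)]


for start, text in find_orfs(sequence):
    print(start, text)
```

With the sequence above this prints `2 ATGAAACCCTAG` and `16 ATGCCCTGA`. The `*?` quantifier is non-greedy, so each match ends at the first in-frame stop codon rather than the last one.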
"RegExPal" (For Javascript rather than Python, but similar and quite handy. Try it!)
Python Books
Python for Software Design: How to Think Like a Computer Scientist by Allen B. Downey. (Includes early drafts of our textbook; cheaper than the published version, but less polished...)
Learning Python by Mark Lutz. O'Reilly (Very comprehensive. Much is accessible to beginners.)
Dive Into Python 3 by Mark Pilgrim. (Another online book. Based on Python 3, so some differences, and more advanced, but also free.)

Bioinformatics Books

» Biological sequence analysis: probabilistic models of proteins and nucleic acids, R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Cambridge. (Excellent reference; a classic.)
» Inferring Phylogenies, Joseph Felsenstein, Sinauer, 2004. (Excellent reference on this topic.)
» Introduction to Computational Genomics: A Case Studies Approach, Cristianini, Nello & Hahn, Matthew, Cambridge, 2007.
» An Introduction to Bioinformatics Algorithms, Neil C. Jones & Pavel A. Pevzner, 2004.
» Bioinformatics: Sequence and Genome Analysis, David W. Mount, Cold Spring Harbor Laboratory Press.
» Python for Bioinformatics, Sebastian Bassi, CRC Press, 2010. (A little too advanced as a programming book for beginners, but fine now that you're experienced.)
» Python for Bioinformatics, Jason Kinser, Jones and Bartlett, 2009. (Ditto.)

Elhanan Borenstein
Department of Genome Sciences
University of Washington
Jim Thomas
Department of Genome Sciences
University of Washington