GS559: Introduction to Statistical and Computational Genomics (Winter 2012)

   Jim Thomas
   Elhanan Borenstein

Schedule: Tues. and Thurs., 3:30-4:50, Hitchcock 220. First class Jan. 3, last class Mar. 9.



» Remember: The final exam will take place in class on Thursday, March 8 (last class of the quarter). You may use any static resource (e.g., books, notes).

» Due to snow days, the problem set schedule has changed: problem sets 3 and 4 will be merged (and shortened a bit) and will be available by Tues. Jan. 24 to give you extra time. The Python problems are MUCH harder, so don't procrastinate.


Test/Demo Files

The following files are used in some of the in-class exercises and demos.

small.fasta (these are text files despite the .fasta extension)
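
Because the file is plain text, it can be read with ordinary Python file handling. The following is a minimal sketch, not the course's own code: the function name parse_fasta and the example records are hypothetical, and it assumes the usual FASTA layout in which each record starts with a ">" header line followed by one or more sequence lines.

```python
def parse_fasta(lines):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence lines are appended to the current record.
            records[name] += line
    return records


# Hypothetical lines standing in for the contents of a FASTA text file.
example = [">seq1 demo", "ACGT", "TTGA", "", ">seq2", "GGCC"]
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

With a real file in the working directory, the same function could be called as parse_fasta(open("small.fasta")), since a file object yields its lines one at a time.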

Lectures and Reading:

# | Date | Lecture Topic | Programming Topic | Reading
1 | 01/03 | Overview of course. Introduction to sequence comparison. BLAST, alignment scoring (PDF, PP) | Introduction to Python. Interpreter, objects, types, variables, command line (PDF, PP) | [1, 2]
2 | 01/05 | Sequence alignment - dynamic programming (PDF, PP) | Strings (PDF, PP) |
3 | 01/10 | Sequence alignment (PDF, PP) | Numbers, lists, tuples (PDF, PP) |
4 | 01/12 | Sequence alignment - protein score matrices (PDF, PP) | File input-output, if-then-else (PDF, PP) |
5 | 01/17 | Sequence alignment - significance of similarity scores (PDF, PP) | For loops (PDF, PP) |
6 | 01/19 | Significance of similarity scores continued (PDF) | |
7 | 01/24 | Whole genome alignments, Sequence trees - introduction (PDF, PP) | While loops, More on loops, Programming efficiently (PDF, PP) (PDF, PP) |
8 | 01/26 | Sequence trees - distance trees (PDF, PP) | Dictionaries (hash maps) (PDF, PP) | [3]
9 | 01/31 | Parsimony (PDF, PP) | Functions (PDF, PP) |
10 | 02/02 | Small parsimony (PDF, PP) | Functions as arguments, sorting (PDF, PP) |
11 | 02/07 | Gene ontology and functional enrichment (PDF, PP) | More on functions, modules (PDF, PP) |
12 | 02/09 | Gene set enrichment analysis (PDF, PP) | Recursion (PDF, PP) | [4]
13 | 02/14 | Gene expression: Clustering (PDF, PP) | Regular expressions (PDF, PP) |
14 | 02/16 | Gene expression: K-means clustering (PDF, PP) | More regular expressions (PDF, PP) |
15 | 02/21 | Biological networks; Dijkstra algorithm (PDF, PP) | Classes and objects (PDF, PP) |
16 | 02/23 | Degree distribution and network motifs (PDF, PP) | More on classes and objects (PDF, PP) |
17 | 02/28 | Gene prediction (PDF) | Exceptions (PDF) |
18 | 03/02 | Artificial neural networks (PDF, PP) | More on classes, Biopython (PDF, PP) |
19 | 03/07 | Project (PDF, PP) | |
20 | 03/09 | Final Exam | |


Electronic access to journals is generally free from on-campus computers. For off-campus access, follow the "[offcampus]" links or look at the library "proxy server" instructions.

  1. Noble, WS, "A quick guide to organizing computational biology projects." PLoS Comput. Biol. 5 (2009) e1000424. Pmid: 19649301 [Offcampus]
  2. Dudley, JT and Butte, AJ, "A quick guide for developing effective bioinformatics programming skills." PLoS Comput. Biol. 5 (2009) e1000589. Pmid: 20041221 [Offcampus]
  3. How dictionaries work (aka hash tables or hash maps)
  4. Subramanian et al., "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles." PNAS 102(43) (2005)

Python Resources:

Regular Expressions
"RegExPal" (For Javascript rather than Python, but similar and quite handy. Try it!)
Python Books
Python for Software Design: How to Think Like a Computer Scientist by Allen B. Downey. (Includes early drafts of our textbook; cheaper than the published version, but less polished...)
Learning Python by Mark Lutz. O'Reilly (Very comprehensive. Much is accessible to beginners.)
Dive Into Python 3 by Mark Pilgrim. (Another online book. Based on Python 3, so some differences, and more advanced, but also free.)

Bioinformatics Books

» Biological sequence analysis: probabilistic models of proteins and nucleic acids, R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Cambridge. (Excellent reference; a classic.)
» Inferring Phylogenies, Joseph Felsenstein, Sinauer, 2004. (Excellent reference on this topic.)
» Introduction to Computational Genomics: A Case Studies Approach, Cristianini, Nello & Hahn, Matthew, Cambridge, 2007.
» An Introduction to Bioinformatics Algorithms, Neil C. Jones & Pavel A. Pevzner, 2004.
» Bioinformatics: Sequence and Genome Analysis, David W. Mount, Cold Spring Harbor Laboratory Press.
» Python for Bioinformatics, Sebastian Bassi, CRC Press, 2010. (A little too advanced as a programming book for beginners, but fine now that you're experienced.)
» Python for Bioinformatics, Jason Kinser, Jones and Bartlett, 2009. (Ditto.)

James H. Thomas
Department of Genome Sciences
University of Washington
Elhanan Borenstein
Departments of Genome Sciences
University of Washington