18.417: Introduction to Computational Molecular Biology

With the availability of genomic, expression, and structural data, math and computer science have changed the face of modern biology. This course introduces the basic algorithms used to understand genetic basis of life at a molecular level. We will focus on sequence alignment algorithms: dynamic programming, hashing, suffix trees with their applicability in various biological regimes. We then give an introduction to clustering applied to phylogeny and microarray experiments. We conclude with an introduction to probabilistic algorithms and machine learning as applied to gene annotation, motif discovery, and recombination analysis.


Instructor Ross A. Lippert
Time and Place Tuesdays and Thursdays, 1-2:30pm, room 2-139 ( Map1 Map2 floorplan )
Prerequisites 6.001 or 18.410J/6.046J or permission of instructor. No biology background is assumed. Familiarity in some reasonable programming language a must.
Textbook An Introduction to Bioinformatics Algorithms
Lecturers Ross Lippert
TAs Gopal Ramachandran
Grading 70% homework, 30% final project
Contact Email Office Office hours Phone
Ross lippert@math.mit.edu 2-335 TR4 617-253-7905
Gopal gsr@MIT.EDU ?? ?? ???

News


Schedule

Class days in red , holidays in blue , reg/add/drop dates in green

September 2005

Su Mo Tu We Th Fr Sa
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30

October 2005

Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31

November 2005

Su Mo Tu We Th Fr Sa
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30

December 2005

Su Mo Tu We Th Fr Sa
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Lectures

This is mostly the schedule. Final topics are a bit in flux.
Introduction to biology
Sep.08 The central dogma
  • The DOE primer on molecular biology
  • Section 1: Enumerative solutions: partial digest problem and median strings
    Sep.13 Partial digest problem [Ch.4]
    Sep.15 Motifs and median strings [Ch.4]
  • A review paper on regulatory motifs
  • Section 2: Dynamic programming: sequence alignments
    Sep.20 Global string alignment [Ch.6]
  • The original PAM paper
  • A classic on global alignment
  • A paper on exact multi-alignment
  • Sep.22 Local alignment [Ch.6]
  • More on alignment statistics
  • Sep.27 Spliced alignment [Ch.6]
  • A paper on the subtle problem of aligning proteins to DNA
  • Sliced alignment with SIM4
  • Sep.29 More efficient alignment [Ch.7]
  • A recent paper on sub-quadratic time alignment
  • Gene Myers on the state of alignment in 1991
  • Section 3: Graphs: sequencing genes and proteins
    Oct.04 Genomics and SBH graphs [Ch.8]
  • A nice paper on getting SBH to work
  • Oct.06 Guest lecturer: Michael Kleber, ambiguous assemblies and ALLPATHS
    Oct.13 Peptide Graphs[Ch.8]
  • Scoring scheme for MS/MS explanations based on dynamic programming
  • A more advanced treatment of the MS spectrum graph
  • Section 4: Pattern matching: exact macthes, gapless alignments, and BLAST
    Oct.18 Exact pattern matching with DFAs [Ch.9]
    Oct.20 Suffix Trees [Ch.9]
  • Early application of suffix trees
  • A review of suffix trees
  • Oct.25 Suffix Arrays and BWTs[Ch.9]
  • A nice paper on CSAs/BWTs
  • A nice k-mismatches paper which brings together these Suffix-things
  • Oct.27 BLAST [Ch.9]
  • The BLAST paper
  • BLAST statistics and PAMs
  • Introduction of gapped seed filtration
  • Section 5: Clustering: microarrays and phylogeny
    Nov.01 Clustering [Ch.10]
    Nov.03 Trees [Ch.10]
  • Neighbor Joining
  • A nice review of phylogenetic analysis
  • A very complicated paper on phylogeny
  • A review of coallescent theory in biology
  • Nov.08 Guest lecturer: Hasan Osu: on microarrays
  • The classic AML/ALL cancer classification paper
  • Finding weak signals in microarray data
  • A review article on mining microarray data
  • Nov.10 Trees continued [Ch.10]
    Section 6: Probabilistic models and machine learning: gene annotation and evolution
    Nov.15 Hidden Markov Models [Ch.11]
  • An HMM paper, focused on recognizing folds
  • Nov.17 Hidden Markov Models [Ch.11]
  • One of the GENSCAN papers
  • An HMM searching for coding RNA
  • Nov.22 Gibbs Sampling[Ch.12]
  • A seminal paper on Gibbs sampling
  • Nov.29 Random Projections [Ch.12]
  • The original paper on random projections
  • Dec.01 MCMC and Bayesian networks
  • A probabilistic method to phase haplotype data
  • Another probabilistic method to phase haplotype data
  • A bayesian network to phase and to recover recombination sites
  • Finally
    Dec.06 TBA
    Dec.08 Protein structure (guest lecturer: Bonnie Berger)
    Dec.13 Presentations of final projects

    Homework and Grading

    Problem Sets:

    Final Project:

    There will be 6 problem sets in all, having different weights, roughly in proportion to the amount of class time spent on the section.

    A link might not work if the problem set has not yet been assigned. If a link fails and the problem set has been assigned, then email me.

    Sep.27 Problem set #1
    Oct.13 Problem set #2
    Oct.20 Problem set #3
    Nov.03 Problem set #4
    Nov.15 Problem set #5
    Dec.01 Problem set #6
    (last one)

    This page will contain lectures and handouts for Fall 2005. Lecture notes from previous offerings can be found online for

    I will primarily be following the material of last year's course in OCW so the first link should be a good source of lecture notes for most of the lectures. There will be modifications of problem sets and probably some greater divergence between now and then towards the end of the term.

    The html slide-shows I give in lecture will be found in this directory .

    I believe that our text I have selected is the best one for this course. That said, there is, in my humble estimation, no great text for this course in print at this time. There will be typos and other warts, beware.

    Further Reading: Books

    Further Reading: Links

    Some links you can follow to find alternative readings.


    Ross A. Lippert, instructor

    email

    homepage