7. 91-Lecture #4 Michael Yaffe Database Searching Molecular Phylogenetics ABCD ABc ((A,B)c)D)
7.91 – Lecture #4 Database Searching & Molecular Phylogenetics A A B B C D D (((A,B)C)D) C Michael Yaffe
Outline FASTA, Blast searching, Smith-Waterman ·Psi- Blast Review of genomic dNA structure Substitution patterns and mutation rates Synonymous and non -synonymous substitutions · Jukes- Cantor model Kimura's Two-Parameter Model Molecular clocks Phylogenetic Trees-rooted and unrooted Distance matrix Methods Neighbor-Joining Method and Related Neighbor Methods · Maximum likelihood
Outline • FASTA, Blast searching, Smith-Waterman • Psi-Blast • Review of Genomic DNA structure • Substitution patterns and mutation rates • Synonymous and non-Synonymous substitutions • Jukes-Cantor Model • Kimura’s Two-Parameter Model • Molecular Clocks • Phylogenetic Trees – rooted and unrooted • Distance Matrix Methods • Neighbor-Joining Method and Related Neighbor Methods • Maximum Likelihood
Outline( cont) · Parsimony Branch and bound Heuristic Seaching · Consensus trees Software(PHYLIP, PAUP The tree of life Reading: Mount,p.237-280,283286,291-308
Outline (cont) • Parsimony Branch and Bound Heuristic Seaching • Consensus Trees • Software (PHYLIP, PAUP) • The Tree of Life Reading: Mount, p. 237-280, 283-286, 291-308
Database searching Problem is simple: I want to find homologues to my protein in the database How do i do it Do the obvious - compare my protein against every other protein in the database and look for local alignments by dynamic programming Uh oh! n For k sequences in the 12345678 Database 12345678. this becomes an o(mnk) 12345678 problem 12345678 12345678 12345678 essentially an o(mn) problem
Database Searching Problem is simple: I want to find homologues to my protein in the database How do I do it? Do the obvious – compare my protein against every other protein in the database and look for local alignments by dynamic programming Uh Oh! 1 n For k sequences in the 1 12345678…. Database 12345678…. this becomes an O(mnk) 12345678…. problem! 12345678…. 12345678…. m 12345678…. ….essentially an O(mn) problem
Database Searching Still, this can be done -- 50x slower than blast/FASTA, Smith-Waterman algorithm SSEARCH (tp. virginia. edu/pub/fasta)-do it locally But in the old days, needed a faster method 2 approaches- Blast, FASTA -both heuristic l.e. tried and true)-almost always finds related Proteins but cannot guarantee optimal solution FASTA: Basic Idea Search for matching sequence patterns or words Called k-tuples, which are exact matches of"k"characters between the two sequences i.e. RW= 2-tuple Seq 1: AHFYRWNKLCV Seq 2: DRWNLFCVATYWE
Database Searching Still, this can be done - ~ 50x slower than Blast/FASTA, Smith-Waterman algorithm… SSEARCH (ftp.virginia.edu/pub/fasta) – do it locally! But in the old days, needed a faster method… 2 approaches – Blast, FASTA – both heuristic (i.e. tried and true) – almost always finds related Proteins but cannot guarantee optimal solution FASTA: Basic Idea 1- Search for matching sequence patterns or words Called k-tuples, which are exact matches of “k” characters between the two sequences i.e. RW = 2-tuple Seq 1: AHFYRWNKLCV Seq 2: DRWNLFCVATYWE