7.91 – Lecture #3
More Multiple Sequence Alignment, and Motif Scanning, Database Searching
Michael Yaffe
[Title-slide figure: sequence logo, computed with makelogo (Schneider & Stephens, 1990)]
Outline
• Multiple Sequence Alignment - Carrillo & Lipman, Clustal(W)
• Position-Specific Scoring Matrices (PSSM)
• Information content, Shannon entropy
• Sequence logos
• Hidden Markov Models
• Other approaches: genetic algorithms, expectation maximization, MEME, Gibbs sampler
• FASTA, BLAST searching, Smith-Waterman
• PSI-BLAST
Reading - Mount p. 139-150, 152-157, 161-171, 185-198
Multiple Sequence Alignments
• Sequences are aligned so as to bring the greatest number of single characters into register.
• If we include gaps and mismatches, then even dynamic programming becomes limited to ~3 sequences unless they are very short. We need an alternative approach.
Why?
Consider the 2-sequence comparison: an O(mn) problem, i.e. order n^2 when the two sequences have similar lengths.
[Figure: dynamic programming matrix for a pairwise alignment. Columns i = 0 to 5 (Gap, V, D, S, C, Y); rows j = 0 to 6. The first row and first column are filled with multiples of the gap penalty (0, -8, -16, -24, ...), and arrows mark the traceback path through the matrix.]
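The O(mn) cost can be seen directly in a minimal sketch of the matrix fill for a global pairwise alignment. The gap penalty of -8 matches the first row and column of the slide's matrix; the match/mismatch scores (+4/-2), the function name, and the second sequence are assumptions made for illustration, since the slide does not state which substitution matrix was used.

```python
def global_alignment_matrix(a, b, match=4, mismatch=-2, gap=-8):
    """Fill a global-alignment score matrix (Needleman-Wunsch style).

    Columns index sequence a (i), rows index sequence b (j).
    match/mismatch are placeholder scores, not the lecture's matrix.
    """
    rows, cols = len(b) + 1, len(a) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing but gaps.
    for i in range(1, cols):
        score[0][i] = i * gap
    for j in range(1, rows):
        score[j][0] = j * gap

    # Every remaining cell is visited exactly once: (m+1)(n+1) cells, O(mn).
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[j][i] = max(
                score[j - 1][i - 1] + s,  # align a[i-1] with b[j-1]
                score[j - 1][i] + gap,    # gap in a
                score[j][i - 1] + gap,    # gap in b
            )
    return score


if __name__ == "__main__":
    # "VDSCY" is read off the slide's column labels; "VESLCY" is hypothetical.
    for row in global_alignment_matrix("VDSCY", "VESLCY"):
        print(row)
```

Each cell depends only on its three neighbors (up, left, diagonal), so the work is proportional to the number of cells. The optimal score is the bottom-right entry; recovering the alignment itself would additionally require storing traceback pointers.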
For 3 sequences...
    ARDFSHGLLENKLLGCDSMRWE
    GRDYKMALLEQWILGCD-MRWD
    SRDW--ALIEDCMV-CNFFRWD
An O(mnj) problem!
Consider sequences each 300 amino acids long:
2 sequences – (300)^2
3 sequences – (300)^3
but for v sequences – (300)^v
Uh oh!!! Our polynomial problem just became exponential (in the number of sequences)!
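To make the growth concrete, here is a small sketch that evaluates the slide's (300)^v estimate for a few values of v. The function name is hypothetical, and the count ignores the extra gap row and column in each dimension, as the slide does.

```python
def dp_cells(length, num_sequences):
    """Approximate number of cells in a full dynamic programming
    lattice for num_sequences sequences of the given length."""
    return length ** num_sequences


if __name__ == "__main__":
    for v in range(2, 7):
        print(f"{v} sequences: {dp_cells(300, v):,} cells")
```

Two sequences need 90,000 cells and three need 27,000,000; every additional sequence multiplies the count by another factor of 300, which is why full dynamic programming stops being practical at around three sequences.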