7.91 / 7.36 / BE.490 Lecture #4 Mar. 4, 2004 Markov & Hidden Markov Models for DNA Sequence Analysis Chris Burge
Organization of Topics

Lecture   Model                              Dependence Structure   Object
3/2       Weight Matrix Model                Independence           5'ss
3/4       Hidden Markov Model                Local Dependence       3'ss
3/9       Energy Model, Covariation Model    Non-local Dependence   Anticodon
Markov & Hidden Markov Models for DNA

• Markov Models for splice sites
• Hidden Markov Models - looking under the hood
• The Viterbi Algorithm
• Real World HMMs

See Ch. 4 of Mount
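The Viterbi algorithm listed above finds the single most probable state path through an HMM by dynamic programming. A minimal sketch in log space (the two-state model, its state names, and all probabilities below are illustrative toy values, not from the lecture):

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable state path through an HMM, computed in log space
    to avoid numerical underflow on long sequences."""
    # Initialization: start probability times emission of the first symbol
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    # Recursion: extend the best path ending in each state, one symbol at a time
    for x in obs[1:]:
        prev = V[-1]
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] + log_trans[r][s])
            ptr[s] = best_prev
            col[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit[s][x]
        V.append(col)
        back.append(ptr)
    # Termination and traceback
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy two-state model (hypothetical): 'H' prefers to emit 'x', 'L' prefers 'y';
# transitions and start probabilities are symmetric.
log = math.log
states = ['H', 'L']
log_start = {s: log(0.5) for s in states}
log_trans = {s: {t: log(0.5) for t in states} for s in states}
log_emit = {'H': {'x': log(0.9), 'y': log(0.1)},
            'L': {'x': log(0.1), 'y': log(0.9)}}
path = viterbi('xxyy', states, log_start, log_trans, log_emit)
print(path)  # ['H', 'H', 'L', 'L']
```

With symmetric transitions each symbol is simply assigned to its best-emitting state; real gene-finding HMMs use asymmetric transitions, which is where the dynamic programming earns its keep.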
Review of DNA Motif Modeling & Discovery

• WMMs for splice sites
• Information Content of a Motif
• The Motif Finding/Discovery Problem
• The Gibbs Sampler
• The Gibbs Sampling Algorithm - Multimedia Experience
• Motif Modeling - Beyond Weight Matrices

See Ch. 4 of Mount
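A weight matrix model (WMM) treats motif positions as independent, so the score of a candidate site is a sum of per-position log-odds terms against a background frequency. A minimal sketch (the 3-position matrix is an illustrative toy, not a real splice-site model):

```python
import math

def wmm_score(seq, pwm, background=0.25):
    """Log-odds score of seq under a weight matrix:
    sum over positions i of log2(f_{b,i} / background),
    where f_{b,i} is the frequency of base b at position i."""
    return sum(math.log2(pwm[i][b] / background) for i, b in enumerate(seq))

# Illustrative 3-position matrix: base frequencies at each position
pwm = [
    {'A': 0.7,  'C': 0.1,  'G': 0.1,  'T': 0.1},
    {'A': 0.1,  'C': 0.1,  'G': 0.7,  'T': 0.1},
    {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25},
]
score = wmm_score('AGA', pwm)
print(score)  # 2 * log2(0.7/0.25) + 0, about 2.97 bits
```

The additivity of the score is exactly the independence assumption in the table above; models with local or non-local dependence give up this simple sum.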
Information Content of a DNA/RNA Motif

(position labels: -3 -2 -1 1 2 3 4 5 6)

$f_k$ = freq. of nt $k$ at a given position

Shannon entropy: $H(\vec{f}) = -\sum_k f_k \log_2(f_k)$  (bits)

Information/position: $I(\vec{f}) = 2 - H(\vec{f}) = 2 + \sum_k f_k \log_2(f_k) = \sum_k f_k \log_2\!\left(\frac{f_k}{1/4}\right)$  (bits)

A motif containing $m$ bits of information will occur approximately once per $2^m$ bases of random sequence.
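The per-position information formula above can be sketched directly (function name and the example frequency vectors are my own, for illustration):

```python
import math

def information_content(freqs, background=0.25):
    """Information (bits) at one motif position:
    I = 2 - H = sum_k f_k * log2(f_k / background).
    Terms with f_k = 0 contribute 0 (the limit of f*log f)."""
    return sum(f * math.log2(f / background) for f in freqs if f > 0)

# A perfectly conserved position (e.g. always G) carries the maximum 2 bits;
# a uniform position carries 0 bits.
print(information_content([0.0, 0.0, 1.0, 0.0]))   # 2.0
print(information_content([0.25, 0.25, 0.25, 0.25]))  # 0.0
```

Summing this quantity over all positions gives the total motif information $m$, and by the rule above such a motif matches roughly once per $2^m$ random bases.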