What is a DNA (RNA) Motif?
A pattern common to a set of DNA (RNA) sequences that share a common biological property, such as being binding sites for a regulatory protein.
Common motif adjectives: exact/precise versus degenerate; strong versus weak (good versus lousy); high information content versus low information content.
Information Theory
So we end up with Shannon's famous formula:
$$H = -\sum_{i=1}^{20} P_i \log_2 P_i$$
where H is the "Shannon entropy", in bits per position in the alignment, and $P_i$ is the frequency of amino acid $i$ at that position.
What does this mean? H is a measure of entropy, or randomness, or disorder. It tells us how much uncertainty there is in the amino acid abundances at one position in a sequence motif.
This slide courtesy of M. Yaffe.
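As a quick numerical check of the formula, the following minimal Python sketch computes H for a single alignment column. It assumes each $P_i$ is estimated as the observed residue frequency in that column, with no pseudocounts or other corrections. The name `shannon_entropy` is illustrative and does not come from the slides.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def shannon_entropy(column):
    """Shannon entropy H = -sum_i P_i log2 P_i of one alignment column, in bits.

    P_i is estimated as the observed frequency of residue i in the column
    (no pseudocounts). Residues that never occur contribute nothing. Each
    term is written as P_i * log2(1 / P_i), which equals -P_i * log2(P_i).
    """
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())

# One of each amino acid: all 20 equally likely (maximum uncertainty).
h_uniform = shannon_entropy(AMINO_ACIDS)
# A fully conserved column: only one residue is ever observed (no uncertainty).
h_conserved = shannon_entropy("DDDDD")
print(f"H(uniform) = {h_uniform:.2f} bits, H(conserved) = {h_conserved:.2f} bits")
```

The uniform case gives $\log_2 20 \approx 4.32$ bits and the conserved case gives 0 bits. These are the two extremes the formula can take for a 20-letter alphabet.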
Information Theory (courtesy of M. Yaffe)
...G D S F H Q F V S H G...
...S D A F H Q Y I S F G...
...G D S Y W N F L S F G...
...S D S F H Q F M S F G...
...L D S Y W N Y A S F G...
Assume all 20 amino acids are equally possible. A fully conserved position, such as the column that is D in every sequence, has $H_{before} = \log_2 20 = 4.32$ and $H_{after} = 0$. Therefore, this position encodes 4.32 - 0 = 4.32 bits of information!
Another position in the motif contains all 20 amino acids in equal proportions, so $H_{before} = 4.32$ and $H_{after} = 4.32$. Therefore, this position encodes 4.32 - 4.32 = 0 bits of information!
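The next sketch applies $I = H_{before} - H_{after}$ to every column of the five peptide fragments shown in the alignment. It makes three assumptions. The flanking "..." are dropped. Residue frequencies are taken directly from the five sequences, with no pseudocounts. $H_{before} = \log_2 20$. The function names are hypothetical.

```python
import math
from collections import Counter

# The five aligned fragments from the slide, with the flanking "..." removed.
ALIGNMENT = [
    "GDSFHQFVSHG",
    "SDAFHQYISFG",
    "GDSYWNFLSFG",
    "SDSFHQFMSFG",
    "LDSYWNYASFG",
]

H_BEFORE = math.log2(20)  # all 20 amino acids equally possible

def column_entropy(column):
    """H_after for one column, from observed residue frequencies (no pseudocounts)."""
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())

def column_information(alignment):
    """Information per column, I = H_before - H_after, in bits."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns (positions)
    return [H_BEFORE - column_entropy(col) for col in columns]

for pos, bits in enumerate(column_information(ALIGNMENT), start=1):
    print(f"position {pos}: {bits:.2f} bits")
```

The invariant D, S and G columns each come out at 4.32 bits. With only five sequences, $H_{after}$ can never exceed $\log_2 5 \approx 2.32$ bits, so even a highly variable column scores at least 2 bits here. This small-sample bias is why practical motif tools commonly add pseudocounts or a correction term.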
Information Content of a DNA Motif
Information at position $j$: $I_j = H_{before} - H_{after}$
Motif probabilities: $p_k$ ($k$ = A, C, G, T)
Background probabilities: $q_k = \frac{1}{4}$ ($k$ = A, C, G, T)
$$I_j = -\sum_{k=1}^{4} q_k \log_2 q_k - \left(-\sum_{k=1}^{4} p_k \log_2 p_k\right) = 2 - H_j$$
$$I_{motif} = \sum_{j=1}^{w} I_j = 2w - H_{motif}$$
This holds for a motif of width $w$ bases, where $H_{motif} = \sum_{j=1}^{w} H_j$.
Log base 2 gives entropy/information in bits.
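Here is a minimal sketch of the DNA formulas under the slide's uniform background $q_k = 1/4$. It assumes the sites are equal-length and already aligned, and it uses observed base frequencies as $p_k$ with no pseudocounts. The sites below are made-up placeholders, not real binding data. The function names are hypothetical.

```python
import math

BASES = "ACGT"
Q = {b: 0.25 for b in BASES}  # uniform background q_k = 1/4

def position_information(column):
    """I_j = H_before - H_after for one motif position, in bits.

    column: the bases observed at position j across all aligned sites.
    p_k is the observed frequency of base k (no pseudocounts).
    """
    n = len(column)
    h_before = -sum(q * math.log2(q) for q in Q.values())  # = 2 bits
    p = [column.count(b) / n for b in BASES]
    h_after = -sum(pk * math.log2(pk) for pk in p if pk > 0)  # H_j
    return h_before - h_after  # = 2 - H_j

def motif_information(sites):
    """I_motif = sum over positions j of I_j = 2w - H_motif."""
    return sum(position_information("".join(col)) for col in zip(*sites))

# Made-up, equal-length sites for illustration only.
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
w = len(SITES[0])
print(f"I_motif = {motif_information(SITES):.2f} bits out of a maximum 2w = {2 * w} bits")
```

Each fully conserved position contributes the maximum of 2 bits. A position where all four bases are equally common contributes 0. The two partially variable positions here each fall in between.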