Variables Affecting Motif Finding

L = avg. sequence length
N = no. of sequences
I = info. content of motif
W = motif width

gcggaagagggcactagcccatgtgagagggcaaggacca
atctttctcttaaaaataacataattcagggccaggatgt
gtcacgagctttatcctacagatgatgaatgcaaatcagc
taaaagataatatcgaccctagcgtggcgggcaaggtgct
gtagattcgggtaccgttcataaaagtacgggaatttcgg
tatacttttaggtcgttatgttaggcgagggcaaaagtca
ctctgccgattcggcgagtgatcgaagagggcaatgcctc
aggatggggaaaatatgagaccaggggagggccacactgc
acacgtctagggctgtgaaatctctgccgggctaacagac
gtgtcgatgttgagaacgtaggcgccgaggccaacgctga
atgcaccgccattagtccggttccaagagggcaactttgt
ctgcgggcggcccagtgcgcaacgcacagggcaaggttta
tgtgttgggcggttctgaccacatgcgagggcaacctccc
gtcgcctaccctggcaattgtaaaacgacggcaatgttcg
cgtattaatgataaagaggggggtaggaggtcaactcttc
aatgcttataacataggagtagagtagtgggtaaactacg
tctgaaccttctttatgcgaagacgcgagggcaatcggga
tgcatgtctgacaacttgtccaggaggaggtcaacgactc
cgtgtcatagaattccatccgccacgcggggtaatttgga
tcccgtcaaagtgccaacttgtgccggggggctagcagct
acagcccgggaatatagacgcgtttggagtgcaaacatac
acgggaagatacgagttcgatttcaagagttcaaaacgtg
cccgataggactaataaggacgaaacgagggcgatcaatg
ttagtacaaacccgctcacccgaaaggagggcaaatacct
agcaaggttcagatatacagccaggggagacctataactc
gtccacgtgcgtatgtactaattgtggagagcaaatcatt
…
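The four variables on this slide can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: it measures information content as 2 bits minus the per-column entropy, assuming a uniform background (0.25 per base) and no small-sample correction, which the slide does not specify. The three sequences are the first three on the slide; the `sites` windows are hand-picked, hypothetical motif instances taken from those sequences, not an alignment given in the lecture.

```python
import math
from collections import Counter


def information_content(sites):
    """Total information content (bits) of equal-width aligned DNA sites.

    Assumes a uniform background (2 bits maximum per column) and applies
    no small-sample correction.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width")
    n = len(sites)
    total = 0.0
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


def dataset_variables(sequences, sites):
    """Return the slide's four variables for a set of sequences and sites."""
    return {
        "N": len(sequences),
        "L": sum(len(s) for s in sequences) / len(sequences),
        "W": len(sites[0]),
        "I": information_content(sites),
    }


# First three sequences from the slide.
sequences = [
    "gcggaagagggcactagcccatgtgagagggcaaggacca",
    "atctttctcttaaaaataacataattcagggccaggatgt",
    "gtcacgagctttatcctacagatgatgaatgcaaatcagc",
]
# Hypothetical motif instances, one window picked by eye from each sequence.
sites = ["gagggcaa", "cagggcca", "gaatgcaa"]

variables = dataset_variables(sequences, sites)
print(variables)
```

A motif that is perfectly conserved scores 2 bits per position, and a column with all four bases equally frequent scores 0, so I grows with both W and the degree of conservation.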
How is the 5'ss recognized?

U1 snRNA  3'   …CCAUUCAUAG-5'
                 ||||||
Pre-mRNA  5' …UUCGUGAGU… 3'
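The base-pairing view of 5'ss recognition can be sketched as a column-by-column comparison of the two strands. This is a minimal illustration, not a description of any published scoring method. It assumes the U1 segment CCAUUCA (written 3' to 5', as on the slide) lines up with pre-mRNA positions -1 to +6 (written 5' to 3'), and it treats G-U as a wobble pair. The function name and the symbols it emits are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_pattern(u1_3to5, pre_mrna_5to3):
    """Classify each aligned column of an antiparallel RNA duplex.

    Returns a string with '|' for a Watson-Crick pair, ':' for a G-U
    wobble pair and 'x' for no pair. The first strand is written 3'->5'
    and the second 5'->3', so columns can be compared directly.
    """
    if len(u1_3to5) != len(pre_mrna_5to3):
        raise ValueError("strands must be the same length")
    symbols = []
    for a, b in zip(u1_3to5, pre_mrna_5to3):
        if (a, b) in WATSON_CRICK:
            symbols.append("|")
        elif (a, b) in WOBBLE:
            symbols.append(":")
        else:
            symbols.append("x")
    return "".join(symbols)


U1_SEGMENT = "CCAUUCA"   # U1 snRNA, written 3'->5' (from the slide)
SPLICE_SITE = "CGUGAGU"  # pre-mRNA positions -1..+6, written 5'->3'

pattern = pairing_pattern(U1_SEGMENT, SPLICE_SITE)
print(U1_SEGMENT)
print(pattern)
print(SPLICE_SITE)
```

Under this assumed alignment, positions +1 to +6 form six consecutive pairs (five Watson-Crick and one G-U wobble at +3), while position -1 is unpaired.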
RNA Energetics I

3' …CCAUUCAUAG-5'
     ||||||
5' …CGUGAGU… 3'

Free energy of helix formation derives from:
• base pairing: G-C > A-U > G-U
• base stacking, e.g. G p A stacked on C p U

Doug Turner's Energy Rules, for the stack

5' --> 3'
   U X
   A Y
3' <-- 5'

X \ Y     A       C       G       U
A         .       .       .     -1.30
C         .       .     -2.40     .
G         .     -2.10     .     -1.00
U       -0.90     .     -1.30     .
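The stacking table on this slide can be read as a lookup from the pair (X, Y) to a free energy. The sketch below encodes only the six values shown for the 5' UX 3' / 3' AY 5' stack. It assumes the units are kcal/mol, which is how Turner's parameters are conventionally reported although the slide does not say so, and it reads the slide's last row label as U. A full nearest-neighbor calculation would need the entries for every other stack, which are not on the slide.

```python
# Free energies for the stack 5' U X 3' / 3' A Y 5', keyed by (X, Y).
# Values are from the slide; kcal/mol is assumed.
UX_AY_STACK = {
    ("A", "U"): -1.30,
    ("C", "G"): -2.40,
    ("G", "C"): -2.10,
    ("G", "U"): -1.00,
    ("U", "A"): -0.90,
    ("U", "G"): -1.30,
}


def stack_energy(x, y):
    """Return the tabulated energy for X paired with Y, or None if the
    slide lists no value (X and Y do not form a pair)."""
    return UX_AY_STACK.get((x, y))


def ranked_stacks():
    """List (X, Y) pairs from most to least stabilizing."""
    return sorted(UX_AY_STACK, key=UX_AY_STACK.get)


for x, y in ranked_stacks():
    print(f"5' U{x} 3' / 3' A{y} 5': {stack_energy(x, y):+.2f}")
```

The ranking shows that the same U-A pair contributes anywhere from -0.90 to -2.40 depending on its neighbor, which is the point of counting stacking as well as pairing.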
RNA Energetics II

A)  N p N p N p N p N p N p N
    x   |   |   |   |   x   x
    N p N p N p N p N p N p N
    Lots of consecutive base pairs - good

B)  N p N p N p N p N p N p N
    x   |   |   x   |   |   x
    N p N p N p N p N p N p N
    Internal loop - bad

C)  N p N p N p N p N p N p N
    x   |   |   |   x   |   x
    N p N p N p N p N p N p N
    Terminal base pair not stable - bad

Generally A will be more stable than B or C
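The comparison of A, B and C can be illustrated by measuring runs of consecutive base pairs in each pairing pattern. The sketch below is a crude proxy rather than an energy calculation: it encodes each diagram as a string of '|' (paired) and 'x' (unpaired) symbols read off the slide and reports the lengths of the uninterrupted paired runs. The function and variable names are hypothetical.

```python
def paired_runs(pattern):
    """Return the lengths of consecutive runs of '|' in a pairing pattern."""
    runs = []
    current = 0
    for symbol in pattern:
        if symbol == "|":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Pairing rows of the three diagrams, read left to right.
PATTERNS = {
    "A": "x||||xx",  # four consecutive base pairs
    "B": "x||x||x",  # internal loop splits the helix into 2 + 2
    "C": "x|||x|x",  # three pairs plus an isolated pair near the end
}

longest_run = {name: max(paired_runs(p)) for name, p in PATTERNS.items()}
for name, pattern in PATTERNS.items():
    print(name, pattern, paired_runs(pattern), longest_run[name])
```

All three patterns contain four base pairs, so the pair count alone cannot separate them. Only the arrangement differs, which is why A, with a single run of four, is expected to be the most stable.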
Conditional Frequencies in 5'ss Sequences

Positions: -1 +1 +2 +3 +4 +5 +6

5'ss which have G at +5 (%)

Pos   -1   +3   +4   +6
A      9   44   75   14
C      4    3    4   18
G     78   51   13   19
T      9    3    9   49

5'ss which lack G at +5 (%)

Pos   -1   +3   +4   +6
A      2   81   51   22
C      1    3   28   20
G     97   15    9   30
T      0    2   12   28

Data from Burge, 1998, "Computational Methods in Molecular Biology"
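One way to read the two tables is to subtract them and see which positions change most when G at +5 is missing. The sketch below does only that, using the values from the slide and assuming they are percentages (each column sums to roughly 100). The names `frequency_shift` and `shifts` are hypothetical.

```python
POSITIONS = ["-1", "+3", "+4", "+6"]

# Base frequencies by position, copied from the slide (assumed percent).
WITH_G5 = {
    "A": [9, 44, 75, 14],
    "C": [4, 3, 4, 18],
    "G": [78, 51, 13, 19],
    "T": [9, 3, 9, 49],
}
WITHOUT_G5 = {
    "A": [2, 81, 51, 22],
    "C": [1, 3, 28, 20],
    "G": [97, 15, 9, 30],
    "T": [0, 2, 12, 28],
}


def frequency_shift(base, position):
    """Frequency without G at +5 minus frequency with G at +5."""
    i = POSITIONS.index(position)
    return WITHOUT_G5[base][i] - WITH_G5[base][i]


# Shift for every (base, position) combination.
shifts = {
    (base, pos): frequency_shift(base, pos)
    for base in WITH_G5
    for pos in POSITIONS
}

largest = max(shifts, key=lambda key: abs(shifts[key]))
print("G at -1:", frequency_shift("G", "-1"))
print("largest shift:", largest, shifts[largest])
```

In these data, sites lacking G at +5 have G at -1 far more often (97 versus 78) and A at +3 far more often (81 versus 44), so the positions are not independent of one another.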