[Figure: double-stranded DNA; one strand runs 5’ to 3’, the other 3’ to 5’]
▪ Each strand carries the same amount of information, but different sets of genes.
▪ Two strands are equivalent in information content.
▪ Two strands are not equivalent in gene content.
▪ Biological processing (duplication, transcription) goes from 5’ to 3’.
▪ Finding genes on one strand at a time or on two strands at the same time: one-pass or two-pass programs.
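The two search strategies in the last bullet can be sketched as follows. This is a minimal illustration, not a gene finder: the function names and the toy motif search are assumptions introduced here. One function scans each strand separately (building the reverse complement and reading it 5’ to 3’), the other scans the given strand once while testing each window against both the motif and its reverse complement. Both report hits in forward-strand coordinates.

```python
# Hypothetical helper names; none of these come from the slides.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq, motif):
    """Return every start index of motif in seq (overlaps allowed)."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def scan_each_strand_separately(seq, motif):
    """Scan the forward strand, then the reverse complement."""
    n, k = len(seq), len(motif)
    plus = [("+", i) for i in find_all(seq, motif)]
    # Map positions on the reverse strand back to forward coordinates
    # (index of the leftmost base of the hit on the forward strand).
    minus = [("-", n - j - k) for j in find_all(reverse_complement(seq), motif)]
    return sorted(plus + minus, key=lambda h: (h[1], h[0]))


def scan_both_strands_together(seq, motif):
    """Scan once, testing each window for the motif on either strand."""
    k = len(motif)
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append(("+", i))
        if window == rc_motif:
            hits.append(("-", i))
    return sorted(hits, key=lambda h: (h[1], h[0]))


seq = "atgccat"
print(scan_each_strand_separately(seq, "atg"))
print(scan_both_strands_together(seq, "atg"))
```

Both calls return the same hits because a motif on the reverse strand appears on the forward strand as its reverse complement, which is why the two strands carry the same information but not the same genes.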
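[Figure: from genomic DNA to folded protein]
Genomic DNA → (transcribe: RNA Pol II + …) → Pre-mRNA → (splice: spliceosome, U1/U2/U4/U5/U6 snRNPs) → mRNA → (translate: ribosome, init. + elong. factors, term.) → AA seq (protein primary seq) → (fold: chaperonin) → Protein fold
Labels along the sequence, 5’ to 3’: 5’-UTR, start, stop, 3’-UTR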
Three Scales of Search
• Local: signals with minimal signature (start, stop, splicing); movable signals (caps, promoters, polyAs, branching points, some very weak). Methods: clustering, discrimination analysis, various statistical models.
• Intermediate: exons, introns, intergenic regions. Methods: Markov, semi-Markov, and hidden Markov models; intron length distribution.
• Global: optimal combination of the above. Method: dynamic programming.
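As an illustration of the global step, the sketch below uses dynamic programming to pick the best-scoring set of non-overlapping candidate exons. It is a simplified stand-in (weighted interval scheduling), not the method of any particular gene finder: the candidate tuples, their scores, and the function name are assumptions made here, and a real program would also enforce reading-frame and signal compatibility between neighbouring exons.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Choose non-overlapping candidates with the maximum total score.

    candidates: list of (start, end, score) with end exclusive.
    Returns (total_score, chain), where chain is in left-to-right order.
    """
    cands = sorted(candidates, key=lambda c: c[1])  # sort by end position
    ends = [c[1] for c in cands]
    # best[i] holds the optimum over the first i candidates.
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(cands):
        # Count earlier candidates that end at or before this start.
        j = bisect_right(ends, start, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


# Hypothetical candidate exons with made-up scores.
candidates = [(0, 10, 3.0), (5, 15, 4.0), (10, 20, 2.0), (15, 25, 3.5)]
total, chain = best_exon_chain(candidates)
print(total, chain)
```

The table `best` stores the optimum for every prefix of the sorted candidates, so each candidate is considered once and the overall cost is dominated by the sort.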
{()【(.)(.)(.)】()}
Signals:
• { transcription start (downstream of promoters)
• } transcription end (upstream of poly-A)
• 【 translation start (atg, 1/64 in a random seq.)
• 】 translation end (tag, tga, taa, 3/64)
• ( splicing donor site (minimal signal = gt, 1/16)
• ) splicing acceptor site (minimal signal = ag, 1/16)
• · branching point (very weak: …a…)
[Figure labels, left to right: transcription start, translation start, translation end, transcription end]
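The frequencies quoted for the minimal signals follow from counting k-mers, assuming a random sequence in which the four bases are independent and equally likely. The sketch below checks them by enumeration and also lists where each minimal signal occurs in a toy sequence. The names `SIGNALS`, `random_frequency`, and `scan_signals` are introduced here for illustration, and atg is assumed as the canonical start codon.

```python
from fractions import Fraction
from itertools import product

# Minimal signatures only; real signals carry more context than this.
SIGNALS = {
    "translation start": {"atg"},
    "translation end": {"tag", "tga", "taa"},
    "donor site": {"gt"},
    "acceptor site": {"ag"},
}


def random_frequency(patterns):
    """Chance that a given position starts one of the patterns,
    assuming independent, equally likely bases."""
    k = len(next(iter(patterns)))
    kmers = ["".join(p) for p in product("acgt", repeat=k)]
    matches = sum(kmer in patterns for kmer in kmers)
    return Fraction(matches, len(kmers))


def scan_signals(seq):
    """Return {signal name: [start positions]} for one strand."""
    found = {}
    for name, patterns in SIGNALS.items():
        k = len(next(iter(patterns)))
        found[name] = [i for i in range(len(seq) - k + 1)
                       if seq[i:i + k] in patterns]
    return found


for name, patterns in SIGNALS.items():
    print(name, random_frequency(patterns))
print(scan_signals("atggtaag"))
```

With hits this frequent by chance alone, the minimal signatures cannot identify genes on their own, which is why they are combined with the intermediate and global scales.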
{()【(.)(.)(.)】()}
Regions, defined by the pair of signals that flank them:
• 【( First exon
• )( Internal exon
• )】 Last exon
• {( Non-coding 5’ exon
• )【 Non-coding 5’ exon
• (.) Intron
• 】( Non-coding 3’ exon (rare)
• )} Non-coding 3’ exon (rare)
• }{ Intergenic region
[Figure labels, left to right: transcription start, translation start, translation end, transcription end]
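The table above amounts to a lookup from a pair of flanking signals to a region type. A minimal sketch, assuming the signal string uses exactly the symbols on the slide and that branch-point marks can be dropped before pairing (the names `REGION_BY_FLANKS` and `label_regions` are hypothetical):

```python
# Region type keyed by (left signal + right signal), as listed on the slide.
REGION_BY_FLANKS = {
    "【(": "first exon",
    ")(": "internal exon",
    ")】": "last exon",
    "{(": "non-coding 5' exon",
    ")【": "non-coding 5' exon",
    "()": "intron",
    "】(": "non-coding 3' exon",
    ")}": "non-coding 3' exon",
    "}{": "intergenic region",
}


def label_regions(signals):
    """Label each stretch between consecutive signals.

    Branch-point marks ('.' or '·') sit inside introns and are skipped.
    Raises KeyError for a signal pair that the table does not define.
    """
    marks = [c for c in signals if c not in ".·"]
    return [REGION_BY_FLANKS[a + b] for a, b in zip(marks, marks[1:])]


for region in label_regions("{()【(.)(.)(.)】()}"):
    print(region)
```

A pair missing from the table, such as two consecutive donor sites, raises a `KeyError`, which gives a simple check that a proposed chain of signals forms a valid gene structure.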