Steps in Multiple Alignment
Example: 4 sequences (S1, S2, S3, S4)
A. Pairwise alignment
• 6 pairwise comparisons, then cluster analysis by similarity
B. Multiple alignment following the tree from A
• Align the most similar pair
• Align the next most similar pair
• Gaps are inserted to optimize each alignment
• Align the alignments, preserving existing gaps; a new gap is added to optimize the alignment of (S2S4) with (S1S3)
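The first stage of the scheme (six pairwise comparisons, then grouping by similarity) can be sketched in Python. This is only a minimal sketch under stated assumptions: the four toy sequences are made up, `difflib.SequenceMatcher.ratio()` stands in for a real pairwise alignment score, and the greedy pairing in the hypothetical `guide_order` helper is a crude substitute for a proper cluster analysis.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy sequences (not taken from the slides).
seqs = {
    "S1": "GDSFHYFVSHG",
    "S2": "MKTAYIAKQR",
    "S3": "GDAFHYYISFG",
    "S4": "MKTAYLAKQR",
}

def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real pipeline would use an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def pairwise_scores(seqs):
    """Score every unordered pair of sequences (4 sequences give 6 pairs)."""
    return {
        (x, y): similarity(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }

def guide_order(scores):
    """Greedy pairing: take the most similar pair first, then the next
    most similar pair whose members have not been used yet."""
    used, order = set(), []
    for (x, y), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if x not in used and y not in used:
            order.append((x, y))
            used.update((x, y))
    return order

scores = pairwise_scores(seqs)
order = guide_order(scores)
print(len(scores), "pairwise comparisons")
print("alignment order:", order)
```

With these toy sequences the order comes out as (S2, S4) followed by (S1, S3); the two pair alignments would then be aligned to each other, keeping the gaps already placed.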
Progressive Alignments
• Note that the final MSA is EXTREMELY DEPENDENT on the initial pairwise sequence alignments!
• If the sequences are close in evolution and you can see the alignment: GREAT!
• If the sequences are NOT close in evolution and you CANNOT see the alignment, errors will be propagated to the final MSA.
• This has led to other approaches to MSA that are less dependent on the initial state, e.g. genetic algorithms.
Finding patterns (i.e. motifs and domains) in Multiple Sequence Analysis
Block Analysis, Position Specific Scoring Matrices (PSSM)
• BUILD an MSA from groups of related proteins.
• BLOCKS represent conserved regions in that MSA that are LACKING IN GAPS, i.e. no insertions/deletions.
• BLOCKS are typically anywhere from 3 to 60 amino acids long and are based on exact amino acid matches: the alignment tolerates mismatches but does not use any kind of PAM or BLOSUM matrix. In fact, the BLOCKS are what the BLOSUM matrices were generated from!
• A single protein contains numerous such BLOCKS, separated by stretches of intervening sequence that can differ in length and composition.
• These BLOCKS may be whole domains, short sequence motifs, key parts of enzyme active sites, etc.
• BLOCKS database: exploration so far has been limited. Lots of stuff to probe!
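The "lacking in gaps" criterion can be illustrated with a short Python sketch. It is an assumption-laden toy, not how the BLOCKS database was built: the three-row alignment is made up, `find_blocks` is a hypothetical helper, and it checks only that a run of columns is gap-free and 3 to 60 columns long, without measuring how conserved the columns are.

```python
def find_blocks(msa, min_len=3, max_len=60):
    """Return (start, end) column ranges (0-based, end exclusive) for runs
    of gap-free columns whose length is between min_len and max_len."""
    ncols = len(msa[0])
    if any(len(row) != ncols for row in msa):
        raise ValueError("all rows of the alignment must have equal length")
    blocks, start = [], None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(ncols + 1):
        gap_free = col < ncols and all(row[col] != "-" for row in msa)
        if gap_free and start is None:
            start = col                      # a gap-free run begins here
        elif not gap_free and start is not None:
            if min_len <= col - start <= max_len:
                blocks.append((start, col))  # keep runs in the 3-60 range
            start = None
    return blocks

# Hypothetical toy alignment; '-' marks a gap.
msa = [
    "MK-GDSFHYFVSHG--LT",
    "MKAGDAFHYYISFGQ-LT",
    "M--GDSYHYFLSFG-ALT",
]

for start, end in find_blocks(msa):
    print(start, end, [row[start:end] for row in msa])
```

Here the only run that qualifies is the 11-column stretch in the middle; the gap-free columns at either end are too short to count.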
Can use these conserved BLOCKS to derive a PSSM
• The dirty secret behind PROSITE! Scansite! And, in a twisted way, PSI-BLAST!
    1 2 3 4 5 . . . . . 11
....G D S F H Y F V S H G....
....G D A F H Y Y I S F G....
....G D S Y H Y F L S F G....
....S D S F H Y F M S F G....
....G D S F H F F A S F G....
Now build a matrix with the 20 amino acids as the columns and 11 rows, one for each position in the BLOCK.
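A minimal Python sketch of that 11 x 20 matrix, filled with raw counts from the five sequences in the BLOCK. The column order in `AMINO_ACIDS` and the helper name `count_matrix` are assumptions made for illustration.

```python
from collections import Counter

# One column per standard amino acid (alphabetical one-letter codes; assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The five sequences of the BLOCK from the slide.
block = [
    "GDSFHYFVSHG",
    "GDAFHYYISFG",
    "GDSYHYFLSFG",
    "SDSFHYFMSFG",
    "GDSFHFFASFG",
]

def count_matrix(block):
    """One row per BLOCK position, one column per amino acid;
    each entry is how many sequences carry that residue at that position."""
    rows = []
    for pos in range(len(block[0])):
        counts = Counter(seq[pos] for seq in block)
        rows.append([counts.get(aa, 0) for aa in AMINO_ACIDS])
    return rows

counts = count_matrix(block)
print(len(counts), "rows x", len(counts[0]), "columns")
print("position 1, G:", counts[0][AMINO_ACIDS.index("G")])
print("position 2, D:", counts[1][AMINO_ACIDS.index("D")])
```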
Each matrix entry is the log(frequency of occurrence) of the amino acid at that position in the BLOCK.
....G D S F H Y F V S H G....
....G D A F H Y Y I S F G....
....G D S Y H Y F L S F G....
....S D S F H Y F M S F G....
....G D S F H F F A S F G....
For example, G occurs in 4 of the 5 sequences at position 1, and D occurs in all 5 at position 2:
     A    C    D       E    F    G       H    I    K ....
1                                Log(4)
2              Log(5)
3
4
5
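The log entries, and how such a matrix scores a candidate segment, can be sketched in Python. This follows the slide's simple log-of-count scheme; representing residues never seen at a position as negative infinity is an assumption of this sketch, as are the helper names `log_count_pssm` and `score`. Practical PSSMs usually add pseudocounts and compare against background frequencies, which is not shown here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order

block = [
    "GDSFHYFVSHG",
    "GDAFHYYISFG",
    "GDSYHYFLSFG",
    "SDSFHYFMSFG",
    "GDSFHFFASFG",
]

def log_count_pssm(block):
    """One dict per BLOCK position mapping amino acid -> log(count).
    Residues never seen at a position get -inf (assumption of this sketch)."""
    pssm = []
    for pos in range(len(block[0])):
        counts = Counter(seq[pos] for seq in block)
        pssm.append({
            aa: math.log(counts[aa]) if counts[aa] else float("-inf")
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, segment):
    """Sum the per-position entries for a segment as long as the BLOCK."""
    if len(segment) != len(pssm):
        raise ValueError("segment must be the same length as the BLOCK")
    return sum(row[aa] for row, aa in zip(pssm, segment))

pssm = log_count_pssm(block)
print(pssm[0]["G"])                # log(4): G at position 1
print(pssm[1]["D"])                # log(5): D at position 2
print(score(pssm, "GDSFHYFVSFG"))  # a segment built from residues seen in the BLOCK
```

Sliding `score` along a longer protein, one window at a time, is the basic way such a matrix is used to search for new occurrences of the motif.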