Challenges Finding an optimal alignment of more than two sequences that includes matches mismatches, and gaps, and that takes into account the degree of variation in all of the sequences at the same time poses a very difficult challenge A second computational challenge is identifying a reasonable method of obtaining a cumulative score for the substitutions in the column ot an msa
• Finding an optimal alignment of more than two sequences that includes matches, mismatches, and gaps, and that takes into account the degree of variation in all of the sequences at the same time poses a very difficult challenge. • A second computational challenge is identifying a reasonable method of obtaining a cumulative score for the substitutions in the column of an msa. Challenges
Multiple sequence alignment is computational complex Suppose one tries to align three sequences by extending the method of aligning 2 sequences to a 3 dimensional scoring matrix Sequence 1 ■■■■■■■■■■■■ Problems . Time and space needed is length of seq raised to power of no of sequences 8c8 ? Optimal score in 3 Can do for three sequences dimensions but not more than three
Multiple sequence alignment is computational complex Suppose one tries to align three sequences by extending the method of aligning 2 sequences to a 3 dimensional scoring matrix. Sequence 1 Sequence 2 Y W W ? Optimal score in 3 dimensions Problems: •Time and space needed is length of seq. raised to power of no. of sequences •Can do for three sequences but not more than three
Alignment of three sequences by dynamic programming For three protein sequences each 300 amino acids in length and excluding gaps, the number of comparisons to be made by dynamic programming is equal to 3003=2.7 X 107, whereas only 3002=9 X104 is required for two sequences of this length (The number of steps and memory required for N M-amino-acid sequences: Mv Carrillo and Lipman(1988) found a way(the sum of pairs, sP method, the msa program)to reduce the number of comparisons that must be made without compromising the attempt to find an optimal alignment
Alignment of three sequences by dynamic programming • For three protein sequences each 300 amino acids in length and excluding gaps, the number of comparisons to be made by dynamic programming is equal to 3003 = 2.7 ×107 , whereas only 3002 = 9 ×104 is required for two sequences of this length. (The number of steps and memory required for N M-amino-acid sequences: MN) • Carrillo and Lipman (1988) found a way (the sum of pairs, SP method, the MSA program) to reduce the number of comparisons that must be made without compromising the attempt to find an optimal alignment
Basic idea of msa program sum of pairs(SP)method B A-C sequence A
Basic idea of MSA program: sum of pairs (SP) method
Thus, approximate methods are used, including (1)a progressive global alignment of the sequences starting with an alignment of the most alike sequences and then building an alignment by adding more sequences; (2)iterative methods that make an initial alignment of groups of sequences and then revise the alignment to achieve a more reasonable result (3)alignments based on locally conserved patterns found in the same order in the sequences (4)use of statistical methods and probabilistic models of the sequences
Thus, approximate methods are used, including: (1) a progressive global alignment of the sequences starting with an alignment of the most alike sequences and then building an alignment by adding more sequences; (2) iterative methods that make an initial alignment of groups of sequences and then revise the alignment to achieve a more reasonable result; (3) alignments based on locally conserved patterns found in the same order in the sequences; (4) use of statistical methods and probabilistic models of the sequences