Many potential alignments for two sequences a: CTGTATC b: CTATAATCCC CTATAATCCC CT-ATAATCCC CTATAATCCC CTGTA-TC CTG-TA-T-C CTGTA-T--C CTATAATCCC CTATAATCCC CTATAATCCC CTGT-ATC CTGT-AT-C CTGT-AT--C
CTATAATCCC CT–ATAATCCC CTATAATCCC CTGTA–TC CTG–TA–T–C CTGTA–T– –C CTATAATCCC CTATAATCCC CTATAATCCC CTGT–ATC CTGT–AT–C CTGT–AT– –C Many potential alignments for two sequences! a: CTGTATC b: CTATAATCCC
Three key steps to answer if two sequences are related (1)The scoring system used to rank alignments, (2) The algorithm used to find optimal(or good) scoring alignments (3)The statistical methods used to evaluate the significance of an alignment score
(1)The scoring system used to rank alignments; (2)The algorithm used to find optimal (or good) scoring alignments; (3)The statistical methods used to evaluate the significance of an alignment score. Three key steps to answer if two sequences are related
What's a sequence alignment? Global Alignment the global alignment is stretched over the entire sequence length to include as many matching amino acids or nucleotides as possible up to and including the sequence ends local Alignment In a local alignment, the alignment stops at the ends ofregions of identity or strong similarity, and a much higher priority is given to finding these local regions than to extending the alignment to include more neighboring amino acid or nucleotide pairs
Global Alignment the global alignment is stretched over the entire sequence length to include as many matching amino acids or nucleotides as possible up to and including the sequence ends. local Alignment In a local alignment, the alignment stops at the ends of regions of identity or strong similarity, and a much higher priority is given to finding these local regions than to extending the alignment to include more neighboring amino acid or nucleotide pairs. What’s a sequence alignment?
Three principle methods of pair wise sequence alignment Dot matrix pair-wise sequence comparison The dynamic programming (dp)algorithm Needleman and Wunsch(1970) Smith and Waterman (1981) Word or k-tuple methods heuristic algorithms, used by the programs of FASTA and blast(See Section 3.3
Three principle methods of pairwise sequence alignment • Dot matrix pair-wise sequence comparison • The dynamic programming (DP) algorithm • Needleman and Wunsch (1970) • Smith and Waterman (1981) • Word or k-tuple methods • heuristic algorithms, used by the programs of FASTA and BLAST (See Section 3.3)
全局 Needleman- Wunsch算法 动态规划 Smith- Waterman算法 获得最优联配 局部 BLAST算法 BLAT算法 PAM 替换矩阵 BLOSUM 计分系统 两条序列联配方法及其应用 位置特异性矩阵(PSSM) 空位罚分 统计测验 极值分布 字符串索引 数据库搜索 BLAST算法 搜索算法 BLAT算法 HMMER算法