Bioinformatics (《生物信息学》), 2nd edition (edited by Fan Longjiang, 2021), companion slides (PPT) 3-3
3.4 Multiple sequence alignment and domain finding
• (1) Multiple sequence alignment and progressive global alignment (ClustalW)
• (2) Find and model local multiple alignment
• (3) How to evaluate the quality of a PSSM?
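[Overview diagram, labels only]
• Aligning multiple sequences
• Models: information content/entropy, profile, HMM, regular expression
• DNA/RNA: conserved functional sites/elements; databases (Rfam, Dfam, etc.)
• Protein: functional domains; databases (PROSITE, Pfam, etc.)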
(1) Multiple sequence alignment and progressive global alignment (ClustalW)
Why produce a multiple sequence alignment?
• Conserved regions/domains are likely to represent regions that are essential for structure and function: the core of proteins
• A multiple sequence alignment is a starting point for an evolutionary (phylogenetic) analysis
• Using more than two sequences results in a more convincing alignment by revealing regions that are conserved in all of the sequences
Types of multiple sequence alignment
• Global alignment, in which entire sequences are aligned at the same time using an extension of dynamic programming
• Local alignment, in which conserved local regions are either
  • derived by removing stretches of a global alignment, or
  • found by statistical methods
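As a rough illustration of the dynamic programming that global multiple alignment extends, the sketch below scores a global alignment of only two sequences (the Needleman-Wunsch recurrence). It is not taken from the slides: the function name `nw_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration. Aligning N sequences at once adds one table dimension per sequence, so the cost grows exponentially with N.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two sequences.

    Scoring values are illustrative assumptions, not values from the slides.
    Only the previous row of the DP table is kept, so memory is O(len(b)).
    """
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] aligned to a gap
                curr[j - 1] + gap,   # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("AGGCTT", "AAGCTA"))
```

The three-way maximum covers the possible ways the last alignment column of two sequences can end; with N sequences each cell has 2^N - 1 such predecessors.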
Example of local MSA
[Figure: four sequences share a central domain that aligns well; the rest of each protein, on either side of the domain, does not align well]
1 AGGCTT
2 AAGCTA
3 AGACTT
4 AAACTA
The goal is to find an identifiable common pattern, usually the longest one, with some degree of variability. No gaps are shown in this example, but they can be accommodated.
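The "common pattern with some degree of variability" in the four example sequences can be summarized column by column. The sketch below is an illustration under stated assumptions, not the method used in the slides: the helper names are hypothetical, the alphabet is assumed to be the four nucleotides, ties in the consensus are broken alphabetically, and the information content is the plain Shannon measure (log2 of the alphabet size minus the column entropy) with no pseudocounts or small-sample correction.

```python
import math
from collections import Counter

# The four ungapped sequences from the example slide.
SEQS = ["AGGCTT", "AAGCTA", "AGACTT", "AAACTA"]


def column_counts(seqs):
    """Count the residues seen in each column of an ungapped alignment."""
    return [Counter(col) for col in zip(*seqs)]


def consensus(counts):
    """Most frequent residue per column; ties broken alphabetically (assumption)."""
    return "".join(min(c, key=lambda r: (-c[r], r)) for c in counts)


def regex_pattern(counts):
    """Regular-expression style pattern listing the residues allowed per column."""
    parts = []
    for c in counts:
        residues = "".join(sorted(c))
        parts.append(residues if len(residues) == 1 else "[" + residues + "]")
    return "".join(parts)


def information_content(counts, alphabet_size=4):
    """Bits per column: log2(alphabet size) minus the column's Shannon entropy."""
    bits = []
    for c in counts:
        total = sum(c.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in c.values())
        bits.append(math.log2(alphabet_size) - entropy)
    return bits


if __name__ == "__main__":
    counts = column_counts(SEQS)
    print(consensus(counts))
    print(regex_pattern(counts))
    print(information_content(counts))
```

Fully conserved columns carry the maximum of 2 bits for a four-letter alphabet, while columns split evenly between two residues carry 1 bit, which is one way to quantify how variable each position of the pattern is.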