MIT: Foundations of Biology course teaching resources (English edition), Lecture 3: More Multiple Sequence Alignment

Outline
Multiple Sequence Alignment: Carrillo & Lipman, Clustal(W)
Position-Specific Scoring Matrices (PSSM)
Information content, Shannon entropy
Sequence logos
Hidden Markov Models
Other approaches: genetic algorithms, expectation maximization, MEME, Gibbs sampler

We can now use the PSSM to search a database for other proteins that have the BLOCK (or motif).

....G D S F H Y F V S H G....
....G D A F H Y Y I S F G....
....G D S Y H Y F L S F G....
....S D S F H Y F M S F G....
....G D S F H F F A S F G....

Problem 1: We need to think about what kind of information is contained within the PSSM. This leads to the concepts of Information Content and Entropy.

Problem 2: The PSSM must accurately represent the expected BLOCK or motif, and we have only limited amounts of data! Is it a good statistical sampling of the BLOCK/motif? Is it too narrow because of the small dataset? Should we broaden it by adding extra amino acids that we choose using some type of randomization scheme (called adding pseudocounts)? If so, how many should we add?
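To make Problem 1 concrete, here is a minimal sketch of per-position Shannon entropy and information content for the five-sequence BLOCK on this slide. It assumes a uniform background over the 20 amino acids (so the maximum is log2(20) bits) and applies no small-sample correction; the function names are illustrative, not from the lecture.

```python
import math
from collections import Counter

# The BLOCK shown on this slide, one string per sequence.
BLOCK = [
    "GDSFHYFVSHG",
    "GDAFHYYISFG",
    "GDSYHYFLSFG",
    "SDSFHYFMSFG",
    "GDSFHFFASFG",
]

def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_content(column, alphabet_size=20):
    """Bits of information: maximum entropy minus observed entropy.

    Assumption: all 20 amino acids are equally likely in the background.
    """
    return math.log2(alphabet_size) - column_entropy(column)

# zip(*BLOCK) walks the alignment column by column.
for position, column in enumerate(zip(*BLOCK), start=1):
    print(position, "".join(column),
          round(column_entropy(column), 3),
          round(information_content(column), 3))
```

A fully conserved column (such as the D at position 2) has zero entropy and therefore the maximum information content, while a column where all five residues differ (position 8) carries the least information of any column in this BLOCK.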

Finding patterns (i.e. motifs and domains) in Multiple Sequence Analysis

Block Analysis, Position Specific Scoring Matrices (PSSM)

BUILD an MSA from groups of related proteins.

BLOCKS represent a conserved region in that MSA that is LACKING IN GAPS, i.e. no insertions/deletions.

The BLOCKS are typically anywhere from 3-60 amino acids long, based on exact amino acid matches, i.e. the alignment will tolerate mismatches but doesn't use any kind of PAM or BLOSUM matrix... in fact they generate the BLOSUM matrix!

These blocks may be whole domains, short sequence motifs, key parts of enzyme active sites, etc.
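As a rough illustration of the "lacking in gaps" criterion only, the sketch below scans an aligned MSA for runs of columns in which no sequence has a gap. This is not the procedure used to build the BLOCKS database; it ignores conservation entirely, and the toy alignment, the `-` gap character, and the function name are assumptions made for the example.

```python
def ungapped_runs(msa, min_len=3, gap="-"):
    """Return half-open (start, end) column ranges that contain no gaps.

    msa is a list of equal-length aligned strings. Runs shorter than
    min_len columns are discarded.
    """
    if not msa:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all rows of an alignment must have the same length")

    runs = []
    start = None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(width + 1):
        gap_free = col < width and all(row[col] != gap for row in msa)
        if gap_free:
            if start is None:
                start = col
        elif start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    return runs

# Hypothetical toy alignment; the gap-free core is an 11-residue stretch.
toy_msa = [
    "MK-GDSFHQFVSHGA-",
    "MKTSDAFHQYISFG-L",
    "M-TGDSYWNFLSFGAL",
]

for start, end in ungapped_runs(toy_msa):
    print(start, end, [row[start:end] for row in toy_msa])
```

A real block finder would additionally require the run to be conserved; here every gap-free stretch of at least `min_len` columns is reported.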

Position Specific Scoring Matrices (PSSM)

Position: 1 2 3 4 5 ... 11

....G D S F H Q F V S H G....
....S D A F H Q Y I S F G....
....G D S Y W N F L S F G....
....S D S F H Q F M S F G....
....G D S Y W N Y A S F G....

This BLOCK might represent some small part of a modular protein domain, or might represent a motif for something like a phosphorylation site on the S in position 9.

Now build a matrix with 20 amino acids as the columns and 11 rows for the positions in the BLOCK.

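Each matrix entry is the Log(frequency of the amino acid occurrence) at that position in the BLOCK.

Position: 1 2 3 4 5 ... 11

....G D S F H Q F V S H G....
....S D A F H Q Y I S F G....
....G D S Y W N F L S F G....
....S D S F H Q F M S F G....
....G D S Y W N Y A S F G....

Matrix columns: A C D E F G H I K ... S T ...
Row 1: G = Log(3), S = Log(2)
Row 2: D = Log(5)
Rows 3, 4, 5, ...: not filled in on the slide

For example, position 1 contains G three times and S twice, giving the entries Log(3) and Log(2); position 2 is D in all five sequences, giving Log(5).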
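The matrix construction described here can be sketched in a few lines of Python. This is a minimal illustration, assuming that "frequency" means the raw count in each column (which matches the Log(3), Log(2) and Log(5) entries on the slide) and that Log is the natural logarithm, since the slide does not state a base. Residues never seen at a position are stored as `None` because log(0) is undefined.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 matrix columns

# The BLOCK shown on this slide, one string per sequence.
BLOCK = [
    "GDSFHQFVSHG",
    "SDAFHQYISFG",
    "GDSYWNFLSFG",
    "SDSFHQFMSFG",
    "GDSYWNYASFG",
]

def column_counts(block):
    """Return one Counter of residue counts per BLOCK position."""
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("a BLOCK is ungapped, so all rows must have equal length")
    return [Counter(column) for column in zip(*block)]

def log_count_matrix(block):
    """One row per position, one entry per amino acid: log(count) or None."""
    matrix = []
    for counts in column_counts(block):
        row = {
            aa: (math.log(counts[aa]) if counts[aa] > 0 else None)
            for aa in AMINO_ACIDS
        }
        matrix.append(row)
    return matrix

pssm = log_count_matrix(BLOCK)

# Print only the observed residues for each of the 11 positions.
for position, row in enumerate(pssm, start=1):
    observed = {aa: round(v, 3) for aa, v in row.items() if v is not None}
    print(position, observed)
```

The `None` entries are exactly the cells that a pseudocount scheme would fill in, which is why a matrix built from only five sequences is usually too narrow to use directly.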

We can now use the PSSM to look for the BLOCK (motif) in single proteins, or use the PSSM to search a database for other proteins that have the BLOCK (or motif).

....G D S F H Q F V S H G....
....S D A F H Q Y I S F G....
....G D S Y W N F L S F G....
....S D S F H Q F M S F G....
....G D S Y W N Y A S F G....

Problem 1: The PSSM must accurately represent the expected BLOCK or motif, and we have only limited amounts of data! Is it a good statistical sampling of the BLOCK/motif? Is it too narrow because of the small dataset? Should we broaden it by adding extra amino acids that we choose using some type of randomization scheme (called adding pseudocounts)? If so, how many should we add?

Problem 2: We need to think about what kind of information is contained within the PSSM. This leads to the concepts of Information Content and Entropy.
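One simple way to experiment with the pseudocount question is sketched below. It swaps the slide's "randomization scheme" for the plainest possible alternative: add the same constant to every amino acid count at every position. The pseudocount of 1, the uniform background of 1/20, the log-odds scoring in bits, and the query sequence are all assumptions chosen for illustration; the lecture does not prescribe them.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

BLOCK = [
    "GDSFHQFVSHG",
    "SDAFHQYISFG",
    "GDSYWNFLSFG",
    "SDSFHQFMSFG",
    "GDSYWNYASFG",
]

def log_odds_pssm(block, pseudocount=1.0, background=None):
    """Build a log2-odds PSSM with a constant pseudocount per amino acid."""
    if background is None:
        # Assumption: every amino acid is equally likely in the background.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*block):
        counts = Counter(column)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            row[aa] = math.log2(freq / background[aa])
        pssm.append(row)
    return pssm

def scan(sequence, pssm):
    """Score every window of the sequence; return a list of (start, score).

    Only the 20 standard residues are handled; other letters raise KeyError.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        hits.append((start, score))
    return hits

def best_hit(sequence, pssm):
    """Return the (start, score) of the highest-scoring window."""
    return max(scan(sequence, pssm), key=lambda hit: hit[1])

pssm = log_odds_pssm(BLOCK)
query = "MKTAYGDSFHQFLSFGKKLA"  # hypothetical protein with the motif at index 5
start, score = best_hit(query, pssm)
print(start, query[start:start + len(pssm)], round(score, 2))
```

Raising `pseudocount` flattens the matrix toward the background and makes the search more permissive; lowering it toward zero makes unobserved residues increasingly costly. Running the scan over each sequence in a database and ranking by score is the search described on the slide.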
