MIT: Foundations of Biology course materials (English edition), Lecture 4: Database Searching

Outline
FASTA, BLAST searching, Smith-Waterman
PSI-BLAST
Review of genomic DNA structure
Substitution patterns and mutation rates
Synonymous and non-synonymous substitutions
Jukes-Cantor model
Kimura's two-parameter model

Database Searching
FASTA: Basic Idea

2- Repeat for all possible k-tuples, i.e. CV = 2-tuple

Seq 1: AHFYRWNKLCV
Seq 2: DRWNLFCVATYWE

3- Make a hash table (hashing) that has the position of each k-tuple in each sequence, i.e.

2-tuple   pos. in Seq 1   pos. in Seq 2   Offset (pos1 - pos2)
RW        5               2               3
CV        10              7               3
AH        1               ----            ----

Database Searching

4- Look for words (k-tuples) with the same offset. These are in phase and reveal a region of alignment between the two sequences.

5- Build a local alignment based on these and extend it outwards.

Seq 1: AHFYRWNKLCV
Seq 2:    DRWNLFCVATYWE
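The hashing and offset steps can be sketched in a few lines of Python. This is a minimal illustration of the idea on the slide, not the FASTA program itself; the function names `ktuple_positions`, `shared_offsets` and `offset_counts` are made up for this example, and positions are 1-based to match the table.

```python
from collections import defaultdict


def ktuple_positions(seq, k):
    """Hash table: k-tuple -> list of 1-based start positions in seq."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i + 1)
    return table


def shared_offsets(seq1, seq2, k=2):
    """List (k-tuple, pos1, pos2, pos1 - pos2) for k-tuples found in both sequences."""
    table1 = ktuple_positions(seq1, k)
    table2 = ktuple_positions(seq2, k)
    hits = []
    for word, positions1 in table1.items():
        for pos1 in positions1:
            for pos2 in table2.get(word, []):
                hits.append((word, pos1, pos2, pos1 - pos2))
    return hits


def offset_counts(hits):
    """Count how many k-tuple hits fall on each offset (diagonal)."""
    counts = defaultdict(int)
    for _word, _pos1, _pos2, offset in hits:
        counts[offset] += 1
    return dict(counts)


seq1 = "AHFYRWNKLCV"
seq2 = "DRWNLFCVATYWE"
hits = shared_offsets(seq1, seq2)
for word, pos1, pos2, offset in hits:
    print(word, pos1, pos2, offset)
print(offset_counts(hits))
```

On the two example sequences this reports RW and CV as in the table, plus WN (positions 6 and 3), all at offset 3. An offset that collects several hits is the in-phase region that step 5 would extend into a local alignment.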

Database Searching

With hashing, the number of comparisons is proportional to the average sequence length (i.e. an O(n) problem), not an O(mn) problem as in dynamic programming.

Proteins: ktup = 1-2
Nucleotides: ktup = 4-6

One big problem: low-complexity regions.

Seq 1: AHFYPPPPPPPPFSER
Seq 2: DVATPPPPPPPPPPPNLFK
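One way to see why low-complexity regions cause trouble is to count k-tuple hits per offset. The sketch below is illustrative only; `diagonal_hits` is a hypothetical helper, and offsets here are differences of 0-based positions, which gives the same values as pos1 - pos2. A run of identical residues matches itself at many different offsets, so the hits no longer single out one diagonal.

```python
from collections import Counter, defaultdict


def diagonal_hits(seq1, seq2, k=2):
    """Count shared k-tuple hits per offset (position in seq1 minus position in seq2)."""
    # Hash every k-tuple of seq2 once, then look up each k-tuple of seq1.
    index2 = defaultdict(list)
    for j in range(len(seq2) - k + 1):
        index2[seq2[j:j + k]].append(j)
    counts = Counter()
    for i in range(len(seq1) - k + 1):
        for j in index2.get(seq1[i:i + k], []):
            counts[i - j] += 1
    return counts


clean = diagonal_hits("AHFYRWNKLCV", "DRWNLFCVATYWE")
low = diagonal_hits("AHFYPPPPPPPPFSER", "DVATPPPPPPPPPPPNLFK")

print("distinct offsets, ordinary pair:      ", len(clean))
print("distinct offsets, low-complexity pair:", len(low))
```

The ordinary pair puts all of its hits on a single offset, while the proline-rich pair spreads PP hits over many offsets, each of which looks like a candidate alignment.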

Database Searching
BLAST

Same basic idea as FASTA, but faster and more sensitive! How?

BLAST searches for common words or k-tuples, but limits the search to the k-tuples that are most significant, using the log-odds values in the BLOSUM62 amino acid substitution matrix, i.e. look for WHK and might accept WHR but not HFK as a possible match (note: 20^3 = 8000 possible 3-tuples).

Repeat for all 3-tuples in the query.

Search the database for a match to the top 50 3-tuples that match the first query position in the sequence, the second query position, etc.

Use any match to seed an ungapped alignment (old BLAST).
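The WHK example can be checked by scoring candidate words against the query word with BLOSUM62. The sketch below makes several assumptions that are not on the slide: it includes only the BLOSUM62 entries for the five residues W, H, K, R and F (so it enumerates 125 words rather than all 8000), the score threshold of 11 is chosen for illustration, and the names `pair_score`, `word_score` and `neighborhood` are invented here.

```python
from itertools import product

# Subset of BLOSUM62 covering only W, H, K, R, F (the residues in the example).
# A real search would use the full 20 x 20 matrix.
BLOSUM62_SUBSET = {
    ("W", "W"): 11, ("H", "H"): 8, ("K", "K"): 5, ("R", "R"): 5, ("F", "F"): 6,
    ("W", "H"): -2, ("W", "K"): -3, ("W", "R"): -3, ("W", "F"): 1,
    ("H", "K"): -1, ("H", "R"): 0, ("H", "F"): -1,
    ("K", "R"): 2, ("K", "F"): -3, ("R", "F"): -3,
}
ALPHABET = "WHKRF"  # restricted alphabet, an assumption of this sketch


def pair_score(a, b):
    """Look up a symmetric substitution score."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(query_word, candidate):
    """Sum of per-position substitution scores for two equal-length words."""
    return sum(pair_score(q, c) for q, c in zip(query_word, candidate))


def neighborhood(query_word, threshold=11, top=50):
    """Candidate words scoring >= threshold against query_word, best first, at most `top`."""
    scored = []
    for letters in product(ALPHABET, repeat=len(query_word)):
        word = "".join(letters)
        score = word_score(query_word, word)
        if score >= threshold:
            scored.append((word, score))
    scored.sort(key=lambda item: -item[1])
    return scored[:top]


print(word_score("WHK", "WHR"))  # W-W + H-H + K-R = 11 + 8 + 2
print(word_score("WHK", "HFK"))  # W-H + H-F + K-K = -2 - 1 + 5
print([word for word, _score in neighborhood("WHK")])
```

With these scores WHR reaches 21 and stays in the list, while HFK scores only 2 and is dropped, which matches the slide's statement that WHR might be accepted but HFK would not.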

Database Searching

Word length is fixed: 3-tuple for proteins, 11-tuple for nucleotides.

By default, BLAST filters out low-complexity regions.

Determine whether the alignment is statistically significant: BLAST calculates the probability of observing a score greater than or equal to that of your alignment, based on the extreme value distribution.

It also calculates an E-value (expectation value): the number of unrelated sequences expected to show an alignment this good just by chance. Remember, if p = 0.0001 and my database has 500,000 sequences, I will have E = 50! (The normal starting E is 10.)
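The arithmetic behind the E = 50 example is just the per-sequence probability multiplied by the database size. The short sketch below reproduces it; the function names are hypothetical, and the conversion from E back to a probability assumes chance hits are Poisson-distributed, which is an assumption added here rather than something stated on the slide.

```python
import math


def expected_chance_hits(p_value, database_size):
    """E-value: expected number of chance hits when each of database_size
    sequences independently reaches the score with probability p_value."""
    return p_value * database_size


def prob_at_least_one_hit(e_value):
    """Probability of one or more chance hits, assuming a Poisson model."""
    return 1.0 - math.exp(-e_value)


e = expected_chance_hits(0.0001, 500_000)
print("E =", e)
print("P(at least one chance hit) =", prob_at_least_one_hit(e))
```

A per-sequence probability that looks small can still produce many chance hits in a large database, which is why the E-value rather than p is used to judge a hit.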
