
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 10: An Introduction to Bioinformatics and its Application in Protein-DNA/Protein Interactions Research and Drug Discovery

File format: PPTX, file size: 10.43 MB

Document content (about 60 pages)

Overall Methodology

GOAL: discovering approximate binding rules between transcription factors (TF) and their binding sites (TFBS) from TRANSFAC binding sequence data.

A progressive approach:
- TFBS side: use the available TFBS motifs C from the TRANSFAC DB, which are already approximate, with an ambiguity code representation (e.g. M00041: TGACGTYA). TFBS side done!
- Group the TF sequences by their motif C under different similarity thresholds TY = 0.0, 0.1, 0.3.
- TF side: approximate TF (core) motif discovery for T (instance set {t_i,j}) with a customized algorithm, given W and E. TF side done.
- Associate T ({t_i,j}) with C to obtain the approximate TF-TFBS rules Rule_k, Rule_k+1, ..., e.g. T = NRIAA, C = TGACGTYA, {t_i,j} = {NKIAA, NRIAA, NRAAA, ...}.
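The TFBS motifs C are written in the IUPAC nucleotide ambiguity code; in TGACGTYA, for instance, Y stands for C or T. The sketch below assumes the standard IUPAC table and uses hypothetical helper names (`expand`, `matches_consensus`) to show how one consensus covers several concrete binding sites. It only illustrates the notation and is not TRANSFAC's own tooling.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_consensus(site: str, consensus: str) -> bool:
    """True if the concrete DNA site is covered by the IUPAC consensus."""
    if len(site) != len(consensus):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(site.upper(), consensus.upper()))


def expand(consensus: str) -> list[str]:
    """Enumerate every concrete site described by an IUPAC consensus."""
    sites = [""]
    for code in consensus.upper():
        sites = [s + b for s in sites for b in IUPAC[code]]
    return sites


if __name__ == "__main__":
    c = "TGACGTYA"  # TRANSFAC M00041 consensus quoted on the slide
    print(expand(c))                         # ['TGACGTCA', 'TGACGTTA']
    print(matches_consensus("TGACGTCA", c))  # True
    print(matches_consensus("TGACGTGA", c))  # False
```

Every ambiguous position multiplies the number of covered sites, which is why a single consensus C can stand for a whole family of TFBS sequences.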

TF Side: Core TF Motif Discovery

- The customized algorithm
- Input: width W, (substitution) error E, and the TF sequences S
- Find W-patterns (containing at least 1 hydrophilic amino acid) and their E-approximate matches, i.e. matches with at most E substitutions
- Iteratively find the optimal match set {t_i,j} based on a Bayesian scoring function f for motif discovery, defined in terms of the abundance ratio p and the position weight matrix (PWM) Θ of the motif
- The top K = 10 motifs are output, each with its instance set {t_i,j}
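A minimal sketch of the TF-side search, under two loudly stated assumptions: the hydrophilic residue set is taken to be RKDENQHST (the slide does not list its set), and the Bayesian scoring function f is replaced by the abundance ratio p alone, here taken as the fraction of sequences in S with at least one E-approximate match. It also skips the iterative refinement of the match set, so it only approximates the customized algorithm. All function names are hypothetical.

```python
# ASSUMPTION: one common choice of hydrophilic residues; not taken from the lecture.
HYDROPHILIC = set("RKDENQHST")


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_patterns(seqs: list[str], w: int) -> set[str]:
    """All length-w substrings of seqs containing at least one hydrophilic residue."""
    return {
        s[i:i + w]
        for s in seqs
        for i in range(len(s) - w + 1)
        if HYDROPHILIC & set(s[i:i + w])
    }


def approx_matches(pattern: str, seqs: list[str], e: int) -> dict[int, list[str]]:
    """Map sequence index -> substrings within e substitutions of pattern."""
    w = len(pattern)
    hits = {}
    for k, s in enumerate(seqs):
        found = [s[i:i + w] for i in range(len(s) - w + 1)
                 if hamming(s[i:i + w], pattern) <= e]
        if found:
            hits[k] = found
    return hits


def top_k_motifs(seqs: list[str], w: int, e: int, k: int = 10):
    """Rank candidates by p = (#sequences with a match) / |S|.

    Simplified stand-in for the Bayesian scoring function f.
    Returns (p, pattern, instance set) tuples, best first.
    """
    scored = []
    for pat in candidate_patterns(seqs, w):
        hits = approx_matches(pat, seqs, e)
        p = len(hits) / len(seqs)
        instances = {t for subs in hits.values() for t in subs}
        scored.append((p, pat, instances))
    scored.sort(key=lambda x: (-x[0], x[1]))
    return scored[:k]
```

For example, with three hypothetical toy sequences built around residue windows that appear later in the lecture, `approx_matches("NRIAA", ["AAFLERNRAAAG", "GGKRMRNRIAAL", "PPNKIAAWWW"], 1)` returns `{0: ['NRAAA'], 1: ['NRIAA'], 2: ['NKIAA']}`, so NRIAA reaches p = 1.0 with W = 5 and E = 1.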

Results and Analysis: Verification on Protein Data Bank (PDB)

- PDB is the most representative database of experimentally determined protein-DNA 3D structure data. Such data are expensive and time-consuming to obtain, but they are the most accurate evidence for verification.
- Check the approximate TF-TFBS rules T({t_i,j})-C for approximate appearance in binding pairs from the PDB 3D structure data: width W, with at most E mismatches.
- TF side (R_TF): instance oriented; the instances {t_i,j} are evaluated.
- TFBS side (R_TF-TFBS): pattern oriented; the pattern C is evaluated.
- Both scores lie in [0, 1]; the higher, the better.

[Figure: protein-DNA (P-D) pairs taken from PDB data as windows of width 2W-1 centered on close residues (≤ 3.5 Å), e.g. FLERNRAAA-TAAATGACA, LERNRAAAS-CTATGTCAT and KRMRNRIAA-TCGATGACG, checked against the approximate TF-TFBS rule T = NRIAA, C = TGACGTYA with instances such as NKIAA, NRAAA and NRIAA; example scores R_TF = 0.9 and R_TF-TFBS = 0.9.]

Biological verification

- Recall the challenge: given sequence datasets of tens of TF sequences, each hundreds of amino acids (AA) long, grouped by TFBS consensus C (5-20 bp), predict W (= 5, 6) substrings ({t_i,j}) associated with C, which can be verified in actual 3D TF-TFBS binding structures as well as by homology modeling (by bio experts)!
- PDB-verified examples for the rule NRIAA(NKIAA; NRAAA; NREAA; NRIAA)-TGACGTYA.

[Figure: PDB binding structures pairing TF instances with TFBS fragments, e.g. NRIAA-TGACG, NKIAA-TGACG and NRAAA-TGACA.]

Results and Analysis: One more verified example

M00217: ERKRR(ERKRR; ERQRR; ERRRR)-CACGTG

[Figure 4. PDB verifications for rule M00217: ERKRR(ERKRR; ERQRR; ERRRR)-CACGTG for W = 5, E = 1, TY = 0.1, using ProteinWorkshop. Panels: (a) 1AN2: ERKRR-CACGT; (b) 1AN4: ERRRR-CACGT; (c) 1AN4: ERRRR-CACGT (2nd match); (d) 1NLW: ERKRR-CACGT; (e) 1NKP: ERKRR-CACGT; (f) 1NKP: ERKRR-CACGT (2nd match); (g) 1NKP: ERQRR-CACGT; (h) 1NKP: ERQRR-CACGT (2nd match); (i) 1HLO: ERKRR-CACGT; (j) 1HLO: ERKRR-CACGT (2nd match).]
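The rule notation core(instance; instance; ...)-TFBS can be checked mechanically against the parameters in the caption. The parser below assumes exactly the textual format used on these slides: an optional TRANSFAC accession such as M00217, a core motif, semicolon-separated instances, and the TFBS consensus. `parse_rule` and `consistent` are hypothetical helpers. With W = 5 and E = 1, every instance of M00217 lies within one substitution of ERKRR.

```python
import re

# ASSUMED textual format, as written on the slides:
#   [ACCESSION:] CORE(INSTANCE; INSTANCE; ...)-TFBS
RULE_RE = re.compile(
    r"(?:(?P<id>M\d+):)?(?P<core>[A-Z]+)\((?P<inst>[^)]*)\)-(?P<tfbs>[A-Z]+)"
)


def parse_rule(text: str) -> dict:
    """Split a rule string into accession, core motif T, instances and TFBS C."""
    m = RULE_RE.fullmatch(text.replace(" ", ""))
    if m is None:
        raise ValueError(f"unrecognized rule: {text!r}")
    return {
        "id": m["id"],  # None when no accession is given
        "core": m["core"],
        "instances": [t for t in m["inst"].split(";") if t],
        "tfbs": m["tfbs"],
    }


def consistent(rule: dict, w: int, e: int) -> bool:
    """Core has width w and every instance is within e substitutions of it."""
    core = rule["core"]
    return len(core) == w and all(
        len(t) == w and sum(a != b for a, b in zip(t, core)) <= e
        for t in rule["instances"]
    )


rule = parse_rule("M00217: ERKRR(ERKRR; ERQRR; ERRRR)-CACGTG")
print(rule["instances"])           # ['ERKRR', 'ERQRR', 'ERRRR']
print(consistent(rule, w=5, e=1))  # True
print(consistent(rule, w=5, e=0))  # False
```

The same check accepts the earlier rule NRIAA(NKIAA; NRAAA; NREAA; NRIAA)-TGACGTYA, whose instances also differ from the core in at most one position.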


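Documents you may be interested in

The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 11 Design of Microfluidics-Based Biochips
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 07-2 Research and Applications of Virtual Medicine Part II Virtual Reality Based Surgical Simulations
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 07-1 Research and Applications of Virtual Medicine Part I Introduction to Medical Visualization
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 06 3D computer vision techniques
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 05 Fault-Tolerant Computing
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 04 CRYPTOGRAPHY
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 03 Controlling Salinity in a Potable Water Supply System Using a Constraint Programming Approach
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 02 Game theory in computer science
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 01 A Secure Overlay Cloud Storage System with Access Control and Assured Deletion
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 08 An introduction to expander graphs (EXPANDER GRAPHS AND THEIR APPLICATIONS)
The Chinese University of Hong Kong: Theory of Computational Complexity course materials (lecture notes), Lecture 12 A glimpse of computational complexity
The Chinese University of Hong Kong: Theory of Computational Complexity course materials (lecture notes), Lecture 11 Information theoretical argument
The Chinese University of Hong Kong: CMSC5719 Seminar course materials (lecture notes), Lecture 12 Introduction to Computational Photography
Minimal Cover-Automata for Finite Languages
The Chinese University of Hong Kong: Topics in Theoretical Computer Science course materials (PPT slides), Lecture 7 Stable matching. Gale-Shapley algorithm
Introduction to Agricultural Information Technology course materials (syllabus)
Simulation and Virtual Agriculture course materials (lab guide)
Tianjin Agricultural University: Principles of Microcomputers and Assembly Language Programming course materials (lab manual)
Introduction to 3S Technology course materials (lab guide)
Introduction to 3S Technology course materials (lecture notes)
Simulation and Virtual Agriculture course materials (syllabus)
Software Designer Exam Companion Guide (4th edition), Chapter 2 Fundamentals of Programming Languages
Anhui University of Science and Technology: Algorithm Design and Analysis course materials (PPT slides), Chapter 1 Introduction and Basic Data Structures (instructors: Guo Juan, Fang Huan)
Anhui University of Science and Technology: Algorithm Design and Analysis course materials (PPT slides), Chapter 2 Recursive Algorithm Design and Analysis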
