MODBASE http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi
• 733,239 sequences & 7,120 non-redundant structures
• Fold Assignments (by PSI-BLAST)
  • Reliable fold assignments: 827,007 for 413,311 sequences
  • Average folds per sequence: 2.0
  • Average length of queries: 511 amino acids
  • Average length of folds: 229 amino acids
• Comparative Models (by MODELLER)
  • Reliable models: 547,473
  • Sequences with reliable models: 327,393 (59%)
  • Structures used as templates: 6,366 (89%)
A fold assignment is reliable if its PSI-BLAST E value is < 0.0001 OR it yields a reliable model.
A model is reliable if at least 30% of its Cα atoms superpose within 3.5 Å of their correct positions.
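The two reliability criteria on this slide can be written as simple checks. The following is a minimal sketch, not ModBase's own code. It assumes the model and a reference ("correct") structure are already superposed and paired residue by residue, and all function names are hypothetical. For a genuinely new target the correct positions are unknown, so this direct check applies only when a reference structure exists, for example when benchmarking.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

# Thresholds taken from the slide.
E_VALUE_CUTOFF = 1e-4       # PSI-BLAST E value must be strictly below this
CA_DISTANCE_CUTOFF = 3.5    # angstroms
MIN_FRACTION_WITHIN = 0.30  # minimum fraction of C-alpha atoms within the cutoff


def fraction_ca_within(model: Sequence[Coord], reference: Sequence[Coord],
                       cutoff: float = CA_DISTANCE_CUTOFF) -> float:
    """Fraction of paired C-alpha atoms no farther apart than `cutoff`.

    Hypothetical helper: assumes both lists are already superposed and
    aligned residue by residue. No superposition is performed here.
    """
    if not model or len(model) != len(reference):
        raise ValueError("need two non-empty, equal-length coordinate lists")
    close = sum(1 for a, b in zip(model, reference) if math.dist(a, b) <= cutoff)
    return close / len(model)


def is_reliable_model(fraction_within: float) -> bool:
    # Reliable when at least 30% of C-alpha atoms lie within 3.5 A.
    return fraction_within >= MIN_FRACTION_WITHIN


def is_reliable_fold_assignment(e_value: float, model_reliable: bool) -> bool:
    # Reliable when the PSI-BLAST hit is significant OR the resulting model is reliable.
    return e_value < E_VALUE_CUTOFF or model_reliable
```

In a three-residue toy example with pair distances of 1 Å, 0 Å, and 10 Å, two of the three Cα atoms fall within 3.5 Å. That gives a fraction of about 0.67, which passes the 30% threshold.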
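Example
You've just cloned a new gene from S. pombe: look it up in ModBase
• putative galactosyltransferase associated protein kinase (GenBank accession # 3006192)
[Screenshot: ModBase results for this target. The models are annotated as serine/threonine protein kinases (SP/TR-2001, PFAM), and the templates include a human cyclin-dependent kinase and a cAMP-dependent protein kinase catalytic subunit (alpha isoform).]
Pieper, Ursula, Narayanan Eswar, Ashley C. Stuart, Valentin A. Ilyin, and Andrej Sali. "MODBASE, A Database of Annotated Comparative Protein Structure Models." Nucl. Acids Res. 30 (2002): 255-259. http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi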
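Looking the gene up starts with its sequence. One way to retrieve it is to build a request URL for NCBI's E-utilities `efetch` service. This sketch assumes that 3006192 is an NCBI protein GI number that efetch can still resolve. If it is actually a nucleotide record, `db` would need to be `nuccore`. The helper names are hypothetical, and the sketch does not touch ModBase's own search interface.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(identifier: str, db: str = "protein") -> str:
    """Build an efetch URL that returns the record as plain-text FASTA (hypothetical helper)."""
    params = {"db": db, "id": identifier, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch_fasta(identifier: str, db: str = "protein") -> str:
    """Download the FASTA record. This needs network access and an existing record."""
    with urlopen(efetch_fasta_url(identifier, db), timeout=30) as response:
        return response.read().decode("utf-8")
```

`efetch_fasta_url("3006192")` produces the request URL without any network access. The FASTA text returned by `fetch_fasta` could then be pasted into the ModBase search page.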
Model of new S. pombe gene
[Figure: TARGET model shown alongside its TEMPLATE = 1HCL]
PDB ID: 1HCL
Schulze-Gahmen, U., J. Brandsen, H. D. Jones, D. O. Morgan, L. Meijer, J. Vesely, and S. H. Kim. "Multiple Modes of Ligand Recognition: Crystal Structures of Cyclin-dependent Protein Kinase 2 in Complex with ATP and Two Inhibitors, Olomoucine and Isopentenyladenine." Proteins 22 (1995): 378.
The Protein Data Bank (PDB, http://www.pdb.org/) is the single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data.
Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. "The Protein Data Bank." Nucleic Acids Research 28 (2000): 235-242.
(PDB Advisory Notice on using materials available in the archive: http://www.rcsb.org/pdb/advisory.html)
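Templates such as 1HCL are distributed as PDB coordinate files. Comparing a model with its template usually begins by extracting Cα coordinates. This minimal sketch reads the fixed-column legacy PDB format (not mmCIF). It assumes the RCSB file-server URL pattern shown in `download_pdb`, and both function names are hypothetical. The sketch does not assume which chain ID 1HCL uses, so by default it returns Cα atoms from every chain.

```python
from typing import List, Optional, Tuple
from urllib.request import urlopen

Coord = Tuple[float, float, float]


def ca_coordinates(pdb_text: str, chain: Optional[str] = None) -> List[Tuple[int, Coord]]:
    """Return (residue number, (x, y, z)) for each C-alpha ATOM record.

    Uses fixed PDB columns: atom name 13-16, altLoc 17, chain ID 22,
    residue number 23-26, x/y/z 31-54 (1-based, inclusive).
    Only ATOM records are read, so calcium ions (HETATM "CA") are ignored.
    """
    result = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        resseq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        result.append((resseq, xyz))
    return result


def download_pdb(pdb_id: str) -> str:
    """Fetch a legacy-format PDB file (assumed RCSB download URL; needs network)."""
    url = f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `ca_coordinates(download_pdb("1HCL"))` would list the template's Cα trace. That trace can then be paired with a model's Cα atoms for a distance-based comparison.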
The CASP contests
• Critical Assessment of Protein Structure Prediction
• Began in 1994 (CASP1)
• Held every two years
• Experimentalists submit sequences of target proteins whose structures are not yet public
• Predictors submit and rank blind predictions
• Assessors develop criteria to judge success
• A meeting is held to discuss the results, and a journal issue (of PROTEINS) is published to describe them
• In theory, this identifies the problem areas, and people go back and work on them for the next round of CASP
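CASP1 was held in 1994 and rounds are held every two years, so the year of round n can be computed directly. This is plain arithmetic from those two facts, not an official CASP rule.

```latex
\[
  \mathrm{year}(\text{CASP}n) = 1994 + 2(n - 1),
  \qquad
  \mathrm{year}(\text{CASP}4) = 1994 + 2 \cdot 3 = 2000 .
\]
```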
CASP4 Target T0111: an example of a CASP target
1. Protein Name: enolase
2. Organism Name: Escherichia coli
3. Number of amino acids (approx): 431
4. Accession number: P08324
5. Sequence Database: Swiss-Prot
6. Amino acid sequence:
SKIVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALEL
RDGDKSRFLGKGVTKAVAAVNGPIAQALIGKDAKDQAGIDKIMIDLDGTE
NKSKFGANAILAVSLANAKAAAAAKGMPLYEHIAELNGTPGKYSMPVPMM
NIINGGEHADNNVDIQEFMIQPVGAKTVKEAIRMGSEVFHHLAKVLKAKG
MNTAVGDEGGYAPNLGSNAEALAVIAEAVKAAGYELGKDITLAMDCAASE
FYKDGKYVLAGEGNKAFTSEEFTHFLEELTKQYPIVSIEDGLDESDWDGF
AYQTKVLGDKIQLVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTE
TLAAIKMAKDAGYTAVISHRSGETEDATIADLAVGTAAGQIKTGSMSRSD
RVAKYNQLIRIEEALGEKAPYNGRKEIKGQA
7. Additional Information: oligomerization state is a dimer in the presence of magnesium. This was observed by dynamic light scattering, by small-angle X-ray solution scattering, and in the recently solved crystal structure.
8. Homologous sequence of known structure: yes
9. Current state of the experimental work: structure solved by molecular replacement. Refinement to 2.5 Å resolution is near completion. Current Rfree 27%; R 22%.
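A target sequence like this one is normally cleaned before it goes to any prediction pipeline. The line breaks are removed and the residue codes are checked. This minimal sketch copies the T0111 sequence from the target page, rejects codes outside the 20 standard amino acids (a simplifying assumption), and reports length and composition. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict

# CASP4 target T0111 (E. coli enolase), copied from the target page.
T0111_SEQUENCE = """
SKIVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALEL
RDGDKSRFLGKGVTKAVAAVNGPIAQALIGKDAKDQAGIDKIMIDLDGTE
NKSKFGANAILAVSLANAKAAAAAKGMPLYEHIAELNGTPGKYSMPVPMM
NIINGGEHADNNVDIQEFMIQPVGAKTVKEAIRMGSEVFHHLAKVLKAKG
MNTAVGDEGGYAPNLGSNAEALAVIAEAVKAAGYELGKDITLAMDCAASE
FYKDGKYVLAGEGNKAFTSEEFTHFLEELTKQYPIVSIEDGLDESDWDGF
AYQTKVLGDKIQLVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTE
TLAAIKMAKDAGYTAVISHRSGETEDATIADLAVGTAAGQIKTGSMSRSD
RVAKYNQLIRIEEALGEKAPYNGRKEIKGQA
"""

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Join the wrapped lines and reject non-standard one-letter codes."""
    seq = "".join(raw.split()).upper()
    bad = set(seq) - STANDARD_AA
    if bad:
        raise ValueError(f"non-standard residue codes: {sorted(bad)}")
    return seq


def composition(seq: str) -> Dict[str, float]:
    """Fraction of each residue type present in the sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


seq = clean_sequence(T0111_SEQUENCE)
print(len(seq))  # 431, matching field 3 of the target description
```

The length check is a quick way to confirm that nothing was lost or duplicated when the sequence was copied from the target page.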