26 / CHAPTER 4

[Figure 4–6 reaction scheme: phenylisothiocyanate (Edman reagent) and a peptide → a phenylthiohydantoic acid → (H+, nitromethane; H2O) a phenylthiohydantoin and a peptide shorter by one residue]

Figure 4–6. The Edman reaction. Phenylisothiocyanate derivatizes the amino-terminal residue of a peptide as a phenylthiohydantoic acid. Treatment with acid in a nonhydroxylic solvent releases a phenylthiohydantoin, which is subsequently identified by its chromatographic mobility, and a peptide one residue shorter. The process is then repeated.

[Figure 4–7 diagram: peptide Z spans the carboxyl terminal portion of peptide X and the amino terminal portion of peptide Y]

Figure 4–7. The overlapping peptide Z is used to deduce that peptides X and Y are present in the original protein in the order X → Y, not Y ← X.

ular protein, some means of identifying the correct clone (eg, knowledge of a portion of its nucleotide sequence) is essential. A hybrid approach thus has emerged. Edman sequencing is used to provide a partial amino acid sequence. Oligonucleotide primers modeled on this partial sequence can then be used to identify clones or to amplify the appropriate gene by the polymerase chain reaction (PCR) (see Chapter 40). Once an authentic DNA clone is obtained, its oligonucleotide sequence can be determined and the genetic code used to infer the primary structure of the encoded polypeptide.

The hybrid approach enhances the speed and efficiency of primary structure analysis and the range of proteins that can be sequenced. It also circumvents obstacles such as the presence of an amino-terminal blocking group or the lack of a key overlap peptide. Only a few segments of primary structure must be determined by Edman analysis.

DNA sequencing reveals the order in which amino acids are added to the nascent polypeptide chain as it is synthesized on the ribosomes. However, it provides no information about posttranslational modifications such as proteolytic processing, methylation, glycosylation, phosphorylation, hydroxylation of proline and lysine, and disulfide bond formation that accompany maturation. While Edman sequencing can detect the presence of most posttranslational events, technical limitations often prevent identification of a specific modification.

Table 4–1. Methods for cleaving polypeptides.

Method                  Bond Cleaved
CNBr                    Met-X
Trypsin                 Lys-X and Arg-X
Chymotrypsin            Hydrophobic amino acid-X
Endoproteinase Lys-C    Lys-X
Endoproteinase Arg-C    Arg-X
Endoproteinase Asp-N    X-Asp
V8 protease             Glu-X, particularly where X is hydrophobic
Hydroxylamine           Asn-Gly
o-Iodosobenzene         Trp-X
Mild acid               Asp-Pro

ch04.qxd 2/13/2003 2:02 PM Page 26
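The cleavage specificities in Table 4–1 can be illustrated with a short in silico digest. The following Python sketch is an idealization under stated assumptions: every listed bond is cut to completion, sequence-context effects are ignored, and the less sharply defined rules (chymotrypsin, V8 protease, o-iodosobenzene) are left out. The function name `cleave` and the test sequence are hypothetical and serve only as illustration.

```python
import re

# Cleavage rules transcribed from Table 4-1. Each rule is a zero-width
# regular expression that matches the position at which the chain is cut.
# Assumption: cleavage is complete and has no sequence-context exceptions.
CLEAVAGE_RULES = {
    "CNBr": r"(?<=M)",                  # Met-X: cut after Met
    "Trypsin": r"(?<=[KR])",            # Lys-X and Arg-X: cut after Lys or Arg
    "Endoproteinase Lys-C": r"(?<=K)",  # Lys-X
    "Endoproteinase Arg-C": r"(?<=R)",  # Arg-X
    "Endoproteinase Asp-N": r"(?=D)",   # X-Asp: cut before Asp
    "Hydroxylamine": r"(?<=N)(?=G)",    # Asn-Gly
    "Mild acid": r"(?<=D)(?=P)",        # Asp-Pro
}


def cleave(sequence, method):
    """Return the peptides produced by one cleavage method (one-letter codes)."""
    cuts = {m.start() for m in re.finditer(CLEAVAGE_RULES[method], sequence)}
    # Ignore cut positions at the very start or end of the chain.
    inner = sorted(c for c in cuts if 0 < c < len(sequence))
    bounds = [0] + inner + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical 10-residue test peptide.
EXAMPLE = "GAMKDPRNGW"
for method in CLEAVAGE_RULES:
    print(f"{method:22s}", cleave(EXAMPLE, method))
```

Digesting the same polypeptide with two methods of different specificity yields the overlapping peptides used in Figure 4–7 to establish the order of fragments.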
PROTEINS: DETERMINATION OF PRIMARY STRUCTURE / 27

[Figure 4–8 diagram: sample chamber S, accelerator grid A, electromagnet E, detector D]

Figure 4–8. Basic components of a simple mass spectrometer. A mixture of molecules is vaporized in an ionized state in the sample chamber S. These molecules are then accelerated down the flight tube by an electrical potential applied to accelerator grid A. An adjustable electromagnet, E, applies a magnetic field that deflects the flight of the individual ions until they strike the detector, D. The greater the mass of the ion, the higher the magnetic field required to focus it onto the detector.

MASS SPECTROMETRY DETECTS COVALENT MODIFICATIONS

Mass spectrometry, which discriminates molecules based solely on their mass, is ideal for detecting the phosphate, hydroxyl, and other groups on posttranslationally modified amino acids. Each adds a specific and readily identified increment of mass to the modified amino acid (Table 4–2). For analysis by mass spectrometry, a sample in a vacuum is vaporized under conditions where protonation can occur, imparting positive charge. An electrical field then propels the cations through a magnetic field which deflects them at a right angle to their original direction of flight and focuses them onto a detector (Figure 4–8). The magnetic force required to deflect the path of each ionic species onto the detector, measured as the current applied to the electromagnet, is recorded. For ions of identical net charge, this force is proportionate to their mass. In a time-of-flight mass spectrometer, a briefly applied electric field accelerates the ions towards a detector that records the time at which each ion arrives. For molecules of identical charge, the velocity to which they are accelerated is inversely proportional to the square root of their mass, so the time required to reach the detector increases with mass.
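The time-of-flight relationship can be made concrete with a small calculation. The sketch below assumes an idealized instrument: an ion of charge z is accelerated through a potential V, gaining kinetic energy zeV = ½mv², and then drifts at constant velocity through a field-free tube of length L. The accelerating voltage (20 kV) and tube length (1 m) are hypothetical values chosen only for illustration, as is the function name `flight_time`.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge=1, accel_volts=20_000.0, tube_length_m=1.0):
    """Drift time in seconds for an ion in an idealized time-of-flight tube.

    Assumptions (illustrative only): the ion acquires all of its kinetic
    energy, z*e*V = m*v**2 / 2, before entering a field-free drift region.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_volts / mass_kg)
    return tube_length_m / velocity


# Quadrupling the mass halves the velocity and doubles the flight time.
for mass in (1_000, 4_000, 100_000):
    print(f"{mass:>7} Da: {flight_time(mass) * 1e6:.2f} microseconds")
```

Under these assumptions the flight time scales with the square root of mass, which is why heavier ions of the same charge arrive later.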
Conventional mass spectrometers generally are used to determine the masses of molecules of 1000 Da or less, whereas time-of-flight mass spectrometers are suited for determining the large masses of proteins.

The analysis of peptides and proteins by mass spectrometry initially was hindered by difficulties in volatilizing large organic molecules. However, matrix-assisted laser-desorption (MALDI) and electrospray dispersion (eg, nanospray) permit the masses of even large polypeptides (> 100,000 Da) to be determined with extraordinary accuracy (± 1 Da). Using electrospray dispersion, peptides eluting from a reversed-phase HPLC column are introduced directly into the mass spectrometer for immediate determination of their masses.

Peptides inside the mass spectrometer are broken down into smaller units by collisions with neutral helium atoms (collision-induced dissociation), and the masses of the individual fragments are determined. Since peptide bonds are much more labile than carbon-carbon bonds, the most abundant fragments will differ from one another by units equivalent to one or two amino acids. Since, with the exception of leucine and isoleucine, the molecular mass of each amino acid is unique, the sequence of the peptide can be reconstructed from the masses of its fragments.

Tandem Mass Spectrometry

Complex peptide mixtures can now be analyzed without prior purification by tandem mass spectrometry, which employs the equivalent of two mass spectrometers linked in series. The first spectrometer separates individual peptides based upon their differences in mass. By adjusting the field strength of the first magnet, a single peptide can be directed into the second mass spectrometer, where fragments are generated and their masses determined. As the sensitivity and versatility of mass spectrometry continue to increase, it is displacing Edman sequencers for the direct analysis of protein primary structure.

Table 4–2. Mass increases resulting from common posttranslational modifications.

Modification        Mass Increase (Da)
Phosphorylation     80
Hydroxylation       16
Methylation         14
Acetylation         42
Myristylation       210
Palmitoylation      238
Glycosylation       162
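The mass increments in Table 4–2 lend themselves to a simple lookup. The sketch below compares the measured mass of a peptide with the mass predicted for its unmodified form and lists the modifications from Table 4–2 that would account for the difference. It assumes a single modification per peptide and reuses the ± 1 Da accuracy quoted above as the default tolerance. The function name `candidate_modifications` and the example masses are hypothetical.

```python
# Nominal mass increases (Da) transcribed from Table 4-2.
MASS_INCREASE_DA = {
    "Phosphorylation": 80,
    "Hydroxylation": 16,
    "Methylation": 14,
    "Acetylation": 42,
    "Myristylation": 210,
    "Palmitoylation": 238,
    "Glycosylation": 162,
}


def candidate_modifications(observed_mass, unmodified_mass, tolerance=1.0):
    """List the Table 4-2 modifications consistent with a measured mass shift.

    Assumption: the peptide carries at most one modification, so the shift
    is compared with each single increment rather than with combinations.
    """
    shift = observed_mass - unmodified_mass
    return [
        name
        for name, increase in MASS_INCREASE_DA.items()
        if abs(shift - increase) <= tolerance
    ]


# Hypothetical peptide predicted at 1000.0 Da but measured at 1080.0 Da.
print(candidate_modifications(1080.0, 1000.0))
```

A shift that matches no single entry returns an empty list, which in practice would prompt a search for multiple modifications or for modifications not listed in the table.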
GENOMICS ENABLES PROTEINS TO BE IDENTIFIED FROM SMALL AMOUNTS OF SEQUENCE DATA

Primary structure analysis has been revolutionized by genomics, the application of automated oligonucleotide sequencing and computerized data retrieval and analysis to sequence an organism’s entire genetic complement. The first genome sequenced was that of Haemophilus influenzae, in 1995. By mid-2001, the complete genome sequences for over 50 organisms had been determined. These include the human genome and those of several bacterial pathogens; the results and significance of the Human Genome Project are discussed in Chapter 54. Where genome sequence is known, the task of determining a protein’s DNA-derived primary sequence is materially simplified. In essence, the second half of the hybrid approach has already been completed. All that remains is to acquire sufficient information to permit the open reading frame (ORF) that encodes the protein to be retrieved from an Internet-accessible genome database and identified. In some cases, a segment of amino acid sequence only four or five residues in length may be sufficient to identify the correct ORF.

Computerized search algorithms assist the identification of the gene encoding a given protein and clarify uncertainties that arise from Edman sequencing and mass spectrometry. By exploiting computers to solve complex puzzles, the spectrum of information suitable for identification of the ORF that encodes a particular polypeptide is greatly expanded. In peptide mass profiling, for example, a peptide digest is introduced into the mass spectrometer and the sizes of the peptides are determined. A computer is then used to find an ORF whose predicted protein product would, if broken down into peptides by the cleavage method selected, produce a set of peptides whose masses match those observed by mass spectrometry.
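Peptide mass profiling can be sketched in a few lines of Python. The example below is a toy illustration, not the algorithm of any particular search engine: it digests each candidate ORF product with an idealized trypsin rule (cut after every Lys or Arg), computes peptide masses from standard monoisotopic residue masses (values supplied here as an assumption, not taken from the text), and ranks candidates by the number of observed masses they explain. The ORF names, sequences, tolerance, and function names are all hypothetical.

```python
import re

# Standard monoisotopic residue masses in Da (assumed values, not from the text).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per free peptide (N-terminal H plus C-terminal OH)

# Hypothetical database of predicted ORF products.
EXAMPLE_ORFS = {
    "orf_a": "MSTAYIAKQRNLSWVK",
    "orf_b": "MGSSHHHHHHSSGLVPR",
    "orf_c": "MADEEKLPPGWEKR",
}


def tryptic_peptides(protein):
    """Idealized tryptic digest: cut after every Lys (K) or Arg (R)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def matches(observed_masses, protein, tolerance=0.5):
    """Count observed masses explained by the predicted digest of a protein."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(obs - pred) <= tolerance for pred in predicted)
        for obs in observed_masses
    )


def best_orf(observed_masses, orf_products, tolerance=0.5):
    """Return the name of the ORF whose digest explains the most masses."""
    return max(
        orf_products,
        key=lambda name: matches(observed_masses, orf_products[name], tolerance),
    )


# Simulate a measurement: digest orf_c and add a small mass error.
observed = [peptide_mass(p) + 0.1 for p in tryptic_peptides(EXAMPLE_ORFS["orf_c"])]
print(best_orf(observed, EXAMPLE_ORFS))
```

Counting matched masses is the simplest possible score. It illustrates why the choice of cleavage method matters, since the predicted peptide set, and therefore the match, depends on where the chain is cut.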
PROTEOMICS & THE PROTEOME

The Goal of Proteomics Is to Identify the Entire Complement of Proteins Elaborated by a Cell Under Diverse Conditions

While the sequence of the human genome is known, the picture provided by genomics alone is both static and incomplete. Proteomics aims to identify the entire complement of proteins elaborated by a cell under diverse conditions. As genes are switched on and off, proteins are synthesized in particular cell types at specific times of growth or differentiation and in response to external stimuli. Muscle cells express proteins not expressed by neural cells, and the types of subunits present in the hemoglobin tetramer change pre- and postpartum. Many proteins undergo posttranslational modifications during maturation into functionally competent forms or as a means of regulating their properties. Knowledge of the human genome therefore represents only the beginning of the task of describing living organisms in molecular detail and understanding the dynamics of processes such as growth, aging, and disease. As the human body contains thousands of cell types, each containing thousands of proteins, the proteome (the set of all the proteins expressed by an individual cell at a particular time) represents a moving target of formidable dimensions.

Two-Dimensional Electrophoresis & Gene Array Chips Are Used to Survey Protein Expression

One goal of proteomics is the identification of proteins whose levels of expression correlate with medically significant events. The presumption is that proteins whose appearance or disappearance is associated with a specific physiologic condition or disease will provide insights into root causes and mechanisms. Determination of the proteomes characteristic of each cell type requires the utmost efficiency in the isolation and identification of individual proteins. The contemporary approach utilizes robotic automation to speed sample preparation and large two-dimensional gels to resolve cellular proteins. Individual polypeptides are then extracted and analyzed by Edman sequencing or mass spectrometry. While only about 1000 proteins can be resolved on a single gel, two-dimensional electrophoresis has a major advantage in that it examines the proteins themselves. An alternative and complementary approach employs gene arrays, sometimes called DNA chips, to detect the expression of the mRNAs which encode proteins. While changes in the expression of the mRNA encoding a protein do not necessarily reflect comparable changes in the level of the corresponding protein, gene arrays are more sensitive probes than two-dimensional gels and thus can examine more gene products.

Bioinformatics Assists Identification of Protein Functions

The functions of a large proportion of the proteins encoded by the human genome are presently unknown. Recent advances in bioinformatics permit researchers to compare amino acid sequences to discover clues to potential properties, physiologic roles, and mechanisms of action of proteins. Algorithms exploit the tendency of nature to employ variations of a structural theme to perform similar functions in several proteins (eg, the Rossmann nucleotide binding fold to bind NAD(P)H,
nuclear targeting sequences, and EF hands to bind Ca²⁺). These domains generally are detected in the primary structure by conservation of particular amino acids at key positions. Insights into the properties and physiologic role of a newly discovered protein thus may be inferred by comparing its primary structure with that of known proteins.

SUMMARY

• Long amino acid polymers or polypeptides constitute the basic structural unit of proteins, and the structure of a protein provides insight into how it fulfills its functions.
• The Edman reaction enabled amino acid sequence analysis to be automated. Mass spectrometry provides a sensitive and versatile tool for determining primary structure and for the identification of posttranslational modifications.
• DNA cloning and molecular biology coupled with protein chemistry provide a hybrid approach that greatly increases the speed and efficiency for determination of primary structures of proteins.
• Genomics, the analysis of the entire oligonucleotide sequence of an organism’s complete genetic material, has provided further enhancements.
• Computer algorithms facilitate identification of the open reading frames that encode a given protein by using partial sequences and peptide mass profiling to search sequence databases.
• Scientists are now trying to determine the primary sequence and functional role of every protein expressed in a living cell, known as its proteome.
• A major goal is the identification of proteins whose appearance or disappearance correlates with physiologic phenomena, aging, or specific diseases.
CHAPTER 5
Proteins: Higher Orders of Structure
Victor W. Rodwell, PhD, & Peter J. Kennelly, PhD

BIOMEDICAL IMPORTANCE

Proteins catalyze metabolic reactions, power cellular motion, and form macromolecular rods and cables that provide structural integrity to hair, bones, tendons, and teeth. In nature, form follows function. The structural variety of human proteins therefore reflects the sophistication and diversity of their biologic roles. Maturation of a newly synthesized polypeptide into a biologically functional protein requires that it be folded into a specific three-dimensional arrangement, or conformation. During maturation, posttranslational modifications may add new chemical groups or remove transiently needed peptide segments. Genetic or nutritional deficiencies that impede protein maturation are deleterious to health. Examples of the former include Creutzfeldt-Jakob disease, scrapie, Alzheimer's disease, and bovine spongiform encephalopathy (mad cow disease). Scurvy represents a nutritional deficiency that impairs protein maturation.

CONFORMATION VERSUS CONFIGURATION

The terms configuration and conformation are often confused. Configuration refers to the geometric relationship between a given set of atoms, for example, those that distinguish L- from D-amino acids. Interconversion of configurational alternatives requires breaking covalent bonds. Conformation refers to the spatial relationship of every atom in a molecule. Interconversion between conformers occurs without covalent bond rupture, with retention of configuration, and typically via rotation about single bonds.

PROTEINS WERE INITIALLY CLASSIFIED BY THEIR GROSS CHARACTERISTICS

Scientists initially approached structure-function relationships in proteins by separating them into classes based upon properties such as solubility, shape, or the presence of nonprotein groups. For example, the proteins that can be extracted from cells using solutions at physiologic pH and ionic strength are classified as soluble.
Extraction of integral membrane proteins requires dissolution of the membrane with detergents.

Globular proteins are compact, are roughly spherical or ovoid in shape, and have axial ratios (the ratio of their longest to shortest dimensions) of not over 3. Most enzymes are globular proteins, whose large internal volume provides ample space in which to construct cavities of the specific shape, charge, and hydrophobicity or hydrophilicity required to bind substrates and promote catalysis. By contrast, many structural proteins adopt highly extended conformations. These fibrous proteins possess axial ratios of 10 or more.

Lipoproteins and glycoproteins contain covalently bound lipid and carbohydrate, respectively. Myoglobin, hemoglobin, cytochromes, and many other proteins contain tightly associated metal ions and are termed metalloproteins. With the development and application of techniques for determining the amino acid sequences of proteins (Chapter 4), more precise classification schemes have emerged based upon similarity, or homology, in amino acid sequence and structure. However, many early classification terms remain in common use.

PROTEINS ARE CONSTRUCTED USING MODULAR PRINCIPLES

Proteins perform complex physical and catalytic functions by positioning specific chemical groups in a precise three-dimensional arrangement. The polypeptide scaffold containing these groups must adopt a conformation that is both functionally efficient and physically strong. At first glance, the biosynthesis of polypeptides composed of tens of thousands of individual atoms would appear to be extremely challenging. When one considers that a typical polypeptide can adopt ≥ 10⁵⁰ distinct conformations, folding into the conformation appropriate to its biologic function would appear to be even more difficult.
As described in Chapters 3 and 4, synthesis of the polypeptide backbones of proteins employs a small set of common building blocks or modules, the amino acids, joined by a common linkage, the peptide bond. A stepwise modular pathway simplifies the folding and processing of newly synthesized polypeptides into mature proteins.
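
The very large number of conformations mentioned above can be made concrete with a rough counting argument. The sketch below is illustrative only: it assumes that each residue independently adopts one of a small, fixed number of backbone states, which is a simplification and not a model given in this chapter. The function name, the chain length of 100 residues, and the values of 3 and 10 states per residue are all hypothetical choices made for the example.

```python
import math


def log10_conformations(n_residues: int, states_per_residue: int) -> float:
    """Return log10 of the conformation count states_per_residue ** n_residues.

    Assumes every residue picks its state independently (an illustrative
    simplification). Working in log10 keeps the numbers readable.
    """
    if n_residues < 1 or states_per_residue < 1:
        raise ValueError("both arguments must be positive integers")
    return n_residues * math.log10(states_per_residue)


if __name__ == "__main__":
    chain_length = 100  # hypothetical chain length, in residues
    for states in (3, 10):  # hypothetical numbers of states per residue
        exponent = log10_conformations(chain_length, states)
        print(f"{states} states/residue, {chain_length} residues: "
              f"about 10^{exponent:.1f} conformations")
```

Under these assumptions even a modest chain yields a count of the same order as, or larger than, the figure quoted in the text, which is why a stepwise, modular folding pathway is needed rather than a random search through every conformation.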