EXERCISE I.12
HYDROPHOBICITY IN DRUG DESIGN

Pietro Cozzini1 and Francesca Spyrakis2

1 Laboratory of Molecular Modelling, Department of General and Inorganic Chemistry, Chemical-Physics and Analytical Chemistry, University of Parma, 43100 Parma, Italy; 2 Department of Biochemistry and Molecular Biology, University of Parma, 43100 Parma, Italy

<www.iupac.org/publications/cd/medicinal_chemistry/> version date: 1 December 2006

Hydrophobicity is the tendency of a substance to repel water and to resist complete dissolution in it. The term “hydrophobic” means “water fearing”, from the Greek words hydro, water, and phobo, fear. Because hydrophobicity is one of the most important physicochemical parameters associated with chemical compounds, several studies have sought to understand, evaluate, and predict it [1–8]. Hydrophobicity governs numerous biological processes, such as the transport, distribution, and metabolism of biological molecules; molecular recognition; and protein folding. Knowledge of a parameter that describes the behavior of solutes in polar and nonpolar phases is therefore essential for predicting the transport and activity of drugs, pesticides, and xenobiotics.

The hydrophobic effect can be defined as “the tendency of nonpolar groups to cluster, shielding themselves from contact with an aqueous environment”. In proteins, it can also be described as the tendency of polar species to congregate in such a manner as to maximize electrostatic interactions. Proteins organize themselves to expose polar side chains toward the solvent and to retain hydrophobic amino acids in a central hydrophobic core. The hydrophobic effect is one of the main determinants of the structure and folding of globular proteins: the hydrophilic regions tend to surround hydrophobic areas, which gather into the central hydrophobic core, generating a protein characterized by a specific and function-related three-dimensional structure.

This driving force guides not only protein folding, but also any kind of biological interaction. Biological molecules interact mainly via electrostatic forces, including hydrogen bonds and hydrogen-bonding networks, often formed through water molecules. During protein–ligand association, water molecules that cannot properly locate themselves at the complex interface are displaced and pushed into the bulk solvent, increasing entropy. Thus, it is possible to define the hydrophobic effect as a free energy phenomenon, constituted by both
enthalpic and entropic phenomena [9]. The hydrophobic character of the different amino acids has been studied in depth, and several biochemical researchers have pursued amino acid hydrophobicity scales with different methods and approaches [10,11]. A complete understanding of the forces that guide amino acid interactions within proteins could lead to the prediction of protein structure and of the processes that drive a protein to fold into its native form.

The octanol/water partition coefficient ($\log P_{O/W}$) constitutes a quantitative and easily accessible measure of hydrophobicity. P is defined as the ratio of the equilibrium concentrations of a substance dissolved in a two-phase system formed by two immiscible solvents:

$$P_{O/W} = \frac{c_{\mathrm{octanol}}}{c_{\mathrm{water}}}$$

The partition coefficient P is thus the quotient of two concentrations and is normally reported as its logarithm to base 10 (log P), because P ranges from $10^{-4}$ to $10^{8}$. Log P values are widely used in bioaccumulation studies, in drug absorption and toxicity predictions and, recently, even in the modeling of biological interactions [12,13]. Several efforts have been made to develop rapid and reliable log P estimation methodologies capable of predicting partition coefficients for compounds that have not been tested experimentally.

The standard procedure for the experimental determination of log P is the shake-flask method, used for compounds with log P values ranging from –2 to 4. Log P > 0 characterizes hydrophobic substances soluble in the lipid phase, while log P < 0 typifies polar compounds soluble in the water phase (Panel 1).
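As a minimal sketch of the definition above, the following Python snippet computes log P from two equilibrium concentrations and reports which phase the solute prefers. The function names and the sample concentrations are illustrative assumptions and do not come from the original exercise.

```python
import math


def log_p(c_octanol: float, c_water: float) -> float:
    """Return log10 of the partition coefficient P = c_octanol / c_water."""
    if c_octanol <= 0 or c_water <= 0:
        raise ValueError("equilibrium concentrations must be positive")
    return math.log10(c_octanol / c_water)


def preferred_phase(logp: float) -> str:
    """Classify a solute by the sign of log P, as summarized in Panel 1."""
    if logp > 0:
        return "lipid phase"
    if logp < 0:
        return "water phase"
    return "no preference"


# Hypothetical concentrations in the same units (only their ratio matters).
example = log_p(c_octanol=0.50, c_water=0.005)
print(round(example, 2), preferred_phase(example))
```

Working with the logarithm keeps values that span many orders of magnitude in P on a compact scale, which is the reason given in the text for reporting log P rather than P.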
Fig. 1 Panel 1

As an experimental alternative, high-performance liquid chromatography (HPLC) is used for more hydrophobic compounds, with log P values ranging from 0 to 6.

Log P can be measured experimentally or predicted from structural data. Experimental measurements are often time-consuming and difficult to make; thus, the need to estimate hydrophobic parameters properly and rapidly is more and more pressing. This need was also triggered by the advent of molecular modeling and the screening of large molecular libraries for virtual screening and drug design. Alongside progress in computational applications and molecular modeling, several methods capable of predicting log P values for thousands of compounds have been developed. They can be classified into five major classes [14]: substituent methods, fragment methods, methods based on atomic contributions and/or surface areas, methods based on molecular properties, and, finally, methods based on solvatochromic parameters.

The first “by substituent” approach was proposed by Fujita and coworkers in 1964 [15]. Their technique is based on the following equation:

$$\pi = \log P_X - \log P_H$$

where $P_X$ represents the partition coefficient of a derivative between 1-octanol and water and $P_H$ that of the parent compound. Because π is typically derived from equilibrium processes, it can be considered directly as a free energy constant. As a consequence, log P is an additive-constitutive, free energy-related property, numerically equivalent to the sum of the log P of the parent compound and a π term representing the log P difference between a given substituent and the hydrogen atom it replaces [16].
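The additive use of π described above can be sketched in a few lines of Python. The function names and all numerical values below are hypothetical, chosen only to make the arithmetic visible; they are not measured partition coefficients.

```python
from typing import Iterable


def substituent_constant(logp_derivative: float, logp_parent: float) -> float:
    """pi = log P_X - log P_H for one substituent X replacing a hydrogen."""
    return logp_derivative - logp_parent


def estimate_logp(logp_parent: float, pi_values: Iterable[float]) -> float:
    """Additive estimate: parent log P plus the pi constant of each substituent."""
    return logp_parent + sum(pi_values)


# Hypothetical values, used only to illustrate the arithmetic.
pi_x = substituent_constant(logp_derivative=2.5, logp_parent=2.0)

# Reuse the same constant for a different (hypothetical) parent carrying
# two copies of the substituent.
print(estimate_logp(logp_parent=1.0, pi_values=[pi_x, pi_x]))
```

The sketch assumes that a π value derived on one parent compound transfers unchanged to another, which is exactly the additivity assumption of the substituent approach.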
[Panel 1: hydrophobicity measured as the water/octanol partition coefficient $P_A$, with $\log P_A = \log\left([A]_{\mathrm{octanol}} / [A]_{\mathrm{water}}\right)$; log P > 0 ⇒ lipid phase; log P < 0 ⇒ water phase.]

As an example, the log P determination for
the methyl group is reported.

$$\log P_{\mathrm{CH_3}} = \log P_{\text{methyl derivative}} - \log P_{\text{parent compound}}$$

The “by fragments” method that followed was supported by Rekker and Mannhold, who stated that log P can be calculated as the sum of fragment values plus certain correction factors. They determined the averaged contributions of simple fragments using a large database of experimentally measured log P values [17,18]. Rekker did not indicate which fragments could be considered valid. The log P of a molecule can be calculated using the formula

$$\log P = \sum_n a_n f_n + \sum_m b_m F_m$$

where $a_n$ is the number of occurrences of fragment $f$ of type n and $b_m$ is the number of occurrences of correction factor $F$ of type m.

The well-known CLOGP method represents an improvement of the Rekker approach and can, in fact, be expressed by the same equation. The CLOGP program breaks a molecule into fragments and sums constant fragment values and structure-dependent correction values, taken from Hansch and Leo’s database, to predict the log P of organic molecules. The program divides the target molecule into fragments following a set of simple rules that users cannot alter. CLOGP was the first stand-alone program developed by Pomona MedChem, following Rekker’s general formulation. The program is now available on the Web (http://www.daylight.com/daycgi/clogp).

Unlike methods built on chemical group fragments, the methods based on atomic contribution and/or surface area use atomic fragments and surface area data to predict hydrophobicity. The hydrophobic contribution of each atom to a molecule can be evaluated by multiplying the corresponding atomic parameter by the degree of exposure to the surrounding solvent. The degree of exposure is typically represented by the solvent-accessible surface area (SASA).
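Both additive schemes described above reduce to weighted sums, as the following Python sketch suggests. The fragment names, fragment values, correction factors, atomic parameters, and exposure values are all hypothetical placeholders; they are not Rekker's, Hansch and Leo's, or CLOGP's actual tables, and the second function is only one plausible reading of "atomic parameter multiplied by exposure".

```python
from typing import Mapping, Sequence, Tuple


def fragment_logp(
    fragment_counts: Mapping[str, int],
    fragment_values: Mapping[str, float],
    correction_counts: Mapping[str, int],
    correction_values: Mapping[str, float],
) -> float:
    """log P = sum_n a_n * f_n + sum_m b_m * F_m."""
    fragments = sum(a * fragment_values[name] for name, a in fragment_counts.items())
    corrections = sum(b * correction_values[name] for name, b in correction_counts.items())
    return fragments + corrections


def exposure_weighted_sum(atoms: Sequence[Tuple[float, float]]) -> float:
    """Sum over atoms of (atomic hydrophobic parameter * solvent exposure)."""
    return sum(parameter * exposure for parameter, exposure in atoms)


# Hypothetical fragment and correction tables (placeholders, not published values).
f_values = {"frag_A": 0.70, "frag_B": -1.00}
F_values = {"corr_1": 0.25}
print(fragment_logp({"frag_A": 3, "frag_B": 1}, f_values, {"corr_1": 2}, F_values))

# Hypothetical (parameter, exposure) pairs; the exposure could be a per-atom SASA.
print(exposure_weighted_sum([(0.5, 10.0), (-0.2, 20.0)]))
```

Keeping the value tables separate from the occurrence counts mirrors the formula: the tables are fixed by the method, while the counts depend on how a given molecule is fragmented.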
The first promoters of this method were Broto and his colleagues, who developed a set of 222 descriptors, made of combinations of up to four atoms with specific bonding pathways up to four in length, reaching a precision of about 0.4 log units [19]. Later, the concept of SASA was used by Iwase [20] and Dunn [21] in principal component analysis to improve their log P estimations. Dunn computed the isotropic surface area by calculating the number of water molecules able to hydrate the polar portions of the solute molecules. For example, one water molecule was allowed for groups such as nitro, aniline, ketones, and tertiary amines, two for other amines, three for carboxyls, and five for amide groups. The use of SASA parameters has been extended to several log P calculation algorithms, such as the program HINT, created by Abraham and Kellogg in 1991, which is discussed later and used in a practical session.

Various researchers did not agree with the previously reported fragmental methods, claiming that a
molecule is rarely a simple sum of its parts and that predicting any molecular property from empirical or calculated fragments has no scientific basis [22]. Bodor’s method computes log P as a function of different calculated molecular properties, such as conformations, ionization, hydration, ion-pair formation, keto-enol tautomerism, intramolecular and intermolecular H-bond formation, folding, and so forth.

The fifth log P determination method, based on solvatochromic comparisons, was proposed by Kamlet and coworkers [23] and is, once more, a molecular properties methodology. Log P can be calculated through the following equation:

$$\log P_{oct} = aV + b\pi^* + c\beta_H + d\alpha_H + e$$

where V is a solute volume term, $\pi^*$ is a solute polarity/polarizability term, $\beta_H$ is an independent measure of solute hydrogen-bond acceptor strength, $\alpha_H$ is the corresponding hydrogen-bond donor strength, and e is the intercept. $\pi^*$, $\beta_H$, and $\alpha_H$ are solvatochromic parameters obtained by averaging multiple normalized solvent effects on a variety of properties, involving many different types of indicators.

Several research groups have tried to extend log P calculations to amino acids, in order to better understand and investigate events such as protein folding and biological interactions. However, experimental methods, such as chromatography or site-directed mutagenesis, give ambiguous and differing results [11]. Generally, each amino acid is characterized by a wide range of hydrophobicity values; deciding which value corresponds to a true measure therefore becomes very difficult and time-consuming.

In order to obtain a rapid and proper estimation of the hydrophobicity of biological molecules, in 1987 Abraham and Leo extended the fragment method of calculating partition coefficients to the common amino acids [10].
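Returning to the solvatochromic equation of Kamlet and coworkers given above, the relationship is a plain linear combination of four solute descriptors plus an intercept. The Python sketch below evaluates it; the class name, the coefficients, and the descriptor values are hypothetical and serve only to show how the terms combine, not to reproduce any published fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolvatochromicModel:
    """Coefficients a, b, c, d and intercept e of the linear relationship."""

    a: float
    b: float
    c: float
    d: float
    e: float

    def log_p_oct(self, volume: float, pi_star: float, beta_h: float, alpha_h: float) -> float:
        """log P_oct = a*V + b*pi* + c*beta_H + d*alpha_H + e."""
        return (
            self.a * volume
            + self.b * pi_star
            + self.c * beta_h
            + self.d * alpha_h
            + self.e
        )


# Hypothetical coefficients and solute descriptors, for illustration only.
model = SolvatochromicModel(a=2.0, b=-1.0, c=-3.0, d=0.0, e=0.5)
print(model.log_p_oct(volume=1.0, pi_star=0.5, beta_h=0.25, alpha_h=0.0))
```

In practice the coefficients would be obtained by regression against experimental log P values; that fitting step is outside the scope of this sketch.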
Fundamental hydrophobic fragments, obtained from partitioning experiments performed on thousands of compounds, were subsequently reduced to atomic values with inherent bond, ring, chain, branching, and proximity factors. The derived hydrophobic atomic constants and the corresponding SASAs constitute the key parameters of the software HINT (Hydropathic INTeractions), which can calculate them directly for small molecules such as ligands or obtain them from a residue-based dictionary. The program was thus created to estimate, rapidly and properly, biological interactions such as protein–protein, protein–DNA, and protein–ligand associations, as well as folding phenomena.

Why should we use log P to study and predict recognition and interactions between biological molecules? At least three reasonable answers can be given: (i) log P is essentially a reproducible experimental measurement; (ii) partition experiments are low cost and can be performed relatively rapidly; and (iii) log P is directly related to the free energy of binding. In fact, because hydrophobicity is defined in terms of solubility, $\log P_{O/W}$, and consequently also the hydrophobic atomic constants,