2. Selection of Descriptors
(1) Classical methods
    • Forward selection
    • Backward elimination
    • Stepwise regression
(2) Principal component analysis
(3) Orthogonal transformation
(4) Leaps-and-Bounds Regression
(5) Simulated annealing
(6) Genetic algorithm
(7) Artificial neural network method
* Xu Lu et al., Anal. Chim. Acta, 2001, 446, 477-483
Leaps-and-Bounds Regression
----------------------------------------------------
No   Descriptor        R       F      S
----------------------------------------------------
1    1                 0.084   4.6    0.14
2    1,7               0.831   5.6    0.13
3    1,2,7             0.833   18.9   0.12
4    1,2,3,7           0.868   18.4   0.11
5    1,2,3,5,7         0.901   19.8   0.097
6    1,2,3,5,6,7       0.913   12.5   0.098
7    1,2,3,5,6,7,8     0.913   15.1   0.096
----------------------------------------------------
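The table lists the best descriptor subset found for each subset size. A minimal sketch of that idea follows, under stated assumptions: it enumerates every subset exhaustively rather than using the branch-and-bound pruning that leaps-and-bounds applies to skip subsets, so it is only practical for a handful of descriptors. The function names (`solve`, `rss`, `best_subsets`) and the toy data are hypothetical and are not taken from the cited work.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def rss(X, y, subset):
    """Residual sum of squares of a least-squares fit (with intercept) on the chosen columns."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def best_subsets(X, y):
    """For each subset size, return the column subset with the lowest RSS."""
    k = len(X[0])
    return {size: min(combinations(range(k), size), key=lambda s: rss(X, y, s))
            for size in range(1, k + 1)}

# Toy data (hypothetical): y = 1 + 2*x0 - x2 exactly; column 1 is irrelevant.
X = [[1, 4, 2], [2, 1, 7], [3, 5, 1], [4, 2, 8], [5, 6, 3], [6, 3, 5]]
y = [1 + 2 * row[0] - row[2] for row in X]
best = best_subsets(X, y)
print(best[2])  # columns 0 and 2 reproduce y exactly
```

Column indices in the sketch are zero-based, whereas the descriptors in the table are numbered from 1.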
Selection of Mathematical Methods
(1) Multiple regression analysis
(2) Artificial neural network method
(3) CoMFA method
3. Multiple Regression
The rule: n/m ≥ 5
n: number of samples; m: number of variables.

$$-\log(LD_{50}) = -0.760 - 1.744\times10^{-3}E_f - 5.452\times10^{-2}E_L - 1.295\times10^{-2}A_{m3} + 1.556\times10^{-3}E_p - 1.171\times10^{-3}E_e$$

R = 0.901, F = 19.8, S = 0.097, n = 29
where $E_f$: heat of formation; $E_L$: LUMO energy; $A_{m3}$: topological index; $E_p$: repulsion energy; $E_e$: electronic energy.
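The reported statistics can be cross-checked with a short calculation. The sketch below assumes that F is the usual overall regression F-statistic, F = (R²/m) / ((1 − R²)/(n − m − 1)), and that m = 5 because the equation contains five descriptors; the source does not state either point explicitly. The function names are hypothetical.

```python
def satisfies_sample_rule(n, m, ratio=5.0):
    """Check the rule of thumb n/m >= 5 (samples per variable)."""
    return n / m >= ratio

def f_statistic(r, n, m):
    """Overall F-statistic of a regression with m variables, n samples, and correlation r."""
    r2 = r * r
    return (r2 / m) / ((1.0 - r2) / (n - m - 1))

n, m = 29, 5  # 29 samples, 5 descriptors (m is an assumption from the equation)
print(satisfies_sample_rule(n, m))         # 29/5 = 5.8, so the rule holds
print(round(f_statistic(0.901, n, m), 1))  # 19.8
```

With these assumptions the calculation reproduces F = 19.8 from R = 0.901 and n = 29. A sixth variable would give n/m ≈ 4.8, which falls just below the threshold.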
4. Artificial Neural Network
Algorithm: BFGS quasi-Newton method
Architecture: 6:3:1
Results (multiple regression values in parentheses):
R = 0.967 (R = 0.901)
F = 386.6 (F = 19.8)
S = 0.053 (S = 0.097)
These results are much better than those obtained by multiple regression analysis.
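To make the 6:3:1 architecture concrete, here is a minimal sketch of one forward pass through such a network: six inputs, three hidden units, one output. Several details are assumptions, since the source gives only the layer sizes and the training algorithm: the tanh hidden activation, the linear output unit, the bias terms, and all weight values are hypothetical. Training with the BFGS quasi-Newton method is not shown.

```python
import math

def count_parameters(layers):
    """Number of weights plus biases in a fully connected network (bias terms assumed)."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer (assumed), linear output unit (assumed)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for illustration only; real values would come from training.
w_hidden = [[0.1] * 6, [-0.2] * 6, [0.05] * 6]  # 3 hidden units x 6 inputs
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.5, -0.3, 0.8]
b_out = 0.2

x = [1.0, 0.5, -0.5, 2.0, 0.0, -1.0]  # six scaled descriptor values (hypothetical)
print(count_parameters([6, 3, 1]))    # (6+1)*3 + (3+1)*1 = 25
print(forward(x, w_hidden, b_hidden, w_out, b_out))
```

Under the bias-term assumption, a 6:3:1 network has 25 adjustable parameters.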