CiteSeer: http://citeseer.ist.psu.edu/
NIPS: http://books.nips.cc/

I enjoyed reading papers with innovative ideas. In understanding these papers and incorporating them into my book, they were especially helpful when they were well written: when the abstract and the introduction explained the proposed methods by their approaches and ideas rather than by their characteristics, and when the main text explained the ideas behind the algorithms before describing them in detail.

Kobe, October 2004, October 2009
Shigeo Abe

References

1. V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
2. V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.
3. R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, Cambridge, MA, 2002.
4. B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2002.
5. S. Young and T. Downs. CARVE—A constructive algorithm for real-valued examples. IEEE Transactions on Neural Networks, 9(6):1180–1190, 1998.
6. S. Abe. Pattern Classification: Neuro-Fuzzy Methods and Their Comparison. Springer-Verlag, London, UK, 2001.
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
  1.1 Decision Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
    1.1.1 Decision Functions for Two-Class Problems . . . . . . . . . . . . 2
    1.1.2 Decision Functions for Multiclass Problems . . . . . . . . . . . . 4
  1.2 Determination of Decision Functions . . . . . . . . . . . . . . . . . . 8
  1.3 Data Sets Used in the Book . . . . . . . . . . . . . . . . . . . . . . . 9
  1.4 Classifier Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 13
  References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Two-Class Support Vector Machines . . . . . . . . . . . . . . . . . . . . . 21
  2.1 Hard-Margin Support Vector Machines . . . . . . . . . . . . . . . . . . 21
  2.2 L1 Soft-Margin Support Vector Machines . . . . . . . . . . . . . . . . . 28
  2.3 Mapping to a High-Dimensional Space . . . . . . . . . . . . . . . . . . 31
    2.3.1 Kernel Tricks . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
    2.3.2 Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
    2.3.3 Normalizing Kernels . . . . . . . . . . . . . . . . . . . . . . . . 43
    2.3.4 Properties of Mapping Functions Associated with Kernels . . . . . 44
    2.3.5 Implicit Bias Terms . . . . . . . . . . . . . . . . . . . . . . . . 47
    2.3.6 Empirical Feature Space . . . . . . . . . . . . . . . . . . . . . . 50
  2.4 L2 Soft-Margin Support Vector Machines . . . . . . . . . . . . . . . . . 56
  2.5 Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . 58
    2.5.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
    2.5.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
  2.6 Characteristics of Solutions . . . . . . . . . . . . . . . . . . . . . . 60
    2.6.1 Hessian Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 60
    2.6.2 Dependence of Solutions on C . . . . . . . . . . . . . . . . . . . 62
    2.6.3 Equivalence of L1 and L2 Support Vector Machines . . . . . . . . . 67
    2.6.4 Nonunique Solutions . . . . . . . . . . . . . . . . . . . . . . . . 70
    2.6.5 Reducing the Number of Support Vectors . . . . . . . . . . . . . . 78
    2.6.6 Degenerate Solutions . . . . . . . . . . . . . . . . . . . . . . . 81
    2.6.7 Duplicate Copies of Data . . . . . . . . . . . . . . . . . . . . . 83
    2.6.8 Imbalanced Data . . . . . . . . . . . . . . . . . . . . . . . . . . 85
    2.6.9 Classification for the Blood Cell Data . . . . . . . . . . . . . . 85
  2.7 Class Boundaries for Different Kernels . . . . . . . . . . . . . . . . . 88
  2.8 Developing Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . 93
    2.8.1 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . 93
    2.8.2 Estimating Generalization Errors . . . . . . . . . . . . . . . . . 93
    2.8.3 Sophistication of Model Selection . . . . . . . . . . . . . . . . . 97
    2.8.4 Effect of Model Selection by Cross-Validation . . . . . . . . . . . 98
  2.9 Invariance for Linear Transformation . . . . . . . . . . . . . . . . . . 102
  References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

3 Multiclass Support Vector Machines . . . . . . . . . . . . . . . . . . . . . 113
  3.1 One-Against-All Support Vector Machines . . . . . . . . . . . . . . . . 114
    3.1.1 Conventional Support Vector Machines . . . . . . . . . . . . . . . 114
    3.1.2 Fuzzy Support Vector Machines . . . . . . . . . . . . . . . . . . . 116
    3.1.3 Equivalence of Fuzzy Support Vector Machines and Support Vector
          Machines with Continuous Decision Functions . . . . . . . . . . . 119
    3.1.4 Decision-Tree-Based Support Vector Machines . . . . . . . . . . . . 122
  3.2 Pairwise Support Vector Machines . . . . . . . . . . . . . . . . . . . . 127
    3.2.1 Conventional Support Vector Machines . . . . . . . . . . . . . . . 127
    3.2.2 Fuzzy Support Vector Machines . . . . . . . . . . . . . . . . . . . 128
    3.2.3 Performance Comparison of Fuzzy Support Vector Machines . . . . . . 129
    3.2.4 Cluster-Based Support Vector Machines . . . . . . . . . . . . . . . 132
    3.2.5 Decision-Tree-Based Support Vector Machines . . . . . . . . . . . . 133
    3.2.6 Pairwise Classification with Correcting Classifiers . . . . . . . . 143
  3.3 Error-Correcting Output Codes . . . . . . . . . . . . . . . . . . . . . 144
    3.3.1 Output Coding by Error-Correcting Codes . . . . . . . . . . . . . . 145
    3.3.2 Unified Scheme for Output Coding . . . . . . . . . . . . . . . . . 146
    3.3.3 Equivalence of ECOC with Membership Functions . . . . . . . . . . . 147
    3.3.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 147
  3.4 All-at-Once Support Vector Machines . . . . . . . . . . . . . . . . . . 149
  3.5 Comparisons of Architectures . . . . . . . . . . . . . . . . . . . . . . 152
    3.5.1 One-Against-All Support Vector Machines . . . . . . . . . . . . . . 152
    3.5.2 Pairwise Support Vector Machines . . . . . . . . . . . . . . . . . 152
    3.5.3 ECOC Support Vector Machines . . . . . . . . . . . . . . . . . . . 153
    3.5.4 All-at-Once Support Vector Machines . . . . . . . . . . . . . . . . 153
    3.5.5 Training Difficulty . . . . . . . . . . . . . . . . . . . . . . . . 153
    3.5.6 Training Time Comparison . . . . . . . . . . . . . . . . . . . . . 157
  References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4 Variants of Support Vector Machines . . . . . . . . . . . . . . . . . . . . 163
  4.1 Least-Squares Support Vector Machines . . . . . . . . . . . . . . . . . 163
    4.1.1 Two-Class Least-Squares Support Vector Machines . . . . . . . . . . 164
    4.1.2 One-Against-All Least-Squares Support Vector Machines . . . . . . . 166
    4.1.3 Pairwise Least-Squares Support Vector Machines . . . . . . . . . . 168
    4.1.4 All-at-Once Least-Squares Support Vector Machines . . . . . . . . . 169
    4.1.5 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . 170
  4.2 Linear Programming Support Vector Machines . . . . . . . . . . . . . . . 174
    4.2.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
    4.2.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 178
  4.3 Sparse Support Vector Machines . . . . . . . . . . . . . . . . . . . . . 180
    4.3.1 Several Approaches for Sparse Support Vector Machines . . . . . . . 181
    4.3.2 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
    4.3.3 Support Vector Machines Trained in the Empirical Feature Space . . 184
    4.3.4 Selection of Linearly Independent Data . . . . . . . . . . . . . . 187
    4.3.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 189
  4.4 Performance Comparison of Different Classifiers . . . . . . . . . . . . 192
  4.5 Robust Support Vector Machines . . . . . . . . . . . . . . . . . . . . . 196
  4.6 Bayesian Support Vector Machines . . . . . . . . . . . . . . . . . . . . 197
    4.6.1 One-Dimensional Bayesian Decision Functions . . . . . . . . . . . . 199
    4.6.2 Parallel Displacement of a Hyperplane . . . . . . . . . . . . . . . 200
    4.6.3 Normal Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
  4.7 Incremental Training . . . . . . . . . . . . . . . . . . . . . . . . . . 201
    4.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
    4.7.2 Incremental Training Using Hyperspheres . . . . . . . . . . . . . . 204
  4.8 Learning Using Privileged Information . . . . . . . . . . . . . . . . . 213
  4.9 Semi-Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . 216
  4.10 Multiple Classifier Systems . . . . . . . . . . . . . . . . . . . . . . 217
  4.11 Multiple Kernel Learning . . . . . . . . . . . . . . . . . . . . . . . 218
  4.12 Confidence Level . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
  4.13 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
  References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

5 Training Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
  5.1 Preselecting Support Vector Candidates . . . . . . . . . . . . . . . . . 227
    5.1.1 Approximation of Boundary Data . . . . . . . . . . . . . . . . . . 228
    5.1.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 230
  5.2 Decomposition Techniques . . . . . . . . . . . . . . . . . . . . . . . . 231
  5.3 KKT Conditions Revisited . . . . . . . . . . . . . . . . . . . . . . . . 234
  5.4 Overview of Training Methods . . . . . . . . . . . . . . . . . . . . . . 239
  5.5 Primal–Dual Interior-Point Methods . . . . . . . . . . . . . . . . . . . 242
    5.5.1 Primal–Dual Interior-Point Methods for Linear Programming . . . . . 242
    5.5.2 Primal–Dual Interior-Point Methods for Quadratic Programming . . . 246
    5.5.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 248
  5.6 Steepest Ascent Methods and Newton's Methods . . . . . . . . . . . . . . 252
    5.6.1 Solving Quadratic Programming Problems Without Constraints . . . . 252
    5.6.2 Training of L1 Soft-Margin Support Vector Machines . . . . . . . . 254
    5.6.3 Sequential Minimal Optimization . . . . . . . . . . . . . . . . . . 259
    5.6.4 Training of L2 Soft-Margin Support Vector Machines . . . . . . . . 260
    5.6.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 261
  5.7 Batch Training by Exact Incremental Training . . . . . . . . . . . . . . 262
    5.7.1 KKT Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 263
    5.7.2 Training by Solving a Set of Linear Equations . . . . . . . . . . . 264
    5.7.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 272
  5.8 Active Set Training in Primal and Dual . . . . . . . . . . . . . . . . . 273
    5.8.1 Training Support Vector Machines in the Primal . . . . . . . . . . 273
    5.8.2 Comparison of Training Support Vector Machines in the Primal
          and the Dual . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
    5.8.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 279
  5.9 Training of Linear Programming Support Vector Machines . . . . . . . . . 281
    5.9.1 Decomposition Techniques . . . . . . . . . . . . . . . . . . . . . 282
    5.9.2 Decomposition Techniques for Linear Programming Support Vector
          Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
    5.9.3 Computer Experiments . . . . . . . . . . . . . . . . . . . . . . . 297
  References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

6 Kernel-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
  6.1 Kernel Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . 305
    6.1.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
    6.1.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 308
  6.2 Kernel Principal Component Analysis . . . . . . . . . . . . . . . . . . 311
  6.3 Kernel Mahalanobis Distance . . . . . . . . . . . . . . . . . . . . . . 314
    6.3.1 SVD-Based Kernel Mahalanobis Distance . . . . . . . . . . . . . . . 315
    6.3.2 KPCA-Based Mahalanobis Distance . . . . . . . . . . . . . . . . . . 318
  6.4 Principal Component Analysis in the Empirical Feature Space . . . . . . 319
  6.5 Kernel Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . . 320
    6.5.1 Kernel Discriminant Analysis for Two-Class Problems . . . . . . . . 321
    6.5.2 Linear Discriminant Analysis for Two-Class Problems in the
          Empirical Feature Space . . . . . . . . . . . . . . . . . . . . . . 324
    6.5.3 Kernel Discriminant Analysis for Multiclass Problems . . . . . . . 325
  References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327