Chapter 3

Support Vector Machines

In training any classifier, an obvious goal is to correctly classify our training data. But if we create a classifier that is fit too closely to our data, we run the risk of developing a model that performs poorly when asked to classify unknown data, a condition known as overfitting. This is a particularly important concern when the available amount of training data is small or suspected to be a poor representation of the general population. The motivation behind Support Vector Machines is to maximize generalization ability, meaning that an SVM finds the classifier that can most accurately predict unknown data points based on the training data. For now, we focus on the case of trying to distinguish between two classes.

3.1 Lagrange Multipliers

Before discussing SVMs, it is useful to review Lagrange multipliers, as they play a major role in the construction of SVMs. In short, Lagrange multipliers are constants that help solve constrained maximization/minimization problems.

Suppose we want to find the maximum value of a given function f(x) subject to a constraint g(x) = 0. Geometrically, if x ∈ R^m, then the constraint g(x) = 0 represents some (m − 1)-dimensional surface in R^m.

Lemma 3.1.1. ∇g(x) is orthogonal to the surface g(x) = 0 at every point x on that surface. [2]

Proof. Consider the linear approximation of g: g(x + ε) ≈ g(x) + ε^T ∇g(x) for two points x and x + ε that both lie on the surface g(x) = 0. Since g(x) = 0 = g(x + ε), we know ε^T ∇g(x) ≈ 0. Moreover, as ‖ε‖ approaches 0, ε becomes parallel to the surface g(x) = 0, and so we conclude that ∇g(x) is orthogonal to the surface. ∎

Moreover, if f(x) is maximized on g(x) = 0, then ∇f(x) is also orthogonal to g(x) = 0. Recall
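As a concrete illustration of this geometric picture, consider a small worked example using the standard stationarity condition ∇f = λ∇g, where λ is the Lagrange multiplier. (This is a sketch; the specific functions below are chosen for illustration and do not appear in the text.)

```latex
% Maximize f(x, y) = xy subject to g(x, y) = x + y - 1 = 0.
\begin{align*}
\nabla f &= (y,\, x), \qquad \nabla g = (1,\, 1), \\
\nabla f = \lambda \nabla g
  &\;\Longrightarrow\; y = \lambda \text{ and } x = \lambda
  \;\Longrightarrow\; x = y, \\
g(x, y) = 0
  &\;\Longrightarrow\; x + y = 1
  \;\Longrightarrow\; x = y = \tfrac{1}{2}, \\
f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) &= \tfrac{1}{4}.
\end{align*}
```

At the constrained maximum (1/2, 1/2), the gradients ∇f = (1/2, 1/2) and ∇g = (1, 1) are parallel, and both are orthogonal to the line x + y = 1, exactly as the discussion above predicts.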