1.1 Decision Functions

and into Class 2 if

    g1(x) < g2(x).  (1.5)

In this case, the class boundary is given by (see the dotted curve in Fig. 1.2)

    g1(x) = g2(x).  (1.6)

This means that the class boundary is indirectly obtained by solving (1.6) for x. We call this type of decision function an indirect decision function.

Fig. 1.2 Class boundary for Fig. 1.1

If we define the decision functions by

    g1(x) = −g2(x),  (1.7)

we classify x into Class 1 if

    g1(x) > 0  (1.8)

and into Class 2 if

    g2(x) > 0.  (1.9)

Thus the class boundary is given by

    g1(x) = −g2(x) = 0.  (1.10)

Namely, the class boundary corresponds to the curve where the decision function vanishes. We call this type of decision function a direct decision function. If the decision function is linear, namely, g1(x) is given by
    g1(x) = wᵀx + b,  (1.11)

where w is an m-dimensional vector and b is a bias term, and if one class is on the positive side of the hyperplane, i.e., g1(x) > 0, and the other class is on the negative side, the given problem is said to be linearly separable.

1.1.2 Decision Functions for Multiclass Problems

1.1.2.1 Indirect Decision Functions

For an n (> 2)-class problem, suppose we have indirect decision functions gi(x) for classes i. To avoid unclassifiable regions, we classify x into class j given by

    j = arg max_{i=1,...,n} gi(x),  (1.12)

where arg max returns the subscript i with the maximum value of gi(x). If more than one decision function takes the same maximum value for x, namely, x is on the class boundary, it is not classifiable.

In the following we discuss several methods to obtain the direct decision functions for multiclass problems.

1.1.2.2 One-Against-All Formulation

The first approach is to determine the decision functions by the one-against-all formulation [10]. We determine the ith decision function gi(x) (i = 1,...,n) so that when x belongs to class i,

    gi(x) > 0,  (1.13)

and when x belongs to one of the remaining classes,

    gi(x) < 0.  (1.14)

When x is given, we classify x into class i if gi(x) > 0 and gj(x) < 0 (j ≠ i, j = 1,...,n). But with these decision functions, unclassifiable regions exist when more than one decision function is positive or no decision function is positive, as seen from Fig. 1.3. To resolve these unclassifiable regions we introduce membership functions in Chapter 3.
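Returning briefly to the two-class case, the direct decision rule of (1.7)–(1.10) can be sketched in a few lines of Python. The particular decision function below (a unit circle) is invented for illustration and is not taken from the text; the point is only that the sign of a single function g1(x), with g2(x) = −g1(x), decides the class, and that the boundary is the curve where g1(x) vanishes.

```python
# A hypothetical direct decision function for a two-class problem:
# g1(x) > 0 -> Class 1, g1(x) < 0 -> Class 2, and g1(x) = 0 is the boundary.
def g1(x):
    # illustrative choice: points inside the unit circle belong to Class 1
    return 1.0 - x[0] ** 2 - x[1] ** 2

def classify(x):
    value = g1(x)
    if value > 0:
        return 1
    if value < 0:
        return 2
    return 0  # x lies exactly on the class boundary g1(x) = 0

classify((0.2, 0.3))  # inside the circle: Class 1
classify((2.0, 0.0))  # outside the circle: Class 2
```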
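The one-against-all rule of (1.13)–(1.14), including detection of the unclassifiable regions just described, can be sketched as follows. The three linear decision functions are illustrative values, not from the text; the sketch returns a class only when exactly one gi(x) is positive.

```python
# One-against-all classification with linear decision functions
# g_i(x) = w_i . x + b_i.  The weights and biases below are illustrative.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

decision_functions = [
    ((1.0, 0.0), -1.0),    # g1: positive for x1 > 1
    ((-1.0, 0.0), -1.0),   # g2: positive for x1 < -1
    ((0.0, 1.0), -1.0),    # g3: positive for x2 > 1
]

def classify_ova(x):
    positives = [i + 1 for i, (w, b) in enumerate(decision_functions)
                 if dot(w, x) + b > 0]
    if len(positives) == 1:
        return positives[0]
    return None  # unclassifiable: zero or several g_i(x) are positive

classify_ova((2.0, 0.0))  # only g1 positive: class 1
classify_ova((0.0, 0.0))  # no g_i positive: unclassifiable
classify_ova((2.0, 2.0))  # g1 and g3 positive: unclassifiable
```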
Fig. 1.3 Class boundaries by one-against-all formulation

1.1.2.3 Decision-Tree Formulation

The second approach is based on a decision tree. It is considered to be a variant of the one-against-all formulation. We determine the ith decision function gi(x) (i = 1,...,n − 1) so that when x belongs to class i,

    gi(x) > 0,  (1.15)

and when x belongs to one of the classes {i + 1,...,n},

    gi(x) < 0.  (1.16)

In classifying x, starting from g1(x), we find the first positive gi(x) and classify x into class i. If there is no such i among gi(x) (i = 1,...,n − 1), we classify x into class n. Figure 1.4 shows an example of decision functions for four classes.

The decision functions change if we determine them in descending order or in an arbitrary order of class labels. Therefore, in this architecture, we need to determine the decision functions so that the classification performance in the upper levels of the tree is more accurate than in the lower ones. Otherwise, the classification performance may not be good.
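The decision-tree procedure just described can be sketched as follows: evaluate g1(x),...,g_{n−1}(x) in order, return the first class whose decision function is positive, and fall through to class n otherwise. The one-dimensional threshold decision functions are illustrative, not from the text.

```python
# Decision-tree classification: the first positive g_i(x) decides the class;
# if none of g_1, ..., g_{n-1} is positive, x is assigned to class n.
def classify_tree(x, gs, n):
    for i, g in enumerate(gs, start=1):
        if g(x) > 0:
            return i
    return n

# illustrative decision functions separating class i from classes i+1, ..., 4
gs = [lambda x: 3.0 - x,   # g1: class 1 if x < 3
      lambda x: 6.0 - x,   # g2: class 2 if 3 <= x < 6
      lambda x: 9.0 - x]   # g3: class 3 if 6 <= x < 9

classify_tree(2.0, gs, 4)   # g1 positive: class 1
classify_tree(7.0, gs, 4)   # g1, g2 negative, g3 positive: class 3
classify_tree(10.0, gs, 4)  # all negative: class 4
```

Note that reordering the entries of gs changes the classifier, which is exactly why the ordering remark above matters.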
Fig. 1.4 Decision-tree-based decision functions

1.1.2.4 Pairwise Formulation

The third approach is to determine the decision functions by the pairwise formulation [11]. For classes i and j we determine the decision function gij(x) (i ≠ j, i, j = 1,...,n) so that

    gij(x) > 0  (1.17)

when x belongs to class i and

    gij(x) < 0  (1.18)

when x belongs to class j.

In this formulation, gij(x) = −gji(x), and we need to determine n(n − 1)/2 decision functions. Classification is done by voting, namely, we calculate

    gi(x) = Σ_{j=1, j≠i}^{n} sign(gij(x)),  (1.19)

where

    sign(x) = 1 for x ≥ 0, −1 for x < 0,  (1.20)

and we classify x into the class with the maximum gi(x).¹

(¹ We may define the sign function by sign(x) = 1 for x > 0, 0 for x = 0, and −1 for x < 0.)

By this formulation also, unclassifiable regions exist if gi(x) takes the maximum value for
more than one class (see the hatched region in Fig. 1.5). These can be resolved by the decision-tree formulation or by introducing membership functions, as discussed in Chapter 3.

Fig. 1.5 Class boundaries by pairwise formulation

1.1.2.5 Error-Correcting Output Codes

The fourth approach is to use error-correcting codes for encoding outputs [12]. The one-against-all formulation is a special case of error-correcting codes with no error-correcting capability, and so is the pairwise formulation, as discussed in Chapter 3, if "don't care" bits are introduced.

1.1.2.6 All-at-Once Formulation

The fifth approach is to determine the decision functions all at once, namely, we determine the decision functions gi(x) by

    gi(x) > gj(x)  for j ≠ i, j = 1,...,n.  (1.21)

In this formulation we need to determine n decision functions all at once [13, pp. 174–176], [8, pp. 437–440], [14–20]. This results in solving a problem with a larger number of variables than in the previous methods.
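Returning to the pairwise formulation, the voting rule (1.19)–(1.20) can be sketched as follows. Only the decision values gij(x) for i < j are supplied, with gji(x) = −gij(x) used implicitly; the numeric values are illustrative. A tie in the vote corresponds to the unclassifiable regions of Fig. 1.5.

```python
# Pairwise voting, following (1.19) and (1.20): each g_ij(x) casts one vote
# for class i (if nonnegative) or class j (if negative), via sign(-v) for j.
def sign(v):
    return 1 if v >= 0 else -1

def classify_pairwise(gij_values, n):
    # gij_values: {(i, j): g_ij(x)} for i < j, classes numbered 1, ..., n
    votes = {i: 0 for i in range(1, n + 1)}
    for (i, j), v in gij_values.items():
        votes[i] += sign(v)       # contribution of g_ij(x) to g_i(x)
        votes[j] += sign(-v)      # contribution of g_ji(x) = -g_ij(x)
    best = max(votes.values())
    winners = [i for i, s in votes.items() if s == best]
    return winners[0] if len(winners) == 1 else None  # None: unclassifiable

# illustrative decision values for a 3-class problem
g = {(1, 2): 0.7, (1, 3): 0.2, (2, 3): -0.5}
classify_pairwise(g, 3)  # class 1 wins both of its pairwise contests
```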
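The classification rule shared by (1.12) and (1.21), assigning x to the class whose decision function is strictly largest, can be sketched as follows; the linear decision functions are illustrative. A tie among the maximal values puts x on a class boundary, where it is unclassifiable.

```python
# Assign x to the class with the strictly largest g_i(x), per (1.12)/(1.21).
def classify_argmax(x, gs):
    values = [g(x) for g in gs]
    best = max(values)
    winners = [i + 1 for i, v in enumerate(values) if v == best]
    return winners[0] if len(winners) == 1 else None  # tie: class boundary

# illustrative decision functions for a 3-class problem
g_all = [lambda x: x[0],          # g1
         lambda x: x[1],          # g2
         lambda x: -x[0] - x[1]]  # g3

classify_argmax((2.0, 1.0), g_all)  # g1 is largest: class 1
classify_argmax((1.0, 1.0), g_all)  # g1 = g2: on the class boundary
```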