2.5 Approaches to Classification

Fig. 2.10 Various approaches in statistical pattern recognition (ranging from Bayes decision theory and "optimal" rules when the class-conditional densities are known, through supervised parametric and nonparametric learning such as plug-in rules, density estimation, and decision-boundary construction (e.g., k-NN, artificial neural networks), to unsupervised mixture resolving and cluster analysis when they are unknown)
the model that characterizes the class-specific expectations for the distribution (Csiszar 1996). Instance-based learning algorithms are lazy-learning algorithms (Mitchell 1997), so called because they delay the induction or generalization process until classification is performed. Lazy-learning algorithms (Aha 1998; De Mantaras and Armengol 1998) require less computation time during the training phase than eager-learning algorithms (such as Bayesian networks, decision trees, or neural networks) but more computation time during the classification process. One of the most straightforward instance-based learning algorithms is the nearest neighbor algorithm.

The relationship between a number of statistical pattern recognition methods is shown in Fig. 2.10. Moving from top to bottom and left to right, less information is available and, as a result, the difficulty of classification increases.

2. Nonmetric approaches (Chap. 3): decision trees, syntactic (or grammatical) methods, and rule-based classifiers.

It is natural and intuitive to classify a pattern by asking a series of questions, in which the next question depends on the answer to the previous question. This approach is particularly useful for nonmetric (or categorical) data, because the questions can be posed to elicit “yes/no” or “true/false” answers, although it can also be used with quantitative data. The sequence of questions can be displayed as a decision tree (Fig. 2.11), a tree structure which has decision nodes that ask a question, with a branch for each possible answer (outcome). These are connected until we reach a terminal or leaf node, which indicates the classification.

In the case of complex patterns, a pattern can be viewed as a hierarchical composite of simple sub-patterns which themselves are built from yet simpler
sub-patterns (Fu 1982). The simplest sub-patterns are called primitives, and the complex pattern is represented in terms of relationships between these primitives, in a way similar to the syntax of a language. The primitives are viewed as the alphabet of a language, and the patterns are sentences generated according to a certain grammar (i.e., set of rules) that is inferred from the available training samples. This approach is appealing in situations where patterns have a definite structure which can be coded by a set of rules (e.g., ECG waveforms, texture in images). However, difficulties in the segmentation of noisy images to detect the primitives, and in the inference of the grammar from training data, often impede its implementation. The syntactic approach may yield a combinatorial explosion of possibilities to be investigated, requiring large training sets and huge computational efforts (Perlovsky 1998).

3. Cognitive approaches, which include neural networks and support vector machines (SVMs).

Neural networks are based on the organization of the human brain, where nerve cells (neurons) are linked together by strands of fiber (axons). Neural networks are massively parallel computing systems consisting of a huge number of simple processors with many interconnections. They are able to learn complex non-linear input–output relationships and use sequential training procedures. However, in spite of the seemingly different underlying principles, most of the neural network models are implicitly similar to statistical pattern recognition methods (Anderson et al. 1990; Ripley 1993). It has been pointed out (Anderson et al. 1990) that “neural networks are statistics for amateurs ... Most NNs conceal the statistics from the user”.

Support vector machines, SVMs (Cristianini and Shawe-Taylor 2000), represent the training examples as points in p-dimensional space, mapped so that the examples of the different classes are separated by a (p − 1)-dimensional hyperplane, which is chosen to maximize the “margins” on either side of the hyperplane (Fig. 2.12).

Fig. 2.11 A decision tree for choosing a mode of transport, with decision nodes (Travel Cost/Km, Gender, Car Ownership) colored blue and leaf nodes (Bus, Train, Car) colored orange
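To make items 2 and 3 concrete, the following is a minimal sketch of both classifier families on a toy two-class dataset. The use of scikit-learn, and all of the data and parameter values, are assumptions for illustration; the chapter itself names no software.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 2-D feature vectors for two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),   # class 0
               rng.normal(4.0, 1.0, (50, 2))])  # class 1
y = np.array([0] * 50 + [1] * 50)

# A decision tree classifies by a sequence of threshold "questions"
# on individual features, as in Fig. 2.11.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# A linear SVM places the (p - 1)-dimensional separating hyperplane
# so that the margin on either side is maximized (H2 in Fig. 2.12).
svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(tree.predict([[2.0, 2.0]]), svm.predict([[2.0, 2.0]]))
```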
Fig. 2.12 A linear SVM in 2D feature space. H1 separates the two classes with a small margin, but H2 separates them with the maximum margin (H3 doesn’t separate the two classes at all)

2.6 Examples

2.6.1 Classification by Shape

Figure 2.13a is an image containing both bolts and nuts, some of which lie on their sides. We should be able to distinguish (and therefore classify) the objects on the basis of their shape. The bolts are long, with an end piece, and the nuts either have a hole in them (the “face-on” nuts) or are short and linear (the “end-on” nuts). In this case, pre-processing was not required and automatic segmentation (using Otsu thresholding) produces a simplified, binary image (Fig. 2.13b).

The skeleton of this image shows the essential shape differences between the bolts and the two types of nut (Fig. 2.13c). A skeleton comprises pixels which can be distinguished on the basis of their connectivity to other pixels on the skeleton: end pixels (which have only one neighboring pixel on the skeleton), link pixels (which have two neighbors), and branch pixels (which have three neighbors). Because of their characteristic shape, only the skeletons of the bolts will have branch pixels. If the branch pixels are used as a seed image and conditionally dilated, with the seed image constrained to remain within the bounds of a mask image (the original binary image, Fig. 2.13b), then an image of the bolts alone results (Fig. 2.13d). The nuts can now be obtained (Fig. 2.13e) by logically combining this image with the original binary image (using Fig. 2.13b AND (NOT Fig. 2.13d)). The nuts and bolts can then be joined in a color-coded image (Fig. 2.13f), which presents the nuts and bolts in different pseudocolors.
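As a hedged sketch of how this shape-based pipeline might be reproduced in code, the steps below use scikit-image and SciPy (an assumed toolchain): Otsu thresholding, skeletonization, branch-pixel detection, and conditional dilation implemented as morphological reconstruction.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize, reconstruction

def classify_by_shape(gray):
    """Return (bolts, nuts) masks for a grayscale nuts-and-bolts image."""
    # Otsu thresholding (Fig. 2.13b); assumes bright objects on a dark background.
    binary = gray > threshold_otsu(gray)

    # Skeletonize (Fig. 2.13c); branch pixels are skeleton pixels with
    # three (or more) skeleton neighbors in the surrounding 3x3 window.
    skel = skeletonize(binary)
    neighbors = ndi.convolve(skel.astype(int), np.ones((3, 3), int)) - skel
    branches = skel & (neighbors >= 3)

    # Conditional dilation of the branch pixels, constrained to remain
    # within the binary mask, is morphological reconstruction by
    # dilation (Fig. 2.13d).
    bolts = reconstruction(branches.astype(np.uint8), binary.astype(np.uint8),
                           method="dilation").astype(bool)

    # Fig. 2.13e: nuts = binary AND (NOT bolts).
    nuts = binary & ~bolts
    return bolts, nuts
```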
Fig. 2.13 (a) Original image, (b) after Otsu thresholding, (c) after subsequent skeletonization, (d) after conditionally dilating the branch pixels from (c), (e) after logically combining (b) and (d), (f) color coding the nuts and bolts

2.6.2 Classification by Size

An alternative approach to separating the nuts and bolts involves measuring different feature properties, such as the area, perimeter, or length. If we measure the area of the labeled objects in the segmented image (Fig. 2.14a) by counting the pixels belonging to each label and plot these values in one dimension (Fig. 2.14b), then we can see that the nuts and bolts are well discriminated on the basis of area, with the bolts having larger areas. There are three clusters, comprising the bolts with the highest areas, followed by the face-on nuts with intermediate areas, and the edge-on nuts with the lowest areas. If the objects are then re-labeled with their area values (Fig. 2.14c), that image (or the “area” feature) can be thresholded to show just the bolts (Fig. 2.14d): in this particular case, a threshold of 800 (viz., an area of 800 pixels) would work well, although auto-thresholding, using either the isodata (Dubes and Jain 1976) or Otsu (Otsu 1979) algorithm, for example, is preferable since that will preserve generality. The nuts can then be found by logically combining this image with the segmented nuts-and-bolts image as before. Only one feature (area) is used to discriminate the two classes; that is, the feature space (Fig. 2.14b) is one-dimensional.
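A corresponding sketch of the size-based route, under the same assumed toolchain. The threshold of 800 pixels is the value quoted above; in practice it could be replaced by an automatic threshold (e.g., Otsu applied to the per-object areas) to preserve generality, as the text recommends.

```python
import numpy as np
from scipy import ndimage as ndi

def classify_by_size(binary, area_threshold=800):
    """Split a binary nuts-and-bolts image into bolts (large) and nuts (small)."""
    labels, n = ndi.label(binary)        # Fig. 2.14a: labeled objects
    areas = np.bincount(labels.ravel())  # pixel count per label
    areas[0] = 0                         # ignore the background label
    area_image = areas[labels]           # Fig. 2.14c: objects "painted" with area
    bolts = area_image > area_threshold  # Fig. 2.14d: threshold at 800
    nuts = binary & ~bolts
    return bolts, nuts
```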
Fig. 2.14 (a) Segmented, labeled image (using Fig. 2.13a), (b) one-dimensional feature space showing the areas of the features, (c) the features “painted” with grayscales representing their measured areas, and (d) after thresholding image (c) at a value of 800

The two alternative measures, shape and size, could be tested for robustness on other images of nuts and bolts to see which performs better.

2.6.3 More Examples

Figure 2.15a is an image containing a number of electronic components of different shapes and sizes (the transistors are three-legged, the thyristors are three-legged with a hole in them, the electrolytic capacitors have a round body with two legs, and the ceramic capacitors are larger than the resistors). A combination of the shape and size methods can be used to separate the objects into their different classes (Fig. 2.15b); a sketch of such a combined feature vector is given below.

Similar techniques have been used to classify the fruit in Fig. 2.16 into three different classes. Think about the features that would be most discriminating in this case.
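For exercises like the electronic components or the fruit, computing one feature vector per object is one plausible route. The sketch below (same assumed toolchain; the choice of area plus skeleton branch-pixel count is a hypothetical feature set) produces vectors that could be fed to any of the classifiers from Sect. 2.5, such as the decision tree sketched earlier.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import skeletonize

def object_features(binary):
    """Per-object feature vectors: [area, number of skeleton branch pixels]."""
    labels, n = ndi.label(binary)
    features = []
    for i in range(1, n + 1):
        obj = labels == i
        skel = skeletonize(obj)
        nb = ndi.convolve(skel.astype(int), np.ones((3, 3), int)) - skel
        branch_pixels = int((skel & (nb >= 3)).sum())
        features.append([int(obj.sum()), branch_pixels])
    return np.array(features)  # shape (n_objects, 2): a 2-D feature space
```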