Fig. 1.2 Pattern recognition and related fields

1.2 Classification

Classification is often the final step in a general process (Fig. 1.3). It involves sorting objects into separate classes. In the case of an image, the acquired image is segmented to isolate different objects from each other and from the background, and the different objects are labeled. A typical pattern recognition system contains a sensor, a preprocessing mechanism (prior to segmentation), a feature extraction mechanism, a set of examples (training data) already classified (post-processing), and a classification algorithm. The feature extraction step reduces the data by measuring certain characteristic properties or features (such as size, shape, and texture) of the labeled objects. These features (or, more precisely, the values of these features) are then passed to a classifier that evaluates the evidence presented and makes a decision regarding the class to which each object should be assigned, depending on whether the values of its features fall inside or outside the tolerance of that class. This process is used, for example, in classifying lesions as benign or malignant.

The quality of the acquired image depends on the resolution, sensitivity, bandwidth, and signal-to-noise ratio of the imaging system. Pre-processing steps such as image enhancement (e.g., brightness adjustment, contrast enhancement, image averaging, frequency-domain filtering, edge enhancement) and image restoration (e.g., photometric correction, inverse filtering, Wiener filtering) may be required prior to segmentation, which is often a challenging process. Typically, enhancement will precede restoration. Often these steps are performed sequentially, but more sophisticated tasks will require feedback, i.e., advanced processing steps will pass parameters back to preceding steps so that the processing includes a number of iterative loops.
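As a rough illustration of that sequence (sensing, pre-processing, segmentation, labeling, feature extraction, classification), the following Python sketch classifies connected bright objects in a toy image using a single size feature. It assumes NumPy and SciPy are available; the threshold, the choice of feature, and the class tolerance are made-up values for illustration, not part of any particular system described here.

import numpy as np
from scipy import ndimage

def classify_objects(image, threshold=0.5, size_tolerance=(50, 500)):
    """Toy version of the acquire -> pre-process -> segment -> label ->
    extract features -> classify sequence described above."""
    # Pre-processing: a simple contrast stretch stands in for enhancement/restoration.
    enhanced = (image - image.min()) / (image.max() - image.min() + 1e-12)

    # Segmentation: a global threshold separates objects from the background.
    mask = enhanced > threshold

    # Labeling: give each connected object its own integer label.
    labels, num_objects = ndimage.label(mask)

    # Feature extraction and classification: one feature (object size in pixels),
    # accepted if its value falls inside the tolerance of the target class.
    decisions = {}
    for obj in range(1, num_objects + 1):
        size = int(np.sum(labels == obj))
        lo, hi = size_tolerance
        decisions[obj] = "target class" if lo <= size <= hi else "other"
    return labels, decisions

# A synthetic image with two bright blobs of different sizes.
img = np.zeros((64, 64))
img[10:20, 10:20] = 1.0   # 100-pixel object: inside the size tolerance
img[40:45, 40:45] = 1.0   # 25-pixel object: outside the size tolerance
_, result = classify_objects(img)
print(result)             # {1: 'target class', 2: 'other'}

In a real system each of these stages would of course be far more elaborate (and, as noted above, may feed parameters back to earlier stages), but the division of labour is the same.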
Fig. 1.3 A general classification system

The quality of the features is related to their ability to discriminate examples from different classes. Examples from the same class should have similar feature values, while examples from different classes should have different feature values, i.e., good features should have small intra-class variations and large inter-class variations (Fig. 1.4). The measured features can be transformed or mapped into an alternative feature space, to produce better features, before being sent to the classifier.

Fig. 1.4 A good feature, x, measured for two different classes (blue and red) should have small intra-class variations and large inter-class variations
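The text does not prescribe a particular measure of feature quality, but one standard way to quantify "small intra-class variation and large inter-class variation" for a single feature is Fisher's criterion: the squared difference of the class means divided by the sum of the class variances. The sketch below is illustrative only; the simulated feature values are made up.

import numpy as np

def fisher_score(x_a, x_b):
    """Separability of a single feature measured on two classes: squared
    difference of the class means over the sum of the class variances, so a
    large score means large inter-class and small intra-class variation."""
    x_a, x_b = np.asarray(x_a, dtype=float), np.asarray(x_b, dtype=float)
    return (x_a.mean() - x_b.mean()) ** 2 / (x_a.var() + x_b.var())

# Simulated values of two candidate features for classes A (blue) and B (red).
rng = np.random.default_rng(0)
good_a, good_b = rng.normal(2.0, 0.5, 200), rng.normal(5.0, 0.5, 200)  # well separated
poor_a, poor_b = rng.normal(2.0, 2.0, 200), rng.normal(3.0, 2.0, 200)  # heavily overlapping
print(fisher_score(good_a, good_b))   # large score: a good feature
print(fisher_score(poor_a, poor_b))   # small score: a poor feature

Feature selection, or the transformation into an alternative feature space mentioned above, can then favour features (or combinations of features) with high scores.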
We have assumed that the features are continuous (i.e., quantitative), but they could be categorical or non-metric (i.e., qualitative) instead, which is often the case in data mining. Categorical features can either be nominal (i.e., unordered, e.g., zip codes, employee ID, gender) or ordinal [i.e., ordered, e.g., street numbers, grades, degree of satisfaction (very bad, bad, OK, good, very good)]. There is some ability to move data from one type to another, e.g., continuous data could be discretized into ordinal data, and ordinal data could be assigned integer numbers (although they would lack many of the properties of real numbers, and should be treated more like symbols). The preferred features are always the most informative (and, therefore, in this context, the most discriminating). Given a choice, scientific applications will generally prefer continuous data, since more operations can be performed on them (e.g., mean and standard deviation). With categorical data, there may be doubts as to whether all relevant categories have been accounted for, or the categories may evolve over time.

Humans are adept at recognizing objects within an image, using size, shape, color, and other visual clues. They can do this despite the fact that the objects may appear from different viewpoints and under different lighting conditions, have different sizes, or be rotated. We can even recognize them when they are partially obstructed from view (Fig. 1.5). These tasks are challenging for machine vision systems in general.

Fig. 1.5 Face recognition needs to be able to handle different expressions, lighting, and occlusions

The goal of the classifier is to classify new data (the test data) into one of the classes, each characterized by a decision region. The borders between decision regions are called decision boundaries (Fig. 1.6).

Fig. 1.6 Classes mapped as decision regions, with decision boundaries
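The text does not commit to any particular classifier here, but a minimum-distance (nearest-centroid) rule is perhaps the simplest way to see decision regions and boundaries concretely: each class is represented by the centroid of its training examples, a test point is assigned to the class of the nearest centroid, and the decision boundaries are the perpendicular bisectors between centroids. The feature values in this sketch are invented for illustration.

import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test point to the class with the nearest centroid; the
    implied decision regions are separated by linear decision boundaries."""
    X_train, X_test = np.asarray(X_train, dtype=float), np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance of every test point to every class centroid, then pick the closest.
    distances = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]

# Two classes described by two features; each test point falls in one decision region.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 0
     [3.0, 3.0], [3.1, 2.8], [2.9, 3.2]]   # class 1
y = [0, 0, 0, 1, 1, 1]
print(nearest_centroid_predict(X, y, [[1.1, 1.0], [2.8, 3.1], [2.0, 2.1]]))
# [0 1 1]; the last point lies close to the decision boundary between the regions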
Classification techniques can be divided into two broad areas: statistical or structural (or syntactic) techniques, with a third area that borrows from both, sometimes called cognitive methods, which include neural networks and genetic algorithms. The first area deals with objects or patterns that have an underlying and quantifiable statistical basis for their generation and are described by quantitative features such as length, area, and texture. The second area deals with objects best described by qualitative features capturing the structural or syntactic relationships inherent in the object. Statistical classification methods are more popular than structural methods; cognitive methods have gained popularity over the last decade or so. The models are not necessarily independent, and hybrid systems involving multiple classifiers are increasingly common (Fu 1983).

1.3 Organization of the Book

In Chap. 2, we will look at the classification process in detail and the different approaches to it, and will look at a few examples of classification tasks. In Chap. 3, we will look at non-metric methods such as decision trees; and in Chap. 4, we will consider probability theory, leading to Bayes’ Rule and the roots of statistical pattern recognition. Chapter 5 considers supervised learning, with examples of both parametric and non-parametric learning. Chapter 6 considers the curse of dimensionality and how to keep the number of features to a useful minimum. Chapter 7 considers unsupervised learning techniques, and Chap. 8 looks at ways to evaluate the performance of the various classifiers. Chapter 9 will consider stochastic methods, and Chap. 10 will discuss some interesting classification problems.

By judiciously avoiding some of the details, the material can be covered in a single semester. Alternatively, fully featured (!!) and with a healthy dose of exercises/applications and some project work, it would form the basis for two semesters of work. The independent reader, on the other hand, can follow the material at his or her own pace and should find sufficient amusement for a few months! Enjoy, and happy studying!

1.4 Exercises

1. List a number of applications of classification, additional to those mentioned in the text.

2. Consider the data of four adults, indicating their weight (actually, their mass) and their health status. Devise a simple classifier that can properly classify all four patterns.

Weight (kg)   Class label
50            Unhealthy
60            Healthy
70            Healthy
80            Unhealthy

How is a fifth adult of weight 76 kg classified using this classifier?
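If you want to experiment with Exercise 2 in code, the sketch below implements one possible simple classifier for this kind of one-dimensional data, a 1-nearest-neighbour rule; it is only an illustration, not the only (or necessarily the intended) answer.

def nearest_neighbour_label(weight, training_data):
    """Return the class label of the training example whose weight is
    closest to the query weight (a 1-nearest-neighbour rule)."""
    closest_weight, label = min(training_data, key=lambda pair: abs(pair[0] - weight))
    return label

# The four training patterns from Exercise 2.
data = [(50, "Unhealthy"), (60, "Healthy"), (70, "Healthy"), (80, "Unhealthy")]
print(nearest_neighbour_label(76, data))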
3. Consider the following items bought in a supermarket and some of their characteristics:

Item no.   Cost ($)   Volume (cm³)   Color   Class label
1          20         6              Blue    Inexpensive
2          50         8              Blue    Inexpensive
3          90         10             Blue    Inexpensive
4          100        20             Red     Expensive
5          160        25             Red     Expensive
6          180        30             Red     Expensive

Which of the three features (cost, volume, and color) is the best classifier?

4. Consider the problem of classifying objects into circles and ellipses. How would you classify such objects?

References

Alpaydin, E.: Introduction to Machine Learning, 2nd edn. MIT Press, Cambridge (2010)
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (2006)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)
Fu, K.S.: A step towards unification of syntactic and statistical pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 5, 200–205 (1983)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)
McLachlan, G.J.: Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York (1992)
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, New York (2002)