Chapter 2 Classification

2.1 The Classification Process

A general classification system, without feedback between stages, is shown in Fig. 2.1. The sensing/acquisition stage uses a transducer such as a camera or a microphone. The acquired signal (e.g., an image) must be of sufficient quality that distinguishing "features" can be adequately measured. This will depend on the characteristics of the transducer; for a camera it would include its resolution, dynamic range, sensitivity, distortion, signal-to-noise ratio, focus, and so on.

Fig. 2.1 A general classification process

Pre-processing is often used to condition the image for segmentation. For example, smoothing of the image (e.g., by convolution with a Gaussian mask) mitigates the confounding effect of noise on segmentation by thresholding, since the random fluctuations comprising noise can shift pixels across a threshold and cause them to be misclassified. Pre-processing with a median mask effectively removes shot (i.e., salt-and-pepper) noise. Removal of a variable background brightness and histogram equalization are often used to ensure even illumination. Depending on the circumstances, we may also have to handle missing data (Batista and Monard 2003) and detect and handle outlier data (Hodge and Austin 2004).

Segmentation partitions an image into regions that are meaningful for a particular task: the foreground, comprising the objects of interest, and the background, everything else. There are two major approaches: region-based methods, in which similarities are detected, and boundary-based methods, in which discontinuities (edges) are detected and linked to form continuous boundaries around regions.

Region-based methods find connected regions based on some similarity of the pixels within them. The most basic feature for defining regions is image gray level or brightness, but other features such as color or texture can also be used. However, if we require that the pixels in a region be very similar, we may oversegment the image; if we allow too much dissimilarity, we may merge what should be separate objects. The goal is to find regions that correspond to objects as humans see them, which is not an easy goal.
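To make the pre-processing steps described above concrete, the following is a minimal sketch using SciPy's ndimage module: Gaussian smoothing to suppress random noise, a median filter for shot noise, and subtraction of a heavily blurred copy of the image to remove a variable background. The function name and all parameter values are illustrative choices, not prescriptions from the text.

```python
from scipy import ndimage as ndi

def preprocess(image, sigma=2.0, median_size=3, background_sigma=40.0):
    """Condition a grayscale image for segmentation (illustrative parameters)."""
    image = image.astype(float)
    # Gaussian smoothing mitigates the random fluctuations that could push
    # pixels across a threshold during segmentation
    smoothed = ndi.gaussian_filter(image, sigma=sigma)
    # A median filter removes shot (salt-and-pepper) noise
    despeckled = ndi.median_filter(smoothed, size=median_size)
    # A heavily blurred copy approximates a slowly varying background,
    # which is then subtracted to even out the illumination
    background = ndi.gaussian_filter(despeckled, sigma=background_sigma)
    return despeckled - background
```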
Region-based methods include thresholding, using either a global or a locally adaptive threshold, or an optimal threshold (e.g., Otsu, isodata, or maximum entropy thresholding). If this results in overlapping objects, thresholding the distance transform of the image or using the watershed algorithm can help to separate them. Other region-based methods include region growing (a bottom-up approach using "seed" pixels) and split-and-merge (a top-down quadtree-based approach).

Boundary-based methods tend to use either an edge detector (e.g., the Canny detector) with edge linking to bridge any breaks in the edges, or boundary tracking to form continuous boundaries. Alternatively, an active contour (or snake) can be used; this is a controlled-continuity contour which elastically snaps around and encloses a target object by locking on to its edges.

Segmentation provides a simplified, binary image that separates objects of interest (foreground) from the background, while retaining their shape and size for later measurement. The foreground pixels are set to "1" (white), and the background pixels to "0" (black). It is often desirable to label the objects in the image with discrete numbers. Connected components labeling scans the segmented, binary image and groups its pixels into components based on pixel connectivity, i.e., all pixels in a connected component share similar pixel values and are in some way connected with each other. Once all groups have been determined, each pixel is labeled with a number (1, 2, 3, ...), according to the component to which it was assigned, and these numbers can be looked up as gray levels or colors for display (Fig. 2.2).

One obvious result of labeling is that the objects in an image can be readily counted. More generally, the labeled binary objects can be used to mask the original image to isolate each (grayscale) object but retain its original pixel values so that its properties or features can be measured separately. Masking can be performed in several different ways. The binary mask can be used in an overlay, or alpha channel, in the display hardware to prevent pixels from being displayed. It is also possible to use the mask to modify the stored image, either by multiplying the grayscale image by the binary mask or by bit-wise ANDing the original image with the binary mask. Isolating objects in this way, so that their features can be measured independently, is the basis of region-of-interest (RoI) processing.

Post-processing of the segmented image can be used to prepare it for feature extraction. For example, partial objects can be removed from around the periphery of the image (e.g., Fig. 2.2e), disconnected objects can be merged, objects smaller or larger than certain limits can be removed, or holes in the objects or background can be filled by morphological opening or closing.
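A minimal sketch of this region-based route, assuming scikit-image and SciPy are available: a global Otsu threshold gives the binary image, and the distance transform plus the watershed algorithm separates touching objects. The function name and the peak-detection parameters are illustrative, not part of the text.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def segment(image):
    """Global Otsu threshold, then watershed on the distance transform
    to separate touching objects (parameter values are illustrative)."""
    # Foreground pixels become 1 (True), background pixels 0 (False)
    binary = image > threshold_otsu(image)
    # Distance transform: each foreground pixel gets its distance to the background
    distance = ndi.distance_transform_edt(binary)
    # Peaks of the distance map seed one marker per (presumed) object
    coords = peak_local_max(distance, min_distance=5, labels=binary)
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    # Watershed floods the negated distance map from the markers,
    # splitting objects that touch or overlap
    labels = watershed(-distance, markers, mask=binary)
    return binary, labels
```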
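Continuing the sketch, the labeling, masking, and post-processing steps just described might look as follows. The function name and the minimum object size are illustrative, and hole filling plus a small-object filter stand in here for the morphological opening/closing mentioned above.

```python
from scipy import ndimage as ndi
from skimage.segmentation import clear_border
from skimage.morphology import remove_small_objects

def label_and_mask(binary, grayscale, min_size=50):
    """Post-process a binary segmentation, label its objects, and mask the
    original grayscale image for RoI measurement (sizes are illustrative)."""
    # Remove partial objects touching the image border (cf. Fig. 2.2e)
    cleaned = clear_border(binary)
    # Fill holes inside objects and discard objects below a size limit
    cleaned = ndi.binary_fill_holes(cleaned)
    cleaned = remove_small_objects(cleaned, min_size=min_size)
    # Connected components labeling: pixels of each object get the value 1, 2, 3, ...
    labels, num_objects = ndi.label(cleaned)
    # Multiplying the grayscale image by the binary mask (labels == k) isolates
    # object k while retaining its original pixel values for measurement
    rois = [grayscale * (labels == k) for k in range(1, num_objects + 1)]
    return labels, num_objects, rois
```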
Fig. 2.2 (a) Original image, (b) variable background [from blurring (a)], (c) improved image [= (a) - (b)], (d) segmented image [Otsu thresholding of (c)], (e) partial objects removed from (d), (f) labeled components image, (g) color-coded labeled components image

2.2 Features

The next stage is feature extraction. Features are characteristic properties of the objects whose value should be similar for objects in a particular class, and different from the values for objects in another class (or from the background). Features may be continuous (i.e., with numerical values) or categorical (i.e., with labeled values). Examples of continuous variables would be length, area, and texture. Categorical features are either ordinal [where the order of the labeling is meaningful (e.g., class standing, military rank, level of satisfaction)] or nominal [where the ordering is not meaningful (e.g., name, zip code, department)].
Fig. 2.3 (a) Image and (b) its skeleton (red), with its branch points (white) and end points (green) circled

The choice of appropriate features depends on the particular image and the application at hand. However, they should be:

• Robust (i.e., they should normally be invariant to translation, orientation (rotation), scale, and illumination; well-designed features will also be at least partially invariant to the presence of noise and artifacts, which may require some pre-processing of the image)
• Discriminating (i.e., the range of values for objects in different classes should be different and preferably well separated and non-overlapping)
• Reliable (i.e., all objects of the same class should have similar values)
• Independent (i.e., uncorrelated; as a counter-example, length and area are correlated, and it would be wasteful to consider both as separate features)

Features are higher-level representations of structure and shape. Structural features include:

• Measurements obtainable from the gray-level histogram of an object (using region-of-interest processing), such as its mean pixel value (grayness or color) and its standard deviation, its contrast, and its entropy
• The texture of an object, using either statistical moments of the gray-level histogram of the object or its fractal dimension

Shape features include:

• The size or area, A, of an object, obtained directly from the number of pixels comprising each object, and its perimeter, P (obtained from its chain code)
• Its circularity (a ratio of perimeter^2 to area, or area to perimeter^2, or a scaled version such as 4πA/P^2)
• Its aspect ratio (i.e., the ratio of the feret diameters, given by placing a bounding box around the object)
• Its skeleton or medial axis transform, or points within it such as branch points and end points, which can be obtained by counting the number of neighboring pixels on the skeleton (viz., 3 and 1, respectively) (e.g., Fig. 2.3)
• The Euler number: the number of connected components (i.e., objects) minus the number of holes in the image
• Statistical moments of the boundary (1D) or area (2D): the (m, n)th moment of a 2D discrete function, f(x, y), such as a digital image with M × N pixels, is defined as

  m_{mn} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^m y^n f(x, y)   (2.1)

  where m_{00} is the sum of the pixels of the image; for a binary image, it is equal to its area. The centroid, or center of gravity, of the image, (\mu_x, \mu_y), is given by (m_{10}/m_{00}, m_{01}/m_{00}). The central moments (viz., taken about the mean) are given by

  \mu_{mn} = \sum_{x=1}^{M} \sum_{y=1}^{N} (x - \mu_x)^m (y - \mu_y)^n f(x, y)   (2.2)

  where \mu_{20} and \mu_{02} are the variances of x and y, respectively, and \mu_{11} is the covariance between x and y. The covariance matrix, C or cov(x, y), is

  C = \begin{pmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{pmatrix}   (2.3)

  from which shape features can be computed.

The reader should consider what features would separate out the nuts (some face-on and some edge-on) and the bolts in Fig. 2.4.

Fig. 2.4 Image containing nuts and bolts

A feature vector, x, is a vector containing the measured features, x1, x2, ..., xn.
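As a check on the notation, the following sketch implements Eqs. (2.1)-(2.3) directly with NumPy (taking x to index rows and y to index columns, and summing from 1 as in the equations; that indexing convention is an assumption), and then measures several of the shape and structural features listed in this section with scikit-image's regionprops. The function names and the particular selection of features are illustrative.

```python
import numpy as np
from skimage.measure import regionprops

def moments_and_covariance(f):
    """Raw moments (2.1), centroid, central moments (2.2), and the covariance
    matrix C of (2.3) for a 2D image f; the sums use NumPy broadcasting."""
    M, N = f.shape
    x = np.arange(1, M + 1, dtype=float).reshape(M, 1)  # x runs over rows, 1..M
    y = np.arange(1, N + 1, dtype=float).reshape(1, N)  # y runs over columns, 1..N

    def m(p, q):                  # raw moment m_pq, Eq. (2.1)
        return float(np.sum(x**p * y**q * f))

    m00 = m(0, 0)                 # sum of pixels (the area, for a binary image)
    mu_x, mu_y = m(1, 0) / m00, m(0, 1) / m00           # centroid (mu_x, mu_y)

    def mu(p, q):                 # central moment mu_pq, Eq. (2.2)
        return float(np.sum((x - mu_x)**p * (y - mu_y)**q * f))

    C = np.array([[mu(2, 0), mu(1, 1)],                 # covariance matrix, Eq. (2.3);
                  [mu(1, 1), mu(0, 2)]])                # divide by m00 for per-pixel variances
    return (mu_x, mu_y), C

def shape_features(labels, grayscale):
    """A few of the features listed above, via scikit-image's regionprops
    (the selection is illustrative)."""
    rows = []
    for r in regionprops(labels):
        circularity = 4 * np.pi * r.area / r.perimeter**2 if r.perimeter else 0.0
        min_r, min_c, max_r, max_c = r.bbox
        sides = sorted((max_r - min_r, max_c - min_c))
        aspect = sides[1] / sides[0]                    # bounding-box aspect ratio
        pixels = grayscale[labels == r.label]           # RoI pixel values
        rows.append([r.area, r.perimeter, circularity, aspect,
                     r.euler_number, pixels.mean(), pixels.std()])
    return np.array(rows)          # one feature vector per labeled object
```

The eigenvectors of C give the directions of the object's principal axes, one example of a shape feature that can be computed from the covariance matrix.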