FIGURE 17.7 Vector quantization of a 2-D feature space.

The final step in image encoding is entropy coding. Here we convert the stream of prototype feature vectors to a binary stream of 0's and 1's. Ideally, we would like to perform this conversion in a manner that yields the minimum average number of binary digits per prototype feature vector. In 1948, Claude Shannon proved that it is possible to code a discrete memoryless source using on the average as few binary digits per source symbol as the source entropy, defined as

$$H = -\sum_n p_n \log_2 p_n$$

Here p_n denotes the probability or relative frequency of occurrence of the nth symbol in the source alphabet, and log2(x) = ln(x)/ln(2) is the base 2 logarithm of x. The units of H are bits/source symbol. The proof of Shannon's theorem is based on grouping the source symbols into large blocks and assigning binary code words of varying length to each block of source symbols. More probable blocks of source symbols are assigned shorter code words, whereas less probable blocks are assigned longer code words. As the block length approaches infinity, the bit rate tends to H.

Huffman determined the optimum variable-length coding scheme for a discrete memoryless source using blocks of any finite length. Table 17.1 provides an example illustrating the concept of source coding. The source alphabet contains eight symbols with the probabilities indicated. For convenience, these symbols have been labeled in order of decreasing probability. In the context of image encoding, the source alphabet would simply consist of the prototype feature vectors generated by the vector quantizer. The entropy of this source is 2.31 bits/source symbol. If we were to use a fixed-length code for this source, we would need three binary digits for each source symbol, as shown in Table 17.1. On the other hand, the code words for the Huffman code contain from 1 to 5 code letters (binary digits). In this case, the average code word length

$$\bar{l} = \sum_n p_n l_n$$

is 2.31 binary digits. Here l_n is the number of code letters in the code word for the source symbol a_n. This is the average number of binary digits per source symbol that would be needed to encode the source, and it is equal to the entropy. Thus, for this particular source, the Huffman code achieves the lower bound. It can be shown that in general the rate for the Huffman code will always be within 1 binary digit of the source entropy.
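As a concrete check, the entropy and the average Huffman code word length for this source may be computed directly from the probabilities and code word lengths of Table 17.1. The following is a minimal Python sketch; the variable names are illustrative, not from the text.

```python
import math

# Probabilities of the eight source symbols a1..a8 (Table 17.1)
p = [1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32, 1/32]
# Lengths of the Huffman code words 0, 100, 101, 1100, 1101, 1110, 11110, 11111
l = [1, 3, 3, 4, 4, 4, 5, 5]

# Source entropy H = -sum_n p_n log2 p_n, in bits per source symbol
H = -sum(pn * math.log2(pn) for pn in p)
# Average code word length l_bar = sum_n p_n l_n, in binary digits per symbol
l_bar = sum(pn * ln for pn, ln in zip(p, l))

print(H, l_bar)   # both equal 2.3125
```

Both quantities come out to 2.3125, which rounds to the 2.31 quoted above; the Huffman code meets the entropy bound exactly here because every symbol probability is a power of 1/2.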
By grouping source symbols into blocks of length L and assigning code words to each block, this maximum distance can be decreased to 1/L binary digits. Note the subtle distinction here between bits, which are units of information and hence a property of the source alone, and binary digits, which are units of code word length and hence a property only of the code used to represent the source. Also note that the Huffman code satisfies the prefix condition, i.e., no code word is the prefix of another, longer code word. This means that a stream of 0's and 1's may be uniquely decoded into the corresponding sequence of source symbols without the need for markers to delineate boundaries between code words.

The Huffman code is determined from the binary tree shown in Fig. 17.8. This tree is constructed recursively by combining the two least probable symbols in the alphabet into one composite symbol whose probability of occurrence is the sum of the probabilities of the two symbols that it represents. The code words for these two symbols are the same as that of the composite symbol with a 0 or a 1 appended at the end to distinguish between them. This procedure is repeated until the reduced alphabet contains only a single code word. The code word for a particular source symbol is then determined by traversing the tree from its root to the leaf node for that source symbol.

TABLE 17.1 A Discrete Source with an Eight-Symbol Alphabet and Two Schemes for Encoding It

Source Symbol   Probability p_n   Fixed-Length Code   Huffman Code
a1              1/2               000                 0
a2              1/8               001                 100
a3              1/8               010                 101
a4              1/16              011                 1100
a5              1/16              100                 1101
a6              1/16              101                 1110
a7              1/32              110                 11110
a8              1/32              111                 11111

H = 2.31 bits/source symbol; average length: 3 binary digits/source symbol for the fixed-length code, 2.31 binary digits/source symbol for the Huffman code.

FIGURE 17.8 Binary tree used to generate the Huffman code for the source shown in Table 17.1.
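The recursive tree construction just described translates directly into a short program. The sketch below uses a priority queue and builds the code bottom-up, prepending the distinguishing bit at each merge (equivalent to appending it while descending from the root); the symbol labels and tie-breaking order are arbitrary choices, so individual code words may differ from Table 17.1 even though the code word lengths will not.

```python
import heapq

def huffman_code(probs):
    """Huffman code for a dict {symbol: probability}: repeatedly merge the
    two least probable (possibly composite) symbols, distinguishing the
    members of each merged pair by one extra binary digit."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable symbol
        p1, _, c1 = heapq.heappop(heap)   # next least probable symbol
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = {"a%d" % (n + 1): p for n, p in
         enumerate([1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32, 1/32])}
code = huffman_code(probs)
print(code)

# Prefix condition: a 0/1 stream decodes uniquely with no separators
decoder = {w: s for s, w in code.items()}
stream = "".join(code[s] for s in ["a1", "a4", "a2", "a1"])
decoded, word = [], ""
for bit in stream:
    word += bit
    if word in decoder:       # no code word is a prefix of another
        decoded.append(decoder[word])
        word = ""
print(stream, "->", decoded)
```

The decoding loop at the end illustrates the prefix condition: because no code word is a prefix of another, the first match encountered while scanning the stream is always the correct one.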
Reconstruction

The objective of image reconstruction is to compute an unknown image from many complex measurements of the image. Usually, each measurement depends on many pixels in the image, which may be spatially distant from one another.

A typical reconstruction problem is tomography, in which each measurement is obtained by integrating the pixel values along a ray through the image. Figure 17.9 illustrates the measurement of these ray integrals in the projection process. For each angle θ a set of ray integrals is computed by varying the position t at which the ray passes through the image. The points along a ray are given by all the solutions (x, y) to the equation

t = x cos θ + y sin θ

FIGURE 17.9 Projection data for angle θ, resulting in the one-dimensional function p(θ, t).

We may therefore compute the ideal projection integrals by the following expression, known as the Radon transform:

$$p(\theta, t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(t - x\cos\theta - y\sin\theta)\,dx\,dy \qquad (17.2)$$

where δ(t − x cos θ − y sin θ) is an impulse function that is nonzero along the projection ray. In practice, these projection integrals may be measured using a variety of physical techniques.

In transmission tomography, λ_T photons are emitted into an object under test. A detector then counts the number of photons, λ(θ, t), which pass through the object without being absorbed. Collimators are used to ensure the detected energy passes straight through the object along the desired path. Since the attenuation of energy as it passes through the object is exponentially related to the integral of the object's density, the projection integral may be computed from the formula

$$p(\theta, t) = -\log\left(\frac{\lambda(\theta, t)}{\lambda_T}\right)$$

In emission tomography, one wishes to measure the rate of photon emission at each pixel. In this case, various methods may be used to collect and count all the photons emitted along a ray passing through the object.
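Equation (17.2) has a simple discrete analog: approximate the ray integral by accumulating pixel values into bins of t = x cos θ + y sin θ. The nearest-neighbor sketch below is for illustration only (practical projectors use careful ray interpolation), and the test image is an arbitrary made-up array.

```python
import numpy as np

def radon_projection(f, theta, n_bins=None):
    """Approximate p(theta, t) = integral of f(x, y) along rays
    t = x cos(theta) + y sin(theta), by summing pixel values whose
    rays fall into the same bin of t (nearest-neighbor binning)."""
    ny, nx = f.shape
    y, x = np.mgrid[0:ny, 0:nx]
    x = x - (nx - 1) / 2.0            # center the coordinate system
    y = y - (ny - 1) / 2.0
    t = x * np.cos(theta) + y * np.sin(theta)   # ray position of each pixel
    if n_bins is None:
        n_bins = int(np.ceil(np.hypot(nx, ny)))
    half = np.hypot(nx, ny) / 2.0               # range covering every t
    p, edges = np.histogram(t, bins=n_bins, range=(-half, half), weights=f)
    return p, edges

# Arbitrary test image: a bright centered square
f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0
p0, _ = radon_projection(f, 0.0)
p45, _ = radon_projection(f, np.pi / 4)
print(p0.sum(), p45.sum())   # total mass is preserved at every angle (256.0)
```

Whatever the angle, the bins partition all of the pixels, so the total of each projection equals the total mass of the image, as the Radon transform requires.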
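For transmission measurements, the conversion from detected photon counts to projection integrals is a direct application of the logarithmic formula above. A minimal sketch follows; the count values and the emitted-photon total λ_T are made-up numbers.

```python
import numpy as np

lambda_T = 1.0e6                 # photons emitted into the object per ray
lam = np.array([9.0e5, 5.0e5, 2.5e5, 1.0e5])   # detected counts λ(θ, t)

# p(θ, t) = -log(λ(θ, t) / λ_T): denser material absorbs more photons,
# so smaller detected counts map to larger projection values
p = -np.log(lam / lambda_T)
print(p)   # [0.105  0.693  1.386  2.303]
```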
Once the projections p(θ, t) have been measured, the objective is to compute the unknown cross section f(x, y). The image and projections may be related by first computing the Fourier transform of the 2-D image

$$F(\omega_x, \omega_y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, e^{-j(\omega_x x + \omega_y y)}\,dx\,dy$$

and the 1-D projection for each angle

$$P(\theta, \omega) = \int_{-\infty}^{\infty} p(\theta, t)\, e^{-j\omega t}\,dt$$

These two transforms are then related by the Fourier slice theorem:

$$F(\omega\cos\theta, \omega\sin\theta) = P(\theta, \omega)$$

In words, P(θ, ω) corresponds to the value of the 2-D Fourier transform F(ω_x, ω_y) along a 1-D line at an angle of θ passing through the origin.

The Fourier slice theorem may be used to develop two methods for inverting the Radon transform and thereby computing the image f. The first method, known as filtered back projection, computes the inverse Fourier transform in polar coordinates using the transformed projection data:

$$f(x, y) = \frac{1}{2\pi}\int_0^{\pi}\int_{-\infty}^{\infty} P(\theta, \omega)\,|\omega|\, e^{j\omega(x\cos\theta + y\sin\theta)}\,d\omega\,d\theta$$

Notice that the |ω| term accounts for the integration in polar coordinates.

A second inversion method results from performing all computations in the space domain rather than first transforming to the frequency domain. This can be done by expressing the inner integral of filtered back projection as a convolution in the space domain:

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} P(\theta, \omega)\,|\omega|\, e^{j\omega s}\,d\omega = \int_{-\infty}^{\infty} p(\theta, t)\, h(s - t)\,dt$$

Here h(t) is the inverse Fourier transform of |ω|. This results in the inversion formula known as convolution back projection:

$$f(x, y) = \int_0^{\pi}\int_{-\infty}^{\infty} p(\theta, t)\, h(x\cos\theta + y\sin\theta - t)\,dt\,d\theta$$

In practice, h must be a low-pass approximation to the true inverse Fourier transform of |ω|; this is necessary to suppress noise in the projection data. The choice of h is the most important element in the design of the reconstruction algorithm.
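The Fourier slice theorem is easy to verify numerically in the discrete setting. At θ = 0 the projection is just a column sum, and its 1-D DFT must equal the ω_y = 0 row of the image's 2-D DFT; the small sketch below checks exactly that on an arbitrary test image.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((64, 64))        # arbitrary test image, indexed f[y, x]

p0 = f.sum(axis=0)              # projection at θ = 0: integrate over y
slice0 = np.fft.fft2(f)[0, :]   # slice of F(ω_x, ω_y) along the ω_y = 0 line

print(np.allclose(np.fft.fft(p0), slice0))   # True, as the theorem predicts
```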
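The two inversion formulas also suggest a compact reconstruction sketch: ramp-filter each projection with a band-limited approximation to |ω| (the low-pass h discussed above) and smear the result back along its rays. Everything below is illustrative only; the geometry is nearest-neighbor, the disk phantom is made up, and no claim of production-grade accuracy is intended.

```python
import numpy as np

def ramp_filter(p):
    """Band-limited approximation to the |ω| filter, applied in frequency."""
    omega = np.abs(np.fft.fftfreq(len(p)))            # discrete |ω|
    return np.real(np.fft.ifft(np.fft.fft(p) * omega))

def back_project(projections, thetas, n):
    """f(x,y) ≈ (π/Nθ) Σ_θ q(θ, x cosθ + y sinθ), q = filtered projection."""
    y, x = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
    f = np.zeros((n, n))
    for p, theta in zip(projections, thetas):
        q = ramp_filter(p)
        t = x * np.cos(theta) + y * np.sin(theta)
        idx = np.round(t + (len(q) - 1) / 2.0).astype(int)  # nearest ray bin
        ok = (idx >= 0) & (idx < len(q))
        f[ok] += q[idx[ok]]
    return f * (np.pi / len(thetas))

# Made-up phantom: a centered disk; project it at many angles, then invert
n = 64
y, x = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
phantom = ((x**2 + y**2) <= 15.0**2).astype(float)

thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
projections = []
for theta in thetas:
    t = x * np.cos(theta) + y * np.sin(theta)
    p, _ = np.histogram(t, bins=n, range=(-n / 2.0, n / 2.0), weights=phantom)
    projections.append(p)

recon = back_project(projections, thetas, n)
print(recon[n // 2, n // 2], recon[4, 4])   # inside the disk >> outside it
```

Because the discrete ramp is zero at DC and rolls off at the band edge, it already acts as the low-pass approximation to |ω| that the text calls for; sharper windows (e.g., multiplying the ramp by a cosine taper) trade resolution against noise suppression.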
Edge Detection

The ability to find gray level edge structures in images is an important image processing operation. We shall define an edge to be a region in the image where there is a large change in gray level over a relatively small spatial region. The process of finding edge locations in digital images is known as edge detection. Most edge detection operators, also known as edge operators, use a window operator to first enhance the edges in the image, followed by thresholding the enhanced image. There has been a great deal of research performed in the area of edge detection. Some of the research issues include robust threshold selection, window size selection, noise response, edge linking, and the detection of edges in moving objects.

While it is beyond the scope of this section to discuss these issues in detail, it is obvious that such things as threshold selection will greatly affect the performance of the edge detection algorithm. If the threshold is set too high, then many edge points will be missed; if it is set too low, then many "false" edge points will be obtained because of the inherent noise in the image. The investigation of the "optimal" choice of the threshold is an important research area. Selection of the particular window operation to enhance the edges of an image, as an initial step in edge detection, has recently been based on using models of the performance of the human visual system in detecting edges.
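The enhance-then-threshold structure described above can be sketched with one common choice of window operator, the 3 × 3 Sobel gradient masks; the chapter does not prescribe a particular operator, and the threshold below is an arbitrary illustrative value.

```python
import numpy as np

def sobel_edges(img, threshold):
    """Window-operator edge detection: enhance with Sobel gradients,
    then threshold the gradient magnitude."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal change
    ky = kx.T                                                   # vertical change
    ny, nx = img.shape
    gx = np.zeros((ny, nx))
    gy = np.zeros((ny, nx))
    for i in range(1, ny - 1):          # slide the 3x3 window over the image
        for j in range(1, nx - 1):
            win = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(win * kx)
            gy[i, j] = np.sum(win * ky)
    mag = np.hypot(gx, gy)              # edge-enhanced image
    return mag > threshold              # True at detected edge points

# Toy example: a bright square on a dark background
img = np.zeros((16, 16))
img[4:12, 4:12] = 100.0
edges = sobel_edges(img, threshold=150.0)
print(np.argwhere(edges)[:4])          # edge points lie on the square border
```

Raising the threshold above the step response (here 400 at the square's sides) would miss every edge point, while lowering it toward zero would flag noise in a real image, which is exactly the trade-off discussed above.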
Analysis and Computer Vision

The process of extracting useful measurements from an image or sequence of images is known as image analysis or computer vision. Before analysis can be performed, one must first determine pertinent features or attributes of the object in the scene and extract information about these features. The features to be measured must be chosen a priori, based on empirical results. Most features used consist of shape properties, shape change properties, shading, texture, motion, depth, and color. After the features are extracted, one must then use the feature measurements to determine scene characteristics such as object identification.

In the past, simple pattern recognition algorithms, e.g., nearest-neighbor classification, have been used to compare the feature measurements of an image to a set of feature measurements that correspond to a known object. A decision is then made as to whether or not the features of the image match those of the known type. Recently, there has been work in the application of artificial intelligence techniques to image analysis. These approaches differ from classical statistical pattern recognition in that the feature measurements are used in a different manner, as part of a larger system that attempts to model the scene and then determine what is in it based on the model.

Defining Terms

Digital image: An array of numbers representing the spatial distribution of energy in a scene, obtained by a process of sampling and quantization.
Edge: A localized region of rapid change in gray level in the image.
Entropy: A measure of the minimum amount of information required on the average to store or transmit each quantized feature vector.
Image compression or coding: The process of reducing the number of binary digits or bits required to represent the image.
Image enhancement: An image processing operation that is intended to improve the visual quality of the image or to emphasize certain features.
Image feature: An attribute of a block of image pixels.
Image reconstruction: The process of obtaining an image from nonimage data that characterizes that image.
Lossless vs. lossy compression: If the reconstructed or decoded image is identical to the original, the compression scheme is lossless. Otherwise, it is lossy.
Pixel: A single sample or picture element in the digital image, located at specific spatial coordinates.
Point operation: An image processing operation in which individual pixels are mapped to new values irrespective of the values of any neighboring pixels.
Projection: A set of parallel line integrals across the image oriented at a particular angle.
Quantization: The process of converting from a continuous-amplitude image to an image that takes on only a finite number of different amplitude values.
Sampling: The process of converting from a continuous-parameter image to a discrete-parameter image by discretizing the spatial coordinate.
Tomography: The process of reconstructing an image from projection data.
Vector quantization: The process of replacing an exact vector of features by a prototype vector that is used to represent all feature vectors contained within a cluster.
Window operation: An image processing operation in which the new value assigned to a given pixel depends on all the pixels within a window centered at that pixel location.

Related Topics

15.1 Coding, Transmission, and Storage • 73.6 Data Compression