Delp, E.J., Allebach, J., Bouman, C.A., Rajala, S.A., Bose, N.K., Sibul, L.H., Wolf, W., Zhang, Y.-Q. "Multidimensional Signal Processing." The Electrical Engineering Handbook. Ed. Richard C. Dorf. Boca Raton: CRC Press LLC, 2000.
© 2000 by CRC Press LLC

17 Multidimensional Signal Processing

17.1 Digital Image Processing
Image Capture • Point Operations • Image Enhancement • Digital Image Compression • Reconstruction • Edge Detection • Analysis and Computer Vision
17.2 Video Signal Processing
Sampling • Quantization • Vector Quantization • Video Compression • Information-Preserving Coders • Predictive Coding • Motion-Compensated Predictive Coding • Transform Coding • Subband Coding • HDTV • Motion Estimation Techniques • Token Matching Methods • Image Quality and Visual Perception • Visual Perception
17.3 Sensor Array Processing
Spatial Arrays, Beamformers, and FIR Filters • Discrete Arrays for Beamforming • Discrete Arrays and Polynomials • Velocity Filtering
17.4 Video Processing Architectures
Computational Techniques • Heterogeneous Multiprocessors • Video Signal Processors • Instruction Set Extensions
17.5 MPEG-4 Based Multimedia Information System
MPEG-4 Multimedia System

17.1 Digital Image Processing
Edward J. Delp, Jan Allebach, and Charles A. Bouman

What is a digital image? What is digital image processing? Why does the use of computers to process pictures seem to be everywhere? The space program, robots, and even people with personal computers are using digital image processing techniques. In this section we shall describe what a digital image is, how one obtains digital images, what the problems with digital images are (they are not trouble-free), and finally how these images are used by computers. A discussion of processing the images is presented later in the section. At the end of this section is a bibliography of selected references on digital image processing. The use of computers to process pictures is about 30 years old. While some work was done more than 50 years ago, the year 1960 is usually the accepted date when serious work was started in such areas as optical character recognition, image coding, and the space program.
Contributors: Edward J. Delp, Purdue University; Jan Allebach, Purdue University; Charles A. Bouman, Purdue University; Sarah A. Rajala, North Carolina State University; N. K. Bose, Pennsylvania State University; L. H. Sibul, Pennsylvania State University; Wayne Wolf, Princeton University; Ya-Qin Zhang, Microsoft Research, China

NASA's Ranger moon mission was one of the first programs to return digital images from space. The Jet Propulsion Laboratory (JPL) established one of the early general-purpose image processing facilities using second-generation computer technology. The early attempts at digital image processing were hampered because of the relatively slow computers used, i.e., the IBM 7094, the fact that computer time itself was expensive, and that image digitizers had to be built by the research centers. It was not until the late 1960s that image processing hardware was generally available (although expensive). Today it is possible to put together a small laboratory system for less than $60,000; a system based on a popular home computer can be assembled for about $5,000. As the cost of computer hardware
decreases, more uses of digital image processing will appear in all facets of life. Some people have predicted that by the turn of the century at least 50% of the images we handle in our private and professional lives will have been processed on a computer.

Image Capture

A digital image is nothing more than a matrix of numbers. The question is how does this matrix represent a real image that one sees on a computer screen? Like all imaging processes, whether they are analog or digital, one first starts with a sensor (or transducer) that converts the original imaging energy into an electrical signal. These sensors, for instance, could be the photomultiplier tubes used in an x-ray system that convert the x-ray energy into a known electrical voltage. The transducer system used in ultrasound imaging is an example where sound pressure is converted to electrical energy; a simple TV camera is perhaps the most ubiquitous example. An important fact to note is that the process of conversion from one energy form to an electrical signal is not necessarily a linear process. In other words, a proportional change in the input energy to the sensor will not always cause the same proportional change in the output electrical signal. In many cases calibration data are obtained in the laboratory so that the relationship between the input energy and output electrical signal is known. These data are necessary because some transducer performance characteristics change with age and other usage factors. The sensor is not the only thing needed to form an image in an imaging system. The sensor must have some spatial extent before an image is formed. By spatial extent we mean that the sensor must not be a simple point source examining only one location of energy output. To explain this further, let us examine two types of imaging sensors used in imaging: a CCD video camera and the ultrasound transducer used in many medical imaging applications.
The CCD camera consists of an array of light sensors known as charge-coupled devices. The image is formed by examining the output of each sensor in a preset order for a finite time. The electronics of the system then forms an electrical signal which produces an image that is shown on a cathode-ray tube (CRT) display. The image is formed because there is an array of sensors, each one examining only one spatial location of the region to be sensed. The process of sampling the output of the sensor array in a particular order is known as scanning. Scanning is the typical method used to convert a two-dimensional energy signal or image to a one-dimensional electrical signal that can be handled by the computer. (An image can be thought of as an energy field with spatial extent.) Another form of scanning is used in ultrasonic imaging. In this application there is only one sensor instead of an array of sensors. The ultrasound transducer is moved or steered (either mechanically or electrically) to various spatial locations on the patient’s chest or stomach. As the sensor is moved to each location, the output electrical signal of the sensor is sampled and the electronics of the system then form a television-like signal which is displayed. Nearly all the transducers used in imaging form an image by either using an array of sensors or a single sensor that is moved to each spatial location. One immediately observes that both of the approaches discussed above are equivalent in that the energy is sensed at various spatial locations of the object to be imaged. This energy is then converted to an electrical signal by the transducer. The image formation processes just described are classical analog image formation, with the distance between the sensor locations limiting the spatial resolution in the system. In the array sensors, resolution is determined by how close the sensors are located in the array. 
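The scanning idea just described — reading out an array of sensors in a preset order to turn a two-dimensional energy field into a one-dimensional signal — can be sketched in a few lines of Python. The grid values and the row-by-row (raster) order here are illustrative assumptions, not part of the original text:

```python
# Raster scanning: convert a 2-D grid of sensor outputs into a
# 1-D signal by reading the sensors in a fixed row-by-row order.

def raster_scan(image):
    """Flatten a 2-D grid of sensor readings into a 1-D list,
    left to right within each row, top row first."""
    signal = []
    for row in image:
        for value in row:
            signal.append(value)
    return signal

# A hypothetical 3x3 grid of sensor outputs:
grid = [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]]

print(raster_scan(grid))  # the ten... nine readings, in scan order
```

The one-dimensional sequence produced this way is exactly what downstream electronics (or an A/D converter) can process sample by sample.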
In the single-sensor approach, the spatial resolution is limited by how far the sensor is moved. In an actual system spatial resolution is also determined by the performance characteristics of the sensor. Here we are assuming for our purposes perfect sensors. In digital image formation one is concerned about two processes: spatial sampling and quantization. Sampling is quite similar to scanning in analog image formation. The second process is known as quantization or analog-to-digital conversion, whereby at each spatial location a number is assigned to the amount of energy the transducer observes at that location. This number is usually proportional to the electrical signal at the output of the transducer. The overall process of sampling and quantization is known as digitization. Sometimes the digitization process is just referred to as analog-to-digital conversion, or A/D conversion; however, the reader should remember that digitization also includes spatial sampling. The digital image formation process is summarized in Fig. 17.1. The spatial sampling process can be considered as overlaying a grid on the object, with the sensor examining the energy output from each grid box
and converting it to an electrical signal. The quantization process then assigns a number to the electrical signal; the result, which is a matrix of numbers, is the digital representation of the image. Each spatial location in the image (or grid) to which a number is assigned is known as a picture element or pixel (or pel). The size of the sampling grid is usually given by the number of pixels on each side of the grid, e.g., 256 × 256, 512 × 512, 488 × 380. The quantization process is necessary because all information to be processed using computers must be represented by numbers. The quantization process can be thought of as one where the input energy to the transducer is represented by a finite number of energy values. If the energy at a particular pixel location does not take on one of the finite energy values, it is assigned to the closest value. For instance, suppose that we assume a priori that only energy values of 10, 20, 50, and 110 will be represented (the units are of no concern in this example). Suppose at one pixel an energy of 23.5 was observed by the transducer. The A/D converter would then assign this pixel the energy value of 20 (the closest one). Notice that the quantization process makes mistakes; this error in assignment is known as quantization error or quantization noise. In our example, each pixel is represented by one of four possible values. For ease of representation of the data, it would be simpler to assign to each pixel the index value 0, 1, 2, 3, instead of 10, 20, 50, 110. In fact, this is typically done by the quantization process. One needs a simple table to know that a pixel assigned the value 2 corresponds to an energy of 50. Also, the number of possible energy levels is typically some integer power of two to also aid in representation. This power is known as the number of bits needed to represent the energy of each pixel. In our example each pixel is represented by two bits.
One question that immediately arises is how accurate the digital representation of the image is when one compares the digital image with a corresponding analog image. It should first be pointed out that after the digital image is obtained one requires special hardware to convert the matrix of pixels back to an image that can be viewed on a CRT display. The process of converting the digital image back to an image that can be viewed is known as digital-to-analog conversion, or D/A conversion.

FIGURE 17.1 Digital image formation: sampling and quantization.
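The nearest-level quantization example worked through above (allowed energy values 10, 20, 50, and 110; an observed energy of 23.5 assigned to 20) can be sketched directly in Python. The function and variable names are my own, not from the text:

```python
import math

LEVELS = [10, 20, 50, 110]  # the four allowed energy values from the text

def quantize(energy, levels=LEVELS):
    """Assign the input energy to the closest allowed level and
    return (index, level). The index is what is actually stored."""
    index = min(range(len(levels)), key=lambda i: abs(levels[i] - energy))
    return index, levels[index]

index, level = quantize(23.5)
print(index, level)       # 23.5 is closest to 20, stored as index 1
print(abs(23.5 - level))  # the quantization error for this pixel: 3.5

# Four levels need log2(4) = 2 bits per pixel, as the text notes.
bits = math.ceil(math.log2(len(LEVELS)))
print(bits)  # 2
```

Storing the index (0–3) rather than the energy value itself is exactly the lookup-table scheme described in the text: the table maps index 1 back to energy 20 when the image is displayed.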
The quality of representation of the image is determined by how close spatially the pixels are located and how many levels or numbers are used in the quantization, i.e., how coarse or fine is the quantization. The sampling accuracy is usually measured in how many pixels there are in a given area and is cited in pixels/unit length, i.e., pixels/cm. This is known as the spatial sampling rate. One would desire to use the lowest rate possible to minimize the number of pixels needed to represent the object. If the sampling rate is too low, then obviously some details of the object to be imaged will not be represented very well. In fact, there is a mathematical theorem which determines the lowest sampling rate possible to preserve details in the object. This rate is known as the Nyquist sampling rate (named after the late Bell Laboratories engineer Harry Nyquist). The theorem states that the sampling rate must be twice the highest possible detail one expects to image in the object. If the object has details closer than, say, 1 mm, one must take at least 2 pixels/mm. (The Nyquist theorem actually says more than this, but a discussion of the entire theorem is beyond the scope of this section.) If we sample at a lower rate than the theoretical lowest limit, the resulting digital representation of the object will be distorted. This type of distortion or sampling error is known as aliasing error. Aliasing errors usually manifest themselves in the image as moiré patterns (Fig. 17.2). The important point to remember is that there is a lower limit to the spatial sampling rate such that object detail can be maintained. The sampling rate can also be stated as the total number of pixels needed to represent the digital image, i.e., the matrix size (or grid size). One often sees these sampling rates cited as 256 × 256, 512 × 512, and so on. If the same object is imaged with a large matrix size, the sampling rate has obviously increased.
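A minimal numeric sketch of the aliasing just described, using a one-dimensional signal and rates chosen purely for illustration: sampling a 9 Hz sinusoid at 10 samples per second — below its Nyquist rate of 18 samples per second — produces exactly the same samples as a 1 Hz sinusoid, so the two signals are indistinguishable after digitization.

```python
import math

fs = 10.0       # sampling rate: 10 samples/s (illustrative assumption)
f_true = 9.0    # signal frequency, above the Nyquist limit fs/2 = 5 Hz
f_alias = 1.0   # the frequency the samples actually appear to have

for n in range(10):
    t = n / fs
    s_true = math.cos(2 * math.pi * f_true * t)
    s_alias = math.cos(2 * math.pi * f_alias * t)
    # The two sample sequences coincide: the undersampled 9 Hz
    # signal masquerades as ("aliases to") a 1 Hz signal.
    assert abs(s_true - s_alias) < 1e-9
```

The moiré patterns of Fig. 17.2 are the two-dimensional version of the same effect: fine spatial detail beyond half the sampling rate reappears as spurious coarse patterns.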
Typically, images are sampled on 256 × 256, 512 × 512, or 1024 × 1024 grids, depending on the application and type of modality. One immediately observes an important issue in digital representation of images: that of the large number of pixels needed to represent the image. A 256 × 256 image has 65,536 pixels and a 512 × 512 image has 262,144 pixels! We shall return to this point later when we discuss processing or storage of these images. The quality of the representation of the digital image is also determined by the number of levels or shades of gray that are used in the quantization. If one has more levels, then fewer mistakes will be made in assigning values at the output of the transducer. Figure 17.3 demonstrates how the number of gray levels affects the digital representation of an artery. When a small number of levels are used, the quantization is coarse and the quantization error is large. The quantization error usually manifests itself in the digital image by the appearance

FIGURE 17.2 This image shows the effects of aliasing due to sampling the image at too low a rate. The image should be straight lines converging at a point. Because of undersampling, it appears as if there are patterns in the lines at various angles. These are known as moiré patterns.
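The pixel counts quoted above follow from simple arithmetic, and combining them with the bits-per-pixel idea from the quantization discussion gives the storage cost of an image. A quick sketch — the helper name is mine, and 256 gray levels is an assumed example, not a figure from the text:

```python
import math

def image_storage(width, height, levels):
    """Pixels in the sampling grid, bits per pixel for the given
    number of gray levels, and total bytes to store the image
    (one index value per pixel, as described in the text)."""
    pixels = width * height
    bits_per_pixel = math.ceil(math.log2(levels))
    total_bytes = pixels * bits_per_pixel / 8
    return pixels, bits_per_pixel, total_bytes

print(image_storage(256, 256, 256))  # 65,536 pixels, 8 bits each, 65,536 bytes
print(image_storage(512, 512, 256))  # 262,144 pixels
```

Quadrupling the grid from 256 × 256 to 512 × 512 quadruples the storage, which is why the section returns to this point when discussing processing and storage.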