GLOSSARY
TSS            Three Step Search, a motion estimation algorithm
VCEG           Video Coding Experts Group, a committee of ITU
VCL            Video Coding Layer
video packet   Coded unit suitable for packetisation
VLC            Variable Length Code
VLD            Variable Length Decoder
VLE            Variable Length Encoder
VLSI           Very Large Scale Integrated circuit
VO             Video Object
VOP            Video Object Plane
VQEG           Video Quality Experts Group
Weighted prediction   Motion compensation in which the prediction samples from two prediction references are scaled
YCbCr          Luminance, Blue chrominance, Red chrominance colour space
YUV            A colour space (see YCbCr)
1 Introduction

1.1 THE SCENE

Scene 1: Your avatar (a realistic 3D model with your appearance and voice) walks through a sophisticated virtual world populated by other avatars, product advertisements and video walls. On one virtual video screen is a news broadcast from your favourite channel; you want to see more about the current financial situation and so you interact with the broadcast and pull up the latest stock market figures. On another screen you call up a videoconference link with three friends. The video images of the other participants, neatly segmented from their backgrounds, are presented against yet another virtual backdrop.

Scene 2: Your new 3G videophone rings; you flip the lid open and answer the call. The face of your friend appears on the screen and you greet each other. Each sees a small, clear image of the other on the phone’s screen, without any of the obvious ‘blockiness’ of older-model video phones. After the call has ended, you call up a live video feed from a football match. The quality of the basic-rate stream isn’t too great and you switch seamlessly to the higher-quality (but more expensive) ‘premium’ stream. For a brief moment the radio signal starts to break up but all you notice is a slight, temporary distortion in the video picture.

These two scenarios illustrate different visions of the next generation of multimedia applications. The first is a vision of MPEG-4 Visual: a rich, interactive on-line world bringing together synthetic, natural, video, image, 2D and 3D ‘objects’. The second is a vision of H.264/AVC: highly efficient and reliable video communications, supporting two-way, ‘streaming’ and broadcast applications and robust to channel transmission problems. The two standards, each with their advantages and disadvantages and each with their supporters and critics, are contenders in the race to provide video compression for next-generation communication applications.
Turn on the television and surf through tens or hundreds of digital channels. Play your favourite movies on the DVD player and breathe a sigh of relief that you can throw out your antiquated VHS tapes. Tune in to a foreign TV news broadcast on the web (still just a postage-stamp video window, but the choice and reliability of video streams is growing all the time). Chat to your friends and family by PC videophone. These activities are now commonplace and unremarkable, demonstrating that digital video is well on the way to becoming a ubiquitous and essential component of the entertainment, computing, broadcasting and communications industries.

H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Iain E. G. Richardson. © 2003 John Wiley & Sons, Ltd. ISBN: 0-470-84837-5

Pervasive, seamless, high-quality digital video has been the goal of companies, researchers and standards bodies over the last two decades. In some areas (for example broadcast television and consumer video storage), digital video has clearly captured the market, whilst in others (videoconferencing, video email, mobile video) it is perhaps still too early to judge market success. However, there is no doubt that digital video is a globally important industry which will continue to pervade businesses, networks and homes.

The continuous evolution of the digital video industry is being driven by commercial and technical forces. The commercial drive comes from the huge revenue potential of persuading consumers and businesses (a) to replace analogue technology and older digital technology with new, efficient, high-quality digital video products and (b) to adopt new communication and entertainment products that have been made possible by the move to digital video. The technical drive comes from continuing improvements in processing performance, the availability of higher-capacity storage and transmission mechanisms and research and development of video and image processing technology.

Getting digital video from its source (a camera or a stored clip) to its destination (a display) involves a chain of components or processes. Key to this chain are the processes of compression (encoding) and decompression (decoding), in which bandwidth-intensive ‘raw’ digital video is reduced to a manageable size for transmission or storage, then reconstructed for display. Getting the compression and decompression processes ‘right’ can give a significant technical and commercial edge to a product, by providing better image quality, greater reliability and/or more flexibility than competing solutions.
There is therefore a keen interest in the continuing development and improvement of video compression and decompression methods and systems. The interested parties include entertainment, communication and broadcasting companies, software and hardware developers, researchers and holders of potentially lucrative patents on new compression algorithms.

The early successes in the digital video industry (notably broadcast digital television and DVD-Video) were underpinned by international standard ISO/IEC 13818 [1], popularly known as ‘MPEG-2’ (after the working group that developed the standard, the Moving Picture Experts Group). Anticipation of a need for better compression tools has led to the development of two further standards for video compression, known as ISO/IEC 14496 Part 2 (‘MPEG-4 Visual’) [2] and ITU-T Recommendation H.264/ISO/IEC 14496 Part 10 (‘H.264’) [3]. MPEG-4 Visual and H.264 share the same ancestry and some common features (they both draw on well-proven techniques from earlier standards) but have notably different visions, seeking to improve upon the older standards in different ways.

The vision of MPEG-4 Visual is to move away from a restrictive reliance on rectangular video images and to provide an open, flexible framework for visual communications that uses the best features of efficient video compression and object-oriented processing. In contrast, H.264 has a more pragmatic vision, aiming to do what previous standards did (provide a mechanism for the compression of rectangular video images) but to do it in a more efficient, robust and practical way, supporting the types of applications that are becoming widespread in the marketplace (such as broadcast, storage and streaming).

At the present time there is a lively debate about which (if either) of these standards will come to dominate the market. MPEG-4 Visual is the more mature of the two new standards (its first edition was published in 1999, whereas H.264 became an International Standard/Recommendation in 2003). There is no doubt that H.264 can outperform MPEG-4 Visual in compression efficiency, but it does not have the older standard’s bewildering flexibility. The licensing situation with regard to MPEG-4 Visual is clear (and not popular with some parts of the industry) but the cost of licensing H.264 remains to be agreed. This book is about these two important new standards and examines the background to the standards, the core concepts and technical details of each standard and the factors that will determine the answer to the question ‘MPEG-4 Visual or H.264?’.

1.2 VIDEO COMPRESSION

Network bitrates continue to increase (dramatically in the local area and somewhat less so in the wider area), high bitrate connections to the home are commonplace and the storage capacity of hard disks, flash memories and optical media is greater than ever before. With the price per transmitted or stored bit continually falling, it is perhaps not immediately obvious why video compression is necessary (and why there is such a significant effort to make it better).

Video compression has two important benefits. First, it makes it possible to use digital video in transmission and storage environments that would not support uncompressed (‘raw’) video. For example, current Internet throughput rates are insufficient to handle uncompressed video in real time (even at low frame rates and/or small frame size). A Digital Versatile Disk (DVD) can only store a few minutes of raw video at television-quality resolution and frame rate and so DVD-Video storage would not be practical without video and audio compression. Second, video compression enables more efficient use of transmission and storage resources. If a high bitrate transmission channel is available, then it is a more attractive proposition to send high-resolution compressed video or multiple compressed video channels than to send a single, low-resolution, uncompressed stream.
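To put the storage argument into rough numbers, the short Python sketch below estimates the raw bitrate of television-quality video and how long a single-layer DVD could hold it uncompressed. The parameter values are illustrative assumptions, not figures from the text: a 720 × 576 frame (the 625-line format of ITU-R BT.601), 25 frames per second, 8 bits per sample, 4:2:0 chroma sampling (1.5 samples per pixel on average) and a disc capacity of 4.7 × 10^9 bytes. The helper names `raw_bitrate` and `storage_seconds` are hypothetical.

```python
"""Back-of-envelope arithmetic for uncompressed ('raw') digital video.

All parameter values below are illustrative assumptions, not figures
taken from any standard's text.
"""


def raw_bitrate(width, height, fps, bits_per_sample=8, chroma_factor=1.5):
    """Return the bitrate, in bits per second, of uncompressed video.

    chroma_factor is the average number of samples per pixel:
    1.5 for 4:2:0 (two chroma planes, each a quarter of the luma size),
    2.0 for 4:2:2 and 3.0 for 4:4:4.
    """
    samples_per_frame = width * height * chroma_factor
    return samples_per_frame * bits_per_sample * fps


def storage_seconds(capacity_bytes, bitrate_bps):
    """Return how many seconds of video at bitrate_bps fit in capacity_bytes."""
    return capacity_bytes * 8 / bitrate_bps


# Assumed "television-quality" format: 720 x 576, 25 frames/s, 8-bit 4:2:0.
tv_bps = raw_bitrate(720, 576, 25)

# Assumed single-layer DVD capacity: 4.7 x 10^9 bytes.
DVD_BYTES = 4.7e9
dvd_seconds = storage_seconds(DVD_BYTES, tv_bps)

# Compression ratio needed to fit a two-hour programme (video only, no audio).
PROGRAMME_SECONDS = 2 * 60 * 60
required_ratio = tv_bps * PROGRAMME_SECONDS / (DVD_BYTES * 8)

print(f"raw bitrate:   {tv_bps / 1e6:.1f} Mbit/s")
print(f"DVD holds:     {dvd_seconds:.0f} s of raw video")
print(f"ratio for 2 h: {required_ratio:.1f}:1")
```

Under these assumptions the raw stream runs at roughly 124 Mbit/s, the disc fills in about five minutes, and a two-hour programme would need a compression ratio of roughly 24:1 for the video alone, before any audio is added. Switching to 4:2:2 sampling (`chroma_factor=2.0`) raises the bitrate to about 166 Mbit/s and shortens the storage time to under four minutes.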
Even with constant advances in storage and transmission capacity, compression is likely to be an essential component of multimedia services for many years to come.

An information-carrying signal may be compressed by removing redundancy from the signal. In a lossless compression system statistical redundancy is removed so that the original signal can be perfectly reconstructed at the receiver. Unfortunately, at the present time lossless methods can only achieve a modest amount of compression of image and video signals. Most practical video compression techniques are based on lossy compression, in which greater compression is achieved with the penalty that the decoded signal is not identical to the original. The goal of a video compression algorithm is to achieve efficient compression whilst minimising the distortion introduced by the compression process. Video compression algorithms operate by removing redundancy in the temporal, spatial and/or frequency domains.

Figure 1.1 shows an example of a single video frame. Within the highlighted regions, there is little variation in the content of the image and hence there is significant spatial redundancy. Figure 1.2 shows the same frame after the background region has been low-pass filtered (smoothed), removing some of the higher-frequency content. The human eye and brain (Human Visual System) are more sensitive to lower frequencies and so the image is still recognisable despite the fact that much of the ‘information’ has been removed.

Figure 1.3 shows the next frame in the video sequence. The sequence was captured from a camera at 25 frames per second and so there is little change between the two frames in the short interval of 1/25 of a second. There is clearly significant temporal redundancy, i.e. most
Figure 1.1 Video frame (showing examples of homogeneous regions)
Figure 1.2 Video frame (low-pass filtered background)