of the image remains unchanged between successive frames. By removing different types of redundancy (spatial, frequency and/or temporal) it is possible to compress the data significantly at the expense of a certain amount of information loss (distortion). Further compression can be achieved by encoding the processed data using an entropy coding scheme such as Huffman coding or arithmetic coding.

Figure 1.3 Video frame 2

Image and video compression has been a very active field of research and development for over 20 years and many different systems and algorithms for compression and decompression have been proposed and developed. In order to encourage interworking, competition and increased choice, it has been necessary to define standard methods of compression encoding and decoding to allow products from different manufacturers to communicate effectively. This has led to the development of a number of key International Standards for image and video compression, including the JPEG, MPEG and H.26x series of standards.

1.3 MPEG-4 AND H.264

MPEG-4 Visual and H.264 (also known as Advanced Video Coding) are standards for the coded representation of visual information. Each standard is a document that primarily defines two things: a coded representation (or syntax) that describes visual data in a compressed form, and a method of decoding the syntax to reconstruct visual information. Each standard aims to ensure that compliant encoders and decoders can successfully interwork with each other, whilst allowing manufacturers the freedom to develop competitive and innovative products. The standards specifically do not define an encoder; rather, they define the output that an encoder should
produce. A decoding method is defined in each standard but manufacturers are free to develop alternative decoders as long as they achieve the same result as the method in the standard.

MPEG-4 Visual (Part 2 of the MPEG-4 group of standards) was developed by the Moving Picture Experts Group (MPEG), a working group of the International Organisation for Standardisation (ISO). This group of several hundred technical experts (drawn from industry and research organisations) meets at 2–3 month intervals to develop the MPEG series of standards. MPEG-4 (a multi-part standard covering audio coding, systems issues and related aspects of audio/visual communication) was first conceived in 1993 and Part 2 was standardised in 1999. The H.264 standardisation effort was initiated by the Video Coding Experts Group (VCEG), a working group of the International Telecommunication Union (ITU-T) that operates in a similar way to MPEG and has been responsible for a series of visual telecommunication standards. The final stages of developing the H.264 standard have been carried out by the Joint Video Team, a collaborative effort of both VCEG and MPEG, making it possible to publish the final standard under the joint auspices of ISO/IEC (as MPEG-4 Part 10) and ITU-T (as Recommendation H.264) in 2003.

MPEG-4 Visual and H.264 have related but significantly different visions. Both are concerned with compression of visual data, but MPEG-4 Visual emphasises flexibility whilst H.264’s emphasis is on efficiency and reliability. MPEG-4 Visual provides a highly flexible toolkit of coding techniques and resources, making it possible to deal with a wide range of types of visual data including rectangular frames (‘traditional’ video material), video objects (arbitrary-shaped regions of a visual scene), still images and hybrids of natural (real-world) and synthetic (computer-generated) visual information.
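Both standards are, at root, concerned with compressing visual data using the approach described earlier in this chapter: remove redundancy (here, temporal redundancy between successive frames), then entropy-code what remains. The Python sketch below illustrates that idea under deliberately simple assumptions: two tiny hypothetical frames, plain frame differencing with no motion compensation, and a Huffman code built from the residual's own symbol statistics. It is not how MPEG-4 Visual or H.264 encoders actually work (they add prediction and transforms, and use predefined code tables or arithmetic coding); the helper names `frame_difference` and `huffman_code_lengths` are illustrative only.

```python
import heapq
from collections import Counter


def frame_difference(prev, curr):
    """Subtract each sample of the previous frame from the current frame.

    Frames are lists of rows of integer samples. Wherever the scene is
    unchanged the residual is zero (temporal redundancy removed).
    """
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def huffman_code_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code
    built from the frequencies of the given symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs 1 bit each.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e. its codeword gains one bit.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Two hypothetical 4x8 frames in which only two samples change.
frame1 = [[100 + x for x in range(8)] for _ in range(4)]
frame2 = [row[:] for row in frame1]
frame2[1][3] += 5
frame2[2][6] -= 3

residual = [v for row in frame_difference(frame1, frame2) for v in row]
lengths = huffman_code_lengths(residual)
coded_bits = sum(lengths[s] * n for s, n in Counter(residual).items())
raw_bits = 8 * sum(len(row) for row in frame2)  # frame 2 sent uncompressed
print(raw_bits, coded_bits)  # 256 34
```

Because 30 of the 32 residual samples are zero, the zero symbol receives a 1-bit codeword and the residual costs 34 bits, compared with 256 bits for sending frame 2 at 8 bits per sample. The comparison ignores the cost of signalling the code table, and the decoder must already hold frame 1 to add the residual back.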
MPEG-4 Visual provides its functionality through a set of coding tools, organised into ‘profiles’ (recommended groupings of tools suitable for certain applications). Classes of profiles include ‘simple’ profiles (coding of rectangular video frames), object-based profiles (coding of arbitrary-shaped visual objects), still texture profiles (coding of still images or ‘texture’), scalable profiles (coding at multiple resolutions or quality levels) and studio profiles (coding for high-quality studio applications).

In contrast with the highly flexible approach of MPEG-4 Visual, H.264 concentrates specifically on efficient compression of video frames. Key features of the standard include compression efficiency (providing significantly better compression than any previous standard), transmission efficiency (with a number of built-in features to support reliable, robust transmission over a range of channels and networks) and a focus on popular applications of video compression. Only three profiles are currently supported (in contrast to nearly 20 in MPEG-4 Visual), each targeted at a class of popular video communication applications. The Baseline profile may be particularly useful for “conversational” applications such as videoconferencing; the Extended profile adds extra tools that are likely to be useful for video streaming across networks; and the Main profile includes tools that may be suitable for consumer applications such as video broadcast and storage.

1.4 THIS BOOK

The aim of this book is to provide a technically oriented guide to the MPEG-4 Visual and H.264/AVC standards, with an emphasis on practical issues. Other works cover the details of the other parts of the MPEG-4 standard [4–6]; this book concentrates on the application of MPEG-4 Visual and H.264 to the coding of natural video. Most practical applications of
MPEG-4 (and emerging applications of H.264) make use of a subset of the tools provided by each standard (a ‘profile’) and so the treatment of each standard in this book is organised according to profile, starting with the most basic profiles and then introducing the extra tools supported by more advanced profiles.

Chapters 2 and 3 cover essential background material that is required for an understanding of both MPEG-4 Visual and H.264. Chapter 2 introduces the basic concepts of digital video including capture and representation of video in digital form, colour-spaces, formats and quality measurement. Chapter 3 covers the fundamentals of video compression, concentrating on aspects of the compression process that are common to both standards and introducing the transform-based CODEC ‘model’ that is at the heart of all of the major video coding standards.

Chapter 4 looks at the standards themselves and examines the way that the standards have been shaped and developed, discussing the composition and procedures of the VCEG and MPEG standardisation groups. The chapter summarises the content of the standards and gives practical advice on how to approach and interpret the standards and ensure conformance. Related image and video coding standards are briefly discussed.

Chapters 5 and 6 focus on the technical features of MPEG-4 Visual and H.264. The approach is based on the structure of the Profiles of each standard (important conformance points for CODEC developers). The Simple Profile (and related Profiles) have shown themselves to be by far the most popular features of MPEG-4 Visual to date and so Chapter 5 concentrates first on the compression tools supported by these Profiles, followed by the remaining (less commercially popular) Profiles supporting coding of video objects, still texture, scalable objects and so on.
Because this book is primarily about compression of natural (real-world) video information, MPEG-4 Visual’s synthetic visual tools are covered only briefly. H.264’s Baseline Profile is covered first in Chapter 6, followed by the extra tools included in the Main and Extended Profiles. Chapters 5 and 6 make extensive reference back to Chapter 3 (Video Coding Concepts). H.264 is dealt with in greater technical detail than MPEG-4 Visual because of the limited availability of reference material on the newer standard.

Practical issues related to the design and performance of video CODECs are discussed in Chapter 7. The design requirements of each of the main functional modules required in a practical encoder or decoder are addressed, from motion estimation through to entropy coding. The chapter examines interface requirements and practical approaches to pre- and post-processing of video to improve compression efficiency and/or visual quality. The compression and computational performance of the two standards are compared, and rate control (matching the encoder output to practical transmission or storage mechanisms) and the issues faced in transporting and storing compressed video are discussed.

Chapter 8 examines the requirements of some current and emerging applications, lists some currently available CODECs and implementation platforms and discusses the important implications of commercial factors such as patent licenses. Finally, some predictions are made about the next steps in the standardisation process and emerging research issues that may influence the development of future video coding standards.

1.5 REFERENCES

1. ISO/IEC 13818, Information Technology – Generic Coding of Moving Pictures and Associated Audio Information, 2000.
2. ISO/IEC 14496-2, Coding of Audio-Visual Objects – Part 2: Visual, 2001.
3. ISO/IEC 14496-10 and ITU-T Rec. H.264, Advanced Video Coding, 2003.
4. F. Pereira and T. Ebrahimi (eds), The MPEG-4 Book, IMSC Press, 2002.
5. A. Walsh and M. Bourges-Sévenier (eds), MPEG-4 Jump Start, Prentice-Hall, 2002.
6. ISO/IEC JTC1/SC29/WG11 N4668, MPEG-4 Overview, http://www.m4if.org/resources/Overview.pdf, March 2002.
2 Video Formats and Quality

2.1 INTRODUCTION

Video coding is the process of compressing and decompressing a digital video signal. This chapter examines the structure and characteristics of digital images and video signals and introduces concepts such as sampling formats and quality metrics that are helpful to an understanding of video coding.

Digital video is a representation of a natural (real-world) visual scene, sampled spatially and temporally. A scene is sampled at a point in time to produce a frame (a representation of the complete visual scene at that point in time) or a field (consisting of odd- or even-numbered lines of spatial samples). Sampling is repeated at intervals (e.g. 1/25 or 1/30 second intervals) to produce a moving video signal. Three sets of samples (components) are typically required to represent a scene in colour. Popular formats for representing video in digital form include the ITU-R 601 standard and the set of ‘intermediate formats’. Measuring the accuracy of a reproduced visual scene, which is necessary to determine the performance of a visual communication system, is a notoriously difficult and inexact process. Subjective measurements are time-consuming and prone to variations in the response of human viewers. Objective (automatic) measurements are easier to implement but do not yet accurately match the opinion of a ‘real’ human.

2.2 NATURAL VIDEO SCENES

A typical ‘real world’ or ‘natural’ video scene is composed of multiple objects, each with its own characteristic shape, depth, texture and illumination. The colour and brightness of a natural video scene change with varying degrees of smoothness throughout the scene (‘continuous tone’). Characteristics of a typical natural video scene (Figure 2.1) that are relevant for video processing and compression include spatial characteristics (texture variation within the scene, number and shape of objects, colour, etc.) and temporal characteristics (object motion, changes in illumination, movement of the camera or viewpoint and so on).

H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Iain E. G. Richardson. © 2003 John Wiley & Sons, Ltd. ISBN: 0-470-84837-5
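The sampling and measurement ideas introduced in Section 2.1 can be made concrete with a little arithmetic. The Python sketch below is illustrative only, and its helper names (`split_fields`, `raw_bit_rate`, `psnr`) are hypothetical. It assumes a frame is a list of rows of 8-bit samples, that the two fields of an interlaced frame are its alternate lines, and, for the data-rate estimate, that all three colour components are sampled at full resolution (practical formats often sample the colour components at lower resolution, which would reduce the figure). PSNR is used here as one common example of an objective quality measure; the text above does not name a specific metric.

```python
import math


def split_fields(frame):
    """Split an interlaced frame (a list of rows) into its two fields:
    rows 0, 2, 4, ... and rows 1, 3, 5, ... (counting from zero)."""
    return frame[0::2], frame[1::2]


def raw_bit_rate(width, height, frames_per_second,
                 components=3, bits_per_sample=8):
    """Uncompressed bit rate, assuming every colour component is sampled
    at the full width x height resolution."""
    return width * height * components * bits_per_sample * frames_per_second


def psnr(reference, test, bits_per_sample=8):
    """Peak signal-to-noise ratio (dB) between two equal-sized frames."""
    squared_errors = [(r - t) ** 2
                      for rrow, trow in zip(reference, test)
                      for r, t in zip(rrow, trow)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    peak = 2 ** bits_per_sample - 1
    return 10 * math.log10(peak ** 2 / mse)


# CIF (352 x 288), an intermediate format, sampled every 1/25 second.
print(raw_bit_rate(352, 288, 25))  # 60825600 bits/s, roughly 61 Mbit/s

frame = [[10 * y + x for x in range(4)] for y in range(4)]
top, bottom = split_fields(frame)
distorted = [[v + 1 for v in row] for row in frame]  # every sample off by 1
print(round(psnr(frame, distorted), 2))  # 48.13
```

Even a modest format produces tens of megabits per second before any compression is applied. PSNR is easy to compute automatically, but, as noted above, such objective scores do not always agree with a human viewer's opinion of quality.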