Advanced Video Coding


Advanced Video Coding, also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, and is used by 79% of video industry developers. It supports resolutions up to 8K UHD.
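The resolution ceiling comes from the standard's level limits, which cap the frame size in 16×16 macroblocks (MaxFS). The sketch below assumes the MaxFS values for a few levels and checks only the frame-size constraints, including the rule that neither dimension may exceed sqrt(8 × MaxFS) macroblocks. Real conformance also involves macroblock-rate, bit-rate, and buffer limits, which are omitted here. The function names are hypothetical.

```python
import math
from typing import Tuple

# Assumed MaxFS values (maximum frame size in macroblocks) for a few levels.
MAX_FS = {
    "4": 8192,
    "4.2": 8704,
    "5.1": 36864,
    "6.2": 139264,
}


def frame_size_in_mbs(width: int, height: int) -> Tuple[int, int]:
    """Frame width and height in 16x16 macroblocks, rounding partial macroblocks up."""
    return (width + 15) // 16, (height + 15) // 16


def fits_level(width: int, height: int, level: str) -> bool:
    """Check only the frame-size limits of a level (rate and buffer limits ignored)."""
    max_fs = MAX_FS[level]
    w_mbs, h_mbs = frame_size_in_mbs(width, height)
    max_dim = math.sqrt(8 * max_fs)  # each dimension is also capped at sqrt(8 * MaxFS)
    return w_mbs * h_mbs <= max_fs and w_mbs <= max_dim and h_mbs <= max_dim
```

For example, 1080p is 120 × 68 = 8160 macroblocks (1080 rows round up to 68 macroblock rows), which fits Level 4, while 8K UHD (7680 × 4320) needs one of the Level 6 variants.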
The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards, without increasing the complexity of design so much that it would be impractical or excessively expensive to implement. This was achieved with features such as a reduced-complexity integer discrete cosine transform, variable block-size segmentation, and multi-picture inter-picture prediction. An additional goal was to provide enough flexibility to allow the standard to be applied to a wide variety of applications on a wide variety of networks and systems, including low and high bit rates, low- and high-resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems. The H.264 standard can be viewed as a "family of standards" composed of a number of different profiles, although its "High profile" is by far the most commonly used format. A specific decoder decodes at least one, but not necessarily all, profiles. The standard describes the format of the encoded data and how the data is decoded, but it does not specify algorithms for encoding; that is left open as a matter for encoder designers to select for themselves, and a wide variety of encoding schemes have been developed. H.264 is typically used for lossy compression, although it is also possible to create truly lossless-coded regions within lossy-coded pictures or to support rare use cases for which the entire encoding is lossless.
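The reduced-complexity integer transform can be illustrated with the 4×4 forward core transform Y = Cf · X · Cf^T applied to a block of residual samples X. This is a minimal sketch: it uses only integer additions and shifts, and it leaves out the per-coefficient scaling that a true DCT would need. In H.264 that scaling is folded into quantization, which is not shown. The function names are hypothetical.

```python
# Forward core transform matrix for 4x4 blocks. Its rows are orthogonal but not
# normalized (squared norms 4, 10, 4, 10); the normalization is absorbed into
# the quantization step, which this sketch does not implement.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def _butterfly(x0, x1, x2, x3):
    """1-D forward core transform of four samples using only adds and shifts."""
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return (s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1))


def forward_core_4x4(block):
    """Compute CF * block * CF^T for a 4x4 list of integer residuals."""
    rows = [_butterfly(*row) for row in block]       # transform each row
    cols = [_butterfly(*col) for col in zip(*rows)]  # then each column
    return [list(r) for r in zip(*cols)]             # transpose back to row-major
```

A flat block concentrates all of its energy in the DC coefficient: an all-ones input yields 16 at position (0, 0) and zeros elsewhere. Because every operation is exact integer arithmetic, encoder and decoder cannot drift apart through floating-point rounding differences.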
H.264 was standardized by the ITU-T Video Coding Experts Group of Study Group 16 together with the ISO/IEC JTC 1 Moving Picture Experts Group. The project partnership effort is known as the Joint Video Team. The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard are jointly maintained so that they have identical technical content. The final drafting work on the first version of the standard was completed in May 2003, and various extensions of its capabilities have been added in subsequent editions. High Efficiency Video Coding, a.k.a. H.265 and MPEG-H Part 2, is a successor to H.264/MPEG-4 AVC developed by the same organizations, while earlier standards are still in common use.
H.264 is perhaps best known as the most commonly used video encoding format on Blu-ray Discs. It has also been widely used by Internet streaming sources such as Netflix, Hulu, Amazon Prime Video, Vimeo, YouTube, and the iTunes Store; by Web software such as Adobe Flash Player and Microsoft Silverlight; and by various HDTV broadcasts over terrestrial, cable, and satellite systems.
H.264 is restricted by patents owned by various parties. Most patents essential to H.264 are licensed through a patent pool formerly administered by MPEG LA. Via Licensing Corp acquired MPEG LA in April 2023 and formed a new patent pool administration company called Via Licensing Alliance. The commercial use of patented H.264 technologies requires the payment of royalties to Via Licensing Alliance and other patent owners. MPEG LA has allowed the free use of H.264 technologies for streaming Internet video that is free to end users, and Cisco paid royalties to MPEG LA on behalf of the users of binaries for its open-source H.264 encoder, OpenH264.

Naming

The H.264 name follows the ITU-T naming convention, where Recommendations are given a letter corresponding to their series and a recommendation number within the series. H.264 is part of "H-Series Recommendations: Audiovisual and multimedia systems". H.264 is further categorized into "H.200-H.499: Infrastructure of audiovisual services" and "H.260-H.279: Coding of moving video". The MPEG-4 AVC name relates to the naming convention in ISO/IEC MPEG, where the standard is part 10 of ISO/IEC 14496, which is the suite of standards known as MPEG-4. The standard was developed jointly in a partnership of VCEG and MPEG, after earlier development work in the ITU-T as a VCEG project called H.26L. It is thus common to refer to the standard with names such as H.264/AVC, AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC, to emphasize the common heritage. Occasionally, it is also referred to as "the JVT codec", in reference to the Joint Video Team organization that developed it. Some software programs internally identify this standard as AVC1.

History

Overall history

In early 1998, the Video Coding Experts Group issued a call for proposals on a project called H.26L, with the target to double the coding efficiency in comparison to any other existing video coding standards for a broad variety of applications. VCEG was chaired by Gary Sullivan. The first draft design for that new standard was adopted in August 1999. In 2000, Thomas Wiegand became VCEG co-chair.
In December 2001, VCEG and the Moving Picture Experts Group formed a Joint Video Team, with the charter to finalize the video coding standard. Formal approval of the specification came in March 2003. The JVT was chaired by Gary Sullivan, Thomas Wiegand, and Ajay Luthra. In July 2004, the Fidelity Range Extensions project was finalized. From January 2005 to November 2007, the JVT worked on an extension of H.264/AVC for scalability, specified in an annex called Scalable Video Coding. Jens-Rainer Ohm joined the JVT management team. From July 2006 to November 2009, the JVT worked on Multiview Video Coding, an extension of H.264/AVC towards 3D television and limited-range free-viewpoint television. That work included the development of two new profiles of the standard: the Multiview High Profile and the Stereo High Profile.
Throughout the development of the standard, additional messages for containing supplemental enhancement information have been developed. SEI messages can contain various types of data that indicate the timing of the video pictures or describe various properties of the coded video or how it can be used or enhanced. SEI messages are also defined that can contain arbitrary user-defined data. SEI messages do not affect the core decoding process, but can indicate how the video is recommended to be post-processed or displayed. Some other high-level properties of the video content are conveyed in video usability information, such as the indication of the color space for interpretation of the video content. As new color spaces have been developed, such as for high dynamic range and wide color gamut video, additional VUI identifiers have been added to indicate them.
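At the bitstream level, each SEI message begins with a payload type and a payload size. Each is coded as a run of 0xFF bytes, each adding 255, followed by one final byte. The minimal sketch below assumes the input is the SEI raw byte sequence payload with the one-byte NAL header stripped and emulation-prevention bytes already removed. It reads one message header and, for the user-data-unregistered message (payload type 5), separates the 16-byte UUID from the arbitrary user data. The helper names are hypothetical.

```python
from typing import Optional, Tuple

USER_DATA_UNREGISTERED = 5  # payloadType of the user_data_unregistered SEI message


def _read_ff_coded(rbsp: bytes, pos: int) -> Tuple[int, int]:
    """Read one 0xFF-extended value; return (value, position just after it)."""
    value = 0
    while rbsp[pos] == 0xFF:  # each 0xFF byte adds 255 and continues the value
        value += 255
        pos += 1
    return value + rbsp[pos], pos + 1  # the final byte is below 0xFF


def read_sei_message_header(rbsp: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Return (payload_type, payload_size, offset of the first payload byte)."""
    payload_type, pos = _read_ff_coded(rbsp, pos)
    payload_size, pos = _read_ff_coded(rbsp, pos)
    return payload_type, payload_size, pos


def parse_user_data_unregistered(rbsp: bytes, pos: int = 0) -> Optional[Tuple[bytes, bytes]]:
    """Return (16-byte UUID, user data) if the message at pos is user_data_unregistered."""
    payload_type, size, start = read_sei_message_header(rbsp, pos)
    if payload_type != USER_DATA_UNREGISTERED:
        return None
    payload = rbsp[start:start + size]
    return payload[:16], payload[16:]
```

Because a decoder can skip any message by its payload size without understanding the payload, unrecognized SEI messages do not disturb the core decoding process.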

Fidelity range extensions and professional profiles

The standardization of the first version of H.264/AVC was completed in May 2003. In the first project to extend the original standard, the JVT then developed what was called the Fidelity Range Extensions. These extensions enabled higher quality video coding by supporting increased sample bit depth precision and higher-resolution color information, including the sampling structures known as Y′CBCR 4:2:2 and 4:4:4. Several other features were also included in the FRExt project, such as adding an 8×8 integer discrete cosine transform with adaptive switching between the 4×4 and 8×8 transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support of additional color spaces. The design work on the FRExt project was completed in July 2004, and the drafting work on the extensions was completed in September 2004.
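The effect of the higher-resolution color formats and bit depths on raw picture size can be sketched as follows. The sketch assumes the standard's chroma_format_idc codes (0 = monochrome, 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4) and their horizontal and vertical chroma subsampling factors, and it assumes frame dimensions divisible by those factors. The function names are hypothetical.

```python
# chroma_format_idc -> (name, horizontal subsampling, vertical subsampling)
# Monochrome (4:0:0) has no chroma planes at all.
CHROMA_FORMATS = {
    0: ("4:0:0", None, None),
    1: ("4:2:0", 2, 2),
    2: ("4:2:2", 2, 1),
    3: ("4:4:4", 1, 1),
}


def chroma_plane_size(width, height, chroma_format_idc):
    """Width and height of each of the two chroma planes (Cb and Cr)."""
    _, sub_w, sub_h = CHROMA_FORMATS[chroma_format_idc]
    if sub_w is None:
        return (0, 0)
    return (width // sub_w, height // sub_h)


def raw_frame_bits(width, height, chroma_format_idc, bit_depth_luma=8, bit_depth_chroma=8):
    """Uncompressed size of one frame in bits: one luma plane plus two chroma planes."""
    chroma_w, chroma_h = chroma_plane_size(width, height, chroma_format_idc)
    return width * height * bit_depth_luma + 2 * chroma_w * chroma_h * bit_depth_chroma
```

For a 1920 × 1080 frame, 8-bit 4:2:0 needs 24,883,200 bits, while 10-bit 4:4:4 needs 62,208,000 bits, two and a half times as many. This is the extra information the FRExt profiles were designed to carry.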
Five other new profiles intended primarily for professional applications were then developed, adding extended-gamut color space support, defining additional aspect ratio indicators, defining two additional types of "supplemental enhancement information", and deprecating one of the prior FRExt profiles that industry feedback indicated should have been designed differently.

Scalable video coding

The next major feature added to the standard was Scalable Video Coding. Specified in Annex G of H.264/AVC, SVC allows the construction of bitstreams that contain layers of sub-bitstreams that also conform to the standard, including one such bitstream, known as the "base layer", that can be decoded by an H.264/AVC decoder that does not support SVC. For temporal bitstream scalability, complete access units are removed from the bitstream when deriving the sub-bitstream; in this case, high-level syntax and inter-prediction reference pictures in the bitstream are constructed accordingly. For spatial and quality bitstream scalability, network abstraction layer (NAL) units are removed from the bitstream when deriving the sub-bitstream; in this case, inter-layer prediction is typically used for efficient coding. The Scalable Video Coding extensions were completed in November 2007.
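Sub-bitstream extraction can be sketched by reading the layer identifiers that SVC NAL units carry in a three-byte header extension (dependency_id for spatial layers, quality_id for quality layers, temporal_id for temporal layers) and dropping every NAL unit above a requested operating point. This is a simplified sketch, not the normative extraction process. It assumes each input is one NAL unit without a start code. It treats NAL unit type 14 as the prefix NAL unit that carries the identifiers for the base-layer slice following it, and type 20 as an enhancement-layer slice. All other NAL units, such as parameter sets, pass through unchanged. The names are hypothetical.

```python
from dataclasses import dataclass

PREFIX_NAL = 14         # prefix NAL unit: SVC ids for the next base-layer slice
SVC_SLICE_NAL = 20      # coded slice in scalable extension
BASE_VCL_NALS = {1, 5}  # base-layer coded slices (non-IDR, IDR)


@dataclass
class SvcIds:
    dependency_id: int  # spatial (dependency) layer
    quality_id: int     # quality refinement layer
    temporal_id: int    # temporal layer


def nal_unit_type(nal: bytes) -> int:
    # nal_unit_type is the low 5 bits of the first header byte.
    return nal[0] & 0x1F


def parse_svc_ids(nal: bytes) -> SvcIds:
    # The 24 bits after the first byte hold, MSB first: svc_extension_flag(1),
    # idr_flag(1), priority_id(6), no_inter_layer_pred_flag(1), dependency_id(3),
    # quality_id(4), temporal_id(3), then five more flag/reserved bits.
    bits = int.from_bytes(nal[1:4], "big")
    return SvcIds(
        dependency_id=(bits >> 12) & 0x7,
        quality_id=(bits >> 8) & 0xF,
        temporal_id=(bits >> 5) & 0x7,
    )


def extract_sub_bitstream(nals, max_d, max_q, max_t):
    """Keep NAL units whose layer ids do not exceed the requested operating point."""
    out = []
    keep_next_base = True  # decision a prefix NAL passes to the base-layer slice after it
    for nal in nals:
        t = nal_unit_type(nal)
        if t in (PREFIX_NAL, SVC_SLICE_NAL):
            ids = parse_svc_ids(nal)
            keep = (ids.dependency_id <= max_d
                    and ids.quality_id <= max_q
                    and ids.temporal_id <= max_t)
            if t == PREFIX_NAL:
                keep_next_base = keep
            if keep:
                out.append(nal)
        elif t in BASE_VCL_NALS:
            if keep_next_base:
                out.append(nal)
            keep_next_base = True
        else:
            # Parameter sets, SEI, etc. are passed through unchanged (a simplification).
            out.append(nal)
    return out
```

Lowering only max_t drops every NAL unit of the higher temporal layers, which removes whole access units. Lowering max_d or max_q instead drops individual enhancement NAL units within an access unit, matching the two cases described above.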

Multiview video coding

The next major feature added to the standard was Multiview Video Coding. Specified in Annex H of H.264/AVC, MVC enables the construction of bitstreams that represent more than one view of a video scene. An important example of this functionality is stereoscopic 3D video coding. Two profiles were developed in the MVC work: Multiview High profile supports an arbitrary number of views, and Stereo High profile is designed specifically for two-view stereoscopic video. The Multiview Video Coding extensions were completed in November 2009.
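In an MVC bitstream, non-base views travel in NAL unit types that a legacy H.264/AVC decoder ignores, and each such NAL unit identifies its view in a three-byte header extension. The minimal sketch below assumes the MVC header layout (non_idr_flag, priority_id, a 10-bit view_id, temporal_id, anchor_pic_flag, inter_view_flag, one reserved bit, preceded by an svc_extension_flag equal to 0). It also assumes that dropping prefix NAL units (type 14), subset sequence parameter sets (type 15), and extension slices (type 20) leaves the AVC-compatible base view. Each input is taken to be one NAL unit without a start code, and the names are hypothetical.

```python
from dataclasses import dataclass

PREFIX_NAL = 14
SUBSET_SPS_NAL = 15
CODED_SLICE_EXTENSION_NAL = 20


@dataclass
class MvcHeader:
    non_idr: bool
    priority_id: int
    view_id: int
    temporal_id: int
    anchor_pic: bool
    inter_view: bool


def parse_mvc_header(nal: bytes) -> MvcHeader:
    """Decode the 3 bytes that follow the 1-byte NAL header of a type-14/20 MVC NAL unit."""
    if (nal[0] & 0x1F) not in (PREFIX_NAL, CODED_SLICE_EXTENSION_NAL):
        raise ValueError("not an MVC extension NAL unit")
    bits = int.from_bytes(nal[1:4], "big")
    if bits >> 23:  # svc_extension_flag == 1 means an SVC header, not MVC
        raise ValueError("SVC extension header, not MVC")
    return MvcHeader(
        non_idr=bool((bits >> 22) & 1),
        priority_id=(bits >> 16) & 0x3F,
        view_id=(bits >> 6) & 0x3FF,
        temporal_id=(bits >> 3) & 0x7,
        anchor_pic=bool((bits >> 2) & 1),
        inter_view=bool((bits >> 1) & 1),
    )


def base_view_only(nals):
    """Drop the NAL unit types that only MVC-aware decoders use."""
    mvc_only = (PREFIX_NAL, SUBSET_SPS_NAL, CODED_SLICE_EXTENSION_NAL)
    return [nal for nal in nals if (nal[0] & 0x1F) not in mvc_only]
```

For Stereo High content, keeping only the base view yields an ordinary 2D stream. The inter_view flag marks pictures that other views may reference for prediction.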

3D-AVC and MFC stereoscopic coding

Additional extensions were later developed that included 3D video coding with joint coding of depth maps and texture, multi-resolution frame-compatible stereoscopic and 3D-MFC coding, various additional combinations of features, and higher frame sizes and frame rates.

Versions

Versions of the H.264/AVC standard include the following completed revisions, corrigenda, and amendments. Each version integrates its changes relative to the next lower version into the text.
  • Version 1 : First approved version of H.264/AVC containing Baseline, Main, and Extended profiles.
  • Version 2 : Corrigendum containing various minor corrections.
  • Version 3 : Major addition containing the first amendment, establishing the Fidelity Range Extensions. This version added the High, High 10, High 4:2:2, and High 4:4:4 profiles. After a few years, the High profile became the most commonly used profile of the standard.
  • Version 4 : Corrigendum containing various minor corrections and adding three aspect ratio indicators.
  • Version 5 : Amendment consisting of removal of prior High 4:4:4 profile.
  • Version 6 : Amendment consisting of minor extensions like extended-gamut color space support.
  • Version 7 : Amendment containing the addition of the High 4:4:4 Predictive profile and four Intra-only profiles.
  • Version 8 : Major addition to H.264/AVC containing the amendment for Scalable Video Coding, with the Scalable Baseline, Scalable High, and Scalable High Intra profiles.
  • Version 9 : Corrigendum containing minor corrections.
  • Version 10 : Amendment containing definition of a new profile with only the common subset of capabilities supported in various previously specified profiles.
  • Version 11 : Major addition to H.264/AVC containing the amendment for Multiview Video Coding extension, including the Multiview High profile.
  • Version 12 : Amendment containing definition of a new MVC profile for two-view video coding with support of interlaced coding tools and specifying an additional supplemental enhancement information message termed the frame packing arrangement SEI message.
  • Version 13 : Corrigendum containing minor corrections.
  • Version 14 : Amendment specifying a new level supporting higher processing rates in terms of maximum macroblocks per second, and a new profile supporting only the frame coding tools of the previously specified High profile.
  • Version 15 : Corrigendum containing minor corrections.
  • Version 16 : Amendment containing definition of three new profiles intended primarily for real-time communication applications: the Constrained High, Scalable Constrained Baseline, and Scalable Constrained High profiles.
  • Version 17 : Amendment with additional SEI message indicators.
  • Version 18 : Amendment to specify the coding of depth map data for 3D stereoscopic video, including a Multiview Depth High profile.
  • Version 19 : Corrigendum to correct an error in the sub-bitstream extraction process for multiview video.
  • Version 20 : Amendment to specify additional color space identifiers and an additional model type in the tone mapping information SEI message.
  • Version 21 : Amendment to specify the Enhanced Multiview Depth High profile.
  • Version 22 : Amendment to specify the multi-resolution frame compatible enhancement for 3D stereoscopic video, the MFC High profile, and minor corrections.
  • Version 23 : Amendment to specify MFC stereoscopic video with depth maps, the MFC Depth High profile, the mastering display color volume SEI message, and additional color-related VUI codepoint identifiers.
  • Version 24 : Amendment to specify additional levels of decoder capability supporting larger picture sizes, the green metadata SEI message, the alternative depth information SEI message, and additional color-related VUI codepoint identifiers.
  • Version 25 : Amendment to specify the Progressive High 10 profile, hybrid log–gamma, and additional color-related VUI code points and SEI messages.
  • Version 26 : Amendment to specify additional SEI messages for ambient viewing environment, content light level information, content color volume, equirectangular projection, cubemap projection, sphere rotation, region-wise packing, omnidirectional viewport, SEI manifest, and SEI prefix.
  • Version 27 : Amendment to specify additional SEI messages for annotated regions and shutter interval information, and miscellaneous minor corrections and clarifications.
  • Version 28 : Amendment to specify additional SEI messages for neural-network post-filter characteristics, neural-network post-filter activation, and phase indication, additional color type identifiers, and miscellaneous minor corrections and clarifications.