Video quality


Video quality is a characteristic of a video passed through a video transmission or processing system that describes perceived video degradation. Video processing systems may introduce some amount of distortion or artifacts in the video signal that negatively impact the user's perception of the system. For many stakeholders in video production and distribution, ensuring video quality is an important task.
Video quality evaluation is performed to describe the quality of a set of video sequences under study. Video quality can be evaluated objectively or subjectively. Also, the quality of a system can be determined offline or in-service.

From analog to digital video

Since the world's first video sequence was recorded and transmitted, many video processing systems have been designed. Such systems encode video streams and transmit them over various kinds of networks or channels. In the age of analog video systems, it was possible to evaluate the quality aspects of a video processing system by calculating the system's frequency response using test signals.
Digital video systems have almost fully replaced analog ones, and quality evaluation methods have changed. The performance of a digital video processing and transmission system can vary significantly and depends on many factors, including the characteristics of the input video signal, the settings used for encoding and transmission, and the channel fidelity or network performance.

Objective video quality

Objective video quality models are mathematical models that approximate results from subjective quality assessment, in which human observers are asked to rate the quality of a video. In this context, the term model may refer to a simple statistical model in which several independent variables are fit against results obtained in a subjective quality evaluation test using regression techniques. A model may also be a more complicated algorithm implemented in software or hardware.

Terminology

The terms model and metric are often used interchangeably in the field to mean a descriptive statistic that provides an indicator of quality. The term "objective" refers to the fact that, in general, quality models are based on criteria that can be measured objectively, that is, free from human interpretation. They can be automatically evaluated by a computer program. Unlike a panel of human observers, an objective model should always deterministically output the same quality score for a given set of input parameters.
Objective quality models are sometimes also referred to as instrumental models, in order to emphasize their application as measurement instruments. Some authors suggest that the term "objective" is misleading, as it "implies that instrumental measurements bear objectivity, which they only do in cases where they can be generalized."

Classification of the objective video quality models

Objective models can be classified by the amount of information available about the original signal, the received signal, or whether there is a signal present at all:
  • Full Reference Methods: FR models compute the quality difference by comparing the original video signal against the received video signal. Typically, every pixel from the source is compared against the corresponding pixel in the received video, with no knowledge about the encoding or transmission process in between. More elaborate algorithms may combine pixel-based estimation with other approaches, such as those described below. FR models are usually the most accurate, at the expense of higher computational effort. As they require the original video to be available before transmission or coding, they cannot be used in all situations.
  • Reduced Reference Methods: RR models extract some features of both videos and compare them to give a quality score. They are used when the original video is not fully available, or when comparing it directly would be impractical, e.g., in a transmission over a channel with limited bandwidth. This makes them more efficient than FR models at the expense of lower accuracy.
  • No-Reference Methods: NR models try to assess the quality of a distorted video without any reference to the original signal. Due to the absence of an original signal, they may be less accurate than FR or RR approaches but are more efficient to compute. The Video Quality Experts Group has a dedicated working group on developing no-reference metrics. NR models can be further divided by the kind of data they analyze:
      • Pixel-Based Methods (NR-P): Pixel-based models use a decoded representation of the signal and analyze the quality based on the pixel information. Some of these evaluate only specific degradation types, such as blurring or other coding artifacts.
      • Parametric/Bitstream Methods (NR-B): These models use features extracted from the transmission container and/or video bitstream, e.g., MPEG-TS packet headers, motion vectors, and quantization parameters. They do not have access to the original signal and require no decoding of the video, which makes them more efficient. In contrast to NR-P models, they have no access to the final decoded signal. In some cases, the prediction accuracy of bitstream-based metrics can approach that of full-reference metrics, without requiring a reference.
      • Hybrid Methods (Hybrid NR-P-B): Hybrid models combine parameters extracted from the bitstream with the decoded video signal. They are therefore a mix of NR-P and NR-B models.
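The full-reference idea can be illustrated with the peak signal-to-noise ratio (PSNR), a classic pixel-by-pixel comparison. The sketch below is a minimal illustration in plain Python; the `psnr` helper and the flat pixel lists are assumptions for the example, not part of any standard API:

```python
import math

def psnr(ref, dist, max_val=255.0):
    """Full-reference score: peak signal-to-noise ratio between a reference
    frame and a distorted frame, both given as flat lists of pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames: unbounded quality
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because the reference frame must be available, a metric like this can only run where both signals exist, e.g., in offline codec evaluation rather than at a receiver.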

Use of picture quality models for video quality estimation

Some models that are used for video quality assessment are simply image quality models, whose output is calculated for every frame of a video sequence. An overview of recent no-reference image quality models has also been given in a journal paper by Shahid et al.
The quality measure of every frame can then be recorded and pooled over time to assess the quality of an entire video sequence. While this method is easy to implement, it does not factor in certain kinds of degradations that develop over time, such as the moving artifacts caused by packet loss and its concealment. A video quality model that considers the temporal aspects of quality degradations, such as the MOVIE Index, may be able to produce more accurate predictions of human-perceived quality.
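Temporal pooling can be sketched as follows; the `pool_scores` helper and its method names are hypothetical, shown only to contrast plain averaging with a pooling rule that weights transient degradations more heavily:

```python
def pool_scores(frame_scores, method="mean"):
    """Pool per-frame quality scores into one sequence-level score.
    'worst10' averages the worst 10% of frames, emphasizing short-lived
    degradations that plain mean pooling can wash out."""
    scores = sorted(frame_scores)
    if method == "mean":
        return sum(scores) / len(scores)
    if method == "worst10":
        k = max(1, len(scores) // 10)
        return sum(scores[:k]) / k
    raise ValueError(f"unknown pooling method: {method}")
```

For a clip where one frame is badly degraded, mean pooling barely moves, while worst-case pooling flags the transient drop.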

Video quality artifacts

The estimation of visual artifacts is a well-known technique for estimating overall video quality. Most of these artifacts are compression artifacts caused by lossy compression. Attributes typically estimated by pixel-based metrics include:
Spatial
  • Blurring — a loss of high spatial frequency image detail, usually at sharp edges.
  • Blocking — introduced by block-based coding algorithms that represent the image in blocks of size 8, 16, or 32 pixels. At certain parameter settings, they average pixels within a block, making block boundaries visible.
  • Ringing, echoing, or ghosting — takes the form of a "halo", band, or "ghost" near sharp edges.
  • Color bleeding — occurs when the edges of one colour in the image unintentionally bleed or overlap into another colour.
  • Staircase noise — a special case of blocking along a diagonal or curved edge: rather than rendering as smooth, it takes on the appearance of stair steps.
Temporal
  • Flickering — frequent brightness or colour changes along the time dimension, often broken out as fine-grain and coarse-grain flickering.
  • Mosquito noise — a variant of flickering, appearing as haziness and/or shimmering around high-frequency content.
  • Floating — illusory motion in certain regions while the surrounding areas remain static; visually, these regions appear as if they were floating on top of the surrounding background.
  • Jerkiness or judder — perceived uneven or wobbly motion due to frame sampling, often caused by converting 24 fps film to a 30 or 60 fps video format.
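As an example of how a pixel-based metric might quantify one of these artifacts, blocking can be estimated by comparing pixel jumps at block boundaries against jumps elsewhere. This is a crude illustrative sketch on a single image row; the `blockiness` helper is an assumption for the example, not a standardized metric:

```python
def blockiness(row, block=8):
    """Crude blocking estimate on one image row: ratio of the mean absolute
    pixel jump at block boundaries to the mean jump elsewhere. Values well
    above 1 suggest visible block edges."""
    boundary, inner = [], []
    for i in range(1, len(row)):
        diff = abs(row[i] - row[i - 1])
        (boundary if i % block == 0 else inner).append(diff)
    inner_mean = max(sum(inner) / len(inner), 1e-9)  # avoid division by zero
    return (sum(boundary) / len(boundary)) / inner_mean
```

A row of three flat 8-pixel blocks with different levels scores far above 1, while a smooth gradient scores around 1.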

Examples of video quality metrics

Widely used video quality metrics include the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and Netflix's Video Multimethod Assessment Fusion (VMAF).

Training and performance evaluation

Since objective video quality models are expected to predict results given by human observers, they are developed with the aid of subjective test results. During the development of an objective model, its parameters should be trained so as to achieve the best correlation between the objectively predicted values and the subjective scores, often available as mean opinion scores.
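Fitting a model's parameters to subjective scores can be sketched with a one-feature least-squares regression; the `fit_linear` helper and the toy data are assumptions for the illustration, assuming each objective score is paired with a mean opinion score:

```python
def fit_linear(objective, mos):
    """Least-squares fit of mean opinion scores against one objective
    feature: mos ~ a * objective + b."""
    n = len(objective)
    mx = sum(objective) / n
    my = sum(mos) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(objective, mos))
         / sum((x - mx) ** 2 for x in objective))
    return a, my - a * mx
```

In practice, models often use several features and nonlinear mappings, but the training principle (minimizing the error against subjective scores) is the same.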
The most widely used subjective test materials are in the public domain and include still pictures, motion pictures, streaming video, high-definition, 3-D, and special-purpose picture-quality-related datasets. These so-called databases are created by various research laboratories around the world, and some of them have become de facto standards, including several public-domain subjective picture quality databases. Collections of such databases, as well as freely available video test sequences for model development, are hosted in public repositories.
Some databases also provide pre-computed metric scores so that others can benchmark new metrics against existing ones. Examples can be seen in the table below.
Benchmark    Number of videos    Number of metrics    Type of metrics
             585                 11                   No-reference
             1,200               11                   No-reference
             1,500               9                    No-reference
             2,500               15                   No-reference
             2,500               44                   Full-reference
             39,000              6                    No-reference
             437                 5                    No-reference
             315                 3                    No-reference

In theory, a model can be trained on a set of data in such a way that it produces perfectly matching scores on that dataset. However, such a model will be over-trained and will therefore not perform well on new datasets. It is therefore advised to validate models against new data and use the resulting performance as a real indicator of the model's prediction accuracy.
To measure the performance of a model, frequently used figures are the linear (Pearson) correlation coefficient, Spearman's rank correlation coefficient, and the root mean square error. Other measures are the kappa coefficient and the outliers ratio. An ITU-T Recommendation gives an overview of statistical procedures to evaluate and compare objective models.