Dolby Digital Plus


Dolby Digital Plus, also known as Enhanced AC-3, is a digital audio compression scheme developed by Dolby Labs for the transport and storage of multi-channel digital audio. It is a successor to Dolby Digital, and has a number of improvements over that codec, including support for a wider range of data rates, an increased channel count, and multi-program support, as well as additional tools for representing compressed data and counteracting artifacts. Whereas Dolby Digital supports up to five full-bandwidth audio channels at a maximum bitrate of 640 kbit/s, E-AC-3 supports up to 15 full-bandwidth audio channels at a maximum bitrate of 6.144 Mbit/s.
The full set of technical specifications for E-AC-3 is standardized and published in Annex E of ATSC A/52:2012, as well as Annex E of ETSI TS 102 366.

Technical details

Specifications

Dolby Digital Plus is capable of the following:
  • Coded bitrate: 0.032 to 6.144 Mbit/s
  • Audio channels: 1.0 to 15.1
  • Number of audio programs per bitstream: 8
  • Sample rate: 32, 44.1 or 48 kHz

    Structure

A Dolby Digital Plus service consists of one or more substreams. There are three types of substreams:
  • Independent substreams, which can contain a single program of up to 5.1 channels. Up to eight independent substreams may be present in a Dolby Digital Plus stream. The channels present in an independent substream are limited to the traditional 5.1 channels: Left, Right, Center, Left Surround, and Right Surround, as well as a Low Frequency Effects channel.
  • Legacy substreams, which contain a single 5.1 program, and which correspond directly to Dolby Digital content. At most a single legacy substream may be present in a DD+ stream.
  • Dependent substreams, which contain additional channels beyond the traditional 5.1 channels. As dependent substreams have the same structure as independent substreams, each dependent substream may contain up to five full-bandwidth channels and one low-frequency channel; however, these channels may be assigned to different speaker placements. Metadata in the substream describes the purpose of each included channel.
All DD+ streams must contain at least one independent substream or legacy substream, which contains the first 5.1 channels of the primary audio program. Additional independent substreams may be used for secondary audio programs such as foreign language soundtracks, commentary, or descriptions/voiceovers for the visually impaired. Dependent substreams may be provided for programs that have additional soundstage channels beyond 5.1.
Within each substream, provision is made for encoding five full-bandwidth channels, one low-frequency channel, and one coupling channel. The coupling channel is used for medium-to-high-frequency information which is common to multiple full-bandwidth channels. Its content is mixed in with the other channels in a fashion prescribed by the metadata; it is not reproduced as a discrete channel by the decoder.
Dolby Digital Plus includes comprehensive bitstream metadata for decoder control over output loudness, downmixing, and reversible dynamic range control.

Syntax

Dolby Digital Plus is nominally a 16-bit-aligned protocol, though very few fields in the syntax respect any byte or word boundaries. Many syntax elements are optional or variable-length, including some whose presence or length depends on complex preceding calculations, and there is little redundancy in the syntax. As a result, DD+ can be extremely difficult to parse correctly, and a defective decoder can easily produce a parsing that is syntactically valid but incorrect.
A DD+ stream is a collection of fixed-length syncframe packets, each of which corresponds to either 256, 512, 768, or 1536 consecutive time-domain audio samples. Each syncframe is independently decodable, and belongs to a specific substream within the service. A syncframe consists of the following syntax elements:
  • A 16-bit sync word, which has the value 0x0b77.
  • A Bitstream Info section, which includes key metadata such as the frame size, the bitstream identifier, channel mode, the substream identifier, the encoded dialog level, and metadata to guide decoder production of a downmix.
  • An Audio Frame section, which contains decoding information common to all audio blocks within the syncframe, including the necessary information to determine how exponents and mantissas are packed.
  • One, two, three, or six Audio Block sections. These sections contain additional decoding metadata, as well as the encoded and quantized frequency coefficients. Each Audio Block corresponds to 256 PCM samples in each channel.
  • A final section containing user-defined auxiliary data, any necessary padding to produce uniform syncframe lengths, and a 16-bit cyclic redundancy check code for error detection.
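The start of a syncframe can be illustrated with a short parsing sketch. The sync word value and the 256-samples-per-block relationship come from the text above; the field order and widths read after the sync word (stream type, substream identifier, frame size, sample-rate code, block-count code) are an assumption based on the commonly documented layout of the Bitstream Info section and should be checked against Annex E before being relied on. The names `BitReader`, `SyncframeHeader`, and `parse_syncframe_header` are hypothetical.

```python
from dataclasses import dataclass

SYNC_WORD = 0x0B77
SAMPLES_PER_BLOCK = 256
# Assumed code tables (not stated in the text above).
BLOCKS_PER_FRAME = (1, 2, 3, 6)          # indexed by the 2-bit block-count code
SAMPLE_RATES_HZ = (48000, 44100, 32000)  # indexed by the 2-bit sample-rate code


class BitReader:
    """Reads MSB-first bit fields that ignore byte boundaries."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


@dataclass
class SyncframeHeader:
    stream_type: int
    substream_id: int
    frame_bytes: int
    sample_rate_hz: int
    num_blocks: int

    @property
    def samples(self) -> int:
        return self.num_blocks * SAMPLES_PER_BLOCK


def parse_syncframe_header(data: bytes) -> SyncframeHeader:
    reader = BitReader(data)
    if reader.read(16) != SYNC_WORD:
        raise ValueError("missing sync word 0x0B77")
    stream_type = reader.read(2)              # assumed: 2-bit stream type
    substream_id = reader.read(3)             # assumed: 3-bit substream id
    frame_bytes = (reader.read(11) + 1) * 2   # assumed: size in 16-bit words, minus one
    rate_code = reader.read(2)
    if rate_code == 3:
        raise ValueError("reduced sample rates are not handled in this sketch")
    num_blocks = BLOCKS_PER_FRAME[reader.read(2)]
    return SyncframeHeader(stream_type, substream_id, frame_bytes,
                           SAMPLE_RATES_HZ[rate_code], num_blocks)


# Hand-built header: independent substream 0, 1536-byte frame, 48 kHz, six blocks.
header = parse_syncframe_header(bytes([0x0B, 0x77, 0x02, 0xFF, 0x30]))
duration_s = header.samples / header.sample_rate_hz
print(header, duration_s, header.frame_bytes * 8 / duration_s)
```

Under these assumptions, a six-block frame at 48 kHz covers 1536 samples, or 32 ms of audio, so a 1536-byte frame corresponds to 384 kbit/s.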

    Storage of transform coefficients

At the heart of both Dolby Digital and DD+ is a modified discrete cosine transform, which is used to transform the audio signal into the frequency domain; within each block up to 256 frequency coefficients may be transmitted. Coefficients are transmitted in a binary floating-point format, with exponents transmitted separately from mantissas. This allows for highly efficient coding.
Exponents for each channel are encoded in a highly packed differential format, with the deltas between consecutive frequency bins being given in the stream. Three formats, or exponent strategies, are used; these are known as "D15", "D25", and "D45". In D15, each bin has a unique exponent, while in D25 and D45, delta values correspond to either pairs or quads of frequency bins. Audio blocks other than the first in a syncframe may additionally reuse the prior block's exponent set.
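The differential scheme can be sketched as follows. The sketch assumes, as in the base Dolby Digital format, that the first exponent is sent as an absolute value and that three deltas in the range -2 to +2 are packed into each 7-bit code; that packing detail is not stated above and is an assumption here. The function names are hypothetical.

```python
# Number of frequency bins that share one decoded exponent, per strategy.
BINS_PER_DELTA = {"D15": 1, "D25": 2, "D45": 4}


def unpack_group(code: int) -> tuple:
    """Split one 7-bit code into three deltas in -2..+2 (assumed packing)."""
    if not 0 <= code < 125:
        raise ValueError("grouped exponent code out of range")
    return (code // 25 - 2, (code % 25) // 5 - 2, code % 5 - 2)


def decode_exponents(absolute_exp: int, codes: list, strategy: str) -> list:
    """Expand an absolute exponent plus grouped deltas into per-bin exponents."""
    span = BINS_PER_DELTA[strategy]
    exponents = [absolute_exp]      # the first bin carries the absolute value
    current = absolute_exp
    for code in codes:
        for delta in unpack_group(code):
            current += delta
            # D15 covers one bin per delta, D25 a pair, D45 a quad.
            exponents.extend([current] * span)
    return exponents


code = 25 * 3 + 5 * 2 + 1           # encodes the deltas (+1, 0, -1)
print(decode_exponents(5, [code], "D15"))  # [5, 6, 6, 5]
print(decode_exponents(5, [code], "D25"))  # [5, 6, 6, 6, 6, 5, 5]
```

The same three deltas describe three bins in D15 but twelve bins in D45, which is why the coarser strategies cost fewer bits at the expense of spectral resolution.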
The decoded exponents, along with a set of metadata parameters, are used to derive the bit allocation pointers, which specify the number of bits allocated to each mantissa. Bins which correspond to frequencies at which human hearing is more sensitive are allocated more bits; bins which correspond to frequencies at which it is less sensitive are allocated fewer. Anywhere between zero and 16 bits may be allocated for each mantissa; if zero bits are transmitted, a dither function may optionally be applied to generate the frequency coefficient.

Algorithm

Dolby Digital Plus, like many lossy audio codecs, uses a heavily quantized frequency-domain representation of the signal to achieve coding gain; this section describes the operation of the base transform as well as various optional "tools" specified by the standard, which are used to achieve either greater compression or to reduce audible coding artifacts.

Modified discrete cosine transform

Both Dolby Digital and DD+ encoders convert a multichannel audio signal to the frequency domain using the modified discrete cosine transform, with a switchable block length of either 256 or 512 samples. The frequency-domain representation is then quantized according to a psychoacoustic model and transmitted. A floating-point format for frequency coefficients is used, and mantissas and exponents are stored and transmitted separately, with both being heavily compressed.
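The overlapping-block behaviour of the MDCT can be demonstrated with a direct, unoptimized implementation. This is a minimal sketch under stated assumptions: it uses a sine window rather than the window defined by the standard, an O(N²) summation rather than a fast algorithm, and a scaling chosen so that windowed overlap-add reconstructs the input. The function names are hypothetical.

```python
import math


def sine_window(length: int) -> list:
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block: list) -> list:
    """Map 2N time samples to N frequency coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs: list) -> list:
    """Map N coefficients back to 2N time-aliased samples."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]


def analyze_synthesize(signal: list, n: int) -> list:
    """Window, transform, invert, window again, and overlap-add with hop size N."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + t] * window[t] for t in range(2 * n)]
        restored = imdct(mdct(block))
        for t in range(2 * n):
            out[start + t] += restored[t] * window[t]
    return out


signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
rebuilt = analyze_synthesize(signal, 8)
# Samples covered by two overlapping blocks are recovered; the edges are not.
print(max(abs(rebuilt[t] - signal[t]) for t in range(8, 24)))
```

With `n = 256` each 512-sample block yields 256 coefficients, matching the per-block coefficient count described earlier. Each block on its own contains time-domain aliasing, which cancels only when adjacent blocks are overlapped and added.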

Adaptive hybrid transform (AHT)

For highly stationary signals, such as long notes in a musical performance, the Adaptive Hybrid Transform is used. This tool is unique to Dolby Digital Plus, and uses an additional Type II discrete cosine transform to combine six adjacent transform blocks into an effectively longer block. In addition to the two-stage transform, a different bit-allocation structure is used, and two ways of representing encoded mantissas are deployed: vector quantization, which gives the highest coding gain, and gain-adaptive quantization, which is used when greater signal fidelity is required. Gain-adaptive quantization may be independently enabled for each frequency bin within a channel, and permits variable-length mantissa encoding.
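The second transform stage can be sketched as a length-six Type II DCT applied across blocks, separately for each frequency bin. The scaling and coefficient ordering used by the standard are not reproduced here, and the function names are hypothetical; the sketch only shows why a stationary signal becomes cheaper to code.

```python
import math


def dct2(values: list) -> list:
    """Unnormalized Type II discrete cosine transform."""
    n = len(values)
    return [sum(v * math.cos(math.pi / n * (m + 0.5) * k)
                for m, v in enumerate(values))
            for k in range(n)]


def hybrid_transform(mdct_blocks: list) -> list:
    """Apply a DCT-II across six blocks for every frequency bin.

    mdct_blocks holds six lists of MDCT coefficients, one per audio block.
    The result is indexed as [bin][hybrid coefficient].
    """
    if len(mdct_blocks) != 6:
        raise ValueError("this sketch expects exactly six blocks")
    bins = len(mdct_blocks[0])
    return [dct2([block[b] for block in mdct_blocks]) for b in range(bins)]


# A perfectly stationary signal: each bin holds the same value in all six blocks.
stationary = [[0.5, 1.0]] * 6
for row in hybrid_transform(stationary):
    print([round(x, 9) for x in row])
```

For the stationary input, all of the energy in each bin collapses into the first hybrid coefficient, and the remaining five are zero to within rounding, so they need few or no bits.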

Coupling

As many multi-channel audio programs have high degrees of correlation between individual channels, a coupling channel is typically used. High-frequency information which is common to two or more channels is transmitted in a separate channel known as the coupling channel, along with coefficients known as "coupling coordinates" that guide the decoder on how to reconstruct the original channels.
Dolby Digital Plus supports a more elaborate version of the coupling tool known as Enhanced Coupling. This algorithm, which is considerably more expensive to process, allows phase information to be included in the coupling coordinates, so that phase relationships between coupled channels are preserved.
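The basic, magnitude-only form of coupling can be sketched as below. How an encoder actually forms the coupling channel and lays out its bands is not described above, so the averaging, the fixed band size, and the energy-ratio coordinates are all assumptions made for illustration; `couple` and `decouple` are hypothetical names.

```python
import math


def couple(channels: list, start_bin: int, band_size: int):
    """Merge the bins above start_bin into one coupling channel plus coordinates."""
    total = len(channels[0])
    coupling = [sum(ch[b] for ch in channels) / len(channels)
                for b in range(start_bin, total)]
    coordinates = []
    for ch in channels:
        per_band = []
        for lo in range(0, len(coupling), band_size):
            band = range(lo, min(lo + band_size, len(coupling)))
            cpl_energy = sum(coupling[i] ** 2 for i in band)
            ch_energy = sum(ch[start_bin + i] ** 2 for i in band)
            # One scale factor per band restores the channel's energy.
            per_band.append(math.sqrt(ch_energy / cpl_energy) if cpl_energy else 0.0)
        coordinates.append(per_band)
    return coupling, coordinates


def decouple(low_bins: list, coupling: list, coords: list, band_size: int) -> list:
    """Rebuild one channel from its own low bins and the shared coupling channel."""
    high = [coupling[i] * coords[i // band_size] for i in range(len(coupling))]
    return list(low_bins) + high


left = [1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 4.0, 4.0]
right = [5.0, 6.0, 7.0, 8.0, 1.0, 1.0, 2.0, 2.0]   # high bins are half of left's
coupling, coords = couple([left, right], start_bin=4, band_size=2)
print(decouple(left[:4], coupling, coords[0], 2))
print(decouple(right[:4], coupling, coords[1], 2))
```

Because the coordinates here carry magnitude only, the reconstruction is exact only when the coupled channels are in phase, as in this example. Carrying phase as well is the gap that Enhanced Coupling addresses.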

Spectral extension

Dolby Digital Plus provides another tool for high frequencies. As high frequency components are often harmonics of lower-frequency sounds, Spectral Extension allows high frequency components to be synthesized algorithmically from lower-frequency components. This tool is also unique to Dolby Digital Plus, and unsupported in Dolby Digital.
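The idea of synthesizing the upper spectrum from the lower spectrum can be sketched as a scaled copy. The standardized tool involves more than this, and the parameter names (`copy_start`, `ext_start`, `band_gains`) and the fixed band size are hypothetical; the sketch shows only the principle that a few per-band gains stand in for many transmitted coefficients.

```python
def spectral_extension(coeffs: list, copy_start: int, ext_start: int,
                       band_size: int, band_gains: list) -> list:
    """Synthesize bins from ext_start upward out of the transmitted low bins.

    coeffs     -- transmitted coefficients; only bins below ext_start are used
    copy_start -- first low-frequency bin used as source material
    ext_start  -- first bin that is synthesized rather than transmitted
    band_gains -- one gain per synthesized band of band_size bins
    """
    source_width = ext_start - copy_start
    if source_width <= 0:
        raise ValueError("the source region must lie below the extension region")
    out = list(coeffs[:ext_start])
    for i in range(len(band_gains) * band_size):
        # Wrap around the source region if the extension is wider than it.
        source_bin = copy_start + (i % source_width)
        out.append(coeffs[source_bin] * band_gains[i // band_size])
    return out


low = [1.0, 2.0, 3.0, 4.0]
print(spectral_extension(low, copy_start=0, ext_start=4,
                         band_size=2, band_gains=[0.5, 0.25]))
# [1.0, 2.0, 3.0, 4.0, 0.5, 1.0, 0.75, 1.0]
```

In this example four high-frequency coefficients are described by two gains, which is where the bitrate saving comes from.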

Rematrixing

Stereo programs are typically rematrixed and encoded as L+R and L-R channels. This is done both to increase coding gain and to preserve the phase relationships necessary for proper playback of Dolby Surround-encoded material.
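The sum-and-difference operation and its inverse can be sketched as follows. The factor of one half on the encoder side is an assumption chosen so that the decoder needs only an addition and a subtraction, and the sketch applies the operation to whole coefficient lists rather than selectively by frequency band; `rematrix` and `dematrix` are hypothetical names.

```python
def rematrix(left: list, right: list):
    """Encoder side: replace L/R with scaled sum and difference signals."""
    total = [(l + r) / 2 for l, r in zip(left, right)]
    difference = [(l - r) / 2 for l, r in zip(left, right)]
    return total, difference


def dematrix(total: list, difference: list):
    """Decoder side: recover L/R from the sum and difference signals."""
    left = [t + d for t, d in zip(total, difference)]
    right = [t - d for t, d in zip(total, difference)]
    return left, right


left = [0.5, -0.25, 0.75]
right = [0.5, 0.75, 0.75]
total, difference = rematrix(left, right)
print(total, difference)            # [0.5, 0.25, 0.75] [0.0, -0.5, 0.0]
print(dematrix(total, difference))  # ([0.5, -0.25, 0.75], [0.5, 0.75, 0.75])
```

Where the two channels are nearly identical, the difference signal is close to zero and costs few bits, which is the source of the coding gain.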

Transient pre-noise processing

Transient pre-noise processing is a Dolby Digital Plus-specific tool to reduce the resulting artifacts of signal quantization and other compression techniques. Unlike the other tools described above, which operate in the frequency domain and precede the conversion back into PCM samples, TPNP is a tool which essentially performs a windowed cut-and-paste operation on the time-domain signal to erase certain predictable quantization artifacts.

Relation to Dolby Digital and Dolby Atmos

Dolby Digital Plus bitstreams are not directly backward compatible with legacy Dolby Digital decoders. However, Dolby Digital Plus is a functional superset of Dolby Digital, and decoders include a mandatory component that directly converts the Dolby Digital Plus bitstream to a Dolby Digital bitstream for carriage via legacy S/PDIF connections to external decoders. All Dolby Digital Plus decoders can decode Dolby Digital bitstreams.
Dolby Atmos bitstreams, in turn, are encoded to be backward compatible with Dolby Digital Plus decoders, and as such can be decoded by Dolby Digital Plus-compatible devices. Dolby has marketed this lossy-compressed variant of Dolby Atmos under the label "Dolby Digital Plus Atmos" to differentiate it from the lossless Dolby TrueHD-based original. Most Dolby Digital Plus bitstreams are now encoded with Atmos.