Audio system measurements

Audio system measurements are used to quantify audio system performance. These measurements are made for several purposes. Designers take measurements to specify the performance of a piece of equipment. Maintenance engineers make them to ensure equipment is still working to specification, or to ensure that the cumulative defects of an audio path are within limits considered acceptable. Audio system measurements often accommodate psychoacoustic principles to measure the system in a way that relates to human hearing.

Subjectivity and frequency weighting

Subjectively valid methods came to prominence in consumer audio in the UK and Europe in the 1970s, when the introduction of compact cassette tape, dbx and Dolby noise reduction techniques revealed the unsatisfactory nature of many basic engineering measurements. The specification of weighted CCIR-468 quasi-peak noise and weighted quasi-peak wow and flutter became particularly widely used, and attempts were made to find more valid methods for distortion measurement.
Measurements based on psychoacoustics, such as the measurement of noise, often use a weighting filter. It is well established that human hearing is more sensitive to some frequencies than others, as demonstrated by equal-loudness contours, but it is less widely appreciated that these contours vary with the type of sound. The measured curves for pure tones, for instance, differ from those for random noise. The ear also responds less to short bursts, below 100 to 200 ms, than to continuous sounds, so a quasi-peak detector has been found to give the most representative results when noise contains clicks or bursts, as is often the case for noise in digital systems. For these reasons, a set of subjectively valid measurement techniques has been devised and incorporated into BS, IEC, EBU and ITU standards. These methods of audio quality measurement are used by broadcast engineers throughout most of the world, as well as by some audio professionals, though others still commonly use the older A-weighting standard, which was devised for continuous tones.
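A weighting filter is, in effect, a frequency-dependent gain applied before the level is measured. As a minimal sketch, the A-weighting curve mentioned above can be evaluated from the analytic approximation commonly quoted from IEC 61672, with its pole frequencies rounded to 20.6, 107.7, 737.9 and 12194 Hz. The 468-weighting curve has a different shape and is not reproduced here; the function name `a_weighting_db` is illustrative.

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """Relative response of the A-weighting filter, in dB, at frequency f_hz.

    Uses the analytic approximation with rounded pole frequencies
    (20.6, 107.7, 737.9 and 12194 Hz). The +2.00 dB term normalises
    the curve to roughly 0 dB at 1 kHz.
    """
    f2 = f_hz * f_hz
    ra = (12194.0**2 * f2 * f2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00

if __name__ == "__main__":
    # Print the weighting at a few spot frequencies across the audio band.
    for f in (31.5, 100, 1000, 4000, 16000):
        print(f"{f:>8} Hz  {a_weighting_db(f):+6.1f} dB")
```

At 100 Hz the curve is roughly 19 dB down, which is why an A-weighted figure discounts low-frequency hum far more than a flat RMS measurement does.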
No single measurement can assess audio quality. Instead, engineers use a series of measurements to analyze various types of degradation that can reduce fidelity. Thus, when testing an analogue tape machine it is necessary to test for wow and flutter and tape speed variations over longer periods, as well as for distortion and noise. When testing a digital system, testing for speed variations is normally considered unnecessary because of the accuracy of clocks in digital circuitry, but testing for aliasing and timing jitter is often desirable, as these have caused audible degradation in many systems.
Once subjectively valid methods have been shown to correlate well with listening tests over a wide range of conditions, they are generally adopted as preferred. Standard engineering methods are not always sufficient, even when comparing like with like. One CD player, for example, might have higher measured noise than another when measured with an RMS method, or even an A-weighted RMS method, yet sound quieter and measure lower when 468-weighting is used. This could be because it has more noise at high frequencies, or even at frequencies beyond the audible range, both of which matter less because human hearing is less sensitive there. This effect is central to how Dolby B works and to why it was introduced. Cassette noise, which was predominantly high-frequency and unavoidable given the small track size and low tape speed, could be made subjectively much less important. The noise sounded quieter, but failed to measure much better unless 468-weighting was used rather than A-weighting.

Measurable performance

Analog electrical

; Frequency response
; Total harmonic distortion
; Output power
; Intermodulation distortion
; Noise
; Crosstalk
; Common-mode rejection ratio
; Dynamic range and Signal-to-noise ratio
; Phase distortion, Group delay, and Phase delay : A perfect audio component will maintain the phase coherency of a signal over the full range of frequencies. Phase distortion can be extremely difficult to reduce or eliminate. The human ear is largely insensitive to phase distortion, though it is exquisitely sensitive to relative phase relationships within heard sounds. The complex nature of our sensitivity to phase errors, coupled with the lack of a convenient test that yields an easily understood quality rating, is why phase response is not part of conventional audio specifications. Multi-driver loudspeaker systems may have complex phase distortions, caused or corrected by crossovers, driver placement, and the phase behaviour of the specific drivers.
; Transient response : A system may have low distortion for a steady-state signal, but not on sudden transients. In amplifiers, this problem can in some instances be traced to the power supply, to insufficient high-frequency performance, or to excessive negative feedback. Related measurements are slew rate and rise time. Distortion in transient response can be hard to measure. By modern standards, many otherwise good power amplifier designs have been found to have inadequate slew rates. In loudspeakers, transient response is affected by the mass and resonances of drivers and enclosures, and by group delay and phase delay introduced by crossover filtering or inadequate time alignment of the loudspeaker's drivers. Most loudspeakers generate significant amounts of transient distortion, though some designs are less prone to it.
; Damping factor : The ratio of a loudspeaker's impedance, here taken as the DC resistance of its voice coil, to the combined output impedance of the amplifier and connecting cables. A higher number is generally believed to be better. It is a measure of how well a power amplifier controls the undesired motion of a loudspeaker driver: an amplifier must be able to suppress resonances caused by the mechanical motion of a speaker cone. This is especially important for a low-frequency driver with greater mass. For conventional loudspeaker drivers, this essentially means ensuring that the output impedance of the amplifier is close to zero and that the speaker wires are sufficiently short and of sufficiently large diameter. A damping factor of 20 or greater is considered adequate for live sound reinforcement systems, as the SPL of inertia-related driver movement is then 26 dB below the signal level (20 log₁₀ 20 ≈ 26 dB) and won't be heard. Negative feedback in an amplifier lowers its effective output impedance and thus increases its damping factor.
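The arithmetic behind damping-factor figures can be sketched directly. The example assumes an 8 Ω load, copper cable with a resistivity of about 1.68 × 10⁻⁸ Ω·m, and a hypothetical amplifier output impedance of 0.02 Ω; the helper names are illustrative, and a real amplifier's output impedance varies with frequency.

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm-metre, approximate value near room temperature

def cable_resistance(length_m: float, area_mm2: float) -> float:
    """Loop resistance of a two-conductor speaker cable (out and back)."""
    return COPPER_RESISTIVITY * (2.0 * length_m) / (area_mm2 * 1e-6)

def damping_factor(load_ohms: float, amp_out_ohms: float, cable_ohms: float = 0.0) -> float:
    """Loudspeaker impedance divided by the total source impedance it sees."""
    return load_ohms / (amp_out_ohms + cable_ohms)

if __name__ == "__main__":
    rc = cable_resistance(10.0, 2.5)   # assumed 10 m run of 2.5 mm^2 cable
    df = damping_factor(8.0, 0.02, rc)  # assumed 8 ohm load, 0.02 ohm amplifier
    print(f"cable {rc:.4f} ohm, damping factor {df:.1f}")
    # A damping factor of 20 corresponds to a level ratio of about 26 dB.
    print(f"DF 20 -> {20 * math.log10(20):.1f} dB")
```

With these assumed figures the cable's 0.134 Ω loop resistance, not the amplifier, dominates the source impedance, giving a damping factor of about 52.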
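The slew rate an amplifier needs can be estimated from the fastest-changing signal it must reproduce: a sine of peak voltage V and frequency f changes fastest at its zero crossings, at 2πfV volts per second. The sketch also includes the 10 to 90 % rise time of a single-pole (first-order) system, about 0.35 divided by the -3 dB bandwidth. Treating an amplifier as single-pole is a simplifying assumption, and the function names are hypothetical.

```python
import math

def peak_volts_for_power(watts: float, load_ohms: float) -> float:
    """Peak voltage of a sine delivering `watts` of average power into `load_ohms`."""
    return math.sqrt(2.0 * watts * load_ohms)

def required_slew_rate_v_per_us(peak_volts: float, freq_hz: float) -> float:
    """Minimum slew rate, in V/us, to reproduce V*sin(2*pi*f*t) without limiting.
    The steepest slope of that sine is 2*pi*f*V volts per second."""
    return 2.0 * math.pi * freq_hz * peak_volts / 1e6

def rise_time_us(bandwidth_hz: float) -> float:
    """10-90 % rise time of a first-order system: ln(9)/(2*pi*f) ~ 0.35/f."""
    return 0.35 / bandwidth_hz * 1e6

if __name__ == "__main__":
    vpk = peak_volts_for_power(100.0, 8.0)  # assumed 100 W into 8 ohms
    print(f"{vpk:.1f} V peak, needs {required_slew_rate_v_per_us(vpk, 20_000):.2f} V/us at 20 kHz")
    print(f"rise time for 100 kHz bandwidth: {rise_time_us(100_000):.2f} us")
```

An amplifier rated at 100 W into 8 Ω swings 40 V peak, so reproducing a full-power 20 kHz sine needs at least about 5 V/µs; a practical design is usually given a margin above this minimum.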

Mechanical

; Wow and flutter : These measurements relate to physical motion in a component, chiefly the drive mechanism of analogue media such as vinyl records and magnetic tape. Wow is a slow speed variation, caused by longer-term drift of the drive motor speed, whereas flutter is a faster speed variation, usually caused by mechanical defects such as out-of-roundness of the capstan of a tape transport mechanism. The measurement is given in percent, and a lower number is better.
; Rumble : The measure of the low-frequency noise contributed by the turntable of an analogue playback system. It is caused by imperfect bearings, uneven motor windings, vibrations in the drive belts of some turntables, and room vibrations transmitted through the turntable mounting to the phono cartridge.
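A speed variation shows up as frequency modulation of a recorded test tone. As a minimal sketch, the code below synthesises a tone with a periodic speed error and then estimates the peak deviation, in percent, from the duration of each cycle between rising zero crossings. The 3150 Hz tone, 4 Hz wobble and 96 kHz sample rate are assumptions chosen for illustration, and the function names are hypothetical. Standardised meters additionally apply the weighting filter and quasi-peak detector mentioned earlier, so this unweighted peak figure is not directly comparable with a published specification.

```python
import math

def wobbling_tone(nominal_hz, mod_hz, deviation_pct, fs, seconds):
    """Synthesise a tone whose frequency varies sinusoidally by +/- deviation_pct,
    mimicking playback on a transport with a periodic speed error."""
    dev_hz = nominal_hz * deviation_pct / 100.0
    phase = 0.0
    samples = []
    for i in range(int(fs * seconds)):
        inst_hz = nominal_hz + dev_hz * math.sin(2.0 * math.pi * mod_hz * i / fs)
        phase += 2.0 * math.pi * inst_hz / fs  # integrate frequency to get phase
        samples.append(math.sin(phase))
    return samples

def peak_speed_deviation_pct(samples, fs, nominal_hz):
    """Unweighted peak speed deviation, in percent, from the length of each
    cycle between successive rising zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Linear interpolation gives a sub-sample crossing time.
            crossings.append((i - 1 + a / (a - b)) / fs)
    cycle_hz = [1.0 / (t1 - t0) for t0, t1 in zip(crossings, crossings[1:])]
    return max(abs(f - nominal_hz) for f in cycle_hz) / nominal_hz * 100.0

if __name__ == "__main__":
    fs = 96_000
    tone = wobbling_tone(3150.0, 4.0, 0.1, fs, 1.0)  # 0.1 % speed error at 4 Hz
    print(f"{peak_speed_deviation_pct(tone, fs, 3150.0):.3f} %")
```

The estimate should come out close to the 0.1 % deviation that was synthesised; the small residual error comes from the linear interpolation of each zero crossing.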

Digital

Digital systems do not suffer from many of these effects at the signal level, although the same physical processes occur in their circuitry, because the data being handled is symbolic. As long as each symbol survives the transfer between components and can be perfectly regenerated, the data itself is perfectly maintained. The data is typically buffered in memory and clocked out by a very precise crystal oscillator. The data usually does not degrade as it passes through many stages, because each stage regenerates new symbols for transmission.
Digital systems have their own problems. Digitizing adds noise, which is measurable and depends on the audio bit depth of the system, regardless of other quality issues. Timing errors in sampling clocks result in non-linear distortion of the signal. One quality measurement for a digital system is the probability of an error in transmission or reception (the bit error rate). Other metrics of system quality are defined by the sample rate and bit depth. In general, digital systems are much less prone to error than analogue systems; however, nearly all digital systems have analogue inputs and/or outputs, as certainly do all those that interact with the analogue world. These analogue components can suffer analogue effects and potentially compromise the integrity of a well-designed digital system.
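The dependence of quantization noise on bit depth can be checked numerically. The sketch quantizes a sine just below full scale with a uniform quantizer of 2^N levels and compares the measured signal-to-quantization-noise ratio with the standard approximation 6.02N + 1.76 dB. No dither is applied, and the test frequency, sample count and function names are arbitrary choices made for illustration.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x to the nearest level of a uniform quantizer with 2**bits levels
    (step 2 / 2**bits) covering [-1, 1), clamping at the extremes."""
    step = 2.0 / (1 << bits)
    q = round(x / step) * step
    return max(-1.0, min(1.0 - step, q))

def measured_sqnr_db(bits: int, n: int = 100_000) -> float:
    """SQNR of a sine just below full scale, from the accumulated error power."""
    amp = 1.0 - 2.0 / (1 << bits)  # one step below full scale, so no clamping occurs
    sig_power = err_power = 0.0
    for i in range(n):
        # 0.0123457 cycles/sample is not a simple fraction, so the error is noise-like.
        x = amp * math.sin(2.0 * math.pi * 0.0123457 * i)
        e = quantize(x, bits) - x
        sig_power += x * x
        err_power += e * e
    return 10.0 * math.log10(sig_power / err_power)

def theoretical_sqnr_db(bits: int) -> float:
    """Textbook figure for a full-scale sine: 6.02 dB per bit plus 1.76 dB."""
    return 6.02 * bits + 1.76

if __name__ == "__main__":
    for b in (8, 12, 16):
        print(f"{b:2d} bits: measured {measured_sqnr_db(b):6.2f} dB, "
              f"theory {theoretical_sqnr_db(b):6.2f} dB")
```

For 16 bits both figures come out near 98 dB.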
; Jitter : A measure of the variation in period and absolute timing of a measured clock relative to an ideal clock. Less jitter is generally better for sampling systems.
; Sample rate : A specification of the rate at which measurements are taken of the analogue signal. This is measured in samples per second, or hertz. A higher sampling rate allows a greater total bandwidth or pass-band frequency response and allows less-steep anti-aliasing/anti-imaging filters to be used in the stop-band, which can in turn improve overall phase linearity in the pass-band.
; Bit depth : In pulse-code modulation (PCM) audio, the bit depth is the number of bits of information in each sample. Quantization, a process used in digital audio sampling, introduces an error in the reconstructed signal. The signal-to-quantization-noise ratio is approximately proportional to the bit depth: about 6.02 dB per bit, plus 1.76 dB for a full-scale sine wave.
; Sample accuracy/synchronisation : Less a specification than a capability. Because independent digital audio devices are each run by their own crystal oscillator, and no two crystals are exactly the same, their sample rates will differ slightly. This causes the devices to drift apart over time. The effects of this can vary. If one digital device is used to monitor another, the drift will cause dropouts or distortion in the audio, as one device will be producing more or less data than the other per unit time. If two independent devices record at the same time, one will lag the other more and more over time. This effect can be circumvented with word clock synchronization. It can also be corrected in the digital domain using a drift correction algorithm, which compares the relative rates of two or more devices and drops or adds samples in the streams of any devices that drift too far from the master device. Sample rate also varies slightly over time, as crystals change in temperature, etc. See also clock recovery.
; Linearity : Differential non-linearity and integral non-linearity are two measurements of the accuracy of an analog-to-digital converter. They measure how close the threshold levels for each output code are to the theoretical, equally spaced levels.
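For sampling-clock jitter, a widely used rule of thumb bounds the SNR of a full-scale sine of frequency f, sampled with random timing error of RMS value σ, at -20 log₁₀(2πfσ) dB. The sketch computes that bound and checks it with a small Monte Carlo simulation. Gaussian, uncorrelated jitter is an assumption (real clock jitter can be periodic, producing discrete sidebands rather than broadband noise), and the function names are hypothetical.

```python
import math
import random

def jitter_limited_snr_db(freq_hz: float, rms_jitter_s: float) -> float:
    """SNR bound for a full-scale sine sampled with random clock jitter."""
    return -20.0 * math.log10(2.0 * math.pi * freq_hz * rms_jitter_s)

def simulated_snr_db(freq_hz, rms_jitter_s, fs=48_000, n=100_000, seed=1):
    """Sample a sine at Gaussian-jittered instants and compare with ideal sampling."""
    rng = random.Random(seed)
    sig = err = 0.0
    for i in range(n):
        t = i / fs
        ideal = math.sin(2.0 * math.pi * freq_hz * t)
        jittered = math.sin(2.0 * math.pi * freq_hz * (t + rng.gauss(0.0, rms_jitter_s)))
        sig += ideal * ideal
        err += (jittered - ideal) ** 2
    return 10.0 * math.log10(sig / err)

if __name__ == "__main__":
    f, sigma = 20_000.0, 1e-9  # assumed: 20 kHz tone, 1 ns RMS jitter
    print(f"bound {jitter_limited_snr_db(f, sigma):.2f} dB, "
          f"simulated {simulated_snr_db(f, sigma):.2f} dB")
```

On these assumptions, 1 ns RMS jitter on a 20 kHz tone limits the SNR to about 78 dB, and the limit tightens by 6 dB for every doubling of frequency or jitter.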
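Two sketches for sample-clock drift follow. The first converts a clock error in parts per million into accumulated drift. The second is a deliberately crude version of the drop-or-add drift correction described above: it skips or repeats a sample of the slave stream whenever the slave has gained or lost a whole sample relative to the master. The rates passed in stand for values a real system would have to measure, and all names are hypothetical. Practical implementations usually resample more smoothly, since an abrupt dropped or repeated sample can itself be audible.

```python
def drift_samples(nominal_hz: float, ppm_error: float, seconds: float) -> float:
    """Samples by which a clock with the given error drifts from a perfect
    clock over `seconds`."""
    return nominal_hz * ppm_error * 1e-6 * seconds

def drift_correct(stream, slave_hz: float, master_hz: float) -> list:
    """Drop or repeat slave samples so the output tracks the master's rate."""
    out = []
    excess = 0.0  # accumulated surplus (+) or shortfall (-) of slave samples
    per_sample = 1.0 - master_hz / slave_hz
    for s in stream:
        excess += per_sample
        if excess >= 1.0:   # slave is a whole sample ahead: drop this one
            excess -= 1.0
            continue
        out.append(s)
        if excess <= -1.0:  # slave is a whole sample behind: repeat this one
            excess += 1.0
            out.append(s)
    return out

if __name__ == "__main__":
    print(f"{drift_samples(48_000, 20, 3600):.0f} samples per hour at an assumed 20 ppm")
    fast = drift_correct(list(range(48_010)), 48_010.0, 48_000.0)
    print(len(fast))  # about one second of master-rate samples
```

An assumed 20 ppm error at 48 kHz amounts to 3456 samples, or 72 ms, per hour.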

Automated sequence testing

Sequence testing uses a specific sequence of test signals, for frequency response, noise, distortion, etc., generated and measured automatically to carry out a complete quality check on a piece of equipment or a signal path. A single 32-second sequence was standardized by the EBU in 1985, incorporating 13 tones for frequency response measurement, two tones for distortion measurement, and crosstalk and compander tests. This sequence, which began with a 110-baud FSK signal for synchronization, also became CCITT standard O.33 in 1985.
Lindos Electronics expanded the idea, retaining the FSK signal and inventing segmented sequence testing, which separated each test into a 'segment' starting with an identifying character transmitted as 110-baud FSK, so that segments could be treated as 'building blocks' for a complete test suited to a particular situation. Regardless of the mix chosen, the FSK provides both identification and synchronization for each segment, so that measuring equipment responds automatically to sequence tests sent over networks and even satellite links. Thus TUND represents a sequence made up of four segments which test the alignment level, frequency response, noise and distortion in less than a minute; many other tests, such as wow and flutter, headroom and crosstalk, are also available as segments.
The Lindos sequence test system is now a 'de facto' standard in broadcasting and many other areas of audio testing, with over 25 different segments recognized by Lindos test sets, and the EBU standard is no longer used.

Unquantifiable?

Many audio components are tested for performance using objective and quantifiable measurements, e.g., THD, dynamic range and frequency response. Some take the view that objective measurements are useful and often relate well to subjective performance, i.e., the sound quality as experienced by the listener. Floyd Toole has extensively evaluated loudspeakers in acoustical engineering research. In a peer-reviewed scientific journal, Toole has presented findings that subjects vary in their ability to distinguish good loudspeakers from bad, and that blind listening tests are more reliable than sighted tests. He found that subjects can more accurately perceive differences in speaker quality during monaural playback through a single loudspeaker, whereas subjective perception of stereophonic sound is more influenced by room effects. One of Toole's papers showed that objective measurements of loudspeaker performance match subjective evaluations in listening tests.
Some argue that because human hearing and perception are not fully understood, listener experience should be valued above everything else. This view is often encountered in home audio publications, where the usefulness of blind listening tests and of common objective performance measurements, e.g., THD, is questioned. For instance, crossover distortion at a given THD is much more audible than clipping distortion at the same THD, since the harmonics produced are at higher frequencies. This does not imply that the defect is somehow unquantifiable or unmeasurable, only that a single THD number is inadequate to specify it and must be interpreted with care. Taking THD measurements at different output levels would expose whether the distortion is clipping or crossover.
Whichever the view, some measurements have been historically favoured. For example, THD combines a number of harmonics with equal weighting, as the root-sum-square of their amplitudes relative to the fundamental, even though research indicates that lower-order harmonics are harder to hear than higher-order ones at the same level. In addition, even-order harmonics are said to be generally harder to hear than odd-order ones. A number of formulas that attempt to correlate THD with actual audibility have been published; however, none has gained mainstream use.
The mass-market consumer magazine Stereophile promotes the claim that home audio enthusiasts prefer sighted tests to blind tests.
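The level-dependence argument can be illustrated numerically. The sketch below computes THD as the root-sum-square of harmonics 2 to 20 relative to the fundamental, using a direct DFT over one period, for two idealised nonlinearities: hard clipping at ±1, and a dead zone of ±0.05 around zero standing in for crossover distortion. Both models and their thresholds are assumptions for illustration, not measurements of a real amplifier, and the function names are hypothetical.

```python
import math

def thd_percent(wave, n_harmonics=20):
    """THD of one period of a periodic waveform: root-sum-square of harmonics
    2..n_harmonics divided by the fundamental, via single-bin DFTs."""
    n = len(wave)
    def amplitude(k):
        re = sum(w * math.cos(2 * math.pi * k * i / n) for i, w in enumerate(wave))
        im = sum(w * math.sin(2 * math.pi * k * i / n) for i, w in enumerate(wave))
        return math.hypot(re, im)
    fund = amplitude(1)
    harm = math.sqrt(sum(amplitude(k) ** 2 for k in range(2, n_harmonics + 1)))
    return 100.0 * harm / fund

def clipper(x, limit=1.0):
    """Hard clipping at +/- limit (an overloaded amplifier)."""
    return max(-limit, min(limit, x))

def crossover(x, dead_zone=0.05):
    """Dead zone around zero, a crude model of class-B crossover distortion."""
    if abs(x) <= dead_zone:
        return 0.0
    return x - math.copysign(dead_zone, x)

def thd_at_level(nonlinearity, amplitude, n=1024):
    """Drive the nonlinearity with one period of a sine and return its THD."""
    wave = [nonlinearity(amplitude * math.sin(2 * math.pi * i / n)) for i in range(n)]
    return thd_percent(wave)

if __name__ == "__main__":
    for level in (0.2, 0.5, 1.0, 1.5):
        print(f"level {level:3.1f}: clipping {thd_at_level(clipper, level):6.2f} %, "
              f"crossover {thd_at_level(crossover, level):6.2f} %")
```

Clipping THD is essentially zero until the signal exceeds the clipping level and then rises with level, while the crossover model shows the opposite trend, with THD falling as the level increases.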