Binaural unmasking

Binaural unmasking is a phenomenon of auditory perception in which the brain combines information from the two ears to improve signal detection and identification in noise. The phenomenon, discovered by Ira Hirsh, is most commonly observed when there is a difference between the interaural phase of the signal and the interaural phase of the noise. When such a difference is present, there is an improvement in masking threshold compared to a reference situation in which the interaural phases are the same, or in which the stimulus is presented monaurally. Those two reference cases usually give very similar thresholds. The size of the improvement is known as the "binaural masking level difference" (BMLD), or simply as the "masking level difference".
Binaural unmasking is most effective at low frequencies. The BMLD for pure tones in broadband noise reaches a maximum value of about 15 decibels at 250 Hz and progressively declines to 2-3 dB at 1500 Hz. The BMLD then stabilises at 2-3 dB for all higher frequencies, up to at least 4 kHz. Binaural unmasking can also be observed for narrowband masking noises, but the effect behaves differently: larger BMLDs can be observed and there is little evidence of a decline with increasing frequency.
Improved identification of speech in noise was first reported by J.C.R. Licklider. Licklider noted that the interaural phase difference exploited in unmasking is similar to an interaural time difference, which varies with the direction of a sound source and is involved in sound localisation. The fact that speech can be unmasked, and that the underlying cues vary with sound direction, raised the possibility that binaural unmasking plays a role in the cocktail party effect.
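The similarity between interaural phase and interaural time differences can be made concrete for a pure tone, where a phase difference corresponds to a fixed fraction of the period. The following is a minimal sketch of that conversion; the function name and the 500 Hz example frequency are illustrative choices, not taken from the literature.

```python
import math

def phase_to_time_difference(phase_rad, frequency_hz):
    """Time delay in seconds equivalent to a phase difference for a pure tone."""
    return phase_rad / (2.0 * math.pi * frequency_hz)

# A phase difference of pi radians is half a period of the tone.
# At 500 Hz the period is 2 ms, so the equivalent delay is about 1 ms.
itd = phase_to_time_difference(math.pi, 500.0)
print(f"{itd * 1e6:.0f} microseconds")
```

The equivalence holds only at a single frequency: for a broadband sound, a fixed time difference corresponds to a phase difference that grows with frequency.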

Labelling system

A systematic labelling system for different stimulus configurations, first used by Jeffress, has been adopted by most authors in the area. The condition names are written NxSy, where x is the interaural configuration of the noise and y is the interaural configuration of the signal. Some common values for x and y include:

0 means that the signal or noise is identical at the two ears
π means that the signal or noise has an interaural phase difference of π radians
τ means that the signal or noise has an interaural time difference, where the exact value of the time difference, τ, is specified elsewhere.
ρ means that the noise has an interaural correlation of less than one, the exact correlation being specified elsewhere.
u means that the signal or noise is uncorrelated across the two ears.
m means that the signal or noise is monaural.
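As an illustration of the labelling system, the sketch below builds left-ear and right-ear waveforms for a few NxSy conditions. It rests on several assumptions that do not come from the literature: the string codes "0", "pi", "u" and "m", the function names, the use of white Gaussian noise with a 500 Hz tone, and the choice to deliver monaural stimuli to the left ear. The τ and ρ configurations are left out for brevity.

```python
import math
import random

def apply_config(samples, config, independent=None):
    """Return (left, right) waveforms for one interaural configuration."""
    if config == "0":    # identical at the two ears
        return list(samples), list(samples)
    if config == "pi":   # polarity inversion is a pi-radian shift at every frequency
        return list(samples), [-s for s in samples]
    if config == "m":    # monaural: left ear only (an arbitrary choice)
        return list(samples), [0.0] * len(samples)
    if config == "u" and independent is not None:  # uncorrelated across ears
        return list(samples), list(independent)
    raise ValueError(f"unsupported configuration: {config!r}")

def make_stimulus(noise_config, signal_config, n_samples=4000,
                  sample_rate=8000, signal_freq=500.0, signal_amp=0.1, seed=0):
    """Build (left, right) sample lists for the condition N<noise>S<signal>."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    other_noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    tone = [signal_amp * math.sin(2.0 * math.pi * signal_freq * i / sample_rate)
            for i in range(n_samples)]
    noise_l, noise_r = apply_config(noise, noise_config, independent=other_noise)
    tone_l, tone_r = apply_config(tone, signal_config)
    left = [n + s for n, s in zip(noise_l, tone_l)]
    right = [n + s for n, s in zip(noise_r, tone_r)]
    return left, right

# N0S0: both ears receive the same mixture.
left, right = make_stimulus("0", "0")
print(left == right)

# N0Spi: the noise is identical at the two ears, the tone is inverted at one ear.
left, right = make_stimulus("0", "pi")
print(left == right)
```

In the N0Sπ case, the difference between the two channels contains only the tone, which is the property that the theories of binaural unmasking exploit.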

Theories

Binaural unmasking has two main explanatory frameworks. These are based on interaural cross-correlation and interaural subtraction.
The cross-correlation account relies on the existence of a coincidence detection network in the midbrain similar to that proposed by Lloyd A. Jeffress to account for sensitivity to interaural time differences in sound localisation. Each coincidence detector receives a stream of action potentials from the two ears via a network of axons that introduce differential transmission delays. Detection of a signal is thought to occur when the response rate of the most active coincidence detector is reduced by the presence of a signal. Cross-correlation of the signals at the two ears is often used as a mathematical surrogate for modelling such an array of coincidence-detecting neurons; the reduced response rate is translated into a reduction in the cross-correlation maximum.
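The reduction in the cross-correlation maximum can be illustrated numerically. The sketch below is not a neural model and is not drawn from any published implementation: it simply computes the normalised cross-correlation of two waveforms over a small range of lags and reports its maximum. The white Gaussian noise, the 500 Hz tone, the 8 kHz sample rate and the lag range are all assumptions chosen for illustration.

```python
import math
import random

def max_cross_correlation(left, right, max_lag=8):
    """Maximum normalised cross-correlation over lags of -max_lag..max_lag samples."""
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        start, stop = max(0, -lag), min(n, n - lag)
        total = sum(left[i] * right[i + lag] for i in range(start, stop))
        best = max(best, total / norm)
    return best

rng = random.Random(1)
sample_rate = 8000  # Hz, assumed
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
tone = [0.5 * math.sin(2.0 * math.pi * 500.0 * i / sample_rate)
        for i in range(2000)]

# N0 noise alone: the two ears are identical, so the maximum is 1.
rho_noise_alone = max_cross_correlation(noise, noise)

# N0S0: adding the same tone to both ears leaves the ears identical.
mix = [n + s for n, s in zip(noise, tone)]
rho_n0s0 = max_cross_correlation(mix, mix)

# N0Spi: the tone is inverted at one ear, which decorrelates the two ears.
inverted_mix = [n - s for n, s in zip(noise, tone)]
rho_n0spi = max_cross_correlation(mix, inverted_mix)

print(f"noise alone: {rho_noise_alone:.3f}")
print(f"N0S0:        {rho_n0s0:.3f}")
print(f"N0Spi:       {rho_n0spi:.3f}")
```

Under these assumptions only the N0Sπ condition lowers the maximum, which is consistent with the account above: the signal is detected through the decorrelation it introduces, and a diotic signal in diotic noise provides no such cue.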
The subtractive account is known as "equalization-cancellation" or "EC" theory. In this account, the waveforms at the two ears are temporally aligned by the brain before being subtracted one from the other. In effect, the coincidence detectors are replaced with neurons that are excited by action potentials from one ear, but inhibited by action potentials from the other. However, EC theory is not generally framed in such explicit neurological terms, and no suitable neural substrate has been identified in the brain. Nonetheless, EC theory has proved a very popular modelling framework, and has fared well in direct comparison with cross-correlation models in psychoacoustic experiments.
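The align-then-subtract operation can be sketched in a few lines. This is an idealised illustration rather than a published EC model: it assumes perfect alignment and perfect subtraction, so the cancellation of the noise is complete, which a realistic model would not predict. The function names, the N0Sπ example and the stimulus parameters are assumptions. Because the operation is linear, the noise and signal components are processed separately so that their residual powers can be compared.

```python
import math
import random

def equalize_and_cancel(left, right, delay_samples=0):
    """Advance the right-ear waveform by delay_samples, then subtract it from the left."""
    n = len(left)
    start, stop = max(0, -delay_samples), min(n, n - delay_samples)
    return [left[i] - right[i + delay_samples] for i in range(start, stop)]

def power(samples):
    """Mean squared amplitude."""
    return sum(v * v for v in samples) / len(samples)

rng = random.Random(2)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
tone = [0.1 * math.sin(2.0 * math.pi * 500.0 * i / 8000.0) for i in range(2000)]

# N0Spi: the noise is identical at the two ears, the tone is inverted at the right ear.
residual_noise = equalize_and_cancel(noise, noise)
residual_tone = equalize_and_cancel(tone, [-s for s in tone])

print("noise power before:", power(noise), "after:", power(residual_noise))
print("tone power before: ", power(tone), "after:", power(residual_tone))
```

With no delay needed in this condition, the subtraction removes the noise entirely and doubles the amplitude of the tone. For a noise carrying an interaural time difference, the `delay_samples` argument would play the role of the equalization step.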

Perceptual cues

The ear filters incoming sound into different frequencies: a given place in the cochlea, and a given auditory nerve fibre, respond only to a limited range of frequencies. Consequently, researchers have examined the cues that are generated by mixtures of speech and noise at the two ears within a narrow frequency band around the signal. When a signal and narrowband noise are added, a vector summation occurs in which the resultant amplitude and phase differ from those of the noise or signal alone. For a binaural unmasking stimulus, the differences between the interaural parameters of the signal and noise mean that there will be a different vector summation at each ear. Consequently, regardless of the stimulus construction, there tend to be fluctuations in both the level and phase differences of the stimuli at the listener's ears.
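The vector summation can be illustrated by treating the narrowband noise and the signal as phasors (complex numbers encoding amplitude and phase) at one instant. The sketch below assumes an N0Sπ-like stimulus by default, defines the interaural cues as left relative to right, and uses example amplitudes chosen purely for illustration; the function name is hypothetical.

```python
import cmath
import math

def interaural_cues(noise_phasor, signal_phasor, signal_ipd=math.pi):
    """Return (level difference in dB, phase difference in radians), left re right.

    The noise phasor is the same at both ears; the signal phasor is rotated
    by signal_ipd at the right ear.
    """
    left = noise_phasor + signal_phasor
    right = noise_phasor + signal_phasor * cmath.exp(1j * signal_ipd)
    ild_db = 20.0 * math.log10(abs(left) / abs(right))
    ipd_rad = cmath.phase(left * right.conjugate())
    return ild_db, ipd_rad

noise = cmath.rect(1.0, 0.0)  # unit-amplitude noise, phase 0

# Signal in phase with the noise: it adds at one ear and subtracts at the
# other, giving a pure level difference.
print(interaural_cues(noise, cmath.rect(0.5, 0.0)))

# Signal 90 degrees out of phase with the noise: the resultants have equal
# amplitude but opposite phase shifts, giving a pure phase difference.
print(interaural_cues(noise, cmath.rect(0.5, math.pi / 2.0)))
```

Because the amplitude and phase of a narrowband noise drift over time, the phase relationship between signal and noise also drifts, so the stimulus moves between these two extremes. This is one way to see why both level and phase differences fluctuate at the listener's ears.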
Experiments have examined which of these cues the auditory system can best detect. These have shown that, at low frequencies, the auditory system is most sensitive to the interaural time differences. At higher frequencies, however, there seems to be a transition to using interaural level differences.

Practical implications

In everyday life, speech is more easily understood in noise when speech and noise come from different directions, a phenomenon known as "spatial release from masking". In this situation, the speech and noise have distinct interaural time differences and interaural level differences. The time differences are produced by the differences in the length of the sound path to the two ears and the level differences are caused by the acoustic shadowing effect of the head. These two cues play a major role in sound localisation, and have both been shown to have independent effects in spatial release from masking. The interaural level differences can give rise to one ear or the other having a better signal-to-noise ratio, which would allow the listener to gain an intelligibility improvement by simply listening to that ear. However, the interaural time differences can only be exploited by comparing the waveforms at the two ears. Successful models of spatial release from masking tend to use equalization-cancellation theory to generate the effects of interaural time differences.
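To give a sense of scale for the time differences produced by path-length differences, the sketch below uses a spherical-head approximation often attributed to Woodworth, in which the extra path to the far ear is the head radius multiplied by the sum of the azimuth and its sine. This formula is not part of the account above and is introduced here only as an assumption; the head radius of 8.75 cm and the speed of sound of 343 m/s are likewise assumed typical values, and the approximation applies to a distant source at azimuths between 0 and 90 degrees.

```python
import math

def spherical_head_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference in seconds for a distant source.

    azimuth_deg is measured from straight ahead and is assumed to lie
    between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    print(azimuth, f"{spherical_head_itd(azimuth) * 1e6:.0f} microseconds")
```

Under these assumptions the time difference grows from zero for a source straight ahead to roughly two thirds of a millisecond for a source directly to one side.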