Binaural unmasking
Binaural unmasking is a phenomenon of auditory perception in which the brain combines information from the two ears to improve signal detection and identification in noise. The phenomenon, discovered by Ira Hirsh, is most commonly observed when the interaural phase of the signal differs from the interaural phase of the noise. When such a difference is present, the masking threshold improves compared with a reference situation in which the interaural phases are the same or in which the stimulus is presented monaurally. Those two reference cases usually give very similar thresholds. The size of the improvement is known as the "binaural masking level difference" (BMLD), or simply as the "masking level difference".
Binaural unmasking is most effective at low frequencies. The BMLD for pure tones in broadband noise reaches a maximum value of about 15 decibels at 250 Hz and progressively declines to 2-3 dB at 1500 Hz. The BMLD then stabilises at 2-3 dB for all higher frequencies, up to at least 4 kHz. Binaural unmasking can also be observed for narrowband masking noises, but the effect behaves differently: larger BMLDs can be observed and there is little evidence of a decline with increasing frequency.
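The masking level difference is a difference between two thresholds expressed in decibels. The following minimal sketch shows the arithmetic only; the function name `bmld_db` and the threshold values are hypothetical and are not measured data.

```python
def bmld_db(reference_threshold_db, dichotic_threshold_db):
    """Return the masking level difference in dB.

    A positive value means the signal is detectable at a lower level in the
    dichotic condition (for example N0Spi) than in the reference condition
    (for example N0S0).
    """
    return reference_threshold_db - dichotic_threshold_db


# Hypothetical thresholds, chosen only to illustrate the calculation.
n0s0_threshold_db = 60.0
n0spi_threshold_db = 48.0
print(bmld_db(n0s0_threshold_db, n0spi_threshold_db))  # 12.0
```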
Improved identification of speech in noise was first reported by J.C.R. Licklider. Licklider noted that the difference in interaural phase exploited in unmasking is similar to an interaural time difference, which varies with the direction of a sound source and is involved in sound localisation. The fact that speech can be unmasked, and that the underlying cues vary with sound direction, raised the possibility that binaural unmasking plays a role in the cocktail party effect.
Labelling system
A systematic labelling system for different stimulus configurations, first used by Jeffress, has been adopted by most authors in the area. The condition names are written NxSy, where x is the interaural configuration of the noise and y is the interaural configuration of the signal. Some common values for x and y include:
- 0 means that the signal or noise is identical at the two ears.
- π means that the signal or noise has an interaural phase difference of π radians.
- τ means that the signal or noise has an interaural time difference, where the exact value of the time difference, τ, is specified elsewhere.
- ρ means that the noise has an interaural correlation of less than one, the exact correlation being specified elsewhere.
- u means that the signal or noise is uncorrelated across the two ears.
- m means that the signal or noise is monaural.
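As an illustration of how these labels map onto actual stimuli, the sketch below builds left-ear and right-ear waveforms for a few of the configurations. It is a simplified example under stated assumptions: the function names, the Gaussian noise, the 500 Hz tone, the sample rate, and the choice of the left ear for the monaural case are all illustrative, and only the values 0, π (written "pi") and m are handled.

```python
import math
import random


def apply_config(waveform, config):
    """Return (left, right) versions of a waveform for one label value."""
    if config == "0":    # identical at the two ears
        return list(waveform), list(waveform)
    if config == "pi":   # interaural phase difference of pi: invert one ear
        return list(waveform), [-v for v in waveform]
    if config == "m":    # monaural: left ear only (an arbitrary choice here)
        return list(waveform), [0.0] * len(waveform)
    raise ValueError(f"unsupported configuration: {config!r}")


def make_stimulus(noise_config, signal_config, n_samples=800,
                  sample_rate=16000, tone_hz=500.0, tone_amp=0.5, seed=1):
    """Build (left, right) sample lists for the condition N<x>S<y>."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    tone = [tone_amp * math.sin(2 * math.pi * tone_hz * i / sample_rate)
            for i in range(n_samples)]
    noise_l, noise_r = apply_config(noise, noise_config)
    tone_l, tone_r = apply_config(tone, signal_config)
    left = [n + s for n, s in zip(noise_l, tone_l)]
    right = [n + s for n, s in zip(noise_r, tone_r)]
    return left, right


left, right = make_stimulus("0", "pi")  # the N0Spi condition
```

Inverting the waveform at one ear shifts every frequency component by π radians, which is why a simple sign change is enough for the π configuration.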
Theories
Binaural unmasking has two main explanatory frameworks, based on interaural cross-correlation and on interaural subtraction.
The cross-correlation account relies on the existence of a coincidence detection network in the midbrain similar to that proposed by Lloyd A. Jeffress to account for sensitivity to interaural time differences in sound localisation. Each coincidence detector receives a stream of action potentials from the two ears via a network of axons that introduce differential transmission delays. Detection of a signal is thought to occur when the response rate of the most active coincidence detector is reduced by the presence of the signal. Cross-correlation of the signals at the two ears is often used as a mathematical surrogate for modelling such an array of coincidence-detecting neurons; the reduced response rate is translated into a reduction in the cross-correlation maximum.
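The reduction in the cross-correlation maximum can be illustrated numerically. The sketch below is not a published model; it assumes Gaussian noise identical at the two ears (N0), a 500 Hz tone inverted at one ear (Sπ), a 16 kHz sample rate, and a lag range of ±16 samples (±1 ms), and the function name is hypothetical. It compares the peak of the normalised cross-correlation for the noise alone with the peak for noise plus signal.

```python
import math
import random


def max_interaural_correlation(left, right, max_lag):
    """Peak of the normalised cross-correlation over lags -max_lag..max_lag."""
    norm = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    n = len(left)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Pair left[i] with right[i + lag] wherever both indices are valid.
        start, stop = max(0, -lag), min(n, n - lag)
        total = sum(left[i] * right[i + lag] for i in range(start, stop))
        best = max(best, total / norm)
    return best


rng = random.Random(1)
sample_rate, n_samples = 16000, 800
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
tone = [0.5 * math.sin(2 * math.pi * 500.0 * i / sample_rate)
        for i in range(n_samples)]

# N0 noise alone: identical at the two ears, so the peak is essentially 1.
noise_only = max_interaural_correlation(noise, noise, max_lag=16)

# N0Spi: same noise at both ears, tone inverted at the right ear.
left = [n + s for n, s in zip(noise, tone)]
right = [n - s for n, s in zip(noise, tone)]
with_signal = max_interaural_correlation(left, right, max_lag=16)

print(round(noise_only, 3), round(with_signal, 3))
```

Adding the out-of-phase tone makes the two ear signals less alike, so the second value is lower than the first. In this framework, that drop is the quantity taken to indicate the presence of the signal.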
The subtractive account is known as "equalization-cancellation" or "EC" theory. In this account, the waveforms at the two ears are temporally aligned by the brain before one is subtracted from the other. In effect, the coincidence detectors are replaced with neurons that are excited by action potentials from one ear but inhibited by action potentials from the other. However, EC theory is not generally framed in such explicit neurological terms, and no suitable neural substrate has been identified in the brain. Nonetheless, EC theory has proved a very popular modelling framework, and has fared well in direct comparison with cross-correlation models in psychoacoustic experiments.
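The align-then-subtract idea can be sketched in a few lines. This is an idealised illustration only: the function name and the sample-based delay parameter are assumptions, and the sketch omits the internal timing and amplitude errors that full EC models include to limit the cancellation. For an N0Sπ stimulus no delay is needed, and subtraction removes the noise while doubling the signal.

```python
import math
import random


def equalise_and_cancel(left, right, delay_samples=0):
    """Delay the right-ear waveform by delay_samples, then subtract it from the left.

    Samples shifted in from outside the waveform are treated as zero.
    """
    n = len(left)
    aligned = [right[i - delay_samples] if 0 <= i - delay_samples < n else 0.0
               for i in range(n)]
    return [l - r for l, r in zip(left, aligned)]


rng = random.Random(1)
sample_rate, n_samples = 16000, 800
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
tone = [0.5 * math.sin(2 * math.pi * 500.0 * i / sample_rate)
        for i in range(n_samples)]

# N0Spi: identical noise at both ears, tone inverted at the right ear.
left = [n + s for n, s in zip(noise, tone)]
right = [n - s for n, s in zip(noise, tone)]

residual = equalise_and_cancel(left, right)

# The residual is (noise + tone) - (noise - tone) = 2 * tone, up to rounding.
largest_error = max(abs(r - 2.0 * t) for r, t in zip(residual, tone))
print(largest_error < 1e-9)  # True
```

In the N0S0 condition the same subtraction would remove the signal along with the noise, which is consistent with that condition serving as the reference.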
Perceptual cues
The ear filters incoming sound into different frequencies: a given place in the cochlea, and a given auditory nerve fibre, respond only to a limited range of frequencies. Consequently, researchers have examined the cues that are generated by mixtures of speech and noise at the two ears within a narrow frequency band around the signal. When a signal and narrowband noise are added, a vector summation occurs in which the resultant amplitude and phase differ from those of the noise or signal alone. For a binaural unmasking stimulus, the differences between the interaural parameters of the signal and noise mean that there will be a different vector summation at each ear. Consequently, regardless of the stimulus construction, there tend to be fluctuations in both the level and phase differences of the stimuli at the listener's ears.
Experiments have examined which of these cues the auditory system can best detect. These have shown that, at low frequencies, the auditory system is most sensitive to interaural time differences. At higher frequencies, however, there seems to be a transition to using interaural level differences.
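The vector summation can be made concrete by treating the narrowband noise and the tone as complex phasors at the signal frequency. The sketch below is an illustration under assumptions: the function name, the 500 Hz frequency, the phasor magnitudes, and the sign conventions (positive values mean the left ear is more intense or leads in phase) are all chosen here for the example. It computes the interaural level difference (ILD), interaural phase difference (IPD) and the equivalent interaural time difference (ITD) for N0 noise plus an Sπ tone.

```python
import cmath
import math


def interaural_cues(noise_phasor, signal_phasor, signal_inverted=True,
                    freq_hz=500.0):
    """Return (ILD in dB, IPD in radians, ITD in seconds).

    The noise phasor is identical at the two ears (N0). The signal phasor is
    inverted at the right ear when signal_inverted is True (Spi). The
    right-ear resultant must be non-zero.
    """
    left = noise_phasor + signal_phasor
    right = noise_phasor + (-signal_phasor if signal_inverted else signal_phasor)
    ild_db = 20.0 * math.log10(abs(left) / abs(right))
    ipd_rad = cmath.phase(left / right)
    itd_s = ipd_rad / (2.0 * math.pi * freq_hz)
    return ild_db, ipd_rad, itd_s


# The instantaneous phase of narrowband noise drifts relative to the tone,
# so step the tone phase relative to a fixed noise phasor.
noise_phasor = 1.0 + 0.0j
for phase_deg in (0, 45, 90, 135):
    signal_phasor = cmath.rect(0.5, math.radians(phase_deg))
    ild_db, ipd_rad, itd_s = interaural_cues(noise_phasor, signal_phasor)
    print(phase_deg, round(ild_db, 2), round(ipd_rad, 3), round(itd_s * 1e6, 1))
```

When the tone is in phase with the noise, the cue is purely a level difference; when it is in quadrature, the cue is purely a phase difference. Intermediate phases give a mixture, which is why both cues fluctuate over time as the noise envelope and phase vary.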