Sound localization
Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance.
The sound localization mechanisms of the mammalian auditory system have been extensively studied. The auditory system uses several cues for sound source localization, including the time difference and level difference between the ears and spectral information. Other animals, such as birds and reptiles, use these cues as well, though sometimes differently, and some have localization cues that are absent in the human auditory system, such as the effects of ear movements. Animals with the ability to localize sound have an evolutionary advantage.
How sound reaches the brain
Sound is the perceptual result of mechanical vibrations traveling through a medium such as air or water. Through the mechanisms of compression and rarefaction, sound waves travel through the air, bounce off the pinna and concha of the outer ear, and enter the ear canal. In mammals, the sound waves vibrate the tympanic membrane, causing the three bones of the middle ear to vibrate, which then sends the energy through the oval window and into the cochlea, where it is transduced into an electrochemical signal by hair cells in the organ of Corti, which synapse onto spiral ganglion fibers that travel through the cochlear nerve into the brain.
Neural interactions
In vertebrates, interaural time differences are known to be calculated in the superior olivary nucleus of the brainstem. According to Jeffress, this calculation relies on delay lines: neurons in the superior olive that receive innervation from each ear through axons of different lengths. Some cells are more directly connected to one ear than to the other, so each is tuned to a particular interaural time difference. This theory is equivalent to the mathematical procedure of cross-correlation. However, because Jeffress's theory cannot account for the precedence effect, in which only the first of multiple identical sounds is used to determine the sounds' location, it cannot fully explain the response. Furthermore, a number of recent physiological observations made in the midbrain and brainstem of small mammals have cast considerable doubt on the validity of Jeffress's original ideas.
Neurons sensitive to interaural level differences (ILDs) are excited by stimulation of one ear and inhibited by stimulation of the other ear, such that the response magnitude of a cell depends on the relative strengths of the two inputs, which in turn depend on the sound intensities at the ears.
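The equivalence between the Jeffress delay-line model and cross-correlation can be illustrated with a short sketch: the interaural time difference is estimated as the lag that maximizes the cross-correlation of the two ear signals. The function and signal names here are illustrative, not taken from any specific neural model.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (in seconds) as the lag
    maximizing the cross-correlation of the two ear signals.
    Positive result: the right-ear signal lags the left-ear signal."""
    corr = np.correlate(left, right, mode="full")
    # Index (len(right) - 1) of the full correlation corresponds to zero lag.
    lag = np.argmax(corr) - (len(right) - 1)
    return -lag / fs

# Example: a noise burst reaching the right ear 0.5 ms after the left.
fs = 48000
delay = int(0.0005 * fs)                      # 24 samples
rng = np.random.default_rng(0)
burst = rng.standard_normal(2400)
left = np.concatenate([burst, np.zeros(delay)])
right = np.concatenate([np.zeros(delay), burst])
itd = estimate_itd(left, right, fs)           # ≈ 0.0005 s
```

Real neural circuits do not compute a full correlation explicitly; the sketch only shows why tuned delay lines and cross-correlation pick out the same lag.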
In the auditory midbrain nucleus, the inferior colliculus, many ILD-sensitive neurons have response functions that decline steeply from maximum to zero spikes as a function of ILD. However, there are also many neurons with much shallower response functions that do not decline to zero spikes.
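A steeply declining ILD response function of the kind described above is often summarized with a sigmoid. The following toy model (all parameter values are invented for illustration, not fitted to physiological data) shows a firing rate that falls from its maximum toward zero as the ILD grows:

```python
import numpy as np

def ild_response(ild_db, midpoint_db=0.0, slope_per_db=1.5, max_rate=120.0):
    """Toy sigmoid firing-rate model (spikes/s) of an ILD-sensitive
    neuron: maximal when the excitatory ear dominates (large negative
    ILD here), declining toward zero as the inhibitory ear gets louder.
    All parameter values are illustrative."""
    return max_rate / (1.0 + np.exp(slope_per_db * (ild_db - midpoint_db)))

# Near-maximal rate at -20 dB, half-maximum at 0 dB, near zero at +20 dB.
rates = ild_response(np.array([-20.0, 0.0, 20.0]))
```

A shallower response function of the second type mentioned above would correspond to a smaller slope and a nonzero floor.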
Human auditory system
Sound localization is the process of determining the location of a sound source. The brain utilizes subtle differences in intensity, spectral, and timing cues to localize sound sources.
Localization can be described in terms of three-dimensional position: the azimuth or horizontal angle, the elevation or vertical angle, and the distance (for static sources) or velocity (for moving sources).
The azimuth of a sound is signaled by the difference in arrival times between the ears, by the relative amplitude of high-frequency sounds, and by the asymmetrical spectral reflections from various parts of our bodies, including torso, shoulders, and pinnae.
The distance cues are the loss of amplitude, the loss of high frequencies, and the ratio of the direct signal to the reverberated signal.
Depending on where the source is located, our head acts as a barrier that changes the timbre, intensity, and spectral qualities of the sound, helping the brain determine where the sound emanated from. These minute differences between the two ears are known as interaural cues.
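As a simple illustration of one interaural cue, the interaural level difference between two recorded ear signals can be computed from their RMS amplitudes. This is a sketch; the sign convention (positive meaning louder at the right ear) is our own choice.

```python
import numpy as np

def interaural_level_difference(left, right):
    """ILD in decibels from two ear signals, using RMS amplitudes.
    Positive values mean the sound is louder at the right ear."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms(right) / rms(left))

# A source on the right: the right-ear signal has twice the amplitude,
# giving an ILD of about 6 dB.
t = np.linspace(0.0, 0.01, 480, endpoint=False)
left = 0.5 * np.sin(2 * np.pi * 2000 * t)
right = 1.0 * np.sin(2 * np.pi * 2000 * t)
ild = interaural_level_difference(left, right)   # ≈ 6.02 dB
```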
Lower frequencies, with longer wavelengths, diffract around the head, forcing the brain to rely solely on the phase cues from the source.
Helmut Haas discovered that we localize a sound source using the earliest-arriving wave front, even when reflections are up to 10 decibels louder than the original wave front. This principle is known as the Haas effect, a specific version of the precedence effect. Haas measured that even a 1 millisecond difference in timing between the original sound and the reflected sound increased the sense of spaciousness while still allowing the brain to discern the true location of the original sound. The nervous system combines all early reflections into a single perceptual whole, allowing the brain to process multiple different sounds at once. The nervous system will combine reflections that are within about 35 milliseconds of each other and that have a similar intensity.
Duplex theory
To determine the lateral input direction, the auditory system analyzes the following ear signal information:
In 1907, Lord Rayleigh used tuning forks to generate monophonic excitation and studied lateral sound localization on a human head model without auricles. He first presented the interaural-cue-based sound localization theory, which is known as duplex theory. Human ears are on opposite sides of the head and thus have different coordinates in space. As shown in the duplex theory figure, because the distances between the acoustic source and the two ears differ, there are a time difference and an intensity difference between the sound signals arriving at the two ears. These differences are called the interaural time difference (ITD) and the interaural intensity difference (IID), respectively.
From the duplex theory figure we can see that for source B1 or source B2, there will be a propagation delay between the two ears, which generates the ITD. At the same time, the head and ears shadow high-frequency signals, which generates the IID.
- Interaural time difference (ITD) – Sound from the right side reaches the right ear earlier than the left ear. The auditory system evaluates interaural time differences from phase delays at low frequencies and group delays at high frequencies.
- Theory and experiments show that ITD relates to the signal frequency. Suppose the angular position of the acoustic source is θ, the head radius is r and the speed of sound is c; then the ITD is given by: ITD = (r/c) × (θ + sin θ). This closed form assumes that 0 degrees is straight ahead of the head and that counter-clockwise is positive.
- Interaural intensity difference (IID) or interaural level difference (ILD) – Sound from the right side has a higher level at the right ear than at the left ear, because the head shadows the left ear. These level differences are highly frequency dependent and increase with increasing frequency. Extensive theoretical research demonstrates that IID relates to the signal frequency f and the angular position of the acoustic source θ. The function of IID is given by: IID = 1.0 + (f/1000)^0.8 × sin θ.
- For frequencies below 1000 Hz, mainly ITDs are evaluated; for frequencies above 1500 Hz, mainly IIDs are evaluated. Between 1000 Hz and 1500 Hz there is a transition zone, where both mechanisms play a role.
- Localization accuracy is 1 degree for sources in front of the listener and 15 degrees for sources to the sides. Humans can discern interaural time differences of 10 microseconds or less.
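The closed-form ITD expression above (Woodworth's spherical-head approximation) can be evaluated numerically. The head radius and speed of sound below are typical illustrative values, not measurements:

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD (seconds) from Woodworth's spherical-head approximation,
    ITD = (r / c) * (theta + sin(theta)), with theta the source azimuth
    measured from straight ahead. Default radius (8.75 cm) and speed of
    sound (343 m/s) are typical illustrative values."""
    theta = np.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + np.sin(theta))

# ITD grows from zero straight ahead to roughly 0.65 ms at 90 degrees,
# consistent with the microsecond-scale differences humans can discern.
itds = [woodworth_itd(a) for a in (0.0, 30.0, 60.0, 90.0)]
```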
For frequencies above 1600 Hz the dimensions of the head are greater than the length of the sound waves. An unambiguous determination of the input direction based on interaural phase alone is not possible at these frequencies. However, the interaural level differences become larger, and these level differences are evaluated by the auditory system. Also, delays between the ears can still be detected via some combination of phase differences and group delays, which are more pronounced at higher frequencies; that is, if there is a sound onset, the delay of this onset between the ears can be used to determine the input direction of the corresponding sound source. This mechanism becomes especially important in reverberant environments. After a sound onset there is a short time frame where the direct sound reaches the ears, but not yet the reflected sound. The auditory system uses this short time frame for evaluating the sound source direction, and keeps this detected direction as long as reflections and reverberation prevent an unambiguous direction estimation. The mechanisms described above cannot be used to differentiate between a sound source ahead of the hearer or behind the hearer; therefore additional cues have to be evaluated.
Pinna filtering effect
Duplex theory shows that ITD and IID play significant roles in sound localization, but they can deal only with lateral localization problems. For example, if two acoustic sources are placed symmetrically front and back on the right side of the human head, they generate equal ITDs and IIDs, a situation known as the cone of confusion. However, human ears can still distinguish between these sources. Moreover, in natural hearing, one ear alone, without any ITD or IID, can distinguish between them with high accuracy. Because of these shortcomings of duplex theory, researchers proposed the pinna filtering effect theory. The shape of the human pinna is concave, with complex folds, and is asymmetrical both horizontally and vertically. Reflected and direct waves generate a frequency spectrum on the eardrum that depends on the location of the acoustic source, and the auditory system localizes the source using this frequency spectrum.
These spectral cues generated by the pinna filtering effect can be represented as a head-related transfer function (HRTF). The corresponding time-domain expressions are called head-related impulse responses (HRIRs). The HRTF is also described as the transfer function from the free field to a specific point in the ear canal. HRTFs are usually recognized as LTI systems:
H_L = P_L / P_0, H_R = P_R / P_0,
where L and R denote the left and right ear respectively, P_L and P_R are the amplitudes of the sound pressure at the entrances to the left and right ear canals, and P_0 is the amplitude of the sound pressure at the center of the head coordinate system when the listener is absent. In general, the HRTFs H_L and H_R are functions of the source's angular position, its elevation angle, the distance between the source and the center of the head, the angular frequency of the sound and the equivalent dimension of the head.
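Treating the HRTF pair as an LTI system means a mono signal can be rendered binaurally by convolving it with the corresponding head-related impulse responses. The short HRIRs below are made-up placeholders standing in for measured data:

```python
import numpy as np

# Hypothetical, made-up 4-tap HRIRs. Real measured HRIRs are much
# longer (hundreds of taps) and depend on the source direction.
hrir_left = np.array([0.0, 0.6, 0.3, 0.1])
hrir_right = np.array([0.4, 0.5, 0.1, 0.0])

def render_binaural(mono, h_left, h_right):
    """Convolve a mono signal with an HRIR pair, giving the left and
    right ear signals (the LTI-system view of the HRTF)."""
    return np.convolve(mono, h_left), np.convolve(mono, h_right)

mono = np.array([1.0, 0.0, 0.0, -1.0, 0.0])
left_out, right_out = render_binaural(mono, hrir_left, hrir_right)
```

In practice the HRIR pair is selected (or interpolated) from a measured database for the desired source direction, such as those listed below.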
At present, the main institutions that maintain HRTF databases include the CIPIC International Laboratory, the MIT Media Lab, the Graduate School in Psychoacoustics at the University of Oldenburg, the Neurophysiology Lab at the University of Wisconsin–Madison and NASA Ames Research Center. Databases of HRIRs from humans with normal and impaired hearing and from animals are publicly available.