Categorical perception

Categorical perception is the perception of distinct categories when a variable changes gradually along a continuum. It was originally observed for auditory stimuli but has since been found to apply to other perceptual modalities.

Motor theory of speech perception

If one analyzes the sound spectrograms of /ba/ and /pa/, for example, /b/ and /p/ can be visualized as lying somewhere on an acoustic continuum based on their voice onset time (VOT). It is possible to construct a continuum of intermediate tokens lying between the /pa/ and /ba/ endpoints by gradually decreasing the voice onset time.
Alvin Liberman and colleagues reported that when people listen to sounds that vary along the voicing continuum, they perceive only /ba/s and /pa/s, nothing in between. This effect—in which a perceived quality jumps abruptly from one category to another at a certain point along a continuum, instead of changing gradually—he dubbed "categorical perception". He suggested that CP was unique to speech, that CP made speech special, and, in what came to be called "the motor theory of speech perception," he suggested that CP's explanation lay in the anatomy of speech production.
According to the motor theory of speech perception, the reason people perceive an abrupt change between /ba/ and /pa/ is that the way we hear speech sounds is influenced by how we produce them when we speak. What varies along this continuum is voice onset time: the "b" in /ba/ has a shorter VOT than the "p" in /pa/. Apparently, unlike the synthetic "morphing" apparatus, people's natural vocal apparatus is not capable of producing anything in between /ba/ and /pa/. So when listeners hear a sound from the VOT continuum, their brains perceive it by trying to match it with what they would have had to do to produce it. Since the only thing they can produce is /ba/ or /pa/, they perceive any of the synthetic stimuli along the continuum as either /ba/ or /pa/, whichever it is closer to. A similar CP effect is found with /ba/ and /da/; these too lie along a continuum acoustically, but vocally, /ba/ is formed with the two lips, /da/ with the tip of the tongue and the alveolar ridge, and our anatomy does not allow any intermediates.
The motor theory of speech perception explained how speech was special and why speech-sounds are perceived categorically: sensory perception is mediated by motor production.

Acquired distinctiveness

If motor production mediates sensory perception, then one assumes that this CP effect is a result of learning to produce speech. Eimas et al., however, found that infants already have speech CP before they begin to speak. Perhaps, then, it is an innate effect, evolved to "prepare" us to learn to speak. But Kuhl found that chinchillas also have "speech CP" even though they never learn to speak, and presumably did not evolve to do so. Lane went on to show that CP effects can be induced by learning alone, with a purely sensory continuum in which there is no motor production discontinuity to mediate the perceptual discontinuity. He concluded that speech CP is not special after all, but merely a special case of Lawrence's classic demonstration that stimuli to which you learn to make a different response become more distinctive and stimuli to which you learn to make the same response become more similar.
It also became clear that CP was not quite the all-or-none effect Liberman had originally thought it was. It is not that all /pa/s are indistinguishable from one another and all /ba/s are indistinguishable from one another: we can hear the differences, just as we can see the differences between different shades of red. It is just that the within-category differences sound or look much smaller than the between-category differences, even when the underlying physical differences are actually the same size.

Identification and discrimination tasks

The study of categorical perception often uses experiments involving discrimination and identification tasks in order to categorize participants' perceptions of sounds. Voice onset time is measured along a continuum rather than a binary. English bilabial stops /b/ and /p/ are voiced and voiceless counterparts of the same place and manner of articulation, yet native speakers distinguish the sounds primarily by where they fall on the VOT continuum. Participants in these experiments establish clear phoneme boundaries on the continuum; two sounds with different VOT will be perceived as the same phoneme if on the same side of the boundary. Participants take longer to discriminate between two sounds falling in the same category of VOT than between two on opposite sides of the phoneme boundary, even if the difference in VOT is greater between the two in the same category.

Identification

In a categorical perception identification task, participants often must identify stimuli, such as speech sounds. An experimenter testing the perception of the VOT boundary between /p/ and /b/ may play several sounds falling on various parts of the VOT continuum and ask volunteers whether they hear each sound as /p/ or /b/. In such experiments, sounds on one side of the boundary are heard almost universally as /p/ and on the other as /b/. Stimuli on or near the boundary take longer to identify and are reported differently by different volunteers, but are perceived as either /b/ or /p/, rather than as a sound somewhere in the middle.
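The identification pattern described above can be illustrated with a toy model. The following Python sketch assumes that the probability of labelling a token /p/ follows a logistic function of VOT; the 25 ms boundary, the slope, and the name `p_voiceless` are illustrative assumptions, not values taken from any study mentioned here.

```python
import math

def p_voiceless(vot_ms, boundary_ms=25.0, slope=0.5):
    """Assumed probability of labelling a token /p/, logistic in VOT.

    boundary_ms and slope are hypothetical parameters chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

# A hypothetical seven-step VOT continuum, in milliseconds.
continuum = [0, 10, 20, 25, 30, 40, 50]
curve = [(vot, p_voiceless(vot)) for vot in continuum]

for vot, p in curve:
    print(f"VOT {vot:>2} ms  P(/p/) = {p:.3f}")
```

In this sketch, tokens well to either side of the assumed boundary are labelled almost unanimously, while a token at the boundary is labelled /p/ half the time, which mirrors the disagreement among volunteers for near-boundary stimuli.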

Discrimination

A simple AB discrimination task presents participants with two stimuli, and participants must decide whether they are identical. Predictions for a discrimination task in an experiment are often based on the preceding identification task. An ideal discrimination experiment validating categorical perception of stop consonants would result in volunteers more often correctly discriminating stimuli that fall on opposite sides of the boundary, while discriminating at chance level between stimuli on the same side of the boundary.
In an ABX discrimination task, volunteers are presented with three stimuli. A and B must be distinct stimuli and volunteers decide which of the two the third stimulus X matches. This discrimination task is much more common than a simple AB task.
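One way to derive discrimination predictions from identification data is to assume that listeners compare only covert category labels and guess whenever A and B receive the same label. Under that assumption, with two categories and independent labelling of each stimulus, predicted ABX accuracy works out to 0.5 + 0.5 (pA - pB)^2, where pA and pB are the probabilities of assigning A and B to the same given category. The Python sketch below applies this to a hypothetical logistic identification function; the boundary, slope, and function names are illustrative assumptions.

```python
import math

def p_label_p(vot_ms, boundary_ms=25.0, slope=0.5):
    """Assumed probability of labelling a token /p/ (hypothetical parameters)."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def predicted_abx(p_a, p_b):
    """Predicted ABX accuracy if listeners rely only on category labels.

    p_a, p_b: probabilities of labelling A and B as the same given category.
    Equal probabilities give chance (0.5); fully distinct labels give 1.0.
    """
    return 0.5 + 0.5 * (p_a - p_b) ** 2

# Two pairs with the same 20 ms physical difference in VOT.
within = predicted_abx(p_label_p(0), p_label_p(20))   # both on the /b/ side
across = predicted_abx(p_label_p(15), p_label_p(35))  # straddles the boundary

print(f"within-category pair:  {within:.3f}")
print(f"cross-boundary pair:   {across:.3f}")
```

Under these assumptions the within-category pair is predicted to be discriminated at close to chance and the cross-boundary pair almost perfectly, even though the two pairs differ by the same amount of VOT.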

Whorf hypothesis

According to the Sapir–Whorf hypothesis, language affects the way that people perceive the world. For example, colors are perceived categorically only because they happen to be named categorically: Our subdivisions of the spectrum are arbitrary, learned, and vary across cultures and languages. But Berlin & Kay suggested that this was not so: Not only do most cultures and languages subdivide and name the color spectrum the same way, but even for those who don't, the regions of compression and separation are the same. We all see blues as more alike and greens as more alike, with a fuzzy boundary in between, whether or not we have named the difference. This view has been challenged in a review article by Regier and Kay who discuss a distinction between the questions "1. Do color terms affect color perception?" and "2. Are color categories determined by largely arbitrary linguistic convention?". They report evidence that linguistic categories, stored in the left hemisphere of the brain for most people, do affect categorical perception but primarily in the right visual field, and that this effect is eliminated with a concurrent verbal interference task.
Universalism, in contrast to the Sapir–Whorf hypothesis, posits that perceptual categories are innate and are unaffected by the language that one speaks.

Support

Support for the Sapir–Whorf hypothesis comes from instances in which speakers of one language demonstrate categorical perception in a way that is different from speakers of another language. Examples of such evidence are provided below:
Regier and Kay reported evidence that linguistic categories affect categorical perception primarily in the right visual field. The right visual field is processed by the left hemisphere of the brain, which also controls language faculties. Davidoff presented evidence that in color discrimination tasks, native English speakers discriminated more easily between color stimuli across a determined blue-green boundary than within the same side, but did not show categorical perception when given the same task with the Berinmo categories "nol" and "wor"; Berinmo speakers showed the opposite pattern.
A popular theory in current research is "weak Whorfianism", the theory that although there is a strong universal component to perception, cultural differences still have an impact. For example, a 1998 study found that while there was evidence of universal perception of color between speakers of Setswana and English, there were also marked differences between the two language groups.

Evolved categorical perception

The signature of categorical perception is within-category compression and/or between-category separation. The size of the CP effect is merely a scaling factor; it is this compression/separation "accordion effect" that is CP's distinctive feature. In this respect, the "weaker" CP effect for vowels, whose motor production is continuous rather than categorical but whose perception is by this criterion categorical, is every bit as much of a CP effect as the ba/pa and ba/da effects. But, as with colors, it looks as if the effect is an innate one: our sensory category detectors for both color and speech sounds are born already "biased" by evolution, and our perceived color and speech-sound spectrum is already "warped" with these compression/separations.

Learned categorical perception

The Lane/Lawrence demonstrations, lately replicated and extended by Goldstone, showed that CP can be induced by learning alone. There are also the countless categories cataloged in our dictionaries that, according to categorical perception, are unlikely to be inborn. Nativist theorists such as Fodor have sometimes seemed to suggest that all of our categories are inborn. There are recent demonstrations that, although the primary color and speech categories may be inborn, their boundaries can be modified or even lost as a result of learning, and weaker secondary boundaries can be generated by learning alone.
In the case of innate CP, our categorically biased sensory detectors pick out their prepared color and speech-sound categories far more readily and reliably than if our perception had been continuous.
Learning is a cognitive process that results in a relatively permanent change in behavior. It can influence perceptual processing by altering the way an individual perceives a given stimulus on the basis of prior experience or knowledge; that is, how something is perceived depends on how it was seen, observed, or experienced before. The effects of learning can be studied in categorical perception by looking at the processes involved.
Learned categorical perception can be divided into different processes through some comparisons. The processes can be divided into between-category and within-category groups of comparison. Between-category groups are those that compare between two separate sets of objects. Within-category groups are those that compare within one set of objects. Between-category comparisons lead to a categorical expansion effect. A categorical expansion occurs when the classifications and boundaries for the category become broader, encompassing a larger set of objects. In other words, a categorical expansion is when the "edge lines" for defining a category become wider. Within-category comparisons lead to a categorical compression effect. A categorical compression effect corresponds to the narrowing of category boundaries to include a smaller set of objects. Therefore, between-category groups lead to less rigid group definitions, whereas within-category groups lead to more rigid definitions.
Another method of comparison is to look at both supervised and unsupervised group comparisons. Supervised groups are those for which categories have been provided, meaning that the category has been defined previously or given a label; unsupervised groups are groups for which categories are created, meaning that the categories will be defined as needed and are not labeled.
In studying learned categorical perception, themes are important. The presence of themes influences category learning and increases the quality of learning, especially when the existing themes are opposites. In learned categorical perception, themes serve as cues for different categories: they help designate what to look for when placing objects into their categories. For example, when perceiving shapes, angles are a theme. The number of angles and their size provide more information about the shape and cue different categories. Three angles would cue a triangle, whereas four might cue a rectangle or a square. Opposite to the theme of angles would be the theme of circularity. The stark contrast between the sharp contour of an angle and the round curvature of a circle makes the categories easier to learn.
Similar to themes, labels are also important to learned categorical perception. Labels are “noun-like” titles that can encourage categorical processing with a focus on similarities. The strength of a label can be determined by three factors: analysis of affective strength, permeability of boundaries, and a judgment of discreteness. Sources of labels differ and, as with supervised and unsupervised categories, labels either already exist or are created. Peers, individuals, experts, cultures, and communities can create labels. Labels affect perception regardless of their source: what matters is the mere presence of a label. There is a positive correlation between the strength of a label and the degree to which it affects perception: the stronger the label, the more it affects perception.
Cues used in learned categorical perception can foster easier recall and access of prior knowledge in the process of learning and using categories. An item in a category can be easier to recall if the category has a cue for the memory. As discussed, labels and themes both function as cues for categories, and, therefore, aid in the memory of these categories and the features of the objects belonging to them.
There are several brain structures at work that promote learned categorical perception. The areas and structures involved include neurons, the prefrontal cortex, and the inferotemporal cortex. Neurons in general are linked to all processes in the brain and, therefore, facilitate learned categorical perception. They send the messages between brain areas and facilitate the visual and linguistic processing of the category. The prefrontal cortex is involved in “forming strong categorical representations.” The inferotemporal cortex has cells that code for different object categories and are tuned along diagnostic category dimensions, the dimensions that distinguish category boundaries.
The learning of categories and categorical perception can be improved through adding verbal labels, making themes relevant to the self, making more separate categories, and by targeting similar features that make it easier to form and define categories.
Learned categorical perception occurs not only in humans but has been demonstrated in other animal species as well. Studies have examined categorical perception in humans, monkeys, rodents, birds, and frogs. These studies focus primarily on learning the boundaries of categories, where inclusion begins and ends, and they support the hypothesis that categorical perception has a learned component.

Computational and neural models

Computational modeling has shown that many types of category-learning mechanisms display CP-like effects. In back-propagation nets, the hidden-unit activation patterns that "represent" an input build up within-category compression and between-category separation as they learn; other kinds of nets display similar effects. CP seems to be a means to an end: Inputs that differ among themselves are "compressed" onto similar internal representations if they must all generate the same output; and they become more separate if they must generate different outputs. The network's "bias" is what filters inputs onto their correct output category. The nets accomplish this by selectively detecting the invariant features that are shared by the members of the same category and that reliably distinguish them from members of different categories; the nets learn to ignore all other variation as irrelevant to the categorization.
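The compression and separation described above can be observed in a very small network. The following Python sketch (Python 3.8 or later, for `math.dist`) is a toy illustration under stated assumptions, not a reconstruction of any published model: eight stimuli are spaced evenly along a one-dimensional continuum, the category boundary is placed at the midpoint, and a net with three tanh hidden units and one sigmoid output is trained by full-batch back-propagation. The architecture, learning rate, number of epochs, and all function names are hypothetical choices.

```python
import math
import random

def train(xs, ys, n_hidden=3, epochs=3000, lr=0.5, seed=0):
    """Full-batch backprop for a 1-input net: tanh hidden layer, sigmoid output.

    Returns (initial_hidden, trained_hidden), each a (weights, biases) pair
    for the hidden layer, which is all the distance analysis below needs.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden
    b = [0.0] * n_hidden                                   # hidden biases
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output
    c = 0.0                                                # output bias
    initial = (list(w), list(b))
    n = len(xs)
    for _ in range(epochs):
        gw, gb, gv, gc = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w[j] * x + b[j]) for j in range(n_hidden)]
            z = sum(v[j] * h[j] for j in range(n_hidden)) + c
            out = 1.0 / (1.0 + math.exp(-z))
            delta = out - y  # gradient of the cross-entropy loss w.r.t. z
            gc += delta
            for j in range(n_hidden):
                gv[j] += delta * h[j]
                dh = delta * v[j] * (1.0 - h[j] ** 2)  # back-propagated through tanh
                gw[j] += dh * x
                gb[j] += dh
        for j in range(n_hidden):
            w[j] -= lr * gw[j] / n
            b[j] -= lr * gb[j] / n
            v[j] -= lr * gv[j] / n
        c -= lr * gc / n
    return initial, (w, b)

def hidden_pattern(hidden, x):
    """Hidden-unit activation pattern that 'represents' input x."""
    w, b = hidden
    return [math.tanh(wj * x + bj) for wj, bj in zip(w, b)]

def separation_ratio(hidden, xs, boundary_index):
    """Cross-boundary distance divided by mean within-category distance,
    measured between adjacent stimuli in hidden-unit space."""
    dists = [math.dist(hidden_pattern(hidden, xs[i]), hidden_pattern(hidden, xs[i + 1]))
             for i in range(len(xs) - 1)]
    within = [d for i, d in enumerate(dists) if i != boundary_index]
    return dists[boundary_index] / (sum(within) / len(within))

xs = [-1.0 + 2.0 * i / 7 for i in range(8)]  # evenly spaced continuum
ys = [0, 0, 0, 0, 1, 1, 1, 1]                # boundary between xs[3] and xs[4]

before, after = train(xs, ys)
ratio_before = separation_ratio(before, xs, boundary_index=3)
ratio_after = separation_ratio(after, xs, boundary_index=3)
print(f"between/within ratio before training: {ratio_before:.2f}")
print(f"between/within ratio after training:  {ratio_after:.2f}")
```

Because the stimuli are equally spaced, a ratio near 1 means the hidden layer preserves the physical spacing. A ratio that rises with training indicates that the adjacent pair straddling the boundary has been pulled apart relative to adjacent pairs within a category, which is the compression/separation pattern the text describes.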

Brain basis

Neural data provide correlates of CP and of learning. Differences between event-related potentials recorded from the brain have been found to be correlated with differences in the perceived category of the stimulus viewed by the subject. Neural imaging studies have shown that these effects are localized and even lateralized to certain brain regions in subjects who have successfully learned the category, and are absent in subjects who have not.
Categorical perception of speech units has been associated with the left prefrontal cortex, which shows categorical responses to such units, whereas posterior areas earlier in the processing stream, such as areas in the left superior temporal gyrus, do not.

Language-induced

Both innate and learned CP are sensorimotor effects: The compression/separation biases are sensorimotor biases, and presumably had sensorimotor origins, whether during the sensorimotor life-history of the organism, in the case of learned CP, or the sensorimotor life-history of the species, in the case of innate CP. The neural net I/O models are also compatible with this fact: Their I/O biases derive from their I/O history. But when we look at our repertoire of categories in a dictionary, it is highly unlikely that many of them had a direct sensorimotor history during our lifetimes, and even less likely in our ancestors' lifetimes. How many of us have seen a unicorn in real life? We have seen pictures of them, but what had those who first drew those pictures seen? And what about categories we cannot draw or see? What about the most abstract categories, such as goodness and truth?
Some of our categories must originate from another source than direct sensorimotor experience, and here we return to language and the Whorf Hypothesis: Can categories, and their accompanying CP, be acquired through language alone? Again, there are some neural net simulation results suggesting that once a set of category names has been "grounded" through direct sensorimotor experience, they can be combined into Boolean combinations and into still higher-order combinations which not only pick out the more abstract, higher-order categories much the way the direct sensorimotor detectors do, but also inherit their CP effects, as well as generating some of their own. Bachelor inherits the compression/separation of unmarried and man, and adds a layer of separation/compression of its own.
These language-induced CP-effects remain to be directly demonstrated in human subjects; so far only learned and innate sensorimotor CP have been demonstrated. The latter shows the Whorfian power of naming and categorization, in warping our perception of the world. That is enough to rehabilitate the Whorf Hypothesis from its apparent failure on color terms, but to show that it is a full-blown language effect, and not merely a vocabulary effect, it will have to be shown that our perception of the world can also be warped, not just by how things are named but by what we are told about them.

Emotion

Emotions are an important characteristic of the human species. An emotion is an abstract concept that is most easily observed by looking at facial expressions. Emotions and their relation to categorical perception are often studied using facial expressions. Faces contain a large amount of valuable information.
Emotions are divided into categories because they are discrete from one another. Each emotion entails a separate and distinct set of reactions, consequences, and expressions. The feeling and expression of emotions is a natural occurrence and, for some emotions, a universal one. There are six basic emotions that are considered universal to the human species across age, gender, race, country, and culture and that are considered to be categorically distinct. These six basic emotions are happiness, disgust, sadness, surprise, anger, and fear. According to the discrete emotions approach, people experience one emotion and not others, rather than a blend. Categorical perception of emotional facial expressions does not require lexical categories. Of these six emotions, happiness is the most easily identified.
The perception of emotions using facial expressions reveals slight gender differences based on the definition and boundaries of the categories. Anger is perceived more easily and more quickly when it is displayed by males, and the same advantage is seen for happiness when it is displayed by females. These effects are observed essentially because the categories of the two emotions are more closely associated with other features of these specific genders.
Although emotions are given verbal labels, labels are not required to perceive them categorically. Infants can distinguish emotional responses before they acquire language, and the categorical perception of emotions has been attributed to a "hardwired mechanism". Additional evidence comes from cultures that lack a verbal label for a specific emotion but still perceive it categorically as its own emotion, discrete and isolated from other emotions. The categorical perception of emotions has also been studied by tracking eye movements, which provide an implicit measure: the task requires only the eye movement and no subsequent verbal response.
The categorical perception of emotions is sometimes a result of joint processing. Other factors may be involved in this perception. Emotional expression and invariable features often work together. Race is one of the invariable features that contribute to categorical perception in conjunction with expression. Race can also be considered a social category. Emotional categorical perception can also be seen as a mix of categorical and dimensional perception. Dimensional perception involves visual imagery. Categorical perception occurs even when processing is dimensional.