Affective computing


Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While some core ideas in the field may be traced as far back as to early philosophical inquiries into emotion, the more modern branch of computer science originated with Rosalind Picard's 1995 paper entitled "Affective Computing" and her 1997 book of the same name published by MIT Press. One of the motivations for the research is the ability to give machines emotional intelligence, including to simulate empathy. The machine should interpret the emotional state of humans and adapt its behavior to them, giving an appropriate response to those emotions. Recent experimental research has shown that subtle affective haptic feedback can shape human reward learning and mobile interaction behavior, suggesting that affective computing systems may not only interpret emotional states but also actively modulate user actions through emotion-laden outputs.

Areas

Detecting and recognizing emotional information

Detecting emotional information usually begins with passive sensors that capture data about the user's physical state or behavior without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture, and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic resistance.
Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different modalities, such as speech recognition, natural language processing, or facial expression detection. The goal of most of these techniques is to produce labels that would match the labels a human perceiver would give in the same situation: For example, if a person makes a facial expression furrowing their brow, then the computer vision system might be taught to label their face as appearing "confused" or as "concentrating" or "slightly negative". These labels may or may not correspond to what the person is actually feeling.

Emotion in machines

Another area within affective computing is the design of computational devices proposed to exhibit either innate emotional capabilities or that are capable of convincingly simulating emotions. A more practical approach, based on current technological capabilities, is the simulation of emotions in conversational agents in order to enrich and facilitate interactivity between human and machine.
Marvin Minsky, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence stating in The Emotion Machine that emotion is "not especially different from the processes that we call 'thinking.'" The innovative approach "digital humans" or virtual humans includes an attempt to give these programs, which simulate humans, the emotional dimension as well, including reactions in accordance with the reaction that a real person would react in a certain emotionally stimulating situation as well as facial expressions and gestures.
Emotion in machines often refers to emotion in computational, often AI-based, systems. As a result, the terms 'emotional AI' and '' are being used.

Technologies

In psychology, cognitive science, and in neuroscience, there have been two main approaches for describing how humans perceive and classify emotion: continuous or categorical. The continuous approach tends to use dimensions such as negative vs. positive, calm vs. aroused.
The categorical approach tends to use discrete classes such as happy, sad, angry, fearful, surprise, disgust. Different kinds of machine learning regression and classification models can be used for having machines produce continuous or discrete labels. Sometimes models are also built that allow combinations across the categories, e.g. a happy-surprised face or a fearful-surprised face.
The following sections consider many of the kinds of input data used for the task of emotion recognition.

Emotional speech

Various changes in the autonomic nervous system can indirectly alter a person's speech, and affective technologies can leverage this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes fast, loud, and precisely enunciated, with a higher and wider range in pitch, whereas emotions such as tiredness, boredom, or sadness tend to generate slow, low-pitched, and slurred speech. Some emotions have been found to be more easily computationally identified, such as anger or approval.
Emotional speech processing technologies recognize the user's emotional state using computational analysis of speech features. Vocal parameters and prosodic features such as pitch variables and speech rate can be analyzed through pattern recognition techniques.
Speech analysis is an effective method of identifying affective state, having an average reported accuracy of 70 to 80% in research from 2003 and 2006. These systems tend to outperform average human accuracy but are less accurate than systems which employ other modalities for emotion detection, such as physiological states or facial expressions. However, since many speech characteristics are independent of semantics or culture, this technique is considered to be a promising route for further research.

Algorithms

The process of speech/text affect detection requires the creation of a reliable database, knowledge base, or vector space model, broad enough to fit every need for its application, as well as the selection of a successful classifier which will allow for quick and accurate emotion identification.
, the most frequently used classifiers were linear discriminant classifiers, k-nearest neighbor, Gaussian mixture model, support vector machines, artificial neural networks, decision tree algorithms and hidden Markov models. Various studies showed that choosing the appropriate classifier can significantly enhance the overall performance of the system. The list below gives a brief description of each algorithm:
  • LDC – Classification happens based on the value obtained from the linear combination of the feature values, which are usually provided in the form of vector features.
  • k-NN – Classification happens by locating the object in the feature space, and comparing it with the k nearest neighbors. The majority vote decides on the classification.
  • GMM – is a probabilistic model used for representing the existence of subpopulations within the overall population. Each sub-population is described using the mixture distribution, which allows for classification of observations into the sub-populations.
  • SVM – is a type of linear classifier which decides in which of the two possible classes, each input may fall into.
  • ANN – is a mathematical model, inspired by biological neural networks, that can better grasp possible non-linearities of the feature space.
  • Decision tree algorithms – work based on following a decision tree in which leaves represent the classification outcome, and branches represent the conjunction of subsequent features that lead to the classification.
  • HMMs – a statistical Markov model in which the states and state transitions are not directly available to observation. Instead, the series of outputs dependent on the states are visible. In the case of affect recognition, the outputs represent the sequence of speech feature vectors, which allow the deduction of states' sequences through which the model progressed. The states can consist of various intermediate steps in the expression of an emotion, and each of them has a probability distribution over the possible output vectors. The states' sequences allow us to predict the affective state which we are trying to classify, and this is one of the most commonly used techniques within the area of speech affect detection.
It is proved that having enough acoustic evidence available the emotional state of a person can be classified by a set of majority voting classifiers. The proposed set of classifiers is based on three main classifiers: kNN, C4.5 and SVM-RBF Kernel. This set achieves better performance than each basic classifier taken separately. It is compared with two other sets of classifiers: one-against-all multiclass SVM with Hybrid kernels and the set of classifiers which consists of the following two basic classifiers: C5.0 and Neural Network. The proposed variant achieves better performance than the other two sets of classifiers.

Databases

The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions based on speech, as it implicates choosing an appropriate database used to train the classifier. Most of the currently possessed data was obtained from actors and is thus a representation of archetypal emotions. Those so-called acted databases are usually based on the Basic Emotions theory, which assumes the existence of six basic emotions, the others simply being a mix of the former ones. Nevertheless, these still offer high audio quality and balanced classes, which contribute to high success rates in recognizing emotions.
However, for real life application, naturalistic data is preferred. A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real life implementation, due to the fact it describes states naturally occurring during the human–computer interaction.
Despite the numerous advantages which naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to surroundings noise and distance of the subjects from the microphone. The first attempt to produce such database was the FAU Aibo Emotion Corpus for CEICES, which was developed based on a realistic context of children playing with Sony's Aibo robot pet. Likewise, producing one standard database for all emotional research would provide a method of evaluating and comparing different affect recognition systems.