Operant conditioning


Operant conditioning, also called instrumental conditioning, is a learning process in which voluntary behaviors are modified by association with the addition or removal of rewarding or aversive stimuli. The frequency or duration of the behavior may increase through reinforcement or decrease through punishment or extinction.

Origins

Operant conditioning originated with Edward Thorndike, whose law of effect held that behaviors are strengthened or weakened according to whether their consequences are satisfying or discomforting. In the 20th century, operant conditioning was studied by behavioral psychologists, who believed that much of mind and behavior is explained through environmental conditioning. Reinforcements are environmental stimuli that increase behaviors, whereas punishments are stimuli that decrease behaviors. Both kinds of stimuli can be further categorized as positive or negative, which respectively involve the addition or removal of environmental stimuli.
Operant conditioning differs from classical conditioning in both mechanism and outcome. While classical conditioning pairs stimuli to produce involuntary, reflexive behaviors, operant conditioning shapes voluntary behaviors through their consequences. Actions followed by rewards tend to be repeated, while those followed by negative outcomes diminish.
The study of animal learning in the 20th century was dominated by the analysis of these two sorts of learning, and they are still at the core of behavior analysis. They have also been applied to the study of social psychology, helping to clarify certain phenomena such as the false consensus effect.

History

Thorndike's law of effect

Operant conditioning, sometimes called instrumental learning, was first extensively studied by Edward L. Thorndike, who observed the behavior of cats trying to escape from home-made puzzle boxes. A cat could escape from the box by a simple response such as pulling a cord or pushing a pole, but when first constrained, the cats took a long time to get out. With repeated trials, ineffective responses occurred less frequently and successful responses occurred more frequently, so the cats escaped more and more quickly. Thorndike generalized this finding in his law of effect, which states that behaviors followed by satisfying consequences tend to be repeated and those that produce unpleasant consequences are less likely to be repeated. In short, some consequences strengthen behavior and some consequences weaken behavior. By plotting escape time against trial number, Thorndike produced the first known animal learning curves.
Humans appear to learn many simple behaviors through the sort of process studied by Thorndike, now called operant conditioning. That is, responses are retained when they lead to a successful outcome and discarded when they do not, or when they produce aversive effects. This usually happens without being planned by any "teacher", but operant conditioning has been used by parents in teaching their children for thousands of years.

B. F. Skinner

B. F. Skinner is referred to as the father of operant conditioning, and his work is frequently cited in connection with this topic. His 1938 book "The Behavior of Organisms: An Experimental Analysis" initiated his lifelong study of operant conditioning and its application to human and animal behavior. Following the ideas of Ernst Mach, Skinner rejected Thorndike's reference to unobservable mental states such as satisfaction, building his analysis on observable behavior and its equally observable consequences.
Skinner believed that classical conditioning was too simplistic to be used to describe something as complex as human behavior. Operant conditioning, in his opinion, better described human behavior as it examined causes and effects of intentional behavior.
To implement his empirical approach, Skinner invented the operant conditioning chamber, or "Skinner Box", in which subjects such as pigeons and rats were isolated and could be exposed to carefully controlled stimuli. Unlike Thorndike's puzzle box, this arrangement allowed the subject to make one or two simple, repeatable responses, and the rate of such responses became Skinner's primary behavioral measure. Another invention, the cumulative recorder, produced a graphical record from which these response rates could be estimated. These records were the primary data that Skinner and his colleagues used to explore the effects on response rate of various reinforcement schedules. A reinforcement schedule may be defined as "any procedure that delivers reinforcement to an organism according to some well-defined rule". The effects of schedules became, in turn, the basic findings from which Skinner developed his account of operant conditioning. He also drew on many less formal observations of human and animal behavior.
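The way a cumulative record yields response rates can be sketched in a few lines of Python. This is only an illustration, not a model of Skinner's actual apparatus: the function names and the response timestamps are made up for the example. The record is the running count of responses over time, and the average response rate over a window is the slope of that record.

```python
from bisect import bisect_right


def cumulative_record(response_times, sample_times):
    """Return the number of responses made up to and including each sample time."""
    ordered = sorted(response_times)
    return [bisect_right(ordered, t) for t in sample_times]


def response_rate(response_times, start, end):
    """Average responses per unit time in the window (start, end]."""
    if end <= start:
        raise ValueError("end must be later than start")
    before, after = cumulative_record(response_times, [start, end])
    return (after - before) / (end - start)


# Hypothetical lever-press times in seconds (invented for illustration):
# a short burst, a long pause, then steady responding.
presses = [1, 2, 3, 4, 20, 21, 22, 23, 24, 25]

record = cumulative_record(presses, [0, 5, 10, 15, 20, 25])
early_rate = response_rate(presses, 0, 5)    # steep slope during the burst
pause_rate = response_rate(presses, 5, 15)   # flat record during the pause
late_rate = response_rate(presses, 15, 25)   # slope rises again
```

A flat stretch in the record corresponds to a pause in responding, and a steep stretch to rapid responding.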
Many of Skinner's writings are devoted to the application of operant conditioning to human behavior. In 1948 he published Walden Two, a fictional account of a peaceful, happy, productive community organized around his conditioning principles. In 1957, Skinner published Verbal Behavior, which extended the principles of operant conditioning to language, a form of human behavior that had previously been analyzed quite differently by linguists and others. Skinner defined new functional relationships such as "mands" and "tacts" to capture some essentials of language, but he introduced no new principles, treating verbal behavior like any other behavior controlled by its consequences, which included the reactions of the speaker's audience.

Concepts and procedures

Origins of operant behavior: operant variability

Operant behavior is said to be "emitted"; that is, initially it is not elicited by any particular stimulus. Thus one may ask why it happens in the first place. The answer to this question is like Darwin's answer to the question of the origin of a "new" bodily structure, namely, variation and selection. Similarly, the behavior of an individual varies from moment to moment, in such aspects as the specific motions involved, the amount of force applied, or the timing of the response. Variations that lead to reinforcement are strengthened, and if reinforcement is consistent, the behavior tends to remain stable. However, behavioral variability can itself be altered through the manipulation of certain variables.
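The variation-and-selection idea can be illustrated with a toy simulation. This is a loose sketch under strong assumptions, not a model taken from the behavioral literature: the variant names, the additive weight update, and the function `select_by_consequences` are all hypothetical. Each trial emits one response variant at random in proportion to its current strength, and only reinforced variants gain strength.

```python
import random


def select_by_consequences(variants, is_reinforced, trials=200, step=1.0, seed=0):
    """Return each variant's emission probability after repeated selection.

    Assumption: every variant starts with equal strength, and each
    reinforcement adds a fixed `step` to the emitted variant's strength.
    """
    rng = random.Random(seed)
    weights = {v: 1.0 for v in variants}
    for _ in range(trials):
        # Variation: emit one variant, weighted by current strength.
        emitted = rng.choices(variants, weights=[weights[v] for v in variants])[0]
        # Selection: only reinforced variants are strengthened.
        if is_reinforced(emitted):
            weights[emitted] += step
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Hypothetical variants of a lever press; only a firm press operates the lever.
variants = ["light press", "firm press", "side nudge"]
probabilities = select_by_consequences(variants, lambda v: v == "firm press")
```

In this sketch the reinforced variant comes to dominate while the others keep their initial strength, which loosely mirrors the stabilizing effect of consistent reinforcement described above.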

Modifying operant behavior: reinforcement and punishment

Reinforcement and punishment are the core tools through which operant behavior is modified. These terms are defined by their effect on behavior. "Positive" and "negative" refer to whether a stimulus was added or removed, respectively. Similarly, "reinforcement" and "punishment" refer to the future frequency of the behavior. Reinforcement describes a consequence that makes a behavior occur more often in the future, whereas punishment is a consequence that makes a behavior occur less often.
There are a total of four consequences:
  1. Positive reinforcement occurs when a behavior results in a desired stimulus being added and increases the frequency of that behavior in the future. Example: if a rat in a Skinner box gets food when it presses a lever, its rate of pressing will go up. Pressing the lever was positively reinforced.
  2. Negative reinforcement occurs when a behavior is followed by the removal of an aversive stimulus, thereby increasing the original behavior's frequency. Example: A child is afraid of loud noises at a fireworks display. They put on a pair of headphones, and they can no longer hear the fireworks. The next time the child sees fireworks, they put on a pair of headphones. Putting on headphones was negatively reinforced.
  3. Positive punishment occurs when a behavior is followed by an aversive stimulus which makes the behavior less likely to occur in the future. Example: A child touches a hot stove and burns his hand. The next time he sees a stove, he does not touch it. Touching the stove was positively punished.
  4. Negative punishment occurs when a behavior is followed by the removal of a stimulus, and the behavior is less likely to occur in the future. Example: When an employee puts their lunch in a communal refrigerator, it gets stolen before break time. The next time the employee brings a lunch to work, they do not put it in the refrigerator. Putting the lunch in the refrigerator was negatively punished.
  • Extinction occurs when a previously reinforced behavior is no longer reinforced, whether the earlier reinforcement was positive or negative. During extinction the behavior becomes less probable. Behavior that was reinforced only occasionally typically takes longer to extinguish than behavior that was reinforced at every opportunity, because the organism has learned that repeated responses may be needed before reinforcement arrives.
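The four consequences above form a two-by-two grid: one axis records whether a stimulus was added or removed, the other whether the behavior becomes more or less frequent. A minimal Python sketch of that grid follows. The enum and function names are hypothetical, chosen only for this illustration.

```python
from enum import Enum


class StimulusChange(Enum):
    ADDED = "positive"      # a stimulus is presented after the behavior
    REMOVED = "negative"    # a stimulus is taken away after the behavior


class BehaviorChange(Enum):
    INCREASES = "reinforcement"  # the behavior occurs more often in the future
    DECREASES = "punishment"     # the behavior occurs less often in the future


def classify_consequence(stimulus: StimulusChange, behavior: BehaviorChange) -> str:
    """Name the consequence from the two defining observations."""
    return f"{stimulus.value} {behavior.value}"


# The four examples from the list above, encoded as (stimulus, behavior) pairs.
examples = {
    "rat gets food for pressing a lever": (StimulusChange.ADDED, BehaviorChange.INCREASES),
    "headphones block out firework noise": (StimulusChange.REMOVED, BehaviorChange.INCREASES),
    "child burns hand on hot stove": (StimulusChange.ADDED, BehaviorChange.DECREASES),
    "lunch stolen from communal refrigerator": (StimulusChange.REMOVED, BehaviorChange.DECREASES),
}

labels = {name: classify_consequence(*pair) for name, pair in examples.items()}
```

Note that the label depends only on the observed effect on behavior, not on whether the stimulus seems pleasant or unpleasant to an observer.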
A study suggests that tactile feedback, such as haptic vibrations from mobile devices, can function as secondary reinforcers, strengthening consumer behaviors such as online purchasing.

Schedules of reinforcement

Schedules of reinforcement are rules that control the delivery of reinforcement. The rules specify either the time at which reinforcement is to be made available, or the number of responses to be made, or both. Many rules are possible, but the following are the most basic and commonly used:
  • Fixed interval schedule: Reinforcement occurs following the first response after a fixed time has elapsed after the previous reinforcement. This schedule yields a "break-run" pattern of response; that is, after training on this schedule, the organism typically pauses after reinforcement, and then begins to respond rapidly as the time for the next reinforcement approaches.
  • Variable interval schedule: Reinforcement occurs following the first response after a variable time has elapsed from the previous reinforcement. This schedule typically yields a relatively steady rate of response that varies with the average time between reinforcements.
  • Fixed ratio schedule: Reinforcement occurs after a fixed number of responses have been emitted since the previous reinforcement. An organism trained on this schedule typically pauses for a while after a reinforcement and then responds at a high rate. If the response requirement is low there may be no pause; if the response requirement is high the organism may quit responding altogether.
  • Variable ratio schedule: Reinforcement occurs after a variable number of responses have been emitted since the previous reinforcement. This schedule typically yields a very high, persistent rate of response.
  • Continuous reinforcement: Reinforcement occurs after each response. Organisms typically respond as rapidly as they can, given the time taken to obtain and consume reinforcement, until they are satiated.
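The schedules listed above can be written as small rule objects that decide, for each response, whether reinforcement is delivered. The following Python sketch is illustrative only: the class names, the `respond` and `run` helpers, and the uniform distributions used for the variable schedules are assumptions made for this example, not definitions from the literature. It models only the delivery rule, not how an organism's response rate adapts to it.

```python
import random


class FixedRatio:
    """Reinforce every n-th response since the previous reinforcement."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio(FixedRatio):
    """Like FixedRatio, but the requirement is redrawn after each reinforcement."""

    def __init__(self, mean, rng):
        self.mean, self.rng = mean, rng
        super().__init__(self._draw())

    def _draw(self):
        # Assumption: uniform on 1..2*mean-1, so the average requirement is `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t):
        reinforced = super().respond(t)
        if reinforced:
            self.n = self._draw()
        return reinforced


class FixedInterval:
    """Reinforce the first response made once `interval` has elapsed."""

    def __init__(self, interval):
        self.interval = interval
        self.last = 0.0  # time of the previous reinforcement (session starts at 0)

    def respond(self, t):
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False


class VariableInterval(FixedInterval):
    """Like FixedInterval, but the wait is redrawn after each reinforcement."""

    def __init__(self, mean, rng):
        self.mean, self.rng = mean, rng
        super().__init__(self._draw())

    def _draw(self):
        # Assumption: uniform on [0, 2*mean], so the average wait is `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t):
        reinforced = super().respond(t)
        if reinforced:
            self.interval = self._draw()
        return reinforced


def run(schedule, response_times):
    """Return the times of the responses that were reinforced."""
    return [t for t in response_times if schedule.respond(t)]


times = list(range(1, 13))            # one response per second for 12 seconds
fr3 = run(FixedRatio(3), times)       # [3, 6, 9, 12]
fi5 = run(FixedInterval(5), times)    # [5, 10]
crf = run(FixedRatio(1), times)       # continuous reinforcement: every response
vr3 = run(VariableRatio(3, random.Random(0)), times)
vi5 = run(VariableInterval(5, random.Random(0)), times)
```

Continuous reinforcement is expressed here as a fixed ratio of 1, which is one convenient way to treat it as a special case of the ratio rule.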