Learning rule

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weight and bias levels of a network when it is simulated in a specific data environment. A learning rule may accept existing conditions of the network, and will compare the expected result and actual result of the network to give new and improved values for the weights and biases. Depending on the complexity of the model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.
The learning rule is one of the factors which decides how fast or how accurately the neural network can be developed. Depending on the process to develop the network, there are three main paradigms of machine learning:

Background

A lot of the learning methods in machine learning work similar to each other, and are based on each other, which makes it difficult to classify them in clear categories. But they can be broadly understood in 4 categories of learning methods, though these categories don't have clear boundaries and they tend to belong to multiple categories of learning methods -

Hebbian - Neocognitron, Brain-state-in-a-box
Gradient Descent - ADALINE, Hopfield Network, Recurrent Neural Network
Competitive - Learning Vector Quantisation, Self-Organising Feature Map, Adaptive Resonance Theory
Stochastic - Boltzmann Machine, Cauchy Machine

Though these learning rules might appear to be based on similar ideas, they do have subtle differences, as they are a generalisation or application over the previous rule, and hence it makes sense to study them separately based on their origins and intents.

Hebbian Learning

Developed by Donald Hebb in 1949 to describe biological neuron firing. In the mid-1950s it was also applied to computer simulations of neural networks.
Where represents the learning rate, represents the input of neuron i, and y is the output of the neuron. It has been shown that Hebb's rule in its basic form is unstable. Oja's Rule, BCM Theory are other learning rules built on top of or alongside Hebb's Rule in the study of biological neurons.

Perceptron Learning Rule (PLR)

The perceptron learning rule originates from the Hebbian assumption, and was used by Frank Rosenblatt in his perceptron in 1958. The net is passed to the activation function and the function's output is used for adjusting the weights. The learning signal is the difference between the desired response and the actual response of a neuron. The step function is often used as an activation function, and the outputs are generally restricted to -1, 0, or 1.
The weights are updated with
where "t" is the target value and "o" is the output of the perceptron, and is called the learning rate.
The algorithm converges to the correct classification if:

the training data is linearly separable*
is sufficiently small

*It should also be noted that a single layer perceptron with this learning rule is incapable of working on linearly non-separable inputs, and hence the XOR problem cannot be solved using this rule alone

Backpropagation

Seppo Linnainmaa in 1970 is said to have developed the Backpropagation Algorithm but the origins of the algorithm go back to the 1960s with many contributors. It is a generalisation of the least mean squares algorithm in the linear perceptron and the Delta Learning Rule.
It implements gradient descent search through the space possible network weights, iteratively reducing the error, between the target values and the network outputs.

Widrow-Hoff Learning (Delta Learning Rule)

Similar to the perceptron learning rule but with different origin. It was developed for use in the ADALINE network, which differs from the Perceptron mainly in terms of the training. The weights are adjusted according to the weighted sum of the inputs, whereas in perceptron the sign of the weighted sum was useful for determining the output as the threshold was set to 0, -1, or +1. This makes ADALINE different from the normal perceptron.
Delta rule is similar to the Perceptron Learning Rule, with some differences:

Error in DR is not restricted to having values of 0, 1, or -1, but may have any value
DR can be derived for any differentiable output/activation function f, whereas in PLR only works for threshold output function

Sometimes only when the Widrow-Hoff is applied to binary targets specifically, it is referred to as Delta Rule, but the terms seem to be used often interchangeably. The delta rule is considered to a special case of the back-propagation algorithm.
Delta rule also closely resembles the Rescorla-Wagner model under which Pavlovian conditioning occurs.

Competitive Learning

Competitive learning is considered a variant of Hebbian learning, but it is special enough to be discussed separately. Competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.
Models and algorithms based on the principle of competitive learning include vector quantization and self-organizing maps.