Hopfield network
A Hopfield network is a form of recurrent neural network, or a spin glass system, that can serve as a content-addressable memory. The Hopfield network, named for John Hopfield, consists of a single layer of neurons, where each neuron is connected to every other neuron except itself. These connections are bidirectional and symmetric, meaning the weight of the connection from neuron i to neuron j is the same as the weight from neuron j to neuron i. Patterns are associatively recalled by fixing certain inputs and dynamically evolving the network to minimize an energy function, towards local energy minimum states that correspond to stored patterns. Patterns are associatively learned by a Hebbian learning algorithm.
One of the key features of Hopfield networks is their ability to recover complete patterns from partial or noisy inputs, making them robust in the face of incomplete or corrupted data. Their connection to statistical mechanics, recurrent networks, and human cognitive psychology has led to their application in various fields, including physics, psychology, neuroscience, and machine learning theory and practice. Due to their binary-valued neurons, limited scalability, and incompatibility with gradient-based learning, classical Hopfield networks are rarely used in modern machine learning.
History
One origin of associative memory is human cognitive psychology, specifically the study of associative memory. Frank Rosenblatt studied "closed-loop cross-coupled perceptrons", which are 3-layered perceptron networks whose middle layer contains recurrent connections that change by a Hebbian learning rule. Another model of associative memory is one where the output does not loop back to the input. W. K. Taylor proposed such a model trained by Hebbian learning in 1956. Karl Steinbuch, who wanted to understand learning and was inspired by watching his children learn, published the Lernmatrix in 1961; it was translated to English in 1963. Similar research was done with the correlogram of D. J. Willshaw et al. in 1969. Teuvo Kohonen trained an associative memory by gradient descent in 1974.
Another origin of associative memory was statistical mechanics. The Ising model was published in the 1920s as a model of magnetism; however, it studied only thermal equilibrium, which does not change with time. In 1963, Roy J. Glauber studied the Ising model evolving in time, as a process towards thermal equilibrium, thereby adding the component of time.
The second component to be added was adaptation to stimulus. This component was added independently by several researchers, including Rosenblatt, Kaoru Nakano, and Shun'ichi Amari, who proposed modifying the weights of an Ising model by the Hebbian learning rule as a model of associative memory. The same idea was published by William A. Little in 1974, whom Hopfield acknowledged in his 1982 paper.
See Carpenter and Cowan for a technical description of some of these early works in associative memory.
The Sherrington–Kirkpatrick model of spin glass, published in 1975, is the Hopfield network with random initialization. Sherrington and Kirkpatrick found that it is highly likely for the energy function of the SK model to have many local minima. In his 1982 paper, Hopfield applied this recently developed theory to study the Hopfield network with binary activation functions. In a 1984 paper he extended this to continuous activation functions. It became a standard model for the study of neural networks through statistical mechanics.
A major advance in memory storage capacity was developed by Dmitry Krotov and Hopfield in 2016 through a change in network dynamics and energy function. This idea was further extended by Demircigil and collaborators in 2017. The continuous dynamics of large memory capacity models was developed in a series of papers between 2016 and 2020. Large memory storage capacity Hopfield networks are now called Dense Associative Memories or modern Hopfield networks.
In 2024, John J. Hopfield and Geoffrey E. Hinton were awarded the Nobel Prize in Physics for their foundational contributions to machine learning, such as the Hopfield network.
Structure
The units in Hopfield nets are binary threshold units, i.e. the units only take on two different values for their states, and the value is determined by whether or not the unit's input exceeds its threshold. Discrete Hopfield nets describe relationships between binary neurons. At a certain time, the state of the neural net is described by a vector, which records which neurons are firing in a binary word of bits.

The units usually take on values of 1 or −1, and this convention will be used throughout this article. However, other literature might use units that take values of 0 and 1. The interactions between neurons are "learned" via Hebb's law of association, such that, for a certain stored state $s$ and distinct nodes $i$ and $j$,

$$w_{ij} = s_i s_j,$$

but $w_{ii} = 0$.
Once the network is trained, the weights $w_{ij}$ no longer evolve. If a new state of neurons $s$ is introduced to the neural network, the net acts on neurons such that

- $s_i \to +1$ if $\sum_j w_{ij} s_j \ge \theta_i$,
- $s_i \to -1$ if $\sum_j w_{ij} s_j < \theta_i$,

where $\theta_i$ is the threshold of neuron $i$.
The connections in a Hopfield net typically have the following restrictions:

- $w_{ii} = 0$ for all $i$ (no unit has a connection with itself);
- $w_{ij} = w_{ji}$ for all $i, j$ (connections are symmetric).
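As a concrete illustration, the Hebbian construction of the weight matrix can be sketched in a few lines of NumPy (a sketch only; the function name `train_hebbian` is our own, not from any reference implementation):

```python
import numpy as np

def train_hebbian(patterns):
    """Build a Hopfield weight matrix from a list of +/-1 patterns
    by summing Hebbian outer products over the stored patterns."""
    patterns = np.asarray(patterns)
    W = patterns.T @ patterns       # w_ij accumulates s_i * s_j per pattern
    np.fill_diagonal(W, 0)          # enforce w_ii = 0 (no self-connections)
    return W

W = train_hebbian([[1, -1, 1, -1],
                   [1, 1, -1, -1]])
```

By construction the resulting matrix is symmetric with a zero diagonal, satisfying both restrictions above.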
Hopfield also modeled neural nets for continuous values, in which the electric output of each neuron is not binary but some value between 0 and 1. He found that this type of network was also able to store and reproduce memorized states.
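A minimal sketch of such graded dynamics, under the assumption of a tanh activation with outputs in (−1, 1) (the 0-to-1 convention mentioned above is an equivalent linear rescaling) and simple Euler integration:

```python
import numpy as np

# Weights storing a single +/-1 pattern via the Hebbian rule (illustrative choice).
pattern = np.array([1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
W = np.outer(pattern, pattern)
np.fill_diagonal(W, 0)

# Graded dynamics du/dt = -u + W @ g(u), integrated with Euler steps
# from a noisy initial condition near the stored pattern.
u = 0.3 * pattern + 0.05 * np.random.default_rng(0).standard_normal(6)
dt = 0.05
for _ in range(2000):
    u += dt * (-u + W @ np.tanh(u))

recovered = np.sign(u)  # the graded state settles near the stored pattern
```

The continuous state relaxes toward a saturated configuration whose signs reproduce the memorized pattern, mirroring the binary network's behavior.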
Notice that every pair of units $i$ and $j$ in a Hopfield network has a connection that is described by the connectivity weight $w_{ij}$. In this sense, the Hopfield network can be formally described as a complete undirected graph $G = \langle V, f \rangle$, where $V$ is a set of McCulloch–Pitts neurons and $f : V^2 \to \mathbb{R}$ is a function that links pairs of units to a real value, the connectivity weight.
Updating
Updating one unit in the Hopfield network is performed using the following rule:

$$s_i \leftarrow \begin{cases} +1 & \text{if } \sum_j w_{ij} s_j \ge \theta_i, \\ -1 & \text{otherwise,} \end{cases}$$

where:

- $w_{ij}$ is the strength of the connection weight from unit $j$ to unit $i$,
- $s_i$ is the state of unit $i$,
- $\theta_i$ is the threshold of unit $i$.

Updates in the Hopfield network can be performed in two different ways:
- Asynchronous: Only one unit is updated at a time. This unit can be picked at random, or a pre-defined order can be imposed from the very beginning.
- Synchronous: All units are updated at the same time. This requires a central clock to the system in order to maintain synchronization. This method is viewed by some as less realistic, based on an absence of observed global clock influencing analogous biological or physical systems of interest.
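The asynchronous rule above, combined with Hebbian weights, already suffices for pattern completion. A minimal sketch (the helper name `recall` and the specific pattern are our own illustrative choices):

```python
import numpy as np

def recall(W, probe, theta=None, max_sweeps=100):
    """Asynchronously update units until a full sweep changes nothing
    (a fixed point of the network dynamics)."""
    s = np.array(probe, dtype=int)
    theta = np.zeros(len(s)) if theta is None else np.asarray(theta)
    rng = np.random.default_rng(0)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):        # random update order
            new = 1 if W[i] @ s >= theta[i] else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Store one pattern with the Hebbian rule, then recall it from a corrupted probe.
pattern = np.array([1, 1, -1, -1, 1, -1, 1, 1])
W = np.outer(pattern, pattern)
np.fill_diagonal(W, 0)
probe = pattern.copy()
probe[0] = -probe[0]          # flip one bit to simulate a noisy input
restored = recall(W, probe)
```

Starting from the corrupted probe, the asynchronous dynamics settle back onto the stored pattern, illustrating recovery from partial or noisy inputs.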
Neurons "attract or repel each other" in state space
- when $w_{ij} > 0$, the contribution of $j$ in the weighted sum of unit $i$ has the same sign as $s_j$; thus, $s_i$ is pulled by $j$ towards its value $s_i = s_j$;
- when $w_{ij} < 0$, the contribution has the opposite sign; then again, $s_i$ is pushed by $j$ towards its value $s_i = -s_j$.

Thus, the values of neurons $i$ and $j$ will converge if the weight between them is positive, and diverge if it is negative.
Convergence properties of discrete and continuous Hopfield networks
Bruck, in his 1990 paper, studied discrete Hopfield networks and proved a generalized convergence theorem that is based on the connection between the network's dynamics and cuts in the associated graph. This generalization covered both asynchronous as well as synchronous dynamics and presented elementary proofs based on greedy algorithms for max-cut in graphs. A subsequent paper further investigated the behavior of any neuron in both discrete-time and continuous-time Hopfield networks when the corresponding energy function is minimized during an optimization process. Bruck showed that neuron $j$ changes its state if and only if it further decreases the following biased pseudo-cut. The discrete Hopfield network minimizes the following biased pseudo-cut for the synaptic weight matrix of the Hopfield net:

$$J_{\text{pseudo-cut}}(k) = \sum_{i \in C_1(k)} \sum_{j \in C_2(k)} w_{ij} + \sum_{j \in C_1(k)} \theta_j,$$

where $C_1(k)$ and $C_2(k)$ represent the sets of neurons which are −1 and +1, respectively, at time $k$. For further details, see the recent paper.
The discrete-time Hopfield network always minimizes exactly the pseudo-cut defined above, while the continuous-time Hopfield network always minimizes an upper bound to the corresponding weighted graph cut in which the binary states are replaced by the graded outputs $g(u_j)$, where $g$ is a zero-centered sigmoid function.
The complex Hopfield network, on the other hand, generally tends to minimize the so-called shadow-cut of the complex weight matrix of the net.
Energy
Hopfield nets have a scalar value associated with each state of the network, referred to as the "energy", $E$, of the network, where

$$E = -\frac{1}{2}\sum_{i,j} w_{ij} s_i s_j + \sum_i \theta_i s_i.$$

This quantity is called "energy" because it either decreases or stays the same upon network units being updated. Furthermore, under repeated updating the network will eventually converge to a state which is a local minimum in the energy function. Thus, if a state is a local minimum in the energy function, it is a stable state for the network. Note that this energy function belongs to a general class of models in physics under the name of Ising models; these in turn are a special case of Markov networks, since the associated probability measure, the Gibbs measure, has the Markov property.
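The monotone behaviour of the energy can be checked numerically: under asynchronous updates with symmetric weights and zero diagonal, the energy never increases. A sketch, assuming zero thresholds and an arbitrary corrupted start state:

```python
import numpy as np

def energy(W, s, theta=None):
    """E = -1/2 * sum_{i,j} w_ij s_i s_j + sum_i theta_i s_i."""
    theta = np.zeros(len(s)) if theta is None else np.asarray(theta)
    return -0.5 * s @ W @ s + theta @ s

# Weights storing one pattern; track E across one asynchronous sweep.
pattern = np.array([1, -1, 1, 1, -1, 1])
W = np.outer(pattern, pattern)
np.fill_diagonal(W, 0)

s = np.array([-1, -1, 1, -1, -1, 1])   # a corrupted state
energies = [energy(W, s)]
for i in range(len(s)):                # update each unit in turn
    s[i] = 1 if W[i] @ s >= 0 else -1
    energies.append(energy(W, s))
```

Each single-unit update either lowers the energy or leaves it unchanged, which is exactly why repeated updating must terminate in a local minimum.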