Connectionism
Connectionism is an approach to the study of human mental processes and cognition that utilizes mathematical models known as connectionist networks or artificial neural networks.
Connectionism has had many "waves" since its beginnings. The first wave appeared in 1943 with Warren Sturgis McCulloch and Walter Pitts, both of whom focused on comprehending neural circuitry through a formal and mathematical approach, and with Frank Rosenblatt, who published the 1958 paper "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain" in Psychological Review while working at the Cornell Aeronautical Laboratory. The first wave ended with the 1969 book Perceptrons by Marvin Minsky and Seymour Papert, which examined the limitations of the original perceptron idea and contributed to discouraging major funding agencies in the US from investing in connectionist research. With a few noteworthy deviations, most connectionist research entered a period of inactivity until the mid-1980s. The term connectionist model was reintroduced in a 1982 paper in the journal Cognitive Science by Jerome Feldman and Dana Ballard.
The second wave blossomed in the late 1980s, following the 1987 book Parallel Distributed Processing by James L. McClelland, David E. Rumelhart, et al., which introduced several improvements to the simple perceptron idea, such as intermediate processors (now known as "hidden layers") alongside the input and output units, and which used a sigmoid activation function instead of the old "all-or-nothing" function. Their work built upon that of John Hopfield, who was a key figure investigating the mathematical characteristics of sigmoid activation functions. From the late 1980s to the mid-1990s, connectionism took on an almost revolutionary tone when Schneider, Terence Horgan, and Tienson posed the question of whether connectionism represented a fundamental shift in psychology and so-called "good old-fashioned AI", or GOFAI. Some advantages of the second-wave connectionist approach included its applicability to a broad array of functions, its structural approximation to biological neurons, its low requirements for innate structure, and its capacity for graceful degradation. Its disadvantages included the difficulty of deciphering how ANNs process information or account for the compositionality of mental representations, and a resultant difficulty explaining phenomena at a higher level.
The current wave has been marked by advances in deep learning, which have made possible the creation of large language models. The success of deep-learning networks in the past decade has greatly increased the popularity of this approach, but the complexity and scale of such networks have brought with them increased interpretability problems.
Basic principle
The central connectionist principle is that mental phenomena can be described by interconnected networks of simple and often uniform units. The form of the connections and the units can vary from model to model. For example, units in the network could represent neurons and the connections could represent synapses, as in the human brain. This principle has been seen as an alternative to GOFAI and the classical theories of mind based on symbolic computation, but the extent to which the two approaches are compatible has been the subject of much debate since their inception.
Activation function
Internal states of any network change over time as neurons send signals to a succeeding layer of neurons, in the case of a feedforward network, or to a previous layer, in the case of a recurrent network. The adoption of non-linear activation functions enabled the second wave of connectionism.
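For illustration, here is a minimal sketch in Python with NumPy, not drawn from any particular model in the literature, contrasting the early "all-or-nothing" step activation with the smooth sigmoid adopted in later networks; the layer sizes and random values are arbitrary assumptions.

```python
import numpy as np

def step(x):
    # "All-or-nothing" activation used in early perceptron-style units:
    # the unit fires (1) only if its summed input exceeds zero.
    return np.where(x > 0, 1.0, 0.0)

def sigmoid(x):
    # Smooth, differentiable activation adopted in second-wave models;
    # its gradient is what makes backpropagation-style learning possible.
    return 1.0 / (1.0 + np.exp(-x))

# One feedforward layer: activations are a vector, connections a weight matrix.
rng = np.random.default_rng(0)
inputs = rng.normal(size=3)          # activation values of three input units
weights = rng.normal(size=(2, 3))    # connections from 3 input units to 2 units
net_input = weights @ inputs         # weighted sum arriving at each unit

print("step output:   ", step(net_input))
print("sigmoid output:", sigmoid(net_input))
```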
Memory and learning
Neural networks follow two basic principles:
- Any mental state can be described as an n-dimensional vector of numeric activation values over neural units in a network.
- Memory and learning are created by modifying the 'weights' of the connections between neural units, generally represented as an n×m matrix. The weights are adjusted according to some learning rule or algorithm, such as Hebbian learning (a minimal sketch appears after this list).
Most of the variety among connectionist models comes from:
- Interpretation of units: Units can be interpreted as neurons or groups of neurons.
- Definition of activation: Activation can be defined in a variety of ways. For example, in a Boltzmann machine, the activation is interpreted as the probability of generating an action potential spike, and is determined via a logistic function on the sum of the inputs to a unit.
- Learning algorithm: Different networks modify their connections differently. In general, any mathematically defined change in connection weights over time is referred to as the "learning algorithm".
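For concreteness, the following is a minimal sketch of the two principles above, assuming a single layer of units, a plain Hebbian rule, and an illustrative learning rate; none of the specific numbers come from the literature.

```python
import numpy as np

# Principle 1: a mental state is an n-dimensional vector of activation values.
pre = np.array([1.0, 0.0, 1.0])      # activations of 3 "presynaptic" units
post = np.array([0.5, 1.0])          # activations of 2 "postsynaptic" units

# Principle 2: memory and learning live in an n x m matrix of connection weights.
weights = np.zeros((2, 3))           # connections from the 3 units to the 2 units
learning_rate = 0.1

# Hebbian rule: strengthen each connection in proportion to the product of the
# activities of the two units it joins ("cells that fire together wire together").
weights += learning_rate * np.outer(post, pre)

# After learning, presenting the input again re-activates the associated output.
print(weights @ pre)
```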
Biological realism
Precursors
Precursors of the connectionist principles can be traced to early work in psychology, such as that of William James. Psychological theories based on knowledge about the human brain were fashionable in the late 19th century. As early as 1869, the neurologist John Hughlings Jackson argued for multi-level, distributed systems. Following this lead, Herbert Spencer's Principles of Psychology, 3rd edition, and Sigmund Freud's Project for a Scientific Psychology propounded connectionist or proto-connectionist theories, though these tended to be speculative. By the early 20th century, Edward Thorndike was writing about human learning in terms that posited a connectionist-type network.
Hopfield networks had precursors in the Ising model due to Wilhelm Lenz and Ernst Ising, though the Ising model as they conceived it did not involve time. Monte Carlo simulations of the Ising model had to await the advent of computers in the 1950s.
The first wave
The first wave began in 1943 with Warren Sturgis McCulloch and Walter Pitts, both of whom focused on comprehending neural circuitry through a formal and mathematical approach. McCulloch and Pitts showed how neural systems could implement first-order logic: their classic paper "A Logical Calculus of the Ideas Immanent in Nervous Activity" was central to this development. They were influenced by the work of Nicolas Rashevsky in the 1930s and by symbolic logic in the style of Principia Mathematica.
Donald Hebb contributed greatly to speculations about neural functioning and proposed a learning principle, Hebbian learning. Karl Lashley argued for distributed representations as a result of his failure to find anything like a localized engram in years of lesion experiments. Friedrich Hayek independently conceived a similar model, first in a brief unpublished manuscript in 1920, which he later expanded into a book in 1952.
The Perceptron machines were proposed and built by Frank Rosenblatt, who published the 1958 paper "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain" in Psychological Review while working at the Cornell Aeronautical Laboratory. He cited Hebb, Hayek, Uttley, and Ashby as his main influences.
Another form of connectionist model was the relational network framework developed by the linguist Sydney Lamb in the 1960s.
The research group led by Widrow empirically searched for methods to train two-layered ADALINE networks, with limited success.
A method to train multilayered perceptrons with arbitrarily many layers of trainable weights was published by Alexey Grigorevich Ivakhnenko and Valentin Lapa in 1965, called the Group Method of Data Handling. This method employs incremental layer-by-layer training based on regression analysis, in which useless units in hidden layers are pruned with the help of a validation set.
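The following is a rough sketch of that layer-by-layer scheme under simplifying assumptions (quadratic units over pairs of features, a fixed number of surviving units per layer, synthetic data); it is meant only to illustrate regression-based training with validation-set pruning, not to reproduce Ivakhnenko and Lapa's exact algorithm.

```python
import numpy as np
from itertools import combinations

def design(x1, x2):
    # Quadratic polynomial features of a pair of inputs (one candidate unit).
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

def gmdh_layer(train_X, val_X, train_y, val_y, keep=4):
    """Fit a quadratic unit for every pair of features by least squares,
    rank the units by validation error, and keep only the best ones."""
    units = []
    for i, j in combinations(range(train_X.shape[1]), 2):
        coef, *_ = np.linalg.lstsq(design(train_X[:, i], train_X[:, j]),
                                   train_y, rcond=None)
        val_pred = design(val_X[:, i], val_X[:, j]) @ coef
        units.append((np.mean((val_pred - val_y) ** 2), i, j, coef))
    units.sort(key=lambda u: u[0])
    best = units[:keep]
    # Outputs of the surviving units become the inputs of the next layer.
    new_train = np.column_stack([design(train_X[:, i], train_X[:, j]) @ c
                                 for _, i, j, c in best])
    new_val = np.column_stack([design(val_X[:, i], val_X[:, j]) @ c
                               for _, i, j, c in best])
    return new_train, new_val, best[0][0]

# Synthetic regression problem, split into training and validation sets.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2 + 0.1 * rng.normal(size=200)
train_X, val_X, train_y, val_y = X[:150], X[150:], y[:150], y[150:]

# Grow layers incrementally; stop when the validation error stops improving.
best_err = np.inf
for depth in range(1, 6):
    train_X, val_X, err = gmdh_layer(train_X, val_X, train_y, val_y)
    print(f"layer {depth}: validation MSE = {err:.4f}")
    if err >= best_err:
        break
    best_err = err
```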
The first multilayered perceptron trained by stochastic gradient descent was published in 1967 by Shun'ichi Amari. In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned useful internal representations to classify non-linearly separable pattern classes.
In 1972, Shun'ichi Amari produced an early example of a self-organizing network.
The neural network winter
There was some conflict among artificial intelligence researchers as to what neural networks were useful for. Around the late 1960s, there was a widespread lull in research and publications on neural networks, "the neural network winter", which lasted through the 1970s, during which the field of artificial intelligence turned towards symbolic methods. The publication of Perceptrons is typically regarded as a catalyst of this event.
The second wave
The second wave began in the early 1980s. Some key publications included Hopfield's 1982 paper that popularized Hopfield networks, the 1986 paper that popularized backpropagation, and the 1987 two-volume book Parallel Distributed Processing by James L. McClelland, David E. Rumelhart, et al., which introduced several improvements to the simple perceptron idea, such as intermediate processors alongside the input and output units and the use of a sigmoid activation function instead of the old 'all-or-nothing' function.
Hopfield approached the field from the perspective of statistical mechanics, providing some early forms of mathematical rigor that increased the perceived respectability of the field. Another important series of publications proved that neural networks are universal function approximators, which also provided some mathematical respectability.
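As a rough illustration of the second-wave recipe of hidden units, sigmoid activations, and backpropagation, the sketch below trains a tiny two-layer network on XOR, a classic non-linearly separable problem; the network size, learning rate, and iteration count are arbitrary choices for the example rather than values from the publications mentioned above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: the classic pattern a single-layer perceptron cannot separate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden ("intermediate") units
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output unit
lr = 0.5

for _ in range(20000):
    # Forward pass with sigmoid activations on every unit.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass (backpropagation of the squared-error gradient).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates of weights and biases.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0]
```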
Some early popular demonstration projects appeared during this time. NETtalk learned to pronounce written English. It achieved popular success, appearing on the Today show. TD-Gammon reached top human level in backgammon.