Types of artificial neural networks
There are many types of artificial neural networks.
Artificial neural networks are computational models inspired by biological neural networks, and are used to approximate functions that are generally unknown. In particular, they are inspired by the behaviour of neurons and the electrical signals they convey between input, processing, and output from the brain. The way neurons semantically communicate is an area of ongoing research. Most artificial neural networks bear only some resemblance to their more complex biological counterparts, but are very effective at their intended tasks.
Some artificial neural networks are adaptive systems and are used for example to model populations and environments, which constantly change.
Neural networks can be hardware- or software-based, and can use a variety of topologies and learning algorithms.
Feedforward
In feedforward neural networks information moves in one direction, from the input layer through any hidden layers to the output layer, without cycles or loops. Feedforward networks can be constructed with various types of units, such as binary McCulloch–Pitts neurons, the simplest of which is the perceptron. Continuous neurons, frequently with sigmoidal activation, are used in the context of backpropagation.
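As an illustration, the following is a minimal sketch of a single perceptron unit with a threshold activation, trained with the classic perceptron learning rule on a toy linearly separable problem (the data, learning rate, and number of epochs are illustrative choices, not taken from the text above).

import numpy as np

# Toy linearly separable problem: the logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)   # input weights
b = 0.0           # bias
lr = 1.0          # learning rate

def step(z):
    # Threshold activation in the spirit of binary McCulloch–Pitts units.
    return (z > 0).astype(float)

# Classic perceptron learning rule: nudge weights by the prediction error.
for epoch in range(20):
    for xi, ti in zip(X, y):
        error = ti - step(xi @ w + b)
        w += lr * error * xi
        b += lr * error

print(step(X @ w + b))   # [0. 0. 0. 1.]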
Group method of data handling
The Group Method of Data Handling (GMDH) features fully automatic structural and parametric model optimization. The node activation functions are Kolmogorov–Gabor polynomials that permit additions and multiplications. GMDH is a supervised learning method that grows the network layer by layer, where each layer is trained by regression analysis; it was used to train some of the earliest deep multilayer perceptrons, including an eight-layer network. Useless units are detected using a validation set and pruned through regularization. The size and depth of the resulting network depend on the task.
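The following is a heavily simplified sketch of the layer-by-layer idea: candidate units are degree-two (Kolmogorov–Gabor style) polynomials of pairs of inputs, each fitted by least squares, and only the candidates that do best on a validation set survive to feed the next layer. The number of units kept, the stopping rule, and the toy data are assumptions for illustration, not a full GMDH implementation.

import numpy as np
from itertools import combinations

def quad_features(a, b):
    # Kolmogorov–Gabor polynomial terms of two variables, up to degree 2.
    return np.column_stack([np.ones_like(a), a, b, a * b, a * a, b * b])

def gmdh_layer(X_tr, y_tr, X_va, y_va, keep=4):
    # Fit a quadratic unit for every pair of inputs; keep the best on validation data.
    candidates = []
    for i, j in combinations(range(X_tr.shape[1]), 2):
        F_tr = quad_features(X_tr[:, i], X_tr[:, j])
        coef, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)   # regression analysis
        F_va = quad_features(X_va[:, i], X_va[:, j])
        err = np.mean((F_va @ coef - y_va) ** 2)              # validation error
        candidates.append((err, i, j, coef))
    candidates.sort(key=lambda c: c[0])                       # prune useless units
    best = candidates[:keep]
    new_tr = np.column_stack([quad_features(X_tr[:, i], X_tr[:, j]) @ c for _, i, j, c in best])
    new_va = np.column_stack([quad_features(X_va[:, i], X_va[:, j]) @ c for _, i, j, c in best])
    return new_tr, new_va, best[0][0]   # survivors become the next layer's inputs

# Grow layers while the best validation error keeps improving.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]
best_err = np.inf
for depth in range(5):
    X_tr, X_va, err = gmdh_layer(X_tr, y_tr, X_va, y_va)
    if err >= best_err:
        break
    best_err = err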
Autoencoder
An autoencoder, autoassociator or Diabolo network is similar to the multilayer perceptron, with an input layer, an output layer and one or more hidden layers connecting them. However, the output layer has the same number of units as the input layer, and its purpose is to reconstruct its own inputs. Autoencoders are therefore unsupervised learning models, used to learn efficient codings, typically for dimensionality reduction and for learning generative models of data.
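A minimal sketch of a one-hidden-layer autoencoder trained by plain gradient descent to reconstruct its input follows; the layer sizes, learning rate, and synthetic data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))           # data to be encoded (rows are examples)

n_in, n_hidden = X.shape[1], 3          # bottleneck smaller than the input
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # decoder weights
lr = 0.01

for _ in range(2000):
    H = np.tanh(X @ W1)                 # code (hidden representation)
    X_hat = H @ W2                      # reconstruction of the input
    err = X_hat - X                     # reconstruction error
    # Backpropagate the squared reconstruction error through decoder and encoder.
    dW2 = H.T @ err / len(X)
    dH = err @ W2.T * (1 - H ** 2)      # tanh derivative
    dW1 = X.T @ dH / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

codes = np.tanh(X @ W1)                 # low-dimensional codes (dimensionality reduction)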
Probabilistic
A probabilistic neural network (PNN) is a four-layer feedforward neural network. The layers are input, hidden pattern, hidden summation, and output. In the PNN algorithm, the parent probability distribution function (PDF) of each class is approximated by a Parzen window and a non-parametric function. Then, using the PDF of each class, the class probability of a new input is estimated and Bayes' rule is employed to allocate it to the class with the highest posterior probability. The PNN was derived from the Bayesian network and a statistical algorithm called kernel Fisher discriminant analysis. It is used for classification and pattern recognition.
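A minimal sketch of the PNN decision rule: each class density is estimated by averaging Gaussian Parzen windows centred on that class's training points, and a new input is assigned to the class with the highest estimated posterior. The kernel bandwidth and the implicit equal class priors are assumptions.

import numpy as np

def pnn_classify(x, X_train, y_train, sigma=0.5):
    # Parzen-window class densities combined with Bayes' rule (equal priors assumed).
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]                  # pattern units for class c
        d2 = np.sum((Xc - x) ** 2, axis=1)          # squared distances to stored patterns
        # Summation unit: average of Gaussian kernels, a Parzen estimate of p(x | c).
        scores.append(np.mean(np.exp(-d2 / (2 * sigma ** 2))))
    return classes[int(np.argmax(scores))]          # output unit: highest posterior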
Time delay
A time delay neural network (TDNN) is a feedforward architecture for sequential data that recognizes features independently of their position in the sequence. In order to achieve time-shift invariance, delays are added to the input so that multiple consecutive data points are analyzed together. It usually forms part of a larger pattern recognition system. It has been implemented using a perceptron network whose connection weights were trained with backpropagation.
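A minimal sketch of the delay mechanism: the input at time t is concatenated with delayed copies of itself, so that one feedforward pass sees several consecutive time steps at once. The window length and the downstream classifier are left out and would be task-specific.

import numpy as np

def add_delays(x, n_delays=2):
    # Stack x[t], x[t-1], ..., x[t-n_delays] into one input vector per time step.
    T, d = x.shape
    frames = []
    for t in range(n_delays, T):
        frames.append(np.concatenate([x[t - k] for k in range(n_delays + 1)]))
    return np.array(frames)          # shape: (T - n_delays, d * (n_delays + 1))

# Example: a 2-channel signal of length 6 becomes 4 frames of width 6.
signal = np.arange(12, dtype=float).reshape(6, 2)
print(add_delays(signal).shape)      # (4, 6)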
Convolutional
A convolutional neural network (CNN) is a class of deep network composed of one or more convolutional layers with fully connected layers on top. It uses tied weights and pooling layers, in particular max-pooling, and is often structured via Fukushima's convolutional architecture. CNNs are variations of multilayer perceptrons that use minimal preprocessing. This architecture allows CNNs to take advantage of the 2D structure of input data. Their unit connectivity pattern is inspired by the organization of the visual cortex: units respond to stimuli in a restricted region of space known as the receptive field, and receptive fields partially overlap, together covering the entire visual field. Unit response can be approximated mathematically by a convolution operation.
CNNs are suitable for processing visual and other two-dimensional data. They have shown superior results in both image and speech applications. They can be trained with standard backpropagation. CNNs are easier to train than other regular, deep, feed-forward neural networks and have many fewer parameters to estimate.
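As a concrete illustration of the operations described above, the following sketch implements a single-channel 2D convolution with one tied kernel shared across positions, followed by 2x2 max-pooling. The kernel values, image size, and the ReLU non-linearity are illustrative assumptions.

import numpy as np

def conv2d(image, kernel):
    # Valid 2D convolution with a single tied-weight kernel shared across positions.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output unit responds only to its local receptive field.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    # Non-overlapping max-pooling over size x size blocks.
    H, W = x.shape
    H, W = H - H % size, W - W % size
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

image = np.random.default_rng(0).normal(size=(8, 8))
kernel = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])   # vertical-edge filter
features = max_pool(np.maximum(conv2d(image, kernel), 0.0))         # conv -> ReLU -> pool
print(features.shape)   # (3, 3)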
Capsule Neural Networks add structures called capsules to a CNN and reuse output from several capsules to form more stable representations.
Examples of applications in computer vision include DeepDream and robot navigation. They have wide applications in image and video recognition, recommender systems and natural language processing.
Deep stacking network
A deep stacking network (DSN) is based on a hierarchy of blocks of simplified neural network modules. It was introduced in 2011 by Deng and Yu. It formulates the learning as a convex optimization problem with a closed-form solution, emphasizing the mechanism's similarity to stacked generalization. Each DSN block is a simple module that is easy to train by itself in a supervised fashion, without backpropagation through the entire stack of blocks.
Each block consists of a simplified multi-layer perceptron with a single hidden layer. The hidden layer h has logistic sigmoidal units, and the output layer has linear units. Connections between these layers are represented by weight matrix U; input-to-hidden-layer connections have weight matrix W. Target vectors t form the columns of matrix T, and the input data vectors x form the columns of matrix X. The matrix of hidden units is H = σ(WᵀX), where σ denotes the element-wise logistic sigmoid operation. Modules are trained in order, so the lower-layer weights W are known at each stage. Each block estimates the same final label class y, and its estimate is concatenated with the original input X to form the expanded input for the next block. Thus, the input to the first block contains the original data only, while downstream blocks' input adds the output of preceding blocks. Learning the upper-layer weight matrix U given the other weights can then be formulated as the convex problem of minimizing ‖UᵀH − T‖², the squared Frobenius norm of the prediction error, which has a closed-form least-squares solution.
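Under the notation above, one DSN block's upper-layer fit can be sketched as follows: with W fixed, H = σ(WᵀX), and the least-squares problem for U is solved in closed form. The matrix sizes, the random data, and the small ridge term added for numerical stability are assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n = 20, 50, 3, 500
X = rng.normal(size=(n_in, n))         # input vectors as columns of X
T = rng.normal(size=(n_out, n))        # target vectors as columns of T
W = rng.normal(size=(n_in, n_hidden))  # lower-layer weights, fixed at this stage

H = sigmoid(W.T @ X)                   # hidden-unit matrix H = sigma(W^T X)
ridge = 1e-3                           # small regularizer (assumption, for stability)
# Closed-form least-squares solution of minimizing ||U^T H - T||^2 over U.
U = np.linalg.solve(H @ H.T + ridge * np.eye(n_hidden), H @ T.T)

Y = U.T @ H                            # the block's estimate of the targets
X_next = np.vstack([X, Y])             # expanded input for the next block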
Unlike other deep architectures, such as DBNs, the goal is not to discover the transformed feature representation. The structure of the hierarchy of this kind of architecture makes parallel learning straightforward, as a batch-mode optimization problem. In purely discriminative tasks, DSNs outperform conventional DBNs.
Tensor deep stacking networks
This architecture is a DSN extension. It offers two important improvements: it uses higher-order information from covariance statistics, and it transforms the non-convex problem of a lower layer to a convex sub-problem of an upper layer. TDSNs use covariance statistics in a bilinear mapping from each of two distinct sets of hidden units in the same layer to predictions, via a third-order tensor.
While parallelization and scalability are not considered seriously in conventional DNNs, all learning for DSNs and TDSNs is done in batch mode, to allow parallelization. Parallelization allows scaling the design to larger architectures and data sets.
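A minimal sketch of the bilinear prediction step: two hidden representations from the same layer are combined through a third-order weight tensor, which amounts to one bilinear form per output unit. The tensor shape and values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
h1 = rng.normal(size=10)            # first set of hidden units
h2 = rng.normal(size=12)            # second set of hidden units
T3 = rng.normal(size=(10, 12, 4))   # third-order tensor mapping (h1, h2) to 4 outputs

# Bilinear mapping: y_k = sum_ij h1_i * h2_j * T3[i, j, k]
y = np.einsum('i,j,ijk->k', h1, h2, T3)
print(y.shape)   # (4,)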
The basic architecture is suitable for diverse tasks such as classification and regression.
Physics-informed
A physics-informed neural network (PINN) is designed for the numerical solution of mathematical equations, such as differential, integral, delay and fractional equations. A PINN accepts variables as input parameters and transmits them through the network block. At the output, it produces an approximate solution and substitutes it into the mathematical model, taking the initial and boundary conditions into account. If the solution does not satisfy the required accuracy, backpropagation is used to refine the solution.
Besides PINNs, other architectures have been developed to produce surrogate models for scientific computing tasks. Examples include the DeepONet, integral neural operators, and neural fields.
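As an illustration of the residual-based PINN training described above, the following sketch sets up the loss for the simple ordinary differential equation u'(x) = -u(x) with u(0) = 1. A tiny network produces the approximate solution; its derivative is estimated here with finite differences rather than the automatic differentiation a real PINN would use, and the network size and collocation points are assumptions.

import numpy as np

def u_net(x, params):
    # Tiny one-hidden-layer network approximating the solution u(x).
    W1, b1, W2, b2 = params
    return np.tanh(np.outer(x, W1) + b1) @ W2 + b2

def pinn_loss(params, x_col, eps=1e-4):
    # Physics residual for u'(x) = -u(x), plus the initial condition u(0) = 1.
    u = u_net(x_col, params)
    # Central finite differences stand in for automatic differentiation here.
    du = (u_net(x_col + eps, params) - u_net(x_col - eps, params)) / (2 * eps)
    residual = du + u                           # u' + u should vanish on the domain
    ic = u_net(np.array([0.0]), params) - 1.0   # initial/boundary condition
    return np.mean(residual ** 2) + np.mean(ic ** 2)

rng = np.random.default_rng(0)
H = 16
params = (rng.normal(size=H), rng.normal(size=H), rng.normal(scale=0.1, size=H), 0.0)
x_col = np.linspace(0.0, 2.0, 50)   # collocation points inside the domain
print(pinn_loss(params, x_col))     # this loss would be minimized by backpropagation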
Regulatory feedback
Regulatory feedback networks account for feedback found throughout brain recognition processing areas. Instead of recognition-inference being purely feedforward, as in conventional neural networks, regulatory feedback assumes that inference iteratively compares inputs to outputs and that neurons inhibit their own inputs, collectively evaluating how important and unique each input is for the next iteration. This ultimately finds the neuron activations that minimize mutual input overlap, estimating distributions during recognition and reducing the need for complex network training and rehearsal.
Regulatory feedback processing suggests an important real-time recognition role for the ubiquitous feedback found between pre- and post-synaptic neurons in the brain, which is meticulously maintained by homeostatic plasticity through multiple, often redundant, mechanisms. Without additional parameters, regulatory feedback also reproduces neuroscience phenomena such as excitation-inhibition balance and network-wide bursting followed by quieting, as well as human cognitive search phenomena such as difficulty with similar items and pop-out when multiple inputs are present.
A regulatory feedback network makes inferences using negative feedback. The feedback is used to find the optimal activation of units. It is most similar to a non-parametric method but is different from K-nearest neighbor in that it mathematically emulates feedforward networks.
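The following is a rough sketch of inference by divisive negative feedback in the spirit of the description above: output units repeatedly redistribute the input among themselves, each input being inhibited (divided) by the total feedback it receives from all active outputs. This is one plausible toy update rule under the assumption of binary connection weights; the exact formulation used in the literature may differ.

import numpy as np

def regulatory_feedback(x, W, n_iter=50, eps=1e-9):
    # W[i, a] = 1 if input i feeds output a (binary weights assumed for this sketch).
    n_inputs, n_outputs = W.shape
    y = np.ones(n_outputs) / n_outputs            # start from uniform activations
    n_a = np.maximum(W.sum(axis=0), 1)            # number of inputs used by each output
    for _ in range(n_iter):
        q = W @ y + eps                           # feedback each input receives from outputs
        # Each output collects its inputs, with every input divided by the total
        # expectation placed on it, then rescales its own activation.
        y = (y / n_a) * (W.T @ (x / q))
    return y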
Radial basis function
Radial basis functions are functions that have a distance criterion with respect to a center. Radial basis functions have been applied as a replacement for the sigmoidal hidden-layer transfer characteristic in multi-layer perceptrons. RBF networks have two layers: in the first, the input is mapped onto each RBF in the 'hidden' layer. The RBF chosen is usually a Gaussian. In regression problems the output layer is a linear combination of hidden-layer values representing the mean predicted output. The interpretation of this output-layer value is the same as that of a regression model in statistics. In classification problems the output layer is typically a sigmoid function of a linear combination of hidden-layer values, representing a posterior probability. Performance in both cases is often improved by shrinkage techniques, known as ridge regression in classical statistics, which correspond to a prior belief in small parameter values in a Bayesian framework.
RBF networks have the advantage of not suffering from local minima in the way that multi-layer perceptrons do, because the only parameters adjusted in the learning process are the linear mapping from hidden layer to output layer. Linearity ensures that the error surface is quadratic and therefore has a single, easily found minimum. In regression problems this minimum can be found in one matrix operation. In classification problems the fixed non-linearity introduced by the sigmoid output function is most efficiently dealt with using iteratively re-weighted least squares.
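A minimal sketch of RBF regression follows: Gaussian basis functions centred on a subset of the training points form the hidden layer, and the linear output layer is fitted in a single ridge-regression (shrinkage) step. The centre selection, bandwidth, ridge penalty, and toy data are illustrative choices.

import numpy as np

def rbf_design(X, centres, sigma):
    # Gaussian RBF activation of every input with respect to every centre.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

centres = X[rng.choice(len(X), size=20, replace=False)]   # centres drawn from the input data
sigma, ridge = 0.7, 1e-2
Phi = rbf_design(X, centres, sigma)                        # hidden-layer values

# Linear output layer fitted in one matrix operation (ridge regression / shrinkage).
w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(len(centres)), Phi.T @ y)

y_hat = rbf_design(X, centres, sigma) @ w                  # mean predicted output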
RBF networks have the disadvantage of requiring good coverage of the input space by radial basis functions. RBF centres are determined with reference to the distribution of the input data, but without reference to the prediction task. As a result, representational resources may be wasted on areas of the input space that are irrelevant to the task. A common solution is to associate each data point with its own centre, although this can expand the linear system to be solved in the final layer and requires shrinkage techniques to avoid overfitting.
Associating each input datum with an RBF leads naturally to kernel methods such as support vector machines (SVMs) and Gaussian processes. All three approaches use a non-linear kernel function to project the input data into a space where the learning problem can be solved using a linear model. Like Gaussian processes, and unlike SVMs, RBF networks are typically trained in a maximum likelihood framework by maximizing the probability of the data (minimizing the error); SVMs avoid overfitting by maximizing a margin instead. SVMs outperform RBF networks in most classification applications. In regression applications RBF networks can be competitive when the dimensionality of the input space is relatively small.