Particle swarm optimization
In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a population of candidate solutions with regard to a given measure of quality. It solves a problem through interactions among a population of candidate solutions, dubbed particles, moving the particles around in the search-space according to simple mathematical formulae that adjust each particle's position and velocity. Each particle's movement is influenced by its own best known position and by the best known position in its topological neighbourhood; both are updated as better positions are found. This is expected to move the swarm toward good solutions.
PSO is originally attributed to Kennedy and Eberhart and was first intended for simulating social behaviour, as a stylized representation of the movement of organisms in a bird flock or fish school, or the evolution of attitudes in a human population. The simulation of these principles of social behaviour was observed to be capable of solving hard mathematical problems. The book by Kennedy and Eberhart describes many philosophical aspects of PSO and swarm intelligence. An extensive survey of PSO applications is made by Poli. In 2017, a comprehensive review of theoretical and experimental works on PSO was published by Bonyadi and Michalewicz.
PSO is a metaheuristic as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. Also, PSO does not use the gradient of the problem being optimized, which means PSO does not require that the optimization problem be differentiable, as is required by classic optimization methods such as gradient descent and quasi-Newton methods. However, metaheuristics such as PSO do not guarantee an optimal solution is ever found.
Algorithm
A basic variant of the PSO algorithm is initialized with a connected population of candidate solutions. A candidate solution is a vector of numeric values which can be considered as coordinates of a point in a search space; as an iteratively moving point it can be conceptualized as a particle. The particles move around in the search-space according to a few simple formulae. Each particle has some neighbours it is connected to, where the neighbourhood may comprise a few or all other population members. The next position of a particle is stochastically determined by its own best-so-far position in the search space as well as the best-so-far position of its best neighbour. When an improved position is discovered (one that produces a better result in the objective function), the best-so-far position of the particle is updated. The process is repeated, and by doing so it is expected, but not guaranteed, that a satisfactory solution will eventually be discovered.

Formally, let f : ℝⁿ → ℝ be the cost function which must be minimized. The function takes a candidate solution as an argument in the form of a vector of real numbers and produces a real number as output, which indicates the objective function value of the given candidate solution. The gradient of f is not known. The goal is to find a solution a for which f(a) ≤ f(b) for all b in the search-space, which would mean a is the global minimum.
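For concreteness, here is a minimal Python sketch of one such cost function. The sphere benchmark is chosen purely as an illustration; any black-box function of a real vector fits the definition above.

    import numpy as np

    # Example cost function f : R^n -> R. The "sphere" benchmark is a standard
    # illustrative test problem; its global minimum is f(0, ..., 0) = 0.
    def f(x: np.ndarray) -> float:
        return float(np.sum(x ** 2))

    print(f(np.array([1.0, -2.0, 0.5])))  # 5.25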
Let S be the number of particles in the swarm, each having a position xi ∈ ℝⁿ in the search-space and a velocity vi ∈ ℝⁿ. Let pi be the best known position of particle i and let g be the best known position of the entire swarm. A basic PSO algorithm to minimize the cost function is then:
for each particle i = 1, ..., S do
    Initialize the particle's position with a uniformly distributed random vector: xi ~ U(blo, bup)
    Initialize the particle's best known position to its initial position: pi ← xi
    if f(pi) < f(g) then
        update the swarm's best known position: g ← pi
    Initialize the particle's velocity: vi ~ U(-|bup - blo|, |bup - blo|)
while a termination criterion is not met do:
    for each particle i = 1, ..., S do
        for each dimension d = 1, ..., n do
            Pick random numbers: rp, rg ~ U(0, 1)
            Update the particle's velocity: vi,d ← w vi,d + φp rp (pi,d - xi,d) + φg rg (gd - xi,d)
        Update the particle's position: xi ← xi + vi
        if f(xi) < f(pi) then
            Update the particle's best known position: pi ← xi
            if f(pi) < f(g) then
                Update the swarm's best known position: g ← pi
The values blo and bup represent the lower and upper boundaries of the search-space respectively. The w parameter is the inertia weight. The parameters φp and φg are often called the cognitive coefficient and the social coefficient, respectively.
The termination criterion can be the number of iterations performed, or the discovery of a solution with an adequate objective function value. The parameters w, φp, and φg are selected by the practitioner and control the behaviour and efficacy of the PSO method.
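The following is a minimal sketch of the above basic variant in Python with NumPy. It is an illustration, not a reference implementation: the function name pso, the sphere test function, and the parameter values (w = 0.729, φp = φg = 1.49445, common choices in the literature) are all assumptions made for the example.

    import numpy as np

    def pso(f, n, b_lo, b_up, S=40, w=0.729, phi_p=1.49445, phi_g=1.49445,
            max_iter=1000, seed=None):
        # Basic global-best PSO minimizing f over the box [b_lo, b_up]^n.
        rng = np.random.default_rng(seed)
        span = b_up - b_lo

        # Initialize positions uniformly in the search-space and velocities
        # in [-|bup - blo|, |bup - blo|], as in the pseudocode above.
        x = rng.uniform(b_lo, b_up, size=(S, n))
        v = rng.uniform(-span, span, size=(S, n))

        p = x.copy()                           # best known position of each particle
        p_val = np.array([f(xi) for xi in x])  # its objective function value
        g = p[np.argmin(p_val)].copy()         # swarm's best known position
        g_val = p_val.min()

        for _ in range(max_iter):
            r_p = rng.random((S, n))           # fresh randomness per particle and dimension
            r_g = rng.random((S, n))
            v = w * v + phi_p * r_p * (p - x) + phi_g * r_g * (g - x)
            x = x + v

            vals = np.array([f(xi) for xi in x])
            improved = vals < p_val            # update personal bests where improved
            p[improved] = x[improved]
            p_val[improved] = vals[improved]
            if p_val.min() < g_val:            # update the swarm's best known position
                g_val = p_val.min()
                g = p[np.argmin(p_val)].copy()
        return g, g_val

    # Usage: minimize the sphere function in 5 dimensions.
    sphere = lambda x: float(np.sum(x ** 2))
    best, best_val = pso(sphere, n=5, b_lo=-10.0, b_up=10.0, seed=0)
    print(best_val)  # should be close to 0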
Parameter selection
The choice of PSO parameters can have a large impact on optimization performance, and selecting parameters that yield good performance has therefore been the subject of much research. To prevent divergence, the inertia weight w must be smaller than 1. The two other parameters can then be derived via the constriction approach, or freely selected, but the analyses suggest convergence domains to constrain them.
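As a concrete illustration of the constriction approach of Clerc and Kennedy, the coefficients can be derived from a single parameter φ = φp + φg > 4. The following sketch computes the constriction factor; the choice φp = φg = 2.05 (giving χ ≈ 0.7298) is a common one in the literature, not a prescription.

    import math

    def constriction_factor(phi_p, phi_g):
        # Clerc-Kennedy constriction coefficient; requires phi_p + phi_g > 4.
        phi = phi_p + phi_g
        assert phi > 4, "constriction requires phi_p + phi_g > 4"
        return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

    chi = constriction_factor(2.05, 2.05)
    print(chi)  # ~0.7298; used as the inertia weight w, with the velocity
                # coefficients scaled to chi * phi_p and chi * phi_g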
The PSO parameters can also be tuned by using another overlaying optimizer, a concept known as meta-optimization, or even fine-tuned during the optimization, e.g., by means of fuzzy logic.
Parameters have also been tuned for various optimization scenarios.
Neighbourhoods and topologies
The topology of the swarm defines the subset of particles with which each particle can exchange information. The basic version of the algorithm uses the global topology as the swarm communication structure. This topology allows all particles to communicate with all other particles, so the whole swarm shares the same best position g from a single particle. However, this approach might lead the swarm to become trapped in a local minimum, so different topologies have been used to control the flow of information among particles. For instance, in local topologies, particles only share information with a subset of particles. This subset can be a geometrical one (for example, "the m nearest particles") or, more often, a social one, i.e. a set of particles that does not depend on any distance. In such cases, the PSO variant is said to be local best.

A commonly used swarm topology is the ring, in which each particle has just two neighbours, but there are many others. The topology is not necessarily static. In fact, since the topology is related to the diversity of communication of the particles, some efforts have been made to create adaptive topologies.
By using the ring topology, PSO can attain generation-level parallelism, significantly enhancing evolutionary speed.
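To illustrate, a local-best (ring) variant replaces the single global g with a per-particle neighbourhood best. Below is a minimal sketch of that lookup, assuming the p and p_val arrays from the implementation sketch above; the function name ring_local_best is hypothetical.

    import numpy as np

    def ring_local_best(p, p_val):
        # For each particle, the best known position among itself and its two
        # ring neighbours (indices wrap around); replaces the single global g.
        S = len(p_val)
        l = np.empty_like(p)
        for i in range(S):
            neighbours = [(i - 1) % S, i, (i + 1) % S]
            l[i] = p[min(neighbours, key=lambda j: p_val[j])]
        return l

    # In the velocity update, particle i then uses l[i] in place of g.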
Inner workings
There are several schools of thought as to why and how the PSO algorithm can perform optimization. A common belief amongst researchers is that the swarm behaviour varies between exploratory behaviour, that is, searching a broader region of the search-space, and exploitative behaviour, that is, a locally oriented search so as to get closer to an optimum. This school of thought has been prevalent since the inception of PSO, and contends that the PSO algorithm and its parameters must be chosen so as to properly balance between exploration and exploitation, avoiding premature convergence to a local optimum while still ensuring a good rate of convergence to the optimum. This belief is the precursor of many PSO variants, see below.
Another school of thought is that the behaviour of a PSO swarm is not well understood in terms of how it affects actual optimization performance, especially for higher-dimensional search-spaces and optimization problems that may be discontinuous, noisy, and time-varying. This school of thought merely tries to find PSO algorithms and parameters that cause good performance regardless of how the swarm behaviour can be interpreted in relation to e.g. exploration and exploitation. Such studies have led to the simplification of the PSO algorithm, see below.
Convergence
In relation to PSO, the word convergence typically refers to two different definitions:
- Convergence of the sequence of solutions, in which all particles have converged to a point in the search-space, which may or may not be the optimum;
- Convergence to a local optimum, where all personal bests p, or alternatively the swarm's best known position g, approaches a local optimum of the problem, regardless of how the swarm behaves.
Convergence to a local optimum has been analyzed for PSO in several theoretical studies. It has been proven that PSO needs some modification to guarantee finding a local optimum.
This means that determining the convergence capabilities of different PSO algorithms and parameters still depends on empirical results. One attempt at addressing this issue is the development of an "orthogonal learning" strategy for an improved use of the information already existing in the relationship between p and g, so as to form a leading converging exemplar and to be effective with any PSO topology. The aims are to improve the performance of PSO overall, including faster global convergence, higher solution quality, and stronger robustness. However, such studies do not provide theoretical evidence to actually prove their claims.