A* search algorithm


A* is a graph traversal and pathfinding algorithm that is used in many fields of computer science due to its completeness, optimality, and optimal efficiency. Given a weighted graph, a source node and a goal node, the algorithm finds the shortest path from source to goal.
One major practical drawback is its space complexity where is the depth of the shallowest solution and is the branching factor. In practical travel-routing systems, it is generally outperformed by algorithms that can pre-process the graph to attain better performance, as well as by memory-bounded approaches; however, A* is still the best solution in many cases.
Peter Hart, Nils Nilsson and Bertram Raphael of Stanford Research Institute first published the algorithm in 1968. It can be seen as an extension of Dijkstra's algorithm. A* achieves better performance by using heuristics to guide its search.
The A* algorithm terminates once it finds the shortest path to a specified goal, rather than generating the entire shortest-path tree from a specified source to all possible goals.

History

A* was created as part of the Shakey project, which had the aim of building a mobile robot that could plan its own actions. Nils Nilsson originally proposed using the Graph Traverser algorithm for Shakey's path planning. Graph Traverser is guided by a heuristic function, the estimated distance from node to the goal node: it entirely ignores, the distance from the start node to. Bertram Raphael suggested using the sum,. Peter Hart invented the concepts we now call admissibility and consistency of heuristic functions. A* was originally designed for finding least-cost paths when the cost of a path is the sum of its costs, but it has been shown that A* can be used to find optimal paths for any problem satisfying the conditions of a cost algebra.
The original 1968 A* paper contained a theorem stating that no A*-like algorithm could expand fewer nodes than A* if the heuristic function is consistent and A*'s tie-breaking rule is suitably chosen. A "correction" was published a few years later claiming that consistency was not required, but this was shown to be false in 1985 in Dechter and Pearl's definitive study of A*'s optimality, which gave an example of A* with a heuristic that was admissible but not consistent expanding arbitrarily more nodes than an alternative A*-like algorithm.

Description

A* is an informed search algorithm, or a best-first search, meaning that it is formulated in terms of weighted graphs: starting from a specific starting node of a graph, it aims to find a path to the given goal node having the smallest cost. It does this by maintaining a tree of paths originating at the start node and extending those paths one edge at a time until the goal node is reached.
At each iteration of its main loop, A* needs to determine which of its paths to extend. It does so based on the cost of the path and an estimate of the cost required to extend the path all the way to the goal. Specifically, A* selects the path that minimizes
where is the next node on the path, is the cost of the path from the start node to, and is a heuristic function that estimates the cost of the cheapest path from to the goal. The heuristic function is problem-specific.
Typical implementations of A* use a priority queue to perform the repeated selection of minimum cost nodes to expand. This priority queue is known as the open set, fringe or frontier. At each step of the algorithm, the node with the lowest value is removed from the queue, the and values of its neighbors are updated accordingly, and these neighbors are added to the queue. The algorithm continues until a removed node is a goal node. The value of that goal is then also the cost of the shortest path, since at the goal is zero in an admissible heuristic.
The algorithm described so far only gives the length of the shortest path. To find the actual sequence of steps, the algorithm can be easily revised so that each node on the path keeps track of its predecessor. After this algorithm is run, the ending node will point to its predecessor, and so on, until some node's predecessor is the start node.
As an example, when searching for the shortest route on a map, might represent the straight-line distance to the goal, since that is physically the smallest possible distance between any two points. For a grid map from a video game, using the Taxicab distance or the Chebyshev distance becomes better depending on the set of movements available.
If the heuristic satisfies the additional condition for every edge of the graph, then is called monotone, or consistent. With a consistent heuristic, A* is guaranteed to find an optimal path without processing any node more than once and A* is equivalent to running Dijkstra's algorithm with the reduced cost.

Pseudocode

The following pseudocode describes the algorithm:

function reconstruct_path
total_path :=
while current in cameFrom.Keys:
current := cameFrom
total_path.prepend
return total_path
// A* finds a path from start to goal.
// h is the heuristic function. h estimates the cost to reach goal from node n.
function A_Star
// The set of discovered nodes that may need to be expanded.
// Initially, only the start node is known.
// This is usually implemented as a min-heap or priority queue rather than a hash-set.
openSet :=
// For node n, cameFrom is the node immediately preceding it on the cheapest path from the start
// to n currently known.
cameFrom := an empty map
// For node n, gScore is the currently known cost of the cheapest path from start to n.
gScore := map with default value of Infinity
gScore := 0
// For node n, fScore := gScore + h. fScore represents our current best guess as to
// how cheap a path could be from start to finish if it goes through n.
fScore := map with default value of Infinity
fScore := h
while openSet is not empty
// This operation can occur in O time if openSet is a min-heap or a priority queue
current := the node in openSet having the lowest fScore value
if current = goal
return reconstruct_path
openSet.Remove
for each neighbor of current
// d is the weight of the edge from current to neighbor
// tentative_gScore is the distance from start to the neighbor through current
tentative_gScore := gScore + d
if tentative_gScore < gScore
// This path to neighbor is better than any previous one. Record it!
cameFrom := current
gScore := tentative_gScore
fScore := tentative_gScore + h
if neighbor not in openSet
openSet.add
// Open set is empty but goal was never reached
return failure

Remark: In this pseudocode, if a node is reached by one path, removed from openSet, and subsequently reached by a cheaper path, it will be added to openSet again. This is essential to guarantee that the path returned is optimal if the heuristic function is admissible but not consistent. If the heuristic is consistent, when a node is removed from openSet the path to it is guaranteed to be optimal so the test ‘tentative_gScore < gScore’ will always fail if the node is reached again. The pseudocode implemented here is sometimes called the graph-search version of A*. This is in contrast with the version without the ‘tentative_gScore < gScore’ test to add nodes back to openSet, which is sometimes called the tree-search version of A* and require a consistent heuristic to guarantee optimality.
Image:Astar progress animation.gif|thumb|Illustration of A* search for finding path from a start node to a goal node in a robot motion planning problem. The empty circles represent the nodes in the open set, i.e., those that remain to be explored, and the filled ones are in the closed set. Color on each closed node indicates the distance from the goal: the greener, the closer. One can first see the A* moving in a straight line in the direction of the goal, then when hitting the obstacle, it explores alternative routes through the nodes from the open set.

Example

An example of an A* algorithm in action where nodes are cities connected with roads and h is the straight-line distance to the target point:
Key: green: start; blue: goal; orange: visited
The A* algorithm has real-world applications. In this example, edges are railroads and h is the great-circle distance to the target. The algorithm is searching for a path between Washington, D.C., and Los Angeles.

Implementation details

There are a number of simple optimizations or implementation details that can significantly affect the performance of an A* implementation. The first detail to note is that the way the priority queue handles ties can have a significant effect on performance in some situations. If ties are broken so the queue behaves in a LIFO manner, A* will behave like depth-first search among equal cost paths.
When a path is required at the end of the search, it is common to keep with each node a reference to that node's parent. At the end of the search, these references can be used to recover the optimal path. If these references are being kept then it can be important that the same node doesn't appear in the priority queue more than once. A standard approach here is to check if a node about to be added already appears in the priority queue. If it does, then the priority and parent pointers are changed to correspond to the lower-cost path. A standard binary heap based priority queue does not directly support the operation of searching for one of its elements, but it can be augmented with a hash table that maps elements to their position in the heap, allowing this decrease-priority operation to be performed in logarithmic time. Alternatively, a Fibonacci heap can perform the same decrease-priority operations in constant amortized time.