Heap (data structure)


In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is the parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.
The heap is one maximally efficient implementation of an abstract data type called a priority queue, and in fact, priority queues are often referred to as "heaps", regardless of how they may be implemented. In a heap, the highest priority element is always stored at the root. However, a heap is not a sorted structure; it can be regarded as being partially ordered. A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest priority, or when insertions need to be interspersed with removals of the root node.
A common implementation of a heap is the binary heap, in which the tree is a complete binary tree. The heap data structure, specifically the binary heap, was introduced by J. W. J. Williams in 1964, as a data structure for the heapsort sorting algorithm. Heaps are also crucial in several efficient graph algorithms such as Dijkstra's algorithm. When a heap is a complete binary tree, it has the smallest possible height—a heap with N nodes and a branches for each node always has loga N height.
Note that, as shown in the graphic, there is no implied ordering between siblings or cousins and no implied sequence for an in-order traversal. The heap relation mentioned above applies only between nodes and their parents, grandparents. The maximum number of children each node can have depends on the type of heap.
Heaps are typically constructed in-place in the same array where the elements are stored, with their structure being implicit in the access pattern of the operations. Heaps differ in this way from other data structures with similar or in some cases better theoretic bounds such as radix trees in that they require no additional memory beyond that used for storing the keys.

Operations

The common operations involving heaps are:
;Basic
  • find-max : find a maximum item of a max-heap, or a minimum item of a min-heap, respectively
  • insert: adding a new key to the heap
  • extract-max : returns the node of maximum value from a max heap after removing it from the heap
  • delete-max : removing the root node of a max heap, respectively
  • replace: pop root and push a new key. This is more efficient than a pop followed by a push, since it only needs to balance once, not twice, and is appropriate for fixed-size heaps.
;Creation
  • create-heap: create an empty heap
  • heapify: create a heap out of given array of elements
  • merge : joining two heaps to form a valid new heap containing all the elements of both, preserving the original heaps.
  • meld: joining two heaps to form a valid new heap containing all the elements of both, destroying the original heaps.
;Inspection
  • size: return the number of items in the heap.
  • is-empty: return true if the heap is empty, false otherwise.
;Internal
  • increase-key or decrease-key: updating a key within a max- or min-heap, respectively
  • delete: delete an arbitrary node
  • sift-up: move a node up in the tree, as long as needed; used to restore heap condition after insertion. Called "sift" because node moves up the tree until it reaches the correct level, as in a sieve.
  • sift-down: move a node down in the tree, similar to sift-up; used to restore heap condition after deletion or replacement.

    Implementation using arrays

Heaps are usually implemented with an array, as follows:
  • Each element in the array represents a node of the heap, and
  • The parent / child relationship is defined implicitly by the elements' indices in the array.
For a binary heap, in the array, the first index contains the root element. The next two indices of the array contain the root's children. The next four indices contain the four children of the root's two child nodes, and so on. Therefore, given a node at index, its children are at indices and, and its parent is at index. This simple indexing scheme makes it efficient to move "up" or "down" the tree.
Balancing a heap is done by sift-up or sift-down operations. As we can build a heap from an array without requiring extra memory, heapsort can be used to sort an array in-place.
After an element is inserted into or deleted from a heap, the heap property may be violated, and the heap must be re-balanced by swapping elements within the array.
Although different types of heaps implement the operations differently, the most common way is as follows:
  • Insertion: Add the new element at the end of the heap, in the first available free space. If this will violate the heap property, sift up the new element until the heap property has been reestablished.
  • Extraction: Remove the root and insert the last element of the heap in the root. If this will violate the heap property, sift down the new root to reestablish the heap property.
  • Replacement: Remove the root and put the new element in the root and sift down. When compared to extraction followed by insertion, this avoids a sift up step.
Construction of a binary heap out of a given array of elements may be performed in linear time using the classic Floyd algorithm, with the worst-case number of comparisons equal to 2N − 2s2e2, where s2 is the sum of all digits of the binary representation of N and e2 is the exponent of 2 in the prime factorization of N. This is faster than a sequence of consecutive insertions into an originally empty heap, which is log-linear.

Variants

Applications

The heap data structure has many applications.
  • Heapsort: One of the best sorting methods being in-place and with no quadratic worst-case scenarios.
  • Selection algorithms: A heap allows access to the min or max element in constant time, and other selections can be done in sub-linear time on data that is in a heap.
  • Graph algorithms: By using heaps as internal traversal data structures, run time will be reduced by polynomial order. Examples of such problems are Prim's minimal-spanning-tree algorithm and Dijkstra's shortest-path algorithm.
  • Priority queue: A priority queue is an abstract concept like "a list" or "a map"; just as a list can be implemented with a linked list or an array, a priority queue can be implemented with a heap or a variety of other methods.
  • K-way merge: A heap data structure is useful to merge many already-sorted input streams into a single sorted output stream. Examples of the need for merging include external sorting and streaming results from distributed data such as a log structured merge tree. The inner loop is obtaining the min element, replacing with the next element for the corresponding input stream, then doing a sift-down heap operation.

    Programming language implementations

  • The C++ Standard Library provides the, and algorithms for heaps, which operate on arbitrary random access iterators. It treats the iterators as a reference to an array, and uses the array-to-heap conversion. It also provides the class Sequence container #Priority queue|, which wraps these facilities in a container-like class. However, there is no standard support for the replace, sift-up/sift-down, or decrease/increase-key operations.
  • The Boost C++ libraries include a heaps library. Unlike the STL, it supports decrease and increase operations, and supports additional types of heap: specifically, it supports d-ary, binomial, Fibonacci, pairing and skew heaps.
  • There is a for C and C++ with D-ary heap and B-heap support. It provides an STL-like API.
  • The standard library of the D programming language includes , which is implemented in terms of D's . Instances can be constructed from any . exposes an that allows iteration with D's built-in statements and integration with the range-based API of the .
  • For Haskell there is the module.
  • The Java platform provides a binary heap implementation with the class in the Java Collections Framework. This class implements by default a min-heap; to implement a max-heap, programmer should write a custom comparator. There is no support for the replace, sift-up/sift-down, or decrease/increase-key operations.
  • Python has a module that implements a priority queue using a binary heap. The library exposes a heapreplace function to support k-way merging.
  • PHP has both max-heap and min-heap as of version 5.3 in the Standard PHP Library.
  • Perl has implementations of binary, binomial, and Fibonacci heaps in the distribution available on CPAN.
  • The Go language contains a package with heap algorithms that operate on an arbitrary type that satisfies a given interface. That package does not support the replace, sift-up/sift-down, or decrease/increase-key operations.
  • Apple's Core Foundation library contains a structure.
  • Pharo has an implementation of a heap in the Collections-Sequenceable package along with a set of test cases. A heap is used in the implementation of the timer event loop.
  • The Rust programming language has a binary max-heap implementation, , in the module of its standard library.
  • .NET has class which uses quaternary min-heap implementation. It is available from.NET 6.