Red–black tree
In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.
When the tree is modified, the new tree is rearranged and "repainted" to restore the coloring properties that constrain how unbalanced the tree can become in the worst case. The properties are designed such that this rearranging and recoloring can be performed efficiently.
The balancing is not perfect, but guarantees searching in time, where is the number of entries in the tree. The [|insert] and delete operations, along with tree rearrangement and recoloring, also execute in time.
Tracking the color of each node requires only one bit of information per node because there are only two colors. The tree does not contain any other data specific to it being a red–black tree, so its memory footprint is almost identical to that of a classic binary search tree. In some cases, the added bit of information can be stored at no added memory cost.
History
In 1972, Rudolf Bayer invented a data structure that was a special order-4 case of a B-tree. These trees maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees. However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper and later they became popular as 2–3–4 trees or even 2–3 trees.In a 1978 paper, "A Dichromatic Framework for Balanced Trees", Leonidas J. Guibas and Robert Sedgewick derived the red–black tree from the symmetric binary B-tree. The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC. Another response from Guibas states that it was because of the red and black pens available to them to draw the trees.
In 1993, Arne Andersson introduced the idea of a right leaning tree to simplify insert and delete operations.
In 1999, Chris Okasaki showed how to make the insert operation purely functional. Its balance function needed to take care of only four unbalanced cases and one default balanced case.
The original algorithm used eight unbalanced cases, but reduced that to six unbalanced cases. Sedgewick showed that the insert operation can be implemented in just 46 lines of Java.
In 2008, Sedgewick proposed the left-leaning red–black tree, leveraging Andersson’s idea that simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red, making his trees more like 2–3–4 trees, but later this restriction was added, making new trees more like 2–3 trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46 lines of code.
Terminology
The black depth of a node is defined as the number of black nodes from the root to that node. The black height of a red–black tree is the number of black nodes in any path from the root to the leaves, which, by [|requirement 4], is constant.The black height of a node is the black height of the subtree rooted by it. In this article, the black height of a null node shall be set to 0, because its subtree is empty as suggested by the example figure, and its tree height is also 0.
Properties
In addition to the requirements imposed on a binary search tree the following must be satisfied by a- Every node is either red or black.
- All null nodes are considered black.
- A red node does not have a red child.
- Every path from a given node to any of its leaf nodes goes through the same number of black nodes.
- If a node N has exactly one child, the child must be red. If the child were black, its leaves would sit at a different black depth than N's null node, violating requirement 4.
This article also omits it, because it slightly disturbs the recursive algorithms and proofs.
As an example, every perfect binary tree that consists only of black nodes is a red–black tree.
The read-only operations, such as search or tree traversal, do not affect any of the requirements. In contrast, the modifying operations insert and delete easily maintain requirements 1 and 2, but with respect to the other requirements some extra effort must be made, to avoid introducing a violation of requirement 3, called a red-violation, or of requirement 4, called a black-violation.
The requirements enforce a critical property of red–black trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is height-balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height of the tree, this upper bound on the height allows red–black trees to be efficient in the worst case, namely logarithmic in the number of entries, i.e. . For a mathematical proof see section [|Proof of bounds].
Red–black trees, like all binary search trees, allow quite efficient sequential access of their elements. But they support also asymptotically optimal direct access via a traversal from root to leaf, resulting in search time.
Analogy to 2–3–4 trees
Red–black trees are similar in structure to 2–3–4 trees, which are B-trees of order 4. In 2–3–4 trees, each node can contain between 1 and 3 values and have between 2 and 4 children. These 2–3–4 nodes correspond to black node – red children groups in red-black trees, as shown in figure 1. It is not a 1-to-1 correspondence, because 3-nodes have two equivalent representations: the red child may lie either to the left or right. The left-leaning red-black tree variant makes this relationship exactly 1-to-1, by only allowing the left child representation. Since every 2–3–4 node has a corresponding black node, invariant 4 of red-black trees is equivalent to saying that the leaves of a 2–3–4 tree all lie at the same level.Despite structural similarities, operations on red–black trees are more economical than B-trees. B-trees require management of vectors of variable length, whereas red-black trees are simply binary trees.
Applications and related data structures
Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures that provide worst-case guarantees. For example, many data structures used in computational geometry are based on red–black trees, and the Completely Fair Scheduler and epoll system call of the Linux kernel use red–black trees.The AVL tree is another structure supporting search, insertion, and removal. AVL trees can be colored red–black, and thus are a subset of red-black trees. The worst-case height of AVL is 0.720 times the worst-case height of red-black trees, so AVL trees are more rigidly balanced. The performance measurements of Ben Pfaff with realistic test cases in 79 runs find AVL to RB ratios between 0.677 and 1.077, median at 0.947, and geometric mean 0.910. The performance of WAVL trees lie in between AVL trees and red-black trees.
Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets that can retain previous versions after mutations. The persistent version of red–black trees requires space for each insertion or deletion, in addition to time.
For every 2–3–4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2–3–4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2–3–4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2–3–4 trees just before red–black trees, even though 2–3–4 trees are not often used in practice.
In 2008, Sedgewick introduced a simpler version of the red–black tree called the left-leaning red–black tree by eliminating a previously unspecified degree of freedom in the implementation. The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes. Red–black trees can be made isometric to either 2–3 trees, or 2–3–4 trees, for any sequence of operations. The 2–3–4 tree isometry was described in 1978 by Sedgewick. With 2–3–4 trees, the isometry is resolved by a "color flip," corresponding to a split, in which the red color of two children nodes leaves the children and moves to the parent node.
The original description of the tango tree, a type of tree optimised for fast searches, specifically uses red–black trees as part of its data structure.
As of Java 8, the HashMap has been modified such that instead of using a LinkedList to store different elements with colliding hashcodes, a red–black tree is used. This results in the improvement of time complexity of searching such an element from to where is the number of elements with colliding hashes.
Implementation
The read-only operations, such as search or tree traversal, on a red–black tree require no modification from those used for binary search trees, because every red–black tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a red–black tree, the restoration of which is called rebalancing so that red–black trees become self-balancing.Rebalancing has a worst-case time complexity of and average of, though these are very quick in practice. Additionally, rebalancing takes no more than three tree rotations.
This is an example implementation of insert and remove in C. Below are the data structures and the
rotate_subtree helper function used in the insert and remove examples.typedef enum Color: char Color;
typedef enum Direction: char Direction;
// red-black tree node
typedef struct Node Node;
typedef struct Tree;
static Direction direction
Node* rotate_subtree