Random binary tree
In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Different distributions have been used, leading to different properties for these trees.
Random binary trees have been used for analyzing the average-case complexity of data structures based on binary search trees. For this application it is common to use random trees formed by inserting nodes one at a time according to a random permutation. The resulting trees are very likely to have logarithmic depth and logarithmic Strahler number. The treap and related balanced binary search trees use update operations that maintain this random structure even when the update sequence is non-random.
Other distributions on random binary trees include the uniform discrete distribution in which all distinct trees are equally likely, distributions on a given number of nodes obtained by repeated splitting, binary tries and radix trees for random data, and trees of variable size generated by branching processes.
For random trees that are not necessarily binary, see random tree.
Background
A binary tree is a rooted tree in which each node may have up to two children, and those children are designated as being either left or right. It is sometimes convenient instead to consider extended binary trees in which each node is either an external node with zero children, or an internal node with exactly two children. A binary tree that is not in extended form may be converted into an extended binary tree by treating all its nodes as internal, and adding an external node for each missing child of an internal node. In the other direction, an extended binary tree with at least one internal node may be converted back into a non-extended binary tree by removing all its external nodes. In this way, these two forms are almost entirely equivalent for the purposes of mathematical analysis, except that the extended form allows a tree consisting of a single external node, which does not correspond to anything in the non-extended form. For the purposes of computer data structures, the two forms differ, as the external nodes of the extended form may be represented explicitly as objects in a data structure.
In a binary search tree the internal nodes are labeled by numbers or other ordered values, called keys, arranged so that an inorder traversal of the tree lists the keys in sorted order. The external nodes remain unlabeled. Binary trees may also be studied with all nodes unlabeled, or with labels that are not given in sorted order. For instance, the Cartesian tree data structure uses labeled binary trees that are not necessarily binary search trees.
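The conversion between the two forms can be sketched in a few lines of Python. This is only an illustration under assumed conventions: a non-extended tree is written as `None` (empty) or a tuple `(left, key, right)`, and the marker `EXTERNAL` and the function names are hypothetical choices, not part of any standard library.

```python
EXTERNAL = "external"  # hypothetical marker standing for an external node


def extend(tree):
    """Convert a tree (None or (left, key, right)) to extended form."""
    if tree is None:
        return EXTERNAL  # each missing child becomes an external node
    left, key, right = tree
    return (extend(left), key, extend(right))


def strip_external(ext):
    """Remove all external nodes, recovering the non-extended tree."""
    if ext == EXTERNAL:
        return None  # in this sketch a lone external node maps to the empty tree
    left, key, right = ext
    return (strip_external(left), key, strip_external(right))


def count_external(ext):
    """Count the external nodes of an extended tree."""
    if ext == EXTERNAL:
        return 1
    return count_external(ext[0]) + count_external(ext[2])


tree = ((None, 1, None), 2, (None, 3, None))  # three internal nodes
extended = extend(tree)
print(count_external(extended))  # prints 4
```

A tree with n internal nodes always gains n + 1 external nodes, which is why the example with three keys reports four.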
A random binary tree is a random tree drawn from a certain probability distribution on binary trees. In many cases, these probability distributions are defined using a given set of keys, and describe the probabilities of binary search trees having those keys. However, other distributions are possible, not necessarily generating binary search trees, and not necessarily giving a fixed number of nodes.
From random permutations
For any sequence of distinct ordered keys, one may form a binary search tree in which each key is inserted in sequence as a leaf of the tree, without changing the structure of the previously inserted keys. The position for each insertion can be found by a binary search in the previous tree. The random permutation model, for a given set of keys, is defined by choosing the sequence randomly from the permutations of the set, with each permutation having equal probability.
For instance, if the three keys 1, 3, 2 are inserted into a binary search tree in that sequence, the number 1 will sit at the root of the tree, the number 3 will be placed as its right child, and the number 2 as the left child of the number 3. There are six different permutations of the keys 1, 2, and 3, but only five trees may be constructed from them. That is because the permutations 2, 1, 3 and 2, 3, 1 form the same tree. Thus, this tree has probability $2/6 = 1/3$ of being generated, whereas the other four trees each have probability $1/6$.
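The three-key example can be checked by brute force. The following Python sketch builds a binary search tree for every permutation and tallies how often each tree arises. The tuple representation `(left, key, right)` and the function names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def insert(tree, key):
    """Insert key as a new leaf; a tree is None or a tuple (left, key, right)."""
    if tree is None:
        return (None, key, None)
    left, root, right = tree
    if key < root:
        return (insert(left, key), root, right)
    return (left, root, insert(right, key))


def build(sequence):
    """Build a binary search tree by inserting the keys in the given order."""
    tree = None
    for key in sequence:
        tree = insert(tree, key)
    return tree


def shape_distribution(keys):
    """Probability of each tree under the random permutation model."""
    counts = Counter(build(p) for p in permutations(keys))
    total = sum(counts.values())
    return {tree: Fraction(c, total) for tree, c in counts.items()}


dist = shape_distribution([1, 2, 3])
balanced = ((None, 1, None), 2, (None, 3, None))  # the tree with 2 at the root
print(len(dist), dist[balanced])  # prints: 5 1/3
```

Enumerating all permutations takes factorial time, so this approach is only practical for very small key sets.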
Expected depth of a node
For any key $x$ in a given set of $n$ keys, the expected value of the length of the path from the root to $x$ in a random binary search tree is at most $2\log n + O(1)$, where "$\log$" denotes the natural logarithm function and the $O$ introduces big O notation. By linearity of expectation, the expected number of ancestors of $x$ equals the sum, over other keys $y$, of the probability that $y$ is an ancestor of $x$. A key $y$ is an ancestor of $x$ exactly when $y$ is the first key to be inserted from the interval $[x, y]$ (or $[y, x]$ when $y < x$). Because each key in the interval is equally likely to be first, this happens with probability equal to the reciprocal of the number of keys in the interval. Thus, the keys that are adjacent to $x$ in the sorted sequence of keys have probability $1/2$ of being an ancestor of $x$, the keys one step away have probability $1/3$, and so on. The sum of these probabilities forms two copies of the harmonic series extending away from $x$ in both directions in the sorted sequence, giving the $2\log n + O(1)$ bound above. This bound also holds for the expected search path length for a value $x$ that is not one of the given keys.
The longest path
The longest root-to-leaf path, in a random binary search tree, is longer than the expected path length, but only by a constant factor. Its length, for a tree with $n$ nodes, is with high probability approximately $\frac{1}{\beta}\log n \approx 4.311\log n$, where $\beta$ is the unique number in the range $0 < \beta < 1$ satisfying the equation $2\beta e^{1-\beta} = 1$.
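The constant in this height estimate can be recovered numerically. The sketch below assumes the defining equation is $2\beta e^{1-\beta} = 1$ on the interval $0 < \beta < 1$ and solves it by bisection. The function names and the tolerance are illustrative choices.

```python
import math


def f(beta):
    """Left side minus right side of 2*beta*e^(1-beta) = 1."""
    return 2 * beta * math.exp(1 - beta) - 1


def solve_beta(tol=1e-12):
    """Bisection on (0, 1); f is increasing there, with f(0) < 0 < f(1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
    return (lo + hi) / 2


beta = solve_beta()
height_constant = 1 / beta  # multiplier of log n in the height estimate
print(beta, height_constant)
```

Bisection is used here only because it is simple and needs no derivative; any standard root finder would serve equally well.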
Expected number of leaves
In the random permutation model, each key except the smallest and largest has probability $1/3$ of being a leaf in the tree. This is because it is a leaf when it is inserted after its two neighbors, which happens for two out of the six permutations of it and its two neighbors, all of which are equally likely. By similar reasoning, the smallest and largest key have probability $1/2$ of being a leaf. Therefore, the expected number of leaves is the sum of these probabilities, which for $n \ge 2$ is exactly $(n+1)/3$.
Strahler number
The Strahler number of vertices in any tree is a measure of the complexity of the subtrees under those vertices. A leaf has Strahler number one. For any other node, the Strahler number is defined recursively from the Strahler numbers of its children. In a binary tree, if two children have different Strahler numbers, the Strahler number of their parent is the larger of the two child numbers. But if two children have equal Strahler numbers, their parent has a number that is greater by one. The Strahler number of the whole tree is the number at the root node. For $n$-node random binary search trees, simulations suggest that the expected Strahler number is $\log_3 n + O(1)$. However, only the weaker upper bound $\log_3 n + o(\log n)$ has been proven.
Treaps and randomized binary search trees
In applications of binary search tree data structures, it is rare for the keys to be inserted without deletion in a random order, limiting the direct applications of random binary trees. However, algorithm designers have devised data structures that allow arbitrary insertions and deletions to preserve the property that the shape of the tree is random, as if the keys had been inserted randomly.
If a given set of keys is assigned numeric priorities, these priorities may be used to construct a Cartesian tree for the numbers, the binary search tree that would result from inserting the keys in priority order. By choosing the priorities to be independent random real numbers in the unit interval, and by maintaining the Cartesian tree structure using tree rotations after any insertion or deletion of a node, it is possible to maintain a data structure that behaves like a random binary search tree. Such a data structure is known as a treap or a randomized binary search tree.
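A minimal sketch of treap insertion in Python may make the role of the rotations concrete. It assumes the convention that smaller priorities sit closer to the root, so that the tree matches insertion in increasing priority order. The names `Node`, `insert`, `rotate_left`, `rotate_right`, and `shape` are illustrative, and deletion is omitted.

```python
import random


class Node:
    def __init__(self, key, priority=None):
        self.key = key
        # Independent uniform priority in [0, 1) unless one is supplied.
        self.priority = random.random() if priority is None else priority
        self.left = None
        self.right = None


def rotate_right(node):
    """Lift node.left above node, preserving the inorder key order."""
    child = node.left
    node.left, child.right = child.right, node
    return child


def rotate_left(node):
    """Lift node.right above node, preserving the inorder key order."""
    child = node.right
    node.right, child.left = child.left, node
    return child


def insert(root, key, priority=None):
    """Insert key as a leaf, then rotate it upward past any ancestor
    whose priority is larger (assumed convention: small priority on top)."""
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority)
        if root.left.priority < root.priority:
            root = rotate_right(root)
    elif key > root.key:
        root.right = insert(root.right, key, priority)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root  # a key that is already present is ignored


def shape(node):
    """Nested-tuple view (left, key, right) for comparing structures."""
    if node is None:
        return None
    return (shape(node.left), node.key, shape(node.right))


root = None
for key, priority in [(2, 0.3), (3, 0.2), (1, 0.1)]:
    root = insert(root, key, priority)
print(shape(root))  # prints: (None, 1, ((None, 2, None), 3, None))
```

The explicit priorities in the usage example are chosen so that the priority order is 1, 3, 2. The result is the same tree as inserting the keys in that order without rotations, even though they actually arrived in the order 2, 3, 1.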
Variants of the treap including the zip tree and zip-zip tree replace the tree rotations by "zipping" operations that split and merge trees, and that limit the number of random bits that need to be generated and stored alongside the keys. The result of these optimizations is still a tree with a random structure, but one that does not exactly match the random permutation model.
Uniformly random binary trees
The number of binary trees with $n$ nodes is a Catalan number. For $n = 1, 2, 3, \dots$ these numbers of trees are $1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, \dots$. Thus, if one of these trees is selected uniformly at random, its probability is the reciprocal of a Catalan number. Trees generated from this distribution are sometimes called random binary Catalan trees. They have expected depth proportional to the square root of $n$, rather than to the logarithm. More precisely, the expected depth of a randomly chosen node in an $n$-node tree of this type is $\sqrt{\pi n} - 3 + O(1/\sqrt{n})$.
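One simple way to draw a tree from this distribution is to choose the size of the left subtree with probability proportional to the number of trees it allows, then recurse. The Python sketch below takes that approach. It is not claimed to be the most efficient sampler, and the unlabeled tuple representation `(left, right)` and the function names are assumptions for illustration.

```python
import random
from math import comb


def catalan(n):
    """Number of binary trees with n nodes."""
    return comb(2 * n, n) // (n + 1)


def random_catalan_tree(n, rng=random):
    """Sample a uniformly random n-node binary tree.

    A tree is None (empty) or a tuple (left, right) of subtrees.
    """
    if n == 0:
        return None
    # The left subtree has k nodes with probability C_k * C_{n-1-k} / C_n.
    weights = [catalan(k) * catalan(n - 1 - k) for k in range(n)]
    k = rng.choices(range(n), weights=weights)[0]
    return (random_catalan_tree(k, rng), random_catalan_tree(n - 1 - k, rng))


def size(tree):
    """Number of nodes in a tree."""
    return 0 if tree is None else 1 + size(tree[0]) + size(tree[1])


print([catalan(n) for n in range(1, 11)])
# prints: [1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```

Because `random.choices` works with floating-point weights, this sketch is suitable only for moderate values of n. The recursion depth also grows with the height of the sampled tree.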
The expected Strahler number of a uniformly random $n$-node binary tree is $\log_4 n + O(1)$, lower than the expected Strahler number of random binary search trees.
Due to their large heights, this model of equiprobable random trees is not generally used for binary search trees. However, it has other applications, including:
- Modeling the parse trees of algebraic expressions in compiler design. Here the internal nodes of the tree represent binary operations in an expression and the external nodes represent the variables or constants on which the expressions operate. The bound on Strahler number translates into the number of registers needed to evaluate an expression.
- Modeling river networks, the original application for which the Strahler number was developed.
- Modeling possible evolutionary trees for a fixed number of species. In this application, an extended binary tree is used, with the species at its external nodes.
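The Strahler number that appears in the first two applications follows directly from its recursive definition. The Python sketch below assumes an unlabeled tree written as `None` (empty) or a tuple `(left, right)`. Treating an empty subtree as having number zero is a convenience of this sketch that reproduces the stated rules, including the value one for a leaf.

```python
def strahler(tree):
    """Strahler number of a binary tree given as None or (left, right)."""
    if tree is None:
        return 0  # convention of this sketch: an empty subtree counts as 0
    a = strahler(tree[0])
    b = strahler(tree[1])
    # Equal child numbers raise the parent by one; otherwise take the larger.
    return a + 1 if a == b else max(a, b)


leaf = (None, None)
path = ((leaf, None), None)   # three nodes in a single chain
cherry = (leaf, leaf)         # a root with two leaf children
print(strahler(leaf), strahler(path), strahler(cherry))  # prints: 1 1 2
```

A long chain keeps Strahler number one regardless of its length, while the number grows only when two subtrees of equal complexity meet.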