Wavelet Tree
The Wavelet Tree is a succinct data structure to store strings in compressed space. It generalizes the and operations defined on bitvectors to arbitrary alphabets.
Originally introduced to represent compressed suffix arrays, it has found application in several contexts. The tree is defined by recursively partitioning the alphabet into pairs of subsets; the leaves correspond to individual symbols of the alphabet, and at each node a bitvector stores whether a symbol of the string belongs to one subset or the other.
The name derives from an analogy with the wavelet transform for signals, which recursively decomposes a signal into low-frequency and high-frequency components.
Properties
Let be a finite alphabet with. By using succinct dictionaries in the nodes, a string can be stored in, where is the order-0 empirical entropy of.If the tree is balanced, the operations,, and can be supported in time.
Access operation
A wavelet tree contains a bitmap representation of a string. If we know the alphabet set, then the exact string can be inferred by tracking bits down the tree. To find the letter at ith position in the string :-Input:
- The position i in the string of which we want to know the letter, starting at 1.
- The top node W of the wavelet tree that represents the string
Output: The letter at position i
if W.isLeafNode return W.letter
if W.bitvector = 0 return access
else 'return' access
In this context, the rank of a position in a bitvector is the number of ones that appear in the first positions of. Because the rank can be calculated in O by using succinct dictionaries, any S in string S can be accessed in time, as long as the tree is balanced.