Hilbert R-tree
Hilbert R-tree, an R-tree variant, is an index for multidimensional objects such as lines, regions, 3-D objects, or high-dimensional feature-based parametric objects. It can be thought of as an extension to B+-tree for multidimensional objects.
The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node. Hilbert R-trees use space-filling curves, and specifically the Hilbert curve, to impose a linear ordering on the data rectangles.
There are two types of Hilbert R-trees: one for static databases, and one for dynamic databases. In both cases Hilbert space-filling curves are used to achieve better ordering of multidimensional objects in the node. This ordering has to be "good", in the sense that it should group "similar" data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles. Packed Hilbert R-trees are suitable for static databases in which updates are very rare or in which there are no updates at all.
The dynamic Hilbert R-tree is suitable for dynamic databases where insertions, deletions, or updates may occur in real time. Moreover, dynamic Hilbert R-trees employ flexible deferred splitting mechanism to increase the space utilization. Every node has a well defined set of sibling nodes. This is done by proposing an ordering on the R-tree nodes. The Hilbert R-tree sorts rectangles according to the Hilbert value of the center of the rectangles. Given the ordering, every node has a well-defined set of sibling nodes; thus, deferred splitting can be used. By adjusting the split policy, the Hilbert R-tree can achieve a degree of space utilization as high as desired. To the contrary, other R-tree variants have no control over the space utilization.
The basic idea
Although the following example is for a static environment, it explains the intuitive principles for good R-tree design. These principles are valid for both static and dynamic databases.Roussopoulos and Leifker proposed a method for building a packed R-tree that achieves almost 100% space utilization. The idea is to sort the data on the x or y coordinate of one of the corners of the rectangles. Sorting on any of the four coordinates gives similar results. In this discussion points or rectangles are sorted on the x coordinate of the lower left corner of the rectangle, referred to as a "lowx packed R-tree". The sorted list of rectangles is scanned; successive rectangles are assigned to the same R-tree leaf node until that node is full; a new leaf node is then created, and the scanning of the sorted list continues. Thus, the nodes of the resulting R-tree will be fully packed, with the possible exception of the last node at each level. This leads to space utilization ≈100%. Higher levels of the tree are created in a similar way.
Figure 1 highlights the problem of the lowx packed R-tree. Figure 1 shows the leaf nodes of the R-tree that the lowx packing method will create for the points of Figure 1 . The fact that the resulting father nodes cover little area explains why the lowx packed R-tree achieves excellent performance for point queries. However, the fact that the fathers have large perimeters explains the degradation of performance for region queries. This is consistent with the analytical formulas for R-tree performance. Intuitively, the packing algorithm should ideally assign nearby points to the same leaf node. Ignorance of the y coordinate by the lowx packed R-tree tends to violate this empirical rule.
figure1 left figure1 right
Figure 1: 200 points uniformly distributed; MBR of nodes generated by the "lowx packed R-tree" algorithm
The section below describes two variants of the Hilbert R-trees. The first index is suitable for the static database in which updates are very rare or in which there are no updates at all. The nodes of the resulting R-tree will be fully packed, with the possible exception of the last node at each level. Thus, the space utilization is ≈100%; this structure is called a packed Hilbert R-tree. The second index, called a Dynamic Hilbert R-tree, supports insertions and deletions, and is suitable for a dynamic environment.
Packed Hilbert R-trees
The following provides a brief introduction to the Hilbert curve. The basic Hilbert curve on a 2x2 grid, denoted by H1 is shown in Figure 2. To derive a curve of order i, each vertex of the basic curve is replaced by the curve of order i – 1, which may be appropriately rotated and/or reflected. Figure 2 also shows the Hilbert curves of order two and three. When the order of the curve tends to infinity, like other space filling curves, the resulting curve is a fractal, with a fractal dimension of two. The Hilbert curve can be generalized for higher dimensionalities. Algorithms for drawing the two-dimensional curve of a given order can be found in and. An algorithm for higher dimensionalities is given in.The path of a space filling curve imposes a linear ordering on the grid points; this path may be calculated by starting at one end of the curve and following the path to the other end. The actual coordinate values of each point can be calculated. However, for the Hilbert curve this is much harder than for example the Z-order curve. Figure 2 shows one such ordering for a 4x4 grid. For example, the point on the H2 curve has a Hilbert value of 0, while the point has a Hilbert value of 2. The Hilbert value of a rectangle is defined as the Hilbert value of its center.
Hilbert curves of order 1, 2, and 3
Figure 2: Hilbert curves of order 1, 2, and 3
The Hilbert curve imposes a linear ordering on the data rectangles and then traverses the sorted list, assigning each set of C rectangles to a node in the R-tree. The final result is that the set of data rectangles on the same node will be close to each other in the linear ordering, and most likely in the native space; thus, the resulting R-tree nodes will have smaller areas. Figure 2 illustrates the intuitive reasons why our Hilbert-based methods will result in good performance. The data is composed of points. By grouping the points according to their Hilbert values, the MBRs of the resulting R-tree nodes tend to be small square-like rectangles. This indicates that the nodes will likely have small area and small perimeters. Small area values result in good performance for point queries; small area and small perimeter values lead to good performance for larger queries.
Algorithm Hilbert-Pack
Step 1. Calculate the Hilbert value for each data rectangleStep 2. Sort data rectangles on ascending Hilbert values
Step 3. /* Create leaf nodes */
- While
- *generate a new R-tree node
- *assign the next C rectangles to this node
- While
- *sort nodes at level l ≥ 0 on ascending creation time
- *repeat Step 3
Dynamic Hilbert R-trees
The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node. Hilbert R-trees use space-filling curves, and specifically the Hilbert curve, to impose a linear ordering on the data rectangles. The Hilbert value of a rectangle is defined as the Hilbert value of its center.Tree structure
The Hilbert R-tree has the following structure. A leaf node contains at mostCl entries each of the form where Cl is the capacity of the leaf, R is the MBR of the real object and obj-id is a pointer to the object description record. The main difference between the Hilbert R-tree and the R*-tree is that non-leaf nodes also contain information about the LHVs. Thus, a non-leaf node in the Hilbert R-tree contains at most Cn entries of the form where Cn is the capacity of a non-leaf node, R is the MBR that encloses all the children of that node, ptr is a pointer to the child node, and LHV is the largest Hilbert value among the data rectangles enclosed by R. Notice that since the non-leaf node picks one of the Hilbert values of the children to be the value of its own LHV, there is not extra cost for calculating the Hilbert values of the MBR of non-leaf nodes. Figure 3 illustrates some rectangles organized in a Hilbert R-tree. The Hilbert values of the centers are the numbers near the "x" symbols. The LHV's are in . Figure 4 shows how the tree of Figure 3 is stored on the disk; the contents of the parent node "II" are shown in more detail. Every data rectangle in node "I" has a Hilbert value v ≤33; similarly every rectangle in node "II" has a Hilbert value greater than 33 and ≤ 107, etc.
Data rectangles organized in a Hilbert R-tree
Figure 3: Data rectangles organized in a Hilbert R-tree
A plain R-tree splits a node on overflow, creating two nodes from the original one. This policy is called a 1-to-2 splitting policy. It is possible also to defer the split, waiting until two nodes split into three. Note that this is similar to the B*-tree split policy. This method is referred to as the 2-to-3 splitting policy.
In general, this can be extended to s-to- splitting policy; where s is the order of the splitting policy. To implement the order-s splitting policy, the overflowing node tries to push some of its entries to one of its s - 1 siblings; if all of them are full, then s-to- split need to be done. The s -1 siblings are called the cooperating siblings.
Next, the algorithms for searching, insertion, and overflow handling are described in detail.
Searching
The searching algorithm is similar to the one used in other R-tree variants. Starting from the root, it descends the tree and examines all nodes that intersect the query rectangle. At the leaf level, it reports all entries that intersect the query window w as qualified data items.Algorithm Search:
S1. Search nonleaf nodes:
S2. Search leaf nodes:
Data rectangles organized in a Hilbert R-tree
Figure 4: The file structure for the Hilbert R-tree