K-D-B-tree
In computer science, a K-D-B-tree is a tree data structure for subdividing a k-dimensional search space. The aim of the K-D-B-tree is to provide the search efficiency of a balanced k-d tree, while providing the block-oriented storage of a B-tree for optimizing external memory accesses.
Informal description
Much like the k-d tree, a K-D-B-tree organizes points in k-dimensional space, useful for tasks such as range-searching and multi-dimensional database queries. K-D-B-trees subdivide space into two subspaces by comparing elements in a single domain. Using a 2-D-B-tree as an example, space is subdivided in the same manner as a k-d tree: using a point in just one of the domains, or axes in this case, all other values are either less than or greater than the current value, and fall to the left and right of the splitting plane respectively.Unlike a k-d tree, each half-space is not its own node. Instead, as in a B-tree, nodes in the K-D-B-tree are stored as pages and the tree stores a pointer to the root page.
Structure
The K-D-B-tree contains two types of pages:- Region pages: A collection of ' pairs containing a description of the bounding region along with a pointer to the child page corresponding to that region.
- Point pages: A collection of ' pairs. In the case of databases, location may point to the index of the database record, while for points in k-dimensional space, it can be seen as the point's coordinates in that space.
Throughout insertion/deletion operations, the K-D-B-tree maintains a certain set of properties:
- The graph is a multi-way tree. Region pages always point to child pages, and can not be empty. Point pages are the leaf nodes of the tree.
- Like a B-tree, the path length to the leaves of the tree is the same for all queries.
- The regions that make up a region page are disjoint.
- If the root is a region page the union of its regions is the entire search space.
- When the child of a pair in a region page is also a region page, the union of all the regions in the child is region.
- Conversely in the case above, if child is a point page, all points in child must be contained in region.
Operations on a K-D-B-tree
Queries
Queries on a K-D-B-tree are a range search over intervals in all domains or axes in the tree. This collection of intervals is called the query region. In k-space, the query region can be visualized as a bounding volume around some subspace in the entire k-dimensional search space. A query can fall into one of three categories:- Some intervals span an entire domain or axis, making the query a partial range query.
- Some intervals are points, the others full domains, and so the query is a partial match query.
- The intervals are all points, and so the bounding volume is also just a point. This is an exact match query.
Algorithm
- If the root of the tree is null, terminate, otherwise let page be root.
- If page is a point page, return every point in a ' pair that lies within the query region.
- Otherwise, page is a region page, so for all ' pairs where region and query region intersect, set page to be child and recurse from step 2.
Insertions
Since an insertion into a K-D-B-tree may require the splitting of a page in the case of a page overflow, it is important to first define the splitting operation.Splitting algorithm
First, a region page is split along some plane to create two new region pages, the left and right pages. These pages are filled with the regions from the old region page, and the old region page is deleted. Then, for every in the original region page, remembering child is a page and region specifies an actual bounding region:- If region lies entirely to the left of the splitting plane, add ' to the left page.
- If region lies entirely to the right of the splitting plane, add ' to the right page.
- Otherwise:
- # Recursively split child by the splitting plane, resulting in the pages new_left_page and new_right_page
- # Split region by the splitting plane, resulting in left_region and right_region
- # Add ' to the left page, and ' to the right page.
Insertion algorithm
Using the splitting algorithm, insertions of a new ' pair can be implemented as follows:- If the root page is null, simply make the root page a new point page containing '
Deletions
Deletions from a K-D-B-tree are incredibly simple if no minimum requirements are placed on storage utilization. Using an exact match query to find a pair, we simply remove the record from the tree if it exists.Reorganization algorithm
Since deletions can result in pages that contain very little data, it may be necessary to reorganize the K-D-B-tree to meet some minimum storage utilization criteria. The reorganization algorithm to be used when a page contains too little data is as follows:- Let page be the parent of P, containing .
- Find regions in page such that the regions are adjacent and the union of which forms a rectangular region. These regions are considered "joinable". Let R denote the set of these regions.
- Merge the set R into one page S, and if the S is overfull, repeatedly split until none of the resulting pages are overfull.
- Replace the set R of regions in page with the resulting pages from splitting S.