Vantage-point tree

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

Template:Short description A vantage-point tree (or VP tree) is a metric tree that segregates data in a metric space by choosing a position in the space (the "vantage point") and partitioning the data points into two parts: those points that are nearer to the vantage point than a threshold, and those points that are not. By recursively applying this procedure to partition the data into smaller and smaller sets, a tree data structure is created where neighbors in the tree are likely to be neighbors in the space.[1]

One generalization is called a multi-vantage-point tree (or MVP tree): a data structure for indexing objects from large metric spaces for similarity search queries. It uses more than one point to partition each level.[2][3]

History

Peter Yianilos claimed that the vantage-point tree was discovered independently by him (Peter Yianilos) and by Jeffrey Uhlmann.[1] Yet, Uhlmann published this method before Yianilos in 1991.[4] Uhlmann called the data structure a metric tree, the name VP-tree was proposed by Yianilos. Vantage-point trees have been generalized to non-metric spaces using Bregman divergence by Nielsen et al.[5]

This iterative partitioning process is similar to that of a kScript error: No such module "Check for unknown parameters".-d tree, but uses circular (or spherical, hyperspherical, etc.) rather than rectilinear partitions. In two-dimensional Euclidean space, this can be visualized as a series of circles segregating the data.

The vantage-point tree is particularly useful in dividing data in a non-standard metric space into a metric tree.

Understanding a vantage-point tree

The way a vantage-point tree stores data can be represented by a circle.[6] First, understand that each node of this tree contains an input point and a radius. All the left children of a given node are the points inside the circle and all the right children of a given node are outside of the circle. The tree itself does not need to know any other information about what is being stored. All it needs is the distance function that satisfies the properties of the metric space.[6]

Searching through a vantage-point tree

A vantage-point tree can be used to find the nearest neighbor of a point xScript error: No such module "Check for unknown parameters".. The search algorithm is recursive. At any given step we are working with a node of the tree that has a vantage point vScript error: No such module "Check for unknown parameters". and a threshold distance tScript error: No such module "Check for unknown parameters".. The point of interest xScript error: No such module "Check for unknown parameters". will be some distance from the vantage point vScript error: No such module "Check for unknown parameters".. If that distance dScript error: No such module "Check for unknown parameters". is less than tScript error: No such module "Check for unknown parameters". then use the algorithm recursively to search the subtree of the node that contains the points closer to the vantage point than the threshold tScript error: No such module "Check for unknown parameters".; otherwise recurse to the subtree of the node that contains the points that are farther than the vantage point than the threshold tScript error: No such module "Check for unknown parameters".. If the recursive use of the algorithm finds a neighboring point nScript error: No such module "Check for unknown parameters". with distance to xScript error: No such module "Check for unknown parameters". that is less than Template:AbsScript error: No such module "Check for unknown parameters". then it cannot help to search the other subtree of this node; the discovered node nScript error: No such module "Check for unknown parameters". is returned. Otherwise, the other subtree also needs to be searched recursively.

A similar approach works for finding the kScript error: No such module "Check for unknown parameters". nearest neighbors of a point xScript error: No such module "Check for unknown parameters".. In the recursion, the other subtree is searched for kk′Script error: No such module "Check for unknown parameters". nearest neighbors of the point xScript error: No such module "Check for unknown parameters". whenever only k′ (< k)Script error: No such module "Check for unknown parameters". of the nearest neighbors found so far have distance that is less than Template:AbsScript error: No such module "Check for unknown parameters"..

Advantages of a vantage-point tree

  1. Instead of inferring multidimensional points for domain before the index being built, we build the index directly based on the distance.[6] Doing this, avoids pre-processing steps.
  2. Updating a vantage-point tree is relatively easy compared to the FastMap approach. For FastMap, after inserting or deleting data, there will come a time when FastMap will have to rescan itself. That takes up too much time and it is unclear to know when the rescanning will start.[6]
  3. Distance based methods are flexible. It is “able to index objects that are represented as feature vectors of a fixed number of dimensions."[6]

Complexity

The time cost to build a vantage-point tree is approximately O(n log n)Script error: No such module "Check for unknown parameters".. For each element, the tree is descended by log nScript error: No such module "Check for unknown parameters". levels to find its placement. However there is a constant factor kScript error: No such module "Check for unknown parameters". where kScript error: No such module "Check for unknown parameters". is the number of vantage points per tree node.[3]

The time cost to search a vantage-point tree to find a single nearest neighbor is O(log n)Script error: No such module "Check for unknown parameters".. There are log nScript error: No such module "Check for unknown parameters". levels, each involving kScript error: No such module "Check for unknown parameters". distance calculations, where kScript error: No such module "Check for unknown parameters". is the number of vantage points (elements) at that position in the tree.

The time cost to search a vantage-point tree for a range, which may be the most important attribute, can vary greatly depending on the specifics of the algorithm used and parameters. Brin's paper[3] gives the result of experiments with several vantage point algorithms with various parameters to investigate the cost, measured in number of distance calculations.

The space cost for a vantage-point tree is approximately nScript error: No such module "Check for unknown parameters".. Each element is stored, and each tree element in each non-leaf node requires a pointer to its descendant nodes. (See Brin for details on one implementation choice. The parameter for number of elements at each node plays a factor.)

With Template:Mvar points there are O(n2)Script error: No such module "Check for unknown parameters". pairwise distances between points. However, the creation of a vantage-point tree requires that only O(n log n)Script error: No such module "Check for unknown parameters". distances be calculated explicitly, and a search requires only O(log n)Script error: No such module "Check for unknown parameters". distance calculations. For example, if Template:Mvar and Template:Mvar are points and it is known that the distance d(x, y)Script error: No such module "Check for unknown parameters". is small then any point Template:Mvar that is far from Template:Mvar will also necessarily be almost as far from Template:Mvar because the metric space's triangle inequality gives d(y, z) ≥ d(x, z) − d(x, y)Script error: No such module "Check for unknown parameters"..

References

<templatestyles src="Reflist/styles.css" />

  1. a b Script error: No such module "citation/CS1".
  2. Script error: No such module "Citation/CS1".
  3. a b c Script error: No such module "Citation/CS1".
  4. Script error: No such module "Citation/CS1".
  5. Script error: No such module "citation/CS1".
  6. a b c d e Script error: No such module "citation/CS1".

Script error: No such module "Check for unknown parameters".

External links

Template:CS-Trees