Van Emde Boas tree: Difference between revisions

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search
imported>AmirOnWiki
In practice: no need to show wrong solution; add missing O
 
imported>Davemck
 
(One intermediate revision by one other user not shown)
Line 18: Line 18:
A '''van Emde Boas tree''' ({{IPA|nl|vɑn ˈɛmdə ˈboːɑs}}), also known as a '''vEB tree''' or '''van Emde Boas priority queue''', is a [[tree data structure]] which implements an [[associative array]] with {{math|''m''}}-bit integer keys. It was invented by a team led by Dutch computer scientist [[Peter van Emde Boas]] in 1975.<ref>[[Peter van Emde Boas]]: ''Preserving order in a forest in less than logarithmic time'' (''Proceedings of the 16th Annual Symposium on Foundations of Computer Science'' 10: 75-84, 1975)</ref> It performs all operations in {{math|[[Big-O notation|''O'']](log&nbsp;''m'')}} time (assuming that an <math>m</math> bit operation can be performed in constant time), or equivalently in <math>O(\log \log M)</math> time, where <math>M = 2^m</math> is the largest element that can be stored in the tree. The parameter <math>M</math> is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.  
A '''van Emde Boas tree''' ({{IPA|nl|vɑn ˈɛmdə ˈboːɑs}}), also known as a '''vEB tree''' or '''van Emde Boas priority queue''', is a [[tree data structure]] which implements an [[associative array]] with {{math|''m''}}-bit integer keys. It was invented by a team led by Dutch computer scientist [[Peter van Emde Boas]] in 1975.<ref>[[Peter van Emde Boas]]: ''Preserving order in a forest in less than logarithmic time'' (''Proceedings of the 16th Annual Symposium on Foundations of Computer Science'' 10: 75-84, 1975)</ref> It performs all operations in {{math|[[Big-O notation|''O'']](log&nbsp;''m'')}} time (assuming that an <math>m</math> bit operation can be performed in constant time), or equivalently in <math>O(\log \log M)</math> time, where <math>M = 2^m</math> is the largest element that can be stored in the tree. The parameter <math>M</math> is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.  


The standard vEB tree has inadequate space efficiency. For example, for storing 32-bit integers (i.e., when <math>m = 32</math>), it requires <math>M = 2^{32}</math> bits of storage. However, similar data structures with equally good time efficiency and space efficiency of <math>O(n)</math> (where <math>n</math> is the number of stored elements) exist, and vEB trees can be modified to require only <math>O(n \log M)</math> space.  
The standard vEB tree has an unideal space efficiency of <math>O(M)</math>. For example, for storing 32-bit integers (i.e., when <math>m = 32</math>), it requires <math>M = 2^{32}</math> bits of storage. To resolve this, vEB trees can be modified to achieve <math>O(n \log M)</math> space, or similar data structures with equivalent asymptotic time efficiency and space efficiency of <math>O(n)</math> (where <math>n</math> is the number of stored elements) can be used.  


==Supported operations==
==Supported operations==
A vEB supports the operations of an ''ordered [[associative array]]'', which includes the usual associative array operations along with two more ''order'' operations, ''FindNext'' and ''FindPrevious'':<ref>[[Gudmund Skovbjerg Frandsen]]: ''[http://www.daimi.au.dk/~gudmund/dynamicF04/vEB.pdf Dynamic algorithms: Course notes on van Emde Boas trees (PDF)]'' ([[University of Aarhus]], Department of Computer Science) {{dead link|date=April 2025}}</ref>
A vEB supports the operations of an ''ordered [[associative array]]'', which includes the usual associative array operations along with two more ''order'' operations, ''FindNext'' and ''FindPrevious'':<ref>[[Gudmund Skovbjerg Frandsen]]: ''[http://www.daimi.au.dk/~gudmund/dynamicF04/vEB.pdf Dynamic algorithms: Course notes on van Emde Boas trees (PDF)] {{Webarchive|url=https://web.archive.org/web/20150923212438/http://www.daimi.au.dk/~gudmund/dynamicF04/vEB.pdf |date=2015-09-23 }}'' ([[University of Aarhus]], Department of Computer Science) {{dead link|date=April 2025}}</ref>
* ''Insert'': insert a key/value pair with an {{math|''m''}}-bit key
* ''Insert'': insert a key/value pair with an {{math|''m''}}-bit key
* ''Delete'': remove the key/value pair with a given key
* ''Delete'': remove the key/value pair with a given key
Line 134: Line 134:


[[Fusion tree|Fusion trees]] are another type of tree data structure that implements an [[associative array]] on w-bit integers on a finite universe. They use word-level parallelism and bit manipulation techniques to achieve {{math|''O''(log<sub>''w''</sub> ''n'')}} time for [[Predecessor problem|predecessor/successor queries]] and updates, where {{math|''w''}} is the word size.<ref>{{Cite web |date=4 April 2019 |title=Fusion Tree |url=https://iq.opengenus.org/fusion-tree/ |access-date=30 August 2023 |website=OpenGenus IQ: Computing Expertise & Legacy |language=en}}</ref> Fusion trees use {{math|''O''(''n'')}} space and can be made dynamic with hashing or exponential trees.
[[Fusion tree|Fusion trees]] are another type of tree data structure that implements an [[associative array]] on w-bit integers on a finite universe. They use word-level parallelism and bit manipulation techniques to achieve {{math|''O''(log<sub>''w''</sub> ''n'')}} time for [[Predecessor problem|predecessor/successor queries]] and updates, where {{math|''w''}} is the word size.<ref>{{Cite web |date=4 April 2019 |title=Fusion Tree |url=https://iq.opengenus.org/fusion-tree/ |access-date=30 August 2023 |website=OpenGenus IQ: Computing Expertise & Legacy |language=en}}</ref> Fusion trees use {{math|''O''(''n'')}} space and can be made dynamic with hashing or exponential trees.
Lenhof and Smid<ref>{{Cite journal |title=Using Persistent data structures for adding range restrictions to searching problems |journal=[[RAIRO-Theoretical Informatics and Applications]] |last=Lenhof |first=Hans-Peter |volume=28 |issue=1 |pages=25-49 |last2=Smid |first2=Michiel}}</ref> present a variant of the vEB tree which uses space ''O''(''n'') and take ''O''(1) expected amortized time for inserting an element, under the restriction that insertions are in increasing order; in other words, the inserted element is always the
new maximum. This structure uses [[dynamic perfect hashing]] to implement the tree in small space and moreover decrease the size of the
tree by a {{math|log log ''M''}} factor by keeping at the leaves buckets of size {{math|log log ''M''}}.
A space-efficient and [[Persistent data structure|partially persistent]] version of vEB trees was presented by Dietz and Raman<ref>{{cite techreport
| last1      = Dietz
| first1      = Paul F.
| last2      = Raman
| first2      = Rajeev
| title      = Persistence, Amortization and Randomization
| institution = University of Rochester
| year      = 1991
| number    = TR 353
}}</ref>. This version uses ''O''(''n'') space, and supports an insert in the current version in {{math|''O''(log log ''M'')}} amortized and expected time, while a query on any version remains {{math|''O''(log log ''M'')}}.


==Implementations==
==Implementations==

Latest revision as of 16:49, 19 November 2025

Template:Short description Script error: No such module "Unsubst". Template:Infobox data structure

A van Emde Boas tree (Script error: No such module "IPA".), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with mScript error: No such module "Check for unknown parameters".-bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.[1] It performs all operations in O(log m)Script error: No such module "Check for unknown parameters". time (assuming that an m bit operation can be performed in constant time), or equivalently in O(loglogM) time, where M=2m is the largest element that can be stored in the tree. The parameter M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.

The standard vEB tree has an unideal space efficiency of O(M). For example, for storing 32-bit integers (i.e., when m=32), it requires M=232 bits of storage. To resolve this, vEB trees can be modified to achieve O(nlogM) space, or similar data structures with equivalent asymptotic time efficiency and space efficiency of O(n) (where n is the number of stored elements) can be used.

Supported operations

A vEB supports the operations of an ordered associative array, which includes the usual associative array operations along with two more order operations, FindNext and FindPrevious:[2]

  • Insert: insert a key/value pair with an mScript error: No such module "Check for unknown parameters".-bit key
  • Delete: remove the key/value pair with a given key
  • Lookup: find the value associated with a given key
  • FindNext: find the key/value pair with the smallest key which is greater than a given kScript error: No such module "Check for unknown parameters".
  • FindPrevious: find the key/value pair with the largest key which is smaller than a given kScript error: No such module "Check for unknown parameters".

A vEB tree also supports the operations Minimum and Maximum, which return the minimum and maximum element stored in the tree respectively.[3] These both run in O(1) time, since the minimum and maximum element are stored as attributes in each tree.

Function

Example van Emde Boas tree
An example van Emde Boas tree with dimension 5 and the root's aux structure after 1, 2, 3, 5, 8 and 10 have been inserted.

Let

log2m=k

for some integer

k

. Define

M=2m

. A vEB tree <templatestyles src="Mono/styles.css" />T over the universe

{0,,M1}

has a root node that stores an array <templatestyles src="Mono/styles.css" />T.children of length

M

. <templatestyles src="Mono/styles.css" />T.children[i] is a pointer to a vEB tree that is responsible for the values

{iM,,(i+1)M1}

. Additionally, T stores two values <templatestyles src="Mono/styles.css" />T.min and <templatestyles src="Mono/styles.css" />T.max as well as an auxiliary vEB tree <templatestyles src="Mono/styles.css" />T.aux.

Data is stored in a vEB tree as follows: The smallest value currently in the tree is stored in <templatestyles src="Mono/styles.css" />T.min and largest value is stored in <templatestyles src="Mono/styles.css" />T.max. Note that <templatestyles src="Mono/styles.css" />T.min is not stored anywhere else in the vEB tree, while <templatestyles src="Mono/styles.css" />T.max is. If T is empty then we use the convention that <templatestyles src="Mono/styles.css" />T.max=−1 and <templatestyles src="Mono/styles.css" />T.min=M. Any other value x is stored in the subtree <templatestyles src="Mono/styles.css" />T.children[i] where i=x/M. The auxiliary tree <templatestyles src="Mono/styles.css" />T.aux keeps track of which children are non-empty, so <templatestyles src="Mono/styles.css" />T.aux contains the value j if and only if <templatestyles src="Mono/styles.css" />T.children[j] is non-empty.

FindNext

The operation <templatestyles src="Mono/styles.css" />FindNext(T, x) that searches for the successor of an element x in a vEB tree proceeds as follows: If <templatestyles src="Mono/styles.css" />x<T.min then the search is complete, and the answer is <templatestyles src="Mono/styles.css" />T.min. If <templatestyles src="Mono/styles.css" />x≥T.max then the next element does not exist, return M. Otherwise, let i=x/M. If <templatestyles src="Mono/styles.css" />x < T.children[i].max, then the value being searched for is contained in <templatestyles src="Mono/styles.css" />T.children[i], so the search proceeds recursively in <templatestyles src="Mono/styles.css" />T.children[i]. Otherwise, we search for the successor of the value i in <templatestyles src="Mono/styles.css" />T.aux. This gives us the index j of the first subtree that contains an element larger than x. The algorithm then returns <templatestyles src="Mono/styles.css" />T.children[j].min. The element found on the children level needs to be composed with the high bits to form a complete next element.

function FindNext(T, x)
    if x < T.min then
        return T.min
    if x ≥ T.max then // no next element
        return M
    i = floor(x/M)
    lo = x mod M
    
    if lo < T.children[i].max then
        return (M i) + FindNext(T.children[i], lo)
    j = FindNext(T.aux, i)
    return (M j) + T.children[j].min
end

Note that, in any case, the algorithm performs O(1) work and then possibly recurses on a subtree over a universe of size M1/2 (an m/2 bit universe). This gives a recurrence for the running time of T(m)=T(m/2)+O(1), which resolves to O(logm)=O(loglogM).

Insert

The call <templatestyles src="Mono/styles.css" />insert(T, x) that inserts a value <templatestyles src="Mono/styles.css" />x into a vEB tree <templatestyles src="Mono/styles.css" />T operates as follows:

  1. If T is empty then we set <templatestyles src="Mono/styles.css" />T.min = T.max = x and we are done.
  2. Otherwise, if <templatestyles src="Mono/styles.css" />x<T.min then we insert <templatestyles src="Mono/styles.css" />T.min into the subtree <templatestyles src="Mono/styles.css" />i responsible for <templatestyles src="Mono/styles.css" />T.min and then set <templatestyles src="Mono/styles.css" />T.min = x. If <templatestyles src="Mono/styles.css" />T.children[i] was previously empty, then we also insert <templatestyles src="Mono/styles.css" />i into <templatestyles src="Mono/styles.css" />T.aux
  3. Otherwise, if <templatestyles src="Mono/styles.css" />x>T.max then we insert <templatestyles src="Mono/styles.css" />x into the subtree <templatestyles src="Mono/styles.css" />i responsible for <templatestyles src="Mono/styles.css" />x and then set <templatestyles src="Mono/styles.css" />T.max = x. If <templatestyles src="Mono/styles.css" />T.children[i] was previously empty, then we also insert <templatestyles src="Mono/styles.css" />i into <templatestyles src="Mono/styles.css" />T.aux
  4. Otherwise, <templatestyles src="Mono/styles.css" />T.min< x < T.max so we insert <templatestyles src="Mono/styles.css" />x into the subtree <templatestyles src="Mono/styles.css" />i responsible for <templatestyles src="Mono/styles.css" />x. If <templatestyles src="Mono/styles.css" />T.children[i] was previously empty, then we also insert <templatestyles src="Mono/styles.css" />i into <templatestyles src="Mono/styles.css" />T.aux.

In code:

function Insert(T, x)
    if T.min == x || T.max == x then // x is already inserted
        return
    if T.min > T.max then // T is empty
        T.min = T.max = x;
        return
    if x < T.min then
        swap(x, T.min)
    if x > T.max then
        T.max = x
    i = floor(x / M)
    lo = x mod M
    Insert(T.children[i], lo)
    if T.children[i].min == T.children[i].max then
        Insert(T.aux, i)
end

The key to the efficiency of this procedure is that inserting an element into an empty vEB tree takes O(1)Script error: No such module "Check for unknown parameters". time. So, even though the algorithm sometimes makes two recursive calls, this only occurs when the first recursive call was into an empty subtree. This gives the same running time recurrence of Template:Tmath as before.

Delete

Deletion from vEB trees is the trickiest of the operations. The call <templatestyles src="Mono/styles.css" />Delete(T, x) that deletes a value x from a vEB tree T operates as follows:

  1. If <templatestyles src="Mono/styles.css" />T.min = T.max = x then x is the only element stored in the tree and we set <templatestyles src="Mono/styles.css" />T.min = M and <templatestyles src="Mono/styles.css" />T.max = −1 to indicate that the tree is empty.
  2. Otherwise, if <templatestyles src="Mono/styles.css" />x == T.min then we need to find the second-smallest value y in the vEB tree, delete it from its current location, and set <templatestyles src="Mono/styles.css" />T.min=y. The second-smallest value y is <templatestyles src="Mono/styles.css" />T.children[T.aux.min].min, so it can be found in O(1)Script error: No such module "Check for unknown parameters". time. We delete y from the subtree that contains it.
  3. If <templatestyles src="Mono/styles.css" />x≠T.min and <templatestyles src="Mono/styles.css" />x≠T.max then we delete x from the subtree <templatestyles src="Mono/styles.css" />T.children[i] that contains x.
  4. If <templatestyles src="Mono/styles.css" />x == T.max then we will need to find the second-largest value y in the vEB tree and set <templatestyles src="Mono/styles.css" />T.max=y. We start by deleting x as in previous case. Then value y is either <templatestyles src="Mono/styles.css" />T.min or <templatestyles src="Mono/styles.css" />T.children[T.aux.max].max, so it can be found in O(1)Script error: No such module "Check for unknown parameters". time.
  5. In any of the above cases, if we delete the last element x or y from any subtree <templatestyles src="Mono/styles.css" />T.children[i] then we also delete i from <templatestyles src="Mono/styles.css" />T.aux.

In code:

function Delete(T, x)
    if T.min == T.max == x then
        T.min = M
        T.max = −1
        return
    if x == T.min then
        hi = T.aux.min * M
        j = T.aux.min
        T.min = x = hi + T.children[j].min
    i = floor(x / M)
    lo = x mod M
    Delete(T.children[i], lo)
    if T.children[i] is empty then
        Delete(T.aux, i)
    if x == T.max then
        if T.aux is empty then
            T.max = T.min
        else
            hi = T.aux.max * M
            j = T.aux.max
            T.max = hi + T.children[j].max
end

Again, the efficiency of this procedure hinges on the fact that deleting from a vEB tree that contains only one element takes only constant time. In particular, the second Delete call only executes if x was the only element in <templatestyles src="Mono/styles.css" />T.children[i] prior to the deletion.

In practice

The assumption that log mScript error: No such module "Check for unknown parameters". is an integer is unnecessary. The operations xM and xmodM can be replaced by taking only higher-order m/2⌉Script error: No such module "Check for unknown parameters". and the lower-order m/2⌋Script error: No such module "Check for unknown parameters". bits of Template:Mvar, respectively. On any existing machine, this is more efficient than division or remainder computations.

In practical implementations, especially on machines with shift-by-k and find first zero instructions, performance can further be improved by switching to a bit array once Template:Mvar equal to the word size (or a small multiple thereof) is reached. Since all operations on a single word are constant time, this does not affect the asymptotic performance, but it does avoid the majority of the pointer storage and several pointer dereferences, achieving a significant practical savings in time and space with this trick.

An optimization of vEB trees is to discard empty subtrees. This makes vEB trees quite compact when they contain many elements, because no subtrees are created until something needs to be added to them. Initially, each element added creates about log(m)Script error: No such module "Check for unknown parameters". new trees containing about m/2Script error: No such module "Check for unknown parameters". pointers all together. As the tree grows, more and more subtrees are reused, especially the larger ones.

The implementation described above uses pointers and occupies a total space of O(M) = O(2m)Script error: No such module "Check for unknown parameters"., proportional to the size of the key universe. This can be seen as follows. The recurrence is S(M)=O(M)+(M+1)S(M). One can show that S(M) = O(M)Script error: No such module "Check for unknown parameters". by induction.[4]

Similar structures

The O(M)Script error: No such module "Check for unknown parameters". space usage of vEB trees is an enormous overhead unless a large fraction of the universe of keys is being stored. This is one reason why vEB trees are not popular in practice. This limitation can be addressed by changing the array used to store children to another data structure. One possibility is to use only a fixed number of bits per level, which results in a trie. Alternatively, each array may be replaced by a hash table, reducing the space to O(n log log M)Script error: No such module "Check for unknown parameters". (where Template:Mvar is the number of elements stored in the data structure) at the expense of making the data structure randomized.

x-fast tries and the more complicated y-fast tries have comparable update and query times to vEB trees and use randomized hash tables to reduce the space used. x-fast tries use O(n log M)Script error: No such module "Check for unknown parameters". space while y-fast tries use O(n)Script error: No such module "Check for unknown parameters". space.

Fusion trees are another type of tree data structure that implements an associative array on w-bit integers on a finite universe. They use word-level parallelism and bit manipulation techniques to achieve O(logw n)Script error: No such module "Check for unknown parameters". time for predecessor/successor queries and updates, where wScript error: No such module "Check for unknown parameters". is the word size.[5] Fusion trees use O(n)Script error: No such module "Check for unknown parameters". space and can be made dynamic with hashing or exponential trees.

Lenhof and Smid[6] present a variant of the vEB tree which uses space O(n) and take O(1) expected amortized time for inserting an element, under the restriction that insertions are in increasing order; in other words, the inserted element is always the new maximum. This structure uses dynamic perfect hashing to implement the tree in small space and moreover decrease the size of the tree by a log log MScript error: No such module "Check for unknown parameters". factor by keeping at the leaves buckets of size log log MScript error: No such module "Check for unknown parameters"..

A space-efficient and partially persistent version of vEB trees was presented by Dietz and Raman[7]. This version uses O(n) space, and supports an insert in the current version in O(log log M)Script error: No such module "Check for unknown parameters". amortized and expected time, while a query on any version remains O(log log M)Script error: No such module "Check for unknown parameters"..

Implementations

There is a verified implementation in Isabelle (proof assistant).[8] Both functional correctness and time bounds are proved. Efficient imperative Standard ML code can be generated.

See also

References

<templatestyles src="Reflist/styles.css" />

  1. Peter van Emde Boas: Preserving order in a forest in less than logarithmic time (Proceedings of the 16th Annual Symposium on Foundations of Computer Science 10: 75-84, 1975)
  2. Gudmund Skovbjerg Frandsen: Dynamic algorithms: Course notes on van Emde Boas trees (PDF) Template:Webarchive (University of Aarhus, Department of Computer Science) Script error: No such module "Unsubst".
  3. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. MIT Press, 2009. Template:ISBN. Chapter 20: The van Emde Boas tree, pp. 531–560.
  4. Script error: No such module "citation/CS1".
  5. Script error: No such module "citation/CS1".
  6. Script error: No such module "Citation/CS1".
  7. Template:Cite techreport
  8. Script error: No such module "citation/CS1".

Script error: No such module "Check for unknown parameters".

Further reading

<templatestyles src="Refbegin/styles.css" />

  • Erik Demaine, Sam Fingeret, Shravas Rao, Paul Christiano. Massachusetts Institute of Technology. 6.851: Advanced Data Structures (Spring 2012). Lecture 11 notes. March 22, 2012.
  • Script error: No such module "Citation/CS1".

Template:CS trees Template:Authority control