Judy array
Latest revision as of 21:11, 9 August 2025
In computer science, a Judy array is a hand-optimized implementation of a 256-ary radix tree, developed at Hewlett-Packard in the early 2000s, that uses many situational node types to reduce latency from CPU cache-line fills.[1][2] As a compressed radix tree, a Judy array can store potentially sparse integer- or string-indexed data with comparatively low memory usage and low read latency, without relying on hashing or tree balancing, and without sacrificing in-order traversal.[3] Per-operation latency scales as O(log n), as expected of a tree, and the leading constant factor is small enough that Judy arrays are suitable even for the peta-element range.[4] When applicable, they can be faster than implementations of AVL trees, B-trees, hash tables, or skip lists from the same time period.[3]
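As a rough illustration of what "256-ary" means here, the sketch below splits an integer key into base-256 digits (bytes), most significant first, so that each level of the tree can consume one digit. The function name `key_digits` and the fixed 4-byte key width are assumptions made for this example, not part of the Judy API.

```python
from typing import List


def key_digits(key: int, width: int = 4) -> List[int]:
    """Split a non-negative integer key into `width` base-256 digits,
    most significant digit first (hypothetical helper, fixed key width)."""
    return list(key.to_bytes(width, "big"))


# Each digit selects one of up to 256 children at its level of the tree.
example = key_digits(0x12345678)

# With most-significant-first digits, comparing digit lists matches comparing
# the keys themselves, which is what makes in-order traversal possible.
keys = [70000, 5, 300, 6]
by_digits = sorted(keys, key=key_digits)
```

Under these assumptions a lookup visits at most `width` levels, one per key byte, regardless of how many keys are stored.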
History
The Judy array was invented by Douglas Baskins over the years leading up to 2002 and named after his sister.[5]
Node types
Broadly, tree nodes in Judy arrays fall into one of three categories, though the implementation uses situational variations within each category:[2]
- A linear node is a short, fixed-capacity, array-based association list meant to fit in one cache line. That is, such a node has an array of key bytes and a parallel array of values or pointers. Lookup is by linear search over the key array and then random access to the corresponding index in the value/pointer array.
- A bitmap node pairs a 256-bit bitvector, which tracks which values/children are present, with a sorted list of the corresponding values or pointers. Lookup is by population count of the bits below the target index and then random access to the corresponding entry in the value/pointer array. The bitmap fits within a typical CPU cache line, and random access loads only one cache line from the sorted list, so reading such a node requires at most two cache-line fills.
- An uncompressed node is a conventional trie node as an array of values/pointers. Lookup is by random access using the key byte as an index, which at the CPU level requires visiting one cache line.
Linear nodes are used for low branching, bitmap nodes for intermediate branching, and uncompressed nodes for high branching.[2]
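The three lookup strategies described above can be sketched as follows. This is a simplified illustration rather than the Judy implementation: the function names, the use of a Python integer as the 256-bit bitmap, and the use of `None` for "absent" are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def linear_lookup(keys: bytes, slots: Sequence[str], key_byte: int) -> Optional[str]:
    """Linear node: scan the short key array, then index the parallel slot array."""
    for i, k in enumerate(keys):
        if k == key_byte:
            return slots[i]
    return None


def bitmap_lookup(bitmap: int, slots: Sequence[str], key_byte: int) -> Optional[str]:
    """Bitmap node: bit b is set when key byte b is present; `slots` holds
    only the present entries, in ascending key-byte order."""
    if not (bitmap >> key_byte) & 1:
        return None
    below = bitmap & ((1 << key_byte) - 1)   # keep only bits below key_byte
    rank = bin(below).count("1")             # population count = slot index
    return slots[rank]


def uncompressed_lookup(slots: Sequence[Optional[str]], key_byte: int) -> Optional[str]:
    """Uncompressed node: the key byte is used directly as the array index."""
    return slots[key_byte]


# The same three entries stored in each node representation.
entries = {3: "a", 7: "b", 200: "c"}
lin_keys = bytes(sorted(entries))
lin_slots = [entries[k] for k in sorted(entries)]
bitmap = sum(1 << k for k in entries)
full: List[Optional[str]] = [entries.get(b) for b in range(256)]

found = bitmap_lookup(bitmap, lin_slots, 200)  # two lower bits set, so slot 2
```

The trade-off is visible in the storage: the linear and bitmap forms keep only three slots for three entries, while the uncompressed form keeps all 256 slots but needs no search or bit counting.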
Advantages and disadvantages
Due to cache optimizations, Judy arrays are fast, especially for very large datasets. On certain tasks involving data that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys.[6]
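To make the point about key ordering concrete, here is a minimal sketch of a byte-wise trie built from nested Python dictionaries, with an in-order traversal. It is a toy stand-in for a radix tree, not Judy itself: `insert`, `items_in_order`, and the fixed 4-byte key width are assumptions for illustration, and `sorted()` stands in for the fact that a real node would already keep its children in key order.

```python
from typing import Any, Dict, Iterator, Tuple


def insert(root: Dict[int, Any], key: int, value: str, width: int = 4) -> None:
    """Store `value` under `key`, descending one key byte per trie level."""
    digits = key.to_bytes(width, "big")
    node = root
    for d in digits[:-1]:
        node = node.setdefault(d, {})
    node[digits[-1]] = value


def items_in_order(node: Dict[int, Any], depth: int = 0, prefix: int = 0,
                   width: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (key, value) pairs in ascending key order by visiting
    children in ascending byte order at every level."""
    for d in sorted(node):
        key = (prefix << 8) | d
        if depth == width - 1:
            yield key, node[d]
        else:
            yield from items_in_order(node[d], depth + 1, key, width)


root: Dict[int, Any] = {}
for k in [70000, 5, 300, 6]:
    insert(root, k, f"v{k}")

ordered = list(items_in_order(root))
```

Neighbouring keys such as 5 and 6 end up in the same leaf-level node here, which is the kind of locality that sequential or nearly sequential workloads benefit from; a hash table would scatter them and would need a separate sort to produce the same ordered output.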
On the other hand, Judy arrays are not suitable for all key types, rely heavily on compile-time case-splitting (which increases both the compiled code size and the work involved in retuning for a new architecture[6]), make some concessions to older architectures that may not be relevant to modern machines, and do not exploit SIMD.[2] They are optimized for read performance over write performance.[2]
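See also
- Radix tree
- Bitwise trie with bitmap
- Hash array mapped trie
References
1. Robert Gobeille and Douglas Baskins' patent. https://patents.google.com/patent/US6735595B2/en
2. Alan Silverstein, "Judy IV Shop Manual", 2002. https://judy.sourceforge.net/application/shop_interm.pdf
3. "A 10-Minute Description of How Judy Arrays Work and Why They Are So Fast". http://judy.sourceforge.net/doc/10minutes.htm
4. "Debian -- Details of package libjudy-dev in buster". http://packages.debian.org/buster/libjudy-dev
5. "Home". judy.sourceforge.net. https://judy.sourceforge.net/
6. "A performance comparison of Judy to hash tables". http://www.nothings.org/computer/judy/
External links
- Main Judy arrays site: https://judy.sourceforge.net/
- How Judy arrays work and why they are so fast: https://judy.sourceforge.net/downloads/10minutes.htm
- A complete technical description of Judy arrays: https://judy.sourceforge.net/application/shop_interm.pdf
- An independent performance comparison of Judy to hash tables: http://www.nothings.org/computer/judy/
- A compact implementation of Judy arrays in 1250 lines of C code: https://github.com/JerrySievert/judyarray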