44 lines
4.0 KiB
Factor
44 lines
4.0 KiB
Factor
USING: arrays assocs bytearrays help.markup help.syntax io.encodings.utf8


kernel math serialize trees.cb.private ;


IN: trees.cb




ARTICLE: "trees.cb" "Binary critbit trees"


"The " { $vocablink "trees.cb" } " vocabulary is a library for binary critical bit trees, a variant of PATRICIA tries. A critbit tree stores each element of a nonempty set of keys " { $snippet "K" } " in a leaf node. Each leaf node is attached to the tree of internal split nodes for bit strings " { $snippet "x" } " such that " { $snippet "x0" } " and " { $snippet "x1" } " are prefixes of (serialized byte arrays of) elements in " { $snippet "K" } " and ancestors of other bit strings higher up in the tree. Split nodes store the prefix compressed as two values, the byte number and bit position, in the subset of " { $snippet "K" } " at which the prefixes of all ancestors to the left differ from all ancestors to the right."


$nl


"Serialization of keys is implemented using " { $link key>bytes } ". Critbit trees can store arbitrary keys and values, even mixed (but see implementation notes to " { $link key>bytes* } "). Due to the nature of critbit trees, for any given input key set that shares a common prefix, the tree compresses the common prefix into the split node at the joint extending the lookup by one node for arbitrarily long prefixes."


$nl


"Keys are serialized once for every lookup and insertion not adding a new leaf node. Two keys are serialized for every insertion adding a new leaf node to the tree."


$nl


"Due to ordering ancestors at split nodes into critbit '0' (left) and critbit '1' (right), the order of the elements in a critbit tree is total allowing efficient suffix searches and minimum searches."


$nl


"Critbit trees consume 2 * " { $emphasis "n" } "  1 nodes in total for storing " { $emphasis "n" } " elements; each internal split node consumes two pointers and a fixnum and an integer; each leaf node two pointers to the key and value. Their shape is unique for any given set of keys, which also means lookup times are deterministic for a known set of keys regardless of insertion order or the tree having been cloned."


$nl


"Compared to hash tables, critbit trees provide fast access without being prone to malicious input (but see limitations of the standard implementation of " { $link key>bytes* } ") and also provide ordered operations (e.g. finding minimums). Compared to heaps, they support exact searches and suffix searches in addition. Compared to other ordered trees (AVL, B), they support the same set of operations while keeping a simpler inner structure."


$nl


"Critbit trees conform to the assoc protocol."


;




ABOUT: "trees.cb"




HELP: CB{


{ $syntax "CB{ { key value }... }" }


{ $values { "key" "a key" } { "value" "a value" } }


{ $description "Literal syntax for a critbit tree." } ;




HELP: <cb>


{ $values { "tree" cb } }


{ $description "Creates an empty critbit tree" } ;




HELP: >cb


{ $values { "assoc" assoc } { "tree" cb } }


{ $description "Converts any " { $link assoc } " into a critbit tree. If the input assoc is a " { $link cb } ", the elements are cloned before insertion. The resulting tree is, then, identical to the input, as critbit trees are unique for any given content." } ;




HELP: cb


{ $classdescription "This is the class for binary critbit trees (i.e. discriminating on a single critical bit)." } ;




HELP: key>bytes*


{ $values { "key" object } { "bytes" bytearray } }


{ $description "Converts a key, which can be any " { $link object } ", into a " { $link bytearray } ". Standard methods convert strings into its " { $link utf8 } " byte sequences and " { $link float } " values into byte arrays representing machinespecific doubles. Integrals are converted into a byte sequence of at least machine word size in little endian byte order."


$nl


"All other objects are serialized using " { $link object>bytes } ". In the standard implementation, this maps " { $link f } " to the byte array " { $snippet "B{ 110 }" } " and " { $link t } " to " { $snippet "B{ 116 }" } ", which is identical to using the respective literal byte arrays as inputs." } ;
