Xfast trie  

Type  Trie  
Invented  1982  
Invented by  Dan Willard  

In computer science, an xfast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982,^{[1]} along with the more complicated yfast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.
An xfast trie is a bitwise trie: a binary tree where each subtree stores values whose binary representations start with a common prefix. Each internal node is labeled with the common prefix of the values in its subtree and typically, the left child adds a 0 to the end of the prefix, while the right child adds a 1. The binary representation of an integer between 0 and M − 1 uses ⌈log_{2} M⌉ bits, so the height of the trie is O(log M).
All values in the xfast trie are stored at the leaves. Internal nodes are stored only if they have leaves in their subtree. If an internal node would have no left child, it stores a pointer to the smallest leaf in its right subtree instead, called a descendant pointer. Likewise, if it would have no right child, it stores a pointer to the largest leaf in its left subtree. Each leaf stores a pointer to its predecessor and successor, thereby forming a doubly linked list. Finally, there is a hash table for each level that contains all the nodes on that level. Together, these hash tables form the levelsearch structure (LSS). To guarantee the worstcase query times, these hash tables should use dynamic perfect hashing or cuckoo hashing.
The total space usage is O(n log M), since each element has a roottoleaf path of length O(log M).
Like van Emde Boas trees, xfast tries support the operations of an ordered associative array. This includes the usual associative array operations, along with two more order operations, Successor and Predecessor:
Finding the value associated with a key k that is in the data structure can be done in constant time by looking up k in LSS[0], which is a hash table on all the leaves.^{[2]}
To find the successor or predecessor of a key k, we first find A_{k}, the lowest ancestor of k. This is the node in the trie that has the longest common prefix with k. To find A_{k}, we perform a binary search on the levels. We start at level h/2, where h is the height of the trie. On each level, we query the corresponding hash table in the levelsearch structure with the prefix of k of the right length. If a node with that prefix does not exist, we know that A_{k} must be at a higher level and we restrict our search to those. If a node with that prefix does exist, A_{k} can not be at a higher level, so we restrict our search to the current and lower levels.
Once we find the lowest ancestor of k, we know that it has leaves in one of its subtrees (otherwise it wouldn't be in the trie) and k should be in the other subtree. Therefore the descendant pointer points to the successor or the predecessor of k. Depending on which one we are looking for, we might have to take one step in the linked list to the next or previous leaf.
Since the trie has height O(log M), the binary search for the lowest ancestor takes O(log log M) time. After that, the successor or predecessor can be found in constant time, so the total query time is O(log log M).^{[1]}
To find the key k itself or its successor, we first follow the edge starting from the root. In the above graph, the edge to the left child is labeled "0", and the edge to the right child is labeled "1". If we can find the child by following the edge, then we keep following the edge, until it leads to the key k we are looking for. If an edge we want does not exist, then there must be a pointer instead pointing to its successor, so we follow the pointer and go to its successor.
Since in the worst case, we have to walk down the entire height of the trie, this operation takes O(log M) time.^{[3]}
To insert a keyvalue pair (k, v), we first find the predecessor and successor of k. Then we create a new leaf for k, insert it in the linked list of leaves between the successor and predecessor, and give it a pointer to v. Next, we walk from the root to the new leaf, creating the necessary nodes on the way down, inserting them into the respective hash tables and updating descendant pointers where necessary.
Since we have to walk down the entire height of the trie, this process takes O(log M) time.^{[4]}
To delete a key k, we find its leaf using the hash table on the leaves. We remove it from the linked list, but remember which were the successor and predecessor. Then we walk from the leaf to the root of the trie, removing all nodes whose subtree only contained k and updating the descendant pointers where necessary. Descendant pointers that used to point to k will now point to either the successor or predecessor of k, depending on which subtree is missing.
Like insertion, this takes O(log M) time, as we have to walk through every level of the trie.^{[4]}
Here is an example of implementing XFast Tire in Java:
class XFastTrie {
private final int MAX_BITS;
private Node root;
private HashMap<Integer, Node>[] levelMaps;
public XFastTrie(int universeSize) {
this.MAX_BITS = (int) (Math.log(universeSize) / Math.log(2)) + 1;
this.root = new Node();
this.levelMaps = new HashMap[MAX_BITS];
for (int i = 0; i < MAX_BITS; i++) {
levelMaps[i] = new HashMap<>();
}
}
class Node {
Node left = null, right = null;
Integer value = null; // Use the wrapper class to allow null
}
public void insert(int value) {
int path = 0;
Node current = root;
for (int i = MAX_BITS  1; i >= 0; i) {
levelMaps[i].put(path, current); // Update the level hash map
int bit = (value >>> i) & 1;
path = (path << 1)  bit;
if (bit == 0) {
if (current.left == null) {
current.left = new Node();
}
current = current.left;
} else {
if (current.right == null) {
current.right = new Node();
}
current = current.right;
}
}
current.value = value; // Store the actual value at the leaf
}
public Node search(int value) {
int path = 0;
Node current = root;
for (int i = MAX_BITS  1; i >= 0; i) {
int bit = (value >>> i) & 1;
path = (path << 1)  bit;
if (bit == 0) {
if (current.left == null) {
return null; // Value not found
}
current = current.left;
} else {
if (current.right == null) {
return null; // Value not found
}
current = current.right;
}
}
return current.value != null ? current : null; // Check if it's an actual value node
}
}
Willard introduced xfast tries largely as an introduction to yfast tries, which provide the same query time, while using only O(n) space and allowing insertions and deletions in O(log log M) time.^{[1]}
A compression technique similar to patricia tries can be used to significantly reduce the space usage of xfast tries in practice.^{[5]}
By using an exponential search before the binary search over the levels and by querying not only the current prefix x, but also its successor x + 1, xfast tries can answer predecessor and successor queries in time O(log log Δ), where Δ is the difference between the query value and its predecessor or successor.^{[2]}