Radix trees in Rcpp

Oliver Keyes

2023-03-04

A radix tree is a data structure optimised for storing key-value pairs in a way optimised for searching. This makes them very, very good for efficiently matching data against keys, and retrieving the values associated with those keys.

triebeard provides an implementation of radix trees for Rcpp (and also for use directly in R). To start using radix trees in your Rcpp development, simply modify your C++ file to include at the top:

//[[Rcpp::depends(triebeard)]]
#include <radix.h>

Constructing trees

Trees are constructed using the syntax:

radix_tree<type1, type2> radix;

Where type represents the type of the keys (for example, std::string) and type2 the type of the values.

Radix trees can have any scalar type as keys, although strings are most typical; they can also have any scalar type for values. Once you’ve constructed a tree, new entries can be added in a very R-like way: radix[new_key] = new_value;. Entries can also be removed, with radix.erase(key).

Matching against trees

We then move on to the fun bit: matching! As mentioned, radix trees are really good for matching arbitrary values against keys (well, keys of the same type) and retrieving the associated values.

There are three types of supported matching; longest, prefix, and greedy. Longest does exactly what it says on the tin: it finds the key-value pair where the longest initial part of the key matches the arbitrary value:

radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";

radix_tree<std::string, std::string>::iterator it;

it = radix.longest_match("turing");

if(it = radix.end()){
  printf("No match was found :(");
} else {
  std::string result = "Key of longest match: " + it->first + " , value of longest match: " + it->second;
}

Prefix matching provides all trie entries where the value-to-match is a prefix of the key:

radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";

std::vector<radix_tree<std::string, std::string>::iterator> vec;
std::vector<radix_tree<std::string, std::string>::iterator>::iterator it;

it = radix.prefix_match("tur");

if(it == vec.end()){
  printf("No match was found :(");
} else {
  for (it = vec.begin(); it != vec.end(); ++it) {
    std::string result = "Key of a prefix match: " + it->first + " , value of a prefix match: " + it->second;
  }
}

Greedy matching matches very, very fuzzily (a value of ‘bring’, for example, will match ‘blind’, ‘bind’ and ‘binary’) and, syntactically, looks exactly the same as prefix-matching, albeit with radix.greedy_match() instead of radix.prefix_match().

Other trie things

If you have ideas for other trie-like structures, or functions that would be useful with these tries, the best approach is to either request it or add it!