Dominik Schöffmann
Betreuer: Sebastian Gallenmüller
Betreuer: Paul Emmerich
Seminar: Future Internet WS2016/17
Lehrstuhl Netzarchitekturen und Netzdienste
Fakultät für Informatik, Technische Universität München
Email: schoeffm@in.tum.de
0 1 0 1 0 1
terms “left” and “right” are used to indicate a 0-bit and a 1- A B D
bit respectively. Inside of the trie, a route matches at exactly
1 0 0 0 1 skip 00 1
the position corresponding with the prefix.
E A B D
When looking up an IP address, the n-th most significant 1 0 skip 1 0 1 0
bit of the address is used to index the n-th level of the trie. F G H E
The trie is traversed using this fashion, until a leaf is found.
Once this is achieved, the route at the leaf might match the 0 1 1
IP address in question, or might not match it. If it does G H F
match, the search is over. If it does not match, the trie
(a) Basic (b) Compressed
needs to be backtracked upwards again.
Figure 2: Path compressed trie
For example, if the IP address which is to be looked up
is 128.0.1.24, the algorithm would take the first branch to Node Route Next Hop
the right, and the next branch to the left, reaching the leaf A 0.0.0.0/2 1.2.3.4
labeled C. Indeed this route matched the IP address, and B 64.0.0.0/2 2.3.4.5
the lookup was successful at this step. D 192.0.0.0/2 4.5.6.7
E 192.0.0.0/3 5.6.7.8
Another example is the address 96.4.5.6, which would branch F 112.0.0.0/4 6.7.8.9
left, then two times right, and left again. The leaf which is G 128.0.0.0/5 7.8.9.0
found does not contain any route. Since the algorithm did H 136.0.0.0/5 8.9.0.1
not reach a leaf which corresponds with a route matching
the IP address, upwards traversal of the trie is needed. As Table 2: Path compressed trie, Routing table
soon as the node B is rediscovered, the algorithm found the
longest matching prefix and finishes. The FreeBSD Operating System actually uses such a path
compressed trie as its IP lookup scheme, although in a differ-
It should again be noted, that this is not the only possible ent flavor. In the examples used here, routes could be inter-
method of making use of a basic trie, but was chosen for nal nodes, whereas in the FreeBSD implementation, routes
simplicity. For example one simple extension would be to are always leafs. [9][12] The same routing table, as above, is
save the last found route, in order to avoid the upwards also represented in Figure 3, using the FreeBSD implemen-
traversal. tation. It should be noted, that the numbers of the internal
nodes depict bit positions, on which the trie branches. Fur-
thermore, in this representation, two different routes sharing
5.3 Path compression the same prefix, but having a different prefix length cannot
When developing a routing lookup algorithm, the size of be handled solely within the trie. As a solution, the nodes
the data structure, and the number of memory accesses, have a list of routes, which are checked in the order of the
are of big importance. One method to decrease the num- prefix length. Longer prefixes get checked first. Internal
ber of needed memory accesses is called “path compression”. nodes can reference leafs, which is useful for the backtrack-
Again, different implementations exist, which utilize path ing, and depicted in Figure 3 as the dashed lines.
compression in their own way. In the first part, the notation
of the previous section is preserved, and the compression is Again, an example lookup may help to understand this data
influenced by Nilsson and Tikkanen [7]. The second part structure. The IP address to be looked up is 96.45.56.67.
describes the approach of FreeBSD. For simplicity, let’s split the first octet up into its binary
representation: 9610 = 011000002 This means, that we need
Path compression allows the data structure to skip interme- to take a left branch followed by two right branches, and left
diate nodes which only have one child each. branches from there on.
Figure 2 shows two tries, whereby Figure 2a is a none com- Following these directions reveals the leaf F which does not
pressed trie and Figure 2b is the path compressed version, match the IP address in question. The algorithm backtracks
representing the same data. The routing table correspond- one level up, where is encounters a reference to the leaf B. B
ing to Figure 2 is given in Table 2. Empty leafs are omitted in return holds a route which matches the IP address. This
for brevity. terminates the algorithm.
1 1 1 1
A 2 4 E&D 2 2 4 E&D
B F G H 3 3 5 F G H
It should be noted, that this rarely reflects the reality, since the algorithm are dependent on the actual implementation
almost always a default gateway exists, which can be used to be used. The algorithm presented for path compressed
as a last resort. tries still holds for this kind of level compressed trie, with
only minor changes.
5.4 Level compression
Although path compression is a pretty good start, for re- Although the FreeBSD trie does not use level compression,
ducing the memory consumption and accesses, level com- the Linux kernel does. [8] The Linux kernel furthermore
pression can further help to achieve these goals. As the adapts an optimization presented by Nilsson and Tikka-
name already suggests, multiple nodes on the same level nen [7]. Using the strict criteria presented in this paper
get merged, in order to have more information at the same imposes great cost on the maintenance of the trie. Abstain-
memory location. ing from the demand that no child node is empty, but in-
troducing thresholds as to when to double or halve a node
Saving nodes means, that less pointers to nodes need to be allows for good compression at reduced cost.
held, which is good for the memory consumption, and it
also means, that less nodes need to be accessed to find the 5.5 Poptrie
desired information, which is helpful to counter expensive Poptrie is a high performance IP lookup data structure based
memory accesses. The increased size of a single node is only upon LPC tries, developed by Asai and Ohara [11]. It incor-
a minor concern, because it usually still easily fits inside one porates both path and level compression, as well as support
cache line. for a new CPU instruction commonly found in commodity
hardware, namely “popcount”. This instruction counts the
A node can be doubled when all of its children themselves bits set to 1 in an integer. The data structure is highly com-
have the same amount of children, and all children split at pressed in order to fit into lower level caches, compared to
the same bit position for path compression. other schemes, which may have to make use of main memory
or the L3 cache.
Figure 4a shows an extended version of the FreeBSD trie
used above. A level compressed version of this trie is given Poptrie maintains two arrays, one for the internal nodes,
in Figure 4b. As can be seen, the level compression merges and one for the leafs. An internal node is a struct, which
multiple nodes into a single one. The notation “x/y” of the contains an offset into both of these arrays, and a field of
internal nodes means, that this node starts comparing bits bits. As usual the IP address is used as the key into the
at position x for a length of y bits. data structure which is divided into multiple parts, one for
each level. The part of the IP address to be processed at the
The internal structure of the nodes and therefore parts of moment is used to index the bit field. If the corresponding
Three arrays are used inside of the DXR algorithm, which For a basic trie the performance gain is still small, but no-
are called the “lookup table”, the “range table” and the “next ticeable. The time to route 10000000 addresses in depen-
hop table”. The lookup table is indexed using the first n bits dence of the batch size is presented in Figure 7.
of the IP address, thus having 2n entries. Commonly used
values for n are 16 and 18. An entry can have two different A batch size of one indicates the not-batched version of the
kinds of content. Either a route with a prefix longer than n algorithm. It can be observed, that the batch sizes 2, 16, and
exists, which thus cannot be handled inside the lookup table, 32 have the best speed. In the end, the best enhancement
8. REFERENCES
[1] J. Postel. (1981, September) Internet protocol.
[Online]. Available: https://tools.ietf.org/html/rfc791
[2] L. Rizzo, “netmap: a novel framework for fast packet
I/O,” in USENIX Annual Technical Conference, 2012.
[3] DPDK Developers. (2016, September). [Online].
Available: http://dpdk.org/
[4] Intel, DPDK Programmer’s Guide, version: 16.04.
[5] J. Thompson, “pfSense around the world, better IPsec,
tryforward and netmap-fwd,” October 2015. [Online].
Available: https://blog.pfsense.org/?p=1866
[6] P. Gupta, S. Lin, and N. McKeown, “Routing lookups
in hardware at memory access speeds,” in
INFOCOM’98. Seventeenth Annual Joint Conference
of the IEEE Computer and Communications Societies.