2.2 TRIE-BASED ALGORITHMS 35 The lookup is performed in strides of k bits.The lookup complexity is the number of bits in the prefix divided by k bits,O(W/k).For example,if W is 32 and k is 4,then 8 lookups in the worst case are required to access that node.An update requires a search through W/k lookup iterations plus access to each child node(2).The update complexity is O(W/k+2).In the worst case,each prefix would need an entire path of length (W/k)and each node would have 2k entries.The space complexity would then be O((2**N*W)/k). 2.2.4 Level Compression Trie The path-compressed trie in Section 2.2.2 is an effective way to compress a trie when nodes are sparsely populated.Nilsson and Karlsson [9]have proposed a technique called level compression trie (LC-trie),to compress the trie where nodes are densely populated.The LC-trie actually combines the path-compression and multi-bit trie concepts to optimize a binary trie structure. Data Structure.The fixed-stride multi-bit trie (in Section 2.2.3)can improve lookup performance,but it may incur redundant storage.However,from an angle of local scopes, if we only perform multi-bit lookup wherever the sub-trie structures are full sub-trees,then no redundant storage is needed in those nodes.The construction of the LC-trie is,in fact, a process of transforming a path-compressed trie to a multi-bit path-compressed trie.The process is to find the full sub-trees of a path-compressed trie,and transform them into multi-bit lookup nodes.Therefore,information stored in each node of the LC-trie includes those that are needed for both the path-compressed trie and the multi-bit trie lookup. The LC-trie algorithm starts withadisjoint-prefix binary trie as described in Section 2.2.1, where only leaf nodes contain prefixes.Figure 2.12 shows three different tries:(a)1-bit binary trie with prefixes at leaf nodes;(b)path-compressed trie;and(c)LC-trie.The LC-trie needs only three levels instead of the six required in the path-compressed trie.Furthermore, Figure 2.12 shows a straight-forward transformation from path-compressed trie to LC-trie, as shown in the dashed boxes in Figures 2.12b and c,where the first three levels of the path-compressed trie that form a full sub-trie are converted to a single-level 3-bit sub-trie in the LC-trie. Route Lookup.Let us again use the 8-bit destination address 11100000 as an example to explain the lookup process.The lookup starts at the root node of the LC-trie in Figure 2.12c. In this node,the multi-bit and path-compression trie lookup information shows 'stride =3' and 'skip =0'.'Stride =3'indicates that there would be 23=8 branches in this node and that we should use (the next)3 bits of the address to find the corresponding branch. 'Skip =0'indicates that no path compression is involved here.The first three bits of the address 111 are inspected,and we should take the (7+1)=8th branch of this node.Then, at the branched node,we have 'stride 1'and 'skip =3'.'Stride =1'indicates that there are only 2=2 branches in this node.'Skip =3'indicates that the following path of the trie is path-compressed,and we should compare with the stored segment,and if it matches then skip the next three bits of the IP address.We skip the next three bits 000 and examine the fourth bit,0.The bit 0 indicates that we should take the left branch of this node,and then we come to node 13.On finding that node 13 is a leaf node,the lookup stops here,and the next hop information corresponding to node 13 is returned
2.2 TRIE-BASED ALGORITHMS 35 The lookup is performed in strides of k bits. The lookup complexity is the number of bits in the prefix divided by k bits, O(W/k). For example, if W is 32 and k is 4, then 8 lookups in the worst case are required to access that node. An update requires a search through W/k lookup iterations plus access to each child node (2k ). The update complexity is O(W/k + 2k ). In the worst case, each prefix would need an entire path of length (W/k) and each node would have 2k entries. The space complexity would then be O((2k ∗ N ∗ W)/k). 2.2.4 Level Compression Trie The path-compressed trie in Section 2.2.2 is an effective way to compress a trie when nodes are sparsely populated. Nilsson and Karlsson [9] have proposed a technique called level compression trie (LC-trie), to compress the trie where nodes are densely populated. The LC-trie actually combines the path-compression and multi-bit trie concepts to optimize a binary trie structure. Data Structure. The fixed-stride multi-bit trie (in Section 2.2.3) can improve lookup performance, but it may incur redundant storage. However, from an angle of local scopes, if we only perform multi-bit lookup wherever the sub-trie structures are full sub-trees, then no redundant storage is needed in those nodes. The construction of the LC-trie is, in fact, a process of transforming a path-compressed trie to a multi-bit path-compressed trie. The process is to find the full sub-trees of a path-compressed trie, and transform them into multi-bit lookup nodes. Therefore, information stored in each node of the LC-trie includes those that are needed for both the path-compressed trie and the multi-bit trie lookup. The LC-trie algorithm starts with a disjoint-prefix binary trie as described in Section 2.2.1, where only leaf nodes contain prefixes. Figure 2.12 shows three different tries: (a) 1-bit binary trie with prefixes at leaf nodes; (b) path-compressed trie; and (c) LC-trie. The LC-trie needs only three levels instead of the six required in the path-compressed trie. Furthermore, Figure 2.12 shows a straight-forward transformation from path-compressed trie to LC-trie, as shown in the dashed boxes in Figures 2.12b and c, where the first three levels of the path-compressed trie that form a full sub-trie are converted to a single-level 3-bit sub-trie in the LC-trie. Route Lookup. Let us again use the 8-bit destination address 11100000 as an example to explain the lookup process. The lookup starts at the root node of the LC-trie in Figure 2.12c. In this node, the multi-bit and path-compression trie lookup information shows ‘stride = 3’ and ‘skip = 0’. ‘Stride = 3’ indicates that there would be 23 = 8 branches in this node and that we should use (the next) 3 bits of the address to find the corresponding branch. ‘Skip = 0’ indicates that no path compression is involved here. The first three bits of the address 111 are inspected, and we should take the (7 + 1) = 8th branch of this node. Then, at the branched node, we have ‘stride = 1’ and ‘skip = 3’. ‘Stride = 1’ indicates that there are only 21 = 2 branches in this node. ‘Skip = 3’ indicates that the following path of the trie is path-compressed, and we should compare with the stored segment, and if it matches then skip the next three bits of the IP address. We skip the next three bits 000 and examine the fourth bit, 0. The bit 0 indicates that we should take the left branch of this node, and then we come to node 13. On finding that node 13 is a leaf node, the lookup stops here, and the next hop information corresponding to node 13 is returned
36 IP ADDRESS LOOKUP 10 13 14 (a) Skip= Skip=3 14 (b) Stride=3,Skip=0 Skip=2 Stride=1.Skip=3 2 0 13 (c) Figure 2.12 (a)One-bit trie;(b)Path-compressed trie;(c)LC-trie. Performance.Instead of only transforming strictly full sub-tries in the path-compressed trie,we could transform nearly full sub-tries as well.An optimization proposed by Nilsson and Karlsson [9]suggests this improvement under the control of a fill factor x,where 0<x<1.The transformation could be performed on this sub-trie when a k-level sub-trie has more than 2.x leaf nodes available. Less than 1-Mbyte memory is required and 2 Mpackets/s(assuming average packet size of 250 bytes)search speed is achieved when running on a SUN Sparc Ultra II
36 IP ADDRESS LOOKUP Figure 2.12 (a) One-bit trie; (b) Path-compressed trie; (c) LC-trie. Performance. Instead of only transforming strictly full sub-tries in the path-compressed trie, we could transform nearly full sub-tries as well. An optimization proposed by Nilsson and Karlsson [9] suggests this improvement under the control of a fill factor x, where 0 < x < 1. The transformation could be performed on this sub-trie when a k-level sub-trie has more than 2k · x leaf nodes available. Less than 1-Mbyte memory is required and 2 Mpackets/s (assuming average packet size of 250 bytes) search speed is achieved when running on a SUN Sparc Ultra II
2.2 TRIE-BASED ALGORITHMS 37 workstation [9].This is for an LC-trie built with a fill factor of 0.5 and using 16 bits for branching at the root and a forwarding table with around 38,000 entries.However,using a node array to store the LC-trie makes incremental updates very difficult. An LC-trie searches in strides of k bits,and thus the lookup complexity of a k-stride LC-trie is O(W/k).To update a particular node,we would have to go through W/k lookups and then access each child of the node(2).Thus,the update complexity is O(W/k+2*). The memory consumption increases exponentially as the stride size (k)increases.In the worst case,each prefix would need an entire path of length(W/k)and each node has 2k entries.The space complexity would then be O((2*N*W)/k). 2.2.5 Lulea Algorithm Degermark et al.[10]have proposed a data structure that can represent large forward- ing tables in a very compact form.This solution is small enough to fit entirely in the Level 1/Level 2 cache.This provides an advantage in that this fast IP route-lookup algorithm can be implemented in software running on general-purpose microprocessors at high speeds. The key idea of the Lulea scheme is to replace all consecutive elements in a trie node that have the same value with a single value,and use a bitmap to indicate which elements have been omitted.This can significantly reduce the number of elements in a trie node and save memory. Basically,the Lulea trie is a multi-bit trie that uses bitmap compression.We begin with the corresponding multi-bit'leaf-pushing'trie structure given in Section 2.2.3.For instance, consider the root node in Figure 2.11.Before bitmap compression,the root node has the sequence of values (P3,P3,P1,P1,ptr1,P4,P2,ptr2),where ptrl is a pointer to the trie node containing P6 and ptr2 is a pointer to the trie node containing P7.If we replace each string of consecutive values by the first value in the sequence,we get P3,-,P1,-,ptrl, P4,P2,ptr2.Notice the two redundant values have been removed.We can now get rid of the original trie node and replace it with a bitmap 10101111 where the '0's indicate the removed position and a compressed list(P3,P1,ptr1,P4,P2,ptr2).The result of doing bitmap compression for all four trie nodes is shown in Figure 2.13. Data Structure.The multi-bit trie in the Lulea algorithm is a disjoint-prefix trie as described in Section 2.2.1.As shown in Figure 2.14,level one of the data structure covers the trie down to depth 16,level two covers depths 17 to 24,and level three depths 25 to 32. A level-two chunk describes parts of the trie that are deeper than 16.Similarly,chunks at level three describe parts of the trie that are deeper than 24.The result of searching a level of the data structure is either an index into the next-hop information or an index into an array of chunks for the next level. The first level is essentially a complete trie with 1-64k children nodes,which covers the trie down to depth 16.Imagine a cut through the trie at depth 16.Figure 2.15 shows an example of partial cut.The cut is stored in a bit-vector,with one bit per possible node at depth 16.Thus,26 bits=64 kbits=8 kbytes are required for this.The upper 16 bits of the address are used as an index into the bit-vector to find the bit corresponding to the initial part of an IP address
2.2 TRIE-BASED ALGORITHMS 37 workstation [9]. This is for an LC-trie built with a fill factor of 0.5 and using 16 bits for branching at the root and a forwarding table with around 38,000 entries. However, using a node array to store the LC-trie makes incremental updates very difficult. An LC-trie searches in strides of k bits, and thus the lookup complexity of a k-stride LC-trie is O(W/k). To update a particular node, we would have to go through W/k lookups and then access each child of the node (2k ). Thus, the update complexity is O(W/k + 2k ). The memory consumption increases exponentially as the stride size (k) increases. In the worst case, each prefix would need an entire path of length (W/k) and each node has 2k entries. The space complexity would then be O((2k ∗ N ∗ W)/k). 2.2.5 Lulea Algorithm Degermark et al. [10] have proposed a data structure that can represent large forwarding tables in a very compact form. This solution is small enough to fit entirely in the Level 1/Level 2 cache. This provides an advantage in that this fast IP route-lookup algorithm can be implemented in software running on general-purpose microprocessors at high speeds. The key idea of the Lulea scheme is to replace all consecutive elements in a trie node that have the same value with a single value, and use a bitmap to indicate which elements have been omitted. This can significantly reduce the number of elements in a trie node and save memory. Basically, the Lulea trie is a multi-bit trie that uses bitmap compression. We begin with the corresponding multi-bit ‘leaf-pushing’trie structure given in Section 2.2.3. For instance, consider the root node in Figure 2.11. Before bitmap compression, the root node has the sequence of values (P3, P3, P1, P1, ptr1, P4, P2, ptr2), where ptr1 is a pointer to the trie node containing P6 and ptr2 is a pointer to the trie node containing P7. If we replace each string of consecutive values by the first value in the sequence, we get P3, −, P1, −, ptr1, P4, P2, ptr2. Notice the two redundant values have been removed. We can now get rid of the original trie node and replace it with a bitmap 10101111 where the ‘0’s indicate the removed position and a compressed list (P3, P1, ptr1, P4, P2, ptr2). The result of doing bitmap compression for all four trie nodes is shown in Figure 2.13. Data Structure. The multi-bit trie in the Lulea algorithm is a disjoint-prefix trie as described in Section 2.2.1. As shown in Figure 2.14, level one of the data structure covers the trie down to depth 16, level two covers depths 17 to 24, and level three depths 25 to 32. A level-two chunk describes parts of the trie that are deeper than 16. Similarly, chunks at level three describe parts of the trie that are deeper than 24. The result of searching a level of the data structure is either an index into the next-hop information or an index into an array of chunks for the next level. The first level is essentially a complete trie with 1–64k children nodes, which covers the trie down to depth 16. Imagine a cut through the trie at depth 16. Figure 2.15 shows an example of partial cut. The cut is stored in a bit-vector, with one bit per possible node at depth 16. Thus, 216 bits = 64 kbits = 8 kbytes are required for this. The upper 16 bits of the address are used as an index into the bit-vector to find the bit corresponding to the initial part of an IP address
38 IP ADDRESS LOOKUP Root node 10101111☐ p34 P1 Ptr1 P4 P2 …+…,…。,,+。 Node 1 Node 2 11101000 11101000 P64 P54 Ptr3 P8 P6 ” P7 P2 Node 3 10001000 P6 P9 Figure 2.13 Example of a Lulea trie. As shown in Figure 2.15,a bit in the bit-vector can be set to: One representing that a prefix tree continues below the cut.They are called a root head (bits 6,12,and 13 in Fig.2.15). One representing a prefix at depth 16 or less.For the latter case,only the lowest bit in the interval covered by that prefix is set.They are called a genuine head (bits 0,4,7, 8,14,and15 in Fig.2.15). Zero,which means that this value is a member of a range covered by a prefix at a depth less than 16 (bits 1,2,3,5,9,10,and 11 in Fig.2.15). 16 Level 1 24 Level 2 32 Level 3 Figure 2.14 Three levels of the data structure [101
38 IP ADDRESS LOOKUP Figure 2.13 Example of a Lulea trie. As shown in Figure 2.15, a bit in the bit-vector can be set to: • One representing that a prefix tree continues below the cut. They are called a root head (bits 6, 12, and 13 in Fig. 2.15). • One representing a prefix at depth 16 or less. For the latter case, only the lowest bit in the interval covered by that prefix is set. They are called a genuine head (bits 0, 4, 7, 8, 14, and 15 in Fig. 2.15). • Zero, which means that this value is a member of a range covered by a prefix at a depth less than 16 (bits 1, 2, 3, 5, 9, 10, and 11 in Fig. 2.15). Figure 2.14 Three levels of the data structure [10]
2.2 TRIE-BASED ALGORITHMS 39 Depth 16 团o@回团回团团团可o可团▣团团■ 0123456789101112131415 Figure 2.15 Part of cut with corresponding bit-vector [10]. A pointer to the next-hop information is stored for genuine heads.Members behind the genuine head use the same index.A pointer to the level two chunk that represents the corresponding sub-trie is stored in the root heads. The head information is encoded in 16-bit pointers stored in an array.Two bits of the pointer encode what kind of pointer it is.The 14 remaining bits are either indices into the next-hop information or into an array containing level two chunks.Note that there are as many pointers associated with a bit-mask as its number of set bits. Finding Pointer Groups.The bit-vector is divided into bit-masks of length 16 and there are 212=4096 of those.Figure 2.16 is an illustration of how the data structure for finding pointers corresponds to the bit-masks.The data structure consists of an array of code words of all bit-masks and an array of base indices of one per four code words.The code words consist of a 6-bit offset (0,3,10,11,...)and a 10-bit value (r1,r2,...). The first bit-mask in Figure 2.16 has three set bits.The second code word thus has an offset of three because three pointers must be skipped over to find the first pointer associated with that bit-mask.The second bit-mask has seven set bits and consequently the offset in the third code word is 3+7=10. After four code words,the offset value might be too large to represent with six bits. Therefore,a base index is used together with the offset to find a group of pointers.There can be at most 64k pointers in a level of the data structure,so the base indices need to be at most 16 bits(216=64k).In Figure 2.16,the second base index is 13 because there are 13 set bits in the first four bit-masks.This explains how a group of pointers is located.The first 12 bits of the IP address are an index to the proper code words.The first 10 bits are an index to the array of base indices. How to find the correct pointer in the group of pointers will now be explained.This is what the 10-bit value is for(r1,r2,...in Fig.2.16).The value is an index into a table that maps bit-numbers in the IP address to pointer offsets.Since bit-masks are generated from a complete trie,not all combinations of the 16 bits are possible.As shown by Degermark et al.[10],the number of possible bit-masks with length 16 is only 678.They are all stored in a table,called maptable,as shown in Figure 2.17.An index into a table with an entry for each bit-mask only needs 10 bits.The content of each entry of the maptable is a bit-mask of 4-bit offsets.The offset specifies how many pointers to skip over to find the wanted one,so it is equal to the number of set bits smaller than the bit index.For instance, if the 2nd 4-bit mask in Figure 2.16 is chosen and the low 4 of high 16 bits of IP address is
2.2 TRIE-BASED ALGORITHMS 39 Figure 2.15 Part of cut with corresponding bit-vector [10]. A pointer to the next-hop information is stored for genuine heads. Members behind the genuine head use the same index. A pointer to the level two chunk that represents the corresponding sub-trie is stored in the root heads. The head information is encoded in 16-bit pointers stored in an array. Two bits of the pointer encode what kind of pointer it is. The 14 remaining bits are either indices into the next-hop information or into an array containing level two chunks. Note that there are as many pointers associated with a bit-mask as its number of set bits. Finding Pointer Groups. The bit-vector is divided into bit-masks of length 16 and there are 212 = 4096 of those. Figure 2.16 is an illustration of how the data structure for finding pointers corresponds to the bit-masks. The data structure consists of an array of code words of all bit-masks and an array of base indices of one per four code words. The code words consist of a 6-bit offset (0, 3, 10, 11, ...) and a 10-bit value (r1,r2, ...). The first bit-mask in Figure 2.16 has three set bits. The second code word thus has an offset of three because three pointers must be skipped over to find the first pointer associated with that bit-mask. The second bit-mask has seven set bits and consequently the offset in the third code word is 3 + 7 = 10. After four code words, the offset value might be too large to represent with six bits. Therefore, a base index is used together with the offset to find a group of pointers. There can be at most 64k pointers in a level of the data structure, so the base indices need to be at most 16 bits (216 = 64k). In Figure 2.16, the second base index is 13 because there are 13 set bits in the first four bit-masks. This explains how a group of pointers is located. The first 12 bits of the IP address are an index to the proper code words. The first 10 bits are an index to the array of base indices. How to find the correct pointer in the group of pointers will now be explained. This is what the 10-bit value is for (r1,r2, ... in Fig. 2.16). The value is an index into a table that maps bit-numbers in the IP address to pointer offsets. Since bit-masks are generated from a complete trie, not all combinations of the 16 bits are possible. As shown by Degermark et al. [10], the number of possible bit-masks with length 16 is only 678. They are all stored in a table, called maptable, as shown in Figure 2.17. An index into a table with an entry for each bit-mask only needs 10 bits. The content of each entry of the maptable is a bit-mask of 4-bit offsets. The offset specifies how many pointers to skip over to find the wanted one, so it is equal to the number of set bits smaller than the bit index. For instance, if the 2nd 4-bit mask in Figure 2.16 is chosen and the low 4 of high 16 bits of IP address is