CHAPTER 2 IP ADDRESS LOOKUP 2.1 OVERVIEW The primary role of routers is to forward packets toward their final destinations.To this purpose,a router must decide for each incoming packet where to send it next,that is,finding the address of the next-hop router as well as the egress port through which the packet should be sent.This forwarding information is stored in a forwarding table that the router computes based on the information gathered by routing protocols.To consult the forwarding table, the router uses the packet's destination address as a key-this operation is called address lookup [1].Once the forwarding information is retrieved,the router can transfer the packet from the incoming link to the appropriate outgoing link. Classful Addressing Scheme.IPv4 IP addresses are 32 bits in length and are divided into 4 octets.Each octet has 8 bits that are separated by dots.For example,the address 10000010 01010110 00010000 01000010 corresponds in dotted-decimal notation to 130.86.16.66.The bits in an IP address are ordered as shown in Figure 2.1,where the 1st bit is the most significant bit(MSB)that lies in the leftmost position.The 32nd bit is the least significant bit(LSB)and it lies in the rightmost position. The IP address consists of two parts.The first part contains the IP addresses for networks and the second part contains the IP addresses for hosts.The network part corresponds to the first bits of the IP address,called the address prefix.We will write prefixes as bit strings of up to 32 bits in IPv4 followed by an asterisk(*).For example,the prefix 1000001001010110* represents all the 216 addresses that begin with the bit pattern 1000001001010110.Alter- natively,prefixes can be indicated using the dotted-decimal notation,so the same prefix can be written as 130.86/16,where the number after the slash indicates the length of the prefix. High Performance Switches and Routers.by H.Jonathan Chao and Bin Liu Copyright 2007 John Wiley Sons,Inc. 25
CHAPTER 2 IP ADDRESS LOOKUP 2.1 OVERVIEW The primary role of routers is to forward packets toward their final destinations. To this purpose, a router must decide for each incoming packet where to send it next, that is, finding the address of the next-hop router as well as the egress port through which the packet should be sent. This forwarding information is stored in a forwarding table that the router computes based on the information gathered by routing protocols. To consult the forwarding table, the router uses the packet’s destination address as a key – this operation is called address lookup [1]. Once the forwarding information is retrieved, the router can transfer the packet from the incoming link to the appropriate outgoing link. Classful Addressing Scheme. IPv4 IP addresses are 32 bits in length and are divided into 4 octets. Each octet has 8 bits that are separated by dots. For example, the address 10000010 01010110 00010000 01000010 corresponds in dotted-decimal notation to 130.86.16.66. The bits in an IP address are ordered as shown in Figure 2.1, where the 1st bit is the most significant bit (MSB) that lies in the leftmost position. The 32nd bit is the least significant bit (LSB) and it lies in the rightmost position. The IP address consists of two parts. The first part contains the IP addresses for networks and the second part contains the IP addresses for hosts. The network part corresponds to the first bits of the IP address, called the address prefix. We will write prefixes as bit strings of up to 32 bits in IPv4 followed by an asterisk(*). For example, the prefix 10000010 01010110* represents all the 216 addresses that begin with the bit pattern 10000010 01010110. Alternatively, prefixes can be indicated using the dotted-decimal notation, so the same prefix can be written as 130.86/16, where the number after the slash indicates the length of the prefix. High Performance Switches and Routers, by H. Jonathan Chao and Bin Liu Copyright © 2007 John Wiley & Sons, Inc. 25
26 IP ADDRESS LOOKUP MSB LSB 30 31 32 Figure 2.1 IP address bit positions Since routing occurs at the network level to locate the destination network,routers only forward packets based on network level IP addresses.Thus,all hosts attached to the network can be stored in the router's forwarding table by a single network IP address,known as address aggregation.A group of addresses are represented by prefixes.An example of a router's forwarding table is shown in Table 2.1.Each entry in the forwarding table contains a prefix,next-hop IP address,and output interface number.The forwarding information is located by searching for the prefix of the destination address. The Internet addressing architecture was first designed using an allocation scheme known as classful addressing.Classful addressing defines three different sized networks of classes: A,B,or C(Fig.2.2).The classes are based on the amount of IP addresses contained in the network partition.With the IPv4 address space of 32 bits,Class A has a network size of 8 bits and a host size of 24 bits.Class B has a network size of 16bits and a host size of 16 bits.Class C has a network size of 24 bits and a host size of 8 bits.Class D is for multicasting applications. The classful addressing scheme created very few class A networks.Their address space contains 50 percent of the total IPv4 address space (231 addresses out of a total of 232). Class B address space contains 16.384(214)networks with up to65,534 hosts per network. Class Caddress space contains 2,097.152(221)networks with up to 256 hosts per network. Classless Inter-Domain Routing(CIDR)Addressing Scheme.The evolution and growth of the Internet in recent years has proven that the classful address scheme is inflexible and wasteful.For most organizations,class C is too small while class B is too large.The three choices resulted in address space being exhausted very rapidly,even though only a small fraction of the addresses allocated were actually in use.The lack of a network class of a size that is appropriate for mid-sized organizations results in the exhaustion of the class B network address space.In order to use the address space efficiently,bundles of class C addresses were given out instead of class B addresses.This causes a massive growth of forwarding table entries. CIDR [2]was introduced to remedy the inefficiencies of classful addressing.The Inter- net Engineering Task Force (IETF)began implementing CIDR in the early 1990s [2,3]. With CIDR,IP address space is better conserved through arbitrary aggregation of network TABLE 2.1 Router's Forwarding Table Structure [1] Destination Address Prefix Next Hop IP Address Output Interface 24.40.32/20 192.41.177.148 130.86/16 192.41.177.181 6 208.12.16/20 192.41.177.241 4 208.12.21/24 192.41.177.196 167.24.103/24 192.41.177.3 4
26 IP ADDRESS LOOKUP Figure 2.1 IP address bit positions. Since routing occurs at the network level to locate the destination network, routers only forward packets based on network level IP addresses. Thus, all hosts attached to the network can be stored in the router’s forwarding table by a single network IP address, known as address aggregation. A group of addresses are represented by prefixes. An example of a router’s forwarding table is shown in Table 2.1. Each entry in the forwarding table contains a prefix, next-hop IP address, and output interface number. The forwarding information is located by searching for the prefix of the destination address. The Internet addressing architecture was first designed using an allocation scheme known as classful addressing. Classful addressing defines three different sized networks of classes: A, B, or C (Fig. 2.2). The classes are based on the amount of IP addresses contained in the network partition. With the IPv4 address space of 32 bits, Class A has a network size of 8 bits and a host size of 24 bits. Class B has a network size of 16 bits and a host size of 16 bits. Class C has a network size of 24 bits and a host size of 8 bits. Class D is for multicasting applications. The classful addressing scheme created very few class A networks. Their address space contains 50 percent of the total IPv4 address space (231 addresses out of a total of 232). Class B address space contains 16,384 (214) networks with up to 65,534 hosts per network. Class C address space contains 2,097,152 (221) networks with up to 256 hosts per network. Classless Inter-Domain Routing (CIDR) Addressing Scheme. The evolution and growth of the Internet in recent years has proven that the classful address scheme is inflexible and wasteful. For most organizations, class C is too small while class B is too large. The three choices resulted in address space being exhausted very rapidly, even though only a small fraction of the addresses allocated were actually in use. The lack of a network class of a size that is appropriate for mid-sized organizations results in the exhaustion of the class B network address space. In order to use the address space efficiently, bundles of class C addresses were given out instead of class B addresses. This causes a massive growth of forwarding table entries. CIDR [2] was introduced to remedy the inefficiencies of classful addressing. The Internet Engineering Task Force (IETF) began implementing CIDR in the early 1990s [2, 3]. With CIDR, IP address space is better conserved through arbitrary aggregation of network TABLE 2.1 Router’s Forwarding Table Structure [1] Destination Address Prefix Next Hop IP Address Output Interface 24.40.32/20 192.41.177.148 2 130.86/16 192.41.177.181 6 208.12.16/20 192.41.177.241 4 208.12.21/24 192.41.177.196 1 167.24.103/24 192.41.177.3 4
2.1 OVERVIEW 27 ClassA 0 Network Host 14 Class B 10 Network Host 21 Class C 110 Network Host 28 Class D 1110 Multicast network Figure 2.2 Classful addresses [1]. addresses rather than being limited to 8,16,or 24 bits in length for the network part.This type of granularity provides an organization with more precise matches for IP address space requirements.The growth of forwarding table entries is also slowed by allowing address aggregation to occur at several levels within the heirarchy of the Internet's topology.Back- bone routers can now maintain the forwarding information at the level of the arbitrary aggregates of networks,instead of at the network level only. For example,consider the networks represented by the network numbers from 208.12.16/24 through 208.12.31/24(see Fig.2.3)and in arouter all these network addresses are reachable through the same service provider.The leftmost 20 bits of all the addresses in this range are the same (11010000 00001100 0001).Thus,these 16 networks can be aggregated into one 'supemetwork'represented by the 20-bit prefix,which in decimal notation gives 208.12.16/20.Indicating the prefix length is necessary in decimal notation, because the same value may be associated with prefixes of different lengths;for instance, 208.12.16/20(11010000000011000001*)is different from208.12.16/22(11010000 00001100000100*). Address aggregation does not reduce entries in the router's forwarding table for all cases. Consider the scenario where a customer owns the network 208.12.21/24 and changes its service provider,but does not want to renumber its network.Now,all the networks from 208.12.16/24 through 208.12.31/24 can be reached through the same service provider, except for the network 208.12.21/24 (see Fig.2.4).We cannot perform aggregation as before,and instead of only one entry,16 entries need to be stored in the forwarding table. One solution is aggregating in spite of the exception networks and additionally storing
2.1 OVERVIEW 27 Figure 2.2 Classful addresses [1]. addresses rather than being limited to 8, 16, or 24 bits in length for the network part. This type of granularity provides an organization with more precise matches for IP address space requirements. The growth of forwarding table entries is also slowed by allowing address aggregation to occur at several levels within the heirarchy of the Internet’s topology. Backbone routers can now maintain the forwarding information at the level of the arbitrary aggregates of networks, instead of at the network level only. For example, consider the networks represented by the network numbers from 208.12.16/24 through 208.12.31/24 (see Fig. 2.3) and in a router all these network addresses are reachable through the same service provider. The leftmost 20 bits of all the addresses in this range are the same (11010000 00001100 0001). Thus, these 16 networks can be aggregated into one ‘supernetwork’ represented by the 20-bit prefix, which in decimal notation gives 208.12.16/20. Indicating the prefix length is necessary in decimal notation, because the same value may be associated with prefixes of different lengths; for instance, 208.12.16/20 (11010000 00001100 0001*) is different from 208.12.16/22 (11010000 00001100 000100*). Address aggregation does not reduce entries in the router’s forwarding table for all cases. Consider the scenario where a customer owns the network 208.12.21/24 and changes its service provider, but does not want to renumber its network. Now, all the networks from 208.12.16/24 through 208.12.31/24 can be reached through the same service provider, except for the network 208.12.21/24 (see Fig. 2.4). We cannot perform aggregation as before, and instead of only one entry, 16 entries need to be stored in the forwarding table. One solution is aggregating in spite of the exception networks and additionally storing
28 IP ADDRESS LOOKUP 208.12.16/24 110100000000110000010000* 208.12.21/24 110100000000110000010101* .. 208.12.31/24 110100000000110000011111* 208.12.16/20 11010000000011000001* Figure 2.3 Prefix aggregation [1]. 208.12.16/24 208.12.21/24 208.12.31/24 Total IPv4 address space 232-1 Figure 2.4 Prefix ranges [1]. entries for the exception networks.In this example,this will result in only two entries in the forwarding table:208.12.16/20 and 208.12.21/24 (see Fig.2.5 and Table 2.1).However, now some addresses will match both entries because of the prefixes overlap.In order to always make the correct forwarding decision,routers need to do more than search for a prefix that matches.Since exceptions in the aggregations may exist,a router must find the most specific match,which is the longest matching prefix.In summary,the address lookup problem in routers requires searching the forwarding table for the longest prefix that matches the destination address of a packet. Obviously,the longest prefix match is harder than the exact match used for class-based addressing because the destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix.Hence,we need to search among the space of all prefix lengths,as well as the space of all prefixes of a given length.Many algorithms have been proposed in recent years regarding the longest prefix match.This chapter provides a survey of these techniques.But before that,we introduce some performance metrics [4]for the comparison of these lookup algorithms. 208.12.2124 208.12.16/20 Total IPv4 address space 0 232-1 These addresses match both prefixes Figure 2.5 Exception prefix [1]
28 IP ADDRESS LOOKUP Figure 2.3 Prefix aggregation [1]. Figure 2.4 Prefix ranges [1]. entries for the exception networks. In this example, this will result in only two entries in the forwarding table: 208.12.16/20 and 208.12.21/24 (see Fig. 2.5 and Table 2.1). However, now some addresses will match both entries because of the prefixes overlap. In order to always make the correct forwarding decision, routers need to do more than search for a prefix that matches. Since exceptions in the aggregations may exist, a router must find the most specific match, which is the longest matching prefix. In summary, the address lookup problem in routers requires searching the forwarding table for the longest prefix that matches the destination address of a packet. Obviously, the longest prefix match is harder than the exact match used for class-based addressing because the destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix. Hence, we need to search among the space of all prefix lengths, as well as the space of all prefixes of a given length. Many algorithms have been proposed in recent years regarding the longest prefix match. This chapter provides a survey of these techniques. But before that, we introduce some performance metrics [4] for the comparison of these lookup algorithms. Figure 2.5 Exception prefix [1]
2.2 TRIE-BASED ALGORITHMS 29 Lookup Speed.The explosive growth of link bandwidth requires faster IP lookups.For example,links running at 10 Gbps can carry 31.25 million packets per second(mpps) (assuming minimum sized 40-byte IP packets). Storage Requirement.Small storage means fast memory access speed and low power consumption,which are important for cache-based software algorithms and SRAM (static RAM)-based hardware algorithms. Update Time.Currently,the Internet has a peak of a few hundred BGP(Border Gateway Protocol)updates per second.Thus,a certain algorithm should be able to perform 1k updates per second to avoid routing instabilities.These updates should interfere little with normal lookup operations. Scalability.It is expected that the size of forwarding tables will increase at a speed of 25k entries per year.Hence,there will be around 250k entries for the next five years. The ability of an algorithm to handle large forwarding tables is required. Flexibility in Implementation.Most current lookup algorithms can be implemented in either software or hardware.Some of them have the flexibility of being implemented in different ways,such as ASIC,a network processor,or a generic processor. 2.2 TRIE-BASED ALGORITHMS 2.2.1 Binary Trie A trie structure is a multi-way tree where each node contains zero or more pointers to point its child nodes,allowing the organization of prefixes on a digital basis by using the bits of prefixes to direct the branching.In the binary trie(or 1-bit trie)structure [5],each node contains two pointers,the 0-pointer and the 1-pointer. Data Structure.A node X at the level h of the trie represents the set of all addresses that begin with the sequence of h bits consisting of the string of bits labeling the path from the root to that node.Depending on the value of the (h+1)th bit,0 or 1,each pointer of the node X points to the corresponding subtree (if it exists),which represents the set of all route prefixes that have the same(h+1)bits as their first bits.An example data structure of each node (i.e.,the entry in a memory)is shown in Figure 2.6,including the next hop address (if it is a prefix node),a left pointer pointing to the left node location (with an address bit 0) and a right pointer pointing to the right node location (with an address bit 1). A prefix database is defined as a collection of all prefixes in a forwarding table.A prefix database example is shown in Figure 2.6 [6],where the prefixes are arranged in an ascending order of their lengths for easy illustration. To add a route prefix,say 10111*,simply follow the pointer to where 10111 would be in the trie.If no pointer exists for that prefix,it should be added.If the node for the prefix already exists,it needs to be marked with a label as being in the forwarding table(for example,Pi).The nodes in gray are prefix nodes.When deleting a route prefix that has no children,the node and the pointer pointing to it are deleted and the parent node is examined. If the parent node has another child or it is a gray node,it is left alone.Otherwise,that node is also deleted and its parent node is examined.The deletion process is repeated up to the trie until a node that has another child or a gray node is found
2.2 TRIE-BASED ALGORITHMS 29 Lookup Speed. The explosive growth of link bandwidth requires faster IP lookups. For example, links running at 10 Gbps can carry 31.25 million packets per second (mpps) (assuming minimum sized 40-byte IP packets). Storage Requirement. Small storage means fast memory access speed and low power consumption, which are important for cache-based software algorithms and SRAM (static RAM)-based hardware algorithms. Update Time. Currently, the Internet has a peak of a few hundred BGP (Border Gateway Protocol) updates per second. Thus, a certain algorithm should be able to perform 1k updates per second to avoid routing instabilities. These updates should interfere little with normal lookup operations. Scalability. It is expected that the size of forwarding tables will increase at a speed of 25k entries per year. Hence, there will be around 250 k entries for the next five years. The ability of an algorithm to handle large forwarding tables is required. Flexibility in Implementation. Most current lookup algorithms can be implemented in either software or hardware. Some of them have the flexibility of being implemented in different ways, such as ASIC, a network processor, or a generic processor. 2.2 TRIE-BASED ALGORITHMS 2.2.1 Binary Trie A trie structure is a multi-way tree where each node contains zero or more pointers to point its child nodes, allowing the organization of prefixes on a digital basis by using the bits of prefixes to direct the branching. In the binary trie (or 1-bit trie) structure [5], each node contains two pointers, the 0-pointer and the 1-pointer. Data Structure. A node X at the level h of the trie represents the set of all addresses that begin with the sequence of h bits consisting of the string of bits labeling the path from the root to that node. Depending on the value of the (h + 1)th bit, 0 or 1, each pointer of the node X points to the corresponding subtree (if it exists), which represents the set of all route prefixes that have the same (h + 1) bits as their first bits. An example data structure of each node (i.e., the entry in a memory) is shown in Figure 2.6, including the next hop address (if it is a prefix node), a left pointer pointing to the left node location (with an address bit 0) and a right pointer pointing to the right node location (with an address bit 1). A prefix database is defined as a collection of all prefixes in a forwarding table. A prefix database example is shown in Figure 2.6 [6], where the prefixes are arranged in an ascending order of their lengths for easy illustration. To add a route prefix, say 10111*, simply follow the pointer to where 10111 would be in the trie. If no pointer exists for that prefix, it should be added. If the node for the prefix already exists, it needs to be marked with a label as being in the forwarding table (for example, Pi). The nodes in gray are prefix nodes. When deleting a route prefix that has no children, the node and the pointer pointing to it are deleted and the parent node is examined. If the parent node has another child or it is a gray node, it is left alone. Otherwise, that node is also deleted and its parent node is examined. The deletion process is repeated up to the trie until a node that has another child or a gray node is found