Sec.5.1 Introduction 373 order,and some packets may travel on long routes to their destinations:indeed.one should take precautions to ensure that a packet cannot travel on a cycle indefinitely.For some recent analyses of deflection routing,see [BrC91a],[BrC91b],[HaC90j,[Haj91]. [GrH89],[Max87],[Syz90],[Var90]. Cut-through routing.We have been implicitly assuming thus far that a packet must be fully received at a node before that node starts relaying it to another node.There is,however,an incentive to split a long message into several smaller packets in order to reduce the message's delay on multiple link paths,taking advantage of pipelining (see the discussion in Section 2.5.5).The idea of message splitting.when carried to its extreme,leads to a transmission method called cut-through routing,whereby a node can start relaying any portion of a packet to another node without waiting to receive the packet in its entirety.In the absence of additional traffic on the path,the delay of the packet is equal to the transmission time on the slowest link of the path plus the propagation delay,regardless of the number of links on the path.Thus,delay can be reduced by as much as a factor of n on a path with n links. Note that the nature of cut-through routing is such that error detection and retrans- mission cannot be done on a link-by-link basis;it must be done on an end-to-end basis. Another issue is that pieces of the same packet may simultaneously be traveling on dif- ferent links while other pieces are stored at different nodes.To keep the pieces of the packet together,one imposes the requirement that once a packet starts getting transmitted on a link,the link is reserved until all bits of the packet are transmitted.The reservation is made when the link transmits the packet's first bit,and it is released when the link transmits the packet's last bit.Thus,at any time a packet is reserving a portion of its path consisting of several links as shown in Fig.5.10.Portions of the packet may be stored simultaneously at several nodes either because two or more links on the path have unequal transmission capacities.or because the packet's header had to wait in queue for a link to become available.To avoid the need for complex error recovery procedures.it is generally advisable to ensure that when the packet's first bit is transmitted by a link. Tail end of packet Bits in propagation Front end of packet (ihlli Bits processed Other packet by node B transmitted from C to D Figure 5.10 Illustration of cut-through routing.Here a packet originates at node A and is destined for node D.after going through nodes B and C.The header of the packet traveled to node B,found the link BC free.and continued to node C without waiting for the rest of the packet to arrive at B.Upon arrival to C.the header found that the link CD was in use.so it was forced to wait at node C.The header reserves links as it travels through them.A link reservation is relinquished only after the last bit of the packet goes through the link,so a packet stays together even though portions of it may be residing in scveral different nodes
Sec. 5.1 Introduction 373 order, and some packets may travel on long routes to their destinations; indeed, one should take precautions to ensure that a packet cannot travel on a cycle indefinitely. For some recent analyses of deflection routing, see [BrC9IaJ, [BrC9IbJ, [HaC90j, [Haj91], [GrH89j, [Max87], [Syz90J, [Var90]. Cut-through routing. We have been implicitly assuming thus far that a packet must be fully received at a node before that node starts relaying it to another node. There is, however, an incentive to split a long message into several smaller packets in order to reduce the message's delay on multiple link paths, taking advantage of pipelining (see the discussion in Section 2.5.5). The idea of message splitting, when carried to its extreme, leads to a transmission method called cut-through routing, whereby a node can start relaying any portion of a packet to another node without waiting to receive the packet in its entirety. In the absence of additional traffic on the path, the delay of the packet is equal to the transmission time on the slowest link of the path plus the propagation delay, regardless of the number of links on the path. Thus, delay can be reduced by as much as a factor of n on a path with n links. Note that the nature of cut-through routing is such that error detection and retransmission cannot be done on a link-by-link basis; it must be done on an end-to-end basis. Another issue is that pieces of the same packet may simultaneously be traveling on different links while other pieces are stored at different nodes. To keep the pieces of the packet together, one imposes the requirement that once a packet starts getting transmitted on a link, the link is reserved until all bits of the packet are transmitted. The reservation is made when the link transmits the packet's first bit, and it is released when the link transmits the packet's last bit. Thus, at any time a packet is reserving a portion of its path consisting of several links as shown in Fig. 5.10. Portions of the packet may be stored simultaneously at several nodes either because two or more links on the path have unequal transmission capacities, or because the packet's header had to wait in queue for a link to become available. To avoid the need for complex error recovery procedures, it is generally advisable to ensure that when the packet's first bit is transmitted by a link, Tail end of packet Bits processed by node 8 Front end of packet Other packet transmitted from eta D Figure 5.10 Illustration of cut-through routing. Here a packet originates at node A and is destined for node D, after going through nodes Rand e. The header of the packet traveled to node E, found the link Be free. and continued to node C without waiting for the rest of the packet to arrive at B. Upon arrival to e, the header found that the link e D was in use, so it was forced to wait at node e. The header reserves links as it travels through them. A link reservation is relinquished only after the last bit of the packet goes through the link. so a packet stays together even though portions of it may be residing in several different nodes
374 Routing in Data Networks Chap.5 there is enough buffer space to store the entire packet at the other end of the link.Note that cut-through routing encounters a serious problem if a slow link is followed by a fast link along a packet's path.Then the fast link must effectively be slowed down to the speed of the slow link,and the associated idle fill issues may be hard to resolve. A variation of cut-through routing,motivated by storage concerns,drops a packet once its first bit encounters a busy link along its path.Thus a packet is either successful in crossing its path without any queueing delay,or else it is discarded and must be retransmitted in its entirety,similar to circuit switching.Indeed,with some thought,it can be seen that this version of cut-through routing is equivalent to a form of circuit switching; the circuit's path is set up by the first bit of the packet,and once the circuit is established, its duration is equal to the packet's transmission time from origin to destination. ARPANET:An example of datagram routing.The routing algorithm of the ARPANET,first implemented in 1969,has played an important historical role in the development of routing techniques.This was an ambitious,distributed,adaptive algorithm that stimulated considerable research on routing and distributed computation in general.On the other hand,the algorithm had some fundamental flaws that were finally corrected in 1979 when it was replaced by a new version.By that time,however, the original algorithm had been adopted by several other networks. Shortest path routing is used in both ARPANET algorithms.The length of each link is defined by some measure of traffic congestion on the link,and is updated peri- odically.Thus,the ARPANET algorithms are adaptive and,since the ARPANET uses datagrams,two successive packets of the same session may follow different routes.This has two undesirable effects.First,packets can arrive at their destination out of order and must,therefore,be put back in order upon arrival.(This can also happen because of the data link protocol of the ARPANET,which uses eight logical channels;see the discussion in Chapter 2.)Second,the ARPANET algorithms are prone to oscillations. This phenomenon is explained in some detail in Section 5.2.5,but basically the idea is that selecting routes through one area of the network increases the lengths of the cor- responding links.As a result,at the next routing update the algorithm tends to select routes through different areas.This makes the first area desirable at the subsequent routing update with an oscillation resulting.The feedback effect between link lengths and routing updates was a primary flaw of the original ARPANET algorithm that caused difficulties over several years and eventually led to its replacement.The latest algorithm is also prone to oscillations,but not nearly as much as the first (see Section 5.2.5). In the original ARPANET algorithm,neighboring nodes exchanged their estimated shortest distances to each destination every 625 msec.The algorithm for updating the shortest distance estimate D;of node i to a given destination is based on the Bellman- Ford method d;:minld+Di] outlined previously and discussed further in Sections 5.2.3 and 5.2.4.Each link length di;was made dependent on the number of packets waiting in the queue of link (,j)at the time of the update.Thus,link lengths were changing very rapidly,reflecting statistical traffic fluctuations as well as the effect of routing updates.To stabilize oscillations,a
374 Routing in Data Networks Chap. 5 there is enough buffer space to store the entire packet at the other end of the link. Note that cut-through routing encounters a serious problem if a slow link is followed by a fast link along a packet's path. Then the fast link must effectively be slowed down to the speed of the slow link, and the associated idle fill issues may be hard to resolve. A variation of cut-through routing, motivated by storage concerns, drops a packet once its first bit encounters a busy link along its path. Thus a packet is either successful in crossing its path without any queueing delay, or else it is discarded and must be retransmitted in its entirety, similar to circuit switching. Indeed, with some thought, it can be seen that this version of cut-through routing is equivalent to a form of circuit switching; the circuit's path is set up by the first bit of the packet, and once the circuit is established, its duration is equal to the packet's transmission time from origin to destination. ARPANET: An example of datagram routing. The routing algorithm of the ARPANET, first implemented in 1969, has played an important historical role in the development of routing techniques. This was an ambitious, distributed, adaptive algorithm that stimulated considerable research on routing and distributed computation in general. On the other hand, the algorithm had some fundamental flaws that were finally corrected in 1979 when it was replaced by a new version. By that time, however, the original algorithm had been adopted by several other networks. Shortest path routing is used in both ARPANET algorithms. The length of each link is defined by some measure of traffic congestion on the link, and is updated periodically. Thus, the ARPANET algorithms are adaptive and, since the ARPANET uses datagrams, two successive packets of the same session may follow different routes. This has two undesirable effects. First, packets can arrive at their destination out of order and must, therefore, be put back in order upon arrival. (This can also happen because of the data link protocol of the ARPANET, which uses eight logical channels; see the discussion in Chapter 2.) Second, the ARPANET algorithms are prone to oscillations. This phenomenon is explained in some detail in Section 5.2.5, but basically the idea is that selecting routes through one area of the network increases the lengths of the corresponding links. As a result, at the next routing update the algorithm tends to select routes through different areas. This makes the first area desirable at the subsequent routing update with an oscillation resulting. The feedback effect between link lengths and routing updates was a primary flaw of the original ARPANET algorithm that caused difficulties over several years and eventually led to its replacement. The latest algorithm is also prone to oscillations, but not nearly as much as the first (see Section 5.2.5). In the original ARPANET algorithm, neighboring nodes exchanged their estimated shortest distances to each destination every 625 msec. The algorithm for updating the shortest distance estimate D i of node i to a given destination is based on the BellmanFord method d i := min[dij + DjJ J outlined previously and discussed further in Sections 5.2.3 and 5.2.4. Each link length dij was made dependent on the number of packets waiting in the queue of link (i, j) at the time of the update. Thus, link lengths were changing very rapidly, reflecting statistical traffic fluctuations as well as the effect of routing updates. To stabilize oscillations, a
Sec.5.1 Introduction 375 large positive constant was added to the link lengths.Unfortunately,this reduced the sensitivity of the algorithm to traffic congestion. In the second version of the ARPANET algorithm [MRR80J,known as Shortest Path First (or SPF for short),the length of each link is calculated by keeping track of the delay of each packet in crossing the link.Each link length is updated periodically every 10 sec,and is taken to be the average packet delay on the link during the preceding 10-sec period.The delay of a packet on a link is defined as the time between the packet's arrival at the link's start node and the time the packet is correctly delivered to the link's end node (propagation delay is also included).Each node monitors the lengths of its outgoing links and broadcasts these lengths throughout the network at least once every 60 sec by using a flooding algorithm(see Section 5.3.1).Upon reception of a new link length,each node recalculates a shortest path from itself to each destination,using an incremental form of Dijkstra's algorithm (see Section 5.2.3).The algorithm is asynchronous in nature in that link length update messages are neither transmitted nor received simultaneously at all nodes.It turns out that this tends to improve the stability properties of the algorithm(see Section 5.2.5).Despite this fact,with increased traffic load,oscillations became serious enough to warrant a revision of the method for calculating link lengths.In the current method,implemented in 1987,the range of possible link lengths and the rate at which these lengths can change have been drastically reduced,with a substantial improvement of the algorithm's dynamic behavior ([KhZ89]and [ZVK89]). Note that a node is not concerned with calculating routing paths that originate at other nodes even though the information available at each node (network topology plus lengths of all links)is sufficient to compute a shortest path from every origin to every destination.In fact,the routing tables of an ARPANET node contain only the first outgoing link of the shortest path from the node to each destination (see Fig.5.1 1). When the node receives a new packet,it checks the packet's destination,consults its routing table,and places the packet in the corresponding outgoing link queue.If all nodes visited by a packet have calculated their routing tables on the basis of the same link length information,it is easily seen that the packet will travel along a shortest path. It is actually possible that a packet will occasionally travel in a loop because of changes in routing tables as the packet is in transit.However,looping of this type is rare,and when it happens,it is unlikely to persist. The ARPANET routing tables can also be used conveniently for broadcasting a packet to a selected set of destinations.Multiple copies of the packet are transmitted along shortest paths to the destinations,with each copy guided by a header that includes a subset of the destinations.However,care is taken to avoid unnecessary proliferation of the copies.In particular.let D be the set of destinations and for every link I incident to the packet's origin,let D:be the subset of destinations for which is the first link on the shortest path from the origin(the origin knows D by looking at its routing tables).The origin starts the broadcast by transmitting a copy of the packet on each link for which D is nonempty and includes the destination set D in the copy's header.Each node s that subsequently receives a copy of the packet repeats this process with D replaced by the set of remaining destinations(the set of destinations in the copy's header other than s;if s is the only destination on the copy,the node does not relay it further,as shown
Sec. 5.1 Introduction 375 large positive constant was added to the link lengths. Unfortunately, this reduced the sensitivity of the algorithm to traffic congestion. In the second version of the ARPANET algorithm [MRR80J, known as Shortest Path First (or SPF for short), the length of each link is calculated by keeping track of the delay of each packet in crossing the link. Each link length is updated periodically every 10 sec, and is taken to be the average packet delay on the link during the preceding 10-sec period. The delay of a packet on a link is defined as the time between the packet's arrival at the link's start node and the time the packet is correctly delivered to the link's end node (propagation delay is also included). Each node monitors the lengths of its outgoing links and broadcasts these lengths throughout the network at least once every 60 sec by using a flooding algorithm (see Section 5.3.1). Upon reception of a new link length, each node recalculates a shortest path from itself to each destination, using an incremental form of Dijkstra's algorithm (see Section 5.2.3). The algorithm is asynchronous in nature in that link length update messages are neither transmitted nor received simultaneously at all nodes. It turns out that this tends to improve the stability properties of the algorithm (see Section 5.2.5). Despite this fact, with increased traffic load, oscillations became serious enough to warrant a revision of the method for calculating link lengths. In the current method, implemented in 1987, the range of possible link lengths and the rate at which these lengths can change have been drastically reduced, with a substantial improvement of the algorithm's dynamic behavior ([KhZ89] and [ZYK89]). Note that a node is not concerned with calculating routing paths that originate at other nodes even though the information available at each node (network topology plus lengths of all links) is sufficient to compute a shortest path from every origin to every destination. In fact, the routing tables of an ARPANET node contain only the first outgoing link of the shortest path from the node to each destination (see Fig. 5.11). When the node receives a new packet, it checks the packet's destination, consults its routing table, and places the packet in the corresponding outgoing link queue. If all nodes visited by a packet have calculated their routing tables on the basis of the same link length information, it is easily seen that the packet will travel along a shortest path. It is actually possible that a packet will occasionally travel in a loop because of changes in routing tables as the packet is in transit. However, looping of this type is rare, and when it happens, it is unlikely to persist. The ARPANET routing tables can also be used conveniently for broadcasting a packet to a selected set of destinations. Multiple copies of the packet are transmitted along shortest paths to the destinations, with each copy guided by a header that includes a subset of the destinations. However, care is taken to avoid unnecessary proliferation of the copies. In particular, let D be the set of destinations and for every link 1 incident to the packet's origin, let D[ be the subset of destinations for which 1is the first link on the shortest path from the origin (the origin knows D[ by looking at its routing tables). The origin starts the broadcast by transmitting a copy of the packet on each link 1 for which D[ is nonempty and includes the destination set D z in the copy's header. Each node 8 that subsequently receives a copy of the packet repeats this process with D replaced by the set of remaining destinations (the set of destinations in the copy's header other than s; if s is the only destination on the copy, the node does not relay it further, as shown
376 Routing in Data Networks Chap.5 Routing table of 2 Destination Out link 1 (2.4) 3 2.41 4 (2,4) Length 5 Length 6 4 1 3 4 Routing table of 1 2 Routing table of 3 Destination Out link Length 3 Length 2 Destination Out link (1,4) 1 (3.4) 3 1,4 2 3.4 1,4) (3.4) Routing table of 4 Destination Out link 1 (4.1) 23 (4,21 (4,3} Figure 5.11 Routing tables maintained by the nodes in the ARPANET algorithm.Each node calculates a shortest path from itself to each destination.and enters the first link on that path in its routing table. in Fig.5.12).It can be seen with a little thought that if the routing tables of the nodes are consistent in the sense that they are based on the same link length information,the packet will be broadcast along the shortest path tree to all its destinations and with a minimal number of transmissions. Figure 5.12 Broadcasting a packet to a selected set of destinations using the ARPANET routing tables.In this example. 2] 48→ the broadcast is from node 1 to nodes 2.3. 4,6,and 8.The figure shows the links on which a copy of the packet is transmitted. the header of each copy.and the shortest 2 3 4 path tree from node I to all destinations (each node actually maintains only its 8] outgoing links on the tree).The number of transmissions is five.If a separate copy were to be transmitted to each destination. 5 the number of transmissions would be seven
376 2 \V 4 Routing in Data Networks Routing table of 2 Destination Out link 1 (2,4) 3 (2,4) 4 (2,4) Chap. 5 2 4 Routing table of 1 Destination Out link 2 (1,4) 3 (1,4) 4 (1,4) 2 4 Routing table of 3 Destination Out link 1 (3,4) 2 (3,4) 4 (3,4) 2 \V 4 Routing table of 4 Destination Out link 1 (4, 1) 2 (4,21 3 (4,31 Figure 5.11 Routing tables maintained by the nodes in the ARPANET algorithm. Each node calculates a shortest path from itself to each destination. and enters the first link on that path in its routing table. in Fig. 5.12). It can be seen with a little thought that if the routing tables of the nodes are consistent in the sense that they are based on the same link length information, the packet will be broadcast along the shortest path tree to all its destinations and with a minimal number of transmissions. 8 Figure 5.12 Broadcasting a packet to a selected set of destinations using the ARPANET routing tables. In this example. the broadcast is from node I to nodes 2. 3. 4, 6. and 8. The figure shows the links on which a copy of the packet is transmitted. the header of each copy. and the shortest path tree from node I to all destinations (each node actually maintains only its outgoing links on the tree). The number of transmissions is five. If a separate copy were to be transmitted to each destination. the number of transmissions would be seven
Sec.5.1 Introduction 377 TYMNET:An example of virtual circuit routing.The TYMNET routing algorithm,originally implemented in 1971,is based on the shortest path method.and is adaptive like the ARPANET algorithms.However,the implementation of the adaptive shortest path idea is very different in the two networks. In the TYMNET,the routing algorithm is centralized and is operated at a special node called the supervisor.TYMNET uses virtual circuits.so a routing decision is needed only at the time a virtual circuit is set up.A virtual circuit request reccived by the supervisor is assigned to a shortest path connecting the origin and destination nodes of the virtual circuit.The supervisor maintains a list of current link lengths and does all the shortest path calculations.The length of each link depends on the load carried by the link as well as other factors,such as the type of the link and the type of virtual circuit being routed [Tym811.While the algorithm can be classified as adaptive,it does not have the potential for oscillations of the ARPANET algorithms.This is due primarily to the use of virtual circuits rather than datagrams,as explained in Section 5.2.5. Once the supervisor has decided upon the path to be used by a virtual circuit.it in- forms the nodes on the path and the necessary information is entered in each node's rout- ing tables.These tables are structured as shown in Fig.5.13.Basically,a virtual circuit is assigned a channel(or port)number on each link it crosses.The routing table at each node matches each virtual circuit's channel number on the incoming link with a channel number on the outgoing link.In the original version of TYMNET (now called TYMNET I).the supervisor maintains an image of the routing tables of all nodes and explicitly reads and writes the tables in the nodes.In the current version (TYMNET II).the nodes maintain their own tables.The supervisor establishes a new virtual circuit by sending a"needle" packet to the origin node.The needle contains routing information and threads its way through the network,building the circuit as it goes,with the user data following behind it. Input Output from user Link 1 Link 2 to user Port 5 Port 7 10 Node A Node B Node C Figure 5.13 Structure of TYMNET routing tables.For the virtual circuit on the path ABC,the routing table at node A maps input port 5 onto channel 4 on link I.At node B. incoming channel 4 is mapped into outgoing channel 10 on link 2.Ai node C.incoming channel 10 is mapped into output port 7
Sec. 5.1 Introduction 377 TYMNET: An example of virtual circuit routing. The TYMNET routing algorithm, originally implemented in 1971, is based on the shortest path method. and is adaptive like the ARPANET algorithms. However, the implementation of the adaptive shortest path idea is very different in the two networks. In the TYMNET, the routing algorithm is centralized and is operated at a special node called the supervisor. TYMNET uses virtual circuits. so a rouljng decision i.\ needed only at the time a virtual circuit is set up. A virtual circuit request received by the supervisor is assigned to a shortest path connecting the origin and destination nodes of the virtual circuit. The supervisor maintains a list of current link lengths and doe, all the shortest path calculations. The length of each link depends on the load carried by the link as well as other factors, such as the type of the link and the type of virtual circuit being routed [Tym81]. While the algorithm can be classified as adaptive, it does not have the potential for oscillations of the ARPANET algorithms. This is due primarily to the use of virtual circuits rather than datagrams. as explained in Section 5.2.5. Once the supervisor has decided upon the path to be used by a virtual circuit. it inI'anTIS the nodes on the path and the necessary information is entered in each node's routing tables. These tables are structured as shown in Fig. 5.13. Basically, a virtual circuit is assigned a channel (or port) number on each link it crosses. The routing table at each node matches each virtual circuit's channel number on the incoming link with a channel number on the outgoing link. In the original version of TYMNET (now called TYMNET 1), the supervisor maintains an image of the routing tables of all nodes and explicitly read, and writes the tables in the nodes. In the cUlTent version (TYMNET II). the nodes maintain their own tables. The supervisor establishes a new virtual circuit by sending a "needle" packet to the origin node. The needle contains routing infonnation and threads its \\ay through the network, building the circuit as it goes, with the user data following behind it. Input Output from user • 01- Li_nk_l__-loJ>-~0r-----L-in-k-2--~.0f-t_o_u_se_r-'l.~ Port 5 4 Node A 4 Node B Port 7 Node C Figure 5.13 Structure of TYMNET routing tables. For the virtual circuit on the path ABC, the routing table at node A maps input port .5 onto channel 4 on link I. At node B. incoming channel 4 is mapped into outgoing channel I/] on link ~. AI node C incoming channel 10 is mapped into output port 7