[Figure 6.1(a) shows an A-to-B session with input rate f entering a central node over a high-capacity link (C = 10) and a C-to-D session with input rate 0.8 entering over a low-capacity link (C = 1); the central node has limited buffer space and its outgoing links are low-capacity links (C = 1). Figure 6.1(b) plots total throughput against the input rate f of the A-to-B session: with infinite buffer space at the central node the throughput approaches 1.8, while with limited buffer space retransmissions set in and the throughput falls toward 1.1.]

Figure 6.1 Example demonstrating throughput degradation due to retransmissions caused by buffer overflow. (a) For f approaching unity, the central node buffer is almost always full, thereby causing retransmissions. Because the A-to-B session uses a line 10 times faster than the C-to-D session, it has a 10-fold greater chance of capturing a free buffer and getting a packet accepted at the central node. As a result, the throughput of the A-to-B session approaches unity, while the throughput of the C-to-D session approaches 0.1. (b) Total throughput as a function of the input rate f of the A-to-B session.

... or number of hops traveled so far; at each node, separate buffer space is reserved for different classes, while some buffer space is shared by all classes.
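As an illustration of the buffer management just described, the following is a minimal Python sketch (not from the original text; the class name, the particular reserved/shared split, and the accept/release routines are illustrative assumptions). Each packet class has a few reserved buffers, and a common pool is shared by all classes.

    class PartitionedBufferPool:
        """Per-class reserved buffers plus a pool shared by all classes.

        A packet is accepted if its class still has a reserved buffer free,
        or otherwise if a shared buffer is free.  The sizes of the reserved
        and shared portions are design parameters.
        """

        def __init__(self, reserved_per_class, shared):
            # reserved_per_class: dict mapping packet class -> reserved buffers
            self.reserved_free = dict(reserved_per_class)
            self.shared_free = shared

        def accept(self, packet_class):
            """Return 'reserved' or 'shared' if a buffer is granted, else None."""
            if self.reserved_free.get(packet_class, 0) > 0:
                self.reserved_free[packet_class] -= 1
                return "reserved"
            if self.shared_free > 0:
                self.shared_free -= 1
                return "shared"
            return None          # no space: the packet is refused

        def release(self, packet_class, kind):
            """Free the buffer once the packet has been forwarded."""
            if kind == "reserved":
                self.reserved_free[packet_class] += 1
            else:
                self.shared_free += 1

Under this policy a burst of packets of one class can exhaust the shared pool but never the buffers reserved for the other classes, which is the point of reserving separate space per class.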
Proper buffer management can also help avoid deadlocks due to buffer overflow. Such a deadlock can occur when two or more nodes are unable to move packets because no buffer space is available at any of the potential receivers. The simplest example of this is two nodes A and B routing packets directly to each other, as shown in Fig. 6.2(a). If all the buffers of both A and B are full of packets destined for B and A, respectively, then the nodes are deadlocked into continuously retransmitting the same packets with no success, since there is no space to store the packets at the receiver. This problem can also occur in a more complex manner whereby more than two nodes arranged in a cycle are deadlocked because their buffers are full of packets destined for other nodes in the cycle [see Fig. 6.2(b)]. There are simple buffer management schemes that preclude this type of deadlock by organizing packets in priority classes and allocating extra buffers for packets of higher priority ([RaH76] and [Gop85]). A typical choice is to assign to a packet a priority level equal to the number of links it has traversed in the network, as shown in Fig. 6.3. If packets are not allowed to loop, it is then possible to show that a deadlock of the type just described cannot occur (a sketch of the acceptance rule is given after the captions of Figs. 6.2 and 6.3 below).

Figure 6.2 Deadlock due to buffer overflow. In (a), all buffers of A and B are full of packets destined for B and A, respectively. As a result, no packet can be accepted at either node. In (b), all buffers of A, B, C, and D are full of packets destined for C, D, A, and B, respectively.

Figure 6.3 Organization of node memory in buffer classes to avoid deadlock due to buffer overflow. A packet that has traveled k links is accepted at a node only if there is an available buffer of class k or lower, where k ranges from 0 to N - 1 (where N is the number of nodes). Assuming that packets that travel more than N - 1 links are discarded as having traveled in a loop, no deadlock occurs. The proof consists of showing by induction (starting with k = N - 1) that at each node the buffers of class k cannot fill up permanently.
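To make the rule of Fig. 6.3 concrete, here is a minimal Python sketch (not from the original text; the class name, the per-class buffer counts, and the choice of which eligible buffer to use are illustrative assumptions). A packet that has traveled k links is accepted only if some buffer of class k or lower is free.

    class BufferClassNode:
        """Node memory organized into buffer classes 0..N-1 (Fig. 6.3).

        A packet that has traveled k links may occupy only a buffer of
        class k or lower, so the higher classes remain available to the
        packets that have traveled farthest.
        """

        def __init__(self, num_nodes, buffers_per_class=1):
            self.N = num_nodes
            self.free = [buffers_per_class] * num_nodes   # free[c] = free buffers of class c

        def accept(self, hops_traveled):
            """Try to buffer a packet that has traversed hops_traveled links."""
            if hops_traveled > self.N - 1:
                return None                   # looping packet: discard
            for c in range(hops_traveled + 1):
                if self.free[c] > 0:          # use the lowest eligible class
                    self.free[c] -= 1
                    return c                  # class of the buffer granted
            return None                       # no eligible buffer: sender must retry

        def release(self, buffer_class):
            """Return a buffer of the given class when its packet departs."""
            self.free[buffer_class] += 1

    node = BufferClassNode(num_nodes=4, buffers_per_class=2)
    c = node.accept(hops_traveled=0)          # a fresh packet may use only class 0
    if c is not None:
        # ... forward the packet, then free its buffer
        node.release(c)

The higher buffer classes stay available to the most-traveled packets, which is the key to the induction argument mentioned in the caption of Fig. 6.3.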
We finally note that when the offered load is large, limited delay and freedom from buffer overflow can be achieved only by lowering the input to the network. Thus, there is a natural trade-off between giving sessions free access to the network and keeping delay at a level low enough so that retransmissions or other inefficiencies do not degrade network performance. A somewhat oversimplified guideline is that, ideally, flow control should not be exercised at all when network delay is below some critical level and, under heavy load conditions, should reject as much offered traffic as necessary to keep delay at that critical level. Unfortunately, this is easier said than done, since neither delay nor throughput can be represented meaningfully by single numbers in a flow control context.

Fairness. When offered traffic must be cut back, it is important to do so fairly. The notion of fairness is complicated, however, by the presence of different session priorities and service requirements. For example, some sessions need a minimum guaranteed rate and a strict upper bound on network delay. Thus, while it is appropriate to consider simple notions of fairness within a class of "similar" sessions, the notion of fairness between classes is complex and involves the requirements of those classes. Even within a class, there are several notions of fairness that one may wish to adopt. The example of Fig. 6.4 illustrates some of the contradictions inherent in choosing a fairness criterion. There are n + 1 sessions, each offering 1 unit/sec of traffic, over a sequence of n links, each of capacity 1 unit/sec. One session's traffic goes over all n links, while the traffic of each of the other sessions goes over only one link. A maximum throughput of n units/sec can be achieved by accepting all the traffic of the single-link sessions while shutting off the n-link session. However, if our objective is to give equal rate to all sessions, the resulting throughput is only (n + 1)/2 units/sec. Alternatively, if our objective is to give equal resources to all sessions, the single-link sessions should each get a rate of n/(n + 1) units/sec, while the n-link session should get a rate of 1/(n + 1) units/sec (the three allocations are compared numerically in the sketch following the caption of Fig. 6.4).

Generally, a compromise is required between equal access to a network resource and throttling the sessions that are most responsible for using up that resource. Achieving the proper balance, however, is a design decision that may be hard to quantify; often such decisions must be reached by trial and error.

Figure 6.4 Example showing that maximizing total throughput may be incompatible with giving equal throughput to all sessions. The n-link user and each of the n single-link users offer 1 unit/sec of traffic, and each of the n links has capacity 1 unit/sec. A maximum throughput of n units/sec can be achieved if the n-link session is shut off completely. Giving equal rates of 1/2 unit/sec to all sessions achieves a throughput of only (n + 1)/2 units/sec.
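As a numerical illustration of the three allocations just described, the short Python sketch below (not part of the original text; the function name and the sample values of n are arbitrary) tabulates the total throughput under the maximum-throughput, equal-rate, and equal-resource policies.

    from fractions import Fraction

    def fairness_example(n):
        """Total throughput for the n+1 sessions of Fig. 6.4 under three policies."""
        # Maximum total throughput: shut off the n-link session entirely.
        max_throughput = Fraction(n)
        # Equal rates: each link carries the n-link session plus one single-link
        # session, so the common rate is 1/2 and the total is (n+1)/2.
        equal_rate_total = (n + 1) * Fraction(1, 2)
        # Equal resources: the n-link session gets 1/(n+1), each single-link
        # session gets n/(n+1).
        equal_resource_total = n * Fraction(n, n + 1) + Fraction(1, n + 1)
        return max_throughput, equal_rate_total, equal_resource_total

    for n in (2, 5, 10):
        print(n, [str(t) for t in fairness_example(n)])
    # n = 10: maximum 10, equal rate 11/2, equal resources 101/11 (about 9.2)

The equal-resource policy recovers most of the maximum throughput for large n, while the equal-rate policy sacrifices roughly half of it; which compromise is appropriate is exactly the design question raised in the text.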
6.2 WINDOW FLOW CONTROL

In this section we describe the most frequently used class of flow control methods. In Sections 6.2.1 to 6.2.3, the main emphasis is on flow control within the communication subnet. Flow control outside the subnet, at the transport layer, is discussed briefly in Section 6.2.4.

A session between a transmitter A and a receiver B is said to be window flow controlled if there is an upper bound on the number of data units that have been transmitted by A and are not yet known by A to have been received by B (see Fig. 6.5). The upper bound (a positive integer) is called the window size or, simply, the window. The transmitter and receiver can be, for example, two nodes of the communication subnet, a user's machine and the entry node of the communication subnet, or the users' machines at the opposite ends of a session. Finally, the data units in a window can be, for example, messages, packets, or bytes.

Figure 6.5 Window flow control between a transmitter A and a receiver B consists of an upper bound W_AB on the number of data units and permits in transit inside the network; data units flow from A to B, permits flow back from B to A, and the total number of data units and permits in transit does not exceed the window size W_AB.

The receiver B notifies the transmitter A that it has disposed of a data unit by sending a special message to A, which is called a permit (other names in the literature are acknowledgment, allocate message, etc.). Upon receiving a permit, A is free to send one more data unit to B. Thus, a permit may be viewed as a form of passport that a data unit must obtain before entering the logical communication channel between A and B. The number of permits in use should not exceed the window size.

Permits are either contained in special control packets or are piggybacked on regular data packets. They can be implemented in a number of ways; see the practical examples of Section 6.4 and the following discussion. Note also that a window flow control scheme for a given session may be combined with an error control scheme for the session, where the permits also play the role of acknowledgments; see Section 2.8.2 and the descriptions of the ARPANET and the Codex network in Section 6.4.

The general idea in the window strategy is that the input rate of the transmitter is reduced when permits return slowly. Therefore, if there is congestion along the communication path of the session, the attendant large delays of the permits cause a natural slowdown of the transmitter's data rate. However, the window strategy has an additional dimension, whereby the receiver may intentionally delay permits to restrict the transmission rate of the session. For example, the receiver may do so to avoid buffer overflow.
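As a rough illustration of this mechanism, the sketch below (not from the original text; the class and method names are illustrative) shows the bookkeeping at a window flow controlled transmitter: it keeps a count of data units sent but not yet covered by a returned permit and transmits only while that count is below the window size.

    class WindowedTransmitter:
        """Bookkeeping at a window flow controlled transmitter (illustrative)."""

        def __init__(self, window_size):
            self.W = window_size
            self.outstanding = 0        # data units sent, permit not yet back

        def can_send(self):
            return self.outstanding < self.W

        def send(self, send_fn, data_unit):
            """Transmit one data unit if the window allows it."""
            if not self.can_send():
                return False            # window exhausted: wait for a permit
            send_fn(data_unit)
            self.outstanding += 1
            return True

        def permit_received(self):
            """A permit from the receiver frees one slot in the window."""
            if self.outstanding > 0:
                self.outstanding -= 1

    tx = WindowedTransmitter(window_size=3)
    outbox = []
    for packet in range(5):
        tx.send(outbox.append, packet)  # only the first 3 are transmitted
    tx.permit_received()                # a returning permit frees one slot
    tx.send(outbox.append, 5)           # now accepted; outbox = [0, 1, 2, 5]

Note how the transmitter slows down automatically when permits are delayed, and how a receiver can throttle the session simply by withholding permits, exactly as described above.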
In the subsequent discussion, we consider two strategies, end-to-end and node-by-node windowing. The first strategy refers to flow control between the entry and exit subnet nodes of a session, while the second refers to flow control between every pair of successive nodes along a virtual circuit's path.

6.2.1 End-to-End Windows

In the most common version of end-to-end window flow control, the window size is αW, where α and W are some positive numbers. Each time a new batch of α data units is received at the destination node, a permit is sent back to the source allocating a new batch of α data units. In a variation of this scheme, the destination node will send a new α-data-unit permit upon reception of just the first of an α-data-unit batch. (See the SNA pacing scheme description in Section 6.3.) To simplify the following exposition, we henceforth assume that α = 1, but our conclusions are valid regardless of the value of α. Also, for concreteness, we talk in terms of packets, but the window maintained may consist of other data units such as bytes.

Usually, a numbering scheme for packets and permits is used so that permits can be associated with packets previously transmitted and the loss of permits can be detected. One possibility is to use a sliding window protocol similar to those used for data link control, whereby a packet contains a sequence number and a request number. The latter number can serve as one or more permits for flow control purposes (see also the discussion in Section 2.8.2). For example, suppose that node A receives a packet from node B with request number k. Then A knows that B has disposed of all packets sent by A and numbered less than k, and therefore A is free to send those packets up to number k + W - 1 that it has not sent yet, where W is the window size. In such a scheme, both the sequence number and the request number are represented modulo m, where m ≥ W + 1. One can show that if packet ordering is preserved between transmitter and receiver, this representation of numbers is adequate; the proof is similar to the corresponding proof for the go back n ARQ system. In some networks, the end-to-end window scheme is combined with an end-to-end retransmission protocol, and a packet is retransmitted if, following a suitable timeout, the corresponding permit has not returned to the source.

To simplify the subsequent presentation, the particular manner in which permits are implemented will be ignored. It will be assumed that the source node simply counts the number x of packets it has already transmitted but for which it has not yet received back a permit, and transmits new packets only as long as x < W.

Figure 6.6 shows the flow of packets for the case where the round-trip delay d, including round-trip propagation delay, packet transmission time, and permit delay, is smaller than the time required to transmit the full window of W packets, that is,

    d ≤ WX

where X is the transmission time of a single packet. Then the source is capable of transmitting at the full speed of 1/X packets/sec, and flow control is not active. (To
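In connection with the condition d ≤ WX, the following small computation (an illustrative sketch, not from the original text; the numerical values are made up) finds the smallest window W for which the source can transmit at the full rate of 1/X packets/sec, together with the smallest sequence-number modulus m ≥ W + 1 that the sliding window numbering requires.

    import math

    def min_window_for_full_speed(d, X):
        """Smallest integer window W with d <= W * X, and modulus m = W + 1.

        d : round-trip delay (propagation + packet transmission + permit delay)
        X : transmission time of a single packet (same time unit as d)
        """
        W = math.ceil(d / X)
        m = W + 1            # smallest adequate modulus for sequence/request numbers
        return W, m

    # Example: 1000-bit packets on a 50-kbit/sec link give X = 20 ms;
    # assume a 200-ms round-trip delay d.
    W, m = min_window_for_full_speed(d=200, X=20)
    print(W, m)              # W = 10 packets, m = 11

With any window of at least 10 packets, permits return before the window is exhausted, so flow control does not constrain the source under these assumed numbers; a smaller window would hold the source below 1/X packets/sec.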