the average queue size accurately reflects the average delay at the gateway. When this option is used, the algorithm would be modified to ensure that the probability that a packet is marked is proportional to the packet size in bytes:

\[ p_b \leftarrow max_p \, (avg - min_{th}) / (max_{th} - min_{th}) \]
\[ p_b \leftarrow p_b \cdot PacketSize / MaximumPacketSize \]
\[ p_a \leftarrow p_b / (1 - count \cdot p_b) \]

In this case a large FTP packet is more likely to be marked than is a small TELNET packet.

Initialization:
    $avg \leftarrow 0$
    $count \leftarrow -1$
for each packet arrival
    calculate new avg. queue size $avg$:
        if the queue is nonempty
            $avg \leftarrow (1 - w_q) \, avg + w_q \, q$
        else
            $m \leftarrow f(time - q\_time)$
            $avg \leftarrow (1 - w_q)^m \, avg$
    if $min_{th} \le avg < max_{th}$
        increment $count$
        calculate probability $p_a$:
            $p_b \leftarrow max_p \, (avg - min_{th}) / (max_{th} - min_{th})$
            $p_a \leftarrow p_b / (1 - count \cdot p_b)$
        with probability $p_a$:
            mark the arriving packet
            $count \leftarrow 0$
    else if $max_{th} \le avg$
        mark the arriving packet
        $count \leftarrow 0$
    else $count \leftarrow -1$
when queue becomes empty
    $q\_time \leftarrow time$

Saved Variables:
    $avg$: average queue size
    $q\_time$: start of the queue idle time
    $count$: packets since last marked pkt.
Fixed parameters:
    $w_q$: queue weight
    $min_{th}$: minimum threshold for queue
    $max_{th}$: maximum threshold for queue
    $max_p$: maximum value for $p_b$
Other:
    $p_a$: current pkt-marking probability
    $q$: current queue size
    $time$: current time
    $f(t)$: a linear function of the time $t$

Figure 2: Detailed algorithm for RED gateways.

Sections 6 and 7 discuss in detail the setting of the various parameters for RED gateways. Section 6 discusses the calculation of the average queue size. The queue weight $w_q$ is determined by the size and duration of bursts in queue size that are allowed at the gateway. The minimum and maximum thresholds $min_{th}$ and $max_{th}$ are determined by the desired average queue size.
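The algorithm in Figure 2, including the optional packet-size scaling of $p_b$, can be sketched in Python as follows. This is a minimal illustration and not the simulator code used in this paper. The class name `RedGateway`, the choice $f(t) = $ `pkts_per_sec` $\cdot\, t$ for the linear function, the `size_fraction` argument (standing for PacketSize/MaximumPacketSize), the injectable `rng`, and the guard against a non-positive denominator are all assumptions made for the sketch.

```python
import random


class RedGateway:
    """Illustrative sketch of the marking logic in Figure 2."""

    def __init__(self, w_q=0.002, min_th=5.0, max_th=15.0, max_p=1 / 50,
                 pkts_per_sec=1000.0, rng=None):
        self.w_q = w_q
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p
        # Assumption: f(t) = pkts_per_sec * t, one possible linear function.
        self.pkts_per_sec = pkts_per_sec
        self.rng = rng if rng is not None else random.Random()
        self.avg = 0.0      # average queue size
        self.count = -1     # packets since last marked packet
        self.q_time = 0.0   # start of the queue idle time

    def on_queue_empty(self, now):
        self.q_time = now

    def on_arrival(self, q, now, size_fraction=1.0):
        """Return True if the arriving packet should be marked."""
        if q > 0:
            self.avg = (1 - self.w_q) * self.avg + self.w_q * q
        else:
            # Decay avg as if m packets had arrived at an empty queue.
            m = max(0.0, self.pkts_per_sec * (now - self.q_time))
            self.avg = (1 - self.w_q) ** m * self.avg

        if self.min_th <= self.avg < self.max_th:
            self.count += 1
            p_b = (self.max_p * (self.avg - self.min_th)
                   / (self.max_th - self.min_th))
            p_b *= size_fraction  # optional byte-mode scaling
            denom = 1 - self.count * p_b
            # Assumption: treat a non-positive denominator as a certain mark.
            p_a = 1.0 if denom <= 0 else min(1.0, p_b / denom)
            if self.rng.random() < p_a:
                self.count = 0
                return True
            return False
        if self.avg >= self.max_th:
            self.count = 0
            return True
        self.count = -1
        return False
```

A caller would invoke `on_arrival` with the instantaneous queue size for every arriving packet and `on_queue_empty` whenever the queue drains; the default parameter values are those used for the simulation in Section 5.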
The average queue size which makes the desired tradeoffs (such as the tradeoff between maximizing throughput and minimizing delay) depends on network characteristics, and is left as a question for further research. Section 7 discusses the calculation of the packet-marking probability.

In this paper our primary interest is in the functional operation of the RED gateways. Specific questions about the most efficient implementation of the RED algorithm are discussed in Section 11.

5 A simple simulation

This section describes our simulator and presents a simple simulation with RED gateways. Our simulator is a version of the REAL simulator [19] built on Columbia’s Nest simulation package [1], with extensive modifications and bug fixes made by Steven McCanne at LBL. In the simulator, FTP sources always have a packet to send and always send a maximal-sized (1000-byte) packet as soon as the congestion control window allows them to do so. A sink immediately sends an ACK packet when it receives a data packet. The gateways use FIFO queueing.

Source and sink nodes implement a congestion control algorithm equivalent to that in 4.3-Tahoe BSD TCP.³ Briefly, there are two phases to the window-adjustment algorithm. A threshold is set initially to half the receiver’s advertised window. In the slow-start phase, the current window is doubled each roundtrip time until the window reaches the threshold. Then the congestion-avoidance phase is entered, and the current window is increased by roughly one packet each roundtrip time. The window is never allowed to increase to more than the receiver’s advertised window, which this paper refers to as the “maximum window size”. In 4.3-Tahoe BSD TCP, packet loss (a dropped packet) is treated as a “congestion experienced” signal. The source reacts to a packet loss by setting the threshold to half the current window, decreasing the current window to one packet, and entering the slow-start phase.
³ Our simulator does not use the 4.3-Tahoe TCP code directly but we believe it is functionally identical.
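The window-adjustment rules described above can be illustrated with a per-roundtrip sketch. It is a simplification and not the 4.3-Tahoe code: the function name `tahoe_window_trace`, the initial window of one packet, the capping of a doubled window at the threshold, and the modelling of losses as a set of roundtrip indices are assumptions made for illustration.

```python
def tahoe_window_trace(advertised_window, loss_rtts, num_rtts):
    """Return the window size (in packets) at the start of each roundtrip.

    loss_rtts is a set of roundtrip indices in which a packet loss is
    detected (a hypothetical input, used here instead of a real network).
    """
    threshold = advertised_window / 2  # initially half the advertised window
    cwnd = 1.0                         # assumption: start with one packet
    trace = []
    for rtt in range(num_rtts):
        trace.append(cwnd)
        if rtt in loss_rtts:
            # Loss: halve the threshold, restart slow-start from one packet.
            threshold = cwnd / 2
            cwnd = 1.0
        elif cwnd < threshold:
            # Slow-start: double each roundtrip until the threshold.
            cwnd = min(cwnd * 2, threshold)
        else:
            # Congestion avoidance: roughly one packet per roundtrip.
            cwnd = cwnd + 1
        # Never exceed the receiver's advertised (maximum) window.
        cwnd = min(cwnd, advertised_window)
    return trace


if __name__ == "__main__":
    print(tahoe_window_trace(advertised_window=16, loss_rtts={8}, num_rtts=12))
```

With an advertised window of 16 packets and a single loss, the trace shows the exponential opening up to the threshold, the linear increase after it, and the return to a one-packet window after the loss.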
[Figure 3, top chart: Queue size (solid line) and average queue size (dashed line); axes: Time and Queue, with max-th and min-th marked. Bottom chart: Packet Number (Mod 90) for Four Connections; x-axis: Time.]
Figure 3: A simulation with four FTP connections with staggered start times.
Figure 3 shows a simple simulation with RED gateways. The network is shown in Figure 4. The simulation contains four FTP connections, each with a maximum window roughly equal to the delay-bandwidth product, which ranges from 33 to 112 packets. The RED gateway parameters are set as follows: $w_q = 0.002$, $min_{th} = 5$ packets, $max_{th} = 15$ packets, and $max_p = 1/50$. The buffer size is sufficiently large that packets are never dropped at the gateway due to buffer overflow; in this simulation the RED gateway controls the average queue size, and the actual queue size never exceeds forty packets.

[Figure 4 diagram: FTP SOURCES, GATEWAY, SINK; node labels 1 to 6; link labels: 100Mbps, 45Mbps, 1ms, 2ms, 4ms, 5ms, 8ms.]
Figure 4: Simulation network.

For the charts in Figure 3, the x-axis shows the time in seconds. The bottom chart shows the packets from nodes 1-4. Each of the four main rows shows the packets from one of the four connections; the bottom row shows node 1 packets, and the top row shows node 4 packets. There is a mark for each data packet as it arrives at the gateway and as it departs from the gateway; at this time scale, the two marks are often indistinguishable. The y-axis is a function of the packet sequence number; for packet number $n$ from node $i$, the y-axis shows $n \bmod 90 + (i - 1) \cdot 100$. Thus, each vertical ‘line’ represents 90 consecutively-numbered packets from one connection arriving at the gateway. Each ‘X’ shows a packet dropped by the gateway, and each ‘X’ is followed by a mark showing the retransmitted packet. Node 1 starts sending packets at time 0, node 2 starts after 0.2 seconds, node 3 starts after 0.4 seconds, and node 4 starts after 0.6 seconds.

The top chart of Figure 3 shows the instantaneous queue size $q$ and the calculated average queue size $avg$. The dotted lines show $min_{th}$ and $max_{th}$, the minimum and maximum thresholds for the average queue size. Note that the calculated average queue size $avg$ changes fairly slowly compared to $q$.
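To give a rough sense of how slowly $avg$ follows $q$ with $w_q = 0.002$, the sketch below counts how many packet arrivals the averaging rule $avg \leftarrow (1 - w_q)\,avg + w_q\,q$ needs before $avg$ reaches a target value when the queue is held at a constant size. The scenario (a queue fixed at 15 packets, starting from $avg = 0$) and the function name `arrivals_until_avg_reaches` are hypothetical and are not taken from the simulation.

```python
def arrivals_until_avg_reaches(target, q, w_q=0.002, avg=0.0):
    """Count arrivals until the averaged queue size reaches `target`
    while the instantaneous queue size stays fixed at `q`."""
    if avg >= target:
        return 0
    if q <= target:
        # avg only approaches q from below, so it would never reach target.
        raise ValueError("q must exceed target for avg to reach it")
    n = 0
    while avg < target:
        avg = (1 - w_q) * avg + w_q * q
        n += 1
    return n


if __name__ == "__main__":
    # Queue held at 15 packets; how long until avg reaches min_th = 5?
    print(arrivals_until_avg_reaches(target=5, q=15))
    # A ten times larger weight reacts roughly ten times faster.
    print(arrivals_until_avg_reaches(target=5, q=15, w_q=0.02))
```

Under these assumptions the average needs on the order of a couple of hundred arrivals to climb from zero to the minimum threshold, which is consistent with the smooth dashed line in the top chart.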
The bottom row of X’s on the bottom chart shows again the time of each dropped packet.

This simulation shows the success of the RED gateway in controlling the average queue size at the gateway in response to a dynamically changing load. As the number of connections increases, the frequency with which the gateway drops packets also increases. There is no global synchronization. The higher throughput for the connections with shorter roundtrip times is due to the bias of TCP’s window increase algorithm in favor of connections with shorter roundtrip times (as discussed in [6, 7]). For the simulation in Figure 3 the average link utilization is 76%. For the following second of the simulation, when all four sources are active, the average link utilization is 82%. (This is not shown in Figure 3.)

[Figure 5 plot: axes: Throughput (%) and Average Queue; ‘triangle’ for RED, ‘square’ for Drop Tail.]
Figure 5: Comparing Drop Tail and RED gateways.

[Figure 6 diagram: FTP SOURCES, GATEWAY, SINK; node labels 3, 4, 5, 6; link labels: 100Mbps, 45Mbps, 20ms, 1ms.]
Figure 6: Simulation network.

Because RED gateways can control the average queue size while accommodating transient congestion, RED gateways are well-suited to provide high throughput and low average delay in high-speed networks with TCP connections that have large windows. The RED gateway can accommodate the short burst in the queue required by TCP’s slow-start phase; thus RED gateways control the average queue size while still allowing TCP connections to smoothly open their windows. Figure 5 shows the results of simulations of the network in Figure 6 with two TCP connections, each with a maximum window of 240 packets, roughly equal to the delay-bandwidth product. The two connections are started at slightly different times. The simulations compare the performance of Drop Tail and of RED gateways.

In Figure 5 the x-axis shows the total throughput as a