Congestion Control for high bandwidth-Delay product Networks Dina Katabi- Mark handley Charlie rohrst MIT-LCS ICSI Tellabs dk@mit. edu mjh@icsi. berkeley. edu crhors @mit. edu ABSTRACT General terms Theory and experiments show that as the per-flow product of band- Performance Design. width and latency increases, TCP becomes inefficient and prone to instability, regardless of the queuing scheme. This failing becomes Keywords creasingly important as the Internet evolves to incorporate high-bandwidth optical links and more large-delay satellite links Congestion control, high-speed networks, large bandwidth-delay To address this problem, we develop a novel approach to Inter- net congestion control that outperforms TCP in conventional en- vironments, and remains efficient, fair, scalable, and stable as the 1. introduction andwidth-delay product increases. This new eXplicit Control Pro- tocol, XCP, generalizes the Explicit Congestion Notification pre For the internet to continue to thrive. its col sal (ECN). In addition, XCP introduces the new concept of de- anism must remain effective as the network evolves. Technology trends indicate that the future Internet will have a large more flexible and analytically tractable protocol design and opens num可wys如 new avenues for service differentiation Using a control theory framework, we model XCP and demon- These trends are problematic because TCP reacts adversely to strate it is stable and efficient regardless of the link capacity, creases in bandwidth or delay Mathematical analysis of current congestion control simulations show that XCP outperforms TCP in both conventional algorithms reveals that, regardless of the queuing scheme, as the air bandwidth allocat gh utilization, small standing queu rone to instability. By casting the problem into a control theory ize,and near-zero packet drops, with both steady and highly var framework, Low et al. [23] show that as capacity or delay ing traffic. Additionally, the new protocol does not maintain an Random Early Discard(RED)[13], Random Early Marking (REM per-flow state in routers and requires few CPU cycles per packet 5], Proportional Integral Controller [15], and Virtual Queue [14 all eventually become oscillatory and prone to instability. They further argue that it is unlikely that any Active Queue Management scheme(AQM) can maintain stability over very high-capacity or Categories and subject Descriptors large-delay links. Furthermore, Katabi and Blake 19]show that Adaptive Virtual Queue(AvQ)[22] also becomes pron C.2.5 [Computer Systems Organization]: Computer Communi- bility when the link capacity is large enough(e.g, gigabit links) ation Networks-lLocal and wide-Area netorks Inefficiency is another problem facing TCP in the future Inter net. As the delay-bandwidth product increases, performance de- D Katabi was supported by ARPA Agreement J958100 grades. TCP's additive increase policy limits its ability to acquire contract F30602-00-20553. the and conclusions cont spare bandwidth to one packet per RTT. Since the bandwidth-delay erein are those of the authors and should not be interpret product of a single flow over very-high-bandwidth links may be representing the official policy or endorsements of DARPA many thousands of packets, TCP might waste thousands of RTts ramping up to full utilization following a burst of congestion TC. Rohrs was supported by the office of Naval Research under Further, the increase in link capacity does not improve the trans contract 6872 fer delay of short flows(the majority of the flows in the Internet) Short TCP flows cannot acquire the spare bandwidth faster than slow start" and will waste valuable RTTs ramping up even when 'ermission to make digital or hard copies of all or part of this work for Additionally, since TCP's throughput is inversely proportional to personal or classroom use is granted without fee provided that copies are the rtt, fairness too might become an issue as more flows in the Internet traverse satellite links or wireless WANs [25]. As user bear this notice and the full citation on the first page. To copy otherwise, to with substantially different RTTs compete for the same bottleneck ost on servers or to red Although the full impact of large delay-bandwidth products is yet to come, we can see the seeds of these problems in the cur-
Congestion Control for High Bandwidth-Delay Product Networks Dina Katabi Mark Handley Charlie Rohrs✁ MIT-LCS ICSI Tellabs dk@mit.edu mjh@icsi.berkeley.edu crhors@mit.edu ABSTRACT Theory and experiments show that as the per-flow product of bandwidth and latency increases, TCP becomes inefficient and prone to instability, regardless of the queuing scheme. This failing becomes increasingly important as the Internet evolves to incorporate very high-bandwidth optical links and more large-delay satellite links. To address this problem, we develop a novel approach to Internet congestion control that outperforms TCP in conventional environments, and remains efficient, fair, scalable, and stable as the bandwidth-delay product increases. This new eXplicit Control Protocol, XCP, generalizes the Explicit Congestion Notification proposal (ECN). In addition, XCP introduces the new concept of decoupling utilization control from fairness control. This allows a more flexible and analytically tractable protocol design and opens new avenues for service differentiation. Using a control theory framework, we model XCP and demonstrate it is stable and efficient regardless of the link capacity, the round trip delay, and the number of sources. Extensive packet-level simulations show that XCP outperforms TCP in both conventional and high bandwidth-delay environments. Further, XCP achieves fair bandwidth allocation, high utilization, small standing queue size, and near-zero packet drops, with both steady and highly varying traffic. Additionally, the new protocol does not maintain any per-flow state in routers and requires few CPU cycles per packet, which makes it implementable in high-speed routers. Categories and Subject Descriptors C.2.5 [Computer Systems Organization]: Computer Communication Networks—Local and Wide-Area Networks D. Katabi was supported by ARPA Agreement J958100, under contract F30602-00-20553. the views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policy or endorsements of DARPA or the US government. ✁ C. Rohrs was supported by the office of Naval Research under contract 6872400-ONR. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGCOMM’02, August 19-23, 2002, Pittsburgh, Pennsylvania, USA. Copyright 2002 ACM 1-58113-570-X/02/0008 ...$5.00. General Terms Performance, Design. Keywords Congestion control, high-speed networks, large bandwidth-delay product. 1. INTRODUCTION For the Internet to continue to thrive, its congestion control mechanism must remain effective as the network evolves. Technology trends indicate that the future Internet will have a large number of very high-bandwidth links. Less ubiquitous but still commonplace will be satellite and wireless links with high latency. These trends are problematic because TCP reacts adversely to increases in bandwidth or delay. Mathematical analysis of current congestion control algorithms reveals that, regardless of the queuing scheme, as the delay-bandwidth product increases, TCP becomes oscillatory and prone to instability. By casting the problem into a control theory framework, Low et al. [23] show that as capacity or delay increases, Random Early Discard (RED) [13], Random Early Marking (REM) [5], Proportional Integral Controller [15], and Virtual Queue [14] all eventually become oscillatory and prone to instability. They further argue that it is unlikely that any Active Queue Management scheme (AQM) can maintain stability over very high-capacity or large-delay links. Furthermore, Katabi and Blake [19] show that Adaptive Virtual Queue (AVQ) [22] also becomes prone to instability when the link capacity is large enough (e.g., gigabit links). Inefficiency is another problem facing TCP in the future Internet. As the delay-bandwidth product increases, performance degrades. TCP’s additive increase policy limits its ability to acquire spare bandwidth to one packet per RTT. Since the bandwidth-delay product of a single flow over very-high-bandwidth links may be many thousands of packets, TCP might waste thousands of RTTs ramping up to full utilization following a burst of congestion. Further, the increase in link capacity does not improve the transfer delay of short flows (the majority of the flows in the Internet). Short TCP flows cannot acquire the spare bandwidth faster than “slow start” and will waste valuable RTTs ramping up even when bandwidth is available. Additionally, since TCP’s throughput is inversely proportional to the RTT, fairness too might become an issue as more flows in the Internet traverse satellite links or wireless WANs [25]. As users with substantially different RTTs compete for the same bottleneck capacity, considerable unfairness will result. Although the full impact of large delay-bandwidth products is yet to come, we can see the seeds of these problems in the cur-
ent Internet. For example, TCP over satellite links has revealed present possible deployme etwork utilization issues and TCPs undesirable bias against long RTT Hows [4]. Currently, these problems are mitigated using ad 2. DESIGN RATIONALE hoc mechanisms such as ack spacing, split connection [4], or per- formance enhancing proxies [8] Our initial objective is to step back and rethink interne This paper develops a novel protocol for congestion control that tion control without caring about backward compatibil outperforms TCP in conventional environments, and further remains ployment. If we were to build a new congestion control arc rehired efficient, fair, and stable as the link bandwidth or the round-trip de- om scratch, what might it look like? lay increases. This new eXplicit Control Protocol, XCP, general The first observation is that packet loss is a poor signal of con- izes the Explicit Congestion Notification proposal(ECN)[27]. In- gestion. While we do not believe a cost-effective network can al- ways avoid loss, dropping packets should be a congestion signal of stead of the one bit congestion indication used by ECN, our routers last resort. As an implicit signal, loss is bad because congestion is inform the senders about the degree of congestion at the bottleneck. not the only source of loss, and because a definite decision that a fairness control. To control utilization, the new protocol adjusts packet was lost cannot be made quickly. As a binary signal, loss its aggressiveness according to the spare bandwidth in the network only signals whether there is congestion(a loss)or not(no loss) and the feedback delay. This prevents oscillations, provides stabil Thus senders must probe the network to the point of congestion ty in face of high bandwidth or large delay, and ensures efficient before backing off. Moreover, as the feedback is imprecise, the in- utilization of network resources. To control fairness, the protocol crease policy must be conservative and the decrease policy must be reclaims bandwidth from flows whose rate is above their fair share aggressive and reallocates it to other flows Tight congestion control requires explicit and precise congestion By putting the control state in the packets, XCP needs no per- flow state in routers and can scale to any number of flows. Further, nalling should reflect the degree of congestion. We propose using only a few CPU cycles ecise congestion signalling, where the network explicitly tells the per packet, making it practical even for high-speed routers sender the state of congestion and how to react to it. This allows the Using a control theory framework motivated by previous work senders to decrease their sending windows quickly when the bottle- [22, 15, 23], we show in$ 4 that a fluid model of the protocol is neck is highly congested, while performing small reductions when the sending rate is close to the bottleneck capacity. The resulting stable for any link capacity, feedback delay, or number of sources. protocol is both more responsive and less oscillatory contrast to the various AQM schemes where depend on the capacity, delay, or number of sources, our analysis Second, the aggressiveness of the sources should be adjusted ac- shows how to set the parameters of the new protocol to constant cording to the delay in the feedback-loop. The dynamics of co values that are effective independent of the environment. Our extensive packet-level simulations in$5 show that, regard- delay. A fundamental characteristic of such a system is that it be- less of the queuing scheme, TCP'S performance degrades signifi- destabilizing effect, the system must slow down as the feedback de- lay increases. In the context of congestion control, this means that rotocol achieves high utilization, small queues, and almost no as delay increases, the sources should change their sending rates drops, independent of capacity or delay. Even in conventional en more slowly. This issue has been raised by other researchers 23 ter fairness, higher utilization, and smaller queue size, with almost 26], but the important question is how exactly feedback should de- no packet drops. Further, it maintains good performance in dy pend on delay to establish stability. Using tools from control theory, we conjecture that congestion feedback based on rate-mismatch namic environments with many short web-like flows, and has no should be inversely proportional to delay, and feedback based on bias against long RTT flows. a unique characteristic of the new protocol is its ability to operate with almost zero drops queue-mismatch should be inversely proportional to the square Although we started with the goal of solving TCP's limitations delay. in high-bandwidth large-delay environments, our design has several Robustness to congestion should be independent of unknown and quickly changing parameters, such as the number of flows. A fun- First, decoupling fairness control from utilization control open damental principle from control theory states that a controller must new avenues for service differentiation using schemes that provide react as quickly as the dynamics of the controlled signal; other desired bandwidth apportioning, yet are too aggressive or too weak wise the controller will always lag behind the controlled system and for controlling congestion. In $6, we present a simple scheme that will be ineffective. In the context of current proposals for conges- implements the shadow prices model [21] tion control, the controller is an Active Queue Management scheme Second, the protocol facilitates distinguishing error losses from (AQM). The controlled signal is the aggregate traffic traversing the ongestion losses, which makes it useful for wireless environments. ink. The controller seeks to match input traffic to link capacity In XCP, drops caused by congestion are highly uncommon(e.g However, this objective might be unachievable when the input traf- than one in a million packets in simulations). Further, since fic consists of TCP flows, because the dynamics of a TCPaggregate depend on the number of flows (N). The aggregate rate increases ion drop is likely to be preceded by an explicit feedback that by N packets per RTT, or decreases proportionally to 1/N.Since tells the source to decrease its congestion window. Losses that are the number of flows in the aggregate is not constant and changes preceded and followed by an explicit increase feedback are likely ver time, no AQM controller with constant parameters can be fast error losses enough to operate with an arbitrary number of TCP flows. Thus,a Third, as shown in 7, XCP facilitates the detection of mise- third objective of our system is to make the dynamics of the aggre- ite traffic independent from the number of flows Finally, XCP's performance provides an incentive for both end This leads to the need for decoupling efficiency control (i.e, con- users and network providers to deploy the protocol. In s8 we trol of utilization or congestion)from fairness control.Robustness to congestion requires the behavior of aggregate traffic to be inde
rent Internet. For example, TCP over satellite links has revealed network ✂ utilization issues and TCP’s undesirable bias against long RTT flows [4]. Currently, these problems are mitigated using ad hoc mechanisms such as ack spacing, split connection [4], or performance enhancing proxies [8]. This paper develops a novel protocol for congestion control that outperforms TCP in conventional environments, and further remains efficient, fair, and stable as the link bandwidth or the round-trip delay increases. This new eXplicit Control Protocol, XCP, generalizes the Explicit Congestion Notification proposal (ECN) [27]. Instead of the one bit congestion indication used by ECN, our routers inform the senders about the degree of congestion at the bottleneck. Another new concept is the decoupling of utilization control from fairness control. To control utilization, the new protocol adjusts its aggressiveness according to the spare bandwidth in the network and the feedback delay. This prevents oscillations, provides stability in face of high bandwidth or large delay, and ensures efficient utilization of network resources. To control fairness, the protocol reclaims bandwidth from flows whose rate is above their fair share and reallocates it to other flows. By putting the control state in the packets, XCP needs no per- flow state in routers and can scale to any number of flows. Further, our implementation (Appendix A), requires only a few CPU cycles per packet, making it practical even for high-speed routers. Using a control theory framework motivated by previous work [22, 15, 23], we show in ✄ 4 that a fluid model of the protocol is stable for any link capacity, feedback delay, or number of sources. In contrast to the various AQM schemes where parameter values depend on the capacity, delay, or number of sources, our analysis shows how to set the parameters of the new protocol to constant values that are effective independent of the environment. Our extensive packet-level simulations in ✄ 5 show that, regardless of the queuing scheme, TCP’s performance degrades signifi- cantly as either capacity or delay increases. In contrast, the new protocol achieves high utilization, small queues, and almost no drops, independent of capacity or delay. Even in conventional environments, the simulations show that our protocol exhibits better fairness, higher utilization, and smaller queue size, with almost no packet drops. Further, it maintains good performance in dynamic environments with many short web-like flows, and has no bias against long RTT flows. A unique characteristic of the new protocol is its ability to operate with almost zero drops. Although we started with the goal of solving TCP’s limitations in high-bandwidth large-delay environments, our design has several additional advantages. First, decoupling fairness control from utilization control opens new avenues for service differentiation using schemes that provide desired bandwidth apportioning, yet are too aggressive or too weak for controlling congestion. In ✄ 6, we present a simple scheme that implements the shadow prices model [21]. Second, the protocol facilitates distinguishing error losses from congestion losses, which makes it useful for wireless environments. In XCP, drops caused by congestion are highly uncommon (e.g., less than one in a million packets in simulations). Further, since the protocol uses explicit and precise congestion feedback, a congestion drop is likely to be preceded by an explicit feedback that tells the source to decrease its congestion window. Losses that are preceded and followed by an explicit increase feedback are likely error losses. Third, as shown in ✄ 7, XCP facilitates the detection of misbehaving sources. Finally, XCP’s performance provides an incentive for both end users and network providers to deploy the protocol. In ✄ 8 we present possible deployment paths. 2. DESIGN RATIONALE Our initial objective is to step back and rethink Internet congestion control without caring about backward compatibility or deployment. If we were to build a new congestion control architecture from scratch, what might it look like? The first observation is that packet loss is a poor signal of congestion. While we do not believe a cost-effective network can always avoid loss, dropping packets should be a congestion signal of last resort. As an implicit signal, loss is bad because congestion is not the only source of loss, and because a definite decision that a packet was lost cannot be made quickly. As a binary signal, loss only signals whether there is congestion (a loss) or not (no loss). Thus senders must probe the network to the point of congestion before backing off. Moreover, as the feedback is imprecise, the increase policy must be conservative and the decrease policy must be aggressive. Tight congestion control requires explicit and precise congestion feedback. Congestion is not a binary variable, so congestion signalling should reflect the degree of congestion. We propose using precise congestion signalling, where the network explicitly tells the sender the state of congestion and how to react to it. This allows the senders to decrease their sending windows quickly when the bottleneck is highly congested, while performing small reductions when the sending rate is close to the bottleneck capacity. The resulting protocol is both more responsive and less oscillatory. Second, the aggressiveness of the sources should be adjusted according to the delay in the feedback-loop. The dynamics of congestion control may be abstracted as a control loop with feedback delay. A fundamental characteristic of such a system is that it becomes unstable for some large feedback delay. To counter this destabilizing effect, the system must slow down as the feedback delay increases. In the context of congestion control, this means that as delay increases, the sources should change their sending rates more slowly. This issue has been raised by other researchers [23, 26], but the important question is how exactly feedback should depend on delay to establish stability. Using toolsfrom control theory, we conjecture that congestion feedback based on rate-mismatch should be inversely proportional to delay, and feedback based on queue-mismatch should be inversely proportional to the square of delay. Robustness to congestion should be independent of unknown and quickly changing parameters, such as the number of flows. A fundamental principle from control theory states that a controller must react as quickly as the dynamics of the controlled signal; otherwise the controller will always lag behind the controlled system and will be ineffective. In the context of current proposals for congestion control, the controller is an Active Queue Management scheme (AQM). The controlled signal is the aggregate traffic traversing the link. The controller seeks to match input traffic to link capacity. However, this objective might be unachievable when the input traf- fic consists of TCP flows, because the dynamics of a TCP aggregate depend on the number of flows (☎). The aggregate rate increases by ☎ packets per RTT, or decreases proportionally to ✆✞✝✟☎. Since the number of flows in the aggregate is not constant and changes over time, no AQM controller with constant parameters can be fast enough to operate with an arbitrary number of TCP flows. Thus, a third objective of our system is to make the dynamics of the aggregate traffic independent from the number of flows. This leads to the need for decoupling efficiency control (i.e., control of utilization or congestion) from fairness control. Robustness to congestion requires the behavior of aggregate traffic to be inde-
lled in H_cwnd (set to senders current cwnd) back reaches the receiver. it is returned to the sender in an acknow- sender's rtt estimate) Initialized augment packet, and the sender updates its cwnd accordingly 3.2 The Congestion Header H- feedback(initialized to senders demands) nodified by Each XCP packet carries a congestion header 1), which the routers is used to communicate a flow 's state to routers back from Figure 1: Congestion header. the routers on to the receivers. The field h_cun e sender's current congestion window, whereas H_rtt is the sender's current RTT estimate. These are filled in by the sender and never modified pendent of the number of fiows in it. However, any fair bandwidth The remaining field, H_feedback, takes positive or negative values and is initialized by the sender the bottleneck. Thus, the rule for dividing bandwidth among indi- modify this field to directly control the cor vidual flows in an aggregate should be independent from the control ingestion windows of the law that governs the dynamics of the aggregate Traditionally, efficiency and fairness are coupled since the same 3.3 The XCP sender control law(such as AIMD in TCP)is used to obtain both fair ness and efficiency simultaneously 3, 9, 17, 18, 16]. Conceptually As with TCP. an XCP sender maintains a congestion window of however, efficiency and fairness are independent. Efficiency in the outstanding packets, cwnd, and an estimate of the round trip time rtt. On packet departure, the sender attaches a congestion curr y the aggregate traffic's behavior. When the input traffic header to the packet and sets the H_cwnd field to its current cwnd the link capacity, no queue builds and utilization is op timal. Fairness, on the other hand, involves the relative throughput and H_rtt to its current rtt. In the first packet of a flow, h_rtt of flows sharing a link. a scheme is fair when the flows sharing a set to zero to indicate to the routers that the source does not yet link have the same throughput irrespective of congestion. ave a valid estimate of the rtt. In our new paradigm, a router has both an efficiency controller The sender initializes the H-feedback field to request its de- ired window increase. For example, when the application has a (EC)and a fairness controller(FC). This separation simplifies the desired rate r, the sender sets H-feedback to the desired increase design and analysis of each controller by reducing the requirements in the congestion window (r-- cwnd)divided by the number imposed. It also permits modifying one of the controllers with- of packets in the current congestion window. If bandwidth is avail- out redesigning or re-analyzing the other. Furthermore, it provides able, this initialization allows the sender to reach the desired rate a fiexible framework for integrating differential bandwidth alloca tions. For example, allocating bandwidth to senders according to after one rtt their priorities or the price they pay requires changing only the fair- Whenever a new acknowledgment arrives, positive feedback in- ness controller and does not affect the efficiency or the congestion reases the senders cwnd and negative feedback reduces it: (cwnd+ H-feedback, s) where s is the packet size 3. PROTOCOL n addition to direct feedback, XCP still needs to respond to XCP provides a joint design of end-systems and routers. like osses although they are rare. It does this in a similar manner to TCP, XCP is a window-based congestion control protocol intended TCP for best effort traffic. However, its flexible architecture can easily 3.4 The XCP Receiver support differentiated services as explained in $6. The description of XCP in this section assumes a pure XCP network. In$8, we An XCP receiver is similar to a TCP receiver except that when show that xcp can coexist with tcp in the same internet and be acknowledging a packet, it copies the congestion header from the data packet to its acknowledgment 3.1 Framework 3.5 The XCP router: The Control Laws the network, then in$3.5 we explain feedback compualon ows in First we give an overview of how control information flc The job of an XCP router is to compute the feedback to cause the system to converge to optimal efficiency and min-max fairness Thus, XCP does not drop packets. It operates on top time rttI and communicate these to the routers via a congestion policy such as DropTail, RED, or AvQ. The objective of XCP is header in every packet. Routers monitor the input traffic rate to to prevent, as much as possible, the queue from building up to the each of their output queues. Based on the difference between the link bandwidth and its input traffic rate, the router tells the fows To compute the feedback, an XCP router uses an efficiency con sharing that link to increase or decrease their congestion wind troller and a fairness controller. Both of th ongestion header of data over the average rt t of the flows traversing the link which smooths Feedback is divided between flows based on their cwnd and rtt the burstiness of a window-based control protocol. Estimating pa- values so that the system converges to fairness. A more congested rameters over intervals longer than the average rtt leads to slug- router later in the path can further reduce the feedback in i contain gish se, while estimating parameters over shorter intervals gestion header by overwriting it. Ultimately, the packet wil leads to us estimates. The average RTT is computed using the feedback from the bottleneck along the path. When the feed. the information in the congestion header. XCP controllers make a single control decision every average In this document, the notation RTT refers to the physical round T(the control interval). This is motivated by the need to ob- trip time rtt refers to the variable maintained by the source's serve the results of previous control decisions before attempting a software, and H-rtt refers to a field in the congestion header new control. For example, if the router tells the sources to increase
Figure 1: Congestion header. pendent of the number of flows in it. However, any fair bandwidth allocation intrinsically depends on the number of flows traversing the bottleneck. Thus, the rule for dividing bandwidth among individual flowsin an aggregate should be independent from the control law that governs the dynamics of the aggregate. Traditionally, efficiency and fairness are coupled since the same control law (such as AIMD in TCP) is used to obtain both fairness and efficiency simultaneously [3, 9, 17, 18, 16]. Conceptually, however, efficiency and fairness are independent. Efficiency involves only the aggregate traffic’s behavior. When the input traffic rate equals the link capacity, no queue builds and utilization is optimal. Fairness, on the other hand, involves the relative throughput of flows sharing a link. A scheme is fair when the flows sharing a link have the same throughput irrespective of congestion. In our new paradigm, a router has both an efficiency controller (EC) and a fairness controller (FC). This separation simplifies the design and analysis of each controller by reducing the requirements imposed. It also permits modifying one of the controllers without redesigning or re-analyzing the other. Furthermore, it provides a flexible framework for integrating differential bandwidth allocations. For example, allocating bandwidth to senders according to their priorities or the price they pay requires changing only the fairness controller and does not affect the efficiency or the congestion characteristics. 3. PROTOCOL XCP provides a joint design of end-systems and routers. Like TCP, XCP is a window-based congestion control protocol intended for best effort traffic. However, its flexible architecture can easily support differentiated services as explained in ✄ 6. The description of XCP in this section assumes a pure XCP network. In ✄ 8, we show that XCP can coexist with TCP in the same Internet and be TCP-friendly. 3.1 Framework First we give an overview of how control information flows in the network, then in ✄ 3.5 we explain feedback computation. Senders maintain their congestion window cwnd and round trip time rtt1 and communicate these to the routers via a congestion header in every packet. Routers monitor the input traffic rate to each of their output queues. Based on the difference between the link bandwidth and its input traffic rate, the router tells the flows sharing that link to increase or decrease their congestion windows. It does this by annotating the congestion header of data packets. Feedback is divided between flows based on their cwnd and rtt values so that the system converges to fairness. A more congested router later in the path can further reduce the feedback in the congestion header by overwriting it. Ultimately, the packet will contain the feedback from the bottleneck along the path. When the feed- ✠ In this document, the notation RTT refers to the physical round trip time, rtt refers to the variable maintained by the source’s software, and ✡ ☛✞☞✌☞ refers to a field in the congestion header. back reaches the receiver, it is returned to the sender in an acknowledgment packet, and the sender updates its cwnd accordingly. 3.2 The Congestion Header Each XCP packet carries a congestion header (Figure 1), which is used to communicate a flow’s state to routers and feedback from the routers on to the receivers. The field ✡ ✍✏✎✒✑✔✓ is the sender’s current congestion window, whereas ✡ ☛✕☞✖☞ is the sender’s current RTT estimate. These are filled in by the sender and never modified in transit. The remaining field, ✡ ✗✙✘✞✘✚✓✜✛✟✢✣✍✥✤, takes positive or negative values and is initialized by the sender. Routers along the path modify this field to directly control the congestion windows of the sources. 3.3 The XCP Sender As with TCP, an XCP sender maintains a congestion window of the outstanding packets, cwnd, and an estimate of the round trip time rtt. On packet departure, the sender attaches a congestion header to the packet and sets the ✡ ✍✏✎✒✑✔✓ field to its current cwnd and ✡ ☛✕☞✖☞ to its current rtt. In the first packet of a flow, ✡ ☛✕☞✖☞ is set to zero to indicate to the routers that the source does not yet have a valid estimate of the RTT. The sender initializes the ✡ ✗✙✘✕✘✚✓✜✛✏✢✦✍✟✤ field to request its desired window increase. For example, when the application has a desired rate ☛, the sender sets ✡ ✗✙✘✞✘✚✓✜✛✟✢✣✍✥✤ to the desired increase in the congestion window (☛✣✧ rtt - cwnd) divided by the number of packets in the current congestion window. If bandwidth is available, this initialization allows the sender to reach the desired rate after one RTT. Whenever a new acknowledgment arrives, positive feedback increases the senders cwnd and negative feedback reduces it: ✍✏✎✒✑✔✓✩★✫✪✭✬✞✮✔✯✰✍✟✎✱✑✔✓✳✲✴✡ ✗✙✘✕✘✚✓✜✛✏✢✦✍✟✤✔✵✷✶✕✸✏✵ where ✶ is the packet size. In addition to direct feedback, XCP still needs to respond to losses although they are rare. It does this in a similar manner to TCP. 3.4 The XCP Receiver An XCP receiver is similar to a TCP receiver except that when acknowledging a packet, it copies the congestion header from the data packet to its acknowledgment. 3.5 The XCP Router: The Control Laws The job of an XCP router is to compute the feedback to cause the system to converge to optimal efficiency and min-max fairness. Thus, XCP does not drop packets. It operates on top of a dropping policy such as DropTail, RED, or AVQ. The objective of XCP is to prevent, as much as possible, the queue from building up to the point at which a packet has to be dropped. To compute the feedback, an XCP router uses an efficiency controller and a fairness controller. Both of these compute estimates over the average RTT of the flowstraversing the link, which smooths the burstiness of a window-based control protocol. Estimating parameters over intervals longer than the average RTT leads to sluggish response, while estimating parameters over shorter intervals leads to erroneous estimates. The average RTT is computed using the information in the congestion header. XCP controllers make a single control decision every average RTT (the control interval). This is motivated by the need to observe the results of previous control decisions before attempting a new control. For example, if the router tells the sources to increase
their congestion windows, it should wait to see how much spare deallocation of bandwidth such that the total traffic rate(and conse- bandwidth remains before telling them to increase quently the efficiency )does not change, yet the throughput of each The router maintains a per-link estimation-control timer that is individual fow changes gradually to approach the fows fair share set to the most recent estimate of the average RTt on that link The shuffled trafic is computed as follows Upon timeout the router updates its estimates and its control deci sions. In the remainder of this paper, we refer to the routers current h=max(0,·y estimate of the average RTTas d to emphasize this is the feedback where y is the input traffic in an average rTt and y is a constant set to 0. 1. This equation ensures that, every average RTT, at least 3.5.1 The Eficiency Controller(EC) 10% of the traffic is redistributed according to aImd. the choice of 10% is a tradeoff between the time to converge to fairness and The efficiency controller's purpose is to maximize link the disturbance the shuffling imposes on a system that is around tion while minimizing drop rate and persistent queues. It only at aggregate traffic and need not care about fairness Next, we compute the per-packet feedback that allow such as which flow a packet belongs to to enforce the above policies. Since the increase law is As XCP is window-based, the EC computes a desired increase or whereas the decrease is multiplicative it is convenient to decrease in the number of bytes that the aggregate traffic transmits the feedback assigned to packet i as the combination of a positive in a control interval (i. e an average RTt). This aggregate feedback feedback pi and a negative feedback ni o is computed each control interval H-feedback o=a- S-B. Q First, we compute the case when the aggregate fee a and B are constant parameters, whose values are set based on our stability analysis(3 4)to 0.4 and 0.226, respectively. The term d is all flows by the same amount. Thus, we want the he through p onstant ference between the input traffic rate and link capacity. (Note that (i.e, Throughput i o con stant). Since we are dealing with a can be negative. )Finally,Q is the persistent queue size (i.e, the window-based protocol, we want to compute the change in con- queue that does not drain in a round trip propagation delay ),as gestion window rather than the change in throughput. The change sed to a transient queue that results from the bursty nature of all window-based protocols. We compute Q by taking the minimum ut multiplied by its RTT. Hence, the change in the queue seen by an arriving packet during the last propagation delay, window of flow i should be proportional to the flow's s RTT,(i.e which we estimate by subtracting the local queuing delay from the Acwndi o rtti) average rtt. Equation I makes the feedback proportional to the spare band- window to per-packet feedback that will be reported in the conges- width because, when S>0, the link is underutilized and we want tion header. The total change in congestion window of a flow is the to send positive feedback, while when S< 0, the link is congested sum of the per-packet feedback it receives. Thus, we obtain the per- and we want to send negative feedback. However this alone is in- packet feedback by dividing the change in congestion window by ufficient because it would mean we give no feedback when the the expected number of packets from flow i that the router sees in Input traffic matches the capacity, and so the queue does not drain. drain the persistent queue we make the aggregate feedback pro- gestion window divided by its packet size(both in bytes ), u in bytes, the spare bandwidth S is multiplied by the average portional to the persistent queue too. Finally, since the feedback acket positive feedback is proportional to the square of the flows To achieve efficiency, we allocate the aggregate feedback to si RTT, and inversely proportional to its congestion window divided gle packets as H-feedback Since the eC deals only with the ag. gregate behavior, it does not care which packets get the feedback by its packet size, (i.e, Pi o cundi/s-). Thus, positive feedback pi d by how much each individual flow changes its congestion win- dow. All the EC requires is that the total traffic changes by over this control interval. How exactly we divide the feedback among the packets(and hence the flows)affects only fairness, and so is the ob of the fairness controller where Sp is a constant. The total in the aggregate traffic rate 3.5.2 The Fairness Controller(FC) res that we are computing the positive feed- The job of the fairness controller(FC)is to apportion the feed- back. This to the sum of the increase in the rates of all ack to individual packets to achieve fairness. The FC relies on the flows in the aggregate, which is the sum of the positive feedback a same principle TCP uses to converge to fairness, namely Additive- flow has received divided by its rTt, and so Increase Multiplicative-Decrease (AIMD). Thus, we want to com- pute the per-packet feedback according to the polic lo>0, allocate it so that the increase in throughput ofall flows is the sa where L is the number of packets seen by the router in an average lfo<0, allocate it so that the decrease in throughput of aflow is RTT(the sum is over packets). From this, sp can be derived as proportional to its current throughpu This ensures col us convergence to fairness as long as the s,、h+max gregate feedback is not zero. To prevent convergence stalling hen efficiency is around optimal (o a 0), we introduce the con- Similarly, we compute the per-packet negative feedback given cept of bandwidth shuffling This is the simultaneous allocation and when the aggregate feedback is negative (o 0). In this
their congestion windows, it should wait to see how much spare bandwidth ✹ remains before telling them to increase again. The router maintains a per-link estimation-control timer that is set to the most recent estimate of the average RTT on that link. Upon timeout the router updates its estimates and its control decisions. In the remainder of this paper, we refer to the router’s current estimate of the average RTT as ✓ to emphasize this is the feedback delay. 3.5.1 The Efficiency Controller (EC) The efficiency controller’s purpose is to maximize link utilization while minimizing drop rate and persistent queues. It looks only at aggregate traffic and need not care about fairness issues, such as which flow a packet belongs to. As XCP is window-based, the EC computes a desired increase or decrease in the number of bytes that the aggregate traffic transmits in a control interval (i.e., an average RTT). This aggregate feedback ✺ is computed each control interval: ✺ ★✼✻✽✧✚✓✾✧✕✿✽❀❂❁❃✧✚❄❅✵ (1) ✻ and ❁ are constant parameters, whose values are set based on our stability analysis ( ✄ 4) to ❆✣❇ ❈ and ❆✣❇ ❉❊❉❊❋, respectively. The term ✓ is the average RTT, and ✿ is the spare bandwidth defined as the difference between the input traffic rate and link capacity. (Note that ✿ can be negative.) Finally, ❄ is the persistent queue size (i.e., the queue that does not drain in a round trip propagation delay), as opposed to a transient queue that results from the bursty nature of all window-based protocols. We compute ❄ by taking the minimum queue seen by an arriving packet during the last propagation delay, which we estimate by subtracting the local queuing delay from the average RTT. Equation 1 makes the feedback proportional to the spare bandwidth because, when ✿❍●✼❆, the link is underutilized and we want to send positive feedback, while when ✿❏■❍❆, the link is congested and we want to send negative feedback. However this alone is insufficient because it would mean we give no feedback when the input traffic matches the capacity, and so the queue does not drain. To drain the persistent queue we make the aggregate feedback proportional to the persistent queue too. Finally, since the feedback is in bytes, the spare bandwidth ✿ is multiplied by the average RTT. To achieve efficiency, we allocate the aggregate feedback to single packets as ✡ ✗✙✘✞✘✚✓✜✛✟✢✣✍✥✤. Since the EC deals only with the aggregate behavior, it does not care which packets get the feedback and by how much each individual flow changes its congestion window. All the EC requires is that the total traffic changes by ✺ over this control interval. How exactly we divide the feedback among the packets (and hence the flows) affects only fairness, and so is the job of the fairness controller. 3.5.2 The Fairness Controller (FC) The job of the fairness controller (FC) is to apportion the feedback to individual packets to achieve fairness. The FC relies on the same principle TCP uses to converge to fairness, namely AdditiveIncrease Multiplicative-Decrease (AIMD). Thus, we want to compute the per-packet feedback according to the policy: If ✺✽❑ ❆, allocate it so that the increase in throughput of all flows is the same. If ✺ ■❍❆, allocate it so that the decrease in throughput of a flow is proportional to its current throughput. This ensures continuous convergence to fairness as long as the aggregate feedback ✺ is not zero. To prevent convergence stalling when efficiency is around optimal (✺❂▲ ❆), we introduce the concept of bandwidth shuffling. This is the simultaneous allocation and deallocation of bandwidth such that the total traffic rate (and consequently the efficiency) does not change, yet the throughput of each individual flow changes gradually to approach the flow’s fair share. The shuffled traffic is computed as follows: ▼ ★✼✪✭✬✕✮✔✯◆❆✣✵P❖✭✧✥◗❘❀❍❙ ✺ ❙ ✸✏✵ (2) where ◗ is the input traffic in an average RTT and ❖ is a constant set to 0.1. This equation ensures that, every average RTT, at least 10% of the traffic is redistributed according to AIMD. The choice of 10% is a tradeoff between the time to converge to fairness and the disturbance the shuffling imposes on a system that is around optimal efficiency. Next, we compute the per-packet feedback that allows the FC to enforce the above policies. Since the increase law is additive whereas the decrease is multiplicative, it is convenient to compute the feedback assigned to packet ❚ as the combination of a positive feedback ❯❲❱ and a negative feedback ✑❲❱ . ✡ ✗✙✘✕✘✚✓✜✛✏✢✦✍✟✤❱ ★❳❯❱ ❀❨✑❱ ❇ (3) First, we compute the case when the aggregate feedback is positive (✺❍❑ ❆). In this case, we want to increase the throughput of all flows by the same amount. Thus, we want the change in the throughput of any flow ❚ to be proportional to the same constant, (i.e., ❩❬☞▼ ☛✕❭❊❪❴❫▼❯✙❪❴☞✷❱✳❵❛✍✟❭✕✑❜✶✟☞❝✢✣✑❞☞ ). Since we are dealing with a window-based protocol, we want to compute the change in congestion window rather than the change in throughput. The change in the congestion window of flow ❚ is the change in its throughput multiplied by its RTT. Hence, the change in the congestion window of flow ❚ should be proportional to the flow’s RTT, (i.e., ❩❘✍✏✎✒✑✔✓✣❱❡❵✼☛✞☞✌☞❝❱ ). The next step is to translate this desired change of congestion window to per-packet feedback that will be reported in the congestion header. The total change in congestion window of a flow is the sum of the per-packet feedback it receives. Thus, we obtain the perpacket feedback by dividing the change in congestion window by the expected number of packets from flow ❚ that the router sees in a control interval ✓. This number is proportional to the flow’s congestion window divided by its packet size (both in bytes), ❢✌❣❞❤❊✐✏❥ ❦ ❥ , and inversely proportional to itsround trip time, ☛✕☞✖☞❧❱ . Thus, the perpacket positive feedback is proportional to the square of the flow’s RTT, and inversely proportional to its congestion window divided by its packet size, (i.e., ❯❞❱♠❵ ♥✷♦♣♦♣q❥ ❢❝❣❞❤❊✐ ❥✖r ❦ ❥ ). Thus, positive feedback ❯❲❱ is given by: ❯❞❱❜★ts✟✉ ☛✕☞✖☞✷✈❱ ✧✞✶ ❱ ✍✏✎✒✑✔✓❱ ✵ (4) where s✉ is a constant. The total increase in the aggregate traffic rate is ✇✞①③②⑤④❝⑥✞⑦⑨⑧✜⑩ ❶✷❷ ✐ , where ❸❹✢✦❺♠✯ ✺ ✵❝❆❻✸ ensures that we are computing the positive feedback. This is equal to the sum of the increase in the rates of all flows in the aggregate, which is the sum of the positive feedback a flow has received divided by its RTT, and so: ▼ ✲❏✪✭✬✕✮✔✯ ✺ ✵✷❆❻✸ ✓ ★ ❼❽ ❯❞❱ ☛✞☞✌☞❝❱ ✵ (5) where ❾ is the number of packets seen by the router in an average RTT (the sum is over packets). From this, s✏✉ can be derived as: s✟✉✾★ ▼ ✲❳✪✭✬✕✮✔✯ ✺ ✵✷❆❿✸ ✓➀✧✞➁ ♥➂♦➃♦ ❥✌➄ ❦ ❥ ❢❝❣❞❤✕✐ ❥ ❇ (6) Similarly, we compute the per-packet negative feedback given when the aggregate feedback is negative (✺ ■➅❆). In this
ase. we want the decrease in the ow i to be proportional to its current of regardless of their current rate. Therefore, it is the multiplicative- decrease that helps converging to fairness. In TCP, multiplicative- Athroughput; x throughput). Consequently, th decrease is tied to the occurrence of a drop, which should be a rare the flows congestion window is proportional to its current con- event. In contrast, with XCP multiplicative-decrease is decoupled gestion window (i.e, Acwndi o condi). Again, the desired per- from drops and is performed every average rTT acket feedback is the desired change in the congestion window XCP is fairly robust to estimation errors. For example, we esti- divided by the expected number of packets from this flow that the hate the value of sp every d and use it as a prediction of sp during router sees in an interval d. Thus, we finally find that the per-packet the following control interval (i. e the following d ) If we underes- negative feedback should be proportional to the packet size multi timate Sp, we will fail to allocate all of the positive feedback in the plied by its fows RTT(i.e, ni ortti'si). Thus negative feedbac current control interval. Nonetheless the bandwidth we fail to allo- ni is given by cate will appear in our next estimation of the input traffic as a spare bandwidth, which will be allocated (or partially allocated) in the rtt following control interval. Thus, in every control interval, a po where En is a constant. tion of the spare bandwidth is allocated until none is left. Since our As with the increase case, the total decrease in the aggregate underestimation of Sp causes reduced allocation, the convergence traffic rate is the sum of the decrease in the rates of all flows in the to efficiency is slower than if our prediction of sp had been correct. aggregate Yet the error does not stop XCP from reaching full utilization. Sim ilarly, if we overestimate sp then we will allocate more feedback to flows at the beginning of a control interval and run out of aggregate (8) feedback quickly. This uneven spread of feedback over As so, En can be derived as tion interval does not affect convergence to utilization down convergence to fairness. A similar argument about other estimation errors they mainly affect the d∑s time rather than the correctness of the controllet XCP's parameters( d B) are constant whose values are here the sum is over all packets in a control interval(average independent of the number of sources, the delay, and the capacity of the bottleneck. This is a significant improvement over previ- ous approaches where specific values for the parameters work only 3.5.3 Notes on the efficiencyand Fairness Controllers in specific environments(e.g, RED), or the parameters have to be a This section summarizes the important points about the design chosen differently depending on the number of sources, the capac- the efficiency controller and the fairness controller. ity, and the delay(e.g, AVQ). In $4, we show how these constant As mentioned earlier, the efficiency and fairness controllers are values are chosen decoupled. Specifically, the efficiency controller uses a Finally, implementing the efficiency and fairness Multiplicative-Increase Multiplicative-Decrease law(MIMD), which fairly simple and requires only a few lines of code increases the traffic rate proportionally to the spare bandwidth Appendix A. We note that an XCP router performs the system(instead of increasing by one packet/RTT/flow as TCP additions and 3 multiplications per packet, making it does). This allows XCP to quickl re the positive spare band- choice even as a backbone router width even over high capacity links. The fairness controller, on the other hand, uses an Additive-Increase Multiplicative-Decrease law 4. STABILITY ANALYSIS (AIMD), which converges to fairness [10]. Thus, the decoupling allows each controller to use a suitable control law We use a fluid model of the traffic to analyze the stability of The particular control laws used by the efficiency controller XCP. Our analysis considers a single link traversed by multiple (MIMD) and the fairness controller(AIMD) are not the only poss XCP flows. For the sake of simplicity and tractability, similarly ble choices. For example, in [20] we describe a fairness controller to previous work [22, 15, 23, 24], our analysis assumes all fiows have a common, finite, and positive round trip delay, and neglects that uses a binomial law similar to those described in[6]. We chose boundary conditions (i.e, queues are bounded, rates cannot be neg- the control laws above because our analysis and simulation demon- ative). Later, we demonstrate through extensive simulations that rmance even with larger topologies, different RTTs, and boundary condi- We note that the efficiency controller satisfies the requirements tions, our results still hold in $2. The dynamics of the aggregate traffic are specified by the The main result can be stated as follows aggregate feedback and stay independent of the number of flows traversing the link. Additionally, in contrast to TCP where the in- THEOREM 1. Suppose the round trip delay is d Ithe parame crease/decrease rules are indifferent to the degree of congestion in rs a and B satisfy the network, the aggregate feedback sent by the EC is proportional to the degree of under- or over-utilization. Furthermore since the aggregate feedback is given over an average rtt, XCP becomes less aggressive as the round trip delay increases then the system is stable independently of delay, capaciry, and num Although the fairness controller uses AIMD, it converges to fair- ber of sources. ness faster than TCP. Note that in AIMD, all fiows increase equally 3There is one type of error that may preven complet hich is the unbalanced a to fully grasp from Equation 1. We refer the reader all of the shuffied traffic but fa vent us from reachi mismatch Is inversely proportional to the square of delay on queue- lization. Yet note that the shuffled traffic is only 10% of the traffic. Furthermore, shufil ing exists only when |ol<.1,input
case, we want the decrease in the throughput of flow ❚ to be proportional to its current throughput (i.e., ❩throughput❱➆❵ throughput❱ ). Consequently, the desired change in the flow’s congestion window is proportional to its current congestion window (i.e., ❩cwnd❱➇❵ cwnd❱ ). Again, the desired perpacket feedback is the desired change in the congestion window divided by the expected number of packets from this flow that the router sees in an interval ✓. Thus, we finally find that the per-packet negative feedback should be proportional to the packet size multiplied by its flow’s RTT (i.e., ✑♠❱❡❵✼☛✞☞✌☞❝❱✚✧➃✶✚❱ ). Thus negative feedback ✑❱ is given by: ✑③❱♠★✼s❤ ✧✚☛✞☞✌☞P❱❞✧✕✶✚❱ (7) where s ❤ is a constant. As with the increase case, the total decrease in the aggregate traffic rate is the sum of the decrease in the rates of all flows in the aggregate: ▼ ✲❳✪✭✬✕✮✔✯❝❀ ✺ ✵✷❆❿✸ ✓ ★ ❼❽ ✑③❱ ☛✕☞✖☞ ❱ ❇ (8) As so, s ❤ can be derived as: s ❤ ★ ▼ ✲❏✪✭✬✕✮✔✯❝❀ ✺ ✵P❆❿✸ ✓✾✧✕➁➈✶✞❱ ✵ (9) where the sum is over all packets in a control interval (average RTT). 3.5.3 Notes on the Efficiency and FairnessControllers This section summarizes the important points about the design of the efficiency controller and the fairness controller. As mentioned earlier, the efficiency and fairness controllers are decoupled. Specifically, the efficiency controller uses a Multiplicative-Increase Multiplicative-Decrease law (MIMD), which increases the traffic rate proportionally to the spare bandwidth in the system (instead of increasing by one packet/RTT/flow as TCP does). This allows XCP to quickly acquire the positive spare bandwidth even over high capacity links. The fairness controller, on the other hand, uses an Additive-Increase Multiplicative-Decrease law (AIMD), which converges to fairness [10]. Thus, the decoupling allows each controller to use a suitable control law. The particular control laws used by the efficiency controller (MIMD) and the fairness controller (AIMD) are not the only possible choices. For example, in [20] we describe a fairness controller that uses a binomial law similar to those described in [6]. We chose the control laws above because our analysis and simulation demonstrate their good performance. We note that the efficiency controller satisfies the requirements in ✄ 2. The dynamics of the aggregate traffic are specified by the aggregate feedback and stay independent of the number of flows traversing the link. Additionally, in contrast to TCP where the increase/decrease rules are indifferent to the degree of congestion in the network, the aggregate feedback sent by the EC is proportional to the degree of under- or over-utilization. Furthermore, since the aggregate feedback is given over an average RTT, XCP becomes less aggressive as the round trip delay increases.2 Although the fairness controller uses AIMD, it converges to fairness faster than TCP. Note that in AIMD, all flows increase equally ✈ The relation between XCP’s dynamics and feedback delay is hard to fully grasp from Equation 1. We refer the reader to Equation 16, which shows that the change in throughput based on rate-mismatch is inversely proportional to delay, and the change based on queuemismatch is inversely proportional to the square of delay. regardless of their current rate. Therefore, it is the multiplicativedecrease that helps converging to fairness. In TCP, multiplicativedecrease is tied to the occurrence of a drop, which should be a rare event. In contrast, with XCP multiplicative-decrease is decoupled from drops and is performed every average RTT. XCP is fairly robust to estimation errors. For example, we estimate the value of s✉ every ✓ and use it as a prediction of s✉ during the following control interval (i.e., the following ✓). If we underestimate s✉ , we will fail to allocate all of the positive feedback in the current control interval. Nonetheless, the bandwidth we fail to allocate will appear in our next estimation of the input traffic as a spare bandwidth, which will be allocated (or partially allocated) in the following control interval. Thus, in every control interval, a portion of the spare bandwidth is allocated until none is left. Since our underestimation of s➂✉ causes reduced allocation, the convergence to efficiency is slower than if our prediction of s✉ had been correct. Yet the error does not stop XCP from reaching full utilization. Similarly, if we overestimate s✉ then we will allocate more feedback to flows at the beginning of a control interval and run out of aggregate feedback quickly. This uneven spread of feedback over the allocation interval does not affect convergence to utilization but it slows down convergence to fairness. A similar argument can be made about other estimation errors; they mainly affect the convergence time rather than the correctness of the controllers.3 XCP’s parameters (i.e., ✻ and ❁) are constant whose values are independent of the number of sources, the delay, and the capacity of the bottleneck. This is a significant improvement over previous approaches where specific values for the parameters work only in specific environments (e.g, RED), or the parameters have to be chosen differently depending on the number of sources, the capacity, and the delay (e.g., AVQ). In ✄ 4, we show how these constant values are chosen. Finally, implementing the efficiency and fairness controllers is fairly simple and requires only a few lines of code as shown in Appendix A. We note that an XCP router performs only a few additions and 3 multiplications per packet, making it an attractive choice even as a backbone router. 4. STABILITY ANALYSIS We use a fluid model of the traffic to analyze the stability of XCP. Our analysis considers a single link traversed by multiple XCP flows. For the sake of simplicity and tractability, similarly to previous work [22, 15, 23, 24], our analysis assumes all flows have a common, finite, and positive round trip delay, and neglects boundary conditions (i.e., queues are bounded, rates cannot be negative). Later, we demonstrate through extensive simulations that even with larger topologies, different RTTs, and boundary conditions, our results still hold. The main result can be stated as follows. THEOREM 1. Suppose the round trip delay is ✓. If the parameters ✻ and ❁ satisfy: ❆✩■➉✻❏■ ➊ ❈✣➋❉ and ❁➌★✫✻✈ ➋❉✣✵ then the system is stable independently of delay, capacity, and number of sources. ➍ There is one type of error that may prevent the convergence to complete efficiency, which is the unbalanced allocation and deallocation of the shuffled traffic. For example, if by the end of a control interval we deallocate all of the shuffled traffic but fail to allocate it, then the shuffling might prevent us from reaching full link utilization. Yet note that the shuffled traffic is only 10% of the input traffic. Furthermore, shuffling exists only when ❙ ✺ ❙✣■❍❆✣❇➎✆✟◗