Book1099 — “c01” — 2007/2/16 — 18:26 — page 17 — #17

1.5 IP NETWORK MANAGEMENT 17

and confidentiality. Basic access control to network resources by using a login and password, generation of alarms for security violations and authorization failures, definition and enforcement of security policy, and other application-layer security measures such as firewalls all fall under the tasks of security management.

1.5.2 NMS Architecture

Within a network with heterogeneous NEs, the network management tools can be divided into three levels: the element management system (EMS), from network equipment vendors, which specializes in the management of that vendor's equipment; the NMS, aimed at managing networks with heterogeneous equipment; and operational support systems (OSS), developed for a network operator's specific operations, administration, and maintenance (OAM) needs. A high-level view of the architecture of a typical NMS is shown in Figure 1.13. In this architecture, the management data are collected and processed at three levels.

Figure 1.13 An NMS architecture.

EMS Level. Each NE has its own EMS, such as EMS1, EMS2, and EMS3 shown in Figure 1.13. These EMSs collect management data from each NE, process the data, and forward the results to the NMS that manages the overall network. In this way, the EMSs and the NMS form a distributed system architecture.

NMS Level. Functionally, an NMS is the same as an EMS, except that an NMS has to deal with many heterogeneous NEs. The NMS station gathers results from the EMS
stations, displays information, and takes control actions. For example, in handling a specific fault condition, an NMS aggregates the events from all the related NEs to identify the root cause and to reduce the number of events that are sent to the OSS for further processing. Note that the NMS is independent of specific NEs.

OSS Level. By combining the network topology information, the OSS further collects and processes the data for specific operational needs. Therefore, the OSS can have subsystems for PM, FM, AM, and SM.

A key feature of this architecture is that each of the three levels performs all of the network management functions by generating, collecting, processing, and logging the events, which solves the scalability issues in large-scale networks.

There are many NMS tools that are commercially available [28, 29]. Examples include Cisco's IOS for the management of LANs (local area networks) and WANs (wide area networks) built on Cisco switches and routers, and Nortel's Optivity NMS for the management of Nortel's ATM switches and routers. To manage networks with heterogeneous NEs, the available tools include HPOV Node Manager, Aprisma's SPECTRUM, and Sun's Solstice NMS. These tools support SNMP and can be accessed through a graphical user interface (GUI) and a command line interface (CLI). Some of them also provide automated assistance for CM and FM tasks.

1.5.3 Element Management System

As a generic solution for configuring network devices, monitoring status, and checking devices for errors, the Internet-standard framework for network management is used for the management tasks of an NE, just as for an IP network. Therefore, functionally, an EMS and an NMS have the same architecture, and the same five functions of network management are also used for element management. The architecture of a general EMS is shown in Figure 1.14.
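The core of this architecture is a management client (manager) exchanging 'get' and 'set' operations with a management agent that fronts the device's MIB objects. The following is a minimal self-contained sketch of that interaction, not a real SNMP implementation (a real EMS would encode these operations as SNMP PDUs over UDP, typically via an SNMP library); the object names are taken from the standard interfaces MIB, and the initial counter value is made up for illustration.

```python
# Toy model of the Figure 1.14 interaction: the manager issues 'set'
# (the 'Config' path) and 'get' (the 'View' path) requests; the agent
# reads or writes the corresponding MIB objects on the device.

class Agent:
    """Management agent holding the device's MIB objects."""
    def __init__(self, mib):
        self.mib = dict(mib)              # object name -> value

    def handle(self, request):
        op, oid = request[0], request[1]
        if op == "get":                   # 'View': read a MIB object
            return ("response", oid, self.mib[oid])
        if op == "set":                   # 'Config': write a MIB object
            self.mib[oid] = request[2]
            return ("response", oid, self.mib[oid])
        raise ValueError("unknown operation: " + op)

class Manager:
    """Management client on the EMS station."""
    def __init__(self, agent):
        self.agent = agent                # stands in for the SNMP transport

    def get(self, oid):
        return self.agent.handle(("get", oid))[2]

    def set(self, oid, value):
        return self.agent.handle(("set", oid, value))[2]

agent = Agent({"ifAdminStatus.1": "up", "ifInOctets.1": 41_927_331})
mgr = Manager(agent)
print(mgr.get("ifInOctets.1"))            # View: poll a traffic counter
mgr.set("ifAdminStatus.1", "down")        # Config: shut an interface
print(mgr.get("ifAdminStatus.1"))
```

In a real deployment, of course, the manager and agent run on different machines and the request/response tuples are replaced by SNMP protocol data units.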
On the device side, the device must be manageable; that is, it must have a management agent such as an SNMP agent (or server), the corresponding data structures, and a storage area for the data. On the EMS station side, the station must have a management client such as an SNMP manager (or client). Between the management station and the managed device, we also need a protocol for communication between the two parties, for example, SNMP.

The core function to manage a device is implemented by using an SNMP manager. Whenever a command is issued by a user through the user interface, the command is received by the SNMP manager after parsing. If it is a configuration command, the SNMP manager issues an SNMP request to the SNMP agent inside the device. On the device, the SNMP agent then goes to the management information bases (MIBs) to change the value of a specified MIB object. This is shown as 'Config' in Figure 1.14. Config can be done by a simple command such as 'set'.

Similarly, if the command issued by the user is to get the current status of the device, the SNMP manager issues an SNMP request to the SNMP agent inside the device. On the device, the SNMP agent then goes to the MIBs to get the value of a specified MIB object by a 'get' command, which is shown as 'View' in Figure 1.14. Then, the SNMP agent forwards the obtained MIB values back to the SNMP manager as a response. The response is finally sent to the user for display on the GUI or CLI console.

In some cases, the device may send out messages through its SNMP agent autonomously. One example is the trap or alarm, where the initiator of the event is not the user interface but
the device. Here, the most important communications are regulated by the SNMP protocol, including the operations and the protocol data unit (PDU) format.

Figure 1.14 EMS architecture.

Note that all the configuration data and performance statistics are usually saved in a separate database. For example, for disaster recovery purposes, the changes in the configuration of a device will also be saved in the database. The database saves both MIB information and log messages. The communications between the database and the management client are implemented by using a database client inside the management client and a database server inside the database. As shown in Figure 1.14, a popular choice is a JDBC (Java Database Connectivity) client and a JDBC server on the two sides. The commands and responses between the EMS and the device are parsed and converted into structured query language (SQL) commands to access the database and get the view back.

1.6 OUTLINE OF THE BOOK

Chapter 1 describes the present-day and future Internet architecture and the structure of Points of Presence, where core and edge routers are interconnected with Layer-2 switches. It shows a router architecture in which a large number of line cards are interconnected by a switch fabric. It also includes a router controller that updates the forwarding tables and handles network management. Two commercial, state-of-the-art routers are briefly described. The chapter also outlines the challenges of building a high-speed, high-performance router.
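Returning briefly to the EMS database path of Section 1.5.3: Figure 1.14 shows a JDBC client/server pair in front of an SQL database. As a language-neutral sketch of the same idea, the following uses Python's built-in sqlite3 module in place of a JDBC-fronted database; the schema, table names, and sample rows are illustrative, not taken from the book.

```python
import sqlite3

# Sketch of the EMS persistence path: every successful configuration
# change and every device log message is mirrored into an SQL database
# so that the configuration can be replayed for disaster recovery.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE config_change (
                  device TEXT, oid TEXT, value TEXT,
                  changed_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
db.execute("CREATE TABLE log_msg (device TEXT, severity TEXT, text TEXT)")

def record_config(device, oid, value):
    """Mirror a successful 'set' into the database."""
    db.execute("INSERT INTO config_change (device, oid, value) VALUES (?,?,?)",
               (device, oid, value))

def record_log(device, severity, text):
    """Store an autonomous device message (e.g., a trap) as a log row."""
    db.execute("INSERT INTO log_msg VALUES (?,?,?)", (device, severity, text))

record_config("router1", "ifAdminStatus.1", "down")
record_log("router1", "warning", "interface 1 administratively down")

# 'Get the view back': query the stored configuration of a device.
rows = db.execute("SELECT oid, value FROM config_change WHERE device=?",
                  ("router1",)).fetchall()
print(rows)   # [('ifAdminStatus.1', 'down')]
```

The separation matters operationally: the database outlives any single device, so after a hardware replacement the recorded config_change rows can be replayed onto the new unit.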
Chapter 2 describes various schemes to look up a route. For a 10-Gbps line, each lookup is required to complete within 40 ns. In present-day forwarding tables, there can be as many as 500,000 routes. As more hosts are added to the Internet, the forwarding table will grow by an order of magnitude in the near future, especially as IPv6 emerges. Many high-speed lookup algorithms and architectures have been proposed in the past several years; they can generally be divided into ternary content addressable memory (TCAM)-based and algorithmic approaches. The latter use novel data structures and efficient searching methods to look up a route in a memory. They usually require more space than the TCAM approach, but consume much less power than a TCAM.

Chapter 3 describes various schemes for packet classification. To meet various QoS requirements and security concerns, other fields of a packet header, beyond the IP destination address, are often examined. Various schemes have been proposed and are compared in terms of their classification speed, their capability of accommodating a large number of fields in the packet headers, and the number of filtering rules in the classification table. Because more fields in the packet header need to be examined, in prefix or range formats, achieving high-speed operation becomes more challenging. TCAM is a key component in packet classification, as it is in route lookup. Various algorithmic approaches using ordinary memory chips have been investigated to save power and cost.

Chapter 4 describes several traffic management schemes to achieve various QoS requirements. This chapter starts by explaining Integrated Services (IntServ) and Differentiated Services (DiffServ). Users need to comply with a contract not to send excessive traffic into the network. As a result, there is a need to police or shape users' traffic if they do not comply with the predetermined contract.
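The policing just mentioned is commonly implemented with a token bucket: tokens accrue at the contracted average rate up to a burst limit, and a packet conforms only if enough tokens are available. The sketch below is a generic illustration of that mechanism under assumed parameters, not the specific scheme developed in Chapter 4.

```python
class TokenBucket:
    """Police traffic to an average rate while permitting bounded bursts.

    Tokens (in bytes) accrue at `rate` bytes/s up to a depth of `burst`;
    a packet of `size` bytes conforms only if `size` tokens are available.
    """
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0   # start with a full bucket

    def conforms(self, size, now):
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size               # conforming: consume and forward
            return True
        return False                          # non-conforming: drop or mark

# Hypothetical contract: 1000 bytes/s average, 1500-byte burst.
tb = TokenBucket(rate=1000, burst=1500)
print(tb.conforms(1500, now=0.0))   # full burst allowed at start
print(tb.conforms(1500, now=0.1))   # only 100 tokens accrued: policed
print(tb.conforms(500, now=0.5))    # 100 + 400 accrued: conforms again
```

A policer drops or marks non-conforming packets immediately, while a shaper would instead queue them until enough tokens accumulate; both can be built around the same bucket state.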
Several schemes have been proposed to meet various QoS requirements. They are divided into two parts: packet scheduling, to meet various delay/bandwidth requirements, and buffer management, to provide different loss preferences.

Chapter 5 describes the basics of packet switching by presenting some fundamental switching concepts and switch fabric structures. Almost all packet switching fabrics can route packets autonomously, without the external configuration controller that circuit switches require. One of the important challenges in building large-scale, high-performance switches is to resolve packet contention, which arises when multiple packets head to the same output and only one of them can be transmitted at a time. Buffers are used to temporarily store the packets that lose the contention. The placement of the buffers, coupled with the contention resolution scheme, determines much of the switch's scalability, operation speed, and performance.

Chapter 6 describes the shared-memory switch, which has the best performance/cost ratio among switch architectures. Memory is shared by all inputs and outputs, and thus buffer utilization is the best. In addition, delay performance is also the best because there is no head-of-line blocking. On the other hand, the memory needs to operate at the aggregate bandwidth of all input and output ports. As the line rate or the port count increases, the switch size is limited by the memory speed constraint. Several architectures have been proposed to tackle this scalability issue by using multiple shared-memory switch modules in parallel; the proposed ideas differ in how they dispatch packets from the input ports to the switch modules.

Chapter 7 describes various packet scheduling schemes for input-buffered switches. The complexity of resolving packet contention can make the scheduler the system bottleneck as the switch size and the line speed increase.
The objective is to find a feasible scheduling scheme (e.g., with a time complexity of O(log N), where N is the switch size) that achieves close to 100 percent
throughput and low average delay. Several promising schemes have been proposed that achieve 100 percent throughput without speeding up the internal fabric operation. However, their time complexity is very high and prohibits them from being implemented in real applications. One practical way to maintain high throughput and low delay is to increase the internal switch fabric's operation speed, for example, to twice the line rate, to compensate for the deficiency of the contention resolution scheme. However, this requires output buffers and thus increases the implementation cost. Most packets are delivered to the output buffers and wait there. Thus, some kind of backpressure mechanism is needed to keep packets from jamming the output buffers and from being discarded when the buffers overflow.

Chapter 8 describes banyan-based switches. They have a regular structure that interconnects many switch modules, each of which can be 2 × 2 or larger. The multistage interconnection network was investigated intensively in the early 1970s as a way of interconnecting processors to make a powerful computer. Banyan-based switches received a lot of attention in the early 1980s when fast packet switching research began, one reason being that the switch can be scaled to a very large size by adding more stages. However, because of the shuffle type of interconnection, the wires can be very long and can occupy a large area, inducing considerable propagation delay between devices and making it difficult to synchronize the switch modules on the same stage. As a result, one can rarely find a commercial switch/router built with the banyan structure.

Chapter 9 describes Knockout switches. It has been proven that output-buffered or shared-memory switches demonstrate the best performance, where packets from all inputs need to be stored in an output buffer if they are all destined for the same output.
The memory speed constraint limits the switch size. However, what is the probability that all incoming packets are destined for the same output? If that probability is very low, why does the output buffer need to be able to receive all of them at the same time? A group of researchers at Bell Labs in the late 1980s tried to resolve this problem by limiting the number of packets that can arrive at an output port at the same time, thus relaxing the speed requirement of the memory at the output ports. Excess cells are discarded (or knocked out) by the switch fabric. Various switch architectures using the knockout principle are presented in this chapter.

Chapter 10 describes the Abacus switch, which was prototyped at Polytechnic University by the first author of this book. It takes advantage of the knockout principle by feeding back the packets that are knocked out in the first round so that they can retry; as a result, packets are not discarded by the switch fabric. The Abacus switch resolves contention by taking advantage of its cross-bar structure. It can also support the multicasting function, owing to the nature of the cross-bar structure and the arbitration scheme used in each switch element. The switch fabric has been implemented in ASICs.

Chapter 11 describes crosspoint buffered switches. There are several variants, depending on where the buffers are placed: only at the crosspoints; at both the inputs and the crosspoints; or at the inputs, the crosspoints, and the outputs. The crosspoint buffers improve performance over input-buffered switches, whose performance degradation is due to head-of-line blocking. In crosspoint buffered switches, packets are temporarily stored in the crosspoint buffers, allowing multiple packets to be sent out from the same input, which is not possible in an input-buffered switch. The trade-off is having to implement the memory within the switch fabric.
With today’s very large scale integration (VLSI) technology, the on-chip memory can be a few tens of megabits, which is sufficient to store a few tens of packets at each crosspoint. Another advantage of this switch architecture is that it allows packet scheduling from inputs to the crosspoint buffers and packet scheduling from the crosspoint buffers to the