Figure 1.11 Cisco CRS-1 carrier routing system.

memory (NVRAM) for configurations and logs and a 40 GB on-board hard drive for data collection. Data plane forwarding functions are implemented through Cisco's Silicon Packet Processor (SPP), an array of 188 programmable reduced instruction set computer (RISC) processors.

Cisco CRS-1 uses a three-stage, dynamically self-routed switching fabric based on a Benes topology. A high-level diagram of the switch fabric is shown in Figure 1.12. The first stage (S1) of the switch is connected to the ingress line cards. Stage-2 (S2) fabric cards receive cells from Stage-1 fabric cards and deliver them to the Stage-3 (S3) fabric cards associated with the appropriate egress line cards. Stage-2 fabric cards also support speedup and multicast replication. The system has eight such switch fabric planes operating in parallel, across which cells are transferred evenly. This fabric configuration provides highly scalable, available, and survivable interconnections between the ingress and egress slots.
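To make the self-routing behavior concrete, here is a minimal Python sketch of a three-stage fabric of this kind. It is our illustration, not Cisco's algorithm: the element counts are arbitrary, and we assume S1 sprays cells uniformly at random across the S2 stage while S2 self-routes each cell to the S3 element serving its egress line card.

```python
import random
from collections import Counter

def route_cell(egress_port, num_s2, ports_per_s3):
    """Pick the (S2 element, S3 element) path for one cell."""
    s2 = random.randrange(num_s2)       # S1: spread load evenly over S2 stage
    s3 = egress_port // ports_per_s3    # S2: self-route toward the egress S3
    return s2, s3

# Send 80,000 cells to random egress ports and check the S2 load balance.
NUM_S2, PORTS_PER_S3, NUM_PORTS = 8, 4, 32
s2_load = Counter()
for _ in range(80_000):
    s2, _ = route_cell(random.randrange(NUM_PORTS), NUM_S2, PORTS_PER_S3)
    s2_load[s2] += 1

print(sorted(s2_load.values()))  # roughly 10,000 cells on each S2 element
```

The even spraying at the first stage is what keeps any single middle-stage element from becoming a hot spot, mirroring the even cell transfer across the parallel fabric planes described above.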
The whole system is driven by Cisco Internet Operating System (IOS) XR. Cisco IOS XR is built on a micro-kernel-based, memory-protected architecture that makes it modular. This modularity provides better scalability, reliability, and fault isolation. Furthermore, the system implements checkpointing and stateful hot-standby to ensure that critical processes can be restarted with minimal effect on system operations or the routing topology.
Figure 1.12 High-level diagram of Cisco CRS-1 multi-stage switch fabric.

1.4 DESIGN OF CORE ROUTERS

Core routers are designed to move traffic as quickly as possible. With the introduction of diverse services at the edges and rapidly increasing bandwidth requirements, core routers now have to be designed to be more flexible and scalable than in the past. To this end, the design goals of core routers generally fall into the following categories:

Packet Forwarding Performance. Core routers need to provide packet forwarding performance in the range of hundreds of millions of packets per second. This is required to support existing services at the edges, to grow these services in the future, and to facilitate the delivery of new revenue-generating services.

Scalability. As the traffic rate at the edges grows rapidly, service providers are forced to upgrade their equipment every three to five years. The latest core routers are designed to scale well, so that subsequent upgrades are cheaper for providers. To this end, they are designed as a routing matrix to which future bandwidth can be added while keeping the current infrastructure in place. In addition, uniform software images and user interfaces across upgrades ensure that users do not need to be retrained to operate the new router.

Bandwidth Density. Another issue with core routers is the amount of real estate and power required to operate them. The latest core routers increase bandwidth density by providing higher bandwidth in smaller form factors. For example, core routers that provide 32 × OC-192 or 128 × OC-48 interfaces in a half-rack space are currently available on the market. Such routers consume less power and require less real estate.

Service Delivery Features. In order to provide end-to-end service guarantees, core routers are also required to provide various services such as aggregate DiffServ classes, packet filtering, policing, rate-limiting, and traffic monitoring at high speeds.
These services must be provided by core routers without impacting packet forwarding performance.

Availability. As core routers form a critical part of the network, any failure of a core router can impact networks dramatically. Therefore, core routers require high availability during high-traffic conditions and during maintenance. Availability on most core routers is achieved via redundant, hot-swappable hardware components and modular software design. The latest core routers allow hardware to be swapped out and permit software upgrades while the system is on-line.

Security. As the backbone of the network infrastructure, core routers are required to provide some security-related functions as well. Besides a secure design and implementation of their own components against denial-of-service attacks and other vulnerabilities, the routers also provide rate-limiting, filtering, tracing, and logging to support security services at the edges of networks.

It is very challenging to design a cost-effective large IP router with a capacity of a few hundred terabit/s to a few petabit/s. Obviously, the complexity and cost of building a large-capacity router are much higher than those of building an OXC. This is because packet switching requires processing packets (classification, table lookup, and packet header modification), storing them, scheduling them, and performing buffer management. As the line rate increases, the processing and scheduling time available for each packet shrinks proportionally. Also, as the router capacity increases, the time interval for resolving output contention becomes more constrained. Memory and interconnection technologies are the most demanding aspects of designing a large-capacity packet switch: the former very often becomes a bottleneck, while the latter significantly affects a system's power consumption and cost. As a result, designing a cost-effective, large-capacity switch architecture remains a challenge. Several design issues are discussed below.

Memory Speed. Although optical and electronic devices currently operate at 10 Gbps (OC-192), the technology for, and demand from, optical channels operating at 40 Gbps (OC-768) is emerging. The port speed to a switch fabric is usually twice the line speed. This overcomes the performance degradation that would otherwise arise from output port contention and from the overhead used to carry routing, flow control, and QoS information in the packet/cell header. As a result, the aggregate I/O bandwidth of the memory at the switch port can be 120 Gbps. For 40-byte packets, the cycle time of the buffer memory at each port must then be less than 2.67 ns. This is still very challenging with current memory technology, especially when the required memory size is too large to be integrated into the ASIC (application-specific integrated circuit), such as for the traffic manager or other switch interface chips. In addition, the pin count for the buffer memory can run to several hundred, limiting the number of external memories that can be attached to the ASIC.
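The arithmetic behind these figures is simple enough to script. The sketch below is a minimal illustration under the stated assumptions (the function name and parameters are ours): the buffer is written at the line rate and read at the fabric port rate (line rate × speedup), so its aggregate I/O bandwidth is line rate × (1 + speedup).

```python
def buffer_cycle_time_ns(line_rate_gbps, speedup, packet_bytes):
    """Worst-case memory cycle time to keep up with minimum-size packets."""
    # Written at the line rate, read at line_rate * speedup:
    agg_bw_gbps = line_rate_gbps * (1 + speedup)   # 40 + 80 = 120 Gbps
    packet_bits = packet_bytes * 8                 # 40 bytes -> 320 bits
    return packet_bits / agg_bw_gbps               # 1 Gbps = 1 bit/ns, so ns

print(f"{buffer_cycle_time_ns(40, 2, 40):.2f} ns")  # prints 2.67 ns
```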
Packet Arbitration. An arbitrator is used to resolve output port contention among the input ports. For a 40-Gbps switch port with 40-byte packets and a speedup of two, the arbitrator has only about 4 ns to resolve the contention, and as the number of input ports increases, the time available shrinks further. Arbitration can be implemented in a centralized way, where the interconnection between the arbitrator and all input line (or port) cards can be prohibitively complex and expensive. On the other hand, it can be implemented in a distributed way, where the line cards and switch cards participate in the arbitration. The distributed implementation may degrade throughput and delay performance because complete state information for all inputs and outputs is not available; as a result, a higher speedup is required in the switch fabric to compensate.
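A centralized arbiter can be pictured as a single request-grant step with a per-output round-robin pointer. The sketch below is our illustration, loosely in the spirit of round-robin matching schemes such as iSLIP but reduced to one iteration with one request per input; it is not any particular product's algorithm.

```python
NUM_INPUTS = 4

def arbitrate(requests, pointers):
    """requests: {input: requested output}; pointers: per-output RR state."""
    by_output = {}
    for inp, out in requests.items():
        by_output.setdefault(out, []).append(inp)
    grants = {}
    for out, inputs in by_output.items():
        start = pointers.get(out, 0)
        # Grant the first requesting input at or after the round-robin pointer.
        winner = min(inputs, key=lambda i: (i - start) % NUM_INPUTS)
        grants[winner] = out
        pointers[out] = (winner + 1) % NUM_INPUTS  # advance past the winner
    return grants

ptrs = {}
print(arbitrate({0: 2, 1: 2, 3: 0}, ptrs))  # {0: 2, 3: 0}: input 1 loses port 2
print(arbitrate({0: 2, 1: 2}, ptrs))        # {1: 2}: the pointer now favors 1
```

In hardware, this entire step has to finish within the roughly 4-ns budget, which is why practical arbiters are typically built from parallel priority-encoder logic rather than sequential loops.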
QoS Control. Similar to the packet arbitration problem above, as the line (port) speed increases, executing policing/shaping at the input ports, and packet scheduling and buffer management (packet-discarding policies) at the output ports, to meet the QoS requirements of each flow or each class becomes very difficult and challenging. The buffer at each line card is usually required to hold up to 100 ms worth of packets. For a 40-Gbps line, the buffer can be as large as 500 Mbytes, which can store hundreds of thousands of packets. Choosing a packet to depart or to discard within 4 to 8 ns is not trivial. In addition, the amount of state that must be maintained for per-flow control can be prohibitively expensive. An alternative is class-based scheduling and buffer management, which is more sensible in the core network because the number of flows and the link speed are too high. Several shaping and scheduling schemes require time-stamping arriving packets and scheduling their departures based on the time stamp values; choosing the packet with the smallest time stamp within 4 to 8 ns can become a bottleneck, as the sketch below illustrates.
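The smallest-time-stamp operation is a classic priority-queue extraction. The short sketch below (our illustration) shows it with a binary heap, where each insertion and removal costs O(log n); that cost is trivial in software but is precisely what is hard to realize within a 4 to 8 ns budget over hundreds of thousands of queued packets.

```python
import heapq

# Packets carry a virtual finishing time stamp; the scheduler always
# transmits the packet with the smallest stamp.
queue = []  # min-heap ordered by time stamp
for stamp, pkt in [(7.2, "flow-B"), (3.1, "flow-A"), (5.4, "flow-C")]:
    heapq.heappush(queue, (stamp, pkt))

while queue:
    stamp, pkt = heapq.heappop(queue)  # smallest time stamp departs first
    print(f"depart {pkt} at virtual time {stamp}")
```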
Optical Interconnection. A large-capacity router usually needs multiple racks to house all the line cards, port cards (optional), switch fabric cards, and controller cards, such as the route controller, management controller, and clock distribution cards. Each rack may accommodate 0.5 to 1 terabit/s of capacity, depending on the density of the line and switch fabric cards, and may need to communicate with another rack (e.g., the switch fabric rack) with a bandwidth of 0.5 to 1.0 terabit/s in each direction. With current VCSEL (vertical-cavity surface-emitting laser) technology, an optical transceiver can transmit up to 300 meters with 12 SERDES (serializer/deserializer) channels, each running at 2.5 or 3.125 Gbps [18]. Such transceivers have been widely used for backplane interconnections. However, the size and power consumption of these optical devices can limit the number of interconnections on each circuit board, resulting in more circuit boards and thus higher implementation costs. Furthermore, a large number of optical fibers is required to interconnect multiple racks. This increases installation costs and makes fiber reconfiguration and maintenance difficult. The fiber layout needs to be carefully designed to reduce potential interruptions caused by human error, and installing new fibers to scale the router's capacity can be mistake-prone and disruptive to existing services.

Power Consumption. SERDES technology allows more than a hundred bi-directional channels, each operating at 2.5 or 3.125 Gbps, on a single CMOS (complementary metal-oxide-semiconductor) chip [19, 20], whose power dissipation can be as high as 20 W. With VCSEL technology, each bi-directional connection can consume 250 mW. If we assume that 1 terabit/s of bandwidth is required for interconnection to other racks, 400 optical bi-directional channels (each 2.5 Gbps) are needed, for a total of 100 W per rack for optical interconnections. Because of heat dissipation limits, each rack may dissipate at most several thousand watts, which in turn limits the number of components that can be put on each card and the number of cards in each rack. The large power dissipation also increases the cost of air-conditioning the room. Power consumption cannot be overlooked from the global viewpoint of the Internet [21].
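The per-rack power figure is straightforward arithmetic, reproduced in the sketch below (a minimal illustration; the function name and its parameters are ours).

```python
def interconnect_power_w(bandwidth_gbps, channel_gbps, watts_per_channel):
    """Rack-to-rack optical interconnect power."""
    channels = bandwidth_gbps / channel_gbps   # 1000 / 2.5 = 400 channels
    return channels * watts_per_channel        # 400 * 0.25 W = 100 W

print(interconnect_power_w(1000, 2.5, 0.25), "W per rack")  # 100.0 W per rack
```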
Flexibility. As we move core routers closer to the edge of networks, they now have to support the diverse protocols and services available at the edge. Therefore, router design must be modular and should evolve with future requirements. This means we cannot rely too heavily on fast, fixed-function ASIC operations; instead, a balance needs to be struck between performance and flexibility by way of programmable ASICs.

1.5 IP NETWORK MANAGEMENT

Once many switches and routers are interconnected on the Internet, how are they managed by the network operators? In this section, we briefly introduce the functionalities, architecture, and major components of the management systems for IP networks.

1.5.1 Network Management System Functionalities

In terms of the network management model defined by the International Organization for Standardization (ISO), a network management system (NMS) has five management functionalities [22–24]: performance management (PM), fault management (FM), configuration management (CM), accounting management (AM), and security management (SM).

PM. The task of PM is to monitor, measure, report, and control the performance of the network. This can be done by monitoring, measuring, reporting, and controlling the performance of individual network elements (NEs) at regular intervals, or by analyzing logged performance data on each NE. Common performance metrics are network throughput, link utilization, and the packet counts into and out of an NE.

FM. The goal of FM is to collect, detect, and respond to fault conditions in the network, which are reported as trap events or alarm messages. These messages may be generated by a managed object or its agent built into a network device, such as Simple Network Management Protocol (SNMP) traps [25] or Common Management Information Protocol (CMIP) event notifications [26, 27], or by a network management system (NMS) using synthetic traps or probing events generated by, for instance, Hewlett-Packard's OpenView (HPOV) stations. Fault management systems handle network failures, including hardware failures (such as a link going down), software failures, and protocol errors, by generating, collecting, processing, identifying, and reporting trap and alarm messages.

CM. The tasks of CM include configuring the switch and I/O modules in a router, the data and management ports in a module, and the protocols for a specific device. CM deals with the configuration of the NEs in a network to form a network and to carry customers' data traffic.

AM. The task of AM is to control and allocate user access to network resources, and to log usage information for accounting purposes. Based on the pricing model, logged information, such as call detail records (CDRs), is used to bill customers. The pricing model can be usage-based or flat-rate.

SM. SM deals with the protection of network resources and customers' data traffic, including authorization and authentication of network resources and customers, data integrity