[Figure 1.6 Typical router architecture: multiple line cards, a route controller, and a management controller interconnected through a switch fabric; each line card contains a transponder/transceiver, framer, network processor, traffic manager, CPU, and memory.]

from the RC to the forwarding engines in the interface cards. It is not necessary to download a new forwarding table for every route update. Route updates can be frequent, but routing protocols need time, on the order of minutes, to converge. The RC needs a dynamic routing table designed for fast updates and fast generation of forwarding tables. Forwarding tables, on the other hand, can be optimized for lookup speed and need not be dynamic.

Figure 1.6 shows a typical router architecture, where multiple line cards, an RC, and an MC are interconnected through a switch fabric. The communication between the RC/MC and the line cards can be either through the switch fabric or through a separate interconnection network, such as an Ethernet switch. The line cards are the entry and exit points of data to and from a router. They provide the interface from the physical and higher layers to the switch fabric. The tasks performed by line cards are becoming more complex as new applications develop and protocols evolve. Each line card supports at least one full-duplex fiber connection on the network side, and at least one ingress and one egress connection to the switch fabric backplane. Generally speaking, for high-bandwidth applications, such as OC-48 and above, the network connections support channelization for aggregation of lower-speed lines into a large pipe, and the switch fabric connections provide flow-control mechanisms for several thousand input and output queues to regulate the ingress and egress traffic to and from the switch fabric.
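To make the flow-control requirement just mentioned concrete, the following C sketch keeps one queue per egress port on the fabric side of a line card (a virtual output queue, or VOQ, discussed further under the traffic manager below) and allows a cell to enter the fabric only when a credit is available for its egress. The queue depth, credit scheme, and function names are hypothetical simplifications of what commercial fabric interfaces do in hardware for thousands of queues.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of per-egress queues on a line card's fabric interface.
 * Each ingress keeps one queue per egress port (a virtual output queue)
 * and may only send a cell into the fabric when it holds a credit for
 * that egress. The credit scheme is a deliberately simplified stand-in
 * for real backplane flow control. */
#define NUM_PORTS 4
#define VOQ_DEPTH 64

typedef struct {
    int cells[VOQ_DEPTH]; /* cell identifiers, oldest first */
    int head, count;
    int credits;          /* cells we are allowed to send to this egress */
} voq_t;

static voq_t voq[NUM_PORTS];

static bool voq_enqueue(int egress, int cell_id) {
    voq_t *q = &voq[egress];
    if (q->count == VOQ_DEPTH)
        return false;                 /* queue full: drop or backpressure upstream */
    q->cells[(q->head + q->count) % VOQ_DEPTH] = cell_id;
    q->count++;
    return true;
}

/* Called when the fabric grants bandwidth toward 'egress'. */
static void voq_add_credit(int egress) { voq[egress].credits++; }

/* Send at most one cell per call, only if a credit is available. */
static bool voq_dequeue(int egress, int *cell_id) {
    voq_t *q = &voq[egress];
    if (q->count == 0 || q->credits == 0)
        return false;
    *cell_id = q->cells[q->head];
    q->head = (q->head + 1) % VOQ_DEPTH;
    q->count--;
    q->credits--;
    return true;
}

int main(void) {
    for (int p = 0; p < NUM_PORTS; p++) voq[p].credits = 2; /* initial credits */
    voq_enqueue(1, 100);
    voq_enqueue(1, 101);
    voq_enqueue(1, 102);
    int cell;
    while (voq_dequeue(1, &cell))
        printf("sent cell %d toward egress 1\n", cell);  /* sends 100 and 101, then stalls */
    voq_add_credit(1);
    if (voq_dequeue(1, &cell))
        printf("after new credit: sent cell %d\n", cell); /* sends 102 */
    return 0;
}
```

Under this discipline, a congested egress simply stops issuing credits, and the corresponding VOQ absorbs the backlog without blocking traffic destined for other egress ports.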
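Returning to the routing-table versus forwarding-table split discussed at the beginning of this section, the sketch below shows a minimal, hypothetical forwarding table organized as a unibit binary trie with longest-prefix-match lookup. Production forwarding engines use compressed multibit tries, hardware pipelines, or TCAMs (Chapters 2 and 16); the point of the sketch is only that the structure is optimized for reads and is simply regenerated and swapped in when the RC downloads an update.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

/* Minimal binary (unibit) trie for IPv4 longest-prefix match.
 * Illustrative only: real forwarding tables use multibit tries,
 * compressed structures, or TCAMs for line-rate lookup. */
typedef struct node {
    struct node *child[2];
    int          next_hop;   /* -1 means "no prefix ends here" */
} node_t;

static node_t *new_node(void) {
    node_t *n = calloc(1, sizeof(*n));
    n->next_hop = -1;
    return n;
}

/* Insert prefix/len with its next hop (e.g., an egress line card index). */
static void fib_insert(node_t *root, uint32_t prefix, int len, int next_hop) {
    node_t *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = new_node();
        n = n->child[bit];
    }
    n->next_hop = next_hop;
}

/* Longest-prefix-match lookup: walk the trie, remember the last next hop seen. */
static int fib_lookup(const node_t *root, uint32_t addr) {
    int best = -1;
    const node_t *n = root;
    for (int i = 0; i < 32 && n; i++) {
        if (n->next_hop >= 0)
            best = n->next_hop;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    if (n && n->next_hop >= 0)
        best = n->next_hop;
    return best;
}

int main(void) {
    node_t *fib = new_node();
    fib_insert(fib, 0x0A000000, 8, 1);   /* 10.0.0.0/8  -> port 1 */
    fib_insert(fib, 0x0A010000, 16, 2);  /* 10.1.0.0/16 -> port 2 */
    printf("10.1.2.3 -> port %d\n", fib_lookup(fib, 0x0A010203)); /* port 2 */
    printf("10.9.0.1 -> port %d\n", fib_lookup(fib, 0x0A090001)); /* port 1 */
    return 0;
}
```

Because lookups never modify the trie, a line card can keep forwarding with the old table while a freshly generated one is installed, which is why the forwarding table itself need not be dynamic.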
A line card usually includes components such as a transponder, framer, network processor (NP), traffic manager (TM), and central processing unit (CPU).

Transponder/Transceiver. This component performs optical-to-electrical and electrical-to-optical signal conversions, and serial-to-parallel and parallel-to-serial conversions [6, 7].

Framer. A framer performs synchronization, frame overhead processing, and cell or packet delineation. On the transmit side, a SONET (synchronous optical network)/SDH (synchronous digital hierarchy) framer generates section, line, and path overhead. It performs framing pattern insertion (A1, A2) and scrambling. It generates section, line, and path bit-interleaved parity (B1/B2/B3) for far-end performance monitoring. On the receive side, it processes section, line, and path overhead. It performs frame delineation, descrambling, alarm detection, pointer interpretation, bit-interleaved parity monitoring (B1/B2/B3), and error count accumulation for performance monitoring [8]. An alternative to the SONET/SDH framer is an Ethernet framer.

Network Processor. The NP mainly performs table lookup, packet classification, and packet modification. Various algorithms to implement the first two functions are presented in Chapters 2 and 3, respectively. The NP can perform those two functions at the line rate using external memory, such as static random access memory (SRAM) or dynamic random access memory (DRAM), but it may also require external content addressable memory (CAM) or specialized co-processors to perform deep packet classification at higher layers. In Chapter 16, we present some commercially available NP and ternary content addressable memory (TCAM) chips.

Traffic Manager. To meet the requirements of each connection and service class, the TM performs various control functions on cell/packet streams, including traffic access control, buffer management, and cell/packet scheduling. Traffic access control consists of a collection of specification techniques and mechanisms that (1) specify the expected traffic characteristics and service requirements (e.g., peak rate, required delay bound, loss tolerance) of a data stream; (2) shape (i.e., delay) data streams (e.g., reducing their rates and/or burstiness); and (3) police data streams and take corrective actions (e.g., discard, delay, or mark packets) when traffic deviates from its specification (a small token-bucket policing sketch is given after this list). The usage parameter control (UPC) in ATM and differentiated services (DiffServ) in IP perform similar access control functions at the network edge. Buffer management performs cell/packet discarding, according to loss requirements and priority levels, when the buffer occupancy exceeds a certain threshold. Proposed schemes include early packet discard (EPD) [9], random early packet discard (REPD) [10], weighted REPD [11], and partial packet discard (PPD) [12]. Packet scheduling ensures that packets are transmitted to meet each connection's allocated bandwidth/delay requirements. Proposed schemes include deficit round-robin, weighted fair queuing (WFQ), and WFQ variants such as shaped virtual clock [13] and worst-case fair WFQ (WF2Q+) [14]. The last two algorithms achieve worst-case fairness properties. Details are discussed in Chapter 4. Many quality of service (QoS) control techniques, algorithms, and implementation architectures can be found in Ref. [15]. The TM may also manage many queues to resolve contention among the inputs of a switch fabric, for example, hundreds or thousands of virtual output queues (VOQs). Some representative TM chips on the market are introduced in Chapter 16, whose purpose is to match the theory in Chapter 4 with practice.

Central Processing Unit. The CPU performs control plane functions including connection set-up/tear-down, table updates, register/buffer management, and exception handling. The CPU is usually not in-line with the fast path on which maximum-bandwidth network traffic moves between the interfaces and the switch fabric.
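As a concrete illustration of the policing function in item (3) of the traffic manager description, the following C sketch implements a classic token-bucket check: tokens accrue at the contracted rate up to a bucket depth that bounds the permitted burst, and each arriving packet either finds enough tokens (in profile) or is flagged for discarding, delaying, or marking. The rate, bucket depth, and function names are illustrative and do not correspond to any particular TM chip.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal token-bucket policer: 'rate' tokens (bytes) per second accrue
 * continuously, capped at 'depth' bytes; a packet conforms if the bucket
 * currently holds at least its length in tokens. Parameters are illustrative. */
typedef struct {
    double rate;    /* bytes per second (committed rate)              */
    double depth;   /* bucket depth in bytes (bounds the burst size)  */
    double tokens;  /* current token count                            */
    double last;    /* time of last update, in seconds                */
} policer_t;

static void policer_init(policer_t *p, double rate, double depth) {
    p->rate = rate;
    p->depth = depth;
    p->tokens = depth;  /* start with a full bucket */
    p->last = 0.0;
}

/* Returns true if the packet conforms to the traffic contract; a real TM
 * would forward it, and discard, delay, or mark it otherwise. */
static bool policer_conforms(policer_t *p, double now, int pkt_bytes) {
    p->tokens += (now - p->last) * p->rate;  /* accrue tokens since last packet */
    if (p->tokens > p->depth)
        p->tokens = p->depth;
    p->last = now;
    if (p->tokens >= pkt_bytes) {
        p->tokens -= pkt_bytes;
        return true;
    }
    return false;   /* out of profile */
}

int main(void) {
    policer_t p;
    policer_init(&p, 125000.0, 3000.0);      /* 1 Mbit/s contract, 3 kB burst     */
    double t = 0.0;
    for (int i = 0; i < 6; i++, t += 0.004)  /* 1500-byte packets every 4 ms      */
        printf("t=%.3fs packet %d: %s\n", t, i,
               policer_conforms(&p, t, 1500) ? "conforms" : "out of profile");
    return 0;
}
```

Running the example with 1500-byte packets arriving at roughly three times the contracted rate shows the bucket absorbing an initial burst and then marking the excess traffic as out of profile.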
The architecture in Figure 1.6 can be realized in a multi-rack (also known as multi-chassis or multi-shelf) system, as shown in Figure 1.7. In this example, a half rack, equipped with a switch fabric, a duplicated RC, a duplicated MC, a duplicated system clock (CLK), and a duplicated fabric shelf controller (FSC), is connected to all other line card (LC) shelves, each of which has a duplicated line card shelf controller (LSC). Both the FSC and the
LSC provide local operation and maintenance for the switch fabric and line card shelves, respectively. They also provide the communication channels between the switch/line cards and the RC and MC. The duplicated cards are provided for reliability. The figure also shows how the system can grow by adding more LC shelves. Interconnections between the racks are sets of cables or fibers, carrying information for the data and control planes. The cabling is usually a combination of unshielded twisted pair (UTP) Category 5 Ethernet cables for the control path and fiber-optic arrays for the data path.

[Figure 1.7 Multi-rack router system: a switch fabric shelf housing the duplicated RC, MC, FSC, and CLK is connected by data and control paths to multiple line card (LC) shelves, each with its line cards and duplicated LSC.]

1.3 COMMERCIAL CORE ROUTER EXAMPLES

We now briefly discuss the two most popular core routers on the market: Juniper Networks' T640 TX-Matrix [16] and Cisco Systems' Carrier Routing System (CRS-1) [17].

1.3.1 T640 TX-Matrix

A T640 TX-Matrix is composed of up to four routing nodes and a TX Routing Matrix interconnecting the nodes. A TX Routing Matrix connects up to four T640 routing nodes via a three-stage Clos network switch fabric to form a unified router with a capacity of 2.56 terabits per second. The blueprint of a TX Routing Matrix is shown in Figure 1.8. The unified router is controlled by the Routing Engine of the matrix, which is responsible for running routing protocols and for maintaining overall system state. Routing engines in each routing
node manage their individual components in coordination with the routing engine of the matrix. The data and control planes of each routing node are interconnected via an array of optical and Ethernet cables: data planes are interconnected using VCSEL (vertical cavity surface emitting laser) optical lines, whereas control planes are interconnected using UTP Category 5 Ethernet cables.

[Figure 1.8 TX Routing Matrix with four T640 routing nodes: the PFEs (PICs and FPCs) of routing nodes 0 through 3 connect to the TX Matrix platform (routing engine and switch cards) over fiber-optic array cables (data) and UTP Category 5 Ethernet cables (control).]

As shown in Figure 1.9, each routing node has two fundamental architectural components, namely the control plane and the data plane. The T640 routing node's control plane is implemented by the JUNOS software that runs on the node's routing engine. JUNOS is a micro-kernel-based, modular software system that provides reliability, fault isolation, and high availability. It implements the routing protocols, generates the routing and forwarding tables, and supports the user interface to the router. The data plane, on the other hand, is responsible for processing packets in hardware before forwarding them across the switch fabric from the ingress interface to the appropriate egress interface. The T640 routing node's data plane is implemented in custom ASICs in a distributed architecture.

[Figure 1.9 T640 routing node architecture: the routing engine (control plane) connects to the data plane over a 100-Mbps link; packets enter at the ingress PFE (input processing), cross the switch fabric that interconnects the PFEs, and leave through the egress PFE (output processing).]
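The distributed data plane moves traffic across the switch fabric as fixed-size cells: packets are segmented at the ingress PFE and reassembled at the egress PFE, as described below. The following C sketch is a toy model of that segmentation and reassembly; the 64-byte cell payload, the header fields, and the assumption of in-order delivery are illustrative simplifications rather than the T640's actual internal cell format.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Toy segmentation-and-reassembly model: an ingress engine chops a packet
 * into fixed-size cells, each carrying a small internal header, and the
 * egress engine rebuilds the packet. Cell size and header layout are
 * purely illustrative. */
#define CELL_PAYLOAD 64

typedef struct {
    uint16_t pkt_id;     /* identifies the packet being carried    */
    uint8_t  seq;        /* cell sequence number within the packet */
    uint8_t  last;       /* 1 if this is the final cell            */
    uint8_t  len;        /* valid payload bytes in this cell       */
    uint8_t  data[CELL_PAYLOAD];
} cell_t;

/* Segment 'pkt' into cells; returns the number of cells produced. */
static int segment(const uint8_t *pkt, int pkt_len, uint16_t pkt_id,
                   cell_t *cells, int max_cells) {
    int n = 0;
    for (int off = 0; off < pkt_len && n < max_cells; off += CELL_PAYLOAD, n++) {
        int chunk = pkt_len - off < CELL_PAYLOAD ? pkt_len - off : CELL_PAYLOAD;
        cells[n].pkt_id = pkt_id;
        cells[n].seq    = (uint8_t)n;
        cells[n].last   = (off + chunk >= pkt_len);
        cells[n].len    = (uint8_t)chunk;
        memcpy(cells[n].data, pkt + off, chunk);
    }
    return n;
}

/* Reassemble cells (assumed delivered in order here) back into a packet. */
static int reassemble(const cell_t *cells, int n_cells, uint8_t *pkt_out) {
    int len = 0;
    for (int i = 0; i < n_cells; i++) {
        memcpy(pkt_out + len, cells[i].data, cells[i].len);
        len += cells[i].len;
        if (cells[i].last)
            break;
    }
    return len;
}

int main(void) {
    uint8_t packet[150], rebuilt[150];
    for (int i = 0; i < 150; i++) packet[i] = (uint8_t)i;

    cell_t cells[8];
    int n = segment(packet, sizeof(packet), 42, cells, 8);
    int len = reassemble(cells, n, rebuilt);
    printf("%d cells, %d bytes rebuilt, match=%d\n",
           n, len, memcmp(packet, rebuilt, sizeof(packet)) == 0);
    return 0;
}
```

A real reassembly engine must also tolerate cells of one packet arriving over different fabric planes, which is why the sketch carries an explicit sequence number even though it assumes in-order delivery.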
The T640 routing node has three major elements: packet forwarding engines (PFEs), the switch fabric, and one or two routing engines. The PFE performs Layer 2 and Layer 3 packet processing and forwarding table lookups. A PFE is made up of many ASIC components. For example, there are Media-Specific ASICs to handle Layer 2 functions associated with the specific physical interface cards (PICs), such as SONET, ATM, or Ethernet. L2/L3 Packet Processing ASICs strip off Layer 2 headers and segment packets into cells for internal processing, and reassemble cells into Layer 3 packets prior to transmission on the egress interface. In addition, there are ASICs for managing queuing functions (Queuing and Memory Interface ASIC), for forwarding cells across the switch fabric (Switch Interface ASICs), and for forwarding lookups (T-Series Internet Processor ASIC).

The switch fabric in a standalone T640 routing node provides data plane connectivity among all of the PFEs in the chassis. In a TX Routing Matrix, the switch fabric provides data plane connectivity among all of the PFEs in the matrix. The T640 routing node uses a Clos network, and the TX Routing Matrix uses a multistage Clos network. This switch fabric provides nonblocking connectivity, fair bandwidth allocation, and distributed control.

[Figure 1.10 T640 switch fabric planes: the Fabric ASIC distributes traffic cell by cell from the ingress PFE (fed by its network-side PICs) across planes 1 through 4 to the egress PFE, with plane 0 serving as a backup.]

In order to achieve high availability, each node has up to five switch fabric planes (see Fig. 1.10). At any given time, four of them are used in a round-robin fashion to distribute packets from the ingress interface to the egress interface; the fifth is used as a hot backup in case of failures. Access to switch fabric bandwidth is controlled by the following three-step request-grant mechanism. The request for each cell of a packet is transmitted in round-robin order from the source PFE to the destination PFE. The destination PFE transmits a grant to the source using the same switch plane on which the corresponding request was received. The source PFE then transmits the cell to the destination PFE on the same switch plane.

1.3.2 Carrier Routing System (CRS-1)

Cisco Systems' Carrier Routing System is shown in Figure 1.11. The CRS-1 also follows a multi-chassis design with line card shelves and fabric shelves. The design allows the system to combine as many as 72 line card shelves, interconnected using eight fabric shelves, to operate as a single router or as multiple logical routers. It can be configured to deliver anywhere between 1.2 and 92 terabits per second of capacity, and the router as a whole can accommodate 1152 40-Gbps interfaces. The routing engine is implemented using at least two route processors in a line card shelf. Each route processor is a dual PowerPC CPU complex configured for symmetric multiprocessing, with 4 GB of DRAM for system processes and routing tables and 2 GB of Flash memory for storing software images and system configuration. In addition, the system is equipped to include non-volatile random access