Fat-Tree Based Topology (Cont.)
Why Fat-Tree?
• Fat tree has identical bandwidth at any bisection
  – Each layer has the same aggregated bandwidth
• Can be built using cheap devices with uniform capacity
  – Each port supports the same speed as an end host
  – All devices can transmit at line speed if packets are distributed uniformly along available paths
• Great scalability: a k-port switch supports k³/4 servers
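The k³/4 figure follows from the standard k-ary fat-tree layout: k pods, each with k/2 edge and k/2 aggregation switches, (k/2)² core switches, and k/2 hosts per edge switch. The sketch below tallies those counts. The function name `fat_tree_size` and its dictionary keys are illustrative choices, not from the slides.

```python
def fat_tree_size(k):
    """Component counts for a k-ary fat tree built from identical k-port switches."""
    if k % 2:
        raise ValueError("k must be even")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches per pod
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k/2 hosts per edge switch -> k^3/4 total
    }
```

For example, commodity 48-port switches give `fat_tree_size(48)["hosts"]` = 27,648 servers, all built from one switch model.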
Cost of Maintaining Switches
[Figure: power (W/Gbps) and heat (BTU/hr per Gbps) compared across switch models, including the Catalyst 6509-E and BlackDiamond 10808]
Fat-tree Topology is Great, But …
• Is using a fat-tree topology to interconnect racks of servers sufficient in itself?
• What routing protocols should we run on these switches?
• Layer 2 switch algorithm: data plane flooding!
• Layer 3 IP routing:
  – Shortest-path IP routing will typically use only one path, despite the path diversity in the topology
  – If equal-cost multi-path (ECMP) routing is used at each switch independently and blindly, packet reordering may occur; further, load may not be well balanced
• Aside: control plane flooding!
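The reordering concern depends on how a switch picks among equal-cost paths. The sketch below contrasts two common choices. Per-packet spraying uses every path but can deliver one flow's packets out of order. Per-flow hashing of the 5-tuple keeps a flow on one path, but two large flows can still hash onto the same link. The function names and the CRC32 hash are assumptions for illustration; real switches use vendor-specific hash functions.

```python
import zlib

def ecmp_per_flow(flow, paths):
    """Hash the flow 5-tuple so every packet of a flow takes the same path (no reordering)."""
    key = "|".join(map(str, flow)).encode()
    return paths[zlib.crc32(key) % len(paths)]

def spray_per_packet(packet_seq, paths):
    """Round-robin each packet independently: balances load but can reorder a flow's packets."""
    return paths[packet_seq % len(paths)]
```

Per-flow hashing is oblivious to flow size, so it does not by itself guarantee balanced load. The optimizations on the following slides target that problem.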
Fat-Tree Modified
• Enforce a special (IP) addressing scheme in the DC: unused.PodNumber.switchnumber.Endhost
  – Allows hosts attached to the same switch to route only through that switch
  – Allows intra-pod traffic to stay within the pod
• Use two-level lookups to distribute traffic and maintain packet ordering
  – First level is a prefix lookup, used to route down the topology to servers
  – Second level is a suffix lookup, used to route up towards the core; it maintains packet ordering by using the same ports for the same server
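The two-level lookup can be sketched for one aggregation switch in pod 2. Terminating prefixes route traffic down to an edge switch. Anything else falls through to a suffix table keyed on the host-ID octet, which sends traffic up. The table contents, port numbers, and a first octet of 10 are assumptions for illustration, and `lookup` and `host_address` are hypothetical helpers.

```python
import ipaddress

# Hypothetical first-level table for an aggregation switch in pod 2 (ports are assumed).
PREFIX_TABLE = [
    (ipaddress.ip_network("10.2.0.0/24"), 0),  # route down to edge switch 0 of pod 2
    (ipaddress.ip_network("10.2.1.0/24"), 1),  # route down to edge switch 1 of pod 2
]
# Hypothetical second-level table: host-ID (last octet) -> uplink port toward the core.
SUFFIX_TABLE = {2: 2, 3: 3}

def host_address(pod, switch, host, first_octet=10):
    """Build an address in the unused.PodNumber.switchnumber.Endhost scheme."""
    return ipaddress.ip_address(f"{first_octet}.{pod}.{switch}.{host}")

def lookup(dst):
    addr = ipaddress.ip_address(dst)
    # First level: longest-prefix match on terminating prefixes (route down).
    matches = [(net.prefixlen, port) for net, port in PREFIX_TABLE if addr in net]
    if matches:
        return max(matches)[1]
    # Otherwise the default (0.0.0.0/0) entry defers to a suffix match on the
    # host ID (route up). An unknown host ID raises KeyError in this sketch.
    return SUFFIX_TABLE[int(addr) & 0xFF]
```

All traffic to a given destination host leaves through the same uplink, because the suffix depends only on the host ID. That is how this scheme spreads load across uplinks while keeping each host's packets in order.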
Diffusion Optimizations
• Flow classification
  – Eliminates local congestion
  – Assign traffic to ports on a per-flow basis instead of a per-host basis
• Flow scheduling
  – Eliminates global congestion
  – Prevent long-lived flows from sharing the same links
  – Assign long-lived flows to different links
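The two optimizations can be sketched as follows, with simplifying assumptions labeled. The classifier pins each new flow to the currently least-loaded uplink of one switch, so two flows from the same host can take different ports. The scheduler greedily gives each long-lived flow a candidate path whose links no other long-lived flow has reserved. `FlowClassifier`, `schedule_long_flows`, and the flow-count load metric are all hypothetical; a real design would also decide how to detect long-lived flows and when to reassign them.

```python
from collections import defaultdict

class FlowClassifier:
    """Per-flow (not per-host) uplink assignment at a single switch (hypothetical sketch)."""

    def __init__(self, uplinks):
        self.uplinks = list(uplinks)
        self.flow_port = {}           # flow 5-tuple -> assigned uplink
        self.load = defaultdict(int)  # uplink -> number of flows assigned (assumed load metric)

    def port_for(self, flow):
        if flow not in self.flow_port:
            # New flow: choose the least-loaded uplink; later packets of the flow reuse it.
            port = min(self.uplinks, key=lambda p: self.load[p])
            self.flow_port[flow] = port
            self.load[port] += 1
        return self.flow_port[flow]

def schedule_long_flows(long_flows, candidate_paths):
    """Greedily place each long-lived flow on a path that shares no link with another long flow."""
    reserved = set()
    placement = {}
    for flow in long_flows:
        for path in candidate_paths[flow]:
            if not reserved.intersection(path):
                placement[flow] = path
                reserved.update(path)
                break  # flows with no disjoint path are left unplaced
    return placement
```

Flows left unplaced by the scheduler would keep whatever path the local classifier gave them. Handling that case is outside this sketch.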