Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing

Power Distribution
[Figure: power distribution hierarchy. Main supply (through a transformer) and generator feed the ATS; then switchboard (1000 kW), UPS, STS, PDU (200 kW), panel (50 kW), and the circuit to each rack (2.5 kW).]
• ATS (Automatic transfer switch)
  – Switches between mains and generator
• STS (Static transfer switch)
  – Switches between UPSs
• PDU (Power distribution unit)
  – Transforms down to 110 volts for racks
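The capacities in the diagram imply a fan-out at each level of the hierarchy. The following is a minimal sketch of that arithmetic, assuming each level is fully subscribed with no oversubscription or headroom (an assumption for illustration; the slide does not say how the levels are provisioned). The names `LEVELS`, `fan_out`, and `racks_per_switchboard` are hypothetical.

```python
# Capacity of each level in kW, taken from the slide's diagram.
LEVELS = [
    ("switchboard", 1000.0),
    ("PDU", 200.0),
    ("panel", 50.0),
    ("rack circuit", 2.5),
]


def fan_out(levels):
    """Return (parent, child, count) for each adjacent pair of levels.

    ASSUMPTION: a parent feeds as many children as its capacity allows,
    with no oversubscription and no reserved headroom.
    """
    result = []
    for (parent, parent_kw), (child, child_kw) in zip(levels, levels[1:]):
        result.append((parent, child, int(parent_kw // child_kw)))
    return result


def racks_per_switchboard(levels):
    """Multiply the per-level fan-outs to get racks under one switchboard."""
    total = 1
    for _, _, count in fan_out(levels):
        total *= count
    return total


if __name__ == "__main__":
    for parent, child, count in fan_out(LEVELS):
        print(f"one {parent} feeds up to {count} x {child}")
    print("racks per switchboard:", racks_per_switchboard(LEVELS))
```

Under these assumptions one 1000 kW switchboard feeds 5 PDUs, each PDU feeds 4 panels, and each panel feeds 20 racks, for 400 racks in total.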
UPS Systems
• Contain batteries or flywheels to bridge the time between the utility failure and the availability of generator power.
• Perform the AC-DC-AC conversion when utility power fails. Typical sizes range from 100 kW to 2 MW.
• Condition the incoming power feed, removing voltage spikes, sags, and harmonic distortion. They occupy a separate room because of their size.
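To give a feel for what "bridging the time" means in energy terms, here is a rough sketch that multiplies load by bridge time. The 60-second bridge time is purely an illustrative assumption (the slides give no figure), conversion losses are ignored, and `bridge_energy_kwh` is a hypothetical helper name.

```python
def bridge_energy_kwh(load_kw, bridge_seconds):
    """Energy (kWh) needed to carry load_kw for bridge_seconds, ignoring losses."""
    return load_kw * bridge_seconds / 3600.0


# ASSUMPTION: illustrative bridge time only; not a value from the slides.
ASSUMED_BRIDGE_SECONDS = 60.0

if __name__ == "__main__":
    # The slide's typical UPS size range: 100 kW to 2 MW.
    for load_kw in (100.0, 2000.0):
        energy = bridge_energy_kwh(load_kw, ASSUMED_BRIDGE_SECONDS)
        print(f"{load_kw:.0f} kW for {ASSUMED_BRIDGE_SECONDS:.0f} s: {energy:.2f} kWh")
```

The energy scales linearly with both load and bridge time, so the same function can be reused with whatever generator start-up time applies to a given site.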
Power Distribution Units (PDU)
• Resemble breaker panels in residential houses.
• Take a 200–480 V feed and break it up into many 110 V or 220 V circuits that feed the actual servers.
• Often provide additional redundancy by accepting two independent power sources (typically called "A side" and "B side"). They switch between the two with a very small delay, so that the loss of one source does not interrupt power to the servers. Paralleling arrangements such as N+1, N+2, and 2N are employed.
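The paralleling arrangements can be compared by counting units. In the sketch below, N is the number of units needed to carry the load with no spares; N+1 and N+2 add one or two spare units, and 2N duplicates the whole set. The function name `units_required` and the example load and unit sizes are hypothetical, chosen only for illustration.

```python
import math


def units_required(load_kw, unit_kw, scheme):
    """Number of units to install for a given load and redundancy scheme."""
    n = math.ceil(load_kw / unit_kw)  # units needed with no redundancy
    if scheme == "N":
        return n
    if scheme == "N+1":
        return n + 1
    if scheme == "N+2":
        return n + 2
    if scheme == "2N":
        return 2 * n
    raise ValueError(f"unknown redundancy scheme: {scheme!r}")


if __name__ == "__main__":
    # ASSUMPTION: a 900 kW load served by 200 kW units (illustrative values).
    for scheme in ("N", "N+1", "N+2", "2N"):
        print(scheme, units_required(900.0, 200.0, scheme))
```

For small N, 2N costs far more than N+1, which is why the choice between them is usually a trade-off between cost and the number of simultaneous failures that must be tolerated.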
Scenes from a Typical Datacenter
• Power cables above racks
The Joys of Real Hardware
Typical first year for a new cluster:
• ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
• ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
• ~1 rack move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
• ~1 network rewiring (rolling ~5% of machines down over 2-day span)
• ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
• ~5 racks go wonky (40-80 machines see 50% packet loss)
• ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
• ~12 router reloads (takes out DNS and external VIPs for a couple of minutes)
• ~3 router failures (have to immediately pull traffic for an hour)
• ~dozens of minor 30-second blips for DNS
• ~1000 individual machine failures
• ~thousands of hard drive failures
• Slow disks, bad memory, misconfigured machines, flaky machines, etc.
• Long-distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
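Some entries in this list give both a machine count and a duration, which allows a back-of-the-envelope estimate of machine-hours lost per year. The sketch below covers only those three entries and takes the midpoint of each quoted range, which is an assumption made here for illustration, not a figure from the slide. The `Event` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    count: float      # occurrences in the first year (from the slide)
    machines: tuple   # (low, high) machines affected per occurrence
    hours: tuple      # (low, high) hours until machines are back


def midpoint(value_range):
    """ASSUMPTION: use the midpoint of a quoted range as a point estimate."""
    low, high = value_range
    return (low + high) / 2


def machine_hours(event):
    """Estimated machine-hours lost per year to one event type."""
    return event.count * midpoint(event.machines) * midpoint(event.hours)


# Only the entries that state both a machine count and a duration.
EVENTS = [
    Event("PDU failure", 1, (500, 1000), (6, 6)),
    Event("rack move", 1, (500, 1000), (6, 6)),
    Event("rack failure", 20, (40, 80), (1, 6)),
]

if __name__ == "__main__":
    for event in EVENTS:
        print(f"{event.name}: {machine_hours(event):.0f} machine-hours")
    total = sum(machine_hours(e) for e in EVENTS)
    print(f"total: {total:.0f} machine-hours")
```

With midpoints, a single PDU failure costs about as many machine-hours as all twenty rack failures combined, which illustrates why failures higher in the power hierarchy matter disproportionately.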