Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 11: Multi-{Socket, Core, Thread}
Getting More Performance
• Keep pushing IPC and/or frequency
  – Design complexity (time to market)
  – Cooling (cost)
  – Power delivery (cost)
  – …
• Possible, but too costly
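IPC and clock frequency $f$ combine multiplicatively. For a fixed program (fixed instruction count), single-thread performance is:

$$
\text{Performance} = \frac{1}{\text{Execution time}} = \frac{\text{IPC} \times f}{\text{Instruction count}}
$$

For example, IPC = 2 at $f$ = 3 GHz executes $2 \times 3 \times 10^9 = 6 \times 10^9$ instructions per second. Doubling either factor doubles throughput, but each route runs into the complexity, cooling, and power-delivery costs listed above.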
Bridging the Gap
[Figure: Watts / IPC (log scale: 1, 10, 100) for Single-Issue, Pipelined, Superscalar Out-of-Order (Today), Superscalar Out-of-Order (Hypothetical-Aggressive), and the Limits]
• Diminishing returns w.r.t. larger instruction window, higher issue-width
• Power has been growing exponentially as well
Higher Complexity not Worth Effort
[Figure: Performance vs. "Effort" for Scalar In-Order, Moderate-Pipe Superscalar/OOO, and Very-Deep-Pipe Aggressive Superscalar/OOO]
• Going Superscalar/OOO made sense: good ROI
• Going further (very deep pipeline, aggressive Superscalar/OOO): very little gain for substantial effort
User Visible/Invisible
• All performance gains up to this point were "free"
  – No user intervention required (beyond buying a new chip)
• Recompilation/rewriting could provide even more benefit
  – Higher frequency & higher IPC
  – Same ISA, different micro-architecture
• Multi-processing pushes parallelism above the ISA
  – Coarse-grained parallelism
• Provide multiple processing elements
  – User (or developer) responsible for finding parallelism
• User decides how to use resources
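A minimal sketch, assuming Python and its standard `concurrent.futures` module, of what "the developer is responsible for finding parallelism" means in practice: unlike a superscalar/OOO core, which extracts instruction-level parallelism without any code change, multiple processing elements only help if the program explicitly splits the work into coarse-grained chunks and combines the results. The names `partial_sum` and `parallel_sum` are hypothetical. In CPython, threads show the decomposition, but the global interpreter lock prevents a speedup on CPU-bound work; `ProcessPoolExecutor` offers the same `submit` interface and runs chunks on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(data, start, end):
    """Work for one processing element: sum its own contiguous slice."""
    return sum(data[start:end])


def parallel_sum(data, num_workers=4):
    """Coarse-grained parallel sum (hypothetical helper).

    The programmer, not the hardware, decides how to partition the work:
    one contiguous chunk per worker, run concurrently, then combined.
    """
    n = len(data)
    # Ceiling division; max(1, ...) keeps the range step valid for empty input.
    chunk = max(1, (n + num_workers - 1) // num_workers)
    bounds = [(i, min(i + chunk, n)) for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(partial_sum, data, s, e) for s, e in bounds]
        # Combine step: reduce the per-worker partial results.
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 1001))))  # prints 500500
```

The partition-then-combine structure is the coarse-grained pattern the slide describes: the user decides the number of workers and the chunk boundaries, i.e. how to use the available resources.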