Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing

Putting It All Together
[Figure: pipelined datapath. The PC, instruction cache, register file (R0 to R7), ALU, data cache, and multiplexers are separated by the pipeline registers IF/ID, ID/EX, EX/Mem, and Mem/WB. Signals labeled in the figure: instruction, PC+1, regA, regB, valA, valB, offset, op, dest, target, eq?, ALU result, mdata.]

Pipelining Idealism
• Uniform Sub-operations
  – Operation can be partitioned into uniform-latency sub-ops
• Repetition of Identical Operations
  – Same ops performed on many different inputs
• Repetition of Independent Operations
  – All repetitions of op are mutually independent
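
The three idealized assumptions can be turned into a small timing model. The sketch below is illustrative only: the function names and the unit stage latency are assumptions, not part of the slides. It assumes k perfectly uniform sub-operations, identical operations, and no dependencies, so the pipe fills in k cycles and then completes one operation per cycle.

```python
def unpipelined_time(n_ops: int, k_stages: int, stage_latency: float = 1.0) -> float:
    """Each operation runs through all k sub-ops before the next one starts."""
    if n_ops < 1 or k_stages < 1:
        raise ValueError("n_ops and k_stages must be at least 1")
    return n_ops * k_stages * stage_latency


def pipelined_time(n_ops: int, k_stages: int, stage_latency: float = 1.0) -> float:
    """k cycles to fill the pipe, then one operation completes every cycle."""
    if n_ops < 1 or k_stages < 1:
        raise ValueError("n_ops and k_stages must be at least 1")
    return (k_stages + n_ops - 1) * stage_latency


def ideal_speedup(n_ops: int, k_stages: int) -> float:
    """Speedup under the idealized assumptions; approaches k as n_ops grows."""
    return unpipelined_time(n_ops, k_stages) / pipelined_time(n_ops, k_stages)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, ideal_speedup(n, k_stages=5))
```

With a single operation there is no overlap and the speedup is 1; only a long stream of identical, independent operations approaches the stage count k.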

Pipeline Realism
• Uniform Sub-operations ... NOT!
  – Balance pipeline stages
    • Stage quantization to yield balanced stages
    • Minimize internal fragmentation (left-over time near end of cycle)
• Repetition of Identical Operations ... NOT!
  – Unifying instruction types
    • Coalescing instruction types into one "multi-function" pipe
    • Minimize external fragmentation (idle stages to match length)
• Repetition of Independent Operations ... NOT!
  – Resolve data and resource hazards
    • Inter-instruction dependency detection and resolution
Pipelining is expensive
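
To make "inter-instruction dependency detection" concrete, here is a minimal sketch of read-after-write (RAW) detection over a short instruction sequence. Everything in it is hypothetical: the `Instr` record, the sample `PROGRAM`, and the `window` parameter (how many following instructions can still observe a stale register) are assumptions for illustration, and real hardware does this comparison with register-number comparators rather than software loops.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None (e.g. a store)
    srcs: tuple              # registers read


def raw_hazards(program, window=2):
    """Return (producer, consumer, register) index triples for RAW hazards.

    A consumer is flagged when the nearest earlier writer of one of its
    source registers is at most `window` instructions ahead of it.
    """
    hazards = []
    for j, consumer in enumerate(program):
        for reg in dict.fromkeys(consumer.srcs):        # de-duplicate, keep order
            # Walk backward so only the most recent writer is reported.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if program[i].dest == reg:
                    hazards.append((i, j, reg))
                    break
    return hazards


# Hypothetical four-instruction sequence.
PROGRAM = [
    Instr("load",  "R1", ("R2",)),
    Instr("add",   "R3", ("R1", "R4")),
    Instr("sub",   "R5", ("R3", "R1")),
    Instr("store", None, ("R5", "R6")),
]

if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(PROGRAM):
        print(f"instr {consumer} reads {reg} written by instr {producer}")
```

Detection is only half of the job. Each flagged pair still has to be resolved, for example by stalling or forwarding, which is part of why the slide calls pipelining expensive.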

The Generic Instruction Pipeline
• IF: Instruction Fetch
• ID: Instruction Decode
• OF: Operand Fetch
• EX: Instruction Execute
• WB: Write-back
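
The flow of instructions through these five stages can be sketched as an occupancy table. This is a minimal illustration that assumes one instruction enters per cycle and nothing ever stalls; the function name `occupancy` is made up for the example.

```python
STAGES = ["IF", "ID", "OF", "EX", "WB"]


def occupancy(n_instr):
    """Row c lists the instruction index in each stage during cycle c.

    Instruction i enters IF in cycle i and advances one stage per cycle,
    so it is in stage s during cycle i + s. Empty stages hold None.
    """
    if n_instr < 1:
        return []
    k = len(STAGES)
    table = []
    for cycle in range(n_instr + k - 1):
        row = []
        for s in range(k):
            i = cycle - s
            row.append(i if 0 <= i < n_instr else None)
        table.append(row)
    return table


if __name__ == "__main__":
    print("cycle  " + "  ".join(f"{name:>4}" for name in STAGES))
    for cycle, row in enumerate(occupancy(3)):
        cells = "  ".join(f"{'-' if i is None else i:>4}" for i in row)
        print(f"{cycle:>5}  {cells}")
```

Three instructions need 3 + 5 - 1 = 7 cycles in total, and all five stages are busy at once only when at least five instructions are in flight.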

Balancing Pipeline Stages
• IF: T_IF = 6 units
• ID: T_ID = 2 units
• OF: T_OF = 9 units
• EX: T_EX = 5 units
• WB: T_OS = 9 units
Without pipelining: T_cyc ≈ T_IF + T_ID + T_OF + T_EX + T_OS = 31
Pipelined: T_cyc ≈ max{T_IF, T_ID, T_OF, T_EX, T_OS} = 9
Speedup = 31 / 9 ≈ 3.4
Can we do better?
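
The arithmetic on this slide can be checked with a few lines of code. The sketch below uses the stage latencies from the slide and ignores latch overhead (an assumption); the helper names are illustrative. It also reports the internal fragmentation per stage, meaning the idle time each stage leaves inside a 9-unit cycle.

```python
# Stage latencies from the slide, in arbitrary time units.
STAGE_LATENCY = {"IF": 6, "ID": 2, "OF": 9, "EX": 5, "WB": 9}


def unpipelined_cycle(latency):
    """One instruction passes through every stage back to back."""
    return sum(latency.values())


def pipelined_cycle(latency):
    """The clock must accommodate the slowest stage."""
    return max(latency.values())


def speedup(latency):
    return unpipelined_cycle(latency) / pipelined_cycle(latency)


def internal_fragmentation(latency):
    """Idle time per stage within one pipelined cycle."""
    cycle = pipelined_cycle(latency)
    return {stage: cycle - t for stage, t in latency.items()}


if __name__ == "__main__":
    print("unpipelined:", unpipelined_cycle(STAGE_LATENCY))
    print("pipelined cycle:", pipelined_cycle(STAGE_LATENCY))
    print("speedup:", speedup(STAGE_LATENCY))
    print("idle per stage:", internal_fragmentation(STAGE_LATENCY))
```

Of the 45 stage-time units spent per cycle across the five stages, 14 are idle, which is why the speedup falls well short of the stage count of 5. Perfectly balanced stages of 31 / 5 = 6.2 units each would reach that bound.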