Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing

Block Execution: Software View
• Automatic Scalability
[Figure: a kernel grid of eight blocks (Block 0 to Block 7) is distributed over a device with 2 SMs (SM 0, SM 1) and over a device with 4 SMs (SM 0 to SM 3); the same blocks run on either device.]
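The automatic scalability idea can be illustrated with a small host-side model. This is only a sketch: the function name `schedule_blocks` is hypothetical, and the round-robin assignment is an assumption made for illustration, since the actual order in which hardware hands blocks to SMs is not specified by the programming model.

```python
def schedule_blocks(num_blocks, num_sms):
    """Assign block IDs to SMs round-robin (assumed policy, for illustration).

    Returns one list per SM holding the block IDs that SM executes, in order.
    """
    per_sm = [[] for _ in range(num_sms)]
    for block_id in range(num_blocks):
        per_sm[block_id % num_sms].append(block_id)
    return per_sm


# The same 8-block grid mapped onto two different devices.
two_sms = schedule_blocks(8, 2)
four_sms = schedule_blocks(8, 4)

print(two_sms)   # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(four_sms)  # [[0, 4], [1, 5], [2, 6], [3, 7]]

# With more SMs, each SM processes fewer blocks, so the grid finishes sooner.
print(max(len(b) for b in two_sms), max(len(b) for b in four_sms))  # 4 2
```

Because blocks are independent, the grid itself does not change between the two devices; only the number of blocks each SM has to work through does.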
Block Execution: Hardware View
• Blocks are divided into 32-thread Warps
  – This is an implementation decision, not part of the CUDA programming model
  – Warps are the scheduling units in an SM
• With 3 blocks per SM and 256 threads per Block:
  – Each Block is divided into 256/32 = 8 Warps
  – Total: 8 * 3 = 24 Warps
  – At any given time, only 1 of the 24 Warps is selected for instruction fetch and execution
[Figure: Block 1 Warps and Block 2 Warps, each warp holding threads t0, t1, t2, … t31, next to a Streaming Multiprocessor with Instruction L1, Data L1, Instruction Fetch/Dispatch, Shared Memory, SPs and SFUs.]
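The warp arithmetic on this slide can be checked with a few lines of Python. The helper names are hypothetical, and the sketch assumes a one-dimensional block in which consecutive thread indices fall into the same warp (t0 to t31 in warp 0, and so on).

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide


def warps_per_block(threads_per_block, warp_size=WARP_SIZE):
    """Number of warps needed for one block (rounded up for partial warps)."""
    return -(-threads_per_block // warp_size)  # ceiling division


def resident_warps(blocks_per_sm, threads_per_block):
    """Total warps resident on one SM."""
    return blocks_per_sm * warps_per_block(threads_per_block)


def warp_of_thread(thread_idx, warp_size=WARP_SIZE):
    """Warp index of a thread inside its block (1-D block assumed)."""
    return thread_idx // warp_size


print(warps_per_block(256))    # 8
print(resident_warps(3, 256))  # 24
print(warp_of_thread(31), warp_of_thread(32))  # 0 1
```

A block whose size is not a multiple of 32 still occupies a whole number of warps; for example, `warps_per_block(100)` gives 4, with the last warp only partly filled.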
Warp Scheduling I
• Zero-overhead context switching
• Next warp
  – The one whose next instruction has its operands ready
• Eligible warps
  – Prioritized scheduling policy (no details available)
• Threads in a Warp execute the same instruction
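The scheduling rule above (each cycle, issue from a warp whose next instruction has its operands ready) can be sketched as a toy simulation. Everything here is an assumption for illustration: the function `run_scheduler`, the operation names, the latencies, and in particular the priority rule (lowest warp ID first), since the slide notes that the real policy is not documented.

```python
def run_scheduler(programs, latency):
    """Toy warp scheduler: issue one instruction per cycle from an eligible warp.

    programs: dict warp_id -> list of operation names
    latency:  dict operation -> cycles until that warp's next instruction
              has its operands ready
    Returns a trace of (cycle, warp_id, op); idle cycles are (cycle, None, None).
    """
    pc = {w: 0 for w in programs}        # next instruction index per warp
    ready_at = {w: 0 for w in programs}  # cycle at which the warp is eligible
    trace = []
    cycle = 0
    while any(pc[w] < len(programs[w]) for w in programs):
        eligible = [w for w in sorted(programs)
                    if pc[w] < len(programs[w]) and ready_at[w] <= cycle]
        if eligible:
            w = eligible[0]  # assumed priority: lowest warp ID
            op = programs[w][pc[w]]
            trace.append((cycle, w, op))
            pc[w] += 1
            ready_at[w] = cycle + latency[op]
        else:
            trace.append((cycle, None, None))  # no warp is ready: SM idles
        cycle += 1
    return trace


latency = {"load": 4, "add": 1}  # made-up latencies

# One warp alone: the SM idles while the load is outstanding.
alone = run_scheduler({0: ["load", "add"]}, latency)
print(sum(1 for _, w, _ in alone if w is None))  # 3

# Three warps: other warps fill the cycles in which warp 0 waits.
shared = run_scheduler(
    {0: ["load", "add"], 1: ["load", "add"], 2: ["add", "add"]}, latency)
print([w for _, w, _ in shared])  # [0, 1, 2, 2, 0, 1]
```

Switching warps costs nothing in this model, mirroring the zero-overhead context switch on the slide: with enough eligible warps, every cycle issues an instruction even though individual warps are stalled.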
Warp Scheduling II
[Figure: an SM multithreaded warp scheduler issuing instructions from different warps over time (warp 8 instruction 11, warp 1 instruction 42, warp 3 instruction 95, …, instruction 12, …, instruction 96), next to a pipeline diagram with stages I-Cache, Decode, Register Read, ALU, D-Cache and Writeback. Labels: "Threads available for scheduling", "Threads accessing memory hierarchy", "Miss?", "All Hit?".]