Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing

Outline
• GPGPU Architecture Overview
• Core Architecture
• Memory Hierarchy
• Interconnect
• CPU-GPU Interfacing
• Programming Paradigm
Inside Streaming Multiprocessor
• Streaming Multiprocessor (G80)
  – 8 Streaming Processors (SP)
  – 2 Special Function Units (SFU)
• Multi-threaded instruction dispatch
  – 1 to 512 threads active
  – Shared instruction fetch per 32 threads
  – Covers the latency of texture/memory loads
• 16 KB shared memory per SM (see the kernel sketch below)
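A minimal CUDA sketch, not taken from the slides, of how a kernel stages data through per-block shared memory. The TILE size, kernel name, and scaling operation are assumptions for illustration; the point is that each block's static __shared__ allocation (here 256 floats = 1 KB) must fit within the SM's 16 KB shared memory.

#include <cuda_runtime.h>

#define TILE 256   // hypothetical block/tile size chosen for this illustration

// Stage a tile of the input through shared memory, then scale it.
// Each block statically allocates TILE * 4 B = 1 KB of the SM's 16 KB.
__global__ void scale_with_shared(const float* in, float* out, int n, float s) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];
    __syncthreads();                        // wait for every warp of the block
    if (i < n) out[i] = tile[threadIdx.x] * s;
}

int main() {
    const int n = 1 << 20;
    float *d_in, *d_out;                    // left uninitialized; only the memory path matters here
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    scale_with_shared<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n, 2.0f);
    cudaDeviceSynchronize();
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}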
Register File
• 8192 registers in each SM in G80
  – Implementation decision, not part of the programming abstraction
  – Registers are dynamically partitioned across all blocks assigned to the SM (worked example below)
  – Once assigned to a block, a register is NOT accessible by threads in other blocks
  – Threads only access the registers assigned to them
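A back-of-the-envelope sketch of that dynamic partitioning. The 8192-register figure comes from the slide; the registers-per-thread and threads-per-block numbers are assumptions chosen only to make the arithmetic concrete.

#include <cstdio>

int main() {
    const int regs_per_sm       = 8192;  // G80 register file per SM (from the slide)
    const int regs_per_thread   = 10;    // assumed kernel register usage
    const int threads_per_block = 256;   // assumed block size

    int regs_per_block = regs_per_thread * threads_per_block;  // 2560 registers
    int blocks_by_regs = regs_per_sm / regs_per_block;          // 8192 / 2560 = 3 blocks

    printf("registers needed per block: %d\n", regs_per_block);
    printf("blocks per SM limited by registers: %d\n", blocks_by_regs);
    return 0;
}

With these assumed numbers the register file caps the SM at 3 resident blocks; a kernel that uses more registers per thread would lower that count further, independently of the block and thread limits discussed later.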
Thread Dispatch Policy
• Hierarchy: a grid of blocks of threads
• Blocks are serially distributed to SMs
  – Potentially >1 block per SM
• SM launches warps (32 threads each)
  – 2 levels of parallelism (see the sketch below)
• Round-robin, ready-to-execute scheduling policy
Figure source: NVIDIA CUDA Programming Guide 2.3
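A minimal CUDA sketch, my own illustration rather than an excerpt from the Programming Guide, of the two levels of parallelism: the host launches a grid of blocks, and the SM partitions each block into 32-thread warps. The kernel name whoami and the launch dimensions are assumptions.

#include <cstdio>
#include <cuda_runtime.h>

// Record, for each thread, which warp of its block it falls into.
__global__ void whoami(int* warp_of_thread) {
    int gid  = blockIdx.x * blockDim.x + threadIdx.x;  // global thread id
    int warp = threadIdx.x / 32;                       // warp index inside the block
    warp_of_thread[gid] = warp;
}

int main() {
    dim3 grid(4);      // level 1: 4 blocks, serially distributed to SMs
    dim3 block(128);   // level 2: 128 threads/block, issued as 4 warps of 32
    int n = grid.x * block.x;

    int* d;
    cudaMalloc(&d, n * sizeof(int));
    whoami<<<grid, block>>>(d);

    int h[512];
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("thread 0 -> warp %d, thread 32 -> warp %d, thread 127 -> warp %d\n",
           h[0], h[32], h[127]);   // prints 0, 1, 3
    cudaFree(d);
    return 0;
}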
Block Execution: Software View
• SM block execution (G80)
  – Assignment in block granularity
• Up to 8 blocks/SM, as resources allow
• An SM in G80 can hold up to 768 threads (arithmetic sketch below)
  – 256 threads/block * 3 blocks
  – Or 128 threads/block * 6 blocks, etc.
• Threads run concurrently
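A small host-side sketch of the slide's arithmetic, combining the 8-block and 768-thread limits quoted above; the helper name blocks_per_sm and the extra 64-thread case are my additions.

#include <cstdio>
#include <algorithm>

// G80 per-SM limits quoted on the slide: at most 8 resident blocks and 768 threads.
int blocks_per_sm(int threads_per_block) {
    const int max_blocks  = 8;
    const int max_threads = 768;
    return std::min(max_blocks, max_threads / threads_per_block);
}

int main() {
    printf("256 threads/block -> %d blocks/SM\n", blocks_per_sm(256)); // 3 (thread limit binds)
    printf("128 threads/block -> %d blocks/SM\n", blocks_per_sm(128)); // 6 (thread limit binds)
    printf(" 64 threads/block -> %d blocks/SM\n", blocks_per_sm(64));  // 8 (block-count limit binds)
    return 0;
}

This ignores the register-file and shared-memory budgets from the earlier slides, either of which can reduce the resident block count further.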