
Modern Computer Architecture: Course Slides (English Lecture Notes), Lecture 15: GPGPU Architecture and Programming Paradigm


Document contents (about 57 pages)

Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing

Examples: G80 and GT200
• MT unit (global block) scheduler
• TPC: texture processor cluster (a group of SMs that share the same texture unit)
• Two GPU generations are shown: G80 and GT200
[Figure: block diagrams of G80 and GT200, each showing the global block scheduler, the TPCs, and the SM controllers.]
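The TPC/SM grouping can be made concrete with a little arithmetic. The sketch below is illustrative only: the per-chip counts (8 TPCs of 2 SMs for G80, 10 TPCs of 3 SMs for GT200, 8 scalar cores per SM) come from NVIDIA's public descriptions of those chips rather than from this slide, and the class and field names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical description of a chip as TPCs -> SMs -> scalar cores."""
    name: str
    tpcs: int          # texture processor clusters on the chip (assumed value)
    sms_per_tpc: int   # SMs that share one texture unit (assumed value)
    cores_per_sm: int  # scalar cores per SM (assumed value)

    @property
    def sm_count(self) -> int:
        return self.tpcs * self.sms_per_tpc

    @property
    def core_count(self) -> int:
        return self.sm_count * self.cores_per_sm


G80 = GpuLayout("G80", tpcs=8, sms_per_tpc=2, cores_per_sm=8)
GT200 = GpuLayout("GT200", tpcs=10, sms_per_tpc=3, cores_per_sm=8)

for gpu in (G80, GT200):
    print(f"{gpu.name}: {gpu.sm_count} SMs, {gpu.core_count} cores")
```

Under these assumptions the totals come out to 128 and 240 cores, which agrees with the CUDA core counts quoted for G80 and GT200 in the comparison slide.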

Examples: GT300
• GT300 (Fermi)
• Fermi's 16 SMs are positioned around a common L2 cache.
• Each SM is a vertical rectangular strip that contains an orange portion (scheduler and dispatch), a green portion (execution units), and light blue portions (register file and L1 cache).
[Figure: Fermi Streaming Multiprocessor (SM). Labeled blocks: instruction cache, two warp schedulers, two dispatch units, register file (32768 x 32-bit), CUDA cores (dispatch port, FP unit, INT unit, result queue), SFUs, and 64 KB shared memory / L1 cache. The chip-level view labels the DRAM interfaces and the L2 cache.]
Source: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
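To see why the register file size matters, the following rough sketch estimates how many warps one Fermi SM could keep resident when registers are the only limiting resource. The 32768 x 32-bit register file is taken from the SM diagram; the warp size of 32 and the cap of 1536 resident threads per SM are assumptions not stated on the slide, and real hardware also applies allocation granularity and shared-memory limits that are ignored here.

```python
REGISTERS_PER_SM = 32768     # 32-bit registers, from the SM diagram
WARP_SIZE = 32               # threads per warp (assumption, not on the slide)
MAX_THREADS_PER_SM = 1536    # assumed Fermi residency cap (not on the slide)


def register_file_kib() -> int:
    """Register file capacity in KiB (4 bytes per 32-bit register)."""
    return REGISTERS_PER_SM * 4 // 1024


def resident_warps(regs_per_thread: int) -> int:
    """Upper bound on resident warps per SM, considering registers only."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    by_registers = REGISTERS_PER_SM // (regs_per_thread * WARP_SIZE)
    by_thread_cap = MAX_THREADS_PER_SM // WARP_SIZE
    return min(by_registers, by_thread_cap)


print(register_file_kib(), "KiB register file")
for regs in (20, 32, 63):
    print(regs, "registers/thread ->", resident_warps(regs), "warps")
```

The point of the exercise is the trade-off: a kernel that uses more registers per thread leaves fewer warps for the scheduler to switch between while long-latency operations are in flight.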

Comparison
• G80 vs. GT200 vs. GT300 (Fermi)

| GPU | G80 | GT200 | Fermi |
| Transistors | 681 million | 1.4 billion | 3.0 billion |
| CUDA cores | 128 | 240 | 512 |
| Double-precision floating-point capability | None | 30 FMA ops/clock | 256 FMA ops/clock |
| Single-precision floating-point capability | 128 MAD ops/clock | 240 MAD ops/clock | 512 FMA ops/clock |
| Warp schedulers (per SM) | 1 | 1 | 2 |
| Special function units (SFUs) per SM | 2 | 2 | 4 |
| Shared memory (per SM) | 16 KB | 16 KB | Configurable 48 KB or 16 KB |
| L1 cache (per SM) | None | None | Configurable 16 KB or 48 KB |
| L2 cache | None | None | 768 KB |
| ECC memory support | No | No | Yes |
| Concurrent kernels | No | No | Up to 16 |
| Load/store address width | 32-bit | 32-bit | 64-bit |

Source: http://www.dvhardware.net/article38173.html
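The ops/clock rows of the comparison translate directly into per-clock floating-point throughput. The sketch below assumes the usual convention that one MAD or FMA counts as two floating-point operations (a multiply and an add); the function names are hypothetical, and `clock_ghz` is a caller-supplied value because the slide gives no clock frequencies.

```python
# (single-precision ops/clock, double-precision ops/clock) from the comparison
OPS_PER_CLOCK = {
    "G80": (128, 0),      # no double-precision support
    "GT200": (240, 30),
    "Fermi": (512, 256),
}
FLOPS_PER_OP = 2  # assumption: one MAD/FMA = one multiply + one add


def flops_per_clock(gpu: str) -> tuple:
    """Return (single-precision, double-precision) FLOPs per clock."""
    sp_ops, dp_ops = OPS_PER_CLOCK[gpu]
    return sp_ops * FLOPS_PER_OP, dp_ops * FLOPS_PER_OP


def dp_to_sp_ratio(gpu: str) -> float:
    """Double-precision throughput as a fraction of single-precision."""
    sp_ops, dp_ops = OPS_PER_CLOCK[gpu]
    return dp_ops / sp_ops


def peak_gflops(gpu: str, clock_ghz: float) -> tuple:
    """Peak (SP, DP) GFLOP/s for a hypothetical clock in GHz."""
    sp, dp = flops_per_clock(gpu)
    return sp * clock_ghz, dp * clock_ghz


for name in OPS_PER_CLOCK:
    print(name, flops_per_clock(name), dp_to_sp_ratio(name))
```

Read this way, the comparison shows double precision moving from absent (G80), to one eighth of single-precision rate (GT200), to one half (Fermi).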

Example: GK110 (Kepler Architecture)
• More power efficient than Fermi
• New SM architecture (SMX)
• Revamped memory architecture
• Hardware support for new programming models
• Capable of Dynamic Parallelism
[Figure: "Kepler: Fast & Efficient", comparing a Fermi SM with a Kepler SMX; labels include "3x Perf/Watt" and "192 cores".]
Source: http://www.nvidia.com/content/PDF/kepler/NVIDIA-KeplerGK110-Architecture-Whitepaper.pdf

Basic GPGPU Processor Pipeline
• Simple in-order execution in SIMT
  – Single instruction, multiple threads
• The scheduler chooses one of several warps (PC)
• Fetches one instruction per warp from the I$ (instruction cache)
• Decodes the instruction, reads the registers, and dispatches
  – The scoreboard maintains dependencies
• A multi-ported register file provides data for all lanes
• Numerous ALU, FPU, LD/ST, and SFU lanes run simultaneously (at different speeds)
• Writeback updates the register file
[Figure: pipeline diagram. Stages: Schedule Warp and Fetch Instruction (I-cache), Decode + I-Buffer and Scoreboard, Issue Instruction, Register File, execution units (Integer ALU, Float ALU, Special Functions, Load/Store Unit), Register Write Back. Shared memory, the data cache, and off-chip DRAM sit behind the load/store unit.]
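The stages listed above can be sketched as a toy simulator. This is a minimal model under stated assumptions, not a description of any real GPU: it issues at most one warp instruction per cycle, picks warps round-robin, uses a per-warp scoreboard that blocks an instruction until its source and destination registers have been written back, and ignores lanes, divergence, and structural hazards. The latencies and all names (`Instr`, `Warp`, `simulate`, `make_program`) are invented for illustration.

```python
from dataclasses import dataclass, field

# Assumed latencies in cycles per unit type (illustrative, not from the slide).
LATENCY = {"ALU": 1, "FPU": 2, "SFU": 4, "LDST": 8}


@dataclass
class Instr:
    unit: str    # ALU, FPU, SFU or LDST
    dst: str     # destination register
    srcs: tuple  # source registers


@dataclass
class Warp:
    wid: int
    program: list
    pc: int = 0
    # Scoreboard: register -> cycle at which its pending write completes.
    pending: dict = field(default_factory=dict)

    def done(self) -> bool:
        return self.pc >= len(self.program)

    def can_issue(self, cycle: int) -> bool:
        ins = self.program[self.pc]
        regs = ins.srcs + (ins.dst,)
        return all(self.pending.get(r, 0) <= cycle for r in regs)


def simulate(warps: list) -> list:
    """Return a trace of (cycle, warp id, pc) for every issued instruction."""
    trace, cycle, last = [], 0, -1
    while not all(w.done() for w in warps):
        # Round-robin: start with the warp after the one that issued last.
        for i in range(1, len(warps) + 1):
            idx = (last + i) % len(warps)
            w = warps[idx]
            if not w.done() and w.can_issue(cycle):
                ins = w.program[w.pc]
                w.pending[ins.dst] = cycle + LATENCY[ins.unit]  # writeback time
                trace.append((cycle, w.wid, w.pc))
                w.pc += 1
                last = idx
                break
        cycle += 1  # in-order: a stalled warp simply waits
    return trace


def make_program() -> list:
    # load r1; r2 = f(r1) on the FPU; r3 = g(r2) on the ALU
    return [Instr("LDST", "r1", ("r0",)),
            Instr("FPU", "r2", ("r1",)),
            Instr("ALU", "r3", ("r2",))]


one = simulate([Warp(0, make_program())])
two = simulate([Warp(0, make_program()), Warp(1, make_program())])
print("one warp :", one)
print("two warps:", two)
```

With these assumed latencies, a single warp issues its last instruction at cycle 10, while two warps running the same program finish issuing at cycle 11: the second warp's instructions fill cycles in which the first is stalled on its load, which is the latency-hiding effect the warp scheduler exists to provide.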
