Synchronization
– Synchronization == Control Sharing
– Barriers make threads wait until all threads catch up
– Waiting is lost opportunity for work
– Atomic operations may reduce waiting
– Watch out for serialization
– Important: be aware of which items of work are truly independent
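To make the barrier and atomic points concrete, here is a minimal CUDA sketch (not from the slides; blockSum and the one-atomic-per-block layout are illustrative choices): __syncthreads() is the block-wide barrier, and atomicAdd avoids a lock at the cost of serializing contending threads.

```cuda
// Block-level sum: __syncthreads() is the barrier, atomicAdd the
// atomic operation. Contending atomics serialize, so most work is
// done independently and only one global atomic is issued per block.
// Assumes *out was zero-initialized by the host.
__global__ void blockSum(const int *in, int *out, int n) {
    __shared__ int partial;
    if (threadIdx.x == 0) partial = 0;
    __syncthreads();                       // barrier: wait for init

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&partial, in[i]); // shared-memory atomic
    __syncthreads();                       // barrier: all adds done

    if (threadIdx.x == 0) atomicAdd(out, partial); // one global atomic per block
}
```

Accumulating in shared memory and issuing a single global atomicAdd per block keeps the serialized portion small, which is the point of "watch out for serialization".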
Parallel Programming Coding Styles – Program and Data Models
– Program Models
  – SPMD
  – Master/Worker
  – Loop Parallelism
  – Fork/Join
– Data Models
  – Shared Data
  – Shared Queue
  – Distributed Array
– These are not necessarily mutually exclusive.
Program Models
– SPMD (Single Program, Multiple Data)
  – All PEs (Processing Elements) execute the same program in parallel, but each has its own data
  – Each PE uses a unique ID to access its portion of data
  – Different PEs can follow different paths through the same code
  – This is essentially the CUDA Grid model (also OpenCL, MPI)
  – SIMD is a special case – warps are used for efficiency
– Master/Worker
– Loop Parallelism
– Fork/Join
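The "unique ID" idea is exactly how a CUDA kernel is written. A minimal sketch (the vector-add computation and the names vecAdd, a, b, c are illustrative, not from the slides):

```cuda
// SPMD in CUDA: every thread runs the same program (this kernel),
// but its unique global ID selects which element of the data it owns.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique ID
    if (i < n)          // threads past the end follow a different path
        c[i] = a[i] + b[i];
}

// Host-side launch: N threads in blocks of 256 (a typical choice).
// vecAdd<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);
```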
SPMD
– 1. Initialize: establish localized data structures and communication channels.
– 2. Uniquify: each thread acquires a unique identifier, typically ranging from 0 to N-1, where N is the number of threads. Both OpenMP and CUDA have built-in support for this.
– 3. Distribute data: decompose global data into chunks and localize them, or share/replicate major data structures, using thread IDs to associate subsets of the data with threads.
– 4. Compute: run the core computation! Thread IDs differentiate the behavior of individual threads. Use the thread ID in loop index calculations to split loop iterations among threads (beware of the potential for memory/data divergence). Use the thread ID, or conditions based on it, to branch to thread-specific actions (beware of the potential for instruction/execution divergence).
– 5. Finalize: reconcile global data structures and prepare for the next major iteration or group of program phases.
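A skeleton CUDA kernel annotated with the five phases, assuming a launch with exactly 256 threads per block (the block-local reduction is a made-up placeholder computation):

```cuda
// Assumes blockDim.x == 256 (a power of two) to match the shared array.
__global__ void spmdPhases(const float *in, float *out, int n) {
    // 1. Initialize: set up per-block local storage.
    __shared__ float local[256];

    // 2. Uniquify: built-in indices give each thread a unique ID.
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    // 3. Distribute data: the ID picks this thread's chunk.
    local[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();

    // 4. Compute: IDs split the work; here, a block-local reduction.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)                        // execution-divergence risk
            local[tid] += local[tid + s];
        __syncthreads();
    }

    // 5. Finalize: reconcile into the global data structure.
    if (tid == 0) out[blockIdx.x] = local[0];
}
```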
Program Models
– SPMD (Single Program, Multiple Data)
– Master/Worker (OpenMP, OpenACC, TBB)
  – A master thread sets up a pool of worker threads and a bag of tasks
  – Workers execute concurrently, removing tasks until done
– Loop Parallelism (OpenMP, OpenACC, C++ AMP)
  – Loop iterations execute in parallel
  – FORTRAN do-all (truly parallel), do-across (with dependences)
– Fork/Join (POSIX pthreads)
  – The most general, generic way of creating threads
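As a hedged sketch, the Master/Worker "bag of tasks" idea can even be expressed in CUDA itself: the host acts as master, and persistent worker threads pull task indices from a global atomic counter. The names taskCounter, processTask, and numTasks are hypothetical, not from the slides.

```cuda
__device__ int taskCounter = 0;   // the "bag": next unclaimed task index
                                  // (reset between kernel launches)

__device__ void processTask(int task) { /* placeholder for real work */ }

// Workers (persistent threads) repeatedly grab tasks until the bag is empty.
__global__ void workers(int numTasks) {
    while (true) {
        int task = atomicAdd(&taskCounter, 1);  // remove one task
        if (task >= numTasks) break;            // bag is empty: done
        processTask(task);
    }
}

// Host (master): set up the pool of workers and hand them the bag of tasks.
// workers<<<numBlocks, threadsPerBlock>>>(numTasks);
```

Because workers claim tasks dynamically, this pattern load-balances uneven task sizes, which a static split of loop iterations cannot.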