LECTURE 4: PERFORMANCE CONSIDERATIONS
Warps and SIMD Hardware
Outline
– Warps and SIMD
– Performance impact of control divergence
– Parallel reduction
– Memory parallelism
Objective
– To understand how CUDA threads execute on SIMD hardware
  – Warp partitioning
  – SIMD hardware
  – Control divergence
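Execution process
– Software to hardware mapping:
  – Thread → CUDA Core
  – Thread Block → Multiprocessor
  – Grid → Device
– CUDA Core = ALU = SP (streaming processor)
– SM (streaming multiprocessor) = the "core" of the logical architecture
– When a kernel function is called, it launches many threads, which are then assigned to the hardware for execution; while executing, they occupy hardware resources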
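To make "launches many threads" concrete, here is a minimal host-side sketch of how a 1D launch is commonly sized. It is plain C++, not a real kernel launch; `LaunchShape` and `shape_for` are hypothetical names introduced only for illustration, and the block size of 256 in the usage note is an assumed value.

```cpp
#include <cassert>

// Hypothetical description of a 1D launch (illustration only, not a CUDA API).
struct LaunchShape {
    int blocks;             // thread blocks in the grid (grid runs on the device)
    int threads_per_block;  // threads in each block (block runs on one multiprocessor)

    // Total threads launched; may exceed the element count because the
    // last block is launched whole even if only part of it is needed.
    long long total_threads() const {
        return 1LL * blocks * threads_per_block;
    }
};

// Choose enough blocks to cover n elements (ceiling division).
LaunchShape shape_for(int n, int threads_per_block) {
    return {(n + threads_per_block - 1) / threads_per_block, threads_per_block};
}
```

For example, `shape_for(1000, 256)` gives 4 blocks and 1024 threads in total, so 24 threads have no element to process and would need a bounds check in a real kernel.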
Warps
– A thread block consists of 32-thread warps
– A warp is executed physically in parallel (SIMD) on a multiprocessor
[Figure: a Thread Block is split into Warps of 32 threads each, which run on a Multiprocessor]
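The partitioning of a block into warps can be sketched on the host as follows. This is a plain C++ model, assuming a 1D block in which consecutive thread indices fill warp 0 first, then warp 1, and so on; `kWarpSize`, `WarpPosition`, `warps_per_block`, and `warp_position` are hypothetical names for illustration (in device code the warp size is available as the built-in `warpSize`).

```cpp
#include <cassert>

// Warp size stated on the slide: 32 threads per warp.
constexpr int kWarpSize = 32;

// Where a thread sits after partitioning (hypothetical helper type).
struct WarpPosition {
    int warp;  // index of the warp within the block
    int lane;  // index of the thread within its warp, 0..31
};

// Warps needed for a block. A partially filled last warp still
// occupies a whole warp, so this rounds up.
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Map a 1D thread index (threadIdx.x in a kernel) to its warp and lane.
constexpr WarpPosition warp_position(int thread_idx) {
    return {thread_idx / kWarpSize, thread_idx % kWarpSize};
}
```

A 128-thread block yields 4 full warps. A 100-thread block also yields 4 warps, with the last one only partly filled, which is one reason block sizes are usually chosen as multiples of 32.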