GPU Teaching Kit – Accelerated Computing
Module 11 – Computational Thinking
NVIDIA / University of Illinois at Urbana-Champaign
Objective

– To provide you with a framework for further studies on
  – Thinking about the problems of parallel programming
  – Discussing your work with others
  – Approaching complex parallel programming problems
  – Using or building useful tools and environments
Fundamentals of Parallel Computing

– Parallel computing requires that
  – The problem can be decomposed into sub-problems that can be safely solved at the same time
  – The programmer structures the code and data to solve these sub-problems concurrently (see the sketch below)
– The goals of parallel computing are
  – To solve problems in less time (strong scaling), and/or
  – To solve bigger problems (weak scaling), and/or
  – To achieve better solutions (advancing science)
– The problems must be large enough to justify parallel computing and to exhibit exploitable concurrency
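A minimal CUDA sketch of this decomposition (illustrative, not from the original slides): element-wise vector addition split into one independent sub-problem per thread. The problem size n and the block size are arbitrary choices; timing a fixed n on more processors probes strong scaling, while growing n with the machine probes weak scaling.

    #include <cuda_runtime.h>

    // Illustrative decomposition: each thread solves one independent
    // sub-problem (one output element), so all sub-problems can be
    // solved safely at the same time.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                    // guard: grid may overshoot n
            c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;        // arbitrary size; must be large
                                      // enough to justify the parallelism
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // cover all of n
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }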
Shared Memory vs. Message Passing

– We have focused on shared-memory parallel programming
  – This is what CUDA (and OpenMP, OpenCL) is based on (see the sketch below)
  – Future massively parallel microprocessors are expected to support shared memory at the chip level
– The programming considerations of the message-passing model are quite different!
  – However, you will find parallels for almost every technique you learned in this course
  – You need to be aware of space-time constraints
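Below is a minimal sketch, under assumed names (blockSum, a block size of 256), of the shared-memory style CUDA exposes: threads in a block cooperate through on-chip __shared__ memory and a barrier instead of exchanging explicit messages.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Threads in a block share data through on-chip __shared__ memory;
    // __syncthreads() is the barrier that makes the sharing safe.
    __global__ void blockSum(const float *in, float *out, int n) {
        __shared__ float partial[256];         // visible to the whole block
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;
        partial[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                       // all loads done before use

        // Tree reduction in shared memory; each step halves the workers.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                partial[tid] += partial[tid + stride];
            __syncthreads();
        }
        if (tid == 0)
            out[blockIdx.x] = partial[0];      // one partial sum per block
    }

    int main() {
        const int n = 1024;                    // assumed multiple of 256
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, (n / 256) * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;
        blockSum<<<n / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();
        float total = 0.0f;
        for (int b = 0; b < n / 256; ++b) total += out[b];
        printf("sum = %f\n", total);           // expect 1024
        cudaFree(in); cudaFree(out);
        return 0;
    }

A message-passing version of the same reduction would keep each partial sum private to a process and combine them with explicit communication (e.g., MPI_Reduce); the underlying technique, a tree-shaped combination, is the same, which is the kind of parallel the slide refers to.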
Data Sharing

– Data sharing can be a double-edged sword
  – Excessive data sharing drastically reduces the advantage of parallel execution
  – Localized sharing can improve memory bandwidth efficiency (see the tiling sketch below)
– Efficient memory bandwidth usage can be achieved by synchronizing the execution of task groups and coordinating their usage of memory data
  – Efficient use of on-chip, shared storage and datapaths
– Read-only sharing can usually be done at much higher efficiency than read-write sharing, which often requires more synchronization
  – Sharing patterns: Many:Many, One:Many, Many:One, One:One
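A sketch of localized, mostly read-only sharing (illustrative kernel only; host setup is omitted and mirrors the earlier sketches): tiled matrix multiplication, where each block stages TILE x TILE tiles of the inputs in shared memory and coordinates their usage with barriers, so each global element is read once per block instead of TILE times. It assumes n is a multiple of TILE and a grid of exactly (n/TILE) x (n/TILE) blocks.

    #define TILE 16

    // Localized sharing: the block cooperatively loads tiles of A and B
    // into on-chip shared memory, then every thread reuses them. This is
    // read-only sharing, which needs only barriers, not locks.
    __global__ void tiledMatMul(const float *A, const float *B,
                                float *C, int n) {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];
        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < n / TILE; ++t) {
            As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
            __syncthreads();               // tiles fully loaded before use
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();               // done reading before next load
        }
        C[row * n + col] = acc;            // assumes n is a multiple of TILE
    }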