illustrates how to scale applications across a GPU-accelerated compute cluster by using CUDA-aware MPI with GPUDirect RDMA to realize near-linear performance scalability.

Chapter 10: Implementation Considerations discusses the CUDA development process and a variety of profile-driven optimization strategies. It demonstrates how to use CUDA debugging tools to debug kernel and memory errors. It also provides a case study in porting a legacy C application to CUDA C using step-by-step instructions to help solidify your understanding of the methodology, visualize the process, and demonstrate the tools.

WHAT YOU NEED TO USE THIS BOOK

This book does not require either GPU or parallel programming experience. Before you jump in, it would be best if you have basic experience working with Linux. To run all examples in the book, the ideal environment is:

➤ A Linux system
➤ A C/C++ compiler
➤ The CUDA 6.0 Toolkit
➤ An NVIDIA Kepler GPU

However, most examples will run on Fermi devices, though some examples using CUDA 6 features might require Kepler GPUs. Most of these examples can be compiled with CUDA 5.5.

CUDA TOOLKIT DOWNLOAD

You can download the CUDA 6.0 Toolkit from https://developer.nvidia.com/cuda-toolkit. The CUDA Toolkit includes a compiler for NVIDIA GPUs, CUDA math libraries, and tools for debugging and optimizing the performance of your applications. You will also find programming guides, user manuals, an API reference, and other documentation to help you start accelerating your application with GPUs.
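If you want to confirm which GPU generation your system has before you begin, the following minimal sketch can help. It is illustrative only and not part of the book's downloadable code; the file name check_device.cu is just an example. It queries the first CUDA device and prints its compute capability (Fermi devices report 2.x, Kepler devices report 3.x):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int dev = 0;
    cudaDeviceProp prop;

    // cudaGetDeviceProperties fails if no CUDA-capable device is present
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
    {
        printf("No CUDA-capable device detected\n");
        return 1;
    }

    // prop.major/prop.minor encode the compute capability
    printf("Device %d: %s (compute capability %d.%d)\n",
           dev, prop.name, prop.major, prop.minor);
    return 0;
}

Compile and run it with the toolkit's compiler:

$ nvcc check_device.cu -o check_device
$ ./check_device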
CONVENTIONS

To help you get the most from the text, we have used a number of conventions throughout the book.

We highlight new terms and important words when they are introduced.

We show file names, URLs, and code within the text like so: this_is_a_kernel_file.cu.

We present code in the following way:

// distributing jobs among devices
for (int i = 0; i < ngpus; i++) {
    cudaSetDevice(i);
    cudaMemcpyAsync(d_A[i], h_A[i], iBytes, cudaMemcpyDefault, stream[i]);
    cudaMemcpyAsync(d_B[i], h_B[i], iBytes, cudaMemcpyDefault, stream[i]);
    iKernel<<<grid, block, 0, stream[i]>>>(d_A[i], d_B[i], d_C[i], iSize);
    cudaMemcpyAsync(gpuRef[i], d_C[i], iBytes, cudaMemcpyDefault, stream[i]);
}

We introduce CUDA runtime functions in the following way:

cudaError_t cudaDeviceSynchronize(void);

We present the output of programs as follows:

./reduce starting reduction at device 0: Tesla M2070
    with array size 16777216  grid 32768 block 512
cpu reduce      elapsed 0.029138 sec cpu_sum: 2139353471
gpu Warmup      elapsed 0.011745 sec gpu_sum: 2139353471 <<<grid 32768 block 512>>>
gpu Neighbored  elapsed 0.011722 sec gpu_sum: 2139353471 <<<grid 32768 block 512>>>

We give command-line instructions as follows:

$ nvprof --devices 0 --metrics branch_efficiency ./reduce

SOURCE CODE

As you work through the examples in this book, you might choose either to type in all the code manually or to use the source code files that accompany the book. All of the source code used in this book is available for download at www.wrox.com/go/procudac. Once at the site, simply locate the book's title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book's detail page to obtain all the source code for the book.

When you work on the exercises at the end of each chapter, we highly encourage you to try to write them yourself by referencing the example codes. All the exercise code files are also downloadable from the Wrox website.

ERRATA

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or a faulty piece of code, we would be very grateful for your feedback. By sending in errata, you might save another reader hours of frustration, and at the same time you will be helping us provide even higher-quality information.
To find the errata page for this book, go to www.wrox.com/go/procudac. Then, on the book's details page, click the Book Errata link. On this page you can view all errata that have been submitted for this book and posted by Wrox editors.

P2P.WROX.COM

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and to interact with other readers and technology users. The forums offer a subscription feature that e-mails you when new posts are made to topics of your choosing. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At http://p2p.wrox.com you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join, as well as any optional information you wish to provide, and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.

You can read messages in the forums without joining P2P, but in order to post your own messages, you must join. Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum sent to your e-mail address, click the "Subscribe to this Forum" icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.

USEFUL LINKS

GTC On-Demand: http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php
GTC Express Webinar Program: https://developer.nvidia.com/gpu-computing-webinars
Developer Zone: www.gputechconf.com/resources/developer-zone
NVIDIA Parallel Programming Blog: http://devblogs.nvidia.com/parallelforall
NVIDIA Developer Zone Forums: devtalk.nvidia.com
NVIDIA support e-mail: devtools-support@nvidia.com
1
Heterogeneous Parallel Computing with CUDA

WHAT'S IN THIS CHAPTER?

➤ Understanding heterogeneous computing architectures
➤ Recognizing the paradigm shift of parallel programming
➤ Grasping the basic elements of GPU programming
➤ Knowing the differences between CPU and GPU programming

CODE DOWNLOAD

The wrox.com code downloads for this chapter are found at www.wrox.com/go/procudac on the Download Code tab. The code is in the Chapter 1 download and individually named according to the names throughout the chapter.

The high-performance computing (HPC) landscape is always changing as new technologies and processes become commonplace, and the definition of HPC changes accordingly. In general, it pertains to the use of multiple processors or computers to accomplish a complex task concurrently with high throughput and efficiency. It is common to consider HPC as not only a computing architecture but also as a set of elements, including hardware systems, software tools, programming platforms, and parallel programming paradigms.

Over the last decade, high-performance computing has evolved significantly, particularly because of the emergence of GPU-CPU heterogeneous architectures, which have led to a fundamental paradigm shift in parallel programming. This chapter begins your understanding of heterogeneous parallel programming.
PARALLEL COMPUTING

During the past several decades, there has been ever-increasing interest in parallel computation. The primary goal of parallel computing is to improve the speed of computation.

From a pure calculation perspective, parallel computing can be defined as a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently.

From the programmer's perspective, a natural question is how to map the concurrent calculations onto computers. Suppose you have multiple computing resources. Parallel computing can then be defined as the simultaneous use of multiple computing resources (cores or computers) to perform the concurrent calculations. A large problem is broken down into smaller ones, and each smaller one is then solved concurrently on different computing resources (see the short code sketch following Figure 1-1). The software and hardware aspects of parallel computing are closely intertwined. In fact, parallel computing usually involves two distinct areas of computing technologies:

➤ Computer architecture (hardware aspect)
➤ Parallel programming (software aspect)

Computer architecture focuses on supporting parallelism at an architectural level, while parallel programming focuses on solving a problem concurrently by fully using the computational power of the computer architecture. In order to achieve parallel execution in software, the hardware must provide a platform that supports concurrent execution of multiple processes or multiple threads.

Most modern processors implement the Harvard architecture, as shown in Figure 1-1, which is composed of three main components:

➤ Memory (instruction memory and data memory)
➤ Central processing unit (control unit and arithmetic logic unit)
➤ Input/Output interfaces

FIGURE 1-1: The Harvard architecture. The CPU (control unit and arithmetic logic unit) connects to separate instruction and data memories and to input/output interfaces.
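To make the decomposition idea concrete, here is a minimal CUDA C sketch. It is illustrative only, not one of the book's downloadable examples; the kernel name vecAdd and the array size are arbitrary choices. One large problem, adding two vectors of 1,048,576 elements, is broken into independent one-element additions, each solved concurrently by its own GPU thread:

#include <stdio.h>
#include <cuda_runtime.h>

// Each thread handles one element: the large problem (n additions)
// is decomposed into n small, independent calculations.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // allocate and initialize host arrays
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // allocate device arrays and copy the inputs to the device
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // launch enough 512-thread blocks to cover all n elements
    int block = 512;
    int grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(d_a, d_b, d_c, n);

    // copy the result back and check one element (expect 3.0)
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("h_c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}

Compile and run it with the CUDA toolkit compiler:

$ nvcc vecAdd.cu -o vecAdd
$ ./vecAdd

Each element's result is computed by the thread whose global index i matches that element, which is exactly the principle described above: a large problem divided into smaller ones that are solved concurrently.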