FIGURE 1-8 [Figure: a bus-based multicore architecture; multiple processors, each with its own cache, connected by a bus to shared memory]

GPUs represent a many-core architecture, and have virtually every type of parallelism described previously: multithreading, MIMD, SIMD, and instruction-level parallelism. NVIDIA coined the phrase Single Instruction, Multiple Thread (SIMT) for this type of architecture.

GPUs and CPUs do not share a common ancestor. Historically, GPUs were graphics accelerators. Only recently have GPUs evolved to be powerful, general-purpose, fully programmable, task and data parallel processors, ideally suited to tackle massively parallel computing problems.

GPU CORE VERSUS CPU CORE

Even though many-core and multicore are used to label GPU and CPU architectures, a GPU core is quite different from a CPU core.

A CPU core, relatively heavy-weight, is designed for very complex control logic, seeking to optimize the execution of sequential programs.

A GPU core, relatively light-weight, is optimized for data-parallel tasks with simpler control logic, focusing on the throughput of parallel programs.

HETEROGENEOUS COMPUTING

In the earliest days, computers contained only central processing units (CPUs) designed to run general programming tasks. Over the last decade, mainstream computers in the high-performance computing community have increasingly incorporated other processing elements. The most prevalent is the GPU, originally designed to perform specialized graphics computations in parallel. Over time, GPUs have become more powerful and more generalized, enabling them to be applied to general-purpose parallel computing tasks with excellent performance and high power efficiency.

Typically, CPUs and GPUs are discrete processing components connected by the PCI-Express bus within a single compute node. In this type of architecture, GPUs are referred to as discrete devices.
The switch from homogeneous systems to heterogeneous systems is a milestone in the history of high-performance computing. Homogeneous computing uses one or more processors of the same architecture to execute an application. Heterogeneous computing instead uses a suite of processor architectures to execute an application, assigning tasks to the architectures to which they are well suited and yielding performance improvements as a result.

Although heterogeneous systems provide significant advantages compared to traditional high-performance computing systems, effective use of such systems is currently limited by the increased application design complexity. While parallel programming has received much recent attention, the inclusion of heterogeneous resources adds complexity.

If you are new to parallel programming, then you can benefit from the performance improvements and advanced software tools now available on heterogeneous architectures. If you are already a good parallel programmer, adapting to parallel programming on heterogeneous architectures is straightforward.

Heterogeneous Architecture

A typical heterogeneous compute node nowadays consists of two multicore CPU sockets and two or more many-core GPUs. A GPU is currently not a standalone platform but a co-processor to a CPU. Therefore, GPUs must operate in conjunction with a CPU-based host through a PCI-Express bus, as shown in Figure 1-9. That is why, in GPU computing terms, the CPU is called the host and the GPU is called the device.

FIGURE 1-9 [Figure: a CPU with complex control logic, a large cache, a few ALUs, and its own DRAM, connected over the PCIe bus to a GPU with many ALUs and its own DRAM]

A heterogeneous application consists of two parts:

➤ Host code
➤ Device code

Host code runs on CPUs and device code runs on GPUs. An application executing on a heterogeneous platform is typically initialized by the CPU. The CPU code is responsible for managing the environment, code, and data for the device before loading compute-intensive tasks on the device. In computationally intensive applications, program sections often exhibit a rich amount of data parallelism, and GPUs are used to accelerate the execution of these sections. When a hardware component that is physically separate from the CPU is used to accelerate computationally intensive sections of an application, it is referred to as a hardware accelerator. GPUs are arguably the most common example of a hardware accelerator.
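To make the host/device split concrete, here is a minimal sketch of a program containing both parts (an illustrative addition, not an example from this chapter): the __global__ kernel is device code that runs on the GPU, while main is host code that runs on the CPU and launches the kernel.

#include <stdio.h>
#include <cuda_runtime.h>

// Device code: runs on the GPU, once per thread.
__global__ void helloFromGPU(void)
{
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

// Host code: runs on the CPU, sets up and launches the device code.
int main(void)
{
    printf("Hello from the CPU\n");

    helloFromGPU<<<1, 4>>>();    // launch the kernel with 4 GPU threads
    cudaDeviceSynchronize();     // wait for the device to finish

    return 0;
}

Compile with nvcc (for example, nvcc hello.cu); device-side printf requires a compute capability of 2.0 or higher.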
NVIDIA's GPU computing platform is enabled on the following product families:

➤ Tegra
➤ GeForce
➤ Quadro
➤ Tesla

The Tegra product family is designed for mobile and embedded devices such as tablets and phones, GeForce for consumer graphics, Quadro for professional visualization, and Tesla for datacenter parallel computing. Fermi, the GPU accelerator in the Tesla product family, has recently gained widespread use as a computing accelerator for high-performance computing applications. Fermi, released by NVIDIA in 2010, is the world's first complete GPU computing architecture. Fermi GPU accelerators have already redefined and accelerated high-performance computing capabilities in many areas, such as seismic processing, biochemistry simulations, weather and climate modeling, signal processing, computational finance, computer-aided engineering, computational fluid dynamics, and data analysis. Kepler, the generation of GPU computing architecture that followed Fermi, was released in the fall of 2012. It offers much higher processing power than the prior generation, provides new methods to optimize and increase parallel workload execution on the GPU, and is expected to further revolutionize high-performance computing. The Tegra K1 contains a Kepler GPU and provides everything you need to unlock the power of the GPU for embedded applications.

There are two important features that describe GPU capability:

➤ Number of CUDA cores
➤ Memory size

Accordingly, there are two different metrics for describing GPU performance:

➤ Peak computational performance
➤ Memory bandwidth

Peak computational performance is a measure of computational capability, usually defined as how many single-precision or double-precision floating-point calculations can be processed per second. Peak performance is usually expressed in gflops (billions of floating-point operations per second) or tflops (trillions of floating-point operations per second). Memory bandwidth is a measure of the rate at which data can be read from or stored to memory, and is usually expressed in gigabytes per second (GB/s). Table 1-1 provides a brief summary of Fermi and Kepler architectural and performance features.

TABLE 1-1: Fermi and Kepler

                      FERMI (TESLA C2050)    KEPLER (TESLA K10)
CUDA Cores            448                    2 × 1536
Memory                6 GB                   8 GB
Peak Performance*     1.03 Tflops            4.58 Tflops
Memory Bandwidth      144 GB/s               320 GB/s

* Peak single-precision floating-point performance
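You can inspect these capability features for the device in your own system through the CUDA runtime. The sketch below (an illustrative addition, not an example from this chapter) prints the memory size and derives the theoretical peak memory bandwidth from the memory clock and bus width reported in cudaDeviceProp; the factor of 2 accounts for double-data-rate memory.

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    printf("Device: %s\n", prop.name);
    printf("Global memory: %.1f GB\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));

    // Theoretical peak bandwidth =
    //   2 (double data rate) x memory clock (Hz) x bus width (bytes)
    double peakBW = 2.0 * (prop.memoryClockRate * 1000.0)
                  * (prop.memoryBusWidth / 8.0) / 1.0e9;
    printf("Theoretical peak memory bandwidth: %.0f GB/s\n", peakBW);

    return 0;
}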
Most examples in this book can be run on both Fermi and Kepler GPUs. Some examples require special architectural features only included with Kepler GPUs.

COMPUTE CAPABILITIES

NVIDIA uses a special term, compute capability, to describe hardware versions of GPU accelerators that belong to the entire Tesla product family. The versions of Tesla products are given in Table 1-2. Devices with the same major revision number are of the same core architecture.

➤ Kepler class architecture is major version number 3.
➤ Fermi class architecture is major version number 2.
➤ Tesla class architecture is major version number 1.

The first class of GPUs delivered by NVIDIA carries the same Tesla name as the entire family of Tesla GPU accelerators. All examples in this book require a compute capability of 2.0 or higher. You can query the compute capability of an installed device at run time, as sketched after Table 1-2.

TABLE 1-2: Compute Capabilities of Tesla GPU Computing Products

GPU            COMPUTE CAPABILITY
Tesla K40      3.5
Tesla K20      3.5
Tesla K10      3.0
Tesla C2070    2.0
Tesla C1060    1.3
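A minimal sketch (an illustrative addition, not an example from this chapter) that reports the compute capability of every CUDA device in the system:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.major and prop.minor form the compute capability,
        // e.g., 3.5 for a Tesla K40 or 2.0 for a Tesla C2070.
        printf("Device %d (%s): compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}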
Paradigm of Heterogeneous Computing

GPU computing is not meant to replace CPU computing. Each approach has advantages for certain kinds of programs. CPU computing is good for control-intensive tasks, and GPU computing is good for data-parallel, computation-intensive tasks. When CPUs are complemented by GPUs, it makes for a powerful combination. The CPU is optimized for dynamic workloads marked by short sequences of computational operations and unpredictable control flow, while GPUs aim at the other end of the spectrum: workloads that are dominated by computational tasks with simple control flow. As shown in Figure 1-10, there are two dimensions that differentiate the scope of applications for the CPU and the GPU:

➤ Parallelism level
➤ Data size

If a problem has a small data size, sophisticated control logic, and/or a low level of parallelism, the CPU is a good choice because of its ability to handle complex logic and instruction-level parallelism. If the problem at hand instead processes a huge amount of data and exhibits massive data parallelism, the GPU is the right choice because it has a large number of programmable cores, can support massive multithreading, and has a larger peak bandwidth compared to the CPU.

FIGURE 1-10 [Figure: application scope plotted against two axes, data size from small to large and parallelism from low to high; CPU sequential computing occupies the small-data, low-parallelism region, GPU parallel computing the large-data, high-parallelism region, with graphics at the high end]

CPU + GPU heterogeneous parallel computing architectures evolved because the CPU and GPU have complementary attributes that enable applications to perform best using both types of processors. Therefore, for optimal performance you may need to use both the CPU and GPU for your application, executing the sequential parts or task-parallel parts on the CPU and the intensive data-parallel parts on the GPU, as shown in Figure 1-11.
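As a concrete sketch of this division of labor (an illustrative addition, not an example from this chapter), the program below keeps the sequential setup on the CPU and offloads the data-parallel work, an element-wise vector addition, to the GPU:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Data-parallel part: one thread per element, simple control flow.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential part: the CPU prepares the data.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Data-parallel part runs on the GPU.
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("h_c[100] = %f (expect %f)\n", h_c[100], h_a[100] + h_b[100]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}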