
GPU Parallel Programming course teaching resources (references): NVIDIA, Parallel Prefix Sum (Scan) with CUDA (April 2007)



Introduction

A simple and common parallel algorithm building block is the all-prefix-sums operation. In this paper we will define and illustrate the operation, and discuss in detail its efficient implementation on NVIDIA CUDA. As mentioned by Blelloch [1], all-prefix-sums is a good example of a computation that seems inherently sequential, but for which there is an efficient parallel algorithm. The all-prefix-sums operation is defined as follows in [1]:

Definition: The all-prefix-sums operation takes a binary associative operator ⊕ and an array of n elements [a_0, a_1, …, a_{n-1}], and returns the array [a_0, (a_0 ⊕ a_1), …, (a_0 ⊕ a_1 ⊕ … ⊕ a_{n-1})].

Example: If ⊕ is addition, then the all-prefix-sums operation on the array [3 1 7 0 4 1 6 3] would return [3 4 11 11 15 16 22 25].

There are many uses for all-prefix-sums, including, but not limited to, sorting, lexical analysis, string comparison, polynomial evaluation, stream compaction, and building histograms and data structures (graphs, trees, etc.) in parallel. For example applications, we refer the reader to the survey by Blelloch [1].

In general, all-prefix-sums can be used to convert some sequential computations into equivalent, but parallel, computations, as shown in Figure 1.

Sequential:
    out[0] = 0;
    forall j from 1 to n do
        out[j] = out[j-1] + f(in[j-1]);

Parallel:
    forall j in parallel do
        temp[j] = f(in[j]);
    all_prefix_sums(out, temp);

Figure 1: A sequential computation and its parallel equivalent.

Inclusive and Exclusive Scan

All-prefix-sums on an array of data is commonly known as scan. We will use this simpler terminology (which comes from the APL programming language [1]) for the remainder of this paper. As shown in the last section, a scan of an array generates a new array where each element j is the sum of all elements up to and including j. This is an inclusive scan. It is often useful for each element j in the results of a scan to contain the sum of all previous elements, but not j itself. This operation is commonly known as an exclusive scan (or prescan) [1].

Definition: The exclusive scan operation takes a binary associative operator ⊕ with identity I, and an array of n elements [a_0, a_1, …, a_{n-1}]
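The difference between the two scans is easiest to see in a sequential reference. The following is a minimal CPU sketch in C, assuming ⊕ is integer addition (identity 0); it is not the parallel CUDA implementation the paper goes on to discuss, and the function names `inclusive_scan` and `exclusive_scan` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive scan with + as the operator (an assumption for this sketch):
 * out[j] = in[0] + in[1] + ... + in[j]. */
void inclusive_scan(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int running = 0;                 /* 0 is the identity for addition */
    for (size_t j = 0; j < n; ++j) {
        running += in[j];            /* fold in element j first ...    */
        out[j] = running;            /* ... so out[j] includes in[j]   */
    }
}

/* Exclusive scan (prescan): out[j] = in[0] + ... + in[j-1],
 * and out[0] is the identity. */
void exclusive_scan(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int running = 0;                 /* identity goes into out[0]      */
    for (size_t j = 0; j < n; ++j) {
        out[j] = running;            /* store the sum of earlier items */
        running += in[j];            /* then fold in element j         */
    }
}
```

For the array [3 1 7 0 4 1 6 3], the inclusive version yields the running sums 3, 4, 11, 11, 15, 16, 22, 25, while the exclusive version yields 0, 3, 4, 11, 11, 15, 16, 22: the same values shifted right by one position, with the identity in front and the final total dropped.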
