Typical Applications of Scan Scan is a simple and useful parallel building block -Convert recurrences from sequential: for(j=1;j<n;j++) out[j]out[j-1]f(j); Into parallel: forall(j)temp[j]=f(j)}; scan (out,temp); 电子科妓女学 O
6 Typical Applications of Scan – Scan is a simple and useful parallel building block – Convert recurrences from sequential: for(j=1;j<n;j++) out[j] = out[j-1] + f(j); – Into parallel: forall(j) { temp[j] = f(j) }; scan(out, temp);
Other Applications Assigning camping spots Assigning Farmer's Market spaces 一 Allocating memory to parallel threads Allocating memory buffer space for communication channels 电子科妓女学 O
7 Other Applications – Assigning camping spots – Assigning Farmer’s Market spaces – Allocating memory to parallel threads – Allocating memory buffer space for communication channels – … 7
An Inclusive Sequential Addition Scan Given a sequence [Xo,X1,X2,... Calculate output [yo,y1,y2,...] Such that yo=Xo y1=X0+X1 y2=X0+X1+X2 Using a recursive definition y%=y%-1+X 电子科妓女学 O
8 An Inclusive Sequential Addition Scan Given a sequence [x0 , x1 , x2 , ... ] Calculate output [y0 , y1 , y2 , ... ] Such that y0 = x0 y1 = x0 + x1 y2 = x0 + x1+ x2 … Using a recursive definition yi = yi − 1 + xi 8
A Work Efficient C Implementation y[0]=[0]: for(i=1;i<Max_;i++)y[0=y[i-1]+[0]: Computationally efficient: N additions needed for N elements-O(N)! Only slightly more expensive than sequential reduction. 电子科妓女学 O
9 A Work Efficient C Implementation y[0] = x[0]; for (i = 1; i < Max_i; i++) y[i] = y [i-1] + x[i]; Computationally efficient: N additions needed for N elements - O(N) ! Only slightly more expensive than sequential reduction. 9
A Naive Inclusive Parallel Scan Assign one thread to calculate each y element Have every thread to add up all x elements needed for the y element yo=Xo y1=X0+X1 y2=X0+X1+X2 "Parallel programming is easy as long as you do not care about performance." 电子科妓女学 O
10 A Naïve Inclusive Parallel Scan – Assign one thread to calculate each y element – Have every thread to add up all x elements needed for the y element y0 = x0 y1 = x0 + x1 y2 = x0 + x1+ x2 “Parallel programming is easy as long as you do not care about performance.” 10