I Recursive Halving and Doubling p Figure 13.1. Summation in log(N) steps
Recursive Halving and Doubling Figure 13.1. Summation in log(N) steps
I Recursive Halving and Doubling Step 3: Processor 1 then must broadcast this sum to all other processors. This broadcast operation can be done using the same communication structure as the summation but in reverse. You will see pseudocode for this at the end of this section. Note that if the total number of processors is N, then only 2 log(N)(log base 2) steps are needed to complete the operation There is an even more efficient way to finish the job in only log(N) steps. By way of example, look at the next figure containing 8 processors. At each step, processor i and processor i+k send and receive data in a pairwise fashion and then perform the summation k is iterated from 1 through N/2 in powers of 2. If the total number of processors is N, then log(N) steps are needed. As an exercise, you should write out the necessary pseudocode for this example
Recursive Halving and Doubling • Step 3: Processor 1 then must broadcast this sum to all other processors. This broadcast operation can be done using the same communication structure as the summation, but in reverse. You will see pseudocode for this at the end of this section. Note that if the total number of processors is N, then only 2 log(N) (log base 2) steps are needed to complete the operation. • There is an even more efficient way to finish the job in only log(N) steps. By way of example, look at the next figure containing 8 processors. At each step, processor i and processor i+k send and receive data in a pairwise fashion and then perform the summation. k is iterated from 1 through N/2 in powers of 2. If the total number of processors is N, then log(N) steps are needed. As an exercise, you should write out the necessary pseudocode for this example
I Recursive Halving and Doubling pI p2 p3 p4 p5 p6 p7 Qoq9QsQo Figure 13. 2. Summation to all processors in log(N) steps
Recursive Halving and Doubling Figure 13.2. Summation to all processors in log(N) steps
I Recursive Halving and Doubling What about adding vectors? That is, how do you add several vectors component-wise to get a new vector? The answer is, you employ the method discussed earlier in a component-wise fashion This fascinating way to reduce the communications and to avoid abundant summations is described next. this method utilizes the recursive halving and doubling technique and is illustrated in Figure 13.3
Recursive Halving and Doubling • What about adding vectors? That is, how do you add several vectors component-wise to get a new vector? The answer is, you employ the method discussed earlier in a component-wise fashion. This fascinating way to reduce the communications and to avoid abundant summations is described next. This method utilizes the recursive halving and doubling technique and is illustrated in Figure 13.3
I Recursive Halving and Doubling Suppose there are 4 processors and the length of each vector is also 4 Step 1: Processor pO sends the first two components of the vector to rocessor p1, and p1 sends the last two components of the vector to pO. Then po gets the partial sums for the last two components, and p1 gets the partial sums for the first two components. So do p2 and p3 Step 2: Processor po sends the partial sum of the third component to processor p3. Processor p3 then adds to get the total sum of the third component. Similarly, processor 0, 1 and 2 find the total sums of the 4th, 2nd, and 1st components, respectively. Now the sum of the vectors are found and the components are stored in different processors Step 3: Broadcast the result using the reverse of the above communication process
Recursive Halving and Doubling • Suppose there are 4 processors and the length of each vector is also 4. • Step 1: Processor p0 sends the first two components of the vector to processor p1, and p1 sends the last two components of the vector to p0. Then p0 gets the partial sums for the last two components, and p1 gets the partial sums for the first two components. So do p2 and p3. • Step 2: Processor p0 sends the partial sum of the third component to processor p3. Processor p3 then adds to get the total sum of the third component. Similarly, processor 0,1 and 2 find the total sums of the 4th, 2nd, and 1st components, respectively. Now the sum of the vectors are found and the components are stored in different processors. • Step 3: Broadcast the result using the reverse of the above communication process