Parallel Algorithms Underlying MPl Implementations
Parallel Algorithms Underlying MPI Implementations
Parallel Algorithms Underlying MPI Implementations This chapter looks at a few of the parallel algorithms underlying the implementations of some simple MPI calls The purpose of this is not to teach you how to roll your own"versions of these routines, but rather to help you start thinking about algorithms in a parallel fashion First, the method of recursive halving and doubling, which is the algorithm underlying operations such as broadcasts and reduction operations, is discussed Then, specific examples of parallel algorithms that implement message passing are given
Parallel Algorithms Underlying MPI Implementations • This chapter looks at a few of the parallel algorithms underlying the implementations of some simple MPI calls. • The purpose of this is not to teach you how to "roll your own" versions of these routines, but rather to help you start thinking about algorithms in a parallel fashion. • First, the method of recursive halving and doubling, which is the algorithm underlying operations such as broadcasts and reduction operations, is discussed. • Then, specific examples of parallel algorithms that implement message passing are given
Recursive Halving and Doubling
Recursive Halving and Doubling
Recursive Halving and Doubling To illustrate recursive halving and doubling suppose you have a vector distributed among p processors, and you need the sum of all components of the vector in each processor, i.e a sum reduction One method is to use a tree-based algorithm to compute the sum to a single processor and then broadcast the sum to every processor
Recursive Halving and Doubling • To illustrate recursive halving and doubling, suppose you have a vector distributed among p processors, and you need the sum of all components of the vector in each processor, i.e., a sum reduction. • One method is to use a tree-based algorithm to compute the sum to a single processor and then broadcast the sum to every processor
I Recursive Halving and Doubling Assume that each processor has formed the partial sum of the components of the vector that it has Step 1: Processor 2 sends its partial sum to processor 1 and processor 1 adds this partial sum to its own. Meanwhile, processor 4 sends its partial sum to processor 3 and processor 3 performs a similar summation Step 2: Processor 3 sends its partial sum, which is now the sum of the components on processors 3 and 4, to processor 1 and processor 1 adds it to its partial sum to get the final sum across all the components of the vector. At each stage of the process, the number of processes doing work is cut in half. The algorithm is depicted in the Figure 13.1 below, where the solid arrow denotes a send operation and the dotted line arrow denotes a receive operation followed by a summation
Recursive Halving and Doubling • Assume that each processor has formed the partial sum of the components of the vector that it has. • Step 1: Processor 2 sends its partial sum to processor 1 and processor 1 adds this partial sum to its own. Meanwhile, processor 4 sends its partial sum to processor 3 and processor 3 performs a similar summation. • Step 2: Processor 3 sends its partial sum, which is now the sum of the components on processors 3 and 4, to processor 1 and processor 1 adds it to its partial sum to get the final sum across all the components of the vector. • At each stage of the process, the number of processes doing work is cut in half. The algorithm is depicted in the Figure 13.1 below, where the solid arrow denotes a send operation and the dotted line arrow denotes a receive operation followed by a summation