Processors Multistage interconnection network Memory banks 0 Stage I Stage 2 Stage n +0+++0++t++++++++4++ P-1 b- Equation 2.1 represents a left-rotation operation on the binary representation of i to obtain j.This interconnection pattern is called a perfect shuffle.Fiqure 2.10 shows a perfect shuffle interconnection pattern for eight inputs and outputs.At each stage of an omega network,a perfect shuffle interconnection pattern feeds into a set of p/2 switches or switching nodes.Each switch is in one of two connection modes. In one mode,the inputs are sent straight through to the outputs,as shown in Fiqure 2.11(a).This is called the pass-through connection.In the other mode,the inputs to the switching node are crossed over and then sent out,as shown in Figure 2.11(b).This is called the cross-over connection. Figure 2.10.A perfect shuffle interconnection for eight inputs and outputs. 0000 0 000 left_rotate(000) 001 I 001 left_rotate(100) 010 2 -2 010=left rotate(001) 011 3 3 011 left_rotate(101) 100 4 4 100 left_rotate(010) 101 5 101 left_rotate(110) 110 6 6 110 left rotate(011) 111 7 7 111 left_rotate(111) Figure 2.11.Two switching configurations of the 2 x 2 switch:(a)Pass-through;(b) Cross-over. (a) (b) An omega network has p/2 x log p switching nodes,and the cost of such a network grows as e(p log p). Note that this cost is less than the (p2)cost of a complete crossbar network.Figure 2.12 shows an omega network for eight processors (denoted by the binary numbers on the left)and eight memory banks (denoted by the binary numbers on the right).Routing data in an omega network is accomplished using a simple scheme.Let s be the binary representation of a processor that needs to write some data into memory bank t.The data traverses the link to the first switching node.If the most significant bits of s and t are the same,then the data is routed in pass-through mode by the switch.If these bits are different, then the data is routed through in crossover mode.This scheme is repeated at the next switching stage
Equation 2.1 represents a left-rotation operation on the binary representation of i to obtain j. This interconnection pattern is called a perfect shuffle. Figure 2.10 shows a perfect shuffle interconnection pattern for eight inputs and outputs. At each stage of an omega network, a perfect shuffle interconnection pattern feeds into a set of p/2 switches or switching nodes. Each switch is in one of two connection modes. In one mode, the inputs are sent straight through to the outputs, as shown in Figure 2.11(a). This is called the pass-through connection. In the other mode, the inputs to the switching node are crossed over and then sent out, as shown in Figure 2.11(b). This is called the cross-over connection. Figure 2.10. A perfect shuffle interconnection for eight inputs and outputs. Figure 2.11. Two switching configurations of the 2 x 2 switch: (a) Pass-through; (b) Cross-over. An omega network has p/2 x log p switching nodes, and the cost of such a network grows as (p log p). Note that this cost is less than the (p2) cost of a complete crossbar network. Figure 2.12 shows an omega network for eight processors (denoted by the binary numbers on the left) and eight memory banks (denoted by the binary numbers on the right). Routing data in an omega network is accomplished using a simple scheme. Let s be the binary representation of a processor that needs to write some data into memory bank t. The data traverses the link to the first switching node. If the most significant bits of s and t are the same, then the data is routed in pass-through mode by the switch. If these bits are different, then the data is routed through in crossover mode. This scheme is repeated at the next switching stage Page 19 of 56 file://C:\Documents and Settings\CIC000\Local Settings\Temp\~hh1AB0.htm 2/8/2013
using the next most significant bit.Traversing log p stages uses all log p bits in the binary representations of s and t. Figure 2.12.A complete omega network connecting eight inputs and eight outputs. 000 000 001 001 010 010 011 011 100 100 101 101 110 110 111 111 Figure 2.13 shows data routing over an omega network from processor two(010)to memory bank seven (111)and from processor six(110)to memory bank four (100).This figure also illustrates an important property of this network.When processor two (010)is communicating with memory bank seven (111),it blocks the path from processor six(110)to memory bank four(100).Communication link AB is used by both communication paths.Thus,in an omega network,access to a memory bank by a processor may disallow access to another memory bank by another processor.Networks with this property are referred to as blocking networks. Figure 2.13.An example of blocking in omega network:one of the messages (010 to 111 or 110 to 100)is blocked at link AB. 000 000 001 001 010 010 011 011 100 100 101 101 110 110 111 111 Completely-Connected Network In a completely-connected network,each node has a direct communication link to every other node in the network.Fiqure 2.14(a)illustrates a completely-connected network of eight nodes.This network is ideal in the sense that a node can send a message to another node in a single step,since a communication link exists between them.Completely-connected networks are the static counterparts of crossbar switching networks,since in both networks,the communication between any input/output pair does not block communication between any other pair. Figure 2.14.(a)A completely-connected network of eight nodes;(b)a star connected network of nine nodes
using the next most significant bit. Traversing log p stages uses all log p bits in the binary representations of s and t. Figure 2.12. A complete omega network connecting eight inputs and eight outputs. Figure 2.13 shows data routing over an omega network from processor two (010) to memory bank seven (111) and from processor six (110) to memory bank four (100). This figure also illustrates an important property of this network. When processor two (010) is communicating with memory bank seven (111), it blocks the path from processor six (110) to memory bank four (100). Communication link AB is used by both communication paths. Thus, in an omega network, access to a memory bank by a processor may disallow access to another memory bank by another processor. Networks with this property are referred to as blocking networks. Figure 2.13. An example of blocking in omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB. Completely-Connected Network In a completely-connected network, each node has a direct communication link to every other node in the network. Figure 2.14(a) illustrates a completely-connected network of eight nodes. This network is ideal in the sense that a node can send a message to another node in a single step, since a communication link exists between them. Completely-connected networks are the static counterparts of crossbar switching networks, since in both networks, the communication between any input/output pair does not block communication between any other pair. Figure 2.14. (a) A completely-connected network of eight nodes; (b) a star connected network of nine nodes. Page 20 of 56 file://C:\Documents and Settings\CIC000\Local Settings\Temp\~hh1AB0.htm 2/8/2013
(a) (b) Star-Connected Network In a star-connected network,one processor acts as the central processor.Every other processor has a communication link connecting it to this processor.Fiqure 2.14(b)shows a star-connected network of nine processors.The star-connected network is similar to bus-based networks.Communication between any pair of processors is routed through the central processor,just as the shared bus forms the medium for all communication in a bus-based network.The central processor is the bottleneck in the star topology. Linear Arrays,Meshes,and k-d Meshes Due to the large number of links in completely connected networks,sparser networks are typically used to build parallel computers.A family of such networks spans the space of linear arrays and hypercubes.A linear array is a static network in which each node(except the two nodes at the ends)has two neighbors, one each to its left and right.A simple extension of the linear array (Fiqure 2.15(a))is the ring or a 1-D torus (Figure 2.15(b)).The ring has a wraparound connection between the extremities of the linear array. In this case,each node has two neighbors. Figure 2.15.Linear arrays:(a)with no wraparound links;(b)with wraparound link. (a) (b) A two-dimensional mesh illustrated in Figure 2.16(a)is an extension of the linear array to two-dimensions. Each dimension has P nodes with a node identified by a two-tuple(,)Every node(except those on the periphery)is connected to four other nodes whose indices differ in any dimension by one.A 2-D mesh has the property that it can be laid out in 2-D space,making it attractive from a wiring standpoint. Furthermore,a variety of regularly structured computations map very naturally to a 2-D mesh.For this reason,2-D meshes were often used as interconnects in parallel machines.Two dimensional meshes can be augmented with wraparound links to form two dimensional tori illustrated in Fiqure 2.16(b).The three- dimensional cube is a generalization of the 2-D mesh to three dimensions,as illustrated in Fiqure 2.16(c) Each node element in a 3-D cube,with the exception of those on the periphery,is connected to six other nodes,two along each of the three dimensions.A variety of physical simulations commonly executed on parallel computers(for example,3-D weather modeling,structural modeling,etc.)can be mapped naturally to 3-D network topologies.For this reason,3-D cubes are used commonly in interconnection networks for parallel computers (for example,in the Cray T3E). Figure 2.16.Two and three dimensional meshes:(a)2-D mesh with no wraparound; (b)2-D mesh with wraparound link(2-D torus);and(c)a 3-D mesh with no wraparound
Star-Connected Network In a star-connected network, one processor acts as the central processor. Every other processor has a communication link connecting it to this processor. Figure 2.14(b) shows a star-connected network of nine processors. The star-connected network is similar to bus-based networks. Communication between any pair of processors is routed through the central processor, just as the shared bus forms the medium for all communication in a bus-based network. The central processor is the bottleneck in the star topology. Linear Arrays, Meshes, and k-d Meshes Due to the large number of links in completely connected networks, sparser networks are typically used to build parallel computers. A family of such networks spans the space of linear arrays and hypercubes. A linear array is a static network in which each node (except the two nodes at the ends) has two neighbors, one each to its left and right. A simple extension of the linear array (Figure 2.15(a)) is the ring or a 1-D torus (Figure 2.15(b)). The ring has a wraparound connection between the extremities of the linear array. In this case, each node has two neighbors. Figure 2.15. Linear arrays: (a) with no wraparound links; (b) with wraparound link. A two-dimensional mesh illustrated in Figure 2.16(a) is an extension of the linear array to two-dimensions. Each dimension has nodes with a node identified by a two-tuple (i, j). Every node (except those on the periphery) is connected to four other nodes whose indices differ in any dimension by one. A 2-D mesh has the property that it can be laid out in 2-D space, making it attractive from a wiring standpoint. Furthermore, a variety of regularly structured computations map very naturally to a 2-D mesh. For this reason, 2-D meshes were often used as interconnects in parallel machines. Two dimensional meshes can be augmented with wraparound links to form two dimensional tori illustrated in Figure 2.16(b). The threedimensional cube is a generalization of the 2-D mesh to three dimensions, as illustrated in Figure 2.16(c). Each node element in a 3-D cube, with the exception of those on the periphery, is connected to six other nodes, two along each of the three dimensions. A variety of physical simulations commonly executed on parallel computers (for example, 3-D weather modeling, structural modeling, etc.) can be mapped naturally to 3-D network topologies. For this reason, 3-D cubes are used commonly in interconnection networks for parallel computers (for example, in the Cray T3E). Figure 2.16. Two and three dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound. Page 21 of 56 file://C:\Documents and Settings\CIC000\Local Settings\Temp\~hh1AB0.htm 2/8/2013
a b (e) The general class of k-d meshes refers to the class of topologies consisting of d dimensions with k nodes along each dimension.Just as a linear array forms one extreme of the k-d mesh family,the other extreme is formed by an interesting topology called the hypercube.The hypercube topology has two nodes along each dimension and log p dimensions.The construction of a hypercube is illustrated in Fiqure 2.17.A zero- dimensional hypercube consists of 20,i.e.,one node.A one-dimensional hypercube is constructed from two zero-dimensional hypercubes by connecting them.A two-dimensional hypercube of four nodes is constructed from two one-dimensional hypercubes by connecting corresponding nodes.In general a d- dimensional hypercube is constructed by connecting corresponding nodes of two(d-1)dimensional hypercubes.Fiqure 2.17 illustrates this for up to 16 nodes in a 4-D hypercube. Figure 2.17.Construction of hypercubes from hypercubes of lower dimension. 100 110 00 010 101 111 011 0-D hypercube 1-D hypercube 2-D hypercube 3-D hypercube 0100 0110 1100 1110 000 0010 1010 1000 0101 0111 1101 1111 3001 0011 1001 1011 4-D hypercube It is useful to derive a numbering scheme for nodes in a hypercube.A simple numbering scheme can be derived from the construction of a hypercube.As illustrated in Fiqure 2.17,if we have a numbering of two subcubes of p/2 nodes,we can derive a numbering scheme for the cube of p nodes by prefixing the labels of one of the subcubes with a "O"and the labels of the other subcube with a "1".This numbering scheme has the useful property that the minimum distance between two nodes is given by the number of bits that
The general class of k-d meshes refers to the class of topologies consisting of d dimensions with k nodes along each dimension. Just as a linear array forms one extreme of the k-d mesh family, the other extreme is formed by an interesting topology called the hypercube. The hypercube topology has two nodes along each dimension and log p dimensions. The construction of a hypercube is illustrated in Figure 2.17. A zerodimensional hypercube consists of 20, i.e., one node. A one-dimensional hypercube is constructed from two zero-dimensional hypercubes by connecting them. A two-dimensional hypercube of four nodes is constructed from two one-dimensional hypercubes by connecting corresponding nodes. In general a ddimensional hypercube is constructed by connecting corresponding nodes of two (d - 1) dimensional hypercubes. Figure 2.17 illustrates this for up to 16 nodes in a 4-D hypercube. Figure 2.17. Construction of hypercubes from hypercubes of lower dimension. It is useful to derive a numbering scheme for nodes in a hypercube. A simple numbering scheme can be derived from the construction of a hypercube. As illustrated in Figure 2.17, if we have a numbering of two subcubes of p/2 nodes, we can derive a numbering scheme for the cube of p nodes by prefixing the labels of one of the subcubes with a "0" and the labels of the other subcube with a "1". This numbering scheme has the useful property that the minimum distance between two nodes is given by the number of bits that Page 22 of 56 file://C:\Documents and Settings\CIC000\Local Settings\Temp\~hh1AB0.htm 2/8/2013
are different in the two labels.For example,nodes labeled 0110 and 0101 are two links apart,since they differ at two bit positions.This property is useful for deriving a number of parallel algorithms for the hypercube architecture. Tree-Based Networks A tree network is one in which there is only one path between any pair of nodes.Both linear arrays and star-connected networks are special cases of tree networks.Figure 2.18 shows networks based on complete binary trees.Static tree networks have a processing element at each node of the tree(Figure 2.18(a)).Tree networks also have a dynamic counterpart.In a dynamic tree network,nodes at intermediate levels are switching nodes and the leaf nodes are processing elements(Fiqure 2.18(b)). Figure 2.18.Complete binary tree networks:(a)a static tree network;and(b)a dynamic tree network. Processing nodes Switching nodes 6 To route a message in a tree,the source node sends the message up the tree until it reaches the node at the root of the smallest subtree containing both the source and destination nodes.Then the message is routed down the tree towards the destination node. Tree networks suffer from a communication bottleneck at higher levels of the tree.For example,when many nodes in the left subtree of a node communicate with nodes in the right subtree,the root node must handle all the messages.This problem can be alleviated in dynamic tree networks by increasing the number of communication links and switching nodes closer to the root.This network,also called a fat tree,is illustrated in Fiqure 2.19. Figure 2.19.A fat tree network of 16 processing nodes. 2.4.4 Evaluating Static Interconnection Networks We now discuss various criteria used to characterize the cost and performance of static interconnection networks.We use these criteria to evaluate static networks introduced in the previous subsection. Diameter The diameter of a network is the maximum distance between any two processing nodes in the network.The distance between two processing nodes is defined as the shortest path (in terms of number of links)between them.The diameter of a completely-connected network is one,and that of a star- connected network is two.The diameter of a ring network is p/2].The diameter of a two-dimensional
are different in the two labels. For example, nodes labeled 0110 and 0101 are two links apart, since they differ at two bit positions. This property is useful for deriving a number of parallel algorithms for the hypercube architecture. Tree-Based Networks A tree network is one in which there is only one path between any pair of nodes. Both linear arrays and star-connected networks are special cases of tree networks. Figure 2.18 shows networks based on complete binary trees. Static tree networks have a processing element at each node of the tree (Figure 2.18(a)). Tree networks also have a dynamic counterpart. In a dynamic tree network, nodes at intermediate levels are switching nodes and the leaf nodes are processing elements (Figure 2.18(b)). Figure 2.18. Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network. To route a message in a tree, the source node sends the message up the tree until it reaches the node at the root of the smallest subtree containing both the source and destination nodes. Then the message is routed down the tree towards the destination node. Tree networks suffer from a communication bottleneck at higher levels of the tree. For example, when many nodes in the left subtree of a node communicate with nodes in the right subtree, the root node must handle all the messages. This problem can be alleviated in dynamic tree networks by increasing the number of communication links and switching nodes closer to the root. This network, also called a fat tree, is illustrated in Figure 2.19. Figure 2.19. A fat tree network of 16 processing nodes. 2.4.4 Evaluating Static Interconnection Networks We now discuss various criteria used to characterize the cost and performance of static interconnection networks. We use these criteria to evaluate static networks introduced in the previous subsection. Diameter The diameter of a network is the maximum distance between any two processing nodes in the network. The distance between two processing nodes is defined as the shortest path (in terms of number of links) between them. The diameter of a completely-connected network is one, and that of a starconnected network is two. The diameter of a ring network is . The diameter of a two-dimensional Page 23 of 56 file://C:\Documents and Settings\CIC000\Local Settings\Temp\~hh1AB0.htm 2/8/2013