I/O Parallelism (Cont.) Partitioning techniques(cont.): 。Range partitioning: Choose an attribute as the partitioning attribute. A partitioning vector [vo,v1,...vn-2]is chosen. Let v be the partitioning attribute value of a tuple.Tuples such that v< vit1 go to node /1.Tuples with v<vo go to node 0 and tuples with v> Vn-2 go to node n-1. E.g.,with a partitioning vector [5,11] a tuple with partitioning attribute value of 2 will go to node 0, a tuple with value 8 will go to node 1,while a tuple with value 20 will go to node2. Database System Concepts-7th Edition 21.7 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 21.7 ©Silberschatz, Korth and Sudarshan th Edition I/O Parallelism (Cont.) Partitioning techniques (cont.): ▪ Range partitioning: • Choose an attribute as the partitioning attribute. • A partitioning vector [vo , v1 , ..., vn-2 ] is chosen. • Let v be the partitioning attribute value of a tuple. Tuples such that vi vi+1 go to node I + 1. Tuples with v < v0 go to node 0 and tuples with v vn-2 go to node n-1. E.g., with a partitioning vector [5,11] ▪ a tuple with partitioning attribute value of 2 will go to node 0, ▪ a tuple with value 8 will go to node 1, while ▪ a tuple with value 20 will go to node2
Comparison of Partitioning Techniques Evaluate how well partitioning techniques support the following types of data access: 1.Scanning the entire relation. 2.Locating a tuple associatively-point queries. ■E.g,rA=25. 3. Locating all tuples such that the value of a given attribute lies within a specified range-range queries. ■E.g,10≤r.A<25. Do above evaluation for each of ·Round robin ·Hash partitioning ·Range partitioning Database System Concepts-7th Edition 21.8 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 21.8 ©Silberschatz, Korth and Sudarshan th Edition Comparison of Partitioning Techniques ▪ Evaluate how well partitioning techniques support the following types of data access: 1. Scanning the entire relation. 2. Locating a tuple associatively – point queries. ▪ E.g., r.A = 25. 3. Locating all tuples such that the value of a given attribute lies within a specified range – range queries. ▪ E.g., 10 r.A < 25. ▪ Do above evaluation for each of • Round robin • Hash partitioning • Range partitioning
Comparison of Partitioning Techniques(Cont.) Round robin: Best suited for sequential scan of entire relation on each query. All nodes have almost an equal number of tuples;retrieval work is thus well balanced between nodes. All queries must be processed at all nodes Hash partitioning: Good for sequential access Assuming hash function is good,and partitioning attributes form a key, tuples will be equally distributed between nodes Good for point queries on partitioning attribute Can lookup single node,leaving others available for answering other queries. Range queries inefficient,must be processed at all nodes Database System Concepts-7th Edition 21.9 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 21.9 ©Silberschatz, Korth and Sudarshan th Edition Comparison of Partitioning Techniques (Cont.) Round robin: ▪ Best suited for sequential scan of entire relation on each query. • All nodes have almost an equal number of tuples; retrieval work is thus well balanced between nodes. ▪ All queries must be processed at all nodes Hash partitioning: ▪ Good for sequential access • Assuming hash function is good, and partitioning attributes form a key, tuples will be equally distributed between nodes ▪ Good for point queries on partitioning attribute • Can lookup single node, leaving others available for answering other queries. ▪ Range queries inefficient, must be processed at all nodes
Comparison of Partitioning Techniques(Cont.) Range partitioning: Provides data clustering by partitioning attribute value. Good for sequential access Good for point queries on partitioning attribute:only one node needs to be accessed. For range queries on partitioning attribute,one to a few nodes may need to be accessed Remaining nodes are available for other queries. Good if result tuples are from one to a few blocks. But if many blocks are to be fetched,they are still fetched from one to a few nodes,and potential parallelism in disk access is wasted Example of execution skew. Database System Concepts-7th Edition 21.10 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 21.10 ©Silberschatz, Korth and Sudarshan th Edition Comparison of Partitioning Techniques (Cont.) Range partitioning: ▪ Provides data clustering by partitioning attribute value. • Good for sequential access • Good for point queries on partitioning attribute: only one node needs to be accessed. ▪ For range queries on partitioning attribute, one to a few nodes may need to be accessed • Remaining nodes are available for other queries. • Good if result tuples are from one to a few blocks. • But if many blocks are to be fetched, they are still fetched from one to a few nodes, and potential parallelism in disk access is wasted ▪ Example of execution skew
Handling Small Relations Partitioning not useful for small relations which fit into a single disk block or a small number of disk blocks Instead,assign the relation to a single node,or Replicate relation at all nodes For medium sized relations,choose how many nodes to partition across based on size of relation Large relations typically partitioned across all available nodes. Database System Concepts-7th Edition 21.11 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 21.11 ©Silberschatz, Korth and Sudarshan th Edition Handling Small Relations ▪ Partitioning not useful for small relations which fit into a single disk block or a small number of disk blocks • Instead, assign the relation to a single node, or • Replicate relation at all nodes ▪ For medium sized relations, choose how many nodes to partition across based on size of relation ▪ Large relations typically partitioned across all available nodes