Types of Skew Data-distribution skew:some nodes have many tuples,while others may have fewer tuples.Could occur due to Attribute-value skew. Some partitioning-attribute values appear in many tuples All the tuples with the same value for the partitioning attribute end up in the same partition. Can occur with range-partitioning and hash-partitioning. ·Partition skew. Imbalance,even without attribute -value skew Badly chosen range-partition vector may assign too many tuples to some partitions and too few to others. Less likely with hash-partitioning Database System Concepts-7th Edition 21.12 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 21.12 ©Silberschatz, Korth and Sudarshan th Edition Types of Skew ▪ Data-distribution skew: some nodes have many tuples, while others may have fewer tuples. Could occur due to • Attribute-value skew. ▪ Some partitioning-attribute values appear in many tuples ▪ All the tuples with the same value for the partitioning attribute end up in the same partition. ▪ Can occur with range-partitioning and hash-partitioning. • Partition skew. ▪ Imbalance, even without attribute –value skew ▪ Badly chosen range-partition vector may assign too many tuples to some partitions and too few to others. ▪ Less likely with hash-partitioning
Types of Skew(Cont.) Note that execution skew can occur even without data distribution skew E.g.relation range-partitioned on date,and most queries access tuples with recent dates Data-distribution skew can be avoided with range-partitioning by creating balanced range-partitioning vectors We assume for now that partitioning is static,that is partitioning vector is created once and not changed Any change requires repartitioning Dynamic partitioning once allows partition vector to be changed in a continuous manner More on this later Database System Concepts-7th Edition 21.13 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 21.13 ©Silberschatz, Korth and Sudarshan th Edition Types of Skew (Cont.) ▪ Note that execution skew can occur even without data distribution skew • E.g. relation range-partitioned on date, and most queries access tuples with recent dates ▪ Data-distribution skew can be avoided with range-partitioning by creating balanced range-partitioning vectors ▪ We assume for now that partitioning is static, that is partitioning vector is created once and not changed • Any change requires repartitioning • Dynamic partitioning once allows partition vector to be changed in a continuous manner ▪ More on this later
Handling Skew in Range-Partitioning To create a balanced partitioning vector Sort the relation on the partitioning attribute. Construct the partition vector by scanning the relation in sorted order as follows. After every 1/nth of the relation has been read,the value of the partitioning attribute of the next tuple is added to the partition vector. n denotes the number of partitions to be constructed. Imbalances can result if duplicates are present in partitioning attributes. ■To reduce cost Partitioning vector can be created using a random sample of tuples Alternatively histograms can be used to create the partitioning vector Database System Concepts-7th Edition 21.14 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 21.14 ©Silberschatz, Korth and Sudarshan th Edition Handling Skew in Range-Partitioning ▪ To create a balanced partitioning vector • Sort the relation on the partitioning attribute. • Construct the partition vector by scanning the relation in sorted order as follows. ▪ After every 1/n th of the relation has been read, the value of the partitioning attribute of the next tuple is added to the partition vector. • n denotes the number of partitions to be constructed. • Imbalances can result if duplicates are present in partitioning attributes. ▪ To reduce cost • Partitioning vector can be created using a random sample of tuples • Alternatively histograms can be used to create the partitioning vector
Histograms Histogram on attribute age of relation person 50 40 Kouanbal 30 20 10 Equi-width histograms 1-5 6-1011-1516-2021-25 Equi-depth histograms value break up range such that each range has (approximately)the same number of tuples ·E.9.(4,8,14,19) Assume uniform distribution within each range of the histogram Create partitioning vector for required number of partitions based on histogram Database System Concepts-7th Edition 21.15 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 21.15 ©Silberschatz, Korth and Sudarshan th Edition Histograms ▪ Histogram on attribute age of relation person ▪ Equi-width histograms ▪ Equi-depth histograms • break up range such that each range has (approximately) the same number of tuples • E.g. (4, 8, 14, 19) ▪ Assume uniform distribution within each range of the histogram ▪ Create partitioning vector for required number of partitions based on histogram value frequency 50 40 30 20 10 1–5 6–10 11–15 16–20 21–25