Communication cost Wide area network Communication cost will dominate Low bandwidth Low speed High protocol overhead Most algorithms ignore all other cost components Local area network Communication cost not that dominate Total cost function should be considered
16 Communication Cost Wide area network Communication cost will dominate - Low bandwidth - Low speed - High protocol overhead Most algorithms ignore all other cost components Local area network Communication cost not that dominate Total cost function should be considered
Complexity of Relational algebra Operations Measured by cardinality n and tuples are sorted on comparison attributes Operation Complexity o, (without duplicate elimination) O(n) (with duplicate elimination), GROUP O(nlogn) Join, semijoin, Division,∩,∪, O(nlogn) Cartesian-Product X O(n2)
17 Complexity of Relational Algebra Operations Measured by cardinality n and tuples are sorted on comparison attributes O(n) O(nlogn) O(nlogn) O(n 2 ) Join, Semijoin, Division, , , − , (without duplicate elimination) (with duplicate elimination), GROUP Cartesian-Product X
Types of Query Optimization Exhaustive search ◆Cost- based ◆ Optimal Combinatorial complexity in the number of relations Workable for small solution spaces Heuristics ◆ Not optimal Re-group common sub-expressions Perform selection and projection(o,)first Replace a join by a series of semijoins reorder operations to reduce intermediate relation size Optimize individual operations 18
18 Types of Query Optimization Exhaustive search Cost-based Optimal Combinatorial complexity in the number of relations Workable for small solution spaces Heuristics Not optimal Re-group common sub-expressions Perform selection and projection ( ) first Replace a join by a series of semijoins Reorder operations to reduce intermediate relation size Optimize individual operations ,
Query Optimization granularity Single query at a time Cannot use common intermediate results Multiple queries at a time Efficient if many similar queries 4 Decision space Is much larger
19 Query Optimization Granularity Single query at a time Cannot use common intermediate results Multiple queries at a time Efficient if many similar queries Decision space is much larger
Query Optimization Timing Static Do it at compilation time by using statistics, appropriate for exhaustive search optimized once but executed many times Difficult to estimate the size of the intermediate results o Can amortize over many executions Dynamic Do it at execution time, accurate about the size of the intermediate results, repeated for every execution expensIve 20
20 Query Optimization Timing Static Do it at compilation time by using statistics, appropriate for exhaustive search, optimized once, but executed many times. Difficult to estimate the size of the intermediate results Can amortize over many executions Dynamic Do it at execution time, accurate about the size of the intermediate results, repeated for every execution, expensive