Distributed Query processing Plans By centralized optimization Z ENAME(E DA ENO O-TILE-"'manager (G) Two distributed query processing plans
11 Distributed Query Processing Plans By centralized optimization, Two distributed query processing plans 11 (E (G)) ENAME ENO TITLE = "manager
Distributed Query Plan I Plan To transport all segments to query site and execute there Site 5 Result=(EMP1∪EMP2)凶 ENO OTITLE“ manager"(ASG1∪ASG2) ASG ASG EMPI EMP Site 1 Site 2 Site 3 Site 4 This causes too much network traffic, very costly
12 Distributed Query Plan I Result = (EMP1 EMP2 ) ENO TITLE=“manager” (ASG1 ASG2 ) Site 1 ASG1 ASG2 Site 2 Site 3 EMP1 EMP2 Site 4 Plan I: To transport all segments to query site and execute there. This causes too much network traffic, very costly. Site 5 ⋈
Distributed Query Plan II Plan Il(optimized) Site 5 Result=(EMP1∪EMP2”) EMP EMPO Site 3 Site 4 EMP1,=EMP1凶OASG EMP2=EMP2凶 ENO ASG2 ASGI ASG Site 1 Site 2 ASG1=O TITLE-"manager"(ASG1 ASG2'=O TITLE manager"(ASG l
13 Distributed Query Plan II Plan II (Optimized): Site 1 ASG2 ASG ’ 1 ’ EMP2 EMP1 ’ ’ ASG1 ’ = TITLE=“manager” (ASG1 ) Site 2 ASG2 ’ = TITLE=“manager” (ASG2 ) Site 3 EMP1 ’ = EMP1 ENO ASG1 ’ Site 4 EMP2 ’ = EMP2 ENO ASG2 ’ Result = (EMP1 ’ EMP2 ’) Site 5 ⋈ ⋈
Costs of the Two plans Assume size(EMP)=400, size(ASG)=1000, 20 tuples with TITLE="manager tuple access cost=1 unit; tuple transfer cost =10 units ASG and EMP are locally clustered on attribute TITLE and ENO, respectively Plan 1 Transfer EMP to site 5: 400*tuple transfer cost 4000 Transfer ASG to site 5: 1000*tuple transfer cost 10000 Produce ASG: 1000*tuple access cost 1000 Join EMP and ASG: 400*20 tuple access cost 8000 Total cost 23.000 Plan 2 Produce ASG:(10+10)*tuple access cost Transfer ASG to the sites of EMP:(10+10) *tuple transfer cost 200 Produce EMP:(10+10)*tuple access cost* 2 Transfer EMP to result site: (10+ 10)*tuple transfer cost 200 Total cost 460
14 Costs of the Two Plans Assume: size(EMP)=400, size(ASG)=1000, 20 tuples with TITLE=“manager” tuple access cost = 1 unit; tuple transfer cost = 10 units ASG and EMP are locally clustered on attribute TITLE and ENO, respectively. Plan 1 Transfer EMP to site 5: 400*tuple transfer cost 4000 Transfer ASG to site 5: 1000*tuple transfer cost 10000 Produce ASG’: 1000*tuple access cost 1000 Join EMP and ASG’: 400*20*tuple access cost 8000 Total cost 23,000 Plan 2 Produce ASG’: (10+10)*tuple access cost 20 Transfer ASG’ to the sites of EMP: (10+10)*tuple transfer cost 200 Produce EMP’: (10+10)*tuple access cost * 2 40 Transfer EMP’ to result site: (10+10)*tuple transfer cost 200 Total cost 460
Query Optimization Objectives Minimize a cost function yO cost CPU cost communication cost These might have different weights in different distributed environments Can also maximize throughout
15 Query Optimization Objectives Minimize a cost function I/O cost + CPU cost + communication cost These might have different weights in different distributed environments Can also maximize throughout