Computing Approximate Quantiles wo gt:O Pomm GKo1 Synopsis structure S: sequence of tuples t, 112……s (v,8s Sorte (vi-d min(vi) rma(v-p) sequence rmin(v)/rmx(v): min/max rank of v g: number of stream elements covered by t Invariants: g1+△1≤27 n()=∑g, +△ min Garofalakis, Gehrke, Rastogi, VLDB02 #26
Garofalakis, Gehrke, Rastogi, VLDB’02 # 26 Computing Approximate Quantiles [GK01] • Synopsis structure S: sequence of tuples • : min/max rank of • : number of stream elements covered by • Invariants: ( , , ) 1 1 1 v g ( , , ) s s s v g Sorted sequence ( ) min i−1 r v ( ) max i ( ) r v max i−1 ( ) r v min i r v i g i s t ,t ,....,t 1 2 1 t ( , , ) i−1 i−1 i−1 v g ( , , ) i i i v g i−1 t i t s t i t i g ( )/ ( ) min i max i r v r v i v gi + i 2n i j i i j j i r vi = g j r v = g + ( ) , ( ) min max
Computing Quantile from Synopsis wm m:O osa Theorem: Let i be the max index such that rmaxvi-dsr+an. Then r-8 n rank(v1)≤r+8n min (vi-d min(vi) P(v max 81+△1≤2m -a1 +n≤r(v -8 mIn Garofalakis, Gehrke, Rastogi, VLDB02 #27
Garofalakis, Gehrke, Rastogi, VLDB’02 # 27 Computing Quantile from Synopsis • Theorem: Let i be the max index such that . Then, ( ) min i−1 r v ( ) max i ( ) r v max i−1 ( ) r v min i r v gi + i 2n r −n r +n ( , , ) 1 1 1 v g ( , , ) i−1 i−1 i−1 v g ( , , ) i i i v g ( , , ) s s s v g 1 t i−1 t i t s t r max (vi−1 ) r +n r −n rank(vi−1 ) r +n ( ) max i r v r min (vi−1 ) r −n
Inserting a Stream Element into the synopsis Let v be the value of the n+ h stream element and t, and t, be tuples n s such that v1≤v<v Inserted tuple ith value :△。。·厘g△12),g.△ g min min min max max +△.≤2a7 Maintains invariants g mn (vi)-Imin (vi-D) max elements per△vaue A, for a tuple is never modif ied, after it is inserted Garofalakis, Gehrke, Rastogi, VLDB02 #28
Garofalakis, Gehrke, Rastogi, VLDB’02 # 28 Inserting a Stream Element into the Synopsis • Let v be the value of the stream element, and and be tuples in S such that • Maintains invariants • elements per value – for a tuple is never modified, after it is inserted ( ) min i−1 r v ( ) max i ( ) r v min i r v gi + i 2n (v,1,2n) ( ) min r v ( ) max r v 1 1 1 ( ) ( ) ( ) ( ) i min i min i 1 i max i min i g = r v − r v = r v − r v − Inserted tuple with value v ( , , ) 1 1 1 v g ( , , ) i−1 i−1 i−1 v g 1 t i−1 t ( , , ) i i i v g ( , , ) s s s v g i t s t i−1 t i t i i v v v −1 2 1 i i th n +1
Overview of Algorithm Analysis GORNELL Partition the A, values into log( 2En)"bands Remember: we need to maintain 8i +A, shen =>tuples in higher bands have more capacity (=max. no. of observations that can be counted in gi) Periodically(every Observations)compress the quantile synopsis in a right-to-left pass E Collapse ti into t(i+1)if: (a)t(i +1)is at a higherA -band than ti, and (b)8+g++A<2an + Maintain our error invariant (v) rmin (vD) (8+81,△4) ∑8k+g+△1≤2mn Theorem: Maximum number of alive"tuples from each A-band is lI 28 Overall space complexity: log( 2En) 28 Garofalakis, Gehrke, Rastogi, VLDB02 #29
Garofalakis, Gehrke, Rastogi, VLDB’02 # 29 Overview of Algorithm & Analysis • Partition the values into “bands” – Remember: we need to maintain => tuples in higher bands have more capacity ( = max. no. of observations that can be counted in ) • Periodically (every observations) compress the quantile synopsis in a right-to-left pass – Collapse ti into t(i+1) if: (a) t(i+1) is at a higher -band than ti, and (b) 2 1 i log( 2n) gi + i 2n i g gi + gi+1 + i+1 2n ( , , ) i+1 i+1 i+1 v g i+1 t ( , , ) i i i v g i t ( , , ) i+1 i + i+1 i+1 v g g i+1 t j j i i i s S : t t .....t t .....t t t ......t 1 2 +1 −1 +1 ( ) min j r v ( ) min i+1 ( ) r v min i r v i+1 gk + g + i+1 2n Maintain our error invariant • Theorem: Maximum number of “alive” tuples from each -band is – Overall space complexity: 2 11 log( 2 ) 2 11 n
Bands A, values split into log( 2en)bands size of band a s 2(adjusted as n increases Bands log( 2en) log( 2En)-1 C 21 △.012 27 Higher bands have higher capacities(due to smaller A, values Maximum value of 4 in band a: (2an-2) Number of elements covered by tuples with bands in [o, ..., a] 2 elements per△, value Garofalakis, Gehrke, Rastogi, VLDB'02 #30
Garofalakis, Gehrke, Rastogi, VLDB’02 # 30 Bands • values split into bands • size of band (adjusted as n increases) • Higher bands have higher capacities (due to smaller values) • Maximum value of in band : • Number of elements covered by tuples with bands in [0, ..., ]: – elements per value i 0 2n 1 2 Bands: log( 2n) 2 1 log( 2n) −1 2 i log( 2n) 2 i i (2 2 ) −1 − n 2 2 1 i