Computing Approximate Quantiles Using Samples Problem: Compute element with rank r in stream Simple sampling- based algorithm Sort sample s of stream and return element in position rs/n in sample(s is sample size) With sample of size O(log( ) possible to show that rank of E returned element is in [r-an, r+ an] with probability at least1-8 elements from s-is no more than exp 2se? ontains greater than rs/n Hoeffding s Inequality: probability that s stream: -a1 r+81 Sample s rs/n [CMN98GMP97] propose additional sampling- based methods Garofalakis, Gehrke, Rastogi, VLDB02 #21
Garofalakis, Gehrke, Rastogi, VLDB’02 # 21 Computing Approximate Quantiles Using Samples • Problem: Compute element with rank r in stream • Simple sampling-based algorithm – Sort sample S of stream and return element in position rs/n in sample (s is sample size) – With sample of size , possible to show that rank of returned element is in with probability at least • Hoeffding’s Inequality: probability that S contains greater than rs/n elements from is no more than • [CMN98][GMP97] propose additional sampling-based methods )) 1 log( 1 ( 2 O [r −n,r +n] 1− rs/n Sample S: Stream: r −n r − S r +n − S 2 2 exp− s
Algorithms for Computing GORNELL Approximate quantiles [MRL981,[MRL991,[GK01] propose sophisticated algorithms for computing stream element with rank in [r-an, r+an Space complexity proportional to- instead of E [MRL98],[MRL99] Probabilistic algorithm with space complexity oc"(an) Combined with sampling, space complexity becomes o(kog 6g(3) [GK01] Deterministic algorithm with space complexity o(log( En) Garofalakis, Gehrke, Rastogi, VLDB02 #22
Garofalakis, Gehrke, Rastogi, VLDB’02 # 22 Algorithms for Computing Approximate Quantiles • [MRL98],[MRL99],[GK01] propose sophisticated algorithms for computing stream element with rank in – Space complexity proportional to instead of • [MRL98], [MRL99] – Probabilistic algorithm with space complexity – Combined with sampling, space complexity becomes • [GK01] – Deterministic algorithm with space complexity ))) 1 log( 1 log ( 1 ( 2 O log ( )) 1 ( 2 O n log( )) 1 O( n [r −n,r +n] 2 1 1
Single-Pass Quantile Computation Algorithm [MRL 98 Split memory M into b buffers of size k(M=bk) For each successive set of k elements in stream If free buffer b exists insert k elements into b, set level of b to o Else merge two buffers b and b' at same level l output result of merge into b set level of b to 1+1 insert k elements into b, set level of b to o Output element in position r after making 2 copies of each element in final buffer and sorting them Merge operation (input buffers b and B aT eve Make 2 copies of each element in b and B' Sort copies Output elements in positions j2+2 in sorted sequence, j=0,.k-1 Garofalakis, Gehrke, Rastogi, VLDB02 #23
Garofalakis, Gehrke, Rastogi, VLDB’02 # 23 Single-Pass Quantile Computation Algorithm [MRL 98] • Split memory M into b buffers of size k (M = bk) • For each successive set of k elements in stream – If free buffer B exists • insert k elements into B, set level of B to 0 – Else • merge two buffers B and B’ at same level l • output result of merge into B’, set level of B’ to l+1 • insert k elements into B, set level of B to 0 • Output element in position r after making copies of each element in final buffer and sorting them • Merge operation (input buffers B and B’ at level l) – Make copies of each element in B and B’ – Sort copies – Output elements in positions in sorted sequence, j=0, ..., k-1 l 2 l 2 l l j2 2 1 + +
Single-Pass Algorithm(Example GORNELL M=9,b=3,k=3,r=10 111 3图 557788137 level= 2 国2圆5园9[137 8 level= 1 935 658 49 1level= Computed quantile(r=10) 111133337777 Garofalakis, Gehrke, Rastogi, VLDB'02 #24
Garofalakis, Gehrke, Rastogi, VLDB’02 # 24 Single-Pass Algorithm (Example) • M=9, b=3, k=3, r =10 • Computed quantile (r=10) 9 3 5 1 1 1 1 3 3 3 3 7 7 7 7 2 7 1 6 5 8 4 9 1 level = 0 level = 2 1 3 7 level = 1 1 3 7 1 2 3 5 7 9 1 5 8 1 1 1 1 3 3 5 5 7 7 8 8
Analysis of algorithm 人入 2 Number of elements that are neither definitely small, nor definately ge(b-2)2 Algorithm returns element with rank r, where r(b-2)22≤rsr+(b-2)22 Choose smallest b such that k2>n and bk= Mrofalakis Gehrke Rastogi VLDB 02 #25
Garofalakis, Gehrke, Rastogi, VLDB’02 # 25 Analysis of Algorithm • Number of elements that are neither definitely small, nor definately large: • Algorithm returns element with rank r’, where • Choose smallest b such that and bk = M b 2 ( 2)2 − − b b 2 2 ( 2)2 ( 2)2 − − − − + − b b r b r r b 1 2 b− k n b −1 2