A Modern Application:Distributed Web Caches The Web Web Cache 1 Web Cache 2 Web Cache 3 16
16 A Modern Application: Distributed Web Caches Web Cache 1 Web Cache 2 Web Cache 3
Web Caching Summary Cache:[Fan,Cao,Almeida,Broder] If local caches know each other's content... ..try local cache before going out to Web Sending/updating lists of URLs too expensive. Solution:use Bloom filters. ▣False positives Local requests go unfulfilled. ■ Small cost,big potential gain 17
17 Web Caching Summary Cache: [Fan, Cao, Almeida, & Broder] If local caches know each other’s content... …try local cache before going out to Web Sending/updating lists of URLs too expensive. Solution: use Bloom filters. False positives Local requests go unfulfilled. Small cost, big potential gain
2.Compressed Bloom Filters Insight:Bloom filter is not just a data structure,it is also a message. If the Bloom filter is a message,worthwhile to compress it. Compressing bit vectors is easy. Arithmetic coding gets close to entropy. Can Bloom filters be compressed? 18
18 2. Compressed Bloom Filters Insight: Bloom filter is not just a data structure, it is also a message. If the Bloom filter is a message, worthwhile to compress it. Compressing bit vectors is easy. Arithmetic coding gets close to entropy. Can Bloom filters be compressed?
Optimization,then Compression Optimize to minimize false positive. p=Pr[cell is empty ]=(1-1/m)kn e-knim f=Pr[false pos]=(1-p)(1-e-kon/m)k k=(mln 2)/n is optimal At k=m (In 2)/n,p=1/2. Bloom filter looks like a random string. Can't compress it. 19
19 Optimization, then Compression Optimize to minimize false positive. At k = m (ln 2) /n, p = 1/2. Bloom filter looks like a random string. Can’t compress it. kn kn m p m e / Pr[cell is empty ] ( 1 1 / ) k kn m k f Pr[false pos ] ( 1 p ) ( 1 e ) / k ( mln 2 ) / n is optimal
Compressed Bloom Filters "Error optimized"Bloom filter is V2 full of 0's,1's. Compression would not help. But this optimization for a fixed filter size m. Instead optimize the false positives for a fixed number of transmitted bits. Filter size m can be larger,but mostly 0's Larger,sparser Bloom filter can be compressed. Useful if transmission cost is bottleneck. Claim:transmission cost limiting factor Updates happen frequently. Machine memory is cheap. 20
20 Compressed Bloom Filters “Error optimized” Bloom filter is ½ full of 0’s, 1’s. Compression would not help. But this optimization for a fixed filter size m. Instead optimize the false positives for a fixed number of transmitted bits. Filter size m can be larger, but mostly 0’s Larger, sparser Bloom filter can be compressed. Useful if transmission cost is bottleneck. Claim: transmission cost limiting factor. Updates happen frequently. Machine memory is cheap