Alternative Approach for Bloom Filters Folklore Bloom filter construction. ■ Recall:Given a set S={x3...x on a universe U,want to answer membership queries. ■ Method:Find an n-cell perfect hash function for S. Maps set of n elements to n cells in a 1-1 manner. ■ Then keepog2(1/bit fingerprint of item in each cell. Lookups have false positive & ■ Advantage:each bit/item reduces false positives by a factor of 1/2,vs In 2 for a standard Bloom filter. ▣Negatives: Perfect hash functions non-trivial to find. Cannot handle on-line insertions. 11
11 Alternative Approach for Bloom Filters Folklore Bloom filter construction. Recall: Given a set S = { x1,x 2,x 3,… x n } on a universe U, want to answer membership queries. Method: Find an n-cell perfect hash function for S. Maps set of n elements to n cells in a 1-1 manner. Then keep bit fingerprint of item in each cell. Lookups have false positive < . Advantage: each bit/item reduces false positives by a factor of 1/2, vs ln 2 for a standard Bloom filter. Negatives: Perfect hash functions non-trivial to find. Cannot handle on-line insertions. log 2 ( 1 / )
Perfect Hashing Approach Element 1 Element 2 Element 3 Element 4 Element 5 Fingerprint(4Fingerprint(5Fingerprint(2Fingerprint(1Fingerprint(3) 12
12 Perfect Hashing Approach Element 1 Element 2 Element 3 Element 4 Element 5 Fingerprint(4)Fingerprint(5)Fingerprint(2)Fingerprint(1)Fingerprint(3)
Classic Uses of BF:Spell-Checking Once upon a time,memory was scarce... /usr/dict/words -about 210KB,25K words ▣Use25 KB Bloom filter ■8 bits per word. Optimal 5 hash functions. Probability of false positive about 2% False positive accept a misspelled word BFs still used to deal with list of words Password security [Spafford 1992],[Manber Wu,94] Keyword driven ads in web search engines,etc 13
13 Classic Uses of BF: Spell-Checking Once upon a time, memory was scarce... /usr/dict/words -- about 210KB, 25K words Use 25 KB Bloom filter 8 bits per word. Optimal 5 hash functions. Probability of false positive about 2% False positive = accept a misspelled word BFs still used to deal with list of words Password security [Spafford 1992], [Manber & Wu, 94] Keyword driven ads in web search engines, etc
Classic Uses of BF:Data Bases Join:Combine two tables with a common domain into a single table Semi-join:A join in distributed DBs in which only the joining attribute from one site is transmitted to the other site and used for selection.The selected records are sent back. Bloom-join:A semi-join where we send only a BF of the joining attribute. 14
14 Classic Uses of BF: Data Bases Join: Combine two tables with a common domain into a single table Semi-join: A join in distributed DBs in which only the joining attribute from one site is transmitted to the other site and used for selection. The selected records are sent back. Bloom-join: A semi-join where we send only a BF of the joining attribute
Example Empl Salary Addr City City Cost of living John 60K New York New York 60K George 30K New York Chicago 55K Moe 25K Topeka Topeka 30K Alice 70K Chicago Raul 30K Chicago Create a table of all employees that make 40K and live in city where COL 50K. Empl Salary Addr City COL Join:send (City,COL)for COL>50.Semi-join: send just(City). Bloom-join:send a Bloom filter for all cities with C0L>50 15
15 Example Empl Salary Addr City John 60K … New York George 30K … New York Moe 25K … Topeka Alice 70K … Chicago Raul 30K Chicago City Cost of living New York 60K Chicago 55K Topeka 30K Create a table of all employees that make < 40K and live in city where COL > 50K. Join: send (City, COL) for COL > 50. Semi-join: send just (City). Bloom-join: send a Bloom filter for all cities with COL > 50 Empl Salary Addr City COL