Bloom Filter for Network Security Haipeng Dai haipengdai@nju.edu.cn 313 CS Building Department of Computer Science and Technology Nanjing University
Bloom Filter for Network Security Haipeng Dai haipengdai@nju.edu.cn 313 CS Building Department of Computer Science and Technology Nanjing University
Bloom Filters Given a set S={xi,x,x3,,xn}on a universe☑, want to answer queries of the form: sy∈S. Bloom filter provides an answer in -“Constant”time(time to hash) -Small amount of space. -But with some probability of being wrong. Alternative to hashing with interesting tradeoffs. 2
2 Bloom Filters Given a set S = {x1, x2, x3, …, xn} on a universe U, want to answer queries of the form: Bloom filter provides an answer in ─ “Constant” time (time to hash). ─ Small amount of space. ─ But with some probability of being wrong. Alternative to hashing with interesting tradeoffs. Is y ∈ S
Bloom Filters Start with an m bit array,filled with 0s. B 000 0 0 0 0 0 10 0 0 0 0 Hash each item x;in Sk times.If H(x)=a,set B[a]=1. 0100101001110110 B To check if y is in S,check B at H(y).All k values must be 1. B 0100101001110110 Possible to have a false positive;all k values are 1,but y is not in S. B0100101001110110 n items m cn bits k hash functions 3
3 Bloom Filters Start with an m bit array, filled with 0s. Hash each item xj in S k times. If Hi (xj ) = a, set B[a] = 1. B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 B 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0 To check if y is in S, check B at Hi (y). All k values must be 1. B 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0 B 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0 Possible to have a false positive; all k values are 1, but y is not in S. n items m = cn bits k hash functions
False positive Probability Pr(specific bit of filter is 0)is p'=(1-l/m)m≈em/m=p If p is fraction of 0 bits in the filter then false positive probability is (1-p)≈(1-p')≈(1-p)*=(1-ek1c) Approximations valid as p is concentrated around E[p]. -Martingale argument suffices. Find optimal at k =(In 2)m/n by calculus. -So optimal fpp is about (0.6185)/m n items m cn bits k hash functions 4
4 False Positive Probability Pr(specific bit of filter is 0) is If ρ is fraction of 0 bits in the filter then false positive probability is Approximations valid as ρ is concentrated around E[ρ]. ─ Martingale argument suffices. Find optimal at k = (ln 2)m/n by calculus. ─ So optimal fpp is about (0.6185)m/n k k k k c k (1 ) (1 p') (1 p) (1 e ) − / − ρ ≈ − ≈ − = − n items m = cn bits k hash functions p m p kn kn m ≡ − ≈ ≡ − / ' (1 1/ ) e
Example 0.1 0.09 0.08 0EI 0.07 m/n=8 0.06 0.05 0ptk=8ln2=5.45.… 0.04 0.03 0.02 0.01 0 0 12345678 910 Hash functions n items m cn bits k hash functions 5
5 Example 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0 1 2 3 4 5 6 7 8 9 10 Hash functions False positive rate m/n = 8 Opt k = 8 ln 2 = 5.45... n items m = cn bits k hash functions