Birthday Paradox n balls are thrown into m bins: event each bin receives 1 balls Suppose that balls are thrown one-by-one: Pr[]=Pr[all n balls are thrown into ditinct bins] chain Pr[the ith ball is thrown into an empty bin rule =1 first i-1 balls are thrown into ditinct bins] (--i0-)
Birthday Paradox Pr[ℰ] = Pr[all n balls are thrown into ditinct bins] = n ∏ i=1 Pr[the ith ball is thrown into an empty bin ∣ first i − 1 balls are thrown into ditinct bins] Suppose that balls are thrown one-by-one: = n ∏ i=1 (1 − i − 1 m ) = n−1 ∏ i=0 (1 − i m) n balls are thrown into m bins: event ℰ: each bin receives ≤ 1 balls chain rule
Birthday Paradox n balls are thrown into m bins: event each bin receives 1 balls (Taylor:1-xe*forx=o(1)) PT61= (-)i。n i0 Formaly::≤i(l-月)setm2 (assuming n <m) whenn=2m in →Pr[8]=(1±o(1)p
Birthday Paradox Pr[ℰ] = n−1 ∏ i=0 (1 − i m) ≈ n−1 ∏ i=0 e− i m ≈ e−n2/2m e−(1+o(1))n2/2m ≤ n−1 ∏ i=0 (1 − i m) ≤ e−(1−o(1))n2/2m (Taylor: 1 − x ≈ e for ) −x x = o(1) Formally: (assuming n ≪ m) when n = 2m ln 1 p ⟹ Pr[ℰ] = (1 ± o(1))p n balls are thrown into m bins: event ℰ: each bin receives ≤ 1 balls
Birthday Paradox n balls are thrown into m bins: event each bin receives 1 balls exacl 0.8 ---approx P阁=计(-) 0.6 n=365 0.4 0.2 0 10 20 30 40 0 60 Formally:e-(1+0(1))n212m 0.02 0 (assuming n m) 0.02 10 2030 4050 60 m when n =12 → Pr[8]=(1±o(1)p
Birthday Paradox Pr[ℰ] = n−1 ∏ i=0 (1 − i m) ≈ n−1 ∏ i=0 e− i m ≈ e−n2/2m e−(1+o(1))n2/2m ≤ n−1 ∏ i=0 (1 − i m) ≤ e−(1−o(1))n2/2m (Taylor: 1 − x ≈ e for ) −x x = o(1) Formally: (assuming n ≪ m) when n = 2m ln 1 p ⟹ Pr[ℰ] = (1 ± o(1))p n balls are thrown into m bins: event ℰ: each bin receives ≤ 1 balls
Data Structure for Set Data:a set S of n itemsx,...,=[N] Query:an itemx∈UU Determine whether x E S. Space cost:size of data structure (in bits) entropy of a set:O(n log N)bits (when N>n) Time cost:time to answer a query (in memory accesses) Balanced tree:O(n log N space,O(log n)time Perfect hashing:O(n log N space,O(1)time
Data Structure for Set Data: a set of items Query: an item Determine whether . S n x1, x2, …, xn ∈ U = [N] x ∈ U x ∈ S • Space cost: size of data structure (in bits) • entropy of a set: O(n log N) bits (when N≫n) • Time cost: time to answer a query (in memory accesses) • Balanced tree: O(n log N) space, O(log n) time • Perfect hashing: O(n log N) space, O(1) time
Perfect Hashing S=fa,b,c,d,e,f}C [N]of size n uniform no collision random h [Ny→ml Pr[perfect]e-n22m 1/2 m n2 Table T: e b d ca Birthday SUHA:Simple Uniform Hash Assumption Query(x): retrieve hash function h; check whether TTh(x)]=x;
Perfect Hashing S = {a, b, c, d, e, f} ⊆ [N] of size n e b d f c a h Table T: m uniform random Pr[perfect] Birthday = n2 [N] ! [m] SUHA: Simple Uniform Hash Assumption no collision Query(x): retrieve hash function h; check whether T[h(x)] = x; ≈ e−n2/2m > 1/2