A Motivation: Distributed Machine Learning ●Data are stored in a distributed system. ·Sampling from a probabilistic graphical model (e.g.the Markov random field)by distributed algorithms
A Motivation: Distributed Machine Learning • Data are stored in a distributed system. • Sampling from a probabilistic graphical model (e.g. the Markov random field) by distributed algorithms
Glauber Dynamics starting from an arbitrary Xo E [g] G(V,E) transition for XX+1: pick a uniform random vertex v; resample X(v)according to the marginal distribution induced by u at vertex v conditioning on X(N(v)); marginal distribution: bu(c)ΠueN回A(u,(Xu,) Prxet()) MRF:o∈g', u(a) ΠAe(o,o)Πb(o) e=(u,v)EE uV stationary distribution:u mixing time::Tmix=max min{t|drv(Xt,))≤2e} Xo
Glauber Dynamics G(V,E): pick a uniform random vertex v; resample X(v) according to the marginal distribution induced by µ at vertex v conditioning on Xt(N(v)); starting from an arbitrary X0 ∈ [q]V transition for Xt → Xt+1 : marginal distribution: Pr[Xv = x | XN(v)] = bv(x) Q u2N(v) A(u,v)(Xu, x) P y2[q] bv(y) Q u2N(v) A(u,v)(Xu, y) Ae bv v µ() / Y e=(u,v)2E Ae(u, v) Y v2V bv(v) MRF: 8 2 [q] V , stationary distribution: µ mixing time: ⌧mix = max X0 min t | dTV(Xt, µ) 1 2e
Mixing of Glauber Dynamics influence matrix [Pv,uv,uEv: Pv.u:max discrepancy (in total variation distance)of marginal distributions at v caused by any pair o,t of boundary conditions that differ only at u Dobrushin's condition: contraction of one-step plo=ms∑peu≤1-e optimal coupling in the worst u∈V case w.r.t.Hamming distance Theorem (Dobrushin'70;Jerrum'95;Salas,Sokal'97): Dobrushin's Tmix =O(nlogn) condition for Glauber dynamics for g-coloring: Dobrushin's 92(2+ε)△ condition △=max-degree
Mixing of Glauber Dynamics for q-coloring: Dobrushin’s q≥(2+ε)Δ condition Δ = max-degree u v influence matrix {⇢v,u }v,u2V : ρv,u: max discrepancy (in total variation distance) of marginal distributions at v caused by any pair σ,τ of boundary conditions that differ only at u Dobrushin’s condition: k⇢k1 = max v2V X u2V ⇢v,u 1 ✏ contraction of one-step optimal coupling in the worst case w.r.t. Hamming distance Theorem (Dobrushin ’70; Jerrum ’95; Salas, Sokal ’97): Dobrushin’s condition for Glauber dynamics ⌧mix = O (n log n)
Parallelization Glauber dynamics: starting from an arbitrary Xo E [g] G(V,E): transition for XX+1: pick a uniform random vertex v; resample X(v)according to the marginal distribution induced by u at vertex v conditioning on Xi(N(v)); Parallelization: ● Chromatic scheduler [folklore][Gonzalez et al.,AISTAT'11]: Vertices in the same color class are updated in parallel. "Hogwild!"[Niu,Recht,Re,Wright,NIPS'11]De Sa,Olukotun,Re,ICML16]: All vertices are updated in parallel,ignoring concurrency issues
Parallelization G(V,E): v Glauber dynamics: Parallelization: • Chromatic scheduler [folklore] [Gonzalez et al., AISTAT’11]: Vertices in the same color class are updated in parallel. • “Hogwild!” [Niu, Recht, Ré, Wright, NIPS’11][De Sa, Olukotun, Ré, ICML’16]: All vertices are updated in parallel, ignoring concurrency issues. pick a uniform random vertex v; resample X(v) according to the marginal distribution induced by µ at vertex v conditioning on Xt(N(v)); starting from an arbitrary X0 ∈ [q]V transition for Xt → Xt+1 :
Warm-up:When Luby meets Glauber starting from an arbitrary Xo E [g] at each step,for each vertex vV: GV,E) independently sample a random Luby number B.∈[0,1]; step if B,is locally maximum among its neighborhood Nv): resample X(v)according to the Glauber step marginal distribution induced by u at vertex v conditioning on X(Nv)); Luby step:Independently sample a random independent set. Glauber step:For independent set vertices,update correctly according to the current marginal distributions. Stationary distribution:the Gibbs distribution u
Warm-up: When Luby meets Glauber G(V,E): resample X(v) according to the marginal distribution induced by µ at vertex v conditioning on Xt(N(v)); at each step, for each vertex v∈V: starting from an arbitrary X0 ∈ [q]V independently sample a random number βv∈[0,1]; if βv is locally maximum among its neighborhood N(v): Luby step Glauber step • Luby step: Independently sample a random independent set. • Glauber step: For independent set vertices, update correctly according to the current marginal distributions. • Stationary distribution: the Gibbs distribution µ