DYNAMIC INFERENCE IN PROBABILISTIC GRAPHICAL MODELS

WEIMING FENG, KUN HE, XIAOMING SUN, AND YITONG YIN

Abstract. Probabilistic graphical models, such as Markov random fields (MRFs), are useful for describing high-dimensional distributions in terms of local dependence structures. Probabilistic inference is a fundamental problem for graphical models, and sampling is a main approach to it. In this paper, we study probabilistic inference problems when the graphical model itself is changing dynamically with time. Such dynamic inference problems arise naturally in today's applications, e.g. multivariate time-series data and practical learning procedures.

We give a dynamic algorithm for sampling-based probabilistic inference in MRFs, where each dynamic update can change the underlying graph and all parameters of the MRF simultaneously, as long as the total amount of changes is bounded. Suppose that the MRF has n variables and bounded maximum degree, and that N(n) independent samples are sufficient for the inference, for a polynomial function N(·). Our algorithm dynamically maintains an answer to the inference problem using Õ(nN(n)) space cost and Õ(N(n) + n) incremental time cost upon each update to the MRF, as long as the Dobrushin-Shlosman condition is satisfied by the MRFs. This well-known condition has long been used to guarantee the efficiency of Markov chain Monte Carlo (MCMC) sampling in the traditional static setting. Compared to the static case, which requires Ω(nN(n)) time cost for redrawing all N(n) samples whenever the MRF changes, our dynamic algorithm gives a significant Ω̃(min{n, N(n)})-factor speedup. Our approach relies on a novel dynamic sampling technique, which transforms traditional local Markov chains (a.k.a. single-site dynamics) into dynamic sampling algorithms, in a scenario where all parameters of the graphical model are subject to simultaneous update at each time.
(Weiming Feng, Yitong Yin) State Key Laboratory for Novel Software Technology, Nanjing University. E-mail: fengwm@smail.nju.edu.cn, yinyt@nju.edu.cn
(Kun He) Shenzhen University; Shenzhen Institute of Computing Sciences. E-mail: hekun.threebody@foxmail.com
(Xiaoming Sun) CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences. E-mail: sunxiaoming@ict.ac.cn
This research is supported by the National Key R&D Program of China 2018YFB1003202 and the National Science Foundation of China under Grant Nos. 61722207 and 61672275.
Contents
1 Introduction 1
  1.1 Our results 1
  1.2 Related work 2
  1.3 Organization of the paper 2
2 Dynamic inference problem 3
  2.1 Markov random fields 3
  2.2 Probabilistic inference and sampling 3
  2.3 Dynamic inference problem 4
3 Main results 5
4 Preliminaries 6
5 Outlines of algorithm 8
6 Dynamic Gibbs sampling 9
  6.1 Coupling for dynamic instances 10
  6.2 Data structure for Gibbs sampling 17
  6.3 Single-sample dynamic Gibbs sampling algorithm 18
  6.4 Multi-sample dynamic Gibbs sampling algorithm 21
7 Proofs for dynamic Gibbs sampling 24
  7.1 Analysis of the couplings 24
  7.2 Implementation of the algorithms 29
  7.3 Dynamic Gibbs sampling for specific models 30
8 Proofs for dynamic inference 37
  8.1 Proof of the main theorem 37
  8.2 Dynamic inference on specific models 37
9 Conclusion 38
References 38
1. Introduction

Probabilistic graphical models provide a rich language for describing high-dimensional distributions in terms of the dependence structures between random variables. The Markov random field (MRF) is a basic graphical model that encodes pairwise interactions of complex systems. Given a graph G = (V, E), each vertex v ∈ V is associated with a function φv : Q → R, called the vertex potential, on a finite domain Q = [q] of q spin states, and each edge e ∈ E is associated with a symmetric function φe : Q² → R, called the edge potential, which describes a pairwise interaction. Together, these induce a probability distribution µ over all configurations σ ∈ Q^V:

  µ(σ) ∝ exp(H(σ)) = exp( Σ_{v∈V} φv(σv) + Σ_{e={u,v}∈E} φe(σu, σv) ).

This distribution µ is known as the Gibbs distribution and H(σ) is the Hamiltonian. It arises naturally from various physical models, statistics and learning problems, and combinatorial problems in computer science [MM09, KFB09].

Probabilistic inference is one of the most fundamental computational problems in graphical models. Some basic inference problems ask to calculate the marginal distribution, conditional distribution, or maximum-a-posteriori probabilities of one or several random variables [WJ08]. Sampling is perhaps the most widely used approach for probabilistic inference. Given a graphical model, independent samples are drawn from the Gibbs distribution and certain statistics are computed using the samples to give estimates of the inferred quantity. For most typical inference problems, such statistics are easy to compute once the samples are given. For instance, to estimate the marginal distribution on a variable subset S, the statistic is the frequency of each configuration in Q^S among the samples; thus the cost of inference is dominated by the cost of generating random samples [JVV86, ŠVV09]. The classic probabilistic inference assumes a static setting, where the input graphical model is fixed.
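As a concrete illustration of the definitions above, the Gibbs distribution of a tiny MRF can be computed by brute force. The sketch below uses a 3-vertex path with arbitrary illustrative potentials; the specific graph and potential values are our own choices for exposition, not taken from the paper.

```python
import itertools
import math

# Toy MRF on a 3-vertex path with q = 2 spin states; the potentials
# below are arbitrary illustrative choices, not from the paper.
V = [0, 1, 2]
E = [(0, 1), (1, 2)]

def phi_v(v, c):
    return 0.1 * c  # vertex potential phi_v : Q -> R

def phi_e(a, b):
    return 0.5 if a == b else -0.5  # symmetric edge potential phi_e : Q^2 -> R

def hamiltonian(sigma):
    """H(sigma) = sum_v phi_v(sigma_v) + sum_{e={u,v}} phi_e(sigma_u, sigma_v)."""
    return (sum(phi_v(v, sigma[v]) for v in V)
            + sum(phi_e(sigma[u], sigma[v]) for u, v in E))

# Gibbs distribution mu(sigma) = exp(H(sigma)) / Z over all q^n configurations.
configs = list(itertools.product(range(2), repeat=len(V)))
Z = sum(math.exp(hamiltonian(s)) for s in configs)  # partition function
mu = {s: math.exp(hamiltonian(s)) / Z for s in configs}
```

Brute-force normalization costs q^n time, which is exactly why sampling-based inference is the practical route for large models.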
In today's applications, dynamically changing graphical models arise naturally in many scenarios. In various practical algorithms for learning graphical models, e.g. the contrastive divergence algorithm for learning the restricted Boltzmann machine [Hin12] and the iterative proportional fitting algorithm for maximum likelihood estimation of graphical models [WJ08], the optimal model I* is obtained by updating the parameters of the graphical model iteratively (usually by gradient descent), which generates a sequence of graphical models I₁, I₂, ..., I_M, with the goal that I_M is a good approximation of I*. Also, in the study of multivariate time-series data, the dynamic Gaussian graphical models [CW+07], multiregression dynamic models [QS93], dynamic graphical models [QS92], and dynamic chain graph models [AQ+17] are all dynamically changing graphical models and have been used in a variety of applications. Meanwhile, with the advent of Big Data, scalable machine learning systems need to deal with continuously evolving graphical models (see e.g. [RKD+19] and [SWA09]).

Theoretical studies of probabilistic inference in dynamically changing graphical models are lacking. In the aforementioned practical scenarios, it is common that a sequence of graphical models is presented over time, where any two consecutive graphical models can differ from each other in all potentials, but by a small total amount. Recomputing the inference problem from scratch every time the graphical model changes gives a correct solution, but is very wasteful. A fundamental question is whether probabilistic inference can be solved dynamically and efficiently.

In this paper, we study the problem of probabilistic inference in an MRF when the MRF itself is changing dynamically with time. At each time, the whole graphical model, including all vertices and edges as well as their potentials, is subject to changes.
Such non-local updates are very general and cover all applications mentioned above. The problem of dynamic inference then asks to maintain a correct answer to the inference in a dynamically changing MRF with low incremental cost, proportional to the amount of changes made to the graphical model at each time.

1.1. Our results. We give a dynamic algorithm for sampling-based probabilistic inference. Given an MRF instance with n vertices, suppose that N(n) independent samples are sufficient to give an approximate solution to the inference problem, where N : ℕ⁺ → ℕ⁺ is a polynomial function. We give dynamic algorithms for general inference problems on dynamically changing MRFs.
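For marginal inference, the per-sample statistic described earlier is just frequency counting over the variable subset. A minimal sketch follows; the dict-based sample format and the function name are illustrative assumptions, not the paper's notation.

```python
from collections import Counter

def estimate_marginal(samples, S):
    """Estimate mu_S by the empirical frequency of each configuration,
    restricted to the variable subset S, among independent samples."""
    counts = Counter(tuple(sample[v] for v in S) for sample in samples)
    return {cfg: c / len(samples) for cfg, c in counts.items()}

# Hypothetical samples over variables {0, 1, 2}, e.g. drawn by a Gibbs sampler.
samples = [
    {0: 1, 1: 0, 2: 1},
    {0: 1, 1: 1, 2: 1},
    {0: 0, 1: 0, 2: 0},
    {0: 1, 1: 0, 2: 1},
]
est = estimate_marginal(samples, [0, 2])
# est == {(1, 1): 0.75, (0, 0): 0.25}
```

Since the statistic itself is cheap, the total inference cost is dominated by generating the N(n) samples, which is what the dynamic algorithm maintains incrementally.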
Suppose that the current MRF has n vertices and bounded maximum degree, and each update to the MRF may change the underlying graph or all vertex/edge potentials, as long as the total amount of changes is bounded. Our algorithm maintains an approximate solution to the inference with Õ(nN(n)) space cost, and with Õ(N(n) + n) incremental time cost upon each update, assuming that the MRFs satisfy the Dobrushin-Shlosman condition [DS85a, DS85b, DS87]. The condition has been widely used to imply the efficiency of Markov chain Monte Carlo (MCMC) sampling (e.g. see [Hay06, DGJ08]). Compared to the static algorithm, which requires Ω(nN(n)) time for redrawing all N(n) samples each time, our dynamic algorithm significantly improves the time cost with an Ω̃(min{n, N(n)})-factor speedup.

On specific models, the Dobrushin-Shlosman condition has been established in the literature, which directly gives us the following efficient dynamic inference algorithms, with Õ(nN(n)) space cost and Õ(N(n) + n) time cost per update, on graphs with n vertices and maximum degree Δ = O(1):
• for the Ising model with temperature β satisfying exp(−2|β|) > 1 − 2/(Δ+1), which is close to the uniqueness threshold exp(−2|β_c|) = 1 − 2/Δ, beyond which the static versions of the sampling and marginal inference problems for the anti-ferromagnetic Ising model are intractable [GŠV16, GŠV15];
• for the hardcore model with fugacity λ < 2/(Δ−2), which matches the best bound known for sampling algorithms with near-linear running time on general graphs with bounded maximum degree [Vig99, LV99, DG00];
• for proper q-coloring with q > 2Δ, which matches the best bound known for sampling algorithms with near-linear running time on general graphs with bounded maximum degree [Jer95].

Our dynamic inference algorithm is based on a dynamic sampling algorithm, which efficiently maintains N(n) independent samples for the current MRF while the MRF is subject to changes.
More specifically, we give a dynamic version of the Gibbs sampling algorithm, a local Markov chain for sampling from the Gibbs distribution that has been studied extensively. Our techniques are based on: (1) couplings for dynamic instances of graphical models; and (2) dynamic data structures for representing single-site Markov chains so that the couplings can be realized algorithmically in sub-linear time. Both techniques are of independent interest and can be naturally extended to more general settings with multi-body interactions.

Our results show that on dynamically changing graphical models, sampling-based probabilistic inference can be solved significantly faster than rerunning the static algorithm each time. This has practical significance in speeding up iterative procedures for learning graphical models.

1.2. Related work. The problem of dynamic sampling from graphical models was introduced very recently in [FVY19]. There, a dynamic sampling algorithm was given for sampling from graphical models with soft constraints, which can only deal with local updates that change a single vertex or edge at each time. Moreover, the regimes in which that dynamic sampling algorithm is efficient, with small incremental cost, are much more restrictive than the conditions for the rapid mixing of Markov chains. Our algorithm greatly improves the regimes for efficient dynamic sampling for the Ising and hardcore models in [FVY19], and for the first time can handle non-local updates that change all vertex/edge potentials simultaneously. Besides, dynamic/online sampling from log-concave distributions was also studied in [NR17, LMV19].

Another related topic is dynamic graph problems, which ask to maintain a solution (e.g. spanners [FG19, NSWN17, WN17] or shortest paths [BC16, HKN16, HKN14]) while the input graph is dynamically changing.
More recently, important progress has been made on dynamically maintaining structures related to graph random walks, such as spectral sparsifiers [DGGP19, ADK+16] or effective resistances [DGGP18, GHP18]. Instead of one particular solution, dynamic inference problems ask to maintain statistics of an exponential-sized probability space described by a dynamically changing graphical model.

1.3. Organization of the paper. In Section 2, we formally introduce the dynamic inference problem. In Section 3, we formally state the main results. Preliminaries are given in Section 4. In Section 5, we outline our dynamic inference algorithm. In Section 6, we present the algorithms for dynamic Gibbs sampling. The analyses of these dynamic sampling algorithms are given in Section 7. The proof of the main theorem on dynamic inference is given in Section 8. The conclusion is given in Section 9.
2. Dynamic inference problem

2.1. Markov random fields. An instance of Markov random field (MRF) is specified by a tuple I = (V, E, Q, Φ), where G = (V, E) is an undirected simple graph; Q is a domain of q = |Q| spin states, for some finite q > 1; and Φ = (φa)_{a∈V∪E} associates with each v ∈ V a vertex potential φv : Q → R and with each e ∈ E an edge potential φe : Q² → R, where φe is symmetric.

A configuration σ ∈ Q^V maps each vertex v ∈ V to a spin state in Q, so that each vertex can be interpreted as a variable. The Hamiltonian of a configuration σ ∈ Q^V is defined as:

  H(σ) ≜ Σ_{v∈V} φv(σv) + Σ_{e={u,v}∈E} φe(σu, σv).

This defines the Gibbs distribution µ_I, which is a probability distribution over Q^V such that

  ∀σ ∈ Q^V: µ_I(σ) = (1/Z) exp(H(σ)),

where the normalizing factor Z ≜ Σ_{σ∈Q^V} exp(H(σ)) is called the partition function.

The Gibbs measure µ(σ) can be 0, as the functions φv, φe can take the value −∞. A configuration σ is called feasible if µ(σ) > 0. To trivialize the problem of constructing a feasible configuration, we further assume the following natural condition for the MRF instances considered in this paper:¹

  (1) ∀v ∈ V, ∀σ ∈ Q^{Γ_G(v)}: Σ_{c∈Q} exp( φv(c) + Σ_{u∈Γ_G(v)} φ_{uv}(σu, c) ) > 0,

where Γ_G(v) ≜ {u ∈ V | {u, v} ∈ E} denotes the neighborhood of v in graph G = (V, E).

Some well studied typical MRFs include:
• Ising model: The domain of each spin is Q = {−1, +1}. Each edge e ∈ E is associated with a temperature βe ∈ R, and each vertex v ∈ V is associated with a local field hv ∈ R. For each configuration σ ∈ {−1, +1}^V, µ_I(σ) ∝ exp( Σ_{e={u,v}∈E} βe σu σv + Σ_{v∈V} hv σv ).
• Hardcore model: The domain is Q = {0, 1}. Each configuration σ ∈ Q^V indicates an independent set in G = (V, E), and µ_I(σ) ∝ λ^‖σ‖, where λ > 0 is the fugacity parameter.
• Proper q-coloring: the uniform distribution over all proper q-colorings of G = (V, E).

2.2. Probabilistic inference and sampling.
In graphical models, the task of probabilistic inference is to derive probabilities regarding one or more random variables of the model. Abstractly, this is described by a function θ : M → R^K that maps each graphical model I ∈ M to a target K-dimensional probability vector, where M is the class of graphical models containing the random variables we are interested in, and the K-dimensional vector describes the probabilities we want to derive. Given θ(·) and an MRF instance I ∈ M, the inference problem asks to estimate the probability vector θ(I).

Here are some fundamental inference problems [WJ08] for MRF instances. Let I = (V, E, Q, Φ) be an MRF instance and A, B ⊆ V two disjoint sets, where A ⊎ B ⊆ V.

• Marginal inference: estimate the marginal distribution µ_{A,I}(·) of the variables in A, where

  ∀σ_A ∈ Q^A: µ_{A,I}(σ_A) ≜ Σ_{τ∈Q^{V∖A}} µ_I(σ_A, τ).

• Posterior inference: given any τ_B ∈ Q^B, estimate the posterior distribution µ_{A,I}(· | τ_B) for the variables in A, where

  ∀σ_A ∈ Q^A: µ_{A,I}(σ_A | τ_B) ≜ µ_{A∪B,I}(σ_A, τ_B) / µ_{B,I}(τ_B).

¹This condition guarantees that the marginal probabilities are always well-defined, and the problem of constructing a feasible configuration σ, where µ_I(σ) > 0, is trivial. The condition holds for all MRFs with soft constraints, or with hard constraints where there is a permissive spin, e.g. the hardcore model. For MRFs with truly repulsive hard constraints such as proper q-coloring, the condition may translate to the condition q ≥ Δ + 1, where Δ is the maximum degree of graph G, which is necessary for the irreducibility of local Markov chains for q-colorings.
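For intuition, on a tiny instance the marginal µ_{A,I} defined above can be computed exactly by summing µ_I over Q^{V∖A}. The sketch below does this for the hardcore model of Section 2.1; the graph, fugacity, and function names are our own illustrative choices.

```python
import itertools
from collections import defaultdict

def hardcore_marginal(V, E, lam, A):
    """Brute-force marginal mu_A(sigma_A) = sum_tau mu(sigma_A, tau) for the
    hardcore model, where mu(sigma) is proportional to lam^{||sigma||} over
    independent sets of G = (V, E)."""
    weights = defaultdict(float)
    Z = 0.0
    for sigma in itertools.product([0, 1], repeat=len(V)):
        occ = {v for v, s in zip(V, sigma) if s == 1}
        if any(u in occ and v in occ for u, v in E):
            continue  # sigma does not indicate an independent set: mu(sigma) = 0
        w = lam ** len(occ)
        Z += w
        weights[tuple(sigma[V.index(v)] for v in A)] += w
    return {cfg: w / Z for cfg, w in weights.items()}

# Path 0-1-2 with fugacity 1: independent sets are {}, {0}, {1}, {2}, {0,2}, so Z = 5,
# and vertex 1 is occupied in exactly one of the five sets.
mu1 = hardcore_marginal([0, 1, 2], [(0, 1), (1, 2)], 1.0, [1])
```

This enumeration is exponential in |V|; the paper's point is precisely that sampling-based estimates of such marginals can be maintained dynamically without recomputation from scratch.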