BETANCOURT ET AL. 1.2 Markov Kernels Induced From Measure-Preserving Maps Directly constructing a markoy kernel that targets w.let alone an efficient markov kernel,can be difficult.Instead of constructing a kernel directly,however,we can construct space into itself, t:Q→Q,te「, that each preserves the target measure, t,=可, where the pushforward measure,,is defined as (t,)(A)=(ot-1)(A),A∈B(Q). (C,9, which induces a Markov kernel by 9 ra,A)=fa14t(g》, where I is the indicator function, u{ggeQ4e@ In other words,the kernel assignsa probability toset,AB),by computing the measure of the preimage of that set,t(A),averaged over all isomorphisms in I.Because each t preserves the target measure,so too will their convolution and,consequently,the Markov transition induced by the kernel. This construction provides a new perspective on the limited performance of existing algorithms. 9→g+e(<fg+9) f(a E~N(0,) n~U0,1
6 BETANCOURT ET AL. 1.2 Markov Kernels Induced From Measure-Preserving Maps Directly constructing a Markov kernel that targets ̟, let alone an efficient Markov kernel, can be difficult. Instead of constructing a kernel directly, however, we can construct one indirectly be defining a family of measure-preserving maps (Petersen, 1989). Formally, let Γ be some space of continuous, bijective maps, or isomorphisms, from the space into itself, t : Q → Q, ∀t ∈ Γ, that each preserves the target measure, t∗̟ = ̟, where the pushforward measure, t∗̟, is defined as (t∗̟)(A) ≡ ̟ ◦ t −1 (A), ∀A ∈ B(Q). If we can define a σ-algebra, G, over this space then the choice of a distinguished measure over G, γ, defines a probability space, (Γ, G, γ), which induces a Markov kernel by (1) τ (q, A) ≡ Z Γ γ(dt)IA(t(q)), where I is the indicator function, IA(q) ∝ 0, q /∈ A 1, q ∈ A , q ∈ Q, A ∈ B(Q). In other words, the kernel assigns a probability to a set, A ∈ B(Q), by computing the measure of the preimage of that set, t −1 (A), averaged over all isomorphisms in Γ. Because each t preserves the target measure, so too will their convolution and, consequently, the Markov transition induced by the kernel. This construction provides a new perspective on the limited performance of existing algorithms. Example 1. We can consider Gaussian Random Walk Metropolis, for example, as being generated by random, independent translations of each point in the sample space, tǫ,η : q 7→ q + ǫ I η < f(q + ǫ) f(q) ǫ ∼ N (0, Σ) η ∼ U[0, 1]
THE GEOMETRIC FOUNDATIONS OF HAMILTONIAN MONTE CARLO 7 where f is the density of with respect to the lebesgue measure on R".When targeting complex distributions either cor the support of the indicator will be small and the resulting translations barely perturb the initial state. EXAMPLE 2.The random scan Gibbs sampler is induced by axis-aligned translations, tn:9→P() i~U1,,n} n~00,1刂. where is the cumulative distribution function of the ith conditional measure.When the target distribution is strongly correlated,the conditional measures concentrate near the initial q and,as above,the translations are stunted. ) are not li e sions (Okser mple,yield measure-preserving maps tha cross the entire target distribut ion.Unfortunately that diffusion tends to expand across the target measures only slowly(Figure 2):for any finite diffu on time the resulting Langevin kernels are localized around the initial point(Figure 3).What we need are more coherent maps that avoid such diffusive behavior One potential candidate for coherent maps are flows.A flow,is a family of isomor- phisms parameterized by a time,t, o:Q→Q,teR, that form a one-dimensional Lie group on composition, =s+t 1=6t where Ido is the natural identity map on Q.Because the inverse of a map is given only by negating t,as the time is increased the resulting pushes points away from their initial positions and avoids localized exploration (Figure 4).Our final obstacle is in engineering a flow comprised of measure-preserving maps. Flows are particularly natural on the smooth manifolds of differential geometry,and flows that preserve a given target measure can be engineered on one exceptional class of smooth manifolds known as symplectic manifolds.If we can understand these manifolds
THE GEOMETRIC FOUNDATIONS OF HAMILTONIAN MONTE CARLO 7 where f is the density of ̟ with respect to the Lebesgue measure on R n . When targeting complex distributions either ǫ or the support of the indicator will be small and the resulting translations barely perturb the initial state. Example 2. The random scan Gibbs sampler is induced by axis-aligned translations, ti,η : qi → P −1 i (η) i ∼ U{1, . . . , n} η ∼ U[0, 1] , where Pi(qi) = Z qi ∞ ̟(d˜qi |q) is the cumulative distribution function of the ith conditional measure. When the target distribution is strongly correlated, the conditional measures concentrate near the initial q and, as above, the translations are stunted. In order to define a Markov kernel that remains efficient in difficult problems we need measure-preserving maps whose domains are not limited to local exploration. Realizations of Langevin diffusions (Øksendal, 2003), for example, yield measure-preserving maps that diffuse across the entire target distribution. Unfortunately that diffusion tends to expand across the target measures only slowly (Figure 2): for any finite diffusion time the resulting Langevin kernels are localized around the initial point (Figure 3). What we need are more coherent maps that avoid such diffusive behavior. One potential candidate for coherent maps are flows. A flow, {φt}, is a family of isomorphisms parameterized by a time, t, φt : Q → Q, ∀t ∈ R, that form a one-dimensional Lie group on composition, φt ◦ φs = φs+t φ −1 t = φ−t φ0 = IdQ, where IdQ is the natural identity map on Q. Because the inverse of a map is given only by negating t, as the time is increased the resulting φt pushes points away from their initial positions and avoids localized exploration (Figure 4). Our final obstacle is in engineering a flow comprised of measure-preserving maps. Flows are particularly natural on the smooth manifolds of differential geometry, and flows that preserve a given target measure can be engineered on one exceptional class of smooth manifolds known as symplectic manifolds. If we can understand these manifolds
BETANCOURT ET AL. c2nomgameaahn极afmahoot I(Langevin wi T(Langevin with 10) (a) (b) (e) al point. en here for a Langevin diffusion targeting the tisted Gaussian distrib tion (Haario,Saksman and Tamminen,2001)
8 BETANCOURT ET AL. Fig 2. Langevin trajectories are, by construction, diffusive, and are just as likely to double back as they are to move forward. Consequently even as the diffusion time grows, here to t = 1000 as the trajectory darkens, realizations of a Langevin diffusion targeting the twisted Gaussian distribution (Haario, Saksman and Tamminen, 2001) only slowly wander away from the initial point. δq0 T (Langevin with t=1) (a) δq0 T (Langevin with t=10) (b) δq0 T (Langevin with t=100) (c) Fig 3. Because of the diffusive nature of the underlying maps, Langevin kernels expand very slowly with increasing diffusion time, t. For any reasonable diffusion time the resulting kernels will concentrate around the initial point, as seen here for a Langevin diffusion targeting the twisted Gaussian distribution (Haario, Saksman and Tamminen, 2001)
THE GEOMETRIC FOUNDATIONS OF HAMILTONIAN MONTE CARLO 9 Flow Diffusio cdbe eck on themce ueo probabilistically then we can take advantage of their properties to build Markov kernels with small autocorrelations for even the complex,high-dimensional target distributions of practical interest. 2.MEASURES ON MANIFOLDS In this section epreserving flows the foll uce int ferent etry up to e(2 e will also ion therein through the pa er.Fo rs nev in lear Bac and Muniain 1994 applications in Schutz(1980);J nd Saletan (1998 and then finally Lee(2013).The theory of symp ctic geometry in which we will be partic d Saletan (1998);Lee (2 13 with aing the most moc I thorou n coverage e of the su Smooth manifolds generalize the Euclidean space of real nd the corresponding calculus;in particular,a smooth manifold only look locally like a E clidea n space (Figures 5).This more general space includes Lie groups,Stiefel manifolds,and other spaces becoming common in contemporary applications (Byrne and Girolami,2013),not to mention regular Euclidean space as a special case.It does not,however,include any manifold with a discrete topology such as tree spaces. Formally,we assume that our sample space,satisfies the properties of a smooth, connected,and orientable n-dimensional manifold.Specifically we require that Q be a Hausdorff and second-countable topological space that is locally homeomorphic to R"and
THE GEOMETRIC FOUNDATIONS OF HAMILTONIAN MONTE CARLO 9 q t Flow Diffusion Fig 4. Because of the underlying group structure flows cannot double back on themselves like diffusions, forcing a coherent exploration of the target space. probabilistically then we can take advantage of their properties to build Markov kernels with small autocorrelations for even the complex, high-dimensional target distributions of practical interest. 2. MEASURES ON MANIFOLDS In this section we review probability measures on smooth manifolds of increasing sophistication, culminating in the construction of measure-preserving flows. Although we will relate each result to probabilistic theory and introduce intuition where we can, the formal details in the following require a working knowledge of differential geometry up to Lee (2013). We will also use the notation therein throughout the paper. For readers new to the subject but interested in learning more, we recommend the introduction in Baez and Muniain (1994), the applications in Schutz (1980); Jos´e and Saletan (1998), and then finally Lee (2013). The theory of symplectic geometry in which we will be particularly interested is reviewed in Schutz (1980); Jos´e and Saletan (1998); Lee (2013), with Cannas da Silva (2001) providing the most modern and thorough coverage of the subject. Smooth manifolds generalize the Euclidean space of real numbers and the corresponding calculus; in particular, a smooth manifold need only look locally like a Euclidean space (Figures 5). This more general space includes Lie groups, Stiefel manifolds, and other spaces becoming common in contemporary applications (Byrne and Girolami, 2013), not to mention regular Euclidean space as a special case. It does not, however, include any manifold with a discrete topology such as tree spaces. Formally, we assume that our sample space, Q, satisfies the properties of a smooth, connected, and orientable n-dimensional manifold. Specifically we require that Q be a Hausdorff and second-countable topological space that is locally homeomorphic to R n and
10 BETANCOURT ET AL u cQ (4)cR Q=SxR ucQ (lha)C R2 (a) (6) (c) FIG 5.(a)The cylinder,Q=Sx R.is a nontrivial erample of a manifold.Although not globally equivalent wherever the two neighborhoods intersect (the intersections here shoun in gray). equipped with a differential structure. fua:vahael' consisting of open neighborhoods in Q, ua cQ, and homeomorphic charts, a:la→yaCR”, that are smooth functions whenever their domains overlap(Figure 5), (R"),Va,B nug#0. Coordinates subordinate to a chart, q:4a→R where i is the ith Euclidean projection on the image of,provide local parameterizations of the manifold convenient for explicit calculations
10 BETANCOURT ET AL. Q = S 1 × R (a) U1 ⊂ Q U2 ⊂ Q (b) ψ1(U1) ⊂ R 2 ψ2(U2) ⊂ R 2 (c) Fig 5. (a) The cylinder, Q = S 1 ×R, is a nontrivial example of a manifold. Although not globally equivalent to a Euclidean space, (b) the cylinder can be covered in two neighborhoods (c) that are themselves isomorphic to an open neighborhood in R 2 . The manifold becomes smooth when ψ1 ◦ψ −1 2 : R 2 → R 2 is a smooth function wherever the two neighborhoods intersect (the intersections here shown in gray). equipped with a differential structure, {Uα, ψα}α∈I , consisting of open neighborhoods in Q, Uα ⊂ Q, and homeomorphic charts, ψα : Uα → Vα ⊂ R n , that are smooth functions whenever their domains overlap (Figure 5), ψβ ◦ ψ −1 α ∈ C ∞(R n ), ∀α, β | Uα ∩ Uβ 6= ∅. Coordinates subordinate to a chart, q i : Uα → R q → πi ◦ ψα, where πi is the ith Euclidean projection on the image of ψα, provide local parameterizations of the manifold convenient for explicit calculations